Thread overview
Unicode, graphemes and D
Apr 05, 2012
bearophile
Apr 05, 2012
Dmitry Olshansky
Apr 05, 2012
stephan
Apr 05, 2012
Dmitry Olshansky
Apr 05, 2012
stephan
April 05, 2012
For people interested in better Unicode handling in D: I have seen that Perl has some support for graphemes. In a regex, /\X/ matches an extended grapheme cluster:

http://perldoc.perl.org/perl5120delta.html#Unicode-overhaul

http://perldoc.perl.org/perluniprops.html

Perl seems to be one of the best languages for handling Unicode (D and Go are good too): http://rosettacode.org/wiki/String_length#Grapheme_Length_2
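
To make the "grapheme length" point concrete, here is a minimal sketch in D of the three lengths a string can have. It assumes a later Phobos release in which std.uni provides byGrapheme (that function was not available when this thread was written); graphemeLength is a hypothetical helper name used only for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

/// Counts user-perceived characters (extended grapheme clusters).
/// Hypothetical helper, not part of Phobos.
size_t graphemeLength(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // "noël" written with a combining diaeresis: n, o, e, U+0308, l
    string s = "noe\u0308l";

    assert(s.length == 6);          // UTF-8 code units
    assert(s.walkLength == 5);      // code points
    assert(graphemeLength(s) == 4); // grapheme clusters
}
```

Which of the three counts is the right one depends on the task: storage size, code point processing, or what a reader perceives as one character.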

Bye,
bearophile
April 05, 2012
On 05.04.2012 15:53, bearophile wrote:
> For people interested in better Unicode handling in D: I have seen that Perl has some support for graphemes. In a regex, /\X/ matches an extended grapheme cluster:
>
> http://perldoc.perl.org/perl5120delta.html#Unicode-overhaul
>
> http://perldoc.perl.org/perluniprops.html
>
> Perl seems to be one of the best languages for handling Unicode (D and Go are good too):
> http://rosettacode.org/wiki/String_length#Grapheme_Length_2
>
> Bye,
> bearophile

FYI
http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/dolsh/20002#

-- 
Dmitry Olshansky
April 05, 2012
>
> FYI
> http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/dolsh/20002#

This may be helpful for your GSoC project: as part of a larger code base, we have implemented many standard Unicode algorithms (normalization, case folding, graphemes, and property lookups such as general category, Bidi class and joining type, among others).

The doc and source can be found at http://stephan.bitbucket.org/. As this was just a helper, it is not fully polished (but it works and is reasonably fast).
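
For readers unfamiliar with the algorithms listed above, here is a small sketch of what normalization and case-insensitive comparison do. It does not use the library linked in this post (whose API is not shown in the thread); it assumes the normalize and icmp functions from std.uni in later Phobos releases.

```d
import std.uni : icmp, normalize, NFC, NFD;

void main()
{
    string composed = "\u00EB";    // ë as a single code point
    string decomposed = "e\u0308"; // e followed by a combining diaeresis

    // The two spellings render identically but differ code point by code point.
    assert(composed != decomposed);

    // Normalization maps both spellings to one canonical form.
    assert(normalize!NFC(decomposed) == composed);
    assert(normalize!NFD(composed) == decomposed);

    // icmp compares case-insensitively and returns 0 for a match.
    assert(icmp("UNICODE", "unicode") == 0);
}
```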
April 05, 2012
On 05.04.2012 18:56, stephan wrote:
>
>>
>> FYI
>> http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/dolsh/20002#
>>
>
> This may be helpful for your GSoC project: as part of a larger code base,
> we have implemented many standard Unicode algorithms (normalization,
> case folding, graphemes, and property lookups such as general category,
> Bidi class and joining type, among others).
>
> The doc and source can be found at http://stephan.bitbucket.org/. As
> this was just a helper, it is not fully polished (but it works and is
> reasonably fast).

Nice.
I'll add a link to my proposal. Though I can use it iff the license is Boost compatible.

-- 
Dmitry Olshansky
April 05, 2012
On Thursday, 5 April 2012 at 16:17:46 UTC, Dmitry Olshansky wrote:
> Though I can use it iff the license is Boost compatible.

Ah, the licensing question. I am not a lawyer and I don't know much about copyright law. So you have to do your own research. But here is my view regarding the unicodedata.d license situation.

Our code is Boost licensed. It is, however, not a clean-room implementation. Although almost all algorithms and data structures are different and there is minimal (and clearly marked) direct copying, we have looked quite a bit at the ICU implementation (and its predecessors) for inspiration. The ICU license is very permissive, hence you should be OK here.

In addition, data files from the Unicode Consortium are part of the distribution. They are used in "script mode" (version SCRIPT_DATA) to generate the relevant Unicode data in an appropriate format, and in the extensive unit tests (version ALL_UNIT_TESTS) to check correctness against various test files and derived property files. Again, the data files have a very permissive license.
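
As a rough illustration of how such version identifiers gate code in D, here is a minimal sketch. Only the names SCRIPT_DATA and ALL_UNIT_TESTS come from this post; the buildMode constant, the messages and the test body are hypothetical and do not reflect the actual layout of unicodedata.d.

```d
import std.stdio : writeln;

// Compiled in only when building with: dmd -version=SCRIPT_DATA ...
version (SCRIPT_DATA)
{
    enum buildMode = "generate tables from the Unicode data files";
}
else
{
    enum buildMode = "use pregenerated tables";
}

// Compiled in only with: dmd -unittest -version=ALL_UNIT_TESTS ...
version (ALL_UNIT_TESTS)
{
    unittest
    {
        // Placeholder for the slow checks against the Unicode test files.
        assert(buildMode.length > 0);
    }
}

void main()
{
    writeln(buildMode);
}
```

Built without either flag, neither the generator branch nor the extra unit test is compiled, so ordinary users of the module would not need the data files at build time.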

Let me know if I can be of any help.