October 22, 2015
On 21-Oct-2015 19:21, Shriramana Sharma wrote:
> Shriramana Sharma wrote:
>
>> iterating through a
>> string as a range will produce each semantically meaningful Unicode
>> character rather than each UTF-8 or UTF-16 codepoint, it does make sense
>> to do this.
>
> Dear me... I meant UTF-8 encoded byte, rather than "codepoint", since all
> characters have codepoints, but not all codepoints (such as the surrogates)
> correspond to characters.
>

Aye, careful here. Unicode is a slippery road... Even leaving aside code units and code points, there are notions like "abstract character" and "user-perceived character". Well, I tried my best to summarize most of it at:
http://dlang.org/phobos/std_uni.html
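
To make the distinction concrete, here is a minimal sketch that counts the same string three ways: as UTF-8 code units, as code points, and as grapheme clusters (roughly, user-perceived characters). The `Counts` struct and the `countAll` helper are hypothetical names made up for this illustration. The Phobos calls are `std.utf.byCodeUnit`, `std.uni.byGrapheme` and `std.range.walkLength`, and it assumes a Phobos recent enough to have `byCodeUnit`.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

/// The three "lengths" of one string (hypothetical helper type).
struct Counts
{
    size_t codeUnits;
    size_t codePoints;
    size_t graphemes;
}

/// Count a UTF-8 string at each level (hypothetical helper function).
Counts countAll(string s)
{
    return Counts(
        s.byCodeUnit.walkLength,   // UTF-8 code units, same as s.length
        s.walkLength,              // code points: a string iterated as a range decodes to dchar
        s.byGrapheme.walkLength);  // grapheme clusters
}

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT:
    // 3 code units, 2 code points, 1 grapheme.
    writeln(countAll("e\u0301"));
}
```

So plain range iteration over a string gives code points, not user-perceived characters; for the latter you need `byGrapheme`.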

-- 
Dmitry Olshansky