Thread overview
UTF8/16 always 8/16 bits?
  Apr 22, 2004  Achilleas Margaritis
  Apr 22, 2004  Ben Hinkle
  Apr 22, 2004  Scott Egan
The Unicode standard says that UTF-8 and UTF-16 characters vary in size. How does D handle this? Is it assumed that UTF-8 chars are always 8 bits and UTF-16 chars are always 16 bits?


April 22, 2004
On Thu, 22 Apr 2004 11:57:31 +0000 (UTC), Achilleas Margaritis
<Achilleas_member@pathlink.com> wrote:

>The Unicode standard says that UTF-8 and UTF-16 characters vary in size. How does D handle this? Is it assumed that UTF-8 chars are always 8 bits and UTF-16 chars are always 16 bits?

In std.utf
http://www.digitalmars.com/d/phobos.html#utf
there are functions like
 dchar decode(char[] s, inout uint idx)
that take a UTF-8 char[] and an index, return the UTF-32 code point,
and advance the index past one or more bytes. The regular array
indexing [] doesn't know about multi-slot characters.
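
For example, a loop like this walks a UTF-8 string one code point at a
time (an untested sketch against the Phobos of the time, with printf
coming from std.c.stdio):

 import std.c.stdio;
 import std.utf;

 void main()
 {
     char[] s = "aé€";           // 1-, 2- and 3-byte UTF-8 sequences
     uint i = 0;
     while (i < s.length)
     {
         // decode returns the code point and advances i past it
         dchar c = decode(s, i);
         printf("U+%04X\n", cast(uint)c);
     }
 }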

-Ben
April 22, 2004
It doesn't. Although they are called UTF-8 and UTF-16, they are just arrays of chars of the appropriate length.

The O/S is what really has to deal with them as Unicode.

This means, of course, that by using indexes against the char[] and mucking around with the data you may end up with invalid Unicode.
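
For example (an untested sketch; it assumes std.utf.validate, which throws on malformed data):

 import std.utf;

 void main()
 {
     char[] s = "héllo";    // 'é' occupies two bytes, s[1] and s[2]
     char[] t = s[0 .. 2];  // slicing by code units cuts 'é' in half
     validate(t);           // throws: t is not well-formed UTF-8
 }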

Such is life.


"Achilleas Margaritis" <Achilleas_member@pathlink.com> wrote in message news:c68bvb$1vgk$1@digitaldaemon.com...
> The Unicode standard says that UTF-8 and UTF-16 characters vary in size.
> How does D handle this? Is it assumed that UTF-8 chars are always 8 bits
> and UTF-16 chars are always 16 bits?