Thread overview
April 22, 2004
UTF8/16 always 8/16 bits?

The Unicode standard says that UTF-8 and UTF-16 characters vary in size. How does D handle this? Is it assumed that UTF-8 chars are always 8 bits and UTF-16 chars are always 16 bits?
April 22, 2004
Re: UTF8/16 always 8/16 bits?

Posted in reply to Achilleas Margaritis

On Thu, 22 Apr 2004 11:57:31 +0000 (UTC), Achilleas Margaritis <Achilleas_member@pathlink.com> wrote:

> The unicode standard says that UTF8 and UTF16 characters vary in size. How D handles this ? is it assumed that UTF8 chars are always 8-bits and UTF16 chars are always 16 bits ?

In std.utf (http://www.digitalmars.com/d/phobos.html#utf) there are functions like `dchar decode(char[] s, inout uint idx)` that take a UTF-8 char[] and an index, return the UTF-32 code point, and advance the index by one or more bytes. Regular array indexing with [] doesn't know about multi-slot characters.

-Ben
April 22, 2004
Re: UTF8/16 always 8/16 bits?

Posted in reply to Achilleas Margaritis

It doesn't. Although they are called UTF-8 and UTF-16, they are just arrays of chars of the appropriate width. The OS is what really has to deal with them as Unicode. This means, of course, that by using indexes into the char[] and mucking around with the data you may end up with invalid Unicode. Such is life.

"Achilleas Margaritis" <Achilleas_member@pathlink.com> wrote in message news:c68bvb$1vgk$1@digitaldaemon.com...
> The unicode standard says that UTF8 and UTF16 characters vary in size. How D
> handles this ? is it assumed that UTF8 chars are always 8-bits and UTF16 chars
> are always 16 bits ?
Copyright © 1999-2021 by the D Language Foundation