Thread overview
UTF8/16 always 8/16 bits?
  Apr 22, 2004  Achilleas Margaritis
  Apr 22, 2004  Ben Hinkle
  Apr 22, 2004  Scott Egan
The Unicode standard says that UTF-8 and UTF-16 characters vary in size. How does D handle this? Is it assumed that UTF-8 chars are always 8 bits and UTF-16 chars are always 16 bits?


April 22, 2004
On Thu, 22 Apr 2004 11:57:31 +0000 (UTC), Achilleas Margaritis
<Achilleas_member@pathlink.com> wrote:

>The Unicode standard says that UTF-8 and UTF-16 characters vary in size. How does D handle this? Is it assumed that UTF-8 chars are always 8 bits and UTF-16 chars are always 16 bits?

In std.utf
http://www.digitalmars.com/d/phobos.html#utf
there are functions like
 dchar decode(char[] s, inout uint idx)
that take a UTF-8 char[] and an index, return the UTF-32 code point,
and advance the index past one or more bytes. The regular array
indexing [] doesn't know about multi-slot characters.
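
For example, a loop like this walks a UTF-8 string one code point at a
time (an untested sketch against the Phobos of the time, with printf
coming from std.c.stdio):

 import std.c.stdio;
 import std.utf;

 void main()
 {
     char[] s = "aé€";           // 1-, 2- and 3-byte UTF-8 sequences
     uint i = 0;
     while (i < s.length)
     {
         // decode returns the code point and advances i past it
         dchar c = decode(s, i);
         printf("U+%04X\n", cast(uint)c);
     }
 }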

-Ben
April 22, 2004
It doesn't. Although they are called UTF-8 and UTF-16, they are just arrays of chars of the appropriate length.

The O/S is what really has to deal with them as Unicode.

This means, of course, that by using indexes against the char[] and mucking around with the data you may end up with invalid Unicode.
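
For example (an untested sketch; it assumes std.utf.validate, which throws on malformed data):

 import std.utf;

 void main()
 {
     char[] s = "héllo";    // 'é' occupies two bytes, s[1] and s[2]
     char[] t = s[0 .. 2];  // slicing by code units cuts 'é' in half
     validate(t);           // throws: t is not well-formed UTF-8
 }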

Such is life.


"Achilleas Margaritis" <Achilleas_member@pathlink.com> wrote in message news:c68bvb$1vgk$1@digitaldaemon.com...
> The Unicode standard says that UTF-8 and UTF-16 characters vary in size.
> How does D handle this? Is it assumed that UTF-8 chars are always 8 bits
> and UTF-16 chars are always 16 bits?