Documentation Error

Hi,

Not sure if this is the right place to report this. I am very, VERY impressed with D - especially with the UTF support. Spending some time learning D now.

But there's an error in the documentation of the Basic Data Types. It says: "char = unsigned 8 bit ASCII".

I would point out that ASCII, *BY DEFINITION*, is only seven bits wide, and therefore that the phrase "8 bit ASCII" is meaningless and open to all sorts of possible misinterpretation. Three corrections are possible, and I don't know which one is right:

1. char = unsigned 7 bit ASCII
2. char = unsigned 8 bit UTF-8
3. char = unsigned 8 bit ISO 8859-1 (aka LATIN-1)

Please note that while choice 3 is a subset of Unicode, it is incompatible with choice 2. Each of the three choices differs in how codepoints 0x80 to 0xFF are interpreted. Specifically:

1. ASCII - codepoints 0x80 to 0xFF are undefined
2. UTF-8 - codepoints 0x80 to 0xFF are reserved for proper UTF-8 encoding
3. ISO 8859-1 - codepoints 0x80 to 0xFF represent Unicode chars U+0080 to U+00FF

This seems a simple thing to fix. If this is not the right place to report this, please can someone point me to the right place. Thanks.
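
The three interpretations listed in the post above can be tried on a single byte. This is only a minimal sketch in Python (not D), using Python's built-in "ascii", "utf-8" and "latin-1" codecs; the helper name decode_or_none is made up for illustration and is not part of any library.

```python
# Interpret the same byte under the three candidate definitions of char.
raw = bytes([0xE9])  # one byte in the disputed 0x80..0xFF range


def decode_or_none(data: bytes, encoding: str):
    """Hypothetical helper: return the decoded text, or None if the
    bytes are not valid in the given encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# 1. ASCII defines only 0x00..0x7F, so 0xE9 cannot be decoded.
as_ascii = decode_or_none(raw, "ascii")

# 2. In UTF-8, 0xE9 is a lead byte that must be followed by
#    continuation bytes, so on its own it is invalid.
as_utf8 = decode_or_none(raw, "utf-8")

# 3. In ISO 8859-1, byte 0xE9 maps directly to U+00E9.
as_latin1 = decode_or_none(raw, "latin-1")

# The same character written as UTF-8 needs two bytes, both >= 0x80.
utf8_bytes = "\u00e9".encode("utf-8")
```

The same byte is therefore either undefined, an incomplete sequence, or a complete character, depending on which of the three definitions applies.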

March 12, 2004

Re: Documentation Error

Posted by Walter
in reply to Unicode User

You must be looking at an old version. The current doc defines char as unsigned 8 bit UTF-8. -Walter
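
Assuming "unsigned 8 bit UTF-8" means that each char holds one UTF-8 code unit, a character outside the ASCII range occupies more than one char. A rough sketch of that consequence, written in Python rather than D and using only the built-in str.encode:

```python
# A string with one non-ASCII character: "naive" with U+00EF in place of "i".
text = "na\u00efve"

# Each element is one unsigned 8-bit UTF-8 code unit (0..255).
code_units = list(text.encode("utf-8"))

num_code_points = len(text)        # characters as Unicode code points
num_code_units = len(code_units)   # 8-bit units needed to store them

# U+00EF is stored as the two code units 0xC3 0xAF.
non_ascii_units = [u for u in code_units if u >= 0x80]
```

Here five code points take six 8-bit units, which is why the range 0x80 to 0xFF cannot be treated as standalone characters under this definition.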

"Unicode User" <Unicode_member@pathlink.com> wrote in message news:c2pgq5$1tnc$1@digitaldaemon.com...
> Hi,
>
> Not sure if this is the right place to report this. I am very, VERY
> impressed with D - especially with the UTF support. Spending some time
> learning D now.
>
> But there's an error in the documentation of the Basic Data Types. It
> says: "char = unsigned 8 bit ASCII".
>
> I would point out that ASCII, *BY DEFINITION*, is only seven bits wide,
> and therefore that the phrase "8 bit ASCII" is meaningless and open to
> all sorts of possible misinterpretation. Three corrections are possible,
> and I don't know which one is right:
> 1. char = unsigned 7 bit ASCII
> 2. char = unsigned 8 bit UTF-8
> 3. char = unsigned 8 bit ISO 8859-1 (aka LATIN-1)
>
> Please note that while choice 3 is a subset of Unicode, it is
> incompatible with choice 2. Each of the three choices differs in how
> codepoints 0x80 to 0xFF are interpreted. Specifically:
> 1. ASCII - codepoints 0x80 to 0xFF are undefined
> 2. UTF-8 - codepoints 0x80 to 0xFF are reserved for proper UTF-8 encoding
> 3. ISO 8859-1 - codepoints 0x80 to 0xFF represent Unicode chars U+0080
>    to U+00FF
>
> This seems a simple thing to fix. If this is not the right place to
> report this, please can someone point me to the right place. Thanks.
