March 11, 2004 Documentation Error

Hi,

Not sure if this is the right place to report this. I am very, VERY impressed with D - especially with the UTF support. I am spending some time learning D now.

But there's an error in the documentation of the Basic Data Types. It says: "char = unsigned 8 bit ASCII".

I would point out that ASCII, *BY DEFINITION*, is only seven bits wide, and therefore that the phrase "8 bit ASCII" is meaningless and open to all sorts of possible misinterpretation. Three corrections are possible, and I don't know which one is right:

1. char = unsigned 7 bit ASCII
2. char = unsigned 8 bit UTF-8
3. char = unsigned 8 bit ISO 8859-1 (aka Latin-1)

Please note that while choice 3 is a subset of Unicode, it is incompatible with choice 2. Each of the three choices differs in how byte values 0x80 to 0xFF are interpreted. Specifically:

1. ASCII - byte values 0x80 to 0xFF are undefined
2. UTF-8 - byte values 0x80 to 0xFF are reserved for UTF-8 multi-byte sequences
3. ISO 8859-1 - byte values 0x80 to 0xFF represent Unicode characters U+0080 to U+00FF

This seems a simple thing to fix. If this is not the right place to report this, could someone please point me to the right place? Thanks.
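The difference between the three readings can be checked by decoding one byte above 0x7F under each of them. The following is only a rough sketch in Python (not D, and not taken from the D documentation); the byte 0xE9 and the helper name `try_decode` are arbitrary choices made for illustration.

```python
# Decode the single byte 0xE9 under the three candidate interpretations.
# 0xE9 is an arbitrary example value in the range 0x80 to 0xFF.
raw = bytes([0xE9])


def try_decode(data: bytes, encoding: str) -> str:
    """Hypothetical helper: return the decoded text, or '<invalid>' on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return "<invalid>"


# ASCII has no meaning for 0xE9, a lone 0xE9 is an incomplete UTF-8
# sequence, and ISO 8859-1 (Latin-1) maps it directly to U+00E9.
results = {enc: try_decode(raw, enc) for enc in ("ascii", "utf-8", "latin-1")}
for enc, text in results.items():
    print(f"{enc:8} -> {text!r}")

# The same character, properly encoded as UTF-8, occupies two bytes.
utf8_bytes = "\u00e9".encode("utf-8")
print(utf8_bytes.hex())
```

Only the Latin-1 reading accepts the lone byte, which is why choices 2 and 3 cannot both describe the same `char`.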

March 12, 2004 Re: Documentation Error

Posted in reply to Unicode User

You must be looking at an old version. The current doc defines char as unsigned 8 bit UTF-8.

-Walter

"Unicode User" <Unicode_member@pathlink.com> wrote in message news:c2pgq5$1tnc$1@digitaldaemon.com...
> Hi,
>
> Not sure if this is the right place to report this. I am very, VERY impressed
> with D - especially with the UTF support. I am spending some time learning D now.
>
> But there's an error in the documentation of the Basic Data Types. It says:
> "char = unsigned 8 bit ASCII".
>
> I would point out that ASCII, *BY DEFINITION*, is only seven bits wide, and
> therefore that the phrase "8 bit ASCII" is meaningless and open to all sorts of
> possible misinterpretation. Three corrections are possible, and I don't know
> which one is right:
> 1. char = unsigned 7 bit ASCII
> 2. char = unsigned 8 bit UTF-8
> 3. char = unsigned 8 bit ISO 8859-1 (aka Latin-1)
>
> Please note that while choice 3 is a subset of Unicode, it is incompatible with
> choice 2. Each of the three choices differs in how byte values 0x80 to 0xFF are
> interpreted. Specifically:
> 1. ASCII - byte values 0x80 to 0xFF are undefined
> 2. UTF-8 - byte values 0x80 to 0xFF are reserved for UTF-8 multi-byte sequences
> 3. ISO 8859-1 - byte values 0x80 to 0xFF represent Unicode characters U+0080 to
> U+00FF
>
> This seems a simple thing to fix. If this is not the right place to report this,
> could someone please point me to the right place? Thanks.
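If char is an unsigned 8 bit UTF-8 code unit, as the reply states, then a single char holds a whole character only for the ASCII range; anything above U+007F is spread over several code units. The sketch below illustrates this in Python rather than D, and the helper name `utf8_units` and the sample characters are assumptions chosen for the example.

```python
def utf8_units(text: str) -> list:
    """Hypothetical helper: return the UTF-8 code units of text as integers."""
    return list(text.encode("utf-8"))


ascii_units = utf8_units("A")        # U+0041 fits in one code unit
latin_units = utf8_units("\u00e9")   # U+00E9 needs two code units
euro_units = utf8_units("\u20ac")    # U+20AC needs three code units

for label, units in (("U+0041", ascii_units),
                     ("U+00E9", latin_units),
                     ("U+20AC", euro_units)):
    print(label, [hex(u) for u in units])
```

Every code unit of the two non-ASCII characters falls in the range 0x80 to 0xFF, the range whose interpretation the original post asks about.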