Suggestion: char.init, wchar.init and dchar.init

Jun 07, 2004

Arcane Jill

Jun 07, 2004

Ilya Minkov

Hi,

The default value of NaN for floating point numbers is an excellent idea. I suggest that we do the same thing for chars, wchars and dchars.

The init value for char should (IMO) be 0xFF. Rationale: char by definition contains a UTF-8 fragment. The byte 0xFF will never occur in a valid UTF-8 sequence. It is a clear indication of an unassigned value.

The init value for wchar and dchar should be 0xFFFF (that is, 0x0000FFFF for dchar). Rationale: wchar and dchar by definition contain UTF-16 and UTF-32 code units respectively (equivalent to plain Unicode within their defined ranges). The codepoint U+FFFF is not a legitimate Unicode character, and, furthermore, the Unicode Consortium guarantees that U+FFFF will NEVER be a legitimate Unicode character. This codepoint will remain forever unassigned, precisely so that it may be used for purposes such as this.

Be it noted that the codepoint 0 is a bad choice for a default value. It might have made sense in C, where '\0' has special meaning as a string terminator, but in D '\0' is just another character. Unicode defines '\0' as a control character whose interpretation is implementation dependent. Better, I feel, to use a value with universal meaning.

Jill
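The two encoding facts the proposal rests on can be checked with any conforming Unicode library. A minimal sketch in Python 3 (not D), assuming only the built-in strict UTF-8 codec and the standard `unicodedata` module; `is_valid_utf8` is a hypothetical helper, and the sentinel values are the ones proposed above:

```python
import unicodedata

# Sentinel values as proposed in the post (not taken from any library).
CHAR_SENTINEL = 0xFF      # proposed char.init (a UTF-8 code unit)
WCHAR_SENTINEL = 0xFFFF   # proposed wchar.init / dchar.init

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 0xFF can never appear in well-formed UTF-8, alone or after valid text.
print(is_valid_utf8(bytes([CHAR_SENTINEL])))           # False
print(is_valid_utf8(b"abc" + bytes([CHAR_SENTINEL])))  # False

# U+FFFF is a permanently reserved noncharacter, so the character
# database reports it as unassigned (general category "Cn").
print(unicodedata.category(chr(WCHAR_SENTINEL)))       # Cn
```

Because a strict decoder rejects any byte string containing the char sentinel, a stray default-initialized char cannot silently pass as text.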

That's a good idea.

"Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:ca17qq$224t$1@digitaldaemon.com...
> Hi,
>
> The default value of NaN for floating point numbers is an excellent idea. I
> suggest that we do the same thing for chars, wchars and dchars.

June 07, 2004

Re: Suggestion: char.init, wchar.init and dchar.init

Posted by Hauke Duden
in reply to Arcane Jill

Arcane Jill wrote:
> Hi,
> 
> The default value of NaN for floating point numbers is an excellent idea. I
> suggest that we do the same thing for chars, wchars and dchars.
> 
> The init value for char should (IMO) be 0xFF. Rationale - char by definition
> contains a UTF-8 fragment. The byte 0xFF will never occur in a valid UTF-8
> sequence. It is a clear indication of an unassigned value.
> 
> The init value for wchar and dchar should be 0xFFFF (that is, 0x0000FFFF for
> dchar). Rationale - wchar and dchar by definition contain UTF-16 and UTF-32
> (equivalent to plain Unicode within their defined ranges). The codepoint U+FFFF
> is not a legitimate Unicode character, and, furthermore, it is guaranteed by the
> Unicode Consortium that 0xFFFF will NEVER be a legitimate Unicode character.
> This codepoint will remain forever unassigned, precisely so that it may be used
> for purposes such as this.
> 
> Be it noted that the codepoint 0 is a bad choice for a default value. It
> might have made sense in C, where '\0' has special meaning as a string
> terminator, but in D '\0' is just another character. Unicode defines '\0' as a
> control character whose interpretation is implementation dependent. Better, I
> feel, to use a value with universal meaning.

I like the 0 initialization. It is consistent and easy to understand and remember.

And it has an important function. If anyone ever passes an uninitialized D memory block to a function that expects a 0-terminated string, then nothing bad will happen: the function simply sees an empty string.
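Hauke's point can be illustrated by simulating what a C function such as `strlen` sees when handed each kind of freshly initialized buffer. A minimal Python sketch; `c_strlen` is a hypothetical stand-in for C's `strlen`, bounded so that the simulation reports a missing terminator instead of actually reading past the buffer:

```python
def c_strlen(buf: bytes):
    """Mimic C strlen on a fixed-size buffer (hypothetical helper).

    Counts bytes before the first NUL. Returns None where real C code
    would keep reading past the end of the buffer (undefined behaviour),
    i.e. when the buffer holds no NUL terminator at all.
    """
    for i, b in enumerate(buf):
        if b == 0:
            return i
    return None

SIZE = 8
zero_filled = bytes(SIZE)         # 0 initialization, as Hauke prefers
ff_filled = bytes([0xFF] * SIZE)  # 0xFF initialization, as Jill proposes

print(c_strlen(zero_filled))  # 0    -> looks like an empty string
print(c_strlen(ff_filled))    # None -> no terminator; C would overrun
```

The zero-filled buffer is harmless to C string functions, while the 0xFF-filled one has no terminator for them to stop at.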

But then again, I also don't like that floats are initialized to NaN.

If it HAS to be done then there should definitely be an easy-to-remember property for the char types to test for this. Otherwise many programmers will have a hard time remembering which value means "not a char".

Hauke

> If it HAS to be done then there should definitely be an easy-to-remember property for the char types to test for this. Otherwise many programmers will have a hard time remembering which value means "not a char".

.init?

In article <ca2754$h5k$1@digitaldaemon.com>, Hauke Duden says...
>If it HAS to be done then there should definitely be an easy-to-remember property for the char types to test for this. Otherwise many programmers will have a hard time remembering which value means "not a char".

You're not supposed to /test/ for uninitialized variables; you're simply supposed to initialize them! And that error, of course, is exactly what we're trying to catch. Anyway, you could always test for "if (c == char.init)" no matter what char.init was.

By the way, I got to look at your Unichar code today. Excellent stuff. It's on my machine now. Also, you were right about doxygen, judging by the quality of your documentation: it really does rock.

Jill
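Jill's suggestion is to compare against the type's `.init` property rather than a hard-coded literal, so the check survives whatever value is chosen. D's `char.init` cannot be shown directly in Python, so this is only a sketch under that assumption: `CHAR_INIT` stands in for `char.init`, and `new_char_array` and `first_uninitialized` are hypothetical helpers, not part of any library:

```python
# Stand-in for D's char.init, using the value proposed in this thread.
CHAR_INIT = 0xFF

def new_char_array(n: int) -> list:
    """Allocate n code units, each default-initialized to CHAR_INIT."""
    return [CHAR_INIT] * n

def first_uninitialized(units: list, init: int = CHAR_INIT):
    """Return the index of the first unit still equal to init, or None.

    Compares against the named init value, never a literal, so the
    check stays correct if the default is ever changed.
    """
    for i, u in enumerate(units):
        if u == init:
            return i
    return None

buf = new_char_array(4)
buf[0], buf[1] = ord("h"), ord("i")  # buf[2] and buf[3] never assigned
print(first_uninitialized(buf))      # 2
```

Passing a different `init` (for example 0xFFFF) lets the same helper model wchar or dchar arrays.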
