Thread overview
Suggestion: char.init, wchar.init and dchar.init
Jun 07, 2004
Arcane Jill
Jun 07, 2004
Ilya Minkov
Jun 07, 2004
Walter
Jun 07, 2004
Hauke Duden
Jun 07, 2004
Ben Hinkle
Jun 07, 2004
Arcane Jill
June 07, 2004
Hi,

The default value of NaN for floating point numbers is an excellent idea. I suggest that we do the same thing for chars, wchars and dchars.

The init value for char should (IMO) be 0xFF. Rationale - char by definition contains a UTF-8 fragment. The byte 0xFF will never occur in a valid UTF-8 sequence. It is a clear indication of an unassigned value.
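
A minimal sketch of why 0xFF works as a sentinel, assuming only the byte ranges that well-formed UTF-8 permits. The helper name `canAppearInUtf8` is hypothetical and not part of any library mentioned in this thread:

```d
/// Returns true if the byte `b` can occur anywhere in well-formed UTF-8.
/// (Hypothetical helper, for illustration only.)
bool canAppearInUtf8(ubyte b)
{
    if (b <= 0x7F) return true;       // single-byte (ASCII) code units
    if (b <= 0xBF) return true;       // continuation bytes 0x80..0xBF
    return b >= 0xC2 && b <= 0xF4;    // lead bytes of 2-, 3- and 4-byte sequences
}

unittest
{
    assert(canAppearInUtf8(0x41));    // 'A'
    assert(canAppearInUtf8(0xC3));    // lead byte, e.g. first byte of U+00E9
    assert(!canAppearInUtf8(0xC0));   // would only start an overlong encoding
    assert(!canAppearInUtf8(0xFF));   // the proposed char.init
}
```

The bytes 0xC0, 0xC1 and 0xF5 through 0xFF are the ones that never appear, so 0xFF is one of several possible choices rather than the only one.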

The init value for wchar and dchar should be 0xFFFF (that is, 0x0000FFFF for dchar). Rationale - wchar and dchar by definition contain UTF-16 and UTF-32 (equivalent to plain Unicode within their defined ranges). The codepoint U+FFFF is not a legitimate Unicode character, and, furthermore, it is guaranteed by the Unicode Consortium that 0xFFFF will NEVER be a legitimate Unicode character. This codepoint will remain forever unassigned, precisely so that it may be used for purposes such as this.
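
To make the "never a legitimate character" claim concrete, here is a rough sketch of a noncharacter test. The function name `isNoncharacter` is an assumption for illustration. The ranges are the noncharacters Unicode reserves: U+FDD0..U+FDEF plus the last two code points of every plane.

```d
/// True if `c` is one of the code points Unicode reserves as noncharacters.
/// (Hypothetical helper, for illustration only.)
bool isNoncharacter(dchar c)
{
    if (c > 0x10FFFF) return false;               // outside the Unicode range
    if (c >= 0xFDD0 && c <= 0xFDEF) return true;  // reserved block in the BMP
    return (c & 0xFFFE) == 0xFFFE;                // U+xxFFFE and U+xxFFFF of each plane
}

unittest
{
    assert(isNoncharacter(cast(dchar) 0xFFFF));   // the proposed wchar/dchar init value
    assert(isNoncharacter(cast(dchar) 0xFFFE));
    assert(!isNoncharacter(cast(dchar) 0x41));    // 'A'
    assert(!isNoncharacter(cast(dchar) 0x0));     // NUL is an ordinary control character
}
```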

Be it noted that the codepoint 0 is a bad choice for a default value. It might have made sense in C, where '\0' has special meaning as a string terminator, but in D '\0' is just another character. Unicode defines '\0' as a control character whose interpretation is implementation dependent. Better, I feel, to use a value with universal meaning.

Jill


June 07, 2004
Gets my vote!

-eye
June 07, 2004
That's a good idea.

"Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:ca17qq$224t$1@digitaldaemon.com...
> Hi,
>
> The default value of NaN for floating point numbers is an excellent idea. I
> suggest that we do the same thing for chars, wchars and dchars.
>
> The init value for char should (IMO) be 0xFF. Rationale - char by definition
> contains a UTF-8 fragment. The byte 0xFF will never occur in a valid UTF-8
> sequence. It is a clear indication of an unassigned value.
>
> The init value for wchar and dchar should be 0xFFFF (that is, 0x0000FFFF for
> dchar). Rationale - wchar and dchar by definition contain UTF-16 and UTF-32
> (equivalent to plain Unicode within their defined ranges). The codepoint U+FFFF
> is not a legitimate Unicode character, and, furthermore, it is guaranteed by the
> Unicode Consortium that 0xFFFF will NEVER be a legitimate Unicode character.
> This codepoint will remain forever unassigned, precisely so that it may be used
> for purposes such as this.
>
> Be it noted that the codepoint 0 is a bad choice for a default value. It
> might have made sense in C, where '\0' has special meaning as a string
> terminator, but in D '\0' is just another character. Unicode defines '\0' as a
> control character whose interpretation is implementation dependent. Better, I
> feel, to use a value with universal meaning.
>
> Jill


June 07, 2004
Arcane Jill wrote:
> Hi,
> 
> The default value of NaN for floating point numbers is an excellent idea. I
> suggest that we do the same thing for chars, wchars and dchars.
> 
> The init value for char should (IMO) be 0xFF. Rationale - char by definition
> contains a UTF-8 fragment. The byte 0xFF will never occur in a valid UTF-8
> sequence. It is a clear indication of an unassigned value.
> 
> The init value for wchar and dchar should be 0xFFFF (that is, 0x0000FFFF for
> dchar). Rationale - wchar and dchar by definition contain UTF-16 and UTF-32
> (equivalent to plain Unicode within their defined ranges). The codepoint U+FFFF
> is not a legitimate Unicode character, and, furthermore, it is guaranteed by the
> Unicode Consortium that 0xFFFF will NEVER be a legitimate Unicode character.
> This codepoint will remain forever unassigned, precisely so that it may be used
> for purposes such as this.
> 
> Be it noted that the codepoint 0 is a bad choice for a default value. It
> might have made sense in C, where '\0' has special meaning as a string
> terminator, but in D '\0' is just another character. Unicode defines '\0' as a
> control character whose interpretation is implementation dependent. Better, I
> feel, to use a value with universal meaning.

I like the 0 initialization. It is consistent and easy to understand and remember.

And it has an important function. If anyone ever passes an uninitialized D memory block to functions that expect a 0-terminated string then nothing bad will happen.
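
The concern can be sketched with a bounded stand-in for a C-style terminator scan. The name `terminatorIndex` and the 8-element buffers are assumptions for illustration. A real `strlen` has no bound and would keep reading past the end of the second buffer.

```d
/// Index of the first '\0' in `buf`, or -1 if the buffer holds no terminator.
/// (Hypothetical, bounded stand-in for C's strlen.)
ptrdiff_t terminatorIndex(const(char)[] buf)
{
    foreach (i, ch; buf)
        if (ch == '\0')
            return cast(ptrdiff_t) i;
    return -1;
}

unittest
{
    char[8] zeroed = '\0';             // what 0 initialization gives
    char[8] filled = cast(char) 0xFF;  // what the proposed 0xFF initialization gives
    assert(terminatorIndex(zeroed[]) == 0);   // reads as an empty C string
    assert(terminatorIndex(filled[]) == -1);  // no terminator anywhere in the block
}
```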

But then again, I also don't like that floats are initialized to NaN.

If it HAS to be done then there should definitely be an easy-to-remember property for the char types to test for this. Otherwise many programmers will have a hard time remembering which value means "not a char".

Hauke
June 07, 2004
> If it HAS to be done then there should definitely be an easy-to-remember property for the char types to test for this. Otherwise many programmers will have a hard time remembering which value means "not a char".

.init?


June 07, 2004
In article <ca2754$h5k$1@digitaldaemon.com>, Hauke Duden says...

>If it HAS to be done then there should definitely be an easy-to-remember property for the char types to test for this. Otherwise many programmers will have a hard time remembering which value means "not a char".

You're not supposed to /test/ for uninitialized variables - you're simply supposed to initialize them! And that error, of course, is exactly what we're trying to catch.

Anyway, you could always test for "if (c == char.init)" no matter what char.init was.
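
As a small sketch of that test, written so that it never names the actual value (the template name `isDefault` is hypothetical):

```d
/// True if `c` still holds its type's default initializer.
/// T.init is whatever the compiler defines, so no magic number is needed.
/// (Hypothetical helper, for illustration only.)
bool isDefault(T)(T c)
{
    return c == T.init;
}

unittest
{
    char c;                 // never assigned, so it holds char.init
    assert(isDefault(c));
    c = 'x';
    assert(!isDefault(c));

    dchar d;                // same check, different character type
    assert(isDefault(d));
}
```

Note that the same comparison would not work for floating point types, because NaN never compares equal to itself.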

By the way, I got to look at your Unichar code today. Excellent stuff. It's on my machine now. Also, you were right about doxygen, judging by the quality of your documentation - it really does rock.

Jill