Thread overview
Whitespace for Walter
Jun 26, 2004
Arcane Jill
Jun 27, 2004
Phill
Jun 27, 2004
Walter
June 26, 2004
Another mad suggestion coming up, but this one might actually make some sort of sense.

Unicode whitespace is defined as any of the following characters, and no other:

0009..000D    <control-0009>..<control-000D>
0020          SPACE
0085          <control-0085>
00A0          NO-BREAK SPACE
1680          OGHAM SPACE MARK
180E          MONGOLIAN VOWEL SEPARATOR
2000..200A    EN QUAD..HAIR SPACE
2028          LINE SEPARATOR
2029          PARAGRAPH SEPARATOR
202F          NARROW NO-BREAK SPACE
205F          MEDIUM MATHEMATICAL SPACE
3000          IDEOGRAPHIC SPACE

How straightforward would it be to allow the DMD compiler to accept /precisely/ this list as whitespace in a D source file?
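
For what it's worth, the test itself is small. The following is only a sketch, written in Java rather than in the DMD lexer's own source, and the class and method names (UnicodeWhitespace, isListedWhitespace) are made up for illustration. It accepts exactly the code points in the list above and nothing else:

```java
class UnicodeWhitespace {
    // Hypothetical helper: true only for the code points in the list above.
    static boolean isListedWhitespace(int cp) {
        return (cp >= 0x0009 && cp <= 0x000D)   // control characters TAB..CR
            || cp == 0x0020                     // SPACE
            || cp == 0x0085                     // control-0085 (NEL)
            || cp == 0x00A0                     // NO-BREAK SPACE
            || cp == 0x1680                     // OGHAM SPACE MARK
            || cp == 0x180E                     // MONGOLIAN VOWEL SEPARATOR
            || (cp >= 0x2000 && cp <= 0x200A)   // EN QUAD..HAIR SPACE
            || cp == 0x2028                     // LINE SEPARATOR
            || cp == 0x2029                     // PARAGRAPH SEPARATOR
            || cp == 0x202F                     // NARROW NO-BREAK SPACE
            || cp == 0x205F                     // MEDIUM MATHEMATICAL SPACE
            || cp == 0x3000;                    // IDEOGRAPHIC SPACE
    }

    public static void main(String[] args) {
        System.out.println(isListedWhitespace(0x00A0)); // true
        System.out.println(isListedWhitespace('x'));    // false
    }
}
```

A real lexer would also have to decode the source file's encoding into code points before applying a test like this, which the sketch does not attempt.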

Java got itself into a bit of a pickle by defining whitespace differently from
Unicode. They ended up having to have two separate functions (which from memory
I think are called isWhitespace() and isJavaWhitespace(), but I could be wrong).

It would be quite cool to have D whitespace and Unicode whitespace as one and the same thing, don't you think?

Arcane Jill

PS. I /don't/ recommend changing the value of const char[] whitespace; in std.string, however. To do so would set an AWFUL precedent which const char letters would NOT want to follow. You might, however, consider renaming those constants to ASCII_WHITESPACE, ASCII_LETTERS, etc., once the new Unicode stuff is up.



June 27, 2004
"Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:cbkqcc$usj$1@digitaldaemon.com...
> Another mad suggestion coming up, but this one might actually make some sort of sense.
>
> Unicode whitespace is defined as any of the following characters, and no other:
>
> 0009..000D    <control-0009>..<control-000D>
> 0020          SPACE
> 0085          <control-0085>
> 00A0          NO-BREAK SPACE
> 1680          OGHAM SPACE MARK
> 180E          MONGOLIAN VOWEL SEPARATOR
> 2000..200A    EN QUAD..HAIR SPACE
> 2028          LINE SEPARATOR
> 2029          PARAGRAPH SEPARATOR
> 202F          NARROW NO-BREAK SPACE
> 205F          MEDIUM MATHEMATICAL SPACE
> 3000          IDEOGRAPHIC SPACE
>
> How straightforward would it be to allow the DMD compiler to accept /precisely/ this list as whitespace in a D source file?
>
> Java got itself into a bit of a pickle by defining whitespace differently from
> Unicode. They ended up having to have two separate functions (which from memory
> I think are called isWhitespace() and isJavaWhitespace(), but I could be wrong).
>
In Java:
Character.isSpace(char ch)
is deprecated and replaced by
Character.isWhitespace(char ch)
Also
Character.isSpaceChar(char ch)
for the Unicode space char.

There is no "isJavaWhitespace()" or "Whitespace()".

There is Character.isJavaLetterOrDigit(char ch), which
is deprecated; maybe you were thinking of that.
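
To see how the two current methods differ, here is a small sketch using Character.isWhitespace(int) and Character.isSpaceChar(int) from java.lang.Character. The class name JavaWhitespaceDemo is made up for illustration. Neither method matches the Unicode list quoted above: isWhitespace() rejects NO-BREAK SPACE, isSpaceChar() rejects TAB, and both reject control-0085.

```java
class JavaWhitespaceDemo {
    // Print both Java classifications for one code point.
    static void show(String name, int cp) {
        System.out.println(name
            + ": isWhitespace=" + Character.isWhitespace(cp)
            + ", isSpaceChar=" + Character.isSpaceChar(cp));
    }

    public static void main(String[] args) {
        show("TAB (U+0009)", 0x0009);            // isWhitespace=true,  isSpaceChar=false
        show("SPACE (U+0020)", 0x0020);          // isWhitespace=true,  isSpaceChar=true
        show("control-0085 (U+0085)", 0x0085);   // isWhitespace=false, isSpaceChar=false
        show("NO-BREAK SPACE (U+00A0)", 0x00A0); // isWhitespace=false, isSpaceChar=true
        show("LINE SEPARATOR (U+2028)", 0x2028); // isWhitespace=true,  isSpaceChar=true
    }
}
```

isWhitespace() is Java's own definition (Unicode separators minus the non-breaking spaces, plus a handful of control characters), while isSpaceChar() follows the Unicode separator categories only.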

Phill.







June 27, 2004
I think it's a good idea.

"Arcane Jill" <Arcane_member@pathlink.com> wrote in message news:cbkqcc$usj$1@digitaldaemon.com...
> Another mad suggestion coming up, but this one might actually make some sort of sense.
>
> Unicode whitespace is defined as any of the following characters, and no other:
>
> 0009..000D    <control-0009>..<control-000D>
> 0020          SPACE
> 0085          <control-0085>
> 00A0          NO-BREAK SPACE
> 1680          OGHAM SPACE MARK
> 180E          MONGOLIAN VOWEL SEPARATOR
> 2000..200A    EN QUAD..HAIR SPACE
> 2028          LINE SEPARATOR
> 2029          PARAGRAPH SEPARATOR
> 202F          NARROW NO-BREAK SPACE
> 205F          MEDIUM MATHEMATICAL SPACE
> 3000          IDEOGRAPHIC SPACE
>
> How straightforward would it be to allow the DMD compiler to accept /precisely/ this list as whitespace in a D source file?
>
> Java got itself into a bit of a pickle by defining whitespace differently from
> Unicode. They ended up having to have two separate functions (which from memory
> I think are called isWhitespace() and isJavaWhitespace(), but I could be wrong).
>
> It would be quite cool to have D whitespace and Unicode whitespace as one and the same thing, don't you think?
>
> Arcane Jill
>
> PS. I /don't/ recommend changing the value of const char[] whitespace; in std.string, however. To do so would set an AWFUL precedent which const char letters would NOT want to follow. You might, however, consider renaming those constants to ASCII_WHITESPACE, ASCII_LETTERS, etc., once the new Unicode stuff is up.
>
>
>