Thread overview
char code
May 17, 2005
Hiroshi Sakurai
May 17, 2005
Ben Hinkle
May 17, 2005
Uwe Salomon
May 17, 2005
Uwe Salomon
May 17, 2005
Thomas Kuehne
May 17, 2005
Uwe Salomon
Re: char code [OT]
May 18, 2005
Thomas Kuehne
May 17, 2005
Thomas Kuehne
May 17, 2005
Hi.
This topic was raised on the 2ch BBS:
http://pc8.2ch.net/test/read.cgi/tech/1109933426/567

It is also on the Japanese D language wiki bug tracker: http://f17.aaa.livedoor.jp/~labamba/?BugTrack%2F13

Illegal non-ASCII WYSIWYG string.

ver dmd0.123
/*code page utf8 */
private import std.stream;
void main()
{
    // valid
    char[] str = "ワロスｗ";
    stdout.writeString(str); // valid output : E3 83 AF E3 83 AD E3 82 B9 EF BD 97
    // invalid
    char[] str2 = r"ワロスｗ"; // or char[] str = `ワロスｗ`;
    stdout.writeString(str2); // invalid output : E3 E3 E3 EF
    return;
}

thanks,
Hiroshi Sakurai.
sorry, my english is very poor. OTL


May 17, 2005
"Hiroshi Sakurai" <Hiroshi_member@pathlink.com> wrote in message news:d6bm67$cfr$1@digitaldaemon.com...
> Hi.
> this topic writen in 2ch BBS.
> http://pc8.2ch.net/test/read.cgi/tech/1109933426/567
>
> and Japanese D language wiki bugtrack. http://f17.aaa.livedoor.jp/~labamba/?BugTrack%2F13
>
> Illegal non-ascii WYSIWYG string.
>
> ver dmd0.123
> /*code page utf8 */
> private import std.stream;
> void main()
> {
> // valid
> char[] str = "fffX,-";
> stdout.writeString(str); // valid output : E3 83 AF E3 83 AD E3 82 B9 EF BD 97
> // invalid
> char[] str2 = r"fffX,-"; // or char[] str = `fffX,-`;
> stdout.writeString(str2); // invalid output : E3 E3 E3 EF
> return;
> }
>
> thanks,
> Hiroshi Sakurai.
> sorry, my english is very poor. OTL

I'm confused. Is the problem with raw strings like r"blah" or with std.stream? Stream.writeString doesn't look at encodings, so whatever is going wrong is happening before the call to writeString. Since I don't have the proper fonts or encoding support in my news reader, I only see the raw string r"f..." with boxes in it, so I can't tell what is actually in the source file you are trying to compile. A raw string is a sequence of bytes assumed to be in UTF-8 encoding. Is that what is in your source file?

-Ben
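
One way to answer the question of what is actually in the bytes is to test them for UTF-8 validity directly. The sketch below is only an illustration in Python (not the D code from the report); the helper names `is_valid_utf8` and `hex_dump` are made up for this example, and the two byte strings are the "valid output" and "invalid output" quoted in the report.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def hex_dump(data: bytes) -> str:
    """Format bytes the way the thread quotes them, e.g. 'E3 83 AF'."""
    return " ".join(f"{b:02X}" for b in data)


# Byte sequences quoted in the original report.
VALID_OUTPUT = bytes.fromhex("E3 83 AF E3 83 AD E3 82 B9 EF BD 97")
INVALID_OUTPUT = bytes.fromhex("E3 E3 E3 EF")

if __name__ == "__main__":
    for label, data in (("valid", VALID_OUTPUT), ("invalid", INVALID_OUTPUT)):
        print(label, hex_dump(data), is_valid_utf8(data))
```

The same check could be run on the source file itself by passing the result of reading it in binary mode, which would show whether the file on disk is well-formed UTF-8 before the compiler ever sees it.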


May 17, 2005

Hiroshi Sakurai wrote on Tue, 17 May 2005 02:50:47 +0000 (UTC):
> Hi.
> this topic writen in 2ch BBS.
> http://pc8.2ch.net/test/read.cgi/tech/1109933426/567
>
> and Japanese D language wiki bugtrack. http://f17.aaa.livedoor.jp/~labamba/?BugTrack%2F13
>
> Illegal non-ascii WYSIWYG string.
>
> ver dmd0.123
> /*code page utf8 */
> private import std.stream;
> void main()
> {
>     // valid
>     char[] str = "ワロスｗ";
>     stdout.writeString(str); // valid output : E3 83 AF E3 83 AD E3 82 B9 EF BD 97
>     // invalid
>     char[] str2 = r"ワロスｗ"; // or char[] str = `ワロスｗ`;
>     stdout.writeString(str2); // invalid output : E3 E3 E3 EF
>     return;
> }

Added to DStress as
http://dstress.kuehne.cn/run/u/unicode_08_A.d
http://dstress.kuehne.cn/run/u/unicode_08_B.d
http://dstress.kuehne.cn/run/u/unicode_08_C.d
http://dstress.kuehne.cn/run/u/unicode_08_D.d

> sorry, my english is very poor. OTL
I could understand your message, thus your English can't be that bad ;)

Thomas


May 17, 2005
>> void main()
>> {
>> // valid
>> char[] str = "f�f�fX,-";
>> stdout.writeString(str); // valid output : E3 83 AF E3 83 AD E3 82 B9 EF BD 97
>> // invalid
>> char[] str2 = r"f�f�fX,-"; // or char[] str = `f�f�fX,-`;
>> stdout.writeString(str2); // invalid output : E3 E3 E3 EF
>> return;
>> }
>>
>> thanks,
>> Hiroshi Sakurai.
>> sorry, my english is very poor. OTL
>
> I'm confused. Is the problem with raw strings like r"blah" or with
> std.stream? The Stream.writeString doesn't look at encodings so whatever is
> going wrong is happening before the call to writeString. Since I don't have
> the proper fonts or encoding support in my new reader I only see the raw
> string r"f..." with boxes in them so I can't tell what is actually in the
> source file you are trying to compile.

Yes, and the boxes are U+FFFD, that is the Unicode replacement character. Whatever he typed in, it didn't make its way to us. But it is interesting to note that dmd's behaviour for the normal and the wysiwyg string is still different:

UTF8: 66 ef bf bd 66 ef bf bd 66 58 2c 2d
UTF16: 66 fffd 66 fffd 66 58 2c 2d

This is the normal string in UTF8 and UTF16 (note the U+FFFD replacement character).

UTF8: 66 ef 66 ef 66 58 2c 2d
UTF16: 66 f9af 66 58 2c 2d

And this one is the wysiwyg string, with the contents of the other one copied and pasted. Note that dmd omitted the "BF BD" after each "66 EF". That produces illegal Unicode, as you can see from the UTF16 translation (which is simply wrong: the conversion algorithm does not check for invalid input).
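
The two effects described here can be reproduced outside the compiler. The Python sketch below is an assumption-laden illustration, not dmd's or Phobos's actual code: `drop_continuation_bytes` (a hypothetical name) removes every byte in the range 0x80..0xBF, which happens to turn each correct byte sequence quoted in this thread into its broken counterpart, and `naive_utf8_decode` (also hypothetical) trusts the lead byte without validating what follows, which yields the F9AF value shown in the UTF16 line.

```python
def drop_continuation_bytes(data: bytes) -> bytes:
    """Remove every UTF-8 continuation byte (0x80..0xBF), keeping the rest."""
    return bytes(b for b in data if not 0x80 <= b <= 0xBF)


def naive_utf8_decode(data: bytes) -> list:
    """Decode by trusting each lead byte; trailing bytes are never validated."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:              # single byte (ASCII)
            out.append(b)
            i += 1
        elif b >> 5 == 0b110:     # lead byte of a 2-byte sequence
            out.append(((b & 0x1F) << 6) | (data[i + 1] & 0x3F))
            i += 2
        elif b >> 4 == 0b1110:    # lead byte of a 3-byte sequence
            out.append(((b & 0x0F) << 12)
                       | ((data[i + 1] & 0x3F) << 6)
                       | (data[i + 2] & 0x3F))
            i += 3
        else:
            raise ValueError(f"unsupported lead byte {b:#04x}")
    return out


NORMAL = bytes.fromhex("66 ef bf bd 66 ef bf bd 66 58 2c 2d")
WYSIWYG = bytes.fromhex("66 ef 66 ef 66 58 2c 2d")

if __name__ == "__main__":
    print(drop_continuation_bytes(NORMAL).hex(" "))
    print(" ".join(f"{cp:x}" for cp in naive_utf8_decode(WYSIWYG)))
```

That the stripped bytes match is only an observation about the quoted dumps; it suggests where to look but does not confirm what the lexer does internally. A strict decoder would reject the wysiwyg bytes outright instead of producing F9AF.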

Hmm, after some more thinking I found that the whole f?f?fX,- sequence is wrong; it just does not match the "valid output" he gives above. He wants to input the following:

UTF8: e3 83 af e3 83 ad e3 82 b9 ef bd 97
UTF16: 30ef 30ed 30b9 ff57
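
The intended characters can be written without any special input method by using code point escapes. The sketch below is a Python illustration (the helper names `utf8_hex` and `utf16_hex` are invented for this example); it builds the four code points from the UTF16 line above and prints both encodings for comparison with the dumps.

```python
# The four code points from the UTF16 line: U+30EF U+30ED U+30B9 U+FF57.
TEXT = "\u30ef\u30ed\u30b9\uff57"


def utf8_hex(s: str) -> str:
    """UTF-8 bytes of s as space-separated lowercase hex."""
    return " ".join(f"{b:02x}" for b in s.encode("utf-8"))


def utf16_hex(s: str) -> str:
    """UTF-16 code units of s as space-separated lowercase hex."""
    raw = s.encode("utf-16-be")
    units = (int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2))
    return " ".join(f"{u:04x}" for u in units)


if __name__ == "__main__":
    print("UTF8: ", utf8_hex(TEXT))
    print("UTF16:", utf16_hex(TEXT))
```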

Does anybody know how to input these characters with Linux? I don't have any input device for that :)
Or easier, Hiroshi, could you please send your input file over the list?

Ciao
uwe
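
As for why the text arrived as boxes: one plausible reading, offered here as a guess and not something the thread confirms, is that the post was sent as Shift-JIS and then displayed as Windows-1252. The Python sketch below tests that guess; `mojibake` is a hypothetical helper name. Bytes 0x8F and 0x8D have no Windows-1252 mapping, so they either vanish or become U+FFFD, which gives boxes in the same two positions as in the quoted string.

```python
# Text implied by the "valid output" bytes: U+30EF U+30ED U+30B9 U+FF57.
ORIGINAL = "\u30ef\u30ed\u30b9\uff57"


def mojibake(text: str, errors: str) -> str:
    """Encode as Shift-JIS, then misread the bytes as Windows-1252.

    errors is a standard codec error handler: "ignore" drops unmappable
    bytes, "replace" turns each one into U+FFFD.
    """
    return text.encode("shift_jis").decode("cp1252", errors=errors)


if __name__ == "__main__":
    print(ORIGINAL.encode("shift_jis").hex(" "))  # bytes on the wire
    print(mojibake(ORIGINAL, "ignore"))
    print(mojibake(ORIGINAL, "replace"))
```

The further flattening of the remaining symbols to plain "f", "," and "-" would be an extra ASCII best-fit step in some news reader, which this sketch does not model.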
May 17, 2005
> Does anybody know how to input these characters with Linux? I don't have any input device for that :)
> Or easier, Hiroshi, could you please send your input file over the list?

Hm, as i see now, Thomas already accomplished that (how?). Please ignore my posting.  :(

Ciao
uwe
May 17, 2005

Uwe Salomon wrote on Tue, 17 May 2005 19:16:21 +0200:
>> Does anybody know how to input these characters with Linux? I don't have
>> any input device for that :)
>> Or easier, Hiroshi, could you please send your input file over the list?
>
> Hm, as i see now, Thomas already accomplished that (how?).

Where is the pröbļém with Uniode on Linux 吗 ?

Thomas

May 17, 2005
>> Hm, as i see now, Thomas already accomplished that (how?).
>
> Where is the pröbļém with Uniode on Linux 吗 ?

Yes, put salt on the open wound! :-P
I don't have problems with Unicode, but i don't know a program/method to insert arbitrary Unicode characters into text... Thus i can only insert the characters that are on my keyboard.. (@ł€¶ŧ←↓→øþ¨æßðđŋjĸł etc.)

uwe
May 18, 2005
Uwe Salomon wrote:
>>> Hm, as i see now, Thomas already accomplished that (how?).
>>
>>
>> Where is the pröbļém with Uniode on Linux 吗 ?
>
>
> Yes, put salt on the open wound! :-P
> I don't have problems with Unicode, but i don't know a program/method
> to  insert arbitrary Unicode characters into text... Thus i can only
> insert  the characters that are on my keyboard.. (@ł€¶ŧ←↓→øþ¨æßðđŋjĸł etc.)

There are input modules for X11 and gtk that support quite a range of scripts. Last time I checked, qt/KDE didn't have any way to add native input modules. If you are desperate you might try http://yudit.org/ (X-based) or http://sourceforge.net/projects/jgim/ (Java based) to input "simple" languages.

Thomas