Thread overview | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
|
May 17, 2005 char code | ||||
---|---|---|---|---|
| ||||
Hi.
this topic written in 2ch BBS.
http://pc8.2ch.net/test/read.cgi/tech/1109933426/567

and Japanese D language wiki bugtrack.
http://f17.aaa.livedoor.jp/~labamba/?BugTrack%2F13

Illegal non-ascii WYSIWYG string.

ver dmd0.123
/*code page utf8 */
private import std.stream;
void main()
{
// valid
char[] str = "ワロスw";
stdout.writeString(str); // valid output : E3 83 AF E3 83 AD E3 82 B9 EF BD 97
// invalid
char[] str2 = r"ワロスw"; // or char[] str = `ワロスw`;
stdout.writeString(str2); // invalid output : E3 E3 E3 EF
return;
}

thanks,
Hiroshi Sakurai.
sorry, my english is very poor. OTL |
May 17, 2005 Re: char code | ||||
---|---|---|---|---|
| ||||
Posted in reply to Hiroshi Sakurai | "Hiroshi Sakurai" <Hiroshi_member@pathlink.com> wrote in message news:d6bm67$cfr$1@digitaldaemon.com...
> Hi.
> this topic written in 2ch BBS.
> http://pc8.2ch.net/test/read.cgi/tech/1109933426/567
>
> and Japanese D language wiki bugtrack. http://f17.aaa.livedoor.jp/~labamba/?BugTrack%2F13
>
> Illegal non-ascii WYSIWYG string.
>
> ver dmd0.123
> /*code page utf8 */
> private import std.stream;
> void main()
> {
> // valid
> char[] str = "fffX,-";
> stdout.writeString(str); // valid output : E3 83 AF E3 83 AD E3 82 B9 EF BD 97
> // invalid
> char[] str2 = r"fffX,-"; // or char[] str = `fffX,-`;
> stdout.writeString(str2); // invalid output : E3 E3 E3 EF
> return;
> }
>
> thanks,
> Hiroshi Sakurai.
> sorry, my english is very poor. OTL

I'm confused. Is the problem with raw strings like r"blah" or with std.stream? Stream.writeString doesn't look at encodings, so whatever is going wrong is happening before the call to writeString. Since I don't have the proper fonts or encoding support in my news reader, I only see the raw string r"f..." with boxes in it, so I can't tell what is actually in the source file you are trying to compile. The raw string format is a sequence of bytes assumed to be in UTF-8 encoding. Is that what is in your source file?

-Ben |
May 17, 2005 Re: char code | ||||
---|---|---|---|---|
| ||||
Posted in reply to Hiroshi Sakurai | Hiroshi Sakurai wrote on Tue, 17 May 2005 02:50:47 +0000 (UTC):
> Hi.
> this topic written in 2ch BBS.
> http://pc8.2ch.net/test/read.cgi/tech/1109933426/567
>
> and Japanese D language wiki bugtrack. http://f17.aaa.livedoor.jp/~labamba/?BugTrack%2F13
>
> Illegal non-ascii WYSIWYG string.
>
> ver dmd0.123
> /*code page utf8 */
> private import std.stream;
> void main()
> {
> // valid
> char[] str = "ワロスw";
> stdout.writeString(str); // valid output : E3 83 AF E3 83 AD E3 82 B9 EF BD 97
> // invalid
> char[] str2 = r"ワロスw"; // or char[] str = `ワロスw`;
> stdout.writeString(str2); // invalid output : E3 E3 E3 EF
> return;
> }

Added to DStress as
http://dstress.kuehne.cn/run/u/unicode_08_A.d
http://dstress.kuehne.cn/run/u/unicode_08_B.d
http://dstress.kuehne.cn/run/u/unicode_08_C.d
http://dstress.kuehne.cn/run/u/unicode_08_D.d

> sorry, my english is very poor. OTL

I could understand your message, thus your English can't be that bad ;)

Thomas |
May 17, 2005 Re: char code | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ben Hinkle | >> void main()
>> {
>> // valid
>> char[] str = "f�f�fX,-";
>> stdout.writeString(str); // valid output : E3 83 AF E3 83 AD E3 82 B9 EF
>> BD 97
>> // invalid
>> char[] str2 = r"f�f�fX,-"; // or char[] str = `f�f�fX,-`;
>> stdout.writeString(str2); // invalid output : E3 E3 E3 EF
>> return;
>> }
>>
>> thanks,
>> Hiroshi Sakurai.
>> sorry, my english is very poor. OTL
>
> I'm confused. Is the problem with raw strings like r"blah" or with
> std.stream? The Stream.writeString doesn't look at encodings so whatever is
> going wrong is happening before the call to writeString. Since I don't have
> the proper fonts or encoding support in my new reader I only see the raw
> string r"f..." with boxes in them so I can't tell what is actually in the
> source file you are trying to compile.
Yes, and the boxes are U+FFFD, that is the Unicode replacement character. Whatever he typed in, it didn't make its way to us. But it is interesting to note that dmd's behaviour for the normal and the wysiwyg string is still different:
UTF8: 66 ef bf bd 66 ef bf bd 66 58 2c 2d
UTF16: 66 fffd 66 fffd 66 58 2c 2d
This is the normal string in UTF8 and UTF16 (note the U+FFFD replacement character).
UTF8: 66 ef 66 ef 66 58 2c 2d
UTF16: 66 f9af 66 58 2c 2d
And this one is the wysiwyg string, with the contents of the other one copied and pasted. Note that dmd omitted the "BF BD" after each "66 EF". That produces invalid UTF-8, as you can see from the UTF16 translation (which is simply wrong: the algorithm does not check for invalid input).
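The f9af value can be reproduced by hand. The following is only a sketch, written in Python rather than D so that it does not depend on any compiler version; `decode_unchecked` is a hypothetical stand-in for a converter that trusts its input, not the code dmd or the conversion tool actually uses. It folds the low six bits of whatever bytes follow a lead byte into the result, which is enough to turn "EF 66 EF" into f9af, while a strict decoder rejects the same bytes.

```python
def decode_unchecked(data):
    """Decode UTF-8 lead bytes without validating the bytes that follow.

    Hypothetical stand-in for a converter that trusts its input; not dmd's code.
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead >> 5 == 0b110:        # 110xxxxx: 2-byte sequence
            length, value = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:     # 1110xxxx: 3-byte sequence
            length, value = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:    # 11110xxx: 4-byte sequence
            length, value = 4, lead & 0x07
        else:                         # ASCII, or a stray byte passed through
            length, value = 1, lead
        # Blindly fold in the low 6 bits of whatever bytes come next.
        for following in data[i + 1:i + length]:
            value = (value << 6) | (following & 0x3F)
        code_points.append(value)
        i += length
    return code_points


def is_valid_utf8(data):
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The two byte dumps quoted above.
NORMAL = bytes.fromhex("66 ef bf bd 66 ef bf bd 66 58 2c 2d")
WYSIWYG = bytes.fromhex("66 ef 66 ef 66 58 2c 2d")

for name, data in (("normal", NORMAL), ("wysiwyg", WYSIWYG)):
    units = " ".join(format(cp, "x") for cp in decode_unchecked(data))
    print(name, "valid:", is_valid_utf8(data), "unchecked:", units)
```

Under these assumptions the normal string decodes to 66 fffd 66 fffd 66 58 2c 2d and passes strict validation, while the wysiwyg bytes fail validation and the unchecked pass yields 66 f9af 66 58 2c 2d, matching the dumps above.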
Hmm, after some more thinking i found that the whole f?f?fX,- sequence is wrong, it just does not match the "valid output" he denotes above. He wants to input the following:
UTF8: e3 83 af e3 83 ad e3 82 b9 ef bd 97
UTF16: 30ef 30ed 30b9 ff57
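These bytes can be checked without any special input device. The sketch below (Python, used only for illustration) decodes the expected bytes strictly and writes the same text with `\u` escapes. It also tries one possible reading of the failure: `drop_continuation_bytes` is a hypothetical model that keeps only bytes outside 0x80..0xBF. That model is an assumption chosen because it fits the reported dumps, not a diagnosis of dmd's lexer.

```python
EXPECTED = bytes.fromhex("e3 83 af e3 83 ad e3 82 b9 ef bd 97")


def code_points(data):
    """Strictly decode UTF-8 and list the code points as hex strings."""
    return [format(ord(ch), "04x") for ch in data.decode("utf-8")]


def drop_continuation_bytes(data):
    """Keep only bytes outside 0x80..0xBF (assumed failure model, not dmd's code)."""
    return bytes(b for b in data if not 0x80 <= b <= 0xBF)


def hex_bytes(data):
    """Format bytes as space-separated lowercase hex."""
    return " ".join(format(b, "02x") for b in data)


# The same text written with escapes, so no input method is needed.
assert "\u30ef\u30ed\u30b9\uff57".encode("utf-8") == EXPECTED

print(code_points(EXPECTED))                          # ['30ef', '30ed', '30b9', 'ff57']
print(hex_bytes(drop_continuation_bytes(EXPECTED)))   # e3 e3 e3 ef

# The copied string containing U+FFFD shows the same pattern.
PASTED = bytes.fromhex("66 ef bf bd 66 ef bf bd 66 58 2c 2d")
print(hex_bytes(drop_continuation_bytes(PASTED)))     # 66 ef 66 ef 66 58 2c 2d
```

Both results agree with what was reported: E3 E3 E3 EF for the original wysiwyg string, and 66 ef 66 ef 66 58 2c 2d for the pasted one.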
Does anybody know how to input these characters with Linux? I don't have any input device for that :)
Or easier, Hiroshi, could you please send your input file over the list?
Ciao
uwe
|
May 17, 2005 Re: char code | ||||
---|---|---|---|---|
| ||||
Posted in reply to Uwe Salomon | > Does anybody know how to input these characters with Linux? I don't have any input device for that :)
> Or easier, Hiroshi, could you please send your input file over the list?
Hm, as i see now, Thomas already accomplished that (how?). Please ignore my posting. :(
Ciao
uwe
|
May 17, 2005 Re: char code | ||||
---|---|---|---|---|
| ||||
Posted in reply to Uwe Salomon | Uwe Salomon wrote on Tue, 17 May 2005 19:16:21 +0200:
>> Does anybody know how to input these characters with Linux? I don't have any input device for that :)
>> Or easier, Hiroshi, could you please send your input file over the list?
>
> Hm, as i see now, Thomas already accomplished that (how?).

Where is the pröbļém with Unicode on Linux 吗 ?

Thomas |
May 17, 2005 Re: char code | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thomas Kuehne | >> Hm, as i see now, Thomas already accomplished that (how?).
>
> Where is the pröbļém with Unicode on Linux 吗 ?

Yes, put salt on the open wound! :-P
I don't have problems with Unicode, but i don't know a program/method to insert arbitrary Unicode characters into text... Thus i can only insert the characters that are on my keyboard.. (@ł€¶ŧ←↓→øþ¨æßðđŋjĸł etc.)

uwe |
May 18, 2005 Re: char code [OT] | ||||
---|---|---|---|---|
| ||||
Posted in reply to Uwe Salomon | Uwe Salomon wrote:
>>> Hm, as i see now, Thomas already accomplished that (how?).
>>
>> Where is the pröbļém with Unicode on Linux 吗 ?
>
> Yes, put salt on the open wound! :-P
> I don't have problems with Unicode, but i don't know a program/method to insert arbitrary Unicode characters into text... Thus i can only insert the characters that are on my keyboard.. (@ł€¶ŧ←↓→øþ¨æßðđŋjĸł etc.)

There are input modules for X11 and gtk that support quite a range of scripts. Last time I checked, qt/KDE didn't have any way to add native input modules.

If you are desperate you might try http://yudit.org/ (X-based) or http://sourceforge.net/projects/jgim/ (Java based) to input "simple" languages.

Thomas |