January 23, 2007 stream.readLine

The implementation of stream.readLine() treats char.init as EOF, which is not right because char.init is 255 (which is ÿ in Cyrillic). I believe EOF should be 0.

January 23, 2007 Re: stream.readLine

Posted in reply to bobef

bobef wrote:
> The implementation of stream.readLine() treats char.init as EOF, which is not right because char.init is 255 (which is ÿ in Cyrillic). I believe EOF should be 0.

No, char.init is 255, which is an invalid byte in UTF-8 data.
Codepoint 255 *is* ÿ, IIRC, but char doesn't store codepoints. It stores UTF-8 bytes (code units?).
Forgive me if I got the terminology wrong.
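
The distinction between a codepoint and a UTF-8 code unit can be checked directly. The following is a minimal sketch in present-day D, not code from the thread; the helper names `utf8Units` and `isValidUtf8` are hypothetical, and only `std.utf.encode`, `std.utf.validate` and `UTFException` come from Phobos.

```d
import std.utf : encode, validate, UTFException;

/// Hypothetical helper: encodes one codepoint as UTF-8 and returns its code units.
char[] utf8Units(dchar c)
{
    char[4] buf;
    immutable len = encode(buf, c); // number of code units written
    return buf[0 .. len].dup;
}

/// Hypothetical helper: reports whether `s` is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    // A char is a single UTF-8 code unit, and its default value is 0xFF.
    static assert(char.init == 0xFF);

    // The codepoint U+00FF (ÿ) needs two code units in UTF-8.
    assert(utf8Units('\u00FF') == [cast(char) 0xC3, cast(char) 0xBF]);

    // A lone 0xFF code unit is not valid UTF-8.
    assert(!isValidUtf8([char.init]));
}
```

Since 0xFF never occurs in well-formed UTF-8, it cannot collide with real UTF-8 text, whereas in a single-byte encoding the byte 0xFF can be an ordinary character.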

January 23, 2007 Re: stream.readLine

Posted in reply to Frits van Bommel

Then it is impossible to use the readLine() function to read non-UTF-8 streams? If so, this sucks ass, because I have to read the stream in order to convert it to UTF-8; obviously I can't force every stream out there to be UTF-8 just because D likes it :)

January 23, 2007 Re: stream.readLine

Posted in reply to bobef

bobef wrote:
> Then it is impossible to use the readLine() function to read non-utf8 streams?

InputStream.readLine (which I presume is the one you mean) returns a UTF-8 string. It doesn't say in what format the data is read. If someone wants to implement it to read a non-UTF string from somewhere, convert it to UTF-8 and return it, that's a perfectly valid implementation.

> If it is so this sucks ass, because I have to read the stream to convert it to utf8, because obviously I can't force any stream out there to be utf8 just because D likes it :)

A conversion stream may not be so hard to implement. Just create an object implementing InputStream and pass another InputStream to its constructor. Or you can even inherit it directly from std.stream.File, forward the constructors, and only override the readLine* functions.

Then if you're reading a file formatted in some ASCII + extended codepage format, you just need a lookup table (or conversion function) to convert the upper 128 byte values (0x80 to 0xFF) to the corresponding Unicode codepoints and use std.utf.encode. For Latin-1 data it's even simpler: every Latin-1 byte value equals its Unicode codepoint, so just pass it straight to std.utf.encode. You'll probably want to use the read(inout ubyte) method to read such a file.

The process for other text formats is probably similar, perhaps using other read() overloads to read it (for multi-byte encodings).

(Warning: I've never actually implemented a Stream, so the above may well be riddled with errors and misinformation :) )
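
The conversion step described above can be sketched as follows. This is a minimal sketch in present-day D that works on bytes already read into memory; it deliberately leaves out the std.stream wrapper. The function names `latin1ToUtf8` and `codepageToUtf8` and the 128-entry `upper` table are assumptions made for illustration; only `std.utf.encode` is a Phobos function.

```d
import std.utf : encode;

/// Hypothetical helper: converts Latin-1 (ISO-8859-1) bytes to UTF-8.
/// Each Latin-1 byte value equals its Unicode codepoint, so no table is needed.
string latin1ToUtf8(const(ubyte)[] input)
{
    char[] result;
    foreach (b; input)
    {
        encode(result, cast(dchar) b); // appends one or two code units
    }
    return result.idup;
}

/// Hypothetical helper for an "ASCII + extended codepage" format.
/// `upper[i]` is assumed to hold the codepoint for byte value 0x80 + i.
string codepageToUtf8(const(ubyte)[] input, const dchar[128] upper)
{
    char[] result;
    foreach (b; input)
    {
        // Bytes below 0x80 are plain ASCII; the rest go through the table.
        immutable dchar c = b < 0x80 ? cast(dchar) b : upper[b - 0x80];
        encode(result, c);
    }
    return result.idup;
}

unittest
{
    // Latin-1: byte 0xFF is ÿ (U+00FF).
    assert(latin1ToUtf8([0x41, 0xFF]) == "A\u00FF");

    // Illustrative table with a single real entry: Windows-1251 maps 0xFF to я (U+044F).
    dchar[128] table = '\uFFFD'; // all other entries are placeholders
    table[0xFF - 0x80] = '\u044F';
    assert(codepageToUtf8([0x41, 0xFF], table) == "A\u044F");
}
```

A stream wrapper along the lines suggested in the post would read raw bytes up to the line terminator and hand them to a function like one of these before returning the line.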