December 22, 2006 print non-ASCII/UTF-8 string | ||||
---|---|---|---|---|
| ||||
Let's say that file q.txt contains some characters bigger than 0x7f (for example, from the windows-1252 encoding). In that case the following snippet:
***
import std.stream;
void main() {
    Stream f = new BufferedFile("q.txt");
    foreach (char[] l; f) {
        writefln(l);
    }
}
***
will fail with 'Error: 4invalid UTF-8 sequence' because D's strings are in UTF-8, right?
My question is: is there any way to print out non-UTF-8 data in exactly the same encoding (which may be unknown) as in the original file?
|
December 22, 2006 Re: print non-ASCII/UTF-8 string | ||||
---|---|---|---|---|
| ||||
Posted in reply to Egor Starostin | Egor Starostin wrote:
> Let's say that file q.txt contains some characters bigger than 0x7f (for
> example, from windows-1252 encoding).
> In such case the following snippet:
> ***
> import std.stream;
> void main() {
>     Stream f = new BufferedFile("q.txt");
>     foreach (char[] l; f) {
>         writefln(l);
>     }
> }
> ***
> will fail with 'Error: 4invalid UTF-8 sequence' because D's strings are in
> UTF-8, right?
>
> My question is: is there any way to print out non-UTF-8 data exactly in the
> same encoding (which may be unknown) as in original file?
It's funny that you should bring this up now. I had a thread over in d.D.learn regarding this very thing. The following should help you get started:
char[] Latin1ToUTF8(char[] value){
    char[] result;
    for(uint i=0; i<value.length; i++){
        char ch = value[i];
        if(ch < 0x80){
            result ~= ch;
        }
        else{
            result ~= 0xC0 | (ch >> 6);
            result ~= 0x80 | (ch & 0x3F);
        }
    }
    return result;
}
(this could be optimized to use fewer concatenations, but I think it gets the point across)
I have no clue how to work from other code pages, as I gather the transform would be far less straightforward than it is for Latin-1. Also, I have no idea how to *detect* what code page is being used based on the input set. I don't even know if that's possible; like you, I'd love to hear about it should someone else know of an algorithm.
--
- EricAnderton at yahoo
|
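As a rough illustration of the "fewer concatenations" remark above, here is a minimal sketch of the same Latin-1 to UTF-8 mapping that reserves the output buffer up front. It assumes a current D2 compiler with Phobos (`std.array.appender`); the name `latin1ToUtf8` and the `const(ubyte)[]` parameter are hypothetical choices for this example, not part of the original post. Note that it covers ISO-8859-1 only: windows-1252 assigns printable characters to 0x80-0x9F, and those bytes would come out as C1 control code points here.

```d
import std.array : appender;

/// Sketch: converts ISO-8859-1 (Latin-1) bytes to a UTF-8 string.
/// The name and signature are assumptions made for this example.
string latin1ToUtf8(const(ubyte)[] value)
{
    auto result = appender!string();
    // Worst case: every input byte expands to two UTF-8 bytes.
    result.reserve(value.length * 2);
    foreach (b; value)
    {
        if (b < 0x80)
        {
            // ASCII bytes are identical in Latin-1 and UTF-8.
            result.put(cast(char) b);
        }
        else
        {
            // Code points 0x80..0xFF need a two-byte UTF-8 sequence.
            result.put(cast(char)(0xC0 | (b >> 6)));
            result.put(cast(char)(0x80 | (b & 0x3F)));
        }
    }
    return result.data;
}
```

Taking raw bytes rather than `char[]` keeps not-yet-converted data out of a type that the rest of the program may assume is valid UTF-8.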
December 22, 2006 Re: print non-ASCII/UTF-8 string | ||||
---|---|---|---|---|
| ||||
Posted in reply to Pragma | > > My question is: is there any way to print out non-UTF-8 data exactly in the same encoding (which may be unknown) as in original file?
> It's funny that you should bring this up now. I had a thread over in
> d.D.learn regarding this very thing. The following should help you get
> started:
> char[] Latin1ToUTF8(char[] value){
That's not my case, I think.
I don't need to convert to UTF-8. I just need to print the raw string exactly as it appears in the original file.
|
December 22, 2006 Re: print non-ASCII/UTF-8 string | ||||
---|---|---|---|---|
| ||||
Posted in reply to Egor Starostin | "Egor Starostin" <egorst@gmail.com> wrote in message news:emgvkj$1ll6$1@digitaldaemon.com...
> I don't need to convert to UTF-8. I just need to raw print exactly the
> same string
> as in original file.
Hm. This might be one case where printf is actually useful:
foreach(l; f)
    printf("%s\n", toStringz(l));
|
December 22, 2006 Re: print non-ASCII/UTF-8 string | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jarrett Billingsley | Jarrett Billingsley wrote on 2006-12-22:
> "Egor Starostin" <egorst@gmail.com> wrote in message news:emgvkj$1ll6$1@digitaldaemon.com...
>
>> I don't need to convert to UTF-8. I just need to raw print exactly the
>> same string
>> as in original file.
>
> Hm. This might be one case where printf is actually useful:
>
> foreach(l; f)
> printf("%s\n", toStringz(l));
This should work more reliably and consume fewer resources:
printf("%.*s\n", l.length, l.ptr);
Thomas
|
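For readers on a current compiler, the `%.*s` approach above might look like the following sketch. It assumes D2, where `printf` lives in `core.stdc.stdio`; the wrapper name `printRawLine` is hypothetical. The point it illustrates is that the precision argument gives `printf` the length directly, so no zero-terminated copy (as made by `toStringz`) is needed.

```d
import core.stdc.stdio : printf;

/// Sketch: prints the bytes of `line` plus a newline without any UTF-8 checks.
/// Returns printf's result (number of bytes written, negative on error).
int printRawLine(const(char)[] line)
{
    // "%.*s" reads its precision as a C int, so the length is cast explicitly;
    // on 64-bit targets `line.length` is a 64-bit size_t.
    return printf("%.*s\n", cast(int) line.length, line.ptr);
}
```

As with any `%s` conversion, output still stops at an embedded zero byte, so this suits text lines rather than arbitrary binary data.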
December 22, 2006 Re: print non-ASCII/UTF-8 string | ||||
---|---|---|---|---|
| ||||
Posted in reply to Thomas Kuehne | Thomas Kuehne wrote:
> Jarrett Billingsley wrote on 2006-12-22:
>> "Egor Starostin" <egorst@gmail.com> wrote in message news:emgvkj$1ll6$1@digitaldaemon.com...
>>
>>> I don't need to convert to UTF-8. I just need to raw print exactly the same string
>>> as in original file.
>> Hm. This might be one case where printf is actually useful:
>>
>> foreach(l; f)
>> printf("%s\n", toStringz(l));
>
> This should work more reliably and consume fewer resources:
> printf("%.*s\n", l.length, l.ptr);
>
> Thomas
This works as well, but only because the array's length and pointer happen to be passed in the order that printf expects:
printf("%.*s\n", l);
|
December 23, 2006 Re: print non-ASCII/UTF-8 string | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jarrett Billingsley | Jarrett Billingsley wrote:
> "Egor Starostin" <egorst@gmail.com> wrote in message news:emgvkj$1ll6$1@digitaldaemon.com...
>
>> I don't need to convert to UTF-8. I just need to raw print exactly the same string
>> as in original file.
>
> Hm. This might be one case where printf is actually useful:
>
> foreach(l; f)
>     printf("%s\n", toStringz(l));
>
Or rather:
dout.write(cast(ubyte[]) line);
?
--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
|
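The idea of writing the data as untyped bytes can be sketched with today's Phobos roughly as follows. This assumes D2, where `std.stream` is no longer part of the standard library and `std.stdio.File` offers `byChunk` and `rawWrite`; the helper name `copyRaw` and the 4096-byte chunk size are arbitrary choices for this example.

```d
import std.stdio : File, stdout;

/// Sketch: copies bytes from `input` to `output` unchanged.
/// Nothing is decoded, so the source encoding never needs to be known.
void copyRaw(File input, File output)
{
    // byChunk yields ubyte[] slices; rawWrite emits them without translation.
    foreach (ubyte[] chunk; input.byChunk(4096))
        output.rawWrite(chunk);
}
```

Under these assumptions, `copyRaw(File("q.txt", "rb"), stdout);` would print the file from the original question byte for byte, whatever code page it was written in.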