August 08, 2016 encoding ISO-8859-1 to UTF-8 in std.net.curl | ||||
---|---|---|---|---|
| ||||
import std.stdio;
import std.net.curl;

void main()
{
    string url = "www.site.ru/xml/api.asp";
    string data =
"<?xml version='1.0' encoding='UTF-8'?>
<request>
    <category>
        <id>59538</id>
    </category>
    ...
</request>";

    auto http = HTTP();
    http.clearRequestHeaders();
    http.addRequestHeader("Content-Type", "application/xml");
    //Accept-Charset: utf-8
    http.addRequestHeader("Accept-Charset", "utf-8");

    //ISO-8859-1
    //http://www.artlebedev.ru/tools/decoder/
    //ISO-8859-1 → UTF-8
    auto content = post(url, "data", http);
    // content in ISO-8859-1 to UTF-8 encoding but I lose
    //the Cyrillic "<?xml version='1.0' encoding='UTF-8'?>отсутствует или неверно задан параметр"
    // I get it "<?xml version='1.0' encoding='UTF-8'?>оÑÑÑÑÑÑвÑÐµÑ Ð¸Ð»Ð¸ невеÑно задан паÑамеÑÑ"
    // How do I change the encoding to UTF-8 in response

    string s = cast(immutable char[])content;
    auto f = File("output.txt","w"); // output.txt file in UTF-8;
    f.write(s);
    f.close;
}
|
August 08, 2016 Re: encoding ISO-8859-1 to UTF-8 in std.net.curl | ||||
---|---|---|---|---|
| ||||
Posted in reply to Alexsej | On 08/08/2016 09:57 PM, Alexsej wrote:
> // content in ISO-8859-1 to UTF-8 encoding but I lose
> //the Cyrillic "<?xml version='1.0'
> encoding='UTF-8'?>отсутствует или неверно задан параметр"
> // I get it "<?xml version='1.0'
> encoding='UTF-8'?>оÑÑÑÑÑÑвÑÐµÑ Ð¸Ð»Ð¸ невеÑно
> задан паÑамеÑÑ"
> // How do I change the encoding to UTF-8 in response
>
>
> string s = cast(immutable char[])content;
> auto f = File("output.txt","w"); // output.txt file in UTF-8;
> f.write(s);
The server doesn't include the encoding in the Content-Type header, right? So curl assumes the default, which is ISO 8859-1. It interprets the data as that and transcodes to UTF-8. The result is garbage, of course.
I don't see a way to change the default encoding. Maybe that should be added.
Until then you can reverse the wrong transcoding:
----
import std.encoding: Latin1String, transcode;
Latin1String pseudo_latin1;
transcode(content.idup, pseudo_latin1);
string s = cast(string) pseudo_latin1;
----
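A minimal, self-contained sketch of why this works, with no network access involved. The helper names `misdecodeAsLatin1` and `undoLatin1Misdecoding` are made up for illustration; the first one only simulates what seems to happen to the response (UTF-8 bytes read as Latin-1 and transcoded to UTF-8), and the second one repeats the workaround above. It assumes a Phobos version in which `transcode` accepts these argument types.

```d
import std.encoding : Latin1String, transcode;
import std.stdio : writeln;

/// Hypothetical helper: simulates the wrong decoding step by treating
/// every UTF-8 byte as one Latin-1 character and transcoding to UTF-8.
string misdecodeAsLatin1(string utf8)
{
    // Reinterpret the bytes as Latin-1 without copying or converting.
    auto asLatin1 = cast(Latin1String) utf8;
    string garbled;
    transcode(asLatin1, garbled);
    return garbled;
}

/// Hypothetical helper: reverses the wrong step, as in the workaround.
string undoLatin1Misdecoding(string garbled)
{
    Latin1String pseudoLatin1;
    // Every code point in the garbled text is below U+0100, so it maps
    // back to exactly one byte.
    transcode(garbled, pseudoLatin1);
    // Those bytes are the original UTF-8 data.
    return cast(string) pseudoLatin1;
}

void main()
{
    string original = "параметр";
    string garbled = misdecodeAsLatin1(original);

    assert(garbled != original);
    assert(undoLatin1Misdecoding(garbled) == original);

    writeln(undoLatin1Misdecoding(garbled));
}
```

The reversal is only lossless because the garbled text came from a byte-for-byte Latin-1 decoding; applying it to text that was decoded correctly would replace every character outside Latin-1.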
Tiny rant:
Why on earth does transcode only accept immutable characters for input? Every other post here uncovers some bug/shortcoming :(
|
August 08, 2016 Re: encoding ISO-8859-1 to UTF-8 in std.net.curl | ||||
---|---|---|---|---|
| ||||
Posted in reply to ag0aep6g | On Monday, 8 August 2016 at 21:11:26 UTC, ag0aep6g wrote:
> The server doesn't include the encoding in the Content-Type header, right?
//header from server
server: nginx
date: Mon, 08 Aug 2016 22:02:15 GMT
content-type: text/xml; Charset=utf-8
content-length: 204
connection: keep-alive
vary: Accept-Encoding
cache-control: private
expires: Mon, 08 Aug 2016 22:02:15 GMT
set-cookie: ASPSESSIONIDSSCCDASA=KIAPMCMDMPEDHPBJNMGFHMEB; path=/
x-powered-by: ASP.NET
|
August 09, 2016 Re: encoding ISO-8859-1 to UTF-8 in std.net.curl | ||||
---|---|---|---|---|
| ||||
Posted in reply to ag0aep6g | On 08/08/2016 11:11 PM, ag0aep6g wrote:
> Why on earth does transcode only accept immutable characters for input?
https://github.com/dlang/phobos/pull/4722
|
August 08, 2016 Re: encoding ISO-8859-1 to UTF-8 in std.net.curl | ||||
---|---|---|---|---|
| ||||
Posted in reply to ag0aep6g | On Monday, 8 August 2016 at 21:11:26 UTC, ag0aep6g wrote:
> Until then you can reverse the wrong transcoding:
>
> ----
> import std.encoding: Latin1String, transcode;
> Latin1String pseudo_latin1;
> transcode(content.idup, pseudo_latin1);
> string s = cast(string) pseudo_latin1;
> ----
Thanks, it works.
|
August 09, 2016 Re: encoding ISO-8859-1 to UTF-8 in std.net.curl | ||||
---|---|---|---|---|
| ||||
Posted in reply to Alexsej | On 08/09/2016 12:05 AM, Alexsej wrote:
> //header from server
> content-type: text/xml; Charset=utf-8
Looks like std.net.curl doesn't handle "Charset" correctly. It only works with lowercase "charset".
https://github.com/dlang/phobos/pull/4723
|
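To illustrate the kind of mismatch described above, here is a rough sketch of reading the charset parameter from a Content-Type value with a case-insensitive comparison of the parameter name. This is not the Phobos code and not the content of the linked pull request; `charsetOf` is a hypothetical helper written only for this example.

```d
import std.algorithm.searching : findSplit;
import std.array : split;
import std.string : strip;
import std.uni : icmp;

/// Hypothetical helper: returns the value of the charset parameter in a
/// Content-Type header value, or null if there is none.
string charsetOf(string contentType)
{
    foreach (param; contentType.split(';'))
    {
        auto kv = param.findSplit("=");
        // kv[1] is empty when the part contains no '='.
        // icmp compares case-insensitively, so "Charset" and "charset"
        // are treated alike.
        if (kv[1].length && icmp(kv[0].strip, "charset") == 0)
            return kv[2].strip;
    }
    return null;
}

void main()
{
    assert(charsetOf("text/xml; Charset=utf-8") == "utf-8");
    assert(charsetOf("text/xml; charset=utf-8") == "utf-8");
    assert(charsetOf("text/xml").length == 0);
}
```

A comparison that matches only the exact string "charset" would miss the header sent by this server and fall back to the default encoding, which matches the garbled output in the original post.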