Thread overview
std.net.curl get webpage asia font issue
Jun 07, 2012
Sam Hu
Jun 07, 2012
Kevin
Jun 08, 2012
Sam Hu
Jun 07, 2012
Dmitry Olshansky
Jun 08, 2012
Sam Hu
Jun 08, 2012
Dmitry Olshansky
June 07, 2012
Greeting!

The documentation on this website provides an example of how to get webpage information with std.net.curl. It is quite straightforward:

[code]
import std.net.curl, std.stdio;

void main(){

// Return a string containing the content specified by an URL
string content = get("dlang.org");

writefln("%s\n",content);

readln;
}
[/code]

When I change get("dlang.org") to get("yahoo.com"), everything works fine; but when I change it to get("yahoo.com.cn"), I get a runtime error saying bad gbk encoding bla...

So my very simple question is: how do I retrieve information from a webpage that could possibly contain Asian fonts (like Chinese fonts)?

Thanks for your help in advance.

Regards,
Sam
June 07, 2012
On 07/06/12 02:57, Sam Hu wrote:
> string content = get("dlang.org");
> writefln("%s\n",content);
>
> So my very simple question is: how do I retrieve information from a webpage that could possibly contain Asian fonts (like Chinese fonts)?

I'm not really sure but try:
wstring content = get("dlang.org");

Also make sure your terminal is set up for Unicode.
June 07, 2012
On 07.06.2012 10:57, Sam Hu wrote:
> Greeting!
>
> The documentation on this website provides an example of how to get webpage
> information with std.net.curl. It is quite straightforward:
>
> [code]
> import std.net.curl, std.stdio;
>
> void main(){
>
> // Return a string containing the content specified by an URL
> string content = get("dlang.org");

It's simple: in this line you "convert" whatever the site content was to Unicode. The problem is that this "convert" is either broken or simply a cast, whereas it should re-encode the source as Unicode. So the workaround is to get the content as an array of bytes and decode it yourself.

>
> writefln("%s\n",content);
>
> readln;
> }
> [/code]
>
> When I change get("dlang.org") to get("yahoo.com"), everything works
> fine; but when I change it to get("yahoo.com.cn"), I get a runtime error
> saying bad gbk encoding bla...
>
> So my very simple question is: how do I retrieve information from a webpage
> that could possibly contain Asian fonts (like Chinese fonts)?
>
I think it's not a "font" problem but an encoding problem.

> Thanks for your help in advance.
>
> Regards,
> Sam


-- 
Dmitry Olshansky
June 08, 2012
On Thursday, 7 June 2012 at 10:43:32 UTC, Dmitry Olshansky wrote:
>> string content = get("dlang.org");
>
> It's simple: in this line you "convert" whatever the site content was to Unicode. The problem is that this "convert" is either broken or simply a cast, whereas it should re-encode the source as Unicode. So the workaround is to get the content as an array of bytes and decode it yourself.
>

Thanks. May I know how? A code snippet would be appreciated.
June 08, 2012
On Thursday, 7 June 2012 at 10:38:53 UTC, Kevin wrote:
> On 07/06/12 02:57, Sam Hu wrote:
>> string content = get("dlang.org");
>> writefln("%s\n",content);
>>
>> So my very simple question is: how do I retrieve information from a
>> webpage that could possibly contain Asian fonts (like Chinese fonts)?
>
> I'm not really sure but try:
> wstring content = get("dlang.org");
>
> Also make sure your terminal is set up for Unicode.

Sorry, no, it does not work. I tried to print the content to a DFL
TextBox control, but the issue is still the same.
June 08, 2012
On 08.06.2012 5:03, Sam Hu wrote:
> On Thursday, 7 June 2012 at 10:43:32 UTC, Dmitry Olshansky wrote:
>>> string content = get("dlang.org");
>>
>> It's simple: in this line you "convert" whatever the site content was to
>> Unicode. The problem is that this "convert" is either broken or simply a
>> cast, whereas it should re-encode the source as Unicode. So the workaround
>> is to get the content as an array of bytes and decode it yourself.
>>
>
> Thanks. May I know how? A code snippet would be appreciated.

Seems like
ubyte[] data = get!(AutoProtocol, ubyte)("your-site.cn");
// should work; sorry, I'm on Windows and curl doesn't work here for me
Then you work with your data, decode it and so on. At least this:
writeln(data); // will not throw, but will print the bytes

-- 
Dmitry Olshansky
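
The "get the bytes and decode them yourself" approach from the last reply could look roughly like the sketch below. It is only an illustration under stated assumptions: decodeBody is a hypothetical helper name, and Latin-1 stands in as the fallback encoding because std.encoding, as far as I know, does not ship a GBK codec. For a GBK page such as yahoo.com.cn, the fallback branch would have to call an external converter (iconv, for example) instead.

```d
import std.encoding : Latin1String, transcode;
import std.utf : UTFException, validate;

/// Hypothetical helper (not part of Phobos): turns raw page bytes into a
/// UTF-8 string. Bytes that are already valid UTF-8 are returned as-is;
/// anything else is assumed to be Latin-1 and re-encoded.
string decodeBody(const(ubyte)[] raw)
{
    try
    {
        auto text = cast(const(char)[]) raw;
        validate(text); // throws UTFException on malformed UTF-8
        return text.idup;
    }
    catch (UTFException)
    {
        // Assumption: the page is Latin-1. A GBK page needs a different
        // decoder at this point.
        string result;
        transcode(cast(Latin1String) raw.idup, result);
        return result;
    }
}
```

The input would come from the call shown in the last reply, get!(AutoProtocol, ubyte)(url), so that std.net.curl hands back the bytes untouched instead of trying to treat them as text.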