January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
There are two ways:

Change the global variable for the module:

    dhttpclient.DefaultHeaders = dhttpclient.IEHeaders; // or your own

This will change the headers for all clients.

---

Change the instance headers:

    // there are more headers than just User-Agent and you have to copy them
    string[string] my_headers = dhttpclient.FFHeaders;
    my_headers["User-Agent"] = "My own spider!";

    HTTPClient navegador = new HTTPClient();
    navegador.setClientHeaders(my_headers);

---

Headers are defined as:

    public enum string[string] FFHeaders = [
        "User-Agent" : "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
        "Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain",
        "Accept-Language" : "cs,en-us;q=0.7,en;q=0.3",
        "Accept-Charset" : "utf-8",
        "Keep-Alive" : "300",
        "Connection" : "keep-alive"
    ];

    /// Headers from firefox 3.6.13 on Linux
    public enum string[string] LFFHeaders = [
        "User-Agent" : "Mozilla/5.0 (X11; U; Linux i686; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
        "Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain",
        "Accept-Language" : "cs,en-us;q=0.7,en;q=0.3",
        "Accept-Charset" : "utf-8",
        "Keep-Alive" : "300",
        "Connection" : "keep-alive"
    ];

Accept, Accept-Charset, Keep-Alive and Connection are important, and if you redefine them, the module may stop working with some servers.

On 20.1.2012 15:56, Xan xan wrote:
> On the other hand, I see dhttpclient identifies as
> "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3)
> Gecko/20100401 Firefox/3.6.13"
>
> How can I change that?
|
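The copy-then-override pattern from the post above can be tried without dhttpclient installed. This is only a sketch: `BaseHeaders` is a hypothetical, shortened stand-in for `dhttpclient.FFHeaders`, and `withUserAgent` is a helper name invented for this illustration, not part of the library.

```d
import std.stdio;

// Hypothetical stand-in for dhttpclient.FFHeaders, trimmed to three entries.
enum string[string] BaseHeaders = [
    "User-Agent" : "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
    "Accept-Charset" : "utf-8",
    "Connection" : "keep-alive"
];

// Returns a fresh header table in which only the User-Agent is replaced.
string[string] withUserAgent(string userAgent)
{
    // Using an enum associative array creates a new array at this point,
    // so the assignment below does not affect other users of BaseHeaders.
    string[string] headers = BaseHeaders;
    headers["User-Agent"] = userAgent;
    return headers;
}

void main()
{
    auto headers = withUserAgent("My own spider!");
    writeln(headers["User-Agent"]);
    writeln(headers["Connection"]);
}
```

With the real library, the resulting table would presumably be handed to `setClientHeaders` as shown in the post; the other entries stay untouched, which matters given the warning about Accept, Accept-Charset, Keep-Alive and Connection.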
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
The first version was buggy. I've updated the code on GitHub, so if you want to try it, pull the new version (git pull). I've also added a new example in examples/user_agent_change.d
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
Thank you very much, Bystroushaak.
I see you limit httpclient to XML/HTML documents. Is it possible to download any kind of file (and not only HTML or XML)? Something like:

    HTTPClient navegador = new HTTPClient();
    auto file = navegador.download("http://www.google.com/myfile.pdf");

Thanks a lot,
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
It is unlimited, you just have to cast the output to ubyte[]:

    std.file.write("logo3w.png", cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png"));
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
If you want to know what type of file you just downloaded, look at .getResponseHeaders():

    std.file.write("logo3w.png", cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png"));
    writeln(cl.getResponseHeaders()["Content-Type"]);

Which will print in this case: image/png

Here is the full example: https://github.com/Bystroushaak/DHTTPClient/blob/master/examples/download_binary_file.d
|
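The save-to-disk step in the post above can be checked on its own, without any network access. The sketch below uses only Phobos; the sample bytes (the PNG file signature), the file name `sample.bin` and the helper `saveAndReload` are assumptions made for this illustration and stand in for the result of `cl.get(...)`.

```d
import std.file : read, remove, write;
import std.stdio : writeln;

// Writes raw bytes to `path` and returns what is read back from disk.
ubyte[] saveAndReload(string path, const(ubyte)[] data)
{
    write(path, data);                // std.file.write accepts any byte buffer
    return cast(ubyte[]) read(path);  // std.file.read returns void[]
}

void main()
{
    // First eight bytes of a PNG file, used here as sample binary data.
    immutable ubyte[] pngSignature = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A];

    auto reloaded = saveAndReload("sample.bin", pngSignature);
    writeln(reloaded == pngSignature); // compares the bytes element by element
    remove("sample.bin");              // clean up the temporary file
}
```

Since `std.file.write` stores the buffer as-is, the file type does not matter at this step; the Content-Type header is only needed if the program has to decide what the bytes mean.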
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
Before and now, I get this error:

$ ./spider http://static.arxiv.org/pdf/1109.4897.pdf
[Excepció: std.conv.ConvException@/usr/include/d2/4.6/std/conv.d(1640): Can't convert value `HTT' of type string to type uint]

The code:

    // D 2.0
    // gdmd-4.6 <file> => produces a file with the same name and .o
    // Uses https://github.com/Bystroushaak/DHTTPClient
    import std.stdio, std.string, std.conv, std.stream;
    import std.socket, std.socketstream;
    import dhttpclient;

    int main(string [] args)
    {
        if (args.length < 2) {
            writeln("Usage:");
            writeln(" ./spider {<url1>, <url2>, ...}");
            return 0;
        }
        else {
            try {
                string[string] capcalera = dhttpclient.FFHeaders;
                //capcalera["User-Agent"] = "arachnida yottiuma";
                HTTPClient navegador = new HTTPClient();
                navegador.setClientHeaders(capcalera);

                foreach (a; args[1..$]) {
                    writeln("[Contingut: ", cast(ubyte[]) navegador.get(a), "]");
                }
            }
            catch (Exception e) {
                writeln("[Excepció: ", e, "]");
            }
            return 0;
        }
    }

What happens?

2012/1/20 Bystroushaak <bystrousak@kitakitsune.org>:
> It is unlimited, you just have to cast output to ubyte[]:
>
> std.file.write("logo3w.png", cast(ubyte[])
> cl.get("http://www.google.cz/images/srpr/logo3w.png"));
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
Thanks, but what is failing, then? I download the content as a collection of bytes. It should not matter whether the file is a PDF, a PNG or anything else if I download it as bytes, should it?
Thanks,
2012/1/20 Bystroushaak <bystrousak@kitakitsune.org>:
> If you want to know what type of file you just downloaded, look at
> .getResponseHeaders():
>
>
> std.file.write("logo3w.png", cast(ubyte[])
> cl.get("http://www.google.cz/images/srpr/logo3w.png"));
> writeln(cl.getResponseHeaders()["Content-Type"]);
>
> Which will print in this case: image/png
>
> Here is full example: https://github.com/Bystroushaak/DHTTPClient/blob/master/examples/download_binary_file.d
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
That's because you are trying to writeln binary data, which is not possible because writeln, IMHO, checks UTF-8 validity.
On 20.1.2012 18:08, Xan xan wrote:
> Before and now, I get this error:
>
> $ ./spider http://static.arxiv.org/pdf/1109.4897.pdf
> [Excepció: std.conv.ConvException@/usr/include/d2/4.6/std/conv.d(1640):
> Can't convert value `HTT' of type string to type uint]
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
Mmmm... I understand. But is there any way to circumvent it? Perhaps I could write to a file, couldn't I?
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
Use rawWrite():

    stdout.rawWrite(cast(ubyte[]) navegador.get(a));
|
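The rawWrite() suggestion above can be tried with plain Phobos. In this sketch the five sample bytes (the ASCII text "%PDF-") are an assumption standing in for the result of `navegador.get(a)`, and `dumpRaw` is a helper name made up for the illustration.

```d
import std.stdio;

// Writes the bytes to the given file handle without any text formatting.
void dumpRaw(File sink, const(ubyte)[] data)
{
    sink.rawWrite(data);
}

void main()
{
    // Sample data: the bytes of the ASCII text "%PDF-".
    ubyte[] data = [0x25, 0x50, 0x44, 0x46, 0x2D];

    dumpRaw(stdout, data);   // the bytes go to standard output unchanged
    writeln();               // end the line so the shell prompt starts cleanly
}
```

Because `dumpRaw` takes any `File`, the same call would also work with a file opened in binary mode, for example `File("out.pdf", "wb")`, which covers the "write to a file" idea raised earlier in the thread.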