January 20, 2012
Re: Reading web pages
There are two ways:

Change the module's global variable:

dhttpclient.DefaultHeaders = dhttpclient.IEHeaders; // or your own

This changes the headers for all clients.

---

Change the headers of a single instance:

// The preset holds more headers than just User-Agent, so copy all of
// them and then override the one you need.
string[string] my_headers = dhttpclient.FFHeaders;
my_headers["User-Agent"] = "My own spider!";

HTTPClient navegador = new HTTPClient();
navegador.setClientHeaders(my_headers);

---

Headers are defined as:

public enum string[string] FFHeaders = [
  "User-Agent" : "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
  "Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain",
  "Accept-Language" : "cs,en-us;q=0.7,en;q=0.3",
  "Accept-Charset" : "utf-8",
  "Keep-Alive" : "300",
  "Connection" : "keep-alive"
];

/// Headers from Firefox 3.6.13 on Linux
public enum string[string] LFFHeaders = [
  "User-Agent" : "Mozilla/5.0 (X11; U; Linux i686; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
  "Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain",
  "Accept-Language" : "cs,en-us;q=0.7,en;q=0.3",
  "Accept-Charset" : "utf-8",
  "Keep-Alive" : "300",
  "Connection" : "keep-alive"
];

Accept, Accept-Charset, Keep-Alive and Connection are important; if
you redefine them, the module may stop working with some servers.
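
The "copy everything, then override one key" idea can be sketched without the library at all. This is only an illustration: presetHeaders() and withUserAgent() are hypothetical names standing in for a preset such as FFHeaders, and they are not part of dhttpclient.

```d
import std.stdio : writeln;

/// Hypothetical stand-in for a header preset such as dhttpclient.FFHeaders.
string[string] presetHeaders()
{
    return [
        "User-Agent" : "Mozilla/5.0 (X11; U; Linux i686; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
        "Accept-Charset" : "utf-8",
        "Keep-Alive" : "300",
        "Connection" : "keep-alive"
    ];
}

/// Returns a copy of `base` in which only the User-Agent is replaced.
string[string] withUserAgent(string[string] base, string userAgent)
{
    auto copy = base.dup;            // keep every other header untouched
    copy["User-Agent"] = userAgent;  // override just this one
    return copy;
}

void main()
{
    auto headers = withUserAgent(presetHeaders(), "My own spider!");
    writeln(headers["User-Agent"]);
    writeln(headers["Connection"]); // still present after the override
}
```

The resulting map could then be handed to setClientHeaders() as shown above.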

On 20.1.2012 15:56, Xan xan wrote:
> On the other hand, I see dhttpclient  identifies as
>   "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3)
> Gecko/20100401 Firefox/3.6.13"
>
> How can I change that?
January 20, 2012
Re: Reading web pages
The first version was buggy. I've updated the code on GitHub, so if you
want to try it, pull the new version (git pull). I've also added a new
example in examples/user_agent_change.d

On 20.1.2012 16:08, Bystroushaak wrote:
> There are two ways:
> [...]
January 20, 2012
Re: Reading web pages
Thank you very much, Bystroushaak.
I see you limit httpclient to XML/HTML documents. Is there a
way to download any kind of file (not only HTML or XML)? Something
like:

HTTPClient navegador = new HTTPClient();
auto file = navegador.download("http://www.google.com/myfile.pdf");

?

Thanks a lot,



2012/1/20 Bystroushaak <bystrousak@kitakitsune.org>:
> First version was buggy. I've updated code at github, so if you want to try
> it, pull new version (git pull). I've also added new example into
> examples/user_agent_change.d
January 20, 2012
Re: Reading web pages
It isn't limited; you just have to cast the output to ubyte[]:

std.file.write("logo3w.png", cast(ubyte[]) 
cl.get("http://www.google.cz/images/srpr/logo3w.png"));
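
As a library-free sketch of the same idea, the snippet below saves a string body to disk byte for byte. It assumes, as the cast above suggests, that get() returns a string; saveBody() is a hypothetical helper, and the PNG signature bytes merely stand in for a real response. It uses std.string.representation instead of a cast, which views the string as immutable(ubyte)[] without casting away immutability.

```d
import std.file : read, remove, tempDir, write;
import std.path : buildPath;
import std.string : representation;

/// Writes a response body to `path` exactly as received.
void saveBody(string path, string responseBody)
{
    // representation() reinterprets the string as immutable(ubyte)[]
    // without copying it.
    write(path, responseBody.representation);
}

void main()
{
    // Stand-in for what a client's get() might return: the PNG signature.
    immutable ubyte[] pngSignature = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A];
    string responseBody = cast(string) pngSignature;

    auto path = buildPath(tempDir(), "saveBody_demo.bin");
    saveBody(path, responseBody);
    scope (exit) remove(path);

    // The file holds the same bytes that went in.
    assert(cast(const(ubyte)[]) read(path) == responseBody.representation);
}
```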

On 20.1.2012 17:53, Xan xan wrote:
> Thank you very much, Bystroushaak.
> I see you limite httpclient to xml/html documents. Is there
> possibility of download any files (and not only html or xml). Just
> like:
>
> HTTPClient navegador = new HTTPClient();
> auto file = navegador.download("http://www.google.com/myfile.pdf")
>
> ?
>
> Thanks a lot,
January 20, 2012
Re: Reading web pages
If you want to know what type of file you just downloaded, look at 
.getResponseHeaders():

  std.file.write("logo3w.png", cast(ubyte[]) 
cl.get("http://www.google.cz/images/srpr/logo3w.png"));
  writeln(cl.getResponseHeaders()["Content-Type"]);

In this case it prints: image/png

Here is a full example:
https://github.com/Bystroushaak/DHTTPClient/blob/master/examples/download_binary_file.d
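
One caveat when indexing a string[string] directly: if the key is missing, headers["Content-Type"] throws a RangeError. A small defensive sketch follows. mediaType() is a hypothetical helper, not part of dhttpclient, and it assumes the header map uses the exact key "Content-Type".

```d
import std.stdio : writeln;
import std.string : indexOf, strip;
import std.uni : toLower;

/// Returns the media type from a header map, dropping parameters such
/// as "; charset=UTF-8". Falls back to `fallback` if the header is absent.
string mediaType(string[string] headers, string fallback = "application/octet-stream")
{
    // get() returns the fallback instead of throwing on a missing key.
    string value = headers.get("Content-Type", fallback);
    auto semicolon = value.indexOf(';');
    if (semicolon >= 0)
        value = value[0 .. semicolon];
    return value.strip.toLower;
}

void main()
{
    writeln(mediaType(["Content-Type" : "image/png"]));
    writeln(mediaType(["Content-Type" : "text/html; charset=UTF-8"]));
    string[string] none;
    writeln(mediaType(none)); // falls back to the default
}
```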

On 20.1.2012 18:00, Bystroushaak wrote:
> It is unlimited, you just have to cast output to ubyte[]:
>
> std.file.write("logo3w.png", cast(ubyte[])
> cl.get("http://www.google.cz/images/srpr/logo3w.png"));
January 20, 2012
Re: Reading web pages
I got this error before, and I still get it now:

$ ./spider http://static.arxiv.org/pdf/1109.4897.pdf
[Excepció: std.conv.ConvException@/usr/include/d2/4.6/std/conv.d(1640):
Can't convert value `HTT' of type string to type uint]

The code:

//D 2.0
//gdmd-4.6 <file> => produces a file with the same name plus .o
//Uses https://github.com/Bystroushaak/DHTTPClient
import std.stdio, std.string, std.conv, std.stream;
import std.socket, std.socketstream;
import dhttpclient;

int main(string [] args)
{
    if (args.length < 2) {
        writeln("Usage:");
        writeln("   ./spider {<url1>, <url2>, ...}");
        return 0;
    }
    else {
        try {
            string[string] capcalera = dhttpclient.FFHeaders;
            //capcalera["User-Agent"] = "arachnida yottiuma";
            HTTPClient navegador = new HTTPClient();
            navegador.setClientHeaders(capcalera);

            foreach (a; args[1..$]) {
                writeln("[Contingut: ", cast(ubyte[]) navegador.get(a), "]");
            }
        }
        catch (Exception e) {
            writeln("[Excepció: ", e, "]");
        }
        return 0;
    }
}



What happens?


2012/1/20 Bystroushaak <bystrousak@kitakitsune.org>:
> It is unlimited, you just have to cast output to ubyte[]:
>
> std.file.write("logo3w.png", cast(ubyte[])
> cl.get("http://www.google.cz/images/srpr/logo3w.png"));
>
>
January 20, 2012
Re: Reading web pages
Thanks, but what is failing then, given that I download the file as a
collection of bytes? It shouldn't matter whether the file is a PDF, a
PNG or anything else if I download it as bytes, should it?

Thanks,


2012/1/20 Bystroushaak <bystrousak@kitakitsune.org>:
> If you want to know what type of file you just downloaded, look at
> .getResponseHeaders():
>
>
>  std.file.write("logo3w.png", cast(ubyte[])
> cl.get("http://www.google.cz/images/srpr/logo3w.png"));
>  writeln(cl.getResponseHeaders()["Content-Type"]);
>
> Which will print in this case: image/png
>
> Here is full example:
> https://github.com/Bystroushaak/DHTTPClient/blob/master/examples/download_binary_file.d
January 20, 2012
Re: Reading web pages
That's because you are trying to writeln binary data, and that doesn't
work because writeln, AFAIK, checks UTF-8 validity.
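
Whatever writeln does internally, you can check for yourself whether a buffer is well-formed UTF-8 before treating it as text. A minimal sketch using std.utf.validate follows; isValidUtf8() is a hypothetical helper name.

```d
import std.stdio : writeln;
import std.string : representation;
import std.utf : UTFException, validate;

/// True if `data` is well-formed UTF-8 and so can be treated as text.
bool isValidUtf8(const(ubyte)[] data)
{
    try
    {
        // validate() throws UTFException on the first malformed sequence.
        validate(cast(const(char)[]) data);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // First bytes of a PNG file; 0x89 cannot start a UTF-8 sequence.
    immutable ubyte[] pngSignature = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A];

    writeln(isValidUtf8("plain text".representation));
    writeln(isValidUtf8(pngSignature));
}
```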

On 20.1.2012 18:08, Xan xan wrote:
> Before and now, I get this error:
>
> $ ./spider http://static.arxiv.org/pdf/1109.4897.pdf
> [Excepció: std.conv.ConvException@/usr/include/d2/4.6/std/conv.d(1640):
> Can't convert value `HTT' of type string to type uint]
> [...]
January 20, 2012
Re: Reading web pages
Mmmm... I understand. But is there any way to work around it?
Perhaps I could write to a file instead, couldn't I?



2012/1/20 Bystroushaak <bystrousak@kitakitsune.org>:
> Thats because you are trying writeln binary data, and that is impossible,
> because writeln IMHO checks UTF8 validity.
January 20, 2012
Re: Reading web pages
Use rawWrite():

stdout.rawWrite(cast(ubyte[]) navegador.get(a));
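
For reference, here is a self-contained sketch of rawWrite() that needs no HTTP client. writeRaw() is a hypothetical wrapper and the payload is a made-up five-byte buffer. The same function works for stdout and for a file opened in binary mode.

```d
import std.stdio : File, stdout;

/// Writes `data` to `sink` unmodified, with no text formatting applied.
void writeRaw(File sink, const(ubyte)[] data)
{
    sink.rawWrite(data);
}

void main()
{
    // Made-up payload: the bytes of "%PDF" followed by a newline.
    immutable ubyte[] payload = [0x25, 0x50, 0x44, 0x46, 0x0A];

    // Send the bytes straight to standard output...
    writeRaw(stdout, payload);

    // ...or to a file: writeRaw(File("out.bin", "wb"), payload);
}
```

On the command line, the stdout variant can be redirected to a file, for example ./spider <url> > out.pdf.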

On 20.1.2012 18:18, Xan xan wrote:
> Mmmm... I understand it. But is there any way of circumvent it?
> Perhaps I could write to one file, isn't?