January 19, 2012 Reading web pages | ||||
---|---|---|---|---|
| ||||
Hi,

I want to simply code a script to get the url as string in D 2.0.
I have this code:

//D 2.0
//gdmd-4.6
import std.stdio, std.string, std.conv, std.stream;
import std.socket, std.socketstream;

int main(string [] args)
{
    if (args.length < 2) {
        writeln("Usage:");
        writeln(" ./aranya {<url1>, <url2>, ...}");
        return 0;
    }
    else {
        foreach (a; args[1..$]) {
            Socket sock = new TcpSocket(new InternetAddress(a, 80));
            scope(exit) sock.close();
            Stream ss = new SocketStream(sock);
            ss.writeString("GET" ~ a ~ " HTTP/1.1\r\n");
            writeln(ss);
        }
        return 0;
    }
}

but when I use it, I receive:

$ ./aranya http://www.google.com
std.socket.AddressException@../../../src/libphobos/std/socket.d(697): Unable to resolve host 'http://www.google.com'

What fails?

Thanks in advance,
Xan.
|
January 19, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
Posted in reply to Xan xan | On 01/19/2012 04:30 PM, Xan xan wrote:
> Hi,
>
> I want to simply code a script to get the url as string in D 2.0.
> I have this code:
>
> //D 2.0
> //gdmd-4.6
> import std.stdio, std.string, std.conv, std.stream;
> import std.socket, std.socketstream;
>
> int main(string [] args)
> {
> if (args.length< 2) {
> writeln("Usage:");
> writeln(" ./aranya {<url1>,<url2>, ...}");
> return 0;
> }
> else {
> foreach (a; args[1..$]) {
> Socket sock = new TcpSocket(new InternetAddress(a, 80));
> scope(exit) sock.close();
> Stream ss = new SocketStream(sock);
> ss.writeString("GET" ~ a ~ " HTTP/1.1\r\n");
> writeln(ss);
> }
> return 0;
> }
> }
>
>
> but when I use it, I receive:
> $ ./aranya http://www.google.com
> std.socket.AddressException@../../../src/libphobos/std/socket.d(697):
> Unable to resolve host 'http://www.google.com'
>
> What fails?
>
> Thanks in advance,
> Xan.
The protocol prefix (http://) is part of the GET request, not of the host name passed to InternetAddress.
./aranya www.google.com
seems to actually connect to Google. (It still does not work fully: I get back 400 Bad Request, but maybe you can figure it out.)
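One way to act on this is to split the command-line argument before opening the socket, so that only the host name reaches InternetAddress and the path goes into the request line. What follows is a minimal sketch in D, not a full URL parser: `UrlParts` and `splitUrl` are hypothetical names (nothing in Phobos is called that), and the parsing deliberately ignores ports, user info, and query strings that themselves contain "://".

```d
import std.algorithm.searching : findSplit;

/// Hypothetical container for the pieces of a URL (illustration only).
struct UrlParts
{
    string scheme; // e.g. "http"
    string host;   // e.g. "www.google.com"
    string path;   // e.g. "/" or "/a/b"
}

/// Splits "scheme://host/path" into its parts.
/// Assumption: a missing scheme defaults to "http", a missing path to "/".
UrlParts splitUrl(string url)
{
    UrlParts parts;
    string rest;

    auto s = url.findSplit("://");
    if (s[1].length)          // separator found
    {
        parts.scheme = s[0];
        rest = s[2];
    }
    else                      // no scheme given
    {
        parts.scheme = "http";
        rest = url;
    }

    auto h = rest.findSplit("/");
    parts.host = h[0];
    parts.path = h[1].length ? "/" ~ h[2] : "/";
    return parts;
}
```

With this, `splitUrl(a).host` is what would be handed to `new InternetAddress(..., 80)`, while `splitUrl(a).path` belongs after `GET `.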
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
Posted in reply to Xan xan | The host is www.google.com; http is only the protocol. The DNS lookup is independent of HTTP, so the host name passed to it should not include the http:// prefix. Note that you're also missing a space after the GET. Also, in terms of the example given, some servers won't like you not using the Host header, and some won't like the GET target being an absolute URL instead of a relative path (but the two combined should make most accept it). A CURL wrapper has been added, and a higher-level version should be available within the next release or two; you may want to look into that.
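The points above (bare host name for the lookup, a space after GET, a relative path, a Host header) can be put together roughly as follows. This is a sketch under assumptions rather than a reference client: `buildRequest` and `fetch` are hypothetical helper names, the request also adds `Connection: close` and the terminating blank line so that the read loop ends, and it uses std.socket directly because std.stream and std.socketstream were removed from later Phobos releases.

```d
import std.socket : InternetAddress, TcpSocket;
import std.stdio : write;

/// Builds a minimal HTTP/1.1 GET request (hypothetical helper).
/// Note the space after GET, the relative path, the Host header
/// and the empty line that ends the header block.
string buildRequest(string host, string path = "/")
{
    return "GET " ~ path ~ " HTTP/1.1\r\n"
         ~ "Host: " ~ host ~ "\r\n"
         ~ "Connection: close\r\n"
         ~ "\r\n";
}

/// Connects to host:80, sends the request and returns the raw response
/// (status line, headers and body, undecoded).
string fetch(string host, string path = "/")
{
    // Only the bare host name goes to the resolver, never "http://...".
    auto sock = new TcpSocket(new InternetAddress(host, 80));
    scope(exit) sock.close();

    // send() may transmit fewer bytes than asked; for a request this
    // small the sketch does not loop on partial sends.
    sock.send(buildRequest(host, path));

    char[] response;
    char[4096] buf;
    for (;;)
    {
        auto n = sock.receive(buf[]);
        if (n <= 0)           // 0: peer closed, negative: error
            break;
        response ~= buf[0 .. n];
    }
    return response.idup;
}

void main(string[] args)
{
    foreach (host; args[1 .. $])
        write(fetch(host));
}
```

Called as `./aranya www.google.com`, this prints whatever the server sends back, headers included; redirects, chunked encoding and HTTPS are left out on purpose.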
On 19/01/2012 9:30 AM, Xan xan wrote:
> Hi,
>
> I want to simply code a script to get the url as string in D 2.0.
> I have this code:
>
> //D 2.0
> //gdmd-4.6
> import std.stdio, std.string, std.conv, std.stream;
> import std.socket, std.socketstream;
>
> int main(string [] args)
> {
> if (args.length< 2) {
> writeln("Usage:");
> writeln(" ./aranya {<url1>,<url2>, ...}");
> return 0;
> }
> else {
> foreach (a; args[1..$]) {
> Socket sock = new TcpSocket(new InternetAddress(a, 80));
> scope(exit) sock.close();
> Stream ss = new SocketStream(sock);
> ss.writeString("GET" ~ a ~ " HTTP/1.1\r\n");
> writeln(ss);
> }
> return 0;
> }
> }
>
>
> but when I use it, I receive:
> $ ./aranya http://www.google.com
> std.socket.AddressException@../../../src/libphobos/std/socket.d(697):
> Unable to resolve host 'http://www.google.com'
>
> What fails?
>
> Thanks in advance,
> Xan.
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
Posted in reply to Timon Gehr | You can always use my module: https://github.com/Bystroushaak/DHTTPClient
On 19.1.2012 20:24, Timon Gehr wrote:
> On 01/19/2012 04:30 PM, Xan xan wrote:
>> Hi,
>>
>> I want to simply code a script to get the url as string in D 2.0.
>> I have this code:
>>
>> //D 2.0
>> //gdmd-4.6
>> import std.stdio, std.string, std.conv, std.stream;
>> import std.socket, std.socketstream;
>>
>> int main(string [] args)
>> {
>> if (args.length< 2) {
>> writeln("Usage:");
>> writeln(" ./aranya {<url1>,<url2>, ...}");
>> return 0;
>> }
>> else {
>> foreach (a; args[1..$]) {
>> Socket sock = new TcpSocket(new InternetAddress(a, 80));
>> scope(exit) sock.close();
>> Stream ss = new SocketStream(sock);
>> ss.writeString("GET" ~ a ~ " HTTP/1.1\r\n");
>> writeln(ss);
>> }
>> return 0;
>> }
>> }
>>
>>
>> but when I use it, I receive:
>> $ ./aranya http://www.google.com
>> std.socket.AddressException@../../../src/libphobos/std/socket.d(697):
>> Unable to resolve host 'http://www.google.com'
>>
>> What fails?
>>
>> Thanks in advance,
>> Xan.
>
> The protocol specification is part of the get request.
>
> ./aranaya www.google.com
>
> seems to actually connect to google. (it still does not work fully, I
> get back 400 Bad Request, but maybe you can figure it out)
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
Posted in reply to Timon Gehr | Nope:
xan@gerret:~/yottium/@codi/aranya-d2.0$ gdmd-4.6 aranya.d
xan@gerret:~/yottium/@codi/aranya-d2.0$ ./aranya www.google.com
std.socket.TcpSocket
What fails?
2012/1/19 Timon Gehr <timon.gehr@gmx.ch>:
> On 01/19/2012 04:30 PM, Xan xan wrote:
>>
>> Hi,
>>
>> I want to simply code a script to get the url as string in D 2.0. I have this code:
>>
>> //D 2.0
>> //gdmd-4.6
>> import std.stdio, std.string, std.conv, std.stream;
>> import std.socket, std.socketstream;
>>
>> int main(string [] args)
>> {
>> if (args.length< 2) {
>> writeln("Usage:");
>> writeln(" ./aranya {<url1>,<url2>, ...}");
>> return 0;
>> }
>> else {
>> foreach (a; args[1..$]) {
>> Socket sock = new TcpSocket(new InternetAddress(a, 80));
>> scope(exit) sock.close();
>> Stream ss = new SocketStream(sock);
>> ss.writeString("GET" ~ a ~ " HTTP/1.1\r\n");
>> writeln(ss);
>> }
>> return 0;
>> }
>> }
>>
>>
>> but when I use it, I receive:
>> $ ./aranya http://www.google.com
>> std.socket.AddressException@../../../src/libphobos/std/socket.d(697):
>> Unable to resolve host 'http://www.google.com'
>>
>> What fails?
>>
>> Thanks in advance,
>> Xan.
>
>
> The protocol specification is part of the get request.
>
> ./aranaya www.google.com
>
> seems to actually connect to google. (it still does not work fully, I get back 400 Bad Request, but maybe you can figure it out)
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
Thanks for that. The standard library should include it; it would make things easier... high level, please.
On the other hand, how do I specify the protocol? http://foo is not the same as ftp://foo
Thanks,
Xan.
2012/1/20 Bystroushaak <bystrousak@kitakitsune.org>:
> You can always use my module: https://github.com/Bystroushaak/DHTTPClient
>
>
> On 19.1.2012 20:24, Timon Gehr wrote:
>>
>> On 01/19/2012 04:30 PM, Xan xan wrote:
>>>
>>> Hi,
>>>
>>> I want to simply code a script to get the url as string in D 2.0. I have this code:
>>>
>>> //D 2.0
>>> //gdmd-4.6
>>> import std.stdio, std.string, std.conv, std.stream;
>>> import std.socket, std.socketstream;
>>>
>>> int main(string [] args)
>>> {
>>> if (args.length< 2) {
>>> writeln("Usage:");
>>> writeln(" ./aranya {<url1>,<url2>, ...}");
>>> return 0;
>>> }
>>> else {
>>> foreach (a; args[1..$]) {
>>> Socket sock = new TcpSocket(new InternetAddress(a, 80));
>>> scope(exit) sock.close();
>>> Stream ss = new SocketStream(sock);
>>> ss.writeString("GET" ~ a ~ " HTTP/1.1\r\n");
>>> writeln(ss);
>>> }
>>> return 0;
>>> }
>>> }
>>>
>>>
>>> but when I use it, I receive:
>>> $ ./aranya http://www.google.com
>>> std.socket.AddressException@../../../src/libphobos/std/socket.d(697):
>>> Unable to resolve host 'http://www.google.com'
>>>
>>> What fails?
>>>
>>> Thanks in advance,
>>> Xan.
>>
>>
>> The protocol specification is part of the get request.
>>
>> ./aranaya www.google.com
>>
>> seems to actually connect to google. (it still does not work fully, I get back 400 Bad Request, but maybe you can figure it out)
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
I get errors:

xan@gerret:~/yottium/@codi/aranya-d2.0$ gdmd-4.6 spider.d
spider.o: In function `_Dmain':
spider.d:(.text+0x4d): undefined reference to `_D11dhttpclient10HTTPClient7__ClassZ'
spider.d:(.text+0x5a): undefined reference to `_D11dhttpclient10HTTPClient6__ctorMFZC11dhttpclient10HTTPClient'
spider.o:(.data+0x24): undefined reference to `_D11dhttpclient12__ModuleInfoZ'
collect2: ld returned 1 exit status

with the file spider.d:

//D 2.0
//gdmd-4.6 <file> => produces a file with the same name and .o
//Uses https://github.com/Bystroushaak/DHTTPClient
import std.stdio, std.string, std.conv, std.stream;
import std.socket, std.socketstream;
import dhttpclient;

int main(string [] args)
{
    if (args.length < 2) {
        writeln("Usage:");
        writeln(" ./spider {<url1>, <url2>, ...}");
        return 0;
    }
    else {
        try {
            HTTPClient navegador = new HTTPClient();
            foreach (a; args[1..$]) {
                writeln("[Contingut: ", navegador.get(a), "]");
            }
        }
        catch (Exception e) {
            writeln("[Excepció: ", e, "]");
        }
        return 0;
    }
}

What happens now?

Thanks a lot,
Xan.
2012/1/20 Bystroushaak <bystrousak@kitakitsune.org>:
> You can always use my module: https://github.com/Bystroushaak/DHTTPClient
>
>
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
With dmd 2.057 on my linux machine:

bystrousak:DHTTPClient,0$ dmd spider.d dhttpclient.d
bystrousak:DHTTPClient,0$ ./spider http://kitakitsune.org
[Contingut: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML>
.....

On 20.1.2012 15:37, Xan xan wrote:
> I get errors:
>
> xan@gerret:~/yottium/@codi/aranya-d2.0$ gdmd-4.6 spider.d
> spider.o: In function `_Dmain':
> spider.d:(.text+0x4d): undefined reference to
> `_D11dhttpclient10HTTPClient7__ClassZ'
> spider.d:(.text+0x5a): undefined reference to
> `_D11dhttpclient10HTTPClient6__ctorMFZC11dhttpclient10HTTPClient'
> spider.o:(.data+0x24): undefined reference to `_D11dhttpclient12__ModuleInfoZ'
> collect2: ld returned 1 exit status
>
>
> with the file spider.d:
>
> //D 2.0
> //gdmd-4.6 <file> => produces a file with the same name and .o
> //Uses https://github.com/Bystroushaak/DHTTPClient
> import std.stdio, std.string, std.conv, std.stream;
> import std.socket, std.socketstream;
> import dhttpclient;
>
> int main(string [] args)
> {
> if (args.length< 2) {
> writeln("Usage:");
> writeln(" ./spider {<url1>,<url2>, ...}");
> return 0;
> }
> else {
> try {
> HTTPClient navegador = new HTTPClient();
> foreach (a; args[1..$]) {
> writeln("[Contingut: ", navegador.get(a), "]");
> }
> }
> catch (Exception e) {
> writeln("[Excepció: ", e, "]");
> }
> return 0;
> }
> }
>
>
>
> What happens now?
>
> Thanks a lot,
> Xan.
>
> 2012/1/20 Bystroushaak<bystrousak@kitakitsune.org>:
>> You can always use my module:
>> https://github.com/Bystroushaak/DHTTPClient
>>
>>
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
Yes. I did not know that I had to compile the two D files, although it makes sense ;-)
Perfect.
On the other hand, I see dhttpclient identifies as
"Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3)
Gecko/20100401 Firefox/3.6.13"
How can I change that?
2012/1/20 Bystroushaak <bystrousak@kitakitsune.org>:
> With dmd 2.057 on my linux machine:
>
> bystrousak:DHTTPClient,0$ dmd spider.d dhttpclient.d
> bystrousak:DHTTPClient,0$ ./spider http://kitakitsune.org
> [Contingut: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
> "http://www.w3.org/TR/html4/loose.dtd">
>
> <HTML>
> .....
>
>
>
> On 20.1.2012 15:37, Xan xan wrote:
>>
>> I get errors:
>>
>> xan@gerret:~/yottium/@codi/aranya-d2.0$ gdmd-4.6 spider.d
>> spider.o: In function `_Dmain':
>> spider.d:(.text+0x4d): undefined reference to
>> `_D11dhttpclient10HTTPClient7__ClassZ'
>> spider.d:(.text+0x5a): undefined reference to
>> `_D11dhttpclient10HTTPClient6__ctorMFZC11dhttpclient10HTTPClient'
>> spider.o:(.data+0x24): undefined reference to
>> `_D11dhttpclient12__ModuleInfoZ'
>> collect2: ld returned 1 exit status
>>
>>
>> with the file spider.d:
>>
>> //D 2.0
>> //gdmd-4.6 <file> => produces a file with the same name and .o
>> //Uses https://github.com/Bystroushaak/DHTTPClient
>> import std.stdio, std.string, std.conv, std.stream;
>> import std.socket, std.socketstream;
>> import dhttpclient;
>>
>> int main(string [] args)
>> {
>> if (args.length< 2) {
>> writeln("Usage:");
>> writeln(" ./spider {<url1>,<url2>, ...}");
>> return 0;
>> }
>> else {
>> try {
>> HTTPClient navegador = new HTTPClient();
>> foreach (a; args[1..$]) {
>> writeln("[Contingut: ", navegador.get(a),
>> "]");
>> }
>> }
>> catch (Exception e) {
>> writeln("[Excepció: ", e, "]");
>> }
>> return 0;
>> }
>> }
>>
>>
>>
>> What happens now?
>>
>> Thanks a lot,
>> Xan.
>>
>> 2012/1/20 Bystroushaak<bystrousak@kitakitsune.org>:
>>>
>>> You can always use my module: https://github.com/Bystroushaak/DHTTPClient
>>>
>>>
>
|
January 20, 2012 Re: Reading web pages | ||||
---|---|---|---|---|
| ||||
This module is very simple and supports only the HTTP protocol, but there is a way to add HTTPS:
public void setTcpSocketCreator(TcpSocket function(string domain, ushort port) fn)
You can pass a lambda function that returns an SSL socket; it will be called for every connection.
FTP is not supported - it is DHTTPClient, not DFTPClient :)
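The hook quoted above takes a plain function pointer, so the pattern can be sketched without the module itself. In the following, `MiniClient`, `plainCreator` and `recordingCreator` are hypothetical stand-ins written for illustration; only the `setTcpSocketCreator` signature is taken from this post, and the module's real internals may differ. A real HTTPS setup would return a TcpSocket subclass that wraps a TLS library, which Phobos does not provide.

```d
import std.socket : InternetAddress, TcpSocket;

// Same shape as the parameter of setTcpSocketCreator quoted in the post.
alias SocketCreator = TcpSocket function(string domain, ushort port);

// Default behaviour: an ordinary TCP connection to domain:port.
TcpSocket plainCreator(string domain, ushort port)
{
    return new TcpSocket(new InternetAddress(domain, port));
}

// Hypothetical stand-in for the client class, NOT DHTTPClient's real code.
class MiniClient
{
    private SocketCreator creator;

    this() { creator = &plainCreator; }

    void setTcpSocketCreator(SocketCreator fn) { creator = fn; }

    // Every connection goes through the installed creator.
    TcpSocket open(string domain, ushort port)
    {
        return creator(domain, port);
    }
}

// Module-level state: a function pointer (unlike a delegate)
// cannot capture local variables.
string lastDomain;
ushort lastPort;

// Records what it was asked for and returns an unconnected socket;
// a TLS-capable TcpSocket subclass would be created here instead.
TcpSocket recordingCreator(string domain, ushort port)
{
    lastDomain = domain;
    lastPort = port;
    return new TcpSocket();
}

void main()
{
    auto client = new MiniClient;
    client.setTcpSocketCreator(&recordingCreator);

    auto sock = client.open("example.org", 443);
    scope(exit) sock.close();

    assert(lastDomain == "example.org" && lastPort == 443);
}
```

Because the creator receives the domain and port for each connection, it is also the place where a caller could decide per host whether to use plain TCP or TLS.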
On 20.1.2012 15:24, Xan xan wrote:
> For the other hand, how to specify the protocol? It's not the same
> http://foo than ftp://foo
|