January 19, 2012
Reading web pages
Hi,

I want to write a simple script that fetches the contents of a URL as a string in D 2.0.
I have this code:

//D 2.0
//gdmd-4.6
import std.stdio, std.string, std.conv, std.stream;
import std.socket, std.socketstream;

int main(string [] args)
{
	if (args.length < 2) {
		writeln("Usage:");
		writeln("   ./aranya {<url1>, <url2>, ...}");
		return 0;
	}
	else {
		foreach (a; args[1..$]) {
			Socket sock = new TcpSocket(new InternetAddress(a, 80));
			scope(exit) sock.close();
			Stream ss = new SocketStream(sock);
			ss.writeString("GET" ~ a ~ " HTTP/1.1\r\n");
			writeln(ss);
		}
		return 0;
	}
}


but when I use it, I receive:
$ ./aranya http://www.google.com
std.socket.AddressException@../../../src/libphobos/std/socket.d(697):
Unable to resolve host 'http://www.google.com'

What fails?

Thanks in advance,
Xan.
January 19, 2012
Re: Reading web pages
On 01/19/2012 04:30 PM, Xan xan wrote:
> but when I use it, I receive:
> $ ./aranya http://www.google.com
> std.socket.AddressException@../../../src/libphobos/std/socket.d(697):
> Unable to resolve host 'http://www.google.com'
>
> What fails?
>
> Thanks in advance,
> Xan.

The protocol prefix (http://) is part of the GET request, not of the
host name to resolve. Pass only the host:

./aranya www.google.com

That does seem to connect to Google. (It still does not work fully: I
get back 400 Bad Request, but maybe you can figure it out.)
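
A minimal sketch of separating the host from the rest of the URL, so that only the host name is handed to InternetAddress. splitUrl is a hypothetical helper, not a Phobos function. The sketch assumes a reasonably recent Phobos (for std.algorithm.searching) and ignores ports and other URL details:

```d
import std.algorithm.searching : findSplit;
import std.typecons : Tuple;

/// Splits "http://host/path" into host and path.
/// Hypothetical helper for illustration; not part of Phobos.
Tuple!(string, "host", string, "path") splitUrl(string url)
{
    // Drop the scheme ("http://") if present; DNS only wants the host name.
    auto scheme = url.findSplit("://");
    string rest = scheme[1].length ? scheme[2] : url;

    // Everything before the first '/' is the host; the remainder is the path.
    auto parts = rest.findSplit("/");
    string path = parts[1].length ? "/" ~ parts[2] : "/";
    return typeof(return)(parts[0], path);
}

void main()
{
    import std.stdio : writeln;

    auto u = splitUrl("http://www.google.com/search");
    writeln(u.host); // www.google.com
    writeln(u.path); // /search
}
```

The host part can then go to InternetAddress and the path part into the request line.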
January 20, 2012
Re: Reading web pages
The host is www.google.com; http is only the protocol. The DNS lookup
is independent of HTTP, so the name you resolve should not include the
scheme. Note that you're also missing a space after the GET. Also, in
terms of the example given, some servers won't like a request without a
Host header, and some won't like the GET target being an absolute URL
instead of a relative path (but sending both together should make most
servers accept it). There's a curl wrapper already, and a higher-level
version should be available within the next release or two; you may
want to look into that.
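
Putting those points together, a request could be sketched roughly as follows. This is only an illustration under some assumptions: it uses std.socket directly (send/receive) rather than std.stream, buildRequest is a hypothetical helper name, example.com is a placeholder default host, and the Connection: close header is added here so that the read loop ends when the server is done. The reply is read from the socket explicitly, because writeln on a stream object prints its toString() rather than the response.

```d
import std.socket : InternetAddress, TcpSocket;
import std.stdio : stdout;

/// Builds a minimal HTTP/1.1 GET request for the given host and path.
/// Hypothetical helper for illustration.
string buildRequest(string host, string path)
{
    return "GET " ~ path ~ " HTTP/1.1\r\n"   // note the space after GET
         ~ "Host: " ~ host ~ "\r\n"          // Host header, expected by HTTP/1.1 servers
         ~ "Connection: close\r\n"           // ask the server to close when done
         ~ "\r\n";                           // blank line ends the headers
}

void main(string[] args)
{
    // Host name only, without "http://"; example.com is a placeholder.
    string host = args.length > 1 ? args[1] : "example.com";

    auto sock = new TcpSocket(new InternetAddress(host, 80));
    scope(exit) sock.close();

    // send() may transmit fewer bytes than requested; a robust client
    // would loop until the whole request has been sent.
    sock.send(buildRequest(host, "/"));

    // Read until the server closes the connection.
    char[4096] buf;
    for (;;)
    {
        auto n = sock.receive(buf[]);
        if (n <= 0) break;   // 0 = connection closed, negative = error
        stdout.rawWrite(buf[0 .. n]);
    }
}
```

The output includes the status line and response headers as well as the body, since nothing here parses the reply.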

On 19/01/2012 9:30 AM, Xan xan wrote:
> but when I use it, I receive:
> $ ./aranya http://www.google.com
> std.socket.AddressException@../../../src/libphobos/std/socket.d(697):
> Unable to resolve host 'http://www.google.com'
>
> What fails?
>
> Thanks in advance,
> Xan.
January 20, 2012
Re: Reading web pages
You can always use my module:
  https://github.com/Bystroushaak/DHTTPClient

On 19.1.2012 20:24, Timon Gehr wrote:
> The protocol specification is part of the get request.
>
> ./aranya www.google.com
>
> seems to actually connect to google. (it still does not work fully, I
> get back 400 Bad Request, but maybe you can figure it out)
January 20, 2012
Re: Reading web pages
Nope:

xan@gerret:~/yottium/@codi/aranya-d2.0$ gdmd-4.6 aranya.d
xan@gerret:~/yottium/@codi/aranya-d2.0$ ./aranya www.google.com
std.socket.TcpSocket


What fails?

2012/1/19 Timon Gehr <timon.gehr@gmx.ch>:
> The protocol specification is part of the get request.
>
> ./aranya www.google.com
>
> seems to actually connect to google. (it still does not work fully, I get
> back 400 Bad Request, but maybe you can figure it out)
January 20, 2012
Re: Reading web pages
Thanks for that. The standard library should include something like it;
a high-level API would make things much easier.

On the other hand, how do I specify the protocol? http://foo is not the
same as ftp://foo.

Thanks,
Xan.

2012/1/20 Bystroushaak <bystrousak@kitakitsune.org>:
> You can always use my module:
>  https://github.com/Bystroushaak/DHTTPClient
>
January 20, 2012
Re: Reading web pages
I get errors:

xan@gerret:~/yottium/@codi/aranya-d2.0$ gdmd-4.6 spider.d
spider.o: In function `_Dmain':
spider.d:(.text+0x4d): undefined reference to
`_D11dhttpclient10HTTPClient7__ClassZ'
spider.d:(.text+0x5a): undefined reference to
`_D11dhttpclient10HTTPClient6__ctorMFZC11dhttpclient10HTTPClient'
spider.o:(.data+0x24): undefined reference to `_D11dhttpclient12__ModuleInfoZ'
collect2: ld returned 1 exit status


with the file spider.d:

//D 2.0
//gdmd-4.6 <file> => produces a file with the same name and .o
//Uses https://github.com/Bystroushaak/DHTTPClient
import std.stdio, std.string, std.conv, std.stream;
import std.socket, std.socketstream;
import dhttpclient;

int main(string [] args)
{
	if (args.length < 2) {
		writeln("Usage:");
		writeln("   ./spider {<url1>, <url2>, ...}");
		return 0;
	}
	else {
		try {
			HTTPClient navegador = new HTTPClient();
			foreach (a; args[1..$]) {
				writeln("[Contingut: ", navegador.get(a), "]");
			}
		}
		catch (Exception e) {
			writeln("[Excepció: ", e, "]");
		}
		return 0;
	}
}



What happens now?

Thanks a lot,
Xan.

2012/1/20 Bystroushaak <bystrousak@kitakitsune.org>:
> You can always use my module:
>  https://github.com/Bystroushaak/DHTTPClient
>
>
January 20, 2012
Re: Reading web pages
It works with dmd 2.057 on my Linux machine when both files are passed to the compiler:

bystrousak:DHTTPClient,0$ dmd spider.d dhttpclient.d
bystrousak:DHTTPClient,0$ ./spider http://kitakitsune.org
[Contingut: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 
Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<HTML>
.....


On 20.1.2012 15:37, Xan xan wrote:
> I get errors:
>
> xan@gerret:~/yottium/@codi/aranya-d2.0$ gdmd-4.6 spider.d
> spider.o: In function `_Dmain':
> spider.d:(.text+0x4d): undefined reference to
> `_D11dhttpclient10HTTPClient7__ClassZ'
> spider.d:(.text+0x5a): undefined reference to
> `_D11dhttpclient10HTTPClient6__ctorMFZC11dhttpclient10HTTPClient'
> spider.o:(.data+0x24): undefined reference to `_D11dhttpclient12__ModuleInfoZ'
> collect2: ld returned 1 exit status
January 20, 2012
Re: Reading web pages
Yes. I did not know that I had to compile the two .d files together,
although it makes sense ;-)

Perfect.

On the other hand, I see that dhttpclient identifies itself as
"Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3)
Gecko/20100401 Firefox/3.6.13"

How can I change that?




2012/1/20 Bystroushaak <bystrousak@kitakitsune.org>:
> With dmd 2.057 on my linux machine:
>
> bystrousak:DHTTPClient,0$ dmd spider.d dhttpclient.d
> bystrousak:DHTTPClient,0$ ./spider http://kitakitsune.org
> [Contingut: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
> "http://www.w3.org/TR/html4/loose.dtd">
>
> <HTML>
> .....
January 20, 2012
Re: Reading web pages
This module is very simple and supports only the HTTP protocol, but there
is a way to add HTTPS:

public void setTcpSocketCreator(TcpSocket function(string domain, ushort port) fn)

You can pass a lambda function that returns an SSL socket; it will be
called for every connection.
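
A rough sketch of how such a hook can be wired up. Only the setTcpSocketCreator signature is taken from the line above; FakeClient and its open method are hypothetical stand-ins so that the example compiles on its own, and the literal here returns a plain TcpSocket because Phobos has no SSL socket. A real HTTPS version would return a TcpSocket subclass wrapping an SSL library, which is not shown.

```d
import std.socket : InternetAddress, TcpSocket;

// Type of the hook, matching the signature shown above.
alias SocketCreator = TcpSocket function(string domain, ushort port);

// Hypothetical stand-in for the client class, only to show the wiring.
class FakeClient
{
    private SocketCreator creator;

    void setTcpSocketCreator(SocketCreator fn) { creator = fn; }

    // Called wherever the client needs a new connection.
    TcpSocket open(string domain, ushort port) { return creator(domain, port); }
}

void main()
{
    auto client = new FakeClient;

    // A function literal that captures no local state converts to a
    // `function` pointer, so it can be passed directly.
    client.setTcpSocketCreator(function TcpSocket(string domain, ushort port) {
        return new TcpSocket(new InternetAddress(domain, port));
    });
}
```

Because the parameter is a `function` and not a `delegate`, the literal cannot capture local variables; any SSL configuration would have to come from globals or from the socket subclass itself.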

FTP is not supported - it is DHTTPClient, not DFTPClient :)
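
For code that has to handle both http:// and ftp:// URLs, one option outside this module is the higher-level curl wrapper that now ships with Phobos as std.net.curl. The following is a minimal sketch, assuming a Phobos version that includes std.net.curl and a system with libcurl available; by default its get() picks HTTP or FTP by inspecting the URL.

```d
import std.net.curl : get;
import std.stdio : writeln;

void main(string[] args)
{
    // Each argument is a full URL, including the scheme
    // (for example http://... or ftp://...).
    foreach (url; args[1 .. $])
    {
        // get() returns the downloaded content as char[] and throws
        // an exception if the transfer fails.
        auto content = get(url);
        writeln("[", url, ": ", content.length, " bytes]");
    }
}
```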

On 20.1.2012 15:24, Xan xan wrote:
> For the other hand, how to specify the protocol? It's not the same
> http://foo than ftp://foo