Thread overview
SocketStream
Jan 18, 2005
Bob
Jan 18, 2005
Kris
Jan 19, 2005
Bob
Jan 19, 2005
Ben Hinkle
January 18, 2005
Tried to convert one of my scriptlets to D:

I was trying to use SocketStream to capture HTML pages, because I am certain that D is much faster at parsing the contents than JavaScript.

Unfortunately I have not found any way to tell when the HTML page has been fully received.

"eof()" returns TRUE only when the connection to the server is terminated, which usually takes many times longer than receiving the contents.

"available()" always returns 0 bytes, which is also no help.

Some HTTP headers do mention the content size, but this is not always the case. I could look for [/HTML] tags but some documents contain none and others have multiple [/HTML] tags.

Is there any solution for the program to know when the document has been fully loaded other than waiting for eof()?

Thanks.



January 18, 2005
Yes, there is a solution: you might consider using Mango (over at dsource.org) instead, since it has a fully operational HTTP server and Servlet wrapper. You can grab whatever headers you need at either level (mango.http.server, or mango.http.servlet). Take a look at some of the examples to get started.

If you're building a client rather than a server, then you might consider mango.http.client instead -- it also gives you access to all the headers.

Oh, and the server is rather fast: once operational, it doesn't allocate any memory at all -- so the GC is never active (except for allocations made within your own code).

There was a test done on Gentoo Linux, with a 1.4GHz Pentium-M running both the server and a single client feeding it with requests: I recall it was completing ~3500 requests per second, and half of the CPU was eaten by the client portion.




January 19, 2005

I don't know the answer to your question, but if you have ideas to improve std.socketstream don't hesitate to try them out and post them and/or email them to Walter. One thing I see glancing over the code is that it doesn't take advantage of the readLine API that accepts an input buffer. Using it would improve performance, if performance turns out to be a problem. Using a BufferedStream might also help. The module could probably use a fresh look to see what needs updating. On the other hand, Mango is also an option, as Kris mentioned.

In terms of knowing when the content ends, I think you've answered your own question: either wait for eof or bail at /html. But that's my naive guess.

-Ben
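
For reference, the question of when the content ends can be made more concrete. The following is only a rough sketch, in Python rather than D so that it does not depend on any std.socketstream or Mango details. It assumes an HTTP/1.1-style response read from a blocking, buffered binary stream; the function name read_http_body is made up for this illustration. It checks for chunked transfer encoding first, then for a Content-Length header, and only falls back to reading until the connection closes when neither is present.

```python
import io


def read_http_body(stream):
    """Read one HTTP response from a binary file-like object.

    Hypothetical helper for illustration; returns (headers, body).
    """
    stream.readline()  # status line, e.g. b"HTTP/1.1 200 OK\r\n"; not parsed here
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):  # a blank line ends the header block
            break
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip().lower()] = value.strip()

    if "chunked" in headers.get("transfer-encoding", "").lower():
        body = b""
        while True:
            # Each chunk starts with its size in hex, optionally followed by ";extensions".
            size = int(stream.readline().split(b";")[0].strip(), 16)
            if size == 0:
                # A zero-size chunk marks the end; skip any trailers up to the blank line.
                while stream.readline() not in (b"\r\n", b"\n", b""):
                    pass
                break
            body += stream.read(size)
            stream.readline()  # consume the line break that follows the chunk data
        return headers, body

    if "content-length" in headers:
        # The server announced the body size, so read exactly that many bytes.
        return headers, stream.read(int(headers["content-length"]))

    # Neither header is present: the body ends when the connection closes.
    return headers, stream.read()


if __name__ == "__main__":
    sample = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    print(read_http_body(io.BytesIO(sample)))
```

With a real connection, the stream could come from something like socket.makefile("rb"). For the fallback case, sending "Connection: close" in the request asks the server to close the connection once the response is sent, which should make waiting for eof much less painful.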


January 19, 2005
Quite an interesting project. Thanks for the info.
Running some tests now ...

