January 18, 2005
SocketStream
Tried to convert one of my scriptlets to D:

I was trying to use SocketStream to capture HTML pages, because I am certain
that D is much faster at parsing the contents than JavaScript.

Unfortunately, I have not found any way to tell when the HTML page has been
fully received.

"eof()" returns TRUE only when the connection to the server is terminated, which
usually takes many times longer than receiving the contents.
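
One possible way to shorten that wait, assuming the delay comes from an HTTP/1.1
keep-alive connection: send a "Connection: close" header, so the server closes
the socket once the response is done and eof() fires right after the last byte.
A minimal sketch in current D; buildRequest is a made-up helper name, not part
of std.socketstream, and servers are not guaranteed to honor the header.

```d
import std.format : format;

/// Builds a GET request that asks the server to close the connection
/// after sending the response (hypothetical helper, for illustration).
string buildRequest(string host, string path)
{
    // HTTP/1.1 requires a Host header; "Connection: close" opts out of
    // keep-alive. The empty line (final "\r\n\r\n") ends the header block.
    return format("GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n",
                  path, host);
}
```

The returned string would be written to the socket as-is before reading the
response.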

"available()" always returns 0 bytes, which is also no help.

Some HTTP responses include a Content-Length header giving the content size, but
this is not always the case. I could look for [/HTML] tags, but some documents
contain none and others have multiple [/HTML] tags.

Is there any solution for the program to know when the document has been fully
loaded other than waiting for eof()?

Thanks.
January 18, 2005
Re: SocketStream
Yes, there is a solution: you might consider using Mango (over at dsource.org)
instead, since it has a fully operational HTTP server and Servlet wrapper. You
can grab whatever headers you need at either level (mango.http.server, or
mango.http.servlet). Take a look at some of the examples to get started.

If you're building a client rather than a server, then you might consider
mango.http.client instead: it also gives you access to all the headers.

Oh, and the server is rather fast: once operational, it doesn't allocate any
memory at all -- so the GC is never active (except for allocations made within
your own code).

There was a test done on Gentoo Linux, with a 1.4GHz Pentium-M running both the
server and a single client feeding it with requests: I recall it was completing
~3500 requests per second, and half of the CPU was eaten by the client portion.


In article <csjh1m$1spp$1@digitaldaemon.com>, Bob says...
>Is there any solution for the program to know when the document has been fully
>loaded other than waiting for eof()?
January 19, 2005
Re: SocketStream
In article <csjh1m$1spp$1@digitaldaemon.com>, Bob says...
>Is there any solution for the program to know when the document has been fully
>loaded other than waiting for eof()?

I don't know the answer to your question, but if you have ideas to improve
std.socketstream, don't hesitate to try them out and post and/or email them to
Walter. One thing I see glancing over the code is that it doesn't take advantage
of the readLine overload that accepts an input buffer. Using it would improve
performance if that turns out to be a problem. Using a BufferedStream might
also help. The module could probably use a fresh look to see what needs
updating. On the other hand, Mango is also an option, as Kris mentioned.

In terms of knowing when the content ends, I think you've answered your own
question: either wait for eof or bail at /html. But that's my naive guess.
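
For reference, HTTP/1.1 itself defines where a response body ends: a
Content-Length header gives the byte count, "Transfer-Encoding: chunked" means
the body ends at a zero-length chunk, and with neither present the body runs
until the server closes the connection. A rough sketch in current D that picks
the rule from an already-received header block; Framing, BodyInfo and bodyInfo
are names invented here, and bodiless responses (HEAD, 204, 304) are ignored.

```d
import std.algorithm.searching : canFind, findSplit;
import std.conv : to;
import std.string : splitLines, strip;
import std.uni : toLower;

/// How the end of the response body is signalled.
enum Framing { contentLength, chunked, untilClose }

struct BodyInfo
{
    Framing framing;
    size_t length; // only meaningful for Framing.contentLength
}

/// Inspects the status line plus header lines (everything before the
/// empty line) and reports how the body is delimited.
BodyInfo bodyInfo(string headerBlock)
{
    auto info = BodyInfo(Framing.untilClose, 0);
    foreach (line; headerBlock.splitLines())
    {
        auto parts = line.findSplit(":");
        if (parts[1].length == 0)
            continue; // no colon: status line or blank line

        auto name = parts[0].strip.toLower;   // header names are case-insensitive
        auto value = parts[2].strip;

        // Chunked encoding takes precedence over any Content-Length.
        if (name == "transfer-encoding" && value.toLower.canFind("chunked"))
            return BodyInfo(Framing.chunked, 0);

        // to!size_t throws ConvException on a malformed number.
        if (name == "content-length")
            info = BodyInfo(Framing.contentLength, value.to!size_t);
    }
    return info;
}
```

With Framing.contentLength the reader can stop after exactly `length` body
bytes; only Framing.untilClose really needs eof().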

-Ben
January 19, 2005
Re: SocketStream
Quite an interesting project. Thanks for the info.
Doing some tests now ...


In article <csjmhq$24c8$1@digitaldaemon.com>, Kris says...
>
>Yes, there is a solution: you might consider using Mango (over at dsource.org)
>instead, since it has a fully operational HTTP server and Servlet wrapper. You
>can grab whatever headers you need at either level (mango.http.server, or
>mango.http.servlet). Take a look at some of the examples to get started.