SocketStream

Jan 18, 2005

Bob

Jan 18, 2005

Kris

Jan 19, 2005

Bob

Jan 19, 2005

Ben Hinkle

Tried to convert one of my scriptlets to D:

I was trying to use SocketStream to capture HTML pages, because I am certain that D is much faster parsing the contents than Javascript.

Unfortunately I have not found any means to get the proper timing when the HTML page has been received.

"eof()" returns only TRUE when the connection to the server is terminated, which usually takes many times longer than receiving the contents.

"available()" always returns 0 bytes, which is also no help.

Some HTTP headers do mention the content size, but this is not always the case. I could look for [/HTML] tags but some documents contain none and others have multiple [/HTML] tags.

Is there any solution for the program to know when the document has been fully loaded other than waiting for eof()?

Thanks.
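Editor's sketch: the long wait for eof() described above is typical of HTTP/1.1 keep-alive, where the server holds the connection open after sending the page. A client can ask the server to close the connection as soon as the response is sent by adding a `Connection: close` header. The example below is Python rather than D, purely for illustration; `build_request` and `fetch` are hypothetical helper names, not part of std.socketstream or Mango, and the same request text could be written to a SocketStream.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request that asks the server to close
    the connection once the response has been sent."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # opt out of keep-alive so EOF marks the end of the body
        "",                   # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> bytes:
    """Send the request and read until the server closes the connection.

    Assumption: a plain HTTP server on the given port; no TLS, no redirects.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the peer closed the connection (EOF)
                break
            chunks.append(data)
    return b"".join(chunks)
```

With this header most servers close the socket right after the last byte of the body, so waiting for end-of-file should no longer take much longer than the transfer itself. The returned bytes still include the status line and headers.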

Yes, there is a solution: you might consider using Mango (over at dsource.org) instead, since it has a fully operational HTTP server and Servlet wrapper. You can grab whatever headers you need at either level (mango.http.server, or mango.http.servlet). Take a look at some of the examples to get started.

If you're building a client rather than a server, then you might consider mango.http.client instead -- it provides you with access to all the headers also.

Oh, and the server is rather fast: once operational, it doesn't allocate any memory at all -- so the GC is never active (except for allocations made within your own code).

There was a test done on Gentoo Linux, with a 1.4 GHz Pentium-M running both the server and a single client feeding it with requests: I recall it was completing ~3500 requests per second, and half of the CPU was eaten by the client portion.

In article <csjh1m$1spp$1@digitaldaemon.com>, Bob says...
>
>Tried to convert one of my scriptlets to D:
>
>I was trying to use SocketStream to capture HTML pages, because I am certain that D is much faster parsing the contents than Javascript.
>
>Unfortunately I have not found any means to get the proper timing when the HTML page has been received.
>
>"eof()" returns only TRUE when the connection to the server is terminated, which usually takes many times longer than receiving the contents.
>
>"available()" always returns 0 bytes, which is also no help.
>
>Some HTTP headers do mention the content size, but this is not always the case. I could look for [/HTML] tags but some documents contain none and others have multiple [/HTML] tags.
>
>Is there any solution for the program to know when the document has been fully loaded other than waiting for eof()?
>
>Thanks.

In article <csjh1m$1spp$1@digitaldaemon.com>, Bob says...
>
>Tried to convert one of my scriptlets to D:
>
>I was trying to use SocketStream to capture HTML pages, because I am certain that D is much faster parsing the contents than Javascript.
>
>Unfortunately I have not found any means to get the proper timing when the HTML page has been received.
>
>"eof()" returns only TRUE when the connection to the server is terminated, which usually takes many times longer than receiving the contents.
>
>"available()" always returns 0 bytes, which is also no help.
>
>Some HTTP headers do mention the content size, but this is not always the case. I could look for [/HTML] tags but some documents contain none and others have multiple [/HTML] tags.
>
>Is there any solution for the program to know when the document has been fully loaded other than waiting for eof()?
>
>Thanks.

I don't know the answer to your question, but if you have ideas to improve std.socketstream, don't hesitate to try them out and post them and/or email them to Walter.

One thing I see glancing over the code is that it doesn't take advantage of the readLine overload that accepts an input buffer. Using it would improve performance if that turns out to be a problem. Using a BufferedStream might also help. The module could probably use a fresh look to see what needs updating. On the other hand, Mango is also an option, as Kris mentioned.

In terms of knowing when the content ends, I think you've answered your own question: either wait for eof or bail at /html. But that's my naive guess.

-Ben
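Editor's sketch: besides waiting for eof or scanning for a closing tag, HTTP itself gives two ways to tell where a response body ends, namely a `Content-Length` header or `Transfer-Encoding: chunked`, whose body ends with a zero-size chunk. The check below is written in Python for illustration only; `response_complete` and `chunked_body_complete` are hypothetical names, not part of std.socketstream or Mango. It assumes an ordinary response to a GET request and ignores special cases such as HEAD requests and 1xx, 204 and 304 responses, which carry no body.

```python
from typing import Optional


def chunked_body_complete(body: bytes) -> bool:
    """Walk the chunks and report whether the terminating zero-size chunk
    (plus any trailer fields and the final blank line) has arrived."""
    pos = 0
    while True:
        eol = body.find(b"\r\n", pos)
        if eol < 0:
            return False                      # chunk-size line not complete yet
        size = int(body[pos:eol].split(b";")[0], 16)  # ignore chunk extensions
        if size == 0:
            # last chunk: optional trailer fields, then an empty line
            return body.find(b"\r\n\r\n", eol) >= 0
        pos = eol + 2 + size + 2              # skip chunk data and its CRLF
        if pos > len(body):
            return False                      # chunk data still arriving


def response_complete(buf: bytes) -> Optional[bool]:
    """True: full response received. False: keep reading.
    None: the response has no framing, so only EOF marks its end."""
    head, sep, body = buf.partition(b"\r\n\r\n")
    if not sep:
        return False                          # header block not finished yet
    headers = {}
    for line in head.split(b"\r\n")[1:]:      # skip the status line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    if b"chunked" in headers.get(b"transfer-encoding", b"").lower():
        return chunked_body_complete(body)
    if b"content-length" in headers:
        return len(body) >= int(headers[b"content-length"])
    return None
```

A read loop could call `response_complete` on the accumulated bytes after each read and stop as soon as it returns True. A result of None corresponds to the case mentioned in the thread where no content size is given, and there waiting for end-of-file remains the only reliable signal.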

January 19, 2005

Re: SocketStream

Posted by Bob
in reply to Kris


Quite interesting project. Thanks for your info.
Doing some test now ...


In article <csjmhq$24c8$1@digitaldaemon.com>, Kris says...
>
>Yes, there is a solution: you might consider using Mango (over at dsource.org) instead, since it has a fully operational HTTP server and Servlet wrapper. You can grab whatever headers you need at either level (mango.http.server, or mango.http.servlet). Take a look at some of the examples to get started.
>
>If you're building a client rather than a server, then you might consider mango.http.client instead -- it provides you with access to all the headers also.
>
>Oh, and the server is rather fast: once operational, it doesn't allocate any memory at all -- so the GC is never active (except for allocations made within your own code).
>
>There was a test done on Gentoo Linux, with a 1.4 GHz Pentium-M running both the server and a single client feeding it with requests: I recall it was completing ~3500 requests per second, and half of the CPU was eaten by the client portion.
>
>
>In article <csjh1m$1spp$1@digitaldaemon.com>, Bob says...
>>
>>Tried to convert one of my scriptlets to D:
>>
>>I was trying to use SocketStream to capture HTML pages, because I am certain that D is much faster parsing the contents than Javascript.
>>
>>Unfortunately I have not found any means to get the proper timing when the HTML page has been received.
>>
>>"eof()" returns only TRUE when the connection to the server is terminated, which usually takes many times longer than receiving the contents.
>>
>>"available()" always returns 0 bytes, which is also no help.
>>
>>Some HTTP headers do mention the content size, but this is not always the case. I could look for [/HTML] tags but some documents contain none and others have multiple [/HTML] tags.
>>
>>Is there any solution for the program to know when the document has been fully loaded other than waiting for eof()?
>>
>>Thanks.
