March 27, 2011
On 25/03/11 10.54, Johannes Pfau wrote:
> Jonas Drewsen wrote:
>> Hi,
>>
>>    So I've been working a bit on the etc.curl module. Currently most
>> of
>> the HTTP functionality is done and some very simple Ftp.
>>
>> I would very much like to know if this has a chance of getting in
>> phobos if I finish it with the current design. If not then it will be
>> for my own project only and doesn't need as much documentation or all
>> the features.
>>
>> https://github.com/jcd/phobos/tree/curl
>>
>> I do know that the error handling is currently not good enough... WIP.
>>
>> /Jonas
>>
>>
>> On 11/03/11 16.20, Jonas Drewsen wrote:
>>> Hi,
>>>
>>> So I've spent some time trying to wrap libcurl for D. There is a lot
>>> of things that you can do with libcurl which I did not know so I'm
>>> starting out small.
>>>
>>> For now I've created all the declarations for the latest public curl
>>> C api. I have put that in the etc.c.curl module.
>>>
>>> On top of that I've created a more D like api as seen below. This is
>>> located in the 'etc.curl' module. What you can see below currently
>>> works but before proceeding further down this road I would like to
>>> get your comments on it.
>>>
>>> //
>>> // Simple HTTP GET with sane defaults
>>> // provides the .content, .headers and .status
>>> //
>>> writeln( Http.get("http://www.google.com").content );
>>>
>>> //
>>> // GET with custom data receiver delegates
>>> //
>>> Http http = new Http("http://www.google.dk");
>>> http.setReceiveHeaderCallback( (string key, string value) {
>>> writeln(key ~ ":" ~ value);
>>> } );
>>> http.setReceiveCallback( (string data) { /* drop */ } );
>>> http.perform;
>>>
>>> //
>>> // POST with some timouts
>>> //
>>> http.setUrl("http://www.testing.com/test.cgi");
>>> http.setReceiveCallback( (string data) { writeln(data); } );
>>> http.setConnectTimeout(1000);
>>> http.setDataTimeout(1000);
>>> http.setDnsTimeout(1000);
>>> http.setPostData("The quick....");
>>> http.perform;
>>>
>>> //
>>> // PUT with data sender delegate
>>> //
>>> string msg = "Hello world";
>>> size_t len = msg.length; /* using chuncked transfer if omitted */
>>>
>>> http.setSendCallback( delegate size_t(char[] data) {
>>> if (msg.empty) return 0;
>>> auto l = msg.length;
>>> data[0..l] = msg[0..$];
>>> msg.length = 0;
>>> return l;
>>> },
>>> HttpMethod.put, len );
>>> http.perform;
>>>
>>> //
>>> // HTTPS
>>> //
>>> writeln(Http.get("https://mail.google.com").content);
>>>
>>> //
>>> // FTP
>>> //
>>> writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
>>> "./downloaded-file"));
>>>
>>>
>>> // ... authenication, cookies, interface select, progress callback
>>> // etc. is also implemented this way.
>>>
>>>
>>> /Jonas
>>
>
> I looked at the code again and I got 2 more suggestions:
>
> 1.) Would it be useful to have a headersReceived callback which would be
> called when all headers have been received (when the data callback is
> called the first time)? I think of a situation where you don't know
> what data the server will return: a few KB html which you can easily
> keep in memory or a huge file which you'd have to save to disk. You
> can only know that if the headers have been received. It would also be
> possible to do that by just overwriting the headerCallback and looking
> out for the ContentLength/ContentType header, but I think it should
> also work with the default headerCallback.

I'm a little confused as to what a headersReceived(string[string] headers) would give you compared to the onReceiveHeader(const(char)[], const(char)[])) callback that exists today in the example.

The headersReceived callback would probably lookup the content-length header and set a flag about whether to save content to file or memory.

The existing onReceiveHeader could do the same by setting the flag when it receives the content-length field.

Or maybe I'm misunderstanding you?


> 2.)
> As far as I can see you store the http headers in a case sensitive way.
> (res.headers[key] ~= value;). This means "Content-Length" vs
> "content-length" would produce two entries in the array and it makes
> it difficult to get the header from the associative array. It is maybe
> useful to keep the original casing, but probably not in the array key.
>
> BTW: According to RFC2616 the only headers which are allowed
> to be included multiple times in the response must consist of comma
> separated lists. So in theory we could keep a simple string[string]
> list and if we see a header twice we can just merge it with a ','.
>
> http://tools.ietf.org/html/rfc2616#section-4.2
> Relevant part from the RFC:
> ----------------------
>     Multiple message-header fields with the same field-name MAY be
>     present in a message if and only if the entire field-value for that
>     header field is defined as a comma-separated list [i.e., #(values)].
>     It MUST be possible to combine the multiple header fields into one
>     "field-name: field-value" pair, without changing the semantics of the
>     message, by appending each subsequent field-value to the first, each
>     separated by a comma. The order in which header fields with the same
>     field-name are received is therefore significant to the
>     interpretation of the combined field value, and thus a proxy MUST NOT
>     change the order of these field values when a message is forwarded.
> ----------------------

I will surely implement this combined value functionality. I also noted that header field names are case insensitive. This means that they could just be stored internally as lower cased and the documentation could specify lowercase for looking up by field name.


> I'm also done with the first pass through the http parsers.
> Documentation is here:
> http://dl.dropbox.com/u/24218791/std.protocol.http/http/http.html
>
> Code here:
> https://gist.github.com/886612
> The http.d file is generated from the http.d.rl file.
>

This is a nice protocol parser. I would very much like it to be used with the curl API but without it being a dependency. This is already possible now using the onReceiveHeader callback and this would decouple the two. At least until std.protocol.http is in phobos as well - at that point convenience methods could be added :)

/Jonas






March 27, 2011
On 25/03/11 12.07, Johannes Pfau wrote:
> Johannes Pfau wrote:
>> Jonas Drewsen wrote:
>>> Hi,
>>>
>>>    So I've been working a bit on the etc.curl module. Currently most
>>> of
>>> the HTTP functionality is done and some very simple Ftp.
>>>
>>> I would very much like to know if this has a chance of getting in
>>> phobos if I finish it with the current design. If not then it will be
>>> for my own project only and doesn't need as much documentation or all
>>> the features.
>>>
>>> https://github.com/jcd/phobos/tree/curl
>>>
>>> I do know that the error handling is currently not good enough... WIP.
>>>
>>> /Jonas
>>>
>>>
>>> On 11/03/11 16.20, Jonas Drewsen wrote:
>>>> Hi,
>>>>
>>>> So I've spent some time trying to wrap libcurl for D. There is a lot
>>>> of things that you can do with libcurl which I did not know so I'm
>>>> starting out small.
>>>>
>>>> For now I've created all the declarations for the latest public curl
>>>> C api. I have put that in the etc.c.curl module.
>>>>
>>>> On top of that I've created a more D like api as seen below. This is
>>>> located in the 'etc.curl' module. What you can see below currently
>>>> works but before proceeding further down this road I would like to
>>>> get your comments on it.
>>>>
>>>> //
>>>> // Simple HTTP GET with sane defaults
>>>> // provides the .content, .headers and .status
>>>> //
>>>> writeln( Http.get("http://www.google.com").content );
>>>>
>>>> //
>>>> // GET with custom data receiver delegates
>>>> //
>>>> Http http = new Http("http://www.google.dk");
>>>> http.setReceiveHeaderCallback( (string key, string value) {
>>>> writeln(key ~ ":" ~ value);
>>>> } );
>>>> http.setReceiveCallback( (string data) { /* drop */ } );
>>>> http.perform;
>>>>
>>>> //
>>>> // POST with some timouts
>>>> //
>>>> http.setUrl("http://www.testing.com/test.cgi");
>>>> http.setReceiveCallback( (string data) { writeln(data); } );
>>>> http.setConnectTimeout(1000);
>>>> http.setDataTimeout(1000);
>>>> http.setDnsTimeout(1000);
>>>> http.setPostData("The quick....");
>>>> http.perform;
>>>>
>>>> //
>>>> // PUT with data sender delegate
>>>> //
>>>> string msg = "Hello world";
>>>> size_t len = msg.length; /* using chuncked transfer if omitted */
>>>>
>>>> http.setSendCallback( delegate size_t(char[] data) {
>>>> if (msg.empty) return 0;
>>>> auto l = msg.length;
>>>> data[0..l] = msg[0..$];
>>>> msg.length = 0;
>>>> return l;
>>>> },
>>>> HttpMethod.put, len );
>>>> http.perform;
>>>>
>>>> //
>>>> // HTTPS
>>>> //
>>>> writeln(Http.get("https://mail.google.com").content);
>>>>
>>>> //
>>>> // FTP
>>>> //
>>>> writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
>>>> "./downloaded-file"));
>>>>
>>>>
>>>> // ... authenication, cookies, interface select, progress callback
>>>> // etc. is also implemented this way.
>>>>
>>>>
>>>> /Jonas
>>>
>>
>> I looked at the code again and I got 2 more suggestions:
>>
>> 1.) Would it be useful to have a headersReceived callback which would
>> be called when all headers have been received (when the data callback
>> is called the first time)? I think of a situation where you don't know
>> what data the server will return: a few KB html which you can easily
>> keep in memory or a huge file which you'd have to save to disk. You
>> can only know that if the headers have been received. It would also be
>> possible to do that by just overwriting the headerCallback and looking
>> out for the ContentLength/ContentType header, but I think it should
>> also work with the default headerCallback.
>>
>> 2.)
>> As far as I can see you store the http headers in a case sensitive way.
>> (res.headers[key] ~= value;). This means "Content-Length" vs
>> "content-length" would produce two entries in the array and it makes
>> it difficult to get the header from the associative array. It is maybe
>> useful to keep the original casing, but probably not in the array key.
>>
>> BTW: According to RFC2616 the only headers which are allowed
>> to be included multiple times in the response must consist of comma
>> separated lists. So in theory we could keep a simple string[string]
>> list and if we see a header twice we can just merge it with a ','.
>>
>> http://tools.ietf.org/html/rfc2616#section-4.2
>> Relevant part from the RFC:
>> ----------------------
>>    Multiple message-header fields with the same field-name MAY be
>>    present in a message if and only if the entire field-value for that
>>    header field is defined as a comma-separated list [i.e., #(values)].
>>    It MUST be possible to combine the multiple header fields into one
>>    "field-name: field-value" pair, without changing the semantics of
>> the message, by appending each subsequent field-value to the first,
>> each separated by a comma. The order in which header fields with the
>> same field-name are received is therefore significant to the
>>    interpretation of the combined field value, and thus a proxy MUST
>> NOT change the order of these field values when a message is
>> forwarded.
>> ----------------------
>>
>> I'm also done with the first pass through the http parsers.
>> Documentation is here:
>> http://dl.dropbox.com/u/24218791/std.protocol.http/http/http.html
>>
>> Code here:
>> https://gist.github.com/886612
>> The http.d file is generated from the http.d.rl file.
>>
>
> I added some code to show how I think this could be used in the HTTP
> client:
> https://gist.github.com/886612#file_gistfile1.d
>
> Like in the .net webclient we'd need two of these collections: one for
> received headers and one for headers to be sent.

Thanks!

It would be very nice to have in the std.protocol.http in phobos so that the curl stuff could use it. If that happened then std.protocol.{smtp,imap,....} could probably also be built on your framework and be added as support in the curl wrappers.

/Jonas



March 29, 2011
Jonas Drewsen wrote:
>
>This is a nice protocol parser. I would very much like it to be used with the curl API but without it being a dependency. This is already possible now using the onReceiveHeader callback and this would decouple the two. At least until std.protocol.http is in phobos as well - at that point convenience methods could be added :)
>
>/Jonas

Thanks, I think I'll propose the parser for the new experimental namespace when it's available.

About the headersReceived callback: You're totally right, it can be done with the onReceiveHeader callback right now. But I think in the common case the user wants the headers in an key/value array. So if the user doesn't want to use the onReceiveHeader api, a headersReceived callback would probably be convenient. But, as said it's not necessary.

Reading the curl documentation showed another small trap: CURLOPT_HEADERFUNCTION
------------------------------------------------------------
It's important to note that the callback will be invoked for the headers of all responses received after initiating a request and not just the final response. This includes all responses which occur during authentication negotiation. If you need to operate on only the headers from the final response, you will need to collect headers in the callback yourself and use HTTP status lines, for example, to delimit response boundaries.
------------------------------------------------------------

I think if we store the headers into an array, we should only store the headers of the final response. Another question is should all headers or only final headers trigger the onReceiveHeader callback? Passing only the final headers would require extra work, passing all headers should at least be documented.

Thinking of this more, this also means the _receiveHeaderCallback is
not 100% correct, as it expects all lines after the first line to be
header or empty lines, but it's possible that we get multiple statuslines.
It still works, the regex doesn't match anything and the code
ignores that line. But this way, the stored statusline will always be
the first statusline, which isn't optimal. We'd also need to detect if a
line is a statusline to reset the headers array if it's used. Seems
like we have to think about this some more.

-- 
Johannes Pfau


March 30, 2011
On 29/03/11 17.31, Johannes Pfau wrote:
> Jonas Drewsen wrote:
>>
>> This is a nice protocol parser. I would very much like it to be used
>> with the curl API but without it being a dependency. This is already
>> possible now using the onReceiveHeader callback and this would
>> decouple the two. At least until std.protocol.http is in phobos as
>> well - at that point convenience methods could be added :)
>>
>> /Jonas
>
> Thanks, I think I'll propose the parser for the new experimental
> namespace when it's available.

I'm looking forward to that.

> About the headersReceived callback: You're totally right, it can be
> done with the onReceiveHeader callback right now. But I think in the
> common case the user wants the headers in an key/value array. So if the
> user doesn't want to use the onReceiveHeader api, a headersReceived
> callback would probably be convenient. But, as said it's not necessary.

I'll put it on my todo and reconsider when I get to it :)

> Reading the curl documentation showed another small trap:
> CURLOPT_HEADERFUNCTION
> ------------------------------------------------------------
> It's important to note that the callback will be invoked for the
> headers of all responses received after initiating a request and not
> just the final response. This includes all responses which occur during
> authentication negotiation. If you need to operate on only the headers
> from the final response, you will need to collect headers in the
> callback yourself and use HTTP status lines, for example, to delimit
> response boundaries.
> ------------------------------------------------------------
>
> I think if we store the headers into an array, we should only store the
> headers of the final response. Another question is should all headers
> or only final headers trigger the onReceiveHeader callback? Passing
> only the final headers would require extra work, passing all headers
> should at least be documented.

Yeah... I've discovered this myself as well.

The current implementation does as libcurl does it an passes all headers not just for the final subrequest.

> Thinking of this more, this also means the _receiveHeaderCallback is
> not 100% correct, as it expects all lines after the first line to be
> header or empty lines, but it's possible that we get multiple statuslines.
> It still works, the regex doesn't match anything and the code
> ignores that line. But this way, the stored statusline will always be
> the first statusline, which isn't optimal. We'd also need to detect if a
> line is a statusline to reset the headers array if it's used. Seems
> like we have to think about this some more.

My local version already takes care of this. It was the wrong place for parsing status lines and headers anyway. It is now moved to the Http class where it should have been all the time.

I have implemented almost all of the features/changes suggested now. The last one I'm currently fighting is the support for "foreach" and async .byLine/.byChunk. I may have to make some changes in the current design to support this with the calling API that I would like to expose.

I wonder who could take the step and open a std.experimental package for submissions?

Thank you for the feedback!
1 2 3 4 5
Next ›   Last »