March 14, 2011 Re: Curl support RFC
Posted in reply to Jonathan M Davis

On 3/14/11 4:36 AM, Jonathan M Davis wrote:
> That's debatable. Some would argue one way, some another. Personally, I'd argue
> ubyte[]. I don't like void[] one bit. Others would agree with me, and yet others
> would disagree. I don't think that there's really a general agreement on whether
> void[] or ubyte[] is better when it comes to reading binary data like that.
void[]: "There is a typed array underneath, but I forgot its exact type".
Evidence: all array types convert to void[] automatically.
ubyte[]: "We're dealing with an array of octets here."
Evidence: ubyte[] has no special properties over T[].
All raw data reads should yield ubyte[], not void[]. The user may or may not know that there is really a different type underneath, but the compiler and runtime have no such knowledge, so the burden of the assumption rests on the user.
Raw data writes that take arrays could be allowed to accept void[] if implicit conversion from T[] is desirable.
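This distinction can be shown in a few lines of D (writeRaw and readRaw are hypothetical names for illustration, not actual Phobos functions):

```d
// Any T[] converts to (const) void[] implicitly -- the element type is
// "forgotten" at the call site, so writes can accept const(void)[].
void writeRaw(const(void)[] buf) { }

// A raw read should return ubyte[]: the runtime only knows it produced
// octets; reinterpreting them is the caller's assumption to make.
ubyte[] readRaw(size_t n) { return new ubyte[](n); }

void main()
{
    int[] ints = [1, 2, 3];
    writeRaw(ints);                 // implicit int[] -> const(void)[]

    auto raw = readRaw(3 * int.sizeof);
    auto asInts = cast(int[]) raw;  // explicit: the user owns the assumption
    assert(asInts.length == 3);
}
```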
Andrei
March 14, 2011 Re: Curl support RFC
Posted in reply to Jonas Drewsen

On 3/14/11 10:06 AM, Jonas Drewsen wrote:
> const(ubyte)[] for input
> void[] for output
>
> that sounds reasonable. I guess that if everybody can agree on this, then
> all of Phobos (e.g. std.file) should use the same types?
Move the const from the first to the second line :o). I see no reason why user code can't mess with the buffer once read.
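With the const moved as suggested, the hypothetical std.file-style signatures would read roughly like this (stub bodies for illustration only):

```d
// Reads yield a mutable buffer the caller may freely mess with;
// writes accept any array via the implicit T[] -> const(void)[] conversion.
ubyte[] read(string name)
{
    return cast(ubyte[]) name.dup; // stub body for illustration only
}

void write(string name, const(void)[] buf)
{
    // stub: a real implementation would write buf to the file `name`
}

void main()
{
    auto data = read("example");
    data[] = 0;                  // legal: no const on the read result
    write("example", data);
    int[] ints = [1, 2, 3];
    write("example", ints);      // int[] -> const(void)[] implicitly
}
```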
Yes, I agree std.file et al should switch to ubyte[].
Andrei
March 14, 2011 Re: Curl support RFC
Posted in reply to Jonas Drewsen

On 3/14/11 10:38 AM, Jonas Drewsen wrote:
> On 13/03/11 23.44, Andrei Alexandrescu wrote:
>> You'll probably need to justify the existence of a class hierarchy and
>> what overridable methods there are. In particular, since you seem to
>> offer hooks via delegates, probably classes wouldn't be needed at all.
>> (FWIW I would've done the same; I wouldn't want to inherit just to
>> intercept the headers etc.)
>
> Missed this one in my last reply.
>
> Ftp/Http etc. all inherit from a Protocol class. The Protocol class
> defines common settings (@properties) for all protocols, e.g.
> dnsTimeout, connectTimeout, networkInterface, url, port selection.
>
> I could make these into a mixin and thereby get rid of the inheritance,
> of course.

Use Occam's razor and the path of least resistance to get the most natural interface.

> I think that keeping Protocol as an abstract base class would
> benefit e.g. the integration with streams. In that case we could simply
> create a CurlTransport that contains a reference to a Protocol-derived
> object (Http, Ftp, ...).
>
> Or would it be better to have specific HttpTransport, FtpTransport?

Count the commonalities and the differences and then make an executive decision.

Andrei
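The mixin alternative mentioned in this exchange might look like the sketch below. The property names (dnsTimeout, url, ...) come from the thread; the mixin itself is a hypothetical alternative, not the actual etc.curl code:

```d
// Common protocol settings as a mixin template instead of a Protocol
// base class -- each protocol type gets the properties without inheritance.
mixin template ProtocolSettings()
{
    private uint dnsTimeout_, connectTimeout_;
    private string url_;

    @property void dnsTimeout(uint ms) { dnsTimeout_ = ms; }
    @property uint dnsTimeout() const { return dnsTimeout_; }
    @property void url(string u) { url_ = u; }
    @property string url() const { return url_; }
}

struct Http { mixin ProtocolSettings; }
struct Ftp  { mixin ProtocolSettings; }

void main()
{
    Http h;
    h.dnsTimeout = 1000;
    h.url = "http://www.digitalmars.com";
    assert(h.dnsTimeout == 1000);
    assert(h.url == "http://www.digitalmars.com");
}
```

The trade-off Andrei alludes to: the mixin avoids an inheritance relationship that exists only for code reuse, while the abstract base class buys runtime polymorphism (e.g. a CurlTransport holding any protocol).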
March 14, 2011 Re: Curl support RFC
Posted in reply to Jonas Drewsen

On 3/14/11 4:16 AM, Jonas Drewsen wrote:
> On 13/03/11 23.44, Andrei Alexandrescu wrote:
>> Sweet. As has been discussed, often the content is not text, so you may
>> want to have content return ubyte[] and add a new property such as
>> "textContent" or "text".
>
> I've already changed it to void[] as done in the std.file module. Is
> ubyte[] better suited?

Yah, as per the ensuing discussion.

>> As discussed, properties may be better here than setXxx and getXxx. The
>> setReceiveCallback hook should take a ubyte[]. The
>> setReceiveHeaderCallback should take a const(char)[]. That way you won't
>> need to copy all headers, safely leaving that option to the client.
>
> I've already replaced the set/get methods with properties and renamed
> them. Hadn't thought of using const(char)[]... thanks for the hint.

A good general guideline: make sure the user can easily and safely use a loop that reads a large HTTP stream (with hooks and all) without allocating one item each pass through the loop.

>> Regarding a range interface, it would be great if you allowed e.g.
>>
>> foreach (line; Http.get("https://mail.google.com").byLine()) {
>>     ...
>> }
>>
>> The data transfer should happen concurrently with the foreach code. The
>> type of line is char[] or const(char)[]. Similarly, there would be a
>> byChunk interface that transfers in ubyte[] chunks.
>>
>> Also we need a head() method for the corresponding command.
>>
>> Andrei
>
> That would be neat. What do you mean about concurrent data transfers
> with foreach?

Assume the body of the loop does some time-consuming processing, like writing to another HTTP stream. Then your network reads should not wait for that processing. While the user code does something, you should already have the next transfer in flight.

Example: a utility that uses GET from one HTTP source and POSTs the data to an HTTP target should be an efficient few-liner. (FTP versions and mixed ones too.)

Andrei
March 14, 2011 Re: Curl support RFC
Posted in reply to Johannes Pfau

On 14/03/11 16.40, Johannes Pfau wrote:
> Jonas Drewsen wrote:
>>> Do you plan to add some kind of support for header parsing? I think
>>> something like what the .NET WebClient uses
>>> (http://msdn.microsoft.com/en-us/library/system.net.webclient(v=VS.100).aspx)
>>> would be great. Especially the HeaderCollection supporting headers as
>>> strings and as data types (for both parsing and formatting), but
>>> without a class hierarchy for the headers, using templates instead.
>>
>> It would be nice to be able to get/set headers by string and enums
>> (http://msdn.microsoft.com/en-us/library/system.net.httprequestheader.aspx).
>> But I cannot see that .NET is using data types or templates for it.
>> Could you give me a pointer please?
>
> You're right, I didn't look closely enough at the .NET documentation. I
> thought HttpRequestHeader was a class. What I meant for D was something
> like this:
>
> struct ETagHeader
> {
>     // Data members
>     bool Weak = false;
>     string Value;
>
>     // All header structs provide these
>     static string Key = "ETag";
>
>     static ETagHeader parse(string value)
>     {
>         // parser logic here
>     }
>
>     void format(T)(T writer)
>         if (isOutputRange!(T, string))
>     {
>         if (Weak)
>             writer.put("W/");
>         assert(Value != "");
>         writer.put(quote(Value));
>     }
> }
>
> Then we can offer methods like these:
>
> void setHeader(T)(T header)
>     if (isHeader!T)
> {
>     headers[T.Key] = formatHeader(header);
> }
>
> T getHeader(T)()
>     if (isHeader!T)
> {
>     if (T.Key !in headers)
>         throw new Exception("header not present");
>     return T.parse(headers[T.Key]);
> }
>
> So user code wouldn't have to deal with header parsing/formatting:
>
> auto etag = client.getHeader!ETagHeader();
> assert(etag.Weak);

Seems like a very nice addition. I will have a look at your GitHub and probably wait until you have made it ready for consumption before adding it :)

>>> I've written D parsers/formatters for almost all headers in
>>> RFC 2616 (1 or 2 might be missing) and for a few additional commonly
>>> used headers (Content-Disposition, cookie headers). The parsers are
>>> written with Ragel and are to be used with curl (continuations must
>>> be removed and the parsers always take 1 line of input, just as you
>>> get it from curl). Right now only the client side is implemented (no
>>> parsers for headers which can only be sent from client-->server).
>>> However, I need to add some more documentation to the parsers, need
>>> to do some refactoring, and I've got absolutely no time for that in
>>> the next 2 weeks ('Abitur' final exams). But if you could wait 2
>>> weeks, or if you wanted to do the refactoring yourself, I would be
>>> happy to contribute that code.
>>
>> That sounds very interesting. I would very much like to see the code
>> and see if it fits in.
>
> Ok, here it is, but it seriously needs to be refactored and documented:
> https://gist.github.com/869324
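The isHeader constraint in the sketch above is not spelled out in the thread; one hypothetical way to write it is as a template predicate that checks for the conventions every header struct provides (a string Key and a static parse):

```d
// A header type must expose a string Key and a static parse(string)
// returning the type itself.
template isHeader(T)
{
    enum isHeader = is(typeof(T.Key) : string)
                 && is(typeof(T.parse("")) == T);
}

// Minimal header struct satisfying the convention (illustrative only).
struct ETagHeader
{
    bool Weak;
    string Value;
    static string Key = "ETag";
    static ETagHeader parse(string v) { return ETagHeader(false, v); }
}

static assert(isHeader!ETagHeader);
static assert(!isHeader!int);

void main() {}
```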
March 14, 2011 Re: Curl support RFC
Posted in reply to Andrei Alexandrescu

On 14/03/11 18.46, Andrei Alexandrescu wrote:
> On 3/14/11 10:06 AM, Jonas Drewsen wrote:
>> const(ubyte)[] for input
>> void[] for output
>>
>> that sounds reasonable. I guess that if everybody can agree on this,
>> then all of Phobos (e.g. std.file) should use the same types?
>
> Move the const from the first to the second line :o). I see no reason
> why user code can't mess with the buffer once read.

You are right of course. Bummer.

> Yes, I agree std.file et al should switch to ubyte[].
>
> Andrei

Then let's hope someone makes a patch for it. Maybe I'll make it when I'm done with the curl stuff if no one beats me to it.

/Jonas
March 14, 2011 Re: Curl support RFC
Posted in reply to Andrei Alexandrescu

On 14/03/11 18.55, Andrei Alexandrescu wrote:
> On 3/14/11 4:16 AM, Jonas Drewsen wrote:
>> On 13/03/11 23.44, Andrei Alexandrescu wrote:
>>> Sweet. As has been discussed, often the content is not text so you may
>>> want to have content return ubyte[] and add a new property such as
>>> "textContent" or "text".
>>
>> I've already changed it to void[] as done in the std.file module. Is
>> ubyte[] better suited?
>
> Yah, as per the ensuing discussion.
>
>>> As discussed, properties may be better here than setXxx and getXxx. The
>>> setReceiveCallback hook should take a ubyte[]. The
>>> setReceiveHeaderCallback should take a const(char)[]. That way you won't
>>> need to copy all headers, safely leaving that option to the client.
>>
>> I've already replaced the set/get methods with properties and renamed
>> them. Hadn't thought of using const(char)[]... thanks for the hint.
>
> A good general guideline: make sure that the user could easily and
> safely use a loop that reads a large HTTP stream (with hooks and all)
> without allocating one item each pass through the loop.

Makes sense. I'll keep that in mind.

>>> Regarding a range interface, it would be great if you allowed e.g.
>>>
>>> foreach (line; Http.get("https://mail.google.com").byLine()) {
>>>     ...
>>> }
>>>
>>> The data transfer should happen concurrently with the foreach code. The
>>> type of line is char[] or const(char)[]. Similarly, there would be a
>>> byChunk interface that transfers in ubyte[] chunks.
>>>
>>> Also we need a head() method for the corresponding command.
>>>
>>> Andrei
>>
>> That would be neat. What do you mean about concurrent data transfers
>> with foreach?
>
> Assume the body of the loop does some time-consuming processing - like
> e.g. writing to another HTTP stream. Then your network reads should not
> wait for that processing. While the user code does something, you should
> already have the next transfer in flight.
>
> Example: a utility that uses GET from one http source and uses the data
> to POST it to an http target should be an efficient few-liner.
> (FTP versions and mixed ones too.)
>
> Andrei

I get it. Any existing implementation that does this that I can have a look at?

/Jonas
March 14, 2011 Re: Curl support RFC
Posted in reply to Jonas Drewsen

On 3/14/11 4:11 PM, Jonas Drewsen wrote:
> On 14/03/11 18.55, Andrei Alexandrescu wrote:
>> Assume the body of the loop does some time-consuming processing - like
>> e.g. writing to another HTTP stream. Then your network reads should not
>> wait for that processing. While the user code does something, you should
>> already have the next transfer in flight.
>>
>> Example: a utility that uses GET from one http source and uses the data
>> to POST it to an http target should be an efficient few-liner.
>> (FTP versions and mixed ones too.)
>>
>> Andrei
>
> I get it. Any existing implementation that does this I can have a look at?

Unfortunately not at the moment. I wanted to define such a thing for std.stdio, called byLineAsync and byChunkAsync, but never got to it. The basic idea is:

1. Define a new range type, e.g. AsyncHttpInputRange.
2. Inside that range, start a secondary thread that does the actual transfer and passes read buffers to the main thread by means of messages.
3. See std.concurrency and the free chapter at http://www.informit.com/articles/printerfriendly.aspx?p=1609144 for details.
4. Control congestion (too many buffers in flight) with setMaxMailboxSize.
5. Make sure you have a little protocol that stops the secondary thread when the range is destroyed.

Andrei
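A minimal sketch of steps 1-5, using std.concurrency: a worker thread streams chunks to the range via messages, and a bounded mailbox (step 4) throttles the producer. The worker here fakes the transfer; real code would read from curl. Names like AsyncChunks and transferWorker are made up for illustration:

```d
import std.concurrency;

// Step 1: an input range fed by a secondary thread (step 2).
struct AsyncChunks
{
    Tid worker;
    immutable(ubyte)[] chunk;
    bool done;

    this(Tid w) { worker = w; popFront(); }

    @property bool empty() { return done; }
    @property immutable(ubyte)[] front() { return chunk; }

    void popFront()
    {
        receive(
            (immutable(ubyte)[] c) { chunk = c; },
            (bool _) { done = true; });  // sentinel: transfer finished
    }
}

// Fake transfer: sends three one-byte chunks, then the end sentinel
// (a stand-in for the stop protocol of step 5).
void transferWorker(Tid owner)
{
    foreach (i; 0 .. 3)
    {
        ubyte[] buf = [cast(ubyte) i];
        owner.send(buf.idup);
    }
    owner.send(true);
}

void main()
{
    // Step 4: limit the number of in-flight buffers.
    setMaxMailboxSize(thisTid, 4, OnCrowding.block);

    auto range = AsyncChunks(spawn(&transferWorker, thisTid));

    size_t n;
    foreach (c; range)
        ++n; // user code runs while the worker keeps sending
    assert(n == 3);
}
```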
March 25, 2011 Re: Curl support RFC
Posted in reply to Jonas Drewsen

Jonas Drewsen wrote:
> Hi,
>
> So I've been working a bit on the etc.curl module. Currently most of
> the HTTP functionality is done and some very simple FTP.
>
> I would very much like to know if this has a chance of getting into
> Phobos if I finish it with the current design. If not, then it will be
> for my own project only and won't need as much documentation or all
> the features.
>
> https://github.com/jcd/phobos/tree/curl
>
> I do know that the error handling is currently not good enough... WIP.
>
> /Jonas
>
> On 11/03/11 16.20, Jonas Drewsen wrote:
>> Hi,
>>
>> So I've spent some time trying to wrap libcurl for D. There are a lot
>> of things that you can do with libcurl which I did not know, so I'm
>> starting out small.
>>
>> For now I've created all the declarations for the latest public curl
>> C API. I have put that in the etc.c.curl module.
>>
>> On top of that I've created a more D-like API as seen below. This is
>> located in the etc.curl module. What you can see below currently
>> works, but before proceeding further down this road I would like to
>> get your comments on it.
>>
>> //
>> // Simple HTTP GET with sane defaults
>> // provides the .content, .headers and .status
>> //
>> writeln( Http.get("http://www.google.com").content );
>>
>> //
>> // GET with custom data receiver delegates
>> //
>> Http http = new Http("http://www.google.dk");
>> http.setReceiveHeaderCallback( (string key, string value) {
>>     writeln(key ~ ":" ~ value);
>> } );
>> http.setReceiveCallback( (string data) { /* drop */ } );
>> http.perform;
>>
>> //
>> // POST with some timeouts
>> //
>> http.setUrl("http://www.testing.com/test.cgi");
>> http.setReceiveCallback( (string data) { writeln(data); } );
>> http.setConnectTimeout(1000);
>> http.setDataTimeout(1000);
>> http.setDnsTimeout(1000);
>> http.setPostData("The quick....");
>> http.perform;
>>
>> //
>> // PUT with data sender delegate
>> //
>> string msg = "Hello world";
>> size_t len = msg.length; /* using chunked transfer if omitted */
>>
>> http.setSendCallback( delegate size_t(char[] data) {
>>     if (msg.empty) return 0;
>>     auto l = msg.length;
>>     data[0..l] = msg[0..$];
>>     msg.length = 0;
>>     return l;
>> },
>> HttpMethod.put, len );
>> http.perform;
>>
>> //
>> // HTTPS
>> //
>> writeln(Http.get("https://mail.google.com").content);
>>
>> //
>> // FTP
>> //
>> writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
>>     "./downloaded-file"));
>>
>> // ... authentication, cookies, interface selection, progress callback,
>> // etc. are also implemented this way.
>>
>> /Jonas

I looked at the code again and I have 2 more suggestions:

1.) Would it be useful to have a headersReceived callback which would be called when all headers have been received (when the data callback is called the first time)? I am thinking of a situation where you don't know what data the server will return: a few KB of HTML, which you can easily keep in memory, or a huge file, which you'd have to save to disk. You can only know that once the headers have been received. It would also be possible to do this by just overriding the headerCallback and looking out for the Content-Length/Content-Type header, but I think it should also work with the default headerCallback.

2.) As far as I can see, you store the HTTP headers in a case-sensitive way (res.headers[key] ~= value;). This means "Content-Length" vs. "content-length" would produce two entries in the array, and it makes it difficult to get the header from the associative array. It is maybe useful to keep the original casing, but probably not in the array key.

BTW: According to RFC 2616, the only headers which are allowed to be included multiple times in the response must consist of comma-separated lists. So in theory we could keep a simple string[string] list, and if we see a header twice we can just merge it with a ','.

http://tools.ietf.org/html/rfc2616#section-4.2
Relevant part from the RFC:
----------------------
Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma. The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded.
----------------------

I'm also done with the first pass through the HTTP parsers. Documentation is here: http://dl.dropbox.com/u/24218791/std.protocol.http/http/http.html

Code here: https://gist.github.com/886612
The http.d file is generated from the http.d.rl file.

--
Johannes Pfau
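The storage scheme Johannes suggests fits in a few lines: lowercase the key, and merge repeated headers with a ',' per RFC 2616 section 4.2 (addHeader is a hypothetical helper name):

```d
import std.string : toLower;

// Case-insensitive header storage: keys are lowercased, and a repeated
// header is appended to the existing value with a comma separator.
void addHeader(ref string[string] headers, string key, string value)
{
    auto k = toLower(key);
    if (auto p = k in headers)
        *p ~= "," ~ value;
    else
        headers[k] = value;
}

void main()
{
    string[string] h;
    addHeader(h, "Accept", "text/html");
    addHeader(h, "ACCEPT", "text/plain");
    assert(h.length == 1);                         // one merged entry
    assert(h["accept"] == "text/html,text/plain"); // comma-joined values
}
```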
March 25, 2011 Re: Curl support RFC
Posted in reply to Johannes Pfau

Johannes Pfau wrote:
> [snip]

I added some code to show how I think this could be used in the HTTP client: https://gist.github.com/886612#file_gistfile1.d

Like in the .NET WebClient, we'd need two of these collections: one for received headers and one for headers to be sent.

--
Johannes Pfau
Copyright © 1999-2021 by the D Language Foundation