Thread overview
Httparsed - fast native dlang HTTP 1.x message header parser
Dec 14, 2020
tchaloupka
Dec 15, 2020
Adam D. Ruppe
Dec 15, 2020
H. S. Teoh
Dec 15, 2020
Adam D. Ruppe
Dec 15, 2020
tchaloupka
Dec 15, 2020
Adam D. Ruppe
Dec 15, 2020
Jacob Carlborg
December 14, 2020
Hi,
I was missing some commonly usable HTTP parser on code.dlang.org and after some research and work I've published httparsed[1].

It's inspired by picohttpparser[2], which is great, but instead of a binding, I wanted something native to D. Go has its own parsers, Rust has its own parsers, why not D?

I think code.dlang.org is missing small libraries like this that larger ones can build on, as is common in other languages; having them would improve the ecosystem. Vibe-d is just huuuge.

It is nothrow, @nogc and can work with betterC. It just parses the message header and calls provided callbacks with slices to the original buffer to be handled as needed by the caller.
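The callback-with-slices idea can be sketched roughly as follows. This is a minimal illustration, not httparsed's actual API: `HeaderSink`, `onHeader`, `parseHeaders` and the return codes are hypothetical names chosen for this example, and the parser does no RFC validation.

```d
/// Hypothetical callback sink (illustrative only, not httparsed's API).
/// It stores slices of the input buffer; nothing is copied.
struct HeaderSink
{
    const(char)[][8] names;
    const(char)[][8] values;
    size_t count;

    void onHeader(const(char)[] name, const(char)[] value) nothrow @nogc
    {
        if (count < names.length)
        {
            names[count] = name;
            values[count] = value;
            ++count;
        }
    }
}

/// Parses "Name: value\r\n" lines until the empty line that ends the header block.
/// Returns the number of bytes consumed, -1 if the block is incomplete,
/// or -2 if a line has no colon.
ptrdiff_t parseHeaders(Sink)(const(char)[] buf, ref Sink sink) nothrow @nogc
{
    size_t pos = 0;
    while (true)
    {
        // Find the "\r\n" that terminates the current line.
        size_t eol = pos;
        while (eol + 1 < buf.length && !(buf[eol] == '\r' && buf[eol + 1] == '\n'))
            ++eol;
        if (eol + 1 >= buf.length)
            return -1; // no complete line available yet

        auto line = buf[pos .. eol];
        pos = eol + 2;
        if (line.length == 0)
            return pos; // empty line: end of the header block

        size_t colon = 0;
        while (colon < line.length && line[colon] != ':')
            ++colon;
        if (colon == line.length)
            return -2; // malformed line

        // Skip optional whitespace after the colon.
        auto value = line[colon + 1 .. $];
        while (value.length && (value[0] == ' ' || value[0] == '\t'))
            value = value[1 .. $];

        sink.onHeader(line[0 .. colon], value);
    }
}

unittest
{
    string msg = "Host: example.com\r\nAccept: */*\r\n\r\n";
    HeaderSink sink;
    assert(parseHeaders(msg, sink) == msg.length);
    assert(sink.count == 2);
    assert(sink.names[0] == "Host" && sink.values[0] == "example.com");
    // The value is a slice into the original buffer.
    assert(sink.values[0].ptr == msg.ptr + 6);
}
```

Because the sink only receives slices, the caller decides whether anything needs to be copied; the sketch can be tried with, for example, `dmd -unittest -main -run sketch.d`.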

Like picohttpparser, it uses the SSE4.2 `_mm_cmpestri` instruction to speed up the lookup of invalid characters (when built with ldc2 for a target that supports it).

It has a pretty thorough test suite.
It can parse incomplete message headers.
It can continue parsing from the last completely parsed line.
It doesn't enforce the method or protocol version itself, so it can be used with other internet-message-like protocols such as RTSP.
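Resuming from the last completely parsed line could look roughly like this. It is a sketch of the general idea only; `lastCompleteLine` is a hypothetical helper for this example and says nothing about how httparsed tracks its position internally.

```d
/// Returns the offset just past the last complete "\r\n"-terminated line
/// found at or after `from`. Hypothetical helper, for illustration only.
size_t lastCompleteLine(const(char)[] buf, size_t from = 0) pure nothrow @nogc @safe
{
    size_t done = from;
    for (size_t i = from; i + 1 < buf.length; ++i)
    {
        if (buf[i] == '\r' && buf[i + 1] == '\n')
        {
            done = i + 2;
            ++i; // step over the '\n' as well
        }
    }
    return done;
}

unittest
{
    // The first read from the socket ends in the middle of a header line.
    string chunk1 = "GET / HTTP/1.1\r\nHost: exa";
    auto done = lastCompleteLine(chunk1);
    assert(done == 16); // only the request line is complete

    // After more data arrives, continue from `done` instead of from 0.
    string full = chunk1 ~ "mple.com\r\n\r\n";
    auto done2 = lastCompleteLine(full, done);
    assert(done2 == full.length);
    assert(full[done .. done2] == "Host: example.com\r\n\r\n");
}
```

The caller keeps only one offset between reads, so lines that were already handled are never scanned again.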

Performance-wise, it's pretty much on par with picohttpparser [3]. Without SSE4.2 it's a bit faster; with SSE4.2 it's a bit slower, and I can't figure out why :/.
But overall, I'm pretty happy with the outcome.

I've also compared it with two popular libraries:

* vibe-d - performs nearly the same as http_parser[4] (which is itself pretty slow and now obsolete), but from the looks of it doesn't do much in terms of RFC conformance - some tests from [2] won't pass for sure

* arsd's cgi.d - I didn't expect it to be so much slower than the vibe-d parser; it's almost 3 times slower, but on the other hand it's super simple idiomatic D (again, it doesn't check or allow what the RFC says it should, and many tests will fail)
  * I guess the main problem would be `idup` on every line and autodecode
  * A stripped-down, minimalistic version of the original [5] is here [6]
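The cost difference between `idup` and plain slicing can be illustrated with a short sketch. `headerName` is a hypothetical helper written for this example and is not code from cgi.d or httparsed.

```d
/// Returns the part of `line` before the first ':' as a slice of `line`.
/// Marking it @nogc makes the compiler reject any GC allocation here;
/// replacing the slice with `line[0 .. i].idup` would not compile.
const(char)[] headerName(const(char)[] line) pure nothrow @nogc @safe
{
    foreach (i, c; line) // iterates code units; no autodecoding
        if (c == ':')
            return line[0 .. i];
    return null;
}

unittest
{
    // Stand-in for a receive buffer that gets reused.
    char[] buf = "Host: example.com\r\n".dup;

    const(char)[] slice = headerName(buf); // aliases buf, no allocation
    string copy = buf[0 .. 4].idup;        // fresh GC allocation

    assert(slice == "Host" && slice.ptr == buf.ptr);
    assert(copy == "Host" && copy.ptr != buf.ptr);

    // The slice follows the buffer; the copy does not.
    buf[0] = 'h';
    assert(slice == "host");
    assert(copy == "Host");
}
```

The trade-off is lifetime: a slice is only valid while the buffer is left untouched, whereas the `idup` copy can outlive it at the price of one allocation per line.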

[1] https://code.dlang.org/packages/httparsed
[2] https://github.com/h2o/picohttpparser
[3] https://i.imgur.com/iRCDGVo.png
[4] https://github.com/nodejs/http-parser
[5] https://github.com/adamdruppe/arsd/blob/402ea062b81197410b05df7f75c299e5e3eef0d8/cgi.d#L1737
[6] https://github.com/tchaloupka/httparsed/blob/230ba9a4a280ba91267a22e97137be12269b5574/bench/bench.d#L194
December 15, 2020
On Monday, 14 December 2020 at 21:59:02 UTC, tchaloupka wrote:
> * arsd's cgi.d - I didn't expect it to be so much slower than the vibe-d parser; it's almost 3 times slower, but on the other hand it's super simple idiomatic D (again, it doesn't check or allow what the RFC says it should, and many tests will fail)

yeah, I think I actually wrote that about eight years ago and then never revisited it.... actually git blame says "committed on Mar 24, 2012" so almost nine! And indeed, that git blame shows the bulk of it is still the initial commit, though a few `toLower`s got changed to `asLowerCase` a few years ago... so it used to be even worse! lol

But wanna see something that will make you cry?

https://github.com/adamdruppe/arsd/blob/master/http2.d#L1232

I have another http header parser!!! That's for my client, and as you can see, it is... not great. The case-insensitivity for example is a mega hack and I actually need to fix that eventually.
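HTTP header field names are case-insensitive, so a lookup has to compare them without regard to ASCII case. One allocation-free way to do that is sketched below; `headerNameEquals` is a hypothetical name for this example and is not taken from http2.d.

```d
import std.ascii : toLower;

/// Compares two header names ignoring ASCII case, without allocating.
bool headerNameEquals(const(char)[] a, const(char)[] b) pure nothrow @nogc @safe
{
    if (a.length != b.length)
        return false;
    foreach (i, c; a)
        if (toLower(c) != toLower(b[i]))
            return false;
    return true;
}

unittest
{
    assert(headerNameEquals("Content-Length", "content-length"));
    assert(headerNameEquals("HOST", "Host"));
    assert(!headerNameEquals("Host", "Hos"));
    assert(!headerNameEquals("Accept", "Expect"));
}
```

Comparing in place avoids building a lower-cased copy of every name just to look it up.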

At least there's some support for line continuations there. I don't remember if I ever actually tested that though, it seems most clients and servers don't do that anyway.
December 14, 2020
On Tue, Dec 15, 2020 at 12:11:44AM +0000, Adam D. Ruppe via Digitalmars-d-announce wrote:
> On Monday, 14 December 2020 at 21:59:02 UTC, tchaloupka wrote:
> > * arsd's cgi.d - I didn't expect it to be so much slower than the vibe-d parser; it's almost 3 times slower, but on the other hand it's super simple idiomatic D (again, it doesn't check or allow what the RFC says it should, and many tests will fail)
> 
> yeah, I think I actually wrote that about eight years ago and then never revisited it.... actually git blame says "committed on Mar 24, 2012" so almost nine! And indeed, that git blame shows the bulk of it is still the initial commit, though a few `toLower`s got changed to `asLowerCase` a few years ago... so it used to be even worse! lol

Slow or not, cgi.d is totally awesome in my book, because recently it saved my life.  While helping out someone, I threw together a little D script to do what he wanted; only, I run Linux and he runs a Mac, and my script is CLI-only while he's a non-poweruser and has no idea what to do at the command prompt.  So naturally my thought was, let's give this a web interface so that there's a fighting chance non-programmers would know how to use it.  Being a program I wrote in literally 4 hours (possibly less), I wasn't going to let it turn into a monster full of hundreds of 3rd party dependencies, so I reached for my trusty solution: arsd's cgi.d.

Just a single file, no network dependencies, no complicated builds, just drop the file into my code, import it, and off I go.  Better yet, it came with a built-in CLI request tester: perfect for local testing without the hassle of needing to start/stop an entire web service just to run a quick test; plus a compile-time switch to adapt it to any common webserver interface you like: CGI, FastCGI, even standalone HTTP server.  Problem solved in a couple o' hours, as opposed to who knows how long it would have taken to engineer a "real" solution with vibe.d or one of the other heavyweight "frameworks" out there.

It may not be the fastest web module in the D world, but it's certainly danged convenient, does the necessary job with a minimum of fuss, easily adaptable to a variety of common use cases, and best of all, requires basically no dependencies beyond just dropping the file into your code.

For that alone, I think Adam deserves a salute.

(But of course, if Adam improves cgi.d to be competitive with vibe.d,
then it could totally rock the D world! ;-))


T

-- 
Written on the window of a clothing store: No shirt, no shoes, no service.
December 15, 2020
On Tuesday, 15 December 2020 at 00:32:42 UTC, H. S. Teoh wrote:
> It may not be the fastest web module in the D world

It actually does quite well, see: https://github.com/tchaloupka/httpbench (from the same OP here :) )

The header parser is nothing special, but since header parsing is a small part of the overall problem, it is good enough.

Though I have been tempted to optimize it a bit more since in a hello world benchmark even a small thing like header parsing can be noticeable. The fact that it does some totally unnecessary GC allocations can perhaps add up too.

(If I was doing all this again from scratch I'd actually be tempted to do a zero-copy, all lazy version. Read from the socket directly into the request-local buffer, then slice into it while parsing, then do decoding on-demand in that same buffer - url encoding always takes more space than the decoded version - and the result should be basically the fastest thing you can get. And if something comes in above typical size, then it can go back to the normal reallocated buffer and still win big on the average request. The problem with doing that now would be maintaining compatibility with my existing API.)
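The "decode in that same buffer" step can be sketched as below. It relies on the point made above: a percent-escape takes three bytes and decodes to one, so the write position never overtakes the read position. `urlDecodeInPlace` is a hypothetical function for this example, not part of cgi.d, and passing invalid escapes through unchanged is an assumption made for brevity.

```d
/// Percent-decodes `buf` in place and returns the decoded prefix
/// of the same buffer. Invalid escapes are copied through unchanged.
char[] urlDecodeInPlace(char[] buf) pure nothrow @nogc @safe
{
    static int hexVal(char c) pure nothrow @nogc @safe
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    size_t w = 0; // write position, always <= read position
    for (size_t r = 0; r < buf.length; ++r)
    {
        char c = buf[r];
        if (c == '%' && r + 2 < buf.length)
        {
            immutable hi = hexVal(buf[r + 1]);
            immutable lo = hexVal(buf[r + 2]);
            if (hi >= 0 && lo >= 0)
            {
                c = cast(char)(hi * 16 + lo);
                r += 2; // skip the two hex digits
            }
        }
        buf[w++] = c;
    }
    return buf[0 .. w];
}

unittest
{
    char[] buf = "a%20b%2Fc".dup;
    auto decoded = urlDecodeInPlace(buf);
    assert(decoded == "a b/c");
    assert(decoded.ptr == buf.ptr);       // same memory, no allocation
    assert(decoded.length <= buf.length); // decoding never grows the data
}
```

Doing this lazily, only when a value is actually requested, means unused parameters are never decoded at all.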

> (But of course, if Adam improves cgi.d to be competitive with vibe.d

My biggest deficit compared to vibe is prolly documentation. Especially of my advanced features which are practically hidden.
December 15, 2020
On Tuesday, 15 December 2020 at 00:32:42 UTC, H. S. Teoh wrote:
> For that alone, I think Adam deserves a salute.
>
> (But of course, if Adam improves cgi.d to be competitive with vibe.d,
> then it could totally rock the D world! ;-))
> T

Yes, absolutely. arsd has a somewhat different use case and target audience; no one should expect it to beat the top 10 highly optimized frameworks in the TechEmpower benchmark ;-)

But if these benchmarks help Adam make some incremental improvements, that's a plus, and many of them can be pretty low-hanging fruit.

Take one arsd number from httpbench: 27469 RPS.
That means 36.4us per request.
In the HTTP parser test, arsd's parser takes about 2.4us per request, while httparsed takes about 0.1us per request.

That means that with a performant parser, arsd could go up to around 27548 RPS -> not much of a difference, and not worth the hassle..
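A back-of-the-envelope version of this kind of estimate, assuming a purely serial model in which the parsing time saved is subtracted directly from the per-request time. The function name and the model are assumptions for illustration; the benchmark itself is concurrent, so this sketch will not necessarily reproduce the figure quoted above.

```d
/// Projects requests per second if parsing dropped from `oldParseUs`
/// to `newParseUs` microseconds per request. Serial model only:
/// all other per-request work is assumed to stay unchanged.
double projectedRps(double currentRps, double oldParseUs, double newParseUs)
    pure nothrow @nogc @safe
{
    immutable usPerRequest = 1_000_000.0 / currentRps;
    immutable newUsPerRequest = usPerRequest - (oldParseUs - newParseUs);
    return 1_000_000.0 / newUsPerRequest;
}

unittest
{
    // 27469 RPS corresponds to roughly 36.4us per request.
    immutable usPerRequest = 1_000_000.0 / 27_469;
    assert(usPerRequest > 36.3 && usPerRequest < 36.5);

    // A faster parser can only raise the projected throughput.
    assert(projectedRps(27_469, 2.4, 0.1) > 27_469);
}
```

Either way, parsing is a small share of the roughly 36us spent per request, which is the point being made here.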
December 15, 2020
On Tuesday, 15 December 2020 at 10:04:42 UTC, tchaloupka wrote:
> But if these benchmarks help Adam make some incremental improvements, that's a plus, and many of them can be pretty low-hanging fruit.

Yeah, I think the biggest benefit to changing this around is to just avoid creating unnecessary garbage.

For an individual item it doesn't really matter, but it can build up to a totally wasted collection cycle as time goes on. On the other hand, in any non-trivial real-world application there's likely to be some garbage generated anyway, and this will disappear into the noise.

Though in the hello world benches it could bring the "max" column down since I'm p sure that is caused by a GC cycle and hello world can potentially avoid having even one :P

> That means that with a performant parser, arsd could go up to around 27548 RPS -> not much of a difference, and not worth the hassle..

Yeah, that one is basically entirely the result of the thread work queue. If everything else was perfect, the thread stuff would still dominate. (My evidence for this is the hybrid and process dispatchers doing pretty consistently better. The thread one though is simple and cross-platform which is nice - like without it, that Mac version probably wouldn't have worked at all since I've written no mac-specific code in this module.)
December 15, 2020
On 2020-12-14 22:59, tchaloupka wrote:
> Hi,
> I was missing some commonly usable HTTP parser on code.dlang.org and after some research and work I've published httparsed[1].

This is awesome. I wanted to use picohttpparser myself and used the C version. But if you have already created an HTTP parser with the same properties in D, that's even better.


-- 
/Jacob Carlborg