Thread overview
HTTP frameworks benchmark focused on D libraries
Sep 20, 2020 - tchaloupka
Sep 20, 2020 - Adam D. Ruppe
Sep 21, 2020 - Imperatorn
Sep 21, 2020 - tchaloupka
Sep 21, 2020 - ikod
Sep 21, 2020 - James Blachly
Sep 27, 2020 - tchaloupka
Sep 27, 2020 - ikod
Sep 27, 2020 - Adam D. Ruppe
Sep 27, 2020 - Adam D. Ruppe
Sep 28, 2020 - James Blachly
Sep 28, 2020 - Daniel Kozak
Sep 29, 2020 - tchaloupka
May 26, 2022 - tchaloupka
May 27, 2022 - zoujiaqing
May 28, 2022 - tchaloupka
May 28, 2022 - ikod
May 29, 2022 - Andrea Fontana
May 30, 2022 - tchaloupka
May 31, 2022 - Andrea Fontana
September 20, 2020
Hi,
as this topic pops up now and then (most recently in https://forum.dlang.org/thread/qttjlgxjmrzzuflrjiio@forum.dlang.org), I wanted to see how the various D libraries perform against each other, and ended up with https://github.com/tchaloupka/httpbench

It's just simple plaintext response testing (nothing fancy like in Techempower), but this interests me the most as it gives an idea of a library's potential.

More details in the README.

Hope it helps to test some ideas or improve the current solutions.

Tom
September 20, 2020
With my lib, the `-version=embedded_httpd_threads` build should give more consistent results in tests like this.
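
For context, a minimal arsd:cgi program looks roughly like this (a sketch following the package's documented hello-world pattern; the handler name is arbitrary):

```d
// Minimal arsd:cgi handler sketch. The concurrency model (process pool,
// threads, ...) is selected at compile time via -version switches such as
// -version=embedded_httpd_threads, not in the code itself.
import arsd.cgi;

void handler(Cgi cgi) {
    cgi.setResponseContentType("text/plain");
    cgi.write("Hello, world!", true); // true: this is the complete response
}

mixin GenericMain!handler;
```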

The process pool it uses by default in a dub build is more crash resilient, but does have a habit of dropping excessive concurrent connections. This forces them to retry, which slaughters benchmarks like this. It will have like 5 ms 99th percentile (2x faster than the same test with the threads version btw), but then that final 1% of responses can take several seconds to complete (indeed, with 256 concurrent on my box it takes a whopping 30 seconds!). Even with only like 40 concurrent there's a final 1% spike, but it is more like 10 ms so it isn't so noticeable; with hundreds it grows fast.

That's probably what you're seeing here. The thread build accepts more smoothly and thus evens it out, giving a nicer benchmark number... but it actually performs worse on average in real world deployments in my experience, and is not as resilient to buggy code segfaulting (with processes, the individual handler respawns and resets that individual connection with no other requests affected; with threads, the whole server must respawn, which also often slips by unnoticed but is more likely to disrupt unrelated users).

There is a potential "fix" for the process handler to complete these benchmarks more smoothly too, but it comes at a cost: even in the long retry cases, at least the client has some feedback. It knows its connection is not accepted and can respond appropriately. At a minimum, they won't be shoveling data at you yet. The "fix" though breaks this - you accept ALL the connections, even if you are too busy to actually process them. This leads to more inbound data potentially worsening the existing congestion and leaving users more likely to just hang. At least the unaccepted connection is specified (by TCP) to retry later automatically, but if it is accepted, acknowledged, yet unprocessed, it is unclear what to do. Odds are the user will just be left hanging until the browser decides to timeout and display its error which can actually take longer than the TCP retry window.

My threads version does it this way anyway though. So it'd probably look better on the benchmark.
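
To make the trade-off concrete: un-accepted connections sit in the kernel's listen backlog, where TCP's automatic retry still applies; once you accept(), the outcome is the application's responsibility. A minimal std.socket sketch (port and backlog numbers are made up):

```d
import std.socket;

void main() {
    auto listener = new TcpSocket();
    listener.setOption(SocketOptionLevel.SOCKET, SocketOption.REUSEADDR, true);
    listener.bind(new InternetAddress(8080));
    // The backlog bounds how many connections may wait un-accepted; clients
    // queued here get automatic TCP retries. "Accept everything" drains this
    // queue fast, but then every accepted-yet-unprocessed connection (and its
    // inbound data) becomes the application's problem.
    listener.listen(128);
}
```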


But btw, stuff like this is why I don't put too much stock in benchmarks. Even if you aren't "cheating" like checking length instead of path and other tricks like that (which btw I think are totally legitimate in some cases; I said recently I see it as a *strength* when you can do that), it still leaves some nuance on the ground. Is it crash resilient? Debuggable when it crashes? Is it compatible with third-party libraries, or does it force you to choose from ones that share your particular event loop, at risk of blocking the whole server when you disobey? Does it *actually* provide the scalability it claims under real world conditions, or did it optimize for the controlled conditions of benchmarks at the expense of dynamic adaptation to reality?

Harder to measure those.
September 21, 2020
On Sunday, 20 September 2020 at 20:03:27 UTC, tchaloupka wrote:
> [...]

Cool! Nice to see such good results for D. Did you try netcore 3.1 btw? 🤔
September 21, 2020
On Monday, 21 September 2020 at 05:48:54 UTC, Imperatorn wrote:
> On Sunday, 20 September 2020 at 20:03:27 UTC, tchaloupka wrote:
>> [...]
>
> Cool! Nice to see such good results for D. Did you try netcore 3.1 btw? 🤔

There's really no reason for D to be any slower than the others. It's all about the whole library package and how efficiently it's written.

Eventcore is probably closest to the system; everything above it just adds more overhead.

I've tried to run .NET Core 3.1 outside of docker (I'm using podman actually) and it seems to be more performant than .NET Core 5. But it was outside the container, so maybe it's just that.

I've added CLI switches to set some load generator parameters so we can test scaling more easily.

Thanks to Adam, I've also pushed tests for the arsd:cgi package. It's in its own category as the others use async I/O loops. But everything has its pros and cons.
September 21, 2020
On Sunday, 20 September 2020 at 20:03:27 UTC, tchaloupka wrote:
> [...]

Thanks! Very good news.
September 21, 2020
On 9/20/20 4:03 PM, tchaloupka wrote:
> [...]

Thank you for doing this!

One of the most fascinating things, I think, is how photon really shines when concurrency gets dialed up. With 8 workers it performs about as well as, but below, the rest of the micro category, including below the Rust and Go /platforms/.

However, at 64 concurrent workers, photon rises to the top of the stack, performing about as well as eventcore and hunt. When going all the way up to 256, it was the only one that demonstrated **consistent performance** -- about the same as with 64, whereas ALL the others dropped off, performing WORSE with 256 workers.

September 27, 2020
Hi all, I've just pushed the updated results.

Test suite modifications:

* added runner command to list available tests
* possibility to switch off keepalive connections - this causes `hey` to make a new connection for each request
* added parameter to run each test multiple times and choose the best result out of the runs

Tests additions:

* new RAW tests in C that use epoll and io_uring (via liburing) event loops - so we have a baseline we can compare against (a condensed D sketch of such a loop follows below)
* the same RAW tests in D too - both in betterC; the epoll variant is basically the same as the C one, and the io_uring variant differs in that it uses my during[1] library - so we can see if there are any performance problems (it should perform essentially the same as the C variant)
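
To give an idea what these RAW tests look like, here is a condensed D version of a plain epoll accept-and-reply loop (illustration only, not the benchmark's actual code: no error handling, it naively assumes one read per request, and the port is made up):

```d
import core.sys.linux.epoll;
import core.sys.posix.arpa.inet : htons;
import core.sys.posix.netinet.in_;
import core.sys.posix.sys.socket;
import core.sys.posix.unistd : close, read, write;

void main() {
    // Plain blocking listener socket on a made-up port.
    int listenFd = socket(AF_INET, SOCK_STREAM, 0);
    int on = 1;
    setsockopt(listenFd, SOL_SOCKET, SO_REUSEADDR, &on, on.sizeof);

    sockaddr_in addr;
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);
    bind(listenFd, cast(sockaddr*) &addr, addr.sizeof);
    listen(listenFd, 1024);

    // One epoll instance watches the listener and every client socket.
    int epfd = epoll_create1(0);
    epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = listenFd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listenFd, &ev);

    static immutable resp =
        "HTTP/1.1 200 OK\r\nContent-Length: 13\r\n\r\nHello, World!";
    epoll_event[64] events;
    char[4096] buf;

    while (true) {
        int n = epoll_wait(epfd, events.ptr, cast(int) events.length, -1);
        foreach (i; 0 .. n) {
            int fd = events[i].data.fd;
            if (fd == listenFd) {
                // New connection: register the client socket with epoll.
                int client = accept(listenFd, null, null);
                epoll_event cev;
                cev.events = EPOLLIN;
                cev.data.fd = client;
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
            } else {
                // Request bytes ready: read them, send the canned response.
                auto got = read(fd, buf.ptr, buf.length);
                if (got <= 0) { close(fd); continue; }
                write(fd, resp.ptr, resp.length);
            }
        }
    }
}
```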

Some insights:

I've found the test results from hey[2] to be pretty inconsistent (whether run locally or over the network). That's the reason I've added the `bestof` switch to the runner, and the current test results are the best of 10 runs for each test.

Some results are a bit surprising, i.e. even with 10 runs there are tests that come out faster than the C/D raw tests - which should be at the top, because they don't do any actual HTTP handling. And eventcore/fibers beating the raw C epoll loop despite the fiber overhead? It just seems odd.

I'll probably add the wrk[3] load generator too, to see the difference with longer-running tests.
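
For reference, the two generators stress servers differently: `hey` sends a fixed number of requests, while `wrk` runs for a fixed duration. Typical invocations look something like this (URL, counts, and durations are arbitrary):

```
# hey: fixed request count; -disable-keepalive forces a new connection per request
hey -n 100000 -c 256 http://localhost:8080/

# wrk: fixed duration, so tests can run much longer
wrk -t 8 -c 256 -d 60s http://localhost:8080/
```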

[1] https://github.com/tchaloupka/during
[2] https://github.com/rakyll/hey
[3] https://github.com/wg/wrk
September 27, 2020
On Sunday, 27 September 2020 at 10:08:24 UTC, tchaloupka wrote:
> Hi all, I've just pushed the updated results.


> * new RAW tests in C to utilize epoll and io_uring (using liburing) event loops - so we have some ground base we can [...]


> I'll probably add wrk[3] load generator too to see a difference with a longer running tests.
>
> [1] https://github.com/tchaloupka/during
> [2] https://github.com/rakyll/hey
> [3] https://github.com/wg/wrk

Thanks for this work. It may be worth adding nginx as a baseline for a real C-based server.

I'll add my framework as soon as it's ready.
September 27, 2020
I fixed my event loop last night, so I'll prolly release that at some point after a lil more testing. It fixes my keep-alive numbers... but harms the others, so I wanna see if I can maintain those too.
September 27, 2020
On Sunday, 27 September 2020 at 10:08:24 UTC, tchaloupka wrote:
> * new RAW tests in C to utilize epoll and io_uring (using liburing) event loops - so we have some ground base we can compare against

I fixed some buffering issues in cgi.d and, if you have the right concurrency level that happens to align with the number of worker processes... I'm getting incredible results: 65k rps.

It *might* just beat the raw there. The kernel does a really good job.

Of course, it still will make other connections wait forever... but my new event loop in threads mode is now also giving me a pretty solid 26k rps on random concurrency levels with the buffering fix.

I just need to finish testing this to get some confidence before I push live, but here it is on a github branch if you're curious to look:

https://github.com/adamdruppe/arsd/blob/cgi_preview/cgi.d

Compile with `-version=embedded_httpd_threads -version=cgi_use_fiber` to opt into the new event loop. But the buffering improvements should register in all usage modes.
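
For anyone wanting to try it, a build line along these lines should do (the file name `app.d` is a placeholder for your own program that uses cgi.d):

```
dmd app.d cgi.d -version=embedded_httpd_threads -version=cgi_use_fiber
```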