September 27, 2020
On 9/27/20 6:08 AM, tchaloupka wrote:
> Hi all, I've just pushed the updated results.
> 

Thanks for continuing to work on this!

vibe-core performs quite well, scaling up with additional workers from 8 through 256, whereas the vibe-d platform tops out at roughly 35,000-45,000 RPS irrespective of the number of simultaneous workers (plateauing between 8 and 64 workers).

Given the outstanding performance of vibe-core, it looks like there is room to continue improving the vibe-d platform.

Cheers again for your work.
September 28, 2020
On Sun, Sep 27, 2020 at 12:10 PM tchaloupka via Digitalmars-d-announce < digitalmars-d-announce@puremagic.com> wrote:

> ...
> Some results are a bit surprising, ie that even with 10 runs
> there are tests that are faster than C/dlang raw tests - as they
> should be at the top because they really don't do anything with
> HTTP handling.. And eventcore/fibers to beat raw C epoll loop
> with fibers overhead? It just seems odd..
>  ...


I do not see TCP_NODELAY anywhere in your code for the raw tests, so maybe you should try that.
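
For reference, enabling the option on a socket takes a single setsockopt call. The following is a minimal sketch in C, not code taken from the benchmark repository; the helper names set_nodelay and get_nodelay are made up for illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_nodelay(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}

/* Read the option back.
 * Returns 1 if TCP_NODELAY is set, 0 if it is not, -1 on error. */
int get_nodelay(int fd)
{
    int value = 0;
    socklen_t len = sizeof value;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

Calling set_nodelay(fd) right after socket() or accept() should be enough; get_nodelay(fd) is only there to verify that the option actually took effect.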


September 29, 2020
On Monday, 28 September 2020 at 09:44:14 UTC, Daniel Kozak wrote:
> I do not see TCP_NODELAY anywhere in your code for raw tests, so maybe you should try that

I've added new results with these changes:

* added NGINX test
* edge and level triggered variants for epoll tests (level should be faster in this particular test)
* new hybrid server variant of ARSD (thanks Adam)
* added TCP_NODELAY to the listen socket in some tests (accepted client sockets should inherit it)
* made response sizes equal for all tests
* errors and response size columns are printed only when there's some difference, as they are pretty meaningless otherwise
* switched to the wrk load generator as the default (hey is still supported), as it is less resource demanding and gives better control over client connections and its worker threads
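
To illustrate the difference between the edge and level triggered variants mentioned above, here is a small sketch in C that uses a pipe instead of a socket. It is not taken from the benchmark code, and the function name count_wakeups is hypothetical. It writes one byte, never reads it, and counts how many of two consecutive polls report the descriptor as readable.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Registers the read end of a pipe with EPOLLIN plus the given trigger
 * flags (0 for level-triggered, EPOLLET for edge-triggered), writes one
 * byte and polls twice without reading it.
 * Returns how many of the two polls reported readiness, or -1 on error. */
int count_wakeups(unsigned int trigger_flags)
{
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN | trigger_flags };
    ev.data.fd = pipefd[0];

    int wakeups = -1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
        write(pipefd[1], "x", 1) == 1) {
        wakeups = 0;
        for (int i = 0; i < 2; i++) {
            struct epoll_event out;
            /* timeout 0: return immediately, just report current state */
            if (epoll_wait(ep, &out, 1, 0) > 0)
                wakeups++;
        }
    }

    close(ep);
    close(pipefd[0]);
    close(pipefd[1]);
    return wakeups;
}
```

With trigger_flags set to 0 both polls report the pending byte, while with EPOLLET only the first one does. An edge-triggered handler therefore typically has to keep reading until it gets EAGAIN, which costs an extra read call when the whole request already arrived in one piece. That may be part of why level is expected to be faster in this particular test, but this is an assumption, not something measured here.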

Some test insights:

* arsd - I'm more inclined to move it to the multiCore category, at least the hybrid variant for now (as it's not quite fair to the others, which should run just one worker with a single-threaded event loop) - see https://github.com/tchaloupka/httpbench/pull/5 for discussion
  * ideal would be to add the current variant to the multiCore tests and a limited variant to singleCore
* photon - I've assumed that it works in a single thread, but it doesn't seem to (see https://github.com/tchaloupka/httpbench/issues/7)
May 26

Hi,
as there are two more HTTP server implementations:

It was time to update some numbers!

The latest results can be seen here - it's a lot of numbers.

Some notes:

  • I've updated all frameworks and compilers to their latest versions
  • tests have been run on the same host but separated using VMs (for the workload generator and the servers) with pinned CPUs (so they don't interfere with each other)
  • as I have "only" 16 hardware threads available and wrk saturated all 12 of its CPUs in the 12 vs 4 CPU scenario, I had to switch to a 14/2 split to give wrk some headroom
  • virtio bridged network between VMs
  • Archttp has some problem with only 2 CPUs, so it's included only in the first test (it was ok with 4 CPUs and was approx. 2x faster than hunt-web)
  • Serverino is set to use the same number of processes as there are CPUs (leaving it at the default was slower, so I kept it set like that)

One may notice some strange adio-http in the results. Well, it's a WIP framework (adio as in "async dlang I/O") that is not public (yet). It has some design goals (due to its targeted usage) that some will prefer and some won't like at all:

  • betterC - so no GC, no delegates, no classes (D ones), no exceptions, etc.
    • it should be possible to work with full D later too, but it's easier to go from betterC to full D than the other way around, and it is not the focus now
  • Linux as the only target atm.
  • epoll and io_uring async I/O API backends (can be extended with IOCP or kqueue, but Linux is the main target now)
  • performance, simplicity, safety, in this order (and yes, with betterC there are many pointers, function callbacks, manual memory management, etc. - thanks for ASan, LDC team ;-))
  • middleware support - one can easily set up a router with e.g. request logger, gzip, or auth middlewares (REST API middleware would be one of them)
  • can be used with just callbacks or combined with fibers (the HTTP server itself is fibers only, as it would be a callback hell otherwise)
  • each async operation can be given a timeout to simplify usage

It doesn't use any "hacks" in the benchmark. Just a real HTTP parser, a simple path router, real header writing, a real Date header, etc. But it has tuned parameters (no timeouts set, which the others don't use either).
It'll be released when the API settles a bit and real usage with sockets, websockets, HTTP clients, REST API, etc. is possible.

May 27

On Thursday, 26 May 2022 at 07:49:23 UTC, tchaloupka wrote:

>

Some notes:

  • Archttp have some problem with only 2 CPUs so it's included only in the first test (it was ok with 4 CPUs and was cca 2x faster than hunt-web)

Hi tchaloupka:

First, thank you for the benchmark project!

I fixed the performance bug right away. (HTTP/1.1 connections are now keep-alive by default.)

Archttp version 1.0.2 has been released, and retesting has yielded significant performance improvements.

-- zoujiaqing

May 28

On Friday, 27 May 2022 at 20:51:14 UTC, zoujiaqing wrote:

>

On Thursday, 26 May 2022 at 07:49:23 UTC, tchaloupka wrote:

I fixed the performance bug the first time. (The default HTTP 1.1 connection is keep-alive)

Archttp version 1.0.2 has been released, and retesting has yielded significant performance improvements.

-- zoujiaqing

Hi, thanks for the PR. I've rerun the tests for archttp and it is indeed much better, now on par with vibe-d.

Some more notes for better performance (the same applies to vibe-d).
Look at which syscalls are made while processing a request:

[pid  1453] read(10, "GET / HTTP/1.1\r\nHost: 192.168.12"..., 1024) = 117
[pid  1453] write(10, "HTTP/1.1 200 OK\r\nDate: Sat, 28 M"..., 173) = 173
[pid  1453] write(10, "Hello, World!", 13) = 13

It means two separate syscalls for header and body. This alone has a huge impact on performance, and if it can be avoided, it would be much better.

Also, read/write on a socket are a bit slower than recv/send.
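
One way to get down to a single syscall per response is to format the headers and the body into one buffer and send it in one go. This is only a rough sketch in C under simplifying assumptions (small text body, fixed 1 KiB buffer, Linux); send_response is a hypothetical helper and not how archttp or vibe-d actually do it.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Formats status line, headers and body into one buffer and passes it to
 * the kernel with a single send() call.
 * Returns the number of bytes sent, or -1 on error or when the response
 * does not fit into the buffer. */
ssize_t send_response(int fd, const char *body)
{
    char buf[1024];
    int len = snprintf(buf, sizeof buf,
                       "HTTP/1.1 200 OK\r\n"
                       "Content-Type: text/plain\r\n"
                       "Content-Length: %zu\r\n"
                       "\r\n"
                       "%s",
                       strlen(body), body);
    if (len < 0 || (size_t)len >= sizeof buf)
        return -1;

    /* MSG_NOSIGNAL (Linux): get EPIPE instead of SIGPIPE on a closed peer */
    return send(fd, buf, (size_t)len, MSG_NOSIGNAL);
}
```

On a non-blocking socket send() may accept only part of the buffer, so a real server would still have to handle short writes; the sketch leaves that out.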

May 28

On Saturday, 28 May 2022 at 05:44:11 UTC, tchaloupka wrote:

>

On Friday, 27 May 2022 at 20:51:14 UTC, zoujiaqing wrote:

>

It means two separate syscalls for header and body. This alone have huge impact on the performance and if it can be avoided, it would be much better.

sendmsg/writev can also help to save syscalls when you have to send data from non-contiguous buffers.
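
A minimal sketch of that idea in C, assuming the header and the body live in two separate NUL-terminated buffers; write_response_v is a made-up name for illustration. Both buffers are handed to the kernel with one writev call, without copying them into a common buffer first.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Writes header and body from two separate buffers with one syscall.
 * Returns the total number of bytes written, or -1 on error. */
ssize_t write_response_v(int fd, const char *header, const char *body)
{
    struct iovec iov[2];

    /* iov_base is not const-qualified, but writev only reads from it */
    iov[0].iov_base = (void *)header;
    iov[0].iov_len  = strlen(header);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);

    return writev(fd, iov, 2);
}
```

On sockets, sendmsg with the same iovec array in msg_iov does the same job and additionally accepts flags such as MSG_NOSIGNAL.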

May 29

On Thursday, 26 May 2022 at 07:49:23 UTC, tchaloupka wrote:

>

Hi,
as there are two more HTTP server implementations:

Thank you! Since it's just a young library, those results sound promising.

I'm just working on the next version, focusing on performance enhancements and Windows support :)

I see there is a test where numbers are identical to arsd ones, is it a typo or a coincidence?
Andrea

May 30

On Sunday, 29 May 2022 at 06:22:43 UTC, Andrea Fontana wrote:

>

On Thursday, 26 May 2022 at 07:49:23 UTC, tchaloupka wrote:

I see there is a test where numbers are identical to arsd ones, is it a typo or a coincidence?
Andrea

Hi Andrea,
it was just a coincidence; the numbers are copied straight from the tool output.
But as I've found some bugs in how percentiles were calculated from hey's output, I've updated the results after the fix.

I've also added results for geario (thanks #zoujiaqing).
For serverino, I've added a variant that uses 16 worker subprocesses in the pool, which should lead to less blocking and worse per-request times in the test environment.

Tom

May 31

On Monday, 30 May 2022 at 20:57:02 UTC, tchaloupka wrote:

>

On Sunday, 29 May 2022 at 06:22:43 UTC, Andrea Fontana wrote:

>

On Thursday, 26 May 2022 at 07:49:23 UTC, tchaloupka wrote:

I see there is a test where numbers are identical to arsd ones, is it a typo or a coincidence?
Andrea

Hi Andrea,
it was just a coincidence, straight out copy of the tool results.
But as I've found some bugs calculating percentiles from hey properly, I've updated the results after the fix.

I've also added results for geario (thanks #zoujiaqing).
For serverino, I've added variant that uses 16 worker subprocesses in the pool, that should lead to less blocking and worse per request times in the test environment.

Tom

Thanks again! Benchmarks are always welcome :)
