Thread overview
eventcore vs boost.asio performance?

  * zoujiaqing (Feb 19, 2023)
  * Sönke Ludwig (Feb 20, 2023)
  * Daniel Kozak (Feb 20, 2023)
  * tchaloupka (Feb 21, 2023)
  * Daniel Kozak (Feb 21, 2023)
  * zoujiaqing (Mar 05, 2023)
  * Sönke Ludwig (Mar 07, 2023)
  * ikod (Mar 08, 2023)
February 19, 2023 (zoujiaqing)

eventcore is a very good and stable network library based on the Proactor model.
These high-performing C++ frameworks all use asio.
What can we achieve if we use eventcore as network io?

  * https://github.com/vibe-d/eventcore
  * https://www.techempower.com/benchmarks/#section=data-r21&hw=cl&test=plaintext

February 20, 2023 (Sönke Ludwig)
On 19.02.2023 at 20:53, zoujiaqing wrote:
> eventcore is a very good and stable network library based on the Proactor model.
> These high-performing C++ frameworks all use asio.
> What can we achieve if we use eventcore as network io?
> 
>   * https://github.com/vibe-d/eventcore
>   * https://www.techempower.com/benchmarks/#section=data-r21&hw=cl&test=plaintext

I'm not sure where it would be today on that list, but I got pretty competitive results for local tests on Linux a few years back. However, there are at least two performance related issues still present:

- The API uses internal allocation of memory for I/O operations and per socket descriptor. This is to work around the lack of a way to disable struct moves (and thus making it unsafe to store any kind of pointer to stack values). The allocations are pretty well optimized, but it does lead to some additional memory copies that impede performance.

- On both, Linux and Windows, there are new, faster I/O APIs: io_uring and RIO. A PR by Tobias Pankrath (https://github.com/vibe-d/eventcore/pull/175) for io_uring exists, but it still needs to be finished.
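
To illustrate the first point above, here is a minimal D sketch (hypothetical types, not eventcore's actual code) of why a struct that can be relocated by a plain bit copy cannot safely hold a pointer into its own storage or into the stack:

```d
import std.stdio;

// Hypothetical illustration only; this is not eventcore code.
struct ReadOperation
{
    ubyte[64] buffer;   // storage lives inside the struct
    ubyte* cursor;      // interior pointer into `buffer`

    void start() { cursor = buffer.ptr; }
}

void main()
{
    ReadOperation a;
    a.start();

    // D may relocate structs with a plain bit copy, and there is no way to
    // forbid it, so after any such copy/move the pointer still refers to the
    // old location.
    ReadOperation b = a;
    writeln(b.cursor is b.buffer.ptr); // false: cursor still points into `a`
}
```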

Another thing that needs to be tackled is better error propagation. Right now, there is often just a generic "error" status, without the possibility to get a more detailed error code or message.

By the way, although the vibe.d HTTP implementation naturally adds some overhead over the raw network I/O, the vibe.d results in that list, judging by their poor performance on many-core machines, appear to be affected by GC runs, or possibly some other lock contention, whereas the basic HTTP request handling should be more or less GC-free. So those shouldn't be used for comparison.
February 20, 2023 (Daniel Kozak)
On Mon, Feb 20, 2023 at 9:30 AM Sönke Ludwig via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> ...
>
> By the way, although the vibe.d HTTP implementation naturally adds some overhead over the raw network I/O, the vibe.d results in that list, judging by their poor performance on many-core machines, appear to be affected by GC runs, or possibly some other lock contention, whereas the basic HTTP request handling should be more or less GC-free. So those shouldn't be used for comparison.
>

Last time I checked, the main reason vibe.d was slower was HTTP parsing. vibe-core with manual HTTP parsing was as fast as all the other fastest alternatives.


February 21, 2023 (tchaloupka)
On Monday, 20 February 2023 at 09:12:37 UTC, Daniel Kozak wrote:
>
> Last time I checked, the main reason vibe.d was slower was HTTP parsing. vibe-core with manual HTTP parsing was as fast as all the other fastest alternatives.

I've compared the syscalls the various frameworks generate, and by far the biggest difference is that in vibe.d the response header and body are written in two separate syscalls (tested on Linux with epoll). That makes a pretty huge difference, about 30% if I remember correctly. Eventcore itself is not slow and is comparable with the top ones.
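
For illustration, sending the header and body in a single syscall looks roughly like this with a plain POSIX vectored write (a sketch only, not vibe.d's or eventcore's actual code):

```d
import core.sys.posix.sys.types : ssize_t;
import core.sys.posix.sys.uio : iovec, writev;

// Sketch: submit an HTTP response header and body with a single vectored
// write instead of two separate write() calls.
ssize_t sendResponse(int fd, const(char)[] header, const(char)[] body_)
{
    iovec[2] parts;
    parts[0].iov_base = cast(void*) header.ptr;
    parts[0].iov_len  = header.length;
    parts[1].iov_base = cast(void*) body_.ptr;
    parts[1].iov_len  = body_.length;

    // One syscall hands both buffers to the kernel back to back.
    return writev(fd, parts.ptr, 2);
}
```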

Tom
February 21, 2023 (Daniel Kozak)
On Tue, Feb 21, 2023 at 10:45 AM tchaloupka via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> On Monday, 20 February 2023 at 09:12:37 UTC, Daniel Kozak wrote:
> >
> > Last time I checked, the main reason vibe.d was slower was HTTP parsing. vibe-core with manual HTTP parsing was as fast as all the other fastest alternatives.
>
> I've compared the syscalls the various frameworks generate, and by far the biggest difference is that in vibe.d the response header and body are written in two separate syscalls (tested on Linux with epoll). That makes a pretty huge difference, about 30% if I remember correctly. Eventcore itself is not slow and is comparable with the top ones.
>

Yes, you are right. I changed that too when I was trying to make vibe.d as fast as possible.


>
> Tom
>


March 05, 2023 (zoujiaqing)
On Monday, 20 February 2023 at 08:26:23 UTC, Sönke Ludwig wrote:
> I'm not sure where it would be today on that list, but I got pretty competitive results for local tests on Linux a few years back. However, there are at least two performance related issues still present:
>
> - The API uses internal allocation of memory for I/O operations and per socket descriptor. This is to work around the lack of a way to disable struct moves (and thus making it unsafe to store any kind of pointer to stack values). The allocations are pretty well optimized, but it does lead to some additional memory copies that impede performance.
>
> - On both, Linux and Windows, there are new, faster I/O APIs: io_uring and RIO. A PR by Tobias Pankrath (https://github.com/vibe-d/eventcore/pull/175) for io_uring exists, but it still needs to be finished.
>
> Another thing that needs to be tackled is better error propagation. Right now, there is often just a generic "error" status, without the possibility to get a more detailed error code or message.
>
> By the way, although the vibe.d HTTP implementation naturally adds some overhead over the raw network I/O, the vibe.d results in that list, judging by their poor performance on many-core machines, appear to be affected by GC runs, or possibly some other lock contention, whereas the basic HTTP request handling should be more or less GC-free. So those shouldn't be used for comparison.

First, io_uring is very much something to look forward to! When can you merge this PR?

Secondly, how do you optimize memory allocation and release under high concurrency? Nbuff is a great library, and I've used it before.

March 07, 2023 (Sönke Ludwig)
On 05.03.2023 at 16:14, zoujiaqing wrote:
> On Monday, 20 February 2023 at 08:26:23 UTC, Sönke Ludwig wrote:
>> I'm not sure where it would be today on that list, but I got pretty competitive results for local tests on Linux a few years back. However, there are at least two performance related issues still present:
>>
>> - The API uses internal allocation of memory for I/O operations and per socket descriptor. This is to work around the lack of a way to disable struct moves (and thus making it unsafe to store any kind of pointer to stack values). The allocations are pretty well optimized, but it does lead to some additional memory copies that impede performance.
>>
>> - On both, Linux and Windows, there are new, faster I/O APIs: io_uring and RIO. A PR by Tobias Pankrath (https://github.com/vibe-d/eventcore/pull/175) for io_uring exists, but it still needs to be finished.
>>
>> Another thing that needs to be tackled is better error propagation. Right now, there is often just a generic "error" status, without the possibility to get a more detailed error code or message.
>>
>> By the way, although the vibe.d HTTP implementation naturally adds some overhead over the raw network I/O, the vibe.d results in that list, judging by their poor performance on many-core machines, appear to be affected by GC runs, or possibly some other lock contention, whereas the basic HTTP request handling should be more or less GC-free. So those shouldn't be used for comparison.
> 
> First, io_uring is very much something to look forward to! When can you merge this PR?

It doesn't pass the tests and has conflicts, so it needs some work. I could look into that, too, but I don't have much time available.

> Secondly, how do you optimize memory allocation and release under high concurrency? Nbuff is a great library, and I've used it before.

Apart from accidental allocations in the timer code, there are very few allocations in eventcore itself. The allocation scheme exploits the small-integer nature of POSIX handles and keeps a fixed-size slot per file descriptor in an array of arrays. Buffer allocations for read/write operations are the responsibility of the library user.
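
As a rough sketch of that scheme (hypothetical names, not eventcore's actual implementation), a slot table indexed by descriptor could look like this:

```d
// POSIX handles are small integers, so one fixed-size slot per descriptor can
// live in an array of fixed-size pages, giving O(1) lookup without any
// per-operation allocation.
struct SlotTable(Slot, size_t pageSize = 1024)
{
    private Slot[][] pages;   // page directory; each page holds pageSize slots

    ref Slot opIndex(int fd)
    {
        auto page = cast(size_t) fd / pageSize;
        auto idx  = cast(size_t) fd % pageSize;
        if (page >= pages.length)
            pages.length = page + 1;          // grow the directory
        if (pages[page] is null)
            pages[page] = new Slot[pageSize]; // allocate a page on first use
        return pages[page][idx];
    }
}

// Hypothetical per-socket state; real slots would hold the pending callbacks.
struct SocketSlot { bool readPending; void delegate() onReadable; }

void example()
{
    SlotTable!SocketSlot slots;
    int fd = 42;                  // some accepted socket descriptor
    slots[fd].readPending = true; // slot is reused for the descriptor's lifetime
}
```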

In that regard, nbuff should be usable for high-level data buffers just fine, and although I haven't used it, it sounds like a very interesting concept in terms of using range interfaces with network data.

For vibe.d's HTTP server module, I'm using a free-list based allocation scheme, where each request gets a pre-allocated buffer that is later reused by another request. This means that after a warm-up phase, there will be few to no allocations per request, at least in the core HTTP handling code.
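
In essence, the free-list scheme works roughly like this (a simplified sketch with hypothetical names, not vibe.d's actual allocator):

```d
// Each request borrows a buffer and returns it when the request finishes, so
// a warmed-up server stops allocating per request.
struct BufferPool
{
    private ubyte[][] freeList;
    private size_t bufferSize;

    this(size_t bufferSize) { this.bufferSize = bufferSize; }

    ubyte[] acquire()
    {
        if (freeList.length == 0)
            return new ubyte[bufferSize];  // cold path: allocate once
        auto buf = freeList[$ - 1];
        freeList.length -= 1;              // warm path: pop, no allocation
        return buf;
    }

    void release(ubyte[] buf)
    {
        freeList ~= buf;                   // hand the buffer to the next request
    }
}
```
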
March 08, 2023 (ikod)
On Tuesday, 7 March 2023 at 07:55:04 UTC, Sönke Ludwig wrote:
> On 05.03.2023 at 16:14, zoujiaqing wrote:
>> On Monday, 20 February 2023 at 08:26:23 UTC, Sönke Ludwig wrote:

> In that regard, nbuff should be usable for high-level data buffers just fine, and although I haven't used it, it sounds like a very interesting concept in terms of using range interfaces with network data.
>

There are a few simple ideas behind nbuff:

1) All received network data is immutable; this allows buffers to be shared safely.
2) Usually we get data from the network as chunks of a "contiguous and endless" stream, but we are actually interested only in a small window of data that moves forward. So it would be nice to automate receiving new data chunks, processing them in the current "window", and safely throwing them away as soon as they are processed.

Nbuff manages a list of smart pointers to immutable byte buffers to implement this view of the problem.
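
Roughly, the moving-window idea could be sketched like this (a simplified illustration, not nbuff's actual API; the real library uses smart pointers with reference counting rather than plain GC-managed slices):

```d
// Chunks are immutable, so they can be shared safely; the consumer keeps only
// the chunks that overlap its current window and drops them as the window
// moves forward.
struct ChunkWindow
{
    immutable(ubyte)[][] chunks;   // chunks still referenced by the window

    void push(immutable(ubyte)[] chunk)
    {
        chunks ~= chunk;           // new data arrives at the back
    }

    void consume(size_t n)
    {
        // Advance the window by n bytes; fully consumed chunks are released
        // (in the real library a reference count would free them immediately).
        while (n > 0 && chunks.length > 0)
        {
            auto head = chunks[0];
            if (n >= head.length)
            {
                n -= head.length;
                chunks = chunks[1 .. $];   // drop the fully consumed chunk
            }
            else
            {
                chunks[0] = head[n .. $];  // keep only the unconsumed tail
                n = 0;
            }
        }
    }
}
```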