September 21, 2017
On Thursday, 21 September 2017 at 17:13:16 UTC, Daniel Kozak wrote:
> OK, maybe there are some small improvements, but nothing that would make this faster than Rust, and it is still under 30K on my PC

More test results:
On my Win10 PC: DMD/x86/libevent: 27-29K, Go: 31-33K

September 21, 2017
On Thursday, 21 September 2017 at 08:01:23 UTC, Vadim Lopatin wrote:
> There is a simple set of simple web server apps written in several languages (Go, Rust, Scala, Node-js):
>
> https://github.com/nuald/simple-web-benchmark
>
> I've sent PR to include D benchmark (vibe.d).
>
> I was hoping it could show performance at least not worse than other languages.
> But it appears to be slower than Go and even Node.js
>
> Are there any tips to achieve better performance in this test?
>
> Under windows, I can get vibe.d configured to use libevent to show results comparable with Go. With other configurations, it works two times slower.
>
> Under linux, vibe.d shows 45K requests/seconds, and Go - 50K. The only advantage of D here is CPU load - 90% vs 120% in Go.
>
> I'm using DMD. Probably, ldc could speed up it a bit.
>
> Probably, it's caused by single threaded async implementation while other languages are using parallel handling of requests?

Doesn't vibe-d use Fibers?

I tried to build a simple web server with a fiber-based approach once - it was horribly slow.

I hope C# (and soon C++) style stackless resumable functions will eventually come to D.


September 21, 2017
On Thursday, 21 September 2017 at 18:49:00 UTC, bitwise wrote:
> On Thursday, 21 September 2017 at 08:01:23 UTC, Vadim Lopatin wrote:
>> There is a simple set of simple web server apps written in several languages (Go, Rust, Scala, Node-js):
>>
>> https://github.com/nuald/simple-web-benchmark
>>
>> I've sent PR to include D benchmark (vibe.d).
>>
>> I was hoping it could show performance at least not worse than other languages.
>> But it appears to be slower than Go and even Node.js
>>
>> Are there any tips to achieve better performance in this test?
>>
>> Under windows, I can get vibe.d configured to use libevent to show results comparable with Go. With other configurations, it works two times slower.
>>
>> Under linux, vibe.d shows 45K requests/seconds, and Go - 50K. The only advantage of D here is CPU load - 90% vs 120% in Go.
>>
>> I'm using DMD. Probably, ldc could speed up it a bit.
>>
>> Probably, it's caused by single threaded async implementation while other languages are using parallel handling of requests?
>
> Doesn't vibe-d use Fibers?
>
> I tried to build a simple web server with a fiber-based approach once - it was horribly slow.
>
> I hope C# (and soon C++) style stackless resumable functions will eventually come to D.

It does. But Golang uses them, too. Goroutines.

September 21, 2017
On Thursday, 21 September 2017 at 18:55:04 UTC, Vadim Lopatin wrote:
> On Thursday, 21 September 2017 at 18:49:00 UTC, bitwise wrote:
>> On Thursday, 21 September 2017 at 08:01:23 UTC, Vadim Lopatin wrote:
>>> [...]
>>
>> Doesn't vibe-d use Fibers?
>>
>> I tried to build a simple web server with a fiber-based approach once - it was horribly slow.
>>
>> I hope C# (and soon C++) style stackless resumable functions will eventually come to D.
>
> It does. But Golang uses them, too. Goroutines.

Indeed. I'm reading about them right now, and they seem to be "multiplexed". I wonder if Vibe.d does something similar.

The fact that you've observed lower CPU usage by the D version makes me think some kind of scheduling or thread-priority issue is the cause.

For example, on Windows, the default timer resolution is coarse (roughly 15.6 ms per tick). It would seem reasonable to get 1000 iterations per second in the example below, but you get ~64.

`
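#include <windows.h>  // Sleep
#include <chrono>
#include <iostream>
using namespace std;
using namespace std::chrono;

int main() {
    auto now = steady_clock::now();
    auto done = now + milliseconds(10000);  // run for 10 seconds
    int iterations = 0;

    // Each Sleep(1) lasts until the next timer tick, not just 1 ms.
    while (steady_clock::now() < done) {
        ++iterations;
        Sleep(1);
    }

    cout << (iterations / 10) << endl;  // average iterations per second
}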
`

When I wrap the above code with timeBeginPeriod(1) and timeEndPeriod(1), I get ~550 on my machine.
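A minimal sketch of that wrapping, for anyone who wants to try it. `TimerResolutionGuard`, `sleep_1ms` and `sleeps_per_second` are hypothetical names made up for this post; only `timeBeginPeriod`, `timeEndPeriod` and `Sleep` are real Windows calls (link against winmm.lib). On other platforms the guard does nothing and the sleep falls back to `std::this_thread::sleep_for`, so the numbers there say nothing about the Windows timer.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>  // timeBeginPeriod / timeEndPeriod, link with winmm.lib
#endif

// Hypothetical RAII helper: requests a finer timer period on Windows for
// as long as the object lives. It is a no-op on other platforms.
struct TimerResolutionGuard {
    unsigned period_ms;

    explicit TimerResolutionGuard(unsigned ms) : period_ms(ms) {
#ifdef _WIN32
        timeBeginPeriod(period_ms);
#endif
    }
    ~TimerResolutionGuard() {
#ifdef _WIN32
        timeEndPeriod(period_ms);  // must match the timeBeginPeriod value
#endif
    }
    TimerResolutionGuard(const TimerResolutionGuard&) = delete;
    TimerResolutionGuard& operator=(const TimerResolutionGuard&) = delete;
};

// Sleeps for (nominally) one millisecond.
inline void sleep_1ms() {
#ifdef _WIN32
    Sleep(1);
#else
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
#endif
}

// Counts how many 1 ms sleeps complete per second, averaged over duration_s.
int sleeps_per_second(int duration_s) {
    assert(duration_s > 0);
    const auto done = std::chrono::steady_clock::now()
                    + std::chrono::seconds(duration_s);
    int iterations = 0;
    while (std::chrono::steady_clock::now() < done) {
        ++iterations;
        sleep_1ms();
    }
    return iterations / duration_s;
}
```

To compare, call `sleeps_per_second(10)` once on its own and once inside a scope that holds a `TimerResolutionGuard guard(1);`. The guard restores the previous setting when it goes out of scope.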

IIRC, you get similar behavior on macOS (maybe Linux too?) unless you explicitly raise the thread priority.

https://msdn.microsoft.com/en-us/library/windows/desktop/dd757624(v=vs.85).aspx

So if you're benchmarking anything that sleeps regularly, like an event-based framework, something like timeBeginPeriod/timeEndPeriod may help.


September 22, 2017
On Thursday, 21 September 2017 at 19:40:48 UTC, bitwise wrote:
> On Thursday, 21 September 2017 at 18:55:04 UTC, Vadim Lopatin
>> It does. But Golang uses them, too. Goroutines.
>
> Indeed. I'm reading about them right now, and they seem to be "multiplexed". I wonder if Vibe.d does something similar.
>
> The fact that you've observed lower CPU usage by the D version makes me think some kind of scheduling or thread-priority issue is the cause.

Fibers are switched when they wait for signals/events.
Waiting blocks the thread.
IMHO, the timer resolution should only affect switching between non-blocked threads.

September 22, 2017
Am 22.09.2017 um 09:45 schrieb Vadim Lopatin:
> On Thursday, 21 September 2017 at 19:40:48 UTC, bitwise wrote:
>> On Thursday, 21 September 2017 at 18:55:04 UTC, Vadim Lopatin
>>> It does. But Golang uses them, too. Goroutines.
>>
>> Indeed. I'm reading about them right now, and they seem to be "multiplexed". I wonder if Vibe.d does something similar.
>>
>> The fact that you've observed lower CPU usage by the D version makes me think some kind of scheduling or thread-priority issue is the cause.
> 
> Fibers are being switched by waiting for signals/events.
> Waiting blocks thread.
> Timer should affect only non-blocked threads switching IMHO.

What was the latest status? Could you observe any meaningful thread scaling?

I tested on a 32-core machine a while back and could observe the performance rising almost linearly when increasing the number of cores (as it should). The effect is obviously smaller on a dual-core system where the benchmark application runs on the same system, but even there it was clearly visible.

If the multi-threaded version doesn't show 100% CPU usage, that would mean that some kind of thread-blocking is happening - GC collections or lock contention would be the likely candidates for that. The latter shouldn't happen anymore, as everything except for the logger should be thread-local in the latest version.

BTW, I ran Daniel's version on my dual-core notebook against wrk (Linux) and got 75kreq/s when using runWorkerTask and ~56kreq/s when using just a single thread, which is about what I would expect, considering that wrk ran on the same machine.

September 22, 2017
Am 21.09.2017 um 20:49 schrieb bitwise:
> 
> Doesn't vibe-d use Fibers?
> 
> I tried to build a simple web server with a fiber-based approach once - it was horribly slow.
> 
> I hope C# (and soon C++) style stackless resumable functions will eventually come to D.

It uses them and the overhead actually diminishes once the application does anything meaningful. To test this, I created two low-level tests for eventcore that mimic a minimal HTTP server. AFAIR, I got around 300kreq/s on a single core without fibers and around 290kreq/s with fibers, which amounts to an overhead of about 0.1µs per request.
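For reference, the arithmetic behind that overhead figure can be sketched as follows. `per_request_overhead_us` is a hypothetical helper written for this post, not part of eventcore; it simply converts the two throughput numbers into time per request and subtracts them.

```cpp
#include <cassert>

// Per-request overhead in microseconds, given the throughput (requests per
// second) measured without and with the feature under test.
double per_request_overhead_us(double base_rps, double with_rps) {
    assert(base_rps > 0 && with_rps > 0);
    const double base_us = 1e6 / base_rps;  // time per request without fibers
    const double with_us = 1e6 / with_rps;  // time per request with fibers
    return with_us - base_us;
}
```

With the figures quoted above, `per_request_overhead_us(300000, 290000)` is 1e6/290000 - 1e6/300000, which comes to roughly 0.115 µs and is consistent with the "about 0.1µs" estimate.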

https://github.com/vibe-d/eventcore/tree/master/examples

Stackless fibers would be really nice to have because of the merged stacks and the lower amount of reserved memory required (even though this is not really a big issue on 64-bit systems), but for pure performance I don't think they would be a critical addition.

September 22, 2017
On Thursday, 21 September 2017 at 08:01:23 UTC, Vadim Lopatin wrote:
> There is a simple set of simple web server apps written in several languages (Go, Rust, Scala, Node-js):
>
> https://github.com/nuald/simple-web-benchmark
>
> I've sent PR to include D benchmark (vibe.d).
>
> I was hoping it could show performance at least not worse than other languages.
> But it appears to be slower than Go and even Node.js
>
> Are there any tips to achieve better performance in this test?
>
> Under windows, I can get vibe.d configured to use libevent to show results comparable with Go. With other configurations, it works two times slower.
>
> Under linux, vibe.d shows 45K requests/seconds, and Go - 50K. The only advantage of D here is CPU load - 90% vs 120% in Go.
>
> I'm using DMD. Probably, ldc could speed up it a bit.
>
> Probably, it's caused by single threaded async implementation while other languages are using parallel handling of requests?

It's a somewhat uneven benchmark, as you are testing default Go against default D plus Vibe.D.

One can use a faster framework, such as Go's Gin:

https://github.com/gin-gonic/gin

In my tests in the past with Vibe.D 0.8, Go was faster with the alternative frameworks.

September 23, 2017
On 9/21/17 11:49, bitwise wrote:
> On Thursday, 21 September 2017 at 08:01:23 UTC, Vadim Lopatin wrote:
>> There is a simple set of simple web server apps written in several
>> languages (Go, Rust, Scala, Node-js):
>>
>> https://github.com/nuald/simple-web-benchmark
>>
>> I've sent PR to include D benchmark (vibe.d).
>>
>> I was hoping it could show performance at least not worse than other
>> languages.
>> But it appears to be slower than Go and even Node.js
>>
>> Are there any tips to achieve better performance in this test?
>>
>> Under windows, I can get vibe.d configured to use libevent to show
>> results comparable with Go. With other configurations, it works two
>> times slower.
>>
>> Under linux, vibe.d shows 45K requests/seconds, and Go - 50K. The only
>> advantage of D here is CPU load - 90% vs 120% in Go.
>>
>> I'm using DMD. Probably, ldc could speed up it a bit.
>>
>> Probably, it's caused by single threaded async implementation while
>> other languages are using parallel handling of requests?
>
> Doesn't vibe-d use Fibers?
>
> I tried to build a simple web server with a fiber-based approach once -
> it was horribly slow.
>
> I hope C# (and soon C++) style stackless resumable functions will
> eventually come to D.
>
>

The purpose of Async/Await in C# is not to improve performance but to free up the thread while some long-running IO operation is taking place (such as talking to a remote server). In C# the biggest use case is ASP.NET/Core, where it allows the server to process many times more incoming requests (threads) than there are physical cores on the device. This works because another request is often waiting on some other work behind the scenes (DB query, HTTP call to a remote service, etc.).

In fact, MSFT says that Async/Await will decrease the performance of a single instance of execution and is not to be used in situations where the delay is less than about 50ms (as of 2011; I've heard that it could be even less with newer versions of the compiler), as it can actually take more time to dehydrate/rehydrate the thread than the blocking operation would have taken.

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;

September 23, 2017
On Friday, 22 September 2017 at 09:48:47 UTC, Sönke Ludwig wrote:
> Am 21.09.2017 um 20:49 schrieb bitwise:
>> 
>> Doesn't vibe-d use Fibers?
>> 
>> I tried to build a simple web server with a fiber-based approach once - it was horribly slow.
>> 
>> I hope C# (and soon C++) style stackless resumable functions will eventually come to D.
>
> It uses them and the overhead actually diminishes once the application does anything meaningful. To test this, I created two low-level tests for eventcore that mimic a minimal HTTP server. AFAIR, I got around 300kreq/s on a single core without fibers and around 290kreq/s with fibers, which amounts to an overhead of about 0.1µs per request.

Interesting - I thought the cost would be higher.

Of the few different architectures I tried, the fiber-based approach was much slower. It's possible that my implementation did too many unnecessary context switches.

> Stackless fibers would be really nice to have because of the merged stacks and the lower amount of reserved memory required (even though this is not a really big issue on 64-bit systems), but for pure performance I don't think they would be a critical addition.

I suppose this is off topic, but for games, or any realtime application where things need to run intermittently, but at high frequency and in high numbers, stackless resumable functions are a big win. A lot of the AI I've been writing lately (including some flocking behaviors) has been built on top of C# IEnumerators.
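To make the idea concrete, here is a rough hand-written equivalent of such a routine, written in C++ rather than C# to match the snippet earlier in the thread. `WaitThenAct`, `Step`, `move_next` and `current` are hypothetical names chosen to mimic the IEnumerator shape; this is not the actual AI code, only a sketch of what "stackless" means: all state lives in a small object, and resuming is a plain function call instead of a stack switch.

```cpp
#include <cassert>

// What the routine reports at each yield point.
enum class Step { Waiting, Acting };

// Hypothetical stackless routine: waits for a number of frames, acts once,
// then finishes. Its entire state is two ints' worth of members, so
// thousands of these can be stepped every frame without separate stacks.
class WaitThenAct {
public:
    explicit WaitThenAct(int delay_frames) : remaining_(delay_frames) {
        assert(delay_frames >= 0);
    }

    // Resumes the routine up to its next yield point.
    // Returns false once the routine has finished.
    bool move_next() {
        switch (state_) {
        case State::Wait:
            if (remaining_ > 0) {
                --remaining_;
                current_ = Step::Waiting;
                return true;
            }
            state_ = State::Done;  // act exactly once, finish on next call
            current_ = Step::Acting;
            return true;
        case State::Done:
            return false;
        }
        return false;
    }

    // The value produced by the most recent successful move_next().
    Step current() const { return current_; }

private:
    enum class State { Wait, Done };
    State state_ = State::Wait;
    int remaining_;
    Step current_ = Step::Waiting;
};
```

A game loop would keep a container of such objects and call `move_next()` on each one per frame, dropping those that return false. A language-level feature generates this kind of state machine automatically from straight-line code.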