November 17, 2019
On Sat, Nov 16, 2019 at 5:50 PM Jacob Shtokolov via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
> On Saturday, 16 November 2019 at 16:34:58 UTC, Jacob Shtokolov wrote:
> > Just tried to compile and run Base64
>
> The Havlak test is closer to reality:
>
> ```
> Nim:    12.24s, 477.8Mb
> C++:    17.33s, 179.3Mb
> Golang: 21.58s, 358.0Mb
> D LDC2: 23.55s, 460.4Mb
> D DMD:  29.04s, 461.9Mb
> ```
>
> Nim is the winner.
>
> But here I would look into the code: what makes LDC produce such a poorly optimized binary?
>

The LDC binary is OK; this is about the GC. I was able to make it almost twice as fast for LDC with some improvements.
November 17, 2019
On Sun, Nov 17, 2019 at 11:36 AM Daniel Kozak <kozzi11@gmail.com> wrote:
>
> > Nim is the winner.
> >
> > But here I would look into the code: what makes LDC produce such a poorly optimized binary?
> >
>
> The LDC binary is OK; this is about the GC. I was able to make it almost twice as fast for LDC with some improvements.

original code
Golang: 22.74s, 364.1Mb
D LDC2: 29.55s, 463.9Mb
D DMD:  29.42s, 462.5Mb
D GDC:  25.28s, 415.3Mb
Nim:    14.26s, 468.9Mb

with small changes:
Golang: 22.74s, 364.1Mb
D LDC2: 15.90s, 389.8Mb
D DMD:  16.86s, 387.3Mb
D GDC:  19.48s, 403.8Mb
Nim:    14.26s, 468.9Mb
November 17, 2019
On Saturday, 16 November 2019 at 16:45:02 UTC, Jacob Shtokolov wrote:
> On Saturday, 16 November 2019 at 16:34:58 UTC, Jacob Shtokolov wrote:
>> Just tried to compile and run Base64
>
> The Havlak test is closer to reality:
>
> ```
> Nim:    12.24s, 477.8Mb
> C++:    17.33s, 179.3Mb
> Golang: 21.58s, 358.0Mb
> D LDC2: 23.55s, 460.4Mb
> D DMD:  29.04s, 461.9Mb
> ```
>
> Nim is the winner.
>
> But here I would look into the code: what makes LDC produce such a poorly optimized binary?

C++ memory consumption is way lower than the rest. Is this because of the tracing GC penalty? It would have been interesting to see Rust here, since it doesn't use a GC, to check whether it would get close to the C++ memory consumption.
November 17, 2019
On 11/17/19 6:04 AM, Daniel Kozak wrote:
> On Sun, Nov 17, 2019 at 11:36 AM Daniel Kozak <kozzi11@gmail.com> wrote:
>> The LDC binary is OK; this is about the GC. I was able to make it
>> almost twice as fast for LDC with some improvements.


Can you summarize or share the changes for learning purposes?
November 17, 2019
On Sunday, 17 November 2019 at 10:36:41 UTC, Daniel Kozak wrote:
> The LDC binary is OK; this is about the GC. I was able to make it almost twice as fast for LDC with some improvements.

Just checked the code and found that it's allocating with `new` inside loops. It would be very interesting to see what changes you made to make it run so much faster!

Could you please share it somewhere?
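Not the benchmark's actual code, but a minimal sketch of the pattern being described (function and variable names here are hypothetical): a fresh `new` allocation on every loop iteration produces short-lived garbage for the GC to collect, while hoisting the allocation out of the loop reuses a single buffer.

```d
import std.array : appender;

// Per-iteration allocation: every pass through the loop creates a new
// GC-managed array that becomes garbage almost immediately.
int[] rowSumsNaive(int n)
{
    int[] results;
    foreach (i; 0 .. n)
    {
        auto row = new int[](3);          // fresh allocation each iteration
        row[0] = i; row[1] = 2 * i; row[2] = 3 * i;
        results ~= row[0] + row[1] + row[2];
    }
    return results;
}

// Same result with one allocation reused across iterations, plus an
// Appender for the output so growth is amortized.
int[] rowSumsReused(int n)
{
    auto results = appender!(int[])();
    auto row = new int[](3);              // allocated once, reused
    foreach (i; 0 .. n)
    {
        row[0] = i; row[1] = 2 * i; row[2] = 3 * i;
        results.put(row[0] + row[1] + row[2]);
    }
    return results.data;
}
```

Both versions compute the same values; the second simply gives the GC far less to do in a hot loop.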

November 17, 2019
On Sun, Nov 17, 2019 at 2:50 PM Jacob Shtokolov via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
> On Sunday, 17 November 2019 at 10:36:41 UTC, Daniel Kozak wrote:
> The LDC binary is OK; this is about the GC. I was able to make it almost twice as fast for LDC with some improvements.
>
> Just checked the code and found that it's allocating with `new` inside loops. It would be very interesting to see what changes you made to make it run so much faster!
>
> Could you please share it somewhere?
>
Sorry, I forgot to insert the link. It is on my GitHub: https://github.com/Kozzi11/benchmarks/tree/improve_d
November 17, 2019
On Sunday, 17 November 2019 at 11:04:55 UTC, Daniel Kozak wrote:
> On Sun, Nov 17, 2019 at 11:36 AM Daniel Kozak <kozzi11@gmail.com> wrote:
>>
>> > Nim is the winner.
>> >
>> > But here I would look into the code: what makes LDC produce such a poorly optimized binary?
>> >
>>
>> The LDC binary is OK; this is about the GC. I was able to make it almost twice as fast for LDC with some improvements.
>
> original code
> Golang: 22.74s, 364.1Mb
> D LDC2: 29.55s, 463.9Mb
> D DMD:  29.42s, 462.5Mb
> D GDC:  25.28s, 415.3Mb
> Nim:    14.26s, 468.9Mb
>
> with small changes:
> Golang: 22.74s, 364.1Mb
> D LDC2: 15.90s, 389.8Mb
> D DMD:  16.86s, 387.3Mb
> D GDC:  19.48s, 403.8Mb
> Nim:    14.26s, 468.9Mb

With full LTO, I'm seeing an additional 5% boost on Windows (`-flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto`). As they are using gcc LTO for the brainfuck2 benchmark too (https://github.com/kostya/benchmarks/blob/2777925c4e64987e83e9a53478910de080408057/brainfuck2/build.sh#L5), I wouldn't consider it cheating.
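For reference, a build line combining those flags might look like the following; the source and output file names are assumptions, and `-O3 -release` are the usual optimization flags rather than something stated above.

```shell
# Full LTO for both the program and the LTO-enabled druntime/Phobos
# libraries shipped with LDC (file names here are hypothetical).
ldc2 -O3 -release -flto=full \
     -defaultlib=phobos2-ldc-lto,druntime-ldc-lto \
     havlak.d -of=havlak
```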
November 17, 2019
On Sunday, 17 November 2019 at 14:15:00 UTC, Daniel Kozak wrote:
> Sorry, I forgot to insert the link. It is on my GitHub: https://github.com/Kozzi11/benchmarks/tree/improve_d

Now it's faster than the C++ version on my machine:

```
Nim:    12.01s, 478.1Mb
D LDC2: 13.48s, 428.1Mb
C++:    19.97s, 179.3Mb
Golang: 21.90s, 364.7Mb
```

So basically the only critical change was to replace the built-in associative arrays with Appender types?

That's really amazing!
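A minimal sketch (not the benchmark code itself) of the kind of substitution being discussed: when keys are dense integers, a flat array grown through `std.array.Appender` can stand in for a built-in associative array, skipping hashing and per-bucket allocations.

```d
import std.array : appender;

// The straightforward version: an AA keyed by dense integers.
int[int] squaresViaAA(int n)
{
    int[int] m;
    foreach (i; 0 .. n)
        m[i] = i * i;      // each insertion may allocate a bucket
    return m;
}

// The same mapping as a flat array built with an Appender; reserve()
// pre-sizes the buffer when the final length is known up front.
int[] squaresViaAppender(int n)
{
    auto app = appender!(int[])();
    app.reserve(n);
    foreach (i; 0 .. n)
        app.put(i * i);    // amortized growth, no hashing
    return app.data;
}
```

Lookup by index is then a plain array access, which is also friendlier to the GC scan than an AA's bucket graph.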
November 17, 2019
> So basically the only critical change was to replace the built-in associative arrays with Appender types?
>
> That's really amazing!

Not only that. Another change is not pre-filling the `number` AA with UNVISITED, and another is disabling the parallel GC, because it causes a performance decrease here.
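For anyone wanting to try the parallel-GC change: druntime's GC options can be set without touching the allocation code, either on the command line or baked into the binary. A sketch, assuming a druntime recent enough (2.087+) to have parallel marking:

```d
// Compile-time default: embed a runtime option disabling parallel marking.
// The same effect is available at run time, without recompiling:
//     ./app "--DRT-gcopt=parallel:0"
extern(C) __gshared string[] rt_options = ["gcopt=parallel:0"];

void main()
{
    // the program now runs with single-threaded GC marking
}
```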
November 17, 2019
On Sunday, 17 November 2019 at 16:25:52 UTC, Daniel Kozak wrote:
>> So basically the only critical change was to replace the built-in associative arrays with Appender types?
>>
>> That's really amazing!
>
> Not only that. Another change is not pre-filling the `number` AA with UNVISITED, and another is disabling the parallel GC, because it causes a performance decrease here.

Regarding the benefits seen from switching from AAs to Appenders: this is a nice performance improvement, and a nice example of the kind of optimization often available in D programs.

At a high level, I feel I've seen this pattern a number of times. When people starting with D run benchmarks as part of their initial experiments, they naturally start with the simplest and most straightforward programming approaches. Nothing wrong with this. It's a strength of D that quality code can be written quickly.

However, in many cases these simple approaches allocate a fair bit of GC memory, memory that becomes unused quickly and needs to be GC collected. Again, nothing wrong with this. But, I have the impression that many times there is an expectation that such code will perform similarly to code using manually managed memory in other native compiled languages. And often this expectation is not met, as memory allocation and use patterns are a major performance driver.

What often gets missed in these assessments is that D has quite a few mechanisms for better memory management that don't require dropping GC paradigms entirely and moving to fully manually managed memory. Modifying performance-sensitive programs to use these mechanisms is often not hard. The switch here from AAs to Appenders is an example.

Being able to improve program performance in this way is a strength of D. One consideration is that until one has some experience with the language, it may not be obvious that these options exist or which specific changes and approaches can be used. This can lead to perception issues if nothing else.

--Jon