Another performance problem

Dec 07, 2013

bearophile

Dec 08, 2013

David Nadlinger


I have found another case where the code compiled with LDC2 is slower than the same code compiled with dmd. This time the performance difference seems very large. The D code (I compile it on 32 bit Windows):

http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version

(It's the third D entry).

The dmd compile runs in about 1.22 seconds on my PC. The ldc2 compile is very slow.

I compile using:

dmd -O -release -inline -noboundscheck self_referential_sequence3.d

ldmd2 -O -release -inline -noboundscheck self_referential_sequence3.d
+
strip

Bye,
bearophile

On Sat, Dec 7, 2013 at 2:25 AM, bearophile <bearophileHUGS@lycos.com> wrote:
> http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version

Looking at the IR generated on Linux, I can't see any obvious big performance issues (such as e.g. GC allocations where there shouldn't be any), and indeed the code runs in < 1 s on my machine (can't test on Windows right now).

I did, however, find a rather severe bug: We emit the "__gshared static" globals in the MemoryPool struct as thread-local, which is a) a big correctness problem, and b) might cause substantial slowdown due to the additional overhead incurred when accessing them.

David

David Nadlinger:

> I did, however, find a rather severe bug: We emit the "__gshared
> static" globals in the MemoryPool struct as thread-local, which is a)
> a big correctness problem, and b) might cause substantial slowdown due
> to the additional overhead incurred when accessing them.

With DMD, omitting the "__gshared" annotation usually doesn't cause a significant slowdown of the code.

> Looking at the IR generated on Linux, I can't see any obvious big
> performance issues (such as e.g. GC allocations where there shouldn't
> be any), and indeed the code runs in < 1 s on my machine (can't test
> on Windows right now).

DMD: http://codepad.org/8e6RCzlz

LDC2: http://codepad.org/r2eYIOKg

Bye,
bearophile

On Saturday, 7 December 2013 at 01:25:30 UTC, bearophile wrote:
> I have found another case where the code compiled with LDC2 is slower than the same code compiled with dmd. This time the performance difference seems very large. The D code (I compile it on 32 bit Windows):
>
> http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version
>
> (It's the third D entry).
>
> The dmd compile runs in about 1.22 seconds on my PC. The ldc2 compile is very slow.
>
> I compile using:
>
> dmd -O -release -inline -noboundscheck self_referential_sequence3.d
>
> ldmd2 -O -release -inline -noboundscheck self_referential_sequence3.d
> +
> strip
>
> Bye,
> bearophile

In my case ldmd is faster than dmd

On Monday, 9 December 2013 at 00:11:37 UTC, bearophile wrote:
> Kozzi:
>
>> In my case ldmd is faster than dmd
>
> What is your operating system and compiler versions used?
>
> Bye,
> bearophile

Archlinux:

LDC - the LLVM D compiler (0.12.1): based on DMD v2.063.2 and LLVM 3.3

DMD - DMD64 D Compiler v2.064

On Sunday, 8 December 2013 at 14:03:18 UTC, David Nadlinger wrote:
> I did, however, find a rather severe bug: We emit the "__gshared
> static" globals in the MemoryPool struct as thread-local, which is a)
> a big correctness problem, and b) might cause substantial slowdown due
> to the additional overhead incurred when accessing them.

Turns out that this is actually a DMD issue: http://d.puremagic.com/issues/show_bug.cgi?id=4419

You might want to watch out for this trap in the future (or annoy people to fix it in the frontend, of course).

David
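Editor's illustration of the distinction discussed above: a minimal sketch, assuming only standard D2 semantics, of how a plain `static` member (thread-local) differs from a `__gshared` member (one process-wide variable). The names `Pool`, `perThread`, `sharedCount` and `observeFromCaller` are hypothetical and are not taken from the Rosetta Code entry. The sketch does not reproduce issue 4419 itself; it only shows why it matters which of the two storage classes a global ends up with.

```d
import core.thread : Thread;
import std.stdio : writeln;

// Hypothetical stand-in for a struct holding global state.
struct Pool
{
    // Plain `static` data is thread-local in D2: each thread has its own copy.
    static int perThread;

    // `__gshared` data is a single variable shared by all threads,
    // like a global in C. No synchronization is provided.
    __gshared int sharedCount;
}

/// Increments both counters in a second thread, then returns the values
/// that the calling thread observes afterwards.
int[2] observeFromCaller()
{
    Pool.perThread = 0;
    Pool.sharedCount = 0;

    auto worker = new Thread({
        Pool.perThread += 1;   // modifies the worker thread's own copy
        Pool.sharedCount += 1; // modifies the one process-wide variable
    });
    worker.start();
    worker.join();

    return [Pool.perThread, Pool.sharedCount];
}

void main()
{
    writeln(observeFromCaller());
}
```

The calling thread should see `perThread` unchanged and `sharedCount` incremented. If a compiler treats a variable that was meant to be `__gshared` as thread-local, code that relies on sharing it across threads silently works on separate copies, which is the correctness problem David describes; the access cost of thread-local storage is a separate, platform-dependent matter.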

December 24, 2013

Re: Another performance problem

Posted by bearophile
in reply to David Nadlinger


David Nadlinger:

> Turns out that this is actually a DMD issue: http://d.puremagic.com/issues/show_bug.cgi?id=4419
>
> You might want to watch out for this trap in the future (or annoy people to fix it in the frontend, of course).

I don't fully understand.

I didn't know about issue 4419, it looks like an ugly bug.

The same code from Rosettacode was compiled with both ldc2 and dmd, so if there's a front-end bug it should hit both compilers.

Also here the performance difference I have seen is so large (20+?) that I don't think thread local bugs could be enough to justify it.

The last version of the code I put on Rosettacode can't be compiled with the LDC2 I have because it uses a recent bug fix (it uses the .ptr of a zero length field, that until now was always null), but the Wiki site keeps all the older versions of the page. So I have used the previous version of the D code, with and without swapping static and __gshared. The code compiled with dmd has exactly the same performance as before, and the ldc2 code is still as slow as before in both cases.

So I think Issue 4419 is not the cause of this problem, unless there's something I don't understand still.

Bye,
bearophile

On Tuesday, 24 December 2013 at 07:39:14 UTC, bearophile wrote:
> I don't fully understand.

I wasn't trying to imply that this is the reason for the slowdown you observe, just that my initial assessment of it as a severe LDC bug was wrong.

As for the actual issue, I couldn't reproduce it yet…

David
