Thread overview
Another performance problem
Dec 07, 2013
bearophile
Dec 08, 2013
David Nadlinger
Dec 08, 2013
bearophile
Dec 24, 2013
David Nadlinger
Dec 24, 2013
bearophile
Dec 24, 2013
David Nadlinger
Dec 08, 2013
Kozzi
Dec 09, 2013
bearophile
Dec 10, 2013
Daniel Kozak
December 07, 2013
I have found another case where the code compiled with LDC2 is slower than the same code compiled with dmd. This time the performance difference seems very large. The D code (I compile it on 32-bit Windows):

http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version

(It's the third D entry).

The dmd-compiled binary runs in about 1.22 seconds on my PC, while the ldc2-compiled one is very slow.

I compile using:

dmd -O -release -inline -noboundscheck self_referential_sequence3.d

ldmd2 -O -release -inline -noboundscheck self_referential_sequence3.d
(followed by strip)

Bye,
bearophile
December 08, 2013
On Sat, Dec 7, 2013 at 2:25 AM, bearophile <bearophileHUGS@lycos.com> wrote:
> http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version

Looking at the IR generated on Linux, I can't see any obvious big performance issues (such as GC allocations where there shouldn't be any), and indeed the code runs in < 1 s on my machine (I can't test on Windows right now).

I did, however, find a rather severe bug: we emit the "__gshared static" globals in the MemoryPool struct as thread-local, which a) is a big correctness problem and b) might cause a substantial slowdown due to the additional overhead incurred when accessing them.
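A minimal D sketch of the distinction described above, assuming a hypothetical struct `Pool` (not the MemoryPool from the Rosetta Code entry): a plain `static` field is thread-local in D, so every thread sees its own copy, while a `static __gshared` field is a single process-wide variable. If a field meant to be `__gshared` is emitted as thread-local, a value written by one thread is invisible to the others, and on some targets each access also goes through the thread's TLS block.

```d
// Sketch with a hypothetical struct `Pool` (not the MemoryPool from the
// Rosetta Code entry): thread-local vs. __gshared static fields.
import core.thread : Thread;
import std.stdio : writeln;

struct Pool {
    static int perThread;        // D default: thread-local, one copy per thread
    static __gshared int global; // one process-wide copy shared by all threads
}

// Body of the worker thread: writes 42 into both fields.
void worker() {
    Pool.perThread = 42; // only changes the worker thread's own copy
    Pool.global = 42;    // changes the single shared copy
}

// Resets both fields, runs `worker` on a second thread, then returns the
// values as seen from the calling thread.
int[2] observeFromCaller() {
    Pool.perThread = 0;
    Pool.global = 0;
    auto t = new Thread(&worker);
    t.start();
    t.join(); // join() also orders the worker's writes before our reads
    return [Pool.perThread, Pool.global];
}

void main() {
    writeln(observeFromCaller()); // prints [0, 42]
}
```

The sketch only shows which copy each thread sees; it does not measure the access cost, which depends on the target and the compiler.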

David
December 08, 2013
David Nadlinger:

> I did, however, find a rather severe bug: We emit the "__gshared
> static" globals in the MemoryPool struct as thread-local, which is a)
> a big correctness problem, and b) might cause substantial slowdown due
> to the additional overhead incurred when accessing them.

In my experience with DMD, leaving out the "__gshared" annotation usually doesn't cause a significant slowdown.



> Looking at the IR generated on Linux, I can't see any obvious big
> performance issues (such as e.g. GC allocations where there shouldn't
> be any), and indeed the code runs in < 1 s on my machine (can't test
> on Windows right now).

DMD:
http://codepad.org/8e6RCzlz

LDC2:
http://codepad.org/r2eYIOKg

Bye,
bearophile
December 08, 2013
On Saturday, 7 December 2013 at 01:25:30 UTC, bearophile wrote:
> I have found another case where the code compiled with LDC2 is slower than the same code compiled with dmd. This time the performance difference seems very large. The D code (I compile it on 32 bit Windows):
>
> http://rosettacode.org/wiki/Self-referential_sequence#Faster_Low-level_Version
>
> (It's the third D entry).
>
> The dmd compile runs in about 1.22 seconds on my PC. The ldc2 compile is very slow.
>
> I compile using:
>
> dmd -O -release -inline -noboundscheck self_referential_sequence3.d
>
> ldmd2 -O -release -inline -noboundscheck self_referential_sequence3.d
> +
> strip
>
> Bye,
> bearophile
In my case, ldmd is faster than dmd.
December 09, 2013
Kozzi:

> In my case ldmd is faster than dmd

What operating system and compiler versions are you using?

Bye,
bearophile
December 10, 2013
On Monday, 9 December 2013 at 00:11:37 UTC, bearophile wrote:
> Kozzi:
>
>> In my case ldmd is faster than dmd
>
> What is your operating system and compiler versions used?
>
> Bye,
> bearophile

Arch Linux:
LDC - the LLVM D compiler (0.12.1):
  based on DMD v2.063.2 and LLVM 3.3

DMD - DMD64 D Compiler v2.064
December 24, 2013
On Sunday, 8 December 2013 at 14:03:18 UTC, David Nadlinger wrote:
> I did, however, find a rather severe bug: We emit the "__gshared
> static" globals in the MemoryPool struct as thread-local, which is a)
> a big correctness problem, and b) might cause substantial slowdown due
> to the additional overhead incurred when accessing them.

It turns out that this is actually a DMD front-end issue: http://d.puremagic.com/issues/show_bug.cgi?id=4419

You might want to watch out for this trap in the future (or annoy people into fixing it in the front end, of course).

David
December 24, 2013
David Nadlinger:

> Turns out that this is actually a DMD issue: http://d.puremagic.com/issues/show_bug.cgi?id=4419
>
> You might want to watch out for this trap in the future (or annoy people to fix it in the frontend, of course).

I don't fully understand.

I didn't know about issue 4419, it looks like an ugly bug.

The same code from Rosetta Code was compiled with both ldc2 and dmd, so if there's a front-end bug, it should affect both compilers.

Also, the performance difference I have seen here is so large (20+?) that I don't think a thread-local bug alone could explain it.

The latest version of the code I put on Rosetta Code can't be compiled with the LDC2 I have, because it relies on a recent bug fix (it uses the .ptr of a zero-length field, which until now was always null). But the wiki keeps all the older versions of the page, so I used the previous version of the D code, both with and without swapping static and __gshared. The code compiled with dmd performs exactly as before, and the ldc2 code is still as slow as before in both cases.

So I think Issue 4419 is not the cause of this problem, unless there's something I still don't understand.

Bye,
bearophile
December 24, 2013
On Tuesday, 24 December 2013 at 07:39:14 UTC, bearophile wrote:
> I don't fully understand.

I wasn't trying to imply that this is the reason for the slowdown you observe, just that my initial claim that it was a severe LDC bug was wrong.

As for the actual issue, I haven't been able to reproduce it yet…

David