vectorization of a simple loop -- not in DMD?
July 11, 2022

Hi.

I'm looking at the compiler output of DMD (-O -release), LDC (-O -release), and GDC (-O3) for a simple array operation:

void add1 (int [] a)
{
    foreach (i; 0..a.length)
        a[i] += 1;
}

Here are the outputs: https://godbolt.org/z/GcznbjEaf

From what I gather at the view linked above, DMD does not use XMM registers for speedup, and does not unroll the loop either. Switching between 32bit and 64bit doesn't help either. However, I recall in the past it was capable of at least some of these optimizations. So, how do I enable them for such a function?

Ivan Kazmenko.

July 11, 2022

On Monday, 11 July 2022 at 18:15:16 UTC, Ivan Kazmenko wrote:

> Hi.
>
> I'm looking at the compiler output of DMD (-O -release), LDC (-O -release), and GDC (-O3) for a simple array operation:
>
> void add1 (int [] a)
> {
>     foreach (i; 0..a.length)
>         a[i] += 1;
> }
>
> Here are the outputs: https://godbolt.org/z/GcznbjEaf
>
> From what I gather at the view linked above, DMD does not use XMM registers for speedup, and does not unroll the loop either. Switching between 32bit and 64bit doesn't help either. However, I recall in the past it was capable of at least some of these optimizations. So, how do I enable them for such a function?
>
> Ivan Kazmenko.

How long ago is the past? The godbolt.org dmd is quite old.

The dmd backend is ancient, it isn't really capable of these kinds of loop optimizations.
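
For comparison when reading the assembly, here is a minimal sketch that puts the loop from the first post next to the equivalent D array operation. The name add1ArrayOp is made up for illustration, and nothing here claims that DMD vectorizes either form; that has to be checked in the compiler output.

```d
// The loop from the original post, element by element.
void add1(int[] a)
{
    foreach (i; 0 .. a.length)
        a[i] += 1;
}

// Hypothetical name: the same update written as a D array operation.
// Whether a given compiler emits SIMD for it is an assumption to verify
// in the generated assembly, not something this sketch guarantees.
void add1ArrayOp(int[] a)
{
    a[] += 1;
}

void main()
{
    auto x = [1, 2, 3, 4, 5];
    auto y = x.dup;
    add1(x);
    add1ArrayOp(y);
    // Both forms must produce the same result.
    assert(x == [2, 3, 4, 5, 6]);
    assert(x == y);
}
```

Pasting both functions into the compiler explorer view makes it easy to see which form, if any, ends up using XMM registers.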

July 11, 2022

On Monday, 11 July 2022 at 18:15:16 UTC, Ivan Kazmenko wrote:

> Hi.
>
> I'm looking at the compiler output of DMD (-O -release), LDC (-O -release), and GDC (-O3) for a simple array operation:
>
> void add1 (int [] a)
> {
>     foreach (i; 0..a.length)
>         a[i] += 1;
> }
>
> Here are the outputs: https://godbolt.org/z/GcznbjEaf
>
> From what I gather at the view linked above, DMD does not use XMM registers for speedup, and does not unroll the loop either.
> [snip]

Specifying a SIMD capable target will reveal an even wider gap in capability. (LDC -mcpu=x86-64-v3 or gdc -march=x86-64-v3).
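
To see which SIMD level a given set of target flags actually selects, a small sketch like the following can help. It relies on the predefined version identifiers D_SIMD, D_AVX and D_AVX2 from the language specification; the function name simdLevel is made up, and which identifiers each compiler sets for a particular flag is an assumption to check on your own setup.

```d
import std.stdio : writeln;

// Hypothetical helper: reports the highest SIMD-related predefined
// version identifier that is active for the current compilation.
string simdLevel()
{
    version (D_AVX2)
        return "AVX2";
    else version (D_AVX)
        return "AVX";
    else version (D_SIMD)
        return "SIMD";
    else
        return "none";
}

void main()
{
    // The printed value depends on the compiler and the target flags used.
    writeln("SIMD level seen at compile time: ", simdLevel());
}
```

Building it once with the default target and once with the x86-64-v3 flags mentioned above shows whether the wider instruction set was picked up.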

July 11, 2022

On Monday, 11 July 2022 at 18:19:41 UTC, max haughton wrote:

> The dmd backend is ancient, it isn't really capable of these kinds of loop optimizations.

I've said it several times before. Just deprecate the DMD backend, it's just not up to the task anymore. This is not a criticism of its original purpose, as back in the 90s and early 2000s it made sense to create your own backend. Time has moved on and we have LLVM and GCC backends with a lot of CPU support that the D project could never achieve by itself. The D project should just can the DMD backend in order to free up resources for more important tasks.

Some people say they like it because it is fast, yes it is fast because it doesn't do much.

July 11, 2022

On Monday, 11 July 2022 at 21:46:10 UTC, IGotD- wrote:

> On Monday, 11 July 2022 at 18:19:41 UTC, max haughton wrote:
>
>> The dmd backend is ancient, it isn't really capable of these kinds of loop optimizations.
>
> I've said it several times before. Just deprecate the DMD backend, it's just not up to the task anymore. This is not a criticism of its original purpose, as back in the 90s and early 2000s it made sense to create your own backend. Time has moved on and we have LLVM and GCC backends with a lot of CPU support that the D project could never achieve by itself. The D project should just can the DMD backend in order to free up resources for more important tasks.
>
> Some people say they like it because it is fast, yes it is fast because it doesn't do much.

I use D because DMD compiles my huge project in ~1 second (full clean rebuild)

It is a competitive advantage that many languages don't have

LDC clean full rebuild

$ time dub build -f --compiler=ldc2
Performing "debug" build using ldc2 for x86_64.
game ~master: building configuration "desktop"...
Linking...

real    0m18.033s
user    0m0.000s
sys     0m0.015s

LDC incremental

$ time dub build --compiler=ldc2
Performing "debug" build using ldc2 for x86_64.
game ~master: building configuration "desktop"...
Linking...

real    0m17.215s
user    0m0.000s
sys     0m0.000s

DMD clean full rebuild

$ time dub build -f --compiler=dmd
Performing "debug" build using dmd for x86_64.
game ~master: building configuration "desktop"...
Linking...

real    0m1.348s
user    0m0.031s
sys     0m0.015s

DMD incremental

$ time dub build --compiler=dmd
Performing "debug" build using dmd for x86_64.
game ~master: building configuration "desktop"...
Linking...

real    0m1.249s
user    0m0.000s
sys     0m0.000s

The day DMD gets removed is the day I will go to a different language

I want to thank Walter for maintaining DMD the compiler, and making it incredibly fast at compiling code

Release perf can't beat LLVM and its amount of optimizations, but the advantage is it allows VERY FAST and QUICK iteration time, it is ESSENTIAL for developing software

July 12, 2022

On Monday, 11 July 2022 at 22:16:05 UTC, ryuukk_ wrote:

> I use D because DMD compiles my huge project in ~1 second (full clean rebuild)
>
> It is a competitive advantage that many languages don't have

Other programming languages typically use an interpreter for quick iterations and rapid development. For example, the Python programming language has the CPython interpreter, the PyPy just-in-time compiler, and the Cython optimizing static compiler (not perfect right now, but it shows a lot of promise).

D still has a certain advantage over interpreters, because DMD-generated code is typically at most about twice as slow as LDC-generated code. If the x86 architecture stops being dominant in the future and gets displaced by ARM or RISC-V, then this may become a problem for DMD. But we'll cross that bridge when we get there.
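
A ratio like this is easy to check on your own code by timing the same program built with each compiler. Below is a minimal sketch, assuming the add1 loop from the first post as the workload; the helper name timeAdd1, the array size, and the repetition count are arbitrary choices for illustration, not a benchmark anyone in this thread ran.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// The loop from the first post.
void add1(int[] a)
{
    foreach (i; 0 .. a.length)
        a[i] += 1;
}

// Hypothetical helper: runs add1 over the array `reps` times
// and returns the elapsed wall-clock time.
Duration timeAdd1(int[] data, int reps)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (rep; 0 .. reps)
        add1(data);
    sw.stop();
    return sw.peek;
}

void main()
{
    enum reps = 100;
    auto data = new int[](1 << 20);
    auto elapsed = timeAdd1(data, reps);
    // Printing an element uses the result of the timed work.
    writefln("%s passes over %s ints: %s ms (data[0] = %s)",
             reps, data.length, elapsed.total!"msecs", data[0]);
}
```

Compile it with the same flags as in the first post (for example -O -release for both DMD and LDC) and compare the reported times; the numbers will vary by machine and workload.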

July 12, 2022

On Monday, 11 July 2022 at 22:16:05 UTC, ryuukk_ wrote:

> LDC clean full rebuild
>
> $ time dub build -f --compiler=ldc2
> Performing "debug" build using ldc2 for x86_64.
> game ~master: building configuration "desktop"...
> Linking...
>
> real    0m18.033s
> user    0m0.000s
> sys     0m0.015s
>
> DMD clean full rebuild
>
> $ time dub build -f --compiler=dmd
> Performing "debug" build using dmd for x86_64.
> game ~master: building configuration "desktop"...
> Linking...
>
> real    0m1.348s
> user    0m0.031s
> sys     0m0.015s

BTW, I'm very curious about the reason for such a huge build time difference, but I can't reproduce it on my computer. For example, compiling the DUB source code itself via the same DUB commands shows DMD building only roughly twice as fast (which is great, but nowhere close to a ~13x difference):

$ git clone https://github.com/dlang/dub.git
$ cd dub
$ time dub build -f --compiler=ldc2
Performing "debug" build using ldc2 for x86_64.
dub 1.29.1+commit.38.g7f6f024f: building configuration "application"...
Serializing composite type Flags!(BuildRequirement) which has no serializable fields
Serializing composite type Flags!(BuildOption) which has no serializable fields
Linking...

real	0m34.371s
user	0m32.883s
sys	0m1.488s
$ time dub build -f --compiler=dmd
Performing "debug" build using dmd for x86_64.
dub 1.29.1+commit.38.g7f6f024f: building configuration "application"...
Serializing composite type Flags!(BuildRequirement) which has no serializable fields
Serializing composite type Flags!(BuildOption) which has no serializable fields
Linking...

real	0m14.078s
user	0m12.941s
sys	0m1.129s

Is there an open source DUB package, which can be used to reproduce a huge build time difference between LDC and DMD?

July 12, 2022

On Tuesday, 12 July 2022 at 07:06:37 UTC, Siarhei Siamashka wrote:

> real	0m34.371s
> user	0m32.883s
> sys	0m1.488s
>
> real	0m14.078s
> user	0m12.941s
> sys	0m1.129s
>
> Is there an open source DUB package, which can be used to reproduce a huge build time difference between LDC and DMD?

You don't think this difference is huge? DMD is over 2x as fast.

July 12, 2022

On Tuesday, 12 July 2022 at 07:58:44 UTC, bauss wrote:

> You don't think this difference is huge? DMD is over 2x as fast.

I think that DMD having more than 10x faster compilation speed in ryuukk_'s project shows that there is likely either a misconfiguration in DUB build setup or some other low hanging fruit for LDC. This looks like an opportunity to easily improve something in a major way.

July 12, 2022

On Tuesday, 12 July 2022 at 09:18:02 UTC, Siarhei Siamashka wrote:

> On Tuesday, 12 July 2022 at 07:58:44 UTC, bauss wrote:
>
>> You don't think this difference is huge? DMD is over 2x as fast.
>
> I think that DMD having more than 10x faster compilation speed in ryuukk_'s project shows that there is likely either a misconfiguration in DUB build setup or some other low hanging fruit for LDC. This looks like an opportunity to easily improve something in a major way.

You were right! Looks like I accidentally put a dflags entry (O3) into the debug config for ldc!

$ time dub build -f --compiler=ldc2
Performing "debug" build using ldc2 for x86_64.
game ~master: building configuration "desktop"...
Linking...
   Creating library .dub\build\desktop-debug-windows-x86_64-ldc_v1.30.0-beta1-4B08B3C693144187830F0F15271A53A3\game.lib and object .dub\build\desktop-debug-windows-x86_64-ldc_v1.30.0-beta1-4B08B3C693144187830F0F15271A53A3\game.exp
LINK : warning LNK4098: defaultlib 'libvcruntime' conflicts with use of other libs; use /NODEFAULTLIB:library

real    0m4.521s
user    0m0.000s
sys     0m0.000s

Incremental:

$ time dub build --compiler=ldc2
Performing "debug" build using ldc2 for x86_64.
game ~master: building configuration "desktop"...
Linking...
   Creating library .dub\build\desktop-debug-windows-x86_64-ldc_v1.30.0-beta1-4B08B3C693144187830F0F15271A53A3\game.lib and object .dub\build\desktop-debug-windows-x86_64-ldc_v1.30.0-beta1-4B08B3C693144187830F0F15271A53A3\game.exp
LINK : warning LNK4098: defaultlib 'libvcruntime' conflicts with use of other libs; use /NODEFAULTLIB:library

real    0m4.516s
user    0m0.015s
sys     0m0.000s

Here is the updated result: down to about 4.5 seconds.
