January 25, 2022
On Tue, Jan 25, 2022 at 01:30:59PM -0800, Ali Çehreli via Digitalmars-d-learn wrote: [...]
> I posted the program to have more eyes on the assembly. ;)
[...]

I tested the code locally and observed, just like Ali did, that the LDC version is consistently slower than the gdc version, albeit by a small margin.

So I decided to compare the disassembly.  Because of the large number of template instantiations of the main spellOut/spellOutImpl functions, I didn't have time to look at all of them; I just arbitrarily picked the !(int) instantiation. And I'm seeing something truly fascinating:

- The GDC version has at its core a single idivl instruction for the /
  and %= operators (I surmise that the optimizer realized that both
  could share the same instruction because it yields both results).  The
  function is short and compact.

- The LDC version, however, seems to go out of its way to avoid the
  idivl instruction, having instead a whole bunch of shr instructions
  and imul instructions involving magic constants -- the kind of stuff
  you see in bit-twiddling hacks when people try to ultra-optimize their
  code.  There also appears to be some loop unrolling, and the function
  is markedly longer than the GDC version because of this.

This is very interesting because idivl is known to be one of the slower instructions, but gdc nevertheless considered it not worthwhile to replace it, whereas ldc seems obsessed with avoiding idivl at all costs.

I didn't check the other instantiations, but it would appear that in this case the simpler route of just using idivl won over the complexity of trying to replace it with shr+mul.


T

-- 
Guns don't kill people. Bullets do.
January 25, 2022
On Tuesday, 25 January 2022 at 22:33:37 UTC, H. S. Teoh wrote:
> interesting because idivl is known to be one of the slower instructions, but gdc nevertheless considered it not worthwhile to replace it, whereas ldc seems obsessed with avoiding idivl at all costs.

Interesting indeed.  Two remarks:

1. The actual performance cost of div depends a lot on the hardware.  IIRC, on my old Intel laptop it's something like 40-60 cycles; on my newer AMD chip it's more like 20; on my Mac it's ~10.  GCC may be assuming newer hardware than LLVM.  It could be worth adding -march=native -mtune=native.  It could also depend on how many execution ports can do divisions, i.e. how many of them you can have running at a time.

2. By default, LLVM is more aggressive than GCC with certain optimizations, though I don't know how relevant that is at -O3.
January 25, 2022
On 1/25/22 14:33, H. S. Teoh wrote:

> This is very interesting

Fascinating code generation and investigation! :)

Ali

January 25, 2022
On Tue, Jan 25, 2022 at 10:41:35PM +0000, Elronnd via Digitalmars-d-learn wrote:
> On Tuesday, 25 January 2022 at 22:33:37 UTC, H. S. Teoh wrote:
> > interesting because idivl is known to be one of the slower instructions, but gdc nevertheless considered it not worthwhile to replace it, whereas ldc seems obsessed with avoiding idivl at all costs.
> 
> Interesting indeed.  Two remarks:
> 
> 1. The actual performance cost of div depends a lot on the hardware.  IIRC, on my old Intel laptop it's something like 40-60 cycles; on my newer AMD chip it's more like 20; on my Mac it's ~10.  GCC may be assuming newer hardware than LLVM.  It could be worth adding -march=native -mtune=native.  It could also depend on how many execution ports can do divisions, i.e. how many of them you can have running at a time.

I tried `ldc2 -mcpu=native` but that did not significantly change the performance.


> 2. By default, LLVM is more aggressive than GCC with certain optimizations, though I don't know how relevant that is at -O3.

Yeah, I've noted in the past that LDC seems to be pretty aggressive with inlining / loop unrolling, whereas GDC has a thing for vectorization and SIMD/XMM usage.  The exact outcomes are a toss-up, though. Sometimes LDC wins, sometimes GDC wins.  Depends on what exactly the code is doing.


T

-- 
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next. -- (Stolen from the net)
January 25, 2022
On Tuesday, 25 January 2022 at 20:01:18 UTC, Johan wrote:
>
> Tough to say. Of course DMD is not a serious contender, but I believe the difference between GDC and LDC is very small and really in the details, i.e. you'll have to look at assembly to find out the delta.
> Have you tried `--enable-cross-module-inlining` with LDC?
>
> -Johan

dmd is the best though, in terms of compilation speed without optimisation.

As I write/test A LOT of code, that time saved is very much appreciated ;-)

I hope it remains that way.
January 25, 2022
On Tue, Jan 25, 2022 at 11:01:57PM +0000, forkit via Digitalmars-d-learn wrote:
> On Tuesday, 25 January 2022 at 20:01:18 UTC, Johan wrote:
> > 
> > Tough to say. Of course DMD is not a serious contender, but I believe the difference between GDC and LDC is very small and really in the details, i.e. you'll have to look at assembly to find out the delta.  Have you tried `--enable-cross-module-inlining` with LDC?
[...]
> dmd is the best though, in terms of compilation speed without optimisation.
> 
> As I write/test A LOT of code, that time saved is very much appreciated ;-)
[...]

My general approach is: use dmd for iterating the code - compile - test cycle, and use LDC for release/production builds.


T

-- 
Chance favours the prepared mind. -- Louis Pasteur
January 26, 2022
On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli wrote:
>
> I am using compilers installed by Manjaro Linux's package system:
>
> ldc: LDC - the LLVM D compiler (1.28.0):
>   based on DMD v2.098.0 and LLVM 13.0.0
>
> gdc: gdc (GCC) 11.1.0
>
> dmd: DMD64 D Compiler v2.098.1

What phobos version is gdc using?

-Johan

January 25, 2022
On 1/25/22 16:15, Johan wrote:
> On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli wrote:
>>
>> I am using compilers installed by Manjaro Linux's package system:
>>
>> ldc: LDC - the LLVM D compiler (1.28.0):
>>   based on DMD v2.098.0 and LLVM 13.0.0
>>
>> gdc: gdc (GCC) 11.1.0
>>
>> dmd: DMD64 D Compiler v2.098.1
>
> What phobos version is gdc using?

Oh! Good question. Unfortunately, I don't think Phobos modules contain that information. The following line outputs 2076L, i.e. front-end version 2.076:

pragma(msg, __VERSION__);

So I guess I've been comparing apples to oranges, but in this case an older gdc is doing pretty well.

Ali

January 26, 2022
On Wednesday, 26 January 2022 at 04:28:25 UTC, Ali Çehreli wrote:
> On 1/25/22 16:15, Johan wrote:
> > On Tuesday, 25 January 2022 at 19:52:17 UTC, Ali Çehreli wrote:
> >>
> >> I am using compilers installed by Manjaro Linux's package system:
> >>
> >> ldc: LDC - the LLVM D compiler (1.28.0):
> >>   based on DMD v2.098.0 and LLVM 13.0.0
> >>
> >> gdc: gdc (GCC) 11.1.0
> >>
> >> dmd: DMD64 D Compiler v2.098.1
> >
> > What phobos version is gdc using?
>
> Oh! Good question. Unfortunately, I don't think Phobos modules contain that information. The following line outputs 2076L:
>
> pragma(msg, __VERSION__);
>
> So, I guess I've been comparing apples to oranges but in this case an older gdc is doing pretty well.
>

Doubt it.  Functions such as to(), map(), etc. have pretty much remained unchanged for the last 6-7 years.

Whenever I've watched talks/demos where benchmarks were the central topic, GDC has always blown LDC out of the water when it comes to matters of math.

Even in more recent examples where I've been pushing for the native complex types to be replaced with std.complex, LDC was found to be slower with std.complex, whereas GDC was either equal to or faster than native (and GDC's std.complex was faster than LDC's).
January 26, 2022
On Wednesday, 26 January 2022 at 11:25:47 UTC, Iain Buclaw wrote:
>
> Whenever I've watched talks/demos where benchmarks were the central topic, GDC has always blown LDC out of the water when it comes to matters of math.
> ..

https://dlang.org/blog/2020/05/14/lomutos-comeback/