June 07, 2022
dmd doesn't do autovectorization.

However, you can write vector code using vector types if you wish.
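For example, a minimal sketch using core.simd's vector types (which types exist depends on the target):

```d
import core.simd;

// float4 is a 128-bit vector of four floats. Arithmetic on it is emitted
// as SIMD instructions directly; no autovectorizer is involved.
float4 madd(float4 a, float4 b, float4 c)
{
    return a * b + c;
}

void main()
{
    float4 x = [1, 2, 3, 4];
    float4 y = 2.0f;          // a scalar initializer is broadcast to all lanes
    float4 z = madd(x, y, y);
}
```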
June 07, 2022

On Tuesday, 7 June 2022 at 06:22:00 UTC, Walter Bright wrote:
> On 6/6/2022 9:53 PM, bauss wrote:
>> Typically the difference is in the milliseconds, which won't matter much for most enterprise work.
>
> Thanks for the kind words! A few years ago, one person posted here that clang invented data flow analysis and I should add it to dmd. I replied with a link to the data flow analysis code in dmd, that was written in 1985 or so :-)
>
> The result of that article, though, destroyed my sales for a quarter or so. Then other compilers added DFA, and the benchmark code got fixed for the next compiler roundup.
>
> Sigh.

Unfair! There are bad people out there.
They are always copying.

June 07, 2022
On Tuesday, 7 June 2022 at 04:53:36 UTC, bauss wrote:
> On Tuesday, 7 June 2022 at 01:24:07 UTC, Walter Bright wrote:
>>
>> There is a persistent idea that there is something fundamentally wrong with DMD. There isn't. It's just that optimization involves an accumulation of a large number of special cases, and clang has a lot of people adding special cases.
>>
>
> I have only ever used DMD, never bothered using anything else and it has never hindered any of my work or mattered in the slightest.
>
> Nothing I work with suffers from the difference in optimization as I don't have anything that's real-time sensitive.
>
> As long as the work is done in a reasonable amount of time (That of course depends on what it is.) then I'm fine with it.
>
> Typically the difference is in the milliseconds, which won't matter much for most enterprise work.

Unfortunately the dmd optimizer and inliner are somewhat buggy, so most "enterprise" users actually avoid them like the plague or have fairly hard-won scars. At least one of our libraries doesn't work with -O -inline; many other cases have similar issues.
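For reference, the switch combination in question looks like this (app.d is a placeholder):

```
dmd -O -inline app.d   # optimized build: the combination that trips some libraries
dmd app.d              # plain build: fast compiles, predictable codegen
```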

It's not just about performance, even though LLVM and GCC produce much faster code.

Optimization isn't just about adding special cases, either. The sheer number of optimizations dmd would have to learn, on top of an already horrific backend codebase (note that the register allocator, to start with, is pretty bad but not easily replaced), means that while catching up is technically possible, it's practically just not realistic.

The IR used by the backend is also quite annoying to work with. SSA has been utterly dominant in compilers since the late 90s for a reason. It's not the backend's fault that it predates SSA, but the infrastructure and invariants you get from SSA make a compiler vastly easier to work on than dmd's IR does, or even than GCC before its developers realized LLVM was about to completely destroy them (almost no one does new research with GCC anymore, mostly because LLVM is easier to work on).
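For readers who haven't met the term, a toy sketch of what SSA buys you (the "SSA:" comments are illustrative pseudo-IR, not dmd's or LLVM's actual output):

```d
// In SSA form every variable is assigned exactly once, so every use points
// at exactly one definition and passes get def-use chains essentially for free.
int f(int x)
{
    int y = x + 1;  // SSA: y1 = x0 + 1
    y = y * 2;      // SSA: y2 = y1 * 2  (a fresh name, not a mutation)
    return y;       // SSA: return y2   (the reaching definition is explicit)
}
```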

This IR and the lack of structure around operations on it is why dmd has so many bugs wrt things like SIMD code generation. GCC and LLVM learnt the hard way that you need to segment work into multiple passes and possibly different data structures entirely (either artificially, like LCSSA, or using an entirely different IR like GCC's rtl). These are not cheap things to do but paying for them also buys you correctness.

dmd should just focus on being the debug compiler, i.e. be fast. I would much rather have the codebase be clean enough that I could easily get it generating code on my AArch64 Mac than have it chase after being able to do xyz. I will never actually care about dmd's ability to optimize something.
June 07, 2022

On Tuesday, 7 June 2022 at 07:05:04 UTC, Walter Bright wrote:
> dmd doesn't do autovectorization.
>
> However, you can write vector code using vector types if you wish.

Vector types are a great feature. That said, for readability I'm migrating my __vector code base to autovectorization for the CPU-only deployments, and to autovec+SIMT/dcompute for the rest.

Fortunately, the recent autovectorizers' code equals or exceeds the "manual" code in performance in many instances (more aggressive unrolling and better/finer-grained handling of intermediate lengths). OTOH, if perf drops and SIMT is not available, __vector it is!
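Roughly the two styles being compared, as a sketch (names are illustrative; the plain loop is the shape an autovectorizer can pick up):

```d
import core.simd;

// Autovectorizer-friendly form: a simple, countable loop over scalars.
// Whether it actually vectorizes is up to the compiler and target.
void scaleAuto(float[] dst, const(float)[] src, float k)
{
    foreach (i; 0 .. dst.length)
        dst[i] = src[i] * k;
}

// Explicit __vector form: SIMD regardless of what the optimizer decides.
void scaleManual(float4[] dst, const(float4)[] src, float k)
{
    float4 kv = k;  // a scalar initializer is broadcast into all four lanes
    foreach (i; 0 .. dst.length)
        dst[i] = src[i] * kv;
}
```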

June 07, 2022

On Tuesday, 7 June 2022 at 08:44:27 UTC, max haughton wrote:
> This IR and the lack of structure around operations on it is why dmd has so many bugs wrt things like SIMD code generation. GCC and LLVM learnt the hard way that you need to segment work into multiple passes and possibly different data structures entirely (either artificially, like LCSSA, or using an entirely different IR like GCC's rtl). These are not cheap things to do but paying for them also buys you correctness.
>
> dmd should just focus on being the debug compiler, i.e. be fast. I would much rather have the codebase be clean enough that I could easily get it generating code on my AArch64 Mac than have it chase after being able to do xyz. I will never actually care about dmd's ability to optimize something.

However, I hope DMD keeps working on its own back end. D's author can make use of his back-end knowledge and doesn't have to follow LLVM's path! LLVM is not necessarily the best.

June 07, 2022
On Tuesday, 7 June 2022 at 08:44:27 UTC, max haughton wrote:
> dmd should just focus on being the debug compiler, i.e. be fast. I would much rather have the codebase be clean enough that I could easily get it generating code on my AArch64 Mac than have it chase after being able to do xyz. I will never actually care about dmd's ability to optimize something.

I did leave DMD because of unreliable codegen, but I do miss the build times dearly.
It seems to me things are actually getting better in DMD over time, and my plan is to enable D_SIMD in intel-intrinsics ASAP (or at least try).
The reason is that DMD generates surprisingly usable code in no time when you use D_SIMD.
Builtins like those in D_SIMD are interesting because they require less optimizer busywork, which makes them especially useful for debug builds.

But, yeah, I'm not sure the improved performance is worth the backend churn just for debug builds.
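For the curious, a rough sketch of the style in question (inteli.xmmintrin and the _mm_* functions are intel-intrinsics' real SSE API; the kernel itself is a toy):

```d
import inteli.xmmintrin;  // intel-intrinsics SSE module, lowered to D_SIMD builtins where possible

// Four-lane linear interpolation: (b - a) * t + a. Each intrinsic maps more
// or less directly to an instruction, so even an unoptimized debug build
// emits reasonable SIMD code.
__m128 lerp4(__m128 a, __m128 b, __m128 t)
{
    return _mm_add_ps(_mm_mul_ps(_mm_sub_ps(b, a), t), a);
}
```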
June 07, 2022

On Tuesday, 7 June 2022 at 09:41:34 UTC, zjh wrote:
> However, I hope DMD keeps working on its own back end. D's author can make use of his back-end knowledge and doesn't have to follow LLVM's path! LLVM is not necessarily the best.

Moreover, I like to see competition between small companies or small languages and the large ones.
I'm very much rooting for the small ones.
It's very exciting.

June 07, 2022
On 6/6/22 23:22, Walter Bright wrote:

> The
> results were thrown out for my compiler, as the journalist concluded it
> had a bug in it where it deleted the test suite code.

That's too funny. :) And that's why peer-reviewed articles exist.

> He wrote lots of
> bad things as a result.

Anything more to the story? You should have gotten more sales when they, hopefully, published the explanation with an apology. (?)

Ali

June 07, 2022
On 6/4/2022 8:12 PM, rikki cattermole wrote:
> Lol, I'll take that to mean it's architecturally pretty far from where it needs to be.

Not really, I just never thought it was worth the effort.
June 07, 2022
On 6/7/2022 2:23 AM, Bruce Carneal wrote:
> On Tuesday, 7 June 2022 at 07:05:04 UTC, Walter Bright wrote:
>> dmd doesn't do autovectorization.
>>
>> However, you can write vector code using vector types if you wish.
> 
> Vector types are a great feature. That said, for readability I'm migrating my __vector code base to autovectorization for the CPU-only deployments, and to autovec+SIMT/dcompute for the rest.
>
> Fortunately, the recent autovectorizers' code equals or exceeds the "manual" code in performance in many instances (more aggressive unrolling and better/finer-grained handling of intermediate lengths). OTOH, if perf drops and SIMT is not available, __vector it is!

I've never much liked autovectorization:

1. you never know if it is going to vectorize or not. The vector instruction sets vary all over the place, and whether they line up with your loops or not is not determinable in general - you have to look at the assembler dump.

2. when autovectorization doesn't happen, the compiler reverts to non-vectorized slow code. Often, you're not aware this has happened, and the expected performance doesn't materialize. You can usually refactor the loop so it will autovectorize, but that's something only an expert programmer can accomplish, and even he can't do it if he doesn't *realize* the autovectorization didn't happen. You said it yourself: "if perf drops"!

3. it's fundamentally a backwards thing. The programmer writes low-level code (explicit loops) and the compiler tries to work backwards to create high-level code (vectors) from it! This is completely backwards from how compilers normally work - specify a high-level construct, and the compiler converts it into low-level code.

4. with vector code, the compiler will tell you when the instruction set won't map onto it, so you have a chance to refactor it so it will. A sketch of this contrast follows below.
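To make point 4 concrete, a minimal sketch: core.simd only defines a vector type where the target supports it, so explicit vector code fails loudly at compile time instead of silently falling back to scalar.

```d
import core.simd;

static if (is(__vector(float[4])))  // does this target have 128-bit float vectors?
{
    float4 scale(float4 v, float k)
    {
        return v * k;  // guaranteed SIMD; no autovectorizer guesswork
    }
}
else
{
    // The compiler tells you the mapping failed, so you refactor here
    // instead of unknowingly shipping scalar code.
    static assert(0, "no float4 on this target; refactor this kernel");
}
```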