Re: Is there any language that native-compiles faster than D?
August 25, 2020
On Wed, Aug 26, 2020 at 01:31:01AM +0000, James Lu via Digitalmars-d wrote: [...]
> DMD -O doesn't make a significant difference over DMD, clocking in at 12 seconds total.
[...]

DMD's optimizer is a joke compared to modern optimizing backends like LDC/LLVM or GCC.  These days I don't even look at DMD for anything remotely performance-related.  I consistently get 15-20% faster executables from LDC than from DMD (even without any optimization flags!), and for compute-heavy programs with -O2/-O3, the difference can be up to 40-50%.  Now that LDC releases are closely tracking DMD releases, I honestly have lost interest in DMD codegen quality, and only use DMD for rapid prototyping during development. For everything else, LDC is my go-to compiler.

(And don't even get me started on backend codegen bugs triggered by -O and/or -inline. After getting bitten a few times by a couple of those, I stay away from dmd -O / dmd -inline like the plague. If I want optimization, I use LDC instead.)


On Wed, Aug 26, 2020 at 01:38:27AM +0000, James Lu via Digitalmars-d wrote: [...]
> I wonder if anyone in the D community has the expertise to change, modify, or rewrite DMD's backend so that it is at most 1.5-2x slower at normal, non-SIMD tasks (roughly a poor version of LuaJIT or V8) while retaining the compilation speed.

Supposedly Walter is one of the only people who understands the backend well enough to be able to make significant improvements to it.

However, Walter is busy with other D-related stuff (important language-level stuff), and we really don't want his time to be spent optimizing a backend that, to be frank, almost nobody is interested in these days. (I'm willing to be pleasantly surprised, though. If Walter can singlehandedly clean up DMD's optimizer and hone it at least to the same ballpark as LDC/GDC, then I'll be all ears. But I'm not holding my breath.)


T

-- 
People say I'm indecisive, but I'm not sure about that. -- YHL, CONLANG
August 26, 2020
H. S. Teoh wrote:

>> I wonder if anyone in the D community has the expertise to change,
>> modify, or rewrite DMD's backend so that it is at most 1.5-2x slower
>> at normal, non-SIMD tasks (roughly a poor version of LuaJIT or V8)
>> while retaining the compilation speed.
>
> Supposedly Walter is one of the only people who understands the backend
> well enough to be able to make significant improvements to it.
that's why there is no reason to "improve" the current DMD backend at all. it is much easier to throw it away and write a brand new one, SSA-based. i bet that bog-standard SSA with a linear-scan register allocator will generate code at least as good as DMD -O, but it will be faster and more maintainable. it is also easy to retarget, because most analysis (and even spilling, partially) is done at the SSA level, and you only have to port the instruction selector. so no problems maintaining backends for x86, x86_64 and arm (even in the same executable).
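The SSA-plus-linear-scan combination described above can be sketched in a few lines. Below is a toy Poletto/Sarkar-style linear-scan allocator, written in Python purely for illustration (the interval representation and all names are invented for this sketch), working over precomputed live intervals of virtual registers:

```python
# Toy linear-scan register allocation.
# Each virtual register has a live interval [start, end]; we walk the
# intervals in order of start point and spill when registers run out.

def linear_scan(intervals, num_regs):
    """intervals: dict vreg -> (start, end). Returns (assignment, spilled)."""
    assignment = {}          # vreg -> physical register index
    spilled = set()          # vregs relegated to stack slots
    active = []              # (end, vreg) pairs currently live, sorted by end
    free = list(range(num_regs))

    for vreg, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts.
        while active and active[0][0] < start:
            _, old = active.pop(0)
            free.append(assignment[old])
        if free:
            assignment[vreg] = free.pop()
        else:
            # Spill the interval that ends furthest away (classic heuristic).
            last_end, last_vreg = active[-1]
            if last_end > end:
                assignment[vreg] = assignment.pop(last_vreg)
                spilled.add(last_vreg)
                active.pop()
            else:
                spilled.add(vreg)
                continue
        active.append((end, vreg))
        active.sort()
    return assignment, spilled
```

The appeal is exactly the claim made above: allocation is a single linear pass over live intervals, so it is near-trivially fast compared to graph coloring, at a modest cost in code quality.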

also, the same backend can be used to jit ctfe code later.

now we only need somebody to do it.
August 26, 2020
On Wednesday, 26 August 2020 at 04:37:06 UTC, ketmar wrote:
> H. S. Teoh wrote:
>
>>> I wonder if anyone in the D community has the expertise to change,
>>> modify, or rewrite DMD's backend so that it is at most 1.5-2x slower
>>> at normal, non-SIMD tasks (roughly a poor version of LuaJIT or V8)
>>> while retaining the compilation speed.
>>
>> Supposedly Walter is one of the only people who understands the backend
>> well enough to be able to make significant improvements to it.
> that's why there is no reason to "improve" current DMD backend at all.

Perhaps we should not be that quick to downplay DMD just because it does not optimize as heavily as GDC and LDC at max settings. I may be too theoretical, but I think using only relatively basic optimizations for a release build might be preferable to always using the most aggressive setting. Why?

Because a program usually spends almost all its time in a tiny fraction of itself. One has to profile where that is and do some hand-optimization anyway to get a performant program, no matter what `-O#` is used. It makes sense to use some basic optimizations, enough to avoid hand-written assembly and things like `foreach(vector; cast(long[])intArray){...}` in the critical parts. But max-optimizing the whole program, to me, just seems to bloat binary size and compile times for relatively little benefit.
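The "tiny fraction" argument above is essentially Amdahl's law. A quick back-of-the-envelope (the percentages here are made up for illustration) shows how little whole-program max-optimization buys once the hot spot itself is handled:

```python
# Amdahl's law: overall speedup when a fraction p of the runtime
# is sped up by a factor s is 1 / ((1 - p) + p / s).

def overall_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)

# Say 90% of the time is in the hot loop and optimizing it makes it 2x faster:
hot_only = overall_speedup(0.90, 2.0)          # ~1.82x overall
# Max-optimizing the remaining 10% as well, even by 1.5x, adds little:
everything = 1.0 / (0.10 / 1.5 + 0.90 / 2.0)   # ~1.94x overall
```

The marginal gain from optimizing the cold 90% of the code is a few percent, which is the trade-off against binary size and compile time being argued here.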

Also, one supposedly wants to benchmark the critical parts. With conservative optimization, the benchmarks are faster to compile, and supposedly more reliable. There is less surface for compiler-caused performance regression, and your code is more likely to stay fast if you decide you need to use size optimization instead.
August 26, 2020
On Wednesday, 26 August 2020 at 04:37:06 UTC, ketmar wrote:
> also, the same backend can be used to jit ctfe code later.
>
> now we only need somebody to do it.

CTFE needs a different code path from the regular backend.
You need to be able to hook many things that you usually wouldn't need to hook.
August 26, 2020
Dukc wrote:

> On Wednesday, 26 August 2020 at 04:37:06 UTC, ketmar wrote:
>> H. S. Teoh wrote:
>>
>>>> I wonder if anyone in the D community has the expertise to change,
>>>> modify, or rewrite DMD's backend so that it is at most 1.5-2x slower
>>>> at normal, non-SIMD tasks (roughly a poor version of LuaJIT or V8)
>>>> while retaining the compilation speed.
>>>
>>> Supposedly Walter is one of the only people who understands the backend
>>> well enough to be able to make significant improvements to it.
>> that's why there is no reason to "improve" current DMD backend at all.
>
> Perhaps we should not be that quick to downplay DMD just because it does not optimize as heavily as GDC and LDC at max settings.
it's not the reason, at least for me. the real reason is that DMD backend is virtually impenetrable. it is a giant black box with the label "DO NOT ENTER IF YOUR NAME IS NOT WALTER" on its side.

SSA backend is much easier to maintain, much easier to retarget, and optimisations over SSA can be nicely layered, from "nothing" to "set of aggressive multipass optimisers". the best thing is that those optimisers are mostly independent of each other, they only need to maintain the SSA invariant. so you can write a lot of them, each doing one simple optimisation at a time, and run them for as long as you want.
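The "layered, mostly independent passes" idea can be illustrated with a toy SSA stream in Python (the three-address tuples and pass names here are invented for the sketch): each pass does one simple rewrite and reports whether it changed anything, and a driver reruns them until a fixpoint:

```python
# Toy SSA: a list of (dest, op, args) where each dest is assigned exactly once.
# Two tiny independent passes -- constant folding and copy propagation --
# are rerun until nothing changes.

def const_fold(code):
    env = {d: a[0] for d, op, a in code if op == 'const'}
    out, changed = [], False
    for dest, op, args in code:
        if op == 'add' and all(a in env for a in args):
            out.append((dest, 'const', (env[args[0]] + env[args[1]],)))
            changed = True
        else:
            out.append((dest, op, args))
    return out, changed

def copy_prop(code):
    alias = {d: a[0] for d, op, a in code if op == 'copy'}
    out, changed = [], False
    for dest, op, args in code:
        new_args = tuple(alias.get(a, a) for a in args)
        changed |= new_args != args
        out.append((dest, op, new_args))
    return out, changed

def optimize(code, passes=(const_fold, copy_prop)):
    changed = True
    while changed:
        changed = False
        for p in passes:
            code, c = p(code)
            changed |= c
    return code
```

Because each pass only has to preserve the single-assignment invariant, adding another one (dead-code elimination, strength reduction, ...) does not require touching the existing ones.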
August 26, 2020
Stefan Koch wrote:

> On Wednesday, 26 August 2020 at 04:37:06 UTC, ketmar wrote:
>> also, the same backend can be used to jit ctfe code later.
>>
>> now we only need somebody to do it.
>
> CTFE needs a different code path from the regular backend.
> You need to be able to hook many things which usually you wouldn't need to hook.

you're right... with the current backend. but with a universal SSA backend, once you lowered the code to SSA, it doesn't matter anymore. for native code, the lowering engine can emit direct memory-manipulation SSA opcodes, and for CTFE it can emit function calls. the backend doesn't care, it will still produce machine code that you can either write to disk or run directly. or don't even bother producing machine code at all: run some SSA optimisers and execute the SSA code directly.
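A minimal sketch of that "execute the SSA directly" option, in Python with an invented opcode set: the interpreter is identical for both modes, and only the load/store hooks differ between a native-style lowering and a checked, CTFE-style one:

```python
# One SSA stream, two execution modes: 'load'/'store' are routed through
# hooks -- raw memory access for native-style execution, or checking
# function calls for a CTFE-style engine.

def run_ssa(code, hooks):
    env = {}
    for dest, op, args in code:
        if op == 'const':
            env[dest] = args[0]
        elif op == 'add':
            env[dest] = env[args[0]] + env[args[1]]
        elif op == 'load':
            env[dest] = hooks['load'](env[args[0]])
        elif op == 'store':
            hooks['store'](env[args[0]], env[args[1]])
    return env

# "Native" lowering: raw access into a flat memory array.
memory = [0] * 16
native = {'load': lambda addr: memory[addr],
          'store': lambda addr, val: memory.__setitem__(addr, val)}

# "CTFE" lowering: every read goes through a checking function call.
def ctfe_load(addr):
    if not 0 <= addr < len(memory):
        raise ValueError("CTFE: out-of-bounds read")
    return memory[addr]

ctfe = {'load': ctfe_load, 'store': native['store']}
```

The point of the sketch is that nothing in `run_ssa` knows which mode it is in; the lowering step picked the hooks, exactly as described above.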
August 26, 2020
On 8/26/20 4:02 PM, ketmar wrote:
> you're right... with the current backend. but with universal SSA backend, once you lowered the code to SSA, it doesn't matter anymore. for native code, lowering engine can emit direct memory manipulation SSA opcodes, and for CTFE it can emit function calls. the backend doesn't care, it will still produce machine code you can either write to disk, or run directly. or don't even bother producing machine code at all, but run some SSA optimisers and execute SSA code directly.

What are disadvantages of SSA based backend?
August 26, 2020
drug wrote:

> What are disadvantages of SSA based backend?
somebody has to write it.
August 26, 2020
On Wednesday, 26 August 2020 at 13:07:03 UTC, drug wrote:
> On 8/26/20 4:02 PM, ketmar wrote:
>> you're right... with the current backend. but with universal SSA backend, once you lowered the code to SSA, it doesn't matter anymore. for native code, lowering engine can emit direct memory manipulation SSA opcodes, and for CTFE it can emit function calls. the backend doesn't care, it will still produce machine code you can either write to disk, or run directly. or don't even bother producing machine code at all, but run some SSA optimisers and execute SSA code directly.
>
> What are disadvantages of SSA based backend?

If I'm not wrong, from what I remember LLVM IR is SSA, so I guess there's a lot of literature around the pros and cons of the SSA approach...


August 26, 2020
On Wednesday, 26 August 2020 at 13:07:03 UTC, drug wrote:
> On 8/26/20 4:02 PM, ketmar wrote:
>> you're right... with the current backend. but with universal SSA backend, once you lowered the code to SSA, it doesn't matter anymore. for native code, lowering engine can emit direct memory manipulation SSA opcodes, and for CTFE it can emit function calls. the backend doesn't care, it will still produce machine code you can either write to disk, or run directly. or don't even bother producing machine code at all, but run some SSA optimisers and execute SSA code directly.
>
> What are disadvantages of SSA based backend?

Well-formed SSA is a little tricky to generate, and it does not map well onto hardware.
Without a few dedicated rewrite and optimization passes, it produces code that is dog slow.
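The "does not map well onto hardware" point is mostly about phi nodes, which have no machine equivalent. A naive out-of-SSA rewrite, sketched below in Python with an invented block layout (real backends must also place the copies before each block's terminator and handle the lost-copy and swap problems), replaces each phi with copies in the predecessor blocks:

```python
# Naive out-of-SSA translation: for each phi dest = phi(pred1: src1, ...),
# append "dest = copy src" to every predecessor block and drop the phi.

def eliminate_phis(blocks):
    """blocks: name -> {'phis': [(dest, {pred: src})], 'code': [(d, op, args)]}"""
    out = {name: {'phis': [], 'code': list(b['code'])}
           for name, b in blocks.items()}
    for name, b in blocks.items():
        for dest, sources in b['phis']:
            for pred, src in sources.items():
                out[pred]['code'].append((dest, 'copy', (src,)))
    return out
```

This is exactly the kind of dedicated rewrite pass meant above: without it (plus copy coalescing afterwards), straight-from-SSA code is full of redundant moves.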