August 19, 2015
On 2015-08-19 00:55, Walter Bright wrote:

> Exactly. That's why people just want to type "-O" and it optimizes.

So why not just a "-pgo" flag that does what you described above?

-- 
/Jacob Carlborg
August 19, 2015
On 19-Aug-2015 15:53, Jacob Carlborg wrote:
> On 2015-08-19 00:55, Walter Bright wrote:
>
>> Exactly. That's why people just want to type "-O" and it optimizes.
>
> So why not just a "-pgo" flag that does what you described above?
>

+1 for -pgo to use trace.log in the same folder; that way, running -profile followed by -pgo will just work (tm).

-- 
Dmitry Olshansky
August 19, 2015
On 2015-08-19 15:00, Dmitry Olshansky wrote:

> +1 for -pgo to use trace.log in the same folder; that way, running
> -profile followed by -pgo will just work (tm).

I was thinking of something where the compiler would handle everything automatically in one command when the -pgo flag is present. If necessary, one could pass arguments after the -pgo flag, which would be used when running the application.
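A sketch of how that single-command workflow might look (the -pgo flag is entirely hypothetical here; only -profile and its trace.log output exist in dmd today, and the argument-passing syntax is invented for illustration):

```
# hypothetical one-step PGO: the compiler builds an instrumented binary,
# runs it (forwarding any arguments given after -pgo), reads the resulting
# trace.log, and then recompiles with the profile data applied
dmd -O -pgo app.d

# with arguments forwarded to the instrumented run (syntax invented)
dmd -O -pgo app.d -- --input=bench.dat
```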

-- 
/Jacob Carlborg
August 19, 2015
On Tuesday, 18 August 2015 at 10:45:49 UTC, Walter Bright wrote:
> So if you're comparing code generated by dmd/gdc/ldc, and notice something that dmd could do better at (1, 2 or 3), please let me know. Often this sort of thing is low hanging fruit that is fairly easily inserted into the back end.
>

I have about 30 lines of numerical code (using real) where the gap between ldc/gdc and dmd is about 200%-300% (linux x86_64). In fact, dmd with -O etc. is at the level of ldc/gdc without any optimizations, and dmd without -O is even slower.
With double instead of real the gap is about 30%.

dmd is unable to inline 3 function calls (pragma(inline, true) results in a compiler error), but for ldc disabling inlining does not really hurt performance.

My knowledge of asm and compiler optimizations is quite limited, and I can't figure out what dmd could do better.

If someone is interested in investigating this, I can put the source + input file for the benchmark on GitHub. Just ping me :)

Note: I use ldc/gdc anyway for such stuff, and IMO the performance of dmd is not the most important issue with D, e.g. compared to interfacing with C++ (mainly std::vector).

August 19, 2015
On 18 August 2015 at 12:45, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> Martin ran some benchmarks recently that showed that ddmd compiled with dmd was about 30% slower than when compiled with gdc/ldc. This seems to be fairly typical.
>
> I'm interested in ways to reduce that gap.
>
> There are 3 broad kinds of optimizations that compilers do:
>
> 1. source translations like rewriting x*2 into x<<1, and function inlining
>
> 2. instruction selection patterns like should one generate:
>
>     SETC AL
>     MOVZX EAX,AL
>
> or:
>     SBB EAX,EAX
>     NEG EAX
>
> 3. data flow analysis optimizations like constant propagation, dead code elimination, register allocation, loop invariants, etc.
>
> Modern compilers (including dmd) do all three.
>
> So if you're comparing code generated by dmd/gdc/ldc, and notice something that dmd could do better at (1, 2 or 3), please let me know. Often this sort of thing is low hanging fruit that is fairly easily inserted into the back end.
>
> For example, recently I improved the usage of the SETcc instructions.
>
> https://github.com/D-Programming-Language/dmd/pull/4901 https://github.com/D-Programming-Language/dmd/pull/4904
>
> A while back I improved usage of BT instructions, the way switch statements were implemented, and fixed integer divide by a constant with multiply by its reciprocal.
>

You didn't fix integer divide on all targets?

https://issues.dlang.org/show_bug.cgi?id=14936

(Consider this my contribution to your low hanging fruit)


August 19, 2015
On Wednesday, 19 August 2015 at 09:26:43 UTC, Ola Fosheim Grøstad wrote:
> On Wednesday, 19 August 2015 at 08:22:58 UTC, Dmitry Olshansky wrote:
>> Also DMD's backend strives to stay fast _and_ generate fine machine code. Getting within 10% of GCC/LLVM and being fast is IMHO both possible and should be done.
>
> But if iOS/OS-X and others are essentially requiring an LLVM-like IR as the object code format, then it makes most sense to have LLVM as the default backend. If WebAsm et al. are focusing on mimicking LLVM, then D's backend has to do the same. And that is not unlikely, given PNaCl being LLVM based. Intel is also supportive of LLVM…
>

Apple is invested in LLVM. As for the other thing you mention, WebAssembly is an AST representation, which is both dumb and does not look anything like LLVM IR.

> Replicating a scalar SSA like LLVM does not make a lot of sense. What would make a lot of sense would be to start work on an experimental SIMD SSA implemented in D that could leverage benefits for next-gen x86 SIMD and make Phobos target it. That could attract new people to D and make D beat LLVM. You could even combine LLVM and your own SIMD backend (run both, then profile and pick the best code in production on a function-by-function basis).
>

WAT ?

> Or a high level compile-time oriented IR for D that can boost templates semantics and compilation speed.
>

That's impossible in the current state of templates (I know, I've been there and dropped it, as the return on investment was too low).

August 19, 2015
On 8/19/2015 7:34 AM, anonymous wrote:
> I have about 30 lines of numerical code (using real) where the gap between
> ldc/gdc and dmd is about 200%-300% (linux x86_64). In fact, dmd with -O etc. is
> at the level of ldc/gdc without any optimizations, and dmd without -O is even slower.
> With double instead of real the gap is about 30%.

If it's just 30 lines of code, you can put it on bugzilla.
August 19, 2015
On 8/19/2015 9:53 AM, Iain Buclaw via Digitalmars-d wrote:
> https://issues.dlang.org/show_bug.cgi?id=14936
>
> (Consider this my contribution to your low hanging fruit)

Thanks!

August 19, 2015
On 2015-08-18 12:45, Walter Bright wrote:
> Martin ran some benchmarks recently that showed that ddmd compiled with
> dmd was about 30% slower than when compiled with gdc/ldc. This seems to
> be fairly typical.

Not sure how the compilers behave in this case, but what about devirtualization? Since I think most developers compile their D programs with all files at once, there should be pretty good opportunities for devirtualization.

-- 
/Jacob Carlborg
August 19, 2015
On 2015-08-18 23:43, Walter Bright wrote:

> I wonder how many people actually use the llvm profile guided
> optimizations. I suspect very, very few.

In Xcode there's a checkbox for PGO in the build configuration. Should be just as easy to enable as any other build setting.

-- 
/Jacob Carlborg