August 19, 2015
On 8/18/2015 2:57 PM, H. S. Teoh via Digitalmars-d wrote:
> like eliminating redundant loads

Turns out you were right.

https://github.com/D-Programming-Language/dmd/pull/4906

August 19, 2015
On 19-Aug-2015 00:43, Walter Bright wrote:
> On 8/18/2015 1:33 PM, Jacob Carlborg wrote:
>> There's profile guided optimization, which LLVM supports.
>
> dmd does have that to some extent. If you run with -profile, the
> profiler will emit a trace.def file. This is a script which can be fed
> to the linker which controls the layout of functions in the executable.
> The layout is organized so that strongly connected functions reside in
> the same page, minimizing swapping and maximizing cache hits.
>
> Unfortunately, nobody makes use of it, which makes me reluctant to
> expend further effort on PGO.
>
>    http://www.digitalmars.com/ctg/trace.html
>
> I wonder how many people actually use the llvm profile guided
> optimizations. I suspect very, very few.

I guess this needs a prominent article to show some bang for the buck.
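For reference, the trace-based layout workflow quoted above looks roughly like this (a sketch, not a recipe: `app.d` is a placeholder name, and passing `trace.def` back on the command line relies on dmd forwarding `.def` files to the linker — see the trace.html page for the details):

```shell
# Instrumented build: -profile inserts tracing code into each function.
dmd -profile app.d

# Run a representative workload. On exit this writes trace.log (timing
# data) and trace.def (a linker script that groups strongly connected
# functions onto the same pages).
./app

# Rebuild, feeding trace.def to the linker to apply the layout.
dmd app.d trace.def
```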

-- 
Dmitry Olshansky
August 19, 2015
On 19-Aug-2015 01:14, Walter Bright wrote:
> On 8/18/2015 3:04 PM, deadalnix wrote:
>> My understanding is that the inliner is in the front end. It definitely
>> does not work the way I describe it here.
>
> But it uses a cost function and runs repeatedly until there is no more
> inlining to be done.
>

When looking at the AST there is no way to correctly estimate the cost function: the generated code may be huge with user-defined types/operators.


-- 
Dmitry Olshansky
August 19, 2015
On 19-Aug-2015 00:34, H. S. Teoh via Digitalmars-d wrote:
>>> On Tuesday, 18 August 2015 at 10:45:49 UTC, Walter Bright wrote:
>>>> Martin ran some benchmarks recently that showed that ddmd compiled
>>>> with dmd was about 30% slower than when compiled with gdc/ldc. This
>>>> seems to be fairly typical.
> [...]
>
> This matches my experience of dmd vs. gdc as well. No surprise there.
>
>
>>>> I'm interested in ways to reduce that gap.
> [...]
>
> Replace the backend with GDC or LLVM? :-P
>

Oh come on - LLVM was an inferior backend for some time. So what? Should they not have worked on it 'cause GCC was faster?

On the contrary, it turns out that C++ plus a better intermediate representation as a foundation was a big win that allowed LLVM to close the gap.

Also, DMD's backend strives to stay fast _and_ generate fine machine code. Getting within 10% of GCC/LLVM while staying fast is IMHO both possible and worth doing.

Lastly, a backend written in D may take advantage of D's features to do in 5x fewer LOCs what others do in C. And there are plenty of research papers on optimization floating around, already implemented in GCC/LLVM/MSVC, so most of the R&D cost is paid by other backends/researchers.


-- 
Dmitry Olshansky
August 19, 2015
On 8/19/2015 1:11 AM, Dmitry Olshansky wrote:
> When looking at AST there is no way to correctly estimate cost function - code
> generated may be huge with user-defined types/operators.

Sure the cost function is fuzzy, but it tends to work well enough.

August 19, 2015
On Wednesday, 19 August 2015 at 08:29:05 UTC, Walter Bright wrote:
> On 8/19/2015 1:11 AM, Dmitry Olshansky wrote:
>> When looking at AST there is no way to correctly estimate cost function - code
>> generated may be huge with user-defined types/operators.
>
> Sure the cost function is fuzzy, but it tends to work well enough.

No, looking at what DMD generates, it is obviously not good at inlining. Here is the issue: when you have A calling B calling C, once you have inlined C into B and run the optimizer, you often find dramatic simplifications you can do (this tends to be especially true with templates), and that may make B eligible for inlining into A, because it became simpler instead of more complex.

Optimize top-down, inline bottom-up and reoptimize as you inline. That's proven tech.
August 19, 2015
On Wednesday, 19 August 2015 at 08:22:58 UTC, Dmitry Olshansky wrote:
> Also DMD's backend strives to stay fast _and_ generate fine machine code. Getting within 10% of GCC/LLVM and being fast is IMHO both possible and should be done.

But if iOS/OS X and others are essentially requiring an LLVM-like IR as the object code format, then it makes the most sense to have LLVM as the default backend. If WebAsm et al. focus on mimicking LLVM, then D's backend has to do the same. And that is not unlikely, given that PNaCl is LLVM based. Intel is also supportive of LLVM…

Replicating a scalar SSA like LLVM's does not make a lot of sense. What would make a lot of sense would be to start work on an experimental SIMD SSA implemented in D that could leverage next-gen x86 SIMD, and make Phobos target it. That could attract new people to D and make D beat LLVM. You could even combine LLVM and your own SIMD backend (run both, then profile and pick the best code in production on a function-by-function basis).

Or a high-level, compile-time-oriented IR for D that can boost template semantics and compilation speed.

> And there is plenty of research papers on optimization floating around and implemented in GCC/LLVM/MSVC so most of R&D cost is payed by other backends/researchers.

I think you underestimate the amount of experimental work that has gone into those backends, work that ends up being trashed. It's not that you just have to implement what LLVM has now; you have to implement what LLVM has plus a lot of the stuff they have thrown out along the way.

August 19, 2015
On 19-Aug-2015 12:26, Ola Fosheim Grøstad <ola.fosheim.grostad+dlang@gmail.com> wrote:
>> And there is plenty of research papers on optimization floating around
>> and implemented in GCC/LLVM/MSVC so most of R&D cost is payed by other
>> backends/researchers.
>
> I think you underestimate the amount of experimental work that has gone
> into those backends, work that ends up being trashed. It's not that you
> just have to implement what LLVM has now; you have to implement what
> LLVM has plus a lot of the stuff they have thrown out along the way.
>

I do not. I estimate that the tons of subtle passes each contribute 0.1-0.2% in some cases. There are lots and lots of these in GCC/LLVM. If having the best generated code out there is not the goal, we can safely omit most of them and focus on the most critical bits.


-- 
Dmitry Olshansky
August 19, 2015
On Wednesday, 19 August 2015 at 09:29:31 UTC, Dmitry Olshansky wrote:
> I do not. I estimate that the tons of subtle passes each contribute 0.1-0.2% in some cases. There are lots and lots of these in GCC/LLVM. If having the best generated code out there is not the goal, we can safely omit most of them and focus on the most critical bits.

Well, you can start on this now, but by the time it is ready and hardened, LLVM might have received improved AVX2 and AVX-512 code gen from Intel. Which basically will leave DMD in the dust.

August 19, 2015
On 19-Aug-2015 12:46, Ola Fosheim Grøstad <ola.fosheim.grostad+dlang@gmail.com> wrote:
> On Wednesday, 19 August 2015 at 09:29:31 UTC, Dmitry Olshansky wrote:
>> I do not. I estimate that the tons of subtle passes each contribute
>> 0.1-0.2% in some cases. There are lots and lots of these in GCC/LLVM.
>> If having the best generated code out there is not the goal, we can
>> safely omit most of them and focus on the most critical bits.
>
> Well, you can start on this now, but by the time it is ready and
> hardened, LLVM might have received improved AVX2 and AVX-512 code gen
> from Intel. Which basically will leave DMD in the dust.
>

Only on numerics, video codecs and the like. It's not like compilers solely depend on AVX.

-- 
Dmitry Olshansky