dmd codegen improvements (page 6)

On 8/18/2015 2:57 PM, H. S. Teoh via Digitalmars-d wrote:
> From the little that I've seen of dmd's output, it seems that it's
> rather weak in the areas of inlining and loop unrolling / refactoring.

DMD does not do loop unrolling. I've thought about it many times, but just never did it.

> In both cases, it seems that the optimizer gives up too quickly -- an
> if-else function body will get inlined, but an if without an else
> doesn't, etc..

It should do this. An example would be nice.

> There's also the more general optimizations, like eliminating redundant
> loads, eliding useless allocation of stack space in functions that
> return constant values, etc.. While DMD does do some of this, it's not
> as thorough as GDC. While it may sound like only a small difference, if
> they happen to run inside an inner loop, they can add up to quite a
> significant difference.

dmd has a full data flow analysis pass, which includes dead code elimination and dead store elimination. It goes as far as possible with the intermediate code. Any dead stores still generated are an artifact of the detailed code generation, which I agree is a problem.

> DMD needs to be much more aggressive in eliminating useless / redundant
> code blocks; a lot of this comes not from people writing unusually
> redundant code, but from template expansions and inlined range-based
> code, which sometimes produce a lot of redundant operations if
> translated directly. Aggressively reducing these generated code blocks
> will often open up further optimization opportunities.

I'm not aware of any case of DMD generating dead code blocks. I'd like to see it if you have one.

On 8/18/2015 3:07 PM, Joseph Rushton Wakeling wrote:
> I was backing up your rationale, even if I disagree with your
> prioritizing these concerns at this stage of the dmd => ddmd transition.

I want to move to ddmd right now, and I mean right now. But it's stalled, awaiting Daniel and Martin.

https://github.com/D-Programming-Language/dmd/pull/4884

I thought I'd investigate back end issues while waiting, as Martin is very concerned about the ddmd compile speed.

On 8/18/2015 3:17 PM, welkam wrote:
> People are lazy and if it takes more than one click people won't use it. Just
> like unit testing: everyone agrees that it's good to write tests but nobody does
> that. When you put unit testing in the compiler, more people write tests. PGO is
> awesome, but it needs to be made much simpler before people use it every day.

Exactly. That's why people just want to type "-O" and it optimizes.

August 18, 2015

Re: dmd codegen improvements

Posted by H. S. Teoh
in reply to Walter Bright

On Tue, Aug 18, 2015 at 03:25:38PM -0700, Walter Bright via Digitalmars-d wrote:
> On 8/18/2015 2:57 PM, H. S. Teoh via Digitalmars-d wrote:
> >From the little that I've seen of dmd's output, it seems that it's rather weak in the areas of inlining and loop unrolling / refactoring.
> 
> DMD does not do loop unrolling. I've thought about it many times, but just never did it.

What's the reason for it?
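
For readers unfamiliar with the term, here is a minimal sketch of what loop unrolling means, written by hand in D. The function names `sumSimple` and `sumUnrolled` are made up for illustration; nothing here reflects what dmd or any other compiler actually emits. The unrolled version does four additions per iteration, so the loop test and branch run roughly a quarter as often, with a short cleanup loop for the leftover elements.

```d
import std.stdio : writeln;

// Straightforward loop: one add and one loop test per element.
int sumSimple(const(int)[] a)
{
    int s = 0;
    foreach (x; a)
        s += x;
    return s;
}

// Hand-unrolled by a factor of 4 (illustrative only).
int sumUnrolled(const(int)[] a)
{
    int s = 0;
    size_t i = 0;
    // Main body: four elements per iteration.
    for (; i + 4 <= a.length; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    // Cleanup: the 0 to 3 elements that did not fit a full group.
    for (; i < a.length; ++i)
        s += a[i];
    return s;
}

void main()
{
    int[] data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    writeln(sumSimple(data));
    writeln(sumUnrolled(data));
}
```

Both functions compute the same sum; an unrolling optimizer would, in effect, turn the first form into something like the second.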


> >In both cases, it seems that the optimizer gives up too quickly -- an if-else function body will get inlined, but an if without an else doesn't, etc..
> 
> It should do this. An example would be nice.

Sorry, I wrote this from memory, so I don't have an example handy. But IIRC it was either a lambda or a function with a single-line body, where if the function has the form:

	auto f() {
		if (cond)
			return a;
		else
			return b;
	}

it would be inlined, but if it was written:

	auto f() {
		if (cond)
			return a;
		return b;
	}

it would remain as a function call. (I didn't test this, btw, like I said, I'm writing this from memory.)
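
A self-contained version of the two forms might look like the sketch below, so the claim can be checked rather than recalled. The names `pickWithElse` and `pickWithoutElse` and the parameters are placeholders, and the condition is taken from the command line only so that the calls cannot be folded away at compile time. Whether either call is actually inlined is exactly the open question; this sketch asserts nothing about it.

```d
import std.stdio : writeln;

// Form 1: if with an explicit else branch.
int pickWithElse(bool cond, int a, int b)
{
    if (cond)
        return a;
    else
        return b;
}

// Form 2: if without else, falling through to the second return.
int pickWithoutElse(bool cond, int a, int b)
{
    if (cond)
        return a;
    return b;
}

void main(string[] args)
{
    // Runtime-dependent condition, so the compiler cannot
    // evaluate the calls at compile time.
    bool cond = args.length > 1;
    writeln(pickWithElse(cond, 1, 2));
    writeln(pickWithoutElse(cond, 1, 2));
}
```

Compiling with something like `dmd -O -inline -release` and disassembling the object file should show whether `main` still contains calls to either function.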


> >There's also the more general optimizations, like eliminating redundant loads, eliding useless allocation of stack space in functions that return constant values, etc.. While DMD does do some of this, it's not as thorough as GDC. While it may sound like only a small difference, if they happen to run inside an inner loop, they can add up to quite a significant difference.
> 
> dmd has a full data flow analysis pass, which includes dead code elimination and dead store elimination. It goes as far as possible with the intermediate code. Any dead stores still generated are an artifact of the detailed code generation, which I agree is a problem.
> 
> 
> >DMD needs to be much more aggressive in eliminating useless / redundant code blocks; a lot of this comes not from people writing unusually redundant code, but from template expansions and inlined range-based code, which sometimes produce a lot of redundant operations if translated directly.  Aggressively reducing these generated code blocks will often open up further optimization opportunities.
> 
> I'm not aware of any case of DMD generating dead code blocks. I'd like to see it if you have one.

Sorry, I didn't write it clearly. I meant dead or redundant loads/stores, caused either by detailed codegen or by template expansion or inlining(?). Overall, the assembly produced by GDC tends to be "cleaner" or "leaner", whereas the assembly produced by DMD tends to be more "frilly" (taking more instructions than GDC to do the same thing).
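
To make the terms concrete, here is a small made-up D function containing one dead store and one redundant load at the source level. It is not taken from any real program or from actual compiler output, and `scale` is a hypothetical name; it only shows the kind of pattern being discussed, which in practice usually appears after inlining or template expansion rather than in hand-written code.

```d
import std.stdio : writeln;

// Illustrative only: contains a dead store and a redundant load.
int scale(const(int)* p)
{
    int tmp = 0;      // dead store: overwritten below before it is ever read
    tmp = *p * 2;     // first load of *p
    return tmp + *p;  // second load of *p; nothing wrote to it in between,
                      // so the first loaded value could be reused
}

void main()
{
    int v = 7;
    writeln(scale(&v));
}
```

An optimizer that removes both would compute the result from a single load and never materialize the initial zero.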

Maybe when I get some free time this week, I could look at the disassembly of one of my programs again to give some specific examples.


T

-- 
Doubtless it is a good thing to have an open mind, but a truly open mind should be open at both ends, like the food-pipe, with the capacity for excretion as well as absorption. -- Northrop Frye

On 8/18/2015 4:05 PM, H. S. Teoh via Digitalmars-d wrote:
> Maybe when I get some free time this week, I could look at the
> disassembly of one of my programs again to give some specific examples.

Please do.

On Tuesday, 18 August 2015 at 23:30:26 UTC, Walter Bright wrote:
> On 8/18/2015 4:05 PM, H. S. Teoh via Digitalmars-d wrote:
>> Maybe when I get some free time this week, I could look at the
>> disassembly of one of my programs again to give some specific examples.
>
> Please do.

Sorry to repeat myself, but isn't https://issues.dlang.org/show_bug.cgi?id=11821 such an example?

Perhaps other examples can be generated by examining the assembly output of some simple range-based programs. So what I am suggesting is a kind of test-driven approach. Just throw some random range stuff together, like

-----
import std.algorithm, std.range;
int main() {return [0,1,4,9,16] . take(3) . filter!(q{a&1}) . front;}
-----

and look at the generated assembly. For me, the above example did not inline some FilterResult lambda call. Isn't it the optimizer's fault in the end, one which can be addressed?

Besides, there are quite a few other hits at the bugzilla when searching for "backend" or "performance".
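
One way to turn such a snippet into a comparison is to put a hand-written loop next to the range pipeline and diff the assembly of the two. The sketch below assumes that approach; `viaRanges` and `viaLoop` are names invented here, and the loop is only one possible "ideal" reference for what the optimized range code could reduce to.

```d
import std.algorithm, std.range;
import std.stdio : writeln;

// The range pipeline from the post: first odd value among the first 3 elements.
int viaRanges()
{
    return [0, 1, 4, 9, 16].take(3).filter!(q{a&1}).front;
}

// Hand-written reference doing the same work with a plain loop.
int viaLoop()
{
    static immutable int[5] squares = [0, 1, 4, 9, 16];
    foreach (x; squares[0 .. 3])
        if (x & 1)
            return x;
    assert(0, "no odd element in the first three values");
}

void main()
{
    writeln(viaRanges());
    writeln(viaLoop());
}
```

Both functions return the same value, so any difference in the generated code for the two is overhead attributable to how the range version was compiled.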

On 8/18/2015 5:27 PM, Walter Bright wrote:
> On 8/18/2015 5:07 PM, Ivan Kazmenko wrote:
>> Sorry to repeat myself, but isn't https://issues.dlang.org/show_bug.cgi?id=11821
>> such an example?
> Yes, absolutely.

I remarked it as an enhancement request.

On Tuesday, 18 August 2015 at 22:01:16 UTC, rsw0x wrote:
> On Tuesday, 18 August 2015 at 21:53:43 UTC, Meta wrote:
>> On Tuesday, 18 August 2015 at 21:45:42 UTC, rsw0x wrote:
>>> If you want D to have a GC, you have to design the language around having a GC. Right now, D could be likened to using C++ with Boehm.
>>
>> The irony is that most GC-related complaints are the exact opposite - that the language depends too much on the GC.
>
> Phobos relies on the GC, but the language itself is not designed around being GC friendly.

There are array literals, delegates, associative arrays, pointers, new, delete, and classes, all of which depend on the GC and are part of the language.

On Wednesday, 19 August 2015 at 01:52:56 UTC, Meta wrote:
> On Tuesday, 18 August 2015 at 22:01:16 UTC, rsw0x wrote:
>> On Tuesday, 18 August 2015 at 21:53:43 UTC, Meta wrote:
>>> On Tuesday, 18 August 2015 at 21:45:42 UTC, rsw0x wrote:
>>>> If you want D to have a GC, you have to design the language around having a GC. Right now, D could be likened to using C++ with Boehm.
>>>
>>> The irony is that most GC-related complaints are the exact opposite - that the language depends too much on the GC.
>>
>> Phobos relies on the GC, but the language itself is not designed around being GC friendly.
>
> There are array literals, delegates, associative arrays, pointers, new, delete, and classes, all of which depend on the GC and are part of the language.

That doesn't make it GC friendly, that makes it GC reliant.
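
As a small illustration of the "GC reliant" point, the sketch below contrasts a function that can carry D's `@nogc` attribute with one that cannot, because it builds a dynamic array from a literal. The function names are invented for this example, and it covers only one of the features listed above (array literals); it is not a statement about how the others behave.

```d
import std.stdio : writeln;

// Uses only a fixed-size stack buffer, so it can be marked @nogc.
int sumOnStack() @nogc nothrow
{
    int[3] buf;  // static array, stored on the stack
    buf[0] = 1;
    buf[1] = 2;
    buf[2] = 3;
    int total = 0;
    foreach (x; buf)
        total += x;
    return total;
}

// Same result, but the array literal initializing a dynamic array is
// allocated on the GC heap, so adding @nogc here would be rejected.
int sumOnHeap()
{
    int[] buf = [1, 2, 3];
    int total = 0;
    foreach (x; buf)
        total += x;
    return total;
}

void main()
{
    writeln(sumOnStack());
    writeln(sumOnHeap());
}
```

The two functions differ only in where the three integers live, which is the kind of everyday construct the posts above are arguing about.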
