August 18, 2015
On 2015-08-18 20:59, Walter Bright wrote:

> There is some potential there, but since a static compiler doesn't do
> runtime profiling, some sort of hinting scheme would have to be invented.

There's profile guided optimization, which LLVM supports.

-- 
/Jacob Carlborg
August 18, 2015
On Tuesday, 18 August 2015 at 20:28:48 UTC, Walter Bright wrote:
> On 8/18/2015 12:38 PM, deadalnix wrote:
>> And honestly, there is no way DMD can catch up.
>
> I find your lack of faith disturbing.
>
> https://www.youtube.com/watch?v=Zzs-OvfG8tE&feature=player_detailpage#t=91

Let's say I have some patches in LLVM and a pretty good understanding of how it works. There are some big optimizations that DMD could benefit from, but a lot of it is getting heuristics just right and recognizing a sludge of patterns.

For instance this: http://llvm.org/docs/doxygen/html/DAGCombiner_8cpp_source.html is what you get to recognize patterns created by legalization. For more general patterns: https://github.com/llvm-mirror/llvm/tree/master/lib/Transforms/InstCombine

And that is just the general case pass. You then have a sludge of passes that do canonicalization (GVN for instance) in order to reduce the number of patterns other passes have to match, others looking for specialized things (SROA, LoadCombine, ...), and finally a ton of them looking for higher level things to change (SimplifyCFG, Inliner, ...).

All of them require a sheer amount of pure brute force, by recognizing more and more patterns, while others require fine-tuned heuristics.
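To make the "recognizing more and more patterns" point concrete, here is a minimal sketch of what a pattern-driven combine step looks like. It is a toy written for this thread, not LLVM or DMD code: the tuple-based expression format, the `simplify` name, and the handful of rules are all assumptions chosen for illustration. Note how the canonicalization step (constants moved to the right) lets every later rule match only one operand order.

```python
# Toy expression nodes (hypothetical format, for illustration only):
#   ("const", 3), ("var", "x"), or (op, lhs, rhs) with op in {"add", "mul", ...}

def simplify(expr):
    """Apply a few hand-written peephole patterns, bottom-up."""
    if expr[0] in ("const", "var"):
        return expr
    op, lhs, rhs = expr[0], simplify(expr[1]), simplify(expr[2])

    # Canonicalize: for commutative ops, move a lone constant to the right
    # so the patterns below only have to check one operand order.
    if op in ("add", "mul") and lhs[0] == "const" and rhs[0] != "const":
        lhs, rhs = rhs, lhs

    # Pattern: constant folding.
    if lhs[0] == "const" and rhs[0] == "const":
        if op == "add":
            return ("const", lhs[1] + rhs[1])
        if op == "mul":
            return ("const", lhs[1] * rhs[1])

    # Pattern: x + 0 -> x
    if op == "add" and rhs == ("const", 0):
        return lhs
    # Pattern: x * 1 -> x
    if op == "mul" and rhs == ("const", 1):
        return lhs
    # Pattern: x * 0 -> 0
    if op == "mul" and rhs == ("const", 0):
        return ("const", 0)
    # Pattern: x * 2^k -> x << k (strength reduction)
    if op == "mul" and rhs[0] == "const" and rhs[1] > 1 and (rhs[1] & (rhs[1] - 1)) == 0:
        return ("shl", lhs, ("const", rhs[1].bit_length() - 1))

    # No pattern matched: rebuild the node from the simplified operands.
    return (op, lhs, rhs)
```

For example, `simplify(("mul", ("const", 8), ("add", ("var", "x"), ("const", 0))))` first reduces `x + 0` to `x`, then swaps the constant to the right, then turns the multiply by 8 into a shift by 3. A real combiner has thousands of such rules, which is where the brute force comes in.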

Realistically, D does not have the manpower required to reach the same level of optimization, and has many higher-impact tasks to spend that manpower on.
August 18, 2015
On 8/18/2015 6:01 AM, ponce wrote:
> One thing that was striking to me is that by and large it doesn't use PUSH,
> POP, and SETcc. Actually I don't remember such an instruction being emitted by it.
>
> And indeed using PUSH/POP/SETcc in assembly was often slower than the
> alternative. Which is _way_ different than the old x86 where each of these
> things would gain speed.

The 32 bit code generator does a lot of push/pop, but the 64 bit one does far less because function parameters are passed in registers most of the time.

August 18, 2015
On Tuesday, 18 August 2015 at 10:45:49 UTC, Walter Bright wrote:
> Martin ran some benchmarks recently that showed that ddmd compiled with dmd was about 30% slower than when compiled with gdc/ldc. This seems to be fairly typical.
>
> I'm interested in ways to reduce that gap.

retire dmd?
this is ridiculous.
August 18, 2015
On Tuesday, 18 August 2015 at 20:24:31 UTC, Vladimir Panteleev wrote:
> On Tuesday, 18 August 2015 at 19:02:20 UTC, Walter Bright wrote:
>> On 8/18/2015 5:37 AM, Vladimir Panteleev wrote:
>>> IIRC, I have had three releases affected by optimization/inlining DMD bugs (two
>>> of Digger and one of RABCDAsm). These do not speak well for D when end-users ask
>>> me what the cause of the bug is, and I have to say "Yeah, it's a bug in the
>>> official D compiler".
>>
>> Are they filed in bugzilla?
>
> Yep, just search for wrong-code regressions. The specific bugs in question have been fixed, but that doesn't change the general problem.

I would like to add that fixing the regression does not make it go away. Even though it's fixed in git, and even after the fix ships with a new DMD release, there is still a D version out there that has the bug, and that will never change until the end of time. The consequence of this is that affected programs cannot be built with certain versions of DMD (e.g. RABCDAsm's build tool checks for the compiler bug and asks users to use another compiler version or disable optimizations). This affects users who get DMD by some other means than downloading it from dlang.org themselves, e.g. via their OS package repository (especially LTS OS release users).

Fixing regressions is not enough. We need to try harder to prevent them from ending up in DMD releases at all.

August 18, 2015
On 8/18/2015 1:47 PM, deadalnix wrote:
> Realistically, D does not have the manpower required to reach the same level of
> optimization, and has many higher-impact tasks to spend that manpower on.

dmd also does a sludge of patterns. I'm just looking for a few that would significantly impact the result.
August 18, 2015
On Tuesday, 18 August 2015 at 21:18:34 UTC, rsw0x wrote:
> On Tuesday, 18 August 2015 at 10:45:49 UTC, Walter Bright wrote:
>> Martin ran some benchmarks recently that showed that ddmd compiled with dmd was about 30% slower than when compiled with gdc/ldc. This seems to be fairly typical.
>>
>> I'm interested in ways to reduce that gap.
>
> retire dmd?
> this is ridiculous.

To further expand upon this: if you want to make D fast, fix the interface between the compiler and the runtime (including the inability of compilers to inline simple things like allocations, which gives allocations massive overhead). Then fix the GC. Make the GC aware of both shared and immutable; moving to a thread-local "island"-style GC would then be fairly easy. D's GC is probably the slowest GC of any major language available, and the entire thing is wrapped in mutexes.

D has far, far bigger performance problems than dmd's backend.

Maybe you should take a look at what Go has recently done with their GC to get an idea of what D's competition has been up to. https://talks.golang.org/2015/go-gc.pdf
August 18, 2015
On 8/18/2015 1:24 PM, Vladimir Panteleev wrote:
> The specific bugs in question have
> been fixed, but that doesn't change the general problem.

The reason we have regression tests is to make sure things that are fixed stay fixed. Codegen bugs also always had the highest priority.

Being paralyzed by fear of introducing new bugs is not a way forward with any project.

(Switching to ddmd, and eventually putting the back end in D, will also help with this. DMC++ is always built with any changes and tested to exactly duplicate itself, and that filters out a lot of problems. Unfortunately, DMC++ is a 32 bit program and doesn't exercise the 64 bit code gen. Again, ddmd will fix that.)
August 18, 2015
On Tuesday, 18 August 2015 at 21:26:43 UTC, rsw0x wrote:
> On Tuesday, 18 August 2015 at 21:18:34 UTC, rsw0x wrote:
> D has far, far bigger performance problems than dmd's backend.

However true that may be in general, those almost certainly aren't the reasons why ddmd compiled with dmd benchmarks 30% slower than ddmd compiled with gdc/ldc. I would suspect that particular speed difference is heavily backend-dependent.

August 18, 2015
On Tuesday, 18 August 2015 at 21:25:35 UTC, Walter Bright wrote:
> On 8/18/2015 1:47 PM, deadalnix wrote:
>> Realistically, D does not have the manpower required to reach the same level of
>> optimization, and has many higher-impact tasks to spend that manpower on.
>
> dmd also does a sludge of patterns. I'm just looking for a few that would significantly impact the result.

There are none. There are a ton of 0.5% ones that add up to the 30% difference.

If I were to bet on what would impact DMD perfs the most, I'd go for SROA, and an inliner in the middle end that works bottom-up:
 - Explore the call graph top-down, optimizing functions along the way.
 - Backtrack bottom-up and check for inlining opportunities.
 - Rerun optimizations on the function inlining was done in.

It requires a fair amount of tweaking and probably needs a way for the backends to provide a cost heuristic for various functions, but that would leverage the patterns already existing in the backend.
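The three steps above can be sketched roughly as follows. This is only a toy model under stated assumptions, not DMD or LLVM code: functions are represented by a made-up size number, the call graph is a plain dict, `optimize` is a hypothetical hook standing in for the existing backend optimizations, and the "cost heuristic" is reduced to a single size `threshold`. Recursive calls (back-edges in the call graph) are simply left alone.

```python
def run_inliner(calls, size, root, threshold=10, optimize=lambda fn: None):
    """Toy bottom-up inliner.

    calls:     dict mapping function name -> list of callee names (assumed format)
    size:      dict mapping function name -> abstract body size (assumed cost model)
    optimize:  hypothetical hook for the per-function optimizer
    Returns (list of (caller, callee) inlining decisions, updated sizes).
    """
    calls = {fn: list(cs) for fn, cs in calls.items()}  # work on copies
    size = dict(size)
    inlined = []
    visiting, done = set(), set()

    def visit(fn):
        visiting.add(fn)
        optimize(fn)  # step 1: optimize on the way down the call graph
        for callee in list(calls.get(fn, [])):
            if callee not in done and callee not in visiting:
                visit(callee)

        # Step 2: backtracking. Every non-recursive callee is now final,
        # so its size is a meaningful input to the cost heuristic.
        changed = False
        remaining = []
        for callee in calls.get(fn, []):
            if callee in done and size[callee] <= threshold:
                size[fn] += size[callee]                 # body is copied in
                remaining.extend(calls.get(callee, []))  # inherit its calls
                inlined.append((fn, callee))
                changed = True
            else:
                remaining.append(callee)  # too big, or a recursive call
        calls[fn] = remaining

        if changed:
            optimize(fn)  # step 3: rerun optimizations after inlining
        visiting.discard(fn)
        done.add(fn)

    visit(root)
    return inlined, size
```

With `calls = {"main": ["f", "g"], "f": ["leaf"], "g": ["big"], "leaf": [], "big": []}` and sizes `{"main": 5, "f": 4, "g": 3, "leaf": 2, "big": 50}`, `leaf` is inlined into `f` first, then the grown `f` and `g` are inlined into `main`, while `big` stays a call. Working bottom-up means the decision for `f` is made on its post-inlining size, which is the point of doing the backtracking pass rather than inlining on the way down.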