August 11, 2015
On 08/11/2015 02:58 PM, David Nadlinger via dmd-internals wrote:
> 
> Every time somebody adds/removes imports in Phobos or makes them function-local, it has a chance of causing breakage in client code because of the various holes in the module system. I've witnessed this with two releases in the Weka codebase now. With that observation, I was just trying to make the point that we should not shy away from fixing those issues due to fear of causing unforeseen regressions, because we are already doing so all the time.

Yes, we're paying a constant price for not fixing 313/314/DIP25. Whether we do it before or after ddmd, I don't care, but we should do it within this year IMO.




August 11, 2015
On 08/11/2015 01:25 PM, Iain Buclaw wrote:
> Because I operate around GCC releases, rather than DMD.  In the 6-8 months that a GCC development cycle and release is done, there could be many DMD releases during that time.  If (the nonexistent) GDC-6.2 built on 2.068 cannot build GDC-7.1 (2.072) or current GDC-8.0 development snapshot (2.075), then I have a serious problem with upstream development.

So how would you bootstrap gdc when it was part of the gcc codebase?
Downloading an old version of gdc?
How much freedom do you have for gdc's build system, e.g. would you have
to use tarballs or could you use a git repo?



August 11, 2015
On 08/11/2015 03:24 PM, Martin Nowak via dmd-internals wrote:
> And the profiles of both are very similar, e.g. comparing the relative
> time spent compiling dub.
> https://dawg.eu/dmd_vs_ddmd_prof.svg

Same graph comparing absolute times. https://dawg.eu/dmd_vs_ddmd_prof_abs.svg



August 11, 2015
On 11 August 2015 at 16:02, Martin Nowak via dmd-internals <dmd-internals@puremagic.com> wrote:

> On 08/11/2015 03:24 PM, Martin Nowak via dmd-internals wrote:
> > And the profiles of both are very similar, e.g. comparing the relative
> > time spent compiling dub.
> > https://dawg.eu/dmd_vs_ddmd_prof.svg
>
> Same graph comparing absolute times. https://dawg.eu/dmd_vs_ddmd_prof_abs.svg
>
>
Those extra memcpy calls look interesting.  I wonder whether it's because g++ inlines small copies, or the dmd codegen produces a *lot* of them where a better alternative could be used.

Also, I wonder whether making use of `= void` for local declarations that don't have an initializer helps at all with anything performance-wise. That at least should be a trivial change to magicport.
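
For reference, a minimal sketch of what that change means in D. A local without an initializer is default-initialized to its type's `.init` value, and `= void` skips that step. The function name `sumOfSquares` and the buffer size are made up for illustration; whether this makes a measurable difference in ddmd is exactly the open question.

```d
import std.stdio : writeln;

// Hypothetical example: a stack buffer that is fully overwritten before use,
// so the default zero fill would be wasted work.
int sumOfSquares()
{
    int[8] buf = void;            // skip default initialization; contents are indeterminate
    foreach (i, ref x; buf)       // every element is written before any read
        x = cast(int)(i * i);

    int total = 0;
    foreach (x; buf)
        total += x;
    return total;
}

void main()
{
    int[4] zeroed;                // no initializer: each element is int.init, i.e. 0
    writeln(zeroed);
    writeln(sumOfSquares());
}
```

Reading a `= void` variable before writing it is undefined, so the idiom only fits locals that are provably assigned first.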

Iain


August 12, 2015
On 08/11/2015 04:38 PM, Iain Buclaw wrote:
> Those extra memcpy calls look interesting.

Sure, it looks interesting, but it won't account for the 30%, and even for that 10% we'd need to profile and optimize a lot of code during the next 6 weeks.

Here's a breakdown of __memcpy_avx_unaligned callers.

Samples: 40K of event 'cycles', Event count (approx.): 12450194501
  Children      Self  Comm  Shared Objec  Symbol
-    9.46%     3.44%  ddmd  libc-2.20.so  [.] __memcpy_avx_unaligned
   - __memcpy_avx_unaligned
      + 9.06% TypeIdentifier::syntaxCopy
      + 5.98% TemplateInstance::syntaxCopy
      + 4.07% TemplateDeclaration::matchWithInstance
      + 3.65% Scope::alloc
      + 2.76% TemplateTypeParameter::matchArg
      + 2.69% TemplateDeclaration::doHeaderInstantiation
      + 2.20% TypeInstance::syntaxCopy
      + 2.12% TemplateDeclaration::declareParameter
      + 2.05% TemplateDeclaration::evaluateConstraint
      + 1.82% functionResolve::ParamDeduce::fp
      + 1.70% DsymbolExp::semantic
      + 1.62% FuncDeclaration::semantic3
      + 1.56% Parser::parsePrimaryExp
      + 1.54% IsExp::syntaxCopy
      + 1.54% TemplateDeclaration::deduceFunctionTemplateMatch
      + 1.41% IdentifierExp::semantic
      + 1.29% TypeFunction::syntaxCopy
      + 1.27% AliasDeclaration::syntaxCopy
      + 1.24% 0
      + 1.21% VarDeclaration::syntaxCopy
      + 1.10% Parser::parseDeclarations
      + 1.09% StaticIfDeclaration::syntaxCopy
      + 1.01% deduceType::DeduceType::visit
      + 0.99% TemplateInstance::semantic
      + 0.90% castTo::CastTo::visit
      + 0.86% Scope::insert
      + 0.83% Parser::parseBasicType
      + 0.78% IsExp::semantic
      + 0.75% TemplateInstance::findBestMatch::ParamBest::fp
      + 0.73% TemplateTupleParameter::matchArg
      + 0.65% Parameter::arraySyntaxCopy
      + 0.64% _D4ddmd5mtype9TypeTuple6__ctorMFPS4ddmd4root5array41__T5ArrayTC4ddmd10expression10ExpressionZ5ArrayZC4ddmd5mtype9Type
      + 0.61% ForeachStatement::semantic
      + 0.58% IndexExp::semantic
      + 0.57% TupleExp::semantic
      + 0.57% ExpInitializer::syntaxCopy
      + 0.57% functionResolve
      + 0.56% CallExp::semantic
      + 0.55% ScopeExp::semantic
      + 0.55% VarDeclaration::semantic
      + 0.54% FuncDeclaration::syntaxCopy
      + 0.52% StringExp::semantic




August 11, 2015

On 8/11/2015 5:58 AM, David Nadlinger wrote:
> On 11 Aug 2015, at 11:25, Walter Bright via dmd-internals wrote:
>> There's always going to be reasons not to switch to ddmd. We need to just do it.
>
> I do fully agree. For exactly this reason, I worked with Daniel to get DDMD built by LDC up and running the day after DConf instead of immediately going on my trip through Utah as originally planned. :)

That's great news!

>
>> I don't know what you mean by 314 causing regressions in every release?
>
> Every time somebody adds/removes imports in Phobos or makes them function-local, it has a chance of causing breakage in client code because of the various holes in the module system. I've witnessed this with two releases in the Weka codebase now. With that observation, I was just trying to make the point that we should not shy away from fixing those issues due to fear of causing unforeseen regressions, because we are already doing so all the time.
>

Ok, I understand now.
_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals
August 11, 2015

On 8/11/2015 6:24 AM, Martin Nowak via dmd-internals wrote:
> Please don't underestimate the problem. If we release a self-hosted compiler that is 30% slower, then the message between the lines is that it's because D is slower than C++.

30% is a problem, but not a disaster. Some mitigating factors:

1. I believe that there is an excessive number of template instantiations going on. This is borne out by the profile results you supplied (thank you). I.e. it should not be necessary to instantiate a template in order to determine if it has already been instantiated.

2. Not much effort has been expended in profiling dmd in a while. I suspect there is more unrecognized low hanging fruit to speed it up.

3. dmd for Windows is compiled by the same backend as dmd has, so there shouldn't be a speed difference there.

4. I want to unwind the changes that resulted in the large dmd slowdowns that recently appeared and find other ways.

5. Having the source code in D offers possibilities for optimization that are not so practical in C++ source.
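
One way to read point 1, as a rough sketch only: look up an instantiation by a key derived from the template and its arguments, and do the expensive work only on a miss. The names `Instance` and `InstanceCache` and the string key are hypothetical and are not dmd's actual data structures.

```d
import std.array : join;
import std.stdio : writeln;

// Hypothetical stand-in for an instantiated template.
class Instance
{
    string key;
    this(string key) { this.key = key; }
}

// Hypothetical cache mapping "name!(args)" to an existing instance.
struct InstanceCache
{
    Instance[string] table;
    size_t created;               // counts how often the expensive path ran

    Instance get(string name, string[] args)
    {
        immutable key = name ~ "!(" ~ args.join(",") ~ ")";
        if (auto found = key in table)   // cheap lookup, nothing is instantiated
            return *found;
        ++created;                       // miss: do the expensive work once
        auto inst = new Instance(key);
        table[key] = inst;
        return inst;
    }
}

void main()
{
    InstanceCache cache;
    auto a = cache.get("Array", ["int"]);
    auto b = cache.get("Array", ["int"]);     // same key: reuses `a`
    auto c = cache.get("Array", ["string"]);  // new key: one more instantiation
    writeln(a is b, " ", a is c, " ", cache.created);
}
```

In this sketch three requests trigger only two instantiations. Building a cheap, reliable key for real template arguments is the hard part and is not addressed here.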

August 12, 2015
On Wednesday, 12 August 2015 at 00:44:19 UTC, Walter Bright wrote:
> 30% is a problem, but not a disaster. Some mitigating factors:
>
> 1. I believe that there is an excessive number of template instantiations going on. This is borne out by the profile results you supplied (thank you). I.e. it should not be necessary to instantiate a template in order to determine if it has already been instantiated.

Yes, but it seems a bit unrealistic to rewrite template instantiation and make a ddmd switch in one release.

> 2. Not much effort has been expended in profiling dmd in a while. I suspect there is more unrecognized low hanging fruit to speed it up.

???
Look at the profile again, there is nothing left but a smarter template instantiation (it dominates 65% when accounting for self+child time).
Last time I squeezed 1% out of the compiler I had to rewrite StringTable.

> 3. dmd for Windows is compiled by the same backend as dmd has, so there shouldn't be a speed difference there.

That's true, the slowdown would only hit half of our user base.

> 4. I want to unwind the changes that resulted in the large dmd slowdowns that recently appeared and find other ways.

That only affected single file compilation, so it doesn't help.
Also we can't undo this w/o reintroducing bugs.

> 5. Having the source code in D offers possibilities for optimization that are not so practical in C++ source.

That's a vague hope at best. In fact we'll first have to deal with small slowdowns caused by D, e.g. unnecessary initialization.


If you want to compensate for the slowdown by optimizing the compiler we should first try to improve our template instantiation, then do the switch.
https://trello.com/c/L0nV131G/17-investigate-fix-compiler-slowdown
https://github.com/D-Programming-Language/dmd/pull/4780#issuecomment-124087604

Let me see if I can get ddmd numbers for ldc; that seems like a more feasible approach to me.
August 12, 2015
On 12 August 2015 at 08:40, Martin Nowak via dmd-internals <dmd-internals@puremagic.com> wrote:

> On Wednesday, 12 August 2015 at 00:44:19 UTC, Walter Bright wrote:
>
>> 30% is a problem, but not a disaster. Some mitigating factors:
>>
>> 1. I believe that there is an excessive number of template instantiations going on. This is borne out by the profile results you supplied (thank you). I.e. it should not be necessary to instantiate a template in order to determine if it has already been instantiated.
>
> Yes, but it seems a bit unrealistic to rewrite template instantiation and make a ddmd switch in one release.
>
>> 2. Not much effort has been expended in profiling dmd in a while. I suspect there is more unrecognized low hanging fruit to speed it up.
>
> ???
> Look at the profile again, there is nothing left but a smarter template instantiation (it dominates 65% when accounting for self+child time).
> Last time I squeezed 1% out of the compiler I had to rewrite StringTable.
>
>> 3. dmd for Windows is compiled by the same backend as dmd has, so there shouldn't be a speed difference there.
>
> That's true, the slowdown would only hit half of our user base.
>
>> 4. I want to unwind the changes that resulted in the large dmd slowdowns that recently appeared and find other ways.
>
> That only affected single file compilation, so it doesn't help. Also we can't undo this w/o reintroducing bugs.
>
>> 5. Having the source code in D offers possibilities for optimization that are not so practical in C++ source.
>
> That's a vague hope at best. In fact we'll first have to deal with small slowdowns caused by D, e.g. unnecessary initialization.
>
> If you want to compensate for the slowdown by optimizing the compiler we should first try to improve our template instantiation, then do the switch.
> https://trello.com/c/L0nV131G/17-investigate-fix-compiler-slowdown
> https://github.com/D-Programming-Language/dmd/pull/4780#issuecomment-124087604
>
> Let me see if I can get ddmd numbers for ldc; that seems like a more feasible approach to me.


For the sake of completeness, I can backport cppmangle from 2.067 down to gdc to allow you to test that as well. I have already verified that it is all that's needed to build ddmd (with a couple of small omissions or changes), and the resultant compiler passes the D2 testsuite.

Regards
Iain


August 12, 2015

On 8/11/2015 11:40 PM, Martin Nowak via dmd-internals wrote:
>
> If you want to compensate for the slowdown by optimizing the compiler we should first try to improve our template instantiation, then do the switch.
> https://trello.com/c/L0nV131G/17-investigate-fix-compiler-slowdown
> https://github.com/D-Programming-Language/dmd/pull/4780#issuecomment-124087604

The slowdown has a solution - 4780 - which you and Kenji don't agree with. Nevertheless, that is how the compiler used to work before the slowdowns and regressions. The worst case of 4780 is that the user will add the -allinst compiler switch, and their compilation will get slower. That is far better than compilation getting slower in every case.

If we keep finding reasons not to do the switch, it will never happen.

If we wait for ldc/gdc to catch up to 2.068, then we'll be in the same situation with 2.069+.