August 12, 2015
On 08/12/2015 12:03 PM, Walter Bright via dmd-internals wrote:
> If we keep finding reasons not to do the switch, it will never happen.

So far everybody wants to make the switch, I'm just trying to ensure it's going to be a success.

> If we wait for ldc/gdc to catch up to 2.068, then we'll be in the same situation with 2.069+.

We do have a problem and it's unclear whether we can resolve it, so we should keep a backup plan, e.g. gdc/ldc 2.067 compatibility.



August 12, 2015
On 08/12/2015 09:00 AM, Iain Buclaw wrote:
> For the sake of completeness, I can backport cppmangle from 2.067 down to gdc to allow you to test that as well. I have already verified that it is all that's needed to build ddmd (with a couple of small omissions or changes), and the resulting compiler passes the D2 testsuite.

Sounds great, what do we need to do?

cppmangle is needed for the frontend-to-backend binding, right? Isn't there more to the 2.067 C++ support?

A couple of small omissions and changes sounds like something we could incorporate upstream.

Will there be a 2.067 gdc?

-Martin



August 12, 2015
On 08/12/2015 12:03 PM, Walter Bright via dmd-internals wrote:
> If we wait for ldc/gdc to catch up to 2.068, then we'll be in the same situation with 2.069+.

This is also a good opportunity to improve our collaboration with gdc/ldc and bring down the lag time.



August 12, 2015
On 12 August 2015 at 14:13, Martin Nowak via dmd-internals <dmd-internals@puremagic.com> wrote:

> On 08/12/2015 09:00 AM, Iain Buclaw wrote:
> > For the sake of completeness, I can backport cppmangle from 2.067 down to gdc to allow you to test that as well. I have already verified that it is all that's needed to build ddmd (with a couple of small omissions or changes), and the resulting compiler passes the D2 testsuite.
>
> Sounds great, what do we need to do?
>
> cppmangle is needed for the frontend to backend binding, right? Isn't there more to the 2.067 C++ support?
>
>
Nope, just mangling.  Sharing the backend with G++ ensures ABI compatibility in all things except va_list on certain targets where it's a static array (x86_64).  But that is mostly dealt with so long as you don't try to pass a va_list structure across boundaries (which never happens in the existing codebase).

https://github.com/D-Programming-GDC/GDC/pull/131

By the way, I noticed that ddmd is always built in non-release mode, and I discovered that I can squeeze out more performance by speculatively skipping unneeded _d_invariant calls.

https://github.com/D-Programming-GDC/GDC/pull/132

Someone with better know-how of the DMD backend could give it a try too...


> A couple of small omissions and changes sounds like we could incorporate those into upstream.
>
> Will there be a 2.067 gdc?
>
>
After I finish rewriting the entire codegen layer, yes; eventually...

Again, pointing you in the direction of our current release tasks.

http://wiki.dlang.org/GDC/CurrentReleaseTasks

In one respect, I've kind of given myself more work to do now because I ignored the first round of visitor conversions in 2.066.  So in 2.067 I've got twice as many to do, and there is little that can be done to work around it this time.

Regards
Iain


August 12, 2015
On Wednesday, 12 August 2015 at 12:44:00 UTC, Iain Buclaw wrote:
> By the way, I noticed that ddmd is always built in non-release mode, and I discovered that I can squeeze out more performance by speculatively skipping unneeded _d_invariant calls.

I already compiled ddmd w/ -release to get my numbers.

_______________________________________________
dmd-internals mailing list
dmd-internals@puremagic.com
http://lists.puremagic.com/mailman/listinfo/dmd-internals
August 12, 2015
On 12 August 2015 at 00:59, Martin Nowak <code@dawg.eu> wrote:

> On 08/11/2015 04:38 PM, Iain Buclaw wrote:
> > Those extra memcpy calls look interesting.
>
> Sure, it looks interesting, but it won't be enough for 30%, and even for that 10% we'd need to profile and optimize a lot of code during the next 6 weeks.
>
> Here's a breakdown of __memcpy_avx_unaligned callers.
>
> Samples: 40K of event 'cycles', Event count (approx.): 12450194501
>   Children      Self  Comm  Shared Objec  Symbol
> -    9.46%     3.44%  ddmd  libc-2.20.so  [.] __memcpy_avx_unaligned
>    - __memcpy_avx_unaligned
>       + 9.06% TypeIdentifier::syntaxCopy
>       + 5.98% TemplateInstance::syntaxCopy
>       + 4.07% TemplateDeclaration::matchWithInstance
>       + 3.65% Scope::alloc
>       + 2.76% TemplateTypeParameter::matchArg
>       + 2.69% TemplateDeclaration::doHeaderInstantiation
>       + 2.20% TypeInstance::syntaxCopy
>       + 2.12% TemplateDeclaration::declareParameter
>       + 2.05% TemplateDeclaration::evaluateConstraint
>       + 1.82% functionResolve::ParamDeduce::fp
>       + 1.70% DsymbolExp::semantic
>       + 1.62% FuncDeclaration::semantic3
>       + 1.56% Parser::parsePrimaryExp
>       + 1.54% IsExp::syntaxCopy
>       + 1.54% TemplateDeclaration::deduceFunctionTemplateMatch
>       + 1.41% IdentifierExp::semantic
>       + 1.29% TypeFunction::syntaxCopy
>       + 1.27% AliasDeclaration::syntaxCopy
>       + 1.24% 0
>       + 1.21% VarDeclaration::syntaxCopy
>       + 1.10% Parser::parseDeclarations
>       + 1.09% StaticIfDeclaration::syntaxCopy
>       + 1.01% deduceType::DeduceType::visit
>       + 0.99% TemplateInstance::semantic
>       + 0.90% castTo::CastTo::visit
>       + 0.86% Scope::insert
>       + 0.83% Parser::parseBasicType
>       + 0.78% IsExp::semantic
>       + 0.75% TemplateInstance::findBestMatch::ParamBest::fp
>       + 0.73% TemplateTupleParameter::matchArg
>       + 0.65% Parameter::arraySyntaxCopy
>       + 0.64% _D4ddmd5mtype9TypeTuple6__ctorMFPS4ddmd4root5array41__T5ArrayTC4ddmd10expression10ExpressionZ5ArrayZC4ddmd5mtype9Type
>       + 0.61% ForeachStatement::semantic
>       + 0.58% IndexExp::semantic
>       + 0.57% TupleExp::semantic
>       + 0.57% ExpInitializer::syntaxCopy
>       + 0.57% functionResolve
>       + 0.56% CallExp::semantic
>       + 0.55% ScopeExp::semantic
>       + 0.55% VarDeclaration::semantic
>       + 0.54% FuncDeclaration::syntaxCopy
>       + 0.52% StringExp::semantic
>
>

It seems these are all places where most class allocations occur.

I was about to propose making this change:

 extern (C) Object _d_newclass(const ClassInfo ci)
 {
     auto p = allocmemory(ci.init.length);
+    *(cast(void **) p) = cast(void*) ci.vtbl;
-    p[0 .. ci.init.length] = cast(void[])ci.init[];
     return cast(Object)p;
 }

But then I checked and found out that Daniel removes all ctors in the D conversion. =)

I guess this is the reason why memcpy calls have increased!
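To make the trade-off in the proposed `_d_newclass` change concrete, here is a minimal C model of it. This is not druntime code: `ClassInfoModel`, `new_full_copy`, `new_vptr_only` and `init_is_zero_after_vptr` are hypothetical names. The model assumes the vtable pointer occupies the first pointer-sized slot of the instance and that `calloc` supplies zeroed memory. Writing only the vtable pointer reproduces the init image only when every field default is zero. Unlike the diff above, which drops the copy unconditionally, this variant checks for that case and falls back to the full copy otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a class's runtime info (not druntime's layout).
   Assumption: init[0 .. sizeof(void *)] holds the same bytes as vtbl. */
typedef struct {
    const void *vtbl;           /* vtable pointer, stored in slot 0 of every instance */
    const unsigned char *init;  /* full initializer image, including slot 0 */
    size_t init_len;            /* instance size in bytes */
} ClassInfoModel;

/* Current behaviour: copy the whole init image (one memcpy per allocation). */
void *new_full_copy(const ClassInfoModel *ci)
{
    void *p = malloc(ci->init_len);
    if (p != NULL)
        memcpy(p, ci->init, ci->init_len);
    return p;
}

/* Returns 1 when every byte after the vtable slot is zero, i.e. when
   "zeroed memory + vtable pointer" equals the init image exactly. */
int init_is_zero_after_vptr(const ClassInfoModel *ci)
{
    for (size_t i = sizeof(void *); i < ci->init_len; i++)
        if (ci->init[i] != 0)
            return 0;
    return 1;
}

/* Fast path in the spirit of the proposal: zeroed allocation, then store only
   the vtable pointer. Falls back to the full copy if any field has a non-zero
   default, because skipping the copy would then leave fields uninitialized. */
void *new_vptr_only(const ClassInfoModel *ci)
{
    assert(ci->init_len >= sizeof(void *));  /* instance must hold the vtable slot */
    if (!init_is_zero_after_vptr(ci))
        return new_full_copy(ci);
    void *p = calloc(1, ci->init_len);
    if (p != NULL)
        *(const void **)p = ci->vtbl;
    return p;
}
```

A real runtime would presumably compute the zero check once per class and cache it as a flag, rather than rescanning the init image on every allocation as this sketch does.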

Regards
Iain


August 12, 2015
On 12 August 2015 at 16:53, Iain Buclaw <ibuclaw@gdcproject.org> wrote:

> But then I checked and found out that Daniel removes all ctors in the D conversion. =)
>
> I guess this is the reason why memcpy calls have increased!
>
>
Oops, apparently I grep'd wrong.  I'll get onto testing this and will raise a PR.


August 12, 2015

On 8/12/2015 6:15 AM, Martin Nowak via dmd-internals wrote:
> On Wednesday, 12 August 2015 at 12:44:00 UTC, Iain Buclaw wrote:
>> By the way, I noticed that ddmd is always built in non-release mode, and I discovered that I can squeeze out more performance by speculatively skipping unneeded _d_invariant calls.
>
> I already compiled ddmd w/ -release to get my numbers.
>

Use -O -release -inline -boundscheck=off to get the fastest compiler.
August 12, 2015

On 8/12/2015 9:01 AM, Iain Buclaw via dmd-internals wrote:
>
>
> Oops, apparently I grep'd wrong.  I'll get onto testing this and will raise a PR.
>
>


Memory allocation and object initialization are always a fruitful source of speedups.
August 12, 2015

On 8/12/2015 5:07 AM, Martin Nowak via dmd-internals wrote:
> On 08/12/2015 12:03 PM, Walter Bright via dmd-internals wrote:
>> If we keep finding reasons not to do the switch, it will never happen.
> So far everybody wants to make the switch, I'm just trying to ensure
> it's going to be a success.
>

How about making a PR with the necessary .d files?