On 12 August 2015 at 16:53, Iain Buclaw <ibuclaw@gdcproject.org> wrote:

On 12 August 2015 at 00:59, Martin Nowak <code@dawg.eu> wrote:
On 08/11/2015 04:38 PM, Iain Buclaw wrote:
> Those extra memcpy calls look interesting.

Sure, it looks interesting, but it won't suffice for 30%, and even for that
10% we'd need to profile and optimize a lot of code during the next 6 weeks.

Here's a breakdown of __memcpy_avx_unaligned callers.

Samples: 40K of event 'cycles', Event count (approx.): 12450194501
  Children      Self  Comm  Shared Objec  Symbol
-    9.46%     3.44%  ddmd  libc-2.20.so  [.] __memcpy_avx_unaligned
   - __memcpy_avx_unaligned
      + 9.06% TypeIdentifier::syntaxCopy
      + 5.98% TemplateInstance::syntaxCopy
      + 4.07% TemplateDeclaration::matchWithInstance
      + 3.65% Scope::alloc
      + 2.76% TemplateTypeParameter::matchArg
      + 2.69% TemplateDeclaration::doHeaderInstantiation
      + 2.20% TypeInstance::syntaxCopy
      + 2.12% TemplateDeclaration::declareParameter
      + 2.05% TemplateDeclaration::evaluateConstraint
      + 1.82% functionResolve::ParamDeduce::fp
      + 1.70% DsymbolExp::semantic
      + 1.62% FuncDeclaration::semantic3
      + 1.56% Parser::parsePrimaryExp
      + 1.54% IsExp::syntaxCopy
      + 1.54% TemplateDeclaration::deduceFunctionTemplateMatch
      + 1.41% IdentifierExp::semantic
      + 1.29% TypeFunction::syntaxCopy
      + 1.27% AliasDeclaration::syntaxCopy
      + 1.24% 0
      + 1.21% VarDeclaration::syntaxCopy
      + 1.10% Parser::parseDeclarations
      + 1.09% StaticIfDeclaration::syntaxCopy
      + 1.01% deduceType::DeduceType::visit
      + 0.99% TemplateInstance::semantic
      + 0.90% castTo::CastTo::visit
      + 0.86% Scope::insert
      + 0.83% Parser::parseBasicType
      + 0.78% IsExp::semantic
      + 0.75% TemplateInstance::findBestMatch::ParamBest::fp
      + 0.73% TemplateTupleParameter::matchArg
      + 0.65% Parameter::arraySyntaxCopy
      + 0.64% _D4ddmd5mtype9TypeTuple6__ctorMFPS4ddmd4root5array41__T5ArrayTC4ddmd10expression10ExpressionZ5ArrayZC4ddmd5mtype9Type
      + 0.61% ForeachStatement::semantic
      + 0.58% IndexExp::semantic
      + 0.57% TupleExp::semantic
      + 0.57% ExpInitializer::syntaxCopy
      + 0.57% functionResolve
      + 0.56% CallExp::semantic
      + 0.55% ScopeExp::semantic
      + 0.55% VarDeclaration::semantic
      + 0.54% FuncDeclaration::syntaxCopy
      + 0.52% StringExp::semantic



These all seem to be the places where class allocations occur most often.

I was about to propose making this change:

 extern (C) Object _d_newclass(const ClassInfo ci)
 {
     auto p = allocmemory(ci.init.length);
-    p[0 .. ci.init.length] = cast(void[])ci.init[];
+    *(cast(void **) p) = cast(void*) ci.vtbl;
     return cast(Object)p;
 }
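
To make the trade-off concrete, here is a minimal sketch of the two variants. It is not the druntime code: `Node` and both helper names are made up for illustration, `calloc` stands in for `allocmemory`, and `initializer` is the name current druntime uses for what the diff above calls `init`. It assumes the allocator hands back zeroed memory.

```d
import core.stdc.stdlib : calloc, free;
import core.stdc.string : memcpy;

// Hypothetical class: `kind` has a non-zero default, so the value 7 is
// stored in the class's init image rather than set by a constructor.
class Node
{
    int kind = 7;
    int line;
}

// Current behaviour: copy the whole init image into the new object.
// This copy is what shows up as memcpy in the profile.
void* newByImageCopy(const TypeInfo_Class ci)
{
    const(void)[] image = ci.initializer;
    void* p = calloc(1, image.length);   // assumption: calloc replaces allocmemory
    assert(p !is null);
    memcpy(p, image.ptr, image.length);
    return p;
}

// Proposed behaviour: zeroed memory, then store only the vtable pointer.
void* newVtblOnly(const TypeInfo_Class ci)
{
    void* p = calloc(1, ci.initializer.length);
    assert(p !is null);
    *cast(void**) p = cast(void*) ci.vtbl.ptr;
    return p;
}

void main()
{
    auto ci = typeid(Node);
    void* a = newByImageCopy(ci);
    void* b = newVtblOnly(ci);
    scope (exit) { free(a); free(b); }

    // Both objects end up with the same vtable pointer ...
    assert(*cast(void**) a is *cast(void**) b);
    // ... but only the image copy carries the field default.
    assert((cast(Node) a).kind == 7);
    assert((cast(Node) b).kind == 0);
}
```

So the cheaper variant is only safe if every field with a non-zero default gets set explicitly afterwards, e.g. in a ctor.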

But then I checked and found out that Daniel removes all ctors in the D conversion. =)

I guess this is the reason why memcpy calls have increased!


Oops, apparently I grep'd wrong.  I'll get onto testing this and will raise a PR.