On 08/11/2015 04:38 PM, Iain Buclaw wrote:
> Those extra memcpy calls look interesting.
Sure, they look interesting, but they won't suffice for 30%, and even for
that 10% we'd need to profile and optimize a lot of code during the next 6 weeks.
Here's a breakdown of __memcpy_avx_unaligned callers.
Samples: 40K of event 'cycles', Event count (approx.): 12450194501
Children Self Command Shared Object Symbol
- 9.46% 3.44% ddmd libc-2.20.so [.] __memcpy_avx_unaligned
- __memcpy_avx_unaligned
+ 9.06% TypeIdentifier::syntaxCopy
+ 5.98% TemplateInstance::syntaxCopy
+ 4.07% TemplateDeclaration::matchWithInstance
+ 3.65% Scope::alloc
+ 2.76% TemplateTypeParameter::matchArg
+ 2.69% TemplateDeclaration::doHeaderInstantiation
+ 2.20% TypeInstance::syntaxCopy
+ 2.12% TemplateDeclaration::declareParameter
+ 2.05% TemplateDeclaration::evaluateConstraint
+ 1.82% functionResolve::ParamDeduce::fp
+ 1.70% DsymbolExp::semantic
+ 1.62% FuncDeclaration::semantic3
+ 1.56% Parser::parsePrimaryExp
+ 1.54% IsExp::syntaxCopy
+ 1.54% TemplateDeclaration::deduceFunctionTemplateMatch
+ 1.41% IdentifierExp::semantic
+ 1.29% TypeFunction::syntaxCopy
+ 1.27% AliasDeclaration::syntaxCopy
+ 1.24% 0
+ 1.21% VarDeclaration::syntaxCopy
+ 1.10% Parser::parseDeclarations
+ 1.09% StaticIfDeclaration::syntaxCopy
+ 1.01% deduceType::DeduceType::visit
+ 0.99% TemplateInstance::semantic
+ 0.90% castTo::CastTo::visit
+ 0.86% Scope::insert
+ 0.83% Parser::parseBasicType
+ 0.78% IsExp::semantic
+ 0.75% TemplateInstance::findBestMatch::ParamBest::fp
+ 0.73% TemplateTupleParameter::matchArg
+ 0.65% Parameter::arraySyntaxCopy
+ 0.64% _D4ddmd5mtype9TypeTuple6__ctorMFPS4ddmd4root5array41__T5ArrayTC4ddmd10expression10ExpressionZ5ArrayZC4ddmd5mtype9Type
+ 0.61% ForeachStatement::semantic
+ 0.58% IndexExp::semantic
+ 0.57% TupleExp::semantic
+ 0.57% ExpInitializer::syntaxCopy
+ 0.57% functionResolve
+ 0.56% CallExp::semantic
+ 0.55% ScopeExp::semantic
+ 0.55% VarDeclaration::semantic
+ 0.54% FuncDeclaration::syntaxCopy
+ 0.52% StringExp::semantic
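
To put the 9.46% figure in perspective, here is a rough back-of-the-envelope sketch using Amdahl's law. It assumes the "Children" value for __memcpy_avx_unaligned can be read as the share of total cycles attributable to memcpy, which is only an approximation; the helper name max_time_saved is made up for illustration.

```python
def max_time_saved(fraction, local_speedup=float("inf")):
    """Amdahl's law: share of total run time removed when `fraction`
    of the run becomes `local_speedup` times faster."""
    return fraction * (1.0 - 1.0 / local_speedup)


# Assumption: 9.46% (Children column above) is memcpy's share of all cycles.
memcpy_share = 0.0946

# Upper bound: every memcpy call disappears entirely.
print(f"all memcpy removed: {max_time_saved(memcpy_share):.2%}")

# Less optimistic case: the copies become twice as cheap.
print(f"memcpy 2x faster:   {max_time_saved(memcpy_share, 2.0):.2%}")
```

Under that assumption, even removing every copy caps the gain below 10% of total run time, and the cost is spread over dozens of callers rather than concentrated in one hot spot.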