Thread overview | |||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
September 08, 2016 Templates are slow. | ||||
---|---|---|---|---|
| ||||
Hi Guys, I have just hit a barrier trying to optimize the compile-time in binderoo. Roughly 90% of the compile-time is spent instantiating templates. The 10% for CTFE are small in comparison. I will write an article about why templates are slow. The gist will be however : "Templates being slow is an inherent property of templates." (We are only talking about templates as defined by (C++ and D)). That said: Templates are great! But you have to use them sanely. If you are instantiating a template inside another template think very hard about the reason, often you can "inline" the template body of the inner template and get an instant speed win right there. (Don't do this preemptively, ONLY when you know that this template is a problem!) Phobos is has many templates inside templates. In constraints for example. I have no idea how to cut down on template-instanciations in phobos while still maintaining the same user-friendliness. Of course myself and other will continue fighting on the compiler-front. To give you that fastest implementation possible! Cheers, Stefan |
September 08, 2016 Re: Templates are slow. | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On Thursday, 8 September 2016 at 05:02:38 UTC, Stefan Koch wrote:
> (Don't do this preemptively, ONLY when you know that this template is a problem!)
How would you measure such things?
Is there such a thing like a "compilation time profiler" ?
(Running oprofile on a dmd with debug info comes first to mind ; however, this would only give me statistics on dmd's source code, not mine.)
|
September 08, 2016 Re: Templates are slow. | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sebastien Alaiwan | On Thursday, 8 September 2016 at 06:34:58 UTC, Sebastien Alaiwan wrote:
> On Thursday, 8 September 2016 at 05:02:38 UTC, Stefan Koch wrote:
>> (Don't do this preemptively, ONLY when you know that this template is a problem!)
>
> How would you measure such things?
> Is there such a thing like a "compilation time profiler" ?
>
> (Running oprofile on a dmd with debug info comes first to mind ; however, this would only give me statistics on dmd's source code, not mine.)
I use a special profilng-build of dmd.
oprofile on dmd can give you a good first impression of where you run into problems and then you can wirte special profiling code for this.
If you do not want to write such code send me a message and I will look into it for you :)
|
September 08, 2016 Re: Templates are slow. | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On Thursday, 8 September 2016 at 05:02:38 UTC, Stefan Koch wrote:
> I have just hit a barrier trying to optimize the compile-time in binderoo.
I did a double take when Stefan told me the representative sample code I gave him to run with Binderoo instantiated ~20,000 templates and resulted in ~10,000,000 hash map look ups inside the compiler.
I can certainly write it to be more optimised, but one of the goals I have is to make the codebase human readable so that it's not just Manu and myself that can understand the code. As a result, I figure this could be representative of how an ordinary user would write templated code.
|
September 08, 2016 Re: Templates are slow. | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On 9/8/16 7:02 AM, Stefan Koch wrote:
>
> I will write an article about why templates are slow.
>
> The gist will be however : "Templates being slow is an inherent property
> of templates." (We are only talking about templates as defined by (C++
> and D)).
That would be a great article. Are there any situations that we can special-case away? Raising the roof. -- Andrei
|
September 08, 2016 Re: Templates are slow. | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Thursday, 8 September 2016 at 12:23:35 UTC, Andrei Alexandrescu wrote:
> Are there any situations that we can special-case away? Raising the roof. -- Andrei
The rangefying functions in std.array come to mind.
That will give a huge boost to everyone.
(everyone who uses arrays anyway :))
|
September 08, 2016 Re: Templates are slow. | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On Thursday, 8 September 2016 at 05:02:38 UTC, Stefan Koch wrote:
>
> If you are instantiating a template inside another template think very hard about the reason, often you can "inline" the template body of the inner template and get an instant speed win right there.
> (Don't do this preemptively, ONLY when you know that this template is a problem!)
>
What about functions in std.algorithm, like reduce, that have templates with multiple functions within them and the headline function will call private impl functions within them?
|
September 08, 2016 Re: Templates are slow. | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ethan Watson | On Thursday, September 08, 2016 07:43:10 Ethan Watson via Digitalmars-d wrote:
> On Thursday, 8 September 2016 at 05:02:38 UTC, Stefan Koch wrote:
> > I have just hit a barrier trying to optimize the compile-time in binderoo.
>
> I did a double take when Stefan told me the representative sample code I gave him to run with Binderoo instantiated ~20,000 templates and resulted in ~10,000,000 hash map look ups inside the compiler.
>
> I can certainly write it to be more optimised, but one of the goals I have is to make the codebase human readable so that it's not just Manu and myself that can understand the code. As a result, I figure this could be representative of how an ordinary user would write templated code.
IIRC, Don posted at one point about how the std.algorithm unit tests ended up with over a million template instantiations. All of the eponymous templates that we use for template constraints really add up, and having heavily range-based code is going to rack up the number of template instantiations as well. It's critical that we do what we can to make templates fast. And if we can't make them fast enough, we'll definitely have to come up with techniques/guidelines for reducing their usage when they're not really needed.
Improvements to CTFE have really helped though. std.metastrings was dog slow in comparison to using CTFE, and at this point, we don't need std.metastrings anymore, and so it's gone. That's definitely not going to work in all cases though - just in cases where something is being done with templates that could be done with a function that could run at runtime now that CTFE can do most things that can be done at runtime. And we've probably gotten most everything we can out of that transition already - at least in Phobos.
- Jonathan M Davis
|
September 08, 2016 Re: Templates are slow. | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On Thursday, 8 September 2016 at 15:45:53 UTC, Jonathan M Davis wrote:
>
> It's critical that we do what we can to make templates fast. And if we can't make them fast enough, we'll definitely have to come up with techniques/guidelines for reducing their usage when they're not really needed.
>
> - Jonathan M Davis
I agree. We need to make templates faster.
But it will be like squeezing water out of stones.
A few more oblivious optimizations I have tried did not have the desired effect at all.
@Andrei
Also we need to special case ranges in general.
And try harder to inline calls to range functions.
Maybe even early in the frontend.
secondly we need to inline range-chains into each other whenever possible.
If we do this the right way early on we can reduce the symbolName-length as well.
All we need for this is pattern-matching on a type-resolved call-graph.
Which is something I am working on as part of my ctfe work.
|
September 08, 2016 Re: Templates are slow. | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stefan Koch | On Thu, Sep 08, 2016 at 04:37:36PM +0000, Stefan Koch via Digitalmars-d wrote: [...] > Also we need to special case ranges in general. > And try harder to inline calls to range functions. > Maybe even early in the frontend. [...] Yeah, dmd's inliner is really pessimistic. It gives up too easily, and as a result often misses a whole series of further opportunities that could have opened up, had it decided to inline. Having range-specific optimizations is a good idea, I think. Since it's one of D's big selling points, it needs to be as high-performance as we can possibly make it. I think there is a lot we can do in this area that we can't do as general optimizations; range-based code has certain recurring structures that should be highly-exploitable in terms of optimization opportunities. IME I've found that GDC is much better at optimizing range-based code because of its aggressive inliner (and also better loop optimization algorithms); but there's probably still room for range-specific optimizations that a general optimizer like the gcc backend wouldn't have. T -- Democracy: The triumph of popularity over principle. -- C.Bond |
Copyright © 1999-2021 by the D Language Foundation