September 08, 2016
I recently went through the process of optimizing the build time on one of my projects. I started at ~3.08s, and got it down to ~1.6s. The project is around 7000 non-comment-non-whitespace LOC. I timed the build in a pretty non-rigorous fashion (I just timed the Python script that kicks off a command-line call to dmd and waits for it to finish). However, I only cared about making changes that resulted in large improvements to the build time, so this was good enough for my purposes.

I too found that template instantiation was responsible for a lot of the extra build time. I found running dmd -v very helpful in tracking down excessive template instantiations or other places where the compiler was doing a lot of work that could be avoided.

The steps I took were as follows:

- Start (3.08s)
- I was using a custom assert function that grabbed __LINE__ and __FILE__ as template arguments, meaning each of the ~130 assert calls required a separate instantiation. I switched to passing those in as run-time arguments (2.85s)
- I had a similar wrapper around some logging functions in std.experimental.logger. I made a small change to std.experimental.logger to allow a call path with no template instantiations, and similarly fixed my own wrapper. I had ~70 logging calls (2.7s)
- Recompiled DMD with VS2015 (2.5s)
- Overclocked my CPU :D (2.3s)
- Created a file called heavytemplates.d that built to a .lib in a separate build step. The first templates I pulled out were a couple std.regex calls and instantiations (1.9s)
- Changed some tuples into structs (negligible improvement)
- Pulled several templates into heavytemplates.d that instantiate recursively over the Gamestate (a very large struct) (1.75s)
- Pulled out template instantiations used by msgpack-d, which also instantiate recursively over the Gamestate for save/load of the game (1.6s)
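
To illustrate the first step in the list above, here is a minimal sketch of the two styles of wrapper. The function names are hypothetical, not the ones from the project; the point is only where __FILE__ and __LINE__ are declared. As template parameters they become part of the instantiation, so each call site gets its own copy; as ordinary default arguments they are filled in at the call site but passed at run time.

```d
import std.format : format;
import std.stdio : writeln;

// Template version: file and line are compile-time parameters, so every
// call site with a different line number is a separate instantiation.
string locationTemplated(string file = __FILE__, size_t line = __LINE__)(string msg)
{
    return format("%s(%s): %s", file, line, msg);
}

// Run-time version: file and line are ordinary default arguments. They are
// still evaluated at the call site, but there is only one function.
string locationRuntime(string msg, string file = __FILE__, size_t line = __LINE__)
{
    return format("%s(%s): %s", file, line, msg);
}

void main()
{
    // Two call sites: two instantiations of locationTemplated.
    writeln(locationTemplated("first call"));
    writeln(locationTemplated("second call"));

    // Two call sites: still a single locationRuntime.
    writeln(locationRuntime("first call"));
    writeln(locationRuntime("second call"));
}
```

Both versions report the caller's file and line; only the number of instantiations the compiler has to produce differs.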

Of all of these, I was most surprised by the gain I got from pulling out std.regex calls into a separate build (0.4s). Whether or not I used compile-time regexes didn't seem to affect build time substantially, just that I used anything at all. Also, whether I had one regex call or five didn't seem to matter, likely because std.regex instantiates using the string type as a parameter, and I just used plain old 'string' for all regex uses.
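
As a rough sketch of what such a file can look like (the function name and pattern are made up for illustration, not taken from the project): the std.regex usage is hidden behind a plain, non-template function, so the regex templates are instantiated when this module is compiled rather than in the code that calls it. This assumes the main build links against the prebuilt .lib instead of compiling the file again.

```d
module heavytemplates;

import std.regex : matchFirst, regex;

// Plain (non-template) wrapper. Callers only see an ordinary function
// taking a string; the std.regex instantiations live in this module.
bool isSaveFileName(string name)
{
    auto pattern = regex(`^save_[0-9]+\.dat$`);
    return !matchFirst(name, pattern).empty;
}

// Quick self-check, e.g. with: dmd -unittest -main -run heavytemplates.d
unittest
{
    assert(isSaveFileName("save_12.dat"));
    assert(!isSaveFileName("notes.txt"));
}
```

The same shape applies to the other heavy instantiations: expose a non-template entry point and keep the template-heavy body on the library side.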

There's still work I could do, but at some point I start to get diminishing returns, and have to actually work on features instead of just optimizing my build :D
September 08, 2016
It's true that templates are inherently slow, and there isn't a ton we can do about that. However, almost every time I compile the project (hundreds of times per day), the same templates are being re-instantiated in exactly the same way. I can't help but wonder if there were some way to automatically cache template instantiations between runs of dmd?

The heavytemplates.d workaround I've used kind of accomplishes this as a total hack job. However...

- It adds complexity to the build process
- It adds a small overhead of linking in an extra .lib (although this is dwarfed by the win from no longer rebuilding expensive templates every build)
- It means that when heavytemplates.d changes, my rebuild is significantly longer than before since I'm running dmd twice
- It means extra work to implement that we don't want every developer to do themselves

Am I crazy in wondering about caching template instantiations? I understand that an incremental build would kind of accomplish this goal, but that comes with its own set of problems. I can't help but think that there's some way to make dmd smarter about not redoing the exact same work build after build, when the templates and their instantiations only change very rarely.
September 08, 2016
On Thursday, 8 September 2016 at 19:17:42 UTC, Lewis wrote:
> I can't help but wonder if there were some way to automatically cache template instantiations between runs of dmd?

I'm running with Visual D, which has a "COMPILE ALL THE THINGS" mentality as the default. As part of the rapid iteration part of Binderoo, I plan on doing incremental linking.

Of course, if all template instantiations go into one object file, that really ruins it. Each template instantiation going into a separate object file will actually make life significantly easier, as each compile will have less output. The only time those template instantiations need to recompile is if the invoking module changes, the template's dependencies change, or the module the template lives in changes.

My opinion is that splitting up object files will do more to reduce compile time for me than anything else. The pipeline we had for Quantum Break was to compile and link in separate steps, so it's not much effort at all for me to keep that idea running in Binderoo and make it incrementally link. But I don't know the DMD code and I'm not a compiler writer, so I cannot say that authoritatively. It sounds very reasonable to me at least.
September 08, 2016
On Thursday, 8 September 2016 at 19:49:38 UTC, Ethan Watson wrote:
> On Thursday, 8 September 2016 at 19:17:42 UTC, Lewis wrote:
>> I can't help but wonder if there were some way to automatically cache template instantiations between runs of dmd?
>
> I'm running with Visual D, which has a "COMPILE ALL THE THINGS" mentality as the default. As part of the rapid iteration part of Binderoo, I plan on doing incremental linking.
>
> Of course, if all template instantiations go into one object file, that really ruins it. Each template instantiation going into a separate object file will actually make life significantly easier, as each compile will have less output. The only time those template instantiations need to recompile is if the invoking module changes, the template's dependencies change, or the module the template lives in changes.
>
> My opinion is that splitting up object files will do more to reduce compile time for me than anything else. The pipeline we had for Quantum Break was to compile and link in separate steps, so it's not much effort at all for me to keep that idea running in Binderoo and make it incrementally link. But I don't know the DMD code and I'm not a compiler writer, so I cannot say that authoritatively. It sounds very reasonable to me at least.

Generating separate object files for each template instantiation, and then only regenerating them on change, will only be effective if they do not change much from one build to the next.

For Binderoo's purposes this could be rather effective.
As long as no one adds fields at the beginning of the structs :)
Without incremental linking, however, your compile times will shoot through the roof. And will probably damage the moon as well.

September 08, 2016
On Thursday, 8 September 2016 at 19:17:42 UTC, Lewis wrote:
>
> Am I crazy in wondering about caching template instantiations? I understand that an incremental build would kind of accomplish this goal, but that comes with its own set of problems.

Not as good as what you propose, but: LDC 1.1.0 can do _codegen_ caching which I guess is some intermediate form of incremental building.

My test case is a unittest piece from Weka.io that instantiates, oh, I don't remember exactly, 100,000+ templates. It takes about 65 seconds to compile. With codegen caching, the re(!)compile time on cache-hit is reduced to 39s. Note that the front-end still instantiates all those templates, but LDC's codegen at -O0 is not as fast as DMD's (calculating the hash also takes time and could be optimized further).

In summary, for LDC -O3 builds, you can expect a large speed boost by just adding `-ir2obj-cache=<cache dir>` to the commandline (LDC >= 1.1.0-alpha1).

-Johan

September 08, 2016
Hi Guys,

I have some more data.
In the Binderoo example, most of the time is spent in the backend, generating code and writing object files.

The front-end spends most of its time comparing strings of unique type-names :)
One particular outlier in the backend code is the function ecom, which eliminates common subexpressions.
We would potentially save some time by not emitting those in the first place.


September 08, 2016
On Thursday, 8 September 2016 at 22:57:07 UTC, Stefan Koch wrote:

> The front-end spends most of its time comparing strings of unique type-names :)

(Waits for Walter to say, "Use a pool, Luke!")


September 09, 2016
On Thursday, 8 September 2016 at 20:10:01 UTC, Stefan Koch wrote:
> Generating separate object files for each template instantiation, and then only regenerating them on change, will only be effective if they do not change much from one build to the next.
>

You'd have tens of thousands of files and a big I/O problem.

September 09, 2016
On Friday, 9 September 2016 at 01:38:40 UTC, deadalnix wrote:
> On Thursday, 8 September 2016 at 20:10:01 UTC, Stefan Koch wrote:
>> Generating separate object files for each template instantiation, and then only regenerating them on change, will only be effective if they do not change much from one build to the next.
>>
>
> You'd have tens of thousands of files and a big I/O problem.

I already thought about that.
The idea is to stuff the object code of all templates into one file with a bit of metadata, do a hash lookup at instantiation, and write the cached code in if the instantiation is found.

I agree, file I/O would kill any speed win many times over!
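
A very rough sketch of the lookup part of that idea, kept entirely in memory: this is not how DMD works, and the type name, the key format, and the `generate` callback are all assumptions made up for illustration. An associative array stands in for the single cache file; the key is some string that uniquely identifies an instantiation, and the value is the object code produced for it the first time.

```d
import std.stdio : writeln;

// Hypothetical in-memory stand-in for the single cache file.
struct InstantiationCache
{
    // Key: a string identifying one instantiation (assumed unique).
    // Value: the object code generated for it earlier.
    private ubyte[][string] entries;
    size_t hits;
    size_t misses;

    // Returns cached code on a hit; otherwise runs the (expensive)
    // generator once and remembers its result.
    ubyte[] fetch(string key, ubyte[] delegate() generate)
    {
        if (auto cached = key in entries)
        {
            ++hits;
            return *cached;
        }
        ++misses;
        auto code = generate();
        entries[key] = code;
        return code;
    }
}

void main()
{
    InstantiationCache cache;

    // Placeholder for real code generation.
    ubyte[] fakeCodegen() { return [0x90, 0xC3]; }

    cache.fetch("serialize!(Gamestate)", &fakeCodegen); // miss, generates
    cache.fetch("serialize!(Gamestate)", &fakeCodegen); // hit, reuses
    writeln("hits: ", cache.hits, ", misses: ", cache.misses);
}
```

A real version would also have to persist the table between compiler runs and invalidate entries when the template or its dependencies change, which is where most of the difficulty lies.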

September 09, 2016
On 9/8/16 6:57 PM, Stefan Koch wrote:
> Hi Guys,
>
> I have some more data.
> In the Binderoo example, most of the time is spent in the backend,
> generating code and writing object files.

If we ever get Rainer's patch to collapse repetitive templates, it may help with this problem. https://github.com/dlang/dmd/pull/5855

>
> The front-end spends most of its time comparing strings of unique
> type-names :)

I thought the front end was changed to use the string pointer for symbol names as the match so string comparisons aren't done?

Hm... maybe to intern the string? That kind of makes sense.

I just had a thought. If you hash the string, and then compare the length of the string and first and last character along with the hash, what are the chances of it being a false positive?
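
A minimal sketch of that comparison, assuming druntime's built-in `hashOf` as the hash function (the `Fingerprint` type and `fingerprint` helper are hypothetical names, not anything in the compiler): the hash, the length, and the first and last characters are computed once per string, and later comparisons only look at those four values.

```d
import std.stdio : writeln;

// Hypothetical summary of a string: cheap to compare, computed once.
struct Fingerprint
{
    size_t hash;
    size_t length;
    char first;
    char last;
}

Fingerprint fingerprint(string s)
{
    return Fingerprint(
        hashOf(s),                   // content hash from druntime
        s.length,
        s.length ? s[0] : '\0',      // '\0' stands in for "no character"
        s.length ? s[$ - 1] : '\0');
}

void main()
{
    auto a = fingerprint("Tuple!(int, string)");
    auto b = fingerprint("Tuple!(int, string)");
    auto c = fingerprint("Tuple!(int, double)");

    // Struct equality compares all four fields.
    writeln(a == b);
    writeln(a == c);
}
```

Equal strings always produce equal fingerprints, but the reverse is only probable: how unlikely a false positive is depends on the quality of the hash, so a full comparison is still needed anywhere a mismatch cannot be tolerated.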

-Steve