Compilation times and idiomatic D code
Jul 05, 2017
H. S. Teoh
Jul 05, 2017
Stefan Koch
Jul 05, 2017
jmh530
Jul 05, 2017
H. S. Teoh
Jul 07, 2017
H. S. Teoh
Jul 06, 2017
John Colvin
Jul 05, 2017
kinke
Jul 06, 2017
Jacob Carlborg
Jul 06, 2017
Atila Neves
Jul 06, 2017
H. S. Teoh
July 05, 2017
Over time, what is considered "idiomatic D" has changed, and nowadays it seems to be leaning heavily towards range-based code with UFCS chains using std.algorithm and similar reusable pieces of code.

D (well, DMD specifically) is famed for its lightning-fast compilation
times.

So this left me wondering why my latest D project, a smallish codebase with only ~5000 lines of code, a good part of which are comments, takes about 11 seconds to compile.

A first hint is that these meager 5000 lines of code compile to a 600MB executable. Well, large executables have been the plague of D since the beginning, but the reasoning has always been that hello world examples don't really count, because the language offers the machinery for much more than that, and the idea is that as the code size grows, the "bloat to functionality" ratio decreases.  But still... 600MB for 5000 lines of code seems a bit excessive, especially when stripping symbols cuts the size roughly in *half*.

This led to the discovery, to my horror, that some very, VERY large
symbols are being generated, including one that's 388881
characters long. Yes, that's almost 400KB just for ONE symbol.  This
particular symbol is the result of a long UFCS chain in the main
program, and contains a lot of repeated elements, like
myTemplate__lambdaXXX_myTemplateArguments__mapXXX__Result__myTemplateArguments
and so on.  Each additional member in the UFCS chain causes a repetition
of all the previous members' return type names, plus the new typename,
causing an O(n^2) explosion in symbol size.
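
A small sketch of the effect, assuming a reasonably recent DMD or LDC with Phobos: each stage of a range pipeline has a type that embeds the previous stage's type, so the mangled name (`.mangleof`) grows with every link. The function name `stageMangleLengths` is hypothetical, chosen for illustration. Exact lengths vary by compiler version; compilers that include the back-reference compression discussed in this thread produce much shorter names, and because these lambdas live in a plain function rather than a template, the snippet shows monotonic growth rather than the full O(n^2) blow-up described above.

```d
import std.algorithm : filter, map;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: builds a five-stage pipeline and returns the length
// of the mangled name of each stage's type. Each stage's type contains the
// previous stage's type as a template argument.
size_t[] stageMangleLengths()
{
    auto s0 = iota(100);
    auto s1 = s0.map!(x => x + 1);
    auto s2 = s1.filter!(x => x % 2 == 0);
    auto s3 = s2.map!(x => x * 3);
    auto s4 = s3.filter!(x => x > 10);
    return [
        typeof(s0).mangleof.length,
        typeof(s1).mangleof.length,
        typeof(s2).mangleof.length,
        typeof(s3).mangleof.length,
        typeof(s4).mangleof.length,
    ];
}

void main()
{
    // Print how long the symbol for each stage's range type is.
    foreach (i, len; stageMangleLengths())
        writeln("stage ", i, ": ", len, " chars");
}
```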

Worse yet, because the typename encoded in this monster symbol is a range, you have the same 300+KB of typename repeated for each of the range primitives. And anything else this typename happens to be a template argument to.  There's another related symbol that's 388944 characters long.  Not to mention all the range primitives (along with their similarly huge typenames) of all the smaller types contained within this monster typename.

Given this, it's no surprise that the compiler took 11 seconds to compile a 5000-line program. Just imagine how much time is spent generating these huge symbols, storing them in the symbol table, comparing them in symbol table lookups, writing them to the executable, etc.  And we're not even talking about the other smaller, but still huge symbols that are also present -- 100KB symbols, 50KB symbols, 10KB symbols, etc.  And think about the impact of this on the compiler's memory footprint.

IOW, the very range-based idiom that has become one of the defining characteristics of modern D is negating the selling point of fast compilation.

I vaguely remember there was talk about compressing symbols when they get too long... is there any hope of seeing this realized in the near future?


T

-- 
War doesn't prove who's right, just who's left. -- BSD Games' Fortune
July 05, 2017
On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
>
> I vaguely remember there was talk about compressing symbols when they get too long... is there any hope of seeing this realized in the near future?

Yes there is.
Rainer Schuetze is quite close to a solution. Which reduces the symbol-name bloat significantly.
See https://github.com/dlang/dmd/pull/5855


There is still a problem with the template system as a whole.
Which I am working on in my spare time.
And which will become my focus after newCTFE is done.
July 05, 2017
On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
> I vaguely remember there was talk about compressing symbols when they get too long... is there any hope of seeing this realized in the near future?

LDC has an experimental feature that replaces long symbol names with their hash. From ldc2 -help:
...
  -hash-threshold=<uint>                    - Hash symbol names longer than this threshold (experimental)
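
To illustrate the idea behind such a threshold, here is a minimal D sketch assuming a hypothetical scheme: any mangled name longer than `threshold` is replaced by `_H` followed by the hex MD5 digest of the full name. `shortenSymbol` and the `_H` prefix are made up for illustration; this is not LDC's actual hashing format, which the help text above does not describe.

```d
import std.digest : toHexString;
import std.digest.md : md5Of;
import std.stdio : writeln;

// Hypothetical scheme: replace names longer than `threshold` with a
// fixed-size hash. The "_H" prefix is illustrative, not LDC's real format.
string shortenSymbol(string mangled, size_t threshold)
{
    if (mangled.length <= threshold)
        return mangled;                   // short names are left untouched
    ubyte[16] hash = md5Of(mangled);      // 128-bit MD5 of the full name
    return "_H" ~ toHexString(hash).idup; // 2 + 32 = 34 characters total
}

void main()
{
    auto huge = new char[](388_881);      // stand-in for a ~400KB symbol
    huge[] = 'x';
    writeln(shortenSymbol(huge.idup, 1024).length); // 34
    writeln(shortenSymbol("_D3foo3barFZv", 1024));  // unchanged
}
```

The trade-off is that hashed names are short and deterministic, but they can no longer be demangled back into readable type names.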
July 05, 2017
On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
>
> Yes there is.
> Rainer Schuetze is quite close to a solution. Which reduces the symbol-name bloat significantly.
> See https://github.com/dlang/dmd/pull/5855
>
>

A table in the comments [1] shows a significant reduction in bloat when compiling phobos unit tests. However, it shows a slight increase in build time. I would have expected a decrease. Any idea why that is?

[1] https://github.com/dlang/dmd/pull/5855#issuecomment-310653542
July 05, 2017
On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
> On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
> > 
> > Yes there is.
> > Rainer Schuetze is quite close to a solution. Which reduces the
> > symbol-name bloat significantly.
> > See https://github.com/dlang/dmd/pull/5855

That's very nice.  Hope we will get this through sooner rather than later!


[...]
> A table in the comments [1] shows a significant reduction in bloat when compiling phobos unit tests. However, it shows a slight increase in build time. I would have expected a decrease. Any idea why that is?
> 
> [1] https://github.com/dlang/dmd/pull/5855#issuecomment-310653542

The same comment points to a refactoring PR by Walter (dmd #6841). Not sure why that PR would interact with this one in this way.

In any case, I think the actual compilation times would depend on the details of the code.  If you're using relatively shallow UFCS chains, like Phobos unittests tend to do, probably the compressed symbols won't give very much advantage over the cost of computing the compression. But if you have heavy usage of UFCS like in my code, this should cause significant speedups from not having to operate on 300KB symbols.


T

-- 
Help a man when he is in trouble and he will remember you when he is in trouble again.
July 06, 2017
On 2017-07-05 22:12, H. S. Teoh via Digitalmars-d wrote:
> Over time, what is considered "idiomatic D" has changed, and nowadays it
> seems to be leaning heavily towards range-based code with UFCS chains
> using std.algorithm and similar reusable pieces of code.

It's not UFCS per se that causes the problem. If you used the traditional calling syntax, it would generate the same symbols.
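
This can be checked directly in D. In the sketch below (`twice` is a hypothetical helper), the UFCS form `iota(5).map!twice` and the traditional form `map!twice(iota(5))` produce the same type and therefore the same mangled name. A named function is used instead of two lambda literals, because separate lambda literals may be treated as distinct template arguments and yield different types.

```d
import std.algorithm : map;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: a named function, so both calls below instantiate
// map with exactly the same template argument.
int twice(int x) { return 2 * x; }

alias UfcsType  = typeof(iota(5).map!twice);  // UFCS call syntax
alias PlainType = typeof(map!twice(iota(5))); // traditional call syntax

// Checked at compile time: same type, hence the same mangled symbol.
static assert(is(UfcsType == PlainType));
static assert(UfcsType.mangleof == PlainType.mangleof);

void main()
{
    writeln(UfcsType.mangleof.length, " characters either way");
}
```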

> D (well, DMD specifically) is famed for its lightning-fast compilation
> times.
>
> So this left me wondering why my latest D project, a smallish codebase
> with only ~5000 lines of code, a good part of which are comments, takes
> about 11 seconds to compile.

Yeah, it's usually all these D-specific compile-time features that are slowing down compilation.

DWT and Tango are two good examples of large code bases where very few of these features are used; they're written in a more traditional style. They're at least 200k lines of code each and, IIRC, take around 10 seconds (or less) for a full build.

-- 
/Jacob Carlborg
July 06, 2017
On Thursday, 6 July 2017 at 12:00:29 UTC, Jacob Carlborg wrote:
> On 2017-07-05 22:12, H. S. Teoh via Digitalmars-d wrote:
>> [...]
>
> It's not UFCS per se that causes the problem. If you used the traditional calling syntax, it would generate the same symbols.
>
>> [...]
>
> Yeah, it's usually all these D-specific compile-time features that are slowing down compilation.
>
> DWT and Tango are two good examples of large code bases where very few of these features are used; they're written in a more traditional style. They're at least 200k lines of code each and, IIRC, take around 10 seconds (or less) for a full build.

IIRC building Tango per package instead of all-at-once got the build time down to less than a second.

Atila
July 06, 2017
On Thu, Jul 06, 2017 at 01:32:04PM +0000, Atila Neves via Digitalmars-d wrote:
> On Thursday, 6 July 2017 at 12:00:29 UTC, Jacob Carlborg wrote:
[...]
> > Yeah, it's usually all these D-specific compile-time features that are slowing down compilation.
> > 
> > DWT and Tango are two good examples of large code bases where very few of these features are used; they're written in a more traditional style.  They're at least 200k lines of code each and, IIRC, take around 10 seconds (or less) for a full build.
> 
> IIRC building Tango per package instead of all-at-once got the build time down to less than a second.
[...]

Well, obviously D's famed compilation speed must still be applicable *somewhere*, otherwise we'd be hearing loud complaints. :-D

My point was that D's compile-time features, which are a big draw to me personally, and also becoming a selling point of D, need improvement in this area.

I'm very happy to be pointed to Rainer's PR that implements symbol backreferencing compression.  Apparently it has successfully compressed the largest symbol generated by Phobos unittests from 30KB (or something like that) down to about 1100 characters, which, though still on the large side, is much more reasonable than the present situation.  I hope this PR will get merged in the near future.


T

-- 
Making non-nullable pointers is just plugging one hole in a cheese grater. -- Walter Bright
July 06, 2017
On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
> On Wednesday, 5 July 2017 at 20:12:40 UTC, H. S. Teoh wrote:
>>
>> I vaguely remember there was talk about compressing symbols when they get too long... is there any hope of seeing this realized in the near future?
>
> Yes there is.
> Rainer Schuetze is quite close to a solution. Which reduces the symbol-name bloat significantly.
> See https://github.com/dlang/dmd/pull/5855
>
>
> There is still a problem with the template system as a whole.
> Which I am working on in my spare time.
> And which will become my focus after newCTFE is done.

Please give consent for the D Foundation to clone you.
July 07, 2017
On 7/5/17 5:24 PM, H. S. Teoh via Digitalmars-d wrote:
> On Wed, Jul 05, 2017 at 09:18:45PM +0000, jmh530 via Digitalmars-d wrote:
>> On Wednesday, 5 July 2017 at 20:32:08 UTC, Stefan Koch wrote:
>>>
>>> Yes there is.
>>> Rainer Schuetze is quite close to a solution. Which reduces the
>>> symbol-name bloat significantly.
>>> See https://github.com/dlang/dmd/pull/5855
> 
> That's very nice.  Hope we will get this through sooner rather than
> later!

I'm super-psyched this has moved from "proof of concept" to ready for review. Kudos to Rainer for his work on this! Has been a PITA for a while:

https://issues.dlang.org/show_bug.cgi?id=15831
https://forum.dlang.org/post/n96k3g$ka5$1@digitalmars.com

> In any case, I think the actual compilation times would depend on the
> details of the code.  If you're using relatively shallow UFCS chains,
> like Phobos unittests tend to do, probably the compressed symbols won't
> give very much advantage over the cost of computing the compression.
> But if you have heavy usage of UFCS like in my code, this should cause
> significant speedups from not having to operate on 300KB symbols.

I have found that the linker gets REALLY slow when the symbols get large. So it's not necessarily the compiler that's slow for this.

-Steve