June 14, 2012
On Thursday, June 14, 2012 10:03:05 Don Clugston wrote:
> On 13/06/12 16:29, Walter Bright wrote:
> > On 6/13/2012 1:07 AM, Don Clugston wrote:
> >> On 12/06/12 18:46, Walter Bright wrote:
> >>> On 6/12/2012 2:07 AM, timotheecour wrote:
> >>>> There's a current pull request to improve di file generation
> >>>> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
> >>>> suggest
> >>>> further ideas.
> >>>> As far as I understand, di interface files try to achieve these
> >>>> conflicting goals:
> >>>> 
> >>>> 1) speed up compilation by avoiding having to reparse large files over
> >>>> and over.
> >>>> 2) hide implementation details for proprietary reasons
> >>>> 3) still maintain source code in some form to allow inlining and CTFE
> >>>> 4) be human readable
> >>> 
> >>> (4) was not a goal.
> >>> 
> >>> A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.
> >> 
> >> I don't understand (1) actually.
> >> 
> >> For two reasons:
> >> (a) Is lexing + parsing really a significant part of the compilation
> >> time? Has
> >> anyone done some solid profiling?
> > 
> > It is for debug builds.
> 
> Iain's data indicates that it's only a few % of the time taken on
> semantic1().
> Do you have data that shows otherwise?
> 
> It seems to me that slow parsing is a C++ problem which D already solved.

If this is the case, is there any value at all to using .di files in druntime or Phobos other than in cases where we're specifically trying to hide implementation (e.g. with the GC)? Or do we still end up paying the semantic cost for importing the .d files such that using .di files would still help with compilation times?

- Jonathan M Davis
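For reference, the .di files under discussion are generated by the compiler itself via the -H family of switches; a minimal sketch of the workflow, with hypothetical file names:

```
# Emit gc.di next to the object file while compiling gc.d
# (-Hd picks an output directory, -Hf an explicit file name).
dmd -c -H gc.d

# Clients can then be compiled against the interface file only,
# keeping the full implementation out of the distributed source.
dmd -c client.d
```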
June 14, 2012
On Thursday, 14 June 2012 at 08:11:02 UTC, Jonathan M Davis wrote:
> Or do we still end up paying the semantic
> cost for importing the .d files such that using .di files would still help with
> compilation times?

Oh, right: the module can use mixins and CTFE, so it has to be semantically checked, though the semantic check may be as minimal as it is for a .di file.
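A minimal sketch of the kind of module in question (module and function names hypothetical): even if an importer only needs the interface, analysing the module requires expanding the mixin, which in turn runs CTFE.

```d
module widgets;  // hypothetical module

// Builds a function declaration as a string at compile time.
string makeGetter(string name)
{
    return "int get_" ~ name ~ "() { return 0; }";
}

// Expanding this mixin runs makeGetter via CTFE during semantic
// analysis, so a .di file could not skip this work entirely.
mixin(makeGetter("width"));
```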
June 15, 2012
On 14/06/12 10:10, Jonathan M Davis wrote:
> On Thursday, June 14, 2012 10:03:05 Don Clugston wrote:
>> On 13/06/12 16:29, Walter Bright wrote:
>>> On 6/13/2012 1:07 AM, Don Clugston wrote:
>>>> On 12/06/12 18:46, Walter Bright wrote:
>>>>> On 6/12/2012 2:07 AM, timotheecour wrote:
>>>>>> There's a current pull request to improve di file generation
>>>>>> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
>>>>>> suggest
>>>>>> further ideas.
>>>>>> As far as I understand, di interface files try to achieve these
>>>>>> conflicting goals:
>>>>>>
>>>>>> 1) speed up compilation by avoiding having to reparse large files over
>>>>>> and over.
>>>>>> 2) hide implementation details for proprietary reasons
>>>>>> 3) still maintain source code in some form to allow inlining and CTFE
>>>>>> 4) be human readable
>>>>>
>>>>> (4) was not a goal.
>>>>>
>>>>> A .di file could very well be a binary file, but making it look like D
>>>>> source enabled them to be loaded with no additional implementation work
>>>>> in the compiler.
>>>>
>>>> I don't understand (1) actually.
>>>>
>>>> For two reasons:
>>>> (a) Is lexing + parsing really a significant part of the compilation
>>>> time? Has
>>>> anyone done some solid profiling?
>>>
>>> It is for debug builds.
>>
>> Iain's data indicates that it's only a few % of the time taken on
>> semantic1().
>> Do you have data that shows otherwise?
>>
>> It seems to me that slow parsing is a C++ problem which D already solved.
>
> If this is the case, is there any value at all to using .di files in druntime
> or Phobos other than in cases where we're specifically trying to hide
> implementation (e.g. with the GC)? Or do we still end up paying the semantic
> cost for importing the .d files such that using .di files would still help with
> compilation times?
>
> - Jonathan M Davis

I don't think Phobos should use .di files at all. I don't think there are any cases where we want to conceal code.

The performance benefit you would get is completely negligible. It doesn't even reduce the number of files that need to be loaded, just the length of each one.

I think that, for example, improving the way that array literals are dealt with would have at least as much impact on compilation time.
For the DMD backend, fixing up the treatment of comma expressions would have a much bigger impact than getting lexing and parsing time to zero.

And we're well set up for parallel compilation. There's no shortage of things we can do to improve compilation time.

Using .di files for speed seems a bit like jettisoning the cargo to keep the ship afloat. It works, but you only do it when you've got no other options.
June 15, 2012
On Friday, June 15, 2012 08:58:55 Don Clugston wrote:
> I don't think Phobos should use .di files at all. I don't think there are any cases where we want to conceal code.
> 
> The performance benefit you would get is completely negligible. It doesn't even reduce the number of files that need to be loaded, just the length of each one.
> 
> I think that, for example, improving the way that array literals are
> dealt with would have at least as much impact on compilation time.
> For the DMD backend, fixing up the treatment of comma expressions would
> have a much bigger impact than getting lexing and parsing time to zero.
> 
> And we're well set up for parallel compilation. There's no shortage of things we can do to improve compilation time.
> 
> Using .di files for speed seems a bit like jettisoning the cargo to keep the ship afloat. It works, but you only do it when you've got no other options.

On several occasions, Walter has expressed the desire to make Phobos use .di files like druntime does; otherwise I probably would never have considered it. Personally, I don't want to bother with it unless there's a large benefit, so if we're sure that the gain is minimal, then I say we should just leave it all as .d files. Most of Phobos would have to have its implementation left in any .di files anyway so that inlining and CTFE could work.

- Jonathan M Davis
June 16, 2012
On 13 June 2012 12:47, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
> On 13 June 2012 12:33, Kagamin <spam@here.lot> wrote:
>> On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
>>>
>>> The measurements should be done for modules being imported, not the module
>>> being compiled.
>>> Something like this.
>>> ---
>>> import std.algorithm;
>>> import std.stdio;
>>> import std.typecons;
>>> import std.datetime;
>>>
>>> int ok;
>>> ---
>>
>>
>> Oh and let it import .d files, not .di
>
> std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times.  But I'm still convinced that the majority of the compile time in the frontend is spent in the first semantic pass, and not the read/parse stage. :~)
>
>

Rebuilt a compile log with latest gdc as of writing on the 2.059 frontend / library.

http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf
http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf


Notes about it:
- GCC has 4 new time counters
  -  phase setup  (time spent loading the compile time environment)
  -  phase parsing  (time spent in the frontend)
  -  phase generate (time spent in the backend)
  -  phase finalize  (time spent cleaning up and exiting)

- Of the phase parsing stage, it is broken down into 5 components
  -  Module::parse
  -  Module::semantic
  -  Module::semantic2
  -  Module::semantic3
  -  Module::genobjfile

- Module::read, Module::parse and Module::importAll, which were counted separately in the report I did 2 years ago, are now all counted as part of the one parsing stage, just to make it a little more balanced. :-)


I'll post a tl;dr later on it.

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
June 16, 2012
On 16 June 2012 10:18, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
> On 13 June 2012 12:47, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
>> On 13 June 2012 12:33, Kagamin <spam@here.lot> wrote:
>>> On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
>>>>
>>>> The measurements should be done for modules being imported, not the module
>>>> being compiled.
>>>> Something like this.
>>>> ---
>>>> import std.algorithm;
>>>> import std.stdio;
>>>> import std.typecons;
>>>> import std.datetime;
>>>>
>>>> int ok;
>>>> ---
>>>
>>>
>>> Oh and let it import .d files, not .di
>>
>> std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times.  But I'm still convinced that the majority of the compile time in the frontend is spent in the first semantic pass, and not the read/parse stage. :~)
>>
>>
>
> Rebuilt a compile log with latest gdc as of writing on the 2.059 frontend / library.
>
> http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf
> http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf
>
>
> Notes about it:
> - GCC has 4 new time counters
>  -  phase setup  (time spent loading the compile time environment)
>  -  phase parsing  (time spent in the frontend)
>  -  phase generate (time spent in the backend)
>  -  phase finalize  (time spent cleaning up and exiting)
>
> - Of the phase parsing stage, it is broken down into 5 components
>  -  Module::parse
>  -  Module::semantic
>  -  Module::semantic2
>  -  Module::semantic3
>  -  Module::genobjfile
>
> - Module::read, Module::parse and Module::importAll, which were counted separately in the report I did 2 years ago, are now all counted as part of the one parsing stage, just to make it a little more balanced. :-)
>
>
> I'll post a tl;dr later on it.
>

tl;dr

Total number of source files compiled: 207
Total time to build druntime and phobos:  78.08 seconds
Time spent parsing: 17.15 seconds
Average time spent parsing: 0.08 seconds
Time spent running semantic passes: 10.04 seconds

Time spent generating backend AST: 2.15 seconds
Time spent in backend: 48.62 seconds


So parsing time has taken quite a hit since I last did any reports on the compilation speed of building phobos.  I suspect most of that comes from loading the symbols of all imports, and that some large recent additions to phobos act as a constant bottleneck if one chooses to compile one source file at a time; the apparently large amount of time spent parsing does not show up when compiling everything at once.

 Module::parse: 0.58 seconds (1%)
 Module::semantic: 0.24 seconds (1%)
 Module::semantic2: 0.01 seconds (0%)
 Module::semantic3: 2.85 seconds (6%)
 Module::genobjfile: 1.24 seconds (3%)
 TOTAL: 47.06 seconds
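As a quick sanity check (my sketch, not part of the report), the per-phase percentages can be recomputed from the listed times and the 47.06-second total:

```d
import std.stdio;

void main()
{
    // Per-phase times (seconds) from the report above.
    immutable double total = 47.06;
    immutable string[] names = ["Module::parse", "Module::semantic",
        "Module::semantic2", "Module::semantic3", "Module::genobjfile"];
    immutable double[] times = [0.58, 0.24, 0.01, 2.85, 1.24];

    foreach (i, t; times)
        writefln("%s: %.0f%%", names[i], 100 * t / total);
}
```

Rounded to whole percentages, this reproduces the figures listed above.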

Considering that the entire phobos library is some 165K lines of code, I don't see why people aren't laughing about just how quick the frontend is at parsing. :~)


Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
June 16, 2012
> So parsing time has taken quite a hit since I last did any reports on compilation speed of building phobos.

So maybe my post about "keeping import clean" wasn't as irrelevant as I thought.

http://www.digitalmars.com/d/archives/digitalmars/D/Keeping_imports_clean_162890.html#N162890

--
Guillaume
June 16, 2012
On 6/14/2012 1:03 AM, Don Clugston wrote:
>> It is for debug builds.
> Iain's data indicates that it's only a few % of the time taken on semantic1().
> Do you have data that shows otherwise?

Nothing recent, it's mostly from my C++ compiler testing.


>> Yes, it is designed so you could just import a symbol table. It is done
>> as source code, however, because it's trivial to implement.
>
> It has those nasty side-effects listed under (3) though.

I don't think they're nasty or are side effects.
June 16, 2012
On 6/14/2012 11:58 PM, Don Clugston wrote:
> And we're well set up for parallel compilation. There's no shortage of things we
> can do to improve compilation time.

The language is carefully designed so that, at least in theory, all the passes could be done in parallel. I've got the file reads in parallel, but I'd love to have the lexing, parsing, semantic, optimization, and code gen all done in parallel. Wouldn't that be awesome!
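The file-read parallelism described here can be sketched with std.parallelism (a toy example; the file names are hypothetical):

```d
import std.file : read;
import std.parallelism : parallel;
import std.stdio : writefln;

void main()
{
    auto sources = ["a.d", "b.d", "c.d"];  // hypothetical module list

    // Each file is read on a worker thread; in principle the lexing
    // and parsing of each buffer could be farmed out the same way.
    foreach (f; parallel(sources))
    {
        auto buf = read(f);
        writefln("%s: %s bytes", f, buf.length);
    }
}
```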

> Using di files for speed seems a bit like jettisoning the cargo to keep the ship
> afloat. It works but you only do it when you've got no other options.

.di files don't make a whole lotta sense for small files, but the bigger they get, the more they are useful. D needs to be scalable to enormous project sizes.
June 18, 2012
On 17/06/12 00:37, Walter Bright wrote:
> On 6/14/2012 1:03 AM, Don Clugston wrote:
>>> It is for debug builds.
>> Iain's data indicates that it's only a few % of the time taken on
>> semantic1().
>> Do you have data that shows otherwise?
>
> Nothing recent, it's mostly from my C++ compiler testing.

But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D.
And as far as I can tell, you were extremely successful!
Parsing in D is very, very fast.

>>> Yes, it is designed so you could just import a symbol table. It is done
>>> as source code, however, because it's trivial to implement.
>>
>> It has those nasty side-effects listed under (3) though.
>
> I don't think they're nasty or are side effects.

They are new problems that people ask for solutions to. And they are far more difficult to solve than the original problem.