June 12, 2012
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 12/06/2012 14:39, foobar wrote:
> Another related question - AFAIK the LLVM folks did/are doing work to
> make their implementation less platform-dependent. Could we leverage this
> in ldc to store LLVM bit code as D libs which still retain enough info
> for the compiler to replace header files?
>

LLVM is definitely something I am looking at more and more. It is a great 
weapon for D, IMO.
June 12, 2012
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Tue, 12 Jun 2012 06:46:44 -0700, Jacob Carlborg <doob@me.com> wrote:

> On 2012-06-12 14:09, foobar wrote:
>
>> This is a solved problem since the 80's (E.g. Pascal units). Per Adam's
>> post, the issue is tied to DMD's use of OMF/optlink which we all would
>> like to get rid of anyway. Once we're in proper COFF land, couldn't we
>> just store the required metadata (binary AST?) in special sections in
>> the object files themselves?
>
> Can't the same be done with OMF? I'm not saying I want to keep OMF.
>

OMF doesn't support custom sections, and I think a custom section is the  
right way to handle this. I found the Borland OMF docs a while back and  
verified this.
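
A toy sketch of the custom-section idea, in Python. It models an object container with named sections: ordinary code lives in `.text`, and compiler-readable interface data rides along in a separate section. The container layout, the `TOYOBJ01` tag and the `.dinterface` section name are all invented for illustration and do not match real COFF or ELF section tables. What the sketch shows is that a tool which only understands `.text` can skip sections it does not know, while the compiler reads just the metadata.

```python
import struct

MAGIC = b"TOYOBJ01"  # hypothetical tag; not a real COFF/ELF/OMF header


def write_object(sections: dict[str, bytes]) -> bytes:
    """Serialize named sections as: magic, section count, then (name, data) pairs."""
    out = bytearray(MAGIC)
    out += struct.pack("<I", len(sections))
    for name, data in sections.items():
        raw_name = name.encode("ascii")
        out += struct.pack("<H", len(raw_name)) + raw_name
        out += struct.pack("<I", len(data)) + data
    return bytes(out)


def read_object(blob: bytes) -> dict[str, bytes]:
    """Parse every section; callers decide which ones they understand."""
    if not blob.startswith(MAGIC):
        raise ValueError("not a toy object file")
    pos = len(MAGIC)
    (count,) = struct.unpack_from("<I", blob, pos)
    pos += 4
    sections = {}
    for _ in range(count):
        (name_len,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        name = blob[pos:pos + name_len].decode("ascii")
        pos += name_len
        (data_len,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        sections[name] = blob[pos:pos + data_len]
        pos += data_len
    return sections


obj = write_object({
    ".text": b"\x90\x90\xc3",                  # code bytes the linker consumes
    ".dinterface": b"int add(int a, int b);",  # hypothetical section for the compiler
})
secs = read_object(obj)
linker_view = {k: v for k, v in secs.items() if k == ".text"}  # ignores unknown sections
compiler_view = secs.get(".dinterface")                          # reads only the metadata
```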

-- 
Adam Wilson
IRC: LightBender
Project Coordinator
The Horizon Project
http://www.thehorizonproject.org/
June 12, 2012
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 06/12/2012 03:54 PM, deadalnix wrote:
> On 12/06/2012 12:23, Tobias Pankrath wrote:
>> Currently .di files are compiler-independent. If this should hold for
>> .dib files, too, we'll need a standard AST structure, won't we?
>>
>
> We need it anyway at some point.

Plain D code is already a perfectly fine standard AST structure.

> AST macros are another example.
>

AST macros may refer to AST structures by their representations as D code.
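
The point that source text already works as a standard serialization of the AST can be illustrated with Python's standard `ast` module standing in for a D front end. This is an analogy in another language, not how DMD works. Parsing, unparsing and reparsing yields a structurally identical tree.

```python
import ast

source = "def twice(x):\n    return x * 2\n"

tree = ast.parse(source)         # source text -> AST
regenerated = ast.unparse(tree)  # AST -> source text (Python 3.9+)

# Re-parsing the regenerated text gives the same tree structure
# (ast.dump omits line/column attributes by default).
same = ast.dump(ast.parse(regenerated)) == ast.dump(tree)
```

Comments and exact formatting do not survive this round trip, but the tree itself does.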

> It would also greatly simplify compiler writing if the D interpreter
> could be provided as a lib (and so run on top of a .dib file).
>

I don't think so. Writing the interpreter is a rather straightforward 
part of the compiler implementation. Why would you want to run it on top 
of a '.dib' file anyway? Serializing/deserializing the AST is too much 
overhead.
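
A rough illustration of why a tree-walking evaluator is the comparatively easy part: a minimal Python sketch over a hypothetical expression AST. The node classes and operator set are invented for the example, and D's CTFE covers far more of the language. The evaluator works on in-memory nodes directly, so routing them through a serialized file first would only add overhead.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


@dataclass
class If:
    cond: "Expr"
    then: "Expr"
    other: "Expr"


Expr = Union[Num, Var, BinOp, If]

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "<": lambda a, b: int(a < b),
}


def evaluate(node: Expr, env: dict) -> int:
    """Walk the tree directly; no serialization step is involved."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, BinOp):
        return OPS[node.op](evaluate(node.left, env), evaluate(node.right, env))
    if isinstance(node, If):
        branch = node.then if evaluate(node.cond, env) else node.other
        return evaluate(branch, env)
    raise TypeError(f"unknown node {node!r}")


# if (n < 10) n * 2 else n - 1
expr = If(BinOp("<", Var("n"), Num(10)),
          BinOp("*", Var("n"), Num(2)),
          BinOp("-", Var("n"), Num(1)))
```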

> I want to mention that LLVM IR + metadata can do a really good job here.
> In addition, LLVM people are working on a JIT backend, if you know what
> I mean ;)

Interpreting manually is not harder than CTFE-compatible LLVM IR code 
generation, but the LLVM JIT could certainly be leveraged to improve 
compilation speeds.
June 12, 2012
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 6/12/2012 2:07 AM, timotheecour wrote:
> There's a current pull request to improve di file generation
> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest
> further ideas.
> As far as I understand, di interface files try to achieve these conflicting goals:
>
> 1) speed up compilation by avoiding having to reparse large files over and over.
> 2) hide implementation details for proprietary reasons
> 3) still maintain source code in some form to allow inlining and CTFE
> 4) be human readable

(4) was not a goal.

A .di file could very well be a binary file, but making it look like D source 
enabled them to be loaded with no additional implementation work in the compiler.
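
A .di file is ordinary source with implementation details removed, so generating one is essentially a tree transformation. The minimal sketch below uses Python's `ast` module as a stand-in for D: every function body is replaced with `...`, leaving only signatures. It strips all bodies, whereas goal (3) above would require keeping some for inlining and CTFE. Treat it as an illustration, not as DMD's `-H` logic.

```python
import ast


class StripBodies(ast.NodeTransformer):
    """Replace every function body with `...`, keeping only the signature."""

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.body = [ast.Expr(value=ast.Constant(value=...))]
        return node


source = '''
def area(w, h):
    secret = w * h
    return secret
'''

tree = StripBodies().visit(ast.parse(source))
interface = ast.unparse(ast.fix_missing_locations(tree))
```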
June 12, 2012
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Tue, 12 Jun 2012 05:23:16 -0700, Dmitry Olshansky  
<dmitry.olsh@gmail.com> wrote:

> On 12.06.2012 16:09, foobar wrote:
>> On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
>>> On 12/06/12 11:07, timotheecour wrote:
>>>> There's a current pull request to improve di file generation
>>>> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
>>>> suggest further ideas.
>>>> As far as I understand, di interface files try to achieve these
>>>> conflicting goals:
>>>>
>>>> 1) speed up compilation by avoiding having to reparse large files over
>>>> and over.
>>>> 2) hide implementation details for proprietary reasons
>>>> 3) still maintain source code in some form to allow inlining and CTFE
>>>> 4) be human readable
>>>
>>> Is that actually true? My recollection is that the original motivation
>>> was only goal (2), but I was fairly new to D at the time (2005).
>>>
>>> Here's the original post where it was implemented:
>>> http://www.digitalmars.com/d/archives/digitalmars/D/29883.html
>>> and it got partially merged into DMD 0.141 (Dec 4 2005), first usable
>>> in DMD 0.142
>>>
>>> Personally I believe that .di files are *totally* the wrong approach
>>> for goal (1). I don't think goals (1) and (2) have anything in common
>>> at all with each other, except that C tried to achieve both of them
>>> using header files. It's an OK solution for (1) in C, it's a failure
>>> in C++, and a complete failure in D.
>>>
>>> IMHO: If we want goal (1), we should try to achieve goal (1), and stop
>>> pretending it's in any way related to goal (2).
>>
>> I absolutely agree with the above and would also add that goal (4) is an
>> anti-feature. In order to get a human readable version of the API the
>> programmer should use *documentation*. D claims that one of its goals is
>> to make it a breeze to provide documentation by bundling a standard tool
>> - DDoc. There's no need to duplicate this just to provide another format
>> when DDoc itself is supposed to be format-agnostic.
>>
> Absolutely. DDoc being built-in didn't sound right to me at first, BUT  
> it essentially allows us to say that APIs are covered in the  
> DDoc-generated files, not in header files etc.
>
>> This is a solved problem since the 80's (E.g. Pascal units).
>
> Right, seeing yet another newbie hit it every day is a clear indication  
> of a simple fact: people would like to think & work in modules rather  
> than seeing the guts of old and crappy OBJ file technology. Linking with C  
> != using C tools everywhere.
>

I completely agree with this. The interactions between the D module system  
and D toolchain are utterly confusing to newcomers, especially those from  
other C-like languages. There are better ways, see .NET Assemblies and  
Pascal Units. These problems were solved decades ago. Why are we still  
using 40-year-old paradigms?

>> Per Adam's
>> post, the issue is tied to DMD's use of OMF/optlink which we all would
>> like to get rid of anyway. Once we're in proper COFF land, couldn't we
>> just store the required metadata (binary AST?) in special sections in
>> the object files themselves?
>>
> Seconded. At least the lexed form could be very compact; I recall early  
> compressors tried doing the Huffman thing on source code tokens with  
> some success.
>

I don't see the value of compression. Lexing would already reduce the size  
significantly and compression would only add to processing times. Disk is  
cheap.
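
To give a feel for how much lexing alone can shrink source, here is a Python sketch of a hypothetical token encoding:

- whitespace is dropped;
- keywords come from a table both sides already know;
- other spellings are stored once in a string table;
- every token becomes a one-byte index.

The keyword list, the regex lexer and the 256-token limit are simplifying assumptions, not a proposed D format.

```python
import re

# Hypothetical keyword table; a real D lexer knows many more keywords.
KEYWORDS = ["module", "import", "int", "void", "return", "if", "else", "struct"]
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+)|(\S))")


def lex(source: str) -> list[str]:
    """Split source into identifiers/keywords, integers and single-char symbols."""
    return [m.group(1) or m.group(2) or m.group(3) for m in TOKEN_RE.finditer(source)]


def encode(tokens: list[str]) -> tuple[bytes, bytes]:
    """Return (string table of new spellings, one index byte per token)."""
    table = list(KEYWORDS)  # keywords are known to both sides, so never stored
    index = {t: i for i, t in enumerate(table)}
    ids = []
    for tok in tokens:
        if tok not in index:
            index[tok] = len(table)
            table.append(tok)
        ids.append(index[tok])
    if len(table) > 256:
        raise ValueError("sketch assumes at most 256 distinct tokens")
    return "\0".join(table[len(KEYWORDS):]).encode(), bytes(ids)


def decode(string_table: bytes, ids: bytes) -> list[str]:
    """Rebuild the token list (the original whitespace is not recoverable)."""
    spellings = string_table.decode().split("\0") if string_table else []
    table = KEYWORDS + spellings
    return [table[i] for i in ids]


source = """
int add(int a, int b)
{
    return a + b;
}

int twice(int a)
{
    return add(a, a);
}
"""

tokens = lex(source)
string_table, ids = encode(tokens)
encoded_size = len(string_table) + len(ids)
source_size = len(source.encode())
```

On this small sample the encoded form is 59 bytes against 89 bytes of source. Files with more repeated identifiers would tend to shrink further.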

Beyond that though, this is absolutely the direction D must head in. In my  
mind the DI generation patch was mostly just a stop-gap to bring DI-gen  
up-to-date with the current system, thereby giving us enough time to tackle  
the (admittedly huge) task of building COFF into the backend, emitting the  
lexed source into a special section and then giving the compiler *AND*  
linker the ability to read out the source. For example, giving the  
linker the ability to read out source code essentially requires a  
brand-new linker. That said, it is my personal opinion that the linker  
should be integrated with the compiler and run as one step; this way the  
linker could have intimate knowledge of the source, which would enable  
some spectacular LTO options. If only DMD were written in D, then we could  
really open the compile speed throttles with an MT build model...


-- 
Adam Wilson
IRC: LightBender
Project Coordinator
The Horizon Project
http://www.thehorizonproject.org/
June 12, 2012
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 12.06.2012 22:47, Adam Wilson wrote:
> I don't see the value of compression. Lexing would already reduce the
> size significantly and compression would only add to processing times.
> Disk is cheap.

I/O is not. (De)compression on the fly is an increasingly interesting 
direction these days: the less you read/write, the faster you get. 
Knowing the relative frequency of keywords beforehand is a boon. Yet I 
agree that it's premature at the moment.
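
The Huffman idea can be sketched in Python over a token stream. Frequent tokens receive shorter bit strings, and the code is prefix-free, so decoding needs no separators. In the static variant described here, the frequency table would be fixed beforehand (for example, measured once over a corpus), so no table needs to be stored per file. This sketch simply counts the made-up sample itself.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(freqs: dict[str, int]) -> dict[str, str]:
    """Build a prefix-free code; more frequent symbols get shorter bit strings."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def huffman_decode(bits: str, code: dict[str, str]) -> list[str]:
    """Walk the bits; since no code is a prefix of another, the first match is correct."""
    reverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return out


tokens = "int x = 1 ; int y = x + x ; return y ;".split()
code = huffman_code(Counter(tokens))
bits = "".join(code[t] for t in tokens)
```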


-- 
Dmitry Olshansky
June 13, 2012
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Tuesday, 12 June 2012 at 12:23:21 UTC, Dmitry Olshansky wrote:
> On 12.06.2012 16:09, foobar wrote:
>> This is a solved problem since the 80's (E.g. Pascal units).
>
> Right, seeing yet another newbie hit it every day is a clear 
> indication of a simple fact: people would like to think & work 
> in modules rather than seeing the guts of old and crappy OBJ file 
> technology. Linking with C != using C tools everywhere.
>

Back in the '90s I only moved 100% away from Turbo Pascal into C
land when I started using Linux at university, and I eventually
spent some time doing C++ as well.

It still baffles me that in 2012 we still need to rely on crappy
C linker tooling, when in the '80s we already had languages with
proper modules.

Now we have many mainstream languages with proper modules, but
many of them live in VM land.

Oberon, Go and Delphi/Free Pascal seem to be the only languages 
with native code generation compilers that offer the binary only 
modules solution, while many rely on some form of .di files.
June 13, 2012
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 12/06/12 18:46, Walter Bright wrote:
> On 6/12/2012 2:07 AM, timotheecour wrote:
>> There's a current pull request to improve di file generation
>> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
>> suggest
>> further ideas.
>> As far as I understand, di interface files try to achieve these
>> conflicting goals:
>>
>> 1) speed up compilation by avoiding having to reparse large files over
>> and over.
>> 2) hide implementation details for proprietary reasons
>> 3) still maintain source code in some form to allow inlining and CTFE
>> 4) be human readable
>
> (4) was not a goal.
>
> A .di file could very well be a binary file, but making it look like D
> source enabled them to be loaded with no additional implementation work
> in the compiler.

I don't understand (1) actually.

For two reasons:
(a) Is lexing + parsing really a significant part of the compilation 
time? Has anyone done some solid profiling?

(b) Wasn't one of the goals of D's module system supposed to be that you 
could import a symbol table? Why not just implement that? Seems like 
that would be much faster than .di files can ever be.
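
Importing a symbol table could look roughly like this Python sketch. The compiler writes out a module's already-resolved declarations once, and importers load that table instead of lexing, parsing and analysing the source again. The JSON layout, the `Symbol` fields and the module name `mylib.math` are hypothetical. A real table would also have to describe templates, aggregates, attributes and inlinable bodies.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Symbol:
    name: str
    kind: str       # e.g. "function" or "variable"
    signature: str  # resolved type, as the compiler would print it


def export_symbols(module: str, symbols: list[Symbol]) -> str:
    """Write a module's resolved declarations once, after semantic analysis."""
    return json.dumps({"module": module, "symbols": [asdict(s) for s in symbols]})


def import_symbols(blob: str) -> dict[str, Symbol]:
    """Load the table directly; no lexing, parsing or semantic pass over source."""
    data = json.loads(blob)
    prefix = data["module"]
    return {f"{prefix}.{s['name']}": Symbol(**s) for s in data["symbols"]}


blob = export_symbols("mylib.math", [
    Symbol("add", "function", "int(int a, int b)"),
    Symbol("pi", "variable", "immutable(double)"),
])
table = import_symbols(blob)
```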
June 13, 2012
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 13 June 2012 09:07, Don Clugston <dac@nospam.com> wrote:
> On 12/06/12 18:46, Walter Bright wrote:
>>
>> On 6/12/2012 2:07 AM, timotheecour wrote:
>>>
>>> There's a current pull request to improve di file generation
>>> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
>>> suggest
>>> further ideas.
>>> As far as I understand, di interface files try to achieve these
>>> conflicting goals:
>>>
>>> 1) speed up compilation by avoiding having to reparse large files over
>>> and over.
>>> 2) hide implementation details for proprietary reasons
>>> 3) still maintain source code in some form to allow inlining and CTFE
>>> 4) be human readable
>>
>>
>> (4) was not a goal.
>>
>> A .di file could very well be a binary file, but making it look like D
>> source enabled them to be loaded with no additional implementation work
>> in the compiler.
>
>
> I don't understand (1) actually.
>
> For two reasons:
> (a) Is lexing + parsing really a significant part of the compilation time?
> Has anyone done some solid profiling?
>

Lexing and parsing are minuscule tasks in comparison to the three
semantic runs done on the code.

I added speed counters into the glue code of GDC some time ago.
http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/

And here is the relevant report to go with it.
http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf


Example: std/xml.d
Module::parse : 0.01 ( 0%)
Module::semantic : 0.50 ( 9%)
Module::semantic2 : 0.02 ( 0%)
Module::semantic3 : 0.04 ( 1%)
Module::genobjfile : 0.10 ( 2%)

For the entire time it took to compile the one file (5.22 seconds),
it spent almost 10% of its time running the first semantic analysis.


But that was the D2 frontend / phobos as of September 2010.  I should
re-run a report on updated times and draw some comparisons. :~)
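
The speed-counter approach boils down to accumulating time per named phase and dividing by the total. Below is a Python sketch of that idea, not GDC's actual implementation; the phase name `lex+parse` is arbitrary.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock seconds per named compiler phase."""

    def __init__(self) -> None:
        self.totals: dict[str, float] = defaultdict(float)

    @contextmanager
    def phase(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start


def percentages(timings: dict[str, float], total: float) -> dict[str, float]:
    """Express each phase as a percentage of the whole compilation time."""
    return {name: 100.0 * secs / total for name, secs in timings.items()}


timer = PhaseTimer()
with timer.phase("lex+parse"):
    sum(range(10_000))  # placeholder for real work

# Figures from the std/xml.d report (seconds), total compile time 5.22 s.
reported = {
    "Module::parse": 0.01,
    "Module::semantic": 0.50,
    "Module::semantic2": 0.02,
    "Module::semantic3": 0.04,
    "Module::genobjfile": 0.10,
}
shares = percentages(reported, total=5.22)
```

Fed the std/xml.d figures, the first semantic pass comes to about 9.6% of 5.22 s. The five listed phases together account for 0.67 s, roughly 13% of the total.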


Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
June 13, 2012
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 13.06.2012 13:37, Iain Buclaw wrote:
> Lexing and parsing are minuscule tasks in comparison to the three
> semantic runs done on the code.
>
> I added speed counters into the glue code of GDC some time ago.
> http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/
>
> And here is the relevant report to go with it.
> http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf
>
>
> Example: std/xml.d
> Module::parse : 0.01 ( 0%)
> Module::semantic : 0.50 ( 9%)
> Module::semantic2 : 0.02 ( 0%)
> Module::semantic3 : 0.04 ( 1%)
> Module::genobjfile : 0.10 ( 2%)
>
> For the entire time it took to compile the one file (5.22 seconds),
> it spent almost 10% of its time running the first semantic analysis.
>
>
> But that was the D2 frontend / phobos as of September 2010.  I should
> re-run a report on updated times and draw some comparisons. :~)
>

Is time spent on I/O accounted for in the parse step? And where is the 
rest spent? :)

-- 
Dmitry Olshansky