AST files instead of DI interface files for faster compilation and easier distribution
June 12, 2012
There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas.
As far as I understand, di interface files try to achieve these conflicting goals:

1) speed up compilation by avoiding having to reparse large files over and over.
2) hide implementation details for proprietary reasons
3) still maintain source code in some form to allow inlining and CTFE
4) be human readable

-Goals 2) and 3) are clearly contradictory, so that calls for a command line switch (e.g. -hidesource), off by default. When set, it removes implementation details wherever possible (i.e. for non-template and non-auto-return functions), at the cost of preventing any inlining/CTFE for the corresponding exported API. That choice is left to the user.
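
To make the rule concrete, here is a rough Python sketch of the decision such a switch would make for each function. Everything in it (the FunctionDecl record, emit_interface, the hide_source parameter) is a hypothetical name for illustration only, not how dmd generates di files.

```python
from dataclasses import dataclass


@dataclass
class FunctionDecl:
    """Hypothetical, simplified stand-in for a function declaration."""
    signature: str                 # e.g. "int add(int a, int b)"
    body: str                      # body source, including braces
    is_template: bool = False
    has_auto_return: bool = False


def emit_interface(decl: FunctionDecl, hide_source: bool) -> str:
    """Return the text an interface generator would emit for one function."""
    # Templates and auto-return functions cannot be used without their bodies,
    # so the body is kept for them even when hiding is requested.
    body_required = decl.is_template or decl.has_auto_return
    if hide_source and not body_required:
        # Declaration only: the caller loses inlining and CTFE for this function.
        return decl.signature + ";"
    return decl.signature + " " + decl.body
```

With hide_source set, a plain function is reduced to its declaration, while a template keeps its body regardless.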

-Regarding point 1), it won't be unusual for a D interface file to be almost as large (and as slow to parse) as the original source file, even with the upcoming di file improvements (dmd/pull/945), as D encourages the use of templates/auto-return throughout (a large part of phobos would be left quasi-unchanged). In fact, the fast compile time of D _does_ suffer when there is heavy use of templates, or when scaling up.

So to make interface files really useful in terms of speeding up compilation, why not directly store the AST (could be text-based like JSON but preferably a portable binary format for speed, call it a ".dib" file), possibly with some amount of analysis already done (e.g. version(Windows) could be pre-handled)? This would be analogous to precompiled header files (http://en.wikipedia.org/wiki/Precompiled_header), which don't exist in D AFAIK. This could be done by extending dmd's currently incomplete json file generation to include the AST of the implementation of each function we want to export (such as templates or functions to inline). During compilation of a module, "import myfun;" would look for 1) myfun.dib (binary or json precompiled interface file), 2) myfun.di (if still needed), 3) myfun.d.
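
The lookup order could be sketched roughly as follows, in Python for brevity. The name resolve_import is made up, and trying all three extensions inside each search directory before moving on to the next directory is my assumption; the proposal only fixes the priority .dib, then .di, then .d.

```python
from pathlib import Path
from typing import Iterable, Optional

# Priority proposed above: precompiled AST, then interface file, then source.
EXTENSIONS = (".dib", ".di", ".d")


def resolve_import(module: str, search_paths: Iterable[Path]) -> Optional[Path]:
    """Return the first file that satisfies `import module;`, or None."""
    # A dotted module name such as "pkg.mod" maps to the path "pkg/mod".
    relative = Path(*module.split("."))
    for directory in search_paths:
        for ext in EXTENSIONS:
            candidate = Path(directory) / relative.with_suffix(ext)
            if candidate.is_file():
                return candidate
    return None
```

A stale .dib would silently shadow a newer .d under this scheme, so a real implementation would presumably also compare timestamps or hashes.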



We could even go a step further, borrowing some ideas from the "framework" feature found in OSX to distribute components: a single D framework would combine the AST (~ precompiled .dib headers) of a set of D modules and a set of libraries.
The user would then use a framework as follows:

    dmd -L-framework mylib -L-Lpath/to/mylib main.d

or simply:

    dmd main.d

if main.d contains pragma(framework,"mylib") and framework mylib is in the search path

As in OSX's frameworks, framework mylib is used both during compilation (resolving import statements in main.d) and linking. Upon encountering an "import myfun;" declaration, the compiler would search the linked-in frameworks for a symbol or file representing the AST of module myfun and, if none is found, fall back to the default import mechanism.
That will both speed up compilation times and make distribution of libraries and versioning a breeze: single framework to download and to link against (this is different from what rdmd does). On OSX, frameworks appear as a single file in Finder but are actually directories; here we could have either a single file or a directory as well.
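
As a rough illustration of a framework serving both steps, here is a small Python model. The Framework record and the functions load_module and link_inputs are hypothetical names, and representing a framework as an in-memory mapping is only an assumption to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Framework:
    """Hypothetical bundle: precompiled module ASTs plus the libraries to link."""
    name: str
    module_asts: Dict[str, bytes] = field(default_factory=dict)  # module -> serialized AST
    libraries: List[str] = field(default_factory=list)           # inputs for the linker


def load_module(module: str, frameworks: List[Framework],
                default_import: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Return (module data, origin); frameworks are tried before the default import."""
    for fw in frameworks:
        if module in fw.module_asts:
            return fw.module_asts[module], fw.name
    return default_import(module), "default"


def link_inputs(frameworks: List[Framework]) -> List[str]:
    """The same frameworks also supply the libraries for the link step."""
    return [lib for fw in frameworks for lib in fw.libraries]
```

The point is that one object answers both questions: "where is the AST for module myfun?" at compile time and "which libraries do I link?" afterwards.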

Finally, regarding point 4), a simple command line switch (e.g. dmd --pretty-print myfun.di) would pretty-print the AST to stdout, omitting the implementation of templates and auto functions for brevity so that the output reads like a simple di file (other options could filter AST nodes for IDE use, etc).

Thanks for your comments!
June 12, 2012
Currently .di-files are compiler independent. If this should hold for dib-files, too, we'll need a standard ast structure, won't we?

June 12, 2012
On 12-06-2012 12:23, Tobias Pankrath wrote:
> Currently .di-files are compiler independent. If this should hold for
> dib-files, too, we'll need a standard ast structure, won't we?
>

Which is a Good Thing (TM). It would /require/ formalization of the language once and for all.

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
June 12, 2012
On 12/06/12 11:07, timotheecour wrote:
> There's a current pull request to improve di file generation
> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
> suggest further ideas.
> As far as I understand, di interface files try to achieve these
> conflicting goals:
>
> 1) speed up compilation by avoiding having to reparse large files over
> and over.
> 2) hide implementation details for proprietary reasons
> 3) still maintain source code in some form to allow inlining and CTFE
> 4) be human readable

Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005).

Here's the original post where it was implemented:
http://www.digitalmars.com/d/archives/digitalmars/D/29883.html
and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD 0.142

Personally I believe that .di files are *totally* the wrong approach for goal (1). I don't think goals (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D.

IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending it's in any way related to goal (2).
June 12, 2012
On 06/12/2012 12:47 PM, Alex Rønne Petersen wrote:
> On 12-06-2012 12:23, Tobias Pankrath wrote:
>> Currently .di-files are compiler independent. If this should hold for
>> dib-files, too, we'll need a standard ast structure, won't we?
>>
>
> Which is a Good Thing (TM). It would /require/ formalization of the
> language once and for all.
>

I do not see how this conclusion could be reached.
June 12, 2012
On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
> On 12/06/12 11:07, timotheecour wrote:
>> There's a current pull request to improve di file generation
>> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
>> suggest further ideas.
>> As far as I understand, di interface files try to achieve these
>> conflicting goals:
>>
>> 1) speed up compilation by avoiding having to reparse large files over
>> and over.
>> 2) hide implementation details for proprietary reasons
>> 3) still maintain source code in some form to allow inlining and CTFE
>> 4) be human readable
>
> Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005).
>
> Here's the original post where it was implemented:
> http://www.digitalmars.com/d/archives/digitalmars/D/29883.html
> and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD 0.142
>
> Personally I believe that .di files are *totally* the wrong approach for goal (1). I don't think goals (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D.
>
> IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending it's in any way related to goal (2).

I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human readable version of the API the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool - DDoc. There's no need to duplicate this just to provide another format when DDoc itself is supposed to be format agnostic.

This is a solved problem since the 80's (E.g. Pascal units). Per Adam's post, the issue is tied to DMD's use of OMF/optlink which we all would like to get rid of anyway. Once we're in proper COFF land, couldn't we just store the required metadata (binary AST?) in special sections in the object files themselves?

Another related question - AFAIK the LLVM folks did/are doing work to make their implementation less platform-dependent. Could we leverage this in ldc to store LLVM bit code as D libs which still retain enough info for the compiler to replace header files?

June 12, 2012
On 12.06.2012 16:09, foobar wrote:
> On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
>> On 12/06/12 11:07, timotheecour wrote:
>>> There's a current pull request to improve di file generation
>>> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
>>> suggest further ideas.
>>> As far as I understand, di interface files try to achieve these
>>> conflicting goals:
>>>
>>> 1) speed up compilation by avoiding having to reparse large files over
>>> and over.
>>> 2) hide implementation details for proprietary reasons
>>> 3) still maintain source code in some form to allow inlining
>>> and CTFE
>>> 4) be human readable
>>
>> Is that actually true? My recollection is that the original motivation
>> was only goal (2), but I was fairly new to D at the time (2005).
>>
>> Here's the original post where it was implemented:
>> http://www.digitalmars.com/d/archives/digitalmars/D/29883.html
>> and it got partially merged into DMD 0.141 (Dec 4 2005), first usable
>> in DMD 0.142
>>
>> Personally I believe that .di files are *totally* the wrong approach
>> for goal (1). I don't think goals (1) and (2) have anything in common
>> at all with each other, except that C tried to achieve both of them
>> using header files. It's an OK solution for (1) in C, it's a failure
>> in C++, and a complete failure in D.
>>
>> IMHO: If we want goal (1), we should try to achieve goal (1), and stop
>> pretending it's in any way related to goal (2).
>
> I absolutely agree with the above and would also add that goal (4) is an
> anti-feature. In order to get a human readable version of the API the
> programmer should use *documentation*. D claims that one of its goals is
> to make it a breeze to provide documentation by bundling a standard tool
> - DDoc. There's no need to duplicate this just to provide another format
> when DDoc itself is supposed to be format agnostic.
>
Absolutely. DDoc being built-in didn't sound right to me at first, BUT it essentially allows us to say that APIs are covered by the DDoc-generated files, not by header files etc.

> This is a solved problem since the 80's (E.g. Pascal units).

Right, seeing yet another newbie hit it every day is a clear indication of a simple fact: people would like to think & work in modules rather than see the guts of old and crappy OBJ file technology. Linking with C != using C tools everywhere.

> Per Adam's
> post, the issue is tied to DMD's use of OMF/optlink which we all would
> like to get rid of anyway. Once we're in proper COFF land, couldn't we
> just store the required metadata (binary AST?) in special sections in
> the object files themselves?
>
Seconded. At least the lexed form could be very compact; I recall early compressors tried doing the Huffman thing on source code tokens with a certain success.
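
For what "the Huffman thing" on tokens amounts to, here is a minimal Python sketch that computes Huffman code lengths over a token stream. It is only an illustration: the function names are made up, it treats whitespace-separated strings as tokens, and it reports sizes rather than producing an actual bit stream.

```python
import heapq
from collections import Counter
from typing import Dict, List


def huffman_code_lengths(tokens: List[str]) -> Dict[str, int]:
    """Return the Huffman code length in bits for each distinct token."""
    counts = Counter(tokens)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct token still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {token: depth so far}).
    heap = [(weight, i, {tok: 0}) for i, (tok, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every token in them one level deeper.
        merged = {tok: depth + 1 for tok, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(tokens: List[str]) -> int:
    """Total size in bits of the token stream under its own Huffman code."""
    lengths = huffman_code_lengths(tokens)
    return sum(lengths[tok] for tok in tokens)
```

For the seven tokens of "int x = x + x ;" this gives 15 bits, against 21 bits for a fixed 3-bit code over the five distinct tokens. The code table itself would also have to be stored, unless it is fixed per language.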

> Another related question - AFAIK the LLVM folks did/are doing work to
> make their implementation less platform-dependent. Could we leverage this
> in ldc to store LLVM bit code as D libs which still retain enough info
> for the compiler to replace header files?
>


-- 
Dmitry Olshansky
June 12, 2012
On 2012-06-12 14:09, foobar wrote:

> This is a solved problem since the 80's (E.g. Pascal units). Per Adam's
> post, the issue is tied to DMD's use of OMF/optlink which we all would
> like to get rid of anyway. Once we're in proper COFF land, couldn't we
> just store the required metadata (binary AST?) in special sections in
> the object files themselves?

Can't the same be done with OMF? I'm not saying I want to keep OMF.

-- 
/Jacob Carlborg
June 12, 2012
On 12/06/2012 12:23, Tobias Pankrath wrote:
> Currently .di-files are compiler independent. If this should hold for
> dib-files, too, we'll need a standard ast structure, won't we?
>

We need it anyway at some point. AST macros are another example.

It would also greatly simplify compiler writing if the D interpreter could be provided as a lib (and so run on top of dib files).

I want to mention that LLVM IR + metadata can do a really good job here. In addition, LLVM people are working on a JIT backend, if you know what I mean ;)