Thread overview
proposal: lazy compilation model for compiling binaries
Jun 22, 2013
Timothee Cour
Jun 22, 2013
bearophile
Jun 22, 2013
Dicebot
Jun 24, 2013
Martin Nowak
Jun 24, 2013
Martin Nowak
Jun 24, 2013
Paulo Pinto
Jun 24, 2013
Dicebot
Jun 24, 2013
Martin Nowak
Jun 24, 2013
Timothee Cour
Jun 24, 2013
JS
Jun 24, 2013
Jacob Carlborg
Jun 24, 2013
JS
Jun 25, 2013
Jacob Carlborg
Jun 25, 2013
JS
Jun 24, 2013
OlliP
Jun 24, 2013
Timothee Cour
Jun 25, 2013
Paulo Pinto
June 22, 2013
A)
Currently, D suffers from a high degree of interdependency between modules:
when one wants to use a single symbol (say std.range.isInputRange), we
pull in all of std.range, which in turn pulls in all of
std.array, std.string, etc. This results in slow compile times (relative
to the case where we didn't have to pull all this in), and fat binaries: see
the example in point "D)" below.

This has been discussed many times before, and some people have suggested breaking modules into submodules (std.range.traits, etc.) to mitigate this a little. However, this requires people to change 'import std.range' to 'import std.range.traits' to benefit from it, and in many cases it will be ineffective.

B)
I'd like to propose something different that can potentially dramatically
reduce compile time/binary size, while not requiring users to scar their
source code as above.

*in short:* perform semantic analysis for a function/template/struct/class
on demand, if that symbol is encountered starting from main().
*in more detail:*
Suppose we compile a binary (dmd -ofmain foo1.d foo2.d main.d):
- input files are lexed and parsed (the code must be syntactically valid)
- semantic analysis is performed, but does not descend into
function/template/struct/class declarations
- the main() symbol is located in the symbol table
- lazy semantic analysis starts from the main() function and propagates
using a breadth-first search (BFS) strategy: a symbol's
(function/template/struct/class) body/return type/template constraints
are only semantically analyzed when that symbol is encountered along the
BFS path.
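
The propagation step can be sketched as a plain worklist traversal. This is only an illustration under stated assumptions, not dmd's implementation: the symbol graph is a hypothetical associative array mapping each symbol name to the names its body references, and `reachableFrom` is a made-up helper name.

```d
import std.stdio;

/// Returns the symbols reachable from `root`, in BFS order.
/// `refs` is a hypothetical stand-in for "symbols referenced by this
/// symbol's body/return type/template constraints".
string[] reachableFrom(string[][string] refs, string root)
{
    bool[string] seen;
    seen[root] = true;
    string[] queue = [root];

    // The queue doubles as the result: every symbol is appended once,
    // the first time it is encountered.
    for (size_t i = 0; i < queue.length; ++i)
    {
        auto deps = queue[i] in refs;
        if (deps is null)
            continue; // leaf symbol, nothing further to analyze
        foreach (dep; *deps)
        {
            if (dep !in seen)
            {
                seen[dep] = true;
                queue ~= dep;
            }
        }
    }
    return queue;
}

void main()
{
    string[][string] refs = [
        "main":   ["writeln", "helper"],
        "helper": ["writeln"],
        "foo":    ["y"], // never reached from main, so never analyzed
    ];
    writeln(reachableFrom(refs, "main"));
}
```

In this toy graph `foo` (and its undefined `y`) is never visited, which is the behavior the proposal relies on.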

This strategy could be enabled by a dmd switch, -lazy_compilation. The only time it would differ from the existing compilation model is when some unused code triggers a compile error, e.g.:
----
void foo(){int x=y;}
void main(){}
----
dmd main.d //error: y is undefined
dmd -lazy_compilation main.d //OK: foo is never mentioned starting from main(), so accept.
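
For comparison, D already behaves this way for templates: the body of a template is only semantically analyzed when it is instantiated. A minimal sketch (the names `foo` and `bar` are just for illustration) that compiles with the existing compilation model, because `foo` is turned into a template with an empty parameter list and is never instantiated:

```d
// Template function: its body is only analyzed on instantiation.
void foo()() { int x = y; } // y is undefined, but foo is never instantiated

int bar() { return 1; }

void main()
{
    // foo is never referenced, so the undefined `y` is never diagnosed.
    assert(bar() == 1);
}
```

The proposal would extend this on-demand behavior to ordinary functions, structs and classes.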

This would be very useful to speed up the edit/compile/debug cycle.

Example2:
----
auto foo(){return "import std.stdio;";}
mixin(foo);
void fun2(){import b;}
void main(){writeln("ok");}
----
Lazy semantic analysis will analyze main and foo, but not fun2, which is not used. foo is analyzed because it is used in a module-level mixin declaration.

C)
*caveats:*
- this works when compiling *binaries*, as we know which symbols end up in
the final binary
- for compiling libraries (-shared/-static), it works if we have a way to
specify which symbols are meant to be exported (e.g.
https://www.gnu.org/software/gnulib/manual/html_node/Exported-Symbols-of-Shared-Libraries.html).
Is there one, currently?

We could specify a list of symbols to export to dmd via a command line flag.

This could be:
dmd -exported_symbols=filename.d main.d bar.d
with filename.d containing all exported symbols, e.g.:
----
module exported_symbols;
public import foo; //exports all symbols from foo
public import bar:baz; //exports just bar.baz
void fun(){} //exports fun
----


D)
Example showing problem with current situation:
----
module main;
version(A)
import std.range;
else{
      //copy paste here body of 'isInputRange' from std.range
}
void fun(){ auto a=isInputRange!string;}
----
dmd -c main.d:
nm main.o|wc -l: 8
file size of main.o: 1.1K
cpu time (10 runs): 0.119 s

dmd -c -version=A main.d:
nm main.o|wc -l: 324 => 40X
file size of main.o: 72K => 70X
cpu time (10 runs): 2.7 s => 23X

Q: Why do we care about compilation speed, etc., since dmd is already fast?
A1: There are many cases where it matters, e.g. the REPL I'm working on,
which compiles on the fly and needs interactive speed.
A2: Large projects, where compilation can become slow.


June 22, 2013
Timothee Cour:

> C)
> *caveats:*
> this works when compiling *binaries*, as we know which symbols end up in the final binary for compiling libraries
> (-shared/-static), it works if we have a way to specify which
> symbols are meant to be exported (eg
> https://www.gnu.org/software/gnulib/manual/html_node/Exported-Symbols-of-Shared-Libraries.html).
> Is there, currently?

For D perhaps there are better/nicer ways to do this.

Bye,
bearophile
June 22, 2013
D has the "export" keyword, which I always expected to do exactly this, until I found out it is actually platform-dependent and useless.
June 24, 2013
On 06/22/2013 11:20 AM, Dicebot wrote:
> D has "export" keyword that I always expected to do exactly this until
> have found out it is actually platform-dependent and useless.
It's buggy and useful.
http://d.puremagic.com/issues/show_bug.cgi?id=9816
We should strive for -fvisibility=hidden on UNIX because it allows optimizing non-exported symbols and because we need explicit exports for anyhow.
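
As a rough illustration of what explicit exports look like at the source level, here is a minimal sketch using D's existing `export` attribute. The module and function names are made up, and whether unmarked symbols actually end up hidden depends on the platform and compiler, which is exactly the issue under discussion.

```d
module mylib; // hypothetical library module

// Explicitly marked as part of the library's exported interface.
export int answer() { return 42; }

// Not marked: with hidden-by-default visibility this would stay internal.
int helper() { return 41; }

void main()
{
    // Inside the same program both functions are callable; `export` only
    // concerns what a shared library exposes to its clients.
    assert(answer() == helper() + 1);
}
```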
June 24, 2013
On 06/22/2013 06:45 AM, Timothee Cour wrote:
> Example2:
> ----
> auto foo(){return "import std.stdio;";}
> mixin(foo);
> void fun2(){import b;}
> void main(){writeln("ok");}
> ----
> lazy semantic analysis will analyze main, foo but not fun2, which is not
> used. foo is analyzed because it is used in a module-level mixin
> declaration.
>
Overall it's a good idea. There are already some attempts to shift to lazy semantic analysis, mainly to solve any remaining forward reference issues.
Also, for non-optimized builds, parsing takes a huge part of the compilation time, so that would remain; I don't have detailed numbers though.

June 24, 2013
On 06/24/2013 02:23 AM, Martin Nowak wrote:
> exports for anyhow.
for Windows that is

June 24, 2013
On Sun, Jun 23, 2013 at 5:36 PM, Martin Nowak <code@dawg.eu> wrote:

> On 06/22/2013 06:45 AM, Timothee Cour wrote:
>
>> Example2:
>> ----
>> auto foo(){return "import std.stdio;";}
>> mixin(foo);
>> void fun2(){import b;}
>> void main(){writeln("ok");}
>> ----
>> lazy semantic analysis will analyze main, foo but not fun2, which is not used. foo is analyzed because it is used in a module-level mixin declaration.
>>
> Overall it's a good idea. There are already some attempts to shift to
> lazy semantic analysis, mainly to solve any remaining forward reference
> issues.
> Also for non-optimized builds parsing takes a huge part of the compilation
> time so that would remain, I don't have detailed numbers though.
>

Why 'that would remain'? In the proposed lazy compilation model, the optimization level is irrelevant. The only thing that matters is whether we have to export all symbols or only specified ones. I agree we should require marking those explicitly with 'export' on all platforms, not just Windows. But in doing so we must allow declaring those exports outside of where the symbols are defined, otherwise it will make code ugly (e.g., what if we want to export std.process.kill in a user shared library and std.process.kill isn't marked as export?).

Here's a possibility

module define_exported_symbols;
import std.process;
export std.process.kill; //export all std.process.kill overloads (just 1 function in this case)
export std.process; //export all functions in std.process
export std; //export all functions in std

But I think the best option is to keep the current export semantics (but make it work on all platforms, not just Windows) and provide library code to help with exporting entire modules/packages:

module std.sharedlib; //helper functions for dlls on all platforms
void export_module(alias module_)(module_ mymodule){
}
void export_symbols(R) (R symbols) if(isInputRange!R){ //export a range of symbols
}
/+
usage:
export_module(std.process); //exports all functions in std.process
export_symbols(enumerateFunctions(std.process)); //exports all functions in
std.process; allows to be more flexible by exporting only a subset of those
+/


June 24, 2013
It should be possible to "export"(or rather "share") types,
mixins, templates, generic unit tests, etc. (shared compile time
constructs would just be "copied" to a shared library as they
can't be compiled)

All public compilable constructs should be automatically
exported. A shared keyword added to a function declaration can
mark it as "exportable".


e.g.,

module A;

shared foo(){ ... };
shared mixin template bar() { ... };
shared template Foo(T) { .... };
shared interface Bar { .... };
shared myunittest(F1, F2, ...) { ... };
shared mycontract(F) { .... };
etc...

All shared constructs are added to the export table and available
for use. Generic unit tests and contracts allow one to "collect"
common unit tests and contracts and apply them to arbitrary
functions and classes. Including compile time constructs in a
library allows one to group a set of functionality, both run-time
and compile-time, at one location.



As far as lazy evaluation goes, I think only symbols reachable
from main should be included, unless otherwise specified.

e.g., suppose we have a scriptable application that uses some
statically shared library. It may be that some custom function
lookup is used. One needs a way to ensure that the
compiler will include symbols that might not be reachable at
compile time. In this case one should simply have to mark a
module as reachable so as to include all shared symbols... or let's
say just a group of symbols:

import A {foo, bar, FOO*, !BAR*, ... }

where the brackets are used to tell the compiler to include all
the symbols(with regex capabilities). ! can be used to force
exclusion, technically it shouldn't be needed but it could be
useful in some cases.



June 24, 2013
On Monday, 24 June 2013 at 01:20:46 UTC, Martin Nowak wrote:
> On 06/24/2013 02:23 AM, Martin Nowak wrote:
>> exports for anyhow.
> for Windows that is

And AIX, unless they have adopted the more common UNIX model in the meantime.
June 24, 2013
This is now a bit confusing to me. I just made up my mind to go
with D instead of Go, because Go is too simplistic in my opinion.
Furthermore, calling C from D is a lot easier than from Go. And
now this ... I have too little understanding of D to see what the
impact of this build time issue is. Does this mean build times
come close to what they are in C++ or is this issue only about
builds not being as fast as the D people are used to ..?

Thanks, Oliver


On Saturday, 22 June 2013 at 04:45:31 UTC, Timothee Cour wrote:
> A)
> Currently, D suffers from a high degree of interdependency between modules:
> when one wants to use a single symbol (say std.range.isInputRange), we
> pull in all of std.range, which in turn pulls in all of
> std.array, std.string, etc. This results in slow compile times (relative
> to the case where we didn't have to pull all this in), and fat binaries: see
> the example in point "D)" below.
>
> This has been discussed many times before, and some people have suggested
> breaking modules into submodules such as: std.range.traits, etc to mitigate
> this a little, however this requires people to change 'import std.range'
> to 'import std.range.traits' to benefit from it, and also in many cases
> this will be ineffective.
>
> B)
> I'd like to propose something different that can potentially dramatically
> reduce compile time/binary size, while not requiring users to scar their
> source code as above.
> ....