July 25, 2012
Re: What is the compilation model of D?
On Wed, 25 Jul 2012 21:54:29 +0200
"David Piepgrass" <qwertie256@gmail.com> wrote:

> Thanks for the very good description, Nick! So if I understand 
> correctly, if
> 
> 1. I use an "auto" return value or suchlike in a module Y.d
> 2. module X.d calls this function
> 3. I call "dmd -c X.d" and "dmd -c Y.d" as separate steps
> 

See, now you're getting into some details that I'm not entirely
familiar with ;)... 

> Then the compiler will have to fully parse Y twice and fully 
> analyze the Y function twice, although it generates object code 
> for the function only once. Right?

That's my understanding of it, yes.
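
For concreteness, here's a minimal sketch of the scenario, with
made-up file and function names:

// Y.d
module Y;
auto answer() { return 42; }   // return type has to be inferred

// X.d
module X;
import Y;
void main() { auto x = answer(); }

// dmd -c X.d   -- must parse and analyze Y.d to infer answer()'s
//                 return type, but emits no object code for it
// dmd -c Y.d   -- parses and analyzes Y.d all over again; only this
//                 step emits Y's object code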

> I wonder how smart it is about 
> not analyzing things it does not need to analyze (e.g. when Y is 
> a big module but X only calls one function from it - the compiler 
> has to parse Y fully but it should avoid most of the semantic 
> analysis.)

I don't know how smart it is about that.

If you have a template that never gets instantiated by *anything*, then
I do know that semantic analysis won't get run on it, since D's
templates, like C++ templates (and unlike C#'s generics), can *only*
be evaluated once they're instantiated.

If, OTOH, you have a plain old function that never gets called, I'm
guessing semantic analysis probably still gets run on it.

Anything else: I dunno. :/

> 
> What about templates? In C++ it is a problem that the compiler 
> will instantiate templates repeatedly, say if I use 
> vector<string> in 20 source files, the compiler will generate and 
> store 20 copies of vector<string> (plus 20 copies of 
> basic_string<char>, too) in object files.
> 
> 1. So in D, if I compile the 20 sources separately, does the same 
> thing happen (same collection template instantiated 20 times with 
> all 20 copies stored)?

Again, I'm not certain about this, other people would be able to
answer better, but I *think* it works like this:

If you pass all the files into DMD at once, then it'll only evaluate
and generate code for vector<string> once. If you pass the files in
as separate calls to DMD, then it'll do semantic analysis on
vector<string> twenty times, and I have no idea whether code will get
generated one time or twenty times.
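
In D terms, the analogous setup would be something like this (all
names hypothetical):

// container.d
module container;
struct Vector(T) { T[] data; }

// Each of the 20 modules does:
//   import container;
//   Vector!string v;   // instantiates Vector!string

// dmd mod1.d ... mod20.d container.d
//   -> one invocation: Vector!string analyzed once
// dmd -c mod1.d; dmd -c mod2.d; ...
//   -> every invocation re-analyzes Vector!string for itself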

> 2. If I compile the 20 sources all together, I guess the template 
> would be instantiated just once, but then which .obj file does 
> the instantiated template go in?
> 

Unless things have been fixed since I last heard, this is actually the
root of the problem with incremental compilation and templates. The
compiler apparently makes some odd, or maybe inconsistent, choices
about which obj file to stick the template into. I don't know the
details of it, just that in the past, people attempting incremental
compilation have run into occasional linking issues that were traced
back to problems in how DMD decides where to put instantiated
templates.
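
As I understand it, the failure mode is roughly this (hypothetical
modules):

// a.d and b.d both use Vector!string:
//   dmd -c a.d b.d   // compiler picks one obj to hold Vector!string
// If an incremental rebuild later recompiles only one of the two, and
// the compiler's choice of "home" object file comes out differently,
// the link can end up with the instantiation missing (undefined
// symbol) or duplicated.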

> 
> I don't even want to legitimize C++ compiler speed by comparing 
> it to any other language ;)
> 

Fair enough :)

> >> - Is there any concept of an incremental build?
> >
> > Yes, but there's a few "gotcha"s:
> >
> > 1. D compiles so damn fast that it's not nearly as much of an 
> > issue as
> > it is with C++ (which is notoriously ultra-slow compared
> > to...everything, hence the monumental importance of C++'s 
> > incremental
> > builds).
> 
> I figure as CTFE is used more, especially when it is used to 
> decide which template overloads are valid or how a mixin will 
> behave, this will slow down the compiler more and more, thus 
> making incremental builds more important. A typical example would 
> be a compile-time parser-generator, or compiled regexes.
> 

That's probably a fair assumption.
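
std.regex's compile-time regexes are already a concrete case of
exactly that; a small sketch:

import std.regex;

bool isDate(string s)
{
    // The matcher for this pattern is generated at compile time via
    // CTFE and templates, trading compile time for run-time speed.
    return !match(s, ctRegex!`^\d+-\d+-\d+$`).empty;
}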

> Plus, I've heard some people complaining that the compiler uses 
> over 1 GB RAM, and splitting up compilation into parts might help 
> with that.
> 

Yea, the problem is, DMD doesn't currently free any of the memory it
takes, so mem usage just grows and grows. That's a known issue that
needs to be taken care of at some point. 

> BTW, I think I heard the compiler uses multithreading to speed up 
> the build, is that right?
> 

Yes, it does. But someone else will have to explain how it actually
uses multithreading, i.e. what it multithreads, because I've got no
clue ;) I think it's fairly coarse-grained, like at the module level,
but that's all I know.

> > It keeps diving deeper and deeper to find anything it can 
> > "start" with.
> > Once it finds that, it'll just build everything back up in
> > whatever
> > order is necessary.
> 
> I hope someone can give more details about this.
> 

I hope so too :)
July 26, 2012
Re: What is the compilation model of D?
On 2012-07-25 17:35, David Piepgrass wrote:

> Plus, it isn't just build times that concern me. In C# I'm used to
> having an IDE that immediately understands what I have typed, giving me
> error messages and keeping metadata about the program up-to-date within
> 2 seconds. I can edit a class definition in file A and get code
> completion for it in file B, 2 seconds later. I don't expect the IDE can
> ever do that if the compiler can't do a debug build in a similar timeframe.

That's not necessarily true. The C# and Java compilers in these IDEs
are built to handle incremental compilation at a very fine-grained
level. We're not talking about recompiling just a single file, we're
talking about recompiling just a part of a single file.

DMD and other D compilers are just not built to handle this. They
don't handle incremental builds at all. There are various reasons why
it's more difficult to make an incremental build system for D. Most of
them are due to metaprogramming (templates, CTFE, mixins and other
things).
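
A small example of why, with made-up modules:

// a.d
module a;
enum tableSize = 100;   // a compile-time constant

// b.d
module b;
import a;
int[tableSize] table;   // the *value* 100 is baked into b's code

// Editing tableSize changes the code that has to be generated for
// b.d, yet no linkable symbol of a.d changed -- a naive "recompile
// only the changed files" scheme can't see that b.d is now stale.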

-- 
/Jacob Carlborg
July 26, 2012
Re: What is the compilation model of D?
On 2012-07-25 21:54, David Piepgrass wrote:
> Thanks for the very good description, Nick! So if I understand
> correctly, if
>
> 1. I use an "auto" return value or suchlike in a module Y.d
> 2. module X.d calls this function
> 3. I call "dmd -c X.d" and "dmd -c Y.d" as separate steps
>
> Then the compiler will have to fully parse Y twice and fully analyze the
> Y function twice, although it generates object code for the function
> only once. Right? I wonder how smart it is about not analyzing things it
> does not need to analyze (e.g. when Y is a big module but X only calls
> one function from it - the compiler has to parse Y fully but it should
> avoid most of the semantic analysis.)

Yes, I think that's correct. But if you give the compiler all the
source code at once, it should only need to parse a given module once.
D doesn't use textual includes like C/C++ does; it just refers
symbolically to the symbols of other modules (or something like that).
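
For example:

// C/C++: #include textually pastes a header into every translation
// unit, so the same text is re-parsed over and over.
// D: an import just makes another module's symbols visible; within a
// single compiler invocation the module is parsed once and reused.
import std.stdio;

void main()
{
    writeln("hello");   // resolved symbolically from module std.stdio
}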

> What about templates? In C++ it is a problem that the compiler will
> instantiate templates repeatedly, say if I use vector<string> in 20
> source files, the compiler will generate and store 20 copies of
> vector<string> (plus 20 copies of basic_string<char>, too) in object files.
>
> 1. So in D, if I compile the 20 sources separately, does the same thing
> happen (same collection template instantiated 20 times with all 20
> copies stored)?

If you compile them separately I think so, yes. How would it otherwise 
work, store some info between compile runs?

> 2. If I compile the 20 sources all together, I guess the template would
> be instantiated just once, but then which .obj file does the
> instantiated template go in?

I think it only needs to instantiate it once. Whether it actually does
or not, I don't know. As for which object file it goes in, that is
probably unspecified. Although if you compile with the -lib flag, it
will output the templates to all object files. This is one of the
problems making it hard to create an incremental build system for D.
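
Roughly, with hypothetical file names:

// dmd -lib container.d user1.d user2.d
//
// Every object file inside the resulting library gets its own copy of
// each instantiated template (e.g. Vector!string), so the linker can
// always find one somewhere -- at the cost of that duplication.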


> I figure as CTFE is used more, especially when it is used to decide
> which template overloads are valid or how a mixin will behave, this will
> slow down the compiler more and more, thus making incremental builds
> more important. A typical example would be a compile-time
> parser-generator, or compiled regexes.

I think that's correct. I did some simple benchmarking comparing
different uses of string mixins in Derelict. It turns out that it's a
lot better to have a few string mixins containing a lot of code than
many string mixins containing very little code. I suspect other
metaprogramming features (CTFE, templates, static if, mixins) could
behave in a similar way.
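
Roughly, the difference was between these two styles (made-up
generator):

string decl(string name)
{
    // CTFE-able code generator used by the mixins below
    return "void " ~ name ~ "() {}\n";
}

// Slower to compile: many small mixins, each parsed and analyzed
// separately:
//   mixin(decl("foo"));
//   mixin(decl("bar"));
//   ...one per declaration...

// Faster: concatenate first, then mix in one big string:
mixin(decl("foo") ~ decl("bar") ~ decl("baz"));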

> Plus, I've heard some people complaining that the compiler uses over 1
> GB RAM, and splitting up compilation into parts might help with that.

Yeah, I just ran into a compiler bug (I haven't been able to create a
simple test case) where it consumed around 3.5 GB of memory and then
just crashed after a while.

> BTW, I think I heard the compiler uses multithreading to speed up the
> build, is that right?

Yes, I'm pretty sure it reads in all (or at least many of) the files
concurrently or in parallel. It could probably lex and parse in
parallel as well; I don't know if it does that, though.


> Anyway, I can't even figure out how to enumerate the members of a module
> A; __traits(allMembers, A) causes "Error: import Y has no members".

Currently there's a bug which forces you to put the module in a
package. Try:

module foo.A;

// then, from another module that imports foo.A:
pragma(msg, __traits(allMembers, foo.A));

-- 
/Jacob Carlborg
July 26, 2012
Re: What is the compilation model of D?
On Wed, 2012-07-25 at 01:03 -0700, Jonathan M Davis wrote:
[…]
> I've heard of overnight builds, and I've heard of _regression tests_ running 
> for over a week, but I've never heard of builds being over 2 days. Ouch.

Indeed, the full test suite did take about a week to run. I think the
core problem was that this was 2006: computers were slower, parallel
compilation was not as well managed since multicore hadn't really
taken hold, and they were doing the equivalent of trying both -O2 and
-O3 to see which space/time balance was best.

> It has got to have been possible to have a shorter build than that. Of course, 
> if their code was bad enough that the build was that long, it may have been 
> rather disgusting code to clean up. But then again, maybe they genuinely had a 
> legitimate reason for having the build take that long. I'd be very surprised 
> though.

These were smart people, so my suspicion is very much that the
complexity was necessary. I think part of it was also that they were
in the middle of a global refactoring. I suspect they have now had
time to get things into a better state, but I do not know.

> In any case, much as I like C++ (not as much as D, but I still like it quite a 
> bit), its build times are undeniably horrible.

Indeed, especially with -O2 or -O3.

This is an area where VM + JIT can actually make things a lot better.
Optimization happens on actually running code and is therefore focused
on the "hot spot" rather than trying to optimize the entire code base.
Java is doing this quite successfully, as is PyPy.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
July 26, 2012
Re: What is the compilation model of D?
On Thu, 26 Jul 2012 09:27:03 +0100
Russel Winder <russel@winder.org.uk> wrote:

> On Wed, 2012-07-25 at 01:03 -0700, Jonathan M Davis wrote:
> 
> > In any case, much as I like C++ (not as much as D, but I still like
> > it quite a bit), its build times are undeniably horrible.
> 
> Indeed, especially with -O2 or -O3.
> 
> This is an area where VM + JIT can actually make things a lot better.
> Optimization happens on actually running code and is therefore focused
> on the "hot spot" rather than trying to optimize the entire code base.
> Java is doing this quite successfully, as is PyPy.
> 

That's not something that actually necessitates a VM though. It's just
that no native-compiled language (to my knowledge) has actually put
something like that into its runtime yet.