Incremental compilation with DMD
Sep 11, 2009
Tom S
September 11, 2009
Short story: DMD probably needs an option to output template instances to all object files that need them.

Long story:

I've been trying to make incremental compilation in xfBuild reliable, but it turns out that it's really tricky with DMD. Consider the following example:

* module A instantiates template T from module C
* module B instantiates the same template T from module C (with the same arguments)
* compile all modules at the same time in the order: A, B, C
* now A.obj contains the instantiation of T
* remove the instantiation from the A module
* perform an incremental compilation - 'A' was changed, so only it has to be recompiled
* linking of A.obj, B.obj and C.obj fails because no module has the instantiation of T for B.obj
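
The scenario above can be reproduced with a toy model. This is only a sketch, assuming the compiler emits each template instance into the first object of an invocation that needs it; the function names and the instance name `T!int` are made up for illustration and are not DMD internals.

```python
# Toy model of "emit each template instance only into the first object
# that needs it". All names here are illustrative, not DMD internals.

def compile_together(modules, uses):
    """Compile `modules` in one compiler invocation, in the given order.

    uses maps module name -> set of template instances it references.
    Returns module name -> set of instances defined in that module's object.
    """
    emitted = set()
    objects = {}
    for name in modules:
        new = uses[name] - emitted   # instances nobody has emitted yet
        objects[name] = set(new)
        emitted |= new
    return objects


def undefined_symbols(objects, uses):
    """Instances referenced by some object but defined by none of them."""
    defined = set().union(*objects.values())
    needed = set().union(*(uses[m] for m in objects))
    return needed - defined


# Full build in the order A, B, C: A.obj receives the only copy of T!int.
uses = {"A": {"T!int"}, "B": {"T!int"}, "C": set()}
objs = compile_together(["A", "B", "C"], uses)
full_build_missing = undefined_symbols(objs, uses)

# Remove the instantiation from A and recompile only A.
uses["A"] = set()
objs.update(compile_together(["A"], uses))
incremental_missing = undefined_symbols(objs, uses)

print(full_build_missing, incremental_missing)
```

After the incremental step, B still references the instance but no object defines it, which corresponds to the link failure in the last bullet.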

What happens is that DMD's optimization of emitting each template instance only into the first module that needs it creates implicit inter-module dependencies. I've tried tracking them by modifying DMD, but still couldn't find them all. It seems that one would have to dig deep into the codegen; my attempts at hacking the frontend (mostly template.c) weren't enough.

Still, I managed to figure out some of these implicit dependencies and tried using this extra info in xfBuild when deciding what to compile incrementally. I tried it on a project of mine with > 350 modules and no circular imports. The result was that even a trivial change caused most of the project to be pulled into compilation.

When doing regular incremental compilation, all modules that import the modified ones must be recompiled as well. And all modules that import these, and so on, up to the root of the project. This is because the incremental build tool must assume that the modules that import module 'A' could have code of the form 'static if (A.something) { ... } else { ... }' or another form of it. As far as I know, it's not trivial to detect whether this is really the case or whether the change is isolated to 'A'.
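
A minimal sketch of this conservative rule, assuming the build tool already has the direct import graph as a dictionary (the module names and the function name are hypothetical, not taken from xfBuild):

```python
from collections import deque


def modules_to_recompile(imports, changed):
    """Return `changed` plus every module that imports them, transitively.

    imports maps module name -> set of modules it imports directly.
    """
    # Invert the import graph: module -> modules that import it.
    importers = {}
    for mod, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(mod)

    dirty = set(changed)
    queue = deque(changed)
    while queue:
        mod = queue.popleft()
        for parent in importers.get(mod, ()):
            if parent not in dirty:
                dirty.add(parent)
                queue.append(parent)
    return dirty


# Hypothetical project: main imports gui and util, gui imports A.
imports = {
    "main": {"gui", "util"},
    "gui": {"A"},
    "util": set(),
    "A": set(),
}
result = modules_to_recompile(imports, ["A"])
print(sorted(result))
```

Changing A marks A, gui and main as dirty, while util is left alone because nothing on its import path changed.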

When trying to cope with the implicit dependencies caused by template instantiations and references, one also has to recompile all modules that contain template references to a module/object file which gets the instance. In the first example, it would mean recompiling module 'B' whenever 'A' changes. The graph of dependencies here doesn't depend very much on the structure of imports in a project, but rather on the order in which DMD decides to run semantic() on template instances.

Add up these two conservative mechanisms and it turns out that tweaking a simple function causes half of your project to be rebuilt. This is not acceptable. Even if it were feasible, getting these implicit dependencies would probably be a matter of either hacking the backend or dumping object files and matching unresolved symbols with comdats. Neither would be very fast or portable.

Compiling modules one-at-a-time is not a solution because it's too slow.

Thus my suggestion of adding an option to DMD so it may emit template instances to all object files that use them. If anyone has alternative ideas, I'd be glad to hear them, because I'm running out of options. The approach I'm currently using in an experimental version of xfBuild is:

* get a fixed order of modules to be compiled determined by the order DMD calls semantic() on them with the root modules at the end
* when a module is modified, additionally recompile all modules that occur after it in the list
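
The two steps above amount to taking the tail of the fixed order starting at the earliest modified module. A small sketch, with a made-up order and a hypothetical function name:

```python
def recompile_set(order, modified):
    """Return the modified modules plus everything after them in `order`.

    order is the fixed compile order (root modules last); modified is any
    iterable of module names that appear in it.
    """
    positions = [order.index(m) for m in modified]
    if not positions:
        return []
    # Everything from the earliest modified module onward is recompiled.
    return order[min(positions):]


# Hypothetical order as reported by the compiler, roots at the end.
order = ["C", "A", "B", "main"]
after_a = recompile_set(order, ["A"])
after_b_and_main = recompile_set(order, ["main", "B"])
print(after_a, after_b_and_main)
```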

This quite obviously ends up compiling way too many modules, but seems to work reliably (except when OPTLINK decides to crash) without requiring full rebuilds all the time. Still, I fear there might be corner cases where it will fail as well. DMD sometimes places initializers in weird places, e.g.:

.objs\xf-nucleus-model-ILinkedKernel.obj(xf-nucleus-model-ILinkedKernel)
 Error 42: Symbol Undefined _D61TypeInfo_S2xf7nucleus9particles13BasicParticle13BasicParticle6__initZ

The two modules (xf.nucleus.model.ILinkedKernel and xf.nucleus.particles.BasicParticle) are unrelated. This error occurred once, somewhere deep into an automated attempt to break the experimental xfBuild by touching random modules and performing incremental builds.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
September 11, 2009
Tom S wrote:
> Short story: DMD probably needs an option to output template instances to all object files that need them.

Hi Tom,

What you describe here is very interesting and useful. I'm thinking of adding an incremental builder to Descent at some point in the future and I'll probably encounter the same problem.

So I vote++ to emitting template instances in every obj that uses them.
September 11, 2009
On Fri, 11 Sep 2009 07:47:11 -0400, Tom S <h3r3tic@remove.mat.uni.torun.pl> wrote:
> Compiling modules one-at-a-time is not a solution because it's too slow.

On the other hand, one-at-a-time builds can be done in parallel if you have multiple cores. Of course, that's still not a net win on my system, so vote++
September 12, 2009
Tom S wrote:
> Thus my suggestion of adding an option to DMD so it may emit template instances to all object files that use them. If anyone has alternative ideas, I'd be glad to hear them, because I'm running out of options.

Try compiling with -lib, which will put each template instance into its own obj file.
September 12, 2009
Walter Bright wrote:
> Tom S wrote:
>> Thus my suggestion of adding an option to DMD so it may emit template instances to all object files that use them. If anyone has alternative ideas, I'd be glad to hear them, because I'm running out of options.
> 
> Try compiling with -lib, which will put each template instance into its own obj file.

Thanks for the suggestion. Unfortunately it's a no-go since -lib seems to have the same issue that compiling without -op does - if you have multiple modules with the same name (but different packages), one will overwrite the other in the lib. On the other hand, I was able to hack DMD a bit and use -multiobj since your suggestion gave me an idea :)

Basically, the approach would be to compile the project with -multiobj and move the generated objects to a local (per project) directory, renaming them so no conflicts arise.

The next step is to determine all public and comdat symbols in all of these object files - this might be done via a specialized program, however I've used Burton Radons' exelib to optimally run libunres.exe from DMC. The exports are saved to some sort of a database (a dumb structured file is ok).

The above is done on the initial build, so the next time around we have a directory of object files and a map of all their exported symbols. In an incremental step, we compile the modified modules, but don't immediately move their object files over to the special directory. Instead, we scan their public and comdat symbols and figure out which object files from the already compiled set they replace: for each symbol in the newly compiled objects, find the object in the original set that defined it and mark it. All marked files are added to a library (I call it junk.lib) and then removed from the object directory. Finally, the newly compiled objects are moved to the special object directory.

The junk.lib will be used if the newly compiled object files missed any shared symbols that were in the old objects and that would have been generated had more modules been passed to the compiler. In other words, it contains symbols that the naive incremental compilation would lose.

When linking, all object files from the directory are passed explicitly to the compiler and symbols are pulled eagerly from them, however junk.lib will be queried only if a symbol cannot be found in the set of objects in the special directory.
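
The bookkeeping described above can be sketched as follows. This is a toy model only: object files are reduced to sets of symbol names, junk.lib is a plain list, and the function names and the example symbols are assumptions rather than code from increBuild.

```python
def incremental_step(obj_dir, junk_lib, new_objs):
    """Apply one incremental step in place.

    obj_dir, new_objs: dict of object file name -> set of defined symbols.
    junk_lib: list of (object file name, set of defined symbols).
    """
    new_symbols = set().union(*new_objs.values())
    # An old object is replaced if a freshly compiled object defines
    # any symbol that the old one defined.
    marked = [name for name, syms in obj_dir.items() if syms & new_symbols]
    for name in marked:
        junk_lib.append((name, obj_dir.pop(name)))
    obj_dir.update(new_objs)


def resolve(symbol, obj_dir, junk_lib):
    """Search the object directory first; use junk.lib only on a miss."""
    for name, syms in obj_dir.items():
        if symbol in syms:
            return ("objdir", name)
    for name, syms in junk_lib:
        if symbol in syms:
            return ("junk.lib", name)
    return None


# Initial build: A.obj happens to carry the shared instance T!int.
obj_dir = {"A.obj": {"A.foo", "T!int"}, "B.obj": {"B.bar"}, "C.obj": {"C.baz"}}
junk_lib = []

# A is edited so that it no longer instantiates T, then recompiled alone.
incremental_step(obj_dir, junk_lib, {"A.obj": {"A.foo"}})

where_t = resolve("T!int", obj_dir, junk_lib)
where_foo = resolve("A.foo", obj_dir, junk_lib)
print(where_t, where_foo)
```

In this model the stale copy of A.obj is consulted only for the symbol the fresh objects no longer provide, which is the fallback role junk.lib plays at link time.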

I've put up a proof-of-concept implementation at http://h3.team0xf.com/increBuild.7z . It requires a slightly patched DMD (well, hacked actually), so it prints out the names of all objects it generates. Basically, uncomment the `printf("writing '%s'\n", fname);` in glue.c at line 133 and add `printf("writing '%s'\n", m->objfile->name->str);` after `m->genobjfile(global.params.multiobj);` in mars.c. I'm compiling the build tool with a recent (SVN-ish) version of Tango and DMD 1.047.

As for my own impressions of this idea, its biggest drawback probably is that the multitude of object files created via -multiobj strains the filesystem. Even when running on a ramdrive, my WinXP-based system took a good fraction of a second to move a few hundred object files to their destination directory. This can probably be improved on, as -multiobj seems to produce some empty object files (at least according to libunres and ddlinfo). It might also be possible to use specialized storage for object files by patching up dmd and hooking OPTLINK's calls to CreateFile. I'm not sure about Linux, but perhaps something based on FUSE might work. These last options are probably long shots, so I'm still quite curious how DMD might perform with outputting template instantiations into each object file that uses them.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
September 12, 2009
Tom S wrote:
> Walter Bright wrote:
>> Tom S wrote:
>>> Thus my suggestion of adding an option to DMD so it may emit template instances to all object files that use them. If anyone has alternative ideas, I'd be glad to hear them, because I'm running out of options.
>>
>> Try compiling with -lib, which will put each template instance into its own obj file.
> 
> Thanks for the suggestion. Unfortunately it's a no-go since -lib seems to have the same issue that compiling without -op does - if you have multiple modules with the same name (but different packages), one will overwrite the other in the lib.

To clarify, this is not the only issue with -lib. The libs would either have to be expanded into objects or static ctors would not run. And why extract them if -multiobj already generates them extracted?


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
September 12, 2009
Tom S wrote:
> As for my own impressions of this idea, its biggest drawback probably is that the multitude of object files created via -multiobj strains the filesystem.

Sure, but -multiobj and -lib generate exactly the same object files; it's just that -lib puts them all into a library so it doesn't strain the file system.

Extracting the obj files from the lib is pretty simple; see libomf.c for the format.
September 12, 2009
Walter Bright wrote:
> Tom S wrote:
>> As for my own impressions of this idea, its biggest drawback probably is that the multitude of object files created via -multiobj strains the filesystem.
> 
> Sure, but -multiobj and -lib generate exactly the same object files, it's just that -lib puts them all into a library so it doesn't strain the file system.
> 
> Extracting the obj files from the lib is pretty simple, you can see the libomf.c for the format.

You're right, I'm sorry. I must've overlooked something in the lib dumps and assumed one module overwrites the other.

So with -lib, it should be possible to only extract the object files that contain static constructors and the main function and keep the rest packed up. Does that sound about right?

By the way, using -lib causes DMD to eat a LOT of memory compared to the 'normal' mode - in one of my projects, it easily eats up > 1.2GB and dies. This could be a downside to this approach. I haven't tested whether it's the same with -multiobj.

Would it be hard to add an option to DMD to control template emission? Apparently GDC has -femit-templates, so it's doable ;) LDC outputs instantiations to all objects.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
September 13, 2009
Tom S wrote:
> Walter Bright wrote:
>> Tom S wrote:
>>> As for my own impressions of this idea, its biggest drawback probably is that the multitude of object files created via -multiobj strains the filesystem.
>>
>> Sure, but -multiobj and -lib generate exactly the same object files, it's just that -lib puts them all into a library so it doesn't strain the file system.
>>
>> Extracting the obj files from the lib is pretty simple, you can see the libomf.c for the format.
> 
> You're right, I'm sorry. I must've overlooked something in the lib dumps and assumed one module overwrites the other.
> 
> So with -lib, it should be possible to only extract the object files that contain static constructors and the main function and keep the rest packed up. Does that sound about right?

All the .lib file is, is:

[header]
[all the object files concatenated together and aligned]
[dictionary and index]

Linux .a libraries are the same idea, just a different format for the header, dictionary and index. The obj files are unmodified in the library. You can extract them based on whatever criteria you need.
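
As an illustration of extracting members unmodified, here is a minimal sketch for the common Unix ar layout (an 8-byte magic, then a 60-byte header before each member, with payloads padded to an even length). It does not handle the OMF .lib format, GNU long-name tables or BSD extended names, and `ar_pack` exists only to build test input with dummy header fields.

```python
AR_MAGIC = b"!<arch>\n"


def ar_pack(members):
    """Build a minimal ar archive from (name, payload) pairs (test helper)."""
    out = bytearray(AR_MAGIC)
    for name, payload in members:
        # Fields: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) + 2-byte end.
        header = "%-16s%-12d%-6d%-6d%-8s%-10d" % (name, 0, 0, 0, "644", len(payload))
        out += header.encode("ascii") + b"`\n" + payload
        if len(payload) % 2:
            out += b"\n"   # members start on even offsets
    return bytes(out)


def ar_members(data):
    """Yield (name, payload) for each member of an ar archive in `data`."""
    if data[:8] != AR_MAGIC:
        raise ValueError("not an ar archive")
    pos = 8
    while pos + 60 <= len(data):
        header = data[pos:pos + 60]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header")
        name = header[0:16].decode("ascii").rstrip()
        size = int(header[48:58].decode("ascii"))
        pos += 60
        yield name, data[pos:pos + size]
        pos += size + (size & 1)   # skip the padding byte, if any


archive = ar_pack([("a.o/", b"abc"), ("b.o/", b"wxyz")])
members = list(ar_members(archive))
# Extract by any criterion, here by member name.
wanted = [payload for name, payload in members if name == "b.o/"]
print(members, wanted)
```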

> By the way, using -lib causes DMD to eat a LOT of memory compared to the 'normal' mode - in one of my projects, it eats up easily > 1.2GB and dies. This could be a downside to this approach. I haven't tested whether it's the same with -multiobj

Hmm. I build Phobos with -lib and haven't experienced any problems, but it's possible, as dmd doesn't ever discard any memory.


> Would it be hard to add an option to DMD to control template emission? Apparently GDC has -femit-templates, so it's doable ;) LDC outputs instantiations to all objects.

I've found the LDC approach to be generally a poor one (having much experience with it for C++, where there is no choice). It generates huge object files and there are often linker problems trying to remove the duplicates. I really got tired of "COMDAT" problems with linkers, and no, it wasn't just with Optlink. Having each template instantiation in its own obj file works out great, eliminating all those problems.

I don't really understand why the -lib approach is not working for your needs.
September 13, 2009
Walter Bright wrote:
> I don't really understand why the -lib approach is not working for your needs.

I'm not sure what you mean by "the -lib approach". How exactly do you apply it to incremental compilation? If my project has a few hundred modules and I change just one line in one function, I don't want to rebuild it all with -lib again. I thought you were referring to the proof-of-concept incremental build tool I posted yesterday which used -multiobj, as it should be possible to optimize it using -lib... I just haven't tried that yet.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode