September 13, 2009
Tom S wrote:
> Walter Bright wrote:
>> I don't really understand why the -lib approach is not working for your needs.
> 
> I'm not sure what you mean by "the -lib approach". How exactly do you apply it to incremental compilation? If my project has a few hundred modules and I change just one line in one function, I don't want to rebuild the whole thing with -lib. I thought you were referring to the proof-of-concept incremental build tool I posted yesterday, which used -multiobj; it should be possible to optimize it using -lib... I just haven't tried that yet.

You only have to build one source file with -lib, not all of them.
September 13, 2009
Walter Bright wrote:
> Tom S wrote:
>> Walter Bright wrote:
>>> I don't really understand why the -lib approach is not working for your needs.
>>
>> I'm not sure what you mean by "the -lib approach". How exactly do you apply it to incremental compilation? If my project has a few hundred modules and I change just one line in one function, I don't want to rebuild the whole thing with -lib. I thought you were referring to the proof-of-concept incremental build tool I posted yesterday, which used -multiobj; it should be possible to optimize it using -lib... I just haven't tried that yet.
> 
> You only have to build one source file with -lib, not all of them.

So you mean compiling each file separately? That's only an option if we turn to the C/C++ way of doing projects - using .di files just like C headers - *everywhere*. Only then can changes in .d files be localized to just one module and compiled quickly. Java and C# do without header files because (to my knowledge) they have no means of changing what's compiled based on the contents of an imported module (basically they lack metaprogramming).

So we could give up and do it the C/C++ way, with lots of duplicated code in headers (C++ fares better here, since it lets you implement a class's methods in the .cpp file instead of rewriting the complete class and filling in member functions, as the .d/.di approach would force), or we could have an incremental build tool that doesn't suck.

This is the picture as I see it:

* I need to rebuild all modules that import the changed modules, because some code in them might evaluate differently (static ifs on the imported modules, for instance - I explained that in my first post in this topic, and a minimal sketch follows below).

* I need to compile them all at once, because compiling each of them in succession yields massively long compile times.

* With your suggestion of using -lib, I assumed you meant building all these modules at once into a lib and then figuring out what to do with their object files one by one.

* Some object files need to be extracted because otherwise module ctors won't be linked into the executable.

* As this is incremental compilation, there will be object files from the previous build, some of which should not be linked, because that would cause multiple definition errors.

* The obsoleted object files can't be simply removed, since they might contain comdat symbols needed by some objects outside of the newly compiled set (I gave an example in my first post, but can provide actual D code that illustrates this issue). Thus they have to be moved into a lib and only pulled into linking on demand.

That's how my experimental build tool maps to the "-lib approach".
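
To make the first point concrete, here's a minimal sketch of the problem (module and symbol names are made up):

// a.d - the module being edited
module a;
const bool useFastPath = true; // flipping this one line...

// b.d - never edited, but its old object file is now stale, because
// which branch gets compiled depends on a.useFastPath:
module b;
import a;

int compute(int x)
{
    static if (useFastPath)
        return x << 1;
    else
        return x * 2;
}

Changing only a.d changes the code generated for b.compute, so b has to be recompiled even though b.d itself is untouched.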


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
September 13, 2009
Tom S wrote:
> Walter Bright wrote:
>> Tom S wrote:
>>> Walter Bright wrote:
>>>> I don't really understand why the -lib approach is not working for your needs.
>>>
>>> I'm not sure what you mean by "the -lib approach". How exactly do you apply it to incremental compilation? If my project has a few hundred modules and I change just one line in one function, I don't want to rebuild the whole thing with -lib. I thought you were referring to the proof-of-concept incremental build tool I posted yesterday, which used -multiobj; it should be possible to optimize it using -lib... I just haven't tried that yet.
>>
>> You only have to build one source file with -lib, not all of them.
> 
> So you mean compiling each file separately?

Yes. Or a subset of the files.

> That's only an option if we turn to the C/C++ way of doing projects - using .di files just like C headers - *everywhere*. Only then can changes in .d files be localized to just one module and compiled quickly. Java and C# do without header files because (to my knowledge) they have no means of changing what's compiled based on the contents of an imported module (basically they lack metaprogramming).
> 
> So we could give up and do it the C/C++ way, with lots of duplicated code in headers (C++ fares better here, since it lets you implement a class's methods in the .cpp file instead of rewriting the complete class and filling in member functions, as the .d/.di approach would force), or we could have an incremental build tool that doesn't suck.
> 
> This is the picture as I see it:
> 
> * I need to rebuild all modules that import the changed modules, because some code in them might evaluate differently (static ifs on the imported modules, for instance - I explained that in my first post in this topic).
> 
> * I need to compile them all at once, because compiling each of them in succession yields massively long compile times.
> 
> * With your suggestion of using -lib, I assumed you meant building all these modules at once into a lib and then figuring out what to do with their object files one by one.
> 
> * Some object files need to be extracted because otherwise module ctors won't be linked into the executable.
> 
> * As this is incremental compilation, there will be object files from the previous build, some of which should not be linked, because that would cause multiple definition errors.
> 
> * The obsoleted object files can't be simply removed, since they might contain comdat symbols needed by some objects outside of the newly compiled set (I gave an example in my first post, but can provide actual D code that illustrates this issue). Thus they have to be moved into a lib and only pulled into linking on demand.
> 
> That's how my experimental build tool maps to the "-lib approach".

What you can try is creating a database that is basically a lib (call it A.lib) of all the modules compiled with -lib. Then recompile all modules that depend on changed modules in one command, also with -lib, call it B.lib. Then for all the obj's in B, replace the corresponding ones in A.
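
To spell out the replacement step, here's a sketch of the bookkeeping, with a lib modeled as a map from member name to object file contents (the names and the representation are made up for illustration, not how the librarian actually stores things):

module libdb;

// A.lib modeled as: member name -> object file bytes
void mergeIntoDatabase(ref ubyte[][string] aLib, ubyte[][string] bLib)
{
    // every object in B replaces the same-named object in A;
    // objects present only in A are left alone
    foreach (name, obj; bLib)
        aLib[name] = obj;
}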
September 13, 2009
Walter Bright wrote:
> What you can try is creating a database that is basically a lib (call it A.lib) of all the modules compiled with -lib. Then recompile all modules that depend on changed modules in one command, also with -lib, call it B.lib. Then for all the obj's in B, replace the corresponding ones in A.

That's what I'm getting at :)


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
September 13, 2009
Tom S wrote:
> Walter Bright wrote:
>> What you can try is creating a database that is basically a lib (call it A.lib) of all the modules compiled with -lib. Then recompile all modules that depend on changed modules in one command, also with -lib, call it B.lib. Then for all the obj's in B, replace the corresponding ones in A.
> 
> That's what I'm getting at :)

With this approach, you could wind up with some 'dead' obj files in A.lib, but aside from a bit of bloat in the lib file, they'll never wind up in the executable.
September 13, 2009
Walter Bright wrote:
> Tom S wrote:
>> Walter Bright wrote:
>>> What you can try is creating a database that is basically a lib (call it A.lib) of all the modules compiled with -lib. Then recompile all modules that depend on changed modules in one command, also with -lib, call it B.lib. Then for all the obj's in B, replace the corresponding ones in A.
>>
>> That's what I'm getting at :)
> 
> With this approach, you could wind up with some 'dead' obj files in A.lib, but aside from a bit of bloat in the lib file, they'll never wind up in the executable.

I'm feeling horribly guilty for having asked for module-level static if(). I have a dreadful suspicion that it might have been a profoundly bad idea.
September 13, 2009
Don wrote:
> Walter Bright wrote:
>> Tom S wrote:
>>> Walter Bright wrote:
>>>> What you can try is creating a database that is basically a lib (call it A.lib) of all the modules compiled with -lib. Then recompile all modules that depend on changed modules in one command, also with -lib, call it B.lib. Then for all the obj's in B, replace the corresponding ones in A.
>>>
>>> That's what I'm getting at :)
>>
>> With this approach, you could wind up with some 'dead' obj files in A.lib, but aside from a bit of bloat in the lib file, they'll never wind up in the executable.
> 
> I'm feeling horribly guilty for having asked for module-level static if(). I have a dreadful suspicion that it might have been a profoundly bad idea.

No need to feel guilty. This problem actually manifests itself in many cases other than static if, e.g. changing an alias in the modified module, or adding some fields to a struct or methods to a class - basically anything that would bite us if we had C/C++ projects solely in .h files (except multiple definition errors). I've prepared some examples (.d and .bat files) of these at http://h3.team0xf.com/dependencyFail.7z (-version is used instead of literally changing the code). I have no idea how Java or C# deal with these - could be smart linking or some sort of static analysis.
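
The alias case, for instance, looks roughly like this (same trick as in the archive - a -version flag stands in for the edit):

// lib.d
module lib;
version (Wide)
    alias long Num; // after the "edit"
else
    alias int Num;  // before the "edit"

// user.d - never edited, yet its old object file is now wrong:
// both the mangled name and the codegen of square follow lib.Num
module user;
import lib;
Num square(Num x) { return x * x; }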

As for the 'dead' obj files, one could run a 'garbage collection' step from time to time ;)

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
September 15, 2009
Walter Bright wrote:
> What you can try is creating a database that is basically a lib (call it A.lib) of all the modules compiled with -lib. Then recompile all modules that depend on changed modules in one command, also with -lib, call it B.lib. Then for all the obj's in B, replace the corresponding ones in A.

OK, there we go: http://h3.team0xf.com/increBuild2.7z (I hope it's fine to include LIBUNRES in there - it's just for convenience).

This is the second incarnation of that incremental build tool experiment. This time it uses -lib instead of -multiobj, as suggested by Walter.

The algorithm works as follows:

* compile modules to a .lib file
* extract objects with static ctors or the __Dmain function (remove them from the lib)
* find out which old object files should be replaced
	* any object for which any symbol was re-generated in this compilation pass
* pack up the obsoleted object files into a 'junk' library
* prepend the 'junk' library to the /library chain/
* prepend the newly compiled library to the /library chain/
* link the executable by passing the cached object files and the whole library chain to the linker

It doesn't use the simple approach of having just one 'junk'/'A.lib' library and appending objects to it, because that's pretty slow due to the librarian having to re-generate the dictionary at each such operation. So instead it keeps a chain of all libraries generated in this process and passes them to the linker in the right order. This will waste more space than the naive approach, but should be faster.
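
The chain bookkeeping itself is roughly this (heavily simplified; OPTLINK's actual command line syntax also differs - the point is just the ordering):

module chain;
import std.string : join;

// newer libraries go in front, so the linker resolves each symbol
// against the freshest object file that defines it
string linkCommand(string[] cachedObjs, ref string[] libChain,
                   string junkLib, string newLib)
{
    libChain = [newLib, junkLib] ~ libChain;
    return "link " ~ join(cachedObjs ~ libChain, " ");
}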

The archive contains the source code and a compiled binary (DMD-Win only for now... Sorry, folks) as well as a little test in the test/ directory. It shows how naive incremental compilation fails (break.bat) and how this tool works (work.bat).

The tool can be used with the latest Mercurial revision of xfBuild ( http://bitbucket.org/h3r3tic/xfbuild/ ) by passing "+cincreBuild" to it. The support is a massive hack though, so expect some strangeness.

I was able to run it on the 'Test1' demo of my Hybrid GUI ( http://team0xf.com:1024/hybrid/file/c841d95675ca/Test1.d ) and a simple/dumb ray tracer based on OMG ( http://team0xf.com:1024/omg/file/5199ed783490/Tracer.d ). In incremental compilation it's not noticeably slower than the naive approach; however, DMD consumes more memory in -lib mode and the executables produced by this approach are larger for some reason. For instance, with Hybrid, Test1.exe is about 20MB with increBuild, compared to about 5MB with the traditional approach. Perhaps there's some simple way to remove this bloat: when compressed with UPX, even with the fastest compression method, the executables differ by just a few kilobytes.

When building my second largest project, DMD eats up about 1.2GB of memory and dies (even without -g). Luckily, xfBuild allows me to set a limit on how many modules are compiled at a time, so when I capped it at 200, it compiled... but didn't link :( Somewhere in the process a library is created that confuses OPTLINK as well as "lib -l". There's one symbol in it that neither of these is able to see, and it results in an undefined reference when linking. The symbol is clearly there when using a lib dumping tool from DDL or "libunres -d -c". I've dropped the lib at http://h3.team0xf.com/strangeLib.7z . The symbol in question is compressed and this newsgroup probably won't chew the non-ansi chars well, but it can be found via the regex "D2xf3omg4core.*ctFromRealVee0P0Z".

One thing slowing this tool down is the need to call the librarian multiple times. DMD -lib will sometimes generate multiple objects with the same name and you can only extract them (when using the librarian) by running lib -x multiple times. DMD should probably be patched up to include fully qualified module names in objects instead of just the last name (foo.Mod and bar.Mod both yield Mod.obj in the library), as -op doesn't seem to help here.
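
The collision is trivial to reproduce (hypothetical modules):

// foo/Mod.d
module foo.Mod;
int one() { return 1; }

// bar/Mod.d
module bar.Mod;
int two() { return 2; }

Compile both with -lib and the resulting library contains two members both named Mod.obj; a single "lib -x" pass only gets you one of them.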

Another idea that would map well onto any incremental builder is a tool that finds the differences between modules and tells whether they're limited to, e.g., function bodies. An incremental builder could then assume that it doesn't have to recompile any dependencies, just the one modified file. Unfortunately, this assumption doesn't always hold - functions can be used via CTFE to generate code, so the changes escape the module (a sketch below). Personally I'm of the opinion that functions should be explicitly marked for CTFE, and this is just another reason for it. I'm using a patched DMD with an added pragma(ctfe) which instructs the compiler not to run codegen or generate debug info for functions/aggregates marked as such. This trick alone can slim an executable down by a good megabyte, which sometimes is a life-saver with OPTLINK. I've heard that other people put their CTFE stuff into .di files, but that approach doesn't cover all cases of codegen via CTFE and string mixins.
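
Here's the kind of escape I mean (a minimal sketch, D1-style strings, names made up):

// gen.d - an ordinary function, but also run at compile time
module gen;
char[] makeGetter(char[] name)
{
    return "int get_" ~ name ~ "() { return " ~ name ~ "; }";
}

// user.d - only a function *body* in gen.d has to change for the
// code mixed into this module to change with it:
module user;
import gen;
int x;
mixin(makeGetter("x"));

A differ that only looked at declarations would wrongly conclude that user.d needs no recompilation after an edit to makeGetter's body.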

I'm afraid I won't be doing any more prototypes anytime soon - I really need to focus on my master's thesis :P Then again, I don't really know how this tool could be improved further without hacking the compiler or writing custom OMF processing.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode
September 17, 2009
Tom S wrote:
> Personally I'm of the opinion that functions should be explicitly marked for CTFE, and this is just another reason for it. I'm using a patched DMD with an added pragma(ctfe) which instructs the compiler not to run codegen or generate debug info for functions/aggregates marked as such. This trick alone can slim an executable down by a good megabyte, which sometimes is a life-saver with OPTLINK.

If you are compiling files with -lib, and nobody calls those CTFE functions at runtime, then they should never be linked in. (Virtual functions are always linked in, since the vtable holds a reference to them even if they are never called.)

Executables built this way shouldn't have dead functions in them.
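
The distinction, roughly (assuming nothing else references these symbols):

module m;

// only ever called during compilation; with -lib there is no runtime
// reference to it, so its object module is never pulled in
int answer() { return 42; }
const int x = answer(); // CTFE

class C
{
    // virtual: referenced from C's vtable, hence linked in along
    // with C whether or not anyone ever calls it
    void neverCalled() { }
}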
September 17, 2009
Tom S wrote:
> When building my second largest project, DMD eats up about 1.2GB of memory and dies (even without -g). Luckily, xfBuild allows me to set a limit on how many modules are compiled at a time, so when I capped it at 200, it compiled... but didn't link :( Somewhere in the process a library is created that confuses OPTLINK as well as "lib -l". There's one symbol in it that neither of these is able to see, and it results in an undefined reference when linking. The symbol is clearly there when using a lib dumping tool from DDL or "libunres -d -c". I've dropped the lib at http://h3.team0xf.com/strangeLib.7z . The symbol in question is compressed and this newsgroup probably won't chew the non-ansi chars well, but it can be found via the regex "D2xf3omg4core.*ctFromRealVee0P0Z".

Please post to bugzilla.


> One thing slowing this tool down is the need to call the librarian multiple times. DMD -lib will sometimes generate multiple objects with the same name

Please post to bugzilla.