Thread overview
Why doesn't DMD create any redundant symbols?
Aug 28, 2007
Gregor Richards
Aug 28, 2007
Walter Bright
Aug 28, 2007
Gregor Richards
Aug 28, 2007
Walter Bright
Feb 14, 2008
Frank Benoit
Feb 15, 2008
Christopher Wright
Feb 16, 2008
Derek Parnell
Feb 16, 2008
Alexander Panek
August 28, 2007
This is a problem that comes up for me again and again in making DSSS work everywhere. When DMD is used to compile several modules with -c, it never creates any redundant data, and as far as I can tell it also doesn't mark any potentially redundant data as common. This means that DSSS has to build one file at a time with DMD. This makes certain obnoxious people complain about DSSS being slow, because it takes an incredible ten seconds to compile a fairly large library. When I switched it to compiling multiple files simultaneously, it took <1 second, but the result was wrong for reasons described below.

When DMD comes across typeinfo (for example), it only puts the typeinfo symbol into one of the .o files it is generating, even if it's used in several. On the surface, this seems like a good idea, but in reality it causes a whole slew of problems with bogus intermodule dependencies. With this, foo.io.output could arbitrarily depend on foo.net.ipvsix.udp, because some piece of typeinfo was put there.

First, libraries. I don't know precisely how .lib files work on Windows, but linking against .a files will pick and choose only those .o files that are used. With these bogus intermodule dependencies, the linker will often be forced to drag in the whole library, even though only a small chunk of it is actually necessary. Usually this just causes big binaries, but it gets worse when libraries have conditional dependencies: if module foo.a depends on another library, but module foo.b does not, it is now unpredictable which libraries are necessary. Oof.
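
To make the archive behavior concrete, here is a minimal Python sketch of member selection. It is a toy model, not how any real linker is implemented, and the object and symbol names (output.o, udp.o, TypeInfo_Foo, socket_open) are made up for illustration. A member is pulled in only if it defines a symbol that is still undefined, and everything it references becomes a new requirement.

```python
def link_archive(roots, archive):
    """Return the archive members a simplified linker would pull in.

    roots   -- symbols the program itself references
    archive -- {member_name: (defined_symbols, undefined_symbols)}
    """
    undefined = set(roots)
    defined = set()
    pulled = []
    changed = True
    while changed:  # repeat until no further member is needed
        changed = False
        for name, (defs, undefs) in archive.items():
            if name in pulled:
                continue
            if undefined & set(defs):  # member satisfies a pending symbol
                pulled.append(name)
                defined |= set(defs)
                undefined |= set(undefs)
                undefined -= defined
                changed = True
    return sorted(pulled)


# Hypothetical layout when the typeinfo is emitted into only one object:
# output.o must reference the copy that happens to live in udp.o.
shared = {
    "output.o": ({"write"}, {"TypeInfo_Foo"}),
    "udp.o": ({"send", "TypeInfo_Foo"}, {"socket_open"}),
}

# Hypothetical layout when every object carries its own (common) copy.
redundant = {
    "output.o": ({"write", "TypeInfo_Foo"}, set()),
    "udp.o": ({"send", "TypeInfo_Foo"}, {"socket_open"}),
}

print(link_archive({"write"}, shared))     # both members get pulled in
print(link_archive({"write"}, redundant))  # only output.o is needed
```

In the first layout a program that only calls write still drags in udp.o, and with it the external socket_open requirement. In the second layout it does not.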

Second, incremental compilation. This is one I didn't realize was a problem until recently. DSSS will perform incremental compilation when only one file has changed by only compiling that file. However, that causes more issues with these common data problems. Now, typeinfo could be doubly defined but not marked common, or (by means I don't quite understand) not defined at all. So, I now have to compile one file at a time, even when building binaries.

The solution to all of this is simple: Create redundant symbols in the object files, marked as common. I know this can be done because it's done properly with one file at a time. This increases the size of the object files, but since it reduces bogus intermodule dependencies and sections marked as common will be merged anyway, it actually reduces the size of produced binaries, as well as making linking a significantly less complex problem.
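
As a rough illustration of why common symbols sidestep both failure modes, the following Python sketch merges the symbol tables of several object files. It is a simplified model under stated assumptions: real object formats and linkers have more cases, and the names (a.o, b.o, TypeInfo_Foo) are hypothetical. Duplicate definitions are tolerated only when both copies are marked common, and a reference with no definition anywhere is reported.

```python
def resolve(objects):
    """Merge symbol tables from several object files.

    objects -- {object_name: {"defs": {symbol: is_common}, "refs": set_of_symbols}}
    Returns (final, errors), where final maps symbol -> (object, is_common).
    """
    final = {}
    errors = []
    for obj, table in objects.items():
        for sym, is_common in table["defs"].items():
            if sym not in final:
                final[sym] = (obj, is_common)
            elif is_common and final[sym][1]:
                continue  # both copies are common: keep the first one
            else:
                errors.append(f"duplicate definition of {sym} in {obj}")
    for obj, table in objects.items():
        for sym in sorted(table["refs"]):
            if sym not in final:
                errors.append(f"undefined reference to {sym} in {obj}")
    return final, errors


# Every object carries its own copy, marked common: merges cleanly.
all_common = {
    "a.o": {"defs": {"TypeInfo_Foo": True}, "refs": set()},
    "b.o": {"defs": {"TypeInfo_Foo": True}, "refs": set()},
}

# Mixed incremental build: two non-common copies collide.
doubly_defined = {
    "a.o": {"defs": {"TypeInfo_Foo": False}, "refs": set()},
    "b.o": {"defs": {"TypeInfo_Foo": False}, "refs": set()},
}

# Mixed incremental build: nobody ended up with the definition.
missing = {
    "a.o": {"defs": {}, "refs": {"TypeInfo_Foo"}},
    "b.o": {"defs": {}, "refs": set()},
}

for case in (all_common, doubly_defined, missing):
    print(resolve(case)[1])
```

Only the first case links without errors, regardless of which subset of files was recompiled.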

I have to assume there's a reason for this, so, to summarize: Why doesn't DMD create any redundant symbols in .o files?

 - Gregor Richards
August 28, 2007
Gregor Richards wrote:
> I have to assume there's a reason for this, so, to summarize: Why doesn't DMD create any redundant symbols in .o files?

It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated.

The compiler assumes that if there are multiple modules on the command line, they'll all be linked together, so why generate redundant output?
August 28, 2007
Walter Bright wrote:
> Gregor Richards wrote:
>> I have to assume there's a reason for this, so, to summarize: Why doesn't DMD create any redundant symbols in .o files?
> 
> It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated.
> 
> The compiler assumes that if there are multiple modules on the command line, they'll all be linked together, so why generate redundant output?

OK, so how about for those willing to (or required to) take the performance penalty, adding an option to create redundant data? I imagine the speed difference between compiling one file at a time and compiling all at once but with redundant data is greater than the speed difference between compiling all at once with and without redundant data, so your improvement to build speed significantly hinders my build speed.

 - Gregor Richards
August 28, 2007
Gregor Richards wrote:
> Walter Bright wrote:
>> Gregor Richards wrote:
>>> I have to assume there's a reason for this, so, to summarize: Why doesn't DMD create any redundant symbols in .o files?
>>
>> It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated.
>>
>> The compiler assumes that if there are multiple modules on the command line, they'll all be linked together, so why generate redundant output?
> 
> OK, so how about for those willing to (or required to) take the performance penalty, adding an option to create redundant data? I imagine the speed difference between compiling one file at a time and compiling all at once but with redundant data is greater than the speed difference between compiling all at once with and without redundant data, so your improvement to build speed significantly hinders my build speed.

It's a good idea, but it would be a fair bit of work the way dmd is designed.
February 14, 2008
Walter Bright wrote:
> Gregor Richards wrote:
>> Walter Bright wrote:
>>> Gregor Richards wrote:
>>>> I have to assume there's a reason for this, so, to summarize: Why doesn't DMD create any redundant symbols in .o files?
>>>
>>> It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated.
>>>
>>> The compiler assumes that if there are multiple modules on the command line, they'll all be linked together, so why generate redundant output?
>>
>> OK, so how about for those willing to (or required to) take the performance penalty, adding an option to create redundant data? I imagine the speed difference between compiling one file at a time and compiling all at once but with redundant data is greater than the speed difference between compiling all at once with and without redundant data, so your improvement to build speed significantly hinders my build speed.
> 
> It's a good idea, but it would be a fair bit of work the way dmd is designed.

DSSS has the option oneatatime=on as the default now, to avoid problems. But the compile time is no longer acceptable.

Several people complained that they canceled compilation of DWT after 15 min. With oneatatime=off, the same build took <15 sec.

See also http://d.puremagic.com/issues/show_bug.cgi?id=1838



February 15, 2008
Frank Benoit wrote:
> Walter Bright wrote:
>> Gregor Richards wrote:
>>> Walter Bright wrote:
>>>> Gregor Richards wrote:
>>>>> I have to assume there's a reason for this, so, to summarize: Why doesn't DMD create any redundant symbols in .o files?
>>>>
>>>> It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated.
>>>>
>>>> The compiler assumes that if there are multiple modules on the command line, they'll all be linked together, so why generate redundant output?
>>>
>>> OK, so how about for those willing to (or required to) take the performance penalty, adding an option to create redundant data? I imagine the speed difference between compiling one file at a time and compiling all at once but with redundant data is greater than the speed difference between compiling all at once with and without redundant data, so your improvement to build speed significantly hinders my build speed.
>>
>> It's a good idea, but it would be a fair bit of work the way dmd is designed.
> 
> DSSS has the option oneatatime=on as the default now, to avoid problems. But the compile time is no longer acceptable.
> 
> Several people complained that they canceled compilation of DWT after 15 min. With oneatatime=off, the same build took <15 sec.
> 
> See also http://d.puremagic.com/issues/show_bug.cgi?id=1838
> 

The problem being that, without those possibly redundant symbols, you get stuff dying at link time because DMD never bothered to include the symbol anywhere?

Performance is secondary to correctness.
February 16, 2008
On Tue, 28 Aug 2007 10:36:03 -0700, Walter Bright wrote:

> Gregor Richards wrote:
>> I have to assume there's a reason for this, so, to summarize: Why doesn't DMD create any redundant symbols in .o files?
> 
> It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated.
> 
> The compiler assumes that if there are multiple modules on the command line, they'll all be linked together, so why generate redundant output?

However, this assumption is not a valid one. There are valid reasons to compile a set of files (all named on the one command line) that are not necessarily going to be linked together.

Also, tools such as make, rebuild and bud can determine which subset of a set of files has changed and thus recompile only that subset. I have found that doing this sometimes causes conflicting definitions between the newly compiled object files of the subset and the previously compiled object files of the rest of the set.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
February 16, 2008
Walter Bright wrote:
> Gregor Richards wrote:
>> I have to assume there's a reason for this, so, to summarize: Why doesn't DMD create any redundant symbols in .o files?
> 
> It can improve build speed a lot. With C++, which doesn't do this, huge .obj files can be generated.

Pardon my ignorance, but, who cares? DMD is fast enough that such a time penalty for doing something correctly is "excusable" (in other words: needed).

Requiring [forcing] the developers of build tools to work around that problem seems kinda weird, to me.