February 22, 2007 Re: Lib change leads to larger executables | ||||
---|---|---|---|---|
| ||||
Posted in reply to Frits van Bommel | Frits van Bommel wrote:
> Sean Kelly wrote:
>> Ideally, perhaps a linker could provide both options: link fast and potentially bloat the exe or link carefully (and slowly) for a lean exe. I'd use the fast link for debugging and the slow link for releases. Assuming, of course, that the linker were reliable enough that there was no risk of changing app behavior between the two.
>
> That might not be the case here: if a module's object file is pulled in, that module's static constructors and destructors are called at runtime, right? So if different modules are pulled in with those options, different static constructors/destructors get called.
> (Same goes for unit tests, if enabled, by the way)
Yuck. Good point.
Sean
|
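Editorial sketch of the point Frits raises above: a module constructor runs simply because its module is part of the executable, whether or not anything calls into that module. This is a minimal illustration and assumes a current D compiler with Phobos; the variable `ctorRan` is a hypothetical name used only to make the effect visible.

```d
import std.stdio;

// Hypothetical flag, present only so the effect can be observed.
bool ctorRan = false;

// Module constructor: the D runtime calls this before main() for every
// module whose object file ends up in the executable.
static this()
{
    ctorRan = true;
}

void main()
{
    // Nothing in main() calls the constructor explicitly.
    writeln("static this() ran before main: ", ctorRan);
}
```

If a fast link and a careful link pulled in different sets of object files, different sets of such constructors (and destructors, and unit tests) would run, which is the behavioral difference being discussed.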
February 22, 2007 Re: Lib change leads to larger executables | ||||
---|---|---|---|---|
| ||||
Posted in reply to Frits van Bommel | On Thu, 22 Feb 2007 18:32:18 +0200, Frits van Bommel <fvbommel@REMwOVExCAPSs.nl> wrote:
> Sean Kelly wrote:
>> Ideally, perhaps a linker could provide both options: link fast and potentially bloat the exe or link carefully (and slowly) for a lean exe. I'd use the fast link for debugging and the slow link for releases. Assuming, of course, that the linker were reliable enough that there was no risk of changing app behavior between the two.
>
> That might not be the case here: if a module's object file is pulled in, that module's static constructors and destructors are called at runtime, right? So if different modules are pulled in with those options, different static constructors/destructors get called.
> (Same goes for unit tests, if enabled, by the way)
Hmm, yes, but how is that different from today's situation? Currently the linker chooses *arbitrary* object modules that happen to contain the needed typeinfo.
|
February 22, 2007 Re: Lib change leads to larger executables | ||||
---|---|---|---|---|
| ||||
Posted in reply to Anders F Björklund | Anders F Björklund wrote:
> John Reimer wrote:
>
>> That's not a good argument. ld is pig slow? I'm sorry but I don't get
>> that. It works; it works as intended; and, strangely, I don't hear people
>> complain about its apparent lack of speed.
>> So what if a linker is blitzingly fast. If it's outdated and broken,
>> there's not much to get excited about. I'll choose the slow working one
>> any day.
>
> I've found OPTLINK to hang and crash a lot when linking the wxD programs
> on Windows XP. But every time I try to reproduce it, it goes away... :-(
>
> So now I just run the "make" like three times in a row, and it usually
> succeeds in building everything. And yeah, it's rather fast in doing so.
>
> But I prefer the MinGW gdc/ld, since it works the first time but slower?
> (well that and that I have problems getting DMC to work with SDL / GL)
I see hangs occasionally even for small programs, even on single files compiled with dmd -run. Every time it happens, if I kill it with Ctrl-C and run the same command again, everything is fine. The frequency is maybe 1 out of every 50 compiles.
--bb
|
February 22, 2007 Re: Lib change leads to larger executables | ||||
---|---|---|---|---|
| ||||
Posted in reply to kris | kris wrote:
> 6) The dependency that /does/ cause the problem is one generated by the D compiler itself. It generates and injects false 'dependencies' across object modules. Specifically, that's effectively how the linker treats them.
...
> 9) The Fake dependencies cause the linker to pick up and bind whatever module happens to satisfy its need for the typeinfo resolution. In this case, the linker sees Core.obj with a char[][] decl exposed, so it says "hey, this is the place!" and binds it. Along with everything else that Core.obj actually requires.
This is the crux of the problem. In C/C++, problem areas can typically be identified and addressed. In D however, the problem areas are related to "hidden" data and may manifest differently for different applications. They can still be identified and addressed, but the process is far more brittle, as any code change can have cascading effects on application size.
Still, I don't entirely understand why this appears to not be an issue using Build, which has historically had bloat issues in some cases. Was it just luck, or do things actually change when objects are stored in a library as opposed to not?
> Case in point: you have to strip the library down by hand, and very very carefully sift through the symbols and literally hundreds of library builds until you finally get lucky enough to stumble over the problem.
>
> Walter asserts that the linker can be tricked into doing the right thing. This seems to show a lack of understanding on his part about the problem and the manner in which the lib and linker operate.
So far, I see two options for using Win32 libraries: the old way, which created relatively lean EXEs but had link errors for template code, or the new way, which requires the laborious process outlined above. Interestingly, eschewing libraries in favor of object-level linking has always worked and doesn't seem to exhibit either of the above problems (as I mentioned above).
As much as people have been pushing for working D libraries, given the above alternatives I'm somewhat inclined to stick with Build unless I'm integrating with a C application. By the same token, none of these problems appear to have ever existed on Linux, be it because of the ELF format, the 'ld' linker, or some other confluence of planetary alignment and sheer luck. Can anyone confirm that this is indeed true?
> As was pointed out to me, OMF librarians actually use a two-level hashmap to index the library entries. This is used by the linker to locate missing symbols. I think it's clear that this is not a linear lookup mechanism as had been claimed, and is borne out by experiments that show the process cannot be controlled, and the linker cannot be faked in a usable or dependable manner.
This may not be true of optlink however. I suspect the hashmap is probably more likely of linkers that do segment-level linking? There seems little point in the complexity otherwise.
> I feel it important to point out that this powerful I18N package is, in no way, at fault here. The D compiler simply injected the wrong symbol into the wrong module at the wrong time, in the wrong order, and the result is that this package gets linked when it's not used or imported in any fashion by the code design. Instead, the dependency is created entirely by the compiler. That's a problem. It is a big problem. And it is a problem every D developer will face, at some point, when using DM tools. But it is not a problem rooted in the Tango code.
Agreed. Tango was merely one of the first to encounter it because it's one of the first "large" D libraries.
Sean
|
February 22, 2007 Re: Lib change leads to larger executables | ||||
---|---|---|---|---|
| ||||
Posted in reply to Kristian Kilpi | Kristian Kilpi wrote:
> On Thu, 22 Feb 2007 18:32:18 +0200, Frits van Bommel <fvbommel@REMwOVExCAPSs.nl> wrote:
>
>> Sean Kelly wrote:
>>> Ideally, perhaps a linker could provide both options: link fast and potentially bloat the exe or link carefully (and slowly) for a lean exe. I'd use the fast link for debugging and the slow link for releases. Assuming, of course, that the linker were reliable enough that there was no risk of changing app behavior between the two.
>>
>> That might not be the case here: if a module's object file is pulled in, that module's static constructors and destructors are called at runtime, right? So if different modules are pulled in with those options, different static constructors/destructors get called.
>> (Same goes for unit tests, if enabled, by the way)
>
> Hmm, yes, but how is that different from today's situation? Currently the linker chooses *arbitrary* object modules that happen to contain the needed typeinfo.
Because as long as the list of dependencies remains unchanged, the same arbitrary choices should be made.
Sean
|
February 22, 2007 Re: Lib change leads to larger executables | ||||
---|---|---|---|---|
| ||||
Posted in reply to Frits van Bommel | Frits van Bommel (fvbommel@REMwOVExCAPSs.nl) wrote:
> jcc7 wrote:
> > Frits van Bommel (fvbommel@REMwOVExCAPSs.nl) wrote:
> >> kris wrote:
> >>> Isn't there some way to isolate the typeinfo such that only a segment is linked, rather than the entire "hosting" module (the one that just happened to be found first in the lib)?
> >> The obvious solution would be to always generate typeinfo even if it can be determined imported modules will already supply it. The current approach seems to confuse the linker, causing it to link in unrelated objects that happen to supply the symbol even though the compiler "meant" for another object file to supply it.
> >>
> >> Yes, that will "bloat" object files, but the current approach apparently bloats applications. Care to guess which are distributed most often? ;)
> >
> > I think your idea could work. It makes sense to me, but I'd like to go one better: (By the way, this topic is mostly over my head, so I'll probably have to quit offering ideas pretty soon lest I embarrass myself more than I already may have.)
> > Let's have DMD postpone creating TypeInfo until an .exe or .dll is being created and only include them with the .obj for the "main" module (i.e. the module with the main or DllMain function).
> Not all libraries may have a DllMain, IIRC it's completely optional. On Windows it's required for D DLLs if you want to use the GC from within the DLL, or have static constructors/destructors in the DLL but otherwise you may get by without. I think if you write C-style D you may well get away without it.
Well, I don't want to prevent anyone from playing by their own rules, so my proposed TypeInfo-postponing compiler could have a switch to add the TypeInfo as it's compiling any arbitrary code into an .obj file. But in usual circumstances, I'd think that the TypeInfo would only be needed when producing an .exe or .dll.
> > Surely, the compiler can figure out which TypeInfo's it needs at the point of compiling an .exe or .dll.
> Not necessarily. Any modules that are linked in but not called by other modules (e.g. code only reachable from static constructors and/or static destructors) may not be seen when main/DllMain is compiled, if there even is one of these (see above point about DllMain being optional).
I don't see how static constructors and/or destructors interfere with the compiler detecting which TypeInfo's would be necessary, but I don't think such a problem would be insurmountable. Perhaps, it'd be a question of "Is it worth the effort?".
But then again, I don't know much about what the compiler and linker do "under the hood". It's mostly a black box for me. But from reading Walter and Kris discuss the issues involved, I'm convinced there has to be a less haphazard way for DMD and optlink to interact.
> > If not, even if we have to wait for the linker to spit out
> > a list of missing TypeInfo's and then generate the TypeInfo
> > (trial-and-error), I think that would be a small price to pay for
> > eliminating all of this bloat of unneeded modules that Kris has
> > discovered.
> This would mean you can't "manually" link stuff together, using optlink/ld/whatever directly. I don't know how many people want to do this, but Walter has made it pretty clear he wants to be able to use a generic linker[1] (i.e. one that doesn't require specialized knowledge of D) and I agree with that.
Isn't there still a question of whether anyone has found a "generic linker" for OMF (other than OptLink) that can work with DMD anyway?
> Consider this: if every (or even more than one) language required a special way of linking, that would mean you couldn't link together code written in those languages without writing a linker (or perhaps wrapper) that supports both...
Yeah, that doesn't sound like fun.
> Though arguably the situation with DMD/Windows is already worse when it comes to that, since almost nobody else uses OMF anymore...
Right. We seem to be on our own when it comes to using OMF.
I think we're mostly trying to find a fix for the problem with the OMF files generated by DMD right now. Apparently, GDC doesn't have these same problems (or if GDC does have linker problems, Walter isn't the one responsible for fixing them). So I think the problem is limited to using DMD's OMF files on Windows. (Doesn't DMD on Linux use ELF? I think that's the case.)
> For above-mentioned reasons, I don't think it will work for all
> (corner)cases.
You might be right, but I haven't given up hope yet.
jcc7
|
February 22, 2007 Re: Lib change leads to larger executables | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | On Thu, 22 Feb 2007 22:08:46 +0200, Sean Kelly <sean@f4.ca> wrote:
> Kristian Kilpi wrote:
>> On Thu, 22 Feb 2007 18:32:18 +0200, Frits van Bommel <fvbommel@REMwOVExCAPSs.nl> wrote:
>>
>>> Sean Kelly wrote:
>>>> Ideally, perhaps a linker could provide both options: link fast and potentially bloat the exe or link carefully (and slowly) for a lean exe. I'd use the fast link for debugging and the slow link for releases. Assuming, of course, that the linker were reliable enough that there was no risk of changing app behavior between the two.
>>>
>>> That might not be the case here: if a module's object file is pulled in, that module's static constructors and destructors are called at runtime, right? So if different modules are pulled in with those options, different static constructors/destructors get called.
>>> (Same goes for unit tests, if enabled, by the way)
>> Hmm, yes, but how is that different from today's situation? Currently the linker chooses *arbitrary* object modules that happen to contain the needed typeinfo.
>
> Because as long as the list of dependencies remains unchanged, the same arbitrary choices should be made.
>
>
> Sean
Well yes, except there are no guarantees of that, in the specs I mean. Another linker may (and likely will) produce a different result. And the same can happen when a library is rebuilt. The order of object modules affects how the linker will choose modules to be linked in.
|
February 22, 2007 Re: Lib change leads to larger executables | ||||
---|---|---|---|---|
| ||||
Posted in reply to jcc7 | jcc7 wrote:
> Frits van Bommel (fvbommel@REMwOVExCAPSs.nl) wrote:
>> jcc7 wrote:
>>> Surely, the compiler can figure out which TypeInfo's it needs at
>>> the point of compiling an .exe or .dll.
>> Not necessarily. Any modules that are linked in but not called by
>> other modules (e.g. code only reachable from static constructors
>> and/or static destructors) may not be seen when main/DllMain is
>> compiled, if there even is one of these (see above point about
>> DllMain being optional).
>
> I don't see how static constructors and/or destructors interfere with the
> compiler detecting which TypeInfo's would be necessary, but I don't think such a
> problem would be insurmountable.
How static constructors could interfere:
---
module selfcontained;

static this() {
    // some code that requires TypeInfo not used in other modules
    // (including Phobos), perhaps for a type defined in this module.
}
---
(Change 'static this' to 'static ~this' or 'unittest' for similar problems)
If this module isn't imported (directly or indirectly) from the file defining main() the compiler can't possibly know what TypeInfo needs to be generated for it when compiling main(), simply because it doesn't parse that file when pointed at the file containing main().
Yes, this could be "fixed" by having the module containing main() import all such modules, but it shouldn't have to. We shouldn't need to work around toolchain limitations, especially if there's a way to make it Just Work(TM).
> Perhaps, it'd be a question of "Is it worth the
> effort?".
It'll be worth the effort when one of _your_ projects fails to compile because of it :P.
> But then again, I don't know much about what the compiler and linker do "under the
> hood". It's mostly a black box for me. But from reading Walter and Kris discuss
> the issues involved, I'm convinced there has to be a less haphazard way for DMD
> and optlink to interact.
Like I've mentioned earlier: I'm pretty sure this problem would go away entirely if the compiler simply generated all TypeInfo used in the module. If that generates larger intermediate object files I'm okay with that. In fact, that was how I thought it worked until I started reading about this problem...
>>> If not, even if we have to wait for linker to spit out
>>> a list of missing TypeInfo's and then generate the TypeInfo
>>> (trial-and-error), I think that would be a small price to pay for
>>> eliminating all of this bloat of unneeded module that Kris has
>>> discovered.
>> This would mean you can't "manually" link stuff together, using
>> optlink/ld/whatever directly. I don't know how many people want to
>> do this, but Walter has made it pretty clear he wants to be able to
>> use a generic linker[1] (i.e. one that doesn't require specialized
>> knowledge of D) and I agree with that.
>
> Isn't there still a question of whether anyone has found a "generic linker" for
> OMF (other than OptLink) that can work with DMD anyway?
I believe I mentioned that a bit later ;).
[snip special linker discussion]
>> Though arguably the situation with DMD/Windows is already worse when
>> it comes to that, since almost nobody else uses OMF anymore...
>
> Right. We seem to be on our own when it comes to using OMF.
Well, it seems OpenWatcom supports it. From what I've read here the linker doesn't like DMD object files though. Walter claims it's buggy. I don't know enough about OMF to say one way or the other.
> I think we're mostly trying to find a fix for the problem with the OMF files
> generated by DMD right now. Apparently, GDC doesn't have these same problems (or
> if GDC does have linker problems, Walter isn't the one responsible for fixing
> them). So I think the problem is limited to using DMD's OMF files on Windows.
> (Doesn't DMD on Linux use ELF? I think that's the case.)
Yes, DMD/Linux uses ELF. It just calls ld (through gcc) to link instead of using optlink.
I'm not sure if ld (or the mingw port of it) can use ELF to create Windows executables, but if it can that may be an option: just switch to ELF entirely and trash optlink. (this paragraph wasn't entirely serious, in case you hadn't noticed :P)
|
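Editorial sketch for readers unfamiliar with the term used throughout the thread: a TypeInfo is the runtime descriptor object for a type, and `typeid` is how D code refers to it. The snippet below is only an illustration, assuming a current D compiler with Phobos; it says nothing about which object file the compiler chooses to emit the descriptor into, which is the actual subject of the dispute.

```d
import std.stdio;

void main()
{
    // typeid(T) evaluates to the TypeInfo object for T. char[][] is the
    // type kris mentions as the one that dragged Core.obj into the link.
    TypeInfo ti = typeid(char[][]);

    // tsize is the size in bytes of one value of the described type.
    assert(ti.tsize == (char[][]).sizeof);

    // Asking again for the same type yields an equal TypeInfo.
    assert(typeid(char[][]) == ti);

    writeln(ti, " occupies ", ti.tsize, " bytes");
}
```

Any object file containing code like this needs the symbol for that TypeInfo to be defined somewhere in the final link, either in the same object file or in another one.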
February 22, 2007 Re: Lib change leads to larger executables | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | Sean Kelly wrote:
> Still, I don't entirely understand why this appears to not be an issue using Build, which has historically had bloat issues in some cases. Was it just luck, or do things actually change when objects are stored in a library as opposed to not?
Doesn't Build only link together the object files for modules that are imported at any point in the application?
The problem is that the locale.Core module isn't used by the program and is still linked in.
But AFAIK Build wouldn't add Core.obj to the link command line if the module isn't imported from any module it compiled.
The linker can't pick the wrong object file to link in if it only considers "right" ones...
So the problem here is pretty related to library usage, in particular to the fact that libraries can contain object files that aren't needed for a particular program.
|
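Editorial sketch of the mechanism described in this post and in kris's point 9: when a missing symbol is satisfied from a library, the whole member that defines it is linked, along with everything that member needs. This is a toy model, not how optlink is implemented. The `Obj` struct, the `resolve` function and the symbol strings are all hypothetical, and "first matching member wins" is an assumption made for simplicity (the thread itself disputes whether lookup is linear or hashed).

```d
import std.algorithm.searching : canFind;
import std.stdio;

// Hypothetical model of an object module: the symbols it defines and the
// symbols it still needs from elsewhere.
struct Obj
{
    string name;
    string[] defines;
    string[] needs;
}

// Returns the names of the library members pulled in to satisfy `roots`.
// Assumption: the first member defining a missing symbol is taken whole.
// Symbols that no member defines are silently skipped in this toy model.
string[] resolve(Obj[] roots, Obj[] library)
{
    bool[string] defined;
    string[] pending;
    string[] pulled;

    void add(Obj o)
    {
        foreach (s; o.defines)
            defined[s] = true;
        pending ~= o.needs;
    }

    foreach (o; roots)
        add(o);

    while (pending.length > 0)
    {
        string sym = pending[0];
        pending = pending[1 .. $];
        if (sym in defined)
            continue;
        foreach (member; library)
        {
            if (member.defines.canFind(sym))
            {
                pulled ~= member.name;
                add(member); // the member's own needs are now pending too
                break;
            }
        }
    }
    return pulled;
}

void main()
{
    // The application only needs the TypeInfo for char[][].
    auto app = Obj("app.obj", ["main"], ["TypeInfo_char[][]"]);

    // Core.obj also happens to carry that TypeInfo, and needs more itself.
    auto lib = [
        Obj("Core.obj", ["locale_tables", "TypeInfo_char[][]"], ["Formatter"]),
        Obj("Formatter.obj", ["Formatter"], null),
        Obj("Small.obj", ["TypeInfo_char[][]"], null),
    ];

    writeln(resolve([app], lib));
}
```

In this model the application ends up with Core.obj and Formatter.obj even though Small.obj alone would have been enough. Passing only the object files a program actually imports, as Build is described as doing, corresponds to a library that contains just Small.obj, in which case nothing unrelated can be picked.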
February 22, 2007 Re: Lib change leads to larger executables | ||||
---|---|---|---|---|
| ||||
Posted in reply to Frits van Bommel | Frits van Bommel wrote:
> jcc7 wrote:
>> Frits van Bommel (fvbommel@REMwOVExCAPSs.nl) wrote:
>>> jcc7 wrote:
>>>> Surely, the compiler can figure out which TypeInfo's it needs at
>>>> the point of compiling an .exe or .dll.
>>> Not necessarily. Any modules that are linked in but not called by
>>> other modules (e.g. code only reachable from static constructors
>>> and/or static destructors) may not be seen when main/DllMain is
>>> compiled, if there even is one of these (see above point about
>>> DllMain being optional).
>>
>> I don't see how static constructors and/or destructors interfere with the
>> compiler detecting which TypeInfo's would be necessary, but I don't think such a
>> problem would be insurmountable.
>
> How static constructors could interfere:
> ---
> module selfcontained;
>
> static this() {
>     // some code that requires TypeInfo not used in other modules
>     // (including Phobos), perhaps for a type defined in this module.
> }
> ---
> (Change 'static this' to 'static ~this' or 'unittest' for similar problems)
>
> If this module isn't imported (directly or indirectly) from the file defining main() the compiler can't possibly know what TypeInfo needs to be generated for it when compiling main(), simply because it doesn't parse that file when pointed at the file containing main().
Oh, I thought the .obj file included mentions of things that are needed, but not contained in a particular .obj. I thought that's why "Error 42: Symbol Undefined" will appear if I don't give the compiler enough source files. If that's not right, that would be a serious flaw in my proposal.
> Yes, this could be "fixed" by having the module containing main() import all such modules, but it shouldn't have to. We shouldn't need to work around toolchain limitations, especially if there's a way to make it Just Work(TM).
>
>> Perhaps, it'd be a question of "Is it worth the
>> effort?".
>
> It'll be worth the effort when one of _your_ projects fails to compile because of it :P.
Well, of course, my plan is contingent upon my projects successfully compiling. ;)
[snip my older thoughts]
> Like I've mentioned earlier: I'm pretty sure this problem would go away entirely if the compiler simply generated all TypeInfo used in the module. If that generates larger intermediate object files I'm okay with that. In fact, that was how I thought it worked until I started reading about this problem...
If that'd solve the problem, that'd be an improvement from the status quo. But I had the understanding that there is a problem with the linker picking the TypeInfo from an arbitrary .obj (such as a large module that isn't needed for a particular program)? I'm afraid the linker might continue to choose an inappropriate TypeInfo. Or do you plan for all of the TypeInfo's to be unique, thus probably still bloating the .exe (but in a different way)?
> [snip special linker discussion]
>>> Though arguably the situation with DMD/Windows is already worse when
>>> it comes to that, since almost nobody else uses OMF anymore...
>>
>> Right. We seem to be on our own when it comes to using OMF.
>
> Well, it seems OpenWatcom supports it. From what I've read here the linker doesn't like DMD object files though. Walter claims it's buggy. I don't know enough about OMF to say one way or the other.
Well, it doesn't really matter to me if DMD continues to use OMF if the format doesn't cause a bunch of bloat or other broken features. But I still wonder why Walter needs to stay so close to the "official" format if DMC/DMD's OMF doesn't seem to be compatible with any other compiler.
[snip my older thoughts]
> Yes, DMD/Linux uses ELF. It just calls ld (through gcc) to link instead of using optlink.
>
> I'm not sure if ld (or the mingw port of it) can use ELF to create Windows executables, but if it can that may be an option: just switch to ELF entirely and trash optlink. (this paragraph wasn't entirely serious, in case you hadn't noticed :P)
I suspect the option of ELF output would be welcomed by OMF's harshest critics. Not that I know anything about ELF.
--
jcc7
|