Lib change leads to larger executables (page 4) - D Programming Language Discussion Forum

February 21, 2007

Re: Lib change leads to larger executables

Posted by kris
in reply to Walter Bright

Walter Bright wrote:
> kris wrote:
> 
>> It also finds that particular one whether the module is listed first or last in the lib response-file.
> 
> 
> I bet that's because that module was imported (directly or indirectly) by every other module that used char[][], and so it was the only module that defines it.
> 
>> What is one supposed to do for production-quality libraries?
> 
> 
> Some strategies:
> 
> 1) minimize importing of modules that are never used
> 
> 2) for modules with a lot of code in them, import them as a .di file rather than a .d
> 
> 3) create a separate module that defines the relevant typeinfo's, and put that first in the library

1) Tango takes this very seriously ... more so than Phobos, for example.

2) That is something that could be used in certain scenarios, but is not a general or practical solution for widespread use of D.

3) Hack around an undocumented and poorly understood problem in developer-land. Great.

you might as well add:

4) have the user instantiate a pointless and magic char[][] in their own program, so that they can link with the Tango library?

None of this is gonna fly in practice, and you surely know that?

I get a subtle impression that you're being defensive about the problem rather than actively thinking about a practical solution? We're trying to help D get some traction here, yet it seems you're not particularly interested in removing some roadblocks? Or are you scheming a resolution in private?

"frustrated with D tools again"

February 21, 2007

Re: Lib change leads to larger executables

Posted by Lionello Lunesu
in reply to Walter Bright

"Walter Bright" <newshound@digitalmars.com> wrote in message news:erie2v$kad$2@digitalmars.com...
> Frits van Bommel wrote:
>> GNU ld seems to be perfectly happy working at the section level (with --gc-sections).
>
> Yeah, well, try linking D programs with --gc-sections, and you'll get a crashing executable.

Thomas has suggested some fixes for that in bugzilla #879.

L.

February 21, 2007

Re: Lib change leads to larger executables

Posted by Frits van Bommel
in reply to Walter Bright

Walter Bright wrote:
> Frits van Bommel wrote:
>> GNU ld seems to be perfectly happy working at the section level (with --gc-sections).
> 
> Yeah, well, try linking D programs with --gc-sections, and you'll get a crashing executable.

Haven't had trouble with it so far, though I seem to recall reading about some issues with exceptions. AFAIK that can be fixed by using an appropriate linker script that KEEP()s the exception info, though I haven't tried it since I haven't had that problem so far...

Is there anything else that breaks with --gc-sections?

February 21, 2007

Re: Lib change leads to larger executables

Posted by Kristian Kilpi
in reply to Walter Bright

On Wed, 21 Feb 2007 11:00:44 +0200, Walter Bright <newshound@digitalmars.com> wrote:

>> It does, but increases the exe size of the first example from 180kb to 617kb!
>  > 180kb is when compiled using build/rebuild/jake etc (no library) and the 617kb
>  > is when using dmd+lib only. Same flags in both cases: none at all
>
> Let's say you have a template instance, TI. It is declared in two modules, M1 and M2:
>
> -----------M1------------
> TI
> A
> -----------M2------------
> TI
> B
> -------------------------
>
> M1 also declares A, and M2 also declares B. Now, the linker is looking to resolve TI, and the first one it finds is one in M1, and so links in M1. Later on, it needs to resolve B, and so links in M2. The redundant TI is discarded (because it's a COMDAT).
>
> However, suppose the program never references A, and A is a chunk of code that pulls in lots of other bloat. This could make the executable much larger than if, in resolving TI, it had picked M2 instead.
>
> You can control which module containing TI will be pulled in by the linker to resolve TI, by specifying that module first to lib.exe.
>
> You can also put TI in a third module that has neither A nor B in it. When compiling M1 and M2, import that third module, so TI won't be generated for M1 or M2.

Here's a quick thought. (It's probably too impractical/absurd. ;) ) Could template instances be put into their own, separate modules? Then the linker will find a module containing the template instance only, and no bloat will be pulled in with it. I don't know if this would require the compiler to generate separate, extra .obj files for template instances or something.
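
The first-match resolution described in the quoted explanation can be sketched as a toy model, with the "template instance in its own module" idea added as a third case. This is only an illustration in Python, not OptLink's actual implementation: the `Obj` record, the `link` function, and the object names `Big` and `TIonly` are hypothetical, and real linkers track far more than symbol names.

```python
# Toy model of first-match library resolution (not OptLink's real code).
# Each object file "defines" some symbols and "references" others.
from collections import namedtuple

Obj = namedtuple("Obj", ["name", "defines", "references"])


def link(library, roots):
    """Resolve symbols by pulling in the first object that defines each one."""
    linked = []                 # object files pulled in, in order
    defined = set()             # symbols already resolved
    pending = list(roots)       # unresolved symbols, processed in FIFO order
    while pending:
        sym = pending.pop(0)
        if sym in defined:
            continue
        # First object in library order that defines the symbol wins.
        # (Raises StopIteration if nothing defines it; fine for a sketch.)
        obj = next(o for o in library if sym in o.defines)
        linked.append(obj.name)
        defined |= obj.defines  # the whole object comes in as a unit
        pending.extend(obj.references)
    return linked


m1 = Obj("M1", {"TI", "A"}, {"bloat"})   # A drags in other code
m2 = Obj("M2", {"TI", "B"}, set())
big = Obj("Big", {"bloat"}, set())       # stands in for "lots of other bloat"

# The program references TI and B, never A.
m1_first = link([m1, m2, big], ["TI", "B"])
m2_first = link([m2, m1, big], ["TI", "B"])

# Third case: TI lives only in its own module, so M1 and M2 do not define it.
ti_only = Obj("TIonly", {"TI"}, set())
m1_no_ti = Obj("M1", {"A"}, {"bloat"})
m2_no_ti = Obj("M2", {"B"}, set())
split = link([m1_no_ti, m2_no_ti, big, ti_only], ["TI", "B"])

print(m1_first, m2_first, split)
```

In this model, listing M1 first links M1, M2 and Big, while listing M2 first links only M2. With TI split out, the result no longer depends on where the TI-only module sits in the library, because it is the only definer of TI.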

February 21, 2007

Re: Lib change leads to larger executables

Posted by Pragma
in reply to Kristian Kilpi

Kristian Kilpi wrote:
> On Wed, 21 Feb 2007 11:00:44 +0200, Walter Bright <newshound@digitalmars.com> wrote:
> 
>>> It does, but increases the exe size of the first example from 180kb to 617kb!
>>  > 180kb is when compiled using build/rebuild/jake etc (no library) and the 617kb
>>  > is when using dmd+lib only. Same flags in both cases: none at all
>>
>> Let's say you have a template instance, TI. It is declared in two modules, M1 and M2:
>>
>> -----------M1------------
>> TI
>> A
>> -----------M2------------
>> TI
>> B
>> -------------------------
>>
>> M1 also declares A, and M2 also declares B. Now, the linker is looking to resolve TI, and the first one it finds is one in M1, and so links in M1. Later on, it needs to resolve B, and so links in M2. The redundant TI is discarded (because it's a COMDAT).
>>
>> However, suppose the program never references A, and A is a chunk of code that pulls in lots of other bloat. This could make the executable much larger than if, in resolving TI, it had picked M2 instead.
>>
>> You can control which module containing TI will be pulled in by the linker to resolve TI, by specifying that module first to lib.exe.
>>
>> You can also put TI in a third module that has neither A nor B in it. When compiling M1 and M2, import that third module, so TI won't be generated for M1 or M2.
> 
> Here's a quick thought. (It's probably too impractical/absurd. ;) ) Could template instances be put into their own, separate modules? Then the linker will find a module containing the template instance only, and no bloat will be pulled in with it. I don't know if this would require the compiler to generate separate, extra .obj files for template instances or something.

Nice idea, but I'd rather see the librarian (optionally?) do this job instead. It would avoid any complications for the existing toolchain by not introducing any behavior that is radically different from other platforms (i.e. "foo.d" ==> "foo.obj" and "foo-t.obj").

Now if you're talking about breaking each-and-every COMDAT out into its own .obj, then having the librarian do it is a must. I can't imagine what my workspace would look like otherwise.

Either way, all this involves the rather messy business of turning each COMDAT fixup reference within an .obj file into an EXTERN.  I doubt that the DMD/DMC backend would make this job easy (I could be wrong!), so again, putting the job elsewhere (librarian) might be easier to maintain.

-- 
- EricAnderton at yahoo

February 21, 2007

Re: Lib change leads to larger executables

Posted by kris
in reply to Pragma

Pragma wrote:
> Kristian Kilpi wrote:
> 
>> On Wed, 21 Feb 2007 11:00:44 +0200, Walter Bright <newshound@digitalmars.com> wrote:
>>
>>>> It does, but increases the exe size of the first example from 180kb to 617kb!
>>>
>>>  > 180kb is when compiled using build/rebuild/jake etc (no library) and the 617kb
>>>  > is when using dmd+lib only. Same flags in both cases: none at all
>>>
>>> Let's say you have a template instance, TI. It is declared in two modules, M1 and M2:
>>>
>>> -----------M1------------
>>> TI
>>> A
>>> -----------M2------------
>>> TI
>>> B
>>> -------------------------
>>>
>>> M1 also declares A, and M2 also declares B. Now, the linker is looking to resolve TI, and the first one it finds is one in M1, and so links in M1. Later on, it needs to resolve B, and so links in M2. The redundant TI is discarded (because it's a COMDAT).
>>>
>>> However, suppose the program never references A, and A is a chunk of code that pulls in lots of other bloat. This could make the executable much larger than if, in resolving TI, it had picked M2 instead.
>>>
>>> You can control which module containing TI will be pulled in by the linker to resolve TI, by specifying that module first to lib.exe.
>>>
>>> You can also put TI in a third module that has neither A nor B in it. When compiling M1 and M2, import that third module, so TI won't be generated for M1 or M2.
>>
>>
>> Here's a quick thought. (It's probably too impractical/absurd. ;) ) Could template instances be put into their own, separate modules? Then the linker will find a module containing the template instance only, and no bloat will be pulled in with it. I don't know if this would require the compiler to generate separate, extra .obj files for template instances or something.
> 
> 
> Nice idea, but I'd rather see the librarian (optionally?) do this job instead. It would avoid any complications for the existing toolchain by not introducing any behavior that is radically different from other platforms (i.e. "foo.d" ==> "foo.obj" and "foo-t.obj").
> 
> Now if you're talking about breaking each-and-every COMDAT out into its own .obj, then having the librarian do it is a must. I can't imagine what my workspace would look like otherwise.
> 
> Either way, all this involves the rather messy business of turning each COMDAT fixup reference within an .obj file into an EXTERN.  I doubt that the DMD/DMC backend would make this job easy (I could be wrong!), so again, putting the job elsewhere (librarian) might be easier to maintain.
> 


Just to clarify the current situation: the ballooned exe file has nothing to do with templates. There are no templates involved in that particular issue, and it appears the prior template demons have been driven under the bridge for the interim. There is some progress here, but it led to the uncovering of another problem ;)

February 21, 2007

Re: Lib change leads to larger executables

Posted by Derek Parnell
in reply to Walter Bright

Walter,
do we (the developer community) have a problem here?

If yes, will you be actively trying to find a satisfactory resolution in the near future?

On Wed, 21 Feb 2007 13:22:09 -0800, Walter Bright wrote:

> Frits van Bommel wrote:
>> kris wrote:
>>> Isn't there some way to isolate the typeinfo such that only a segment is linked, rather than the entire "hosting" module (the one that just happened to be found first in the lib) ?
> 
> No, the linker deals with .obj files as a unit.

This has been pointed out a few times now; if any single item in an .OBJ file is referenced in the program, the whole .OBJ file is linked into the executable.

This implies that in order to make small executable files, we need to ensure that .OBJ files are as atomic as possible and to minimize references to other modules. Yes, these goals conflict with each other, so a compromise must be made somehow.

A better link editor would be able to only link in the portions of the .OBJ
file that are needed, but until someone writes a replacement for OptLink,
we are pretty well stuck with Walter's approach.
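
To make the difference concrete, here is a rough sketch comparing the two granularities. It is a toy model in Python with invented symbol names, sizes, and object layout (`SYMBOLS`, `OBJECTS`, and both functions are hypothetical); it does not describe how OptLink or any other real linker is implemented.

```python
# Toy comparison: object-level vs symbol-level linking (illustrative sizes).
# symbol name -> (size in KB, names of the symbols it references)
SYMBOLS = {
    "main":      (1,   ["typeinfo"]),
    "typeinfo":  (1,   []),
    "formatter": (40,  ["locale"]),
    "locale":    (300, []),
}
# Hypothetical .obj files and the symbols each one contains.
OBJECTS = {
    "app.obj":    ["main"],
    "core.obj":   ["typeinfo", "formatter"],
    "locale.obj": ["locale"],
}


def reachable_symbols(root):
    """Symbols reachable from root by following references only."""
    seen, stack = set(), [root]
    while stack:
        sym = stack.pop()
        if sym not in seen:
            seen.add(sym)
            stack.extend(SYMBOLS[sym][1])
    return seen


def object_level(root):
    """Symbols linked when any referenced symbol drags in its whole object."""
    owner = {s: o for o, syms in OBJECTS.items() for s in syms}
    seen_objs, linked, stack = set(), set(), [root]
    while stack:
        obj = owner[stack.pop()]
        if obj in seen_objs:
            continue
        seen_objs.add(obj)
        for s in OBJECTS[obj]:          # every symbol in the object comes in
            linked.add(s)
            stack.extend(SYMBOLS[s][1])
    return linked


def total_kb(symbols):
    """Sum the sizes of the given symbols."""
    return sum(SYMBOLS[s][0] for s in symbols)


print(total_kb(reachable_symbols("main")), total_kb(object_level("main")))
```

Here `main` only needs `typeinfo`, but because `typeinfo` shares `core.obj` with `formatter`, object-level linking also brings in `formatter` and, through it, `locale.obj`.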

>> The obvious solution would be to always generate typeinfo even if it can be determined imported modules will already supply it. The current approach seems to confuse the linker, causing it to link in unrelated objects that happen to supply the symbol even though the compiler "meant" for another object file to supply it.
> 
> I wish to be precise - there is no "seems" or "confuse" with linking. It simply follows the algorithm I outlined previously: have an unresolved symbol, find the first .obj module in the library which resolves it. It does this in a loop until there are no further unresolved symbols.

Walter, I know that you are not going to change OptLink, so this next question is purely theoretical ... instead of finding the 'first' object file that resolves it, is there a better algorithm ... maybe the smallest object file that resolves it, or ... I don't know ... but it might be worth thinking about.

> Most of the complexity in a linker stems from:
> 
> 1) trying to make it fast

How fast is fast enough?

> 2) the over-complicated .obj file format

Can we improve the OBJ file format?

> Conceptually, it is a very simple program.

And that might be a part of the problem.

>> Yes, that will "bloat" object files, but the current approach apparently bloats applications. Care to guess which are distributed most often? ;)
> 
> TypeInfo's are only going to grow, and this could create gigantic obj files.

So, have we got a problem or not?

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
22/02/2007 10:01:57 AM

February 22, 2007

Re: Lib change leads to larger executables

Posted by Walter Bright
in reply to Derek Parnell

Derek Parnell wrote:
> If yes, will you be actively trying to find a satisfactory resolution in
> the near future?

I posted some suggestions to Kris.

> This has been pointed out a few times now; if any single item in an .OBJ
> file is referenced in the program, the whole .OBJ file is linked into the
> executable.

That's right. COMDATs make things slightly more complicated, as unreferenced COMDATs get discarded by the linker.

> This implies that in order to make small executable files, we need to
> ensure that .OBJ files are as atomic as possible and to minimize references
> to other modules. Yes, these goals conflict with each other, so a
> compromise must be made somehow.
> 
> A better link editor would be able to only link in the portions of the .OBJ
> file that are needed, but until someone writes a replacement for OptLink,
> we are pretty well stuck with Walter's approach.

It's important to work with existing tools (like linkers and librarians), which (among other things) helps ensure that D programs can link with the output of other compilers (like gcc).

> Walter, I know that you are not going to change OptLink, so this next
> question is purely theoretical ... instead of finding the 'first' object
> file that resolves it, is there a better algorithm ... maybe the smallest
> object file that resolves it, or ... I don't know ... but it might be worth
> thinking about.

The 'smallest' doesn't do what you ask, either, because even the smallest obj file could contain a reference to something big.
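
A small sketch of that point, using made-up numbers: the object with the smallest own size is not necessarily the one with the smallest total cost once its references are followed. The `LIBRARY` table and `closure_kb` function are hypothetical Python stand-ins, not anything a real linker exposes.

```python
# Toy model: why "pick the smallest object" can still bloat the link.
# name -> (own size in KB, symbols defined, symbols referenced); all made up.
LIBRARY = {
    "small.obj":  (2,   {"TI"},   {"huge"}),
    "medium.obj": (20,  {"TI"},   set()),
    "huge.obj":   (500, {"huge"}, set()),
}


def closure_kb(start):
    """Total size of start plus every object its references pull in."""
    linked, pending = {start}, list(LIBRARY[start][2])
    while pending:
        sym = pending.pop()
        # Take the first object (in dict order) that defines the symbol.
        obj = next(o for o, (_, defs, _refs) in LIBRARY.items() if sym in defs)
        if obj not in linked:
            linked.add(obj)
            pending.extend(LIBRARY[obj][2])
    return sum(LIBRARY[o][0] for o in linked)


candidates = [o for o, (_, defs, _refs) in LIBRARY.items() if "TI" in defs]
smallest = min(candidates, key=lambda o: LIBRARY[o][0])   # smallest own size
cheapest = min(candidates, key=closure_kb)                # smallest total cost

print(smallest, closure_kb(smallest))
print(cheapest, closure_kb(cheapest))
```

Choosing by total cost would mean evaluating the full dependency closure of every candidate for every unresolved symbol, which is considerably more work than taking the first match.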

>> Most of the complexity in a linker stems from:
>> 1) trying to make it fast
> How fast is fast enough?

It's never fast enough. I know a fellow who made his fortune just writing a faster linker than MS-LINK. (You can guess the name of that linker!) Borland based their whole company's existence on fast compile-link times. Currently, ld is pig slow; it's a big bottleneck on the edit-compile-link-debug cycle on Linux.

>> 2) the over-complicated .obj file format
> Can we improve the OBJ file format?

Only if we want to write a replacement for every tool out there that manipulates object files, and if we want to give up linking with the output of C compilers (or any other compilers).

>> Conceptually, it is a very simple program.
> And that might be a part of the problem.

Might be, but there also shouldn't be any confusion or mystery about what it's doing. Understanding how it works makes it possible to build a professional quality library. You can't really escape understanding it - and there's no reason to, it *is* a simple program.

>>> Yes, that will "bloat" object files, but the current approach apparently bloats applications. Care to guess which are distributed most often? ;)
>> TypeInfo's are only going to grow, and this could create gigantic obj files.
> 
> So, have we got a problem or not?

Given limited resources, we have to deal with what we have.

February 22, 2007

Re: Lib change leads to larger executables

Posted by Walter Bright
in reply to kris

kris wrote:
> Walter Bright wrote:
>> Some strategies:
>>
>> 1) minimize importing of modules that are never used
>>
>> 2) for modules with a lot of code in them, import them as a .di file rather than a .d
>>
>> 3) create a separate module that defines the relevant typeinfo's, and put that first in the library
> 
> 
> 1) Tango takes this very seriously ... more so than Phobos, for example.

Sure, but in this particular case, it seems that "core" is being imported without referencing code in it. The only reason the compiler doesn't generate the char[][] TypeInfo is because an import defines it. The compiler does work on the assumption that if a module is imported, then it will also be linked in.
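
As a rough illustration of that assumption, the sketch below models which TypeInfo symbols a module would leave for the linker to find elsewhere. It is a simplified Python model of the rule as stated in this post, not DMD's actual logic; the function name, its parameters, and the `defined_by` table are hypothetical.

```python
# Toy model of the emission rule described above (not DMD's actual logic).
def unresolved_typeinfo(used_types, imports, defined_by):
    """TypeInfo symbols a module leaves for the linker to find elsewhere.

    used_types: types the module itself uses
    imports:    names of the modules it imports
    defined_by: module name -> set of types whose TypeInfo it defines
    """
    left_to_linker = set()
    for t in used_types:
        # If an import already defines the TypeInfo, assume that import
        # will be linked in and do not emit a duplicate locally.
        if any(t in defined_by.get(m, set()) for m in imports):
            left_to_linker.add(t)
    return left_to_linker


defined_by = {"core": {"char[][]"}}
with_import = unresolved_typeinfo({"char[][]"}, ["core"], defined_by)
without_import = unresolved_typeinfo({"char[][]"}, [], defined_by)
print(with_import, without_import)
```

With the import, the char[][] TypeInfo becomes an external reference, so the linker must pull in whichever library object it finds first that defines it. Without the import, the module carries its own copy and nothing extra is needed.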

> 2) That is something that could be used in certain scenarios, but is not a general or practical solution for widespread use of D.

The compiler can automatically generate .di files. You're probably going to want to do that anyway as part of polishing the library - it speeds compilation times, aids proper encapsulation, etc. That's why the gc does it, and I've been meaning to do it for other bulky libraries like std.regexp.

I wish to point out that the current scheme does *work*, it generates working executables. In the age of demand paged executable loading (which both Linux and Windows do), unused code in the executable never even gets loaded into memory. The downside to size is really in shipping code over a network (also in embedded systems).

So I disagree with your characterization of it as impractical.

For professional libraries, it is not unreasonable to expect some extra effort in tuning the libraries to minimize dependency. This is a normal process, it's been going on at least for the 25 years I've been doing it. Standard C runtime libraries, for example, have been *extensively* tweaked and tuned in this manner, and that's just boring old C. They are not just big lumps of code.

> 3) Hack around an undocumented and poorly understood problem in developer-land. Great.

I think you understand the problem now, and the solution. Every developer of professional libraries should understand this problem, it crops up with most every language. If a developer doesn't understand it, one winds up with something like Java where even the simplest hello world winds up pulling in the entire Java runtime library, because dependencies were not engineered properly.

> you might as well add:
> 
> 4) have the user instantiate a pointless and magic char[][] in their own program, so that they can link with the Tango library?

I wouldn't add it, as I would expect the library developer to take care of such things by adding them to the Tango library as part of the routine process of optimizing executable size by minimizing dependencies.

> None of this is gonna fly in practice, and you surely know that?

For features like runtime type identification that are generated by the compiler (instead of explicitly by the programmer), the dependencies they generate are a fact of life.

Optimizing the size of a generated program is a routine programming task. It isn't something new with D. I've been doing this for 25 years.

> I get a subtle impression that you're being defensive about the problem rather than actively thinking about a practical solution? We're trying to help D get some traction here, yet it seems you're not particularly interested in removing some roadblocks? Or are you scheming a resolution in private?

If you have any ideas that don't involve reinventing obj file formats or that don't preclude using standard linkers, please let me know.

February 22, 2007

Re: Lib change leads to larger executables

Posted by kris
in reply to Walter Bright

Walter Bright wrote:

> 3) create a separate module that defines the relevant typeinfo's, and put that first in the library

Just to satisfy your stance I tried this; guess what? It has no effect whatsoever, since you /cannot/ dictate the order in which the decls will be inspected in advance.

I hope this captures the sheer absurdity of trying to "outwit" the librarian/linker in the first place?

Copyright © 1999-2021 by the D Language Foundation