Striping unused code (page 2) - D Programming Language Discussion Forum


November 22, 2005

Re: Striping unused code

Posted by Georg Wrede
in reply to James Dunne

Hmm, now I know why some people hate top-posting. If I comment on both James's and John's stuff here, my comments seem inconsistent because of the "wrong" order I have to write them in. (I'll try to shift my thoughts around some, to make it less obvious. But it's more work for me. ;-( )

James Dunne wrote:
> Perhaps when linking an object file with a static library, it might
> be more difficult to relink the static library and eliminate parts of
> it than to treat the static library as one big linking unit and link
> it in as a whole.

The linker would then have to have knowledge of possible internal dependencies between functions in the library. And it would have to have knowledge of which functions will be called from the linked program.

It's not impossible, but I guess it would be much easier to write a linker if the library contained hints on which functions in the library call others.
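To make the idea concrete, here is a minimal sketch in Python of what such hints would allow. It is only a model: the call graph `calls`, the function `reachable`, and the helper name `_fmt_int` are all made up for illustration and are not taken from any real linker or from Phobos. Starting from what the program calls, the linker keeps everything that is transitively reachable and drops the rest.

```python
from collections import deque

def reachable(calls, roots):
    """Return the set of functions reachable from `roots` in the call graph `calls`."""
    keep = set()
    queue = deque(roots)
    while queue:
        fn = queue.popleft()
        if fn in keep:
            continue
        keep.add(fn)
        # Follow the "hints": everything this function calls must be kept too.
        queue.extend(calls.get(fn, ()))
    return keep

# Hypothetical hints stored alongside the library: caller -> callees.
calls = {
    "main": ["func1"],
    "func1": ["printf"],
    "func2": ["printf"],
    "printf": ["_fmt_int"],
    "_fmt_int": [],
}

kept = reachable(calls, ["main"])
dropped = set(calls) - kept
print("kept:   ", sorted(kept))
print("dropped:", sorted(dropped))
```

With these made-up hints only `func2` ends up unreachable, which is exactly the function the original poster wanted gone.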

> I'm no expert in linkers by any means, but I can't think of why this couldn't or shouldn't be done.
> 
> John Demme wrote:
> 
>> No, it's not possible to do this during compilation. It is the compiler's job to take all of the methods/functions and put them in
>> the object file. The compiler doesn't necessarily know what symbols
>> are going to be used and what aren't.
>>
>> This is a task for the linker- one that I'm surprised it doesn't do
>>  already by default. There's probably a reason for this, but I
>> don't know what it is... anyone?
>> ~John Demme

See above.

If the linker had the knowledge I described above, then it would be easy, because the linker already knows which functions the main executable calls.

>> Tiago Gasiba wrote:
>>>
>>>  Is it possible to strip unused code from an executable, during
>>>  compilation? In gcc it was possible to include something like:
>>>  -ffunction-sections -fdata-sections -Wl,--gc-sections Look at the
>>>  following example:

Old Turbo Pascal compilers (maybe newer too, but I haven't used them for some time) used to throw out unused code from the program. It analyzed what parts of code were unused, and skipped them before compilation.

So, if you had the equivalent of the following D code,

if(1==0)
{
    // lots of code
}
else
{
    // some other code
}

then the "lots of code" part would be entirely skipped. And interestingly those compilers were still the fastest Pascal compilers around!
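As a rough illustration of that kind of pruning (not how Turbo Pascal actually did it, which I don't know), here is a small Python sketch that works on Python source rather than Pascal or D. The class name `DeadBranchPruner` and the two function names in the sample are invented for the example. It folds `if` tests made purely of literals and keeps only the branch that can run.

```python
import ast

# Node types allowed in a test we are willing to evaluate ahead of time:
# literals, comparisons, and/or/not. Anything with a name or a call is left alone.
FOLDABLE = (ast.Constant, ast.Compare, ast.BoolOp, ast.UnaryOp,
            ast.cmpop, ast.boolop, ast.unaryop)

class DeadBranchPruner(ast.NodeTransformer):
    """Replace `if` statements whose test is a constant with the live branch."""

    def visit_If(self, node):
        self.generic_visit(node)  # prune nested ifs first
        if not all(isinstance(n, FOLDABLE) for n in ast.walk(node.test)):
            return node  # test depends on runtime values, keep as is
        value = eval(compile(ast.Expression(body=node.test), "<const-test>", "eval"))
        live = node.body if value else node.orelse
        # If the live branch is empty (no else), leave a `pass` behind.
        return live or [ast.Pass()]

source = """
if 1 == 0:
    lots_of_code()
else:
    some_other_code()
"""

tree = DeadBranchPruner().visit(ast.parse(source))
pruned = ast.unparse(tree)  # needs Python 3.9 or newer
print(pruned)
```

The dead branch never reaches the later stages at all, so it costs nothing in compile time or in the output.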

I don't remember whether code in linked libraries was skipped too. But HelloWorld was quite small!!!

Oh, times change. In the Old Days, one used to respect a customer's disk space. And today it's not just executables, it's a lot of other crap. Acrobat Reader, Real Player, they're the worst. 90% of the crap has nothing to do with what they're for -- it's there for marketing and other not-asked-for things.

Not that FSF is any better. Even in a Linux distro where source is not specifically included, virtually every file they've ever laid their hands on, however small, contains a kilobyte of their copyright declarations. That adds up to appalling numbers real quickly.

>>> --- lib.d ---
>>> module lib;
>>>
>>> import std.c.stdio;
>>>
>>> void func1( int x ){
>>>  printf("%d\n",x);
>>> }
>>>
>>> int func2( int x ){
>>>  printf("%d\n",x);
>>>  return x;
>>> }
>>>
>>>
>>> --- test.d ---
>>> import std.c.stdio;
>>> import lib;
>>>
>>> int main(){
>>>  func1(2);
>>> }
>>> -----------------
>>> Compilation:
>>>
>>> dmd test.d lib.d   - generates a 292651 byte file!!! (why so big?)
>>>
>>> -> func2() is NOT used, but is included into the code:
>>> nm test |grep func2
>>> 0804bb2c T _D3lib5func2FiZi
>>>
>>> After stripping the code, I still obtain a 157444 byte file!!! (still
>>> large...)

I just tried the following on (DMD .139) Linux:

-----libtest1.d:
import std.c.stdio;
void main(){}
-----libtest2.d:
void main(){}
-----libtest3.d:
int main() {return 0;}

And got the following sizes:
libtest1 291704
libtest2 291704
libtest3 291704

This is actually a lot more than I got about a year ago. Hmm.

>>> Question: How can I remove unused code, such that my executable
>>> does not grow too much in size? (I'm almost sure that inside the
>>> executable there are still many internal and non-used functions
>>> also)

Haven't tried it with D, but at least Turbo Pascal manuals contained explanations on what you should do if you know that there are library functions that you know you don't want to use. There was a nice Librarian with which you could very easily trim a library to your own needs. Or even make one with some of theirs and some of your own stuff.

On Linux, try "man strip". There are many options for discarding different things from the executable. And after that you could try the executable compression somebody mentioned on this thread.

But still, it would be cool to have a compiler + linker that aggressively skip everything useless!

November 22, 2005

Re: Striping unused code

Posted by Tiago Gasiba
in reply to Georg Wrede

Georg Wrede wrote:
> 
> Haven't tried it with D, but at least Turbo Pascal manuals contained explanations on what you should do if you know that there are library functions that you know you don't want to use. There was a nice Librarian with which you could very easily trim a library to your own needs. Or even make one with some of theirs and some of your own stuff.

Well, but that's a problem! For me, the concept of a library is something where you can "throw" a lot of code and it gets stored as "in a library".
You just need to call it (import) by its name to get access to its functions.
Now, imagine you are using a single function from a huge library, and you didn't even write that library.
It's extremely difficult to keep track of each individual function.
If you only use one function from that library, you expect that all the other "trash" evaporates after linkage!
Therefore, it is a must that the compiler and/or linker can throw away unused code!
I cannot imagine putting each function in a separate library and importing each and every library individually, although this would certainly strip unused code :)

> On Linux, try "man strip". There are many options for discarding different things from the executable. And after that you could try the executable compression somebody mentioned on this thread.
Yes, but there are two problems with this approach!
First, "strip" is used to discard symbols, not code (sections), i.e. stuff that might be used by a debugger, another linker, etc.
Second, compression software does make the file smaller but is "fooling" the user, in the sense that the uncompressed file still has all that trash inside!

There are many good reasons to cut unused code, and to remove symbolic information as well, for example HD space.


> But still, it would be cool to have a compiler + linker that aggressively skip everything useless!
I would say it's not only cool but absolutely necessary!

Best,
Tiago

--
Tiago Gasiba (M.Sc.) - http://www.gasiba.de
Everything should be made as simple as possible, but not simpler.

November 22, 2005

Re: Striping unused code

Posted by Derek Parnell
in reply to Tiago Gasiba

On Mon, 21 Nov 2005 17:26:50 +0100, Tiago Gasiba wrote:

> Hi all,
> 
>   Is it possible to strip unused code from an executable, during compilation?

If you are using DigitalMars linker, OptLink, as called by dmd, then you might get some savings by using the /PACKFUNCTIONS switch on the linker.

e.g.

   dmd test.d -L/PACKFUNCTIONS

-- 
Derek Parnell
Melbourne, Australia
22/11/2005 9:39:57 PM

November 22, 2005

Re: Striping unused code

Posted by Jari-Matti Mäkelä
in reply to Georg Wrede

Georg Wrede wrote:
> Old Turbo Pascal compilers (maybe newer too, but I haven't used them for some time) used to throw out unused code from the program. It analyzed what parts of code were unused, and skipped them before compilation.
> 
> So, if you had the equivalent of the following D code,
> 
> if(1==0)
> {
>     // lots of code
> }
> else
> {
>     // some other code
> }
> 
> then the "lots of code" part would be entirely skipped. And interestingly those compilers were still the fastest Pascal compilers around!

I agree. Some compiler black magic needed here ;)

> Oh, times change. In the Old Days, one used to respect a customer's disk space. And today it's not just executables, it's a lot of other crap. Acrobat Reader, Real Player, they're the worst. 90% of the crap has nothing to do with what they're for -- it's there for marketing and other not-asked-for things.
> 
> Not that FSF is any better. Even in a Linux distro where source is not specifically included, virtually every file they've ever laid their hands on, however small, contains a kilobyte of their copyright declarations. That adds up to appalling numbers real quickly.

Quite a lot of FSF & other old code written in C is total bull***t. First there is the RCS header, then some copyright stuff, then some unnecessary includes and finally some undocumented IOCCC stuff.

> I just tried the following on (DMD .139) Linux:
> 
> -----libtest1.d:
> import std.c.stdio;
> void main(){}
> -----libtest2.d:
> void main(){}
> -----libtest3.d:
> int main() {return 0;}
> 
> And got the following sizes:
> libtest1 291704
> libtest2 291704
> libtest3 291704

I got 295105 (DMD .139 + Linux) -> 157396 (strip -s) -> 56778 (upx)! Although this is not much, I'm afraid that bigger programs will get dangerously bloated. I've only done some small stuff with DUI and the resulting binary was already 2.5 MB in size!

November 22, 2005

Re: Striping unused code

Posted by Sean Kelly
in reply to MicroWizard

MicroWizard wrote:
> Phobos is huge it is clear, but there are still problems with Ares.
> 
> Under win32 the library make does not work:
> - references to \bin\dmd\ etc. which is hardcoded and not the "DM standard"

The Ares makefiles are styled after the Phobos makefiles (as I'm not a makefile expert) and they do contain hard path references, but these are at the top of each makefile.  Changing them should take all of 10 seconds.

> - namespace conflicts (maybe these are burden in DMD deeply AFAIK)

Please explain.

> I have tried to link the precompiled ares library (phobos.lib)
> to one of my live D projects and some modules were missing.
> Ex. The Error class, some ToString, ToInt, etc.

Yup.  Ares is still in its infancy and it makes no attempt to duplicate what's in Phobos... even down to that level.  It's really just a minimal D runtime library at the moment, separated into three sub-libraries for ease of maintenance.

> I know all of these small can be patched easily but not all of us
> are "system programmer"/hacker.
> 
> Ares would be very nice. Some minor changes are only needed.
> I am looking forward to see newer versions.

This has been an extremely busy fall for me, but I should have more time in a few weeks :-)

Sean

November 23, 2005

Re: Striping unused code

Posted by Don Clugston
in reply to Tiago Gasiba

Tiago Gasiba wrote:
> Georg Wrede wrote:
> 
>>Haven't tried it with D, but at least Turbo Pascal manuals contained
>>explanations on what you should do if you know that there are library
>>functions that you know you don't want to use. There was a nice
>>Librarian with which you could very easily trim a library to your own
>>needs. Or even make one with some of theirs and some of your own stuff.
> 
> Well, but that's a problem! For me, the concept of a library is something where you can "throw" a lot of code and it gets stored as "in a library".
> You just need to call it (import) by its name to get access to its functions.
> Now, imagine you are using a single function from a huge library, and you didn't even write that library.
> It's extremely difficult to keep track of each individual function.
> If you only use one function from that library you expect that all the other "trash" evaporates after linkage!
> Therefore, it is a must that the compiler and/or linker can throw away unused code!
> I cannot imagine putting each function in a separate library and importing each and every library individually, although this would certainly strip unused code :)

I agree entirely. If everything gets linked in, IT IS NOT A LIBRARY.
It's just a massive .obj file which has a .lib extension!
It's like needing to bring a semitrailer with you when you want to borrow a library book, because all the books in your local library are stapled together...

The way it used to work was that every .cpp file put in the library went into its own section. Using one function from it pulled in the section that it was in (i.e., pulled in the original .obj file). Now, it could be that all D modules end up in a single section in the library. As a quick-n-dirty hack to get the language going, that's OK -- but in the longer term, it's something that needs to be fixed.
Imagine if Phobos grew to be as extensive as the .NET libraries. A 'hello world' app could be 25 MB in size! Bingo, one of the selling points of D is gone.

November 23, 2005

Re: Striping unused code

Posted by Tiago Gasiba
in reply to Don Clugston

Don Clugston wrote:
> 
> The way it used to work, was every .cpp file put in the library went into its own section. Using one function from it pulled in the section that it was in (ie, pulled in the original .obj file). Now, it could be that all D modules end up in a single section in the library.

I'm even going one step further and saying that putting every function in an individual section would be advantageous.
The example that I first posted contained a single library with two functions.
What happens (I think) is that DMD puts both functions in the same section (we only have one library/module).
In the main file, only function 1 is used, not function 2, but since they are both in the same section, function 2 gets into the executable as well.
The "-ffunction-sections -fdata-sections" GCC flags tell the compiler to do exactly that, put every function (and data item) in its own section, and "-Wl,--gc-sections" tells the linker to drop the sections nothing refers to.
The linker can, therefore, discard those sections (i.e. individual functions) that are not used even if they come from the same module.
Note that with GDC this feature is present, but not with DMD!
I would propose (in the future) adding support to DMD to allow separating every function into its own section.
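The difference between the two layouts can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the section names, the byte sizes, and the helpers `gc_sections` and `linked_size` are invented and do not describe what DMD or GDC really emit. The point is only that the linker discards whole sections, so the finer the sections, the more it can throw away.

```python
def gc_sections(sections, entry_symbol):
    """Return the names of the sections reachable from the one defining `entry_symbol`."""
    # Map each symbol to the section that defines it.
    owner = {sym: name for name, sec in sections.items() for sym in sec["defines"]}
    keep, stack = set(), [owner[entry_symbol]]
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        # A kept section drags in the section of every symbol it references.
        stack.extend(owner[ref] for ref in sections[name]["refs"] if ref in owner)
    return keep

def linked_size(sections, entry_symbol):
    """Total size of the sections that survive garbage collection."""
    return sum(sections[name]["size"] for name in gc_sections(sections, entry_symbol))

# One section per module: func1 and func2 are inseparable. Sizes are invented.
per_module = {
    ".text.test": {"defines": {"main"}, "refs": {"func1"}, "size": 40},
    ".text.lib": {"defines": {"func1", "func2"}, "refs": set(), "size": 120},
}

# One section per function: func2 can be dropped on its own.
per_function = {
    ".text.main": {"defines": {"main"}, "refs": {"func1"}, "size": 40},
    ".text.func1": {"defines": {"func1"}, "refs": set(), "size": 50},
    ".text.func2": {"defines": {"func2"}, "refs": set(), "size": 70},
}

print("per module:  ", linked_size(per_module, "main"))
print("per function:", linked_size(per_function, "main"))
```

In the per-module layout `func2` survives simply because it shares a section with `func1`; in the per-function layout its section is never referenced and is left out.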

Tiago

--
Tiago Gasiba (M.Sc.) - http://www.gasiba.de
Everything should be made as simple as possible, but not simpler.

November 23, 2005

Re: Striping unused code

Posted by Georg Wrede
in reply to Don Clugston

Don Clugston wrote:
> Tiago Gasiba wrote:
>> Georg Wrede wrote:
>> 
>>> Haven't tried it with D, but at least Turbo Pascal manuals
>>> contained explanations on what you should do if you know that
>>> there are library functions that you know you don't want to use.
>>> There was a nice Librarian with which you could very easily trim
>>> a library to your own needs. Or even make one with some of theirs
>>> and some of your own stuff.
>> 
>> Well, but that's a problem! For me, the concept of a library is something where you can "throw" a lot of code and it gets stored as
>>  "in a library". You just need to call it (import) by its name to
>> get access to its functions. Now, imagine you are using a single
>> function from a huge library, and you didn't even write that
>> library. It's extremely difficult to keep track of each individual
>> function. If you only use one function from that library you expect
>> that all the other "trash" evaporates after linkage! Therefore, it
>> is a must that the compiler and/or linker can throw away unused
>> code! I cannot imagine putting each function in a separate
>> library and importing each and every library individually, although
>> this would certainly strip unused code :)
> 
> I agree entirely. If everything gets linked in, IT IS NOT A LIBRARY. It's just a massive .obj file which has a .lib extension! It's like
> needing to bring a semitrailer with you when you want to borrow a
> library book, because all the books in your local library are stapled
> together...
> 
> The way it used to work, was every .cpp file put in the library went
>  into its own section. Using one function from it pulled in the
> section that it was in (ie, pulled in the original .obj file). Now,
> it could be that all D modules end up in a single section in the
> library. As a quick-n-dirty hack to get the language going, that's OK
> -- but in the longer term, it's something that needs to be fixed. Imagine if Phobos grew to be as extensive as the .NET libraries. A 'hello world' app could be 25 MB in size! Bingo, one of the selling points of D is gone.

I agree with you both! (I just offered a workaround for now.)

BTW, as another temporary workaround, somebody recently posted a shell script that I think could be modified to create the smallest library containing just the stuff a specific program needs.

So the script would try to compile, then read the error messages, copy all the mentioned library routines into a temporary library which then would be linked in.

Granted, slow and kludgy, but a possibility.
And automatic. :-)
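Roughly, the loop such a script would run could look like the Python sketch below. I don't have the posted script at hand, so this is not it: `build_minimal_library`, `fake_link`, and the dependency table `needs` are all hypothetical, and the "link step" is a stand-in function instead of a real compiler run whose error output gets parsed.

```python
def build_minimal_library(full_library, try_link):
    """Grow a temporary library until the link step reports no missing symbols.

    `full_library` maps routine name -> object code (opaque here).
    `try_link(temp_library)` returns the symbols the linker complained about.
    """
    temp_library = {}
    while True:
        missing = try_link(temp_library)
        if not missing:
            return temp_library
        for symbol in missing:
            if symbol not in full_library:
                raise LookupError(f"{symbol} is not in the full library")
            # Copy only the routines the error messages mentioned.
            temp_library[symbol] = full_library[symbol]

# Invented dependency table: what the program and each routine refer to.
needs = {"program": ["func1"], "func1": ["helper"], "helper": [], "func2": []}

def fake_link(temp_library):
    """Stand-in for 'compile, then read the undefined-symbol errors'."""
    wanted = set(needs["program"])
    for routine in temp_library:
        wanted.update(needs[routine])
    return sorted(wanted - set(temp_library))

full = {name: f"<code of {name}>" for name in needs if name != "program"}
minimal = build_minimal_library(full, fake_link)
print(sorted(minimal))
```

Each pass costs a full failed link, which is where the "slow and kludgy" comes from, but `func2` never makes it into the temporary library.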

