Program size, linking matter, and static this() (page 9) - D Programming Language Discussion Forum

December 19, 2011

Re: Program size, linking matter, and static this()

Posted by Steven Schveighoffer
in reply to Marco Leise

On Sun, 18 Dec 2011 18:02:10 -0500, Marco Leise <Marco.Leise@gmx.de> wrote:

> On 16.12.2011 at 23:08, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
>
>> Note that on Linux today, the executable is not truly static -- OS libs are dynamically linked.
>
> That should hold true for any OS. Otherwise, how would the program communicate with the kernel and drivers, i.e. render a button on the screen? Some dynamically linked in functions must provide the interface to that "administrative singleton" that manages system resources.

Not necessarily.  On Linux, system calls provide the "interface" between the code and the OS.  A system call is essentially a software interrupt (a trap into the kernel) with a fixed calling convention, similar to a network protocol.  You don't need dynamic linking to use it.

Remember, Linux didn't even support dynamic libraries before kernel 1.2 maybe?  Hm... must check wikipedia...

But my point is, if the intention is that you have a myriad of D based libraries or executables on your system, then druntime and phobos enter the same realm as glibc.

-Steve

December 19, 2011

Re: Program size, linking matter, and static this()

Posted by Steven Schveighoffer
in reply to torhu

On Fri, 16 Dec 2011 17:30:44 -0500, torhu <no@spam.invalid> wrote:

> On 16.12.2011 22:28, Steven Schveighoffer wrote:
>> In short, dlls will solve the problem, let's work on that instead of
>> shuffling around code.
>
> How exactly do they solve the problem?  An exe plus a DLL version of the library will usually be larger than just a statically linked exe.

The DLL is loaded into memory once.  With static linking, it's loaded every time you run an exe.

-Steve

December 19, 2011

Re: Program size, linking matter, and static this()

Posted by Steven Schveighoffer
in reply to Walter Bright

On Fri, 16 Dec 2011 17:55:47 -0500, Walter Bright <newshound2@digitalmars.com> wrote:

> On 12/16/2011 1:45 PM, Andrei Alexandrescu wrote:
>> On 12/16/11 3:38 PM, Trass3r wrote:
>>> A related issue is phobos being an intermodule dependency monster.
>>> A simple hello world pulls in almost 30 modules!
>>> And std.stdio is supposed to be just a simple wrapper around C FILE.
>>
>> In fact it doesn't (after yesterday's commit). The std code in hello, world is a
>> minuscule 3KB. The rest of 218KB is druntime.
>
> Another thing is to avoid using classes for things where one does not expect it to ever be derived from. Use a struct instead, as referencing parts of the struct implementation will not pull in the whole of it, nor is there a vtbl[] to pull it all in.
>
> For example, in std.datetime there's "final class Clock". It inherits nothing, and nothing can be derived from it. The comments for it say it is merely a namespace. It should be a struct.

Although I don't disagree with you that it should be a struct and not a class, does it have anything in its vtbl anyways if it's final?  I'm just trying to understand what gets pulled in when you import a module with static ctors...
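
One way to see which linked-in modules actually carry static constructors is to walk the ModuleInfo records at run time. The following is only a rough sketch, assuming a reasonably recent druntime where `foreach` over `ModuleInfo` and the `ctor`/`tlsctor`/`dtor`/`tlsdtor`/`name` properties are available; the helper name `modulesWithStaticCdtors` and the variable `initialised` are made up for illustration.

```d
import std.stdio : writeln;

int initialised;                       // hypothetical state set by the ctor below
static this() { initialised = 1; }     // gives this module a static constructor

/// Names of all linked-in modules that have a static ctor or dtor of any kind.
string[] modulesWithStaticCdtors()
{
    string[] names;
    foreach (m; ModuleInfo)            // druntime visits every registered ModuleInfo*
    {
        if (m is null)
            continue;
        immutable hasCdtor = m.ctor !is null || m.dtor !is null
                          || m.tlsctor !is null || m.tlsdtor !is null;
        if (hasCdtor)
            names ~= m.name;
    }
    return names;
}

void main()
{
    // Prints one module name per line; the list depends on what got linked in.
    foreach (name; modulesWithStaticCdtors())
        writeln(name);
}
```

Running this with and without an extra `import std.datetime;` should give a feel for how many modules an import drags in, though the exact list will differ between compiler and library versions.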

-Steve

December 19, 2011

Re: Program size, linking matter, and static this()

Posted by Walter Bright
in reply to Walter Bright

On 12/16/2011 2:55 PM, Walter Bright wrote:
> For example, in std.datetime there's "final class Clock". It inherits nothing,
> and nothing can be derived from it. The comments for it say it is merely a
> namespace. It should be a struct.

Or perhaps it should be in its own module.

December 19, 2011

Re: Program size, linking matter, and static this()

Posted by torhu
in reply to Steven Schveighoffer

On 19.12.2011 16:08, Steven Schveighoffer wrote:
> On Fri, 16 Dec 2011 17:30:44 -0500, torhu<no@spam.invalid>  wrote:
>
>>  On 16.12.2011 22:28, Steven Schveighoffer wrote:
>>>  In short, dlls will solve the problem, let's work on that instead of
>>>  shuffling around code.
>>
>>  How exactly do they solve the problem?  An exe plus a DLL version of the
>>  library will usually be larger than just a statically linked exe.
>
> The DLL is loaded into memory once.  With static linking, it's loaded
> every time you run an exe.

I thought we were talking about distribution sizes, not memory use.  But anyway, DLLs won't do a lot as long as people don't have a whole bunch of D programs installed.

December 19, 2011

Re: Program size, linking matter, and static this()

Posted by Walter Bright
in reply to Steven Schveighoffer

On 12/19/2011 7:17 AM, Steven Schveighoffer wrote:
> On Fri, 16 Dec 2011 17:55:47 -0500, Walter Bright <newshound2@digitalmars.com>
> wrote:
>> For example, in std.datetime there's "final class Clock". It inherits nothing,
>> and nothing can be derived from it. The comments for it say it is merely a
>> namespace. It should be a struct.
>
> Although I don't disagree with you that it should be a struct and not a class,
> does it have anything in its vtbl anyways if it's final?

Yes. The pointers to Object's functions, and a pointer to the TypeInfo for that class.

> I'm just trying to
> understand what gets pulled in when you import a module with static ctors...

Write some trivial code snippets, compile them, and take a look at the object file with obj2asm.
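
A trivial snippet along those lines, as a sketch only: `ClockLike` and `ClockLikeStruct` are hypothetical stand-ins for the `Clock` case discussed above (not the real std.datetime code), and `vtblSlots` is a made-up helper. It reads the `vtbl` array from the class's `TypeInfo_Class` at run time, which complements looking at the object file with obj2asm.

```d
import std.stdio : writeln;

// Hypothetical stand-in for a "namespace" class: final, static members only.
final class ClockLike
{
    static long now() { return 42; }
}

// The same namespace as a struct: no vtbl and no TypeInfo_Class are needed.
struct ClockLikeStruct
{
    static long now() { return 42; }
}

/// Number of vtbl slots the compiler emitted for class C.
size_t vtblSlots(C)() if (is(C == class))
{
    return typeid(C).vtbl.length;
}

void main()
{
    // Even a final class with no virtual functions of its own still
    // carries the slots it inherits from Object.
    writeln("Object vtbl slots:    ", vtblSlots!Object());
    writeln("ClockLike vtbl slots: ", vtblSlots!ClockLike());
    writeln("ClockLike base class: ", typeid(ClockLike).base.name);
}
```

The struct version has nothing comparable to inspect, which is the point of the suggestion to use a struct for pure namespaces.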

December 19, 2011

Re: Program size, linking matter, and static this()

Posted by Steven Schveighoffer
in reply to torhu

On Mon, 19 Dec 2011 13:09:18 -0500, torhu <no@spam.invalid> wrote:

> On 19.12.2011 16:08, Steven Schveighoffer wrote:
>> On Fri, 16 Dec 2011 17:30:44 -0500, torhu<no@spam.invalid>  wrote:
>>
>>>  On 16.12.2011 22:28, Steven Schveighoffer wrote:
>>>>  In short, dlls will solve the problem, let's work on that instead of
>>>>  shuffling around code.
>>>
>>>  How exactly do they solve the problem?  An exe plus a DLL version of the
>>>  library will usually be larger than just a statically linked exe.
>>
>> The DLL is loaded into memory once.  With static linking, it's loaded
>> every time you run an exe.
>
> I thought we were talking about distribution sizes, not memory use.  But anyway, DLLs won't do a lot as long as people don't have a whole bunch of D programs installed.

Right, in order for dlls to make a difference, you need to separate the library install from the exe install, as is done most of the time.

If you are installing one D application on your box, what would be the issue with the size anyway?  The complaint is generally that the size is much bigger than a hello world compiled for C/C++, which obviously doesn't take into account that the C/C++ standard libraries are DLLs.

-Steve

December 19, 2011

Re: Program size, linking matter, and static this()

Posted by Steven Schveighoffer
in reply to Walter Bright

On Mon, 19 Dec 2011 13:09:42 -0500, Walter Bright <newshound2@digitalmars.com> wrote:

> On 12/19/2011 7:17 AM, Steven Schveighoffer wrote:
>> On Fri, 16 Dec 2011 17:55:47 -0500, Walter Bright <newshound2@digitalmars.com>
>> wrote:
>>> For example, in std.datetime there's "final class Clock". It inherits nothing,
>>> and nothing can be derived from it. The comments for it say it is merely a
>>> namespace. It should be a struct.
>>
>> Although I don't disagree with you that it should be a struct and not a class,
>> does it have anything in its vtbl anyways if it's final?
>
> Yes. The pointers to Object's functions, and a pointer to the TypeInfo for that class.

Well, pointers to Object's functions shouldn't add any bloat.  The TypeInfo may, but that shouldn't pull in any real code from the module, right?

>> I'm just trying to
>> understand what gets pulled in when you import a module with static ctors...
>
> Write some trivial code snippets, compile them, and take a look at the object file with obj2asm.

I'll rephrase -- I'm trying to understand what's *supposed* to happen :)  Trusting that the compiler is doing it right isn't always correct.  Though it probably is in this case.

-Steve

December 19, 2011

Re: Program size, linking matter, and static this()

Posted by Jacob Carlborg
in reply to torhu

On 2011-12-19 19:09, torhu wrote:
> On 19.12.2011 16:08, Steven Schveighoffer wrote:
>> On Fri, 16 Dec 2011 17:30:44 -0500, torhu<no@spam.invalid> wrote:
>>
>>> On 16.12.2011 22:28, Steven Schveighoffer wrote:
>>>> In short, dlls will solve the problem, let's work on that instead of
>>>> shuffling around code.
>>>
>>> How exactly do they solve the problem? An exe plus a DLL version of the
>>> library will usually be larger than just a statically linked exe.
>>
>> The DLL is loaded into memory once. With static linking, it's loaded
>> every time you run an exe.
>
> I thought we were talking about distribution sizes, not memory use. But
> anyway, DLLs won't do a lot as long as people don't have a whole bunch
> of D programs installed.

It could be useful for a package manager. Theoretically all installed packages could share the same dynamic library. But I would guess that the packages would depend on different versions of the library, and the package manager would end up installing a whole bunch of different versions of Phobos and druntime.

-- 
/Jacob Carlborg

December 20, 2011

Re: Program size, linking matter, and static this()

Posted by Denis Shelomovskij
in reply to Andrei Alexandrescu

On 16.12.2011 21:29, Andrei Alexandrescu wrote:
> Hello,
>
>
> Late last night Walter and I figured a few interesting tidbits of
> information. Allow me to give some context, discuss them, and sketch a
> few approaches for improving things.
>
> A while ago Walter wanted to enable function-level linking, i.e. only
> get the needed functions from a given (and presumably large) module. So
> he arranged things that a library contains many small object "files"
> (that actually are generated from a single .d file and never exist on
> disk, only inside the library file, which can be considered an archive
> like tar). Then the linker would only pick the used object "files" from
> the library and link those in. Unfortunately that didn't have nearly the
> expected impact - essentially the size of most binaries stayed the same.
> The mystery was unsolved, and Walter needed to move on to other things.
>
> One particularly annoying issue is that even programs that don't
> ostensibly use anything from an imported module may balloon inexplicably
> in size. Consider:
>
> import std.path;
> void main(){}
>
> This program, after stripping and all, has some 750KB in size. Removing
> the import line reduces the size to 218KB. That includes the runtime
> support, garbage collector, and such, and I'll consider it a baseline.
> (A similar but separate discussion could be focused on reducing the
> baseline size, but herein I'll consider it constant.)
>
> What we'd simply want is to be able to import stuff without blatantly
> paying for what we don't use. If a program imports std.path and uses no
> function from it, it should be as large as a program without the import.
> Furthermore, the increase should be incremental - using 2-3 functions
> from std.path should only increase the executable size by a little, not
> suddenly link in all code in that module.
>
> But in experiments it seemed like program size would increase in sudden
> amounts when certain modules were included. After much investigation we
> figured that the following fateful causal sequence happened:
>
> 1. Some modules define static constructors with "static this()" or
> "static shared this()", and/or static destructors.
>
> 2. These constructors/destructors are linked in automatically whenever a
> module is included.
>
> 3. Importing a module with a static constructor (or destructor) will
> generate its ModuleInfo structure, which contains static information
> about all module members. In particular, it keeps virtual table pointers
> for all classes defined inside the module.
>
> 4. That means generating ModuleInfo refers to all virtual functions defined
> in that module, whether they're used or not.
>
> 5. The phenomenon is transitive, e.g. even if std.path has no static
> constructors but imports std.datetime which does, a ModuleInfo is
> generated for std.path too, in addition to the one for std.datetime. So
> now classes inside std.path (if any) will be all linked in.
>
> 6. It follows that a module that defines classes which in turn use other
> functions in other modules, and has static constructors (or includes
> other modules that do) will balloon the size of the executable suddenly.
>
> There are a few approaches that we can use to improve the state of affairs.
>
> A. On the library side, use static constructors and destructors
> sparingly inside druntime and std. We can use lazy initialization
> instead of compulsively initializing library internals. I think this is
> often a worthy thing to do in any case (dynamic libraries etc) because
> it only does work if and when work needs to be done at the small cost of
> a check upon each use.
>
> B. On the compiler side, we could use a similar lazy initialization
> trick to only refer to class methods in the module if they're actually
> needed. I'm being vague here because I'm not sure what and how that can
> be done.
>
> Here's a list of all files in std using static cdtors:
>
> std/__fileinit.d
> std/concurrency.d
> std/cpuid.d
> std/cstream.d
> std/datebase.d
> std/datetime.d
> std/encoding.d
> std/internal/math/biguintcore.d
> std/internal/math/biguintx86.d
> std/internal/processinit.d
> std/internal/windows/advapi32.d
> std/mmfile.d
> std/parallelism.d
> std/perf.d
> std/socket.d
> std/stdiobase.d
> std/uri.d
>
> The majority of them don't do a lot of work and are not much used inside
> phobos, so they don't blow up the executable. The main one that could
> receive some attention is std.datetime. It has a few static ctors and a
> lot of classes. Essentially just importing std.datetime or any std
> module that transitively imports std.datetime (and there are many of
> them) ends up linking in most of Phobos and blows the size up from the
> 218KB baseline to 700KB.
>
> Jonathan, could I impose on you to replace all static cdtors in
> std.datetime with lazy initialization? I looked through it and it
> strikes me as a reasonably simple job, but I think you'd know better
> what to do than me.
>
> A similar effort could be conducted to reduce or eliminate static cdtors
> from druntime. I made the experiment of commenting them all, and that
> reduced the size of the baseline from 218KB to 200KB. This is a good
> amount, but not as dramatic as what we can get by working on std.datetime.
>
>
> Thanks,
>
> Andrei
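
For reference, approach A from the quoted message (lazy initialization instead of a static constructor) could look roughly like the sketch below. All names here (`squares`, `buildSquares`, `buildCount`) are made up for illustration and nothing is taken from std.datetime. The variables are thread-local, as module-level variables are by default in D2, so this mirrors a plain `static this()`; state shared between threads would additionally need synchronization.

```d
import std.stdio : writeln;

// Eager version, which approach A tries to avoid:
//     private int[] squaresCache;
//     static this() { squaresCache = buildSquares(); }

private int[] squaresCache;   // thread-local by default
private bool squaresReady;
int buildCount;               // only here to make the laziness observable

private int[] buildSquares()
{
    ++buildCount;
    auto table = new int[](16);
    foreach (i, ref v; table)
        v = cast(int)(i * i);
    return table;
}

/// Lazily initialized accessor: the cost is one flag check per use.
int[] squares()
{
    if (!squaresReady)
    {
        squaresCache = buildSquares();
        squaresReady = true;
    }
    return squaresCache;
}

void main()
{
    writeln(squares()[5]);
    writeln(squares()[7]);
    writeln("table built ", buildCount, " time(s)");
}
```

Because the module no longer has a static constructor, it no longer forces the ModuleInfo-related linking described in steps 2 to 5 on its own account.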

Really sorry, but this sounds silly to me. It's a minor problem. Does anyone really care about a 600 KiB (3.5x) size change in an empty program? Yes, they do, but only if there are no other size increases in real programs.



Right now dmd produces a file size increase of at least _two orders of magnitude_. I posted about that problem four months ago in "Building GtkD app on Win32 results in 111 MiB file mostly from zeroes".

An example of this bug is in this archive:
http://deoma-cmd.ru/files/other/gtkD-1.5.1-size.7z

Built version (with *.exe and *.lib files):
http://deoma-cmd.ru/files/other/gtkD-1.5.1-size-built.7z


Detailed description:
GtkD is built using either a single object file (gtk-one-obj.lib) or separate object files, one per source file (gtk-sep-obj.lib).

Then main.d, which imports gtk.Main, is built using those libraries.

Then the zeroCount utility is built and run over the resulting files:
--------------------------------------------------
Now let's calculate zero bytes counts:
--------------------------------------------------
  Zero bytes|     %|    Non-zero| Total bytes|        File
     3628311| 21.56|    13202153|    16830464|gtk-one-obj.lib
     1953124| 15.98|    10272924|    12226048|gtk-sep-obj.lib
   127968798| 99.00|     1298430|   129267228|main-one-obj.exe
      743821| 37.51|     1239183|     1983004|main-sep-obj.exe
Done.
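
The zeroCount source is in the archive above; purely as an illustration of what such a tool does, a minimal sketch might look like this. The names `ZeroStats` and `countZeros` are made up here and not taken from the archive.

```d
import std.file : read;
import std.stdio : writefln;

/// Result of scanning one buffer.
struct ZeroStats
{
    size_t zeros;
    size_t total;

    /// Share of zero bytes in percent (0 for an empty buffer).
    double percent() const
    {
        return total ? 100.0 * zeros / total : 0.0;
    }
}

/// Counts the zero bytes in `data`.
ZeroStats countZeros(const(ubyte)[] data)
{
    size_t zeros;
    foreach (b; data)
        if (b == 0)
            ++zeros;
    return ZeroStats(zeros, data.length);
}

void main(string[] args)
{
    // One row per file given on the command line, in the column order used above.
    foreach (path; args[1 .. $])
    {
        auto s = countZeros(cast(const(ubyte)[]) read(path));
        writefln("%12s|%6.2f|%12s|%12s|%s",
                 s.zeros, s.percent(), s.total - s.zeros, s.total, path);
    }
}
```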

So we have to use a very slow per-file build to produce a good (not 100 MiB) executable.
No matter which *.exe is launched, its process allocates ~20 MiB of RAM (the loaded Gtk DLLs).



The second dmd issue (discovered because of the 99.00% of zeros) is that _it doesn't use the .bss section_.
Let's look at a C++ program built using Microsoft's cl:
---
char arr[1024 * 1024 * 10];
int main() { }
---
It results in a ~10 KiB executable, because `arr` is zero-initialized and placed in the .bss section. If one of its elements is set to non-zero:
---
char arr[1024 * 1024 * 10] = { 1 };
int main() { }
---
The array can't be in .bss any more, and the resulting executable grows by ~10 MiB. The following D program results in a ~10 MiB executable:
---
ubyte[1024 * 1024 * 10] arr;
void main() { }
---
So, if there really is a reason not to use .bss, it should be clearly explained.



If the issues described here aren't much more significant than "static this()", please show me where I am wrong.

Copyright © 1999-2021 by the D Language Foundation