September 20, 2013
On Friday, 20-Sep-13 3:04 PM, Nick Sabalausky wrote:
> On Fri, 20 Sep 2013 21:45:48 +0200
> "Temtaime" <temtaime@gmail.com> wrote:
>>
>> Software MUST
>> run almost ANYWHERE and consume minimal resources.
>>
>> For example, I hate the 3dsmax developers: on my game's map it
>> uses several GB of RAM and freezes sometimes, while Blender uses
>> only 500 MB and runs fast. The only reason for me to use 3dsmax
>> is its friendlier controls. But that is another story...
>>
>> Some users who don't have a ""modern"" PC will hate your app
>> too, I think.
>> One should optimize ALL the things one can.
>>
>
> I agree with what you're saying here, but the problem is we're looking
> at a difference of only a few hundred k.
>
> Heck, my primary PC was a 32-bit single-core right up until last year
> (and I still use it as a secondary system), and I didn't care one bit if
> a hello world was 1k or 1MB.
>
> How many real world programs are as trivial as a hello world? A few
> maybe, but not many. Certainly not enough to actually add up to
> anything significant, unless maybe you happen to be running on a 286 or
> such.
>
> If we were talking about real-world D programs taking tens/hundreds of
> MB more than they should, then that would be a problem. But they
> don't. We're just talking about a few hundred k for an *entire* program.
>

I should have been a bit more clear!! It's the _relative_ size difference that bothers me!! One is almost 26 times larger than the other. If I'm to expect that same variance in a large to huge project, then I think I'd be in a world of bullshine!!
September 20, 2013
On Friday, 20-Sep-13 2:20 PM, H. S. Teoh wrote:
> On Fri, Sep 20, 2013 at 11:26:18AM -0600, Duke Normandin wrote:
>> On Friday, 20-Sep-13 10:45 AM, Adam D. Ruppe wrote:
>>> On Friday, 20 September 2013 at 16:20:34 UTC, Duke Normandin wrote:
>>>> Why such a huge difference???
>>>
>>> The D program carries its additional D runtime library code with it,
>>> whereas the C program only depends on libraries provided by the
>>> operating system, and thus doesn't have to include them in the exe.
>>
>> Now that I know _why_, is there a way to shave tons off those
>> executables? Any optimization possible?
>
> If you're on Linux:
>
> 	dmd -release -O myprogram.d
> 	strip myprogram
> 	upx myprogram
>
> I've seen this reduce a 50MB executable down to about 400k. YMMV.
>
> Keep in mind, though, that stripping basically deletes all debugging
> information from the executable (plus a bunch of other stuff -- you
> don't want to do this to an object file or a library, for example), so
> it's not something you want to do during development. And upx turns your
> executable into something that probably violates the ELF spec in many
> different ways, but resembles it closely enough that the kernel will
> still run it. File type recognizers like 'file' may fail to recognize
> the result as an executable afterwards. But it will still work. (That's
> how cool upx is, in case you don't already know that.)

Thx!  I'll have to do some experimenting ...

September 20, 2013
On Friday, 20-Sep-13 11:59 AM, JohnnyK wrote:
> On Friday, 20 September 2013 at 16:20:34 UTC, Duke Normandin wrote:
>> I'm re-visiting the D language. I've compared the file sizes of 2
>> executables - 1 is compiled C code using gcc; the other is D code
>> using dmd.
>>
>> helloWorld.d => helloWorld.exe = 146,972 bytes
>> ex1hello.c => ex1-hello.exe = 5,661 bytes
>>
>> Why such a huge difference???
>>
>> Duke
>
> That 140KB is called the CYA document.  It is there so that when you,
> the programmer, screw up, you don't look so bad in front of your boss.


> CYA document ...

sounds about right!!! :)
September 20, 2013
Duke Normandin:

> I should have been a bit more clear!! It's the _relative_ size difference that bothers me!! One is almost 26 times larger than the other.

http://xkcd.com/605/

Bye,
bearophile
September 20, 2013
On Fri, Sep 20, 2013 at 05:04:23PM -0400, Nick Sabalausky wrote:
> On Fri, 20 Sep 2013 21:45:48 +0200
> "Temtaime" <temtaime@gmail.com> wrote:
> > 
> > Software MUST run almost ANYWHERE and consume minimal resources.
> > 
> > For example, I hate the 3dsmax developers: on my game's map it uses several GB of RAM and freezes sometimes, while Blender uses only 500 MB and runs fast. The only reason for me to use 3dsmax is its friendlier controls. But that is another story...
> > 
> > Some users who don't have a ""modern"" PC will hate your app too, I think.  One should optimize ALL the things one can.
> > 
> 
> I agree with what you're saying here, but the problem is we're looking at a difference of only a few hundred k.
> 
> Heck, my primary PC was a 32-bit single-core right up until last year (and I still use it as a secondary system), and I didn't care one bit if a hello world was 1k or 1MB.

I agree with the OP that dmd should improve dead-code culling, though. Recently Walter has started doing lazy template instantiation for imports, which begins to trim off some of the fat. But there's plenty of room for more improvements.

For example, after seeing Walter's recent pulls, I got inspired to write a simple utility that takes the output of objdump -d (the disassembly of an executable) and parses it to extract code symbols from the program along with references to other symbols. It then builds a graph of how symbols reference each other and performs some trivial reachability analysis on it. It revealed some startling results... like the fact that symbols from std.complex are included in a hello world program, even though complex numbers are never used!
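
For the curious, here is a stripped-down sketch of the approach (the regexes and the counting are simplified assumptions on my part; this is not the actual utility):

	// sketch: run objdump -d, build a "who references whom" graph over
	// the code symbols, then mark everything reachable from _Dmain.
	import std.process, std.regex, std.stdio, std.string;

	void main(string[] args)
	{
	    // usage: symgraph <executable>
	    auto dump = executeShell("objdump -d " ~ args[1]).output;

	    bool[string][string] refs;  // refs[symbol][referencedSymbol]
	    string cur;                 // symbol whose body we're currently inside

	    auto symDef = regex(`^[0-9a-f]+ <([^>]+)>:`);       // "0000000000401020 <_Dmain>:"
	    auto symRef = regex(`<([^>+]+)(\+0x[0-9a-f]+)?>`);  // "call ... <foo+0x1c>"

	    foreach (line; dump.splitLines)
	    {
	        if (auto m = line.matchFirst(symDef))
	            cur = m[1];            // entering a new symbol's code
	        else if (cur.length)
	            foreach (m; line.matchAll(symRef))
	                refs[cur][m[1]] = true;  // cur references m[1]
	    }

	    // trivial reachability analysis, starting from _Dmain
	    bool[string] live;
	    void visit(string s)
	    {
	        if (s in live) return;
	        live[s] = true;
	        foreach (callee; refs.get(s, null).byKey)
	            visit(callee);
	    }
	    visit("_Dmain");

	    writefln("%s symbols defined, %s reachable from _Dmain",
	             refs.length, live.length);
	}

Symbols referenced only through data (vtables, function pointers) never show up as code references in the disassembly, which is exactly the indirect-reference caveat below.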

The ratio of total number of symbols to symbols transitively reachable from _Dmain is rather large, ranging from 5 (medium-sized, complex program) to about 30 (a hello world program). Now I'm not 100% confident about the accuracy of these numbers, since some symbols may be indirectly referenced, and thus missed in the graph built from parsing the disassembly. But still, even when taken as ballpark figures, it shows that there's a *lot* of room for improvement. Certainly, some of the unreferenced symbols are druntime overhead (used by startup/exit functions, etc.), but a ratio of *5*? That's a 5x executable size bloat. Even if we discount half of that for druntime overhead and indirect references... I mean, how many indirect references can you have?  I really can't convince myself that's "merely" druntime/phobos overhead. Especially when I see symbols from std.complex in a program that doesn't even use complex numbers. std.complex shouldn't be in there in the first place, before we even talk about template bloat.


> How many real world programs are as trivial as a hello world? A few maybe, but not many. Certainly not enough to actually add up to anything significant, unless maybe you happen to be running on a 286 or such.
> 
> If we were talking about real-world D programs taking tens/hundreds of
> MB more than they should, then that would be a problem. But they
> don't. We're just talking about a few hundred k for an *entire* program.

My numbers show otherwise. :) Well, OK, I'm counting symbols rather than size, and the count may not be 100% accurate. But it does show that we could improve. By a lot.

A hello world program, according to my test, has a ratio of 30 between total symbols and symbols reachable from _Dmain, whereas a medium-sized complex program shows a ratio of around 5 (the symbol analyser program itself, which is significantly simpler than the complex program I tested, also shows a ratio of 5). So we can probably discount the hello world case, since most of the apparent bloat is probably just one-off overhead from druntime, etc. But the ratio of 5 for non-trivial programs? No matter how I try to rationalize it, I'm forced to conclude that there is a lot of room for improvement here. Surely *some* significant subset of these unreferenced symbols must be actually unreachable and can be pruned from the executable.

I'll continue refining the analysis while Walter works on more lazy instantiations for imports. I'm expecting to see a lot of improvements in this area. :)


T

-- 
Take care of your clothes while they are new, and of your health while you are young.
September 20, 2013
On 9/20/13 3:49 PM, Duke Normandin wrote:
> On Friday, 20-Sep-13 3:04 PM, Nick Sabalausky wrote:
>> On Fri, 20 Sep 2013 21:45:48 +0200
>> "Temtaime" <temtaime@gmail.com> wrote:
>>>
>>> Software MUST
>>> run almost ANYWHERE and consume minimal resources.
>>>
>>> For example, I hate the 3dsmax developers: on my game's map it
>>> uses several GB of RAM and freezes sometimes, while Blender uses
>>> only 500 MB and runs fast. The only reason for me to use 3dsmax
>>> is its friendlier controls. But that is another story...
>>>
>>> Some users who don't have a ""modern"" PC will hate your app
>>> too, I think.
>>> One should optimize ALL the things one can.
>>>
>>
>> I agree with what you're saying here, but the problem is we're looking
>> at a difference of only a few hundred k.
>>
>> Heck, my primary PC was a 32-bit single-core right up until last year
>> (and I still use it as a secondary system), and I didn't care one bit if
>> a hello world was 1k or 1MB.
>>
>> How many real world programs are as trivial as a hello world? A few
>> maybe, but not many. Certainly not enough to actually add up to
>> anything significant, unless maybe you happen to be running on a 286 or
>> such.
>>
>> If we were talking about real-world D programs taking tens/hundreds of
>> MB more than they should, then that would be a problem. But they
>> don't. We're just talking about a few hundred k for an *entire* program.
>>
>
> I should have been a bit more clear!! It's the _relative_ size
> difference that bothers me!! One is almost 26 times larger than the
> other. If I'm to expect that same variance in a large to huge project,
> then I think I'd be in a world of bullshine!!

The point here is that the factor does not stay constant as sizes grow. A 4-year-old is twice as old as a 2-year-old, but a 34-year-old is not twice as old as a 32-year-old. Likewise, the ~140 KB here is mostly fixed runtime overhead, paid once: it dwarfs a 6 KB hello world, but it would be noise in a real application.

Andrei

September 21, 2013
On Fri, 20 Sep 2013 16:49:58 -0600
Duke Normandin <dukeofperl@ml1.net> wrote:
> 
> I should have been a bit more clear!! It's the _relative_ size difference that bothers me!! One is almost 26 times larger than the other. If I'm to expect that same variance in a large to huge project, then I think I'd be in a world of bullshine!!


If you're to expect that same variance in a large, huge, or even normal-sized program, then you're very, very mistaken.

September 21, 2013
On 21 September 2013 09:02, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> On Fri, Sep 20, 2013 at 05:04:23PM -0400, Nick Sabalausky wrote:
> > On Fri, 20 Sep 2013 21:45:48 +0200
> > "Temtaime" <temtaime@gmail.com> wrote:
> > >
> > > Software MUST run almost ANYWHERE and consume minimal resources.
> > >
> > > For example, I hate the 3dsmax developers: on my game's map it uses several GB of RAM and freezes sometimes, while Blender uses only 500 MB and runs fast. The only reason for me to use 3dsmax is its friendlier controls. But that is another story...
> > >
> > > Some users who don't have a ""modern"" PC will hate your app too, I think.  One should optimize ALL the things one can.
> > >
> >
> > I agree with what you're saying here, but the problem is we're looking at a difference of only a few hundred k.
> >
> > Heck, my primary PC was a 32-bit single-core right up until last year (and I still use it as a secondary system), and I didn't care one bit if a hello world was 1k or 1MB.
>
> I agree with the OP that dmd should improve dead-code culling, though. Recently Walter has started doing lazy template instantiation for imports, which begins to trim off some of the fat. But there's plenty of room for more improvements.
>
> For example, after seeing Walter's recent pulls, I got inspired to write a simple utility that takes the output of objdump -d (the disassembly of an executable) and parses it to extract code symbols from the program along with references to other symbols. It then builds a graph of how symbols reference each other and performs some trivial reachability analysis on it. It revealed some startling results... like the fact that symbols from std.complex are included in a hello world program, even though complex numbers are never used!
>
> The ratio of total number of symbols to symbols transitively reachable from _Dmain is rather large, ranging from 5 (medium-sized, complex program) to about 30 (a hello world program). Now I'm not 100% confident about the accuracy of these numbers, since some symbols may be indirectly referenced, and thus missed in the graph built from parsing the disassembly. But still, even when taken as ballpark figures, it shows that there's a *lot* of room for improvement. Certainly, some of the unreferenced symbols are druntime overhead (used by startup/exit functions, etc.), but a ratio of *5*? That's a 5x executable size bloat. Even if we discount half of that for druntime overhead and indirect references... I mean, how many indirect references can you have?  I really can't convince myself that's "merely" druntime/phobos overhead. Especially when I see symbols from std.complex in a program that doesn't even use complex numbers. std.complex shouldn't be in there in the first place, before we even talk about template bloat.
>
>
> > How many real world programs are as trivial as a hello world? A few maybe, but not many. Certainly not enough to actually add up to anything significant, unless maybe you happen to be running on a 286 or such.
> >
> > If we were talking about real-world D programs taking tens/hundreds of
> > MB more than they should, then that would be a problem. But they
> > don't. We're just talking about a few hundred k for an *entire* program.
>
> My numbers show otherwise. :) Well, OK, I'm counting symbols rather than size, and the count may not be 100% accurate. But it does show that we could improve. By a lot.
>
> A hello world program, according to my test, has a ratio of 30 between total symbols and symbols reachable from _Dmain, whereas a medium-sized complex program shows a ratio of around 5 (the symbol analyser program itself, which is significantly simpler than the complex program I tested, also shows a ratio of 5). So we can probably discount the hello world case, since most of the apparent bloat is probably just one-off overhead from druntime, etc. But the ratio of 5 for non-trivial programs? No matter how I try to rationalize it, I'm forced to conclude that there is a lot of room for improvement here. Surely *some* significant subset of these unreferenced symbols must be actually unreachable and can be pruned from the executable.
>
> I'll continue refining the analysis while Walter works on more lazy instantiations for imports. I'm expecting to see a lot of improvements in this area. :)
>

This is awesome.
What would be really awesome is if you integrated this into the D
auto-builder and hacked it to publish the results somewhere for the
latest build.
It would be good to know when people write code that results in a
significant increase in coverage (particularly when it doesn't need to).
It would also provide very useful information for hackers who just want to
get in and do some work to try and trim it a bit.


September 21, 2013
On Friday, 20 September 2013 at 23:03:48 UTC, H. S. Teoh wrote:
> I'll continue refining the analysis while Walter works on more lazy
> instantiations for imports. I'm expecting to see a lot of improvements
> in this area. :)

I have been doing similar analysis for some time too, though mostly manually (I was curious which symbols actually get included for trivial programs), and reached pretty much the same conclusion.

Right now I am pretty much convinced that we need some sort of whole-program optimization, plus tweaks to the language spec to allow it safely (i.e. force dynamically loaded symbols to be marked with `export`).
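
A sketch of the idea (the pruning rule here is my assumption about how such a spec could work, not current compiler behaviour):

	// Hypothetical rule: only `export` symbols are guaranteed to survive
	// in the binary; everything else is fair game for whole-program pruning.
	export extern(C) int plugin_entry()  // kept: may be looked up at runtime
	{
	    return helper(21);
	}

	private int helper(int x)  // free to inline and then discard
	{
	    return x * 2;
	}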

A lot of code bloat comes from stuff which is unnecessary in the big picture, but the compiler has no means to decide that during compilation. There is no real reason why

`[1, 2, 3].map!(a => a*2)().reduce!((a, b) => a + b)(0)`

can't be reduced to a single loop and inlined, leaving no trace of actual std.algorithm usage.

Other than that the compiler can't possibly be sure you won't try to link to those generated instances somewhere (or pass them to a shared library). That feels like a language design issue to address.
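
Roughly, an ideal whole-program optimizer could lower the entire chain to something like this (a sketch of the desired end state, not what any compiler emits today):

	void main()
	{
	    // the whole map!/reduce! pipeline, hand-lowered:
	    int sum = 0;
	    foreach (a; [1, 2, 3])
	        sum += a * 2;
	    assert(sum == 12); // same result as the range-based version
	}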
September 21, 2013
On Saturday, 21 September 2013 at 10:29:35 UTC, Dicebot wrote:
> A lot of code bloat comes from stuff which is unnecessary in the big picture, but the compiler has no means to decide that during compilation. There is no real reason why
>
> `[1, 2, 3].map!(a => a*2)().reduce!((a, b) => a + b)(0)`
>
> can't be reduced to a single loop and inlined, leaving no trace of actual std.algorithm usage.

There's no theoretical reason, but plenty of practical reasons. bearophile linked to a talk by Chandler Carruth that explains the difficulties encountered by inlining optimisers.