May 20, 2016
On Friday, 20 May 2016 at 15:39:18 UTC, Rene Zwanenburg wrote:
> On Friday, 20 May 2016 at 12:08:37 UTC, ZombineDev wrote:
>> @Rene
>> How do you expect the compiler to know the exact return type, only by looking at this signature:
>> auto transmogrify(string str);
>>
>> A possible implementation might be this:
>> auto transmogrify(string str)
>> {
>>    return str.map!someFunc.filter!otherFunc.joiner();
>> }
>>
>> or something completely different.
>
> I was thinking of something along the lines of this:
>
> =======
> size_t frobnicate(int i)
> {
> 	return 0;
> }
>
> auto frobnicator(T)(T t)
> {
> 	static struct Result
> 	{
> 		int index;
> 		
> 		size_t front()
> 		{
> 			return frobnicate(index);
> 		}
> 		
> 		enum empty = false;
> 		
> 		void popFront()
> 		{
> 			++index;
> 		}
> 	}
> 	
> 	return Result(t.index);
> }
> =======
>
> Automatically generating a header with DMD gives me:
>
> =======
> size_t frobnicate(int i);
> auto frobnicator(T)(T t)
> {
> 	static struct Result
> 	{
> 		int index;
> 		size_t front();
> 		enum empty = false;
> 		void popFront();
> 	}
> 	return Result(t.index);
> }
> =======
>
> Now frobnicator returns a different type for the same T depending on whether you're using the .d or the .di file. I'm not sure if this is a problem, but it sounds like something that can come back to bite you in edge cases.

The only issue is code bloat. It also happens with separate compilation, because the compiler can't know whether the template has already been instantiated in a different compilation unit.
Only in a single compilation unit/invocation can the compiler reliably see all the places where the same template instance is used. In that case, the actual mangled name shouldn't matter, AFAIU.
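For illustration (a made-up example, module names are arbitrary): if a.d and b.d are compiled in separate invocations, each object file gets its own copy of twice!int, because neither compilation can see the other one's instantiation; compiled together in a single invocation, the compiler sees both uses and can emit it once.

=======
// algo.d
module algo;
T twice(T)(T x) { return x + x; }

// a.d - instantiates twice!int
module a;
import algo;
int fa() { return twice(1); }

// b.d - instantiates twice!int again
module b;
import algo;
int fb() { return twice(2); }
=======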
May 20, 2016
On Friday, 20 May 2016 at 13:16:32 UTC, Johan Engelen wrote:
> On Friday, 20 May 2016 at 12:57:40 UTC, ZombineDev wrote:
>>
>> As I said earlier, it would be best if we can prevent the generation of long symbols in the first place, because that would improve the compilation times significantly.
>
> From what I've observed, generating the long symbol name itself is fast. If we avoid the deep type hierarchy, then I think indeed you can expect compile time improvement.
>
> Walter's PR slows down the compilation by 25-40% according to my tests. I expect that compilation would be faster if the whole process is skipped altogether.
>
> MD5 hashing slowed down builds by a few percent for Weka (note: LDC machinecodegen is slower than DMD's, so percentage-wise...), which can then be compensated for using PGO ;-)  /+  <-- shameless PGO plug  +/

IIUC, your scheme works like this:
1. DMDFE creates a mangled symbol name
2. Create an MD5 hash of the symbol and use the hash instead of the full name.

If a minimal change in Georgi's almost trivial program (w.r.t. LoC, compared to e.g. Weka's huge codebase) increases compilation time from 1.5 sec to 27 sec, I can't imagine how much slower it would be for a larger project.

We should cure the root cause. Generating uselessly large symbols (even if they are hashed afterwards) is part of the problem, so I think it should never be done once they start growing beyond e.g. 500 bytes.

The solution that Steven showed is exactly what the compiler should do. Another reason why the compiler should do it is that voldemort types often capture outer context (local variables, alias parameters, delegates, etc.), which makes it very hard for the user to manually extract the voldemort type out of the function.
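For example (a made-up range, just to show the capture):

=======
auto scaleBy(int factor)
{
    // Non-static nested struct: it reads `factor` from the enclosing
    // function through a hidden context pointer, so the user can't
    // simply copy-paste the type outside of scaleBy.
    struct Result
    {
        int index;
        int front() { return index * factor; }
        enum empty = false;
        void popFront() { ++index; }
    }
    return Result(0);
}
=======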


May 20, 2016
On Friday, 20 May 2016 at 13:24:42 UTC, Andrei Alexandrescu wrote:
>
> I don't see a need for hashing something. Would a randomly-generated string suffice?
>
Naïve question: is a (randomly-)?generated _anything_ required?

I've probably missed something, but it seems like line noise just because we feel we must.  I guess there's a possibility that there would be multiple matches on the same line with the same object and identifier... Stick the column in there too and call it a day?

-Wyatt
May 20, 2016
On Friday, 20 May 2016 at 17:11:52 UTC, Wyatt wrote:
> I've probably missed something, but it seems like line noise just because we feel we must.

Line and file are not unique, because the same template would generate different functions for different arguments.

Template arguments are a part of the identifier... and that can really blow up the size, because the arguments might themselves be template instantiations whose arguments are template instantiations, and so on, yet each unique argument could be creating an entirely different function.
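You can see it with an ordinary chain (exact numbers depend on the compiler and Phobos version, but they grow at every step):

=======
import std.algorithm : filter, map;
import std.range : iota;

void main()
{
    auto a = iota(10);
    auto b = a.map!(x => x * 2);    // wraps typeof(a) as a template argument
    auto c = b.filter!(x => x > 4); // wraps typeof(b) as a template argument

    // Each wrapper embeds the previous type in its own mangled name,
    // so the names keep growing:
    pragma(msg, typeof(a).mangleof.length);
    pragma(msg, typeof(b).mangleof.length);
    pragma(msg, typeof(c).mangleof.length);
}
=======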
May 20, 2016
On Friday, 20 May 2016 at 16:21:55 UTC, ZombineDev wrote:
> On Friday, 20 May 2016 at 13:16:32 UTC, Johan Engelen wrote:
>> On Friday, 20 May 2016 at 12:57:40 UTC, ZombineDev wrote:
>>>
>>> As I said earlier, it would be best if we can prevent the generation of long symbols in the first place, because that would improve the compilation times significantly.
>>
>> From what I've observed, generating the long symbol name itself is fast. If we avoid the deep type hierarchy, then I think indeed you can expect compile time improvement.
>>
>>> Walter's PR slows down the compilation by 25-40% according to my tests. I expect that compilation would be faster if the whole process is skipped altogether.
>>
>> MD5 hashing slowed down builds by a few percent for Weka (note: LDC machinecodegen is slower than DMD's, so percentage-wise...), which can then be compensated for using PGO ;-)  /+  <-- shameless PGO plug  +/
>
> IIUC, your scheme works like this:
> 1. DMDFE creates a mangled symbol name
> 2. Create an MD5 hash of the symbol and use the hash instead of the full name.
>
> If a minimal change in Georgi's almost trivial program (w.r.t. LoC, compared to e.g. Weka's huge codebase) increases compilation time from 1.5 sec to 27 sec, I can't imagine how much slower it would be for a larger project.
>
> We should cure the root cause. Generating uselessly large symbols (even if they are hashed afterwards) is part of the problem, so I think it should never be done once they start growing beyond e.g. 500 bytes.
>
> The solution that Steven showed is exactly what the compiler should do. Another reason why the compiler should do it is that voldemort types often capture outer context (local variables, alias parameters, delegates, etc.), which makes it very hard for the user to manually extract the voldemort type out of the function.

I see two separate issues that I think should be handled independently:

1) Exponential growth of symbol name with voldemort types.
I like Steven's solution where the compiler lowers the struct outside of the method.

2) Long symbol names in general, which can arise even without voldemort types involved, especially when chaining multiple algorithms.

I like Johan Engelen's solution in LDC for symbols longer than a threshold. For symbols shorter than the threshold, I think Walter's compression algorithm could be used to gain a 40-60% reduction and still retain the full type information.
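Roughly what I have in mind for the over-the-threshold case (just a sketch with made-up names, not LDC's actual code):

=======
import std.digest.md;   // md5Of; also makes toHexString available

// Hypothetical helper: keep short mangles as they are, replace overly
// long ones with a fixed-size digest of the full name.
string shortenMangle(string mangled, size_t threshold = 1000)
{
    if (mangled.length <= threshold)
        return mangled;

    auto digest = md5Of(mangled);    // 16-byte MD5 of the full mangle
    auto hex = toHexString(digest);  // 32-char hex representation
    return "_D" ~ hex[].idup;        // prefix chosen arbitrarily for the example
}
=======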

May 20, 2016
On 5/20/16 2:34 PM, Georgi D wrote:
> 1) Exponential growth of symbol name with voldemort types.
> I like Steven's solution where the compiler lowers the struct outside of
> the method.

Was talking to Walter on the phone and he just had one of those great ideas: encode in the function mangle that it returns "auto". I thought that was fantastic. Would that work? -- Andrei
May 20, 2016
On 5/20/2016 6:24 AM, Andrei Alexandrescu wrote:
> I don't see a need for hashing something. Would a randomly-generated string
> suffice?

Hashing produces reproducible unique values. Random will not be reproducible and may not even be unique.
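For example:

=======
import std.digest.md;

void main()
{
    // Hashing the same (hypothetical) mangled name twice gives the same
    // digest, so separate compilations agree on the shortened symbol.
    auto a = md5Of("pretend.this.is.a.very.long.mangled.name");
    auto b = md5Of("pretend.this.is.a.very.long.mangled.name");
    assert(a == b);

    // A randomly generated suffix would differ from build to build and
    // could, with bad luck, even collide with another symbol's suffix.
}
=======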

May 20, 2016
On 5/20/2016 5:57 AM, ZombineDev wrote:
> Walter's PR slows down the compilation by
> 25-40% according to my tests. I expect that compilation would be faster if the
> whole process is skipped altogether.


The compressor only kicks in if the string is longer than 64 bytes. I set it pretty low in order to have it kick in often, and hopefully flush out any bugs in it.

For a production version, the minimum size should be set substantially larger, based on testing to provide a reasonable balance.

That should speed it up a lot. Also, the algorithm is a bit naively implemented; I bet it could be sped up substantially.

Hashing isn't algorithmically cheap, either.

May 20, 2016
On 05/20/2016 03:45 PM, Walter Bright wrote:
> On 5/20/2016 5:57 AM, ZombineDev wrote:
>> Walter's PR slows down the compilation by
>> 25-40% according to my tests. I expect that compilation would be
>> faster if the
>> whole process is skipped altogether.
>
>
> The compressor only kicks in if the string is longer than 64 bytes. I
> set it pretty low in order to have it kick in often, and hopefully flush
> out any bugs in it.
>
> For a production version, the minimum size should be set substantially
> larger, based on testing to provide a reasonable balance.
>
> That should speed it up a lot. Also, the algorithm is a bit naively
> implemented; I bet it could be sped up substantially.
>
> Hashing isn't algorithmically cheap, either.

From the measurements shown, the work seems Pareto-distributed: most of the time is spent on the few long symbols, not the many short ones. A handful of long symbols dominate everything else by an order of magnitude. The one percenters! :o) -- Andrei
May 20, 2016
On Friday, 20 May 2016 at 19:41:16 UTC, Walter Bright wrote:
> On 5/20/2016 6:24 AM, Andrei Alexandrescu wrote:
>> I don't see a need for hashing something. Would a randomly-generated string
>> suffice?
>
> Hashing produces reproducible unique values. Random will not be reproducible and may not even be unique.

The question is: is it actually good for them to be reproducible? The very idea behind voldemort types is that you don't reference them directly in any way; they are just an implementation detail. To me it makes sense to apply that to debugging too (debugging deeply chained template types isn't really very usable anyway).