DMD producing huge binaries (page 4) - D Programming Language Discussion Forum

May 20, 2016

Re: DMD producing huge binaries

Posted by ZombineDev
in reply to ZombineDev

On Friday, 20 May 2016 at 12:01:14 UTC, ZombineDev wrote:
> On Friday, 20 May 2016 at 11:40:12 UTC, Rene Zwanenburg wrote:
>> On Friday, 20 May 2016 at 11:32:16 UTC, ZombineDev wrote:
>>> IMO, the best way forward is:
>>> + The compiler should lower voldemort types, according to the scheme that Steve suggested (http://forum.dlang.org/post/nhkmo7$ob5$1@digitalmars.com)
>>> + After that, during symbol generation (mangling) if a symbol starts getting larger than some threshold (e.g. 800 characters), the mangling algorithm should detect that and bail out by generating some unique id instead. The only valuable information that the symbol must include is the module name and location (line and column number) of the template instantiation.
>>
>> Location info shouldn't be used. This will break things like interface files and dynamic libraries.
>
> Well... template-heavy code doesn't play well with header files and dynamic libraries. Most of the time templates are used for the implementation of an interface, but template types such as ranges are unusable in function signatures. That's why they're called voldemort types - because they're the ones that can not/must not be named.
> Instead dynamic libraries should use stable types such as interfaces, arrays and function pointers, which don't have the aforementioned symbol size problems.
>
> Since we're using random numbers for symbols (instead of the actual names) it would not be possible for such symbols to be part of an interface, because a different invocation of the compiler would produce different symbol names. Such symbols should always be an implementation detail, and not part of an interface. That's why location info would play no role, except for debugging purposes.

@Rene
How do you expect the compiler to know the exact return type, only by looking at this signature:
auto transmogrify(string str);

A possible implementation might be this:
auto transmogrify(string str)
{
   return str.map!someFunc.filter!otherFunc.joiner();
}

or something completely different.
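
To make the question concrete, here is a minimal compilable variant of that idea. It is only a sketch: `toUpper` and `isAlpha` from `std.ascii` stand in for the hypothetical `someFunc` and `otherFunc`, the `joiner` step is left out, and the alias name `Hidden` is made up for illustration.

```d
import std.algorithm : filter, map;
import std.ascii : isAlpha, toUpper;

// The return type is a nested template instantiation produced by
// map and filter. It depends entirely on the function body, so a
// bare declaration `auto transmogrify(string str);` cannot reveal it.
auto transmogrify(string str)
{
    return str.map!toUpper.filter!isAlpha;
}

// Callers can only refer to the type indirectly, via typeof.
alias Hidden = typeof(transmogrify(""));

// Prints the compiler's name for the type at compile time.
pragma(msg, Hidden.stringof);
```

Changing the body (for example, adding another range adaptor) changes `Hidden`, and with it the mangled names of everything instantiated with that type.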

May 20, 2016

Re: DMD producing huge binaries

Posted by Johan Engelen
in reply to Andrei Alexandrescu

On Friday, 20 May 2016 at 10:54:18 UTC, Andrei Alexandrescu wrote:
> On 5/19/16 6:16 PM, Walter Bright wrote:
>> On 5/19/2016 6:45 AM, Andrei Alexandrescu wrote:
>>> I very much advocate slapping a 64-long random string for all
>>> Voldemort returns
>>> and calling it a day. I bet Liran's code will get a lot quicker to
>>> build and
>>> smaller to boot.
>>
>> Let's see how far we get with compression first.
>>
>>    https://github.com/dlang/dmd/pull/5793
>>
>> Using 64 character random strings will make symbolic debugging unpleasant.
>
> This is a fallacy. I don't think so, at all, when the baseline is an extremely long string.

I agree with Andrei.
I solved it this way https://github.com/ldc-developers/ldc/pull/1445:
"Hashed symbols look like this:
_D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ
ddemangle gives:
one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
Meaning: this symbol is defined in module one.two.three on line 34. The identifier is foo and is contained in the struct or class Result."
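
For readers who want to see how a name of that shape could be assembled, here is a rough sketch. It is not the code from the LDC PR: the helper names `lname` and `hashedSymbol` are hypothetical, and hashing the complete original mangled name with Phobos's MD5 is an assumption about what gets fed to the hasher.

```d
import std.conv : to;
import std.digest.md;

// Length-prefixed identifier, as used in D mangling: "foo" -> "3foo".
string lname(string id)
{
    return id.length.to!string ~ id;
}

// Builds _D <module parts> L<line> _<md5 hex> <trailing identifiers> Z.
// `fullMangle` stands for the original, possibly huge, mangled name.
string hashedSymbol(string[] modulePath, size_t line,
                    string fullMangle, string[] tail)
{
    string result = "_D";
    foreach (part; modulePath)
        result ~= lname(part);
    result ~= lname("L" ~ line.to!string);

    // 16-byte digest rendered as 32 lowercase hex characters.
    auto hex = toHexString!(LetterCase.lower)(md5Of(fullMangle));
    result ~= lname("_" ~ hex[].idup);

    foreach (part; tail)
        result ~= lname(part);
    return result ~ "Z";
}
```

Calling `hashedSymbol(["one", "two", "three"], 34, longName, ["Result", "foo"])` yields a name with the same layout as the example above. Its length stays fixed no matter how long `longName` is.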

May 20, 2016

Re: DMD producing huge binaries

Posted by Andrei Alexandrescu
in reply to Johan Engelen

On 05/20/2016 08:21 AM, Johan Engelen wrote:
> On Friday, 20 May 2016 at 10:54:18 UTC, Andrei Alexandrescu wrote:
>> On 5/19/16 6:16 PM, Walter Bright wrote:
>>> On 5/19/2016 6:45 AM, Andrei Alexandrescu wrote:
>>>> I very much advocate slapping a 64-long random string for all
>>>> Voldemort returns
>>>> and calling it a day. I bet Liran's code will get a lot quicker to
>>>> build and
>>>> smaller to boot.
>>>
>>> Let's see how far we get with compression first.
>>>
>>>    https://github.com/dlang/dmd/pull/5793
>>>
>>> Using 64 character random strings will make symbolic debugging
>>> unpleasant.
>>
>> This is a fallacy. I don't think so, at all, when the baseline is an
>> extremely long string.
>
> I agree with Andrei.
> I solved it this way https://github.com/ldc-developers/ldc/pull/1445:
> "Hashed symbols look like this:
> _D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ
> ddemangle gives:
> one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
> Meaning: this symbol is defined in module one.two.three on line 34. The
> identifier is foo and is contained in the struct or class Result."

This is nice. How difficult would it be to rework it into a PR for dmd? -- Andrei

May 20, 2016

Re: DMD producing huge binaries

Posted by ZombineDev
in reply to Johan Engelen

On Friday, 20 May 2016 at 12:21:58 UTC, Johan Engelen wrote:
> On Friday, 20 May 2016 at 10:54:18 UTC, Andrei Alexandrescu wrote:
>> On 5/19/16 6:16 PM, Walter Bright wrote:
>>> On 5/19/2016 6:45 AM, Andrei Alexandrescu wrote:
>>>> I very much advocate slapping a 64-long random string for all
>>>> Voldemort returns
>>>> and calling it a day. I bet Liran's code will get a lot quicker to
>>>> build and
>>>> smaller to boot.
>>>
>>> Let's see how far we get with compression first.
>>>
>>>    https://github.com/dlang/dmd/pull/5793
>>>
>>> Using 64 character random strings will make symbolic debugging unpleasant.
>>
>> This is a fallacy. I don't think so, at all, when the baseline is an extremely long string.
>
> I agree with Andrei.
> I solved it this way https://github.com/ldc-developers/ldc/pull/1445:
> "Hashed symbols look like this:
> _D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ
> ddemangle gives:
> one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
> Meaning: this symbol is defined in module one.two.three on line 34. The identifier is foo and is contained in the struct or class Result."

I like your approach. As I said earlier, it would be best if we can prevent the generation of long symbols in the first place, because that would improve the compilation times significantly. Walter's PR slows down the compilation by 25-40% according to my tests. I expect that compilation would be faster if the whole process is skipped altogether.

May 20, 2016

Re: DMD producing huge binaries

Posted by Johan Engelen
in reply to Andrei Alexandrescu

On Friday, 20 May 2016 at 12:30:10 UTC, Andrei Alexandrescu wrote:
> On 05/20/2016 08:21 AM, Johan Engelen wrote:
>> 
>> https://github.com/ldc-developers/ldc/pull/1445:
>> "Hashed symbols look like this:
>> _D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ
>> ddemangle gives:
>> one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
>> Meaning: this symbol is defined in module one.two.three on line 34. The
>> identifier is foo and is contained in the struct or class Result."
>
> This is nice. How difficult would it be to rework it into a PR for dmd? -- Andrei

I can work on it, but only if it will not result in a long debate afterwards (!!!).

One obstacle is the hasher itself: I am not going to implement one myself! In the LDC PR, I used LLVM's MD5 hasher and Phobos's MD5 hasher. Perhaps it is better to use a faster hasher (I have no expertise on that; Murmur?), so we will have to carry our own copy of a good hasher implementation. Or perhaps the speed limit is memory access and hash algorithm speed doesn't matter.

I made the hashing optional, with a symbol length threshold. Getting rid of the variable threshold would be good, such that the (few) large symbols in Phobos are hashed too and all will work fine. Perhaps 1k is a good threshold.
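
As a rough illustration of the threshold idea (not the PR's actual code), the sketch below leaves short names untouched and replaces only names longer than a limit. The function name `shortenIfLong`, the `_H` prefix and the default of 1024 are all assumptions made for the example.

```d
import std.digest.md;

// The "1k" mentioned above, taken here as 1024 characters (an assumption).
enum size_t hashThreshold = 1024;

// Returns `mangled` unchanged when it is short enough; otherwise
// returns a fixed-length stand-in derived from its MD5 digest.
string shortenIfLong(string mangled, size_t threshold = hashThreshold)
{
    if (mangled.length <= threshold)
        return mangled;

    auto hex = toHexString!(LetterCase.lower)(md5Of(mangled));
    return "_H" ~ hex[].idup; // hypothetical prefix + 32 hex characters
}
```

With a fixed threshold, every build applies the same rule, so the few long Phobos symbols get hashed the same way everywhere.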

May 20, 2016

Re: DMD producing huge binaries

Posted by Johan Engelen
in reply to ZombineDev

On Friday, 20 May 2016 at 12:57:40 UTC, ZombineDev wrote:
>
> As I said earlier, it would be best if we can prevent the generation of long symbols in the first place, because that would improve the compilation times significantly.

From what I've observed, generating the long symbol name itself is fast. If we avoid the deep type hierarchy, then I think you can indeed expect a compile-time improvement.

> Walter's PR slows down the compilation by 25-40% according to my tests. I expect that compilation would be faster if the whole process is skipped altogether.

MD5 hashing slowed down builds by a few percent for Weka (note: LDC machine codegen is slower than DMD's, so percentage-wise...), which can then be compensated for using PGO ;-)  /+  <-- shameless PGO plug  +/

May 20, 2016

Re: DMD producing huge binaries

Posted by Andrei Alexandrescu
in reply to Johan Engelen

On 05/20/2016 09:07 AM, Johan Engelen wrote:
> On Friday, 20 May 2016 at 12:30:10 UTC, Andrei Alexandrescu wrote:
>> On 05/20/2016 08:21 AM, Johan Engelen wrote:
>>>
>>> https://github.com/ldc-developers/ldc/pull/1445:
>>> "Hashed symbols look like this:
>>> _D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ
>>> ddemangle gives:
>>> one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
>>> Meaning: this symbol is defined in module one.two.three on line 34. The
>>> identifier is foo and is contained in the struct or class Result."
>>
>> This is nice. How difficult would it be to rework it into a PR for
>> dmd? -- Andrei
>
> I can work on it, but only if it will not result in a long debate
> afterwards (!!!).

Thanks. I'll get back to you on that.

> One obstacle is the hasher itself: I am not going to implement one
> myself! In the LDC PR, I used LLVM's MD5 hasher and Phobos's MD5 hasher.
> Perhaps it is better to use a faster hasher (I have no expertise on
> that; Murmur?), so we will have to carry our own copy of a good hasher
> implementation. Or perhaps the speed limit is memory access and hash
> algorithm speed doesn't matter.
>
> I made the hashing optional, with a symbol length threshold. Getting rid
> of the variable threshold would be good, such that the (few) large
> symbols in Phobos are hashed too and all will work fine. Perhaps 1k is a
> good threshold.

I don't see a need for hashing something. Would a randomly-generated string suffice?


Andrei

May 20, 2016

Re: DMD producing huge binaries

Posted by H. S. Teoh
in reply to Andrei Alexandrescu

On Fri, May 20, 2016 at 09:24:42AM -0400, Andrei Alexandrescu via Digitalmars-d wrote:
> On 05/20/2016 09:07 AM, Johan Engelen wrote:
[...]
> > One obstacle is the hasher itself: I am not going to implement one myself! In the LDC PR, I used LLVM's MD5 hasher and Phobos's MD5 hasher.  Perhaps it is better to use a faster hasher (I have no expertise on that; Murmur?), so we will have to carry our own copy of a good hasher implementation. Or perhaps the speed limit is memory access and hash algorithm speed doesn't matter.
> > 
> > I made the hashing optional, with a symbol length threshold. Getting rid of the variable threshold would be good, such that the (few) large symbols in Phobos are hashed too and all will work fine. Perhaps 1k is a good threshold.
> 
> I don't see a need for hashing something. Would a randomly-generated string suffice?
[...]

Wouldn't we want the same symbol to be generated if we call a Voldemort-returning function with the same compile-time arguments (but not necessarily runtime arguments) multiple times from different places? This is likely not an issue when compiling all sources at once, but may be a problem with incremental compilation.
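
The difference can be sketched in a few lines. This is an illustration only: `hashedId` and `randomId` are hypothetical names, and MD5 is just one possible hash. A hash depends only on the full symbol name, so two separate compiler runs arrive at the same identifier, whereas a random identifier is drawn fresh each time.

```d
import std.digest.md;
import std.random : uniform;

// Deterministic: the same full name always gives the same 32 hex digits,
// regardless of which compiler invocation computes it.
string hashedId(string fullName)
{
    auto hex = toHexString!(LetterCase.lower)(md5Of(fullName));
    return hex[].idup;
}

// Non-deterministic: every call draws new digits from the default
// random generator, so two object files would generally disagree.
string randomId()
{
    static immutable digits = "0123456789abcdef";
    char[32] buf;
    foreach (ref c; buf)
        c = digits[uniform(size_t(0), digits.length)];
    return buf[].idup;
}
```

If module A and module B are compiled separately and both instantiate the same template, `hashedId` gives them matching symbol names for the linker to merge. `randomId` gives no such guarantee.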


T

-- 
"Hi." "'Lo."

May 20, 2016

Re: DMD producing huge binaries

Posted by Marc Schütz
in reply to Andrei Alexandrescu

On Friday, 20 May 2016 at 13:24:42 UTC, Andrei Alexandrescu wrote:
> I don't see a need for hashing something. Would a randomly-generated string suffice?

That would break separate compilation, wouldn't it?

May 20, 2016

Re: DMD producing huge binaries

Posted by Rene Zwanenburg
in reply to ZombineDev

On Friday, 20 May 2016 at 12:08:37 UTC, ZombineDev wrote:
> @Rene
> How do you expect the compiler to know the exact return type, only by looking at this signature:
> auto transmogrify(string str);
>
> A possible implementation might be this:
> auto transmogrify(string str)
> {
>    return str.map!someFunc.filter!otherFunc.joiner();
> }
>
> or something completely different.

I was thinking of something along the lines of this:

=======
size_t frobnicate(int i)
{
	return 0;
}

auto frobnicator(T)(T t)
{
	static struct Result
	{
		int index;
		
		size_t front()
		{
			return frobnicate(index);
		}
		
		enum empty = false;
		
		void popFront()
		{
			++index;
		}
	}
	
	return Result(t.index);
}
=======

Automatically generating a header with DMD gives me:

=======
size_t frobnicate(int i);
auto frobnicator(T)(T t)
{
	static struct Result
	{
		int index;
		size_t front();
		enum empty = false;
		void popFront();
	}
	return Result(t.index);
}
=======

Now frobnicator returns a different type for the same T depending on whether you're using the .d or the .di file. I'm not sure if this is a problem, but it sounds like something that can come back to bite you in edge cases.
