DMD producing huge binaries (page 7)

On Saturday, 21 May 2016 at 01:29:18 UTC, Observer wrote: > My recollection is that successively compiled binaries are rarely directly comparable, because of timestamps embedded by compilers. > So I have to ask: are there standard tools to understand enough > of the ELF binary format (or whatever, for the given platform) to compare everything except the timestamps? I just looked at an ELF format document, and didn't see any reference to timestamps. Maybe I'm confusing that with compression formats like gzip produces.

On Saturday, 21 May 2016 at 01:35:05 UTC, Bill Hicks wrote: > On Friday, 20 May 2016 at 19:41:16 UTC, Walter Bright wrote: >> Hashing produces reproducible unique values. Random will not be reproducible and may not even be unique. > > Just a troll here, but Quasirandom numbers are reproducible and unique. e.g., Sobol in base 2 can be very efficient and fast, for any output length. Unless there's a way to seed it consistently so order doesn't matter, it won't work. Although a fast CRC32 on the source could then give you a seed to work with, I'd prefer a hash over PRNG.

On Saturday, 21 May 2016 at 02:16:30 UTC, Era Scarecrow wrote: > > Unless there's a way to seed it consistently so order doesn't matter, it won't work. Although a fast CRC32 on the source could then give you a seed to work with, I'd prefer a hash over PRNG. There is no seed; it's a low-discrepancy sequence, and each element of the sequence can be computed independently.

May 21, 2016

Re: DMD producing huge binaries

Posted by poliklosio
in reply to Andrei Alexandrescu

Permalink

poliklosio

Posted in reply to Andrei Alexandrescu

Permalink

On Thursday, 19 May 2016 at 13:45:18 UTC, Andrei Alexandrescu wrote:
> On 05/19/2016 08:38 AM, Steven Schveighoffer wrote:
>> Yep. chain uses voldemort type, map does not.
>
> We definitely need to fix Voldemort types. Walter and I bounced a few ideas during DConf.
>
> 1. Do the cleartext/compress/hash troika everywhere (currently we do it on Windows). That should help in several places.
>
> 2. For Voldemort types, simply generate a GUID-style random string of e.g. 64 bytes. Currently the name of the type is composed out of its full scope, with some exponentially-increasing repetition of substrings. That would compress well, but it's just a lot of work to produce and then compress the name. A random string is cheap to produce and adequate for Voldemort types (you don't care for their name anyway... Voldemort... get it?).
>
> I very much advocate slapping a 64-long random string for all Voldermort returns and calling it a day. I bet Liran's code will get a lot quicker to build and smaller to boot.
>
>
> Andrei

I have an Idea of reproducible, less-than-exponential time mangling, although I don't know how actionable.

Leave mangling as is, but pretend that you are mangling something different, for example when the input is
foo!(boo!(bar!(baz!(int))), bar!(baz!(int)), baz!(int))
pretend that you are mangling
foo!(boo!(bar!(baz!(int))), #1, #2)
Where #1 and #2 are special symbols that refer to stuff that was **already in the name**, particularly:
#1: bar!(baz!(int))
#2: baz!(int)

Now, this is an reduction of the exponential computation time only if you can detect repetitions before you go inside them, for example, in case of #1, detect the existence of bar!(baz!(int)) without looking inside it and seeing baz!(int). Do you think it could be done?

You would also need a deterministic way to assign those symbols #1, #2, #3... to the stuff in the name.

Compression can also be done at the end if necessary to reduce the polynomial growth.

On Saturday, 21 May 2016 at 06:18:05 UTC, poliklosio wrote: > > I have an Idea of reproducible, less-than-exponential time mangling, although I don't know how actionable. > > Leave mangling as is, but pretend that you are mangling something different, for example when the input is > foo!(boo!(bar!(baz!(int))), bar!(baz!(int)), baz!(int)) > pretend that you are mangling > foo!(boo!(bar!(baz!(int))), #1, #2) > Where #1 and #2 are special symbols that refer to stuff that was **already in the name**, particularly: > #1: bar!(baz!(int)) > #2: baz!(int) See http://forum.dlang.org/post/szodxrizfmufqdkpdryc@forum.dlang.org

On Friday, 20 May 2016 at 19:37:23 UTC, Andrei Alexandrescu wrote: > Was talking to Walter on the phone and he just had one of those great ideas: encode in the function mangle that it returns "auto". I thought that was fantastic. Would that work? -- Andrei Equivalent to not mangling return type at all. But this leaves you with 2^^n growth, still exponential. Also is there any evidence that compression is faster than hashing?

On Saturday, 21 May 2016 at 17:34:19 UTC, Kagamin wrote: > On Friday, 20 May 2016 at 19:37:23 UTC, Andrei Alexandrescu wrote: >> Was talking to Walter on the phone and he just had one of those great ideas: encode in the function mangle that it returns "auto". I thought that was fantastic. Would that work? -- Andrei > > Equivalent to not mangling return type at all. But this leaves you with 2^^n growth, still exponential. Also is there any evidence that compression is faster than hashing? Would it be practical to use type-erasure in the middle of a call chain? It would cut 2^^n right at the knees at the possibly negligible runtime cost. It would impose a burden on the client (as I don't think a compiler should do it on its own - or even if it could), but if it's really Pareto distributed then it shouldn't be that difficult to find the hot spots.

On 05/21/2016 01:34 PM, Kagamin wrote: > But this leaves you with 2^^n growth, still exponential He said that that won't happen any longer, the growth was because of the return type. Is that correct? -- Andrei

On 5/20/2016 11:18 PM, poliklosio wrote: > I have an Idea of reproducible, less-than-exponential time mangling, although I > don't know how actionable. > > Leave mangling as is, but pretend that you are mangling something different, for > example when the input is > foo!(boo!(bar!(baz!(int))), bar!(baz!(int)), baz!(int)) > pretend that you are mangling > foo!(boo!(bar!(baz!(int))), #1, #2) > Where #1 and #2 are special symbols that refer to stuff that was **already in > the name**, particularly: > #1: bar!(baz!(int)) > #2: baz!(int) This is what LZW compression does, except that LZW does it in general rather than just for identifiers.

On Saturday, 21 May 2016 at 18:18:21 UTC, Andrei Alexandrescu wrote: > He said that that won't happen any longer, the growth was because of the return type. Is that correct? https://issues.dlang.org/show_bug.cgi?id=15831#c4 Looks like growth is due to the fact that the voldemort type is in the scope of a function and function is fun!(T).fun(T) - hence each level multiplies by 2. And return type is not part of the mangled name already - that would cause circular dependency, you would need the voldemort type mangling to generate it.

Forums