DMD producing huge binaries (page 6)

On 5/20/2016 1:49 PM, Dicebot wrote:
> The question is: is it actually good for them to be reproducible?

1. yes, because the same type can be generated from multiple compilation units (the linker removes the duplicates - if they're not duplicates, the linker won't remove them).

2. having the compiler produce different .o files on different runs with the same inputs is pretty eyebrow raising, and makes it harder to debug the compiler, and harder to have a test suite for the compiler.

May 20, 2016

Re: DMD producing huge binaries

Posted by cym13
in reply to Dicebot


On Friday, 20 May 2016 at 20:49:20 UTC, Dicebot wrote:
> On Friday, 20 May 2016 at 19:41:16 UTC, Walter Bright wrote:
>> On 5/20/2016 6:24 AM, Andrei Alexandrescu wrote:
>>> I don't see a need for hashing something. Would a randomly-generated string
>>> suffice?
>>
>> Hashing produces reproducible unique values. Random will not be reproducible and may not even be unique.
>
> The question is: is it actually good for them to be reproducible? The very idea behind voldemort types is that you don't reference them directly in any way, it is just implementation detail. To me it does make sense to apply it to debugging too (debugging of deeply chained template types isn't really very usable anyway).

It would make binary comparison of libraries and executables difficult, which troubles me, as comparing hashes is a basic part of binary distribution security: you can check that a precompiled binary is legit by recompiling it under the same conditions and comparing the two. That would be much harder if random components were added.
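To make the comparison argument concrete, here is a minimal Python sketch (not D, and not how DMD works). Everything in it is an assumption for illustration: `fake_object_file` stands in for a compiled object, the `_D4Type` prefix is made up, and the two suffix functions model "hashed" versus "random" symbol naming.

```python
import hashlib
import secrets

def hashed_suffix(type_name: str) -> str:
    """Deterministic: the same type name always yields the same suffix."""
    return hashlib.sha256(type_name.encode("utf-8")).hexdigest()[:16]

def random_suffix(type_name: str) -> str:
    """Non-deterministic: ignores the input and draws fresh random bytes."""
    return secrets.token_hex(8)

def fake_object_file(type_name: str, suffix_fn) -> bytes:
    """Hypothetical stand-in for an object file: just one symbol name."""
    return ("_D4Type" + suffix_fn(type_name)).encode("ascii")

def digest(blob: bytes) -> str:
    """Checksum a distributor would publish for the binary."""
    return hashlib.sha256(blob).hexdigest()

type_name = "map!(filter!(Result))"  # illustrative type name only

# "Build" twice with each naming scheme and compare the checksums.
hashed_builds_match = (
    digest(fake_object_file(type_name, hashed_suffix))
    == digest(fake_object_file(type_name, hashed_suffix))
)
random_builds_match = (
    digest(fake_object_file(type_name, random_suffix))
    == digest(fake_object_file(type_name, random_suffix))
)

print("hashed names reproducible:", hashed_builds_match)
print("random names reproducible:", random_builds_match)
```

With hashed names the two "builds" are byte-identical, so their checksums agree; with random names they differ on every run even though the input is the same.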

On Fri, May 20, 2016 at 02:08:02PM -0700, Walter Bright via Digitalmars-d wrote:
[...]
> 2. having the compiler produce different .o files on different runs with the same inputs is pretty eyebrow raising, and makes it harder to debug the compiler, and harder to have a test suite for the compiler.

Yeah, I think hashing is the way to go. Random strings just raise more issues than they solve.

T

--
In theory, software is implemented according to the design that has been carefully worked out beforehand. In practice, design documents are written after the fact to describe the sorry mess that has gone on before.

On Friday, 20 May 2016 at 19:37:23 UTC, Andrei Alexandrescu wrote:
> On 5/20/16 2:34 PM, Georgi D wrote:
>> 1) Exponential growth of symbol name with voldemort types.
>> I like Steven's solution where the compiler lowers the struct outside of
>> the method.
>
> Was talking to Walter on the phone and he just had one of those great ideas: encode in the function mangle that it returns "auto". I thought that was fantastic. Would that work? -- Andrei

This is a very interesting idea. I see one problem though: The real issue is not just the return type but also the types of the input parameters to a function. So even if the return type of a method is encoded as "auto" the moment the result of that method is passed as parameter to another template method the long typename will resurface. I do not think the type of the input parameters could be encoded as "auto" since the different instances of a template method will clash in names.

In essence the problem that should be solved is the long names of types.
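A toy model of the parameter-type problem, sketched in Python. This is not DMD's real mangling scheme: the `wrap` function and the `wrap!(T).wrap(T).Result` naming are assumptions chosen only to show why a name that embeds its input type twice (once as template argument, once as parameter) grows exponentially with chain depth, even when the return type is left out of the function's mangle.

```python
def wrap(type_name: str) -> str:
    """Hypothetical name of a Voldemort type returned by wrap!(T).wrap(T).

    The return type is deliberately NOT part of the function's name here
    (modelling the 'encode it as auto' idea); only the input type is.
    """
    return f"wrap!({type_name}).wrap({type_name}).Result"

def chained_lengths(depth: int) -> list[int]:
    """Name lengths after feeding each result into the next call."""
    name = "int"
    lengths = [len(name)]
    for _ in range(depth):
        name = wrap(name)
        lengths.append(len(name))
    return lengths

lengths = chained_lengths(10)
for level, size in enumerate(lengths):
    print(f"depth {level:2d}: {size} chars")
```

Each level contains the previous name twice plus a fixed amount of punctuation, so the length roughly doubles per call in this model. That is the "long typename will resurface" effect: hiding the return type does not help once the result is used as an input.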

On Friday, 20 May 2016 at 22:01:21 UTC, Georgi D wrote:
> This is a very interesting idea. I see one problem though: The real issue is not just the return type but also the types of the input parameters to a function. So even if the return type of a method is encoded as "auto" the moment the result of that method is passed as parameter to another template method the long typename will resurface. I do not think the type of the input parameters could be encoded as "auto" since the different instances of a template method will clash in names.
>
> In essence the problem that should be solved is the long names of types.

I keep having it go through my head: if we had to hash the result, I'd prefer to hash the source (minus comments & whitespace, and after replacements have been done) to come up with a reproducible value. This of course is if long names or compression won't do the job.

On Friday, 20 May 2016 at 19:45:36 UTC, Walter Bright wrote:
>
> Hashing isn't algorithmically cheap, either.

I also don't think compression should be a performance issue. I heard that some compression algorithms are as fast as the data comes in, so fast enough for our purpose.

The reason I chose hashing is simplicity and a guaranteed small symbol size. I wasn't sure whether a 5MB symbol would compress to a reasonable size. I wanted symbols to be readable; so less than, say, 80 chars.

For people interested in finding out whether mangling changes will improve compile speed performance: forget about linking and mangling, and just assign symbol names like "a", "b", "c", etc., and see what happens.
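A rough way to try the size question in Python, under stated assumptions: `nested_name` builds a made-up, highly repetitive symbol (not a real DMD mangle), zlib stands in for "some compression algorithm", and SHA-256 stands in for "some hash". Nothing here reflects what the compiler actually does.

```python
import hashlib
import zlib

def nested_name(depth: int) -> str:
    """Build a hypothetical long symbol that embeds itself twice per level."""
    name = "int"
    for _ in range(depth):
        name = f"wrap!({name}).wrap({name}).Result"
    return name

long_name = nested_name(12)
raw = long_name.encode("ascii")

# Compression: output size depends on the input and is not bounded up front.
compressed = zlib.compress(raw, 9)

# Hashing: output size is fixed no matter how long the input is.
digest = hashlib.sha256(raw).hexdigest()

print("original symbol length:", len(raw))
print("zlib-compressed length:", len(compressed))
print("sha256 hex digest length:", len(digest))
```

The hash always comes out at 64 hex characters, which fits the "less than, say, 80 chars" goal by construction. The compressed size has to be measured per symbol, but compression is reversible, so the original name can still be recovered from it, which a hash cannot offer.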

On 05/20/2016 06:01 PM, Georgi D wrote:
> In essence the problem that should be solved is the long names of types.

Voldemort returns are by far the worst. Compression and hashing should handle the rest. -- Andrei

On Friday, 20 May 2016 at 21:09:23 UTC, cym13 wrote:
> It would make binary comparison of libraries and executables difficult, which troubles me, as comparing hashes is a basic part of binary distribution security: you can check that a precompiled binary is legit by recompiling it under the same conditions and comparing the two. That would be much harder if random components were added.

My recollection is that successively compiled binaries are rarely directly comparable, because of timestamps embedded by compilers. So I have to ask: are there standard tools that understand enough of the ELF binary format (or whatever applies on the given platform) to compare everything except the timestamps?

On Friday, 20 May 2016 at 19:41:16 UTC, Walter Bright wrote:
> On 5/20/2016 6:24 AM, Andrei Alexandrescu wrote:
>> I don't see a need for hashing something. Would a randomly-generated string
>> suffice?
>
> Hashing produces reproducible unique values. Random will not be reproducible and may not even be unique.

Just a troll here, but quasirandom numbers are reproducible and unique. E.g., Sobol in base 2 can be very efficient and fast, for any output length.
