March 26, 2016
On 03/25/2016 11:40 AM, Steven Schveighoffer wrote:
> On 3/25/16 11:07 AM, Andrei Alexandrescu wrote:
>> On 3/25/16 10:07 AM, Steven Schveighoffer wrote:
>>>
>>> We should actually be moving *away from* voldemort types:
>>>
>>> https://forum.dlang.org/post/n96k3g$ka5$1@digitalmars.com
>>>
>>
>> Has this bug been submitted? -- Andrei
>
> I'm not sure it's a bug that can be fixed. It's caused by the design of
> the way template name mangling is included.
>
> I can submit a general "enhancement", but I don't know what it would
> say? Make template mangling more efficient? :)
>
> I suppose having a bug report with a demonstration of why we should
> change it is a good thing. I'll add that.
>
> -Steve

Compression of template names. -- Andrei
March 26, 2016
On 03/25/2016 10:30 AM, Adam D. Ruppe wrote:
> On Friday, 25 March 2016 at 14:07:41 UTC, Steven Schveighoffer wrote:
>> We should actually be moving *away from* voldemort types:
>
> Indeed, they basically suck.

I love those things! -- Andrei

March 25, 2016
On Saturday, March 26, 2016 01:24:11 Andrei Alexandrescu via Digitalmars-d wrote:
> On 03/25/2016 10:30 AM, Adam D. Ruppe wrote:
> > On Friday, 25 March 2016 at 14:07:41 UTC, Steven Schveighoffer wrote:
> >> We should actually be moving *away from* voldemort types:
> > Indeed, they basically suck.
>
> I love those things! -- Andrei

They're very cool in principle but have some issues in practice. As long as we can reasonably fix those issues, I think that we should continue to use them, but if we can't, then we'll have to reconsider using them. But I hope that we can solve the issues. It would certainly be a shame if we couldn't.

- Jonathan M Davis

March 26, 2016
On 3/26/16 1:24 AM, Andrei Alexandrescu wrote:
> On 03/25/2016 10:30 AM, Adam D. Ruppe wrote:
>> On Friday, 25 March 2016 at 14:07:41 UTC, Steven Schveighoffer wrote:
>>> We should actually be moving *away from* voldemort types:
>>
>> Indeed, they basically suck.
>
> I love those things! -- Andrei
>

I prefer voldemort types too! To repeat all the template machinery in an external struct is annoying. So nice to just put "struct X" inside the function.

But clearly, not worth the pain if this can't be fixed.

-Steve
March 26, 2016
On Saturday, 26 March 2016 at 05:22:56 UTC, Andrei Alexandrescu wrote:
> On 03/25/2016 11:40 AM, Steven Schveighoffer wrote:
>> On 3/25/16 11:07 AM, Andrei Alexandrescu wrote:
>>> On 3/25/16 10:07 AM, Steven Schveighoffer wrote:
>>>>
>>>> We should actually be moving *away from* voldemort types:
>>>>
>>>> https://forum.dlang.org/post/n96k3g$ka5$1@digitalmars.com
>>>>
>>>
>>> Has this bug been submitted? -- Andrei
>>
>> I'm not sure it's a bug that can be fixed. It's caused by the design of
>> the way template name mangling is included.
>>
>> I can submit a general "enhancement", but I don't know what it would
>> say? Make template mangling more efficient? :)
>>
>> I suppose having a bug report with a demonstration of why we should
>> change it is a good thing. I'll add that.
>>
>> -Steve
>
> Compression of template names. -- Andrei

Would literally not help. The problem in the bug report is recursive expansion of templates creating mangled name length O(2^n) where n is the number of recursive calls. If you compress that to an eighth of its size, you get O(2^(n-3)), which isn't actually fixing things, as that is still O(2^n). The (conceptually) simple change I suggested brings the mangled name length down to O(n).

You could compress *that*, but then you are compressing such a small amount of data most compression algorithms will cause the size to grow, not shrink.
March 27, 2016
On Saturday, 26 March 2016 at 05:22:56 UTC, Andrei Alexandrescu wrote:
> Compression of template names. -- Andrei

Compression in the usual sense won't help. Sure, it might reduce the object file size, but the full string will again have to be generated first, still requiring absurd amounts time and space. The latter is definitely not negligible for symbol names of several hundreds of kilobytes; it shows up prominently in compiler profiles of affected Weka builds.

 — David
March 30, 2016
On Sunday, 27 March 2016 at 17:01:39 UTC, David Nadlinger wrote:
> Compression in the usual sense won't help. Sure, it might reduce the object file size, but the full string will again have to be generated first, still requiring absurd amounts time and space. The latter is definitely not negligible for symbol names of several hundreds of kilobytes; it shows up prominently in compiler profiles of affected Weka builds.

We love Voldemort types at Weka, and use them a lot in our non-gc-allocating ranges and algorithm libraries. Also, we liberally nest templates inside of other templates.
I don't think we can do many of the things we do if we had to define everything at module level. This flexibility is amazing for us and part of the reason we love D.

But, as David said -- it comes with a great price for us.

I just processed our biggest executable, and came up with the following numbers:
total symbols: 99649
Symbols longer than 1k: 9639
Symbols longer than 500k: 102
Symbols longer than 1M: 62. The longest symbols are about 5M bytes!

This affects our exe sizes in a terrible way, and also increases our compile and link times considerably. I will only be able to come up with statistics of how much time was wasted due to too-long-symbols after we fix it, but obviously this is a major problem for us.

I think we should try the solution proposed by Anon, as it has a good possibility of saving quite a bit.
It's important to make sure that when a template is given as a template parameter, the complete template is treated as the LName.

Thinking about the compression idea by Andrei, I think we get such long names since we have huge symbols that are being passed as Voldemort names to template parameters. Then we repeat the huge symbols several times in the new template.
Think of a .5M symbol passed few times to a template, this is probably how we get to 5M size symbols.
This could end up being too complex, but if we assign "huffman coding" like names to the complete template names in a module scope (lets say, only if the template name is longer than 30 bytes), we then will be able to replace a very long string by the huffman coded version coupled with the LName+Number idea above, we will be able to shorten symbol names considerably.

An initial implementation could start with just the LName# solution, and then we can see if we also have to recursively couple it with huffman-coding of the results template names.

Liran
March 31, 2016
On Saturday, 26 March 2016 at 17:42:06 UTC, Anon wrote:
>
> The (conceptually) simple change I suggested brings the mangled name length down to O(n).

Hi Anon,
  I've started implementing your idea. But perhaps you already have a beginning of an implementation? If so, please contact me :)
https://github.com/JohanEngelen

Thanks,
  Johan
March 31, 2016
On 3/30/16 3:19 PM, Liran Zvibel wrote:
> On Sunday, 27 March 2016 at 17:01:39 UTC, David Nadlinger wrote:
>> Compression in the usual sense won't help. Sure, it might reduce the
>> object file size, but the full string will again have to be generated
>> first, still requiring absurd amounts time and space. The latter is
>> definitely not negligible for symbol names of several hundreds of
>> kilobytes; it shows up prominently in compiler profiles of affected
>> Weka builds.
>
> We love Voldemort types at Weka, and use them a lot in our
> non-gc-allocating ranges and algorithm libraries. Also, we liberally
> nest templates inside of other templates.
> I don't think we can do many of the things we do if we had to define
> everything at module level. This flexibility is amazing for us and part
> of the reason we love D.

Voldemort types are what cause the bloat, templates inside templates aren't as much of a problem. It's because the Voldemort type has to include in its symbol name at least twice, and I think 3 times actually (given the return type), the template parameter/function parameter types of the function it resides in. If the template is just a template, it's just included once. This is why moving the type outside the function is effective at mitigation. It's linear growth vs. exponential.

I too like Voldemort types, but I actually found moving the types outside the functions quite straightforward. It's just annoying to have to repeat the template parameters. If you make them private, then you can simply avoid all the constraints. It's a bad leak of implementation, since now anything in the file has access to that type directly, but it's better than the issues with voldemort types.

See the update to my iopipe library here: https://github.com/schveiguy/iopipe/commit/1b0696dc82fce500c6b314ec3d8e5e11e0c1bcd7

This one commit made my example program 'convert' (https://github.com/schveiguy/iopipe/blob/master/examples/convert/convert.d) save over 90% binary size (went from 10MB to <1MB).

This also calmed down some REALLY horrible stack traces when I was debugging. As in, I could actually understand what function it was talking about, and it didn't take 10 seconds to print stack trace.

>
> But, as David said -- it comes with a great price for us.
>
> I just processed our biggest executable, and came up with the following
> numbers:
> total symbols: 99649
> Symbols longer than 1k: 9639
> Symbols longer than 500k: 102
> Symbols longer than 1M: 62. The longest symbols are about 5M bytes!
>
> This affects our exe sizes in a terrible way, and also increases our
> compile and link times considerably. I will only be able to come up with
> statistics of how much time was wasted due to too-long-symbols after we
> fix it, but obviously this is a major problem for us.

From my testing, it doesn't take much to get to the point where the linker is unusable. A simple struct when nested in 15 calls to a function makes the linker take an unreasonable amount of time (over 1.5 minutes, I didn't wait to see how long). See my bug report for details.

Another factor in the name length is the module name which is included in every type and function. So you have a factor like 3^15 for the name, but then you multiply this by the module names as well.

> I think we should try the solution proposed by Anon, as it has a good
> possibility of saving quite a bit.
> It's important to make sure that when a template is given as a template
> parameter, the complete template is treated as the LName.

I hope this is given serious thought, looks like someone has already started implementation.

Anon, it appears that your mechanism has been well received by a few knowledgeable people here. I encourage you to solidify your proposal in a DIP (D improvement proposal) here: http://wiki.dlang.org/DIPs.

-Steve
March 31, 2016
On Thursday, 31 March 2016 at 13:10:49 UTC, Steven Schveighoffer wrote:
> Voldemort types are what cause the bloat, templates inside templates aren't as much of a problem.


So here's an idea of other things don't work out: voldemort types don't have a name that can be said... that could be true in the mangle too.

We could potentially just hash it to some fixed length. Take the existing name, SHA1 it, and call the mangle function_containing_type$that_hash. Demangling the inside is useless anyway, so we lose nothing from that.

For example, a function foo.bar() returning a voldemort type might have the mangle:

_D3foo3barv$55ca6286e3e4f4fba5d0448333fa99fc5a404a73

where that's the hash of whatever filth the compiler makes up for it.

and methods on the voldemort type would start with that too:

_D503foo3barv$55ca6286e3e4f4fba5d0448333fa99fc5a404a735empty

That's getting messy to see as an example, but it took the name of the function + the voldemort as a whole to be the public name of the type.

A demangler could scan that for the $ and recognize it as ugliness and slice it off, printing the name as a somewhat human-readable

foo.bar().empty

perhaps, so we can see it is a method on the return value of that function. It isn't completely impossible like demanging a whole hash; should be short in the binary and readable enough to make sense of in a stack trace.



I actually recall something like this being done at some point in the past, but I don't remember the details. I think it just truncated the name though, it didn't attempt to remain semi-reversible.



Of course, a chain of voldemort types will still include that type as a template argument in the next call, so names as a whole can still be very long, but how long are your chains? I can see this becoming a 10 KB long name in extreme circumstances (still yikes) but not megabytes.


And, still, having to run over the names to build the hash is going to be a memory/speed thing in the compiler, but SHA1ing a few megabytes isn't as slow as writing it out to disk over and over again.



Another thing to consider is what my D to Javascript thing did: abbreviate EVERYTHING in the object file. You can't demangle that at all, but it leads to much smaller binaries. This probably isn't that useful for D in general, but might solve Weka's problem since they can use internal hacks.