May 21, 2016
On Saturday, 21 May 2016 at 08:57:57 UTC, Johan Engelen wrote:
> On Saturday, 21 May 2016 at 06:18:05 UTC, poliklosio wrote:
>>
>> I have an idea for reproducible, less-than-exponential-time mangling, although I don't know how actionable it is.
>>
>> Leave mangling as is, but pretend that you are mangling something different, for example when the input is
>> foo!(boo!(bar!(baz!(int))), bar!(baz!(int)), baz!(int))
>> pretend that you are mangling
>> foo!(boo!(bar!(baz!(int))), #1, #2)
>> Where #1 and #2 are special symbols that refer to stuff that was **already in the name**, namely:
>> #1: bar!(baz!(int))
>> #2: baz!(int)
>
> See http://forum.dlang.org/post/szodxrizfmufqdkpdryc@forum.dlang.org

So if I understand correctly, you tried implementing something like this, but it didn't help much even with the size of the mangled names. Are you sure you were testing on the pathological case (the exponentially growing one), rather than just a bad one? Assuming your experiment is correct, something interesting is happening, and I'll be watching with interest as you all figure out what it is. :)
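
To make the idea concrete, here is a toy sketch (nothing DMD-specific; appendComponent and the #<offset> syntax are made up for illustration): before appending a component to the name being built, check whether the identical substring was emitted earlier, and if so emit a short reference to its offset instead.

import std.conv : to;
import std.string : indexOf;

string appendComponent(string mangled, string component)
{
    auto pos = mangled.indexOf(component);
    // Emit a back-reference only when it is shorter than the component.
    if (pos >= 0 && component.length > 1 + pos.to!string.length)
        return mangled ~ "#" ~ pos.to!string;
    return mangled ~ component;
}

unittest
{
    // "bar!(baz!(int))" already occurs at offset 10, so it becomes #10.
    auto m = appendComponent("foo!(boo!(bar!(baz!(int))), ", "bar!(baz!(int))");
    assert(m == "foo!(boo!(bar!(baz!(int))), #10");
}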
May 21, 2016
On Saturday, 21 May 2016 at 18:25:46 UTC, Walter Bright wrote:
> On 5/20/2016 11:18 PM, poliklosio wrote:
>> I have an idea for reproducible, less-than-exponential-time mangling, although I
>> don't know how actionable it is.
>>
>> Leave mangling as is, but pretend that you are mangling something different, for
>> example when the input is
>> foo!(boo!(bar!(baz!(int))), bar!(baz!(int)), baz!(int))
>> pretend that you are mangling
>> foo!(boo!(bar!(baz!(int))), #1, #2)
>> Where #1 and #2 are special symbols that refer to stuff that was **already in
>> the name**, namely:
>> #1: bar!(baz!(int))
>> #2: baz!(int)
>
> This is what LZW compression does, except that LZW does it in general rather than just for identifiers.

That's true, but a general-purpose compressor takes a stream of characters as input, so to compress something of exponential size you still need exponential runtime just to iterate over the input while compressing it.
The idea was to avoid that exponential iteration in the first place by doing some sort of caching.
My thinking is that if you only reduce the size but keep the exponential runtime, you will have to revisit the issue in a few months anyway, once people start using the features this enables more heavily.
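
By caching I mean something like this (a sketch over a toy type tree of my own invention, not the compiler's AST): mangle each distinct node exactly once, so shared subtrees -- the source of the blowup -- cost nothing after their first visit.

// Toy type tree; Type, memo, and mangle are all made up for illustration.
class Type
{
    string head;
    Type[] args;
    this(string head, Type[] args = null) { this.head = head; this.args = args; }
}

string[Type] memo; // node identity -> mangled fragment

string mangle(Type t)
{
    if (auto hit = t in memo)
        return *hit; // already mangled: a shared subtree costs O(1) here
    auto s = t.head;
    foreach (a; t.args)
        s ~= "!" ~ mangle(a);
    memo[t] = s;
    return s;
}

unittest
{
    // baz!(int) is referenced three times but visited only once.
    auto baz = new Type("baz", [new Type("int")]);
    auto bar = new Type("bar", [baz]);
    auto foo = new Type("foo", [new Type("boo", [bar]), bar, baz]);
    assert(mangle(foo) == "foo!boo!bar!baz!int!bar!baz!int!baz!int");
}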
Anyway I wish you good luck with all this!
May 21, 2016
On Saturday, 21 May 2016 at 18:25:46 UTC, Walter Bright wrote:
> On 5/20/2016 11:18 PM, poliklosio wrote:
>> foo!(boo!(bar!(baz!(int))), #1, #2)
>> Where #1 and #2 are special symbols that refer to stuff that was **already in
>> the name**, namely:
>> #1: bar!(baz!(int))
>> #2: baz!(int)
>
> This is what LZW compression does, except that LZW does it in general rather than just for identifiers.

It's more like LZ77 than LZW: the #n references point back into text that has already been emitted (a sliding window over the output), whereas LZW builds up a separate dictionary of codes as it goes.
May 21, 2016
On Saturday, 21 May 2016 at 17:34:19 UTC, Kagamin wrote:
> Equivalent to not mangling the return type at all. But this leaves you with 2^^n growth, still exponential. Also, is there any evidence that compression is faster than hashing?

 Depends on the implementation and algorithm. However, even the weakest compression settings can yield large initial benefits; in normal text a 2:1 reduction is typical. The advantage is that the compressed name still contains all of its information, whereas hashing reduces the name to a fixed length but throws away all that information, leaving only a unique identity.

 LZO is a compressor/decompressor that prioritizes speed; it can be used in real-time applications to add compression at very little cost. I tinkered with some of its options a while back and it did okay, not as good as zlib, but that's to be expected. Using zlib on its fastest setting will likely yield good results too.

 http://www.oberhumer.com/opensource/lzo/
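
Phobos makes that easy to try. A quick experiment with std.zlib (the compress call is real Phobos; the repetitive input string is synthetic), comparing the fastest and slowest settings:

import std.array : replicate;
import std.stdio : writefln;
import std.zlib : compress;

void main()
{
    // A long, repetitive, mangled-name-like string.
    auto name = "S3mod10__T3fooTiZ3fooFiZ4Vold".replicate(50);
    writefln("raw: %s bytes, zlib level 1: %s bytes, zlib level 9: %s bytes",
             name.length, compress(name, 1).length, compress(name, 9).length);
}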
May 21, 2016
On 5/21/2016 1:09 PM, Era Scarecrow wrote:
>  Depends on the implementation and algorithm. However, even the weakest
> compression settings can yield large initial benefits; in normal text a 2:1
> reduction is typical. The advantage is that the compressed name still
> contains all of its information, whereas hashing reduces the name to a fixed
> length but throws away all that information, leaving only a unique identity.
>
>  LZO is a compressor/decompressor that prioritizes speed; it can be used in
> real-time applications to add compression at very little cost. I tinkered
> with some of its options a while back and it did okay, not as good as zlib,
> but that's to be expected. Using zlib on its fastest setting will likely
> yield good results too.
>
>  http://www.oberhumer.com/opensource/lzo/

We already have a compressor in the compiler source for compressing names:

  https://github.com/dlang/dmd/blob/master/src/backend/compress.c

A faster one would certainly be nice. Anyone game?
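
If anyone is game, a minimal timing harness might look like this (std.zlib stands in for the candidate compressor, since compress.c's C interface isn't shown here; plug a binding to it, or to a replacement, in the same way):

import std.array : replicate;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;
import std.zlib : compress;

void main()
{
    // Repetitive, mangled-name-like input.
    auto name = "S3mod10__T3fooTiZ3fooFiZ4Vold".replicate(50);
    auto results = benchmark!(
        () { cast(void) compress(name, 1); },  // fastest zlib setting
        () { cast(void) compress(name, 9); }   // slowest zlib setting
    )(10_000);
    writefln("level 1: %s, level 9: %s", results[0], results[1]);
}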
May 21, 2016
On 5/21/2016 1:49 PM, Walter Bright wrote:
> We already have a compressor in the compiler source for compressing names:
>
>   https://github.com/dlang/dmd/blob/master/src/backend/compress.c
>
> A faster one would certainly be nice. Anyone game?

Note how well it does:

https://issues.dlang.org/show_bug.cgi?id=15831#c5
May 21, 2016
On Saturday, 21 May 2016 at 20:50:56 UTC, Walter Bright wrote:
> On 5/21/2016 1:49 PM, Walter Bright wrote:
>> We already have a compressor in the compiler source for compressing names:
>>
>>   https://github.com/dlang/dmd/blob/master/src/backend/compress.c
>>
>> A faster one would certainly be nice. Anyone game?
>
> Note how well it does:
>
> https://issues.dlang.org/show_bug.cgi?id=15831#c5

The underlying symbols are still growing exponentially. Nobody has the RAM for that, regardless of what size the resultant binary is.

Compressing the symbol names is a bandage. The compiler needs a new kidney.
May 22, 2016
On 05/21/2016 06:27 PM, Anon wrote:
> On Saturday, 21 May 2016 at 20:50:56 UTC, Walter Bright wrote:
>> On 5/21/2016 1:49 PM, Walter Bright wrote:
>>> We already have a compressor in the compiler source for compressing
>>> names:
>>>
>>> https://github.com/dlang/dmd/blob/master/src/backend/compress.c
>>>
>>> A faster one would certainly be nice. Anyone game?
>>
>> Note how well it does:
>>
>> https://issues.dlang.org/show_bug.cgi?id=15831#c5
>
> The underlying symbols are still growing exponentially. Nobody has the
> RAM for that, regardless of what size the resultant binary is.
>
> Compressing the symbol names is a bandage. The compiler needs a new kidney.

My understanding is that encoding "auto" return in the mangling makes any exponential growth disappear. Is that the case? -- Andrei
May 22, 2016
On 05/21/2016 03:13 PM, Kagamin wrote:
> On Saturday, 21 May 2016 at 18:18:21 UTC, Andrei Alexandrescu wrote:
>> He said that that won't happen any longer; the growth was because of
>> the return type. Is that correct?
>
> https://issues.dlang.org/show_bug.cgi?id=15831#c4
> Looks like the growth is due to the fact that the Voldemort type is in the
> scope of a function, and the function is fun!(T).fun(T) - hence each level
> multiplies the size by 2. And the return type is already not part of the
> mangled name - that would cause a circular dependency: you would need the
> Voldemort type's mangling to generate it.

Just to make sure I understand: do you agree or disagree that there's no more exponential growth if we encode "auto return" in the mangling? Thx! -- Andrei
May 22, 2016
On Sunday, 22 May 2016 at 21:40:07 UTC, Andrei Alexandrescu wrote:
> On 05/21/2016 06:27 PM, Anon wrote:
>> On Saturday, 21 May 2016 at 20:50:56 UTC, Walter Bright wrote:
>>> On 5/21/2016 1:49 PM, Walter Bright wrote:
>>>> We already have a compressor in the compiler source for compressing
>>>> names:
>>>>
>>>> https://github.com/dlang/dmd/blob/master/src/backend/compress.c
>>>>
>>>> A faster one would certainly be nice. Anyone game?
>>>
>>> Note how well it does:
>>>
>>> https://issues.dlang.org/show_bug.cgi?id=15831#c5
>>
>> The underlying symbols are still growing exponentially. Nobody has the
>> RAM for that, regardless of what size the resultant binary is.
>>
>> Compressing the symbol names is a bandage. The compiler needs a new kidney.
>
> My understanding is that encoding "auto" return in the mangling makes any exponential growth disappear. Is that the case? -- Andrei

No:

auto foo(T)(T x)
{
    // Vold is a Voldemort type: local to foo, nameable from outside
    // only through foo's inferred return type.
    struct Vold { T payload; }
    return Vold(x);
}

foo(5)
_D3mod10__T3fooTiZ3fooFNaNbNiNfiZS3mod10__T3fooTiZ3fooFiZ4Vold

foo(5).foo
_D3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFNaNbNiNfS3mod10__T3fooTiZ3fooFiZ4VoldZS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4Vold

foo(5).foo.foo
_D3mod94__T3fooTS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4VoldZ3fooFNaNbNiNfS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4VoldZS3mod94__T3fooTS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4VoldZ3fooFS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4VoldZ4Vold

Lengths: 62 | 174 | 398
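
Each length is twice the previous one plus a constant: 174 = 2 * 62 + 50 and 398 = 2 * 174 + 50. That's because every level embeds the previous level's Vold four times: as the template argument, as the parameter type, and twice inside the new Vold that serves as the return type. So the length follows L(n) = 2 * L(n-1) + c, which is 2^n growth.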

Just dropping the return types to a single character ($) shrinks the names, but it doesn't solve the core of the problem. Still exponential:

foo(5)
_D3mod1010__T3fooTiZ3fooFNaNbNiNfiZ($)

foo(5).foo
_D3mod38__T3fooT(S3mod10__T3fooTiZ3fooFiZ4Vold)Z3fooFNaNbNiNf(S3mod10__T3fooTiZ3fooFiZ4Vold)Z{$}

foo(5).foo.foo
_D3mod94__T3fooT{S3mod38__T3fooT(S3mod10__T3fooTiZ3fooFiZ4Vold)Z3fooF(S3mod10__T3fooTiZ3fooFiZ4Vold)Z4Vold}Z3fooFNaNbNiNf{S3mod38__T3fooT(S3mod10__T3fooTiZ3fooFiZ4Vold)Z3fooF(S3mod10__T3fooTiZ3fooFiZ4Vold)Z4Vold}Z$

Lengths: 36 | 90 | 202

Note: the part inside () is the return type of the first level; the part inside {} is the return type of the second. I left those delimiters in for illustrative purposes; they are not part of the name.
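
Even so, the lengths keep roughly doubling (90 is about 2 * 36, and 202 about 2 * 90): dropping the return type removes the two copies of the previous Vold it carried, but the template argument and the parameter type still embed the previous Vold once each, so the doubling remains.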