May 23, 2016
On 22.05.2016 00:07, Walter Bright wrote:
> On 5/21/2016 2:37 PM, Timon Gehr wrote:
>> Why is longest_match Ω(nm) instead of O(n+m) (e.g. KMP)?
>
> I don't understand the terms you use, but as to the "why" it is based on
> what I knew about LZ77 compression.  I don't pretend to be an expert on
> compression, and this is what I churned out. As mentioned elsewhere, the
> C++ mangling scheme has a primitive and ineffective LZ77 scheme built
> in, so I wouldn't waste time on that.
>
> A quick google search on finding the longest match in a string did not
> turn up anything obvious.
> ...

E.g. the Knuth-Morris-Pratt (KMP) string search algorithm can be modified to compute the longest match instead of a full match (it just efficiently builds and runs a finite state machine whose state is the length of the longest match ending at the current position).

https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

Implementation:

auto longestMatch(string x,string y){
    // Only a prefix of y no longer than x can match inside x.
    if(y.length>x.length) y=y[0..x.length];
    // KMP failure table: f[i] is the length of the longest proper
    // border of y[0..i]; f[0]=-1 serves as a sentinel (~j is 0 iff j==-1).
    auto f=new size_t[y.length+1];
    for(size_t i=0,j=f[0]=-1;i<y.length;f[++i]=++j)
        while(~j&&y[i]!=y[j]) j=f[j];
    // Scan x; j tracks the length of the longest prefix of y that is
    // a suffix of the part of x read so far.
    auto r=x[0..0];
    for(size_t i=0,j=0;i<x.length&&j<y.length;++i,++j){
        while(~j&&x[i]!=y[j]) j=f[j];
        if(j+1>r.length) r=x[i-j..i+1]; // strictly longer: keeps leftmost
    }
    return r;
}

This returns a slice of x representing the leftmost longest match with y. Running time is O(x.length). (In practice, if this should be really fast, the allocation for 'f' should probably be shared among multiple calls.)

(This probably only improves running time in case there are sufficiently many sufficiently long matches.)

But this just improves longest_match. It should be possible to bring down the running time of the entire compression algorithm significantly using a suffix tree (the resulting running time bound is linear in the input size and independent of the window size).
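For concreteness, here is a sketch of the kind of structure meant (in Python, since the data structure is language-agnostic) using a suffix automaton, a close relative of the suffix tree with the same asymptotics: after an O(n) preprocessing pass over the window, the longest match for a lookahead of length m is found in O(m), independent of the window size. Class and method names are made up for illustration.

```python
class SuffixAutomaton:
    """Suffix automaton of a fixed text (the "window"); built in O(len(text))."""

    def __init__(self, text):
        self.link = [-1]    # suffix links
        self.length = [0]   # length of the longest string reaching each state
        self.trans = [{}]   # outgoing transitions per state
        last = 0
        for c in text:
            cur = len(self.length)
            self.length.append(self.length[last] + 1)
            self.link.append(-1)
            self.trans.append({})
            p = last
            while p != -1 and c not in self.trans[p]:
                self.trans[p][c] = cur
                p = self.link[p]
            if p == -1:
                self.link[cur] = 0
            else:
                q = self.trans[p][c]
                if self.length[p] + 1 == self.length[q]:
                    self.link[cur] = q
                else:  # split state q by inserting a clone
                    clone = len(self.length)
                    self.length.append(self.length[p] + 1)
                    self.link.append(self.link[q])
                    self.trans.append(dict(self.trans[q]))
                    while p != -1 and self.trans[p].get(c) == q:
                        self.trans[p][c] = clone
                        p = self.link[p]
                    self.link[q] = self.link[cur] = clone
            last = cur

    def longest_match(self, query):
        """Longest substring of the text occurring in query; O(len(query))."""
        state = cur = best = end = 0
        for i, c in enumerate(query):
            while state and c not in self.trans[state]:
                state = self.link[state]      # fall back to a shorter match
                cur = self.length[state]
            if c in self.trans[state]:
                state = self.trans[state][c]
                cur += 1
            else:
                cur = 0                       # no suffix extends by c
            if cur > best:
                best, end = cur, i + 1
        return query[end - best:end]
```

The window is preprocessed once and then queried many times, which is where the savings over a per-call scan come from.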


> If you want to throw your hat (i.e. expertise) into the ring and post a
> faster compressor, please do so!

As far as I understand, the use case for this is compressing mangled names? I don't think that generating huge mangled names explicitly just to compress them afterwards is a viable strategy. This process throws away a lot of structure information during mangling that could be useful for fast and effective compression.

Instead, compression should be performed while generating the mangled string. As far as I understand, mangled names only grow unmanageably large because the same symbols are mangled into them multiple times? Isn't it enough to create references to previously embedded mangled symbols (i-th symbol already mangled) while generating the mangled string?
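As a sketch of such a back-reference scheme (the "Q" marker and the exact encoding are made up for illustration, loosely in the spirit of the Itanium C++ ABI's substitutions): emit each new component length-prefixed, and every repeat as a reference to its first occurrence.

```python
def mangle(components):
    """Mangle a sequence of component names: each new component is emitted
    length-prefixed (D-style), every repeat becomes a back-reference
    "Q<index>".  The "Q" encoding is hypothetical."""
    seen = {}   # component -> index of its first occurrence
    out = []
    for comp in components:
        if comp in seen:
            out.append("Q%d" % seen[comp])
        else:
            seen[comp] = len(seen)
            out.append("%d%s" % (len(comp), comp))
    return "".join(out)

print(mangle(["std", "algorithm", "map", "std", "range"]))
# -> 3std9algorithm3mapQ05range  (the repeated "std" collapses to "Q0")
```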
May 23, 2016
On 5/23/2016 12:32 PM, Timon Gehr wrote:
> Instead, compression should be performed while generating the mangled string.

The trouble with that is the mangled string is formed from component pieces. Those component pieces may have common substrings with each other, which won't be detected until they are merged, when you're back to dealing with the string as a whole.


> Isn't it enough to create
> references to previously embedded mangled symbols (i-th symbol already mangled)
> while generating the mangled string?

That's been covered upthread, and VC++/g++ do that. But as shown, it is inadequate. There's a lot more redundancy than just the identifiers.
May 23, 2016
On 5/23/2016 9:00 AM, Stefan Koch wrote:
> On Monday, 23 May 2016 at 15:33:45 UTC, Walter Bright wrote:
>>
>> Also, the LZ4 compressor posted here has a 64K string limit, which won't work
>> for D because there are reported 8Mb identifier strings.
>
> This is only partially true.
> The 64k limit does not apply to the input string.

The starting line of the compression function posted here tests for it and aborts if larger.

May 23, 2016
On 23.05.2016 22:34, Walter Bright wrote:
> On 5/23/2016 12:32 PM, Timon Gehr wrote:
>> Instead, compression should be performed while generating the mangled
>> string.
>
> The trouble with that is the mangled string is formed from component
> pieces.

Then don't do that. I.e. re-mangle recursively from scratch for each mangled name and allow sharing parts between unrelated components within that mangled name. (That's similar to how the compiler stores the structure in memory, which also avoids the exponential blowup.)
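A rough sketch of the sharing meant here (names and the "Q" encoding are hypothetical; the real mangler would walk compiler AST nodes): mangle each distinct subtree once, memoized, and emit a back-reference on every later occurrence, so the output stays linear in the number of distinct subtrees even when the expanded string would blow up exponentially.

```python
def mangle(root):
    """Mangle a type tree given as nested tuples ("Name", child, ...).
    Every distinct subtree is emitted exactly once; any later occurrence
    becomes a back-reference "Q<index>", so a shared subtree costs O(1)
    instead of its full (possibly exponential) expansion."""
    table, out = {}, []

    def go(node):
        if node in table:
            out.append("Q%d" % table[node])
            return
        name, children = node[0], node[1:]
        out.append("%d%s" % (len(name), name))
        for child in children:
            go(child)
        table[node] = len(table)

    go(root)
    return "".join(out)

# A tree whose expansion doubles per level stays linear in the mangling:
t = ("int",)
for _ in range(3):
    t = ("Pair", t, t)
print(mangle(t))
# -> 4Pair4Pair4Pair3intQ0Q1Q2  (8 "int" leaves collapse to one + refs)
```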

> Those component pieces may have common substrings with each
> other, which won't be detected until they are merged,
> when you're back to dealing with the string as a whole.
>
>
>> Isn't it enough to create
>> references to previously embedded mangled symbols (i-th symbol already
>> mangled)
>> while generating the mangled string?
>
> That's been covered upthread, and VC++/g++ do that. But as shown, it is
> inadequate. There's a lot more redundancy than just the identifiers.

Yes, but I think that should be covered (up to constants) by what I suggested above?

You can't have exponential growth before compression. Any scheme that does not prevent running time exponential in template instantiation depth is also inadequate in the long run.
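The blowup meant here can be seen with a toy recurrence (constants made up; only the doubling matters): if each instantiation level embeds the full mangled name of its argument twice plus a fixed header, the name length doubles per level, and no compressor applied afterwards can recover the time and memory already spent producing the string.

```python
def naive_len(depth, leaf=3, header=5):
    """Length of a fully expanded mangled name when each instantiation
    level embeds its argument's full name twice plus a fixed-size header.
    Closed form: (leaf + header) * 2**depth - header."""
    n = leaf
    for _ in range(depth):
        n = 2 * n + header
    return n

print(naive_len(30))   # 8589934587: roughly 8.6 GB of name at depth 30
```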
May 23, 2016
On 5/23/2016 2:17 PM, Timon Gehr wrote:
> Then don't do that. I.e. re-mangle recursively from scratch for each mangled
> name and allow sharing parts between unrelated components within that mangled
> name.

How is that essentially different from running a compression pass? The only real problem with the compressor was the compression speed, and the LZ4 test shows this is solvable.


> (That's similar to how the compiler stores the structure in memory, which
> also avoids the exponential blowup.)

No, it doesn't.


> You can't have exponential growth before compression.

The compiler is designed so that the internal names are pretty much write-only, uniquely hashed by their address. It's not spending time running strlen()'s on them.


> Any scheme that does not prevent running time exponential in template instantiation depth is also
> inadequate in the long run.

Most algorithms seem to have pathological worst cases, but as long as they are rare, it doesn't impede their utility.

May 23, 2016
Just a heads up on the LZ4.
I have spent roughly 3 hours optimizing my decompressor.
And while I had stunning success, a speed-up of about 400%,
I am still about 600x slower than the C variant.

It is still a mystery to me why that is :)
Since the generated code is both smaller and works almost without spills.

May 23, 2016
On 5/23/2016 4:49 PM, Stefan Koch wrote:
> Just a heads up on the LZ4.
> I have spent roughly 3 hours optimizing my decompressor.
> And while I had stunning success, a speed-up of about 400%,
> I am still about 600x slower than the C variant.
>
> It is still a mystery to me why that is :)
> Since the generated code is both smaller and works almost without spills.


I'm not sure why you're working on a decompressor. The speed needs to be in the compressor.
May 24, 2016
On Monday, 23 May 2016 at 20:36:04 UTC, Walter Bright wrote:
> On 5/23/2016 9:00 AM, Stefan Koch wrote:
>> The 64k limit does not apply to the input string.
>
> The starting line of the compression function posted here tests for it and aborts if larger.

So Walter, big question: how big should the input be before the compressor kicks in? What's the minimum size of the input string the compressor should expect?

Found something VERY interesting: I have 4 algorithms I'm benchmarking (the original and 3 edited compression routines). Most of them perform at similar speed (a 15-25% boost), but when I tested them by compressing about 20k of raw D code (as a large example), an obvious winner among the 4 emerged, performing 5x faster than the original id_compress. This gain in speed wasn't apparent with a smaller string (1k or so), so assuming I'm getting good results, this is very promising.

 id_compress: 1379ms (1379331μs), 22595 -> 8458 bytes
big_compress: 1186ms (1186608μs), 22595 -> 6784 bytes
  id_reduced:  626ms  (626750μs), 22595 -> 9618 bytes
min_compress:  262ms  (262340μs), 22595 -> 9266 bytes

 I assume speed is more important than a little loss in compression.
May 24, 2016
On 24.05.2016 01:02, Walter Bright wrote:
> On 5/23/2016 2:17 PM, Timon Gehr wrote:
>> Then don't do that. I.e. re-mangle recursively from scratch for each
>> mangled
>> name and allow sharing parts between unrelated components within that
>> mangled
>> name.
>
> How is that essentially different from running a compression pass?

Speed. The title of this thread is "Need a faster compressor".

> The
> only real problem with the compressor was the compression speed, and the
> LZ4 test shows this is solvable.
> ...

No. Just increase the recursion depth by a small number of levels to completely kill the speedups gained by using a faster compression algorithm.

>
>> (That's similar to how the compiler stores the structure in memory, which
>> also avoids the exponential blowup.)
>
> No, it doesn't.
> ...

Yes, it does. The compiler does not use exponential space to store the AST.

>
>> You can't have exponential growth before compression.
>
> The compiler is designed so that the internal names are pretty much
> write-only, uniquely hashed by their address. It's not spending time
> running strlen()'s on them.
> ...

It's _exponential_ growth. We don't even want to spend the time and memory required to generate the strings.

>
>> Any scheme that does not prevent running time exponential in template
>> instantiation depth is also
>> inadequate in the long run.
>
> Most algorithms seem to have pathological worst cases, but as long as
> they are rare, it doesn't impede their utility.
>

The reason we have this discussion is that the worst case isn't rare enough to make this argument. Why compress in the first place if mangled names don't grow large in practice?
May 24, 2016
On Tuesday, 24 May 2016 at 16:22:56 UTC, Timon Gehr wrote:
> On 24.05.2016 01:02, Walter Bright wrote:
>> [...]
>
> Speed. The title of this thread is "Need a faster compressor".
>
>> [...]
>
> No. Just increase the recursion depth by a small number of levels to completely kill the speedups gained by using a faster compression algorithm.
>
>> [...]
>
> Yes, it does. The compiler does not use exponential space to store the AST.
>
>> [...]
>
> It's _exponential_ growth. We don't even want to spend the time and memory required to generate the strings.
>
>>[...]
>
> The reason we have this discussion is that the worst case isn't rare enough to make this argument. Why compress in the first place if mangled names don't grow large in practice?

I completely agree!
However, such a thing will only work if dmd is used as a pre-linker and if we can stabilize the hash,
i.e. every run of the compiler generates the same hash.