Need a Faster Compressor (page 5)

On Sunday, 22 May 2016 at 16:06:07 UTC, Marco Leise wrote: > There are a few open questions, for me at least. Are there other proposals with good properties? Are there any objections against the benchmarking method or the license? Can the implementation be in D? If not, should we link against the system liblz4.so/a or copy the C code into the backend? Is a pre-filled dictionary worthwhile? If so, which words should go in it? The latter is important because changes to the Dlang mangling need to be reflected in other projects outside our direct control, too. How it's being benchmarked does make me question it a bit. I avoided doing multi-threading so my results are as they would run on 1 CPU/Core, while adding multi-threading probably isn't too hard to do (at least in D). Then there's the wonder of how many cores, and other details. A compressor designed to run in parallel will naturally be in it's own element there. This of course assumes we are allowed to in the first place; The compiler may already be using those cores for other tasks: Scanning other files, optimizing code, CTFE, etc, to which point speed gained in compression may not actually translate to an overall speedup if you are stealing CPU cycles. As for a dictionary and being primed, I'm trying to figure out what would be in it myself. Commonly used templates and keywords that appear in symbols are so far my best guess. I'd avoid library names unless they'd appear in the symbols (so 'std.core' might not mean much, while 'unittestL' does). Then again I don't have a good sample of what I'd actually encounter live, maybe every major library import would be good thing to have.

May 22, 2016

Re: Need a Faster Compressor

Posted by Rainer Schuetze
in reply to Walter Bright

Permalink

Rainer Schuetze

Posted in reply to Walter Bright

Permalink


On 22.05.2016 00:58, Walter Bright wrote:
> On 5/21/2016 3:41 PM, Guillaume Boucher wrote:
>> Sorry if I didn't memorize everything in this forum from the last 20
>> years, can
>> you give a link to some reasoning?
>
> DMC++ matches the Microsoft name mangling scheme, which includes such
> compression. It proved hopelessly inadequate, which is why I implemented
> compress.c in the first place (it was for the DMC++ compiler).
>
> (And frankly, I don't see how an ad-hoc scheme like that could hope to
> compare with a real compression algorithm.)

You are right about the symbols using the VC mangling. The test case "1.s.s.s.s.s" in the bugreport translated to C++ yields

?foo@Result@?1???$s@UResult@?1???$s@UResult@?1???$s@UResult@?1???$s@UResult@?1???$s@H@testexpansion@@YA@H@Z@@testexpansion@@YA@U1?1???$s@H@2@YA@H@Z@@Z@@testexpansion@@YA@U1?1???$s@UResult@?1???$s@H@testexpansion@@YA@H@Z@@2@YA@U1?1???$s@H@2@YA@H@Z@@Z@@Z@@testexpansion@@YA@U1?1???$s@UResult@?1???$s@UResult@?1???$s@H@testexpansion@@YA@H@Z@@testexpansion@@YA@U1?1???$s@H@2@YA@H@Z@@Z@@2@YA@U1?1???$s@UResult@?1???$s@H@testexpansion@@YA@H@Z@@2@YA@U1?1???$s@H@2@YA@H@Z@@Z@@Z@@Z@@testexpansion@@YA@U1?1???$s@UResult@?1???$s@UResult@?1???$s@UResult@?1???$s@H@testexpansion@@YA@H@Z@@testexpansion@@YA@U1?1???$s@H@2@YA@H@Z@@Z@@testexpansion@@YA@U1?1???$s@UResult@?1???$s@H@testexpansion@@YA@H@Z@@2@YA@U1?1???$s@H@2@YA@H@Z@@Z@@Z@@2@YA@U1?1???$s@UResult@?1???$s@UResult@?1???$s@H@testexpansion@@YA@H@Z@@testexpansion@@YA@U1?1???$s@H@2@YA@H@Z@@Z@@2@YA@U1?1???$s@UResult@?1???$s@H@testexpansion@@YA@H@Z@@2@YA@U1?1???$s@H@2@YA@H@Z@@Z@@Z@@Z@@Z@QAEXXZ

i.e. 936 characters. I think this is due to the very bad limitation of just back referencing the first 10 types.

The gcc mangling allows arbitrary back references and is closer to what some have proposed. It yields

_ZZN13testexpansion1sIZNS0_IZNS0_IZNS0_IZNS0_IiEEDaT_E6ResultEES1_S2_E6ResultEES1_S2_E6ResultEES1_S2_E6ResultEES1_S2_EN6Result3fooEv

which is only 39 characters longer than the compressed version of the D symbol. It uses a much smaller character set, though.

This is my translation of the test case to C++14:

namespace testexpansion {

template<class T>
struct S
{
    void foo(){}
};

template<class T>
auto testexpansion_s(T t)
{
#ifdef bad
    struct Result
	{
		void foo(){}
	};
	return Result();
#else
    return S<T>();
#endif
}

void xmain()
{
    auto x = s(s(s(s(s(1)))));
    x.foo();
}

}

On Sunday, 22 May 2016 at 17:44:22 UTC, Rainer Schuetze wrote: > You are right about the symbols using the VC mangling. The test case "1.s.s.s.s.s" in the bugreport translated to C++ yields > > ?foo@Result@?1???$s@UResult@?1???$s@UResult@?1???$s <snip> > > i.e. 936 characters. I think this is due to the very bad limitation of just back referencing the first 10 types. Unless ? is the proper/valid character those are far more likely to be failed UTF-8 decoding. Care to double check?

On 22.05.2016 19:56, Era Scarecrow wrote: > On Sunday, 22 May 2016 at 17:44:22 UTC, Rainer Schuetze wrote: >> You are right about the symbols using the VC mangling. The test case >> "1.s.s.s.s.s" in the bugreport translated to C++ yields >> >> ?foo@Result@?1???$s@UResult@?1???$s@UResult@?1???$s <snip> >> >> i.e. 936 characters. I think this is due to the very bad limitation of >> just back referencing the first 10 types. > > Unless ? is the proper/valid character those are far more likely to be > failed UTF-8 decoding. Care to double check? ? is an allowed character in VC++ and for example permits unambiguous demangling. The gcc style using only alpha numeric chanracters and '_' can also just be plain C symbols.

On Sunday, 22 May 2016 at 18:04:25 UTC, Rainer Schuetze wrote: > On 22.05.2016 19:56, Era Scarecrow wrote: >> Unless ? is the proper/valid character those are far more likely to be failed UTF-8 decoding. Care to double check? > > ? is an allowed character in VC++ and for example permits unambiguous demangling. The gcc style using only alpha numeric characters and '_' can also just be plain C symbols. Not quite what I asked. Is the sample provided accurate? Is having 3 ?s in a row what should be there not just spitting a ? because it doesn't know how to decode it?

Am Sun, 22 May 2016 17:19:38 +0000 schrieb Stefan Koch <uplink.coder@googlemail.com>: > Here are my answers please take them with a grain of ignorance. […] Maybe you are right about the dictionary. I haven't put much any thought into it. Anyways I was looking for input from the core devs who are affected by this. > However I agree with Martins point: It is a waste of resources re-implementing something that already has a perfectly fine implementation. I already loosely copied and fixed up the compressor in the attached proof of concept. Thanks to the two languages' similarities it was a matter of an hour or so. If you provide the decompressor, we're set. :) -- Marco

On 22.05.2016 20:12, Era Scarecrow wrote: > On Sunday, 22 May 2016 at 18:04:25 UTC, Rainer Schuetze wrote: >> On 22.05.2016 19:56, Era Scarecrow wrote: >>> Unless ? is the proper/valid character those are far more likely to >>> be failed UTF-8 decoding. Care to double check? >> >> ? is an allowed character in VC++ and for example permits unambiguous >> demangling. The gcc style using only alpha numeric characters and '_' >> can also just be plain C symbols. > > Not quite what I asked. Is the sample provided accurate? Is having 3 ?s > in a row what should be there not just spitting a ? because it doesn't > know how to decode it? The 3 consecutive ?s are also in the object file, so they are actually part of the mangled symbol. VC++ uses only ASCII characters for mangling (which D should do, too, IMO).

On Sunday, 22 May 2016 at 18:18:46 UTC, Marco Leise wrote: > If you provide the decompressor, we're set. :) I am already working on a non-ctfeable version that does not use slices. And therefore should be faster. But decompression is only needed for demangling.

On 5/22/2016 2:07 AM, Era Scarecrow wrote: > On Sunday, 22 May 2016 at 08:50:47 UTC, Walter Bright wrote: >> On 5/21/2016 11:26 PM, Era Scarecrow wrote: >>> With 1 Million iterations: >>> >>> new_compress: TickDuration(311404604) >>> id_compress TickDuration(385806589) >>> >>> Approx 20% increase in speed (if i'm reading and did this right). >> >> It is promising. Need more! > > Well there's other good news. While I was working with about 2Mb for the data > size in general, it can be reduced it looks like to about 8k-16k, and run > entirely on the stack. It doesn't appear to gain any speed, or any speed gained > is lost with better memory management. > > Although based on how things look, the id_compress might perform better with a > max window of 255 and max match of 127. I'll give the original one a tweak based > on my own findings and see if it turns out true. > > There's also https://github.com/dlang/dmd/pull/5799 which should make things faster.

On 5/22/2016 5:50 AM, deadalnix wrote: > On Sunday, 22 May 2016 at 01:36:39 UTC, Walter Bright wrote: >> Just a note: large lookup tables can have cold cache performance problems. > > 16k lookup table are the gold standard. > I like 64 bit vectors because the lookup and test can be done in one instruction, and the vector cached in a register!

Forums