Need a Faster Compressor

On Sun, 22 May 2016 23:42:33 -0700, Walter Bright <newshound2@digitalmars.com> wrote:
> The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html
>
> It doesn't look too difficult. If we implement our own LZ4 compressor based on that, from scratch, we can boost license it.

Ok, any volunteers? I'm not personally looking forward to reimplementing lz4 from scratch right now. As Stefan Koch said, we should be able to use the existing optimized code, like gcc uses gmp.

-- Marco
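For a sense of how small the block format is, here is a minimal sketch of a decoder for a single raw LZ4 block, written in Python from the public block-format description. It ignores the LZ4 frame format and does only basic validation, and the name `lz4_block_decompress` is hypothetical; it is not the reference implementation. Each sequence starts with a token byte: the high nibble is the literal count, the low nibble is the match length minus 4, and a nibble of 15 means extra length bytes follow. Then come the literals and a 2-byte little-endian offset back into the output.

```python
def lz4_block_decompress(src: bytes) -> bytes:
    """Decode one raw LZ4 block (no frame header, no checksums)."""
    out = bytearray()
    i = 0
    n = len(src)

    def read_length(nibble: int) -> int:
        # A nibble of 15 means extra length bytes follow. Each byte is added
        # to the total; a byte of 255 means yet another byte follows.
        nonlocal i
        length = nibble
        if nibble == 15:
            while True:
                b = src[i]
                i += 1
                length += b
                if b != 255:
                    break
        return length

    while i < n:
        token = src[i]
        i += 1
        # High nibble of the token: number of literal bytes.
        lit_len = read_length(token >> 4)
        if i + lit_len > n:
            raise ValueError("truncated literals")
        out += src[i:i + lit_len]
        i += lit_len
        if i == n:
            break  # the last sequence carries literals only
        if i + 2 > n:
            raise ValueError("truncated offset")
        # 2-byte little-endian distance back into the decoded output.
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")
        # Low nibble: match length minus the 4-byte minimum match.
        match_len = read_length(token & 0x0F) + 4
        start = len(out) - offset
        # Copy byte by byte: when offset < match_len the match overlaps the
        # bytes it is producing (e.g. offset 1 repeats a single byte).
        for k in range(match_len):
            out.append(out[start + k])
    return bytes(out)


block = b"\x35abc\x03\x00\x50defgh"
print(lz4_block_decompress(block))  # b'abcabcabcabcdefgh'
```

The first token (0x35) says "3 literals, then a match of 5 + 4 = 9 bytes"; the offset of 3 makes that match overlap itself, which is how LZ4 expresses runs.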

May 23, 2016

Re: Need a Faster Compressor

Posted by ZombineDev in reply to Era Scarecrow

On Monday, 23 May 2016 at 01:46:40 UTC, Era Scarecrow wrote:
> On Sunday, 22 May 2016 at 19:44:08 UTC, Era Scarecrow wrote:
>> ...
>
> Well here's the rundown of some numbers. min_compress uses a tiny window, big_compress was my original algorithm but modified to use 16k total for a window. reduced_id_compress is the original except reduced to a 220 window and 2 byte constant output. Along with the compressed outputs of each.
>
> min_compress:           [TickDuration(46410084)]  0.779836
> big_compress:           [TickDuration(47998202)]  0.806545
> orig_id_compress:       [TickDuration(59519257)]  baseline
> reduced_id_compress:    [TickDuration(44033192)]  0.739894
> 1001 (original size)
>
> 72 testexpansion.s!(æεó▌æ╗int).så°Resulà≡Ñ╪¢╨╘ÿ¼É ↑─↑╜►╘fñv├ÿ╜ ↑│↑Ä .foo()
> 73 testexpansion.s!(ÅæεÅó▌Åæ╗int).sÅå°ResulÅà≡ÅÑ╪Å¢╨Å╘ÿÅ¼É₧├ÿÄ╜É╝▓ÿëåâ.foo()
> 67 tes╤xpansion.s!(ÇææÇóóÇææint)∙⌡ResulÇàÅÇÑºÇ¢»Ç╘τÇ¼∩ë├τü╜∩¢▓τ².foo()
> 78 testexpansion.s!(æ2óCæ2int).så(Resulà0ÑH¢P╘ê¼É╘íñÖ├ê┤ÿσ║ñ¬├ê¼É╘íñÖ├êÉ1å).foo()
>
> min_compress:           [TickDuration(29210832)]  0.82391
> big_compress:           [TickDuration(31058664)]  0.87601
> orig_id_compress:       [TickDuration(35466130)]  baseline
> reduced_id_compress:    [TickDuration(25032532)]  0.705977
> 629  (original size)
>
> 52 E.s!(à·è⌡àδint).så°Resulà≡ÖΣÅ▄╝╝ö┤ líd _Ög læ◄.foo()
> 61 E.s!(Åà·Åè⌡Åàδint).sÅå°ResulÅà≡ÅÖΣÅÅ▄Å╝╝Åö┤₧ç∞ÄÖΣ¡ó╠ïå╗.foo()
> 52 E.s!(ΣÇèèΣint)∙⌡ResulÇàÅÇÖ¢ÇÅúÇ╝├Çö╦ëçôüÖ¢Æó│².foo()
> 52 E.s!(à&è+à&int).så(Resulà0Ö<ÅD╝döl ┤í╝ ┴Ö╣ ┤æ9.foo()

If you want, you can give the test case for issue 16039 a try. It produces a 300-400MB binary, so it should be a nice test case :P

On Sun, 22 May 2016 23:42:33 -0700, Walter Bright <newshound2@digitalmars.com> wrote:
>
> The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html
>
> It doesn't look too difficult. If we implement our own LZ4 compressor based on that, from scratch, we can boost license it.

That is right. It's pretty simple.

On Monday, 23 May 2016 at 07:30:00 UTC, Marco Leise wrote:
>
> Ok, any volunteers?

Well, I am not a compression expert, but I am already working on optimizing the decompressor.
The method for achieving perfect compression is outlined here: https://github.com/Cyan4973/lz4/issues/183
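A greedy compressor for the same block format is not much longer. The sketch below is one possible approach in Python, not the method from the linked issue: it simply takes the most recent earlier occurrence of the current 4 bytes and extends it, with no attempt at a better parse. To stay short it assumes a Python dict keyed on the exact 4-byte string, where a fast implementation would use a fixed-size hash table. It does honor the block format's end-of-block rules (the last 5 bytes are literals, the last match starts at least 12 bytes before the end). All names, including the `window` parameter, are hypothetical.

```python
MIN_MATCH = 4        # shortest match the format can express
LAST_LITERALS = 5    # the last 5 bytes of a block must be literals
MF_LIMIT = 12        # the last match must start >= 12 bytes before the end
MAX_OFFSET = 0xFFFF  # match distances are stored in 2 bytes


def _write_extra_length(out: bytearray, extra: int) -> None:
    # Length beyond a nibble of 15: runs of 255, then the remainder.
    while extra >= 255:
        out.append(255)
        extra -= 255
    out.append(extra)


def _emit_sequence(out: bytearray, literals: bytes,
                   offset: int = 0, match_len: int = 0) -> None:
    lit_len = len(literals)
    ml_code = match_len - MIN_MATCH if match_len else 0
    out.append((min(lit_len, 15) << 4) | min(ml_code, 15))  # token byte
    if lit_len >= 15:
        _write_extra_length(out, lit_len - 15)
    out += literals
    if match_len:
        out += offset.to_bytes(2, "little")
        if ml_code >= 15:
            _write_extra_length(out, ml_code - 15)


def lz4_block_compress(data: bytes, window: int = MAX_OFFSET) -> bytes:
    """Greedy LZ4 block compressor (sketch): use the most recent earlier
    copy of the current 4 bytes if it lies within `window`, extend it."""
    window = min(window, MAX_OFFSET)
    n = len(data)
    out = bytearray()
    last_seen = {}            # exact 4-byte string -> latest position
    anchor = 0                # start of literals not yet emitted
    match_end_limit = n - LAST_LITERALS
    i = 0
    while i + MF_LIMIT <= n:
        key = data[i:i + MIN_MATCH]
        cand = last_seen.get(key)
        last_seen[key] = i
        if cand is not None and i - cand <= window:
            length = MIN_MATCH
            # Extend forward, stopping before the mandatory final literals.
            while (i + length < match_end_limit
                   and data[cand + length] == data[i + length]):
                length += 1
            _emit_sequence(out, data[anchor:i], i - cand, length)
            i += length
            anchor = i
        else:
            i += 1
    _emit_sequence(out, data[anchor:])  # final literals-only sequence
    return bytes(out)


print(lz4_block_compress(b"abcabcabcabcdefgh"))
# b'5abc\x03\x00Pdefgh'  (token 0x35 prints as '5', token 0x50 as 'P')
```

Because the dict only ever returns positions whose 4 bytes are identical, no candidate verification is needed here; with a real hash table it would be.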

On Saturday, 21 May 2016 at 21:27:37 UTC, Era Scarecrow wrote:
>
> I assume this is related to the compressing symbols thread? I mentioned possibly considering the LZO library.

Maybe consider lz4 instead? It tends to be a bit faster, and it's BSD instead of GPL.
https://cyan4973.github.io/lz4/

-Wyatt

On 5/23/2016 5:04 AM, Stefan Koch wrote:
> On Sun, 22 May 2016 23:42:33 -0700,
> Walter Bright <newshound2@digitalmars.com> wrote:
>>
>> The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html
>>
>> It doesn't look too difficult. If we implement our own LZ4 compressor based on
>> that, from scratch, we can boost license it.
>
> That is right. It's pretty simple.

Also, the LZ4 compressor posted here has a 64K string limit, which won't work for D because there are reported 8Mb identifier strings.

On Monday, 23 May 2016 at 15:33:45 UTC, Walter Bright wrote:
>
> Also, the LZ4 compressor posted here has a 64K string limit, which won't work for D because there are reported 8Mb identifier strings.

This is only partially true.
The 64k limit does not apply to the input string. It only applies to the dictionary. It would only be hit if we found 64k of identifier data without repetition.
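As far as the published block format goes, the 64K figure is the range of the 2-byte offset field: a match can point at most 65535 bytes back, but that field says nothing about how long the input is. Here is a minimal Python sketch of the distinction. It assumes a brute-force `bytes.rfind` search in place of a real match finder and a synthetic ~10 MB input standing in for the multi-megabyte identifiers mentioned above; `encode_offset` and `nearest_match` are hypothetical names. Whether the specific compressor posted in the thread has other internal 64K limits is a separate question this does not answer.

```python
MIN_MATCH = 4
MAX_OFFSET = 0xFFFF  # LZ4 block format: match distance is a 2-byte field


def encode_offset(distance: int) -> bytes:
    """Encode a back-reference distance the way an LZ4 block stores it."""
    if not 1 <= distance <= MAX_OFFSET:
        raise ValueError(f"distance {distance} does not fit in 2 bytes")
    return distance.to_bytes(2, "little")


def nearest_match(data: bytes, pos: int, window: int = MAX_OFFSET):
    """Distance to the closest earlier copy of data[pos:pos + 4] starting at
    most `window` bytes back, or None. Only the window is searched, so the
    total length of `data` plays no role."""
    if pos + MIN_MATCH > len(data):
        return None
    needle = data[pos:pos + MIN_MATCH]
    lo = max(0, pos - min(window, MAX_OFFSET))
    # With end = pos + 3, any hit starts before pos; it may still overlap pos,
    # which the format allows.
    start = data.rfind(needle, lo, pos + MIN_MATCH - 1)
    return None if start < 0 else pos - start


data = bytes(range(256)) * 40_000   # synthetic ~10 MB input, far past 64 KB
pos = 9_000_000                     # a position deep inside it
d = nearest_match(data, pos)
print(d, encode_offset(d))          # 256 b'\x00\x01'
```

The position (9,000,000) is far beyond 64K, yet the match only needs a distance of 256, which fits the offset field easily.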

On Monday, 23 May 2016 at 16:00:20 UTC, Stefan Koch wrote:
> On Monday, 23 May 2016 at 15:33:45 UTC, Walter Bright wrote:
>>
>> Also, the LZ4 compressor posted here has a 64K string limit, which won't work for D because there are reported 8Mb identifier strings.
>
> This is only partially true.
> The 64k limit does not apply to the input string. It only applies to the dictionary. It would only be hit if we found 64k of identifier data without repetition.

If you want speed: 16k (aka half of the L1D).
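A sketch of what a 16k dictionary could look like in a hash-based match finder, in Python. It assumes a CPU with a 32 KB L1 data cache (so 16 KB is half of it, as the post says) and uses a Knuth-style multiplicative hash with 4096 buckets; `HASH_LOG`, `hash4`, and `find_matches` are hypothetical names. The window only limits how far back a candidate may lie, and every candidate is verified, so a hash collision costs a missed match rather than a wrong one. The presumed speed benefit is that the bytes being compared were read recently enough to still be cached; this sketch does not measure that.

```python
WINDOW = 16 * 1024       # the 16k dictionary suggested in the post
L1D_BYTES = 32 * 1024    # assumption: a CPU with a 32 KB L1 data cache
HASH_LOG = 12            # hypothetical: 4096 hash buckets
MIN_MATCH = 4


def hash4(data: bytes, pos: int) -> int:
    """Multiplicative (Knuth-style) hash of 4 bytes, reduced to HASH_LOG bits."""
    seq = int.from_bytes(data[pos:pos + MIN_MATCH], "little")
    return ((seq * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_LOG)


def find_matches(data: bytes, window: int = WINDOW):
    """Yield (pos, distance) for every position whose bucket points at a
    verified 4-byte match at most `window` bytes back."""
    table = [-1] * (1 << HASH_LOG)        # bucket -> last position seen
    for pos in range(len(data) - MIN_MATCH + 1):
        h = hash4(data, pos)
        cand, table[h] = table[h], pos    # read old entry, then overwrite
        if (cand >= 0 and pos - cand <= window
                and data[cand:cand + MIN_MATCH] == data[pos:pos + MIN_MATCH]):
            yield pos, pos - cand


print(list(find_matches(b"a" * 8)))  # [(1, 1), (2, 1), (3, 1), (4, 1)]
```

Shrinking `window` from 65535 to 16384 only rejects more distant candidates; the format and the decoder stay the same.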

On Mon, 23 May 2016 12:04:48 +0000, Stefan Koch <uplink.coder@googlemail.com> wrote:
> The method for achieving perfect compression is outlined here: https://github.com/Cyan4973/lz4/issues/183

Nice, if it can keep the compression speed up.

-- Marco
