Thread overview
Problems with Zlib - data error
- Apr 20, 2017: Era Scarecrow
- Apr 20, 2017: Adam D. Ruppe
- Apr 21, 2017: Era Scarecrow
- Apr 21, 2017: Adam D. Ruppe
- Apr 21, 2017: Era Scarecrow
- Apr 22, 2017: Era Scarecrow
April 20, 2017
I took the UnCompress example and tried to make use of it; however, it breaks midway through my program with nothing more than 'Data Error'.

[code]
//shamelessly taken for experimenting with
UnCompress decmp = new UnCompress;
foreach (chunk; stdin.byChunk(4096).map!(x => decmp.uncompress(x)))
[/code]

Two things to note, though. First, I'm using an xor block of data that's compressed (either with gzip or with raw zlib); second, the uncompressed data is 660Mb, while the compressed gzip file is about 3Mb. It dies right as the data gets past the large null blocks. The first 5Mb fits in 18k of compressed space (and could be re-compressed to save another 17%).

Is this a bug in zlib? In the Dlang library? Or is it a memory/allocation issue (which is what drove me to use this rather than the straight compress/decompress in the first place)?

[code]
  File xor = File(args[2], "r"); //line 53

  foreach (chunk; xor.byChunk(2^^16).map!(x => cast(ubyte[]) decmp.uncompress(x))) //line 59 where it's breaking, doesn't matter if it's 4k, 8k, or 64k.
[/code]


std.zlib.ZlibException@std\zlib.d(96): data error
----------------
0x00407C62 in void std.zlib.UnCompress.error(int)
0x00405134 in ubyte[] xortool.main(immutable(char)[][]).__lambda2!(ubyte[]).__lambda2(ubyte[])
0x00405291 in @property ubyte[] std.algorithm.iteration.__T9MapResultS297xortool
4mainFAAyaZ9__lambda2TS3std5stdio4File7ByChunkZ.MapResult.front() at c:\D\dmd2\windows\bin\..\..\src\phobos\std\algorithm\iteration.d(582)
0x0040243F in _Dmain at g:\\Patch-Datei\xortool.d(59)
0x00405F43 in D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZ9__lambda1MFZv
0x00405F07 in void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).runAll()
0x00405E08 in _d_run_main
0x00405BF8 in main at g:\\Patch-Datei\xortool.d(7)
0x0044E281 in mainCRTStartup
0x764333CA in BaseThreadInitThunk
0x77899ED2 in RtlInitializeExceptionChain
0x77899EA5 in RtlInitializeExceptionChain
April 20, 2017
On Thursday, 20 April 2017 at 20:19:31 UTC, Era Scarecrow wrote:
> I took the UnCompress example and tried to make use of it, however it breaks midway through my program with nothing more than 'Data Error'.

See the tip of the week here:

http://arsdnet.net/this-week-in-d/2016-apr-24.html

In short, byChunk reuses its buffer, and std.zlib holds on to the pointer. That combination leads to corrupted data.


The easiest fix is to .dup the chunk... Off the top of my head, I don't know of a way that avoids the allocation using any of the std.zlib functions.
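The tip above, applied to the byChunk-style loop from the original post, can be sketched as a self-contained round trip. The zeros stand in for the sparse xor block; the in-memory `chunks` range replaces the file read, but the `.dup` fix is the same:

```d
import std.algorithm : map;
import std.range : chunks;
import std.zlib : compress, UnCompress;

void main()
{
    // Stand-in for the sparse xor block: 64KB of zeros compresses well.
    ubyte[] data = new ubyte[64 * 1024];
    auto compressed = cast(ubyte[]) compress(data);

    auto decmp = new UnCompress;
    ubyte[] result;
    // .dup copies each chunk before it reaches uncompress(), so std.zlib
    // never holds a pointer into a buffer the next read would overwrite.
    foreach (chunk; compressed.chunks(4096).map!(x => cast(ubyte[]) decmp.uncompress(x.dup)))
        result ~= chunk;
    result ~= cast(ubyte[]) decmp.flush();

    assert(result == data);
}
```

With a real `stdin.byChunk(4096)` source the `.dup` is the only change needed; without it, the reused buffer corrupts the stream exactly as described in the linked tip.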
April 21, 2017
On Thursday, 20 April 2017 at 20:24:15 UTC, Adam D. Ruppe wrote:
> In short, byChunk reuses its buffer, and std.zlib holds on to the pointer. That combination leads to corrupted data.
>
> Easiest fix is to .dup the chunk...

 So that's what's going on. But if I have to dup the blocks, then I have the same limited-memory problem as before. I kinda wish there was something like the gz_open from the C interface, which would deal with the decompression and memory management as appropriate.

I suppose I could incorporate an 8-byte header holding the lengths of the runs of 0's before/after, and just drop the 630Mb of the data that can be skipped... which is the bulk of it. I just hoped to keep it very simple.
April 21, 2017
On Friday, 21 April 2017 at 11:18:55 UTC, Era Scarecrow wrote:
>  So that's what's going on. But if I have to dup the blocks, then I have the same limited-memory problem as before. I kinda wish there was something like the gz_open from the C interface, which would deal with the decompression and memory management as appropriate.

You could always declare it with extern(C) and call it yourself.
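As far as I can tell, druntime already ships C bindings for this in etc.c.zlib, so no hand-written extern(C) declarations are needed. A minimal round-trip sketch through the gz* API (the file name here is made up), where zlib manages its own decompression state and buffers:

```d
import etc.c.zlib : gzopen, gzwrite, gzread, gzclose;
import std.file : remove;
import std.string : toStringz;

void main()
{
    // Write a small gzip file, then stream it back through gzread.
    const path = "gz_sketch.tmp.gz";
    scope (exit) remove(path);

    ubyte[] data = new ubyte[32 * 1024]; // mostly zeros, like the xor block
    data[100] = 0xAB;

    auto outF = gzopen(path.toStringz, "wb");
    gzwrite(outF, data.ptr, cast(uint) data.length);
    gzclose(outF);

    auto inF = gzopen(path.toStringz, "rb");
    ubyte[] result;
    ubyte[4096] buf;
    int n;
    // gzread hands back decompressed bytes a buffer at a time; the
    // caller never sees the compressed stream or its allocations.
    while ((n = gzread(inF, buf.ptr, cast(uint) buf.length)) > 0)
        result ~= buf[0 .. n];
    gzclose(inF);

    assert(result == data);
}
```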

But I didn't realize your thing was a literal example from the docs. Ugh, can't even trust that.

> I suppose I could incorporate an 8-byte header holding the lengths of the runs of 0's before/after, and just drop the 630Mb of the data that can be skipped... which is the bulk of it. I just hoped to keep it very simple.

Take a look at zlib.d's source

http://dpldocs.info/experimental-docs/source/std.zlib.d.html#L232

It isn't a long function, so you can copy/paste the C parts to get started on your own function that manages the memory more efficiently and drops the parts you don't care about.
April 21, 2017
On Friday, 21 April 2017 at 12:57:25 UTC, Adam D. Ruppe wrote:
> But I didn't realize your thing was a literal example from the docs. Ugh, can't even trust that.

That was a large part of why I was so confused by it all.

Still, it would be much easier to salvage if I knew how the returned memory was allocated, and whether I could deallocate it myself once done with it vs. letting the GC manage it. The black-box vs. white-box approach.

> Take a look at zlib.d's source
>
> http://dpldocs.info/experimental-docs/source/std.zlib.d.html#L232
>
> It isn't a long function, so if you take that you can copy/paste the C parts to get you started with your own function that manages the memory more efficiently to drop the parts you don't care about.

I've worked directly with the zlib API in the past; however, that was mainly to get it working with AHK, letting me instantly compress text and see its UUEncode64 output (which was fun), as well as using multiple source references for better compression.



I think I'll just go with full in-memory compression and make a quick, simple filter to turn the large blocks of 0's into something more manageable. That should take care of the memory allocation issues.
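As an illustration only (this is not Era's actual tool, and the marker/length record format is an assumption), a zero-run filter along those lines might look like:

```d
import std.bitmanip : littleEndianToNative, nativeToLittleEndian;

// Hypothetical zero-run filter: every run of 0 bytes becomes a marker
// byte (0) followed by a 4-byte little-endian run length; all other
// bytes pass through untouched, so decoding stays unambiguous.
ubyte[] squeezeZeros(const(ubyte)[] input)
{
    ubyte[] output;
    size_t i;
    while (i < input.length)
    {
        if (input[i] == 0)
        {
            size_t run = i;
            while (run < input.length && input[run] == 0)
                ++run;
            output ~= 0; // marker
            auto lenArr = nativeToLittleEndian(cast(uint)(run - i));
            output ~= lenArr[];
            i = run;
        }
        else
            output ~= input[i++];
    }
    return output;
}

// Inverse transform: re-expand each (marker, length) record into zeros.
ubyte[] expandZeros(const(ubyte)[] input)
{
    ubyte[] output;
    size_t i;
    while (i < input.length)
    {
        if (input[i] == 0)
        {
            ubyte[4] lenBytes;
            lenBytes[] = input[i + 1 .. i + 5];
            output.length += littleEndianToNative!uint(lenBytes); // appended bytes are 0
            i += 5;
        }
        else
            output ~= input[i++];
    }
    return output;
}

void main()
{
    ubyte[] data = new ubyte[10_000]; // sparse, like the xor block
    data[5_000] = 0xCD;
    data[9_999] = 1;

    auto packed = squeezeZeros(data);
    assert(packed.length < 20); // two runs of zeros collapse to 5 bytes each
    assert(expandZeros(packed) == data);
}
```

Run-length collapsing the zeros first keeps every allocation proportional to the filtered size rather than the 660Mb original, which is the point of the approach described above.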
April 22, 2017
On Friday, 21 April 2017 at 17:40:03 UTC, Era Scarecrow wrote:
> I think I'll just go with full in-memory compression and make a quick, simple filter to turn the large blocks of 0's into something more manageable. That should take care of the memory allocation issues.

 Done, and I'm happy with the results. After getting all my tests to pass, filtering the 660Mb input brought it down to 3.8Mb, and compressing that with zlib brought it to 2.98Mb.

 Alas, the tool will probably be more useful in a limited scope (ROM hacking, for example) than anywhere else... Though if there's any request for the source, I can spruce it up before releasing it for public use.