[Issue 3191] New: std.zlib.UnCompress errors if buffer is reused
July 19, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3191

           Summary: std.zlib.UnCompress errors if buffer is reused
           Product: D
           Version: 1.046
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: smjg@iname.com


I've just been trying to read zlib-compressed data from a file and decompress it block by block, but it kept throwing a ZlibException, even though the same data decompresses successfully when done in one go.  I've spent ages reducing the problem.

----------
import std.stream, std.stdio, std.zlib;

const size_t BLOCK_SIZE = 1024;

void main(string[] a) {
    scope File file = new File(a[1], FileMode.In);
    scope UnCompress uc = new UnCompress;
    void[] ucData;
    ubyte[] block = new ubyte[BLOCK_SIZE];  // one buffer, reused for every block

    while (!file.eof) {
        block.length = file.read(block);    // shrink to the number of bytes actually read
        writefln(block.length);
        ucData ~= uc.uncompress(block);     // throws ZlibException once block has been overwritten
    }
    ucData ~= uc.flush();
    writefln("Finished: %d", ucData.length);
}
----------
C:\Users\Stewart\Documents\Programming\D\Tests\bugs>rdmd zlib_blocks.d comp.bin
1024
1024
Error: data error
----------

I then found that if I move the declaration of block inside the while loop (thereby reallocating it each time), it works.
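That is, roughly the following (the only difference from the program above being where block is declared):

----------
import std.stream, std.stdio, std.zlib;

const size_t BLOCK_SIZE = 1024;

void main(string[] a) {
    scope File file = new File(a[1], FileMode.In);
    scope UnCompress uc = new UnCompress;
    void[] ucData;

    while (!file.eof) {
        ubyte[] block = new ubyte[BLOCK_SIZE];  // fresh buffer for every block
        block.length = file.read(block);
        writefln(block.length);
        ucData ~= uc.uncompress(block);
    }
    ucData ~= uc.flush();
    writefln("Finished: %d", ucData.length);
}
----------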

----------
1024
1024
335
Finished: 16790
----------

Presumably, when the second block is fed to UnCompress, it tries to read data from the first block as well by re-reading the original memory location.  But this memory has been overwritten with the second block.  In other words, UnCompress is keeping and relying on a reference to memory it doesn't own.
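A minimal illustration of the effect (not UnCompress's actual internals, just what happens when something keeps a reference to a buffer that the caller then overwrites):

----------
import std.stdio;

void main() {
    ubyte[] block = new ubyte[4];
    block[] = 1;                     // pretend this is the first chunk read from the file
    void[] kept = block;             // a reference held across calls, as UnCompress appears to do
    block[] = 2;                     // the next file.read overwrites the same memory
    writefln("%d", (cast(ubyte[]) kept)[0]);  // prints 2, not 1 -- the "old" data is gone
}
----------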

You could argue that this is a limitation of the zlib implementation and it's the caller's responsibility to keep the blocks separate in memory.  But such a limitation would have to be documented.

According to a quick test, alternating between two buffers seems to work. Assuming that it does work in the general case, a possible solution is to add something like this to the documentation for std.zlib.UnCompress.uncompress:

"The contents of buf must not be changed between this call and the next call to uncompress.  Thus the buffer may not be immediately re-used for the next block of compressed data; however, alternating between two buffers is permissible."

July 19, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3191





--- Comment #1 from Stewart Gordon <smjg@iname.com>  2009-07-19 15:37:37 PDT ---
Created an attachment (id=427)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=427)
Sample compressed data file

This is the data file I used with the testcase program.  For the curious ones among you, it's the content extracted from a PNG file's IDAT chunk.

May 25, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3191


Andrej Mitrovic <andrej.mitrovich@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrej.mitrovich@gmail.com


--- Comment #2 from Andrej Mitrovic <andrej.mitrovich@gmail.com> 2011-05-24 22:05:17 PDT ---
Greetings, I come from the future.

Here's a modern implementation of your sample:

import std.zlib;
import std.stdio;

const size_t BLOCK_SIZE = 1024;

void main(string[] a)
{
    auto file = File(a[1], "r");
    auto uc = new UnCompress();

    void[] ucData;

    // byChunk manages (and reuses) its own internal buffer
    foreach (ubyte[] buffer; file.byChunk(BLOCK_SIZE))
    {
        writeln(buffer.length);
        ucData ~= uc.uncompress(buffer);
    }

    ucData ~= uc.flush();
    writefln("Finished: %s", ucData.length);
}

Still errors out. But I have a hunch it has something to do with the buffer being reused by file.byChunk: zlib might internally be storing a pointer to the buffer, and the GC might deallocate it in the meantime.

Something like that, because if you .dup your buffer, you won't get errors anymore:

import std.zlib;
import std.stdio;

const size_t BLOCK_SIZE = 1024;

void main(string[] a)
{
    auto file = File(a[1], "r");
    auto uc = new UnCompress();

    void[] ucData;

    foreach (ubyte[] buffer; file.byChunk(BLOCK_SIZE))
    {
        writeln(buffer.length);
        ucData ~= uc.uncompress(buffer.dup);  // pass a private copy of each chunk
    }

    ucData ~= uc.flush();
    writefln("Finished: %s", ucData.length);
}

It might just be that zlib expects all data passed in to remain valid for as long as you use the UnCompress class. I have no other explanation.
