July 19, 2009 [Issue 3191] New: std.zlib.UnCompress errors if buffer is reused
http://d.puremagic.com/issues/show_bug.cgi?id=3191

           Summary: std.zlib.UnCompress errors if buffer is reused
           Product: D
           Version: 1.046
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: smjg@iname.com

I've just been trying to read zlib-compressed data from a file and decompress it block by block, but it kept throwing a ZlibException, when the same data successfully decompresses if done in one go.

I've spent ages reducing the problem.
----------
import std.stream, std.stdio, std.zlib;

const size_t BLOCK_SIZE = 1024;

void main(string[] a) {
    scope File file = new File(a[1], FileMode.In);
    scope UnCompress uc = new UnCompress;
    void[] ucData;
    ubyte[] block = new ubyte[BLOCK_SIZE];

    while (!file.eof) {
        block.length = file.read(block);
        writefln(block.length);
        ucData ~= uc.uncompress(block);
    }
    ucData ~= uc.flush();
    writefln("Finished: %d", ucData.length);
}
----------
C:\Users\Stewart\Documents\Programming\D\Tests\bugs>rdmd zlib_blocks.d comp.bin
1024
1024
Error: data error
----------

I then found that, if I move the declaration of block inside the while loop (thereby reallocating it each time), then it works.
----------
1024
1024
335
Finished: 16790
----------

Presumably, when the second block is fed to UnCompress, it tries to read data from the first block as well by re-reading the original memory location. But this memory has been overwritten with the second block. In other words, UnCompress is keeping and relying on a reference to memory it doesn't own.

You could argue that this is a limitation of the zlib implementation and it's the caller's responsibility to keep the blocks separate in memory. But such a limitation would have to be documented.

According to a quick test, alternating between two buffers seems to work. Assuming that it does work in the general case, a possible solution is to add something like this to the documentation for std.zlib.UnCompress.uncompress:

"The contents of buf must not be changed between this call and the next call to uncompress. Thus the buffer may not be immediately re-used for the next block of compressed data; however, alternating between two buffers is permissible."

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
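The two-buffer pattern mentioned in the report can be sketched as follows. This is only an illustration under stated assumptions: `inflateAlternating` and the LCG test data are hypothetical names introduced here (not part of Phobos or the original test case), the data is compressed in memory instead of being read from a file, and whether alternation is sufficient in the general case is exactly the open question of the report.

```d
import std.algorithm.comparison : min;
import std.stdio : writefln;
import std.zlib : compress, UnCompress;

// Hypothetical helper (not part of Phobos): inflate `packed` block by block,
// staging each block in one of two buffers that are used in turn, so the
// block handed to the previous uncompress call is never overwritten
// before the next call.
ubyte[] inflateAlternating(const(ubyte)[] packed, size_t blockSize)
{
    auto uc = new UnCompress();
    ubyte[][2] staging = [new ubyte[blockSize], new ubyte[blockSize]];
    ubyte[] result;
    size_t turn = 0;

    for (size_t pos = 0; pos < packed.length; pos += blockSize)
    {
        immutable n = min(blockSize, packed.length - pos);
        auto block = staging[turn][0 .. n];
        block[] = packed[pos .. pos + n];   // simulate reading one block
        result ~= cast(const(ubyte)[]) uc.uncompress(block);
        turn ^= 1;                          // switch to the other buffer
    }
    result ~= cast(const(ubyte)[]) uc.flush();
    return result;
}

void main()
{
    // Poorly compressible test data from a small LCG (an assumption made
    // for illustration), so the compressed stream spans several blocks.
    auto original = new ubyte[16_790];
    uint state = 12_345;
    foreach (ref b; original)
    {
        state = state * 1_103_515_245u + 12_345u;
        b = cast(ubyte)(state >> 16);
    }

    auto packed = compress(original);
    auto restored = inflateAlternating(packed, 1024);
    assert(restored == original);
    writefln("Finished: %s", restored.length);
}
```

The cost compared with the original loop is one extra buffer of BLOCK_SIZE bytes; no per-block allocation is needed.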
July 19, 2009 [Issue 3191] std.zlib.UnCompress errors if buffer is reused
Posted in reply to smjg@iname.com

http://d.puremagic.com/issues/show_bug.cgi?id=3191

--- Comment #1 from Stewart Gordon <smjg@iname.com> 2009-07-19 15:37:37 PDT ---
Created an attachment (id=427)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=427)
Sample compressed data file

This is the data file I used with the testcase program. For the curious ones among you, it's the content extracted from a PNG file's IDAT chunk.
May 25, 2011 [Issue 3191] std.zlib.UnCompress errors if buffer is reused
Posted in reply to smjg@iname.com

http://d.puremagic.com/issues/show_bug.cgi?id=3191

Andrej Mitrovic <andrej.mitrovich@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrej.mitrovich@gmail.com

--- Comment #2 from Andrej Mitrovic <andrej.mitrovich@gmail.com> 2011-05-24 22:05:17 PDT ---
Greetings, I come from the future. Here's a modern implementation of your sample:
----------
import std.zlib;
import std.stdio;

const size_t BLOCK_SIZE = 1024;

void main(string[] a)
{
    auto file = File(a[1], "r");
    auto uc = new UnCompress();

    void[] ucData;
    ubyte[] block = new ubyte[BLOCK_SIZE];

    foreach (ubyte[] buffer; file.byChunk(BLOCK_SIZE))
    {
        writeln(buffer.length);
        ucData ~= uc.uncompress(buffer);
    }

    ucData ~= uc.flush();
    writefln("Finished: %s", ucData.length);
}
----------

Still errors out. But I have a hunch it has something to do with buffer being reused by file.byChunk, and zlib might internally be storing a pointer to the buffer while the GC might deallocate it in the meantime.

Something like that, because if you .dup your buffer, you won't get errors anymore:
----------
import std.zlib;
import std.stdio;

const size_t BLOCK_SIZE = 1024;

void main(string[] a)
{
    auto file = File(a[1], "r");
    auto uc = new UnCompress();

    void[] ucData;
    ubyte[] block = new ubyte[BLOCK_SIZE];

    foreach (ubyte[] buffer; file.byChunk(BLOCK_SIZE))
    {
        writeln(buffer.length);
        ucData ~= uc.uncompress(buffer.dup);
    }

    ucData ~= uc.flush();
    writefln("Finished: %s", ucData.length);
}
----------

It might just be that zlib expects all data passed in to be valid while you use the UnCompress() class. I have no other explanation.
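For readers without the attached data file, the .dup workaround from this comment can be tried in memory. The sketch below is a hedged illustration, not the thread's own code: `inflateWithCopies` and `pseudoRandomBytes` are hypothetical helpers, and a single staging buffer stands in for the chunk buffer that file.byChunk reuses. Each block is copied before it is handed to UnCompress, so later writes to the staging buffer cannot touch memory the decompressor may still refer to.

```d
import std.algorithm.comparison : min;
import std.stdio : writefln;
import std.zlib : compress, UnCompress;

// Hypothetical helper (not part of Phobos): inflate `packed` in blocks of
// `blockSize` bytes while reusing ONE staging buffer, passing a private
// copy (.dup) of each block to UnCompress.
ubyte[] inflateWithCopies(const(ubyte)[] packed, size_t blockSize)
{
    auto uc = new UnCompress();
    auto staging = new ubyte[blockSize];    // reused for every block
    ubyte[] result;

    for (size_t pos = 0; pos < packed.length; pos += blockSize)
    {
        immutable n = min(blockSize, packed.length - pos);
        staging[0 .. n] = packed[pos .. pos + n];
        // The copy costs one allocation per block but decouples the
        // decompressor's input from the reused buffer.
        result ~= cast(const(ubyte)[]) uc.uncompress(staging[0 .. n].dup);
    }
    result ~= cast(const(ubyte)[]) uc.flush();
    return result;
}

// Hypothetical test-data generator: a small LCG, so the data compresses
// poorly and the compressed stream spans several blocks.
ubyte[] pseudoRandomBytes(size_t length, uint seed)
{
    auto data = new ubyte[length];
    uint state = seed;
    foreach (ref b; data)
    {
        state = state * 1_103_515_245u + 12_345u;
        b = cast(ubyte)(state >> 16);
    }
    return data;
}

void main()
{
    auto original = pseudoRandomBytes(16_790, 2011);
    auto restored = inflateWithCopies(compress(original), 1024);
    assert(restored == original);
    writefln("Finished: %s", restored.length);
}
```

The trade-off is one garbage-collected allocation per block; that is cheap at 1024-byte blocks but worth measuring for large inputs.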