View mode: basic / threaded / horizontal-split · Log in · Help
August 30, 2004
std.zlib.decompress may have flawed logic for allocating buffer
It appears there may be a bug in std.zlib.uncompress. The size of the output
buffer is apparently estimated from the size of the input, and can be inadequate
when the decompression ratio is over 2x.

Rather than repeat other info, there is discussion in the following thread from
the digitalmars.d newsgroup:
"Having problems with uncompress of zip file created by std.zlib"

There is a corresponding thread that is more specific about the uncompress
behavior, and has more info (including code) at:
http://dsource.org/forums/viewtopic.php?t=321

HTH,
Lynn A.
August 31, 2004
Re: std.zlib.decompress may have flawed logic for allocating buffer
In article <cgv2k4$2dau$1@digitaldaemon.com>, Lynn A. says...
>
>It appears there may be a bug in std.zlib.uncompress. The size of the output
>buffer is apparently estimated from the size of the input, and can be inadequate
>when the decompression ratio is over 2x.
>
>Rather than repeat other info, there is discussion in the following thread from
>the digitalmars.d newsgroup:
>"Having problems with uncompress of zip file created by std.zlib"
>
>There is a corresponding thread that is more specific about the uncompress
>behavior, and has more info (including code) at:
>http://dsource.org/forums/viewtopic.php?t=321

Working on it.  I'm just about done updating etc.c.zlib from v1.1.4 to v1.2.1.
After that I'm going to look at this bug.

Sean
September 01, 2004
Re: std.zlib.decompress may have flawed logic for allocating buffer
In article <ch29mp$ugn$1@digitaldaemon.com>, Sean Kelly says...
>
>In article <cgv2k4$2dau$1@digitaldaemon.com>, Lynn A. says...
>>
>>It appears there may be a bug in std.zlib.uncompress. The size of the output
>>buffer is apparently estimated from the size of the input, and can be inadequate
>>when the decompression ratio is over 2x.
>>
>>Rather than repeat other info, there is discussion in the following thread from
>>the digitalmars.d newsgroup:
>>"Having problems with uncompress of zip file created by std.zlib"
>>
>>There is a corresponding thread that is more specific about the uncompress
>>behavior, and has more info (including code) at:
>>http://dsource.org/forums/viewtopic.php?t=321
>
>Working on it.  I'm just about done updating etc.c.zlib from v1.1.4 to v1.2.1.
>After that I'm going to look at this bug.
>
>Sean
>

Before you posted this, I checked into it and believe I found a fix.

I'm not a zlib expert, but this seems to take care of the problem with 1.1.4.

Change line 159 from:
err = etc.c.zlib.inflate(&zs, Z_FINISH);

to:
err = etc.c.zlib.inflate(&zs, Z_NO_FLUSH);


Here's how I tested it:

#import std.zlib;
#import std.stdio;
#import std.random;
#
#bool CompressThenUncompress (ubyte[] src)
#{
#  try {
#    ubyte[] dst = cast(ubyte[])std.zlib.compress(cast(void[])src);
#    double ratio = (dst.length / cast(double)src.length);
#    writef("src.length:  ", src.length, ", dst: ", dst.length, ", Ratio = ",
ratio);
#    ubyte[] uncompressedBuf;
#    uncompressedBuf = cast(ubyte[])std.zlib.uncompress(cast(void[])dst);
#    assert(src.length == uncompressedBuf.length);
#    assert(src == uncompressedBuf);
#  }
#  catch {
#    writefln(" ... Exception thrown when src.length = ", src.length, ".");
#    return false;
#  }
#  return true;
#}
#
#void main (char[][] args)
#{
#    // smallish buffers
#    for(int idx = 0; idx < 25; idx++) {
#        char[] buf = new char[rand() % 100];
#
#        // Alternate between more & less compressible
#        foreach(inout char c; buf) c = ' ' + (rand() % (idx % 2 ? 91 : 2));
#
#        if(CompressThenUncompress(cast(ubyte[])buf)) {
#            printf("; Success.\n");
#        } else {
#            return;
#        }
#    }
#
#    // larger buffers
#    for(int idx = 0; idx < 25; idx++) {
#        char[] buf = new char[rand() % 10000000];
#
#        // Alternate between more & less compressible
#        foreach(inout char c; buf) c = ' ' + (rand() % (idx % 2 ? 91 : 10));
#
#        if(CompressThenUncompress(cast(ubyte[])buf)) {
#            printf("; Success.\n");
#        } else {
#            return;
#        }
#    }
#
#    printf("PASSED all tests.\n");
#}
September 01, 2004
Re: std.zlib.decompress may have flawed logic for allocating buffer
>#        // Alternate between more & less compressible
>#        foreach(inout char c; buf) c = ' ' + (rand() % (idx % 2 ? 91 : 10));

<alert comment="newbie>

Looks like great progress is taking place. THANKS from a prospective user of
std.zlib.

One suggestion: The above test is ingenious (and over my head), but with large
buffers the resulting compression ratios are invariably about 46.9% for the more
compressible buffer, and about 82.4% for the less compressible buffer. I would
advocate logic that generated test buffers with much greater variability of the
compression ratio ... perhaps from 1:10 to 2:1.  And also a contrived buffer
that was all one letter (e.g. '1'), all two letters repeated many times (e.g.
"1212121212 ... 1212"), all three letters (e.g. "123"), etc.

Specifically, I anticipate using zlib for quite regular specialized text that
compresses the original from about 4.1 mb to 1.1 mb. = 73%. It would be an
ironic implementation of "Murphy's Law" if the fix worked well with a limited
range of compression ratios, but couldn't handle a buffer of 5000 "Test Test
Test Test ... repeat 4995 more times ... Test".

Another suggestion: perhaps the tests that are being put together could
eventually be incorporated as zlib unittests? Also, could the people doing the
fixing post the modified zlib object code somewhere for other people to try out?
My impression is that recompiling Phobos is not for faint-hearted newbies.

(Another suggestion: perhaps I should do more than advocate the suggestions
above, but "roll up my sleeves" and contribute time and effort. "People tend to
be as useless as they can get away with, and as resourceful as the circumstances
demand." L.Allan :-)

</alert>
September 01, 2004
Re: std.zlib.decompress may have flawed logic for allocating buffer
In article <ch4f53$26q7$1@digitaldaemon.com>, Lynn says...
>
>My impression is that recompiling Phobos is not for faint-hearted newbies.
>

1) On Windows, make sure the dm\bin and dmd\bin folders are in your PATH (you
may have to move them towards the 'front' of the PATH if other make tools are
already present in the PATH).

2) Next, cd to each:

dmd/src/phobos/etc/c/zlib/
dmd/src/phobos/etc/c/recls/
dmd/src/phobos/internal/gc/
dmd/src/phobos/

3) and edit the win32.mak files to make sure that the CC and DMD vars. are set
properly (if your path is setup Ok, then you can just change them to dmc and dmd
respectively).

4) Then run "make -f win32.mak" in each.

5) copy dmd/src/phobos/phobos.lib to dmd/lib (after making a backup ;)

The steps are similar for Linux, except the make files are named linux.mak, the
C compiler won't be DMC and you need to copy dmd/src/phobos/libphobos.a to
/usr/lib after the build.

>
></alert>
September 01, 2004
Re: std.zlib.decompress may have flawed logic for allocating buffer
In article <ch3ajm$1gks$1@digitaldaemon.com>, Dave says...
>
>Before you posted this, I checked into it and believe I found a fix.
>
>I'm not a zlib expert, but this seems to take care of the problem with 1.1.4.
>
>Change line 159 from:
>err = etc.c.zlib.inflate(&zs, Z_FINISH);
>
>to:
>err = etc.c.zlib.inflate(&zs, Z_NO_FLUSH);

Yup, that should do it.  I actually missed the while(1) bit and thought inflate
was only being called once.  It should also work if you set the value to
Z_SYNC_FLUSH.


Sean
September 01, 2004
Re: std.zlib.decompress may have flawed logic for allocating buffer
In article <ch4uav$2dfk$1@digitaldaemon.com>, Sean Kelly says...
>
>In article <ch3ajm$1gks$1@digitaldaemon.com>, Dave says...
>>
>>Before you posted this, I checked into it and believe I found a fix.
>>
>>I'm not a zlib expert, but this seems to take care of the problem with 1.1.4.
>>
>>Change line 159 from:
>>err = etc.c.zlib.inflate(&zs, Z_FINISH);
>>
>>to:
>>err = etc.c.zlib.inflate(&zs, Z_NO_FLUSH);
>
>Yup, that should do it.  I actually missed the while(1) bit and thought inflate
>was only being called once.  It should also work if you set the value to
>Z_SYNC_FLUSH.
>
>
>Sean
>

Thanks!

Timed both with the following on Linux and there was virtually no difference
between the two averaged over 3 runs each. Also timed the previous code post -
again no difference.

The Z_NO_FLUSH was tested on Windows as well and "passed".

What is the best way to submit this so it is included in the next build?

import std.zlib;
import std.stdio;
import std.random;

bool CompressThenUncompress (char[] src)
{
try {
char[] dst = cast(char[])std.zlib.compress(cast(void[])src);
double ratio = (src.length / cast(double)dst.length);
if(src != cast(char[])std.zlib.uncompress(cast(void[])dst)) {
writefln(" ... NO MATCH when src.length = ", src.length, ".");
return false;
} else {
writef("src.length:  ", src.length, ", dst: ", dst.length, ", Ratio = ", ratio);
}
}
catch {
writefln(" ... Exception thrown when src.length = ", src.length, ".");
return false;
}

return true;
}

void main (char[][] args)
{
for(int idx = 0; idx < 50; idx++) {
char[] buf = new char[(idx + 1) * 20000];

// Alternate between more & less compressible
foreach(inout char c; buf) c = ' ' + (rand() % (90 / ((idx % 5) + 1)));

if(CompressThenUncompress(buf)) {
printf("; Success.\n");
} else {
return;
}
}

printf("PASSED all tests.\n");
}
September 01, 2004
Re: std.zlib.decompress may have flawed logic for allocating buffer
> 2) Next, cd to each:
> dmd/src/phobos/etc/c/zlib/
> dmd/src/phobos/etc/c/recls/
> dmd/src/phobos/internal/gc/
> dmd/src/phobos/
>
> 3) and edit the win32.mak files to make sure that the CC and DMD vars. are 
> set
> properly (if your path is setup Ok, then you can just change them to dmc 
> and dmd
> respectively).
>
> 4) Then run "make -f win32.mak" in each.
>

<alert comment="newbie">

I wasn't able to get dmd/src/phobos/c/recls to recompile. I was able to 
rebuild zlib.lib, but couldn't figure out how to get D code to reference my 
changes rather than phobos.lib. I suppose I've used IDE's too much.

After some head scratching, I wonder if you really meant that I should 
attempt to rebuild:
dmd/src/photos/std/zlib.d

As a quick hack, I put test.d and zlib.d in the same directory, and put in 
several :
writefln("Hello Lynn");
"fingerprints" within std.zlib.compress and std.zlib.uncompress. I compiled 
with:
dmd test.d zlib.d
The "Hello Lynn" outputs shows up when running test.exe, so it seems like I 
have an ad hoc environment from which I can make further tests.

Quick sanity check ... am I on the right track with this? My intent is to 
incorporate the "fix candidates" identified in earlier posts, and then 
exercise the "CompressThenUncompress" test scaffolding with test cases that 
have compression ranges from 100:1 to 1:2. (such as
"1111111   .... 1111"
"12121212 ...  1212"
"123123123 ... 123123"

</alert>
September 01, 2004
Re: std.zlib.decompress may have flawed logic for allocating buffer
<alert comment="newbie">

With the Z_NO_FLUSH change noted in earlier posts, I was able to recompile 
all previous code using a rebuilt zlib.obj and ... (drum roll, please) ... 
IT WORKED.  THANKS!

I also appended the following code to the code submitted by Dave. It has a 
test buffer than has extremely regular text in a buffer than grows from 
len=6 to about len=6,291,456. (6 * 2 **  20).  This caused the "compression 
ratio" to go from 1:2.33 for 6 bytes of "123123" to about 1000:1 for 6291456 
bytes of "123123 ... 123123".

#   // increasingly long buffers with VERY high compressibility
#    // "123123 .... 123123"
#   for (int len = 6; len <= 10000000; len += len) {
#        char[] buf = new char[len];
#
#        for (int idx = 0; idx < len; idx += 3) {
#          buf[idx] = '1';
#          buf[idx + 1] = '2';
#          buf[idx + 2] = '3';
#        }
#        if(CompressThenUncompress(cast(ubyte[])buf)) {
#            printf("; Success.\n");
#        }
#        else {
#            return;
#        }
#    }
September 02, 2004
Re: std.zlib.decompress may have flawed logic for allocating buffer
By "What is the best way to submit this so it is included in the next build?" I
meant the Z_NO_FLUSH change, not the included code as one might expect given the
context ;)

Geesh, I guess I should review my own posts more often /before/ I send them <g>

- Dave

In article <ch52eq$2fbi$1@digitaldaemon.com>, Dave says...
>
>Timed both with the following on Linux and there was virtually no difference
>between the two averaged over 3 runs each. Also timed the previous code post -
>again no difference.
>
>The Z_NO_FLUSH was tested on Windows as well and "passed".
>
>What is the best way to submit this so it is included in the next build?
>
>import std.zlib;
>import std.stdio;
>import std.random;
>
>bool CompressThenUncompress (char[] src)
>{
>try {
>char[] dst = cast(char[])std.zlib.compress(cast(void[])src);
>double ratio = (src.length / cast(double)dst.length);
>if(src != cast(char[])std.zlib.uncompress(cast(void[])dst)) {
>writefln(" ... NO MATCH when src.length = ", src.length, ".");
>return false;
>} else {
>writef("src.length:  ", src.length, ", dst: ", dst.length, ", Ratio = ", ratio);
>}
>}
>catch {
>writefln(" ... Exception thrown when src.length = ", src.length, ".");
>return false;
>}
>
>return true;
>}
>
>void main (char[][] args)
>{
>for(int idx = 0; idx < 50; idx++) {
>char[] buf = new char[(idx + 1) * 20000];
>
>// Alternate between more & less compressible
>foreach(inout char c; buf) c = ' ' + (rand() % (90 / ((idx % 5) + 1)));
>
>if(CompressThenUncompress(buf)) {
>printf("; Success.\n");
>} else {
>return;
>}
>}
>
>printf("PASSED all tests.\n");
>}
>
« First   ‹ Prev
1 2
Top | Discussion index | About this forum | D home