Jump to page: 1 2
Thread overview
std.zlib.decompress may have flawed logic for allocating buffer
Aug 30, 2004
Lynn A.
Aug 31, 2004
Sean Kelly
Sep 01, 2004
Dave
Sep 01, 2004
Lynn
Sep 01, 2004
Dave
Sep 01, 2004
Lynn Allan
Sep 01, 2004
Sean Kelly
Sep 01, 2004
Dave
Sep 02, 2004
Dave
Sep 02, 2004
Sean Kelly
Sep 02, 2004
Lynn Allan
Sep 04, 2004
Lynn Allan
Sep 04, 2004
Dave
Sep 01, 2004
Lynn Allan
August 30, 2004
It appears there may be a bug in std.zlib.uncompress. The size of the output buffer is apparently estimated from the size of the input, and can be inadequate when the decompression ratio is over 2x.

Rather than repeat other info, there is discussion in the following thread from
the digitalmars.d newsgroup:
"Having problems with uncompress of zip file created by std.zlib"

There is a corresponding thread that is more specific about the uncompress behavior, and has more info (including code) at: http://dsource.org/forums/viewtopic.php?t=321

HTH,
Lynn A.


August 31, 2004
In article <cgv2k4$2dau$1@digitaldaemon.com>, Lynn A. says...
>
>It appears there may be a bug in std.zlib.uncompress. The size of the output buffer is apparently estimated from the size of the input, and can be inadequate when the decompression ratio is over 2x.
>
>Rather than repeat other info, there is discussion in the following thread from
>the digitalmars.d newsgroup:
>"Having problems with uncompress of zip file created by std.zlib"
>
>There is a corresponding thread that is more specific about the uncompress behavior, and has more info (including code) at: http://dsource.org/forums/viewtopic.php?t=321

Working on it.  I'm just about done updating etc.c.zlib from v1.1.4 to v1.2.1. After that I'm going to look at this bug.

Sean


September 01, 2004
In article <ch29mp$ugn$1@digitaldaemon.com>, Sean Kelly says...
>
>In article <cgv2k4$2dau$1@digitaldaemon.com>, Lynn A. says...
>>
>>It appears there may be a bug in std.zlib.uncompress. The size of the output buffer is apparently estimated from the size of the input, and can be inadequate when the decompression ratio is over 2x.
>>
>>Rather than repeat other info, there is discussion in the following thread from
>>the digitalmars.d newsgroup:
>>"Having problems with uncompress of zip file created by std.zlib"
>>
>>There is a corresponding thread that is more specific about the uncompress behavior, and has more info (including code) at: http://dsource.org/forums/viewtopic.php?t=321
>
>Working on it.  I'm just about done updating etc.c.zlib from v1.1.4 to v1.2.1. After that I'm going to look at this bug.
>
>Sean
>

Before you posted this, I checked into it and believe I found a fix.

I'm not a zlib expert, but this seems to take care of the problem with 1.1.4.

Change line 159 from:
err = etc.c.zlib.inflate(&zs, Z_FINISH);

to:
err = etc.c.zlib.inflate(&zs, Z_NO_FLUSH);


Here's how I tested it:

#import std.zlib;
#import std.stdio;
#import std.random;
#
#bool CompressThenUncompress (ubyte[] src)
#{
#  try {
#    ubyte[] dst = cast(ubyte[])std.zlib.compress(cast(void[])src);
#    double ratio = (dst.length / cast(double)src.length);
#    writef("src.length:  ", src.length, ", dst: ", dst.length, ", Ratio = ",
ratio);
#    ubyte[] uncompressedBuf;
#    uncompressedBuf = cast(ubyte[])std.zlib.uncompress(cast(void[])dst);
#    assert(src.length == uncompressedBuf.length);
#    assert(src == uncompressedBuf);
#  }
#  catch {
#    writefln(" ... Exception thrown when src.length = ", src.length, ".");
#    return false;
#  }
#  return true;
#}
#
#void main (char[][] args)
#{
#    // smallish buffers
#    for(int idx = 0; idx < 25; idx++) {
#        char[] buf = new char[rand() % 100];
#
#        // Alternate between more & less compressible
#        foreach(inout char c; buf) c = ' ' + (rand() % (idx % 2 ? 91 : 2));
#
#        if(CompressThenUncompress(cast(ubyte[])buf)) {
#            printf("; Success.\n");
#        } else {
#            return;
#        }
#    }
#
#    // larger buffers
#    for(int idx = 0; idx < 25; idx++) {
#        char[] buf = new char[rand() % 10000000];
#
#        // Alternate between more & less compressible
#        foreach(inout char c; buf) c = ' ' + (rand() % (idx % 2 ? 91 : 10));
#
#        if(CompressThenUncompress(cast(ubyte[])buf)) {
#            printf("; Success.\n");
#        } else {
#            return;
#        }
#    }
#
#    printf("PASSED all tests.\n");
#}


September 01, 2004
>#        // Alternate between more & less compressible
>#        foreach(inout char c; buf) c = ' ' + (rand() % (idx % 2 ? 91 : 10));

<alert comment="newbie>

Looks like great progress is taking place. THANKS from a prospective user of std.zlib.

One suggestion: The above test is ingenious (and over my head), but with large buffers the resulting compression ratios are invariably about 46.9% for the more compressible buffer, and about 82.4% for the less compressible buffer. I would advocate logic that generated test buffers with much greater variability of the compression ratio ... perhaps from 1:10 to 2:1.  And also a contrived buffer that was all one letter (e.g. '1'), all two letters repeated many times (e.g. "1212121212 ... 1212"), all three letters (e.g. "123"), etc.

Specifically, I anticipate using zlib for quite regular specialized text that compresses the original from about 4.1 mb to 1.1 mb. = 73%. It would be an ironic implementation of "Murphy's Law" if the fix worked well with a limited range of compression ratios, but couldn't handle a buffer of 5000 "Test Test Test Test ... repeat 4995 more times ... Test".

Another suggestion: perhaps the tests that are being put together could eventually be incorporated as zlib unittests? Also, could the people doing the fixing post the modified zlib object code somewhere for other people to try out? My impression is that recompiling Phobos is not for faint-hearted newbies.

(Another suggestion: perhaps I should do more than advocate the suggestions above, but "roll up my sleeves" and contribute time and effort. "People tend to be as useless as they can get away with, and as resourceful as the circumstances demand." L.Allan :-)

</alert>


September 01, 2004
In article <ch4f53$26q7$1@digitaldaemon.com>, Lynn says...
>
>My impression is that recompiling Phobos is not for faint-hearted newbies.
>

1) On Windows, make sure the dm\bin and dmd\bin folders are in your PATH (you may have to move them towards the 'front' of the PATH if other make tools are already present in the PATH).

2) Next, cd to each:

dmd/src/phobos/etc/c/zlib/
dmd/src/phobos/etc/c/recls/
dmd/src/phobos/internal/gc/
dmd/src/phobos/

3) and edit the win32.mak files to make sure that the CC and DMD vars. are set properly (if your path is setup Ok, then you can just change them to dmc and dmd respectively).

4) Then run "make -f win32.mak" in each.

5) copy dmd/src/phobos/phobos.lib to dmd/lib (after making a backup ;)

The steps are similar for Linux, except the make files are named linux.mak, the C compiler won't be DMC and you need to copy dmd/src/phobos/libphobos.a to /usr/lib after the build.

>
></alert>


September 01, 2004
In article <ch3ajm$1gks$1@digitaldaemon.com>, Dave says...
>
>Before you posted this, I checked into it and believe I found a fix.
>
>I'm not a zlib expert, but this seems to take care of the problem with 1.1.4.
>
>Change line 159 from:
>err = etc.c.zlib.inflate(&zs, Z_FINISH);
>
>to:
>err = etc.c.zlib.inflate(&zs, Z_NO_FLUSH);

Yup, that should do it.  I actually missed the while(1) bit and thought inflate was only being called once.  It should also work if you set the value to Z_SYNC_FLUSH.


Sean


September 01, 2004
In article <ch4uav$2dfk$1@digitaldaemon.com>, Sean Kelly says...
>
>In article <ch3ajm$1gks$1@digitaldaemon.com>, Dave says...
>>
>>Before you posted this, I checked into it and believe I found a fix.
>>
>>I'm not a zlib expert, but this seems to take care of the problem with 1.1.4.
>>
>>Change line 159 from:
>>err = etc.c.zlib.inflate(&zs, Z_FINISH);
>>
>>to:
>>err = etc.c.zlib.inflate(&zs, Z_NO_FLUSH);
>
>Yup, that should do it.  I actually missed the while(1) bit and thought inflate was only being called once.  It should also work if you set the value to Z_SYNC_FLUSH.
>
>
>Sean
>

Thanks!

Timed both with the following on Linux and there was virtually no difference between the two averaged over 3 runs each. Also timed the previous code post - again no difference.

The Z_NO_FLUSH was tested on Windows as well and "passed".

What is the best way to submit this so it is included in the next build?

import std.zlib;
import std.stdio;
import std.random;

bool CompressThenUncompress (char[] src)
{
try {
char[] dst = cast(char[])std.zlib.compress(cast(void[])src);
double ratio = (src.length / cast(double)dst.length);
if(src != cast(char[])std.zlib.uncompress(cast(void[])dst)) {
writefln(" ... NO MATCH when src.length = ", src.length, ".");
return false;
} else {
writef("src.length:  ", src.length, ", dst: ", dst.length, ", Ratio = ", ratio);
}
}
catch {
writefln(" ... Exception thrown when src.length = ", src.length, ".");
return false;
}

return true;
}

void main (char[][] args)
{
for(int idx = 0; idx < 50; idx++) {
char[] buf = new char[(idx + 1) * 20000];

// Alternate between more & less compressible
foreach(inout char c; buf) c = ' ' + (rand() % (90 / ((idx % 5) + 1)));

if(CompressThenUncompress(buf)) {
printf("; Success.\n");
} else {
return;
}
}

printf("PASSED all tests.\n");
}


September 01, 2004
> 2) Next, cd to each:
> dmd/src/phobos/etc/c/zlib/
> dmd/src/phobos/etc/c/recls/
> dmd/src/phobos/internal/gc/
> dmd/src/phobos/
>
> 3) and edit the win32.mak files to make sure that the CC and DMD vars. are
> set
> properly (if your path is setup Ok, then you can just change them to dmc
> and dmd
> respectively).
>
> 4) Then run "make -f win32.mak" in each.
>

<alert comment="newbie">

I wasn't able to get dmd/src/phobos/c/recls to recompile. I was able to rebuild zlib.lib, but couldn't figure out how to get D code to reference my changes rather than phobos.lib. I suppose I've used IDE's too much.

After some head scratching, I wonder if you really meant that I should
attempt to rebuild:
dmd/src/photos/std/zlib.d

As a quick hack, I put test.d and zlib.d in the same directory, and put in
several :
writefln("Hello Lynn");
"fingerprints" within std.zlib.compress and std.zlib.uncompress. I compiled
with:
dmd test.d zlib.d
The "Hello Lynn" outputs shows up when running test.exe, so it seems like I
have an ad hoc environment from which I can make further tests.

Quick sanity check ... am I on the right track with this? My intent is to
incorporate the "fix candidates" identified in earlier posts, and then
exercise the "CompressThenUncompress" test scaffolding with test cases that
have compression ranges from 100:1 to 1:2. (such as
"1111111   .... 1111"
"12121212 ...  1212"
"123123123 ... 123123"

</alert>


September 01, 2004
<alert comment="newbie">

With the Z_NO_FLUSH change noted in earlier posts, I was able to recompile all previous code using a rebuilt zlib.obj and ... (drum roll, please) ... IT WORKED.  THANKS!

I also appended the following code to the code submitted by Dave. It has a test buffer than has extremely regular text in a buffer than grows from len=6 to about len=6,291,456. (6 * 2 **  20).  This caused the "compression ratio" to go from 1:2.33 for 6 bytes of "123123" to about 1000:1 for 6291456 bytes of "123123 ... 123123".

#   // increasingly long buffers with VERY high compressibility
#    // "123123 .... 123123"
#   for (int len = 6; len <= 10000000; len += len) {
#        char[] buf = new char[len];
#
#        for (int idx = 0; idx < len; idx += 3) {
#          buf[idx] = '1';
#          buf[idx + 1] = '2';
#          buf[idx + 2] = '3';
#        }
#        if(CompressThenUncompress(cast(ubyte[])buf)) {
#            printf("; Success.\n");
#        }
#        else {
#            return;
#        }
#    }


September 02, 2004
By "What is the best way to submit this so it is included in the next build?" I meant the Z_NO_FLUSH change, not the included code as one might expect given the context ;)

Geesh, I guess I should review my own posts more often /before/ I send them <g>

- Dave

In article <ch52eq$2fbi$1@digitaldaemon.com>, Dave says...
>
>Timed both with the following on Linux and there was virtually no difference between the two averaged over 3 runs each. Also timed the previous code post - again no difference.
>
>The Z_NO_FLUSH was tested on Windows as well and "passed".
>
>What is the best way to submit this so it is included in the next build?
>
>import std.zlib;
>import std.stdio;
>import std.random;
>
>bool CompressThenUncompress (char[] src)
>{
>try {
>char[] dst = cast(char[])std.zlib.compress(cast(void[])src);
>double ratio = (src.length / cast(double)dst.length);
>if(src != cast(char[])std.zlib.uncompress(cast(void[])dst)) {
>writefln(" ... NO MATCH when src.length = ", src.length, ".");
>return false;
>} else {
>writef("src.length:  ", src.length, ", dst: ", dst.length, ", Ratio = ", ratio);
>}
>}
>catch {
>writefln(" ... Exception thrown when src.length = ", src.length, ".");
>return false;
>}
>
>return true;
>}
>
>void main (char[][] args)
>{
>for(int idx = 0; idx < 50; idx++) {
>char[] buf = new char[(idx + 1) * 20000];
>
>// Alternate between more & less compressible
>foreach(inout char c; buf) c = ' ' + (rand() % (90 / ((idx % 5) + 1)));
>
>if(CompressThenUncompress(buf)) {
>printf("; Success.\n");
>} else {
>return;
>}
>}
>
>printf("PASSED all tests.\n");
>}
>


« First   ‹ Prev
1 2