Thread overview
Having problems with uncompress of zip file created by std.zlib
Aug 28, 2004  Lynn Allan
Aug 28, 2004  Walter
Aug 28, 2004  Lynn Allan
Aug 29, 2004  Ben Hinkle
Aug 29, 2004  Lynn Allan
Aug 29, 2004  Sean Kelly
Aug 30, 2004  Lynn Allan
August 28, 2004
I'm puzzled why the code below doesn't work. I'm attempting to use the
std.zlib.uncompress on a file that was created with std.zlib.compress. In
the code, there is a "version" that:
* reads Test.vpl into step01 (plain text with <crlf>'s)
* compresses into step02 and writes Test.zip
* reads Test.zip into buffer step03
* checks length of step02 and step03 equal to confirm read+write+read
* attempts to uncompress the buffer from Test.zip into char[] buffer step04
* gets exception with message, "Error: buf error"

The same file without -version=WriteFileBeforeReading skips the 1st, 2nd, and 4th steps above to see if it works better to let the Test.zip file close. I've tried a variety of combinations with reusing buffers, small files, big files.

Am I doing something wrong? Leaving out a step or three? Do I need to incorporate std.zip.ZipArchive and ArchiveMember for a Test.zip that only includes one file?

Lynn A.


//********************************************************
//********************************************************
import std.file;
import std.zlib;

int main (char[][] args)
{
  version(WriteFileBeforeReading) // dmd -version=WriteFileBeforeReading test.d
  {
    printf("Reached version: WriteFileBeforeReading\n");
    char[] inputStep01 = cast(char[])std.file.read("Test.vpl");
    ubyte[] compressedStep02 = cast(ubyte[])compress(inputStep01);
    printf("inputStep01 size:          %d\n", inputStep01.length);
    printf("compressedStep02 size:     %d\n", compressedStep02.length);
    std.file.write("Test.zip", compressedStep02);
  }
  printf("Reached past: WriteFileBeforeReading\n");
  ubyte [] compressedStep03 = cast(ubyte[])std.file.read("Test.zip");
  printf("Test.zip size:               %d\n", compressedStep03.length);

  version(WriteFileBeforeReading)
  { assert(compressedStep02.length == compressedStep03.length);
  }
  char[] textUncompressedStep04 = cast(char[])uncompress(compressedStep03);
  printf("textUncompressedStep04 size: %d\n", textUncompressedStep04.length);

  return 0;
}


August 28, 2004
The following test program does a simple read/write of a zip file. It might be helpful.
----------------------------

import std.file;
import std.date;
import std.zip;
import std.zlib;

int main(char[][] args)
{
    byte[] buffer;
    std.zip.ZipArchive zr;
    char[] zipname;
    ubyte[] data;

    testzlib();
    if (args.length > 1)
        zipname = args[1];
    else
        zipname = "test.zip";
    buffer = cast(byte[])std.file.read(zipname);
    zr = new std.zip.ZipArchive(cast(void[])buffer);
    printf("comment = '%.*s'\n", zr.comment);
    zr.print();

    foreach (ArchiveMember de; zr.directory)
    {
        de.print();
        printf("date = '%.*s'\n", std.date.toString(std.date.toDtime(de.time)));

        arrayPrint(de.compressedData);

        data = zr.expand(de);
        printf("data = '%.*s'\n", data);
    }

    printf("**Success**\n");

    zr = new std.zip.ZipArchive();
    ArchiveMember am = new ArchiveMember();
    am.compressionMethod = 8;
    am.name = "foo.bar";
    //am.extra = cast(ubyte[])"ExTrA";
    am.expandedData = cast(ubyte[])"We all live in a yellow submarine, a yellow submarine";
    am.expandedSize = am.expandedData.length;
    zr.addMember(am);
    void[] data2 = zr.build();
    std.file.write("foo.zip", cast(byte[])data2);

    return 0;
}

void arrayPrint(ubyte[] array)
{
    //printf("array %p,%d\n", (void*)array, array.length);
    for (int i = 0; i < array.length; i++)
    {
        printf("%02x ", array[i]);
        if (((i + 1) & 15) == 0)
            printf("\n");
    }
    printf("\n\n");
}

void testzlib()
{
    ubyte[] src = cast(ubyte[])
"the quick brown fox jumps over the lazy dog\r
the quick brown fox jumps over the lazy dog\r
";
    ubyte[] dst;

    arrayPrint(src);
    dst = cast(ubyte[])std.zlib.compress(cast(void[])src);
    arrayPrint(dst);
    src = cast(ubyte[])std.zlib.uncompress(cast(void[])dst);
    arrayPrint(src);
}


August 28, 2004
<alert comment="newbie">

I'm still having problems so I removed the std.file logic to better illustrate the misbehavior I'm seeing. The exceptions thrown seem related to the size of the buffer handled by std.zlib.uncompress. Or that I never really woke up this morning???

The code is similar to Walter B.'s sample code for using zip, except that it uses larger buffers and doesn't "reuse" the original src buffer as the destination of uncompress. Eventually, I want to read in a 1.1 MB file that was compressed with std.zlib.compress from about 4.1 MB of plain text. The application will use std.zlib.uncompress and proceed. The original uncompressed buffer will be read in from a file, but this simplified sample code just uses arrays to check what happens when a plain text buffer is compressed, and then uncompressed.

To summarize, main declares different text buffers of varying sizes and then calls CompressThenUncompress. Oddly, the same CompressThenUncompress code (below) that works for a buffer of 30 ubytes may fail inconsistently with 60 ubytes. The way the buffer is declared also seems to make a difference.

There seems to be a 'threshold' of about 50 bytes, but that isn't consistent either. I suspect that I'm confused about declaring arrays of ubytes??

I've included the code below, which may be hard to view depending on word wrap. It may be more viewable at: http://dsource.org/forums/viewtopic.php?t=321

Am I doing something wrong or leaving out a step or three? The output from running the program is shown at the bottom.

</alert>

// *******************************
// *******************************
import std.zlib;
import std.stdio;

void CompressThenUncompress (ubyte[] src)
{
  try {
    ubyte[] dst = cast(ubyte[])std.zlib.compress(cast(void[])src);
    writef("src.length:  ", src.length, " dst: ", dst.length);
    ubyte[] uncompressedBuf;
    uncompressedBuf = cast(ubyte[])std.zlib.uncompress(cast(void[])dst);
    writefln(" ... Got past std.zlib.uncompress. dst.length: ", dst.length);
    assert(src.length == uncompressedBuf.length);
    assert(src == uncompressedBuf);
  }
  catch {
    writefln(" ... Exception thrown when src.length = ", src.length, ". Keep going");
  }
}

char[] outerBuf30 =  "000000000011111111112222222222";
char[] outerBuf40 =  "0000000000111111111122222222223333333333";
char[] outerBuf50 =  "00000000001111111111222222222233333333334444444444";
char[] outerBuf100 = "00000000001111111111222222222233333333334444444444"
                     "01234567890123456789012345678901234567890123456789";

void main (char[][] args)
{
  char[] buf32 = "0123456789 0123456789 0123456789";
  CompressThenUncompress(cast(ubyte[])buf32);  // Works ok

  char[] buf40 = "0123456789 0123456789 0123456789 0123456";
  CompressThenUncompress(cast(ubyte[])buf40);  // Works ok

  char[] buf60 = "0123456789 0123456789 0123456789 0123456790 123456789 123456";
  CompressThenUncompress(cast(ubyte[])buf60);  // Throws exception

  ubyte[] ubuf60 = cast(ubyte[])"0123456789 0123456789 0123456789 "
                                "0123456790 123456789 123456";
  CompressThenUncompress(ubuf60);              // Throws exception

  char[] buf80 = "0123456789012345678901234567890123456789"
                 "0123456789012345678901234567890123456789";
  CompressThenUncompress(cast(ubyte[])buf80);  // Throws exception

  CompressThenUncompress(cast(ubyte[])"This string is 28 chars long"); //ok
  CompressThenUncompress(cast(ubyte[])"This string is 42 chars long "
                                      "0123456789012"); //ok
  CompressThenUncompress(cast(ubyte[])"This string is 46 chars long "
                                      "01234567890123456"); //ok
  CompressThenUncompress(cast(ubyte[])"This string is 60 chars long "
                                      "0123456789012345678901234567890"); //ok
  CompressThenUncompress(cast(ubyte[])"This string is 80 chars long "
                                      "0123456789012345678901234567890"
                                      "12345678901234567890"); //ok

  CompressThenUncompress(cast(ubyte[])outerBuf30);      // ok
  CompressThenUncompress(cast(ubyte[])outerBuf40);      // Throws exception
  CompressThenUncompress(cast(ubyte[])outerBuf50);      // Throws exception
  CompressThenUncompress(cast(ubyte[])outerBuf100);     // Throws exception
}

// Results from running above code for different array declarations
src.length:  32 dst: 22 ... Got past std.zlib.uncompress. dst.length: 22
src.length:  40 dst: 22 ... Got past std.zlib.uncompress. dst.length: 22
src.length:  60 dst: 28 ... Exception thrown when src.length = 60. Keep going
src.length:  60 dst: 28 ... Exception thrown when src.length = 60. Keep going
src.length:  80 dst: 21 ... Exception thrown when src.length = 80. Keep going
src.length:  28 dst: 34 ... Got past std.zlib.uncompress. dst.length: 34
src.length:  42 dst: 46 ... Got past std.zlib.uncompress. dst.length: 46
src.length:  46 dst: 46 ... Got past std.zlib.uncompress. dst.length: 46
src.length:  60 dst: 46 ... Got past std.zlib.uncompress. dst.length: 46
src.length:  80 dst: 46 ... Got past std.zlib.uncompress. dst.length: 46
src.length:  30 dst: 16 ... Got past std.zlib.uncompress. dst.length: 16
src.length:  40 dst: 19 ... Exception thrown when src.length = 40. Keep going
src.length:  50 dst: 21 ... Exception thrown when src.length = 50. Keep going
src.length: 100 dst: 33 ... Exception thrown when src.length = 100. Keep going



August 29, 2004
It looks like a bug in std.zlib.uncompress. The code
    if (!destlen)
        destlen = srcbuf.length * 2 + 1;
doesn't always allocate enough space for the result. When I change the 1 to
100 (or something big like that) all the examples in your test work. I have
no idea what the "right" value should be. I was just playing around with
different values.

-Ben
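
The failures in the posted output line up with that size guess. Below is a minimal sketch, written for a current D compiler rather than the 2004 one used in this thread, that replays the reported (src.length, dst.length) pairs through the quoted formula. The helper names guessedDestLen and guessFits are made up for illustration and are not part of std.zlib.

```d
import std.stdio;

/// The default size guess quoted above: compressed length * 2 + 1.
size_t guessedDestLen(size_t compressedLen)
{
    return compressedLen * 2 + 1;
}

/// True when the guess is large enough to hold the original data.
bool guessFits(size_t srcLen, size_t compressedLen)
{
    return guessedDestLen(compressedLen) >= srcLen;
}

void main()
{
    // (src.length, dst.length) pairs copied from the posted program output.
    int[2][] reported = [
        [32, 22], [40, 22], [60, 28], [60, 28], [80, 21],
        [28, 34], [42, 46], [46, 46], [60, 46], [80, 46],
        [30, 16], [40, 19], [50, 21], [100, 33],
    ];
    foreach (pair; reported)
    {
        writefln("src %3d  dst %2d  guess %2d  %s", pair[0], pair[1],
                 guessedDestLen(pair[1]),
                 guessFits(pair[0], pair[1]) ? "fits" : "too small");
    }
}
```

Every pair that was reported as throwing (60/28, 80/21, 40/19, 50/21, 100/33) gets a guess smaller than src.length, and every pair reported as working gets a guess at least as large, which is consistent with the diagnosis above.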


August 29, 2004
Interesting ... I found the snippet you noted below in the phobos zlib code. Does that mean that there isn't really a workaround for someone using std.zlib? My impression is that std.zlib was ported from the original C code.

> if (!destlen)
>   destlen = srcbuf.length * 2 + 1;
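
One possible workaround, going by the quoted check: the guess is only used when destlen is zero, so passing the real uncompressed size as the second argument to uncompress should sidestep it. A minimal sketch follows, written for a current D compiler (the 2004 signature may differ slightly). roundTrip is a made-up helper, and the caller is assumed to know or store the original length alongside the compressed data.

```d
import std.zlib;

/// Round-trips `src` through std.zlib, passing the known original length
/// as the `destlen` hint instead of relying on the default guess.
ubyte[] roundTrip(const(ubyte)[] src)
{
    auto packed = compress(src);
    // Second argument: expected size of the uncompressed data.
    return cast(ubyte[]) uncompress(packed, src.length);
}

void main()
{
    // Same 80-character payload as buf80 in the test program.
    auto text = cast(const(ubyte)[])
        "01234567890123456789012345678901234567890123456789012345678901234567890123456789";
    assert(roundTrip(text) == text);
}
```

For data read back from a file, this means the original length has to be saved somewhere (for example in front of the compressed bytes), since the compressed buffer alone is what the default guess is based on.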

"Ben Hinkle" <bhinkle4@juno.com> wrote in message news:<cgr95t$u3q$1@digitaldaemon.com>...
> It looks like a bug in std.zlib.uncompress. The code
>     if (!destlen)
>         destlen = srcbuf.length * 2 + 1;
> doesn't always allocate enough space for the result. When I change the 1 to
> 100 (or something big like that) all the examples in your test work. I have
> no idea what the "right" value should be. I was just playing around with
> different values.
>
> -Ben
>


August 29, 2004
> My impression is that std.zlib was ported from the original C code.

Interestingly, those lines of code do not appear in the original zlib source. I think what Walter tried to do was "approximate" a buffer size, which is not really the best way to go about it, as the size of the uncompressed data is not necessarily (2*compressed)+1. It would be better just to fail than try to carry on half-assedly in this case.

Or, rather than returning a void[], it could accept an out void[] for the dest buffer, though it wouldn't be as elegant :P


August 29, 2004
Ben Hinkle wrote:
> It looks like a bug in std.zlib.uncompress. The code
>     if (!destlen)
>         destlen = srcbuf.length * 2 + 1;
> doesn't always allocate enough space for the result. When I change the 1 to
> 100 (or something big like that) all the examples in your test work. I have
> no idea what the "right" value should be. I was just playing around with
> different values.

Typical usage of zlib is to loop on inflate until all the data has been extracted--it looks like the current implementation is trying to do everything in one pass.  I'd be happy to fix this, though I won't have time until tomorrow.

Also, the core zlib inflate/deflate functions do not generate or parse a zip header.  This process is only taken care of by the printf-type functions in the library (which don't operate on memory buffers).  While not having a header is fine (and probably preferable) for application-specific data, it means that std.zlib will not be able to read or write zip files usable by other programs.  I've written in-memory wrappers for zlib before that take care of this issue and would be happy to do something about it if folks are interested.  For the free functions the best way to do this would be to add a bit parameter at the end to specify whether the header should be processed/generated.  For the classes this could be a value passed on construction.  The default would be off.

Frankly, it would be nice if the zlib routines didn't allocate a new buffer for every function call.  Maybe a new set of functions that take both the input and output buffers as parameters?  The output buffer might still have to grow if it's not big enough.


Sean
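
The "loop until all the data has been extracted" pattern described above can be sketched with the streaming class in std.zlib. This is a minimal sketch for a current D compiler, not the fix that was offered in this thread. inflateInChunks and its chunkSize parameter are made-up names, and the exact behaviour of UnCompress in the 2004 library is an assumption.

```d
import std.zlib;

/// Inflates `packed` by feeding it to std.zlib.UnCompress in small pieces
/// and collecting whatever output each call produces.
ubyte[] inflateInChunks(const(ubyte)[] packed, size_t chunkSize)
{
    assert(chunkSize > 0, "chunkSize must be positive");
    auto inflater = new UnCompress();
    ubyte[] result;
    while (packed.length > 0)
    {
        // Hand over at most chunkSize bytes of compressed input per call.
        size_t n = packed.length < chunkSize ? packed.length : chunkSize;
        result ~= cast(const(ubyte)[]) inflater.uncompress(packed[0 .. n]);
        packed = packed[n .. $];
    }
    // Collect anything still buffered once the input is exhausted.
    result ~= cast(const(ubyte)[]) inflater.flush();
    return result;
}

void main()
{
    auto text = cast(const(ubyte)[])
        "the quick brown fox jumps over the lazy dog\r\n";
    auto packed = cast(const(ubyte)[]) compress(text);
    assert(inflateInChunks(packed, 8) == text);
}
```

Because the output is appended as it arrives, no up-front estimate of the uncompressed size is needed.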
August 30, 2004
"Sean Kelly" <sean@f4.ca> wrote in message news:cgtc4a$1ojk$1@digitaldaemon.com...
> Ben Hinkle wrote:
> > It looks like a bug in std.zlib.uncompress. The code
> >     if (!destlen)
> >         destlen = srcbuf.length * 2 + 1;
> > doesn't always allocate enough space for the result. When I change the 1 to
> > 100 (or something big like that) all the examples in your test work. I have
> > no idea what the "right" value should be. I was just playing around with
> > different values.
>
> Typical usage of zlib is to loop on inflate until all the data has been extracted--it looks like the current implementation is trying to do everything in one pass.  I'd be happy to fix this, though I won't have time until tomorrow.

I've posted this as a std.zlib.uncompress bug, and appreciate Sean K's offer to fix it.
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/1677

Lynn A.