Thread overview
unzip parallel, 3x faster than 7zip
Apr 05, 2012
Jay Norwood
Apr 05, 2012
Jay Norwood
Apr 05, 2012
Jay Norwood
Apr 05, 2012
dennis luehring
Apr 05, 2012
Timon Gehr
Apr 05, 2012
Jay Norwood
Apr 06, 2012
Sean Cavanaugh
Apr 06, 2012
Jay Norwood
Apr 07, 2012
Rainer Schuetze
Apr 07, 2012
Jay Norwood
Apr 07, 2012
Jay Norwood
Apr 08, 2012
Marco Leise
Apr 08, 2012
Jay Norwood
Apr 07, 2012
dennis luehring
Apr 07, 2012
dennis luehring
Apr 07, 2012
Jay Norwood
April 05, 2012
I uploaded a parallel unzip here, with the main in the examples folder.  Tested on my SSD, it unzips a 2GB directory structure in 17.5 secs.  7zip took 55 secs on the same file.  This restores timestamps on the regular files.  There is also a loop which will restore timestamps on folders.  It can be uncommented if the fix that allows timestamp updates on folders is added to std.file.setTimes.  I documented a fix that I tested in issue 7819.

https://github.com/jnorwood/file_parallel

http://d.puremagic.com/issues/show_bug.cgi?id=7819
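The order matters for that folder loop: creating or writing files inside a folder updates the folder's modification time, so folder times can only be restored after every file has been written. A minimal Python sketch of that two-pass idea (not the D code in the repo), assuming the archive has already been extracted to `dest` and that `os.utime` can set folder times, which it can on POSIX and in current CPython on Windows. `restore_times` and `entry_mtime` are hypothetical names, not part of uzp or std.file:

```python
import os
import time
import zipfile


def entry_mtime(info):
    """Convert a zip entry's DOS date_time (local time) to a POSIX timestamp."""
    return time.mktime(info.date_time + (0, 0, -1))


def restore_times(zf, dest):
    """Restore entry timestamps under dest: regular files first, folders last."""
    folders = []
    for info in zf.infolist():
        target = os.path.join(dest, *info.filename.rstrip("/").split("/"))
        if info.is_dir():
            folders.append((target, entry_mtime(info)))  # deferred to pass two
        elif os.path.isfile(target):
            t = entry_mtime(info)
            os.utime(target, (t, t))  # (atime, mtime)
    # Writing files into a folder bumps the folder's mtime, so folder
    # times are only set once all files are in place.
    for target, t in folders:
        if os.path.isdir(target):
            os.utime(target, (t, t))
```

Usage: after `zf.extractall(dest)`, call `restore_times(zf, dest)` while the archive is still open.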


This has similar limitations to std.zip: it only does inflate or store, and doesn't do decryption.  There is a 4GB limit due to the 32-bit offsets of the zip format used.  It processes files in 40MB blocks and uses the std.parallelism foreach loop.  If an archived entry is larger than 40MB it will attempt to load it into memory; there is currently no technique in there to split the expansion of a single large entry into blocks.
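The block-by-block scheme can be sketched in Python (an illustration, not the D code in the repo): entries are grouped into batches of roughly 40MB of compressed data, each batch is read sequentially, and its entries are inflated in parallel. Assumptions: a thread pool stands in for the std.parallelism foreach (CPython's zlib releases the GIL while inflating), the raw data is located by parsing the 30-byte local file header, and `read_raw`, `expand`, `batches` and `parallel_unzip` are hypothetical names. As with std.zip, only store and deflate are handled and encrypted entries are rejected; an entry larger than the limit simply becomes a batch of its own.

```python
import struct
import zipfile
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_BYTES = 40 * 1024 * 1024  # ~40MB of compressed input per batch
LOCAL_HEADER = 0x04034B50       # "PK\x03\x04"


def read_raw(f, info):
    """Return the still-compressed bytes of one entry by parsing its local header."""
    f.seek(info.header_offset)
    header = f.read(30)
    if struct.unpack("<I", header[:4])[0] != LOCAL_HEADER:
        raise ValueError("bad local header: " + info.filename)
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    f.seek(info.header_offset + 30 + name_len + extra_len)
    return f.read(info.compress_size)


def expand(info, raw):
    """Inflate or copy one entry; anything else is rejected, as in std.zip."""
    if info.flag_bits & 0x1:
        raise NotImplementedError("encrypted entry: " + info.filename)
    if info.compress_type == zipfile.ZIP_STORED:
        return raw
    if info.compress_type == zipfile.ZIP_DEFLATED:
        return zlib.decompress(raw, -15)  # raw deflate stream, no zlib header
    raise NotImplementedError("only store and deflate are supported")


def batches(infos, limit=BLOCK_BYTES):
    """Group entries so each batch holds about `limit` bytes of compressed data."""
    batch, size = [], 0
    for info in infos:
        if batch and size + info.compress_size > limit:
            yield batch
            batch, size = [], 0
        batch.append(info)
        size += info.compress_size
    if batch:
        yield batch


def parallel_unzip(path, workers=4):
    """Yield (name, data) per file entry; each batch is inflated in parallel."""
    with zipfile.ZipFile(path) as zf, open(path, "rb") as f, \
            ThreadPoolExecutor(max_workers=workers) as pool:
        files = [i for i in zf.infolist() if not i.is_dir()]
        for batch in batches(files):
            raws = [read_raw(f, i) for i in batch]  # sequential reads
            for info, data in zip(batch, pool.map(expand, batch, raws)):
                yield info.filename, data           # caller writes it out
```

Only one batch of compressed and expanded data is alive at a time, which is what keeps memory bounded compared with loading the whole archive.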

I used stream I/O to avoid the 2GB file-size limits still present in stdio.
April 05, 2012
On Thursday, 5 April 2012 at 14:04:57 UTC, Jay Norwood wrote:
> I uploaded a parallel unzip here, and the main in the examples folder.

So, below is a demo of how to use the example app on Windows, where I unzipped a 2GB directory structure from a 1GB zip file, tzip.zip.

02/18/2012  03:23 PM    <DIR>          test
03/30/2012  11:28 AM       968,727,390 tzip.zip
04/05/2012  08:07 AM           462,364 uzp.exe
03/21/2012  10:26 AM         1,603,584 wc.exe
03/06/2012  12:20 AM    <DIR>          xx8
              13 File(s)  1,071,302,938 bytes
              14 Dir(s)  49,315,860,480 bytes free

H:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 17183 ms

02/18/2012  03:23 PM    <DIR>          test
04/05/2012  08:12 AM    <DIR>          tz
03/30/2012  11:28 AM       968,727,390 tzip.zip
04/05/2012  08:07 AM           462,364 uzp.exe
03/21/2012  10:26 AM         1,603,584 wc.exe
03/06/2012  12:20 AM    <DIR>          xx8
              13 File(s)  1,071,302,938 bytes
              15 Dir(s)  47,078,543,360 bytes free


The example supports several forms of command line:
uzp zipFilename  to unzip into the current folder, or
uzp zipFilename destFoldername  to unzip into the destination folder, or
uzp zipf1 zipf2 zipf3 destFoldername  to unzip multiple zip files into the destination folder, or
uzp zipf* destFoldername  to unzip multiple zip files (wildarg expansion) into the destination folder
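The rules above hinge on one decision: with more than one argument, the last one is the destination. A Python sketch of that reading, where `parse_args` is a hypothetical helper and expanding patterns with `glob` is an assumption standing in for the wildarg expansion (cmd.exe passes `zipf*` through to the program unexpanded):

```python
import glob


def parse_args(argv):
    """Split uzp-style arguments into (zip_files, dest_folder).

    One argument: unzip into the current folder. Two or more: the last one
    is the destination folder; everything before it is a zip file or pattern.
    """
    if not argv:
        raise SystemExit("usage: uzp zipFilename... [destFoldername]")
    if len(argv) == 1:
        patterns, dest = argv, "."
    else:
        patterns, dest = argv[:-1], argv[-1]
    zips = []
    for pattern in patterns:
        # Expand wildcards here; a name with no matches is kept as-is so
        # that opening it later reports a sensible error.
        matches = sorted(glob.glob(pattern))
        zips.extend(matches or [pattern])
    return zips, dest
```

For example, `parse_args(["tzip.zip", "tz"])` returns `(["tzip.zip"], "tz")`, matching the `uzp tzip.zip tz` run above.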

In its current form, it overwrites existing directory entries without asking.




April 05, 2012
On Thursday, 5 April 2012 at 15:07:47 UTC, Jay Norwood wrote:
........

so, a few comments about std.zip...

I attempted to use it and found that its way of unzipping is a memory hog: it keeps the full original archive and all the unzipped data in memory.  It quickly ran out of memory on my test case.
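By contrast, an extractor can keep memory bounded by streaming each entry to disk in fixed-size chunks rather than holding the archive and every expanded file at once. A minimal Python sketch using the standard `zipfile` module (not std.zip, whose API differs); `stream_extract` is a hypothetical name, and it does not guard against malicious `..` entry names:

```python
import os
import shutil
import zipfile


def stream_extract(path, dest, chunk=1 << 20):
    """Extract every entry; memory use tracks `chunk`, not the archive size."""
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            target = os.path.join(dest, *info.filename.rstrip("/").split("/"))
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            parent = os.path.dirname(target)
            if parent:
                os.makedirs(parent, exist_ok=True)
            # zf.open() inflates incrementally; copyfileobj moves one chunk at a time.
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst, chunk)
```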

The classes didn't lend themselves to parallel execution, so I broke them into a few pieces: one creates the directory structure, one reads in compressed archive entries, and one expands archive entries.

The app creates the directory structure serially, using the recursive mkdir.
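Creating the whole tree up front means the parallel stages never race to create the same folder. A Python sketch of that serial pass, assuming entry names use `/` separators as the zip format specifies; `make_tree` is a hypothetical name and `os.makedirs` plays the role of the recursive mkdir:

```python
import os
import posixpath


def make_tree(names, dest):
    """Serially create every folder the archive needs; return the paths requested."""
    folders = set()
    for name in names:
        if name.endswith("/"):       # explicit folder entry
            folder = name.rstrip("/")
        else:                        # folder implied by a file's path
            folder = posixpath.dirname(name)
        if folder:
            folders.add(folder)
    for folder in sorted(folders):
        # makedirs also creates any missing parents, e.g. "a" for "a/b".
        os.makedirs(os.path.join(dest, *folder.split("/")), exist_ok=True)
    return sorted(folders)
```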

I found that creating the directory structure took only about 0.4 secs of the total time in that 2GB test.

I found that creating the directory structure, reading the zip entries, and expanding the data, without writing to disk, took less than 4 secs, with the expansion done in parallel.

The other 13 to 14 secs were all taken up by writing out the files, with less than a half sec of that required to update the timestamps.  This is on about 39k directory entries.

The 17 sec result is on the Intel 510 series SSD.  On a hard drive, 7zip took 128 secs and uzp took about 70 secs.

G:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 69440 ms


It is interesting that Win7 takes longer to delete these directories than to create them.
April 05, 2012
On 05.04.2012 16:04, Jay Norwood wrote:
> I uploaded a parallel unzip here, and the main in the examples
> folder.  Testing on my ssd drive, unzips a 2GB directory
> structure in 17.5 secs.  7zip took 55 secs on the same file.

it makes no sense to benchmark different algorithms (zip <-> 7zip)

compare only unzip and parallel unzip; nothing else makes sense

April 05, 2012
On 04/05/2012 06:37 PM, dennis luehring wrote:
> On 05.04.2012 16:04, Jay Norwood wrote:
>> I uploaded a parallel unzip here, and the main in the examples
>> folder. Testing on my ssd drive, unzips a 2GB directory
>> structure in 17.5 secs. 7zip took 55 secs on the same file.
>
> it makes no sense to benchmark different algorithm zip<->7zip
>
> compare only unzip and parallel unzip - nothing else makes sense
>

I think he is talking about 7zip the standalone software, not 7zip the compression algorithm.

> 7zip took 55 secs _on the same file_.
April 05, 2012
>
> I think he is talking about 7zip the standalone software, not 7zip the compression algorithm.
>
>> 7zip took 55 secs _on the same file_.

Yes, that's right: both 7zip and this uzp program are using the same standard deflate zip format for this test.  It is the only compressed format that std.zip can expand.  7zip was used to create the zip file used in the test.

7zip already has multi-core compression, but no multi-core uncompress.  I haven't seen any multi-core uncompress for the deflate format, but I did see one for bzip2 named pbzip2.  In general, though, inflate/deflate are the fastest algorithms I've seen among the ones available in 7zip.  I'm happy with the 7zip compression performance using the deflate format, but not with the uncompress, so I will be using this uzp app.


I'm curious why Win7 is such a dog when removing directories.  I see a lot of disk read activity going on which seems to dominate the delete time.  This doesn't make any sense to me unless some file caching is being triggered on the files being deleted.  I don't see any virus checker app being triggered ... it all seems to be system read activity.  Maybe I'll try non-cached flags, or truncating files to 0 length before deleting them, and see if that results in faster deletes...
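A Python sketch of the truncate-before-delete experiment: walk the tree bottom-up, truncate each file to 0 bytes, delete it, and record the slowest single delete so the two variants can be compared. Whether truncating first is actually faster on Win7 is what the experiment would test, not something this code establishes. `truncate_then_delete` is a hypothetical name, and the non-cached-flags variant is not shown:

```python
import os
import time


def truncate_then_delete(root):
    """Delete the tree under root, truncating each file first.

    Returns (files_deleted, slowest_delete_seconds); only the delete
    itself is timed, not the truncate.
    """
    count, slowest = 0, 0.0
    for folder, _subdirs, files in os.walk(root, topdown=False):
        for name in files:
            path = os.path.join(folder, name)
            os.truncate(path, 0)  # drop the file's data before deleting it
            start = time.perf_counter()
            os.remove(path)
            slowest = max(slowest, time.perf_counter() - start)
            count += 1
        os.rmdir(folder)  # bottom-up walk: subfolders are already gone
    return count, slowest
```

Running a plain delete on an identical copy of the tree, with the same timing, gives the baseline to compare against.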



April 06, 2012
On 4/5/2012 6:53 PM, Jay Norwood wrote:
>
>>
>
> I'm curious why win7 is such a dog when removing directories. I see a
> lot of disk read activity going on which seems to dominate the delete
> time. This doesn't make any sense to me unless there is some file
> caching being triggered on files being deleted. I don't see any virus
> checker app being triggered ... it all seems to be system read activity.
> Maybe I'll try non cached flags, write truncate to 0 length before
> deleting and see if that results in faster execution when the files are
> deleted...
>
>
>

If you delete a directory containing several hundred thousand directories (each with 4-5 files inside, don't ask), you can see Windows freeze for long periods (10+ seconds) until it is finished, which affects everything up to and including the audio mixing (it starts looping, etc.).

April 06, 2012
On Friday, 6 April 2012 at 14:55:14 UTC, Sean Cavanaugh wrote:
>
> If you delete a directory containing several hundred thousand directories (each with 4-5 files inside, don't ask), you can see windows freeze for long periods (10+seconds) of time until it is finished, which affects everything up to and including the audio mixing (it starts looping etc).

Yeah, I saw posts by people doing video complaining about such things.  One good suggestion was to create many small volumes for separate projects and just do a fast format on them rather than trying to delete folders.

I used procmon to see what is going on.  Win7 was doing indexing and thumbnails, and there was some virus checker activity, but you can get rid of those.  Still, most of the problem boils down to the duration of the delete on close being proportional to the size of the file, and apparently related to the access times of the disk.  I sometimes see a 0.25 sec duration for a single file during the close of the delete operations on the hard drive.

I've been using an Intel 510 series 120GB drive for recording concerts.  It is hooked up with an Ineo USB3 adapter to the front panel port of an RME UFX recorder.  The laptop is just used as a controller ... the UFX does all the mixing and recording to the drive.
April 07, 2012
On 05.04.2012 19:04, Timon Gehr wrote:
> On 04/05/2012 06:37 PM, dennis luehring wrote:
>>  On 05.04.2012 16:04, Jay Norwood wrote:
>>>  I uploaded a parallel unzip here, and the main in the examples
>>>  folder. Testing on my ssd drive, unzips a 2GB directory
>>>  structure in 17.5 secs. 7zip took 55 secs on the same file.
>>
>>  it makes no sense to benchmark different algorithm zip<->7zip
>>
>>  compare only unzip and parallel unzip - nothing else makes sense
>>
>
> I think he is talking about 7zip the standalone software, not 7zip the
> compression algorithm.
>
>>  7zip took 55 secs _on the same file_.

that is ok, but he is still comparing different implementations
April 07, 2012
On 06.04.2012 01:53, Jay Norwood wrote:
> I'm curious why win7 is such a dog when removing directories.  I
> see a lot of disk read activity going on which seems to dominate
> the delete time.

try Windows safe mode (without networking :}, so your virus scanner is disabled); press F8 before Windows starts. That seems to remove
many strange pauses, blockings, etc. Still no idea why, but it is a good test environment.
