April 05, 2012
unzip parallel, 3x faster than 7zip
I uploaded a parallel unzip here, with its main program in the 
examples folder. Testing on my SSD, it unzips a 2GB directory 
structure in 17.5 secs; 7zip took 55 secs on the same file.
This restores timestamps on the regular files. There is also a 
loop that will restore timestamps on folders. It can be 
uncommented once std.file.setTimes gets the fix that allows 
timestamp updates on folders. I documented a fix, which I 
tested, in issue 7819.

https://github.com/jnorwood/file_parallel

http://d.puremagic.com/issues/show_bug.cgi?id=7819
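The sketch below shows the ordering involved. It is written in Python, not the author's D code. Regular files are stamped first, then folders. Folders must come last because creating files inside a folder updates that folder's modification time. The names `restore_times` and `dos_time_to_epoch` are hypothetical. The input is assumed to be (archive name, date_time) pairs like Python's `zipfile.ZipInfo.date_time`. Python's `os.utime` already accepts directory paths, which is the capability the setTimes fix is about.

```python
import os
import time


def dos_time_to_epoch(date_time):
    """Convert a zip entry's (year, month, day, hour, min, sec) tuple
    (the shape of zipfile.ZipInfo.date_time) to a POSIX timestamp.
    Zip stores local time, so time.mktime (local time) is used."""
    return time.mktime(tuple(date_time) + (0, 0, -1))


def restore_times(dest, entries):
    """Set access/modification times on extracted entries.

    `entries` holds (archive_name, date_time) pairs; folder names end
    with "/" as they do inside a zip archive. Hypothetical helper.
    """
    dirs = []
    for name, date_time in entries:
        path = os.path.join(dest, *name.rstrip("/").split("/"))
        stamp = dos_time_to_epoch(date_time)
        if name.endswith("/"):
            # Defer folders: creating files inside them bumps their mtime.
            dirs.append((path, stamp))
        else:
            os.utime(path, (stamp, stamp))
    # Folders last, deepest first, as a conservative ordering.
    for path, stamp in sorted(dirs, key=lambda d: d[0].count(os.sep), reverse=True):
        os.utime(path, (stamp, stamp))
```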


This has similar limitations to std.zip: it only does inflate or 
store, and it doesn't do decryption. There is a 4GB limit, due to 
the 32-bit offsets of the zip format used. It processes the files 
in 40MB blocks, using a std.parallelism parallel foreach loop. If 
an archived entry is larger than 40MB, it will attempt to load 
the whole entry into memory; there is currently no code to split 
a single large entry into blocks for expansion.

I used stream I/O to avoid the 2GB file limits still present in stdio.
April 05, 2012
Re: unzip parallel, 3x faster than 7zip
On Thursday, 5 April 2012 at 14:04:57 UTC, Jay Norwood wrote:
> I uploaded a parallel unzip here, and the main in the examples 
> folder.

So, below is a demo of how to use the example app in Windows, 
where I unzipped a 2GB directory structure from a 1GB zip file, 
tzip.zip.

02/18/2012  03:23 PM    <DIR>          test
03/30/2012  11:28 AM       968,727,390 tzip.zip
04/05/2012  08:07 AM           462,364 uzp.exe
03/21/2012  10:26 AM         1,603,584 wc.exe
03/06/2012  12:20 AM    <DIR>          xx8
              13 File(s)  1,071,302,938 bytes
              14 Dir(s)  49,315,860,480 bytes free

H:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 17183 ms

02/18/2012  03:23 PM    <DIR>          test
04/05/2012  08:12 AM    <DIR>          tz
03/30/2012  11:28 AM       968,727,390 tzip.zip
04/05/2012  08:07 AM           462,364 uzp.exe
03/21/2012  10:26 AM         1,603,584 wc.exe
03/06/2012  12:20 AM    <DIR>          xx8
              13 File(s)  1,071,302,938 bytes
              15 Dir(s)  47,078,543,360 bytes free


The example supports several command-line forms:
  uzp zipFilename                        (unzip into the current folder)
  uzp zipFilename destFoldername         (unzip into the destination folder)
  uzp zipf1 zipf2 zipf3 destFoldername   (unzip multiple zip files into the destination folder)
  uzp zipf* destFoldername               (unzip multiple zip files, with wildcard expansion, into the destination folder)

In its current form, it overwrites existing directory entries 
without asking.
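The command-line forms listed above reduce to a simple rule: with two or more arguments, the last one is the destination. Here is a Python sketch of that parsing. It is not the uzp source, and `parse_args` is a hypothetical name. cmd.exe does not expand wildcards itself, so the sketch expands them with `glob`.

```python
import glob


def parse_args(args):
    """Split uzp-style arguments into (zip_files, dest_folder).

    Assumed rules, matching the listed forms: a single argument is a zip
    file extracted into the current folder; with two or more arguments,
    the last one is the destination folder.
    """
    if not args:
        raise SystemExit("usage: uzp zipfile [zipfile ...] [destfolder]")
    sources, dest = (args, ".") if len(args) == 1 else (args[:-1], args[-1])
    zips = []
    for pattern in sources:
        # The program expands wildcards itself, since cmd.exe passes them through.
        matches = sorted(glob.glob(pattern))
        # No match: keep the name so the caller can report a missing file.
        zips.extend(matches or [pattern])
    return zips, dest
```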
April 05, 2012
Re: unzip parallel, 3x faster than 7zip
On Thursday, 5 April 2012 at 15:07:47 UTC, Jay Norwood wrote:
........

So, a few comments about std.zip...

I attempted to use it and found that its way of unzipping is a 
memory hog: it keeps the full original archive and all of the 
unzipped data in memory. It quickly ran out of memory on my test 
case.

The classes didn't lend themselves to parallel execution, so I 
broke the work into a few pieces: one creates the directory 
structure, one reads in the compressed archive entries, and one 
expands the archive entries.
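The "reads in compressed archive entries" piece is where memory use can be kept bounded. Instead of loading the whole archive, each entry's compressed bytes can be read on their own by seeking to its local file header. The Python sketch below does this, using `zipfile` only to parse the central directory. `read_compressed` and `iter_archive` are hypothetical names, and this is not the uzp code.

```python
import struct
import zipfile

# 30-byte zip local file header: signature, version, flags, method,
# time, date, crc32, compressed size, uncompressed size,
# file name length, extra field length.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIG = 0x04034B50


def read_compressed(f, infos):
    """Yield (name, method, compressed_bytes) for each ZipInfo in `infos`,
    reading only that entry's compressed data from binary file `f`."""
    for info in infos:
        f.seek(info.header_offset)
        fields = LOCAL_HEADER.unpack(f.read(LOCAL_HEADER.size))
        if fields[0] != LOCAL_SIG:
            raise ValueError(f"{info.filename}: bad local header")
        name_len, extra_len = fields[9], fields[10]
        # The local extra field can differ from the central one, so skip
        # using the lengths from the local header itself.
        f.seek(info.header_offset + LOCAL_HEADER.size + name_len + extra_len)
        yield info.filename, info.compress_type, f.read(info.compress_size)


def iter_archive(path):
    """Yield every file entry of the archive at `path`, one at a time."""
    with open(path, "rb") as f:
        with zipfile.ZipFile(f) as zf:  # closing zf leaves f open
            infos = [i for i in zf.infolist() if not i.is_dir()]
        yield from read_compressed(f, infos)
```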

The app creates the directory structure serially (not in 
parallel), using the recursive mkdir.
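Creating the folders up front, serially, might look like this in Python. `os.makedirs` is the recursive mkdir here; in D, std.file.mkdirRecurse plays that role. One plausible reason for doing this step serially is that the later parallel workers then never race to create the same folder. That rationale is an assumption, and `create_dirs` is a hypothetical name.

```python
import os
import posixpath


def create_dirs(dest, names):
    """Create every folder the archive needs, serially, before any
    parallel expansion starts. `names` are archive member names using
    "/" separators; folder entries end with "/". Returns the sorted list
    of relative folders that were ensured to exist."""
    needed = set()
    for name in names:
        folder = name.rstrip("/") if name.endswith("/") else posixpath.dirname(name)
        if folder:
            needed.add(folder)
    created = sorted(needed)
    for folder in created:
        # makedirs creates missing parents too; exist_ok skips existing ones.
        os.makedirs(os.path.join(dest, *folder.split("/")), exist_ok=True)
    return created
```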

I found that creating the directory structure only took about 0.4 
secs of the total time in that 2GB test.

I found that creating the directory structure, reading the zip 
entries, and expanding the data, without writing to disk, took 
less than 4 secs, with the expansion done in parallel.

The other 13 to 14 secs were all taken up by writing out the 
files, with less than half a second of that spent updating the 
timestamps. This is for about 39k directory entries.

The 17 sec result is on the Intel 510 series SSD. On a hard 
drive, 7zip took 128 secs and uzp took about 70 secs.

G:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 69440 ms
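Here are the figures reported in this thread, worked through. They are derived only from the quoted timings and the free-space lines of the directory listings:

```latex
\begin{aligned}
\text{SSD speedup} &= \frac{55\,\mathrm{s}}{17.18\,\mathrm{s}} \approx 3.2\\
\text{HDD speedup} &= \frac{128\,\mathrm{s}}{69.44\,\mathrm{s}} \approx 1.84\\
\text{write share (SSD)} &\approx \frac{13 \text{ to } 14\,\mathrm{s}}{17.18\,\mathrm{s}} \approx 76\% \text{ to } 81\%\\
\Delta_{\text{free}} &= 49{,}315{,}860{,}480 - 47{,}078{,}543{,}360 = 2{,}237{,}317{,}120 \text{ bytes}
\end{aligned}
```

So the "3x" in the subject line is the SSD result; on the hard drive the gain is closer to 1.8x. The drop in free space, about 2.24 billion bytes, is consistent with the "2GB directory structure". It likely includes some allocation overhead.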


It is interesting that Win7 takes longer to delete these 
directories than it does to create them.
April 05, 2012
Re: unzip parallel, 3x faster than 7zip
On 05.04.2012 16:04, Jay Norwood wrote:
> I uploaded a parallel unzip here, and the main in the examples
> folder.  Testing on my ssd drive, unzips a 2GB directory
> structure in 17.5 secs.  7zip took 55 secs on the same file.

It makes no sense to benchmark different algorithms (zip vs. 7zip).

Compare only unzip and parallel unzip; nothing else makes sense.
April 05, 2012
Re: unzip parallel, 3x faster than 7zip
On 04/05/2012 06:37 PM, dennis luehring wrote:
> On 05.04.2012 16:04, Jay Norwood wrote:
>> I uploaded a parallel unzip here, and the main in the examples
>> folder. Testing on my ssd drive, unzips a 2GB directory
>> structure in 17.5 secs. 7zip took 55 secs on the same file.
>
> it makes no sense to benchmark different algorithm zip<->7zip
>
> compare only unzip and parallel unzip - nothing else makes sense
>

I think he is talking about 7zip the standalone software, not 7zip the 
compression algorithm.

> 7zip took 55 secs _on the same file_.
April 05, 2012
Re: unzip parallel, 3x faster than 7zip
>
> I think he is talking about 7zip the standalone software, not 
> 7zip the compression algorithm.
>
>> 7zip took 55 secs _on the same file_.

Yes, that's right: both 7zip and this uzp program are using the 
same standard deflate zip format for this test. It is the only 
compressed format that std.zip can expand. 7zip was used to 
create the zip file used in the test.

7zip already has multi-core compression, but no multi-core 
decompression. I haven't seen any multi-core decompressor for the 
deflate format, though I did see one for bzip2, named pbzip2. In 
general, inflate/deflate are the fastest of the algorithms 
available in 7zip that I've compared. I'm happy with 7zip's 
compression performance with the deflate format, but not with its 
decompression, so I will be using this uzp app.


I'm curious why Win7 is such a dog when removing directories. I 
see a lot of disk read activity going on, which seems to dominate 
the delete time. This doesn't make any sense to me unless some 
file caching is being triggered on the files being deleted. I 
don't see any virus checker app being triggered; it all seems to 
be system read activity. Maybe I'll try opening the files with 
non-cached flags and truncating them to 0 length before deleting, 
and see whether that makes the deletes faster...
April 06, 2012
Re: unzip parallel, 3x faster than 7zip
On 4/5/2012 6:53 PM, Jay Norwood wrote:
> I'm curious why win7 is such a dog when removing directories. I see a
> lot of disk read activity going on which seems to dominate the delete
> time. This doesn't make any sense to me unless there is some file
> caching being triggered on files being deleted. I don't see any virus
> checker app being triggered ... it all seems to be system read activity.
> Maybe I'll try non cached flags, write truncate to 0 length before
> deleting and see if that results in faster execution when the files are
> deleted...

If you delete a directory containing several hundred thousand 
directories (each with 4-5 files inside, don't ask), you can see 
Windows freeze for long periods (10+ seconds) until it is 
finished, which affects everything up to and including the audio 
mixing (it starts looping, etc.).
April 06, 2012
Re: unzip parallel, 3x faster than 7zip
On Friday, 6 April 2012 at 14:55:14 UTC, Sean Cavanaugh wrote:
>
> If you delete a directory containing several hundred thousand 
> directories (each with 4-5 files inside, don't ask), you can 
> see windows freeze for long periods (10+seconds) of time until 
> it is finished, which affects everything up to and including 
> the audio mixing (it starts looping etc).

Yeah, I saw posts by people doing video complaining about such 
things. One good suggestion was to create many small volumes for 
separate projects and just do a quick format on them rather than 
trying to delete folders.

I ran procmon (Process Monitor) to see what is going on. Win7 was 
doing indexing and thumbnails, and there was some virus checker 
activity, but you can get rid of those. Still, most of the 
problem boils down to the duration of the delete-on-close being 
proportional to the size of the file, and apparently related to 
the access times of the disk. I sometimes see a 0.25 sec duration 
for a single file during the close of the delete operations on 
the hard drive.

I've been using an Intel 510 series 120GB drive for recording 
concerts. It is hooked up with an Ineo USB3 adapter to the front 
panel port of an RME UFX recorder. The laptop is just used as a 
controller; the UFX does all the mixing and recording to the 
hard drive.
April 07, 2012
Re: unzip parallel, 3x faster than 7zip
On 05.04.2012 19:04, Timon Gehr wrote:
> On 04/05/2012 06:37 PM, dennis luehring wrote:
>>  On 05.04.2012 16:04, Jay Norwood wrote:
>>>  I uploaded a parallel unzip here, and the main in the examples
>>>  folder. Testing on my ssd drive, unzips a 2GB directory
>>>  structure in 17.5 secs. 7zip took 55 secs on the same file.
>>
>>  it makes no sense to benchmark different algorithm zip<->7zip
>>
>>  compare only unzip and parallel unzip - nothing else makes sense
>>
>
> I think he is talking about 7zip the standalone software, not 7zip the
> compression algorithm.
>
>>  7zip took 55 secs _on the same file_.

That is OK, but he is still comparing different implementations.
April 07, 2012
Re: unzip parallel, 3x faster than 7zip
On 06.04.2012 01:53, Jay Norwood wrote:
> I'm curious why win7 is such a dog when removing directories.  I
> see a lot of disk read activity going on which seems to dominate
> the delete time.

Try Windows safe mode (without networking :}, so your virus 
scanner is disabled) by pressing F8 before Windows starts. That 
seems to remove many strange pauses, blockings, etc. I still have 
no idea why, but it's a good test environment.