April 07, 2012
On Saturday, 7 April 2012 at 05:02:04 UTC, dennis luehring wrote:
>>> 7zip took 55 secs _on the same file_.
>
> that is ok but he still compares different implementations

7zip is the program.  It unzips many formats, the standard zip format being one of them.  The parallel D program decodes the zip format three times faster than 7zip decodes the same file on the same ssd drive.  That is an appropriate comparison, since 7zip has been my utility of choice for unzipping zip format files on Windows for many years.

I provided the source code in the examples folder for the complete command line utility that I used, so you may build it and compare it to whatever you like and report the results.

April 07, 2012

On 4/7/2012 12:32 AM, Jay Norwood wrote:
> I got procmon to see what is going on. Win7 was doing indexing and
> thumbnails, and there was some virus checker going on, but you can get
> rid of those. Still, most of the problem just boils down to the duration
> of the delete on close being proportional to the size of the file, and
> apparently related to the access times of the disk. I sometimes see .25
> sec duration for a single file during the close of the delete operations
> on the hard drive.

Maybe it is the trim command being executed on the sectors previously occupied by the file.

>
> I've been using an intel 510 series 120GB drive for recording concerts.
> It is hooked up with an ineo usb3 adaptor to the front panel port of an
> rme ufx recorder. The laptop is just used as a controller ... the ufx
> does all the mixing and recording to the hard drive.
April 07, 2012
On Saturday, 7 April 2012 at 11:41:41 UTC, Rainer Schuetze wrote:
>
> Maybe it is the trim command being executed on the sectors previously occupied by the file.
>

No, perhaps I didn't make it clear that the rmdir slowness is only an issue on hard drives.  I can unzip the 2GB archive in about 17.5 sec on the ssd drive, and delete it using the rmd multi-thread delete example program in less than 17 secs on the ssd drive.   The same operations on a hard drive take around 60 seconds to extract, but 1.5 to 3 minutes to delete.

H:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 17405 ms

H:\>rmd tz
removing: .\tz
finished! time:16671 ms
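
For readers who want to try the idea without building the D example, here is a rough Python sketch of a multi-threaded delete. It is not the rmd source; the function name `parallel_rmtree` and the `workers` parameter are made up for illustration, and it assumes the tree contains no symlinks to directories.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_rmtree(root, workers=8):
    """Delete all files under root with a thread pool, then remove the directories.

    Hypothetical helper for illustration; returns the number of files deleted.
    """
    files, dirs = [], []
    # topdown=False yields child directories before their parents.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        files.extend(os.path.join(dirpath, name) for name in filenames)
        dirs.append(dirpath)

    # File deletes do not depend on each other, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(os.remove, files))  # list() re-raises any worker exception

    # A directory must be empty before rmdir; the bottom-up order ensures that.
    for dirpath in dirs:
        os.rmdir(dirpath)
    return len(files)
```

Whether threads help at all depends on the drive; the timings in this thread suggest the gain shows up on the ssd and not on the hard drive.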


I've been doing some reading on the web and studying the procmon logs. I am convinced the slow hard drive delete is an issue with seek times, since it is not an issue on the ssd.  It may be caused by fragmentation of the stored data or the mft itself, or else it could be that ntfs is doing some book-keeping journaling.  You are right that it could be doing delete notifications to any application watching the disk activity.  I've already turned off the virus checker and the indexing, but I'm going to try the tweaks in the second link and also try the  mydefrag program in the third link and see if anything improves the hd delete times.


http://ixbtlabs.com/articles/ntfs/index3.html
http://www.gilsmethod.com/speed-up-vista-with-these-simple-ntfs-tweaks
http://www.mydefrag.com/index.html


That mydefrag has some interesting ideas about sorting folders by full pathname on the disk as one of the defrag algorithms.  Perhaps using it, and also using  unzip and zip algorithms that match the defrag algorithm, would be a nice combination.  In other words, if the zip algorithm processes the files in a sorted-by-pathname order, and if the defrag algorithm has created folders that are sorted on disk by the same order, then you would expect optimally short seeks while processing the files in the order they are stored.
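
A minimal sketch of the "process files in sorted-by-pathname order" half of that idea, in Python. The helper name `files_in_pathname_order` is hypothetical, and the plain string sort is an assumption: I don't know the exact ordering the defrag tool uses, so the key would have to be adjusted to match it.

```python
import os


def files_in_pathname_order(root):
    """Return every file path under root, sorted by full pathname."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    # Assumption: a plain string sort stands in for the defragmenter's ordering.
    return sorted(paths)
```

A zip or delete loop would then walk the returned list front to back instead of using whatever order the directory enumeration happens to give.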

The mydefrag program uses the ntfs defrag api.  There is an article at the following link showing how to access it to get the Logical Cluster Numbers on disk for a file.  I suppose you could sort your file operations by the start LCN of each file, for example during compression, and that might reduce the seek-related delays.

http://blogs.msdn.com/b/jeffrey_wall/archive/2004/09/13/229137.aspx



April 07, 2012
On Saturday, 7 April 2012 at 17:08:33 UTC, Jay Norwood wrote:
> The mydefrag program uses the ntfs defrag api.  There is an article at the following link showing how to access it to get the Logical Cluster Numbers on disk for a file.  I suppose you could sort your file operations by the start LCN of each file, for example during compression, and that might reduce the seek-related delays.
>
> http://blogs.msdn.com/b/jeffrey_wall/archive/2004/09/13/229137.aspx

I did a complete defrag of the g hard drive, then did a parallel delete of the tz folder with rmd, then unzipped it again with the parallel uzp.  Then I analyzed the disk again with mydefrag. The analysis shows the unzip created over 300 fragmented files, even though I wrote each expanded file in a single operation. So, I did a complete defrag again, then removed the folder again, and got about the same 109 secs for the delete operation on the hd (vs about 17 secs on the ssd for the same operation).  The uzp parallel unzip is about 85 secs vs about 17.5 secs on the ssd.

G:\>rmd tz
removing: .\tz
finished! time:109817 ms

G:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 85405 ms

G:\>rmd tz
removing: .\tz
finished! time:108387 ms

So ... it looks like the defrag helps, as the 109 sec values are at the low end of the range I've seen previously.  Still it is totally surprising to me that deleting files should take longer than creating the same files.

btw, here are the windows rmdir times on the defragged hd and on the ssd drive; the third measurement is the D parallel rmd on the ssd ... much faster on D.

G:\>cmd /v:on /c "echo !TIME! & rmdir /q /s tz & echo !TIME!"
14:34:09.06
14:36:23.36

H:\>cmd /v:on /c "echo !TIME! & rmdir /q /s tz & echo !TIME!"
14:38:44.69
14:40:02.16

H:\>rmd tz
removing: .\tz
finished! time:17536 ms
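
To put the three measurements in the same units, the !TIME! stamps can be subtracted. Below is a small Python sketch of that arithmetic; the function names are made up for illustration, and it assumes the hh:mm:ss.cc stamp layout shown above (with "." as the decimal separator) and that both stamps fall on the same day.

```python
def centiseconds(stamp):
    """Convert a cmd.exe !TIME! stamp such as '14:34:09.06' to centiseconds."""
    hms, cs = stamp.strip().split(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 100 + int(cs)


def elapsed_seconds(start, end):
    """Elapsed time in seconds between two stamps taken on the same day."""
    return (centiseconds(end) - centiseconds(start)) / 100


hd_rmdir = elapsed_seconds("14:34:09.06", "14:36:23.36")   # windows rmdir, hard drive
ssd_rmdir = elapsed_seconds("14:38:44.69", "14:40:02.16")  # windows rmdir, ssd
ssd_rmd = 17536 / 1000                                     # D parallel rmd, ssd
print(hd_rmdir, ssd_rmdir, ssd_rmd)
```

That works out to about 134.3 secs for rmdir on the hd and about 77.5 secs for rmdir on the ssd, against 17.5 secs for rmd on the ssd.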



April 08, 2012
On Sat, 07 Apr 2012 21:45:04 +0200,
"Jay Norwood" <jayn@prismnet.com> wrote:

> So ... it looks like the defrag helps, as the 109 sec values are at the low end of the range I've seen previously.  Still it is totally surprising to me that deleting files should take longer than creating the same files.

Maybe the kernel caches writes, but synchronizes deletes? (So the seek times become apparent there, and not in the writes.)
Also check the file creation flags; maybe you can hint the final file size to Windows so the files won't be fragmented?
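
One way to try the size hint, as a hedged Python sketch: set the file to its final length before writing any data. The helper name `write_preallocated` is hypothetical, and whether ntfs then places the file in one contiguous run is an untested assumption, not something measured in this thread.

```python
import os


def write_preallocated(path, data):
    """Set the file to its final size first, then write the bytes in one call."""
    with open(path, "wb") as f:
        f.truncate(len(data))  # extend the empty file to its final length
        f.seek(0)              # make sure the write starts at the beginning
        f.write(data)
    return os.path.getsize(path)
```

Running the defrag analysis after an unzip that writes files this way would show whether the fragment count actually drops.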
April 08, 2012
On Sunday, 8 April 2012 at 13:55:21 UTC, Marco Leise wrote:
> Maybe the kernel caches writes, but synchronizes deletes? (So the seek times become apparent there, and not in the writes)
> Also check the file creation flags; maybe you can hint the final file size to Windows so the files won't be fragmented?

My understanding is that a delete operation occurs after all the file handles associated with a file are closed, assuming the other handles were opened with FILE_SHARE_DELETE.  I believe otherwise you get an error from the attempt to delete.

I'm doing some experiments with mydefrag sortByName(), and it indicates to me that there will be huge improvements in delete efficiency available on a hard drive if you can figure out some way to get the os to arrange the files and directories in LCNs in that byName order.  Below are the delete times from win7 rmdir on the same 2GB folder with and without defrag using mydefrag sortByName().

This is win7 rmdir following mydefrag sortByName() defrag ... less than 7 seconds
G:\>cmd /v:on /c "echo !TIME! & rmdir /q /s tz & echo !TIME!"
 9:06:33.79
 9:06:40.47


This is the same rmdir without defrag of the folder.  2 minutes 14 secs.
G:\>cmd /v:on /c "echo !TIME! & rmdir /q /s tz & echo !TIME!"
14:34:09.06
14:36:23.36

This is all on win7 ntfs, and I have no idea if similar gains are available for linux.

So, yes, whatever tricks you can play with the win api to get it to organize the unzipped archive into this particular order are going to make huge improvements in the speed of delete.



