April 08, 2012 a pretty exciting result for parallel D lang rmd following defrag by name
These are measured times to unzip and then delete a 2GB folder in Win7. Both deletes use the parallel rmd to remove the directory on a regular hard drive. The first measurement is the unzip of the archive. The second is the removal of the folder when no defrag has been done. The third is the unzip of the same archive. After that, I used a MyDefrag script to sort the LCN positions of all the files in the folder by full path name; they describe this sort-by-name script on their website. I then ran the rmd D program to remove the folder, and it took only 3.7 secs (vs. 197 secs the first time).
I thought I must have done something wrong, so I repeated the whole thing, and this time also zipped up the folder before deleting it, looked at its properties and poked around in it. Same 3.7 second delete.
I'll have to analyze what is happening, but this is a huge improvement. If it is just the sequential LCN order of the operations, I may be able to just pre-sort the delete operations by each file's LCN and get similar results. It also makes a case for zip and unzip implementations that preserve the sort-by-filepath order.
G:\>uzp tz.zip tz
unzipping: .\tz.zip
finished! time: 87066 ms
G:\>rmd tz
removing: .\tz
finished! time:197182 ms
G:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 86015 ms
G:\>rmd tz
removing: .\tz
finished! time:3654 ms
Below is the simple sortByName defrag script that I ran prior to the deletion.
# MyDefrag v4.0 default script: Sort By Name
#
# This is an example script.
Title('Sort By Name tz')
Description('
Sort all the files in G:\tz by name on all the selected disk(s).
')
WriteLogfile("MyDefrag.log","LogHeader")
VolumeSelect
  Name("g:")
VolumeActions
  AppendLogfile("MyDefrag.log","LogBefore")
  FileSelect
    DirectoryName("tz")
  FileActions
    SortByName(Ascending)
  FileEnd
  AppendLogfile("MyDefrag.log","LogAfter")
VolumeEnd
AppendLogfile("MyDefrag.log","LogFooter")
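To check whether pre-sorting the delete operations by LCN alone would give a similar result without the defrag pass, something like the following D sketch could be tried. It is not the rmd code: it deletes sequentially rather than in parallel, and the LCN lookup is a hypothetical caller-supplied delegate (`startLcn`). On NTFS that lookup would have to be built on the defrag API (FSCTL_GET_RETRIEVAL_POINTERS), which is not shown here. The function names are made up for illustration.

```d
import std.algorithm : sort, SwapStrategy;
import std.file : dirEntries, remove, rmdir, DirEntry, SpanMode;

/// Returns `paths` ordered by the starting LCN reported by `startLcn`.
/// `startLcn` is a hypothetical lookup supplied by the caller.
string[] sortByStartLcn(string[] paths, ulong delegate(string) startLcn)
{
    static struct Keyed { ulong lcn; string path; }

    // Look each LCN up once, rather than once per comparison.
    Keyed[] keyed;
    keyed.reserve(paths.length);
    foreach (p; paths)
        keyed ~= Keyed(startLcn(p), p);

    // Stable sort keeps ties in the order the directory walk returned them.
    keyed.sort!((a, b) => a.lcn < b.lcn, SwapStrategy.stable);

    string[] result;
    result.reserve(keyed.length);
    foreach (k; keyed)
        result ~= k.path;
    return result;
}

/// Sequential removal of `root`: all files first, in start-LCN order,
/// then the directories once they are empty.
void removeTreeInLcnOrder(string root, ulong delegate(string) startLcn)
{
    string[] files, dirs;
    // SpanMode.depth lists a directory's contents before the directory itself.
    foreach (DirEntry e; dirEntries(root, SpanMode.depth, false))
    {
        if (e.isDir && !e.isSymlink)
            dirs ~= e.name;
        else
            files ~= e.name; // regular files and symlinks are unlinked
    }

    foreach (f; sortByStartLcn(files, startLcn))
        remove(f);

    // Depth order guarantees each child directory goes before its parent.
    foreach (d; dirs)
        rmdir(d);
    rmdir(root);
}
```

With a dummy lookup this only exercises the ordering logic; a real LCN lookup is what would show whether the sort alone reproduces the 3.7 sec delete.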
April 08, 2012 Re: a pretty exciting result for parallel D lang rmd following defrag by name
Posted in reply to Jay Norwood
On Sunday, 8 April 2012 at 01:18:49 UTC, Jay Norwood wrote:
> in it. Same 3.7 second delete. I'll have to analyze what is happening, but this is a huge improvement. If it is just the sequential LCN order of the operations, it may be that I can just pre-sort the delete operations by the file lcn number and get similar results.
I ran rmd in the debugger to look at the order of entries returned from the depth-first search. The returned directory entry list is sorted alphabetically whether or not the sortByName() defrag script has been executed.
This article confirms that directory entries are sorted alphabetically.
http://msdn.microsoft.com/en-us/library/ms995846.aspx
"Directory entries are sorted alphabetically, which explains why NTFS files are always printed alphabetically in directory listings."
I'll have to write something to dump the starting LCN for each directory entry and see whether the sortByName defrag order matches the DirEntries list exactly.
April 08, 2012 Re: a pretty exciting result for parallel D lang rmd following defrag by name
Posted in reply to Jay Norwood
On 08/04/2012 09:34, Jay Norwood wrote:
> On Sunday, 8 April 2012 at 01:18:49 UTC, Jay Norwood wrote:
>> in it. Same 3.7 second delete. I'll have to analyze what is happening, but this is a huge improvement. If it is just the sequential LCN order of the operations, it may be that I can just pre-sort the delete operations by the file lcn number and get similar results.
>
> I ran rmd in the debugger to look at the order of entries being returned from the depth first search. The directory entry list returned is sorted alphabetically the same whether or not the sortByName() defrag script has been executed.
>
> This article confirms that directory entries are sorted alphabetically.
>
> http://msdn.microsoft.com/en-us/library/ms995846.aspx
>
> "Directory entries are sorted alphabetically, which explains why NTFS files are always printed alphabetically in directory listings."
>
> I'll have to write something to dump the starting lcn for each directory entry and see if the sortByName defrag is matching the DirEntries list exactly.
Hi,
You seem to have done a pretty good job with your parallel unzip. Have you tried a parallel zip as well?
Do you think you could include this in std.zip when you're done?
April 08, 2012 Re: a pretty exciting result for parallel D lang rmd following defrag by name
Posted in reply to Somedude
On Sunday, 8 April 2012 at 09:21:43 UTC, Somedude wrote:
> Hi,
>
> You seem to have done a pretty good job with your parallel unzip. Have
> you tried a parallel zip as well ?
> Do you think you could include this in std.zip when you're done ?
I'm going to do a parallel zip as well. There is already a parallel zip utility available in 7-Zip, so I haven't looked closely at D's std.zip.
These parallel implementations all bring in std.parallelism as a dependency, and I don't know if that is acceptable. I'm just putting them on my GitHub for now, along with examples.
April 08, 2012 Re: a pretty exciting result for parallel D lang rmd following defrag by name
Posted in reply to Jay Norwood
On Sunday, 8 April 2012 at 16:14:05 UTC, Jay Norwood wrote:
There are significant improvements in copy operations as well as a result of defrag by name: 43 seconds vs. 1 min 43 secs for an xcopy of the sorted 2GB folder vs. the unsorted one.
This is the 2GB folder defragged with its LCNs sorted by pathname:
G:\>cmd /v:on /c "echo !TIME! & xcopy /q /e /I tz h:\tz & echo !TIME!"
12:28:15.30
34119 File(s) copied
12:28:58.81
This is the same 2GB folder, but not defragged:
G:\>cmd /v:on /c "echo !TIME! & xcopy /q /e /I tz h:\tz & echo !TIME!"
12:34:10.58
34119 File(s) copied
12:35:53.14
I think it is probable that you would see a large part of this improvement just by sorting accesses by LCN, so you would need support for looking up the LCN of a directory entry. This guy has made C# wrappers for some of the NTFS defrag API, including looking up the LCNs for a filename:
http://blogs.msdn.com/b/jeffrey_wall/archive/2004/09/13/229137.aspx
We would just need the first LCN, and sort accesses by that. After unzip, only 300 of the 34000 entries were fragmented, so my guess is that just sorting accesses by the start LCN would provide most of the benefit achieved by reorganizing the files with the defrag by filename.
April 08, 2012 Re: a pretty exciting result for parallel D lang rmd following defrag by name
Posted in reply to Jay Norwood
On 08/04/2012 18:14, Jay Norwood wrote:
> On Sunday, 8 April 2012 at 09:21:43 UTC, Somedude wrote:
>> Hi,
>>
>> You seem to have done a pretty good job with your parallel unzip. Have
>> you tried a parallel zip as well ?
>> Do you think you could include this in std.zip when you're done ?
>
> I'm going to do a parallel zip as well. There is already parallel zip utility available with 7zip, so I haven't looked closely at D's std.zip.
>
> These parallel implementations all bring in std.parallelism as a dependency, and I don't know if that is acceptable. I'm just putting them in my github for now, along with examples.
>
>
Well, you can always do something like this:
version (parallel)
{
    import std.parallelism;
    // multithreaded
    ...
}
else
{
    // single thread
    ...
}
April 08, 2012 Re: a pretty exciting result for parallel D lang rmd following defrag by name
Posted in reply to Somedude
On 09/04/2012 00:15, Somedude wrote:
> On 08/04/2012 18:14, Jay Norwood wrote:
>> On Sunday, 8 April 2012 at 09:21:43 UTC, Somedude wrote:
>>> Hi,
>>>
>>> You seem to have done a pretty good job with your parallel unzip. Have
>>> you tried a parallel zip as well ?
>>> Do you think you could include this in std.zip when you're done ?
>>
>> I'm going to do a parallel zip as well. There is already parallel zip utility available with 7zip, so I haven't looked closely at D's std.zip.
>>
>> These parallel implementations all bring in std.parallelism as a dependency, and I don't know if that is acceptable. I'm just putting them in my github for now, along with examples.
>>
>>
> Well, you can always do something like this:
>
> version (parallel)
> {
>     import std.parallelism;
>     // multithreaded
>     ...
> }
> else
> {
>     // single thread
>     ...
> }
Or rather:
// single thread zip
...
version (parallel)
{
    import std.parallelism;
    // multithreaded
    ...
}
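A minimal runnable sketch of this "Or rather" layout, assuming zip-style work where each buffer is compressed independently with std.zlib. The function names `compressAll` and `compressAllParallel` are hypothetical and not part of std.zip; the point is only that the single-threaded function always exists, while std.parallelism is imported only when the `parallel` version identifier is set.

```d
import std.zlib : compress;

/// Single-threaded path: always compiled, no std.parallelism dependency.
ubyte[][] compressAll(const(ubyte)[][] buffers)
{
    auto result = new ubyte[][](buffers.length);
    foreach (i, buf; buffers)
        result[i] = compress(buf);
    return result;
}

version (parallel)
{
    // Only compiled with -version=parallel, so only then is
    // std.parallelism pulled in as a dependency.
    import std.parallelism : parallel;

    ubyte[][] compressAllParallel(const(ubyte)[][] buffers)
    {
        auto result = new ubyte[][](buffers.length);
        // Each iteration writes only its own slot, so no locking is needed.
        foreach (i, ref buf; parallel(buffers))
            result[i] = compress(buf);
        return result;
    }
}
```

Building with `dmd -version=parallel` adds `compressAllParallel` alongside the sequential function; a plain build never imports std.parallelism.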
April 09, 2012 Re: a pretty exciting result for parallel D lang rmd following defrag by name
Posted in reply to Somedude
On Sunday, 8 April 2012 at 22:17:43 UTC, Somedude wrote:
>>>
>> Well, you can always do something like this:
>>
>> version (parallel)
>> {
>>     import std.parallelism;
>>     // multithreaded
>>     ...
>> }
>> else
>> {
>>     // single thread
>>     ...
>> }
>
> Or rather:
>
> // single thread zip
> ...
> version (parallel)
> {
>     import std.parallelism;
>     // multithreaded
>     ...
> }
ok, I'll look at doing that. Thanks.
April 21, 2012 Re: a pretty exciting result for parallel D lang rmd following defrag by name
Posted in reply to Jay Norwood
I was able to achieve efficiency similar to the defrag result on NTFS by using a modified version of std.file.write that opens files with FILE_FLAG_WRITE_THROUGH. The NTFS rmdir of the 2GB layout written this way takes 6 sec, vs. 161 sec when removing the layout produced by the regular unzip. I posted the measurements in D.learn, as well as the modified code.
http://forum.dlang.org/thread/gmkocaqzmlmfbuozhrsj@forum.dlang.org
This has a big effect on processing files in a folder on a hard drive, where the operations are dominated by seek times.
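The actual modified code is in the linked D.learn thread. As a rough idea of the Windows side, here is a sketch of a `writeThrough` function (a hypothetical name) that creates the file with FILE_FLAG_WRITE_THROUGH through CreateFileW from druntime's Windows bindings. The POSIX branch, which uses O_SYNC, is an assumed loose analogue added only so the sketch also builds off Windows; it did not come up in this thread.

```d
import std.exception : enforce;

version (Windows)
{
    import core.sys.windows.windows;
    import std.utf : toUTF16z;

    /// Like std.file.write, but with FILE_FLAG_WRITE_THROUGH set, so writes
    /// go through the cache to the disk instead of being deferred.
    void writeThrough(string name, const(void)[] buffer)
    {
        HANDLE h = CreateFileW(name.toUTF16z, GENERIC_WRITE, 0, null,
            CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH, null);
        enforce(h != INVALID_HANDLE_VALUE, "CreateFileW failed: " ~ name);
        scope (exit) CloseHandle(h);

        auto p = cast(const(ubyte)*) buffer.ptr;
        size_t left = buffer.length;
        while (left > 0)
        {
            // WriteFile takes a DWORD length, so write at most 1 GiB per call.
            DWORD chunk = cast(DWORD)(left < (1u << 30) ? left : (1u << 30));
            DWORD written;
            enforce(WriteFile(h, p, chunk, &written, null) && written > 0,
                "WriteFile failed: " ~ name);
            p += written;
            left -= written;
        }
    }
}
else version (Posix)
{
    import core.sys.posix.fcntl : open, O_CREAT, O_SYNC, O_TRUNC, O_WRONLY;
    import core.sys.posix.unistd : close, posixWrite = write;
    import std.conv : octal;
    import std.string : toStringz;

    /// Assumed analogue: with O_SYNC, each write() waits until the data
    /// has been transferred to the device.
    void writeThrough(string name, const(void)[] buffer)
    {
        int fd = open(name.toStringz, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, octal!644);
        enforce(fd >= 0, "open failed: " ~ name);
        scope (exit) close(fd);

        auto p = cast(const(ubyte)*) buffer.ptr;
        size_t left = buffer.length;
        while (left > 0)
        {
            auto n = posixWrite(fd, p, left);
            enforce(n > 0, "write failed: " ~ name);
            p += n;
            left -= cast(size_t) n;
        }
    }
}
else
    static assert(0, "writeThrough: unsupported platform");
```

An unzip would call `writeThrough` in place of `std.file.write` for each extracted file; whether that reproduces the 6 sec rmdir depends on the measurements in the linked thread, not on this sketch.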
April 22, 2012 Re: a pretty exciting result for parallel D lang rmd following defrag by name
Posted in reply to Jay Norwood
On Sun, 22 Apr 2012 01:10:18 +0200, "Jay Norwood" <jayn@prismnet.com> wrote:
> I was able to achieve similar efficiency to the defrag result on ntfs by using a modified version of std.file.write that uses FILE_FLAG_WRITE_THROUGH. The ntfs rmdir of the 2GB layout takes 6 sec vs 161 sec when removing the unzipped layout. I posted the measurements in D.learn, as well as the modified code.
>
> http://forum.dlang.org/thread/gmkocaqzmlmfbuozhrsj@forum.dlang.org
>
> This has a big effect on processing files in a folder of a hard drive, where the operations are dominated by the seek times.
So when you did your first measurements, with 160 seconds for rmd, did you wait for the I/O to complete? Sorry if that's a stupid question :p but that's the obvious difference when using write-through, from what the documentation says.
-- 
Marco
Copyright © 1999-2021 by the D Language Foundation