February 14, 2012
parallel copy directory, faster than robocopy
Attached is the source for a small parallel app that copies a source folder to a destination. It creates the directory structure first, walking the tree in breadth order, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel. On my Core i7, this copied a 1.5GB folder with around 36K entries to a destination in about 11.5 secs (src and dest on the same SSD). This was about a second better than robocopy, which is the fastest alternative I could find. The regular Win7-64 copy takes 41 secs for the same folder.

I'd like to add wildcard processing for the sources, but haven't found a good example.
February 14, 2012
Re: parallel copy directory, faster than robocopy
OK, so I guess the Add File didn't work for some reason, so here's the source.



module main;

import std.stdio;
import std.file;
import std.path;
import std.datetime;
import std.parallelism;

int main(string[] argv)
{
	if (argv.length != 3){
		writeln ("need to specify src and dest dir");
		return 1; // non-zero exit code signals the usage error
	}

	// TODO expand this to handle wildcard

	string dest = argv[$-1];
	foreach(string dir; argv[1..$-1])
	{
		writeln("copying directory: "~ dir );
		auto st1 = Clock.currTime(); //Current time in local time.
		cpdir(dir,dest); 
		auto st2 = Clock.currTime(); //Current time in local time.
		auto dif = st2  - st1 ;
		auto ts= dif.toString();
		writeln("time:"~ts);
	}
	writeln("finished !");
	return 0;
}
void cpdir(in char[] pathname ,in char[] dest){
	DirEntry deSrc = dirEntry(pathname);
	string[] files;

	if (!exists(dest)){
		mkdir (dest); // makes dest root
	}
	DirEntry destDe = dirEntry(dest);
	if(!destDe.isDir()){        
		throw new FileException( destDe.name, " is not a directory"); 
	}
	string destName = destDe.name ~ '/';

	if(!deSrc.isDir()){
		copy(deSrc.name,dest); 
	}
	else    { 
		string srcRoot = deSrc.name;
		size_t srcLen = srcRoot.length; // .length is size_t; int fails to compile on 64-bit targets
		string destRoot = destName ~ baseName(deSrc.name);
		mkdir(destRoot);
		
		// make an array of the regular files only, also create the directory structure
		// Since it is SpanMode.breadth, can just use mkdir
		foreach(DirEntry e; dirEntries(deSrc.name, SpanMode.breadth, false)){
			if (attrIsDir(e.linkAttributes)){
				string destDir = destRoot ~ e.name[srcLen..$];
				mkdir(destDir);
			}
			else{
				files ~= e.name;
			}
		} 

		// parallel foreach for regular files
		foreach(fn ; taskPool.parallel(files)) {
			string dfn = destRoot ~ fn[srcLen..$];
			copy(fn,dfn);
		}
	}
}
February 14, 2012
Re: parallel copy directory, faster than robocopy
OK, I didn't test that first one very well. It worked for directory copies, but I didn't test non-directories. So here is the fixed version for non-directories, where it just copies the single file into the destination directory.

So it now does two cases:
copy regular_file destinationDirectory
copy folder destinationDirectory

What I'd like to add is wildcard support for something like 
copy folder/* destinationDirectory

I suppose also it could be enhanced to handle all the robocopy options, but I'm just trying out the copy speeds for now.


module main;

import std.stdio;
import std.file;
import std.path;
import std.datetime;
import std.parallelism;

int main(string[] argv)
{
	if (argv.length != 3){
		writeln ("need to specify src and dest dir");
		return 1; // non-zero exit code signals the usage error
	}

	// TODO expand this to handle wildcard

	string dest = argv[$-1];
	foreach(string dir; argv[1..$-1])
	{
		writeln("copying directory: "~ dir );
		auto st1 = Clock.currTime(); //Current time in local time.
		cpdir(dir,dest); 
		auto st2 = Clock.currTime(); //Current time in local time.
		auto dif = st2  - st1 ;
		auto ts= dif.toString();
		writeln("time:"~ts);
	}
	writeln("finished !");
	return 0;
}
void cpdir(in char[] pathname ,in char[] dest){
	DirEntry deSrc = dirEntry(pathname);
	string[] files;

	if (!exists(dest)){
		mkdir (dest); // makes dest root
	}
	DirEntry destDe = dirEntry(dest);
	if(!destDe.isDir()){        
		throw new FileException( destDe.name, " is not a directory"); 
	}
	string destName = destDe.name ~ '/';
	string destRoot = destName ~ baseName(deSrc.name);

	if(!deSrc.isDir()){
		copy(deSrc.name,destRoot); 
	}
	else    { 
		string srcRoot = deSrc.name;
		size_t srcLen = srcRoot.length; // .length is size_t; int fails to compile on 64-bit targets
		mkdir(destRoot);
		
		// make an array of the regular files only, also create the directory structure
		// Since it is SpanMode.breadth, can just use mkdir
		foreach(DirEntry e; dirEntries(deSrc.name, SpanMode.breadth, false)){
			if (attrIsDir(e.linkAttributes)){
				string destDir = destRoot ~ e.name[srcLen..$];
				mkdir(destDir);
			}
			else{
				files ~= e.name;
			}
		} 

		// parallel foreach for regular files
		foreach(fn ; taskPool.parallel(files)) {
			string dfn = destRoot ~ fn[srcLen..$];
			copy(fn,dfn);
		}
	}
}
February 14, 2012
Re: parallel copy directory, faster than robocopy
An improvement is to change this first mkdir to mkdirRecurse.

	if (!exists(dest)){
		mkdir (dest); // makes dest root
	}
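
For reference, a minimal sketch of what that change could look like. The helper name ensureDestDir is made up for illustration and is not part of the posted program; mkdirRecurse, exists, isDir and FileException are the existing std.file names.

```d
import std.file : exists, isDir, mkdirRecurse, FileException;

/// Hypothetical helper: makes sure `dest` exists as a directory.
void ensureDestDir(string dest)
{
    if (!exists(dest))
        mkdirRecurse(dest); // unlike mkdir, also creates any missing parent directories
    else if (!isDir(dest))
        throw new FileException(dest, "is not a directory");
}
```

With this, a destination such as out/a/b works even when out and out/a do not exist yet.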
February 14, 2012
Re: parallel copy directory, faster than robocopy
On 14/02/2012 14:29, Jay Norwood wrote:
> An  improvement  is to change this first mkdir to mkdirRecurse.
>
>    if (!exists(dest)){
>                  mkdir (dest); // makes dest root
>          }
>

If I could suggest something, it would be great to see this added to 
std.file, along with the multithreaded remove we talked about recently 
in another thread.
February 15, 2012
Re: parallel copy directory, faster than robocopy
On 2/13/2012 10:58 PM, Jay Norwood wrote:
> Attached is the source for a small parallel app that copies a source folder to a destination.  It creates the directory structure first using the breadth ordering, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel.  On my corei7, this copied a 1.5GB folder with around 36K entries to a destination in about 11.5 secs (src and dest on the same ssd drive).  This was about a second better than robocopy, which is the fastest alternative I could find.   The regular win7-64 copy takes 41 secs for the same folder.
>
> I'd like to add wildcard processing for the sources, but haven't found a good example.


more of an 'FYI/reminder':

At a minimum Robocopy does additional work to preserve the timestamps 
and attributes of the copies of the files (by default) so it can avoid 
redundant copies of files in the future.  This is undoubtedly creating 
some additional overhead.

It's probably also quite a bit worse with /SEC etc. to copy permissions.

On the plus side, you would have Windows scheduling the IO, which in 
theory would be able to minimize seeking to some degree, compared to 
robocopy's serial copying.
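
To make the timestamp point concrete, here is a rough sketch of how a copy could preserve times and then use them to skip files that already look up to date. This is only an illustration of the idea, not how Robocopy is implemented; the helper name copyIfChanged and the "same size and same modification time" rule are assumptions for the example. getSize, timeLastModified, getTimes and setTimes are existing std.file functions.

```d
import std.datetime : SysTime;
import std.file : copy, exists, getSize, getTimes, setTimes, timeLastModified;

/// Hypothetical helper: copies src to dst unless dst already looks up to date.
/// Returns true if a copy was actually made.
bool copyIfChanged(string src, string dst)
{
    // Assumed rule: same size and same modification time means "unchanged".
    if (exists(dst)
        && getSize(src) == getSize(dst)
        && timeLastModified(src) == timeLastModified(dst))
        return false;

    copy(src, dst);

    // Carry the source timestamps over so the next run can skip this file.
    SysTime accessTime, modificationTime;
    getTimes(src, accessTime, modificationTime);
    setTimes(dst, accessTime, modificationTime);
    return true;
}
```

The extra getTimes/setTimes calls per file are the kind of overhead described above, and they only pay off on repeated runs.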
February 15, 2012
Re: parallel copy directory, faster than robocopy
"Jay Norwood" <jayn@prismnet.com> wrote in message 
news:jhcplo$1jj8$1@digitalmars.com...
> Attached is the source for a small parallel app that copies a source 
> folder to a destination.  It creates the directory structure first using 
> the breadth ordering, then uses a parallel foreach loop with the taskPool 
> to copy all the regular files in parallel.  On my corei7, this copied a 
> 1.5GB folder with around 36K entries to a destination in about 11.5 secs 
> (src and dest on the same ssd drive).  This was about a second better than 
> robocopy, which is the fastest alternative I could find.   The regular 
> win7-64 copy takes 41 secs for the same folder.
>
> I'd like to add wildcard processing for the sources, but haven't found a 
> good example.

Nice!

Is it possible this could increase disk fragmentation though? Or do the 
filesystem drivers on Win/Lin/etc work in a way that mitigates that 
possibility?
February 15, 2012
Re: parallel copy directory, faster than robocopy
On Wednesday, 15 February 2012 at 00:11:32 UTC, Sean Cavanaugh 
wrote:
> more of an 'FYI/reminder':
>
> At a minimum Robocopy does additional work to preserve the 
> timestamps and attributes of the copies of the files (by 
> default) so it can avoid redundant copies of files in the 
> future.  This is undoubtedly creating some additional overhead.
>
> Its probably also quite a bit worse with /SEC etc to copy 
> permissions.
>
> On the plus side you would have windows scheduling the IO which 
> in theory would be able to minimize seeking to some degree, 
> compared to robocopy's serial copying.

Yeah, Robocopy has a lot of nice options.  Currently the D 
library has copy (srcpath, destpath), which goes directly to the 
OS copy.   If it had something like 
copy(DirectoryEntry,destpath,options), with the options being 
like the Robocopy options, that might be more efficient.

On the SSD, seeking is on the order of 0.2 ms vs. 16 ms on my 
7200 rpm Seagate hard drive.  I do think seeks on a hard drive 
will be a problem with all the small, individual file copies.  So 
is Robocopy bundling these up in some way?
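
One knob that might help on a seek-bound drive is the number of threads doing the copies. A minimal sketch, assuming the source and destination names have already been collected into two arrays: the helper name copyAll and its parameters are made up for the example, while TaskPool, parallel and finish are the existing std.parallelism API.

```d
import std.file : copy;
import std.parallelism : TaskPool;

/// Hypothetical helper: copies srcs[i] to dsts[i] with a private pool of
/// `workers` worker threads instead of the global taskPool.
void copyAll(string[] srcs, string[] dsts, size_t workers)
{
    assert(srcs.length == dsts.length);

    auto pool = new TaskPool(workers);
    scope (exit) pool.finish(true); // wait for the workers to shut down

    // Work unit size 1: each file is handed out individually.
    foreach (i, src; pool.parallel(srcs, 1))
        copy(src, dsts[i]);
}
```

As far as I understand std.parallelism, the calling thread also takes part in the loop, so workers = 0 should amount to a plain serial copy. That would allow timing a few settings on the hard drive against the SSD.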

I did find a nice solution in std.file for the argv expansion, 
btw, and posted an example on D.learn.  It uses the dirEntries 
overload that takes an extra pattern parameter; the pattern 
matching itself comes from std.path.
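
Roughly, the idea looks like the sketch below. This is not the exact code from the D.learn post; the helper name expandArg is made up here, and it only expands * and ? in the last path component, within a single directory level.

```d
import std.algorithm.searching : canFind;
import std.file : dirEntries, SpanMode;
import std.path : baseName, dirName;

/// Hypothetical helper: expands a wildcard in the basename of one argument.
/// Arguments without * or ? are returned unchanged.
string[] expandArg(string arg)
{
    string pattern = baseName(arg);
    if (!pattern.canFind('*') && !pattern.canFind('?'))
        return [arg];

    string[] matches;
    // The pattern overload of dirEntries filters entries by basename,
    // using the glob rules from std.path.
    foreach (entry; dirEntries(dirName(arg), pattern, SpanMode.shallow))
        matches ~= entry.name;
    return matches;
}
```

Each element of argv[1..$-1] could be passed through this before calling cpdir. The order of the returned names is whatever the OS directory listing gives.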
March 04, 2012
Re: parallel copy directory, faster than robocopy
I placed the two parallel file operations, rmdir and copy, on 
GitHub in

https://github.com/jnorwood/file_parallel

These combine the std.parallelism operations with the std.file 
operations to speed up the processing on Windows.
-----------
I also put a useful function that does argv pathname wildcard 
expansion in

https://github.com/jnorwood/file_utils

This makes use of one of the existing dirEntries calls that has 
the pattern matching parameter, which enables simple * and ? 
expansions in Windows args.  I'm only allowing expansions in the 
basename, and only expanding in one level of the directory.

There are example Windows command-line utilities that use each of 
the functions in file_parallel/examples.

I've only tested these on Win7, 64 bit.
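
For anyone who just wants the idea of the parallel remove without opening the repository, here is a minimal sketch. It is not the code from file_parallel; the function name parallelRemoveTree is made up for this post, and it simply mirrors the copy program in reverse using existing std.file and std.parallelism calls.

```d
import std.file : attrIsDir, DirEntry, dirEntries, remove, rmdir, SpanMode;
import std.parallelism : taskPool;

/// Hypothetical helper: deletes the tree under `root`, removing the
/// regular files in parallel and the directories afterwards.
void parallelRemoveTree(string root)
{
    string[] files;
    string[] dirs;

    // SpanMode.depth lists a directory's contents before the directory
    // itself, so `dirs` ends up ordered children-first.
    foreach (DirEntry e; dirEntries(root, SpanMode.depth, false))
    {
        if (attrIsDir(e.linkAttributes))
            dirs ~= e.name;
        else
            files ~= e.name;
    }

    // Files are independent of each other, so remove them in parallel.
    foreach (fn; taskPool.parallel(files))
        remove(fn);

    // Directories must be empty before rmdir, so go serially, deepest first.
    foreach (d; dirs)
        rmdir(d);
    rmdir(root);
}
```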
March 05, 2012
Re: parallel copy directory, faster than robocopy
On 3/4/12 2:53 PM, Jay Norwood wrote:
> I placed the two parallel file operations, rmdir and copy on github in
>
> https://github.com/jnorwood/file_parallel
>
> These combine the std.parallelism operations with the std.file
> operations to speed up the processing on Windows.
> -----------
> I also put a useful function that does argv pathname wildcard expansion in
>
> https://github.com/jnorwood/file_utils
>
> This makes use of one of the existing dirEntries call that has the
> pattern matching parameter which enables simple * and ? expansions in
> windows args. I'm only allowing expansions in the basename, and only
> expanding in one level of the directory.
>
> There are example Windows commandline utilies that use each of the
> functions in file_parallel/examples.
>
> I've only testsd these on win7, 64 bit.

Sounds great! The next step, should you be interested, is to create a pull 
request for Phobos so we can integrate your code.

Andrei