Thread overview
parallel copy directory, faster than robocopy
Feb 14, 2012
Jay Norwood
Feb 14, 2012
Jay Norwood
Feb 14, 2012
Jay Norwood
Feb 14, 2012
Jay Norwood
Feb 14, 2012
deadalnix
Feb 15, 2012
Sean Cavanaugh
Feb 15, 2012
Jay Norwood
Feb 15, 2012
Nick Sabalausky
Mar 04, 2012
Jay Norwood
Mar 05, 2012
Jay Norwood
Feb 06, 2018
Danny
Mar 05, 2012
dennis luehring
Mar 05, 2012
Jay Norwood
Mar 06, 2012
Jay Norwood
Dec 11, 2017
jackreacher
Feb 07, 2018
rumbu
February 14, 2012
Attached is the source for a small parallel app that copies a source folder to a destination. It first creates the directory structure, walking the source in breadth order so that each directory is created before its subdirectories, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel. On my Core i7, this copied a 1.5 GB folder with around 36K entries to a destination in about 11.5 seconds (source and destination on the same SSD). This was about a second faster than robocopy, which is the fastest alternative I could find. The regular 64-bit Windows 7 copy takes 41 seconds for the same folder.

I'd like to add wildcard processing for the sources, but haven't found a good example.
February 14, 2012
ok, so I guess the Add File didn't work for some reason, so here's the source.



module main;

import std.stdio;
import std.file;
import std.path;
import std.datetime;
import std.parallelism;

int main(string[] argv)
{
 	if (argv.length != 3){
 		writeln ("need to specify src and dest dir");
 		return 1; // nonzero exit status for a usage error
 	}

	// TODO expand this to handle wildcard

 	string dest = argv[$-1];
 	foreach(string dir; argv[1..$-1])
	{
		writeln("copying directory: "~ dir );
		auto st1 = Clock.currTime(); //Current time in local time.
		cpdir(dir,dest);
 		auto st2 = Clock.currTime(); //Current time in local time.
		auto dif = st2  - st1 ;
		auto ts= dif.toString();
		writeln("time:"~ts);
	}
	writeln("finished !");
	return 0;
}
void cpdir(in char[] pathname ,in char[] dest){
    DirEntry deSrc = dirEntry(pathname);
	string[] files;

	if (!exists(dest)){
		mkdir (dest); // makes dest root
	}
 	DirEntry destDe = dirEntry(dest);
	if(!destDe.isDir()){
		throw new FileException( destDe.name, " is not a directory");
	}
	string destName = destDe.name ~ '/';

	if(!deSrc.isDir()){
		copy(deSrc.name,dest);
	}
	else    {
		string srcRoot = deSrc.name;
		size_t srcLen = srcRoot.length; // .length is size_t; int does not compile on 64-bit targets
		string destRoot = destName ~ baseName(deSrc.name);
        mkdir(destRoot);

		// make an array of the regular files only, also create the directory structure
		// Since it is SpanMode.breadth, can just use mkdir
 		foreach(DirEntry e; dirEntries(deSrc.name, SpanMode.breadth, false)){
			if (attrIsDir(e.linkAttributes)){
				string destDir = destRoot ~ e.name[srcLen..$];
				mkdir(destDir);
			}
			else{
				files ~= e.name;
			}
 		}

		// parallel foreach for regular files
		foreach(fn ; taskPool.parallel(files)) {
			string dfn = destRoot ~ fn[srcLen..$];
			copy(fn,dfn);
		}
	}
}


February 14, 2012
OK, I didn't test that first one very well. It worked for directory copies, but I didn't test non-directories. Here is the fixed version for non-directories, where it just copies the single file into the destination directory.

So it now does two cases:
copy regular_file destinationDirectory
copy folder destinationDirectory

What I'd like to add is wildcard support for something like
 copy folder/* destinationDirectory

I suppose also it could be enhanced to handle all the robocopy options, but I'm just trying out the copy speeds for now.


module main;

import std.stdio;
import std.file;
import std.path;
import std.datetime;
import std.parallelism;

int main(string[] argv)
{
 	if (argv.length != 3){
 		writeln ("need to specify src and dest dir");
 		return 1; // nonzero exit status for a usage error
 	}

	// TODO expand this to handle wildcard

 	string dest = argv[$-1];
 	foreach(string dir; argv[1..$-1])
	{
		writeln("copying directory: "~ dir );
		auto st1 = Clock.currTime(); //Current time in local time.
		cpdir(dir,dest);
 		auto st2 = Clock.currTime(); //Current time in local time.
		auto dif = st2  - st1 ;
		auto ts= dif.toString();
		writeln("time:"~ts);
	}
	writeln("finished !");
	return 0;
}
void cpdir(in char[] pathname ,in char[] dest){
    DirEntry deSrc = dirEntry(pathname);
	string[] files;

	if (!exists(dest)){
		mkdir (dest); // makes dest root
	}
 	DirEntry destDe = dirEntry(dest);
	if(!destDe.isDir()){
		throw new FileException( destDe.name, " is not a directory");
	}
	string destName = destDe.name ~ '/';
	string destRoot = destName ~ baseName(deSrc.name);

	if(!deSrc.isDir()){
		copy(deSrc.name,destRoot);
	}
	else    {
		string srcRoot = deSrc.name;
		size_t srcLen = srcRoot.length; // .length is size_t; int does not compile on 64-bit targets
        mkdir(destRoot);

		// make an array of the regular files only, also create the directory structure
		// Since it is SpanMode.breadth, can just use mkdir
 		foreach(DirEntry e; dirEntries(deSrc.name, SpanMode.breadth, false)){
			if (attrIsDir(e.linkAttributes)){
				string destDir = destRoot ~ e.name[srcLen..$];
				mkdir(destDir);
			}
			else{
				files ~= e.name;
			}
 		}

		// parallel foreach for regular files
		foreach(fn ; taskPool.parallel(files)) {
			string dfn = destRoot ~ fn[srcLen..$];
			copy(fn,dfn);
		}
	}
}


February 14, 2012
An improvement is to change this first mkdir to mkdirRecurse, so that any missing parent directories of dest are created as well.

  if (!exists(dest)){
                mkdir (dest); // makes dest root
        }
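
A minimal sketch of that change, assuming the rest of cpdir stays as posted. The helper name ensureDestRoot and the temp directory name are made up for illustration. std.file.mkdirRecurse creates every missing parent and does nothing when the directory already exists, so the exists() check can go away.

```d
import std.file : exists, isDir, mkdirRecurse, rmdirRecurse, tempDir;
import std.path : buildPath;

/// Hypothetical helper (not in the original program): makes sure the
/// destination root exists, creating missing parent directories too.
void ensureDestRoot(string dest)
{
    // mkdirRecurse is a no-op when dest is already a directory,
    // so no separate exists() test is needed.
    mkdirRecurse(dest);
}

unittest
{
    auto root = buildPath(tempDir(), "pcopy_mkdir_demo"); // assumed scratch location
    if (root.exists)
        rmdirRecurse(root);
    scope(exit) rmdirRecurse(root);

    auto nested = buildPath(root, "a", "b", "c");
    ensureDestRoot(nested);           // creates root/a/b/c in one call
    assert(nested.exists && nested.isDir);
    ensureDestRoot(nested);           // calling it again changes nothing
    assert(nested.isDir);
}
```

The unittest block can be run with something like `dmd -unittest -main -run file.d`.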

February 14, 2012
On 14/02/2012 14:29, Jay Norwood wrote:
> An improvement is to change this first mkdir to mkdirRecurse.
>
>    if (!exists(dest)){
>                  mkdir (dest); // makes dest root
>          }
>

If I could suggest something, it would be great to see this added to std.file, as well as the multithreaded remove we talked about recently in another thread.
February 15, 2012
On 2/13/2012 10:58 PM, Jay Norwood wrote:
> Attached is the source for a small parallel app that copies a source folder to a destination. It creates the directory structure first using the breadth ordering, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel. On my Core i7, this copied a 1.5 GB folder with around 36K entries to a destination in about 11.5 seconds (source and destination on the same SSD). This was about a second faster than robocopy, which is the fastest alternative I could find. The regular 64-bit Windows 7 copy takes 41 seconds for the same folder.
>
> I'd like to add wildcard processing for the sources, but haven't found a good example.


More of an FYI/reminder:

At a minimum, Robocopy does additional work (by default) to preserve the timestamps and attributes of the copied files, so that it can avoid redundant copies of files in the future. This undoubtedly creates some additional overhead.

It's probably also quite a bit worse with /SEC etc. to copy permissions.

On the plus side, you would have Windows scheduling the IO, which in theory would be able to minimize seeking to some degree, compared to Robocopy's serial copying.
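
To illustrate the "avoid redundant copies" idea, here is a rough sketch in D. It is not Robocopy's actual algorithm, which the thread does not describe. The names needsCopy and copyIfChanged are hypothetical, and the size-plus-timestamp rule with a 2-second tolerance is an assumption chosen for the example.

```d
import core.time : seconds;
import std.file : copy, exists, getSize, timeLastModified;

/// Hypothetical check: true when dst is missing, differs in size,
/// or is clearly older than src.
bool needsCopy(string src, string dst)
{
    if (!dst.exists)
        return true;                  // nothing at the destination yet
    if (getSize(src) != getSize(dst))
        return true;                  // content length differs
    // Assumed 2-second tolerance for file systems with coarse timestamps.
    return timeLastModified(src) - timeLastModified(dst) > 2.seconds;
}

/// Copies src to dst only when needsCopy says the destination is stale.
void copyIfChanged(string src, string dst)
{
    if (needsCopy(src, dst))
        copy(src, dst);
}

unittest
{
    import std.file : mkdirRecurse, rmdirRecurse, tempDir, write;
    import std.path : buildPath;

    auto root = buildPath(tempDir(), "pcopy_skip_demo"); // assumed scratch location
    if (root.exists)
        rmdirRecurse(root);
    mkdirRecurse(root);
    scope(exit) rmdirRecurse(root);

    auto src = buildPath(root, "src.txt");
    auto dst = buildPath(root, "dst.txt");

    write(src, "hello");
    assert(needsCopy(src, dst));      // destination does not exist yet
    copyIfChanged(src, dst);
    assert(!needsCopy(src, dst));     // same size, not older than the source
    write(src, "hello, world");
    assert(needsCopy(src, dst));      // size changed, so copy again
}
```

A check like this costs extra metadata reads per file, which is the kind of overhead described above.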
February 15, 2012
"Jay Norwood" <jayn@prismnet.com> wrote in message news:jhcplo$1jj8$1@digitalmars.com...
> Attached is the source for a small parallel app that copies a source folder to a destination. It creates the directory structure first using the breadth ordering, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel. On my Core i7, this copied a 1.5 GB folder with around 36K entries to a destination in about 11.5 seconds (source and destination on the same SSD). This was about a second faster than robocopy, which is the fastest alternative I could find. The regular 64-bit Windows 7 copy takes 41 seconds for the same folder.
>
> I'd like to add wildcard processing for the sources, but haven't found a good example.

Nice!

Is it possible this could increase disk fragmentation though? Or do the filesystem drivers on Win/Lin/etc work in a way that mitigates that possibility?


February 15, 2012
On Wednesday, 15 February 2012 at 00:11:32 UTC, Sean Cavanaugh wrote:
> More of an FYI/reminder:
>
> At a minimum, Robocopy does additional work (by default) to preserve the timestamps and attributes of the copied files, so that it can avoid redundant copies of files in the future. This undoubtedly creates some additional overhead.
>
> It's probably also quite a bit worse with /SEC etc. to copy permissions.
>
> On the plus side, you would have Windows scheduling the IO, which in theory would be able to minimize seeking to some degree, compared to Robocopy's serial copying.

Yeah, Robocopy has a lot of nice options. Currently the D library has copy(srcpath, destpath), which goes directly to the OS copy. If it had something like copy(DirectoryEntry, destpath, options), with the options being like the Robocopy options, that might be more efficient.

On the SSD, seeks are on the order of 0.2 ms, versus 16 ms on my 7200 rpm Seagate hard drive. I do think seeks on a hard drive will be a problem with all the small, individual file copies. So is Robocopy bundling these up in some way?

I did find a nice solution in std.file for the argv expansion, btw, and posted an example on D.learn. It uses the version of dirEntries that takes an extra pattern parameter; the pattern matching itself comes from std.path.
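
A small sketch of that kind of expansion, not the example posted on D.learn. The function name expandArg is hypothetical. It relies on the std.file overload dirEntries(path, pattern, mode), which matches each entry's base name against a glob pattern, and it only expands wildcards in the last path component.

```d
import std.algorithm : canFind, sort;
import std.file : SpanMode, dirEntries;
import std.path : baseName, dirName;

/// Hypothetical helper: expands * and ? in the base name of arg.
/// Arguments without wildcards are returned unchanged.
string[] expandArg(string arg)
{
    auto pattern = baseName(arg);
    if (!pattern.canFind('*') && !pattern.canFind('?'))
        return [arg];

    auto dir = dirName(arg);          // "." when arg has no directory part
    string[] result;
    // Shallow span: only one directory level is searched.
    foreach (e; dirEntries(dir, pattern, SpanMode.shallow))
        result ~= e.name;
    sort(result);                     // directory order is unspecified, so sort
    return result;
}

unittest
{
    import std.file : exists, mkdirRecurse, rmdirRecurse, tempDir, write;
    import std.path : buildPath;

    auto root = buildPath(tempDir(), "pcopy_glob_demo"); // assumed scratch location
    if (root.exists)
        rmdirRecurse(root);
    mkdirRecurse(root);
    scope(exit) rmdirRecurse(root);

    foreach (name; ["a.txt", "b.txt", "c.log"])
        write(buildPath(root, name), "x");

    auto found = expandArg(buildPath(root, "*.txt"));
    assert(found.length == 2);
    assert(baseName(found[0]) == "a.txt");
    assert(baseName(found[1]) == "b.txt");
    assert(expandArg("plain.txt") == ["plain.txt"]);
}
```

In main, each element of argv[1..$-1] could be passed through such a function before calling cpdir.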




March 04, 2012
I placed the two parallel file operations, rmdir and copy, on GitHub at

https://github.com/jnorwood/file_parallel

These combine the std.parallelism operations with the std.file operations to speed up the processing on Windows.
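
For readers without access to the repository, here is a rough sketch of how a parallel directory removal could combine the two modules. It is not the code from file_parallel; the name rmdirParallel and the overall structure are assumptions. Files are removed in parallel first, then the now-empty directories are removed serially, children before parents.

```d
import std.file : DirEntry, SpanMode, dirEntries, remove, rmdir;
import std.parallelism : taskPool;

/// Hypothetical sketch: removes the directory tree rooted at root.
void rmdirParallel(string root)
{
    string[] files;
    string[] dirs;

    // SpanMode.depth yields a directory's contents before the directory
    // itself, so dirs ends up ordered children-first.
    foreach (DirEntry e; dirEntries(root, SpanMode.depth, false))
    {
        if (e.isDir && !e.isSymlink)
            dirs ~= e.name;
        else
            files ~= e.name;          // regular files and symlinks
    }

    // Remove the files in parallel; skip the loop when there is nothing to do.
    if (files.length)
    {
        foreach (fn; taskPool.parallel(files))
            remove(fn);
    }

    // Directories are empty now; remove them serially, deepest first.
    foreach (d; dirs)
        rmdir(d);
    rmdir(root);
}

unittest
{
    import std.file : exists, mkdirRecurse, rmdirRecurse, tempDir, write;
    import std.path : buildPath;

    auto root = buildPath(tempDir(), "pcopy_rmdir_demo"); // assumed scratch location
    if (root.exists)
        rmdirRecurse(root);
    mkdirRecurse(buildPath(root, "a", "b"));
    write(buildPath(root, "x.txt"), "x");
    write(buildPath(root, "a", "y.txt"), "y");
    write(buildPath(root, "a", "b", "z.txt"), "z");

    rmdirParallel(root);
    assert(!root.exists);
}
```

Symlink handling differs between platforms, so that branch in particular should be treated as untested on Windows.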
-----------
I also put a useful function that does argv pathname wildcard expansion in

https://github.com/jnorwood/file_utils

This makes use of the existing dirEntries overload that takes a pattern-matching parameter, which enables simple * and ? expansions in Windows args. I'm only allowing expansions in the basename, and only expanding one level of the directory.

There are example Windows command-line utilities that use each of the functions in file_parallel/examples.

I've only tested these on 64-bit Windows 7.


March 05, 2012
On 3/4/12 2:53 PM, Jay Norwood wrote:
> I placed the two parallel file operations, rmdir and copy, on GitHub at
>
> https://github.com/jnorwood/file_parallel
>
> These combine the std.parallelism operations with the std.file
> operations to speed up the processing on Windows.
> -----------
> I also put a useful function that does argv pathname wildcard expansion in
>
> https://github.com/jnorwood/file_utils
>
> This makes use of the existing dirEntries overload that takes a
> pattern-matching parameter, which enables simple * and ? expansions in
> Windows args. I'm only allowing expansions in the basename, and only
> expanding one level of the directory.
>
> There are example Windows command-line utilities that use each of the
> functions in file_parallel/examples.
>
> I've only tested these on 64-bit Windows 7.

Sounds great! Next step, should you be interested, is to create a pull request for phobos so we can integrate your code within.

Andrei