February 04, 2012
4x speedup of recursive rmdir in std.file
It would be good if the std.file operations used D's multi-threading
features, since you've done such a nice job of making them easy.
I hacked up your std.file recursive remove and got a 4x speed-up on a
Win7 system with a Core i7, using the examples from The D Programming
Language book.  Code is below, with a hard-coded directory I was using
for testing.  I'm just learning this, so I know you can do better ...

Delete time dropped from 1 minute 5 secs to less than 15 secs.
This was on an SSD drive.

module main;

import std.stdio;
import std.file;
import std.datetime;
import std.concurrency;

const int THREADS = 16;

int main(string[] argv)
{
    writeln("removing H:/pa10_120130/xx8");
    auto st1 = Clock.currTime(); // current time in local time
    rmdirRecurse2("H:/pa10_120130/xx8");
    auto st2 = Clock.currTime();
    auto dif = st2 - st1;
    auto ts = dif.toString();
    writeln("time:");
    writeln(ts);
    writeln("finished !");
    return 0;
}

void rmdirRecurse2(in char[] pathname)
{
    DirEntry de = dirEntry(pathname);
    rmdirRecurse2(de);
}

void rmdirRecurse2(ref DirEntry de)
{
    if (!de.isDir)
        throw new FileException(de.name, " is not a directory");

    if (de.isSymlink())
        remove(de.name);
    else
    {
        // spawn THREADS workers for regular files, plus one for directories
        Tid[THREADS] tid;
        int i = 0;
        for (; i < THREADS; i++)
            tid[i] = spawn(&fileRemover);
        Tid tidd = spawn(&dirRemover);

        // all children, recursively depth-first: regular files go
        // round-robin to the file removers, directories to the dir remover
        i = 0;
        foreach (DirEntry e; dirEntries(de.name, SpanMode.depth, false))
        {
            string nm = e.name;
            if (attrIsDir(e.linkAttributes))
                tidd.send(nm);
            else
            {
                tid[i].send(nm);
                i = (i + 1) % THREADS;
            }
        }

        // wait for the THREADS threads to complete their file removes
        // and acknowledge receipt of the tid
        for (i = 0; i < THREADS; i++)
        {
            tid[i].send(thisTid);
            receiveOnly!Tid();
        }
        tidd.send(thisTid);
        receiveOnly!Tid();

        // the dir itself
        rmdir(de.name);
    }
}

void fileRemover()
{
    for (bool running = true; running;)
    {
        receive(
            (string s) { remove(s); },                     // remove the files
            (Tid x) { x.send(thisTid); running = false; }  // the terminator
        );
    }
}

void dirRemover()
{
    // depth-first traversal delivers child directories before their
    // parents, so removing them in the order received works
    string[] dirs;
    for (bool running = true; running;)
    {
        receive(
            (string s) { dirs ~= s; },
            (Tid x) {
                foreach (string d; dirs)
                    rmdir(d);
                x.send(thisTid);
                running = false;
            }
        );
    }
}
February 05, 2012
Re: 4x speedup of recursive rmdir in std.file
"Jay Norwood" <jayn@prismnet.com> wrote in message 
news:jgkfdf$qb5$1@digitalmars.com...
> It would be good if the std.file operations used the D multi-
> thread features, since you've done such a nice job of making them
> easy.   I hacked up your std.file recursive remove and got a 4x
> speed-up on a win7 system with corei7 using the examples from the
> D programming language book.  Code is below with a hard-coded file
> I was using for test.  I'm just learning this, so I know you can
> do better ...
>
> Delete time dropped from 1minute 5 secs to less than 15 secs.
> This was on an ssd drive.
>

Interesting. How does it perform when just running on one core?
February 05, 2012
Re: 4x speedup of recursive rmdir in std.file
== Quote from Nick Sabalausky (a@a.a)'s article
> Interesting. How does it perform when just running on one core?

The library without the threads takes 1 min 5 secs for the 1.5GB
directory structure with about 32k files.  This is on a 510-series
Intel SSD.  The Win7 OS removes it in almost exactly the same time,
and you can see from the Task Manager that it is also being done on a
single core, using only a small percentage of the CPU.  In contrast,
all 8 threads in the Task Manager max out for a period when running
this multi-threaded remove.  The regular file deletes occur in
parallel.  A single thread removes the directory structure after
waiting for all the regular files to be deleted by the parallel
threads.  I attached a screen capture.

I tried last night to do a similar thing with the unzip processing
in std.zip, but the library code is written in such a way that each
parallel thread would need to build the whole zip archive directory
in order to process its elements.  I would hope to be able to solve
this problem and provide a similar 4x speedup over, for example,
7zip, which currently also runs on a single thread.  7zip takes about
50 seconds to unzip this file.

What is needed is probably a dumber archive-element processing call
that is passed an immutable archive-element structure read by the
main thread.  The parallel threads could then seek to that element's
position and process just their assigned element without loading the
whole file.

Also, the current design requires a memory buffer holding the whole
zip archive before it can create the archive directory.  There should
instead be some way of processing the file sequentially.
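
For illustration, a rough sketch of a parallel unzip on top of the
existing std.zip API might look like the code below.  It does not do
the per-element seeking described above (it still reads the whole
archive into memory, like the current library), the function name and
the work-unit size of 100 are arbitrary, and it assumes
ZipArchive.expand can be called concurrently on distinct members,
which I haven't verified.

module unzipp;

import std.algorithm;
import std.file;
import std.parallelism;
import std.path;
import std.zip;

void unzipParallel(string zipName, string outDir)
{
    // read the whole archive and parse its directory in the main thread
    auto archive = new ZipArchive(read(zipName));

    // create the output directory tree up front, single-threaded,
    // so the parallel writers never race on mkdir
    ArchiveMember[] files;
    foreach (name, member; archive.directory)
    {
        auto dest = buildPath(outDir, name);
        if (endsWith(name, "/"))
        {
            if (!exists(dest))
                mkdirRecurse(dest);     // directory entry
        }
        else
        {
            auto dir = dirName(dest);
            if (!exists(dir))
                mkdirRecurse(dir);
            files ~= member;            // regular file, extract below
        }
    }

    // decompress and write the regular members in parallel
    foreach (member; taskPool.parallel(files, 100))
        write(buildPath(outDir, member.name), archive.expand(member));
}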
February 05, 2012
Re: 4x speedup of recursive rmdir in std.file
"Jay Norwood" <jayn@prismnet.com> wrote in message 
news:jgm5vh$hbe$1@digitalmars.com...
> == Quote from Nick Sabalausky (a@a.a)'s article
> > Interesting. How does it perform when just running on one core?
>
> The library without the threads is 1 min 5 secs for the 1.5GB
> directory structure with about 32k files.  This is on an 510
> series intel ssd.  The win7 os removes it in almost exactly the
> same time, and you can see from their task manager it is also
> being done single core and only a small percentage of cpu.  In
> contrast, all 8 threads in the task manager max out for a period
> when running this multi-thread remove. The regular file deletes
> are occurring in parallel.  A single thread removes the directory
> structure after waiting for all the regular files to be deleted by
> the parallel threads.  I attached a screen capture.
>

What I'm wondering is this:

Suppose all the cores but one are already preoccupied with other stuff, or 
maybe you're even running on a single-core machine. Does the threading add 
enough overhead that it would actually go slower than the original 
single-threaded version?

If not, then this would indeed be a fantastic improvement to phobos. 
Otherwise, I wonder how such a situation could be mitigated?

> I tried last night to do a similar thing with the unzip processing
> in std.zip, but the library code is written in such a way that the
> parallel threads would need to create the whole zip archive
> directory in order to process the elements.   I would hope to be
> able to solve this problem and provide a similar 4x speedup to the
> unzip of, for example 7zip, which is currently also showing
> execution on a single thread.  7zip takes about 50 seconds to
> unzip this file.
>

That would be cool.
February 05, 2012
Re: 4x speedup of recursive rmdir in std.file
On 2/5/12 10:16 AM, Nick Sabalausky wrote:
> "Jay Norwood"<jayn@prismnet.com>  wrote in message
> news:jgm5vh$hbe$1@digitalmars.com...
>> == Quote from Nick Sabalausky (a@a.a)'s article
>>> Interesting. How does it perform when just running on one core?
>>
>> The library without the threads is 1 min 5 secs for the 1.5GB
>> directory structure with about 32k files.  This is on an 510
>> series intel ssd.  The win7 os removes it in almost exactly the
>> same time, and you can see from their task manager it is also
>> being done single core and only a small percentage of cpu.  In
>> contrast, all 8 threads in the task manager max out for a period
>> when running this multi-thread remove. The regular file deletes
>> are occurring in parallel.  A single thread removes the directory
>> structure after waiting for all the regular files to be deleted by
>> the parallel threads.  I attached a screen capture.
>>
>
> What I'm wondering is this:
>
> Suppose all the cores but one are already preoccupied with other stuff, or
> maybe you're even running on a single-core. Does the threading add enough
> overhead that it would actually go slower than the original single-threaded
> version?
>
> If not, then this would indeed be a fantastic improvement to phobos.
> Otherwise, I wonder how such a situation could be mitigated?

There's a variety of ways, but the simplest approach is to pass a 
parameter to the function telling how many threads it's allowed to 
spawn. Jay?

Andrei
February 05, 2012
Re: 4x speedup of recursive rmdir in std.file
== Quote from Andrei Alexandrescu
> > Suppose all the cores but one are already preoccupied with other stuff,
> > or maybe you're even running on a single-core. Does the threading add
> > enough overhead that it would actually go slower than the original
> > single-threaded version?
> >
> > If not, then this would indeed be a fantastic improvement to phobos.
> > Otherwise, I wonder how such a situation could be mitigated?
> There's a variety of ways, but the simplest approach is to pass a
> parameter to the function telling how many threads it's allowed to
> spawn. Jay?
> Andrei

I can tell you that there is a couple of seconds' improvement in
execution time running 16 threads vs. 8 on the i7 with the SSD drive,
so we aren't keeping all the cores busy with 8 threads.  I suppose
they are all blocked waiting on file-system operations for some
portion of the time, even with 8 threads.  I would guess that even on
a single core it would be an advantage to have multiple threads
available for the core to work on when one blocks waiting for fs
operations.

The previous results were on the SSD drive.  I tried again on a
Seagate SATA3 7200 rpm hard drive: it took 2 minutes 12 secs to
delete the same layout using the OS, and never used more than 10%
CPU.

The one-thread configuration of the D program similarly used less
than 10% CPU, but took only 1 minute 50 seconds to delete the same
layout.

Anything above a one-thread configuration began degrading the D
program's performance when using the hard drive.  I'll have to
scratch my head on this a while.  This is on an OptiPlex 790,
Win7-64, using the board's SATA for both the SSD and the HD.

The extract of the zip using 7zip takes 1:55 on the Seagate disk
drive, btw ... vs about 50 secs on the SSD.
February 05, 2012
Re: 4x speedup of recursive rmdir in std.file
On 2/5/12 3:04 PM, Jay Norwood wrote:
> I can tell you that there are a couple of seconds improvement in
> the execution time running 16 threads vs 8 on the i7 on the ssd
> drive, so we aren't keeping all the cores busy with 8 threads. I
> suppose they are all blocked waiting for file system operations
> for some portion of time even with 8 threads.  I would guess that
> even on a single core it would be an advantage to have multiple
> threads available for the core to work on when it blocks waiting
> for the fs operations.
[snip]

That's why I'm saying - let's leave the decision to the user. Take a 
uint parameter for the number of threads to be used, where 0 means leave 
it to phobos, and default to 0.

Andrei
February 07, 2012
Re: 4x speedup of recursive rmdir in std.file
Andrei Alexandrescu Wrote:
> That's why I'm saying - let's leave the decision to the user. Take a 
> uint parameter for the number of threads to be used, where 0 means leave 
> it to phobos, and default to 0.
> 
> Andrei
> 


OK, here is another version.  I was reading about the std.parallelism library, and I see I can do the parallel removes more cleanly.  Plus, the library figures out the number of cores and limits the taskPool size accordingly.  It is only slightly slower than the other code.  It looks like it chooses 7 threads in the taskPool when you have 8 cores.

So I remove the regular files in parallel, then pass things back to the original library code, which cleans up the directory-only tree non-parallel.  I also added code to get the directory names from argv.


module main;

import std.stdio;
import std.file;
import std.datetime;
import std.parallelism;

int main(string[] argv)
{
    if (argv.length < 2)
    {
        writeln("need to specify one or more directories to remove");
        return 0;
    }
    foreach (string dir; argv[1..$])
    {
        writeln("removing directory: " ~ dir);
        auto st1 = Clock.currTime(); // current time in local time
        rmdirRecurse2(dir);
        auto st2 = Clock.currTime();
        auto dif = st2 - st1;
        auto ts = dif.toString();
        writeln("time:" ~ ts);
    }
    writeln("finished !");
    return 0;
}

void rmdirRecurse2(in char[] pathname)
{
    DirEntry de = dirEntry(pathname);
    rmdirRecurse2(de);
}

void rmdirRecurse2(ref DirEntry de)
{
    string[] files;

    if (!de.isDir)
        throw new FileException(de.name, " is not a directory");
    if (de.isSymlink())
        remove(de.name);
    else
    {
        // make an array of the regular files only
        foreach (DirEntry e; dirEntries(de.name, SpanMode.depth, false))
        {
            if (!attrIsDir(e.linkAttributes))
                files ~= e.name;
        }

        // parallel foreach for regular files
        foreach (fn; taskPool.parallel(files, 1000))
        {
            remove(fn);
        }

        // let the original code remove the directories only
        rmdirRecurse(de);
    }
}
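
If we want the uint parameter Andrei suggested, a variant along these
lines could be added to the module above (just a sketch, not
benchmarked; the overload name is a placeholder, it omits the symlink
check for brevity, and 0 means leave the choice to Phobos, i.e. use
the default global taskPool):

// Hypothetical overload taking a thread count: 0 = use the global taskPool.
void rmdirRecurse2(in char[] pathname, uint nThreads)
{
    DirEntry de = dirEntry(pathname);
    string[] files;
    foreach (DirEntry e; dirEntries(de.name, SpanMode.depth, false))
        if (!attrIsDir(e.linkAttributes))
            files ~= e.name;

    // pick the pool: the shared default one, or a private one of the
    // requested size that is shut down when we're done
    TaskPool pool = (nThreads == 0) ? taskPool : new TaskPool(nThreads);
    scope(exit) { if (nThreads != 0) pool.finish(); }

    foreach (fn; pool.parallel(files, 1000))
        remove(fn);

    rmdirRecurse(de); // the remaining directory-only tree, non-parallel
}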
February 07, 2012
Re: 4x speedup of recursive rmdir in std.file
On 05/02/2012 18:38, Andrei Alexandrescu wrote:
> On 2/5/12 10:16 AM, Nick Sabalausky wrote:
>> "Jay Norwood"<jayn@prismnet.com> wrote in message
>> news:jgm5vh$hbe$1@digitalmars.com...
>>> == Quote from Nick Sabalausky (a@a.a)'s article
>>>> Interesting. How does it perform when just running on one core?
>>>
>>> The library without the threads is 1 min 5 secs for the 1.5GB
>>> directory structure with about 32k files. This is on an 510
>>> series intel ssd. The win7 os removes it in almost exactly the
>>> same time, and you can see from their task manager it is also
>>> being done single core and only a small percentage of cpu. In
>>> contrast, all 8 threads in the task manager max out for a period
>>> when running this multi-thread remove. The regular file deletes
>>> are occurring in parallel. A single thread removes the directory
>>> structure after waiting for all the regular files to be deleted by
>>> the parallel threads. I attached a screen capture.
>>>
>>
>> What I'm wondering is this:
>>
>> Suppose all the cores but one are already preoccupied with other
>> stuff, or
>> maybe you're even running on a single-core. Does the threading add enough
>> overhead that it would actually go slower than the original
>> single-threaded
>> version?
>>
>> If not, then this would indeed be a fantastic improvement to phobos.
>> Otherwise, I wonder how such a situation could be mitigated?
>
> There's a variety of ways, but the simplest approach is to pass a
> parameter to the function telling how many threads it's allowed to
> spawn. Jay?
>
> Andrei
>
>

That could be a solution, but it is a bad separation of concerns IMO, 
and it shouldn't be like that in phobos.

The parameter should be a thread pool or something similar. That allows 
you not only to choose the number of threads, but also to choose how the 
task is distributed over the threads, and even to mix those tasks with 
other tasks (by using the same thread pool in other places).

It basically separates the problem of deleting from the problem of 
spreading the task over multiple threads and choosing the policy for 
doing so.
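
Just to illustrate the shape of the interface I mean (the signature is 
hypothetical, not an existing Phobos API):

import std.parallelism;

// Hypothetical signature: the caller supplies the pool; null could mean
// "fall back to the default global taskPool".
void rmdirRecurse2(in char[] pathname, TaskPool pool = null)
{
    if (pool is null)
        pool = taskPool;
    // ... same removal logic as Jay's version above, but iterating with
    // pool.parallel(files, 1000) instead of the global taskPool ...
}

// The caller then owns the policy: how many workers, how work is
// distributed, and whether the same pool is shared with other tasks.
void cleanup(string[] dirs)
{
    auto pool = new TaskPool(4);   // caller picks the number of workers
    scope(exit) pool.finish();
    foreach (dir; dirs)
        rmdirRecurse2(dir, pool);  // same pool reused across calls
}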