November 30, 2012
Is there a way to walk files with std.file.dirEntries such that certain directories are skipped (i.e. how to avoid .git entirely/recursively)?

Thanks
Dan
November 30, 2012
On Friday, November 30, 2012 01:24:07 Dan wrote:
> Is there a way to walk files with std.file.dirEntries such that certain directories are skipped (i.e. how to avoid .git entirely/recursively)?

You can use std.algorithm.filter on its result. Then when it would iterate to something which doesn't match filter's predicate, it skips it.
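
A minimal sketch of that suggestion, assuming the goal is to hide anything that lives under a `.git` directory. The helper name `insideGit` and the root `"."` are placeholders made up for illustration, not part of Phobos. Note that this only hides entries from the result; it does not stop the walk from entering `.git`.

```d
import std.algorithm : canFind, filter;
import std.file : dirEntries, SpanMode;
import std.path : pathSplitter;
import std.stdio : writeln;

// Hypothetical predicate: true when some component of `path` is exactly ".git".
bool insideGit(string path)
{
    return pathSplitter(path).canFind(".git");
}

void main()
{
    // "." is a placeholder root; substitute the tree you want to walk.
    foreach (e; dirEntries(".", SpanMode.depth)
                    .filter!(entry => !insideGit(entry.name)))
        writeln(e.name);
}
```

Splitting the path into components keeps files such as `.gitignore` in the result, which a plain substring test would drop.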

- Jonathan M Davis
November 30, 2012
On Friday, 30 November 2012 at 01:13:13 UTC, Jonathan M Davis wrote:
> On Friday, November 30, 2012 01:24:07 Dan wrote:
>> Is there a way to walk files with std.file.dirEntries such that
>> certain directories are skipped (i.e. how to avoid .git
>> entirely/recursively)?
>
> You can use std.algorithm.filter on its result. Then when it would iterate to
> something which doesn't match filter's predicate, it skips it.
>
> - Jonathan M Davis

That will do the filtering correctly, but what I was hoping for was to prune at the directory level and not drill down into the files of an unwanted directory (e.g. .git). The problem I'm trying to overcome is that the walk still accesses lots of files and directories recursively, all of which I want to skip. Much like there is a *followSymlink* flag, it would be nice if a *followDirectory* predicate were accepted, or if there were some other way to achieve that.

---------------

static bool desired(string m) {
  bool unwanted = match(m, _uninterestRe)? true : false;
  writeln("Is unwanted ", m, " ", unwanted);
  return !unwanted;
}
static Regex!(char) _uninterestRe = regex(`\.git\b`);
foreach (entry; filter!(desired)(dirEntries(root, SpanMode.depth))) {
...
}
November 30, 2012
On Friday, November 30, 2012 02:57:20 Dan wrote:
> On Friday, 30 November 2012 at 01:13:13 UTC, Jonathan M Davis wrote:
> > On Friday, November 30, 2012 01:24:07 Dan wrote:
> >> Is there a way to walk files with std.file.dirEntries such that certain directories are skipped (i.e. how to avoid .git entirely/recursively)?
> > 
> > You can use std.algorithm.filter on its result. Then when it would iterate to
> > something which doesn't match filter's predicate, it skips it.
> > 
> > - Jonathan M Davis
> 
> That will do the filtering correctly, but what I was hoping for was to prune at the directory level and not drill down into the files of an unwanted directory (e.g. .git). The problem I'm trying to overcome is that the walk still accesses lots of files and directories recursively, all of which I want to skip. Much like there is a *followSymlink* flag, it would be nice if a *followDirectory* predicate were accepted, or if there were some other way to achieve that.
> 
> ---------------
> 
> static bool desired(string m) {
> bool unwanted = match(m, _uninterestRe)? true : false;
> writeln("Is unwanted ", m, " ", unwanted);
> return !unwanted;
> }
> static Regex!(char) _uninterestRe = regex(`\.git\b`);
> foreach (entry; filter!(desired)(dirEntries(root, SpanMode.depth))) {
> ...
> }

You can use the glob matching overload then:

auto dirEntries(string path, string pattern, SpanMode mode,
 bool followSymlink = true)

I don't really know how to use it though, so you'll have to read the docs and figure it out.
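
For what it's worth, here is a minimal sketch of calling that overload. The helper name `matching`, the root `"."`, and the pattern `"*.d"` are placeholders chosen for illustration. As far as I understand the Phobos docs, the pattern is a glob that selects which entries are returned; it does not look like it keeps the walk from entering a directory, so it may not prune `.git` in the sense asked about.

```d
import std.file : dirEntries, SpanMode;
import std.stdio : writeln;

// Hypothetical helper: names of all entries under `root` that match `pattern`.
string[] matching(string root, string pattern)
{
    string[] names;
    foreach (e; dirEntries(root, pattern, SpanMode.depth))
        names ~= e.name;
    return names;
}

void main()
{
    // "." and "*.d" are placeholders; this lists D source files.
    foreach (name; matching(".", "*.d"))
        writeln(name);
}
```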

- Jonathan M Davis
November 30, 2012
On Friday, 30 November 2012 at 01:57:21 UTC, Dan wrote:
> That will do the filtering correctly, but what I was hoping for was to prune at the directory level and not drill down into the files of an unwanted directory (e.g. .git). The problem I'm trying to overcome is that the walk still accesses lots of files and directories recursively, all of which I want to skip. Much like there is a *followSymlink* flag, it would be nice if a *followDirectory* predicate were accepted, or if there were some other way to achieve that.

What about the following?

import std.algorithm, std.array, std.regex;
import std.stdio, std.file;
void main()
{
  auto exclude = regex(r"\.git", "g");
  dirEntries("/path/GIT", SpanMode.breadth)
    .filter!(a => match(a.name, exclude).empty)
    .writeln();
}

I think if you go breadth first, you can filter out the unwanted directories before it delves into them.

November 30, 2012
On Friday, 30 November 2012 at 06:29:01 UTC, Joshua Niehus wrote:
> I think if you go breadth first, you can filter out the unwanted directories before it delves into them

Oh wait... it probably still looks through all those dirs.
What about this?

import std.algorithm, std.regex, std.stdio, std.file;
import std.parallelism;
DirEntry[] prune(string path, ref DirEntry[] files)
{
  auto exclude = regex(r"\.git|\.DS_Store", "g");
  foreach(_path; taskPool.parallel(dirEntries(path, SpanMode.shallow)
    .filter!(a => match(a.name, exclude).empty)))
  {
    files ~= _path;
    if (isDir(_path.name)) { prune(_path.name, files); }
  }
  return files;
}

void main()
{
  DirEntry[] files;
  prune("/path", files);
  foreach(file;files) { writeln(file.name); }
}

November 30, 2012
On Friday, 30 November 2012 at 07:29:59 UTC, Joshua Niehus wrote:
> On Friday, 30 November 2012 at 06:29:01 UTC, Joshua Niehus wrote:
>> I think if you go breadth first, you can filter out the unwanted directories before it delves into them
>
Good idea, thanks. I could not get the original to compile as is, but the concept is just what was needed. I got an error on line 8:
Error: not a property dirEntries(path, cast(SpanMode)0, true).filter!(__lambda2)
I'm using a quite recent version of dmd and phobos.

But I pulled the lambda out into a function and it works great. I assume the parallel is for performance, but it actually runs significantly slower than the serial version on my test case. No work is being done other than building the list of files, so that is probably normal. For my case the breakdown is:

No Pruning: 11 sec
Pruning Parallel: 4.78 sec
Pruning Serial: 0.377 sec

Thanks
Dan

---------------------
import std.algorithm, std.regex, std.stdio, std.file;
import std.parallelism;

bool interested(DirEntry path) {
  static auto exclude = regex(r"\.git|\.DS_Store", "g");
  return match(path.name, exclude).empty;
}

DirEntry[] prune(string path, ref DirEntry[] files)
{
  static if(0) {
    foreach(_path; taskPool.parallel(filter!interested(dirEntries(path, SpanMode.shallow))))  {
      files ~= _path;
      if (isDir(_path.name)) { prune(_path.name, files); }
    }
  } else {
    foreach(_path; filter!(interested)(dirEntries(path, SpanMode.shallow)))  {
      files ~= _path;
      if (isDir(_path.name)) { prune(_path.name, files); }
    }
  }
  return files;
}

void main()
{
  DirEntry[] files;
  prune("/path", files);
  //  foreach(file;files) { writeln(file.name); }
}
November 30, 2012
On Friday, 30 November 2012 at 12:02:51 UTC, Dan wrote:
> Good idea, thanks. I could not get original to compile as is - but the concept is just what was needed. I got an error on line 8:
> Error: not a property dirEntries(path, cast(SpanMode)0, true).filter!(__lambda2)
> I'm using a quite recent version of dmd and phobos.

Hmm, strange... I'm using 2.060 (on a Mac).

> But, I pulled the lamda out into a function and it works great. I assume the parallel is for performance, and it actually runs significantly slower than without on my test case - but no work is being done other than build the list of files, so that is probably normal. For my case the breakdown is:
>
> No Pruning: 11 sec
> Pruning Parallel: 4.78 sec
> Pruning Serial: 0.377 sec

That's cool.
Yeah, I thought parallel would make a big difference (in the positive sense) for large directories, but I guess if we are recursively spawning parallel tasks, the overhead starts accumulating, resulting in worse performance (my best guess anyway).


November 30, 2012
On 11/30/2012 11:29 AM, Joshua Niehus wrote:
> On Friday, 30 November 2012 at 06:29:01 UTC, Joshua Niehus wrote:
>> I think if you go breadth first, you can filter out the unwanted
>> directories before it delves into them
>
> oh wait... it probably still looks through all those dir's.
> What about this?
>
> import std.algorithm, std.regex, std.stdio, std.file;
> import std.parallelism;
> DirEntry[] prune(string path, ref DirEntry[] files)
> {
>    auto exclude = regex(r"\.git|\.DS_Store", "g");
>    foreach(_path; taskPool.parallel(dirEntries(path, SpanMode.shallow)
>      .filter!(a => match(a.name, exclude).empty)))
>    {
>      files ~= _path;

I do think that there is a race on the 'files' variable. parallel doesn't auto-magically lock anything.

>      if (isDir(_path.name)) { prune(_path.name, files); }

And yes, I have a bad feeling that spawning a few threads per directory recursively is a bad idea.


>    }
> return files;
> }
>
> void main()
> {
>    DirEntry[] files;
>    prune("/path", files);
>    foreach(file;files) { writeln(file.name); }
> }
>

Otherwise, I think there should be a better way to filter out directories inside dirEntries itself, because here you are basically doing what dirEntries' depth search does (but with recursion instead of a queue).

Maybe file it as an enhancement?
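
Until something like that exists, a rough sketch of the same pruning walk without recursion or threads could look like the following. The helper name `walkPruned` and the `skip` delegate are made up for illustration; they are not part of Phobos. It keeps an explicit list of pending directories and only ever lists a directory that passed the predicate, so there is no shared state to race on.

```d
import std.file : dirEntries, DirEntry, SpanMode;
import std.path : baseName;
import std.stdio : writeln;

// Hypothetical helper: returns every entry under `root`, never listing
// a directory whose base name makes `skip` return true.
string[] walkPruned(string root, bool delegate(string name) skip)
{
    string[] found;
    string[] pending = [root];      // directories still to be listed

    while (pending.length)
    {
        auto dir = pending[$ - 1];
        pending = pending[0 .. $ - 1];

        foreach (DirEntry e; dirEntries(dir, SpanMode.shallow))
        {
            if (skip(baseName(e.name)))
                continue;           // pruned: this entry is never opened
            found ~= e.name;
            // Symlinked directories are recorded but not followed,
            // which avoids looping on link cycles.
            if (e.isDir && !e.isSymlink)
                pending ~= e.name;
        }
    }
    return found;
}

void main()
{
    // "." is a placeholder root.
    foreach (name; walkPruned(".", n => n == ".git" || n == ".DS_Store"))
        writeln(name);
}
```

Comparing base names exactly, rather than matching a regex against the whole path, also avoids skipping unrelated entries whose path merely contains `.git`.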


-- 
Dmitry Olshansky
November 30, 2012
On Friday, November 30, 2012 13:02:50 Dan wrote:
> On Friday, 30 November 2012 at 07:29:59 UTC, Joshua Niehus wrote:
> > On Friday, 30 November 2012 at 06:29:01 UTC, Joshua Niehus wrote:
> >> I think if you go breadth first, you can filter out the unwanted directories before it delves into them
> 
> Good idea, thanks. I could not get original to compile as is -
> but the concept is just what was needed. I got an error on line 8:
> Error: not a property dirEntries(path, cast(SpanMode)0,
> true).filter!(__lambda2)
> I'm using a quite recent version of dmd and phobos.

If you're compiling with -property, filter must have the parens for the function call as it's a function, not a property. The !() is for the template arguments and is separate from the parens for the function call. That means that if you're compiling with -property and using UFCS, then you end up with range.filter!(pred)(), whereas you have range.filter!(pred).
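
A small sketch of the difference, using a made-up predicate `isEven` and helper `evensBelow` rather than the code from this thread. The form with the explicit trailing parens should compile with or without -property:

```d
import std.algorithm : filter;
import std.array : array;
import std.range : iota;
import std.stdio : writeln;

bool isEven(int n) { return n % 2 == 0; }

// Hypothetical helper: the even numbers in [0, n).
int[] evensBelow(int n)
{
    // !(isEven) supplies the template argument;
    // the trailing () is the actual function call.
    return iota(0, n).filter!(isEven)().array();
}

void main()
{
    writeln(evensBelow(10)); // prints [0, 2, 4, 6, 8]
}
```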

- Jonathan M Davis