Thread overview
path matching problem
Nov 27, 2012  Charles Hixson
Nov 27, 2012  Joshua Niehus
Nov 27, 2012  Charles Hixson
Nov 27, 2012  Joshua Niehus
Nov 27, 2012  jerro
Nov 28, 2012  Charles Hixson
Nov 28, 2012  jerro
Nov 28, 2012  Charles Hixson
Nov 28, 2012  Philippe Sigaud
Nov 28, 2012  jerro
November 27, 2012
Is there a better way to do this? I want to find files that match any of several extensions, but whose paths don't contain any of several other strings (some of them directory names):

 import std.algorithm : canFind;
 import std.file;

 ...

 string exts = "*.{txt,utf8,utf-8,TXT,UTF8,UTF-8}";
 string[] exclude = ["/template/", "biblio.txt", "categories.txt",
        "subjects.txt", "/toCDROM/"];

 int limit = 1;
 // Iterate a directory in depth
 foreach (string name; dirEntries(sDir, exts, SpanMode.depth))
 {  bool excl = false;
    foreach (string part; exclude)
    {  if (name.canFind(part))  // substring test; `in` does not work on strings
       {  excl = true;
          break;
       }
    }
    if (excl) continue;  // skip this file and move on to the next one
 etc.
November 27, 2012
On Tuesday, 27 November 2012 at 19:40:56 UTC, Charles Hixson wrote:
> Is there a better way to do this?  (I want to find files that match any of some extensions and don't match any of several other strings, or are not in some directories.):

Maybe this?

import std.algorithm, std.array, std.regex;
import std.stdio, std.file;
void main()
{
    enum string[] exts  =  [`".txt"`, `".utf8"`, `".utf-8"`, `".TXT"`, `".UTF8"`, `".UTF-8"`];
    enum string exclude = `r"/template/|biblio\.txt|categories\.txt|subjects\.txt|/toCDROM/"`;

    auto x = dirEntries("/path", SpanMode.depth)
        .filter!(`endsWith(a.name,` ~ exts.join(",") ~ `)`)
        .filter!(`std.regex.match(a.name,` ~ exclude ~ `).empty`);

    writeln(x);
}
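
For anyone trying this on a newer compiler, here is a minimal sketch of the same two-stage filter written with lambdas instead of string predicates (string predicates may not be able to see std.regex on newer Phobos releases). The helper name wanted and the sample paths are made up for illustration, and it filters plain strings instead of calling dirEntries so that it runs anywhere:

```d
import std.algorithm : endsWith, filter;
import std.array : array;
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: keeps names with a wanted extension that do not
// match the exclusion regex.
string[] wanted(string[] names)
{
    auto exclude = regex(r"/template/|biblio\.txt|categories\.txt|subjects\.txt|/toCDROM/");
    return names
        // endsWith with several needles returns 0 when none of them matches
        .filter!(n => n.endsWith(".txt", ".utf8", ".utf-8", ".TXT", ".UTF8", ".UTF-8") != 0)
        // keep only names in which the exclusion regex finds nothing
        .filter!(n => n.matchFirst(exclude).empty)
        .array;
}

void main()
{
    // Sample paths, made up for illustration.
    auto names = ["/d/a.txt", "/d/template/b.txt", "/d/biblio.txt",
                  "/d/c.png", "/d/e.UTF8"];
    writeln(wanted(names));
}
```

With real directory entries the same two filters should be usable on dirEntries(...) directly, with a.name in place of n.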
November 27, 2012
On Tuesday, 27 November 2012 at 19:40:56 UTC, Charles Hixson wrote:
> Is there a better way to do this?  (I want to find files that match any of some extensions and don't match any of several other strings, or are not in some directories.):

You could replace the inner loop with something like:

bool excl = exclude.any!(part => name.canFind(part));

There may be an even easier way to do it; take a look at std.algorithm.
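
A small self-contained sketch of how that line could sit inside the loop. The helper name isExcluded and the sample paths are assumptions for illustration; the array of names stands in for whatever dirEntries would yield:

```d
import std.algorithm : any, canFind;
import std.stdio : writeln;

// Hypothetical helper wrapping the one-liner: true if name contains
// any of the strings in exclude.
bool isExcluded(string name, string[] exclude)
{
    return exclude.any!(part => name.canFind(part));
}

void main()
{
    string[] exclude = ["/template/", "biblio.txt", "categories.txt",
                        "subjects.txt", "/toCDROM/"];
    // Made-up sample paths standing in for dirEntries results.
    foreach (name; ["/docs/a.txt", "/docs/template/b.txt", "/docs/subjects.txt"])
    {
        if (isExcluded(name, exclude))
            continue; // skip excluded entries, keep iterating
        writeln(name);
    }
}
```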
November 27, 2012
On 11/27/2012 01:31 PM, Joshua Niehus wrote:
> maybe this:?

That's a good approach, except that I want to step through the matching paths rather than accumulate them in an array. Though, reading the filter documentation, it may be that it returns something like an iterator rather than an array.  If so, I could replace
writeln (x);
by
foreach (string name; x)
{
	...
}
and x wouldn't have to hold all the matching strings at the same time.
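
A minimal sketch of that idea, assuming filter behaves as its documentation describes (it returns a lazy range, and elements are tested as the loop advances). The helper name processAll and the sample paths are made up for illustration:

```d
import std.algorithm : canFind, filter;
import std.stdio : writeln;

// Hypothetical helper: walks the filtered range one element at a time
// and returns how many names were handled.
size_t processAll(string[] names)
{
    size_t handled;
    auto x = names.filter!(n => !n.canFind("/template/")); // a range object, not a string[]
    foreach (string name; x)
    {
        writeln(name); // stand-in for the real per-file work
        ++handled;
    }
    return handled;
}

void main()
{
    // Made-up sample paths.
    processAll(["/d/a.txt", "/d/template/b.txt", "/d/c.txt"]);
}
```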

But why the chained filters, rather than using the option provided by dirEntries for one of them?  Is it faster?  Just the way you usually do things? (Which I accept as a legitimate answer.  I can see that that approach would be more flexible.)
November 27, 2012
On Tuesday, 27 November 2012 at 23:43:43 UTC, Charles Hixson wrote:
> But why the chained filters, rather than using the option provided by dirEntries for one of them?  Is it faster?  Just the way you usually do things? (Which I accept as a legitimate answer.  I can see that that approach would be more flexible.)

Ignorance...
You're right, I didn't realize that dirEntries had that filter option; you should use that.  I doubt the double .filter would affect performance at all (it might even slow things down, for all I know :)

//update:
import std.algorithm, std.array, std.regex;
import std.stdio, std.file;
void main()
{
  string exts = "*.{txt,utf8,utf-8,TXT,UTF8,UTF-8}";
  enum string exclude =
    `r"/template/|biblio\.txt|categories\.txt|subjects\.txt|/toCDROM/"`;

  dirEntries("/path", exts, SpanMode.depth)
    .filter!(` std.regex.match(a.name,` ~ exclude ~ `).empty `)
    .writeln();
}
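
As far as I know, the pattern argument of dirEntries is applied with std.path.globMatch to each entry's base name only, which would explain why the directory exclusions still need their own filter. A minimal sketch of that matching step on its own; the helper name hasWantedExt and the sample paths are assumptions for illustration:

```d
import std.path : baseName, globMatch;
import std.stdio : writeln;

// Hypothetical helper: applies the glob from the post to the base name of a path.
bool hasWantedExt(string path)
{
    return globMatch(baseName(path), "*.{txt,utf8,utf-8,TXT,UTF8,UTF-8}");
}

void main()
{
    writeln(hasWantedExt("/docs/notes.txt"));      // extension is in the brace list
    writeln(hasWantedExt("/docs/notes.png"));      // extension is not in the list
    writeln(hasWantedExt("/docs/template/a.txt")); // the directory part is ignored by the glob
}
```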

November 28, 2012
On 11/27/2012 01:34 PM, jerro wrote:
> You could replace the inner loop with something like:
>
> bool excl = exclude.any!(part => name.canFind(part));
>
> There may be an even easier way to do it; take a look at std.algorithm.

std.algorithm seems to generally be running the match in the opposite direction, if I'm understanding it properly.  (Dealing with D templates is always confusing to me.)  OTOH, I couldn't find the string any method, so I'm not really sure what you're proposing, though it does look attractive.

Still, though your basic approach sounds good, the suggestion of Joshua Niehus would let me filter out the strings that didn't fit before entering the loop.  There's probably no real advantage to doing it that way, but it does seem more elegant.  (You were right, though.  That is in std.algorithm.)
November 28, 2012
>> You could replace the inner loop with something like:
>>
>> bool excl = exclude.any!(part => name.canFind(part));

> std.algorithm seems to generally be running the match in the opposite direction, if I'm understanding it properly.  (Dealing with D templates is always confusing to me.)  OTOH, I couldn't find the string any method, so I'm not really sure what you're proposing, though it does look attractive.

I don't understand what you mean by running the match in the opposite direction, but I'll explain how my line of code works. First of all, it is equivalent to:

any!(part => canFind(name, part))(exclude);

The feature that lets you write it the way I did in my previous post is called uniform function call syntax (often abbreviated to UFCS) and is described at http://www.drdobbs.com/cpp/uniform-function-call-syntax/232700394.

canFind(name, part) returns true if name contains part.

(part => canFind(name, part)) is a short syntax for (part){ return canFind(name, part); }

any!(condition)(range) returns true if condition is true for any element of range.

So the line of code in my previous post sets excl to true if name contains any of the strings in exclude. If you know all the strings you want to exclude in advance, it is easier to do that with a regex like Joshua did.
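
To make the equivalence concrete, here is a minimal sketch that evaluates the three spellings side by side. The helper name threeForms and the sample inputs are made up for illustration:

```d
import std.algorithm : any, canFind;
import std.stdio : writeln;

// Hypothetical helper returning the result of each spelling, in order:
// UFCS form, plain call form, long function-literal form.
bool[3] threeForms(string name, string[] exclude)
{
    bool ufcs  = exclude.any!(part => name.canFind(part));
    bool plain = any!(part => canFind(name, part))(exclude);
    bool block = any!((part) { return canFind(name, part); })(exclude);
    return [ufcs, plain, block];
}

void main()
{
    // Shortened sample exclusion list and made-up paths.
    writeln(threeForms("/docs/template/b.txt", ["/template/", "/toCDROM/"]));
    writeln(threeForms("/docs/a.txt", ["/template/", "/toCDROM/"]));
}
```

All three entries should agree for any input, since they differ only in syntax.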

If you want to learn about D templates, try this tutorial:

https://github.com/PhilippeSigaud/D-templates-tutorial/blob/master/dtemplates.pdf?raw=true

> Still, though your basic approach sounds good, the suggestion of Joshua Niehus would let me filter out the strings that didn't fit before entering the loop.  There's probably no real advantage to doing it that way, but it does seem more elegant.

I agree, it is more elegant.

November 28, 2012
On 11/27/2012 06:45 PM, jerro wrote:
> If you want to learn about D templates, try this tutorial:
>
> https://github.com/PhilippeSigaud/D-templates-tutorial/blob/master/dtemplates.pdf?raw=true
Thanks for the tutorial link, I'll give it a try. (Whee!  A 182 page tutorial!)  Those things, though, don't seem to stick in my mind.  I learned programming in FORTRAN IV, and I don't seem to be able to force any of templates, Scheme, or Haskell into my way of thinking about programming.  (Interestingly, classes and structured programming fit without problems.)

The link to the Walter article in Dr. Dobb's is interesting.  I intend to read it first.

OTOH, I still don't know where "any" is documented.  It's clearly some sort of template instantiation, but it doesn't seem to be defined in either std.string or std.object (or anywhere else I've thought to check).  And it looks as if it would be something very useful to know.
November 28, 2012
> Thanks for the tutorial link, I'll give it a try. (Whee!  A 182 page
> tutorial!)


Well, it *started* as a tutorial. Then people sent me code :)


November 28, 2012
> OTOH, I still don't know where "any" is documented.  It's clearly some sort of template instantiation, but it doesn't seem to be defined in either std.string or std.object (or anywhere else I've thought to check).  And it looks as if it would be something very useful to know.

It's documented here:

http://dlang.org/phobos/std_algorithm.html#any