Jump to page: 1 2
Thread overview
[Issue 8680] New: SpanMode.breadth incorrect
Sep 17, 2012
Jesse Phillips
[Issue 8680] SpanMode.breadth is incorrectly named and implementation fails in Linux
Sep 18, 2012
Jesse Phillips
Sep 18, 2012
entheh@cantab.net
Sep 18, 2012
Jesse Phillips
Sep 18, 2012
entheh@cantab.net
Sep 19, 2012
Jesse Phillips
Sep 19, 2012
Jesse Phillips
Sep 20, 2012
entheh@cantab.net
Sep 20, 2012
entheh@cantab.net
Sep 20, 2012
Jesse Phillips
Sep 22, 2012
entheh@cantab.net
September 17, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8680

           Summary: SpanMode.breadth incorrect
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: Jesse.K.Phillips+D@gmail.com


--- Comment #0 from Jesse Phillips <Jesse.K.Phillips+D@gmail.com> 2012-09-17 11:44:24 PDT ---
Using Linux 64bit with a reiserFS the following code does not span in a breadth manner. Testing on Windows 32bit it works as expected with on exception, all directories are not listed before spanning.

import std.file;
import std.stdio;

void main() {
    mkdir("a");
    mkdir("a/b");
    mkdir("a/c");
    mkdir("a/c/z");
    std.file.write("a/1.txt", "");
    std.file.write("a/2.txt", "");
    std.file.write("a/b/1.txt", "");
    std.file.write("a/b/2.txt", "");
    std.file.write("a/c/1.txt", "");
    std.file.write("a/c/z/1.txt", "");
    foreach(string file; dirEntries("a/", SpanMode.breadth))
        writeln(file);

    rmdirRecurse("a");
}

   // Expected Approximation
    // a/2.txt
    // a/1.txt
    // a/b
    // a/c
    // a/c/1.txt
    // a/c/z
    // a/c/z/1.txt
    // a/b/1.txt
    // a/b/2.txt
    //
    // Actual
    // a/c
    // a/c/z
    // a/c/z/1.txt
    // a/c/1.txt
    // a/b
    // a/b/2.txt
    // a/b/1.txt
    // a/2.txt
    // a/1.txt

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 18, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8680



--- Comment #1 from Jesse Phillips <Jesse.K.Phillips+D@gmail.com> 2012-09-17 18:43:39 PDT ---
A breadth first traversal is not a good algorithm for filesystem traversal. The described implementation is useful. As shown Linux ReiserFS does not traverse as described in the documentation.

SpanMode.breadth should be deprecated for not be a breadth traversal.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 18, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8680


entheh@cantab.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |entheh@cantab.net


--- Comment #2 from entheh@cantab.net 2012-09-18 07:03:41 PDT ---
I originally identified the bug. First I'd like to comment more on the terminology, and give my proposed fix.

The description in the documentation matches the implementation, and reflects what a user would find most useful. There is no implementation bug that I'm aware of (see below for more discussion of this).

The only issue is naming. The SpanMode enum constants 'depth' and 'breadth' actually describe 'postorder depth-first' and 'preorder depth-first' respectively. That means the word 'depth' carries insufficient information to identify it, and 'breadth' is just wrong. If you're unfamiliar with tree traversal algorithms and can't immediately see this, then please review the examples and terminology on http://en.wikipedia.org/wiki/Tree_traversal . (Pretty much everyone missed the point on the newsgroup - please do this if you're at all uncertain!)

So here's my proposed fix:

enum SpanMode {
  shallow,
  preorder,
  postorder;

  @deprecated alias preorder breadth;
  @deprecated alias postorder depth;
}

As for the behaviour on Linux - I don't think this is actually violating the spec. Probably what you're complaining about is that the directories 'c' and 'b' are coming before the files '2.txt' and '1.txt' which are siblings. However, the spec doesn't say anything about a directory's position relative to its siblings. It only talks about a directory's position relative to its children. See the difference:

For preorder depth-first:
  a/b
must come before
  a/b/1.txt

But
  a/b
has no ordering constraint with
  a/1.txt

If 'directories first' or 'directories last' is required, then that would be more of a feature request than a bug :) It could probably be generalised to child sorting - the user could supply a comparator which (if present) is used to sort the children in each directory.

Hope that helps :)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 18, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8680



--- Comment #3 from Jesse Phillips <Jesse.K.Phillips+D@gmail.com> 2012-09-18 08:04:40 PDT ---
(In reply to comment #2)
> As for the behaviour on Linux - I don't think this is actually violating the spec. Probably what you're complaining about is that the directories 'c' and 'b' are coming before the files '2.txt' and '1.txt' which are siblings. However, the spec doesn't say anything about a directory's position relative to its siblings. It only talks about a directory's position relative to its children. See the difference:

Looks like you are still correct. I'm still looking at the desire for useful traversal of a filesystem.

> If 'directories first' or 'directories last' is required, then that would be more of a feature request than a bug :) It could probably be generalised to child sorting - the user could supply a comparator which (if present) is used to sort the children in each directory.

I do not see sorting as a solution. I really do not care if the dir is before the file. What is useful is recieving all children before traversing children. I usually end up ignoring directories anyway. So I'm thinking your origional suggestion for childrenFirst and parentFirst is actuall what is useful and does not need too abide by actual traversal rules, only usefulness.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 18, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8680



--- Comment #4 from entheh@cantab.net 2012-09-18 09:51:31 PDT ---
(In reply to comment #3)
> Looks like you are still correct. I'm still looking at the desire for useful traversal of a filesystem.

OK, let's explore that:
> I do not see sorting as a solution. I really do not care if the dir is before the file. What is useful is recieving all children before traversing children.

Are you saying you want a real breadth-first search to be implemented? Fine, we can do that, if it's useful. Note though that it'll intersperse children, e.g.

a
a/1
a/2
a/1/A
a/1/B
a/2/A
a/2/B
a/1/A/...
a/1/B/...
a/2/A/...
a/2/B/...
a/1/A/.../...
etc. - so directories 1 and 2 become increasingly interleaved.

> I usually end up ignoring directories anyway. So I'm thinking your origional suggestion for childrenFirst and parentFirst is actuall what is useful and does not need too abide by actual traversal rules, only usefulness.

I intended those as alternative names instead of 'postorder' and 'preorder' (respectively) - I didn't intend to suggest any other behaviour.

What behaviour do you want? Do you want all of the first-level files and directories (as if starting a breadth-first), followed by all subsequent levels in depth-first order? See I would argue that that's very specific, not intuitive in the general case, and best implemented using two separate calls to the function with different arguments:

auto firstLevel = dirEntries("a", SpanMode.shallow);
foreach (file; firstLevel) {
  if (file.isDir()) {
    ... = dirEntries(file, SpanMode.preorder);
  }
}

Or do you want something else? I think this needs clarifying, since there's obviously a misunderstanding in our interpretation of "parentFirst", and "usefulness" is a goal, not a spec.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 19, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8680



--- Comment #5 from Jesse Phillips <Jesse.K.Phillips+D@gmail.com> 2012-09-18 20:59:48 PDT ---
(In reply to comment #4)
> Are you saying you want a real breadth-first search to be implemented? Fine, we can do that, if it's useful. Note though that it'll intersperse children, e.g.

No, but someone else did, which is fine. Probably not really needed as a standard traversal type though.

> I intended those as alternative names instead of 'postorder' and 'preorder' (respectively) - I didn't intend to suggest any other behaviour.

I know, but they are descriptive of the two most useful traversal types I can think of.

> What behaviour do you want?

I have only had two types of traversals.

I don't care: In this type of traversal I just need all the files and couldn't care less when they come to me. In all cases I've never wished to have the directory name (but don't wish for a traversal that leaves it out).

Parent First: Give me everything in the Parent First, then you can traverse the children. Almost as you described, except it would be starting a breadth first from each child.

The other traversal types are likely useful in other situations. I was close to instead of using Parent First choosing

Child First: Give me what is in the Child First before that in the parent. This turns the Parent First on its head.

> I would argue that that's very specific, not
> intuitive in the general case, and best implemented using two separate calls to
> the function with different arguments:
> 
> auto firstLevel = dirEntries("a", SpanMode.shallow);
> foreach (file; firstLevel) {
>   if (file.isDir()) {
>     ... = dirEntries(file, SpanMode.preorder);
>   }
> }

This isn't as you described nor what I want. You'll notice all you did was create a preorder search where you'll never see the top level directories. I'm looking more at:

auto firstLevel = dirEntries("a", SpanMode.shallow);
foreach (file; firstLevel) {
   if (file.isDir()) {
     secondLevel ~= dirEntries(file, SpanMode.shallow);
   }
}

foreach(sl; secondLevel)
foreach (file; sl) {
   if (file.isDir()) {
     thirdLevel ~= dirEntries(file, SpanMode.shallow);
   }
}

foreach(tl; thirdLevel)
foreach (file; tl) {
   if (file.isDir()) {
     thirdLevel ~= dirEntries(file, SpanMode.shallow);
   }
}
....

On until you get tired of trying to predict how many levels this directory actually has.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 19, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8680



--- Comment #6 from Jesse Phillips <Jesse.K.Phillips+D@gmail.com> 2012-09-19 07:03:06 PDT ---
Sorry, you can ignore my code, that's what I get for copy pasting a guess at what it would be. That would be breadth first, which is close, instead they'd need to be some loop that when finished with the files do the same thing for the first subdirectory.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 20, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8680



--- Comment #7 from entheh@cantab.net 2012-09-19 17:10:02 PDT ---
(In reply to comment #5)
> I know, but they are descriptive of the two most useful traversal types I can think of.

I think they are unambiguous, but I have a feeling I know what traversal you want.

> > What behaviour do you want?
> 
> I have only had two types of traversals.
> 
> I don't care: In this type of traversal I just need all the files and couldn't care less when they come to me. In all cases I've never wished to have the directory name (but don't wish for a traversal that leaves it out).

I suspect we don't need to offer the 'I don't care' option explicitly. People may as well be specific. Could add a documentation note that the depth-first options will generally be slightly cheaper than breadth-first.

> Parent First: Give me everything in the Parent First, then you can traverse the children. Almost as you described, except it would be starting a breadth first from each child.

There are two ways you could get this effect. You could do this:

recurse(dir) {
  foreach (file; dirEntries(dir, shallow)) {
    result~=file;
  }
  foreach (file; dirEntries(dir, shallow)) {
    if (file.isDir()) recurse(file);
  }
}

Or you could do this, which is equivalent if you (the user) are ignoring directories themselves and only care about files:

recurse(dir) {
  auto list=dirEntries(dir, shallow);
  sortDirectoriesLast(list);
  foreach (file; list) {
    result~=file;
    if (file.isDir()) recurse(file);
  }
}

This is the 'standard preorder depth-first with sorting' that I described before.

Or you could offer both these implementations (and the corresponding reversed ones). For the sorting case, I was merely suggesting that the sort could be put in the user's control, so they can sort alphabetically or by timestamp or however they want.

> The other traversal types are likely useful in other situations. I was close to instead of using Parent First choosing
> 
> Child First: Give me what is in the Child First before that in the parent. This turns the Parent First on its head.

So the above options can be reversed in all possible permutations to give what you want here.

> > I would argue that that's very specific, not
> > intuitive in the general case, and best implemented using two separate calls to
> > the function with different arguments:
> > 
> > auto firstLevel = dirEntries("a", SpanMode.shallow);
> > foreach (file; firstLevel) {
> >   if (file.isDir()) {
> >     ... = dirEntries(file, SpanMode.preorder);
> >   }
> > }
> 
> This isn't as you described nor what I want. You'll notice all you did was create a preorder search where you'll never see the top level directories.

I see, you're reading it as if "..." is the final result, but I guess I meant to imply that this was user-code and would make use of "firstLevel" directly as well as later recursing into it. I can't actually remember :)

But yes, I thought you were describing shallow first level followed by preorder for each child - and if not, then by describing it, I helped you understand what I misunderstood, so you could help me understand what you actually wanted. Hopefully. :)

(In reply to comment #6)
> Sorry, you can ignore my code, that's what I get for copy pasting a guess at what it would be.

We're even :)

> That would be breadth first, which is close, instead they'd
> need to be some loop that when finished with the files do the same thing for
> the first subdirectory.

Presumably for each subdirectory in turn?

So to conclude, I'd recommend making do with my second example above if it fits your needs - with obviously the sorting more customisable. But if you think there's a strong case for the first example, then do it. Remember though that anyone can get any of these results by writing their own recursive function and repeatedly calling 'shallow', so I wouldn't go too overboard with different options. (Consider whether you even want the sorting callback, since again, the effect can be achieved by just writing it out.)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 20, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8680



--- Comment #8 from entheh@cantab.net 2012-09-19 17:13:31 PDT ---
(In reply to comment #7)
> I think they are unambiguous, but I have a feeling I know what traversal you want.

I apologise, I meant to say I think they are ambiguous.

'parentFirst' could mean you want the *direct contents* of each parent
directory before any subdirectories are considered (your interpretation), or it
could mean you want the parent directory *itself* before any of its children
(files or subdirectories) are considered (my original interpretation).

If you want a name for 'contents of each directory before contents of subdirectories', then maybe... hmm...

directChildrenFirst and directChildrenLast?

They're not great names - other suggestions welcome.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 20, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8680



--- Comment #9 from Jesse Phillips <Jesse.K.Phillips+D@gmail.com> 2012-09-20 10:00:49 PDT ---
(In reply to comment #7)
> I suspect we don't need to offer the 'I don't care' option explicitly. People may as well be specific. Could add a documentation note that the depth-first options will generally be slightly cheaper than breadth-first.

I was not trying to claim traversals that should be available, only emphasizing the ones I've used.

> > Parent First: Give me everything in the Parent First, then you can traverse the children. Almost as you described, except it would be starting a breadth first from each child.
> 
> There are two ways you could get this effect. You could do this:

listdir was deprecated for a very good reason. Would you now kindly consider the steps to fold this into a Range.

We can implement any traversal with shallow, why bother adding depth and breadth or anything else?

I'm very glade you have brought up the naming and implementation issue as I am relying on a behavior which isn't defined, and now I can plan to correct this. But it would be nice if the standard library included the use case I would be using.

> For the sorting case, I was merely suggesting that the sort could be put in the user's control, so they can sort alphabetically or by timestamp or however they want.

With those examples I see this as a very useful addition. Just not supporting the traversal I use.

> So the above options can be reversed in all possible permutations to give what you want here.

And that is even easier if the traversal is already a range.

> So to conclude, I'd recommend making do with my second example above if it fits your needs - with obviously the sorting more customisable. But if you think there's a strong case for the first example, then do it. Remember though that anyone can get any of these results by writing their own recursive function and repeatedly calling 'shallow', so I wouldn't go too overboard with different options. (Consider whether you even want the sorting callback, since again, the effect can be achieved by just writing it out.)

This argument applies to so many languages that I hope you reconsider your position "it can be done with primitives already." Why did you use foreach? Do we really need the function we have goto.

In the beginning I was confused with your issues, to clear up I just wanted to explain the traversal types I do use so whoever gets around to fixing this may consider including them. But the main issue of Breadth not being breadth needs resolved. I can fend for myself, just don't want to.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
« First   ‹ Prev
1 2