November 07, 2006
On Tue, 07 Nov 2006 16:03:13 +0200, Craig Black <cblack@ara.com> wrote:
> When writing custom C++ iterators, I find that end() is not ever necessary.
> If end() is not used, it means that a little more smarts have to be added to
> the iterator itself so that the iterator knows when to stop.  In some cases
> this means that the iterator needs a pointer to the collection/container
> object in order to get that information.
>

A benefit of having a pointer to the iterated container is of course that boundary checks are always up to date, even if you change the container while iterating it. Well, I'm not saying that it would be the way to do the iterators, or that it's not the way to do them.

Btw, I have used (in C++) iterators having the following functions:

  isEnd()    //does the iterator point to the end position?  (isNotEnd() etc provided for convenience)
             //there is also: isBegin(), isFirst(), isLast()
  toBegin()  //to position -1 (which is outside the container)
  toFirst()  //to position 0
  toLast()   //to position end - 1
  toEnd()    //the end position points outside the container
  toNext()
  toPrev()
  get()      //returns the current item
  read()     //== get() + toNext()
  getNext()
  getPrev()
  readBackward()  //== get() + toPrev()
  ...

I have named iterators capable of changing the container as 'Writer's ('wr' + 'iter'). They have the following functions:

  set()
  write()  //== set() + toNext()
  ...


Ok, there are a lot of functions... but my actual point is that the iterator has two positions, 'begin' and 'end', that point *outside* the container. When writing (or setting) a value to 'end', the value is appended to the end of the container (very natural operation for files). The same way "i.toBegin(); i.set(val);" inserts 'val' to the start of the container (very unnatural operation for files, and actually causes an error if tried for files).
November 07, 2006
"Walter Bright" <newshound@digitalmars.com> wrote in message news:eio6u5$1j6o$1@digitaldaemon.com...
> It's becoming increasingly obvious that D needs iterators. While opApply is far better for some kinds of iteration (such as recursively traversing a directory), iterators are more efficient in some cases, and allow for things that opApply makes difficult.
>
> So hear are the design goals:
>
> 1) Works with dynamic and stat arrays
> 2) Doesn't need to work with associative arrays
> 3) Should be possible to achieve "pointer efficiency" with it
> 4) Needs to be able to restrict lvalue access (what C++ does with const
> iterators)
> 5) Needs to work seamlessly with foreach
> 6) Iterators need to be copyable
>
> So here's one design:
>
> .being property returns an iterator that starts at the beginning
>
> .end returns an iterator that is at the end
>
> Overloading * doesn't work with D. But, instead:
>
> Overload opIndex for rvalue access
> Overload opIndexAssign for lvalue access
>
> Overloading opIndex also will work for random access
>
> foreach loops will not be able to have a 'key' parameter.

It's not a huge problem, but passing an argument needlessly is both ugly and slightly inefficient.  Perhaps opIndex and opIndexAssign could be overloaded without any parameters.

-Craig


November 07, 2006
Walter Bright wrote:
> 
> One such case is the usefulness of being able to provide an input iterator to a parsing function, which may itself pass the iterator off to other parsing functions.

Some questions:

Is your data structure a graph or a tree ?

What does your iterator iterating ?

I ask because I currently investigate the visitor pattern to traverse graph like data structures (see attached code for an example).



November 07, 2006
Sean Kelly wrote:
> Since D has slicing, the argument for using iterators to define the boundaries of a range of randomly accessible elements seems kind of small to me.  ie.
> 
>     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
> or
>     sort( a[0 .. 5] );
> 
> I find the second to be cleaner to look at.  But I'm undecided whether we'd lose any power or flexibility by not bothering with random access iterators.

Hmm, instead of 'pointers as iterators', have 'arrays as iterators'. That definitely sounds like a good area to explore.
November 07, 2006
Walter Bright wrote:
> Sean Kelly wrote:
>> Since D has slicing, the argument for using iterators to define the boundaries of a range of randomly accessible elements seems kind of small to me.  ie.
>>
>>     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
>> or
>>     sort( a[0 .. 5] );
>>
>> I find the second to be cleaner to look at.  But I'm undecided whether we'd lose any power or flexibility by not bothering with random access iterators.
> 
> Hmm, instead of 'pointers as iterators', have 'arrays as iterators'. That definitely sounds like a good area to explore.

I went ahead and wrote everything up during my commute this morning. It's posted separately as "Iterator straw-man."  And I apologize for it being a bit longer than expected :-)

Sean
November 07, 2006
Walter Bright wrote:
> It's becoming increasingly obvious that D needs iterators. While opApply  is far better for some kinds of iteration (such as recursively traversing a directory), iterators are more efficient in some cases, and allow for things that opApply makes difficult.
> 
> So hear are the design goals:
> 
> 1) Works with dynamic and stat arrays
> 2) Doesn't need to work with associative arrays
> 3) Should be possible to achieve "pointer efficiency" with it
> 4) Needs to be able to restrict lvalue access (what C++ does with const iterators)
> 5) Needs to work seamlessly with foreach
> 6) Iterators need to be copyable

I apologize for this long post. Writing too much has become a bad habit of mine lately it seems. This took a while writing, and in the meantime new posts appeared that say much of the same. If you don't bother reading this, read Sean's posts instead, as I agree 100 % with him on this. :)

What I suggest is that we should not try to copy the C++ "iterator == pointer" idiom, but instead aim for an iterator design that is self-contained and simple.

For me, the iterator concept is simple. An iterator is an entity that allows an iterative way of accessing data. An iterator needs:
1. a way to reference the data
2. a way to traverse the data
3. a way to signal the end of the data

(In C++, many iterators lack #3 above, which leads to the necessity of a second iterator that just signals the end of the iterative range.)

An iterator could also be a generator.
Here are some examples (translated to D) of how iterators are implemented in some languages other than C++:

Java
----

class Iterator(T) {
	bool hasNext(); // self explanatory
	T* next();      // steps and returns reference to next element
}

while(i.hasNext())
	writefln(*i.next());


C# (Simplified, from David Gileadi)
-----------------------

class Iterator(T) {
	bool MoveNext();  // false if no more elements
	T Current();      // property getter
}

while(i.MoveNext)
	writefln(i.Current);


Python
------

class Iterator(T) {
	T next();
}

while(true)
	try
		writefln(i.next);
	catch (StopIteration)
		break;


Taking inspiration from the above and following Walter's design goals above, here is my suggestion:

interface Iterator(T) {
	bool step();     // as C# MoveNext
	T value;         // getter (rvalue access)
	T value(T);      // setter (lvalue access)
	int opApply(...) // support foreach style iteration
}

An alternative is supplying a value property that returns a pointer to the element, but that would not allow limiting lvalue access for read only iterators and introduces a pointer, which might be bad by itself.

Usage:

// explicit iteration
while(i.step) {
	writefln(i.value);
	i.value = 5;
}

// implicit iteration
foreach(o; i)
	writefln(o);


I will briefly explain my reasoning:

As I said, I believe the iterator should be clean and simple. Saying that it is OK for the iterator implementations to be dirty and messy, but that the implementations will be hidden away in libraries and only be the headache of library writers will lead to just that - only library writers will write iterators.

As Sean hints at, "random access iterators" are not pure iterators. The random access part is orthogonal to the iterative part. In D, a perfect way to implement a random access view is defining a .length property and an opIndex method. A random access view doesn't need to be iterable and an iterator doesn't need to provide random access. A D slice is the perfect example of a random access view.

I see no reason to use operator overloading when the result isn't replaceable with a pointer anyway.

Bill Baxter's points regarding iterator as a concept vs an interface/abstract class are very valid. And as Bill says, maybe both should be supported. One of them could always be converted to the other.

Regarding efficiency, here is one sample implementation that a compiler should(*) be able to make just as efficient as for/foreach loops:

struct Iterator(T) {
    T* ptr;
    T* end;

    bool step() { return ++ptr != end; }
    T value() { return *ptr; }
    T value(T x) { return *ptr = x; }
    // mixin OpApplyMixin;
}

Iterator!(T) iter(T)(T[] x) {
    Iterator!(T) ret;// = {x.ptr-1, x.ptr+x.length}; // ICE
    ret.ptr = x.ptr-1;
    ret.end = x.ptr+x.length;
    return ret;
}

void main() {
    int[] arr = [1,2,3,4,5];

    auto i = arr.iter();

    while(i.step) {
        writefln("%s",i.value);
        i.value = 5;
    }
}

/Oskar
November 07, 2006
Sean Kelly wrote:
> Since D has slicing, the argument for using iterators to define the boundaries of a range of randomly accessible elements seems kind of small to me.  ie.
>
>     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
> or
>     sort( a[0 .. 5] );
>
> I find the second to be cleaner to look at.  But I'm undecided whether we'd lose any power or flexibility by not bothering with random access iterators.

I agree completely. As someone with a background in Java, C#, and Python (among others), but without significant C++ experience, my concept of an iterator is: "an object with methods for getting the current element, determining whether a 'next' element exists, and for advancing the cursor to the next element".

The C++ iterators look very weird (and very very wrong) to me.

Search and sort routines should use opIndex, opIndexAssign, and opSlice. If an object doesn't implement opIndex or opIndexAssign, it's not sortable using generic sorting algorithms. Tough luck.

Iteration, though, is completely distinct from sorting and searching, and should use a different mechanism.

Walter Bright wrote:
> Hmm, instead of 'pointers as iterators', have 'arrays as iterators'.
> That definitely sounds like a good area to explore.

Perhaps an iterator always steps through an array (or an object implementing a List or Iterable interface), and in order for an object to be foreach-able, you'll need to first transform an object into an array (or construct a List or an Iterable wrapping the object).

Arrays could still get their own special-case implementation of foreach, avoiding interfaces for maximum speed. And, if a class supports opIndex, then you could avoid making a virtual method call through the Iterable interface.

But I think it's also desirable, in many many cases, to implement an Iterator class, without supporting opIndex. The only methods in the iterator would be hasNext() and next().

Foreach should support real arrays, classes with opIndex, and Java-style iterators. Searching and sorting should only use opIndex, opIndex, opCmp, and opEquals. Iterators should only be used for iteration.

Anyhoo, those are my two cents.

--Benji Smith
November 07, 2006
Walter Bright wrote:
> Bill Baxter wrote:
>> Sean Kelly wrote:
>>>> The problem is if the iterator is an actual pointer, there is no .value property for a pointer.
>>
>> Well, they could.  It's up to Mr. Compiler Writer if a pointer has a value property or not.
> 
> Consider the following struct:
> 
>     struct Foo
>     {
>         int value;
>     }
> 
> Foo* p;
> 
> p.value;
> 
> Is p.value the entire contents of Foo (as it would be for a proposed .value property) or just Foo.value? "value" is a pretty common field name.

Maybe that's why Ada used "`" as a property marker rather than "."?  Of course there's the Python approach...naming it __value__.

A suitable solution might be to name it "valueof".  That's a much less common field name.  The problem is that this is something used quite frequently, so one would want it to be short.  Perhaps "V"?  Then one would write
struct Foo
{  int value;
}

Foo* p;

p.value;
p.V;
and have two distinct forms.  (Yes, V is used occasionally as a variable name...but pretty rarely.)
November 07, 2006
Knud Sørensen wrote:
> On Mon, 06 Nov 2006 12:46:01 -0800, Walter Bright wrote:
> 
> 
>>It's becoming increasingly obvious that D needs iterators. While opApply 
>>  is far better for some kinds of iteration (such as recursively 
>>traversing a directory), iterators are more efficient in some cases, and allow for things that opApply makes difficult.
>>
> 
> 
> What are those cases ? Maybe we can find a way to fix the problem with
> opApply.
> 

Mostly agreed.  For example, one case I seem to find mentioned a few times is providing depth-first versus breadth-first iteration on a tree -- supposedly this is hard to do with foreach/opApply.  While I do agree that opApply is not optimal (because you can't tell opApply which way you want to go) I thought this was part of the utility of the recent delegates-as-aggregates feature?  The following is just off the very top of my head, so it might not be ideal/perfect, but I do believe it provides this ability, using current D:

<code>
struct Tree(int B = 2, T) {
  version (Tree_NoWarnings) {}
  else {
    static if (B == 0) {
      static assert(false, "A tree with zero branches is just useless.");
    }
    else static if (B == 1) {
      static assert(false, "A tree with only one branch may as well be an array.");
    }
  }

  Tree*[B] children ;
  T        value    ;

  int walkDepth (int delegate(inout Tree!(T)) dlg) {
    int result ;

    foreach (x; children) {
      if (x && result = x.walkDepth(dlg)) {
        goto end;
      }
    }
    if (result = dlg(this)) {
      goto end;
    }

    end:
      return result;
  }

  int walkBreadth (int delegate(inout Tree!(T)) dlg) {
    int         result          ;
    Tree!(T)*[] gen    = [this] ,
                next            ;

    while (gen.length) {
      foreach (x; gen) {
        if (result = dlg(x.value)) {
          goto end;
        }
        foreach (y; x.children) {
          if (y) {
            next ~= y;
          }
        }
      }
      gen = next.dup;
      next.length = 0;
    }

    end:
      return result;
  }
}

Tree!(2, int) binary_int_tree ;
// ...
foreach (node; &binary_int_tree.walkDepth) {
  // ...
}
foreach (node; &binary_int_tree.walkBreadth) {
  // ...
}
</code>

Unless I'm missing something entirely, I don't see any major utility in this case for having a seperate Iterator entity...

-- Chris Nicholson-Sauls
November 08, 2006
Walter Bright wrote:
> Sean Kelly wrote:
>> Since D has slicing, the argument for using iterators to define the boundaries of a range of randomly accessible elements seems kind of small to me.  ie.
>>
>>     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
>> or
>>     sort( a[0 .. 5] );
>>
>> I find the second to be cleaner to look at.  But I'm undecided whether we'd lose any power or flexibility by not bothering with random access iterators.
> 
> Hmm, instead of 'pointers as iterators', have 'arrays as iterators'. That definitely sounds like a good area to explore.

OOh, I like the direction this is heading.

Sean I don't have time to read the straw-man right now but that definitely solves my recursive mergesort question.

And Benji's comment about simple iterators and ranges being distinct things that shouldn't be lumped together also seems on target.  If iterators don't have to be ranges then it's simple to see how to support iterators over infinite or circular containers.  They just never stop spitting out 'next()'s

But how do you handle generically pointing to an element then?  Like you have with an iterator to a node in linked list.  I guess you can have iterators with hasPrev() getPrev() (or --) type methods too.

--bb