Thread overview
Iterator straw-man
Nov 07, 2006
Sean Kelly
Nov 07, 2006
Craig Black
Nov 07, 2006
Sean Kelly
Nov 08, 2006
Bill Baxter
Nov 08, 2006
Sean Kelly
Nov 08, 2006
Bill Baxter
Nov 08, 2006
Benji Smith
Nov 08, 2006
Bill Baxter
Nov 08, 2006
Sean Kelly
Nov 08, 2006
Bill Baxter
November 07, 2006
C++ supports pointers as iterators largely for compatibility with existing/old code and because of the prevalence of pointer use in the language.  D however, is not burdened with the need for backwards compatibility, nor is pointer use at all common.  Languages without pointers typically implement iterators as a single object that is itself aware of the end of the referenced sequence.  But while this works quite well for sequential operations, it falls apart somewhat for random access operations, particularly in languages lacking operator overloading.

Unlike C++, D contains a fairly robust built-in dynamic array type. These arrays may be passed by value efficiently and contain both built-in property methods and user extension of property semantics using the following convention:

    void mutate( char[] c, int i );
    char[] c;
    c.mutate( 0 );

D also supports a slice syntax which is roughly comparable to a self-contained random access iterator.  Thus "abcdef"[0 .. 2] returns a dynamic array type supporting all the typical array operations, including the length property for determining an end point.  Slicing is supported for user-defined types as well, so this method may be applied to any container supporting random access.

Given the presence of D's slice syntax, it would be a shame to ignore it in favor of pointer-style random access iterators for algorithm use. And while the idea of a single iterator design for all access methods (sequential and random access) seems vaguely appealing, it relies heavily on traits templates to allow proper function overloading for different iterator categories.  In fact, overloading template functions for different iterator categories is so annoying that the C++ committee is adding concept checking to C++ 0x to simplify the implementation of such overloaded template functions.

I believe D's slice syntax offers a perfect opportunity to provide a reasonable, intuitive syntax for robust "smart" random-access iterators.  The built-in array type will work as-is, and a user-defined random access container could support the same syntax by providing opIndex, opIndexAssign, opSlice, and opSliceAssign for both the container and iterator types.  Obtaining an iterator on a random access container, then, would be done via a slice operation:

    auto i = myCont[0 .. $];
    foreach( e; i ) {} // sequential access
    auto e = i[0];     // random access

The remaining iterator categories in C++ are essentially variations of forward iterators and bidirectional iterators, and could roughly be described as requiring a means of advancing, testing whether the iterator references a valid element, and of accessing the referenced element.  Testing validity in C++ is done by comparing the data iterator with an "end" iterator for equality, as again, this syntax works with pointers as well.  But using two separate iterators for defining the bounds of a range:

    * Is cumbersome.  It means passing two parameters to every function requiring an iterator instead of one.

    * Is dangerous.  Comparing the "end" iterator of one range with the data iterator of another range is legal if the data (and possibly container) types match, and can result in an attempt to read/write past the end of the referenced sequence.

    * Is unnecessary.  It is more natural, and more common, to perform random access operations via array index operations, which leaves typical iterator use largely restricted to sequential access.  This being the case, why support pointers as valid iterators?

What follows is a rough description of a "smart" read-only iterator for D:

interface Iterator<T>
{
    bool atEnd();
    T next();
    T curr();
    int opApply( int delegate( T ) dg );
}

The nuances are unimportant, but the basic idea is that any such iterator should provide a means of advancing, testing whether the iterator references a valid element, and of referencing that element. Also, opApply is supported to allow iterators to be used in foreach operations for simple uses.

Assuming such a design, the iterator type determines progress through the container (forward, reverse, stepped, etc) rather than the algorithm being applied.  Here is a straightforward implementation of some algorithms using such iterators, 'C' is a container type for demonstration:

    bool contains( C cont, E match ) {
        foreach( e; cont ) {
            if( e == match )
                return true;
        }
        return false;
    }

    I find_first( C cont, E match ) {
        auto i = cont.fwdIt;
        foreach( e; i ) {
            if( e == match )
                return i;
        }
        return i;
    }

    I find_last( C cont, E match ) {
        auto i = cont.revIt;
        foreach( e; i ) {
            if( e == match )
                return i;
        }
        return i;
    }

Given the above, it seems awkward that the iterator must be declared outside of foreach so a copy may be made to return the position of the found element.  Therefore, foreach should allow an arbitrary key type to be returned by opApply.  For array operations, the key could continue to be an integer representing the array postion., but in sequence operations the key could be an iterator representing the current position.  Thus, the following operations would be equivalent:

1)  auto i = cont.fwdIt;
    foreach( e; i ) {
        if( e == match )
            return i;
    }

2)  foreach( i, e; cont.fwdIt ) {
        if( e == match )
            return i;
    }

3)  foreach( i, e; cont ) {
        if( e == match )
            return i;
    }

This being the case, I believe iterators must be copyable in a generic manner, and ideally, copies must be relatively efficient to perform. Therefore, I suggest the addition of a .dup() property to the standard iterator interface to flatten the differences between class and struct-based iterators, and it may be advisable to use structs as iterators in the standard case and forget about classes and interfaces altogether.  That said, if a reference to the current iterator could be passed around within foreach instead of a copy being made for each iteration, then much of the cost could be eliminated.  This should easily be possible using the same auto-dereferencing semantics used by "inout" parameters in D already.

So in summation, D adequately represents "smart" random access iterators via array slice semantics and sequential iteration is best represented using a single "smart" iterator object as opposed to a pair of "dumb" iterators as per C++.  Finally, the example code above suggests that the need for a foreach_reverse statement may not be necessary if fwdIt and revIt properties are provided for built-in arrays.  Though the only functional difference between:

    foreach( i, e; array )

and

    foreach( i, e; array.fwdIt )

would be the "key" type returned in 'i'.  In the first case it would be an integer index and in the second it would be an iterator.

As an aside, foreach_reverse may remain useful in instances where we want the index of the current element instead of an iterator.  Or array iterators may provide a means to obtain the relevant index.  Perhaps someone could make an argument for or against based on these ideas.
November 07, 2006
I do agree that using classes and interfaces to represent iterators is a bad idea for performance reasons.  I also agree that I don't think it's necessary to support raw pointers.

IMO iterators should be structs such that iteration can be performed as follows:

for(auto i = container.begin(); i.isValid(); i.next())  write(i.value);

If I am not mistaken, this can already be done in D.  With foreach support this would simplify to.

foreach(i; container)  write(i);

-Craig


November 07, 2006
Craig Black wrote:
> I do agree that using classes and interfaces to represent iterators is a bad idea for performance reasons.  I also agree that I don't think it's necessary to support raw pointers.
> 
> IMO iterators should be structs such that iteration can be performed as follows:
> 
> for(auto i = container.begin(); i.isValid(); i.next())  write(i.value);
> 
> If I am not mistaken, this can already be done in D.  With foreach support this would simplify to.
> 
> foreach(i; container)  write(i);

Yup.  The reason for a proposal is twofold: to establish a common idiom for people to follow so code can interact reliably, and to determine whether any built-in behavior needs to be tweaked to support iterators in a way this is both generic (for UDTs, AAs, and arrays) and elegant.


Sean
November 08, 2006
Craig Black wrote:
> I do agree that using classes and interfaces to represent iterators is a bad idea for performance reasons.  I also agree that I don't think it's necessary to support raw pointers.
> 
> IMO iterators should be structs such that iteration can be performed as follows:
> 
> for(auto i = container.begin(); i.isValid(); i.next())  write(i.value);
> 
> If I am not mistaken, this can already be done in D.  With foreach support this would simplify to.
> 
> foreach(i; container)  write(i);

I haven't read Seans big long proposal yet, but I think the method that returns an iterator should not be called 'begin'.  I should be called 'forward' or 'iterator' or 'iter' or 'forward_iterator' or something like that.

Then i would hope that all of the following would be possible:

   foreach(i; container.iter)  write(i);
   foreach(i; container.reverse_iter)  write(i);
   foreach(i; container.my_skip_every_prime_number_iter) write(i);

foreach(i; container) can look for the .iter (or opIterator, or whatever standard name is decided upon).

foreach_reverse(i; container) [shudder] can look for the .riter (or opIteratorReverse, or whatever).

--bb

November 08, 2006
Sean Kelly wrote:

I mostly agree with you.  But have some comments on the specifics.

> What follows is a rough description of a "smart" read-only iterator for D:
> 
> interface Iterator<T>
> {
>     bool atEnd();
>     T next();
>     T curr();
>     int opApply( int delegate( T ) dg );
> }
> 
> The nuances are unimportant, but the basic idea is that any such iterator should provide a means of advancing, testing whether the iterator references a valid element, and of referencing that element. Also, opApply is supported to allow iterators to be used in foreach operations for simple uses.

I don't think opApply should be the mechanism by which foreach accepts iterators.  It's just too convoluted and too much a burden on iterator implementors.  And it's not going to be as efficient as a raw for-loop, either.

> Assuming such a design, the iterator type determines progress through the container (forward, reverse, stepped, etc) rather than the algorithm being applied.  Here is a straightforward implementation of some algorithms using such iterators, 'C' is a container type for demonstration:
> 
>     bool contains( C cont, E match ) {
>         foreach( e; cont ) {
>             if( e == match )
>                 return true;
>         }
>         return false;
>     }

It should be possible to make a template out of that, too.

     bool contains(C,E)( C cont, E match ) {
         foreach( e; cont ) {
             if( e == match )
                 return true;
         }
         return false;
     }

Current compiler limitations aside, that *should* be reducible to the same efficiency as a for loop with simple pointer (or index) manipulation.

> [...] 
> 
> Given the above, it seems awkward that the iterator must be declared outside of foreach so a copy may be made to return the position of the found element.  Therefore, foreach should allow an arbitrary key type to be returned by opApply.  

I like that.  It makes sense.  The iterator itself is analogous to the int index when foreach-ing arrays.  That seems very logical.

Given that similarity it seems like it would make sense for indexing a container by an iterator to work as well:

foreach( i, e; cont.fwdIt ) {
      if( e == match ) {
          writefln("Matched item ", cont[i]);
          return i;
      }
}

I.e. cont[i] becomes synonymous with i.curr().

That would be easy to arrange if opIndex was allowed to take an iterator as the index.  (I think it's a good idea to remove the restriction on opIndex arguments in any event, but this is another justification.)

An opIndex for an iterator would take the simple form:

   static T opIndex(ContIterator i) { return i.curr(); }

If you had write-able iterator with say a .set(T v) method, then you could also have opIndexAssign:

   static T opIndexAssign(ContIterator i, T x) { return i.set(x); }

If the Iterator type has a pointer to its container, you could make the methods non-static and assert(this==i.container).

> As an aside, foreach_reverse may remain useful in instances where we want the index of the current element instead of an iterator.  Or array iterators may provide a means to obtain the relevant index.  Perhaps someone could make an argument for or against based on these ideas.

foreach_reverse(i,x; cont) would remain (marginally) useful as a shortcut for the special reverse iterator name:  I.e. a synonym for
    foreach(i,x; cont.revIter)

I'm guessing the specially-ordained iterator method names will probably end up being opIter and opIterReverse (or maybe same with 'Iterator' spelled out) just following the pattern established so far.

--bb
November 08, 2006
Bill Baxter wrote:
> I'm guessing the specially-ordained iterator method names will probably end up being opIter and opIterReverse (or maybe same with 'Iterator' spelled out) just following the pattern established so far.

Why the continued assumption that there are only two valid kinds of iterator? A hashtable might have forward and reverse iterators for both its keys and its values, thereby requiring four distinct iterators. A tree might have depth-first, breadth-first, in-order, and reverse-order iterators. Any collection class might have a random-order iterator. I think it's awfully presumptuous to bake the notion of forward and reverse iteration directly into the language.

--benji
November 08, 2006
Bill Baxter wrote:
> Craig Black wrote:
>> I do agree that using classes and interfaces to represent iterators is a bad idea for performance reasons.  I also agree that I don't think it's necessary to support raw pointers.
>>
>> IMO iterators should be structs such that iteration can be performed as follows:
>>
>> for(auto i = container.begin(); i.isValid(); i.next())  write(i.value);
>>
>> If I am not mistaken, this can already be done in D.  With foreach support this would simplify to.
>>
>> foreach(i; container)  write(i);
> 
> I haven't read Seans big long proposal yet, but I think the method that returns an iterator should not be called 'begin'.  I should be called 'forward' or 'iterator' or 'iter' or 'forward_iterator' or something like that.

I agree.  In my proposal I used fwdIt and revIt for lack of a better idea.


Sean
November 08, 2006
Bill Baxter wrote:
> Sean Kelly wrote:
> 
> I mostly agree with you.  But have some comments on the specifics.
> 
>> What follows is a rough description of a "smart" read-only iterator for D:
>>
>> interface Iterator<T>
>> {
>>     bool atEnd();
>>     T next();
>>     T curr();
>>     int opApply( int delegate( T ) dg );
>> }
>>
>> The nuances are unimportant, but the basic idea is that any such iterator should provide a means of advancing, testing whether the iterator references a valid element, and of referencing that element. Also, opApply is supported to allow iterators to be used in foreach operations for simple uses.
> 
> I don't think opApply should be the mechanism by which foreach accepts iterators.  It's just too convoluted and too much a burden on iterator implementors.  And it's not going to be as efficient as a raw for-loop, either.

True enough.  I mostly added it as an alternative for situations where users didn't need the power of a manual loop.  And I think implementation would be pretty simple for most/all iterators:

    for( ; !atEnd(); next() )
    {
        ret = dg( curr() );
        // handle ret
    }

>> Assuming such a design, the iterator type determines progress through the container (forward, reverse, stepped, etc) rather than the algorithm being applied.  Here is a straightforward implementation of some algorithms using such iterators, 'C' is a container type for demonstration:
>>
>>     bool contains( C cont, E match ) {
>>         foreach( e; cont ) {
>>             if( e == match )
>>                 return true;
>>         }
>>         return false;
>>     }
> 
> It should be possible to make a template out of that, too.
> 
>      bool contains(C,E)( C cont, E match ) {
>          foreach( e; cont ) {
>              if( e == match )
>                  return true;
>          }
>          return false;
>      }
> 
> Current compiler limitations aside, that *should* be reducible to the same efficiency as a for loop with simple pointer (or index) manipulation.

Yup.  The function above was meant for illustrating an idea more than it was a proposal for how to implement a contains function :-)

>> [...]
>> Given the above, it seems awkward that the iterator must be declared outside of foreach so a copy may be made to return the position of the found element.  Therefore, foreach should allow an arbitrary key type to be returned by opApply.  
> 
> I like that.  It makes sense.  The iterator itself is analogous to the int index when foreach-ing arrays.  That seems very logical.
> 
> Given that similarity it seems like it would make sense for indexing a container by an iterator to work as well:
> 
> foreach( i, e; cont.fwdIt ) {
>       if( e == match ) {
>           writefln("Matched item ", cont[i]);
>           return i;
>       }
> }
> 
> I.e. cont[i] becomes synonymous with i.curr().

Hrm, good point.

> That would be easy to arrange if opIndex was allowed to take an iterator as the index.  (I think it's a good idea to remove the restriction on opIndex arguments in any event, but this is another justification.)

Is it currently required to be an integer?  I had no idea.

> An opIndex for an iterator would take the simple form:
> 
>    static T opIndex(ContIterator i) { return i.curr(); }
> 
> If you had write-able iterator with say a .set(T v) method, then you could also have opIndexAssign:
> 
>    static T opIndexAssign(ContIterator i, T x) { return i.set(x); }
> 
> If the Iterator type has a pointer to its container, you could make the methods non-static and assert(this==i.container).

All good ideas.

>> As an aside, foreach_reverse may remain useful in instances where we want the index of the current element instead of an iterator.  Or array iterators may provide a means to obtain the relevant index.  Perhaps someone could make an argument for or against based on these ideas.
> 
> foreach_reverse(i,x; cont) would remain (marginally) useful as a shortcut for the special reverse iterator name:  I.e. a synonym for
>     foreach(i,x; cont.revIter)

True enough.

> I'm guessing the specially-ordained iterator method names will probably end up being opIter and opIterReverse (or maybe same with 'Iterator' spelled out) just following the pattern established so far.

Following the D convention, the 'op' prefix suggests a use form different from that of a normal function call.  Personally, I'd be fine with this done as plain old property methods: fwdIt/revIt, or something along those lines.


Sean
November 08, 2006
Sean Kelly wrote:
> Bill Baxter wrote:
> 
>> Sean Kelly wrote:
>>

> 
> Following the D convention, the 'op' prefix suggests a use form different from that of a normal function call.  Personally, I'd be fine with this done as plain old property methods: fwdIt/revIt, or something along those lines.

That's true.  With all current uses you could see the 'op' as synonymous with "you ain't never gonna call this directly".   So something intended to be called directly doesn't necessarily have to follow that convention.

--bb
November 08, 2006
Benji Smith wrote:
> Bill Baxter wrote:
> 
>> I'm guessing the specially-ordained iterator method names will probably end up being opIter and opIterReverse (or maybe same with 'Iterator' spelled out) just following the pattern established so far.
> 
> 
> Why the continued assumption that there are only two valid kinds of iterator? 

Because those two will be recognized specially by foreach and foreach_reverse without having to specify a method name.  But you'll still be able to pass in a specific iterator of whatever kind you wish.

A hashtable might have forward and reverse iterators for both
> its keys and its values, thereby requiring four distinct iterators. 

Yep, for hashtable you may have to name them all explicitly:
   foreach(k; ht.keys)
   foreach(v; ht.values)
   foreach(k; ht.keysReverse)
   foreach(v; ht.valuesReverse)

A
> tree might have depth-first, breadth-first, in-order, and reverse-order iterators. Any collection class might have a random-order iterator. I think it's awfully presumptuous to bake the notion of forward and reverse iteration directly into the language.

Can't say I disagree with you, but it's already there with foreach and foreach_reverse and opApply/opApplyReverse.

--bb