November 08, 2006
Chris Nicholson-Sauls wrote:
> Knud Sørensen wrote:
>> On Mon, 06 Nov 2006 12:46:01 -0800, Walter Bright wrote:
>>
>>
>>> It's becoming increasingly obvious that D needs iterators. While opApply  is far better for some kinds of iteration (such as recursively traversing a directory), iterators are more efficient in some cases, and allow for things that opApply makes difficult.
>>>
>>
>>
>> What are those cases ? Maybe we can find a way to fix the problem with
>> opApply.
>>
> 
> Mostly agreed.  For example, one case I seem to find mentioned a few times is providing depth-first versus breadth-first iteration on a tree -- supposedly this is hard to do with foreach/opApply.  

Other way around.  Tree traversal is harder to do with C++ style iterators, easy to do with opApply.  I assume your code is correct but didn't look at it too closely.
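For instance, an in-order walk falls out of opApply almost for free, since recursion does the bookkeeping that a C++ iterator would need an explicit stack for. A minimal sketch (using present-day `ref` delegate signatures):

```d
import std.stdio;

class Node
{
    int value;
    Node left, right;

    this(int value, Node left = null, Node right = null)
    {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // In-order traversal: the call stack *is* the iterator state.
    int opApply(int delegate(ref int) dg)
    {
        int result = 0;
        if (left !is null)
        {
            result = left.opApply(dg);
            if (result) return result;
        }
        result = dg(value);
        if (result) return result;
        if (right !is null)
            result = right.opApply(dg);
        return result;
    }
}

void main()
{
    auto root = new Node(2, new Node(1), new Node(3));
    foreach (v; root)
        writefln("%s", v);   // 1, 2, 3
}
```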

--bb
November 08, 2006
Bill Baxter escribió:
> But how do you handle generically pointing to an element then?  Like you have with an iterator to a node in linked list.  I guess you can have iterators with hasPrev() getPrev() (or --) type methods too.
> 
> --bb

In fact, Java has them:

http://java.sun.com/j2se/1.5.0/docs/api/java/util/ListIterator.html
November 08, 2006

Walter Bright wrote:
> Sean Kelly wrote:
>> Since D has slicing, the argument for using iterators to define the boundaries of a range of randomly accessible elements seems kind of small to me.  ie.
>>
>>     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
>> or
>>     sort( a[0 .. 5] );
>>
>> I find the second to be cleaner to look at.  But I'm undecided whether we'd lose any power or flexibility by not bothering with random access iterators.
> 
> Hmm, instead of 'pointers as iterators', have 'arrays as iterators'. That definitely sounds like a good area to explore.

Hang on; aren't we back to where we are *right now*?

Things that need random access override opIndex and opIndexAssign.

Things which you can take a range of values from override opSlice and opSliceAssign.

Things that require lock-step iteration override opApply, or supply a function that returns a delegate that "runs" a foreach loop.

About the only thing this *doesn't* cover are bi-directional iterators (say, iterating back and forth over an infinite series).
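For concreteness, the "function that returns a delegate" option mentioned above can look like this (a minimal sketch: foreach accepts any delegate with an opApply-shaped signature):

```d
import std.stdio;

// Returns a delegate that "runs" a foreach loop over a countdown.
int delegate(int delegate(ref int)) countdown(int start)
{
    return (int delegate(ref int) dg)
    {
        int result = 0;
        for (int i = start; i > 0; --i)
        {
            result = dg(i);
            if (result) break;
        }
        return result;
    };
}

void main()
{
    foreach (v; countdown(3))
        writefln("%s", v);   // 3, 2, 1
}
```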

I know some people want to be able to stop iteration and resume it, but couldn't that be solved by making opApply return a generator like what Python does?  Those things are written the same way as a current opApply (sans all the return value from the delegate stuff), and can be advanced manually.

I know this can be done.  I wrote the coroutine module for StackThreads a while back, and one of the *very first* things I did was implement generic generators.  The base generator class essentially just implemented opApply switching over to the coroutine every time it needed the next value.  Here's a port of Python's range() function (well, the one and zero argument versions :P):

> class
> range : Generator!(uint)
> {
>     // These just let you use range() or range(n)
>     static
>     range
>     opCall()
>     {
>         return range(uint.max);
>     }
>
>     static
>     range
>     opCall(uint limit)
>     {
>         return new range(limit);
>     }
>
> protected:
>     uint limit;
>
>     // Or you can use new range(n)
>     this(uint limit)
>     {
>         this.limit = limit;
>         super();
>     }
>
>     // Where the magic happens.
>     override
>     uint
>     run()
>     {
>         uint i = 0;
>         while( i < limit )
>             yield(i++);    // yield in D: *very* sexy
>
>         StopGenerator();   // throws an exception, imitating what
>                            // Python does to stop an iterator.
>     }
> }

Now, from that, we can implement opApply AND next/current in the base class.  Going with coroutines kills both the opApply and forward iterator birds with one stone.
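A rough sketch of what that base class might look like (hypothetical: the real StackThreads Generator implements next() with a coroutine switch, and its names differ; here the subclass just computes values directly):

```d
import std.stdio;

// A generator base class exposing both manual iteration (hasNext/next)
// and foreach (opApply built on top of them).
abstract class Generator(T)
{
    abstract bool hasNext();
    abstract T next();

    // foreach support comes for free once the manual interface exists.
    int opApply(int delegate(ref T) dg)
    {
        while (hasNext())
        {
            T value = next();
            if (auto result = dg(value))
                return result;
        }
        return 0;
    }
}

// A coroutine-free stand-in for the range() generator above.
class Range : Generator!(uint)
{
    uint i, limit;
    this(uint limit) { this.limit = limit; }
    override bool hasNext() { return i < limit; }
    override uint next() { return i++; }
}

void main()
{
    foreach (v; new Range(3))
        writefln("%s", v);   // 0, 1, 2
}
```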

As for bi-directional iterators... I actually can't think of anywhere I'd use them.  I just keep thinking "I could just use random access to the same end".  Python's never suffered from not having them...

Anyway, just some random thoughts.  Haven't had coffee yet :3

	-- Daniel

-- 
Unlike Knuth, I have neither proven nor tried the above; it may not even make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
November 08, 2006
Daniel Keep wrote:
> 
> Walter Bright wrote:
>> Sean Kelly wrote:
>>> Since D has slicing, the argument for using iterators to define the
>>> boundaries of a range of randomly accessible elements seems kind of
>>> small to me.  ie.
>>>
>>>     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
>>> or
>>>     sort( a[0 .. 5] );
>>>
>>> I find the second to be cleaner to look at.  But I'm undecided whether
>>> we'd lose any power or flexibility by not bothering with random access
>>> iterators.
>> Hmm, instead of 'pointers as iterators', have 'arrays as iterators'.
>> That definitely sounds like a good area to explore.
> 
> Hang on; aren't we back to where we are *right now*?

Close, but right now we don't have a good way to iterate over multiple things at once.
  for(i,j; i.hasNext()&&j.hasNext(); i++,j++)
  {
     // do something with current values of i and j
  }

Or as you say to stop iteration or a generic way to return a pointer to a particular spot in a container.

> About the only thing this *doesn't* cover are bi-directional iterators
> (say, iterating back and forth over an infinite series).

> [interesting stuff about generators and D]

If generators can handle the above cases *AND* do it with code that's simple to create and use *AND* do all that as efficiently as for loops, then it sounds great to me.  My impression was that any sort of opApply/coroutine thing in D was not going to be so efficient.

--bb
November 08, 2006

Bill Baxter wrote:
> Daniel Keep wrote:
>>
>> Walter Bright wrote:
>>> Sean Kelly wrote:
>>>> Since D has slicing, the argument for using iterators to define the boundaries of a range of randomly accessible elements seems kind of small to me.  ie.
>>>>
>>>>     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
>>>> or
>>>>     sort( a[0 .. 5] );
>>>>
>>>> I find the second to be cleaner to look at.  But I'm undecided whether we'd lose any power or flexibility by not bothering with random access iterators.
>>> Hmm, instead of 'pointers as iterators', have 'arrays as iterators'. That definitely sounds like a good area to explore.
>>
>> Hang on; aren't we back to where we are *right now*?
> 
> Close, but right now we don't have a good way to iterate over multiple
> things at once.
>   for(i,j; i.hasNext()&&j.hasNext(); i++,j++)
>   {
>      // do something with current values of i and j
>   }
> 
> Or as you say to stop iteration or a generic way to return a pointer to a particular spot in a container.
> 
>> About the only thing this *doesn't* cover are bi-directional iterators (say, iterating back and forth over an infinite series).
> 
>> [interesting stuff about generators and D]
> 
> If generators can handle the above cases *AND* do it with code that's simple to create and use *AND* do all that as efficiently as for loops, then it sounds great to me.  My impression was that any sort of opApply/coroutine thing in D was not going to be so efficient.
> 
> --bb

Well, as I said, once you have your sequence expressed as a coroutine,
writing hasNext() and getCurrent() is trivial.  Once you have that,
iterating multiple things in lockstep is also trivial:

> class zip : Generator!(Tuple!(T1,T2))
> {
>     // ...
>     Generator!(T1) seq1;
>     Generator!(T2) seq2;
>     // ...
>     void run()
>     {
>         while( seq1.hasNext && seq2.hasNext )
>             yield(new Tuple!(T1,T2)(seq1.next,seq2.next));
>     }
>     // ...
> }

Or something to that effect :P

As for efficiency, I'm not sure how to go about testing that so that the results mean anything.  I'm sure my current implementation could be made faster, but I don't know how.

Just for loose comparison, the game EVE Online holds the record for the most people simultaneously online on a single server.  And its servers are built using coroutines running in *Python*.

I suppose the bottleneck (if it is one) is how fast you can do user-space context switches.  Which is basically a function call, moving the base of the stack, and switching over stuff like the exception handling registers.  Or somesuch... Mikola Lysenko wrote StackThreads; my attempt only mostly worked :P

	-- Daniel

-- 
Unlike Knuth, I have neither proven nor tried the above; it may not even make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
November 08, 2006
>interface Iterator(T) {
>    bool step();      // as C# MoveNext
>    T value();        // getter (rvalue access)
>    T value(T);       // setter (lvalue access)
>    int opApply(...); // support foreach style iteration
>}
I think this one is better:

interface Iterator(T) {
    bool opNext();     // as C# MoveNext.
                       // pointers could support this too, along with
                       // opBack() for stepping backwards.
    bool opNext(int);  // speed hack: advance several steps at once
    T opValue();       // getter (rvalue access); could work on pointers too
    T opValue(T);      // setter (lvalue access); could work on pointers too
    int opApply(...);  // support foreach style iteration
}

while (iter.next)      // calls opNext()
    writefln(*iter);   // calls opValue()
November 08, 2006
Oskar Linde wrote:
> As Sean hints at, "random access iterators" are not pure iterators. The random access part is orthogonal to the iterative part. In D, a perfect way to implement a random access view is defining a .length property and an opIndex method. A random access view doesn't need to be iterable and an iterator doesn't need to provide random access. A D slice is the perfect example of a random access view.

You two might be right.

> I see no reason to use operator overloading when the result isn't replaceable with a pointer anyway.

Right.

> Bill Baxter's points regarding iterator as a concept vs an interface/abstract class are very valid. And as Bill says, maybe both should be supported. One of them could always be converted to the other.

Yes.

> Regarding efficiency, here is one sample implementation that a compiler should(*) be able to make just as efficient as for/foreach loops:
> 
> struct Iterator(T) {
>     T* ptr;
>     T* end;
> 
>     bool step() { return ++ptr != end; }
>     T value() { return *ptr; }
>     T value(T x) { return *ptr = x; }
>     // mixin OpApplyMixin;
> }
> 
> Iterator!(T) iter(T)(T[] x) {
>     Iterator!(T) ret;// = {x.ptr-1, x.ptr+x.length}; // ICE
>     ret.ptr = x.ptr-1;
>     ret.end = x.ptr+x.length;
>     return ret;
> }
> 
> void main() {
>     int[] arr = [1,2,3,4,5];
> 
>     auto i = arr.iter();
> 
>     while(i.step) {
>         writefln("%s",i.value);
>         i.value = 5;
>     }
> }

It's good, but the pointer-before-the-start is problematic: it can cause trouble on some architectures. Pointing one past the end is explicitly supported; pointing before the beginning is not. I'd also prefer that step() get 'stuck' at the end instead of going past it if it gets called again (this makes sense for input iterators). So I suggest instead:

	for (auto i = arr.iter(); !i.atEnd(); i.step())
		...

in other words, separate out the 'end' test from the 'step' operation.
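Concretely, Oskar's struct might become something like this (a sketch: ptr starts at the first element rather than before it, and step() refuses to walk past end):

```d
import std.stdio;

struct Iterator(T)
{
    T* ptr;
    T* end;

    bool atEnd() { return ptr == end; }

    // Advances, but sticks at the end rather than stepping past it.
    void step() { if (ptr != end) ++ptr; }

    T value() { return *ptr; }
    T value(T x) { return *ptr = x; }
}

Iterator!(T) iter(T)(T[] x)
{
    Iterator!(T) ret;
    ret.ptr = x.ptr;             // first element, never before-the-start
    ret.end = x.ptr + x.length;
    return ret;
}

void main()
{
    int[] arr = [1, 2, 3, 4, 5];
    for (auto i = arr.iter(); !i.atEnd(); i.step())
        writefln("%s", i.value);   // 1 through 5, one per line
}
```

An empty array starts with ptr == end, so the loop body is simply never entered.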
November 08, 2006
Daniel Keep wrote:
> Things that need random access override opIndex and opIndexAssign.
> 
> Things which you can take a range of values from override opSlice and
> opSliceAssign.
> 
> Things that require lock-step iteration override opApply, or supply a
> function that returns a delegate that "runs" a foreach loop.
> 
> About the only thing this *doesn't* cover are bi-directional iterators
> (say, iterating back and forth over an infinite series).

What about containers where there are multiple valid orders of iteration? For a binary tree class, I might want to perform iteration using depth-first, breadth-first, in-order, or reverse-order traversals. How does the opApply/opApplyReverse solution address these needs?
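For concreteness, here is one way it could be done with opApply alone: the default order goes on the container, and each extra order is a named view struct with its own opApply (a hypothetical sketch, all names illustrative):

```d
import std.stdio;

class TreeNode
{
    int value;
    TreeNode left, right;

    this(int value, TreeNode left = null, TreeNode right = null)
    {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // Default iteration order: in-order.
    int opApply(int delegate(ref int) dg)
    {
        int result = 0;
        if (left !is null && (result = left.opApply(dg)) != 0) return result;
        if ((result = dg(value)) != 0) return result;
        if (right !is null) result = right.opApply(dg);
        return result;
    }

    // A second order exposed as a named view.
    BreadthFirst breadthFirst() { return BreadthFirst(this); }
}

struct BreadthFirst
{
    TreeNode root;

    int opApply(int delegate(ref int) dg)
    {
        TreeNode[] queue = [root];   // array doubles as the work queue
        while (queue.length)
        {
            auto node = queue[0];
            queue = queue[1 .. $];
            if (auto result = dg(node.value)) return result;
            if (node.left  !is null) queue ~= node.left;
            if (node.right !is null) queue ~= node.right;
        }
        return 0;
    }
}

void main()
{
    auto root = new TreeNode(2, new TreeNode(1), new TreeNode(3));
    foreach (v; root) writef("%s ", v);              // 1 2 3
    writefln("");
    foreach (v; root.breadthFirst) writef("%s ", v); // 2 1 3
    writefln("");
}
```

But that multiplies view types per container, which is part of why the question matters.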

And isn't a delegate call about as computationally expensive as a virtual method call (in the case of an Iterable interface)?

If the delegate solution is really better than an Iterable interface, why create special cases for opApply and opApplyReverse? Why not just say that foreach always requires an iteration delegate?

--benji
November 08, 2006
Daniel Keep wrote:
> 
> Walter Bright wrote:
>> Sean Kelly wrote:
>>> Since D has slicing, the argument for using iterators to define the
>>> boundaries of a range of randomly accessible elements seems kind of
>>> small to me.  ie.
>>>
>>>     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
>>> or
>>>     sort( a[0 .. 5] );
>>>
>>> I find the second to be cleaner to look at.  But I'm undecided whether
>>> we'd lose any power or flexibility by not bothering with random access
>>> iterators.
>> Hmm, instead of 'pointers as iterators', have 'arrays as iterators'.
>> That definitely sounds like a good area to explore.
> 
> Hang on; aren't we back to where we are *right now*?

Essentially.  But D has no formal description of an iterator type.  I think it would be a useful convention to establish, and doing so requires determining what types of iterators should be supported, and how.

> Things that need random access override opIndex and opIndexAssign.
> 
> Things which you can take a range of values from override opSlice and
> opSliceAssign.
> 
> Things that require lock-step iteration override opApply, or supply a
> function that returns a delegate that "runs" a foreach loop.
> 
> About the only thing this *doesn't* cover are bi-directional iterators
> (say, iterating back and forth over an infinite series).

This wouldn't be too difficult to do by adding a prev() method and atBegin() or some such.  Or maybe the Java style of hasNext(), next(), hasPrev(), prev().
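A hypothetical Java-flavoured shape for that (the interface and the array-backed implementation are illustrative only, not an existing D API):

```d
interface BidirectionalIterator(T)
{
    bool hasNext();
    T next();
    bool hasPrev();
    T prev();
}

// Minimal array-backed implementation for illustration.
class ArrayIterator(T) : BidirectionalIterator!(T)
{
    private T[] data;
    private size_t pos;   // index of the next element to yield

    this(T[] data) { this.data = data; }

    bool hasNext() { return pos < data.length; }
    T next()       { return data[pos++]; }
    bool hasPrev() { return pos > 0; }
    T prev()       { return data[--pos]; }
}

// auto it = new ArrayIterator!(int)([1, 2, 3]);
// it.next(); it.next(); it.prev();  // forward twice, back once
```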

> I know some people want to be able to stop iteration and resume it, but
> couldn't that be solved by making opApply return a generator like what
> Python does?  Those things are written the same way as a current opApply
> (sans all the return value from the delegate stuff), and can be advanced
> manually.
> 
> I know this can be done.  I wrote the coroutine module for StackThreads
> a while back, and one of the *very first* things I did was implement
> generic generators.  The base generator class essentially just
> implemented opApply switching over to the coroutine every time it needed
> the next value.  Here's a port of Python's range() function (well, the
> one and zero argument versions :P):
> 
>> class
>> range : Generator!(uint)
>> {
>>     // These just let you use range() or range(n)
>>     static
>>     range
>>     opCall()
>>     {
>>         return range(uint.max);
>>     }
>>
>>     static
>>     range
>>     opCall(uint limit)
>>     {
>>         return new range(limit);
>>     }
>>
>> protected:
>>     uint limit;
>>
>>     // Or you can use new range(n)
>>     this(uint limit)
>>     {
>>         this.limit = limit;
>>         super();
>>     }
>>
>>     // Where the magic happens.
>>     override
>>     uint
>>     run()
>>     {
>>         uint i = 0;
>>         while( i < limit )
>>             yield(i++);    // yield in D: *very* sexy
>>
>>         StopGenerator();   // throws an exception, imitating what
>>                            // Python does to stop an iterator.
>>     }
>> }
> 
> Now, from that, we can implement opApply AND next/current in the base
> class.  Going with coroutines kills both the opApply and forward
> iterator birds with one stone.

Interesting idea--I'll admit I don't have much experience with Python. But I think both approaches have value.  Using coroutines just to step across two sequences simultaneously seems unnecessarily complex, and not terribly efficient compared to a simple pointer increment.  But it does offer a tremendous amount of power for more complex cases.

> As for bi-directional iterators... I actually can't think of anywhere
> I'd use them.  I just keep thinking "I could just use random access to
> the same end".  Python's never suffered from not having them...

I don't think I've ever needed them either, but they can come in handy when implementing one container in terms of another.  Say an iterable stack and queue (unlikely, I know) that are implemented via a doubly linked list.


Sean
November 08, 2006
Daniel Keep wrote:
> 
> I suppose the bottleneck (if it is one) is how fast you can do
> user-space context switches.  Which is basically a function call, moving
> the base of the stack, and switching over stuff like the exception
> handling registers.  Or somesuch... Mikola Lysenko wrote StackThreads;
> my attempt only mostly worked :P

That's pretty much it.  Saving the current register data, loading the other register data, and jumping to a new instruction address.  There's no way this can be as efficient as an integer increment instruction (for loop iteration) but it's orders of magnitude faster than a kernel thread context switch.


Sean