Iterators for D (page 8)

November 08, 2006

Re: Iterators for D

Posted by Bill Baxter
in reply to Sean Kelly

Permalink

Bill Baxter

Posted in reply to Sean Kelly

Permalink

Sean Kelly wrote:
> Daniel Keep wrote:
> 
>>
>> Walter Bright wrote:
>>
>>> Sean Kelly wrote:
>>>
>>>> Since D has slicing, the argument for using iterators to define the
>>>> boundaries of a range of randomly accessible elements seems kind of
>>>> small to me.  ie.
>>>>
>>>>     sort( a.begin, a.begin + 5 ); // sort the first 5 elements
>>>> or
>>>>     sort( a[0 .. 5] );
>>>>
>>>> I find the second to be cleaner to look at.  But I'm undecided whether
>>>> we'd lose any power or flexibility by not bothering with random access
>>>> iterators.
>>>
>>> Hmm, instead of 'pointers as iterators', have 'arrays as iterators'.
>>> That definitely sounds like a good area to explore.
>>
>>
>> Hang on; aren't we back to where we are *right now*?
> 
> 
> Essentially.  But D has no formal description of an iterator type.  I think it would be a useful convention to establish, and doing so requires determining what types of iterators should be supported, and how.
> 
>> Things that need random access override opIndex and opIndexAssign.
>>
>> Things which you can take a range of values from override opSlice and
>> opSliceAssign.
>>
>> Things that require lock-step iteration override opApply, or supply a
>> function that returns a delegate that "runs" a foreach loop.
>>
>> About the only thing this *doesn't* cover are bi-directional iterators
>> (say, iterating back and forth over an infinite series).
> 
> 
> This wouldn't be too difficult to do by adding a prev() method and atBegin() or some such.  Or maybe the Java style of hasNext(), next(), hasPrev(), prev().
> 
>> I know some people want to be able to stop iteration and resume it, but
>> couldn't that be solved by making opApply return a generator like what
>> Python does?  Those things are written the same way as a current opApply
>> (sans all the return value from the delegate stuff), and can be advanced
>> manually.
>>
>> I know this can be done.  I wrote the coroutine module for StackThreads
>> a while back, and one of the *very first* things I did was implement
>> generic generators.  The base generator class essentially just
>> implemented opApply switching over to the coroutine every time it needed
>> the next value.  Here's a port of Python's range() function (well, the
>> one and zero argument versions :P):
>>
>>> class
>>> range : Generator!(uint)
>>> {
>>>     // These just let you use range() or range(n)
>>>     static
>>>     range
>>>     opCall()
>>>     {
>>>         return range(uint.max);
>>>     }
>>>
>>>     static
>>>     range
>>>     opCall(uint limit)
>>>     {
>>>         return new range(limit);
>>>     }
>>>
>>> protected:
>>>     uint limit;
>>>
>>>     // Or you can use new range(n)
>>>     this(uint limit)
>>>     {
>>>         this.limit = limit;
>>>         super();
>>>     }
>>>
>>>     // Where the magic happens.
>>>     override
>>>     uint
>>>     run()
>>>     {
>>>         uint i = 0;
>>>         while( i < limit )
>>>             yield(i++);    // yield in D: *very* sexy
>>>
>>>         StopGenerator();   // throws an exception, imitating what
>>>                            // Python does to stop an iterator.
>>>     }
>>> }
>>
>>
>> Now, from that, we can implement opApply AND next/current in the base
>> class.  Going with coroutines kills both the opApply and forward
>> iterator birds with one stone.
> 
> 
> Interesting idea--I'll admit I don't have much experience with Python. But I think both approaches have value.  Using coroutines just to step across two sequences simultaneously seems unnecessarily complex, and not terribly efficient compared to a simple pointer increment.  But it does offer a tremendous amount of power for more complex cases.
> 
>> As for bi-directional iterators... I actually can't think of anywhere
>> I'd use them.  I just keep thinking "I could just use random access to
>> the same end".  Python's never suffered from not having them...
> 
> 
> I don't think I've ever needed them either, but they can come in handy when implementing one container in terms of another.  Say an iterable stack and queue (unlikely, I know) that are implemented via a doubly linked-list.

I wrote some mesh (graph) manipulation algorithms in Python.  I really needed a linked list for that and a way to point to e.g. an edge in a list of edges coming out of a vertex that wouldn't be invalidated by inserting edges here and there before or after it, and I needed both to find the previous edge and the following edge.  In C++ std::list would have been the clear choice.

Doing that with indexes into to a standard python list just isn't practical (I started out trying it that way because Python FAQ's seem to say "you don't need a linked list, just use a python list".  But with a python list, every time you insert one new item you have to go find everyone out there that's holding onto an index into this list and get them to update their values.  Just not practical.

So sometimes bidirectional iterators are needed.  Even in Python.

--bb

Benji Smith wrote: > Daniel Keep wrote: > >> Things that need random access override opIndex and opIndexAssign. >> >> Things which you can take a range of values from override opSlice and >> opSliceAssign. >> >> Things that require lock-step iteration override opApply, or supply a >> function that returns a delegate that "runs" a foreach loop. >> >> About the only thing this *doesn't* cover are bi-directional iterators >> (say, iterating back and forth over an infinite series). > > > What about containers where there are multiple valid orders of iteration? For a binary tree class, I might want to perform iteration using depth-first, breadth-first, in-order, or reverse-order traversals. How does the opApply/opApplyReverse solution address these needs? > > And isn't a delegate-call about as computationally expensive as a virtual method call (in the case of an Iterable iterface)? It should be roughly equivelant in cost to explicitly calling the method, meaning if the method is virtual then you will get that cost, and if it is final then it is like any function call. Since Iterable methods will very likely almost always be virtual, there is at least a chance of delegates being a little cheaper /some of the time/ -- when they can be pointers to final methods. > If the delegate solution is really better than an Iterable interface, why create special cases for opApply and opApplyReverse? Why not just say that foreach always requires an iteration delegate? > > --benji Well, when foreach(;) was first added to D, I don't think anyone had thought about delegates-as-aggregates yet. In fact, it was just added a couple of versions back. So I suppose in a sense, opApply*Reverse is just legacy -- though it makes it straight-forward to iterate simple, linear-only collections. I suppose it would be safe enough to do away with opApply*Reverse now, but I don't think it hurts anything to leave them be, either. -- Chris Nicholson-Sauls

I don't like C++ iterators; they're a very old design made to fit a language that was unmodifiable. They can't work with destructive or unbound iterators (well, they can, but you just gotta hope to hell the user doesn't use the interface in an unusual way), they can't work with associative arrays or tree structures without allocation (or wasting space in the structure for parent node pointers), and they're VERY verbose and distributed. Plus they allow iterating to an illegal index (the end index), which requires an additional state value for some iterables. Python's generators are a far better solution, like: void i_range (T) (T from, T to, T step = cast (T) 1, yield T index) { for (index = from; index < to; index += step) yield; } foreach (i; i_range (0, 100)) ... Or: yield T i_range (T) (T from, T to, T step = cast (T) 1) { T index; for (index = from; index < to; index += step) yield index; } You've been meaning to allow frame pointers for proper closures anyway, right? :P The implementation in iterators using the simplest interface I can think of would be something like: struct IRange (T, T from, T to, T step = cast (T) 1) { T index = from; static IRange opCall (T value) { IRange result; result.index = value; return result; } T opThis () { return index; } IRange opNext () { assert (index <= end); return opCall (index + 1); } IRange opEnd () { return opCall (to); } } foreach (i; IRange! (int, 0, 100)) ... Notice how the SINGLE line indicating how the arguments are interpreted are hidden within a whole bunch of boilerplate.

Walter Bright wrote: >> >> struct Iterator(T) { >> T* ptr; >> T* end; >> >> bool step() { return ++ptr != end; } >> T value() { return *ptr; } >> T value(T x) { return *ptr = x; } >> // mixin OpApplyMixin; >> } >> >> Iterator!(T) iter(T)(T[] x) { >> Iterator!(T) ret;// = {x.ptr-1, x.ptr+x.length}; // ICE >> ret.ptr = x.ptr-1; >> ret.end = x.ptr+x.length; >> return ret; >> } >> >> void main() { >> int[] arr = [1,2,3,4,5]; >> >> auto i = arr.iter(); >> >> while(i.step) { >> writefln("%s",i.value); >> i.value = 5; >> } >> } > > It's good, but the pointer-before-the-start is problematic. This can cause problems on some architectures. Pointing past the end is explicitly supported, but before the beginning is not. I'd also prefer the step() get 'stuck' at the end, instead of going past it if it gets called again (this makes sense for input iterators). So I suggest instead: > > for (auto i = arr.iter(); !i.atEnd(); i.step()) > ... > > in other words, separate out the 'end' test from the 'step' operation. For arrays, we already have arr.ptr, what if array.end was simply added? auto slice = array[50..100]; for(auto i = slice.ptr; i != slice.end; i++) { ... } Then for UDT custom iterators, ptr, end and op[Pre|Post][Inc|Dec] could be overloaded? Just a thought..

Forums