View mode: basic / threaded / horizontal-split · Log in · Help
November 03, 2006
Re: foreach vs. iterators
Chris Nicholson-Sauls wrote:
> Bill Baxter wrote:
> 
>> Sean Kelly wrote:
>>
>>> Walter Bright wrote:
>>>
>>>> Sean Kelly wrote:
>>>>
>>>>> I think I agree, but for the sake of argument, how would D handle 
>>>>> 'find'  operations on sequences that don't support opIndex?  At 
>>>>> some point, a generalizable bookmark is almost necessary for a 
>>>>> faithful implementation of some C++ algorithms.  Also, many of 
>>>>> these algorithms also require iteration across two sequences 
>>>>> simultaneously, which doesn't map very cleanly into foreach.  
>>>>> Consider a substring-style find operation (std::search), for 
>>>>> example, where both the pattern and target types do not support 
>>>>> opIndex or opSlice.  I might argue that it's a bit silly or 
>>>>> unrealistic to desire such an operation, but the C++ algorithm 
>>>>> library does support it.
>>>>
>>>>
>>>>
>>>>
>>>> I think the target types will have to support opIndex or opSlice.
>>>
>>>
>>>
>>>
>>> Forgive me for resurrecting a horribly battered equine, but I was 
>>> thinking about this a bit tonight and I've decided that foreach, 
>>> foreach_reverse, and array operations are not a sufficient general 
>>> substitute for C++ iterators.
>>>
>>> For example, let's say I want to compute the set union of two sorted 
>>> ranges, one of size M, the other of size N.  By iterating across the 
>>> two ranges simultaneously is is easy to see that the complexity for 
>>> the operation will be max(M,N).  In C++ this is easily accomplished 
>>> using forward iterators, and the ranges can be anything from arrays 
>>> to SQL result sets.
>>>
>>> In D however, there is no way to iterate across multiple ranges 
>>> simultaneously using foreach, so opIndex must be used instead.  With 
>>> containers that have a constant-time lookup cost this isn't a big 
>>> deal, but what if these containers are binary trees?  The complexity 
>>> for such an operation would be M(log(M)) + N(log(N)).
>>>
>>>  From this it seems clear that foreach is not sufficiently adaptable 
>>> to meet the needs of all sequential algorithms and opIndex is not a 
>>> reasonable substitute for all cases where foreach may not be used.  
>>> What alternatives do we have?  So far, iterator variants are all I've 
>>> been able to come up with.
>>
>>
>>
>> It would be sufficient if D had coroutines and opApply were 
>> implemented using them.
>>
>> The problem is that once you call an opApply function right now, it 
>> has to run to completion.  The only way I can see to have multiple 
>> opApply style iterators going simultaneously if you could truly 
>> suspend execution of one opApply temporarily and run another one for 
>> an iteration.
>>
>> In other words, if you're going to use the stack and function state to 
>> store iterator state, then the only way to have multiple iterators in 
>> different states simultaneously is to be able to have multiple stacks 
>> active simultaneously, aka coroutines.
>>
>> --bb
> 
> 
> Just a thought I had.  Would it be possible to emulate this behavior 
> with delegate-as-aggregate?  What I'm thinking of (and bear in mind I 
> just rolled out of bed, so I'm at quarter-speed right now ;)) is pass 
> foreach a "driver" delegate that then calls step iterator delegates 
> you've provided.  The driver and step iterators would be responsible for 
> maintaining state, and would probably need a reset case (at least the 
> iterators).
> 
> -- Chris Nicholson-Sauls

I was tinkering around with that.  But the problem is that any time you 
call an opApply you end up inside a function that basically does 
something like:

     for (i=0; i<length; i++) {
        ret = loop_body_dg(my_ith_item[i]);
     }

There's no way I can see to stop that juggernaut before it's done, short 
of having it return, and if it returns then it'll lose its state. 
Conceptually what you need is a 'yield' that keeps the current stack 
frame alive, but returns to the caller.  Then you could implement 
something like a zip() who's opApply calls the opApply's of each of its 
arguments, which in turn just run one iteration and each yield one value 
back to the zip().  Zip then packages up the results from one iteration 
of each of its args into a tuple and yields that back to the foreach block.

--bb
November 03, 2006
Re: foreach vs. iterators
Sean Kelly wrote:

>  From this it seems clear that foreach is not sufficiently adaptable to 
> meet the needs of all sequential algorithms and opIndex is not a 
> reasonable substitute for all cases where foreach may not be used.  What 
> alternatives do we have?  So far, iterator variants are all I've been 
> able to come up with.

I agree, and I have been saying this countless times. :)

Why don't we come up with a standard iterator concept for D?
For a forward iterator, C++ has:

*i
i++ / ++i
i != end

In the post "Iterators (Was: Re: Lazy eval -- an example issue)" in 
digitalmars.D ( 
news://news.digitalmars.com:119/ecmvec$ksl$1@digitaldaemon.com ) I 
suggested a simple java-style variant:

*i.next()
i.hasNext()

A generic opApply could then be implemented as:

template opApplyMixin() {
  int opApply(int delegate(inout ValueType v) dg) {
    ValueType *t;
    while(hasNext()) {
      t = next();
      if (auto status = dg(*t))
        return status;
    }
    return 0;
  }
}

And a simple array iterator wrapper (Could of course be more efficiently 
implemented):

struct ArrayForwardIterator(T) {
  T[] arr;
  alias T ValueType;

  T* next() { arr = arr[1..$]; return arr.ptr-1; }
  bool hasNext() { return arr.length > 0; }
  mixin opApplyMixin;
}

If pointers should be avoided, and since D doesn't have an opDeref, but 
has lhs/rhs overloads for indexing, another iterator variant could be to use

i[]
i[]=

for rhs, and lhs iterator dereferencing.

i++ / ++i could be used instead of .next() if it is any clearer, and so 
on...

It doesn't really matter what convention we pick. For better rather than 
worse, D doesn't allow us to do as C++ does, make iterators 
interchangeable with pointers. We may as well pick anything we like as 
long as the compiler is able to make reasonable efficient code out of it.

As I said in the above referenced post, I have been toying with making a 
template iterator/array algorithm library. I've got a quite substantial 
proof of concept working, but have not had much time lately. I do love 
discussions about this though. :)

/Oskar
November 04, 2006
Re: foreach vs. iterators
Walter Bright wrote:
> Sean Kelly wrote:
> 
>> Forgive me for resurrecting a horribly battered equine, but I was 
>> thinking about this a bit tonight and I've decided that foreach, 
>> foreach_reverse, and array operations are not a sufficient general 
>> substitute for C++ iterators.
>>
>> For example, let's say I want to compute the set union of two sorted 
>> ranges, one of size M, the other of size N.  By iterating across the 
>> two ranges simultaneously is is easy to see that the complexity for 
>> the operation will be max(M,N).  In C++ this is easily accomplished 
>> using forward iterators, and the ranges can be anything from arrays to 
>> SQL result sets.
>>
>> In D however, there is no way to iterate across multiple ranges 
>> simultaneously using foreach, so opIndex must be used instead.  With 
>> containers that have a constant-time lookup cost this isn't a big 
>> deal, but what if these containers are binary trees?  The complexity 
>> for such an operation would be M(log(M)) + N(log(N)).
>>
>>  From this it seems clear that foreach is not sufficiently adaptable 
>> to meet the needs of all sequential algorithms and opIndex is not a 
>> reasonable substitute for all cases where foreach may not be used.  
>> What alternatives do we have?  So far, iterator variants are all I've 
>> been able to come up with.
> 
> You are correct in regards to looping across multiple aggregates 
> simultaneously.
> 
> On the other hand, C++ iterators aren't a general substitute for 
> opApply. For example, it is not at all easy to turn a recursive data 
> structure (i.e. binary tree) into a linearized iterator.

If I'm not missing the essence here, there are several kinds of 
iterators, and not every data structure should try to provide all of 
these. For example, an iterator into a tree structure might be 
breadth-first or depth-first, but one wouldn't expect a linearized (if I 
understand the term in this context at all) iterator to exist. 
Conversely, an array is hardly thinkable without a linear iterator.

What Stepanov explained to me at length (back when I was studying CS and 
there was not a single book published on the STL) was that the different 
containers all have their specific sets of "natural" iterators, and that 
only where it is convenient or at all implementable, other iterators may 
exist to them. And that it's not like you have a container and then need 
a certain iterator to it, but that you choose the container based on 
what kind of iterators you need in the first place.

Then it's about the iterator just being a way to reach every item 
exactly once. And if you don't seem to get them in the order you want, 
then you sort the items, say into a list. (Or rethink the problem.)

Back to foreach: it's syntactically a very convenient way to go through 
a trivial collection (i.e. an array or a list). But for non-trivial 
containers (or data structures), foreach simply is insufficient. You 
need an iterator. Iterators were invented for the very purpose of 
walking through a black-box container, if you will. I'd almost say that 
container-specific iterators are the only way to guarantee the lowest 
complexity walk-through of an arbitrary black-box container.

That said, it is of course customary to provide "wrong" iterators to 
containers (where at all possible, as conveniences), for example a 
random access iterator to a linked list. But then the user should stay 
aware that using such is only for the cases where the choice of 
container has already been evaluated to be optimal, and where one only 
occasionally needs to access some of the items in a specific way. In 
other words, where this need is seldom enough to not warrant a temporary 
copy (or linking) of the items into a more suitable container.

One could of course have foreach actually employ iterators. This gives 
the ease of coding with foreach combined with the flexiblity, decoupling 
and orthogonality offered by iterators.

But, as in the previous post, even this becomes cumbersome when we're 
iterating over more than one container simultaneously, especially when 
they're of different sizes or iterating in non-lockstep.

---

In general, the essence and elegance of the STL has not been too 
prevalent in D libraries or algorithms. With the current expressive 
power of D, one might expect to soon start seeing some of this.
November 04, 2006
Re: foreach vs. iterators
Bill Baxter wrote:
> There's no way I can see to stop that juggernaut before it's done, short 
> of having it return, and if it returns then it'll lose its state. 
> Conceptually what you need is a 'yield' that keeps the current stack 
> frame alive, but returns to the caller.

But then your object can only have one iterator at a time, and what if 
you want to have multiple simultaneous iterators for any given object? 
The 'yield' keyword can only keep track of state for a single iterator, 
so this kind of code would fail:

  foreach (Item outerItem; collection) {
    foreach (Item innerItem; collection) {
      doSomething(outerItem, innerItem);
    }
  }

This should be semantically identical to:

  for (int i = 0; i < collection.length; i++) {
    Item outerItem = collection[i];
    for (int j = 0; j < collection.length; j++) {
      Item innerItem = collection[j];
      doSomething(outerItem, innerItem);
    }
  }

But in the 'foreach' example, the 'yield' implementation would get 
confused (as would the opApply method).

If iteration was implemented with iterators (instead of with operator 
overloading and delegates), and if iterators were implemented as nested 
classes within the container class definitions, then iterators could 
access the elements directly, while still keeping their own state 
information. (Such as a pointer to the next iterable tree-node, in a 
InOrderTraversalIterator. Or whatever.)

For example:

class ArrayList {
  private Item[] items;
  private Iterable getForwardIterator() { ... }
  class ForwardIterator : Iterable {
    private Item* nextItem; // Keeps track of state
    public bool hasNext() { ... }
    public Item next() { ... }
  }
}

And then I'd iterate through my items like this:

  foreach (Item outerItem; collection.getForwardIterator()) {
    foreach (Item innerItem; collection.getForwardIterator()) {
      doSomething(outerItem, innerItem);
    }
  }

In most cases, if your iterator can keep track of its own state, and if 
the iterator is a nested class with access to all of its containing 
class's members, then the 'yield' keyword is unnecessary.

(The most compelling argument for 'yield' is not for the implementation 
of iterators, but for building generators.)

But please, please reconsider the foreach * foreach_reverse 
implementations. Forcing a collection class to be aware of its own 
iterables is fugly as can be.

--benji
November 04, 2006
Re: foreach vs. iterators
Benji Smith wrote:
> Bill Baxter wrote:
> 
>> There's no way I can see to stop that juggernaut before it's done, 
>> short of having it return, and if it returns then it'll lose its 
>> state. Conceptually what you need is a 'yield' that keeps the current 
>> stack frame alive, but returns to the caller.
> 
> 
> But then your object can only have one iterator at a time, and what if 
> you want to have multiple simultaneous iterators for any given object? 
> The 'yield' keyword can only keep track of state for a single iterator, 
> so this kind of code would fail:
> 
>   foreach (Item outerItem; collection) {
>     foreach (Item innerItem; collection) {
>       doSomething(outerItem, innerItem);
>     }
>   }

No it wouldn't.  Not if you had support for real coroutines.  That's the 
whole point of coroutines.  You can have multiple stack frames active at 
the same time.   Each stack frame for the opApply's in the above case 
would have its own local iterator state.

Now whether or not coroutines are realistically implementable in a 
systems programming language like D is beyond me.  Stackless Python 
folks did it, but perhaps they're taking advantage of the fact that 
users of interpreted code are a bit more forgiving about runtime 
overhead, and the fact that the interpreter has its own notion of the 
stack that doesn't depend on low-level processor details like EAX 
registers etc.  But technically I don't see any roadblocks.  I mean I 
read somewhere that coroutines are an old assembly programming trick, so 
the hardware is certainly capable of supporting the paradigm at the low 
level.  It's basically just a jmp plus a manipulation of the stack 
pointer when it comes right down to it.

--bb
November 04, 2006
Re: foreach vs. iterators
Georg Wrede wrote:
> Walter Bright wrote:
>> On the other hand, C++ iterators aren't a general substitute for 
>> opApply. For example, it is not at all easy to turn a recursive data 
>> structure (i.e. binary tree) into a linearized iterator.
> 
> If I'm not missing the essence here, there are several kinds of 
> iterators, and not every data structure should try to provide all of 
> these. For example, an iterator into a tree structure might be 
> breadth-first or depth-first, but one wouldn't expect a linearized (if I 
> understand the term in this context at all) iterator to exist. 
> Conversely, an array is hardly thinkable without a linear iterator.

Given the structure:

class Node
{	Node left;
	Node right;
	...data...
}

how do you iterate that with an iterator? This is what I mean by 
'linearizing' it. The STL sidesteps the problem by avoiding providing 
such a container.
November 04, 2006
Re: foreach vs. iterators
Oskar Linde wrote:
> Sean Kelly wrote:
> 
>>  From this it seems clear that foreach is not sufficiently adaptable 
>> to meet the needs of all sequential algorithms and opIndex is not a 
>> reasonable substitute for all cases where foreach may not be used.  
>> What alternatives do we have?  So far, iterator variants are all I've 
>> been able to come up with.
> 
> 
> I agree, and I have been saying this countless times. :)
> 
> Why don't we come up with a standard iterator concept for D?
> For a forward iterator, C++ has:
> 
> *i
> i++ / ++i
> i != end

One big concern I have with the current iterators is that the approach 
supported for the general case is deemed too inefficient for built-in 
arrays.  So arrays are special-cased by the compiler in the foreach 
logic to become more like a simple for loop.  But I want that speed for 
foreach-ing my own classes too!  Why should I be satisfied with less 
that peak performance for *my* classes if it's not considered good 
enough for built-in classes?

I have to pay this penalty even if my class is a simple wrapper around 
an array, that say does nothing more than add an opAdd so I can add two 
arrays together.


Plain old C++-style iterators don't have that problem.  For simple cases 
iter++  bascially compiles down to be equivalent to pointer manipulation.

--bb
November 04, 2006
Re: foreach vs. iterators
Bill Baxter wrote:
> Benji Smith wrote:
>> Bill Baxter wrote:
>>
>>> There's no way I can see to stop that juggernaut before it's done, 
>>> short of having it return, and if it returns then it'll lose its 
>>> state. Conceptually what you need is a 'yield' that keeps the current 
>>> stack frame alive, but returns to the caller.
>>
>>
>> But then your object can only have one iterator at a time, and what if 
>> you want to have multiple simultaneous iterators for any given object? 
>> The 'yield' keyword can only keep track of state for a single 
>> iterator, so this kind of code would fail:
>>
>>   foreach (Item outerItem; collection) {
>>     foreach (Item innerItem; collection) {
>>       doSomething(outerItem, innerItem);
>>     }
>>   }
> 
> No it wouldn't.  Not if you had support for real coroutines.  That's the 
> whole point of coroutines.  You can have multiple stack frames active at 
> the same time.   Each stack frame for the opApply's in the above case 
> would have its own local iterator state.
> 
> Now whether or not coroutines are realistically implementable in a 
> systems programming language like D is beyond me.  Stackless Python 
> folks did it, but perhaps they're taking advantage of the fact that 
> users of interpreted code are a bit more forgiving about runtime 
> overhead, and the fact that the interpreter has its own notion of the 
> stack that doesn't depend on low-level processor details like EAX 
> registers etc.  But technically I don't see any roadblocks.  I mean I 
> read somewhere that coroutines are an old assembly programming trick, so 
> the hardware is certainly capable of supporting the paradigm at the low 
> level.  It's basically just a jmp plus a manipulation of the stack 
> pointer when it comes right down to it.
> 
> --bb

I would direct your attention to StackThreads:

http://assertfalse.com/articles/stFAQ.shtml

Pyd uses this nifty package to wrap a class's opApply function with 
Python's iteration interface. (Which uses iterator objects with a 
.next() method.) The magic may be found in this bit of code:

http://www.dsource.org/projects/pyd/browser/trunk/infrastructure/pyd/iteration.d

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org
November 04, 2006
Re: foreach vs. iterators
Bill Baxter wrote:
> Now whether or not coroutines are realistically implementable in a 
> systems programming language like D is beyond me.  Stackless Python 
> folks did it, but perhaps they're taking advantage of the fact that 
> users of interpreted code are a bit more forgiving about runtime 
> overhead, and the fact that the interpreter has its own notion of the 
> stack that doesn't depend on low-level processor details like EAX 
> registers etc.  But technically I don't see any roadblocks.  I mean I 
> read somewhere that coroutines are an old assembly programming trick, so 
> the hardware is certainly capable of supporting the paradigm at the low 
> level.  It's basically just a jmp plus a manipulation of the stack 
> pointer when it comes right down to it.

It's not so easy to implement in the general case (asm code takes 
advantage of special, easy cases). To make it work, you need to allocate 
the stack frame of the coroutine on the heap. Not impossible, but a lot 
of work involved.
November 04, 2006
Re: foreach vs. iterators
You're right about that being a problem with opApply.
11 12 13 14 15 16 17
Top | Discussion index | About this forum | D home