November 04, 2006
Walter Bright wrote:

> You're right about that being a problem with opApply.

Does it have to be?

If the body of the opApply is available to the compiler and the delegate parameter is a compile time constant (as it is in the normal foreach case), does anything prevent the compiler from inlining the opApply together with the delegate body?

Would there still be a speed hit if this was done?

I have many cases that would gain a lot if the compiler inlined functions taking compile time known delegates together with the delegate call. I hope there is nothing fundamental preventing this from happening.

/Oskar

November 04, 2006
Oskar Linde wrote:
> Walter Bright wrote:
> 
>> You're right about that being a problem with opApply.
> 
> Does it have to be?
> 
> If the body of the opApply is available to the compiler and the delegate
> parameter is a compile time constant (as it is in the normal foreach case),
> does anything prevent the compiler from inlining the opApply together with
> the delegate body?
> 
> Would there still be a speed hit if this was done?
> 
> I have many cases that would gain a lot if the compiler inlined functions
> taking compile time known delegates together with the delegate call. I hope
> there is nothing fundamental preventing this from happening.

The compiler doesn't inline functions that have loops in them. This isn't a fault in D, but a structural limitation in the way the dmd compiler works.

I find opApply particularly slick for things like recursive directory traversals, associative array walking, etc. But for ordinary linear collection types, it is hard to generate code as good from it as pointer stepping.
November 04, 2006
Georg Wrede wrote:
> 
> Back to foreach: it's syntactically a very convenient way to go through a trivial collection (i.e. an array or a list). But for non-trivial containers (or data structures), foreach simply is insufficient. You need an iterator.

Assuming you simply want to visit every element once then I think foreach is fine.  In fact, it's generally simpler than implementing an iterator, given an arbitrary container type.  Where foreach falls down is if you want to stop after N iterations and resume later.  It's possible, but doing so resembles implementing random access iterators for a linked list.

> Iterators were invented for the very purpose of
> walking through a black-box container, if you will. I'd almost say that container-specific iterators are the only way to guarantee the lowest complexity walk-through of an arbitrary black-box container.

I think iterators are intended to represent an arbitrary position in a sequence and present a means for visiting the remaining elements.  It's the "arbitrary position" part that foreach can't do.

> One could of course have foreach actually employ iterators. This gives the ease of coding with foreach combined with the flexiblity, decoupling and orthogonality offered by iterators.

Yup, and this is what C++ does with its for_each, transform, and other algorithms.

> But, as in the previous post, even this becomes cumbersome when we're iterating over more than one container simultaneously, especially when they're of different sizes or iterating in non-lockstep.

It's not terribly difficult--just a for loop with two or more iterators and conditions--but I would like to believe that this can be simplified.

> In general, the essence and elegance of the STL has not been too prevalent in D libraries or algorithms. With the current expressive power of D, one might expect to soon start seeing some of this.

I agree.  Currently, I've implemented basically all of the C++ algorithm library plus a few additional functions, but only for arrays.  Without iterators, I simply don't see the point in extending many of the algorithms to support other container types.  opIndex simply doesn't apply well to most other container types.


Sean
November 04, 2006
Walter Bright wrote:
> Georg Wrede wrote:
>> Walter Bright wrote:
>>> On the other hand, C++ iterators aren't a general substitute for opApply. For example, it is not at all easy to turn a recursive data structure (i.e. binary tree) into a linearized iterator.
>>
>> If I'm not missing the essence here, there are several kinds of iterators, and not every data structure should try to provide all of these. For example, an iterator into a tree structure might be breadth-first or depth-first, but one wouldn't expect a linearized (if I understand the term in this context at all) iterator to exist. Conversely, an array is hardly thinkable without a linear iterator.
> 
> Given the structure:
> 
> class Node
> {    Node left;
>     Node right;
>     ...data...
> }
> 
> how do you iterate that with an iterator? This is what I mean by 'linearizing' it. The STL sidesteps the problem by avoiding providing such a container.

The problem isn't so much "how" (it's really not terribly difficult) as that the memory requirements and copy semantics for such an iterator could be quite high, and the STL assumes iterators are comparable to pointers in terms of how they may be manipulated.


Sean
November 05, 2006
Oskar Linde wrote:
> Sean Kelly wrote:
> 
>>  From this it seems clear that foreach is not sufficiently adaptable to meet the needs of all sequential algorithms and opIndex is not a reasonable substitute for all cases where foreach may not be used.  What alternatives do we have?  So far, iterator variants are all I've been able to come up with.
> 
> I agree, and I have been saying this countless times. :)
> 
> Why don't we come up with a standard iterator concept for D?
> For a forward iterator, C++ has:
> 
> *i
> i++ / ++i
> i != end
> 
> In the post "Iterators (Was: Re: Lazy eval -- an example issue)" in digitalmars.D ( news://news.digitalmars.com:119/ecmvec$ksl$1@digitaldaemon.com ) I suggested a simple java-style variant:
> 
> *i.next()
> i.hasNext()

I think a Java style iterator is the way to go for D.  Personally, I basically never find a need for bidirectional iterators, and I only use random access iterators with vectors, which correspond to D's built-in dynamic arrays.  Also, a single iterator that knows when it's reached the end of the sequence better suits being used with foreach than the pointer-style iterators of C++.  About the only thing I don't like about Java iterators is how they work ;-)  Only being able to access the referenced data by calling next() is a bit irritating.  That said, it does eliminate the need for iterators to potentially cache the current value when iterating across sequences that don't inherently store data, such as an input stream.  Still, I think it's useful to have in an iterator, so perhaps something roughly like this:

interface Iterator(T)
{
    T val();
    T next();
    bool hasNext();
}

interface WriteIterator(T) : Iterator(T)
{
    T val( T n );
}

> A generic opApply could then be implemented as:
> 
> template opApplyMixin() {
>   int opApply(int delegate(inout ValueType v) dg) {
>     ValueType *t;
>     while(hasNext()) {
>       t = next();
>       if (auto status = dg(*t))
>         return status;
>     }
>     return 0;
>   }
> }
> 
> And a simple array iterator wrapper (Could of course be more efficiently implemented):
> 
> struct ArrayForwardIterator(T) {
>   T[] arr;
>   alias T ValueType;
> 
>   T* next() { arr = arr[1..$]; return arr.ptr-1; }
>   bool hasNext() { return arr.length > 0; }
>   mixin opApplyMixin;
> }

Looks good.

> If pointers should be avoided, and since D doesn't have an opDeref, but has lhs/rhs overloads for indexing, another iterator variant could be to use
> 
> i[]
> i[]=
> 
> for rhs, and lhs iterator dereferencing.

I prefer the 'val' approach.  It seems a bit more meaningful than something with operators.

> i++ / ++i could be used instead of .next() if it is any clearer, and so on...

I'm not sure how I feel about using structs for iterators (instead of classes).  With classes, we can retain some semblance of control over copy semantics, but we can't with structs.  And I'm not entirely sure I think we should be able to copy iterators at will--consider Walter's case of an iterator for a traditional binary tree, for example.  Also, using structs forces user code to either be hard-coded to work with a specific iterator type or to use templates.  And I'll admit I do sort of like the idea of:

    void myFunction( Iterator!(int) x ) {}

being possible.

> It doesn't really matter what convention we pick. For better rather than worse, D doesn't allow us to do as C++ does, make iterators interchangeable with pointers. We may as well pick anything we like as long as the compiler is able to make reasonable efficient code out of it.

Agreed.

> As I said in the above referenced post, I have been toying with making a template iterator/array algorithm library. I've got a quite substantial proof of concept working, but have not had much time lately. I do love discussions about this though. :)

I've been focusing on arrays thus far, but am getting to the point where I'd like to extend some algorithms to work for user-defined containers as well.  And that obviously requires a decent convention for how iterators should be implemented.  Also, I'll admit that I really enjoy algorithm talk as well :-)


Sean
November 05, 2006
Sean Kelly wrote:
> Still, I think it's useful to have in an iterator, so perhaps something roughly like this:
> 
> interface Iterator(T)
> {
>     T val();
>     T next();
>     bool hasNext();
> }

Err... this doesn't work for an iterator returned for an empty container.  Perhaps:

interface Iterator(T)
{
    T val();
    T next();
    bool atEnd();
}

Anyway, you get the idea.


Sean
November 05, 2006
Walter Bright wrote:
> Georg Wrede wrote:
> 
>> Walter Bright wrote:
>>
>>> On the other hand, C++ iterators aren't a general substitute for opApply. For example, it is not at all easy to turn a recursive data structure (i.e. binary tree) into a linearized iterator.
>>
>>
>> If I'm not missing the essence here, there are several kinds of iterators, and not every data structure should try to provide all of these. For example, an iterator into a tree structure might be breadth-first or depth-first, but one wouldn't expect a linearized (if I understand the term in this context at all) iterator to exist. Conversely, an array is hardly thinkable without a linear iterator.
> 
> 
> Given the structure:
> 
> class Node
> {    Node left;
>     Node right;
>     ...data...
> }
> 
> how do you iterate that with an iterator? This is what I mean by 'linearizing' it. The STL sidesteps the problem by avoiding providing such a container.

You're right. This pointer-to-array semantics kept the iterators primitive, I'd almost forgotten how primitive.

So, to traverse the example, one obviously needs an iterator with more state than just the current node. In practice this state would be stored in a private, persistent LIFO.

November 05, 2006
Sean Kelly wrote:
> Sean Kelly wrote:
> 
>> Still, I think it's useful to have in an iterator, so perhaps something roughly like this:
>>
>> interface Iterator(T)
>> {
>>     T val();
>>     T next();
>>     bool hasNext();
>> }
> 
> 
> Err... this doesn't work for an iterator returned for an empty container.  

It's fine if you define the behavior to be that you have to call next() once to get the first value.  So for iteration idiom becomes:

T v;
while(c.hasNext()) )
{
}
November 05, 2006
Bill Baxter wrote:
> Sean Kelly wrote:
> 
>> Sean Kelly wrote:
>>
>>> Still, I think it's useful to have in an iterator, so perhaps something roughly like this:
>>>
>>> interface Iterator(T)
>>> {
>>>     T val();
>>>     T next();
>>>     bool hasNext();
>>> }
>>
>>
>>
>> Err... this doesn't work for an iterator returned for an empty container.  
> 
> 
> It's fine if you define the behavior to be that you have to call next() once to get the first value.  So for iteration idiom becomes:
> 
> T v;
> while(c.hasNext()) )
> {
> }

...another victem of Ctrl+Enter....

while(c.hasNext())
{
   T v = c.next();
}

I think an alias for T also needs to be a part of the definition of the iterator.  Like  alias T value_type.

Also would do you do a mutating iteration over collections of basic value types (eg floats), structs, and classes?  I.e. if you want to do something like:

while(c.hasNext())
{
   T v = c.next();
   v += 2;
}
where T could be float, struct, or class.  Ok, class case is fine, but the other two aren't clear to me.  Mutable iterator over structs could maybe return a pointer to each struct.  Basic value type, I'm not sure.  Just woke up though... maybe it's obvious and my head just isn't working yet.

--bb
November 05, 2006
Bill Baxter wrote:
> Sean Kelly wrote:
>> Sean Kelly wrote:
>>
>>> Still, I think it's useful to have in an iterator, so perhaps something roughly like this:
>>>
>>> interface Iterator(T)
>>> {
>>>     T val();
>>>     T next();
>>>     bool hasNext();
>>> }
>>
>>
>> Err... this doesn't work for an iterator returned for an empty container.  
> 
> It's fine if you define the behavior to be that you have to call next() once to get the first value.  So for iteration idiom becomes:
> 
> T v;
> while(c.hasNext()) )
> {
> }

Yup, but this seems confusing with the presence of val() methods.