View mode: basic / threaded / horizontal-split · Log in · Help
November 04, 2006
Re: foreach vs. iterators
Walter Bright wrote:

> You're right about that being a problem with opApply.

Does it have to be?

If the body of the opApply is available to the compiler and the delegate
parameter is a compile time constant (as it is in the normal foreach case),
does anything prevent the compiler from inlining the opApply together with
the delegate body?

Would there still be a speed hit if this was done?

I have many cases that would gain a lot if the compiler inlined functions
taking compile time known delegates together with the delegate call. I hope
there is nothing fundamental preventing this from happening.

/Oskar
November 04, 2006
Re: foreach vs. iterators
Oskar Linde wrote:
> Walter Bright wrote:
> 
>> You're right about that being a problem with opApply.
> 
> Does it have to be?
> 
> If the body of the opApply is available to the compiler and the delegate
> parameter is a compile time constant (as it is in the normal foreach case),
> does anything prevent the compiler from inlining the opApply together with
> the delegate body?
> 
> Would there still be a speed hit if this was done?
> 
> I have many cases that would gain a lot if the compiler inlined functions
> taking compile time known delegates together with the delegate call. I hope
> there is nothing fundamental preventing this from happening.

The compiler doesn't inline functions that have loops in them. This 
isn't a fault in D, but a structural limitation in the way the dmd 
compiler works.

I find opApply particularly slick for things like recursive directory 
traversals, associative array walking, etc. But for ordinary linear 
collection types, it is hard to generate code as good from it as pointer 
stepping.
November 04, 2006
Re: foreach vs. iterators
Georg Wrede wrote:
> 
> Back to foreach: it's syntactically a very convenient way to go through 
> a trivial collection (i.e. an array or a list). But for non-trivial 
> containers (or data structures), foreach simply is insufficient. You 
> need an iterator.

Assuming you simply want to visit every element once then I think 
foreach is fine.  In fact, it's generally simpler than implementing an 
iterator, given an arbitrary container type.  Where foreach falls down 
is if you want to stop after N iterations and resume later.  It's 
possible, but doing so resembles implementing random access iterators 
for a linked list.

> Iterators were invented for the very purpose of
> walking through a black-box container, if you will. I'd almost say that 
> container-specific iterators are the only way to guarantee the lowest 
> complexity walk-through of an arbitrary black-box container.

I think iterators are intended to represent an arbitrary position in a 
sequence and present a means for visiting the remaining elements.  It's 
the "arbitrary position" part that foreach can't do.

> One could of course have foreach actually employ iterators. This gives 
> the ease of coding with foreach combined with the flexiblity, decoupling 
> and orthogonality offered by iterators.

Yup, and this is what C++ does with its for_each, transform, and other 
algorithms.

> But, as in the previous post, even this becomes cumbersome when we're 
> iterating over more than one container simultaneously, especially when 
> they're of different sizes or iterating in non-lockstep.

It's not terribly difficult--just a for loop with two or more iterators 
and conditions--but I would like to believe that this can be simplified.

> In general, the essence and elegance of the STL has not been too 
> prevalent in D libraries or algorithms. With the current expressive 
> power of D, one might expect to soon start seeing some of this.

I agree.  Currently, I've implemented basically all of the C++ algorithm 
library plus a few additional functions, but only for arrays.  Without 
iterators, I simply don't see the point in extending many of the 
algorithms to support other container types.  opIndex simply doesn't 
apply well to most other container types.


Sean
November 04, 2006
Re: foreach vs. iterators
Walter Bright wrote:
> Georg Wrede wrote:
>> Walter Bright wrote:
>>> On the other hand, C++ iterators aren't a general substitute for 
>>> opApply. For example, it is not at all easy to turn a recursive data 
>>> structure (i.e. binary tree) into a linearized iterator.
>>
>> If I'm not missing the essence here, there are several kinds of 
>> iterators, and not every data structure should try to provide all of 
>> these. For example, an iterator into a tree structure might be 
>> breadth-first or depth-first, but one wouldn't expect a linearized (if 
>> I understand the term in this context at all) iterator to exist. 
>> Conversely, an array is hardly thinkable without a linear iterator.
> 
> Given the structure:
> 
> class Node
> {    Node left;
>     Node right;
>     ...data...
> }
> 
> how do you iterate that with an iterator? This is what I mean by 
> 'linearizing' it. The STL sidesteps the problem by avoiding providing 
> such a container.

The problem isn't so much "how" (it's really not terribly difficult) as 
that the memory requirements and copy semantics for such an iterator 
could be quite high, and the STL assumes iterators are comparable to 
pointers in terms of how they may be manipulated.


Sean
November 05, 2006
Re: foreach vs. iterators
Oskar Linde wrote:
> Sean Kelly wrote:
> 
>>  From this it seems clear that foreach is not sufficiently adaptable 
>> to meet the needs of all sequential algorithms and opIndex is not a 
>> reasonable substitute for all cases where foreach may not be used.  
>> What alternatives do we have?  So far, iterator variants are all I've 
>> been able to come up with.
> 
> I agree, and I have been saying this countless times. :)
> 
> Why don't we come up with a standard iterator concept for D?
> For a forward iterator, C++ has:
> 
> *i
> i++ / ++i
> i != end
> 
> In the post "Iterators (Was: Re: Lazy eval -- an example issue)" in 
> digitalmars.D ( 
> news://news.digitalmars.com:119/ecmvec$ksl$1@digitaldaemon.com ) I 
> suggested a simple java-style variant:
> 
> *i.next()
> i.hasNext()

I think a Java style iterator is the way to go for D.  Personally, I 
basically never find a need for bidirectional iterators, and I only use 
random access iterators with vectors, which correspond to D's built-in 
dynamic arrays.  Also, a single iterator that knows when it's reached 
the end of the sequence better suits being used with foreach than the 
pointer-style iterators of C++.  About the only thing I don't like about 
Java iterators is how they work ;-)  Only being able to access the 
referenced data by calling next() is a bit irritating.  That said, it 
does eliminate the need for iterators to potentially cache the current 
value when iterating across sequences that don't inherently store data, 
such as an input stream.  Still, I think it's useful to have in an 
iterator, so perhaps something roughly like this:

interface Iterator(T)
{
    T val();
    T next();
    bool hasNext();
}

interface WriteIterator(T) : Iterator(T)
{
    T val( T n );
}

> A generic opApply could then be implemented as:
> 
> template opApplyMixin() {
>   int opApply(int delegate(inout ValueType v) dg) {
>     ValueType *t;
>     while(hasNext()) {
>       t = next();
>       if (auto status = dg(*t))
>         return status;
>     }
>     return 0;
>   }
> }
> 
> And a simple array iterator wrapper (Could of course be more efficiently 
> implemented):
> 
> struct ArrayForwardIterator(T) {
>   T[] arr;
>   alias T ValueType;
> 
>   T* next() { arr = arr[1..$]; return arr.ptr-1; }
>   bool hasNext() { return arr.length > 0; }
>   mixin opApplyMixin;
> }

Looks good.

> If pointers should be avoided, and since D doesn't have an opDeref, but 
> has lhs/rhs overloads for indexing, another iterator variant could be to 
> use
> 
> i[]
> i[]=
> 
> for rhs, and lhs iterator dereferencing.

I prefer the 'val' approach.  It seems a bit more meaningful than 
something with operators.

> i++ / ++i could be used instead of .next() if it is any clearer, and so 
> on...

I'm not sure how I feel about using structs for iterators (instead of 
classes).  With classes, we can retain some semblance of control over 
copy semantics, but we can't with structs.  And I'm not entirely sure I 
think we should be able to copy iterators at will--consider Walter's 
case of an iterator for a traditional binary tree, for example.  Also, 
using structs forces user code to either be hard-coded to work with a 
specific iterator type or to use templates.  And I'll admit I do sort of 
like the idea of:

    void myFunction( Iterator!(int) x ) {}

being possible.

> It doesn't really matter what convention we pick. For better rather than 
> worse, D doesn't allow us to do as C++ does, make iterators 
> interchangeable with pointers. We may as well pick anything we like as 
> long as the compiler is able to make reasonable efficient code out of it.

Agreed.

> As I said in the above referenced post, I have been toying with making a 
> template iterator/array algorithm library. I've got a quite substantial 
> proof of concept working, but have not had much time lately. I do love 
> discussions about this though. :)

I've been focusing on arrays thus far, but am getting to the point where 
I'd like to extend some algorithms to work for user-defined containers 
as well.  And that obviously requires a decent convention for how 
iterators should be implemented.  Also, I'll admit that I really enjoy 
algorithm talk as well :-)


Sean
November 05, 2006
Re: foreach vs. iterators
Sean Kelly wrote:
> Still, I think it's useful to have in an 
> iterator, so perhaps something roughly like this:
> 
> interface Iterator(T)
> {
>     T val();
>     T next();
>     bool hasNext();
> }

Err... this doesn't work for an iterator returned for an empty 
container.  Perhaps:

interface Iterator(T)
{
    T val();
    T next();
    bool atEnd();
}

Anyway, you get the idea.


Sean
November 05, 2006
Re: foreach vs. iterators
Walter Bright wrote:
> Georg Wrede wrote:
> 
>> Walter Bright wrote:
>>
>>> On the other hand, C++ iterators aren't a general substitute for 
>>> opApply. For example, it is not at all easy to turn a recursive data 
>>> structure (i.e. binary tree) into a linearized iterator.
>>
>>
>> If I'm not missing the essence here, there are several kinds of 
>> iterators, and not every data structure should try to provide all of 
>> these. For example, an iterator into a tree structure might be 
>> breadth-first or depth-first, but one wouldn't expect a linearized (if 
>> I understand the term in this context at all) iterator to exist. 
>> Conversely, an array is hardly thinkable without a linear iterator.
> 
> 
> Given the structure:
> 
> class Node
> {    Node left;
>     Node right;
>     ...data...
> }
> 
> how do you iterate that with an iterator? This is what I mean by 
> 'linearizing' it. The STL sidesteps the problem by avoiding providing 
> such a container.

You're right. This pointer-to-array semantics kept the iterators 
primitive, I'd almost forgotten how primitive.

So, to traverse the example, one obviously needs an iterator with more 
state than just the current node. In practice this state would be stored 
in a private, persistent LIFO.
November 05, 2006
Re: foreach vs. iterators
Sean Kelly wrote:
> Sean Kelly wrote:
> 
>> Still, I think it's useful to have in an iterator, so perhaps 
>> something roughly like this:
>>
>> interface Iterator(T)
>> {
>>     T val();
>>     T next();
>>     bool hasNext();
>> }
> 
> 
> Err... this doesn't work for an iterator returned for an empty 
> container.  

It's fine if you define the behavior to be that you have to call next() 
once to get the first value.  So for iteration idiom becomes:

T v;
while(c.hasNext()) )
{
}
November 05, 2006
Re: foreach vs. iterators
Bill Baxter wrote:
> Sean Kelly wrote:
> 
>> Sean Kelly wrote:
>>
>>> Still, I think it's useful to have in an iterator, so perhaps 
>>> something roughly like this:
>>>
>>> interface Iterator(T)
>>> {
>>>     T val();
>>>     T next();
>>>     bool hasNext();
>>> }
>>
>>
>>
>> Err... this doesn't work for an iterator returned for an empty 
>> container.  
> 
> 
> It's fine if you define the behavior to be that you have to call next() 
> once to get the first value.  So for iteration idiom becomes:
> 
> T v;
> while(c.hasNext()) )
> {
> }

...another victem of Ctrl+Enter....

while(c.hasNext())
{
   T v = c.next();
}

I think an alias for T also needs to be a part of the definition of the 
iterator.  Like  alias T value_type.

Also would do you do a mutating iteration over collections of basic 
value types (eg floats), structs, and classes?  I.e. if you want to do 
something like:

while(c.hasNext())
{
   T v = c.next();
   v += 2;
}
where T could be float, struct, or class.  Ok, class case is fine, but 
the other two aren't clear to me.  Mutable iterator over structs could 
maybe return a pointer to each struct.  Basic value type, I'm not sure. 
 Just woke up though... maybe it's obvious and my head just isn't 
working yet.

--bb
November 05, 2006
Re: foreach vs. iterators
Bill Baxter wrote:
> Sean Kelly wrote:
>> Sean Kelly wrote:
>>
>>> Still, I think it's useful to have in an iterator, so perhaps 
>>> something roughly like this:
>>>
>>> interface Iterator(T)
>>> {
>>>     T val();
>>>     T next();
>>>     bool hasNext();
>>> }
>>
>>
>> Err... this doesn't work for an iterator returned for an empty 
>> container.  
> 
> It's fine if you define the behavior to be that you have to call next() 
> once to get the first value.  So for iteration idiom becomes:
> 
> T v;
> while(c.hasNext()) )
> {
> }

Yup, but this seems confusing with the presence of val() methods.
12 13 14 15 16 17
Top | Discussion index | About this forum | D home