Iterators for D (page 2) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Iterators for D (page 2)

November 06, 2006

Re: Iterators for D

Posted by Mike Capp
in reply to Walter Bright

Mike Capp

Posted in reply to Walter Bright

> Overload opIndex for rvalue access
> [...]
> Overloading opIndex also will work for random access

Wouldn't this make iterators incompatible with e.g. linked lists? Seems to defeat much of the purpose.

(It might be possible to implement random access for a list, but it's not something you'd want to encourage.)

November 06, 2006

Re: Iterators for D

Posted by Hasan Aljudy
in reply to rm

Hasan Aljudy

Posted in reply to rm


rm wrote:
> Walter Bright wrote:
>> Walter Bright wrote:
>>> .being property returns an iterator that starts at the beginning
>>
>> Make that .begin
>>
>> Also, overloading += will be used to advance the iterator.
> 
> that would give something like this?
> 
> import std.stdio;
> 
> void main()
> {
>     char[] s = "hello";
>     // += for advancing an iterator, why not ++?
>     for (iterator i = s.begin; i != s.end; i += 1)
>         writefln(s[i]);
> 
>     // ??? i is an iterator or a char  ???
>     foreach(i; s)
>         // extend writefln to take iterators ?
>         // but that would mean that it must be possible
>         // to determine a value from an iterator
>         writefln(i);
> }
> 

Why would I use
>     for (iterator i = s.begin; i != s.end; i += 1)
>         writefln(s[i]);

instead of:

>     for (int i = 0; i < s.length; i++)
>         writefln(s[i]);

?

November 06, 2006

Re: Iterators for D

Posted by Kirk McDonald
in reply to Walter Bright

Kirk McDonald

Posted in reply to Walter Bright

Walter Bright wrote:
> It's becoming increasingly obvious that D needs iterators. While opApply  is far better for some kinds of iteration (such as recursively traversing a directory), iterators are more efficient in some cases, and allow for things that opApply makes difficult.
> 
> So hear are the design goals:
> 
> 1) Works with dynamic and stat arrays
> 2) Doesn't need to work with associative arrays
> 3) Should be possible to achieve "pointer efficiency" with it
> 4) Needs to be able to restrict lvalue access (what C++ does with const iterators)
> 5) Needs to work seamlessly with foreach
> 6) Iterators need to be copyable
> 
> So here's one design:
> 
> .begin property returns an iterator that starts at the beginning
> 
> .end returns an iterator that is at the end
> 
> Overloading * doesn't work with D. But, instead:
> 

What are the arguments against that, again? I do not recall, and it's worth at least reviewing why we don't have opDeref and opDerefAssign.

As you know, Walter, a major benefit of C++-style iterators is that they are semantically as close as possible to pointers. Would [] be changed to dereference pointers? I sure hope not. If we go this route, I would suggest adding these unary-* overloads.

> Overload opIndex for rvalue access
> Overload opIndexAssign for lvalue access
> 

I think you mean opSlice and opSliceAssign.

Bar bar;
auto foo = bar.begin();
Bar baz = foo[];
foo[] = Bar.init;

> Overloading opIndex also will work for random access
> 
> foreach loops will not be able to have a 'key' parameter.

So, I would assume these iterators would work something like this:

If a class provides .begin and .end, and these both return the same type, and that type provides the requisite iterator methods, it is foreach-able.

If a class provides .begin, .end, and opApply, one of the two iteration methods has to take precedence. I would hope it's opApply.

For a type T, its associated iterator type should be available via T.iterator. This has to be standard.

An iterable type T MUST provide these methods:
I begin()
I end()

An iterator type I MUST provide the following overloads:
E opSlice()
I opPostInc()

E is the type of an element in the collection.

These are enough to describe a read-only forward iterator, which is adequate for foreach. In other words, given a type T that meets the requirements outlined above, the following is adequate to iterate through it:

foreach(e; t) {
    // ...
}

'e' would be of type E.

We run into a curious problem involving pre-increment, which uses opAddAssign. Classes for which random-access is problematic should not be required to provide this, which is essentially a random-access method. There are two options here:

1) Only require that iterators support opPostInc. (This further removes them from actual pointers.)

2) Introduce a convention whereby a non-random-access iterator throws an exception if any value other than 1 is passed to its opAddAssign overload.

Extending these rules for bidirectional iterators and random-access iterators, as well as read-write iterators, is left as an exercise to the reader. (And not a very difficult one.)

All of these strange edge-cases and differences between pointers and possible iterator types convince me that C++-STYLE ITERATORS ARE NOT FOR D. D, unlike C++, does not supply enough operator overloads for their semantics to be identical to those of pointers, which was the whole point of C++-style iterators in the first place.

Instead, we should look to other languages for inspiration. I suggest Java and Python, whose iterator semantics are similar. Other posters have already made suggestions on this front. I would only suggest the "opIter" method for getting an iterator, and that the iterator class itself be available via T.iterator (whether it is a nested class or an alias shouldn't matter). Classes can provide methods for returning alternate iterators, as well. (A class providing opIter should be iterated over as easily as an iterator itself.)

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org

November 06, 2006

Re: Iterators for D

Posted by Sean Kelly
in reply to rm

Sean Kelly

Posted in reply to rm

rm wrote:
> 
>> I think the Java style may be more appropriate:
>>
>>     foreach( e; s.fwdIter() )
>>     foreach( e; s.revIter() )
>>
> 
> I didn't read the discussion, but wasn't there a topic about foreach_reverse? Should that not cover s.revIter()?

How?  Say I do this:

    foreach_reverse( e; s.begin() )

I've passed an iterator pointing to the beginning of a sequence to an algorithm that wants to iterate backwards, so best case the iterator is bidirectional and we will visit exactly one element (the theoretical "last" in this case) before exiting the loop.

> Secondly, why must one give in the foreach statement the first iterator?
> I'd say that there the iterable must be given?

Something like that, yes.  You need a unified object that knows when it's reached the end of the sequence.  So assuming C++ style iterators you'd have to wrap them in an object that takes care of the comparisons and looping to work with foreach.

Sean

November 06, 2006

Re: Iterators for D

Posted by Walter Bright
in reply to Dave

Walter Bright

Posted in reply to Dave

Dave wrote:
> I should add - will native arrays of structs (say) provide this by default w/o having to add specialized operators?

Yes. It would be no different than for native arrays of ints.

November 06, 2006

Re: Iterators for D

Posted by Walter Bright
in reply to Tom S

Walter Bright

Posted in reply to Tom S

Tom S wrote:
> Walter Bright wrote:
>> foreach loops will not be able to have a 'key' parameter.
> What's the particular reason for that restriction ?

With an iterator, what is the key?

November 06, 2006

Re: Iterators for D

Posted by Walter Bright
in reply to Sean Kelly

Walter Bright

Posted in reply to Sean Kelly

Sean Kelly wrote:
> Is it worth discussing whether the pointer-style C++ iterators are the best approach?  They certainly are if random access iterators are a requirement, but if we're just implementing forward and reverse iterators, the all-in-one Java approach may be better.  In that vein, I  assume alose that reverse iteration would work with reverse iterators and an .rbegin type method instead of .begin and foreach_reverse?

Hmm, or it could be done starting with .end and going to .begin.

November 06, 2006

Re: Iterators for D

Posted by Walter Bright
in reply to Bill Baxter

Walter Bright

Posted in reply to Bill Baxter

Bill Baxter wrote:
> Walter Bright wrote:
>> It's becoming increasingly obvious that D needs iterators. While opApply  is far better for some kinds of iteration (such as recursively traversing a directory), iterators are more efficient in some cases, and allow for things that opApply makes difficult.
>>
>> So hear are the design goals:
>>
>> 1) Works with dynamic and stat arrays
> 
> How about pointers (unknown length arrays)?
>   int *some_numbers;
> 
> Is it required that a pointer be a valid iterator?
> I.e. should it be possible to have this template function:
> 
>    print_all(some_numbers, some_numbers+theLength);
> 
> Or will that case require a wrapper object that contains theLength?
> 
>    auto iter = PointerIterator(some_numbers, theLength);
>    print_all(iter);

It's simpler than that. Pointers can be converted to arrays:
	auto array = some_numbers[0 .. theLength];
and then you can get the iterator.

>> 2) Doesn't need to work with associative arrays
> Why not?  Do you mean it doesn't need to because you can just iterate over keys or values?

Because that falls into my big complaint with iterators - there's no way to efficiently 'linearize' a binary tree. Might as well just iterate over the keys or values.


> That's like C++'s way.  Iterator is basically a generalized pointer.

Yes.

> The other proposal is more like Java/Python/C#'s way, where the iterator  is like a pointer that knows it's own limits.
> The Java/Python/C# way would definitely work in D.   I'm not sure about the C++ way working for D because, in practice, references and dereferencing get used pretty heavily there.  On the other hand Java/Python/C# were designed without real templates.

I think the C++ like way will be a lot more efficient, and I think it will work. Reference types are not needed.

> One of the major design goals for C++ was that for a basic array a regular pointer should be a valid iterator.  Thus the iterator cannot know where it's own end() is, because a pointer does not have such knowledge.  But aside from cute demos on SGI's STL web page where algorithms are applied to plain arrays using pointers,  I've never seen that sort of thing used in the real world.  You almost always use a std::vector or std::list or something besides a raw array.  So pointer equivalence seems to be pretty useless to me.  Even moreso with D.  In the rare case you need to work with raw pointers, you can always make an iterator wrapper.

I think it'll work fine, since pointers can be easily converted to dynamic arrays (see above).


> On the plus side for C++ style:
> * iterators are lightweight (just one size_t/pointer in most cases)
> * algorithms and methods all take begin/end pairs making things uniform.
> Minuses
> * Iterators are dumb and error prone and easily go out of bounds
> * algorithms all require begin/end pairs making things cumbersome
> 
> I think the minuses outweigh the plusses.
> But maybe it's useful to take one concrete example:  A generic mergesort or quicksort algorithm.
> 
> In C++ recursive calls just pass the limits of the range to the children
> mergesort(begin, end)
> {
>     ...
>     mergesort(begin, (end-begin)/2);
>     mergesort((end-begin)/2, end);
> }
> It's very efficient.
> 
> In the iterator-is-object approach I'm not sure how that works.
> If all you've got is .next() or .hasNext() I'm not sure what you're supposed to do to create an iterator over half the range represented by the original iterator.

Iterators can be any type, not just objects, that support [], ++, +=, and =.

>> foreach loops will not be able to have a 'key' parameter.
> Don't understand that one.

What's the key for an unknown iterator type?

November 06, 2006

Re: Iterators for D

Posted by Walter Bright
in reply to Mike Capp

Walter Bright

Posted in reply to Mike Capp

Mike Capp wrote:
>> Overload opIndex for rvalue access
>> [...]
>> Overloading opIndex also will work for random access
> 
> Wouldn't this make iterators incompatible with e.g. linked lists? Seems to defeat
> much of the purpose.

To work, all it has to do is support [0], and throw an exception if the index is anything but 0.

> (It might be possible to implement random access for a list, but it's not
> something you'd want to encourage.)

I agree.

November 06, 2006

Re: Iterators for D

Posted by Walter Bright
in reply to Kirk McDonald

Walter Bright

Posted in reply to Kirk McDonald

Kirk McDonald wrote:
> Walter Bright wrote:
>> Overloading * doesn't work with D. But, instead:
> What are the arguments against that, again?

Because D has many cases of implicit dereferencing, I thought it would cause more trouble than it's worth.

> I do not recall, and it's worth at least reviewing why we don't have opDeref and opDerefAssign.
> 
> As you know, Walter, a major benefit of C++-style iterators is that they are semantically as close as possible to pointers. Would [] be changed to dereference pointers?

You've always been able to:
	int* p;
	p[1] = ...

> I sure hope not. If we go this route, I would suggest adding these unary-* overloads.
> 
>> Overload opIndex for rvalue access
>> Overload opIndexAssign for lvalue access
>>
> 
> I think you mean opSlice and opSliceAssign.

No, I meant Index.

>> Overloading opIndex also will work for random access
>>
>> foreach loops will not be able to have a 'key' parameter.
> 
> So, I would assume these iterators would work something like this:
> 
> If a class provides .begin and .end, and these both return the same type, and that type provides the requisite iterator methods, it is foreach-able.

Yes, though they will return a value, not a type.

> If a class provides .begin, .end, and opApply, one of the two iteration methods has to take precedence. I would hope it's opApply.

I was leaning towards the other way around.

> For a type T, its associated iterator type should be available via T.iterator. This has to be standard.

It's not needed, as typeof(T.begin) will do it.

> An iterable type T MUST provide these methods:
> I begin()
> I end()

Yes.

> An iterator type I MUST provide the following overloads:
> E opSlice()

No. opIndex()

> I opPostInc()

Probably opAddAssign()

> E is the type of an element in the collection.


> These are enough to describe a read-only forward iterator, which is adequate for foreach. In other words, given a type T that meets the requirements outlined above, the following is adequate to iterate through it:
> 
> foreach(e; t) {
>     // ...
> }
> 
> 'e' would be of type E.

Yes.

> We run into a curious problem involving pre-increment, which uses opAddAssign. Classes for which random-access is problematic should not be required to provide this, which is essentially a random-access method. There are two options here:
> 
> 1) Only require that iterators support opPostInc. (This further removes them from actual pointers.)
> 
> 2) Introduce a convention whereby a non-random-access iterator throws an exception if any value other than 1 is passed to its opAddAssign overload.

I could go either way with that.

> Extending these rules for bidirectional iterators and random-access iterators, as well as read-write iterators, is left as an exercise to the reader. (And not a very difficult one.)
> 
> All of these strange edge-cases and differences between pointers and possible iterator types convince me that C++-STYLE ITERATORS ARE NOT FOR D. D, unlike C++, does not supply enough operator overloads for their semantics to be identical to those of pointers, which was the whole point of C++-style iterators in the first place.

I think it does provide enough, in fact, it will be less wacky than in C++ (look at the wretched iterator/const_iterator dichotomy). It'll work with core arrays, and will not need those typedefs.


> Instead, we should look to other languages for inspiration. I suggest Java and Python, whose iterator semantics are similar. Other posters have already made suggestions on this front. I would only suggest the "opIter" method for getting an iterator, and that the iterator class itself be available via T.iterator (whether it is a nested class or an alias shouldn't matter). Classes can provide methods for returning alternate iterators, as well. (A class providing opIter should be iterated over as easily as an iterator itself.)

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation