November 06, 2006
Walter Bright wrote:
> Dave wrote:
>> I should add - will native arrays of structs (say) provide this by default w/o having to add specialized operators?
> 
> Yes. It would be no different than for native arrays of ints.

Cool!
November 06, 2006
Walter Bright wrote:
> Sean Kelly wrote:
>> Is it worth discussing whether the pointer-style C++ iterators are the best approach?  They certainly are if random access iterators are a requirement, but if we're just implementing forward and reverse iterators, the all-in-one Java approach may be better.  In that vein, I  assume alose that reverse iteration would work with reverse iterators and an .rbegin type method instead of .begin and foreach_reverse?
> 
> Hmm, or it could be done starting with .end and going to .begin.

Yup.  About the only potentially weird thing there is that, assuming C++  style iterators, it would require reverse iterators and forward iterators to be comparable to one another.  And, I suppose, for .end to return a reverse style iterator even for sequences that don't support reverse iteration?


Sean
November 06, 2006
Walter Bright wrote:
> Tom S wrote:
>> Walter Bright wrote:
>>> foreach loops will not be able to have a 'key' parameter.
>> What's the particular reason for that restriction ?
> 
> With an iterator, what is the key?

You've got a point. I failed to see the big picture here.
November 06, 2006
Walter Bright wrote:
> Mike Capp wrote:
>>> Overload opIndex for rvalue access
>>> [...]
>>> Overloading opIndex also will work for random access
>>
>> Wouldn't this make iterators incompatible with e.g. linked lists? Seems to defeat
>> much of the purpose.
> 
> To work, all it has to do is support [0], and throw an exception if the index is anything but 0.
> 
>> (It might be possible to implement random access for a list, but it's not
>> something you'd want to encourage.)
> 
> I agree.

This is one point where there's a huge divide between C++ iterators and Java iterators.

In C++ there is no such thing as *the* iterator type or interface.  What C++ (or STL, rather) has is collection of *iterator concepts*.  What that means is that if opIndex doesn't make sense for the the collection in question, you just leave it out.  A 'concept' is like an interface but for templates.  Here are the concepts in the STL:

Trivial Iterator - http://www.sgi.com/tech/stl/trivial.html
Input Iterator - http://www.sgi.com/tech/stl/InputIterator.html
Output Iterator - http://www.sgi.com/tech/stl/OutputIterator.html
Forward Iterator - http://www.sgi.com/tech/stl/ForwardIterator.html
Bidirectional Iterator - http://www.sgi.com/tech/stl/BidirectionalIterator.html
Random Access Iterator
http://www.sgi.com/tech/stl/RandomAccessIterator.html

If you want your container to support forward iteration, then your iterator can be of any type you wish, but it must support x++, ++x, and *x operations.


The Java/C# model is instead based on predefined interfaces which has pros and cons.
* Main pro is that functions that take generic iterators can be ordinary functions rather than templates.
* Main con is that such functions can't generally be as efficient as raw pointer manipulation, because you have to make a virtual method call through the interface pointer to get each next element.


D is in a very interesting place in that it is high level enough that the C++ model doesn't quite make sense, but low level enough that the Java model doesn't quite make sense either.

D may be able to have the best of both worlds.  Define *both* standard concepts *and* standard interfaces that implement those concepts, and then libraries can write either templatized algorithms for speed, or regular function based algorithms for code-size, use with inheritance, etc.

--bb
November 06, 2006
Hasan Aljudy wrote:
> 
> 
> rm wrote:
>> Walter Bright wrote:
>>> Walter Bright wrote:
>>>> .being property returns an iterator that starts at the beginning
>>>
>>> Make that .begin
>>>
>>> Also, overloading += will be used to advance the iterator.
>>
>> that would give something like this?
>>
>> import std.stdio;
>>
>> void main()
>> {
>>     char[] s = "hello";
>>     // += for advancing an iterator, why not ++?
>>     for (iterator i = s.begin; i != s.end; i += 1)
>>         writefln(s[i]);
>>
>>     // ??? i is an iterator or a char  ???
>>     foreach(i; s)
>>         // extend writefln to take iterators ?
>>         // but that would mean that it must be possible
>>         // to determine a value from an iterator
>>         writefln(i);
>> }
>>
> 
> Why would I use
>  >     for (iterator i = s.begin; i != s.end; i += 1)
>  >         writefln(s[i]);
> 
> instead of:
> 
>  >     for (int i = 0; i < s.length; i++)
>  >         writefln(s[i]);
> 
> ?

The point is that you could replace char[] with SomeCollection and the code should still work.  But the same mechanism should also work on the built-in types.

--bb
November 07, 2006
Bill Baxter wrote:
> Walter Bright wrote:
>> Mike Capp wrote:
>>>> Overload opIndex for rvalue access
>>>> [...]
>>>> Overloading opIndex also will work for random access
>>>
>>> Wouldn't this make iterators incompatible with e.g. linked lists? Seems to defeat
>>> much of the purpose.
>>
>> To work, all it has to do is support [0], and throw an exception if the index is anything but 0.
>>
>>> (It might be possible to implement random access for a list, but it's not
>>> something you'd want to encourage.)
>>
>> I agree.
> 
> This is one point where there's a huge divide between C++ iterators and Java iterators.
> 
> In C++ there is no such thing as *the* iterator type or interface.  What C++ (or STL, rather) has is collection of *iterator concepts*.  What that means is that if opIndex doesn't make sense for the the collection in question, you just leave it out.  A 'concept' is like an interface but for templates.

For what it's worth, C++98 style iterator traits won't work in D because of D's simpler template lookup rules.  I think we'll have to go with something much closer to C++0x concept checking if we want to overload algorithms for different iterator categories.

> The Java/C# model is instead based on predefined interfaces which has pros and cons.
> * Main pro is that functions that take generic iterators can be ordinary functions rather than templates.
> * Main con is that such functions can't generally be as efficient as raw pointer manipulation, because you have to make a virtual method call through the interface pointer to get each next element.

Hrm... you could still use a Java design without the base class and have everyone just use templates though.

> D is in a very interesting place in that it is high level enough that the C++ model doesn't quite make sense, but low level enough that the Java model doesn't quite make sense either.
> 
> D may be able to have the best of both worlds.  Define *both* standard concepts *and* standard interfaces that implement those concepts, and then libraries can write either templatized algorithms for speed, or regular function based algorithms for code-size, use with inheritance, etc.

That's kind of what I was thinking, though if iterators all inherit from a common base class then the calls will be virtual regardless, unless the DMD optimizer gets a bit fancier.


Sean
November 07, 2006
Walter Bright wrote:
> Tom S wrote:
>> Walter Bright wrote:
>>> foreach loops will not be able to have a 'key' parameter.
>> What's the particular reason for that restriction ?
> 
> With an iterator, what is the key?

It could just be the count.  Say I'm trying to linearize a non-linear datastructure into a pre-existing array:


   foreach(int i, val v; hierarchical_thing.iterator())
   {
      linearized[i] = val;
   }

Also having the count there would make it easier to do something like iterating lock step in funky ways over different objects:

   foreach(int i, val v; thing.randomized_iterator()) {
      other_thing[i] = val;
   }

Obviously you could count things yourself if you had to.
But I find for instance in Python that I use 'enumerate()' quite often.  enumerate() takes an iterator and returns an iterator that spits out (count,value) tuples.

--bb
November 07, 2006
Walter Bright wrote:
> Bill Baxter wrote:
>> Walter Bright wrote:
>>> It's becoming increasingly obvious that D needs iterators. While opApply  is far better for some kinds of iteration (such as recursively traversing a directory), iterators are more efficient in some cases, and allow for things that opApply makes difficult.
>>>
>>> So hear are the design goals:
>>>
>>> 1) Works with dynamic and stat arrays
>>
>> How about pointers (unknown length arrays)?
>>   int *some_numbers;
>>
>> Is it required that a pointer be a valid iterator?
>> I.e. should it be possible to have this template function:
>>
>>    print_all(some_numbers, some_numbers+theLength);
>>
>> Or will that case require a wrapper object that contains theLength?
>>
>>    auto iter = PointerIterator(some_numbers, theLength);
>>    print_all(iter);
> 
> It's simpler than that. Pointers can be converted to arrays:
>     auto array = some_numbers[0 .. theLength];
> and then you can get the iterator.

Nice!  Didn't know you could do that.


> I think the C++ like way will be a lot more efficient, and I think it will work. Reference types are not needed.

The C++ way really embodies two design choices:
1) iterators are defined by 'concept'
   --> Means generic iteration functions all need to be templates
   --> The other alternative is define them by interfaces.
       --> these two aren't necessarily mutually exclusive though
2) an iterator is a pointer -- begin() and end() are kept separate.
   --> The other alternative is 'an iterator is a range'

Definitely 1) will be more efficient than using interfaces,
but I don't thing pointer semantics, 2), are required for 1) to work.

It may be possible for D iterators to have the speed of C++ while having the convenience and safety of Java's iterators that know their bounds.

>> I think the minuses outweigh the plusses.
>> But maybe it's useful to take one concrete example:  A generic mergesort or quicksort algorithm.
>>
>> In C++ recursive calls just pass the limits of the range to the children
>> mergesort(begin, end)
>> {
>>     ...
>>     mergesort(begin, (end-begin)/2);
>>     mergesort((end-begin)/2, end);
>> }
>> It's very efficient.
>>
>> In the iterator-is-object approach I'm not sure how that works.
>> If all you've got is .next() or .hasNext() I'm not sure what you're supposed to do to create an iterator over half the range represented by the original iterator.
> 
> Iterators can be any type, not just objects, that support [], ++, +=, and =.

I think the iterator-as-range idea is at least worth discussing.  99% of the time you use an iterator you also need to know where to stop.  It's pretty similar to how arrays in D keep a length, because 99% of the time when you have an array you need to know how long it is too.  So it makes sense to keep those two bits of information together.  Similarly in C++, when you have an iterator, you almost always need the end() too.  And any time you want to pass around iterators you end up having to pass the two bits of information around separately.

--bb
November 07, 2006
Walter Bright wrote:
> Kirk McDonald wrote:
>> Walter Bright wrote:
>>> Overloading * doesn't work with D. But, instead:
>> What are the arguments against that, again?
> 
> Because D has many cases of implicit dereferencing, I thought it would cause more trouble than it's worth.
> 
>> I do not recall, and it's worth at least reviewing why we don't have opDeref and opDerefAssign.
>>
>> As you know, Walter, a major benefit of C++-style iterators is that they are semantically as close as possible to pointers. Would [] be changed to dereference pointers?
> 
> You've always been able to:
>     int* p;
>     p[1] = ...
> 

Wouldn't that be p[0] = ...?

Is that how you intend to use opIndex to "dereference" the iterator? That's /terrifying/. Although it is consistent with pointers, using [0] for everything is just asking for trouble with regards to non-random-access iterators. And though it's probably not totally meaningless to a random reader of the code, it comes pretty darned close.

>> I sure hope not. If we go this route, I would suggest adding these unary-* overloads.
>>
>>> Overload opIndex for rvalue access
>>> Overload opIndexAssign for lvalue access
>>>
>>
>> I think you mean opSlice and opSliceAssign.
> 
> No, I meant Index.
> 
>>> Overloading opIndex also will work for random access
>>>
>>> foreach loops will not be able to have a 'key' parameter.
>>
>> So, I would assume these iterators would work something like this:
>>
>> If a class provides .begin and .end, and these both return the same type, and that type provides the requisite iterator methods, it is foreach-able.
> 
> Yes, though they will return a value, not a type.
> 

s/these both return the same type/their return types are the same/

>> If a class provides .begin, .end, and opApply, one of the two iteration methods has to take precedence. I would hope it's opApply.
> 
> I was leaning towards the other way around.
> 

It is slightly easier and clearer to explicitly say:

foreach(e; t.begin) { }

than

foreach(e; &t.opApply) { }

opApply is an operator overload. .begin would just be a special method. I would expect the operator to take precedence.

>> For a type T, its associated iterator type should be available via T.iterator. This has to be standard.
> 
> It's not needed, as typeof(T.begin) will do it.
> 

Due to the way in which D's type inference from properties works, you'll need typeof(T.begin()). Otherwise, it will derive the type of the method itself. (Which isn't even a valid type, as such.)

>> An iterable type T MUST provide these methods:
>> I begin()
>> I end()
> 
> Yes.
> 
>> An iterator type I MUST provide the following overloads:
>> E opSlice()
> 
> No. opIndex()

I'm referring specifically to the no-index form of opSlice (meaning i[] would "dereference" the iterator). (A no-index opIndex doesn't work.) This of course is inconsistent with pointers, so saying i[0] is better on that count.

> 
>> I opPostInc()
> 
> Probably opAddAssign()
> 

Oh, there's another one:

int opEquals(I)

>> E is the type of an element in the collection.
> 
> 
>> These are enough to describe a read-only forward iterator, which is adequate for foreach. In other words, given a type T that meets the requirements outlined above, the following is adequate to iterate through it:
>>
>> foreach(e; t) {
>>     // ...
>> }
>>
>> 'e' would be of type E.
> 
> Yes.
> 
>> We run into a curious problem involving pre-increment, which uses opAddAssign. Classes for which random-access is problematic should not be required to provide this, which is essentially a random-access method. There are two options here:
>>
>> 1) Only require that iterators support opPostInc. (This further removes them from actual pointers.)
>>
>> 2) Introduce a convention whereby a non-random-access iterator throws an exception if any value other than 1 is passed to its opAddAssign overload.
> 
> I could go either way with that.
> 

Neither one is particularly elegant.

>> Extending these rules for bidirectional iterators and random-access iterators, as well as read-write iterators, is left as an exercise to the reader. (And not a very difficult one.)
>>
>> All of these strange edge-cases and differences between pointers and possible iterator types convince me that C++-STYLE ITERATORS ARE NOT FOR D. D, unlike C++, does not supply enough operator overloads for their semantics to be identical to those of pointers, which was the whole point of C++-style iterators in the first place.
> 
> I think it does provide enough, in fact, it will be less wacky than in C++ (look at the wretched iterator/const_iterator dichotomy). It'll work with core arrays, and will not need those typedefs.
> 

Without the ability to overload the dereference operator, any attempt to write a class that behaves like a pointer in D will be unwieldy, or at least ugly. Admittedly, this isn't much of an issue when just using foreach, but might be an issue with any kind of STL-like algorithms library. (Which is what you're inviting when using C++'s iterator semantics.)

> 
>> Instead, we should look to other languages for inspiration. I suggest Java and Python, whose iterator semantics are similar. Other posters have already made suggestions on this front. I would only suggest the "opIter" method for getting an iterator, and that the iterator class itself be available via T.iterator (whether it is a nested class or an alias shouldn't matter). Classes can provide methods for returning alternate iterators, as well. (A class providing opIter should be iterated over as easily as an iterator itself.)
> 
> 

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org
November 07, 2006
Kirk McDonald wrote:
> Walter Bright wrote:
>> You've always been able to:
>>     int* p;
>>     p[1] = ...
> Wouldn't that be p[0] = ...?

I must have misunderstood you. I thought you were asking if [n] worked on pointers. Yes, it does.

> Is that how you intend to use opIndex to "dereference" the iterator? That's /terrifying/. Although it is consistent with pointers, using [0] for everything is just asking for trouble with regards to non-random-access iterators. And though it's probably not totally meaningless to a random reader of the code, it comes pretty darned close.

I agree it might be startling at first because it's unusual. Once one is past that, however, is it that bad?

>>> If a class provides .begin, .end, and opApply, one of the two iteration methods has to take precedence. I would hope it's opApply.
>>
>> I was leaning towards the other way around.
>>
> 
> It is slightly easier and clearer to explicitly say:
> 
> foreach(e; t.begin) { }
> 
> than
> 
> foreach(e; &t.opApply) { }
> 
> opApply is an operator overload. .begin would just be a special method. I would expect the operator to take precedence.

The:
	foreach(e; t.begin) { }
would not be supported. It would be:
	foreach (e; t) { }
If an aggregate had both, and iterators would be chosen by default, overriding with:
	foreach (e; &t.opApply) { }
would not require any new conventions or syntax. So I think this is the right way to go.


>>> For a type T, its associated iterator type should be available via T.iterator. This has to be standard.
>>
>> It's not needed, as typeof(T.begin) will do it.
>>
> 
> Due to the way in which D's type inference from properties works, you'll need typeof(T.begin()). Otherwise, it will derive the type of the method itself. (Which isn't even a valid type, as such.)

Yes, you're right.

>> No. opIndex()
> 
> I'm referring specifically to the no-index form of opSlice (meaning i[] would "dereference" the iterator). (A no-index opIndex doesn't work.) This of course is inconsistent with pointers, so saying i[0] is better on that count.

Ok, I understand now where you were coming from with the slice.


>>> I opPostInc()
>>
>> Probably opAddAssign()
> 
> Oh, there's another one:
> 
> int opEquals(I)

Yes, I'd overlooked that.

>> I could go either way with that.
> Neither one is particularly elegant.

True, but that inelegance is hidden away in the library, but the flexibility is probably worth it.

>> I think it does provide enough, in fact, it will be less wacky than in C++ (look at the wretched iterator/const_iterator dichotomy). It'll work with core arrays, and will not need those typedefs.
> Without the ability to overload the dereference operator, any attempt to write a class that behaves like a pointer in D will be unwieldy, or at least ugly. Admittedly, this isn't much of an issue when just using foreach, but might be an issue with any kind of STL-like algorithms library. (Which is what you're inviting when using C++'s iterator semantics.)

I still think it isn't necessary, but you might be thinking of a use case I haven't. Can you provide an example?