September 10, 2008
Benji Smith wrote:
> Walter Bright wrote:
>> Andrei Alexandrescu wrote:
>>> I hear you. I brought up the same exact design briefly with Bartosz last week. We called it T.invalid. He argued in favor of it. I thought it brings more complication than it's worth and was willing to go with T.init for simplicity's sake. Why deal with two empty states instead of one.
>>>
>>> One nagging question is, what is T.fail for integral types? For pointers fine, one could be found. For chars, fine too. But for integrals I'm not sure that e.g. T.min or T.max is a credible fail value.
>>
>> The T.init value should be that. That's why, for floats, float.init is a NaN. But for many types, there is no such thing as an invalid value, so it really doesn't work for generic code.
> 
> I don't think values necessarily have to be initialized to an invalid value. You could certainly argue that NaN values are valid results of certain computations, and that they're valid in certain contexts.
> 
> The important thing is that they're *uncommon*, and if you see them cropping up all over the place where they shouldn't, you know you have an initialization problem somewhere in your code.
> 
> The same thing could be true for integers, but zero is such a common value that it's tough to spot the origin of the error.
> 
> If signed integers were initialized to min_value and signed integers were initialized to max_value, I think those initialization errors would be easier to track down. Not because the values are illegal, but because they're *uncommon*.
> 
> --benji

I agree.  I use the 0xcdcdcdcd and 0xfefefefe provided by MSVC a lot to track down errors.

-Jowl
September 10, 2008
Andrei Alexandrescu wrote:
> In most slice-based D programming, using bare pointers is not necessary. Could then there be a way to use _only_ ranges and eliminate iterators altogether? A container/range design would be much simpler than one also exposing iterators.

> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

I like this a lot. You've mentioned safety and simplicity, but it also seems to be a more powerful abstraction than STL-style iterators.

Consider a depth-first-search over a tree. You have a start point, an end point, and some internal state (in this case, some kind of stack). The interesting thing is that the required internal state _may depend on the values of the start & end points_.

STL iterators don't model this very well, since they require a symmetry between iterators. Which creates the difficulty of where the internal state should be stored.
You can get away with independent iterators when the relationship between start and end is, "if you perform ++ on start enough times, you reach end". A simple array-style range formalizes this relationship, but the range concept also allows more complex relationships to be expressed.

So I think the value of this approach improves for more complicated iterators than the simple ones used by the STL.
September 10, 2008
On Wed, Sep 10, 2008 at 4:52 PM, Don <nospam@nospam.com.au> wrote:
> Andrei Alexandrescu wrote:
>>
>> In most slice-based D programming, using bare pointers is not necessary. Could then there be a way to use _only_ ranges and eliminate iterators altogether? A container/range design would be much simpler than one also exposing iterators.
>
>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
>
> I like this a lot. You've mentioned safety and simplicity, but it also seems to be a more powerful abstraction than STL-style iterators.
>
> Consider a depth-first-search over a tree. You have a start point, an end point, and some internal state (in this case, some kind of stack). The interesting thing is that the required internal state _may depend on the values of the start & end points_.

Or you can think of it as a current point and an stopping criterion.

> STL iterators don't model this very well, since they require a symmetry between iterators. Which creates the difficulty of where the internal state should be stored.

This is also why I argued in my other post on digitalmars.D that we shouldn't be trying to force the start and end parts of a range to be named with complete symmetry.  They have different purposes.  There will generally be one current point that is relatively active and one stopping criterion that is relatively fixed.

I would like to take back one thing, though.  In another post I said I didn't think using * for getting the current value was a good idea because it would be too hard to grep for.  I hadn't been considering the RandomAccess ranges at that time.  I can't imagine *not* using operators for the random access ranges -- it just makes too much sense.  So if it's ok for random access then it should be ok for forward and bidir to use operators too.   BUT, just as with the random access ranges, I don't think there should be any synonyms.  Just use * as the only way to get the current element of a range.

I think the desire to have a special "*" shortcut is as clear an indication as any that Andrei in his heart of hearts agrees that the two parts of a range are not really symmetric and should not be treated as such.

--bb
September 10, 2008
On Wed, 10 Sep 2008 01:56:40 +0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Andrei Alexandrescu wrote:
>> Manfred_Nowak wrote:
>>> Benji Smith wrote:
>>>
>>>> Given the use of "getNext", I think "hasNext" is a more natural
>>>> choice
>>>
>>> clapping hands.
>>  Walter would love that.
>>  for (R r = getR(); r.hasNext; r.next) { ... }
>>  Look ma, no negation! Oops, I just materialized one with the exclamation sign.
>
> I just discovered a problem with that. hasNext implies I'm supposed to call next to get the thingie. It should be hasFirst, which is less appealing.
>
> Andrei

I usually implement my iterators as follows:

interface Iterator(T) {
    bool isValid();
    T value();
    void moveNext();
}

Usage:

auto it = ...;
while (it.isValid()) {
    auto value = it.value();
    it.moveNext();
}

or

for (auto it = ...; it.isValid(); it.moveNext()) {
    auto value = it.value();
}
September 10, 2008
Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Sergey Gromov wrote:
> > Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> >> A better design would be to define collections that own their contents. For example:
> >>
> >> Array!(int) a(100);
> >>
> >> This time a does own the underlying array. You'd be able to get ranges all over it:
> >>
> >> int[] b = a.all;
> > 
> > I really don't like to have basic language constructs implemented as templates.  It's like Tuple!() which is sorta basic type but requires template trickery to really work with it.
> 
> Well I guess we disagree on a number of issues here. The problem with "sorta basic" types is that the list could go on forever. I'd rather use a language that allows creation of good types from a small core, instead of one that tries to supplant all sorta basic types it could think of.

I think I've got your point here.  D is not Python, it shouldn't do anything high-level in the core.  The notion of range is sufficient to iterate through anything, core (namely foreach) doesn't need to be aware of the collections themselves.

Though I'm not fully convinced.  It's always good to have good defaults. So that you could quickly throw things together, and attend to details later.  So that you could write

string[string] dic;
foreach(k, v; dic) whatever;

Can I do this with your Array!()?  Or should I always use all() even though the Array!() is plain linear and obvious?

> > auto a = new File("name");
> > auto b = new TreeSet!(char);
> > 
> > foreach(ch; a)
> >   b.insert(ch);
> > 
> > foreach(ch; b)
> >     writeln("unique char ", ch);
> > 
> > Here is the problem.  First foreach() naturally and expectedly changes the state of an object a.  Second foreach() naturally and expectedly does not make changes to object b.
> > 
> > Solution:
> > 
> > File is an Input range in your notion.  It supports isEmpty() and getNext(), it is non-copyable (but, note, referenceable).
> 
> You left a crucial detail out. What does getNext() return?

Something.  Documented.  I'd be happy with string, that is, line by line iteration.  It's nice for text dumping, simple configurations, simple internet protocols like POP and FTP, user interaction.  You see, some useful default.  The File could also provide byte range bytes() and dchar range chars() and whatever the author considered feasible.

Note that I wasn't convincing you to change stdio design.  My choice of class names was bad.  I should have used MyFile instead of File, meaning some user class with user functionality.

> In the new std.stdio design, a File is preoccupied with opening, closing, and transferring data for the underlying file. On top of it several input ranges can be constructed - that read lines, blocks, parse text, format text, and so on. (One thing I want is to allow std.algorithm to work with I/O easily and naturally.)

OK, you like this design, no problem.  Better even.  Your File is naturally iterable over its bytes.  Any low-level file is, anybody knows that.  I can see no reason to deny foreach() over a file.

> > foreach() checks if a passed object implements opSlice() so that it can iterate non-destructively. If no opSlice() is provided, it falls back to getNext().
September 10, 2008
On 2008-09-10 01:02:00 +0200, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> Fawzi Mohamed wrote:
>> are you sure? then a range is *exactly* equivalent to a STL iterator, only that it cannot go out of bounds:
>> // left1-left2:
>> while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
>>  i1.next;
>> }
>> // left2-left1:
>> while((!i2.isEmpty) && (!i1.atSamePlace(i2))){
>>  i1.next;
>> }
>> // union 1-2
>> while((!i1.isEmpty) && (!(i1.atSamePlace(i2))){
>>  i1.next;
>> }
>> while(!i2.isEmpty){
>>  i2.next;
>> }
>> // union 2-1
>> ...
>> // lower triangle
>> i1=c.all;
>> while(!i1.isEmpty){
>>  i2=c.all;
>>  while(!i2.isEmpty && !i2.atSamePlace(i1)){
>>    i2.next;
>>  }
>> well these are the operations that you can do on basically all iterators (and with wich you can define new iterators).
>> The one you propose need an underlying total order that can be efficiently checked, for example iterators on trees do not have necessarily this property, and then getting your kind of intersection can be difficult (and not faster than the operation using atSamePlace.
> 
> I am getting seriously confused by this subthread. So are you saying that atSamePlace is your primitive and that you implement the other range operations all in linear time? If I did not misunderstand and that's your design, then you may want to revise that design right now. It will never work. I guarantee it.

It desn't seem to difficult to me, just look at the code, they are iterations on subranges of iterators i1 and i2, actually they are the only kind of range combination that can be performed safely on general iterators.
The range combinations you propose are cumbersome rarely used and in general unsafe, I think that it is a bad idea add them to the object that is needed to get some foreach magic, and the most generic iterator.

atSamePlace returns true if two iterators have .left (or however you call it) at the same place (and in general this might not mean that they have the same address) can be implemented for almost all iterators in constant time (at the moment I cannot think of a counter example), and with it (as the code just above shows) you can define some subranges.

In the case in which you have a easily checkable total ordering between the elements then yes you do have all the all that it is needed to have a real range, and for this range object your subrange operations are safe, and I am not against them, just don't force them on every person that wants just to iterate on something.

>> the only new thing is bidirectional iterator: an iterator that can go in both directions as extra iterator type (your bidirectional range is something different).
> 
> A bidirectional range is simply a range that you can shrink from either end.

That is exactly what I said it the sentence before, but in this sentence I am speaking about a bidirectional *iterator* that for me is an iterator that can move both back and forth.

>> I think it is useful and I don't see the need to shoehorn it into a range. For me an iterator is an object that can generate a sequence by itself, so a range is an example of iterator, but I don't see the need to make each iterator a range.
> 
> I have put forth reasons for doing away with iterators entirely in the range doc. What are your counter-reasons for bringing back iterators?

they are simpler and describe a larger range of useful constructs ranges on liked list as I said are unsafe, which does not meant that ranges are not useful, just that there is a place also for simple iterators.

Iterators can be perfectly safe it is just the C++ idea of un-bundling the end from the iterator that makes it unsafe (an also more cumbersome to use).
If iterator for you for you is too connected with C++ view of it call them generators: a self containde object that can generate a sequence.

bidirectional iterators are a natural step in the progression of iterators also they can be implemented safely.

>> As I said before a range also has a total ordering of the objects that can be easily checked, this is a special king of iterator for me, not the most general. Take two ranges of two linked lists, you cannot easily build your intersections because you don't know their relative order, and checking it is inefficient.
> 
> Correct. The range intersection primitives are Achille's heel of the range-based design. Checking subrange reachability is O(n), so the range intersection primitives take the user by faith. But iterators have that Achille's heel problem too, plus a few arrows in their back :o). The document clarifies this disadvantage by saying that range intersection primitives are undefined if certain conditions are not met.
> 
> In short, this is an endemic problem of an iteration based on either ranges or individual iterators. An objection to that should automatically come with a constructive proof, i.e. a better design.

using atSamePlace you can do it safely on any kind of ranges, I think that the operation available should only be safe, and they can be safe, in general using atSamePlace, and if there is a quickly checkable total ordering (as in arrays for example) by never letting a range be larger than the union of two ranges.
One can then discuss if segment it (and the result would be an iterator but not a range) or choose only one side (keep it a range) if the two ranges are disjoint and have a hole between them.

>>>> So this is what "my" iterator/range would look like :)
>>> 
>>> I encourage you to realize your design. Before long you'll find probably even more issues with it than I mentioned above, but you'll be gained in being better equipped to find proper solutions.
>> 
>> I hope we converge toward a good solution ;)
> 
> Well I haven't seen much code yet.

I have written quite some code in my multidimensional array library ( http://github.com/fawzi/narray ) and I thought a lot about iterators, not only in D, but in the several languages that I know.
As with everybody I don't think to really see all implications of the interface, but I think that I understand them enough to participate meaningfully to this discussion.
Actually the moment I am really busy, but I am trying to say something because I think this discussion is important for the future of D, and I care about it.
I will try to give some real code, but I thought that my contribution was not so difficult to understand, but it is always like this to yourself one always seems clear ;)

Fawzi

September 10, 2008
JAnderson wrote:
> 
> Hi Andrei,
> 
> I like the idea behind ranges.  I don't like C++'s / stl's long winded syntax at all.  Its so large that it generally uses up several lines along with several typedefs etc...  All that work just to iterate over some data.  The longer things get the more error prone they get... how many times have I put an begin when I meant to put end *sigh*.
> 
> However I currently disagree on this point.
> 
> Andrei Alexandrescu wrote:
>  >
>  > Fine. So instead of saying:
>  >
>  > foreach (e; c.all) { ... }
>  >
>  > you can say
>  >
>  > foreach (e; c) { ... }
>  >
>  > I think that's some dubious savings.
> 
> 
> I think its useful to have the implicit range conversion.  Consider writing generic/template code.  Of course built in arrays could provide the .all but then consider passing around ranges.  That would also mean all ranges would also have a .all (could we go .all.all.all for instance?).

There's no regression. There are containers and ranges. Containers have .all. Ranges don't.

I think you guys are making a good point; I'm undecided on what would be better. One not-so-cool part about implicit conversion to range is that all of a sudden all range operations spill into the container. So people try to call c.pop and it doesn't compile. (Why?) They get confused.

> I'm all for compile time checking however I think that implicit .all (with of course an explicit option) will make it easy to change a function that once took an object to take a simple range  Also it would make it easy to change from one way of getting at a range to another.
> 
> What about matrices?  They don't implement default .all, they would provide like .col and .row.

Bidimensional ones that is :o).


Andrei
September 10, 2008
Don wrote:
> Andrei Alexandrescu wrote:
>> In most slice-based D programming, using bare pointers is not necessary. Could then there be a way to use _only_ ranges and eliminate iterators altogether? A container/range design would be much simpler than one also exposing iterators.
> 
>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
> 
> I like this a lot. You've mentioned safety and simplicity, but it also seems to be a more powerful abstraction than STL-style iterators.
> 
> Consider a depth-first-search over a tree. You have a start point, an end point, and some internal state (in this case, some kind of stack). The interesting thing is that the required internal state _may depend on the values of the start & end points_.
> 
> STL iterators don't model this very well, since they require a symmetry between iterators. Which creates the difficulty of where the internal state should be stored.
> You can get away with independent iterators when the relationship between start and end is, "if you perform ++ on start enough times, you reach end". A simple array-style range formalizes this relationship, but the range concept also allows more complex relationships to be expressed.
> 
> So I think the value of this approach improves for more complicated iterators than the simple ones used by the STL.

That's a great insight. Hadn't thought of it!

Andrei
September 10, 2008
On Wed, Sep 10, 2008 at 7:47 AM, Fawzi Mohamed <fmohamed@mac.com> wrote:

>>> 2) All the methods with intersection of iterator in my opinion are
>>> difficult to memorize, and rarely used, I would scrap them.
>>> Instead I would add the comparison operation .atSamePlace(iterator!(T)y)
>>> that would say if two iterators are at the same place. With it one gets back
>>> all the power of pointers, and with a syntax and use that are
>>> understandable.
>>
>> But that comparison operation is not enough to implement anything of substance. Try your hand at a few classic algorithms and you'll see.
>
> are you sure? then a range is *exactly* equivalent to a STL iterator, only
> that it cannot go out of bounds:
> // left1-left2:
> while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
>  i1.next;
> }
> // left2-left1:
> while((!i2.isEmpty) && (!i1.atSamePlace(i2))){
>  i1.next;
> }
> // union 1-2
> while((!i1.isEmpty) && (!(i1.atSamePlace(i2))){
>  i1.next;
> }
> while(!i2.isEmpty){
>  i2.next;
> }
> // union 2-1
> ...
> // lower triangle
> i1=c.all;
> while(!i1.isEmpty){
>  i2=c.all;
>  while(!i2.isEmpty && !i2.atSamePlace(i1)){
>   i2.next;
>  }

Your code shows that you can successfully iterate over the same elements described by Andrei's various unions and differences, but they do not show how you would, say, pass that new range another function to do that job.  Such as you would want to do in say, a recursive sort.  Since in this design you can't set or access the individual iterator-like components of a range directly, being able to copy the begin or end iterator from one range over to another is necessary, I think.

But I think you and I are in agreement that it would be easier and more natural to think of ranges as iterators augmented with information about bounds, as opposed to a contiguous block of things from A to B.

> well these are the operations that you can do on basically all iterators
> (and with wich you can define new iterators).
> The one you propose need an underlying total order that can be efficiently
> checked, for example iterators on trees do not have necessarily this
> property, and then getting your kind of intersection can be difficult (and
> not faster than the operation using atSamePlace.

I don't think that's correct.  Andrei's system does not need a total order any more than yours does.  The unions and diffs just create new ranges by combining the components of existing ranges.  They don't need to know anything about what happens in between those points or how you get from one to the other.  Just take the "begin" of this guy and put it together with the "end" of that guy, for example.  It doesn't require knowing how to get from anywhere to anywhere to create that new range.

--bb
September 10, 2008
Bill Baxter wrote:
> On Wed, Sep 10, 2008 at 4:52 PM, Don <nospam@nospam.com.au> wrote:
>> Andrei Alexandrescu wrote:
>>> In most slice-based D programming, using bare pointers is not necessary.
>>> Could then there be a way to use _only_ ranges and eliminate iterators
>>> altogether? A container/range design would be much simpler than one also
>>> exposing iterators.
>>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
>> I like this a lot. You've mentioned safety and simplicity, but it also seems
>> to be a more powerful abstraction than STL-style iterators.
>>
>> Consider a depth-first-search over a tree. You have a start point, an end
>> point, and some internal state (in this case, some kind of stack). The
>> interesting thing is that the required internal state _may depend on the
>> values of the start & end points_.
> 
> Or you can think of it as a current point and an stopping criterion.

My design intently supports forward iteration with sentinel (e.g. a singly-linked list iterator that only has one node pointer and knows it's done when it hits null) and also forward iteration that holds both limits (e.g. a singly-linked list iterator that holds TWO node pointers and knows it's done when they are equal). That's why forward iterators never support a subrange "up to the beginning of some other range" because that would rule out sentinel-terminated iterators.

It intently does not support things like zero-terminated strings as random iterators. Why? Because there's no safe way of implementing indexing.

I am glad you noticed all this. It's quite subtle.

>> STL iterators don't model this very well, since they require a symmetry
>> between iterators. Which creates the difficulty of where the internal state
>> should be stored.
> 
> This is also why I argued in my other post on digitalmars.D that we
> shouldn't be trying to force the start and end parts of a range to be
> named with complete symmetry.  They have different purposes.  There
> will generally be one current point that is relatively active and one
> stopping criterion that is relatively fixed.

They do have different purposes and they are asymmetric. But as far as I could tell in reimplementing std.algorithm that asymmetry does not need to spill into the interface.

There is one imperfection: there are forward iterators that can implement subranges "up to the beginnig of some other range". They are not categorized in my design.

> I would like to take back one thing, though.  In another post I said I
> didn't think using * for getting the current value was a good idea
> because it would be too hard to grep for.  I hadn't been considering
> the RandomAccess ranges at that time.  I can't imagine *not* using
> operators for the random access ranges -- it just makes too much
> sense.  So if it's ok for random access then it should be ok for
> forward and bidir to use operators too.   BUT, just as with the random
> access ranges, I don't think there should be any synonyms.  Just use *
> as the only way to get the current element of a range.
> 
> I think the desire to have a special "*" shortcut is as clear an
> indication as any that Andrei in his heart of hearts agrees that the
> two parts of a range are not really symmetric and should not be
> treated as such.

Walter doesn't like "*" :o(.


Andrei