RFC on range design for D2 (page 11) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Announce » RFC on range design for D2 (page 11)

September 09, 2008

Re: RFC on range design for D2

Posted by Steven Schveighoffer
in reply to Derek Parnell

Steven Schveighoffer

Posted in reply to Derek Parnell

"Derek Parnell" wrote
> On Tue, 09 Sep 2008 10:45:53 -0500, Andrei Alexandrescu wrote:
>
>
>>          ref int left() { ... }
>
> Is "left" a "movement in a specific direction" as in "go left at the next lights" or the "amount of stuff left over"? It is a bit ambiguous. Even if it is a direction, is it moving towards the first or the last item? It is not self-evident. As a user of the Latin alphabet I'd assume it was going towards the first item but a Hebrew or Arabic users might assume it was heading towards the end.

It means 'left-most element in the range'.  It gets you the first element in the range (i.e. the next element to iterate) without modifying the range.

I agree that it is very misleading, but I think Andrei is exploring other possibilities (see other threads).

-Steve

September 09, 2008

Re: RFC on range design for D2

Posted by Fawzi Mohamed
in reply to Andrei Alexandrescu

Fawzi Mohamed

Posted in reply to Andrei Alexandrescu

On 2008-09-09 18:09:28 +0200, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> Fawzi Mohamed wrote:
>> It is a nice idea to redesign the iterator and range.
>> I think that the proposal is not bad, but I have some notes about it, and some things that I would have done differently.
>> 
>> 1) The simplest interface input (range) is just
>> bool isEmpty();
>> T next();
>> iterator(T) release();
> 
> Actually next is getNext, and release returns R (the range type).
> 
>> Thefirst two I fully agree on, the second one I suppose is there to allow resources to be released and possibly transfer the data to another iterator.. is it really needed?
> 
> Yes. Consider findAdjacent that finds two equal adjacent elements in a collection:
> 
> Range findAdjacent(alias pred = "a == b", Range)(Range r)
> {
>      if (r.isEmpty) return r;
>      auto ahead = r;
>      ahead.next;
>      for (; !ahead.isEmpty; r.next, ahead.next)
>          if (binaryFun!(pred)(r.first, ahead.first)) return r;
>      }
>      return ahead;
> }
> 
> The whole implementation fundamentally rests on the notion that you can copy a range into another, and that you can iterate the collection independently with two distinct ranges. If that's not true, findAdjacent will execute yielding nonsensical results.
> 
> Input iterators are not copyable. With an input iterator "auto ahead = r;" will not compile. But they are movable. So you can relinquish control from one iterator to the other.

ok I understand, indeed it is useful to have non copyable "unique" iterators, even if they are not the common iterators (actually I think it is potentially even more important for output iterators).

>> Now I would see this simplest thing (let me call it iterator) as the basic objects for foreach looping.
>> *all* things on which foreach loops should be iterators.
>> If an object is not a iterator there should be a standard way to convert it to one (.iterator for example).
>> So if the compiler gets something that is not a iterator it tries to see if .iterator is implemented and if it is it calls it and iterates on it.
>> This let many objects have a "default" iterator. Obviously an object could have other methods that return an iterator.
> 
> Fine. So instead of saying:
> 
> foreach (e; c.all) { ... }
> 
> you can say
> 
> foreach (e; c) { ... }
> 
> I think that's some dubious savings.

I think it is useful, but not absolutely necessary.

>> 2) All the methods with intersection of iterator in my opinion are difficult to memorize, and rarely used, I would scrap them.
>> Instead I would add the comparison operation .atSamePlace(iterator!(T)y) that would say if two iterators are at the same place. With it one gets back all the power of pointers, and with a syntax and use that are understandable.
> 
> But that comparison operation is not enough to implement anything of substance. Try your hand at a few classic algorithms and you'll see.

are you sure? then a range is *exactly* equivalent to a STL iterator, only that it cannot go out of bounds:
// left1-left2:
while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
 i1.next;
}
// left2-left1:
while((!i2.isEmpty) && (!i1.atSamePlace(i2))){
 i1.next;
}
// union 1-2
while((!i1.isEmpty) && (!(i1.atSamePlace(i2))){
 i1.next;
}
while(!i2.isEmpty){
 i2.next;
}
// union 2-1
...
// lower triangle
i1=c.all;
while(!i1.isEmpty){
 i2=c.all;
 while(!i2.isEmpty && !i2.atSamePlace(i1)){
   i2.next;
 }
well these are the operations that you can do on basically all iterators (and with wich you can define new iterators).
The one you propose need an underlying total order that can be efficiently checked, for example iterators on trees do not have necessarily this property, and then getting your kind of intersection can be difficult (and not faster than the operation using atSamePlace.

>> I understand the idea of covering all possibilities, if one wants it with .atSamePlace a template can easily construct all possible intersection iterators. Clearly calling recursively such a template is inefficient, but I would say the then one should use directly a pair of iterators (in the worst case one could make a specialization that implements it more efficiently for the types that support it).
>> 
>> 3) copying: I would let the user freely copy and duplicate iterators if needed.
> 
> I like freedom too. But that kind of freedom is incorrect for input iterators.

Now I realized it, thanks.

>> 4) input-output
>> I think that the operations proposed are sound, I like them
> 
> Then you got to accept the consequences :o).

yes :)

>> 5) hierarchy of iterators
>> I would classify the iterator also along another axis: size
>> infinite (stream) - finite (but unknown size) - bounded (finite and known size)
> 
> Distinguishing such things can be of advantage sometimes, and could be added as a refinement to the five categories if shown useful.

well if an iterator knows its size, and you want to use it to initialize an array for example...

>> The other classification:
>> forward iterator (what I called iterator until now)

>> bidirectional range: I understand this, these are basically two iterators one from the beginning and the other from the end that are coupled together. I find it a little bit strange, I would just expect to have a pair of iterators... but I see that it might be useful
>> bidirectional iterator: this is a doubly linked list, I think that this class of iterators cannot easily be described just as a range, it often needs three points (start,end,actual_pos), I think has its place (and is not explicitly present in your list)
>> random_iterator: (this could be also called array type or linear indexed type).
> 
> I can't understand much of the above, sorry.

the only new thing is bidirectional iterator: an iterator that can go in both directions as extra iterator type (your bidirectional range is something different).
I think it is useful and I don't see the need to shoehorn it into a range. For me an iterator is an object that can generate a sequence by itself, so a range is an example of iterator, but I don't see the need to make each iterator a range.
As I said before a range also has a total ordering of the objects that can be easily checked, this is a special king of iterator for me, not the most general. Take two ranges of two linked lists, you cannot easily build your intersections because you don't know their relative order, and checking it is inefficient.

> 
>> So this is what "my" iterator/range would look like :)
> 
> I encourage you to realize your design. Before long you'll find probably even more issues with it than I mentioned above, but you'll be gained in being better equipped to find proper solutions.

I hope we converge toward a good solution ;)

Fawzi
> 
> 
> Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Bill Baxter

Andrei Alexandrescu

Posted in reply to Bill Baxter

Bill Baxter wrote:
> On Wed, Sep 10, 2008 at 12:30 AM, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>> I've updated http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
>> with a drawing (right at the beginning) that illustrates what primitives we
>> need. Maybe this will help us choose even better names.
> 
> The text says that s should be a subrange of r, but the drawing shows
> s extending beyond r.  Does it actually need to be a subrange?

I am considering relaxing the requirements.

Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Fawzi Mohamed

Andrei Alexandrescu

Posted in reply to Fawzi Mohamed

Fawzi Mohamed wrote:
> are you sure? then a range is *exactly* equivalent to a STL iterator, only that it cannot go out of bounds:
> // left1-left2:
> while((!i1.isEmpty) && (!i1.atSamePlace(i2))){
>  i1.next;
> }
> // left2-left1:
> while((!i2.isEmpty) && (!i1.atSamePlace(i2))){
>  i1.next;
> }
> // union 1-2
> while((!i1.isEmpty) && (!(i1.atSamePlace(i2))){
>  i1.next;
> }
> while(!i2.isEmpty){
>  i2.next;
> }
> // union 2-1
> ...
> // lower triangle
> i1=c.all;
> while(!i1.isEmpty){
>  i2=c.all;
>  while(!i2.isEmpty && !i2.atSamePlace(i1)){
>    i2.next;
>  }
> well these are the operations that you can do on basically all iterators (and with wich you can define new iterators).
> The one you propose need an underlying total order that can be efficiently checked, for example iterators on trees do not have necessarily this property, and then getting your kind of intersection can be difficult (and not faster than the operation using atSamePlace.

I am getting seriously confused by this subthread. So are you saying that atSamePlace is your primitive and that you implement the other range operations all in linear time? If I did not misunderstand and that's your design, then you may want to revise that design right now. It will never work. I guarantee it.

> the only new thing is bidirectional iterator: an iterator that can go in both directions as extra iterator type (your bidirectional range is something different).

A bidirectional range is simply a range that you can shrink from either end.

> I think it is useful and I don't see the need to shoehorn it into a range. For me an iterator is an object that can generate a sequence by itself, so a range is an example of iterator, but I don't see the need to make each iterator a range.

I have put forth reasons for doing away with iterators entirely in the range doc. What are your counter-reasons for bringing back iterators?

> As I said before a range also has a total ordering of the objects that can be easily checked, this is a special king of iterator for me, not the most general. Take two ranges of two linked lists, you cannot easily build your intersections because you don't know their relative order, and checking it is inefficient.

Correct. The range intersection primitives are Achille's heel of the range-based design. Checking subrange reachability is O(n), so the range intersection primitives take the user by faith. But iterators have that Achille's heel problem too, plus a few arrows in their back :o). The document clarifies this disadvantage by saying that range intersection primitives are undefined if certain conditions are not met.

In short, this is an endemic problem of an iteration based on either ranges or individual iterators. An objection to that should automatically come with a constructive proof, i.e. a better design.

>>> So this is what "my" iterator/range would look like :)
>>
>> I encourage you to realize your design. Before long you'll find probably even more issues with it than I mentioned above, but you'll be gained in being better equipped to find proper solutions.
> 
> I hope we converge toward a good solution ;)

Well I haven't seen much code yet.

Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Steven Schveighoffer

Andrei Alexandrescu

Posted in reply to Steven Schveighoffer

Steven Schveighoffer wrote:
> "Derek Parnell" wrote
>> On Tue, 09 Sep 2008 10:45:53 -0500, Andrei Alexandrescu wrote:
>>
>>
>>>          ref int left() { ... }
>> Is "left" a "movement in a specific direction" as in "go left at the next
>> lights" or the "amount of stuff left over"? It is a bit ambiguous. Even if
>> it is a direction, is it moving towards the first or the last item? It is
>> not self-evident. As a user of the Latin alphabet I'd assume it was going
>> towards the first item but a Hebrew or Arabic users might assume it was
>> heading towards the end.
> 
> It means 'left-most element in the range'.  It gets you the first element in the range (i.e. the next element to iterate) without modifying the range.
> 
> I agree that it is very misleading, but I think Andrei is exploring other possibilities (see other threads).

Finally the coin dropped on the Arabic/Hebrew cultural thing. I don't think they'd be offended. This is not writing. Left is left and right is right in math.

But yes... first and last are in I guess. I'd also like *r as a shortcut for r.first, as it will be no doubt used very intensively.


Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to bearophile

Andrei Alexandrescu

Posted in reply to bearophile

bearophile wrote:
> Andrei Alexandrescu:
>> In Phobos things will be something like:
> 
> List!(Filedesc) lst;
> for (auto r = lst.all; !r.isEmpty; ) {
>      if (should_remove)
>           r = lst.erase(take(1, r));
>      else
>           r.next;
> }
> 
> It may be better to invent and add some sugar to that, and foreach helps, maybe something like:
> 
> List!(Filedesc) lst;
> foreach (ref r; lst.all) {
>      if (predicate(lst.item(r)))
>           r = lst.erase(r);
>      else
>           r = r.next();
> }
> 
> I think that code of mine isn't good yet :-)

Wow, that's risky. foreach bumps r under the hood, so you'll skip some elements.

Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Steven Schveighoffer

Andrei Alexandrescu

Posted in reply to Steven Schveighoffer

Steven Schveighoffer wrote:
> "Andrei Alexandrescu" wrote
> I thought you stated that 'pointers' shouldn't be allowed, only ranges?  In general, I agree with that, but I think the ability to use a pointer type instead of ranges has advantages in some cases.

I think there's a little confusion. There's three things:

1. Ranges
2. Iterators
3. Pointers, e.g. the exact address where the object sits in memory

My address uses 1 and drops 2. You still have access to 3 if you so need.

void showAddresses(R)(R r)
{
    for (size_t i = 0; !r.isEmpty; r.next, ++i)
    {
        writeln("Element ," i, " is sitting at address: ", &(r.first));
    }
}

Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Benji Smith

Andrei Alexandrescu

Posted in reply to Benji Smith

Benji Smith wrote:
> Andrei Alexandrescu wrote:
>> I put together a short document for the range design. I definitely missed about a million things and have been imprecise about another million, so feedback would be highly appreciated. See:
>>
>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
> 
> Just thinking off the top of my head...
> 
> How well would the proposal support a producer/consumer work queue, or a signal/slot implementation?
> 
> A work-queue consumer would view the queue as an infinite range with no end, but the producer would view that same queue as an infinite range with no beginning. And, conceivably, you could have "conduits" between the producer and consumer that would view that same queue as having neither a beginning nor an end.
> 
> I'm making no judgment about whether the proposal supports or doesn't support that kind of model. I'm just putting the idea out there for consideration.

I think it's great to bring the idea up for discussion. The current design does not cater to such uses yet.

Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Derek Parnell

Andrei Alexandrescu

Posted in reply to Derek Parnell

Derek Parnell wrote:
> On Tue, 09 Sep 2008 10:30:58 -0500, Andrei Alexandrescu wrote:
> 
> 
>> I'd like to go with:
>>
>> r.first
>> r.last
>> r.next
>> r.pop
> 
> LOL ... I was just thinking to myself ... "what's wrong with First and
> Last? I should suggest them." then I read this post.
> 
> "next" is fine, but "pop"? Isn't the pair of "next" called "prev(ious)" and
> the pair of "pop" called "push". So please, either have next/prev or
> push/pop, and in that case push/pop looks quite silly.

Previous is confusing as it suggest I'm moving back where I came from. In reality I shrink the range from the other end. So we need:

"Shrink the range from the left end"
"Shrink the range from the right end"

The first will be used much more often than the second.


Andrei

September 09, 2008

Re: RFC on range design for D2

Posted by Steven Schveighoffer
in reply to Andrei Alexandrescu

Steven Schveighoffer

Posted in reply to Andrei Alexandrescu

"Andrei Alexandrescu" wrote
> Steven Schveighoffer wrote:
>> "Andrei Alexandrescu" wrote
>> I thought you stated that 'pointers' shouldn't be allowed, only ranges?
>> In general, I agree with that, but I think the ability to use a pointer
>> type instead of ranges has advantages in some cases.
>
> I think there's a little confusion. There's three things:
>
> 1. Ranges
> 2. Iterators
> 3. Pointers, e.g. the exact address where the object sits in memory

Yes, I have been using the terms iterator and pointer interchangably, my bad :)  I look at pointers as a specialized type of iterator, ones for which only 'dereference' is defined (and on contiguous memory types such as arrays, increment and decrement).

>
> My address uses 1 and drops 2. You still have access to 3 if you so need.
>
> void showAddresses(R)(R r)
> {
>     for (size_t i = 0; !r.isEmpty; r.next, ++i)
>     {
>         writeln("Element ," i, " is sitting at address: ", &(r.first));
>     }
> }

Let me explain by example:

HashMap!(uint, myResource) resources;

....

// returns something that allows me to later remove the element
auto r = resources.find(key);

useResource(r);

resources[newkey] = new myResource;

resources.erase(r);

Now, assuming that adding the new resource rehashes the hash map, what is in r such that it ONLY points to the single resource?  A marker saying 'only one element'?  Perhaps you just deleted a range you didn't mean to delete, when you only wanted to delete a single resource.  Perhaps r is now considered 'invalid'.  Granted, this example can be fixed by reordering the lines of code, and perhaps you don't care about the penalty of looking up the key again, but what if I want to save the iterator to the resource somewhere and delete it later in another function?  And what if the cost of lookup for removal is not as quick?

I think with a range being the only available 'iterator' type for certain containers may make life difficult for stuff like this.  I really don't think iterator is the right term for what I think is needed, what I think is needed is a dumbed down pointer.  Something that has one operation -- opStar.  No increment, no decrement, just 'here is a reference to this element'  that can be passed into the container to represent a pointer to a specific element.

-Steve

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation