RFC on range design for D2 (page 18) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Announce » RFC on range design for D2 (page 18)

September 10, 2008

Re: RFC on range design for D2

Posted by Steven Schveighoffer
in reply to Andrei Alexandrescu

Steven Schveighoffer

Posted in reply to Andrei Alexandrescu

"Andrei Alexandrescu" wrote
> Steven Schveighoffer wrote:
>> "Andrei Alexandrescu" wrote
>>> Notice that "a range can't grow" is different from "a range can't be assigned from a larger range". In particular, a range operation can return a range larger than both input ranges. But not larger than their union :o).
>>
>> Yes, so every time you add an element you have to update your forward range from the 'all' range so it includes the new element at the end. Every time you remove an element, you have to update your reverse range from the 'all' range so it excludes the element at the beginning. Failure to do this results in invalid ranges, which seems to me like a lot more work than simply not doing anything  (in the case of an iterator).  The pitfalls of using ranges for dynamically changing containers might outweigh the advantages that they provide in certain cases.
>
> No, this is incorrect. I don't "have to" at all. I could define the behavior of range as you mention, or I could render them undefined. Iterators invalidate anyway at the drop of a hat, so they're none the wiser. You can't transform a lack of an advantage into a disadvantage.

A range or iterator that becomes undefined when adding an element to a linked list or removing an element from a linked list (provided you don't remove the element in question) makes it useless for this type of purpose. What I want is a cursor that saves the position of an element, not the end and beginning.

Here is what I'm assuming a range consists of, and granted this is an assumption since I didn't look at any of your implementation, and a list object which uses ranges doesn't even exist AFAIK.  Assume that integers below are individual elements

all: 0 1 2 3 4 5 6 7 8 9 E
reverse range: 0 1 2 3 4
forward range: 5 6 7 8 9 E

Now I remove an element from the front:

all: 1 2 3 4 5 6 7 8 9 E
reverse range: ? 1 2 3 4
forward range: 5 6 7 8 9 E

I've now lost my reverse iterator because it's no longer valid, but I can reconstruct it by diffing the forward iterator and list.all.

If I add to the end I got a similar situation, I can reconstruct my forward iterator by diffing list.all and the reverse iterator.

Yes, it can be done, but it seems like more work than it's worth for this case.  The problem is, not only do I have to pay attention to what the end and beginning of the list are (as I would with an iterator), but I also have to pay attention to the same pieces in the ranges.  So ranges (in this implementation) have given me more work to do, and their still not safe because I could mistakenly use an invalid range.

> "Look at this pineapple. It's fresher than the other, and bigger too."
>
> "No, it's about as big. That pineapple sucks."

???

>
>>>> My belief is that ranges should be the form of input to algorithms, but iterators should be provided for using containers as general data structures.  Similar to how strings are represented by arrays/slices, but iterators (pointers) exist if you need them.
>>> If we agree it's better without iterators if not needed, we'd need a strong case to add them. Right now I have a strong case against them.
>>
>> I don't need to worry about whether you have them or not, I can always implement them on my own ;)  Really, range/iterator support doesn't require direct support from the compiler (except for builtin arrays), and any improvements made to the compiler to support ranges (such as reference returns, etc) can be applied to iterators as well.
>>
>> I think ranges are an excellent representation when a range of elements is needed.  I think a cursor or iterator is an excellent representation when an individual position is needed.
>>
>>>> I'll probably move forward with this model in dcollections, I really like the range idea, and in general the view on how ranges are akin to slices. But I also like having access to iterators for other functions.
>>> Which functions?
>>
>> Functions which take or return a single position.  Such as 'erase the element at this position' or 'find the position of element x'.
>
> I agree. In fact I agreed in my original document, which I quote: ``Coding with ranges also has disadvatages. Some algorithms work naturally with individual iterators in the "middle" of a range. A range-based implementation would have to maintain a redundant range spanning e.g. from the beginning of the container to that middle.''
>
> However, I could meanigfully rewrite std.algorithm to work on ranges alone. The disadvantage does exist but is minor, For example, find does not return an iterator. It simply shrinks its input range until the element is found, or until it is empty. That way you can nicely use the result of find iteratively.

I totally agree with you that ranges are the way to go for std.algorithm.  I am not debating that.

But you have no example of how iterators and ranges compare for using non-array containers in situations BESIDES running std.algorithm.  I'm showing you an example, which happens to model after code I actually wrote and use, where iterators seem to be more suited for the task.

> Range find(alias pred = "a == b", Range, E)(Range haystack, E needle)
> {
>     alias binaryFun!(pred) test;
>     for (; !haystack.isEmpty; haystack.next)
>     {
>         if (test(haystack.first, needle)) break;
>     }
>     return haystack;
> }
>
> This is quite a few bites smaller than the previous version, which is now to be found in std.algorithm:
>
> Iterator!(Range) find(alias pred = "a == b", Range, E)(Range haystack, E
> needle)
> {
>     alias binaryFun!(pred) test;
>     for (auto i = begin(haystack); i != end(haystack); ++i)
>     {
>         if (test(*i, needle)) return i;
>     }
>     return end(haystack);
> }
>
> It has two less variables, and VERY importantly, one less type to deal with. Arguments aired against primitive ranges systematically omit this important simplification they bring. When you don't weigh in the advantages, of course all there is to be seen are the disadvantages.
>
> Better yet, when find does return, client code's in better shape because it doesn't need to compare the result against end(myrange). It can just test whether it's empty and be done with.

Unless myrange has changed since you called find.  In which case you have to run find again to get the range?

> So a newcomer to D2 would have to have an understanding of containers and ranges. Containers own data. They offer various traversals to crawl them in the form of ranges. Ranges are generalized slices.
>
> If iterators are thrown into the mix, things get more complex because iterators are a lower-level primitive, a generalized pointer. So the newcomer would have to ALSO understand iterators and deal with functions that require or return either. They'd have to learn how to pair iterators from ranges and how to extract iterators from ranges (more primitives). They'd also have to understand when it's better to hold on to a range (most of the time) versus a naked iterator (seldom and for a dubious convenience).
>
> I /understand/ there are advantages to iterators. Just don't forget the cost when discussing them.

I don't forget the cost.  I absolutely *100%* agree that ranges are a much better representation for std.algorithm.  i.e. when a RANGE OF VALUES is required.  When you want references SINGLE ELEMENTS that persist across container changes, I think the best implementation is a cursor/iterator.  I think they can both exist.  I think there is value to having a pointer to a single element without storing the boundaries with that pointer.

This is just like the const debate that I continue to have with you and Walter.  You want const for different reasons than for what I want const.  I want const for contracts, and you want it for pure functions.  You seem to dismiss anything that isn't in your realm of requirements as 'dubious' and 'seldom used'.  Other people have requirements that are different than yours, and are just as valid.

> I am also sure that if I sat down long enough contemplating my navel I
> could come with more examples of iterators=good/ranges=bad.
> <snip>

Now you're just being rude :)  Please note that I'm not attacking you personally.  All I'm pointing out is that your solution solves certain problems VERY well, but leaves other problems not solved.  I think allowing iterators/cursors would solve all the problems.  I might be proven wrong, but certainly I don't think you've done that so far.  I'd love to be proven wrong, since I agree that iterators are generally unsafe.

-Steve

September 10, 2008

Re: RFC on range design for D2

Posted by Fawzi Mohamed
in reply to Steven Schveighoffer

Fawzi Mohamed

Posted in reply to Steven Schveighoffer

I am sorry I hadn't the time t fully follow the discussion, but I took some time to actually define how I think a basic iterator should be, in this I think I am quite close to Bill and Steven from what I could see.

Again I am not against ranges, ranges are nice, but iterators are more general, and in my opinion they should be what foreach targets.
Then ranges can basically trivially be an iterator that has more structure (compareLevel= FullOrdering) and more

basic idea:
an iterator has a position in a sequence+ possibility to move into it


Basic Iterator

// return element and go to next
// (reasons: most common use, only one function call (good if not inlined), also o for iterators on data that is not stored)
// throw an exception if you iterate past end
T next();
void transferTo(ref R it2) // transfer this iterator to it2 (un-copyable iterators)
void stop(); // stop the iteration (might release resources)
size_t nElNext(); // number of elements, constant time, 0 if empty
ComparePos comparePos(R it2); // comparison of position, has to be in constant time
static const CompareLevel compareLevel; // compare level (for compile time choices)
static const SizeLevel sizeLevel; // size level (for compile time choices)

Copyable Generator: Iterator
T value; // return the actual value
opAssign(R); // copies the iterator

A range is obviously also a Basic iterator, but has more structure

* Bidirectional Iterator
size_t nElPrev; // number of elements, constant time, 0 if empty
T prev; // goes to previous element

// constants
enum ComparePos:int {
   Uncomparable, // might be bigger smaller or incompatible (Same only if compareLevel==CompareLevel.None)
   Incompatible, // ranges of two different sequences
   Same, // at the same position
   Growing, // in growing order
   Descending // in decreasing order
}
enum CompareLevel:int {
   None, // no comparison
   Equal, // can decide if they are at the same position in constant time
   FullOrdering // can compare all iterators in constant time
}
enum SizeLevel:int{
   Bounded, // finite and known size
   Finite, // finite but possibly unknown size
   MaybeFinite, // maybe finite, maybe infinite
   Infinite // infinite
}
const INFINITE_SIZE=size_t.max;
const MAYBE_INFINITE=size_t.max-1;
const UNKNOWN_FINITE=size_t.max-2;

September 10, 2008

Re: RFC on range design for D2

Posted by Sergey Gromov
in reply to Steven Schveighoffer

Sergey Gromov

Posted in reply to Steven Schveighoffer

Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> "Andrei Alexandrescu" wrote
> > Steven Schveighoffer wrote:
> >> "Andrei Alexandrescu" wrote
> >>> Notice that "a range can't grow" is different from "a range can't be assigned from a larger range". In particular, a range operation can return a range larger than both input ranges. But not larger than their union :o).
> >>
> >> Yes, so every time you add an element you have to update your forward range from the 'all' range so it includes the new element at the end. Every time you remove an element, you have to update your reverse range from the 'all' range so it excludes the element at the beginning. Failure to do this results in invalid ranges, which seems to me like a lot more work than simply not doing anything  (in the case of an iterator).  The pitfalls of using ranges for dynamically changing containers might outweigh the advantages that they provide in certain cases.
> >
> > No, this is incorrect. I don't "have to" at all. I could define the behavior of range as you mention, or I could render them undefined. Iterators invalidate anyway at the drop of a hat, so they're none the wiser. You can't transform a lack of an advantage into a disadvantage.
> 
> A range or iterator that becomes undefined when adding an element to a linked list or removing an element from a linked list (provided you don't remove the element in question) makes it useless for this type of purpose. What I want is a cursor that saves the position of an element, not the end and beginning.
> 
> Here is what I'm assuming a range consists of, and granted this is an assumption since I didn't look at any of your implementation, and a list object which uses ranges doesn't even exist AFAIK.  Assume that integers below are individual elements
> 
> all: 0 1 2 3 4 5 6 7 8 9 E
> reverse range: 0 1 2 3 4
> forward range: 5 6 7 8 9 E
> 
> Now I remove an element from the front:
> 
> all: 1 2 3 4 5 6 7 8 9 E
> reverse range: ? 1 2 3 4
> forward range: 5 6 7 8 9 E
> 
> I've now lost my reverse iterator because it's no longer valid, but I can reconstruct it by diffing the forward iterator and list.all.
> 
> If I add to the end I got a similar situation, I can reconstruct my forward iterator by diffing list.all and the reverse iterator.

You don't mention here which iterator usage pattern you are trying to model with ranges.  I can think of at least two.

1.  You use a single bidirectional 'center' iterator, center == 5.  As one would naturally do with iterators.  Note then that whenever you use your center for, say, backward iteration, you reconstruct the actual range by calling list.begin.  You do it on each iteration.  No wonder it stays valid even if you remove the first element in the meantime: you're constructing your range from scratch anyway.  If you want to model this pattern with ranges---no problem, keep an empty 'center' range, center == (5,5), and reconstruct backward iteration range,

reverse = all.before(center);

whenever you need to iterate, then

center = reverse.end;

This 'center' range, being slightly less efficient, stays valid and becomes invalid in exactly the same conditions as your classical iterator.

2.  You use 3 iterators, one for the list start, one for the center, and one for the end.  In this case the 'start' iterator becomes invalid after removing the first element exactly like 'reverse' range becomes invalid in your example, with exactly the same consequences.

September 10, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Steven Schveighoffer

Andrei Alexandrescu

Posted in reply to Steven Schveighoffer

Steven Schveighoffer wrote:
>> I am also sure that if I sat down long enough contemplating my navel I
>> could come with more examples of iterators=good/ranges=bad.
>> <snip>
> 
> Now you're just being rude :)  Please note that I'm not attacking you personally.  All I'm pointing out is that your solution solves certain problems VERY well, but leaves other problems not solved.  I think allowing iterators/cursors would solve all the problems.  I might be proven wrong, but certainly I don't think you've done that so far.  I'd love to be proven wrong, since I agree that iterators are generally unsafe.

Didn't mean to. You are making great points, and I hope (without being sure) they can be addressed. The "contemplating navel" thing is a fave quote of mine from Bjarne's book on C++.

Andrei

September 10, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Sergey Gromov

Andrei Alexandrescu

Posted in reply to Sergey Gromov

Sergey Gromov wrote:
> Steven Schveighoffer <schveiguy@yahoo.com> wrote:
>> "Andrei Alexandrescu" wrote
>>> Steven Schveighoffer wrote:
>>>> "Andrei Alexandrescu" wrote
>>>>> Notice that "a range can't grow" is different from "a range can't be assigned from a larger range". In particular, a range operation can return a range larger than both input ranges. But not larger than their union :o).
>>>> Yes, so every time you add an element you have to update your forward range from the 'all' range so it includes the new element at the end. Every time you remove an element, you have to update your reverse range from the 'all' range so it excludes the element at the beginning. Failure to do this results in invalid ranges, which seems to me like a lot more work than simply not doing anything  (in the case of an iterator).  The pitfalls of using ranges for dynamically changing containers might outweigh the advantages that they provide in certain cases.
>>> No, this is incorrect. I don't "have to" at all. I could define the
>>> behavior of range as you mention, or I could render them undefined.
>>> Iterators invalidate anyway at the drop of a hat, so they're none the
>>> wiser. You can't transform a lack of an advantage into a disadvantage.
>> A range or iterator that becomes undefined when adding an element to a linked list or removing an element from a linked list (provided you don't remove the element in question) makes it useless for this type of purpose. What I want is a cursor that saves the position of an element, not the end and beginning.
>>
>> Here is what I'm assuming a range consists of, and granted this is an assumption since I didn't look at any of your implementation, and a list object which uses ranges doesn't even exist AFAIK.  Assume that integers below are individual elements
>>
>> all: 0 1 2 3 4 5 6 7 8 9 E
>> reverse range: 0 1 2 3 4
>> forward range: 5 6 7 8 9 E
>>
>> Now I remove an element from the front:
>>
>> all: 1 2 3 4 5 6 7 8 9 E
>> reverse range: ? 1 2 3 4
>> forward range: 5 6 7 8 9 E
>>
>> I've now lost my reverse iterator because it's no longer valid, but I can reconstruct it by diffing the forward iterator and list.all.
>>
>> If I add to the end I got a similar situation, I can reconstruct my forward iterator by diffing list.all and the reverse iterator.
> 
> You don't mention here which iterator usage pattern you are trying to model with ranges.  I can think of at least two.
> 
> 1.  You use a single bidirectional 'center' iterator, center == 5.  As one would naturally do with iterators.  Note then that whenever you use your center for, say, backward iteration, you reconstruct the actual range by calling list.begin.  You do it on each iteration.  No wonder it stays valid even if you remove the first element in the meantime: you're constructing your range from scratch anyway.  If you want to model this pattern with ranges---no problem, keep an empty 'center' range, center == (5,5), and reconstruct backward iteration range,
> 
> reverse = all.before(center);
> 
> whenever you need to iterate, then
> 
> center = reverse.end;
> 
> This 'center' range, being slightly less efficient, stays valid and becomes invalid in exactly the same conditions as your classical iterator.
> 
> 2.  You use 3 iterators, one for the list start, one for the center, and one for the end.  In this case the 'start' iterator becomes invalid after removing the first element exactly like 'reverse' range becomes invalid in your example, with exactly the same consequences.

I'm acquiring the nagging feeling that Sergey understands ranges better than I do. I could understand how to address Steven's point only after reading the post above. Thanks, Sergey.

Andrei

September 10, 2008

Re: RFC on range design for D2

Posted by Sean Kelly
in reply to Sergey Gromov

Sean Kelly

Posted in reply to Sergey Gromov

Sergey Gromov wrote:
> 
> Now the same algorithm in ranges:
> 
>> Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
>> {
>>     for (;;)
>>     {
>>         if (s1.isEmpty())
>>             dst[] = s2[];
>>         else if (s2.isEmpty())
>>             dst[] = s1[];

I'm not sure the above is correct.  It should return after the copy is performed, and the code also assumes that the size of dst is equal to the size of s2 and s1, respectively.

>>         else if (s1.last < s2.last)
>>         {
>>             dst.last = s1.last;
>>             s1.shrink();
>>         }
>>         else
>>         {
>>             dst.last = s2.last;
>>             s2.shrink();
>>         }
>>         dst.shrink();
>>     }
>> }
> 
> If there were shrink-on-read and (eureka!) shrink-on-write operations, it would be even shorter:
> 
>>         else if (s1.last < s2.last)
>>             dst.putBack(s1.getBack());
>>         else
>>             dst.putBack(s2.getBack());
> 
> where both getBack() and putBack() shrink the range from the end side.

Very slick.


Sean

September 10, 2008

Re: RFC on range design for D2

Posted by Sergey Gromov
in reply to Andrei Alexandrescu

Sergey Gromov

Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> I'm acquiring the nagging feeling that Sergey understands ranges better than I do. I could understand how to address Steven's point only after reading the post above. Thanks, Sergey.

Thank you, and welcome! ;)

P.S. I really love the looks of "reverse = all.before(center);" It's like writing program in plain English.

September 10, 2008

Re: RFC on range design for D2

Posted by Derek Parnell
in reply to Sergey Gromov

Derek Parnell

Posted in reply to Sergey Gromov

On Thu, 11 Sep 2008 02:20:32 +0400, Sergey Gromov wrote:

> Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>> I'm acquiring the nagging feeling that Sergey understands ranges better than I do. I could understand how to address Steven's point only after reading the post above. Thanks, Sergey.
> 
> Thank you, and welcome! ;)
> 
> P.S. I really love the looks of "reverse = all.before(center);" It's like writing program in plain English.

Oh boy! We must put an end to that otherwise we might all be out of a job ;-)
-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

September 10, 2008

Re: RFC on range design for D2

Posted by Sergey Gromov
in reply to Sean Kelly

Sergey Gromov

Posted in reply to Sean Kelly

Sean Kelly <sean@invisibleduck.org> wrote:
> Sergey Gromov wrote:
> > 
> > Now the same algorithm in ranges:
> > 
> >> Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
> >> {
> >>     for (;;)
> >>     {
> >>         if (s1.isEmpty())
> >>             dst[] = s2[];
> >>         else if (s2.isEmpty())
> >>             dst[] = s1[];
> 
> I'm not sure the above is correct.  It should return after the copy is performed, and the code also assumes that the size of dst is equal to the size of s2 and s1, respectively.

Of course there should be return statements, thank you.  I've never tested this code (obviously), just've thrown it together, so there ought to be stupid mistakes like this.

As to the destination size.  This is merge sort.  The size of destination buffer must be the sum of the sizes of source buffers.  As soon as one of the source buffers is empty, i.e. completely moved to the destination, there must be place exactly for what left in another source buffer.  If this condition doesn't hold then the arguments weren't correct in the first place.

September 10, 2008

Re: RFC on range design for D2

Posted by Andrei Alexandrescu
in reply to Sergey Gromov

Andrei Alexandrescu

Posted in reply to Sergey Gromov

Sergey Gromov wrote:
> Sean Kelly <sean@invisibleduck.org> wrote:
>> Sergey Gromov wrote:
>>> Now the same algorithm in ranges:
>>>
>>>> Merge_backward(R1, R2, R3)(R1 s1, R2 s2, R3 dst)
>>>> {
>>>>     for (;;)
>>>>     {
>>>>         if (s1.isEmpty())
>>>>             dst[] = s2[];
>>>>         else if (s2.isEmpty())
>>>>             dst[] = s1[];
>> I'm not sure the above is correct.  It should return after the copy is performed, and the code also assumes that the size of dst is equal to the size of s2 and s1, respectively.
> 
> Of course there should be return statements, thank you.  I've never tested this code (obviously), just've thrown it together, so there ought to be stupid mistakes like this.
> 
> As to the destination size.  This is merge sort.  The size of destination buffer must be the sum of the sizes of source buffers.  As soon as one of the source buffers is empty, i.e. completely moved to the destination, there must be place exactly for what left in another source buffer.  If this condition doesn't hold then the arguments weren't correct in the first place.

Speaking of copying, C++'s std::copy and friends have been under increasing scrutiny lately because of their inability to modularly protect data against overruns. STL's three-argument functions that copy out are often a kiss of death for inexperienced STL users. I'm glad ranges cut that Gordian knot.

Andrei

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation