RFC on range design for D2 (page 20) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Announce » RFC on range design for D2 (page 20)

September 11, 2008

Re: RFC on range design for D2

Posted by Steven Schveighoffer
in reply to Bill Baxter

Steven Schveighoffer

Posted in reply to Bill Baxter

"Bill Baxter" wrote
> So far though we don't seem to be able to come up with a good example other of where ranges are weak than traversing a list back and forth. Note that "move back and forth according to some user input" is not clearly not an "algorithm" that would be in std.algorithm.  But it does come up often enough in applications.  I don't think the fact that it's not strictly an Algorithm-with-a-captial-A makes it any less important.
>
> But it is a little fishy that we can't come up with any other example besides sliding a bead on a wire back and forth.

Any structure that might change topology doesn't lend itself well to persistant ranges.  Ranges are fine for iterating over a constant version of the container.  i.e., if you want to implement a search function, where you are assuming that during the search, the container doesn't change, that should take a range as an argument.  But storing references to individual elements for later use (such as O(1) lookup or quick removal), and modifying the container inbetween getting the reference and using the reference makes it difficult to guarantee the behavior.  The only range type that seems like it would be immune to such changes would be the empty range where both ends point to the same element.  In fact, this can be reduced to a single reference, just copied for the sake of calling it a 'range'.

Arrays are really a special case where the ranges unequivocally work because once you get a range, all of it is guaranteed not to disappear or change topology.  i.e. a slice always contains valid data, no matter what you do to the original array.  I think this is the model Andrei is trying to achieve for all containers/iterables, and I think it's just not the same.  I think passing the range around as one entity is a very helpful thing, especially for algorithms which generally take ranges in the form of 2 iterators, but I don't think it solves all problems.

-Steve

September 11, 2008

Re: RFC on range design for D2

Posted by Bill Baxter
in reply to Steven Schveighoffer

Bill Baxter

Posted in reply to Steven Schveighoffer

On Thu, Sep 11, 2008 at 10:35 AM, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> "Bill Baxter" wrote
>> On Thu, Sep 11, 2008 at 9:32 AM, Bill Baxter <wbaxter@gmail.com> wrote:
>>> On Thu, Sep 11, 2008 at 8:17 AM, Steven Schveighoffer
>>>> To get the value there, I have to do:
>>>> all.after(center).left // or whatever gets decided as the 'get first
>>>> value
>>>> of range' member
>>>> or if opStar is used:
>>>>
>>>> *all.after(center);
>>>
>>> Why is all that necessary?  Can't you just do a  *center?
>>
>> Oh, I get it.  It's empty.  Duh.
>>
>> Ok, so you can have third cursor function in the std lib:
>>
>> T cursorValue(R,T)(R all, R center)
>> {
>>   return all.after(center).left;
>> }
>> ... plus the
>> cursorAdvance and cursorRetreat.
>
> That is all fine and dandy in the world of "I don't care how well my iterators perform or how much code bloat is added because of them," but I usually work in a different world ;)

Ok, but I have yet to hear an actual use case that demands blazing fast iteration both forwards and backwards.  In your shuffling video there's no way moving the iterator back and forth is going to be the bottleneck.  In my undo/redo stack example it is also far from being on the critical path.    I think it goes back to the fact that going back and forth randomly isn't a property of many algorithms.  In all the examples I can think of it's more a property of how humans interact with data.  And humans are slow compared to how long it takes to update a few extra values.

Certainly one-way iteration needs to be as fast as possible, for all kinds of algorithms.  But does bidirection iteration really need to be super-fast?

> But if I were forced not to use an iterator model (which isn't the case, iterators should be very possible without compiler help), I would actually implement this as a wrapper struct:
>
> struct Cursor(containerType)
> {
>   private Range!(containerType) _cur;
>   private containerType owner;
>
>   Cursor  moveLeft() {...}
>   Cursor moveRight() {...}
>   bool hasLeft() {...}
>   etc.
> }

That would work for me too.  Just put it in the standard lib so I don't have to scratch my head wondering why such a basic thing is so hard to do!  Of course once you do that, you have to wonder why this one's interface isn't branded a range concept but the others are.  (I know I know... it's not Stepanov "basic"), but if it's there and people want to use it, I see no value in refusing to recognize it on purist grounds.

> Thus one can implement iterators on top of ranges, but I'd argue that ranges are much easier to implement on top of iterators.

Ranges are safer and easier to work with in most cases so it's worth
it, or so the argument goes.  You don't buy it?
I think things like infinite generators make more sense as a range
because it's difficult to express succinctly as two iterators.  Or
perhaps you don't mean to imply that every range would have a begin()
and an end() iterator you could access?

> In any case, I think there are benefits to having a range type that is not necessarily defined as two iterators.

But how to do it without a large increase in the number of fundamental concepts you have to keep track of -- that's the issue.

--bb

September 11, 2008

Re: RFC on range design for D2

Posted by Bill Baxter
in reply to Steven Schveighoffer

Bill Baxter

Posted in reply to Steven Schveighoffer

On Thu, Sep 11, 2008 at 10:46 AM, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> "Bill Baxter" wrote
>> So far though we don't seem to be able to come up with a good example other of where ranges are weak than traversing a list back and forth. Note that "move back and forth according to some user input" is not clearly not an "algorithm" that would be in std.algorithm.  But it does come up often enough in applications.  I don't think the fact that it's not strictly an Algorithm-with-a-captial-A makes it any less important.
>>
>> But it is a little fishy that we can't come up with any other example besides sliding a bead on a wire back and forth.
>
> Any structure that might change topology doesn't lend itself well to persistant ranges.

But they often don't lend themselves to iterators either.

> Ranges are fine for iterating over a constant version of
> the container.  i.e., if you want to implement a search function, where you
> are assuming that during the search, the container doesn't change, that
> should take a range as an argument.  But storing references to individual
> elements for later use (such as O(1) lookup or quick removal), and modifying
> the container inbetween getting the reference and using the reference makes
> it difficult to guarantee the behavior.

Lots of algorithms on containers using iterators have this property too.

> The only range type that seems like
> it would be immune to such changes would be the empty range where both ends
> point to the same element.  In fact, this can be reduced to a single
> reference, just copied for the sake of calling it a 'range'.

Or a here-to-end range where one end points to a special "end"
sentinel.  Like with a linked list.
You can't change the sentinel by mutating the contents so it remains
valid.  I think this would be a more common/useful form than the empty
range.  Because you can dereference the range from here-to-end, but
you can't dereference the range from here to here.

Probably the cursor idiom should also use "all" and "here-to-end" rather than "all" and "here-to-here".  Then the all.after(center).value ugliness isn't needed, just center.value.  (I prefer ".value" to ".first")

> Arrays are really a special case where the ranges unequivocally work because once you get a range, all of it is guaranteed not to disappear or change topology.  i.e. a slice always contains valid data, no matter what you do to the original array.  I think this is the model Andrei is trying to achieve for all containers/iterables, and I think it's just not the same.  I think passing the range around as one entity is a very helpful thing, especially for algorithms which generally take ranges in the form of 2 iterators, but I don't think it solves all problems.

Well it solves them, but is it worth the tradeoffs you have to make. You say no.  I say I now have a reasonable way to handle the only example I could think of that seemed really cumbersome.  Lacking further evidence it seems the main problem remaining is just having to carry around an extra value when you just want to refer to one value.

However, those pointers to "single values" can also move, so for safety's sake why not keep the fence post with it?

--bb

September 11, 2008

Re: An alternative to .init

Posted by Brad Roberts
in reply to Manfred_Nowak

Brad Roberts

Posted in reply to Manfred_Nowak

Manfred_Nowak wrote:
> Walter Bright wrote:
> 
>> But for many types, there is no such thing as an invalid value
> 
> Why can one then define
> 
> |   typedef int T=void; // T.init == void
> 
> -manfred

That just means don't initialize, leaving any instances with random values until first assignment.  That doesn't mean that it contains an invalid value.

Later,
Brad

September 11, 2008

Re: RFC on range design for D2

Posted by JAnderson
in reply to Andrei Alexandrescu

JAnderson

Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu wrote:
> JAnderson wrote:
>>
>> Hi Andrei,
>>
>> I like the idea behind ranges.  I don't like C++'s / stl's long winded syntax at all.  Its so large that it generally uses up several lines along with several typedefs etc...  All that work just to iterate over some data.  The longer things get the more error prone they get... how many times have I put an begin when I meant to put end *sigh*.
>>
>> However I currently disagree on this point.
>>
>> Andrei Alexandrescu wrote:
>>  >
>>  > Fine. So instead of saying:
>>  >
>>  > foreach (e; c.all) { ... }
>>  >
>>  > you can say
>>  >
>>  > foreach (e; c) { ... }
>>  >
>>  > I think that's some dubious savings.
>>
>>
>> I think its useful to have the implicit range conversion.  Consider writing generic/template code.  Of course built in arrays could provide the .all but then consider passing around ranges.  That would also mean all ranges would also have a .all (could we go .all.all.all for instance?).
> 
> There's no regression. There are containers and ranges. Containers have .all. Ranges don't.
> 

Just to be clear then.  Say you write something that works on arrays and objects.  Then you write:

void Foo(T)(T t)
{
	...
	foreach (auto i; t.all)
	{

	}
	...
}

Now I realize I want to use that function with a range as well as an object (its a template after all).  Well if .all isn't regressive then I can't.  Of course if .all was implicit then I might have written:

void Foo(T)(T t)
{
	...
	foreach (auto i; t)
	{

	}
	...
}

But then again, .all is still available so there's still a chance a coder might not realize that its better to use the implicit value.  I'm beginning to think regressive would be useful either way.

Note of course generic code does not just apply to templates.  It also applies when I want to change a variable to a different type.  If .all is required (and non-regressive) then I have to go to all the places that value is used and change it.  Its the same reason auto is so awesome.

Of course .all adds an extra function you'd need to implement for custom ranges, but it could always be in the "range" mixin.

> I think you guys are making a good point; I'm undecided on what would be better. One not-so-cool part about implicit conversion to range is that all of a sudden all range operations spill into the container. So people try to call c.pop and it doesn't compile. (Why?) They get confused.
> 
>> I'm all for compile time checking however I think that implicit .all (with of course an explicit option) will make it easy to change a function that once took an object to take a simple range  Also it would make it easy to change from one way of getting at a range to another.
>>
>> What about matrices?  They don't implement default .all, they would provide like .col and .row.
> 
> Bidimensional ones that is :o).

Of course :) being a games programmer, we know of only speak of one matrix type.

Just kidding.

> 
> 
> Andrei

September 11, 2008

Re: RFC on range design for D2

Posted by Benji Smith
in reply to Bill Baxter

Benji Smith

Posted in reply to Bill Baxter

Bill Baxter wrote:
> Ok, but I have yet to hear an actual use case that demands blazing
> fast iteration both forwards and backwards.  In your shuffling video
> there's no way moving the iterator back and forth is going to be the
> bottleneck.  In my undo/redo stack example it is also far from being
> on the critical path.    I think it goes back to the fact that going
> back and forth randomly isn't a property of many algorithms.  In all
> the examples I can think of it's more a property of how humans
> interact with data.  And humans are slow compared to how long it takes
> to update a few extra values.

Oh!! I thought of one!!

Parsers & regex engines move both forward and backward, as they try to match characters to a pattern.

Really, anything that uses an NFA or DFA to define patterns would benefit from fast bidirectional iteration...

--benji

September 11, 2008

Re: RFC on range design for D2

Posted by Bill Baxter
in reply to Benji Smith

Bill Baxter

Posted in reply to Benji Smith

On Thu, Sep 11, 2008 at 1:31 PM, Benji Smith <dlanguage@benjismith.net> wrote:
> Bill Baxter wrote:
>>
>> Ok, but I have yet to hear an actual use case that demands blazing fast iteration both forwards and backwards.  In your shuffling video there's no way moving the iterator back and forth is going to be the bottleneck.  In my undo/redo stack example it is also far from being on the critical path.    I think it goes back to the fact that going back and forth randomly isn't a property of many algorithms.  In all the examples I can think of it's more a property of how humans interact with data.  And humans are slow compared to how long it takes to update a few extra values.
>
> Oh!! I thought of one!!
>
> Parsers & regex engines move both forward and backward, as they try to match characters to a pattern.
>
> Really, anything that uses an NFA or DFA to define patterns would benefit from fast bidirectional iteration...

Good call.  I was about to post something mentioning that Turing machines but that seemed too academic.  Same class of thing as NFA/DFA/FSM.

The question is, though, would you really implement those things using a linked list?  I would expect most of those suckers work on arrays, and so can take advantage of the bidirectional nature of random access ranges.

Hmm, for FSMs you can't really define a good end state.  There may not be any particular end state. ... ah, but wait I forgot.  That's the beauty of a range -- the end "state" doesn't have to be a "state" per se.  It can be any predicate you want it to be.  "Range" is misleading in this case.  This is one of those cases where you just have to remember "range" means "current value plus stopping criterion".

--bb

September 11, 2008

Re: RFC on range design for D2

Posted by JAnderson
in reply to Andrei Alexandrescu

JAnderson

Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu wrote:
> JAnderson wrote:
>>
>> Hi Andrei,
>>
>> I like the idea behind ranges.  I don't like C++'s / stl's long winded syntax at all.  Its so large that it generally uses up several lines along with several typedefs etc...  All that work just to iterate over some data.  The longer things get the more error prone they get... how many times have I put an begin when I meant to put end *sigh*.
>>
>> However I currently disagree on this point.
>>
>> Andrei Alexandrescu wrote:
>>  >
>>  > Fine. So instead of saying:
>>  >
>>  > foreach (e; c.all) { ... }
>>  >
>>  > you can say
>>  >
>>  > foreach (e; c) { ... }
>>  >
>>  > I think that's some dubious savings.
>>
>>
>> I think its useful to have the implicit range conversion.  Consider writing generic/template code.  Of course built in arrays could provide the .all but then consider passing around ranges.  That would also mean all ranges would also have a .all (could we go .all.all.all for instance?).
> 
> There's no regression. There are containers and ranges. Containers have .all. Ranges don't.
> 
> I think you guys are making a good point; I'm undecided on what would be better. One not-so-cool part about implicit conversion to range is that all of a sudden all range operations spill into the container. So people try to call c.pop and it doesn't compile. (Why?) They get confused.

I'm not sure that range operations need to spill over.  I was thinking
that foreach would be kinda like a template.  The foreach would do the
implict conversion. ie something like (pseudo):

foreach(I,T)(I i, T t, delegate d)
{
	foreach (I i; t.all)
	{
		d();
	}
}

Infact anything that takes range would implicitly convert (for they too
can be used inside generic code).  Of course that that would require
compiler support, probably.

> 
>> I'm all for compile time checking however I think that implicit .all (with of course an explicit option) will make it easy to change a function that once took an object to take a simple range  Also it would make it easy to change from one way of getting at a range to another.
>>
>> What about matrices?  They don't implement default .all, they would provide like .col and .row.
> 
> Bidimensional ones that is :o).
> 
> 
> Andrei

September 11, 2008

Re: RFC on range design for D2

Posted by Jason House
in reply to Steven Schveighoffer

Jason House

Posted in reply to Steven Schveighoffer

Steven Schveighoffer Wrote:

> "Bill Baxter" wrote
> > So far though we don't seem to be able to come up with a good example other of where ranges are weak than traversing a list back and forth. Note that "move back and forth according to some user input" is not clearly not an "algorithm" that would be in std.algorithm.  But it does come up often enough in applications.  I don't think the fact that it's not strictly an Algorithm-with-a-captial-A makes it any less important.
> >
> > But it is a little fishy that we can't come up with any other example besides sliding a bead on a wire back and forth.
> 
> Any structure that might change topology doesn't lend itself well to persistant ranges.

Who says all ranges have to be persistant? Ranges "from here to the end" can be dynamic similar to an iterator.

In my mind, the important cdiscussion is what "end" means and to compare when ranges and iterators get invalidated. I also wonder a bit about mixing ranges with non-iterable cursors. Of course, their limited value may not merit the complexity.








> Ranges are fine for iterating over a constant version of the container.  i.e., if you want to implement a search function, where you are assuming that during the search, the container doesn't change, that should take a range as an argument.  But storing references to individual elements for later use (such as O(1) lookup or quick removal), and modifying the container inbetween getting the reference and using the reference makes it difficult to guarantee the behavior.  The only range type that seems like it would be immune to such changes would be the empty range where both ends point to the same element.  In fact, this can be reduced to a single reference, just copied for the sake of calling it a 'range'.
> 
> Arrays are really a special case where the ranges unequivocally work because once you get a range, all of it is guaranteed not to disappear or change topology.  i.e. a slice always contains valid data, no matter what you do to the original array.  I think this is the model Andrei is trying to achieve for all containers/iterables, and I think it's just not the same.  I think passing the range around as one entity is a very helpful thing, especially for algorithms which generally take ranges in the form of 2 iterators, but I don't think it solves all problems.
> 
> -Steve
> 
>

September 11, 2008

Re: RFC on range design for D2

Posted by Sergey Gromov
in reply to Steven Schveighoffer

Sergey Gromov

Posted in reply to Steven Schveighoffer

Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> What I see as the biggest downside is the cumbersome and verbose code of moving the 'iterator' around, as every time I want to move forward, I construct a new range, and every time I want to move backwards I construct a new range (and construct a new 'center' afterwards).  So a 'move back one' looks like:
> 
> auto before = all.before(center);
> if(!before.isEmpty)
>   center = before.pop.end;
> 
> And to move forward it's:
> auto after = all.after(center);
> if(!after.isEmpty)
>   center = after.next.begin;
> 
> To get the value there, I have to do:
> all.after(center).left // or whatever gets decided as the 'get first value
> of range' member
> 
> or if opStar is used:
> 
> *all.after(center);
> 
> I much prefer:
> 
> forward:
> if(center != list.end)
>     ++center;
> 
> reverse:
> if(center != list.begin)
>    --center;
> 
> get value:
> *center;
> 
> Especially without all the extra overhead
> 
> I see both methods as being just as open to mistakes, the first more-so, and more difficult to comprehend (at least for me).

Yes, these are valid points, I completely agree.  But there are also other points.  Let me voice some of them.

1.  Probably most important.  You say here that ranges suck at incremental bidirectional iteration over a linked list, as Bill aslo agrees with.  This seems true.  But this sort of iteration is not a goal in its own.  It's just an idiomatic *iterator* solution for a range of real-world problems.  I can't think of any such problem from the top of my head but it's probably a matter of my education.  Bill proposed one already.

I want to say that I believe that for any such real-world problem there is a range solution that's probably better than a direct mapping of an existing iterator solution.  The analogy would be trying to write in C++ as if you were using Python, or Haskell, and then declare that C++ sucks because it requires bulky, inefficient and error-prone code to implement simple functional idioms.  Different languages require different idioms and ranges are a different language from iterators.

For instance, Bill's undo/redo stack consisted of two entities: a list of operations, and a cursor. It was OK and natural with iterators.  It sucks with ranges.  Okay, I'm also going to use two entities: undo stack and redo stacks. Undo => pop from one, push to another.  New operation => push undo, trash redo.  Doesn't it look simpler and safer?  Well, it doesn't use ranges, at least from the user perspective, so what?

I also believe that a regular expression engine would benefit from using ranges rather than suffer.

2.  There was a special case that a center marker couldn't have been dereferenced.  Let's imagine you really needed it.  OK, no problem. Let's create a special kind of ranges, Cursor.  Cursor always contains one element.  Its begin points to that element and its end is calculated so that it always points after that element no matter what.  This is as valid as having an iterator pointing to that same element because you must guarantee in both cases that an iterator is dereferenceable.  This is ad-hoc, yes, but an EOF iterator in C++ is no less ad-hoc.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation