Must ranges support `r1 = r2;`? (page 2)

On Saturday, 24 March 2018 at 21:44:35 UTC, ag0aep6g wrote: > Long version: <https://issues.dlang.org/show_bug.cgi?id=18657> ("std.range and std.algorithm can't handle refRange"). > > Short version: With two `std.range.RefRange`s, `r1 = r2;` does not what other Phobos code expects. > > Question is: Who's at fault? Where do I fix this? Do ranges have to support assignment as expected - even though std.range doesn't mention it? Or should range-handling code never do that - even though it comes naturally and is widespread currently? Posting this from the PR. This is a band-aid over the issue of what to do with funky operator overloads in ranges. In conjunction with this issue, you also have to assume that one could do stuff in the ctor that would mess up existing range code in Phobos. What really needs to happen is the range spec needs to be updated with a clear list of assumptions that Phobos makes in regards to these functions. I'm pretty sure that RefRange would have been designed differently if the designer realized the results of overloading opAssign like that. In a perfect world, we'd have a pre-constructed test suite that people could plug their range (with some data in it) into, where all the tests make sure all Phobos assumptions hold true. In my opinion, it would be best to update the range spec and go from there.

On Mon, Apr 16, 2018 at 03:58:34PM +0000, Jack Stouffer via Digitalmars-d wrote: [...] > In a perfect world, we'd have a pre-constructed test suite that people could plug their range (with some data in it) into, where all the tests make sure all Phobos assumptions hold true. [...] We could perhaps add a little test framework in std.internal.* containing tests that run through a variety of different types of ranges. I think there's already something like this somewhere, it's just not used as widely as it could be. I could come up with some pretty pathological range types for stress-testing, if you want. (These are actual range types I've used in my own code, I didn't set out to break Phobos, but it turned out that Phobos makes many unstated assumptions that explode in your face when given a more unusual instance of a range. There used to be (and may still be) a lot of places that essentially assumes built-in array semantics, and that blew up in ugly ways when handed something else that's nevertheless still a valid range.) T -- Making non-nullable pointers is just plugging one hole in a cheese grater. -- Walter Bright

On Monday, April 16, 2018 15:58:34 Jack Stouffer via Digitalmars-d wrote: > On Saturday, 24 March 2018 at 21:44:35 UTC, ag0aep6g wrote: > > Long version: <https://issues.dlang.org/show_bug.cgi?id=18657> ("std.range and std.algorithm can't handle refRange"). > > > > Short version: With two `std.range.RefRange`s, `r1 = r2;` does not what other Phobos code expects. > > > > Question is: Who's at fault? Where do I fix this? Do ranges have to support assignment as expected - even though std.range doesn't mention it? Or should range-handling code never do that - even though it comes naturally and is widespread currently? > > Posting this from the PR. > > This is a band-aid over the issue of what to do with funky operator overloads in ranges. In conjunction with this issue, you also have to assume that one could do stuff in the ctor that would mess up existing range code in Phobos. What really needs to happen is the range spec needs to be updated with a clear list of assumptions that Phobos makes in regards to these functions. I'm pretty sure that RefRange would have been designed differently if the designer realized the results of overloading opAssign like that. > > In a perfect world, we'd have a pre-constructed test suite that people could plug their range (with some data in it) into, where all the tests make sure all Phobos assumptions hold true. > > In my opinion, it would be best to update the range spec and go from there. Well, the reality of the matter is that if RefRange's opAssign doesn't work the way that it works, then RefRange is broken and fails at its purpose (and this is the designer of it speaking). So, if RefRange couldn't do what it's doing, it wouldn't exist, and no requirements have ever been place on opAssign for ranges (in fact, for better or worse, the range API doesn't actually require that opAssign even work for ranges, though it can certainly be argued that it should). Now, RefRange is weird enough overall that maybe it shouldn't exist, and it was trying to solve a problem that's not really solvable in the general case (basically it's trying to force a range-based function to mutate the original rather than to deal with a copy even though the function was written to copy) - especially once forward ranges get involved - but opAssign is doing exactly what it was intended to do, and if what it's doing wasn't considered valid at the time, RefRange wouldn't exist. Either way, there are times when I think that adding RefRange was a mistake precisely because it's such an oddball and because it's only ever really going to do what it was intended to do in some circumstances. - Jonathan M Davis

On Monday, 16 April 2018 at 16:15:30 UTC, H. S. Teoh wrote: > I could come up with some pretty pathological range types for stress-testing, if you want. (These are actual range types I've used in my own code, I didn't set out to break Phobos, but it turned out that Phobos makes many unstated assumptions that explode in your face when given a more unusual instance of a range. There used to be (and may still be) a lot of places that essentially assumes built-in array semantics, and that blew up in ugly ways when handed something else that's nevertheless still a valid range.) It would be awesome to have these as range types in std.internal.test.dummyrange. Then they could be used in tests throughout Phobos.

On Monday, 16 April 2018 at 16:22:04 UTC, Jonathan M Davis wrote: > Well, the reality of the matter is that if RefRange's opAssign doesn't work the way that it works, then RefRange is broken and fails at its purpose (and this is the designer of it speaking). So, if RefRange couldn't do what it's doing, it wouldn't exist, and no requirements have ever been place on opAssign for ranges (in fact, for better or worse, the range API doesn't actually require that opAssign even work for ranges, though it can certainly be argued that it should). Now, RefRange is weird enough overall that maybe it shouldn't exist, and it was trying to solve a problem that's not really solvable in the general case (basically it's trying to force a range-based function to mutate the original rather than to deal with a copy even though the function was written to copy) - especially once forward ranges get involved - but opAssign is doing exactly what it was intended to do, and if what it's doing wasn't considered valid at the time, RefRange wouldn't exist. Either way, there are times when I think that adding RefRange was a mistake precisely because it's such an oddball and because it's only ever really going to do what it was intended to do in some circumstances. RefRange can be useful in certain situations. But, I don't see a pretty way out of this. Either 1. we break code by saying `RefRange` is doing something illegal 2. we break code by making `RefRange` always an input range 3. we change all range code which could use `opAssign` to use `move`, potentially breaking code which relies on opAssign being called, and imposing a large burden on the maintenance of existing code for a very small corner case. Just imagine how much code outside of Phobos also has this subtle bug.

April 16, 2018

Re: Must ranges support `r1 = r2;`?

Posted by Jonathan M Davis
in reply to Jack Stouffer

Permalink

Jonathan M Davis

Posted in reply to Jack Stouffer

Permalink

On Monday, April 16, 2018 19:01:02 Jack Stouffer via Digitalmars-d wrote:
> On Monday, 16 April 2018 at 16:22:04 UTC, Jonathan M Davis wrote:
> > Well, the reality of the matter is that if RefRange's opAssign doesn't work the way that it works, then RefRange is broken and fails at its purpose (and this is the designer of it speaking). So, if RefRange couldn't do what it's doing, it wouldn't exist, and no requirements have ever been place on opAssign for ranges (in fact, for better or worse, the range API doesn't actually require that opAssign even work for ranges, though it can certainly be argued that it should). Now, RefRange is weird enough overall that maybe it shouldn't exist, and it was trying to solve a problem that's not really solvable in the general case (basically it's trying to force a range-based function to mutate the original rather than to deal with a copy even though the function was written to copy) - especially once forward ranges get involved - but opAssign is doing exactly what it was intended to do, and if what it's doing wasn't considered valid at the time, RefRange wouldn't exist. Either way, there are times when I think that adding RefRange was a mistake precisely because it's such an oddball and because it's only ever really going to do what it was intended to do in some circumstances.
>
> RefRange can be useful in certain situations. But, I don't see a pretty way out of this. Either
>
> 1. we break code by saying `RefRange` is doing something illegal
> 2. we break code by making `RefRange` always an input range
> 3. we change all range code which could use `opAssign` to use
> `move`, potentially breaking code which relies on opAssign being
> called, and imposing a large burden on the maintenance of
> existing code for a very small corner case. Just imagine how much
> code outside of Phobos also has this subtle bug.

Honestly, I'm perfectly fine with RefRange only working some of the time. Even if the function doesn't use assignment anywhere, RefRange doesn't necessarily really work correctly with regards to its goal anyway in the sense that as soon as save is used, you lose any changes that are made. And even if RefRange doesn't work with some range-based functions that doesn't mean that it can't work with others. We can simply go the route of putting a note on RefRange that it may not work correctly with all range-based algorithms - that it's there for the cases where it's useful but that the nature of its semantics make it so that it's never going to work well with some functions.

As for changing range-based code that uses assignment, if it really needs assignment, then I'd favor just saying that RefRange doesn't work properly with it and tough luck. It's not like it's ever worked properly with such a function. But at the same time, I don't know why a function would _need_ assignment to work with ranges. Typically, you can just create a new variable, and you're fine. It wouldn't surprise me if there's some example where assignment would be needed, but I don't think that it's typical. So, if a range-based function can easily avoid using assignment on ranges, it probably should.

Part of the problem here is that when you're dealing with generic code, you can't make very many assumptions, but it's so, so easy to make assumptions that are almost alwayws true that it becomes almost impossible to avoid making such assumptions at least from time to time even if you don't mean to. Locking down the required semantics even further can help by guaranteeing that more assumptions are valid, but that also eliminates certain use cases (ranges with transitive front would be a good example of this; ideally, no such ranges would exist, because they cause major problems, but at the same time, they can sometimes be really useful - particularly when performance is important). So, I suspect that we're going to be stuck finding problems like this from time to time for years to come, though hopefully only rarely.

- Jonathan M Davis

Forums