Thread overview
Re: Tricky semantics of ranges & potentially numerous Phobos bugs
Oct 16, 2012
Jonathan M Davis
Oct 16, 2012
deadalnix
Oct 16, 2012
H. S. Teoh
October 16, 2012
On Tuesday, October 16, 2012 12:07:01 H. S. Teoh wrote:
> Perhaps mark ranges with an .isTransient property (which can be an enum so it doesn't waste any space), and range-based functions that require non-transient ranges will refuse to work with a transient range. Or alternatively, switch to a different implementation that takes transience into account (which may be slower, etc., but the important thing is to do it right -- after all, D's motto is safe by default, unsafe if you ask for it).

Not a bad idea, though it's still arguably a bit disgusting to potentially have to check for that all over the place. Inevitably, most functions won't check, and ByLine _still_ won't work. Yes, Phobos would presumably end up checking in most cases, but I suspect that little else will. We arguably have way to many things to check about ranges as it is. So, I'd be far more tempted to just change ByLine to use opApply rather than adding the extra complication of isTransient to the standard library just for this one use case.

> I think this approach of what amounts to guessing which ranges are safe/unsafe with which functions is what's untenable. We all know that people don't read documentation, or at least, not thoroughly. Something like this is too easy to miss, and bugs will slip into code unnoticed for a long time, and then explode in your face. It's unsafe by default, which goes against D's philosophy.

The problem is that what ByLine is doing is incredibly abnormal. So, we're talking about affecting how all ranges do things just to satisfy one, incredibly abnormal use case. It's just that this one range is heavily used, making the fact that it doesn't work normally a problem. But given that even using opApply with ByLine would require the programmer to understand how to use it correctly (since simply keeping the buffer around after an iteration would still break in that case), the programmer has to understand the weirdness of ByLine _regardless_, making it so that I don't know that it's ultimately all that big a deal that many range-based functions will choke on it. The main problem I see with that is similar to the problem with relying on CTFE - a simple change to a function could make it so that what _did_ work correctly suddenly stops working correctly and will only be caught if that particular use case has unit tests (highly unlikely with ByLine).

So, really, I'm in favor of just making ByLine use opApply and make it so that it's not a range. That doesn't completely fix the problem, but it reduces its scope and doesn't complicate the rest of Phobos in the process.

- Jonathan M Davis
October 16, 2012
On Tue, Oct 16, 2012 at 09:21:11PM +0200, Jonathan M Davis wrote:
> On Tuesday, October 16, 2012 12:07:01 H. S. Teoh wrote:
> > Perhaps mark ranges with an .isTransient property (which can be an enum so it doesn't waste any space), and range-based functions that require non-transient ranges will refuse to work with a transient range. Or alternatively, switch to a different implementation that takes transience into account (which may be slower, etc., but the important thing is to do it right -- after all, D's motto is safe by default, unsafe if you ask for it).
> 
> Not a bad idea, though it's still arguably a bit disgusting to potentially have to check for that all over the place. Inevitably, most functions won't check, and ByLine _still_ won't work. Yes, Phobos would presumably end up checking in most cases, but I suspect that little else will. We arguably have way to many things to check about ranges as it is. So, I'd be far more tempted to just change ByLine to use opApply rather than adding the extra complication of isTransient to the standard library just for this one use case.

But like I said, we shouldn't single out ByLine just because it's what I happened to run into this time. This is part of a larger issue, which is that currently, we *assume* that ranges will not invalidate the value of .front when popFront() is called. But many kinds of ranges don't fulfill this (currently unstated) requirement. ByLine is only one of the more common examples.

Another example that I've mentioned is a range which returns the permutations of an array by swapping array elements in-place.

Yet another example, that actually came up in my own code, is an expression tree structure that represents potentially multiple values (it contains operators like ±). There is a method that returns a range of all possible values the tree can take on. Individual values are in fact vectors, which means .front returns real[]. It is much more efficient to implement, for example, ± just by flipping the signs of each component of the vector in-place, than to .dup every single time. If one is searching for a specific vector, for example, it would be wasteful to .dup every real[] that gets generated, only to immediately discard it.

So this is a non-trivial example of a range that modifies the array returned by .front when you call popFront(), but is otherwise a perfectly valid range.

And I argue that these "transient ranges" as I call them do have their place. One can iterate over them, call find() or canFind() on them, etc., all the normal things one does with ranges, which are what makes the concept of ranges so powerful. The only thing is that one has to be careful about saving the value of .front and then calling popFront().

Marking these ranges with some kind of marker, like isTransient, will help solve this problem. It doesn't have to be done everywhere: just for the few ranges that exhibit this property. And it only needs to be checked by functions that require .front to not mutate under them when they call popFront(). Most functions in Phobos, I suspect, don't even need to bother with it. So it should be a relatively small change. Seems to be a win-win situation to me.


> > I think this approach of what amounts to guessing which ranges are safe/unsafe with which functions is what's untenable. We all know that people don't read documentation, or at least, not thoroughly. Something like this is too easy to miss, and bugs will slip into code unnoticed for a long time, and then explode in your face. It's unsafe by default, which goes against D's philosophy.
> 
> The problem is that what ByLine is doing is incredibly abnormal. So, we're talking about affecting how all ranges do things just to satisfy one, incredibly abnormal use case. It's just that this one range is heavily used, making the fact that it doesn't work normally a problem.

I don't agree that what ByLine does is abnormal. If this were indeed the case, it would greatly limit the scope of ranges and range-based functions. You wouldn't be able to implement ranges that reuse buffers (as I described above), and you may end up being forced to use inefficient implementations just to satisfy the few functions that assume that .front doesn't mutate when you call popFront(). I don't see that as an advantageous position to take. Many range-based functions *do* work with ByLine and other such ranges, so why impose arbitrary restrictions on them?

Better to just treat the special cases specially (transient ranges + functions that assume .front can be safely copied), and let the general case work as it is already working.


[...]
> So, really, I'm in favor of just making ByLine use opApply and make it so that it's not a range. That doesn't completely fix the problem, but it reduces its scope and doesn't complicate the rest of Phobos in the process.
[...]

This is the simplest way out, I suppose, but it does limit the scope of range-based functions. Which I think is a loss -- what makes the concept of ranges so powerful is their sheer generality. I'd hate to lose that just because a few Phobos functions decide that they can't live with a transient .front.


T

-- 
BREAKFAST.COM halted...Cereal Port Not Responding. -- YHL
October 16, 2012
Le 16/10/2012 21:21, Jonathan M Davis a écrit :
> On Tuesday, October 16, 2012 12:07:01 H. S. Teoh wrote:
>> Perhaps mark ranges with an .isTransient property (which can be an enum
>> so it doesn't waste any space), and range-based functions that require
>> non-transient ranges will refuse to work with a transient range. Or
>> alternatively, switch to a different implementation that takes
>> transience into account (which may be slower, etc., but the important
>> thing is to do it right -- after all, D's motto is safe by default,
>> unsafe if you ask for it).
>
> Not a bad idea, though it's still arguably a bit disgusting to potentially
> have to check for that all over the place. Inevitably, most functions won't
> check, and ByLine _still_ won't work. Yes, Phobos would presumably end up
> checking in most cases, but I suspect that little else will. We arguably have
> way to many things to check about ranges as it is. So, I'd be far more tempted
> to just change ByLine to use opApply rather than adding the extra complication
> of isTransient to the standard library just for this one use case.
>

I have to agree here. Even if I see several uses outside byLine, this is known that most code will not take that into account. Adding a property isn't going to help.
October 17, 2012
On 10/16/12 3:21 PM, Jonathan M Davis wrote:
> Not a bad idea, though it's still arguably a bit disgusting to potentially
> have to check for that all over the place. Inevitably, most functions won't
> check, and ByLine _still_ won't work. Yes, Phobos would presumably end up
> checking in most cases, but I suspect that little else will.

Then "little else" should claim working with input ranges and rely on forward ranges semantics.

> The problem is that what ByLine is doing is incredibly abnormal.

Works exactly as intended, and is eerily normal.

Andrei

October 17, 2012
On 10/16/12 5:51 PM, deadalnix wrote:
> Le 16/10/2012 21:21, Jonathan M Davis a écrit :
>> On Tuesday, October 16, 2012 12:07:01 H. S. Teoh wrote:
>>> Perhaps mark ranges with an .isTransient property (which can be an enum
>>> so it doesn't waste any space), and range-based functions that require
>>> non-transient ranges will refuse to work with a transient range. Or
>>> alternatively, switch to a different implementation that takes
>>> transience into account (which may be slower, etc., but the important
>>> thing is to do it right -- after all, D's motto is safe by default,
>>> unsafe if you ask for it).
>>
>> Not a bad idea, though it's still arguably a bit disgusting to
>> potentially
>> have to check for that all over the place. Inevitably, most functions
>> won't
>> check, and ByLine _still_ won't work. Yes, Phobos would presumably end up
>> checking in most cases, but I suspect that little else will. We
>> arguably have
>> way to many things to check about ranges as it is. So, I'd be far more
>> tempted
>> to just change ByLine to use opApply rather than adding the extra
>> complication
>> of isTransient to the standard library just for this one use case.
>>
>
> I have to agree here. Even if I see several uses outside byLine, this is
> known that most code will not take that into account. Adding a property
> isn't going to help.

The property is already there - .save. The question boils down to whether input ranges should be acceptable as ranges or not.

Andrei