Difference between input range and forward range (page 2)

On Tuesday, 10 November 2015 at 18:57:31 UTC, Steven Schveighoffer wrote:
> IMO, that shouldn't be a forward range. But in any case, the correct mechanism is:
>
> forward range -> a = b works and makes a copy of the iteration.
> non-forward range -> a = b fails, you'd have to use a = b.getRef or something like that -or- a = b is a moving operation (a is no longer usable)

Hmm. You mean "b is no longer usable", right?

So, any algorithm requiring a copy should better do something like

static if(isForwardRange!range)
    copy = range;
else
    copy = range.save();

On Tuesday, 10 November 2015 at 16:07:01 UTC, Jonathan M Davis wrote:
> generic code, you have to consider it to be consumed, because the state of range you passed to foreach is now undefined, since what happens when copying the range is undefined. This is true even if you put a break out of the loop, because the range was copied, and you simply cannot rely on the state of the range you passed to foreach after that copy.

The problem I find with generic code is when the desire is to consume the data. Take this example of parsing a data stream.

auto osmData = datastream.take(size).array;
datastream.popFrontN(size);
auto header = BlobHeader(osmData);

http://he-the-great.livejournal.com/49636.html

How do I know if popFrontN is needed? If I was given a value base range then it is. If I was given a reference range (in its many forms) the extra call to popFrontN will result in an unneeded data jump. I could require that a forward range is passed in, then I can save() before calling .array and thus always require popFrontN.

The best option is probably to use the RefRange wrapper, but it does create an annoying element of surprise.

>     auto osmData = datastream.take(size).array;
>     datastream.popFrontN(size);
>     auto header = BlobHeader(osmData);
>
> http://he-the-great.livejournal.com/49636.html
>
> How do I know if popFrontN is needed? If I was given a value base range then it is. If I was given a reference range (in its many forms) the extra call to popFrontN will result in an unneeded data jump. I could require that a forward range is passed in, then I can save() before calling .array and thus always require popFrontN.
>
> The best option is probably to use the RefRange wrapper, but it does create an annoying element of surprise.

Yes, I agree. It's a good example of the problem. I just ran into a problem like this with the *take* function, *startsWith* and others.

So we have two kinds of ranges: those with reference semantics and those with value semantics. But either kind can be a reference or a value *type* (class or struct). So how would generic algorithms determine at compile time whether the current range has reference or value semantics? We can't just rely on testing whether it's a class or a struct.

I think this is important, because it can influence program logic.

And again I want to say that we must state explicitly (in the docs) that we have two logical range categories, so that new users don't make stupid mistakes. Unittests for both reference and value ranges would also be good to illustrate our intentions.

November 12, 2015

Re: Difference between input range and forward range

Posted by Jonathan M Davis
in reply to Jesse Phillips


On Wednesday, 11 November 2015 at 22:34:32 UTC, Jesse Phillips wrote:
> On Tuesday, 10 November 2015 at 16:07:01 UTC, Jonathan M Davis wrote:
>> generic code, you have to consider it to be consumed, because the state of range you passed to foreach is now undefined, since what happens when copying the range is undefined. This is true even if you put a break out of the loop, because the range was copied, and you simply cannot rely on the state of the range you passed to foreach after that copy.
>
> The problem I find with generic code is when the desire is to consume the data. Take this example of parsing a data stream.
>
>     auto osmData = datastream.take(size).array;
>     datastream.popFrontN(size);
>     auto header = BlobHeader(osmData);
>
> http://he-the-great.livejournal.com/49636.html
>
> How do I know if popFrontN is needed? If I was given a value base range then it is. If I was given a reference range (in its many forms) the extra call to popFrontN will result in an unneeded data jump. I could require that a forward range is passed in, then I can save() before calling .array and thus always require popFrontN.
>
> The best option is probably to use the RefRange wrapper, but it does create an annoying element of surprise.

Well, if we're talking forward ranges, then the only way to be 100% consistent with this is to basically consider datastream unusable after the call to take, because its state is undefined. So, the correct way to handle this would be to do something like

auto osmData = datastream.save.take(size).array;
datastream.popFrontN(size);

Now, that's ugly, but it does ensure that the code will work correctly regardless of whether the range is a reference type, value type, or pseudo-reference type. And if you wanted to guarantee reference semantics, as you say, you could use RefRange, though you probably do have to be careful about that.
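
To make the RefRange option concrete, here is a minimal sketch. Counter and readChunk are hypothetical names invented for this illustration (they are not from Phobos or from the code under discussion); only refRange, take and array are real library calls. The idea is that wrapping a pointer to the range forces reference semantics, so the elements taken are also consumed from the original and no popFrontN is needed.

```d
import std.array : array;
import std.range : isForwardRange, refRange, take;

/// Hypothetical value-type forward range yielding cur, cur + 1, ..., end - 1.
struct Counter
{
    int cur, end;

    @property bool empty() const { return cur >= end; }
    @property int front() const { return cur; }
    void popFront() { ++cur; }
    @property Counter save() { return this; }
}

static assert(isForwardRange!Counter);

/// Hypothetical helper: reads the next n elements through a RefRange,
/// so that *source itself is advanced past them.
auto readChunk(R)(R* source, size_t n)
{
    return refRange(source).take(n).array;
}

void main()
{
    auto stream = Counter(0, 10);

    // Plain copy: take iterates its own copy of the struct, stream stays put.
    assert(stream.take(3).array == [0, 1, 2]);
    assert(stream.front == 0);

    // Through RefRange: the elements are consumed from stream itself.
    assert(readChunk(&stream, 3) == [0, 1, 2]);
    assert(stream.front == 3);
}
```

The caveat about being careful still applies: anything that holds on to the RefRange keeps a pointer to the original range, so the wrapper must not outlive it.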

Regardless, it highlights how save needs to be called all over the place if you want ranges which are reference types to work consistently with value types.
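
As a rough sketch of the save-then-popFrontN pattern, assuming a hypothetical class-based forward range (Counter) and a hypothetical helper (readChunkSaved), neither of which comes from the original code, the following shows the same function giving the same result for a reference type and for a slice:

```d
import std.array : array;
import std.range;

/// Hypothetical reference-type forward range yielding cur, cur + 1, ..., end - 1.
final class Counter
{
    private int cur, end;

    this(int cur, int end) { this.cur = cur; this.end = end; }

    @property bool empty() const { return cur >= end; }
    @property int front() const { return cur; }
    void popFront() { ++cur; }
    @property Counter save() { return new Counter(cur, end); }
}

static assert(isForwardRange!Counter);

/// Hypothetical helper: returns the next n elements and advances r past them.
auto readChunkSaved(R)(ref R r, size_t n)
if (isForwardRange!R)
{
    auto chunk = r.save.take(n).array; // iterate an independent copy; r is untouched
    r.popFrontN(n);                    // so r always has to be advanced explicitly
    return chunk;
}

void main()
{
    auto byRef = new Counter(0, 10);                 // class: plain copies alias the same state
    int[] byVal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];    // slice: plain copies iterate independently

    assert(readChunkSaved(byRef, 3) == [0, 1, 2]);
    assert(byRef.front == 3);

    assert(readChunkSaved(byVal, 3) == [0, 1, 2]);
    assert(byVal.front == 3);
}
```

Without the call to save, the Counter version would already have been advanced by take(n).array, and the popFrontN that follows would skip three more elements.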

The bigger problem is pure input ranges. If you do

auto osmData = datastream.take(size).array;

then the state of datastream is undefined (at least in generic code), and you can't do _anything_ with it. If datastream is a reference type, then using it would work just fine, since the first size elements would have been consumed, and we shouldn't have to worry about value types (because they can always be forward ranges - though it's technically possible for someone to not make them forward ranges even when they should be). However, we _do_ have a problem with pseudo-reference types. A pure input range pretty much has to be a reference type with regards to its elements (otherwise it could be a forward range), but stuff like caching front can turn it into a pseudo-reference type and totally break code like this. With a full-on reference type

auto osmData = datastream.take(size).array;

results in datastream being the same as it would have been had you called datastream.popFrontN(size) instead. But with a cached front, while the subsequent elements would be correct, front would be wrong.
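
To illustrate the cached-front case, here is a small sketch built on two hypothetical types, Source and CachedInput, made up purely for this example (they are not how vibe.d or any real library implements its ranges). The struct shares its Source by reference, but every copy of the struct carries its own cached front.

```d
import std.array : array;
import std.range : isForwardRange, isInputRange, take;

/// Hypothetical reference-type data source: every read consumes one element.
final class Source
{
    private int[] data;

    this(int[] data) { this.data = data; }

    bool exhausted() const { return data.length == 0; }
    int read() { auto v = data[0]; data = data[1 .. $]; return v; }
}

/// Hypothetical pure input range that caches front, which makes it a
/// pseudo-reference type.
struct CachedInput
{
    private Source src;   // shared by every copy of the range
    private int cached;   // private to each copy
    private bool isEmpty; // private to each copy

    this(Source src) { this.src = src; popFront(); }

    @property bool empty() const { return isEmpty; }
    @property int front() const { return cached; }

    void popFront()
    {
        if (src.exhausted) isEmpty = true;
        else cached = src.read();
    }
}

static assert(isInputRange!CachedInput && !isForwardRange!CachedInput);

void main()
{
    auto stream = CachedInput(new Source([0, 1, 2, 3, 4, 5]));

    auto chunk = stream.take(2).array; // take iterates a copy of stream
    assert(chunk == [0, 1]);

    // The shared Source has moved on, but stream still holds its old cached front.
    assert(stream.front == 0);

    // In this particular sketch, element 2 was read into the copy's cache
    // and is lost: the next element stream sees is 3.
    stream.popFront();
    assert(stream.front == 3);
}
```

How much gets lost depends on how the caching is written, so the exact values here are specific to this sketch; the stale front is the general symptom.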

So, really, you're highlighting a really nasty aspect of this problem. As long as we allow pseudo-reference types for pure input ranges (and as I understand it, existing stuff like vibe.d has them), I don't see how we can make this code work correctly and access any of the elements in the range that were after the elements that were accessed via take.

Actually, even with full-on reference types, we're kind of screwed with take and input ranges. take is lazy, so if you do

auto osmData = datastream.take(size);
datastream.popFrontN(size);

and datastream is a reference type, then take will end up referring to the second n elements, not the first.
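
A short sketch of that laziness problem, assuming a hypothetical class-based pure input range named Stream (invented for this example):

```d
import std.array : array;
import std.range : isInputRange, popFrontN, take;

/// Hypothetical reference-type pure input range over an int[] (no save).
final class Stream
{
    private int[] data;

    this(int[] data) { this.data = data; }

    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

static assert(isInputRange!Stream);

void main()
{
    auto stream = new Stream([0, 1, 2, 3, 4, 5]);

    auto firstTwo = stream.take(2); // lazy: nothing has been read yet
    stream.popFrontN(2);            // advances the object that firstTwo refers to

    // By the time the result of take is iterated, it sees elements 2 and 3.
    assert(firstTwo.array == [2, 3]);
    assert(stream.front == 4);
}
```

Iterating the result of take before touching stream again (for example by calling .array right away) avoids this, but only because Stream is known to be a reference type.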

*sigh* Pure input ranges suck. It's ugly as all get out with forward ranges, but liberal use of save can guarantee consistent semantics. But with pure input ranges...

I suspect that there's a whole pile of algorithms that technically should never be used with pure input ranges or which require that you be _very_ careful with them. It would be ludicrous to not have take work with input ranges, but it's quite clear that with a pure input range, calling take _and_ accessing elements beyond the ones taken isn't going to work unless you know that the range is a reference type, and you make sure that you iterate through the result of take _before_ doing anything with the rest of the range.

I think that it's pretty clear that we need to re-examine how pure input ranges should work and either make a change to them or have some very clear guidelines on how to use them (which is not likely going to be easy to do correctly) and probably disallow them for most algorithms.

Yuck. I'm definitely going to have to stew on this one. I've always thought that pure input ranges were too restrictive to be useful in most cases (though for some stuff you're pretty much stuck with them without doing a lot of buffering), but they seem to get worse every time we examine them in depth.

- Jonathan M Davis

On 11/11/15 4:20 AM, Dominikus Dittes Scherkl wrote:
> On Tuesday, 10 November 2015 at 18:57:31 UTC, Steven Schveighoffer wrote:
>> IMO, that shouldn't be a forward range. But in any case, the correct mechanism is:
>>
>> forward range -> a = b works and makes a copy of the iteration.
>> non-forward range -> a = b fails, you'd have to use a = b.getRef or something like that -or- a = b is a moving operation (a is no longer usable)
>
> Hmm. You mean "b is no longer usable", right?

Heh, right :)

> So, any algorithm requiring a copy should better do something like
>
> static if(isForwardRange!range)
>     copy = range;
> else
>     copy = range.save();

No, it should do this:

static if(!isForwardRange!range)
    assert(0, "I need a forward range")

save doesn't enter the picture, it would be eliminated.

But again, this isn't going to happen. Note my specification above is for a fictional proposal of what I would have done.

The problem with the current save regime is that simple assignment for forward ranges works too, but it doesn't always do what you want (and diabolically, 99% of the time it DOES do what you want).

-Steve

On Thursday, 12 November 2015 at 15:29:19 UTC, Steven Schveighoffer wrote:
> (and diabolically, 99% of the time it DOES do what you want).

_This_ is the big problem. It wouldn't surprise me in the least if the vast majority of range-based code out there does not actually work properly with ranges which aren't implicitly saved when they're copied. I mean, technically, you have to do stuff like haystack.save.startsWith(needle) instead of haystack.startsWith(needle) if you want your code to work right with all forward ranges, but almost no one does that. And 99% of the time the code works - but not always.

- Jonathan M Davis
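
For illustration, a minimal sketch of the haystack.save.startsWith(needle) point, assuming a hypothetical class-based forward range called Counter (not part of Phobos). With a class, passing the range to startsWith passes a reference to the same object, so whatever startsWith pops is popped from the caller's range too.

```d
import std.algorithm.searching : startsWith;
import std.range : isForwardRange;

/// Hypothetical reference-type forward range yielding cur, cur + 1, ..., end - 1.
final class Counter
{
    private int cur, end;

    this(int cur, int end) { this.cur = cur; this.end = end; }

    @property bool empty() const { return cur >= end; }
    @property int front() const { return cur; }
    void popFront() { ++cur; }
    @property Counter save() { return new Counter(cur, end); }
}

static assert(isForwardRange!Counter);

void main()
{
    auto haystack = new Counter(0, 10);

    // With save, startsWith works on an independent copy.
    assert(haystack.save.startsWith([0, 1]));
    assert(haystack.front == 0); // the caller's range is untouched

    // Without save, startsWith operates on the very same object, so it may
    // leave haystack advanced. How far is an implementation detail, which
    // is why generic code has to treat haystack's state as undefined here.
    immutable found = haystack.startsWith([0, 1]);
    assert(found);
}
```

With a slice or a struct range as the haystack both calls behave the same, which is presumably why the missing save so rarely gets noticed.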

On Thursday, 12 November 2015 at 15:43:50 UTC, Jonathan M Davis wrote:
> On Thursday, 12 November 2015 at 15:29:19 UTC, Steven Schveighoffer wrote:
>> (and diabolically, 99% of the time it DOES do what you want).
>
> _This_ is the big problem. It wouldn't surprise me in the least if the vast majority of range-based code out there does not actually work properly with ranges which aren't implicitly saved when they're copied. I mean, technically, you have to do stuff like haystack.save.startsWith(needle) instead of haystack.startsWith(needle) if you want your code to work right with all forward ranges, but almost no one does that. And 99% of the time the code works - but not always.
>
> - Jonathan M Davis

https://issues.dlang.org/show_bug.cgi?id=11951
