Jump to page: 1 2
Thread overview
Difference between input range and forward range
Nov 10, 2015
Shachar Shemesh
Nov 10, 2015
Jonathan M Davis
Nov 11, 2015
Jesse Phillips
Nov 12, 2015
Ur@nuz
Nov 12, 2015
Jonathan M Davis
Nov 10, 2015
Jonathan M Davis
Nov 10, 2015
Ur@nuz
Nov 10, 2015
Jonathan M Davis
Nov 10, 2015
Jonathan M Davis
Nov 10, 2015
H. S. Teoh
Nov 11, 2015
Ur@nuz
Nov 12, 2015
Jonathan M Davis
Nov 12, 2015
Tobias Pankrath
November 10, 2015
Please consider the following program:
import std.stdio;
import std.range;

struct Range {
    int a;

    @disable this(this);
public:
    int front() {
        return a;
    }

    void popFront() {
        a--;
    }

    bool empty() {
        return a<=0;
    }
}

void main()
{
    pragma(msg, isInputRange!Range);
    pragma(msg, isForwardRange!Range);
    auto r = Range(5);

    foreach( i; r ) {
        writeln(i);
    }

    foreach( i; r ) {
        writeln(i);
    }
}

It seems that "foreach" requires that the range be copyable, in direct contradiction to the fact that Range is not a forward range.

Removing the @disable, the program compiles and run, but prints the range twice instead of once.

If you couple that with the extremely amorphous definition of "save" for forward ranges, there seems to be quite a confusion between the two.

Am I missing something?

Shachar
November 10, 2015
On Tuesday, 10 November 2015 at 14:28:15 UTC, Shachar Shemesh wrote:
> Am I missing something?

The short answer is that input ranges need to be copyable to work with foreach, because of how foreach is defined for ranges. The long answer is...

Input ranges _are_ copyable. It's just that their state isn't guaranteed to be copied (and if it is copied, then it probably should have been a forward range anyway). The semantics of what happens when copying any type of range are undefined, because it depends on how the range is implemented. For example, if you have

auto copy = orig;
copy.popFront();

and the type of orig and copy is a reference type, then popping off an element from copy popped off an element from orig, whereas if the range is a value type, then popping off an element from copy has no effect on orig. And if the range is a pseudo-reference type, then it'll probably do something like pop off an element from orig but not affect orig's front, but it could also be that it has no effect on orig (what exactly happens depends on how the range is implemented). Basically, if you copy a range, you can't do _anything_ with the original unless you assign it a value, because what happens when you call anything on the original depends on how its implemented and thus is undefined behavior in general. I talked about this problem in my dconf 2015 talk:

http://dconf.org/2015/talks/davis.html

Now, with regards to foreach specifically,

foreach(e; range)
{
    // do stuff
}

becomes

for(auto __c = range; !__c.empty; __c.popFront())
{
    auto e = _c.front;
    // do stuff
}

Notice that the range is copied. So, yes, for a range to work with foreach, it's going to have to be copyable. @disabling the postblit constructor is just going to cause you trouble. But the key thing here is that this means that once you use a range in a foreach loop, you can't use it for anything else. In generic code, you have to consider it to be consumed, because the state of range you passed to foreach is now undefined, since what happens when copying the range is undefined. This is true even if you put a break out of the loop, because the range was copied, and you simply cannot rely on the state of the range you passed to foreach after that copy.

Now, if you know the exact semantics of a particular range type, and the code you're writing is not generic, then you have more leeway, but in generic code, you have to be very careful to make sure that you don't use a range that has been copied unless it's been assigned a new value - and that includes not using a range after passing it to foreach. We're frequently lax with this due to the fact that most ranges are value types or pseudo-reference types that act like value types, so they're not only forward ranges, but they're implicitly saved by a copy, and so we frequently get away with using a range after it's been copied, but it doesn't work in the general case.

- Jonathan M Davis
November 10, 2015
On Tuesday, 10 November 2015 at 14:28:15 UTC, Shachar Shemesh wrote:
> It seems that "foreach" requires that the range be copyable, in direct contradiction to the fact that Range is not a forward range.

To add to what I said, whether a range is "copyable" has nothing to do with whether it is a forward range or not. What's supposed to be guaranteed with any and all ranges is that if you copy it, it has the exact same state as the original had - but _not_ that that state is separate from the original. So, if you have

auto copy = orig;

then you can pop elements off of copy like you would have popped them off of orig, but you must on longer pop elements off of orig. _copy_ is now the range. This is critical when you consider that most ranges will be copied when they're passed to a function. If they weren't copyable, they'd be almost useless.

> If you couple that with the extremely amorphous definition of "save" for forward ranges, there seems to be quite a confusion between the two.

There is nothing amorphous about the definition of save. save duplicates a range such that you have two ranges of the same type which are independent of one another and which contain exactly the same elements. save provides a way to get a copy of the range's state regardless of whether it's value type, reference type, or pseudo-reference type.

Consider, for instance, a range which is implemented as a class. It's a full-on reference type, and copying it around will never result in its state being copied. Without save, there would be no way to duplicate it. And save provides the _only_ generic means of duplicating a range. It's a bug any time that generic code depends on the state of a range being duplicated when it's copied. Unfortunately, because most ranges _do_ duplicate their state when they're copied, there's plenty of code out there which is buggy and relies on ranges being duplicated when they're copied. That's why it's so critical to test algorithms with a variety of range types.

- Jonathan M Davis
November 10, 2015
I agree with these considerations. When I define non-copyable range (with disabled this) lot of standard phobos functions fails to compile instead of using *save* method. So logical question is in which cases we should use plain old struct copy or and when we should use *save* on forward ranges.

Also good question is should we have input ranges copyable (or for what types of ranges they can be copyable)? Good example is network socket as input range, because we can't save the state of socket stream and get consumed data again so as I thing copying of such range looks meaningless (in my opinion). If we want to pass it somewhere it's better pass it by reference.

Also passing range somewhere to access it in two different places simultaneously is also bad idea. The current state looks like we have current approach with range postblit constructor and +save+, because we have it for structs and it works somehow (yet) for trivial cases. But we don't have clear intentions about how it should really work.

Copying and passing ranges should also be specifyed as part of range protocol, because it's very common use case and shouldn't be ambigous.

Also as far as range could be class object we must consider how should they behave?

Is there some ideas and understanding of what I am talking about?
November 10, 2015
On 11/10/15 9:28 AM, Shachar Shemesh wrote:
> Please consider the following program:

The problem is that you defined a forward range without implementing all the primitives :) That is, you can easily make a copy of the range so it iterates from the current spot (the quintessential feature of forward ranges). You actually have to do *extra work* to make it a true input-but-not-forward-range.

I've never made it a secret that I dislike the 'save' requirement. In my experience, the only time you ever implement 'save' is to do the same thing as copying the range via =. To the point where people simply don't use the .save member, and their code always works, so what is the point? It may as well have been an enum isCopyable.

-Steve
November 10, 2015
On Tuesday, 10 November 2015 at 17:16:41 UTC, Steven Schveighoffer wrote:
> I've never made it a secret that I dislike the 'save' requirement. In my experience, the only time you ever implement 'save' is to do the same thing as copying the range via =. To the point where people simply don't use the .save member, and their code always works, so what is the point? It may as well have been an enum isCopyable.

Well, it's not that hard to come up with a range that has to implement save, because copying it doesn't save it. The obvious example is if it's implemented as a class. Another would be std.range.refRange which exists specifically to turn a non-reference type range into a reference type range. And really, any range that's an input range but not a forward range is in that boat, because if copying it duplicated it, then it could be a forward range.

I really don't see any way around having something like save without artificially restricting ranges to types which are implicitly saved when copied (which would pretty much mean getting rid of pure input ranges), but even if there were clearly a better way, that's a pretty big change at this stage.

- Jonathan M Davis
November 10, 2015
On Tue, Nov 10, 2015 at 06:38:22PM +0000, Jonathan M Davis via Digitalmars-d wrote:
> On Tuesday, 10 November 2015 at 17:16:41 UTC, Steven Schveighoffer wrote:
> >I've never made it a secret that I dislike the 'save' requirement. In my experience, the only time you ever implement 'save' is to do the same thing as copying the range via =. To the point where people simply don't use the .save member, and their code always works, so what is the point?  It may as well have been an enum isCopyable.
> 
> Well, it's not that hard to come up with a range that has to implement save, because copying it doesn't save it. The obvious example is if it's implemented as a class. Another would be std.range.refRange which exists specifically to turn a non-reference type range into a reference type range.  And really, any range that's an input range but not a forward range is in that boat, because if copying it duplicated it, then it could be a forward range.
> 
> I really don't see any way around having something like save without artificially restricting ranges to types which are implicitly saved when copied (which would pretty much mean getting rid of pure input ranges), but even if there were clearly a better way, that's a pretty big change at this stage.
[...]

There are a few ranges in Phobos that needs to do non-trivial things in .save, that otherwise would not work properly. IIRC groupBy is one of them. I'm pretty sure there are one or two others. I've also needed to do this in my own code.

All it takes is for one range to need non-trivial implementation in .save, and all the other wrapper ranges will need to do the same, in order to be semantically correct (e.g., if they wrap around a range in .r, then the copy returned by .save must have r = this.r.save, otherwise the returned range will have side-effects on the original range).


T

-- 
Meat: euphemism for dead animal. -- Flora
November 10, 2015
On Tuesday, 10 November 2015 at 16:33:02 UTC, Ur@nuz wrote:
> I agree with these considerations. When I define non-copyable range (with disabled this) lot of standard phobos functions fails to compile instead of using *save* method. So logical question is in which cases we should use plain old struct copy or and when we should use *save* on forward ranges.
>
> Also good question is should we have input ranges copyable (or for what types of ranges they can be copyable)? Good example is network socket as input range, because we can't save the state of socket stream and get consumed data again so as I thing copying of such range looks meaningless (in my opinion). If we want to pass it somewhere it's better pass it by reference.

Passing by reference really doesn't work with ranges. Consider that most range-based functions are lazy and wrap the range that they're given in a new range. e.g.

auto r = filter!pred(range);

or

auto r = map!func(range);

The range has to be copied for that to work. And even if you could make it so that the result of functions like map or filter referred to the original range by reference, their return value would not be returned by ref, so if a function required that its argument by passed by ref, then you couldn't chain it. So, requiring that ranges be passed by ref would pretty much kill function chaining.

> Also passing range somewhere to access it in two different places simultaneously is also bad idea. The current state looks like we have current approach with range postblit constructor and +save+, because we have it for structs and it works somehow (yet) for trivial cases. But we don't have clear intentions about how it should really work.

It's mostly clear, but it isn't necessarily straightforward to get it right. If you want to duplicate a range, then you _must_ use save. Copying a range by assigning it to another range is not actually copying it per the range API. You pretty much have to consider it a move and consider the original unusable after the copy.

The problem is that for arrays and many of the common ranges, copying the range and calling save are semantically the same, so it's very easy to write code which assumes that behavior and then doesn't work with other types of changes. That's why it's critical to test range-based functions with a variety of ranges types - particularly reference types in addition to value types or dynamic arrays.

> Copying and passing ranges should also be specifyed as part of range protocol, because it's very common use case and shouldn't be ambigous.

The semantics of copying a range depend heavily on how a range is implemented and cannot be defined in the general case:

auto copy = orig;

Dynamic arrays and classes will function fundamentally differently, and with structs, there are a variety of different semantics that that copy could have. What it ultimately comes down to is that while the range API can require that the copy be in the exact same state that the original was in, it can't say anything about the state of the original after the copy. Well-behaved range-based code has to assume that once orig has been copied, it is unusable. If the code wants to actually get a duplicate of the range, then it will have to use save, and the semantics of that _are_ well-defined and do not depend on the type of the range.

> Also as far as range could be class object we must consider how should they behave?

There's really nothing to consider here. It's known how they should behave. There's really only one way that they _can_ behave. One of the main reasons that save exists is because of classes. While copying a dynamic array or many struct types is equivalent to save, it _can't_ be equivalent with a class. When you consider that fact, the required behavior of ranges pretty much falls into place on its own. We may very well need to be far clearer about what those semantics are and how that affects best practices, but there really isn't much (if any) wiggle room in what the range API does and doesn't guarantee and how it should be used. The problem is whether it's _actually_ used that way.

If a range-based function is tested with a variety of range types - dynamic arrays, value types, reference types, etc. then it becomes clear very quickly when calls to save are required and how the function must be written to work for all of those range types. But far too often, range-based functions are tested with dynamic arrays and a few struct range types that wrap dynamic arrays, and bugs with regards to reference type ranges are not found. So, there's almost certainly a lot of range-based code out there that works fantastically with dynamic arrays but would fail miserably with a number of other range types.

For the most part, I think that it's pretty clear how ranges have to act and how they need to be used based on their API when you actually look at how the range API interacts with different types of ranges, but we often do not go much beyond dynamic arrays and miss out on some of the subtleties.

We really do need some good write-ups on ranges and their best practices. I've worked on that before but never managed to spend the time to finish it. Clearly, I need to fix that.

- Jonathan M Davis
November 10, 2015
On 11/10/15 1:38 PM, Jonathan M Davis wrote:
> On Tuesday, 10 November 2015 at 17:16:41 UTC, Steven Schveighoffer wrote:
>> I've never made it a secret that I dislike the 'save' requirement. In
>> my experience, the only time you ever implement 'save' is to do the
>> same thing as copying the range via =. To the point where people
>> simply don't use the .save member, and their code always works, so
>> what is the point? It may as well have been an enum isCopyable.
>
> Well, it's not that hard to come up with a range that has to implement
> save, because copying it doesn't save it. The obvious example is if it's
> implemented as a class.

IMO, that shouldn't be a forward range. But in any case, the correct mechanism is:

forward range -> a = b works and makes a copy of the iteration.
non-forward range -> a = b fails, you'd have to use a = b.getRef or something like that -or- a = b is a moving operation (a is no longer usable)

Classes wouldn't work with this because you can't overload assignment, but the whole range hierarchy system cannot be changed at this point anyway. Moving on...

-Steve
November 11, 2015
I think that we should document it somewhere in order to prevent future mistakes in algorithms that copies and +save+s ranges. I think that lots of algorithms could rely on that initializing new range or copying via opAssign will create independent range, but not really all do it. So we must somehow pay attention to it, write some article on working with ranges, because there are a lot of aspects that not clear for me and other. This could make library much better.
« First   ‹ Prev
1 2