Must ranges support `r1 = r2;`? - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Must ranges support `r1 = r2;`?

Thread overview

Must ranges support `r1 = r2;`?
Mar 24, 2018 ag0aep6g
Mar 24, 2018 H. S. Teoh
Mar 24, 2018 ag0aep6g
Mar 24, 2018 Simen Kjærås
Mar 24, 2018 Simen Kjærås
Mar 24, 2018 ag0aep6g
Mar 24, 2018 Jonathan M Davis
Mar 24, 2018 ag0aep6g
Mar 25, 2018 Jonathan M Davis
Mar 25, 2018 ag0aep6g
Apr 16, 2018 Jack Stouffer
Apr 16, 2018 H. S. Teoh
Apr 16, 2018 Jack Stouffer
Apr 16, 2018 Jonathan M Davis
Apr 16, 2018 Jack Stouffer
Apr 16, 2018 Jonathan M Davis

March 24, 2018

Must ranges support `r1 = r2;`?

Posted by ag0aep6g

ag0aep6g

Long version: <https://issues.dlang.org/show_bug.cgi?id=18657> ("std.range and std.algorithm can't handle refRange").

Short version: With two `std.range.RefRange`s, `r1 = r2;` does not what other Phobos code expects.

Question is: Who's at fault? Where do I fix this? Do ranges have to support assignment as expected - even though std.range doesn't mention it? Or should range-handling code never do that - even though it comes naturally and is widespread currently?

March 24, 2018

Re: Must ranges support `r1 = r2;`?

Posted by H. S. Teoh
in reply to ag0aep6g

H. S. Teoh

Posted in reply to ag0aep6g

On Sat, Mar 24, 2018 at 10:44:35PM +0100, ag0aep6g via Digitalmars-d wrote:
> Long version: <https://issues.dlang.org/show_bug.cgi?id=18657> ("std.range and std.algorithm can't handle refRange").
> 
> Short version: With two `std.range.RefRange`s, `r1 = r2;` does not what other Phobos code expects.
> 
> Question is: Who's at fault? Where do I fix this? Do ranges have to support assignment as expected - even though std.range doesn't mention it? Or should range-handling code never do that - even though it comes naturally and is widespread currently?

Short answer: if r1=r2 is meant to save the current position in the range, that's wrong; it should be r1=r2.save instead.

Long answer: I'm on my phone and don't want to type too much, so I'll leave the explanations to someone else. :-P


T

-- 
A linguistics professor was lecturing to his class one day. "In English," he said, "A double negative forms a positive. In some languages, though, such as Russian, a double negative is still a negative. However, there is no language wherein a double positive can form a negative." A voice from the back of the room piped up, "Yeah, yeah."

March 24, 2018

Re: Must ranges support `r1 = r2;`?

Posted by ag0aep6g
in reply to H. S. Teoh

ag0aep6g

Posted in reply to H. S. Teoh

On 03/24/2018 10:57 PM, H. S. Teoh wrote:
> On Sat, Mar 24, 2018 at 10:44:35PM +0100, ag0aep6g via Digitalmars-d wrote:
[...]
>> Short version: With two `std.range.RefRange`s, `r1 = r2;` does not
>> what other Phobos code expects.
[...]
> Short answer: if r1=r2 is meant to save the current position in the
> range, that's wrong; it should be r1=r2.save instead.

`r1 = r2.save;` has the exact same problem. See the linked Bugzilla issue for details.

March 24, 2018

Re: Must ranges support `r1 = r2;`?

Posted by Simen Kjærås
in reply to ag0aep6g

Simen Kjærås

Posted in reply to ag0aep6g

On Saturday, 24 March 2018 at 21:44:35 UTC, ag0aep6g wrote:
> Long version: <https://issues.dlang.org/show_bug.cgi?id=18657> ("std.range and std.algorithm can't handle refRange").
>
> Short version: With two `std.range.RefRange`s, `r1 = r2;` does not what other Phobos code expects.
>
> Question is: Who's at fault? Where do I fix this? Do ranges have to support assignment as expected - even though std.range doesn't mention it? Or should range-handling code never do that - even though it comes naturally and is widespread currently?

It seems to me you've misunderstood what refRange actually does. From bugzilla:

void main()
{
    import std.range;
    import std.stdio;

    string s = "foo";
    auto r = refRange(&s);

    auto r2 = r;
    r2 = r2.save;
        /* Surprising: Effectively just does `s = s;` (i.e., nothing). */

    r2.popFront();
    writeln(r); /* Surprising: Prints "oo". */
}

None of this is surprising. When you call r2.popFront(), it's being forwarded to s.popFront. That's what refRange does, it's what it's supposed to do, and it's the only thing it could sensibly do.

This is conceptually what refRange does:

struct RefRange(R) {
    R* innerRange;
    this(R* innerRange) {
        this.innerRange = innerRange;
    }
    void popFront() {
        innerRange.popFront();
    }
    @property auto front() {
        return innerRange.front;
    }
    @property bool empty() {
        return innerRange.empty;
    }
}

In other words:

unittest {
    import std.range : refRange;
    import std.algorithm.comparison : equal;
    import std.array : popFront;

    auto a = "foo";
    auto b = refRange(&a);

    // Popping off the refRange pops off the original range.
    b.popFront();
    assert(a == "oo");

    // Popping off the original range pops off the refRange.
    a.popFront();
    assert(b.equal("o"));

    // b.equal("o") above had to iterate over the range,
    // thereby calling popFront until empty() returned true,
    // so the range is now empty.
    assert(a == "");
    assert(b.equal(""));
}

So when you do this:

    string s = "foo";
    auto r = refRange(&s).group;
    writeln(r.save);

r.save is going to create a new Group range, which contains a RefRange, which ultimately points to s. writeln() then repeatedly calls popFront(). This mutates s, as specified above.

You do have a point in the bug report on this point: calling writeln(r.save) again, should not write [Tuple!(dchar, uint)('f', 1)], but just [], since the underlying range is now empty. This is a bug.

The 'chain', 'choose', and 'cycle' examples are an effect of strings not being random access ranges. I'm uncertain if we should call this a bug, but I agree the behavior is unexpected, so we probably should.

The splitter example is of course a bug. The crash, that is. The expected behavior is that the first writeln prints [foo, ar], which it does, and that the second print [].

--
  Simen

March 24, 2018

Re: Must ranges support `r1 = r2;`?

Posted by Simen Kjærås
in reply to Simen Kjærås

Simen Kjærås

Posted in reply to Simen Kjærås

On Saturday, 24 March 2018 at 22:36:55 UTC, Simen Kjærås wrote:
> struct RefRange(R) {
>     R* innerRange;
>     this(R* innerRange) {
>         this.innerRange = innerRange;
>     }
>     void popFront() {
>         innerRange.popFront();
>     }
>     @property auto front() {
>         return innerRange.front;
>     }
>     @property bool empty() {
>         return innerRange.empty;
>     }
> }

Minor nitpick: these should of course be (*innerRange).popFront(), etc.

--
  Simen

March 24, 2018

Re: Must ranges support `r1 = r2;`?

Posted by Jonathan M Davis
in reply to ag0aep6g

Jonathan M Davis

Posted in reply to ag0aep6g

On Saturday, March 24, 2018 22:44:35 ag0aep6g via Digitalmars-d wrote:
> Long version: <https://issues.dlang.org/show_bug.cgi?id=18657> ("std.range and std.algorithm can't handle refRange").
>
> Short version: With two `std.range.RefRange`s, `r1 = r2;` does not what other Phobos code expects.
>
> Question is: Who's at fault? Where do I fix this? Do ranges have to support assignment as expected - even though std.range doesn't mention it? Or should range-handling code never do that - even though it comes naturally and is widespread currently?

I'll have to sit down and study RefRange again to really say whether it's violating how ranges are supposed to work, but the key idea behind RefRange is that it makes it possible to pass a range to a function and not have it be copied in the process so that if you have something like

void foo(R)(R range)
{
    ...
}

you can pass it a range and continue to use what's left of the range after it's done. As far as copying and assignment go, when you have

auto range2 = range1;

as far as generic code is concerned, it's effectively undefined behavior to then do anything with range1. The same goes with passing a range to a function by value or doing anything else that copies a range. The behavior of how ranges in general work is not defined for copying.

If the range is a reference type, then range1 and range2 refer to each other, and attempting to iterate either range1 or range2 would iterate the other.

If the range is a value type, then range1 and range2 are then independent, and iterating one does not affect the other.

If the range is a pseudo-reference type, then what happens to one when trying to iterate the other depends entirely on the implementation. In the case of dynamic arrays, you get the same behavior as a value type, but in many other cases, you get a weird halfway state, because calling popFront on range2 affects _some_ of the state in range1 but not all.

As such, in generic code, the only operation that I would expect to be valid on range1 after

auto range2 = range1;

would be to assign range1 another value - be that range2 or another range of the same type, at which point, range1 should then be able to be iterated while the range that was assigned to it can't be, because it then would have been copied, and the behavior for using it would be undefined. e.g.

auto range2 = range1; // now, range1 can't be used until it's assigned to
range2.popFront();

range1 = range2; // now, range2 can't be used until it's assigned to
range1.popFront();

And I don't think that RefRange violates that. Regardless, any generic, range-based code that uses a range after copying it (and I mean actually copying it, not calling save) is buggy code, and it _can't_ work correctly because of how diverse the behavior of copying ranges is. Any generic, code that copies a range and then uses it is wrong. Non-generic, range-based code can do it, because the type of the range is well-known and thus the semantics of copying are well-known and can be depended on, but taht's not the case for generic, range-based code (which most of the range-based code in Phobos is).

As far as save goes, save is supposed to create an independent copy to iterate, and RefRange's documentation says that that's what it does, meaning that passing a RefRange to a function that uses the forward range API probably isn't very useful, but it would do what a forward range is supposed to do. Whether the actual implementation does that properly on not, I don't know - I'd have to study it - but per the documentation, it's at least designed to do the right thing with save.

- Jonathan M Davis

March 25, 2018

Re: Must ranges support `r1 = r2;`?

Posted by ag0aep6g
in reply to Simen Kjærås

ag0aep6g

Posted in reply to Simen Kjærås

On 03/24/2018 11:36 PM, Simen Kjærås wrote:
> void main()
> {
>      import std.range;
>      import std.stdio;
> 
>      string s = "foo";
>      auto r = refRange(&s);
> 
>      auto r2 = r;
>      r2 = r2.save;
>          /* Surprising: Effectively just does `s = s;` (i.e., nothing). */
> 
>      r2.popFront();
>      writeln(r); /* Surprising: Prints "oo". */
> }
> 
> None of this is surprising. When you call r2.popFront(), it's being forwarded to s.popFront. That's what refRange does, it's what it's supposed to do, and it's the only thing it could sensibly do.

I think it's surprising. It works as intended and as documented, but it's surprising. Note that the behavior is different when you change

    auto r2 = r;
    r2 = r2.save;

to

    auto r2 = r.save;

`.save` actually makes a new slice that can be popped independently from `s`. But the assignment throws that independent slice away.

> This is conceptually what refRange does:
> 
> struct RefRange(R) {
>      R* innerRange;
>      this(R* innerRange) {
>          this.innerRange = innerRange;
>      }
>      void popFront() {
>          innerRange.popFront();
>      }
>      @property auto front() {
>          return innerRange.front;
>      }
>      @property bool empty() {
>          return innerRange.empty;
>      }
> }

That's all fine. You're missing the interesting bit: opAssign. `save` might also be interesting, but the actual `save` is perfectly fine.

[...]
> So when you do this:
> 
>      string s = "foo";
>      auto r = refRange(&s).group;
>      writeln(r.save);
> 
> r.save is going to create a new Group range, which contains a RefRange, which ultimately points to s. writeln() then repeatedly calls popFront(). This mutates s, as specified above.
That's not how it should be. A saved range should be iterable independently from the original. That's the point of `.save`. If that's not possible, `.save` should not be there.

You're implying that a saved RefRange points to the same range as the original. That's not true. RefRange.save saves the underlying range and lets the new RefRange point to that. This is correct. The two ranges can be iterated independently.

It's inside `Group` that things get messed up. There you have this innocent looking implementation of `save` [1]:

    @property typeof(this) save() {
        typeof(this) ret = this;
        ret._input = this._input.save;
        ret._current = this._current;
        return ret;
    }

The problem is in the line `ret._input = this._input.save;`. That calls RefRange.opAssign. If you refactor the code to avoid opAssign, saving works properly. For example:

    private this(typeof(_input) inp, typeof(_current) cur)
    {
        this._input = inp; /* Initialization, not assigment. */
        this._current = cur; /* Ditto, but doesn't matter. */
    }
    @property typeof(this) save() {
        return typeof(this)(this._input.save, this._current);
    }

Note that it calls `_input.save` just the same as before. The code only avoids RefRange.opAssign.

[...]
> The 'chain', 'choose', and 'cycle' examples are an effect of strings not being random access ranges. I'm uncertain if we should call this a bug, but I agree the behavior is unexpected, so we probably should.

I don't think random access has anything to do with this.

> The splitter example is of course a bug. The crash, that is. The expected behavior is that the first writeln prints [foo, ar], which it does, and that the second print [].

Again, I don't agree. You're effectively saying that saving a RefRange isn't possible. If that were so, it simply shouldn't have a `save` method. But the `save` method works perfectly fine. It's Phobos (accidentally) calling RefRange.opAssign that breaks things.

[1] https://github.com/dlang/phobos/blob/3225b19c9bae4b7e8d7a93d2d8c22c94b2a1a6c5/std/algorithm/iteration.d#L1505-L1510

March 25, 2018

Re: Must ranges support `r1 = r2;`?

Posted by ag0aep6g
in reply to Jonathan M Davis

ag0aep6g

Posted in reply to Jonathan M Davis

On 03/25/2018 12:02 AM, Jonathan M Davis wrote:
> auto range2 = range1; // now, range1 can't be used until it's assigned to
> range2.popFront();
> 
> range1 = range2; // now, range2 can't be used until it's assigned to
> range1.popFront();
> 
> And I don't think that RefRange violates that.

What RefRange violates is the assumption that range1 gets discarded, and that it simply gets overwritten by range2.

Note that `auto range2 = range1;` does not have that problem, because it's initialization, not assignment. It doesn't call RefRange's funky opAssign.

The various misbehaving Phobos functions assume that assignment works the same as initialization. But it doesn't with RefRange.

[...]
> As far as save goes, save is supposed to create an independent copy to
> iterate, and RefRange's documentation says that that's what it does, meaning
> that passing a RefRange to a function that uses the forward range API
> probably isn't very useful, but it would do what a forward range is supposed
> to do. Whether the actual implementation does that properly on not, I don't
> know - I'd have to study it - but per the documentation, it's at least
> designed to do the right thing with save.

RefRange.save works correctly. opAssign is the problem.

March 24, 2018

Re: Must ranges support `r1 = r2;`?

Posted by Jonathan M Davis
in reply to ag0aep6g

Jonathan M Davis

Posted in reply to ag0aep6g

On Sunday, March 25, 2018 00:28:32 ag0aep6g via Digitalmars-d wrote:
> On 03/25/2018 12:02 AM, Jonathan M Davis wrote:
> > auto range2 = range1; // now, range1 can't be used until it's assigned
> > to
> > range2.popFront();
> >
> > range1 = range2; // now, range2 can't be used until it's assigned to
> > range1.popFront();
> >
> > And I don't think that RefRange violates that.
>
> What RefRange violates is the assumption that range1 gets discarded, and that it simply gets overwritten by range2.
>
> Note that `auto range2 = range1;` does not have that problem, because it's initialization, not assignment. It doesn't call RefRange's funky opAssign.
>
> The various misbehaving Phobos functions assume that assignment works the same as initialization. But it doesn't with RefRange.

I'll have to think about it. IIRC, for RefRange to work as it's supposed to, opAssign really does need to work the way that it does, but it's been a while since I did much with RefRange, so I don't know. Either way, assignment really isn't part of the range API. I don't think that any of the range traits even test that assignment works on any level. So, it could be argued that generic, range-based functions simply shouldn't ever be assigning one range to another. I'm pretty sure that technically, a range could @disable opAssign without violating the range API, though it wouldn't surprise me if such a range then didn't work with a lot of existing code.

On some level, this falls into the same trap as save in that _usually_ save is not required, but in some cases it is, so it's frequently the case that save gets skipped when it's supposed to be called, and it only gets caught when tests are added which use ranges which are reference types.

Similarly, it's quite common for range-based functions to assume that front is not transitive, but occasionally, front is transitive, which then causes problems. It's been argued that the solution for that is that any generic code which retains the result of front after popFront is called needs to call save before getting front in order to guarantee that the result of front is independent, but doing that is not common practice at all and has been debated on a number of occasions.

RefRange is operating in an area where the range API doesn't actually define the semantics - either in the traits themselves or the behavior that's documented as expected but not actually guaranteed by the traits (e.g. as is the case with save - it's behavior can't be enforced, just documented).

So, I think that there's a good argument to be made that truly generic range-based code should not be assigning one range to another, because the range API does not actually require that ranges even support assignment, but it also doesn't surprise me at all if folks do it quite a bit. We frequently have the problem with ranges that folks assume certain behaviors based on what the typical range does but which is not actually guaranteed by the range API. And I don't know what the solution is for that. Certainly, the result is that a lot of range-based code that exists doesn't actually work with any range that matches the template constraints. Phobos does a far better job of it than a lot of code, because it's written to be generic and has lots of tests to catch corner cases that many folks never test for, but we don't catch everything.

A lot of stuff would be cleaner if we could somehow require that all basic input ranges be reference types and all forward ranges be value types (which would include eliminating save from the range API), but that would be overly restrictive in some cases, and it would be too disruptive a change now. And for better or worse, RefRange probably wouldn't into that more restrictive paradigm.

- Jonathan M Davis

March 26, 2018

Re: Must ranges support `r1 = r2;`?

Posted by ag0aep6g
in reply to Jonathan M Davis

ag0aep6g

Posted in reply to Jonathan M Davis

On 03/25/2018 01:49 AM, Jonathan M Davis wrote:
> assignment really isn't part of the range API. I don't think that any of the
> range traits even test that assignment works on any level. So, it could be
> argued that generic, range-based functions simply shouldn't ever be
> assigning one range to another.
So we have to remove all range assignments from generic code. Ugh.

The first three: https://github.com/dlang/phobos/pull/6346

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation