Jump to page: 1 2 3
Thread overview
called copy constructor in foreach with ref on Range
Jun 21, 2020
Rogni
Jun 21, 2020
kinke
Jun 22, 2020
Jonathan M Davis
Jun 22, 2020
kinke
Jun 22, 2020
Jonathan M Davis
Jun 22, 2020
kinke
Jun 22, 2020
Jonathan M Davis
Jun 22, 2020
kinke
Jun 22, 2020
Jonathan M Davis
Jun 22, 2020
Stanislav Blinov
Jun 22, 2020
H. S. Teoh
Jun 22, 2020
Paul Backus
Jun 23, 2020
Jonathan M Davis
Jun 23, 2020
Stanislav Blinov
Jun 23, 2020
Jonathan M Davis
Jun 23, 2020
Stanislav Blinov
Jun 23, 2020
Jonathan M Davis
Jun 23, 2020
Stanislav Blinov
Jun 23, 2020
Paul Backus
Jun 23, 2020
H. S. Teoh
Jun 23, 2020
Stanislav Blinov
Jun 23, 2020
Sebastiaan Koppe
Jun 23, 2020
Jonathan M Davis
Jun 23, 2020
H. S. Teoh
June 21, 2020
https://pastebin.com/BYcRN8T2

why called copy constructor for ref Range struct type in foreach ?

```
struct MyRange {
    private int current_row = 0;
    private int rows_count = 5;
    @disable this(this); // <===
    bool empty () { return current_row == rows_count; }
    int front() { return current_row*current_row; }
    void popFront() { ++current_row; }
}

struct Container {
    @disable this(this); // <===
    int opApply(int delegate(int) dg) {
        foreach (int row; 0..5) dg(row*row);
        return 1;
    }
}

void foreach_iterable(Type)(ref Type iterable) {
    import std.stdio: writeln;
    foreach (int row_value; iterable) { // test_range.d(20): Error: struct test_range.MyRange is not copyable because it is annotated with @disable
        row_value.writeln;
    }
}

void call_iterable(Type)() {
    Type iterable;
    foreach_iterable(iterable);
}

void main() {
    call_iterable!Container(); // work
    call_iterable!MyRange(); // does`t work
}
```

ldc2 version: version   1.8.0 (DMD v2.078.3, LLVM 5.0.1)


June 21, 2020
A foreach over a custom range makes an initial copy, so that the original range is still usable after the foreach (not empty).
June 22, 2020
On Sunday, June 21, 2020 2:25:37 PM MDT kinke via Digitalmars-d-learn wrote:
> A foreach over a custom range makes an initial copy, so that the original range is still usable after the foreach (not empty).

No, it's not so that the range will be useable afterwards. In fact, for generic code, you must _never_ use a range again after passing it to foreach, because the copying semantics of ranges are not specified, and you can get radically different behavior depending on what the copying semantics of a range are (e.g. if it's a class, then it's just copying the reference). In general, you should never use a range after it's been copied unless you know exactly what type of range you're dealing with and what its copying behavior is. If you want an independent copy, you need to use save.

The reason that foreach copies the range is simply due to how the code is lowered. e.g.

    foreach(e; range)
    {
    }

essentially becomes

    for(auto r = range; !r.empty; r.popFront())
    {
        auto e = r.front;
    }

And the fact that a copy is made is likely simply a result of it mimicking what happens with arrays. Either way, you should never be doing something like

    foreach(e; range)
    {
    }

    auto v = range.front;

in generic code. It needs to be

    foreach(e; range.save)
    {
    }

    auto v = range.front;

instead.

- Jonathan M Davis



June 22, 2020
On Monday, 22 June 2020 at 16:25:11 UTC, Jonathan M Davis wrote:
> The reason that foreach copies the range is simply due to how the code is lowered. e.g.
>
>     foreach(e; range)
>     {
>     }
>
> essentially becomes
>
>     for(auto r = range; !r.empty; r.popFront())
>     {
>         auto e = r.front;
>     }
>
> And the fact that a copy is made is likely simply a result of it mimicking what happens with arrays.

I was trying to explain/guess the rationale for that copy (not in terms of how it's implemented, but why it's implemented that way). This 'mimicking-an-array' doesn't make any sense to me. If the original idea wasn't to make sure the range is reusable afterwards, I guess it's done for implementation simplicity, to promote an rvalue range to the required lvalue.

If copying a range is considered to be generally unsafe and a common pitfall (vs. the save() abomination), maybe range-foreach shouldn't allow any lvalue ranges in the 1st place, thus not making any copies and forcing the user to specify some rvalue (as returned by `range.save()`, or `move(range)` if destructive iteration is indeed intended).
June 22, 2020
On Monday, June 22, 2020 10:59:45 AM MDT kinke via Digitalmars-d-learn wrote:
> If copying a range is considered to be generally unsafe and a
> common pitfall (vs. the save() abomination), maybe range-foreach
> shouldn't allow any lvalue ranges in the 1st place, thus not
> making any copies and forcing the user to specify some rvalue (as
> returned by `range.save()`, or `move(range)` if destructive
> iteration is indeed intended).

Copying ranges isn't a problem. Almost all range-based code copies ranges quite a bit (e.g. almost all range-based functions take a range by value, and chaining range-based function calls wouldn't work if it didn't). Rather, it's using the original after a copy has been made that's a problem (at least in generic code), because the semantics of copying aren't part of the range API and can vary wildly depending on the type.

Really, the issues with copying were not properly taken into account when the range API was created, and mistakes were made. If we were to rework the range API at this point, we would probably get rid of save and require that copying be equivalent to save for forward ranges (which would then make it illegal for classes to be forward ranges without wrapping them in a struct). That would fix the problem for forward ranges, but basic input ranges can't have those semantics, or they could be forward ranges, so exactly what the correct solution would be is debatable, and if generic code operates on both basic input ranges and forward ranges, the copying semantics won't always be the same, which is the problem we have now. So, if we were to rework the range API, it's one of the open problems that needs to be sorted out.

Regardless, as things stand, because copying a range can have reference semantics, value semantics, or something in between, in practice, that means that generic code cannot use a range once it's been copied, because the semantics will vary from type to type. The copy can be used (and most range-based code relies on that), but the original needs to be left alone. Non-generic code has more leeway, because it can rely on the behavior of specific range types, but with generic code, you have to be careful. And foreach is just one of the places where the issue of not using the original after making a copy comes up.

- Jonathan M Davis



June 22, 2020
On Monday, 22 June 2020 at 19:03:44 UTC, Jonathan M Davis wrote:
> in practice, that means that generic code cannot use a range once it's been copied

Translating to a simple rule-of-thumb: never copy a (struct) range, always move.
June 22, 2020
On Monday, June 22, 2020 1:41:34 PM MDT kinke via Digitalmars-d-learn wrote:
> On Monday, 22 June 2020 at 19:03:44 UTC, Jonathan M Davis wrote:
> > in practice, that means that generic code cannot use a range once it's been copied
>
> Translating to a simple rule-of-thumb: never copy a (struct)
> range, always move.

You're unlikely to find much range-based code that does that and there really isn't much point in doing that. Again, copying isn't the problem. It's using the original after making the copy that's the problem. And moving doesn't fix anything, since the original variable is still there (just in its init state, which would still be invalid to use in generic code and could outright crash in some cases if you tried to use it - e.g. if it were a class reference, since it would then be null). So, code that does a move could accidentally use the original range after the move and have bugs just like code that copies the range has bugs if the original is used after the copy has been made. So, the rule of thumb is not that you should avoid copying ranges. It's that once you've copied a range, you should then use only the copy and not the original.

- Jonathan M Davis



June 22, 2020
On Monday, 22 June 2020 at 20:51:37 UTC, Jonathan M Davis wrote:
> [...]

That's why I put the struct in parantheses. Moving a class ref makes hardly any sense, but I've also never written a *class* to represent a range. Moving is the no-brainer solution for transferring ownership of struct ranges and invalidating the original instance.
June 22, 2020
On Monday, 22 June 2020 at 20:51:37 UTC, Jonathan M Davis wrote:

> You're unlikely to find much range-based code that does that and there really isn't much point in doing that. Again, copying isn't the problem. It's using the original after making the copy that's the problem.

Copy *is* the problem. If you can't make a copy (i.e. you get a compilation error) - you don't have a problem, the compiler just found a bug for you. Sadly, historically D's libraries were built around lots of redundant copying, even though there are very few cases where copies are actually required. The reason why we're "unlikely to find much range-based code that does that [move]" is (a) sloppy or shortsighted design and (b) poor language support. The latter hopefully stands to change.

> And moving doesn't fix anything, since the original variable is still there
> (just in its init state, which would still be invalid to use in generic code and could outright crash in some cases if you tried to use it - e.g. if it were a class reference, since it would then be null).

Eh? A range in 'init' state should be an empty range. If you get a crash from that then there's a bug in that range's implementation, not in user code.

> So, code that does a move could accidentally use the original range after the move and have bugs just like code that copies the range has bugs if the original is used after the copy has been made. So, the rule of thumb is not that you should avoid copying ranges. It's that once you've copied a range, you should then use only the copy and not the original.

That is not true. Copying of forward ranges is absolutely fine. It's what the current `save()` primitive is supposed to do. It's the copying of input ranges should just be rejected, statically.
June 22, 2020
On Mon, Jun 22, 2020 at 09:11:07PM +0000, Stanislav Blinov via Digitalmars-d-learn wrote:
> On Monday, 22 June 2020 at 20:51:37 UTC, Jonathan M Davis wrote:
[...]
> > And moving doesn't fix anything, since the original variable is still there (just in its init state, which would still be invalid to use in generic code and could outright crash in some cases if you tried to use it - e.g. if it were a class reference, since it would then be null).
> 
> Eh? A range in 'init' state should be an empty range. If you get a crash from that then there's a bug in that range's implementation, not in user code.

Don't be shocked when you find out how many Phobos ranges have .init states that are invalid (e.g., non-empty, but .front and .popFront will crash / return invalid values).


> > So, code that does a move could accidentally use the original range after the move and have bugs just like code that copies the range has bugs if the original is used after the copy has been made. So, the rule of thumb is not that you should avoid copying ranges. It's that once you've copied a range, you should then use only the copy and not the original.
> 
> That is not true. Copying of forward ranges is absolutely fine. It's what the current `save()` primitive is supposed to do. It's the copying of input ranges should just be rejected, statically.

Jonathan is coming from the POV of generic code.  The problem with move leaving the original range in its .init state isn't so much that it will crash or anything (even though as you said that does indicate a flaw in the range's implementation), but that the semantics of generic code changes in subtle ways. For example:

	auto myGenericFunc(R)(R r) {
		...
		foreach (x; r) {
			doSomething(x);
		}
		if (!r.empty)
			doSomethingElse(r);
		...
	}

Suppose for argument's sake that the above foreach/if structure is an essential part of whatever algorithm myGenericFunc is implementing. Now there's a problem, because if R has array-like semantics, then the algorithm will do one thing, but if R has reference-like or move semantics, then the behaviour of the algorithm will be different, even if both ranges represent the same sequence of input values.

Note that in real-life code, this problem can be far more subtle than a blatant foreach loop and if statement like the above. For example, consider a function that drops the first n elements of a range. Your generic function might want to pop the first n elements then do something else with the rest of the range.  Well, if you write it the obvious way:

	auto myAlgo(R)(R r) {
		size_t n = ...;
		dropFirstN(r, n);
		... // do something else with r
	}

then you have a subtle bug, because the state of r after the call to dropFirstN might be completely different depending on whether r behaves like an array or like a by-reference or move type.


T

-- 
Truth, Sir, is a cow which will give [skeptics] no more milk, and so they are gone to milk the bull. -- Sam. Johnson
« First   ‹ Prev
1 2 3