called copy constructor in foreach with ref on Range (page 2) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Learn » called copy constructor in foreach with ref on Range (page 2)

June 22, 2020

Re: called copy constructor in foreach with ref on Range

Posted by Jonathan M Davis
in reply to kinke

Jonathan M Davis

Posted in reply to kinke

On Monday, June 22, 2020 3:10:28 PM MDT kinke via Digitalmars-d-learn wrote:
> On Monday, 22 June 2020 at 20:51:37 UTC, Jonathan M Davis wrote:
> > [...]
>
> That's why I put the struct in parantheses. Moving a class ref makes hardly any sense, but I've also never written a *class* to represent a range. Moving is the no-brainer solution for transferring ownership of struct ranges and invalidating the original instance.

Invalidating the instance doesn't actually prevent it from being misused though. At best, the fact that you moved the instance rather than copying it makes it more likely that accidentally using the instance will cause more extreme bugs or crash. The core issue that the original is potentially invalid but can still be used exists whether you copy the range or move it.

Also, since the issue here is generic code, you have to take classes into account and cannot assume that the range is a struct. True, it's usually a bad idea to use classes for ranges, and ideally, we'd alter the range API so that classes weren't valid ranges, but the reality of the matter is that they are, and generic code has to take that into account.

- Jonathan M Davis

June 22, 2020

Re: called copy constructor in foreach with ref on Range

Posted by Paul Backus
in reply to H. S. Teoh

Paul Backus

Posted in reply to H. S. Teoh

On Monday, 22 June 2020 at 21:33:08 UTC, H. S. Teoh wrote:
>
> Jonathan is coming from the POV of generic code.  The problem with move leaving the original range in its .init state isn't so much that it will crash or anything (even though as you said that does indicate a flaw in the range's implementation), but that the semantics of generic code changes in subtle ways. For example:
>
> [...]

Seems to me like adding some unit tests with non-copyable input ranges to Phobos could expose a number of latent bugs.

June 23, 2020

Re: called copy constructor in foreach with ref on Range

Posted by Stanislav Blinov
in reply to H. S. Teoh

Stanislav Blinov

Posted in reply to H. S. Teoh

On Monday, 22 June 2020 at 21:33:08 UTC, H. S. Teoh wrote:

> Don't be shocked when you find out how many Phobos ranges have .init states that are invalid (e.g., non-empty, but .front and .popFront will crash / return invalid values).

Which ones?

> Jonathan is coming from the POV of generic code.  The problem with move leaving the original range in its .init state isn't so much that it will crash or anything (even though as you said that does indicate a flaw in the range's implementation), but that the semantics of generic code changes in subtle ways.

Well that means that the code is not generic, i.e. the bug originates in the design, not implementation.

> 	auto myGenericFunc(R)(R r) {
> 		...
> 		foreach (x; r) {
> 			doSomething(x);
> 		}
> 		if (!r.empty)
> 			doSomethingElse(r);
> 		...
> 	}
>
> Suppose for argument's sake that the above foreach/if structure is an essential part of whatever algorithm myGenericFunc is implementing. Now there's a problem, because if R has array-like semantics, then the algorithm will do one thing, but if R has reference-like or move semantics, then the behaviour of the algorithm will be different, even if both ranges represent the same sequence of input values.

Yep, definitely not generic. Which is exactly the kind of error that should be caught at compile time. Which we, sadly, can't do with the current range API.

> consider a function that drops the first n elements of a range.
> Your generic function might want to pop the first n elements then do something else with the rest of the range.  Well, if you write it the obvious way:
>
> 	auto myAlgo(R)(R r) {
> 		size_t n = ...;
> 		dropFirstN(r, n);
> 		... // do something else with r
> 	}
>
> then you have a subtle bug, because the state of r after the call to dropFirstN might be completely different depending on whether r behaves like an array or like a by-reference or move type.

Err... that one is actually fine. It would take the range by ref and pop off the elements. There already is such a function - the popFrontN. It's the functions that take ranges by value that present the not-so-subtle issue with reference types. For example, its chainable would-be-counterpart drop(). "Would-be" because that one takes the argument by value.
We should move toward disallowing reference types to be ranges. A good deal of the rest can be solved with API and design changes (like disallowing copying of input ranges).

It's kind of interesting how, with the ongoing discussion about range API in the general forum, a couple of range questions are brought up in learn. Something, as they say, is in the air.

June 22, 2020

Re: called copy constructor in foreach with ref on Range

Posted by Jonathan M Davis
in reply to Paul Backus

Jonathan M Davis

Posted in reply to Paul Backus

On Monday, June 22, 2020 3:38:02 PM MDT Paul Backus via Digitalmars-d-learn wrote:
> On Monday, 22 June 2020 at 21:33:08 UTC, H. S. Teoh wrote:
> > Jonathan is coming from the POV of generic code.  The problem with move leaving the original range in its .init state isn't so much that it will crash or anything (even though as you said that does indicate a flaw in the range's implementation), but that the semantics of generic code changes in subtle ways. For example:
> >
> > [...]
>
> Seems to me like adding some unit tests with non-copyable input ranges to Phobos could expose a number of latent bugs.

At this point, non-copyable ranges really aren't a thing. foreach does not support them and neither does Phobos. isInputRange doesn't actually reject them, but they don't really work in practice, and it's unlikely that you're going to find much in Phobos that happens to work with them. isForwardRange does outright reject them though.

- Jonathan M Davis

June 22, 2020

Re: called copy constructor in foreach with ref on Range

Posted by Jonathan M Davis
in reply to Stanislav Blinov

Jonathan M Davis

Posted in reply to Stanislav Blinov

On Monday, June 22, 2020 3:11:07 PM MDT Stanislav Blinov via Digitalmars-d- learn wrote:
> On Monday, 22 June 2020 at 20:51:37 UTC, Jonathan M Davis wrote:
> > You're unlikely to find much range-based code that does that and there really isn't much point in doing that. Again, copying isn't the problem. It's using the original after making the copy that's the problem.
>
> Copy *is* the problem. If you can't make a copy (i.e. you get a compilation error) - you don't have a problem, the compiler just found a bug for you. Sadly, historically D's libraries were built around lots of redundant copying, even though there are very few cases where copies are actually required. The reason why we're "unlikely to find much range-based code that does that [move]" is (a) sloppy or shortsighted design and (b) poor language support. The latter hopefully stands to change.

As things stand, uncopyable ranges aren't really a thing, and common range idiomns rely on ranges being copyable. We'd need some major redesigning to make uncopyable ranges work, and personally, I don't think that it's worth the trouble.

> > And moving doesn't fix anything, since the original variable is
> > still there
> > (just in its init state, which would still be invalid to use in
> > generic code and could outright crash in some cases if you
> > tried to use it - e.g. if it were a class reference, since it
> > would then be null).
>
> Eh? A range in 'init' state should be an empty range. If you get a crash from that then there's a bug in that range's implementation, not in user code.

The range API has no requirement that the init value of a range be empty, and any generic code which relies on such behavior is buggy. In the case of classes, it would outright crash, because the init value would be null. I agree that ideally the range API would require that the init state of a range be a valid, empty range, but that's simply not how it works right now. In order to make it work that way, we'd have to redesign the range API in at least some respects (e.g. getting rid of save and making it illegal for classes to be forward ranges).

> > So, code that does a move could accidentally use the original range after the move and have bugs just like code that copies the range has bugs if the original is used after the copy has been made. So, the rule of thumb is not that you should avoid copying ranges. It's that once you've copied a range, you should then use only the copy and not the original.
>
> That is not true. Copying of forward ranges is absolutely fine. It's what the current `save()` primitive is supposed to do. It's the copying of input ranges should just be rejected, statically.

As things stand, it is _not_ true that it's safe to copy forward ranges and then use the original. Sure, it will work for some ranges, but for others it won't. The entire reason that save exists is for ranges that are classes, because copying them does not result in an independent range. The range API does not require that copying a range result in an independent copy. It's not even guaranteed that copying a range that's a struct will result in an indpendent copy. Depending on the range type, copying it could result in an independent copy, or it could result in a reference to to the original range, or it could even result in part of the state being copied and part of it being a reference (e.g. if the current front is stored directly in the range, but the iteration state is stored by reference). If you rely on copying a range resulting in an independent copy, you will have buggy code - even if it's a forward range. The _only_ way that the range API specifies that you can get an independent copy of a range is to use save, and generic code should never rely on any other mechanism for that.

Now, ideally, we'd get rid of save and require that copying a forward range result in an independent copy (which would then require that a class be wrapped in a struct to be a range), but that's simply not how the current range API works. Because the current range API does not make any guarantees about the semantics of copying a range, generic code cannot assume that those semantics are the same as save. As such, if you want an independent copy of a forward range in generic code, you must use save, or the code will be buggy. Similarly, generic code cannot use the original range after it's been copied (unless it simply never does anything with the copy), because mutating the copy may or may not mutate the original, and the original may or may not even be in a valid state if the copy is mutated.

With non-generic code, you can rely on the behaviors of specific ranges and what will happen when you copy them (e.g. not bothering to call save when passing strings around), but with generic code, that's not true. And there's plenty of D code out there that works correctly with a specific range type but which would fail miserably if it were used generically, because it makes assumption which the range API does not guarantee (like the semantics of copying a range).

- Jonathan M Davis

June 22, 2020

Re: called copy constructor in foreach with ref on Range

Posted by Jonathan M Davis

Jonathan M Davis

On Monday, June 22, 2020 3:33:08 PM MDT H. S. Teoh via Digitalmars-d-learn wrote:
> On Mon, Jun 22, 2020 at 09:11:07PM +0000, Stanislav Blinov via Digitalmars-
d-learn wrote:
> > That is not true. Copying of forward ranges is absolutely fine. It's what the current `save()` primitive is supposed to do. It's the copying of input ranges should just be rejected, statically.
>
> Jonathan is coming from the POV of generic code.  The problem with move leaving the original range in its .init state isn't so much that it will crash or anything (even though as you said that does indicate a flaw in the range's implementation), but that the semantics of generic code changes in subtle ways. For example:
>
>   auto myGenericFunc(R)(R r) {
>       ...
>       foreach (x; r) {
>           doSomething(x);
>       }
>       if (!r.empty)
>           doSomethingElse(r);
>       ...
>   }
>
> Suppose for argument's sake that the above foreach/if structure is an essential part of whatever algorithm myGenericFunc is implementing. Now there's a problem, because if R has array-like semantics, then the algorithm will do one thing, but if R has reference-like or move semantics, then the behaviour of the algorithm will be different, even if both ranges represent the same sequence of input values.
>
> Note that in real-life code, this problem can be far more subtle than a blatant foreach loop and if statement like the above. For example, consider a function that drops the first n elements of a range. Your generic function might want to pop the first n elements then do something else with the rest of the range.  Well, if you write it the obvious way:
>
>   auto myAlgo(R)(R r) {
>       size_t n = ...;
>       dropFirstN(r, n);
>       ... // do something else with r
>   }
>
> then you have a subtle bug, because the state of r after the call to dropFirstN might be completely different depending on whether r behaves like an array or like a by-reference or move type.

Exactly. It's because of issues like this that generic, range-based functions need to be tested with a variety of range types - including reference types. Without that, bugs are routinely missed. I think that a fairly large percentage of Phobos (maybe even all of it) gets that right now, because we've added tests for reference type ranges, but there used to be quite a few bugs in Phobos, because there were no such tests. It's frequently the case that people write range-based code under the assumption that ranges will act like dynamic arrays, and their code then also works with many ranges which have similar copying behavior, but it doesn't work with all range types. Such problems don't generally get caught without extensive testing. It matters a lot less for code within a program that doesn't use a large variety of range types, but for libraries using generic functions, it's a must.

If we do actually rework the range API as has occasionally been discussed, it would be _really_ nice if we could more thoroughly restrict the copying semantics of ranges so that some of these problems go away, but without such a redesign, we're stuck with such problems. And even with a redesign, the best fix for it is far from obvious, because basic input ranges and forward ranges inherently have different copying semantics. We can probably require that copying forward ranges always results in an independent copy (thereby essentially having value semantics), but basic input ranges can't really be value types, because if they could, they could be forward ranges, meaning that they're generally going to either be reference types or pseudo-reference types, which of course have different copying semantics. So, as long as the range API is set up so that the same code can operate on both basic input and forward ranges, we pretty much inherently have a problem with the copying semantics in ranges being inconsistent - though requiring value semantics for forward ranges would still be a significant improvement.

- Jonathan M Davis

June 22, 2020

Re: called copy constructor in foreach with ref on Range

Posted by H. S. Teoh

H. S. Teoh

On Mon, Jun 22, 2020 at 08:53:22PM -0600, Jonathan M Davis via Digitalmars-d-learn wrote: [...]
> Exactly. It's because of issues like this that generic, range-based functions need to be tested with a variety of range types - including reference types. Without that, bugs are routinely missed.
[...]
> If we do actually rework the range API as has occasionally been discussed, it would be _really_ nice if we could more thoroughly restrict the copying semantics of ranges so that some of these problems go away, but without such a redesign, we're stuck with such problems.

Redesign or not, I think the new range API should be much more specific in specifying exactly what behaviours ranges ought to have.  The current API is under-specified, with two of the main problem points that I can remember being:

1) Transient-ness: is the value return by .front guaranteed to remain
   valid, or does .popFront invalidate it?  This is not specified in the
   current spec, and we have ranges in Phobos of either type (e.g., iota
   and .byLine), with a lot of code just blindly assuming non-transience
   only to discover that things break down when handed a .byLine
   instance.

2) The exact semantics of copying (assigning) a range. What parts of a
   range's state is/isn't preserved when you assign a range to a local
   variable, for example?  What happens if you pass a range to a
   function and then continue to operate on the original?  All of this
   must be specified precisely in the range API so that we don't have
   one range acting one way and another range acting another way,
   thereby compromising the semantics of generic code.

T

-- 
Democracy: The triumph of popularity over principle. -- C.Bond

June 23, 2020

Re: called copy constructor in foreach with ref on Range

Posted by Stanislav Blinov
in reply to Jonathan M Davis

Stanislav Blinov

Posted in reply to Jonathan M Davis

On Tuesday, 23 June 2020 at 02:41:55 UTC, Jonathan M Davis wrote:

> As things stand, uncopyable ranges aren't really a thing, and common range idiomns rely on ranges being copyable.

Which idioms are those? I mean, genuine idioms, not design flaws like e.g. references.

> We'd need some major redesigning to make uncopyable ranges work, and personally, I don't think that it's worth the trouble.

Of course we would. Andrei is entertaining changing the whole input range API. Though he, like you, seems opposed to the idea of uncopyables.

> The range API has no requirement that the init value of a range be empty, and any generic code which relies on such behavior is buggy. In the case of classes, it would outright crash, because the init value would be null.

Yeah, that's a shame.

> I agree that ideally the range API would require that the init state of a range be a valid, empty range, but that's simply not how it works right now. In order to make it work that way, we'd have to redesign the range API in at least some respects (e.g. getting rid of save and making it illegal for classes to be forward ranges).

Better yet - making it illegal for classes to be ranges.

> As things stand, it is _not_ true that it's safe to copy forward ranges and then use the original. Sure, it will work for some ranges, but for others it won't. The entire reason that save exists is for ranges that are classes, because copying them does not result in an independent range. The range API does not require that copying a range result in an independent copy. It's not even guaranteed that copying a range that's a struct will result in an indpendent copy. Depending on the range type, copying it could result in an independent copy, or it could result in a reference to to the original range, or it could even result in part of the state being copied and part of it being a reference (e.g. if the current front is stored directly in the range, but the iteration state is stored by reference).

If copying isn't making a copy, it's not copying.

> If you rely on copying a range resulting in an independent copy, you will have buggy code - even if it's a forward range. The _only_ way that the range API specifies that you can get an independent copy of a range is to use save, and generic code should never rely on any other mechanism for that.

That's exactly what I said. "Copying of forward ranges is absolutely fine. It's what the current `save()` primitive is supposed to do."
...Except, of course, that that shouldn't be the case. *Copying* should be creating a copy. :)

> Now, ideally, we'd get rid of save and require that copying a forward range result in an independent copy (which would then require that a class be wrapped in a struct to be a range)

Yes!

> but that's simply not how the current range API works.

No, it's not :(

June 22, 2020

Re: called copy constructor in foreach with ref on Range

Posted by Jonathan M Davis
in reply to Stanislav Blinov

Jonathan M Davis

Posted in reply to Stanislav Blinov

On Monday, June 22, 2020 9:25:55 PM MDT Stanislav Blinov via Digitalmars-d- learn wrote:
> On Tuesday, 23 June 2020 at 02:41:55 UTC, Jonathan M Davis wrote:
> > As things stand, uncopyable ranges aren't really a thing, and common range idiomns rely on ranges being copyable.
>
> Which idioms are those? I mean, genuine idioms, not design flaws like e.g. references.

It is extremely common to wrap ranges in other ranges (and in fact, you basically have to in order to have lazy ranges). That really doesn't work very well - if at all - if you can't copy the range. It might be possible with a bunch of explicit calls to move, but that would result in range-based code in general being @system without a clean way to use @trusted, since whether it's really safe to mark such moves with @trusted would depend on the specific range implementation, which then is a big problem with generic code. Regardless of whether it's actually possible to make it work though, it's a complete mess in comparison to simply copying. And the fact that chaining range-based calls is extremely common makes the problem that much worse.

The way that ranges are routinely passed around and wrapped works as well as it does, because ranges are copyable.

> > As things stand, it is _not_ true that it's safe to copy forward ranges and then use the original. Sure, it will work for some ranges, but for others it won't. The entire reason that save exists is for ranges that are classes, because copying them does not result in an independent range. The range API does not require that copying a range result in an independent copy. It's not even guaranteed that copying a range that's a struct will result in an indpendent copy. Depending on the range type, copying it could result in an independent copy, or it could result in a reference to to the original range, or it could even result in part of the state being copied and part of it being a reference (e.g. if the current front is stored directly in the range, but the iteration state is stored by reference).
>
> If copying isn't making a copy, it's not copying.

The semantics of copying a variable or object vary wildly depending on the type regardless of whether we're talking about ranges. Copying a pointer or reference is still a copy even if it isn't a deep copy. Copying a range _always_ results in a copy, just like it does with any other type. It just doesn't necessarily result in a copy which can be independently iterated.

- Jonathan M Davis

June 22, 2020

Re: called copy constructor in foreach with ref on Range

Posted by H. S. Teoh
in reply to Stanislav Blinov

H. S. Teoh

Posted in reply to Stanislav Blinov

On Tue, Jun 23, 2020 at 03:25:55AM +0000, Stanislav Blinov via Digitalmars-d-learn wrote:
> On Tuesday, 23 June 2020 at 02:41:55 UTC, Jonathan M Davis wrote:
> 
> > As things stand, uncopyable ranges aren't really a thing, and common range idiomns rely on ranges being copyable.
> 
> Which idioms are those? I mean, genuine idioms, not design flaws like e.g.  references.
> 
> > We'd need some major redesigning to make uncopyable ranges work, and personally, I don't think that it's worth the trouble.
> 
> Of course we would. Andrei is entertaining changing the whole input range API. Though he, like you, seems opposed to the idea of uncopyables.
[...]

I'm also wondering what's the motivation behind supporting non-copyable ranges, and whether it's worth the effort and inevitable complications to support it if it's a rare use-case.  Do you have any compelling use-case in mind, that might make this worthwhile?


T

-- 
Любишь кататься - люби и саночки возить.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation