March 19, 2020
On 3/19/20 10:00 AM, Dukc wrote:
> On Wednesday, 18 March 2020 at 16:29:53 UTC, H. S. Teoh wrote:
>> This is a dangerous assumption.  My personal advice is, if you expect a range to retain its contents after iteration, call .save explicitly. Don't assume anything not explicitly required by the range API.
> 
> This means that even if the api does thing X right now, I should not assume it won't change unless it's documented, right?

If you want to copy a forward range use save. Anything else is possible to break in an implementation detail later.

-Steve
March 19, 2020
On Thursday, 19 March 2020 at 14:11:20 UTC, Steven Schveighoffer wrote:
>
> If you want to copy a forward range use save. Anything else is possible to break in an implementation detail later.
>
> -Steve

Ouch - my code has to be totally broken. And I don't think I'm the only one.
March 19, 2020
On 3/19/20 10:28 AM, Dukc wrote:
> On Thursday, 19 March 2020 at 14:11:20 UTC, Steven Schveighoffer wrote:
>>
>> If you want to copy a forward range use save. Anything else is possible to break in an implementation detail later.
>>
> 
> Ouch - my code has to be totally broken. And I don't think I'm the only one.

You are not the only one. Almost all range functions are tested with and use arrays, which do exactly the same for .save or copying. The reason why `save` is such a bad solution is because there is generally no penalty for ignoring it (until there is).

The only way to "fix" this IMO is to migrate to a system where standard (non-forward) input ranges are not copyable, and deprecate save. It likely will never happen.

-Steve
March 19, 2020
On Thursday, 19 March 2020 at 14:46:39 UTC, Steven Schveighoffer wrote:
> The only way to "fix" this IMO is to migrate to a system where standard (non-forward) input ranges are not copyable, and deprecate save. It likely will never happen.
>
> -Steve

Perhaps there is another way. Individual ranges can give guarantees about their own copy behaviour if they want. We could document the current copy behaviour of existing Phobos ranges and say that relying on it is thereafter fine.

But then the question with `chunkBy` is, should it take the opportunity to start to behave like most other ranges in this regard?
March 19, 2020
On Thu, Mar 19, 2020 at 10:46:39AM -0400, Steven Schveighoffer via Digitalmars-d wrote:
> On 3/19/20 10:28 AM, Dukc wrote:
> > On Thursday, 19 March 2020 at 14:11:20 UTC, Steven Schveighoffer wrote:
> > > 
> > > If you want to copy a forward range use save. Anything else is possible to break in an implementation detail later.
> > > 
> > 
> > Ouch - my code has to be totally broken. And I don't think I'm the only one.
> 
> You are not the only one. Almost all range functions are tested with and use arrays, which do exactly the same for .save or copying. The reason why `save` is such a bad solution is because there is generally no penalty for ignoring it (until there is).

Yeah, Andrei has said that in retrospect, .save was a design mistake, he should have made the input range vs. forward range distinction keyed on copyability or ref/non-ref instead.  But to be fair, back in the day when the range API was first designed, D didn't have the necessary language facilities to be able to handle a better solution, unlike now when we have @disable, copy ctors, and a bunch of other language features that make such a solution more feasible.


> The only way to "fix" this IMO is to migrate to a system where standard (non-forward) input ranges are not copyable, and deprecate save. It likely will never happen.
[...]

It will happen if std.v2 happens... ;-)


T

-- 
"How are you doing?" "Doing what?"
March 19, 2020
On Thu, Mar 19, 2020 at 03:35:28PM +0000, Dukc via Digitalmars-d wrote:
> On Thursday, 19 March 2020 at 14:46:39 UTC, Steven Schveighoffer wrote:
> > The only way to "fix" this IMO is to migrate to a system where standard (non-forward) input ranges are not copyable, and deprecate save. It likely will never happen.
[...]
> Perhaps there is another way. Individual ranges can give guarantees about their own copy behaviour if they want. We could document the current copy behaviour of existing Phobos ranges and say that relying on it is thereafter fine.

The problem is, you *cannot* give any guarantees about copy behaviour, because it depends on the behaviour of the incoming range. For example, if you pass the output of .byChunk to another range algorithm, that second algorithm cannot guarantee copy behaviour anymore.

In fact, all you have to do is to wrap a forward range in an InputRangeObject (let's say you need to alternate between two different range types (but with compatible elements) at runtime, then you'll need to do this), and now you have a forward range with by-reference semantics that requires the use of .save in order to retain its current position.

This is (likely) part of the reason why the range API does not specify the behaviour of a range once it's iterated over without using .save: it depends on implementation details.


> But then the question with `chunkBy` is, should it take the opportunity to start to behave like most other ranges in this regard?

I don't see why this should be a compelling reason to change chunkBy -- since copy behaviour is not specified by the range API (for IMO good reasons -- it's implementation-specific).  But I do agree that the implementation of chunkBy could be improved for other reasons, like not using the expensive ref-counting implementation when the underlying range is, say, a native array that you could just slice over.


T

-- 
MAS = Mana Ada Sistem?
March 19, 2020
On Thursday, 19 March 2020 at 16:01:39 UTC, H. S. Teoh wrote:
>
> The problem is, you *cannot* give any guarantees about copy behaviour, because it depends on the behaviour of the incoming range. For example, if you pass the output of .byChunk to another range algorithm, that second algorithm cannot guarantee copy behaviour anymore.
>
> In fact, all you have to do is to wrap a forward range in an InputRangeObject (let's say you need to alternate between two different range types (but with compatible elements) at runtime, then you'll need to do this), and now you have a forward range with by-reference semantics that requires the use of .save in order to retain its current position.

The documentation would say that the copy behaviour of the range is the same as the source range has. A guarantee to depend on the source range is still a guarantee, and definitely better than the present situation.
March 19, 2020
On Thu, Mar 19, 2020 at 05:15:51PM +0000, Dukc via Digitalmars-d wrote:
> On Thursday, 19 March 2020 at 16:01:39 UTC, H. S. Teoh wrote:
> > 
> > The problem is, you *cannot* give any guarantees about copy behaviour, because it depends on the behaviour of the incoming range. For example, if you pass the output of .byChunk to another range algorithm, that second algorithm cannot guarantee copy behaviour anymore.
> > 
> > In fact, all you have to do is to wrap a forward range in an InputRangeObject (let's say you need to alternate between two different range types (but with compatible elements) at runtime, then you'll need to do this), and now you have a forward range with by-reference semantics that requires the use of .save in order to retain its current position.
> 
> The documentation would say that the copy behaviour of the range is the same as the source range has. A guarantee to depend on the source range is still a guarantee, and definitely better than the present situation.

A guarantee about copy behaviour potentially binds the Phobos maintainers to the specifics of the current implementation of the algorithm.  I'm not sure that's what we want, since it may limit future improvements.

The thing about the range API is that it's like a contract between user code and Phobos; the way you use it should be according to the contract, anything not stated by the contract should not be relied upon even if currently it happens to work.  That's the principle of encapsulation. Behaviours that arise from the specifics of how something is currently implemented are implementation details that shouldn't leak into the calling code, including assumptions about copy behaviour.  Arguably, relying on a specific copy behaviour is an instance of leaky abstraction, since outside code shouldn't know nor depend upon Phobos internal implementation details.  Part of the power of encapsulation is to leave certain non-essential things unspecified, so that implementors can have greater freedom in how they implement the API.  Something that isn't directly related to the particular function (e.g., the primary function of byChunk) shouldn't be a part of the API, IMO, it should be left unspecified with the understanding that relying on the behaviour of something unspecified runs the risk of future breakage.  The specific behaviour ought to be inside the "black box" of encapsulation, not relied upon by user code, much less specified in library docs.


T

-- 
"Maybe" is a strange word.  When mom or dad says it it means "yes", but when my big brothers say it it means "no"! -- PJ jr.
March 20, 2020
On Wednesday, March 18, 2020 10:29:53 AM MDT H. S. Teoh via Digitalmars-d wrote:
> On Wed, Mar 18, 2020 at 02:57:41PM +0000, Dukc via Digitalmars-d wrote: [...]
>
> > This is how chunkBy[1] currently behaves when copying. It essentially behaves like a reference range: it will only save it's state when `save` is explicitly called, not otherwise, even if the chunked source range is a forward range with value semantics.
>
> The current range API design, which Andrei himself admitted was not the best design in retrospect, does not specify behaviour on copying a range.  IOW, it's the application-level equivalent of undefined behaviour.  Generally, this is not a problem because usually you're working with your own range types which you already know the semantics of.  But in generic code, assumptions of this sort are a no-no, precisely because of breakages of this sort when different ranges behave differently on copy.
>
> tl;dr: never copy a range directly, always use .save.  Never assume a range remains unchanged after iterating a copy; always assume the worst and use .save whenever you wish the original range to remain unchanged afterwards.

Ranges get copied all the time when passing them around. You're not going to avoid it (even using them with foreach copies them). The key thing is not that a range shouldn't be copied without using save but that if a range is ever copied, then the original shouldn't be used again, and from that point on, only the copy should be used.

So, if you pass a range to foreach, a function, or do anything else which would copy it, then don't use the original again, and if you want to use it again, then instead of copying it directly, call save to get an independent copy. Generic code should _never_ rely on the copying semantics of a range, and even in non-generic code, depending on the semantics of copying a range whose implementation you don't fully control is just begging for bugs.

- Jonathan M Davis



March 20, 2020
On Fri, Mar 20, 2020 at 03:59:57PM -0600, Jonathan M Davis via Digitalmars-d wrote:
> On Wednesday, March 18, 2020 10:29:53 AM MDT H. S. Teoh via Digitalmars-d wrote:
[...]
> > tl;dr: never copy a range directly, always use .save.  Never assume a range remains unchanged after iterating a copy; always assume the worst and use .save whenever you wish the original range to remain unchanged afterwards.
> 
> Ranges get copied all the time when passing them around. You're not going to avoid it (even using them with foreach copies them). The key thing is not that a range shouldn't be copied without using save but that if a range is ever copied, then the original shouldn't be used again, and from that point on, only the copy should be used.

Yes, that's right.  Actually, for by-value ranges the act of passing them as an argument to a range function in the first place already copies them.  The catch is really that once this happens, the caller or whoever retains the original copy should no longer assume that the range remains in the same place as before.  For some ranges this is true, but for other ranges this assumption is invalid, and will lead to incorrect results.


> So, if you pass a range to foreach, a function, or do anything else which would copy it, then don't use the original again, and if you want to use it again, then instead of copying it directly, call save to get an independent copy. Generic code should _never_ rely on the copying semantics of a range, and even in non-generic code, depending on the semantics of copying a range whose implementation you don't fully control is just begging for bugs.
[...]

+1.


T

-- 
Knowledge is that area of ignorance that we arrange and classify. -- Ambrose Bierce