foreach() behavior on ranges (page 3)

August 25, 2021

Re: foreach() behavior on ranges

Posted by Steven Schveighoffer
in reply to Joseph Rushton Wakeling

On 8/25/21 6:06 AM, Joseph Rushton Wakeling wrote:

> On Tuesday, 24 August 2021 at 09:15:23 UTC, bauss wrote:
> > A range should be a struct always and thus its state is copied when the foreach loop is created.
>
> That's quite a strong assumption, because its state might be a reference type, or it might not have state in a meaningful sense -- consider an input range that wraps reading from a socket, or that just reads from /dev/urandom, for two examples.
>
> Deterministic copying per foreach loop is only guaranteed for forward ranges.

structs still provide a mechanism (postblit/copy ctor) to properly save a forward range when copying, even if the guts need copying (unlike classes). In general, I think it was a mistake to use .save as the mechanism, as generally .save is equivalent to copying, so nobody does it, and code works fine for most ranges.

What should have happened is that input-only ranges should not have been copyable, and copying should have been the save mechanism. Then it becomes way way more obvious what is happening. Yes, this means forgoing classes as ranges.

-Steve
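
The hypothetical design above (input-only ranges are not copyable) can be sketched in today's D with `@disable this(this)`. This is only an illustration under stated assumptions: `LineSource`, `drain` and `drainDemo` are made-up names, the string array stands in for a socket or file handle, and Phobos does not require input ranges to be written this way.

```d
/// Hypothetical input-only range. Copying is disabled so that nobody can
/// accidentally treat a copy as a saved position.
struct LineSource
{
    private string[] pending; // stands in for a socket or file handle

    @disable this(this); // any attempt to copy is a compile error

    @property bool empty() { return pending.length == 0; }
    @property string front() { return pending[0]; }
    void popFront() { pending = pending[1 .. $]; }
}

// The copy is rejected at compile time.
static assert(!__traits(compiles, { LineSource a; LineSource b = a; }));

/// Consumes the range through a reference, using the primitives directly.
size_t drain(ref LineSource src)
{
    size_t n = 0;
    while (!src.empty)
    {
        ++n;
        src.popFront();
    }
    return n;
}

/// Builds a source with three items and drains it.
size_t drainDemo()
{
    auto src = LineSource(["a", "b", "c"]);
    return drain(src);
}
```

With copying disabled, the only way to hand the range around is by `ref` (or by moving it), which makes it obvious that every consumer advances the same underlying source.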

August 25, 2021

Re: foreach() behavior on ranges

Posted by Steven Schveighoffer
in reply to frame

On 8/25/21 4:31 AM, frame wrote:

> On Tuesday, 24 August 2021 at 21:15:02 UTC, Steven Schveighoffer wrote:
> > I'm surprised you bring PHP as an example, as it appears their foreach interface works EXACTLY as D does:
>
> Yeah, but the point is, there is a rewind() method. That is called every time on foreach().

It seems what you are after is forward ranges. Those are able to "rewind" when you are done with them. It's just not done through a rewind method, but via saving the range before iteration:

foreach(val; forwardRange.save)
{
   ...
   break;
}

// forwardRange hasn't been iterated here

-Steve
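
A self-contained version of the snippet above might look like the following sketch. `Counter` and `frontAfterPartialLoop` are hypothetical names introduced only for illustration; the range keeps all of its state by value, so `save` is just a copy.

```d
import std.range.primitives : isForwardRange;

/// Hypothetical forward range yielding current, current + 1, ..., limit - 1.
struct Counter
{
    int current;
    int limit;

    @property bool empty() const { return current >= limit; }
    @property int front() const { return current; }
    void popFront() { ++current; }
    @property Counter save() { return this; } // all state is held by value
}

static assert(isForwardRange!Counter);

/// Iterates a saved copy, stops early, and returns the original's front.
int frontAfterPartialLoop()
{
    auto forwardRange = Counter(0, 5);
    foreach (val; forwardRange.save)
    {
        if (val == 2)
            break;
    }
    // forwardRange hasn't been iterated here
    return forwardRange.front;
}
```

Calling `frontAfterPartialLoop()` returns 0: the loop advanced only the saved copy, so the original still starts at its first element.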

August 25, 2021

Re: foreach() behavior on ranges

Posted by Steven Schveighoffer
in reply to Alexandru Ermicioi

On 8/25/21 6:06 AM, Alexandru Ermicioi wrote:

> On Wednesday, 25 August 2021 at 08:15:18 UTC, frame wrote:
> > I know, but foreach() doesn't call save().
>
> Hmm, this is a regression probably, or I missed the time frame when foreach moved to use of copy constructor for forward ranges.
>
> Do we have a well defined description of what input, forward and any other well known range is, and how it does interact with language features?
>
> For some reason I didn't manage to find anything on dlang.org.

It never has called save. It makes a copy, which is almost always the equivalent save implementation.

-Steve
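
One way to observe this is to count copies and `save` calls on a small struct range. The sketch below assumes a hypothetical `Probe` type with static counters and a helper `observeForeach`; none of these names come from Phobos.

```d
/// Hypothetical range that counts how often it is copied or saved.
struct Probe
{
    static int copies; // thread-local counters, for illustration only
    static int saves;

    int current;
    int limit;

    this(this) { ++copies; } // postblit: runs whenever the struct is copied

    @property bool empty() const { return current >= limit; }
    @property int front() const { return current; }
    void popFront() { ++current; }
    @property Probe save() { ++saves; return this; }
}

/// Runs one foreach over an lvalue range.
/// Returns [copies made, save calls, front of the original afterwards].
int[3] observeForeach()
{
    Probe.copies = 0;
    Probe.saves = 0;

    auto r = Probe(0, 3);
    foreach (x; r)
    {
        // the loop body does not matter here
    }
    return [Probe.copies, Probe.saves, r.front];
}
```

The save counter stays at zero while the copy counter does not, and the original still reports 0 as its front, because foreach iterated a copy of the struct.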

August 25, 2021

Re: foreach() behavior on ranges

Posted by Alexandru Ermicioi
in reply to Steven Schveighoffer

On Wednesday, 25 August 2021 at 11:04:35 UTC, Steven Schveighoffer wrote:

> It never has called save. It makes a copy, which is almost always the equivalent save implementation.
>
> -Steve

Really?

Then what is the use of the .save method?
The only reason I can find is that you can't declare constructors in interfaces, hence the use of the .save method instead of a copy constructor for defining forward ranges.

We now have two ways of doing the same thing, which can cause confusion. It would then be best for ranges to hide the copy constructor behind a private modifier (or disable it altogether), and to force other range wrappers to always call .save, including foreach, since by not doing so we introduce a difference in behavior between ref and value forward ranges (for foreach use).

August 25, 2021

Re: foreach() behavior on ranges

Posted by Steven Schveighoffer
in reply to Alexandru Ermicioi

On 8/25/21 7:26 AM, Alexandru Ermicioi wrote:

> On Wednesday, 25 August 2021 at 11:04:35 UTC, Steven Schveighoffer wrote:
> > It never has called save. It makes a copy, which is almost always the equivalent save implementation.
>
> Really?

The save function was used to provide a way for code like isForwardRange to have a definitive symbol to search for. It's also opt-in, whereas if we used copying, it would be opt-out.

Why a function, and not just some enum? Because it should be something that has to be used, not just a "documenting" attribute if I recall correctly.

Keep in mind, UDAs were not a thing yet, and compile-time introspection was not as robust as it is now. I'm not even sure you could disable copying.

There would be a huge hole in this plan -- arrays. Arrays are the most common range anywhere, and if a forward range must not be copyable any way but using save, it would mean arrays are not forward ranges.

Not to mention that foreach on an array is a language construct, and does not involve the range interface.

-Steve
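
For reference, a short sketch of how slices behave as forward ranges. The range primitives for arrays are free functions in `std.range.primitives`; `sliceCopies` and `sumWithForeach` are hypothetical helper names.

```d
import std.range.primitives : empty, front, isForwardRange, popFront, save;

// Slices satisfy the forward range interface through free functions in Phobos.
static assert(isForwardRange!(int[]));

/// Returns the remaining lengths of [original, plain copy, saved copy]
/// after advancing only the two copies.
size_t[3] sliceCopies()
{
    int[] original = [1, 2, 3, 4];
    int[] copied = original;     // copies pointer and length, not the elements
    int[] saved = original.save; // for slices, save returns the same slice value

    copied.popFront();
    saved.popFront();
    saved.popFront();

    return [original.length, copied.length, saved.length];
}

/// foreach over a slice is handled by the language itself and
/// does not shrink the slice variable it iterates.
int sumWithForeach(int[] values)
{
    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}
```

`sliceCopies()` yields lengths 4, 3 and 2: a plain copy and a `save` of a slice are equally independent of the original, which is why a rule of "no copying except through save" could not include arrays.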

August 25, 2021

Re: foreach() behavior on ranges

Posted by Joseph Rushton Wakeling
in reply to Steven Schveighoffer

On Wednesday, 25 August 2021 at 10:59:44 UTC, Steven Schveighoffer wrote:

Consider a struct whose internal fields are just a pointer to its "true" internal state. Does one have any right to assume that the postblit/copy ctor would necessarily deep-copy that?

If that struct implements a forward range, though, and that pointed-to state is mutated by iteration of the range, then it would be reasonable to assume that the save method MUST deep-copy it, because otherwise the forward-range property would not be respected.

With that in mind, I am not sure it's reasonable to assume that just because a struct implements a forward-range API, that copying the struct instance is necessarily the same as saving the range.

Indeed, IIRC quite a few Phobos library functions program defensively against that difference by taking a .save copy of their input before iterating over it.

I think there's a benefit of a method whose definition is explicitly "If you call this, you will get a copy of the range which will replay exactly the same results when iterating over it". Just because the meaning of "copy" can be ambiguous, whereas a promise about how iteration can be used is not.
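
The case described above can be made concrete with a small sketch. `SharedCursor` is a hypothetical struct whose only field is a pointer to its iteration state; copying it shares that state, while `save` deep-copies it. The helper function names are likewise made up for illustration.

```d
import std.range.primitives : isForwardRange;

/// Hypothetical range whose only field points at its real iteration state.
struct SharedCursor
{
    private static struct State { int current; int limit; }
    private State* state;

    this(int start, int limit) { state = new State(start, limit); }

    @property bool empty() const { return state.current >= state.limit; }
    @property int front() const { return state.current; }
    void popFront() { ++state.current; }

    /// Deep-copies the pointed-to state so the result iterates independently.
    @property SharedCursor save()
    {
        return SharedCursor(state.current, state.limit);
    }
}

static assert(isForwardRange!SharedCursor);

/// foreach copies the struct, but the copy shares the pointed-to state.
bool exhaustedAfterPlainForeach()
{
    auto r = SharedCursor(0, 3);
    foreach (x; r) {}
    return r.empty;
}

/// foreach over save iterates an independent deep copy.
bool exhaustedAfterSavedForeach()
{
    auto r = SharedCursor(0, 3);
    foreach (x; r.save) {}
    return r.empty;
}
```

Here the plain foreach leaves the original exhausted, while the foreach over `.save` leaves it untouched, so for this type a struct copy and a save are observably different.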

August 25, 2021

Re: foreach() behavior on ranges

Posted by Steven Schveighoffer
in reply to Joseph Rushton Wakeling

On 8/25/21 12:46 PM, Joseph Rushton Wakeling wrote:

> On Wednesday, 25 August 2021 at 10:59:44 UTC, Steven Schveighoffer wrote:
>
> Consider a struct whose internal fields are just a pointer to its "true" internal state. Does one have any right to assume that the postblit/copy ctor would necessarily deep-copy that?

In a world where copyability means it's a forward range? Yes. We aren't in that world, it's a hypothetical "if we could go back and redesign".

> With that in mind, I am not sure it's reasonable to assume that just because a struct implements a forward-range API, that copying the struct instance is necessarily the same as saving the range.

Technically this is true. In practice, it rarely happens. The flaw of save isn't that it's an unsound API, the flaw is that people get away with just copying, and it works 99.9% of the time. So code is simply untested with ranges where save is important.

> Indeed, IIRC quite a few Phobos library functions program defensively against that difference by taking a .save copy of their input before iterating over it.

I'd be willing to bet $10 there is a function in phobos right now, that takes forward ranges, and forgets to call save when iterating with foreach. It's just so easy to do, and works with most ranges in existence.

The idea is to make the meaning of a range copy not ambiguous.

-Steve
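
The kind of bug described above is easy to reproduce with a class-based range, where copying only copies the reference. The sketch below uses hypothetical names (`ClassCounter`, `countCareless`, `countCareful`) and is not taken from Phobos.

```d
import std.range.primitives : isForwardRange;

/// Hypothetical class-based forward range; copying a class reference does not save.
final class ClassCounter
{
    private int current;
    private int limit;

    this(int current, int limit)
    {
        this.current = current;
        this.limit = limit;
    }

    @property bool empty() const { return current >= limit; }
    @property int front() const { return current; }
    void popFront() { ++current; }
    @property ClassCounter save() { return new ClassCounter(current, limit); }
}

static assert(isForwardRange!ClassCounter);

/// Forgets save: the foreach consumes the caller's range when R has reference semantics.
size_t countCareless(R)(R r) if (isForwardRange!R)
{
    size_t n = 0;
    foreach (e; r)
        ++n;
    return n;
}

/// Calls save first, so the caller's range is left where it was.
size_t countCareful(R)(R r) if (isForwardRange!R)
{
    size_t n = 0;
    foreach (e; r.save)
        ++n;
    return n;
}

/// True if the caller's range still has elements after countCareless.
bool survivesCareless()
{
    auto r = new ClassCounter(0, 4);
    countCareless(r);
    return !r.empty;
}

/// True if the caller's range still has elements after countCareful.
bool survivesCareful()
{
    auto r = new ClassCounter(0, 4);
    countCareful(r);
    return !r.empty;
}
```

Both functions return the same count and behave identically for struct ranges that hold their state by value, which is why the careless version tends to pass tests; only a reference-semantics range exposes the difference.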

August 25, 2021

Re: foreach() behavior on ranges

Posted by Joseph Rushton Wakeling
in reply to Steven Schveighoffer

On Wednesday, 25 August 2021 at 17:01:54 UTC, Steven Schveighoffer wrote:

> In a world where copyability means it's a forward range? Yes. We aren't in that world, it's a hypothetical "if we could go back and redesign".

OK, that makes sense.

This is very true, and makes it quite reasonable to try to pursue "the obvious/lazy thing == the thing you're supposed to do" w.r.t. how ranges are defined.

I'm sure you'd win that bet!

> The idea is to make the meaning of a range copy not ambiguous.

Yes, this feels reasonable. And then one can reserve the idea of a magic deep-copy method for special cases like pseudo-RNGs where one wants them to be copyable on user request, but without code assuming it can copy them.

August 25, 2021

Re: foreach() behavior on ranges

Posted by H. S. Teoh
in reply to Joseph Rushton Wakeling

On Wed, Aug 25, 2021 at 04:46:54PM +0000, Joseph Rushton Wakeling via Digitalmars-d-learn wrote:
> On Wednesday, 25 August 2021 at 10:59:44 UTC, Steven Schveighoffer wrote:
> > structs still provide a mechanism (postblit/copy ctor) to properly save a forward range when copying, even if the guts need copying (unlike classes). In general, I think it was a mistake to use `.save` as the mechanism, as generally `.save` is equivalent to copying, so nobody does it, and code works fine for most ranges.
> 
> Consider a struct whose internal fields are just a pointer to its "true" internal state.  Does one have any right to assume that the postblit/copy ctor would necessarily deep-copy that?
[...]
> If that struct implements a forward range, though, and that pointed-to state is mutated by iteration of the range, then it would be reasonable to assume that the `save` method MUST deep-copy it, because otherwise the forward-range property would not be respected.
[...]

What I understand from what Andrei has said in the past, is that a range is merely a "view" into some underlying storage; it is not responsible for the contents of that storage.  My interpretation of this is that .save will only save the *position* of the range, but it will not save the contents it points to, so it will not (should not) deep-copy.

However, if the range is implemented by a struct that contains a reference to its iteration state, then yes, to satisfy the definition of .save it should deep-copy this state.

> With that in mind, I am not sure it's reasonable to assume that just because a struct implements a forward-range API, that copying the struct instance is necessarily the same as saving the range.
[...]

Andrei has mentioned before that in retrospect, .save was a design mistake.  The difference between an input range and a forward range should have been keyed on whether the range type has reference semantics (input range) or by-value semantics (forward range).  But for various reasons, including the state of the language at the time the range API was designed, the .save route was chosen, and we're stuck with it unless Phobos 2.0 comes into existence.

Either way, though, the semantics of a forward range pretty much dictates that whatever type a range has, if it claims to be a forward range then .save must preserve whatever iteration state it has at that point in time. If this requires deep-copying some state referenced from a struct, then that's what it takes to satisfy the API.  This may take the form of a .save method that copies state, or a copy ctor that does the same, or simply storing iteration state as PODs in the range struct so that copying the struct equates to preserving the iteration state.

T

-- 
Why waste time reinventing the wheel, when you could be reinventing the engine? -- Damian Conway
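
The "view" interpretation above, where `.save` preserves the position but not the contents, can be sketched as follows. `ArrayView` and `positionVsContents` are hypothetical names; the iteration state is a plain index stored in the struct, so copying the struct is enough to save it.

```d
import std.range.primitives : isForwardRange;

/// Hypothetical view over an int array: the range holds a position, not the data.
struct ArrayView
{
    private int[] storage; // shared with whoever owns the array
    private size_t index;  // plain-data iteration state

    @property bool empty() const { return index >= storage.length; }
    @property int front() const { return storage[index]; }
    void popFront() { ++index; }
    @property ArrayView save() { return this; } // copies the position only
}

static assert(isForwardRange!ArrayView);

/// Saves, advances the original, then mutates the storage.
/// Returns [front of the advanced view, front of the saved view].
int[2] positionVsContents()
{
    int[] data = [10, 20, 30];
    auto view = ArrayView(data, 0);
    auto saved = view.save;

    view.popFront(); // only view moves; saved keeps its position
    data[0] = 99;    // contents are not part of the saved state

    return [view.front, saved.front];
}
```

`positionVsContents()` returns 20 and 99: the saved range kept its position at the first element, but it sees the later write to the storage because `save` did not copy the underlying data.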

August 26, 2021

Re: foreach() behavior on ranges

Posted by Joseph Rushton Wakeling
in reply to H. S. Teoh

On Wednesday, 25 August 2021 at 19:51:36 UTC, H. S. Teoh wrote:
> What I understand from what Andrei has said in the past, is that a range is merely a "view" into some underlying storage; it is not responsible for the contents of that storage.  My interpretation of this is that .save will only save the *position* of the range, but it will not save the contents it points to, so it will not (should not) deep-copy.

That definition is potentially misleading if we take into account that a range is not necessarily iterating over some underlying storage: ranges can also be defined by algorithmic processes.  (Think e.g. iota, or pseudo-RNGs, or a range that iterates over the Fibonacci numbers.)

> However, if the range is implemented by a struct that contains a reference to its iteration state, then yes, to satisfy the definition of .save it should deep-copy this state.

Right.  And in the case of algorithmic ranges (rather than container-derived ranges), the state is always and only the iteration state.  And then as well as that there are ranges that are iterating over external IO, which in most cases can't be treated as forward ranges but in a few cases might be (e.g. saving the cursor position when iterating over a file's contents).

Arguably I think a lot of problems in the range design derive from not thinking through those distinctions in detail (external-IO-based vs. algorithmic vs. container-based), even though superficially those seem to map well to the input vs forward vs bidirectional vs random-access range distinctions.

That's also not taking into account edge cases, e.g. stuff like RandomShuffle or RandomSample: here one can in theory copy the "head" of the range but one arguably wants to avoid correlations in the output of the different copies (which can arise from at least 2 different sources: copying under-the-hood pseudo-random state of the sampling/shuffling algorithm itself, or copying the underlying pseudo-random number generator).  Except perhaps in the case where one wants to take advantage of the pseudo-random feature to reproduce those sequences ... but then one wants that to be a conscious programmer decision, not happening by accident under the hood of some library function.

(Rabbit hole, here we come.)

> Andrei has mentioned before that in retrospect, .save was a design mistake.  The difference between an input range and a forward range should have been keyed on whether the range type has reference semantics (input range) or by-value semantics (forward range).  But for various reasons, including the state of the language at the time the range API was designed, the .save route was chosen, and we're stuck with it unless Phobos 2.0 comes into existence.
>
> Either way, though, the semantics of a forward range pretty much dictates that whatever type a range has, if it claims to be a forward range then .save must preserve whatever iteration state it has at that point in time. If this requires deep-copying some state referenced from a struct, then that's what it takes to satisfy the API.  This may take the form of a .save method that copies state, or a copy ctor that does the same, or simply storing iteration state as PODs in the range struct so that copying the struct equates to preserving the iteration state.

Yes.  FWIW I agree that when _implementing_ a forward range one should probably make sure that copying by value and the `save` method produce the same results.

But as a _user_ of code implemented using the current range API, it might be a bad idea to assume that a 3rd party forward range implementation will necessarily guarantee that.
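
A sketch of both sides of that advice, under stated assumptions: `Fibonacci` is a hypothetical algorithmic range whose state is only iteration state (so copy and `save` agree), and `peek` is a hypothetical user-side helper that takes an explicit `.save` rather than trusting a copy.

```d
import std.range.primitives : ElementType, isForwardRange;

/// Hypothetical infinite Fibonacci range: its state is purely iteration state.
struct Fibonacci
{
    private ulong a = 0;
    private ulong b = 1;

    enum bool empty = false; // infinite range
    @property ulong front() const { return a; }
    void popFront()
    {
        immutable next = a + b;
        a = b;
        b = next;
    }
    @property Fibonacci save() { return this; } // same result as a plain copy
}

static assert(isForwardRange!Fibonacci);

/// User-side helper that does not assume copying equals saving:
/// it looks ahead on an explicit save and never advances the caller's range.
ElementType!R[] peek(R)(ref R r, size_t n) if (isForwardRange!R)
{
    ElementType!R[] result;
    auto lookahead = r.save;
    for (size_t i = 0; i < n && !lookahead.empty; ++i)
    {
        result ~= lookahead.front;
        lookahead.popFront();
    }
    return result;
}
```

With a fresh `Fibonacci`, `peek(fib, 5)` collects 0, 1, 1, 2, 3 and `fib.front` is still 0 afterwards. The same helper stays correct for a third-party forward range whose struct copy shares state, because it relies only on the documented `save` contract.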
