User defined type and foreach (page 3)

Settings

Help

Index » Learn » User defined type and foreach (page 3)

November 18, 2017

Re: User defined type and foreach

Posted by H. S. Teoh

Permalink

H. S. Teoh

Permalink

On Fri, Nov 17, 2017 at 02:50:59AM -0700, Jonathan M Davis via Digitalmars-d-learn wrote: [...]
> Personally, I'm inclined to think that we should never have had save and should have required that reference type ranges which are forward ranges be wrapped in a struct where copying it does the same thing that save does now, but I seriously doubt that we could make a change that big now.

Andrei also expressed, in retrospect, the same sentiment about input vs. forward ranges. But as you said, that ship has already sailed and we just have to make do with what we have now.


T

-- 
If you want to solve a problem, you need to address its root cause, not just its symptoms. Otherwise it's like treating cancer with Tylenol...

January 19

Re: User defined type and foreach

Posted by Jim Balter
in reply to Jonathan M Davis

Permalink

Jim Balter

Posted in reply to Jonathan M Davis

Permalink

On Friday, 17 November 2017 at 17:55:30 UTC, Jonathan M Davis wrote:

> When you have
>
> foreach(e; range)
>
> it gets lowered to something like
>
> for(auto r = range; !r.empty; r.popFront())
> {
>     auto e = r.front;
> }
>
> So, the range is copied when you use it in a foreach.

Indeed, and the language spec says so, but this is quite wrong as it violates the specification and design of ranges ... only forward ranges are copyable and only via their `save` function. I have an input range that can only be iterated once; if you try to do so again it's empty ... but the foreach implementation breaks that. You should be able to break out of the foreach statement, then run it again (or another foreach) and it should continue from where it left off, but copying breaks that. I need to know how many elements of my range were consumed; copying breaks that. I got around this by having a pointer to my state so only the pointer gets copied. I would also note that tutorials such as Ali Çehreli's "Programming in D – Tutorial and Reference" are unaware of this breakage:

"
Those three member functions must be named as empty, popFront, and front, respectively. The code that is generated by the compiler calls those functions:

    for ( ; !myObject.empty(); myObject.popFront()) {

        auto element = myObject.front();

        // ... expressions ...
    }
"

January 19

Re: User defined type and foreach

Posted by Jonathan M Davis
in reply to Jim Balter

Permalink

Jonathan M Davis

Posted in reply to Jim Balter

Permalink

On Friday, January 19, 2024 3:49:29 AM MST Jim Balter via Digitalmars-d-learn wrote:
> On Friday, 17 November 2017 at 17:55:30 UTC, Jonathan M Davis
>
> wrote:
> > When you have
> >
> > foreach(e; range)
> >
> > it gets lowered to something like
> >
> > for(auto r = range; !r.empty; r.popFront())
> > {
> >
> >     auto e = r.front;
> >
> > }
> >
> > So, the range is copied when you use it in a foreach.
>
> Indeed, and the language spec says so, but this is quite wrong as
> it violates the specification and design of ranges ... only
> forward ranges are copyable and only via their `save` function. I
> have an input range that can only be iterated once; if you try to
> do so again it's empty ... but the foreach implementation breaks
> that. You should be able to break out of the foreach statement,
> then run it again (or another foreach) and it should continue
> from where it left off, but copying breaks that. I need to know
> how many elements of my range were consumed; copying breaks that.
> I got around this by having a pointer to my state so only the
> pointer gets copied. I would also note that tutorials such as Ali
> Çehreli's "Programming in D – Tutorial and Reference" are unaware
> of this breakage:
>
> "
> Those three member functions must be named as empty, popFront,
> and front, respectively. The code that is generated by the
> compiler calls those functions:
>
>      for ( ; !myObject.empty(); myObject.popFront()) {
>
>          auto element = myObject.front();
>
>          // ... expressions ...
>      }
> "

No spec is being violated, and the behavior is completely expected. The core problem is that the range API doesn't actually specify the semantics of copying a range - and actually can't do so without making breaking changes.

D types in general fall into one of three categories with regards to their copy semantics:

1. value types
2. reference types
3. pseudo-reference types

When you copy a value type, you get two fully independent copies of the object (e.g. integers are a prime example of this; mutating a copy of an integer has no effect whatsoever on the original).

When you copy a reference type, you get two fully dependent copies. The type is either a pointer or a reference (or a struct that holds just a pointer or a reference), and copying it results in another pointer or reference to the same object. So, mutating the object affects all pointers or references to that object.

When you copy a pseudo-reference type, you get a partial copy. Typically, you're dealing with a struct which has both members which are value types and members which are reference types. The result is that some operations will affect only the object being mutated, whereas other operations will affect other copies of that object. Dynamic arrays are one example of this. They container a pointer and a size_t which is the length of the array, and reducing the size of the array by mutating the pointer and the length has no effect on other dynamic arrays which point to the same data, but mutating the actual elements affects all dynamic arrays which point to the same data.

What this means for ranges is that depending on how they're implemented, you get one of four different behaviors when they're copied:

1. If the range is a value type, then copying the range results in two independent copies, so mutating the copy has no effect on the original. Code can iterate through both ranges independently.

2. If the range is a reference type, then copying the range results in two dependent copies, so mutating the copy mutates the original. Any code that iterates one of the two ranges then affects any code which would try to iterate the original, but the state is consistent across both ranges, because rather than containing their data, the data is elsewhere, and they both point to the same place.

3. If a range is a pseudo-reference type, and its iteration state is copied by value, then copying the range gives you the same behavior as a value type as far as iteration goes. Both the copy and the original can be iterated independently (though depending on the implementation, mutating the elements themselves could affect both ranges). Dynamic arrays fall in this category.

4. If a range is a pseudo-reference type, and its iteration state is not fully copied by value, then you end up with the copy and the original being partially dependent. This means that if you mutate one of them, it will only partially mutate the other, which tends to mean that the other ends up in an invalid state. A common situation where this can occur is if the range stores its front as a member variable, but the rest of its state is stored in another member variable which is a reference type. If you then call popFront on the copy, you'd end up with the copy's front changing, but the original's front wouldn't, and yet, the state they share for the rest of the range would be mutated, so if you then called popFront on the original, you wouldn't get the element that the copy got by calling popFront; you'd get the element after it - meaning that calling popFront on one range would then cause the other to skip an element. So, with pseudo-reference types that do not copy their iteration state by value, you basically can never use the original once you make a copy, because the original will be put into an invalid / inconsistent state by mutating the copy.

The result of all of this is that you cannot rely on the semantics of copying a range. When you make a copy, you can rely on that copy having the same elements in the same order that the original would have had if you hadn't made a copy (since otherwise, you couldn't pass ranges around), but you can't rely on what happens to the original when you mutate the copy. This means that in generic code, once you copy a range, you cannot safely do anything with the original (since you'll get completely different behavior from different range types).

In order for forward ranges to be able to get independent copies regardless of their copy sementics, we have the save function. This means that you can rely on save giving you an independent copy of the forward range, but you still can't rely on the semantics of copying the range, and there is no way to get independent copies of basic input ranges (and you can't rely on copies of basic input ranges being fully dependent either, since they could be pseudo-reference types).

This means that in generic code, once you pass a range to foreach, you cannot use the original any longer, because you've copied it, and the semantics of how the original are affected by iterating through the copy depend on how that particular range type is implemented. Non-generic code can get away with it by relying on the copy semantics of a particular range type, but generic code can't.

What this means in practice is that if you want foreach to use an independent copy of the range, you need to use save, and if you want to only partially iterate a range with a loop and then use the original afterwards, you cannot use foreach, because it copies the range, which then potentially puts the original range in an invalid state. And because you can't use save on a basic input range, once you pass a basic input range to foreach, you cannot safely use the original any longer. You either fully iterate through the range or you stop partway through and then never do anything with the rest of the range (since there is no safe way to do so in generic code), but there is no way to continue to iterate through the range once you pass it to foreach and then exit the foreach without relying on the copy semantics of a specific range type. So, for generic code, you'll need to use something other than foreach and avoid copying the range.

To fix this, we would need to make it so that the range API defined the copy semantics of ranges, but that doesn't work as long as forward ranges are treated as a more advanced version of basic input ranges, because basic input ranges and forward ranges inherently have different copy semantics.

Basic input ranges cannot be value types (otherwise, they could implement save), which means that in order to have consistent copy semantics for them, we would have to require that they be reference types (e.g. by requiring that they be classes or pointers to structs). On the other hand, we can't make forward ranges require reference semantics, because that won't work with dynamic arrays (and really what we want for forward ranges is to have the same copying semantics as dynamic arrays - i.e. their iteration state is copied by value). We could make forward ranges have consistent copy semantics by getting rid of save and requiring that copying a range results in a range that can be independently iterated (which would mean requiring that they be dynamic arrays or structs, forcing any reference type ranges to be wrapped in structs which do the equivalent of calling save internally when necessary to get an independent copy). And that way, we would have consistent copy semantics for basic input ranges and consistent copy semantics for forward ranges - but they wouldn't be the same copy semantics, so for generic code to be able to rely on the copy semantics, it would have to only support basic input ranges or only support forward ranges and not basic input ranges.

This means that in order to fix the problem, we need to change the range API, which would break code. Hopefully, we'll be able to do that with the next version of Phobos which is currently in the planning stages, but that will then only work with new code or code that is updated to use the new API. Code written to work with the current range API will always have the problem.

- Jonathan M Davis

January 25

Re: User defined type and foreach

Posted by Jim Balter
in reply to Jonathan M Davis

Permalink

Jim Balter

Posted in reply to Jonathan M Davis

Permalink

On Friday, 19 January 2024 at 18:13:55 UTC, Jonathan M Davis wrote:

On Friday, January 19, 2024 3:49:29 AM MST Jim Balter via Digitalmars-d-learn wrote:

On Friday, 17 November 2017 at 17:55:30 UTC, Jonathan M Davis

wrote:

When you have

foreach(e; range)

it gets lowered to something like

for(auto r = range; !r.empty; r.popFront())
{

auto e = r.front;

}

So, the range is copied when you use it in a foreach.

Indeed, and the language spec says so, but this is quite wrong as
it violates the specification and design of ranges ... only
forward ranges are copyable and only via their save function. I
have an input range that can only be iterated once; if you try to
do so again it's empty ... but the foreach implementation breaks
that. You should be able to break out of the foreach statement,
then run it again (or another foreach) and it should continue
from where it left off, but copying breaks that. I need to know
how many elements of my range were consumed; copying breaks that.
I got around this by having a pointer to my state so only the
pointer gets copied. I would also note that tutorials such as Ali
Çehreli's "Programming in D – Tutorial and Reference" are unaware
of this breakage:

"
Those three member functions must be named as empty, popFront,
and front, respectively. The code that is generated by the
compiler calls those functions:

 for ( ; !myObject.empty(); myObject.popFront()) {

     auto element = myObject.front();

     // ... expressions ...
 }

"No spec is being violated, and the behavior is completely expected."

This is not true, and your over-long response that is a lecture I didn't ask for and don't need misses my point and is largely irrelevant. The specification of ranges, which is independent of the D language, says that the way to copy a range is to use save(). Input ranges cannot be copied or restarted; that's the whole point of the difference between input ranges and forward ranges. You go on and on about the semantics of copying, which has nothing to do with what I wrote, which is that foreach copies when it shouldn't; the semantics of copying is not relevant to not copying. Input ranges are one-shots; they yield values once and that's it. foreach should have been implemented to call save() when that is available (when it is a forward range), and to do no copying when it is not available. Had that been done, then people wouldn't have written a bunch of ranges with missing save() functions that are reusable when they shouldn't be due to foreach making a copy, making foreach impossible to fix now.

"Basic input ranges cannot be value types (otherwise, they could implement save)"

This is not true, and again misses the point. My input range, the one I described in my original message, is a value type (a struct) which, when passed to functions, is passed by reference. It doesn't implement save because it's an input range, not a forward range. It's not supposed to be copied or restartable because it's an input range, not a forward range. Copying it has the wrong consequences because calling popFront on a copy doesn't advance the one-shot input range. It would work just fine if foreach didn't copy it. It works fine if I use it without foreach, e.g., if I use it in a for loop like in Ali's book:

> >

for ( ; !myObject.empty(); myObject.popFront()) {

     auto element = myObject.front();

     // ... expressions ...

}

That works just fine with an input range, which is a range that is not restartable.
The only thing that doesn't work with my input range is foreach, which copies the input range--which, by the specification of input ranges, was never intended.

I managed to make it usable with a foreach loop by having the struct only store a pointer to its state so the foreach copy is harmless, in effect turning my value type range into a reference type and introducing unnecessary overhead.

January 25

Re: User defined type and foreach

Posted by Jonathan M Davis
in reply to Jim Balter

Permalink

Jonathan M Davis

Posted in reply to Jim Balter

Permalink

On Thursday, January 25, 2024 1:57:56 AM MST Jim Balter via Digitalmars-d- learn wrote:
> The specification of ranges, which is independent of
> the D language, says that the way to copy a range is to use
> save().

I'm sorry, but you're misunderstanding the range specification if you think that save is the only way to copy a range or that the range specification makes any such guarantee. And if there is any official documentation that makes any such claim, then it needs to be fixed, because ranges have never worked that way, and they cannot work that way unless ranges in general are non-copyable, which is not the case at all.

What save does is give you a guaranteed way to get a copy which can be independently iterated. However, it's perfectly valid to simply copy a range and then use the copy. They're not non-copyable types, and range-based code in general copies ranges all over the place - both with basic input ranges and with forward ranges. They're copied when they're passed to functions. They're copied when they're wrapped by other ranges. They're copied when they're passed to foreach. This is all falls within the expected use of ranges. What you don't get from those copies is the guarantee that the copy is independent, which is why save is necessary in cases where you need to be sure that you're getting an independent copy of a range.

I gave that long explanation to try to get across what was happening with the copy semantics of ranges and what the consequences of that are. If that wasn't helpful to you, then I'm sorry. Either way, if you think that it goes against the range API to copy a range any way other than by calling save, then you are misunderstanding how ranges work, and looking at how Phobos uses ranges should make that clear, because it copies them all over the place, including with code that works on basic input ranges.

The range API guarantees that copying a range will result in the copy having the same elements in the same order as the original would have had had it not been copied, because without that guarantee, range-based code in general simply wouldn't work. Range-based code in general relies on that ability to copy a range when passing it around. Now, what happens to the original after the copy has been made is not specified by the range API, and that's why save is necessary. But you can still copy a range and use that copy in generic code so long as you don't touch the original again.

So, I've tried to explain to you why the current behavior is expected and does not violate the range specification. If that's not enough for you, then I'm sorry. Maybe someone else can help you out, but if what you've said about save were correct, then Phobos as a whole would be violating the range specification - as would the language itself with foreach. But they've been working this way for well over a decade.

We would like to improve the situation with the next major version of Phobos so that the copy semantics of ranges are cleaner, and the range API will likely see a redesign to fix that issue, among others, but the current range API is as I've explained it to you, warts and all.

- Jonathan M Davis

Top | Forum index | About this forum

Forums