Range Redesign: Copy Semantics (page 3) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Range Redesign: Copy Semantics (page 3)

January 22

Re: Range Redesign: Copy Semantics

Posted by Richard (Rikki) Andrew Cattermole
in reply to Jonathan M Davis

Richard (Rikki) Andrew Cattermole

Posted in reply to Jonathan M Davis

On 22/01/2024 3:08 PM, Jonathan M Davis wrote:
> On Sunday, January 21, 2024 10:39:41 AM MST Richard (Rikki) Andrew Cattermole
> via Digitalmars-d wrote:
>> On 21/01/2024 11:58 PM, Jonathan M Davis wrote:
>>> Either way, if we're doing a new major version of Phobos, we'll be
>>> reworking a variety of stuff anyway, and so Nullable would presumably be
>>> on the list (e.g. I think that it was a big mistake to make it a range,
>>> and I'd remove that functionality; slicing it to get a range is one
>>> thing, but the fact that it itself is a range definitely causes problems
>>> in generic code).
>> Umm, why would we bring back Nullable?
>>
>> By then we'd have sum types which can represent it in the language far
>> better.
> 
> Much as I know that you like the idea of sum types, AFAIK, it's never been
> officially agreed that we're going to have them in the language, and even if
> we do get them, I don't know what that's actually going to look like.

Given that Walter has put together a DIP for them, I'm fairly confident it is a question of when not if.

https://github.com/WalterBright/DIPs/blob/sumtypes/DIPs/1NNN-(wgb).md

January 22

Re: Range Redesign: Copy Semantics

Posted by Paul Backus
in reply to Jonathan M Davis

Paul Backus

Posted in reply to Jonathan M Davis

On Monday, 22 January 2024 at 02:10:32 UTC, Jonathan M Davis wrote:
> On Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via Digitalmars-d wrote:
>> With the proposed design, it would be possible to implement next() for forward ranges as a UFCS function:
[...]
>
> True, but then you have a function with the same copy semantics problem that we have now, since the forward range wouldn't have reference semantics. It would need to be fully wrapped in another type which gave it reference semantics to do that.

You caught me--I forgot to mark the parameter as 'ref'. Dangers of writing code directly into a message without testing it first, I suppose. :)

January 21

Re: Range Redesign: Copy Semantics

Posted by Jonathan M Davis
in reply to Richard (Rikki) Andrew Cattermole

Jonathan M Davis

Posted in reply to Richard (Rikki) Andrew Cattermole

On Sunday, January 21, 2024 7:17:35 PM MST Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:
> On 22/01/2024 3:08 PM, Jonathan M Davis wrote:
> > On Sunday, January 21, 2024 10:39:41 AM MST Richard (Rikki) Andrew
> > Cattermole>
> > via Digitalmars-d wrote:
> >> On 21/01/2024 11:58 PM, Jonathan M Davis wrote:
> >>> Either way, if we're doing a new major version of Phobos, we'll be reworking a variety of stuff anyway, and so Nullable would presumably be on the list (e.g. I think that it was a big mistake to make it a range, and I'd remove that functionality; slicing it to get a range is one thing, but the fact that it itself is a range definitely causes problems in generic code).
> >>
> >> Umm, why would we bring back Nullable?
> >>
> >> By then we'd have sum types which can represent it in the language far better.
> >
> > Much as I know that you like the idea of sum types, AFAIK, it's never been officially agreed that we're going to have them in the language, and even if we do get them, I don't know what that's actually going to look like.
> Given that Walter has put together a DIP for them, I'm fairly confident it is a question of when not if.
>
> https://github.com/WalterBright/DIPs/blob/sumtypes/DIPs/1NNN-(wgb).md

Well, we'll see what we actually end up with, but I'd consider that proposal to be terrible if all you want is a nullable type, and restrictions such as "Members of sumtypes cannot have copy constructors, postblits, or destructors" would be far too much for plenty of use cases such that I would still be very much inclined to have an explicit Nullable type.

In any case, whether Phobos v3 has a Nullable type and the question of the exact type returned by next in the proposed API don't really matter that much right now. What matters is the basics of the API, and some of the details can be ironed out later. IMHO, having next return a type that emulates a pointer is the best choice (especially when we don't even have sum types yet), but that can be revisited down the line, particularly since it's not like we're releasing major changes to ranges right now.

- Jonathan M Davis

January 21

Re: Range Redesign: Copy Semantics

Posted by Jonathan M Davis
in reply to Paul Backus

Jonathan M Davis

Posted in reply to Paul Backus

On Sunday, January 21, 2024 3:46:06 PM MST Paul Backus via Digitalmars-d wrote:
> On Sunday, 21 January 2024 at 19:24:20 UTC, Steven Schveighoffer
>
> wrote:
> > While I agree with most of what you wrote, there is one very big problem with this. Namely, current input ranges are suddenly considered to be forward ranges. Changing the semantics of what it means to only define `front`, `popFront`, and `empty` changes the semantics of existing code.
> >
> > And this is not something you can solve with some kind of versioning. The only way I can think of is to alter the name of one or more primitives to prevent accidental usage.
>
> One possibility is to change the forward range interface to something like
>
>      bool empty();
>      Element head();
>      typeof(this) tail();
>
> In addition to distinguishing new-style forward ranges from old-style ones, this interface would also allow forward ranges to be `const` (although iterating a `const` forward range would only be possible with recursion).
>
> `popFront` could then be implemented as a UFCS function:
>
>      void popFront(R)(ref R r)
>          if (std.v2.range.isForwardRange!R && isAssignable!R)
>      {
>          r = r.tail;
>      }

Well, I think that the reality of the matter here is that if we want to deal with const and immutable properly (which is another issue that we need to be tackling with the range API redesign), we're either going to need to switch to a more functional-style API like this, or we're going to need to add some sort of opTailConst to the language, or we're going to need to add some other sort of language improvement that would allow use to convert const(Range!Element) to Range!(const Element) generically. As things stand, const and ranges really don't interact well together outside of fairly limited situations (e.g. using dynamic arrays for everything), and containers in particular get screwed by it.

So, we may very well decide that we need to move forward ranges to something like head and tail, though I'm not terribly happy with the idea, and I'm not sure what the performance implications are (though LDC at least could probably make it not matter). It would also require another change to foreach, but pretty much _anything_ we do here to fix the tail-const problem with ranges is going to require some kind of language change.

Another problem with the head/tail approach is that we then have to come up with names for bidirectional ranges, and all of that could get pretty confusing. But ultimately, that can be solved.

Either way, the main things that I think needs to happen to fix the copy semantics problems are to require that forward ranges have the same copy semantics as dynamic arrays (at least with regards to their iteration state), which means that save is no longer a thing, and we need basic input ranges to be reference types, which means that they should probably have a different API from forward ranges. I suppose that we could then change the forward range API and leave the basic input range API alone, but given that the current API requires caching front, which causes some problems for basic input ranges, I'm inclined to give basic input ranges a different API. If we then also need to change the API of forward ranges beyond removing save, then so be it, though that will result in a lot more code having to be changed when it's updated to use or be in Phobos v3.

- Jonathan M Davis

January 21

Re: Range Redesign: Copy Semantics

Posted by Jonathan M Davis
in reply to Paul Backus

Jonathan M Davis

Posted in reply to Paul Backus

On Sunday, January 21, 2024 7:30:40 PM MST Paul Backus via Digitalmars-d wrote:
> On Monday, 22 January 2024 at 02:10:32 UTC, Jonathan M Davis
>
> wrote:
> > On Sunday, January 21, 2024 7:51:33 AM MST Paul Backus via
> >
> > Digitalmars-d wrote:
> >> With the proposed design, it would be possible to implement
>
> >> next() for forward ranges as a UFCS function:
> [...]
>
> > True, but then you have a function with the same copy semantics problem that we have now, since the forward range wouldn't have reference semantics. It would need to be fully wrapped in another type which gave it reference semantics to do that.
>
> You caught me--I forgot to mark the parameter as 'ref'. Dangers of writing code directly into a message without testing it first, I suppose. :)

Well, even if the parameter is ref (which it would need to be), the core problem is that the range-based function then has to deal with passing around an object that could have value semantics even though basic input ranges would be required to have reference semantics with the proposed changes. Obviously, range-based code _can_ work without caring about the copy semantics of the range type, but the main thrust of the proposed changes is to try to require specific copy semantics from different categories of ranges so that we can avoid the bugs that come with different range types having different copy semantics (as well as making it possible to write code that takes advantage of the specific copy semantics of a particular category of range; even simply being able to assign one range to another in generic code would be a big help to some code).

Ideally, code would be able to rely on the copy and assignment semantics of ranges - and that means not trying to treat basic input ranges and forward ranges as the same thing without actually wrapping one in a way that forces it to be the same thing (be it by turning a forward range into a reference type with the basic input range API or by wrapping a basic input range in a struct which caches the data such that it can implement the forward range API). Simply slapping the functions for the other API on one of the two types will result in ranges that won't behave properly, which will work with some functions and fail miserably with others.

- Jonathan M Davis

January 21

Re: Range Redesign: Copy Semantics

Posted by Jonathan M Davis
in reply to Steven Schveighoffer

Jonathan M Davis

Posted in reply to Steven Schveighoffer

On Sunday, January 21, 2024 12:24:20 PM MST Steven Schveighoffer via Digitalmars-d wrote:
> While I agree with most of what you wrote, there is one very big problem with this. Namely, current input ranges are suddenly considered to be forward ranges. Changing the semantics of what it means to only define `front`, `popFront`, and `empty` changes the semantics of existing code.
>
> And this is not something you can solve with some kind of versioning. The only way I can think of is to alter the name of one or more primitives to prevent accidental usage.

Yeah, I was orignally hoping that this sort of thing could just be solved with editions, but regardless of what we do with editions and Phobos, third party code won't necessarily be updated to match, and it won't even necessarily be obvious whether it has been. To an extent, that doesn't really matter, and we could theoretically punt on the issue based on the idea that any basic input ranges written to follow the Phobos v2 API would just behave incorrectly with functions written to follow the Phobos v3 API for forward ranges, so testing would catch it, but that's obviously not ideal. It _could_ be caught if we required all basic input ranges to be either class references or pointers to structs, while requiring that all forward ranges be structs or dynamic arrays (and we will have to do the latter), but that would arguably restrict the implementation of basic input ranges too much (e.g. reference counting wouldn't be possible).

For better or worse, it's definitely an argument for switching to a more functional-style API like Paul mentioned and like has occasionally been discussed in the past (usually as a potentially solution to the tail-const and tail-immutable issue). That would actually allow us to solve other problems rather than simply being a minor API change just to make it so that we could detect the difference, though I'm not sure what all of the other consequences of such a change would be (particularly with regards to performance).

- Jonathan M Davis

January 21

Re: Range Redesign: Copy Semantics

Posted by Jonathan M Davis
in reply to Nick Treleaven

Jonathan M Davis

Posted in reply to Nick Treleaven

On Sunday, January 21, 2024 11:13:02 AM MST Nick Treleaven via Digitalmars-d wrote:
> On Sunday, 21 January 2024 at 13:48:56 UTC, Jonathan M Davis
>
> wrote:
> > And how on earth would you use a range if it can't be copied? They're copied all over the place and have to be to even be used. You copy them when you pass them to any range-based function. You copy them when you return them from a range-based function. You copy them when they get wrapped by another range. You copy them when you pass them to foreach. Disabling copying on ranges would make them borderline useless.
>
> Maybe they could be moved instead of copied when the original is not needed.

Do you really want to have to call move every time that you call a range-based function or want to wrap one range in another? Immovable types are incredibly un-user-friendly in general.

It also would mean that ranges wouldn't work with foreach with how foreach currently works, since it copies the range. And if it doesn't copy the range, then ranges other than dynamic arrays would have different semantics when being iterated with foreach than dynamic arrays do, which would also be a big problem.

> > And you also can't disable copying on classes or dynamic arrays.
>
> Unless all ranges have to be structs, but then we probably need a function/syntax to ask for a range from a class or dynamic array.

Forward ranges are supposed to mimic dynamic arrays. Having to wrap dynamic arrays to use them as ranges would be very annoying. And for what? To just avoid copying? Yes, if copying ranges is illegal, then you've given them all the same copy semantics, but the result is incredibly unwieldy, and it means that you can't have dynamic arrays and forward ranges use the same code without wrapping dynamic arrays. And are people really going to want to be wrapping dynamic arrays in another type just so that they can make them uncopyable and the be forced to call move all over the place instead?

The most friendly way for forward ranges to function is to make them more properly emulate dynamic arrays as they were originally intended to do. The reason that they don't already is so that we could support classes as ranges, which while it should work, is not the typical use case, and we can allow them to be forward ranges almost as easily with a wrapper struct as we can by having a save function.

If forward ranges all have the same copy semantics as dynamic arrays, then they become easy to use correctly without having bugs due to stuff like forgetting to call save - and without a bunch of compiler errors, because you forgot to use move all over the place. For forward ranges, it would be a huge win, and it's exactly what we've talked about doing for years now. We just haven't done it, because doing it in the current version of Phobos would break existing code.

The bigger problem is then what to do with basic input ranges, because they can't have the same copy semantics, but the fact that they can't implement something like save already hamstrings them enough that arguably they should be treated as something separate anyway. So, by just giving them a separate API, we simplify the situation with forward ranges considerably, and we avoid the classes of bugs that come with code being written to work with both basic input ranges and forward ranges. Of course, there are some cases where having code written to work with both works just fine, and having separate APIs could be annoying in those cases, but basic input ranges are limited enough that most code doesn't support them anyway.

Obviously, the tradeoffs are debatable, but I think that it's clear that ideal situation for forward ranges is to have them all have the same copy semantics as dynamic arrays.

- Jonathan M Davis

January 21

Re: Range Redesign: Copy Semantics

Posted by Jonathan M Davis
in reply to Alexandru Ermicioi

Jonathan M Davis

Posted in reply to Alexandru Ermicioi

On Sunday, January 21, 2024 6:50:26 AM MST Alexandru Ermicioi via Digitalmars- d wrote:
> On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis
> wrote:
> We could use java Iterator api in this case which has also `bool
> hasNext()` function.

Well, the key change the API that would really matter IMHO would be to merge front and popFront, since that would eliminate the need for basic input ranges to cache front, which many of them are forced to do right now. I'd have to think about it more to see whether having hasNext instead of returning a nullable type from next would be an improvement, make it worse, or be more or less equal. A lot of that depends on how it would affect the typical implementation of a basic input range.

> Then new input range api can also be propagated to other types of range such as forward range and further.

Part of the point here is to _not_ do that, because they fundamentally have different semantics.

> > where next returns a nullable type where the value returned is the next element in the range, with a null value being returned if the range is empty. The return type would then need to emulate a pointer - specifically, when casting it to bool, it would be true if it's non-null and false if it's null, and dereferencing it would give you the actual value if it's non-null (with it being undefined behavior if you dereference null). So, a basic input range of ints might define next as
> >
> >     int* next();
> >
> > or alternatively, it could be something like
> >
> >     Nullable!int next();
>
> The returned type should be a tagged union, that can allow storing of actual value inside, or pointer to it, while it's interface will hide the details, i.e. get would look like this: `ref T get()`.

Requiring the pointer API already does that. It's just that checking whether the value is null means casting to bool, and you dereference the return value rather than calling get. The implementation would be quite free to then use a Nullable or some other type which does whatever it wants internally so long as casting to bool tells you whether the object contains a value, and dereferencing it gives you the value.

> Input ranges could just be disallowed in foreach statements, that would solve different semantics between them and forward ranges, just like how in Java it is done with Stream api.

They could be, but that wouldn't be very user-friendly. And it doesn't solve the different semantics of copying them.

> Imho, this proposal is complicated, and unnecessarily complicates construction of ranges, making them less appealing to implement in user code. I'd opt for restricting copying completely, and allow copying through `.save` only.
>
> The `.next` method proposal is a good improvement though, with addition of `.hasNext` method at minimum.

Restricting copying would make ranges borderline unusable. They have to be able to be passed around and be wrappable, which means either copying them or moving them, and moving them would be very un-user-friendly, since it would require explicit calls to move calls all over the place. What we really want to be able to do in general is treat ranges like dynamic arrays (and part of the reason to give basic input ranges a different API is because the can't have the same copy semantics as dynamic arrays).

The suggested API would actually simplify forward ranges considerably, because then most of them wouldn't have to implement save any longer, and those few that do would just implement the necessary semantics via their copy constructor (or a more advanced implementation might use reference-counting, which would be somewhat more complex, but the vast majority of ranges would not be in that situation). And the result would be that forward ranges in general could be treated just like dynamic arrays, which would simplify the vast majority of range-based code and allow range-based code to rely on the copy and assignment semantics instead of having to carefully avoid using ranges which have been copied and avoid assignment like they have to do now. So, as far as forward ranges go, this seems to me like it's very much a simplification, not something complicated. If this is coming across as complicated, then it's probably because of all of the explanatory text I had to give as to why these changes should be made. But the changes themselves simplify things for forward ranges.

The part that's arguably more complex is simply that the basic input ranges then have to have a separate API, but since they fundamentally have different cropy semantics and can't be used in most range-based functions anyway, I don't think that it complicates things much. And it makes the distinction and behavioral differences between basic input ranges and forward ranges much clearer than they typically are now, which would make understanding ranges easier.

- Jonathan M Davis

January 21

Re: Range Redesign: Copy Semantics

Posted by Jonathan M Davis
in reply to Walter Bright

Jonathan M Davis

Posted in reply to Walter Bright

On Sunday, January 21, 2024 10:15:58 AM MST Walter Bright via Digitalmars-d wrote:
> Thanks for bringing this up. Improving range semantics is something we should not pass up the opportunity for with Phobos3. (I consider Phobos1 the D1 version, Phobos2 the one we have currently.)
>
> Another thing that should be visited is the ability to lock/unlock a range. For example, a range that represents stdout needs to be lockable.

I will have to study that more to see what's needed there. If it's the kind of thing that requires higer level locking than simply locking within the implementations of existing range-based functions (e.g. if it had to be locked across multiple calls), then that likely either merits a non-standard extension that the user has to worry about that's specific to stdio stuff, or we'll need to come up with some sort of higher level addition to handle it. I very much doubt that it's the sort of thing that's going to make much sense to worry about within most range-based functions though. I will add it to the list of things to look into though.

There are several other minor changes that I think are going to be needed to the range API (e.g. requiring that random-access ranges implement opDollar rather than letting it be optional, since it's useless if it's optional like it is now), but the core point of this thread was to try to sort out the issue of the copy and assignment semantics for ranges, and the best solution for that seems to be to get rid of save (which would allow all forward ranges to behave like dynamic arrays as far as their copy, assignment, and iteration semantics go) and to make basic input ranges distinct from forward ranges, since they fundamentally can't have the same semantics.

The issue of tail-const is going to be another big one, though that could result in us having to switch to a head/tail model for forward ranges instead of front/popFront - either that, or we're going to need a language improvement of some kind to sort it out, since we need to be able to generically get a tail-const copy of a forward range, whereas we currently don't have a way to convert const(Range!Element) to Range!(const Element). We can't even assume that const(Range!Element) is the same as const(Range!(const Element)). So, there are definitely further issues that will need to be hashed out beyond the copy semantics. I just decided to start with that.

- Jonathan M Davis

January 21

Re: Range Redesign: Copy Semantics

Posted by Jonathan M Davis
in reply to Sebastiaan Koppe

Jonathan M Davis

Posted in reply to Sebastiaan Koppe

On Sunday, January 21, 2024 11:26:37 AM MST Sebastiaan Koppe via Digitalmars-d wrote:
> On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis
>
> wrote:
> > I've been thinking about this for a while now, but with the next version of Phobos which is in the early planning stages, we really should do some redesigning of ranges.
>
> Thanks for the write up.
>
> Another issue I have encountered is around priming. Sometimes it happens in `empty`, sometimes in a private helper function and sometimes even in the constructor.
>
> If priming happens in `empty`, then it can't be `const`. Which is strange because you would expect `empty` not to mutate.
>
> I have been thinking about having an explicit build and iteration phase, where priming happens when you switch from build to iteration.
>
> The benefit is that implementers have a clear place where to prime the range.

Well, arguably, that's really an implementation detail of the range, and it would be pretty annoying if range-based code had to worry about explicitly priming a range (you're basically getting into two phase construction at that point, which tends to be problematic). Ideally, once you have a range, it's ready to use. So, while I could see saying what best practice was on that, I'm not sure that I'd want to require that it be done a specific way. Part of the problem is that in my experience, it sometimes makes sense to do it one place, whereas with other ranges, it makes more sense to do it somewhere else. And whether empty mutates really shouldn't matter so long as it's not actually const (it has to be logically const to be sure, but that doesn't mean that it needs to actually be const), though obviously, it can become annoying to wrap a range when you want to make your empty const, and you can't rely on the range your wrapping having empty be const. For better or worse though, the current range API makes no requirements about const (and I'm not sure if we want to add such requirements given how restrictive D's const is). If we did require const anywhere on the range API though, empty and length would be the obvious places.

That being said, I'm not sure that I've ever written a range that primes anything in empty. Usually, the question is between what goes in front and and what goes in popFront and how much has to go in the constructor for that to work cleanly. And the answer is not always the same.

We may also want to rework how front and popFront work for forward ranges, which could simplify the priming issue depending on what we did. Regardless though, the priming issue is certainly something that we should be thinking about. Even if we don't add anything to the API for it, it will be good to keep it in mind for how it would affect the other range API functions for any reworking of them that we might do.

Regardless, the big issues that were the point of this thread were the copy and assignment semantics for ranges and what we would need to do fix those. There are definitely other issues that we're going to have to sort out (e.g. the tail-const issue).

> > What I would propose for that would be a single function
> >
> >    auto next();
> >
> > where next returns a nullable type where the value returned is the next element in the range, with a null value being returned if the range is empty.
>
> What about:
>
> ```
>      bool next(Callable)(Callable c) {
>          if (empty)
>              return false;
>          c(front());
>          popFront();
>          return true;
>      }
> ```
>
> It has the benefit of not needing to unbox a Nullable/Pointer.

Well, that would basically be another sort of opApply, which is an idiom that tends to be pretty hard to understand in comparison to the alternatives. Getting a nullable type of some kind back is far easier to read and understand, and the only real downside I see to it is that you get bugs if someone tries to access the value when there isn't one. But ranges already have that in general with front and empty, and I don't think that it's been much of an issue in practice.

We could also go with next and hasNext like Alexandru suggested, in which case, we wouldn't be returning a pointer or nullable type if that's the concern. I'm not sure if that's ultimately better or worse though. In some respects, it would be better to be able to put all of the range logic in a single function, but it would also be nice in some situations to be able to ask whether a basic input range has any elements without having to grab the first one if it's there.

Another consideration that comes to mind is that we might want to add a function that explicitly skips elements (since in some cases, you can skip work if you don't actually need the next element), and since next both pops the element and returns it, we wouldn't have to worry about the state of front in the process (whereas adding some kind of skip function to the current range API would result in an invalid front).

- Jonathan M Davis

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation