Range Redesign: Copy Semantics (page 4)

Settings

Help

Index » General » Range Redesign: Copy Semantics (page 4)

January 22

Re: Range Redesign: Copy Semantics

Posted by Paul Backus
in reply to Jonathan M Davis

Permalink

Paul Backus

Posted in reply to Jonathan M Davis

Permalink

On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis wrote:
> On Sunday, January 21, 2024 6:50:26 AM MST Alexandru Ermicioi via Digitalmars- d wrote:
>> Then new input range api can also be propagated to other types of range such as forward range and further.
>
> Part of the point here is to _not_ do that, because they fundamentally have different semantics.

I think maybe we're losing sight of the big picture in this discussion.

The big-picture, overarching goal of the range API is to provide a common interface between data structures and algorithms. You have N data structures and M algorithms, and by using this common interface you can implement all N*M combinations in O(N+M) code. Each data structure implements the range API once, and each algorithm is implemented once using the range API.

If you split the API in two, by making input streams (basic input ranges) and forward ranges completely disjoint, you undermine this goal. Now, each data structure has to implement *two* APIs, and each algorithm has to be implemented *twice*, once for input streams and once for forward ranges.

In practice, what's going to happen is library authors will simply not bother to implement both, and we will end up with gaps where many (data structure, algorithm) pairs are not covered.

I appreciate the desire to provide consistent copying semantics, but if it comes to a choice between that and having a single unified API, I will choose the unified API, warts and all.

> Restricting copying would make ranges borderline unusable. They have to be able to be passed around and be wrappable, which means either copying them or moving them, and moving them would be very un-user-friendly, since it would require explicit calls to move calls all over the place.

Worth noting that generic range algorithms already have to account for the existence of non-copyable types, since even if a range itself is copyable, its elements may not be.

Also, as I pointed out in an earlier reply to Walter [1], there are legitimate use-cases for non-copyable input streams, where making them copyable would incur a significant performance overhead.

I think the correct solution to this is something like DIP 1040, which makes non-copyable types less of a pain to work with in general.

[1] https://forum.dlang.org/post/sullaynnrjufookbuijt@forum.dlang.org

---

In light of the points above, my proposed copying semantics for input streams are:

1. Input streams MAY be non-copyable, but are not required to be.
2. If you copy an input stream and then call next() on the original, the behavior of both the original and the copy is unspecified.

That is, I think we should give up on having 100% consistent copying semantics in this one case, in order to keep the overall API unified (by supporting UFCS next()) and avoid unnecessary pessimization of range algorithms.

The good news is that with these semantics, if you write you generic code conservatively and treat all input streams as non-copyable, you'll get the right behavior in all cases. And if you don't, you'll get a compile-time error as soon as you actually try to pass in a non-copyable input stream, and you'll know exactly how to fix it. So this design is still an improvement on the status quo.

January 22

Re: Range Redesign: Copy Semantics

Posted by Ben Jones
in reply to Paul Backus

Permalink

Ben Jones

Posted in reply to Paul Backus

Permalink

On Sunday, 21 January 2024 at 22:40:55 UTC, Paul Backus wrote:

If your concern is aesthetics, dereferencing a pointer or unboxing a Nullable is far less ugly than passing a callback.

I've been writing a fair amount of Kotlin (an a tiny bit of Swift) and the trailing lambda syntax is very nice for making these APIs look nice. Bascially, if the last argument to a function is a function, you can put it's curly braces after the parentheses: range.map{ it*2}.filter{ it < 10}

The sauce that Kotlin has that D doesn't is that 1-arg lambdas get an implicit parameter called it and explicit parameters come after the opening curly brace: { p1, p2 -> body }. I'm also not sure how it would work with templates. Generally, I think it's much easier to think about lifetime guarantees with functional APIs, so something like that might make them more palatable syntactically.

January 22

Re: Range Redesign: Copy Semantics

Posted by Alexandru Ermicioi
in reply to Jonathan M Davis

Permalink

Alexandru Ermicioi

Posted in reply to Jonathan M Davis

Permalink

On Monday, 22 January 2024 at 03:52:02 UTC, Jonathan M Davis wrote:
> I'd have to think about it more to see whether having hasNext instead of returning a nullable type from next would be an improvement, make it worse, or be more or less equal. A lot of that depends on how it would affect the typical implementation of a basic input range.

Well one advantage of `hasNext` is that you can check in advance whether range is empty or not, without the need to pop front element.

> Part of the point here is to _not_ do that, because they fundamentally have different semantics.

That would mean that you need to overload a method to accept both input, and forward ranges when foreach is not used, right?

> If this is coming across as complicated, then it's probably because of all of the explanatory text I had to give as to why these changes should be made. But the changes themselves simplify things for forward ranges.

Could be the case :). Perhaps it would be nice to see it as a code example.

January 22

Re: Range Redesign: Copy Semantics

Posted by Atila Neves
in reply to Jonathan M Davis

Permalink

Atila Neves

Posted in reply to Jonathan M Davis

Permalink

On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:
> I've been thinking about this for a while now, but with the next version of Phobos which is in the early planning stages, we really should do some redesigning of ranges. Most of their API is just fine how it is, but there are some aspects of it which really should be changed if we want them to be better (the most obvious one being the removal of auto-decoding). But what I'd like to discuss specifically in this thread is fixing - and defining - the semantics of copying and assigning to ranges. Specifically, the semantics of stuff like
>
> [...]

I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one.

I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors.

Has anyone here used the class-based ranges?

January 23

Re: Range Redesign: Copy Semantics

Posted by Richard (Rikki) Andrew Cattermole
in reply to Atila Neves

Permalink

Richard (Rikki) Andrew Cattermole

Posted in reply to Atila Neves

Permalink

On 23/01/2024 4:41 AM, Atila Neves wrote:
> On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:
>> I've been thinking about this for a while now, but with the next version of Phobos which is in the early planning stages, we really should do some redesigning of ranges. Most of their API is just fine how it is, but there are some aspects of it which really should be changed if we want them to be better (the most obvious one being the removal of auto-decoding). But what I'd like to discuss specifically in this thread is fixing - and defining - the semantics of copying and assigning to ranges. Specifically, the semantics of stuff like
>>
>> [...]
> 
> I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one.
> 
> I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors.
> 
> Has anyone here used the class-based ranges?

I did experiment with them when I first learned ranges.

Nothing is set up for them.

But they do two things extremely well:

1. Create confusion, they are not the definition of what a range is, nor are they used anywhere.
2. Of course since then I've also learned that they offer opportunity to prevent optimizations, so not a great solution to be tied to.

I want signatures. I've told Adam Wilson that if you want me involved with ranges for PhobosV3 at the very least I want the compile time aspect. The current situation with ranges not being properly documented in API is just not good enough.

Signatures do have a runtime aspect, they can represent ranges if you need a reference, we can also implement something like save via that (even if the implementation doesn't have it).

January 22

Re: Range Redesign: Copy Semantics

Posted by victoroak
in reply to Jonathan M Davis

Permalink

victoroak

Posted in reply to Jonathan M Davis

Permalink

On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:
>
>  [...]
>
>

I think one problem with just "next" as the api for an Input Range is that currently I don't know any safe way to return a reference to the item. If "next" returns a pointer we can't be sure the backing data structure (a growable array like std::vector for example) did not freed that memory yet but I guess this is the same for any range right now.

Any way, it would be good to have an optimization for Nullable!T or a different type that when T is nullable (e.g. classes or pointers) it doesn't need a boolean flag but also loses the ability to store null inside it.

That's one of the optimizations of "Option" in Rust, although in Rust references can't be null so it's ok to do this.

January 22

Re: Range Redesign: Copy Semantics

Posted by H. S. Teoh
in reply to Atila Neves

Permalink

H. S. Teoh

Posted in reply to Atila Neves

Permalink

On Mon, Jan 22, 2024 at 03:41:35PM +0000, Atila Neves via Digitalmars-d wrote:
> On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:
> > I've been thinking about this for a while now, but with the next version of Phobos which is in the early planning stages, we really should do some redesigning of ranges. Most of their API is just fine how it is, but there are some aspects of it which really should be changed if we want them to be better (the most obvious one being the removal of auto-decoding). But what I'd like to discuss specifically in this thread is fixing - and defining - the semantics of copying and assigning to ranges. Specifically, the semantics of stuff like
> > 
> > [...]
> 
> I don't think I've ever encountered a situation where reference ranges would have been desirable - I've never used one.

It's useful in recursive-descent parsers where you expect the range to have advanced past whatever a recursive call has consumed.


> I think that `.save` was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors.

Meaning they can only be passed by reference?


> Has anyone here used the class-based ranges?

Yes.  They are useful when you have a range whose type varies at runtime. For example:

	auto myFunc(R)(R range, int type) {
		switch (type) {
			case 1:
				return range.map!(a => a*2);
			case 2:
				auto auxRange = helperFunc(range);
				return range.filter!(a => a % 2)
					.chain(auxRange);
			case 3:
				return range.chain(recurrence!(...));
		}
		assert(0);
	}

This code will not compile because there is no common return type. To fix it, wrap each return in out of the class-based range objects:

	auto myFunc(R)(R range, int type) {
		switch (type) {
			case 1:
				return range.map!(a => a*2)
					.inputRangeObject;
			case 2:
				auto auxRange = helperFunc(range);
				return range.filter!(a => a % 2)
					.chain(auxRange)
					.inputRangeObject;
			case 3:
				return range.chain(recurrence!(...))
					.inputRangeObject;
		}
		assert(0);
	}


T

-- 
They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to Kill

January 22

Re: Range Redesign: Copy Semantics

Posted by Paul Backus
in reply to Atila Neves

Permalink

Paul Backus

Posted in reply to Atila Neves

Permalink

On Monday, 22 January 2024 at 15:41:35 UTC, Atila Neves wrote:

I think that .save was a historical mistake, and that ranges that can be copied are forward ranges. Something like a range reading from stdin or a socket would/should disable the copy/postblit constructors.

This was my initial thought too, but it turns out it's not quite that simple.

If we require basic input ranges to be non-copyable, then types which are inherently copyable (in particular, T[]) will be exclude from being basic input ranges. In order to use a T[] with an algorithm that expects a basic input range, you would have to wrap it in a struct that has copying disabled.

It's also a big problem for composability. If half the algorithms require your range to be copyable, and the other half require it to be non-copyable, you have to keep track of what color each algorithm is whenever you write a range pipeline, or write a new algorithm that calls an existing one internally.

January 22

Re: Range Redesign: Copy Semantics

Posted by monkyyy
in reply to Jonathan M Davis

Permalink

monkyyy

Posted in reply to Jonathan M Davis

Permalink

On Sunday, 21 January 2024 at 05:00:31 UTC, Jonathan M Davis wrote:
> wall-o-text

Can this be summarized as

```d
enum isRangeCopible(R)= is(
  (R r){
    R rdup=r;
    r=rdup;
  }
);
```

> What I'd like to get out of this thread is feedback on how much sense this idea does

I think someone formal should collect all changes people want to make to ranges

see my thread for .index and feature sets
theres something about immutablity
I would also suggest arrays should be more important then doubly linked lists

January 22

Re: Range Redesign: Copy Semantics

Posted by Sebastiaan Koppe
in reply to Jonathan M Davis

Permalink

Sebastiaan Koppe

Posted in reply to Jonathan M Davis

Permalink

On Monday, 22 January 2024 at 04:38:13 UTC, Jonathan M Davis wrote:
> On Sunday, January 21, 2024 11:26:37 AM MST Sebastiaan Koppe
>> I have been thinking about having an explicit build and iteration phase, where priming happens when you switch from build to iteration.
>>
>> The benefit is that implementers have a clear place where to prime the range.
>
> Well, arguably, that's really an implementation detail of the range, and it would be pretty annoying if range-based code had to worry about explicitly priming a range (you're basically getting into two phase construction at that point, which tends to be problematic). Ideally, once you have a range, it's ready to use. So, while I could see saying what best practice was on that, I'm not sure that I'd want to require that it be done a specific way.
>
> Part of the problem is that in my experience, it sometimes makes sense to do it one place, whereas with other ranges, it makes more sense to do it somewhere else.

Yes exactly. My point is that if there is an explicit place to do priming, that uncertainty would go away.

> That being said, I'm not sure that I've ever written a range that primes anything in empty. Usually, the question is between what goes in front and and what goes in popFront and how much has to go in the constructor for that to work cleanly. And the answer is not always the same.

With 2-phase ranges it would always be the same. Anyway, here are some Phobos examples:

- `filter` does its priming in empty, https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L1381

- `filterByDirectional` does a while loop in its constructor, https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L1581

- `chunkBy` calls empty in its constructor, https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L1931

- `substitute` calls empty and popFront in its constructor, https://github.com/dlang/phobos/blob/bf35228426529ab19e5d17a3286d187214bf024a/std/algorithm/iteration.d#L6909

Taken together, pretty much anything can happen just by constructing a range. You don't even need to iterate it!

> Regardless, the big issues that were the point of this thread were the copy and assignment semantics for ranges and what we would need to do fix those. There are definitely other issues that we're going to have to sort out (e.g. the tail-const issue).

Ultimately it all comes together or it doesn't.

>> What about:
>>
>> ```
>>      bool next(Callable)(Callable c) {
>>          if (empty)
>>              return false;
>>          c(front());
>>          popFront();
>>          return true;
>>      }
>> ```
>>
>> It has the benefit of not needing to unbox a Nullable/Pointer.
>
> Well, that would basically be another sort of opApply, which is an idiom that tends to be pretty hard to understand in comparison to the alternatives.

Actual usual would just use `foreach` of course. But in cases where you want to iterate manually you have to deal with the added ugliness/complexity, fair.

I do think its easier to access the item by `ref` this way, instead of having to do a pointer in some nullable wrapper.

> Getting a nullable type of some kind back is far easier to read and understand, and the only real downside I see to it is that you get bugs if someone tries to access the value when there isn't one. But ranges already have that in general with front and empty, and I don't think that it's been much of an issue in practice.

I don't particularly like APIs that have a requirement to call methods in a particular order, I much rather have APIs that can't be used wrong. Pit of success and all that.

In practice though, I messed up only a few times, and then added a condition on `empty` and continued on.

> We could also go with next and hasNext like Alexandru suggested, in which case, we wouldn't be returning a pointer or nullable type if that's the concern. I'm not sure if that's ultimately better or worse though. In some respects, it would be better to be able to put all of the range logic in a single function, but it would also be nice in some situations to be able to ask whether a basic input range has any elements without having to grab the first one if it's there.

With some sources, determining whether there is a next item actually involves *getting* the next item. Instead, like `empty`, I think `hasNext` ought to be `const`.

Top | Forum index | About this forum

Forums