September 02 Re: Requirements for Slicing Ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sebastiaan Koppe | On Monday, September 2, 2024 3:59:04 AM MDT Sebastiaan Koppe via Digitalmars-d wrote: > On Monday, 2 September 2024 at 02:47:16 UTC, Jonathan M Davis > > wrote: > > Does anyone know of a situation where that would not be the case? I would love to just ditch the test for slicing in the updated range API and tie the ability to random-access ranges, since it would be simplify things. If I had to guess without digging through all of the history, I suspect that random-access ranges were added before slicing was and that that's why they weren't tied together, but in practice, they always seem to be, and I can't think of a reason at the moment why they can't be. > > > > Can anyone see something that I'm missing here? > > > > - Jonathan M Davis > > I briefly wondered how it would work if the range is non-copyable, since taking a slice is effectively taking a copy of the underlying data the range captures (and its lifetime). But I remember you saying in another thread that forward ranges have to be copyable, correct? Slicing doesn't copy the underlying data. It just gives a range which refers to fewer elements from the original where the new range can be iterated independently. Just think about dynamic arrays / slices (which is what the range API is attempting to create an abstraction of). If you do something like arr[5 .. 12], you get a new dynamic array which is a slice of the original dynamic array starting at index 5 and going through index 11. Each dynamic array / slice refers to the same memory, and none of the elements were actually copied. The same occurs when slicing a range. You get two ranges which can be independently iterated, but the elements themselves are not necessarily independent (they will be if the range is a value type, but in most cases, they'll refer to the same memory; it's just the iteration state which is guaranteed to be independent). Similarly, save gives you a range which is independent with regards to its iteration state. Both the copy and the original can be iterated without affecting the other, but the elements themselves are not necessarily copied (though in some cases they are), and mutating the elements in one will often mutate the elements in the other. If you want to make sure that all of the elements are copied, then you'd use something like std.array.array to copy all of the range's elements into a dynamic array. Essentially, save is equivalent to range[0 .. $], but there are plenty of ranges that can support copying their full iteration state but which can't support copying their iteration state while skipping a bunch of elements. With the proposed API, copying ranges in general then becomes essentially equivalent to save (though an implementation could use reference counting and avoid actually doing save internally until the range is mutated in order to avoid copying its state in the cases where there's only one copy). So, a non-copyable range would be equivalent to a basic input range - and in general, such a range is not a forward range precisely because it can't implement save in O(1). And if a range can't implement save in O(1), then it can't implement slicing in O(1). So, if a range has to be a basic input range, then it makes no sense for it to have slicing - and the current API does not allow such a range to have slicing, since hasSlicing checks for isForwardRange. My issue is that as far as I can tell, if a range can implement slicing in O(1), then it should be possible for it to implement indexing in O(1) (and vice-versa), in which case, we might as well simplify things and just have isRandomAccessRange require slicing and not support ranges which have slicing but not random-access with the idea that there's no reason why a range that can support one can't support the other. > So I guess the question for me boils down to whether there exists a need for a non-copyable range that provides random access and doesn't allow copies. (Although I guess with ref and dip1000 you could make that work, but lets not go there :smile). If a range cannot implement save - or with the new API, cannot be copyable - then that's because it cannot copy its iteration state in O(1). I suppose that there could be cases where it's technically possible to implement save / copying in O(1), but the programmer doesn't want to (e.g. that would arguably make sense with random number generators, since copying those accidentally can cause result in reusing numbers). However, in the normal case, a range doesn't implement save (or isn't copyable in the new API), because it can't do it in O(1). If a range can't copy its iteration state in O(1), then slicing won't work (since that needs to be O(1), and it's copying the iteration state), and in such a case, I'm pretty sure that it's not possible to implement indexing in O(1) - though maybe someone can come up with a corner case where it is possible. But as long as the ability to implement indexing in O(1) implies the ability to implement slicing in O(1) (and vice versa), it would make sense to just tie random-access and slicing together in order to simplify the API. I guess that the question then is ranges which _could_ copy their iteration state in O(1) but where the programmer decided to not allow it for one reason or another. Sure, then it could be possible to implement random-access in O(1), since in that situation, the lack of copyability has nothing to do with the ability to implement it, but as things stand, bidirectional ranges are always forward ranges, and finite random-access ranges are always bidirectional ranges (whereas infinite random-access ranges are forward ranges but can't be bidirectional ranges for obvious reasons). So, to allow a non-copyable range to be be bidirectional or random-access would make it so that it's no longer the case that bidirectional ranges and random-access ranges are guaranteed to be forward ranges. And we _could_ technically do that, but that would certainly complicate things, and it would be for a pretty rare use case. Also, I'm pretty sure that it's the case that most algorithms which require access to the end of a range or to elements in the middle of a range also require the ability to copy the iteration state of a range. There will be exceptions, but I'm not sure that they're worth the trouble of further complicatin the range API. The question for which I created this thread was based on the assumption that a range would implement all of the functionality that it could. If it doesn't, then it's certainly the case that a range _could_ implement slicing and not be random-access - just like it's possible to have a range that could implement save but which doesn't. So, yeah, the situation changes considerably if we don't treat the range categories as being a linear hierarchy, but I'm inclined to keep the hierarchy as-is, because it's much simpler that way, and I expect that the need for something like a random-access range which can't be a forward range would be quite rare. The whole reason that I'm asking if anyone can think of an example where it would be possible to implement indexing but not slicing (or vice versa) is in order to be able to simplify the range API by getting rid of hasSlicing, whereas if we allow basic input ranges to be bidirectional or random-access, then those traits become more like hasSlicing, and the hierarchy is lost, which makes the whole situation more complicated rather than simpler. And in cases where you have something on the stack which you don't want to copy, then you have basically the same situation as static arrays, and you can always create a range that refers to that memory and use that rather than having a range be non-copyable because of where it is. Sure, that then starts getting into scope and DIP 1000 if you want the compiler to track stuff for you, but the way that range-based algorithms typically work, the dangers of escaping are pretty minimal, since either they take the argument by ref and mutate the original, or they take a copy and return a copy. - Jonathan M Davis |
September 02 Re: Requirements for Slicing Ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to monkyyy | On Monday, September 2, 2024 10:03:27 AM MDT monkyyy via Digitalmars-d wrote:
> I think slicing should probably be cut from that main api, Id argue its the hardest to get correct of any of the functions while its the most fragile in meta programming in my experiments
>
> Instead there should be a collection of implict range functions given special treatment(std.range.interface, object.d, somewhere new for ranges)
>
> ---
>
> ```d
> auto tail(R)(R r){
> r.pop;
> return r;
> }
> ```
> It would suck to implement tail everywhere, so tail should not be
> in the main api; but often you need to make a poped dup. If tail
> doesnt exist, several range algorithms will need 3 extra lines of
> code repeated several time, if its in the main api, every
> algorthim will need to impliment a `auto tail()=>r.tail;` busy
> work as a specail treatment floating function it is 3 lines,
> users dont need to implement anything and it will reduce
> complexity and clarity of several algorithms(consider zip.pop
> being implimented with static map)
>
> likewise `sliceable` should be a floating function and it wraps ranges and adds opSlice, implements the meta programming of detecting if its implementable *once*, instead of the each and every algorthim needing to juggle the complexity of opDollar existing or not, infinite or finite, error behavoir etc.
As things stand, slicing finite ranges (and infinite ranges when using $) always results in the original type - just like it does when slicing dynamic arrays. I don't think that we want to lose that property of slicing just to avoid making it so that someone has to implement opSlice to get slicing. And if you're already implementing indexing, I don't think that it's particularly onerous to have to implement slicing as well.
Maybe we could provide code to mix in which would make it easier, but ranges are supposed to be an abstraction for dynamic arrays / slices (with ranges in lower categories having fewer capabilities), and making slices have a different type breaks that abstraction. Obviously, the abstraction isn't perfect as things stand, but in general, we want code that's written to work with dynamic arrays to be able to work with random-access ranges with as few changes as possible (preferably with no changes outside of the function signature).
Also, it's going to make algorithm code simpler if it doesn't have to worry about whether random-access ranges have slicing. As it is, we already have too many range-related traits that algorithms potentially have to branch on, and I expect that simplifying that buys us more than trying to make it so that someone who wants to implement a random-access range doesn't have to implement slicing.
- Jonathan M Davis
|
September 03 Re: Requirements for Slicing Ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Paul Backus | On Monday, 2 September 2024 at 11:53:55 UTC, Paul Backus wrote:
> On Monday, 2 September 2024 at 09:59:04 UTC, Sebastiaan Koppe wrote:
>>
>> I briefly wondered how it would work if the range is non-copyable, since taking a slice is effectively taking a copy of the underlying data the range captures (and its lifetime).
>
> Taking a slice does not necessarily copy the range, nor does it necessarily copy the range's elements.
>
Apologies, I expressed myself poorly, and I have to admit it is a bit of a stretch.
I meant that taking a slice often creates another alias to the same data and/or state or part thereof.
This is in contrast to indexing, which just returns the underlying item, possible as value.
If you would want to restrict/control a range so much so that you make it non-copyable, you could argue you would want to avoiding taking a slice from it as well. It might very well have random access though, so you would end up with one that exposes indexing but not slicing.
|
September 03 Re: Requirements for Slicing Ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sebastiaan Koppe | On Tuesday, September 3, 2024 7:42:28 AM MDT Sebastiaan Koppe via Digitalmars- d wrote:
> On Monday, 2 September 2024 at 11:53:55 UTC, Paul Backus wrote:
> > On Monday, 2 September 2024 at 09:59:04 UTC, Sebastiaan Koppe
> >
> > wrote:
> >> I briefly wondered how it would work if the range is non-copyable, since taking a slice is effectively taking a copy of the underlying data the range captures (and its lifetime).
> >
> > Taking a slice does not necessarily copy the range, nor does it necessarily copy the range's elements.
>
> Apologies, I expressed myself poorly, and I have to admit it is a bit of a stretch.
>
> I meant that taking a slice often creates another alias to the same data and/or state or part thereof.
>
> This is in contrast to indexing, which just returns the underlying item, possible as value.
>
> If you would want to restrict/control a range so much so that you make it non-copyable, you could argue you would want to avoiding taking a slice from it as well. It might very well have random access though, so you would end up with one that exposes indexing but not slicing.
Well, with the proposed API, that would not be possible, because the ability to copy is what differentiates between a basic input range and a forward range (i.e. copying replaces save), in which case, if you have a non-copyable range, all you can do is iterate through the elements once. To have random access would require being a random-access range, which would require both being a bidirectional range and a forward range, so it would have to be copyable.
Now, you could still give a basic input range opIndex, since you can have extra functions on your ranges, but the range API would not consider it a random-access range, and it would not work with the normal algorithms that required anything beyond a basic input range.
- Jonathan M Davis
|
September 03 Re: Requirements for Slicing Ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On Tuesday, 3 September 2024 at 00:54:32 UTC, Jonathan M Davis wrote: > On Monday, September 2, 2024 10:03:27 AM MDT monkyyy via Digitalmars-d wrote: >> I think slicing should probably be cut from that main api, Id argue its the hardest to get correct of any of the functions while its the most fragile in meta programming in my experiments >> >> Instead there should be a collection of implict range functions given special treatment(std.range.interface, object.d, somewhere new for ranges) >> >> --- >> >> ```d >> auto tail(R)(R r){ >> r.pop; >> return r; >> } >> ``` >> It would suck to implement tail everywhere, so tail should not be >> in the main api; but often you need to make a poped dup. If tail >> doesnt exist, several range algorithms will need 3 extra lines of >> code repeated several time, if its in the main api, every >> algorthim will need to impliment a `auto tail()=>r.tail;` busy >> work as a specail treatment floating function it is 3 lines, >> users dont need to implement anything and it will reduce >> complexity and clarity of several algorithms(consider zip.pop >> being implimented with static map) >> >> likewise `sliceable` should be a floating function and it wraps ranges and adds opSlice, implements the meta programming of detecting if its implementable *once*, instead of the each and every algorthim needing to juggle the complexity of opDollar existing or not, infinite or finite, error behavoir etc. > > As things stand, slicing finite ranges (and infinite ranges when using $) always results in the original type - just like it does when slicing dynamic arrays. I don't think that we want to lose that property of slicing just to avoid making it so that someone has to implement opSlice to get slicing. And if you're already implementing indexing, I don't think that it's particularly onerous to have to implement slicing as well. > > Maybe we could provide code to mix in which would make it easier, but ranges are supposed to be an abstraction for dynamic arrays / slices (with ranges in lower categories having fewer capabilities), and making slices have a different type breaks that abstraction. Obviously, the abstraction isn't perfect as things stand, but in general, we want code that's written to work with dynamic arrays to be able to work with random-access ranges with as few changes as possible (preferably with no changes outside of the function signature). > > Also, it's going to make algorithm code simpler if it doesn't have to worry about whether random-access ranges have slicing. As it is, we already have too many range-related traits that algorithms potentially have to branch on, and I expect that simplifying that buys us more than trying to make it so that someone who wants to implement a random-access range doesn't have to implement slicing. > > - Jonathan M Davis > *As things stand* > *just* As thing stands, I bet that the 50ish algorthims that should work correctly with slicing, dont handle the complexitys of all infinite, opDollar, nested ranges, with coherent error behavior in d2 and its had the 10 years and 100+ whatever contributors. I dont think is possible to get it correct with the old api; it always feels like a crapshoot if array like ranges work.I dont know why anyone would sign up to recreate anything resembling d2. My experiments had more ambitious plans for slicing, but it very quickly broke down. You should make a proof of concept algorithms lib with lets say 20 algorithms 10 of which are suppose to work with slicing; can you juggle all the complexity with the requirements, are you sure? It will only get worse with scaling it up to 100 people, real world use cases, changing requirements, and feature requests as you attempt to reach for 150-200 algorithms in d3. From my prospective your generating a very very **very** long wishlist on pure math theory that will not survive contact with d as a buggy compiler and templates that merely pretends to be context-free; instead of implementing and feeling it out what the compiler actually is; redoing the foundational pieces to be simple and strong as possible. |
September 05 Re: Requirements for Slicing Ranges | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On Monday, 2 September 2024 at 02:47:16 UTC, Jonathan M Davis wrote: > One thing that I've been thinking about with regards to improving the range API which I forgot to discuss previously is the requirements for hasSlicing. Right now, for whatever reason, slicing is divorced from random-access ranges. In practice, random-access ranges have slicing (unless the programmer just didn't bother to implement it), and other ranges don't, but currently, hasSlicing is a separate trait that can theoretically pass with a forward range, and it's not guaranteed that a random-access range has slicing. So, my question is: > > 1. Are there any realistic situations where a range could be a forward or bidirectional range _and_ implement slicing in O(1), but it _can't_ actually be implemented as a random-access range? So, it would need to be possible to take slices of elements of such a range by index in O(1) and yet somehow not be possible to access individual elements by index in O(1). I can answer this direction immediately and formally. If a range has O(1) slicing, it has O(1) indexing, even if an opIndex isn't implemented: If I want range[i], I can replace it by either range[i .. i + 1].front or range[i .. $].front. It's guaranteed to exist by virtue of the assumption that slicing exists and the slicing operation returns a range, and every range has a front. > 2. Are there any realistic situations where a range could be a random-access > range but _can't_ implement slicing in O(1)? So, it would need to be > possible to access individual elements by index in O(1) and yet somehow not > be possible to take slices of elements by index in O(1). For the other direction, i.e. has indexing, but you want slicing, I think you're really in the practical uses category. Maybe a sparse dynamic array using a splay tree as a backing structure is a counterexample to your idea. A splay tree is an established search tree that optimizes repeated adjacent accesses of the same element, i.e. if you seek x (average O(log n), worst case O(n)) and then x again, the second lookup (and any third, fourth, ...) will be O(1) guaranteed. I don't see how this thing can implement slicing. But, again, it may not be an actual counterexample. I also remember that Judy arrays are supposedly really difficult to implement. Maybe they're a counterexample. I really don't know much about them. I looked some stuff up, but that wasn't enough to judge whether they can implement slicing or not. |
Copyright © 1999-2021 by the D Language Foundation