Jonathan M Davis
| On Saturday, May 18, 2024 8:26:18 AM MDT H. S. Teoh via Digitalmars-d wrote:
> On Thu, May 16, 2024 at 08:56:55AM -0600, Jonathan M Davis via Digitalmars-d wrote: [...]
>
> > 1. The easy one is that the range API functions for dynamic arrays will
> > not
> > treat arrays of characters as special. A dynamic array of char will be a
> > range of char, and a dynamic array of wchar will be a range of wchar.
> >
> > Any code that needs to decode will need to use the phobos v3 replacement for std.utf's decode or decodeFront - or use foreach - to decode the code units to code points (and if it needs to switch encodings, then there will be whatever versions of byUTF, byChar, etc. that the replacement for std.utf will have).
>
> I thought we already have this? std.string.byRepresentation, std.uni.byCodePoint, std.uni.byGrapheme already fill this need.
Yes, but Phobos v3 will need its own solutions for those - even if it's simply porting over the existing ones and making them work with the new range API - because nothing in Phobos v3 is going to depend on Phobos v2 at all. Most of Phobos v3 is likely to be a cleaned up version of what's in Phobos v2 rather than something completely different (though in some cases, there will be more drastic changes), but Phobos v3 is intended to ultimately replace Phobos v2 so that you won't need to use it for anything any longer (much as v2 will stick around long term so that old code continues to compile).
The point here is that because auto-decoding will be gone, you'll need to explicitly call the functions for decoding if you want decoding, and if you want graphemes or some other conversion, you'll need to call the appropriate functions for that. The new range API itself does not deal with them at all. We do already have such functions for Phobos v2, but anyone using the range API from Phobos v3 will be calling the Phobos v3 versions of those functions.
>
> [...]
>
> > However, with infinite ranges, there is no such solution. If they cannot be default-initialized, then they either can't be ranges, or they would have to be finite ranges which would just never be empty if they're constructed at runtime (while doing something like the flag trick to make their init value empty). And it's certainly true that the range API doesn't (and can't) guarantee that finite ranges are truly finite, but it's still better if we can define infinite ranges that need to be constructed at runtime as infinite ranges, since then we can get the normal benefits that come from statically knowing that a range is infinite.
>
> Infinite ranges also have the peculiarity that slicing may create a finite range, i.e., the underlying type changes. That's another wrinkle to deal with.
Yes, though I think that the decision that I've gone with of making it so that you're allowed to have finite and infinite ranges disable default initialization while requiring that their init value be valid if they don't disable it (and further requiring that init be empty for finite ranges) avoids any real complications with slicing.
Any infinite range with slicing will have to define a finite range that follows all of the rules for finite ranges, which would include the requirements for initialization, but since the programmer has the full freedom to implement it like they would any other finite range, I don't think there's really much different about such ranges. Worst case, it's similar to a finite range that needs be constructed at runtime to really function properly (since those can still support default initialization by adding a flag to indicate whether they were default-initialized or not and then just have empty be true if they were).
Now, if we were to require that all ranges be default-initializable, since some infinite ranges can't work that way, those ranges would then have to be implemented as finite ranges that were never actually empty, and since slicing has to return the same type for finite ranges, they would end up with a more complex implementation, because the range would presumably have to check internally whether it was the "infinite" version or the truly finite version in portions of its code. So, that approach would be uglier because of that, but it could still be implemented.
If we required that finite ranges be default-initializable but allowed infinite ranges to disable default initialization, then an infinite range with slicing which disabled default initatialization _would_ have to be able to return a finite range which could be default-initialized, and that would then require doing something like using a Nullable to contain the infinite range that it would presumably be wrapping so that the finite range could be default-initialized (as well as being able to determine that it was empty when the infinite range hadn't been initialized). So, that would also be more complicated.
However, in both of those cases, it _would_ still be possible to correctly implement the finite range. It would just be annoying, whereas in the first case, it would be impossible to implement the infinite range as an infinite range if it can't be default-initialized and work correctly. So, that's definitely the larger issue.
In any case, going with the rule where we allow ranges to disable default initialization but require that their init value be valid if they can be default-initialized (and require that the init value be empty for finite ranges) avoids all of those complications. It does have the downside of generic range-based code having to deal with the possibility of ranges not being default-initializable, but that is a general problem with generic code rather than a range-specific one, and the solutions for it are not range-specific.
>
> [...]
>
> > 4. All ranges must be either dynamic arrays or structs (and not
> > pointers to structs either).
>
> This is not necessarily an ideal solution. In my own code I've often had to iterate over forward ranges via sub functions, where I expect the iteration state after returning from the sub function to be retained. There are two ways to do this:
>
> 1) Have two versions of every iteration function, one taking the range by value (with implicit saving of current iteration state), the other taking the range by reference (retain changes to iteration state upon return), which leads to a lot of code duplication; or:
>
> 2) Have a single version of each function and pass the range by reference, usually by passing a pointer to it (since the current API would transparently treat the pointer as the range itself).
>
> Prohibiting pointers eliminates option (2), and leaves me with the
> non-ideal situation (1) where I need lots of code duplication.
>
> Although, come to think of it, we could have a .byRef range wrapper that encapsulates a pointer to the range so that changes to iteration state would be preserved. But then it begs the question, why not just allow pointers in the first place? Why require jumping through extra hoops?
The short answer to this is that we cannot allow forward ranges to be reference types and get rid of save, and the fact that save has been part of the range API has proven to be a big mistake. Along with the removal auto-decoding, it's the #1 change that we've talked about wanting to do for years now. I wrote a really long reply to this before explaining a bunch of details about why locking down the copy semantics matter and deleted it, because it was getting way too long, but ultimately, the fact of the matter is that if we want to get rid of save, we have no choice but to require that copying forward ranges results in an independent copy. If you can make that work with some sort of wrapper type without violating the range API, then have at it (though my experience with RefRange is that it was a big mistake on my part and that attempting such semantics is too error-prone). Either way, save needs to go.
And the reality of the matter is that any code that currently passes a range to a function by value and then does _anything_ with that range after that is relying on the copy semantics of that particular range type as well as the current implementation of the function being called, since generic code cannot rely on the copy semantics of ranges, and if the function was going to make any promises about the state of the range afterwards, it would have taken it by ref. So, unless you control all of the code in question, you're relying on unspecified behavior and are just lucky that it happens to work right now. And if you are in full control of the code, then you can do whatever you want regardless of what the range API says, though Phobos functions will be statically enforcing as much of the range API as they can, so it's true you won't be able to do stuff like use a reference type range with any Phobos code.
But in order get rid of save and lock down the copy and assignment semantics of ranges so that generic code can actually rely on those semantics, reference type ranges have to be banned - or they'll have to be non-copyable ranges. We _could_ make it so that basic input ranges are required to be reference types instead of non-copyable, but that creates a different set of issues, and it wouldn't help you if you want forward ranges which are reference types. Those simply aren't possible if we get rid of save unless we force all forward ranges to be reference types, which would mean having to wrap arrays to make them be ranges as well as significantly increasing how much D code would need to allocate on the heap.
> > 11. Finite random-access ranges are required to implement opDollar, and their opIndex must work with $. Similarly, any ranges which implement slicing must implement opDollar, and slicing must work with $.
> >
> > In most cases, this will just be an alias to length, but barring a language change that automatically treats length as opDollar (which has been discussed before but has never materialized and is somewhat controversial given types where it wouldn't make sense to treat length as opDollar), we have to require that opDollar be defined, or generic code won't be able to use $ with indexing or slicing. We probably would have required it years ago except that it would have broken code to add the requirement.
>
> [...]
>
> Will this also require implementing arithmetic operators on the return
> type of opDollar? Otherwise things like r[0 .. $-1] still wouldn't
> work correctly. Or r[0 .. complicatedMathFunc(($-1)/2)].
I have a detailed reply on this topic in my reply to Steven, but the short answer is that yes, they'll be required for opDollar with finite ranges (they make no sense for infinite ranges), but it's probably going to make sense to just require that opDollar evaluate to size_t, since length already has to be size_t, and I don't think that we want to change that.
- Jonathan M Davis
|