Jonathan M Davis
| On Saturday, May 18, 2024 2:48:51 PM MDT Dukc via Digitalmars-d wrote:
> > 6. The range API does not currently specify what happens if you copy front and then call popFront. For the vast majority of code, it is both assumed that you can copy front and that the copy of front will be unaffected by calling popFront. However, there _are_ ranges (such as std.stdio's ByLine) which will do things like use a dynamic array as a buffer which has its elements overwritten in popFront, resulting in the elements of the copy being mutated, since it's a slice of the same data. Such ranges are sometimes referred to as having a transient front, and they do not work with a lot of range-based code. As such, they either need to be disallowed, or we need a way to detect such behavior so that range-based code can check for it.
>
> I think this one is beyond the scope of the range spec. Whether the buffer of `ByLine` gets copied on element copy is an issue about the element of the range, not about the range itself. If this needs fixing, the element type of `byLine` needs to be changed, not the range spec.
>
> Otherwise `front` would need to always deep copy the element on each invocation, which would make any range of ranges horribly inefficient!
It is something that has proven to be a problem. Many range-based functions will behave incorrectly if they copy front and then popFront mutates the copy. The range API needs to lock down the semantics involved for range-based code to actually be able to rely on them. Any range-based code which makes assumptions which aren't guaranteed by the range API risks breaking with types that do not follow those assumptions, and that's something that has been causing us problems for years, because not enough was locked down. And the possibility of front being transient is one of those issues that was missed when the range API was originally designed.
If deep-copying elements is an issue, then pointers or reference types should be used - or the elements should be non-copyable. We are improving the support for non-copyable types as part of this (whereas right now, most range-based code assumes that the elements are copyable), but experience has shown that allowing popFront to mutate the copy of front causes problems, and the rare case where something like that is actually desirable has alternate solutions.
In the case of byLine, it can use non-copyable elements, or it can use opApply, or it could use some sort of reference-counted type so that it can detect whether there are currently any copies which would be affected if the buffer was reused. There is no reason why it has to use its current solution.
Ultimately though, IMHO, it would be better for such a type to not even be a range than to make it so that range-based code has to worry about this issue. Experience has shown that it causes problems for most range-based code, and it really doesn't add much value.
> > 9. const makes ranges useless. Realistically, a const range is not a
> > range,
> > because you can't call popFront, and you can't iterate through it
> > (technically, it _is_ possible to have popFront be const in cases where
> > the
> > iteration state doesn't actually live in the range, and popFront isn't
> > pure, but normal ranges do not work that way).
>
> In principle it goes like: elements being const is a non-issue, it's just another element type. Range itself can be const, but has to be cast to non-const with const elements to be actually iterable.
Casting away const and then mutating the result violates D's type system. The only cases where it doesn't is when the cast actually copies the type. So, as soon as you have something like a range of const ranges, you're screwed.
> In practice, no way! With some care, you can write a range that copes with const elements, but even that is surprisingly thorny since a struct having a const field runs into all sorts of issues.
>
> But just try designing a range that can be cast between const and tail const! Bar for some niche ranges like TakeNone, it would be extremely hard if not impossible with the current language.
Fixing the issue of tail-const realistically requires a language DIP, and
Walter's DIP should make it so that a const range will be able to be
converted to a range of const elements, whereas right now, that only works
with dynamic arrays, because technically, const(Range!Foo) and
Range!(const Foo) are completely different and unrelated types as far as the
type system is concerned (technically, they could even have completely
different implementations in spite of the fact that they come from the same
template).
> > 10. While it's not exactly a problem with the range API, it would be nice if we didn't have to import std.range / std.range.primitives (or the Phobos v3 equivalent) to use arrays as ranges.
>
> Respectful disagree on this one. Ranges are (bar for the foreach range API) a Phobos concept, not a language concept. It's good to be able to pick whether functions in your module will digest ranges, and if, whether it's V2, V3 or some other API. I wish for no changes here.
Well, Walter has specifically requested that we fix it so that the import is no longer required, and a lot of us would love to get rid of it. Either way, this will have no effect on the v2 API, because the functions for the v3 API will be different. And since they only affect ranges, the only code that it would potentially cause problems for is code that wants to name one or more free functions which take dynamic arrays with exactly the same names.
Anyone who wants to create their own range API will still be free to do so.
> > Overview of Proposed Changes
> > ----------------------------
> >
> > 3. If a range is finite (that is, it's not statically known to be empty),
> > and it can be default-initialized, then its init value must be empty. If
> > it
> > cannot be default-initialized, then it must define a function called
> > emptyRange which returns an empty range of the same type. Phobos will then
> > define a function called emptyRange which takes a finite range which can
> > be
> > default-initialized and returns the init value of that range.
>
> I suggest a slight relaxation. A finite range must have an empty init
> value if it does not provide an explicit `emptyRange`. That is, any
> range, finite or infinite, may have any valid `.init` value, as long as
> it's `.emptyRange` (whether explicit or Phobos-provided) is actually empty.
That means that you cannot rely on a default-initialized finite range being valid and empty, which is part of the whole point here. Having that requirement will reduce bugs, and the lack of that requirement has caused bugs, particularly since plenty of folks end up assuming that the init value is empty even when there is no such guarantee. Right now, there isn't even a guarantee that it's valid, let alone empty.
At least when a range cannot be default-initialized, you get an error when you attempt instead of just getting buggy behavior. Requiring emptyRange in that case then allows us a way to get an empty range in spite of the fact that we can't rely on the range being default-initializable, but by requiring that finite ranges that can be default-initialized have an init value that's both valid and empty will prevent bugs due to folks assuming that the range is valid and empty like has already been happening. And since the vast majority of ranges can be default-initialized, emptyRange will only need to be declared in rare cases, and the free function, emptyRange, will take care of the cases where the range can be default-initialized.
> > 4. All ranges must be either dynamic arrays or structs (and not pointers
> > to
> > structs either).
>
> Maybe unions too? Not that it's often needed.
Unions can't have functions, so they couldn't directly be ranges anyway. But even if they could, passing around unions is just begging for trouble. They don't have postblit or copy constructors even when their members would need it (since the union doesn't keep track of which of its members is the valid one), and for code to work properly, it really needs to access whichever the currently valid member of the union is anyway. Trying to do anything with unions and ranges would just be begging for bugs.
> > 11. Finite random-access ranges are required to implement opDollar, and their opIndex must work with $. Similarly, any ranges which implement slicing must implement opDollar, and slicing must work with $.
> >
> > In most cases, this will just be an alias to length, but barring a
> > language
> > change that automatically treats length as opDollar (which has been
> > discussed before but has never materialized and is somewhat controversial
> > given types where it wouldn't make sense to treat length as opDollar), we
> > have to require that opDollar be defined, or generic code won't be able to
> > use $ with indexing or slicing. We probably would have required it years
> > ago except that it would have broken code to add the requirement.
>
> Why would any language changes be needed for that? Can't `.length` be a Phobos function for types that have `opDollar` but not their own `.length`?
Right now, generic code canot use $ with ranges, because the range API does not require that it be there, and there are many, many ranges which do not implement it even if they're random-access and/or have slicing. So, generic code cannot do something like
auto slice = range[5 .. $ - 1];
or
auto value = range[$ - 7];
For that to work, we have to actually require that $ work properly for ranges with slicing and/or random-access. Right now, you're forced to do stuff like
auto slice = range[5 .. range.length - 1];
or
auto value = range[range.length - 7];
which is far more verbose and does not play well with converting code that operates on dynamic arrays to code that operates on random-access ranges instead. hasSlicing and isRandomAccessRange likely would have reqired opDollar years ago, but it wasn't there initially, and adding it later would have broken too much code.
With the current language, for $ to work with a user-defined type for either indexing or slicing, that type must implement opDollar, because that's what the language replaces $ with. There is no other way to have $.
It has been suggested in the past that we make it so that the language treats length as opDollar if it's present and opDollar is not present, which would make it so that all random-access ranges automatically got a working $. However, it has also been brought up that there are types which have both indexing and length but where it would be inappropriate for length to be treated as opDollar - the prime example being hash tables / maps where $ wouldn't make any sense at all.
So, barring a language change, the most straightforward way for a finite range to implement opDollar is just to alias length to opDollar, and finite ranges without length shouldn't have opDollar anyway. But infinite ranges will have to define their own opDollar regardless, since they need one that's a type that just marks the end when slicing, and it makes no sense for their opDollar to be size_t, since size_t is not infinite.
The proposal here is basically just to require opDollar with slicing and indexing like we should have originally - and would have if it hadn't been forgotten.
- Jonathan M Davis
|