May 17
On 17/05/2024 6:40 PM, Jonathan M Davis wrote:
> All in all, I do not understand why you think that wrapping arrays improves
> anything. From what I can see, it's just purely worse.

Given the problems listed it is a possible solution.

That doesn't mean it's better. But it is a common enough solution in PL design space that it is worth considering.

For stuff like this I want to evaluate ideas not just to get to the first good one that could work, but also to make sure it actually is the best candidate.
May 17
On Friday, May 17, 2024 12:53:07 AM MDT Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:
> On 17/05/2024 6:40 PM, Jonathan M Davis wrote:
> > All in all, I do not understand why you think that wrapping arrays
> > improves
> > anything. From what I can see, it's just purely worse.
>
> Given the problems listed it is a possible solution.
>
> That doesn't mean it's better. But it is a common enough solution in PL design space that it is worth considering.
>
> For stuff like this I want to evaluate ideas not just to get to the first good one that could work, but also to make sure it actually is the best candidate.

I have no problem with it being presented and evaluated, but I don't see any reason why it would be better. It seems like it's just worse.

- Jonathan M Davis



May 17

On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:

>

Alternatively, we could add an enum of some kind to the new range API to make it different. It would be kind of ugly, but it would allow us to keep all of the current function names

Actually, this option worth exploring.

We can see now that implicit interfaces have downsides. When the interface changes, we are forced to jump through hoops to avoid problems. And what would will do if at some point in future we want to change API again? Come up with another set of names? Versioning would solve this once and for all.

If you think that enum is ugly, we can use an UDA instead. This way, ranges that use the new API will be instantly recognizable.

May 17
On Fri, May 17, 2024 at 01:22:48PM +0000, Ogi via Digitalmars-d wrote:
> On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
> > Alternatively, we could add an enum of some kind to the new range API to make it different. It would be kind of ugly, but it would allow us to keep all of the current function names
> 
> Actually, this option worth exploring.
> 
> We can see now that implicit interfaces have downsides. When the interface changes, we are forced to jump through hoops to avoid problems. And what would will do if at some point in future we want to change API again? Come up with another set of names? Versioning would solve this once and for all.
> 
> If you think that enum is ugly, we can use an UDA instead. This way, ranges that use the new API will be instantly recognizable.
[...]

+1, I like the UDA idea.  It will be much less effort to migrate to the new range API to slap a UDA on, than to rename every range method.


T

-- 
Bare foot: (n.) A device for locating thumb tacks on the floor.
May 17
On Friday, May 17, 2024 8:30:47 AM MDT H. S. Teoh via Digitalmars-d wrote:
> On Fri, May 17, 2024 at 01:22:48PM +0000, Ogi via Digitalmars-d wrote:
> > On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
> > > Alternatively, we could add an enum of some kind to the new range API to make it different. It would be kind of ugly, but it would allow us to keep all of the current function names
> >
> > Actually, this option worth exploring.
> >
> > We can see now that implicit interfaces have downsides. When the interface changes, we are forced to jump through hoops to avoid problems. And what would will do if at some point in future we want to change API again? Come up with another set of names? Versioning would solve this once and for all.
> >
> > If you think that enum is ugly, we can use an UDA instead. This way, ranges that use the new API will be instantly recognizable.
>
> [...]
>
> +1, I like the UDA idea.  It will be much less effort to migrate to the new range API to slap a UDA on, than to rename every range method.

It wouldn't fix the import problem. Realistically, it's not going to work to retain the same function names and have arrays work as ranges without needing an explicit import. Adding the functions to object.d breaks pretty much all range-based code in existence due to symbol conflicts, and due to an issue with how selective imports work, you can't even use selective imports with std.range to fix it; you have to use selective imports with std.range.primitives - and of course, you'd have to use selective imports for every range API function, which would be tedious. Using new function names makes it so that we don't have any symbol conflicts - in addition to allowing us to statically distinguish between the old and new APIs. It would also make it easy to see at a glance which API a piece of code is using.

Also, having the main difference that can be statically checked between the old and new API be just a UDA makes it much more likely that code will accidentally be set up for the wrong range API version, because someone forgot the UDA, and it will make it much riskier to try to write code that's overloaded on the old and new APIs. It would not surprise me if a number of library writers will want to make it so that their code works with both APIs by creating overloads that call the new Phobos functions to wrap an old range so that it works with the new API.

Some ranges will be easily distinguishable due to a lack of save, but if the code is written to work on basic input ranges (or it overloads on the category of the range), then if the primary difference is whether the programmer remembered to put a UDA on the type or not, the odds are pretty high that the wrong overload will be used some percentage of the time. Whether that outright makes the code behave incorrectly or just degrades its performance is going to be highly dependent on what it's doing, but I'm afraid that if the main difference between the old and new APIs is a UDA, it's going to be missed in many cases and cause issues, whereas a different set of function names makes the difference crystal clear and completely obviates the risk of a range from one version of the API being mistaken as a range from the other version.

And maybe it won't be at all common to try to overload code based on which range API it's using, but I don't think that there's much question that it will be less error-prone if the two APIs are distinct.

But regardless of how risky it is to have the primary difference simply be a UDA, I don't see how the import problem can be solved without changing the names. And Walter has explicitly requested that we fix it so that we no longer need to import anything for arrays to be treated as ranges.

We could force them to not be ranges at all and use wrapper types like Rikki is suggesting, but I can't imagine that Walter would be any happier about that, and having wrappers like that typically causes issues (especially with type introspection and any code that actually needs to operate on dynamic arrays), whereas the primary issues with having new names is that you have to remember to use the new ones when writing code that uses the new API, and when updating code, you'll have to do a search and replace.

All in all, while renaming the functions is annoying, it definitely will solve some of the issues that we have, and I don't expect that it will ultimately be a big issue. And the more I thought about it while working out the proposal, the more I came to the conclusion that having the difference between the two API versions be visible at a glance would reduce the number of problems that we'd have. It makes the introspection situation cleaner, it makes it much more likely that making a mistake will result in a compilation error, and it makes it easier to understand what's going on when you read the code instead of having to dig around to figure out which version of the API is actually being used. And it allows us to get rid of needing to import anything to use arrays as ranges, which I don't think that we can actually do with the current function names.

- Jonathan M Davis



May 18
Jonathan M Davis kirjoitti 16.5.2024 klo 17.56:> This is a proposal for an updated input range API for Phobos v3.

Good effort writing all this up! I certainly like most of this. However, some questions / proposals I do have.

> 
> Some of the Problems with Input Ranges
> --------------------------------------
> 
> 6. The range API does not currently specify what happens if you copy front
> and then call popFront. For the vast majority of code, it is both assumed
> that you can copy front and that the copy of front will be unaffected by
> calling popFront. However, there _are_ ranges (such as std.stdio's ByLine)
> which will do things like use a dynamic array as a buffer which has its
> elements overwritten in popFront, resulting in the elements of the copy
> being mutated, since it's a slice of the same data. Such ranges are
> sometimes referred to as having a transient front, and they do not work with
> a lot of range-based code. As such, they either need to be disallowed, or we
> need a way to detect such behavior so that range-based code can check for
> it.

I think this one is beyond the scope of the range spec. Whether the buffer of `ByLine` gets copied on element copy is an issue about the element of the range, not about the range itself. If this needs fixing, the element type of `byLine` needs to be changed, not the range spec.

Otherwise `front` would need to always deep copy the element on each invocation, which would make any range of ranges horribly inefficient!
> 
> 9. const makes ranges useless. Realistically, a const range is not a range,
> because you can't call popFront, and you can't iterate through it
> (technically, it _is_ possible to have popFront be const in cases where the
> iteration state doesn't actually live in the range, and popFront isn't pure,
> but normal ranges do not work that way).

In principle it goes like: elements being const is a non-issue, it's just another element type. Range itself can be const, but has to be cast to non-const with const elements to be actually iterable.

In practice, no way! With some care, you can write a range that copes with const elements, but even that is surprisingly thorny since a struct having a const field runs into all sorts of issues.

But just try designing a range that can be cast between const and tail const! Bar for some niche ranges like TakeNone, it would be extremely hard if not impossible with the current language.

> 
> 10. While it's not exactly a problem with the range API, it would be nice if
> we didn't have to import std.range / std.range.primitives (or the Phobos v3
> equivalent) to use arrays as ranges.

Respectful disagree on this one. Ranges are (bar for the foreach range API) a Phobos concept, not a language concept. It's good to be able to pick whether functions in your module will digest ranges, and if, whether it's V2, V3 or some other API. I wish for no changes here.


> 
> Overview of Proposed Changes
> ----------------------------
> 
> 3. If a range is finite (that is, it's not statically known to be empty),
> and it can be default-initialized, then its init value must be empty. If it
> cannot be default-initialized, then it must define a function called
> emptyRange which returns an empty range of the same type. Phobos will then
> define a function called emptyRange which takes a finite range which can be
> default-initialized and returns the init value of that range.

I suggest a slight relaxation. A finite range must have an empty init value if it does not provide an explicit `emptyRange`. That is, any range, finite or infinite, may have any valid `.init` value, as long as it's `.emptyRange` (whether explicit or Phobos-provided) is actually empty.


> 4. All ranges must be either dynamic arrays or structs (and not pointers to
> structs either).

Maybe unions too? Not that it's often needed.


> 9. In order to deal with transient front, we will explicitly disallow it. It
> will be a requirement that if front is copied, the copy will be unaffected
> by popFront. Of course, this doesn't prevent other references to the same
> data from mutating it (e.g. if it's a reference type), but calling popFront
> will not mutate it.

I'd rather have this one as a recommendation as opposed to something other ranges are allowed to assume. That would probably turn some current uses of `recurrence` or `generate` to unspecified behaviour, silently. In the case of `ByLine`, I agree the proposed design is superior but I wouldn't want to declare the present design illegal either.

> 
> 11. Finite random-access ranges are required to implement opDollar, and
> their opIndex must work with $. Similarly, any ranges which implement
> slicing must implement opDollar, and slicing must work with $.
> 
> In most cases, this will just be an alias to length, but barring a language
> change that automatically treats length as opDollar (which has been
> discussed before but has never materialized and is somewhat controversial
> given types where it wouldn't make sense to treat length as opDollar), we
> have to require that opDollar be defined, or generic code won't be able to
> use $ with indexing or slicing. We probably would have required it years ago
> except that it would have broken code to add the requirement.

Why would any language changes be needed for that? Can't `.length` be a Phobos function for types that have `opDollar` but not their own `.length`?


> Of course, there will be other minor changes (e.g. I'll probably rename
> isForwardRange to isCopyableRange),

I would have somewhat high bar for renaming that but leaving the other range names alone. I understand the names input range, forward range, bidirectional range and random access range all come from C++ iterators that are named exactly the same. So maybe best to either stick with the standard we follow, or break away in full and reconsider the names of all of them.

May 18
On Saturday, May 18, 2024 2:48:51 PM MDT Dukc via Digitalmars-d wrote:
> > 6. The range API does not currently specify what happens if you copy front and then call popFront. For the vast majority of code, it is both assumed that you can copy front and that the copy of front will be unaffected by calling popFront. However, there _are_ ranges (such as std.stdio's ByLine) which will do things like use a dynamic array as a buffer which has its elements overwritten in popFront, resulting in the elements of the copy being mutated, since it's a slice of the same data. Such ranges are sometimes referred to as having a transient front, and they do not work with a lot of range-based code. As such, they either need to be disallowed, or we need a way to detect such behavior so that range-based code can check for it.
>
> I think this one is beyond the scope of the range spec. Whether the buffer of `ByLine` gets copied on element copy is an issue about the element of the range, not about the range itself. If this needs fixing, the element type of `byLine` needs to be changed, not the range spec.
>
> Otherwise `front` would need to always deep copy the element on each invocation, which would make any range of ranges horribly inefficient!

It is something that has proven to be a problem. Many range-based functions will behave incorrectly if they copy front and then popFront mutates the copy. The range API needs to lock down the semantics involved for range-based code to actually be able to rely on them. Any range-based code which makes assumptions which aren't guaranteed by the range API risks breaking with types that do not follow those assumptions, and that's something that has been causing us problems for years, because not enough was locked down. And the possibility of front being transient is one of those issues that was missed when the range API was originally designed.

If deep-copying elements is an issue, then pointers or reference types should be used - or the elements should be non-copyable. We are improving the support for non-copyable types as part of this (whereas right now, most range-based code assumes that the elements are copyable), but experience has shown that allowing popFront to mutate the copy of front causes problems, and the rare case where something like that is actually desirable has alternate solutions.

In the case of byLine, it can use non-copyable elements, or it can use opApply, or it could use some sort of reference-counted type so that it can detect whether there are currently any copies which would be affected if the buffer was reused. There is no reason why it has to use its current solution.

Ultimately though, IMHO, it would be better for such a type to not even be a range than to make it so that range-based code has to worry about this issue. Experience has shown that it causes problems for most range-based code, and it really doesn't add much value.

> > 9. const makes ranges useless. Realistically, a const range is not a
> > range,
> > because you can't call popFront, and you can't iterate through it
> > (technically, it _is_ possible to have popFront be const in cases where
> > the
> > iteration state doesn't actually live in the range, and popFront isn't
> > pure, but normal ranges do not work that way).
>
> In principle it goes like: elements being const is a non-issue, it's just another element type. Range itself can be const, but has to be cast to non-const with const elements to be actually iterable.

Casting away const and then mutating the result violates D's type system. The only cases where it doesn't is when the cast actually copies the type. So, as soon as you have something like a range of const ranges, you're screwed.

> In practice, no way! With some care, you can write a range that copes with const elements, but even that is surprisingly thorny since a struct having a const field runs into all sorts of issues.
>
> But just try designing a range that can be cast between const and tail const! Bar for some niche ranges like TakeNone, it would be extremely hard if not impossible with the current language.

Fixing the issue of tail-const realistically requires a language DIP, and
Walter's DIP should make it so that a const range will be able to be
converted to a range of const elements, whereas right now, that only works
with dynamic arrays, because technically, const(Range!Foo) and
Range!(const Foo) are completely different and unrelated types as far as the
type system is concerned (technically, they could even have completely
different implementations in spite of the fact that they come from the same
template).

> > 10. While it's not exactly a problem with the range API, it would be nice if we didn't have to import std.range / std.range.primitives (or the Phobos v3 equivalent) to use arrays as ranges.
>
> Respectful disagree on this one. Ranges are (bar for the foreach range API) a Phobos concept, not a language concept. It's good to be able to pick whether functions in your module will digest ranges, and if, whether it's V2, V3 or some other API. I wish for no changes here.

Well, Walter has specifically requested that we fix it so that the import is no longer required, and a lot of us would love to get rid of it. Either way, this will have no effect on the v2 API, because the functions for the v3 API will be different. And since they only affect ranges, the only code that it would potentially cause problems for is code that wants to name one or more free functions which take dynamic arrays with exactly the same names.

Anyone who wants to create their own range API will still be free to do so.

> > Overview of Proposed Changes
> > ----------------------------
> >
> > 3. If a range is finite (that is, it's not statically known to be empty),
> > and it can be default-initialized, then its init value must be empty. If
> > it
> > cannot be default-initialized, then it must define a function called
> > emptyRange which returns an empty range of the same type. Phobos will then
> > define a function called emptyRange which takes a finite range which can
> > be
> > default-initialized and returns the init value of that range.
>
> I suggest a slight relaxation. A finite range must have an empty init
> value if it does not provide an explicit `emptyRange`. That is, any
> range, finite or infinite, may have any valid `.init` value, as long as
> it's `.emptyRange` (whether explicit or Phobos-provided) is actually empty.

That means that you cannot rely on a default-initialized finite range being valid and empty, which is part of the whole point here. Having that requirement will reduce bugs, and the lack of that requirement has caused bugs, particularly since plenty of folks end up assuming that the init value is empty even when there is no such guarantee. Right now, there isn't even a guarantee that it's valid, let alone empty.

At least when a range cannot be default-initialized, you get an error when you attempt instead of just getting buggy behavior. Requiring emptyRange in that case then allows us a way to get an empty range in spite of the fact that we can't rely on the range being default-initializable, but by requiring that finite ranges that can be default-initialized have an init value that's both valid and empty will prevent bugs due to folks assuming that the range is valid and empty like has already been happening. And since the vast majority of ranges can be default-initialized, emptyRange will only need to be declared in rare cases, and the free function, emptyRange, will take care of the cases where the range can be default-initialized.

> > 4. All ranges must be either dynamic arrays or structs (and not pointers
> > to
> > structs either).
>
> Maybe unions too? Not that it's often needed.

Unions can't have functions, so they couldn't directly be ranges anyway. But even if they could, passing around unions is just begging for trouble. They don't have postblit or copy constructors even when their members would need it (since the union doesn't keep track of which of its members is the valid one), and for code to work properly, it really needs to access whichever the currently valid member of the union is anyway. Trying to do anything with unions and ranges would just be begging for bugs.

> > 11. Finite random-access ranges are required to implement opDollar, and their opIndex must work with $. Similarly, any ranges which implement slicing must implement opDollar, and slicing must work with $.
> >
> > In most cases, this will just be an alias to length, but barring a
> > language
> > change that automatically treats length as opDollar (which has been
> > discussed before but has never materialized and is somewhat controversial
> > given types where it wouldn't make sense to treat length as opDollar), we
> > have to require that opDollar be defined, or generic code won't be able to
> > use $ with indexing or slicing. We probably would have required it years
> > ago except that it would have broken code to add the requirement.
>
> Why would any language changes be needed for that? Can't `.length` be a Phobos function for types that have `opDollar` but not their own `.length`?

Right now, generic code canot use $ with ranges, because the range API does not require that it be there, and there are many, many ranges which do not implement it even if they're random-access and/or have slicing. So, generic code cannot do something like

    auto slice = range[5 .. $ - 1];

or

    auto value = range[$ - 7];

For that to work, we have to actually require that $ work properly for ranges with slicing and/or random-access. Right now, you're forced to do stuff like

    auto slice = range[5 .. range.length - 1];

or

    auto value = range[range.length - 7];

which is far more verbose and does not play well with converting code that operates on dynamic arrays to code that operates on random-access ranges instead. hasSlicing and isRandomAccessRange likely would have reqired opDollar years ago, but it wasn't there initially, and adding it later would have broken too much code.

With the current language, for $ to work with a user-defined type for either indexing or slicing, that type must implement opDollar, because that's what the language replaces $ with. There is no other way to have $.

It has been suggested in the past that we make it so that the language treats length as opDollar if it's present and opDollar is not present, which would make it so that all random-access ranges automatically got a working $. However, it has also been brought up that there are types which have both indexing and length but where it would be inappropriate for length to be treated as opDollar - the prime example being hash tables / maps where $ wouldn't make any sense at all.

So, barring a language change, the most straightforward way for a finite range to implement opDollar is just to alias length to opDollar, and finite ranges without length shouldn't have opDollar anyway. But infinite ranges will have to define their own opDollar regardless, since they need one that's a type that just marks the end when slicing, and it makes no sense for their opDollar to be size_t, since size_t is not infinite.

The proposal here is basically just to require opDollar with slicing and indexing like we should have originally - and would have if it hadn't been forgotten.

- Jonathan M Davis



May 27

On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:

>
  1. Forward ranges will no longer have save. Rather, it will be required that copying a forward range will result in an independent copy. Both the original and the copy must then have the same elements in the same order, and iterating one will not affect the other. So, in essence, copying a forward range must be equivalent to calling save (though the actual implementation could be more clever than that).

So something I just thought of as a drawback here -- save gives a nice explicit mechanism to copy when passing to a function that takes a value by auto ref. In other words it forces non-ref semantics so you do not affect the original.

Since passing to an auto ref function lets the compiler decide to not make a copy, code that currently uses save for this purpose doesn't have an equivalent in the new regime.

If we aren't going to have it, we might want to add a specialized do-nothing UFCS function which just takes an lvalue and turns it into an rvalue. Maybe we just call it save! Or maybe just a boring asRvalue...

-Steve

June 04

On Saturday, 18 May 2024 at 20:48:51 UTC, Dukc wrote:

>

Jonathan M Davis kirjoitti 16.5.2024 klo 17.56:> This is a

>
  1. While it's not exactly a problem with the range API, it would be nice if
    we didn't have to import std.range / std.range.primitives (or the Phobos v3
    equivalent) to use arrays as ranges.

Respectful disagree on this one. Ranges are (bar for the foreach range API) a Phobos concept, not a language concept. It's good to be able to pick whether functions in your module will digest ranges, and if, whether it's V2, V3 or some other API. I wish for no changes here.

The range API is baked into the language through foreach semantics. It’s not a Phobos feature, it’s a core language feature.

July 03

On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:

>

[…]

Maybe good new names could be empty, front, and opPopFront for forward ranges, and empty and opFrontThenPop (or something).

Using op… makes sense as those functions get used by core-language constructs such as foreach. The reason empty and front aren’t op… is because those would often be called by programmers directly, and people would just define an alias anyway. One op… function suffices for a new interface.

As a matter of fact, ++range could lower to range.opPopFront() or, if that is void, (range.opPopFront(), range) (comma operator). And *range++ could lower to range.opFrontThenPop.

Another idea: For a forward range, front can be a field or property with a setter. The same can be true for input ranges’ opFrontThenPop which could be overloaded: If opFrontThenPop() is a getter for front, it’s an input range, and if there is opFrontThenPop(rhs) it’s an output range. Expressions of the form *range++ = rhs could lower to opFrontThenPop(rhs).

If the names are too range-specific, they could be generalized to opDereferenceUnary, such that *obj++ and *obj-- lower to opDereferenceUnary!"++" and opDereferenceUnary!"--", respectively.

If we go that path, a forward range could be defined by having empty, front, and supporting ++range (which implements popFront); an input range be defined by having empty and supporting *range++ (possibly by opDereferenceUnary!"++" that can be called with no arguments), and an output range could be defined by having empty and supporting *range++ = rhs (possibly by opDereferenceUnary!"++" that can be called with 1 argument).