Range Redesign: Empty Ranges (page 5)

March 07

Re: Range Redesign: Empty Ranges

Posted by Steven Schveighoffer
in reply to Paul Backus

On Wednesday, 6 March 2024 at 17:38:56 UTC, Paul Backus wrote:

On Wednesday, 6 March 2024 at 16:47:02 UTC, Steven Schveighoffer wrote:

On Wednesday, 6 March 2024 at 14:18:50 UTC, Paul Backus wrote:

On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:

The range API provides no way (other than fully iterating through a range) to get an empty range of the same type from a range unless the range is a random-access range.

Genuine question: what are the use-cases for this?

In general, the capabilities of ranges are designed to serve the needs of algorithms. Input ranges exist because single-pass iteration is all that's needed to implement algorithms like map, filter, and reduce. Random-access ranges exist because they're needed for sorting algorithms. And so on.

I'm not aware of any algorithm, or class of algorithm, that needs this specific capability you're describing. If such algorithms exist, we should be using them to guide our design here. If they don't...then maybe this isn't really a problem at all.

When you need to pass a specific range type, and you want to pass in an empty range of that type, how do you do it?

By "specific range type", do you mean a specific category of range (input, forward, random-access, etc.), or a specific concrete type?

If the former, this is already easy to do with existing language and library features (for example, in many cases you can use an empty slice).

If the latter, then the question I'm asking is, why do you need to do that?

I meant the latter. A concrete range type.

How does it happen? Consider that an array is a concrete type that people use all the time. Why should any other range be different? I've definitely stored ranges and other things as type members that were voldemort types.

This ability is more of a question of "do we want to add this feature to ranges or not?" The feature doesn't currently exist -- you can't assume that an uninitialized range is empty.
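A minimal sketch of that point, assuming a made-up range type (`Countdown` and `drained` are illustrative names, not Phobos symbols). It shows a range whose `.init` is not empty, and the one fully generic way to get an empty range of the same type today, which is to iterate it to the end:

```d
import std.range.primitives;

// Hypothetical range for illustration: the default field value makes
// Countdown.init a NON-empty range.
struct Countdown
{
    int n = 3;

    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
}

static assert(isInputRange!Countdown);

// Hypothetical helper: consume the range until nothing is left and
// return the (now empty) remainder, which has the same concrete type.
R drained(R)(R r)
if (isInputRange!R)
{
    while (!r.empty)
        r.popFront();
    return r;
}

unittest
{
    int[] arr;
    assert(arr.empty);                     // default-initialized array: empty

    Countdown c;
    assert(!c.empty);                      // default-initialized Countdown: not empty

    assert(drained(Countdown.init).empty); // empty, but only after full iteration
    assert(drained([1, 2, 3]).empty);
}
```

Whether `R.init` should be required to be empty is exactly the open question in this thread; the sketch only shows that nothing enforces it at present.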

Sometimes, we are bitten by the fact that the array is the most common range, and behaves in a specific way. People depend on that mechanism without realizing it, and then sometime later, they decide to change the type to one that is very compatible with arrays, but offers some benefit (i.e. to remove an allocation). However, the new range type might behave in unexpected ways, but still compiles.

A few examples I can think of:

- Copying an array is equivalent to arr.save, but this may not be the case for other forward ranges.
- Character arrays have a mechanism to decode into dchar if you specify dchar as the loop variable type.
- Arrays have a default value of an empty array.
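The three array behaviors listed above can be sketched as follows. `Counter` is a hypothetical reference-type forward range written only for this illustration; it is not a Phobos type.

```d
import std.range.primitives;

// Hypothetical forward range with reference semantics: a plain copy
// shares the cursor, and only save() yields an independent position.
final class Counter
{
    private int cur, end;

    this(int cur, int end) { this.cur = cur; this.end = end; }

    @property bool empty() const { return cur >= end; }
    @property int front() const { return cur; }
    void popFront() { ++cur; }
    @property Counter save() { return new Counter(cur, end); }
}

static assert(isForwardRange!Counter);

unittest
{
    // 1. Copying an array behaves like save: the copy iterates independently.
    int[] arr = [1, 2, 3];
    auto arrCopy = arr;
    arrCopy.popFront();
    assert(arr.front == 1);

    // Copying a reference-type range aliases its position instead.
    auto c = new Counter(0, 3);
    auto cCopy = c;
    cCopy.popFront();
    assert(c.front == 1);      // the original advanced too
    auto cSaved = c.save;
    cSaved.popFront();
    assert(c.front == 1);      // the saved range moved independently

    // 2. foreach over a char array decodes code points when the loop
    //    variable is dchar, and yields code units when it is char.
    string s = "\u00E9";       // one code point, two UTF-8 code units
    size_t codePoints, codeUnits;
    foreach (dchar ch; s) ++codePoints;
    foreach (char ch; s) ++codeUnits;
    assert(codePoints == 1 && codeUnits == 2);

    // 3. A default-initialized array is an empty range.
    int[] none;
    assert(none.empty);
}
```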

Reducing surprises when you substitute what seems like a "compatible" type is desirable, but not strictly necessary. I.e. I'm also OK if we don't try and add these definitions.

For the empty array case, I think the mechanism is trivial to add as a formal requirement, since nearly all ranges default to empty, and the ones that don't are probably easy to change. Maybe this isn't the case? I don't know. I think reducing friction when it's easy to do is something we should always be looking at.

The only tricky aspect is ranges that are references (classes/pointers). IMO neither of those should be supported; you can always wrap such a thing in a range harness.

The main thing you lose by dropping support for reference-type ranges is interfaces. In particular, the interface inheritance hierarchy in std.range.interfaces, where ForwardRange inherits from InputRange and so on, cannot really be replicated using structs (alias this only goes so far).

As mentioned, you can wrap these interfaces into structs, which then have better lifetime tracking capabilities.
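One possible shape for such a wrapper, as a rough sketch only. `InputRangeRef` is a hypothetical name and is not part of Phobos; `InputRange` and `inputRangeObject` are the existing symbols from std.range.interfaces. The sketch assumes that a null reference should count as an empty range, so that the wrapper's `.init` is empty.

```d
import std.range.interfaces : InputRange, inputRangeObject;
import std.range.primitives : isInputRange;

// Hypothetical wrapper: a struct range that forwards to an InputRange!E
// object and treats a null reference as empty.
struct InputRangeRef(E)
{
    private InputRange!E impl;

    this(InputRange!E impl) { this.impl = impl; }

    @property bool empty() { return impl is null || impl.empty; }
    @property E front() { return impl.front; }
    void popFront() { impl.popFront(); }
}

static assert(isInputRange!(InputRangeRef!int));

unittest
{
    InputRangeRef!int none;    // default-initialized wrapper
    assert(none.empty);

    auto r = InputRangeRef!int(inputRangeObject([1, 2, 3]));
    int sum;
    foreach (x; r)
        sum += x;
    assert(sum == 6);
}
```

This covers a single interface only. It does not by itself reproduce the inheritance hierarchy of std.range.interfaces, and copies of the struct still share the underlying object.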

-Steve

On Thu, Mar 07, 2024 at 06:32:50PM +0000, Steven Schveighoffer via Digitalmars-d wrote:

[...]

> Sometimes, we are bitten by the fact that the array is the most common range, and behaves in a specific way. People depend on that mechanism without realizing it, and then sometime later, they decide to change the type to one that is very compatible with arrays, but offers some benefit (i.e. to remove an allocation). However, the new range type might behave in unexpected ways, *but still compiles*.

Over the past decade of working with D, this has repeatedly come up as the weakness of a signature-constraint based approach to ducktyping. The problem is that what the signature constraint requires (e.g., isInputRange) may only be a subset of what the function body assumes, and there is no way to check this mechanically (signature constraints are Turing-complete).

What ideally should happen is that whatever the code assumes should also be declared in the function signature. So if a template takes a parameter t of generic type T, and then goes and performs t++, then the compiler should enforce that ++ is declared as part of the constraints on T.

The C++ concepts approach is better in this respect, in that the compiler can typecheck the function body and emit an error if it tries to perform an operation on T that it didn't declare as part of its signature constraint. Barring implementing concepts in D, which is unlikely to happen, the next best alternative is for the compiler to enforce that any operation on T must have a corresponding check in the signature constraint, and when this is not the case, it should, based on the attempted operation, issue an error message with a suggested addition to the sig constraint that would satisfy this requirement.

This doesn't fully solve the problem here, of course -- sometimes the difference is semantic rather than something easily checkable by the compiler, like subtle differences between built-in arrays and user-defined arrays; in that case you're still up the creek without a paddle. But it's a step forward.

T

-- 
The trouble with TCP jokes is that it's like hearing the same joke over and over.
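A small sketch of the mismatch between constraint and body described in this post. `lastElement`, `lastElementChecked`, and `Iota3` are hypothetical names made up for the illustration; the traits come from std.range.primitives.

```d
import std.range.primitives;

// The constraint only promises an input range...
auto lastElement(R)(R r)
if (isInputRange!R)
{
    // ...but the body silently assumes indexing and length as well.
    return r[r.length - 1];
}

// Same body, with the assumptions declared in the constraint.
auto lastElementChecked(R)(R r)
if (isRandomAccessRange!R && hasLength!R)
{
    return r[r.length - 1];
}

// Hypothetical minimal input range: satisfies isInputRange, nothing more.
struct Iota3
{
    int i;

    @property bool empty() const { return i >= 3; }
    @property int front() const { return i; }
    void popFront() { ++i; }
}

static assert(isInputRange!Iota3);

// Arrays happen to provide everything the body uses, so both instantiate.
static assert(__traits(compiles, lastElement([1, 2, 3])));
static assert(__traits(compiles, lastElementChecked([1, 2, 3])));

// Iota3 passes the weak constraint, yet instantiation fails inside the body.
static assert(!__traits(compiles, lastElement(Iota3.init)));

// With the stricter constraint, Iota3 is rejected at the signature instead.
static assert(!__traits(compiles, lastElementChecked(Iota3.init)));

unittest
{
    assert(lastElement([1, 2, 3]) == 3);
    assert(lastElementChecked([4, 5]) == 5);
}
```

Both rejections look the same to `__traits(compiles)`. The practical difference is in the error a user sees: a failure deep inside the template body versus a constraint that visibly did not match.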

On Thursday, 7 March 2024 at 18:32:50 UTC, Steven Schveighoffer wrote:

I meant the latter. A concrete range type.

This doesn't answer my question.

There are lots of ways in which (built-in) arrays are different from other types of ranges. Most notably, an array can be used as either a range or a container. There are plenty of use-cases for creating an empty container.

What I am asking, specifically, is whether there is any use-case where generic code, given a range of some arbitrary type R, needs to create another range which both (a) has the exact concrete type R, and (b) is empty. Since that's the feature that's being proposed here.

(Non-generic code does not need this feature to be part of the range API, because it can rely on specific features of whatever concrete type it's working with.)

This ability is more of a question of "do we want to add this feature to ranges or not?" The feature doesn't currently exist -- you can't assume that an uninitialized range is empty.

If there are no use-cases for this feature, then the answer to "do we want to add it" ought to be "no." That's why I'm asking about use-cases.

This is a general problem with templates/macros compared to typed generics. Even if we get rid of this particular edge case, there are still dozens more that users are going to run into if they only test with arrays.

If we want to address this problem, I think the best thing we can do is to provide a standard suite of test ranges that users can plug into their code to uncover edge cases and bugs.
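As a rough illustration of what one such test range could look like (this is a sketch, not a proposal for the actual suite), `SharedCursor` below is a hypothetical forward range whose copies share one cursor. `countAboveMean` is a made-up function under test, with a flag that switches between a buggy variant and a fixed one.

```d
import std.range.primitives;

// Hypothetical test range: copies share one cursor, so code that copies
// a range instead of calling save is exposed.
struct SharedCursor
{
    private int* pos;
    private int end;

    this(int start, int end)
    {
        pos = new int;
        *pos = start;
        this.end = end;
    }

    @property bool empty() const { return *pos >= end; }
    @property int front() const { return *pos; }
    void popFront() { ++*pos; }
    @property SharedCursor save() const { return SharedCursor(*pos, end); }
}

static assert(isForwardRange!SharedCursor);

// Hypothetical code under test: counts the elements above the mean,
// which needs two passes over the range.
size_t countAboveMean(bool useSave, R)(R r)
if (isForwardRange!R)
{
    static if (useSave)
        auto firstPass = r.save;
    else
        auto firstPass = r;    // bug: a copy need not be independent

    long sum = 0, n = 0;
    for (; !firstPass.empty; firstPass.popFront())
    {
        sum += firstPass.front;
        ++n;
    }
    if (n == 0)
        return 0;

    size_t above;
    for (; !r.empty; r.popFront())
        if (r.front * n > sum)
            ++above;
    return above;
}

unittest
{
    // With arrays the bug is invisible: both variants agree.
    assert(countAboveMean!false([1, 2, 3, 4]) == 2);
    assert(countAboveMean!true([1, 2, 3, 4]) == 2);

    // The test range yields 1, 2, 3, 4 and exposes the missing save.
    assert(countAboveMean!true(SharedCursor(1, 5)) == 2);
    assert(countAboveMean!false(SharedCursor(1, 5)) == 0);
}
```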

As mentioned, you can wrap these interfaces into structs, which then have better lifetime tracking capabilities.

How do you implement this with structs?

interface ForwardAssignable(E) : InputAssignable!E, ForwardRange!E

Interfaces allow multiple inheritance. Structs can only have one alias this member. Maybe you're fine with giving up on this feature, but let's at least be honest that we are giving up features here.
