March 06
On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
> Every time this topic comes up, class-based ranges become the whipping boy of range design woes.  Actually, they serve an extremely important role: type erasure, which is critical when you have code like this:
> ...etc.

This would be a complete non-issue if we had language-level sumtypes:

```
	sumtype { R, S } myRangeFunc(R,S)(R range1, S range2) {
		if (runtimeDecision()) {
			return range1;
		} else {
			return range2;
		}
	}
```

std.sumtype can also do this, the ergonomics just aren't as nice.
March 06
On Wed, Mar 06, 2024 at 05:43:05PM +0000, Paul Backus via Digitalmars-d wrote:
> On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
> > Every time this topic comes up, class-based ranges become the whipping boy of range design woes.  Actually, they serve an extremely important role: type erasure, which is critical when you have code like this:
> > 
> > 	auto myRangeFunc(R,S)(R range1, S range2) {
> > 		if (runtimeDecision()) {
> > 			return range1;
> > 		} else {
> > 			return range2;
> > 		}
> > 	}
> > 
> > [...]
> > 
> > Note that you can't use a struct wrapper here, because R and S have different ABIs; the only way to correctly forward range methods to R or S is to use overridden base class methods. IOW, the type erasure of R and S is unavoidable for this code to work.
> 
> This is what std.range.choose [1] is for. Internally, it's implemented as a tagged union of R and S.
> 
> [1] https://dlang.org/library/std/range/choose.html

Haha, I knew that one would be brought up.  Unfortunately, it only works for a very limited subset of use cases. In the above simplified code you could replace the if-statement with .choose. However, this immediately falls flat if the decision is made by a switch statement, or nested if-blocks, or an early exit from a loop.

The problem is that .choose requires that both ranges are constructible in the same scope that it is called in. In non-trivial code this is sometimes impossible, as you may be in a decision tree where certain subranges are only constructed if the condition is true, or may depend on value types that may not even exist if the condition is not satisfied.  This makes .choose essentially useless outside of trivial cases.

Besides, .choose does not type-erase, so in the cases where you want type erasure it's still no help.


T

-- 
English is useful because it is a mess. Since English is a mess, it maps well onto the problem space, which is also a mess, which we call reality. Similarly, Perl was designed to be a mess, though in the nicest of all possible ways. -- Larry Wall
March 07
On 07/03/2024 6:38 AM, Paul Backus wrote:
>     The only tricky aspect is ranges that are references
>     (classes/pointers). Neither of those to me should be supported IMO,
>     you can always wrap such a thing in a range harness.
> 
> The main thing you lose by dropping support for reference-type ranges is interfaces. In particular, the interface inheritance hierarchy in |std.range.interfaces|, where |ForwardRange| inherits from |InputRange| and so on, cannot really be replicated using |structs| (|alias this| only goes so far).

All of this would ideally be solved with the introduction of signatures.

But alas as we've covered in past conversations that is a big set of features to introduce.
March 06
On Wed, Mar 06, 2024 at 05:51:11PM +0000, Meta via Digitalmars-d wrote:
> On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
> > Every time this topic comes up, class-based ranges become the whipping boy of range design woes.  Actually, they serve an extremely important role: type erasure, which is critical when you have code like this: ...etc.
> 
> This would be a complete non-issue if we had language-level sumtypes:
> 
> ```
> 	sumtype { R, S } myRangeFunc(R,S)(R range1, S range2) {
> 		if (runtimeDecision()) {
> 			return range1;
> 		} else {
> 			return range2;
> 		}
> 	}
> ```
> 
> std.sumtype can also do this, the ergonomics just aren't as nice.

No, it doesn't work the way you think it would. The returned range must behave like an input range (forward, etc.), as a single entity, because the caller doesn't and shouldn't know or care whether the returned value is a sumtype or not.  It must be able to just call .front, .empty, .popFront on the result without any explicit unwrapping.

You can't do that with a sumtype, because you need to unwrap first. And even if you make unwrapping implicit, so that a sumtype allows operations common to all underlying types, there are implementational difficulties.  Calling e.g.  .front on a sumtype potentially accessing two completely incompatible functions. For example, if R.empty is a function but S.empty is a field, there is no sane way to implement the sumtype in such a way that the caller could just access .empty without introducing a runtime switch into every range API call. (And even if you were to implement things in this convoluted way, there are unaddressed difficulties of what .empty on a sumtype might mean if it refers to two completely incompatible things, like a 3-parameter function in R and an alias to a symbol in S. What would mySumtype.empty actually compile to in that case?)


T

-- 
It's bad luck to be superstitious. -- YHL
March 06
On Wednesday, March 6, 2024 6:09:52 AM MST Ogi via Digitalmars-d wrote:
> On Wednesday, 6 March 2024 at 12:20:40 UTC, Atila Neves wrote: I thought I'd found a clever way to
>
> > make this work for classes but apparently this crashes, much to my surprise:
> >
> > ```d
> > class Class {
> >
> >     bool empty() @safe @nogc pure nothrow scope const {
> >
> >         return this is null;
> >
> >     }
> >
> > }
> >
> > Class c;
> > assert(c.empty);
> > ```
>
> I’m surprised that the equivalent C++ code *doesn’t* crash (at
> least on my machine):
> ```
> Class* c = nullptr;
> assert(c->empty());
>
> That’s still UB though.

The issue is whether the function is virtual. In D, class member functions are virtual by default, so you get a crash when the program attempts to use the vtable to look the function up (and therefore attempts to dereference null), whereas in C++, you don't, because class member functions are non-virtual by default, and so it doesn't do anything with the vtable. If you mark the function as final in D (without it overriding anything), then there shouldn't be a crash, and if you mark the C++ function as virtual, then there will be.

- Jonathan M Davis




March 06
On Wednesday, March 6, 2024 7:18:50 AM MST Paul Backus via Digitalmars-d wrote:
> On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
> > 2. The range API provides no way (other than fully iterating through a range) to get an empty range of the same type from a range unless the range is a random-access range.
>
> Genuine question: what are the use-cases for this?
>
> In general, the capabilities of ranges are designed to serve the needs of algorithms. Input ranges exist because single-pass iteration is all that's needed to implement algorithms like map, filter, and reduce. Random-access ranges exist because they're needed for sorting algorithms. And so on.
>
> I'm not aware of any algorithm, or class of algorithm, that needs this specific capability you're describing. If such algorithms exist, we should be using them to guide our design here. If they don't...then maybe this isn't really a problem at all.

There are times where I've had to make a range empty in algorithms in order to indicate that it's done or force it to be done. This is particularly true with some algorithms I've written where the range was passed by ref. IIRC, I tried to add something to std.range for this years ago (either that, or I tried to make takeNone do it; it's been a while) using the init value, but Andrei nixed it on the grounds that we couldn't rely on it. The result is that while we have takeNone, you can't use it to actually make a range empty. You can just use it to get an empty range of a potentially different type.

So, while this is not something that I've seen come up frequently, this is definitely something that I've seen come up in my own code.

And in general, if we're able to require that the init value be empty for finite ranges, that will reduce the number of bugs surrounding member variables and the init value, since it's fairly common to end up using the init value of a range when you have to store it as a member variable (even if it was a Voldemort type to begin with), and the natural thing to do at that point is to assume that the init value is empty (even though in reality, for some range types right now, it isn't even in a valid).

But yes, the desire to have a way to force a range to be empty is something that's come from actual experience.

- Jonathan M Davis



March 06
On Wednesday, March 6, 2024 10:32:17 AM MST H. S. Teoh via Digitalmars-d wrote:
> On Wed, Mar 06, 2024 at 04:47:02PM +0000, Steven Schveighoffer via Digitalmars-d wrote: [...]
>
> > The only tricky aspect is ranges that are references (classes/pointers).  Neither of those to me should be supported IMO, you can always wrap such a thing in a range harness.
>
> [...]
>
> Every time this topic comes up, class-based ranges become the whipping boy of range design woes.  Actually, they serve an extremely important role: type erasure, which is critical when you have code like this:
>
>   auto myRangeFunc(R,S)(R range1, S range2) {
>       if (runtimeDecision()) {
>           return range1;
>       } else {
>           return range2;
>       }
>   }
>
> This generally will not compile because R and S are different types, even if both conform to a common underlying range API, like an input range. To remedy this situation, class-based range wrappers come to the rescue:
>
>   auto myRangeFunc(R,S)(R range1, S range2) {
>       if (runtimeDecision()) {
>           return inputRangeObject(range1);
>       } else {
>           return inputRangeObject(range2);
>       }
>   }
>
> Note that you can't use a struct wrapper here, because R and S have different ABIs; the only way to correctly forward range methods to R or S is to use overridden base class methods. IOW, the type erasure of R and S is unavoidable for this code to work.
>
> //
>
> Also, class-based ranges are sometimes necessary for practical reasons, like in this one program I had, that has an UFCS chain consisting of hundreds of components (not all written out explicitly in the same function, of course, but that's what the underlying instantiation looks like).  It generated ridiculously large symbols that crashed the compiler. Eventually the bug I filed prodded Rainer to implement symbol compression in DMD. But even then, the symbols were still ridiculously huge -- because the outermost template instantiation had to encode the types of every one of the hundreds of components.  As a result, symbol compression or not, compile times were still ridiculously slow because the compiler still had to copy those symbols around -- their length grew quadratically with every component added to the chain.
>
> Inserting a call to .inputRangeObject in the middle of the chain dramatically cut down the size of the resulting symbols, because it effectively erased all the types preceding it, resulting in much saner codegen afterwards.

My current plan is to retain std.range.interfaces in some form or another so that we can have ranges as classes as far as their implementation goes (at my job, we actually have to use classes for ranges, because we have to be able to wrap arbitrary ranges at runtime, since we're dealing with an intepreter). However, they would not count as ranges by any range-based algorithms, because they would be classes, and so they would have to be wrapped in structs to actually be used as ranges. The wrapper structs would then do all of the work to ensure that the proper copying behavior was implemented.

So, you'd do something like

    auto r = inputRangeObject(range1);

and get a struct range wrapping a class from the replacement for std.range.interfaces. And if we need the ability to get at the underlying class for something, it's as simple as providing a member to the struct which returns it - e.g. source, like we do with several other range types. Then you could do something like

auto r = inputRangeObject(range1);
InputRange cls = r.source;

Of course, you lose access to the class as soon as it's wrapped in another range type, but that happens right now when you pass ranges around which are classes. So, that shouldn't really change much from what I can see. It's basically just wrapping classes in structs so that the struct does the behavior that we currently do with save (and it could do it more intelligently by taking advantage of reference counting rather than dup-ing the class whenever the range is copied).

So, unless I'm missing something here, we _should_ be able to just wrap classes in structs to allow ranges which are implemented as classes even if the classes themselves can't be ranges any longer. The net result should be that dealing with forward ranges is simpler, and we stop having save-related bugs, but we shouldn't actually lose any functionality.

- Jonathan M Davis



March 06
On Wed, Mar 06, 2024 at 01:03:35PM -0700, Jonathan M Davis via Digitalmars-d wrote: [...]
> My current plan is to retain std.range.interfaces in some form or another so that we can have ranges as classes as far as their implementation goes (at my job, we actually have to use classes for ranges, because we have to be able to wrap arbitrary ranges at runtime, since we're dealing with an intepreter). However, they would not count as ranges by any range-based algorithms, because they would be classes, and so they would have to be wrapped in structs to actually be used as ranges. The wrapper structs would then do all of the work to ensure that the proper copying behavior was implemented.
[...]
> So, unless I'm missing something here, we _should_ be able to just wrap classes in structs to allow ranges which are implemented as classes even if the classes themselves can't be ranges any longer. The net result should be that dealing with forward ranges is simpler, and we stop having save-related bugs, but we shouldn't actually lose any functionality.
[...]

This actually sounds like a reasonable approach.


T

-- 
Applying computer technology is simply finding the right wrench to pound in the correct screw.
March 06
On Wednesday, 6 March 2024 at 18:07:22 UTC, H. S. Teoh wrote:
> On Wed, Mar 06, 2024 at 05:51:11PM +0000, Meta via Digitalmars-d wrote:
>> On Wednesday, 6 March 2024 at 17:32:17 UTC, H. S. Teoh wrote:
>> > Every time this topic comes up, class-based ranges become the whipping boy of range design woes.  Actually, they serve an extremely important role: type erasure, which is critical when you have code like this: ...etc.
>> 
>> This would be a complete non-issue if we had language-level sumtypes:
>> 
>> ```
>> 	sumtype { R, S } myRangeFunc(R,S)(R range1, S range2) {
>> 		if (runtimeDecision()) {
>> 			return range1;
>> 		} else {
>> 			return range2;
>> 		}
>> 	}
>> ```
>> 
>> std.sumtype can also do this, the ergonomics just aren't as nice.
>
> No, it doesn't work the way you think it would. The returned range must behave like an input range (forward, etc.), as a single entity, because the caller doesn't and shouldn't know or care whether the returned value is a sumtype or not.  It must be able to just call .front, .empty, .popFront on the result without any explicit unwrapping.
>
> You can't do that with a sumtype, because you need to unwrap first. And even if you make unwrapping implicit, so that a sumtype allows operations common to all underlying types, there are implementational difficulties.  Calling e.g.  .front on a sumtype potentially accessing two completely incompatible functions. For example, if R.empty is a function but S.empty is a field, there is no sane way to implement the sumtype in such a way that the caller could just access .empty without introducing a runtime switch into every range API call. (And even if you were to implement things in this convoluted way, there are unaddressed difficulties of what .empty on a sumtype might mean if it refers to two completely incompatible things, like a 3-parameter function in R and an alias to a symbol in S. What would mySumtype.empty actually compile to in that case?)
>
>
> T

Ah, you're right. There is really no way to do this in a completely flexible way in current D, if you also want to have value range semantics. Couldn't we somehow fit RefRange into the current range hierarchy somewhere, and provide primitives to make it more difficult to shoot yourself in the foot?
March 06

On Wednesday, 6 March 2024 at 17:57:36 UTC, H. S. Teoh wrote:

>

The problem is that .choose requires that both ranges are constructible in the same scope that it is called in.

It's not even about those ranges being constructible. It's about separation of concerns. Like you wrote in your earlier post, sometimes we want to work with an arbitrary range without templating all of our code. Maybe for compile time reasons, or maybe to write a library that can be separately compiled from the client code.

>

This makes .choose essentially useless outside of trivial cases.

It's far from useless, it's only like taking an enum argument and switching on it instead of taking a delerate argument. Certainly works but means you have to list all your cases in the function, which may or may not be what you want.

The gripe I have with std.range.interface is lack of attributes in the virtual functions. I don't think any simple attribute set alone would solve the problem well. Granted, there should rarely be reason to have a dynamic @system range type, but pure and nothrow are attributes you sometimes really want, sometimes you really don't. Therefore we need the ability to pick, meaning the interfaces will have to be templated, along the lines of:

interface ForwardRange(uint attrs) : InputRange!attrs
{
   // ...
}

Plus there should probably be function to convert, for example, a

BidirectionalRange!(FunctionAttribute!safe | FunctionAttribute!pure_ | FunctionAttribute!nothrow_)

to

BidirectionalRange!(FunctionAttribute!safe | FunctionAttribute!nothrow_)

.