Jump to page: 1 25  
Page
Thread overview
March 04
Okay. A few weeks ago, I made a post about potential design decisions when updating the range API in the next version of Phobos, and this is a continuation of that, albeit with a different set of problems. Specifically:

1. The range API does not currently - and cannot currently - require or guarantee that the init value of a range is valid.

2. The range API provides no way (other than fully iterating through a range) to get an empty range of the same type from a range unless the range is a random-access range.

Fixing the first issue is straightforward, I think, though it potentially affects the second, and of course, someone here may have thoughts or insights which I've missed.

The issue with init is that of course code wants to use the init value, and code often _does_ use the init value for specific ranges, but doing so when that specific range type does not specify that the init value is valid (and documentation almost never does) is relying on unspecified behavior - and in the general case, using the init value of a range cannot be specified behavior. The obvious case is ranges which are classes, because their init value is going to be null, meaning that using the init value of a range could result in a segfault. Of course, _most_ range-based code doesn't use classes for ranges, so I expect that it's quite rare to hit that problem in practice, but most range-based code uses Voldemort types and tends to ignore the fact that the init value can still be used (either by using typeof on the result to get its type or by getting init directly from the result - e.g. range.init). And you actually need to do that when storing a range as a member variable. E.G.

struct Foo
{
    ...
    auto _range = typeof(chain(func1(), func2(), func3())).init;
    ...
}

The more correct thing to do would be something like

struct Foo
{
    ...
    auto _range = Nullable!(typeof(chain(func1(), func2(), func3())));
    ...
}

so that you can avoid using the range's init value, but realistically, I don't think that it's something that many folks think of. They use the range's init value, and it works (since it often does work), and then they end up relying on how that range type happens to work at the moment, which could change in the future, since the range API does not guarantee that it works, and the folks maintaining that range probably aren't thinking much about the init value anyway - especially when it's a Voldemort type. The typical expectation is that the user will be creating the range by calling the function that returns it, not by using type introspection to get their hands on the init value. But it is always possible to get the init value, and code declaring member variables from ranges will use it unless it goes to the extra effort of using something like Nullable so that it's possible to store the range without using its init value as the default value.

Now, I think that the fix for this particular problem is pretty easy. We just require that all ranges have a valid init value. This means disallowing classes or pointers being used directly as ranges, but we were already looking at doing that for forward ranges anyway in order to be able to get rid of save. It's arguably more problematic for basic input ranges, but it's quite feasible there as well. The previous thread discussed requiring that they be reference types, which would lean towards them being pointers or classes, but you can have structs which are reference types (e.g. they contain a class reference but then have the logic around it to ensure that it's not used when null). Also, it was proposed in that thread that we require that basic input ranges be non-copyable, which would also make it straightforward to require that the init value be valid, since in that case, all basic input ranges would have to be structs regardless of what their internals were doing with pointers or references. Either way, we can disallow classes and pointers being used directly as ranges and require that all ranges have a valid init value.

So, I'm proposing that we require that all ranges have a valid init value. I suspect that the issue will still often be forgotten, particularly since ranges are often Voldemort types, but if the range API explicitly requires it, and the range documentation is clear on that point, then we can clearly consider it a bug when a range has an invalid init value, and in most cases, I wouldn't expect it to be a big deal to fix.

The second issue is then the ability to get an empty range from a range type in O(1) which is the same type as the original range. In my experience, it's sometimes the case that you really want to be able to do something like

range = Range.emptyRange;

or

range.makeEmpty();

in order to make a range empty without eating up CPU cycles calling popFront until empty is true. This is possible with a random-access range by slicing it with a zero length, but it's not possible with lower classes of ranges. std.range.takeNone tries to do it, but it is forced to give you a new range type if the range isn't a random-access range - specifically, it uses takeExactly, but regardless, that doesn't work in cases where you need to keep the range type the same but make it empty (e.g. because you're dealing with a ref parameter). And right now, that means calling popFront over and over again.

Of course, if you're dealing with an infinite range, then there can't be a way to make it empty, but IMHO, we should have a way to get an empty range from an arbitrary finite range type where the empty range is of the same type. And for that, I see three possibly options:

1. Make it so that for finite ranges, the init value of a range is required to not just be valid, but it's also required to be empty.

2. We add a static / enum member to the range API called something like emptyRange where you can get an empty range from that range type where the range is that range type.

3. Add a new member function to the range API called something like makeEmpty which makes that particular instance empty.

My gut reaction is to go with #1, because it doesn't involve any new functions, and in general, the init value probably should be empty. However, my concern there is that there may be ranges where that's going to constrain the implementation in annoying ways. All finite ranges have to have an empty state, because you'll get to empty if you call popFront enough times, so it should be possible for all ranges to have their init value be empty, but I could also see it being the case where requiring that would mean that the implementer of the range would have to jump through some annoying hoops to make it work - though in most cases, if they have to jump through hoops, I would expect that they'd have to jump through most (or all) of those hoops to make the init value valid whether it's empty or not. But I would like feedback on how reasonable folks think it would be to require that a range's init value be empty.

Between #2 and #3, I think that #2 is more desirable, because then you can get an empty range without having to create one with elements first, whereas if we went with makeEmpty, then you'd need to construct a range first just to make it empty. That being said, implementing emptyRange (or whatever we'd call it) wouldn't be all that different from making the init value empty. The main difference I see is that emptyRange wouldn't have to be statically known, which might be useful in some cases, though the kinds of cases that I can think of tend to be ones where making the init value valid would require additional logic, in which case, in going to the effort of making the init value valid, you might as well make it empty anyway.

I expect that for some ranges, makeEmpty would be easier to implement, and it would probably also work better if we decide to make basic input ranges non-copyable, since mutating a non-copyable range would be more straightforward than moving an empty one on top of it. But I would think that we could make the latter work, and in the general case, being able to have an empty range for any finite range in O(1) would be nice to have without having to create a range just to make it empty.

Either awy, one advantage to having emptyRange / makeEmpty over using the init value as empty would be that while adding a new function would be annoying, it would force anyone implementing a range to take into account the fact that they need to provide a way to get an empty range. In many cases, it may just be

    alias emptyRange = typeof(this).init;

but range implementers would be forced to think about it, whereas if we just made it so that the documentation said that they had to make the init value empty, they'd be less likely to notice. But it's still extra stuff that has to implemented which would be completely unnecessary with ranges that would have empty init value anyway.

A related issue is that if we're removing save from forward ranges, and we keep the rest of the functions the same, some of the old basic input ranges then look like the new forward ranges (since they wouldn't have save, and forward ranges wouldn't require it), which would be bad, because they wouldn't have the correct semantics. To fix that, we either need to add something to the range API (be it a whole new function like emptyRange or something like an enum that flagged a type as being a forward range), or we'd have to change the existing functions (be it something as simple as changing empty to isEmpty or something more complicated like switching from front/popFront to head/tail, which would also allow us to get around the tail-const problem but would be a pretty massive change to how ranges work). Adding emptyRange or makeEmpty _almost_ fixes that problem, but since infinite ranges wouldn't have have it (since they couldn't implement it), it wouldn't actually fix the problem. Some old basic input ranges which are infinite would still pass the new isForwardRange.

So, if it weren't for the infinite range issue, I'd see the fact that it would change the API enough to distingush between the old and new APIs as a good reason to go with emptyRange over requiring that the init value be empty, but since it doesn't fix the issue with infinite ranges, I don't think that it makes sense to go with emptyRange / makeEmpty just to differentiate between the old and new range APIs (it would make it more obvious to programmers using these ranges, which could be valuable - but then again, having isEmpty instead of empty would do the same).

So, hopefully that makes the situation with those two issues clear. The question then is what insights you folks might have on them and the proposed solutions. It may even be that it's way too onerous to require that the init value be valid, and someone has a use case to show that, though I very much hope that that is not the case. I expect though that the bigger question is whether init should be required to be empty for finite ranges or whether it would be better to add a new function for that.

Thoughts?

- Jonathan M Davis



March 04
On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
> 1. Make it so that for finite ranges, the init value of a range is required to not just be valid, but it's also required to be empty.

Makes a lot of sense, but I agree it will be very annoying for some ranges.

I suspect it will result in extra state and/or checks for those cases.

Would it be too crazy of an idea for those ranges to implement a static init function?
March 04
On Monday, March 4, 2024 2:58:08 PM MST Sebastiaan Koppe via Digitalmars-d wrote:
> On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
> > 1. Make it so that for finite ranges, the init value of a range is required to not just be valid, but it's also required to be empty.
>
> Makes a lot of sense, but I agree it will be very annoying for some ranges.
>
> I suspect it will result in extra state and/or checks for those cases.
>
> Would it be too crazy of an idea for those ranges to implement a static init function?

You mean define a static function named init? I think that most of us agree at this point that it was a mistake to allow init to be anything other than the compiler-defined init, and as I understand it, it doesn't really work to redefine it in the way that Walter intended. Realistically, code tends to depend on the init value being the compiler-defined one, and trying to do anything else is asking for trouble. A redefined init value doesn't actually get used with default-initialization, so with a case like

struct Foo
{
    ...
    typeof(chain(func1(), func2(), func3())) _range;
    ...
}

any init that you declare will be ignored (even if it's a value and not a function), meaning that unless that range type has a valid init value, that variable won't work properly unless it happens to be assigned a value before it's used.

We could of course add a function / enum to the range API which would be required to be used instead of the init value when you're looking for some kind of initialization, but that doesn't solve the problem of default-initialized values. For that, it really needs the init to be valid.

Of course, we _could_ allow ranges to @disable default initialization, and then the type could define init with a static function which would be used when init was used explicitly, but that would potentially cause problems if init were used explicitly at compile time. It's also on the list of things that I half expect to be disallowed at some point with a future edition, because allowing init to be anything other than the default one tends to cause issues. So, we _might_ be able to make it work, but my gut reaction is that we're playing with fire, and we'll be better off overall if we can just require that the compiler-defined init value be valid.

Still, if we allowed it, at least it should result in compile time errors in generic code rather than weird runtime behavior. But it would mean that generic code couldn't rely on being able to default-initialize ranges, and it would have to give all ranges a value at runtime, which would be annoying, and it realistically wouldn't happen outside of something like Phobos (and it would likely only happen in Phobos if that specific case was tested for by Phobos functions in general). Code in general expects default initialization to work, and it's usually only code specifically written to work with types that don't allow it where it actually works to have a type that @disables it.

So, I don't know. I'd really rather not play games with trying to redefine the init value. I also suspect that most of the annoyance that ranges might have with the init value would be there simply by requiring that the init value be valid without requiring that it be empty, in which case, requiring that it be empty wouldn't be onerous. It's requiring that it be valid which would potentially be onerous. But I'm also not sure how much of an issue that that would be in practice. For most ranges, I expect that it will be pretty simple, and many of them already have a valid, empty init value. In the vast majority of cases, I expect that if a range's init value isn't valid, it's because the person who wrote it didn't think about the init value being used (since it's probably a Voldemort type) and didn't test that case.

And if the issue of it being problematic to make the init value valid isn't common, I'm disinclined to make range-based code in general have to worry about it - especially if it's still possible to make such ranges work (even if it's more annoying than would be desirable). I can also think of several ways to make it easy right off the top of my head, though you wouldn't necessarily want the additional overhead (e.g. any range could just use is to compare itself to its init value in empty, or it could have a bool to indicate whether it had been initialized via a constructor and check that; it would be annoying but easy).

But I'm also not entirely sure how onerous it will be in practice to make the init values of ranges valid (be it empty or otherwise) - though it's actually _really_ easy in the cases where it would segfault now, since if such types then have a wrapper struct, it can easily check whether the pointer or reference is null and return true for empty.

- Jonathan M Davis



March 05
On Monday, 4 March 2024 at 21:29:40 UTC, Jonathan M Davis wrote:
> Of course, if you're dealing with an infinite range, then there can't be a way to make it empty, but IMHO, we should have a way to get an empty range from an arbitrary finite range type where the empty range is of the same type. And for that, I see three possibly options:
>
> 1. Make it so that for finite ranges, the init value of a range is required to not just be valid, but it's also required to be empty.

I think this is the best solution.

> A related issue is that if we're removing save from forward ranges, [...] Adding emptyRange or makeEmpty [...] wouldn't actually fix the problem. Some old basic input ranges which are infinite would still pass the new isForwardRange.

Are there infinite ranges that are indeed not forward ranges?
I mean - they produce their output somehow algorithmically.
Should be possible to safe the state of such an algorithm, no?
Ok, maybe someone has not implemented it, but as we are requiring things anyway, why not require an infinite range to implement the copy function?

March 05
On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
> Are there infinite ranges that are indeed not forward ranges?

/dev/random could be such a range.
March 06
On Tuesday, 5 March 2024 at 16:41:34 UTC, Alexandru Ermicioi wrote:
> On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
>> Are there infinite ranges that are indeed not forward ranges?
>
> /dev/random could be such a range.

In the case of /dev/random I don't really see how it's a true infinity, you can't go infinitely backwards. It is an infinitely forward moving stream, so you can have the concept of "zero" and "positive infinity", but there is no concept of "negative infinity". A negative number is utter non-sense for TRNG's, as whatever negative number you specify would become the new "zero".

Practically there is a limit to how much entropy you can grab off the the chip in single chunk, so it is a bounded range, it's just that the boundary is somewhat fuzzy.

In short you would probably end up implementing an entropy source as a forward range with some large upper limit.
March 05
On Wed, Mar 06, 2024 at 12:11:10AM +0000, Adam Wilson via Digitalmars-d wrote:
> On Tuesday, 5 March 2024 at 16:41:34 UTC, Alexandru Ermicioi wrote:
> > On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
> > > Are there infinite ranges that are indeed not forward ranges?
> > 
> > /dev/random could be such a range.
> 
> In the case of /dev/random I don't really see how it's a true infinity, you can't go infinitely backwards.

Nothing about an infinite range requires that it be infinite both ways, unless it's a bidirectional range, which an RNG is not.


[...]
> Practically there is a limit to how much entropy you can grab off the the chip in single chunk, so it is a bounded range, it's just that the boundary is somewhat fuzzy.
[...]

A periodically-reseeded RNG is indeed a practically infinite range, with no cycling.  You don't have to grab every value from the hardware entropy source; it suffices to use a cryptographic hash function on a counter that's periodically reseeded from the hardware entropy. It can literally generate an endless stream of random numbers.


T

-- 
Tell me and I forget. Teach me and I remember. Involve me and I understand. -- Benjamin Franklin
March 06
On Wednesday, 6 March 2024 at 00:23:26 UTC, H. S. Teoh wrote:
> A periodically-reseeded RNG is indeed a practically infinite range, with no cycling.  You don't have to grab every value from the hardware entropy source; it suffices to use a cryptographic hash function on a counter that's periodically reseeded from the hardware entropy. It can literally generate an endless stream of random numbers.
>
>
> T

That is true in theory, but in practice if you try it on real hardware, not only will you pay some pretty serious performance penalties as the CPU tries to dump all that entropy, it will be dumping it to memory, of which you will eventually run out. So yes, it's theoretically unlimited, but in practice, there is no valid reason to actually implement it that way, and to-date, no modern Operating System entropy source allows you to. For example, on Windows, you'll be passing a fixed size buffer to the entropy source. Same with OpenSSL.

We do not design code for what is theoretically possible, only that which can actually be achieved. I know, because I wrote a Crypto library for D that specifically deals with this.
March 06
On 05.03.24 17:41, Alexandru Ermicioi wrote:
> On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
>> Are there infinite ranges that are indeed not forward ranges?
> 
> /dev/random could be such a range.

Or a range that returns captured data from a sensor.
March 06
On Wednesday, 6 March 2024 at 08:56:02 UTC, Arafel wrote:
> On 05.03.24 17:41, Alexandru Ermicioi wrote:
>> On Tuesday, 5 March 2024 at 14:39:17 UTC, Dom DiSc wrote:
>>> Are there infinite ranges that are indeed not forward ranges?
>> 
>> /dev/random could be such a range.
>
> Or a range that returns captured data from a sensor.

Nice example,

Basically, any global resource (sensor, /dev/random, etc.), that can be representable as a range, and doesn't have a definitive end, is an infinite input range, since you can't save it at a specific point and then continue multiple times from that point.

Best regards,
Alexandru.
« First   ‹ Prev
1 2 3 4 5