Jump to page: 1 2 3
Thread overview
May 16
Okay. This is my proposal for an updated input range API for Phobos v3. I previously sent it out to some or the core D folks, and the feedback has largely been positive, but I'm now posting it to the newsgroup for wider feedback.

And yes, this is long, but there's a lot to cover, and I want both what's
changing and why it's changing to be clear. Not that anyone expects me to be
short when writing something like this up anyway...
===

This is a proposal for an updated input range API for Phobos v3.

Input ranges as defined in Phobos v2 have worked extremely well for us thus far, but experience has shown that there are some problems with them, and we would like to address and fix them with Phobos v3. So, this proposal attempts to leave input ranges the same as much as reasonable while making the necessary changes to fix the known problems.

Output ranges will be covered in a separate proposal.

A Note on the Enforcement of Range API Requirements
---------------------------------------------------

As a general note with regards to any requirements that the range API has (be it the current version or the new one), some requirements can be and are enforced with the traits for ranges (e.g. isInputRange requires that front result in a value and will be false if front is void), whereas other requirements cannot actually be enforced (e.g. that front returns the same value each time that it's evaluated until popFront is called). As such, the range API statically enforces what it can but can't enforce everything.

This means that when the range API places a requirement on ranges and their behavior, and that particular requirement cannot be statically enforced, it _is_ possible to write a type that violates the range API but passes all of the associated traits and template constraints. Such a type is considered to be in violation of the range API, and there are no promises that any range-based function will behave correctly with such a range (since the range itself does not have the required behavior). Range-based code in general is free to assume that any ranges that it's given correctly implement the range API, and it really doesn't make sense for the situation to be otherwise.

As such, when this proposal says that a range is required to behave in a certain way, that does not necessarily mean that that behavior is statically enforced (just like not all requirements of the current range API are statically enforced). They will be enforced where practicable, but in some cases, it simply means that it's a requirement that any ranges must meet if they're going to work correctly with normal range-based code, and anyone writing range-based code can assume that those requirements have been met. As is the case with many things, testing will then be required to make sure that the code is working correctly with regards to anything that couldn't be statically enforced.

Most of the changes in this proposal revolve around being stricter with the behavioral requirements for the range API so that range-based code can actually rely on certain behaviors of ranges (like their copy semantics), whereas the current range API has been lax on a number of aspects of the semantics of ranges, making it impossible for generic range-based code to rely on those semantics.

Now, on to discussing what actually needs to be changed...

Some of the Problems with Input Ranges
--------------------------------------

1. Auto-decoding has not solved the problems that it was intended to solve (if nothing else, because graphemes are a thing, and auto-decoding does not take that into account), and it has caused both performance problems and a proliferation of complications (in particular with all of the code that attempts to work around auto-decoding to regain performance and avoid having strings be wrapped by other range types where possible). So, we clearly would like to get rid of it.

2. Requiring save for forward ranges has been a source of bugs. Most forward ranges do not require it, so it is routinely forgotten, causing bugs for any ranges that do require it. Ideally, we would be able to just rely on copies of forward ranges being independent and thus not need save at all. It would also bring the semantics of ranges that much closer to dynamic arrays / slices, which ranges are effectively trying to be a generalization of, just like iterators in C++ are effectively a generalization of pointers.

3. The semantics of copying from and assigning to ranges are not defined by the current range API, and because ranges allow such a variety of types with such a variety of semantics for copying and assignment, we cannot actually define the semantics of copying or assignment for ranges with the current API. This puts some annoying restrictions on writing correct generic code (e.g. not being able to use a range once it's been copied), and ideally, we would be able to actually define the semantics of copying and assignment so that generic code can rely on it.

4. There is no guarantee that the init value of a range is even valid - or that a range can be default-initialized. This causes problems in a number of cases but in particular when it's necessary to store a range as a member variable. One recent example of this is https://issues.dlang.org/show_bug.cgi?id=24202 where a project incorrectly relied on std.range.chain's init value being both valid and empty (which it was previously but then wasn't after some changes were made to chain to improve its performance). As things stand, code that wants to be able to store a range as a member variable and to do so completely correctly needs to do something like wrap it in a std.typecons.Nullable to ensure that the init value isn't actually used, and of course, no one actually does this or even really thinks about it. In practice, they end up relying on the behavior of whatever ranges their code currently happens to use, and they almost certainly don't realize that they're relying on unspecified behavior that could change in the future. Ideally, we would be able to rely on the init value being valid - and preferrably empty (at least for finite ranges) - which is not possible with the current range API.

5. The range API does not provide any way to get an empty range from ranges that can't be sliced other than iterating through the entire range (ranges that can be sliced allow it by using the slice operator with indices that make the length 0). takeExactly allows us to make an empty range from any range, but the type changes in the process, which does not work in any situation where code needs to make a range empty but cannot change the type (e.g. if it's dealing with a ref parameter or a member variable and needs to set that variable to an empty range).

6. The range API does not currently specify what happens if you copy front and then call popFront. For the vast majority of code, it is both assumed that you can copy front and that the copy of front will be unaffected by calling popFront. However, there _are_ ranges (such as std.stdio's ByLine) which will do things like use a dynamic array as a buffer which has its elements overwritten in popFront, resulting in the elements of the copy being mutated, since it's a slice of the same data. Such ranges are sometimes referred to as having a transient front, and they do not work with a lot of range-based code. As such, they either need to be disallowed, or we need a way to detect such behavior so that range-based code can check for it.

7. Related to the issue of transient front is the fact that range-based code in general tends to assume that front is copyable, meaning that it doesn't work with non-copyable types. This isn't a problem with the range API per se, but we probably do need to better formalize what you can assume you can do with front vs what you have to test for.

8. $ cannot be used with random-access ranges in generic code, because the range API does not require that a random-access range define opDollar.

9. const makes ranges useless. Realistically, a const range is not a range, because you can't call popFront, and you can't iterate through it (technically, it _is_ possible to have popFront be const in cases where the iteration state doesn't actually live in the range, and popFront isn't pure, but normal ranges do not work that way).

Most range-based code doesn't actually need to worry much about const, since it's just iterating through the range, and the constness of the elements often doesn't even matter, since the code is templated and works regardless of the mutability of the elements. However, once a range is stored in another type, constness can quickly become a problem. This not a problem with dynamic arrays / slices, because they can be sliced to get a tail-const slice of the original, and templates are actually instantiated with the tail-const type for dynamic arrays, but other range types do not have such compiler magic at present.

10. While it's not exactly a problem with the range API, it would be nice if we didn't have to import std.range / std.range.primitives (or the Phobos v3 equivalent) to use arrays as ranges. In theory, we could add the range API functions for arrays to object.d, but doing so would conflict with the existing functions in std.range (and bugs with selective imports actually make it somewhat problematic to resolve those conflicts - see https://issues.dlang.org/show_bug.cgi?id=24451). And if we're getting rid of auto-decoding, the range API functions for arrays of characters would change, meaning that we can't solve the problem by putting the functions in object.d and then aliasing them in std.range.primitives to avoid problems with selective imports from std.range or std.range.primitives.

So, the symbol conflicts which would result from putting the range API functions for arrays in object.d do potentially place some restrictions on what we do with the new range API.

Overview of Proposed Changes
----------------------------

1. The easy one is that the range API functions for dynamic arrays will not treat arrays of characters as special. A dynamic array of char will be a range of char, and a dynamic array of wchar will be a range of wchar.

Any code that needs to decode will need to use the phobos v3 replacement for std.utf's decode or decodeFront - or use foreach - to decode the code units to code points (and if it needs to switch encodings, then there will be whatever versions of byUTF, byChar, etc. that the replacement for std.utf will have). And if code needs to worry about normalization or other grapheme stuff, then it'll need to use that functionality from the replacement of std.uni (which is basically what ranges have to do right now anyway). We may choose to still have UTFException, but it will be up to the programmer to choose whether they want to use the replacement character or to throw exceptions when decoding, because the decoding will no longer be automatic. As such, using the range API on dynamic arrays will work with @nogc, and we won't need a bunch of extra code just to avoid the auto-decoding.

This may result in more bugs where programmers slice into the middle of code points, but auto-decoding just makes that harder rather than actually preventing it, and there's really no fix for that without doing something like operating at the grapheme level by default, which would be horribly inefficient. We probably should have better documentation and tutorials for how to write range-based code which deals with Unicode correctly, but ultimately, education is the solution here, not trying to make everything work at the code point or grapheme level by default. It may make sense to add a string type of some kind to Phobos which operates at the grapheme level with its API but which tries to be efficient internally, but that's really not a question for the range API. It seems to be generally agreed upon that arrays of code units should be treated as ranges of code units, so that's what the new range API will do.

2. All ranges which can be default-initialized must have a valid init value (that is, their range API functions must behave according to the normal rules for those functions and can't do something like have empty be false and then have front fail to return a valid value; the init value of all ranges that can be default-initialized must work properly as ranges).

By requiring that the init value be valid, we avoid all of the problems that we've been having with ranges which behave badly when they're default-initialized. Instead, generic code will be able to rely on a default-initialized range behaving sanely, and the default behavior will be what most programmers expect, whereas right now, some ranges will actually segfault if their init value is used (e.g. if the range type is a class).

Of course, this does make it so that classes, interfaces, and pointers can't be ranges, but there are ways to work around that (as will be discussed later).

Ideally, we would also disallow ranges from disabling default initialization so that generic range-based code can rely on it, but that causes problems with ranges that need to be constructed at runtime to function properly. That problem can be worked around fairly easily with finite ranges, since worst case, you can always just add a flag to them which indicates whether they were default-initialized and then have empty return true in that case (without having to care about what the range normally does). It's also likely that such ranges are rare, since it's almost always the case that you can rework the logic of a finite range to have a valid init value.

However, with infinite ranges, there is no such solution. If they cannot be default-initialized, then they either can't be ranges, or they would have to be finite ranges which would just never be empty if they're constructed at runtime (while doing something like the flag trick to make their init value empty). And it's certainly true that the range API doesn't (and can't) guarantee that finite ranges are truly finite, but it's still better if we can define infinite ranges that need to be constructed at runtime as infinite ranges, since then we can get the normal benefits that come from statically knowing that a range is infinite.

Alternatively, we could disallow finite ranges from disabling default initialization (since they have a workaround) while allowing infinite ranges to disable it, but that seems like it's too much special-casing and like it would add to the confusion overall.

And maybe it would be better to just force infinite ranges which can't work properly unless they're constructed at runtime to be finite ranges given that they're likely to be pretty rare, but ultimately, I think that it's more straightforward to just require that the init value be valid if a range can be default-initialized. The issues with types that can't be default-initialized already are something which causes problems for generic code in general, and it would not require any special explanation with the range API.

It would also fit in with the language better if we ever add default constructors with a future Edition (not that I expect that to happen, but forcing range types to have a valid init value would make that harder).

The only real problem I see is that there will likely be some generic code which will be written to assume that default initialization works for all ranges, but again, that's a general problem with types that disallow default initialization, and the bug that you then get is a compilation error, whereas forgetting something like save results in code that compiles but does the wrong thing. So, when we do run into such bugs, they won't silently cause problems.

Weighing the pros and cons here, I think that we're better off just letting ranges disable default initialization (much as it's annoying in general when types do that). Simply requiring that the init value of a range that can be default-initialized be valid fixes the core problem, and then the initialization problems that we're left with are not range-specific.

3. If a range is finite (that is, it's not statically known to be empty), and it can be default-initialized, then its init value must be empty. If it cannot be default-initialized, then it must define a function called emptyRange which returns an empty range of the same type. Phobos will then define a function called emptyRange which takes a finite range which can be default-initialized and returns the init value of that range.

The result of this is that we have a reliable way to get an empty range from any finite range without changing the type of the range. If we could require that finite ranges be default-initializable, then we wouldn't need emptyRange in the same way, but even then, such code would have to check whether the range was infinite to see if assuming that init was empty was correct, whereas this way, code can just call emptyRange, and it'll get a compilation error if it's given an infinite range. So, code that needs an empty range will get compiler checks which verify that the range is finite, and it'll work whether the range can be default-initialized or not (though it'll only work with CTFE if the range can be default-initialized).

Of course, this entry doesn't apply to infinite ranges at all, since whether they can be default-initialized or not, their empty is always false. The fact that that can be determined statically is what makes them infinite ranges.

4. All ranges must be either dynamic arrays or structs (and not pointers to
structs either).

This allows us to have consistent requirements with regards to copying, assignment, and initialization (e.g. it's not possible to require that the init value be valid if classes can be ranges). Most ranges are already dynamic arrays or structs anyway, so for _most_ range-based code, this requirement is a complete non-issue.

However, the main problem that we then have is the ranges that actually need to be classes or interfaces. This happens when you need to do stuff like wrap arbitrary ranges with a single range type at runtime (which std.range.interfaces currently provides the functionality to do). The solution to this problem then is to provide struct wrappers. That way, a range that needs to be a class / interface will instead be a struct wrapping a class / interface reference. The struct can then ensure that its init value is valid, and it can ensure that the range has the correct copying and assignment semantics. So, the replacement for std.range.interfaces will then provide not only classes and interfaces, but it will provide the appropriate struct wrappers as well.

So, a function like inputRangeObject might return something like RangeWrapper!(InputRangeObject!Foo) instead of InputRangeObject!Foo like it would currently do. The exact details still need to be sorted out, but there should be no technical barrier to replacing all ranges which are currently classes or interfaces with ranges that are structs wrapping classes or interfaces, which would allow us to make it illegal for classes or interfaces to be ranges without losing the actual functionality that such ranges currently provide.

5. Forward ranges will no longer have save. Rather, it will be required that copying a forward range will result in an independent copy. Both the original and the copy must then have the same elements in the same order, and iterating one will not affect the other. So, in essence, copying a forward range must be equivalent to calling save (though the actual implementation could be more clever than that).

So, for most forward ranges, this will simply mean that they won't need a save function that just returns the this reference, and range-based code will not need to call save.

For the ranges that would currently require that save be called, they will have to be implemented in a way that save is no longer required (which is made possible by disallowing pointers and references from being ranges). The naive implementation would therefore just have the copy constructor be equivalent to save, whereas a more sophisticated implementation would likely use reference counting and only duplicate the internal state when the reference count isn't 1, and a function is called which would mutate the state (e.g. doing the equivalent to calling save when popFront is called, and the reference count isn't 1). Obviously, the struct wrappers that the replacement for std.range.interfaces would provide would implement the correct copying functionality so that save would not be required, and anyone using a class or interface as a range wouldn't actually have to implement that behavior either.

6. Basic input ranges will be non-copyable.

Forward ranges can be either value types or pseudo-reference types which are value types with regards to their iteration state (that is, their actual elements may be outside of the range type itself, but the iteration state is copied when the range is copied). The fact that their iteration state can be copied is what makes it so that they can be forward ranges.

On the other hand, basic input ranges can't have their iteration state copied; otherwise, they could be forward ranges. So, currently, they are either reference types, or they are pseudo-reference types where a copy does not copy the full iteration state, leaving the original in an invalid state if anything is done to the copy (e.g. front is stored directly in the struct, but the rest of the elements are on the heap or come from somewhere else outside the struct like a socket, meaning that front will be stale on the original after popFront is called on the copy).

So, fundamentally, basic input ranges have different copy semantics than forward ranges. However, we _can_ have consistent copy semantics for all basic input ranges and consistent copy semantics for all forward ranges - just not consistent semantics between the two. The difference in their copy semantics is fundamentally the difference between those two kinds of ranges.

So, for forward ranges, as described in the previous entry, we fix the problem of inconsistent copy semantics by requiring that copying them result in independent copies, whereas we have two options for basic input ranges:

1) Either we need to require that basic input ranges be reference types (thereby avoiding screwed up copies, because the copying is basically just copying a pointer)

2) or we need to make them non-copyable - in which case, they can still be moved, but because they're non-copyable, you don't get situations where iterating through one copy partially mutates the other, since there's only ever one copy.

In order to fix the init problem, we already need to disallow classes and pointers, which would mean that if we required that basic input ranges be reference types, they would have to be structs wrapping pointers or references, and it would be difficult to statically determine the difference between a basic input range and a forward range (since there will no longer be a save function to do that). So, we'd be forced to give basic input ranges a different API in order to distinguish them - which isn't all bad, since it's not always a good idea to have basic input ranges and forward ranges share the same code given that they have different copy semantics, but it would be nice to still be able to share code in the cases where the copy semantics don't matter. Also, requiring that basic input ranges be reference types would likely require some additional indirections in some cases, since you then can't put anything directly in the range itself (because all of the actual state would need to be in the type pointed to by the wrapper struct in order for it to be a reference type).

On the other hand, if we just make basic input ranges non-copyable, then the difference between them and forward ranges is then trivial to detect with __traits(isCopyable, ...), and you can put as much state directly in the basic input range as you want. The downside, of course, is then that you can't copy the range. However, it then makes the semantics of what's going on _very_ clear. A forward range is copyable, and a basic input range is not.

This should make the fundamental difference between a basic input range and a forward range stark in a way that it arguably isn't with save, even though the difference is still fundamentally the same. Forward ranges can have their iteration state copied, whereas basic input ranges can't. But instead of having that be determined by a special range API function that most folks forget to call, it will instead be a clear difference in copy semantics that will actually result in compilation errors if you get it wrong.

Of course, the major downside here is simply that code will be forced to use explicit moves in a bunch of cases. But it's certainly better than copying a range and then using the partially mutated (and thus invalid) original like you can easily do now. And DIP 1040 should improve the situation considerably by adding implicit moves for the last use of a variable.

It will also help that most ranges are forward ranges anyway, so while this will be annoying in some code (probably most commonly in the functions in Phobos that are written to work with both basic input ranges and forward ranges), most code probably won't have to deal with it, and the code that does will then be avoiding subtle copying bugs like it would potentially have now. So, it's one of those cases where the compilation errors could get annoying, but they're also clearly preventing bugs.

As an aside, having non-copyable basic input ranges will be beneficial for the replacement for std.random. It's historically been a problem that accidentally copying random number generator ranges results in numbers being repeated, and by making them non-copyable, that won't be a problem anymore. We can then add a separate function to them (e.g. dup) to explicitly to get a copy from them when we actually want to.

7. Assigning a forward range to an lvalue of the same type should have the equivalent semantics as copying a forward range, with the difference of course being that the lvalue is then completely replaced with the new state as opposed to generating a completely new copy.

These should be the obvious semantics for assignment, but it's not something that we can currently rely on for all of the same reasons that we can't currently rely on the semantics of copying a range.

8. Assigning an basic input range to an lvalue of the same type should only work if it's a move rather than a copy, since basic input ranges are non-copyable.

Again, these should be the obvious semantics for assignment given what the copy semantics are.

9. In order to deal with transient front, we will explicitly disallow it. It will be a requirement that if front is copied, the copy will be unaffected by popFront. Of course, this doesn't prevent other references to the same data from mutating it (e.g. if it's a reference type), but calling popFront will not mutate it.

Therefore, code which wishes do something like std.stdio's ByLine needs to use a different solution, e.g. it can use opApply instead, or it can make front non-copyable, since if it's non-copyable, then popFront can mutate it without worrying about any copies being mutated.

10. With regards to ranges of non-copyable elements, there will be a clear way to test for whether a range has copyable elements, and range-based code that won't work with non-copyable elements should disallow them. This may be as simple as __traits(isCopyable, ElementType!R), or it may involve a new trait such as hasCopyableElements!R. It's undecided yet whether an additional trait will be necessary, but the range documentation will be clear on the matter, and range-based Phobos code that needs to copy elements will test for copyability.

11. Finite random-access ranges are required to implement opDollar, and their opIndex must work with $. Similarly, any ranges which implement slicing must implement opDollar, and slicing must work with $.

In most cases, this will just be an alias to length, but barring a language change that automatically treats length as opDollar (which has been discussed before but has never materialized and is somewhat controversial given types where it wouldn't make sense to treat length as opDollar), we have to require that opDollar be defined, or generic code won't be able to use $ with indexing or slicing. We probably would have required it years ago except that it would have broken code to add the requirement.

12. We're going to rely on some form of language improvement (such as Walter's recent Implicit Conversion of Template Instantiations DIP) to solve the tail-const problem.

As an alternate solution, we could rework the range API to do be more like a list in a functional language - e.g. head returns the first element, and tail returns a range containing all of the elements except for the first one. This would allow us to define tail's return type as being tail-const, but that could get interesting with assignment (e.g. you can't assign Range!(const Foo) to const!(Range!Foo)), and basic iteration would require assignment with such an API, e.g.

    for(auto r = range; !r.empty; r = r.tail)
    {
        auto e = r.head;
    }

So, we'd probably be forced to add a separate function just to get a tail-const copy (since tail wouldn't necessarily return the exact same type as the range, because tail would be returning a tail-const copy minus head), e.g.

    for(auto r = range.tailConstCopy; !r.empty; r = r.tail)
    {
        auto e = r.head;
    }

And at that point, the actual solution to the tail-const issue isn't really changing to an API that copies the range so much as providing a function that returns a tail-const copy and routinely using that, which means that we could fix the tail-const problem with the current API simply by adding a function which gives you a tail-const copy. As such, switching to an API that copies ranges to iterate through them rather than popping off their elements in-place doesn't really buy us anything.

Regardless, adding a function to get a tail-const copy would be putting us in pretty much the same boat that we're in now with save - only probably worse. It would really only be needed in situations where a range has become const somehow, meaning that most range-based code would not be tested with such ranges, and the new function would be forgotten just like save (only it would probably be forgotten even more). So, I'm inclined to think that a language solution is a much better solution.

Also, with regards to a range API that copies for iteration, concerns have been raised about its performance vs popFront. I ran some basic tests to see if that was a problem, and it didn't seem to make any difference, but it wouldn't surprise me if it did with more complex types (which basically would have required duplicating a bunch of std.algorithm with a different range API to compare the performance, which I didn't take the time to do). So, I don't know how valid those concerns are, but I do agree that it would be giving the optimizer more to optimize out.

Regardless, I decided that it would just be too large a change to the range API with too little benifit, and if we can just implement tail-const conversions in the language (which we arguably should be doing for more than just ranges), then that solves the problem. And if we decide that we really do need a new range API function to fix the problem, we can just add it (though if we do, hopefully we do it before the initial release of Phobos v3 so that we can require it without breaking code).

13. The range API functions for basic input ranges and forward ranges will be changed from front, popFront, and empty to first, popFirst, and isEmpty. And the range API functions for bidirectional ranges will be renamed last and popLast.

I'm not a big fan of this, since it would be nice to keep the same function names, and making this change requires a language change for foreach. However, it solves two problems:

First, if we remove save from forward ranges, then the old basic input ranges look like the new forward ranges so long as they're structs and copyable. So, if we don't change the forward range API in some fashion, then it will be easy to pass ranges that aren't really forward ranges to code that requires forward ranges, and bugs will ensue.

Second, if we want to make it so that dynamic arrays / slices are forward ranges without requiring an import, then we're going to get a _lot_ of symbol conflicts. Almost all of the range-based code currently in existence imports either std.range or std.range.primitives, and _all_ of that code would get symbol conflicts if we added the range API functions for arrays to object.d. That can be managed with an Edition, but it would mean that tons of code would have to be updated with the new Edition. On the other hand, if we use a different set of function names for the range API functions that need to be added to object.d for arrays, then we don't get all of those symbol conflicts, and only new code which is written to use the new range API will need to worry about the name changes.

Of course, if we add the functions to object.d, we arguably should still do it with an Edition if we want to minimize the risk of symbol conflicts (since there could be code in the wild with conflicting symbols; but if we rename the functions, then we're talking about _possible_ symbol conflicts rather than the guarantee of a ton of them like we'll get if we don't rename them).

And https://issues.dlang.org/show_bug.cgi?id=24451 makes putting the current range API functions in object.d even worse, since selective imports would only work with std.range.primitives and not std.range, meaning that a lot of the code that currently uses selective imports would still break, though if/when that bug is fixed, then that problem goes away.

Of course, if we decide that it's too much trouble to put the range API functions for arrays in object.d, we could solve the problem of the old basic input ranges looking like the new forward ranges by renaming only a single function - e.g. empty to isEmpty - since even a single difference in the API would be enough to make them different.

Alternatively, we could add an enum of some kind to the new range API to make it different. It would be kind of ugly, but it would allow us to keep all of the current function names - though we'd still have to import a module to use arrays as ranges, so that problem wouldn't be fixed.

Of course, as annoying as it would be to rename the range API functions, it _would_ have the advantage that you could see at a glance that code was using the new range API rather than the old one. And that could definitely be valuable. It also means that if you accidentally use the range traits from the wrong version of the API, you'll get errors _very_ quickly, since rather than being partially compatible, they won't be compatible at all.

So, while I'm definitely not happy about renaming the range API functions, after having considered the options, I think that it's the best path forward.

Of course, if we rename them, that unfortunately also opens up the door for bikeshedding, but I think that first, popFirst, isEmpty, last, and popLast are pretty obvious choices, and they correspond well to the old names.

Either way, it's my intention that Phobos v3 will have wrappers such that you can easily convert a range from the v2 API to a range that works with the v3 API and vice versa. In all likelihood, the main annoyance with the new function names will simply be getting used to them. It'll be easy enough to update the names if code is ported from v2 to v3, and any such ranges will need to be looked over to make sure that they match the new requirements anyway (though most ranges probably already do fulfill most of the added requirements in this proposal).

Conclusion
----------

So, that covers all of the major points that I can think of right now. Of course, there will be other minor changes (e.g. I'll probably rename isForwardRange to isCopyableRange), but those are the core changes, and this document is already ridiculously long as it is. However, I think that it's covered what really needs to be covered.

I do need to put together a specification which lists the range API in its entirety rather than as a proposal for what to change and why to change it (if nothing else, because some form of that needs to be part of the documentation), but this document is already ridiculously long without that, so I'm leaving it as just the list of problems and proposed changes rather than including a full spec.

In any case, have fun picking apart and complaining about the proposal. :)

- Jonathan M Davis



May 17
Two things I suggest we consider, given the write up:

1. Automatic boxing of slices into a struct, this was something I was considering for signatures specifically for ranges.
2. Enable passing the this pointer on a class explicitly, coupled with a few final methods you could handle when the class is null and then forward to the implementation.

May 16
On Thursday, May 16, 2024 9:33:26 AM MDT Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:
> Two things I suggest we consider, given the write up:
>
> 1. Automatic boxing of slices into a struct, this was something I was considering for signatures specifically for ranges.

What would that buy us? I don't see how it adds any value.

I guess that having a struct would negate the need to import the range API functions for arrays, but instead, we'd have to import the struct (or put it in object.d), and then we'd have to explicitly wrap arrays, whereas right now, all you need is an import to treat arrays as ranges, and with my proposed changes, you wouldn't even need that anymore.

Also, considering that ranges are basically supposed to be an abstraction of dynamic arrays / slices, similar to how C++ iterators are an abstraction of pointers, having to wrap dynamic arrays in structs is arguably taking things in the wrong direction. It would be making it so that dynamic arrays aren't ranges instead of making ranges in general more like dynamic arrays.

It would also get really annoying to have to constantly wrap arrays just to pass them to range-based code - and it would get extra awkward in the cases where a function that used to operate on an array was changed to operate on a range instead, since then you'd either be forced to treat arrays differently than normal with such a function or break code that used to pass arrays, forcing it to wrap them in a struct. And since strings are dynamic arrays, all of this would apply to strings too.

>From what I can see, trying to stop dynamic arrays / slices as ranges just
makes things worse.

> 2. Enable passing the this pointer on a class explicitly, coupled with a few final methods you could handle when the class is null and then forward to the implementation.

That would destroy our ability to rely on the copy or assignment semantics of ranges. The whole point of disallowing pointers or classes as ranges would be so that we can rely on forward ranges being copied without needing a function like save - and so that we can rely on the init value being valid. Having free functions to handle the case where a class reference was null would fix the init problem (at the cost of needing to import those functions), but it doesn't solve the problem that we cannot have reliable copy or assignment semantics if we allow reference types to be ranges.

Requiring that class references be wrapped in structs allows us to have reliable copy and assignment semantics without losing any functionality, and it makes the semantics of ranges in general more consistent - and closer to that of dynamic arrays / slices. It's already the case that ranges which are classes are pretty rare. Some use cases require them, but the vast majority of range-based code uses structs. I really don't see any value in trying to support naked classes as ranges when it's fairly straightforward to just wrap them in structs and get rid of any special cases that they would otherwise cause. And since you typically use the functions in std.range.interfaces to generate class ranges anyway, the resulting code would be pretty similar to what anyone using class ranges would be doing now.

It seems like you're proposing that we start requiring wrapper structs for some of the most common kinds of ranges, whereas my proposal would be requiring wrapper structs only for rare ranges - ones which are already typically generated using helper functions.

- Jonathan M Davis



May 16
On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
> wall-o-text
> no function header lists or patterns

????

> no reference at all to keyed ranges

????

> explict range starters for unicode instead of autodecoding

ok, thats *half* the problem; how does `hello 😀 world`.byUnicode(flag.charIndex).indexOf('w')` count correctly? The existing range api discards information that is otherwise trivial to have

the old solution of dchars and autodecoding failed; whats your proposal for the unicode problem on the *dchar* side of the problem where it was believed that dchar would simplify all use cases of unicode into simple indexing

> There is no guarantee that the init value of a range is even valid

Thats not what init means in this language; I believe it should be `.zero`, optional and part of a larger change were float.zero is 0.0; see my dip

> 8. $ cannot be used with random-access ranges in generic code, because the range API does not require that a random-access range define opDollar

I believe you should be much much much more spefic

is [min(i,$-1)] supported? is [$..0] supported?

the slicing api is a rabbit hole that needs allot of care
May 16
On Thursday, May 16, 2024 11:05:56 AM MDT monkyyy via Digitalmars-d wrote:
> On Thursday, 16 May 2024 at 14:56:55 UTC, Jonathan M Davis wrote:
> > explict range starters for unicode instead of autodecoding
>
> ok, thats *half* the problem; how does `hello 😀
> world`.byUnicode(flag.charIndex).indexOf('w')` count correctly?
> The existing range api discards information that is otherwise
> trivial to have
>
> the old solution of dchars and autodecoding failed; whats your proposal for the unicode problem on the *dchar* side of the problem where it was believed that dchar would simplify all use cases of unicode into simple indexing

The new range API is quite explictly _not_ trying to do any explicit Unicode handling. It's doing what has been discussed for years, which is to leave that up to the programmer and whatever algorithms they choose to use. Whether it's best to operate at the code unit, code point, or grapheme level depends on the operation, and issues such as normalization complicate the situation considerably with regards to any attempt to provide a one size fits all solution for indexing into graphemes. It's also not efficient to operate at the grapheme level by default.

The replacement to std.utf will provide the tools to convert from code units to code points, just like std.utf does now, with the difference being that the programmer will no longer have to work around the range API functions to try to prevent decoding from happening automatically.

Code that then needs to operate at the grapheme level will need to use the replacement for std.uni, and that can include a random-access range of graphemes. But it won't be the default way to operate, because that's inefficient, and no normalization scheme actually works in all cases, making selecting a default problematic.

We may choose to include a string type in Phobos which provides an API that allows you to operate at the grapheme level by default, but dynamics arrays of code units will be treated as ranges of code units. The problems that we've had with auto-decoding have stemmed from trying to treat them as anything else.

So, we can build whatever useful algorithms or types we want on top of the built-in strings, and we may be able to provide some better solutions than we currently have for dealing with graphemes, but the built-in strings will not have any kind of special-casing as part of the range API. Any ranges of graphemes will be types of their own and will not be the built-in strings even if they may wrap the built-in strings.

> > 8. $ cannot be used with random-access ranges in generic code, because the range API does not require that a random-access range define opDollar
>
> I believe you should be much much much more spefic
>
> is [min(i,$-1)] supported? is [$..0] supported?
>
> the slicing api is a rabbit hole that needs allot of care

Finite random-access ranges will need to support the same operations with $ that dynamic arrays do, since the whole point here is to access them in the same manner as dynamic arrays, and operations such as [$ .. 0] never make sense, so no, that won't be supported. And no additional syntax for $ is being added. It's purely for indexing and slicing. The current situation makes it so that we can't use $ at all with ranges in generic code, because there is no requirement that it be supported, whereas ideally, we'd be able to use it in the same way that you would with dynamic arrays. So, isRandomAccessRange will test to make sure that [$ - 1] compiles and returns the correct type, and hasSlicing will test to make sure that [0 .. $] and [0 .. $ - 1] compile and return the correct type. It will be an additional requirement that $ be equivalent to length, but we can't statically test for that.

Infinite random-access ranges will not support $ for opIndex, because they can't and be infinite. They will support it for slicing, but they won't be able to support doing arithmetic on $, because that makes no sense with an infinite range. $ will merely be used to indicate that slice is supposed to go to the end of the range (and therefore that the result will be infinite) rather than resulting in a finite slice as would occur if only indices are used.

$ is actually already partially tested for with hasSlicing and infinite ranges where it requires that opSlice return the same range type if opDollar is used, whereas it's expected to return a finite range if only indices are used. However, since there's no requirement that $ work at all, it can't actually be used in generic code even if you know that the range is infinite. It's just that hasSlicing requires that the result be the same if $ is implemented to work with slicing, which is therefore kind of a pointless test as things stand.

- Jonathan M Davis




May 17
On Thursday, 16 May 2024 at 22:21:54 UTC, Jonathan M Davis wrote:
>
> The new range API is quite explictly _not_ trying to do any explicit Unicode handling.

my proposal of adding .key to the range api isnt unicode spefic either

> It's also not efficient to operate at the grapheme level by default.

Anything lazy will be fine; the dstring conversion was slow because it had to allocate eagerly

> and no normalization scheme actually works in all cases, making selecting a default problematic.

Im very much for "range starters"; my example code is wildly verbose for me and had one

but I view it as *only half the problem*; strings index weird, this is trivial to deal with if you have two counters in the same abstraction; I believe if effencly impossible to handle by piling on more complexity

> So, we can build ***whatever useful algorithms*** or types we want on top of the built-in strings


*I disagree*; I believe the majority of searching should be replaced with an "indexes" function

`auto indexes(alias F,R)(R r)=>r.filter!F.swapkeyvalue;`

such a function depends on having a .key concept in the api; and then you have the unicode range have apis for selecting the right kind of key-value for your range and it will all just work

> [$ .. 0] never make sense, so no, that won't be supported.

if I define a complex opDollar, I can easily imagine it being seen as the "indexing" type, when in fact im using it it to slice on a bidirectional datastruct that maybe knows how to handle $/2

```d
struct mydata{
  auto opDollar(){
    struct dollar{
      opBinary!"-"
      opBinary!"/"
```

if the range api makes an assumption that `range.opSlice!(typeof(range.opDollar),typeof(range.opDollar))` this may break unnecessarily; where opDollar is used in opslicing matters, also like add $/2 to the wishlist

>>is [min(i,$-1)] supported?
>_____
while I know such slicing probably isnt on your rader for being to magic I unironically like how well it works for the base slice



May 16
On Thursday, May 16, 2024 7:30:06 PM MDT monkyyy via Digitalmars-d wrote:
> if I define a complex opDollar, I can easily imagine it being seen as the "indexing" type, when in fact im using it it to slice on a bidirectional datastruct that maybe knows how to handle $/2
>
> ```d
> struct mydata{
>    auto opDollar(){
>      struct dollar{
>        opBinary!"-"
>        opBinary!"/"
> ```
>
> if the range api makes an assumption that
> `range.opSlice!(typeof(range.opDollar),typeof(range.opDollar))`
> this may break unnecessarily; where opDollar is used in opslicing
> matters, also like add $/2 to the wishlist

Essentially, that requires that opDollar define the common arithmetic operations (except maybe multiplication). I'll have to think about the edge cases to decide how reasonable that is (though the fact that you can often use a single opBinary for all of the arithmetic operations helps). Whatever the operations are, _every_ finite range that defines indexing or slicing would then have to define all of them in order for generic code to be able to rely on them.

Fortunately, in most cases, opDollar should just be an alias to length, so it may be reasonable, much as it could be annoying for the corner cases.

> >>is [min(i,$-1)] supported?
> >
> >_____
>
> while I know such slicing probably isnt on your rader for being to magic I unironically like how well it works for the base slice

I was not aware that $ could be used anywhere except directly within the indexing or slicing expression (or if I knew, I forgot). I guess that the compiler lets it be used anywhere between the brackets, and since the result is just size_t with arrays, it works just fine with min. A user-defined type would have to implement opCmp for that to work. So having min work would mean requiring opCmp on top of the arithmetic operations. That might be reasonable, but it definitely adds to the complexity of implementing opDollar for types that can't just alias length, though realistically, there aren't going to be many finite ranges which implement opDollar another way.

So, I'll have to think about it further, but whatever the operations are that ranges will be able to use with $ in generic code will be required by the API and documented accordingly (if not, there would be no way to rely on being able to use them in generic code).

- Jonathan M Davis



May 17
On 17/05/2024 4:22 AM, Jonathan M Davis wrote:
> It seems like you're proposing that we start requiring wrapper structs for
> some of the most common kinds of ranges, whereas my proposal would be
> requiring wrapper structs only for rare ranges - ones which are already
> typically generated using helper functions.

Required, yes, automatic yes.

The language could help here, not a pleasant proposal but it would be do-able to have the language auto-wrap into a boxed type. Given ranges are first class in D, it'll be worth it.

```d
import core.attributes : autobox;

struct Map(@autobox T) {
	this(@autobox T val);
}
```

```d
module core.boxing.slice;

struct BoxedSlice(T) {
	T[] slice;
	alias slice this;

	bool empty() {
		return slice.length == 0;
	}
}
```

The reason I am suggesting this is we clearly don't like the solutions being proposed for the range functions for slices. So lets change it to a problem we do like and can solve.
May 17
On Friday, 17 May 2024 at 04:48:24 UTC, Jonathan M Davis wrote:
> Whatever the operations are, _every_ finite range that defines indexing or slicing would then have to define all of them in order for generic code to be able to rely on them.

I disagree; slicing should be separate from indexing (and keys)

a hashmap can trivially provide a key, and reaccess the value, but it probaly cant slice.

randomAccessRanges is a copy and paste with a clean up of c++ with its pointers, we have base types that airnt pointers.

I prefer just a free-for-all of optional functions, pre-emptive compromise: bidirectionaly, key and indexing(by key), numeric slicing; should be independent feature sets
May 17
On Thursday, May 16, 2024 10:48:26 PM MDT Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:
> On 17/05/2024 4:22 AM, Jonathan M Davis wrote:
> > It seems like you're proposing that we start requiring wrapper structs for some of the most common kinds of ranges, whereas my proposal would be requiring wrapper structs only for rare ranges - ones which are already typically generated using helper functions.
>
> Required, yes, automatic yes.
>
> The language could help here, not a pleasant proposal but it would be do-able to have the language auto-wrap into a boxed type. Given ranges are first class in D, it'll be worth it.
>
> ```d
> import core.attributes : autobox;
>
> struct Map(@autobox T) {
>   this(@autobox T val);
> }
> ```
>
> ```d
> module core.boxing.slice;
>
> struct BoxedSlice(T) {
>   T[] slice;
>   alias slice this;
>
>   bool empty() {
>       return slice.length == 0;
>   }
> }
> ```
>
> The reason I am suggesting this is we clearly don't like the solutions being proposed for the range functions for slices. So lets change it to a problem we do like and can solve.

I see nothing positive about having to wrap arrays. Treating arrays as ranges is far more user-friendly than that and has worked very well for us overall. It's just annoying to have to import a module in order to be able to use the range API with arrays, and auto-decoding was a mistake. Other than that, it really hasn't been an issue.

In order to get rid of save, the range API has to change enough that the old basic input ranges don't look like the new forward ranges, so it can't stay identical to what it is now regardless. Changing the names of the range API functions takes care of that, _and_ it allows us to put the new range API functions for dynamic arrays in object.d without breaking a ton of code.

Yes, having to rename the functions is annoying, but it also has the advantage of making it clear whether code is using the updated range API with the adjusted semantics or whether the code was written for the old range API and does not necessarily conform to the new requirements. So, renaming the range API functions fixes multiple problems.

Making it so that arrays must be wrapped to be ranges does nothing to fix the need to at least partially change the range API so that the old basic input ranges don't look like the new forward ranges. So, it doesn't fix the need to either rename the functions or add an otherwise pointless enum to the range API just to indicate that it's not the old range API. And if we're going to rename them, we might as well just rename them all so that the difference is clear, and it gives us an easy way to fix the import problem for treating arrays as ranges.

So, I don't see anything here that wrapping arrays improves. It just causes additional friction, and if you have to import anything to create the wrapper, we're right back to where we are now with having to import std.range.primitives, just with a different module.

I'm also highly skeptical that it would work to convince Walter to add an auto-boxing feature for this. He specifically complained in a recent DLF meeting about how we can't just use arrays as ranges with no need to do anything like import a module. He wants to be able to use the range API with arrays with no friction, and renaming the range API functions gives us a way to do that, whereas wrapping them creates additional friction.

On top of that, alias this is almost always a mistake with generic code. Most generic code tests for specific types, not implicit conversions, and testing for implicit conversions is error-prone, because the implicit conversion doesn't actually happen just because it's tested for, and even if it does happen, because the code forced the conversion, it happens inside the function rather than at the call site, which risks all kinds of issues (not as many with dynamic arrays as with some other types, but implicit conversions and template constraints are typically a very bad combination). So, adding a wrapper type is going to cause all kinds of problems with template constraints and which overloads get used. The whole situation is just much simpler if we don't wrap dynamic arrays.

All in all, I do not understand why you think that wrapping arrays improves anything. From what I can see, it's just purely worse.

Yes, Walter has to be convinced to add first/popFirst/isEmpty to foreach in addition to front/popFront/empty for the proposed range API to work with foreach, but given that it explicitly fixes a problem that he was complaining about, and it's a very simple change, I expect that he'll approve it. But even if he doesn't, and we're forced to do something like put an otherwise pointless enum on the new range API to distinguish new ranges from the old ones, I'd still rather be importing the replacement for std.range.primitives than have to deal with a wrapper type.

- Jonathan M Davis



« First   ‹ Prev
1 2 3