June 14, 2021
On 6/14/21 10:46 AM, Ola Fosheim Grøstad wrote:
> On Monday, 14 June 2021 at 14:29:57 UTC, Steven Schveighoffer wrote:
>> Of course those are invalid, but the C++ compiler will happily use them, and graciously allow memory corruption. And of course, that is the point of D using ranges instead of iterators (maybe he glossed over it because C++ developers take unsafety as a given).
> 
> Apples and oranges.
> 
> Gang of Four defined _iterator_ to be what D and C++ call "ranges", basically a "heavy" object that has builtin knowledge of how to traverse a data-structure. Of course, the term "range" is a misnomer since you can traverse a graph in numerous ways, so you can have a depth-first-iterator, breadth-first-iterator, and indeed have iterators that to randomized walks... so iterator is a better name.
> 
> C++ STL "iterators" are *light-weight* pointer-abstractions. You can create heavy C++ "iterators" that keeps track of where "the end" is and have it throw an exception. C++ does support safe iterators. STL does not.

I'm referring to the talk, where you can use one function (search) to compose any possible "range" from the result that makes sense, but also ones that don't make sense. And D doesn't allow that specifically due to the safety implications.

A C++ "safe" iterator that is essentially a range under the hood doesn't compose any better than D ranges.

-Steve
June 14, 2021
On Monday, 14 June 2021 at 15:12:56 UTC, Steven Schveighoffer wrote:
> I'm referring to the talk, where you can use one function (search) to compose any possible "range" from the result that makes sense, but also ones that don't make sense.

Ok, but some complicated algorithms are sometimes easier to express if you allow the ones that don't make sense (but does not happen).

> A C++ "safe" iterator that is essentially a range under the hood doesn't compose any better than D ranges.

I don't think "C++ iterators" with safety checks essentially are ranges, they are essentially checked pointers.

How would you quickly search an annotated suffix array with D ranges as cursors?


June 14, 2021
On Monday, 14 June 2021 at 15:22:29 UTC, Ola Fosheim Grøstad wrote:
> How would you quickly search an annotated suffix array with D ranges as cursors?

Let me expand on that:

What if you want to do a binary search, but each node in the table has an optimization entry that says how many nodes you should move up or down when you hit that node? (For speeding up the search or some other reason).




June 14, 2021
On 6/14/21 11:22 AM, Ola Fosheim Grøstad wrote:
> On Monday, 14 June 2021 at 15:12:56 UTC, Steven Schveighoffer wrote:
>> I'm referring to the talk, where you can use one function (search) to compose any possible "range" from the result that makes sense, but also ones that don't make sense.
> 
> Ok, but some complicated algorithms are sometimes easier to express if you allow the ones that don't make sense (but does not happen).


True, and matching iterators from separate subranges might be reasonable when they all come from an original container/range. It's still *mostly* possible with cursors, as long as you have a definitive parent range/container to use for range composition.

> 
>> A C++ "safe" iterator that is essentially a range under the hood doesn't compose any better than D ranges.
> 
> I don't think "C++ iterators" with safety checks essentially are ranges, they are essentially checked pointers.

You'd have to restrict them to forward-only. Which may not be a huge issue, if you don't ever go backwards. Or else, also you'd have to store the beginning of the original as well as the end.

> 
> How would you quickly search an annotated suffix array with D ranges as cursors?

dcollections cursors are not iterators, they store an optional reference to one element, and also can tell you whether they belong to a container or not (they are actually ranges of zero or one element, but that's mostly only because it was easy to support).

So you can compose ranges using the original container and the cursors.

In terms of std::search (which is not the style that dcollections supported), you would use it just like C++, but of course, you have to compose the range before passing to a D algorithm which requires ranges.

Like:

```d
auto allEqualRange = container.search(someValue);
auto allBeforeRange = container[container.begin .. mid.begin]; // I wish container.begin could be `^`
auto allAfterRange = container[mid.end .. $];
auto beforeAndEqual = container[container.begin .. mid.end];
auto equalAndAfter = container[mid.begin .. $];
```

Dcollections supported a `find` method, which returned a cursor. Then you can compose your ranges with that. Possibly only redblacktree-based containers with allowed duplicates would match the requirements for `search`.

-Steve
June 14, 2021
On Monday, 14 June 2021 at 15:43:47 UTC, Steven Schveighoffer wrote:
> On 6/14/21 11:22 AM, Ola Fosheim Grøstad wrote:
>> I don't think "C++ iterators" with safety checks essentially are ranges, they are essentially checked pointers.
>
> You'd have to restrict them to forward-only. Which may not be a huge issue, if you don't ever go backwards. Or else, also you'd have to store the beginning of the original as well as the end.

You only need to store a pointer to the container, but it doesn't matter if the datastructure you are looking at only provide offsets that stays within the legal range. Like if you do a binary search on a sorted dictionary and each element/node has offsets that expands to the range of items that starts with the same prefiks.

If we think of random access ranges (e.g. C++) as slices, you can implement binary search with elegance, but what to do if you want move out of the range? Then you end up writing cluttered code. At this point, table pointers ("C++ iterators") make the code less cluttered. So, there is an advantage to "C++ ranges" that can be decomposed into table pointers.

> dcollections cursors are not iterators, they store an optional reference to one element, and also can tell you whether they belong to a container or not (they are actually ranges of zero or one element, but that's mostly only because it was easy to support).
>
> So you can compose ranges using the original container and the cursors.

Yes, that makes it more flexible. So a zero-length range is a pointer to the space between two elements, and a one-length range is a pointer to that one element? That is kinda like "C++ iterators". C++ begin() could be seen as pointing to the space before the first element and end() could be seen as pointing the space after the last element. So in a sense, "C++ iterators" are zero-length "ranges".

Of course, usually begin() is described as pointing to the first element, but that is a matter of wording. It actually makes more sense sometimes to think of "C++ iterators" as being pointers to the space between elements.

June 14, 2021

On Monday, 14 June 2021 at 16:02:18 UTC, Ola Fosheim Grøstad wrote:

>

cluttered code. At this point, table pointers ("C++ iterators") make the code less cluttered. So, there is an advantage to "C++ ranges" that can be decomposed into table pointers.

Of course, the big disadvantage with "C++ iterators" as specc'ed is that it is time-consuming to implement all the requirements. Can easily be 400 lines or more.

Compare this to a Python Generator based on coroutines with yield or C++ stack-less coroutines with co_yield...

June 14, 2021
On 6/14/21 12:02 PM, Ola Fosheim Grøstad wrote:
> On Monday, 14 June 2021 at 15:43:47 UTC, Steven Schveighoffer wrote:
>> On 6/14/21 11:22 AM, Ola Fosheim Grøstad wrote:

>> dcollections cursors are not iterators, they store an optional reference to one element, and also can tell you whether they belong to a container or not (they are actually ranges of zero or one element, but that's mostly only because it was easy to support).
>>
>> So you can compose ranges using the original container and the cursors.
> 
> Yes, that makes it more flexible. So a zero-length range is a pointer to the space between two elements, and a one-length range is a pointer to that one element? That is kinda like "C++ iterators". C++ begin() could be seen as pointing to the space before the first element and end() could be seen as pointing the space after the last element. So in a sense, "C++ iterators" are zero-length "ranges".

Yes, this is how dcollections worked. Except dcollections cursors were slightly more tied to an element than C++ iterators. In fact, re-examining the implementation, I made some mistakes which I never addressed.

My concept was that a cursor points either an element or *one past* the element. So popFront'ing a cursor doesn't move past the element to the next element, it moves to the space beyond the element. The awesome thing this enables is that you have a choice of both before and after the element, without having to refer to the next element at all (and things like rearranging or inserting elements do not affect the cursor's viability).

However, in particular the red-black-tree based sets/maps just used the Node pointer as-is for slicing, regardless of whether it was empty or not. Which is very much not correct according to the concept. But I haven't touched the code or used it in over a decade, so I'm not sure if I'll ever fix that.

> 
> Of course, usually begin() is described as pointing to the first element, but that is a matter of wording. It actually makes more sense sometimes to think of "C++ iterators" as being pointers to the space between elements.
> 

Yes, it's pointing at the boundaries. That's what makes them composable.

-Steve
June 14, 2021
On Monday, 14 June 2021 at 16:29:09 UTC, Steven Schveighoffer wrote:
> However, in particular the red-black-tree based sets/maps just used the Node pointer as-is for slicing, regardless of whether it was empty or not. Which is very much not correct according to the concept. But I haven't touched the code or used it in over a decade, so I'm not sure if I'll ever fix that.

Was that because it was harder to do for red-black-trees?

I have been toying with chunked containers that allows fast transfers of ranges of objects from one container type to another.

The basic idea being that you could have a queue and then quickly transfer the whole queue to a B-tree or maybe the first 100 objects of the queue to the middle of a deque.

But it is challenging to come up with a chunk type that works equally well for many container types. So I limit it to sequential container types. Then have links between the chunks (so basically a linked list of small arrays). That way I can have one iterator type that works for all container types.

I would also like to make it possible to have table pointers that doesn't become invalidated when moving a chunk, but how to do this if they point to the space between elements and not the element itself? Like if it points to the space between two chunks and you move the chunk after to a new container then the table pointer would move to the new container as well (or the implementation would be inefficient).

Seems like the semantics of pointing to elements instead of the space between elements is more robust for this kind of container semantics. So, when you mutate then the distinction between pointing to or between elements become significant.

Also, it then becomes important to figure out whether table pointers point to one specific container or the content and how the table pointers can be implemented efficiently (time and space).


June 14, 2021
On 6/14/21 1:00 PM, Ola Fosheim Grøstad wrote:
> On Monday, 14 June 2021 at 16:29:09 UTC, Steven Schveighoffer wrote:
>> However, in particular the red-black-tree based sets/maps just used the Node pointer as-is for slicing, regardless of whether it was empty or not. Which is very much not correct according to the concept. But I haven't touched the code or used it in over a decade, so I'm not sure if I'll ever fix that.
> 
> Was that because it was harder to do for red-black-trees?

No, I think it was just a mistake of design.

The end node in a RBT is always the root (everything is based on a left child), which makes it O(1) always to fetch it. But it's not a true element, there is no data stored there. So the easiest thing to do was to return a cursor with `end` as the node, and `false` for whether the cursor had data. Which probably lead to just ignoring the `hasData` boolean.

But it should have been basically `cursor(_end.prev, false)` for the result of `end()`, and the slice operator would have to take into account which "space" a cursor was pointing at.

> 
> I have been toying with chunked containers that allows fast transfers of ranges of objects from one container type to another.
> 
> The basic idea being that you could have a queue and then quickly transfer the whole queue to a B-tree or maybe the first 100 objects of the queue to the middle of a deque.
> 
> But it is challenging to come up with a chunk type that works equally well for many container types. So I limit it to sequential container types. Then have links between the chunks (so basically a linked list of small arrays). That way I can have one iterator type that works for all container types.

Oh for sure. Generalizing the relationships that nodes have would be pretty difficult. Trees need 2 pointers at least (possibly a parent pointer as well), whereas a linked list only needs one (if it's not bidirectional).

> I would also like to make it possible to have table pointers that doesn't become invalidated when moving a chunk, but how to do this if they point to the space between elements and not the element itself? Like if it points to the space between two chunks and you move the chunk after to a new container then the table pointer would move to the new container as well (or the implementation would be inefficient).
> 
> Seems like the semantics of pointing to elements instead of the space between elements is more robust for this kind of container semantics. So, when you mutate then the distinction between pointing to or between elements become significant.

Unless you allocate nodes for the actual spaces, you have to deal with this kind of stuff. I felt like pointing at an element, and then storing whether that data is valid or not was a pretty reasonable way to have element pointers without safety problems.

Thinking about it some more, it may have been a conscious decision not to care about the "valid" boolean at first, since I wasn't focused on implementing the cursors as ranges at all.

But if I did it again today, this is exactly what I would do.

My original use case for making dcollections was to have a fast-lookup map (using RBTree), and be able to remove items as I iterate it the tree. Using ranges for this just doesn't make sense in the general case, but using cursors works well.

-Steve
June 14, 2021
On 6/14/21 7:29 AM, Steven Schveighoffer wrote:

> But in reality, there are 6 more:
>
> last .. first
> last .. mid.begin()
> .... // etc
>
> Of course those are invalid, but the C++ compiler will happily use them

Add to that pairs of iterators made from different containers. :)

  last_of_apples .. first_of_oranges

Ali