Thread overview
std.container.rbtree as Interval Tree?
Feb 05
RazvanN
February 04
I tried to implement an interval tree backed by std.container.rbtree today and fell flat.

A standard way to make an interval tree is to make an augmented tree; I supposed since rbtree was a generic container and because I could define opCmp, this should be a cinch. I ran into two problems.

First (minor problem), RedBlackTree considers a return value of 0 from opCmp equivalent to "equals", which is discordant with the guidance given for opCmp.[1] This is a  minor pedantic point, since you cannot store un-orderable elements in the tree anyway.

More importantly, though, I cannot figure out how to implement an interval query function on the tree. Typically, the tree would have a "key" that is the left position of the interval and the augmented part of the tree would be that a second value -- a right, or end, position. My Elem == key type is a struct encapsulating both of these (start, end; plus some metadata).

For my Interval element type, I overloaded opCmp to take an integer, but unfortunately RedBlackTree's upperBound() and lowerBound() functions are defined in terms of "Elem" which is aliased to the contained element type, rather than a generic type.

Q1: Is there a simple or elegant way to do this without re-implementing RedBlackTree? I apologize if it is obvious and I am missing it. I suppose it may work if I rewrite Interval's opCmp to not consider the upper bound, and by creating a dummy interval to pass to upperBound and lowerBound, but that seems inelegant compared to passing an integer.

Q2: Would replacing "Elem" with a generic type "T" in the function signatures for upperBound, lowerBound, and various related fns like _firstGreater / _firstGreaterEqual solve this problem?

James

[1] https://dlang.org/spec/operatoroverloading.html#eqcmp ("For example ... x and y are disjoint sets, then neither x < y nor y < x holds, but that does not imply that x == y. Thus, it is insufficient to determine equality purely based on opCmp alone. ")
February 05
On Monday, 4 February 2019 at 22:54:01 UTC, James Blachly wrote:
> I tried to implement an interval tree backed by std.container.rbtree today and fell flat.
>
> [...]
You can use alias this [1] in your interval element type:

struct IntervalElem
{
    size_t start, end;
    /* ... other declarations */
    alias start this;
}

[1] https://dlang.org/spec/class.html#AliasThis


> Q2: Would replacing "Elem" with a generic type "T" in the function signatures for upperBound, lowerBound, and various related fns like _firstGreater / _firstGreaterEqual solve this problem?
>
> [...]
Elem is already a generic type. I don't know how you can make it more generic without adding other template parameters to the class declaration (which means reimplementing RBTree from std.container);

> James
>
> [1] https://dlang.org/spec/operatoroverloading.html#eqcmp ("For example ... x and y are disjoint sets, then neither x < y nor y < x holds, but that does not imply that x == y. Thus, it is insufficient to determine equality purely based on opCmp alone. ")
February 05
On Monday, 4 February 2019 at 22:54:01 UTC, James Blachly wrote:
> I tried to implement an interval tree backed by std.container.rbtree today and fell flat.
>
> A standard way to make an interval tree is to make an augmented tree; I supposed since rbtree was a generic container and because I could define opCmp, this should be a cinch. I ran into two problems.
>
> First (minor problem), RedBlackTree considers a return value of 0 from opCmp equivalent to "equals", which is discordant with the guidance given for opCmp.[1] This is a  minor pedantic point, since you cannot store un-orderable elements in the tree anyway.
>
> More importantly, though, I cannot figure out how to implement an interval query function on the tree. Typically, the tree would have a "key" that is the left position of the interval and the augmented part of the tree would be that a second value -- a right, or end, position. My Elem == key type is a struct encapsulating both of these (start, end; plus some metadata).
>
> For my Interval element type, I overloaded opCmp to take an integer, but unfortunately RedBlackTree's upperBound() and lowerBound() functions are defined in terms of "Elem" which is aliased to the contained element type, rather than a generic type.
>
> Q1: Is there a simple or elegant way to do this without re-implementing RedBlackTree? I apologize if it is obvious and I am missing it. I suppose it may work if I rewrite Interval's opCmp to not consider the upper bound, and by creating a dummy interval to pass to upperBound and lowerBound, but that seems inelegant compared to passing an integer.
>
> Q2: Would replacing "Elem" with a generic type "T" in the function signatures for upperBound, lowerBound, and various related fns like _firstGreater / _firstGreaterEqual solve this problem?
>
> James
>
> [1] https://dlang.org/spec/operatoroverloading.html#eqcmp ("For example ... x and y are disjoint sets, then neither x < y nor y < x holds, but that does not imply that x == y. Thus, it is insufficient to determine equality purely based on opCmp alone. ")

I think you are making a slight confusion. Your `Interval` struct and the `Elem` type that `lowerBound` takes, are the same type.

You can define your RBTree and Interval as follows
```
struct Interval
{
    int start;
    int end;
}

alias IntervalTree = RedBlackTree!(Interval, (i1, i2) => i1.start < i2.start);
```

Please see this runable example: https://run.dlang.io/is/4cPTik

The in-order traversal will be the same as the wikipedia example here: https://en.wikipedia.org/wiki/Interval_tree

Hope this helps.

Cheers,
Edi
February 05
On Tuesday, 5 February 2019 at 10:10:44 UTC, RazvanN wrote:
> On Monday, 4 February 2019 at 22:54:01 UTC, James Blachly wrote:
>> I tried to implement an interval tree backed by std.container.rbtree today and fell flat.
>>
>> [...]
> You can use alias this [1] in your interval element type:
>
> struct IntervalElem
> {
>     size_t start, end;
>     /* ... other declarations */
>     alias start this;
> }
>
> [1] https://dlang.org/spec/class.html#AliasThis


Thanks -- I always seem to forget about `alias this` !
However, I don't think that this helps me to call functions expecting type Elem with an integer. At least, it failed in my test -- could this be because Elem itself is already an alias?

>> Q2: Would replacing "Elem" with a generic type "T" in the function signatures for upperBound, lowerBound, and various related fns like _firstGreater / _firstGreaterEqual solve this problem?
>>
>> [...]
> Elem is already a generic type. I don't know how you can make it more generic without adding other template parameters to the class declaration (which means reimplementing RBTree from std.container);

Agree, (although I think I would only need to revise only perhaps 25% of the module in this case) but I definitely wanted to avoid this if possible.


February 05
On Tuesday, 5 February 2019 at 16:24:03 UTC, Eduard Staniloiu wrote:
> I think you are making a slight confusion. Your `Interval` struct and the `Elem` type that `lowerBound` takes, are the same type.
>
> You can define your RBTree and Interval as follows
> ```
> struct Interval
> {
>     int start;
>     int end;
> }
>
> alias IntervalTree = RedBlackTree!(Interval, (i1, i2) => i1.start < i2.start);
> ```
>
> Please see this runable example: https://run.dlang.io/is/4cPTik
>
> The in-order traversal will be the same as the wikipedia example here: https://en.wikipedia.org/wiki/Interval_tree
>
> Hope this helps.
>
> Cheers,
> Edi


Edi, thanks for your quick reply!

I do understand that Elem is aliased to my Interval type.

Your suggested rewrite of the less fn is precisely what I was groaning about (although not explicitly) in terms of rewriting opCmp to be a simple `this.start < other.start`. The reason that this is undesirable is that distinct intervals with same starting coordinates are then considered equal and not added, unless RBTree tepmlate is instantiated with allowDuplicates. However, even when allowing (pseudo)duplicates, this means the distinct intervals with same start but different end coordinates are not deterministically placed/sorted within the tree, because they are not sortable with the simple `this.start < other.start` less function.

Anyway, in the end, I will assume that in my problem domain there are no degenerate intervals with same start coordinate and use the simpler less to accomplish goal.

Hopefully the upperBound and lowerBound functions are lazy...
February 07
On Tuesday, 5 February 2019 at 19:12:43 UTC, James Blachly wrote:
>
>
> However, even when allowing (pseudo)duplicates, this means the distinct intervals with same start but different end coordinates are not deterministically placed/sorted within the tree, because they are not sortable with the simple `this.start < other.start` less function.
>

I have updated the example with an `opCmp` implementation.
https://run.dlang.io/is/ailnsy

I believe this is what you are looking for.

Cheers,
Edi