Thread overview
Exposing pred from SortedRange and changes to algorithms that assumeSorted
Jan 12, 2018
aliak
Jan 12, 2018
Seb
Jan 12, 2018
aliak
January 12, 2018
Would it be an acceptable enhancement to phobos to expose the predicate in SortedRange (https://dlang.org/library/std/range/sorted_range.html)?

The rationale behind it would be so that functions like setDifference (https://dlang.org/library/std/algorithm/setops/set_difference.html) or any function that requires a sorted range to "run" does not need to be provided the sorting predicate, ie:

setDifference
largestPartialIntersection
largestPartialIntersectionWeighted
multiwayUnion
setIntersection
setSymmetricDifference
setUnion
multiwayMerge

Where functions do required ranges to be sorted and a predicate passed in, this enhancement should reduce programmer errors (i.e. providing the wrong predicate to the algorithm that was used to sort the range that you want to operate on). The sortedrange knows how it was sorted, so there should be no need to duplicate that (unless there are maybe some valid reasons?).

It would also make for SortedRange itself to be made lighter (maybe not a good idea at this point but it can be done at least) by moving it's functionality in to algorithms that are already there. Eg, "SortedRange.contains" can be done by algorithm.canFind:

auto canFind(Range, T)(Range range, T element) if (isSortedRange!Range) {
  // do binary search for element
}

PS: Is this the correct place for posts like this? Apologies if not :s

January 12, 2018
On Friday, 12 January 2018 at 09:52:36 UTC, aliak wrote:
> Would it be an acceptable enhancement to phobos to expose the predicate in SortedRange (https://dlang.org/library/std/range/sorted_range.html)?
>
> The rationale behind it would be so that functions like setDifference (https://dlang.org/library/std/algorithm/setops/set_difference.html) or any function that requires a sorted range to "run" does not need to be provided the sorting predicate, ie:
>
> setDifference
> largestPartialIntersection
> largestPartialIntersectionWeighted
> multiwayUnion
> setIntersection
> setSymmetricDifference
> setUnion
> multiwayMerge
>
> Where functions do required ranges to be sorted and a predicate passed in, this enhancement should reduce programmer errors (i.e. providing the wrong predicate to the algorithm that was used to sort the range that you want to operate on). The sortedrange knows how it was sorted, so there should be no need to duplicate that (unless there are maybe some valid reasons?).
>
> It would also make for SortedRange itself to be made lighter (maybe not a good idea at this point but it can be done at least) by moving it's functionality in to algorithms that are already there. Eg, "SortedRange.contains" can be done by algorithm.canFind:
>
> auto canFind(Range, T)(Range range, T element) if (isSortedRange!Range) {
>   // do binary search for element
> }

canFind uses find internally, which already has a shortcut for SortedRange.
I don't like contains either, but the idea was to have a separate method with different performance guarantees as canFind is typically O(n).
Anyways I have tried to deprecate it:

https://github.com/dlang/phobos/pull/5651

Maybe you have better arguments?

----

Now to your main question:

Exposing doesn't help much, because you can't compare them, but there is WIP on lambda comparisons:

https://github.com/dlang/dmd/pull/7484

With it, the default lamdas can be compared for equality and what you want to do can/will be done.

> PS: Is this the correct place for posts like this? Apologies if not :s

Yeah I think it is.
January 12, 2018
On Friday, 12 January 2018 at 10:53:04 UTC, Seb wrote:
> canFind uses find internally, which already has a shortcut for SortedRange.
> I don't like contains either, but the idea was to have a separate method with different performance guarantees as canFind is typically O(n).
> Anyways I have tried to deprecate it:
>
> https://github.com/dlang/phobos/pull/5651
>
> Maybe you have better arguments?

Ah I see. Nah I don't have better arguments :p Think yours were good enough. If I understood correctly the argument to keeping contains is because it means "here is a function that searches logarithmically"? Is that correct? While I do agree "contains" indicates some kind of quick(er) check functionality over an algorithm with the word find in it, I can't quite elaborate why that's the case, but I think it only applies when there's context, and not "generically", and because I've used hash maps a lot. The problem is that it's not enforceable and hence cannot be depended upon, so I don't think it's a good argument in the current scenario. A better convention would be to have the indiction of complexity explicitly in the name, i.e. name it "locagirhtmicFind" or "binarySearch", etc, and have that on a SortedRange. But I assume that ship has sailed...

Tried looking for other languages which differentiate between find/search/exists/contains/etc based on complexity and I don't think there are any? Are there?

Swift does "contains"
C++ does "find"
Rust does "contains" (and find??)
Julia does "searchsorted"

None of them make any complexity promises, the only one I can look at and would be surprised if it didn't do a faster search is the Julia one.

>
> ----
>
> Now to your main question:
>
> Exposing doesn't help much, because you can't compare them, but there is WIP on lambda comparisons:
>
> https://github.com/dlang/dmd/pull/7484
>
> With it, the default lamdas can be compared for equality and what you want to do can/will be done.

Aha ok so basically you're saying that even if pred was public then I can't make sure R1.pred == R2.pred so it does't make it a 100% guarantee right? But regardless of that wouldn't making pred public be needed to do the above anyway? I.e. if an algorithm wants to make sure the SortedRange preds were the same it still needs to access them right? And if the algorithms are single ranged, then you don't need to compare any lambdas then - i.e. canFind just uses whatever pred was in SortedRange.

Or did I maybe misunderstand?

Cheers