Thread overview
medianOfMedians
Jan 20, 2016
Jack Stouffer
Jan 20, 2016
Timon Gehr
Jan 20, 2016
Ilya
January 19, 2016
I've seldom have code write itself so beautifully. Which, of course, means it needs to be destroyed. https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei
January 20, 2016
On Wednesday, 20 January 2016 at 01:20:11 UTC, Andrei Alexandrescu wrote:
> I've seldom have code write itself so beautifully. Which, of course, means it needs to be destroyed. https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei

I left some notes about some minor tweaks that I think should be made, but overall it looks good!
January 20, 2016
On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
> I've seldom have code write itself so beautifully. Which, of course,
> means it needs to be destroyed.
> https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei

This is not an implementation of the median of medians algorithm.

int[] bad(int n){
    if(n<=5){ return iota(n).array; }
    auto next=bad(n/5);
    return next.map!(a=>[a-2,a-1,a,a+n,a+n+1]).join;
}

void main(){
    import std.stdio;
    auto a=bad(5^^10);
    auto idx=medianOfMedians(a);
    double notLarger=a.count!(x=>x<=a[idx])/(1.0*a.length);
    writeln(notLarger); // 0.0100766
    assert(notLarger>=0.3); // fail
}

The real algorithm works by computing an /exact/ median of the medians and uses it as the pivot value for quickselect in order to compute the precise median of the full array. In the given implementation, the approximations accumulate with each recursive invocation and no constant guarantees can be given on the percentile of the result.
January 20, 2016
On Wednesday, 20 January 2016 at 02:26:35 UTC, Timon Gehr wrote:
> On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
>> [...]
>
> This is not an implementation of the median of medians algorithm.
>
> [...]

The approximate medianOfMedians algorithm can be used for topN. --Ilya
January 19, 2016
On 01/19/2016 09:26 PM, Timon Gehr wrote:
> On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
>> I've seldom have code write itself so beautifully. Which, of course,
>> means it needs to be destroyed.
>> https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei
>
> This is not an implementation of the median of medians algorithm.
>
> int[] bad(int n){
>      if(n<=5){ return iota(n).array; }
>      auto next=bad(n/5);
>      return next.map!(a=>[a-2,a-1,a,a+n,a+n+1]).join;
> }
>
> void main(){
>      import std.stdio;
>      auto a=bad(5^^10);
>      auto idx=medianOfMedians(a);
>      double notLarger=a.count!(x=>x<=a[idx])/(1.0*a.length);
>      writeln(notLarger); // 0.0100766
>      assert(notLarger>=0.3); // fail
> }
>
> The real algorithm works by computing an /exact/ median of the medians
> and uses it as the pivot value for quickselect in order to compute the
> precise median of the full array. In the given implementation, the
> approximations accumulate with each recursive invocation and no constant
> guarantees can be given on the percentile of the result.

Thanks. Urgh. So the approximate median part cannot be separated from partitioning. It was suspiciously cute :o). Back to the drawing board. -- Andrei

January 20, 2016
On 01/19/2016 09:26 PM, Timon Gehr wrote:
> On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
>> I've seldom have code write itself so beautifully. Which, of course,
>> means it needs to be destroyed.
>> https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei
>
> This is not an implementation of the median of medians algorithm.
[snip]

Thanks again for sharing your insight. FWIW there's a bit of variation floating on the Net regarding MoM. The Wikipedia article at https://en.wikipedia.org/wiki/Median_of_medians moves the medians of five to the beginning of the array (my implementation uses stride(), thus trading computation for data movement). I'm unclear on which approach is generally better.

http://austinrochford.com/posts/2013-10-28-median-of-medians.html does not mention the mutual recursion, suggesting (at least in a cursory reading) my wishy-washy previous implementation that doesn't use quickselect.

https://www.ics.uci.edu/~eppstein/161/960130.html only uses one recursive function, not two.

The original PICK algorithm at https://people.csail.mit.edu/rivest/pubs/BFPRT73.pdf only uses one recursive function.

Anyhow, I've implemented the two-functions version at https://github.com/D-Programming-Language/phobos/pull/3938. I'll next try whether the one-function version is just as good or better. Destroy?


Andrei

January 20, 2016
On 01/20/2016 10:20 AM, Andrei Alexandrescu wrote:
[snip]

And btw I now understand better why medianOfMedians is not so fast in practice. In fact my wishy-washy version at https://github.com/andralex/phobos/commit/9e004c35b824aac108e0e615183065e73384e9f9 seems to be practically attractive for choosing a good pivot even though it doesn't offer theoretical guarantees. I wonder how some jitter can be injected into it so as to improve its worst-case performance.

Andrei