| Thread overview | |||||||||
|---|---|---|---|---|---|---|---|---|---|
|
January 19, 2016 medianOfMedians | ||||
|---|---|---|---|---|
| ||||
I've seldom have code write itself so beautifully. Which, of course, means it needs to be destroyed. https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei | ||||
January 20, 2016 Re: medianOfMedians | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Wednesday, 20 January 2016 at 01:20:11 UTC, Andrei Alexandrescu wrote:
> I've seldom have code write itself so beautifully. Which, of course, means it needs to be destroyed. https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei
I left some notes about some minor tweaks that I think should be made, but overall it looks good!
| |||
January 20, 2016 Re: medianOfMedians | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote: > I've seldom have code write itself so beautifully. Which, of course, > means it needs to be destroyed. > https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei This is not an implementation of the median of medians algorithm. int[] bad(int n){ if(n<=5){ return iota(n).array; } auto next=bad(n/5); return next.map!(a=>[a-2,a-1,a,a+n,a+n+1]).join; } void main(){ import std.stdio; auto a=bad(5^^10); auto idx=medianOfMedians(a); double notLarger=a.count!(x=>x<=a[idx])/(1.0*a.length); writeln(notLarger); // 0.0100766 assert(notLarger>=0.3); // fail } The real algorithm works by computing an /exact/ median of the medians and uses it as the pivot value for quickselect in order to compute the precise median of the full array. In the given implementation, the approximations accumulate with each recursive invocation and no constant guarantees can be given on the percentile of the result. | |||
January 20, 2016 Re: medianOfMedians | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Timon Gehr | On Wednesday, 20 January 2016 at 02:26:35 UTC, Timon Gehr wrote:
> On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
>> [...]
>
> This is not an implementation of the median of medians algorithm.
>
> [...]
The approximate medianOfMedians algorithm can be used for topN. --Ilya
| |||
January 19, 2016 Re: medianOfMedians | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Timon Gehr | On 01/19/2016 09:26 PM, Timon Gehr wrote:
> On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote:
>> I've seldom have code write itself so beautifully. Which, of course,
>> means it needs to be destroyed.
>> https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei
>
> This is not an implementation of the median of medians algorithm.
>
> int[] bad(int n){
> if(n<=5){ return iota(n).array; }
> auto next=bad(n/5);
> return next.map!(a=>[a-2,a-1,a,a+n,a+n+1]).join;
> }
>
> void main(){
> import std.stdio;
> auto a=bad(5^^10);
> auto idx=medianOfMedians(a);
> double notLarger=a.count!(x=>x<=a[idx])/(1.0*a.length);
> writeln(notLarger); // 0.0100766
> assert(notLarger>=0.3); // fail
> }
>
> The real algorithm works by computing an /exact/ median of the medians
> and uses it as the pivot value for quickselect in order to compute the
> precise median of the full array. In the given implementation, the
> approximations accumulate with each recursive invocation and no constant
> guarantees can be given on the percentile of the result.
Thanks. Urgh. So the approximate median part cannot be separated from partitioning. It was suspiciously cute :o). Back to the drawing board. -- Andrei
| |||
January 20, 2016 Re: medianOfMedians | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Timon Gehr | On 01/19/2016 09:26 PM, Timon Gehr wrote: > On 01/20/2016 02:20 AM, Andrei Alexandrescu wrote: >> I've seldom have code write itself so beautifully. Which, of course, >> means it needs to be destroyed. >> https://github.com/D-Programming-Language/phobos/pull/3938 -- Andrei > > This is not an implementation of the median of medians algorithm. [snip] Thanks again for sharing your insight. FWIW there's a bit of variation floating on the Net regarding MoM. The Wikipedia article at https://en.wikipedia.org/wiki/Median_of_medians moves the medians of five to the beginning of the array (my implementation uses stride(), thus trading computation for data movement). I'm unclear on which approach is generally better. http://austinrochford.com/posts/2013-10-28-median-of-medians.html does not mention the mutual recursion, suggesting (at least in a cursory reading) my wishy-washy previous implementation that doesn't use quickselect. https://www.ics.uci.edu/~eppstein/161/960130.html only uses one recursive function, not two. The original PICK algorithm at https://people.csail.mit.edu/rivest/pubs/BFPRT73.pdf only uses one recursive function. Anyhow, I've implemented the two-functions version at https://github.com/D-Programming-Language/phobos/pull/3938. I'll next try whether the one-function version is just as good or better. Destroy? Andrei | |||
January 20, 2016 Re: medianOfMedians | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 01/20/2016 10:20 AM, Andrei Alexandrescu wrote: [snip] And btw I now understand better why medianOfMedians is not so fast in practice. In fact my wishy-washy version at https://github.com/andralex/phobos/commit/9e004c35b824aac108e0e615183065e73384e9f9 seems to be practically attractive for choosing a good pivot even though it doesn't offer theoretical guarantees. I wonder how some jitter can be injected into it so as to improve its worst-case performance. Andrei | |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply