November 18, 2013 Re: Worst-case performance of quickSort / getPivot
Posted in reply to Andrei Alexandrescu

On Monday, 18 November 2013 at 02:36:40 UTC, Andrei Alexandrescu wrote:
> I think that's a terrific start!

Thanks :)

> (Not sure I understand where the 30 minutes come from...)

On my computer (`-release -inline -noboundscheck` ... `-O` is omitted because it removes the summation altogether since it isn't used anywhere):

---
auto benchmarkSumming()
{
    auto proportions = iota(150_000);
    // copied from std.random.dice:
    auto sum = reduce!("(assert(b >= 0), a + b)")(0.0, proportions.save);
}

auto bench = benchmark!benchmarkSumming(1_000);
writeln(bench.front.msecs, " ms");
---

Gives me 1618 ms. ~150_000 is the number of entries in the probability table (the number of surnames), and generating the sum for 1_000 names would take about that long. I really wanted 1 million names to sort, so scaling that up would mean about 1618 seconds, which is nearly 27 minutes. That summation doesn't change, hence the importance of caching it for repeated runs. That's about 27 minutes of completely wasted work.

That all isn't to say that std.random.dice is fundamentally flawed (it's fine and perfectly suited for one-shot dice rolls and is, in fact, a bit better than a DiceRange for that task), but a DiceRange is really needed to do repeated rolls like this efficiently.

Jeez, we're talking about maximizing sorting efficiency, and now I've gone off talking about making dice rolls faster too! :)
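For context, the caching idea described above amounts to something like the following sketch. It is only an illustration (the name `CachedDice` and its interface are invented here, not taken from the thread): the O(n) summation is paid once in the constructor, and each subsequent roll is just an O(log n) binary search over the cached cumulative sums.

---
import std.random : uniform, Random, unpredictableSeed;
import std.range : assumeSorted;

// Hypothetical sketch: pay the O(n) summation once, then each roll is an
// O(log n) binary search over the cached cumulative sums.
struct CachedDice
{
    private double[] _cumSum;   // cumulative proportions, computed once
    private Random _rng;

    this(in double[] proportions)
    {
        _cumSum = new double[proportions.length];
        double running = 0.0;
        foreach (i, p; proportions)
        {
            assert(p >= 0);
            running += p;
            _cumSum[i] = running;
        }
        _rng = Random(unpredictableSeed);
    }

    /// Index of the proportion chosen by one roll.
    size_t roll()
    {
        immutable point = uniform(0.0, _cumSum[$ - 1], _rng);
        auto sorted = assumeSorted(_cumSum);
        // number of entries not strictly greater than `point`
        return _cumSum.length - sorted.upperBound(point).length;
    }
}
---

With ~150_000 proportions, each roll then costs roughly 17 comparisons instead of a full pass over the table.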
November 18, 2013 Fast dice rolls
Posted in reply to bearophile

On Monday, 18 November 2013 at 03:10:22 UTC, bearophile wrote:
> See also:
> http://d.puremagic.com/issues/show_bug.cgi?id=5849
>
> Bye,
> bearophile

Nice one, bearophile. You called it. I've done a bit of testing with the solution you attached to that bug report. It's 10-50% faster than my solution, depending on how many proportions are provided (I've tested as many as 500_000). Despite mine being O(log(n)) to find the next dice roll and your solution being O(1), there really isn't that much of a difference. Strangely (to me), your solution actually seems to grow faster than mine as the number of proportions grows larger, despite the proof claiming it should grow slower (although testing for > 500_000 seems pointless until actual data demands it).

As far as integer proportions go, I think my solution might be better overall because it's effectively equivalent to the current dice implementation (meaning it could be used as a near drop-in replacement and get precisely the same results, as long as the proportions are floating-point values) and the code is simpler and avoids hiding any real bugs. (That said, the original implementation has a bug that was easy to spot on my second comb-through.)

---
// in popFront:
// Return the number of elements that are not strictly greater than point
_front = _propCumSumSorted.length - _propCumSumSorted.upperBound!pol(point).length;
---

On the other hand, a cumulative-sum-array approach (like mine and the std.random.dice implementation) will likely suffer major precision issues with floating-point numbers, especially if the first proportions are very large and the later proportions are very small. I haven't looked closely through the algorithm you're using to see whether it fares better, but I can't imagine it doing much worse.
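The attachment on issue 5849 is not reproduced in the thread. The classic way to obtain O(1) rolls after an O(n) setup is the Walker/Vose alias method, sketched below purely to illustrate the kind of structure being compared against the O(log n) cumulative-sum search; the name `AliasDice` and all details here are illustrative and not necessarily the bug report's code.

---
import std.algorithm.iteration : map, sum;
import std.array : array;
import std.random : uniform, Random, unpredictableSeed;

// Illustrative alias-method table: O(n) setup, O(1) per roll.
struct AliasDice
{
    private double[] _prob;    // acceptance threshold per bucket
    private size_t[] _alias;   // fallback bucket when the threshold fails
    private Random _rng;

    this(in double[] proportions)
    {
        immutable n = proportions.length;
        immutable total = proportions.sum;
        auto scaled = proportions.map!(p => p * n / total).array;

        _prob = new double[n];
        _alias = new size_t[n];
        size_t[] small, large;
        foreach (i, s; scaled)
        {
            if (s < 1.0) small ~= i;
            else large ~= i;
        }

        // Pair each under-full bucket with an over-full one.
        while (small.length && large.length)
        {
            immutable l = small[$ - 1]; small = small[0 .. $ - 1];
            immutable g = large[$ - 1]; large = large[0 .. $ - 1];
            _prob[l] = scaled[l];
            _alias[l] = g;
            scaled[g] = (scaled[g] + scaled[l]) - 1.0;
            if (scaled[g] < 1.0) small ~= g;
            else large ~= g;
        }
        foreach (i; small) _prob[i] = 1.0;
        foreach (i; large) _prob[i] = 1.0;

        _rng = Random(unpredictableSeed);
    }

    /// One roll: pick a bucket uniformly, then accept it or take its alias.
    size_t roll()
    {
        immutable i = uniform(0, _prob.length, _rng);
        return uniform(0.0, 1.0, _rng) < _prob[i] ? i : _alias[i];
    }
}
---

The setup still walks the proportions once, so for a single one-shot roll the plain std.random.dice remains the simpler choice; the payoff is only for repeated rolls, which is the trade-off being discussed above.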
November 29, 2013 Re: Worst-case performance of quickSort / getPivot
Posted in reply to Vladimir Panteleev

On Friday, 15 November 2013 at 21:46:26 UTC, Vladimir Panteleev wrote:
> ...

I've made a pull request to fix this issue:

https://github.com/D-Programming-Language/phobos/pull/1735
November 29, 2013 Re: Worst-case performance of quickSort / getPivot
Posted in reply to Xinok

On Friday, 29 November 2013 at 01:47:49 UTC, Xinok wrote:
> I've made a pull request to fix this issue:
>
> https://github.com/D-Programming-Language/phobos/pull/1735
I've left some comments on GitHub.
The greatest concern IMO is that the new implementation relies on assignments rather than swaps. It would be a regression to disallow sorting of entities with disabled assignment.
Ivan Kazmenko.
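To make the swap-versus-assignment concern concrete, here is a small hypothetical example (not code from the pull request): a type with `@disable`d assignment can still be exchanged with `swap`, but any sorting step written as a plain assignment will not compile for it.

---
import std.algorithm.mutation : swap;

// Hypothetical element type: assignment is disabled, swapping is not.
struct NoAssign
{
    int key;
    @disable void opAssign(NoAssign);
}

void main()
{
    auto a = NoAssign(1);
    auto b = NoAssign(2);

    static assert(!__traits(compiles, a = b)); // assignment-based shifting is rejected
    swap(a, b);                                // swap still works (bitwise exchange)
    assert(a.key == 2 && b.key == 1);
}
---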
December 04, 2013 Re: Worst-case performance of quickSort / getPivot
Posted in reply to Xinok

On Sunday, 17 November 2013 at 03:14:41 UTC, Xinok wrote:
> On Sunday, 17 November 2013 at 02:44:45 UTC, Vladimir Panteleev wrote:
>> On Saturday, 16 November 2013 at 22:11:46 UTC, Xinok wrote:
>>> And the results (last number is predicate calls):
>>>
>>> Current Unstable Sort 50ms 32783474
>>> New Unstable Sort 69ms 21503542
>>> Timsort 35ms 3905887
>>
>> For the record, I tried both SwapStrategy options with my data (the data that got me to start this thread), and although TimSort did fewer comparisons (predicate calls), it ran about 20% slower.
>
> Could you try running a benchmark on the same data using this instead?
>
> https://github.com/Xinok/XSort/blob/master/unstablesort.d
>
> My implementation of quick sort has some odd tweaks which makes it slower in some cases, but I've never invoked worst-case behavior by accident.
Sorry for the late reply - my access to my home PC was spotty in the past week.
I wasn't entirely correct in the parent post - I didn't test things in the same way as the original problem, since I originally encountered it while using multiSort. multiSort can't use stable sort, because "stable $(D partition3) has not been implemented yet". If I hack multiSort to use TimSort instead of QuickSort, the program finishes quickly.
I reran the test in the same way as in the parent post, incl. with unstableSort. Here are the results:
SwapStrategy.unstable: 560576749 comparisons, 4796076 ticks
SwapStrategy.stable : 340617464 comparisons, 5757974 ticks
unstableSort : 573916901 comparisons, 5255061 ticks
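For anyone wanting to repeat this kind of measurement, a minimal sketch follows; the data is a placeholder and the counting predicate is illustrative, not the instrumentation actually used for the numbers above.

---
import std.algorithm.sorting : sort, SwapStrategy;
import std.stdio : writeln;

void main()
{
    // Placeholder data; the thread's actual dataset is not shown here.
    auto data = [5, 3, 5, 1, 4, 1, 2, 5, 3, 2];

    size_t comparisons;
    bool countingLess(int a, int b)
    {
        ++comparisons;
        return a < b;
    }

    auto a = data.dup;
    comparisons = 0;
    a.sort!(countingLess, SwapStrategy.unstable);   // quicksort-based
    writeln("unstable: ", comparisons, " comparisons");

    auto b = data.dup;
    comparisons = 0;
    b.sort!(countingLess, SwapStrategy.stable);     // Timsort
    writeln("stable:   ", comparisons, " comparisons");
}
---

Counts collected this way correspond to the comparison numbers above; meaningful timing of course needs the real data and a release build.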
December 04, 2013 Re: Worst-case performance of quickSort / getPivot
Posted in reply to Chris Cain

On Monday, 18 November 2013 at 05:26:30 UTC, Chris Cain wrote:
> On my computer (`-release -inline -noboundscheck` ... `-O` is omitted because it removes the summation altogether since it isn't used anywhere):
A better way to avoid this problem is usually to compile the operation being benchmarked separately, in such a way that the result can't be ignored (e.g. by making it read its parameters from function arguments and return the result as the return value). Comparing non-optimized builds (i.e. without -O) for execution speed is of questionable relevance in most cases.
David
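A sketch of that pattern, assuming the two files are compiled separately; the module and function names are invented for the example, and the timing helper is the same std.datetime.benchmark used earlier in the thread.

---
// summing.d -- compiled separately so the optimizer can't see the call site
// and discard the work; input arrives as an argument, output as the return value.
module summing;

import std.algorithm : reduce;
import std.range : iota;

double sumProportions(int n)
{
    // same summation as std.random.dice performs
    return reduce!("(assert(b >= 0), a + b)")(0.0, iota(n));
}

// bench.d
module bench;

import std.datetime : benchmark;
import std.stdio : writeln;
import summing : sumProportions;

double sink;  // store the result so even -O builds must compute it

void main()
{
    auto res = benchmark!({ sink = sumProportions(150_000); })(1_000);
    writeln(res[0].msecs, " ms (checksum ", sink, ")");
}
---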