December 17, 2012 Timsort vs some others
Regarding the recent Phobos improvements that introduce a Timsort:

http://forum.dlang.org/thread/50c8a4e67f79_3fdd19b7ae814691e@sh3.rs.github.com.mail

I have found a blog post that compares the performance of Timsort, Smoothsort, and std::stable_sort:

http://www.altdevblogaday.com/2012/06/15/smoothsort-vs-timsort/

Bye,
bearophile
December 18, 2012 Re: Timsort vs some others
Posted in reply to bearophile

On Monday, 17 December 2012 at 15:28:36 UTC, bearophile wrote:
> Regarding the recent Phobos improvements that introduce a Timsort:
>
> http://forum.dlang.org/thread/50c8a4e67f79_3fdd19b7ae814691e@sh3.rs.github.com.mail
>
> I have found a blog post that compares the performance of Timsort, Smoothsort, and std::stable_sort:
>
> http://www.altdevblogaday.com/2012/06/15/smoothsort-vs-timsort/
>
> Bye,
> bearophile
While Smoothsort may be mathematically sound, it simply doesn't translate well to computer hardware. It's a variant of heap sort which requires a great deal of random access, whereas Timsort is a variant of merge sort which is largely sequential and benefits from the CPU cache. Furthermore, the Leonardo heap is much more computationally expensive than a typical binary or ternary heap.
Both Timsort and Smoothsort are what are called "natural sorts," meaning they typically require fewer comparisons on data with low entropy. They're also rather complex, which means added overhead. When sorting primitive types like int, comparisons are inexpensive, so that overhead makes these algorithms slower. Had the benchmark used a data type like strings, we'd likely see Timsort take the lead.
On purely random data, quick sort and merge sort will win most of the time. But Timsort has an advantage over Smoothsort in being an adaptive algorithm: the so-called "galloping mode," which is computationally expensive, is only activated when minGallop reaches a certain threshold, i.e. when it's likely to pay off. Otherwise, a simple linear merge is used (as in plain merge sort).
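To make that concrete: galloping boils down to an exponential search followed by a binary search. A minimal sketch of the idea (illustrative only, not Timsort's actual implementation):

/// Galloping search: count the leading elements of the sorted run
/// that are less than key. Probe distances double until we overshoot,
/// then a binary search narrows the result. A sketch of the idea
/// behind Timsort's galloping mode, not its actual code.
size_t gallop(T)(const(T)[] run, T key)
{
    if (run.length == 0 || run[0] >= key) return 0;
    size_t lo = 0, hi = 1;
    while (hi < run.length && run[hi] < key)
    {
        lo = hi;
        hi = hi * 2 + 1; // exponential probing
    }
    if (hi > run.length) hi = run.length;
    while (lo + 1 < hi) // binary search in (lo, hi]
    {
        immutable mid = lo + (hi - lo) / 2;
        if (run[mid] < key) lo = mid;
        else hi = mid;
    }
    return hi;
}

unittest
{
    assert(gallop([1, 2, 3, 10, 20], 5) == 3);
}

The payoff comes when one run repeatedly "wins" the merge: whole stretches found this way can be copied in bulk rather than element by element.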
On another note, I highly doubt that std::sort uses a "median of medians" algorithm, which would add much overhead and essentially double the number of comparisons required with little to no benefit. More likely, it simply chooses the pivot from a median of three.
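For reference, a median-of-three pivot selection is just a three-element sort on sampled positions. A generic sketch (not any particular standard library's code):

import std.algorithm : swap;

/// Reorder d[lo], d[mid], d[hi] so that d[mid] holds their median,
/// which can then serve as the pivot. A generic sketch.
void medianOf3(T)(T[] d, size_t lo, size_t mid, size_t hi)
{
    if (d[mid] < d[lo]) swap(d[mid], d[lo]);
    if (d[hi] < d[mid]) swap(d[hi], d[mid]);
    if (d[mid] < d[lo]) swap(d[mid], d[lo]);
}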
December 18, 2012 Re: Timsort vs some others
Posted in reply to Xinok

On Tuesday, 18 December 2012 at 06:52:27 UTC, Xinok wrote:
> On another note, I highly doubt that std::sort uses a "median of medians" algorithm, which would add much overhead and essentially double the number of comparisons required with little to no benefit. More likely, it simply chooses the pivot from a median of three.
Different implementations use different strategies.
libstdc++ seems to use median of 3.
The Dinkumware standard library (which ships with MSVC) uses median of 9.
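A median of 9 is typically the "ninther": the median of three medians of three. A sketch with illustrative sample positions (median3 is a hypothetical helper defined here, not a library function):

import std.algorithm : swap;

/// Median of three values.
T median3(T)(T a, T b, T c)
{
    if (b < a) swap(a, b);
    if (c < b) swap(b, c);
    return b < a ? a : b;
}

/// "Ninther": median of three medians of three, sampled across the
/// array. The sample positions here are illustrative.
T ninther(T)(const(T)[] d)
{
    assert(d.length >= 9);
    immutable n = d.length, s = n / 8;
    return median3(median3(d[0],             d[s],         d[2 * s]),
                   median3(d[n / 2 - s],     d[n / 2],     d[n / 2 + s]),
                   median3(d[n - 1 - 2 * s], d[n - 1 - s], d[n - 1]));
}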
December 18, 2012 Re: Timsort vs some others
Posted in reply to Xinok

On 12/18/2012 07:52 AM, Xinok wrote:
> While Smoothsort may be mathematically sound, it simply doesn't translate well
> to computer hardware. It's a variant of heap sort which requires a great deal of
> random access, whereas Timsort is a variant of merge sort which is largely
> sequential and benefits from the CPU cache. Furthermore, the Leonardo heap is
> much more computationally expensive than a typical binary or ternary heap.
... but I would guess that given the O(1) memory requirements it probably scales much better to sorting really, really large data, no?
That was surely a much, much bigger issue back in 1981 when Dijkstra proposed it, but it still has a place today.
December 18, 2012 Re: Timsort vs some others
Posted in reply to Joseph Rushton Wakeling

Joseph Rushton Wakeling:
> ... but I would guess that given the O(1) memory requirements it probably scales much better to sorting really, really large data, no?
Why?
Bye,
bearophile
December 18, 2012 Re: Timsort vs some others
Posted in reply to bearophile

On 12/18/2012 04:30 PM, bearophile wrote:
> Why?
Because if you have to allocate O(n) memory for another algorithm, that might either give you a memory hit you can't take (less likely these days, to be fair), or simply take enough time to allocate that performance degrades.
Happy to learn if my guess is wrong, though.
December 18, 2012 Re: Timsort vs some others
Posted in reply to Peter Alexander

On 12/18/12 5:50 AM, Peter Alexander wrote:
> On Tuesday, 18 December 2012 at 06:52:27 UTC, Xinok wrote:
>> On another note, I highly doubt that std::sort uses a "median of
>> medians" algorithm, which would add much overhead and essentially
>> double the number of comparisons required with little to no benefit.
>> More likely, it simply chooses the pivot from a median of three.
>
> Different implementations use different strategies.
>
> libstdc++ seems to use median of 3.
>
> The Dinkumware standard library (which ships with MSVC) uses median of 9.
We should use a deferred pivot algorithm that I thought of a long time ago but never got around to testing.
One thing about current pivot selection approaches is that they decide on a strategy (middle, median of 3, median of 9, random etc) _before_ ever looking at the data.
A different approach would be to take a look at a bit of data and _then_ decide what the pivot is.
Consider the following approach:
size_t partition(T[] d)
{
    assert(d.length);
    size_t a = 0, z = d.length - 1;
    // Indices of the largest element seen in the prefix and the
    // smallest element seen in the suffix.
    auto maxa = a, minz = z;
    // Scan from both ends while the prefix max stays <= the suffix min
    // (the original condition read "mina <= maxz"; this is presumably
    // the intended comparison).
    for (; a < z && d[maxa] <= d[minz];)
    {
        if (d[a] > d[z])
        {
            swap(d[a], d[z]);
        }
        if (d[maxa] < d[++a]) maxa = a;
        if (d[minz] > d[--z]) minz = z;
    }
    --a;
    ++z;
    /* Here data is already partitioned wrt d[a] or d[z]. If enough
       progress has been made (i.e. a is large enough compared to
       d.length), choose one of these as the pivot and only partition
       d[a .. z + 1]. Otherwise, use a classic pivot choice criterion. */
    ...
}
Another approach I wanted to try was to choose the median of K with K increasing with the size of the array. This is because a good pivot is more important for large arrays than for small arrays. As such, a possibility would be to simply sort a stride of the input (std.range.stride is random access and can be sorted right away without any particular implementation effort) and then choose the middle of the stride as the pivot.
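A minimal sketch of that stride idea (the function name and the fixed k are illustrative; k would grow with the array size):

import std.algorithm : sort;
import std.range : stride;

/// Sort every k-th element in place, then take the middle element of
/// the stride as the pivot. Sorting the view reorders the underlying
/// elements of d, so the work also makes progress toward the final
/// sorted order.
T pivotByStride(T)(T[] d, size_t k)
{
    assert(d.length && k);
    auto s = stride(d, k); // random-access view over every k-th element
    sort(s);
    return s[s.length / 2];
}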
If anyone has the time and inclination, have at it!
Andrei
December 19, 2012 Re: Timsort vs some others
Posted in reply to Joseph Rushton Wakeling

On Tuesday, 18 December 2012 at 15:27:07 UTC, Joseph Rushton Wakeling wrote:
> On 12/18/2012 07:52 AM, Xinok wrote:
>> While Smoothsort may be mathematically sound, it simply doesn't translate well
>> to computer hardware. It's a variant of heap sort which requires a great deal of
>> random access, whereas Timsort is a variant of merge sort which is largely
>> sequential and benefits from the CPU cache. Furthermore, the Leonardo heap is
>> much more computationally expensive than a typical binary or ternary heap.
>
> ... but I would guess that given the O(1) memory requirements it probably scales much better to sorting really, really large data, no?
Heap sort actually performs really well if the entirety of the data is small enough to fit into the CPU cache. But on larger data sets, heap sort jumps all over the place, resulting in a significant number of cache misses. When a block of memory is stored in cache, it doesn't remain there for long, and very little work is done on it while it is cached. (I mention heap sort because the Leonardo heap of Smoothsort is still very computationally expensive.)
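To see why the access pattern hurts, consider an ordinary binary-heap sift-down (a generic sketch for illustration): each step jumps from index i to 2*i + 1, so once the heap outgrows the cache, successive accesses land in different cache lines.

import std.algorithm : swap;

/// Standard binary-heap sift-down. The index roughly doubles on each
/// iteration, producing the scattered access pattern described above.
void siftDown(T)(T[] heap, size_t i)
{
    for (;;)
    {
        immutable l = 2 * i + 1, r = l + 1;
        size_t largest = i;
        if (l < heap.length && heap[largest] < heap[l]) largest = l;
        if (r < heap.length && heap[largest] < heap[r]) largest = r;
        if (largest == i) return;
        swap(heap[i], heap[largest]);
        i = largest; // jump roughly twice as deep into the array
    }
}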
Although merge sort requires O(n) extra space, its sequential nature results in far fewer cache misses. There are three blocks of memory being operated on at any time: the two blocks to be merged, and a temporary buffer to store the merged elements. Three (or four) small pieces of memory can be worked on in the cache without any cache misses. Furthermore, thanks to the divide-and-conquer nature of merge sort, fairly large sublists can be sorted entirely in the CPU cache; this is even more so if you parallelize on a multi-core CPU which has a dedicated L1 cache for each core. Merge sort can be further optimized by using insertion sort on small sublists... which happens entirely in the CPU cache...

Another way to put it: merge sort is an ideal algorithm for sorting linked lists, and it was even practical for sorting large lists stored on tape drives.
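For contrast with the sift-down above, the heart of merge sort walks all three buffers strictly forward. A minimal sketch (not Phobos's actual merge):

/// Merge two sorted slices into dest. All three indices only move
/// forward, so memory is accessed sequentially.
void merge(T)(const(T)[] left, const(T)[] right, T[] dest)
{
    assert(dest.length == left.length + right.length);
    size_t i, j, k;
    while (i < left.length && j < right.length)
    {
        // "<=" takes equal elements from left first, preserving stability.
        dest[k++] = left[i] <= right[j] ? left[i++] : right[j++];
    }
    while (i < left.length) dest[k++] = left[i++];
    while (j < right.length) dest[k++] = right[j++];
}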
Quick sort is a sequential sorting algorithm with O(log n) space complexity, which is likely the reason it outperforms merge sort in most cases, albeit not being stable.
December 19, 2012 Re: Timsort vs some others
Posted in reply to Andrei Alexandrescu

On Tuesday, 18 December 2012 at 15:55:17 UTC, Andrei Alexandrescu wrote:
> Another approach I wanted to try was to choose the median of K with K increasing with the size of the array. This is because a good pivot is more important for large arrays than for small arrays. As such, a possibility would be to simply sort a stride of the input (std.range.stride is random access and can be sorted right away without any particular implementation effort) and then choose the middle of the stride as the pivot.
>
> If anyone has the time and inclination, have at it!

Perhaps a "median of log n" is in order, but the trouble is finding an algorithm for picking the median from n elements. The so-called "median of medians" algorithm can choose a pivot within 30-70% of the range of the median. Otherwise, there doesn't seem to be any algorithm for choosing the absolute median other than, say, an insertion sort.

I came up with this clever little guy a while ago which I used in my implementation of quick sort: http://dpaste.dzfl.pl/b85e7ad8

I would love to enhance it to work on a variable number of elements, but from what I can comprehend, the result is essentially a partial heap sort.
December 19, 2012 Re: Timsort vs some others
Posted in reply to Xinok

On 12/18/12 8:42 PM, Xinok wrote:
> On Tuesday, 18 December 2012 at 15:55:17 UTC, Andrei Alexandrescu wrote:
>> Another approach I wanted to try was to choose the median of K with K increasing with the size of the array. This is because a good pivot is more important for large arrays than for small arrays. As such, a possibility would be to simply sort a stride of the input (std.range.stride is random access and can be sorted right away without any particular implementation effort) and then choose the middle of the stride as the pivot.
>>
>> If anyone has the time and inclination, have at it!
>
> Perhaps a "median of log n" is in order,

Yah I thought so!

> but the trouble is finding an algorithm for picking the median from n elements. The so-called "median of medians" algorithm can choose a pivot within 30-70% of the range of the median. Otherwise, there doesn't seem to be any algorithm for choosing the absolute median other than, say, an insertion sort.

You don't need to choose a median - just sort the data (thereby making progress toward the end goal) and choose the middle element.

> I came up with this clever little guy a while ago which I used in my implementation of quick sort: http://dpaste.dzfl.pl/b85e7ad8
> I would love to enhance it to work on a variable number of elements, but from what I can comprehend, the result is essentially a partial heap sort.

A very efficient sort for various small fixed sizes is a great complement for quicksort.

Andrei
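As a taste of what such a fixed-size sort can look like, here is a five-comparator sorting network for exactly four elements (a generic sketch, not the code behind the dpaste link above). The comparison sequence is fixed regardless of the data, which is what makes such networks attractive as building blocks:

import std.algorithm : swap;

/// Optimal sorting network for four elements: five compare-exchange
/// steps in a data-independent order.
void sort4(T)(T[] d)
{
    assert(d.length == 4);
    if (d[1] < d[0]) swap(d[0], d[1]);
    if (d[3] < d[2]) swap(d[2], d[3]);
    if (d[2] < d[0]) swap(d[0], d[2]);
    if (d[3] < d[1]) swap(d[1], d[3]);
    if (d[2] < d[1]) swap(d[1], d[2]);
}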