December 19, 2012
Andrei Alexandrescu:

> A very efficient sort for various small fixed sizes is a great complement for quicksort.

Do you mean to use it when the input is very short, or when the QuickSort recursion has produced a very small subarray? In the first case it's useful, but in the second case I've seen it's more efficient (maybe not for huge arrays, but it is more efficient for normal arrays in RAM) to not call a specialized sort for such small sub-arrays, and instead just stop the QuickSort recursion and produce a locally unsorted array, and then call an insertion sort on the whole data.
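The scheme described above can be sketched as follows (a minimal illustration, not bearophile's actual code; `threshold`, `quickSortCoarse` and `insertionSort` are invented names for this sketch):

```d
import std.algorithm : swap;

enum threshold = 16; // cutoff below which subarrays are left unsorted (tunable)

// Quicksort that simply stops recursing on small subarrays, leaving the
// array "locally unsorted": every element is within its final partition
// block of size <= threshold.
void quickSortCoarse(T)(T[] a)
{
    if (a.length <= threshold) return; // leave small runs for the final pass
    // naive middle-element pivot, moved to the end for Lomuto partitioning
    swap(a[a.length / 2], a[a.length - 1]);
    T pivot = a[a.length - 1];
    size_t store = 0;
    foreach (i; 0 .. a.length - 1)
        if (a[i] < pivot) { swap(a[i], a[store]); ++store; }
    swap(a[store], a[a.length - 1]);
    quickSortCoarse(a[0 .. store]);
    quickSortCoarse(a[store + 1 .. $]);
}

// One insertion sort pass over the whole, nearly sorted array; since no
// element is more than about `threshold` slots from its final position,
// this pass runs in O(n * threshold).
void insertionSort(T)(T[] a)
{
    foreach (i; 1 .. a.length)
    {
        T tmp = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > tmp) { a[j] = a[j - 1]; --j; }
        a[j] = tmp;
    }
}
```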

Bye,
bearophile
December 19, 2012
On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
> On 12/18/12 8:42 PM, Xinok wrote:
>> On Tuesday, 18 December 2012 at 15:55:17 UTC, Andrei Alexandrescu wrote:
>>> Another approach I wanted to try was to choose the median of K with K
>>> increasing with the size of the array. This is because a good pivot is
>>> more important for large arrays than for small arrays. As such, a
>>> possibility would be to simply sort a stride of the input
>>> (std.range.stride is random access and can be sorted right away
>>> without any particular implementation effort) and then choose the
>>> middle of the stride as the pivot.
>>>
>>> If anyone has the time and inclination, have at it!
>>
>> Perhaps a "median of log n" is in order,
>
> Yah I thought so!
>
>> but the trouble is finding an
>> algorithm for picking the median from n elements. The so called "median
>> of medians" algorithm can choose a pivot within 30-70% of the range of
>> the median. Otherwise, there doesn't seem to be any algorithm for
>> choosing the absolute median other than, say, an insertion sort.
>
> You don't need to choose a median - just sort the data (thereby making progress toward the end goal) and choose the middle element.
>
>> I came up with this clever little guy a while ago which I used in my
>> implementation of quick sort: http://dpaste.dzfl.pl/b85e7ad8
>> I would love to enhance it to work on a variable number of elements, but
>> from what I can comprehend, the result is essentially a partial heap sort.
>
> A very efficient sort for various small fixed sizes is a great complement for quicksort.
>
>
> Andrei

A while ago in another thread about sorting I believe you mentioned the possibility of having templated sorting networks for small numbers of items, did that idea come to anything?
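For reference, the idea being asked about can be sketched as a fixed compare-exchange sequence for one small size; the names `cswap` and `sort4` are invented for this illustration, and a templated version would generate such a network per size:

```d
// One compare-exchange: swap the two elements if they are out of order.
void cswap(T)(ref T a, ref T b)
{
    if (b < a) { auto t = a; a = b; b = t; }
}

// The optimal 5-comparator sorting network for exactly 4 elements.
// The comparator sequence is fixed, so there are no data-dependent
// loops and the whole thing can be unrolled by the compiler.
void sort4(T)(T[] a)
{
    assert(a.length == 4);
    cswap(a[0], a[1]);
    cswap(a[2], a[3]);
    cswap(a[0], a[2]);
    cswap(a[1], a[3]);
    cswap(a[1], a[2]);
}
```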
December 19, 2012
On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
> You don't need to choose a median - just sort the data (thereby making progress toward the end goal) and choose the middle element.

I don't think it would work well in practice, but I'll put something together to see if the idea does have merit.
December 19, 2012
On Tuesday, 18 December 2012 at 15:41:28 UTC, Joseph Rushton Wakeling wrote:
> On 12/18/2012 04:30 PM, bearophile wrote:
>> Why?
>
> Because if you have to allocate O(n) memory for another algorithm, that might either give you a memory hit that you can't take (less likely these days, to be fair), or simply take a large amount of time to allocate that degrades the performance.
>
> Happy to learn if my guess is wrong, though.

Unless you have the data somehow presorted, or you get them one by one, other sorts are faster.
December 19, 2012
On 12/18/12 10:35 PM, ixid wrote:
> A while ago in another thread about sorting I believe you mentioned the
> possibility of having templated sorting networks for small numbers of
> items, did that idea come to anything?

Not that I know of.

Andrei
December 19, 2012
On 12/18/12 9:21 PM, bearophile wrote:
> Andrei Alexandrescu:
>
>> A very efficient sort for various small fixed sizes is a great
>> complement for quicksort.
>
> Do you mean to use it when the input is very short, or when the
> QuickSort recursion has produced a very small subarray?

The latter.

> In the first
> case it's useful, but in the second case I've seen it's more efficient
> (maybe not for huge arrays, but it is more efficient for normal arrays
> in RAM) to not call a specialized sort for such small sub-arrays, and
> instead just stop the QuickSort recursion and produce a locally unsorted
> array, and then call an insertion sort on the whole data.

That's a commonly used approach, but I think it can be improved.


Andrei
December 19, 2012
On 12/18/12 11:37 PM, Xinok wrote:
> On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
>> You don't need to choose a median - just sort the data (thereby making
>> progress toward the end goal) and choose the middle element.
>
> I don't think it would work well in practice, but I'll put something
> together to see if the idea does have merit.

I mostly fear cache touching issues.

Andrei
December 19, 2012
On 12/19/2012 06:00 AM, deadalnix wrote:
> Unless you have the data somehow presorted, or you get them one by one, other
> sort are faster.

I was probably imprecise with my use of the word "scales".  Obviously other algorithms have superior O() for the general case, but if memory use also scales with n, you are surely going to run into some kind of performance issues as n increases -- whereas if memory use is O(1), not so.

Again, I imagine that was a more urgent issue in 1981 ...

December 19, 2012
On Wed, Dec 19, 2012 at 6:47 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 12/18/12 10:35 PM, ixid wrote:
>
>> A while ago in another thread about sorting I believe you mentioned the possibility of having templated sorting networks for small numbers of items, did that idea come to anything?
>>
>
> Not that I know of.


My bad, that was me and I got sidetracked. I'll modify std.algo.sort to see if I get any speed-up.


December 21, 2012
On Wednesday, 19 December 2012 at 05:48:04 UTC, Andrei Alexandrescu wrote:
> On 12/18/12 11:37 PM, Xinok wrote:
>> On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
>>> You don't need to choose a median - just sort the data (thereby making
>>> progress toward the end goal) and choose the middle element.
>>
>> I don't think it would work well in practice, but I'll put something
>> together to see if the idea does have merit.
>
> I mostly fear cache touching issues.
>
> Andrei

I based my little experiment on my 'unstablesort' module, located here:
https://github.com/Xinok/XSort/blob/master/unstablesort.d

The results (sorting a random array of 1024*1024 uints):

Median of Five:
142ms     21627203 comps

Median of log n:
152ms     20778577 comps

The code:

size_t choosePivot(R)(R range)
{
	import std.algorithm;
	import std.math;
	// Reserve slice of range for choosing pivot (odd length >= 1)
	R sub = range[0 .. cast(uint)log2(range.length) | 1];
	// Pull in equally distributed elements
	swap(sub[sub.length - 1], range[range.length - 1]);
	foreach(i; 1 .. sub.length - 1) swap(sub[i], range[range.length / sub.length * i]);
	// Sort sublist to choose pivot
	insertionSort(sub);
	// Pivot is the middle element of the sorted sublist
	size_t piv = sub.length / 2;
	// Move partitionable elements (those above the pivot) to the end
	sub = sub[piv + 1 .. sub.length];
	foreach(i; 0 .. sub.length) swap(sub[i], range[range.length - sub.length + i]);
	// Return index of pivot
	return piv;
}

My thoughts: while the idea does have merit, I think the median of five does a good job as it is. If you're interested in reducing the number of comparisons, replacing "optimisticInsertionSort" in std.algorithm with a binary insertion sort will do much more to achieve that goal. And if you're interested in guaranteed O(n log n) running time, then add heap sort as a fall-back algorithm, as I did in my module (I actually plan to do this myself ... eventually).
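The binary insertion sort mentioned above can be sketched like this (a minimal version, assuming random access and `opIndex`; `binaryInsertionSort` is an invented name, not the module's actual routine). It performs the same O(n^2) element moves as plain insertion sort, but only O(n log n) comparisons, since each insertion point is found by binary search:

```d
void binaryInsertionSort(T)(T[] a)
{
    foreach (i; 1 .. a.length)
    {
        T tmp = a[i];
        // Binary search for the insertion point in the sorted prefix a[0 .. i].
        // Using <= keeps equal elements in order, so the sort is stable.
        size_t lo = 0, hi = i;
        while (lo < hi)
        {
            size_t mid = (lo + hi) / 2;
            if (a[mid] <= tmp) lo = mid + 1;
            else hi = mid;
        }
        // Shift the tail of the prefix up by one and insert.
        for (size_t j = i; j > lo; --j) a[j] = a[j - 1];
        a[lo] = tmp;
    }
}
```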