December 19, 2012
Andrei Alexandrescu:

> A very efficient sort for various small fixed sizes is a great complement for quicksort.

Do you mean to use it when the input is very short, or when the QuickSort recursion has produced a very small subarray? In the first case it's useful, but in the second case I've seen it's more efficient (maybe not for huge arrays, but it is more efficient for normal arrays in RAM) to not call a specialized sort for such small sub-arrays, and instead just stop the QuickSort recursion and produce a locally unsorted array, and then call an insertion sort on the whole data.
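The scheme described above can be sketched as follows (a minimal illustration, not bearophile's actual code; `threshold`, `quickSortCoarse` and `insertionSort` are invented names for this sketch):

```d
import std.algorithm : swap;

enum threshold = 16; // cutoff below which subarrays are left unsorted (tunable)

// Quicksort that simply stops recursing on small subarrays, leaving the
// array "locally unsorted": every element is within its final partition
// block of size <= threshold.
void quickSortCoarse(T)(T[] a)
{
    if (a.length <= threshold) return; // leave small runs for the final pass
    // naive middle-element pivot, moved to the end for Lomuto partitioning
    swap(a[a.length / 2], a[a.length - 1]);
    T pivot = a[a.length - 1];
    size_t store = 0;
    foreach (i; 0 .. a.length - 1)
        if (a[i] < pivot) { swap(a[i], a[store]); ++store; }
    swap(a[store], a[a.length - 1]);
    quickSortCoarse(a[0 .. store]);
    quickSortCoarse(a[store + 1 .. $]);
}

// One insertion sort pass over the whole, nearly sorted array; since no
// element is more than about `threshold` slots from its final position,
// this pass runs in O(n * threshold).
void insertionSort(T)(T[] a)
{
    foreach (i; 1 .. a.length)
    {
        T tmp = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > tmp) { a[j] = a[j - 1]; --j; }
        a[j] = tmp;
    }
}
```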

Bye,
bearophile
December 19, 2012
On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
> On 12/18/12 8:42 PM, Xinok wrote:
>> On Tuesday, 18 December 2012 at 15:55:17 UTC, Andrei Alexandrescu wrote:
>>> Another approach I wanted to try was to choose the median of K with K
>>> increasing with the size of the array. This is because a good pivot is
>>> more important for large arrays than for small arrays. As such, a
>>> possibility would be to simply sort a stride of the input
>>> (std.range.stride is random access and can be sorted right away
>>> without any particular implementation effort) and then choose the
>>> middle of the stride as the pivot.
>>>
>>> If anyone has the time and inclination, have at it!
>>
>> Perhaps a "median of log n" is in order,
>
> Yah I thought so!
>
>> but the trouble is finding an
>> algorithm for picking the median from n elements. The so called "median
>> of medians" algorithm can choose a pivot within 30-70% of the range of
>> the median. Otherwise, there doesn't seem to be any algorithm for
>> choosing the absolute median other than, say, an insertion sort.
>
> You don't need to choose a median - just sort the data (thereby making progress toward the end goal) and choose the middle element.
>
>> I came up with this clever little guy a while ago which I used in my
>> implementation of quick sort: http://dpaste.dzfl.pl/b85e7ad8
>> I would love to enhance it to work on a variable number of elements, but
>> from what I can comprehend, the result is essentially a partial heap sort.
>
> A very efficient sort for various small fixed sizes is a great complement for quicksort.
>
>
> Andrei

A while ago in another thread about sorting I believe you mentioned the possibility of having templated sorting networks for small numbers of items, did that idea come to anything?
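For reference, the idea being asked about can be sketched as a fixed compare-exchange sequence for one small size; the names `cswap` and `sort4` are invented for this illustration, and a templated version would generate such a network per size:

```d
// One compare-exchange: swap the two elements if they are out of order.
void cswap(T)(ref T a, ref T b)
{
    if (b < a) { auto t = a; a = b; b = t; }
}

// The optimal 5-comparator sorting network for exactly 4 elements.
// The comparator sequence is fixed, so there are no data-dependent
// loops and the whole thing can be unrolled by the compiler.
void sort4(T)(T[] a)
{
    assert(a.length == 4);
    cswap(a[0], a[1]);
    cswap(a[2], a[3]);
    cswap(a[0], a[2]);
    cswap(a[1], a[3]);
    cswap(a[1], a[2]);
}
```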
December 19, 2012
On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
> You don't need to choose a median - just sort the data (thereby making progress toward the end goal) and choose the middle element.

I don't think it would work well in practice, but I'll put something together to see if the idea does have merit.
December 19, 2012
On Tuesday, 18 December 2012 at 15:41:28 UTC, Joseph Rushton Wakeling wrote:
> On 12/18/2012 04:30 PM, bearophile wrote:
>> Why?
>
> Because if you have to allocate O(n) memory for another algorithm, that might either give you a memory hit that you can't take (less likely these days, to be fair), or simply take a large amount of time to allocate that degrades the performance.
>
> Happy to learn if my guess is wrong, though.

Unless you have the data somehow presorted, or you get them one by one, other sorts are faster.
December 19, 2012
On 12/18/12 10:35 PM, ixid wrote:
> A while ago in another thread about sorting I believe you mentioned the
> possibility of having templated sorting networks for small numbers of
> items, did that idea come to anything?

Not that I know of.

Andrei
December 19, 2012
On 12/18/12 9:21 PM, bearophile wrote:
> Andrei Alexandrescu:
>
>> A very efficient sort for various small fixed sizes is a great
>> complement for quicksort.
>
> Do you mean to use it when the input is very short, or when the
> QuickSort recursion has produced a very small subarray?

The latter.

> In the first
> case it's useful, but in the second case I've seen it's more efficient
> (maybe not for huge arrays, but it is more efficient for normal arrays
> in RAM) to not call a specialized sort for such small sub-arrays, and
> instead just stop the QuickSort recursion and produce a locally unsorted
> array, and then call an insertion sort on the whole data.

That's a commonly used approach, but I think it can be improved.


Andrei
December 19, 2012
On 12/18/12 11:37 PM, Xinok wrote:
> On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
>> You don't need to choose a median - just sort the data (thereby making
>> progress toward the end goal) and choose the middle element.
>
> I don't think it would work well in practice, but I'll put something
> together to see if the idea does have merit.

I mostly fear cache touching issues.

Andrei
December 19, 2012
On 12/19/2012 06:00 AM, deadalnix wrote:
> Unless you have the data somehow presorted, or you get them one by one, other
> sort are faster.

I was probably imprecise with my use of the word "scales".  Obviously other algorithms have superior O() for the general case, but if memory use also scales with n, you are surely going to run into some kind of performance issues as n increases -- whereas if memory use is O(1), not so.

Again, I imagine that was a more urgent issue in 1981 ...

December 19, 2012
On Wed, Dec 19, 2012 at 6:47 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 12/18/12 10:35 PM, ixid wrote:
>
>> A while ago in another thread about sorting I believe you mentioned the possibility of having templated sorting networks for small numbers of items, did that idea come to anything?
>>
>
> Not that I know of.


My bad, that was me and I got sidetracked. I'll modify std.algo.sort to see if I get any speed-up.


December 21, 2012
On Wednesday, 19 December 2012 at 05:48:04 UTC, Andrei Alexandrescu wrote:
> On 12/18/12 11:37 PM, Xinok wrote:
>> On Wednesday, 19 December 2012 at 02:00:05 UTC, Andrei Alexandrescu wrote:
>>> You don't need to choose a median - just sort the data (thereby making
>>> progress toward the end goal) and choose the middle element.
>>
>> I don't think it would work well in practice, but I'll put something
>> together to see if the idea does have merit.
>
> I mostly fear cache touching issues.
>
> Andrei

I based my little experiment on my 'unstablesort' module, located here:
https://github.com/Xinok/XSort/blob/master/unstablesort.d

The results (sorting a random array of 1024*1024 uints):

Median of Five:
142ms     21627203 comps

Median of log n:
152ms     20778577 comps

The code:

size_t choosePivot(R)(R range)
{
	import std.algorithm;
	import std.math;
	// Reserve slice of range for choosing pivot (odd length >= 1)
	R sub = range[0 .. cast(uint)log2(range.length) | 1];
	// Pull in equally distributed elements
	swap(sub[sub.length - 1], range[range.length - 1]);
	foreach(i; 1 .. sub.length - 1) swap(sub[i], range[range.length / sub.length * i]);
	// Sort sublist to choose pivot
	insertionSort(sub);
	// Pivot is the middle element of the sorted sublist
	size_t piv = sub.length / 2;
	// Move partitionable elements (those above the pivot) to the end
	sub = sub[piv + 1 .. sub.length];
	foreach(i; 0 .. sub.length) swap(sub[i], range[range.length - sub.length + i]);
	// Return index of pivot
	return piv;
}

My thoughts: while the idea does have merit, I think the median of five does a good job as it is. If you're interested in reducing the number of comparisons, replacing "optimisticInsertionSort" in std.algorithm with a binary insertion sort will do much more to achieve that goal. And if you're interested in guaranteed O(n log n) running time, then add heap sort as a fall-back algorithm, as I did in my module (I actually plan to do this myself ... eventually).
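The binary insertion sort mentioned above can be sketched like this (a minimal version, assuming random access and `opIndex`; `binaryInsertionSort` is an invented name, not the module's actual routine). It performs the same O(n^2) element moves as plain insertion sort, but only O(n log n) comparisons, since each insertion point is found by binary search:

```d
void binaryInsertionSort(T)(T[] a)
{
    foreach (i; 1 .. a.length)
    {
        T tmp = a[i];
        // Binary search for the insertion point in the sorted prefix a[0 .. i].
        // Using <= keeps equal elements in order, so the sort is stable.
        size_t lo = 0, hi = i;
        while (lo < hi)
        {
            size_t mid = (lo + hi) / 2;
            if (a[mid] <= tmp) lo = mid + 1;
            else hi = mid;
        }
        // Shift the tail of the prefix up by one and insert.
        for (size_t j = i; j > lo; --j) a[j] = a[j - 1];
        a[lo] = tmp;
    }
}
```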