View mode: basic / threaded / horizontal-split · Log in · Help
October 15, 2012
Sorting algorithms
Been watching online lectures that's going into sorting and 
searching, and from what I'm seeing most sorting algorithms (by 
using comparison; merge sort, quicksort, etc) and even tree 
algorithms peak at O(n log n). So an example area to be sorted 
with 16 elements would take on average about 100 compares while 
theoretically you can do it in half that number.

 What algorithms get you closest to that optimal value? If there 
isn't any I have an idea that may just do it. Either way I'll be 
trying to implement it and see how it performs.

 I wonder what happens if I succeed and become famous... O.O
October 15, 2012
Re: Sorting algorithms
On Monday, 15 October 2012 at 09:18:12 UTC, Era Scarecrow wrote:
> So an example area to be sorted with 16 elements would take on 
> average about 100 compares while theoretically you can do it in 
> half that number.

 Correction. 16 numbers would be solved in about 49 compares 
while an optimal sorting takes about 45. And for 21 numbers about 
74 compares while optimally about 63.

 These numbers don't seem that large, but at the same time they 
do.
October 15, 2012
Re: Sorting algorithms
On Mon, Oct 15, 2012 at 1:04 PM, Era Scarecrow <rtcvb32@yahoo.com> wrote:
> On Monday, 15 October 2012 at 09:18:12 UTC, Era Scarecrow wrote:
>>
>> So an example area to be sorted with 16 elements would take on average
>> about 100 compares while theoretically you can do it in half that number.
>
>
>  Correction. 16 numbers would be solved in about 49 compares while an
> optimal sorting takes about 45. And for 21 numbers about 74 compares while
> optimally about 63.
>
>  These numbers don't seem that large, but at the same time they do.

Somewhat related: I once played with sorting networks and how to
generate them at compile time in D. It's in template tutorial on
Github. Here are some results:

https://github.com/PhilippeSigaud/D-templates-tutorial/blob/master/templates_around.tex#L560

Discarding LaTeX markup:


n    Sorting    Standard    ratio
    network    sort
5    324        532         1.642
10   555        1096        1.975
15   803        1679        2.091
20   1154       2314        2.005
25   1538       3244        2.109
30   2173       3508        1.614
35   4075       4120        1.011
40   5918       5269        0.890
45   7479       5959        0.797
50   9179       6435        0.701

Were n is the (predetermined) number of elements in an array and a
ratio of 2.0 means sorting networks are twice faster than
std.algo.sort in this particular case.
October 15, 2012
Re: Sorting algorithms
On Monday, 15 October 2012 at 09:18:12 UTC, Era Scarecrow wrote:
>  Been watching online lectures that's going into sorting and 
> searching, and from what I'm seeing most sorting algorithms (by 
> using comparison; merge sort, quicksort, etc) and even tree 
> algorithms peak at O(n log n). So an example area to be sorted 
> with 16 elements would take on average about 100 compares while 
> theoretically you can do it in half that number.

Big-O notation doesn't give you actual numbers, O(n) = O(25*n). 
If you're interested in a practical method, look at TimSort and 
similar ones that combine different algorithms.
October 15, 2012
Re: Sorting algorithms
On 10/15/12 8:11 AM, Philippe Sigaud wrote:
> On Mon, Oct 15, 2012 at 1:04 PM, Era Scarecrow<rtcvb32@yahoo.com>  wrote:
>> On Monday, 15 October 2012 at 09:18:12 UTC, Era Scarecrow wrote:
>>>
>>> So an example area to be sorted with 16 elements would take on average
>>> about 100 compares while theoretically you can do it in half that number.
>>
>>
>>   Correction. 16 numbers would be solved in about 49 compares while an
>> optimal sorting takes about 45. And for 21 numbers about 74 compares while
>> optimally about 63.
>>
>>   These numbers don't seem that large, but at the same time they do.
>
> Somewhat related: I once played with sorting networks and how to
> generate them at compile time in D. It's in template tutorial on
> Github. Here are some results:
>
> https://github.com/PhilippeSigaud/D-templates-tutorial/blob/master/templates_around.tex#L560
>
> Discarding LaTeX markup:
>
>
> n    Sorting    Standard    ratio
>       network    sort
> 5    324        532         1.642
> 10   555        1096        1.975
> 15   803        1679        2.091
> 20   1154       2314        2.005
> 25   1538       3244        2.109
> 30   2173       3508        1.614
> 35   4075       4120        1.011
> 40   5918       5269        0.890
> 45   7479       5959        0.797
> 50   9179       6435        0.701
>
> Were n is the (predetermined) number of elements in an array and a
> ratio of 2.0 means sorting networks are twice faster than
> std.algo.sort in this particular case.

I wanted to investigate small sorts using sorting networks for ages, but 
never got around to it. That's important for quicksort because it 
produces many small arrays that need sorting.

Could you also test for very small sizes (2 to 4) and essentially test 
for 1-increment speed up to, say, 30 elements? I assume that's where the 
major wins come. I think we could use CT-generated sorting networks for 
arrays below a specific size. The converse risk is growth of generated code.


Andrei
October 15, 2012
Re: Sorting algorithms
On Monday, 15 October 2012 at 15:49:51 UTC, thedeemon wrote:
> On Monday, 15 October 2012 at 09:18:12 UTC, Era Scarecrow wrote:
> Big-O notation doesn't give you actual numbers, O(n) = O(25*n). 
> If you're interested in a practical method, look at TimSort and 
> similar ones that combine different algorithms.

 Yeah I know it's more of a generalized number of steps, but it 
still gives you a good idea of sorting time. I'll give TimSort a 
look over.

 Currently I'm estimating this will be a variant of merge-sort.
October 15, 2012
Re: Sorting algorithms
On 15-Oct-12 23:15, Era Scarecrow wrote:
> On Monday, 15 October 2012 at 15:49:51 UTC, thedeemon wrote:
>> On Monday, 15 October 2012 at 09:18:12 UTC, Era Scarecrow wrote:
>> Big-O notation doesn't give you actual numbers, O(n) = O(25*n). If
>> you're interested in a practical method, look at TimSort and similar
>> ones that combine different algorithms.
>
>   Yeah I know it's more of a generalized number of steps, but it still
> gives you a good idea of sorting time. I'll give TimSort a look over.
>
>   Currently I'm estimating this will be a variant of merge-sort.

A hybrid. I'm currently trying to get into Phobos:
https://github.com/D-Programming-Language/phobos/pull/787


-- 
Dmitry Olshansky
October 16, 2012
Re: Sorting algorithms
On Monday, 15 October 2012 at 20:58:36 UTC, Dmitry Olshansky 
wrote:
> A hybrid. I'm currently trying to get into Phobos:
> https://github.com/D-Programming-Language/phobos/pull/787

 I'll have to look it over in more detail another time.



 Although another question comes to mind. How many algorithms are 
really 'lazy'? I know some can stop after getting x elements, but 
almost all algorithms need to do at least a certain amount of 
work before they can pause or stop at a point.

 I think I can make mine lazy and range friendly, just not random 
access friendly at the same time.
October 17, 2012
Re: Sorting algorithms
On Mon, Oct 15, 2012 at 5:52 PM, Andrei Alexandrescu
<SeeWebsiteForEmail@erdani.org> wrote:

> I wanted to investigate small sorts using sorting networks for ages, but
> never got around to it. That's important for quicksort because it produces
> many small arrays that need sorting.
>
> Could you also test for very small sizes (2 to 4) and essentially test for
> 1-increment speed up to, say, 30 elements? I assume that's where the major
> wins come. I think we could use CT-generated sorting networks for arrays
> below a specific size. The converse risk is growth of generated code.

Here:

http://dpaste.dzfl.pl/42fac981

I don't know if the benchmarking code is OK. I substract a reference
because randomly shuffling an array takes some time.

Results for my computer (smaller ratio means faster network sort
compared to std.algorithm.sort)

Size 1, network: 2.10, std.algorithm.sort: 15.86, ratio network/std.algo: 0.13
Size 2, network: 2.23, std.algorithm.sort: 14.26, ratio network/std.algo: 0.16
Size 3, network: 6.22, std.algorithm.sort: 20.75, ratio network/std.algo: 0.30
Size 4, network: 8.25, std.algorithm.sort: 28.36, ratio network/std.algo: 0.29
Size 5, network: 18.54, std.algorithm.sort: 39.02, ratio network/std.algo: 0.48
Size 6, network: 20.12, std.algorithm.sort: 45.58, ratio network/std.algo: 0.44
Size 7, network: 27.49, std.algorithm.sort: 55.53, ratio network/std.algo: 0.50
Size 8, network: 33.91, std.algorithm.sort: 66.02, ratio network/std.algo: 0.51
Size 9, network: 53.98, std.algorithm.sort: 75.54, ratio network/std.algo: 0.71
Size 10, network: 46.66, std.algorithm.sort: 81.68, ratio network/std.algo: 0.57
Size 11, network: 65.06, std.algorithm.sort: 111.25, ratio
network/std.algo: 0.58
Size 12, network: 66.31, std.algorithm.sort: 109.40, ratio
network/std.algo: 0.61
Size 13, network: 74.84, std.algorithm.sort: 115.94, ratio
network/std.algo: 0.65
Size 14, network: 90.05, std.algorithm.sort: 131.84, ratio
network/std.algo: 0.68
Size 15, network: 95.23, std.algorithm.sort: 145.31, ratio
network/std.algo: 0.66
Size 16, network: 104.66, std.algorithm.sort: 162.84, ratio
network/std.algo: 0.64
Size 17, network: 125.30, std.algorithm.sort: 167.49, ratio
network/std.algo: 0.75
Size 18, network: 133.10, std.algorithm.sort: 182.20, ratio
network/std.algo: 0.73
Size 19, network: 143.92, std.algorithm.sort: 195.58, ratio
network/std.algo: 0.74
Size 20, network: 155.01, std.algorithm.sort: 211.59, ratio
network/std.algo: 0.73
Size 21, network: 171.43, std.algorithm.sort: 224.47, ratio
network/std.algo: 0.76
Size 22, network: 177.46, std.algorithm.sort: 236.92, ratio
network/std.algo: 0.75
Size 23, network: 192.22, std.algorithm.sort: 253.38, ratio
network/std.algo: 0.76
Size 24, network: 205.39, std.algorithm.sort: 270.83, ratio
network/std.algo: 0.76
Size 25, network: 213.25, std.algorithm.sort: 281.01, ratio
network/std.algo: 0.76
Size 26, network: 233.96, std.algorithm.sort: 283.57, ratio
network/std.algo: 0.83
Size 27, network: 246.73, std.algorithm.sort: 297.67, ratio
network/std.algo: 0.83
Size 28, network: 260.41, std.algorithm.sort: 313.88, ratio
network/std.algo: 0.83
Size 29, network: 280.06, std.algorithm.sort: 321.01, ratio
network/std.algo: 0.87
Size 30, network: 298.65, std.algorithm.sort: 342.55, ratio
network/std.algo: 0.87
Size 31, network: 308.09, std.algorithm.sort: 355.70, ratio
network/std.algo: 0.87
Size 32, network: 323.89, std.algorithm.sort: 380.31, ratio
network/std.algo: 0.85

On the computers I tested it (Windows, Linux, different machines), the
cutoff is at about 35-38 elements.
October 17, 2012
Re: Sorting algorithms
On 10/17/12 3:07 PM, Philippe Sigaud wrote:
> On Mon, Oct 15, 2012 at 5:52 PM, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org>  wrote:
>
>> I wanted to investigate small sorts using sorting networks for ages, but
>> never got around to it. That's important for quicksort because it produces
>> many small arrays that need sorting.
>>
>> Could you also test for very small sizes (2 to 4) and essentially test for
>> 1-increment speed up to, say, 30 elements? I assume that's where the major
>> wins come. I think we could use CT-generated sorting networks for arrays
>> below a specific size. The converse risk is growth of generated code.
>
> Here:
>
> http://dpaste.dzfl.pl/42fac981
>
> I don't know if the benchmarking code is OK. I substract a reference
> because randomly shuffling an array takes some time.
>
> Results for my computer (smaller ratio means faster network sort
> compared to std.algorithm.sort)
>
> Size 1, network: 2.10, std.algorithm.sort: 15.86, ratio network/std.algo: 0.13
> Size 2, network: 2.23, std.algorithm.sort: 14.26, ratio network/std.algo: 0.16
> Size 3, network: 6.22, std.algorithm.sort: 20.75, ratio network/std.algo: 0.30
> Size 4, network: 8.25, std.algorithm.sort: 28.36, ratio network/std.algo: 0.29
> Size 5, network: 18.54, std.algorithm.sort: 39.02, ratio network/std.algo: 0.48
> Size 6, network: 20.12, std.algorithm.sort: 45.58, ratio network/std.algo: 0.44
> Size 7, network: 27.49, std.algorithm.sort: 55.53, ratio network/std.algo: 0.50
> Size 8, network: 33.91, std.algorithm.sort: 66.02, ratio network/std.algo: 0.51
[snip]

Looking great, thanks. I'm on the road with little time and 
connectivity, but I want to encourage you with integrating this with 
std.sort. There seems to be a big gain drop off at size 9, so we could 
use sorting networks for size <= 8. (I'm also worried about generated 
code size.) So next step would be to integrate the sorting network 
within std.sort and see how it works there.

Please don't let this good work go to waste!


Andrei
« First   ‹ Prev
1 2
Top | Discussion index | About this forum | D home