Sorting algorithms benchmark

Feb 27, 2007

Stewart Gordon

Feb 28, 2007

Deewiant

Mar 01, 2007

Mar 06, 2007

Mar 07, 2007

Mar 07, 2007

Mar 07, 2007

Mar 07, 2007

I've written a program to benchmark a number of sorting algorithms. http://pr.stewartsplace.org.uk/d/sortbench.d It includes options to output the timings to a CSV file and to control the number of tests of each algorithm to perform (likely to be needed on modern machines). Stewart.

February 28, 2007

Re: Sorting algorithms benchmark

Posted by Deewiant
in reply to Stewart Gordon

Permalink

Deewiant

Posted in reply to Stewart Gordon

Permalink

Stewart Gordon wrote:
> I've written a program to benchmark a number of sorting algorithms.
> 
> http://pr.stewartsplace.org.uk/d/sortbench.d
> 
> It includes options to output the timings to a CSV file and to control the number of tests of each algorithm to perform (likely to be needed on modern machines).
> 
> Stewart.
> 

Nice. One of the very first D programs I wrote was a class Sorter which aimed to do the same, but I never published it since most of the code was just pretty much directly copied from somewhere. Besides, I didn't really understand most of the algorithms, and couldn't get all the ones I wanted to work. Pigeonhole sort, in particular, couldn't reliably work for all types without opAssign overloading, which was necessary.

Some comments from what I do (think I do) know, however:

Your comb sort uses a shrink factor of 0.7. Comb sort 11 uses 1 / (1 - 1 / (e ^
phi)), where e is Napier's constant and phi the golden ratio. This leads to
possible optimization in the algorithm.

But alas, the page that used to contain the article describing it is gone from
the web: http://world.std.com/~jdveale/combsort.htm. Dammit! I haven't copied
the source code for it from there, either. Oh well, a less optimal version but
still, I wager, better than yours: http://yagni.com/combsort/index.php#cpp-combsort

What I used: const real SHRINK_FACTOR = 1.2473309501039786540366528676643474234433714124826;

You're missing shell sort: http://en.wikipedia.org/wiki/Shell_sort
See also: http://www.cs.princeton.edu/~rs/talks/shellsort.pdf
My 1.5-year old code (no idea how it works - I'm particularly frightened of the
bit math, I think it's the formula here:
http://www.research.att.com/~njas/sequences/A033622) is at the bottom of this
post, if you want to have a look.

Other sorts that could be added:
	- Gnome sort, http://en.wikipedia.org/wiki/Gnome_sort
	- Introspective sort: http://en.wikipedia.org/wiki/Introsort
	- Three-way merge sort:
http://www.cs.queensu.ca/home/cisc121/2002w/assignments/assn8/solution/MergeThreeWay.java
	- Counting sort, http://en.wikipedia.org/wiki/Counting_sort
	- Bucket sort, http://en.wikipedia.org/wiki/Bucket_sort
	- Pigeonhole sort, http://en.wikipedia.org/wiki/Pigeonhole_sort

I didn't look too much at your radix sorts, they might have a lot in common with the last three of the above.

Also, your "buffered insertion sort" is perhaps more commonly known as "binary insertion sort", since you mention it uses binary search.

--

// TYPE is just a template parameter
// comp defaults to a simple less-than comparison
// the inout is probably unnecessary, I wouldn't have known that back then
void shellSort(inout TYPE[] a, size_t beg = 0, size_t end = 0, cmpType cmp = null) {
	if (cmp is null) cmp = this.comp;
	if (end == 0) end = a.length;

	// can't figure out how to get it working for arbitrary beg and end, hence kept
separate here
	void actualShellSort(inout TYPE[] ra) {
		size_t h = 1, hh = 1;
		while (h < ra.length) {
			// magic numbers from formula:
			//
http://www.research.att.com/cgi-bin/access.cgi/as/njas/sequences/eisA.cgi?Anum=A033622
			if (hh % 2)
				h = 8 << hh - 6 << ((hh + 1) / 2) + 1;
			else
				h = 9 << hh - 9 << ( hh      / 2) + 1;
			++hh;
		}

		while (h > 0) {
			// for each set, of which there are h
			for (size_t i = h - 1; i < ra.length; ++i) {
				// pick last element in set
				TYPE tmp = ra[i];
				size_t j = i;

				// compare tmp to the one before it in the set
				// if they are not in order, continue this loop, moving
				// elements back to make room for tmp
				for (; j >= h && !cmp(tmp, ra[j-h]); j -= h)
					ra[j] = ra[j - h];

				ra[j] = tmp;
			}

			// all h-sets sorted
			h /= 3;
		}
	}

	if (!smartToSort(a, beg, end, cmp)) return;

	TYPE[] foo = a[beg..end];
	foo.actualShellSort();
}

// other relevant stuff to get the above to compile (not tested, though):

bit smartToSort(inout TYPE[] a, size_t beg, size_t end, cmpType cmp) {
	if (end <= beg)
		return false;
	if (end - beg > a.length)
		return false;

	if (end - beg < 2)
		return false;
	else if (end - beg == 2) {
		if (!cmp(a[end - 1], a[beg]))
			a.swap(beg, end - 1);
		return false;
	}

	return true;
}

alias int function(in TYPE left, in TYPE right) cmpType;

Deewiant Wrote: <snip> > Your comb sort uses a shrink factor of 0.7. Comb sort 11 uses 1 / (1 - 1 / (e ^ phi)), where e is Napier's constant and phi the golden ratio. This leads to possible optimization in the algorithm. Hang on - isn't 0.7 a grow factor, rather than a shrink factor? <snip> > You're missing shell sort: http://en.wikipedia.org/wiki/Shell_sort Indeed, that could be implemented as well. I noticed there: "For instance, if a list was 5-sorted and then 3-sorted, the list is now not only 3-sorted, but both 5- and 3-sorted." Guess I should read up on the proof sometime. <snip> > Other sorts that could be added: > - Gnome sort, http://en.wikipedia.org/wiki/Gnome_sort Not sure if this one is worth it.... > - Introspective sort: http://en.wikipedia.org/wiki/Introsort > - Three-way merge sort: > http://www.cs.queensu.ca/home/cisc121/2002w/assignments/assn8/solution/MergeThreeWay.java Certainly worth considering. > - Counting sort, http://en.wikipedia.org/wiki/Counting_sort > - Bucket sort, http://en.wikipedia.org/wiki/Bucket_sort > - Pigeonhole sort, > http://en.wikipedia.org/wiki/Pigeonhole_sort <snip> It took me a moment to figure the difference between these three. But I would need to reduce the range of the random sequences a lot before any of them will be remotely feasible. Stewart.

Stewart Gordon wrote: > Deewiant Wrote: >> Your comb sort uses a shrink factor of 0.7. Comb sort 11 uses 1 / (1 - 1 / (e ^ phi)), where e is Napier's constant and phi the golden ratio. This leads to possible optimization in the algorithm. > > Hang on - isn't 0.7 a grow factor, rather than a shrink factor? I took a more detailed (but still quite cursory) look at your code, and it seems you do things differently from the "traditional" method. I'm not sure if that's even equivalent to the normal comb sort. But yes, since 0.7 is less than 1, I suppose you could call it a grow factor. It still shrinks the size of the gap you're looking at, though. Up to you, but all references to comb sort I've seen talk about shrink factors. :-) >> Other sorts that could be added: >> - Gnome sort, http://en.wikipedia.org/wiki/Gnome_sort > > Not sure if this one is worth it.... It's only a few lines of code, so there's not much to be worth. And at least it's faster than bubble sort. ;-) -- Remove ".doesnotlike.spam" from the mail address.

Deewiant Wrote: <snip> > I took a more detailed (but still quite cursory) look at your code, and it seems you do things differently from the "traditional" method. I'm not sure if that's even equivalent to the normal comb sort. What do you consider to be the "normal comb sort"? OK, so the call to lastSwapBubbleSort was my idea, but other than that, there doesn't seem to be any difference at all, at least from this: http://en.wikipedia.org/wiki/Comb_sort <snip> >>> Other sorts that could be added: >>> - Gnome sort, http://en.wikipedia.org/wiki/Gnome_sort >> >> Not sure if this one is worth it.... > > It's only a few lines of code, so there's not much to be worth. And at least it's faster than bubble sort. ;-) Looking at it on Wikipedia, it appears to be just a way of implementing insertion sort without nesting loops. Strange - before, I'd understood it to be something even slower - only one index variable, and so after moving an element down you step forward one by one just to get back to where you moved it from. Stewart.

Stewart Gordon wrote: > Deewiant Wrote: > <snip> >> I took a more detailed (but still quite cursory) look at your code, and it seems you do things differently from the "traditional" method. I'm not sure if that's even equivalent to the normal comb sort. > > What do you consider to be the "normal comb sort"? OK, so the call to lastSwapBubbleSort was my idea, but other than that, there doesn't seem to be any difference at all, at least from this: > > http://en.wikipedia.org/wiki/Comb_sort > That's exactly what's confusing me. I suppose the reason you need that call is that you're not counting the number of swaps made in the loop, and you're stopping just when the gap is 1, not when swaps are also zero. Which I think makes the algorithm a bit different, since the number of "combs" you do is limited to once per gap size, plus the one bubble sort after you're done. I added my comb sort 11 implementation to your benchmark, and it's noticeably faster for random data than your comb sort, at least on my machine. So something's different, to be sure. Even without the optimization related to gap size 11, mine is faster. Here's the source, changed to work with your benchmark, if you're interested: void combSort11(uint[] a) out { debug (100) printList(a); checkSorted(a); } body { // empirically found to be good at: // http://world.std.com/~jdveale/combsort.htm // 1 / (1 - 1 / (e ^ phi)), phi being golden ratio = (sqrt(5) + 1)/2 const real SHRINK_FACTOR = 1.2473309501039786540366528676643474234433714124826L; bit swapped = false; size_t gap = a.length; do { if (gap > 1) { gap = cast(size_t)(cast(real) gap / SHRINK_FACTOR); // hence, comb sort 11 if (gap == 9 || gap == 10) gap = 11; } swapped = false; for (size_t i = 0; i < a.length - gap; ++i) { size_t j = i + gap; if (a[i] > a[j]) { swap(a[i], a[j]); swapped = true; } } } while (swapped || gap > 1); } -- Remove ".doesnotlike.spam" from the mail address.

Deewiant Wrote: <snip> > That's exactly what's confusing me. I suppose the reason you need that call is that you're not counting the number of swaps made in the loop, and you're stopping just when the gap is 1, not when swaps are also zero. Which I think makes the algorithm a bit different, since the number of "combs" you do is limited to once per gap size, plus the one bubble sort after you're done. It's more the other way round - calling bubble sort is the way I chose to implement the passes of gap size 1. It saves having to maintain a swap flag during the passes of gap > 1. > I added my comb sort 11 implementation to your benchmark, and it's noticeably faster for random data than your comb sort, at least on my machine. So something's different, to be sure. Even without the optimization related to gap size 11, mine is faster. <snip> That appears to be because you've implemented the optimum shrink factor. If I change my implementation to use it as well, it runs at about the same speed as yours. Stewart.

Stewart Gordon wrote: > Deewiant Wrote: >> I added my comb sort 11 implementation to your benchmark, and it's noticeably faster for random data than your comb sort, at least on my machine. So something's different, to be sure. Even without the optimization related to gap size 11, mine is faster. > <snip> > > That appears to be because you've implemented the optimum shrink factor. If I change my implementation to use it as well, it runs at about the same speed as yours. > Ah, of course. That clears it up. I ran the whole benchmark: for these cases, merge sort and comb sort 11 are the only two to beat the built-in sort (taking the average of the results of all 9 runs). Interesting. -- Remove ".doesnotlike.spam" from the mail address.

Forums