Randomness in built-in .sort - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Randomness in built-in .sort

Thread overview

Randomness in built-in .sort
Jan 05, 2009 Bill Baxter
Jan 05, 2009 dsimcha
Jan 05, 2009 Bill Baxter
Jan 05, 2009 Andrei Alexandrescu
Jan 05, 2009 dsimcha
Jan 05, 2009 Don
Jan 05, 2009 dsimcha
Jan 05, 2009 Bill Baxter
Jan 05, 2009 dsimcha
Jan 05, 2009 Bill Baxter
Jan 06, 2009 bearophile
Jan 06, 2009 bearophile
Jan 06, 2009 Bill Baxter
Mar 23, 2021 mw
Mar 23, 2021 jmh530
Jan 06, 2009 Benji Smith
Jan 06, 2009 Andrei Alexandrescu
Jan 06, 2009 Walter Bright
Jan 06, 2009 Andrei Alexandrescu
Jan 05, 2009 Stewart Gordon
Jan 05, 2009 Bill Baxter
Jan 06, 2009 Walter Bright
Jan 06, 2009 Sean Kelly
Jan 06, 2009 Ary Borenszweig
Jan 06, 2009 Walter Bright
Jan 06, 2009 Bill Baxter

January 05, 2009

Randomness in built-in .sort

Posted by Bill Baxter

Bill Baxter

It seems to me that the built-in .sort uses a randomized algorithm that leads to results that differ from run-to-run when there are elements with identical keys.

This seems like a bad idea to me for the built-in algorithm to have non-deterministic behavior because it means you can have bugs that aren't consistently repeatable.

I think the built-in sort should be some kind of stable sort.   Also the stability or lack thereof is not mentioned in the spec, and it probably should be because stability is critical for some uses of sorting.  One shouldn't have to guess whether a given implementation of D will use a stable sort or not.

--bb

January 05, 2009

Re: Randomness in built-in .sort

Posted by dsimcha
in reply to Bill Baxter

dsimcha

Posted in reply to Bill Baxter

== Quote from Bill Baxter (wbaxter@gmail.com)'s article
> It seems to me that the built-in .sort uses a randomized algorithm
> that leads to results that differ from run-to-run when there are
> elements with identical keys.
> This seems like a bad idea to me for the built-in algorithm to have
> non-deterministic behavior because it means you can have bugs that
> aren't consistently repeatable.
> I think the built-in sort should be some kind of stable sort.   Also
> the stability or lack thereof is not mentioned in the spec, and it
> probably should be because stability is critical for some uses of
> sorting.  One shouldn't have to guess whether a given implementation
> of D will use a stable sort or not.
> --bb

IDK why sorting is builtin anyhow.  I did some benchmarks a while back, and the builtin sort is slower than std.algorithm.  IIRC the builtin sort uses function pointers for comparison, while the std.algorithm sort uses template parameters that can be inlined at compile time.  I know this was mentioned before, but it didn't really get fully discussed.  Why is sorting builtin?  With property syntax, there's not even any syntactic sugar advantage.  Furthermore, the builtin sort is substantially less flexible than a library sort.

On another note, it would be nice if std.algorithm implemented a stable O(N log N) sort, like a merge sort.  Right now, IIRC it uses an interesting stable swapping algorithm on top of a quick sort for O(N log^2 N) performance.  This is good when space is tight, but I think in most cases a merge sort is better as a stable sort.

On the other hand, if builtin sort is kept, then I agree with you:  It should
follow the D convention of doing the safe thing by default (stable sort with no
O(N^2) corner cases) and allowing less safe behavior (unstable but possibly faster
sort that may or may not have some O(N^2) corner cases) in libraries.

January 05, 2009

Re: Randomness in built-in .sort

Posted by Bill Baxter
in reply to dsimcha

Bill Baxter

Posted in reply to dsimcha

On Mon, Jan 5, 2009 at 12:05 PM, dsimcha <dsimcha@yahoo.com> wrote:
> == Quote from Bill Baxter (wbaxter@gmail.com)'s article
>> It seems to me that the built-in .sort uses a randomized algorithm
>> that leads to results that differ from run-to-run when there are
>> elements with identical keys.
>> This seems like a bad idea to me for the built-in algorithm to have
>> non-deterministic behavior because it means you can have bugs that
>> aren't consistently repeatable.
>> I think the built-in sort should be some kind of stable sort.   Also
>> the stability or lack thereof is not mentioned in the spec, and it
>> probably should be because stability is critical for some uses of
>> sorting.  One shouldn't have to guess whether a given implementation
>> of D will use a stable sort or not.
>> --bb
>
> IDK why sorting is builtin anyhow.  I did some benchmarks a while back, and the builtin sort is slower than std.algorithm.  IIRC the builtin sort uses function pointers for comparison, while the std.algorithm sort uses template parameters that can be inlined at compile time.  I know this was mentioned before, but it didn't really get fully discussed.  Why is sorting builtin?  With property syntax, there's not even any syntactic sugar advantage.  Furthermore, the builtin sort is substantially less flexible than a library sort.

I agree.  Builtin sort should probably just go away in D2.
But it's there in D1 and can't be removed now.  The best we can hope
for there is for the specification to be tightened up to specify that
.sort must be a stable algorithm.

> On another note, it would be nice if std.algorithm implemented a stable O(N log N) sort, like a merge sort.  Right now, IIRC it uses an interesting stable swapping algorithm on top of a quick sort for O(N log^2 N) performance.  This is good when space is tight, but I think in most cases a merge sort is better as a stable sort.
>
> On the other hand, if builtin sort is kept, then I agree with you:  It should
> follow the D convention of doing the safe thing by default (stable sort with no
> O(N^2) corner cases) and allowing less safe behavior (unstable but possibly faster
> sort that may or may not have some O(N^2) corner cases) in libraries.


For my purposes, I just found Oskar Linde's stable sort implementation
(based on merge sort), and am using that now.
It supports a delegate as a predicate, too, which made my code a lot
simpler (got rid of a dummy struct who's only purpose was to provide
the necessary opCmp I wanted).

So yeh, there's not much point for built-in .sort (and .reverse)... except the ever-present "it it's easier to get CTFE working if it's built-in", because the compiler can have a special case CTFE implementation of those operations.

--bb

January 05, 2009

Re: Randomness in built-in .sort

Posted by Andrei Alexandrescu
in reply to dsimcha

Andrei Alexandrescu

Posted in reply to dsimcha

dsimcha wrote:
> On another note, it would be nice if std.algorithm implemented a stable O(N log N)
> sort, like a merge sort.  Right now, IIRC it uses an interesting stable swapping
> algorithm on top of a quick sort for O(N log^2 N) performance.  This is good when
> space is tight, but I think in most cases a merge sort is better as a stable sort.

I agree. I didn't have the time to implement a very good stable sort, but I also think the extra slowness is not critical. If anyone comes with a better implementation, I'd be glad to integrate it.

Andrei

January 05, 2009

Re: Randomness in built-in .sort

Posted by Stewart Gordon
in reply to Bill Baxter

Stewart Gordon

Posted in reply to Bill Baxter

Bill Baxter wrote:
<snip>
> I think the built-in sort should be some kind of stable sort.   Also
> the stability or lack thereof is not mentioned in the spec, and it
> probably should be because stability is critical for some uses of
> sorting.  One shouldn't have to guess whether a given implementation
> of D will use a stable sort or not.
> 
> --bb

A stable sort can be slower than an unstable sort.  There are many cases where the order of equally-ranking keys does not matter or it has no effect whatsoever (e.g. sorting an array of integers).  Then why take the extra overhead of making the sort stable?

We could have an extra property .stableSort, for those cases where you need stability.  Of course, it would be valid to implement .sort and .stableSort with the same algorithm, though to do so would be naive unless someone comes up with a stable sorting algorithm with O(n log n) or better time complexity and O(log n) or better extra memory requirement.  But in the absence of such an algorithm, the compiler could still replace .stableSort with .sort in those cases where it can guarantee it will make no difference to the end result.

Or you could argue that requiring sort to be stable is sufficiently uncommon that it might as well be left to a library function.

Stewart.

January 05, 2009

Re: Randomness in built-in .sort

Posted by dsimcha
in reply to Andrei Alexandrescu

dsimcha

Posted in reply to Andrei Alexandrescu

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> dsimcha wrote:
> > On another note, it would be nice if std.algorithm implemented a stable O(N log N) sort, like a merge sort.  Right now, IIRC it uses an interesting stable swapping algorithm on top of a quick sort for O(N log^2 N) performance.  This is good when space is tight, but I think in most cases a merge sort is better as a stable sort.
> I agree. I didn't have the time to implement a very good stable sort,
> but I also think the extra slowness is not critical. If anyone comes
> with a better implementation, I'd be glad to integrate it.
> Andrei

You could try the merge sort implementation from my dstats library at http://dsource.org/projects/dstats/browser/trunk/sort.d.  This is very heavily tested and optimized because some important higher level dstats functionality depends on it.  You'd basically just have to add a template parameter to allow a user-provided swap function to make it conform to std.algorithm conventions. Obviously, since it's by its nature a stable sort, you don't really need the swapStrategy alias.

One thing to note, though, is that this function is designed to allow for sorting parallel arrays in lockstep by simply passing them in as additional parameters. This adds some complexity to the implementation.  Also, it uses the TempAlloc region allocator for temp space.  You could put this in Phobos too (This allocator was actually your idea a while back, though you called it a SuperStack), or you could change the temp space allocation to simply use the regular heap allocation scheme.

January 05, 2009

Re: Randomness in built-in .sort

Posted by Don
in reply to dsimcha

Don

Posted in reply to dsimcha

dsimcha wrote:
> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>> dsimcha wrote:
>>> On another note, it would be nice if std.algorithm implemented a stable O(N log N)
>>> sort, like a merge sort.  Right now, IIRC it uses an interesting stable swapping
>>> algorithm on top of a quick sort for O(N log^2 N) performance.  This is good when
>>> space is tight, but I think in most cases a merge sort is better as a stable sort.
>> I agree. I didn't have the time to implement a very good stable sort,
>> but I also think the extra slowness is not critical. If anyone comes
>> with a better implementation, I'd be glad to integrate it.
>> Andrei
> 
> You could try the merge sort implementation from my dstats library at
> http://dsource.org/projects/dstats/browser/trunk/sort.d.  This is very heavily
> tested and optimized because some important higher level dstats functionality
> depends on it.  You'd basically just have to add a template parameter to allow a
> user-provided swap function to make it conform to std.algorithm conventions.
> Obviously, since it's by its nature a stable sort, you don't really need the
> swapStrategy alias.
> 
> One thing to note, though, is that this function is designed to allow for sorting
> parallel arrays in lockstep by simply passing them in as additional parameters.
> This adds some complexity to the implementation.  Also, it uses the TempAlloc
> region allocator for temp space.  You could put this in Phobos too (This allocator
> was actually your idea a while back, though you called it a SuperStack), or you
> could change the temp space allocation to simply use the regular heap allocation
> scheme.

I want TempAlloc in dcore! Eventually, anyway.

January 05, 2009

Re: Randomness in built-in .sort

Posted by dsimcha
in reply to Don

dsimcha

Posted in reply to Don

== Quote from Don (nospam@nospam.com)'s article
> I want TempAlloc in dcore! Eventually, anyway.

The only problem is that, in its current form, TempAlloc is kind of dangerous because it puts the onus on the programmer to not store the only reference to a reference type in a TempAlloc-allocated block.  If TempAlloc were more tightly integrated with the GC, this could be solved easily with negligible overhead.  One would simply have to keep a thread-local variable that tracks how many blocks are currently in use, kind of like the stack pointer in the call stack, and a bit array that tracks whether each block that is currently in use should be scanned. However, without GC integration, just making everything scanned would cause so many false pointer issues and slowdowns when GC is run that I decided that, for now, given the anticipated use cases, making it a little dangerous and scanning nothing was the lesser of two evils.

January 05, 2009

Re: Randomness in built-in .sort

Posted by Bill Baxter
in reply to dsimcha

Bill Baxter

Posted in reply to dsimcha

On Tue, Jan 6, 2009 at 12:16 AM, dsimcha <dsimcha@yahoo.com> wrote:
> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>> dsimcha wrote:
>> > On another note, it would be nice if std.algorithm implemented a stable O(N log N) sort, like a merge sort.  Right now, IIRC it uses an interesting stable swapping algorithm on top of a quick sort for O(N log^2 N) performance.  This is good when space is tight, but I think in most cases a merge sort is better as a stable sort.
>> I agree. I didn't have the time to implement a very good stable sort,
>> but I also think the extra slowness is not critical. If anyone comes
>> with a better implementation, I'd be glad to integrate it.
>> Andrei
>
> You could try the merge sort implementation from my dstats library at http://dsource.org/projects/dstats/browser/trunk/sort.d.  This is very heavily tested and optimized because some important higher level dstats functionality depends on it.  You'd basically just have to add a template parameter to allow a user-provided swap function to make it conform to std.algorithm conventions. Obviously, since it's by its nature a stable sort, you don't really need the swapStrategy alias.

Thanks for the pointer.  Actually when I said I "found" Oskar Linde's sorting code, that really meant I found it sitting in my source tree where I had incorporated it long ago after Oskar made it available. So I just used that.

> One thing to note, though, is that this function is designed to allow for sorting parallel arrays in lockstep by simply passing them in as additional parameters. This adds some complexity to the implementation.  Also, it uses the TempAlloc region allocator for temp space.  You could put this in Phobos too (This allocator was actually your idea a while back, though you called it a SuperStack), or you could change the temp space allocation to simply use the regular heap allocation scheme.

Actually, a function to sort multiple arrays in parallel was exactly what I was implementing using .sort.  So that doesn't sound like a limitation to me at all.   :-)

--bb

January 05, 2009

Re: Randomness in built-in .sort

Posted by Bill Baxter
in reply to Stewart Gordon

Bill Baxter

Posted in reply to Stewart Gordon

On Mon, Jan 5, 2009 at 8:40 PM, Stewart Gordon <smjg_1998@yahoo.com> wrote:
> Bill Baxter wrote:
> <snip>
>>
>> I think the built-in sort should be some kind of stable sort.   Also the stability or lack thereof is not mentioned in the spec, and it probably should be because stability is critical for some uses of sorting.  One shouldn't have to guess whether a given implementation of D will use a stable sort or not.
>>
>> --bb

> A stable sort can be slower than an unstable sort.  There are many cases where the order of equally-ranking keys does not matter or it has no effect whatsoever (e.g. sorting an array of integers).  Then why take the extra overhead of making the sort stable?

> We could have an extra property .stableSort, for those cases where you need stability.  Of course, it would be valid to implement .sort and .stableSort with the same algorithm, though to do so would be naive unless someone comes up with a stable sorting algorithm with O(n log n) or better time complexity and O(log n) or better extra memory requirement.  But in the absence of such an algorithm, the compiler could still replace .stableSort with .sort in those cases where it can guarantee it will make no difference to the end result.

Actually yeh, I don't really care so much if .sort is stable or not (as long as it's clear in the spec what to expect).  But I do care that it is deterministic.  The current implementation seems to use some randomness (random partitions?) so if I run my program 5 times I get 5 different sort orders when there are duplicated keys.  This makes it really hard to debug things that only happen with certain orders.

> Or you could argue that requiring sort to be stable is sufficiently uncommon that it might as well be left to a library function.

There is no one sort algorithm that is fastest for all inputs.  So it would make more sense to me to argue that if you want the fastest possible sort for your problem then you should go with a library solution.  Built-ins should be efficient, yes, but to argue that efficiency is more important than generality in a built-in seems wrong to me.  A least for .sort.   There's no performance benefit to having a built-in sort, so it's really only there for convenience.   Using a stable sort would make .sort applicable to more problems, thus increasing the integral of convenience.

--bb

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation