Randomness in built-in .sort (page 2)

== Quote from Bill Baxter (wbaxter@gmail.com)'s article > Actually, a function to sort multiple arrays in parallel was exactly > what I was implementing using .sort. So that doesn't sound like a > limitation to me at all. :-) > --bb Am I (and possibly you) the only one(s) who think that sorting multiple arrays in parallel should be standard library functionality? The standard rebuttal might be "use arrays of structs instead of parallel arrays". This is a good idea in some situations, but for others, parallel arrays are just plain better. Furthermore, with D's handling of variadic functions, generalizing any sort to handle parallel arrays is easy.

On Tue, Jan 6, 2009 at 6:15 AM, dsimcha <dsimcha@yahoo.com> wrote: > == Quote from Bill Baxter (wbaxter@gmail.com)'s article >> Actually, a function to sort multiple arrays in parallel was exactly >> what I was implementing using .sort. So that doesn't sound like a >> limitation to me at all. :-) >> --bb > > Am I (and possibly you) the only one(s) who think that sorting multiple arrays in > parallel should be standard library functionality? I use this all the time in NumPy in the form of "argsort" which returns a list of indices giving the sort order. That can then be used as to index other arrays (thereby permuting them to the sorted order). order = npy.argsort(keys) sortedA = A[order] sortedB = B[order] > The standard rebuttal might be > "use arrays of structs instead of parallel arrays". > This is a good idea in some > situations, but for others, parallel arrays are just plain better. Yeh, that's just a silly argument. Sometimes a column-oriented organization is more suitable than a row-oriented one. Sort can be made to work on column-oriented data, so there's no reason not to make it so. > Furthermore, with D's handling of variadic functions, generalizing any sort to handle parallel arrays is easy. Yeh, this works pretty nicely. --bb

dsimcha wrote: > == Quote from Bill Baxter (wbaxter@gmail.com)'s article >> Actually, a function to sort multiple arrays in parallel was exactly >> what I was implementing using .sort. So that doesn't sound like a >> limitation to me at all. :-) >> --bb > > Am I (and possibly you) the only one(s) who think that sorting multiple arrays in > parallel should be standard library functionality? The standard rebuttal might be > "use arrays of structs instead of parallel arrays". This is a good idea in some > situations, but for others, parallel arrays are just plain better. Furthermore, > with D's handling of variadic functions, generalizing any sort to handle parallel > arrays is easy. I've written my own parallel-array quicksort implementation (several times over, in many different languages). Parallel sorting is one of my favorite tricks, and I think it definitely belongs in the standard library. --benji

Bill Baxter: >I use this all the time in NumPy in the form of "argsort" which returns a list of indices giving the sort order. That can then be used as to index other arrays (thereby permuting them to the sorted order). order = npy.argsort(keys) sortedA = A[order] sortedB = B[order]< My dlibs already contain an efficient sorting routine, much faster than the built in one. So to my dlibs I have just added sortedIndexes and remapOrder, you can find their docs here: http://www.fantascienza.net/leonardo/so/dlibs/func.html The code: http://www.fantascienza.net/leonardo/so/libs_d.zip The usage is very simple, sortedIndexes is similar to sorted(): auto a = [-5, 2, 3, 1, -11].dup; auto order1 = a.sortedIndexes(); // now order1 == [4U, 0, 3, 1, 2] auto order2 = a.sortedIndexes((int x){return x < 0 ? -x : x;}); // Now order2 == [3U, 1, 2, 0, 4] int[2][] c = [[3, 4], [1,2], [1, 1]]; auto order3 = c.sortedIndexes(); // now order3 == [2U, 1, 0] auto a = [-5, 2, 3, 1, -11].dup; a.remapOrder([4, 0, 3, 1, 2]) ==> [-11, -5, 1, 2, 3] // a isn't changed here a.remapOrder([3, 1, 2, 0, 4]) ==> [1, 2, 3, -5, -11] auto b = ["oranges"[], "for", "apples", "is", "right"].dup; b.remapOrder([3, 1, 2, 0, 4]) ==> ["is"[], "for", "apples", "oranges", "right"].dup); (Maybe I have to rename remapOrder as remappedOrder). -------------------------------- While writing the unittest for those functions, I have seen this isn't accepted by DMD: int[2][] c = [[3, 4], [1, 2], [1, 1]].dup; While the compiler accepts this, I don't know why: int[2][] c0 = [[3, 4], [1, 2], [1, 1]]; int[2][] c2 = c0.dup; Bye, bearophile

January 06, 2009

Re: Randomness in built-in .sort

Posted by Bill Baxter
in reply to bearophile

Permalink

Bill Baxter

Posted in reply to bearophile

Permalink

On Tue, Jan 6, 2009 at 10:18 AM, bearophile <bearophileHUGS@lycos.com> wrote:
> Bill Baxter:
>
>>I use this all the time in NumPy in the form of "argsort" which returns a list of indices giving the sort order.  That can then be used as to index other arrays (thereby permuting them to the sorted order).
> order = npy.argsort(keys)
> sortedA = A[order]
> sortedB = B[order]<
>
> My dlibs already contain an efficient sorting routine, much faster than the built in one.
>
> So to my dlibs I have just added sortedIndexes and remapOrder, you can find their docs here: http://www.fantascienza.net/leonardo/so/dlibs/func.html
>
> The code: http://www.fantascienza.net/leonardo/so/libs_d.zip
>
> The usage is very simple, sortedIndexes is similar to sorted():
>
> auto a =  [-5, 2, 3, 1, -11].dup;
> auto order1 = a.sortedIndexes();
> // now order1 == [4U, 0, 3, 1, 2]
>
> auto order2 = a.sortedIndexes((int x){return x < 0 ? -x : x;});
> // Now order2 == [3U, 1, 2, 0, 4]
>
> int[2][] c = [[3, 4], [1,2], [1, 1]];
> auto order3 = c.sortedIndexes();
> // now order3 == [2U, 1, 0]
>
> auto a =  [-5, 2, 3, 1, -11].dup;
>
> a.remapOrder([4, 0, 3, 1, 2]) ==> [-11, -5, 1, 2, 3]
> // a isn't changed here
> a.remapOrder([3, 1, 2, 0, 4]) ==> [1, 2, 3, -5, -11]
>
> auto b = ["oranges"[], "for", "apples", "is", "right"].dup;
> b.remapOrder([3, 1, 2, 0, 4]) ==>
> ["is"[], "for", "apples", "oranges", "right"].dup);
>
> (Maybe I have to rename remapOrder as remappedOrder).
>
> --------------------------------
>
> While writing the unittest for those functions, I have seen this isn't accepted by DMD: int[2][] c = [[3, 4], [1, 2], [1, 1]].dup;
>
> While the compiler accepts this, I don't know why:
> int[2][] c0 = [[3, 4], [1, 2], [1, 1]];
> int[2][] c2 = c0.dup;


Making a permutation array like that is simple and nice, but another alternative which may be more efficient is to override the "swap(i,j)" routine used by the sort implementation, and make it swap the i'th and jth elements of all the arrays that are being sorted.  That can be done with O(1) extra storage instead of the O(n) required by making to make a permutation array.  I think it's also hard to avoid O(n) extra memory for permuting an array.  At least I can't think of how to do it with out using an extra allocation.

--bb

Benji Smith wrote: > dsimcha wrote: >> == Quote from Bill Baxter (wbaxter@gmail.com)'s article >>> Actually, a function to sort multiple arrays in parallel was exactly >>> what I was implementing using .sort. So that doesn't sound like a >>> limitation to me at all. :-) >>> --bb >> >> Am I (and possibly you) the only one(s) who think that sorting multiple arrays in >> parallel should be standard library functionality? The standard rebuttal might be >> "use arrays of structs instead of parallel arrays". This is a good idea in some >> situations, but for others, parallel arrays are just plain better. Furthermore, >> with D's handling of variadic functions, generalizing any sort to handle parallel >> arrays is easy. > > I've written my own parallel-array quicksort implementation (several times over, in many different languages). > > Parallel sorting is one of my favorite tricks, and I think it definitely belongs in the standard library. > > --benji std.algorithm's parameterized swap primitive is motivated in part by parallel array manipulation. See the schwartz routines in there for examples. Andrei

Andrei Alexandrescu wrote: > std.algorithm's parameterized swap primitive is motivated in part by parallel array manipulation. See the schwartz routines in there for examples. And may the schwartz be with your sorts!

Walter Bright wrote: > Andrei Alexandrescu wrote: >> std.algorithm's parameterized swap primitive is motivated in part by parallel array manipulation. See the schwartz routines in there for examples. > > And may the schwartz be with your sorts! A Schwartz joke has been present in http://digitalmars.com/d/2.0/phobos/std_algorithm.html since its early days. I am appalled nobody has noticed it. Andrei

Bill Baxter wrote: > It seems to me that the built-in .sort uses a randomized algorithm > that leads to results that differ from run-to-run when there are > elements with identical keys. No, it doesn't use random sorts. The source code is included, so this should be easy to verify.

Forums