January 05, 2009
== Quote from Bill Baxter (wbaxter@gmail.com)'s article
> Actually, a function to sort multiple arrays in parallel was exactly
> what I was implementing using .sort.  So that doesn't sound like a
> limitation to me at all.   :-)
> --bb

Am I (and possibly you) the only one(s) who think that sorting multiple arrays in parallel should be standard library functionality?  The standard rebuttal might be "use arrays of structs instead of parallel arrays".  This is a good idea in some situations, but for others, parallel arrays are just plain better.  Furthermore, with D's handling of variadic functions, generalizing any sort to handle parallel arrays is easy.
January 05, 2009
On Tue, Jan 6, 2009 at 6:15 AM, dsimcha <dsimcha@yahoo.com> wrote:
> == Quote from Bill Baxter (wbaxter@gmail.com)'s article
>> Actually, a function to sort multiple arrays in parallel was exactly
>> what I was implementing using .sort.  So that doesn't sound like a
>> limitation to me at all.   :-)
>> --bb
>
> Am I (and possibly you) the only one(s) who think that sorting multiple arrays in
> parallel should be standard library functionality?

I use this all the time in NumPy in the form of "argsort" which
returns a list of indices giving the sort order.  That can then be
used as to index other arrays (thereby permuting them to the sorted
order).
order = npy.argsort(keys)
sortedA = A[order]
sortedB = B[order]

> The standard rebuttal might be
> "use arrays of structs instead of parallel arrays".
> This is a good idea in some
> situations, but for others, parallel arrays are just plain better.

Yeh, that's just a silly argument.  Sometimes a column-oriented
organization is more suitable than a row-oriented one.
Sort can be made to work on column-oriented data, so there's no reason
not to make it so.

> Furthermore, with D's handling of variadic functions, generalizing any sort to handle parallel arrays is easy.

Yeh, this works pretty nicely.

--bb
January 06, 2009
dsimcha wrote:
> == Quote from Bill Baxter (wbaxter@gmail.com)'s article
>> Actually, a function to sort multiple arrays in parallel was exactly
>> what I was implementing using .sort.  So that doesn't sound like a
>> limitation to me at all.   :-)
>> --bb
> 
> Am I (and possibly you) the only one(s) who think that sorting multiple arrays in
> parallel should be standard library functionality?  The standard rebuttal might be
> "use arrays of structs instead of parallel arrays".  This is a good idea in some
> situations, but for others, parallel arrays are just plain better.  Furthermore,
> with D's handling of variadic functions, generalizing any sort to handle parallel
> arrays is easy.

I've written my own parallel-array quicksort implementation (several times over, in many different languages).

Parallel sorting is one of my favorite tricks, and I think it definitely belongs in the standard library.

--benji
January 06, 2009
Bill Baxter:

>I use this all the time in NumPy in the form of "argsort" which returns a list of indices giving the sort order.  That can then be used as to index other arrays (thereby permuting them to the sorted order).
order = npy.argsort(keys)
sortedA = A[order]
sortedB = B[order]<

My dlibs already contain an efficient sorting routine, much faster than the built in one.

So to my dlibs I have just added sortedIndexes and remapOrder, you can find their docs here: http://www.fantascienza.net/leonardo/so/dlibs/func.html

The code: http://www.fantascienza.net/leonardo/so/libs_d.zip

The usage is very simple, sortedIndexes is similar to sorted():

auto a =  [-5, 2, 3, 1, -11].dup;
auto order1 = a.sortedIndexes();
// now order1 == [4U, 0, 3, 1, 2]

auto order2 = a.sortedIndexes((int x){return x < 0 ? -x : x;});
// Now order2 == [3U, 1, 2, 0, 4]

int[2][] c = [[3, 4], [1,2], [1, 1]];
auto order3 = c.sortedIndexes();
// now order3 == [2U, 1, 0]

auto a =  [-5, 2, 3, 1, -11].dup;

a.remapOrder([4, 0, 3, 1, 2]) ==> [-11, -5, 1, 2, 3]
// a isn't changed here
a.remapOrder([3, 1, 2, 0, 4]) ==> [1, 2, 3, -5, -11]

auto b = ["oranges"[], "for", "apples", "is", "right"].dup;
b.remapOrder([3, 1, 2, 0, 4]) ==>
["is"[], "for", "apples", "oranges", "right"].dup);

(Maybe I have to rename remapOrder as remappedOrder).

--------------------------------

While writing the unittest for those functions, I have seen this isn't accepted by DMD: int[2][] c = [[3, 4], [1, 2], [1, 1]].dup;

While the compiler accepts this, I don't know why:
int[2][] c0 = [[3, 4], [1, 2], [1, 1]];
int[2][] c2 = c0.dup;

Bye,
bearophile
January 06, 2009
bearophile:
> (Maybe I have to rename remapOrder as remappedOrder).

'reordered' is better still, I've changed the name.

Bye,
bearophile
January 06, 2009
On Tue, Jan 6, 2009 at 10:18 AM, bearophile <bearophileHUGS@lycos.com> wrote:
> Bill Baxter:
>
>>I use this all the time in NumPy in the form of "argsort" which returns a list of indices giving the sort order.  That can then be used as to index other arrays (thereby permuting them to the sorted order).
> order = npy.argsort(keys)
> sortedA = A[order]
> sortedB = B[order]<
>
> My dlibs already contain an efficient sorting routine, much faster than the built in one.
>
> So to my dlibs I have just added sortedIndexes and remapOrder, you can find their docs here: http://www.fantascienza.net/leonardo/so/dlibs/func.html
>
> The code: http://www.fantascienza.net/leonardo/so/libs_d.zip
>
> The usage is very simple, sortedIndexes is similar to sorted():
>
> auto a =  [-5, 2, 3, 1, -11].dup;
> auto order1 = a.sortedIndexes();
> // now order1 == [4U, 0, 3, 1, 2]
>
> auto order2 = a.sortedIndexes((int x){return x < 0 ? -x : x;});
> // Now order2 == [3U, 1, 2, 0, 4]
>
> int[2][] c = [[3, 4], [1,2], [1, 1]];
> auto order3 = c.sortedIndexes();
> // now order3 == [2U, 1, 0]
>
> auto a =  [-5, 2, 3, 1, -11].dup;
>
> a.remapOrder([4, 0, 3, 1, 2]) ==> [-11, -5, 1, 2, 3]
> // a isn't changed here
> a.remapOrder([3, 1, 2, 0, 4]) ==> [1, 2, 3, -5, -11]
>
> auto b = ["oranges"[], "for", "apples", "is", "right"].dup;
> b.remapOrder([3, 1, 2, 0, 4]) ==>
> ["is"[], "for", "apples", "oranges", "right"].dup);
>
> (Maybe I have to rename remapOrder as remappedOrder).
>
> --------------------------------
>
> While writing the unittest for those functions, I have seen this isn't accepted by DMD: int[2][] c = [[3, 4], [1, 2], [1, 1]].dup;
>
> While the compiler accepts this, I don't know why:
> int[2][] c0 = [[3, 4], [1, 2], [1, 1]];
> int[2][] c2 = c0.dup;


Making a permutation array like that is simple and nice, but another alternative which may be more efficient is to override the "swap(i,j)" routine used by the sort implementation, and make it swap the i'th and jth elements of all the arrays that are being sorted.  That can be done with O(1) extra storage instead of the O(n) required by making to make a permutation array.  I think it's also hard to avoid O(n) extra memory for permuting an array.  At least I can't think of how to do it with out using an extra allocation.

--bb
January 06, 2009
Benji Smith wrote:
> dsimcha wrote:
>> == Quote from Bill Baxter (wbaxter@gmail.com)'s article
>>> Actually, a function to sort multiple arrays in parallel was exactly
>>> what I was implementing using .sort.  So that doesn't sound like a
>>> limitation to me at all.   :-)
>>> --bb
>>
>> Am I (and possibly you) the only one(s) who think that sorting multiple arrays in
>> parallel should be standard library functionality?  The standard rebuttal might be
>> "use arrays of structs instead of parallel arrays".  This is a good idea in some
>> situations, but for others, parallel arrays are just plain better.  Furthermore,
>> with D's handling of variadic functions, generalizing any sort to handle parallel
>> arrays is easy.
> 
> I've written my own parallel-array quicksort implementation (several times over, in many different languages).
> 
> Parallel sorting is one of my favorite tricks, and I think it definitely belongs in the standard library.
> 
> --benji

std.algorithm's parameterized swap primitive is motivated in part by parallel array manipulation. See the schwartz routines in there for examples.

Andrei
January 06, 2009
Andrei Alexandrescu wrote:
> std.algorithm's parameterized swap primitive is motivated in part by parallel array manipulation. See the schwartz routines in there for examples.

And may the schwartz be with your sorts!
January 06, 2009
Walter Bright wrote:
> Andrei Alexandrescu wrote:
>> std.algorithm's parameterized swap primitive is motivated in part by parallel array manipulation. See the schwartz routines in there for examples.
> 
> And may the schwartz be with your sorts!

A Schwartz joke has been present in http://digitalmars.com/d/2.0/phobos/std_algorithm.html since its early days. I am appalled nobody has noticed it.


Andrei
January 06, 2009
Bill Baxter wrote:
> It seems to me that the built-in .sort uses a randomized algorithm
> that leads to results that differ from run-to-run when there are
> elements with identical keys.

No, it doesn't use random sorts. The source code is included, so this should be easy to verify.