Thread overview
Sorting according to a primary and secondary criterion
Jul 17, 2013
bearophile
Jul 17, 2013
bearophile
Jul 17, 2013
monarch_dodra
Jul 17, 2013
bearophile
Jul 17, 2013
John Colvin
July 17, 2013
Hi all :-)

Suppose that I have two different arrays of the same length, arr1 and arr2, and an array of index values idx = [0 .. arr1.length].

Now, suppose that I want to sort the index values according to the corresponding values of arr1.  This is easy with schwartzSort:

    idx.schwartzSort!(a => arr1[a]);

But suppose now that I want to sort _primarily_ according to arr1, but secondarily according to arr2?  That is, that the index i should come before j if arr1[i] < arr1[j], but also if arr1[i] == arr1[j] && arr2[i] < arr2[j] ... ?

One option might be first to sort with respect to arr2 and then to sort with respect to arr1 using SwapStrategy.stable, but that seems overkill and also uncertain.

I guess I could do something like,

    .sort!("arr1[a] < arr1[b] || (arr1[a] == arr1[b] && arr2[a] < arr2[b]"));

... but I'm not sure that would be an optimal strategy.

Is there a standard, accepted approach for this kind of sort with primary/secondary criterion?

Thanks & best wishes,

    -- Joe
July 17, 2013
Joseph Rushton Wakeling:

> Is there a standard, accepted approach for this kind of sort with primary/secondary criterion?

There are various ways to do it. One way is to use a stable sort and sort the data two or more times.

Another way is to use something like this, but this needs some memory:

idx.schwartzSort!(i => tuple(arr1[i], arr2[i]));

But often the most efficient way is to use sort() with a comparison function that takes in account all your sorting criteria.

Bye,
bearophile
July 17, 2013
On 07/17/2013 02:07 PM, bearophile wrote:
> Another way is to use something like this, but this needs some memory:
> 
> idx.schwartzSort!(i => tuple(arr1[i], arr2[i]));

Oh, nice thought! :-)

> But often the most efficient way is to use sort() with a comparison function that takes in account all your sorting criteria.

That's what I assumed, but I don't understand how to provide that criterion to sort: if I do e.g.

    idx.sort!("arr1[a] < arr1[b] || (arr1[a] == arr1[b] && arr2[a] < arr2[b])");

I (unsurprisingly) get a load of errors about std.functional not having access
to arr1 or arr2.

Contrast with the example in std.algorithm docs:

    words.sort!("toUpper(a) < toUpper(b)", SwapStrategy.stable);

... but of course toUpper is a publicly available function.
July 17, 2013
On Wednesday, 17 July 2013 at 11:43:37 UTC, Joseph Rushton Wakeling wrote:
>
>     .sort!("arr1[a] < arr1[b] || (arr1[a] == arr1[b] && arr2[a] < arr2[b]"));
>
> ... but I'm not sure that would be an optimal strategy.

Is std.algorithm.multisort what you'd be looking for?
July 17, 2013
Joseph Rushton Wakeling:

>     idx.sort!("arr1[a] < arr1[b] || (arr1[a] == arr1[b] && arr2[a] < arr2[b])");
>
> I (unsurprisingly) get a load of errors about std.functional not having access to arr1 or arr2.

You need a lambda delegate for that. But I forgot about multisort algorithm... It's probably the right tool.

Bye,
bearophile
July 17, 2013
On 07/17/2013 02:35 PM, John Colvin wrote:
> Is std.algorithm.multisort what you'd be looking for?

Good thought.  Thanks to pointing me here I also noticed the following example in the schwartzSort docs which might be relevant:

    sort!((a, b) => hashFun(a) < hashFun(b))(array);

I'm going to try out multisort and try out this second way of writing the condition.

July 17, 2013
On 07/17/2013 02:07 PM, bearophile wrote:
> Another way is to use something like this, but this needs some memory:
> 
> idx.schwartzSort!(i => tuple(arr1[i], arr2[i]));

Actually, I don't find it needs any more memory than regular schwartzSort (which
I was using anyway), but it does cost _speed_ -- quite a lot. :-(

July 17, 2013
Joseph Rushton Wakeling:

> Actually, I don't find it needs any more memory than regular schwartzSort (which I was using anyway),

A and array of tuples should take more memory. Try with a much larger input array.


> but it does cost _speed_ -- quite a lot. :-(

Right, schwartzSort is quite slow:
http://d.puremagic.com/issues/show_bug.cgi?id=5077

But thankfully there are simple means to speed up schwartzSort... (like using alloca for small input arrays, improving its code, using minimallyInitializedArray, etc).

Bye,
bearophile
July 17, 2013
On 07/17/2013 02:42 PM, bearophile wrote:
> You need a lambda delegate for that. But I forgot about multisort algorithm... It's probably the right tool.

So, in the end I tried out 3 different alternatives:

  schwartzSort(a => tuple(arr1[a], arr2[a]), "a < b")
  sort((a, b) => arr1[a] < arr1[b] || (arr1[a] == arr1[b] && arr2[a] < arr2[b]))
  multiSort((a, b) => arr1[a] < arr1[b], (a, b) => arr2[a] < arr2[b])

sort() seems faster than schwartzSort for smaller-scale stuff, but takes longer for larger-scale sorts.  multiSort wins out over both of them.

I guess in principle it might be possible to have a multiSchwartzSort which allows for multiple transformations:

  multiSchwartzSort(a => arr1[a], a => arr2[a], "a < b", "a < b")

... which might gain some speed by caching the results of the transformations, as schwartzSort is supposed to do?

But anyway, I think this solution works at limited extra cost speed-wise. :-)

Thanks very much to both of you!

July 17, 2013
On Wednesday, 17 July 2013 at 13:12:18 UTC, Joseph Rushton Wakeling wrote:
> On 07/17/2013 02:42 PM, bearophile wrote:
>> You need a lambda delegate for that. But I forgot about multisort algorithm...
>> It's probably the right tool.
>
> So, in the end I tried out 3 different alternatives:
>
>   schwartzSort(a => tuple(arr1[a], arr2[a]), "a < b")
>   sort((a, b) => arr1[a] < arr1[b] || (arr1[a] == arr1[b] && arr2[a] < arr2[b]))
>   multiSort((a, b) => arr1[a] < arr1[b], (a, b) => arr2[a] < arr2[b])
>
> sort() seems faster than schwartzSort for smaller-scale stuff, but takes longer
> for larger-scale sorts.  multiSort wins out over both of them.
>
> I guess in principle it might be possible to have a multiSchwartzSort which
> allows for multiple transformations:
>
>   multiSchwartzSort(a => arr1[a], a => arr2[a], "a < b", "a < b")
>
> ... which might gain some speed by caching the results of the transformations,
> as schwartzSort is supposed to do?
>
> But anyway, I think this solution works at limited extra cost speed-wise. :-)
>
> Thanks very much to both of you!

schwartzSort should really only be used if the transformation function is expensive. For example, if you want to sort 3D dots according to their norm.

In your case, your transformation function (a => arr1[a]) isn't expensive. So a straight up (multi)sort should be preferred over schwartzSort.