February 05, 2016
On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:
> At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):
>
> ...
>

Inspired by this, I made four other versions of the function that are shorter but make more swaps (at most 4, 6, 7 and 8 swaps respectively). Each makes 6 comparisons and should be idempotent.

http://dpaste.dzfl.pl/1c53d8f432f7

Here are the distributions of the number of swaps when tested with all 120 possible permutations as the input:

testing 'partition5a' (by Timon Gehr)
  0 swaps: 4 orderings
  1 swaps: 32 orderings
  2 swaps: 68 orderings
  3 swaps: 16 orderings
testing 'partition5b'
  0 swaps: 4 orderings
  1 swaps: 20 orderings
  2 swaps: 42 orderings
  3 swaps: 40 orderings
  4 swaps: 14 orderings
testing 'partition5c'
  0 swaps: 4 orderings
  1 swaps: 14 orderings
  2 swaps: 25 orderings
  3 swaps: 30 orderings
  4 swaps: 26 orderings
  5 swaps: 16 orderings
  6 swaps: 5 orderings
testing 'partition5d'
  0 swaps: 4 orderings
  1 swaps: 14 orderings
  2 swaps: 24 orderings
  3 swaps: 29 orderings
  4 swaps: 26 orderings
  5 swaps: 16 orderings
  6 swaps: 6 orderings
  7 swaps: 1 orderings
testing 'partition5e'
  0 swaps: 4 orderings
  1 swaps: 12 orderings
  2 swaps: 19 orderings
  3 swaps: 25 orderings
  4 swaps: 25 orderings
  5 swaps: 18 orderings
  6 swaps: 11 orderings
  7 swaps: 5 orderings
  8 swaps: 1 orderings

February 05, 2016
On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:
> At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps)[...]

One swap usually decomposes into three moves. A cycle of length n can be sorted with n + 1 moves. Corresponding to the seven integer partitions of 5 we obtain:

                    5: 6
                4 + 1: 5
                3 + 2: 4 + 3
            3 + 1 + 1: 4
            2 + 2 + 1: 3 + 3
        2 + 1 + 1 + 1: 3
    1 + 1 + 1 + 1 + 1: 0

So we can do with seven moves instead of three swaps (nine moves). ;-)

February 05, 2016
On Friday, 5 February 2016 at 15:41:11 UTC, Fool wrote:
> So we can do with seven moves instead of three swaps (nine moves). ;-)

Yes, well, this all breaks down when you realize that this is all completely dominated by cache misses and that you can do median more effectively using SIMD instructions. :^P

February 05, 2016
On Friday, 5 February 2016 at 15:41:11 UTC, Fool wrote:
> One swap usually decomposes into three moves.

 Recently from reading some interesting hacks and super code, a good swap can also be done via 3 xor operations (and avoiding an intermediate storage unit).

http://aggregate.ee.engr.uky.edu/MAGIC/#Swap%20Values%20Without%20a%20Temporary

x ^= y;
y ^= x;
x ^= y;

 Since these can be used directly as registers it might be faster (assuming all 5 are mapped to registers until the final write out, which might not work).
February 05, 2016
On Friday, 5 February 2016 at 17:33:55 UTC, Era Scarecrow wrote:
> On Friday, 5 February 2016 at 15:41:11 UTC, Fool wrote:
>> One swap usually decomposes into three moves.
>
>  Recently from reading some interesting hacks and super code, a good swap can also be done via 3 xor operations (and avoiding an intermediate storage unit).
>
> http://aggregate.ee.engr.uky.edu/MAGIC/#Swap%20Values%20Without%20a%20Temporary
>
> x ^= y;
> y ^= x;
> x ^= y;
>
>  Since these can be used directly as registers it might be faster (assuming all 5 are mapped to registers until the final write out, which might not work).

It's a cool trick but I would be surprised if this were actually faster. Modern x64 processors have 16 "general purpose" registers so saving a single register for a simple set of instructions is likely to not have any benefit. My other thought is that the compiler may not recognize this pattern, making it more difficult to optimize. On the other hand, compilers are very good at rearranging simple reads and writes to avoid stalling the pipeline in the CPU.
February 05, 2016
On Friday, 5 February 2016 at 15:13:56 UTC, tn wrote:
> On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:
>> At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):
>>
>> ...
>>
>
> Inspired by this, I made four other versions of the function that are shorter but make more swaps (at most 4, 6, 7 and 8 swaps respectively). Each makes 6 comparisons and should be idempotent.
>
> http://dpaste.dzfl.pl/1c53d8f432f7
>
> ...

Very nice! I'm curious, to both you and Timon, how did you go about writing these and coming up with the solutions? I'm not sure if I could come up with these results myself and so quickly at that.
February 05, 2016
On 02/05/2016 04:41 PM, Fool wrote:
> On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:
>> At most 6 comparisons, <=3 swaps, idempotent (optimal number of
>> swaps)[...]
>
> One swap usually decomposes into three moves. A cycle of length n can be
> sorted with n + 1 moves. Corresponding to the seven integer partitions
> of 5 we obtain:
>
>                      5: 6
>                  4 + 1: 5
>                  3 + 2: 4 + 3
>              3 + 1 + 1: 4
>              2 + 2 + 1: 3 + 3
>          2 + 1 + 1 + 1: 3
>      1 + 1 + 1 + 1 + 1: 0
>
> So we can do with seven moves instead of three swaps (nine moves). ;-)
>

The goal is to partition though, not to sort.
Six moves are sufficient. (And necessary. E.g. for 42310.)
February 05, 2016
On Friday, 5 February 2016 at 21:48:58 UTC, Timon Gehr wrote:
> The goal is to partition though, not to sort.
> Six moves are sufficient. (And necessary. E.g. for 42310.)

Indeed. I was wondering about that. Great job!
February 05, 2016
On 02/05/2016 10:42 PM, Xinok wrote:
> On Friday, 5 February 2016 at 15:13:56 UTC, tn wrote:
>> On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:
>>> At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):
>>>
>>> ...
>>>
>>
>> Inspired by this, I made four other versions of the function that are
>> shorter but make more swaps (at most 4, 6, 7 and 8 swaps
>> respectively). Each makes 6 comparisons and should be idempotent.
>>
>> http://dpaste.dzfl.pl/1c53d8f432f7
>>
>> ...
>
> Very nice! I'm curious, to both you and Timon, how did you go about
> writing these and coming up with the solutions? I'm not sure if I could
> come up with these results myself and so quickly at that.

I quickly hacked together a brute-force search.
Code: http://dpaste.dzfl.pl/43503913ac1d

The search only makes sure that it works for distinct input values. Other input orders come for free; the verification code I have included checks this.

The code recursively splits the set of all permutations on 5 elements by all possible comparisons between two elements up to a maximal depth of 6 comparisons, using some basic pruning. The recursion terminates if all permutations in the set can be partitioned using the same swaps. (I.e., the index of the median is fixed and there are exactly two possible indices for the first two elements, and exactly two possible indices for the last two elements.)

The partition5 function I have posted is the first result that the search found. The above code also supports optimizing for source code size (-version=SHORT). The result is not that much shorter though.
February 05, 2016
On 02/04/2016 03:30 PM, Timon Gehr wrote:
> At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):

Timon, Ivan, this is amazing work. I don't know how your minds work folks - I sat for like two hours on it yesterday and couldn't crack it.

Surprisingly in a test on millions of random numbers, Ivan's version wins by a hair. Might be the tighter code which has icache effects.

Thanks very much! I should say that at these point no better definitions can be found on the Net.


Andrei