February 06, 2016
On Saturday, 6 February 2016 at 01:46:58 UTC, Andrei Alexandrescu wrote:
> On 02/04/2016 03:30 PM, Timon Gehr wrote:
>> At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):
>
> Oh, also: could you let that bad boy run and let it find anything that does idempotent partition in 6 compares and 0-7 swaps? Then slowly tighten the number of swaps until you find the best balance between number of swaps and code size. -- Andrei

That is kind of what I tried to do by hand. My function partition5e seems to be practically identical to Ivans solution and partition5a is just a copy of Timons solution. Function partition5b, partition5c and partition5d are in between of those with various tradeoffs between the number of swaps and code size.
February 06, 2016
On Saturday, 6 February 2016 at 07:06:27 UTC, Ivan Kazmenko wrote:
> 1. Primitive types (cheap swap, cheap comparison).
>
> 2. Heavy structs A (expensive swap, cheap comparison - e.g., compare one field of primitive type).
>
> 3. Heavy structs B (expensive swap, expensive comparison - e.g., call a heavy external function).
>
> 4. Heavy classes (cheap swap - pointers only, expensive comparison).
>
> So there's perhaps no single best solution.

That's right, but other factors are more important: preventing pipeline stalls. If you are collecting from 5 different cachelines in an array you are likely to get several 40-120 cycles delays unless you do prefetching, and if you do, you need to have other instructions to fill in the latency gaps.

But also instructions have latency and concurrency issues. Which is why your version did reasonably well as it made the compares/swaps independent so that they could be concurrently scheduled.

Yet, Haswell has SIMD instructions that can do 8-16x 32-bit max/min operations per cycle, with a latency of only 1 cycle, and 4-8x 64bit compares with a latency of 1 cycle.

So if you use as  small fixed N, like 5, it makes very little sense to count compares/swaps.

What makes sense is to focus on how you can avoid branching and build an algorithm with no pipeline stalls.

If sorting large arrays you also might want to look at multi-threaded parallel sort functions.

February 06, 2016
On 02/05/2016 08:58 PM, Timon Gehr wrote:
> On 02/06/2016 02:39 AM, Andrei Alexandrescu wrote:
>> On 02/04/2016 03:30 PM, Timon Gehr wrote:
>>> At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):
>>
>> What is the minimum number of comparisons? Thx! -- Andrei
>
> There is no partition algorithm that uses <= 5 comparisons in all cases.
> (If that is what you're asking.)

Was asking about this particular one. For example, the maximum comparisons is 6, but it would be good to know the average number of comparisons. I know I could read through the code, but it's hairy. Thanks! -- Andrei

February 06, 2016
On 02/05/2016 10:13 AM, tn wrote:
> On Thursday, 4 February 2016 at 20:30:57 UTC, Timon Gehr wrote:
>> At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):
>>
>> ...
>>
>
> Inspired by this, I made four other versions of the function that are
> shorter but make more swaps (at most 4, 6, 7 and 8 swaps respectively).
> Each makes 6 comparisons and should be idempotent.
>
> http://dpaste.dzfl.pl/1c53d8f432f7

Whoa, I missed this last night. Awesome work! Thanks! I'm kinda running out of time budget here for this particular subproblem, but I've got to do these justice and try them out as well.

> Here are the distributions of the number of swaps when tested with all
> 120 possible permutations as the input:
>
> testing 'partition5a' (by Timon Gehr)
>    0 swaps: 4 orderings
>    1 swaps: 32 orderings
>    2 swaps: 68 orderings
>    3 swaps: 16 orderings
> testing 'partition5b'
>    0 swaps: 4 orderings
>    1 swaps: 20 orderings
>    2 swaps: 42 orderings
>    3 swaps: 40 orderings
>    4 swaps: 14 orderings
[...]

Could you please add two simple calculations? Assuming uniform random distribution of data, compute the average number of swaps as a weighted average of orderings. Also, show the number of lines (one test or one swap per line) as a proxy for generated code size. That should provide good insights.


Thanks!

Andrei

February 06, 2016
On 02/06/2016 02:06 AM, Ivan Kazmenko wrote:
> On Saturday, 6 February 2016 at 00:59:17 UTC, Andrei Alexandrescu wrote:
>> On 02/05/2016 06:36 AM, Ivan Kazmenko wrote:
>>> Another interesting task would be to make the function stable, but I
>>> don't see how it is possible with such flat structure.
>>
>> Under what circumstances isn't your function unstable? -- Andrei
>
> For example, if elements are "value | id", compared by value, then:
> 0|0, 0|1, 1|2, 0|3, 0|4
> is transformed into
> 0|0, 0|1, 0|4, 0|3, 1|2
> and 0|4 is now placed earlier than 0|3, which violates the stability
> requirement.
> Things can be reordered a bit, but it seems that no possible order
> eliminates the need to remember a part of the history.
>
> On the other hand, when we track our whole path in the decision tree
> (e.g. at the leaves of Timon Gehr's tree of ifs), we have all
> information to make the partition stable.
>
> In any case, finding the shortest-code stable partition-of-5 algorithm
> looks like a problem solvable by an automated searcher.

Yah. I think stability should be solved if there's a need/application for it, e.g. is there a larger algorithm that would be stable if partition5 were stable?

> Regarding the speed, there are different use cases with different
> requirements, for example:
>
> 1. Primitive types (cheap swap, cheap comparison).
>
> 2. Heavy structs A (expensive swap, cheap comparison - e.g., compare one
> field of primitive type).
>
> 3. Heavy structs B (expensive swap, expensive comparison - e.g., call a
> heavy external function).
>
> 4. Heavy classes (cheap swap - pointers only, expensive comparison).
>
> So there's perhaps no single best solution.

Yah, good point. I should also add that more comparisons generally mean more branching and more code. -- Andrei


February 06, 2016
On 02/06/2016 02:00 PM, Andrei Alexandrescu wrote:
> On 02/05/2016 08:58 PM, Timon Gehr wrote:
>> On 02/06/2016 02:39 AM, Andrei Alexandrescu wrote:
>>> On 02/04/2016 03:30 PM, Timon Gehr wrote:
>>>> At most 6 comparisons, <=3 swaps, idempotent (optimal number of swaps):
>>>
>>> What is the minimum number of comparisons? Thx! -- Andrei
>>
>> There is no partition algorithm that uses <= 5 comparisons in all cases.
>> (If that is what you're asking.)
>
> Was asking about this particular one. For example, the maximum
> comparisons is 6, but it would be good to know the average number of
> comparisons. I know I could read through the code, but it's hairy.
> Thanks! -- Andrei
>

All versions posted in this thread do 6 comparisons on all code paths.
(It is not possible to do better.)

BTW, the code I had posted earlier suffers from the following flaws:

- Branches were cut off too early, so -version=SHORT didn't actually find the shortest code. [1]

- The code (by necessity) only performs the optimal number of swaps if all input elements are different. While it leaves already partitioned integer arrays in the same state as they were encountered, if multiple input elements are equal, the generated algorithm will sometimes swap them, violating idempotence if input elements can compare equal but be different. However, tn's versions fix this: they never perform any swaps if the input array is already partitioned.



[1] This seems to be the shortest code that satisfies the specification I have given (<=6 comparisons, optimal number of swaps) for permutations and that performs all the comparing before all the swapping:

void partition5(ref int[5] a){

if(a[0]<a[2]){if(a[1]<a[3]){if(a[0]<a[1]){if(a[2]<a[4]){if(a[1]<a[2]){if(a[3]<a[2]){swap(a[3],a[2]);}}else{if(a[1]<a[4]){swap(a[1],a[2]);}else{swap(a[4],a[2]);swap(a[1],a[4]);}}}else{if(a[1]<a[4]){if(a[3]<a[4]){swap(a[3],a[2]);}else{swap(a[4],a[2]);}}else{if(a[1]<a[2]){swap(a[1],a[2]);swap(a[1],a[4]);}else{swap(a[1],a[4]);}}}}else{if(a[3]<a[4]){if(a[0]<a[3]){if(a[3]<a[2]){swap(a[3],a[2]);}}else{if(a[0]<a[4]){swap(a[0],a[2]);swap(a[0],a[3]);}else{swap(a[4],a[2]);swap(a[0],a[3]);}}}else{if(a[0]<a[4]){if(a[4]<a[2]){swap(a[4],a[2]);}}else{if(a[0]<a[3]){swap(a[0],a[2]);swap(a[0],a[4]);}else{swap(a[3],a[2]);swap(a[0],a[4]);}}}}}else{if(a[0]<a[3]){if(a[2]<a[4]){if(a[2]<a[3]){if(a[3]<a[4]){swap(a[3],a[2]);swap(a[1],a[3]);}else{swap(a[4],a[2]);swap(a[1],a[4]);}}else{if(a[1]<a[2]){swap(a[1],a[2]);swap(a[1],a[3]);}else{swap(a[1],a[3]);}}}else{if(a[3]<a[4]){if(a[1]<a[4]){swap(a[1],a[2]);swap(a[1],a[3]);}else{swap(a[4],a[2]);swap(a[1],a[3]);}}else{if(a[2]<a[3]){swap(a[1],a[4]);}else{swap(a[3],a[2]
);swap(a[1],a[4]);}}}}else{if(a[1]<a[4]){if(a[0]<a[1]){if(a[1]<a[2]){swap(a[1],a[2]);swap(a[1],a[3]);}else{swap(a[1],a[3]);}}else{if(a[0]<a[4]){swap(a[0],a[2]);swap(a[0],a[3]);}else{swap(a[4],a[2]);swap(a[0],a[3]);}}}else{if(a[0]<a[4]){if(a[2]<a[4]){swap(a[1],a[3]);}else{swap(a[4],a[2]);swap(a[1],a[3]);}}else{if(a[0]<a[1]){swap(a[0],a[2]);swap(a[0],a[3]);swap(a[1],a[4]);}else{swap(a[1],a[2]);swap(a[0],a[3]);swap(a[1],a[4]);}}}}}}else{if(a[3]<a[4]){if(a[0]<a[4]){if(a[1]<a[3]){if(a[0]<a[3]){if(a[0]<a[1]){swap(a[1],a[2]);}else{swap(a[0],a[2]);}}else{if(a[2]<a[3]){swap(a[3],a[2]);swap(a[0],a[3]);}else{swap(a[0],a[3]);}}}else{if(a[0]<a[1]){if(a[0]<a[3]){swap(a[3],a[2]);swap(a[1],a[3]);}else{swap(a[0],a[2]);swap(a[1],a[3]);}}else{if(a[1]<a[2]){swap(a[0],a[3]);}else{swap(a[1],a[2]);swap(a[0],a[3]);}}}}else{if(a[1]<a[2]){if(a[2]<a[4]){if(a[2]<a[3]){swap(a[3],a[2]);swap(a[0],a[3]);}else{swap(a[0],a[3]);}}else{if(a[1]<a[4]){swap(a[4],a[2]);swap(a[0],a[3]);}else{swap(a[1],a[2]);swap(a[0],a[3]);s
wap(a[1],a[4]);}}}else{if(a[1]<a[4]){if(a[1]<a[3]){swap(a[3],a[2]);swap(a[0],a[3]);}else{swap(a[1],a[2]);swap(a[0],a[3]);}}else{if(a[2]<a[4]){swap(a[4],a[2]);swap(a[0],a[3]);swap(a[1],a[4]);}else{swap(a[0],a[3]);swap(a[1],a[4]);}}}}}else{if(a[0]<a[3]){if(a[1]<a[4]){if(a[0]<a[4]){if(a[0]<a[1]){swap(a[1],a[2]);}else{swap(a[0],a[2]);}}else{if(a[2]<a[4]){swap(a[4],a[2]);swap(a[0],a[4]);}else{swap(a[0],a[4]);}}}else{if(a[0]<a[1]){if(a[0]<a[4]){swap(a[4],a[2]);swap(a[1],a[4]);}else{swap(a[0],a[2]);swap(a[1],a[4]);}}else{if(a[1]<a[2]){swap(a[0],a[4]);}else{swap(a[1],a[2]);swap(a[0],a[4]);}}}}else{if(a[1]<a[2]){if(a[2]<a[3]){if(a[2]<a[4]){swap(a[4],a[2]);swap(a[0],a[4]);}else{swap(a[0],a[4]);}}else{if(a[1]<a[3]){swap(a[3],a[2]);swap(a[0],a[4]);}else{swap(a[1],a[2]);swap(a[0],a[3]);swap(a[1],a[4]);}}}else{if(a[1]<a[3]){if(a[1]<a[4]){swap(a[4],a[2]);swap(a[0],a[4]);}else{swap(a[1],a[2]);swap(a[0],a[4]);}}else{if(a[2]<a[3]){swap(a[3],a[2]);swap(a[0],a[3]);swap(a[1],a[4]);}else{swap(a[0],a[3]);sw
ap(a[1],a[4]);}}}}}}
}

February 06, 2016
On 2/6/16 9:26 AM, Timon Gehr wrote:
> [1] This seems to be the shortest code that satisfies the specification
> I have given (<=6 comparisons, optimal number of swaps) for permutations
> and that performs all the comparing before all the swapping:

Thanks. Tried this just now, it's better than the pre-discussion baselines but Ivan's still beats it (by a little). Then I just tried tn's partition5c which beats Ivan's.

I should also add that returns are starting to diminish. Idempotence did make a large difference. Then reducing max swaps made a smaller difference (at least for the uints I'm working with).

BTW my testbed is a careful implementation of BFPRT on random arrays of uint of various sizes.


Andrei

February 06, 2016
On Saturday, 6 February 2016 at 13:06:37 UTC, Andrei Alexandrescu wrote:
> Could you please add two simple calculations? Assuming uniform random distribution of data, compute the average number of swaps as a weighted average of orderings. Also, show the number of lines (one test or one swap per line) as a proxy for generated code size. That should provide good insights.

This now prints the numbers of comparison and swap lines and computes the average numbers of swaps. In addition, it also runs the same tests for permutations of 5 elements some of which might be equal (an example of such permutation would be [1, 2, 0, 2, 3], on the other hand [1, 2, 0, 2, 4] should not be included since it is identical). However, in this case the average number of swaps is perhaps not so meaningful.

http://dpaste.dzfl.pl/2012caf872ec

Output:

testing 'partition5a'
  Code: comparisons = 63, swaps = 107, total = 170
  With all orderings of distinct elements:
    0 swaps: 4 instances
    1 swaps: 32 instances
    2 swaps: 68 instances
    3 swaps: 16 instances
    Average number of swaps: 1.8
  With all orderings of potentially nondistinct elements:
    Error, not idempotent: [0, 0, 0, 0, 0] => [0, 0, 0, 0, 0]
testing 'partition5b'
  Code: comparisons = 17, swaps = 21, total = 38
  With all orderings of distinct elements:
    0 swaps: 4 instances
    1 swaps: 20 instances
    2 swaps: 42 instances
    3 swaps: 40 instances
    4 swaps: 14 instances
    Average number of swaps: 2.33333
  With all orderings of potentially nondistinct elements:
    0 swaps: 36 instances
    1 swaps: 122 instances
    2 swaps: 200 instances
    3 swaps: 146 instances
    4 swaps: 37 instances
    Average number of swaps: 2.04806
testing 'partition5c'
  Code: comparisons = 10, swaps = 12, total = 22
  With all orderings of distinct elements:
    0 swaps: 4 instances
    1 swaps: 14 instances
    2 swaps: 25 instances
    3 swaps: 30 instances
    4 swaps: 26 instances
    5 swaps: 16 instances
    6 swaps: 5 instances
    Average number of swaps: 3.06667
  With all orderings of potentially nondistinct elements:
    0 swaps: 36 instances
    1 swaps: 92 instances
    2 swaps: 130 instances
    3 swaps: 138 instances
    4 swaps: 92 instances
    5 swaps: 42 instances
    6 swaps: 11 instances
    Average number of swaps: 2.60628
testing 'partition5d'
  Code: comparisons = 7, swaps = 8, total = 15
  With all orderings of distinct elements:
    0 swaps: 4 instances
    1 swaps: 14 instances
    2 swaps: 24 instances
    3 swaps: 29 instances
    4 swaps: 26 instances
    5 swaps: 16 instances
    6 swaps: 6 instances
    7 swaps: 1 instances
    Average number of swaps: 3.13333
  With all orderings of potentially nondistinct elements:
    0 swaps: 36 instances
    1 swaps: 100 instances
    2 swaps: 128 instances
    3 swaps: 133 instances
    4 swaps: 94 instances
    5 swaps: 39 instances
    6 swaps: 10 instances
    7 swaps: 1 instances
    Average number of swaps: 2.57486
testing 'partition5e'
  Code: comparisons = 6, swaps = 8, total = 14
  With all orderings of distinct elements:
    0 swaps: 4 instances
    1 swaps: 12 instances
    2 swaps: 19 instances
    3 swaps: 25 instances
    4 swaps: 25 instances
    5 swaps: 18 instances
    6 swaps: 11 instances
    7 swaps: 5 instances
    8 swaps: 1 instances
    Average number of swaps: 3.53333
  With all orderings of potentially nondistinct elements:
    0 swaps: 36 instances
    1 swaps: 88 instances
    2 swaps: 100 instances
    3 swaps: 123 instances
    4 swaps: 106 instances
    5 swaps: 53 instances
    6 swaps: 25 instances
    7 swaps: 9 instances
    8 swaps: 1 instances
    Average number of swaps: 2.89649

February 06, 2016
On 02/06/2016 01:42 PM, tn wrote:
> On Saturday, 6 February 2016 at 13:06:37 UTC, Andrei Alexandrescu wrote:
>> Could you please add two simple calculations? Assuming uniform random
>> distribution of data, compute the average number of swaps as a
>> weighted average of orderings. Also, show the number of lines (one
>> test or one swap per line) as a proxy for generated code size. That
>> should provide good insights.
>
> This now prints the numbers of comparison and swap lines and computes
> the average numbers of swaps. In addition, it also runs the same tests
> for permutations of 5 elements some of which might be equal (an example
> of such permutation would be [1, 2, 0, 2, 3], on the other hand [1, 2,
> 0, 2, 4] should not be included since it is identical). However, in this
> case the average number of swaps is perhaps not so meaningful.
>
> http://dpaste.dzfl.pl/2012caf872ec

Awesome, thanks. Will need to try at least a few of these out. At a quick glance, partition5d seems to be the sweet spot - it's small and does only 3.13/2.57 swaps on average. -- Andrei

1 2 3 4
Next ›   Last »