July 03, 2012
I tested that, and modulus is slower. The compiler is surely converting the ternary to something branchless like:

    uint iter_next = (iter + 1) * !(iter + 1 > k);

I take your point, but I think most people know that the assignment operators have the lowest precedence.
July 03, 2012
In any case, with large values of k the branch prediction will be right almost all of the time. That would explain why this form is faster than modulo: modulo is fairly slow, while this is a correctly predicted branch followed by an addition (if the compiler doesn't make it branchless). The branchless version gives the same timing as the branched one. Is there a way to stop that one line from being optimized, so I can compare against the predicted-branch version?


July 03, 2012
ixid:

> I take your point, but I think most people know that the assignment operators have the lowest precedence.

Sorry, I meant:

    nums[iter_next] = total % (10 ^^ 8);

Instead of:

    nums[iter_next] = total % 10^^8;

But I presume a lot of people know that the power operator has higher precedence :-)

Bye,
bearophile
July 03, 2012
Oops! I have a bad habit of thinking of the power operator as a part of the value rather than as an operator.


July 03, 2012
ixid:

> In any case, with large values of k the branch prediction will be right almost all of the time. That would explain why this form is faster than modulo: modulo is fairly slow, while this is a correctly predicted branch followed by an addition (if the compiler doesn't make it branchless).

That seems to be the explanation.


> The branchless version gives the same timing as the branched one. Is there a way to stop that one line from being optimized, so I can compare against the predicted-branch version?

I don't fully understand the question. Do you mean annotations like the __builtin_expect of GCC?
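For anyone who hasn't seen it, here is a minimal sketch of that kind of annotation. It is C for GCC or Clang, not D, and the function name `next_index` is made up for illustration. As far as I know the builtin is only a hint about which outcome is likely; it does not force the compiler to emit a real branch.

```c
/* GCC/Clang only: __builtin_expect(expr, expected) evaluates to expr and
   tells the compiler which value expr is expected to have. */
unsigned next_index(unsigned iter, unsigned k) {
    /* Hint that wrapping around is the rare case (true for large k). */
    if (__builtin_expect(iter + 1 > k, 0))
        return 0;
    return iter + 1;
}
```

So it may change code layout, but it is not a way to switch off the branchless conversion.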

Bye,
bearophile
July 03, 2012
On Tuesday, 3 July 2012 at 17:25:18 UTC, bearophile wrote:
> ixid:
>
>> In any case, with large values of k the branch prediction will be right almost all of the time. That would explain why this form is faster than modulo: modulo is fairly slow, while this is a correctly predicted branch followed by an addition (if the compiler doesn't make it branchless).
>
> That seems to be the explanation.
>
>
>> The branchless version gives the same timing as the branched one. Is there a way to stop that one line from being optimized, so I can compare against the predicted-branch version?
>
> I don't fully understand the question. Do you mean annotations like the __builtin_expect of GCC?
>
> Bye,
> bearophile

If

    uint iter_next = iter + 1 > k ? 0 : iter + 1;

is getting optimized to

    uint iter_next = (iter + 1) * !(iter + 1 > k);

or something like it by the compiler, then it would be nice to be able to test the branched code without the rest of the program losing its speed optimizations. As I said, for large k the branch will almost always be predicted correctly, which makes me think it would be faster than the branchless version.
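To be sure the forms being timed really compute the same thing, here is a rough sketch that checks them against each other. It is written in C rather than D (the expressions are the same), the function names are made up, and the modulo form `(iter + 1) % (k + 1)` is my assumption about what the modulus version looks like. It assumes iter is in 0..k and k is below UINT_MAX, and it only checks equivalence, not speed.

```c
#include <assert.h>

/* Assumed modulo form: wraps to 0 once iter + 1 passes k. */
unsigned next_mod(unsigned iter, unsigned k) {
    return (iter + 1) % (k + 1);
}

/* Ternary form from this thread. */
unsigned next_ternary(unsigned iter, unsigned k) {
    return iter + 1 > k ? 0 : iter + 1;
}

/* Branchless form from this thread: multiply by 0 or 1. */
unsigned next_mul(unsigned iter, unsigned k) {
    return (iter + 1) * !(iter + 1 > k);
}

/* Walks every index 0..k and asserts that all three forms agree. */
void check_forms(unsigned k) {
    for (unsigned iter = 0; iter <= k; ++iter) {
        unsigned expected = next_ternary(iter, k);
        assert(next_mod(iter, k) == expected);
        assert(next_mul(iter, k) == expected);
    }
}
```

Calling `check_forms` with a few values of k before benchmarking rules out a mismatch as the cause of any timing difference.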