popFront causing more memory to be used (page 2)

I tested that; modulus is slower. The compiler is surely converting it to something branchless like:

uint iter_next = (iter + 1) * !(iter + 1 > k);

I take your point, but I think most people know that the equals operators have the lowest precedence.

In any case, with large values of k the branch prediction will be right almost all of the time, which explains why this form is faster than modulo: modulo is fairly slow, while this is a correctly predicted branch doing an addition (if the compiler doesn't make it branchless). The branchless version gives the same timing as the branched one. Is there a way to force that line not to be optimized, to compare the predicted version?
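For reference, the three ways of stepping the index discussed here can be checked against each other. This is a minimal sketch, not the original program: the function names nextBranch, nextBranchless and nextModulo are made up for illustration, and it assumes the index runs over 0 .. k inclusive (k + 1 slots), which is what the `iter + 1 > k` test implies.

```d
import std.stdio : writeln;

// Hypothetical helpers; the index is assumed to cycle through 0 .. k inclusive.

// Ternary form from the thread: wrap to 0 once iter + 1 passes k.
uint nextBranch(uint iter, uint k)
{
    return iter + 1 > k ? 0 : iter + 1;
}

// Multiply form from the thread: the negated comparison is a bool,
// which acts as 0 or 1 in the multiplication.
uint nextBranchless(uint iter, uint k)
{
    return (iter + 1) * !(iter + 1 > k);
}

// Modulo form, for k + 1 slots (assumes k < uint.max).
uint nextModulo(uint iter, uint k)
{
    return (iter + 1) % (k + 1);
}

void main()
{
    enum uint k = 5;
    foreach (uint iter; 0 .. k + 1)
    {
        assert(nextBranch(iter, k) == nextBranchless(iter, k));
        assert(nextBranch(iter, k) == nextModulo(iter, k));
    }
    writeln("all three forms agree for k = ", k);
}
```

This only shows that the results match. Whether the compiler really turns the ternary into the multiply form (or into a conditional move) depends on the compiler and its flags.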

ixid:

> I take your point, but I think most people know that the equals operators have the lowest precedence.

Sorry I meant:

nums[iter_next] = total % (10 ^^ 8);

Instead of:

nums[iter_next] = total % 10^^8;

But I presume a lot of people know that powers are higher precedence :-)

Bye,
bearophile
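The precedence point can be checked at compile time. A small sketch, where the value of total is an arbitrary number picked for illustration and not taken from the original program:

```d
import std.math; // some compiler versions want this for ^^ on non-literal operands

// Arbitrary illustrative value.
enum ulong total = 123_456_789_012;

// ^^ binds tighter than %, so the parentheses do not change the result.
static assert(total % 10 ^^ 8 == total % (10 ^^ 8));
static assert(total % 10 ^^ 8 == 56_789_012);

// Grouping the other way gives a different value: (total % 10) is 2, and 2 ^^ 8 is 256.
static assert((total % 10) ^^ 8 == 256);

void main() {}
```

So the parentheses in the first spelling are there for the reader, not for the compiler.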

ixid:

> In any case, with large values of k the branch prediction will be right almost all of the time, which explains why this form is faster than modulo: modulo is fairly slow, while this is a correctly predicted branch doing an addition (if the compiler doesn't make it branchless).

That seems the explanation.

> The branchless version gives the same timing as the branched one. Is there a way to force that line not to be optimized, to compare the predicted version?

I don't fully understand the question. Do you mean annotations like the __builtin_expect of GCC?

Bye,
bearophile

July 03, 2012

Re: popFront causing more memory to be used

Posted by ixid
in reply to bearophile

On Tuesday, 3 July 2012 at 17:25:18 UTC, bearophile wrote:
> ixid:
>
>> In any case, with large values of k the branch prediction will be right almost all of the time, which explains why this form is faster than modulo: modulo is fairly slow, while this is a correctly predicted branch doing an addition (if the compiler doesn't make it branchless).
>
> That seems the explanation.
>
>
>> The branchless version gives the same timing as the branched one. Is there a way to force that line not to be optimized, to compare the predicted version?
>
> I don't fully understand the question. Do you mean annotations like the __builtin_expect of GCC?
>
> Bye,
> bearophile

If

uint iter_next = iter + 1 > k? 0 : iter + 1;

is getting optimized to

uint iter_next = (iter + 1) * !(iter + 1 > k);

or something like it by the compiler, then it would be nice to be able to test the branched code without the rest of the program losing its speed optimizations. As I said, for large k the branch will almost always be predicted correctly, which makes me think it would be faster than the branchless version.
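One way to approach that comparison is to time the two spellings in isolation. The following is only a rough sketch under stated assumptions: the helper names (nextBranch, nextBranchless, run), the ring size k and the step count are all invented for illustration, and it uses StopWatch from std.datetime.stopwatch in current Phobos rather than whatever the original program used. It does not force the compiler to emit a real branch; with optimizations on, both functions may compile to the same instructions, so the disassembly is still the thing to check.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical helpers mirroring the two lines quoted above.
uint nextBranch(uint iter, uint k)
{
    return iter + 1 > k ? 0 : iter + 1;
}

uint nextBranchless(uint iter, uint k)
{
    return (iter + 1) * !(iter + 1 > k);
}

// Step the index `steps` times with the given update function.
// The running sum keeps the result live so the loop is less likely to be dropped.
uint run(alias next)(uint steps, uint k)
{
    uint iter = 0;
    uint sum = 0;
    foreach (i; 0 .. steps)
    {
        iter = next(iter, k);
        sum += iter; // uint arithmetic wraps around on overflow
    }
    return sum;
}

void main()
{
    enum uint k = 1_000_000;      // assumed "large k"
    enum uint steps = 50_000_000; // assumed iteration count

    auto sw = StopWatch(AutoStart.yes);
    immutable a = run!nextBranch(steps, k);
    immutable tBranch = sw.peek.total!"msecs";

    sw.reset(); // the stopwatch keeps running from zero
    immutable b = run!nextBranchless(steps, k);
    immutable tBranchless = sw.peek.total!"msecs";

    writefln("branch:     %s ms (checksum %s)", tBranch, a);
    writefln("branchless: %s ms (checksum %s)", tBranchless, b);
    assert(a == b); // both forms must walk the same index sequence
}
```

If the two timings come out the same, as reported above, that is consistent with the compiler producing equivalent code for both, but the sketch cannot prove it on its own.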
