May 19, 2015
On Tuesday, 19 May 2015 at 09:34:04 UTC, Almighty Bob wrote:
> On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu wrote:
>> So there's this classic trick:
>>
>> bool isPowerOf2(uint x)
>> {
>>    return (x & (x - 1)) == 0;
>> }
>>
>> which has no branches at least with dmd, see http://goo.gl/TVkCwc.
>>
>> Any ideas for faster code?
>
> If you dont mind asm then after you do...
>
> tmp = x-1;
>
> you could add the borrow/carry flag back onto the tmp, so it'd add back up to zero any time there's an underflow in the (x-1) op.
>
> So two extra instructions, (you need a zero for the ADC) no branch.

Oops, should be either add the carry onto x, or subtract the carry from tmp. Either way the result of the & is non zero.


May 19, 2015
On Tuesday, 19 May 2015 at 10:59:53 UTC, w0rp wrote:
> I believe you can also do x & -x == x.

I think that still returns true for x = 0.
May 19, 2015
On Tuesday, 19 May 2015 at 12:00:30 UTC, Almighty Bob wrote:
> On Tuesday, 19 May 2015 at 10:59:53 UTC, w0rp wrote:
>> I believe you can also do x & -x == x.
>
> I think that still returns true for x = 0.

You are right. Disregard that.
May 19, 2015
>On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu wrote:
> ...
> Any ideas for faster code?
>
>
> Thanks,
>
> Andrei

I found www.davespace.co.uk/blog/20150131-branchless-sequences.html
and its links to be useful and interesting.

Nic
May 19, 2015
On 5/19/15 1:37 AM, H. S. Teoh via Digitalmars-d wrote:
> On Mon, May 18, 2015 at 10:16:47PM -0700, Andrei Alexandrescu via Digitalmars-d wrote:
> [...]
>> bool isPowerOf2(uint x)
>> {
>>      return (x & (x - 1) | !x) == 0;
>> }
> [...]
>
> Are you sure that's correct? Doesn't that return true for all non-zero
> numbers?

It's  bitwise or, not logic or.

-Steve
May 19, 2015
On 5/19/15 1:16 AM, Andrei Alexandrescu wrote:
> So there's this classic trick:
>
> bool isPowerOf2(uint x)
> {
>      return (x & (x - 1)) == 0;
> }
>
> Pretty neat, but it wrongly returns true for x == 0. So the obvious fix is:
>
> bool isPowerOf2(uint x)
> {
>      return x && (x & (x - 1)) == 0;
> }
>
> But that has branches in it. So I came up with:
>
> bool isPowerOf2(uint x)
> {
>      return (x & (x - 1) | !x) == 0;
> }
>
> which has no branches at least with dmd, see http://goo.gl/TVkCwc.

Is it just me, or is it odd that your first version generates xor and a bunch of jumps?

I don't see xor anywhere in the "fast" version.

Another possibility is to avoid the zero check altogether. There may be cases where you already know before calling isPowerOf2 that it's not zero, and checking for zero again is wasteful.

Note that bsr is undefined for 0, so there is precedent for this.

-Steve
May 19, 2015
On 5/19/15 8:46 AM, Steven Schveighoffer wrote:
> On 5/19/15 1:16 AM, Andrei Alexandrescu wrote:
>> So there's this classic trick:
>>
>> bool isPowerOf2(uint x)
>> {
>>      return (x & (x - 1)) == 0;
>> }
>>
>> Pretty neat, but it wrongly returns true for x == 0. So the obvious
>> fix is:
>>
>> bool isPowerOf2(uint x)
>> {
>>      return x && (x & (x - 1)) == 0;
>> }
>>
>> But that has branches in it. So I came up with:
>>
>> bool isPowerOf2(uint x)
>> {
>>      return (x & (x - 1) | !x) == 0;
>> }
>>
>> which has no branches at least with dmd, see http://goo.gl/TVkCwc.
>
> Is it just me, or is it odd that your first version generates xor and a
> bunch of jumps?
>
> I don't see xor anywhere in the "fast" version.

Nevermind, xor is just zeroing a register.

I will note though, changing the slow version to:

if(x) return (x & (x-1)) == 0;
return 0;

this reduces the jumps by 2.

-Steve
May 19, 2015
On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu wrote:
> [...], but it wrongly returns true for x == 0.

When we're talking about modulo 2^n arithmetic, 0 is in fact a power of two.

Proof: 2^n mod 2^n == 0.

May 19, 2015
On 5/19/15 2:35 AM, Martin Nowak wrote:
> On Tuesday, 19 May 2015 at 07:56:27 UTC, Atila Neves wrote:
>> Aren't predictable branches cheap on current architectures?
>
> Yes they are, and it seems one would rarely if ever call isPowOf2 with 0
> in std.allocator. A good thing to do, is to use a good hardware event
> profiler like perf, and record the branch misses. Perf is also the right
> tool to compare the branchless version (I doubt it's significantly
> better, especially since the felt of the 0 branch is just a constant).

Yah measuring on-line is clearly the way to go. A comment on the branches - the branch predictor has a limited capacity so branching here might take resources away from other places. -- Andrei
May 19, 2015
On Tuesday, 19 May 2015 at 08:28:11 UTC, John Colvin wrote:
>
> I tested with a few different (modern) backends to see what was generated, they all essentially give you this (gcc 5.1.0 -O3 -march=broadwell):
>
> isPowerOf2:
> 	xorl	%eax, %eax
> 	testl	%edi, %edi
> 	je	.L5
> 	blsr	%edi, %edi
> 	testl	%edi, %edi
> 	sete	%al
> .L5:
> 	ret

I think you used:
    return x && (x & (x - 1)) == 0;
instead of
    return (x & (x - 1)) == 0 && x;

Which influences code generation (more weight on the x == 0 test,) hence the branch.