Jump to page: 1 25  
Page
Thread overview
0 is not a power of 2
May 19, 2015
Brian Schott
May 19, 2015
H. S. Teoh
May 19, 2015
Johan Engelen
May 19, 2015
Atila Neves
May 19, 2015
Martin Nowak
May 19, 2015
John Colvin
May 19, 2015
safety0ff
May 19, 2015
John Colvin
May 19, 2015
Almighty Bob
May 19, 2015
Almighty Bob
May 19, 2015
w0rp
May 19, 2015
Almighty Bob
May 19, 2015
w0rp
May 19, 2015
Nicholas Wilson
May 19, 2015
Matthias Bentrup
May 19, 2015
Marco Leise
May 19, 2015
Zoadian
May 19, 2015
Timon Gehr
May 19, 2015
deadalnix
May 19, 2015
deadalnix
May 19, 2015
deadalnix
May 21, 2015
deadalnix
May 19, 2015
Brian Schott
May 19, 2015
deadalnix
May 19, 2015
Matthias Bentrup
May 20, 2015
John Colvin
May 20, 2015
Temtaime
May 22, 2015
Jay Norwood
May 22, 2015
Jay Norwood
May 19, 2015
So there's this classic trick:

bool isPowerOf2(uint x)
{
    return (x & (x - 1)) == 0;
}

Pretty neat, but it wrongly returns true for x == 0. So the obvious fix is:

bool isPowerOf2(uint x)
{
    return x && (x & (x - 1)) == 0;
}

But that has branches in it. So I came up with:

bool isPowerOf2(uint x)
{
    return (x & (x - 1) | !x) == 0;
}

which has no branches at least with dmd, see http://goo.gl/TVkCwc.

Any ideas for faster code?


Thanks,

Andrei
May 19, 2015
On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu wrote:
> Any ideas for faster code?

Unless I'm mistaken, any uint that's a power of 2 will only have a single set bit, so why not use the "popcnt" instruction? http://dlang.org/phobos/core_bitop.html#.popcnt
May 19, 2015
On Mon, May 18, 2015 at 10:16:47PM -0700, Andrei Alexandrescu via Digitalmars-d wrote: [...]
> bool isPowerOf2(uint x)
> {
>     return (x & (x - 1) | !x) == 0;
> }
[...]

Are you sure that's correct? Doesn't that return true for all non-zero numbers?


T

-- 
"Uhh, I'm still not here." -- KD, while "away" on ICQ.
May 19, 2015
On 5/18/15 10:37 PM, H. S. Teoh via Digitalmars-d wrote:
> On Mon, May 18, 2015 at 10:16:47PM -0700, Andrei Alexandrescu via Digitalmars-d wrote:
> [...]
>> bool isPowerOf2(uint x)
>> {
>>      return (x & (x - 1) | !x) == 0;
>> }
> [...]
>
> Are you sure that's correct? Doesn't that return true for all non-zero
> numbers?

Excerpt from std.experimental.allocator.common:

package bool isPowerOf2(uint x)
{
    return (x & (x - 1) | !x) == 0;
}

unittest
{
    assert(!isPowerOf2(0));
    assert(isPowerOf2(1));
    assert(isPowerOf2(2));
    assert(!isPowerOf2(3));
    assert(isPowerOf2(4));
    assert(!isPowerOf2(5));
    assert(!isPowerOf2(6));
    assert(!isPowerOf2(7));
    assert(isPowerOf2(8));
    assert(!isPowerOf2(9));
    assert(!isPowerOf2(10));
}


Andrei

May 19, 2015
On 5/18/15 10:40 PM, Brian Schott wrote:
> On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu wrote:
>> Any ideas for faster code?
>
> Unless I'm mistaken, any uint that's a power of 2 will only have a
> single set bit, so why not use the "popcnt" instruction?
> http://dlang.org/phobos/core_bitop.html#.popcnt

There are complaints it's not that fast, e.g. http://danluu.com/assembly-intrinsics/. -- Andrei

May 19, 2015
Aren't predictable branches cheap on current architectures?

Atila

On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu wrote:
> So there's this classic trick:
>
> bool isPowerOf2(uint x)
> {
>     return (x & (x - 1)) == 0;
> }
>
> Pretty neat, but it wrongly returns true for x == 0. So the obvious fix is:
>
> bool isPowerOf2(uint x)
> {
>     return x && (x & (x - 1)) == 0;
> }
>
> But that has branches in it. So I came up with:
>
> bool isPowerOf2(uint x)
> {
>     return (x & (x - 1) | !x) == 0;
> }
>
> which has no branches at least with dmd, see http://goo.gl/TVkCwc.
>
> Any ideas for faster code?
>
>
> Thanks,
>
> Andrei

May 19, 2015
On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu wrote:
> So there's this classic trick:
>
> bool isPowerOf2(uint x)
> {
>     return (x & (x - 1)) == 0;
> }
>
> Pretty neat, but it wrongly returns true for x == 0. So the obvious fix is:
>
> bool isPowerOf2(uint x)
> {
>     return x && (x & (x - 1)) == 0;
> }
>
> But that has branches in it. So I came up with:
>
> bool isPowerOf2(uint x)
> {
>     return (x & (x - 1) | !x) == 0;
> }
>
> which has no branches at least with dmd, see http://goo.gl/TVkCwc.
>
> Any ideas for faster code?
>
>
> Thanks,
>
> Andrei

I tested with a few different (modern) backends to see what was generated, they all essentially give you this (gcc 5.1.0 -O3 -march=broadwell):

isPowerOf2:
	xorl	%eax, %eax
	testl	%edi, %edi
	je	.L5
	blsr	%edi, %edi
	testl	%edi, %edi
	sete	%al
.L5:
	ret
isPowerOf2b:
	blsr	%edi, %edx
	xorl	%eax, %eax
	testl	%edi, %edi
	sete	%al
	orl	%eax, %edx
	sete	%al
	ret

but if your not restricting the build to such recent processor, you can't emit BLSR, so you get this instead (gcc 5.1.0 -O3):

isPowerOf2b:
	leal	-1(%rdi), %eax
	xorl	%edx, %edx
	andl	%edi, %eax
	testl	%edi, %edi
	sete	%dl
	orl	%edx, %eax
	sete	%al
	ret
May 19, 2015
On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu wrote:
> So there's this classic trick:
>
> bool isPowerOf2(uint x)
> {
>     return (x & (x - 1)) == 0;
> }
>
> which has no branches at least with dmd, see http://goo.gl/TVkCwc.
>
> Any ideas for faster code?

If you dont mind asm then after you do...

tmp = x-1;

you could add the borrow/carry flag back onto the tmp, so it'd add back up to zero any time there's an underflow in the (x-1) op.

So two extra instructions, (you need a zero for the ADC) no branch.


May 19, 2015
On Tuesday, 19 May 2015 at 07:56:27 UTC, Atila Neves wrote:
> Aren't predictable branches cheap on current architectures?

Yes they are, and it seems one would rarely if ever call isPowOf2 with 0 in std.allocator. A good thing to do, is to use a good hardware event profiler like perf, and record the branch misses. Perf is also the right tool to compare the branchless version (I doubt it's significantly better, especially since the felt of the 0 branch is just a constant).
May 19, 2015
I believe you can also do x & -x == x. I'm not sure if that will be actually faster or slower. You could maybe cut the instructions down a little with an asm{} block. The compiler might not figure out that it can re-use a register for x on the left and x on the right there. You might use popcnt in a version() block too, so you can use the instruction when you've got it.
« First   ‹ Prev
1 2 3 4 5