May 19, 2015
On Tuesday, 19 May 2015 at 05:51:27 UTC, Andrei Alexandrescu wrote:
> On 5/18/15 10:37 PM, H. S. Teoh via Digitalmars-d wrote:
>> On Mon, May 18, 2015 at 10:16:47PM -0700, Andrei Alexandrescu via Digitalmars-d wrote:
>> [...]
>>> bool isPowerOf2(uint x)
>>> {
>>>     return (x & (x - 1) | !x) == 0;
>>> }
>> [...]
>>
>> Are you sure that's correct? Doesn't that return true for all non-zero
>> numbers?
>
> Excerpt from std.experimental.allocator.common:
>
> package bool isPowerOf2(uint x)
> {
>     return (x & (x - 1) | !x) == 0;
> }
>
> unittest
> {
>     assert(!isPowerOf2(0));
>     assert(isPowerOf2(1));
>     assert(isPowerOf2(2));
>     assert(!isPowerOf2(3));
>     assert(isPowerOf2(4));
>     assert(!isPowerOf2(5));
>     assert(!isPowerOf2(6));
>     assert(!isPowerOf2(7));
>     assert(isPowerOf2(8));
>     assert(!isPowerOf2(9));
>     assert(!isPowerOf2(10));
> }

I wish we had something like clang's -Wparenthesis.
I think
    return ( (x & (x-1)) | !x ) == 0;
is much clearer here.

 -Johan
May 19, 2015
Am Mon, 18 May 2015 22:16:47 -0700
schrieb Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>:

> But that has branches in it. So I came up with:
> 
> bool isPowerOf2(uint x)
> {
>      return (x & (x - 1) | !x) == 0;
> }
> 
> which has no branches at least with dmd, see http://goo.gl/TVkCwc.
> 
> Any ideas for faster code?
> 
> 
> Thanks,
> 
> Andrei

Any faster ?! This is already minimal assembly code with pretty looks!

While you are at it, you might also need "round pointer up to" and "round pointer down to", which can be implemented with bit ops for power-of-2 multiples.

Its the kind of stuff that's not really a std.algorithm candidate, but still somewhat common in memory management code. Often these and other little helpers end up in everyones stdlib_extensions.d which I suppose don't look all that different. Who has some of these in their code?:

- isPowerOf2, nextLargerPowerOf2
- roundUp/Down to multiple of X
- increment with wraparound at X
- clamp value (low, high or both ends)
- check if numerical value is in between two others
- compile time "iota"
- format an argument as it would appear in source code
  (i.e. add quotes around strings)

-- 
Marco

May 19, 2015
On Tuesday, 19 May 2015 at 18:04:49 UTC, Marco Leise wrote:
> Am Mon, 18 May 2015 22:16:47 -0700
> schrieb Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>:
>
>> But that has branches in it. So I came up with:
>> 
>> bool isPowerOf2(uint x)
>> {
>>      return (x & (x - 1) | !x) == 0;
>> }
>> 
>> which has no branches at least with dmd, see http://goo.gl/TVkCwc.
>> 
>> Any ideas for faster code?
>> 
>> 
>> Thanks,
>> 
>> Andrei
>
> Any faster ?! This is already minimal assembly code with
> pretty looks!
>
> While you are at it, you might also need "round pointer up to"
> and "round pointer down to", which can be implemented with bit
> ops for power-of-2 multiples.
>
> Its the kind of stuff that's not really a std.algorithm
> candidate, but still somewhat common in memory management code.
> Often these and other little helpers end up in everyones
> stdlib_extensions.d which I suppose don't look all that
> different. Who has some of these in their code?:
>
> - isPowerOf2, nextLargerPowerOf2
> - roundUp/Down to multiple of X
> - increment with wraparound at X
> - clamp value (low, high or both ends)
> - check if numerical value is in between two others
> - compile time "iota"
> - format an argument as it would appear in source code
>   (i.e. add quotes around strings)

+1 ;)
all except the last one
May 19, 2015
On Tuesday, 19 May 2015 at 15:39:16 UTC, safety0ff wrote:
> On Tuesday, 19 May 2015 at 08:28:11 UTC, John Colvin wrote:
>>
>> I tested with a few different (modern) backends to see what was generated, they all essentially give you this (gcc 5.1.0 -O3 -march=broadwell):
>>
>> isPowerOf2:
>> 	xorl	%eax, %eax
>> 	testl	%edi, %edi
>> 	je	.L5
>> 	blsr	%edi, %edi
>> 	testl	%edi, %edi
>> 	sete	%al
>> .L5:
>> 	ret
>
> I think you used:
>     return x && (x & (x - 1)) == 0;
> instead of
>     return (x & (x - 1)) == 0 && x;
>
> Which influences code generation (more weight on the x == 0 test,) hence the branch.

I used what andrei posted, but yes i do see the difference. How infuriating. A compiler that will entirely restructure your code leaving you with something unrecognisable in many cases, but in others is sensitive to the order of operands around an &&.
May 19, 2015
On 5/19/15 11:05 AM, Marco Leise wrote:
> While you are at it, you might also need "round pointer up to"
> and "round pointer down to", which can be implemented with bit
> ops for power-of-2 multiples.

Yah, there are plenty of those in https://github.com/andralex/phobos/blob/allocator/std/experimental/allocator/common.d. Improvements welcome. -- Andrei
May 19, 2015
Have you tried things like :

(x >> bsr(x)) == 1 ?

I have no idea if this is faster or not, but worth trying.
May 19, 2015
On 5/19/15 4:01 PM, deadalnix wrote:
> Have you tried things like :
>
> (x >> bsr(x)) == 1 ?
>
> I have no idea if this is faster or not, but worth trying.

Hm.. I think this would always succeed. Perhaps you mean:

1 << bsr(x) == x;

And ironically, still doesn't work for 0 :D

IIRC, bsr is a binary search (even when it's an instruction), I doubt it's as fast as subtraction and logic-and.

-Steve
May 19, 2015
On 05/19/2015 09:56 PM, Andrei Alexandrescu wrote:
> On 5/19/15 11:05 AM, Marco Leise wrote:
>> While you are at it, you might also need "round pointer up to"
>> and "round pointer down to", which can be implemented with bit
>> ops for power-of-2 multiples.
>
> Yah, there are plenty of those in
> https://github.com/andralex/phobos/blob/allocator/std/experimental/allocator/common.d.
> Improvements welcome. -- Andrei

In case the range of s is such that divideRoundUp is actually good enough, the following avoids the conditional:

size_t roundUpToMultipleOf(size_t s,uint base){
    auto t=s+base-1;
    return t-t%base;
}

However, both divideRoundUp and this implementation of roundUpToMultipleOf do not work for s in [size_t.max-base+2, size_t.max].
May 19, 2015
On Tuesday, 19 May 2015 at 20:01:29 UTC, deadalnix wrote:
> Have you tried things like :
>
> (x >> bsr(x)) == 1 ?
>
> I have no idea if this is faster or not, but worth trying.

https://issues.dlang.org/show_bug.cgi?id=14380

"If the content source operand is 0, the content of the destination operand is undefined" - This applies to both the bsf and bsr instructions.
May 19, 2015
I think you can make the over/underflow at zero work in your favor:

bool isPowerOf2(uint x)
{
  return (x & -x) > (x - 1);
}