April 25, 2016
On Monday, 25 April 2016 at 15:27:02 UTC, Xinok wrote:
> On Monday, 25 April 2016 at 12:56:27 UTC, Andrei Alexandrescu wrote:
>> On 4/25/16 6:42 AM, Solomon E wrote:
>>> On Monday, 25 April 2016 at 05:35:12 UTC, Andrei Alexandrescu wrote:
>>>>
>>>> With gdc https://godbolt.org/g/jcU4np isPow2B is the winner (shortest
>>>> code, simplest instructions). -- Andrei
>>>
>>> I generalized function isPow2B to work for negative numbers in case of a
>>> signed type, while keeping the assembly the same or better. (That also
>>> incidentally made it correctly return false for zero.)
>>>
>>>> bool isPow2B(T)(T x)
>>>> {
>>>>    return (x & -x) > (x - 1);
>>>> }
>>>
>>> bool isPow2F(T)(T x)
>>> {
>>>      return (x & -x) > (x >>> 1);
>>> }
>>>
>>> assembly diff:
>>> --        subl    $1, %edi
>>> ++        shrl    %edi
>>>
>>
>> That's smaller and faster, thanks. But how do you show it is still correct? -- Andrei
>
> Brute force.
>
> http://dpaste.dzfl.pl/882d7cdc5f74
Yeah. And your test spares the failed case int.min (0x80000000),
because in this case x & -x is negative, but of cause x>>>1 is positive, so it returns false despite -(2^^31) is in fact a power of two. So this requires an extra cast to work correct (in fact no big difference in the assembly):

return (Unsigned!T)(x & -x) > (Unsigned!T)(x >>> 1);

April 25, 2016
On 23/04/16 16:04, Nordlöw wrote:
> Wanted: CT-trait and run-time predicate for checking whether its single
> integer parameter is an exact power of two.
>
> I guess
>
> https://dlang.org/phobos/std_math.html#.truncPow2
>
> or
>
> https://dlang.org/phobos/std_math.html#.nextPow2
>
> could be reused, but I bet there's a more efficient way of checking this.

The standard way (assuming 2's complement machine) is:
if( i == (-i & (~i)) )
	writeln("Power of 2");

Works for all 2's complement integers except 0, where it gives a false positive.

Shachar
April 25, 2016
On Saturday, 23 April 2016 at 21:04:52 UTC, Nordlöw wrote:
> On Saturday, 23 April 2016 at 20:42:25 UTC, Lass Safin wrote:
>> CPUID: https://en.wikipedia.org/wiki/CPUID.
>> You can check for the presence of a lot of instructions with this instruction.
>> However this will only work on x86 and only run-time.
>
> Code you give a complete code example in D, please or point out a suitable place in druntime/phobos?

I just found this: https://dlang.org/phobos/core_cpuid.html#.hasPopcnt! It does exactly as it says: checks if the system has popcnt.
Though read the top of https://dlang.org/phobos/core_cpuid.html before you use it:
> Bugs:
> Currently only works on x86 and Itanium CPUs. Many processors have bugs in their
> microcode for the CPUID instruction, so sometimes the cache information may be
> incorrect.

Example;

import core.bitop;
import core.cpuid;

int count;
if(hasPopcnt)
    count = _popcnt; // Uses x86-instruction "popcnt".
else
    count = popcnt; // Phobos's software implementation.

// Do stuff with count

April 25, 2016
On Monday, 25 April 2016 at 12:56:27 UTC, Andrei Alexandrescu wrote:
>>...
>
> That's smaller and faster, thanks. But how do you show it is still correct? -- Andrei

First I wrote the following variants of isPow2A. It's more obvious that they're correct, but they didn't compile to fewer instructions.

bool isPow2D(T)(T x)
{
	return (x > 0) && !(x & (x - 1));
}

(The variant isPow2E avoids a jump instruction:)

bool isPow2E(T)(T x)
{
	return (x > 0) & !(x & (x - 1));
}

So at least I had some functions that I knew were doing what I wanted (handling negative values for sure) to test against.

Then I tried some other things that didn't work, then the bit shift variant, isPow2F. It seemed to work, I looked at at long list of results in the terminal and saw what the pattern is, but right, I shouldn't trust it without a proof.

(x & -x) evaluates to the value of the lowest bit set in x. Either (x & -x) == x, which is a power of 2, or (x & -x) == x minus the value of all the bits above the lowest bit set.

(x >>> 1) is equal to half of x or one less than half of x, or a positive number if x is negative.

For (x & -x) > (x >>> 1) to be true, that would either be because x is a power of 2, or else x - n > x/2 - 1 (or x - n > x/2, which is implied by that) where n is the value of the bits above the lowest bit set, which means n >= 2/3*x. So x - n > x/2 - 1 would be 1/3*x > x/2 - 1, which would be 0 > 1/6*x - 1. That's impossible for x > 6. For values of x under 6, the results check out case by case.

When x is negative in a signed type, the unsigned shift treats x as if it were an unsigned number. The only particular bit arrangement that's treated differently when using a signed type is that -(2^^32) isn't a power of two because it's negative, so the expression (x & -x) should evaluate to a negative value before the comparison.

That seems to work for a byte or a short, where (x & -x) comes out negative when at the min value.

I've written this clumsily because I'm not sure what sort of symbols to use and don't know how to write mathematical proofs for algorithms in programming in general, but I thought it was interesting that writing it down and double checking required showing how close the algorithm cuts it.

For example, x = 6144, (x & -x) == 2048, (x >>> 1) == 3072.

The cases for 0 to 6, when using signed types for x:
     algo B   algo F
x 0  B true   F false  (x & -x) == 0,  (x >>> 1) == 0
x 1  B true   F true   (x & -x) == 1,  (x >>> 1) == 0
x 2  B true   F true   (x & -x) == 2,  (x >>> 1) == 1
x 3  B false  F false  (x & -x) == 1,  (x >>> 1) == 1
x 4  B true   F true   (x & -x) == 4,  (x >>> 1) == 2
x 5  B false  F false  (x & -x) == 1,  (x >>> 1) == 2
x 6  B false  F false  (x & -x) == 2,  (x >>> 1) == 3

April 25, 2016
On 25 April 2016 at 18:48, Lass Safin via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Saturday, 23 April 2016 at 21:04:52 UTC, Nordlöw wrote:
>>
>> On Saturday, 23 April 2016 at 20:42:25 UTC, Lass Safin wrote:
>>>
>>> CPUID: https://en.wikipedia.org/wiki/CPUID.
>>> You can check for the presence of a lot of instructions with this
>>> instruction.
>>> However this will only work on x86 and only run-time.
>>
>>
>> Code you give a complete code example in D, please or point out a suitable place in druntime/phobos?
>
>
> I just found this: https://dlang.org/phobos/core_cpuid.html#.hasPopcnt! It
> does exactly as it says: checks if the system has popcnt.
> Though read the top of https://dlang.org/phobos/core_cpuid.html before you
> use it:
>>
>> Bugs:
>> Currently only works on x86 and Itanium CPUs. Many processors have bugs in
>> their
>> microcode for the CPUID instruction, so sometimes the cache information
>> may be
>> incorrect.
>
>
> Example;
>
> import core.bitop;
> import core.cpuid;
>
> int count;
> if(hasPopcnt)
>     count = _popcnt; // Uses x86-instruction "popcnt".
> else
>     count = popcnt; // Phobos's software implementation.
>

version(X86)
{
  if (hasPopcnt)
    ...
}
else
{
  // API not guaranteed to be the same.
}

April 25, 2016
On Monday, 25 April 2016 at 15:27:02 UTC, Xinok wrote:
> On Monday, 25 April 2016 at 12:56:27 UTC, Andrei Alexandrescu wrote:
>> On 4/25/16 6:42 AM, Solomon E wrote:
>>> ...
>>> I generalized function isPow2B to work for negative numbers in case of a
>>> signed type, while keeping the assembly the same or better. ...
>>>> [...]
>>>
>>> bool isPow2F(T)(T x)
>>> {
>>>      return (x & -x) > (x >>> 1);
>>> }
>>>
>>> ...
>>
>> That's smaller and faster, thanks. But how do you show it is still correct? -- Andrei
>
> Brute force.
>
> http://dpaste.dzfl.pl/882d7cdc5f74


Brute force? That doesn't prove the algorithm, it only proves the implementation.
April 25, 2016
On Monday, 25 April 2016 at 16:48:14 UTC, Lass Safin wrote:
> Example;
>
> import core.bitop;
> import core.cpuid;
>
> int count;
> if(hasPopcnt)
>     count = _popcnt; // Uses x86-instruction "popcnt".
> else
>     count = popcnt; // Phobos's software implementation.
>
> // Do stuff with count

That is not right (since 2.069, I think?). Adding that check just makes things slower and more complicated:

core.bitop.popcnt() itself already checks hasPopcnt, and forwards to the intrinsic if it is available.
April 25, 2016
On Monday, 25 April 2016 at 16:53:20 UTC, Solomon E wrote:
> ...
>
> (x >>> 1) is equal to half of x or one less than half of x, or a positive number if x is negative.
>
> ...

Obviously wrong explanation, sorry. (x >>> 1) is equal to half of x or half of one less than x, or a positive number if x is negative.

There are other parts in isPow2F where I'm not sure exactly what the bits are doing, such as how the compiler makes the result negative for (x & -x) at the bit and short min values, where in other expressions while learning D I've found an implicit cast to int that I worked around by casting back to the type I put in. (I guess that's the way it's supposed to be for compatibility with C/C++ expectations.) Also I skipped mentioning case x = 6 works too, not just cases over and under 6. It's just a sketch of a proof.

I would use the other function I wrote first, isPow2D, to be more sure of the logic, although it's not the most optimized, or I'd use a library function that promises to do the task for all types if I found one first, rather than the isPow2 variants I wrote and others wrote, and rather than popcnt. I'm more concerned with getting the application logic right when I program than optimizing for speed or bytes, until something is noticeably pausing or taking extra megabytes.

bool isPow2D(T)(T x)
{
    return (x > 0) && !(x & (x - 1));
}

Just now I tested popcnt against isPow2D and found that popcnt reports the number of bits set in a uint, regardless of popcnt's argument being a smaller size integer type or signed. That would make (popcnt( -(2^^31) ) == 1) yield true, which isn't what I want for an isPow2 function. popcnt is only documented to take uint or ulong, so that's not a bug in the library, only in my try at using it roughly.

April 25, 2016
On 04/25/2016 03:21 PM, Solomon E wrote:
> There are other parts in isPow2F where I'm not sure exactly what the
> bits are doing, such as how the compiler makes the result negative

I suggest you simply consider x unsigned. There is no value to the discussion brought by negative numbers, and implementation-wise you can insert a cast or let the overload mechanism only use unsigned integers. -- Andrei
April 25, 2016
On 04/25/2016 03:21 PM, Solomon E wrote:
> I'm more concerned with getting the application
> logic right when I program than optimizing for speed or bytes, until
> something is noticeably pausing or taking extra megabytes.

This is a luxury not available to library writers.

> bool isPow2D(T)(T x)
> {
>      return (x > 0) && !(x & (x - 1));
> }

This is worse than what we have now, which is easy to show is correct.


Andrei