Thread overview
popcnt usage
Dec 23, 2013
Todd VanderVeen
Dec 23, 2013
Todd VanderVeen
Dec 23, 2013
Todd VanderVeen
Dec 23, 2013
Iain Buclaw
Dec 23, 2013
Iain Buclaw
Dec 24, 2013
Marco Leise
Dec 24, 2013
Marco Leise
Jan 02, 2014
Kai Nacke
December 23, 2013
First, let me say thanks for the addition of the popcnt inline assembler opcode. I had placed a project on hold until it was available. I look forward to using D again.

I determined this instruction was available after some experimentation, as it's not documented on the inline assembler page.

uint popcnt (ulong bits) {
   asm {
      mov RAX, bits ;     // load the argument
      popcnt RAX, RAX ;   // count the set bits; the result is left in RAX/EAX
   }
}

Mention is made in the documentation of SSE4.2 support, but I understand popcnt and lzcnt aren't really considered part of this instruction set, as they operate on general-purpose registers rather than the SSE registers. If I were to submit a pull request to address the documentation, how would you prefer this be represented: simply as additions to the opcode table, or annotated that they were implemented alongside SSE4.2? Both?

A second concern is whether it is possible to determine the availability of this instruction at compile time. I want to do something like the following in a custom popcnt method:

version(X86_64) {
   static if (hasPopcnt()) {
      asm {
         ... performant assembly version
      }
   } else {
      ... slower procedural version
   }
}

But the miscellaneous features of core.cpuid are not available for conditional compilation. Is there an undocumented version label that could be used to this end? Is my only option to pass a version flag on the command line?

version(X86_64) {
   version(Has_Popcnt) {
      asm {
         ... performant assembly version
      }
   }
   else {
      ... slower procedural version
   }
}

This is workable, but it would be nice if these finer architectural distinctions were available for conditional compilation without the need for the extra external configuration.
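
For illustration, the version-flag approach described above might be sketched as follows. This is only a sketch under stated assumptions: Has_Popcnt is the hand-picked identifier from the post and has to be supplied by the user (for example with dmd -version=Has_Popcnt), the names usePopcntInstruction, popcntFallback and countBits are hypothetical, and the fallback loop is one common software technique, not necessarily the "slower procedural version" the post has in mind.

```d
// Has_Popcnt is a user-supplied version identifier (an assumption of this
// sketch), e.g. passed as `dmd -version=Has_Popcnt`.
version (Has_Popcnt)
    enum bool usePopcntInstruction = true;
else
    enum bool usePopcntInstruction = false;

/// Portable fallback: clears the lowest set bit until none remain.
uint popcntFallback(ulong bits) pure nothrow @nogc @safe
{
    uint count = 0;
    while (bits != 0)
    {
        bits &= bits - 1; // drop the lowest set bit
        ++count;
    }
    return count;
}

/// Counts set bits, choosing the implementation at compile time.
uint countBits(ulong bits)
{
    version (D_InlineAsm_X86_64)
    {
        static if (usePopcntInstruction)
        {
            asm
            {
                mov RAX, bits;
                popcnt RAX, RAX;
                mov bits, RAX; // store the count back so D code can return it
            }
            return cast(uint) bits;
        }
        else
            return popcntFallback(bits);
    }
    else
        return popcntFallback(bits);
}
```

Mapping the version identifier onto an enum lets the rest of the code use static if, which is the form the first snippet in the post was aiming for. The binary still carries only one of the two paths, so it remains the builder's responsibility to pass the flag only when the target CPU is known to support the instruction.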
December 23, 2013
I retract my second concern. I mistook a purity error for a CTFE error. This does work as expected.

import core.cpuid: hasPopcnt;

/// Returns the number of bits which are set.
uint popcnt(ulong bits) nothrow
{
   version(X86_64) {
      if(hasPopcnt()) {
         asm {
            ....
         }
      }
      else {
         ...
      }
   }
}

Is there any reason that core.cpuid.hasPopcnt() cannot be made pure? Hopefully, calling it won't change my processor :)

December 23, 2013
Actually, I dropped the static if in my example and traded one problem for another.

As the static variables of core.cpuid are not accessible at compile time, I would like to emulate the CPU interrogation done there, but I see that asm statements are disallowed in CTFE. Is versioning the only option here?
December 23, 2013
On 23 December 2013 16:47, Todd VanderVeen <tdvanderveen@gmail.com> wrote:
> A second concern is whether it is possible to determine the availability of this instruction at compile time. I want to do something like the following in a custom popcnt method:
>
> version(X86_64) {
>    static if (hasPopcnt()) {
>       asm {
>          ... performant assembly version
>       }
>    } else {
>       ... slower procedural version
>    }
> }
>

There's no way to do this at compile time, other than to assume that D_InlineAsm_X86_64 implies popcnt, or to do a runtime check to determine the correct path to take.
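
A runtime check along those lines might look roughly like the sketch below. It assumes a druntime recent enough to provide the ulong overload of core.bitop.popcnt as the software path; popcntHardware and countBitsRuntime are hypothetical names introduced here for illustration.

```d
import core.bitop : softwarePopcnt = popcnt; // druntime's bit count, renamed to avoid confusion with the opcode
import core.cpuid : hasPopcnt;               // runtime CPU feature query

version (D_InlineAsm_X86_64)
{
    /// Hardware path: only call this after hasPopcnt has returned true.
    uint popcntHardware(ulong bits)
    {
        asm
        {
            mov RAX, bits;
            popcnt RAX, RAX;
            mov bits, RAX; // store the count back so D code can return it
        }
        return cast(uint) bits;
    }
}

/// Counts set bits, picking the path with a runtime CPU check.
uint countBitsRuntime(ulong bits)
{
    version (D_InlineAsm_X86_64)
    {
        if (hasPopcnt)
            return popcntHardware(bits);
    }
    return cast(uint) softwarePopcnt(bits);
}
```

The check costs a branch on every call. If that matters, the decision could instead be made once at startup and cached, for instance in a function pointer.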
December 23, 2013
On 23 December 2013 20:18, Todd VanderVeen <tdvanderveen@gmail.com> wrote:
>
> Is there any reason that core.cpuid.hasPopcnt() cannot be made pure?
> Hopefully, calling it won't change my processor :)
>

It may have side effects, or no one thought about making it pure.
December 24, 2013
On Mon, 23 Dec 2013 23:46:11 +0000,
Iain Buclaw <ibuclaw@gdcproject.org> wrote:

> On 23 December 2013 16:47, Todd VanderVeen <tdvanderveen@gmail.com> wrote:
> > A second concern is whether it is possible to determine the availability of this instruction at compile time. I want to do something like the following in a custom popcnt method:
> >
> > version(X86_64) {
> >    static if (hasPopcnt()) {
> >       asm {
> >          ... performant assembly version
> >       }
> >    } else {
> >       ... slower procedural version
> >    }
> > }
> >
> 
> There's no way to do this at compile time, other than to assume that D_InlineAsm_X86_64 implies popcnt, or to do a runtime check to determine the correct path to take.

You _could_ export the target CPU as some built-in enum, like in the old days when it resulted in Pentium Pro and K6 builds.

-- 
Marco

December 24, 2013
On Mon, 23 Dec 2013 23:46:11 +0000,
Iain Buclaw <ibuclaw@gdcproject.org> wrote:

> There's no way to do this at compile time, other than to assume that D_InlineAsm_X86_64 implies popcnt, or to do a runtime check to determine the correct path to take.

Oh and if I remember correctly the popcnt intrinsic in GDC is somewhat slow in emulation mode. No biggie, I just got reminded.

-- 
Marco

January 02, 2014
On Monday, 23 December 2013 at 16:47:32 UTC, Todd VanderVeen wrote:
> But the miscellaneous features of core.cpuid are not available for conditional compilation. Is there an undocumented version label that could be used to this end? Is my only option to pass a version flag on the command line?
>
> version(X86_64) {
>    version(Has_Popcnt) {
>       asm {
>          ... performant assembly version
>       }
>    }
>    else {
>       ... slower procedural version
>    }
> }
>
> This is workable, but it would be nice if these finer architectural distinctions were available for conditional compilation without the need for the extra external configuration.

With ldc2 you can use -mattr=+popcnt to use the popcnt instruction and -mattr=-popcnt to use the emulation.

Regards,
Kai