Thread overview
SIMD under LDC
Sep 04, 2017
Igor
Sep 04, 2017
Nicholas Wilson
Sep 05, 2017
12345swordy
Sep 05, 2017
Igor
Sep 05, 2017
Johan Engelen
Sep 06, 2017
Igor
Sep 06, 2017
Igor
Sep 07, 2017
Johan Engelen
Sep 07, 2017
Igor
Sep 11, 2017
Igor
Sep 11, 2017
Igor
September 04, 2017
I found that I can't use the `__simd` function from core.simd under LDC, and that LDC has ldc.simd instead, but I couldn't find out how to implement the equivalent of this with it:

ubyte16* masks = ...;
foreach (ref c; pixels) {
	c = __simd(XMM.PSHUFB, c, *masks);
}

I see it has a shufflevector function, but it only accepts constant masks and I am using a variable one. Is this possible under LDC?

BTW, shuffling channels within pixels using DMD SIMD is about 5 times faster than with normal code on my machine :)
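For reference, PSHUFB treats each byte of the (runtime) mask as an index into the 16 source bytes, and any mask byte with its high bit set yields zero. Below is a minimal plain-D model of one 16-byte shuffle. It is handy as a scalar fallback or for checking SIMD results. The function name `pshufbScalar` is hypothetical and not part of core.simd or ldc.simd.

```d
/// Scalar model of a single SSSE3 PSHUFB on 16 bytes.
/// For each destination byte i: if bit 7 of mask[i] is set the result is 0,
/// otherwise the low 4 bits of mask[i] select which source byte to copy.
ubyte[16] pshufbScalar(in ubyte[16] src, in ubyte[16] mask) pure nothrow @nogc @safe
{
    ubyte[16] result;
    foreach (i; 0 .. 16)
    {
        if (mask[i] & 0x80)
            result[i] = 0;                    // high bit set: zero this byte
        else
            result[i] = src[mask[i] & 0x0F]; // low nibble: source byte index
    }
    return result;
}
```

With the mask `[2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15]`, for example, it swaps the first and third byte of each of the four 4-byte pixels (BGRA to RGBA).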
September 04, 2017
On Monday, 4 September 2017 at 20:39:11 UTC, Igor wrote:
> I found that I can't use the `__simd` function from core.simd under LDC

Correct, LDC does not support the core.simd interface.

> and that LDC has ldc.simd instead, but I couldn't find out how to implement the equivalent of this with it:
>
> ubyte16* masks = ...;
> foreach (ref c; pixels) {
> 	c = __simd(XMM.PSHUFB, c, *masks);
> }
>
> I see it has a shufflevector function, but it only accepts constant masks and I am using a variable one. Is this possible under LDC?

You have several options:
* write a regular for loop and let LDC's optimiser take care of the rest.

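* call an LLVM intrinsic directly through `pragma(LDC_intrinsic)`, for example:

import core.simd : ubyte16;
import ldc.simd : equalMask;
import std.traits : ReturnType;

alias mask_t = ReturnType!(equalMask!ubyte16);
// `align` is a keyword in D, so the alignment parameter needs another name
pragma(LDC_intrinsic, "llvm.masked.load.v16i8.p0v16i8")
    ubyte16 llvm_masked_load(ubyte16* val, int alignment, mask_t mask, ubyte16 fallthru);

ubyte16* masks = ...;
foreach (ref c; pixels) {
	auto mask = equalMask!ubyte16(*masks, [-1,-1,-1, ...]);
	c = llvm_masked_load(&c, 16, mask, [0,0,0,0 ... ]);
}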

The second one might not work because of type differences in LLVM, but it should serve as a guide to hacking the `cmpMask` IR code in ldc.simd to do what you want.
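A sketch of the first option: a plain loop that reorders the four channels of every 32-bit pixel using a channel order known only at run time, so nothing has to be a compile-time constant as with shufflevector. The name `shuffleChannels` and the `order` convention are hypothetical. Whether LDC's optimiser turns this into SIMD code depends on the optimization level and target flags, and this sketch makes no claim about the code it generates.

```d
/// Reorders the 4 channels of every pixel in `pixels` (4 bytes per pixel).
/// order[c] is the source byte index (0..3) for destination channel c.
void shuffleChannels(ubyte[] pixels, in ubyte[4] order) pure nothrow @safe
{
    assert(pixels.length % 4 == 0, "expected whole 4-byte pixels");
    foreach (i; 0 .. pixels.length / 4)
    {
        // Copy the pixel first so the in-place reorder reads original bytes.
        ubyte[4] px = pixels[i * 4 .. i * 4 + 4];
        foreach (c; 0 .. 4)
            pixels[i * 4 + c] = px[order[c]];
    }
}
```

Passing `[2, 1, 0, 3]` as `order` swaps the first and third channel, e.g. BGRA to RGBA.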

> BTW, shuffling channels within pixels using DMD SIMD is about 5 times faster than with normal code on my machine :)

Don't underestimate LDC's optimiser ;)
September 05, 2017
On Monday, 4 September 2017 at 23:06:27 UTC, Nicholas Wilson wrote:
> On Monday, 4 September 2017 at 20:39:11 UTC, Igor wrote:
>> BTW, shuffling channels within pixels using DMD SIMD is about 5 times faster than with normal code on my machine :)
>
> Don't underestimate LDC's optimiser ;)
I have seen cases where the compiler fails to optimize for SIMD.

September 05, 2017
On Tuesday, 5 September 2017 at 01:11:29 UTC, 12345swordy wrote:
> On Monday, 4 September 2017 at 23:06:27 UTC, Nicholas Wilson wrote:
>> Don't underestimate LDC's optimiser ;)
> I have seen cases where the compiler fails to optimize for SIMD.

I tried it, and the optimized LDC build did generate SIMD instructions from the regular code, but it used several of them to do the job, so it is about 1.4 times slower than the manual SIMD version with DMD. That is probably good enough for me.
September 05, 2017
On Monday, 4 September 2017 at 20:39:11 UTC, Igor wrote:
> I found that I can't use the `__simd` function from core.simd under LDC, and that LDC has ldc.simd instead, but I couldn't find out how to implement the equivalent of this with it:
>
> ubyte16* masks = ...;
> foreach (ref c; pixels) {
> 	c = __simd(XMM.PSHUFB, c, *masks);
> }
>
> I see it has a shufflevector function, but it only accepts constant masks and I am using a variable one. Is this possible under LDC?

You can use `__builtin_ia32_pshufb128` and `__builtin_ia32_pshufb256` from the module ldc.gccbuiltins_x86.di.

(also see https://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/X86-Built_002din-Functions.html)

Please file a feature request about shufflevector with a variable mask in our (LDC) issue tracker on GitHub, with some code that you'd expect to work. Thanks.

- Johan


September 06, 2017
On Tuesday, 5 September 2017 at 18:50:34 UTC, Johan Engelen wrote:
> You can use `__builtin_ia32_pshufb128` and `__builtin_ia32_pshufb256` from the module ldc.gccbuiltins_x86.di.
>
> (also see https://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/X86-Built_002din-Functions.html)
>
> Please file a feature request about shufflevector with a variable mask in our (LDC) issue tracker on GitHub, with some code that you'd expect to work. Thanks.
>
> - Johan

I'll try that this evening. Thanks! I'll also open an issue, but are you sure such a feature request is valid, since the LLVM shufflevector instruction, as far as I can see, only supports constant masks as well?
September 06, 2017
On Wednesday, 6 September 2017 at 09:01:18 UTC, Igor wrote:
> I'll try that this evening. Thanks! I'll also open an issue, but are you sure such a feature request is valid, since the LLVM shufflevector instruction, as far as I can see, only supports constant masks as well?

I opened a feature request on GitHub. I also tried using the gccbuiltins, but I got this error:

LLVM ERROR: Cannot select: 0x2199c96fd70: v16i8 = X86ISD::PSHUFB 0x2199c74e9a8, 0x2199c74d6c0
  0x2199c74e9a8: v16i8,ch = CopyFromReg 0x21994bcfd90, Register:v16i8 %vreg384
    0x2199c96fb00: v16i8 = Register %vreg384
  0x2199c74d6c0: v16i8,ch = CopyFromReg 0x21994bcfd90, Register:v16i8 %vreg385
    0x2199c74ed50: v16i8 = Register %vreg385
In function: _D7assetdb12loadBmpImageFAxaZf
Building x64\LDCDebug\DNgin.exe failed!

You can see the code I used here: https://github.com/igor84/dngin/blob/3c171330843af71170a6ee4ae164a76bf58c35f6/source/assetdb.d#L123

Note that if you want to try it, you will need a test.bmp in a specific format where header.compression == 3, like this one: https://drive.google.com/file/d/0B9l8IgnRaPwCU0hIWEtHUElhTTg/view?usp=sharing

September 07, 2017
On Wednesday, 6 September 2017 at 20:43:01 UTC, Igor wrote:
>
> I opened a feature request on GitHub. I also tried using the gccbuiltins, but I got this error:
>
> LLVM ERROR: Cannot select: 0x2199c96fd70: v16i8 = X86ISD::PSHUFB 0x2199c74e9a8, 0x2199c74d6c0

That's because SSSE3 instructions are not enabled by default, so the compiler isn't allowed to generate the PSHUFB instruction.
Some options you have:
1. Set a cpu that has ssse3, e.g. compile with `-mcpu=native`
2. Enable SSSE3: compile with `-mattr=+ssse3`
3. Perhaps best for your case: enable SSSE3 only for that function by importing the ldc.attributes module and applying the `@target("ssse3")` UDA to it.
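A minimal sketch of option 3, wrapping the builtin in a function that carries `@target("ssse3")`. It assumes that ldc.gccbuiltins_x86 declares `__builtin_ia32_pshufb128` as taking and returning `byte16` (hence the reinterpreting casts); check the .di file shipped with your LDC version. The wrapper name `pshufb` and the version identifier `UseLdcPshufb` are hypothetical. Other compilers and targets get a scalar fallback with the same semantics so the sketch compiles everywhere.

```d
import core.simd;

// Hypothetical version identifier: use the builtin only for LDC on x86-64.
version (LDC) version (X86_64) version = UseLdcPshufb;

version (UseLdcPshufb)
{
    import ldc.attributes : target;
    import ldc.gccbuiltins_x86 : __builtin_ia32_pshufb128;

    // @target("ssse3") lets LLVM select PSHUFB inside this function only,
    // without raising the CPU baseline for the rest of the program.
    @target("ssse3")
    ubyte16 pshufb(ubyte16 src, ubyte16 mask)
    {
        // Assumption: the builtin uses byte16; the casts only reinterpret bits.
        return cast(ubyte16) __builtin_ia32_pshufb128(cast(byte16) src, cast(byte16) mask);
    }
}
else
{
    // Portable fallback with PSHUFB semantics.
    ubyte16 pshufb(ubyte16 src, ubyte16 mask)
    {
        ubyte16 result = 0;
        foreach (i; 0 .. 16)
        {
            const ubyte m = mask.array[i];
            // High bit set -> 0, otherwise low 4 bits index the source bytes.
            result.array[i] = (m & 0x80) ? cast(ubyte) 0 : src.array[m & 0x0F];
        }
        return result;
    }
}
```

Keeping the attribute on a small wrapper means only that function requires SSSE3, which matches the error above: without the attribute or a matching `-mcpu`/`-mattr`, LLVM cannot select the PSHUFB node.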

-Johan
September 07, 2017
On Thursday, 7 September 2017 at 15:24:13 UTC, Johan Engelen wrote:
> On Wednesday, 6 September 2017 at 20:43:01 UTC, Igor wrote:
>>
>> I opened a feature request on GitHub. I also tried using the gccbuiltins, but I got this error:
>>
>> LLVM ERROR: Cannot select: 0x2199c96fd70: v16i8 = X86ISD::PSHUFB 0x2199c74e9a8, 0x2199c74d6c0
>
> That's because SSSE3 instructions are not enabled by default, so the compiler isn't allowed to generate the PSHUFB instruction.
> Some options you have:
> 1. Set a cpu that has ssse3, e.g. compile with `-mcpu=native`
> 2. Enable SSSE3: compile with `-mattr=+ssse3`
> 3. Perhaps best for your case: enable SSSE3 only for that function by importing the ldc.attributes module and applying the `@target("ssse3")` UDA to it.
>
> -Johan

Thanks, Johan. I tried this, and now it does compile, but it crashes with an Access Violation in the debug build. In the optimized build it seems to work, though.
September 11, 2017
On Thursday, 7 September 2017 at 16:45:40 UTC, Igor wrote:
> Thanks, Johan. I tried this, and now it does compile, but it crashes with an Access Violation in the debug build. In the optimized build it seems to work, though.

I will try to reproduce this in a minimal project and open an LDC bug if successful.

In the meantime, can anyone tell me how to add an attribute to a function only if something is defined? This doesn't work:

version(USE_SIMD_WITH_LDC) {
  import ldc.attributes;
  @target("ssse3")
} void funcThatUsesSIMD() {
  ...
  version(LDC) {
    import ldc.gccbuiltins_x86;
    c = __builtin_ia32_pshufb128(c, *simdMasks);
  } else {
    c = __simd(XMM.PSHUFB, c, *simdMasks);
  }
  ...
}
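One common D idiom, shown here as a sketch: make the attribute's type conditional instead of the attribute itself. Under LDC, `target` is imported from ldc.attributes. Elsewhere a dummy struct with the same name stands in, and as a plain user-defined attribute it has no effect on code generation. The `version (LDC)` condition could be swapped for the `USE_SIMD_WITH_LDC` identifier from the question, and the empty function body is a placeholder.

```d
version (LDC)
{
    import ldc.attributes : target;
}
else
{
    // Stand-in so @target(...) still compiles on other compilers;
    // it is just an ignored user-defined attribute there.
    struct target
    {
        string specifier;
    }
}

@target("ssse3")
void funcThatUsesSIMD()
{
    // Placeholder: the SIMD code from the question would go here.
}
```

The function is declared once, and only the meaning of `@target` changes per compiler, so the body does not have to be duplicated in two version blocks.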