SIMD under LDC - D Programming Language Discussion Forum

September 04, 2017

Posted by Igor

I found that I can't use __simd function from core.simd under LDC and that it has ldc.simd but I couldn't find how to implement equivalent to this with it:

ubyte16* masks = ...;
foreach (ref c; pixels) {
	c = __simd(XMM.PSHUFB, c, *masks);
}

I see it has shufflevector function but it only accepts constant masks and I am using a variable one. Is this possible under LDC?

BTW. Shuffling channels within pixels using DMD simd is about 5 times faster than with normal code on my machine :)

September 04, 2017

Re: SIMD under LDC

Posted by Nicholas Wilson
in reply to Igor

On Monday, 4 September 2017 at 20:39:11 UTC, Igor wrote:
> I found that I can't use __simd function from core.simd under LDC

Correct, LDC does not support the core.simd interface.

> and that it has ldc.simd but I couldn't find how to implement equivalent to this with it:
>
> ubyte16* masks = ...;
> foreach (ref c; pixels) {
> 	c = __simd(XMM.PSHUFB, c, *masks);
> }
>
> I see it has shufflevector function but it only accepts constant masks and I am using a variable one. Is this possible under LDC?

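You have several options:
* write a regular for loop and let LDC's optimiser take care of the rest.
* declare an LLVM intrinsic yourself, for example a masked load:

import core.simd;
import ldc.simd;                 // equalMask
import std.traits : ReturnType;

alias mask_t = ReturnType!(equalMask!ubyte16);
// `align` is a D keyword, so the parameter is named `alignment` here.
pragma(LDC_intrinsic, "llvm.masked.load.v16i8.p0v16i8")
    ubyte16 llvm_masked_load(ubyte16* val, int alignment, mask_t mask, ubyte16 fallthru);

ubyte16* masks = ...;
foreach (ref c; pixels) {
    auto mask = equalMask!ubyte16(*masks, [-1,-1,-1, ...]);
    c = llvm_masked_load(&c, 16, mask, [0,0,0,0 ... ]);
}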

The second one might not work, because of type differences in llvm, but should serve as a guide to hacking the `cmpMask` IR code in ldc.simd to do what you want it to.

> BTW. Shuffling channels within pixels using DMD simd is about 5 times faster than with normal code on my machine :)

Don't underestimate ldc's optimiser ;)
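
For the "regular loop" option, a minimal sketch could look like the following. The function name `swizzle` and its `order` parameter are made up for illustration, and whether LDC vectorises the loop well will depend on the optimisation level and the enabled CPU features.

```d
/// Reorders the four channel bytes of every pixel in place.
/// order[i] is the index of the source channel copied to channel i.
/// (Hypothetical helper, not part of any library.)
void swizzle(ubyte[] pixels, const ubyte[4] order)
{
    // Walk the buffer one 4-byte pixel at a time; a trailing partial
    // pixel, if any, is left untouched.
    for (size_t p = 0; p + 4 <= pixels.length; p += 4)
    {
        // Copy the pixel first so writes do not clobber unread channels.
        ubyte[4] src = pixels[p .. p + 4];
        foreach (i; 0 .. 4)
            pixels[p + i] = src[order[i]];
    }
}
```

Calling it with an order of `[2, 1, 0, 3]` would swap the first and third channel of each pixel, which is the kind of shuffle PSHUFB does for four pixels at once.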

September 05, 2017

Re: SIMD under LDC

Posted by 12345swordy
in reply to Nicholas Wilson

On Monday, 4 September 2017 at 23:06:27 UTC, Nicholas Wilson wrote:
> Don't underestimate ldc's optimiser ;)
I've seen cases where the compiler fails to optimize for SIMD.

September 05, 2017

Re: SIMD under LDC

Posted by Igor
in reply to 12345swordy

On Tuesday, 5 September 2017 at 01:11:29 UTC, 12345swordy wrote:
> On Monday, 4 September 2017 at 23:06:27 UTC, Nicholas Wilson wrote:
>> Don't underestimate ldc's optimiser ;)
> I've seen cases where the compiler fails to optimize for SIMD.

I tried it, and the LDC optimized build did generate SIMD instructions from regular code, but it used several of them to do the job, so it is about 1.4 times slower than the manual SIMD version with DMD. That is probably good enough for me.

September 05, 2017

Re: SIMD under LDC

Posted by Johan Engelen
in reply to Igor

On Monday, 4 September 2017 at 20:39:11 UTC, Igor wrote:
> I found that I can't use __simd function from core.simd under LDC and that it has ldc.simd but I couldn't find how to implement equivalent to this with it:
>
> ubyte16* masks = ...;
> foreach (ref c; pixels) {
> 	c = __simd(XMM.PSHUFB, c, *masks);
> }
>
> I see it has shufflevector function but it only accepts constant masks and I am using a variable one. Is this possible under LDC?

You can use the module ldc.gccbuiltins_x86 (the file gccbuiltins_x86.di), which provides __builtin_ia32_pshufb128 and __builtin_ia32_pshufb256.

(also see https://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/X86-Built_002din-Functions.html)

Please file a feature request about shufflevector with variable mask in our (LDC) issue tracker on Github; with some code that you'd expect to work. Thanks.

- Johan

September 06, 2017

Re: SIMD under LDC

Posted by Igor
in reply to Johan Engelen

On Tuesday, 5 September 2017 at 18:50:34 UTC, Johan Engelen wrote:
> You can use the module ldc.gccbuiltins_x86 (the file gccbuiltins_x86.di), which provides __builtin_ia32_pshufb128 and __builtin_ia32_pshufb256.
>
> (also see https://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/X86-Built_002din-Functions.html)
>
> Please file a feature request about shufflevector with variable mask in our (LDC) issue tracker on Github; with some code that you'd expect to work. Thanks.
>
> - Johan

I'll try that this evening. Thanks! I'll also open an issue, but are you sure such a feature request is valid? As far as I can see, the LLVM shufflevector instruction only supports constant masks as well.

September 06, 2017

Re: SIMD under LDC

Posted by Igor
in reply to Igor

On Wednesday, 6 September 2017 at 09:01:18 UTC, Igor wrote:
> I'll try that this evening. Thanks! I'll also open an issue, but are you sure such a feature request is valid? As far as I can see, the LLVM shufflevector instruction only supports constant masks as well.

I opened a feature request on github. I also tried using the gccbuiltins but I got this error:

LLVM ERROR: Cannot select: 0x2199c96fd70: v16i8 = X86ISD::PSHUFB 0x2199c74e9a8, 0x2199c74d6c0
  0x2199c74e9a8: v16i8,ch = CopyFromReg 0x21994bcfd90, Register:v16i8 %vreg384
    0x2199c96fb00: v16i8 = Register %vreg384
  0x2199c74d6c0: v16i8,ch = CopyFromReg 0x21994bcfd90, Register:v16i8 %vreg385
    0x2199c74ed50: v16i8 = Register %vreg385
In function: _D7assetdb12loadBmpImageFAxaZf
Building x64\LDCDebug\DNgin.exe failed!

You can see the code I used here: https://github.com/igor84/dngin/blob/3c171330843af71170a6ee4ae164a76bf58c35f6/source/assetdb.d#L123

Note that if you want to try it you will need a test.bmp in specific format where header.compression == 3, like this one: https://drive.google.com/file/d/0B9l8IgnRaPwCU0hIWEtHUElhTTg/view?usp=sharing

September 07, 2017

Re: SIMD under LDC

Posted by Johan Engelen
in reply to Igor

On Wednesday, 6 September 2017 at 20:43:01 UTC, Igor wrote:
>
> I opened a feature request on github. I also tried using the gccbuiltins but I got this error:
>
> LLVM ERROR: Cannot select: 0x2199c96fd70: v16i8 = X86ISD::PSHUFB 0x2199c74e9a8, 0x2199c74d6c0

That's because SSSE3 instructions are not enabled by default, so the compiler isn't allowed to generate the PSHUFB instruction.
Some options you have:
1. Set a cpu that has ssse3, e.g. compile with `-mcpu=native`
2. Enable SSSE3: compile with `-mattr=+ssse3`
3. Perhaps best for your case, enable SSSE3 for that function, importing the ldc.attributes module and using the @target("ssse3") UDA on that function.

-Johan
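
Option 3 could look roughly like this. It is only a sketch: it assumes an x86 target and assumes that `__builtin_ia32_pshufb128` in ldc.gccbuiltins_x86 takes and returns `byte16` (worth checking against the generated module for your LDC version). `shuffleBytes` is a hypothetical wrapper name.

```d
import core.simd;

version (LDC)
{
    import ldc.attributes : target;
    import ldc.gccbuiltins_x86 : __builtin_ia32_pshufb128;

    // SSSE3 is enabled for this function only, so the rest of the
    // program is still built for the baseline CPU.
    @target("ssse3")
    byte16 shuffleBytes(byte16 src, byte16 mask)
    {
        // Result byte i is src[mask[i] & 15], or 0 if mask[i] is negative
        // (high bit set), following the PSHUFB definition.
        return __builtin_ia32_pshufb128(src, mask);
    }
}
```

The program still needs to run on a CPU that actually has SSSE3; the attribute only tells the compiler it may emit the instruction inside that function.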

September 07, 2017

Re: SIMD under LDC

Posted by Igor
in reply to Johan Engelen

On Thursday, 7 September 2017 at 15:24:13 UTC, Johan Engelen wrote:
> That's because SSSE3 instructions are not enabled by default, so the compiler isn't allowed to generate the PSHUFB instruction.
> Some options you have:
> 1. Set a cpu that has ssse3, e.g. compile with `-mcpu=native`
> 2. Enable SSSE3: compile with `-mattr=+ssse3`
> 3. Perhaps best for your case, enable SSSE3 for that function, importing the ldc.attributes module and using the @target("ssse3") UDA on that function.
>
> -Johan

Thanks, Johan. I tried this and now it does compile, but it crashes with an Access Violation in the debug build. In the optimized build it seems to be working, though.

September 11, 2017

Re: SIMD under LDC

Posted by Igor
in reply to Igor

On Thursday, 7 September 2017 at 16:45:40 UTC, Igor wrote:
> Thanks, Johan. I tried this and now it does compile, but it crashes with an Access Violation in the debug build. In the optimized build it seems to be working, though.

I will try to reproduce this in a minimal project and open an LDC bug if successful.

In the meantime can anyone tell me how to add an attribute to a function only if something is defined, since this doesn't work:

version(USE_SIMD_WITH_LDC) {
  import ldc.attributes;
  @target("ssse3")
} void funcThatUsesSIMD() {
  ...
  version(LDC) {
    import ldc.gccbuiltins_x86;
    c = __builtin_ia32_pshufb128(c, *simdMasks);
  } else {
    c = __simd(XMM.PSHUFB, c, *simdMasks);
  }
  ...
}
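
One way this is sometimes handled (a sketch, not something confirmed in this thread): a version block has to contain complete declarations, so instead of wrapping the attribute itself, declare a do-nothing `target` struct whenever the LDC module is not imported. Then `@target("ssse3")` compiles in both configurations. The fallback struct mirrors what I assume `ldc.attributes.target` looks like, and `addOne` is a placeholder standing in for funcThatUsesSIMD.

```d
version (USE_SIMD_WITH_LDC)
{
    import ldc.attributes : target;
}
else
{
    // Inert stand-in (assumption): as an ordinary user-defined attribute
    // it has no effect on code generation.
    struct target { string specifier; }
}

@target("ssse3")
int addOne(int x)   // placeholder body; the real function would do the SIMD work
{
    return x + 1;
}
```

With this in place the function body can keep its own `version(LDC) ... else ...` split for choosing between the builtin and `__simd`.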
