intel-intrinsics v1.0.0
February 06, 2019
"intel-intrinsics" is a DUB package for people interested in x86 performance who want neither to write assembly nor an LDC-specific snippet... and still want the fastest possible code.

Available through DUB: http://code.dlang.org/packages/intel-intrinsics


*** Features of v1.0.0:

- All intrinsics in this list: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=MMX,SSE,SSE2 (reusing the existing Intel documentation and syntax)

- Write the same code for both DMD and LDC, supporting the last 6 versions of each. (Note that debug performance may suffer a lot when inlining is not enabled.)

- Use operators on SIMD vectors as if core.simd were implemented on DMD 32-bit

- Introduces int2 and float2 because short SIMD vectors are useful

- About 6000 lines of code (for now; more to come)

- Bonus: approximate pow/exp/log that compute 4 approximate results at once.
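For the curious, here is a minimal sketch of what code using the package looks like (the `inteli.xmmintrin` module path and the `.array` lane accessor are assumptions based on the package layout; check the README for your version):

```d
// Minimal sketch: adding two 4-float vectors with intel-intrinsics.
// Assumes the SSE module is importable as inteli.xmmintrin.
import inteli.xmmintrin;

void main()
{
    __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f); // lanes [1, 2, 3, 4]
    __m128 b = _mm_set1_ps(10.0f);                  // lanes [10, 10, 10, 10]
    __m128 c = _mm_add_ps(a, b);                    // lanes [11, 12, 13, 14]
    assert(c.array[0] == 11.0f);
    assert(c.array[3] == 14.0f);
}
```

The same source compiles on DMD (via the emulation layer) and on LDC (via real intrinsics), which is the whole point of the package.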


<future>
The long-term goal for this library is to be _only about semantics_, and not about particular codegen(!). LLVM IR is portable, so forcing a particular instruction undoes that portability work. **This can seem odd** for an "intrinsics" library, but this way the exact codegen options can be chosen by the library user, and in theory most intrinsics can gracefully degrade to portable IR.

In the future, "magic" LLVM intrinsics will only be used when building for x86, but I think all of it can become portable rather than x86-specific. Besides, there is a trend in LLVM to remove magic intrinsics once they are expressible in IR alone.
</future>


tl;dr you can use "intel-intrinsics" today and get near-optimal code with LDC, without code duplication. You may come across early bugs too.
http://code.dlang.org/packages/intel-intrinsics

(note: it's important to benchmark against vanilla D code or array ops too; in some cases the vanilla code wins)
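As a concrete example of the "vanilla" contender: D's built-in array operations already express elementwise work without any intrinsics, and LDC often auto-vectorizes them well (a sketch; `scaleAdd` is a hypothetical name, not part of the package):

```d
// Sketch: a "vanilla" D baseline to benchmark intrinsics code against.
// Array operations like this are often auto-vectorized by LDC with -O.
void scaleAdd(float[] dst, const(float)[] a, const(float)[] b, float k)
{
    dst[] = a[] * k + b[]; // elementwise: dst[i] = a[i] * k + b[i]
}

void main()
{
    float[4] a = [1, 2, 3, 4];
    float[4] b = [10, 10, 10, 10];
    float[4] dst;
    scaleAdd(dst[], a[], b[], 2.0f);
    assert(dst == [12.0f, 14.0f, 16.0f, 18.0f]);
}
```

If a baseline like this matches your hand-written intrinsics, the intrinsics version may not be worth keeping.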
February 06, 2019
On Wednesday, 6 February 2019 at 01:05:29 UTC, Guillaume Piolat wrote:
> "intel-intrinsics" is a DUB package for people interested in x86 performance that want neither to write assembly, nor a LDC-specific snippet... and still have fastest possible code.

Neat. Question: on GitHub it's stated that implicit conversions aren't supported, with this example:

__m128i b = _mm_set1_epi32(42);
__m128 a = b;             // NO, only works in LDC

Couldn't this be solved with something like this?

struct __m128 {
    float4 value;
    alias value this;
    void opAssign(__m128i rhs) {
        value = cast(float4)rhs.value;
    }
}

--
  Simen
February 06, 2019
On Wednesday, 6 February 2019 at 07:41:25 UTC, Simen Kjærås wrote:
>
> struct __m128 {
>     float4 value;
>     alias value this;
>     void opAssign(__m128i rhs) {
>         value = cast(float4)rhs.value;
>     }
> }
>
> --
>   Simen

The problem is that when you emulate core.simd (which DMD 32-bit on Windows requires, if you want super-fast OPTLINK build times), you have no way to provide user-defined implicit conversions.
The compiler's magic vector types (float4 / int4 / short8 / long2 / byte16) are all implicitly convertible to each other, but I don't think we can replicate that.
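For completeness, the explicit route that works on both compilers is the cast/convert intrinsics from the Intel API itself (a sketch; assumes `_mm_castsi128_ps` and `_mm_cvtepi32_ps` live in the package's SSE2 module, `inteli.emmintrin`, and that lanes are readable via `.array`):

```d
// Sketch: explicit int <-> float vector conversion, no implicit casts needed.
import inteli.emmintrin; // SSE2 module (assumed path)

void main()
{
    __m128i b = _mm_set1_epi32(42);
    __m128 bits = _mm_castsi128_ps(b); // reinterpret the 128 bits, no value change
    __m128 vals = _mm_cvtepi32_ps(b);  // convert values: 42 -> 42.0f per lane
    assert(vals.array[0] == 42.0f);
    assert(vals.array[3] == 42.0f);
}
```

Being explicit about reinterpretation vs. value conversion also avoids a classic intrinsics bug, so the restriction is arguably not a big loss.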
February 08, 2019
On Wednesday, 6 February 2019 at 01:05:29 UTC, Guillaume Piolat wrote:
> "intel-intrinsics" is a DUB package for people interested in x86 performance that want neither to write assembly, nor a LDC-specific snippet... and still have fastest possible code.
>
> Available through DUB: http://code.dlang.org/packages/intel-intrinsics

Big thanks for this, it's been a massive help for me.
cheers!
February 08, 2019
On Friday, 8 February 2019 at 12:22:14 UTC, NaN wrote:
> On Wednesday, 6 February 2019 at 01:05:29 UTC, Guillaume Piolat wrote:
>> "intel-intrinsics" is a DUB package for people interested in x86 performance that want neither to write assembly, nor a LDC-specific snippet... and still have fastest possible code.
>>
>> Available through DUB: http://code.dlang.org/packages/intel-intrinsics
>
> Big thanks for this, it's been a massive help for me.
> cheers!

You're welcome! I'd be interested to know what you are making with it, to feed the "users" list! https://github.com/AuburnSounds/intel-intrinsics/blob/master/README.md
February 08, 2019
On Friday, 8 February 2019 at 12:39:22 UTC, Guillaume Piolat wrote:
> On Friday, 8 February 2019 at 12:22:14 UTC, NaN wrote:
>> On Wednesday, 6 February 2019 at 01:05:29 UTC, Guillaume Piolat wrote:
>>> "intel-intrinsics" is a DUB package for people interested in x86 performance that want neither to write assembly, nor a LDC-specific snippet... and still have fastest possible code.
>>>
>>> Available through DUB: http://code.dlang.org/packages/intel-intrinsics
>>
>> Big thanks for this, it's been a massive help for me.
>> cheers!
>
> You're welcome! I'd be interested to know what you are making with it, to feed the "users" list! https://github.com/AuburnSounds/intel-intrinsics/blob/master/README.md

I'm the guy from #graphics who's writing a software rasterizer. I'll let you know when I put it on GitHub.


February 13, 2019
On Wednesday, 6 February 2019 at 01:05:29 UTC, Guillaume Piolat wrote:
> "intel-intrinsics" is a DUB package for people interested in x86 performance that want neither to write assembly, nor a LDC-specific snippet... and still have fastest possible code.
>
> [...]



This is really cool and I appreciate your efforts!

However (for those who are unaware) there is an alternative way that is (arguably) better;
https://ispc.github.io/index.html

You can write portable vectorized code that can be trivially invoked from D.
February 13, 2019
On Wednesday, 13 February 2019 at 04:57:29 UTC, Crayo List wrote:
> On Wednesday, 6 February 2019 at 01:05:29 UTC, Guillaume Piolat wrote:
>> "intel-intrinsics" is a DUB package for people interested in x86 performance that want neither to write assembly, nor a LDC-specific snippet... and still have fastest possible code.
>>
> This is really cool and I appreciate your efforts!
>
> However (for those who are unaware) there is an alternative way that is (arguably) better;
> https://ispc.github.io/index.html
>
> You can write portable vectorized code that can be trivially invoked from D.

ispc is another compiler in your build, and you'd write in another language, so it's not really the same thing. I haven't used it (nor do I know anyone who does), so I don't really know why it would be any better.
February 13, 2019
On Wednesday, 13 February 2019 at 19:55:05 UTC, Guillaume Piolat wrote:
> On Wednesday, 13 February 2019 at 04:57:29 UTC, Crayo List wrote:
>> On Wednesday, 6 February 2019 at 01:05:29 UTC, Guillaume Piolat wrote:
>>> "intel-intrinsics" is a DUB package for people interested in x86 performance that want neither to write assembly, nor a LDC-specific snippet... and still have fastest possible code.
>>>
>> This is really cool and I appreciate your efforts!
>>
>> However (for those who are unaware) there is an alternative way that is (arguably) better;
>> https://ispc.github.io/index.html
>>
>> You can write portable vectorized code that can be trivially invoked from D.
>
> ispc is another compiler in your build, and you'd write in another language, so it's not really the same thing.

That's mostly what I said, except that I did not say it's the same thing.
It's an alternative way to produce vectorized code in a deterministic and portable way.
This is NOT an auto-vectorizing compiler!

> I haven't used it (nor do I know anyone who do) so don't really know why it would be any better
And that's precisely why I posted here: so that people interested in vectorizing their code in a portable way are aware there is another (arguably) better way.
I highly recommend browsing through the walkthrough example:
https://ispc.github.io/example.html

For example, I have code that I can run on my Xeon Phi 7250 Knights Landing CPU by compiling with --target=avx512knl-i32x16, and then run the exact same code, with no changes at all, on my i7-5820k by compiling with --target=avx2-i32x8. Each time I get optimal code. This is not something you can easily do with intrinsics!


February 14, 2019
On Wednesday, 13 February 2019 at 23:26:48 UTC, Crayo List wrote:
> On Wednesday, 13 February 2019 at 19:55:05 UTC, Guillaume Piolat wrote:
>> On Wednesday, 13 February 2019 at 04:57:29 UTC, Crayo List wrote:
>>> However (for those who are unaware) there is an alternative way that is (arguably) better;
>>> https://ispc.github.io/index.html
>>>
>>> You can write portable vectorized code that can be trivially invoked from D.
>>
>> ispc is another compiler in your build, and you'd write in another language, so it's not really the same thing.
>
> That's mostly what I said, except that I did not say it's the same thing.
> It's an alternative way to produce vectorized code in a deterministic and portable way.

While you didn't say it was the same thing, you did say it's an alternative that 'is arguably better'. Adding another compiler using another language is arguably worse, so there are tradeoffs here, which Guillaume may have felt were undercommunicated (I know I did).

That said, it *is* a good alternative in some cases, and may well be worth pointing out in a thread like this.

--
  Simen