February 14, 2019 Re: intel-intrinsics v1.0.0 | ||||
---|---|---|---|---|
| ||||
Posted in reply to Crayo List | On Wednesday, 13 February 2019 at 23:26:48 UTC, Crayo List wrote:
> On Wednesday, 13 February 2019 at 19:55:05 UTC, Guillaume Piolat wrote:
>> On Wednesday, 13 February 2019 at 04:57:29 UTC, Crayo List wrote:
>>> On Wednesday, 6 February 2019 at 01:05:29 UTC, Guillaume Piolat wrote:
>>>> "intel-intrinsics" is a DUB package for people interested in x86 performance that want neither to write assembly, nor a LDC-specific snippet... and still have fastest possible code.
>>>>
>>> This is really cool and I appreciate your efforts!
>>>
>>> However (for those who are unaware) there is an alternative way that is (arguably) better;
>>> https://ispc.github.io/index.html
>>>
>>> You can write portable vectorized code that can be trivially invoked from D.
>>
>> ispc is another compiler in your build, and you'd write in another language, so it's not really the same thing.
>
> That's mostly what I said, except that I did not say it's the same thing.
> It's an alternative way to produce vectorized code in a deterministic and portable way.
> This is NOT an auto-vectorizing compiler!
>
>> I haven't used it (nor do I know anyone who do) so don't really know why it would be any better
> And that's precisely why I posted here; for those people that have interest in vectorizing their code in a portable way to be aware that there is another (arguably) better way.
> I highly recommend browsing through the walkthrough example;
> https://ispc.github.io/example.html
>
> For example, I have code that I can run on my Xeon Phi 7250 Knights Landing CPU by compiling with --target=avx512knl-i32x16, then I can run the exact same code with no change at all on my i7-5820k by compiling with --target=avx2-i32x8. Each time I get optimal code. This is not something you can easily do with intrinsics!
I don't disagree, but ispc sounds more like a host-only OpenCL to me than a replacement or competitor for intel-intrinsics.
Intrinsics are easy: if calling another compiler with another source language might be trivial, then importing a DUB package and using it within the same source code is even more trivial!
I take issue with the claim that Single Program Multiple Data yields much more performance than well-written intrinsics code: when your compiler auto-vectorizes (or you vectorize using SIMD semantics) you _also_ have one instruction for multiple data. The only gain I can see for SPMD would be the use of non-temporal writes, since they are so hard to use effectively in practice.
I also take some issue with "portability": SIMD intrinsics optimize quite deterministically (some instructions get generated even at -O0, since LDC 1.0.0). Also, LLVM IR is portable to ARM, whereas ispc likely never will be, as admitted by its author: https://pharr.org/matt/blog/2018/04/29/ispc-retrospective.html
My interest in AVX-512 is subnormal: it can _slow down_ things on some x86 CPUs: https://gist.github.com/rygorous/32bc3ea8301dba09358fd2c64e02d774
In general the latest instruction sets are increasingly hard to apply, and have lower yield. The newer Intel instruction sets are basically a scam for the performance-minded. Sponsored work on rewriting parts of x265 with AVX-512 yields abnormally low results: https://software.intel.com/en-us/articles/accelerating-x265-with-intel-advanced-vector-extensions-512-intel-avx-512
As to compiling precisely for the host target: we are building B2C software here, so we don't control the host machine. Thankfully the ancient SIMD instruction sets yield most of the value, since a lot of the time memory throughput is the bottleneck.
I can see ispc being more useful when you know the precise model of your target Intel CPU. I would also like to see it compared to Intel's own software OpenCL: it seems it started its life as internal competition.
|
February 14, 2019 Re: intel-intrinsics v1.0.0 | ||||
---|---|---|---|---|
| ||||
Posted in reply to Crayo List | On Wednesday, 13 February 2019 at 23:26:48 UTC, Crayo List wrote:
> And that's precisely why I posted here; for those people that have interest in vectorizing their code in a portable way to be aware that there is another (arguably) better way.
All power to the people that have code that simple. But auto-vectorising in any capacity is the wrong way to do things in my field. An intrinsics library is vital to write highly specialised code.
The tl;dr here is that we *FINALLY* have a minimum-spec for x64 CPUs represented with SSE intrinsics. Instead of whatever core.simd is. That's really important, and talks about auto-vectorisation are really best saved for another thread.
|
February 14, 2019 Re: intel-intrinsics v1.0.0 | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ethan | On Thursday, 14 February 2019 at 16:13:21 UTC, Ethan wrote:
> On Wednesday, 13 February 2019 at 23:26:48 UTC, Crayo List wrote:
>> And that's precisely why I posted here; for those people that have interest in vectorizing their code in a portable way to be aware that there is another (arguably) better way.
>
> All power to the people that have code that simple. But auto-vectorising in any capacity is the wrong way to do things in my field. An intrinsics library is vital to write highly specialised code.
>
> The tl;dr here is that we *FINALLY* have a minimum-spec for x64 CPUs represented with SSE intrinsics. Instead of whatever core.simd is. That's really important, and talks about auto-vectorisation are really best saved for another thread.
Please re-read my post carefully!
|
February 14, 2019 Re: intel-intrinsics v1.0.0 | ||||
---|---|---|---|---|
| ||||
Posted in reply to Crayo List | On Thursday, 14 February 2019 at 21:45:57 UTC, Crayo List wrote:
> On Thursday, 14 February 2019 at 16:13:21 UTC, Ethan wrote:
>> On Wednesday, 13 February 2019 at 23:26:48 UTC, Crayo List wrote:
>>> And that's precisely why I posted here; for those people that have interest in vectorizing their code in a portable way to be aware that there is another (arguably) better way.
>>
>> All power to the people that have code that simple. But auto-vectorising in any capacity is the wrong way to do things in my field. An intrinsics library is vital to write highly specialised code.
>>
>> The tl;dr here is that we *FINALLY* have a minimum-spec for x64 CPUs represented with SSE intrinsics. Instead of whatever core.simd is. That's really important, and talks about auto-vectorisation are really best saved for another thread.
>
> Please re-read my post carefully!
I think ispc is interesting, and a very D-ish thing to have would be an ispc-like compiler at CTFE that outputs LLVM IR (or assembly or intel-intrinsics). That would break the language boundary and allow inlining. Though we would probably need newCTFE for this, as everything interesting seems to need newCTFE :) And it's a gigantic amount of work.
|
February 14, 2019 Re: intel-intrinsics v1.0.0 | ||||
---|---|---|---|---|
| ||||
Posted in reply to Guillaume Piolat | On Thu, Feb 14, 2019 at 10:15:19PM +0000, Guillaume Piolat via Digitalmars-d-announce wrote:
[...]
> I think ispc is interesting, and a very D-ish thing to have would be an ispc-like compiler at CTFE that outputs LLVM IR (or assembly or intel-intrinsics). That would break the language boundary and allows inlining. Though probably we need newCTFE for this, as everything interesting seems to need newCTFE :) And it's a gigantic amount of work.
Much as I love the idea of generating D code at compile-time and look forward to newCTFE, there comes a point when I'd really rather just run the DSL through some kind of preprocessing (i.e., compile with ispc) as part of the build, then link the result to the D code, rather than trying to shoehorn everything into (new)CTFE.
T
--
You have to expect the unexpected. -- RL
|
February 14, 2019 Re: intel-intrinsics v1.0.0 | ||||
---|---|---|---|---|
| ||||
Posted in reply to Crayo List | On Thursday, 14 February 2019 at 21:45:57 UTC, Crayo List wrote:
> Please re-read my post carefully!
Or - even better - take the hint that not every use of SIMD can be expressed in a high level manner.
|
February 15, 2019 Re: intel-intrinsics v1.0.0 | ||||
---|---|---|---|---|
| ||||
Posted in reply to H. S. Teoh | On Thursday, 14 February 2019 at 22:28:46 UTC, H. S. Teoh wrote:
> trying to shoehorn everything into (new)CTFE.
Couldn't help but find a similarity between http://www.dsource.org/projects/mathextra/browser/trunk/blade/BladeDemo.d and ispc
|