auto vectorization observations - D Programming Language Discussion Forum


June 08, 2022

auto vectorization observations

Posted by Bruce Carneal

Auto vectorization (autovec) can yield significant performance improvements, but this back-end technology may struggle with some very simple forms, leaving you with lackluster performance. When that happens in performance-critical code, D's __vector types will come in handy.

ldc and gdc differ in their autovec capabilities with gdc coming out ahead in at least one important area: dealing with conditionals.

As an example, gdc is able to vectorize the following for both ARM SVE and x86-SIMD architectures while ldc, per my godbolt testing at least, can not.

alias T = ubyte; // 16 wide (128 bit HW) to 64 wide (512 bit HW)
alias CT = const(T);

void choose(size_t n, CT* src, T threshold, CT* a, CT* b, T* dst)
{
    foreach(i; 0 .. n)
        dst[i] = src[i] < threshold ? a[i] : b[i];
}

You can handle conditionals manually in the __vector world, but it's tedious and error-prone, so kudos to Iain and the gcc crew.
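For readers who have not written such a select by hand, here is a minimal sketch of the mask-and-blend idiom that a manual __vector version typically relies on. It is written in scalar form so that it builds with any D compiler. `chooseMasked` is a hypothetical name, not something from the thread, and whether a given compiler vectorizes this form any better than the ternary was not tested here.

```d
alias T = ubyte;
alias CT = const(T);

// Hypothetical branch-free variant of choose(): build an all-ones or
// all-zeros mask from the comparison, then blend a and b with it.
void chooseMasked(size_t n, CT* src, T threshold, CT* a, CT* b, T* dst)
{
    foreach (i; 0 .. n)
    {
        // 0xFF when src[i] < threshold, 0x00 otherwise
        const T mask = cast(T)(-cast(int)(src[i] < threshold));
        // take a[i] where the mask is set, b[i] where it is clear
        dst[i] = cast(T)((a[i] & mask) | (b[i] & ~cast(int)mask));
    }
}
```

Called with the same arguments as `choose`, it should fill `dst` with the same values. A hand-written __vector version does the same two steps per vector register instead of per element, which is where the tedium comes from.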

Additional observations wrt D and auto vectorization, good and bad, are welcome.

June 08, 2022

Re: auto vectorization observations

Posted by Guillaume Piolat
in reply to Bruce Carneal

On Wednesday, 8 June 2022 at 18:41:44 UTC, Bruce Carneal wrote:

> Additional observations wrt D and auto vectorization, good and bad, are welcome.

I also failed to get it autovectorized in LDC. Using @restrict, going to short, or anything else I tried doesn't seem to work, which is strange.

June 09, 2022

Re: auto vectorization observations

Posted by Siarhei Siamashka
in reply to Bruce Carneal

On Wednesday, 8 June 2022 at 18:41:44 UTC, Bruce Carneal wrote:

> As an example, gdc is able to vectorize the following for both ARM SVE and x86-SIMD architectures while ldc, per my godbolt testing at least, can not.

I was unable to confirm this: https://d.godbolt.org/z/Y9fEvn83e (neither GDC nor LDC can vectorize it). Could you please post a link to your godbolt results with the right compiler versions and optimization options?

June 09, 2022

Re: auto vectorization observations

Posted by Bruce Carneal
in reply to Siarhei Siamashka

On Thursday, 9 June 2022 at 14:28:31 UTC, Siarhei Siamashka wrote:

> On Wednesday, 8 June 2022 at 18:41:44 UTC, Bruce Carneal wrote:
>
> > As an example, gdc is able to vectorize the following for both ARM SVE and x86-SIMD architectures while ldc, per my godbolt testing at least, can not.
>
> I was unable to confirm this: https://d.godbolt.org/z/Y9fEvn83e (neither GDC nor LDC can vectorize it). Could you please post a link to your godbolt results with the right compiler versions and optimization options?

https://godbolt.org/z/1exqWT49c

The above is a link to a gdc/ldc godbolt comparison with x86-64-v4 targets. I'm not sure which subset of the v4 capabilities is required for gdc to vectorize the code. It does not vectorize for v3, so a very recent x86 is needed in any event.

The ARM story is similar: SVE is very recent. Note: I think I had to go to C-equivalent code to get a godbolt-visible ARM SVE compiler.

Auto vectorization could be the future, but if you're targeting non-cutting-edge HW, SIMT is a better bet if available, and __vector if not.

June 09, 2022

Re: auto vectorization observations

Posted by user1234
in reply to Bruce Carneal

On Thursday, 9 June 2022 at 20:32:44 UTC, Bruce Carneal wrote:

> https://godbolt.org/z/1exqWT49c
>
> The above is a link to a gdc/ldc godbolt comparison with x86-64-v4 targets. I'm not sure which subset of the v4 capabilities is required for gdc to vectorize the code. It does not vectorize for v3, so a very recent x86 is needed in any event.

Thanks for the clarification; I wondered the same as Siarhei.
The gdc options "-O3 -mavx512bw" produce the same output as yours.
Unsurprisingly, LDC does not do better with -mattr=+avx512bw.

June 10, 2022

Re: auto vectorization observations

Posted by Bruce Carneal
in reply to user1234

On Thursday, 9 June 2022 at 23:35:11 UTC, user1234 wrote:

> Thanks for the clarification; I wondered the same as Siarhei.
> The gdc options "-O3 -mavx512bw" produce the same output as yours.
> Unsurprisingly, LDC does not do better with -mattr=+avx512bw.

Turns out gdc can vectorize it with the much more widely available x86-64-v3 target if your data elements are 4 bytes or larger. Unfortunately ldc still can not.

Your avx512bw discovery prompted me to give the larger types a try. Thanks for digging into it.

Here's the link:
https://godbolt.org/z/ajjjj8vfY
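
The godbolt source is not reproduced in the thread. As a rough sketch of what the wider-element variant could look like, assuming the only change is the element type, here is a templated form of the original function. `chooseT`, `chooseUint` and `chooseUbyte` are hypothetical names, and `uint` is just one example of a 4-byte type.

```d
// Hypothetical templated variant of choose(), so the element type
// (ubyte, uint, ...) can be varied when comparing compilers and targets.
void chooseT(T)(size_t n, const(T)* src, T threshold,
                const(T)* a, const(T)* b, T* dst)
{
    foreach (i; 0 .. n)
        dst[i] = src[i] < threshold ? a[i] : b[i];
}

// Concrete instantiations for a 4-byte and a 1-byte element type.
alias chooseUint = chooseT!uint;   // the size reported to vectorize with x86-64-v3 in gdc
alias chooseUbyte = chooseT!ubyte; // the original 1-byte case
```

Keeping the loop body identical and changing only `T` makes it easier to attribute any difference in the generated code to element width rather than to the source form.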
