SIMD c = a op b

Jun 18, 2023


Is it true that this doesn’t work (in either branch)?

float4 a,b;
static if (__traits(compiles, a/b))
    c = a / b;
else
    c[] = a[] / b[];

I tried it with 4 x 64-bit ulongs in a 256-bit vector instead. Hoping I have done things correctly, I got an error message about requiring a destination variable as in c = a op b where I tried simply "return a / b;" In the else branch, I got a type conversion error. Is that because a[] is an array of 256-bit vectors, in the else case, not an array of ulongs?

On Sunday, 18 June 2023 at 04:54:08 UTC, Cecil Ward wrote:
> Is it true that this doesn’t work (in either branch)?
>
> float4 a,b;
> static if (__traits(compiles, a/b))
>     c = a / b;
> else
>     c[] = a[] / b[];
>
> I tried it with 4 x 64-bit ulongs in a 256-bit vector instead.
> Hoping I have done things correctly, I got an error message about requiring a destination variable as in c = a op b where I tried simply "return a / b;" In the else branch, I got a type conversion error. Is that because a[] is an array of 256-bit vectors, in the else case, not an array of ulongs?

Correction: I should have written ‘always work’. I just copied the example straight from the language documentation for SIMD and adapted it to use ulongs and a wider vector. I was using GDC.
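
For comparison, the slice form used in the else branch is ordinary D array-operation syntax, and it is well-defined on plain static arrays. The following is only a minimal sketch that avoids __vector entirely; the helper name divideLanes is hypothetical, and it says nothing about how any particular compiler treats a[] on a vector type.

```d
// Hypothetical helper: element-wise division on plain static arrays,
// using D's built-in array operations (no __vector involved).
ulong[4] divideLanes(ulong[4] a, ulong[4] b)
{
    ulong[4] c;
    c[] = a[] / b[]; // array operation: c[i] = a[i] / b[i] for each i
    return c;
}

void main()
{
    ulong[4] a = [10, 20, 30, 40];
    ulong[4] b = [2, 4, 5, 8];
    assert(divideLanes(a, b) == [5UL, 5UL, 6UL, 5UL]);
}
```

Whether the compiler vectorizes this loop is up to its optimizer; the sketch only shows that the syntax itself is valid on arrays.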

June 19, 2023

Re: SIMD c = a op b

Posted by Guillaume Piolat
in reply to Cecil Ward

On Sunday, 18 June 2023 at 05:01:16 UTC, Cecil Ward wrote:
> On Sunday, 18 June 2023 at 04:54:08 UTC, Cecil Ward wrote:
>> Is it true that this doesn’t always work (in either branch)?
>>
>> float4 a,b;
>> static if (__traits(compiles, a/b))
>>     c = a / b;
>> else
>>     c[] = a[] / b[];
>>

It's because SIMD stuff doesn't always work that intel-intrinsics was created. It insulates you from the compiler underneath.

import inteli.emmintrin;

void main()
{
    float4 a, b, c;
    c = a / b;            // _always_ works
    c = _mm_div_ps(a, b); // _always_ works
}
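
To make the snippet above concrete, here is a minimal sketch with actual values. It assumes the intel-intrinsics dub package is available; the helper name divide4 is hypothetical and only there for illustration, while _mm_loadu_ps, _mm_div_ps and _mm_storeu_ps are the usual SSE intrinsics as exposed by the library.

```d
import inteli.emmintrin; // assumption: intel-intrinsics is on the import path

// Hypothetical helper: divides four floats lane by lane.
float[4] divide4(float[4] a, float[4] b)
{
    float4 va = _mm_loadu_ps(a.ptr);  // unaligned load of 4 floats
    float4 vb = _mm_loadu_ps(b.ptr);
    float4 vc = _mm_div_ps(va, vb);   // same result as va / vb
    float[4] c;
    _mm_storeu_ps(c.ptr, vc);         // unaligned store back to memory
    return c;
}

void main()
{
    float[4] a = [8.0f, 6.0f, 4.0f, 2.0f];
    float[4] b = [2.0f, 2.0f, 2.0f, 2.0f];
    assert(divide4(a, b) == [4.0f, 3.0f, 2.0f, 1.0f]);
}
```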

Sure, in some cases it may emulate those vectors, but for vectors of float that happens only with DMD -m32. It relies on the excellent __vector work done a long time ago, and supplements it.

For 32-byte vectors such as __vector(float[8]), you will have trouble on GDC when -mavx isn't there, or with DMD.

Do you think the builtin __vector supports the same operations across compilers? The answer is "it's getting there"; in the meantime, using intel-intrinsics will lower your exposure to compiler woes.
If you want to use DMD with -O -inline, you should also expect many more problems unless you put in extra work to get SIMD.
