Thread overview
SIMD c = a op b
Jun 18, 2023
Cecil Ward
Jun 18, 2023
Cecil Ward
Jun 19, 2023
Guillaume Piolat
June 18, 2023
Is it true that this doesn’t work (in either branch)?

float4 a,b;
static if (__traits(compiles, a/b))
    c = a / b;
else
    c[] = a[] / b[];

I tried it with 4 x 64-bit ulongs in a 256-bit vector instead.
Hoping I have done things correctly, I got an error message about requiring a destination variable, as in c = a op b, when I tried simply "return a / b;". In the else branch, I got a type conversion error. Is that because, in the else case, a[] is an array of 256-bit vectors rather than an array of ulongs?
June 18, 2023
On Sunday, 18 June 2023 at 04:54:08 UTC, Cecil Ward wrote:
> Is it true that this doesn’t work (in either branch)?
>
> float4 a,b;
> static if (__traits(compiles, a/b))
>     c = a / b;
> else
>     c[] = a[] / b[];
>
> I tried it with 4 x 64-bit ulongs in a 256-bit vector instead.
> Hoping I have done things correctly, I got an error message about requiring a destination variable, as in c = a op b, when I tried simply "return a / b;". In the else branch, I got a type conversion error. Is that because, in the else case, a[] is an array of 256-bit vectors rather than an array of ulongs?

Correction: I should have written ‘always work’. I just copied the example straight from the language documentation for SIMD and adapted it to use ulongs and a wider vector.

I was using GDC.
June 19, 2023
On Sunday, 18 June 2023 at 05:01:16 UTC, Cecil Ward wrote:
> On Sunday, 18 June 2023 at 04:54:08 UTC, Cecil Ward wrote:
>> Is it true that this doesn’t always work (in either branch)?
>>
>> float4 a,b;
>> static if (__traits(compiles, a/b))
>>     c = a / b;
>> else
>>     c[] = a[] / b[];
>>

It's because SIMD stuff doesn't always work that intel-intrinsics was created. It insulates you from the compiler underneath.


import inteli.emmintrin;

void main()
{
    float4 a, b, c;
    c = a / b;            // _always_ works
    c = _mm_div_ps(a, b); // _always_ works
}

Sure, in some cases it may emulate those vectors, but for vectors of float that only happens with DMD -m32. It relies on the excellent __vector work done a long time ago, and supplements it.

For 32-byte vectors such as __vector(float[8]), you will have trouble on GDC when -mavx isn't there, or with DMD.
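One way to cope with this is to test at compile time whether the 32-byte vector type exists at all. The sketch below is only an illustration: `hasFloat8` and `divide8` are hypothetical names, not part of core.simd or intel-intrinsics, and it assumes that `is(__vector(float[8]))` is false whenever the compiler or the target flags reject the type, and that `/` is accepted on the type whenever it exists.

```d
import core.simd;

// True only when the compiler and target flags accept a 32-byte float vector.
enum bool hasFloat8 = is(__vector(float[8]));

/// Hypothetical helper: divides eight floats, using a vector when possible.
float[8] divide8(float[8] a, float[8] b)
{
    static if (hasFloat8)
    {
        // Copy the inputs into vectors one lane at a time via `.array`.
        __vector(float[8]) va, vb;
        foreach (i; 0 .. 8)
        {
            va.array[i] = a[i];
            vb.array[i] = b[i];
        }
        __vector(float[8]) vc = va / vb;
        return vc.array;
    }
    else
    {
        // No 32-byte vectors here: use an ordinary array operation.
        float[8] c;
        c[] = a[] / b[];
        return c;
    }
}

void main()
{
    float[8] a = [2, 4, 6, 8, 10, 12, 14, 16];
    float[8] b = 2; // every element is 2
    assert(divide8(a, b) == [1.0f, 2, 3, 4, 5, 6, 7, 8]);
}
```

Both branches return the same result, so callers don't need to know which one was compiled in.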

Do you think the builtin __vector supports the same operations across the compilers? The answer is "it's getting there". In the meantime, using intel-intrinsics will lower your exposure to compiler woes.
If you want to use DMD with -O -inline, you should also expect many more problems unless you put in extra work to get SIMD.
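For readers who stay on core.simd, the `static if` pattern from the original question can be wrapped in a small template, so the fallback no longer depends on slicing the vector. This is a minimal sketch, not a tested recipe for every compiler: `divide` is a hypothetical helper, and it assumes a target where core.simd defines `float4` and where the vector's `.array` property can be indexed and assigned per element.

```d
import core.simd;

/// Hypothetical helper (not from core.simd): divides two vectors of type V.
V divide(V)(V a, V b)
{
    static if (__traits(compiles, a / b))
    {
        // The compiler accepts the vector operator for this type and target.
        return a / b;
    }
    else
    {
        // Fall back to scalar division, one lane at a time, via `.array`.
        V c;
        foreach (i; 0 .. a.array.length)
            c.array[i] = a.array[i] / b.array[i];
        return c;
    }
}

void main()
{
    float4 a = [8.0f, 6.0f, 4.0f, 2.0f];
    float4 b = [2.0f, 2.0f, 2.0f, 2.0f];
    float4 c = divide(a, b);
    assert(c.array == [4.0f, 3.0f, 2.0f, 1.0f]);
}
```

Because the function returns a value rather than assigning to an outer `c`, it also sidesteps the "destination variable" style of the documentation example. Whether the first branch is taken for integer vectors such as 4 x ulong depends on the compiler and flags.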