Thread overview
SIMD-specialized overloads of Phobos algorithms
Per Nordlöw (Jun 28, 2019)
kinke (Jun 28, 2019)
Per Nordlöw (Jun 28, 2019)
Nicholas Wilson (Jun 28, 2019)
Johan Engelen (Jun 30, 2019)
Per Nordlöw (Jul 01, 2019)
a11e99z (Jun 30, 2019)
9il (Jul 01, 2019)
June 28, 2019
According to

http://0x80.pl/notesen/2018-10-03-simd-index-of-min.html

SIMD-tuning a Phobos function such as

    std.algorithm.searching.minIndex

for an `int[]` haystack on AVX512f leads to a speedup of 15x.

Should such specializations be added to Phobos, or is such an optimization only possible for LDC or GCC but not for DMD?

Further, when will compilers such as LDC and GDC be able to perform such vectorizations automatically? Will the GCC and Clang compiler setting `-march=native` also play a role for LDC in the future? Currently (in LDC 1.16.0) the setting `-march=native` is not allowed.
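For illustration, here is a rough sketch, not actual Phobos code, of what such a specialization could look like; the function name, the blocking factor of 8 and the AVX2 gate are all assumptions for this example. It keeps eight independent running minima so that an optimizing backend can map them onto one vector register, and otherwise falls back to the generic `minIndex`:

    import std.algorithm.searching : minIndex;

    version (LDC)
        enum bool haveAVX2 = __traits(targetHasFeature, "avx2"); // LDC-specific trait
    else
        enum bool haveAVX2 = false;

    /// Index of the first minimum element, or -1 for an empty slice.
    ptrdiff_t fastMinIndex(const(int)[] a)
    {
        if (a.length == 0)
            return -1;

        static if (haveAVX2)
        {
            enum lanes = 8; // assumed blocking factor: one 256-bit register of ints
            if (a.length >= lanes)
            {
                int[lanes] minVal;
                size_t[lanes] minIdx;
                foreach (lane; 0 .. lanes)
                {
                    minVal[lane] = a[lane];
                    minIdx[lane] = lane;
                }
                size_t i = lanes;
                // Main blocked loop: each lane tracks its own running minimum.
                for (; i + lanes <= a.length; i += lanes)
                    foreach (lane; 0 .. lanes)
                        if (a[i + lane] < minVal[lane])
                        {
                            minVal[lane] = a[i + lane];
                            minIdx[lane] = i + lane;
                        }
                // Cross-lane reduction, preferring the earliest index on ties.
                size_t best = minIdx[0];
                int bestVal = minVal[0];
                foreach (lane; 1 .. lanes)
                    if (minVal[lane] < bestVal
                        || (minVal[lane] == bestVal && minIdx[lane] < best))
                    {
                        bestVal = minVal[lane];
                        best = minIdx[lane];
                    }
                // Scalar tail.
                foreach (j; i .. a.length)
                    if (a[j] < bestVal)
                    {
                        bestVal = a[j];
                        best = j;
                    }
                return best;
            }
        }
        return a.minIndex; // generic Phobos fallback
    }

    unittest
    {
        assert(fastMinIndex([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]) == 1);
        int[] empty;
        assert(fastMinIndex(empty) == -1);
    }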
June 28, 2019
On Friday, 28 June 2019 at 15:49:34 UTC, Per Nordlöw wrote:
> Will the GCC and Clang compiler setting `-march=native` also play a role for LDC in the future? Currently (in LDC 1.16.0) the setting `-march=native` is not allowed.

It's `-mcpu=native`.
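For example, an LDC invocation roughly equivalent to GCC/Clang's `-march=native` (the file name is just a placeholder):

    ldc2 -O -mcpu=native app.d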
June 28, 2019
On Friday, 28 June 2019 at 16:30:31 UTC, kinke wrote:
> It's `-mcpu=native`.

Can/Could the value of this switch be detected at compile-time?
June 28, 2019
On Friday, 28 June 2019 at 19:53:10 UTC, Per Nordlöw wrote:
> On Friday, 28 June 2019 at 16:30:31 UTC, kinke wrote:
>> It's `-mcpu=native`.
>
> Can/Could the value of this switch be detected at compile-time?

Well, from the point of view of the code you usually don't really care, because it's just like passing a higher value of n to `-On`.

It's probably already implemented as an LDC-specific __trait; if it isn't, it shouldn't be too hard to add to __traits(getTargetInfo). (The optimiser values that -mcpu=native sets will probably be made available through getTargetInfo at some point, though.)
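For reference, a minimal illustration of what `__traits(getTargetInfo)` already exposes; the keys shown come from the language spec, and any CPU-feature keys reflecting -mcpu=native would be an addition:

    // Compile-time queries of target information (keys from the D spec).
    pragma(msg, __traits(getTargetInfo, "objectFormat")); // e.g. "elf"
    pragma(msg, __traits(getTargetInfo, "floatAbi"));     // e.g. "hard"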
June 30, 2019
On Friday, 28 June 2019 at 22:53:45 UTC, Nicholas Wilson wrote:
> On Friday, 28 June 2019 at 19:53:10 UTC, Per Nordlöw wrote:
>> On Friday, 28 June 2019 at 16:30:31 UTC, kinke wrote:
>>> It's `-mcpu=native`.
>>
>> Can/Could the value of this switch be detected at compile-time?
>
> It's probably already implemented as an LDC-specific __trait,

Indeed:
https://wiki.dlang.org/LDC-specific_language_changes#targetCPU

-Johan
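A one-line sketch of that trait in use, guarded so that other compilers still accept the file:

    version (LDC)
        pragma(msg, "Compiling for CPU: " ~ __traits(targetCPU));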

June 30, 2019
On Friday, 28 June 2019 at 19:53:10 UTC, Per Nordlöw wrote:
> On Friday, 28 June 2019 at 16:30:31 UTC, kinke wrote:
>> It's `-mcpu=native`.
>
> Can/Could the value of this switch be detected at compile-time?

Dynamic compilation may be useful too:
https://forum.dlang.org/post/bskpxhrqyfkvaqzoospx@forum.dlang.org
July 01, 2019
On Sunday, 30 June 2019 at 15:40:07 UTC, Johan Engelen wrote:
> Indeed:
> https://wiki.dlang.org/LDC-specific_language_changes#targetCPU
>
> -Johan

Cool.

Even better,

https://wiki.dlang.org/LDC-specific_language_changes#targetHasFeature

is _exactly_ what I want! :)

Still, the question remains to be answered:

    Where should these CPU-specialized overloads of Phobos algorithms be placed?
July 01, 2019
On Friday, 28 June 2019 at 15:49:34 UTC, Per Nordlöw wrote:
> According to
>
> http://0x80.pl/notesen/2018-10-03-simd-index-of-min.html
>
> SIMD-tuning a Phobos function such as
>
>     std.algorithm.searching.minIndex
>
> for an `int[]` haystack on AVX512f leads to a speedup of 15x.
>
> Should such specializations be added to Phobos, or is such an optimization only possible for LDC or GCC but not for DMD?

Specializations are welcome for mir-algorithm.

http://mir-algorithm.libmir.org/mir_algorithm_iteration.html
July 04, 2019
On 7/1/19 11:52 AM, Per Nordlöw wrote:
> On Sunday, 30 June 2019 at 15:40:07 UTC, Johan Engelen wrote:
>> Indeed:
>> https://wiki.dlang.org/LDC-specific_language_changes#targetCPU
>>
>> -Johan
> 
> Cool.
> 
> Even better,
> 
> https://wiki.dlang.org/LDC-specific_language_changes#targetHasFeature
> 
> is _exactly_ what I want! :)
> 
> Still, the question remains to be answered:
> 
>      Where should these CPU-specialized overloads of Phobos algorithms be placed?

There are several schools of thought. A simple way to ease into it is to place specializations with the algorithms. That's transparent to coders and backward compatible. The decision to create visible, user-selectable versions can thus be postponed.
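For instance, a minimal sketch of that approach; the names are illustrative and not actual Phobos code. The narrower overload sits next to the generic one and is picked by ordinary overload resolution for `int[]`, so calling code stays unchanged:

    import std.algorithm.searching : minIndex;

    // Generic path: forwards to the existing Phobos algorithm.
    ptrdiff_t myMinIndex(Range)(Range r)
    {
        return r.minIndex;
    }

    // Narrower overload, selected automatically for int[]; this is where
    // a hand-vectorized kernel would live.
    ptrdiff_t myMinIndex()(int[] a)
    {
        return a.minIndex; // placeholder for the SIMD kernel
    }

    unittest
    {
        assert(myMinIndex([2.0, 0.5, 3.0]) == 1); // generic instance
        assert(myMinIndex([2, 0, 3]) == 1);       // int[]-specialized overload
    }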