[Issue 23641] core.simd.int4 multiplication

Jan 19, 2023

ponce

Jan 19, 2023

ponce

Jan 19, 2023

Iain Buclaw

Jan 29, 2023

Walter Bright

https://issues.dlang.org/show_bug.cgi?id=23641

ponce <aliloko@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |backend, SIMD, spec

--

January 19, 2023


Posted by Iain Buclaw


https://issues.dlang.org/show_bug.cgi?id=23641

Iain Buclaw <ibuclaw@gdcproject.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ibuclaw@gdcproject.org

--- Comment #2 from Iain Buclaw <ibuclaw@gdcproject.org> ---
(In reply to ponce from comment #0)
> LDC, GDC and DMD implement int4 differently when it comes to multiplication.
> 

With DMD, you need to explicitly pass -mcpu=avx when compiling.  It uses a strict gate at compile-time to determine whether or not the expression would map to a single opcode in the dmd backend for the given type mode.

GDC and LDC ignore this gate, even though the information is there and can be queried from GCC or LLVM respectively, and permissively allow the operation. This means that when the expression is passed down to the backend, the vector op may be split into narrower modes if the target being compiled for doesn't have an available opcode.

This behaviour is justified because, strictly speaking, we don't know whether the optimizer might rewrite the expression in such a way that there *is* a supported opcode.

For example: `a / b` has no vector op, but `a >> b` does.

https://d.godbolt.org/z/vrn77GG9f

(FYI, in gdc-13, `-Wvector-operation-performance` will be turned on by default so you'll at least get a non-blocking warning about expressions that have been expanded at narrower modes).
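
To see the difference between the strict gate and the permissive approach on a given compiler and set of flags, the following is a minimal sketch that probes at compile time whether three `int4` expressions are accepted. It assumes a target where `core.simd.int4` is available; `Int4OpSupport` and `probeInt4Ops` are hypothetical names introduced only for this illustration. The results depend on the compiler and on switches such as `-mcpu=avx`, so no particular output is implied.

```d
import core.simd;

/// Hypothetical helper type: records which int4 operations the
/// compiler front end accepted for the current target and flags.
struct Int4OpSupport
{
    bool mul; // a * b
    bool div; // a / b
    bool shr; // a >> b
}

/// Probes each expression with __traits(compiles, ...). Nothing is
/// executed; the check only asks whether the expression type-checks.
Int4OpSupport probeInt4Ops()
{
    int4 a, b;
    return Int4OpSupport(
        __traits(compiles, a * b),
        __traits(compiles, a / b),
        __traits(compiles, a >> b));
}
```

Note that a `true` result on GDC or LDC only says the expression was accepted, not that it maps to a single vector instruction, since the backend may still expand it at narrower modes.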

--

https://issues.dlang.org/show_bug.cgi?id=23641

Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |bugzilla@digitalmars.com
         Resolution|---                         |INVALID

--- Comment #3 from Walter Bright <bugzilla@digitalmars.com> ---
This behavior of DMD is as designed. (As mentioned here, it will work if the -mcpu=avx switch is used.)

Workarounds can be much much slower than the native instructions. The user may not realize that a slow workaround is happening. By notifying the user that the native instruction for it does not exist, the user can then deliberately choose the workaround that works best for his particular application. In particular, the user may not actually need the full capability of the native instruction, so using a full semantic workaround is a pessimization. Or a different algorithm can be selected that does not require the missing native instruction.

This behavior comes at the request of Manu Evans, who spends a lot of time coding high performance vector code.

GDC and LDC have a different philosophy about this, which is their prerogative.

Therefore, I'm going to mark this as INVALID, as the behavior is deliberate and not a bug.

--
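
As an illustration of choosing a workaround deliberately, here is a minimal sketch of an `int4` multiply that uses the native vector expression when the compiler accepts it and otherwise falls back to an explicit element-wise loop. The function name `mulInt4` is hypothetical, and the element-wise fallback is just one possible workaround, not the one DMD, GDC or LDC would necessarily generate. It assumes a target where `core.simd.int4` exists.

```d
import core.simd;

/// Hypothetical helper: multiplies two int4 vectors lane by lane.
int4 mulInt4(int4 a, int4 b)
{
    // Compile-time check: is `a * b` accepted for this target and flags?
    static if (__traits(compiles, a * b))
    {
        return a * b; // accepted by the compiler as a vector expression
    }
    else
    {
        // Explicit scalar fallback, visible in the source so that the
        // cost is not hidden from whoever maintains this code.
        int4 r;
        foreach (i; 0 .. 4)
            r.array[i] = a.array[i] * b.array[i];
        return r;
    }
}
```

A caller that only needs, say, the low 16 bits of each product could replace the fallback branch with something cheaper, which is the kind of application-specific choice the comment above describes.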
