[Issue 23641] core.simd.int4 multiplication
January 19, 2023
https://issues.dlang.org/show_bug.cgi?id=23641

--- Comment #1 from ponce <aliloko@gmail.com> ---
> requires Neon or SSE4.1 with the

requires Neon or SSE4.1 with the pmulld instruction

--
January 19, 2023
https://issues.dlang.org/show_bug.cgi?id=23641

ponce <aliloko@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |backend, SIMD, spec

--
January 19, 2023
https://issues.dlang.org/show_bug.cgi?id=23641

Iain Buclaw <ibuclaw@gdcproject.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ibuclaw@gdcproject.org

--- Comment #2 from Iain Buclaw <ibuclaw@gdcproject.org> ---
(In reply to ponce from comment #0)
> LDC, GDC and DMD implement int4 differently when it comes to multiplication.
> 

With DMD, you need to explicitly pass -mcpu=avx when compiling.  It uses a strict gate at compile-time to determine whether or not the expression would map to a single opcode in the dmd backend for the given type mode.

GDC and LDC ignore this gate, even though the information is there and can be queried from GCC or LLVM respectively, and simply allow the operation. This means that when the expression is passed down to the backend, the vector op may be split into narrower modes if the target being compiled for has no matching opcode.

This behaviour is justified because, strictly, we don't know whether the optimizer might rewrite the expression in such a way that there *is* a supported opcode.

For example: `a / b` has no vector op, but `a >> b` does.

https://d.godbolt.org/z/vrn77GG9f

(FYI, in gdc-13, `-Wvector-operation-performance` will be turned on by default so you'll at least get a non-blocking warning about expressions that have been expanded at narrower modes).
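
To illustrate the difference described above, here is a minimal sketch of code that builds under both policies. It assumes DMD rejects `a * b` on int4 during semantic analysis when the target has no matching opcode, so that `__traits(compiles, ...)` can detect the case. The function name `mul` is hypothetical, and the scalar fallback is only one possible workaround, not necessarily the fastest.

```d
import core.simd;

// Hypothetical helper: multiply two int4 vectors lane by lane.
int4 mul(int4 a, int4 b)
{
    static if (__traits(compiles, a * b))
    {
        // Accepted by GDC and LDC for any target (possibly lowered to
        // narrower operations), and by DMD when the target has the opcode
        // (e.g. with -mcpu=avx).
        return a * b;
    }
    else
    {
        // Assumed fallback for DMD without the required target:
        // multiply each lane through the .array property.
        int4 r;
        foreach (i; 0 .. 4)
            r.array[i] = a.array[i] * b.array[i];
        return r;
    }
}

void main()
{
    int4 a = [1, 2, 3, 4];
    int4 b = [5, 6, 7, 8];
    int4 c = mul(a, b);
    assert(c.array == [5, 12, 21, 32]);
}
```

With GDC or LDC the first branch is always taken, so whether the multiply maps to a single instruction still depends on the target options passed to the backend.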

--
January 29, 2023
https://issues.dlang.org/show_bug.cgi?id=23641

Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |bugzilla@digitalmars.com
         Resolution|---                         |INVALID

--- Comment #3 from Walter Bright <bugzilla@digitalmars.com> ---
This behavior of DMD is as designed. (As mentioned here, it will work if the -mcpu=avx switch is used.)

Workarounds can be much much slower than the native instructions. The user may not realize that a slow workaround is happening. By notifying the user that the native instruction for it does not exist, the user can then deliberately choose the workaround that works best for his particular application. In particular, the user may not actually need the full capability of the native instruction, so using a full semantic workaround is a pessimization. Or a different algorithm can be selected that does not require the missing native instruction.

This behavior comes at the request of Manu Evans, who spends a lot of time coding high performance vector code.

GDC and LDC have a different philosophy about this, which is their prerogative.

Therefore, I'm going to mark this as INVALID, as the behavior is deliberate and not a bug.

--