January 19, 2023
https://issues.dlang.org/show_bug.cgi?id=23641

          Issue ID: 23641
           Summary: core.simd.int4 multiplication
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: minor
          Priority: P1
         Component: dmd
          Assignee: nobody@puremagic.com
          Reporter: aliloko@gmail.com

LDC, GDC and DMD implement int4 differently when it comes to multiplication.

------ test-case.d ---------

import core.simd;

int4 mul_4_ints (int4 a, int4 b)
{
    return a * b; // ok with LDC and GDC, but not DMD
}

----------------------------


An efficient int4 * int4 requires Neon or SSE4.1.
- DMD doesn't implement int4 * int4
- GDC and LDC implement it, using a replacement sequence built on two multiply
instructions. GDC gained that capability at some point.
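
To illustrate why two multiply instructions are enough, here is a scalar model of the usual SSE2-level fallback, where pmuludq computes two unsigned 32x32 -> 64-bit products at a time. This is only a sketch: the function names are hypothetical, and the exact sequence GDC or LDC emits may differ.

```d
// Models one pmuludq product: unsigned 32x32 -> 64-bit multiply.
ulong widenMul(int x, int y)
{
    return cast(ulong) cast(uint) x * cast(uint) y;
}

// Scalar model (hypothetical name) of a lane-wise 32-bit multiply
// assembled from two "pmuludq-like" steps.
int[4] mulloModel(int[4] a, int[4] b)
{
    // First multiply: even lanes 0 and 2.
    ulong p0 = widenMul(a[0], b[0]);
    ulong p2 = widenMul(a[2], b[2]);

    // Second multiply: odd lanes 1 and 3 (shifted into even
    // positions beforehand in the real instruction sequence).
    ulong p1 = widenMul(a[1], b[1]);
    ulong p3 = widenMul(a[3], b[3]);

    // Keep the low 32 bits of each product. They are the same
    // for signed and unsigned multiplication.
    int[4] r = [cast(int) p0, cast(int) p1, cast(int) p2, cast(int) p3];
    return r;
}
```

The low 32 bits of the unsigned product match what a wrapping signed multiply would give, which is why the unsigned instruction can serve for int4.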

In intel-intrinsics, I now tell people to use _mm_mullo_epi32 to stay portable, since it applies the workarounds where needed. Having this operation in core.simd is a bit of a portability trap.
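
As a rough sketch of what such a workaround can look like without any library, the hypothetical helper below uses the native operator when the compiler accepts it and otherwise multiplies lane by lane. It assumes int4 is available on the target (with DMD that means a D_SIMD target such as x86_64), that the unsupported operator is rejected during semantic analysis so __traits(compiles) can detect it, and that .array is writable on the compiler in use. It is not how intel-intrinsics implements _mm_mullo_epi32.

```d
import core.simd;

// Hypothetical helper, not part of core.simd or intel-intrinsics.
int4 mulPortable(int4 a, int4 b)
{
    static if (__traits(compiles, a * b))
    {
        // The compiler supports int4 * int4 directly.
        return a * b;
    }
    else
    {
        // Fallback: multiply each 32-bit lane separately.
        int4 r;
        foreach (i; 0 .. 4)
            r.array[i] = a.array[i] * b.array[i];
        return r;
    }
}
```

The scalar branch is slow, but it keeps the same wrapping result on every compiler.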


Two solutions I could see here:

  - A. remove support from LDC and GDC, since there is no dedicated hardware
support below SSE4.1. The user is then forced to think about portability.

  - B. add support for int4*int4 in DMD, to match the capabilities. Users can
then use core.simd without unknowingly breaking compatibility.

Personally, I have no idea which is best.

--