February 09, 2019
https://issues.dlang.org/show_bug.cgi?id=19663

          Issue ID: 19663
           Summary: On x86_64 the fabs intrinsic should use SSE
           Product: D
           Version: D2
          Hardware: x86_64
                OS: All
            Status: NEW
          Keywords: performance
          Severity: enhancement
          Priority: P1
         Component: dmd
          Assignee: nobody@puremagic.com
          Reporter: b2.temp@gmx.com

Currently, on x86_64, the dmd backend implements the fabs intrinsic with the x87 FPU instruction of the same name, FABS. Since `float` and `double` parameters are passed in SSE registers, as defined by the ABI, they have to travel from those SSE registers to GP registers and only then to the FPU registers. Depending on what is done with the resulting absolute value, it then goes back to a GP register (all of this to clear a single bit!), and back again to an SSE register if the function has to return it, etc.

It would be wiser to use an SSE logical AND with a mask that clears the sign bit. This would be done only for the `float` and `double` types.

Several options exist:
1. generate the mask, then ANDPS/ANDPD with it
2. ANDPS/ANDPD with a constant mask (LDC2 does that, btw)
3. left shift then right shift by one, which drops the sign bit
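
To make the bit manipulation behind options 2 and 3 concrete, here is a minimal sketch in portable C. It is only a scalar analogue working on the integer bit pattern, not the SSE code a backend would emit, and it assumes IEEE 754 binary32/binary64 layouts. The function names (abs_mask_f64, abs_mask_f32, abs_shift_f64) are made up for this illustration and do not come from dmd or LDC2.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Assumption: float and double are IEEE 754 binary32 and binary64. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Analogue of option 2 for double: AND the bit pattern with a constant
   mask that has every bit set except the sign bit (bit 63). */
double abs_mask_f64(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret without aliasing issues */
    bits &= UINT64_C(0x7FFFFFFFFFFFFFFF);    /* clear the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Same idea for float: the sign bit is bit 31. */
float abs_mask_f32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= UINT32_C(0x7FFFFFFF);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Analogue of option 3 for double: shift the sign bit out on the left,
   then shift back with a logical (unsigned) right shift, which refills
   the top bit with zero. */
double abs_shift_f64(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = (bits << 1) >> 1;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

In the SSE version the value never leaves the XMM register: the AND or the pair of shifts is applied directly to the register that holds the argument, which is the whole point of the request.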


Forum discussion: https://forum.dlang.org/post/diljelbvmenuxtaqbuxw@forum.dlang.org

Reference for the possible solutions: https://stackoverflow.com/questions/32408665/fastest-way-to-compute-absolute-value-using-sse
