[Issue 21041] core.bitop.byteswap(ushort) should use ROL/ROR instead of XCHG
July 12, 2020
https://issues.dlang.org/show_bug.cgi?id=21041

safety0ff.bugz <safety0ff.bugz@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |performance

--
July 12, 2020
https://issues.dlang.org/show_bug.cgi?id=21041

Bruce Carneal <bcarneal11@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bcarneal11@gmail.com

--- Comment #1 from Bruce Carneal <bcarneal11@gmail.com> ---
I didn't find a 'byteswap' in the core.bitop documentation.  There is a bswap but only for uints and ulongs AFAICT.  Regardless, here's a byteswap implementation for discussion:

auto byteswap(ushort x) { return cast(ushort)(x >> 8 | x << 8); }

For the above code ldc at -O or above generates:
  movl %edi, %eax
  rolw $8, %ax
  retq

With ldc you can also get the above sequence using core.bitop.rol!8 explicitly.
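To make the equivalence concrete, here is a minimal sketch that compares the shift-and-OR form with the explicit rotate. The names swapViaShifts and swapViaRol are hypothetical and chosen only for this illustration. The only library call assumed is the compile-time-count overload core.bitop.rol!N. The sketch checks results only and says nothing about the instructions a given compiler emits.

```d
import core.bitop : rol;

// Hypothetical helper names, for illustration only.

/// Swap the two bytes of a 16-bit value with shifts and an OR,
/// as in the one-liner above.
ushort swapViaShifts(ushort x)
{
    return cast(ushort)(x >> 8 | x << 8);
}

/// The same operation written as a rotate left by 8 bits.
ushort swapViaRol(ushort x)
{
    return rol!8(x);
}

unittest
{
    assert(swapViaShifts(0x1234) == 0x3412);
    assert(swapViaRol(0x1234) == 0x3412);

    // Exhaustively confirm that both forms agree for every 16-bit input.
    foreach (uint v; 0 .. 0x1_0000)
        assert(swapViaShifts(cast(ushort) v) == swapViaRol(cast(ushort) v));
}
```

Building with -unittest -main runs the checks. For a 16-bit operand a rotate by 8 gives the same result in either direction, which is why ROL and ROR are interchangeable here.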

Current dmd -O emits 7 instructions to accomplish the rolw in the code body. The code emitted by dmd -O for the explicit call to core.bitop.rol is even worse, which is strange.

So, yes, there's room here for DMD code gen improvement but ldc is right there.

--
July 12, 2020
https://issues.dlang.org/show_bug.cgi?id=21041

--- Comment #2 from safety0ff.bugz <safety0ff.bugz@gmail.com> ---
(In reply to Bruce Carneal from comment #1)
> I didn't find a 'byteswap' in the core.bitop documentation.  There is a bswap but only for uints and ulongs AFAICT.

The intrinsic in question was added in the master branch here: https://github.com/dlang/dmd/pull/11388

Also the 64 bit version is to be added here: https://github.com/dlang/dmd/pull/11408

> For the above code ldc at -O or above generates:
>   movl %edi, %eax
>   rolw $8, %ax
>   retq

I'd expect that, since clang emits the same sequence for C/C++.

--
July 12, 2020
https://issues.dlang.org/show_bug.cgi?id=21041

--- Comment #3 from safety0ff.bugz <safety0ff.bugz@gmail.com> ---
(In reply to Bruce Carneal from comment #1)
> Current dmd -O emits 7 instructions to accomplish the rolw in the code body.

D converts many operations on narrow types to int, which DMD's backend then fails to optimize away when it is possible/advantageous.
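The promotion can be checked at compile time. The following is a small sketch, with widenedSwap as a hypothetical name used only here. It shows that the shift-and-OR expression on a ushort has type int, so the 16-bit result only appears once the value is explicitly narrowed again.

```d
// Shifts on a ushort are carried out on the promoted int operand.
static assert(is(typeof(ushort.init >> 8) == int));
static assert(is(typeof(ushort.init >> 8 | ushort.init << 8) == int));

// Without a cast, the int result of the left shift cannot be stored in a ushort.
static assert(!__traits(compiles, { ushort x; ushort y = x << 8; }));
static assert( __traits(compiles, { ushort x; ushort y = cast(ushort)(x << 8); }));

/// Hypothetical helper: returns the untruncated int intermediate value.
int widenedSwap(ushort x)
{
    return x >> 8 | x << 8;
}

unittest
{
    // The high byte of the input is still present above bit 15 ...
    assert(widenedSwap(0x1234) == 0x123412);
    // ... and only the narrowing cast yields the byte-swapped 16-bit value.
    assert(cast(ushort) widenedSwap(0x1234) == 0x3412);
}
```

A backend that wants to emit a single 16-bit rotate therefore has to recognize that the bits above bit 15 are discarded by the final cast.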

--
July 12, 2020
https://issues.dlang.org/show_bug.cgi?id=21041

--- Comment #4 from safety0ff.bugz <safety0ff.bugz@gmail.com> ---
(In reply to safety0ff.bugz from comment #3)
> (In reply to Bruce Carneal from comment #1)
> > Current dmd -O emits 7 instructions to accomplish the rolw in the code body.
> 
> D converts many operations on narrow types to int, which DMD's backend then fails to optimize away when it is possible/advantageous.

Further investigation: the function cdshift in dmd/backend/cod2.d also converts a rotate by 8 that swaps the upper and lower 8 bits of a word into an XCHG.

--
July 12, 2020
https://issues.dlang.org/show_bug.cgi?id=21041

--- Comment #5 from Bruce Carneal <bcarneal11@gmail.com> ---
(In reply to safety0ff.bugz from comment #3)
> (In reply to Bruce Carneal from comment #1)
> > Current dmd -O emits 7 instructions to accomplish the rolw in the code body.
> 
> D converts many operations on narrow types to int, which DMD's backend then fails to optimize away when it is possible/advantageous.

Yes.  DMD's back end is quick, but the code it generates is not state-of-the-art.

That said, optimizing the DMD code gen for core.bitop rotations seems more useful than a ushort byteswap improvement.  The latter could be implemented as an "inline" of the former.

Recognizing rotation patterns in general, à la LLVM, would be even better, but I'd imagine it is quite a bit of work.  Probably not worth it given current resource constraints (Walter's time).  Lots of big front-end fish to fry.

--
March 21, 2021
https://issues.dlang.org/show_bug.cgi?id=21041

Iain Buclaw <ibuclaw@gdcproject.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |backend
                 CC|                            |ibuclaw@gdcproject.org

--
December 17, 2022
https://issues.dlang.org/show_bug.cgi?id=21041

Iain Buclaw <ibuclaw@gdcproject.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P4

--