July 12, 2020 [Issue 21041] core.bitop.byteswap(ushort) should used ROL/ROR instead of XCHG

https://issues.dlang.org/show_bug.cgi?id=21041

safety0ff.bugz <safety0ff.bugz@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |performance
--
July 12, 2020 [Issue 21041] core.bitop.byteswap(ushort) should used ROL/ROR instead of XCHG

https://issues.dlang.org/show_bug.cgi?id=21041

Bruce Carneal <bcarneal11@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bcarneal11@gmail.com

--- Comment #1 from Bruce Carneal <bcarneal11@gmail.com> ---
I didn't find a 'byteswap' in the core.bitop documentation. There is a bswap
but only for uints and ulongs AFAICT.

Regardless, here's a byteswap implementation for discussion:

    auto byteswap(ushort x) { return cast(ushort)(x >> 8 | x << 8); }

For the above code ldc at -O or above generates:

    movl %edi, %eax
    rolw $8, %ax
    retq

With ldc you can also get the above sequence using core.bitop.rol!8
explicitly.

Current dmd -O emits 7 instructions to accomplish the rolw in the code body.
The code emitted by dmd -O for the explicit call to core.bitop.rol is even
worse, which is strange.

So, yes, there's room here for DMD code gen improvement but ldc is right
there.
--
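For readers following along, the two formulations compared in comment #1 can be put side by side. This is only a sketch: the function names byteswapShift and byteswapRol are hypothetical and chosen for illustration, and the instruction sequence a given compiler emits for either one is not guaranteed by anything shown here.

```d
import core.bitop : rol;

// Shift-and-or form from comment #1. The operands are promoted to int,
// so the result is narrowed back to ushort with an explicit cast.
ushort byteswapShift(ushort x)
{
    return cast(ushort)(x >> 8 | x << 8);
}

// Same swap expressed as a rotate by 8 through core.bitop.rol.
// (Hypothetical helper name; rol!8 is the template form with a
// compile-time rotate count.)
ushort byteswapRol(ushort x)
{
    return rol!8(x);
}

unittest
{
    assert(byteswapShift(0x1234) == 0x3412);
    assert(byteswapRol(0x1234) == 0x3412);
    assert(byteswapShift(0xBEEF) == byteswapRol(0xBEEF));
}
```

Both functions should compute the same value for every ushort input; which machine instructions result is up to the compiler and optimization level.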
July 12, 2020 [Issue 21041] core.bitop.byteswap(ushort) should used ROL/ROR instead of XCHG

https://issues.dlang.org/show_bug.cgi?id=21041

--- Comment #2 from safety0ff.bugz <safety0ff.bugz@gmail.com> ---
(In reply to Bruce Carneal from comment #1)
> I didn't find a 'byteswap' in the core.bitop documentation. There is a bswap
> but only for uints and ulongs AFAICT.

The intrinsic in question was added in the master branch here:
https://github.com/dlang/dmd/pull/11388

Also the 64 bit version is to be added here:
https://github.com/dlang/dmd/pull/11408

> For the above code ldc at -O or above generates:
>     movl %edi, %eax
>     rolw $8, %ax
>     retq

I'd expect that since C/C++ clang emit that.
--
July 12, 2020 [Issue 21041] core.bitop.byteswap(ushort) should used ROL/ROR instead of XCHG

https://issues.dlang.org/show_bug.cgi?id=21041

--- Comment #3 from safety0ff.bugz <safety0ff.bugz@gmail.com> ---
(In reply to Bruce Carneal from comment #1)
> Current dmd -O emits 7 instructions to accomplish the rolw in the code body.

D converts many operations on narrow types to int, which DMD's backend then
fails to optimize away when it is possible/advantageous.
--
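The promotion mentioned in comment #3 can be observed directly at compile time. The sketch below assumes nothing beyond the language's integer promotion rules; the function name swapWithExplicitNarrowing is hypothetical and only serves the illustration.

```d
// Illustrative only: shows that shifts on a ushort are typed as int,
// which is why the byteswap expression needs a narrowing cast.
ushort swapWithExplicitNarrowing(ushort x)
{
    // Each shift operand is promoted, so the sub-expressions are int.
    static assert(is(typeof(x << 8) == int));
    static assert(is(typeof(x >> 8 | x << 8) == int));

    // The cast discards the upper bits introduced by the 32-bit shift.
    return cast(ushort)(x >> 8 | x << 8);
}

unittest
{
    assert(swapWithExplicitNarrowing(0xABCD) == 0xCDAB);
}
```

A backend that wants to emit a 16-bit rotate has to see through this widen-then-narrow pattern, which appears to be the optimization the comment says DMD misses.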
July 12, 2020 [Issue 21041] core.bitop.byteswap(ushort) should used ROL/ROR instead of XCHG

https://issues.dlang.org/show_bug.cgi?id=21041

--- Comment #4 from safety0ff.bugz <safety0ff.bugz@gmail.com> ---
(In reply to safety0ff.bugz from comment #3)
> (In reply to Bruce Carneal from comment #1)
> > Current dmd -O emits 7 instructions to accomplish the rolw in the code body.
>
> D converts many operations on narrow types to int, which DMD's backend then
> fails to optimize away when it is possible/advantageous.

Further investigation:
dmd/backend/cod2.d function cdshift also converts rotates of 8 in upper/lower
8 of word into XCHG's
--
July 12, 2020 [Issue 21041] core.bitop.byteswap(ushort) should used ROL/ROR instead of XCHG

https://issues.dlang.org/show_bug.cgi?id=21041

--- Comment #5 from Bruce Carneal <bcarneal11@gmail.com> ---
(In reply to safety0ff.bugz from comment #3)
> (In reply to Bruce Carneal from comment #1)
> > Current dmd -O emits 7 instructions to accomplish the rolw in the code body.
>
> D converts many operations on narrow types to int, which DMD's backend then
> fails to optimize away when it is possible/advantageous.

Yes. DMD's back end is quick, but the code it generates is not
state-of-the-art.

That said, optimizing the DMD code gen for core.bitop rotations seems more
useful than a ushort byteswap improvement. The latter could be implemented as
an "inline" of the former.

Recognizing the rotation patterns generally, à la LLVM, would be even better
but quite a bit of work, I'd imagine. Probably not worth it given current
resource constraints (Walter's time). Lots of big front-end fish to fry.
--
March 21, 2021 [Issue 21041] core.bitop.byteswap(ushort) should used ROL/ROR instead of XCHG

https://issues.dlang.org/show_bug.cgi?id=21041

Iain Buclaw <ibuclaw@gdcproject.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |backend
                 CC|                            |ibuclaw@gdcproject.org
--
December 17, 2022 [Issue 21041] core.bitop.byteswap(ushort) should used ROL/ROR instead of XCHG

https://issues.dlang.org/show_bug.cgi?id=21041

Iain Buclaw <ibuclaw@gdcproject.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P4
--