April 27, 2019 LLVM codegen improvement, count bits intrinsics | ||||
---|---|---|---|---|
| ||||
Where do you suggest to the LLVM people that codegen could be improved?

The bit scan forward and reverse intrinsics both test for zero and jump (when you want the zero case defined), when they could be using conditional moves, because both instructions set the zero flag if the input is zero. Basically...

    import ldc.intrinsics;
    import std.stdio;
    alias llvm_bsf = llvm_cttz;

    void foo(int a)
    {
        a = llvm_bsf(a, false);
        writeln(a);
    }

compiles to this...

        test ebx, ebx
        je .LBB0_1
        bsf ebx, ebx
        jmp .LBB0_3
    .LBB0_1:
        mov ebx, 32
    .LBB0_3:

where it could just be

        mov edi, 32
        bsf ebx, ebx
        cmovz ebx, edi
|
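For reference, here is a minimal sketch of the semantics both assembly sequences above are meant to implement: a trailing-zero count that returns 32 for a zero input. It is not the LDC intrinsic itself. It assumes druntime's `core.bitop.bsf` as a portable stand-in for `llvm_cttz`, and `cttzDefined` is a hypothetical helper name used only for illustration.

```d
import core.bitop : bsf;
import std.stdio : writeln;

// Hypothetical helper: count trailing zeros with a defined result for 0.
// core.bitop.bsf leaves the result undefined for a zero input, so the
// zero case is handled explicitly. The compiler is free to lower the
// conditional to either a branch or a conditional move.
int cttzDefined(uint x)
{
    return x == 0 ? 32 : bsf(x);
}

void main()
{
    writeln(cttzDefined(0)); // 32: the defined zero case
    writeln(cttzDefined(8)); // 3: lowest set bit of 0b1000 is bit 3
}
```

Whether the conditional ends up as `je`/`jmp` or as `cmovz` is a backend decision, which is the point of the question above.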
April 28, 2019 Re: LLVM codegen improvement, count bits intrinsics | ||||
---|---|---|---|---|
| ||||
Posted in reply to NaN | On Saturday, 27 April 2019 at 20:25:01 UTC, NaN wrote:
> Where do you suggest to the LLVM people that codegen could be improved?
On their mailing list or in their bug tracker.
> The bit scan forward and reverse both test for zero and do jumps (when you want zero defined), when they could be doing conditional moves because both instructions set the zero flag if the input is zero.
Two remarks:
1. Conditional move is not necessarily faster than branching
2. On recent CPUs `tzcnt` is the better instruction that has defined output for input 0
-Johan
|
April 30, 2019 Re: LLVM codegen improvement, count bits intrinsics | ||||
---|---|---|---|---|
| ||||
Posted in reply to Johan Engelen | On Sunday, 28 April 2019 at 11:22:57 UTC, Johan Engelen wrote:
> On Saturday, 27 April 2019 at 20:25:01 UTC, NaN wrote:
>>
> Two remarks:
> 1. Conditional move is not necessarily faster than branching
> 2. On recent CPUs `tzcnt` is the better instruction that has defined output for input 0
Unfortunately, neither my CPU nor I are very recent.
|
Copyright © 1999-2021 by the D Language Foundation