December 21, 2013 Inlining problem of core.bitops | ||||
---|---|---|---|---|
| ||||
A little test program:

import core.bitop;

uint foo1(in uint x) pure nothrow {
    return bsf(x);
}

version(LDC) {
    import ldc.intrinsics;

    uint foo2(in uint x) pure nothrow {
        return llvm_cttz(x, true);
    }

    uint foo3(in uint x) pure nothrow {
        return llvm_cttz(x, false);
    }
}

void main() {}

-------------------------

DMD gives me this asm, showing the direct use of the bsf instruction:

dmd -O -release -inline test.d

_D4test4foo1FNaNbxkZk:
    push EAX
    bsf EAX,AL
    pop ECX
    ret

-------------------------

While ldc2 doesn't inline core.bitop.bsf, it does inline llvm_cttz:

ldmd2 -O -release -inline -output-s test.d

LDC - the LLVM D compiler (0.12.1):
  based on DMD v2.063.2 and LLVM 3.3.1
  Default target: i686-pc-mingw32

__D4test4foo1FNaNbxkZk:
    calll __D4core5bitop3bsfFNaNbNfkZi
    ret

__D4test4foo2FNaNbxkZk:
    bsfl %eax, %eax
    ret

__D4test4foo3FNaNbxkZk:
    movl $32, %ecx
    bsfl %eax, %eax
    cmovel %ecx, %eax
    ret

-------------------------

I have seen the same problem with core.bitop.popcnt versus llvm_ctpop().

Bye,
bearophile
|
December 28, 2013 Re: Inlining problem of core.bitops | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | In LDC, core.bitop.bsf is just an ordinary function compiled in libdruntime-ldc.a. Since bitop.d isn't on the command line, LDC uses the precompiled code in the library, which can't be inlined. You can get it to inline bsf by putting bitop.d on the command line:

ldmd2 -O -release -inline -output-s test.d /opt/ldc/include/d/core/bitop.d

_D4test4foo1FNaNbxkZk:
    .cfi_startproc
    movl %edi, %eax
    bsfq %rax, %rax
    ret

It inlines llvm_cttz because that is an llvm intrinsic.
|
December 28, 2013 Re: Inlining problem of core.bitops | ||||
---|---|---|---|---|
| ||||
Posted in reply to jkrempus | jkrempus@gmail.com:
> It inlines llvm_cttz because that is an llvm intrinsic.
I see, thank you.
Can't ldc2 replace a call to core.bitop.bsf with the llvm intrinsic?
Bye,
bearophile
|
December 28, 2013 Re: Inlining problem of core.bitops | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | > Can't ldc2 replace a call to core.bitop.bsf with the llvm intrinsic?
It would be possible to add an ldc intrinsic that
would tell ldc to do that. But I think it would be a better, more general
solution to add a forceinline attribute that would force compilation of
function body whether the containing module was on the command line or
not, and mark the resulting function as alwaysinline.
It is currently almost possible to implement bsf using LDC_inline_ir
(which would result in bsf always being inlined).
The only problem is that the compilation will fail if llvm intrinsic
llvm.cttz.i64 isn't declared at the time when inline ir is parsed. It
may be possible to fix this behavior of LDC_inline_ir.
|
December 28, 2013 Re: Inlining problem of core.bitops | ||||
---|---|---|---|---|
| ||||
Posted in reply to jkrempus | On Sat, Dec 28, 2013 at 1:13 PM, <jkrempus@gmail.com> wrote:
> But I think it would be a better, more general
> solution to add a forceinline attribute that would force compilation of
> function body whether the containing module was on the command line or
> not, and mark the resulting function as alwaysinline.
I agree. Now we only need somebody to actually implement this feature *hint* *hint*:
https://github.com/ldc-developers/ldc/issues/561
David
|
December 28, 2013 Re: Inlining problem of core.bitops | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Nadlinger | David Nadlinger:
> https://github.com/ldc-developers/ldc/issues/561
Given the intensity Manu wants this feature, I think this needs to be discussed in the main D newsgroup, to bring a @alwaysinline or @forceinline as D standard. The differences between D compilers should be minimized.
Bye,
bearophile
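
For reference, D itself later gained a standard spelling for this request: pragma(inline, true), added in DMD 2.068. The following is only a minimal sketch, assuming a compiler at least that recent. The name lowestSetBit is a hypothetical wrapper chosen for illustration, and whether the call to bsf inside it is itself inlined still depends on the compiler and its version.

```d
import core.bitop : bsf;

// Ask the compiler to always inline this wrapper at its call sites.
// lowestSetBit is an illustrative name, not part of druntime.
pragma(inline, true)
uint lowestSetBit(uint x) pure nothrow
{
    // Like bsf itself, the result is undefined for x == 0.
    return cast(uint) bsf(x);
}

void main()
{
    assert(lowestSetBit(12) == 2); // 12 is 1100 in binary
    assert(lowestSetBit(1) == 0);
}
```

The pragma is a request attached to the function rather than to the call site, so it travels with the declaration regardless of which modules are named on the command line.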
|
October 20, 2015 Re: Inlining problem of core.bitops | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Sat, 28 Dec 2013 17:04:09 +0000, "bearophile" <bearophileHUGS@lycos.com> wrote:
> David Nadlinger:
>
> > https://github.com/ldc-developers/ldc/issues/561
>
> Given the intensity Manu wants this feature, I think this needs to be discussed in the main D newsgroup, to bring a @alwaysinline or @forceinline as D standard. The differences between D compilers should be minimized.
>
> Bye,
> bearophile
Funny enough, when working on fast.json I had to avoid bsr(), too because of missed inlining. (It is a common need for emulated floating point calculations.)
--
Marco
|
October 20, 2015 Re: Inlining problem of core.bitops | ||||
---|---|---|---|---|
| ||||
Posted in reply to Marco Leise | On Tuesday, 20 October 2015 at 07:15:52 UTC, Marco Leise wrote:
> On Sat, 28 Dec 2013 17:04:09 +0000,
> "bearophile" <bearophileHUGS@lycos.com> wrote:
>
>> David Nadlinger:
>>
>> > https://github.com/ldc-developers/ldc/issues/561
>>
>> Given the intensity Manu wants this feature, I think this needs to be discussed in the main D newsgroup, to bring a @alwaysinline or @forceinline as D standard. The differences between D compilers should be minimized.
>>
>> Bye,
>> bearophile
>
> Funny enough, when working on fast.json I had to avoid bsr(),
> too because of missed inlining. (It is a common need for
> emulated floating point calculations.)
If you copy the definition of bsr from ldc's druntime to the current module then ldc will inline it. Ugly but effective.
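
A minimal sketch of that workaround, under stated assumptions: localBsf and localBsr are hypothetical names, and the LDC branch calls the llvm_cttz and llvm_ctlz intrinsics from ldc.intrinsics directly rather than reproducing the actual druntime source. Both wrappers return uint instead of int.

```d
// Module-local bit-scan wrappers, so the definitions are visible to the
// optimizer in the module that uses them. Names are illustrative only.
version (LDC)
{
    import ldc.intrinsics : llvm_cttz, llvm_ctlz;

    // Index of the lowest set bit; undefined for x == 0.
    uint localBsf(uint x) pure nothrow
    {
        return llvm_cttz(x, true);
    }

    // Index of the highest set bit; undefined for x == 0.
    uint localBsr(uint x) pure nothrow
    {
        return 31 - llvm_ctlz(x, true);
    }
}
else
{
    import core.bitop : bsf, bsr;

    // Other compilers: fall back to core.bitop.
    uint localBsf(uint x) pure nothrow { return cast(uint) bsf(x); }
    uint localBsr(uint x) pure nothrow { return cast(uint) bsr(x); }
}

void main()
{
    assert(localBsf(8) == 3);
    assert(localBsr(8) == 3);
    assert(localBsr(0x21) == 5); // highest set bit of 100001 (binary)
}
```

Passing true as the second intrinsic argument matches foo2 from the first post: it leaves the result for a zero input undefined, which avoids the extra cmov seen in foo3.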
|
October 20, 2015 Re: Inlining problem of core.bitops | ||||
---|---|---|---|---|
| ||||
Posted in reply to John Colvin | On Tuesday, 20 October 2015 at 09:12:28 UTC, John Colvin wrote:
> On Tuesday, 20 October 2015 at 07:15:52 UTC, Marco Leise wrote:
>> On Sat, 28 Dec 2013 17:04:09 +0000,
>> "bearophile" <bearophileHUGS@lycos.com> wrote:
>>
>>> David Nadlinger:
>>>
>>> > https://github.com/ldc-developers/ldc/issues/561
>>>
>>> Given the intensity Manu wants this feature, I think this needs to be discussed in the main D newsgroup, to bring a @alwaysinline or @forceinline as D standard. The differences between D compilers should be minimized.
>>>
>>> Bye,
>>> bearophile
>>
>> Funny enough, when working on fast.json I had to avoid bsr(),
>> too because of missed inlining. (It is a common need for
>> emulated floating point calculations.)
>
> If you copy the definition of bsr from ldc's druntime to the current module then ldc will inline it. Ugly but effective.
I also noticed better optimisations if I made bsr return a uint instead of an int.
|
October 20, 2015 Re: Inlining problem of core.bitops | ||||
---|---|---|---|---|
| ||||
Posted in reply to John Colvin | On Tue, 20 Oct 2015 09:13:43 +0000, John Colvin <john.loughran.colvin@gmail.com> wrote:
> > If you copy the definition of bsr from ldc's druntime to the current module then ldc will inline it. Ugly but effective.
>
> I also noticed better optimisations if I made bsr return a uint instead of an int.
Ah, you see, I got clz with a ubyte return but missed bsr and bsf. Thanks for the reminder.
--
Marco
|