Inlining problem of core.bitops
December 21, 2013
A little test program:


import core.bitop;

uint foo1(in uint x) pure nothrow {
    return bsf(x);
}

version(LDC) {
    import ldc.intrinsics;

    uint foo2(in uint x) pure nothrow {
        return llvm_cttz(x, true);
    }

    uint foo3(in uint x) pure nothrow {
        return llvm_cttz(x, false);
    }
}

void main() {}

-------------------------

DMD gives me this asm, showing the direct use of bsf instruction:

dmd -O -release -inline test.d


_D4test4foo1FNaNbxkZk:
    push    EAX
    bsf EAX,AL
    pop ECX
    ret

-------------------------

While ldc2 doesn't inline core.bitop.bsf, it does inline llvm_cttz:


ldmd2 -O -release -inline -output-s test.d

LDC - the LLVM D compiler (0.12.1):
  based on DMD v2.063.2 and LLVM 3.3.1
  Default target: i686-pc-mingw32


__D4test4foo1FNaNbxkZk:
    calll   __D4core5bitop3bsfFNaNbNfkZi
    ret

__D4test4foo2FNaNbxkZk:
    bsfl    %eax, %eax
    ret

__D4test4foo3FNaNbxkZk:
    movl    $32, %ecx
    bsfl    %eax, %eax
    cmovel  %ecx, %eax
    ret
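
The difference between foo2 and foo3 above is the second argument of llvm_cttz, which states whether the result for a zero input is undefined. With `false` the compiler has to produce a defined answer, hence the extra `movl $32` / `cmovel` pair. As a rough, compiler-independent sketch of that defined-for-zero behaviour, using only core.bitop (the name `cttzDefined` is hypothetical and not part of druntime or LDC):

```d
import core.bitop : bsf;

// Hypothetical helper: count trailing zeros of a 32-bit value, with a
// defined result for 0, matching what the foo3 assembly computes.
uint cttzDefined(uint x) pure nothrow
{
    // bsf's result is undefined for 0, so that case is handled explicitly.
    return x == 0 ? 32 : cast(uint) bsf(x);
}

unittest
{
    assert(cttzDefined(0) == 32);
    assert(cttzDefined(1) == 0);
    assert(cttzDefined(8) == 3);
    assert(cttzDefined(0x8000_0000) == 31);
}
```

The unittest block can be exercised with something like `dmd -unittest -main -run file.d`; whether the bsf call inside it gets inlined still depends on the compiler, as discussed in this thread.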

-------------------------

I have seen the same problem with core.bitop.popcnt versus llvm_ctpop().

Bye,
bearophile
December 28, 2013
In LDC, core.bitop.bsf is just an ordinary function compiled in libdruntime-ldc.a. Since bitop.d isn't on the command line, LDC uses the precompiled code in the library, which can't be inlined. You can get it to inline bsf by putting bitop.d on the command line:

ldmd2 -O -release -inline -output-s test.d /opt/ldc/include/d/core/bitop.d

_D4test4foo1FNaNbxkZk:
    .cfi_startproc
    movl	%edi, %eax
    bsfq	%rax, %rax
    ret

It inlines llvm_cttz because that is an llvm intrinsic.
December 28, 2013
jkrempus@gmail.com:

> It inlines llvm_cttz because that is an llvm intrinsic.

I see, thank you.
Can't ldc2 replace a call to core.bitop.bsf with the llvm intrinsic?

Bye,
bearophile
December 28, 2013
> Can't ldc2 replace a call to core.bitop.bsf with the llvm intrinsic?

It would be possible to add an ldc intrinsic that
would tell ldc to do that. But I think it would be a better, more general
solution to add a forceinline attribute that would force compilation of
function body whether the containing module was on the command line or
not, and mark the resulting function as alwaysinline.

It is currently almost possible to implement bsf using LDC_inline_ir
(which would result in bsf always being inlined).
The only problem is that the compilation will fail if the LLVM intrinsic
llvm.cttz.i64 isn't declared at the time when the inline IR is parsed. It
may be possible to fix this behavior of LDC_inline_ir.


December 28, 2013
On Sat, Dec 28, 2013 at 1:13 PM,  <jkrempus@gmail.com> wrote:
> But I think it would be a better, more general
> solution to add a forceinline attribute that would force compilation of
> function body whether the containing module was on the command line or
> not, and mark the resulting function as alwaysinline.

I agree. Now we only need somebody to actually implement this feature *hint* *hint*: https://github.com/ldc-developers/ldc/issues/561

David
December 28, 2013
David Nadlinger:

> https://github.com/ldc-developers/ldc/issues/561

Given how intensely Manu wants this feature, I think this needs to be discussed in the main D newsgroup, to bring an @alwaysinline or @forceinline to D as a standard. The differences between D compilers should be minimized.

Bye,
bearophile
October 20, 2015
On Sat, 28 Dec 2013 17:04:09 +0000,
"bearophile" <bearophileHUGS@lycos.com> wrote:

> David Nadlinger:
> 
> > https://github.com/ldc-developers/ldc/issues/561
> 
> Given how intensely Manu wants this feature, I think this needs to be discussed in the main D newsgroup, to bring an @alwaysinline or @forceinline to D as a standard. The differences between D compilers should be minimized.
> 
> Bye,
> bearophile

Funny enough, when working on fast.json I had to avoid bsr(),
too because of missed inlining. (It is a common need for
emulated floating point calculations.)

-- 
Marco

October 20, 2015
On Tuesday, 20 October 2015 at 07:15:52 UTC, Marco Leise wrote:
> On Sat, 28 Dec 2013 17:04:09 +0000,
> "bearophile" <bearophileHUGS@lycos.com> wrote:
>
>> David Nadlinger:
>> 
>> > https://github.com/ldc-developers/ldc/issues/561
>> 
>> Given how intensely Manu wants this feature, I think this needs to be discussed in the main D newsgroup, to bring an @alwaysinline or @forceinline to D as a standard. The differences between D compilers should be minimized.
>> 
>> Bye,
>> bearophile
>
> Funny enough, when working on fast.json I had to avoid bsr(),
> too because of missed inlining. (It is a common need for
> emulated floating point calculations.)

If you copy the definition of bsr from ldc's druntime to the current module then ldc will inline it. Ugly but effective.
October 20, 2015
On Tuesday, 20 October 2015 at 09:12:28 UTC, John Colvin wrote:
> On Tuesday, 20 October 2015 at 07:15:52 UTC, Marco Leise wrote:
>> On Sat, 28 Dec 2013 17:04:09 +0000,
>> "bearophile" <bearophileHUGS@lycos.com> wrote:
>>
>>> David Nadlinger:
>>> 
>>> > https://github.com/ldc-developers/ldc/issues/561
>>> 
>>> Given how intensely Manu wants this feature, I think this needs to be discussed in the main D newsgroup, to bring an @alwaysinline or @forceinline to D as a standard. The differences between D compilers should be minimized.
>>> 
>>> Bye,
>>> bearophile
>>
>> Funny enough, when working on fast.json I had to avoid bsr(),
>> too because of missed inlining. (It is a common need for
>> emulated floating point calculations.)
>
> If you copy the definition of bsr from ldc's druntime to the current module then ldc will inline it. Ugly but effective.

I also noticed better optimisations if I made bsr return a uint instead of an int.
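
For illustration, a minimal sketch of what such a module-local definition might look like. This is not the druntime source: the name `localBsr` is hypothetical, and the LDC branch assumes that `ldc.intrinsics` exposes `llvm_ctlz` with the same (value, isZeroUndefined) shape as the `llvm_cttz` used earlier in this thread. The function returns uint rather than int, and the result for a zero input is undefined, as with bsr.

```d
version (LDC)
{
    import ldc.intrinsics : llvm_ctlz;

    // Defined in the current module so the optimizer can see the body.
    // Assumption: llvm_ctlz(value, isZeroUndefined) as in ldc.intrinsics.
    uint localBsr(uint x) pure nothrow
    {
        // Index of the highest set bit = 31 - number of leading zeros.
        return 31 - llvm_ctlz(x, true);
    }
}
else
{
    import core.bitop : bsr;

    // Other compilers: forward to core.bitop.bsr, converting to uint.
    uint localBsr(uint x) pure nothrow
    {
        return cast(uint) bsr(x);
    }
}

unittest
{
    assert(localBsr(1) == 0);
    assert(localBsr(0x10) == 4);
    assert(localBsr(0x8000_0000) == 31);
}
```

Whether this actually produces better code than calling core.bitop.bsr should be checked in the generated assembly for the compiler version in use.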
October 20, 2015
On Tue, 20 Oct 2015 09:13:43 +0000,
John Colvin <john.loughran.colvin@gmail.com> wrote:

> > If you copy the definition of bsr from ldc's druntime to the current module then ldc will inline it. Ugly but effective.
> 
> I also noticed better optimisations if I made bsr return a uint instead of an int.

Ah, you see, I got clz with a ubyte return type but missed bsr and bsf. Thanks for the reminder.

-- 
Marco