call @PLT Performance
SrMordred
January 16, 2019
Compiler noob here:

auto a = popcnt(bitset);
auto b = bsf(bitset);

generates this:

call    pure nothrow @nogc @safe int core.bitop.popcnt(uint)@PLT
call    pure nothrow @nogc @safe int core.bitop.bsf(uint)@PLT

Why doesn't the compiler generate the bsf/popcnt instructions directly?

Aren't these calls slower?

(This question extends to every place where a call through @PLT is generated.)
Johan Engelen
January 16, 2019
On Wednesday, 16 January 2019 at 13:03:59 UTC, SrMordred wrote:
> Compiler noob here:
>
> auto a = popcnt(bitset);
> auto b = bsf(bitset);
>
> generates this:
>
> call    pure nothrow @nogc @safe int core.bitop.popcnt(uint)@PLT
> call    pure nothrow @nogc @safe int core.bitop.bsf(uint)@PLT
>
> Why doesn't the compiler generate the bsf/popcnt instructions directly?
>
> Aren't these calls slower?

Yeah, this is a known issue: by default, LDC does not inline across modules, so functions defined in another module (here, core.bitop) are called rather than inlined. You can enable it by passing the `-enable-cross-module-inlining` compiler flag.
It's a long-standing issue, but it became a little less urgent because of LTO (`-flto=...`).

-Johan
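
To make the two options above concrete, here is a minimal sketch of a test program together with the command lines one might try. The file name `app.d` and the helper names `countBits` and `lowestSetBit` are hypothetical, chosen only for illustration. The LTO line assumes an LDC package that ships LTO-enabled builds of druntime/Phobos, and whether an actual `popcnt` instruction appears is also likely to depend on the target CPU setting (for example `-mcpu=native`), so treat the commands as things to experiment with rather than a guaranteed recipe.

```d
// app.d (hypothetical file name)
//
// Possible ways to build it, assuming a reasonably recent ldc2:
//   ldc2 -O3 -enable-cross-module-inlining app.d
//   ldc2 -O3 -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto app.d
// Adding -output-s writes the generated assembly, which can be searched
// for remaining "call ...@PLT" lines.
import core.bitop : bsf, popcnt;

// Number of set bits in `bitset`.
int countBits(uint bitset)
{
    return popcnt(bitset);
}

// Index of the lowest set bit; bsf is undefined for 0, so guard against it.
int lowestSetBit(uint bitset)
{
    assert(bitset != 0);
    return bsf(bitset);
}

void main()
{
    uint bitset = 0b1011_0000; // bits 4, 5 and 7 are set
    assert(countBits(bitset) == 3);
    assert(lowestSetBit(bitset) == 4);
}
```

Comparing the assembly produced with and without the extra flags should show whether the calls through the PLT were replaced by inlined code on a given setup.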

SrMordred
January 16, 2019
On Wednesday, 16 January 2019 at 14:19:27 UTC, Johan Engelen wrote:
> Yeah, this is a known issue: by default, LDC does not inline across modules, so functions defined in another module (here, core.bitop) are called rather than inlined. You can enable it by passing the `-enable-cross-module-inlining` compiler flag.
> It's a long-standing issue, but it became a little less urgent because of LTO (`-flto=...`).
>
> -Johan

Oh nice, thanks!