Thread overview
Cross-module inlining in gdc
Feb 09, 2011
Mike Farnsworth
Feb 09, 2011
Trass3r
Feb 09, 2011
Mike Farnsworth
Feb 09, 2011
Iain Buclaw
February 09, 2011
So, as I've been working on getting the gcc builtins available to D code (somewhat successfully as of last night, I might add), I've run into a fairly significant inlining problem.

Given a function definition in D, where I want to force inlining:

// Assume __v4sf is defined by the compiler
pragma(set_attribute, _mm_add_ps, always_inline, artificial);
__v4sf _mm_add_ps (__v4sf __A, __v4sf __B)
{
    return __builtin_ia32_addps(__A, __B);
}

When this occurs in the module I care about, it works dandy.  It gets inlined, the generated code is pretty optimal, etc.  When it is defined in another module, and I call the function, I get messages like "sorry, unimplemented: inlining failed" where it states it doesn't have the body of the function.
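
For comparison, here is a minimal C sketch of the same wrapper pattern, since always_inline and artificial are handled by the gcc backend rather than by D itself. The names v4sf, add_ps and add_ps_lane are hypothetical, and gcc's generic vector extension stands in for __builtin_ia32_addps so the sketch is not tied to x86. What it illustrates is that the definition is static inline, so every file that includes it carries the body, which is what always_inline needs in order to succeed.

```c
/* Sketch only: v4sf, add_ps and add_ps_lane are illustrative names,
   not part of any gdc or gcc API. */

/* Four packed floats, using gcc's generic vector extension. */
typedef float v4sf __attribute__((vector_size(16)));

/* static inline keeps the body in every translation unit that includes
   this definition, so always_inline never has to look for it elsewhere. */
static inline __attribute__((always_inline, artificial))
v4sf add_ps(v4sf a, v4sf b)
{
    return a + b;   /* element-wise add; stands in for __builtin_ia32_addps */
}

/* Helper that returns one lane of the sum, handy for checking results. */
static float add_ps_lane(v4sf a, v4sf b, int lane)
{
    v4sf r = add_ps(a, b);
    return r[lane];
}
```

If the same definition were only declared in one file and defined in another, the caller's translation unit would have no body to inline, which matches the "inlining failed" message described above.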

I was compiling each file as a separate module, one at a time, so I used -combine to give it multiple source files at once and allow it to link it right away.  That didn't make any difference.  If I take away the pragma, it will then compile, but it never inlines.

When doing -combine, is there a way to get gdc to feed all of the source to the frontend all at once, such that all the definitions/bodies/etc. are all present so that inlining can occur?  I would imagine even this strategy falls apart when linking against a library; is there any way we can support something like -flto so that at codegen time gcc has more opportunity to do inlining?

Intrinsic wrappers defined in a different module, and then never getting inlined kinda defeats the purpose of the intrinsics.  It'd be nice if we can find a way to get cross-module inlining to work, even if it means using link-time optimization.

-Mike

February 09, 2011
> // Assume __v4sf is defined by the compiler
> pragma(set_attribute, _mm_add_ps, always_inline, artificial);
> __v4sf _mm_add_ps (__v4sf __A, __v4sf __B)
> {
>     return __builtin_ia32_addps(__A, __B);
> }

2 notes:
Isn't it pragma(GNU_set_attribute?

And you should be able to do
pragma(GNU_attribute, always_inline, artificial)
__v4sf _mm_add_ps....

as well.

February 09, 2011
Trass3r Wrote:

> > // Assume __v4sf is defined by the compiler
> > pragma(set_attribute, _mm_add_ps, always_inline, artificial);
> > __v4sf _mm_add_ps (__v4sf __A, __v4sf __B)
> > {
> >     return __builtin_ia32_addps(__A, __B);
> > }
> 
> 2 notes:
> Isn't it pragma(GNU_set_attribute?
> 
> And you should be able to do
> pragma(GNU_attribute, always_inline, artificial)
> __v4sf _mm_add_ps....
> 
> as well.

That's the syntax that ibuclaw gave me, and it does indeed work.  GNU_set_attribute is deprecated now, as far as I know (from spelunking through the code).

-Mike

February 09, 2011
== Quote from Mike Farnsworth (mike.farnsworth@gmail.com)'s article
> When doing -combine, is there a way to get gdc to feed all of the source to the frontend all at once, such that all the definitions/bodies/etc. are all present so that inlining can occur?  I would imagine even this strategy falls apart when linking against a library; is there any way we can support something like -flto so that at codegen time gcc has more opportunity to do inlining?

-combine does feed all of the source to the frontend all at once. The reason it doesn't get inlined is likely that the gcc backend decides not to (i.e. because code size would grow).

-flto should be supported if gcc was built with it enabled (--enable-languages=lto). I've never tried it though, so that's only a guess.
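
As a rough illustration of what -flto changes, the C sketch below is written as one file so it compiles as-is, with comments marking where a two-file split would fall. The file names vec.c and main.c and the functions dot4 and length_squared are hypothetical. Without LTO, the caller sees only the prototype, so the backend has no body to inline; with both files compiled and linked using -flto, the link-time optimizer has both bodies available. The commands in the comment are for plain gcc and are an assumption as far as gdc is concerned.

```c
/* Sketch only: one file here, but the comments mark a hypothetical
   split into vec.c and main.c.
   Assumed build with a gcc that has LTO enabled (not verified for gdc):
     gcc -O2 -flto -c vec.c main.c
     gcc -O2 -flto vec.o main.o -o demo                               */

/* ---- shared declaration (would live in a header) ---- */
float dot4(const float *a, const float *b);

/* ---- main.c side: only the prototype is visible here ---- */
float length_squared(const float *v)
{
    return dot4(v, v);   /* inlinable across files only at link time */
}

/* ---- vec.c side: the body the optimizer needs ---- */
float dot4(const float *a, const float *b)
{
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Even with the body available, whether the call is actually inlined is still up to the backend's heuristics unless the function is marked always_inline.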