February 09, 2011 Cross-module inlining in gdc
So, as I've been working on getting the gcc builtins available to D code (somewhat successfully as of last night, I might add), I've run into a fairly significant inlining problem. Given a function definition in D, where I want to force inlining:

    // Assume __v4sf is defined by the compiler
    pragma(set_attribute, _mm_add_ps, always_inline, artificial);
    __v4sf _mm_add_ps (__v4sf __A, __v4sf __B)
    {
        return __builtin_ia32_addps(__A, __B);
    }

When this occurs in the module I care about, it works dandy. It gets inlined, the generated code is pretty optimal, etc. When it is defined in another module, and I call the function, I get messages like "sorry, unimplemented: inlining failed" where it states it doesn't have the body of the function.

I was compiling each file as a separate module, one at a time, so I used -combine to give it multiple source files at once and allow it to link it right away. That didn't make any difference. If I take away the pragma, it will then compile, but it never inlines.

When doing -combine, is there a way to get gdc to feed all of the source to the frontend all at once, such that all the definitions/bodies/etc. are all present so that inlining can occur? I would imagine even this strategy falls apart when linking against a library; is there any way we can support something like -flto so that at codegen time gcc has more opportunity to do inlining?

Intrinsic wrappers defined in a different module, and then never getting inlined kinda defeats the purpose of the intrinsics. It'd be nice if we can find a way to get cross-module inlining to work, even if it means using link-time optimization.

-Mike
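For comparison, here is a minimal C sketch of the same wrapper pattern written with GCC's own attribute syntax. It is not the gdc pragma from the post, and it makes a few assumptions: the names `v4sf`, `add_ps`, and `add_ps_lane` are hypothetical, and the element-wise `+` from the GCC/Clang vector extensions stands in for `__builtin_ia32_addps` so the sketch is not tied to x86.

```c
/* Sketch only: GCC/Clang vector extensions, not the gdc pragma from the post. */

/* Four packed floats; hypothetical stand-in for the compiler-provided __v4sf. */
typedef float v4sf __attribute__((vector_size(16)));

/* always_inline asks the compiler to inline wherever the body is visible;
   artificial marks the wrapper as an inline shim for debug info. */
static inline __attribute__((always_inline, artificial))
v4sf add_ps(v4sf a, v4sf b)
{
    /* Element-wise add; stands in for __builtin_ia32_addps. */
    return a + b;
}

/* Hypothetical helper: add two 4-float arrays as vectors, return one lane. */
float add_ps_lane(const float *x, const float *y, int lane)
{
    v4sf a = { x[0], x[1], x[2], x[3] };
    v4sf b = { y[0], y[1], y[2], y[3] };
    v4sf sum = add_ps(a, b);
    return sum[lane];
}
```

In C this pattern works because the wrapper lives in a header as `static inline`, so every translation unit that calls it also sees its body. That is the same condition the post runs into: a forced-inline function whose body sits in another compilation unit cannot be inlined unless the compiler gets the body some other way, such as link-time optimization.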
February 09, 2011 Re: Cross-module inlining in gdc
Posted in reply to Mike Farnsworth
> // Assume __v4sf is defined by the compiler
> pragma(set_attribute, _mm_add_ps, always_inline, artificial);
> __v4sf _mm_add_ps (__v4sf __A, __v4sf __B)
> {
> return __builtin_ia32_addps(__A, __B);
> }
2 notes:
Isn't it pragma(GNU_set_attribute?
And you should be able to do
pragma(GNU_attribute, always_inline, artificial)
__v4sf _mm_add_ps....
as well.
February 09, 2011 Re: Cross-module inlining in gdc
Posted in reply to Trass3r
Trass3r Wrote:
> > // Assume __v4sf is defined by the compiler
> > pragma(set_attribute, _mm_add_ps, always_inline, artificial);
> > __v4sf _mm_add_ps (__v4sf __A, __v4sf __B)
> > {
> > return __builtin_ia32_addps(__A, __B);
> > }
>
> 2 notes:
> Isn't it pragma(GNU_set_attribute?
>
> And you should be able to do
> pragma(GNU_attribute, always_inline, artificial)
> __v4sf _mm_add_ps....
>
> as well.
That's the syntax that ibuclaw gave me, and it does indeed work. GNU_set_attribute is deprecated now, as far as I know (from spelunking through the code).
-Mike
February 09, 2011 Re: Cross-module inlining in gdc
Posted in reply to Mike Farnsworth
== Quote from Mike Farnsworth (mike.farnsworth@gmail.com)'s article
> When doing -combine, is there a way to get gdc to feed all of the source to the frontend all at once, such that all the definitions/bodies/etc. are all present so that inlining can occur? I would imagine even this strategy falls apart when linking against a library; is there any way we can support something like -flto so that at codegen time gcc has more opportunity to do inlining?
-combine does feed all of the source to the frontend all at once. The likely reason it doesn't get inlined is that the gcc backend decides not to (e.g. because code size would grow).
-flto should be supported if gcc was built with it enabled (--enable-languages=lto). I've never tried it though, so that's only a guess.