Posted by max haughton in reply to ClapTrap
On Monday, 6 December 2021 at 00:38:18 UTC, ClapTrap wrote:
> On Sunday, 5 December 2021 at 23:36:21 UTC, kinke wrote:
>> On Sunday, 5 December 2021 at 21:38:55 UTC, max haughton wrote:
>>> On Friday, 3 December 2021 at 21:24:07 UTC, claptrap wrote:
>>>> Just a simple function to split a bezier in two.
>>>> Using "-O3":
>>>> LDC: the operator version is 84 instructions.
>>>> LDC: the hand-expanded math is 49 instructions.
>>>> It seems something as simple as this should be better optimised? Or am I missing something?
>>>> https://godbolt.org/z/4h9vob3Yo
>>>> [...]
>>> [...]
>>> Seems like GCC does not have this issue.
>> With gdc v11.1, I count 69 instructions for split and 51 for split2 (59 with -O3). So I guess there's a semantic difference here with the slightly changed evaluation order (2D addition before scaling).
> gdc v11.1 doesn't inline the operator calls when I try it; if you try an earlier version, 10.2, it does, which reduces it to 48 instructions.
>> With alias Point = __vector(float[2]), split is reduced to 28 instructions: https://godbolt.org/z/7ffebjaz8
> Wow, that's awesome!
To make GCC inline properly without LTO, you can use -fwhole-program.
Maybe Iain also has a flag that restores the old template behaviour.
These kinds of wacky phase-ordering (I assume) issues are why I am slightly distrustful of GDC's post-inlining decisions.
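
For anyone reading without the godbolt link open, here is a rough sketch of the kind of code being compared. It is not the code from the thread: the names Point, Bezier, splitOp and splitManual are made up for illustration, and I am assuming a cubic curve split at t = 0.5. The first function goes through overloaded operators, the second spells out the same arithmetic per component.

```d
// Hypothetical reconstruction, NOT the code behind the godbolt link.
struct Point
{
    float x, y;

    // Point + Point, element-wise
    Point opBinary(string op : "+")(Point rhs) const
    {
        return Point(x + rhs.x, y + rhs.y);
    }

    // Point * float, uniform scaling
    Point opBinary(string op : "*")(float s) const
    {
        return Point(x * s, y * s);
    }
}

// Assumed layout: four control points of a cubic curve.
struct Bezier
{
    Point p0, p1, p2, p3;
}

// Operator version: every midpoint is (a + b) * 0.5f,
// i.e. 2D addition first, scaling second.
void splitOp(Bezier c, out Bezier left, out Bezier right)
{
    Point ab  = (c.p0 + c.p1) * 0.5f;
    Point bc  = (c.p1 + c.p2) * 0.5f;
    Point cd  = (c.p2 + c.p3) * 0.5f;
    Point abc = (ab + bc) * 0.5f;
    Point bcd = (bc + cd) * 0.5f;
    Point mid = (abc + bcd) * 0.5f;
    left  = Bezier(c.p0, ab, abc, mid);
    right = Bezier(mid, bcd, cd, c.p3);
}

// Hand-expanded version: the same midpoints written out per component,
// with no operator calls for the inliner to deal with.
void splitManual(Bezier c, out Bezier left, out Bezier right)
{
    float abx  = (c.p0.x + c.p1.x) * 0.5f, aby  = (c.p0.y + c.p1.y) * 0.5f;
    float bcx  = (c.p1.x + c.p2.x) * 0.5f, bcy  = (c.p1.y + c.p2.y) * 0.5f;
    float cdx  = (c.p2.x + c.p3.x) * 0.5f, cdy  = (c.p2.y + c.p3.y) * 0.5f;
    float abcx = (abx + bcx) * 0.5f,       abcy = (aby + bcy) * 0.5f;
    float bcdx = (bcx + cdx) * 0.5f,       bcdy = (bcy + cdy) * 0.5f;
    float mx   = (abcx + bcdx) * 0.5f,     my   = (abcy + bcdy) * 0.5f;
    left  = Bezier(c.p0, Point(abx, aby), Point(abcx, abcy), Point(mx, my));
    right = Bezier(Point(mx, my), Point(bcdx, bcdy), Point(cdx, cdy), c.p3);
}
```

Both functions should produce the same control points; whether the compiler emits the same instruction count for them depends on whether the opBinary calls get inlined before the later passes run, which is the thing this thread is about.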