Thread overview
Operator overloading leads to bad code optimization
Dec 03, 2021: claptrap
Dec 05, 2021: max haughton
Dec 05, 2021: kinke
Dec 06, 2021: ClapTrap
Dec 06, 2021: max haughton
Dec 08, 2021: max haughton
December 03, 2021

Just a simple function to split a bezier in two.

Using "-O3"

LDC: the operator version is 84 instructions
LDC: the hand-expanded math is 49 instructions.

It seems something as simple as this should be better optimised? Or am I missing something?

https://godbolt.org/z/4h9vob3Yo

In fact, there are quite a few places where completely redundant code seems to be left in, e.g.:

movss dword ptr [rsp - 24], xmm1
movss xmm0, dword ptr [rip + .LCPI4_0]
mulss xmm1, xmm0
movss dword ptr [rsp - 24], xmm1

movss dword ptr [rsp - 24], xmm2
mulss xmm2, xmm0
movss dword ptr [rsp - 24], xmm2
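
For readers without the Godbolt link open, the comparison is roughly of the following shape. This is only a hypothetical sketch: the real code lives in the linked snippet, and the names `Point`, `Quad`, `splitOps` and `splitExpanded`, the choice of a quadratic curve, and the exact arithmetic are all assumptions made for illustration.

```d
// Hypothetical reconstruction; the real code is only in the linked Godbolt snippet.
struct Point
{
    float x, y;

    // Element-wise addition: a + b
    Point opBinary(string op : "+")(Point rhs) const
    {
        return Point(x + rhs.x, y + rhs.y);
    }

    // Scaling by a scalar: a * s
    Point opBinary(string op : "*")(float s) const
    {
        return Point(x * s, y * s);
    }
}

// A quadratic Bezier is assumed here purely to keep the sketch short.
struct Quad
{
    Point p0, p1, p2;
}

// Operator version: midpoints computed through the overloaded + and *.
void splitOps(Quad q, out Quad left, out Quad right)
{
    Point a = (q.p0 + q.p1) * 0.5f;
    Point b = (q.p1 + q.p2) * 0.5f;
    Point m = (a + b) * 0.5f;
    left = Quad(q.p0, a, m);
    right = Quad(m, b, q.p2);
}

// Hand-expanded version: the same midpoints written out per component.
void splitExpanded(Quad q, out Quad left, out Quad right)
{
    float ax = (q.p0.x + q.p1.x) * 0.5f;
    float ay = (q.p0.y + q.p1.y) * 0.5f;
    float bx = (q.p1.x + q.p2.x) * 0.5f;
    float by = (q.p1.y + q.p2.y) * 0.5f;
    float mx = (ax + bx) * 0.5f;
    float my = (ay + by) * 0.5f;
    left = Quad(q.p0, Point(ax, ay), Point(mx, my));
    right = Quad(Point(mx, my), Point(bx, by), q.p2);
}
```

In this sketch both functions perform the same arithmetic in the same order, which may not be true of the linked code, so the instruction counts quoted in the post should not be expected to reproduce from it.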

December 05, 2021

On Friday, 3 December 2021 at 21:24:07 UTC, claptrap wrote:
> Just a simple function to split a bezier in two.
>
> Using "-O3"
>
> LDC: the operator version is 84 instructions
> LDC: the hand-expanded math is 49 instructions.
>
> It seems something as simple as this should be better optimised? Or am I missing something?
>
> https://godbolt.org/z/4h9vob3Yo
>
> In fact, there are quite a few places where completely redundant code seems to be left in, e.g.:
>
> movss dword ptr [rsp - 24], xmm1
> movss xmm0, dword ptr [rip + .LCPI4_0]
> mulss xmm1, xmm0
> movss dword ptr [rsp - 24], xmm1
>
> movss dword ptr [rsp - 24], xmm2
> mulss xmm2, xmm0
> movss dword ptr [rsp - 24], xmm2

This is (to me at least) an odd one. Maybe there's a pass-ordering issue here leading to bad code.

Seems like GCC does not have this issue.

December 05, 2021

On Sunday, 5 December 2021 at 21:38:55 UTC, max haughton wrote:
> On Friday, 3 December 2021 at 21:24:07 UTC, claptrap wrote:
>> Just a simple function to split a bezier in two.
>>
>> Using "-O3"
>>
>> LDC: the operator version is 84 instructions
>> LDC: the hand-expanded math is 49 instructions.
>>
>> It seems something as simple as this should be better optimised? Or am I missing something?
>>
>> https://godbolt.org/z/4h9vob3Yo
>> [...]
>
> [...]
> Seems like GCC does not have this issue.

With gdc v11.1, I count 69 instructions for split and 51 for split2 (59 with -O3). So I guess there's a semantic difference here with the slightly changed evaluation order (2D addition before scaling).

With alias Point = __vector(float[2]), split is reduced to 28 instructions: https://godbolt.org/z/7ffebjaz8
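
A minimal sketch of what the vector-typed variant could look like. Only the alias `Point = __vector(float[2])` comes from the post; the helper name `midpoint` and the use of an explicit `half` vector are assumptions for illustration. It also assumes a compiler that accepts 64-bit vector types, such as LDC or GDC on x86-64; other compilers or targets may reject the alias.

```d
// Assumes a compiler and target that accept 64-bit vector types
// (for example LDC or GDC on x86-64).
alias Point = __vector(float[2]);

// Midpoint of two points. With a vector type, + and * are built-in
// element-wise operations, so no user-defined operator overloads are involved.
Point midpoint(Point a, Point b)
{
    Point half = [0.5f, 0.5f]; // explicit vector of scale factors
    return (a + b) * half;
}
```

Individual lanes can be read back through the `.array` property, for example `midpoint(a, b).array[0]` for the x component.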

December 06, 2021

On Sunday, 5 December 2021 at 23:36:21 UTC, kinke wrote:
> On Sunday, 5 December 2021 at 21:38:55 UTC, max haughton wrote:
>> On Friday, 3 December 2021 at 21:24:07 UTC, claptrap wrote:
>>> Just a simple function to split a bezier in two.
>>>
>>> Using "-O3"
>>>
>>> LDC: the operator version is 84 instructions
>>> LDC: the hand-expanded math is 49 instructions.
>>>
>>> It seems something as simple as this should be better optimised? Or am I missing something?
>>>
>>> https://godbolt.org/z/4h9vob3Yo
>>> [...]
>>
>> [...]
>> Seems like GCC does not have this issue.
>
> With gdc v11.1, I count 69 instructions for split and 51 for split2 (59 with -O3). So I guess there's a semantic difference here with the slightly changed evaluation order (2D addition before scaling).

gdc v11.1 doesn't inline the operator calls when I try it. If you try an earlier version, 10.2, it does, which reduces it to 48 instructions.

> With alias Point = __vector(float[2]), split is reduced to 28 instructions: https://godbolt.org/z/7ffebjaz8

Wow, that's awesome!

December 06, 2021

On Monday, 6 December 2021 at 00:38:18 UTC, ClapTrap wrote:
> On Sunday, 5 December 2021 at 23:36:21 UTC, kinke wrote:
>> On Sunday, 5 December 2021 at 21:38:55 UTC, max haughton wrote:
>>> On Friday, 3 December 2021 at 21:24:07 UTC, claptrap wrote:
>>>> Just a simple function to split a bezier in two.
>>>>
>>>> Using "-O3"
>>>>
>>>> LDC: the operator version is 84 instructions
>>>> LDC: the hand-expanded math is 49 instructions.
>>>>
>>>> It seems something as simple as this should be better optimised? Or am I missing something?
>>>>
>>>> https://godbolt.org/z/4h9vob3Yo
>>>> [...]
>>>
>>> [...]
>>> Seems like GCC does not have this issue.
>>
>> With gdc v11.1, I count 69 instructions for split and 51 for split2 (59 with -O3). So I guess there's a semantic difference here with the slightly changed evaluation order (2D addition before scaling).
>
> gdc v11.1 doesn't inline the operator calls when I try it. If you try an earlier version, 10.2, it does, which reduces it to 48 instructions.
>
>> With alias Point = __vector(float[2]), split is reduced to 28 instructions: https://godbolt.org/z/7ffebjaz8
>
> Wow, that's awesome!

To make GCC inline properly without LTO you can use -fwhole-program.

Maybe Iain also has a flag that restores the old template behaviour.

These kinds of wacky phase-ordering (I assume) issues are why I am slightly distrustful of GDC post-inlining decision.

December 06, 2021

On Monday, 6 December 2021 at 00:41:06 UTC, max haughton wrote:
> These kinds of wacky phase-ordering (I assume) issues are why I am slightly distrustful of GDC post-inlining decision.

Multiplying with 0.5 only affects the exponent, but the add could overflow/underflow. Maybe that is wacky for D since it specifies IEEE compliance, but in the gcc-family -O# is really a shortcut for a set of options. If I specify -O or -O3 I would expect the same options as gcc. Otherwise people will claim that C++ is faster?
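
The overflow concern can be illustrated with a small sketch. The function names `addThenScale` and `scaleThenAdd` are hypothetical, and the behaviour described assumes the arithmetic is actually carried out in IEEE single precision; D permits intermediate results to be kept at higher precision, so results may differ by compiler and target.

```d
// Add first, then scale: the sum a + b may overflow before it is halved.
float addThenScale(float a, float b)
{
    return (a + b) * 0.5f;
}

// Scale first, then add: each operand is halved before the addition,
// so the intermediate sum stays in range for large inputs.
float scaleThenAdd(float a, float b)
{
    return a * 0.5f + b * 0.5f;
}
```

Under strict single-precision evaluation, passing `float.max` for both arguments makes `a + b` overflow to infinity in `addThenScale`, while `scaleThenAdd` stays finite. Because the two forms can give different results, a compiler that respects IEEE semantics cannot freely rewrite one into the other.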

December 08, 2021

On Monday, 6 December 2021 at 11:55:08 UTC, Ola Fosheim Grøstad wrote:
> On Monday, 6 December 2021 at 00:41:06 UTC, max haughton wrote:
>> These kinds of wacky phase-ordering (I assume) issues are why I am slightly distrustful of GDC post-inlining decision.
>
> Multiplying with 0.5 only affects the exponent, but the add could overflow/underflow. Maybe that is wacky for D since it specifies IEEE compliance, but in the gcc-family -O# is really a shortcut for a set of options. If I specify -O or -O3 I would expect the same options as gcc. Otherwise people will claim that C++ is faster?

My sentence was referring to Iain's decision to refuse to inline templates (i.e. defer to LTO). Makes it harder to work out what the compiler is going to do / is doing.