Thread overview: optimize vector code
Nov 30, 2007  Sascha Katzner
Nov 30, 2007  Bill Baxter
Nov 30, 2007  Sascha Katzner
Nov 30, 2007  Bill Baxter
Nov 30, 2007  Sascha Katzner
Nov 30, 2007  Bill Baxter
Dec 01, 2007  Saaa
November 30, 2007
Hi,

I'm currently trying to optimize my vector/matrix code.

the relevant section:
> struct Vector3(T) {
> 	T x, y, z;
> 	void opAddAssign(Vector3 v) {
> 		x += v.x;
> 		y += v.y;
> 		z += v.z;
> 	}
> 	Vector3 opMul(T s) {
> 		return Vector3(x * s, y * s, z * s);
> 	}
> }

If you compare the resulting code from these two examples:

first:
> 	v1 += v2 * 3.0f;
=> 0x59 bytes

second:
> 	v1.x += v2.x * 3.0f;
> 	v1.y += v2.y * 3.0f;
> 	v1.z += v2.z * 3.0f;
=> 0x36 bytes

...it is rather obvious that the first example is not very well optimized, because opMul() creates a new struct. Is there any way to guide the compiler to generate more efficient code in that case?
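
(What I would like to avoid is having to hand-write a fused helper like the one below all over the place; "addScaled" is just a made-up name for illustration:)

> 	// hypothetical helper that fuses the multiply and the add,
> 	// so no temporary Vector3 is created
> 	void addScaled(Vector3 v, T s) {
> 		x += v.x * s;
> 		y += v.y * s;
> 		z += v.z * s;
> 	}

With that, the call site becomes v1.addScaled(v2, 3.0f) instead of v1 += v2 * 3.0f, which rather defeats the point of having the operators in the first place.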

LLAP,
Sascha

P.S.: attached is a complete example of this source.


November 30, 2007
Sascha Katzner wrote:
> Hi,
> 
> I'm currently trying to optimize my vector/matrix code.
> 
> the relevant section:
>> struct Vector3(T) {
>>     T x, y, z;
>>     void opAddAssign(Vector3 v) {
>>         x += v.x;
>>         y += v.y;
>>         z += v.z;
>>     }
>>     Vector3 opMul(T s) {
>>         return Vector3(x * s, y * s, z * s);
>>     }
>> }
> 
> If you compare the resulting code from these two examples:
> 
> first:
>>     v1 += v2 * 3.0f;
> => 0x59 bytes
> 
> second:
>>     v1.x += v2.x * 3.0f;
>>     v1.y += v2.y * 3.0f;
>>     v1.z += v2.z * 3.0f;
> => 0x36 bytes
> 
> ...it is rather obvious that the first example is not very well optimized, because opMul() creates a new struct. Is there any way to guide the compiler to generate more efficient code in that case?
> 
> LLAP,
> Sascha
> 
> P.S.: attached is a complete example of this source.
> 

Pass big structs by reference.
     void opAddAssign(ref Vector3 v) {...
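
In full, an untested sketch of what I mean:

     // take the parameter by reference so the whole struct
     // isn't copied onto the stack at every call
     void opAddAssign(ref Vector3 v) {
         x += v.x;
         y += v.y;
         z += v.z;
     }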

--bb
November 30, 2007
Bill Baxter wrote:
> Pass big structs by reference.
>      void opAddAssign(ref Vector3 v) {...

In this example that is not a good idea, because it prevents the compiler from inlining opAddAssign(). I don't know why, but it does.

yourSuggestion.sizeof = 0x38 + 0x23 = 0x5b bytes ;-)

LLAP,
Sascha
November 30, 2007
Sascha Katzner wrote:
> Bill Baxter wrote:
>> Pass big structs by reference.
>>      void opAddAssign(ref Vector3 v) {...
> 
> In this example that is not a good idea, because it prevents the compiler from inlining opAddAssign(). I don't know why, but it does.
> 
> yourSuggestion.sizeof = 0x38 + 0x23 = 0x5b bytes ;-)
> 
> LLAP,
> Sascha

All I know is that actual benchmarking has been done on raytracers, and changing all the pass-by-value parameters to pass-by-ref improved speed.

I have no idea what your sizeof is benchmarking there.  But if you're interested in actual execution speed I suggest measuring time rather than bytes.  I'd be very interested to know if pass-by-ref is no longer faster than pass-by-value for big structs.
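
Something like this is enough for a rough wall-clock number (just a sketch; the timeIt name and the loop you pass in are up to you, and older Phobos had clock() under std.c.time rather than core.stdc.time, if I remember right):

     import core.stdc.time : clock, CLOCKS_PER_SEC;

     // run the given delegate once and return the elapsed CPU time in seconds
     double timeIt(void delegate() benchLoop) {
         auto start = clock();
         benchLoop();
         auto end = clock();
         return cast(double)(end - start) / CLOCKS_PER_SEC;
     }

Wrap each of your three variants in a loop with enough iterations to run for a few seconds and pass that in as the delegate.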

--bb
November 30, 2007
Bill Baxter wrote:
> All I know is that actual benchmarking has been done on raytracers, and changing all the pass-by-value parameters to pass-by-ref improved speed.
> 
> I have no idea what your sizeof is benchmarking there.  But if you're interested in actual execution speed I suggest measuring time rather than bytes.  I'd be very interested to know if pass-by-ref is no longer faster than pass-by-value for big structs.

You're right, comparing the size of the generated code was a VERY rough estimate... and wrong in this case.

I've benchmarked the three cases and got:
9.5s without ref
6.7s with ref (<- your suggestion)
4.1s manual inlined

So, it is indeed a lot faster, but still not as fast as inlining the functions manually. :(

LLAP,
Sascha


November 30, 2007
Sascha Katzner wrote:
> Bill Baxter wrote:
>> All I know is that actual benchmarking has been done on raytracers, and changing all the pass-by-value parameters to pass-by-ref improved speed.
>>
>> I have no idea what your sizeof is benchmarking there.  But if you're interested in actual execution speed I suggest measuring time rather than bytes.  I'd be very interested to know if pass-by-ref is no longer faster than pass-by-value for big structs.
> 
> You're right, comparing the size of the generated code was a VERY rough estimate... and wrong in this case.
> 
> I've benchmarked the three cases and got:
> 9.5s without ref
> 6.7s with ref (<- your suggestion)
> 4.1s manual inlined
> 
> So, it is indeed a lot faster, but still not as fast as inlining the functions manually. :(

It's been mentioned before that DMD is particularly poor at floating point optimizations.  If it matters a lot to you, you might be able to get better results from GDC, which uses gcc's backend.  If you do try it I'd love to hear the benchmark results.
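
For a fair comparison make sure optimization and inlining are turned on for both compilers; roughly something like this (the GDC flag spellings are from memory, so check its docs):

     dmd -O -inline -release bench.d
     gdc -O3 -frelease -o bench bench.d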

--bb
December 01, 2007
> > So, it is indeed a lot faster, but still not as fast as inlining the functions manually. :(
> 
> It's been mentioned before that DMD is particularly poor at floating point optimizations.  If it matters a lot to you, you might be able to get better results from GDC, which uses gcc's backend.  If you do try it I'd love to hear the benchmark results.
> 
> --bb

If the inlining was done correctly, how could floating-point optimizations account for the difference in speed? Or am I missing something? (probably :)