Thread overview: optimize vector code
Nov 30, 2007  Sascha Katzner
Nov 30, 2007  Bill Baxter
Nov 30, 2007  Sascha Katzner
Nov 30, 2007  Bill Baxter
Nov 30, 2007  Sascha Katzner
Nov 30, 2007  Bill Baxter
Dec 01, 2007  Saaa
November 30, 2007
Hi,

I'm currently trying to optimize my vector/matrix code.

the relevant section:
> struct Vector3(T) {
> 	T x, y, z;
> 	void opAddAssign(Vector3 v) {
> 		x += v.x;
> 		y += v.y;
> 		z += v.z;
> 	}
> 	Vector3 opMul(T s) {
> 		return Vector3(x * s, y * s, z * s);
> 	}
> }

If you compare the resulting code from these two examples:

first:
> 	v1 += v2 * 3.0f;
=> 0x59 bytes

second:
> 	v1.x += v2.x * 3.0f;
> 	v1.y += v2.y * 3.0f;
> 	v1.z += v2.z * 3.0f;
=> 0x36 bytes

...it is rather obvious that the first example is not very well optimized, because opMul() creates a new struct. Is there any way to guide the compiler to generate more efficient code in that case?
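
(What I would like to avoid is having to hand-write a fused helper like the one below all over the place; "addScaled" is just a made-up name for illustration:)

> 	// hypothetical helper that fuses the multiply and the add,
> 	// so no temporary Vector3 is created
> 	void addScaled(Vector3 v, T s) {
> 		x += v.x * s;
> 		y += v.y * s;
> 		z += v.z * s;
> 	}

With that, the call site becomes v1.addScaled(v2, 3.0f) instead of v1 += v2 * 3.0f, which rather defeats the point of having the operators in the first place.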

LLAP,
Sascha

P.S.: attached is a complete example of this source.


November 30, 2007
Sascha Katzner wrote:
> Hi,
> 
> I'm currently trying to optimize my vector/matrix code.
> 
> the relevant section:
>> struct Vector3(T) {
>>     T x, y, z;
>>     void opAddAssign(Vector3 v) {
>>         x += v.x;
>>         y += v.y;
>>         z += v.z;
>>     }
>>     Vector3 opMul(T s) {
>>         return Vector3(x * s, y * s, z * s);
>>     }
>> }
> 
> If you compare the resulting code from these two examples:
> 
> first:
>>     v1 += v2 * 3.0f;
> => 0x59 bytes
> 
> second:
>>     v1.x += v2.x * 3.0f;
>>     v1.y += v2.y * 3.0f;
>>     v1.z += v2.z * 3.0f;
> => 0x36 bytes
> 
> ...it is rather obvious that the first example is not very well optimized, because opMul() creates a new struct. Is there any way to guide the compiler to generate more efficient code in that case?
> 
> LLAP,
> Sascha
> 
> P.S.: attached is a complete example of this source.
> 

Pass big structs by reference.
     void opAddAssign(ref Vector3 v) {...
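
In full, an untested sketch of what I mean:

     // take the parameter by reference so the whole struct
     // isn't copied onto the stack at every call
     void opAddAssign(ref Vector3 v) {
         x += v.x;
         y += v.y;
         z += v.z;
     }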

--bb
November 30, 2007
Bill Baxter wrote:
> Pass big structs by reference.
>      void opAddAssign(ref Vector3 v) {...

In this example that is not a good idea, because it prevents the compiler from inlining opAddAssign(). I don't know why, but it does.

yourSuggestion.sizeof = 0x38 + 0x23 = 0x5b bytes ;-)

LLAP,
Sascha
November 30, 2007
Sascha Katzner wrote:
> Bill Baxter wrote:
>> Pass big structs by reference.
>>      void opAddAssign(ref Vector3 v) {...
> 
> In this example that is not a good idea, because it prevents the compiler from inlining opAddAssign(). I don't know why, but it does.
> 
> yourSuggestion.sizeof = 0x38 + 0x23 = 0x5b bytes ;-)
> 
> LLAP,
> Sascha

All I know is that actual benchmarking has been done on raytracers, and changing all the pass-by-value parameters to pass-by-ref improved speed.

I have no idea what your sizeof is benchmarking there.  But if you're interested in actual execution speed I suggest measuring time rather than bytes.  I'd be very interested to know if pass-by-ref is no longer faster than pass-by-value for big structs.
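
Something like this is enough for a rough wall-clock number (just a sketch; the timeIt name and the loop you pass in are up to you, and older Phobos had clock() under std.c.time rather than core.stdc.time, if I remember right):

     import core.stdc.time : clock, CLOCKS_PER_SEC;

     // run the given delegate once and return the elapsed CPU time in seconds
     double timeIt(void delegate() benchLoop) {
         auto start = clock();
         benchLoop();
         auto end = clock();
         return cast(double)(end - start) / CLOCKS_PER_SEC;
     }

Wrap each of your three variants in a loop with enough iterations to run for a few seconds and pass that in as the delegate.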

--bb
November 30, 2007
Bill Baxter wrote:
> All I know is that actual benchmarking has been done on raytracers, and changing all the pass-by-value parameters to pass-by-ref improved speed.
> 
> I have no idea what your sizeof is benchmarking there.  But if you're interested in actual execution speed I suggest measuring time rather than bytes.  I'd be very interested to know if pass-by-ref is no longer faster than pass-by-value for big structs.

You're right, comparing the size of the generated code was a VERY rough estimate... and wrong in this case.

I've benchmarked the three cases and got:
9.5s without ref
6.7s with ref (<- your suggestion)
4.1s manual inlined

So, it is indeed a lot faster, but still not as fast as inlining the functions manually. :(

LLAP,
Sascha


November 30, 2007
Sascha Katzner wrote:
> Bill Baxter wrote:
>> All I know is that actual benchmarking has been done on raytracers, and changing all the pass-by-value parameters to pass-by-ref improved speed.
>>
>> I have no idea what your sizeof is benchmarking there.  But if you're interested in actual execution speed I suggest measuring time rather than bytes.  I'd be very interested to know if pass-by-ref is no longer faster than pass-by-value for big structs.
> 
> You're right, comparing the size of the generated code was a VERY rough estimate... and wrong in this case.
> 
> I've benchmarked the three cases and got:
> 9.5s without ref
> 6.7s with ref (<- your suggestion)
> 4.1s manual inlined
> 
> So, it is indeed a lot faster, but still not as fast as inlining the functions manually. :(

It's been mentioned before that DMD is particularly poor at floating point optimizations.  If it matters a lot to you, you might be able to get better results from GDC, which uses gcc's backend.  If you do try it I'd love to hear the benchmark results.
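
For a fair comparison make sure optimization and inlining are turned on for both compilers; roughly something like this (the GDC flag spellings are from memory, so check its docs):

     dmd -O -inline -release bench.d
     gdc -O3 -frelease -o bench bench.d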

--bb
December 01, 2007
> > So, it is indeed a lot faster, but still not as fast as inlining the functions manually. :(
> 
> It's been mentioned before that DMD is particularly poor at floating point optimizations.  If it matters a lot to you, you might be able to get better results from GDC, which uses gcc's backend.  If you do try it I'd love to hear the benchmark results.
> 
> --bb

If the inlining was done correctly, how could floating-point optimizations account for the difference in speed? Or am I missing something? (probably :)