November 30, 2007 optimize vector code
Hi,

I'm currently trying to optimize my vector/matrix code.

The relevant section:
> struct Vector3(T) {
> T x, y, z;
> void opAddAssign(Vector3 v) {
> x += v.x;
> y += v.y;
> z += v.z;
> }
> Vector3 opMul(T s) {
> return Vector3(x * s, y * s, z * s);
> }
> }

If you compare the resulting code from these two examples:

first:
> v1 += v2 * 3.0f;
=> 0x59 bytes

second:
> v1.x += v2.x * 3.0f;
> v1.y += v2.y * 3.0f;
> v1.z += v2.z * 3.0f;
=> 0x36 bytes

...it is rather obvious that this is not very well optimized, because opMul() creates a new struct in the first example. Is there any way to guide the compiler in the first example to create more efficient code?

LLAP,
Sascha

P.S. attached a complete example of this source
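
One possible way to sidestep the temporary that opMul() creates is to fuse the multiply and the add into a single member function. The sketch below is only an illustration: it uses the current D2 operator names (opOpAssign and opBinary instead of the opAddAssign and opMul shown in the post), and `addScaled` is a hypothetical helper that does not appear in the post or its attachment.

```d
// Minimal sketch in D2 syntax. `addScaled` is a hypothetical name.
struct Vector3(T)
{
    T x, y, z;

    // D2 spelling of the post's opAddAssign: v1 += v2
    void opOpAssign(string op)(Vector3 v)
        if (op == "+")
    {
        x += v.x;
        y += v.y;
        z += v.z;
    }

    // D2 spelling of the post's opMul: returns a new struct
    Vector3 opBinary(string op)(T s) const
        if (op == "*")
    {
        return Vector3(x * s, y * s, z * s);
    }

    // Fused form of `this += v * s`: no intermediate Vector3 is built.
    void addScaled(ref const Vector3 v, T s)
    {
        x += v.x * s;
        y += v.y * s;
        z += v.z * s;
    }
}
```

With this, `v1.addScaled(v2, 3.0f);` is meant to compute the same result as `v1 += v2 * 3.0f;`. Whether it is actually faster depends on the compiler and flags, so it would need to be timed.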

November 30, 2007 Re: optimize vector code
Posted in reply to Sascha Katzner

Sascha Katzner wrote:
> Hi,
>
> I'm currently trying to optimize my vector/matrix code.
>
> the relevant section:
>> struct Vector3(T) {
>> T x, y, z;
>> void opAddAssign(Vector3 v) {
>> x += v.x;
>> y += v.y;
>> z += v.z;
>> }
>> Vector3 opMul(T s) {
>> return Vector3(x * s, y * s, z * s);
>> }
>> }
>
> If you compare the resulting code from these two examples:
>
> first:
>> v1 += v2 * 3.0f;
> => 0x59 bytes
>
> second:
>> v1.x += v2.x * 3.0f;
>> v1.y += v2.y * 3.0f;
>> v1.z += v2.z * 3.0f;
> => 0x36 bytes
>
> ...it is rather obvious that this is not very well optimized, because opMul() creates a new struct in the first example. Is there any way to guide the compiler in the first example to create more efficient code?
>
> LLAP,
> Sascha
>
> P.S. attached a complete example of this source
>
Pass big structs by reference.
void opAddAssign(ref Vector3 v) {...
--bb
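
For readers on a current D2 compiler, the pass-by-reference suggestion might look like the sketch below. Two things here are assumptions rather than part of the thread: the D2 operator names (opOpAssign, opBinary), and the use of `auto ref`. In D2 a plain `ref` parameter does not accept a temporary such as `v2 * 3.0f` by default, whereas `auto ref` in a templated function takes lvalues by reference and rvalues by value.

```d
// Minimal sketch in D2 syntax, not the code from the post's attachment.
struct Vector3(T)
{
    T x, y, z;

    // Lvalue arguments are passed by reference, temporaries by value.
    void opOpAssign(string op)(auto ref const Vector3 v)
        if (op == "+")
    {
        x += v.x;
        y += v.y;
        z += v.z;
    }

    // Scaling still returns a new struct, as in the original opMul.
    Vector3 opBinary(string op)(T s) const
        if (op == "*")
    {
        return Vector3(x * s, y * s, z * s);
    }
}
```

Both `v1 += v2;` and `v1 += v2 * 3.0f;` should compile against this definition; only the first avoids copying the argument.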

November 30, 2007 Re: optimize vector code
Posted in reply to Bill Baxter

Bill Baxter wrote:
> Pass big structs by reference.
> void opAddAssign(ref Vector3 v) {...
In this example this is not a good idea, because it prevents the compiler from inlining opAddAssign(). I don't know why, but it does.
yourSuggestion.sizeof = 0x38 + 0x23 = 0x5b bytes ;-)
LLAP,
Sascha

November 30, 2007 Re: optimize vector code
Posted in reply to Sascha Katzner

Sascha Katzner wrote:
> Bill Baxter wrote:
>> Pass big structs by reference.
>> void opAddAssign(ref Vector3 v) {...
>
> In this example this is not a good idea, because it prevents the compiler from inlining opAddAssign(). I don't know why, but it does.
>
> yourSuggestion.sizeof = 0x38 + 0x23 = 0x5b bytes ;-)
>
> LLAP,
> Sascha
All I know is that actual benchmarking has been done on raytracers and changing all the pass-by-values to pass-by-ref improved speed.
I have no idea what your sizeof is benchmarking there. But if you're interested in actual execution speed I suggest measuring time rather than bytes. I'd be very interested to know if pass-by-ref is no longer faster than pass-by-value for big structs.
--bb

November 30, 2007 Re: optimize vector code
Posted in reply to Bill Baxter

Bill Baxter wrote:
> All I know is that actual benchmarking has been done on raytracers and changing all the pass-by-values to pass-by-ref improved speed.
>
> I have no idea what your sizeof is benchmarking there. But if you're interested in actual execution speed I suggest measuring time rather than bytes. I'd be very interested to know if pass-by-ref is no longer faster than pass-by-value for big structs.
You're right, comparing the size of the generated code was a VERY rough estimate... and wrong in this case.
I've benchmarked the three cases and got:
9.5s without ref
6.7s with ref (<- your suggestion)
4.1s manually inlined
So, it is a lot faster indeed, but still not as fast as inlining the functions manually. :(
LLAP,
Sascha
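
The benchmark source itself is only in the attachment, so the following is merely a sketch of how such a timing comparison could be set up today, assuming D2 and Phobos' `std.datetime.stopwatch`. The function names (`sumWithOperators`, `sumInlined`, `timeIt`) and the loop bodies are hypothetical, and only two of the three cases are shown.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

// D2 spelling of the struct from the first post (pass-by-value variant).
struct Vector3(T)
{
    T x, y, z;

    void opOpAssign(string op)(Vector3 v)
        if (op == "+")
    {
        x += v.x;
        y += v.y;
        z += v.z;
    }

    Vector3 opBinary(string op)(T s) const
        if (op == "*")
    {
        return Vector3(x * s, y * s, z * s);
    }
}

// Case "without ref": goes through the overloaded operators.
Vector3!float sumWithOperators(size_t iterations)
{
    auto v1 = Vector3!float(0, 0, 0);
    auto v2 = Vector3!float(1, 2, 3);
    foreach (i; 0 .. iterations)
        v1 += v2 * 3.0f;
    return v1;
}

// Case "manually inlined": the same arithmetic written out per component.
Vector3!float sumInlined(size_t iterations)
{
    auto v1 = Vector3!float(0, 0, 0);
    auto v2 = Vector3!float(1, 2, 3);
    foreach (i; 0 .. iterations)
    {
        v1.x += v2.x * 3.0f;
        v1.y += v2.y * 3.0f;
        v1.z += v2.z * 3.0f;
    }
    return v1;
}

// Runs `fn` once and returns the elapsed wall-clock time.
// The result is handed back so the work cannot be discarded as unused.
Duration timeIt(alias fn)(size_t iterations, out Vector3!float result)
{
    auto sw = StopWatch(AutoStart.yes);
    result = fn(iterations);
    sw.stop();
    return sw.peek();
}
```

A call such as `timeIt!sumWithOperators(n, r)` returns a `Duration` that can be printed with `writeln`. The numbers are only meaningful with optimization enabled (for DMD, `-O -release -inline`), and they will differ from the 2007 figures above.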

November 30, 2007 Re: optimize vector code
Posted in reply to Sascha Katzner

Sascha Katzner wrote:
> Bill Baxter wrote:
>> All I know is that actual benchmarking has been done on raytracers and changing all the pass-by-values to pass-by-ref improved speed.
>>
>> I have no idea what your sizeof is benchmarking there. But if you're interested in actual execution speed I suggest measuring time rather than bytes. I'd be very interested to know if pass-by-ref is no longer faster than pass-by-value for big structs.
>
> You're right, comparing the size of the generated code was a VERY rough estimate... and wrong in this case.
>
> I've benchmarked the three cases and got:
> 9.5s without ref
> 6.7s with ref (<- your suggestion)
> 4.1s manually inlined
>
> So, it is a lot faster indeed, but still not as fast as inlining the functions manually. :(
It's been mentioned before that DMD is particularly poor at floating point optimizations. If it matters a lot to you, you might be able to get better results from GDC, which uses gcc's backend. If you do try it I'd love to hear the benchmark results.
--bb

December 01, 2007 Re: optimize vector code
Posted in reply to Bill Baxter

> > So, it is a lot faster indeed, but still not as fast as inlining the functions manually. :(
>
> It's been mentioned before that DMD is particularly poor at floating point optimizations. If it matters a lot to you, you might be able to get better results from GDC, which uses gcc's backend. If you do try it I'd love to hear the benchmark results.
>
> --bb
If the inlining was done correctly, how could floating-point optimizations account for the difference in speed? Or am I missing something? (probably :)