August 06, 2011 Re: From a C++/JS benchmark
Posted in reply to Iain Buclaw | Iain Buclaw:
> 1) using pointers over dynamic arrays. (5% speedup)
> 2) removing the calls to CalVector4's constructor (5.7% speedup)
With DMD I have seen 180k -> 190k vertices/sec replacing this:
struct CalVector4 {
    float X, Y, Z, W;
    this(float x, float y, float z, float w = 0.0f) {
        X = x;
        Y = y;
        Z = z;
        W = w;
    }
}
With:
struct CalVector4 {
    float X, Y, Z, W=0.0f;
}
I'd like the D compiler to optimize better there.
> http://ideone.com/4PP2D
This line of code is not good:
auto vertices = cast(Vertex *) new Vertex[N];
This is much better, it's less bug-prone, simpler and shorter:
auto vertices = (new Vertex[N]).ptr;
But in practice in this program it is enough to allocate dynamic arrays normally, and then perform the call like this (with DMD it gives the same performance):
calculateVerticesAndNormals(boneTransforms.ptr, N, vertices.ptr, influences.ptr, output.ptr);
I don't know why passing pointers gives some more performance here, compared to passing dynamic arrays (but I have seen the same behaviour in other D programs of mine).
Bye,
bearophile
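A minimal, self-contained sketch of the two suggestions in this post: the constructor-free `CalVector4` with a default field initializer, and allocating a normal dynamic array while passing `.ptr` only at the call site. `sumX` is a hypothetical stand-in for `calculateVerticesAndNormals`, whose body is not shown in the thread; it is not the benchmark's kernel.

```d
import std.stdio : writeln;

// Plain aggregate: only W gets an explicit default; X, Y and Z keep
// D's default float initializer (NaN) until they are assigned.
struct CalVector4 {
    float X, Y, Z, W = 0.0f;
}

// Hypothetical stand-in for the benchmark kernel: raw pointer plus count.
float sumX(const(CalVector4)* v, size_t n) {
    float total = 0.0f;
    foreach (i; 0 .. n)
        total += v[i].X;
    return total;
}

void main() {
    // Struct literal syntax fills fields in declaration order; W falls back to 0.0f.
    auto a = CalVector4(1.0f, 2.0f, 3.0f);
    assert(a.W == 0.0f);

    // Allocate a normal dynamic array and hand over .ptr only at the call site.
    auto vertices = new CalVector4[4];
    foreach (ref v; vertices)
        v = CalVector4(1.0f, 0.0f, 0.0f);
    writeln(sumX(vertices.ptr, vertices.length));
}
```

Keeping the slice around and taking `.ptr` late means the length stays available for bounds-checked code elsewhere in the program.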
August 06, 2011 Re: From a C++/JS benchmark
Posted in reply to bearophile | On 8/6/2011 3:19 PM, bearophile wrote:
> I don't know why passing pointers gives some more performance here, compared
> to passing dynamic arrays (but I have seen the same behaviour in other D
> programs of mine).
A dynamic array is two values being passed, a pointer is one.
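The size difference can be checked at compile time. This is a small sketch, not part of the benchmark: it only shows that a D slice occupies two machine words (length and pointer) while a raw pointer occupies one.

```d
import std.stdio : writeln;

// A dynamic array (slice) is a length plus a pointer; a raw pointer is one word.
static assert((int[]).sizeof == 2 * size_t.sizeof);
static assert((int*).sizeof == size_t.sizeof);

void main() {
    auto a = new int[100];
    // The two values that travel with every slice argument:
    writeln(a.length, " ", a.ptr !is null);
}
```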
August 06, 2011 Re: From a C++/JS benchmark
Posted in reply to bearophile | == Quote from bearophile (bearophileHUGS@lycos.com)'s article
> Iain Buclaw:
> > 1) using pointers over dynamic arrays. (5% speedup)
> > 2) removing the calls to CalVector4's constructor (5.7% speedup)
> With DMD I have seen 180k -> 190k vertices/sec replacing this:
> struct CalVector4 {
>     float X, Y, Z, W;
>     this(float x, float y, float z, float w = 0.0f) {
>         X = x;
>         Y = y;
>         Z = z;
>         W = w;
>     }
> }
> With:
> struct CalVector4 {
>     float X, Y, Z, W=0.0f;
> }
> I'd like the D compiler to optimize better there.
> > http://ideone.com/4PP2D
> This line of code is not good:
> auto vertices = cast(Vertex *) new Vertex[N];
> This is much better, it's less bug-prone, simpler and shorter:
> auto vertices = (new Vertex[N]).ptr;
> But in practice in this program it is enough to allocate dynamic arrays normally, and then perform the call like this (with DMD it gives the same performance):
> calculateVerticesAndNormals(boneTransforms.ptr, N, vertices.ptr, influences.ptr, output.ptr);
I was playing about with heap vs stack. Must've forgot to remove that, sorry. :)
Anyways, I've tweaked the GDC codegen, and program speed meets that of C++ now (on my system).
Implementation: http://ideone.com/0j0L1
Command-line:
gdc -O3 -mfpmath=sse -ffast-math -march=native -frelease
g++ bench.cc -O3 -mfpmath=sse -ffast-math -march=native
Best times:
G++-32bit: 11400000 vps
GDC-32bit: 11350000 vps
Regards
Iain
August 06, 2011 Re: From a C++/JS benchmark
Posted in reply to Walter Bright | Walter:
> A dynamic array is two values being passed, a pointer is one.
I know, but I think there are many optimization opportunities. An example:
private void foo(int[] a2) {}
void main() {
    int[100] a1;
    foo(a1);
}
In code like that I think a D compiler is free to compile like this, because foo is private, so it's free to perform optimizations based on just the code inside the module:
private void foo(ref int[100] a2) {}
void main() {
    int[100] a1;
    foo(a1);
}
I think there are several cases where a D compiler is free to replace the two values with just a pointer.
Another example, to optimize code like this:
private void foo(int[] a1, int[] a2) {}
void main() {
    int n = 100; // run-time value
    auto a3 = new int[n];
    auto a4 = new int[n];
    foo(a3, a4);
}
Into something like this:
private void foo(int* a1, int* a2, size_t a1a2len) {}
void main() {
    int n = 100;
    auto a3 = new int[n];
    auto a4 = new int[n];
    foo(a3.ptr, a4.ptr, n);
}
Bye,
bearophile
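The second rewrite proposed in this post can be done by hand to see what the compiler would have to prove. The sketch below assumes two hypothetical functions, `dotSlices` and `dotPointers`, that are not from the thread. The pointer form is only equivalent when both slices are known to have the same length, and it gives up the bounds checks that slice indexing provides.

```d
// Slice version: each argument carries its own pointer and length (four words in total).
private int dotSlices(const(int)[] a1, const(int)[] a2) {
    assert(a1.length == a2.length);
    int total = 0;
    foreach (i; 0 .. a1.length)
        total += a1[i] * a2[i];
    return total;
}

// Pointer version with one shared length (three words in total).
// Pointer indexing is not bounds-checked, so len must be correct.
private int dotPointers(const(int)* a1, const(int)* a2, size_t len) {
    int total = 0;
    foreach (i; 0 .. len)
        total += a1[i] * a2[i];
    return total;
}

void main() {
    int n = 100; // run-time value
    auto a3 = new int[n];
    auto a4 = new int[n];
    a3[] = 2;
    a4[] = 3;
    // Both forms compute the same result when the lengths agree.
    assert(dotSlices(a3, a4) == dotPointers(a3.ptr, a4.ptr, n));
}
```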
August 06, 2011 Re: From a C++/JS benchmark
Posted in reply to Iain Buclaw | Iain Buclaw:
> Anyways, I've tweaked the GDC codegen, and program speed meets that of C++ now (on
> my system).
Are you willing to explain your changes (and maybe give a link to the changes)? Maybe Walter is interested for DMD too.
> Command-line:
> gdc -O3 -mfpmath=sse -ffast-math -march=native -frelease
> g++ bench.cc -O3 -mfpmath=sse -ffast-math -march=native
In newer versions of GCC -Ofast means -ffast-math too.
Walter is not a lover of that -ffast-math switch. But I now think that the combination of D strongly pure functions with unsafe FP optimizations offers optimization opportunities that maybe not even GCC is able to use now when it compiles C/C++ code (do you see why?). Not using this opportunity is a waste, in my opinion.
Bye,
bearophile
August 07, 2011 Re: From a C++/JS benchmark
Posted in reply to bearophile | On 8/6/2011 4:46 PM, bearophile wrote:
> Walter is not a lover of that -ffast-math switch.
No, I am not. Few understand the subtleties of IEEE arithmetic, and breaking IEEE conformance is something very, very few should even consider.
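One concrete subtlety is that IEEE addition is not associative, so regrouping a sum can change the result. Switches such as -ffast-math allow the compiler to regroup like this. The sketch below uses two hypothetical helpers, `sumLeft` and `sumRight`, and values chosen so the difference is visible. It assumes strict IEEE semantics, so the assertion may not hold when built with unsafe FP optimizations.

```d
import std.stdio : writeln;

// Same three operands, two groupings.
double sumLeft(double a, double b, double c)  { return (a + b) + c; }
double sumRight(double a, double b, double c) { return a + (b + c); }

void main() {
    double a = 1e300, b = -1e300, c = 1.0;

    double left  = sumLeft(a, b, c);  // a and b cancel first, then c is added
    double right = sumRight(a, b, c); // c is absorbed into b before the cancellation

    writeln(left, " ", right);
    assert(left != right);
}
```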
August 07, 2011 Re: From a C++/JS benchmark
Posted in reply to Walter Bright | Walter:
> On 8/6/2011 4:46 PM, bearophile wrote:
> > Walter is not a lover of that -ffast-math switch.
>
> No, I am not. Few understand the subtleties of IEEE arithmetic, and breaking IEEE conformance is something very, very few should even consider.
I have read several papers about FP arithmetic, but I am not an expert yet on them. Both GDC and LDC have compilation switches to perform those unsafe FP optimizations, so even if you don't like them, most D compilers today have them optional, and I don't think those switches will be removed.
If you want to simulate a flock of boids (http://en.wikipedia.org/wiki/Boids ) on the screen using D, and you use floating point values to represent their speed vector, introducing unsafe FP optimizations will not harm so much. Video games are a significant purpose for D language, and in them FP errors are often benign (maybe some parts of the game are able to tolerate them and some other part of the game needs to be compiled with strict FP semantics).
Bye,
bearophile
August 07, 2011 Re: From a C++/JS benchmark
Posted in reply to Iain Buclaw | > Anyways, I've tweaked the GDC codegen, and program speed meets that of C++ now (on my system).
>
> Implementation: http://ideone.com/0j0L1
>
> Command-line:
> gdc -O3 -mfpmath=sse -ffast-math -march=native -frelease
> g++ bench.cc -O3 -mfpmath=sse -ffast-math -march=native
>
> Best times:
> G++-32bit: 11400000 vps
> GDC-32bit: 11350000 vps
>
>
> Regards
> Iain
64Bit:
C++:
45010000
44270000
42740000
43900000
44680000
43490000
42390000
GDC:
42900000
44010000
44000000
44010000
44010000
44000000
GDC with -fno-bounds-check:
43280000
44440000
44420000
44340000
44440000
44450000
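To compare the three 64-bit series at a glance, the runs listed above can be averaged. This is only a sketch: `meanVps` is a hypothetical helper, the figures are copied from this post, and a plain integer mean over six or seven runs says nothing about statistical significance.

```d
import std.stdio : writefln;

// Integer mean of a list of vertices-per-second readings.
ulong meanVps(const(ulong)[] runs) {
    ulong total = 0;
    foreach (r; runs)
        total += r;
    return total / runs.length;
}

void main() {
    immutable ulong[] cpp = [45_010_000, 44_270_000, 42_740_000, 43_900_000,
                             44_680_000, 43_490_000, 42_390_000];
    immutable ulong[] gdc = [42_900_000, 44_010_000, 44_000_000, 44_010_000,
                             44_010_000, 44_000_000];
    immutable ulong[] gdcNoBounds = [43_280_000, 44_440_000, 44_420_000,
                                     44_340_000, 44_440_000, 44_450_000];

    writefln("C++: %s vps", meanVps(cpp));
    writefln("GDC: %s vps", meanVps(gdc));
    writefln("GDC with -fno-bounds-check: %s vps", meanVps(gdcNoBounds));
}
```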
August 08, 2011 Re: From a C++/JS benchmark
Posted in reply to bearophile | On 8/6/2011 8:34 PM, bearophile wrote:
> Walter:
>
>> On 8/6/2011 4:46 PM, bearophile wrote:
>>> Walter is not a lover of that -ffast-math switch.
>>
>> No, I am not. Few understand the subtleties of IEEE arithmetic, and breaking
>> IEEE conformance is something very, very few should even consider.
>
> I have read several papers about FP arithmetic, but I am not an expert yet on them. Both GDC and LDC have compilation switches to perform those unsafe FP optimizations, so even if you don't like them, most D compilers today have them optional, and I don't think those switches will be removed.
>
> If you want to simulate a flock of boids (http://en.wikipedia.org/wiki/Boids ) on the screen using D, and you use floating point values to represent their speed vector, introducing unsafe FP optimizations will not harm so much. Video games are a significant purpose for D language, and in them FP errors are often benign (maybe some parts of the game are able to tolerate them and some other part of the game needs to be compiled with strict FP semantics).
>
> Bye,
> bearophile
Floating point determinism can be very important when it comes to reducing network traffic. If you can achieve it, then you can make sure all players have the same game state and then only send user input commands over the network.
Glenn Fiedler has an interesting writeup on it, but I haven't had a chance to read all of it yet:
http://gafferongames.com/networking-for-game-programmers/floating-point-determinism/
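The input-only approach described here can be sketched as a pure, fixed-timestep update that every peer replays from the same command stream. `State`, `Input`, `step` and `replay` are hypothetical names for illustration, not taken from any game or from the linked article. Running both "peers" in one process, as below, agrees trivially; the hard part the post refers to is getting the same bits across different compilers, flags and CPUs.

```d
// Hypothetical game state and input types, for illustration only.
struct State { float x = 0.0f, v = 0.0f; }
enum Input : byte { none = 0, left = -1, right = 1 }

// One fixed-timestep update. Pure: the result depends only on the arguments.
State step(State s, Input cmd) pure nothrow @nogc @safe {
    enum float dt = 1.0f / 64.0f;
    s.v += cast(float) cmd * 0.5f;
    s.x += s.v * dt;
    return s;
}

// Rebuild the state from the command stream alone.
State replay(const(Input)[] cmds) pure nothrow @nogc @safe {
    State s;
    foreach (cmd; cmds)
        s = step(s, cmd);
    return s;
}

void main() {
    // Only this input stream would cross the network; each peer replays it locally.
    immutable Input[] wire = [Input.right, Input.right, Input.none, Input.left];
    auto peerA = replay(wire);
    auto peerB = replay(wire);
    // Lockstep needs identical state on every peer.
    assert(peerA == peerB);
}
```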
August 08, 2011 Re: From a C++/JS benchmark
Posted in reply to Eric Poggel (JoeCoder) | Eric Poggel (JoeCoder):
> determinism can be very important when it comes to reducing network traffic. If you can achieve it, then you can make sure all players have the same game state and then only send user input commands over the network.
It seems a hard thing to obtain, but I agree that it gets useful.
For me having some FP determinism is useful for debugging: it keeps results from changing randomly when a tiny change in the source code triggers a change in which optimizations the compiler performs.
But there are several situations (if I am writing a ray tracer?) where FP determinism is not required in my release build. I was not arguing for removing FP rules from the D compiler, just that there are situations where relaxing those FP rules, on request, doesn't seem to do harm. I am not an expert on the risks Walter was talking about, so maybe I'm just walking on thin ice (but no one will get hurt if my little raytracer produces some errors in its images).
You don't come to this newsgroup often, thank you for the link :-)
Bye,
bearophile
Copyright © 1999-2021 by the D Language Foundation