August 06, 2011
Iain Buclaw:

> 1) using pointers over dynamic arrays. (5% speedup)
> 2) removing the calls to CalVector4's constructor (5.7% speedup)

With DMD I have seen 180k -> 190k vertices/sec replacing this:

struct CalVector4 {
    float X, Y, Z, W;

    this(float x, float y, float z, float w = 0.0f) {
        X = x;
        Y = y;
        Z = z;
        W = w;
    }
}

With:

struct CalVector4 {
    float X, Y, Z, W=0.0f;
}

I'd like the D compiler to optimize better there.
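
A minimal sketch of what the constructor-free version relies on, assuming only the struct shown above: a D struct literal with fewer arguments than fields leaves the remaining fields at their initializers, so CalVector4(x, y, z) still gives W = 0 without a hand-written constructor. The main function and the values are only illustrative.

```d
import std.math : isNaN;
import std.stdio : writeln;

struct CalVector4 {
    // Only W has an explicit initializer; X, Y and Z default to float.nan.
    float X, Y, Z, W = 0.0f;
}

void main() {
    // Struct literal: the fourth field is omitted, so W keeps its initializer.
    auto v = CalVector4(1.0f, 2.0f, 3.0f);
    assert(v.X == 1.0f && v.Y == 2.0f && v.Z == 3.0f);
    assert(v.W == 0.0f);

    // A default-initialized value has NaN in X, Y, Z and 0 in W.
    CalVector4 d;
    assert(isNaN(d.X) && isNaN(d.Y) && isNaN(d.Z));
    assert(d.W == 0.0f);

    writeln(v);
}
```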



> http://ideone.com/4PP2D

This line of code is not good:
auto vertices = cast(Vertex *) new Vertex[N];

This is much better, it's less bug-prone, simpler and shorter:
auto vertices = (new Vertex[N]).ptr;

But in practice in this program it is enough to allocate dynamic arrays normally, and then perform the call like this (with DMD it gives the same performance):
calculateVerticesAndNormals(boneTransforms.ptr, N, vertices.ptr, influences.ptr, output.ptr);

I don't know why passing pointers gives some more performance here, compared to passing dynamic arrays (but I have seen the same behaviour in other D programs of mine).

Bye,
bearophile
August 06, 2011
On 8/6/2011 3:19 PM, bearophile wrote:
> I don't know why passing pointers gives some more performance here, compared
> to passing dynamic arrays (but I have seen the same behaviour in other D
> programs of mine).

A dynamic array is two values being passed, a pointer is one.
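
A small sketch of that difference, not taken from the benchmark: a D slice carries a length and a pointer, so it is two machine words, while a raw pointer is one. The helper functions sumSlice and sumPtr are hypothetical names used only for illustration.

```d
import std.stdio : writeln;

// A slice is a (length, pointer) pair; a raw pointer is a single word.
static assert((int[]).sizeof == 2 * size_t.sizeof);
static assert((int*).sizeof == size_t.sizeof);

// Hypothetical helper: receives the pointer and the length together as a slice.
int sumSlice(const(int)[] a) {
    int total = 0;
    foreach (x; a)
        total += x;
    return total;
}

// Hypothetical helper: receives a bare pointer; the length travels separately.
int sumPtr(const(int)* p, size_t n) {
    int total = 0;
    foreach (i; 0 .. n)
        total += p[i];
    return total;
}

void main() {
    auto data = [1, 2, 3, 4];
    // Both calls read the same memory; only the calling convention differs.
    assert(sumSlice(data) == sumPtr(data.ptr, data.length));
    writeln(sumSlice(data));
}
```

The pointer version gives up the bounds checking that the slice version gets in non-release builds.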
August 06, 2011
== Quote from bearophile (bearophileHUGS@lycos.com)'s article
> Iain Buclaw:
> > 1) using pointers over dynamic arrays. (5% speedup)
> > 2) removing the calls to CalVector4's constructor (5.7% speedup)
> With DMD I have seen 180k -> 190k vertices/sec replacing this:
> struct CalVector4 {
>     float X, Y, Z, W;
>     this(float x, float y, float z, float w = 0.0f) {
>         X = x;
>         Y = y;
>         Z = z;
>         W = w;
>     }
> }
> With:
> struct CalVector4 {
>     float X, Y, Z, W=0.0f;
> }
> I'd like the D compiler to optimize better there.
> > http://ideone.com/4PP2D
> This line of code is not good:
> auto vertices = cast(Vertex *) new Vertex[N];
> This is much better, it's less bug-prone, simpler and shorter:
> auto vertices = (new Vertex[N]).ptr;
> But in practice in this program it is enough to allocate dynamic arrays normally, and then perform the call like this (with DMD it gives the same performance):
> calculateVerticesAndNormals(boneTransforms.ptr, N, vertices.ptr, influences.ptr, output.ptr);

I was playing about with heap vs stack. Must've forgotten to remove that, sorry. :)

Anyways, I've tweaked the GDC codegen, and program speed meets that of C++ now (on
my system).

Implementation: http://ideone.com/0j0L1

Command-line:
gdc -O3 -mfpmath=sse -ffast-math -march=native -frelease
g++ bench.cc -O3 -mfpmath=sse -ffast-math -march=native

Best times:
G++-32bit:  11400000 vps
GDC-32bit:  11350000 vps


Regards
Iain
August 06, 2011
Walter:

> A dynamic array is two values being passed, a pointer is one.

I know, but I think there are many optimization opportunities. An example:


private void foo(int[] a2) {}
void main() {
    int[100] a1;
    foo(a1);
}


For code like that, I think a D compiler is free to compile it as if it were written like this: foo is private, so the compiler can optimize based only on the code inside the module:

private void foo(ref int[100] a2) {}
void main() {
    int[100] a1;
    foo(a1);
}


I think there are several cases where a D compiler is free to replace the two values with just a pointer.


Another example, to optimize code like this:

private void foo(int[] a1, int[] a2) {}
void main() {
    int n = 100; // run-time value
    auto a3 = new int[n];
    auto a4 = new int[n];
    foo(a3, a4);
}


Into something like this:

private void foo(int* a1, int* a2, size_t a1a2len) {}
void main() {
    int n = 100;
    auto a3 = new int[n];
    auto a4 = new int[n];
    foo(a3.ptr, a4.ptr, n);
}

Bye,
bearophile
August 06, 2011
Iain Buclaw:

> Anyways, I've tweaked the GDC codegen, and program speed meets that of C++ now (on
> my system).

Are you willing to explain your changes (and maybe give a link to the changes)? Maybe Walter is interested for DMD too.


> Command-line:
> gdc -O3 -mfpmath=sse -ffast-math -march=native -frelease
> g++ bench.cc -O3 -mfpmath=sse -ffast-math -march=native

In newer versions of GCC -Ofast means -ffast-math too.

Walter is not a lover of that -ffast-math switch.
But I now think that the combination of D strongly pure functions with unsafe FP optimizations offers optimization opportunities that maybe not even GCC is able to use now when it compiles C/C++ code (do you see why?). Not using this opportunity is a waste, in my opinion.

Bye,
bearophile
August 07, 2011
On 8/6/2011 4:46 PM, bearophile wrote:
> Walter is not a lover of that -ffast-math switch.

No, I am not. Few understand the subtleties of IEEE arithmetic, and breaking IEEE conformance is something very, very few should even consider.
August 07, 2011
Walter:

> On 8/6/2011 4:46 PM, bearophile wrote:
> > Walter is not a lover of that -ffast-math switch.
> 
> No, I am not. Few understand the subtleties of IEEE arithmetic, and breaking IEEE conformance is something very, very few should even consider.

I have read several papers about FP arithmetic, but I am not an expert on it yet. Both GDC and LDC have compilation switches that enable those unsafe FP optimizations, so even if you don't like them, most D compilers today offer them as an option, and I don't think those switches will be removed.

If you want to simulate a flock of boids (http://en.wikipedia.org/wiki/Boids ) on the screen using D, and you use floating point values to represent their speed vectors, introducing unsafe FP optimizations will not do much harm. Video games are a significant use case for the D language, and in them FP errors are often benign (maybe some parts of a game can tolerate them while other parts need to be compiled with strict FP semantics).
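
A minimal sketch of why such switches can change results, assuming a build without them: floating point addition is not associative, and reassociating additions is one of the transformations that switches like -ffast-math permit. The function names and values here are made up for illustration.

```d
import std.stdio : writeln;

// Hypothetical helpers: the same three operands, grouped two different ways.
float sumLeft(float a, float b, float c) {
    return (a + b) + c;
}

float sumRight(float a, float b, float c) {
    return a + (b + c);
}

void main() {
    float big = 1.0e30f;
    float small = 1.0f;

    // big and -big cancel first, so small survives.
    float left = sumLeft(big, -big, small);
    // small is absorbed when added to -big, so the final sum loses it.
    float right = sumRight(big, -big, small);

    // Holds under strict FP semantics; a compiler allowed to reassociate
    // may make the two groupings agree.
    assert(left != right);
    writeln(left, " ", right);
}
```

For boids on a screen a difference like this is usually invisible; for code that depends on exact, repeatable results it is not.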

Bye,
bearophile
August 07, 2011
> Anyways, I've tweaked the GDC codegen, and program speed meets that of C++ now (on my system).
>
> Implementation: http://ideone.com/0j0L1
>
> Command-line:
> gdc -O3 -mfpmath=sse -ffast-math -march=native -frelease
> g++ bench.cc -O3 -mfpmath=sse -ffast-math -march=native
>
> Best times:
> G++-32bit:  11400000 vps
> GDC-32bit:  11350000 vps
>
>
> Regards
> Iain

64-bit (vps):

C++:
45010000
44270000
42740000
43900000
44680000
43490000
42390000

GDC:
42900000
44010000
44000000
44010000
44010000
44000000

GDC with -fno-bounds-check:
43280000
44440000
44420000
44340000
44440000
44450000
August 08, 2011
On 8/6/2011 8:34 PM, bearophile wrote:
> Walter:
>
>> On 8/6/2011 4:46 PM, bearophile wrote:
>>> Walter is not a lover of that -ffast-math switch.
>>
>> No, I am not. Few understand the subtleties of IEEE arithmetic, and breaking
>> IEEE conformance is something very, very few should even consider.
>
> I have read several papers about FP arithmetic, but I am not an expert on it yet. Both GDC and LDC have compilation switches that enable those unsafe FP optimizations, so even if you don't like them, most D compilers today offer them as an option, and I don't think those switches will be removed.
>
> If you want to simulate a flock of boids (http://en.wikipedia.org/wiki/Boids ) on the screen using D, and you use floating point values to represent their speed vectors, introducing unsafe FP optimizations will not do much harm. Video games are a significant use case for the D language, and in them FP errors are often benign (maybe some parts of a game can tolerate them while other parts need to be compiled with strict FP semantics).
>
> Bye,
> bearophile

Floating point determinism can be very important when it comes to reducing network traffic.  If you can achieve it, then you can make sure all players have the same game state and then only send user input commands over the network.

Glenn Fiedler has an interesting writeup on it, but I haven't had a chance to read all of it yet:

http://gafferongames.com/networking-for-game-programmers/floating-point-determinism/
August 08, 2011
Eric Poggel (JoeCoder):

> determinism can be very important when it comes to reducing network traffic.  If you can achieve it, then you can make sure all players have the same game state and then only send user input commands over the network.

It seems hard to achieve, but I agree that it is useful.

For me some FP determinism is useful for debugging: it keeps results from changing seemingly at random when a tiny change in the source code triggers a change in the optimizations the compiler performs.

But there are several situations (a ray tracer, for example) where FP determinism is not required in my release build. I was not arguing for removing the FP rules from the D compiler, just that there are situations where relaxing those rules, on request, doesn't seem to do harm. I am not an expert on the risks Walter was talking about, so maybe I'm just walking on thin ice (but no one will get hurt if my little ray tracer produces some errors in its images).

You don't post often in this newsgroup, thank you for the link :-)

Bye,
bearophile