Thread overview: Vector performance
January 10, 2012
Just thought I might share a real-life case study today. There's been a lot of talk of SIMD stuff lately; some people might be interested.

Working on an Android product today, I noticed the matrix library was
burning a ridiculous amount of our frame time.
The disassembly looked like pretty normal ARM float code, so after rewriting
a couple of the key routines to use the VFPU (carefully), our key device
moved from 19fps -> 34fps (limited at 30, so we can now ship).
The Galaxy S2 is now running at 170fps, and devices we previously considered
unviable can now actually get a release! Most devices saw around a 25-45%
speed improvement.

Imagine if all the vector code throughout was using the vector hardware
nicely, and not just one or two key functions...
If we get the API right (intuitively encouraging proper usage and
disallowing inefficient operations), it'll make a big difference!
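To give a feel for the kind of rewrite involved, here's a hedged sketch (not the shipped code, and the `__vector`/`float4` syntax wasn't final at the time) of a column-major 4x4 matrix * vector transform kept entirely in 128-bit vector registers, instead of 16 scalar multiplies and adds:

```d
import core.simd;

// A sketch of a vectorized matrix * vector transform: each column is
// scaled by the broadcast of one input component and accumulated, so
// the whole thing stays in SIMD registers.
float4 transform(ref const float4[4] cols, float4 v)
{
    float4 x = v.array[0];  // assigning a scalar broadcasts it to all lanes
    float4 y = v.array[1];
    float4 z = v.array[2];
    float4 w = v.array[3];
    return cols[0] * x + cols[1] * y + cols[2] * z + cols[3] * w;
}
```

The scalar version the compiler generates from a plain `float x, y, z, w` struct does this one component at a time; the win comes from keeping everything in vector registers between operations.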


January 10, 2012
Manu:

> Imagine if all vector code throughout was using the vector hardware nicely, and not just one or 2 key functions...

Is Walter adding types/ops for the 256-bit YMM registers too? (AVX2 is not here yet, but AVX is.)

Bye,
bearophile
January 10, 2012
On 10 January 2012 16:31, bearophile <bearophileHUGS@lycos.com> wrote:

> Manu:
>
> > Imagine if all vector code throughout was using the vector hardware
> nicely,
> > and not just one or 2 key functions...
>
> Is Walter adding types/ops for 256 bit YMM registers too? (AVX2 is not
> here yet, but AVX is).
>

Eventually.
I don't think we need to do that until we have gotten the API right though.


January 10, 2012
On 1/10/2012 6:39 AM, Manu wrote:
> On 10 January 2012 16:31, bearophile <bearophileHUGS@lycos.com
> <mailto:bearophileHUGS@lycos.com>> wrote:
>
>     Manu:
>
>      > Imagine if all vector code throughout was using the vector hardware nicely,
>      > and not just one or 2 key functions...
>
>     Is Walter adding types/ops for 256 bit YMM registers too? (AVX2 is not here
>     yet, but AVX is).
>
>
> Eventually.
> I don't think we need to do that until we have gotten the API right though.

Right. We'll see how the 128 bit SIMD works out before doing the work to extend it.
January 11, 2012
On Tuesday, 10 January 2012 at 14:14:41 UTC, Manu wrote:
> Just thought I might share a real-life case study today. Been a lot of talk
> of SIMD stuff, some people might be interested.
>
> Working on an android product today, I noticed the matrix library was
> burning a ridiculous amount of our frame time.
> The disassembly looked like pretty normal ARM float code, so rewriting a
> couple of the key routines to use the VFPU (carefully), our key device
> moved from 19fps -> 34fps (limited at 30, we can now ship).
> GalaxyS 2 is now running at 170fps, and devices we previously considered
> un-viable can now actually get a release! .. Most devices saw around 25-45%
> speed improvement.
>
> Imagine if all vector code throughout was using the vector hardware nicely,
> and not just one or 2 key functions...
> Getting the API right (intuitively encouraging proper usage and disallowing
> inefficient operations), it'll make a big difference!

Wow, impressive difference.

In the future, how will [your idea of] D's SIMD vector libraries affect my math libraries? Will I simply replace:

   struct Vector4(T) {
       T x, y, z, w;
   }

with something like:

   struct Vector4(T) {
       __vector(T[4]) values;
   }

or will std.simd automatically provide a full range of vector operations (normalize, dot, cross, etc.) like Mono.Simd? I can't help but hope for the latter; even if it does make my current efforts redundant, it would definitely be a benefit to future D pioneers.
January 11, 2012
On 11 January 2012 02:47, F i L <witte2008@gmail.com> wrote:

> On Tuesday, 10 January 2012 at 14:14:41 UTC, Manu wrote:
>
>> Just thought I might share a real-life case study today. Been a lot of
>> talk
>> of SIMD stuff, some people might be interested.
>>
>> Working on an android product today, I noticed the matrix library was
>> burning a ridiculous amount of our frame time.
>> The disassembly looked like pretty normal ARM float code, so rewriting a
>> couple of the key routines to use the VFPU (carefully), our key device
>> moved from 19fps -> 34fps (limited at 30, we can now ship).
>> GalaxyS 2 is now running at 170fps, and devices we previously considered
>> un-viable can now actually get a release! .. Most devices saw around
>> 25-45%
>> speed improvement.
>>
>> Imagine if all vector code throughout was using the vector hardware
>> nicely,
>> and not just one or 2 key functions...
>> Getting the API right (intuitively encouraging proper usage and
>> disallowing
>> inefficient operations), it'll make a big difference!
>>
>
> Wow, impressive difference.
>
> In the future, how will [your idea of] D's SIMD vector libraries effect my math libraries? Will I simply replace:
>
>   struct Vector4(T) {
>       T x, y, z, w;
>   }
>
> with something like:
>
>   struct Vector4(T) {
>       __vector(T[4]) values;
>   }
>

This is too simple an example, but yes, that's basically the idea. Do you have some code with more complex operations?


> or will std.simd automatically provide a full range of vector operations (normalize, dot, cross, etc) like mono.simd? I can't help but hope for the latter, even if it does make my current efforts redundant, it would defiantly be a benefit to future D pioneers.
>

Yes, the lib would supply standard operations, probably even a matrix type or two.
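As a hypothetical sketch of the kind of helpers such a library might supply (the names and signatures here are my assumptions, not an actual std.simd API):

```d
import core.simd;
import std.math : sqrt;

// Hypothetical helpers of the sort a SIMD math lib might provide.
float dot(float4 a, float4 b)
{
    float4 m = a * b;  // lane-wise multiply
    return m.array[0] + m.array[1] + m.array[2] + m.array[3];
}

float4 normalize(float4 v)
{
    float4 inv = 1.0f / sqrt(dot(v, v));  // broadcast reciprocal length
    return v * inv;
}
```

A real implementation would keep the horizontal sum and the reciprocal square root in vector instructions too; this just shows the shape of the interface.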


January 11, 2012
Manu wrote:
> Yes the lib would supply standard operations, probably even a matrix type or 2.

Okay cool. That's basically what I wanted to know. However, I'm still wondering exactly how flexible these libraries will be.

> Have some code of more complex operations?

My main concern is with my "transition" objects. Example:

   struct Transition(T) {
       T value, start, target;
       alias value this;

       void update(U)(U iteration) {
           value = start + ((target - start) * iteration);
       }
   }


   struct Vector4(T) {
       T x, y, z, w;

       auto abs() { ... }
       auto dot() { ... }
       auto norm() { ... }
       // etc...

       static if (isTransition!T) {
           void update(U)(U iteration) {
               x.update(iteration);
               y.update(iteration);
               z.update(iteration);
               w.update(iteration);
           }
       }
   }


   void main() {
       // Simple transition vector
       auto tranVec = Transition!(Vector4!float)();
       tranVec.target = Vector4!float(50f, 36f);
       tranVec.update(0.5f);

       // Or transition per channel
       auto vecTran = Vector4!(Transition!float)();
       vecTran.x.target = 50f;
       vecTran.y.target = 36f;
       vecTran.update(0.5f);
   }

I could make a free function "auto Linear(U)(U start, U target)", but it's best to keep things in object-oriented containers, IMO. I've illustrated a simple linear transition here, but the goal is to make many different transition types: Bezier, EaseIn, Circular, Bounce, etc., and continuous/physics ones like SmoothLookAt, Giggly, Shaky, etc.
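Each of those types can reuse the same shape as the linear transition, with only the easing of the parameter changing. A sketch (the quadratic curve here is my assumption of what an EaseIn would do):

```d
// A non-linear transition slotting into the same pattern as the linear
// one: only the easing applied to 't' changes.
struct EaseIn(T)
{
    T value, start, target;
    alias value this;

    void update(U)(U t)
    {
        auto eased = t * t;  // quadratic ease-in: slow start, fast finish
        value = start + ((target - start) * eased);
    }
}
```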

My matrix code also looks something like:

   struct Matrix4(T)
    if (isVector!T || isTransitionOfVector!T) {
       T x, y, z, w;
   }

So Transitions potentially work with matrices in some areas. I'm still new to Quaternion math, but I'm guessing these might be able to apply there as well.

So my main concern is how SIMD will affect this sort of flexibility, or if I'm going to have to rethink my whole model here to accommodate SSE operations. SIMD registers are usually 128-bit, right? So making a Vector4!double doesn't really work... unless it was something like:

   struct Vector4(T) {
       version (SIMD_128) {
           static if (T.sizeof == 4) {
               __v128 xyzw;
           }
           else static if (T.sizeof == 8) {
               __v128 xy;
               __v128 zw;
           }
       }
       version (SIMD_256) {
           // ...
       }
   }

Of course, that would obviously complicate the method code quite a bit. IDK, your thoughts?
January 11, 2012
On 12 January 2012 01:15, F i L <witte2008@gmail.com> wrote:

> Manu wrote:
>
>> Yes the lib would supply standard operations, probably even a matrix type or 2.
>>
>
> Okay cool. That's basically what I wanted to know. However, I'm still wondering exactly how flexible these libraries will be.


Define 'flexible'?
Probably not very flexible, they will be fast!


> Have some code of more complex operations?
>>
>
> My main concern is with my "transition" objects. Example:
>
>   struct Transition(T) {
>       T value, start, target;
>       alias value this;
>
>       void update(U)(U iteration) {
>           value = start + ((target - start) * iteration);
>
>       }
>   }
>
>
>   struct Vector4(T) {
>       T x, y, z, w;
>
>       auto abs() { ... }
>       auto dot() { ... }
>       auto norm() { ... }
>       // ect...
>
>       static if (isTransition(T)) {
>           void update(U)(U iteration) {
>               x.update(iteration);
>               y.update(iteration);
>               z.update(iteration);
>               w.update(iteration);
>           }
>       }
>   }
>
>
>   void main() {
>       // Simple transition vector
>       auto tranVec = Transition!(Vector4!float)();
>       tranVec.target = {50f, 36f}
>       tranVec.update(0.5f);
>
>       // Or transition per channel
>       auto vecTran = Vector4!(Transition!float)();
>       vecTran.x.target = 50f;
>       vecTran.y.target = 36f;
>       vecTran.update();
>   }
>
> I could make a free function "auto Linear(U)(U start, U target)" but it's but best to keep things in object oriented containers, IMO. I've illustrated a simple linear transition here, but the goal is to make many different transition types: Bezier, EaseIn, Circular, Bounce, etc and continuous/physics one like: SmoothLookAt, Giggly, Shaky, etc.
>

I don't see any problem here. This looks trivial. It depends on basically
nothing; it might even work with what Walter has already added, and no libs
:)
I think the term 'iteration' is a bit ugly/misleading though; it should be
't' or 'time'.


> My matrix code also looks something like:
>
>   struct Matrix4(T)
>    if (isVector(T) || isTransitionOfVector(T)) {
>
>       T x, y, z, w;
>   }
>
> So Transitions potentially work with matrices in some areas. I'm still new to Quarternion math, but I'm guessing these might be able to apply there as well.
>

I would probably make a transition of matrices, rather than a matrix of vector transitions (so you can get references to the internal matrices)... but aside from that, I don't see any problems here either.


> So my main concern is how SIMD will effect this sort of flexibility, or if
> I'm going to have to rethink my whole model here to accommodate SSE operations. SIMD is usually 128 bit right? So making a Vector4!double doesn't really work... unless it was something like:
>
>   struct Vector4(T) {
>       version (SIMD_128) {
>           static if (T.sizeof == 32) {
>               __v128 xyzw;
>           }
>           else if (T.sizeof == 64) {
>               __v128 xy;
>               __v128 zw;
>           }
>       }
>       version (SIMD_256) {
>           // ...
>       }
>   }
>
> Of course, that would obviously complicate the method code quite a bit. IDK, your thoughts?
>

I think that is also possible if that's what you want to do, and I see no reason why any of these constructs wouldn't be efficient (or supported). You can probably even try it out now with what Walter has already done...


January 12, 2012
Manu wrote:
> Define 'flexible'?
> Probably not very flexible, they will be fast!

Flexible as in my examples.


> I think the term 'iteration' is a bit ugly/misleading though, it should be
> 't' or 'time'.

I've tried to come up with a better term. I guess the logic behind 'iteration' (which I got from someone else) is that an iteration of 2 gives you a value two distances from start to target, whereas 'time' (or 't') could imply any measurement, e.g. seconds or hours. Maybe 'tween', as in between? I don't know, I'll keep looking.
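That "two distances" reading is just linear extrapolation; nothing in the formula clamps the parameter to [0, 1]:

```d
// The linear update formula from earlier in the thread, as a free
// function; with t > 1 the value simply extrapolates past the target.
float lerp(float start, float target, float t)
{
    return start + (target - start) * t;
}
```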


> I would probably make a transition of matrices, rather than a matrix of
> vector transitions (so you can get references to the internal matrices)...

Well the idea is you can have both. You could even have a:

   Vector2!(Transition!(Vector4!(Transition!float))) // headache
   or something more practical...

   Vector4!(Vector4!float) // Matrix4f
   Vector4!(Transition!(Vector4!float)) // Smooth Matrix4f

Or anything like that. I should point out that my example didn't make it clear that a Matrix4!(Transition!float) would be pointless compared to Transition!(Matrix4!float) unless each Transition held its own iteration value. Example:

   struct Transition(T, bool isTimer = false) {
       T value, start, target;
       alias value this;

       static if (isTimer) {
           float time = 0, speed = 0; // floats default to NaN in D, so initialize

           void update() {
               time += speed;
               value = start + ((target - start) * time);
           }
       }
   }

That way each channel could update on its own time frame. There may even be a way to have each channel be its own separate Transition type, which could be interesting. I'm still playing with possibilities.


> I think that is also possible if that's what you want to do, and I see no
> reason why any of these constructs wouldn't be efficient (or supported).
> You can probably even try it out now with what Walter has already done...

Cool, I was unaware Walter had begun implementing SIMD operations. I'll have to build DMD and test them out. What's the syntax like right now?

I was under the impression you would be helping him here, or that you would be building the SIMD-based math libraries, or something like that. That's why I was posting my examples, asking how the std.simd lib would compare.
January 12, 2012
On 1/11/2012 4:46 PM, F i L wrote:
>> I think that is also possible if that's what you want to do, and I see no
>> reason why any of these constructs wouldn't be efficient (or supported).
>> You can probably even try it out now with what Walter has already done...
>
> Cool, I was unaware Walter had begun implementing SIMD operations. I'll have to
> build DMD and test them out. What's the syntax like right now?

It's not ready yet. Give me some more time ;-)