On 6 February 2012 17:13, a <a@a.com> wrote:
True, I have only been working in x86 GDC so far, but I just wanted to get
feedback about my approach and API design at this point.
It seems there are no serious objections, I'll continue as is.

I have one proposal about the API design of the matrix operations. Maybe there could be functions that take row vectors as parameters, in addition to those that take matrix structs. That way one could call matrix functions on data that isn't stored in matrix structures, without copying. For example, for the transpose function there would also be a function that would be used like this (a* are inputs and r* are outputs):

transpose(aX, aY, aZ, aW, rX, rY, rZ, rW);

... the problem is, without multiple return values (come on, D should have multiple return values!), how do you return the result? :)
 
Maybe those functions could be used to implement the functions that take and return structs.

Yes... I've been pondering how to do this properly for ages actually. That's the main reason I haven't fleshed out any matrix functions yet; I'm still not at all sold on how to represent the matrices.
Ideally, there should not be any memory access. But even if they pass by ref/pointer, as soon as the function is inlined, the memory access will disappear, and it'll effectively generate the same code...

So the problem is not so much with respect to THIS API, but with respect to the matrix calling convention in general...
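For illustration, here is a rough sketch of the row-vector transpose proposed above, with a struct-based overload implemented on top of it. It is written in portable C++ rather than D, and `float4` and `float4x4` are hypothetical stand-ins for the real SIMD types, so treat it as a picture of the API shape rather than the library's code. A real version would use shuffles instead of scalar element access.

```cpp
#include <array>

// Hypothetical stand-in for a 4-float SIMD vector (assumption, not the real type).
using float4 = std::array<float, 4>;

// Hypothetical matrix struct made of four row vectors.
struct float4x4 { float4 x, y, z, w; };

// Row-vector form: a* are inputs, r* are outputs.
// Results are built in temporaries first, so outputs may alias inputs.
inline void transpose(const float4& aX, const float4& aY,
                      const float4& aZ, const float4& aW,
                      float4& rX, float4& rY, float4& rZ, float4& rW)
{
    const float4 tX = {aX[0], aY[0], aZ[0], aW[0]};
    const float4 tY = {aX[1], aY[1], aZ[1], aW[1]};
    const float4 tZ = {aX[2], aY[2], aZ[2], aW[2]};
    const float4 tW = {aX[3], aY[3], aZ[3], aW[3]};
    rX = tX; rY = tY; rZ = tZ; rW = tW;
}

// Struct form, implemented in terms of the row-vector form.
inline float4x4 transpose(const float4x4& m)
{
    float4x4 r{};
    transpose(m.x, m.y, m.z, m.w, r.x, r.y, r.z, r.w);
    return r;
}
```

With out-parameters the lack of multiple return values stops being a problem, and once both overloads are inlined the struct form should cost nothing extra, though that depends on the compiler.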

I also think that interleave and deinterleave operations would be useful. For four-element float vectors, each of those can be implemented with a single instruction, at least on SSE (using unpcklps, unpckhps and shufps) and NEON (using vuzp and vzip).

Sure. I wasn't sure how useful they were in practice... I didn't want to load it with countless silly permutation routines, so I figured I'd add them by request, or as they are proven useful in real-world apps.
What would you typically do with the interleave functions at a high level? Are you sure you don't just use them as components behind a few actually useful functions, which should be exposed instead?
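To pin down what the interleave and deinterleave operations mentioned above compute, here is a small scalar sketch in portable C++. The function names and the `float4` type are hypothetical; the comments note which SSE instruction produces the same lane pattern.

```cpp
#include <array>

// Hypothetical stand-in for a 4-float SIMD vector (assumption, not the real type).
using float4 = std::array<float, 4>;

// Interleave the low halves: {a0, b0, a1, b1} (same pattern as SSE unpcklps).
inline float4 interleaveLow(const float4& a, const float4& b)
{
    return {a[0], b[0], a[1], b[1]};
}

// Interleave the high halves: {a2, b2, a3, b3} (same pattern as SSE unpckhps).
inline float4 interleaveHigh(const float4& a, const float4& b)
{
    return {a[2], b[2], a[3], b[3]};
}

// Gather the even lanes of two vectors: {a0, a2, b0, b2}
// (same pattern as shufps with selector _MM_SHUFFLE(2, 0, 2, 0)).
inline float4 deinterleaveEven(const float4& a, const float4& b)
{
    return {a[0], a[2], b[0], b[2]};
}

// Gather the odd lanes of two vectors: {a1, a3, b1, b3}
// (same pattern as shufps with selector _MM_SHUFFLE(3, 1, 3, 1)).
inline float4 deinterleaveOdd(const float4& a, const float4& b)
{
    return {a[1], a[3], b[1], b[3]};
}
```

Deinterleaving the two interleaved halves gives back the original vectors, which is what makes the pair useful for converting between array-of-structures and structure-of-arrays layouts. On NEON, vzip and vuzp each produce both result vectors in one go.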

I have an ARM compiler too now, so I'll be implementing/testing against that as
reference also.

Could you please tell me how you got the ARM compiler to work?

I did not... It was the work of another fine chap in the gdc newsgroup ;)