On 6 January 2012 16:06, bearophile <bearophileHUGS@lycos.com> wrote:
Manu:

> To make it perform float4 math, or double2 math, you either write the
> pseudo assembly you want directly, but more realistically, you use the
> __float4 type supplied in the standard library, which will already
> associate all the float4 related functionality, and try and map it across
> various architectures as efficiently as possible.

I see. While you design, you need to think about the other features of D :-) Is it possible to mix CPU SIMD with D vector ops?

__float4[10] a, b, c;
c[] = a[] + b[];

I don't see any issue with this. An array of vectors makes perfect sense, and I see no reason why arrays, slices, etc. of hardware vectors should be a problem.
This particular expression should be just as efficient as the same operation on a flat array of floats, especially if the compiler unrolls the loop.
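For illustration only, here is roughly what that expression amounts to when written out by hand. This is a sketch, not the proposed design: it assumes the float4 type from core.simd in place of the __float4 discussed above, assumes a target where the compiler provides that type (e.g. x86_64), and addVectors is a made-up helper name.

```d
import core.simd;

// Hypothetical helper: the explicit-loop equivalent of c[] = a[] + b[]
// for slices of hardware vectors.
void addVectors(float4[] a, float4[] b, float4[] c)
{
    assert(a.length == b.length && b.length == c.length);
    foreach (i; 0 .. c.length)
        c[i] = a[i] + b[i]; // one 4-wide add per element
}

void main()
{
    float4 x = [1.0f, 2.0f, 3.0f, 4.0f];
    float4 y = [10.0f, 20.0f, 30.0f, 40.0f];

    float4[10] a, b, c;
    a[] = x; // fill every element with x
    b[] = y; // fill every element with y

    addVectors(a[], b[], c[]);

    // .array exposes the four lanes as a float[4]
    assert(c[0].array == [11.0f, 22.0f, 33.0f, 44.0f]);
    assert(c[9].array == [11.0f, 22.0f, 33.0f, 44.0f]);
}
```

Each iteration is a single vector add, so the loop body is no more expensive per float than the flat-array version, which is the point being made above.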

I'm actually very excited about D's array/slice syntax in conjunction with hardware vectors. I could do some really elegant geometry processing with slices from vertex streams.
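To give a rough idea of what I mean, a sketch along these lines. Everything here is an assumption for illustration: float4 from core.simd stands in for __float4, the "vertex stream" is just an array of positions, and scalePositions is a made-up helper.

```d
import core.simd;

// Hypothetical helper: scale every position in a slice of a vertex
// stream in place. The slice is a view, so no vertices are copied.
void scalePositions(float4[] positions, float4 scale)
{
    foreach (ref p; positions)
        p = p * scale; // one 4-wide multiply per vertex
}

void main()
{
    float4 v = [1.0f, 2.0f, 3.0f, 1.0f];
    float4 s = [2.0f, 2.0f, 2.0f, 1.0f];

    float4[8] stream;
    stream[] = v; // fill the stream with identical vertices

    // Process only vertices 2..5 by handing over a slice of the stream.
    scalePositions(stream[2 .. 6], s);

    assert(stream[0].array == [1.0f, 2.0f, 3.0f, 1.0f]); // untouched
    assert(stream[2].array == [2.0f, 4.0f, 6.0f, 1.0f]); // scaled
    assert(stream[6].array == [1.0f, 2.0f, 3.0f, 1.0f]); // untouched
}
```

The slice carries the sub-range and its length, so the processing function needs no separate offset or count arguments.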