On 6 January 2012 11:04, Andrew Wiley <wiley.andrew.j@gmail.com> wrote:
On Fri, Jan 6, 2012 at 2:43 AM, Walter Bright
<newshound2@digitalmars.com> wrote:
> On 1/5/2012 5:42 PM, Manu wrote:
>>
>> So I've been hassling about this for a while now, and Walter asked me to
>> pitch
>> an email detailing a minimal implementation with some initial thoughts.
>
>
> Takeaways:
>
> 1. SIMD behavior is going to be very machine specific.
>
> 2. Even trying to do something with + is fraught with peril, as integer adds
> with SIMD can be saturated or unsaturated.
>
> 3. Trying to build all the details about how each of the various adds and
> other ops work into the compiler/optimizer is a large undertaking. D would
> have to support internally maybe 100 or more new operators.
>
> So some simplification is in order, perhaps a low level layer that is fairly
> extensible for new instructions, and for which a library can be layered over
> for a more presentable interface. A half-formed idea of mine is, taking a
> cue from yours:
>
> Declare one new basic type:
>
>    __v128
>
> which represents the 16 byte aligned 128 bit vector type. The only
> operations defined to work on it would be construction and assignment. The
> __ prefix signals that it is non-portable.
>
> Then, have:
>
>   import core.simd;
>
> which provides two functions:
>
>   __v128 simdop(operator, __v128 op1);
>   __v128 simdop(operator, __v128 op1, __v128 op2);
>
> This will be a function built in to the compiler, at least for the x86.
> (Other architectures can provide an implementation of it that simulates its
> operation, but I doubt that it would be worth anyone's while to use that.)
>
> The operators would be an enum listing of the SIMD opcodes,
>
>    PFACC, PFADD, PFCMPEQ, etc.
>
> For:
>
>    z = simdop(PFADD, x, y);
>
> the compiler would generate:
>
>    MOV z,x
>    PFADD z,y
>

Would this tie SIMD support directly to x86/x86_64, or would it be
possible to also support NEON on ARM (also 128 bit SIMD, see
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0409g/index.html)?
(Obviously not for DMD, but if the syntax wasn't directly tied to
x86/64, GDC and LDC could support this.)
It seems like using a standard naming convention instead of directly
referencing instructions could let the underlying SIMD instructions
vary across platforms, but I don't know enough about the technologies
to say whether NEON's capabilities match SSE closely enough that they
could be handled the same way.

The underlying architectures are too different to try to map opcodes across them.
__v128 should map to each architecture's native SIMD type, allowing the compiler to express the hardware, but the opcodes themselves would be the architecture-specific ones that each compiler exposes.

As I keep suggesting, LIBRARIES would be created to supply types like float4, int4, etc. These may use version() liberally behind the scenes to support all architectures, giving a common and efficient API across architectures at this level.
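
A rough sketch of what such a library layer might look like, under some assumptions: Float4 and the backend label are hypothetical names for illustration, and the arithmetic is a scalar stand-in so the example runs anywhere. The version identifiers (X86_64, X86, ARM, AArch64) are D's predefined ones. In a real library each version() branch would wrap that architecture's own opcodes instead of sharing the loop below.

```d
import std.stdio;

// Label for the backend this build would target. Only the version
// identifiers are real; the labels exist for this sketch alone.
version (X86_64)       enum backend = "SSE";
else version (X86)     enum backend = "SSE";
else version (ARM)     enum backend = "NEON";
else version (AArch64) enum backend = "NEON";
else                   enum backend = "scalar";

// Hypothetical library-level type; user code never sees opcodes.
struct Float4
{
    float[4] lanes;

    Float4 opBinary(string op)(Float4 rhs) const
        if (op == "+" || op == "-" || op == "*")
    {
        Float4 r;
        // Scalar stand-in. A real backend would replace this loop with one
        // architecture-specific vector opcode chosen under version().
        foreach (i; 0 .. 4)
            r.lanes[i] = mixin("lanes[i] " ~ op ~ " rhs.lanes[i]");
        return r;
    }
}

void main()
{
    auto a = Float4([1.0f, 2.0f, 3.0f, 4.0f]);
    auto b = Float4([10.0f, 20.0f, 30.0f, 40.0f]);
    writeln(backend, ": ", (a + b).lanes);
}
```

Calling code writes a + b on every target; only the inside of opBinary would differ per architecture.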