On 5 January 2012 02:42, bearophile <bearophileHUGS@lycos.com> wrote:
> Manu:
>
>> I'm not referring to vector OPERATIONS. I only refer to the creation of a
>> type to identify these registers...

> Please, try to step back and look at this problem from a bit more distance. D has vector operations, and so far they have received only a tiny amount of love. Are you able to find ways to solve some of your problems using a hypothetical, much better implementation of D vector operations? Please think about the possibilities of this syntax.

> Think about future CPU evolution, with SIMD registers 128, then 256, then 512, then 1024 bits long. In theory a good compiler can use them with no changes to the D code that uses vector operations.
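For reference, this is the array-op syntax in question. A minimal sketch only (the function is made up for illustration), and whether the backend actually emits SIMD instructions of any width for it is entirely up to the compiler:

// element-wise ops over slices; in theory the compiler may lower this to
// whatever SIMD width the target provides, with no change to the source
void madd(float[] result, const float[] a, const float[] b, float k)
{
    result[] = a[] * k + b[];  // result[i] = a[i]*k + b[i] for all i
}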

These are all fundamentally different types, like int and long, or float and double, and I certainly want a keyword to identify each of them. Even if the compiler attempts automatic vector optimisations, you can't deny programmers explicit control over the hardware when they want/need it.
Look at x86 compilers: they've been TRYING to perform automatic SSE optimisations for 10 years, with basically no success... do you really think you can do better than all that work by Microsoft and GCC?
In my experience, I've even run into a lot of VC's auto-SSE'd code that is SLOWER than the original float code.
Let's not even mention architectures that receive much less love than x86 and are arguably more important (ARM: slower, simpler processors that are under far more pressure to perform well without wasting power).

Also, D does NOT have a good compiler; it's a rubbish compiler with respect to code generation. And with a community this small, it has no hope of becoming a 'good' compiler any time soon. Even C/C++ compilers that have been around for decades and are used by millions have been promising optimisations that are still not available, and the ones that have arrived came at the cost of decades of work by smart engineers on huge paycheques.
 
> Intrinsics are an additive change; adding them later is possible. But I think fixing the syntax of vector ops is more important. I have some bug reports in Bugzilla about vector ops that have been sleeping there for two years or so, and they are not about implementation performance.

Vector ops and SIMD ops are different things. float[4] (or, more realistically, float[3]) should NOT be a candidate for automatic SIMD implementation; likewise, simd_type should not have its components individually accessible. Those are operations the hardware cannot actually perform. So there is no syntax to worry about, just a type.
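To make the distinction concrete, this is roughly what I mean. Purely illustrative: the name float4 and the float[4] backing store are placeholders for what would really be a raw 128-bit register and an intrinsic call.

align(16) struct float4   // hypothetical opaque SIMD type
{
    private float[4] data;   // placeholder; really the raw 128-bit register

    // whole-register operations only...
    float4 opBinary(string op : "+")(float4 rhs) const
    {
        float4 r;
        r.data[] = data[] + rhs.data[];   // placeholder for a SIMD add intrinsic
        return r;
    }

    // ...and deliberately NO opIndex or .x/.y access to individual lanes
}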
 
> I think the good Hara will be able to implement those syntax fixes in a matter of just one day, or very few days, if a consensus is reached about what actually needs to be fixed in D's vector op syntax.
 
> Instead of discussing *adding* something (register intrinsics), I suggest we discuss what to fix in the *already present* vector op syntax. This is not a request just to you, Manu, but to this whole newsgroup.

And I think this is exactly the wrong approach. A vector is NOT an array of 4 (actually, usually 3) floats, and it should not appear as one. That is an overly complicated and ultimately wrong way to engage this hardware.
Imagine the complexity in the compiler needed to force float[4] operations into vector arithmetic, versus adding a 'v128' type which actually does what people want anyway...

SIMD units are not float units, and they should not appear as an aggregation of float units:
 * Different error semantics, exception handling rules, and sometimes different precision...
 * Special alignment rules.
 * Special literal expression/assignment.
 * You can NOT access individual components at will.
 * They may be reinterpreted at any time as float[1], float[4], double[2], short[8], char[16], etc. (depending on the architecture's intrinsics; see the sketch just after this list).
 * They cannot be involved in conventional comparison logic (an array of floats would make you think they could be).
 *** They can NOT interact with the regular 'float' unit... presenting vectors as an array of floats certainly suggests that you can interact with scalar floats...
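To illustrate just the alignment and reinterpretation points (nothing more; this union is not a proposal for the actual type, merely a picture of the raw register, and whether align() is honoured everywhere is itself one of the details the compiler has to get right):

align(16) union V128Bits    // illustrative only
{
    float[4]  f32;   // 4 x 32-bit float
    double[2] f64;   // 2 x 64-bit double
    short[8]  i16;   // 8 x 16-bit integer
    byte[16]  i8;    // 16 x 8-bit integer
}
// the same 16 bytes, viewed different ways; which views are actually
// usable depends entirely on the architecture's intrinsics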

I will use architecture intrinsics to operate on these regs, and put all that nicely and neatly behind a hardware vector type with version()'s for each architecture, plus an API with a whole lot of sugar to make them nice and friendly to use.
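Roughly the shape I have in mind, as a sketch only: the type and names here are hypothetical, the intrinsic calls are stubbed out with plain array ops, and the version identifiers are just D's predefined platform versions.

struct HWVec
{
    align(16) float[4] reg;   // stand-in for the raw 128-bit register

    HWVec mul(HWVec b) const
    {
        HWVec r;
        version (X86_64)
        {
            // real implementation: the SSE packed-multiply intrinsic on reg
            r.reg[] = reg[] * b.reg[];   // placeholder only
        }
        else version (ARM)
        {
            // real implementation: the NEON packed-multiply intrinsic on reg
            r.reg[] = reg[] * b.reg[];   // placeholder only
        }
        else
        {
            r.reg[] = reg[] * b.reg[];   // scalar fallback
        }
        return r;
    }
}

The point of the version() split is that each architecture's intrinsics stay completely hidden behind the one type, and user code only ever sees HWVec.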

My argument is that even IF the compiler some day attempts vector optimisations on float[4] arrays, the raw hardware should be exposed first, so programmers can use it directly. This starts with a language-defined (platform-independent) v128 type.