January 06, 2012
On 1/6/12 1:11 PM, Walter Bright wrote:
> On 1/6/2012 10:25 AM, Brad Roberts wrote:
>> How is making __v128 a builtin type better than defining it as:
>>
>> align(16) struct __v128
>> {
>> ubyte[16] data;
>> }
>
> Then the back end knows it should be mapped onto the XMM registers
> rather than the usual arithmetic set.

If it's possible, then it would be great to express the new constructs within the existing language (optionally by leaving it to the implementation to strengthen guarantees of certain constructs).

I very warmly recommend avoiding defining things in the language and compiler wherever the same is possible within a library (however non-portable). Confining features to the language/compiler drastically reduces the number of people that can work on them.


Andrei
January 06, 2012
On 6 January 2012 22:36, Walter Bright <newshound2@digitalmars.com> wrote:

>    To me, the advantages of making the SIMD types typed are:
>>
>>    1. the language does typechecking, for example, trying to add a vector
>> of 4
>>    floats to 16 bytes would be (and should be) an error.
>>
>>
>> The language WILL do that checking as soon as we create the strongly typed libraries. And people will use those libraries, they'll never touch the primitive type.
>>
>
> I'm not so sure this will work out satisfactorily.
>

How so? Can you support this theory?


>    2. Some of the SIMD operations do map nicely onto the operators, so one
>>    could write:
>>
>>       a = b + c + -d;
>>
>>
>> This is not even true, as you said yourself in a previous post. SIMD int ops may wrap, or saturate... which is it?
>>
>
> It would only be for those ops that actually do map onto the D operators. (This is already done by the library implementation of the array arithmetic operations.) The saturated int ops would not be usable this way.


But why are you against adding this stuff in the library? That's contrary to the general sentiment around here, where people prefer putting things in libraries where possible. It's less of a commitment, and allows alternative implementations if desired.
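To make the library side of this concrete, here is a minimal sketch of what such a strongly typed wrapper could look like, using plain array operations as a stand-in for the SIMD code generation (the name float4 and the lowering are illustrative assumptions, not an actual proposal):

```d
// A minimal sketch of a strongly typed library vector. The array
// operation below is a stand-in; a real implementation would lower
// the addition to a single SIMD instruction (e.g. addps).
align(16) struct float4
{
    float[4] data;

    float4 opBinary(string op : "+")(float4 rhs) const
    {
        float4 res;
        res.data[] = data[] + rhs.data[];
        return res;
    }
}
```

With this in place, `a + b` typechecks only between float4 values, which is exactly the checking being discussed.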

Don't try and express this at the language level. Let the libraries do it,
>> and
>> if they fail, or are revealed to be poorly defined, they can be
>> updated/changed.
>>
>
> Doing it as a library type pretty much prevents certain optimizations, for example, the fused operations, from being expressed using infix operators.
>

You're talking about MADD? I was going to make a separate suggestion
regarding that, actually.
Multiply-add is a common concept, often available on scalar FPUs as well
(and there's currently no way to express it)... I was going to suggest an
opMultiplyAdd() operator, which the language could call if it detects a
conforming arrangement of * and + operators on a type. This would give
library vectors operator access to madd too.
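As a sketch of the shape being proposed (opMultiplyAdd is hypothetical and not a real D operator; the compiler hook described above does not exist, so today this is just an ordinary method):

```d
struct float4
{
    float[4] data;

    // Hypothetical: under the proposal, the language would call this
    // when it detects a conforming a*b + c arrangement on the type,
    // allowing the library to emit a fused multiply-add.
    float4 opMultiplyAdd(float4 b, float4 c) const
    {
        float4 res;
        foreach (i; 0 .. 4)
            res.data[i] = data[i] * b.data[i] + c.data[i];
        return res;
    }
}
```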

   And, of course, casting would be allowed and would be zero cost.
>>
>> Zero cost? You're suggesting all casts would be reinterprets? Surely
>> float4 fVec = (float4)intVec; should perform a type conversion?
>> Again, this is detail that can/should be discussed when implementing the
>> standard library, leave this sort of problem out of the language.
>>
>
> Painting a new type (i.e. a reinterpret cast) has zero runtime cost. I don't think it's a real problem - we do it all the time when, for example, we want to retype an int as a uint:
>
>   int i;
>   uint u = cast(uint)i;


Yeah sure, but I don't think that's fundamentally correct. If you're drifting towards typing these things in the language, then you should also start considering cast mechanics... and that's a larger topic of debate. I don't really think "float4 floatVec = (float4)intVec;" should be a reinterpret... surely, as a high-level type, this should perform a value conversion?
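The two candidate meanings of the cast can be shown on scalars (the vector case is the same question, elementwise); a plain D sketch:

```d
void main()
{
    int i = 7;

    // Reinterpret: same bits, new type -- zero runtime cost.
    float asBits = *cast(float*)&i;

    // Conversion: new bits, same value -- costs an instruction
    // (cvtdq2ps and friends in the vector case).
    float asValue = cast(float)i;

    assert(asValue == 7.0f);
    assert(asBits != 7.0f); // bit pattern 0x00000007 is a tiny denormal
}
```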

I'm afraid this is becoming a lot more complicated than it needs to be.
Can you illustrate your current thoughts/plan, so it's summarised in one
place? Has it drifted from what you said last night?


January 06, 2012
On 6 January 2012 22:40, Martin Nowak <dawg@dawgfoto.de> wrote:

> On Fri, 06 Jan 2012 20:00:15 +0100, Manu <turkeyman@gmail.com> wrote:
>
>  On 6 January 2012 20:17, Martin Nowak <dawg@dawgfoto.de> wrote:
>>
>>  There is another benefit.
>>> Consider the following:
>>>
>>> __vec128 addps(__vec128 a, __vec128 b) pure
>>> {
>>>     __vec128 res = a;
>>>
>>>     if (__ctfe)
>>>     {
>>>         foreach (i; 0 .. 4)
>>>             res[i] += b[i];
>>>     }
>>>     else
>>>     {
>>>         asm (res, b)
>>>         {
>>>             addps res, b;
>>>         }
>>>     }
>>>     return res;
>>> }
>>>
>>>
>> You don't need to use inline ASM to be able to do this, it will work the
>> same with intrinsics.
>> I've detailed numerous problems with using inline asm, and complications
>> with extending the inline assembler to support this.
>>
>>  Don't get me wrong here. The idea is to find out if intrinsics
> can be built with the help of inlineable asm functions.
> The ctfe support is one good reason to go with a library solution.


/agree, this is a nice argument to support putting it in libraries.


> Most compilers can't reschedule code around inline asm blocks. There are a
>> lot of reasons for this, google can help you.
>> The main reason is that a COMPILER doesn't attempt to understand the
>> assembly it's being asked to insert inline. The information that it may
>> use
>>
> It doesn't have to understand the assembly.
> Wrapping these in functions creates an IR expression with inputs and
> outputs.
> Declaring them as pure gives the compiler a free hand to apply whatever
> optimizations it normally does on an IR tree:
> common subexpression elimination, removing dead expressions...
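A sketch of that point, with the asm body replaced by plain code so it stands alone (this addps is a stand-in for the wrapper quoted earlier, not the real thing):

```d
// Because addps is pure, two identical calls with identical arguments
// are the same expression to the optimiser, so it may compute the
// result once (common subexpression elimination) without ever looking
// inside the function body.
float[4] addps(const float[4] a, const float[4] b) pure
{
    float[4] res;
    res[] = a[] + b[]; // stands in for the inline asm in the quoted code
    return res;
}

void main()
{
    float[4] a = [1, 2, 3, 4];
    float[4] b = [5, 6, 7, 8];

    auto x = addps(a, b);
    auto y = addps(a, b); // identical pure call: eligible for CSE
    assert(x == y);
}
```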


These functions shouldn't be functions... if they're not all inlined, then
the implementation is broken.
And once you inline all these micro asm blocks (100 small asm blocks
inlined into a single function), you're giving the optimiser a very hard time.


> Same problem as above. The compiler would need to understand enough about
>> assembly to perform optimisation on the assembly itself to clean this
>> up.
>> Using intrinsics, all the register allocation, load/store code, etc, is
>> all
>> in the regular realm of compiling the language, and the code generation
>> and
>> optimisation will all work as usual.
>>
>>  There is no informational difference between the intrinsic
>
> __m128 _mm_add_ps(__m128 a, __m128 b);
>
> and an inline assembler version
>

There is actually. To the compiler, the intrinsic is a normal function,
with some hook in the code generator to produce the appropriate opcode when
it's performing actual code generation.
On most compilers, the inline asm, on the other hand, is opaque to the
compiler: the optimiser can't do much anymore, because it doesn't know what
the inline asm has done, and the code generator just pastes your asm code
inline where you told it to. It doesn't know if you've written to aliased
variables, called functions, etc., so it can no longer safely rearrange
code around the inline asm block, which means it's not free to pipeline
the code efficiently.

So the argument here is that intrinsics in D can more easily be
> mapped to existing intrinsics in GCC?
> I do understand that this will be pretty difficult for GDC
> to implement.
> Reminds me that Walter has stated several times how much
> better an internal assembler can integrate with the language.


Basically yes.


January 06, 2012
On 6 January 2012 23:23, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>wrote:

> On 1/6/12 1:11 PM, Walter Bright wrote:
>
>> On 1/6/2012 10:25 AM, Brad Roberts wrote:
>>
>>> How is making __v128 a builtin type better than defining it as:
>>>
>>> align(16) struct __v128
>>> {
>>> ubyte[16] data;
>>> }
>>>
>>
>> Then the back end knows it should be mapped onto the XMM registers rather than the usual arithmetic set.
>>
>
> If it's possible, then it would be great to express the new constructs within the existing language (optionally by leaving it to the implementation to strengthen guarantees of certain constructs).
>

Now you're at odds with Walter's new take on it... he seems to have changed his mind and decided a library implementation of the complex/strict types is a bad idea now?


> I very warmly recommend avoiding defining things in the language and compiler wherever the same is possible within a library (however non-portable). Confining features to the language/compiler drastically reduces the number of people that can work on them.


Aye, and my proposal requests only the minimum support required from the
language, allowing libraries to do the rest.
For some reason Walter seems to have done a bit of a 180 in the last few
hours ;)


January 06, 2012
On 6 January 2012 19:53, Manu <turkeyman@gmail.com> wrote:
> On 6 January 2012 21:34, Walter Bright <newshound2@digitalmars.com> wrote:
>>
>> On 1/6/2012 11:08 AM, Manu wrote:
>>>
>>> I think we should take this conversation to IRC, or a separate thread?
>>> I'll generate some examples from VC for you in various situations. If you
>>> can
>>> write me a short list of trouble cases as you see them, I'll make sure to
>>> address them specifically...
>>> Have you tested the code that GCC produces? I'm sure it'll be identical
>>> to VC...
>>
>>
>> What I'm going to do is make the SIMD stuff work on 64 bits for now. The alignment problem is solved for it, and is an orthogonal issue.
>
>
> ...I'm using DMD on windows... x32. So this isn't ideal ;)
> Although with this change, Iain should be able to expose the vector types in
> GDC, and I can work from there, and hopefully even build an ARM/PPC
> toolchain to experiment with the library in a cross platform environment.
>

And will also allow me to tap into many vector intrinsics that gcc offers too, via the gcc.builtins module. :)


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
January 06, 2012
On 6 January 2012 23:59, Iain Buclaw <ibuclaw@ubuntu.com> wrote:

> On 6 January 2012 19:53, Manu <turkeyman@gmail.com> wrote:
> > ...I'm using DMD on windows... x32. So this isn't ideal ;)
> > Although with this change, Iain should be able to expose the vector types in
> > GDC, and I can work from there, and hopefully even build an ARM/PPC toolchain to experiment with the library in a cross platform environment.
> >
>
> And will also allow me to tap into many vector intrinsics that gcc offers too, via the gcc.builtins module. :)


Huzzah! ... Like what?


January 06, 2012
On 6 January 2012 22:37, Manu <turkeyman@gmail.com> wrote:
> On 6 January 2012 23:59, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
>>
>> On 6 January 2012 19:53, Manu <turkeyman@gmail.com> wrote:
>> > ...I'm using DMD on windows... x32. So this isn't ideal ;)
>> > Although with this change, Iain should be able to expose the vector
>> > types in
>> > GDC, and I can work from there, and hopefully even build an ARM/PPC
>> > toolchain to experiment with the library in a cross platform
>> > environment.
>> >
>>
>> And will also allow me to tap into many vector intrinsics that gcc offers too, via the gcc.builtins module. :)
>
>
> Huzzah! ... Like what?

For backend intrinsics, they are all functions that map to asm instructions of the same name, e.g. __builtin_ia32_addps.
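For illustration, usage from GDC might look roughly like this (the gcc.builtins module name comes from the message above, but the vector type syntax and the exact builtin signature here are assumptions, not verified against GDC):

```d
// Hypothetical GDC-only sketch: __builtin_ia32_addps should compile
// down to a single SSE addps instruction.
import gcc.builtins;

__vector(float[4]) add4(__vector(float[4]) a, __vector(float[4]) b)
{
    return __builtin_ia32_addps(a, b);
}
```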

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
January 06, 2012
On Friday, 6 January 2012 at 19:53:52 UTC, Manu wrote:
> Iain should be able to expose the vector types in GDC,
> and I can work from there, and hopefully even build an ARM/PPC toolchain to experiment with the library in a cross platform environment.

On Windoze? You're a masochist ^^
January 06, 2012
On 1/6/2012 1:46 PM, Manu wrote:
> For some reason Walter seems to have done a bit of a 180 in the last few hours ;)

It must be the drugs!
January 06, 2012
On 7 January 2012 00:47, Iain Buclaw <ibuclaw@ubuntu.com> wrote:

> On 6 January 2012 22:37, Manu <turkeyman@gmail.com> wrote:
> > On 6 January 2012 23:59, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
> >>
> >> On 6 January 2012 19:53, Manu <turkeyman@gmail.com> wrote:
> >> > ...I'm using DMD on windows... x32. So this isn't ideal ;)
> >> > Although with this change, Iain should be able to expose the vector
> >> > types in
> >> > GDC, and I can work from there, and hopefully even build an ARM/PPC
> >> > toolchain to experiment with the library in a cross platform
> >> > environment.
> >> >
> >>
> >> And will also allow me to tap into many vector intrinsics that gcc offers too, via the gcc.builtins module. :)
> >
> >
> > Huzzah! ... Like what?
>
> For backend intrinsics, they are all functions that map to asm instructions of the same name, ie: __builtin_ia32_addps.


Ah yeah, perfect.. obviously we need all of those for this vector type to be of any use at all ;)