January 06, 2012
On 7 January 2012 01:34, Walter Bright <newshound2@digitalmars.com> wrote:

> On 1/6/2012 1:46 PM, Manu wrote:
>
>> For some reason Walter seems to have done a bit of a 180 in the last few hours ;)
>>
>
> It must be the drugs!
>

That's what I was starting to suspect too! :P


January 06, 2012
On 1/6/2012 1:32 PM, Manu wrote:
> On 6 January 2012 22:36, Walter Bright <newshound2@digitalmars.com
> <mailto:newshound2@digitalmars.com>> wrote:
>
>             To me, the advantage of making the SIMD types typed are:
>
>             1. the language does typechecking, for example, trying to add a
>         vector of 4
>             floats to 16 bytes would be (and should be) an error.
>
>
>         The language WILL do that checking as soon as we create the strongly typed
>         libraries. And people will use those libraries, they'll never touch the
>         primitive type.
>
>
>     I'm not so sure this will work out satisfactorily.
>
>
> How so, can you support this theory?

For one thing, the compiler has a very hard time optimizing library implemented types. It's why int, float, etc., are native types. We've come a long way with library types, but there are limits.

>
>             2. Some of the SIMD operations do map nicely onto the operators, so one
>             could write:
>
>                a = b + c + -d;
>
>
>         This is not even true, as you said yourself in a previous post.
>         SIMD int ops may wrap, or saturate... which is it?
>
>
>     It would only be for those ops that actually do map onto the D operators.
>     (This is already done by the library implementation of the array arithmetic
>     operations.) The saturated int ops would not be usable this way.
>
>
> But why are you against adding this stuff in the library? It's contrary to the
> general sentiment around here where people like putting stuff in libraries where
> possible. It's less committing, and allows alternative implementations if desired.
>
>         Don't try and express this at the language level. Let the libraries do
>         it, and
>         if they fail, or are revealed to be poorly defined, they can be
>         updated/changed.
>
>
>     Doing it as a library type pretty much prevents certain optimizations, for
>     example, the fused operations, from being expressed using infix operators.
>
>
> You're talking about MADD? I was going to make a separate suggestion regarding
> that actually.
> Multiply-add is a common concept, often available to FPUs as well (and
> there's no way to express it)... I was going to suggest an opMultiplyAdd()
> operator, which you could have the language call if it detects a conforming
> arrangement of * and + operators on a type. This would allow operator access
> to madd in library vectors too.

Detecting a "conforming arrangement" is how native types work! Once you wed the compiler logic to a particular library implementation, it acquires the worst aspects of a native type with the worst aspects of a library type.


> Yeah sure, but I don't think that's fundamentally correct, if you're drifting
> towards typing these things in the language, then you should also start
> considering cast mechanics... and that's a larger topic of debate.
> I don't really think "float4 floatVec = (float4)intVec;" should be a
> reinterpret... surely, as a high level type, this should perform a type conversion?

That's a good point.

> I'm afraid this is becoming a lot more complicated than it needs to be.
> Can you illustrate your current thoughts/plan, to have it summarised in one
> place.

Support the 10 vector types as basic types, support them with the arithmetic infix operators, and use intrinsics for the rest of the operations. I believe this scheme:

1. will look better in code, and will be easier to use
2. will allow for better error detection and more comprehensible error messages when things are misused
3. will generate better code
4. shouldn't be hard to implement, as I already did most of the work when I did the SIMD support for float and double.

> Has it drifted from what you said last night?

Yes.

January 07, 2012
On 1/6/2012 12:45 PM, Manu wrote:
> Here are some examples of tight interaction between int/float, and of
> operating ON floats with int operations...
> Naturally the examples I present will be wrapped as useful functions in
> libraries, but the primitive type shouldn't try to make this more annoying
> by enforcing pointless type-safety errors like you seem to be suggesting.

I am suggesting it, no doubt about it!

> In computer graphics it's common to work with float16's, a type not
> supported by simd units. Pack/unpack code involves detailed float/int
> interaction.
> You might take a register of floats, then mask the exponent and perform
> integer arithmetic on the exponent to shift it into the float16 exponent
> range... then you mask the bottom of the mantissa and shift it into place.
> Unpacking is the same process in reverse.
>
> There are other tricks with the float sign bits: making everything negative
> by or-ing 1's into the top bits, or gathering the signs using various
> techniques... useful for identifying the cell in a quad-tree, for instance.
> Integer manipulation of floats is surprisingly common.

I'm aware of such tricks, and actually do them with the floating point code generation in the compiler back end. I don't think that renders the idea that floats and ints should be different types a bad one.

I'd also argue that such tricks are tricks, and using a reinterpret cast on them makes it clear in the code that you know what you're doing, rather than doing something bizarre like a left shift on a float type.

I've worked a lot with large assembler programs. As you know, EAX has no type. The assembler code would constantly shift the type of things that were in EAX, sometimes a pointer, sometimes an int, sometimes a ushort, sometimes treating a pointer as an int, etc. I can unequivocally state that this typeless approach is confusing, buggy, hard to untangle, and ultimately a freedom that is not justifiable.

Static typing is a big improvement, and having to insert a few reinterpret casts is a good thing, not a detriment.
January 07, 2012
On 1/6/2012 1:43 PM, Manu wrote:
> There is actually. To the compiler, the intrinsic is a normal function, with
> some hook in the code generator to produce the appropriate opcode when it's
> performing actual code generation.
> On most compilers, the inline asm on the other hand, is unknown to the compiler,
> the optimiser can't do much anymore, because it doesn't know what the inline asm
> has done, and the code generator just goes and pastes your asm code inline where
> you told it to. It doesn't know if you've written to aliased variables, called
> functions, etc.. it can no longer safely rearrange code around the inline asm
> block.. which means it's not free to pipeline the code efficiently.

And, in fact, the compiler should not try to optimize inline assembler. The IA is there so that the programmer can hand tweak things without the compiler defeating his attempts.

For example, suppose the compiler schedules instructions for processor X. The programmer writes inline asm to schedule for Y, because the compiler doesn't specifically support Y. The compiler goes ahead and reschedules it for X.

Arggh!

What dmd does do with the inline assembler is it keeps track of which registers are read/written, so that effective register allocation can be done for the non-asm code.
January 07, 2012
On 7 January 2012 01:52, Walter Bright <newshound2@digitalmars.com> wrote:

> On 1/6/2012 1:32 PM, Manu wrote:
>
>> Yeah sure, but I don't think that's fundamentally correct, if you're drifting
>> towards typing these things in the language, then you should also start
>> considering cast mechanics... and that's a larger topic of debate.
>> I don't really think "float4 floatVec = (float4)intVec;" should be a
>> reinterpret... surely, as a high level type, this should perform a type
>> conversion?
>>
>
> That's a good point.


.. oh god, what have I done. :/

>> I'm afraid this is becoming a lot more complicated than it needs to be.
>> Can you illustrate your current thoughts/plan, to have it summarised in
>> one
>> place.
>>
>
> Support the 10 vector types as basic types, support them with the arithmetic infix operators, and use intrinsics for the rest of the operations. I believe this scheme:
>
> 1. will look better in code, and will be easier to use
> 2. will allow for better error detection and more comprehensible error
> messages when things are misused
> 3. will generate better code
> 4. shouldn't be hard to implement, as I already did most of the work when
> I did the SIMD support for float and double.
>
>
>  Has it drifted from what you said last night?
>>
>
> Yes.
>

Okay, I'm very worried at this point. Please don't just do this...
There are so many details and gotchas in what you suggest. I couldn't feel
comfortable short of reading a thorough proposal.

Come on IRC? This requires involved conversation.

I'm sure you realise how much more work this is...
Why would you commit to this right off the bat? Why not produce the simple
primitive type, and allow me the opportunity to try it with the libraries
before polluting the language itself with a massive volume of stuff...
I'm genuinely concerned that once you add this to the language, it's done,
and it'll be stuck there like lots of other debatable features... we can
tweak the library implementation as we gain experience with usage of the
feature.

MS also agree that the primitive __m128 is the right approach. I'm not basing my opinion on their judgement at all; I independently conclude it is the right approach, but it's encouraging that they agree... and perhaps they're a more respectable authority than me and my opinion :)

What I proposed in the OP is the simplest, most non-destructive initial implementation in the language. I think there is the least opportunity for making a mistake/wrong decision in my initial proposal, and it can be extended with what you're suggesting in time, after we have the opportunity to prove that it's correct. We can test and prove the rest with libraries before committing to implement it in the language...


January 07, 2012
On 7 January 2012 02:06, Walter Bright <newshound2@digitalmars.com> wrote:

> On 1/6/2012 1:43 PM, Manu wrote:
>
>> There is actually. To the compiler, the intrinsic is a normal function,
>> with
>> some hook in the code generator to produce the appropriate opcode when
>> it's
>> performing actual code generation.
>> On most compilers, the inline asm on the other hand, is unknown to the
>> compiler,
>> the optimiser can't do much anymore, because it doesn't know what the
>> inline asm
>> has done, and the code generator just goes and pastes your asm code
>> inline where
>> you told it to. It doesn't know if you've written to aliased variables,
>> called
>> functions, etc.. it can no longer safely rearrange code around the inline
>> asm
>> block.. which means it's not free to pipeline the code efficiently.
>>
>
> And, in fact, the compiler should not try to optimize inline assembler. The IA is there so that the programmer can hand tweak things without the compiler defeating his attempts.
>
> For example, suppose the compiler schedules instructions for processor X. The programmer writes inline asm to schedule for Y, because the compiler doesn't specifically support Y. The compiler goes ahead and reschedules it for X.
>
> Arggh!
>
> What dmd does do with the inline assembler is it keeps track of which registers are read/written, so that effective register allocation can be done for the non-asm code.
>

And I agree this is exactly correct for the IA... and also why intrinsics must be used to do this work, not IA.


January 07, 2012
Walter:

> I've worked a lot with large assembler programs. As you know, EAX has no type. The assembler code would constantly shift the type of things that were in EAX, sometimes a pointer, sometimes an int, sometimes a ushort, sometimes treating a pointer as an int, etc. I can unequivocally state that this typeless approach is confusing, buggy, hard to untangle, and ultimately a freedom that is not justifiable.

There is even some desire for a typed assembly language. It's not easy to design and implement, but it seems able to avoid some bugs:
http://www.cs.cornell.edu/talc/papers.html

Bye,
bearophile
January 07, 2012
On 1/6/2012 4:15 PM, Manu wrote:
> And I agree this is exactly correct for the IA... and also why intrinsics must
> be used to do this work, not IA.

Yup.
January 07, 2012
On 7 January 2012 02:00, Walter Bright <newshound2@digitalmars.com> wrote:

> On 1/6/2012 12:45 PM, Manu wrote:
>
>> In computer graphics it's common to work with float16's, a type not
>> supported by simd units. Pack/unpack code involves detailed float/int
>> interaction.
>> You might take a register of floats, then mask the exponent and then
>> perform
>> integer arithmetic on the exponent to shift it into the float16 exponent
>> range... then you will mask the bottom of the mantissa and shift them
>> into place.
>> Unpacking is same process in reverse.
>>
>> Other tricks with the float sign bits, making everything negative, by
>> or-ing in
>> 1's into the top bits. or you can gather the signs using various
>> techniques..
>> useful for identifying the cell in a quad-tree for instance.
>> Integer manipulation of floats is surprisingly common.
>>
>
> I'm aware of such tricks, and actually do them with the floating point code generation in the compiler back end. I don't think that renders the idea that floats and ints should be different types a bad one.
>
> I'd also argue that such tricks are tricks, and using a reinterpret cast on them makes it clear in the code that you know what you're doing, rather than doing something bizarre like a left shift on a float type.
>
> I've worked a lot with large assembler programs. As you know, EAX has no type. The assembler code would constantly shift the type of things that were in EAX, sometimes a pointer, sometimes an int, sometimes a ushort, sometimes treating a pointer as an int, etc. I can unequivocally state that this typeless approach is confusing, buggy, hard to untangle, and ultimately a freedom that is not justifiable.
>
> Static typing is a big improvement, and having to insert a few reinterpret casts is a good thing, not a detriment.
>

To be clear, I'm not opposing strongly typing vector types... that's my primary goal too. But they're not as simple as I think you believe.

From experience, Microsoft provides __m128, but GCC does what you're
proposing (although I get the feeling it's not a 'proposal' anymore). GCC
uses 'vector float', 'vector int', 'vector unsigned short', etc...

I hate writing vector code the GCC way, it's really ugly. The lines tend to become dominated by casts, and it's all for nothing, since it all gets wrapped up behind a library anyway.

Secondly, you're introducing confusion. A cast from float4 to int4... does
it reinterpret, or does it type convert?
In GCC it reinterprets, but what do you actually expect? and regardless of
what you expect, what do you actually WANT most of the time...
I'm sure you'll agree that the expected/'proper' thing would be a type
conversion (and I know you're into 'proper'-ness), but in practice you
almost always want to reinterpret. This inevitably leads to ugly
reinterpret syntax all over the place.
If it were a typeless vector reg type, it all goes away.

Despite all this worry and effort, NOBODY will ever use these strongly
typed (but still primitive) types of yours. They will need to be extended
with bunches of methods, which means wrapping them up in libraries anyway,
to add all the higher level functionality... so what's the point?
The only reason they will use them is to wrap them up in a library of their
own, at which point I promise you they'll be just as annoyed as me by the
typing and need for casts all over the place to pass them into basic
intrinsics.

But if you're insistent on doing this, can you detail the proposal...
 What types will exist?
 How will each one cast/interact?
 What about error conditions/exceptions? How do I control these? ...on a
per-type basis?
 What about CTFE, will you add understanding for every operation supported
by each type? This is easily handled in a library...
 How will you assign literals?
 How can you assign a typeless literal? (a single 128bit value, used
primarily for masks)
 What operators will be supported... and what will they do?
 Will you extend support for 64bit and 256bit vector types, that's a whole
bundle more types again... I really feel this is polluting the language.
 ... is this whole thing just so you can support MADD? If so, there are
others to worry about too...


January 07, 2012
On 7 January 2012 00:38, Manu <turkeyman@gmail.com> wrote:
> On 7 January 2012 02:00, Walter Bright <newshound2@digitalmars.com> wrote:
>>
>> On 1/6/2012 12:45 PM, Manu wrote:
>>>
>>> In computer graphics it's common to work with float16's, a type not
>>> supported by simd units. Pack/unpack code involves detailed float/int
>>> interaction.
>>> You might take a register of floats, then mask the exponent and then
>>> perform
>>> integer arithmetic on the exponent to shift it into the float16 exponent
>>> range... then you will mask the bottom of the mantissa and shift them
>>> into place.
>>> Unpacking is same process in reverse.
>>>
>>> Other tricks with the float sign bits, making everything negative, by
>>> or-ing in
>>> 1's into the top bits. or you can gather the signs using various
>>> techniques..
>>> useful for identifying the cell in a quad-tree for instance.
>>> Integer manipulation of floats is surprisingly common.
>>
>>
>> I'm aware of such tricks, and actually do them with the floating point code generation in the compiler back end. I don't think that renders the idea that floats and ints should be different types a bad one.
>>
>> I'd also argue that such tricks are tricks, and using a reinterpret cast on them makes it clear in the code that you know what you're doing, rather than doing something bizarre like a left shift on a float type.
>>
>> I've worked a lot with large assembler programs. As you know, EAX has no type. The assembler code would constantly shift the type of things that were in EAX, sometimes a pointer, sometimes an int, sometimes a ushort, sometimes treating a pointer as an int, etc. I can unequivocally state that this typeless approach is confusing, buggy, hard to untangle, and ultimately a freedom that is not justifiable.
>>
>> Static typing is a big improvement, and having to insert a few reinterpret casts is a good thing, not a detriment.
>
>
> To be clear, I'm not opposing strongly typing vector types... that's my primary goal too. But they're not as simple I think you believe.
>
> From experience, Microsoft provides __m128, but GCC does what you're proposing (although I get the feeling it's not a 'proposal' anymore). GCC uses 'vector float', 'vector int', 'vector unsigned short', etc...
>
> I hate writing vector code the GCC way, it's really ugly. The lines tend to become dominated by casts, and it's all for nothing, since it all gets wrapped up behind a library anyway.
>
> Secondly, you're introducing confusion. A cast from float4 to int4... does
> it reinterpret, or does it type convert?
> In GCC it reinterprets, but what do you actually expect? and regardless of
> what you expect, what do you actually WANT most of the time...
> I'm sure you'll agree that the expected/'proper' thing would be a type
> conversion (and I know you're into 'proper'-ness), but in practice you
> almost always want to reinterpret. This inevitably leads to ugly reinterpret
> syntax all over the place.
> If it were a typeless vector reg type, it all goes away.
>

FYI, vector conversion in GCC is roughly equivalent to the idiom *(float4 *)&X; in C.


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';