January 14, 2012
On 13/01/2012 09:39, Walter Bright wrote:
> https://github.com/D-Programming-Language/d-programming-language.org/blob/master/simd.dd
>
>
> and core.simd:
>
> https://github.com/D-Programming-Language/druntime/blob/master/src/core/simd.d
>

Let me propose something that solves this problem and many others. I never had time to explain it properly, but it is definitely relevant to this conversation, so I'll give a quick draft.

The idea is to add a new asm instruction that aliases a register chosen by the compiler and ensures it represents some existing variable. That way, the compiler can avoid touching a register that is already in use and avoid stack manipulation when possible. This would also make the asm more readable.

As registers are not all interchangeable, we need to specify which type of register we want. For that, we just replace the "variable" part of the register name with N. For example, XMMN is any SSE register, and RNX represents any general-purpose register on x86_64.

Now we define the alias pseudo-instruction. Here is a (dumb) example:

long toto = 42;
long tata = 1337;

asm {
    alias toto RNX;
    alias tata RNX;
    add toto, tata; // Now toto is 1379.
}

If toto and tata are already in some registers, the compiler can simply map them. If they are in memory, then the compiler must choose a register and mov the right data into it.
January 15, 2012
What about the 256-bit types that are already present in the AVX instruction set?

I've written several C++-based SIMD math libraries (for SSE2 up through AVX), and for PPC's VMX instruction set that you can find on game consoles.

The variable type naming is probably the most annoying thing to work out.

HLSL uses float, float1, float2, float3 and float4, plus int, uint and double variants, and this convention works out quite well until you have to deal with smaller integer types or FP16 half floats.

However, on the CPU side of things there are signed and unsigned 8, 16, 32, 64 and 128-bit values.  It gets even more complicated in that not all the math operations or comparisons are supported on the non-32-bit types.  The hardware is really designed for you to pack and unpack the smaller types to 32 bits, do the work, and pack the results back, and the 64-bit integer support is also a bit spotty (especially with respect to multiply and divide).

On 1/13/2012 2:57 PM, bearophile wrote:
> Walter:
>
>> What's our vector, Victor?
>> http://www.youtube.com/watch?v=fVq4_HhBK8Y
>
> Thank you Walter :-)
>
>
>> If int4 is out, I'd prefer something like vint4. Something short.
>
> Current names:
>
> void16
> double2
> float4
> byte16
> ubyte16
> short8
> ushort8
> int4
> uint4
> long2
>
> Your suggestion:
>
> vvoid16
> vdouble2
> vfloat4
> vbyte16
> vubyte16
> vshort8
> vushort8
> vint4
> vuint4
> vlong2
>
>
> My suggestion:
>
> void16v
> double2v
> float4v
> byte16v
> ubyte16v
> short8v
> ushort8v
> int4v
> uint4v
> long2v
>
> Bye,
> bearophile

January 15, 2012
On 1/14/2012 9:15 PM, Sean Cavanaugh wrote:
> What about the 256 bit types that are already present in AVX instruction set?

Eventually, I'd like to do them, too.

> I've written a several C++ based SIMD math libraries (for SSE2 up through AVX),
> and PPC's VMX instruction sets that you can find on game consoles.
>
> The variable type naming is probably the most annoying thing to work out.
>
> For HLSL they use float, float1, float2, float3, float4 and int, uint and double
> versions, and this convention works out quite well until you start having to
> deal with smaller integer types or FP16 half floats.
>
> However on the CPU side of things there are signed and unsigned 8, 16, 32, 64
> and 128 bit values.

I'm not sure why the convention used in std.simd fails here.

> It gets even more complicated in that not all the math
> operations or comparisons are supported on the non-32 bit types.

Right. D is designed to give an error for operations that are not supported, rather than downgrade to emulation like gcc does.
January 15, 2012
On Sun, 15 Jan 2012 06:21:22 +0100, Walter Bright <newshound2@digitalmars.com> wrote:

> On 1/14/2012 9:15 PM, Sean Cavanaugh wrote:
>> What about the 256 bit types that are already present in AVX instruction set?
>
> Eventually, I'd like to do them, too.
>
I already did some work for adding AVX to the inline assembler.
https://github.com/dawgfoto/dmd/commits/AVXSupport
January 15, 2012
On 1/14/2012 9:52 PM, Martin Nowak wrote:
> On Sun, 15 Jan 2012 06:21:22 +0100, Walter Bright <newshound2@digitalmars.com>
> wrote:
>
>> On 1/14/2012 9:15 PM, Sean Cavanaugh wrote:
>>> What about the 256 bit types that are already present in AVX instruction set?
>>
>> Eventually, I'd like to do them, too.
>>
> I already did some work for adding AVX to the inline assembler.
> https://github.com/dawgfoto/dmd/commits/AVXSupport

Nice, how about a pull request?
January 15, 2012
On Sun, 15 Jan 2012 07:07:03 +0100, Walter Bright <newshound2@digitalmars.com> wrote:

> On 1/14/2012 9:52 PM, Martin Nowak wrote:
>> On Sun, 15 Jan 2012 06:21:22 +0100, Walter Bright <newshound2@digitalmars.com>
>> wrote:
>>
>>> On 1/14/2012 9:15 PM, Sean Cavanaugh wrote:
>>>> What about the 256 bit types that are already present in AVX instruction set?
>>>
>>> Eventually, I'd like to do them, too.
>>>
>> I already did some work for adding AVX to the inline assembler.
>> https://github.com/dawgfoto/dmd/commits/AVXSupport
>
> Nice, how about a pull request?

There are still some instructions to go.
January 16, 2012
On 1/13/12 11:02 PM, Walter Bright wrote:
> On 1/13/2012 8:52 PM, Andrei Alexandrescu wrote:
>> On 1/13/12 10:03 PM, Walter Bright wrote:
>>> On 1/13/2012 7:03 PM, Andrei Alexandrescu wrote:
>>>> How is that possibly different from what you have now?
>>>
>>> Intrinsic functions are today just a table lookup in the compiler.
>>
>> They're a table lookup if the operation is a compile-time constant. So
>> this
>> argument does not apply.
>>
>>> Template intrinsics currently do not exist, so more code needs to be
>>> written for them.
>>
>> The same table lookup could be done, except in this case it would be more
>> principled.
>
> You and I are talking about different things.

No. Please hear me out.

> The current compiler looks for intrinsics after all template functions
> are converted into real functions. The mangled name is looked up in a
> table to see if:
>
> 1. it is an intrinsic function
>
> 2. what is the corresponding expression node operator
>
> Doing it for intrinsic functions would require either:
>
> 1. adding hundreds of function signatures to the table
>
> 2. moving the intrinsic detection to the template instantiation logic

So this is an implementation issue that has nothing to do with doing the right thing. That's no reason to do the wrong thing. The real problem with the current approach is as follows.

Defining an intrinsic function is cheating. It means the language's facilities are unable to expose computation to the compiler in a manner that makes it able to translate it to efficient code. This in turn points to problems in either the language or the compiler technology.

To a good extent these are known issues of the state of the art. There are advantages to e.g. making integers intrinsic types and imbuing the compiler with understanding of basic arithmetic identities. Or it makes sense to define integral rotation as an intrinsic function (or devise a peephole optimization that detects the pattern) because it's one assembler operation deriving from a rather involved algorithm. So intrinsics are a necessary evil.

On to your implementation of simd, which is

simd(opcode, op1, op2)

This is a lie - it's cheating twice. The expression looks like a function but does not behave like one. The first argument, the opcode, is NOT a function parameter; it's part of the function's identity. Passing a variable there does not work as a matter of design - the generated code depends on the opcode, so it must be known during compilation.

So what means do we have to integrate this cheating operation within the current language semantics? I can think of two. First, make the opcode part of the function name:

simdOPCODE(op1, op2)

This is reasonable because it acknowledges what really happens - each function has its identity and its generated code.

The second, spaceship-era approach (and closer to the current implementation) is to make the first argument a template parameter. This is because template parameters must be known during compilation, which is exactly opcode's requirement:

simd!opcode(op1, op2)

Either approach should work perfectly fine; it "cheats in style" by using an existing language construct that exactly matches the special-cased capability. The current approach cheats badly - it messes with the language's fabric by defining a construct that's not a function but looks like one.

I'd be in your debt if you could at least do the right thing when defining new intrinsics.


Thanks,

Andrei
January 16, 2012
On 1/16/2012 8:38 AM, Andrei Alexandrescu wrote:
> Either approach should work perfectly fine; it "cheats in style" by using an
> existing language construct that exactly matches the special-cased capability.
> The current approach cheats badly - it messes with the language's fabric by
> defining a construct that's not a function but looks like one.

You're right, it is cheating.

> I'd be in your debt if you could at least do the right thing when defining new
> intrinsics.

We can talk about that some more. At some point, there's going to have to be some compiler magic going on.
January 16, 2012
On 1/16/12 1:32 PM, Walter Bright wrote:
> On 1/16/2012 8:38 AM, Andrei Alexandrescu wrote:
>> Either approach should work perfectly fine; it "cheats in style" by
>> using an
>> existing language construct that exactly matches the special-cased
>> capability.
>> The current approach cheats badly - it messes with the language's
>> fabric by
>> defining a construct that's not a function but looks like one.
>
> You're right, it is cheating.
>
>> I'd be in your debt if you could at least do the right thing when
>> defining new
>> intrinsics.
>
> We can talk about that some more. At some point, there's going to have
> to be some compiler magic going on.

OK. The overarching point is that the magic should be virtually indistinguishable from using the regular constructs. It's an optimization, and just like with any other optimization, you don't expect it to change the meaning of a construct.

Andrei