January 13, 2012
Re: SIMD support...
On 12.01.2012 23:10, Peter Alexander wrote:
> On 12/01/12 8:13 PM, Norbert Nemec wrote:
>> Considering these hardware details of the SSE architecture alone, I fear
>> that portable low-level support for SIMD is very hard to achieve. If you
>> want to offer access to the raw power of each architecture, it might be
>> simpler to have machine-specific language extensions for SIMD and leave
>> the portability for a wrapper library with a common front-end and
>> various back-ends for the different architectures.
>
> You are right, but don't forget that the same is true for instructions
> already in the language. For example, (1 << x) is a very slow operation
> on PPUs (it's micro-coded).
>
> It's simply not possible to be portable and achieve maximum performance
> for any language features, not just vectors. Algorithms must be tuned
> for specific architectures in version statements. However, you can get a
> decent baseline by providing the lowest common denominator in
> functionality. This v128 type (or whatever it will be called) does that.

Actually, my essential message is: The single v128 is too simplistic for 
the SSE architecture. You actually need different types because the 
compiler needs to know what type is stored in any given register to be 
able to move it around.
January 13, 2012
Re: SIMD support...
On 13 January 2012 08:34, Norbert Nemec <Norbert@nemec-online.de> wrote:

> On 12.01.2012 23:10, Peter Alexander wrote:
>
>> On 12/01/12 8:13 PM, Norbert Nemec wrote:
>>
>>> Considering these hardware details of the SSE architecture alone, I fear
>>> that portable low-level support for SIMD is very hard to achieve. If you
>>> want to offer access to the raw power of each architecture, it might be
>>> simpler to have machine-specific language extensions for SIMD and leave
>>> the portability for a wrapper library with a common front-end and
>>> various back-ends for the different architectures.
>>>
>>
>> You are right, but don't forget that the same is true for instructions
>> already in the language. For example, (1 << x) is a very slow operation
>> on PPUs (it's micro-coded).
>>
>> It's simply not possible to be portable and achieve maximum performance
>> for any language features, not just vectors. Algorithms must be tuned
>> for specific architectures in version statements. However, you can get a
>> decent baseline by providing the lowest common denominator in
>> functionality. This v128 type (or whatever it will be called) does that.
>>
>
> Actually, my essential message is: The single v128 is too simplistic for
> the SSE architecture. You actually need different types because the
> compiler needs to know what type is stored in any given register to be able
> to move it around.
>

This has already been concluded some days back: the language has a suite of
types, just like GCC.
January 14, 2012
Re: SIMD support...
In case this is at all helpful...
[see attached]
January 15, 2012
Re: SIMD support...
MS has three types, __m128, __m128i and __m128d  (float, int, double)

Six if you count AVX's 256 forms.

On 1/7/2012 6:54 PM, Peter Alexander wrote:
> On 7/01/12 9:28 PM, Andrei Alexandrescu wrote:
> I agree with Manu that we should just have a single type like __m128 in
> MSVC. The other types and their conversions should be solvable in a
> library with something like strong typedefs.
>
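As a rough illustration of the "single raw type plus strong typedefs in a library" idea quoted above, here is a minimal C++ sketch. The names `Vec128`, `Int32x4Tag`, `Int16x8Tag`, `int4`, `short8` and `reinterpret` are hypothetical and only serve the example; the intrinsics are the standard SSE2 ones, so this assumes an x86 target.

```cpp
#include <emmintrin.h>   // SSE2 integer intrinsics (x86 only)
#include <type_traits>

// Hypothetical tag types that name the lane layout of a 128-bit register.
struct Int32x4Tag {};
struct Int16x8Tag {};

// A strong typedef: the payload is always __m128i, but each tag yields a
// distinct C++ type, so the compiler can pick the right instruction.
template <typename Tag>
struct Vec128 {
    __m128i raw;
};

using int4   = Vec128<Int32x4Tag>;
using short8 = Vec128<Int16x8Tag>;

static_assert(!std::is_same<int4, short8>::value,
              "layouts must not convert implicitly");

// Lane-wise addition, selected by the static type of the operands.
inline int4 operator+(int4 a, int4 b) {
    return { _mm_add_epi32(a.raw, b.raw) };
}
inline short8 operator+(short8 a, short8 b) {
    return { _mm_add_epi16(a.raw, b.raw) };
}

// Explicit, zero-cost reinterpretation between layouts.
template <typename ToTag, typename FromTag>
inline Vec128<ToTag> reinterpret(Vec128<FromTag> v) {
    return { v.raw };
}
```

With this in place `int4 + int4` and `short8 + short8` compile to different instructions, while `int4 + short8` is rejected at compile time unless the caller writes an explicit `reinterpret<...>`.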
January 15, 2012
Re: SIMD support...
On 1/6/2012 9:44 AM, Manu wrote:
> On 6 January 2012 17:01, Russel Winder <russel@russel.org.uk
> <mailto:russel@russel.org.uk>> wrote:
> As said, I think these questions are way outside the scope of SIMD
> vector libraries ;)
> Although this is a fundamental piece of the puzzle, since GPGPU is no
> use without SIMD type expression... but I think everything we've
> discussed here so far will map perfectly to GPGPU.

I don't think you are in any danger, as the GPGPU instructions are more
flexible than their CPU SIMD counterparts.  GPU hardware natively works
with float2 and float3 extremely well.  GPUs have VLIW instructions that
can effectively add a huge number of instruction modifiers to their
instructions (things like built-in saturates to the 0..1 range on variable
argument _reads_, arbitrary swizzle on read and write, write masks that
leave partial data untouched, etc., all in one clock).

The CPU SIMD stuff is simplistic by comparison.  A good bang for the
buck would be to have some basic set of operators (* / + - < > == !=
<= >= and especially ?: (the ternary operator)), and versions of 'any'
and 'all' from HLSL for dynamic branching, that can work at the very
least for integer, float, and double types.

Bit shifting is useful (manipulating floats for transcendental functions
or working with half-precision FP16 types especially requires a lot of
it), but should be restricted to integer types.  Having dedicated signed
and unsigned right shifts would be pretty nice too (since about 95% of my
right shifts end up needing to be of the zero-extended variety, even
though I had to cast to 'vector integers').
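To make the signed versus unsigned right shift distinction concrete, here is a small C++ sketch using SSE2 intrinsics (so it assumes an x86 target). The helper names `shift_right_signed`, `shift_right_unsigned`, `biased_exponent` and `lane0` are made up for this example. The exponent helper shows the float bit-manipulation case mentioned above, where the zero-extending shift is the one you want.

```cpp
#include <emmintrin.h>  // SSE2 integer intrinsics (x86 only)
#include <cstdint>

// Arithmetic right shift: each 32-bit lane is filled with copies of its sign bit.
inline __m128i shift_right_signed(__m128i v, int count) {
    return _mm_sra_epi32(v, _mm_cvtsi32_si128(count));
}

// Logical right shift: each 32-bit lane is filled with zeros from the top.
inline __m128i shift_right_unsigned(__m128i v, int count) {
    return _mm_srl_epi32(v, _mm_cvtsi32_si128(count));
}

// Biased exponent of each float lane: reinterpret the bits as integers,
// shift the 23 mantissa bits out, then mask off the sign bit.
inline __m128i biased_exponent(__m128 x) {
    __m128i bits = _mm_castps_si128(x);
    return _mm_and_si128(shift_right_unsigned(bits, 23), _mm_set1_epi32(0xFF));
}

// Read lane 0 as a signed 32-bit integer (helper for inspection only).
inline std::int32_t lane0(__m128i v) {
    return _mm_cvtsi128_si32(v);
}
```

Note that the cast from `__m128` to `__m128i` is needed before any shift, which is the "cast to vector integers" step described in the post.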
January 15, 2012
Re: SIMD support...
On 1/14/2012 9:58 PM, Sean Cavanaugh wrote:
> MS has three types, __m128, __m128i and __m128d (float, int, double)
>
> Six if you count AVX's 256 forms.
>
> On 1/7/2012 6:54 PM, Peter Alexander wrote:
>> On 7/01/12 9:28 PM, Andrei Alexandrescu wrote:
>> I agree with Manu that we should just have a single type like __m128 in
>> MSVC. The other types and their conversions should be solvable in a
>> library with something like strong typedefs.
>>

The trouble with MS's scheme is that, given the following:

    __m128i v;
    v += 2;

the compiler can't tell what to do, because __m128i doesn't say whether v
holds bytes, shorts, ints or longs. With D,

   int4 v;
   v += 2;

it's clear (add 2 to each of the 4 ints).
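The ambiguity can be seen directly with the raw intrinsics. In the C++ sketch below (SSE2, so x86 only) the same `__m128i` value is passed to two different "add 2" operations, and the results differ because the lane width lives in the intrinsic name rather than in the type. The function names `add2_as_int4`, `add2_as_byte16` and `store_u32` are invented for this example.

```cpp
#include <emmintrin.h>  // SSE2 integer intrinsics (x86 only)
#include <cstdint>

// Add 2 to each of four 32-bit lanes (the meaning of D's `int4 v; v += 2;`).
inline __m128i add2_as_int4(__m128i v) {
    return _mm_add_epi32(v, _mm_set1_epi32(2));
}

// Add 2 to each of sixteen 8-bit lanes: same register type, different result.
inline __m128i add2_as_byte16(__m128i v) {
    return _mm_add_epi8(v, _mm_set1_epi8(2));
}

// Copy the register out as four 32-bit values for inspection.
inline void store_u32(__m128i v, std::uint32_t out[4]) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
}
```

Starting from 0x000000FF in every 32-bit lane, the first function carries into the next byte while the second wraps each byte independently, so a typed `int4` removes a real source of bugs and not just a cosmetic one.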
January 15, 2012
Re: SIMD support...
On 1/6/2012 7:58 PM, Manu wrote:
> On 7 January 2012 03:46, Vladimir Panteleev <vladimir@thecybershadow.net
> <mailto:vladimir@thecybershadow.net>> wrote:
>
> I've never seen a memcpy on any console system I've ever worked on that
> takes advantage of its large registers... writing a fast memcpy is
> usually one of the first things we do when we get a new platform ;)

Plus memcpy is optimized for reading and writing to cached virtual 
memory, so you need several others to write to write-combined or 
uncached memory efficiently and whatnot.
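As one example of such a variant, here is a minimal C++ sketch of a copy loop that uses non-temporal (streaming) stores, which is a common approach when the destination is write-combined memory. It is only an illustration under stated assumptions: x86 with SSE2, both pointers 16-byte aligned, and a size that is a multiple of 16. The name `copy_streaming` is hypothetical, and a production routine would also handle unaligned heads and tails.

```cpp
#include <emmintrin.h>  // SSE2: _mm_load_si128, _mm_stream_si128 (x86 only)
#include <cstddef>

// Copy `bytes` bytes from src to dst using 128-bit loads and non-temporal
// stores. Assumes both pointers are 16-byte aligned and bytes % 16 == 0.
inline void copy_streaming(void* dst, const void* src, std::size_t bytes) {
    auto* d = static_cast<__m128i*>(dst);
    const auto* s = static_cast<const __m128i*>(src);
    for (std::size_t i = 0; i < bytes / 16; ++i) {
        // The streaming store bypasses the cache on its way to memory.
        _mm_stream_si128(d + i, _mm_load_si128(s + i));
    }
    // Make the streamed data globally visible before returning.
    _mm_sfence();
}
```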
January 15, 2012
Re: SIMD support...
On 1/15/2012 12:09 AM, Walter Bright wrote:
> On 1/14/2012 9:58 PM, Sean Cavanaugh wrote:
>> MS has three types, __m128, __m128i and __m128d (float, int, double)
>>
>> Six if you count AVX's 256 forms.
>>
>> On 1/7/2012 6:54 PM, Peter Alexander wrote:
>>> On 7/01/12 9:28 PM, Andrei Alexandrescu wrote:
>>> I agree with Manu that we should just have a single type like __m128 in
>>> MSVC. The other types and their conversions should be solvable in a
>>> library with something like strong typedefs.
>>>
>
> The trouble with MS's scheme, is given the following:
>
> __m128i v;
> v += 2;
>
> Can't tell what to do. With D,
>
> int4 v;
> v += 2;
>
> it's clear (add 2 to each of the 4 ints).

Working with their intrinsics in their raw form for real code is pure 
insanity :)  You need to wrap it all with a good math library (even if 
90% of the library is the intrinsics wrapped into __forceinlined 
functions), so you can start having sensible operator overloads, and so 
you can write code that is readable.


if (any4(a > b))
{
  // do stuff
}


is way way way better than

if (_mm_movemask_ps(_mm_cmpgt_ps(a, b)) != 0)
{
}



and (if the ternary operator were overloadable in C++)

float4 foo = (a > b) ? c : d;

would be better than

float4 mask = _mm_cmpgt_ps(a, b);
float4 foo = _mm_or_ps(_mm_and_ps(mask, c), _mm_andnot_ps(mask, d));
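A minimal sketch of the kind of wrapper described here might look like the following C++. It assumes x86 with SSE. The `float4` struct, `any4`, `all4`, `select4` and `lane` are hypothetical names for this example and not from any particular math library. Since `?:` cannot be overloaded, the select is spelled as a function.

```cpp
#include <xmmintrin.h>  // SSE float intrinsics (x86 only)

// Thin wrapper so that operators can be overloaded on the vector type.
struct float4 {
    __m128 raw;
};

// Lane-wise a > b; each result lane is all ones (true) or all zeros (false).
inline float4 operator>(float4 a, float4 b) {
    return { _mm_cmpgt_ps(a.raw, b.raw) };
}

// HLSL-style reductions over a comparison mask.
inline bool any4(float4 mask) { return _mm_movemask_ps(mask.raw) != 0; }
inline bool all4(float4 mask) { return _mm_movemask_ps(mask.raw) == 0x0F; }

// Lane-wise `mask ? c : d`, built from and / andnot / or.
inline float4 select4(float4 mask, float4 c, float4 d) {
    return { _mm_or_ps(_mm_and_ps(mask.raw, c.raw),
                       _mm_andnot_ps(mask.raw, d.raw)) };
}

// Read one lane back as a scalar (helper for inspection only).
inline float lane(float4 v, int i) {
    float out[4];
    _mm_storeu_ps(out, v.raw);
    return out[i];
}
```

Calling code then reads `if (any4(a > b))` and `float4 foo = select4(a > b, c, d);`, with the intrinsics confined to the wrapper.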
January 15, 2012
Re: SIMD support...
On 1/14/2012 2:11 AM, Mehrdad wrote:
> In case this is at all helpful...
> [see attached]

Hope you like the new simd compiler stuff.
January 15, 2012
Re: SIMD support...
On 1/13/2012 7:38 AM, Manu wrote:
> On 13 January 2012 08:34, Norbert Nemec <Norbert@nemec-online.de
> <mailto:Norbert@nemec-online.de>> wrote:
>
>
> This has already been concluded some days back: the language has a suite
> of types, just like GCC.

So I would definitely like to help out on the SIMD stuff in some way, as
I have a lot of experience using SIMD math to speed up the games I work
on.  I've got a vectorized set of transcendental functions for float and
double (currently in the form of MSVC++ intrinsics) that would be a good
start if anyone is interested.  Beyond that I just want to help 'make it
right' because it's a topic I care a lot about, and is my personal biggest
gripe with the language at the moment.

I also have experience with VMX.  As the two are not exactly the same, it
would definitely help to avoid making the code too Intel-centric (though
typically VMX is the more flexible design, as it can do dynamic
shuffling based on the contents of the vector registers, etc.).