View mode: basic / threaded / horizontal-split · Log in · Help
October 24, 2012
Very simple SIMD programming
I have found a nice paper, "Extending a C-like Language for 
Portable SIMD Programming", (2012), by Roland L., Sebastian Hack 
and Ingo Wald:

http://www.cdl.uni-saarland.de/projects/vecimp/vecimp_tr.pdf

SIMD programming is necessary in a system language, or in any 
language that wants to use the modern CPUs well. So languages 
like C, C++, D (and Mono-C#) support such wider registers.

The authors of this paper have understood that it's also 
important to make SIMD programming easy, almost as easy as scalar 
code, so most programmers are able to write such kind of correct 
code.

So this this paper presents ideas to better express SIMD 
semantics in a C-like language. They introduce few new constructs 
in a large subset of C language, with few ideas. The result 
coding patterns seem easy enough (they are surely look simpler 
than most multi-core coding patterns I've seen, including Cilk+).


They present a simple scalar program in C:

struct data_t {
    int key;
    int other;
};

int search(data_t* data , int N) {
    for (int i = 0; i < N; i++) {
        int x = data[i].key;
        if (4 < x & x <= 8) return x;
    }
    return -1;
}


Then they explain the three most common ways to represent an 
array of structs, here a struct that contains 3 values:

x0 y0 z0 x1 y1 z1 x2 y2 z2 x3 y3 z3 x4 y4 z4 x5 y5 z5 x6 y6 z6 x7 
y7 z7
(a) Array of Structures (AoS)

x0 x1 x2 x3 x4 x5 x6 x7   y0 y1 y2 y3 y4 y5 y6 y7   z0 z1 z2 z3 
z4 z5 z6 z7
(b) Structure of Arrays (SoA)

x0 x1 x2 x3 y0 y1 y2 y3 z0 z1 z2 z3 x4 x5 x6 x7 y4 y5 y6 y7 z4 z5 
z6 z7
(c) Hybrid Structure of Arrays (Hybrid SoA)

They explain how the (c) is the preferred pattern in SIMD 
programming.


Using the (c) data pattern they show how in C with (nice) SIMD 
intrinsics you write vectorized code (a simd_data_t struct 
instance contains 8 int values):

struct simd_data_t {
    simd_int key;
    simd_int other;
};

int search(simd_data_t* data , int N) {
    for (int i = 0; i < N/L; ++i) {
        simd_int x = load(data[i].key);
        simd_int cmp = simd_and(simd_lt(4, x),
        simd_le(x, 8));
        int mask = simd_to_mask(cmp);
        if (mask != 0) {
            simd_int result = simd_and(mask , x);
            for (int j = 0; j < log2(L); j++)
                result = simd_or(result ,
                whole_reg_shr(result , 1 << j));
                return simd_extract(result , 0);
            }
        }
    return -1;
}


D should do become able to do this (that is not too much bad), or 
better.


Their C language extensions allow to write nicer code like:

struct data_t {
    int key;
    int other;
};

int search(data_t *scalar data , int scalar N) {
    int L = lengthof(*data);
    for (int i = 0; i < N/L; ++i) {
        int x = data[i].key;
        if (4 < x & x <= 8)
            int block[L] result = [x, 0];
        scalar {
            for (int j = 0; j < log2(L); ++j)
                result |= whole_reg_shr(result , 1 << j);
            return get(x, 0);
        }
    }
    return -1;
}


This is based on just few simple ideas, explained in the paper 
(they are interesting, but quoting here those parts of the paper 
is not a good idea). Such ideas are not directly portable to D 
(unless the front-end is changed. Their compiler works by 
lowering, and emits regular C++ code with intrinsics).


Near the end of the paper they also propose some C++ library code:

>the C++ template mechanism would allow to define a hybrid SoA 
>container class: Similar to std::vector which abstracts a 
>traditional C array, one could implement a wrapper around a T 
>block[N]*:<


// scalar context throughout this example
struct vec3 { float x, y, z; };
// vec3 block[N]* pointing to ceil(n/N) elements
hsoa <vec3 > vecs(n);
// preferred vector length of vec3 automatically derived
static const int N = hsoa <vec3 >::vector_length;
int i = /*...*/
hsoa <vec3 >::block_index ii = /*...*/
vec3 v = vecs[i]; // gather
vecs[i] = v; // scatter
vec3 block[N] w = vecs[ii]; // fetch whole block
hsoa <vec3 >::ref r = vecs[i]; // get proxy to a scalar
r = v; // pipe through proxy
// for each element
vecs.foreach([](vec3& scalar v) { /*...*/ });


Regardless of the other ideas of their C-like language, a similar 
struct should be added to Phobos once a bit higher level SIMD 
support is in better shape in D. Supporting Hybrid-SoA and few 
operations on it will be an important but probably quite short 
and simple addition to Phobos collections (it's essentially an 
struct that acts like an array, with few simple extra operations).

I think no commonly used language allows both very simple and 
quite efficient SIMD programming (Scala, CUDA, C, C++, C#, Java, 
Go, and currently Rust too, are not able to support SIMD 
programming well. I think currently Haskell too is not supporting 
it well, but Haskell is very flexible, and it's compiled by a 
native compiler, so such things are maybe possible to add). So 
supporting it well in D will be an interesting selling point of 
D. (Supporting a very simple SIMD coding in D will make D more 
widespread, but such kind of programming will probably keep being 
a small niche).

Bye,
bearophile
October 24, 2012
Re: Very simple SIMD programming
On 24/10/12 04:41, bearophile wrote:
> I have found a nice paper, "Extending a C-like Language for Portable
> SIMD Programming", (2012), by Roland L., Sebastian Hack and Ingo Wald:
>
> http://www.cdl.uni-saarland.de/projects/vecimp/vecimp_tr.pdf
>

> They present a simple scalar program in C:
>
> struct data_t {
>      int key;
>      int other;
> };
>
> int search(data_t* data , int N) {
>      for (int i = 0; i < N; i++) {
>          int x = data[i].key;
>          if (4 < x & x <= 8) return x;
>      }
>      return -1;
> }

I don't know what that code does. I think the if statement is always 
true. Try compiling it in D.

test.d(8): Error: 4 < x must be parenthesized when next to operator &
test.d(8): Error: x <= 8 must be parenthesized when next to operator &

Making that an error was such a good idea.
<g>
October 24, 2012
Re: Very simple SIMD programming
On 10/24/2012 11:24 AM, Don Clugston wrote:
> On 24/10/12 04:41, bearophile wrote:
>> I have found a nice paper, "Extending a C-like Language for Portable
>> SIMD Programming", (2012), by Roland L., Sebastian Hack and Ingo Wald:
>>
>> http://www.cdl.uni-saarland.de/projects/vecimp/vecimp_tr.pdf
>>
>
>> They present a simple scalar program in C:
>>
>> struct data_t {
>>      int key;
>>      int other;
>> };
>>
>> int search(data_t* data , int N) {
>>      for (int i = 0; i < N; i++) {
>>          int x = data[i].key;
>>          if (4 < x & x <= 8) return x;
>>      }
>>      return -1;
>> }
>
> I don't know what that code does. I think the if statement is always
> true.

No, the code is fine.

> Try compiling it in D.
>
> test.d(8): Error: 4 < x must be parenthesized when next to operator &
> test.d(8): Error: x <= 8 must be parenthesized when next to operator &
>
> Making that an error was such a good idea.
> <g>

C's precedence rules are the same as in math in this case.
October 24, 2012
Re: Very simple SIMD programming
Don Clugston:

> Making that an error was such a good idea.
> <g>

There are two other common sources of bugs code that I'd like to 
see removed from D code:

http://d.puremagic.com/issues/show_bug.cgi?id=5409
http://d.puremagic.com/issues/show_bug.cgi?id=8757

Bye,
bearophile
October 24, 2012
Re: Very simple SIMD programming
On Wednesday, 24 October 2012 at 02:41:53 UTC, bearophile wrote:
> I have found a nice paper, "Extending a C-like Language for 
> Portable SIMD Programming", (2012), by Roland L., Sebastian 
> Hack and Ingo Wald:
>
> http://www.cdl.uni-saarland.de/projects/vecimp/vecimp_tr.pdf
>
> SIMD programming is necessary in a system language, or in any 
> language that wants to use the modern CPUs well. So languages 
> like C, C++, D (and Mono-C#) support such wider registers.
>
> The authors of this paper have understood that it's also 
> important to make SIMD programming easy, almost as easy as 
> scalar code, so most programmers are able to write such kind of 
> correct code.
>
> So this this paper presents ideas to better express SIMD 
> semantics in a C-like language. They introduce few new 
> constructs in a large subset of C language, with few ideas. The 
> result coding patterns seem easy enough (they are surely look 
> simpler than most multi-core coding patterns I've seen, 
> including Cilk+).
>
>
> They present a simple scalar program in C:
>
> struct data_t {
>     int key;
>     int other;
> };
>
> int search(data_t* data , int N) {
>     for (int i = 0; i < N; i++) {
>         int x = data[i].key;
>         if (4 < x & x <= 8) return x;
>     }
>     return -1;
> }
>
>
> Then they explain the three most common ways to represent an 
> array of structs, here a struct that contains 3 values:
>
> x0 y0 z0 x1 y1 z1 x2 y2 z2 x3 y3 z3 x4 y4 z4 x5 y5 z5 x6 y6 z6 
> x7 y7 z7
> (a) Array of Structures (AoS)
>
> x0 x1 x2 x3 x4 x5 x6 x7   y0 y1 y2 y3 y4 y5 y6 y7   z0 z1 z2 z3 
> z4 z5 z6 z7
> (b) Structure of Arrays (SoA)
>
> x0 x1 x2 x3 y0 y1 y2 y3 z0 z1 z2 z3 x4 x5 x6 x7 y4 y5 y6 y7 z4 
> z5 z6 z7
> (c) Hybrid Structure of Arrays (Hybrid SoA)
>
> They explain how the (c) is the preferred pattern in SIMD 
> programming.
>
>
> Using the (c) data pattern they show how in C with (nice) SIMD 
> intrinsics you write vectorized code (a simd_data_t struct 
> instance contains 8 int values):
>
> struct simd_data_t {
>     simd_int key;
>     simd_int other;
> };
>
> int search(simd_data_t* data , int N) {
>     for (int i = 0; i < N/L; ++i) {
>         simd_int x = load(data[i].key);
>         simd_int cmp = simd_and(simd_lt(4, x),
>         simd_le(x, 8));
>         int mask = simd_to_mask(cmp);
>         if (mask != 0) {
>             simd_int result = simd_and(mask , x);
>             for (int j = 0; j < log2(L); j++)
>                 result = simd_or(result ,
>                 whole_reg_shr(result , 1 << j));
>                 return simd_extract(result , 0);
>             }
>         }
>     return -1;
> }
>
>
> D should do become able to do this (that is not too much bad), 
> or better.
>
>
> Their C language extensions allow to write nicer code like:
>
> struct data_t {
>     int key;
>     int other;
> };
>
> int search(data_t *scalar data , int scalar N) {
>     int L = lengthof(*data);
>     for (int i = 0; i < N/L; ++i) {
>         int x = data[i].key;
>         if (4 < x & x <= 8)
>             int block[L] result = [x, 0];
>         scalar {
>             for (int j = 0; j < log2(L); ++j)
>                 result |= whole_reg_shr(result , 1 << j);
>             return get(x, 0);
>         }
>     }
>     return -1;
> }
>
>
> This is based on just few simple ideas, explained in the paper 
> (they are interesting, but quoting here those parts of the 
> paper is not a good idea). Such ideas are not directly portable 
> to D (unless the front-end is changed. Their compiler works by 
> lowering, and emits regular C++ code with intrinsics).
>
>
> Near the end of the paper they also propose some C++ library 
> code:
>
>>the C++ template mechanism would allow to define a hybrid SoA 
>>container class: Similar to std::vector which abstracts a 
>>traditional C array, one could implement a wrapper around a T 
>>block[N]*:<
>
>
> // scalar context throughout this example
> struct vec3 { float x, y, z; };
> // vec3 block[N]* pointing to ceil(n/N) elements
> hsoa <vec3 > vecs(n);
> // preferred vector length of vec3 automatically derived
> static const int N = hsoa <vec3 >::vector_length;
> int i = /*...*/
> hsoa <vec3 >::block_index ii = /*...*/
> vec3 v = vecs[i]; // gather
> vecs[i] = v; // scatter
> vec3 block[N] w = vecs[ii]; // fetch whole block
> hsoa <vec3 >::ref r = vecs[i]; // get proxy to a scalar
> r = v; // pipe through proxy
> // for each element
> vecs.foreach([](vec3& scalar v) { /*...*/ });
>
>
> Regardless of the other ideas of their C-like language, a 
> similar struct should be added to Phobos once a bit higher 
> level SIMD support is in better shape in D. Supporting 
> Hybrid-SoA and few operations on it will be an important but 
> probably quite short and simple addition to Phobos collections 
> (it's essentially an struct that acts like an array, with few 
> simple extra operations).
>
> I think no commonly used language allows both very simple and 
> quite efficient SIMD programming (Scala, CUDA, C, C++, C#, 
> Java, Go, and currently Rust too, are not able to support SIMD 
> programming well. I think currently Haskell too is not 
> supporting it well, but Haskell is very flexible, and it's 
> compiled by a native compiler, so such things are maybe 
> possible to add). So supporting it well in D will be an 
> interesting selling point of D. (Supporting a very simple SIMD 
> coding in D will make D more widespread, but such kind of 
> programming will probably keep being a small niche).
>
> Bye,
> bearophile


Actually, I am yet to see any language that has SIMD as part of 
the language standard and not as an extension where each vendor 
does its own way.
October 24, 2012
Re: Very simple SIMD programming
Paulo Pinto:

> Actually, I am yet to see any language that has SIMD as part of 
> the language standard and not as an extension where each vendor 
> does its own way.

D is, or is going to be, one such language :-)

Bye,
bearophile
October 24, 2012
Re: Very simple SIMD programming
On 24 October 2012 15:39, Paulo Pinto <pjmlp@progtools.org> wrote:

> On Wednesday, 24 October 2012 at 02:41:53 UTC, bearophile wrote:
>
>> I have found a nice paper, "Extending a C-like Language for Portable SIMD
>> Programming", (2012), by Roland L., Sebastian Hack and Ingo Wald:
>>
>> http://www.cdl.uni-saarland.**de/projects/vecimp/vecimp_tr.**pdf<http://www.cdl.uni-saarland.de/projects/vecimp/vecimp_tr.pdf>
>>
>> SIMD programming is necessary in a system language, or in any language
>> that wants to use the modern CPUs well. So languages like C, C++, D (and
>> Mono-C#) support such wider registers.
>>
>> The authors of this paper have understood that it's also important to
>> make SIMD programming easy, almost as easy as scalar code, so most
>> programmers are able to write such kind of correct code.
>>
>> So this this paper presents ideas to better express SIMD semantics in a
>> C-like language. They introduce few new constructs in a large subset of C
>> language, with few ideas. The result coding patterns seem easy enough (they
>> are surely look simpler than most multi-core coding patterns I've seen,
>> including Cilk+).
>>
>>
>> They present a simple scalar program in C:
>>
>> struct data_t {
>>     int key;
>>     int other;
>> };
>>
>> int search(data_t* data , int N) {
>>     for (int i = 0; i < N; i++) {
>>         int x = data[i].key;
>>         if (4 < x & x <= 8) return x;
>>     }
>>     return -1;
>> }
>>
>>
>> Then they explain the three most common ways to represent an array of
>> structs, here a struct that contains 3 values:
>>
>> x0 y0 z0 x1 y1 z1 x2 y2 z2 x3 y3 z3 x4 y4 z4 x5 y5 z5 x6 y6 z6 x7 y7 z7
>> (a) Array of Structures (AoS)
>>
>> x0 x1 x2 x3 x4 x5 x6 x7   y0 y1 y2 y3 y4 y5 y6 y7   z0 z1 z2 z3 z4 z5 z6
>> z7
>> (b) Structure of Arrays (SoA)
>>
>> x0 x1 x2 x3 y0 y1 y2 y3 z0 z1 z2 z3 x4 x5 x6 x7 y4 y5 y6 y7 z4 z5 z6 z7
>> (c) Hybrid Structure of Arrays (Hybrid SoA)
>>
>> They explain how the (c) is the preferred pattern in SIMD programming.
>>
>>
>> Using the (c) data pattern they show how in C with (nice) SIMD intrinsics
>> you write vectorized code (a simd_data_t struct instance contains 8 int
>> values):
>>
>> struct simd_data_t {
>>     simd_int key;
>>     simd_int other;
>> };
>>
>> int search(simd_data_t* data , int N) {
>>     for (int i = 0; i < N/L; ++i) {
>>         simd_int x = load(data[i].key);
>>         simd_int cmp = simd_and(simd_lt(4, x),
>>         simd_le(x, 8));
>>         int mask = simd_to_mask(cmp);
>>         if (mask != 0) {
>>             simd_int result = simd_and(mask , x);
>>             for (int j = 0; j < log2(L); j++)
>>                 result = simd_or(result ,
>>                 whole_reg_shr(result , 1 << j));
>>                 return simd_extract(result , 0);
>>             }
>>         }
>>     return -1;
>> }
>>
>>
>> D should do become able to do this (that is not too much bad), or better.
>>
>>
>> Their C language extensions allow to write nicer code like:
>>
>> struct data_t {
>>     int key;
>>     int other;
>> };
>>
>> int search(data_t *scalar data , int scalar N) {
>>     int L = lengthof(*data);
>>     for (int i = 0; i < N/L; ++i) {
>>         int x = data[i].key;
>>         if (4 < x & x <= 8)
>>             int block[L] result = [x, 0];
>>         scalar {
>>             for (int j = 0; j < log2(L); ++j)
>>                 result |= whole_reg_shr(result , 1 << j);
>>             return get(x, 0);
>>         }
>>     }
>>     return -1;
>> }
>>
>>
>> This is based on just few simple ideas, explained in the paper (they are
>> interesting, but quoting here those parts of the paper is not a good idea).
>> Such ideas are not directly portable to D (unless the front-end is changed.
>> Their compiler works by lowering, and emits regular C++ code with
>> intrinsics).
>>
>>
>> Near the end of the paper they also propose some C++ library code:
>>
>>  the C++ template mechanism would allow to define a hybrid SoA container
>>> class: Similar to std::vector which abstracts a traditional C array, one
>>> could implement a wrapper around a T block[N]*:<
>>>
>>
>>
>> // scalar context throughout this example
>> struct vec3 { float x, y, z; };
>> // vec3 block[N]* pointing to ceil(n/N) elements
>> hsoa <vec3 > vecs(n);
>> // preferred vector length of vec3 automatically derived
>> static const int N = hsoa <vec3 >::vector_length;
>> int i = /*...*/
>> hsoa <vec3 >::block_index ii = /*...*/
>> vec3 v = vecs[i]; // gather
>> vecs[i] = v; // scatter
>> vec3 block[N] w = vecs[ii]; // fetch whole block
>> hsoa <vec3 >::ref r = vecs[i]; // get proxy to a scalar
>> r = v; // pipe through proxy
>> // for each element
>> vecs.foreach([](vec3& scalar v) { /*...*/ });
>>
>>
>> Regardless of the other ideas of their C-like language, a similar struct
>> should be added to Phobos once a bit higher level SIMD support is in better
>> shape in D. Supporting Hybrid-SoA and few operations on it will be an
>> important but probably quite short and simple addition to Phobos
>> collections (it's essentially an struct that acts like an array, with few
>> simple extra operations).
>>
>> I think no commonly used language allows both very simple and quite
>> efficient SIMD programming (Scala, CUDA, C, C++, C#, Java, Go, and
>> currently Rust too, are not able to support SIMD programming well. I think
>> currently Haskell too is not supporting it well, but Haskell is very
>> flexible, and it's compiled by a native compiler, so such things are maybe
>> possible to add). So supporting it well in D will be an interesting selling
>> point of D. (Supporting a very simple SIMD coding in D will make D more
>> widespread, but such kind of programming will probably keep being a small
>> niche).
>>
>> Bye,
>> bearophile
>>
>
>
> Actually, I am yet to see any language that has SIMD as part of the
> language standard and not as an extension where each vendor does its own
> way.
>

HLSL, GLSL, Cg? :)
I don't think it's possible considering that D is designed to plug on to
various backends.
D already has what's required to do some fairly nice (by comparison) simd
stuff with good supporting libraries.

One thing I can think of that would really improve simd (and not only simd)
would be a way to define compound operators.
If the library could detect/hook sequences of operations and implement them
more efficiently as a compound, that would make some very powerful
optimisations available.

Simple example:
 T opCompound(string seq)(T a, T b, T c) if(seq == "* +") { return
_madd(a, b, c); }
 T opCompound(string seq)(T a, T b, T c) if(seq == "+ *") { return
_madd(b, c, a); }
October 24, 2012
Re: Very simple SIMD programming
Manu:

> D already has what's required to do some fairly nice (by 
> comparison) simd stuff with good supporting libraries.

After reading that paper I am not sure you are right. See how 
their language manages masks by itself. This is from page 3:


// vector length of context = 1; current_mask = T
int block[4] v = <0,3,4,1>;
int block[4] w = 3; // <3,3,3,3> via broadcast
bool block[4] m = v < w; // <T,F,F,T>
++v; // <1,4,5,2>
if (m) {
    // vector length of context = 4; current_mask = m
    v += 2; // <3,4,5,4>
} else {
    // vector length of context = 4; current_mask = ~m
    v += 3; // <3,7,8,4>
}
// vector length of context = 1; current_mask = T


(The simple benchmarks of the paper show a 5-15% performance loss 
compared to handwritten SIMD code.)

Bye,
bearophile
October 24, 2012
Re: Very simple SIMD programming
On 24/10/12 11:33, Timon Gehr wrote:
> On 10/24/2012 11:24 AM, Don Clugston wrote:
>> On 24/10/12 04:41, bearophile wrote:
>>> I have found a nice paper, "Extending a C-like Language for Portable
>>> SIMD Programming", (2012), by Roland L., Sebastian Hack and Ingo Wald:
>>>
>>> http://www.cdl.uni-saarland.de/projects/vecimp/vecimp_tr.pdf
>>>
>>
>>> They present a simple scalar program in C:
>>>
>>> struct data_t {
>>>      int key;
>>>      int other;
>>> };
>>>
>>> int search(data_t* data , int N) {
>>>      for (int i = 0; i < N; i++) {
>>>          int x = data[i].key;
>>>          if (4 < x & x <= 8) return x;
>>>      }
>>>      return -1;
>>> }
>>
>> I don't know what that code does. I think the if statement is always
>> true.
>
> No, the code is fine.

Oh, you're right. It's crap code though.
October 24, 2012
Re: Very simple SIMD programming
On Wednesday, 24 October 2012 at 12:50:44 UTC, Manu wrote:
> On 24 October 2012 15:39, Paulo Pinto <pjmlp@progtools.org> 
> wrote:
>
>> On Wednesday, 24 October 2012 at 02:41:53 UTC, bearophile 
>> wrote:
>>
>>> I have found a nice paper, "Extending a C-like Language for 
>>> Portable SIMD
>>> Programming", (2012), by Roland L., Sebastian Hack and Ingo 
>>> Wald:
>>>
>>> http://www.cdl.uni-saarland.**de/projects/vecimp/vecimp_tr.**pdf<http://www.cdl.uni-saarland.de/projects/vecimp/vecimp_tr.pdf>
>>>
>>> SIMD programming is necessary in a system language, or in any 
>>> language
>>> that wants to use the modern CPUs well. So languages like C, 
>>> C++, D (and
>>> Mono-C#) support such wider registers.
>>>
>>> The authors of this paper have understood that it's also 
>>> important to
>>> make SIMD programming easy, almost as easy as scalar code, so 
>>> most
>>> programmers are able to write such kind of correct code.
>>>
>>> So this this paper presents ideas to better express SIMD 
>>> semantics in a
>>> C-like language. They introduce few new constructs in a large 
>>> subset of C
>>> language, with few ideas. The result coding patterns seem 
>>> easy enough (they
>>> are surely look simpler than most multi-core coding patterns 
>>> I've seen,
>>> including Cilk+).
>>>
>>>
>>> They present a simple scalar program in C:
>>>
>>> struct data_t {
>>>     int key;
>>>     int other;
>>> };
>>>
>>> int search(data_t* data , int N) {
>>>     for (int i = 0; i < N; i++) {
>>>         int x = data[i].key;
>>>         if (4 < x & x <= 8) return x;
>>>     }
>>>     return -1;
>>> }
>>>
>>>
>>> Then they explain the three most common ways to represent an 
>>> array of
>>> structs, here a struct that contains 3 values:
>>>
>>> x0 y0 z0 x1 y1 z1 x2 y2 z2 x3 y3 z3 x4 y4 z4 x5 y5 z5 x6 y6 
>>> z6 x7 y7 z7
>>> (a) Array of Structures (AoS)
>>>
>>> x0 x1 x2 x3 x4 x5 x6 x7   y0 y1 y2 y3 y4 y5 y6 y7   z0 z1 z2 
>>> z3 z4 z5 z6
>>> z7
>>> (b) Structure of Arrays (SoA)
>>>
>>> x0 x1 x2 x3 y0 y1 y2 y3 z0 z1 z2 z3 x4 x5 x6 x7 y4 y5 y6 y7 
>>> z4 z5 z6 z7
>>> (c) Hybrid Structure of Arrays (Hybrid SoA)
>>>
>>> They explain how the (c) is the preferred pattern in SIMD 
>>> programming.
>>>
>>>
>>> Using the (c) data pattern they show how in C with (nice) 
>>> SIMD intrinsics
>>> you write vectorized code (a simd_data_t struct instance 
>>> contains 8 int
>>> values):
>>>
>>> struct simd_data_t {
>>>     simd_int key;
>>>     simd_int other;
>>> };
>>>
>>> int search(simd_data_t* data , int N) {
>>>     for (int i = 0; i < N/L; ++i) {
>>>         simd_int x = load(data[i].key);
>>>         simd_int cmp = simd_and(simd_lt(4, x),
>>>         simd_le(x, 8));
>>>         int mask = simd_to_mask(cmp);
>>>         if (mask != 0) {
>>>             simd_int result = simd_and(mask , x);
>>>             for (int j = 0; j < log2(L); j++)
>>>                 result = simd_or(result ,
>>>                 whole_reg_shr(result , 1 << j));
>>>                 return simd_extract(result , 0);
>>>             }
>>>         }
>>>     return -1;
>>> }
>>>
>>>
>>> D should do become able to do this (that is not too much 
>>> bad), or better.
>>>
>>>
>>> Their C language extensions allow to write nicer code like:
>>>
>>> struct data_t {
>>>     int key;
>>>     int other;
>>> };
>>>
>>> int search(data_t *scalar data , int scalar N) {
>>>     int L = lengthof(*data);
>>>     for (int i = 0; i < N/L; ++i) {
>>>         int x = data[i].key;
>>>         if (4 < x & x <= 8)
>>>             int block[L] result = [x, 0];
>>>         scalar {
>>>             for (int j = 0; j < log2(L); ++j)
>>>                 result |= whole_reg_shr(result , 1 << j);
>>>             return get(x, 0);
>>>         }
>>>     }
>>>     return -1;
>>> }
>>>
>>>
>>> This is based on just few simple ideas, explained in the 
>>> paper (they are
>>> interesting, but quoting here those parts of the paper is not 
>>> a good idea).
>>> Such ideas are not directly portable to D (unless the 
>>> front-end is changed.
>>> Their compiler works by lowering, and emits regular C++ code 
>>> with
>>> intrinsics).
>>>
>>>
>>> Near the end of the paper they also propose some C++ library 
>>> code:
>>>
>>>  the C++ template mechanism would allow to define a hybrid 
>>> SoA container
>>>> class: Similar to std::vector which abstracts a traditional 
>>>> C array, one
>>>> could implement a wrapper around a T block[N]*:<
>>>>
>>>
>>>
>>> // scalar context throughout this example
>>> struct vec3 { float x, y, z; };
>>> // vec3 block[N]* pointing to ceil(n/N) elements
>>> hsoa <vec3 > vecs(n);
>>> // preferred vector length of vec3 automatically derived
>>> static const int N = hsoa <vec3 >::vector_length;
>>> int i = /*...*/
>>> hsoa <vec3 >::block_index ii = /*...*/
>>> vec3 v = vecs[i]; // gather
>>> vecs[i] = v; // scatter
>>> vec3 block[N] w = vecs[ii]; // fetch whole block
>>> hsoa <vec3 >::ref r = vecs[i]; // get proxy to a scalar
>>> r = v; // pipe through proxy
>>> // for each element
>>> vecs.foreach([](vec3& scalar v) { /*...*/ });
>>>
>>>
>>> Regardless of the other ideas of their C-like language, a 
>>> similar struct
>>> should be added to Phobos once a bit higher level SIMD 
>>> support is in better
>>> shape in D. Supporting Hybrid-SoA and few operations on it 
>>> will be an
>>> important but probably quite short and simple addition to 
>>> Phobos
>>> collections (it's essentially an struct that acts like an 
>>> array, with few
>>> simple extra operations).
>>>
>>> I think no commonly used language allows both very simple and 
>>> quite
>>> efficient SIMD programming (Scala, CUDA, C, C++, C#, Java, 
>>> Go, and
>>> currently Rust too, are not able to support SIMD programming 
>>> well. I think
>>> currently Haskell too is not supporting it well, but Haskell 
>>> is very
>>> flexible, and it's compiled by a native compiler, so such 
>>> things are maybe
>>> possible to add). So supporting it well in D will be an 
>>> interesting selling
>>> point of D. (Supporting a very simple SIMD coding in D will 
>>> make D more
>>> widespread, but such kind of programming will probably keep 
>>> being a small
>>> niche).
>>>
>>> Bye,
>>> bearophile
>>>
>>
>>
>> Actually, I am yet to see any language that has SIMD as part 
>> of the
>> language standard and not as an extension where each vendor 
>> does its own
>> way.
>>
>
> HLSL, GLSL, Cg? :)

I was thinking about general purpose programming languages, not 
domain specific ones.

--
Paulo
« First   ‹ Prev
1 2 3
Top | Discussion index | About this forum | D home