February 01, 2011 Support for gcc vector attributes, SIMD builtins

I built gdc from tip on Fedora 13 (x86-64) and started playing around with creating a vector struct (x,y,z,w) to see what kind of optimization the code generator did with it. It was able to partially drop into SSE registers and instructions, but not as well as I had hoped from writing "regular" D code.

I poked through the builtins that get pulled into d-builtins.c / d-builtins2.cc but I don't see anything that might be pulling in definitions such as __builtin_ia32_* for SSE, for example.

How hard would it be to get some sort of vector attribute attached to a type (or just plain introduce v4sf, __m128, or something like that) and get those SIMD builtins available?

For the curious, here are how they are defined in, for example, xmmintrin.h for gcc:

typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));

typedef float __v4sf __attribute__ ((__vector_size__ (16)));

extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_add_ps (__m128 __A, __m128 __B)
{
  return (__m128) __builtin_ia32_addps ((__v4sf)__A, (__v4sf)__B);
}

I'm game for making an attempt myself if someone can point me in the right direction. I'm a hardcore ray tracing / rendering guy, and performance is of the utmost importance. If I could write a ray tracer in D that matches my C++ tracer for performance, I'd be ecstatic.

-Mike
February 01, 2011 Re: Support for gcc vector attributes, SIMD builtins

Posted in reply to Mike Farnsworth

On 01.02.2011 09:10, Mike Farnsworth wrote:
> I built gdc from tip on Fedora 13 (x86-64) and started playing around
> with creating a vector struct (x,y,z,w) to see what kind of optimization
> the code generator did with it. It was able to partially drop into SSE
> registers and instructions, but not as well as I had hoped from writing
> "regular" D code.
>
> I poked through the builtins that get pulled into d-builtins.c /
> d-builtins2.cc but I don't see anything that might be pulling in
> definitions such as __builtin_ia32_* for SSE, for example.
>
> How hard would it be to get some sort of vector attribute attached to a
> type (or just plain introduce v4sf, __m128, or something like that) and
> get those SIMD builtins available?
>
> For the curious, here are how they are defined in, for example,
> xmmintrin.h for gcc:
>
> typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
>
> typedef float __v4sf __attribute__ ((__vector_size__ (16)));
>
> extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
> __artificial__))
> _mm_add_ps (__m128 __A, __m128 __B)
> {
> return (__m128) __builtin_ia32_addps ((__v4sf)__A, (__v4sf)__B);
> }
>
> I'm game for making an attempt myself if someone can point me in the
> right direction. I'm a hardcore ray tracing / rendering guy, and
> performance is of the utmost importance. If I could write a ray tracer
> in D that matches my C++ tracer for performance, I'd be ecstatic.
>
> -Mike

I'm not sure if that'll help at all, but you may try something like

alias float[4] vec4; // or whatever type you're using

/Maybe/ SSE optimizations work better on arrays than on structs.
Of course, such a type isn't as handy because it'll be vec4[0] instead of vec4.x, but it may be worth a try.

If it helps (i.e. SSE is used better) you could go on trying to put that vector in a struct, have x, y, z, w as properties[1] that get/set the corresponding fields in the array and overload operators so they work directly on the array.

Cheers,
- Daniel

[1] http://digitalmars.com/d/2.0/property.html at the bottom of the page. I guess this will cause little/no overhead because of inlining.
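A minimal sketch of the layout suggested above, written in C for GCC rather than D (an assumption, chosen so it can be compared against gcc's C output). The names vec4 and vec4_add are hypothetical and not from the thread. The four floats are stored as an array, and a C11 anonymous struct supplies the x/y/z/w names without any extra storage:

```c
/* Hypothetical illustration: vec4 and vec4_add are made-up names. */
typedef union {
    float f[4];                    /* array view, like "alias float[4] vec4" */
    struct { float x, y, z, w; };  /* named view over the same 16 bytes (C11) */
} vec4;

/* Element-wise add written as a plain loop over the array view.
   Whether the optimizer turns this into SSE code is up to the compiler. */
static vec4 vec4_add(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; ++i)
        r.f[i] = a.f[i] + b.f[i];
    return r;
}
```

With this layout, c.x and c.f[0] name the same float, so callers can keep the convenient member syntax while the arithmetic works on the array.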
February 01, 2011 Re: Support for gcc vector attributes, SIMD builtins

Posted in reply to Mike Farnsworth

== Quote from Mike Farnsworth (mike.farnsworth@gmail.com)'s article
> I built gdc from tip on Fedora 13 (x86-64) and started playing around
> with creating a vector struct (x,y,z,w) to see what kind of optimization
> the code generator did with it. It was able to partially drop into SSE
> registers and instructions, but not as well as I had hoped from writing
> "regular" D code.
> I poked through the builtins that get pulled into d-builtins.c /
> d-builtins2.cc but I don't see anything that might be pulling in
> definitions such as __builtin_ia32_* for SSE, for example.
> How hard would it be to get some sort of vector attribute attached to a
> type (or just plain introduce v4sf, __m128, or something like that) and
> get those SIMD builtins available?
> For the curious, here are how they are defined in, for example,
> xmmintrin.h for gcc:
> typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
> typedef float __v4sf __attribute__ ((__vector_size__ (16)));
> extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
> __artificial__))
> _mm_add_ps (__m128 __A, __m128 __B)
> {
> return (__m128) __builtin_ia32_addps ((__v4sf)__A, (__v4sf)__B);
> }
Although GDC hashes out GCC builtins and attributes, most of it is very much incomplete. For example, a D version (for GDC) of the code above would be something like:
import gcc.builtins;
pragma(set_attribute, __m128, vector_size(16), may_alias);
pragma(set_attribute, __v4sf, vector_size(16));
pragma(set_attribute, _mm_add_ps, always_inline, artificial);
typedef float __m128;
typedef float __v4sf;
__m128 _mm_add_ps (__m128 __A, __m128 __B)
{
return cast(__m128) __builtin_ia32_addps (cast(__v4sf)__A, cast(__v4sf)__B);
}
However, this doesn't work because
1) There is no 128bit float type in DMDFE (can be put in though, even if it is
just for internal use).
2) Vectors are not representable in DMDFE.
So __builtin_ia32_addps (and many other ia32 builtins) cannot be emitted to the D
environment.
Interestingly enough, this particular example actually ICEs the compiler. It appears that while *explicit* casting is done in the code, DMDFE actually *ignores* this, which is terrible on DMD's part...
Saying that, workaround is to use array types.
typedef float[4] __m128;
typedef float[4] __v4sf;
All the more reason to show you that pragma(attribute) is still very incomplete to
use. Any ideas to improve it are welcome though. :)
February 01, 2011 Re: Support for gcc vector attributes, SIMD builtins

Posted in reply to Iain Buclaw

Iain Buclaw Wrote:
> == Quote from Mike Farnsworth (mike.farnsworth@gmail.com)'s article
> > I built gdc from tip on Fedora 13 (x86-64) and started playing around
> > with creating a vector struct (x,y,z,w) to see what kind of optimization
> > the code generator did with it. It was able to partially drop into SSE
> > registers and instructions, but not as well as I had hoped from writing
> > "regular" D code.
> > I poked through the builtins that get pulled into d-builtins.c /
> > d-builtins2.cc but I don't see anything that might be pulling in
> > definitions such as __builtin_ia32_* for SSE, for example.
> > How hard would it be to get some sort of vector attribute attached to a
> > type (or just plain introduce v4sf, __m128, or something like that) and
> > get those SIMD builtins available?
>
> Saying that, workaround is to use array types.
> typedef float[4] __m128;
> typedef float[4] __v4sf;
>
> All the more reason to show you that pragma(attribute) is still very incomplete to
> use. Any ideas to improve it are welcome though. :)

The workaround actually looks like a cleaner way to define types for vector intrinsics. How hard would it be to export vector intrinsics so the API expects float[4], for example?
February 01, 2011 Re: Support for gcc vector attributes, SIMD builtins

Posted in reply to Daniel Gibson

Daniel Gibson Wrote:
> I'm not sure if that'll help at all, but you may try something like
> alias float[4] vec4; // or whatever type you're using
> /Maybe/ SSE optimizations work better on arrays than on structs.
> Of course, such a type isn't as handy because it'll be vec4[0] instead
> of vec4.x, but it may be worth a try..
> If it helps (i.e. SSE is used better) you could go on trying to put that
> vector in a struct, have x, y, z, w as properties[1] that get/set the
> corresponding fields in the array and overload operators so they work
> directly on the array.
I actually tried storing the data as a float[4] vs float x, y, z, w, and while it generates different code, neither one boiled down to the simpler SSE instructions I had hoped for (and generally get out of gcc with my C++ classes, especially if I use the SSE intrinsics in the *mmintrin.h headers). I poked at various bits of syntax to see if I could convince it, with no luck.
-Mike
February 01, 2011 Re: Support for gcc vector attributes, SIMD builtins

Posted in reply to Iain Buclaw

Iain Buclaw Wrote:
> == Quote from Mike Farnsworth (mike.farnsworth@gmail.com)'s article
> > I built gdc from tip on Fedora 13 (x86-64) and started playing around
> > with creating a vector struct (x,y,z,w) to see what kind of optimization
> > the code generator did with it. It was able to partially drop into SSE
> > registers and instructions, but not as well as I had hoped from writing
> > "regular" D code.
> > I poked through the builtins that get pulled into d-builtins.c /
> > d-builtins2.cc but I don't see anything that might be pulling in
> > definitions such as __builtin_ia32_* for SSE, for example.
> > How hard would it be to get some sort of vector attribute attached to a
> > type (or just plain introduce v4sf, __m128, or something like that) and
> > get those SIMD builtins available?
> > For the curious, here are how they are defined in, for example,
> > xmmintrin.h for gcc:
> > typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
> > typedef float __v4sf __attribute__ ((__vector_size__ (16)));
> > extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
> > __artificial__))
> > _mm_add_ps (__m128 __A, __m128 __B)
> > {
> > return (__m128) __builtin_ia32_addps ((__v4sf)__A, (__v4sf)__B);
> > }
>
> Although GDC hashes out GCC builtins and attributes, most of it is very much incomplete. For example, a D version (for GDC) of the code above would be something like:
>
> import gcc.builtins;
>
> pragma(set_attribute, __m128, vector_size(16), may_alias);
> pragma(set_attribute, __v4sf, vector_size(16));
> pragma(set_attribute, _mm_add_ps, always_inline, artificial);
>
> typedef float __m128;
> typedef float __v4sf;
>
> __m128 _mm_add_ps (__m128 __A, __m128 __B)
> {
> return cast(__m128) __builtin_ia32_addps (cast(__v4sf)__A, cast(__v4sf)__B);
> }
>
> However, this doesn't work because
>
> 1) There is no 128bit float type in DMDFE (can be put in though, even if it is
> just for internal use).
> 2) Vectors are not representable in DMDFE.
>
> So __builtin_ia32_addps (and many other ia32 builtins) cannot be emitted to the D
> environment.

I figured this would be the case; the "typedef float whatever __attribute((vector_size(16)))" stuff is already weird, so I don't expect dmdfe to do the right thing with even similar syntax at all.

> Interestingly enough, this particular example actually ICEs the compiler. It appears that while *explicit* casting is done in the code, DMDFE actually *ignores* this, which is terrible on DMD's part...

Hah. It's obvious dmdfe doesn't understand the builtin's signature correctly, so I'll hold off on a bug report until I can figure out what kind of signature that builtin had registered with dmdfe.

> Saying that, workaround is to use array types.
> typedef float[4] __m128;
> typedef float[4] __v4sf;
>
> All the more reason to show you that pragma(attribute) is still very incomplete to
> use. Any ideas to improve it are welcome though. :)

In my (not very abundant) spare time, I'll poke around the attribute stuff to see if I can attach the vector_size(16) attribute to a float[4] array type. I know the __builtin_ia32_addps function, for example, takes a v4sf (__m128 is just Intel's version that can change personalities at will; I feel no inclination to keep it around, and would instead go with more strictly defined types and cast intrinsics). If I can get that builtin to take a typedef'd float[4] without a cast, perhaps dmdfe will not drop any data and the codegen will happen properly.

Where do I look to see the attribute pragmas in gdc? Where do I look to potentially change the signature that dmdfe sees for the __builtin_ia32_* functions? If I can get a hand-coded signature to work, then we'll be in business.

-Mike
February 01, 2011 Re: Support for gcc vector attributes, SIMD builtins

Posted in reply to Jerry Quinn

== Quote from Jerry Quinn (jlquinn@optonline.net)'s article
> Iain Buclaw Wrote:
> > == Quote from Mike Farnsworth (mike.farnsworth@gmail.com)'s article
> > > I built gdc from tip on Fedora 13 (x86-64) and started playing around
> > > with creating a vector struct (x,y,z,w) to see what kind of optimization
> > > the code generator did with it. It was able to partially drop into SSE
> > > registers and instructions, but not as well as I had hoped from writing
> > > "regular" D code.
> > > I poked through the builtins that get pulled into d-builtins.c /
> > > d-builtins2.cc but I don't see anything that might be pulling in
> > > definitions such as __builtin_ia32_* for SSE, for example.
> > > How hard would it be to get some sort of vector attribute attached to a
> > > type (or just plain introduce v4sf, __m128, or something like that) and
> > > get those SIMD builtins available?
> >
> > Saying that, workaround is to use array types.
> > typedef float[4] __m128;
> > typedef float[4] __v4sf;
> >
> >
> > All the more reason to show you that pragma(attribute) is still very incomplete to
> > use. Any ideas to improve it are welcome though. :)
> The workaround actually looks like a cleaner way to define types for vector
> intrinsics. How hard would it be to export vector intrinsics so the API expects float[4], for example?
I haven't given much thought to what the internal representation could be, but I'd lean towards using unions in D code for usage in the language, as it's probably the most portable approach.
For example, one of the older 'hello vectors' I know of:
import std.c.stdio;
pragma(set_attribute, __v4sf, vector_size(16));
typedef float __v4sf;
union f4vector
{
__v4sf v;
float[4] f;
}
int main()
{
f4vector a, b, c;
a.f = [1, 2, 3, 4];
b.f = [5, 6, 7, 8];
c.v = a.v + b.v;
printf("%f, %f, %f, %f\n", c.f[0], c.f[1], c.f[2], c.f[3]);
return 0;
}
Compile: gdc -c -g -msse hellovector.d
Dump Object: objdump -dS hellovector.o
And the output of the SIMD operation speaks for itself:
c.v = a.v + b.v;
xorps %xmm1,%xmm1
movlps %gs:0x0,%xmm1
movhps %gs:0x8,%xmm1
xorps %xmm0,%xmm0
movlps %gs:0x0,%xmm0
movhps %gs:0x8,%xmm0
addps %xmm1,%xmm0
movlps %xmm0,%gs:0x0
movhps %xmm0,%gs:0x8
Regards.
Iain
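For comparison, roughly the same "hello vectors" can be sketched in C with the GCC vector extension that xmmintrin.h uses (quoted at the top of the thread). This is a sketch under assumptions, not the code Iain compiled: the names v4sf, f4vector and f4_add are illustrative, and the exact instructions emitted depend on the target and optimization level.

```c
/* GCC/Clang vector extension: four floats treated as one 16-byte value. */
typedef float v4sf __attribute__((vector_size(16)));

/* Same idea as the D union above: one view for SIMD math,
   one view for per-element access. */
typedef union {
    v4sf  v;
    float f[4];
} f4vector;

/* The "+" on vector types is element-wise; on SSE targets the
   compiler can typically lower it to a single addps. */
static f4vector f4_add(f4vector a, f4vector b)
{
    f4vector c;
    c.v = a.v + b.v;
    return c;
}
```

Filling a.f with 1..4 and b.f with 5..8, as in the D program, gives 6, 8, 10, 12 in c.f.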
February 01, 2011 Re: Support for gcc vector attributes, SIMD builtins

Posted in reply to Mike Farnsworth

== Quote from Mike Farnsworth (mike.farnsworth@gmail.com)'s article
> Iain Buclaw Wrote:
> > Interestingly enough, this particular example actually ICEs the compiler. It appears that while *explicit* casting is done in the code, DMDFE actually *ignores* this, which is terrible on DMD's part...
> Hah. It's obvious dmdfe doesn't understand the builtin's signature correctly, so I'll hold off on a bug report until I can figure out what kind of signature that builtin had registered with dmdfe.

Actually, it appears to be much simpler than that.

IntA a;
IntB b;
a = cast(IntA)b;

Although explicit casts are required to avoid errors, somewhere in the semantic stage (I presume) the frontend decides that no codegen is required to perform the cast, and so omits it. Where this leaves GDC (I think) is that the backend is told to perform a convert/move where the to and from register types are different (due to attributes applied to the type), which triggers an assert.

It's nothing too much to worry about, but maybe raise a bug (to remind me to look at it in better depth sometime).

> > Saying that, workaround is to use array types.
> > typedef float[4] __m128;
> > typedef float[4] __v4sf;
> >
> > All the more reason to show you that pragma(attribute) is still very incomplete to
> > use. Any ideas to improve it are welcome though. :)
> In my (not very abundant) spare time, I'll poke around the attribute stuff to see if I can attach the vector_size(16) attribute to a float[4] array type. I know the __builtin_ia32_addps function, for example, takes a v4sf (__m128 is just Intel's version that can change personalities at will; I feel no inclination to keep it around, and would instead go with more strictly defined types and cast intrinsics). If I can get that builtin to take a typedef'd float[4] without a cast, perhaps dmdfe will not drop any data and the codegen will happen properly.
> Where do I look to see the attribute pragmas in gdc? Where do I look to potentially change the signature that dmdfe sees for the __builtin_ia32_* functions? If I can get a hand-coded signature to work, then we'll be in business.
> -Mike

In d-builtins2.cc:

// Entry point for import gcc.builtins, builds GCC builtins in DMD AST on the fly.
d_gcc_magic_builtins_module()

// Convert GCC type to DMD type, builds functions as well as normal D types.
gcc_type_to_d_type()

In d-bi-attrs*.h are all handlers for supported attributes (you needn't touch them, as they are all copied from c-common.c).

Regards
Iain
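As a loose analogy in C (not the DMDFE code path itself), GCC also treats a cast between two same-sized vector types as a cast that needs no instructions: the bits are reinterpreted and only the type changes. The sketch below contrasts that with a real value conversion. The names v4si, bits_of and to_ints are hypothetical.

```c
#include <stdint.h>

typedef float   v4sf __attribute__((vector_size(16)));
typedef int32_t v4si __attribute__((vector_size(16)));

/* Cast between same-sized vector types: the 128 bits are kept as-is
   and merely viewed as four 32-bit integers. No conversion happens. */
static v4si bits_of(v4sf x)
{
    return (v4si)x;
}

/* For contrast, an actual float -> int conversion, element by element.
   Subscripting a vector is a GCC/Clang extension. */
static v4si to_ints(v4sf x)
{
    v4si r;
    for (int i = 0; i < 4; ++i)
        r[i] = (int32_t)x[i];
    return r;
}
```

On an IEEE-754 target, bits_of applied to 1.0f yields the pattern 0x3f800000 in that lane, while to_ints yields 1. The point of the comparison is that a "free" cast still changes the type the backend sees, which is where the mismatch described above would come from.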
February 01, 2011 Re: Support for gcc vector attributes, SIMD builtins

Posted in reply to Iain Buclaw

Iain Buclaw Wrote:
> == Quote from Jerry Quinn (jlquinn@optonline.net)'s article
> > Iain Buclaw Wrote:
> > > == Quote from Mike Farnsworth (mike.farnsworth@gmail.com)'s article
> > > > I built gdc from tip on Fedora 13 (x86-64) and started playing around
> > > > with creating a vector struct (x,y,z,w) to see what kind of optimization
> > > > the code generator did with it. It was able to partially drop into SSE
> > > > registers and instructions, but not as well as I had hoped from writing
> > > > "regular" D code.
> > > > I poked through the builtins that get pulled into d-builtins.c /
> > > > d-builtins2.cc but I don't see anything that might be pulling in
> > > > definitions such as __builtin_ia32_* for SSE, for example.
> > > > How hard would it be to get some sort of vector attribute attached to a
> > > > type (or just plain introduce v4sf, __m128, or something like that) and
> > > > get those SIMD builtins available?
> > >
> > > Saying that, workaround is to use array types.
> > > typedef float[4] __m128;
> > > typedef float[4] __v4sf;
> > >
> > >
> > > All the more reason to show you that pragma(attribute) is still very incomplete to
> > > use. Any ideas to improve it are welcome though. :)
> > The workaround actually looks like a cleaner way to define types for vector
> > intrinsics. How hard would it be to export vector intrinsics so the API expects float[4], for example?
>
> I haven't given much thought to what the internal representation could be, but I'd lean towards using unions in D code for usage in the language, as it's probably the most portable approach.
>
> For example, one of the older 'hello vectors' I know of:
>
> import std.c.stdio;
>
> pragma(set_attribute, __v4sf, vector_size(16));
> typedef float __v4sf;
>
> union f4vector
> {
> __v4sf v;
> float[4] f;
> }
>
> int main()
> {
> f4vector a, b, c;
>
> a.f = [1, 2, 3, 4];
> b.f = [5, 6, 7, 8];
>
> c.v = a.v + b.v;
> printf("%f, %f, %f, %f\n", c.f[0], c.f[1], c.f[2], c.f[3]);
>
> return 0;
> }
>
>
> Compile: gdc -c -g -msse hellovector.d
> Dump Object: objdump -dS hellovector.o
>
> And the output of the SIMD operation speaks for itself:
>
> c.v = a.v + b.v;
> xorps %xmm1,%xmm1
> movlps %gs:0x0,%xmm1
> movhps %gs:0x8,%xmm1
> xorps %xmm0,%xmm0
> movlps %gs:0x0,%xmm0
> movhps %gs:0x8,%xmm0
> addps %xmm1,%xmm0
> movlps %xmm0,%gs:0x0
> movhps %xmm0,%gs:0x8
>
>
> Regards.
> Iain
Huh, that's actually pretty promising. Hooray for gcc's vector ops. =)
I suppose I should still try to beat up on the __builtin_ia32_* stuff to make sure that can work, but if the codegen already gets us that far then that's pretty good. With a little -O3 it might even clean up some of the extraneous stuff, especially with a sequence of vector operations. The intrinsics will then get us some of the more interesting things like movemasks, shuffles, vector compares, etc.
As long as the union doesn't cause a bunch of load/store deadweight in the generated code, this might work nicely. However, I'll bet dmdfe doesn't understand that __v4sf isn't really just a float, so at some point that will need to be fixed so that there is no accidental slicing, no invalid array/structure sizes, etc.
-Mike
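The size concern above can be made concrete with a small C sketch (an illustration with hypothetical names, not GDC code). In GCC's C frontend the vector type is 16 bytes wide, so a frontend that believes it is "just a float" would compute sizes that are off by a factor of four:

```c
#include <stddef.h>

typedef float v4sf __attribute__((vector_size(16)));

typedef union {
    v4sf  v;
    float f[4];
} f4vector;

/* Compile-time checks (C11): the vector is 16 bytes, not sizeof(float),
   and wrapping it in the union adds no padding. */
_Static_assert(sizeof(v4sf) == 16, "v4sf occupies 16 bytes");
_Static_assert(sizeof(v4sf) != sizeof(float), "v4sf is not a plain float");
_Static_assert(sizeof(f4vector) == sizeof(v4sf), "union is the same size");

/* Alignment is target-dependent, so it is reported rather than asserted. */
static size_t v4sf_alignment(void)
{
    return _Alignof(v4sf);
}
```

If the D frontend sized __v4sf as 4 bytes, a struct or array containing it would be laid out too small for the backend's 16-byte type, which is the slicing risk mentioned above.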
February 06, 2011 Re: Support for gcc vector attributes, SIMD builtins

Posted in reply to Iain Buclaw

On 02/01/2011 10:38 AM, Iain Buclaw wrote:
> I haven't given much thought to what the internal representation could be, but I'd lean towards using unions in D code for usage in the language, as it's probably the most portable approach.
>
> For example, one of the older 'hello vectors' I know of:
>
> import std.c.stdio;
>
> pragma(set_attribute, __v4sf, vector_size(16));
> typedef float __v4sf;
>
> union f4vector
> {
> __v4sf v;
> float[4] f;
> }
>
> int main()
> {
> f4vector a, b, c;
>
> a.f = [1, 2, 3, 4];
> b.f = [5, 6, 7, 8];
>
> c.v = a.v + b.v;
> printf("%f, %f, %f, %f\n", c.f[0], c.f[1], c.f[2], c.f[3]);
>
> return 0;
> }

I've been giving this a serious try, and while the above works, I can't get any __builtin_... functions to actually work. I've added support for the VECTOR_TYPE tree code in the gcc_type_to_d_type(tree) function (in d-builtins2.cc):

case VECTOR_TYPE:
{
    tree baseType = TREE_TYPE(t);
    d = gcc_type_to_d_type(baseType, printstuff);
    if (d)
        return d;
    break;
}

This allows it to succeed in interpreting the SSE-related builtins in gcc_type_to_d_type(tree). Note that all it does is grab the base vector element type and convert that to a D type so as not to confuse the frontend; this way it matches the typedef for __v4sf, so as long as we use the union we won't lose data before we can pass it to a builtin.

I've verified (with a bunch of verbatim(...) calls) that the compiler *is* pushing function declarations of things like __builtin_ia32_addps, but I cannot for the life of me get my actual D code to see any of those functions:

========
pragma(set_attribute, __v4sf, vector_size(16));
typedef float __v4sf;

union v4f
{
    __v4sf v;
    float[4] f;
}

import gcc.builtins;

pragma(set_attribute, _mm_add_ps, always_inline, artificial);

__v4sf _mm_add_ps(__v4sf __A, __v4sf __B)
{
    return __builtin_ia32_addps(__A, __B);
}
========

And I get:

../../Vectors.d:24: Error: undefined identifier __builtin_ia32_addps

If I explicitly prefix the call as gcc.builtins.__builtin_ia32_addps(__A, __B) I get:

../../Vectors.d:24: Error: undefined identifier module builtins.__builtin_ia32_addps

Which doesn't make a whole lot of sense. I thought there might be something wrong with recognizing the argument types, so I tried the __isnanf and __isnan builtins as well, and got the same failures. I don't think any of the builtins besides the alias declarations are working, honestly. (__builtin_Clong, __builtin_Culong, etc. do work, but those are the only things from gcc.builtins that I can access without errors.)

Any hints?

-Mike
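For reference, this is the shape of the call the D wrapper above is trying to reach, sketched in C. It is an illustration under assumptions: the wrapper name add_ps is hypothetical, the builtin is only used when compiling with GCC for an SSE-enabled x86 target, and any other compiler or target falls back to the generic vector "+" so the sketch stays portable.

```c
typedef float v4sf __attribute__((vector_size(16)));

/* Takes and returns v4sf directly, which is the signature the
   D frontend would need to see for the builtin to be usable. */
static v4sf add_ps(v4sf a, v4sf b)
{
#if defined(__SSE__) && defined(__GNUC__) && !defined(__clang__)
    return __builtin_ia32_addps(a, b);  /* GCC x86 builtin named in the thread */
#else
    return a + b;                       /* generic vector extension fallback */
#endif
}
```

Both branches compute the same element-wise sum, so the builtin mainly matters for operations that have no plain operator form, such as the shuffles and movemasks mentioned earlier in the thread.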