February 06, 2011
On 2011-02-06 07:24, Mike Farnsworth wrote:
> On 02/01/2011 10:38 AM, Iain Buclaw wrote:
>> I haven't given it much thought on how internal representation could be, but I'd
>> lean on using unions in D code for usage in the language. As its probably most
>> portable.
>>
>> For example, one of the older 'hello vectors' I know of:
>>
>> import std.c.stdio;
>>
>> pragma(set_attribute, __v4sf, vector_size(16));
>> typedef float __v4sf;
>>
>> union f4vector
>> {
>>      __v4sf v;
>>      float[4] f;
>> }
>>
>> int main()
>> {
>>      f4vector a, b, c;
>>
>>      a.f = [1, 2, 3, 4];
>>      b.f = [5, 6, 7, 8];
>>
>>      c.v = a.v + b.v;
>>      printf("%f, %f, %f, %f\n", c.f[0], c.f[1], c.f[2], c.f[3]);
>>
>>      return 0;
>> }
>
> I've been giving this a serious try, and while the above works, I can't
> get any __builtin_... functions to actually work.  I've added support
> for the VECTOR_TYPE tree code in gcc_type_to_d_type(tree) function (in
> d_builtins2.cc):
>
>          case VECTOR_TYPE:
>          {
>              tree baseType = TREE_TYPE(t);
>              d = gcc_type_to_d_type(baseType, printstuff);
>              if (d)
>                  return d;
>              break;
>          }
>
> This allows it to succeed in interpreting the SSE-related builtins in
> gcc_type_to_d_type(tree).  Note that all it does is grab the base vector
> element type and convert that to a D type so as not to confuse the
> frontend; this way it matches the typedef for __v4sf, so as long as we
> use the union we won't lose data before we can pass it to a builtin.
>
> I've verified (with a bunch of verbatim(...) calls) that the compiler
> *is* pushing function declarations of things like __builtin_ia32_addps,
> but I cannot for the life of me get my actual D code to see any of those
> functions:
>
> ========
> pragma(set_attribute, __v4sf, vector_size(16));
> typedef float __v4sf;
>
> union v4f
> {
>      __v4sf v;
>      float[4] f;
> }
>
> import gcc.builtins;
>
> pragma(set_attribute, _mm_add_ps, always_inline, artificial);
>
> __v4sf _mm_add_ps(__v4sf __A, __v4sf __B)
> {
>      return __builtin_ia32_addps(__A, __B);
> }
> ========
>
> And I get:
> ../../Vectors.d:24: Error: undefined identifier __builtin_ia32_addps
>
> If I explicitly prefix the call as
> gcc.builtins.__builtin_ia32_addps(__A, __B)
>
> I get:
> ../../Vectors.d:24: Error: undefined identifier module
> builtins.__builtin_ia32_addps
>
> Which doesn't make a whole lot of sense.
>
> I thought there might be something wrong recognizing the argument types,
> so I tried __isnanf and __isnan builtins as well, and...same failures.
> I don't think any of the builtins besides the alias declarations are
> working, honestly.  (__builtin_Clong, __builtin_Culong, etc do work, but
> that's the only thing from gcc.builtins that I can access without errors).
>
> Any hints?
>
> -Mike

Don't know if it has anything to do with it, but you should wrap any non-standard pragmas in a version block.
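
A minimal sketch of such a guard, assuming the predefined version identifier `GNU` that GDC sets. The names `haveGccExtensions` and `selectedBackend` are hypothetical and only illustrate where compiler-specific declarations could live:

```d
// Hypothetical sketch: keep vendor-specific declarations behind version (GNU)
// so that other compilers never have to parse them.
version (GNU)
{
    // GDC-only declarations (vendor pragmas, gcc.builtins imports) would go here.
    enum bool haveGccExtensions = true;
}
else
{
    // Portable path for DMD, LDC, and any other compiler.
    enum bool haveGccExtensions = false;
}

// Reports which branch was selected at compile time.
string selectedBackend()
{
    return haveGccExtensions ? "gdc-specific" : "portable";
}
```

With this layout the rest of the module can test `haveGccExtensions` in a `static if` instead of repeating the version check.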

-- 
/Jacob Carlborg
February 06, 2011
== Quote from Mike Farnsworth (mike.farnsworth@gmail.com)'s article
> On 02/01/2011 10:38 AM, Iain Buclaw wrote:
> > I haven't given it much thought on how internal representation could be, but I'd lean on using unions in D code for usage in the language. As its probably most portable.
> >
> > For example, one of the older 'hello vectors' I know of:
> >
> > import std.c.stdio;
> >
> > pragma(set_attribute, __v4sf, vector_size(16));
> > typedef float __v4sf;
> >
> > union f4vector
> > {
> >     __v4sf v;
> >     float[4] f;
> > }
> >
> > int main()
> > {
> >     f4vector a, b, c;
> >
> >     a.f = [1, 2, 3, 4];
> >     b.f = [5, 6, 7, 8];
> >
> >     c.v = a.v + b.v;
> >     printf("%f, %f, %f, %f\n", c.f[0], c.f[1], c.f[2], c.f[3]);
> >
> >     return 0;
> > }
> I've been giving this a serious try, and while the above works, I can't
> get any __builtin_... functions to actually work.  I've added support
> for the VECTOR_TYPE tree code in gcc_type_to_d_type(tree) function (in
> d_builtins2.cc):
>         case VECTOR_TYPE:
>         {
>             tree baseType = TREE_TYPE(t);
>             d = gcc_type_to_d_type(baseType, printstuff);
>             if (d)
>                 return d;
>             break;
>         }
> This allows it to succeed in interpreting the SSE-related builtins in
> gcc_type_to_d_type(tree).  Note that all it does is grab the base vector
> element type and convert that to a D type so as not to confuse the
> frontend; this way it matches the typedef for __v4sf, so as long as we
> use the union we won't lose data before we can pass it to a builtin.

Try:
        case VECTOR_TYPE:
        {
            tree basetype = TYPE_DEBUG_REPRESENTATION_TYPE(t);
            assert(TREE_CODE(basetype) == RECORD_TYPE);
            basetype = TREE_TYPE(TYPE_FIELDS(basetype));
            d = gcc_type_to_d_type(basetype);
            if (d)
            {
                d->ctype = t;
                return d;
            }
            break;
        }

That makes them static arrays, so you don't need a wacky union to use vector functions.

  float[4] a = [1,2,3,4], b = [5,6,7,8], c;
  c = __builtin_ia32_addps(a,b);



Secondly, __builtin_ia32_addps requires SSE to be enabled. Compile with -msse.


Regards
February 06, 2011
== Quote from Iain Buclaw (ibuclaw@ubuntu.com)'s article
> == Quote from Mike Farnsworth (mike.farnsworth@gmail.com)'s article
> > On 02/01/2011 10:38 AM, Iain Buclaw wrote:
> > > I haven't given it much thought on how internal representation could be, but I'd lean on using unions in D code for usage in the language. As its probably most portable.
> > >
> > > For example, one of the older 'hello vectors' I know of:
> > >
> > > import std.c.stdio;
> > >
> > > pragma(set_attribute, __v4sf, vector_size(16));
> > > typedef float __v4sf;
> > >
> > > union f4vector
> > > {
> > >     __v4sf v;
> > >     float[4] f;
> > > }
> > >
> > > int main()
> > > {
> > >     f4vector a, b, c;
> > >
> > >     a.f = [1, 2, 3, 4];
> > >     b.f = [5, 6, 7, 8];
> > >
> > >     c.v = a.v + b.v;
> > >     printf("%f, %f, %f, %f\n", c.f[0], c.f[1], c.f[2], c.f[3]);
> > >
> > >     return 0;
> > > }
> > I've been giving this a serious try, and while the above works, I can't
> > get any __builtin_... functions to actually work.  I've added support
> > for the VECTOR_TYPE tree code in gcc_type_to_d_type(tree) function (in
> > d_builtins2.cc):
> >         case VECTOR_TYPE:
> >         {
> >             tree baseType = TREE_TYPE(t);
> >             d = gcc_type_to_d_type(baseType, printstuff);
> >             if (d)
> >                 return d;
> >             break;
> >         }
> > This allows it to succeed in interpreting the SSE-related builtins in
> > gcc_type_to_d_type(tree).  Note that all it does is grab the base vector
> > element type and convert that to a D type so as not to confuse the
> > frontend; this way it matches the typedef for __v4sf, so as long as we
> > use the union we won't lose data before we can pass it to a builtin.
> Try:
>         case VECTOR_TYPE:
>         {
>             tree basetype = TYPE_DEBUG_REPRESENTATION_TYPE(t);
>             assert(TREE_CODE(basetype) == RECORD_TYPE);
>             basetype = TREE_TYPE(TYPE_FIELDS(basetype));
>             d = gcc_type_to_d_type(basetype);
>             if (d)
>             {
>                 d->ctype = t;
>                 return d;
>             }
>             break;
>         }
> That makes them static arrays, so you needn't require a whacky union to use vector
> functions.

A better way actually:

        case VECTOR_TYPE:
        {
            d = gcc_type_to_d_type(TREE_TYPE(t));
            if (d)
            {
                d = new TypeSArray(d,
                        new IntegerExp(0, TYPE_VECTOR_SUBPARTS(t),
                            Type::tindex));
                d->ctype = t;
                return d;
            }
            break;
        }

Happy hacking! :)

Regards
Iain
February 06, 2011
On 2/6/2011 4:15 AM, Iain Buclaw wrote:
> == Quote from Mike Farnsworth (mike.farnsworth@gmail.com)'s article
>> On 02/01/2011 10:38 AM, Iain Buclaw wrote:
>>> I haven't given it much thought on how internal representation could be, but I'd lean on using unions in D code for usage in the language. As its probably most portable.
>>>
>>> For example, one of the older 'hello vectors' I know of:
>>>
>>> import std.c.stdio;
>>>
>>> pragma(set_attribute, __v4sf, vector_size(16));
>>> typedef float __v4sf;
>>>
>>> union f4vector
>>> {
>>>     __v4sf v;
>>>     float[4] f;
>>> }
>>>
>>> int main()
>>> {
>>>     f4vector a, b, c;
>>>
>>>     a.f = [1, 2, 3, 4];
>>>     b.f = [5, 6, 7, 8];
>>>
>>>     c.v = a.v + b.v;
>>>     printf("%f, %f, %f, %f\n", c.f[0], c.f[1], c.f[2], c.f[3]);
>>>
>>>     return 0;
>>> }
>> I've been giving this a serious try, and while the above works, I can't
>> get any __builtin_... functions to actually work.  I've added support
>> for the VECTOR_TYPE tree code in gcc_type_to_d_type(tree) function (in
>> d_builtins2.cc):
>>         case VECTOR_TYPE:
>>         {
>>             tree baseType = TREE_TYPE(t);
>>             d = gcc_type_to_d_type(baseType, printstuff);
>>             if (d)
>>                 return d;
>>             break;
>>         }
>> This allows it to succeed in interpreting the SSE-related builtins in
>> gcc_type_to_d_type(tree).  Note that all it does is grab the base vector
>> element type and convert that to a D type so as not to confuse the
>> frontend; this way it matches the typedef for __v4sf, so as long as we
>> use the union we won't lose data before we can pass it to a builtin.
> 
> Try:
>         case VECTOR_TYPE:
>         {
>             tree basetype = TYPE_DEBUG_REPRESENTATION_TYPE(t);
>             assert(TREE_CODE(basetype) == RECORD_TYPE);
>             basetype = TREE_TYPE(TYPE_FIELDS(basetype));
>             d = gcc_type_to_d_type(basetype);
>             if (d)
>             {
>                 d->ctype = t;
>                 return d;
>             }
>             break;
>         }
> 
> That makes them static arrays, so you needn't require a whacky union to use vector functions.
> 
>   float[4] a = [1,2,3,4], b = [5,6,7,8], c;
>   c = __builtin_ia32_addps(a,b);
> 
> 
> 
> Secondly, __builtin_ia32_addps requires SSE turned on. Compile with -msse
> 
> 
> Regards

I'd be happy to have gcc finding vectorization opportunities, but there's no need to add this sort of thing to the language.  Array operations already provide a hook that calls a library function:

float[4] a = [1,2,3,4], b = [5,6,7,8], c;
c[] = a[] + b[];
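
As a small self-contained sketch of the same idea, the array operation can be wrapped in a function. The name `addVec4` is hypothetical, and whether the expression ends up as a library call, an inline loop, or SSE instructions depends on the compiler and flags, so nothing here should be read as a performance claim:

```d
// Element-wise addition of two fixed-size float arrays using D array operations.
// addVec4 is a hypothetical helper name used only for this illustration.
float[4] addVec4(float[4] a, float[4] b)
{
    float[4] c;
    // Array-wise add: each c[i] becomes a[i] + b[i].
    c[] = a[] + b[];
    return c;
}
```

Because this uses only core language features, it should build unchanged with DMD, GDC, and LDC.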


February 06, 2011
== Quote from Brad Roberts (braddr@puremagic.com)'s article
> I'd be happy to have gcc finding vectorization opportunities, but there's no
> need to add this sort of thing to the
> language.  This already has a hook to call a library function:
> float[4] a = [1,2,3,4], b = [5,6,7,8], c;
> c[] = a[] + b[];

Aye, and 9 times out of 10 I would agree with this thinking also.

The advantage of exposing GCC vector intrinsics to the D frontend, though, is that the GCC backend has much more control over the codegen: it can inline and optimise the intrinsics far better than it can optimise away the overhead of an external library call.

Bearing in mind that DMD's array libraries are already extremely performant anyway, I honestly don't see the harm if it makes the speed freaks happy.

Regards
February 06, 2011
On 02/06/2011 02:58 PM, Iain Buclaw wrote:
> == Quote from Brad Roberts (braddr@puremagic.com)'s article
>> I'd be happy to have gcc finding vectorization opportunities, but there's no
>> need to add this sort of thing to the
>> language.  This already has a hook to call a library function:
>> float[4] a = [1,2,3,4], b = [5,6,7,8], c;
>> c[] = a[] + b[];
> 
> Aye, and 9 times out of 10 I would agree with this thinking also.
> 
> The pros to hashing out GCC Vector intrinsics to the D frontend though are that the GCC backend has much more creative control over the codegen. Inlining and optimising the intrinsics in a far better way than optimising the overhead of an external library call.
> 
> Baring in mind that DMD's array libraries are already extremely performant anyway, I honestly don't see the harm if it makes the poignant speed freaks happy.
> 
> Regards

Yes, and I am definitely a "speed freak", but I have good reason: in my field, 3D rendering performance is extremely important.  If I can write a few classes in D that can get me to very optimized SSE code on x86(-64) for most vector/point/color operations, for example, that can make or break my ability to get anyone to use my renderer *at all*.

In D I can easily do the version(GNU) thing to make the program 100% cross-platform for the cases where I don't have the intrinsics.  I would love it if the array-wise operations were able to automatically just boil down to the intrinsics, but in order to make it fast enough they must always be 16-byte aligned, pass float[4] by SSE register where possible, etc.  Someday, the compiler hopefully will just do that, but it doesn't always do it today (or really at all, in my tests of just the float[4] static arrays).
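
A rough sketch of the kind of wrapper described above, assuming a hypothetical `Vec4` type with 16-byte alignment requested through `align(16)`. The body uses only the portable array operation; a GDC-specific branch calling a vector builtin is not shown, since the builtins are not yet reachable from D code in this thread:

```d
// Hypothetical Vec4 wrapper: requests 16-byte alignment and provides a portable add.
align(16) struct Vec4
{
    float[4] data;

    // Element-wise addition. A version (GNU) branch calling a vector builtin
    // could replace the array operation on GDC; this sketch stays portable.
    Vec4 opBinary(string op)(Vec4 rhs) const
        if (op == "+")
    {
        Vec4 result;
        result.data[] = data[] + rhs.data[];
        return result;
    }
}
```

Keeping the storage and the operator inside one type means the compiler-specific path can later be swapped in without touching calling code.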

-Mike
February 06, 2011
On 2/6/2011 2:58 PM, Iain Buclaw wrote:
> == Quote from Brad Roberts (braddr@puremagic.com)'s article
>> I'd be happy to have gcc finding vectorization opportunities, but there's no
>> need to add this sort of thing to the
>> language.  This already has a hook to call a library function:
>> float[4] a = [1,2,3,4], b = [5,6,7,8], c;
>> c[] = a[] + b[];
> 
> Aye, and 9 times out of 10 I would agree with this thinking also.
> 
> The pros to hashing out GCC Vector intrinsics to the D frontend though are that the GCC backend has much more creative control over the codegen. Inlining and optimising the intrinsics in a far better way than optimising the overhead of an external library call.
> 
> Baring in mind that DMD's array libraries are already extremely performant anyway, I honestly don't see the harm if it makes the poignant speed freaks happy.
> 
> Regards

The harm that I'd like to minimize (preferably avoid) is compiler specific language changes.  When GDC or LDC or DMD add little things to the language that aren't supported by all three, the choice of compilers used to build a chunk of code is reduced.  The result is a fragmentation of the language.

I agree that gcc's inliner and optimizers are way better than dmd's when it comes to vectors (among other things), and would love to see those brought to bear.

So, imho, try not to go any higher than the glue layer.

Later,
Brad