[Issue 16489] [backend][optimization][registers] DMD is 10-20 times slower for GLAS
September 26, 2016
https://issues.dlang.org/show_bug.cgi?id=16489

Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla@digitalmars.com

--- Comment #1 from Walter Bright <bugzilla@digitalmars.com> ---
Could you post a short example, please?

--
September 27, 2016
https://issues.dlang.org/show_bug.cgi?id=16489

--- Comment #2 from Илья Ярошенко <ilyayaroshenko@gmail.com> ---
size_t length; // > 0
__vector(float[4])[2]* a; //aligned
float[6]* b;

__vector(float[4])[2][6] reg; // should be located in the registers // init reg = 0;

__vector(float[4])[2] ai = void;
__vector(float[4])[6] bi = void;


do {
   ai[0] = a[0][0]; // should be located in the registers
   ai[1] = a[0][1]; // should be located in the registers

   foreach(i; AliasSeq!(0, 1, 2, 3, 4, 5))
   {
      bi[i] = b[0][i]; // Issue 16488, // should be located in the registers
      reg[i][0] += ai[0] * bi[i];
      reg[i][1] += ai[1] * bi[i];
   }

   a++;
   b++;
} while(--length);

--
September 27, 2016
https://issues.dlang.org/show_bug.cgi?id=16489

Илья Ярошенко <ilyayaroshenko@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |SIMD

--
September 27, 2016
https://issues.dlang.org/show_bug.cgi?id=16489

--- Comment #3 from Walter Bright <bugzilla@digitalmars.com> ---
Ok, I understand. This is the 'slicing' optimization where an aggregate can be sliced up and stored in multiple registers. I went over it with deadalnix a while ago, as it was identified as a key optimization. It applies more generally than just for SIMD.

I also worked out a scheme for implementing it in the DMD back end. I don't think it is that hard, unless I've misunderstood it. The slicing can be done if:

1. all accesses lie within slices (not across slice boundaries)
2. a pointer to the aggregate is not taken (because then you lose control of condition 1).

The slicing then becomes a rewrite of the IR so the aggregate is decomposed into multiple independent variables, and the rest of the backend then proceeds normally.
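
[Editor's illustration, not part of the original comment.] The rewrite described above can be sketched by hand in D. This is only a source-level analogy of what such an IR transformation might produce, not the DMD back end's actual implementation; the names `Pair`, `sumAggregate`, and `sumSliced` are hypothetical and chosen just for this sketch. Both conditions hold in `sumAggregate`: every access touches exactly one field, and the address of `p` is never taken.

```d
// Hypothetical aggregate with two independent "slices".
struct Pair
{
    float lo;
    float hi;
}

// Before slicing: `p` is an aggregate local. Every access stays inside
// one field (condition 1) and no pointer to `p` is taken (condition 2).
float sumAggregate(const(float)[] xs)
{
    Pair p = Pair(0, 0);
    foreach (i, x; xs)
    {
        if (i & 1)
            p.hi += x;
        else
            p.lo += x;
    }
    return p.lo + p.hi;
}

// After slicing (written by hand here): the aggregate is decomposed into
// independent scalars, each of which is an ordinary candidate for
// register allocation.
float sumSliced(const(float)[] xs)
{
    float p_lo = 0;
    float p_hi = 0;
    foreach (i, x; xs)
    {
        if (i & 1)
            p_hi += x;
        else
            p_lo += x;
    }
    return p_lo + p_hi;
}
```

The two functions compute the same result. Passing `&p` to another function, or reading bytes that straddle `lo` and `hi`, would violate one of the conditions, and the aggregate would have to stay in memory as a unit.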

--
October 08, 2016
https://issues.dlang.org/show_bug.cgi?id=16489

Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |performance

--
November 09, 2016
https://issues.dlang.org/show_bug.cgi?id=16489

--- Comment #4 from Walter Bright <bugzilla@digitalmars.com> ---
There are enough issues with the example that it won't compile, and inventing changes to make it compile may not show the issue. Can you please post one that does compile and illustrates the issue?

--
November 10, 2016
https://issues.dlang.org/show_bug.cgi?id=16489

--- Comment #5 from Илья Ярошенко <ilyayaroshenko@gmail.com> ---
(In reply to Walter Bright from comment #4)
> There are enough issues with the example that it won't compile, and inventing changes to make it compile may not show the issue. Can you please post one that does compile and illustrates the issue?


void foo(
    ref __vector(float[4])[2][6] c,
    __vector(float[4])[2]* a,
    __vector(float[4])[6]* b,
    size_t length)
{
    import std.meta;

    __vector(float[4])[2][6] reg = void; // should be located in the registers
    reg = c;

    __vector(float[4])[2] ai = void;
    __vector(float[4])[6] bi = void;

    do {
       ai[0] = a[0][0]; // should be located in the registers
       ai[1] = a[0][1]; // should be located in the registers

       foreach(i; AliasSeq!(0, 1, 2, 3, 4, 5))
       {
          bi[i] = b[0][i]; // Issue 16488, // should be located in the registers
          reg[i][0] += ai[0] * bi[i];
          reg[i][1] += ai[1] * bi[i];
       }

       a++;
       b++;
    } while(--length);
    c = reg;
}

--
April 12, 2019
https://issues.dlang.org/show_bug.cgi?id=16489

Илья Ярошенко <ilyayaroshenko@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |LATER

--