March 03, 2020
It is difficult to write an efficient matrix-matrix multiplication in any language. If you want a fair comparison, implement your naive method in Python and compare those timings.

On Tue, 3 Mar 2020 at 04:20, 9il via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:

> On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:
> > Hello again,
> >
> > Thanks to previous thread on multidimensional arrays, I managed to play around with pure D matrix representations and even benchmark a little against numpy:
> >
> > [...]
>
> Matrix multiplication is about cache-friendly blocking. https://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf
>
> `mir-blas` package can be used for matrix operations for ndslice.
>   `cblas` - if you want to work with your own matrix type.
>


March 03, 2020
On Tuesday, 3 March 2020 at 10:25:27 UTC, maarten van damme wrote:
> It is difficult to write an efficient matrix-matrix multiplication in any language. If you want a fair comparison, implement your naive method in Python and compare those timings.
>
> On Tue, 3 Mar 2020 at 04:20, 9il via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:
>
>> On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:
>> > [...]
>>
>> Matrix multiplication is about cache-friendly blocking. https://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf
>>
>> `mir-blas` package can be used for matrix operations for ndslice.
>>   `cblas` - if you want to work with your own matrix type.

Yeah, got it. After some reading, I understand that's not trivial once bigger matrices are involved.
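
For what it's worth, the blocking idea from the Goto paper linked above can be sketched in C roughly like this (a toy sketch, not a tuned implementation; `matmul_blocked` and the `BLOCK` value are just my own name and guess, not from any library):

```c
#include <stddef.h>

/* Cache-blocked (tiled) multiply: C += A * B, all n x n, row-major.
 * C must be zero-initialized by the caller. BLOCK is a tuning guess,
 * not a measured optimum for any particular cache. */
enum { BLOCK = 64 };

static void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK)
                /* multiply one BLOCK x BLOCK tile pair while it stays in cache */
                for (size_t i = ii; i < ii + BLOCK && i < n; ++i)
                    for (size_t k = kk; k < kk + BLOCK && k < n; ++k) {
                        double a = A[i*n + k];   /* loop-invariant in j */
                        for (size_t j = jj; j < jj + BLOCK && j < n; ++j)
                            C[i*n + j] += a * B[k*n + j];
                    }
}
```

The real libraries go much further (packing tiles into contiguous buffers, vectorizing the micro-kernel), but the tiling alone is already the main point of the paper.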
March 03, 2020
On Tuesday, 3 March 2020 at 10:25:27 UTC, maarten van damme wrote:
> it is difficult to write an efficient matrix matrix multiplication in any language. If you want a fair comparison, implement your naive method in python and compare those timings.
> [snip]

And of course there's going to be a big slowdown using native Python. Numpy basically calls BLAS in the background. A naive C implementation might be another comparison.
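
For example, such a naive C implementation, using the same i-j-k order as the D code in the original post, might look like this (a sketch for comparison only; `matmul_naive` is just an illustrative name):

```c
#include <stddef.h>

/* Naive triple-loop multiply: C = A * B, where A is m x n, B is n x p,
 * all stored row-major in flat arrays. No blocking, no vectorization. */
static void matmul_naive(size_t m, size_t n, size_t p,
                         const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < p; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += A[i*n + k] * B[k*p + j];  /* strided access into B */
            C[i*p + j] = sum;
        }
}
```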
March 03, 2020
On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:
> pragma(inline) static int toIdx(T)(Matrix!T m, in int i, in int j)
> {
>     return m.cols * i + j;
> }

This is row-major order [1]. BTW: Why don't you make toIdx a member of Matrix? It saves one parameter. You may also define opIndex as

   ref T opIndex(in int r, in int c)

Then the innermost summation becomes more readable:

   m3[i, j] += m1[i, k] * m2[k, j];

How about performing an in-place transposition of m2 before performing the dot product? Then you can rewrite the innermost loop:

   m3[i, j] += m1[i, k] * m2[j, k]; // note: j and k swapped

This should avoid costly jumping through memory. A good starting point for a performance analysis would be looking over the assembler code of the innermost loop.

[1] https://en.wikipedia.org/wiki/Row_major
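
In C terms, the transpose trick would look roughly like this (a sketch; I transpose out of place for clarity rather than in place, and `matmul_transposed` is just an illustrative name):

```c
#include <stddef.h>
#include <stdlib.h>

/* C = A * B, all n x n, row-major. B is transposed first, so the
 * inner loop reads a row of A and a row of Bt, both sequentially. */
static void matmul_transposed(size_t n, const double *A, const double *B, double *C)
{
    double *Bt = malloc(n * n * sizeof *Bt);
    if (!Bt) return;  /* allocation failure: leave C untouched */
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            Bt[j*n + i] = B[i*n + j];  /* Bt[j][i] = B[i][j] */
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += A[i*n + k] * Bt[j*n + k];  /* j and k swapped */
            C[i*n + j] = sum;
        }
    free(Bt);
}
```

The O(n^2) transpose is cheap next to the O(n^3) multiply, so it usually pays for itself once the matrices no longer fit in cache.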
March 04, 2020
On 01.03.20 21:58, p.shkadzko wrote:
> 
> ******************************************************************
> Matrix!T matrixDotProduct(T)(Matrix!T m1, Matrix!T m2)
> in
> {
>      assert(m1.rows == m2.cols);

This asserts that the result is a square matrix. I think you want `m1.cols==m2.rows` instead.

> }
> do
> {
>      Matrix!T m3 = Matrix!T(m1.rows, m2.cols);
> 
>      for (int i; i < m1.rows; ++i)
>      {
>          for (int j; j < m2.cols; ++j)
>          {
>              for (int k; k < m2.rows; ++k)
>              {
>                  m3.data[toIdx(m3, i, j)] += m1[i, k] * m2[k, j];
>              }
>          }
>      }
>      return m3;
> }
> ******************************************************************
> ...
> I can see that accessing the appropriate array member in Matrix.data is costly due to the toIdx operation, but I can hardly explain why it gets so costly. Maybe there is a better way to do it after all?

Changing the order of the second and third loop probably goes a pretty long way in terms of cache efficiency:

Matrix!T matrixDotProduct(T)(Matrix!T m1,Matrix!T m2)in{
    assert(m1.cols==m2.rows);
}do{
    int m=m1.rows,n=m1.cols,p=m2.cols;
    Matrix!T m3=Matrix!T(m,p);
    foreach(i;0..m) foreach(j;0..n) foreach(k;0..p)
        m3.data[i*p+k]+=m1.data[i*n+j]*m2.data[j*p+k];
    return m3;
}


(untested.)
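
For comparison, the same loop interchange in C (note the reduction index is named j in the D snippet above; below I call it k, following the usual convention). Also a sketch, untested beyond toy sizes:

```c
#include <stddef.h>

/* i-k-j loop order: the inner loop walks a row of C and a row of B
 * contiguously, mirroring the foreach reordering above. A is m x n,
 * B is n x p, C is m x p and must be zero-initialized. */
static void matmul_ikj(size_t m, size_t n, size_t p,
                       const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t k = 0; k < n; ++k) {
            double a = A[i*n + k];       /* loop-invariant in j */
            for (size_t j = 0; j < p; ++j)
                C[i*p + j] += a * B[k*p + j];
        }
}
```

Same arithmetic, same result, but no strided access in the innermost loop, which is usually a large constant-factor win over the i-j-k order.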