March 03, 2020 Re: Improving dot product for standard multidimensional D arrays
Posted in reply to 9il

It is difficult to write an efficient matrix-matrix multiplication in any language. If you want a fair comparison, implement your naive method in Python and compare those timings.

On Tue, 3 Mar 2020 at 04:20, 9il via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:

> On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:
> > Hello again,
> >
> > Thanks to previous thread on multidimensional arrays, I managed to play around with pure D matrix representations and even benchmark a little against numpy:
> >
> > [...]
>
> Matrix multiplication is about cache-friendly blocking. https://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf
>
> `mir-blas` package can be used for matrix operations for ndslice.
> `cblas` - if you want to work with your own matrix type.
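The cache-friendly blocking mentioned above can be sketched as follows. This is an illustration in Python rather than D, kept tiny for clarity; `matmul_blocked` and the block size `B` are illustrative names (a real implementation would tune `B` to the cache and use a compiled language):

```python
def matmul_naive(a, b):
    # straightforward triple loop, for checking the blocked version
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c

def matmul_blocked(a, b, B=2):
    # iterate over B x B tiles so each tile of a, b and c can stay
    # cache-resident while it is reused
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, B):
        for kk in range(0, m, B):
            for jj in range(0, p, B):
                for i in range(ii, min(ii + B, n)):
                    for k in range(kk, min(kk + B, m)):
                        for j in range(jj, min(jj + B, p)):
                            c[i][j] += a[i][k] * b[k][j]
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_blocked(a, b) == matmul_naive(a, b))  # True
```

The reordering of work into tiles does not change the result, only the memory access pattern; that is the whole point of the Goto paper's approach.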
March 03, 2020 Re: Improving dot product for standard multidimensional D arrays
Posted in reply to maarten van damme

On Tuesday, 3 March 2020 at 10:25:27 UTC, maarten van damme wrote:
> it is difficult to write an efficient matrix matrix multiplication in any language. If you want a fair comparison, implement your naive method in python and compare those timings.
>
> On Tue, 3 Mar 2020 at 04:20, 9il via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:
>
>> On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:
>> > [...]
>>
>> Matrix multiplication is about cache-friendly blocking. https://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf
>>
>> `mir-blas` package can be used for matrix operations for ndslice.
>> `cblas` - if you want to work with your own matrix type .
Yeah, got it. After some reading, I understand that it's not trivial once bigger matrices are involved.
March 03, 2020 Re: Improving dot product for standard multidimensional D arrays
Posted in reply to maarten van damme

On Tuesday, 3 March 2020 at 10:25:27 UTC, maarten van damme wrote:
> it is difficult to write an efficient matrix matrix multiplication in any language. If you want a fair comparison, implement your naive method in python and compare those timings.
> [snip]
And of course there's going to be a big slowdown when using native Python. NumPy basically calls BLAS in the background. A naive C implementation might be another comparison.
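The naive-Python baseline suggested above could be sketched like this (a pure-Python triple loop with a timer; `matmul_naive` is an illustrative name, absolute timings depend on the machine, and NumPy would beat this by orders of magnitude precisely because it delegates to BLAS):

```python
import time

def matmul_naive(a, b):
    # textbook triple loop, no vectorization, no blocking
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c

n = 100
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]

t0 = time.perf_counter()
c = matmul_naive(a, b)
print(f"naive python, {n}x{n}: {time.perf_counter() - t0:.3f} s")
```

Timing this against the same triple loop in D (or C) isolates the interpreter overhead from the algorithmic cost, which is the fair comparison maarten is asking for.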
March 03, 2020 Re: Improving dot product for standard multidimensional D arrays
Posted in reply to p.shkadzko

On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:

> pragma(inline) static int toIdx(T)(Matrix!T m, in int i, in int j)
> {
>     return m.cols * i + j;
> }

This is row-major order [1].

BTW: Why don't you make toIdx a member of Matrix? It saves one parameter. You may also define opIndex as

    ref T opIndex(in int r, in int c)

Then the innermost summation becomes more readable:

    m3[i, j] += m1[i, k] * m2[k, j];

How about performing an in-place transposition of m2 before performing the dot product? Then you can rewrite the innermost loop:

    m3[i, j] += m1[i, k] * m2[j, k]; // note: j and k swapped

This should avoid the costly jumping through memory. A good starting point for a performance analysis would be looking over the assembler code of the innermost loop.

[1] https://en.wikipedia.org/wiki/Row_major
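The transpose-then-multiply idea can be sanity-checked with a small sketch (Python here for brevity; `matmul` and `matmul_transposed` are illustrative names, not the poster's Matrix type). With row-major storage, transposing the right operand once means the inner loop walks both operands sequentially:

```python
def matmul(a, b):
    # baseline: b is walked column-wise in the inner loop (cache-hostile
    # for row-major storage)
    m, n, p = len(a), len(a[0]), len(b[0])
    c = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c

def matmul_transposed(a, b):
    bt = [list(col) for col in zip(*b)]   # transpose b once up front
    m, n, p = len(a), len(a[0]), len(b[0])
    c = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            s = 0.0
            for k in range(n):
                s += a[i][k] * bt[j][k]   # note: j and k swapped
            c[i][j] = s
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_transposed(a, b) == matmul(a, b))  # True
```

The one-time O(n²) transposition is amortized over the O(n³) multiply, so it pays off quickly as the matrices grow.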
March 04, 2020 Re: Improving dot product for standard multidimensional D arrays
Posted in reply to p.shkadzko

On 01.03.20 21:58, p.shkadzko wrote:

> ******************************************************************
> Matrix!T matrixDotProduct(T)(Matrix!T m1, Matrix!T m2)
> in
> {
>     assert(m1.rows == m2.cols);

This asserts that the result is a square matrix. I think you want `m1.cols == m2.rows` instead.

> }
> do
> {
>     Matrix!T m3 = Matrix!T(m1.rows, m2.cols);
>
>     for (int i; i < m1.rows; ++i)
>     {
>         for (int j; j < m2.cols; ++j)
>         {
>             for (int k; k < m2.rows; ++k)
>             {
>                 m3.data[toIdx(m3, i, j)] += m1[i, k] * m2[k, j];
>             }
>         }
>     }
>     return m3;
> }
> ******************************************************************
> ...
> I can see that accessing the appropriate array member in Matrix.data is costly due to toIdx operation but, I can hardly explain why it gets so much costly. Maybe there is a better way to do it after all?

Changing the order of the second and third loop probably goes a pretty long way in terms of cache efficiency:

    Matrix!T matrixDotProduct(T)(Matrix!T m1, Matrix!T m2)
    in
    {
        assert(m1.cols == m2.rows);
    }
    do
    {
        int m = m1.rows, n = m1.cols, p = m2.cols;
        Matrix!T m3 = Matrix!T(m, p);
        foreach (i; 0 .. m)
            foreach (j; 0 .. n)
                foreach (k; 0 .. p)
                    m3.data[i*p + k] += m1.data[i*n + j] * m2.data[j*p + k];
        return m3;
    }

(Untested.)
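A quick way to convince yourself that this loop reordering preserves the result is a side-by-side check, sketched here in Python for brevity and mirroring the index names of the D snippet above (`matmul_ijk` and `matmul_ikj` are illustrative names):

```python
def matmul_ijk(a, b):
    # original order: result row, result column, then the summed index;
    # the inner loop walks b column-wise
    m, n, p = len(a), len(a[0]), len(b[0])
    c = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c

def matmul_ikj(a, b):
    # reordered: j now indexes the shared dimension, and the inner loop
    # walks both b and c row-wise, the cache-friendly direction
    m, n, p = len(a), len(a[0]), len(b[0])
    c = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for j in range(n):
            aij = a[i][j]
            for k in range(p):
                c[i][k] += aij * b[j][k]
    return c

a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
print(matmul_ikj(a, b) == matmul_ijk(a, b))  # True
```

Since each product a[i][j] * b[j][k] is accumulated exactly once either way, only the order of the additions changes; for floating point that can perturb rounding slightly, but the algorithm is the same.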
Copyright © 1999-2021 by the D Language Foundation