Thread overview | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
March 01, 2020 Improving dot product for standard multidimensional D arrays | ||||
---|---|---|---|---|
| ||||
Hello again, Thanks to previous thread on multidimensional arrays, I managed to play around with pure D matrix representations and even benchmark a little against numpy: +-------------------------------------------------------------+------------+-----------+ | benchmark | time (sec) | vs Numpy | +-------------------------------------------------------------+------------+-----------+ | Sum of two [5000, 6000] int array of arrays | ~0.28 | x4.5 | | Multiplication of two [5000, 6000] double array of arrays | ~0.3 | x2.6 | | Sum of two [5000, 6000] int struct matrices | ~0.039 | x0.6 | | Multiplication of two [5000, 6000] double struct matrices | ~0.135 | x1.2 | | L2 norm of [5000, 6000] double struct matrix | ~0.015 | x15 | | Sort of [5000, 6000] double struct matrix (axis=-1) | ~2.435 | x1.9 | | Dot product of [500, 600]&[600, 500] double struct matrices | ~0.172 | -- | +-------------------------------------------------------------+------------+-----------+ However, there is one benchmark I am trying to make at least a little comparable. That is the dot product of two struct matrices. Concretely [A x B] @ [B x A] = [A, A] operation. There is a dotProduct function in std.numeric but it works with 1D ranges only. After it was clear that array of arrays are not very good to represent multidimensional data, I used struct to represent a multidimensional arrays like so: ****************************************************************** struct Matrix(T) { T[] data; // keep our data as 1D array and reshape to 2D when needed int rows; int cols; // allow Matrix[] instead of Matrix.data[] alias data this; this(int rows, int cols) { this.data = new T[rows * cols]; this.rows = rows; this.cols = cols; } this(int rows, int cols, T[] data) { assert(data.length == rows * cols); this.data = data; this.rows = rows; this.cols = cols; } T[][] to2D() { return this.data.chunks(this.cols).array; } /// Allow element 2D indexing, e.g. Matrix[row, col] T opIndex(in int r, in int c) { return this.data[toIdx(this, r, c)]; } } pragma(inline) static int toIdx(T)(Matrix!T m, in int i, in int j) { return m.cols * i + j; } ****************************************************************** And here is the dot product function: ****************************************************************** Matrix!T matrixDotProduct(T)(Matrix!T m1, Matrix!T m2) in { assert(m1.rows == m2.cols); } do { Matrix!T m3 = Matrix!T(m1.rows, m2.cols); for (int i; i < m1.rows; ++i) { for (int j; j < m2.cols; ++j) { for (int k; k < m2.rows; ++k) { m3.data[toIdx(m3, i, j)] += m1[i, k] * m2[k, j]; } } } return m3; } ****************************************************************** However, attempting to run dotProduct on two 5000x6000 struct Matrices took ~20 min while 500x600 only 0.172 sec. And I wondered if there is something really wrong with the matrixDotProduct function. I can see that accessing the appropriate array member in Matrix.data is costly due to toIdx operation but, I can hardly explain why it gets so much costly. Maybe there is a better way to do it after all? |
March 02, 2020 Re: Improving dot product for standard multidimensional D arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to p.shkadzko | On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:
> Hello again,
>
> [snip]
What compiler did you use and what flags?
|
March 02, 2020 Re: Improving dot product for standard multidimensional D arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to jmh530 | On Monday, 2 March 2020 at 11:33:25 UTC, jmh530 wrote:
> On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:
>> Hello again,
>>
>> [snip]
>
>
> What compiler did you use and what flags?
Ah yes, sorry. I used latest ldc2 (1.20.0-x64) for Windows.
Dflags -mcpu=native and "inline", "optimize", "releaseMode".
Here is a dub.json of the project:
{
"name": "app",
"targetType": "executable",
"dependencies": {
"mir": "~>3.2.0"
},
"dflags-ldc": ["-mcpu=native"],
"buildTypes": {
"release": {
"buildOptions": ["releaseMode", "inline", "optimize"],
"dflags": ["-boundscheck=off"]
},
"tests": {
"buildOptions": ["unittests"]
}
}
}
|
March 02, 2020 Re: Improving dot product for standard multidimensional D arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to p.shkadzko | On Monday, 2 March 2020 at 13:35:15 UTC, p.shkadzko wrote:
> [snip]
Thanks. I don't have time right now to review this thoroughly. My recollection is that the dot product of two matrices is actually matrix multiplication, correct? It generally makes sense to defer to other people's implementation of this. I recommend trying lubeck's version against numpy. It uses a blas/lapack implementation. mir-glas, I believe, also has a version.
Also, I'm not sure if the fastmath attribute would do anything here, but something worth looking into.
|
March 02, 2020 Re: Improving dot product for standard multidimensional D arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to jmh530 | On Monday, 2 March 2020 at 15:00:56 UTC, jmh530 wrote:
> On Monday, 2 March 2020 at 13:35:15 UTC, p.shkadzko wrote:
>> [snip]
>
> Thanks. I don't have time right now to review this thoroughly. My recollection is that the dot product of two matrices is actually matrix multiplication, correct? It generally makes sense to defer to other people's implementation of this. I recommend trying lubeck's version against numpy. It uses a blas/lapack implementation. mir-glas, I believe, also has a version.
>
> Also, I'm not sure if the fastmath attribute would do anything here, but something worth looking into.
Yes, this it is a sum of multiplications between elements of two matrices or a scalar product in case of vectors. This is not simple element-wise multiplication that I did in earlier benchmarks.
I tested @fastmath and @optmath for toIdx function and that didn't change anyting.
|
March 02, 2020 Re: Improving dot product for standard multidimensional D arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to p.shkadzko | On Monday, 2 March 2020 at 18:17:05 UTC, p.shkadzko wrote:
> [snip]
> I tested @fastmath and @optmath for toIdx function and that didn't change anyting.
@optmath is from mir, correct? I believe it implies @fastmath. The latest code in mir doesn't have it doing anything else at least.
|
March 02, 2020 Re: Improving dot product for standard multidimensional D arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to p.shkadzko | On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote: > Hello again, > > Thanks to previous thread on multidimensional arrays, I managed to play around with pure D matrix representations and even benchmark a little against numpy: > > [...] Interesting growth of processing time. Could it be GC? +------------------+-------------+ | matrixDotProduct | time (sec.) | +------------------+-------------+ | 2x[100 x 100] | 0.01 | | 2x[1000 x 1000] | 2.21 | | 2x[1500 x 1000] | 5.6 | | 2x[1500 x 1500] | 9.28 | | 2x[2000 x 2000] | 44.59 | | 2x[2100 x 2100] | 55.13 | +------------------+-------------+ |
March 02, 2020 Re: Improving dot product for standard multidimensional D arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to p.shkadzko | On Monday, 2 March 2020 at 20:22:55 UTC, p.shkadzko wrote:
> [snip]
>
> Interesting growth of processing time. Could it be GC?
>
> +------------------+-------------+
> | matrixDotProduct | time (sec.) |
> +------------------+-------------+
> | 2x[100 x 100] | 0.01 |
> | 2x[1000 x 1000] | 2.21 |
> | 2x[1500 x 1000] | 5.6 |
> | 2x[1500 x 1500] | 9.28 |
> | 2x[2000 x 2000] | 44.59 |
> | 2x[2100 x 2100] | 55.13 |
> +------------------+-------------+
Your matrixDotProduct creates a new Matrix and then returns it. When you look at the Matrix struct, it is basically building upon D's GC-backed slices. So yes, you are using the GC here.
You could try creating the output matrices outside of the matrixDotProduct function and then pass them by pointer or reference into the function if you want to profile just the calculation.
|
March 02, 2020 Re: Improving dot product for standard multidimensional D arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to jmh530 | On Monday, 2 March 2020 at 20:56:50 UTC, jmh530 wrote: > On Monday, 2 March 2020 at 20:22:55 UTC, p.shkadzko wrote: >> [snip] >> >> Interesting growth of processing time. Could it be GC? >> >> +------------------+-------------+ >> | matrixDotProduct | time (sec.) | >> +------------------+-------------+ >> | 2x[100 x 100] | 0.01 | >> | 2x[1000 x 1000] | 2.21 | >> | 2x[1500 x 1000] | 5.6 | >> | 2x[1500 x 1500] | 9.28 | >> | 2x[2000 x 2000] | 44.59 | >> | 2x[2100 x 2100] | 55.13 | >> +------------------+-------------+ > > Your matrixDotProduct creates a new Matrix and then returns it. When you look at the Matrix struct, it is basically building upon D's GC-backed slices. So yes, you are using the GC here. > > You could try creating the output matrices outside of the matrixDotProduct function and then pass them by pointer or reference into the function if you want to profile just the calculation. I tried using ref (pointer to struct) but it only made things slower by 0.5 s. I an not passing the result matrix to "toIdx" anymore, this is not necessary we just need the column value. This didn't change anything though. Here is how the code looks now. ************************************************************************* pragma(inline) static int toIdx(int matrixCols, in int i, in int j) { return matrixCols * i + j; } Matrix!T matrixDotProduct(T)(Matrix!T m1, Matrix!T m2, ref Matrix!T initM) in { assert(m1.rows == m2.cols); } do { /// This implementation requires opIndex in Matrix struct. for (int i; i < m1.rows; ++i) { for (int j; j < m2.cols; ++j) { for (int k; k < m2.rows; ++k) { initM.data[toIdx(initM.cols, i, j)] += m1[i, k] * m2[k, j]; } } } return initM; } void main() { Matrix!double initMatrix = Matrix!double(m1.rows, m2.cols); auto e = matrixDotProduct!double(m5, m6, initMatrix).to2D; } ************************************************************************* I tried disabling GC via GC.disable; GC.enable; before and after 3 loops in matrixDotProduct to see what happens. But nothing changed :( |
March 03, 2020 Re: Improving dot product for standard multidimensional D arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to p.shkadzko | On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote: > Hello again, > > Thanks to previous thread on multidimensional arrays, I managed to play around with pure D matrix representations and even benchmark a little against numpy: > > [...] Matrix multiplication is about cache-friendly blocking. https://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf `mir-blas` package can be used for matrix operations for ndslice. `cblas` - if you want to work with your own matrix type . |
Copyright © 1999-2021 by the D Language Foundation