Improving dot product for standard multidimensional D arrays
March 01, 2020
Hello again,

Thanks to the previous thread on multidimensional arrays, I managed to play around with pure D matrix representations and even benchmark a little against numpy:

+-------------------------------------------------------------+------------+-----------+
|                          benchmark                          | time (sec) |  vs Numpy |
+-------------------------------------------------------------+------------+-----------+
| Sum of two [5000, 6000] int array of arrays                 | ~0.28      | x4.5      |
| Multiplication of two [5000, 6000] double array of arrays   | ~0.3       | x2.6      |
| Sum of two [5000, 6000] int struct matrices                 | ~0.039     | x0.6      |
| Multiplication of two [5000, 6000] double struct matrices   | ~0.135     | x1.2      |
| L2 norm of [5000, 6000] double struct matrix                | ~0.015     | x15       |
| Sort of [5000, 6000] double struct matrix (axis=-1)         | ~2.435     | x1.9      |
| Dot product of [500, 600]&[600, 500] double struct matrices | ~0.172     | --        |
+-------------------------------------------------------------+------------+-----------+

However, there is one benchmark I am trying to make at least somewhat comparable: the dot product of two struct matrices, concretely the [A x B] @ [B x A] = [A, A] operation.
There is a dotProduct function in std.numeric but it works with 1D ranges only.
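For reference, the numpy operation I am benchmarking against is just the `@` operator (my sketch of the equivalent call, not code from the benchmark itself):

```python
import numpy as np

# [A x B] @ [B x A] -> [A x A]; here A=500, B=600 as in the benchmark row
a = np.ones((500, 600))
b = np.ones((600, 500))
c = a @ b          # matrix multiplication, not element-wise multiplication
print(c.shape)     # (500, 500)
print(c[0, 0])     # each entry sums B=600 products of ones -> 600.0
```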

After it was clear that arrays of arrays are not very good for representing multidimensional data, I used a struct to represent a multidimensional array like so:

******************************************************************
import std.array : array;
import std.range : chunks;

struct Matrix(T)
{
    T[] data; // keep our data as 1D array and reshape to 2D when needed
    int rows;
    int cols;
    // allow Matrix[] instead of Matrix.data[]
    alias data this;

    this(int rows, int cols)
    {
        this.data = new T[rows * cols];
        this.rows = rows;
        this.cols = cols;
    }

    this(int rows, int cols, T[] data)
    {
        assert(data.length == rows * cols);
        this.data = data;
        this.rows = rows;
        this.cols = cols;
    }

    T[][] to2D()
    {
        return this.data.chunks(this.cols).array;
    }

    /// Allow element 2D indexing, e.g. Matrix[row, col]
    T opIndex(in int r, in int c)
    {
        return this.data[toIdx(this, r, c)];
    }

}

pragma(inline) static int toIdx(T)(Matrix!T m, in int i, in int j)
{
    return m.cols * i + j;
}
******************************************************************

And here is the dot product function:

******************************************************************
Matrix!T matrixDotProduct(T)(Matrix!T m1, Matrix!T m2)
in
{
    assert(m1.cols == m2.rows); // inner dimensions must match
}
do
{
    Matrix!T m3 = Matrix!T(m1.rows, m2.cols);

    for (int i; i < m1.rows; ++i)
    {
        for (int j; j < m2.cols; ++j)
        {
            for (int k; k < m2.rows; ++k)
            {
                m3.data[toIdx(m3, i, j)] += m1[i, k] * m2[k, j];
            }
        }
    }
    return m3;
}
******************************************************************

However, running matrixDotProduct on two 5000x6000 struct Matrices took ~20 min, while 500x600 took only 0.172 sec. I wondered if there is something really wrong with the matrixDotProduct function.

I can see that accessing the appropriate array member in Matrix.data has some cost due to the toIdx operation, but I can hardly explain why it becomes so much more costly at larger sizes. Maybe there is a better way to do it after all?
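To make the access pattern concrete: in the inner k loop, m1[i, k] walks the flat array sequentially, but m2[k, j] jumps a whole row ahead on every step. A quick sketch of the flat indices touched (illustrative Python, same row-major formula as toIdx):

```python
cols = 6000  # row length (number of columns) in the flat layout

def to_idx(cols, i, j):
    # same row-major mapping as the D toIdx: cols * i + j
    return cols * i + j

# indices read from m1.data for fixed i=0 as k runs 0..3: sequential
m1_idxs = [to_idx(cols, 0, k) for k in range(4)]
# indices read from m2.data for fixed j=0 as k runs 0..3: 6000 apart
m2_idxs = [to_idx(cols, k, 0) for k in range(4)]
print(m1_idxs)  # [0, 1, 2, 3]
print(m2_idxs)  # [0, 6000, 12000, 18000] -- each access likely misses cache
```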
March 02, 2020
On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:
> Hello again,
>
> [snip]


What compiler did you use and what flags?
March 02, 2020
On Monday, 2 March 2020 at 11:33:25 UTC, jmh530 wrote:
> On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:
>> Hello again,
>>
>> [snip]
>
>
> What compiler did you use and what flags?

Ah yes, sorry. I used the latest ldc2 (1.20.0-x64) for Windows, with dflags -mcpu=native and the build options "inline", "optimize", "releaseMode".

Here is a dub.json of the project:

{
	"name": "app",
	"targetType": "executable",
	"dependencies": {
        "mir": "~>3.2.0"
	},
	"dflags-ldc": ["-mcpu=native"],

	"buildTypes": {
		"release": {
			"buildOptions": ["releaseMode", "inline", "optimize"],
			"dflags": ["-boundscheck=off"]
		},
		"tests": {
			"buildOptions": ["unittests"]
		}

	}
}

March 02, 2020
On Monday, 2 March 2020 at 13:35:15 UTC, p.shkadzko wrote:
> [snip]

Thanks. I don't have time right now to review this thoroughly. My recollection is that the dot product of two matrices is actually matrix multiplication, correct? It generally makes sense to defer to other people's implementation of this. I recommend trying lubeck's version against numpy. It uses a blas/lapack implementation. mir-glas, I believe, also has a version.

Also, I'm not sure if the fastmath attribute would do anything here, but something worth looking into.

March 02, 2020
On Monday, 2 March 2020 at 15:00:56 UTC, jmh530 wrote:
> On Monday, 2 March 2020 at 13:35:15 UTC, p.shkadzko wrote:
>> [snip]
>
> Thanks. I don't have time right now to review this thoroughly. My recollection is that the dot product of two matrices is actually matrix multiplication, correct? It generally makes sense to defer to other people's implementation of this. I recommend trying lubeck's version against numpy. It uses a blas/lapack implementation. mir-glas, I believe, also has a version.
>
> Also, I'm not sure if the fastmath attribute would do anything here, but something worth looking into.

Yes, it is a sum of multiplications between elements of two matrices, or a scalar product in the case of vectors. This is not the simple element-wise multiplication that I did in earlier benchmarks.

I tested @fastmath and @optmath on the toIdx function and that didn't change anything.
March 02, 2020
On Monday, 2 March 2020 at 18:17:05 UTC, p.shkadzko wrote:
> [snip]
> I tested @fastmath and @optmath for toIdx function and that didn't change anyting.

@optmath is from mir, correct? I believe it implies @fastmath. The latest code in mir doesn't have it doing anything else at least.
March 02, 2020
On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:
> Hello again,
>
> Thanks to previous thread on multidimensional arrays, I managed to play around with pure D matrix representations and even benchmark a little against numpy:
>
> [...]

Interesting growth of processing time. Could it be GC?

+------------------+-------------+
| matrixDotProduct | time (sec.) |
+------------------+-------------+
| 2x[100 x 100]    |        0.01 |
| 2x[1000 x 1000]  |        2.21 |
| 2x[1500 x 1000]  |         5.6 |
| 2x[1500 x 1500]  |        9.28 |
| 2x[2000 x 2000]  |       44.59 |
| 2x[2100 x 2100]  |       55.13 |
+------------------+-------------+
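The growth is steeper than the O(n^3) arithmetic alone would predict (a rough check, assuming the triple-loop cost model): doubling n should cost about 8x, but going from 1000 to 2000 costs about 20x here, which rather points at memory access than at raw arithmetic.

```python
# naive matmul does n^3 multiply-adds, so time should scale ~ (n2/n1)^3
expected_ratio = (2000 / 1000) ** 3      # 8.0
observed_ratio = 44.59 / 2.21            # ~20.2, from the table above
print(expected_ratio, round(observed_ratio, 1))
```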
March 02, 2020
On Monday, 2 March 2020 at 20:22:55 UTC, p.shkadzko wrote:
> [snip]
>
> Interesting growth of processing time. Could it be GC?
>
> +------------------+-------------+
> | matrixDotProduct | time (sec.) |
> +------------------+-------------+
> | 2x[100 x 100]    |        0.01 |
> | 2x[1000 x 1000]  |        2.21 |
> | 2x[1500 x 1000]  |         5.6 |
> | 2x[1500 x 1500]  |        9.28 |
> | 2x[2000 x 2000]  |       44.59 |
> | 2x[2100 x 2100]  |       55.13 |
> +------------------+-------------+

Your matrixDotProduct creates a new Matrix and then returns it. When you look at the Matrix struct, it is basically building upon D's GC-backed slices. So yes, you are using the GC here.

You could try creating the output matrices outside of the matrixDotProduct function and then pass them by pointer or reference into the function if you want to profile just the calculation.
March 02, 2020
On Monday, 2 March 2020 at 20:56:50 UTC, jmh530 wrote:
> On Monday, 2 March 2020 at 20:22:55 UTC, p.shkadzko wrote:
>> [snip]
>>
>> Interesting growth of processing time. Could it be GC?
>>
>> +------------------+-------------+
>> | matrixDotProduct | time (sec.) |
>> +------------------+-------------+
>> | 2x[100 x 100]    |        0.01 |
>> | 2x[1000 x 1000]  |        2.21 |
>> | 2x[1500 x 1000]  |         5.6 |
>> | 2x[1500 x 1500]  |        9.28 |
>> | 2x[2000 x 2000]  |       44.59 |
>> | 2x[2100 x 2100]  |       55.13 |
>> +------------------+-------------+
>
> Your matrixDotProduct creates a new Matrix and then returns it. When you look at the Matrix struct, it is basically building upon D's GC-backed slices. So yes, you are using the GC here.
>
> You could try creating the output matrices outside of the matrixDotProduct function and then pass them by pointer or reference into the function if you want to profile just the calculation.

I tried using ref (a pointer to the struct) but it only made things slower by 0.5 s.
I am no longer passing the result matrix to "toIdx"; that is not necessary since we only need the column count. This didn't change anything though.
Here is how the code looks now.

*************************************************************************
pragma(inline) static int toIdx(int matrixCols, in int i, in int j)
{
    return matrixCols * i + j;
}

Matrix!T matrixDotProduct(T)(Matrix!T m1, Matrix!T m2, ref Matrix!T initM)
in
{
    assert(m1.cols == m2.rows); // inner dimensions must match
}
do
{
    /// This implementation requires opIndex in Matrix struct.
    for (int i; i < m1.rows; ++i)
    {
        for (int j; j < m2.cols; ++j)
        {
            for (int k; k < m2.rows; ++k)
            {
                initM.data[toIdx(initM.cols, i, j)] += m1[i, k] * m2[k, j];
            }
        }
    }
    return initM;
}

void main() {
    Matrix!double initMatrix = Matrix!double(m5.rows, m6.cols);
    auto e = matrixDotProduct!double(m5, m6, initMatrix).to2D;
}
*************************************************************************

I tried disabling the GC via GC.disable / GC.enable before and after the 3 loops in matrixDotProduct to see what happens. But nothing changed :(
March 03, 2020
On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:
> Hello again,
>
> Thanks to previous thread on multidimensional arrays, I managed to play around with pure D matrix representations and even benchmark a little against numpy:
>
> [...]

Matrix multiplication is about cache-friendly blocking.
https://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf

The `mir-blas` package can be used for matrix operations on ndslice, or `cblas` if you want to work with your own matrix type.
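A minimal sketch of the blocking idea (illustrative Python over flat row-major lists; the block size and names are my own, not from the paper):

```python
def blocked_matmul(a, b, n, bs=2):
    """Multiply two flat row-major n*n matrices in bs*bs blocks.

    Iterating block by block keeps the working set of a, b, and c
    small enough to stay in cache for the inner three loops.
    """
    c = [0.0] * (n * n)
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        aik = a[i * n + k]          # loaded once per row of the block
                        for j in range(jj, min(jj + bs, n)):
                            c[i * n + j] += aik * b[k * n + j]  # both sides sequential
    return c

# 2x2 check: [[1,2],[3,4]] @ [[5,6],[7,8]] = [[19,22],[43,50]]
print(blocked_matmul([1, 2, 3, 4], [5, 6, 7, 8], 2))  # [19.0, 22.0, 43.0, 50.0]
```

Note that the inner loops also use the i-k-j order, so both m1 and m2 are read sequentially, unlike the i-j-k order in the original matrixDotProduct.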