Benchmarking mir.ndslice + lubeck against numpy and Julia
January 11, 2020
Today I decided to write a couple of benchmarks to compare D mir with lubeck against Python numpy, then I also added Julia snippets. The results appeared to be quite interesting.


Allocation and SVD of [5000 x 10] matrix:

+--------+--------------------+------------+
|  lang  |        libs        | time (sec) |
+--------+--------------------+------------+
| Python | numpy+scipy        |        0.5 |
| Julia  | LinearAlgebra      |     0.0014 |
| D      | mir.ndslice+lubeck |        0.6 |
+--------+--------------------+------------+


Allocation and SVD of [5000 x 100] matrix:

+--------+--------------------+------------+
|  lang  |        libs        | time (sec) |
+--------+--------------------+------------+
| Python | numpy+scipy        |        4.5 |
| Julia  | LinearAlgebra      |      0.024 |
| D      | mir.ndslice+lubeck |        5.2 |
+--------+--------------------+------------+


Allocation and SVD of [5000 x 1000] matrix:

+--------+--------------------+------------+
|  lang  |        libs        | time (sec) |
+--------+--------------------+------------+
| Python | numpy+scipy        |        1.4 |
| Julia  | LinearAlgebra      |          1 |
| D      | mir.ndslice+lubeck |      12.85 |
+--------+--------------------+------------+


Allocation and SVD of [500 x 10000] matrix:

+--------+--------------------+------------+
|  lang  |        libs        | time (sec) |
+--------+--------------------+------------+
| Python | numpy+scipy        |       2.34 |
| Julia  | LinearAlgebra      |        1.1 |
| D      | mir.ndslice+lubeck |         25 |
+--------+--------------------+------------+


Matrices allocation and dot product A [3000 x 3000] * B [3000 x 3000]

+--------+--------------------+------------+
|  lang  |        libs        | time (sec) |
+--------+--------------------+------------+
| Python | numpy+scipy        |       0.62 |
| Julia  | LinearAlgebra      |      0.215 |
| D      | mir.ndslice+lubeck |        1.5 |
+--------+--------------------+------------+


D lubeck's svd method is quite slow and gets even slower as the second dimension grows, while scipy's svd surprisingly gets faster. The dot product is unfortunately also disappointing. I can only compliment Julia on such amazing results.

Below is the code I was using.

Allocation and SVD of [A x B] matrix:
Python
------
import numpy as np
from scipy.linalg import svd
import timeit

def svd_fun():
    data = np.random.randint(0, 1024, 5000000).reshape((5000, 1000))
    u, s, v = svd(data)

print(timeit.timeit(svd_fun, number=1))

Julia
-----
using LinearAlgebra
using BenchmarkTools

function svdFun()
    a = rand([0, 1024], 5000, 1000)  # NB: draws from the two-element set {0, 1024}; rand(0:1023, 5000, 1000) would match numpy's randint(0, 1024)
    res = svd(a)
end

@btime svdFun()


D mir.ndslice+lubeck
--------------------
/+dub.sdl:
dependency "mir" version="~>3.2.0"
dependency "lubeck" version="~>1.1.7"
libs "lapack" "openblas"
+/
import std.datetime;
import std.datetime.stopwatch : benchmark;
import std.stdio;
import std.random : Xorshift, unpredictableSeed, uniform;
import std.array: array;
import std.range: generate, take;


import mir.ndslice;
import mir.math.common : optmath;
import lubeck;


void svdFun()
{
    Xorshift rnd;
    rnd.seed(unpredictableSeed);
    auto matrix = generate(() => uniform(0, 1024, rnd))
        .take(5000_000)
        .array
        .sliced(5000, 1000);
    auto svdResult = matrix.svd;
}

void main()
{
    auto svdTime = benchmark!(svdFun)(1);
    writeln(svdTime);

}

Matrices allocation and dot product A [3000 x 3000] * B [3000 x 3000]
Python
------
def dot_fun():
    a = np.random.random(9000000).reshape((3000, 3000))
    b = np.random.random(9000000).reshape((3000, 3000))
    c = np.dot(a, b)
print(timeit.timeit(dot_fun, number=10)/10)

Julia
-----
function dotFun()
    a = rand([0, 1.0], 3000, 3000)  # NB: draws from the two-element set {0.0, 1.0}; rand(3000, 3000) would match numpy's uniform [0, 1)
    b = rand([0, 1.0], 3000, 3000)
    c = dot(a, b)  # NB: Julia's dot is the scalar inner product sum(a .* b), not the matrix product a * b that np.dot computes for 2-D arrays
end
@btime dotFun()

D mir.ndslice+lubeck
--------------------
static @optmath auto rndMatrix(T)(const T maxN, in int dimA, in int dimB) {
    Xorshift rnd;
    rnd.seed(unpredictableSeed);
    const amount = dimA * dimB;
    return generate(() => uniform(0, maxN, rnd))
        .take(amount)
        .array
        .sliced(dimA, dimB);
}

static @optmath T fmuladd(T, Z)(const T a, Z z){
    return a + z.a * z.b;
}

void dotFun() {
    auto matrixA = rndMatrix!double(1.0, 3000, 3000);
    auto matrixB = rndMatrix!double(1.0, 3000, 3000);
    auto zipped = zip!true(matrixA, matrixB);
    // NB: this computes the scalar inner product of the two matrices
    // (like the Julia version), not the matrix product np.dot performs
    auto dot = reduce!fmuladd(0.0, zipped);
}

void main()
{
    auto dotTime = benchmark!(dotFun)(1);
    writeln(dotTime);
}
January 11, 2020
On Saturday, 11 January 2020 at 21:54:13 UTC, p.shkadzko wrote:
> Today I decided to write a couple of benchmarks to compare D mir with lubeck against Python numpy, then I also added Julia snippets. The results appeared to be quite interesting.
>
>
> [...]

Can you specify the command line used for D ?
For now it's not clear whether you used LDC, GDC or DMD, or which options (assertions compiled in or not?). DMD is not a good choice for benchmarking.
January 11, 2020
On Saturday, 11 January 2020 at 22:21:18 UTC, user1234 wrote:
> On Saturday, 11 January 2020 at 21:54:13 UTC, p.shkadzko wrote:
>> Today I decided to write a couple of benchmarks to compare D mir with lubeck against Python numpy, then I also added Julia snippets. The results appeared to be quite interesting.
>>
>>
>> [...]
>
> Can you specify the command line used for D ?
> For now it's not clear whether you used LDC, GDC or DMD, or which options (assertions compiled in or not?). DMD is not a good choice for benchmarking.

All D code was compiled with ldc2
dub build --compiler=ldc2 --single matrix_ops.d

I also tried using $DFLAGS="--O3" but that didn't produce better results.
January 11, 2020
On Saturday, 11 January 2020 at 22:50:46 UTC, p.shkadzko wrote:
> On Saturday, 11 January 2020 at 22:21:18 UTC, user1234 wrote:
>> On Saturday, 11 January 2020 at 21:54:13 UTC, p.shkadzko wrote:
>>> Today I decided to write a couple of benchmarks to compare D mir with lubeck against Python numpy, then I also added Julia snippets. The results appeared to be quite interesting.
>>>
>>>
>>> [...]
>>
>> Can you specify the command line used for D ?
>> For now it's not clear whether you used LDC, GDC or DMD, or which options (assertions compiled in or not?). DMD is not a good choice for benchmarking.
>
> All D code was compiled with ldc2
> dub build --compiler=ldc2 --single matrix_ops.d
>
> I also tried using $DFLAGS="--O3" but that didn't produce better results.

A useful set of flags to try:

    -O -release -flto=thin -defaultlib=phobos2-ldc-lto,druntime-ldc-lto

The '-flto' option turns on LTO. Doesn't always help, but sometimes makes a significant difference, especially when library code is used in tight loops. Another option is to disable bounds checking in @safe code (--boundscheck=off).
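For a single-file dub script like the one in the OP, these can be baked into the embedded dub.sdl header. An untested sketch (buildOptions and dflags are standard dub settings; the platform="ldc" attribute restricts the flags to LDC builds):

```sdl
/+dub.sdl:
dependency "mir" version="~>3.2.0"
dependency "lubeck" version="~>1.1.7"
libs "lapack" "openblas"
buildOptions "releaseMode" "optimize" "inline"
dflags "-flto=thin" "-defaultlib=phobos2-ldc-lto,druntime-ldc-lto" platform="ldc"
+/
```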
January 12, 2020
On Saturday, 11 January 2020 at 21:54:13 UTC, p.shkadzko wrote:
> Allocation and SVD of [5000 x 10] matrix:
>
> +--------+--------------------+------------+
> |  lang  |        libs        | time (sec) |
> +--------+--------------------+------------+
> | Python | numpy+scipy        |        0.5 |
> | Julia  | LinearAlgebra      |     0.0014 |
> | D      | mir.ndslice+lubeck |        0.6 |
> +--------+--------------------+------------+

Meh. I don't like this kind of table. I think it should at least say "Python/C". Imagine a newcomer encountering this table and being like "wow, D is slower even than Python, what a mess!". But numpy/scipy are running C underneath, so D is not that bad in comparison.
January 12, 2020
On Sunday, 12 January 2020 at 00:25:44 UTC, JN wrote:
>
> Meh. I don't like this kind of table. I think it should at least say "Python/C". Imagine a newcomer encountering this table and being like "wow, D is slower even than Python, what a mess!". But numpy/scipy are running C underneath, so D is not that bad in comparison.

C? Try Fortran! https://github.com/scipy/scipy/blob/master/scipy/linalg/src/id_dist/src/idz_svd.f (I think that's the underlying code being called in the OP's example)

But your point still stands. Most of the heavy-duty lifting done by Python scientific computing libs is done by carefully-tuned, long-standing native code. There's nothing stopping D from linking to said libraries...

-Doc
January 12, 2020
I ran your dot product test for python and D.

$ python main.py
0.6625702600000001

$ ./main
[100 ms, 916 μs, and 2 hnsecs] // 0.1009162 secs

Some things I guess. Dub is horrible. The defaults are horrible. A lot of the way it functions is horrible. So I'm not surprised you got worse results, even though it seems you have a faster PC than me.

    dub build --config=release --compiler=ldc2 --arch=x86_64 --single main.d

Dub sometimes defaults to x86, cause why not, it's not a dying platform. Which can give worse codegen. It also defaults to debug, cause, you know, if you run something through dub it's obviously being run through a debugger.

January 12, 2020
On Sunday, 12 January 2020 at 05:05:15 UTC, Arine wrote:
> I ran your dot product test for python and D.
>
> $ python main.py
> 0.6625702600000001
>
> $ ./main
> [100 ms, 916 μs, and 2 hnsecs] // 0.1009162 secs
>
> Some things I guess. Dub is horrible. The defaults are horrible. A lot of the way it functions is horrible. So I'm not surprised you got worse results, even though it seems you have a faster PC than me.
>
>     dub build --config=release --compiler=ldc2 --arch=x86_64 --single main.d
>
> Dub sometimes defaults to x86, cause why not, it's not a dying platform. Which can give worse codegen. It also defaults to debug, cause, you know, if you run something through dub it's obviously being run through a debugger.

oops, `--config=release` should be `--build=release`, cause you know dub doesn't have confusing argument names and a really horrible documentation page that literally just repeats all the same arguments 20 times.
January 12, 2020
On Saturday, 11 January 2020 at 21:54:13 UTC, p.shkadzko wrote:
> Today I decided to write a couple of benchmarks to compare D mir with lubeck against Python numpy, then I also added Julia snippets. The results appeared to be quite interesting.

A decent optimizer would remove all your code except the print statement. Make sure to output the result of the computation. Also make sure you use the same algorithms and accuracy. If you write your own inner product in one language then you should do so in the other languages as well, and require the result to follow IEEE 754 by evaluating the result.

Please note that floating point code cannot be fully restructured by the optimizer without enabling less predictable fast-math settings, so it cannot even in theory approach hand-tuned library code.
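To make the point concrete, here is a generic Python sketch (mine, not from the thread): print a value derived from the result so the work cannot be elided, and require only approximate agreement between differently-ordered summations, since floating-point addition is not associative:

```python
import math
import random
import timeit

def inner_product(xs, ys):
    # naive left-to-right accumulation
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

random.seed(42)
xs = [random.random() for _ in range(100_000)]
ys = [random.random() for _ in range(100_000)]

result = inner_product(xs, ys)
print(result)  # using the result keeps an optimizer from eliding the work

# math.fsum compensates/reorders the summation; it may differ from the naive
# loop in the last bits, so only approximate agreement can be required.
accurate = math.fsum(x * y for x, y in zip(xs, ys))
assert math.isclose(result, accurate, rel_tol=1e-9)

print(timeit.timeit(lambda: inner_product(xs, ys), number=10) / 10)
```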
January 12, 2020
On Sunday, 12 January 2020 at 00:25:44 UTC, JN wrote:
> On Saturday, 11 January 2020 at 21:54:13 UTC, p.shkadzko wrote:
>> Allocation and SVD of [5000 x 10] matrix:
>>
>> +--------+--------------------+------------+
>> |  lang  |        libs        | time (sec) |
>> +--------+--------------------+------------+
>> | Python | numpy+scipy        |        0.5 |
>> | Julia  | LinearAlgebra      |     0.0014 |
>> | D      | mir.ndslice+lubeck |        0.6 |
>> +--------+--------------------+------------+
>
> Meh. I don't like this kind of table. I think it should at least say "Python/C". Imagine a newcomer encountering this table and being like "wow, D is slower even than Python, what a mess!". But numpy/scipy are running C underneath, so D is not that bad in comparison.

Yes, technically we are benchmarking against C numpy and C/Fortran scipy through a Python syntax layer. It should have been "C/Fortran".