Standard D, Mir D benchmarks against Numpy (BLAS)
March 12, 2020
I have done several benchmarks against Numpy for various 2D matrix operations. The purpose was mere curiosity and to spread the word about the Mir D library among the office data engineers.
Since I am not a D expert, I would be happy if someone could take a second look and double-check.

https://github.com/tastyminerals/mir_benchmarks

Compile and run the project via: dub run --compiler=ldc --build=release

*Table descriptions reduced to fit into post width.

+---------------------------------+---------------------+--------------------+---------------------+
| Description                     | Numpy (BLAS) (sec.) | Standard D (sec.)  | Mir D (sec.)        |
+---------------------------------+---------------------+--------------------+---------------------+
| sum of two 250x200 (50 loops)   | 0.00115             | 0.00400213(x3.5)   | 0.00014372(x1/8)    |
| mult of two 250x200 (50 loops)  | 0.0011578           | 0.0132323(x11.4)   | 0.00013852(x1/8.3)  |
| sum of two 500x600 (50 loops)   | 0.0101275           | 0.016496(x1.6)     | 0.00021556(x1/47)   |
| mult of two 500x600 (50 loops)  | 0.010182            | 0.06857(x6.7)      | 0.00021717(x1/47)   |
| sum of two 1k x 1k (50 loops)   | 0.0493201           | 0.0614544(x1.3)    | 0.000422135(x1/117) |
| mult of two 1k x 1k (50 loops)  | 0.0493693           | 0.233827(x4.7)     | 0.000453535(x1/109) |
| Scalar product of two 30k       | 0.0152186           | 0.0227465(x1.5)    | 0.0198812(x1.3)     |
| Dot product of 5k x 6k, 6k x 5k | 1.6084685           | --------------     | 2.03398(x1.2)       |
| L2 norm of 5k x 6k              | 0.0072423           | 0.0160546(x2.2)    | 0.0110136(x1.6)     |
| Quicksort of 5k x 6k            | 2.6516816           | 0.178071(x1/14.8)  | 1.52406(x1/0.6)     |
+---------------------------------+---------------------+--------------------+---------------------+
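In case it helps with the review, the element-wise benchmarks are essentially doing this kind of operation (a minimal sketch with a hypothetical helper name, not the exact code from the repo):

```
import mir.ndslice;

// Sketch of an element-wise sum of two 2D slices with mir.ndslice.
// `elementwiseSum` is a hypothetical helper, not a function from the repo.
auto elementwiseSum(S)(S a, S b)
{
    auto c = a.slice; // eager copy of a (allocates once)
    c[] += b;         // element-wise addition into the copy
    return c;
}

void main()
{
    // sizes from the first table row
    auto a = iota(250, 200).as!double.slice;
    auto b = iota(250, 200).as!double.slice;
    auto c = elementwiseSum(a, b);
}
```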


March 13, 2020
You forgot to disable the GC in both benchmarks.
Also add @fastmath for standard_ops_bench.

FYI: standard_ops_bench does a LOT of memory allocations.
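
For what it's worth, the two suggestions look roughly like this (a sketch only; @fastmath comes from ldc.attributes and is LDC-specific):

```
import core.memory : GC;
import ldc.attributes : fastmath;

// LDC-specific: allow aggressive floating-point optimizations in this function.
@fastmath double sumLoop(const double[] xs)
{
    double s = 0;
    foreach (x; xs)
        s += x;
    return s;
}

void main()
{
    GC.disable();             // no automatic collections while the benchmarks run
    scope (exit) GC.enable();
    // ... run the benchmarks here ...
}
```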
March 12, 2020
On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
> I have done several benchmarks against Numpy for various 2D matrix operations. The purpose was mere curiosity and to spread the word about the Mir D library among the office data engineers.
> Since I am not a D expert, I would be happy if someone could take a second look and double-check.
>
> [snip]

Haha
March 12, 2020
On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
> I have done several benchmarks against Numpy for various 2D matrix operations. The purpose was mere curiosity and to spread the word about the Mir D library among the office data engineers.
> Since I am not a D expert, I would be happy if someone could take a second look and double-check.
>

Generally speaking, the D/Mir code in this benchmark is slow because of how it has been written.
I am not arguing that you should use D/Mir. In fact, I sometimes advise my clients not to use it if they can avoid it. On a commercial basis, I could write the benchmark or an applied algorithm so that D/Mir beats numpy in all the tests, including gemm. --Ilya
March 12, 2020
On Thursday, 12 March 2020 at 14:00:48 UTC, 9il wrote:
> On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
>> I have done several benchmarks against Numpy for various 2D matrix operations. The purpose was mere curiosity and to spread the word about the Mir D library among the office data engineers.
>> Since I am not a D expert, I would be happy if someone could take a second look and double-check.
>>
>
> Generally speaking, the D/Mir code in this benchmark is slow because of how it has been written.
> I am not arguing that you should use D/Mir. In fact, I sometimes advise my clients not to use it if they can avoid it. On a commercial basis, I could write the benchmark or an applied algorithm so that D/Mir beats numpy in all the tests, including gemm. --Ilya

Ah, never mind, the forum table didn't show the Mir numbers aligned. Thank you for the work. I will open an MR with a few add-ons.
March 12, 2020
On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
> [snip]

I looked into some of the ones that aren't faster than numpy:

For the dot product (what I would just call matrix multiplication), both functions are using gemm. There might be some quirks that cause a difference in performance, but otherwise I would expect them to be pretty close, and they are. It looks like you are allocating the output matrix with the GC, which could be a driver of the difference.
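
For illustration, preallocating the output and calling gemm directly could look something like this (a sketch assuming mir-blas's gemm(alpha, a, b, beta, c) interface; the actual benchmark code is in the repo linked above):

```
import mir.ndslice;
import mir.blas : gemm;

void main()
{
    // small 2x2 example; the benchmark uses 5k x 6k matrices
    auto a = [[1.0, 2.0], [3.0, 4.0]].fuse;
    auto b = [[5.0, 6.0], [7.0, 8.0]].fuse;
    auto c = slice!double(2, 2); // output allocated once, outside any timing loop

    gemm(1.0, a, b, 0.0, c);     // c = 1.0 * a * b + 0.0 * c (BLAS dgemm underneath)
}
```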

For the L2 norm, you are calculating it entry-wise as a Frobenius norm. That should be the same as the default for numpy. The only difference I can tell between your version and numpy's is that numpy re-uses its dot product function. Otherwise it looks the same.
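
For reference, the entry-wise (Frobenius) norm is just the square root of the sum of squared entries; a plain D sketch:

```
import std.math : sqrt;

// Frobenius norm of a 2D array: sqrt of the sum of squared entries.
double frobenius(const double[][] m)
{
    double s = 0;
    foreach (row; m)
        foreach (x; row)
            s += x * x;
    return sqrt(s);
}
```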
March 12, 2020
On Thursday, 12 March 2020 at 14:00:48 UTC, 9il wrote:
> [snip]
>
> Generally speaking, the D/Mir code in this benchmark is slow because of how it has been written.
> I am not arguing that you should use D/Mir. In fact, I sometimes advise my clients not to use it if they can avoid it. On a commercial basis, I could write the benchmark or an applied algorithm so that D/Mir beats numpy in all the tests, including gemm. --Ilya

I saw your subsequent post about not seeing the numbers, but I think my broader response is that most people don't need to get every last drop of performance. Typical performance for numpy versus typical performance for Mir is still valuable information for people to know.
March 12, 2020
On 3/12/20 3:59 PM, Pavel Shkadzko wrote:
> 
> [snip]
> 
How long does the benchmark run? It has already taken 20 min and is still running in the "Mir D" stage.

P.S.
Probably the reason is that I use
```
"subConfigurations": {"mir-blas": "blas"},
```
instead of
```
"subConfigurations": {"mir-blas": "twolib"},
```
March 12, 2020
On Thursday, 12 March 2020 at 13:18:41 UTC, rikki cattermole wrote:
> You forgot to disable the GC for both bench's.
> Also @fastmath for standard_ops_bench.
>
> FYI: standard_ops_bench does a LOT of memory allocations.

Thank you.
Add GC.disable; inside the main function, right? It didn't really change anything for any of the benchmarks; maybe I did it wrong.
Does @fastmath work only on functions with plain loops, or on anything with math ops? It is not clear from the LDC docs.


March 12, 2020
On Thursday, 12 March 2020 at 14:26:14 UTC, drug wrote:
> On 3/12/20 3:59 PM, Pavel Shkadzko wrote:
>> 
>> [snip]
>> 
> How long does the benchmark run? It has already taken 20 min and is still running in the "Mir D" stage.
>
> P.S.
> Probably the reason is that I use
> ```
> "subConfigurations": {"mir-blas": "blas"},
> ```
> instead of
> ```
> "subConfigurations": {"mir-blas": "twolib"},
> ```

For Numpy (Python) it's ~1m 30s, but all the D benchmarks take around ~2m on my machine, which I think is the real benchmark here :)
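
If anyone wants to reproduce the wall-time measurements, a minimal timing harness with std.datetime.stopwatch could look like this (the workload function is a placeholder):

```
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

void workload()
{
    // the operation being measured goes here
}

void main()
{
    enum runs = 50;
    auto results = benchmark!workload(runs); // total time over `runs` calls
    writefln("avg per run: %s", results[0] / runs);
}
```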