Standard D, Mir D benchmarks against Numpy (BLAS)
March 12, 2020
I have done several benchmarks against Numpy for various 2D matrix operations. The purpose was mere curiosity and to spread the word about the Mir D library among the office data engineers.
Since I am not a D expert, I would be happy if someone could take a second look and double-check.

https://github.com/tastyminerals/mir_benchmarks

Compile and run the project via: dub run --compiler=ldc --build=release

*Table descriptions reduced to fit into post width.

+---------------------------------+---------------------+--------------------+---------------------+
| Description                     | Numpy (BLAS) (sec.) | Standard D (sec.)  | Mir D (sec.)        |
+---------------------------------+---------------------+--------------------+---------------------+
| sum of two 250x200 (50 loops)   | 0.00115             | 0.00400213(x3.5)   | 0.00014372(x1/8)    |
| mult of two 250x200 (50 loops)  | 0.0011578           | 0.0132323(x11.4)   | 0.00013852(x1/8.3)  |
| sum of two 500x600 (50 loops)   | 0.0101275           | 0.016496(x1.6)     | 0.00021556(x1/47)   |
| mult of two 500x600 (50 loops)  | 0.010182            | 0.06857(x6.7)      | 0.00021717(x1/47)   |
| sum of two 1k x 1k (50 loops)   | 0.0493201           | 0.0614544(x1.3)    | 0.000422135(x1/117) |
| mult of two 1k x 1k (50 loops)  | 0.0493693           | 0.233827(x4.7)     | 0.000453535(x1/109) |
| Scalar product of two 30k       | 0.0152186           | 0.0227465(x1.5)    | 0.0198812(x1.3)     |
| Dot product of 5k x 6k, 6k x 5k | 1.6084685           | --------------     | 2.03398(x1.2)       |
| L2 norm of 5k x 6k              | 0.0072423           | 0.0160546(x2.2)    | 0.0110136(x1.6)     |
| Quicksort of 5k x 6k            | 2.6516816           | 0.178071(x1/14.8)  | 1.52406(x1/0.6)     |
+---------------------------------+---------------------+--------------------+---------------------+
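In case it helps with the review, the element-wise benchmarks are essentially doing this kind of operation (a minimal sketch with a hypothetical helper name, not the exact code from the repo):

```
import mir.ndslice;

// Sketch of an element-wise sum of two 2D slices with mir.ndslice.
// `elementwiseSum` is a hypothetical helper, not a function from the repo.
auto elementwiseSum(S)(S a, S b)
{
    auto c = a.slice; // eager copy of a (allocates once)
    c[] += b;         // element-wise addition into the copy
    return c;
}

void main()
{
    // sizes from the first table row
    auto a = iota(250, 200).as!double.slice;
    auto b = iota(250, 200).as!double.slice;
    auto c = elementwiseSum(a, b);
}
```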


March 13, 2020
You forgot to disable the GC in both benchmarks.
Also add @fastmath for standard_ops_bench.

FYI: standard_ops_bench does a LOT of memory allocations.
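
For what it's worth, the two suggestions look roughly like this (a sketch only; @fastmath comes from ldc.attributes and is LDC-specific):

```
import core.memory : GC;
import ldc.attributes : fastmath;

// LDC-specific: allow aggressive floating-point optimizations in this function.
@fastmath double sumLoop(const double[] xs)
{
    double s = 0;
    foreach (x; xs)
        s += x;
    return s;
}

void main()
{
    GC.disable();             // no automatic collections while the benchmarks run
    scope (exit) GC.enable();
    // ... run the benchmarks here ...
}
```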
March 12, 2020
On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
> I have done several benchmarks against Numpy for various 2D matrix operations. The purpose was mere curiosity and to spread the word about the Mir D library among the office data engineers.
> Since I am not a D expert, I would be happy if someone could take a second look and double-check.
>
> [snip]

Haha
March 12, 2020
On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
> I have done several benchmarks against Numpy for various 2D matrix operations. The purpose was mere curiosity and to spread the word about the Mir D library among the office data engineers.
> Since I am not a D expert, I would be happy if someone could take a second look and double-check.
>

Generally speaking, the D/Mir code in this benchmark is slow because of how it has been written.
I am not arguing that you should use D/Mir. In fact, I sometimes advise my clients not to use it if they can avoid it. On a commercial basis, I could write the benchmark or an applied algorithm so that D/Mir beats numpy in all the tests, including gemm. --Ilya
March 12, 2020
On Thursday, 12 March 2020 at 14:00:48 UTC, 9il wrote:
> On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
>> I have done several benchmarks against Numpy for various 2D matrix operations. The purpose was mere curiosity and to spread the word about the Mir D library among the office data engineers.
>> Since I am not a D expert, I would be happy if someone could take a second look and double-check.
>>
>
> Generally speaking, the D/Mir code in this benchmark is slow because of how it has been written.
> I am not arguing that you should use D/Mir. In fact, I sometimes advise my clients not to use it if they can avoid it. On a commercial basis, I could write the benchmark or an applied algorithm so that D/Mir beats numpy in all the tests, including gemm. --Ilya

Ah, never mind, the forum table didn't show the Mir numbers aligned. Thank you for the work. I will open an MR with a few add-ons.
March 12, 2020
On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
> [snip]

I looked into some of the ones that aren't faster than numpy:

For the dot product (what I would just call matrix multiplication), both functions are using gemm. There might be some quirks that cause a difference in performance, but otherwise I would expect them to be pretty close, and they are. It looks like you are allocating the output matrix with the GC, which could be a driver of the difference.
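
For illustration, preallocating the output and calling gemm directly could look something like this (a sketch assuming mir-blas's gemm(alpha, a, b, beta, c) interface; the actual benchmark code is in the repo linked above):

```
import mir.ndslice;
import mir.blas : gemm;

void main()
{
    // small 2x2 example; the benchmark uses 5k x 6k matrices
    auto a = [[1.0, 2.0], [3.0, 4.0]].fuse;
    auto b = [[5.0, 6.0], [7.0, 8.0]].fuse;
    auto c = slice!double(2, 2); // output allocated once, outside any timing loop

    gemm(1.0, a, b, 0.0, c);     // c = 1.0 * a * b + 0.0 * c (BLAS dgemm underneath)
}
```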

For the L2 norm, you are calculating it entry-wise as a Frobenius norm. That should be the same as the default for numpy. The only difference I can tell between your version and numpy's is that numpy re-uses its dot product function. Otherwise it looks the same.
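
For reference, the entry-wise (Frobenius) norm is just the square root of the sum of squared entries; a plain D sketch:

```
import std.math : sqrt;

// Frobenius norm of a 2D array: sqrt of the sum of squared entries.
double frobenius(const double[][] m)
{
    double s = 0;
    foreach (row; m)
        foreach (x; row)
            s += x * x;
    return sqrt(s);
}
```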
March 12, 2020
On Thursday, 12 March 2020 at 14:00:48 UTC, 9il wrote:
> [snip]
>
> Generally speaking, the D/Mir code in this benchmark is slow because of how it has been written.
> I am not arguing that you should use D/Mir. In fact, I sometimes advise my clients not to use it if they can avoid it. On a commercial basis, I could write the benchmark or an applied algorithm so that D/Mir beats numpy in all the tests, including gemm. --Ilya

I saw your subsequent post about not seeing the numbers, but I think my broader response is that most people don't need to get every last drop of performance. Typical performance for numpy versus typical performance for Mir is still valuable information for people to know.
March 12, 2020
On 3/12/20 3:59 PM, Pavel Shkadzko wrote:
> 
> [snip]
> 
How long does the benchmark run? It has already taken 20 min and is still running in the "Mir D" stage.

P.S.
Probably the reason is that I use
```
"subConfigurations": {"mir-blas": "blas"},
```
instead of
```
"subConfigurations": {"mir-blas": "twolib"},
```
March 12, 2020
On Thursday, 12 March 2020 at 13:18:41 UTC, rikki cattermole wrote:
> You forgot to disable the GC for both bench's.
> Also @fastmath for standard_ops_bench.
>
> FYI: standard_ops_bench does a LOT of memory allocations.

Thank you.
Add GC.disable; inside the main function, right? It didn't really change anything for any of the benchmarks; maybe I did it wrong.
Does @fastmath work only on functions with plain loops, or on anything with math ops? It is not clear from the LDC docs.


March 12, 2020
On Thursday, 12 March 2020 at 14:26:14 UTC, drug wrote:
> On 3/12/20 3:59 PM, Pavel Shkadzko wrote:
>> 
>> [snip]
>> 
> How long does the benchmark run? It has already taken 20 min and is still running in the "Mir D" stage.
>
> P.S.
> Probably the reason is that I use
> ```
> "subConfigurations": {"mir-blas": "blas"},
> ```
> instead of
> ```
> "subConfigurations": {"mir-blas": "twolib"},
> ```

For Numpy (Python) it's ~1m 30s, but all the D benchmarks take around ~2m on my machine, which I think is the real benchmark here :)
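
If anyone wants to reproduce the wall-time measurements, a minimal timing harness with std.datetime.stopwatch could look like this (the workload function is a placeholder):

```
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

void workload()
{
    // the operation being measured goes here
}

void main()
{
    enum runs = 50;
    auto results = benchmark!workload(runs); // total time over `runs` calls
    writefln("avg per run: %s", results[0] / runs);
}
```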