March 12, 2020
On Thursday, 12 March 2020 at 14:00:48 UTC, 9il wrote:
> On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
>> I have done several benchmarks against Numpy for various 2D matrix operations. The purpose was mere curiosity and spread the word about Mir D library among the office data engineers.
>> Since I am not a D expert, I would be happy if someone could take a second look and double check.
>>
>
> Generally speaking, the D/Mir benchmark code is slow because of how it has been written.
> I am not urging you to use D/Mir. In fact, I sometimes advise my clients not to use it if they can avoid it. On commercial request, I can write the benchmark or an applied algorithm so that D/Mir beats numpy in all the tests, including gemm. --Ilya

I didn't understand. You argue against using D/Mir when talking to your clients?

Actually, I feel it is also useful to benchmark unoptimized D code, because this is how most people will write their code at first. That said, I can hardly call these benchmarks unoptimized, because I use LDC optimization flags as well as some tips from you.
March 13, 2020
On 13/03/2020 3:27 AM, Pavel Shkadzko wrote:
> On Thursday, 12 March 2020 at 13:18:41 UTC, rikki cattermole wrote:
>> You forgot to disable the GC for both benchmarks.
>> Also, @fastmath for standard_ops_bench.
>>
>> FYI: standard_ops_bench does a LOT of memory allocations.
> 
> Thank you.
> Add `GC.disable;` inside the main function, right? It didn't really change anything for any of the benchmarks; maybe I did it wrong.

Okay, that means no GC collection was triggered during your benchmarks.

This is good to know: it means the performance problems are indeed in your code and not runtime-related.

> Does @fastmath work only on functions with plain loops, or on anything with math ops? It is not clear from the LDC docs.

Try it :)

I have no idea how much it will help. You have used it on one but not the other, so it seems odd not to do it on both.
March 12, 2020
On 3/12/20 5:30 PM, Pavel Shkadzko wrote:
> 
> For Numpy/Python it's ~1m 30s, but all the D benchmarks take around ~2m on my machine, which I think is the real benchmark here :)

Hmm, I see. In my case the benchmark hangs indefinitely in dgemm_ :(
March 12, 2020
On Thursday, 12 March 2020 at 14:12:14 UTC, jmh530 wrote:
> On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
>> [snip]
>
> Looked into some of those that aren't faster than numpy:
>
> For the dot product (what I would just call matrix multiplication), both functions use gemm. There might be some quirks that cause a difference in performance, but otherwise I would expect them to be pretty close, and they are. It looks like you are allocating the output matrix with the GC, which could be a driver of the difference.
>
> For the L2 norm, you are calculating it entry-wise as a Frobenius norm. That should be the same as the default for numpy. The only difference I can tell between yours and numpy's is that numpy re-uses its dot product function. Otherwise it looks the same.

Numpy uses BLAS "gemm" and D uses OpenBLAS "gemm".
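A minimal NumPy sketch of the two points raised above (sizes are arbitrary): the matrix product can write into a preallocated buffer instead of allocating a fresh result each call, which is the analogue of avoiding the GC-allocated output matrix on the D side, and the default matrix norm is the entry-wise Frobenius norm.

```python
import numpy as np

n = 500
a = np.random.rand(n, n)
b = np.random.rand(n, n)

# a @ b allocates a fresh output array on every call; np.matmul with
# out= reuses a preallocated buffer (both go through BLAS gemm).
c = np.empty((n, n))
np.matmul(a, b, out=c)
assert np.allclose(c, a @ b)

# NumPy's default norm for a 2D array is the Frobenius norm,
# i.e. the square root of the sum of squared entries.
fro = np.linalg.norm(a)
assert np.isclose(fro, np.sqrt(np.sum(a * a)))
```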
March 12, 2020
On Thursday, 12 March 2020 at 15:18:43 UTC, Pavel Shkadzko wrote:
> On Thursday, 12 March 2020 at 14:12:14 UTC, jmh530 wrote:
>> [...]
>
> Numpy uses BLAS "gemm" and D uses OpenBLAS "gemm".

Depending on the system, they can use the same backend, or a specific one such as OpenBLAS or Intel MKL can be configured (I am sure about Mir; NumPy likely allows this as well).
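On the NumPy side, one way to check which backend a given build actually links against is:

```python
import numpy as np

# Print which BLAS/LAPACK implementation this NumPy build links
# against (e.g. OpenBLAS, Intel MKL, or the reference BLAS).
np.show_config()
```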
March 12, 2020
On Thursday, 12 March 2020 at 14:37:13 UTC, Pavel Shkadzko wrote:
> On Thursday, 12 March 2020 at 14:00:48 UTC, 9il wrote:
>> On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
>>> I have done several benchmarks against Numpy for various 2D matrix operations. The purpose was mere curiosity and spread the word about Mir D library among the office data engineers.
>>> Since I am not a D expert, I would be happy if someone could take a second look and double check.
>>>
>>
>> Generally speaking, the D/Mir benchmark code is slow because of how it has been written.
>> I am not urging you to use D/Mir. In fact, I sometimes advise my clients not to use it if they can avoid it. On commercial request, I can write the benchmark or an applied algorithm so that D/Mir beats numpy in all the tests, including gemm. --Ilya
>
> I didn't understand. You argue against using D/Mir when talking to your clients?

It depends on the problem they wanted me to solve.

> Actually, I feel it is also useful to benchmark unoptimized D code, because this is how most people will write their code at first. That said, I can hardly call these benchmarks unoptimized, because I use LDC optimization flags as well as some tips from you.

Agreed.  I just misunderstood the table on the forum; it was misaligned for me. The numbers look cool, thank you for the benchmark. Mir sorting looks slower than Phobos, which is interesting and needs a fix. You can use Phobos sorting with ndslice the same way as with `each`.

Minor updates:
https://github.com/tastyminerals/mir_benchmarks/pull/1

March 12, 2020
On Thursday, 12 March 2020 at 15:34:58 UTC, 9il wrote:
> On Thursday, 12 March 2020 at 14:37:13 UTC, Pavel Shkadzko wrote:
>> [...]
>
> It depends on the problem they wanted me to solve.
>
>> [...]
>
> Agreed.  I just misunderstood the table on the forum; it was misaligned for me. The numbers look cool, thank you for the benchmark. Mir sorting looks slower than Phobos, which is interesting and needs a fix. You can use Phobos sorting with ndslice the same way as with `each`.
>
> Minor updates
> https://github.com/tastyminerals/mir_benchmarks/pull/1

Thank you for the comments!

Looks like I will be updating the benchmarks tables today :)
March 12, 2020
On Thursday, 12 March 2020 at 15:46:47 UTC, Pavel Shkadzko wrote:
> On Thursday, 12 March 2020 at 15:34:58 UTC, 9il wrote:
>> On Thursday, 12 March 2020 at 14:37:13 UTC, Pavel Shkadzko wrote:
>>> [...]
>>
>> It depends on the problem they wanted me to solve.
>>
>>> [...]
>>
>> Agreed.  I just misunderstood the table on the forum; it was misaligned for me. The numbers look cool, thank you for the benchmark. Mir sorting looks slower than Phobos, which is interesting and needs a fix. You can use Phobos sorting with ndslice the same way as with `each`.

Phobos sort bench bug report:
https://github.com/tastyminerals/mir_benchmarks/issues/2

>> Minor updates
>> https://github.com/tastyminerals/mir_benchmarks/pull/1
>
> Thank you for the comments!
>
> Looks like I will be updating the benchmarks tables today :)

Another small update that changes the ratio a lot:
https://github.com/tastyminerals/mir_benchmarks/pull/3
March 12, 2020
On Thursday, 12 March 2020 at 15:34:58 UTC, 9il wrote:
> On Thursday, 12 March 2020 at 14:37:13 UTC, Pavel Shkadzko wrote:
>> On Thursday, 12 March 2020 at 14:00:48 UTC, 9il wrote:
>>> On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
>>>>[...]
>>>
>>> Generally speaking, the D/Mir benchmark code is slow because of how it has been written.
>>> I am not urging you to use D/Mir. In fact, I sometimes advise my clients not to use it if they can avoid it. On commercial request, I can write the benchmark or an applied algorithm so that D/Mir beats numpy in all the tests, including gemm. --Ilya
>>
>> I didn't understand. You argue against using D/Mir when talking to your clients?
>
> It depends on the problem they wanted me to solve.
>
>> Actually, I feel it is also useful to benchmark unoptimized D code, because this is how most people will write their code at first. That said, I can hardly call these benchmarks unoptimized, because I use LDC optimization flags as well as some tips from you.
>
> Agreed.  I just misunderstood the table on the forum; it was misaligned for me. The numbers look cool, thank you for the benchmark. Mir sorting looks slower than Phobos, which is interesting and needs a fix. You can use Phobos sorting with ndslice the same way as with `each`.
>
> Minor updates
> https://github.com/tastyminerals/mir_benchmarks/pull/1

I am actually intrigued by the timings for huge matrices. Why are Mir D and standard D so much better than NumPy? Once we get to 500x600 and 1000x1000 sizes, there is a huge drop in performance for NumPy and not so much for D. You mentioned the L3 cache, but the CPU architecture is the same for all the benchmarks, so what's going on?
March 13, 2020
On Thursday, 12 March 2020 at 20:39:59 UTC, p.shkadzko wrote:

> I am actually intrigued by the timings for huge matrices. Why are Mir D and standard D so much better than NumPy? Once we get to 500x600 and 1000x1000 sizes, there is a huge drop in performance for NumPy and not so much for D. You mentioned the L3 cache, but the CPU architecture is the same for all the benchmarks, so what's going on?

It's been quite a while since I worked with numpy, but I think that's where you're hitting memory limits (easier to do in Python than in D), and that causes performance to deteriorate quickly. I had those problems with R, and I believe it's relatively easy to hit that constraint with numpy as well, but you definitely want to find a numpy expert to confirm, which I definitely am not.
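A rough way to see this cache/memory effect on the NumPy side (a sketch, not a rigorous benchmark; the sizes and iteration count are arbitrary): time an elementwise operation at growing matrix sizes and watch the per-operation cost grow faster than the element count once the working set no longer fits in the last-level cache.

```python
import time

import numpy as np

# Time an elementwise multiply at growing sizes. Once the arrays
# outgrow the CPU's last-level cache, the operation becomes
# memory-bound and per-element throughput drops.
for n in (100, 500, 1000, 2000):
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    t0 = time.perf_counter()
    for _ in range(10):
        np.multiply(a, b)
    elapsed = time.perf_counter() - t0
    print(f"{n}x{n}: {elapsed / 10 * 1e3:.3f} ms per multiply")
```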