Thread overview
MIR vs. Numpy
Nov 18, 2020
Tobias Schmidt
Nov 18, 2020
Bastiaan Veelo
Nov 18, 2020
John Colvin
Nov 18, 2020
jmh530
Nov 18, 2020
9il
Nov 18, 2020
Max Haughton
Nov 18, 2020
jmh530
Nov 20, 2020
Tobias Schmidt
Nov 18, 2020
9il
November 18, 2020
Dear all,

To compare MIR and Numpy in the HPC context, we implemented a multigrid solver in Python using Numpy and in D using Mir, and performed some benchmarks with them.

You can find our code and results here:
https://github.com/typohnebild/numpy-vs-mir

Feedback is very welcome. Please feel free to open issues, pull requests or simply post your thoughts below.

Kind regards,
Tobias
November 18, 2020
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt wrote:
> Dear all,
>
> To compare MIR and Numpy in the HPC context, we implemented a multigrid solver in Python using Numpy and in D using Mir, and performed some benchmarks with them.
>
> You can find our code and results here:
> https://github.com/typohnebild/numpy-vs-mir

Nice numbers. I’m not a Python guy but I was under the impression that Numpy actually is written in C, so that when you benchmark Numpy you’re mostly benchmarking C, not Python. Therefore I had expected the Numpy performance to be much closer to D’s. An important factor I think, which I’m not sure you have discussed (didn’t look too closely), is the compiler backend that was used to compile D and Numpy. Then again, as a user one is mostly interested in the out-of-the-box performance, which this seems to be a good measure of.

— Bastiaan.
November 18, 2020
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt wrote:
> Dear all,
>
> To compare MIR and Numpy in the HPC context, we implemented a multigrid solver in Python using Numpy and in D using Mir, and performed some benchmarks with them.
>
> You can find our code and results here:
> https://github.com/typohnebild/numpy-vs-mir
>
> Feedback is very welcome. Please feel free to open issues, pull requests or simply post your thoughts below.
>
> Kind regards,
> Tobias

Very nice write up.

It's been a while since I've used numba, so I was a little confused on the numba 1 and numba 8 runs.

It also looks like you are compiling on ldc with -mcpu=native --boundscheck=off. Why not -O as well?
November 18, 2020
On Wednesday, 18 November 2020 at 13:01:42 UTC, Bastiaan Veelo wrote:
> On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt wrote:
>> Dear all,
>>
>> To compare MIR and Numpy in the HPC context, we implemented a multigrid solver in Python using Numpy and in D using Mir, and performed some benchmarks with them.
>>
>> You can find our code and results here:
>> https://github.com/typohnebild/numpy-vs-mir
>
> Nice numbers. I’m not a Python guy but I was under the impression that Numpy actually is written in C, so that when you benchmark Numpy you’re mostly benchmarking C, not Python. Therefore I had expected the Numpy performance to be much closer to D’s. An important factor I think, which I’m not sure you have discussed (didn’t look too closely), is the compiler backend that was used to compile D and Numpy. Then again, as a user one is mostly interested in the out-of-the-box performance, which this seems to be a good measure of.
>
> — Bastiaan.

A lot of numpy is written in C, C++, Fortran, asm, etc.

But when you chain a bunch of things together, you are going via Python. The language boundary (and Python being slow) means that internal iteration in native code is a requirement for performance, which leads to eager allocation for composability via Python, which then hurts performance. Numpy makes a very good effort, but is always constrained by this. Clever schemes with laziness, where operations in Python just compose operations for execution later or on demand, can work as an alternative, but a) that's hard and b) even if you can completely avoid calling back into Python during iteration, you would still need a JIT to really unlock the performance.

Julia fixes this by having all or most of the code in one language, which is JIT-compiled.

D can do the same ahead of time (AOT) with templates, as C++/Eigen does, but with more flexible and less terrifying code. That's (one part of) what mir provides.
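
To make the eager-allocation point concrete, here is a minimal sketch in Python. The function names `axpy_eager` and `axpy_inplace` are made up for illustration and are not taken from the benchmark repository. The first version lets each operator allocate its own result array; the second reuses a caller-supplied buffer via the `out=` argument of numpy's ufuncs, which avoids the temporaries but still makes two separate passes over memory.

```python
import numpy as np

def axpy_eager(a, x, y):
    # Each operator runs its own native loop and allocates a new array:
    # `a * x` creates a temporary, then `+ y` creates the result.
    return a * x + y

def axpy_inplace(a, x, y, out):
    # Same arithmetic, but both passes write into a caller-supplied buffer,
    # so no temporary arrays are allocated. It is still two passes over memory.
    np.multiply(x, a, out=out)
    np.add(out, y, out=out)
    return out

x = np.linspace(0.0, 1.0, 1_000_000)
y = np.ones_like(x)
buf = np.empty_like(x)

r1 = axpy_eager(2.0, x, y)
r2 = axpy_inplace(2.0, x, y, buf)
```

A fused single-pass loop, as a JIT or a template-based AOT library can generate, would compute `a * x[i] + y[i]` per element without either the temporary or the second pass.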
November 18, 2020
On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
> On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt wrote:
>
> It also looks like you are compiling on ldc with -mcpu=native --boundscheck=off. Why not -O as well?

-O is added by DUB
November 18, 2020
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt wrote:
> Dear all,
>
> To compare MIR and Numpy in the HPC context, we implemented a multigrid solver in Python using Numpy and in D using Mir, and performed some benchmarks with them.
>
> You can find our code and results here:
> https://github.com/typohnebild/numpy-vs-mir
>
> Feedback is very welcome. Please feel free to open issues, pull requests or simply post your thoughts below.
>
> Kind regards,
> Tobias

Thanks a lot! It is a huge benefit for Mir and D to have such high-quality benchmarks.

Python's sweep_3D accesses memory only once per element computation, while the old D sweep_slice accesses it 7 times.

I have opened a PR [1] with a new version of sweep_slice; I expect it to be at least twice as fast. The new sweep_slice uses a more D-like approach and a single memory access per computed element.

[1] https://github.com/typohnebild/numpy-vs-mir/pull/1
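
For readers unfamiliar with such sweeps, the sketch below shows a generic 7-point stencil update in Python with numpy. It is not the repository's sweep_3D or sweep_slice: the function name `jacobi_sweep_3d` is hypothetical, it uses a plain Jacobi update rather than whatever smoother the benchmark uses, and it assumes the problem is -Δu = f on a uniform grid with spacing h. It may help picture where seven operands per updated element can come from (six neighbours plus the right-hand side), which is an assumption about the count mentioned above.

```python
import numpy as np

def jacobi_sweep_3d(u, f, h):
    """One Jacobi sweep for -laplace(u) = f on a uniform grid with spacing h.

    Hypothetical helper for illustration; boundary values of u are kept.
    """
    new = u.copy()
    # Every interior point is updated from its six axis neighbours
    # plus the right-hand side at the same point.
    new[1:-1, 1:-1, 1:-1] = (
        u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1]
        + u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1]
        + u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:]
        + h * h * f[1:-1, 1:-1, 1:-1]
    ) / 6.0
    return new

n = 8
h = 1.0 / (n - 1)
u = np.zeros((n, n, n))
f = np.ones((n, n, n))
u1 = jacobi_sweep_3d(u, f, h)
```

Written with slices like this, numpy evaluates each `+` as a separate pass with its own temporary array, whereas a hand-written loop can read the operands and write the result for one element in a single pass.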

Cheers,
Ilya
November 18, 2020
On Wednesday, 18 November 2020 at 15:20:19 UTC, 9il wrote:
> On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
>> On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt wrote:
>>
>> It also looks like you are compiling on ldc with -mcpu=native --boundscheck=off. Why not -O as well?
>
> -O is added by DUB

Just -O? LDC is quite impressive with LTO and cross-module inlining turned on.
November 18, 2020
On Wednesday, 18 November 2020 at 15:20:19 UTC, 9il wrote:
> [snip]
>
> -O is added by DUB

Ah, the -release-nobounds
November 20, 2020
Thanks for all of your feedback!

On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
> It's been a while since I've used numba, so I was a little confused on the numba 1 and numba 8 runs.

The number is the number of threads used in the run. The prefix indicates whether numba was used ('numba') or not ('nonumba').
We have added a section to clarify this. Thanks for the hint.