Mir vs. Numpy: Reworked!
December 03
Hi all,

Since the first announcement [0] the original benchmark [1] has been boosted [2] with Mir-like implementations.

D+Mir:
 1. is more abstract than NumPy
 2. requires less code for multidimensional algorithms
 3. doesn't require indexing
 4. uses recursion across dimensions
 5. is a few times faster than NumPy for non-trivial real-world applications.

Why is Mir faster than NumPy?

1. Mir allows the compiler to generate specialized kernels, while NumPy constrains the user to write code that has to stream through memory two or more times.

Another killer feature of Mir is the ability to write generalized N-dimensional implementations, while NumPy code needs separate implementations for the 1D, 2D, and 3D cases. For example, the main D loop in the benchmark can be compiled for 4D, 5D, and higher-dimensional cases as well.

2. A @nogc iteration loop. @nogc helps when you need to control memory allocations in critical code paths.
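The memory-traffic point can be sketched on the NumPy side. The following is an illustrative Python example, not the benchmark's actual code: a whole-array expression materializes a temporary, so the data is streamed through memory twice, while a fused kernel (what a compiled D loop can do) touches each element once.

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.ones_like(a)
c = np.full_like(a, 2.0)

# Whole-array expressions evaluate one operation at a time:
# a * b materializes a temporary array, which must then be read
# back from memory to add c.
tmp = a * b
out_two_pass = tmp + c

# Approximating a fused kernel: reuse one output buffer and
# update it in place, avoiding the second full temporary.
out_fused = np.multiply(a, b)
out_fused += c

assert np.array_equal(out_two_pass, out_fused)
```

A compiler that sees the whole loop body, as with Mir's generic code, can fuse these operations into a single pass; NumPy's operator-by-operator evaluation generally cannot.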

[0] https://forum.dlang.org/post/pemharpztorlqkxdooul@forum.dlang.org
[1] https://github.com/typohnebild/numpy-vs-mir
[2] https://github.com/typohnebild/numpy-vs-mir/pull/1

The benchmark [1] has been created by Christoph Alt and Tobias Schmidt.

Kind regards,
Ilya

December 03
On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:
> Hi all,
>
> Since the first announcement [0] the original benchmark [1] has been boosted [2] with Mir-like implementations.
>
> [...]
>
> Kind regards,
> Ilya

Hi Ilya,

Thanks a lot for sharing the update. I am currently working on porting a Python package called FMPy to D. This package makes use of NumPy, and I hope I can use Mir here.

Somehow it is hard to get started learning Mir. What might help Python developers is a set of articles showing NumPy code side by side with the equivalent Mir code.

What I miss in Mir is a function to read and write CSV files. Is something like numpy.genfromtxt planned?
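For reference, this is the NumPy functionality in question, shown here on an in-memory buffer rather than a file: numpy.genfromtxt parses delimited text into an ndarray and handles missing values along the way.

```python
import io
import numpy as np

# genfromtxt reads delimited text into an ndarray; unparseable
# or missing fields become nan by default.
csv = io.StringIO("1.0,2.0\n3.0,nan\n5.0,6.0")
data = np.genfromtxt(csv, delimiter=",")

print(data.shape)  # (3, 2)
```

A Mir counterpart would need to cover the same ground: delimiter handling, type conversion, and a policy for missing entries.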

Kind regards
Andre
December 03
On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:
> Hi all,
>
> Since the first announcement [0] the original benchmark [1] has been boosted [2] with Mir-like implementations.
>
> [...]
>
> Kind regards,
> Ilya

Looks good, but a few typos:

"The big difference is especially visible in this figures."

"For bigger prolem sizes the FLOP/s slightly drop and finally level out."

"Propably this is mainly caused by the overhead of the Python interpreter and might be reduced by more optimization efforts."
December 03
On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:
> Hi all,
>
> Since the first announcement [0] the original benchmark [1] has been boosted [2] with Mir-like implementations.
>
> ... [SNIP]
>
> Kind regards,
> Ilya

Very interesting work. What is the difference between Mir's field, slice, naive, and ndslice implementations? Looks like your packages are really hotting up. I wonder how the performance compares with an equivalent Julia implementation.
December 03
On Thursday, 3 December 2020 at 20:25:11 UTC, data pulverizer wrote:
> [snip]
>
> Very interesting work. What is the difference between Mir's field, slice, native and ndslice? [...]

The document says:
    Slice: Python like. Uses D Slices and Strides for grouping (Red-Black).
    Naive: one for-loop for each dimension. Matrix-Access via multi-dimensional Array.
    Field: one for-loop. Matrix is flattened. Access via flattened index.
    NdSlice: D like. Uses just MIR functionalities.

December 03
On Thursday, 3 December 2020 at 21:28:04 UTC, jmh530 wrote:
> On Thursday, 3 December 2020 at 20:25:11 UTC, data pulverizer wrote:
>> [snip]
>>
>> Very interesting work. What is the difference between Mir's field, slice, native and ndslice? [...]
>
> The document says:
>     Slice: Python like. Uses D Slices and Strides for grouping (Red-Black).
>     Naive: one for-loop for each dimension. Matrix-Access via multi-dimensional Array.
>     Field: one for-loop. Matrix is flattened. Access via flattened index.
>     NdSlice: D like. Uses just MIR functionalities.


As Andre said:

"""
What maybe could help python developers is to have some articles showing numpy coding and side by side the equivalent MIR coding.
"""

I think such a NumPy vs. Mir side-by-side equivalence (or improvement) document would greatly boost the adoption of Mir.


December 04
On Thursday, 3 December 2020 at 21:28:04 UTC, jmh530 wrote:
> The document says:
>     Slice: Python like. Uses D Slices and Strides for grouping (Red-Black).
>     Naive: one for-loop for each dimension. Matrix-Access via multi-dimensional Array.
>     Field: one for-loop. Matrix is flattened. Access via flattened index.
>     NdSlice: D like. Uses just MIR functionalities.

Thanks, evidently I should have been more thorough in reading the document. :-)
December 04
On Thursday, 3 December 2020 at 21:28:04 UTC, jmh530 wrote:
> The document says:
>     Slice: Python like. Uses D Slices and Strides for grouping (Red-Black).
>     Naive: one for-loop for each dimension. Matrix-Access via multi-dimensional Array.
>     Field: one for-loop. Matrix is flattened. Access via flattened index.
>     NdSlice: D like. Uses just MIR functionalities.

It's quite interesting because it says that it's well worth implementing a field index as opposed to naive access, at least for this algorithm. It makes sense: in the field case you know that all the data is in a single array, and therefore in close proximity in memory, whereas the individual rows of a multidimensional array could be far apart in memory. NdSlice is even faster for this case, which is cool. Am I correct in assuming that the data in the NdSlice is also a single array?
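The field-vs-naive distinction can be sketched in Python terms (an illustrative example with made-up shapes, not the benchmark's actual code): the naive layout stores each row as its own object, while the field layout folds the 2-D index into a single offset over one flat buffer.

```python
# "Naive": each row is a separate list, so rows may live anywhere
# on the heap, with no locality guarantee between them.
nrows, ncols = 3, 4
naive = [[float(i * ncols + j) for j in range(ncols)] for i in range(nrows)]

# "Field": one flat, contiguous buffer; the 2-D index (i, j) is
# folded into the single offset i * ncols + j.
field = [float(k) for k in range(nrows * ncols)]

def at(i, j):
    # One indexing step instead of two chained lookups.
    return field[i * ncols + j]

assert at(2, 3) == naive[2][3]
```

An ndslice over a contiguous allocation gets the same single-buffer locality, which fits the benchmark's ordering of the results.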

December 03
On 12/3/2020 8:27 AM, 9il wrote:
> Since the first announcement [0] the original benchmark [1] has been boosted [2] with Mir-like implementations.

This is really great! Can you write an article about it? Such would be really helpful in letting people know about it.
December 04
On Friday, 4 December 2020 at 02:35:49 UTC, data pulverizer wrote:
> [snip]
> NDSlice is even faster for this case - cool. Am I correct in assuming that the data in the NDSlice is also a single array?

It looks like all the `sweep_XXX` functions are only defined for contiguous slices, as that would be the default if you define a Slice!(T, N).

How the functions access the data is a big difference. If you compare the `sweep_field` version with the `sweep_naive` version, `sweep_field` can access the data through one index, whereas `sweep_naive` has to use two indices in the 2D version and three in the 3D version.

Also, the main difference in the NdSlice version is that it uses *built-in* MIR functionality, like how `sweep_ndslice` uses the `each` function from MIR, whereas `sweep_field` uses a for loop. I think this is partially to show that the built-in MIR functionality is as fast as writing the loop yourself.
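The built-in-vs-hand-written contrast can be illustrated in NumPy terms (illustrative Python only; the benchmark itself uses Mir's `each` in D): a library primitive replaces the explicit nested loops while producing the same result.

```python
import numpy as np

m = np.arange(12.0).reshape(3, 4)

# Hand-written nested loops, in the spirit of the "naive" style:
out_loop = np.empty_like(m)
for i in range(m.shape[0]):
    for j in range(m.shape[1]):
        out_loop[i, j] = m[i, j] * 2.0

# Library built-in, in the spirit of the ndslice/`each` style:
# one call, internally a single tight loop over contiguous memory.
out_builtin = m * 2.0

assert np.array_equal(out_loop, out_builtin)
```

The appeal of `each` in Mir is that this convenience comes without giving up the performance of the explicit loop.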