[phobos] Ndslice speed
August 27
Hello,

I've come across the library experimental.ndslice, which is supposed to mimic NumPy. In order to test it I wrote a very crude matrix multiplication:

http://pastebin.com/Ew4u2iVz

and for comparison I also implemented it in Fortran90:

http://pastebin.com/6afnVyZF

Then I used Linux's "time" command to time each of them:

ifort test.f90 && time ./a.out
         600

real    0m0.154s
user    0m0.148s
sys     0m0.004s


dmd test.d  && time ./test
1.16681e+08
600   600

real    0m6.770s
user    0m6.772s
sys     0m0.004s


I understand that dmd is not optimized for speed, but in the end both do basically the same thing. Both implement a 2D array, and both array types include the size of the array (unlike C). Given that both are compiled languages, the difference seems unreasonably large.

If I turn on boundschecking for Fortran I get:

ifort -check all test.f90 && time ./a.out
         600

real    0m6.049s
user    0m6.044s
sys     0m0.004s


which is roughly the speed difference I'd expect, but using the -boundscheck=off option for dmd doesn't help. Am I using ndslice correctly? Why is the speed difference so large? How do I speed it up?

Kind regards

Matthias

_______________________________________________
phobos mailing list
phobos@puremagic.com
http://lists.puremagic.com/mailman/listinfo/phobos
August 31
It may help to turn dmd's optimizer and inliner on - "dmd -inline -release -O -boundscheck=off".

December 29
On 08/27/2016 09:43 PM, Matthias Redies via phobos wrote:
> I understand that dmd is not optimized for speed, but in the end both do basically the same thing. Both implement 2D array and both array types include the size of the array (unlike C). Given that both are compiled languages the difference seems to be unreasonably large.
The difference is likely so large because one is using vectorized ops
(SSE), which dmd doesn't do.
Use an optimizing compiler that supports auto-vectorization (ldc or gdc)
and get in touch with the ndslice authors.
You should be able to get very similar numbers.

-Martin
December 30
On 08/27/2016 09:43 PM, Matthias Redies via phobos wrote:
> Hello,
>
> I've come across the library experimental.ndslice, which is supposed to mimic NumPy. In order to test it I wrote a very crude matrix multiplication:
Posting a reply from Ilya here:

Hi Matthias,

It is not valid to compare the same code for ndslice and Fortran, because:

1. The current ndslice works like NumPy's strided arrays (matrices always
have both row and column strides).
2. m[i, j] cannot be vectorized even for non-strided vectors, because of a
D language constraint: D has no macro engine, and the operator overloading
behind [i, j] defeats vectorization in LDC and GDC.
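
(Illustration, not part of Ilya's reply.) The two points above can be made concrete with a minimal sketch of a strided 2D view. The type `StridedMatrix` and its fields are hypothetical and are not ndslice's actual implementation; the sketch only assumes that an overloaded `m[i, j]` turns into an offset computed from both strides on every access.

```d
import std.stdio : writeln;

// Hypothetical minimal 2D view, for illustration only (not ndslice's type).
struct StridedMatrix
{
    double[] data;     // backing storage
    size_t rows, cols; // logical shape
    size_t rowStride;  // elements to skip to reach the next row
    size_t colStride;  // elements to skip to reach the next column

    // Enables m[i, j]; every access multiplies by both strides.
    ref double opIndex(size_t i, size_t j)
    {
        return data[i * rowStride + j * colStride];
    }
}

void main()
{
    auto storage = new double[](6);
    foreach (k, ref x; storage)
        x = k; // storage holds 0, 1, 2, 3, 4, 5

    // Row-major 2x3 view: rowStride = 3, colStride = 1.
    auto m = StridedMatrix(storage, 2, 3, 3, 1);
    assert(m[1, 2] == 5);

    // A transposed 3x2 view reuses the same memory by swapping the strides.
    auto t = StridedMatrix(storage, 3, 2, 1, 3);
    assert(t[2, 1] == 5);

    writeln(m[1, 2], " ", t[2, 1]);
}
```

Because `colStride` is a run-time value here, the compiler cannot assume that consecutive `j` values touch adjacent memory, which is one way to read point 1.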

You can achieve the same speed as Fortran if you use mir.ndslice.algorithm [1]. It is available at [3] (together with mir.ndslice). A blog post about mir.ndslice.algorithm can be found at [2]. The LDC compiler should be used (DMD is supported, but it is too slow).

We are working on a new version of ndslice, which will include classic BLAS-like matrices and will simplify the mir.ndslice.algorithm logic [4] (it cannot be used yet; it will be released within a month). With the new ndslice, m[i, j] will still be slow, but indexing as m[i][j] will be as fast as Fortran.
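
(Illustration, not part of Ilya's reply.) The idea behind fast `m[i][j]` access, taking a contiguous row once and then indexing inside it, can be sketched with plain D arrays. This is an assumption-laden stand-in: the function `matmul` and the flat row-major layout are hypothetical and use neither ndslice nor the code from the pastebin links.

```d
import std.stdio : writeln;

// C = A * B for n x n row-major matrices stored in flat arrays.
void matmul(const(double)[] a, const(double)[] b, double[] c, size_t n)
{
    assert(a.length == n * n && b.length == n * n && c.length == n * n);
    c[] = 0;
    foreach (i; 0 .. n)
    {
        auto cRow = c[i * n .. (i + 1) * n]; // row slice taken once per i
        foreach (k; 0 .. n)
        {
            const aik = a[i * n + k];
            auto bRow = b[k * n .. (k + 1) * n];
            // Contiguous inner loop over one row of B and one row of C.
            foreach (j; 0 .. n)
                cRow[j] += aik * bRow[j];
        }
    }
}

void main()
{
    enum n = 2;
    double[] a = [1, 2, 3, 4];
    double[] b = [5, 6, 7, 8];
    auto c = new double[](n * n);
    matmul(a, b, c, n);
    assert(c == [19.0, 22.0, 43.0, 50.0]);
    writeln(c);
}
```

The inner loop walks unit-stride memory only, which is the kind of loop an auto-vectorizing compiler such as LDC or GDC is generally able to handle; whether it actually vectorizes depends on the compiler version and flags.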

In general, forward access (front/popFront) is friendlier to
vectorization than random access (indexing like [i, j]).
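
(Illustration, not part of Ilya's reply.) The two access styles can be compared side by side in a short sketch. It uses built-in arrays and the range primitives from `std.range.primitives` as a stand-in; the function names `dotForward` and `dotIndexed` are hypothetical. For built-in arrays an optimizer will usually treat both forms alike, so the difference Ilya describes should be understood as applying to ndslice's overloaded indexing.

```d
import std.range.primitives : empty, front, popFront;
import std.stdio : writeln;

// Dot product using forward access only: no index arithmetic in the loop.
double dotForward(const(double)[] a, const(double)[] b)
{
    double sum = 0;
    while (!a.empty && !b.empty)
    {
        sum += a.front * b.front;
        a.popFront();
        b.popFront();
    }
    return sum;
}

// The same computation written with random access.
double dotIndexed(const(double)[] a, const(double)[] b)
{
    assert(a.length == b.length);
    double sum = 0;
    foreach (i; 0 .. a.length)
        sum += a[i] * b[i];
    return sum;
}

void main()
{
    double[] x = [1, 2, 3];
    double[] y = [4, 5, 6];
    assert(dotForward(x, y) == 32);
    assert(dotIndexed(x, y) == 32);
    writeln(dotForward(x, y));
}
```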

Please use mir.ndslice.algorithm for now.

Best regards,
Ilya

[1] http://docs.mir.dlang.io/latest/mir_ndslice_algorithm.html
[2] http://blog.mir.dlang.io/ndslice/algorithm/optimization/2016/12/12/writing-efficient-numerical-code.html
[3] https://github.com/libmir/mir
[4] https://github.com/libmir/mir-algorithm