Thread overview
ndBenchmarks #1: ndslice.algorithm vs std.numeric vs std.algorithm
Aug 03, 2016
Ilya Yaroshenko
Aug 03, 2016
Ilya Yaroshenko
Aug 03, 2016
Ilya Yaroshenko
Aug 03, 2016
Seb
Aug 03, 2016
Johan Engelen
Aug 04, 2016
Ilya Yaroshenko
Aug 04, 2016
jmh530
Aug 09, 2016
Ilya Yaroshenko
August 03, 2016
Hi all,

Here are the first two benchmarks [1] for the upcoming ndslice.algorithm [2].
A recent LDC alpha based on LLVM 3.8 and a recent Mir (v0.16.0-alpha3) are required. The @fastmath syntax may change slightly and will be simplified in any case.

Dot Product:

       ndReduce vectorized = 3 ms, 314 μs
                  ndReduce = 14 ms, 767 μs
numeric.dotProduct, arrays = 7 ms, 260 μs
numeric.dotProduct, slices = 14 ms, 782 μs
              zip & reduce = 74 ms, 280 μs

Euclidean Distance:

                ndReduce vectorized = 3 ms, 668 μs
                           ndReduce = 14 ms, 595 μs
  numeric.euclideanDistance, arrays = 14 ms, 463 μs
  numeric.euclideanDistance, slices = 14 ms, 465 μs
                       zip & reduce = 73 ms, 678 μs
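
For readers who do not know the Phobos baselines, here is a minimal sketch of what the "numeric.*" and "zip & reduce" rows roughly correspond to. This is not the benchmark code from [1]; the helper names dotZipReduce and distZipReduce and the sample data are made up for illustration only.

```d
import std.algorithm.iteration : reduce;
import std.math : sqrt;
import std.numeric : dotProduct, euclideanDistance;
import std.range : zip;

/// Dot product written as zip + reduce (hypothetical helper, not from the benchmark).
double dotZipReduce(const(double)[] a, const(double)[] b)
{
    // zip yields pairs (a[i], b[i]); reduce accumulates the products into the seed 0.0.
    return reduce!((s, t) => s + t[0] * t[1])(0.0, zip(a, b));
}

/// Euclidean distance written as zip + reduce (hypothetical helper).
double distZipReduce(const(double)[] a, const(double)[] b)
{
    // Sum the squared differences, then take the square root.
    return sqrt(reduce!((s, t) {
        immutable d = t[0] - t[1];
        return s + d * d;
    })(0.0, zip(a, b)));
}

void main()
{
    double[] a = [1.0, 2.0, 3.0];
    double[] b = [4.0, 6.0, 3.0];

    // Phobos routines from std.numeric.
    assert(dotProduct(a, b) == 25.0);
    assert(euclideanDistance(a, b) == 5.0);

    // The zip & reduce formulations give the same values on this input.
    assert(dotZipReduce(a, b) == 25.0);
    assert(distZipReduce(a, b) == 5.0);
}
```

The sample values are small integers, so every intermediate result is exactly representable and the comparisons with == are safe here; real data would need a tolerance.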

[1] https://github.com/libmir/mir/tree/master/benchmarks/ndslice
[2] https://github.com/dlang/phobos/pull/4652

Best regards,
Ilya

August 03, 2016
On Wednesday, 3 August 2016 at 20:53:59 UTC, Ilya Yaroshenko wrote:

Update:

> Dot Product:
>               zip & reduce = 74 ms, 280 μs

              zip & reduce = 44 ms, 57 μs

>
> Euclidean Distance:
>                        zip & reduce = 73 ms, 678 μs

                       zip & reduce = 44 ms, 646 μs


August 03, 2016
The tests above are for double precision floating point numbers. The results for single precision are below.

Dot Product (single precision):

       ndReduce vectorized = 2 ms, 200 μs
                  ndReduce = 14 ms, 543 μs
numeric.dotProduct, arrays = 7 ms, 208 μs
numeric.dotProduct, slices = 14 ms, 414 μs
              zip & reduce = 43 ms, 657 μs

Euclidean Distance (single precision):

                ndReduce vectorized = 2 ms, 226 μs
                           ndReduce = 14 ms, 661 μs
  numeric.euclideanDistance, arrays = 14 ms, 597 μs
  numeric.euclideanDistance, slices = 14 ms, 581 μs
                       zip & reduce = 46 ms, 759 μs

August 03, 2016
On Wednesday, 3 August 2016 at 20:53:59 UTC, Ilya Yaroshenko wrote:
> Hi all,
>
> Here are the first two benchmarks [1] for the upcoming ndslice.algorithm [2].
> A recent LDC alpha based on LLVM 3.8 and a recent Mir (v0.16.0-alpha3) are required. The @fastmath syntax may change slightly and will be simplified in any case.
>
> [...]

Ilya: The results are awesome!!
Let's make some noise:

https://www.reddit.com/r/programming/comments/4w16i5/ndslicealgorithm_speed_up_your_matrix/
August 03, 2016
On Wednesday, 3 August 2016 at 20:53:59 UTC, Ilya Yaroshenko wrote:
> Hi all,
>
> Here are the first two benchmarks [1] for the upcoming ndslice.algorithm [2].
> A recent LDC alpha based on LLVM 3.8 and a recent Mir (v0.16.0-alpha3) are required. The @fastmath syntax may change slightly and will be simplified in any case.
>
> Dot Product:
>
>        ndReduce vectorized = 3 ms, 314 μs
>                   ndReduce = 14 ms, 767 μs

**That's** the difference with or without fastmath??

(awesome work of course!)

-Johan

August 04, 2016
On Wednesday, 3 August 2016 at 22:22:19 UTC, Johan Engelen wrote:
> On Wednesday, 3 August 2016 at 20:53:59 UTC, Ilya Yaroshenko wrote:
>> Dot Product:
>>
>>        ndReduce vectorized = 3 ms, 314 μs
>>                   ndReduce = 14 ms, 767 μs
>
> **That's** the difference with or without fastmath??

The first one uses @fastmath plus an additional execution branch for the iteration when the stride equals 1.
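
To illustrate the idea, here is a rough sketch of a dot product with a separate branch for stride 1. It is not Mir's implementation: the function stridedDot and its signature are hypothetical, and the no-op fastmath stand-in exists only so the sketch compiles with compilers other than LDC. The point is that the contiguous branch is a plain loop over adjacent elements, and fast-math permits the compiler to reassociate the floating point sum, which is what makes a vectorized reduction possible.

```d
version (LDC)
{
    // LDC-specific attribute (assumed available as ldc.attributes.fastmath).
    import ldc.attributes : fastmath;
}
else
{
    // No-op stand-in so the sketch compiles with other compilers.
    struct fastmath {}
}

/// Dot product of n elements taken from a and b with the given strides.
/// Hypothetical helper for illustration; not part of Mir or Phobos.
@fastmath
double stridedDot(const(double)[] a, size_t strideA,
                  const(double)[] b, size_t strideB, size_t n)
{
    double s = 0;
    if (strideA == 1 && strideB == 1)
    {
        // Contiguous branch: adjacent elements, easy for the optimizer to vectorize.
        foreach (i; 0 .. n)
            s += a[i] * b[i];
    }
    else
    {
        // General branch: strided access.
        foreach (i; 0 .. n)
            s += a[i * strideA] * b[i * strideB];
    }
    return s;
}

void main()
{
    // Contiguous case: 1*4 + 2*6 + 3*3 = 25.
    assert(stridedDot([1.0, 2.0, 3.0], 1, [4.0, 6.0, 3.0], 1, 3) == 25.0);

    // Strided case: every second element of the first array, same result.
    assert(stridedDot([1.0, 9.0, 2.0, 9.0, 3.0, 9.0], 2, [4.0, 6.0, 3.0], 1, 3) == 25.0);
}
```

Both branches compute the same value; only the memory access pattern differs.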
August 04, 2016
On Wednesday, 3 August 2016 at 20:53:59 UTC, Ilya Yaroshenko wrote:
> Hi all,
>
> Here are the first two benchmarks [1] for the upcoming ndslice.algorithm [2].
> A recent LDC alpha based on LLVM 3.8 and a recent Mir (v0.16.0-alpha3) are required. The @fastmath syntax may change slightly and will be simplified in any case.


Keep up the good work!
August 09, 2016
On Wednesday, 3 August 2016 at 20:53:59 UTC, Ilya Yaroshenko wrote:
> Hi all,
>
> Here are the first two benchmarks [1] for the upcoming ndslice.algorithm [2].
> A recent LDC alpha based on LLVM 3.8 and a recent Mir (v0.16.0-alpha3) are required. The @fastmath syntax may change slightly and will be simplified in any case.
>
> [...]

The PR and Mir v0.16.0-alpha7 now have half and triangular selections. They are very helpful for working with matrices.