August 30, 2016
Okay, looks like I've made a boo boo and LDC is compiling out that entire multiplication loop.

It's passing the array statically, and since it's never assigned back, the loop is just never compiled in (unless you specify it via ref).

So, this is where I give up as it is 2am.

Perhaps try to make it parallel (std.parallelism can help hugely).
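For reference, a minimal sketch of what a std.parallelism version could look like (the name matMulParallel and the dynamic double[][] layout are illustrative choices, not the code from this thread):

```d
// Hypothetical sketch: parallelise the outer row loop with std.parallelism.
import std.parallelism : parallel;

void matMulParallel(double[][] A, double[][] B, double[][] C) {
    // Each row of C is independent, so rows can go to different worker threads.
    foreach (i, ref row; parallel(C)) {
        row[] = 0.0;
        foreach (j, av; A[i]) {
            auto bv = B[j];            // cache the row of B for this j
            foreach (k, ref cvv; row)
                cvv += av * bv[k];     // C[i][k] += A[i][j] * B[j][k]
        }
    }
}

void main() {
    import std.stdio : writeln;
    double[][] A = [[1.0, 2.0], [3.0, 4.0]];
    double[][] B = [[5.0, 6.0], [7.0, 8.0]];
    double[][] C = [[0.0, 0.0], [0.0, 0.0]];
    matMulParallel(A, B, C);
    writeln(C);  // [[19, 22], [43, 50]]
}
```

Note that foreach over parallel() needs ref on the row variable for the same reason as the serial version: without it each worker would write into a copy.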
August 29, 2016
On 29.8.2016 at 15:57, rikki cattermole via Digitalmars-d-learn wrote:

> My bad, fixed:
>
> double[1000][] A, B, C;
>
> void main() {
>         A = new double[1000][1000];
>         B = new double[1000][1000];
>         C = new double[1000][1000];
>
>         import std.conv : to;
>         import std.datetime;
>         import std.stdio : writeln;
>
>         ini(A);
>         ini(B);
>         ini(C);
>
>         auto r = benchmark!run_test(10000);
>         auto res = to!Duration(r[0]);
>         writeln(res);
> }
>
> void run_test() {
>         MatMul(A, B, C);
> }
>
> void ini(T)(T mtx) {
>         foreach(ref v; mtx) {
>                 v = 3.4;
>         }
>
>         foreach(i, v; mtx) {
>                 foreach(j, ref vv; v) {
>                         vv += (i * j) + (0.6 * j);
>                 }
>         }
> }
>
> void MatMul(T)(T A, T B, T C) {
>         foreach(cv; C) {
>                 cv = 0f;
>         }
>
>         foreach(i, cv; C) {
>                 foreach(j, av; A[i]) {
>                         auto bv = B[j];
>                         foreach(k, cvv; cv) {
>                                 cvv += av * bv[k];
>                         }
>                 }
>         }
>
> }

This will not work; you need to add some refs :).

    foreach(i, ref cv; C) {
        foreach(j, av; A[i]) {
            auto bv = B[j];
            foreach(k, ref cvv; cv) {
                cvv += av * bv[k];
            }
        }
    }
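The difference is easy to demonstrate on a tiny array; this throwaway snippet (not from the thread) shows that without ref the loop variable is only a copy, so the writes are lost:

```d
import std.stdio : writeln;

void main() {
    double[3] a = 0.0;

    foreach (v; a)       // v is a copy of each element; the write is discarded
        v = 1.0;
    writeln(a);          // [0, 0, 0]

    foreach (ref v; a)   // v aliases the element in place; the write sticks
        v = 1.0;
    writeln(a);          // [1, 1, 1]
}
```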


August 29, 2016
On 29.8.2016 at 16:08, rikki cattermole via Digitalmars-d-learn wrote:

> Okay, looks like I've made a boo boo and LDC is compiling out that entire multiplication loop.
>
> It's passing the array statically, and since it's never assigned back, the loop is just never compiled in (unless you specify it via ref).
>
> So, this is where I give up as it is 2am.
>
> Perhaps try to make it parallel (std.parallelism can help hugely).
This is my version:

import std.stdio;

immutable int n = 1000, l=1000, m=1000;

struct ZeroDouble
{
    double val = 0f;
    alias val this;
}

void main(string[] args)
{
    auto A = new double[1000][m];
    auto B = new double[1000][n];
    auto C = new ZeroDouble[1000][n];
    ini!(A);
    ini!(B);
    MatMul!(A,B,C);
    writeln(C[1][1]);
    writefln("%d   %d ", C.length, C[0].length);
}

void ini(alias mtx)(){
    foreach(i, ref mtxInner; mtx) {
        foreach(j, ref cell; mtxInner) {
            cell = i*j + 0.6*j +3.4;
        }
    }
}

void MatMul(alias A, alias B, alias C)() {
    foreach(i, ref cv; C) {
        foreach(j, av; A[i]) {
            foreach(k, ref cvv; cv) {
                cvv += av * B[j][k];
            }
        }
    }
}
August 29, 2016
On Monday, 29 August 2016 at 10:20:56 UTC, rikki cattermole wrote:
> By the looks of it you're not running the tests more than once.
> Druntime initialization could be affecting this.
>
> Please execute each test (without memory allocation) at least 10000 times and then report back what the timings are.

D program startup is on the order of milliseconds, so the difference is negligible for a benchmark that runs for more than a second vs. 200 ms.

 — David
August 29, 2016
On Monday, 29 August 2016 at 09:53:12 UTC, Steinhagelvoll wrote:
> Hello,
>
> I'm trying to find a fast way to use multi dimensional arrays. For this I implemented a matrix multiplication and compared the times for different ways. As a reference I used a Fortran90 implementation.
>
> [...]

Any chance you can post the generated asm?
I have a suspicion:
you are not passing your CPU arch to LDC,
so it probably generated i486 code.
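For what it's worth, the invocation would look something like this (assuming ldc2 accepts the usual LLVM-style -mcpu option; check ldc2 -help on your install):

```shell
# Optimise and target the host CPU instead of a generic baseline
ldc2 -O3 -release -mcpu=native test.d
time ./test
```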
August 29, 2016
On 29.8.2016 at 16:21, Stefan Koch via Digitalmars-d-learn wrote:

> On Monday, 29 August 2016 at 09:53:12 UTC, Steinhagelvoll wrote:
>> Hello,
>>
>> I'm trying to find a fast way to use multi dimensional arrays. For this I implemented a matrix multiplication and compared the times for different ways. As a reference I used a Fortran90 implementation.
>>
>> [...]
>
> Any chance you can post the generated asm?
> I have a suspicion:
> you are not passing your CPU arch to LDC,
> so it probably generated i486 code.

Why i486? I believe it will select x86_64 by default on Linux.

August 29, 2016
On Monday, 29 August 2016 at 13:59:15 UTC, Daniel Kozak wrote:
> On 29.8.2016 at 11:53, Steinhagelvoll via Digitalmars-d-learn wrote:
>
>> [...]
> It is unfair to compare different backends:
>
> gfortran -O3 -o test test.f90
> [kozak@dajinka ~]$ time ./test
>
> real    0m2.072s
> user    0m2.053s
> sys    0m0.013s
>
> gdc -O3 -o test test.d
> [kozak@dajinka ~]$ time ./test
>
> real    0m1.655s
> user    0m1.640s
> sys    0m0.010s
>
> Obviously ifort can use some special instructions on your CPU.

This seems to be it. I also implemented it in C++ (because gfortran isn't the main focus of GNU) and this is the result:

$ ./cpp_test_clang
elapsed time 1.12785
$ ./cpp_test_gpp
elapsed time 1.24206
$ ./cpp_test_intel
elapsed time 0.298331

It is quite surprising that there is this much of a difference, even though all of them run sequentially. I suspect this might be specific to this small problem.
August 29, 2016
On 29.8.2016 at 16:43, Steinhagelvoll via Digitalmars-d-learn wrote:

> On Monday, 29 August 2016 at 13:59:15 UTC, Daniel Kozak wrote:
>> On 29.8.2016 at 11:53, Steinhagelvoll via Digitalmars-d-learn wrote:
>>
>>> [...]
>> It is unfair to compare different backends:
>>
>> gfortran -O3 -o test test.f90
>> [kozak@dajinka ~]$ time ./test
>>
>> real    0m2.072s
>> user    0m2.053s
>> sys    0m0.013s
>>
>> gdc -O3 -o test test.d
>> [kozak@dajinka ~]$ time ./test
>>
>> real    0m1.655s
>> user    0m1.640s
>> sys    0m0.010s
>>
>> Obviously ifort can use some special instructions on your CPU.
>
> This seems to be it. I also implemented it in C++ (because gfortran isn't the main focus of GNU) and this is the result:
>
> $ ./cpp_test_clang
> elapsed time 1.12785
> $ ./cpp_test_gpp
> elapsed time 1.24206
> $ ./cpp_test_intel
> elapsed time 0.298331
>
> It is quite surprising that there is this much of a difference, even though all of them run sequentially. I suspect this might be specific to this small problem.

With g++ you can try enabling some optimizations: g++ -O3 -march=native -o test test.cpp

August 29, 2016
On Monday, 29 August 2016 at 14:43:08 UTC, Steinhagelvoll wrote:
> It is quite surprising that there is this much of a difference, even when all run sequential. I believe this might be specific to this small problem.

You should definitely have a look at this benchmark for matrix multiplication across many languages:

https://github.com/kostya/benchmarks#matmul

With the recent generic GLAS kernel in mir, matrix multiplication in D is blazingly fast (it improved the existing results by at least 8x).
Please note that this requires the latest LDC beta, which includes the fastMath pragma, and that GLAS is still under development in mir:

https://github.com/libmir/mir
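For anyone wanting to try the relaxed floating-point mode themselves, my understanding is that the LDC beta exposes it as a @fastmath attribute in ldc.attributes (attribute and module names assumed from LDC's release notes; the version(LDC) guard keeps the snippet compilable on other compilers, where it is simply empty):

```d
// Sketch only: @fastmath and ldc.attributes are LDC-specific (assumed API).
version (LDC) {
    import ldc.attributes : fastmath;

    @fastmath  // permits reassociation/vectorisation of the FP reduction below
    double dot(const(double)[] a, const(double)[] b) {
        double s = 0.0;
        foreach (i, x; a)
            s += x * b[i];
        return s;
    }
}
```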
August 29, 2016
On Monday, 29 August 2016 at 14:55:50 UTC, Seb wrote:
> On Monday, 29 August 2016 at 14:43:08 UTC, Steinhagelvoll wrote:
>> It is quite surprising that there is this much of a difference, even when all run sequential. I believe this might be specific to this small problem.
>
> You should definitely have a look at this benchmark for matrix multiplication across many languages:
>
> https://github.com/kostya/benchmarks#matmul
>
> With the recent generic GLAS kernel in mir, matrix multiplication in D is blazingly fast (it improved the existing results by at least 8x).
> Please note that this requires the latest LDC beta, which includes the fastMath pragma, and that GLAS is still under development in mir:
>
> https://github.com/libmir/mir

It's not really about multiplying matrices. I wanted to see how D compares for different tasks. If I actually want to do matrix multiplication I will use LAPACK or something of that nature.

In this task the difference was much bigger than in e.g. prime testing, which was about even.