Fast multidimensional Arrays
Steinhagelvoll
August 29, 2016
Hello,

I'm trying to find a fast way to use multidimensional arrays. To that end I implemented a matrix multiplication in several ways and compared the run times. As a reference I used a Fortran 90 implementation.

Fortran reference: http://pastebin.com/Hd5zTHVJ
ifort test.f90  -o testf && time ./testf
real    0m0.680s
user    0m0.672s
sys     0m0.008s

ifort -O3 test.f90 -o testf && time ./testf
real    0m0.235s
user    0m0.228s
sys     0m0.004s

ifort -check all test.f90  -o testf && time ./testf
        1000

real    0m34.993s
user    0m35.012s
sys     0m0.008s


For D I tried a number of different ways:

NDSlice: http://pastebin.com/nUbMnt8B
real	0m35.922s
user	0m35.888s
sys	0m0.008


1D Arrays: http://pastebin.com/R7CJFybK
dmd -boundscheck=off -O test.d && time ./test
real	0m4.415s
user	0m4.412s
sys	0m0.004s

ldc2 -O3 test.d && time ./test
real	0m4.261s
user	0m4.252s
sys	0m0.004s

2D Arrays: http://pastebin.com/4CuB4Y0c

dmd -boundscheck=off -O nd_test.d && time ./nd_test
real	0m3.565s
user	0m3.560s
sys	0m0.004s


ldc2 -O3 nd_test.d && time ./nd_test
real	0m3.568s
user	0m3.560s
sys	0m0.004s

None of them comes even close to the Fortran implementation; only when I enable all checks in Fortran does it become roughly equal to Ndslice. Is there a fast way to use multidimensional matrices?

Kind regards

Matthias
August 29, 2016
At the very least, give the LDC command line a `-release`, otherwise you end up with all assertions enabled etc.
August 29, 2016
On 29/08/2016 9:53 PM, Steinhagelvoll wrote:
> [...]

By the looks of it, you're not running the tests more than once.
Druntime initialization could be affecting this.

Please execute each test (without memory allocation) at least 10000 times and then report back the results.

Something like https://dlang.org/phobos/std_datetime.html#.benchmark will be very helpful.
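
As a rough illustration of that suggestion, here is a minimal sketch of timing a function over many runs with allocation kept outside the timed code. It assumes a newer Phobos, where `benchmark` lives in `std.datetime.stopwatch` and returns `Duration` values; the matrix size `n`, the run count and the function name `matMul` are hypothetical and chosen only so the sketch finishes quickly.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

enum n = 64;        // hypothetical size, kept small so the sketch runs quickly
double[] a, b, c;   // flat n*n matrices, allocated once outside the timed code

// Naive multiplication c = a * b on flat row-major storage.
void matMul()
{
    c[] = 0;
    foreach (i; 0 .. n)
        foreach (k; 0 .. n)
        {
            immutable aik = a[i * n + k];
            foreach (j; 0 .. n)
                c[i * n + j] += aik * b[k * n + j];
        }
}

void main()
{
    a = new double[n * n];
    b = new double[n * n];
    c = new double[n * n];
    a[] = 1.0;
    b[] = 2.0;

    enum runs = 100;                          // hypothetical run count
    auto results = benchmark!matMul(runs);    // one Duration per benchmarked function
    writeln("total:   ", results[0]);
    writeln("per run: ", results[0] / runs);
}
```

Because the total covers all runs, one-off costs such as runtime start-up are not part of the measurement at all, and any warm-up effects are averaged over the run count.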
August 29, 2016
OK, I added -release and implemented the benchmark with 500 iterations; 10000 is not reasonable. I built on the 2D array version with LDC: http://pastebin.com/aXxzEdS4 (the changes are only at the beginning)

$ ldc2 -release -O3 nd_test.d
$ ./nd_test
12 minutes, 18 secs, 21 ms, 858 μs, and 3 hnsecs

That is 738 seconds, compared to (also 500 iterations):

ifort -O3 -o fort_test test.f90 && ./fort_test
 time:    107.4640    seconds


This still seems like a big difference. Is it because I don't use a contiguous piece of memory, but rather a pointer to a pointer?
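
To make the two layouts in that question concrete, here is a small sketch of an array of arrays next to a single flat block. The helper names `makeJagged` and `makeFlat` and the size are hypothetical, and the sketch only shows the access paths; it does not measure anything.

```d
import std.stdio : writeln;

// Array of arrays: every row is its own slice (pointer + length),
// so m[i][j] loads the row slice first and the element second.
double[][] makeJagged(size_t n)
{
    auto m = new double[][](n, n);
    foreach (i; 0 .. n)
        foreach (j; 0 .. n)
            m[i][j] = i * 10.0 + j;
    return m;
}

// Flat storage: one contiguous block, indexed by hand in row-major order.
double[] makeFlat(size_t n)
{
    auto m = new double[n * n];
    foreach (i; 0 .. n)
        foreach (j; 0 .. n)
            m[i * n + j] = i * 10.0 + j;
    return m;
}

void main()
{
    enum n = 4;  // hypothetical size, small enough to inspect
    auto jagged = makeJagged(n);
    auto flat = makeFlat(n);

    // Same logical element, reached through two different access paths.
    assert(jagged[2][3] == flat[2 * n + 3]);

    // A row of the flat matrix is just a slice of the block; no copy is made.
    auto row2 = flat[2 * n .. 3 * n];
    assert(row2.ptr == flat.ptr + 2 * n);
    writeln(row2);
}
```

Whether the extra indirection explains the gap to Fortran would still have to be measured; the sketch only shows where the additional load comes from.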
August 29, 2016
On 29.8.2016 at 14:13, Steinhagelvoll via Digitalmars-d-learn wrote:

> [...]
>
> This still seems like a big difference. Is it because I don't use a contiguous piece of memory, but rather a pointer to a pointer?
That is possible; there are a lot of indirections.
August 30, 2016
On 30/08/2016 12:13 AM, Steinhagelvoll wrote:
> [...]
>
> This still seems like a big difference. Is it because I don't use a
> contiguous piece of memory, but rather a pointer to a pointer?

double[1000][] A, B, C;

void main() {
        A = new double[1000][1000];
        B = new double[1000][1000];
        C = new double[1000][1000];

        import std.conv : to;
        import std.datetime;
        import std.stdio : writeln;

        ini(A);
        ini(B);
        ini(C);

        auto r = benchmark!run_test(10000);
        auto res = to!Duration(r[0]);
        writeln(res);
}

void run_test() {
        MatMul(A, B, C);
}

void ini(T)(T mtx) {
        foreach(v; mtx) {
                v = 3.4;
        }

        foreach(i, v; mtx) {
                foreach(j, vv; v) {
                        vv += (i * j) + (0.6 * j);
                }
        }
}

void MatMul(T)(T A, T B, T C) {
        foreach(cv; C) {
                cv = 0f;
        }

        foreach(i, cv; C) {
                foreach(j, av; A[i]) {
                        foreach(k, cvv; cv) {
                                cvv += av * B[j][k];
                        }
                }
        }
}

$ ldc2 test.d -O5 -release -oftest.exe -m64
$ ./test
3 secs, 995 ms, 115 μs, and 2 hnsecs

Please verify that it is still doing the same thing that you want.
August 30, 2016
On 30/08/2016 1:02 AM, rikki cattermole wrote:
> [...]

The change below is slightly faster:

        foreach(i, cv; C) {
                foreach(j, av; A[i]) {
                        auto bv = B[j];
                        foreach(k, cvv; cv) {
                                cvv += av * bv[k];
                        }
                }
        }
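
For comparison, here is a self-contained sketch of the same idea: look up the row of B once per `j` instead of indexing `B[j][k]` in the innermost loop. The `Row` alias, the 4x4 size and the function names are hypothetical. Unlike the snippet above, the sketch binds the loop variables with `ref` and takes a slice of the row, so that results are actually written to the output and no row is copied. An optimizing compiler may well perform this hoisting on its own, so any speed-up is something to measure rather than assume.

```d
// Hypothetical row type: 4 columns, to keep the sketch small.
alias Row = double[4];

// Indexes b[j][k] afresh on every inner iteration.
void matMulDirect(const Row[] a, const Row[] b, Row[] c)
{
    foreach (i, ref cRow; c)
    {
        cRow[] = 0;
        foreach (j, av; a[i])
            foreach (k, ref cv; cRow)
                cv += av * b[j][k];
    }
}

// Looks up row j of b once and reuses it in the inner loop.
void matMulHoisted(const Row[] a, const Row[] b, Row[] c)
{
    foreach (i, ref cRow; c)
    {
        cRow[] = 0;
        foreach (j, av; a[i])
        {
            const(double)[] bRow = b[j][];   // slice of the row, not a copy
            foreach (k, ref cv; cRow)
                cv += av * bRow[k];
        }
    }
}

void main()
{
    auto a = new Row[4];
    auto b = new Row[4];
    foreach (i; 0 .. 4)
        foreach (j; 0 .. 4)
        {
            a[i][j] = i + j;
            b[i][j] = i == j ? 2.0 : 0.0;    // b is 2 * identity
        }

    auto c1 = new Row[4];
    auto c2 = new Row[4];
    matMulDirect(a, b, c1);
    matMulHoisted(a, b, c2);

    assert(c1 == c2);                    // both variants agree
    assert(c1[1][2] == 2.0 * a[1][2]);   // multiplying by 2 * identity doubles a
}
```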
August 29, 2016
On Monday, 29 August 2016 at 13:02:43 UTC, rikki cattermole wrote:
> On 30/08/2016 12:13 AM, Steinhagelvoll wrote:
>> [...]
>
> double[1000][] A, B, C;
>
> void main() {
>         A = new double[1000][1000];
>         B = new double[1000][1000];
>         C = new double[1000][1000];
>
> [...]

It seems that the ini doesn't work properly. Every value seems to be nan.

ini(A);
ini(B);
ini(C);
writeln(A[0][0]);
writeln(C[3][9]);

nan
nan
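
A plausible explanation, sketched under the assumption that the loops in `ini` are the cause: a `foreach` without `ref` iterates over copies, so the assignments never reach the matrix and the elements keep their default value, which for `double` is nan. The function names and the 4x4 size below are hypothetical.

```d
import std.math : isNaN;

// By value: `v` is a copy of each row and `vv` a copy of each element,
// so the matrix itself is never written.
void iniByValue(double[4][] mtx)
{
    foreach (v; mtx)
        foreach (vv; v)
            vv = 3.4;
}

// By reference: `ref` binds the loop variables to the stored rows and elements.
void iniByRef(double[4][] mtx)
{
    foreach (ref v; mtx)
        foreach (ref vv; v)
            vv = 3.4;
}

void main()
{
    auto a = new double[4][4];   // elements start out as double.init, i.e. nan

    iniByValue(a);
    assert(a[0][0].isNaN);       // unchanged: only copies were modified

    iniByRef(a);
    assert(a[0][0] == 3.4);      // now the stored elements are set
}
```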
August 30, 2016
On 30/08/2016 1:50 AM, Steinhagelvoll wrote:
> It seems that the ini doesn't work properly. Every value seems to be nan.
>
> ini(A);
> ini(B);
> ini(C);
> writeln(A[0][0]);
> writeln(C[3][9]);
>
> nan
> nan

My bad, fixed:

double[1000][] A, B, C;

void main() {
        A = new double[1000][1000];
        B = new double[1000][1000];
        C = new double[1000][1000];

        import std.conv : to;
        import std.datetime;
        import std.stdio : writeln;

        ini(A);
        ini(B);
        ini(C);

        auto r = benchmark!run_test(10000);
        auto res = to!Duration(r[0]);
        writeln(res);
}

void run_test() {
        MatMul(A, B, C);
}

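void ini(T)(T mtx) {
        // ref is required: without it each loop variable is only a copy
        foreach(ref v; mtx) {
                v[] = 3.4;
        }

        foreach(i, ref v; mtx) {
                foreach(j, ref vv; v) {
                        vv += (i * j) + (0.6 * j);
                }
        }
}

void MatMul(T)(T A, T B, T C) {
        // reset the result matrix in place
        foreach(ref cv; C) {
                cv[] = 0;
        }

        foreach(i, ref cv; C) {
                foreach(j, av; A[i]) {
                        auto bv = B[j][];  // slice of row j, avoids copying the row
                        foreach(k, ref cvv; cv) {
                                cvv += av * bv[k];
                        }
                }
        }
}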
August 29, 2016
On 29.8.2016 at 11:53, Steinhagelvoll via Digitalmars-d-learn wrote:

> [...]
It is unfair to compare different backends:

gfortran -O3 -o test test.f90
[kozak@dajinka ~]$ time ./test

real    0m2.072s
user    0m2.053s
sys    0m0.013s

gdc -O3 -o test test.d
[kozak@dajinka ~]$ time ./test

real    0m1.655s
user    0m1.640s
sys    0m0.010s

Obviously ifort can use some special instructions on your CPU.