August 29, 2016 Fast multidimensional Arrays | ||||
---|---|---|---|---|
| ||||
Hello,

I'm trying to find a fast way to use multidimensional arrays. To that end I implemented a matrix multiplication and compared the timings of several approaches, using a Fortran90 implementation as the reference.

Fortran reference: http://pastebin.com/Hd5zTHVJ
ifort test.f90 -o testf && time ./testf
real 0m0.680s
user 0m0.672s
sys 0m0.008s

ifort -O3 test.f90 -o testf && time ./testf
real 0m0.235s
user 0m0.228s
sys 0m0.004s

ifort -check all test.f90 -o testf && time ./testf
1000

real 0m34.993s
user 0m35.012s
sys 0m0.008s

For D I tried a number of different ways:

NDSlice: http://pastebin.com/nUbMnt8B
real 0m35.922s
user 0m35.888s
sys 0m0.008

1D Arrays: http://pastebin.com/R7CJFybK
dmd -boundscheck=off -O test.d && time ./test
real 0m4.415s
user 0m4.412s
sys 0m0.004s

ldc2 -O3 test.d && time ./test
real 0m4.261s
user 0m4.252s
sys 0m0.004s

2D Arrays: http://pastebin.com/4CuB4Y0c
dmd -boundscheck=off -O nd_test.d && time ./nd_test
real 0m3.565s
user 0m3.560s
sys 0m0.004s

ldc2 -O3 nd_test.d && time ./nd_test
real 0m3.568s
user 0m3.560s
sys 0m0.004s

None of them is even close to the Fortran implementation; only when I enable all checks in Fortran does it come out about equal to NDSlice. Is there a speedy way to use multidimensional matrices?

Kind regards

Matthias
|
August 29, 2016 Re: Fast multidimensional Arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steinhagelvoll | At the very least, give the LDC command line a `-release`, otherwise you end up with all assertions enabled etc. |
August 29, 2016 Re: Fast multidimensional Arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steinhagelvoll | On 29/08/2016 9:53 PM, Steinhagelvoll wrote:
> Hello,
>
> I'm trying to find a fast way to use multi dimensional arrays. For this
> I implemented a matrix multiplication and compared the times for
> different ways. As a reference I used a Fortran90 implementation.
>
> Fortran reference: http://pastebin.com/Hd5zTHVJ
> ifort test.f90 -o testf && time ./testf
> real 0m0.680s
> user 0m0.672s
> sys 0m0.008s
>
> ifort -O3 test.f90 -o testf && time ./testf
> real 0m0.235s
> user 0m0.228s
> sys 0m0.004s
>
> ifort -check all test.f90 -o testf && time ./testf
> 1000
>
> real 0m34.993s
> user 0m35.012s
> sys 0m0.008s
>
>
> For D it tried a number of different ways:
>
> NDSlice: http://pastebin.com/nUbMnt8B
> real 0m35.922s
> user 0m35.888s
> sys 0m0.008
>
>
> 1D Arrays: http://pastebin.com/R7CJFybK
> dmd -boundscheck=off -O test.d && time ./test
> real 0m4.415s
> user 0m4.412s
> sys 0m0.004s
>
> ldc2 -O3 test.d && time ./test
> real 0m4.261s
> user 0m4.252s
> sys 0m0.004s
>
> 2D Arrays: http://pastebin.com/4CuB4Y0c
>
> dmd -boundscheck=off -O nd_test.d && time ./nd_test
> real 0m3.565s
> user 0m3.560s
> sys 0m0.004s
>
>
> ldc2 -O3 nd_test.d && time ./nd_test
> real 0m3.568s
> user 0m3.560s
> sys 0m0.004s
>
> None of them is even close to the Fortran implementation, only when I
> enable all check in Fortran it seems to be equal to Ndslice. Is there a
> speedy way to use multi-dimensional matrices?
>
> Kind regards
>
> Matthias
By the looks of it, you're not running the tests more than once. Druntime initialization could be affecting the numbers.
Please execute each test (without memory allocation) at least 10000 times and then report back what the timings are.
Something like https://dlang.org/phobos/std_datetime.html#.benchmark will be very helpful.
|
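
As a rough illustration of the suggestion above, here is a minimal sketch of timing a kernel over many runs with `benchmark`. It assumes a recent Phobos, where `benchmark` lives in `std.datetime.stopwatch` and returns `Duration` values (in 2016 it was part of `std.datetime`). The flat-array kernel `matMul`, the size `n`, and the count `runs` are placeholders chosen for this sketch, not the code from the pastebins.

```d
import core.time : Duration;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

enum n = 64;        // hypothetical small size so the sketch runs quickly
double[] a, b, c;   // flat row-major n*n buffers, allocated once up front

// Kernel under test: c = a * b on flat arrays, with no allocation inside.
void matMul()
{
    c[] = 0.0;
    foreach (i; 0 .. n)
        foreach (k; 0 .. n)
        {
            immutable aik = a[i * n + k];
            foreach (j; 0 .. n)
                c[i * n + j] += aik * b[k * n + j];
        }
}

void main()
{
    // All allocation and initialization happens outside the timed region.
    a = new double[n * n];
    b = new double[n * n];
    c = new double[n * n];
    a[] = 1.0;
    b[] = 2.0;

    enum runs = 100;
    Duration[1] results = benchmark!matMul(runs);
    writeln("total: ", results[0], ", per run: ", results[0] / runs);
}
```

Timing many runs inside one process amortizes startup cost, which `time ./test` on a single run cannot separate from the kernel itself.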
August 29, 2016 Re: Fast multidimensional Arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to rikki cattermole | OK, I added -release and implemented the benchmark with 500 iterations; 10000 is not reasonable. I built on the 2D array version with LDC: http://pastebin.com/aXxzEdS4 (the changes are only at the beginning)

$ ldc2 -release -O3 nd_test.d
$ ./nd_test
12 minutes, 18 secs, 21 ms, 858 μs, and 3 hnsecs

which is 738 seconds. Compared to (also 500 iterations)

ifort -O3 -o fort_test test.f90 && ./fort_test
time: 107.4640 seconds

This still seems like a big difference. Is it because I don't use a contiguous piece of memory, but rather a pointer to a pointer?
|
August 29, 2016 Re: Fast multidimensional Arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steinhagelvoll | On 29.8.2016 at 14:13, Steinhagelvoll via Digitalmars-d-learn wrote:
> Ok I added release and implemented the benchmark for 500 iterations, 10000 are not reasonable. I build on the 2d array with LDC: http://pastebin.com/aXxzEdS4 (changes just in the beginning)
>
> $ ldc2 -release -O3 nd_test.d
> $ ./nd_test
> 12 minutes, 18 secs, 21 ms, 858 μs, and 3 hnsecs
>
> , which is 738 seconds. Compared to (also 500 iterations)
>
> ifort -O3 -o fort_test test.f90 && ./fort_test
> time: 107.4640 seconds
>
>
> This still seems like a big difference. Is it because I don't use a continous piece of memory, but rather a pointer to a pointer?
That is possible; there are a lot of indirections.
|
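
To make the indirection point concrete, here is a small sketch of three layouts. It assumes the 2D version in the pastebin uses a jagged `double[][]`, which is not confirmed in the thread; the size `n` and the helpers `rowsAreContiguous` and `at` are made up for illustration.

```d
import std.stdio : writeln;

enum n = 4; // hypothetical small size for illustration

// Returns true if every row starts exactly n doubles after the previous one.
bool rowsAreContiguous(R)(R[] rows)
{
    foreach (i; 1 .. rows.length)
        if (rows[i].ptr != rows[i - 1].ptr + n)
            return false;
    return true;
}

// Row-major access into a flat buffer of n * n elements.
ref double at(double[] flat, size_t i, size_t j)
{
    return flat[i * n + j];
}

void main()
{
    // Jagged: an array of slices, so each row is reached through its own pointer.
    auto jagged = new double[][](n, n);
    // Dynamic array of fixed-size rows: a single block of n * n doubles.
    auto fixedRows = new double[n][](n);
    // Flat buffer with manual indexing: also a single block.
    auto flat = new double[n * n];

    fixedRows[2][3] = 1.5;
    flat.at(2, 3) = 1.5;

    // Typically false, because each jagged row is a separate allocation.
    writeln("jagged rows contiguous:     ", rowsAreContiguous(jagged));
    // Always true: static-array rows are stored back to back.
    writeln("fixed-size rows contiguous: ", rowsAreContiguous(fixedRows));
    writeln("same element: ", fixedRows[2][3] == flat[2 * n + 3]);
}
```

With either of the single-block layouts, an element access needs one pointer plus arithmetic rather than a second pointer load per row.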
August 30, 2016 Re: Fast multidimensional Arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steinhagelvoll | On 30/08/2016 12:13 AM, Steinhagelvoll wrote:
> Ok I added release and implemented the benchmark for 500 iterations,
> 10000 are not reasonable. I build on the 2d array with LDC:
> http://pastebin.com/aXxzEdS4 (changes just in the beginning)
>
> $ ldc2 -release -O3 nd_test.d
> $ ./nd_test
> 12 minutes, 18 secs, 21 ms, 858 μs, and 3 hnsecs
>
> , which is 738 seconds. Compared to (also 500 iterations)
>
> ifort -O3 -o fort_test test.f90 && ./fort_test
> time: 107.4640 seconds
>
>
> This still seems like a big difference. Is it because I don't use a
> continous piece of memory, but rather a pointer to a pointer?

double[1000][] A, B, C;

void main() {
    A = new double[1000][1000];
    B = new double[1000][1000];
    C = new double[1000][1000];

    import std.conv : to;
    import std.datetime;
    import std.stdio : writeln;

    ini(A);
    ini(B);
    ini(C);

    auto r = benchmark!run_test(10000);
    auto res = to!Duration(r[0]);
    writeln(res);
}

void run_test() {
    MatMul(A, B, C);
}

void ini(T)(T mtx) {
    foreach(v; mtx) {
        v = 3.4;
    }

    foreach(i, v; mtx) {
        foreach(j, vv; v) {
            vv += (i * j) + (0.6 * j);
        }
    }
}

void MatMul(T)(T A, T B, T C) {
    foreach(cv; C) {
        cv = 0f;
    }

    foreach(i, cv; C) {
        foreach(j, av; A[i]) {
            foreach(k, cvv; cv) {
                cvv += av * B[j][k];
            }
        }
    }
}

$ ldc2 test.d -O5 -release -oftest.exe -m64
$ ./test
3 secs, 995 ms, 115 μs, and 2 hnsecs
Please verify that it is still doing the same thing that you want.
|
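
One possible way to do the verification asked for above is to run the `foreach`-based product on a tiny matrix and compare it with a plain index-based loop. The sketch below is not the code from the post: the size `n`, the alias `Matrix`, the fill values, and `matMulReference` are made up, and the loop variables that are written to are declared `ref` here so that the stores actually reach `c`.

```d
import std.stdio : writeln;

enum n = 3; // hypothetical small size for checking by hand
alias Matrix = double[n][];

// foreach-based product in the style of the post; `ref` is used on every
// loop variable that is assigned to.
void matMul(Matrix a, Matrix b, Matrix c)
{
    foreach (ref row; c)
        row[] = 0.0;
    foreach (i, ref cRow; c)
        foreach (j, av; a[i])
            foreach (k, ref cv; cRow)
                cv += av * b[j][k];
}

// Plain index-based reference, used only for comparison.
void matMulReference(Matrix a, Matrix b, Matrix c)
{
    foreach (i; 0 .. n)
        foreach (k; 0 .. n)
        {
            double sum = 0.0;
            foreach (j; 0 .. n)
                sum += a[i][j] * b[j][k];
            c[i][k] = sum;
        }
}

void main()
{
    auto a = new double[n][](n);
    auto b = new double[n][](n);
    auto c = new double[n][](n);
    auto expected = new double[n][](n);

    // Small integer-valued entries keep the arithmetic exact.
    foreach (i; 0 .. n)
        foreach (j; 0 .. n)
        {
            a[i][j] = i + j;
            b[i][j] = i * j + 1;
        }

    matMul(a, b, c);
    matMulReference(a, b, expected);
    writeln("results match: ", c == expected);
}
```

If the two results differ, or if `c` still holds its initial values afterwards, the kernel is not computing the product and its timing says nothing about matrix multiplication.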
August 30, 2016 Re: Fast multidimensional Arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to rikki cattermole | On 30/08/2016 1:02 AM, rikki cattermole wrote:
> On 30/08/2016 12:13 AM, Steinhagelvoll wrote:
>> Ok I added release and implemented the benchmark for 500 iterations,
>> 10000 are not reasonable. I build on the 2d array with LDC:
>> http://pastebin.com/aXxzEdS4 (changes just in the beginning)
>>
>> $ ldc2 -release -O3 nd_test.d
>> $ ./nd_test
>> 12 minutes, 18 secs, 21 ms, 858 μs, and 3 hnsecs
>>
>> , which is 738 seconds. Compared to (also 500 iterations)
>>
>> ifort -O3 -o fort_test test.f90 && ./fort_test
>> time: 107.4640 seconds
>>
>>
>> This still seems like a big difference. Is it because I don't use a
>> continous piece of memory, but rather a pointer to a pointer?
>
> double[1000][] A, B, C;
>
> void main() {
> A = new double[1000][1000];
> B = new double[1000][1000];
> C = new double[1000][1000];
>
> import std.conv : to;
> import std.datetime;
> import std.stdio : writeln;
>
> ini(A);
> ini(B);
> ini(C);
>
> auto r = benchmark!run_test(10000);
> auto res = to!Duration(r[0]);
> writeln(res);
> }
>
> void run_test() {
> MatMul(A, B, C);
> }
>
> void ini(T)(T mtx) {
> foreach(v; mtx) {
> v = 3.4;
> }
>
> foreach(i, v; mtx) {
> foreach(j, vv; v) {
> vv += (i * j) + (0.6 * j);
> }
> }
> }
>
> void MatMul(T)(T A, T B, T C) {
> foreach(cv; C) {
> cv = 0f;
> }
>
> foreach(i, cv; C) {
> foreach(j, av; A[i]) {
> foreach(k, cvv; cv) {
> cvv += av * B[j][k];
> }
> }
> }
> }
>
> $ ldc2 test.d -O5 -release -oftest.exe -m64
> $ ./test
> 3 secs, 995 ms, 115 μs, and 2 hnsecs
>
> Please verify that it is still doing the same thing that you want.
The change below is slightly faster:

foreach(i, cv; C) {
    foreach(j, av; A[i]) {
        auto bv = B[j];
        foreach(k, cvv; cv) {
            cvv += av * bv[k];
        }
    }
}
|
August 29, 2016 Re: Fast multidimensional Arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to rikki cattermole | On Monday, 29 August 2016 at 13:02:43 UTC, rikki cattermole wrote:
> On 30/08/2016 12:13 AM, Steinhagelvoll wrote:
>> [...]
>
> double[1000][] A, B, C;
>
> void main() {
> A = new double[1000][1000];
> B = new double[1000][1000];
> C = new double[1000][1000];
>
> [...]
It seems that the ini doesn't work properly. Every value seems to be nan.
ini(A);
ini(B);
ini(C);
writeln(A[0][0]);
writeln(C[3][9]);
nan
nan
|
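
A likely explanation for the `nan` values: a freshly allocated `double` array holds `double.init`, which is NaN in D, and a `foreach` loop variable without `ref` is a copy, so assigning to it never changes the matrix. A minimal sketch of the difference follows; the size `n` and the helpers `fillByValue` and `fillByRef` are hypothetical names used only here.

```d
import std.math : isNaN;
import std.stdio : writeln;

enum n = 3; // hypothetical small size for illustration

// By-value loop variable: each `row` is a copy of a double[n], so the
// assignment is lost when the iteration ends.
void fillByValue(double[n][] m, double x)
{
    foreach (row; m)
        row[] = x;
}

// By-reference loop variable: the assignment writes into the matrix.
void fillByRef(double[n][] m, double x)
{
    foreach (ref row; m)
        row[] = x;
}

void main()
{
    auto m = new double[n][](n);   // elements start as double.init (NaN)

    fillByValue(m, 3.4);
    writeln("after by-value fill, still NaN: ", isNaN(m[0][0]));

    fillByRef(m, 3.4);
    writeln("after by-ref fill: ", m[0][0]);
}
```

The same rule applies to nested loops: when the rows are fixed-size arrays, the outer loop variable needs `ref` too, otherwise the inner loop only modifies a copy of the row.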
August 30, 2016 Re: Fast multidimensional Arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steinhagelvoll | On 30/08/2016 1:50 AM, Steinhagelvoll wrote:
> It seems that the ini doesn't work properly. Every value seems to be nan.
>
> ini(A);
> ini(B);
> ini(C);
> writeln(A[0][0]);
> writeln(C[3][9]);
>
> nan
> nan
My bad, fixed:

double[1000][] A, B, C;

void main() {
    A = new double[1000][1000];
    B = new double[1000][1000];
    C = new double[1000][1000];

    import std.conv : to;
    import std.datetime;
    import std.stdio : writeln;

    ini(A);
    ini(B);
    ini(C);

    auto r = benchmark!run_test(10000);
    auto res = to!Duration(r[0]);
    writeln(res);
}

void run_test() {
    MatMul(A, B, C);
}

void ini(T)(T mtx) {
    // Rows are double[1000] values, so they must be taken by ref to be modified.
    foreach(ref v; mtx) {
        v = 3.4;
    }

    foreach(i, ref v; mtx) {
        foreach(j, ref vv; v) {
            vv += (i * j) + (0.6 * j);
        }
    }
}

void MatMul(T)(T A, T B, T C) {
    // ref is needed here as well, otherwise C is never written.
    foreach(ref cv; C) {
        cv = 0f;
    }

    foreach(i, ref cv; C) {
        foreach(j, av; A[i]) {
            auto bv = B[j];
            foreach(k, ref cvv; cv) {
                cvv += av * bv[k];
            }
        }
    }
}
|
August 29, 2016 Re: Fast multidimensional Arrays | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steinhagelvoll | On 29.8.2016 at 11:53, Steinhagelvoll via Digitalmars-d-learn wrote:
> Hello,
>
> I'm trying to find a fast way to use multi dimensional arrays. For this I implemented a matrix multiplication and compared the times for different ways. As a reference I used a Fortran90 implementation.
>
> Fortran reference: http://pastebin.com/Hd5zTHVJ
> ifort test.f90 -o testf && time ./testf
> real 0m0.680s
> user 0m0.672s
> sys 0m0.008s
>
> ifort -O3 test.f90 -o testf && time ./testf
> real 0m0.235s
> user 0m0.228s
> sys 0m0.004s
>
> ifort -check all test.f90 -o testf && time ./testf
> 1000
>
> real 0m34.993s
> user 0m35.012s
> sys 0m0.008s
>
>
> For D it tried a number of different ways:
>
> NDSlice: http://pastebin.com/nUbMnt8B
> real 0m35.922s
> user 0m35.888s
> sys 0m0.008
>
>
> 1D Arrays: http://pastebin.com/R7CJFybK
> dmd -boundscheck=off -O test.d && time ./test
> real 0m4.415s
> user 0m4.412s
> sys 0m0.004s
>
> ldc2 -O3 test.d && time ./test
> real 0m4.261s
> user 0m4.252s
> sys 0m0.004s
>
> 2D Arrays: http://pastebin.com/4CuB4Y0c
>
> dmd -boundscheck=off -O nd_test.d && time ./nd_test
> real 0m3.565s
> user 0m3.560s
> sys 0m0.004s
>
>
> ldc2 -O3 nd_test.d && time ./nd_test
> real 0m3.568s
> user 0m3.560s
> sys 0m0.004s
>
> None of them is even close to the Fortran implementation, only when I enable all check in Fortran it seems to be equal to Ndslice. Is there a speedy way to use multi-dimensional matrices?
>
> Kind regards
>
> Matthias
It is unfair to compare different backends:
gfortran -O3 -o test test.f90
[kozak@dajinka ~]$ time ./test
real 0m2.072s
user 0m2.053s
sys 0m0.013s
gdc -O3 -o test test.d
[kozak@dajinka ~]$ time ./test
real 0m1.655s
user 0m1.640s
sys 0m0.010s
Obviously ifort can use some special instructions on your CPU.
|
Copyright © 1999-2021 by the D Language Foundation