Thread overview | ||||||||
---|---|---|---|---|---|---|---|---|
|
December 26, 2007 Struct method access speed | ||||
---|---|---|---|---|
| ||||
A tiny benchmark of mine that's a reduced version of some real code (I hope my code doesn't contain bugs/silly inefficiencies). Four compilations (gdc, dmd, gcc, gcc aggressive): gcc version 3.4.5 (mingw special) (gdc 0.24, using dmd 1.020) gdc -O3 -s -frelease -finline-functions -ffast-math -fomit-frame-pointer -funroll-loops -march=pentiumpro vec_test.d -o vec_test_gdc dmd v1.024 dmd -O -release -inline vec_test.d 1) gcc version 3.4.2 (mingw-special) g++ -s -O3 vec_testCpp.cpp -o vec_testCpp 2) gcc version 3.4.2 (mingw-special) g++ -O3 -s -finline-functions -ffast-math -fomit-frame-pointer -funroll-loops -march=pentiumpro vec_testCpp.cpp -o vec_testCpp Timings in seconds, N = 20_000, on Pentium 3: dmd (D): 3.31 1.66 gdc (D): 8.92 1.39 gcc 1 (C++): 1.67 1.66 gcc 2 (C++): 1.36 1.36 The interesting results are for dmd: I think such two benchmarks have to run for the same time, but the experiment shows it's false for DMD. Note that this isn't an inlining problem, because if you remove the -inline flag you obtain much worse timings. // D ------------------------ import std.stdio, std.gc, std.c.time; struct Vec(T, int N) { T* ap; void create() { ap = cast(T*)malloc(N * T.sizeof); for(int i; i < N; i++) ap[i] = T.init; } void opIndexAssign(T x, uint i) { ap[i] = x; } } void main() { const uint N = 20_000; alias uint T; auto t = clock(); Vec!(T, N) v1; v1.create; for(uint i; i < N; i++) for(uint j; j < N; j++) v1[j] = j; printf("%.2f\n", cast(float)(clock()-t)/CLOCKS_PER_SEC); t = clock(); auto v2 = cast(T*)malloc(N * T.sizeof); for(uint i; i < N; i++) for(uint j; j < N; j++) v2[j] = j; printf("%.2f\n", cast(float)(clock()-t)/CLOCKS_PER_SEC); } // C++ ------------------------ #include <time.h> #include <stdio.h> #include <stdlib.h> #define N 20000 template <class T, int M> struct Vec { T* ap; void create() { ap = (T*)malloc(M * sizeof(T)); for(int i = 0; i < M; i++) ap[i] = 0; } T &operator[](unsigned int i) { return ap[i]; } }; int main() { typedef unsigned int T; clock_t t; t = clock(); Vec<T, N> v1; v1.create(); for(unsigned int i = 0; i < N; i++) for(unsigned int j = 0; j < N; j++) v1[j] = j; printf("%.2f\n", (float)(clock()-t)/CLOCKS_PER_SEC); t = clock(); T* v2 = (T*)malloc(N * sizeof(T)); for(unsigned int i = 0; i < N; i++) for(unsigned int j = 0; j < N; j++) v2[j] = j; printf("%.2f\n", (float)(clock()-t)/CLOCKS_PER_SEC); } Bye, bearophile |
December 26, 2007 Re: Struct method access speed | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile Attachments: | bearophile wrote: > A tiny benchmark of mine that's a reduced version of some real code (I hope my code doesn't contain bugs/silly inefficiencies). Four compilations (gdc, dmd, gcc, gcc aggressive): > > gcc version 3.4.5 (mingw special) (gdc 0.24, using dmd 1.020) > gdc -O3 -s -frelease -finline-functions -ffast-math -fomit-frame-pointer -funroll-loops -march=pentiumpro vec_test.d -o vec_test_gdc > > dmd v1.024 > dmd -O -release -inline vec_test.d > > 1) gcc version 3.4.2 (mingw-special) > g++ -s -O3 vec_testCpp.cpp -o vec_testCpp > > 2) gcc version 3.4.2 (mingw-special) > g++ -O3 -s -finline-functions -ffast-math -fomit-frame-pointer -funroll-loops -march=pentiumpro vec_testCpp.cpp -o vec_testCpp > > > Timings in seconds, N = 20_000, on Pentium 3: > > dmd (D): > 3.31 > 1.66 > > gdc (D): > 8.92 > 1.39 > > gcc 1 (C++): > 1.67 > 1.66 > > gcc 2 (C++): > 1.36 > 1.36 > > The interesting results are for dmd: I think such two benchmarks have to run for the same time, but the experiment shows it's false for DMD. > The one that surprises me is gdc. For DMD, you actually fill the vector twice in the first loop (once with zeroes, then a second time with values) and it runs about twice as long as the second loop which is normal. However, for gdc, I can't explain the large time for the first loop... Jerome - -- +------------------------- Jerome M. BERGER ---------------------+ | mailto:jeberger@free.fr | ICQ: 238062172 | | http://jeberger.free.fr/ | Jabber: jeberger@jabber.fr | +---------------------------------+------------------------------+ |
December 26, 2007 Re: Struct method access speed | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jérôme M. Berger | M. Berger:
> For DMD, you actually fill the
> vector twice in the first loop (once with zeroes, then a second time
> with values) and it runs about twice as long as the second loop
> which is normal.
I don't agree. The method "create" is called only once, so it must add nearly nothing (about 1/20000) to the first timing. Please correct me if you think I am wrong.
Bye,
bearophile
|
December 27, 2007 Re: Struct method access speed | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jérôme M. Berger | > The one that surprises me is gdc. For DMD, you actually fill the
> vector twice in the first loop (once with zeroes, then a second time
> with values) and it runs about twice as long as the second loop which is
> normal. However, for gdc, I can't explain the large time for the first
> loop...
>
> Jerome
I agreed with you so I tested it out. I removed the for loop in create:
void create() {
ap = cast(T*)malloc(N * T.sizeof);
}
Results:
timbus@TiMBoX:~/Desktop$ dmd -O -release -inline what.d
gcc what.o -o what -m32 -lphobos -lpthread -lm
timbus@TiMBoX:~/Desktop$ ./what
0.37
0.19
Seems opIndexAssign needs a bit of fine tuning.
|
December 27, 2007 Re: Struct method access speed | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jarrod | On Thu, 27 Dec 2007 02:02:09 +0000, Jarrod wrote:
>> The one that surprises me is gdc. For DMD, you actually fill the vector twice in the first loop (once with zeroes, then a second time with values) and it runs about twice as long as the second loop which is normal. However, for gdc, I can't explain the large time for the first loop...
>>
>> Jerome
>
> I agreed with you so I tested it out. I removed the for loop in create:
> void create() {
> ap = cast(T*)malloc(N * T.sizeof);
> }
>
> Results:
> timbus@TiMBoX:~/Desktop$ dmd -O -release -inline what.d gcc what.o -o
> what -m32 -lphobos -lpthread -lm timbus@TiMBoX:~/Desktop$ ./what
> 0.37
> 0.19
>
> Seems opIndexAssign needs a bit of fine tuning.
I also added the following method to the struct:
void assign(T x, uint i){
ap[i] = x;
}
Then instead of using v1[i] = j; I used v1.assign(j, i);
It worked just as fast as the v2 way.
So it's not an issue with member access. It's an issue with overloading the array operator.
|
December 27, 2007 Re: Struct method access speed | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile Attachments: | bearophile wrote: > M. Berger: >> For DMD, you actually fill the >> vector twice in the first loop (once with zeroes, then a second time >> with values) and it runs about twice as long as the second loop >> which is normal. > > I don't agree. The method "create" is called only once, so it must add nearly nothing (about 1/20000) to the first timing. Please correct me if you think I am wrong. > You're right of course. I hadn't seen that. Note to self: never post just before going to bed... Jerome - -- +------------------------- Jerome M. BERGER ---------------------+ | mailto:jeberger@free.fr | ICQ: 238062172 | | http://jeberger.free.fr/ | Jabber: jeberger@jabber.fr | +---------------------------------+------------------------------+ |
Copyright © 1999-2021 by the D Language Foundation