Thread overview
D speed compared to C++
Mar 18, 2008
Matthew Allen
Mar 18, 2008
BCS
Mar 18, 2008
Sean Kelly
Mar 18, 2008
Frits van Bommel
Mar 18, 2008
BCS
Mar 18, 2008
Sean Kelly
Mar 18, 2008
Sean Kelly
Mar 18, 2008
Frits van Bommel
Mar 18, 2008
Frits van Bommel
Mar 18, 2008
BCS
Mar 18, 2008
Frits van Bommel
Mar 18, 2008
Walter Bright
Mar 19, 2008
Matthew Allen
Mar 20, 2008
Koroskin Denis
Mar 21, 2008
Walter Bright
Mar 18, 2008
bearophile
Mar 19, 2008
Dan
Mar 19, 2008
Saaa
Mar 19, 2008
Paul Findlay
Mar 26, 2008
Georg Wrede
Mar 26, 2008
lutger
Mar 26, 2008
bearophile
Mar 19, 2008
Vladimir Panteleev
Mar 19, 2008
David Ferenczi
Mar 19, 2008
Matthew Allen
Mar 19, 2008
lutger
March 18, 2008
I am looking to use D for programming a high-speed vision application which was previously written in C/C++. I have done some arbitrary speed tests and am finding that C/C++ seems to be faster than D by a factor of about 3. I have done some simple loop tests that increment a float value by some number, and also some memory allocation/deallocation loops, and C/C++ seems to come out on top each time. Is D meant to be faster than, or as fast as, C/C++, and if so, how can I optimize the code? I am using -inline, -O, and -release.

An example of a simple loop test I ran is as follows:

DWORD start = timeGetTime();
	int i,j,k;
	float dx=0;
    for(i=0; i<1000;i++)
        for(j=0; j<1000;j++)
            for(k=0; k<10; k++)
                {
                     dx++;
                }
    DWORD end = timeGetTime();

In C++ I used ints and doubles. The C++ version came back with a time of 15 ms, and the D version with 45 ms.
March 18, 2008
Matthew Allen wrote:
> 
> An example of a simple loop test I ran is as follows:
> 
> DWORD start = timeGetTime();
> 	int i,j,k;
> 	float dx=0;
>     for(i=0; i<1000;i++)
>         for(j=0; j<1000;j++)
>             for(k=0; k<10; k++)
>                 {
>                      dx++;
>                 }
>     DWORD end = timeGetTime();
> 
> In C++ I used ints and doubles. The C++ version came back with a time of 15 ms, and the D version with 45 ms.

First of all, what C++ compiler? The best for testing would be DMC, as that removes the back-end differences.

Second, how many test runs was that over?

Third, try it with doubles (64-bit reals) in both programs, as the different conversions might be making a difference.

Another thing that might mask some stuff is start-up time. Try running the test loops in another loop and spit out sequential times. I have seen large (2x - 3x) differences in the first run of a test vs. later runs. This would avoid random variables like the test code spanning a page boundary in one case and not in the other.

If you have done these things already then I don't know what's happening. /My/ next step would be to start looking at the ASM, but then again I'm known to be crazy.
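
For concreteness, here is a minimal sketch of that "run the test loops in another loop" idea in present-day D (core.time.MonoTime and range foreach are assumptions here; they are not what the 2008-era compilers in this thread provided), timing the same body over several passes so a slow first pass stands out:

import core.time : MonoTime;
import std.stdio : writefln;

void main()
{
    // Run the identical benchmark body several times; a slow first pass
    // (cold caches, pages not yet faulted in, etc.) then shows up clearly.
    foreach (pass; 0 .. 5)
    {
        auto t0 = MonoTime.currTime;

        float dx = 0;
        foreach (i; 0 .. 1_000)
            foreach (j; 0 .. 1_000)
                foreach (k; 0 .. 10)
                    dx++;

        auto ms = (MonoTime.currTime - t0).total!"msecs";
        // Printing dx also keeps the optimizer from discarding the loops.
        writefln("pass %s: %s ms (dx = %s)", pass, ms, dx);
    }
}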
March 18, 2008
== Quote from Matthew Allen (matt.allen@removeme.creativelifestyles.com)'s article
> I am looking to use D for programming a high-speed vision application which was previously written in C/C++. I have done some arbitrary speed tests and am finding that C/C++ seems to be faster than D by a factor of about 3. I have done some simple loop tests that increment a float value by some number, and also some memory allocation/deallocation loops, and C/C++ seems to come out on top each time. Is D meant to be faster than, or as fast as, C/C++, and if so, how can I optimize the code? I am using -inline, -O, and -release.
> An example of a simple loop test I ran is as follows:
> DWORD start = timeGetTime();
> 	int i,j,k;
> 	float dx=0;
>     for(i=0; i<1000;i++)
>         for(j=0; j<1000;j++)
>             for(k=0; k<10; k++)
>                 {
>                      dx++;
>                 }
>     DWORD end = timeGetTime();
> In C++ I used ints and doubles. The C++ version came back with a time of 15 ms, and the D version with 45 ms.

Are these tests with DMD vs. DMC, or GDC vs. GCC?  If you're using different compilers for the C++ and D tests then you're really testing the code generator and optimizer more than the language.  D code generated by DMD, for example, is notoriously slow at floating point operations, while the same code is much faster with GDC.  This is an artifact of the Digital Mars back-end rather than the language itself.


Sean
March 18, 2008
== Quote from BCS (BCS@pathlink.com)'s article
> Matthew Allen wrote:
> >
> > An example of a simple loop test I ran is as follows:
> >
> > DWORD start = timeGetTime();
> > 	int i,j,k;
> > 	float dx=0;
> >     for(i=0; i<1000;i++)
> >         for(j=0; j<1000;j++)
> >             for(k=0; k<10; k++)
> >                 {
> >                      dx++;
> >                 }
> >     DWORD end = timeGetTime();
> >
> > In C++ I used ints and doubles. The C++ version came back with a time of 15 ms, and the D version with 45 ms.
> first of all what C++ compiler? the best for testing would be DMC as
> that removes the back end differences.
> Second how many test runs was that over?
> third, try it with doubles (64bit reals) in both programs as the
> different conversions might be making a difference.
> Another thing that might mask some stuff is start up time. Try running
> the test loops in another loop and spit out sequential times. I have
> seen large (2x - 3x) differences in the first run of a test vs. later
> runs.

D apps also have more going on in the application initialization phase than C++ apps.  For a real
apples-apples comparison, you might want to consider using Tango with the "stub" GC plugged in.
That just calls malloc/free and has no initialization cost, at the expense of no actual garbage collection.
I'll have to check whether the stub GC compiles with the latest Tango--it's been a while since I used it.


Sean
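
A rough sketch in a similar spirit (this is not the Tango stub GC Sean describes, but a present-day core.memory approach, so the exact setup is an assumption): suspend collections around the timed region so an allocation-heavy measurement isn't skewed by whenever a collection happens to run.

import core.memory : GC;
import core.time : MonoTime;
import std.stdio : writefln;

void main()
{
    GC.disable();                 // no collections while we measure
    scope (exit) GC.enable();

    auto t0 = MonoTime.currTime;

    int[] buf;
    foreach (i; 0 .. 100_000)
        buf ~= i;                 // allocation-heavy work under test

    writefln("appended %s ints in %s ms",
             buf.length, (MonoTime.currTime - t0).total!"msecs");
}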
March 18, 2008
Matthew Allen wrote:
> I am looking to use D for programming a high-speed vision application which was previously written in C/C++. I have done some arbitrary speed tests and am finding that C/C++ seems to be faster than D by a factor of about 3. I have done some simple loop tests that increment a float value by some number, and also some memory allocation/deallocation loops, and C/C++ seems to come out on top each time. Is D meant to be faster than, or as fast as, C/C++, and if so, how can I optimize the code? I am using -inline, -O, and -release.
> 
> An example of a simple loop test I ran is as follows:
> 
> DWORD start = timeGetTime();
> 	int i,j,k;
> 	float dx=0;
>     for(i=0; i<1000;i++)
>         for(j=0; j<1000;j++)
>             for(k=0; k<10; k++)
>                 {
>                      dx++;
>                 }
>     DWORD end = timeGetTime();
> 
> In C++ int and doubles. The C++ came back with a time of 15ms, and D came back with 45ms.

That's not a useful benchmark. G++ completely optimizes away the loop, leaving you timing how fast an empty piece of code runs...

However, after adding 'printf("%d", dx)' the generated code for D and C++ is virtually identical, as are the timings. At least on my machine and with my compilers (gdc and g++ on 64-bit Ubuntu).
If you're seeing different results it may just be a difference between your C++ and your D compiler, especially if they're not g++ and gdc or dmc and dmd, i.e. if they don't share the same backend.
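
In other words, the result has to escape somewhere. A minimal D sketch of the corrected test (using the C printf as Frits did, but with a floating-point format; core.time.MonoTime stands in for timeGetTime and is an assumption here):

import core.stdc.stdio : printf;
import core.time : MonoTime;

void main()
{
    auto t0 = MonoTime.currTime;

    float dx = 0;
    foreach (i; 0 .. 1_000)
        foreach (j; 0 .. 1_000)
            foreach (k; 0 .. 10)
                dx++;

    auto ms = (MonoTime.currTime - t0).total!"msecs";

    // Consuming dx here is what stops the compiler from deleting the
    // loops outright; "%f" is the correct format for a floating-point
    // value (see the follow-up below about "%d").
    printf("dx = %f, elapsed = %d ms\n", cast(double) dx, cast(int) ms);
}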
March 18, 2008
Sean Kelly wrote:
> == Quote from BCS (BCS@pathlink.com)'s article
>>> DWORD start = timeGetTime();
>>> 	int i,j,k;
>>> 	float dx=0;
>>>     for(i=0; i<1000;i++)
>>>         for(j=0; j<1000;j++)
>>>             for(k=0; k<10; k++)
>>>                 {
>>>                      dx++;
>>>                 }
>>>     DWORD end = timeGetTime();
>>>
[snip]
> 
> D apps also have more going on in the application initialization phase than C++ apps.  For a real
> apples-apples comparison, you might want to consider using Tango with the "stub" GC plugged in.
> That just calls malloc/free and has no initialization cost, at the expense of no actual garbage collection.
> I'll have to check whether the stub GC compiles with the latest Tango--it's been a while since I used it.

How is the startup time relevant, when he appears to be measuring in-process?
March 18, 2008
Frits van Bommel wrote:
> Matthew Allen wrote:
>>     float dx=0;
[snip]
> However, after adding 'printf("%d", dx)' the generated code for D and 

Oops, that shouldn't be "%d", should it?
Well, it doesn't matter, because I only put that in to keep the compiler from completely optimizing out the loop, but it does explain why I get such weird output :).
March 18, 2008
Frits van Bommel wrote:
>
>>>> DWORD start = timeGetTime();
>>>>     int i,j,k;
>>>>     float dx=0;
>>>>     for(i=0; i<1000;i++)
>>>>         for(j=0; j<1000;j++)
>>>>             for(k=0; k<10; k++)
>>>>                 {
>>>>                      dx++;
>>>>                 }
>>>>     DWORD end = timeGetTime();
>>>>
> [snip]
> 
> How is the startup time relevant, when he appears to be measuring in-process?

The GC time is not, but cache priming and such can make a difference. I have actually worked on code like the above and seen a consistent and significant drop in the second-pass time.
March 18, 2008
Frits van Bommel wrote:

> However, after adding 'printf("%d", dx)' the generated code for D and C++ is virtually identical, as are the timings.

printf is kind of a heavyweight function. How does it compare with some dummy function?
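
One way to try that is to route the result through a cheap sink instead of printf. A sketch (pragma(inline, false) is a later-D feature and an assumption here; the 2008-era compilers would need an equivalent trick, such as calling a function defined in another module):

// A do-nothing consumer the optimizer still has to honor: the store
// to a __gshared variable is an observable side effect.
__gshared double sink;

pragma(inline, false)
void consume(double x)
{
    sink = x;
}

void main()
{
    float dx = 0;
    foreach (i; 0 .. 1_000)
        foreach (j; 0 .. 1_000)
            foreach (k; 0 .. 10)
                dx++;

    // Keeps dx live without printf's formatting overhead. A compiler
    // that constant-folds the whole loop nest could still shortcut
    // this, but that is beyond what the compilers discussed here did.
    consume(dx);
}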
March 18, 2008
Matthew Allen wrote:
> I am looking to use D for programming a high-speed vision application
> which was previously written in C/C++. I have done some arbitrary
> speed tests and am finding that C/C++ seems to be faster than D by a
> factor of about 3. I have done some simple loop tests that increment
> a float value by some number, and also some memory
> allocation/deallocation loops, and C/C++ seems to come out on top
> each time. Is D meant to be faster than, or as fast as, C/C++, and if
> so, how can I optimize the code? I am using -inline, -O, and -release.
> 
> An example of a simple loop test I ran is as follows:
> 
> DWORD start = timeGetTime();
>     int i,j,k;
>     float dx=0;
>     for(i=0; i<1000;i++)
>         for(j=0; j<1000;j++)
>             for(k=0; k<10; k++)
>                 {
>                      dx++;
>                 }
>     DWORD end = timeGetTime();
> 
> In C++ I used ints and doubles. The C++ version came back with a time
> of 15 ms, and the D version with 45 ms.

Loop unrolling could be a big issue here. DMD doesn't do loop unrolling, but that is not a language issue at all, it's an optimizer issue. It's easy enough to check - get the assembler output of the loop from your compiler and post it here.
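
For reference, this is roughly what unrolling the innermost loop by hand looks like (an illustrative sketch, not code from the thread): the ten-iteration k loop becomes straight-line adds, removing the inner loop's compare-and-branch overhead on every j iteration.

import std.stdio : writeln;

void unrolledKernel(ref float dx)
{
    foreach (i; 0 .. 1_000)
        foreach (j; 0 .. 1_000)
        {
            // was: foreach (k; 0 .. 10) dx++;
            dx += 1; dx += 1; dx += 1; dx += 1; dx += 1;
            dx += 1; dx += 1; dx += 1; dx += 1; dx += 1;
        }
}

void main()
{
    float dx = 0;
    unrolledKernel(dx);
    writeln(dx);   // keep the result observable
}

To see what a given compiler actually emitted, the Digital Mars obj2asm tool will disassemble DMD/DMC object files, and gdc/g++ can dump assembly directly with -S.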