June 15, 2008
baleog, are you Marco? (same IP)
What kind of hardware do you have?
Because Marco also had some strange speed problems I couldn't replicate.


June 15, 2008
"Saaa" <empty@needmail.com> wrote in message news:g340sc$d1j$1@digitalmars.com...
> baleog, are you Marco? (same IP)
> What kind of hardware do you have?
> Because Marco also had some strange speed problems I couldn't replicate.

They have the same IP because they both used the web interface.  You'll notice that everyone who uses the web interface has the same IP.


June 16, 2008
Unknown W. Brackets Wrote:

> What about switches?  Your program uses arrays; if you have array bounds checks enabled, that could easily account for the difference.
> 
> One way to see is to dump the assembly (I think there's a utility called dumpobj included with dmd) and compare.  Obviously, it's doing something differently - there's nothing intrinsically "slower" about the language, for sure.
> 
> Also - keep in mind that gdc doesn't take advantage of all the optimizations that gcc is able to provide, at least at this time.  A couple of bytes can go a long, long way if not optimized right.

There's another classic benchmark issue that you could be stumbling over.  The sample code you posted throws away the results inside the function.

GCC's C compiler can detect that the results of the computations are never used, and optimize everything out of existence.  That kind of difference could easily explain the speed gap you're seeing.

If you're going to do this kind of micro-benchmark, you need to print the result of the computation or otherwise convince the compiler that you need it.
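
For example, a minimal sketch of the idea (hypothetical names; the only point is that the printed checksum depends on every element of the result, so the work cannot be discarded):

import std.stdio;

void main()
{
    int n = 100;
    float[] zs = new float[n * n];
    for (int i = 0; i < n * n; ++i)
        zs[i] = i * 0.5f;            // stand-in for the real computation
    float check = 0.0f;
    foreach (z; zs)
        check += z;                  // fold the whole result into one value
    writefln("check: %s", check);    // observable output defeats dead-code elimination
}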



June 16, 2008
"Jarrett Billingsley" <kb3ctd2@yahoo.com> wrote in message news:g336hl$10c8$1@digitalmars.com...
> "baleog" <maccarka@yahoo.com> wrote in message news:g32umu$11kq$1@digitalmars.com...
>> Tomas Lindquist Olsen Wrote:
>>
>>> What switches did you use to compile? Not much info you're giving ...
>>
>> Ubuntu-6.06
>> dmd-2.0.14 - 40sec with n=500
>> dmd -O -release -inline test.d
>> gdc-0.24 - 32sec
>> gdmd -O -release test.d
>> and gcc-4.0.3 - 1.5sec
>> gcc test.c
>>
>> so gcc without optimization runs 20 times faster than gdc
>> but I can't find out how to suppress array bounds checking
>
> Array bounds checking is off as long as you specify -release.
>
> I don't know if your computer is just really, REALLY slow, but out of curiosity I tried running the D program on my computer.  It completes in 1.2 seconds.
>
> Also, using malloc/free vs. new/delete shouldn't matter much in this program, because you make all of three allocations, all before any loops. The GC is never going to be called during the program.

I agree, but nonetheless the malloc version runs much faster on my systems (both Linux/Windows, P4 and Core2, all compiled w/ -O -inline -release). The relative performance difference gets larger as n increases:

 n    malloc       GC
------------------------
100   0.094      0.328
200   0.140      1.859
300   0.203      6.094
400   0.312     14.141
500   0.547     27.625

import std.conv;
import std.c.stdio;            // for printf

version(malloc)
{
    import std.c.stdlib;       // malloc/free; selected with -version=malloc
}

void main(string[] args)
{
    if(args.length > 1)
        test(toInt(args[1]));
    else
        printf("usage: mm nnn\n");
}

void test(int n)
{
    version(malloc)
    {
        float* xs = cast(float*)malloc(n*n*float.sizeof);
        float* ys = cast(float*)malloc(n*n*float.sizeof);
    }
    else
    {
        float[] xs = new float[n*n];
        float[] ys = new float[n*n];
    }

    for(int i = n-1; i >= 0; --i) {
        xs[i] = 1.0;
    }
    for(int i = n-1; i >= 0; --i) {
        ys[i] = 2.0;
    }

    version(malloc)
    {
        float* zs = cast(float*)malloc(n*n*float.sizeof);
    }
    else
    {
        float[] zs = new float[n*n];
    }

    // naive n x n matrix multiply: zs = xs * ys
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float s = 0.0;
            for (int k = 0; k < n; ++k) {
                s = s + (xs[k + (i*n)] * ys[j + (k*n)]);
            }
            zs[j + (i*n)] = s;
        }
    }

    version(malloc)
    {
        free(zs);
        free(ys);
        free(xs);
    }
    else
    {
        delete xs;
        delete ys;
        delete zs;
    }
}
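
For reference, assuming the file is saved as mm.d, the two variants would presumably be built as

dmd -O -release -inline mm.d
dmd -O -release -inline -version=malloc mm.d

since the malloc path is selected with the -version switch.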

June 16, 2008
On 2008-06-15 13:53:30 +0200, baleog <maccarka@yahoo.com> said:

> Thank you for your replies! I used malloc instead of new and the run time was about 1 sec

But you probably did not understand why... and it seems that neither did the others around here...

It is indeed a subtle pitfall that is easy to fall into.

When you benchmark:
1) Print something that depends on the result, like the sum of everything (it is not the main issue in this case, but doing so would probably have exposed the problem); that way you also have at least a small chance of noticing if your algorithm is wrong.

2) NaNs
Operations involving NaNs can be 1000 times slower, depending on the IEEE compliance requested of the processor!
D (very thoughtfully, as it makes spotting errors easier) initializes floating point numbers to NaN, unlike C.
-> your results follow

If you use malloc, the memory is not initialized to NaN -> performance.

Manual malloc in this case is definitely not needed.
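
For instance, a minimal sketch that makes the effect visible (the file name nantest.d and the sizes are arbitrary assumptions; time the two runs externally, e.g. with the shell's time command):

import std.stdio;

// Sum over an array that is either left at D's default initializer
// (NaN for floats) or explicitly filled with ones.
// Run as ./nantest nan vs ./nantest ones and compare wall-clock time.
void main(string[] args)
{
    int n = 2000;
    float[] xs = new float[n * n];   // every element starts out as NaN in D
    if (args.length > 1 && args[1] == "ones")
        xs[] = 1.0f;                 // overwrite the NaNs
    float s = 0.0f;
    foreach (x; xs)
        s += x * 1.0001f;            // arithmetic on NaN can be far slower
    writefln("checksum: %s", s);     // printing keeps the loop from being optimized away
}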

Writing a benchmark can be subtle... benchmarking correct code is easier...

Fawzi

June 16, 2008
On 2008-06-16 16:32:56 +0200, Fawzi Mohamed <fmohamed@mac.com> said:

> On 2008-06-15 13:53:30 +0200, baleog <maccarka@yahoo.com> said:
> 
>> Thank you for your replies! I used malloc instead of new and the run time was about 1 sec
> 
> But you probably did not understand why... and it seems that neither did the others around here...
> 
> It is indeed a subtle pitfall that is easy to fall into.
> 
> When you benchmark:
> 1) Print something that depends on the result, like the sum of everything (it is not the main issue in this case, but doing so would probably have exposed the problem); that way you also have at least a small chance of noticing if your algorithm is wrong.
> 
> 2) NaNs

ehm, sorry...
You do initialize everything...
ehm, never post without testing...

Fawzi

June 16, 2008
"Dave" <Dave_member@pathlink.com> wrote in message news:g34sja$2m1a$1@digitalmars.com...

>
> I agree, but nonetheless the malloc version runs much faster on my systems (both Linux/Windows, P4 and Core2, all compiled w/ -O -inline -release). The relative performance difference gets larger as n increases:
>
>  n    malloc       GC
> ------------------------
> 100   0.094      0.328
> 200   0.140      1.859
> 300   0.203      6.094
> 400   0.312     14.141
> 500   0.547     27.625

I'm sorry, but using your code, I can't reproduce times anywhere near that. I'm on Windows, DMD, Athlon X2 64.  Here are my results:

Phobos:

 n    malloc       GC
------------------------
100  0.005206   0.005285
200  0.045083   0.045199
300  0.148954   0.148920
400  0.400136   0.404554
500  0.933754   1.076060

Tango:

 n    malloc       GC
------------------------
100  0.005221   0.005298
200  0.045342   0.044910
300  0.150753   0.149157
400  0.402951   0.403343
500  0.946041   1.073466

Tested with both Tango and Phobos to be sure, and the times are not really any different between the two.

The malloc and GC times don't really differ until n=500, and even then it's not by much.


June 16, 2008
Jarrett Billingsley Wrote:

> I'm sorry, but using your code, I can't reproduce times anywhere near that.

Maybe it depends on hardware? The effectiveness of `new` may depend on the hardware used.

My /proc/cpuinfo:
Intel Celeron 1.5GHz
flags: fpu, vme, de, tsc, msr, pae, mce, cx8, apic, sep, mtrr, pge, mca, cmov, pat, clflush, dts, acpi, mmx, fxsr, sse, sse2, ss, tm, pbe, nx

June 16, 2008
Saaa Wrote:

> baleog, are you Marco? (same IP)
No
> What kind of hardware do you have?
HP Compaq nx6110. Ubuntu Linux 6.06
> Because Marco also had some strange speed problems I couldn't replicate.

June 16, 2008
On 2008-06-16 16:40:16 +0200, Fawzi Mohamed <fmohamed@mac.com> said:

> On 2008-06-16 16:32:56 +0200, Fawzi Mohamed <fmohamed@mac.com> said:
> 
>> On 2008-06-15 13:53:30 +0200, baleog <maccarka@yahoo.com> said:
>> 
>>> Thank you for your replies! I used malloc instead of new and the run time was about 1 sec
>> 
>> But you probably did not understand why... and it seems that neither did the others around here...
>> 
>> It is indeed a subtle pitfall that is easy to fall into.
>> 
>> When you benchmark:
>> 1) Print something that depends on the result, like the sum of everything (it is not the main issue in this case, but doing so would probably have exposed the problem); that way you also have at least a small chance of noticing if your algorithm is wrong.
>> 
>> 2) NaNs
> 
> ehm, sorry...
> You do initialize everything...
> ehm, never post without testing...
> 
> Fawzi

I tested... and well I was actually right (I should have trusted my gut feeling a little more...)

NaN is the culprit.

Check your algorithm: you initialize (backwards, for some strange reason) only part of the arrays...
Putting
 xs[] = 1.0;
 ys[] = 2.0;
instead of your strange loops solves everything...
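
A minimal sketch of the difference (with n standing for the matrix dimension, as in the posted benchmark):

void main()
{
    int n = 500;
    float[] xs = new float[n * n];
    float[] ys = new float[n * n];

    // the posted loops ran i from n-1 down to 0, touching only the
    // first n of the n*n elements and leaving the rest as NaN:
    //     for (int i = n - 1; i >= 0; --i) xs[i] = 1.0;

    // slice assignment initializes every element:
    xs[] = 1.0;
    ys[] = 2.0;
}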

Fawzi