June 15, 2008 Re: is D so slow? | ||||
---|---|---|---|---|
| ||||
Posted in reply to baleog | baleog are you Marco? (same ip) What kind of hardware do you have? Because Marco also had some strange speed problems I couldn't replicate. |
June 15, 2008 Re: is D so slow? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Saaa | "Saaa" <empty@needmail.com> wrote in message news:g340sc$d1j$1@digitalmars.com...
> baleog are you Marco? (same ip)
> What kind of hardware do you have?
> Because Marco also had some strange speed problems I couldn't replicate.
They have the same IP because they both used the web interface. You'll notice that everyone who uses the web interface has the same IP.
|
June 16, 2008 Re: is D so slow? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Unknown W. Brackets | Unknown W. Brackets Wrote:
> What about switches? Your program uses arrays; if you have array bounds checks enabled, that could easily account for the difference.
>
> One way to see is to dump the assembly (I think there's a utility called dumpobj included with dmd) and compare. Obviously, it's doing something differently - there's nothing intrinsically "slower" about the language for sure.
>
> Also - keep in mind that gdc doesn't take advantage of all the optimizations that gcc is able to provide, at least at this time. A couple of bytes can go a long long way if not optimized right.
There's another classic benchmark issue that you could be stumbling over. The sample code you posted throws away the results inside the function.
GCC's C compiler can detect that the result of the computations is not used, and optimize everything out of existence. That kind of difference could easily explain the speed difference you're seeing.
If you're going to do this kind of micro-benchmark, you need to print the result of the computation or otherwise convince the compiler you need the result.
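A minimal sketch of that advice, assuming a current D2 compiler. The `foreach` ranges, `std.stdio.writeln`, and the name `multiplyAndSum` are illustrative assumptions, not part of the posted code. The function returns a checksum of the product matrix and `main` prints it, so the multiplication cannot be discarded as unused work.

```d
import std.stdio : writeln;

// Hypothetical helper: multiply two n-by-n row-major matrices and
// return the sum of the product, so the result is observably used.
float multiplyAndSum(size_t n)
{
    auto xs = new float[n * n];
    auto ys = new float[n * n];
    auto zs = new float[n * n];
    xs[] = 1.0f; // fill every element of xs
    ys[] = 2.0f; // fill every element of ys

    foreach (i; 0 .. n)
        foreach (j; 0 .. n)
        {
            float s = 0.0f;
            foreach (k; 0 .. n)
                s += xs[k + i * n] * ys[j + k * n];
            zs[j + i * n] = s;
        }

    // Checksum over the whole result matrix.
    float total = 0.0f;
    foreach (z; zs)
        total += z;
    return total;
}

void main()
{
    // Printing the checksum makes the computation's result a program output.
    writeln(multiplyAndSum(100));
}
```

A checksum also gives a small chance of noticing a wrong algorithm: with these inputs every product element should be 2*n, so the sum should be 2*n*n*n.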
|
June 16, 2008 Re: is D so slow? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jarrett Billingsley | "Jarrett Billingsley" <kb3ctd2@yahoo.com> wrote in message news:g336hl$10c8$1@digitalmars.com...
> "baleog" <maccarka@yahoo.com> wrote in message news:g32umu$11kq$1@digitalmars.com...
>> Tomas Lindquist Olsen Wrote:
>>
>>> What switches did you use to compile? Not much info you're giving ...
>>
>> Ubuntu-6.06
>> dmd-2.0.14 - 40sec with n=500
>> dmd -O -release -inline test.d
>> gdc-0.24 - 32sec
>> gdmd -O -release test.d
>> and gcc-4.0.3 - 1.5sec
>> gcc test.c
>>
>> so gcc without optimization runs 20 times faster than gdc
>> but i can't find how to suppress array bound checking
>
> Array bounds checking is off as long as you specify -release.
>
> I don't know if your computer is just really, REALLY slow, but out of curiosity I tried running the D program on my computer. It completes in 1.2 seconds.
>
> Also, using malloc/free vs. new/delete shouldn't much matter in this program, because you make all of three allocations, all before any loops. The GC is never going to be called during the program.
I agree, but nonetheless the malloc version runs much faster on my systems (both Linux/Windows, P4 and Core2, all compiled w/ -O -inline -release).
The relative performance difference gets larger as n increases:

n    malloc  GC
100  0.094   0.328
200  0.140   1.859
300  0.203   6.094
400  0.312   14.141
500  0.547   27.625

import std.conv;

void main(string[] args)
{
    if(args.length > 1)
        test(toInt(args[1]));
    else
        printf("usage: mm nnn\n");
}

version(malloc)
{
    import std.c.stdlib;
}

void test(int n)
{
    version(malloc)
    {
        float* xs = cast(float*)malloc(n*n*float.sizeof);
        float* ys = cast(float*)malloc(n*n*float.sizeof);
    }
    else
    {
        float[] xs = new float[n*n];
        float[] ys = new float[n*n];
    }

    for(int i = n-1; i>=0; --i)
    {
        xs[i] = 1.0;
    }
    for(int i = n-1; i>=0; --i)
    {
        ys[i] = 2.0;
    }

    version(malloc)
    {
        float* zs = cast(float*)malloc(n*n*float.sizeof);
    }
    else
    {
        float[] zs = new float[n*n];
    }

    for (int i=0; i<n; ++i)
    {
        for (int j=0; j<n; ++j)
        {
            float s = 0.0;
            for (int k=0; k<n; ++k)
            {
                s = s + (xs[k + (i*n)] * ys[j + (k*n)]);
            }
            zs[j+ (i*n)] = s;
        }
    }

    version(malloc)
    {
        free(zs);
        free(ys);
        free(xs);
    }
    else
    {
        delete xs;
        delete ys;
        delete zs;
    }
}
|
June 16, 2008 Re: is D so slow? | ||||
---|---|---|---|---|
| ||||
Posted in reply to baleog | On 2008-06-15 13:53:30 +0200, baleog <maccarka@yahoo.com> said:
> Thank you for your replies! I used malloc instead of new and run time was about 1sec
But you probably did not understand why... and it seems that neither did others around here...
Indeed it is a subtle pitfall in which it is easy to fall.
When you benchmark
1) print something depending on the result like the sum of everything (it is not the main issue in this case, but doing it would have probably shown the problem), so you can also have at least a tiny chance to notice if your algorithm is wrong
2) NaNs
operations involving NaNs can, depending on the IEEE compliance requested of the processor, be up to 1000 times slower!
D (very thoughtfully, as it makes spotting errors easier) initializes the floating point numbers with NaNs (unlike C).
-> your results follow
if you use malloc, the memory is not initialized with NaNs -> performance
manual malloc in this case is definitely not required
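To illustrate the default-initialization point, here is a small sketch assuming a current D2 compiler and `std.math.isNaN` (neither appears in the post above). It only checks that a `float` without an initializer, and the elements of a freshly allocated `float[]`, start out as NaN. Memory from `malloc` gets no such initialization, so its contents are simply unspecified.

```d
import std.math : isNaN;

void main()
{
    float f;                // no initializer: D sets it to float.init
    auto xs = new float[4]; // every element also starts as float.init

    assert(isNaN(float.init)); // float.init is a NaN
    assert(isNaN(f));
    foreach (x; xs)
        assert(isNaN(x));
}
```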
writing a benchmark can be subtle... benchmarking correct code is easier...
Fawzi
|
June 16, 2008 Re: is D so slow? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Fawzi Mohamed | On 2008-06-16 16:32:56 +0200, Fawzi Mohamed <fmohamed@mac.com> said:
> On 2008-06-15 13:53:30 +0200, baleog <maccarka@yahoo.com> said:
>
>> Thank you for your replies! I used malloc instead of new and run time was about 1sec
>
> But you probably did not understand why... and it seems that neither did others around here...
>
> Indeed it is a subtle pitfall in which it is easy to fall.
>
> When you benchmark
> 1) print something depending on the result like the sum of everything (it is not the main issue in this case, but doing it would have probably shown the problem), so you can also have at least a tiny chance to notice if your algorithm is wrong
>
> 2) NaNs
ehm, sorry...
You do initialize everything...
ehm, never post without testing...
Fawzi
|
June 16, 2008 Re: is D so slow? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dave | "Dave" <Dave_member@pathlink.com> wrote in message news:g34sja$2m1a$1@digitalmars.com...
>
> I agree, but nonetheless the malloc version runs much faster on my systems (both Linux/Windows, P4 and Core2, all compiled w/ -O -inline -release). The relative performance difference gets larger as n increases:
>
> n    malloc  GC
> 100  0.094   0.328
> 200  0.140   1.859
> 300  0.203   6.094
> 400  0.312   14.141
> 500  0.547   27.625
I'm sorry, but using your code, I can't reproduce times anywhere near that. I'm on Windows, DMD, Athlon X2 64. Here are my results:

Phobos:
n    malloc    GC
------------------------
100  0.005206  0.005285
200  0.045083  0.045199
300  0.148954  0.148920
400  0.400136  0.404554
500  0.933754  1.076060

Tango:
n    malloc    GC
------------------------
100  0.005221  0.005298
200  0.045342  0.044910
300  0.150753  0.149157
400  0.402951  0.403343
500  0.946041  1.073466

Tested with both Tango and Phobos to be sure, and the times are not really any different between the two. The malloc and GC times don't really differ until n=500, and even then it's not by much.
|
June 16, 2008 Re: is D so slow? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jarrett Billingsley | Jarrett Billingsley Wrote:
> I'm sorry, but using your code, I can't reproduce times anywhere near that.
Maybe it depends on the hardware? Perhaps the efficiency of `new` depends on the hardware used.
my /proc/cpuinfo:
intel celeron 1.5GHz
flags: fpu, vme, de, tsk, msr, pae, mce, cx8, apic, sep, mtrr, pge, mca, cmov, pat, clflush, dts, acpi, mmx, fxsr, sse, sse2, ss, tm, pbe, nx
|
June 16, 2008 Re: is D so slow? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Saaa | Saaa Wrote:
> baleog are you Marco? (same ip)
No
> What kind of hardware do you have?
HP Compaq nx6110. Ubuntu Linux 6.06
> Because Marco also had some strange speed problems I couldn't replicate.
|
June 16, 2008 Re: is D so slow? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Fawzi Mohamed | On 2008-06-16 16:40:16 +0200, Fawzi Mohamed <fmohamed@mac.com> said:
> On 2008-06-16 16:32:56 +0200, Fawzi Mohamed <fmohamed@mac.com> said:
>
>> On 2008-06-15 13:53:30 +0200, baleog <maccarka@yahoo.com> said:
>>
>>> Thank you for your replies! I used malloc instead of new and run time was about 1sec
>>
>> But you probably did not understand why... and it seems that neither did others around here...
>>
>> Indeed it is a subtle pitfall in which it is easy to fall.
>>
>> When you benchmark
>> 1) print something depending on the result like the sum of everything (it is not the main issue in this case, but doing it would have probably shown the problem), so you can also have at least a tiny chance to notice if your algorithm is wrong
>>
>> 2) NaNs
>
> ehm, sorry...
> You do initialize everything...
> ehm, never post without testing...
>
> Fawzi
I tested... and well I was actually right (I should have trusted my gut feeling a little more...)
NaN is the culprit.
check your algorithm: you initialize (backwards, for some strange reason) just part of the arrays, only the first n of the n*n elements, so the rest stay NaN...
putting
xs[] = 1.0;
ys[] = 2.0;
instead of your strange loops, solves everything...
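A sketch of why the slice assignment matters, assuming a current D2 compiler and `std.math.isNaN`. The helper names `countNaNs`, `fillLikeOriginal` and `fillWhole` are hypothetical, introduced only for this illustration. It counts how many elements are still NaN after each kind of fill.

```d
import std.math : isNaN;
import std.stdio : writeln;

// Hypothetical helper: count the elements that are still NaN.
size_t countNaNs(const(float)[] a)
{
    size_t c = 0;
    foreach (float v; a)
        if (isNaN(v))
            ++c;
    return c;
}

// Same fill as the posted benchmark: only indices 0 .. n-1 are written,
// although the array holds n*n elements.
float[] fillLikeOriginal(int n)
{
    auto xs = new float[n * n];
    for (int i = n - 1; i >= 0; --i)
        xs[i] = 1.0f;
    return xs;
}

// Slice assignment writes every one of the n*n elements.
float[] fillWhole(int n)
{
    auto xs = new float[n * n];
    xs[] = 1.0f;
    return xs;
}

void main()
{
    enum n = 500;
    writeln(countNaNs(fillLikeOriginal(n))); // n*n - n elements remain NaN
    writeln(countNaNs(fillWhole(n)));        // no NaN remains
}
```

With the original loops, all but the first row of each input matrix is NaN, so almost every multiply-add in the inner loop involves a NaN operand.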
Fawzi
|
Copyright © 1999-2021 by the D Language Foundation