Search
speed of polymorphic calls
Apr 13, 2005
Marcio
Apr 13, 2005
Sean Kelly
Apr 14, 2005
Manfred Nowak
Apr 14, 2005
Sean Kelly
Apr 14, 2005
Dave
Apr 14, 2005
Manfred Nowak
Apr 14, 2005
Dave
Apr 14, 2005
Manfred Nowak
Apr 14, 2005
Manfred Nowak
Apr 14, 2005
Dave
Apr 14, 2005
Marcio
Apr 14, 2005
Dave
Apr 14, 2005
Marcio
Apr 14, 2005
Dave
Apr 14, 2005
pragma
Apr 14, 2005
Dave
```Hi,

I was comparing the speed of polymorphic calls in D versus other
languages (SmartEiffel, for example) and noticed that D's numbers do not
shine.

Having a Smalltalk background, I used the favorite little benchmark
of Smalltalkers: fibonacci (tests polymorphic sends and integer
arithmetic). The sources for SmartEiffel and D are at the end.

SmartEiffel 2.1b3 computes fib(12) in a loop of 1,000,000 times in
3.04 seconds.

D computes fib(12) in a loop of 1,000,000 times in 7 seconds.

(Java JRE 1.4.1_01 does it in 3.73 seconds thanks to its JIT)

Is there any hope that the generated D code will be
improved/optimized ? Maybe D's OO runtime could incorporate Polymorphic
Inline Caches etc ? ( http://c2.com/cgi/wiki?PolymorphicInlineCaches )

Note: if you switch to a pure functional approach (get rid of the
class around fibr etc) the number comes down to 2 seconds.

Cheers,

marcio

---------------------- cut here --------------------------------
class FIBOO -- eiffel wants all uppercase letters for class name

creation
make

feature
make is
local
n, count, fib_value, i: INTEGER;
start_time, end_time : MICROSECOND_TIME
do
print( "Fibonacci (recursive, OO) in SmartEiffel%N" )
n := argument(1).to_integer
count := argument(2).to_integer

start_time.update
from i := 0 until i = count loop
fib_value := fibr (n)
i := i + 1
end
end_time.update
print( "%N " + count.to_string + " times:  fibOO(" +
n.to_string + ")= " + fib_value.to_string)
print( "%N time (s)= " + (start_time.elapsed_seconds
(end_time)).to_string)
end -- make

fibr (n : INTEGER) : INTEGER is
do
if (n <= 2) then
Result := 1
else
Result := fibr (n-1) + fibr (n-2)
end
end -- fibr

end -- class FIBOO
---------------------- cut here --------------------------------

import std.c.stdio;
import std.string;
import std.date;

/*
Compute fibonacci recursively, in an OO-way ,
in D ( http://www.digitalmars.com/d/index.html )
*/

class Fib {
int fibr(int n) {
if (n <= 2) return(1);
else return fibr(n-1)+fibr(n-2);
}
}

int main (char[][] args) {
if (args.length<3) {
printf ("fib <num> <loopCount>");
return -1;
}
int num = atoi (args[1]);
int loopCount = atoi (args[2]);
Fib f = new Fib();
int result;
d_time theStart = getUTCtime();
for (int i = 0; i < loopCount; i++) {
result = f.fibr(num);
}
d_time theEnd = getUTCtime();
float totalTimeInSeconds = (theEnd - theStart) / TicksPerSecond;
printf("fibOO(%i)=%i  (%i times in %f s)", num, result, loopCount,
totalTimeInSeconds);
return 0;
}
---------------------- cut here --------------------------------

```
```I tried this out just to test a theory and the results surprised me.  I altered the code a bit to this:

import std.c.stdio;
import std.c.stdlib;
import std.c.string;
import std.c.time;

/*
Compute fibonacci recursively, in an OO-way ,
in D ( http://www.digitalmars.com/d/index.html )
*/

class Fib {
int fibr(int n) {
if (n <= 2) return(1);
else return fibr(n-1)+fibr(n-2);
}
}

int main (char[][] args) {
if (args.length<3) {
printf ("fib <num> <loopCount>");
return -1;
}
int num = atoi (args[1]);
int loopCount = atoi (args[2]);
Fib f = new Fib();
int result;
clock_t theStart = clock();
for (int i = 0; i < loopCount; i++) {
result = f.fibr(num);
}
clock_t theEnd = clock();
double totalTimeInSeconds = cast(double) (theEnd - theStart) / CLOCKS_PER_SEC;
printf("fibOO(%i)=%i  (%i times in %f s)", num, result, loopCount,
totalTimeInSeconds);
return 0;
}

I compiled with options:

-release -O -inline

and ran:

fib 12 1000000

twice.  The average time was ~4.4 seconds.  Next, I edited the code to make Fin.fibr a final method, re-compiled, and re-ran.  The result?  An average time of ~8.8 seconds!  Why in the world is declaring the method "final" doubling execution time?

Sean

```
```Sean Kelly <sean@f4.ca> wrote:

[...]
> Why in the world
> is declaring the method "final" doubling execution time?

By a mistake of yours?

My test did not confirm your result, because declaring `fibr'
`final' caused a significant drop of
11% execution time compiled by dmd 0.120 from 2.422s to 2.156s
15% execution time compiled by gdc 0.10  from 1.953s to 1.656s

-manfred
```
```Marcio <mqmnews321@sglebs.com> wrote:

[...]
>          D computes fib(12) in a loop of 1,000,000 times in 7
>          seconds.
[...]

Did you use `-release -O -inline'? This seems to reduce execution time by more than 60%.

-manfred
```
```In article <d3kjd9\$2sc7\$1@digitaldaemon.com>, Manfred Nowak says...
>
>Sean Kelly <sean@f4.ca> wrote:
>
>[...]
>> Why in the world
>> is declaring the method "final" doubling execution time?
>
>By a mistake of yours?
>
>My test did not confirm your result, because declaring `fibr'
>`final' caused a significant drop of
>11% execution time compiled by dmd 0.120 from 2.422s to 2.156s
>15% execution time compiled by gdc 0.10  from 1.953s to 1.656s

I must have messed something up doing my first test.  When I run it now, the standard implementation averages ~4.0 seconds and adding 'final' reduces the time to ~3.3 seconds.

Sean

```
```How were they compiled?

DMD should have been: dmd -O -inline -release <file_name>

I get the same results with DMD vs. the latest SmartEiffel v2-1 downloaded from the loria Site (with the same code as below).

If I finalize or make the fibr() method private, DMD is actually faster.

I compiled the SE code with: compile -boost -no_split -O3 fib.e -o fib_e

The compiler options for DMD can be viewed by just typing 'dmd' at the command line. They are also here: http://digitalmars.com/d/dcompiler.html

- Dave

In article <Xns9637C09FD665Fmyidtoken@63.105.9.61>, Marcio says...
>
>Hi,
>
>    	I was comparing the speed of polymorphic calls in D versus other
>languages (SmartEiffel, for example) and noticed that D's numbers do not shine.
>
>    	Having a Smalltalk background, I used the favorite little benchmark
>of Smalltalkers: fibonacci (tests polymorphic sends and integer arithmetic). The sources for SmartEiffel and D are at the end.
>
>    	SmartEiffel 2.1b3 computes fib(12) in a loop of 1,000,000 times in
>3.04 seconds.
>
>    	D computes fib(12) in a loop of 1,000,000 times in 7 seconds.
>
>    	(Java JRE 1.4.1_01 does it in 3.73 seconds thanks to its JIT)
>
>    	Is there any hope that the generated D code will be
>improved/optimized ? Maybe D's OO runtime could incorporate Polymorphic Inline Caches etc ? ( http://c2.com/cgi/wiki?PolymorphicInlineCaches )
>
>    	Note: if you switch to a pure functional approach (get rid of the
>class around fibr etc) the number comes down to 2 seconds.
>
>    	Cheers,
>
>marcio
>
>---------------------- cut here --------------------------------
>class FIBOO -- eiffel wants all uppercase letters for class name
>
>creation
>    make
>
>feature
>	make is
>		local
>			n, count, fib_value, i: INTEGER;
>			start_time, end_time : MICROSECOND_TIME
>		do
>			print( "Fibonacci (recursive, OO) in SmartEiffel%N" )
>			n := argument(1).to_integer
>			count := argument(2).to_integer
>
> 			start_time.update
>			from i := 0 until i = count loop
>	 		   fib_value := fibr (n)
>			   i := i + 1
>			end
>			end_time.update
>			print( "%N " + count.to_string + " times:  fibOO(" +
>n.to_string + ")= " + fib_value.to_string)
>			print( "%N time (s)= " + (start_time.elapsed_seconds
>(end_time)).to_string)
>		end -- make
>
>	fibr (n : INTEGER) : INTEGER is
>	    do
>	      if (n <= 2) then
>	         Result := 1
>	      else
>	         Result := fibr (n-1) + fibr (n-2)
>	      end
>	    end -- fibr
>
>end -- class FIBOO
>---------------------- cut here --------------------------------
>
>import std.c.stdio;
>import std.string;
>import std.date;
>
>/*
>  Compute fibonacci recursively, in an OO-way ,
>  in D ( http://www.digitalmars.com/d/index.html )
>*/
>
>class Fib {
>	int fibr(int n) {
>	  if (n <= 2) return(1);
>	  else return fibr(n-1)+fibr(n-2);
>	}
>}
>
>int main (char[][] args) {
>	if (args.length<3) {
>		printf ("fib <num> <loopCount>");
>		return -1;
>	}
>	int num = atoi (args[1]);
>	int loopCount = atoi (args[2]);
>	Fib f = new Fib();
>	int result;
>	d_time theStart = getUTCtime();
>	for (int i = 0; i < loopCount; i++) {
>		result = f.fibr(num);
>	}
>	d_time theEnd = getUTCtime();
>	float totalTimeInSeconds = (theEnd - theStart) / TicksPerSecond;
>	printf("fibOO(%i)=%i  (%i times in %f s)", num, result, loopCount,
>totalTimeInSeconds);
>	return 0;
>}
>---------------------- cut here --------------------------------
>
>

```
```In article <d3kjd9\$2sc7\$1@digitaldaemon.com>, Manfred Nowak says...
>
>Sean Kelly <sean@f4.ca> wrote:
>
>[...]
>> Why in the world
>> is declaring the method "final" doubling execution time?
>
>By a mistake of yours?
>
>My test did not confirm your result, because declaring `fibr'
>`final' caused a significant drop of
>11% execution time compiled by dmd 0.120 from 2.422s to 2.156s
>15% execution time compiled by gdc 0.10  from 1.953s to 1.656s
>
>-manfred

What compiler options?

W/o 'final' or 'private' DMD is the same as GDC (well, 5% diff, not 25% like

W/ final or private, DMD is actually a tad bit faster.

- Dave

```
```Dave <Dave_member@pathlink.com> wrote in news:d3kn2c\$2ul7\$1@digitaldaemon.com:

>
> How were they compiled?

D: I had used just -O

SmartEiffel: with this BAT:

rem Don't forget to activate the MS tools first:
rem  "C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin
\vcvars32.bat"

compile -boost -no_gc -relax -no_split fiboo

> DMD should have been: dmd -O -inline -release <file_name>

With -O -inline -release Now I get this:

[OO]
dmd -O -inline -release fibOO.d

fibOO 12 1000000
fibOO(12)=144  (1000000 times in 4.000000 s)

[non-OO]
dmd -O -inline -release fib.d

fib 12 1000000
fib(12)=144  (1000000 times in 2.000000 s)

D's OO version is still a bit slower than SmartEiffel:

fibOO 12 1000000
Fibonacci (recursive, OO) in SmartEiffel
1000000 times:  fibOO(12)= 144
time (s)= 3.766000

And still a bit slower than Java (source at the end):

java FibOO 12 1000000
Fibonacci (recursive, OO) in Java
1000000 times: fibOO(12)= 144
time(ms)= 3641

I am guessing that SmartEiffel is doing its awesome compile-time
optimizations of polymorphic calls (when they are not really polymorphic),
and Java is doing its smart JIT thing.

Question is: shouldn't D be as fast as (or faster than) SmartEiffel ?

marcio
---------

public class FibOO {
//	Fibonacci benchmark done in an OO way

public int fibr(int n) {
if (n <= 2)
return 1;
else
return fibr(n - 1) + fibr(n - 2);
}

public static void main(String[] args) {
System.out.println("Fibonacci (recursive, OO) in Java");
int n = Integer.parseInt(args[0]);
int count = Integer.parseInt(args[1]);
FibOO fibOO= new FibOO();
int fib = 0;
long start = System.currentTimeMillis();
for (int i = 0 ; i < count ; i++) {
fib = fibOO.fibr(n);
}
long stop = System.currentTimeMillis();
System.out.println(" " + count + " times: fibOO(" + n + ")= " + fib);
System.out.println(" time(ms)= " + (stop - start));
}
}
```
```Yes - there does seem to be a penalty for OO in any case for this test. The reason is probably because all methods are considered virtual in D by default. One of the things pointed out by others in this thread is that "finalizing" the class or method gets rid of some of this penalty and makes up the difference between D and either SE or Java.

For example final class Fib { ... } or final fibr(...)

IIRC, it's been a goal stated in the past for the compiler to "auto-finalize" if it can. Because of the language design this is something D compilers can readily do, even across source files. But it hasn't been implemented yet - the D compiler is not at v1.0 yet (but that's not to imply that v1.0 will do auto-finalization either - I don't know).

If you're interested in taking a look at how D does vs. other languages in general, the best source right now is probably:

http://shootout.alioth.debian.org/great/benchmark.php?test=all&lang=all&sort=fullcpu

Thanks,
- Dave

"Marcio" <mqmnews321@sglebs.com> wrote in message news:Xns96385C92189CDmyidtoken@63.105.9.61...
> Dave <Dave_member@pathlink.com> wrote in news:d3kn2c\$2ul7\$1@digitaldaemon.com:
>
>>
>> How were they compiled?
>
>    D: I had used just -O
>
>    SmartEiffel: with this BAT:
>
> rem Don't forget to activate the MS tools first:
> rem  "C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin
> \vcvars32.bat"
>
> compile -boost -no_gc -relax -no_split fiboo
>
>
>
>> DMD should have been: dmd -O -inline -release <file_name>
>
>
>    With -O -inline -release Now I get this:
>
> [OO]
> dmd -O -inline -release fibOO.d
>
> fibOO 12 1000000
> fibOO(12)=144  (1000000 times in 4.000000 s)
>
>
> [non-OO]
> dmd -O -inline -release fib.d
>
> fib 12 1000000
> fib(12)=144  (1000000 times in 2.000000 s)
>
>
> D's OO version is still a bit slower than SmartEiffel:
>
> fibOO 12 1000000
> Fibonacci (recursive, OO) in SmartEiffel
> 1000000 times:  fibOO(12)= 144
> time (s)= 3.766000
>
>
>
> And still a bit slower than Java (source at the end):
>
> java FibOO 12 1000000
> Fibonacci (recursive, OO) in Java
> 1000000 times: fibOO(12)= 144
> time(ms)= 3641
>
>
> I am guessing that SmartEiffel is doing its awesome compile-time optimizations of polymorphic calls (when they are not really polymorphic), and Java is doing its smart JIT thing.
>
> Question is: shouldn't D be as fast as (or faster than) SmartEiffel ?
>
> marcio
> ---------
>
> public class FibOO {
> // Fibonacci benchmark done in an OO way
>
>  public int fibr(int n) {
> if (n <= 2)
> return 1;
> else
> return fibr(n - 1) + fibr(n - 2);
>  }
>
>  public static void main(String[] args) {
> System.out.println("Fibonacci (recursive, OO) in Java");
> int n = Integer.parseInt(args[0]);
> int count = Integer.parseInt(args[1]);
> FibOO fibOO= new FibOO();
> int fib = 0;
> long start = System.currentTimeMillis();
> for (int i = 0 ; i < count ; i++) {
> fib = fibOO.fibr(n);
> }
> long stop = System.currentTimeMillis();
> System.out.println(" " + count + " times: fibOO(" + n + ")= " + fib);
> System.out.println(" time(ms)= " + (stop - start));
>  }
> }

```
```"Dave" <Dave_member@pathlink.com> wrote in news:d3m1oa\$15de\$1@digitaldaemon.com:

> IIRC, it's been a goal stated in the past for the compiler to "auto-finalize" if it can. Because of the language design this is something D compilers can readily do, even across source files. But it hasn't been implemented yet - the D compiler is not at v1.0 yet (but that's not to imply that v1.0 will do auto-finalization either - I don't know).

It will be interesting to see when it gets added.

I also wonder if global analysis & this optimization can be done at all if
you allow the system to be extended via "OO DLLs". I mean components written
in your OO language which can be linked in at runtime. For example, OTI/IBM
Smalltalk's ICX stuff, or Java JARs etc. Does D support this ? Does it plan
to ? Interestingly enough, SmartEiffel does not support this feature, but
only a "monolythic app". It will be interesting to see what D will be capable
of doing in this area... speed versus runtime OO extensions... Will it be
able to do both ?

marcio
```
« First   ‹ Prev
1 2