February 08, 2009
Denis Koroskin Wrote:

> On Sun, 08 Feb 2009 18:40:53 +0300, Denis Koroskin <2korden@gmail.com> wrote:
> 
> > On Sun, 08 Feb 2009 18:09:41 +0300, naryl <cy@ngs.ru> wrote:
> >
> >> It's a bit off-topic, but I'd be grateful if someone could explain why D with structs completes this simple benchmark (see attachment) so slowly compared to C++ with classes on the stack:
> >>
> >> D struct - 27.85s
> >> C++ stack - 8.32s
> >>
> >> D class - 271.58s
> >> C++ heap - 249.32s
> >>
> >> Compiled with "dmd -O". -release decreases performance by 10% in this case. -inline doesn't affect it at all.
> >
> > I noticed that you calculate Fib(27) in fibs.cc and Fib(40) in fibs.d.
> > Could this account for such a big difference between the C++ and D versions?
> >
> 
> They both perform roughly the same when this typo is corrected.
> 

Sorry. :)

For n=40 I get:
C++ compiled with "g++ -O fibs.cc" - 5.37s
D compiled with "dmd -O -inline fibs.d" - 14.32s
D compiled with "dmd -O -inline -release fibs.d" - 15.20s

DMD 2.023 is still almost three times slower.
February 08, 2009
Hello Yigal,

> I'm curious, What is a heffalump?
> 

http://en.wikipedia.org/wiki/Heffalump


February 08, 2009
Weed wrote:
> (This started here:
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=81359)
> 
> The performance of classes (compared with C++ or with D structs)
> still gives me no rest.

On my system, your struct example averaged 0.36 seconds, and your class example averaged 0.38 seconds.

Your benchmark is flawed in three ways:
1. You're timing allocations in the class example. Use opAddAssign to avoid it.
2. You're passing large structs in the struct example. Use ref to avoid it.
3. c1 is never assigned to, so c1 + c1 + c1 is moved outside the loop. The assignment has no side effect, so the loop is optimized out. Replace "c1 + c1 + c1" with "c1 + c2 + c1", for instance, and the struct example takes 20 seconds rather than 0.36. The original class example takes 57 seconds to do 50 million allocations.

Using those tricks, the class example takes about four seconds on my machine, and the struct example takes two.

For reference, the benchmarks I used:
// class example -------------------------
scope class C {
    int i;
    real[5] unused; // to prevent returning this object in registers

    final void opAddAssign( C src ) {
        this.i += src.i;
    }
}

int main() {
    scope c1 = new C;
    scope c2 = new C;

    // initialise i by "random" value to prevent compile-time calculation
    c1.i = cast(int)&c1;
    c2.i = 0;

    for(int i = 0; i < 50_000_000; ++i)
    {
        c2 += c1;
        c2 += c1;
        c2 += c1;
    }

    return c2.i;
}

// struct example --------------------------
struct C {
    int i;
    real[5] unused; // to prevent returning this object in registers

    void opAddAssign( ref C src ) {
        this.i += src.i;
    }
}

int main() {
    C c1, c2;

    // initialise i by "random" value to prevent compile-time calculation
    c1.i = cast(int)&c1;
    c2.i = 0;

    for(int i = 0; i < 50_000_000; ++i)
    {
        c2 += c1;
        c2 += c1;
        c2 += c1;
    }

    return c2.i;
}
February 08, 2009
Frits van Bommel wrote:
> Which helps (a bit) with the two instances allocated in main(), but is rather unhelpful with the 100_000_000 allocated in opAdd() (they're returned)...

The use of classes in this example is like using a screwdriver as a hammer. It'll work in a pinch, but a hammer works a lot better. If you're allocating 100_000_000 classes in a tight loop, some refactoring looks to be in order.

In particular, classes are *meant* to be used as reference types, but the program is trying to treat them as value types. Virtual functions are orthogonal to what value types are for; a continuing problem in C++ programs is the conflation of value types with reference types.
February 08, 2009
On Sun, 08 Feb 2009 20:24:08 +0300, naryl <cy@ngs.ru> wrote:

> Denis Koroskin Wrote:
>
>> On Sun, 08 Feb 2009 18:40:53 +0300, Denis Koroskin <2korden@gmail.com> wrote:
>>
>> > On Sun, 08 Feb 2009 18:09:41 +0300, naryl <cy@ngs.ru> wrote:
>> >
>> >> It's a bit off-topic, but I'd be grateful if someone could explain why D
>> >> with structs completes this simple benchmark (see attachment) so slowly
>> >> compared to C++ with classes on the stack:
>> >>
>> >> D struct - 27.85s
>> >> C++ stack - 8.32s
>> >>
>> >> D class - 271.58s
>> >> C++ heap - 249.32s
>> >>
>> >> Compiled with "dmd -O". -release decreases performance by 10% in this
>> >> case. -inline doesn't affect it at all.
>> >
>> > I noticed that you calculate Fib(27) in fibs.cc and Fib(40) in fibs.d.
>> > Could this account for such a big difference between the C++ and D versions?
>> >
>>
>> They both perform roughly the same when this typo is corrected.
>>
>
> Sorry. :)
>
> For n=40 I get:
> C++ compiled with "g++ -O fibs.cc" - 5.37s
> D compiled with "dmd -O -inline fibs.d" - 14.32s
> D compiled with "dmd -O -inline -release fibs.d" - 15.20s
>
> DMD 2.023 is still almost three times slower.

Here is the code, along with the settings and results I got.

D version:

import std.stdio;

extern(Windows) int timeGetTime();

struct Fib {
	private int _value;

	int value() {
		if(_value <= 2)
			return 1;

		scope f1 = Fib(_value - 1);
		scope f2 = Fib(_value - 2);

		return f1.value() + f2.value();
	}
}

void main()
{
	int start = timeGetTime();
	
	int value = 0;

	foreach (i; 0 .. 10) {
		value += Fib(40).value;
	}
	
	int stop = timeGetTime();

	writefln("%s", value);
	writefln("Time elapsed: %s", stop - start);
}

C++ version:


#include <stdio.h>
#include <windows.h>

class Fib
{
	private:
		int _value;

	public:
		Fib(int n) { _value = n; }

		int value()
		{
			if(_value <= 2)
				return 1;

			Fib f1 = Fib(_value - 1);
			Fib f2 = Fib(_value - 2);

			return f1.value() + f2.value();
		}
};

int main()
{
	int start = timeGetTime();
	
	int value = 0;

	for(int i=0; i<10; i++)
	{
		Fib x = Fib(40);
		value += x.value();
	}
	
	int stop = timeGetTime();
	
	printf("%d\n", value);
	printf("Time elapsed: %d\n", stop - start);

	return 0;
}

And here are the results (best/average of 3 runs):

DMD2.023 - 12.492/12.576 s (-O -inline)
DMC8.42n - 13.941/14.131 s (-O -inline)


February 08, 2009
Michel Fortin wrote:
> Polymorphism doesn't work very well while passing objects by value, even in C++. This is called the slicing problem.

I have heard about the slicing problem.  I know what it is.  But in all my years of using C++ as my primary language, I have never actually encountered it.  I don't believe it actually exists.


-- 
Rainer Deyke - rainerd@eldwood.com
February 08, 2009
Denis Koroskin Wrote:
> And here are results (best/average of 3 runs):
> 
> DMD2.023 - 12.492/12.576 s (-O -inline)
> DMC8.42n - 13.941/14.131 s (-O -inline)
> 
> 

The only explanation I see is that gcc does some optimization that the dmd backend doesn't.
February 08, 2009
naryl wrote:
> Denis Koroskin Wrote:
>> And here are results (best/average of 3 runs):
>>
>> DMD2.023 - 12.492/12.576 s (-O -inline)
>> DMC8.42n - 13.941/14.131 s (-O -inline)
>>
>>
> 
> The only explanation I see is that gcc does some optimization that the dmd backend doesn't.

That's a good explanation. The only positive thing about the dmd backend (or dmd in general) is its compile speed.
February 08, 2009
Rainer Deyke wrote:
> Michel Fortin wrote:
>> Polymorphism doesn't work very well while passing objects by value, even
>> in C++. This is called the slicing problem.
> 
> I have heard about the slicing problem.  I know what it is.  But in all
> my years of using C++ as my primary language, I have never actually
> encountered it.  I don't believe it actually exists.

How long have you used C++?

This is not a tendentious question. In recent years, the advent of quality smart pointers, an increased scrutiny of use of inheritance, and the teaching of idioms associated with reference types has greatly diminished slicing accidents. But in the olden days, people were inheriting value types left and right because it was the emperor's new clothes.

Andrei
February 08, 2009
On Sun, 08 Feb 2009 23:45:41 +0300, naryl <cy@ngs.ru> wrote:

> Denis Koroskin Wrote:
>> And here are results (best/average of 3 runs):
>>
>> DMD2.023 - 12.492/12.576 s (-O -inline)
>> DMC8.42n - 13.941/14.131 s (-O -inline)
>>
>>
>
> The only explanation I see is that gcc does some optimization that the dmd backend doesn't.

Yes, and that's why you can't claim D is slow just because GCC outperforms DMD.