May 18, 2010
Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:

> %u wrote:
>> The DMC++ compiler you mentioned sounds interesting too. I'd like to compare performance with that, the VC++ one, and the Intel compiler.
> 
> When comparing D performance with C++, it is best to compare compilers with the same back end, i.e.:
> 
>     dmd with dmc
>     gcc with gdc
>     lcc with ldc
> 
> This is because back ends can vary greatly in the code generated.

What if I'm using a clean room implementation of D with a custom backend and no accompanying C compiler? Am I not allowed to compare its performance with anything?

When people compare C compilers, they usually use the latest Visual Studio, gcc, icc, and llvm versions -- i.e. C compilers from various vendors. Using the same logic one is not allowed to compare dmc against those since it would always lose.
May 18, 2010
%u:
> One issue I have with the Visual C++ compiler is that it doesn't seem to support
> loop unswitching (i.e. doubling up code with boolean If statements). I wonder if
> one of the D compilers supports it. I started a thread over at cprogramming
> about it here: http://cboard.cprogramming.com/c-programming/126756-lack-compiler-loop-optimization-loop-unswitching.html

In LDC (LLVM) this optimization is named -loop-unswitch and it's enabled by default at -O3 and higher.

--------------------------

Your C++ code cleaned up a bit:


#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double test(bool b) {
	double d = 0.0;
	double u = 0.0;
	for (int n = 0; n < 1000000000; n++) {
		d += u;
		if (b)
		    u = sin((double)n);
	}
	return d;
}

int main() {
    bool b = (bool)atoi("1");
    printf("%f\n", test(b));
}


The asm generated for just the test() function:
g++ -O3 -S

__Z4testb:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	subl	$36, %esp
	cmpb	$0, 8(%ebp)
	jne	L2
	fldz
	movl	$1000000000, %eax
	fld	%st(0)
	.p2align 4,,7
L3:
	subl	$1, %eax
	fadd	%st(1), %st
	jne	L3
	fstp	%st(1)
	addl	$36, %esp
	popl	%ebx
	popl	%ebp
	ret
	.p2align 4,,7
L2:
	fldz
	xorl	%ebx, %ebx
	fld	%st(0)
	jmp	L5
	.p2align 4,,7
L9:
	fxch	%st(1)
L5:
	faddp	%st, %st(1)
	movl	%ebx, -12(%ebp)
	addl	$1, %ebx
	fildl	-12(%ebp)
	fstpl	(%esp)
	fstpl	-24(%ebp)
	call	_sin
	cmpl	$1000000000, %ebx
	fldl	-24(%ebp)
	jne	L9
	fstp	%st(1)
	addl	$36, %esp
	popl	%ebx
	popl	%ebp
	ret

-------------------

More aggressive compilation:
g++ -O3 -s -fomit-frame-pointer -msse3 -march=native -ffast-math -S

__Z4testb:
	subl	$4, %esp
	cmpb	$0, 8(%esp)
	jne	L2
	movl	$1000000000, %eax
	.p2align 4,,10
L3:
	decl	%eax
	jne	L3
	fldz
	addl	$4, %esp
	ret
	.p2align 4,,10
L2:
	fldz
	xorl	%eax, %eax
	fld	%st(0)
	.p2align 4,,10
L5:
	movl	%eax, (%esp)
	faddp	%st, %st(1)
	incl	%eax
	fildl	(%esp)
	cmpl	$1000000000, %eax
	fsin
	jne	L5
	fstp	%st(0)
	addl	$4, %esp
	ret

--------------------------

This is a D1 translation:


import tango.math.Math: sin;
import tango.stdc.stdio: printf;
import tango.stdc.stdlib: atoi;

double test(bool b) {
    double d = 0.0;
    double u = 0.0;
    for (int n; n < 1_000_000_000; n++) {
        d += u;
        if (b)
            u = sin(cast(double)n);
    }

    return d;
}

void main() {
    bool b = cast(bool)atoi("1");
    printf("%f\n", test(b));
}


Compiled with:
ldc -O3 -release -inline test.d
Asm produced; note the je .LBB1_4 near the top:


_D5test54testFbZd:
	pushl	%esi
	subl	$64, %esp
	testb	$1, %al
	je	.LBB1_4
	pxor	%xmm0, %xmm0
	movsd	%xmm0, 32(%esp)
	movl	$1000000000, %esi
	movsd	%xmm0, 24(%esp)
	movsd	%xmm0, 16(%esp)
	.align	16
.LBB1_2:
	movsd	32(%esp), %xmm0
	movsd	%xmm0, 56(%esp)
	fldl	56(%esp)
	fstpt	(%esp)
	call	sinl
	fstpl	48(%esp)
	movsd	24(%esp), %xmm1
	addsd	16(%esp), %xmm1
	movsd	%xmm1, 24(%esp)
	decl	%esi
	movsd	32(%esp), %xmm0
	addsd	.LCPI1_0, %xmm0
	movsd	%xmm0, 32(%esp)
	movsd	48(%esp), %xmm0
	movsd	%xmm0, 16(%esp)
	##FP_REG_KILL
	jne	.LBB1_2
.LBB1_3:
	movsd	24(%esp), %xmm0
	movsd	%xmm0, 40(%esp)
	fldl	40(%esp)
	addl	$64, %esp
	popl	%esi
	ret
.LBB1_4:
	movl	$1000000000, %eax
	.align	16
.LBB1_5:
	decl	%eax
	jne	.LBB1_5
	pxor	%xmm0, %xmm0
	movsd	%xmm0, 24(%esp)
	jmp	.LBB1_3

This runs in about 86 seconds.

--------------------------

Aggressive compilation with LDC:
ldc -O3 -release -inline -enable-unsafe-fp-math -unroll-allow-partial test.d

_D5test54testFbZd:
	subl	$92, %esp
	testb	$1, %al
	je	.LBB1_4
	pxor	%xmm0, %xmm0
	xorl	%eax, %eax
	movapd	%xmm0, %xmm1
	movapd	%xmm0, %xmm2
	.align	16
.LBB1_2:
	leal	1(%eax), %ecx
	cvtsi2sd	%ecx, %xmm3
	movsd	%xmm3, 40(%esp)
	leal	2(%eax), %ecx
	cvtsi2sd	%ecx, %xmm3
	movsd	%xmm3, 48(%esp)
	leal	3(%eax), %ecx
	cvtsi2sd	%ecx, %xmm3
	movsd	%xmm3, 56(%esp)
	leal	4(%eax), %ecx
	cvtsi2sd	%ecx, %xmm3
	movsd	%xmm3, 64(%esp)
	movsd	%xmm0, 80(%esp)
	fldl	80(%esp)
	fsin
	fstpl	72(%esp)
	fldl	40(%esp)
	fsin
	fstpl	8(%esp)
	fldl	48(%esp)
	fsin
	fstpl	16(%esp)
	fldl	56(%esp)
	fsin
	fstpl	24(%esp)
	fldl	64(%esp)
	fsin
	fstpl	32(%esp)
	addsd	%xmm1, %xmm2
	addsd	72(%esp), %xmm2
	addsd	8(%esp), %xmm2
	addsd	16(%esp), %xmm2
	movapd	%xmm2, %xmm1
	addsd	24(%esp), %xmm1
	addl	$5, %eax
	cmpl	$1000000000, %eax
	addsd	.LCPI1_0, %xmm0
	movsd	32(%esp), %xmm2
	##FP_REG_KILL
	jne	.LBB1_2
.LBB1_3:
	movsd	%xmm1, (%esp)
	fldl	(%esp)
	addl	$92, %esp
	ret
.LBB1_4:
	xorl	%eax, %eax
	.align	16
.LBB1_5:
	addl	$10, %eax
	cmpl	$1000000000, %eax
	jne	.LBB1_5
	pxor	%xmm1, %xmm1
	jmp	.LBB1_3


This runs in about 58 seconds. Note also the loop is partially unrolled, five iterations per pass (see the addl $5 and the five fsin instructions).

Here both G++ and LDC are performing loop unswitching.

Bye,
bearophile
May 18, 2010
On 18/05/10 20:19, retard wrote:
> What if I'm using a clean room implementation of D with a custom backend
> and no accompanying C compiler, am I not allowed to compare the
> performance with anything?
>
> When people compare C compilers, they usually use the latest Visual
> Studio, gcc, icc, and llvm versions -- i.e. C compilers from various
> vendors. Using the same logic one is not allowed to compare dmc against
> those since it would always lose.

I don't believe Walter is arguing against this methodology. What he is arguing against is comparing dmd with gcc for example. Comparing ldc with gdc and dmd is fine, comparing dmd with dmc is fine, but when it comes to comparing D and C, he believes you should compare compilers using the same backend, that is dmd and dmc rather than dmd and gcc. Or that's what I took from it.

This said, I don't agree with that methodology, unless it's only a small test. If you're comparing lots of C compilers and D you should include dmc for example if you're using dmd as the D reference, or clang if you're using ldc as a reference. If you're comparing C and D, you should stick to compilers with the same backend, otherwise the one with the superior backend will always win, and it's not a fair interlanguage comparison.
May 18, 2010
Robert Clipsham:
> otherwise the one with the superior backend will always win, and it's not a fair interlanguage comparison.

Life isn't fair. Too bad for the one with an inferior back-end.

Bye,
bearophile
May 18, 2010
retard wrote:
> Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:
> 
>> %u wrote:
>>> The DMC++ compiler you mentioned sounds interesting too. I'd like to
>>> compare performance with that, the VC++ one, and the Intel compiler.
>> When comparing D performance with C++, it is best to compare compilers
>> with the same back end, i.e.:
>>
>>     dmd with dmc
>>     gcc with gdc
>>     lcc with ldc
>>
>> This is because back ends can vary greatly in the code generated.
> 
> What if I'm using a clean room implementation of D with a custom backend and no accompanying C compiler, am I not allowed to compare the performance with anything?

You're allowed to do whatever you want. I'm pointing out that the difference in code generator ability should not be misconstrued as a difference in the languages.


> When people compare C compilers, they usually use the latest Visual Studio, gcc, icc, and llvm versions -- i.e. C compilers from various vendors. Using the same logic one is not allowed to compare dmc against those since it would always lose.

It's perfectly reasonable to compare dmc and gcc for code generation quality.
May 18, 2010
bearophile wrote:
> Life isn't fair. Too bad for the one with a inferior back-end.

Of course it isn't fair. But if you want to draw useful conclusions from a benchmark, you have to do what is known as "isolate the variables". If there are two independent variables feeding into performance, you CANNOT draw a conclusion about one of them from the performance. In other words, if:

   g = f(x,y)

then knowing only g tells you nothing at all about x's contribution to g.
May 19, 2010
Tue, 18 May 2010 15:03:43 -0700, Walter Bright wrote:

> retard wrote:
>> Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:
>> 
>>> %u wrote:
>>>> The DMC++ compiler you mentioned sounds interesting too. I'd like to compare performance with that, the VC++ one, and the Intel compiler.
>>> When comparing D performance with C++, it is best to compare compilers with the same back end, i.e.:
>>>
>>>     dmd with dmc
>>>     gcc with gdc
>>>     lcc with ldc
>>>
>>> This is because back ends can vary greatly in the code generated.
>> 
>> What if I'm using a clean room implementation of D with a custom backend and no accompanying C compiler, am I not allowed to compare the performance with anything?
> 
> You're allowed to do whatever you want. I'm pointing out that the difference in code generator ability should not be misconstrued as a difference in the languages.

It's a rookie mistake to believe that languages have some kind of differences performance wise. That kind of comparison was likely useful in the 80s when languages and instruction sets had a greater resemblance (they were all low level languages). But as you can see from bearophile's link ( http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html ), there is a larger performance gap between a naive and a highly tuned implementation of the same language than between decent implementations of different modern languages.

The reason developers want to compare dmd with g++ is simply that they're not interested in D or D's code generator per se. They have a task to solve and they want the fastest production ready (stable enough to compile their solution) toolchain for the problem - NOW. There is no loyalty left. Most mainstream languages contain the same imperative / object oriented hybrid core with small functional extensions (closures/lambdas). You only need to choose the best one for this particular task. Usually there's only a limited amount of time left, so you may need to guess. You just have to evaluate partial snippets of information, for instance that dmd sucks at inlining closures and Java doesn't do tail call optimization.

Ideally a casual developer studies the language grammar for a few hours and then starts writing code. If the language turns out to be bad, he just moves on and forgets it, unless the toolchain improves later and a reddit post appears about it. That's how I met Perl. With years of Pascal/C/C++/Java experience under my belt, I learned that Perl might be a perfect tool for extending apache with our plugin. A few hours of studying (the language) + quite a bit more (the API docs) and there I was writing Perl - probably really buggy code, but code nonetheless.

There are even languages that consist of visual graphs (the "editor" is just a CAD-like GUI) or sentences written in plain English - they don't have any kind of link between the target machine and the solution other than the abstract computational model. If you encounter a statement such as:

  find_longest_common_substring(string1, string2);

you cannot know how fast it is. This kind of code is getting more popular and it's called declarative - it doesn't tell how it solves its problem, it just tells what it does. It's also the abstraction level that most developers are (should be) using. You may ask whether that statement is faster in C than in Python. The Python coder could just use the one written in C and invoke it via a foreign function interface. The FFI might add a few cycles worth of overhead, but overall the algorithm is the same.
May 20, 2010
retard wrote:
> It's a rookie mistake to believe that languages have some kind of differences performance wise.

Well, they do. It's also true that these performance differences can be swamped by the quality of the implementation, and the ability of the programmer. But that doesn't mean there are not inherent performance differences due to the semantics the language requires.

It's like car racing. The performance is a combination of 3 factors:

1. the 'formula' for the particular class you're racing in
2. the quality of the construction of the car to that formula
3. the ability of the driver

It's simply wrong to measure the performance and then naively attribute it to one of those three, pretending the other two are constant.
May 21, 2010
Thu, 20 May 2010 10:06:17 -0700, Walter Bright wrote:

> retard wrote:
>> It's a rookie mistake to believe that languages have some kind of differences performance wise.
> 
> Well, they do. It's also true that these performance differences can be swamped by the quality of the implementation, and the ability of the programmer. But that doesn't mean there are not inherent performance differences due to the semantics the language requires.
> 
> It's like car racing. The performance is a combination of 3 factors:
> 
> 1. the 'formula' for the particular class you're racing in
> 2. the quality of the construction of the car to that formula
> 3. the ability of the driver
> 
> It's simply wrong to measure the performance and then naively attribute it to one of those three, pretending the other two are constant.

Of course. The language/implementation comparisons are all faulty. You also need to model the performance of the programmer by building some kind of developer skill profiles and measure how the languages & implementations compete against each other in all these skill classes. For example the language shootout site favors experienced programmers; bad programmers generate code with 2-3 orders of magnitude worse performance.