May 18, 2010
Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL
Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:

> %u wrote:
>> The DMC++ compiler you mentioned sounds interesting too. I'd like to
>> compare performance with that, the VC++ one, and the Intel compiler.
> 
> When comparing D performance with C++, it is best to compare compilers
> with the same back end, i.e.:
> 
>     dmd with dmc
>     gcc with gdc
>     llvm with ldc
> 
> This is because back ends can vary greatly in the code generated.

What if I'm using a clean room implementation of D with a custom backend 
and no accompanying C compiler, am I not allowed to compare the 
performance with anything?

When people compare C compilers, they usually use the latest Visual 
Studio, gcc, icc, and llvm versions -- i.e. C compilers from various 
vendors. Using the same logic one is not allowed to compare dmc against 
those since it would always lose.
May 18, 2010
Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL
%u:
> One issue I have with the Visual C++ compiler is that it doesn't seem to support
> loop unswitching (i.e. doubling up code with boolean If statements). I wonder if
> one of the D compilers supports it. I started a thread over at cprogramming
> about it here: http://cboard.cprogramming.com/c-programming/126756-lack-compiler-loop-optimization-loop-unswitching.html

In LDC (LLVM) this optimization is the -loop-unswitch pass, and it's enabled by default at -O3 and higher.

--------------------------

Your C++ code cleaned up a bit:


#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double test(bool b) {
	double d = 0.0;
	double u = 0.0;
	for (int n = 0; n < 1000000000; n++) {
		d += u;
		if (b)
			u = sin((double)n);
	}
	return d;
}

int main() {
	bool b = (bool)atoi("1");
	printf("%f\n", test(b));
}


The asm generated for just the test() function:
g++ -O3 -S

__Z4testb:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	subl	$36, %esp
	cmpb	$0, 8(%ebp)
	jne	L2
	fldz
	movl	$1000000000, %eax
	fld	%st(0)
	.p2align 4,,7
L3:
	subl	$1, %eax
	fadd	%st(1), %st
	jne	L3
	fstp	%st(1)
	addl	$36, %esp
	popl	%ebx
	popl	%ebp
	ret
	.p2align 4,,7
L2:
	fldz
	xorl	%ebx, %ebx
	fld	%st(0)
	jmp	L5
	.p2align 4,,7
L9:
	fxch	%st(1)
L5:
	faddp	%st, %st(1)
	movl	%ebx, -12(%ebp)
	addl	$1, %ebx
	fildl	-12(%ebp)
	fstpl	(%esp)
	fstpl	-24(%ebp)
	call	_sin
	cmpl	$1000000000, %ebx
	fldl	-24(%ebp)
	jne	L9
	fstp	%st(1)
	addl	$36, %esp
	popl	%ebx
	popl	%ebp
	ret

-------------------

More aggressive compilation:
g++ -O3 -s -fomit-frame-pointer -msse3 -march=native -ffast-math -S

__Z4testb:
	subl	$4, %esp
	cmpb	$0, 8(%esp)
	jne	L2
	movl	$1000000000, %eax
	.p2align 4,,10
L3:
	decl	%eax
	jne	L3
	fldz
	addl	$4, %esp
	ret
	.p2align 4,,10
L2:
	fldz
	xorl	%eax, %eax
	fld	%st(0)
	.p2align 4,,10
L5:
	movl	%eax, (%esp)
	faddp	%st, %st(1)
	incl	%eax
	fildl	(%esp)
	cmpl	$1000000000, %eax
	fsin
	jne	L5
	fstp	%st(0)
	addl	$4, %esp
	ret

--------------------------

This is a D1 translation:


import tango.math.Math: sin;
import tango.stdc.stdio: printf;
import tango.stdc.stdlib: atoi;

double test(bool b) {
   double d = 0.0;
   double u = 0.0;
   for (int n; n < 1_000_000_000; n++) {
       d += u;
       if (b)
           u = sin(cast(double)n);
   }

   return d;
}

void main() {
   bool b = cast(bool)atoi("1");
   printf("%f\n", test(b));    
}


Compiled with:
ldc -O3 -release -inline test.d
Asm produced, note the je .LBB1_4 near the top:


_D5test54testFbZd:
	pushl	%esi
	subl	$64, %esp
	testb	$1, %al
	je	.LBB1_4
	pxor	%xmm0, %xmm0
	movsd	%xmm0, 32(%esp)
	movl	$1000000000, %esi
	movsd	%xmm0, 24(%esp)
	movsd	%xmm0, 16(%esp)
	.align	16
.LBB1_2:
	movsd	32(%esp), %xmm0
	movsd	%xmm0, 56(%esp)
	fldl	56(%esp)
	fstpt	(%esp)
	call	sinl
	fstpl	48(%esp)
	movsd	24(%esp), %xmm1
	addsd	16(%esp), %xmm1
	movsd	%xmm1, 24(%esp)
	decl	%esi
	movsd	32(%esp), %xmm0
	addsd	.LCPI1_0, %xmm0
	movsd	%xmm0, 32(%esp)
	movsd	48(%esp), %xmm0
	movsd	%xmm0, 16(%esp)
	##FP_REG_KILL
	jne	.LBB1_2
.LBB1_3:
	movsd	24(%esp), %xmm0
	movsd	%xmm0, 40(%esp)
	fldl	40(%esp)
	addl	$64, %esp
	popl	%esi
	ret
.LBB1_4:
	movl	$1000000000, %eax
	.align	16
.LBB1_5:
	decl	%eax
	jne	.LBB1_5
	pxor	%xmm0, %xmm0
	movsd	%xmm0, 24(%esp)
	jmp	.LBB1_3

This runs in about 86 seconds.

--------------------------

Aggressive compilation with LDC:
ldc -O3 -release -inline -enable-unsafe-fp-math -unroll-allow-partial test.d

_D5test54testFbZd:
	subl	$92, %esp
	testb	$1, %al
	je	.LBB1_4
	pxor	%xmm0, %xmm0
	xorl	%eax, %eax
	movapd	%xmm0, %xmm1
	movapd	%xmm0, %xmm2
	.align	16
.LBB1_2:
	leal	1(%eax), %ecx
	cvtsi2sd	%ecx, %xmm3
	movsd	%xmm3, 40(%esp)
	leal	2(%eax), %ecx
	cvtsi2sd	%ecx, %xmm3
	movsd	%xmm3, 48(%esp)
	leal	3(%eax), %ecx
	cvtsi2sd	%ecx, %xmm3
	movsd	%xmm3, 56(%esp)
	leal	4(%eax), %ecx
	cvtsi2sd	%ecx, %xmm3
	movsd	%xmm3, 64(%esp)
	movsd	%xmm0, 80(%esp)
	fldl	80(%esp)
	fsin
	fstpl	72(%esp)
	fldl	40(%esp)
	fsin
	fstpl	8(%esp)
	fldl	48(%esp)
	fsin
	fstpl	16(%esp)
	fldl	56(%esp)
	fsin
	fstpl	24(%esp)
	fldl	64(%esp)
	fsin
	fstpl	32(%esp)
	addsd	%xmm1, %xmm2
	addsd	72(%esp), %xmm2
	addsd	8(%esp), %xmm2
	addsd	16(%esp), %xmm2
	movapd	%xmm2, %xmm1
	addsd	24(%esp), %xmm1
	addl	$5, %eax
	cmpl	$1000000000, %eax
	addsd	.LCPI1_0, %xmm0
	movsd	32(%esp), %xmm2
	##FP_REG_KILL
	jne	.LBB1_2
.LBB1_3:
	movsd	%xmm1, (%esp)
	fldl	(%esp)
	addl	$92, %esp
	ret
.LBB1_4:
	xorl	%eax, %eax
	.align	16
.LBB1_5:
	addl	$10, %eax
	cmpl	$1000000000, %eax
	jne	.LBB1_5
	pxor	%xmm1, %xmm1
	jmp	.LBB1_3


This runs in about 58 seconds. Note also that the loop is partially unrolled: each pass handles five iterations (see the addl $5, %eax).

Here both G++ and LDC are performing loop unswitching.
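
For reference, here is a rough source-level sketch of what unswitching does to the loop above. The compilers apply it to their internal representation, not to the source, so this is only an illustration. The names test_switched and test_unswitched and the count parameter (standing in for the hard-coded 1000000000) are hypothetical, added just for this example:

```cpp
#include <cmath>

// Original shape: the test on b never changes inside the loop,
// yet it is evaluated on every iteration.
double test_switched(bool b, int count) {
	double d = 0.0;
	double u = 0.0;
	for (int n = 0; n < count; n++) {
		d += u;
		if (b)
			u = std::sin((double)n);
	}
	return d;
}

// Hand-unswitched shape: test b once, then run one of two
// specialised copies of the loop.
double test_unswitched(bool b, int count) {
	double d = 0.0;
	double u = 0.0;
	if (b) {
		for (int n = 0; n < count; n++) {
			d += u;
			u = std::sin((double)n);
		}
	} else {
		for (int n = 0; n < count; n++)
			d += u;	// u stays 0.0 in this copy
	}
	return d;
}
```

The else copy corresponds to the short L3 / .LBB1_5 loops in the listings above, and the other copy to the loops that compute the sine.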

Bye,
bearophile
May 18, 2010
Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL
On 18/05/10 20:19, retard wrote:
> What if I'm using a clean room implementation of D with a custom backend
> and no accompanying C compiler, am I not allowed to compare the
> performance with anything?
>
> When people compare C compilers, they usually use the latest Visual
> Studio, gcc, icc, and llvm versions -- i.e. C compilers from various
> vendors. Using the same logic one is not allowed to compare dmc against
> those since it would always lose.

I don't believe Walter is arguing against this methodology. What he is 
arguing against is comparing dmd with gcc for example. Comparing ldc 
with gdc and dmd is fine, comparing dmd with dmc is fine, but when it 
comes to comparing D and C, he believes you should compare compilers 
using the same backend, that is dmd and dmc rather than dmd and gcc. Or 
that's what I took from it.

This said, I don't agree with that methodology, unless it's only a small 
test. If you're comparing lots of C compilers and D you should include 
dmc for example if you're using dmd as the D reference, or clang if 
you're using ldc as a reference. If you're comparing C and D, you should 
stick to compilers with the same backend, otherwise the one with the 
superior backend will always win, and it's not a fair interlanguage 
comparison.
May 18, 2010
Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG,
Robert Clipsham:
> otherwise the one with the 
> superior backend will always win, and it's not a fair interlanguage 
> comparison.

Life isn't fair. Too bad for the one with an inferior back-end.

Bye,
bearophile
May 18, 2010
Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL
retard wrote:
> Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:
> 
>> %u wrote:
>>> The DMC++ compiler you mentioned sounds interesting too. I'd like to
>>> compare performance with that, the VC++ one, and the Intel compiler.
>> When comparing D performance with C++, it is best to compare compilers
>> with the same back end, i.e.:
>>
>>     dmd with dmc
>>     gcc with gdc
>>     llvm with ldc
>>
>> This is because back ends can vary greatly in the code generated.
> 
> What if I'm using a clean room implementation of D with a custom backend 
> and no accompanying C compiler, am I not allowed to compare the 
> performance with anything?

You're allowed to do whatever you want. I'm pointing out that the difference in 
code generator ability should not be misconstrued as a difference in the languages.


> When people compare C compilers, they usually use the latest Visual 
> Studio, gcc, icc, and llvm versions -- i.e. C compilers from various 
> vendors. Using the same logic one is not allowed to compare dmc against 
> those since it would always lose.

It's perfectly reasonable to compare dmc and gcc for code generation quality.
May 18, 2010
Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG,
bearophile wrote:
> Life isn't fair. Too bad for the one with an inferior back-end.

Of course it isn't fair. But if you want to draw useful conclusions from a 
benchmark, you have to do what is known as "isolate the variables". If there are 
two independent variables feeding into performance, you CANNOT draw a conclusion 
about one of them from the performance. In other words, if:

   g = f(x,y)

then knowing g, x and y tells you nothing at all about x's contribution to g.
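
A toy illustration of the point, assuming purely for the sake of the example that f is a simple product of a language factor x and a back-end factor y. The numbers are made up, and the helper names g_a, g_b and ratio_same_backend are hypothetical:

```cpp
// Toy model, an assumption for illustration only: measured performance g
// is the product of a language factor x and a back-end factor y.
double f(double x, double y) { return x * y; }

// Two different (x, y) pairs that give the same measurement g = 6.0,
// so the measurement alone cannot say which factor was responsible.
double g_a() { return f(2.0, 3.0); }
double g_b() { return f(3.0, 2.0); }

// Holding y fixed (same back end on both sides) leaves a ratio that
// depends on x only.
double ratio_same_backend() { return f(2.0, 3.0) / f(1.0, 3.0); }
```

With y held constant the ratio comes out as the ratio of the two x values, which is the reason for pairing compilers that share a back end.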
May 19, 2010
Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL
Tue, 18 May 2010 15:03:43 -0700, Walter Bright wrote:

> retard wrote:
>> Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:
>> 
>>> %u wrote:
>>>> The DMC++ compiler you mentioned sounds interesting too. I'd like to
>>>> compare performance with that, the VC++ one, and the Intel compiler.
>>> When comparing D performance with C++, it is best to compare compilers
>>> with the same back end, i.e.:
>>>
>>>     dmd with dmc
>>>     gcc with gdc
>>>     llvm with ldc
>>>
>>> This is because back ends can vary greatly in the code generated.
>> 
>> What if I'm using a clean room implementation of D with a custom
>> backend and no accompanying C compiler, am I not allowed to compare the
>> performance with anything?
> 
> You're allowed to do whatever you want. I'm pointing out that the
> difference in code generator ability should not be misconstrued as a
> difference in the languages.

It's a rookie mistake to believe that languages have some kind of 
differences performance wise. That kind of comparison was likely useful 
in the 80s when languages and instruction sets had a greater resemblance 
(they were all low level languages). But as you can see from 
bearophile's link 
( http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html ), 
there is a larger performance gap between a naive and a highly tuned 
implementation of the same language than between decent implementations 
of different modern languages.

Developers want to compare dmd with g++ simply because they're not 
interested in D or D's code generator per se. They have a task to solve 
and they want the fastest production ready (stable enough to compile 
their solution) toolchain for the problem - NOW. There is no loyalty 
left. Most mainstream languages contain the same imperative / object 
oriented hybrid core with small functional extensions (closures/lambdas). 
You only need to choose the best for this particular task. Usually 
there's only a limited amount of time left so you may need to guess. You 
just have to evaluate partial information snippets, for instance that dmd 
sucks at inlining closures and Java doesn't do tail call optimization.

Ideally a casual developer studies the language grammar for a few hours 
and then starts writing code. If the language turns out to be bad, he 
just moves on and forgets it unless the toolchain improves later and 
there will be a reddit post about it. That's how I met Perl. With years 
of Pascal/C/C++/Java experience under my belt, I learned that Perl might 
be a perfect tool for extending Apache with our plugin. A few hours of 
studying (the language) + quite a bit more (the API docs) and there I was 
writing Perl - probably really buggy code, but code nonetheless.

There are even languages that consist of visual graphs (the "editor" is 
just a CAD-like GUI) or sentences written in normal English - they don't 
have any kind of link between the target machine and the solution other 
than the abstract computational model. If you encounter a statement such 
as:

 find_longest_common_substring(string1, string2);

you cannot know how fast it is. This kind of code is getting more popular 
and it's called declarative - it doesn't say how it solves its problem, 
it just says what it does. It's also the abstraction level that most 
developers are (should be) using. You may ask whether that statement is 
faster in C than in Python. The Python coder could just use the one 
written in C and invoke it via a foreign function interface. The FFI 
might add a few cycles' worth of overhead, but overall the algorithm is the 
same.
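
To make that concrete, here is one possible body for such a call: a minimal C++ sketch using the textbook dynamic-programming approach. It is only an assumption about what could sit behind the name; the signature and return type are hypothetical, and nothing at the call site tells you this particular algorithm (or its O(len1 * len2) cost) is what you get:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical implementation: longest common substring via dynamic
// programming over the lengths of common suffixes.
std::string find_longest_common_substring(const std::string& a,
                                          const std::string& b) {
	// prev[j] / cur[j]: length of the longest common suffix of the
	// prefix of a processed so far and the first j characters of b.
	std::vector<std::size_t> prev(b.size() + 1, 0);
	std::vector<std::size_t> cur(b.size() + 1, 0);
	std::size_t best_len = 0;	// length of the best match so far
	std::size_t best_end = 0;	// one past its last character in a
	for (std::size_t i = 1; i <= a.size(); ++i) {
		for (std::size_t j = 1; j <= b.size(); ++j) {
			if (a[i - 1] == b[j - 1]) {
				cur[j] = prev[j - 1] + 1;
				if (cur[j] > best_len) {
					best_len = cur[j];
					best_end = i;
				}
			} else {
				cur[j] = 0;
			}
		}
		std::swap(prev, cur);	// current row becomes the previous one
	}
	return a.substr(best_end - best_len, best_len);
}
```

A caller in any language sees the same one-line statement whether this version or a faster suffix-based one is behind it.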
May 20, 2010
Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL
retard wrote:
> It's a rookie mistake to believe that languages have some kind of 
> differences performance wise.

Well, they do. It's also true that these performance differences can be swamped 
by the quality of the implementation, and the ability of the programmer. But 
that doesn't mean there are not inherent performance differences due to the 
semantics the language requires.

It's like car racing. The performance is a combination of 3 factors:

1. the 'formula' for the particular class you're racing in
2. the quality of the construction of the car to that formula
3. the ability of the driver

It's simply wrong to measure the performance and then naively attribute it to 
one of those three, pretending the other two are constant.
May 21, 2010
Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL
Thu, 20 May 2010 10:06:17 -0700, Walter Bright wrote:

> retard wrote:
>> It's a rookie mistake to believe that languages have some kind of
>> differences performance wise.
> 
> Well, they do. It's also true that these performance differences can be
> swamped by the quality of the implementation, and the ability of the
> programmer. But that doesn't mean there are not inherent performance
> differences due to the semantics the language requires.
> 
> It's like car racing. The performance is a combination of 3 factors:
> 
> 1. the 'formula' for the particular class you're racing in
> 2. the quality of the construction of the car to that formula
> 3. the ability of the driver
> 
> It's simply wrong to measure the performance and then naively attribute
> it to one of those three, pretending the other two are constant.

Of course. The language/implementation comparisons are all faulty. You 
also need to model the performance of the programmer by building some 
kind of developer skill profiles and measure how the languages & 
implementations compete against each other in all these skill classes. 
For example the language shootout site favors experienced programmers; 
bad programmers generate code with 2-3 orders of magnitude worse 
performance.