May 18, 2010 Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:
> %u wrote:
>> The DMC++ compiler you mentioned sounds interesting too. I'd like to compare performance with that, the VC++ one, and the Intel compiler.
>
> When comparing D performance with C++, it is best to compare compilers with the same back end, i.e.:
>
> dmd with dmc
> gcc with gdc
> lcc with ldc
>
> This is because back ends can vary greatly in the code generated.
What if I'm using a clean room implementation of D with a custom backend and no accompanying C compiler, am I not allowed to compare the performance with anything?
When people compare C compilers, they usually use the latest Visual Studio, gcc, icc, and llvm versions -- i.e. C compilers from various vendors. Using the same logic one is not allowed to compare dmc against those since it would always lose.
|
May 18, 2010 Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL | ||||
---|---|---|---|---|
| ||||
Posted in reply to %u | %u:
> One issue I have with the Visual C++ compiler is that it doesn't seem to support
> loop unswitching (i.e. doubling up code with boolean If statements). I wonder if
> one of the D compilers supports it. I started a thread over at cprogramming
> about it here: http://cboard.cprogramming.com/c-programming/126756-lack-compiler-loop-optimization-loop-unswitching.html
In LDC (LLVM) the pass for this optimization is named -loop-unswitch and it is enabled by default at -O3 and higher.
--------------------------
Your C++ code cleaned up a bit:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double test(bool b) {
    double d = 0.0;
    double u = 0.0;
    for (int n = 0; n < 1000000000; n++) {
        d += u;
        if (b)
            u = sin((double)n);
    }
    return d;
}

int main() {
    bool b = (bool)atoi("1");
    printf("%f\n", test(b));
}

The asm generated for just the test() function, compiled with:
g++ -O3 -S

__Z4testb:
    pushl %ebp
    movl %esp, %ebp
    pushl %ebx
    subl $36, %esp
    cmpb $0, 8(%ebp)
    jne L2
    fldz
    movl $1000000000, %eax
    fld %st(0)
    .p2align 4,,7
L3:
    subl $1, %eax
    fadd %st(1), %st
    jne L3
    fstp %st(1)
    addl $36, %esp
    popl %ebx
    popl %ebp
    ret
    .p2align 4,,7
L2:
    fldz
    xorl %ebx, %ebx
    fld %st(0)
    jmp L5
    .p2align 4,,7
L9:
    fxch %st(1)
L5:
    faddp %st, %st(1)
    movl %ebx, -12(%ebp)
    addl $1, %ebx
    fildl -12(%ebp)
    fstpl (%esp)
    fstpl -24(%ebp)
    call _sin
    cmpl $1000000000, %ebx
    fldl -24(%ebp)
    jne L9
    fstp %st(1)
    addl $36, %esp
    popl %ebx
    popl %ebp
    ret

-------------------
More aggressive compilation:
g++ -O3 -s -fomit-frame-pointer -msse3 -march=native -ffast-math -S

__Z4testb:
    subl $4, %esp
    cmpb $0, 8(%esp)
    jne L2
    movl $1000000000, %eax
    .p2align 4,,10
L3:
    decl %eax
    jne L3
    fldz
    addl $4, %esp
    ret
    .p2align 4,,10
L2:
    fldz
    xorl %eax, %eax
    fld %st(0)
    .p2align 4,,10
L5:
    movl %eax, (%esp)
    faddp %st, %st(1)
    incl %eax
    fildl (%esp)
    cmpl $1000000000, %eax
    fsin
    jne L5
    fstp %st(0)
    addl $4, %esp
    ret

--------------------------
This is a D1 translation:

import tango.math.Math: sin;
import tango.stdc.stdio: printf;
import tango.stdc.stdlib: atoi;

double test(bool b) {
    double d = 0.0;
    double u = 0.0;
    for (int n; n < 1_000_000_000; n++) {
        d += u;
        if (b)
            u = sin(cast(double)n);
    }
    return d;
}

void main() {
    bool b = cast(bool)atoi("1");
    printf("%f\n", test(b));
}

Compiled with:
ldc -O3 -release -inline test.d

Asm produced, note the je .LBB1_4 near the top:

_D5test54testFbZd:
    pushl %esi
    subl $64, %esp
    testb $1, %al
    je .LBB1_4
    pxor %xmm0, %xmm0
    movsd %xmm0, 32(%esp)
    movl $1000000000, %esi
    movsd %xmm0, 24(%esp)
    movsd %xmm0, 16(%esp)
    .align 16
.LBB1_2:
    movsd 32(%esp), %xmm0
    movsd %xmm0, 56(%esp)
    fldl 56(%esp)
    fstpt (%esp)
    call sinl
    fstpl 48(%esp)
    movsd 24(%esp), %xmm1
    addsd 16(%esp), %xmm1
    movsd %xmm1, 24(%esp)
    decl %esi
    movsd 32(%esp), %xmm0
    addsd .LCPI1_0, %xmm0
    movsd %xmm0, 32(%esp)
    movsd 48(%esp), %xmm0
    movsd %xmm0, 16(%esp)
    ##FP_REG_KILL
    jne .LBB1_2
.LBB1_3:
    movsd 24(%esp), %xmm0
    movsd %xmm0, 40(%esp)
    fldl 40(%esp)
    addl $64, %esp
    popl %esi
    ret
.LBB1_4:
    movl $1000000000, %eax
    .align 16
.LBB1_5:
    decl %eax
    jne .LBB1_5
    pxor %xmm0, %xmm0
    movsd %xmm0, 24(%esp)
    jmp .LBB1_3

This runs in about 86 seconds.
--------------------------
Aggressive compilation with LDC:
ldc -O3 -release -inline -enable-unsafe-fp-math -unroll-allow-partial test.d

_D5test54testFbZd:
    subl $92, %esp
    testb $1, %al
    je .LBB1_4
    pxor %xmm0, %xmm0
    xorl %eax, %eax
    movapd %xmm0, %xmm1
    movapd %xmm0, %xmm2
    .align 16
.LBB1_2:
    leal 1(%eax), %ecx
    cvtsi2sd %ecx, %xmm3
    movsd %xmm3, 40(%esp)
    leal 2(%eax), %ecx
    cvtsi2sd %ecx, %xmm3
    movsd %xmm3, 48(%esp)
    leal 3(%eax), %ecx
    cvtsi2sd %ecx, %xmm3
    movsd %xmm3, 56(%esp)
    leal 4(%eax), %ecx
    cvtsi2sd %ecx, %xmm3
    movsd %xmm3, 64(%esp)
    movsd %xmm0, 80(%esp)
    fldl 80(%esp)
    fsin
    fstpl 72(%esp)
    fldl 40(%esp)
    fsin
    fstpl 8(%esp)
    fldl 48(%esp)
    fsin
    fstpl 16(%esp)
    fldl 56(%esp)
    fsin
    fstpl 24(%esp)
    fldl 64(%esp)
    fsin
    fstpl 32(%esp)
    addsd %xmm1, %xmm2
    addsd 72(%esp), %xmm2
    addsd 8(%esp), %xmm2
    addsd 16(%esp), %xmm2
    movapd %xmm2, %xmm1
    addsd 24(%esp), %xmm1
    addl $5, %eax
    cmpl $1000000000, %eax
    addsd .LCPI1_0, %xmm0
    movsd 32(%esp), %xmm2
    ##FP_REG_KILL
    jne .LBB1_2
.LBB1_3:
    movsd %xmm1, (%esp)
    fldl (%esp)
    addl $92, %esp
    ret
.LBB1_4:
    xorl %eax, %eax
    .align 16
.LBB1_5:
    addl $10, %eax
    cmpl $1000000000, %eax
    jne .LBB1_5
    pxor %xmm1, %xmm1
    jmp .LBB1_3

This runs in about 58 seconds. Note also it's partially unrolled 4 times.
Here both G++ and LDC are performing loop unswitching.
Bye,
bearophile
|
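For readers unfamiliar with the term, the loop unswitching discussed in the post above can be sketched by hand in C++. This is only an illustration of what the transformation does to the test() function, not the output of any particular compiler. The names test_switched and test_unswitched, and the iterations parameter, are made up for this sketch.

```cpp
#include <cmath>

// Original shape: the loop-invariant condition `b` is re-tested on
// every iteration.
double test_switched(bool b, int iterations) {
    double d = 0.0;
    double u = 0.0;
    for (int n = 0; n < iterations; n++) {
        d += u;
        if (b)
            u = std::sin(static_cast<double>(n));
    }
    return d;
}

// Hand-unswitched shape: `b` is tested once, outside the loop, and the
// loop body is duplicated, one copy per outcome of the test.
double test_unswitched(bool b, int iterations) {
    double d = 0.0;
    double u = 0.0;
    if (b) {
        for (int n = 0; n < iterations; n++) {
            d += u;
            u = std::sin(static_cast<double>(n));
        }
    } else {
        for (int n = 0; n < iterations; n++) {
            d += u;  // u is never updated on this path, so it stays 0.0
        }
    }
    return d;
}
```

Calling both functions with the same arguments, for example test_switched(true, 1000) and test_unswitched(true, 1000), should give the same sum. The cost of the transformation is the duplicated loop body, which is why compilers usually reserve it for higher optimization levels.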
May 18, 2010 Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL | ||||
---|---|---|---|---|
| ||||
Posted in reply to retard | On 18/05/10 20:19, retard wrote:
> What if I'm using a clean room implementation of D with a custom backend
> and no accompanying C compiler, am I not allowed to compare the
> performance with anything?
>
> When people compare C compilers, they usually use the latest Visual
> Studio, gcc, icc, and llvm versions -- i.e. C compilers from various
> vendors. Using the same logic one is not allowed to compare dmc against
> those since it would always lose.
I don't believe Walter is arguing against this methodology. What he is arguing against is comparing dmd with gcc for example. Comparing ldc with gdc and dmd is fine, comparing dmd with dmc is fine, but when it comes to comparing D and C, he believes you should compare compilers using the same backend, that is dmd and dmc rather than dmd and gcc. Or that's what I took from it.
This said, I don't agree with that methodology, unless it's only a small test. If you're comparing lots of C compilers and D you should include dmc for example if you're using dmd as the D reference, or clang if you're using ldc as a reference. If you're comparing C and D, you should stick to compilers with the same backend, otherwise the one with the superior backend will always win, and it's not a fair interlanguage comparison.
|
May 18, 2010 Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, | ||||
---|---|---|---|---|
| ||||
Posted in reply to Robert Clipsham | Robert Clipsham:
> otherwise the one with the superior backend will always win, and it's not a fair interlanguage comparison.
Life isn't fair. Too bad for the one with an inferior back-end.
Bye,
bearophile
|
May 18, 2010 Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL | ||||
---|---|---|---|---|
| ||||
Posted in reply to retard | retard wrote:
> Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:
>
>> %u wrote:
>>> The DMC++ compiler you mentioned sounds interesting too. I'd like to
>>> compare performance with that, the VC++ one, and the Intel compiler.
>> When comparing D performance with C++, it is best to compare compilers
>> with the same back end, i.e.:
>>
>> dmd with dmc
>> gcc with gdc
>> lcc with ldc
>>
>> This is because back ends can vary greatly in the code generated.
>
> What if I'm using a clean room implementation of D with a custom backend and no accompanying C compiler, am I not allowed to compare the performance with anything?
You're allowed to do whatever you want. I'm pointing out that the difference in code generator ability should not be misconstrued as a difference in the languages.
> When people compare C compilers, they usually use the latest Visual Studio, gcc, icc, and llvm versions -- i.e. C compilers from various vendors. Using the same logic one is not allowed to compare dmc against those since it would always lose.
It's perfectly reasonable to compare dmc and gcc for code generation quality.
|
May 18, 2010 Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | bearophile wrote:
> Life isn't fair. Too bad for the one with an inferior back-end.
Of course it isn't fair. But if you want to draw useful conclusions from a benchmark, you have to do what is known as "isolate the variables". If there are two independent variables feeding into performance, you CANNOT draw a conclusion about one of them from the performance. In other words, if:
g = f(x,y)
then knowing g, x and y tells you nothing at all about x's contribution to g.
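The point about isolating variables can be made concrete with a toy model. The sketch below assumes, purely for illustration, that run time is the product of a language cost and a back-end cost. The Toolchain struct, the function names, and the multiplicative model itself are all hypothetical and not taken from any real measurement.

```cpp
// Hypothetical model: run time = language_cost * backend_cost.
// Both fields are invented quantities used only to illustrate the argument.
struct Toolchain {
    double language_cost;  // cost attributed to language semantics
    double backend_cost;   // cost attributed to the code generator
};

// The only thing a benchmark actually observes is the combined run time.
double run_time(const Toolchain& t) {
    return t.language_cost * t.backend_cost;
}

// Ratio of two observed run times. When both toolchains share a back end,
// backend_cost cancels and the ratio reflects language_cost alone.
double time_ratio(const Toolchain& a, const Toolchain& b) {
    return run_time(a) / run_time(b);
}
```

Under this model, Toolchain{1.0, 2.0} and Toolchain{2.0, 1.0} produce the same run time even though their language costs differ by a factor of two, so the measurement alone cannot say which factor is responsible. With a shared back end, as in time_ratio(Toolchain{1.5, 2.0}, Toolchain{1.0, 2.0}), the back-end term cancels and the ratio is the language ratio.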
|
May 19, 2010 Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Tue, 18 May 2010 15:03:43 -0700, Walter Bright wrote:
> retard wrote:
>> Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:
>>
>>> %u wrote:
>>>> The DMC++ compiler you mentioned sounds interesting too. I'd like to compare performance with that, the VC++ one, and the Intel compiler.
>>> When comparing D performance with C++, it is best to compare compilers with the same back end, i.e.:
>>>
>>> dmd with dmc
>>> gcc with gdc
>>> lcc with ldc
>>>
>>> This is because back ends can vary greatly in the code generated.
>>
>> What if I'm using a clean room implementation of D with a custom backend and no accompanying C compiler, am I not allowed to compare the performance with anything?
>
> You're allowed to do whatever you want. I'm pointing out that the difference in code generator ability should not be misconstrued as a difference in the languages.
It's a rookie mistake to believe that languages have some kind of differences performance wise. That kind of comparison was likely useful in the 80s when languages and instruction sets had a greater resemblance (they were all low level languages). But as you can see from bearophile's link ( http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html ), there is a larger performance gap between a naive and a highly tuned implementation of the same language than between decent implementations of different modern languages.
Developers want to compare dmd with g++ simply because they're not interested in D or D's code generator per se. They have a task to solve and they want the fastest production ready (stable enough to compile their solution) toolchain for the problem - NOW. There is no loyalty left. Most mainstream languages contain the same imperative / object oriented hybrid core with small functional extensions (closures/lambdas). You only need to choose the best for this particular task. Usually there's only a limited amount of time left so you may need to guess. You just have to evaluate partial information snippets, for instance that dmd sucks at inlining closures and Java doesn't do tail call optimization.
Ideally a casual developer studies the language grammar for a few hours and then starts writing code. If the language turns out to be bad, he just moves on and forgets it unless the toolchain improves later and there is a reddit post about it. That's how I met Perl. With years of Pascal/C/C++/Java experience under my belt, I learned that Perl might be a perfect tool for extending apache with our plugin. A few hours of studying (the language) + quite a bit more (the API docs) and there I was writing Perl - probably really buggy code, but code nonetheless.
There are even languages that consist of visual graphs (the "editor" is just a CAD-like GUI) or sentences written in normal English - they don't have any kind of link between the target machine and the solution other than the abstract computational model. If you encounter a statement such as:

find_longest_common_substring(string1, string2);

you cannot know how fast it is. This kind of code is getting more popular and it's called declarative - it doesn't tell how it solves its problem, it just tells what it does. It's also the abstraction level that most developers are (should be) using.
You may ask whether that statement is faster in C than in Python. The Python coder could just use the one written in C and invoke it via a foreign function interface. The FFI might add a few cycles' worth of overhead, but overall the algorithm is the same.
|
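To make the FFI remark concrete, here is a minimal sketch of what the native side of such a call could look like in C++. The function name longest_common_substring_length is hypothetical, chosen for this example, and it returns only the length rather than the substring itself. The dynamic-programming approach is one common way to solve the problem and is not implied by the post.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// extern "C" keeps the symbol name unmangled so that a foreign function
// interface in another language can look it up by name.
// Returns the length of the longest substring common to `a` and `b`.
extern "C" std::size_t longest_common_substring_length(const char* a,
                                                       const char* b) {
    const std::size_t n = std::strlen(a);
    const std::size_t m = std::strlen(b);

    // prev[j] / curr[j]: length of the common suffix of a[0..i) and b[0..j)
    // for the previous and the current value of i.
    std::vector<std::size_t> prev(m + 1, 0);
    std::vector<std::size_t> curr(m + 1, 0);
    std::size_t best = 0;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            // Extend the run on a match, reset it on a mismatch.
            curr[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
            best = std::max(best, curr[j]);
        }
        std::swap(prev, curr);
    }
    return best;
}
```

Whichever language makes the call, the O(n*m) loop above is what runs, so the caller's language contributes only the call overhead. A production version would also need to keep C++ exceptions (for example from the vector allocations) from crossing the C boundary.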
May 20, 2010 Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL | ||||
---|---|---|---|---|
| ||||
Posted in reply to retard | retard wrote:
> It's a rookie mistake to believe that languages have some kind of differences performance wise.
Well, they do. It's also true that these performance differences can be swamped by the quality of the implementation, and the ability of the programmer. But that doesn't mean there are not inherent performance differences due to the semantics the language requires.
It's like car racing. The performance is a combination of 3 factors:
1. the 'formula' for the particular class you're racing in
2. the quality of the construction of the car to that formula
3. the ability of the driver
It's simply wrong to measure the performance and then naively attribute it to one of those three, pretending the other two are constant.
|
May 21, 2010 Re: Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Thu, 20 May 2010 10:06:17 -0700, Walter Bright wrote:
> retard wrote:
>> It's a rookie mistake to believe that languages have some kind of differences performance wise.
>
> Well, they do. It's also true that these performance differences can be swamped by the quality of the implementation, and the ability of the programmer. But that doesn't mean there are not inherent performance differences due to the semantics the language requires.
>
> It's like car racing. The performance is a combination of 3 factors:
>
> 1. the 'formula' for the particular class you're racing in
> 2. the quality of the construction of the car to that formula
> 3. the ability of the driver
>
> It's simply wrong to measure the performance and then naively attribute it to one of those three, pretending the other two are constant.
Of course. The language/implementation comparisons are all faulty. You also need to model the performance of the programmer by building some kind of developer skill profiles and measuring how the languages & implementations compete against each other in all these skill classes. For example, the language shootout site favors experienced programmers; bad programmers generate code with 2-3 orders of magnitude worse performance.
|
Copyright © 1999-2021 by the D Language Foundation