| Thread overview | ||||||||
|---|---|---|---|---|---|---|---|---|
|
June 25, 2011 Re: Programming language benchmark | ||||
|---|---|---|---|---|
| ||||
Don: Sorry for my slow answer, I was quite busy for days. > I've never heard that claim before. Do you have evidence for that? I compare/convert code to D every day, so I am aware that D code compiled with DMD is often slower than C/C++ code compiled with GCC. Since some years I even keep a collection of snippets of slow code. But I am also aware that the low performance has many different causes, like some missing inlining, missing loop unrolling, etc, so spotting a clear and small case of integer arithmetic code that causes a slow down, to give you evidence, is not easy. So I am sorry for my overly broad claim. >If it is true, there's a strong possibility that it's a small, fixable issue (for example, DMD used to have terrible performance for ulong multiplication).< You are right, the case I'm going to show is a precise problem that's fixable. ----------------------- // C code #include "limits.h" #include "stdio.h" int divideBySeven(int x) { return x / 7; } int main() { int i = INT_MAX; int r; while (i--) r = divideBySeven(i); printf("%d\n", r); return 0; } ----------------------- // D code int divideBySeven(int x) { return x / 7; } void main() { int i = int.max; int r; while (i--) r = divideBySeven(i); printf("%d\n", r); } ----------------------- Asm from the C version: _divideBySeven: pushl %ebx movl $-1840700269, %ebx movl 8(%esp), %ecx movl %ebx, %eax popl %ebx imull %ecx leal (%edx,%ecx), %eax sarl $31, %ecx sarl $2, %eax subl %ecx, %eax ret _main: leal 4(%esp), %ecx andl $-16, %esp pushl -4(%ecx) pushl %ebx movl $-1840700269, %ebx pushl %ecx subl $20, %esp call ___main movl $2147483646, %ecx .p2align 4,,10 L4: movl %ecx, %eax imull %ebx movl %ecx, %eax addl %ecx, %edx sarl $31, %eax sarl $2, %edx decl %ecx subl %eax, %edx cmpl $-1, %ecx jne L4 movl %edx, 4(%esp) movl $LC0, (%esp) call _printf addl $20, %esp xorl %eax, %eax popl %ecx popl %ebx leal -4(%ecx), %esp ret .def _printf; .scl 2; .type 32; .endef ----------------------- Asm from the D version: _D9int_div_d13divideBySevenFiZi comdat mov ECX,7 cdq idiv ECX ret __Dmain comdat L0: push EAX push EBX mov EBX,07FFFFFFFh push ESI xor ESI,ESI test EBX,EBX lea EBX,-1[EBX] je L24 L11: mov EAX,EBX mov ECX,7 cdq idiv ECX test EBX,EBX mov ESI,EAX lea EBX,-1[EBX] jne L11 L24: push ESI mov EDX,offset FLAT:_DATA push EDX call near ptr _printf add ESP,8 xor EAX,EAX pop ESI pop EBX pop ECX ret ----------------------- For a more real case see: http://d.puremagic.com/issues/show_bug.cgi?id=5607 Bye, bearophile | ||||
June 26, 2011 Re: Programming language benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to bearophile | bearophile wrote: > Don: > > Sorry for my slow answer, I was quite busy for days. > > >> I've never heard that claim before. Do you have evidence for that? > > I compare/convert code to D every day, so I am aware that D code compiled with DMD is often slower than C/C++ code compiled with GCC. Since some years I even keep a collection of snippets of slow code. > > But I am also aware that the low performance has many different causes, like some missing inlining, missing loop unrolling, etc, so spotting a clear and small case of integer arithmetic code that causes a slow down, to give you evidence, is not easy. So I am sorry for my overly broad claim. It is true in general that DMD's inliner is not very good. I _suspect_ that it is the primary cause of most instances of poor integer performance. It's actually part of the front-end, not the back-end. So many of those performance problems won't apply to DMC. It's also true that the DMD/DMC instruction scheduler doesn't schedule for modern processors. But last I checked, GCC wasn't really much better in practice (you have to be almost perfect to get a benefit from instruction scheduling these days, the hardware does a very good job on unscheduled code). Otherwise, I don't think there's any major optimisation it misses. But it's quite likely that it misses several very specific minor optimizations. >> If it is true, there's a strong possibility that it's a small, fixable issue (for example, DMD used to have terrible performance for ulong multiplication).< > > You are right, the case I'm going to show is a precise problem that's fixable. [snip] > ----------------------- > > For a more real case see: > http://d.puremagic.com/issues/show_bug.cgi?id=5607 Thanks, that's helpful. It's a major speed difference (factor of 20, maybe) so it wouldn't have to occur very often to be noticeable. | |||
June 26, 2011 Re: Programming language benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Don | Don wrote: > bearophile wrote: >> Don: >> >> Sorry for my slow answer, I was quite busy for days. >> >> >>> I've never heard that claim before. Do you have evidence for that? >> >> I compare/convert code to D every day, so I am aware that D code compiled with DMD is often slower than C/C++ code compiled with GCC. Since some years I even keep a >> collection of snippets of slow code. >> >> But I am also aware that the low performance has many different causes, like some missing inlining, missing loop unrolling, etc, so spotting a clear and small case of > integer arithmetic code that causes a slow down, to give you evidence, is not easy. So I am sorry for my overly broad claim. > > It is true in general that DMD's inliner is not very good. I _suspect_ that it is the primary cause of most instances of poor integer performance. It's actually part of the front-end, not the back-end. So many of those performance problems won't apply to DMC. > > It's also true that the DMD/DMC instruction scheduler doesn't schedule for modern processors. But last I checked, GCC wasn't really much better in practice (you have to be almost perfect to get a benefit from instruction scheduling these days, the hardware does a very good job on unscheduled code). > > Otherwise, I don't think there's any major optimisation it misses. But it's quite likely that it misses several very specific minor optimizations. > >>> If it is true, there's a strong possibility that it's a small, fixable issue (for example, DMD used to have terrible performance for ulong multiplication).< >> >> You are right, the case I'm going to show is a precise problem that's fixable. > > [snip] >> ----------------------- >> >> For a more real case see: http://d.puremagic.com/issues/show_bug.cgi?id=5607 > > > Thanks, that's helpful. It's a major speed difference (factor of 20, maybe) so it wouldn't have to occur very often to be noticeable. You may also want to have a look at this paper: http://www.agner.org/optimize/optimizing_cpp.pdf I don't know if it still accurately reflects the current state though. Interestingly, it says that DMC is already able to perform the optimization requested by bearophile. On page 73 starts a tabular that is quite specific about which optimizations the DMC backend is lacking. Cheers, -Timon | |||
June 26, 2011 Re: Programming language benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Timon Gehr | Timon Gehr wrote: > Don wrote: >> bearophile wrote: >>> Don: >>> >>> Sorry for my slow answer, I was quite busy for days. >>> >>> >>>> I've never heard that claim before. Do you have evidence for that? >>> I compare/convert code to D every day, so I am aware that D code compiled with > DMD is often slower than C/C++ code compiled with GCC. Since some years I even keep a >>> collection of snippets of slow code. >>> >>> But I am also aware that the low performance has many different causes, like > some missing inlining, missing loop unrolling, etc, so spotting a clear and small > case of >> integer arithmetic code that causes a slow down, to give you evidence, is not > easy. So I am sorry for my overly broad claim. >> It is true in general that DMD's inliner is not very good. I _suspect_ >> that it is the primary cause of most instances of poor integer >> performance. It's actually part of the front-end, not the back-end. So >> many of those performance problems won't apply to DMC. >> >> It's also true that the DMD/DMC instruction scheduler doesn't schedule >> for modern processors. But last I checked, GCC wasn't really much better >> in practice (you have to be almost perfect to get a benefit from >> instruction scheduling these days, the hardware does a very good job on >> unscheduled code). >> >> Otherwise, I don't think there's any major optimisation it misses. But >> it's quite likely that it misses several very specific minor optimizations. >> >>>> If it is true, there's a strong possibility that it's a small, fixable issue > (for example, DMD used to have terrible performance for ulong multiplication).< >>> You are right, the case I'm going to show is a precise problem that's fixable. >> [snip] >>> ----------------------- >>> >>> For a more real case see: >>> http://d.puremagic.com/issues/show_bug.cgi?id=5607 >> >> Thanks, that's helpful. It's a major speed difference (factor of 20, >> maybe) so it wouldn't have to occur very often to be noticeable. > > You may also want to have a look at this paper: > > http://www.agner.org/optimize/optimizing_cpp.pdf > > I don't know if it still accurately reflects the current state though. It's a little out of date, DMD now does a couple of things it didn't do when Agner did the testing. Incidentally I contributed a bit to that paper <g>. > Interestingly, it says that DMC is already able to perform the optimization > requested by bearophile. > > On page 73 starts a tabular that is quite specific about which optimizations the > DMC backend is lacking. > > Cheers, > -Timon | |||
June 26, 2011 Re: Programming language benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Don | On 6/26/2011 2:24 PM, Don wrote:
>> You may also want to have a look at this paper:
>>
>> http://www.agner.org/optimize/optimizing_cpp.pdf
>>
>> I don't know if it still accurately reflects the current state though.
>
> It's a little out of date, DMD now does a couple of things it didn't do when
> Agner did the testing. Incidentally I contributed a bit to that paper <g>.
>
The table has several errors wrt DMC++. For example, DMC++ certainly does function inlining, constant propagation and the branch optimizations.
| |||
June 28, 2011 Re: Programming language benchmark | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Kind of off topic, but a good place to get benchmark results for many of the programming languages is Sphere Online Judge: http://www.spoj.pl/problems/classical/ They accept solutions in D, but not many have been submitted. I found a few: http://www.spoj.pl/ranks/FCTRL/lang=D http://www.spoj.pl/ranks/HASHIT/lang=D http://www.spoj.pl/ranks/ONP/lang=D Most of the fastest solutions are in C++, but D is pretty close. Maybe we could start submitting solutions :-) | |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply