December 08, 2013
On 08/12/13 11:24, John Colvin wrote:
> std.stdio -> std.algorithm -> std.random -> std.numeric -> std.complex.

I'd forgotten that std.algorithm pulled in std.random.  Glancing through, I'm not sure it uses it anywhere other than in unittests.  So it might be possible to strip out the dependency ... I'll have a look this afternoon.

This could be a useful lint tool to have, checking for imports that are only used by unittest blocks.
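For illustration, something like the following is the pattern such a tool would point at: an import that only a unittest block needs, moved under version(unittest). Module and function names here are made up.

    module example;  // hypothetical module name

    // std.random is only needed by the unittest block below, so it can sit
    // under version(unittest) instead of being an unconditional module-level import.
    version(unittest) import std.random : uniform;

    int twice(int x) { return 2 * x; }

    unittest
    {
        auto n = uniform(0, 100);
        assert(twice(n) == 2 * n);
    }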
December 08, 2013
On Sunday, 8 December 2013 at 10:13:58 UTC, Araq wrote:
> On Friday, 6 December 2013 at 22:20:19 UTC, Walter Bright wrote:
>>
>> "there is no way proper C code can be slower than those languages."
>>
>>  -- http://www.reddit.com/r/programming/comments/1s5ze3/benchmarking_d_vs_go_vs_erlang_vs_c_for_mqtt/cduwwoy
>>
>> comes up now and then. I think it's incorrect, D has many inherent advantages in generating code over C:
>>
>> 1. D knows when data is immutable. C has to always make worst case assumptions, and assume indirectly accessed data mutates.
>>
>> 2. D knows when functions are pure. C has to make worst case assumptions.
>>
>> 3. Function inlining has generally been shown to be of tremendous value in optimization. D has access to all the source code in the program, or at least as much as you're willing to show it, and can inline across modules. C cannot inline functions unless they appear in the same module or in .h files. It's a rare practice to push many functions into .h files. Of course, there are now linkers that can do whole program optimization for C, but those are kind of herculean efforts to work around that C limitation of being able to see only one module at a time.
>>
>> 4. C strings are 0-terminated, D strings have a length property. The former has major negative performance consequences:
>>
>>    a. lots of strlen()'s are necessary
>>
>>    b. using substrings usually requires a malloc/copy/free sequence
>>
>> 5. CTFE can push a lot of computation to compile time rather than run time. This has had spectacular positive performance consequences for things like regex. C has no CTFE ability.
>>
>> 6. D's array slicing coupled with GC means that many malloc/copy/free's normally done in C are unnecessary in D.
>>
>> 7. D's "final switch" enables more efficient switch code generation, because the default doesn't have to be considered.
>

> coding conventions (4,5,6) (always pass a (char*, len) pair around for efficient slicing).

How does a coding convention allow you to create a high-performance regex engine at compile time? How does it allow you to do pretty much any of what CTFE can do?
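To make the CTFE point concrete, here is a minimal sketch using std.regex's ctRegex, which builds the matching engine at compile time; the pattern and input strings are just placeholders:

    import std.regex;

    void main()
    {
        // ctRegex constructs the regex engine during compilation via CTFE;
        // regex() constructs an equivalent engine at run time.
        auto ctre = ctRegex!(`\d{4}-\d{2}-\d{2}`);
        auto rtre = regex(`\d{4}-\d{2}-\d{2}`);

        assert(!matchFirst("released 2013-12-08", ctre).empty);
        assert(!matchFirst("released 2013-12-08", rtre).empty);
    }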

> Interestingly, things that are encouraged in Ada (this is an array of integers of range 0..30, see value range propagation) are much harder to recompute with whole program optimization and D lacks them.

Agreed.
December 08, 2013
On Sunday, 8 December 2013 at 10:31:49 UTC, Joseph Rushton Wakeling wrote:
> On 08/12/13 11:24, John Colvin wrote:
>> std.stdio -> std.algorithm -> std.random -> std.numeric -> std.complex.
>
> I'd forgotten that std.algorithm pulled in std.random.  Glancing through, I'm not sure it uses it anywhere other than in unittests.  So it might be possible to strip out the dependency ... I'll have a look this afternoon.
>
> This could be a useful lint tool to have, checking for imports that are only used by unittest blocks.

This was just from a quick grepping session. I'm sure there are other paths from std.stdio to std.complex. You should run DGraph on it :p
December 08, 2013
On 08/12/13 11:34, John Colvin wrote:
> This was just from a quick grepping session. I'm sure there are other paths from
> std.stdio to std.complex. You should run DGraph on it :p

Nice thought, must get round to it :-)

December 08, 2013
On 08/12/13 11:31, Joseph Rushton Wakeling wrote:
> I'd forgotten that std.algorithm pulled in std.random.  Glancing through, I'm
> not sure it uses it apart from for unittests?

On closer look, it's used for std.algorithm.topN.  I guess it could be relegated to being imported inside that function (and the appropriate unittest blocks), though the fact that it's used outside unittests does at least give some justification for it being a top-level import.
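Roughly what I mean, as a made-up sketch (the real topN lives in std.algorithm; this is just the shape of the change):

    // scoped import: std.random is only pulled in where it is actually used
    void topNLike(int[] r, size_t nth)
    {
        import std.random : uniform;

        // ... real partitioning logic omitted; uniform() would pick a pivot
        auto pivotIndex = uniform(0, r.length);
    }

    // and for uses that only unittest blocks need:
    version(unittest) import std.random;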
December 08, 2013
Araq:

> Interestingly, things that are encouraged in Ada (this is an array of integers of range 0..30, see value range propagation) are much harder to recompute with whole program optimization and D lacks them.

I am currently thinking about related topics. What do you mean? I don't understand.

Bye,
bearophile
December 08, 2013
I work all day with C++ optimization and deal closely with the Intel compiler, so here is my take: I agree with all the points, but I think 1, 3 and 7 are slightly inaccurate.

> 1. D knows when data is immutable. C has to always make worst case assumptions, and assume indirectly accessed data mutates.

ICC (and other C++ compilers) has plenty of ways to disambiguate aliasing:
- a pragma to let the optimizer assume no loop dependency
- restrict keyword
- /Qalias-const: assumes a parameter of type pointer-to-const does not alias with a parameter of type pointer-to-non-const.
- GCC-like strict aliasing rule

In most cases I've seen, the "no loop dependency" pragma is downright spectacular and gives the most bang for the buck. Every other method is annoying and barely useful in comparison.

It's not clear to me which aliasing rules D assumes.
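What I gather is that D leans on immutable and pure in the type system rather than on aliasing pragmas; a rough sketch of my understanding (I may be wrong about the exact rules):

    // `src` is immutable: no reference anywhere in the program can mutate that
    // data, so loads from it can be cached across the writes to `dst`.
    // The function is (weakly) pure: it may only touch its arguments, not globals.
    int sumAndScale(immutable(int)[] src, int[] dst) pure
    {
        assert(dst.length >= src.length);
        int total = 0;
        foreach (i, x; src)
        {
            dst[i] = 2 * x;   // cannot invalidate anything already read from src
            total += x;
        }
        return total;
    }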

> 3. Function inlining has generally been shown to be of tremendous value in optimization. D has access to all the source code in the program, or at least as much as you're willing to show it, and can inline across modules. C cannot inline functions unless they appear in the same module or in .h files. It's a rare practice to push many functions into .h files. Of course, there are now linkers that can do whole program optimization for C, but those are kind of herculean efforts to work around that C limitation of being able to see only one module at a time.

This point is not entirely accurate. While the C compilation model is generally harmful to inlining, with the Intel C++ compiler you can absolutely rely on cross-module inlining when doing global optimization. I don't know how it works, but all our tiny functions hidden in separate translation units get inlined.
ICC also provides four very useful pragmas for optimization: {forcing|not forcing} inlining [recursively] at the call point instead of the definition point. I find them better than any inline/__forceinline at the definition point.
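A rough definition-point counterpart on the D side is pragma(inline); a minimal sketch, with the caveat that support and exact behaviour depend on the compiler (DMD, LDC, GDC) and its version:

    // pragma(inline, true) asks the compiler to always inline this function;
    // pragma(inline, false) forbids inlining it.  Both apply at the definition,
    // not at individual call sites like the ICC pragmas do.
    pragma(inline, true)
    int clampPositive(int x)
    {
        return x < 0 ? 0 : x;
    }

    pragma(inline, false)
    int coldPath(int x)
    {
        return x * 31 + 7;
    }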

> 7. D's "final switch" enables more efficient switch code generation, because the default doesn't have to be considered.

A good point.
The default: branch can be marked unreachable with most C++ compilers I know of. People don't do it though.
In my experience, ICC performs enough static analysis to elide the switch prelude test on its own, but I don't like relying on that, since it makes the optimization less predictable.
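For reference, a minimal sketch of the D feature in question: final switch over an enum needs no default branch, and leaving a member out is a compile-time error.

    enum Direction { north, south, east, west }

    int heading(Direction d)
    {
        int degrees;
        // final switch: no default case is emitted, and a missing enum member
        // is rejected at compile time instead of falling through at run time.
        final switch (d)
        {
            case Direction.north: degrees = 0;   break;
            case Direction.east:  degrees = 90;  break;
            case Direction.south: degrees = 180; break;
            case Direction.west:  degrees = 270; break;
        }
        return degrees;
    }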

Would be amazing to have the ICC backend work with a D front-end :)
It kicked my ass so many times.
December 08, 2013
And I agree that all these points are not very important anyway, since a D program will usually be so much faster to write and to refactor.
December 08, 2013
On 08/12/13 13:35, ponce wrote:
> I work all day with C++ optimization and deal closely with the Intel compiler,
> so here is my take: I agree with all the points, but I think 1, 3 and 7 are
> slightly inaccurate.

How is icc doing these days?  I used it years ago (almost 10 years ago!) when it produced significantly faster executables than gcc, but I had the impression that more recent gcc releases either matched its performance or significantly narrowed the gap.
December 08, 2013
On Sunday, 8 December 2013 at 13:00:26 UTC, Joseph Rushton Wakeling wrote:
> How is icc doing these days?  I used it years ago (almost 10 years ago!) when it produced significantly faster executables than gcc, but I had the impression that more recent gcc releases either matched its performance or significantly narrowed the gap.

I don't know. People say the gap has reduced a lot and you have to use the #pragmas to get ahead.