May 31, 2013 Re: Slow performance compared to C++, ideas?
Posted in reply to Timon Gehr
On 31 May 2013 21:05, Timon Gehr <timon.gehr@gmx.ch> wrote:
> On 05/31/2013 12:58 PM, Joseph Rushton Wakeling wrote:
>
>> On 05/31/2013 08:34 AM, Manu wrote:
>>
>>> What's taking the most time?
>>> The lighting loop is so template-tastic, I can't get a feel for how fast
>>> that
>>> loop would be.
>>>
>>
>> Hah, I found this out the hard way recently -- have been doing some
>> experimental
>> reworking of code where some key inner functions were templatized, and it
>> had a
>> nasty effect on performance. I'm guessing it made it impossible for the
>> compilers to inline these functions :-(
>>
>>
> That wouldn't make any sense though, since after template expansion there is no difference between the generated version and a particular handwritten version.
>
Assuming that you would hand-write exactly the same code as the template
expansion...
Typically template expansion leads to countless temporary redundancies,
which you expect the compiler to try and optimise away, but it's not always
able to do so, especially if there is an if() nearby, or worse, a pointer
dereference.
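Manu's point about pointer dereferences can be sketched in C++ (function and variable names here are illustrative, not from the thread): when a value is read through a pointer inside a loop, the compiler must assume a store through another pointer may have changed it, so it cannot hoist the load the way a hand-written version would.

```cpp
#include <cstddef>

// Generic-looking loop: because `out` might alias `scale`, the compiler
// must conservatively re-read *scale on every iteration.
void scale_all(float* out, const float* in, const float* scale, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * *scale;
}

// Hand-written equivalent: the programmer knows there is no aliasing and
// hoists the load manually -- exactly the kind of reordering that
// template-expanded code can prevent the optimiser from doing.
void scale_all_hoisted(float* out, const float* in, const float* scale, std::size_t n) {
    const float s = *scale;  // loaded once, kept in a register
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * s;
}
```

Both functions compute the same result; the difference is only in what the optimiser is allowed to assume.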
May 31, 2013 Re: Slow performance compared to C++, ideas?
On 05/31/2013 01:48 PM, Manu wrote:
> I find that using templates actually makes it more likely for the compiler to properly inline. But I think the totally generic expressions produce cases where the compiler is considering too many possibilities that inhibit many optimisations. It might also be that the optimisations get a lot more complex when the code fragments span across a complex call tree with optimisation dependencies on non-deterministic inlining.
Thanks for the detailed advice. :-)
There are two particular things I noted about my own code. One is that whereas in the original the template variables were very simple (just a floating-point type) in the new version they are more complex structures that are indeed more generic (the idea was to enable the code to handle both mutable and immutable forms of one particular data structure).
The second is that the templatization gets moved from the mixin to the functions themselves. I guess that the mixin has the effect of copy-pasting _as if_ I was just writing precisely what I intended.
May 31, 2013 Re: Slow performance compared to C++, ideas?
Posted in reply to Manu
On Friday, 31 May 2013 at 11:49:05 UTC, Manu wrote:
> I find that using templates actually makes it more likely for the compiler
> to properly inline. But I think the totally generic expressions produce
> cases where the compiler is considering too many possibilities that inhibit
> many optimisations.
> It might also be that the optimisations get a lot more complex when the
> code fragments span across a complex call tree with optimisation
> dependencies on non-deterministic inlining.
>
> One of the most important jobs for the optimiser is code re-ordering.
> Generic code is often written in such a way that makes it hard/impossible
> for the optimiser to reorder the flattened code properly.
> Hand written code can have branches and memory accesses carefully placed at
> the appropriate locations.
> Generic code will usually package those sorts of operations behind little
> templates that often flatten out in a different order.
> The optimiser is rarely able to re-order code across if statements, or
> pointer accesses. __restrict is very important in generic code to allow the
> optimiser to reorder across any indirection, otherwise compilers typically
> have to be conservative and presume that something somewhere may have
> changed the destination of a pointer, and leave the order as the template
> expanded. Sadly, D doesn't even support __restrict, and nobody ever uses it
> in C++ anyway.
>
> I've always had better results with writing precisely what I intend the
> compiler to do, and using __forceinline where it needs a little extra
> encouragement.
Thanks for the valuable input. I've never had the pleasure of actually trying templates in performance-critical code, and this is good stuff to keep in mind. Added to my notes.
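Manu's `__restrict` point can be sketched as follows (a hypothetical routine, not from the thread). The restrict qualifier promises the compiler that the pointers never alias, which frees it to reorder loads and stores across the loop; note that `__restrict` is a compiler extension (GCC/Clang/MSVC), not standard C++.

```cpp
#include <cstddef>

// Without __restrict the compiler must assume `out` may alias `a` or `b`
// and keep memory operations in source (or template-expansion) order.
// With it, loads of a[i] and b[i] can be reordered and vectorised freely.
void add(float* __restrict out,
         const float* __restrict a,
         const float* __restrict b,
         std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The qualifier changes nothing about the result, only about what the optimiser may assume while scheduling the loop.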
May 31, 2013 Re: Slow performance compared to C++, ideas?
Posted in reply to Manu | I actually have some experience with C++ template
meta-programming in HD video codecs. My experience is that it is
possible for generic code through TMP to match or even beat hand
written code. Modern C++ compilers are very good, able to
optimize away most of the temporary variables resulting very
compact object code, provides you can avoid branches and keep the
arguments const refs as much as possible. A real example is my
TMP generic codec beat the original hand optimized c/asm version
(both use sse intrinsics) by as much as 30% with only a fraction
of the line of code. Another example is the Eigen linear algebra
library, through template meta-programming it is able to match
the speed of Intel MKL.
D is very strong at TMP, it provides a lot more tools
specifically designed for TMP, that is vastly superior than C++
which relies on abusing the templates. This is actually the main
reason drawing me to D: TMP in a more pleasant way. IMO one thing
D needs to address is less surprises, eg. innocent looking code
like v[] = [x,x,x] shouldn't cause major performance hit. In c++
memory allocation is explicit, either operator new or malloc, or
indirectly through a method call, otherwise the language would
not do heap allocation for you.
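The TMP style finalpatch describes is typically built on expression templates. A minimal sketch (illustrative only, not Eigen's actual design): `a + b + c` builds a tree of lightweight nodes, and the loop is fused at assignment time, so no temporary arrays are materialised.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 4;

// Lightweight expression node: holds references, does no work until indexed.
template <typename L, typename R>
struct Sum {
    using expr_tag = void;
    const L& l;
    const R& r;
    float operator[](std::size_t i) const { return l[i] + r[i]; }
};

struct Vec {
    using expr_tag = void;
    std::array<float, N> data{};
    float  operator[](std::size_t i) const { return data[i]; }
    float& operator[](std::size_t i)       { return data[i]; }
    // Assigning from any expression runs ONE fused loop -- no
    // intermediate Vec temporaries are created.
    template <typename E>
    Vec& operator=(const E& e) {
        for (std::size_t i = 0; i < N; ++i) data[i] = e[i];
        return *this;
    }
};

// Only participates for types that opt in via expr_tag, so it never
// interferes with arithmetic on built-in types.
template <typename L, typename R,
          typename = typename L::expr_tag,
          typename = typename R::expr_tag>
Sum<L, R> operator+(const L& l, const R& r) { return {l, r}; }
```

Usage: `v = a + b + c;` flattens into a single loop equivalent to `for (i) v[i] = a[i] + b[i] + c[i];`, which is exactly the "as fast as hand-written" outcome, provided the compiler inlines the small `operator[]` calls.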
On Friday, 31 May 2013 at 11:51:04 UTC, Manu wrote:
> Assuming that you would hand-write exactly the same code as the template
> expansion...
> Typically template expansion leads to countless temporary redundancies,
> which you expect the compiler to try and optimise away, but it's not always
> able to do so, especially if there is an if() nearby, or worse, a pointer
> dereference.
May 31, 2013 Re: Slow performance compared to C++, ideas?
Posted in reply to finalpatch
On 31 May 2013 23:07, finalpatch <fengli@gmail.com> wrote:
> I actually have some experience with C++ template meta-programming in HD video codecs. My experience is that it is possible for generic code through TMP to match or even beat hand written code. Modern C++ compilers are very good, able to optimize away most of the temporary variables resulting very compact object code, provides you can avoid branches and keep the arguments const refs as much as possible. A real example is my TMP generic codec beat the original hand optimized c/asm version (both use sse intrinsics) by as much as 30% with only a fraction of the line of code. Another example is the Eigen linear algebra library, through template meta-programming it is able to match the speed of Intel MKL.

Just to clarify, I'm not trying to say templates are slow because they're templates. There's no reason carefully crafted template code couldn't be identical to hand-crafted code. What I am saying is that templates introduce the possibility for countless subtle details to get in the way.

If you want maximum performance from templates, you often need to be really good at expanding the code in your mind and visualising it all in expanded context, so you can reason about whether anything is likely to get in the way of the optimiser. A lot of people don't possess this skill, and for good reason: it's hard! It usually takes considerable time to optimise template code, and optimised template code may only be optimal in the context you tested against. At some point, depending on the complexity of your code, it might just be easier and less time-consuming to write the code directly. It's a fine line, but I've seen so much code that takes it WAAAAY too far.

There's always the unpredictable element too. Imagine a large-ish template function where one very small detail is customised between otherwise identical instantiations. Let's say two routines are generated, for int and long; the cost of casting int -> long and always calling the long version would be insignificant, but using templates, your exe just got bigger, branches got less predictable, the icache got noisier, and there's no way to profile for the loss of performance introduced this way. In fact, the profiler will typically erroneously lead you to believe your code is FASTER, when the net result may be slower code.

I'm attracted to D for the power of its templates too, but that attraction is all about simplicity and readability. In D, you can do more with less. The goal is not to use more and more templates, but to make the few templates I use more readable and maintainable.

> D is very strong at TMP, it provides a lot more tools specifically designed for TMP, that is vastly superior than C++ which relies on abusing the templates. This is actually the main reason drawing me to D: TMP in a more pleasant way. IMO one thing D needs to address is less surprises, eg. innocent looking code like v[] = [x,x,x] shouldn't cause major performance hit. In c++ memory allocation is explicit, either operator new or malloc, or indirectly through a method call, otherwise the language would not do heap allocation for you.

Yeah well... I have a constant inner turmoil with this in D. I want to believe the GC is the future, but I'm still trying to convince myself of that (and I think the GC is losing the battle at the moment). Fortunately you can avoid the GC fairly effectively (if you forego large parts of Phobos!). But things like the array initialisation are inexcusable: array literals should NOT allocate; this desperately needs to be fixed. We also need scope/escape analysis, so local dynamic arrays can be lowered onto the stack in self-contained situations. That's the biggest source of difficult-to-control allocations in my experience.

On Friday, 31 May 2013 at 11:51:04 UTC, Manu wrote:
> Assuming that you would hand-write exactly the same code as the template expansion... Typically template expansion leads to countless temporary redundancies, which you expect the compiler to try and optimise away, but it's not always able to do so, especially if there is an if() nearby, or worse, a pointer dereference.
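Manu's int/long example can be sketched as follows (a hypothetical routine, not from the thread): instead of letting a template stamp out a near-identical 32-bit copy of the body, the 32-bit entry point widens its argument and shares the single 64-bit routine.

```cpp
#include <cstdint>

// Template version: each instantiation is a full copy of the body in the
// binary, even though the two copies differ only in operand width.
template <typename T>
T sum_to(T n) {
    T total = 0;
    for (T i = 1; i <= n; ++i) total += i;
    return total;
}
// sum_to<int32_t> and sum_to<int64_t> are two separate routines.

// Alternative: one 64-bit body; 32-bit callers pay only a cheap widening
// conversion instead of carrying a second instantiation in the exe.
int64_t sum_to64(int64_t n) {
    int64_t total = 0;
    for (int64_t i = 1; i <= n; ++i) total += i;
    return total;
}

int32_t sum_to32(int32_t n) {
    return static_cast<int32_t>(sum_to64(n));  // cast instead of instantiate
}
```

The cast costs a single instruction at most, while the shared body keeps the executable smaller and the icache less noisy, which is exactly the trade-off being described.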
May 31, 2013 Re: Slow performance compared to C++, ideas?
Posted in reply to finalpatch
On 5/31/13 9:07 AM, finalpatch wrote:
> D is very strong at TMP, it provides a lot more tools specifically designed for TMP, that is vastly superior than C++ which relies on abusing the templates. This is actually the main reason drawing me to D: TMP in a more pleasant way. IMO one thing D needs to address is less surprises, eg. innocent looking code like v[] = [x,x,x] shouldn't cause major performance hit. In c++ memory allocation is explicit, either operator new or malloc, or indirectly through a method call, otherwise the language would not do heap allocation for you.

It would be great if we addressed that in 2.064. I'm sure I've seen the report in bugzilla, but the closest I found were:

http://d.puremagic.com/issues/show_bug.cgi?id=9335
http://d.puremagic.com/issues/show_bug.cgi?id=8449

Andrei
May 31, 2013 Re: Slow performance compared to C++, ideas?
Posted in reply to Namespace
Namespace:
> I thought GDC or LDC have something like:
> float[$] v = [x, x, x];
> which is converted to
> float[3] v = [x, x, x];
>
> Am I wrong? DMD needs something like this too.

Right. Vote (currently only 6 votes): http://d.puremagic.com/issues/show_bug.cgi?id=481

Bye,
bearophile
May 31, 2013 Re: Slow performance compared to C++, ideas?
Posted in reply to Andrei Alexandrescu
On Fri, 31 May 2013 10:49:21 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> On 5/31/13 9:07 AM, finalpatch wrote:
>> D is very strong at TMP, it provides a lot more tools specifically designed for TMP, that is vastly superior than C++ which relies on abusing the templates. This is actually the main reason drawing me to D: TMP in a more pleasant way. IMO one thing D needs to address is less surprises, eg. innocent looking code like v[] = [x,x,x] shouldn't cause major performance hit. In c++ memory allocation is explicit, either operator new or malloc, or indirectly through a method call, otherwise the language would not do heap allocation for you.
>
> It would be great if we addressed that in 2.064. I'm sure I've seen the report in bugzilla, but the closest I found were:
>
> http://d.puremagic.com/issues/show_bug.cgi?id=9335
> http://d.puremagic.com/issues/show_bug.cgi?id=8449

There was this: http://d.puremagic.com/issues/show_bug.cgi?id=2356

I know Don has suggested in the past that all array literals be immutable, like strings, and I agree with that. But it would be a huge breaking change. I agree with finalpatch that array literals allocating is not obvious or expected in many cases.

I wonder: if the compiler can prove that an array literal isn't referenced outside the function (or at least the statement), could it allocate it on the stack instead of the heap? That would be a huge improvement, and a good middle ground.

-Steve
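For contrast, the C++ behaviour the thread keeps returning to can be sketched like this (a small example for illustration, not from the thread): a local array literal is plain stack storage with no hidden allocation, which is the stack-lowering outcome being proposed for D.

```cpp
#include <array>

// Both forms below live entirely on the stack. Unlike D's
// `v[] = [x, x, x]` as described in this thread, no heap
// allocation occurs anywhere in this function.
float sum_of_literals(float x) {
    float v[3] = {x, x, x};             // built-in array, stack storage
    std::array<float, 3> w = {x, x, x}; // aggregate, also stack storage
    return v[0] + v[1] + v[2] + w[0] + w[1] + w[2];
}
```

Because neither array escapes the function, a compiler applying the escape analysis Steven describes could lower the D equivalent to exactly this shape.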
May 31, 2013 Re: Slow performance compared to C++, ideas?
Posted in reply to Manu
Manu:
> Yeah, I've actually noticed this too on a few occasions. It would be nice if array operations would unroll for short arrays.

Particularly so for static arrays! Thanks to Kenji, the latest dmd 2.063 solves part of this problem: http://d.puremagic.com/issues/show_bug.cgi?id=2356

Maybe this improvement is not yet in LDC/GDC. But avoiding heap allocations for array literals is a change that needs to be discussed.

Bye,
bearophile
May 31, 2013 Re: Slow performance compared to C++, ideas?
Posted in reply to Manu
Manu:
> Frankly, this is a textbook example of why STL is the spawn of satan. For
> some reason people are TAUGHT that it's reasonable to write code like this.
There are many kinds of D code; not everything is a high-performance ray-tracer or 3D game. So I'm sure there are many, many situations where using the C++ STL is more than enough. As with most tools, you need to know where and when to use them. So it's not a Satan-spawn :-)
Bye,
bearophile
Copyright © 1999-2021 by the D Language Foundation