On 31 May 2013 23:07, finalpatch <fengli@gmail.com> wrote:
I actually have some experience with C++ template
meta-programming in HD video codecs. My experience is that it is
possible for generic code through TMP to match or even beat hand
written code. Modern C++ compilers are very good, able to
optimize away most of the temporary variables, resulting in very
compact object code, provided you can avoid branches and keep the
arguments const refs as much as possible. A real example is my
TMP generic codec beat the original hand-optimized C/asm version
(both use SSE intrinsics) by as much as 30% with only a fraction
of the lines of code. Another example is the Eigen linear algebra
library: through template meta-programming it is able to match
the speed of Intel MKL.

Just to clarify, I'm not trying to say templates are slow because they're templates.
There's no reason carefully crafted template code couldn't be identical to hand crafted code.
What I am saying is that they introduce the possibility for countless subtle details to get in the way.
If you want maximum performance from templates, you often need to be really good at expanding the code in your mind, and visualising it all in expanded context, so you can then reason whether anything is likely to get in the way of the optimiser or not.
A lot of people don't possess this skill, and for good reason: it's hard! It usually takes considerable time to optimise template code, and optimised template code may often only be optimal in the context you tested against.
At some point, depending on the complexity of your code, it might just be easier/less time consuming to write the code directly.
It's a fine line, but I've seen so much code that takes it WAAAAY too far.

There's always the unpredictable element too. Imagine a large-ish template function where one very small detail inside is customised, producing otherwise identical functions.
Let's say 2 routines are generated for int and long; the cost of casting int -> long and calling the long function in both cases is insignificant, but using templates, your exe just got bigger, branches became less predictable, the icache got noisier, and there's no way to profile for the loss of performance introduced this way. In fact, the profiler will typically lead you to believe your code is FASTER, when the result may be slower on net.
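
To make the int/long case concrete, here is a minimal C++ sketch. The names scale_and_clamp, scale_and_clamp_long and scale_and_clamp_int are made up for illustration, and whether a given compiler really emits two separate bodies depends on inlining and optimisation settings, so treat this as a sketch of the trade-off and not as a measurement.

```cpp
#include <cassert>

// Template version: each distinct T used by callers gets its own
// instantiation, so using it with int and long can put two nearly
// identical bodies into the executable.
template <typename T>
T scale_and_clamp(T value, T factor, T lo, T hi) {
    T scaled = value * factor;
    if (scaled < lo) return lo;
    if (scaled > hi) return hi;
    return scaled;
}

// Single-body alternative (hypothetical): only the long version exists.
long scale_and_clamp_long(long value, long factor, long lo, long hi) {
    long scaled = value * factor;
    if (scaled < lo) return lo;
    if (scaled > hi) return hi;
    return scaled;
}

// int callers widen on the way in and narrow on the way out. The result
// always lies in [lo, hi], which are ints, so the narrowing cast is safe
// as long as lo <= hi. Note the two versions only agree when value * factor
// does not overflow int in the template instantiation.
inline int scale_and_clamp_int(int value, int factor, int lo, int hi) {
    return static_cast<int>(scale_and_clamp_long(value, factor, lo, hi));
}
```

The thin int wrapper costs a couple of conversions per call, while the template costs a second copy of the whole body. Which one wins depends on how large the body is and how hot each path is, which is exactly the part a profiler won't show you directly.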

I'm attracted to D for the power of its templates too, but that attraction is all about simplicity and readability.
In D, you can do more with less. The goal is not to use more and more templates, but to make the few templates I use more readable and maintainable.

D is very strong at TMP; it provides a lot more tools
specifically designed for TMP, which is vastly superior to C++,
which relies on abusing templates. This is actually the main
reason drawing me to D: TMP in a more pleasant way. IMO one thing
D needs to address is fewer surprises, e.g. innocent-looking code
like v[] = [x,x,x] shouldn't cause a major performance hit. In C++
memory allocation is explicit, either operator new or malloc, or
indirectly through a method call, otherwise the language would
not do heap allocation for you.

Yeah well... I have a constant inner turmoil with this in D.
I want to believe the GC is the future, but I'm still trying to convince myself of that (and I think the GC is losing the battle at the moment).
Fortunately you can avoid the GC fairly effectively (if you forgo large parts of Phobos!).

But things like the array initialisation are inexcusable. Array literals should NOT allocate; this desperately needs to be fixed.
And scope/escape analysis is needed too, so that local dynamic arrays can be lowered onto the stack in self-contained situations.
That's the biggest source of difficult-to-control allocations in my experience.
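
For comparison, here is a rough C++ sketch of the "allocation is explicit" point raised above: a fixed-size array initialised from three values stays on the stack, while the dynamic container reaches the heap through a method call. The counter g_allocs, the sinks, and the two helper functions are hypothetical names introduced only for this illustration, and replacing the global operator new is just a quick way to count allocations in a toy program, not something I'd suggest for production code.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Count every call to the global operator new (illustration only).
static std::size_t g_allocs = 0;

void* operator new(std::size_t size) {
    ++g_allocs;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Volatile sinks stop the optimiser from discarding the work entirely.
static volatile float g_value_sink = 0.0f;
static float* volatile g_ptr_sink = nullptr;

// Fixed-size array: storage is part of the stack frame, no heap call.
std::size_t allocs_for_fixed(float x) {
    const std::size_t before = g_allocs;
    std::array<float, 3> v = {x, x, x};
    g_value_sink = v[0] + v[1] + v[2];
    return g_allocs - before;
}

// Dynamic array: the constructor allocates heap storage for 3 floats.
std::size_t allocs_for_dynamic(float x) {
    const std::size_t before = g_allocs;
    std::vector<float> v = {x, x, x};
    g_ptr_sink = v.data();   // let the buffer escape so it cannot be elided
    g_value_sink = v[0] + v[1] + v[2];
    g_ptr_sink = nullptr;    // do not keep a dangling pointer around
    return g_allocs - before;
}
```

The point is that in C++ the choice between the two is visible in the type you wrote, whereas the D array literal case hides the same difference behind identical-looking syntax.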

On Friday, 31 May 2013 at 11:51:04 UTC, Manu wrote:
Assuming that you would hand-write exactly the same code as the template
expansion...
Typically template expansion leads to countless temporary redundancies,
which you expect the compiler to try and optimise away, but it's not always
able to do so, especially if there is an if() nearby, or worse, a pointer
dereference.
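
A minimal C++ sketch of the pointer dereference case, assuming a made-up generic helper apply_generic and a hand-written scale_by_hand. In the generic version the gain is read through a pointer inside the loop; since out and gain could alias as far as the compiler knows, it may have to reload *gain after every store. The hand-written version copies the value into a local once, which removes the question. Whether a particular compiler manages to hoist the load on its own is not something this sketch proves.

```cpp
#include <cassert>
#include <cstddef>

// Generic version: *gain is dereferenced on every iteration. A store to
// out[i] might modify *gain if the two alias, so the load cannot always
// be hoisted out of the loop.
template <typename Op>
void apply_generic(float* out, const float* in, const float* gain,
                   std::size_t n, Op op) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = op(in[i], *gain);
}

// Hand-written version: read the gain once into a local, so the loop body
// has no pointer dereference left to worry about. Results match the
// generic version whenever gain does not alias out.
inline void scale_by_hand(float* out, const float* in, const float* gain,
                          std::size_t n) {
    const float g = *gain;
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * g;
}
```

Calling apply_generic with a lambda such as [](float x, float g) { return x * g; } gives the same results as scale_by_hand for non-aliasing buffers; the difference, if any, is only in the generated code.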