March 28, 2006
James Dunne wrote:
> If possible, can someone lay out a clear definition of both "array expressions" and "expression templates"?  I'd really like to fully understand what's possible in this area for my own research.
> 
> Thanks,
> 

afaik, array expressions are just expressions which get evaluated element-wise over whole arrays:

a[] = b[] + c[]; // must be same length

is the same as

for (int i=0; i<a.length; i++)
    a[i] = b[i] + c[i];

The advantage of having them instead of doing for loops (in addition to aesthetics) is that the compiler can optimize the code much better (for example, by doing vectorization == AltiVec/MMX/SSE), because it clearly knows what you're doing - with a for loop, it's just a bunch of single-element operations.


Expression templates, otoh, are a somewhat complex template technique, which allows efficient evaluation of expressions over arbitrary types. Instead of evaluating the expression one operation at a time:

a = b + c * d

usually becomes

_t1 = c.opMul(d);
_t2 = b.opAdd(_t1);
a = _t2

the expressions first evaluate to template instances, which can then be inlined and optimized by the compiler. That obviously results in faster execution. The above example would become something like:

auto expr=Sum!(b, Product!(c, d));
a.length=expr.length;
for (int i=0; i<a.length; i++)
    a[i]=expr.evaluate(i); // hopefully inlines to b[i]+c[i]*d[i]

http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html

One problem in doing them in D is that you can't overload the = operator, so the best one can hope for is

(b+c*d).assignTo(a);
// or
a = (b+c*d).eval();

Another problem is that expression templates rely heavily on implicit instantiation, which is currently quite basic in D (but getting better).


Hope that helped :)


xs0
March 28, 2006
Don Clugston wrote:
> You raise a good point, though -- unavoidable temporaries could be treated better (eg with memory pools), this proposal does not let you distinguish between "new temporary = a+b " and "new result = a+b", the former could be stored in a memory pool. I think that's a minor issue, though.

I think that's something that could be added as a Quality of Implementation issue without violating the rules you've outlined.  ie. it doesn't matter where the memory comes from.  Temporaries could even be allocated using alloca in some cases.

As for other issues with the behavior--so long as this is spelled out in the spec then I don't see any problems with it.  As you've said, it's what's actually happening behind the scenes anyway, but defining it this way makes for the most efficient code generation possible, and this is a fantastic guarantee to have in the case of large objects.


Sean
March 28, 2006
Norbert Nemec wrote:
> Don Clugston wrote:
>> Obviously not with real matrices (C*D is not a pointwise operation), but
>>  point taken. (BTW, the temporaries are still there, they're just in
>> registers this time (A[i]*B[i], C[i]*D[i]). The proposal does get rid of
>> all unnecessary temporaries, the problem is that there's no vectorisation).
> 
> Point accepted. For matrices, the issues are much more complicated, but
> still there is quite a bit of optimization possible when vectorization
> is taken into accound. p.e. the expression A*B+C can be done very
> efficiently when done in one shot. (There even are BLAS routines for
> this kind of combined operations, which are very common in many fields
> of application.)

It almost seems like this could be handled via a special opIndex function: opIndexCalc or some such.  If the method exists for all involved types, then:

A = B + C

could be translated to:

for( size_t i = 0; i < A.length; ++i )
    A[i] = B[i] + C[i];

where the subscripting calls opIndexCalc instead of the standard opIndex.  But this leaves out array length checking, and simply throwing an IndexOutOfBounds exception if something goes wrong would leave A corrupted.  So perhaps some checking would also be required to see if opIndexCalc should be called?  The only catch is that this would likely need to occur at run-time:

if( A.matches( B ) && A.matches( C ) )
    for( size_t i = 0; i < A.length; ++i )
        A[i] = B[i] + C[i];
else
    A = B + C; // standard method using temporaries

I don't have enough experience to know what might work here, but it would be great if an alternative to expression templates could be devised.


Sean
March 28, 2006
xs0 wrote:
> James Dunne wrote:
> 
>> If possible, can someone lay out a clear definition of both "array expressions" and "expression templates"?  I'd really like to fully understand what's possible in this area for my own research.
>>
>> Thanks,
>>
> 
> afaik, array expressions are just expressions which get evaluated element-wise over whole arrays:
> 
> a[] = b[] + c[]; // must be same length
> 
> is the same as
> 
> for (int i=0; i<a.length; i++)
>     a[i] = b[i] + c[i];
> 
> The advantage of having them instead of doing for loops (in addition to aesthetics) is that the compiler can optimize the code much better (for example, by doing vectorization == AltiVec/MMX/SSE), because it clearly knows what you're doing - with a for loop, it's just a bunch of single-element operations.
> 
> 
> Expression templates, otoh, are a somewhat complex template technique, which allows efficient evaluation of expressions over arbitrary types. Instead of evaluating the expression one operation at a time:
> 
> a = b + c * d
> 
> usually becomes
> 
> _t1 = c.opMul(d);
> _t2 = b.opAdd(_t1);
> a = _t2
> 
> the expressions first evaluate to template instances, which can then be inlined and optimized by the compiler. That obviously results in faster execution. The above example would become something like:
> 
> auto expr=Sum!(b, Product!(c, d));
> a.length=expr.length;
> for (int i=0; i<a.length; i++)
>     a[i]=expr.evaluate(i); // hopefully inlines to b[i]+c[i]*d[i]
> 
> http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html
> 
> One problem in doing them in D is that you can't overload the = operator, so the best one can hope for is
> 
> (b+c*d).assignTo(a);
> // or
> a = (b+c*d).eval();
> 
> Another problem is that expression templates rely heavily on implicit instantiation, which is currently quite basic in D (but getting better).
> 
> 
> Hope that helped :)
> 
> 
> xs0

Yes, thank you very much!

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/MU/S d-pu s:+ a-->? C++++$ UL+++ P--- L+++ !E W-- N++ o? K? w--- O M--@ V? PS PE Y+ PGP- t+ 5 X+ !R tv-->!tv b- DI++(+) D++ G e++>e h>--->++ r+++ y+++
------END GEEK CODE BLOCK------

James Dunne
March 29, 2006
Sean Kelly wrote:
> I don't have enough experience to know what might work here, but it
> would be great if an alternative to expression templates could be devised.

I believe people should not be overly afraid of expression templates. In C++ they are ugly because the whole template system is ugly. In principle they are a tremendously powerful concept that should definitely be supported in D as well as possible.

When I say that there should be support for array expressions that does not rely on expression templates, that is only because I believe that arrays are crucial enough for the language to justify this special treatment.
April 02, 2006
Don Clugston wrote:
> Background: Operator overloading, in the form it exists in C++ and currently in D, inherently results in sub-optimal code, because it always results in unnecessary temporary objects being created.
> 
> For example,
> X = A - ((B*C) + D)* E;
> 
> becomes:
> T1 = B * C;
> T2 = T1 + D;
> T3 = T2 * E;
> T4 = A - T3;
> X = T4;
> Four objects were created, whereas only one was strictly required.

Ok, I'm new to this, so it took me a while to understand the problem. Let's see if I got it right: this is actually only a problem when the operator methods explicitly instantiate a *class object*, to be used as the return of the method, right?



-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
April 03, 2006
Bruno Medeiros wrote:
> Don Clugston wrote:
>> Background: Operator overloading, in the form it exists in C++ and currently in D, inherently results in sub-optimal code, because it always results in unnecessary temporary objects being created.
>>
>> For example,
>> X = A - ((B*C) + D)* E;
>>
>> becomes:
>> T1 = B * C;
>> T2 = T1 + D;
>> T3 = T2 * E;
>> T4 = A - T3;
>> X = T4;
>> Four objects were created, whereas only one was strictly required.
> 
> Ok, I'm new to this, so it took me a while to understand the problem. Let's see if I got it right: this is actually only a problem when the operator methods explicitly instantiate a *class object*, to be used as the return of the method, right?

Not really, it applies everywhere that you can have overloaded operators. The cost of a temporary will be greater with classes, but for structs, eliminating temporaries will make it much easier for the compiler to optimise.

April 04, 2006
Don Clugston wrote:
> Bruno Medeiros wrote:
>> Don Clugston wrote:
>>> Background: Operator overloading, in the form it exists in C++ and currently in D, inherently results in sub-optimal code, because it always results in unnecessary temporary objects being created.
>>>
>>> For example,
>>> X = A - ((B*C) + D)* E;
>>>
>>> becomes:
>>> T1 = B * C;
>>> T2 = T1 + D;
>>> T3 = T2 * E;
>>> T4 = A - T3;
>>> X = T4;
>>> Four objects were created, whereas only one was strictly required.
>>
>> Ok, I'm new to this, so it took me a while to understand the problem. Let's see if I got it right: this is actually only a problem when the operator methods explicitly instantiate a *class object*, to be used as the return of the method, right?
> 
> Not really, it applies everywhere that you can have overloaded operators. The cost of a temporary will be greater with classes, but for structs, eliminating temporaries will make it much easier for the compiler to optimise.
> 

But with structs (more generally, with stack-based value types), can't the compiler already optimize it? In your example, it seems to me that the compiler make the code so that it uses only two temporaries:

T1 = B * C;
T2 = T1 + D; // T1 is now free for use
T1 = T2 * E; // T2 is now free for use
T2 = A - T1; // T1 is now free for use
X = T2;

And, if inlining occurs, it can be made to use only one temporary, no?


-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
April 05, 2006
Bruno Medeiros wrote:
> Don Clugston wrote:
>> Bruno Medeiros wrote:
>>> Don Clugston wrote:
>>>> Background: Operator overloading, in the form it exists in C++ and currently in D, inherently results in sub-optimal code, because it always results in unnecessary temporary objects being created.
>>>>
>>>> For example,
>>>> X = A - ((B*C) + D)* E;
>>>>
>>>> becomes:
>>>> T1 = B * C;
>>>> T2 = T1 + D;
>>>> T3 = T2 * E;
>>>> T4 = A - T3;
>>>> X = T4;
>>>> Four objects were created, whereas only one was strictly required.
>>>
>>> Ok, I'm new to this, so it took me a while to understand the problem. Let's see if I got it right: this is actually only a problem when the operator methods explicitly instantiate a *class object*, to be used as the return of the method, right?
>>
>> Not really, it applies everywhere that you can have overloaded operators. The cost of a temporary will be greater with classes, but for structs, eliminating temporaries will make it much easier for the compiler to optimise.
>>
> 
> But with structs (more generally, with stack-based value types), can't the compiler already optimize it? In your example, it seems to me that the compiler make the code so that it uses only two temporaries:
> 
> T1 = B * C;
> T2 = T1 + D; // T1 is now free for use
> T1 = T2 * E; // T2 is now free for use
> T2 = A - T1; // T1 is now free for use
> X = T2;

True, but for objects on the stack, the cost is really just in the copying of data, not the memory allocation. T1 and T2 still get initialised twice.

> And, if inlining occurs, it can be made to use only one temporary, no?

Indeed, the compiler might optimise it, if the structs are small enough. Which is why I said that it makes it "much easier for the compiler to optimise". It might be able to do it without this help, but my experience with C++ has been that inlining is unreliable.


1 2 3
Next ›   Last »