March 28, 2006
In article <e09v1r$2tb0$1@digitaldaemon.com>, kris says...
>
>Sean Kelly wrote:
>> Charles wrote:
>> 
>>> Wow looks good ... too good.  How could this have gone un[noticed|implemented] in the last 20 years?  I'm anxious to hear Walter's take.
>> 
>> 
>> I have a feeling this may be a lot more difficult in C++ because of ADL--there are simply a lot more functions to be evaluated when building expression trees.  Also, the standard doesn't seem to consider things from a compiler writer's perspective, which this three-value code optimization requires.
>> 
>> 
>> Sean
>
>ADL? Does that stand for Attention Deficit Level?

Sorry, I missed this entire thread.  What's this about now?

<g>

- EricAnderton at yahoo
March 28, 2006
kris wrote:
> Sean Kelly wrote:
>> Charles wrote:
>>
>>> Wow looks good ... too good.  How could this have gone un[noticed|implemented] in the last 20 years?  I'm anxious to hear Walter's take.
>>
>>
>> I have a feeling this may be a lot more difficult in C++ because of ADL--there are simply a lot more functions to be evaluated when building expression trees.  Also, the standard doesn't seem to consider things from a compiler writer's perspective, which this three-value code optimization requires.
> 
> ADL? Does that stand for Attention Deficit Level?

Argument Dependent Lookup.  ie. the complex overload resolution semantics in C++.  Another potential issue is the lack of "_r" functions in C++, since while free functions can do quite a bit they must either use temporaries, be friend functions with very odd semantics, or do something akin to expression templates.
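For context on the "_r" naming: D already uses it for the non-assigning operators, so the proposal below extends an existing convention rather than inventing a new one. A minimal sketch with a made-up Frac struct (illustrative only, not library code):

struct Frac
{
    int num, den;

    Frac opSub(int rhs)    // called for  f - 2
    {
        Frac r;
        r.num = num - rhs * den;
        r.den = den;
        return r;
    }

    Frac opSub_r(int lhs)  // called for  2 - f, the "reversed" form
    {
        Frac r;
        r.num = lhs * den - num;
        r.den = den;
        return r;
    }
}

There is currently no reversed form for -=, +=, etc., which is the gap the proposal fills.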


Sean
March 28, 2006
> Argument Dependent Lookup.  ie. the complex overload resolution
> semantics in C++.  Another potential issue is the lack of "_r" functions
> in C++, since while free functions can do quite a bit they must either
> use temporaries, be friend functions with very odd semantics, or do
> something akin to expression templates.

Ahh, I see.  Well, I think this will be huge for D. Great idea, Don!


Sean Kelly wrote:
> kris wrote:
> 
>> Sean Kelly wrote:
>>
>>> Charles wrote:
>>>
>>>> Wow looks good ... too good.  How could this have gone un[noticed|implemented] in the last 20 years?  I'm anxious to hear Walter's take.
>>>
>>>
>>>
>>> I have a feeling this may be a lot more difficult in C++ because of ADL--there are simply a lot more functions to be evaluated when building expression trees.  Also, the standard doesn't seem to consider things from a compiler writer's perspective, which this three-value code optimization requires.
>>
>>
>> ADL? Does that stand for Attention Deficit Level?
> 
> 
> Argument Dependent Lookup.  ie. the complex overload resolution semantics in C++.  Another potential issue is the lack of "_r" functions in C++, since while free functions can do quite a bit they must either use temporaries, be friend functions with very odd semantics, or do something akin to expression templates.
> 
> 
> Sean
March 28, 2006
Don Clugston wrote:
> Background: Operator overloading, in the form it exists in C++ and currently in D, inherently produces sub-optimal code, because it always creates unnecessary temporary objects.
> 
> For example,
> X = A - ((B*C) + D)* E;
> 
> becomes:
> T1 = B * C;
> T2 = T1 + D;
> T3 = T2 * E;
> T4 = A - T3;
> X = T4;
> Four objects were created, whereas only one was strictly required.
> In C++, there are libraries like Blitz++ which use complicated expression templates in order to avoid creating these temporaries, and provide performance comparable with FORTRAN. I think D can do much better...
> Note that temporaries are avoided when using the opXXXAssign() operators like +=.
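To make the background concrete, here is a minimal sketch of the kind of type involved (Vector is purely illustrative, not an existing library type): every use of opAdd has to allocate a brand-new object for its result, while opAddAssign can work in place; opSub and opMul would follow the same pattern as opAdd.

class Vector
{
    double[] data;

    this(size_t n) { data.length = n; }

    // a + b: must allocate a fresh Vector for the result
    Vector opAdd(Vector b)
    {
        Vector r = new Vector(data.length);
        for (size_t i = 0; i < data.length; i++)
            r.data[i] = data[i] + b.data[i];
        return r;
    }

    // a += b: updates 'a' in place, no allocation at all
    void opAddAssign(Vector b)
    {
        for (size_t i = 0; i < data.length; i++)
            data[i] += b.data[i];
    }
}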
> 
> ===========
>   Proposal
> ===========
> (1) Allow the compiler to assume that b = b + c  can be replaced with b += c. (In C++, operator + and operator += are just symbols, the compiler doesn't know that there is any relationship between them).
> In the example above, this would allow the compiler to generate:
> T1 = B * C;
> T1 += D;
> T1 *= E;
> 
> and we have eliminated two of the three temporaries.
> (2). Fill in the gaps in the operator overloading table by introducing
> opAddAssign_r, opSubAssign_r, etc.
> 
> Just as A.opSubAssign(B)
> is the operation  A -= B  or equivalently  A = A - B, similarly
> 
> A.opSubAssign_r(B)
> would mean
> A = B - A.
> and would only occur when temporaries are generated in expressions. Like -=, it's an operation which can frequently be performed very efficiently, but at present the language has no way of expressing it.
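Continuing the illustrative Vector sketch above, this is what an implementation of the proposed hook could look like. Note that opSubAssign_r does not exist in the language today; the name and semantics are the proposal's.

class Vector
{
    double[] data;
    this(size_t n) { data.length = n; }

    // Proposed hook:  a.opSubAssign_r(b)  would mean  a = b - a,
    // computed in place in a's existing storage.
    void opSubAssign_r(Vector lhs)
    {
        for (size_t i = 0; i < data.length; i++)
            data[i] = lhs.data[i] - data[i];
    }
}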
> 
> Our original example then becomes:
> 
> T1 = B.opMul(C);
> T1.opAddAssign(D);
> T1.opMulAssign(E);
> T1.opSubAssign_r(A);
> X = T1;
> .... and all the useless temporaries are gone!
> 
> More formally, when the expression tree for an expression is generated:
> With a binary operator XXX, operating on left & right nodes:
> 
> if (the left node is *not* an original leaf node) {
>    // the left node is a temporary, does not need to be preserved.
>    // we don't care if the right node is a temporary or not
>    look for opXXXAssign().
> } else if (the right node is not an original leaf node) {
>    // the right node is a temporary
>    look for opXXXAssign_r()
> } else {
>   // both left and right nodes are leaf nodes, we have to
>   // create a temporary
>    look for opXXX(), just as it does now.
> }
> 
> These rules also cope with the situation where temporaries are required:
> eg
> X = (A*B) + (C*D);
> becomes
> T1 = A*B;
> T2 = C*D;
> T1 += T2;
> X = T1;
> 
> If this were implemented, it would permanently eradicate (for D) the most significant advantage which Fortran has managed to retain over object-oriented languages. And I really don't think it would be difficult to implement, or have negative side-effects.
> 
> There are a couple of decisions to be made:
> (I) should the compiler use opAdd() and generate a temporary, if opAddAssign_r() doesn't exist, to preserve existing behaviour? I think the answer to this is YES.
> (II) should the compiler use opAdd() and generate a temporary, if opAddAssign() doesn't exist, to preserve existing behaviour? Again, I'm inclined to answer YES.
> (III) If the code includes +=, and there is an opAdd() but no opAddAssign(), should the compiler accept this, and just generate an opAdd() followed by an assignment?? This would mean that opAdd() would generate the += operation as well as +, while opAddAssign() would be a performance enhancement. (It would still be possible to have opAddAssign() without opAdd(), to have += but not +, but it would not be possible to have + without +=). This would mean that += would be *purely* syntactic sugar.
> 
> Decision III would be a little more difficult to implement and is of less obvious merit, I only mention it as a possibility.
> 
> Comments?

I guess I'll be the "Negative Nancy" here for purposes of strengthening your proposal...

While being well laid out and well thought through, this proposal still screams to me that it's concentrating on the mathematical problem domain.  This is fine if one assumes that classes implementing operators will be mimicking real-world mathematical entities, such as vectors, matrices, etc.  But will this affect other problem domains adversely?

I usually like to come from the "everything explicit" angle and don't want the compiler making decisions on my behalf, especially when I'm not aware of them.  My suggestion would be to add a keyword in the operator definition (or class definition) to indicate that you want this sort of operator overloading behavior, such that one could leave it off if the default behavior is desired for other such cases.

In what specific problem domain are you experiencing issues with the current operator overloading syntax/semantics?  Or is it just that you feel that the current syntax/semantics are not quite fully developed?

And last but not least, another problem is in the order of evaluation for the operator overload calls.  What do you propose for this?  I think in order for this _not_ to matter, you'd have to guarantee that the classes themselves are self-contained and would have no references to (or have any effect on) the other classes involved in the expression statement.

This brings me to another related issue: these temporaries are going to be allocated on the GC heap no matter what, correct?  What if a silent out-of-memory exception was thrown from a line of code appearing to have no effect on memory allocation whatsoever?

There's basically no control over the number of temporaries that could be generated.  Also, there'd be "no going back" from a GC to a manual allocation strategy (i.e. memory pools) because you've effectively lost handles to those blocks of memory for the temporaries.  One could use custom allocators on the class for this purpose, but that would have an adverse effect on normal usage of the class.  These findings lead me to believe that classes which overload operators should have the requirement of being 'auto' (as in RAII or RR).

-- 
Regards,
James Dunne
March 28, 2006
James Dunne wrote:
> 
> While being well laid out and well thought through, this proposal still screams to me that it's concentrating on the mathematical problem domain.  This is fine if one assumes that classes implementing operators will be mimicking real-world mathematical entities, such as vectors, matrices, etc.  But will this affect other problem domains adversely?

I don't know - but I don't think so. My feeling is that if it strays too far from a mathematical domain, it probably shouldn't be using overloading of the arithmetical operators. In particular, I think that it's very hard to justify a+=b being different to a=a+b.

> I usually like to come from the "everything explicit" angle and don't want the compiler making decisions on my behalf; especially when I'm not aware of them. 

I'll take this as another vote against (III).

> My suggestion would be to add a keyword in the operator
> definition (or class definition) to indicate that you want this sort of operator overloading behavior, such that one could leave it off if the default behavior is desired for other such cases.


> In what specific problem domain are you experiencing issues with the current operator overloading syntax/semantics?  Or is it just that you feel that the current syntax/semantics are not quite fully developed?

I was specifically interested in linear algebra. In thinking about Norbert's matrix proposal, I realized that it doesn't make sense to work on the syntax when there's an inherent inefficiency underneath.
Ultimately, operator overloading is just syntactic sugar for function calls. The problem with the C++ approach is that it only provides function calls for two of the three situations. Consequently, you suffer an unnecessary performance hit every time you use operator +.
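To spell out the three situations for a binary a - b inside a larger expression, and the call each one would map to under this proposal (a sketch, not compiler output):

// a is itself a temporary         ->  a.opSubAssign(b);    // reuse a's storage
// only b is a temporary           ->  b.opSubAssign_r(a);  // reuse b's storage
// both a and b must be preserved  ->  T = a.opSub(b);      // new temporary

C++ has overloadable hooks only for the first and last cases (operator-= and operator-), and it may not substitute -= for - on its own.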


> And last but not least, another problem is in the order of evaluation for the operator overload calls.  What do you propose for this?  I think in order for this _not_ to matter, you'd have to guarantee that the classes themselves are self-contained and would have no references to (or have any effect on) the other classes involved in the expression statement.

True, but I think this already applies to operator +.

> This brings me to another related issue: these temporaries are going to be allocated on the GC heap no matter what, correct?  What if a silent out-of-memory exception was thrown from a line of code appearing to have no effect on memory allocation whatsoever?

Again, this already applies to opAdd. The only thing this proposal changes is that avoidable temporaries are not created; unavoidable temporaries are unchanged.
You raise a good point, though -- unavoidable temporaries could be treated better (e.g. with memory pools). This proposal does not let you distinguish between "new temporary = a+b" and "new result = a+b", where the former could be stored in a memory pool. I think that's a minor issue, though.
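For what it's worth, a class can already route all of its allocations (temporaries included) through a pool via a per-class allocator. A rough sketch, with malloc/free standing in for a real pool, and with nothing here distinguishing temporaries from named results:

import std.c.stdlib;

class Matrix
{
    double[] data;
    this(size_t n) { data.length = n; }

    // Per-class allocator/deallocator: every 'new Matrix' goes through
    // these.  A real implementation would draw from a dedicated pool.
    new(size_t size)
    {
        void* p = malloc(size);
        if (!p)
            throw new Exception("out of memory");
        return p;
    }

    delete(void* p)
    {
        free(p);
    }
}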

> There's basically no control over the number of temporaries that could be generated.  Also, there'd be "no going back" from a GC to a manual allocation strategy (i.e. memory pools) because you've effectively lost handles to those blocks of memory for the temporaries.  One could use custom allocators on the class for this purpose, but that would have an adverse effect on normal usage of the class.  These findings lead me to believe that classes which overload operators should have the requirement of being 'auto' (as in RAII or RR).

March 28, 2006
I don't see any obvious reasons against this proposal, but one should not overestimate it!

It is true that it allows a number of optimizations and helps avoid some unnecessary temporaries, but it is not a replacement for expression templates or vectorized expressions (aka array expressions).

Imagine A,B,C,D and X being arrays of the same size and consider the last example in the proposal:

> X = (A*B) + (C*D);
> becomes
> T1 = A*B;
> T2 = C*D;
> T1 += T2;
> X = T1;

Fortran90 could translate the original expression into something like
	for(int i=0;i<N;i++)
		X[i] = (A[i]*B[i]) + (C[i]*D[i]);
which not only eliminates *all* temporaries, but does something more: it
handles all calculations in one loop, allowing the memory to be read in a
cache-friendly order and all the calculations to be done in registers.

C++ expression templates as used in blitz++ et al allow the same kind of optimizations. Array expressions in D could do the same thing. The operator optimization proposed here cannot provide this.
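To illustrate, here is what such an array expression could look like in D (the surface syntax below is hypothetical; D has nothing like it today), together with the fused loop it would be expected to lower to, which is ordinary, compilable D:

// Hypothetical array-expression syntax (NOT valid D today):
//
//     X[] = A[] * B[] + C[] * D[];
//
// Expected lowering: one fused loop, no heap temporaries at all.
void eval(double[] X, double[] A, double[] B, double[] C, double[] D)
{
    for (size_t i = 0; i < X.length; i++)
        X[i] = A[i] * B[i] + C[i] * D[i];
}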

So, as it stands I have no objections against the proposal, but it should *NOT* be used as excuse against expression templates or array expressions in the long term.

Greetings,
Norbert
March 28, 2006
Norbert Nemec wrote:
> I don't see any obvious reasons against this proposal, but one should
> not overestimate it!
> 
> It is true that it allows a number of optimizations and helps avoiding
> some unnecessary temporaries, but it is not a replacement for expression
> templates or vectorized expressions (aka array expressions).
> 
> Imagine A,B,C,D and X being arrays of the same size and consider the
> last example in the proposal:
> 
>> X = (A*B) + (C*D);
>> becomes
>> T1 = A*B;
>> T2 = C*D;
>> T1 += T2;
>> X = T1;
> 
> Fortran90 could translate the original expression into something like
> 	for(int i=0;i<N;i++)
> 		X[i] = (A[i]*B[i]) + (C[i]*D[i]);
> which not only eliminates *all* temporaries, but does something more:
> handle all calculations in one loop, allowing the memory to be read
> cache friendly and all the calculations being done in registers.

Obviously not with real matrices (C*D is not a pointwise operation), but point taken. (BTW, the temporaries are still there, they're just in registers this time (A[i]*B[i], C[i]*D[i]). The proposal does get rid of all unnecessary temporaries; the problem is that there's no vectorisation).

> C++ expression templates as used in blitz++ et al allow the same kind of
> optimizations. Array expressions in D could do the same thing. The
> operator optimization cannot handle this optimization.
> 
> So, as it stands I have no objections against the proposal, but it
> should *NOT* be used as excuse against expression templates or array
> expressions in the long term.

I completely agree. I see this as fixing the general case, but it does nothing for vectorisation. I suspect that array expressions are somewhat special as regards expressions, because of the vectorisation possibility. If we had array expressions, this might eliminate the necessity for expression templates. Do you think that's right?

Thanks for putting this into perspective.

> Greetings,
> Norbert
March 28, 2006
I fully support this proposal. It makes sense to place stricter semantic requirements on overloaded operators. I cannot see any problems. You seem to have everything covered. What restrictions should the compiler place on operator overloading signatures? Should it, for instance, be illegal to define an opAdd with a different return type than opAddAssign?

Don Clugston wrote:

> In C++, there are libraries like Blitz++ which use complicated expression templates in order to avoid these creating these temporaries, and provide performance comparable with FORTRAN. I think D can do much better...

Expression templates would still be useful for other cases though. Consider: (A * B) % C.
Here, expression templates could allow evaluating a much more efficient modMul(A,B,C). Expression templates could also help in writing less complex but still efficient code by allowing lazy evaluation.

/Oskar
March 28, 2006
Don Clugston wrote:
> Obviously not with real matrices (C*D is not a pointwise operation), but
>  point taken. (BTW, the temporaries are still there, they're just in
> registers this time (A[i]*B[i], C[i]*D[i]). The proposal does get rid of
> all unnecessary temporaries, the problem is that there's no vectorisation).

Point accepted. For matrices, the issues are much more complicated, but still there is quite a bit of optimization possible when vectorization is taken into account, e.g. the expression A*B+C can be done very efficiently when done in one shot. (There even are BLAS routines for this kind of combined operation, which is very common in many fields of application.)

> I completely agree. I see this as fixing the general case, but it does nothing for vectorisation. I suspect that array expressions are somewhat special as regards expressions, because of the vectorisation possibility. If we had array expressions, this might eliminate the necessity for expression templates. Do you think that's right?

Those have their right of existence and should be supported by the language: array expressions are more comfortable to use and to optimize than a corresponding ET-library, but expression templates can be used for a much larger field of applications than just vectorized expressions. Just consider the linear algebra example above, where an expression template library might automatically optimize A*B+C into a single BLAS function call.
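A bare-bones sketch of that last idea, with plain D classes standing in for a real expression-template library (Matrix and its storage layout are invented for the example): opMul returns a cheap proxy instead of a result, and opAdd on the proxy evaluates A*B + C in one fused routine, which is exactly where a library could call a combined BLAS routine instead.

class Matrix
{
    size_t rows, cols;
    double[] data;                       // row-major, rows*cols elements

    this(size_t r, size_t c)
    {
        rows = r; cols = c;
        data.length = r * c;
    }

    // A * B builds no result matrix; it just remembers its operands.
    MulExpr opMul(Matrix rhs) { return new MulExpr(this, rhs); }
}

class MulExpr
{
    Matrix lhs, rhs;
    this(Matrix l, Matrix r) { lhs = l; rhs = r; }

    // (A*B) + C resolves to MulExpr.opAdd(C): the product A*B is never
    // materialised on its own.  A real library would dispatch to a
    // combined BLAS call (gemm) here instead of the naive loops below.
    Matrix opAdd(Matrix c)
    {
        Matrix result = new Matrix(lhs.rows, rhs.cols);
        for (size_t i = 0; i < lhs.rows; i++)
            for (size_t j = 0; j < rhs.cols; j++)
            {
                double s = c.data[i * c.cols + j];
                for (size_t k = 0; k < lhs.cols; k++)
                    s += lhs.data[i * lhs.cols + k] * rhs.data[k * rhs.cols + j];
                result.data[i * result.cols + j] = s;
            }
        return result;
    }
}

With that in place, X = A * B + C; runs as a single fused evaluation without the user writing anything unusual.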
March 28, 2006
Norbert Nemec wrote:
> Don Clugston wrote:
> 
>>Obviously not with real matrices (C*D is not a pointwise operation), but
>> point taken. (BTW, the temporaries are still there, they're just in
>>registers this time (A[i]*B[i], C[i]*D[i]). The proposal does get rid of
>>all unnecessary temporaries, the problem is that there's no vectorisation).
> 
> 
> Point accepted. For matrices, the issues are much more complicated, but
> still there is quite a bit of optimization possible when vectorization
> is taken into account, e.g. the expression A*B+C can be done very
> efficiently when done in one shot. (There even are BLAS routines for
> this kind of combined operation, which is very common in many fields
> of application.)
> 
> 
>>I completely agree. I see this as fixing the general case, but it does
>>nothing for vectorisation. I suspect that array expressions are somewhat
>>special as regards expressions, because of the vectorisation
>>possibility. If we had array expressions, this might eliminate the
>>necessity for expression templates. Do you think that's right?
> 
> 
> Those have their right of existence and should be supported by the
> language: array expressions are more comfortable to use and to optimize
> than a corresponding ET-library, but expression templates can be used
> for a much larger field of applications than just vectorized
> expressions. Just consider the linear algebra example above, where an
> expression template library might automatically optimize A*B+C into a
> single BLAS function call.

If possible, can someone lay out a clear definition of both "array expressions" and "expression templates"?  I'd really like to fully understand what's possible in this area for my own research.

Thanks,

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/MU/S d-pu s:+ a-->? C++++$ UL+++ P--- L+++ !E W-- N++ o? K? w--- O M--@ V? PS PE Y+ PGP- t+ 5 X+ !R tv-->!tv b- DI++(+) D++ G e++>e h>--->++ r+++ y+++
------END GEEK CODE BLOCK------

James Dunne