Thread overview: Re: A possible solution for the opIndexXxxAssign morass
  Jason House (Oct 14, 2009)
  Bill Baxter (Oct 14, 2009)
  Jason House (Oct 14, 2009)
  Robert Jacques (Oct 14, 2009)
  Bill Baxter (Oct 14, 2009)
October 14, 2009
Andrei Alexandrescu Wrote:

> Right now we're in trouble with operators: opIndex and opIndexAssign don't seem to be up to snuff because they don't catch operations like
> 
> a[b] += c;
> 
> with reasonable expressiveness and efficiency.

I would hope that *= += /= and friends could all be handled efficiently with one function written by the programmer. As I see it, there are 3 basic steps:
1. Look up a value by index
2. Mutate the value
3. Store the result

It's possible to use opIndex for #1 and opIndexAssign for #3, but that's not efficient. #1 and #3 should be part of the same function, but I think #2 should not be. What about defining an opIndexOpOpAssign that accepts a delegate for #2 and then using compiler magic to specialize/inline it?
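
Here's a rough sketch of what that hook might look like, using the hypothetical opIndexOpOpAssign name from above and a made-up array-backed container; the rewrite shown in the comment is the part that would need compiler magic:

// Hypothetical lowering:  a[i] += c  would become
//   a.opIndexOpOpAssign(i, (ref int v) { v += c; })
// with the delegate specialized/inlined by the compiler.
struct IntBag   // invented container, purely for illustration
{
    int[] data;

    void opIndexOpOpAssign(size_t i, scope void delegate(ref int) mutate)
    {
        mutate(data[i]);  // steps 1-3 collapse into one in-place update
    }
}

unittest
{
    auto a = IntBag([1, 2, 3]);
    int c = 4;
    a.opIndexOpOpAssign(1, (ref int v) { v += c; });  // i.e.  a[1] += c
    assert(a.data[1] == 6);
}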
October 14, 2009
On Wed, Oct 14, 2009 at 7:42 AM, Jason House <jason.james.house@gmail.com> wrote:
> Andrei Alexandrescu Wrote:
>
>> Right now we're in trouble with operators: opIndex and opIndexAssign don't seem to be up to snuff because they don't catch operations like
>>
>> a[b] += c;
>>
>> with reasonable expressiveness and efficiency.
>
> I would hope that *= += /= and friends could all be handled efficiently with one function written by the programmer. As I see it, there are 3 basic steps:
> 1. Look up a value by index
> 2. Mutate the value
> 3. Store the result

And as Chad J reminds us, the same goes for in-place property mutations
like  a.b += c.
It's just a matter of accessing  .b  vs .opIndex(b).  And really the
same goes for any member function: a.memfun(b) += c could benefit from
the same treatment (a.length(index) += 3, anyone?).

> It's possible to use opIndex for #1 and opIndexAssign for #3, but that's not efficient. #1 and #3 should be part of the same function, but I think #2 should not be. What about defining an opIndexOpOpAssign that accepts a delegate for #2 and then using compiler magic to specialize/inline it?

It could also be done using a template thing to inject the "mutate the value" operation:

void opIndexOpOpAssignOpSpamOpSpamSpamSpam(string Op)(Thang c, Thing idx) {
     ref v = <lookup [idx] however you like>
     mixin("v "~Op~" c;");
     <store v back to [idx] however you like>
}

or make it an alias function argument and use Op(v, b).
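
To make that concrete, here's roughly how the mixin version could look on a made-up array-backed wrapper, keeping the placeholder operator name from above:

struct Wrapper   // invented type, just to show the shape
{
    int[] data;

    // Op is the compound operator as a string, e.g. "+=" or "*=".
    void opIndexOpOpAssignOpSpamOpSpamSpamSpam(string Op)(int c, size_t idx)
    {
        int* v = &data[idx];         // lookup [idx] however you like
        mixin("*v " ~ Op ~ " c;");   // splice the operator in
        // nothing to store back -- the element was mutated in place
    }
}

unittest
{
    auto w = Wrapper([10, 20, 30]);
    w.opIndexOpOpAssignOpSpamOpSpamSpamSpam!"*="(3, 2);  // i.e.  w[2] *= 3
    assert(w.data[2] == 90);
}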

Sparse matrices are a good case to look at for issues.  a[b] is defined for every [b], but if the value is zero nothing is actually stored.  So there may or may not be something you can return a reference to.   In C++ things like std::map just declare that if you try to access a value that isn't there, it gets created.  That way operator[] can always return a reference.   It would be great if we could make a[b] not force a ref return in cases where there is no lvalue that corresponds to the index (or property) being accessed. Gracefully degrade to the slow path in those cases.
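
For instance, a sparse vector along these lines (invented for illustration) has no stored lvalue to return a ref to for an absent entry, so today a[b] += c is stuck with the read-modify-write path:

// Absent entries read as zero; an entry only exists once it's assigned.
struct SparseVec   // made-up example type
{
    double[size_t] data;

    double opIndex(size_t i) const
    {
        if (auto p = i in data) return *p;
        return 0.0;               // defined for every index, stored for few
    }

    void opIndexAssign(double value, size_t i)
    {
        data[i] = value;          // only now does an entry get created
    }
}

unittest
{
    SparseVec a;
    // The slow path that  a[5] += 2.5  has to take:
    auto tmp = a[5];   // 0.0, nothing stored yet
    tmp += 2.5;
    a[5] = tmp;        // the entry appears here
    assert(a[5] == 2.5);
}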

A good thing about a template is you can pretty easily specify which cases to allow using template constraints:

void opIndexOpOpAssignOpSpamOpSpamSpamSpam(string Op)(Thang c, Thing b)
       if (Op in "+= -=")
{
   ...
}

(+ 1 small dream there about 'in' being defined to mean substring search for string arguments -- that doesn't currently work, does it?)

If the template can't be instantiated for the particular operation,
then the compiler would try to revert to the less efficient standby:
auto tmp = a[b];
tmp op= c;
a[b] = tmp;
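
Assuming plain comparisons in the constraint for now (see the aside about 'in' above), the constrained hook plus that slow standby might look like this on another invented container:

struct Histogram   // made-up type for illustration
{
    int[] bins;

    // Only additive updates are handled here; anything else should take
    // the slow opIndex/opIndexAssign path below.
    void opIndexOpOpAssignOpSpamOpSpamSpamSpam(string Op)(int c, size_t i)
        if (Op == "+=" || Op == "-=")
    {
        mixin("bins[i] " ~ Op ~ " c;");
    }

    int opIndex(size_t i) const { return bins[i]; }
    void opIndexAssign(int value, size_t i) { bins[i] = value; }
}

unittest
{
    auto h = Histogram(new int[4]);
    h.opIndexOpOpAssignOpSpamOpSpamSpamSpam!"+="(5, 2);  // i.e.  h[2] += 5
    assert(h[2] == 5);

    // "*=" fails the constraint, so the compiler would fall back to:
    auto tmp = h[2];
    tmp *= 2;
    h[2] = tmp;
    assert(h[2] == 10);
}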

The whole thing can generalize to all accessors too.  Instead of just passing the Op, the compiler could pass the accessor string, and args for that accessor.  Here an accessor means ".opIndex(b)",  ".foo", or even a general ".memfun(b)"


void opIndexOpOpAssignOpSpamOpSpamSpamSpam(string Member, string Op)(Thang c, Thing b)
   if (Member in ".foo() .bar() .opIndex()")
{
     string call = ctReplace(Member, "()", "(b)");  // Member looks like ".memfun()"; this turns it into ".memfun(b)"
     ref v = mixin("this" ~ call ~ ";");
     < any extra stuff you want to do on accesses to v >
     mixin("v "~Op~" c;");
     < store v back to member >
}

It's ugly and perhaps too low-level, but that can be worked on if the general principle is sound.  Utility functions can be defined for whatever turns out to be a recurring pattern.  The lack of virtual dispatch could be a problem for classes.

--bb
October 14, 2009
On Wed, Oct 14, 2009 at 9:34 AM, Bill Baxter <wbaxter@gmail.com> wrote:
> [... full proposal from the previous message snipped ...]


After mulling over it some more, basically what I'm describing is simply a function that gives the user a chance to rewrite the AST of these kinds of ".memfun(args) op= " type operations.

When the compiler sees "obj.memfun(b) += c", it gives that bit of the
syntax tree to the AST manipulator function (if obj defines one), and
the function can then alter it however desired.

This is made somewhat clunky by the fact that our only representation for ASTs is strings.

Actually this could just be a CTFE function.   It doesn't need to be a template.

Just imagine there's a compile-time struct passed in that could do things like this:

string opWhateverAssign(AST syntax)
{
     // First some examples:
     // Assume obj.memfun(b0,b1) += c   is what appeared in source code.
     enum s0 = syntax.args;    // yields "b0, b1" -- compiler knows the args to this fn are called "b0" and "b1"
     enum s1 = syntax.args[0]; // yields "b0"
     enum s2 = syntax.rvalue;  // yields "c"
     enum s3 = syntax.member;  // yields "memfun"
     enum s4 = syntax.formatCallString("v = $syntax.member( x,y )"); // yields "v=memfun(x,y)"
     enum s5 = syntax.defaultImpl; // yields "auto v=memfun(b0,b1); v+=c; memfun(b0,b1)=v;"

     // ok now I'll actually do something
     static if (syntax.member == "opIndex") {
           // say this is a sparse matrix class
           return ctFormat(q{
                   if (!this.matrix_contains($syntax.args)) {
                          this.create_entry($syntax.args);
                   }
                   auto v = &this.matrix_storage[$syntax.args];
                   *v  $syntax.op  $syntax.rvalue;
            });
     }
     else {
           return syntax.defaultImpl;
     }
}

This assumes we can have CTFE functions inside structs/classes.  It assumes a function called ctFormat that can format a string at compile-time and do perl-like variable interpolation.   It assumes we can pass structs to CTFE functions and use them there.

Really it doesn't have to be just the opAssign type calls either.  You could allow such interceptors for any method call or member access.

This is really close to a Nemerle-like macro, actually.  Just modify 4 lines and it is one.

macro opWhateverAssign(AST syntax)
{
     // ok now I'll actually do something
     static if (syntax.member == "opIndex") {
           // say this is a sparse matrix class
           <[
                   if (!this.matrix_contains($syntax.args)) {
                          this.create_entry($syntax.args);
                   }
                   auto v = &this.matrix_storage[$syntax.args];
                   *v  $syntax.op  $syntax.rvalue;
            ]>
     }
     else {
           <[ $syntax.defaultImpl; ]>
      }
}

And this really makes me think it's silly to put off macro syntax till D3.  Everything needed is basically there, in contrast to, say, a whole new paradigm for parallel computing.


--bb
October 14, 2009
Bill Baxter Wrote:

> [...]
> It could also be done using a template thing to inject the "mutate the value" operation:

The only issue with templates is that they're never virtual
October 14, 2009
Jason House wrote:
> [...]
> The only issue with templates is that they're never virtual

You can make virtuals out of templates, but not templates out of virtuals. I think Walter is now inclined to look at a template-based solution for operator overloading. That would save a mighty lot of code without preventing classes that prefer virtual dispatch from doing so.
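
A rough sketch of that pattern, still with the placeholder operator name from this thread and invented class names (current D spells this kind of hook opIndexOpAssign, but the shape is the same): the non-virtual template is just glue that forwards to an ordinary virtual method.

class SparseVector   // made-up example class
{
    double[size_t] data;

    // Function templates are never virtual, but this one is only glue:
    void opIndexOpOpAssignOpSpamOpSpamSpamSpam(string Op)(double c, size_t i)
        if (Op == "+=" || Op == "-=")
    {
        static if (Op == "+=") indexAdd(i, c);
        else                   indexAdd(i, -c);
    }

    // The virtual workhorse a subclass can override.
    void indexAdd(size_t i, double c)
    {
        if (auto p = i in data)
            *p += c;
        else
            data[i] = c;   // absent entries behave as zero
    }
}

class ClampedVector : SparseVector   // also invented
{
    override void indexAdd(size_t i, double c)
    {
        super.indexAdd(i, c);
        if (data[i] < 0) data[i] = 0;   // e.g. keep entries non-negative
    }
}

unittest
{
    auto v = new ClampedVector;
    v.opIndexOpOpAssignOpSpamOpSpamSpamSpam!"-="(3.0, 7);  // i.e.  v[7] -= 3.0
    assert(v.data[7] == 0);   // the override ran through virtual dispatch
}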

Andrei
October 14, 2009
On Wed, 14 Oct 2009 16:49:28 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> [...]
> You can make virtuals out of templates, but not templates out of virtuals. I think Walter is now inclined to look at a template-based solution for operator overloading. That would save a mighty lot of code without preventing classes that prefer virtual dispatch from doing so.
>
> Andrei

I've done something similar for a SmallVec struct. Most of the operator overloads are actually aliases of templated functions (one each for the unary ops, the binary ops, the binary _r ops, and the op-assigns).
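
Roughly along these lines, say, for the op-assign family (names invented; D1-era operator names, since that's what += dispatched to at the time):

struct SmallVec   // sketch of the idea, not the real code
{
    float[3] v;

    // One templated implementation covers the whole op-assign family...
    void opAssignImpl(string op)(SmallVec rhs)
    {
        foreach (i; 0 .. 3)
            mixin("v[i] " ~ op ~ " rhs.v[i];");
    }

    // ...and the named operator overloads are just aliases of its instances.
    alias opAddAssign = opAssignImpl!"+=";
    alias opSubAssign = opAssignImpl!"-=";
    alias opMulAssign = opAssignImpl!"*=";
}

unittest
{
    SmallVec a, b;
    a.v = [1.0f, 2.0f, 3.0f];
    b.v = [4.0f, 5.0f, 6.0f];
    a.opAddAssign(b);   // what  a += b  dispatched to before opOpAssign existed
    assert(a.v == [5.0f, 7.0f, 9.0f]);
}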