Thread overview
opApply and that int
Bill Baxter, Jan 04, 2008
Bill Baxter, Jan 04, 2008
BCS, Jan 04, 2008
Bill Baxter, Jan 05, 2008
Bill Baxter, Jan 05, 2008
January 04, 2008
I know we've been through this before but I don't recall the conclusion.

Why do we have to pass an int through our opApply functions?

Given an object.opApply that takes a delegate that takes a ref T,
and code like this:

foreach(T x; object) {
     if (x) break;
     if (condition) return Something;
     do_something;
}

the compiler transforms that into something like this:

RType _fn_ret; // (RType is return type of enclosing function)
int _loop_body(ref T x)
{
   if (x) return BREAK;
   if (condition) { _fn_ret = Something; return RETURN; }
   do_something;
   return 0;
}
int _ret = object.opApply(&_loop_body);
if (_ret==RETURN) return _fn_ret;
else if (_ret==GOTO) goto ??;
// maybe some other cases...
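For anyone who wants to try this, here is a compilable D sketch of today's protocol: a container whose opApply has to hand the loop body's int straight back to its caller, plus a hand-written version of the lowering above next to the foreach it stands for. `Bag`, `findFirstOver` and the numeric codes are hypothetical; dmd picks its own internal values.

```d
// Illustrative status codes (assumption: dmd uses its own values).
enum { CONTINUE = 0, BREAK = 1, RETURN = 2 }

struct Bag
{
    int[] elements;

    // Today's protocol: the loop body's int must be passed back out.
    int opApply(scope int delegate(ref int) loop_body)
    {
        foreach (ref x; elements)
        {
            if (auto r = loop_body(x))
                return r;               // stop and report why
        }
        return 0;
    }
}

// What the user writes.
int findFirstOver(Bag bag, int limit)
{
    foreach (ref x; bag)
    {
        if (x < 0) break;
        if (x > limit) return x;
    }
    return -1;
}

// Roughly what the compiler turns it into.
int findFirstOverLowered(Bag bag, int limit)
{
    int _fn_ret;                        // enclosing function's return value
    int _loop_body(ref int x)
    {
        if (x < 0) return BREAK;
        if (x > limit) { _fn_ret = x; return RETURN; }
        return CONTINUE;
    }
    int _ret = bag.opApply(&_loop_body);
    if (_ret == RETURN) return _fn_ret;
    return -1;                          // BREAK or normal end: fall through
}
```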


My question is this: _loop_body and the caller of opApply share the same enclosing scope, so why not stick the return code in a local variable both can see?  It already seems to do that for return values (as far as I can tell from reading dmd/src/dmd/statement.c).  So why not do it for the main return code too and generate code like this:

RType _fn_ret;
int _ret = 0;
void _loop_body(ref T x)
{
   _ret = 0;
   if (x) { _ret = BREAK; return; }
   if (condition) { _fn_ret = Something; _ret = RETURN; return; }
   do_something;
}
object.opApply(&_loop_body);
if (_ret==RETURN) return _fn_ret;
else if (_ret==GOTO) goto ??;
// maybe some other cases...
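Taken literally in D, the scheme above runs into the problem raised in the follow-up: an opApply that returns void never learns that the body asked to stop, so it keeps calling it, and the `_ret = 0` at the top of the next call wipes out the BREAK. A small sketch with a hypothetical `Bag` container and illustrative codes; opApply is called directly because foreach itself requires an int-returning opApply.

```d
// Illustrative status codes (assumption, not dmd's values).
enum { CONTINUE = 0, BREAK = 1, RETURN = 2 }

struct Bag
{
    int[] elements;

    // opApply never sees the status code, so it cannot stop early.
    void opApply(scope void delegate(ref int) loop_body)
    {
        foreach (ref x; elements)
            loop_body(x);
    }
}

// Hand-lowered `foreach (x; bag) { if (x < 0) break; visited ~= x; }`.
// Returns the elements the loop body actually processed.
int[] visitUntilNegative(Bag bag)
{
    int[] visited;
    int _ret = CONTINUE;
    void _loop_body(ref int x)
    {
        _ret = CONTINUE;                   // reset on every call...
        if (x < 0) { _ret = BREAK; return; }
        visited ~= x;                      // ...so later elements still run
    }
    bag.opApply(&_loop_body);
    return visited;
}
```

Calling it on `[1, -1, 2]` processes 2 even though the body hit `break` at -1.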


Why oh why does that int have to go traipsing through *my* opApply?

--bb
January 04, 2008
Bill Baxter wrote:
> Why do we have to pass an int through our opApply functions?
> [...]

Ok, Jason poked me into realizing that I completely forgot that the user's opApply has to know to return when the loop body does a break or something.  So with what I just proposed it would still have to check for a non-zero return code, *BUT* it wouldn't have to return it to the caller.  So opApplys could become:

     void opApply(int delegate(ref T) loop_body) {
           foreach (ref x; elements) {
                if (loop_body(x)) return;
           }
     }

At least then users don't have to handle radioactive materials.
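A sketch of the revised scheme in D, assuming a hypothetical `Bag` container and illustrative codes: the body's int only says "stop", the reason travels through the caller's `_ret` and `_fn_ret`, and opApply itself returns nothing. The call is written out by hand because today's foreach would reject a void opApply.

```d
// Illustrative status codes (assumption, not dmd's values).
enum { CONTINUE = 0, BREAK = 1, RETURN = 2 }

struct Bag
{
    int[] elements;

    // The user's opApply only checks for non-zero; nothing to pass along.
    void opApply(scope int delegate(ref int) loop_body)
    {
        foreach (ref x; elements)
            if (loop_body(x)) return;
    }
}

// Hand-lowered version of:
//   foreach (x; bag) { if (x < 0) break; if (x > limit) return x; }
//   return -1;
int findFirstOver(Bag bag, int limit)
{
    int _fn_ret;
    int _ret = CONTINUE;                // shared with the loop body
    int _loop_body(ref int x)
    {
        if (x < 0) { _ret = BREAK; return 1; }
        if (x > limit) { _fn_ret = x; _ret = RETURN; return 1; }
        return 0;                       // keep going
    }
    bag.opApply(&_loop_body);
    if (_ret == RETURN) return _fn_ret;
    return -1;
}
```

Since the returned int is now just a stop flag, a bool would carry the same information.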

Still I'd love to get rid of that int in front of the delegate too and just have something like:

     void opApply(void delegate(ref T) loop_body) {
           foreach (ref x; elements) {
                loop_body(x);
                yield();
           }
     }

The trouble is figuring out how to make yield do its magic.
Macros, I guess, will make it possible to have yield actually return from the function.  But I don't see a good way to communicate the current loop state to yield().  Yield could maybe know about the stack layouts, and the code that calls opApply could be careful to put the "int _ret" variable in a place on the stack that yield() could always reach up to find it.  Yield would be doing tricky non-portable stuff, but the idea is it would be included as part of something low-level like object.d, so non-portable would be ok.  Unfortunately, if you call yield in a non-opApply callback situation it could just do bogus stuff and probably couldn't even warn you that what you were doing was bogus.

--bb
January 04, 2008
Reply to Bill,


> Ok, Jason poked me into realizing that I completely forgot that the
> user's opApply has to know to return when the loop body does a break
> or something.  So with what I just proposed it would still have to
> check for a non-zero return code, *BUT* it wouldn't have to return it
> to the caller.  So opApplys could become:
> 

how about this:

void opApply( /**/ bool /**/  delegate(ref T) loop_body) {
 foreach (ref x; elements)
 {
   if (loop_body(x)) return;
 }
}


> Unfortunately if you call yield in a
> non-opApply callback situation it could just do bogus stuff and
> probably couldn't even warn you that what you were doing was bogus.

yield(loop_body(x));  // can check stuff about loop_body

but that still has the issue of:

MyObject mo;
mo.opApply((ref T x){something(x);});  // call to opApply directly
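With the plain bool version (no yield), a direct call like the one above is harmless: the caller supplies a delegate that returns `false` to continue and `true` to stop. The body in the example would need an explicit `return false;` to type-check against a bool delegate. A sketch with a hypothetical `Bag` container and `sumUntilNegative` helper:

```d
struct Bag
{
    int[] elements;

    // BCS's variant: the loop body answers "stop?" with a bool.
    void opApply(scope bool delegate(ref int) loop_body)
    {
        foreach (ref x; elements)
            if (loop_body(x)) return;   // true means "stop"
    }
}

// Direct call to opApply: no foreach, no hidden _ret.
int sumUntilNegative(Bag bag)
{
    int total;
    bag.opApply((ref int x) {
        if (x < 0) return true;         // stop early
        total += x;
        return false;                   // keep going
    });
    return total;
}
```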


January 05, 2008
Warning, long post, but in the end I think I actually came up with a pretty decent way to make opApply code cleaner without requiring any funky special casing or hacks, and without breaking legacy code.

So please read!

BCS wrote:
> Reply to Bill,
> 
> 
>> Ok, Jason poked me into realizing that I completely forgot that the user's opApply has to know to return when the loop body does a break or something.  So with what I just proposed it would still have to check for a non-zero return code, *BUT* it wouldn't have to return it to the caller.  So opApplys could become:
>>
> 
> how about this:
> 
> void opApply( /**/ bool /**/  delegate(ref T) loop_body) {
>  foreach (ref x; elements)
>  {
>    if (loop_body(x)) return;
>  }
> }

Oh, right :-)  A bool would be the way to go.

>> Unfortunately if you call yield in a
>> non-opApply callback situation it could just do bogus stuff and
>> probably couldn't even warn you that what you were doing was bogus.
> 
> yield(loop_body(x));  // can check stuff about loop_body
> 
> but that still has the issue of:
> 
> MyObject mo;
> mo.opApply((ref T x){something(x);});  // call to opApply directly

Hmm, maybe this is what you were getting at when you said "can check
stuff", but it just occurred to me that loop_body is a delegate whose
context pointer points to the stack frame where _ret lives.  So we have
access to the appropriate stack frame; we just don't know
(A) the right offset for _ret or
(B) if there even *is* a _ret in that context (as there wouldn't be for
a direct call to the opApply)

So we could make that work if we could somehow pass opApply an int* that points to _ret.  But then users would have access to that radioactive int* which isn't any better than what we started with.

OK, let's face it, though.  Currently the type of delegate that you pass to a foreach (be it opApply or some other method) really is not particularly useful for anything other than being called by foreach.  If you don't call it via a foreach, you have to carefully construct a loop_body that handles the int return code properly.  And this is a far far *far* less common thing than writing an opApply.  So I think it's acceptable to make calling opApply (and constructing the delegate you pass to it) more complex, in order to make writing the opApply itself simpler and safer.

So a new template and a new macro in object.d are the answer.

The template just bundles an int* (pointer to _ret) together with the loop body delegate:

struct Apply(Args...) {
     alias void delegate(Args) LoopBody;
     LoopBody _loop_body;
     int* _ret = null;

     void _call(Args a) {
         _loop_body(a); // may set *_ret!
     }
}
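A compilable D version of the bundle, assuming a non-ref element type (ref-ness can't be passed as a template argument, as noted further down) and with the yield step written out by hand since there is no macro. `collect` and the stop code `1` are hypothetical:

```d
struct Apply(Args...)
{
    alias LoopBody = void delegate(Args);
    LoopBody _loop_body;
    int* _ret = null;

    void _call(Args a)
    {
        _loop_body(a);                  // may set *_ret!
    }
}

// Demo: the body records values and flags a stop at the first negative
// one through the shared int.
int[] collect(int[] input)
{
    int[] seen;
    int ret = 0;
    void loopBody(int x)
    {
        if (x < 0) { ret = 1; return; }
        seen ~= x;
    }
    auto dg = Apply!int(&loopBody, &ret);
    foreach (x; input)
    {
        dg._call(x);
        if (dg._ret && *dg._ret) break; // what yield(dg, x) would do
    }
    return seen;
}
```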

the macro is this:

macro yield(dg, args...) {
     dg._call(args);
     if (dg._ret && *dg._ret) { return; }
}

and then opApply-like functions can become:

void opApply( Apply!(ref T) dg ) {
    foreach (ref T x; elements) {
        yield(dg,x);
    }
}
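D never shipped the `macro` syntax above; a string mixin is the closest existing way to paste a `return` into the enclosing function. The sketch below swaps that technique in, with a hypothetical `yieldCode` helper, a hypothetical `Bag` container and a non-ref element type:

```d
struct Apply(Args...)
{
    alias LoopBody = void delegate(Args);
    LoopBody _loop_body;
    int* _ret = null;

    void _call(Args a) { _loop_body(a); }
}

// Builds (at compile time) the text the yield macro would expand to.
string yieldCode(string dg, string args)
{
    return dg ~ "._call(" ~ args ~ "); "
         ~ "if (" ~ dg ~ "._ret && *" ~ dg ~ "._ret) return;";
}

struct Bag
{
    int[] elements;

    void opApply(Apply!int dg)
    {
        foreach (x; elements)
        {
            mixin(yieldCode("dg", "x"));   // call body, return if asked
        }
    }
}

int[] collect(int[] input)
{
    int[] seen;
    int ret = 0;
    void loopBody(int x)
    {
        if (x < 0) { ret = 1; return; }
        seen ~= x;
    }
    auto bag = Bag(input);
    bag.opApply(Apply!int(&loopBody, &ret));
    return seen;
}
```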

Now the trickiness is *all* shifted to how you call such a beast properly.  For a foreach in a void function, the compiler will have to generate code like so:

int _ret = 0;
void _loop_body(ref T x)
{
    _ret = 0;
    if (x) { _ret = BREAK; return; }
    if (condition) { _ret = RETURN; return; }
    do_something;
}
object.opApply(Apply!(ref T)(&_loop_body, &_ret));
if (_ret==RETURN) return;
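The same codegen written out by hand for a void function, assuming a hypothetical `Bag` container, a non-ref element type and illustrative BREAK/RETURN values; the yield step inside opApply is inlined:

```d
// Illustrative status codes (assumption, not dmd's values).
enum { CONTINUE = 0, BREAK = 1, RETURN = 2 }

struct Apply(Args...)
{
    alias LoopBody = void delegate(Args);
    LoopBody _loop_body;
    int* _ret = null;
}

struct Bag
{
    int[] elements;

    void opApply(Apply!int dg)
    {
        foreach (x; elements)
        {
            dg._loop_body(x);               // yield(dg, x), inlined
            if (dg._ret && *dg._ret) return;
        }
    }
}

// Lowered form of:
//   foreach (x; bag) {
//       if (x < 0) break;
//       if (x > 100) return;
//       *log ~= x;
//   }
//   *log ~= 0;
void process(Bag bag, int[]* log)
{
    int _ret = 0;
    void _loop_body(int x)
    {
        _ret = 0;
        if (x < 0) { _ret = BREAK; return; }
        if (x > 100) { _ret = RETURN; return; }
        *log ~= x;
    }
    bag.opApply(Apply!int(&_loop_body, &_ret));
    if (_ret == RETURN) return;
    *log ~= 0;                              // code after the loop
}
```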



The language can ALMOST do this today except for three small things:
1) No macros - but they're on the way!
2) Inability to preserve ref-ness of template arguments -- this really
needs to be solved one way or another regardless.
3) The necessary changes to the foreach code gen -- this is
straightforward.


Attached is a proof of concept demo.  I've manually inlined the yield() code to work around 1), and made the loop body use a non-ref type to work around 2).

So what do you think?  The biggest problems I see are
1) the code breakage, but D2.0 is all about breaking code to make things
better!  Furthermore the signatures of the opApplys are different so the
compiler could very well continue to generate code the old way for any
opApply written in the old style.  So actually very little code has to
break, if any.
2) (maybe the bigger of the two) Walter has never acknowledged that he
sees anything wrong with making users pass around a magic int in their
opApplys.


--bb


January 05, 2008
A slightly more streamlined version of the demo.

* The Apply._call method was unnecessary baggage; yield() can just call dg._loop_body directly.

* Resetting *_ret to 0 on every iteration was unnecessary.

* Allowing for _ret to be a null pointer was unnecessary.  That was just intended to make it easier for users to call opApply directly.  Realistically, though, there's no reason for users to ever do that, and if they really, really want to, they still can; they just have to supply that int pointer.

(Note: This code is free for anyone to use for whatever purpose they like)
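This is not the attached demo, only a D sketch of what the three changes amount to: `Apply` loses `_call` and the null default, yield calls `_loop_body` directly, and the body only writes the int when it wants to stop (opApply returns as soon as it is non-zero, so it never needs resetting). `Bag` and `sumUntilNegative` are hypothetical; the direct call shows the caller supplying the int pointer.

```d
struct Apply(Args...)
{
    alias LoopBody = void delegate(Args);
    LoopBody _loop_body;
    int* _ret;                      // must point at a real int now
}

struct Bag
{
    int[] elements;

    void opApply(Apply!int dg)
    {
        foreach (x; elements)
        {
            dg._loop_body(x);       // yield(dg, x), inlined
            if (*dg._ret) return;   // no null check any more
        }
    }
}

// A direct call is still possible; the caller just supplies the int.
int sumUntilNegative(int[] input)
{
    int total;
    int stop = 0;
    void loopBody(int x)
    {
        if (x < 0) { stop = 1; return; }   // only written on the way out
        total += x;
    }
    auto bag = Bag(input);
    bag.opApply(Apply!int(&loopBody, &stop));
    return total;
}
```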