Jump to page: 1 2
Thread overview
Closures Part II - local variables outliving function activations
Nov 07, 2001
la7y6nvo
Nov 07, 2001
Axel Kittenberger
Nov 08, 2001
la7y6nvo
Nov 08, 2001
la7y6nvo
Nov 08, 2001
la7y6nvo
Nov 09, 2001
a
Nov 09, 2001
la7y6nvo
Nov 08, 2001
Sean L. Palmer
Nov 09, 2001
la7y6nvo
November 07, 2001
To repeat myself from an earlier posting -

    People who are used to stack-based local variable frames may be
    surprised by the idea that local variables might outlive the function
    activation.  Rather than get into that in too much detail [in that
    posting] I think a follow-up post is a better idea.  Expect that
    follow-up post shortly.

This posting is that follow-up.

There are two issues.  Issue one, is it useful for closures to have access to local variables?  Issue two, assuming they do, how should closures behave when they access local variables whose containing function activations have exited?



Issue one:  Is it useful for closures to have access to local variables?  (The access referred to is real shared access, not access by value or some sort of IN/OUT or value/result semantics.)

My response is, it is useful, and here's why.  First, it comes up in real use, not on every closure certainly, but often enough so that there needs to be some way of sharing local state.  Second, closures tend to be short, so using some other mechanism that adds code would make a substantial difference.  Third, natural expression - using local variables inside closures looks natural and also produces expectable results, much the same way as the functions (methods) inside a class affect the instance variables of the object that's an instance of the class.



Issue two:  How should closures behave when they access local variables whose containing function activation has exited?

There are two main possibilities:  one, same as taking the address of something on the stack (which is to say, a runtime error in the best case, and random memory accesses in the more typical case);  two, if these variables are allocated so that they can outlive the function activation (and are collected by gc), the behavior is just the same as the case where the function activation has not exited.

People who program in C or C++ will probably expect (one).  In spite
of this I claim (two) is better, for the following reasons.

First, D is moving in the direction of more benign semantics, for example garbage collection and array bounds checking.  So providing behavior (two) is consistent with where D is going.

Second, people who are expecting (two) and get (one) will have to deal
with what is probably a nasty bug.  On the other hand, people who are
expecting (one) and get (two) will be pleasantly surprised that the
"anomalous" condition is handled in a way that works.  It's nicer
to be pleasantly surprised than unpleasantly surprised.

Third, the rule for how closures work is now very simple - they work the same whether the surrounding function activation has exited or not.  Simplifying the programming model means simplifying programming.


Editorial comment:
==================

The situation with long-lasting local variable lifetimes is similar to the situation with garbage collection (and indeed they are related). At first it seems a little unnatural, because it's not what one is used to;  but after having it, it's hard to imagine living without.

(End of editorial comment.)
November 07, 2001
> Issue two:  How should closures behave when they access local variables whose containing function activation has exited?

I don't understand a bit what you're talking all the time about, how can a closure outlife it's mother function? Is it an own thread?

- Axel
November 07, 2001
<la7y6nvo@shamko.com> wrote in message news:s7c8zdiubo6.fsf@michael.shamko.com...

> Issue two:  How should closures behave when they access local variables whose containing function activation has exited?
>
> There are two main possibilities:  one, same as taking the address of something on the stack (which is to say, a runtime error in the best case, and random memory accesses in the more typical case);  two, if these variables are allocated so that they can outlive the function activation (and are collected by gc), the behavior is just the same as the case where the function activation has not exited.

I wonder how can this be made technically? Since locals are on stack, returning from the function makes them invalid. Any other mechanism for storing them (like, say, a separate RTL-maintained stack) would be waay slower.




November 08, 2001
Pavel \"EvilOne\" Minayev wrote:

> <la7y6nvo@shamko.com> wrote in message news:s7c8zdiubo6.fsf@michael.shamko.com...
>
> > Issue two:  How should closures behave when they access local variables whose containing function activation has exited?
> >
> > There are two main possibilities:  one, same as taking the address of something on the stack (which is to say, a runtime error in the best case, and random memory accesses in the more typical case);  two, if these variables are allocated so that they can outlive the function activation (and are collected by gc), the behavior is just the same as the case where the function activation has not exited.
>
> I wonder how can this be made technically? Since locals are on stack, returning from the function makes them invalid. Any other mechanism for storing them (like, say, a separate RTL-maintained stack) would be waay slower.

Ok, I'll take a stab at it, but in a decidedly non-thread-safe and non-OO way...

char *utoa( unsigned int i )
{
   // An unsigned int can be 128 bits or more on some systems.  Allow for
it, and for the trailing null
  static char buf[1 + sizeof(unsigned int) * 3];
  char *head, *tail, temp;

  tail = head = &buf[0];

  // Get the digits in low-to-high order
  while (i > 0)
  {
    *tail++ = '0' + (i%10);
    i /= 10;
  }
  // Append a null and point to the last digit.
  *tail-- = '\0';
 // Place the digits in the right order.
  while (head < tail)
  {
    temp = *head;
    *head++ = *tail;
    *tail-- = temp;
  }
  return &buf[0];
}


Yes, it is almost like one of our favorite functions, itoa(), doing what it should.  But the static buffer it allocates cannot ever be free()ed or GC'ed until it can be guaranteed that the function will not be called again.

How would/could this situation be better handled via closures, instead of via "static"?  Should local variables ever outlast their scope without an explicit declaration to permit it (and guarantee it)?

Of course, in D we could always simply change the declaration of buf[] to something like this:

   char *buf;
   buf = new char[(sizeof(unsigned int) * 3];

And then simply let the GC handle the free() for us in the context of the caller, right?  Or am I missing something here?


-BobC


November 08, 2001
A closure can outlive its mother function in the same way that
the address of a local variable can outlive the stack frame of
the mother function - by being stored in a variable that
has a longer lifetime than the mother function stack frame.
And in fact the mechanism for dealing with local variables used
by closures could also be used for any local variable that had
its address taken, so that no dangling-pointer-to-stack-variables
could occur.  I think it would be nice if the compiler would
support an option like that, but that's a whole other digression.

Axel Kittenberger <axel@dtone.org> writes:

> > Issue two:  How should closures behave when they access local variables whose containing function activation has exited?
> 
> I don't understand a bit what you're talking all the time about, how can a closure outlife it's mother function? Is it an own thread?
> 
> - Axel
November 08, 2001
The storage for such local variables would be dynamically allocated using the regular storage allocator.  Yes, it would be slower than just incrementing a stack pointer, but not a lot slower than just a regular function call - storage allocators are pretty good at running quickly these days.

"Pavel \"EvilOne\" Minayev" <evilone@omen.ru> writes:

> <la7y6nvo@shamko.com> wrote in message news:s7c8zdiubo6.fsf@michael.shamko.com...
> 
> > Issue two:  How should closures behave when they access local variables whose containing function activation has exited?
> >
> > There are two main possibilities:  one, same as taking the address of something on the stack (which is to say, a runtime error in the best case, and random memory accesses in the more typical case);  two, if these variables are allocated so that they can outlive the function activation (and are collected by gc), the behavior is just the same as the case where the function activation has not exited.
> 
> I wonder how can this be made technically? Since locals are on stack, returning from the function makes them invalid. Any other mechanism for storing them (like, say, a separate RTL-maintained stack) would be waay slower.
November 08, 2001
<la7y6nvo@shamko.com> wrote in message news:s7cvggmrkt1.fsf@michael.shamko.com...

> The storage for such local variables would be dynamically allocated using the regular storage allocator.  Yes, it would be slower than just incrementing a stack pointer, but not a lot slower than just a regular function call - storage allocators are pretty good at running quickly these days.

Everything is not that simple though. Since functions are re-entrant,
you can't just allocate locals on heap - you have to manage a stack
for them yourself (which is probably slower). This is another source
of problems: suppose we have function A called 3 times, so now there's
three copies of its locals on the stack. The last time we called it,
it returned, directly or indirectly, a closure referencing one of
its locals, so we can't clean the stack, at least not completely. SO
WE HAVE ALL THE LOCALS GENERATED BY PREVIOUS FUNCTION CALLS HANGING
THERE IN THE MEMORY WASTING SPACE - without knowing when to release
them (since closure has outlived the function and can be activated
at any point of the program)! Now what if the function is recursive
and calls itself a thousand times?

If all locals are preserved, not only those that are referenced from closures, as you suggest, than this problem will arise with each and every function. To make it simpler - since every function has its own copy of locals, and since it can possibly return a pointer to one of locals, and since locals should be able to outlive the function, we cannot clear the stack when leaving the function at all!

...am I missing something?


November 08, 2001
Only the heap-allocated locals that are referenced by a closure
that is still accessible need be kept.  Any others would be
cleaned up just like any other unreachable object.  Also any
local variables _not_ referenced by closures can be allocated on
the stack and pruned in the usual way.  No stack frames need
be kept.

Does this answer your question?

"Pavel \"EvilOne\" Minayev" <evilone@omen.ru> writes:

> <la7y6nvo@shamko.com> wrote in message news:s7cvggmrkt1.fsf@michael.shamko.com...
> 
> > The storage for such local variables would be dynamically allocated using the regular storage allocator.  Yes, it would be slower than just incrementing a stack pointer, but not a lot slower than just a regular function call - storage allocators are pretty good at running quickly these days.
> 
> Everything is not that simple though. Since functions are re-entrant,
> you can't just allocate locals on heap - you have to manage a stack
> for them yourself (which is probably slower). This is another source
> of problems: suppose we have function A called 3 times, so now there's
> three copies of its locals on the stack. The last time we called it,
> it returned, directly or indirectly, a closure referencing one of
> its locals, so we can't clean the stack, at least not completely. SO
> WE HAVE ALL THE LOCALS GENERATED BY PREVIOUS FUNCTION CALLS HANGING
> THERE IN THE MEMORY WASTING SPACE - without knowing when to release
> them (since closure has outlived the function and can be activated
> at any point of the program)! Now what if the function is recursive
> and calls itself a thousand times?
> 
> If all locals are preserved, not only those that are referenced from closures, as you suggest, than this problem will arise with each and every function. To make it simpler - since every function has its own copy of locals, and since it can possibly return a pointer to one of locals, and since locals should be able to outlive the function, we cannot clear the stack when leaving the function at all!
> 
> ...am I missing something?
November 08, 2001
<la7y6nvo@shamko.com> wrote in message news:s7csnbpsmx5.fsf@michael.shamko.com...

> Only the heap-allocated locals that are referenced by a closure
> that is still accessible need be kept.  Any others would be
> cleaned up just like any other unreachable object.  Also any
> local variables _not_ referenced by closures can be allocated on
> the stack and pruned in the usual way.  No stack frames need
> be kept.

Does this mean that all locals that are referenced by a closure
automatically become static (since their only instance reside on heap)?
Or will there be a separate stack for them (again, on heap)?

BTW another problem: compiler can't always be sure that closure gets returned from the function on a specific call - if, for example, it's in conditional statement. If closure references, say, a local string, and function is called several times, but actually returns closure only once, all instances of that local string are preserved till the end of the program...



November 08, 2001
> Issue one:  Is it useful for closures to have access to local variables?  (The access referred to is real shared access, not access by value or some sort of IN/OUT or value/result semantics.)
>
> My response is, it is useful, and here's why.  First, it comes up in real use, not on every closure certainly, but often enough so that there needs to be some way of sharing local state.  Second, closures tend to be short, so using some other mechanism that adds code would make a substantial difference.  Third, natural expression - using local variables inside closures looks natural and also produces expectable results, much the same way as the functions (methods) inside a class affect the instance variables of the object that's an instance of the class.

I agree, they should be able to access local variables of the enclosing function.  I don't really think a closure should be able to declare any variables which are visible to the enclosing function. (much like local functions)

> Issue two:  How should closures behave when they access local variables whose containing function activation has exited?
>
> There are two main possibilities:  one, same as taking the address of something on the stack (which is to say, a runtime error in the best case, and random memory accesses in the more typical case);  two, if these variables are allocated so that they can outlive the function activation (and are collected by gc), the behavior is just the same as the case where the function activation has not exited.
>
> People who program in C or C++ will probably expect (one).  In spite
> of this I claim (two) is better, for the following reasons.
>
> First, D is moving in the direction of more benign semantics, for example garbage collection and array bounds checking.  So providing behavior (two) is consistent with where D is going.
>
> Second, people who are expecting (two) and get (one) will have to deal
> with what is probably a nasty bug.  On the other hand, people who are
> expecting (one) and get (two) will be pleasantly surprised that the
> "anomalous" condition is handled in a way that works.  It's nicer
> to be pleasantly surprised than unpleasantly surprised.
>
> Third, the rule for how closures work is now very simple - they work the same whether the surrounding function activation has exited or not.  Simplifying the programming model means simplifying programming.

I almost disagreed with you, but after rethinking I think you're right. Closures would be much less powerful without being able to outlive the enclosing function.  If only there were some way of determining whether a closure was safe to let use the normal stack frame, or if it needed one allocated from the heap!  I can't think of a good way without somehow having to declare closure-type parameters to a function as "storable" in much the way that "const" works in C++.  Perhaps the compiler can keep a record for each function whether its closure parameters are being treated in an unsafe storable manner or not, and generate the heap stack frame only in those cases.  "storable" use being storing the closure argument to any location external to the function or sending the closure to any function which takes a storable closure as parameter.  Perhaps "callback" would be a better word for this.  Anyway if the symbol table is descriptive enough this could be done in most cases with no heap overhead so long as the closure is used carefully to satisfy these criteria.

Anyway I don't think a small heap allocation before iterating through 5000 objects will slow an application appreciably.  It is a tradeoff I'd gladly make in most situations where closures are useful.  The size is likely small, the lifetime short and, unless stored, predictable.  The compiler could very well use a reference counted pointer to maintain this memory instead of GC if it wanted.

Question:  How would a closure declared in a method of an object be handled? I assume a "this" pointer to the object would be maintained in the closure's stack frame.




« First   ‹ Prev
1 2