Jump to page: 1 2
Thread overview
Closures in D
Nov 07, 2001
la7y6nvo
Nov 07, 2001
Axel Kittenberger
Nov 07, 2001
la7y6nvo
Nov 09, 2001
Roberto Mariottini
Nov 10, 2001
la7y6nvo
Nov 12, 2001
Roberto Mariottini
Nov 12, 2001
la7y6nvo
Nov 10, 2001
Axel Kittenberger
Nov 08, 2001
Sean L. Palmer
Nov 07, 2001
Russell Borogove
Nov 08, 2001
la7y6nvo
Nov 10, 2001
Patrick Down
November 07, 2001
I'd like to talk about the idea of having closures in D.  For those who aren't used to the term closure, a closure is something like a pointer to function, and usually is created by a syntactic construct that appears in source code in the form of an anonymous "code literal".


Closures in Smalltalk
=====================

I'll start by talking about closures in Smalltalk, which calls them "Blocks".  (I'm assuming people can read Smalltalk syntax, which may be a bad assumption, but that's what I'm assuming.)

A block in Smalltalk is code that appears between square brackets. An example would be

    symbolTable at: identifier ifAbsent: [ defaultSymbol ].

The '[ defaultSymbol ]' is a block that is executed only if the 'identifier' key is not present in 'symbolTable'.

This example isn't very motivating, because defaultSymbol is just a value.  Consider this however

    symbolTable at: identifier ifAbsent: [ UndeclaredSymbol new ].

The '[ UndeclaredSymbol new ]' block only gets executed if the identifier is not present.  This could be useful because making a new UndeclaredSymbol might involve lots of computation.

The examples above return values, but blocks can also have side effects.  For example,

    symbolTable at: identifier ifAbsent: [
	undeclaredIdentifiers add: identifier.
	UndeclaredSymbol new
    ].

Notice that the last expression evaluated in a block is what is returned for the value of the block.  (Please, no flames about bracket placement style - I know some people prefer the open bracket to go on a line by itself.)

Blocks can also take arguments, for example -

    symbolTable keysAndValuesDo: [  :identifier  :entry  |
        output nextPut:
	    '(', identifier printString, '->', entry printString, ')'.
    ].

which causes a printable form of a symbol table to be added to the 'output' object.

Blocks get much of their power from being able to access local context.  In an example above, the block contains references to 'identifier' and 'undeclaredIdentifiers', which might be instance variables or perhaps local variables.  This ability puts blocks on an equal footing with the more usual if/then/else and looping control structures.  (In fact these "built-in" control structures are themselves implemented using blocks in Smalltalk, but that point is incidental - the main thing is that having access to local context gives an important kind of expressive power.)

Incidentally, blocks can declare variables local to the block. (I'm not going to show an example for this.)

Blocks also have a capability that might be surprising to people used to C, namely the ability to return (a value) from the method that contains the block.  So for example

    referencesTo: identifier
        "Answer a collection of all references to identifier"

	| entry |  "this is a local variable"

	entry := self at: identifier ifAbsent: [ ^ Set new ].
	^ programSource referencesToEntry: entry

The ^ in Smalltalk is like 'return' in C.  The block '[ ^ Set new ]' will, if executed, immediately return a value from the 'referencesTo:' method, and the following statement will not be executed at all.

Here again, the ability to return a value from the enclosing function is one that adds expressive power, and puts blocks on an equal footing with built-in control structures.

Blocks in Smalltalk are fairly representative of closures that are available in other programming languages, taking into account that Smalltalk is more of an imperative language than a functional language (in the sense of "functional programming").


Closures in D
=============

For the sake of discussion let's assume that closures will be included in D.  What are the implications of this?  Clearly there are syntax questions (about which more later), and there are questions about what capabilities should be supported.  For the moment let's use

    [ ParameterList ]{ FunctionBody }

as the syntax for a closure in D, without being too particular about types or return values, and look at semantic issues.

First, the body of a closure is the same as that for a function body, which means closures can declare variables local to the closure activation.

Second, the code in the body of the closure should have access to the context of the surrounding code, notably instance variables and local variables.  Instance variables aren't a problem (just have the closure keep a pointer to the object in question), but local variables are a little trickier, since a closure can outlive the stack frame of the method that created it.  This could be handled by taking any variables used in closure bodies and allocating them in a heap object and having the closure hang on to a pointer to that.  Yes, there is a cost associated with doing that allocation, but only for the cases where there are closures that reference local variables.

[Note added during editing - People who are used to stack-based local
variable frames may be surprised by the idea that local variables
might outlive the function activation.  Rather than get into that
in too much detail here I think a follow-up post is a better idea.
Expect that shortly.]

Third, recursion - as with ordinary functions, closures need a stack frame for their local variables so recursion can occur. A closure activation is not the same as a closure.

Fourth, nested closures.  Closure bodies can contain other closure bodies.  This brings with it all the usual machinery needed for nested functions, in particular accessing local variables of containing closures (when that happens).  As with functions, closures that have local variables accessed by contained closures would allocate those local variables in an object in the heap.

Fifth, interchangeability with function pointers.  Getting a closure to masquerade as a function pointer is a little tricky (because a function pointer is only one pointer, whereas a closure is two pointers - a pointer to code and a pointer to context that the code runs in).  Maybe it's useful to do that, I don't know.  But going the other way is easy - turning a function pointer into a closure just means adding a null pointer for execution context, and perhaps a dummy closure body that simply invokes through the function pointer.  The more closures and function pointers are interchangeable (as far as the function getting them as arguments is concerned), the more the language follows the "Law of Least Astonishment".

Incidentally, closures being interchangeable with function pointers means closures should have all the same parameter passing modes. In particular, if functions have OUT and INOUT parameters, then closures should have them.

Sixth, invoking a closure.  How about the same syntax as that for invoking through a function pointer?

    (*closure_parameter)( argument, argument, argument, ... )

Seventh, closure returns.  It's important to distinguish between returning from a closure and returning from a function body containing a closure.  Blocks in Smalltalk use the usual return syntax (^) to indicate return from the containing method, and use the last expression evaluated as the return value for the block.  Should there be a way to return a value from a closure out of the middle of the closure body?  Should 'return' mean return from the closure or return from the function?  I can see arguments on both sides.

For nested closures, you can imagine an "intermediate" return that returns from a containing closure but not the outermost function body. Blocks in Smalltalk don't include this capability (innermost return or outermost return are the only choices);  I have no experience with it, so I don't know how useful it is.  But if D is to include multi-level breaks, it probably should also include multi-level returns.

Eighth, other control structures.  Consider the following code fragment -

    while(  p < p_limit  ){
	candidate = *p;
	candidate.enumerate_parts(
	    [ part ]{
	        if(  part.passesTest()  )  break;
	    }
	);
	p ++;
    }

Notice what's happening.  The closure is testing a value, and in the event that the test succeeds, the 'break' breaks out of the while in the function body outside the closure - a non-local transfer of control.  There are similar examples for 'continue' and (dare I say it) 'goto'.  These control structures aren't that different from uplevel returns in terms of what mechanisms are needed.

Notice that any non-local control transfer - either a return out of or a break/continue/goto into - a function body containing a closure can fail at runtime if that function activation has already returned. This is true for intermediate (closure) function bodies as well as outermost function bodies.  Presumably such a case would generate some kind of exception.

Despite the difficulties of these exceptional conditions (and the reactions I'm sure some people will have at the idea of using 'goto' out of a closure), I would argue that if the language is going to be orthogonal then as much as possible closures should look like the regular built-in control structures and non-local control transfers should be supported.


Syntax for Closures
===================

For invoking a closure - how about using the pointer to function notation

    (*closure)( argument, argument, argument, ... )

Note:  personally I see nothing wrong with using the unadorned function call notation for pointers-to-function

    function_pointer( argument, argument, argument, ... )

and would use that for closures if it were allowed for function pointers, but I don't want to start an argument on that topic.

For declaring a closure variable - this should be similar (not identical) to declaring a function pointer.  How will function pointers be declared in D?

For writing a closure body - well I like the syntax I've been using, assuming it doesn't cause too many problems:

    [  ParameterList  ]  {  FunctionBody  }

The types of the parameters and the return value need to go somewhere, which is sort of a pain.  The parameter types might go in the parameter list, much like function definitions, e.g.,

    [ int a, int b ]  {  a < b ? a : b  }

but for the return type, I'm at more of a loss.  Perhaps other people will have some suggestions.


Final Remarks
=============

I think closures should be considered for D.  I've tried to lay out the basic issues and implications of D having closures.  I can say based on experience that closures are extremely useful - given a choice between closures and exception handling, I think closures are more generally useful.  Also, I think having closures in some form is a prerequisite to writing a good set of Collection classes, which is a large part of the motivation for having some form of generics (or templates).
November 07, 2001
Quite a lot of text, and honestly it only confused me.

What's now the advantage of a closure? How is it implemented? Especially in featureless systems where C originally was developed for. In UserSpace resources are easy to get/implement, but what in kernel space? In embedded OS-less applications? This places are the real motherhood of C.

Maybe more for a presentational hint, try to get faster to the point, and explain it good and in a clean easy to understand way. This text sounds just so universitary from professors who's thing they most love in life is to hear themself's talking.

- Axel
November 07, 2001
<la7y6nvo@shamko.com> wrote in message news:s7cbsieubtt.fsf@michael.shamko.com...

> form is a prerequisite to writing a good set of Collection classes, which is a large part of the motivation for having some form of generics (or templates).

First, I don't understand the need of closures for Collections (not that I understand them - closures - very well at all =)). But since D has dynamic and associative arrays, I think most requirements for containers are covered.



November 07, 2001
<la7y6nvo@shamko.com> wrote in message news:s7cbsieubtt.fsf@michael.shamko.com...

> I'd like to talk about the idea of having closures in D.

In fact, after reading this one more time, I really can't understand HOW can this be implemented in a non-interpreted language like D. Nor do I see any advantages of it...

Maybe I'm wrong. I'm not sure that I understood anything
right, besides, I'm not familiar with Smalltalk, so could
you please give a shorter and more self-explanatory example? =)



November 07, 2001
Pavel \"EvilOne\" Minayev wrote:
> 
> <la7y6nvo@shamko.com> wrote in message news:s7cbsieubtt.fsf@michael.shamko.com...
> 
> > I'd like to talk about the idea of having closures in D.
> 
> In fact, after reading this one more time, I really can't understand HOW can this be implemented in a non-interpreted language like D. Nor do I see any advantages of it...

Closures can be used as a form of generic programming, but in a slightly different way than C++ templates. Perl has them and makes them very convenient to use; Perl has the advantage that the run-time environment is guaranteed to include a Perl compiler[1].

Here's a trivial example of a closure in Perl:

    for my $color (qw[red yellow orange green blue purple violet])
    {
        *$color = sub { qq<<FONT COLOR="\U$color\E">@_</FONT>> };
    }

Don't worry about the hairballs, that's just Perl. :)

This automatically generates functions/subroutines red(), yellow(), orange()... etc., which generate some HTML to render strings in the specified color.

In C/C++, you could approach this with #defined macros (except for
the trick where function red() generates the color tag "RED"), but
closures offer quite a bit more power than this.

The above example in and of itself could be evaluated completely at compile time, but you could hypothetically replace the list of color names above with a list of arbitrary dynamic text strings. At this point, you _have_ to have dynamic binding and a compiler in the runtime. My guess is that this need is incompatible with the desire to have D be easy to implement.

-RB

[1] Inasmuch as perl is a compiled language.
November 07, 2001
"Russell Borogove" <kaleja@estarcion.com> wrote in message news:3BE98692.7994A03C@estarcion.com...

> The above example in and of itself could be evaluated completely at compile time, but you could hypothetically replace the list of color names above with a list of arbitrary dynamic text strings. At this point, you _have_ to have dynamic binding and a compiler in the runtime. My guess is that this need is incompatible with the desire to have D be easy to implement.

Thanks for explaining, now everything seems much clearer.
And I don't think that it's worth implementing. As you've
said it requires dynamic binding, which would probably bloat
the code... no thanks. Just not the "D way", I suppose -
or at least it feels like that.


November 07, 2001
Here's an example of where a closure might be useful:

    quicksort(
	string_pointer_array,
	number_of_elements,
	[ char * a, char * b ]{  strcmp( a, b )  }
    );

So we get to write the compare function right here rather than having to define a separate function.

Closures are also useful for dealing with collections:

    some_collection.do(
	[ ElementType element ]{
	    if(  element.satisfies_some_condition()  )  return  element;
	}
    );
    return  NULL;

This search pattern occurs frequently, so we might very well have a function that does it already, in which case we could write

    ElementType e =
	some_collection.first_element_that(
	    [ ElementType e ]{  e.satisfies_some_condition()  }
	);

Closures can also be used to create control structures that haven't
(yet) gotten into the language:

    foreach_element_pair( array_a, array_b,
	[ ElementType a_element, ElementType b_element ]{
	    /* do something with a_element and b_element */
	}
    );

Do these examples help?

Implementation - closures don't need interpretation or any sort of runtime compilation;  they are compiled in the same way that functions are.  The non-local control transfers need some sort of mechanism similar to an exception handling mechanism - I expect Walter can provide some details here.


To respond to the individual points -

> Quite a lot of text, and honestly it only confused me.

Sorry for the confusion.  There was a lot of ground to cover, and
I thought breaking it into small pieces would help the presentation.


> What's now the advantage of a closure? How is it implemented? Especially in featureless systems where C originally was developed for. In UserSpace resources are easy to get/implement, but what in kernel space? In embedded OS-less applications? This places are the real motherhood of C.

Compiling a closure is much like compiling a function.  The cost of calling a closure is about the same as calling through a pointer to function;  the major difference is that a closure needs an extra pointer to establish its execution context.  Functional closures like the quicksort example above have essentially no more overhead than calling through pointer to function.  Closures that need access to local context require one heap allocation on function entry (regardless of how many closures in the function there are).

I tried to talk about some advantages up above.


> Maybe more for a presentational hint, try to get faster to the point, and explain it good and in a clean easy to understand way. This text sounds just so universitary from professors who's thing they most love in life is to hear themself's talking.

I appreciate the advice.
November 08, 2001
The Perl example is generating functions dynamically at runtime.

Closures are more static, similar to nested functions.  In fact the main difference between a nested function and a closure is that closures are written in line at the point of use.

Imagine a C-like language with nested functions, eg

    int
    outer_function(){
	int
	inner_function(){
	    /* inner function body ... */
	}

	needs_pointer_to_function( inner_function );
    }

A closure is like a function except the code is written directly where it's needed:

    int
    outer_function(){

	needs_pointer_to_function( []{ /* inner function body ... */ } );
    }

The Perl example (as I understand it - I'm not a Perl expert) is
generating function names (and I guess function bodies) dynamically
and hooking those functions into the environment.

Closures don't do any dynamic generation.  The code is compiled just like a function body, and the value of the closure is stored in an ordinary variable, like pointer-to-function, except closures have two pointers - one for the code body, one for the execution environment - rather than just one.



Russell Borogove <kaleja@estarcion.com> writes:

> Pavel \"EvilOne\" Minayev wrote:
> > 
> > <la7y6nvo@shamko.com> wrote in message news:s7cbsieubtt.fsf@michael.shamko.com...
> > 
> > > I'd like to talk about the idea of having closures in D.
> > 
> > In fact, after reading this one more time, I really can't understand HOW can this be implemented in a non-interpreted language like D. Nor do I see any advantages of it...
> 
> Closures can be used as a form of generic programming, but in a slightly different way than C++ templates. Perl has them and makes them very convenient to use; Perl has the advantage that the run-time environment is guaranteed to include a Perl compiler[1].
> 
> Here's a trivial example of a closure in Perl:
> 
>     for my $color (qw[red yellow orange green blue purple violet])
>     {
>         *$color = sub { qq<<FONT COLOR="\U$color\E">@_</FONT>> };
>     }
> 
> Don't worry about the hairballs, that's just Perl. :)
> 
> This automatically generates functions/subroutines red(), yellow(), orange()... etc., which generate some HTML to render strings in the specified color.
> 
> In C/C++, you could approach this with #defined macros (except for
> the trick where function red() generates the color tag "RED"), but
> closures offer quite a bit more power than this.
> 
> The above example in and of itself could be evaluated completely at compile time, but you could hypothetically replace the list of color names above with a list of arbitrary dynamic text strings. At this point, you _have_ to have dynamic binding and a compiler in the runtime. My guess is that this need is incompatible with the desire to have D be easy to implement.
> 
> -RB
> 
> [1] Inasmuch as perl is a compiled language.
November 08, 2001
Dynamic and associative arrays barely scratch the surface of container types.

Consider a video game, which maintains its world database as an octree of objects of various types.  Would it not be nice to be able to write code something like this:?

octreeScene.ForEach( [ octreeNode node ] { if ( node.IsAlive() &&
camera.CanSee(node.boundSphere) ) node.Render(); } );

This is alot of expressive power... something that may help D out alot.  I really needs something to make up for lack of generics.

The alternative is a fairly clunky functor object.  This is almost a way to get the compiler to write functor objects for you... I have a feeling they'd be pretty useful for callbacks as well.

Sean

"Pavel "EvilOne" Minayev" <evilone@omen.ru> wrote in message news:9sb9jv$ik3$1@digitaldaemon.com...
> <la7y6nvo@shamko.com> wrote in message news:s7cbsieubtt.fsf@michael.shamko.com...
>
> > form is a prerequisite to writing a good set of Collection classes, which is a large part of the motivation for having some form of generics (or templates).
>
> First, I don't understand the need of closures for Collections (not that I understand them - closures - very well at all =)). But since D has dynamic and associative arrays, I think most requirements for containers are covered.



November 08, 2001
"Sean L. Palmer" <spalmer@iname.com> wrote in message news:9sdl9d$28id$1@digitaldaemon.com...
> Dynamic and associative arrays barely scratch the surface of container types.
>
> Consider a video game, which maintains its world database as an octree of objects of various types.  Would it not be nice to be able to write code something like this:?
>
> octreeScene.ForEach( [ octreeNode node ] { if ( node.IsAlive() &&
> camera.CanSee(node.boundSphere) ) node.Render(); } );
>
> This is alot of expressive power... something that may help D out alot.  I really needs something to make up for lack of generics.

for-each loops is something that I was going to offer and that, IMHO, must be in the language that supports dynamic and associative arrays. So, this example could be rewritten as (note, this is also a syntax proposal):

    for(node: octreeNode)
    {
        if (node.IsAlive() && camera.CanSee(node.boundSphere))
            node.Render();
    }




« First   ‹ Prev
1 2