Jump to page: 1 2
Thread overview
Explicit TCE
Oct 12, 2012
David Nadlinger
Oct 12, 2012
Dmitry Olshansky
Oct 12, 2012
Dmitry Olshansky
Oct 12, 2012
Dmitry Olshansky
Oct 12, 2012
bearophile
Oct 12, 2012
bearophile
October 12, 2012
I've read a few threads discussing tail call elimination, but they all wanted the spec to articulate specific circumstances where tail call elimination is required.  Has there been any thought to adding syntax to explicitly state tail call elimination?

D could use something like Newsqueak's become keyword. If you're not familial with Newsqueak, become is just like a return, except it replaces the stack frame with the function that it calls.  This was briefly mentioned years ago in this forum, but the become keyword was ignored:

    http://www.digitalmars.com/d/archives/digitalmars/D/Rob_Pike_s_Newsqueak_-_some_good_concepts_53511.html

I think D could do something like this:

    int fact(int n, int accum = 1) {
        if (n == 0) {
            return accum
        }
        // become means return, but guarantees TCE
        become fact(n - 1, accum * n)
    }

DMD should optimize this already, but explicitly stating become is a key to the compiler that the user wants this call to be eliminated.  Then, more interesting things can be implemented more simply, like a state machine:

    void stateA() {
        become stateB();
    }

    void stateB() {
        become stateC();
    }

    void stateC() {
        return;
    }

    void main() {
        become stateA();
    }

This would only take a single stack frame. If there were conditionals in there for branching, this could end up with a stack overflow because DMD does not support complicated TCE, only simple recursive TCE.

The become keyword would probably have to have these properties:

    * statement after become must only be a function call (can't do foo() + 3)
    * scope() needs to be handled appropriately for functions with become

There might be others. I'm not sure how D handles stack sizes, so this may be an issue as well if stack sizes are determined at runtime. A requirement could be that functions called with become need to have a static stack size, since this stack might never be collected (in the case of an infinite state machine).

Adding this feature wouldn't cost much, but it would add a ton of functional power.
October 12, 2012
On 12-10-2012 19:29, Tyler Jameson Little wrote:
> I've read a few threads discussing tail call elimination, but they all
> wanted the spec to articulate specific circumstances where tail call
> elimination is required.  Has there been any thought to adding syntax to
> explicitly state tail call elimination?
>
> D could use something like Newsqueak's become keyword. If you're not
> familial with Newsqueak, become is just like a return, except it
> replaces the stack frame with the function that it calls. This was
> briefly mentioned years ago in this forum, but the become keyword was
> ignored:
>
> http://www.digitalmars.com/d/archives/digitalmars/D/Rob_Pike_s_Newsqueak_-_some_good_concepts_53511.html
>
>
> I think D could do something like this:
>
>      int fact(int n, int accum = 1) {
>          if (n == 0) {
>              return accum
>          }
>          // become means return, but guarantees TCE
>          become fact(n - 1, accum * n)
>      }
>
> DMD should optimize this already, but explicitly stating become is a key
> to the compiler that the user wants this call to be eliminated.  Then,
> more interesting things can be implemented more simply, like a state
> machine:
>
>      void stateA() {
>          become stateB();
>      }
>
>      void stateB() {
>          become stateC();
>      }
>
>      void stateC() {
>          return;
>      }
>
>      void main() {
>          become stateA();
>      }
>
> This would only take a single stack frame. If there were conditionals in
> there for branching, this could end up with a stack overflow because DMD
> does not support complicated TCE, only simple recursive TCE.
>
> The become keyword would probably have to have these properties:
>
>      * statement after become must only be a function call (can't do
> foo() + 3)
>      * scope() needs to be handled appropriately for functions with become
>
> There might be others. I'm not sure how D handles stack sizes, so this
> may be an issue as well if stack sizes are determined at runtime. A
> requirement could be that functions called with become need to have a
> static stack size, since this stack might never be collected (in the
> case of an infinite state machine).
>
> Adding this feature wouldn't cost much, but it would add a ton of
> functional power.

I'm a big fan of explicit, guaranteed TCE.

However, the primary problem with this approach is a really mundane one: The major compiler back ends (GCC and LLVM) don't have any means of guaranteeing TCE...

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
October 12, 2012
On 12-Oct-12 21:29, Tyler Jameson Little wrote:
> I've read a few threads discussing tail call elimination, but they all
> wanted the spec to articulate specific circumstances where tail call
> elimination is required.  Has there been any thought to adding syntax to
> explicitly state tail call elimination?
>
> D could use something like Newsqueak's become keyword. If you're not
> familial with Newsqueak, become is just like a return, except it
> replaces the stack frame with the function that it calls. This was
> briefly mentioned years ago in this forum, but the become keyword was
> ignored:

I suspect it's so called switch-call or forced tail call. I proposed something to the exact effect of the below but to reuse the 'goto' keyword.

> http://www.digitalmars.com/d/archives/digitalmars/D/Rob_Pike_s_Newsqueak_-_some_good_concepts_53511.html
>
>
> I think D could do something like this:
>
>      int fact(int n, int accum = 1) {
>          if (n == 0) {
>              return accum
>          }
>          // become means return, but guarantees TCE
>          become fact(n - 1, accum * n)
>      }
>
> DMD should optimize this already, but explicitly stating become is a key
> to the compiler that the user wants this call to be eliminated.  Then,
> more interesting things can be implemented more simply, like a state
> machine:
>
>      void stateA() {
>          become stateB();
>      }
>
>      void stateB() {
>          become stateC();
>      }
>
>      void stateC() {
>          return;
>      }
>
>      void main() {
>          become stateA();
>      }
>
> This would only take a single stack frame. If there were conditionals in
> there for branching, this could end up with a stack overflow because DMD
> does not support complicated TCE, only simple recursive TCE.
>
> The become keyword would probably have to have these properties:
>
>      * statement after become must only be a function call (can't do
> foo() + 3)
>      * scope() needs to be handled appropriately for functions with become
>
> There might be others. I'm not sure how D handles stack sizes, so this
> may be an issue as well if stack sizes are determined at runtime. A
> requirement could be that functions called with become need to have a
> static stack size, since this stack might never be collected (in the
> case of an infinite state machine).
No idea what you are talking about.

>
> Adding this feature wouldn't cost much, but it would add a ton of
> functional power.

The use case in general is threaded code. In particular fast virtual machines.

-- 
Dmitry Olshansky
October 12, 2012
> I'm a big fan of explicit, guaranteed TCE.
>
> However, the primary problem with this approach is a really mundane one: The major compiler back ends (GCC and LLVM) don't have any means of guaranteeing TCE...

Ugh... I thought that might be a problem.

I don't know too much about GCC/LLVM, but I saw 'tailcallelim' for LLVM:

http://llvm.org/docs/Passes.html#tailcallelim

GCC seems to support it in 4.x:

arxiv.org/pdf/1109.4048

These look promising, so I wouldn't completely rule out the possibility of doing it in GCC/LLVM.  Perhaps someone more knowledgeable about GCC/LLVM could comment? I would really like to see D have this feature (then I can stop daydreaming about LISP).
October 12, 2012
Tyler Jameson Little:

> D could use something like Newsqueak's become keyword. If you're not familial with Newsqueak, become is just like a return, except it replaces the stack frame with the function that it calls.

Are you talking about CPS?
http://en.wikipedia.org/wiki/Continuation_passing_style


> DMD should optimize this already, but explicitly stating become is a key to the compiler that the user wants this call to be eliminated.  Then, more interesting things can be implemented more simply, like a state machine:
>
>     void stateA() {
>         become stateB();
>     }
>
>     void stateB() {
>         become stateC();
>     }
>
>     void stateC() {
>         return;
>     }
>
>     void main() {
>         become stateA();
>     }

Seems nice.


> I'm not sure how D handles stack sizes, so this may be an issue as well if stack sizes are determined at runtime.

D doesn't currently support C99 VLAs, but it supports alloca(), so stack frames are sized dynamically. But maybe this is not a big problem for CPS.

Bye,
bearophile
October 12, 2012
> No idea what you are talking about.

I'm not sure which part wasn't clear, so I'll try to explain myself. Please don't feel offended if I clarify things you already understand.

An optimizable tail call must simply be a function call. The current stack frame would be replaced with the new function, so anything more complex than a simple function call would require some stack from the preceding function to stick around in the new function, thus requiring the old stack to stick around.

For example, te following is not optimizable the old stack (the one with 3) needs to be maintained until foo() returns, which is not TCE.

return foo() * 3


Since the old stack won't be around anymore, that leaves us with in a sticky situation with regard to scope():

http://dlang.org/statement.html#ScopeGuardStatement

If the current stack is going to be replaced with data from another function call, the behavior of scope() is undefined. The scope that scope() was in has now been repurpose, but the scope is still kind of there. If scope() is allowed, they must be executed just before the tail call, otherwise it will be overwritten (or it has to stick around until the actual stack frame is cleared. Consider:

void a() {
  become b();
}

void b() {
  // when does this get called?
  scope(exit) writeln("exited");
  become a();
}

If we allow scope(), then the line should be written before the call to a(). If we don't, then this is a compile time error. I like disallowing it personally, because if the scope(exit) call frees some memory that is passed to a, the programmer may think that it will be called after a exits, which may not be the case.

void a(void* arr) {
  // do something with arr
  become b();
}

void b() {
  void* arr = malloc(sizeof(float) * 16);
  scope(exit) free(arr);
  become a(arr);
}

I just see this as being a problem for those who don't fully understand scoping and TCE.


My mention of overhead was just how complicated it would be to implement. The general algorithm is (for each become keyword):

* determine max stack size (consider all branches in all recursive contexts)
* allocate stack size for top-level function
* do normal TCE stuff (use existing stack for new call)

The stack size should be known at compile time for cases like the one above (a calls b, b calls a, infinitely) to avoid infinitely expanding stack. A situation like this is a memory optimization, so forcing guaranteed stack size puts an upper-bound on memory usage, which is the whole point of TCE. If the stack is allowed to grow, there is opportunity for stack overflow.



My use case for this is a simple compiler, but I'm sure this could be applied to other use cases as well.  I'd like to produce code for some BNF-style grammar where each LHS is a function. Thus, my state machine wouldn't be a huge, unnatural switch statement that reads in the current state, but a series of code branches that 'become' other states, like an actual state machine.

For example:

A := B | C | hello
B := bye | see ya
C := go away

void A() {
    char next = getNext();
    if (next == 'b' || next == 's') {
        become B();
    }
    if (next == 'g') {
        become C();
    }
    if (next == 'h') {
        // consume until hello is found, or throw exception
        // then put some token on the stack
    }
}

void B() {
    // consume until 'bye' or 'see ya'
}

void C() {
    // consume until 'go away'
}

This would minimize memory use and allow me to write code that more closely matches the grammar. There are plenty of other use cases, but DSLs would be very easy to implement with TCE.
October 12, 2012
On Friday, 12 October 2012 at 18:02:57 UTC, bearophile wrote:
> Tyler Jameson Little:
>
>> D could use something like Newsqueak's become keyword. If you're not familial with Newsqueak, become is just like a return, except it replaces the stack frame with the function that it calls.
>
> Are you talking about CPS?
> http://en.wikipedia.org/wiki/Continuation_passing_style

I don't think it would necessitate CPS, but that is a nice side effect. I'm thinking more of a recursive function call that may or may not return. For example, a process that shovels data between two network connections. If the data never stops, the function will never return. If there's some kind of a problem, then it could return with that error, and be restarted when that problem is fixed. All of this could happen with a series of function calls that use the same stack.

void handleIncommingData() {
   if (error) {
       // returns directly to manageLongRunningProcess
       return;
   }

   // do something useful

   become longRunningProcess();
}

void longRunningProcess() {
    become handleIncommingData();
}

void manageLongRunningProcess() {
    longRunningProcess();
    // there was a problem, so fix it
    ....

    // try again
    manageLongRunningProcess();
}

Exceptions are not needed, so these can be nothrow functions, and this implementation is simpler than some complex while loop, while having the same memory footprint.

CPS would make things like imitating javascript's setTimeout/setInterval possible. I don't think this is a major benefit for D because the parallelism/concurrency support is already pretty awesome.

The main benefit is for implementing things like lexical analyzers (or tokenizers, whatever), which don't really depend on previous states and can emit tokens. This allows for efficient representation of recursive problems, that call functions circularly (a -> b -> c -> a -> b ...), like a state machine.

I think it just allows an extra level of expressiveness without a backwards incompatible change to the language. True, you can express this same idea with trampolining, but that isn't as fun:

http://stackoverflow.com/a/489860/538551
http://en.wikipedia.org/wiki/Tail-recursive_function#Through_trampolining

There are still some problems that I think a LISP language would make more sense for, and for those problems, it would be great to express them in D with my other code.

>> DMD should optimize this already, but explicitly stating become is a key to the compiler that the user wants this call to be eliminated.  Then, more interesting things can be implemented more simply, like a state machine:
>>
>>    void stateA() {
>>        become stateB();
>>    }
>>
>>    void stateB() {
>>        become stateC();
>>    }
>>
>>    void stateC() {
>>        return;
>>    }
>>
>>    void main() {
>>        become stateA();
>>    }
>
> Seems nice.

I'm glad you think so =D

>> I'm not sure how D handles stack sizes, so this may be an issue as well if stack sizes are determined at runtime.
>
> D doesn't currently support C99 VLAs, but it supports alloca(), so stack frames are sized dynamically. But maybe this is not a big problem for CPS.

Well, dynamic stack frames aren't strictly a bad thing for CPS, it just removes the memory use guarantee. There's already a huge memory gain from using TCE, I just don't know debugging would be done if a function keeps adding to and passing a dynamic array.
October 12, 2012
On 12-Oct-12 22:17, Tyler Jameson Little wrote:
>> No idea what you are talking about.
>
> I'm not sure which part wasn't clear, so I'll try to explain myself.
> Please don't feel offended if I clarify things you already understand.
>


> An optimizable tail call must simply be a function call. The current
> stack frame would be replaced with the new function, so anything more
> complex than a simple function call would require some stack from the
> preceding function to stick around in the new function, thus requiring
> the old stack to stick around.
>
Yup.

> For example, te following is not optimizable the old stack (the one with
> 3) needs to be maintained until foo() returns, which is not TCE.
>
> return foo() * 3
>
>
> Since the old stack won't be around anymore, that leaves us with in a
> sticky situation with regard to scope():
>
> http://dlang.org/statement.html#ScopeGuardStatement
>
> If the current stack is going to be replaced with data from another
> function call, the behavior of scope() is undefined. The scope that
> scope() was in has now been repurpose, but the scope is still kind of
> there. If scope() is allowed, they must be executed just before the tail
> call, otherwise it will be overwritten (or it has to stick around until
> the actual stack frame is cleared.

 Consider:
>
> void a() {
>    become b();
> }
>
> void b() {
>    // when does this get called?
>    scope(exit) writeln("exited");
>    become a();
> }
>
> If we allow scope(), then the line should be written before the call to
> a(). If we don't, then this is a compile time error. I like disallowing
> it personally, because if the scope(exit) call frees some memory that is
> passed to a, the programmer may think that it will be called after a
> exits, which may not be the case.
>
I think it should be disallowed. That along with destructors.

> void a(void* arr) {
>    // do something with arr
>    become b();
> }
>
> void b() {
>    void* arr = malloc(sizeof(float) * 16);
>    scope(exit) free(arr);
>    become a(arr);
> }
>
> I just see this as being a problem for those who don't fully understand
> scoping and TCE.


>
> My mention of overhead was just how complicated it would be to
> implement. The general algorithm is (for each become keyword):
>
> * determine max stack size (consider all branches in all recursive
> contexts)
> * allocate stack size for top-level function
> * do normal TCE stuff (use existing stack for new call)

What's wrong with just allocating a new stack _in-place_  of the old?
In other words make 'become' synonym for 'reuse the current stack frame'.
Effectively you still stay in constant space that is maximum of all functions being called.

> The stack size should be known at compile time for cases like the one
> above (a calls b, b calls a, infinitely) to avoid infinitely expanding
> stack. A situation like this is a memory optimization, so forcing
> guaranteed stack size puts an upper-bound on memory usage, which is the
> whole point of TCE. If the stack is allowed to grow, there is
> opportunity for stack overflow.

Right.

> My use case for this is a simple compiler, but I'm sure this could be
> applied to other use cases as well.  I'd like to produce code for some
> BNF-style grammar where each LHS is a function. Thus, my state machine
> wouldn't be a huge, unnatural switch statement that reads in the current
> state, but a series of code branches that 'become' other states, like an
> actual state machine.

I see nice staff. My use case is optimizing virtual machine, the one inside std.regex primarily.

-- 
Dmitry Olshansky
October 12, 2012
>> My mention of overhead was just how complicated it would be to
>> implement. The general algorithm is (for each become keyword):
>>
>> * determine max stack size (consider all branches in all recursive
>> contexts)
>> * allocate stack size for top-level function
>> * do normal TCE stuff (use existing stack for new call)
>
> What's wrong with just allocating a new stack _in-place_  of the old?
> In other words make 'become' synonym for 'reuse the current stack frame'.
> Effectively you still stay in constant space that is maximum of all functions being called.

That would work too. If scope() is disallowed, it doesn't matter where the stack comes from. It's only slightly cheaper to reuse the current stack (CPU), but making a new one would be lighter on memory.

> I see nice staff. My use case is optimizing virtual machine, the one inside std.regex primarily.

Yeah, that is a great example! I've read some bug reports about std.regex using a ton of memory, especially with CTFE. Since regex is by definition a state machine, this would be a particularly elegant fit (granted, backreferences et al break that model, but it's still a nice metaphor).

The main problem I see is working with other compilers like GCC/LLVM. If this can be done on those compilers, I don't see any major hurdle to getting this implemented.
October 12, 2012
On 12-Oct-12 22:49, Tyler Jameson Little wrote:
>
> That would work too. If scope() is disallowed, it doesn't matter where
> the stack comes from. It's only slightly cheaper to reuse the current
> stack (CPU), but making a new one would be lighter on memory.
>
>> I see nice staff. My use case is optimizing virtual machine, the one
>> inside std.regex primarily.
>
> Yeah, that is a great example! I've read some bug reports about
> std.regex using a ton of memory, especially with CTFE.

Hey, that's dmd (compiler) using a ton of memory,  not std.regex :(
It actually flies with only a modest set of ram after CTFE (or rather 'if') succeeds that is :)

Since regex is by
> definition a state machine, this would be a particularly elegant fit
> (granted, backreferences et al break that model, but it's still a nice
> metaphor).

Yeah, without backreferences it's a state machine. Still it's NFA (non-deterministic) as no DFA would do if you want to get things like captures of sub matches etc. Either way the remarkable giant switch is present ;)

>
> The main problem I see is working with other compilers like GCC/LLVM. If
> this can be done on those compilers, I don't see any major hurdle to
> getting this implemented.

Perhaps the biggest one would be convincing GCC/LLVM devs to accept patches :)

-- 
Dmitry Olshansky
« First   ‹ Prev
1 2