View mode: basic / threaded / horizontal-split · Log in · Help
October 12, 2012
Explicit TCE
I've read a few threads discussing tail call elimination, but 
they all wanted the spec to articulate specific circumstances 
where tail call elimination is required.  Has there been any 
thought to adding syntax to explicitly state tail call 
elimination?

D could use something like Newsqueak's become keyword. If you're 
not familial with Newsqueak, become is just like a return, except 
it replaces the stack frame with the function that it calls.  
This was briefly mentioned years ago in this forum, but the 
become keyword was ignored:

    
http://www.digitalmars.com/d/archives/digitalmars/D/Rob_Pike_s_Newsqueak_-_some_good_concepts_53511.html

I think D could do something like this:

    int fact(int n, int accum = 1) {
        if (n == 0) {
            return accum
        }
        // become means return, but guarantees TCE
        become fact(n - 1, accum * n)
    }

DMD should optimize this already, but explicitly stating become 
is a key to the compiler that the user wants this call to be 
eliminated.  Then, more interesting things can be implemented 
more simply, like a state machine:

    void stateA() {
        become stateB();
    }

    void stateB() {
        become stateC();
    }

    void stateC() {
        return;
    }

    void main() {
        become stateA();
    }

This would only take a single stack frame. If there were 
conditionals in there for branching, this could end up with a 
stack overflow because DMD does not support complicated TCE, only 
simple recursive TCE.

The become keyword would probably have to have these properties:

    * statement after become must only be a function call (can't 
do foo() + 3)
    * scope() needs to be handled appropriately for functions 
with become

There might be others. I'm not sure how D handles stack sizes, so 
this may be an issue as well if stack sizes are determined at 
runtime. A requirement could be that functions called with become 
need to have a static stack size, since this stack might never be 
collected (in the case of an infinite state machine).

Adding this feature wouldn't cost much, but it would add a ton of 
functional power.
October 12, 2012
Re: Explicit TCE
On 12-10-2012 19:29, Tyler Jameson Little wrote:
> I've read a few threads discussing tail call elimination, but they all
> wanted the spec to articulate specific circumstances where tail call
> elimination is required.  Has there been any thought to adding syntax to
> explicitly state tail call elimination?
>
> D could use something like Newsqueak's become keyword. If you're not
> familial with Newsqueak, become is just like a return, except it
> replaces the stack frame with the function that it calls. This was
> briefly mentioned years ago in this forum, but the become keyword was
> ignored:
>
> http://www.digitalmars.com/d/archives/digitalmars/D/Rob_Pike_s_Newsqueak_-_some_good_concepts_53511.html
>
>
> I think D could do something like this:
>
>      int fact(int n, int accum = 1) {
>          if (n == 0) {
>              return accum
>          }
>          // become means return, but guarantees TCE
>          become fact(n - 1, accum * n)
>      }
>
> DMD should optimize this already, but explicitly stating become is a key
> to the compiler that the user wants this call to be eliminated.  Then,
> more interesting things can be implemented more simply, like a state
> machine:
>
>      void stateA() {
>          become stateB();
>      }
>
>      void stateB() {
>          become stateC();
>      }
>
>      void stateC() {
>          return;
>      }
>
>      void main() {
>          become stateA();
>      }
>
> This would only take a single stack frame. If there were conditionals in
> there for branching, this could end up with a stack overflow because DMD
> does not support complicated TCE, only simple recursive TCE.
>
> The become keyword would probably have to have these properties:
>
>      * statement after become must only be a function call (can't do
> foo() + 3)
>      * scope() needs to be handled appropriately for functions with become
>
> There might be others. I'm not sure how D handles stack sizes, so this
> may be an issue as well if stack sizes are determined at runtime. A
> requirement could be that functions called with become need to have a
> static stack size, since this stack might never be collected (in the
> case of an infinite state machine).
>
> Adding this feature wouldn't cost much, but it would add a ton of
> functional power.

I'm a big fan of explicit, guaranteed TCE.

However, the primary problem with this approach is a really mundane one: 
The major compiler back ends (GCC and LLVM) don't have any means of 
guaranteeing TCE...

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
October 12, 2012
Re: Explicit TCE
On 12-Oct-12 21:29, Tyler Jameson Little wrote:
> I've read a few threads discussing tail call elimination, but they all
> wanted the spec to articulate specific circumstances where tail call
> elimination is required.  Has there been any thought to adding syntax to
> explicitly state tail call elimination?
>
> D could use something like Newsqueak's become keyword. If you're not
> familial with Newsqueak, become is just like a return, except it
> replaces the stack frame with the function that it calls. This was
> briefly mentioned years ago in this forum, but the become keyword was
> ignored:

I suspect it's so called switch-call or forced tail call. I proposed 
something to the exact effect of the below but to reuse the 'goto' keyword.

> http://www.digitalmars.com/d/archives/digitalmars/D/Rob_Pike_s_Newsqueak_-_some_good_concepts_53511.html
>
>
> I think D could do something like this:
>
>      int fact(int n, int accum = 1) {
>          if (n == 0) {
>              return accum
>          }
>          // become means return, but guarantees TCE
>          become fact(n - 1, accum * n)
>      }
>
> DMD should optimize this already, but explicitly stating become is a key
> to the compiler that the user wants this call to be eliminated.  Then,
> more interesting things can be implemented more simply, like a state
> machine:
>
>      void stateA() {
>          become stateB();
>      }
>
>      void stateB() {
>          become stateC();
>      }
>
>      void stateC() {
>          return;
>      }
>
>      void main() {
>          become stateA();
>      }
>
> This would only take a single stack frame. If there were conditionals in
> there for branching, this could end up with a stack overflow because DMD
> does not support complicated TCE, only simple recursive TCE.
>
> The become keyword would probably have to have these properties:
>
>      * statement after become must only be a function call (can't do
> foo() + 3)
>      * scope() needs to be handled appropriately for functions with become
>
> There might be others. I'm not sure how D handles stack sizes, so this
> may be an issue as well if stack sizes are determined at runtime. A
> requirement could be that functions called with become need to have a
> static stack size, since this stack might never be collected (in the
> case of an infinite state machine).
No idea what you are talking about.

>
> Adding this feature wouldn't cost much, but it would add a ton of
> functional power.

The use case in general is threaded code. In particular fast virtual 
machines.

-- 
Dmitry Olshansky
October 12, 2012
Re: Explicit TCE
> I'm a big fan of explicit, guaranteed TCE.
>
> However, the primary problem with this approach is a really 
> mundane one: The major compiler back ends (GCC and LLVM) don't 
> have any means of guaranteeing TCE...

Ugh... I thought that might be a problem.

I don't know too much about GCC/LLVM, but I saw 'tailcallelim' 
for LLVM:

http://llvm.org/docs/Passes.html#tailcallelim

GCC seems to support it in 4.x:

arxiv.org/pdf/1109.4048

These look promising, so I wouldn't completely rule out the 
possibility of doing it in GCC/LLVM.  Perhaps someone more 
knowledgeable about GCC/LLVM could comment? I would really like 
to see D have this feature (then I can stop daydreaming about 
LISP).
October 12, 2012
Re: Explicit TCE
Tyler Jameson Little:

> D could use something like Newsqueak's become keyword. If 
> you're not familial with Newsqueak, become is just like a 
> return, except it replaces the stack frame with the function 
> that it calls.

Are you talking about CPS?
http://en.wikipedia.org/wiki/Continuation_passing_style


> DMD should optimize this already, but explicitly stating become 
> is a key to the compiler that the user wants this call to be 
> eliminated.  Then, more interesting things can be implemented 
> more simply, like a state machine:
>
>     void stateA() {
>         become stateB();
>     }
>
>     void stateB() {
>         become stateC();
>     }
>
>     void stateC() {
>         return;
>     }
>
>     void main() {
>         become stateA();
>     }

Seems nice.


> I'm not sure how D handles stack sizes, so this may be an issue 
> as well if stack sizes are determined at runtime.

D doesn't currently support C99 VLAs, but it supports alloca(), 
so stack frames are sized dynamically. But maybe this is not a 
big problem for CPS.

Bye,
bearophile
October 12, 2012
Re: Explicit TCE
> No idea what you are talking about.

I'm not sure which part wasn't clear, so I'll try to explain 
myself. Please don't feel offended if I clarify things you 
already understand.

An optimizable tail call must simply be a function call. The 
current stack frame would be replaced with the new function, so 
anything more complex than a simple function call would require 
some stack from the preceding function to stick around in the new 
function, thus requiring the old stack to stick around.

For example, te following is not optimizable the old stack (the 
one with 3) needs to be maintained until foo() returns, which is 
not TCE.

return foo() * 3


Since the old stack won't be around anymore, that leaves us with 
in a sticky situation with regard to scope():

http://dlang.org/statement.html#ScopeGuardStatement

If the current stack is going to be replaced with data from 
another function call, the behavior of scope() is undefined. The 
scope that scope() was in has now been repurpose, but the scope 
is still kind of there. If scope() is allowed, they must be 
executed just before the tail call, otherwise it will be 
overwritten (or it has to stick around until the actual stack 
frame is cleared. Consider:

void a() {
  become b();
}

void b() {
  // when does this get called?
  scope(exit) writeln("exited");
  become a();
}

If we allow scope(), then the line should be written before the 
call to a(). If we don't, then this is a compile time error. I 
like disallowing it personally, because if the scope(exit) call 
frees some memory that is passed to a, the programmer may think 
that it will be called after a exits, which may not be the case.

void a(void* arr) {
  // do something with arr
  become b();
}

void b() {
  void* arr = malloc(sizeof(float) * 16);
  scope(exit) free(arr);
  become a(arr);
}

I just see this as being a problem for those who don't fully 
understand scoping and TCE.


My mention of overhead was just how complicated it would be to 
implement. The general algorithm is (for each become keyword):

* determine max stack size (consider all branches in all 
recursive contexts)
* allocate stack size for top-level function
* do normal TCE stuff (use existing stack for new call)

The stack size should be known at compile time for cases like the 
one above (a calls b, b calls a, infinitely) to avoid infinitely 
expanding stack. A situation like this is a memory optimization, 
so forcing guaranteed stack size puts an upper-bound on memory 
usage, which is the whole point of TCE. If the stack is allowed 
to grow, there is opportunity for stack overflow.



My use case for this is a simple compiler, but I'm sure this 
could be applied to other use cases as well.  I'd like to produce 
code for some BNF-style grammar where each LHS is a function. 
Thus, my state machine wouldn't be a huge, unnatural switch 
statement that reads in the current state, but a series of code 
branches that 'become' other states, like an actual state machine.

For example:

A := B | C | hello
B := bye | see ya
C := go away

void A() {
    char next = getNext();
    if (next == 'b' || next == 's') {
        become B();
    }
    if (next == 'g') {
        become C();
    }
    if (next == 'h') {
        // consume until hello is found, or throw exception
        // then put some token on the stack
    }
}

void B() {
    // consume until 'bye' or 'see ya'
}

void C() {
    // consume until 'go away'
}

This would minimize memory use and allow me to write code that 
more closely matches the grammar. There are plenty of other use 
cases, but DSLs would be very easy to implement with TCE.
October 12, 2012
Re: Explicit TCE
On Friday, 12 October 2012 at 18:02:57 UTC, bearophile wrote:
> Tyler Jameson Little:
>
>> D could use something like Newsqueak's become keyword. If 
>> you're not familial with Newsqueak, become is just like a 
>> return, except it replaces the stack frame with the function 
>> that it calls.
>
> Are you talking about CPS?
> http://en.wikipedia.org/wiki/Continuation_passing_style

I don't think it would necessitate CPS, but that is a nice side 
effect. I'm thinking more of a recursive function call that may 
or may not return. For example, a process that shovels data 
between two network connections. If the data never stops, the 
function will never return. If there's some kind of a problem, 
then it could return with that error, and be restarted when that 
problem is fixed. All of this could happen with a series of 
function calls that use the same stack.

void handleIncommingData() {
   if (error) {
       // returns directly to manageLongRunningProcess
       return;
   }

   // do something useful

   become longRunningProcess();
}

void longRunningProcess() {
    become handleIncommingData();
}

void manageLongRunningProcess() {
    longRunningProcess();
    // there was a problem, so fix it
    ....

    // try again
    manageLongRunningProcess();
}

Exceptions are not needed, so these can be nothrow functions, and 
this implementation is simpler than some complex while loop, 
while having the same memory footprint.

CPS would make things like imitating javascript's 
setTimeout/setInterval possible. I don't think this is a major 
benefit for D because the parallelism/concurrency support is 
already pretty awesome.

The main benefit is for implementing things like lexical 
analyzers (or tokenizers, whatever), which don't really depend on 
previous states and can emit tokens. This allows for efficient 
representation of recursive problems, that call functions 
circularly (a -> b -> c -> a -> b ...), like a state machine.

I think it just allows an extra level of expressiveness without a 
backwards incompatible change to the language. True, you can 
express this same idea with trampolining, but that isn't as fun:

http://stackoverflow.com/a/489860/538551
http://en.wikipedia.org/wiki/Tail-recursive_function#Through_trampolining

There are still some problems that I think a LISP language would 
make more sense for, and for those problems, it would be great to 
express them in D with my other code.

>> DMD should optimize this already, but explicitly stating 
>> become is a key to the compiler that the user wants this call 
>> to be eliminated.  Then, more interesting things can be 
>> implemented more simply, like a state machine:
>>
>>    void stateA() {
>>        become stateB();
>>    }
>>
>>    void stateB() {
>>        become stateC();
>>    }
>>
>>    void stateC() {
>>        return;
>>    }
>>
>>    void main() {
>>        become stateA();
>>    }
>
> Seems nice.

I'm glad you think so =D

>> I'm not sure how D handles stack sizes, so this may be an 
>> issue as well if stack sizes are determined at runtime.
>
> D doesn't currently support C99 VLAs, but it supports alloca(), 
> so stack frames are sized dynamically. But maybe this is not a 
> big problem for CPS.

Well, dynamic stack frames aren't strictly a bad thing for CPS, 
it just removes the memory use guarantee. There's already a huge 
memory gain from using TCE, I just don't know debugging would be 
done if a function keeps adding to and passing a dynamic array.
October 12, 2012
Re: Explicit TCE
On 12-Oct-12 22:17, Tyler Jameson Little wrote:
>> No idea what you are talking about.
>
> I'm not sure which part wasn't clear, so I'll try to explain myself.
> Please don't feel offended if I clarify things you already understand.
>


> An optimizable tail call must simply be a function call. The current
> stack frame would be replaced with the new function, so anything more
> complex than a simple function call would require some stack from the
> preceding function to stick around in the new function, thus requiring
> the old stack to stick around.
>
Yup.

> For example, te following is not optimizable the old stack (the one with
> 3) needs to be maintained until foo() returns, which is not TCE.
>
> return foo() * 3
>
>
> Since the old stack won't be around anymore, that leaves us with in a
> sticky situation with regard to scope():
>
> http://dlang.org/statement.html#ScopeGuardStatement
>
> If the current stack is going to be replaced with data from another
> function call, the behavior of scope() is undefined. The scope that
> scope() was in has now been repurpose, but the scope is still kind of
> there. If scope() is allowed, they must be executed just before the tail
> call, otherwise it will be overwritten (or it has to stick around until
> the actual stack frame is cleared.

 Consider:
>
> void a() {
>    become b();
> }
>
> void b() {
>    // when does this get called?
>    scope(exit) writeln("exited");
>    become a();
> }
>
> If we allow scope(), then the line should be written before the call to
> a(). If we don't, then this is a compile time error. I like disallowing
> it personally, because if the scope(exit) call frees some memory that is
> passed to a, the programmer may think that it will be called after a
> exits, which may not be the case.
>
I think it should be disallowed. That along with destructors.

> void a(void* arr) {
>    // do something with arr
>    become b();
> }
>
> void b() {
>    void* arr = malloc(sizeof(float) * 16);
>    scope(exit) free(arr);
>    become a(arr);
> }
>
> I just see this as being a problem for those who don't fully understand
> scoping and TCE.


>
> My mention of overhead was just how complicated it would be to
> implement. The general algorithm is (for each become keyword):
>
> * determine max stack size (consider all branches in all recursive
> contexts)
> * allocate stack size for top-level function
> * do normal TCE stuff (use existing stack for new call)

What's wrong with just allocating a new stack _in-place_  of the old?
In other words make 'become' synonym for 'reuse the current stack frame'.
Effectively you still stay in constant space that is maximum of all 
functions being called.

> The stack size should be known at compile time for cases like the one
> above (a calls b, b calls a, infinitely) to avoid infinitely expanding
> stack. A situation like this is a memory optimization, so forcing
> guaranteed stack size puts an upper-bound on memory usage, which is the
> whole point of TCE. If the stack is allowed to grow, there is
> opportunity for stack overflow.

Right.

> My use case for this is a simple compiler, but I'm sure this could be
> applied to other use cases as well.  I'd like to produce code for some
> BNF-style grammar where each LHS is a function. Thus, my state machine
> wouldn't be a huge, unnatural switch statement that reads in the current
> state, but a series of code branches that 'become' other states, like an
> actual state machine.

I see nice staff. My use case is optimizing virtual machine, the one 
inside std.regex primarily.

-- 
Dmitry Olshansky
October 12, 2012
Re: Explicit TCE
>> My mention of overhead was just how complicated it would be to
>> implement. The general algorithm is (for each become keyword):
>>
>> * determine max stack size (consider all branches in all 
>> recursive
>> contexts)
>> * allocate stack size for top-level function
>> * do normal TCE stuff (use existing stack for new call)
>
> What's wrong with just allocating a new stack _in-place_  of 
> the old?
> In other words make 'become' synonym for 'reuse the current 
> stack frame'.
> Effectively you still stay in constant space that is maximum of 
> all functions being called.

That would work too. If scope() is disallowed, it doesn't matter 
where the stack comes from. It's only slightly cheaper to reuse 
the current stack (CPU), but making a new one would be lighter on 
memory.

> I see nice staff. My use case is optimizing virtual machine, 
> the one inside std.regex primarily.

Yeah, that is a great example! I've read some bug reports about 
std.regex using a ton of memory, especially with CTFE. Since 
regex is by definition a state machine, this would be a 
particularly elegant fit (granted, backreferences et al break 
that model, but it's still a nice metaphor).

The main problem I see is working with other compilers like 
GCC/LLVM. If this can be done on those compilers, I don't see any 
major hurdle to getting this implemented.
October 12, 2012
Re: Explicit TCE
On 12-Oct-12 22:49, Tyler Jameson Little wrote:
>
> That would work too. If scope() is disallowed, it doesn't matter where
> the stack comes from. It's only slightly cheaper to reuse the current
> stack (CPU), but making a new one would be lighter on memory.
>
>> I see nice staff. My use case is optimizing virtual machine, the one
>> inside std.regex primarily.
>
> Yeah, that is a great example! I've read some bug reports about
> std.regex using a ton of memory, especially with CTFE.

Hey, that's dmd (compiler) using a ton of memory,  not std.regex :(
It actually flies with only a modest set of ram after CTFE (or rather 
'if') succeeds that is :)

Since regex is by
> definition a state machine, this would be a particularly elegant fit
> (granted, backreferences et al break that model, but it's still a nice
> metaphor).

Yeah, without backreferences it's a state machine. Still it's NFA 
(non-deterministic) as no DFA would do if you want to get things like 
captures of sub matches etc. Either way the remarkable giant switch is 
present ;)

>
> The main problem I see is working with other compilers like GCC/LLVM. If
> this can be done on those compilers, I don't see any major hurdle to
> getting this implemented.

Perhaps the biggest one would be convincing GCC/LLVM devs to accept 
patches :)

-- 
Dmitry Olshansky
« First   ‹ Prev
1 2
Top | Discussion index | About this forum | D home