May 13, 2016
On Wednesday, May 11, 2016 07:06:59 maik klein via Digitalmars-d-announce wrote:
> What is the current problem with ctfe?

The biggest problem is that it uses up a _lot_ of memory and is generally slow. For instance, as I understand it, every time it mutates a variable, it actually allocates a new one to hold the new state. Combine that with the fact that the compiler never actually frees memory (since it's normally more efficient for it to work that way), and you risk running out of memory while compiling. CTFE is a fantastic feature, but it evolved over time rather than being designed up front, and it has suffered a lot because of that. Don did a _lot_ of work to improve it, but he wasn't able to continue working on it, and until now, no one has ever really stepped up to finish the job. Don's post gives a good background on why CTFE is the way it is and some of what he did to make it as solid as it is now:

http://forum.dlang.org/post/jmvsbhdpsjgeykpukoxf@forum.dlang.org

But having someone like Stefan reimplement it will be _huge_, and the D community will be _very_ grateful.

- Jonathan M Davis

May 15, 2016
On 05/10/2016 08:45 AM, Jacob Carlborg wrote:
> 
> I was listening to a discussion Don and Daniel had about the current implementation of CTFE. They talked about using a byte code interpreter. Even implementing a really crappy byte code interpreter would be a huge improvement.

No need for a byte-code interpreter; it mostly just adds overhead and complexity over an AST interpreter. If you want to go really fast you need some sort of JIT anyhow, but a proper interpreter will be orders of magnitude faster than the current implementation.

I might refer you to
http://dconf.org/2013/talks/chevalier_boisvert.pdf
page 59 ff.
May 15, 2016
On 05/13/2016 06:32 PM, Stefan Koch wrote:
> I would like to work on a solution that does scale. The problem is not making a bytecode interpreter. That part is relatively easy. Currently I am trying to get a detailed understanding of dmd and its data structures (mainly its AST).
> 
> Generating the byte-code seems to be non-trivial.
> 
> I wonder in how far the glue layer can be of help...

Seems like I have to repeat this once more, because everyone, including me, didn't get it the first time. We don't need a bytecode interpreter; it mostly adds overhead and a complicated second layer between walking the AST and interpreting it (think of transforming a for loop with gotos into linear bytecode, which is almost as complicated as in the glue layer).

What we basically need is a stack of values, a stack of frames (for
function calls and variables in scopes), and an AST visitor that does
the interpretation.
It would be most helpful for the success of this to follow common CS
examples like [¹], [²], or [³].
With realistic expectations we might have a working implementation in
a month or so. With more requirements like bytecode, using dmd's
backend, or JIT we end up with a long newsgroup discussion ;).

Tricky things for a CTFE interpreter include:

- enumerating VarDeclarations (they already have a ctfeAdrOnStack
field) in each scope, and referring to outer variables from nested scopes

Best to just use a contiguous stack and set the stack pointer to the last frame pointer when leaving a scope.

- getting function calls right

Push the arguments; on return, shift the top of the stack down over the arguments and pop them (caller cleanup). If possible, detect and support tail recursion.

- converting AST values to and from Interpreter Values.

Literals and constant VarExp from the AST need to be converted to an
interpreter Value before being pushed on the stack. The result of
interpretation (top of stack) needs to be converted back to an AST
literal.
Using separate data types (instead of creating AST values in the
interpreter) will be a major performance improvement over using AST
values (e.g. 16 vs. ~100 bytes). It also creates a clear boundary
between interpreter and AST values. Currently, considerable
complexity goes into cleaning up interpreter-generated AST values and
distinguishing interpreter-owned from normal AST values (which allows
some optimizations) [⁴].

We don't need a tagged union because the AST already contains the type information, but a tag could be helpful for debugging [⁵].

Values can be deleted when popped from stack (no ref counting needed I
think).
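A compact interpreter `Value` along those lines could be a small union, around 16 bytes, instead of a full ~100-byte AST expression node. This is a sketch; the `Kind` tag is the optional debugging aid mentioned above, since the AST already supplies the types.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A compact interpreter value: an 8-byte payload union plus an
// optional 1-byte debug tag, padded to 16 bytes total. Heap-backed
// kinds (arrays etc.) store a pointer into the union.
struct Value {
    enum class Kind : uint8_t { Int, Float, Array } kind;
    union {
        int64_t i;
        double  f;
        std::vector<Value>* arr;   // heap payload for arrays
    };
};
static_assert(sizeof(Value) <= 16, "Value stays small and copyable");
```

Popping such a value is just a stack-pointer decrement; no conversion back to AST form happens until the final result leaves the interpreter.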

- Implementing more complex data structures (arrays, strings, hash
tables, aggregates)

Use Value[], Value[Value], and a dedicated String type (char[]/wchar[]/dchar[]). For structs/classes the field indexes are known, so use a fixed-size Value[]. Value.class_ needs to hold a reference to the actual class instance for vtable calls.
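In C++ terms, the aggregate representations listed above might map onto standard containers like this (all names illustrative; a real implementation would key the hash per D key type):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Closest C++ analogues of the suggested Value[], Value[Value],
// and dedicated string representations.
struct Value { long i = 0; };

using Array  = std::vector<Value>;     // Value[]
using Str    = std::string;            // dedicated string storage
using Hash   = std::map<long, Value>;  // Value[Value], keyed per type
using Fields = std::vector<Value>;     // fixed slots per struct/class

// A struct instance: the field count is known from its declaration,
// so a fixed-size slot vector indexed by field position suffices.
Fields makeStruct(size_t nfields) { return Fields(nfields); }
```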

Last time I was working on this (also on a bytecode interpreter) the
entry point was fairly clear [⁶] (thanks to Don).

-Martin

[¹]: [The Interpreter In An Undergraduate Compilers Course
](http://arxiv.org/pdf/1412.0426.pdf)
[²]: [L8: Interpreters &
Visitors](http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-005-elements-of-software-construction-fall-2011/lecture-notes/MIT6_005F11_lec08.pdf)
[³]: [PA 2:
Interpreter](https://sites.google.com/a/bodik.org/cs164/projects/pa2)
[⁴]: https://github.com/dlang/dmd/search?q=ownedByCtfe
[⁵]:
https://github.com/MartinNowak/dmd/blob/28ffb0ab4fa6950f60c085f33f8a2ce23df7c0cd/src/interpret.c#L73
[⁶]:
https://github.com/MartinNowak/dmd/blob/28ffb0ab4fa6950f60c085f33f8a2ce23df7c0cd/src/interpret.c#L693
May 15, 2016
On 05/09/2016 06:57 PM, Stefan Koch wrote:
> I was shocked to discover that the PowExpression actually depends on Phobos! (Depending on the exact code path it may or may not compile...) This led me to prematurely state that it worked at CTFE [http://forum.dlang.org/thread/ukcoibejffinknrbzktv@forum.dlang.org]

There is a really old bug report for that [Issue 3749 – cannot evaluate yl2x (log) and exp functions at compile time](https://issues.dlang.org/show_bug.cgi?id=3749). The lack of exp is really limiting for many nice table precomputation use-cases in scientific contexts.

-Martin
May 15, 2016
On Sunday, 15 May 2016 at 10:29:21 UTC, Martin Nowak wrote:
> On 05/10/2016 08:45 AM, Jacob Carlborg wrote:
> overhead and complexity over an AST interpreter. If you want to go really fast you need some sort of JIT anyhow, but a proper interpreter will be orders of magnitude faster than the current implementation.

If you are going to have fast evaluation of loops/recursion then you need to use a solver. And well, doing worse than O(log N) at compile time is a very bad idea.

Why not start with the most difficult case first? Then the simple cases will resolve themselves for free, most likely.

May 15, 2016
On 05/15/2016 01:58 PM, Daniel Murphy wrote:
> The biggest advantage of bytecode is not the interpreter speed, it's that by lowering you can substitute VarExps etc with actual references to memory without modifying the AST.
> 
> By working with something lower level than the AST, you should end up with something much less complex and with fewer special cases.

Which is a bad assessment: you can stick variable indexes into VarDeclaration (we already do that) and thereby access them in O(1). Converting control flow and references into bytecode is far from trivial; we're talking about another s2ir and e2ir here.

-Martin
May 15, 2016
On 15/05/2016 8:29 PM, Martin Nowak wrote:
>
> No need for a byte-code interpreter, it mostly just adds overhead and
> complexity over an AST interpreter. If you want to go really fast you
> need some sort of JIT anyhow, but a proper interpreter will be orders of
> magnitude faster than the current implementation.
>

The biggest advantage of bytecode is not the interpreter speed, it's that by lowering you can substitute VarExps etc with actual references to memory without modifying the AST.

By working with something lower level than the AST, you should end up with something much less complex and with fewer special cases.

The current goal is not a full JIT, just something that manages memory in a less insane way.
May 15, 2016
On 05/15/2016 01:55 PM, Ola Fosheim Grøstad wrote:
> If you are going to have fast evaluation of loops/recursion then you need to use a solver. And well, doing worse than O(log N) at compile time is a very bad idea.
> 
> Why not start with the most difficult case first? Then the simple cases will resolve themselves for free, most likely.

Why not do something that takes about a month and is much more likely to
succeed?
If someone has more time and needs an even faster interpreter she can
write a new one, or add optimizations or JIT to the simple interpreter.
May 15, 2016
On Sunday, 15 May 2016 at 10:29:21 UTC, Martin Nowak wrote:
> On 05/10/2016 08:45 AM, Jacob Carlborg wrote:
>> 
>> I was listening to a discussion Don and Daniel had about the current implementation of CTFE. They talked about using a byte code interpreter. Even implementing a really crappy byte code interpreter would be a huge improvement.
>
> No need for a byte-code interpreter; it mostly just adds overhead and complexity over an AST interpreter. If you want to go really fast you need some sort of JIT anyhow, but a proper interpreter will be orders of magnitude faster than the current implementation.
>
> I might refer you to
> http://dconf.org/2013/talks/chevalier_boisvert.pdf
> page 59 ff.

Correct. A bytecode interpreter will add even more implementation overhead, and the benefit is only realizable if the bytecode is a standard format that can be read by other backends, such as a JIT.

May 15, 2016
On 05/15/2016 02:02 PM, Stefan Koch wrote:
> 
> Correct. A bytecode interpreter will add even more implementation overhead, and the benefit is only realizable if the bytecode is a standard format that can be read by other backends, such as a JIT.

This indeed would be an interesting proposal: interpretable IR that is
portable between the backends. But I guess we won't find such an IR, or
at least we'd need to lower it into the real IR for dmd/gcc/llvm.
Sounds like an interesting candidate for the next GSoC.