May 15, 2016
On Sunday, 15 May 2016 at 12:00:43 UTC, Martin Nowak wrote:
> On 05/15/2016 01:55 PM, Ola Fosheim Grøstad wrote:
>> If you are going to have fast evaluation of loops/recursion then you need to use a solver. And well, doing worse than O(log N) at compile time is a very bad idea.
>> 
>> Why not start with the most difficult case first? Then the simple cases will resolve themselves for free, most likely.
>
> Why not do something that takes about a month and is much more likely to succeed?

Well, you can, but it won't bring improvements to the language down the line. If typical CTFE can be rewritten as simple tail recursion, then you can probably find existing libraries that will do it for you.

May 15, 2016
On 15/05/2016 9:57 PM, Martin Nowak wrote:
> On 05/15/2016 01:58 PM, Daniel Murphy wrote:
>> The biggest advantage of bytecode is not the interpreter speed, it's
>> that by lowering you can substitute VarExps etc with actual references
>> to memory without modifying the AST.
>>
>> By working with something lower level than the AST, you should end up
>> with something much less complex and with fewer special cases.
>
> Which is a bad assessment, you can stick variable indexes into
> VarDeclaration (we already do that) and thereby access them in O(1).
> Converting control flow and references into byte code is far from
> trivial, we're talking about another s2ir and e2ir here.
>
> -Martin
>

For simple types that's true.  For more complicated reference types...

Variable indexes are not enough; you also need heap memory. But slices and pointers (and references) can refer to values either on the heap or the stack, you can have a slice of a member static array of a class on the stack, etc.  Then there are closures...

Neither e2ir nor s2ir is actually that complex.  A lot of the mess there comes from the backend IR interface being rather difficult to work with.  We can already save a big chunk of complexity by not having to translate the frontend types.  E.g. implementing the logic in the interpreter to correctly unwind through destructors is unlikely to be simpler than lowering to an IR.
May 15, 2016
On 15/05/2016 9:57 PM, Martin Nowak wrote:
> On 05/15/2016 01:58 PM, Daniel Murphy wrote:
>> The biggest advantage of bytecode is not the interpreter speed, it's
>> that by lowering you can substitute VarExps etc with actual references
>> to memory without modifying the AST.
>>
>> By working with something lower level than the AST, you should end up
>> with something much less complex and with fewer special cases.
>
> Which is a bad assessment, you can stick variable indexes into
> VarDeclaration (we already do that) and thereby access them in O(1).
> Converting control flow and references into byte code is far from
> trivial, we're talking about another s2ir and e2ir here.
>
> -Martin
>

We really should have discussed this last week!
May 15, 2016
On 05/15/2016 02:17 PM, Daniel Murphy wrote:
> 
> For simple types that's true.  For more complicated reference types...
> 
> Variable indexes are not enough, you also need heap memory, but slices and pointers (and references) can refer to values either on the heap or the stack, and you can have a slice of a member static array of a class on the stack, etc.  Then there are closures...

So we do need a GC or RC for arrays, structs, classes (anything heapish). Values for those could be allocated by a simple bump/region allocator or a dedicated allocator that supports individual freeing (via RC or GC).

In any case, struct Pointer { int index; /* 2B positive values for stack, 2B negative for heap */ } wouldn't be much more complicated than a raw pointer (and a bit simpler to garbage collect).

> Neither e2ir or s2ir are actually that complex.  A lot of the mess there comes from the backend IR interface being rather difficult to work with.

Think of a simple switch statement, where you even need to introduce
relocations (or keep a list of fixup addresses) b/c you don't know the
jump addresses in advance.
In a visitor you simply test the cases and execute the body of the first matching case.

Not to mention that we can reuse existing solutions from the current
interpreter (e.g. for gotos see
https://github.com/dlang/dmd/blob/0757504342e48e272372b7ac52cda5a333b2a2bc/src/dinterpret.d#L1014
and
https://github.com/dlang/dmd/blob/0757504342e48e272372b7ac52cda5a333b2a2bc/src/dinterpret.d#L1094).
May 15, 2016
On 05/15/2016 02:13 PM, Ola Fosheim Grøstad wrote:
> 
> Well, you can, but it won't bring improvements to the language down the line.

Maybe you don't know the actual problem with the current interpreter?
It leaks memory like hell b/c it allocates new AST nodes for almost
every expression evaluation.
Even an interpreter as slow as Ruby's will fly at compile time in
comparison to the current one.

Let me illustrate the point.

---
import std.algorithm, std.array, std.random, std.range;

enum count = 2 ^^ 10;
enum sorted = {
    auto gen = Random(123);
    return generate!(() => uniform(byte.min, byte.max, gen))
        .take(count).array.sort().release;
}();
pragma(msg, sorted.length);
---
count = 2 ** 10
nums = Enumerator.new do |yielder|
  prng = Random.new(123)
  loop do
    yielder.yield prng.rand(-128 .. 127)
  end
end.take(count).sort
print nums.length
---

   N   |     CTFE     |     Ruby
       | Time  | Mem  | Time  | Mem
-------|-------|------|-------|------
 2^^10 | 0.16s | 82M  | 0.11s | 9.3M
 2^^11 | 0.22s | 110M | 0.12s | 9.3M
 2^^12 | 0.4s  | 190M | 0.12s | 9.4M
 2^^13 | 0.7s  | 450M | 0.12s | 9.5M
 2^^14 | 1.5s  | 1.4G | 0.12s | 9.7M
 2^^15 | 3.7s  | 4.8G | 0.13s | 10.0M
 2^^16 | 5:30m | 15G  | 0.13s | 10.8M

D's CTFE grows O(N^2) b/c it leaks for almost every operation.
We don't currently need a superfast interpreter; even the simplest
possible interpreter will allow so much more that we're more likely
to be limited by the lack of I/O before we need a faster one.

May 15, 2016
On 05/15/2016 02:54 PM, Daniel Murphy wrote:
> 
> We really should have discussed this last week!

I talked about it w/ Stefan and asked him to volunteer for an
implementation; that's why we have this thread ;).
In any case I'm convinced that the simple-first strategy has a much
higher chance of being implemented this year, whereas the bug report [¹]
for slow CTFE is already 5 years old.

[¹]: https://issues.dlang.org/show_bug.cgi?id=6498

May 15, 2016
On Sunday, 15 May 2016 at 12:54:33 UTC, Daniel Murphy wrote:
>
> We really should have discussed this last week!

I agree.
Maybe we can have a Skype conference or something in the next few days.

About the whole to-BC-or-not-to-BC discussion:
As Daniel already outlined, the main purpose is not speed, but having a simple lowered representation to interpret.
Many AST interpreters switched to bytecode because it's much easier to maintain.


May 16, 2016
On 15/05/2016 11:25 PM, Martin Nowak wrote:
> On 05/15/2016 02:17 PM, Daniel Murphy wrote:
>>
>> For simple types that's true.  For more complicated reference types...
>>
>> Variable indexes are not enough, you also need heap memory, but slices
>> and pointers (and references) can refer to values either on the heap or
>> the stack, and you can have a slice of a member static array of a class
>> on the stack, etc.  Then there are closures...
>
> So we do need a GC or RC for arrays, structs, classes (anything
> heapish). Values for those could be allocated by a simple bump/region
> allocator or a dedicated allocator that support individual freeing (via
> RC or GC).
>
> In any case struct Pointer { int index; /* 2B positive values for stack,
> 2B negative for heap*/ } wouldn't be much more complicated than a raw
> pointer (and a bit simpler to garbage collect).
>

The problem is that if the index refers to a single variable on the stack, it's insufficient to refer to a variable inside an aggregate on the stack.  Then you need to start building constructs for member-of-struct in array-of-struct-pointers and it gets fairly messy...  It's all solvable, but I'm not sure the simplicity would survive.

>> Neither e2ir or s2ir are actually that complex.  A lot of the mess there
>> comes from the backend IR interface being rather difficult to work with.
>
> Think of a simple switch statement where you even need to introduce
> relocations (or keep a list of fixup addresses) b/c you don't know the
> jump addresses in advance.

This is not exactly difficult to do.

> In a visitor you simply test the cases and execute the first case body.
>
> Not to mention that we can reuse existing solutions from the current
> interpreter (e.g. for gotos see
> https://github.com/dlang/dmd/blob/0757504342e48e272372b7ac52cda5a333b2a2bc/src/dinterpret.d#L1014
> and
> https://github.com/dlang/dmd/blob/0757504342e48e272372b7ac52cda5a333b2a2bc/src/dinterpret.d#L1094).
>

Flow control is really not where the complexity lies IMO.  The weird ways in which different types of reference types can combine leads to either very complicated or very low level descriptions of memory.
May 15, 2016
On 05/15/2016 04:00 PM, Daniel Murphy wrote:
> The problem is, if index refers to a single variable on the stack, then it's insufficient to refer to a variable inside an aggregate on the stack.  Then you need to start building constructs for member of struct in array of struct pointers and it gets fairly messy...  It's all solvable, I'm not sure the simplicity would survive.

Taking what I said further down below, there would be one union Value type (untagged b/c the compiler knows what it is).

Then an Array could be implemented as HeapValueRef[], a hash table as
HeapValueRef[ValueRef], and a struct/class as HeapValueRef[] (with
HeapValueRef being a pointer or int to a heap-allocated Value, and
ValueRef being a pointer or int to either a stack value or a heap value).
A D Pointer/Ref would just be a ValueRef in the interpreter, and the
aforementioned int (2B + for stack, 2B - for heap) encoding should still
work for that.
Depending on how much pointer arithmetic we support, Pointer must
contain a ValueRef to the outermost aggregate and needs to store the
type of that aggregate plus an additional byte offset. The type and offset
could then be used to compute the actual pointee. While that sounds a bit
complex, it's essentially just https://en.wikipedia.org/wiki/Offsetof.

> Flow control is really not where the complexity lies IMO.  The weird ways in which different types of reference types can combine leads to either very complicated or very low level descriptions of memory.

Maybe you're right, but it'll be hard to figure out w/o an actual implementation. And the AST approach still looks like a lot less to code and execute.

May 15, 2016
On Sunday, 15 May 2016 at 13:44:45 UTC, Martin Nowak wrote:
>  2^^15 | 3.7s  | 4.8G | 0.13s | 10.0M
>  2^^16 | 5:30m | 15G  | 0.13s | 10.8M
>
> D's CTFE grows O(N^2) b/c it leaks for almost every operation.
> We don't currently need a superfast interpreter, even the simplest
> possible interpreter will allow so much more that we're more likely
> limited by the lack of I/O before we need a faster interpreter.

Well, this looks really bad.  But a solver would get you much more than an interpreter, e.g. proving that asserts always hold, etc.