November 15, 2016
On Tuesday, 15 November 2016 at 22:50:49 UTC, deadalnix wrote:
> On Tuesday, 15 November 2016 at 01:35:42 UTC, Stefan Koch wrote:
>> However, there is a bug inside the code that does bounds-checking for array assignment.
>> In rare cases it can trigger an out-of-bounds error on newly created arrays.
>>
>
> This raises all kinds of red flags to me. What are the design decisions that led to this?

I am still figuring out when and why the bug occurs.
The bytecode for slice allocation is rather complex, because it has to deal with resizing slices as well.
I suspect that somewhere the heapPtr is not bumped or the length is not set correctly.
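
To make the two pieces of bookkeeping concrete, here is a minimal, purely illustrative bump-pointer allocation in D (the Heap, Slice, allocSlice and assignElement names are mine, not newCTFE's actual code) where forgetting either the heap-pointer bump or the length write would produce exactly this kind of spurious bounds error:

struct Heap
{
    uint[] memory;   // flat heap, addressed in uint-sized slots
    uint heapPtr;    // bump pointer: next free slot
}

struct Slice
{
    uint basePtr;    // where the elements live on the heap
    uint length;     // element count used by the bounds check
}

/// Allocate (or grow) a slice; both pieces of bookkeeping must happen.
Slice allocSlice(ref Heap heap, uint newLength)
{
    Slice s;
    s.basePtr = heap.heapPtr;
    heap.heapPtr += newLength;   // forgetting the bump corrupts later allocations
    s.length = newLength;        // forgetting the length trips the bounds check
    return s;
}

/// The check that raises the out-of-bounds error on array assignment.
void assignElement(ref Heap heap, Slice s, uint i, uint value)
{
    assert(i < s.length, "array index out of bounds");
    heap.memory[s.basePtr + i] = value;
}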
November 16, 2016
On Tuesday, 15 November 2016 at 23:46:51 UTC, Stefan Koch wrote:
> On Tuesday, 15 November 2016 at 22:50:49 UTC, deadalnix wrote:
>> On Tuesday, 15 November 2016 at 01:35:42 UTC, Stefan Koch wrote:
>>> However, there is a bug inside the code that does bounds-checking for array assignment.
>>> In rare cases it can trigger an out-of-bounds error on newly created arrays.
>>>
>>
>> This raises all kinds of red flags to me. What are the design decisions that led to this?
>
> I am still figuring out when and why the bug occurs.
> The bytecode for slice allocation is rather complex, because it has to deal with resizing slices as well.
> I suspect that somewhere the heapPtr is not bumped or the length is not set correctly.

Indeed, the length was not set on a code path meant for resizing.
The problem is fixed :)

The HeapLimit has been raised to a more reasonable 2 ^^ 24 addresses, which means you get 2 ^^ 24 bytes (16 MiB) in practice.

November 16, 2016
On Wednesday, 16 November 2016 at 09:22:01 UTC, Stefan Koch wrote:
> On Tuesday, 15 November 2016 at 23:46:51 UTC, Stefan Koch wrote:
>
>> I suspect that somewhere the heapPtr is not bumped or the length is not set correctly.
>
> Indeed, the length was not set on a code path meant for resizing.
> The problem is fixed :)
>
> The HeapLimit has been raised to a more reasonable 2 ^^ 24 addresses, which means you get 2 ^^ 24 bytes (16 MiB) in practice.

In fact, there is a single design decision that fosters these kinds of problems: going with a low-level IR.

Although it's a tough route, it's also the only solution (the only one I could think of) that will really enable CTFE to scale gracefully.
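
For readers unfamiliar with the term: a low-level IR means the frontend is translated into flat, machine-style instructions rather than being evaluated by walking the AST. The following sketch is purely illustrative (the Op, Instr and run names are mine, and this is not newCTFE's actual instruction set), but it shows the shape of such a representation in D:

// Hypothetical low-level IR: a flat array of fixed-size instructions
// operating on numbered virtual registers, instead of an AST walk.
enum Op : ubyte { Imm, Add, Mul, Ret }

struct Instr
{
    Op   op;
    uint dst;    // destination register
    uint a, b;   // operands: registers or an immediate, depending on op
}

/// A tiny interpreter loop over the flat instruction stream.
long run(const Instr[] code)
{
    long[16] regs;
    foreach (i; code)
    {
        final switch (i.op)
        {
            case Op.Imm: regs[i.dst] = i.a;                   break;
            case Op.Add: regs[i.dst] = regs[i.a] + regs[i.b]; break;
            case Op.Mul: regs[i.dst] = regs[i.a] * regs[i.b]; break;
            case Op.Ret: return regs[i.dst];
        }
    }
    return 0;
}

unittest
{
    // (2 + 3) * 4 == 20
    auto code = [Instr(Op.Imm, 0, 2), Instr(Op.Imm, 1, 3),
                 Instr(Op.Add, 2, 0, 1), Instr(Op.Imm, 3, 4),
                 Instr(Op.Mul, 4, 2, 3), Instr(Op.Ret, 4)];
    assert(run(code) == 20);
}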

My latest measurements show that even for relatively small arrays (2 ^^ 15 bytes) there is a 2x speedup when using the interpreted backend.
As soon as my own JIT backend is in place, performance will be better by a further factor of 6, making newCTFE 12x faster than the current engine, even on small one-shot functions!

November 16, 2016
Here is a small demonstration of the performance increase:

[root@localhost dmd]# time src/dmd -c testSettingArrayLength.d > x 2> x

real	0m0.199s
user	0m0.180s
sys	0m0.017s
[root@localhost dmd]# time src/dmd -c testSettingArrayLength.d -bc-ctfe > x 2> x

real	0m0.072s
user	0m0.050s
sys	0m0.020s

Please note that newCTFE spends only 15 ms inside the evaluation; most of the time is spent clearing the 2 ^^ 24 bytes of heap memory to zero.
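
To get a feel for that cost: 2 ^^ 24 bytes is 16 MiB, and a standalone sketch like the one below (my own illustration, not how dmd itself measures anything) shows that merely allocating and zero-initialising a block of that size already takes a noticeable number of milliseconds:

import core.time : MonoTime;
import std.stdio : writeln;

void main()
{
    enum heapSize = 2 ^^ 24;            // 16 MiB, the raised HeapLimit
    auto start = MonoTime.currTime;
    auto heap = new ubyte[](heapSize);  // the GC hands back zero-initialised memory
    auto elapsed = MonoTime.currTime - start;
    writeln("zeroing ", heap.length, " bytes took ", elapsed.total!"msecs", " ms");
}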

The source code of testSettingArrayLength is:
uint[] MakeAndInitArr(uint length)
{
    uint[] arr;
    arr.length = length;

    foreach(i;0 .. length)
    {
        arr[i] = i + 3;
    }
    return arr;
}

static assert(MakeAndInitArr(ushort.max).length == ushort.max);
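
As a small aside (this extra assert is mine, not part of the test file above): since the whole call runs at CTFE, the computed contents can be checked at compile time as well.

// Each element is its index plus 3, so for a length of 4 we expect:
static assert(MakeAndInitArr(4) == [3u, 4, 5, 6]);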

November 16, 2016
On Wednesday, 16 November 2016 at 09:45:24 UTC, Stefan Koch wrote:
> Here is a small demonstration of the performance increase:
>
> [root@localhost dmd]# time src/dmd -c testSettingArrayLength.d > x 2> x
>
> real	0m0.199s
> user	0m0.180s
> sys	0m0.017s
> [root@localhost dmd]# time src/dmd -c testSettingArrayLength.d -bc-ctfe > x 2> x
>
> real	0m0.072s
> user	0m0.050s
> sys	0m0.020s
>
> Please note that newCTFE spends only 15 ms inside the evaluation; most of the time is spent clearing the 2 ^^ 24 bytes of heap memory to zero.
>
> The source code of testSettingArrayLength is:
> uint[] MakeAndInitArr(uint length)
> {
>     uint[] arr;
>     arr.length = length;
>
>     foreach(i;0 .. length)
>     {
>         arr[i] = i + 3;
>     }
>     return arr;
> }
>
> static assert(MakeAndInitArr(ushort.max).length == ushort.max);

A more accurate breakdown:

Initializing Heap:     18.6 ms
Generating Bytecode:    1.2 ms
Executing Bytecode:    13.2 ms
Converting to CTFE-Exp: 9.1 ms

For a second execution of the same function with the same arguments within the same file, the numbers look like this:

Initializing Heap:     16.7 ms
Generating Bytecode:    0.6 ms
Executing Bytecode:    13.2 ms
Converting to CTFE-Exp: 9.3 ms
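
For anyone who wants to collect a similar per-phase breakdown in their own D code, a minimal helper might look like the sketch below (the timePhase name and the dummy phases are mine; this is not how dmd itself is instrumented):

import core.time : MonoTime, Duration;
import std.stdio : writefln;

/// Time one phase and print it in the same style as the breakdown above.
Duration timePhase(string name, void delegate() phase)
{
    auto start = MonoTime.currTime;
    phase();
    auto elapsed = MonoTime.currTime - start;
    writefln("%-23s %5.1f ms", name ~ ":", elapsed.total!"usecs" / 1000.0);
    return elapsed;
}

void main()
{
    ubyte[] heap;
    timePhase("Initializing Heap", () { heap = new ubyte[](2 ^^ 24); });
    timePhase("Touching the heap", () { foreach (ref b; heap) b = 1; });
}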

November 16, 2016
On Wednesday, 16 November 2016 at 10:07:06 UTC, Stefan Koch wrote:
>
> A more accurate breakdown:
>
> Initializing Heap:     18.6 ms
> Generating Bytecode:    1.2 ms
> Executing Bytecode:    13.2 ms
> Converting to CTFE-Exp: 9.1 ms
>
> For a second execution of the same function with the same arguments within the same file, the numbers look like this:
>
> Initializing Heap:     16.7 ms
> Generating Bytecode:    0.6 ms
> Executing Bytecode:    13.2 ms
> Converting to CTFE-Exp: 9.3 ms

The above numbers were obtained using a debug build made with dmd.
The following numbers are from an optimized build with ldmd2:

First execution (cold cache):

Initializing Heap:     17.4 ms
Generating Bytecode:    0.7 ms
Executing Bytecode:     5.3 ms
Converting to CTFE-Exp: 5.1 ms

Second run (warmer cache):

Initializing Heap:     16.9 ms
Generating Bytecode:    0.3 ms
Executing Bytecode:     5.3 ms
Converting to CTFE-Exp: 4.9 ms


November 16, 2016
On Wednesday, 16 November 2016 at 10:25:30 UTC, Stefan Koch wrote:

> First execution (cold cache):
>
> Initializing Heap:     17.4 ms
> Generating Bytecode:    0.7 ms
> Executing Bytecode:     5.3 ms
> Converting to CTFE-Exp: 5.1 ms
>
> Second run (warmer cache):
>
> Initializing Heap:     16.9 ms
> Generating Bytecode:    0.3 ms
> Executing Bytecode:     5.3 ms
> Converting to CTFE-Exp: 4.9 ms


And again, a bit of bad news.
Due to problems in the lowering of function arguments, the implementation of strcat is delayed again.
November 17, 2016
On Wednesday, 16 November 2016 at 14:44:06 UTC, Stefan Koch wrote:
>
> And again, a bit of bad news.
> Due to problems in the lowering of function arguments, the implementation of strcat is delayed again.

The bug does not affect strings, since strings are not built up out of multiple sub-expressions.
strcat is on its way.
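
For context, strcat here presumably refers to string concatenation support in the interpreter; at the language level, the operation it has to handle is ordinary ~ concatenation evaluated at compile time. A tiny example of my own (the greet function is just an illustration):

// Ordinary ~ concatenation, forced to run at compile time via static assert.
string greet(string name)
{
    return "Hello, " ~ name ~ "!";
}

static assert(greet("CTFE") == "Hello, CTFE!");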

I have begun the process of macrofication.
November 17, 2016
On Thursday, 17 November 2016 at 05:35:33 UTC, Stefan Koch wrote:
> On Wednesday, 16 November 2016 at 14:44:06 UTC, Stefan Koch wrote:
>>
>> And again, a bit of bad news.
>> Due to problems in the lowering of function arguments, the implementation of strcat is delayed again.
>
> The bug does not affect strings, since strings are not built up out of multiple sub-expressions.
> strcat is on its way.
>
> I have begun the process of macrofication.

I follow this thread every day. I hope you'll write an article on the dlang blog when the work is completed :)
November 17, 2016
On Thursday, 17 November 2016 at 08:39:57 UTC, Andrea Fontana wrote:
>
> I follow this thread every day. I hope you'll write an article on the dlang blog when the work is completed :)

Mike Parker is going to write a short article about it based on information I gave him via mail.

I am afraid my blog-writing skills are a bit under-developed.

On the topic of CTFE:
My attempts at half-automatically generating a string-concat macro have yet to succeed.

I am currently busy fixing bugs :)
I apologize for the seemingly slow progress.
However, such is the nature of low-level code.
There is really no way around it if we are aiming for performance.