December 07, 2016
On Wednesday, 7 December 2016 at 18:00:40 UTC, Stefan Koch wrote:
> I have an update about CTFE performance.
> I wondered for quite a while why newCTFE had a 5 millisecond overhead compared to the old interpreter,
> since on my charts it had comparable or better stats.
> I finally figured it out.
>
> Because the interpreter is supposed to be CTFE-able itself, it uses a GC-allocated stack.
> And that in turn invokes the GC.
> The mark phase of the garbage collector matches up exactly with the extra time taken.
>
> I have no idea why this happens, though, since the GC is supposed to be disabled.

It turns out that almost all classes in dmd are emplaced.
Whereas my bytecode visitor uses the regular new. Because it is fairly sizable, the GC is much more visible.

That still does not explain why GC.disable seems to have no effect, though.
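
For readers who want to see the difference between the two allocation styles, here is a minimal sketch. It is not dmd's code: BigVisitor and makeEmplaced are hypothetical names made up for illustration. It only shows that an object emplaced into malloc'ed memory is unknown to the GC, while one created with plain new lives on the GC heap even after GC.disable. As far as I know, the druntime documentation also says that collections may still occur after GC.disable when the implementation deems it necessary, for example when memory runs out; whether that is what happens here is not established.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

// Hypothetical stand-in for a sizable visitor class; not dmd's actual type.
class BigVisitor
{
    uint[256] scratch;
    uint id;
    this(uint id) { this.id = id; }
}

// Construct the object in malloc'ed memory, so the GC heap is not involved.
BigVisitor makeEmplaced(uint id)
{
    enum size = __traits(classInstanceSize, BigVisitor);
    void* raw = malloc(size);
    assert(raw !is null, "malloc failed");
    return emplace!BigVisitor(raw[0 .. size], id);
}

void main()
{
    GC.disable();             // suppresses ordinary collections, not allocation
    scope (exit) GC.enable();

    auto viaEmplace = makeEmplaced(1);
    auto viaNew = new BigVisitor(2); // still allocated on the GC heap

    // GC.addrOf returns null for pointers the GC does not own.
    assert(GC.addrOf(cast(void*) viaEmplace) is null);
    assert(GC.addrOf(cast(void*) viaNew) !is null);

    // Manually managed memory has to be destroyed and freed by hand.
    destroy(viaEmplace);
    free(cast(void*) viaEmplace);
}
```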
December 07, 2016
On Wednesday, 7 December 2016 at 18:50:20 UTC, Stefan Koch wrote:
> On Wednesday, 7 December 2016 at 18:00:40 UTC, Stefan Koch wrote:
>> I have an update about CTFE performance.
>> I wondered for quite a while why newCTFE had a 5 millisecond overhead compared to the old interpreter,
>> since on my charts it had comparable or better stats.
>> I finally figured it out.
>>
>> Because the interpreter is supposed to be CTFE-able itself, it uses a GC-allocated stack.
>> And that in turn invokes the GC.
>> The mark phase of the garbage collector matches up exactly with the extra time taken.
>>
>> I have no idea why this happens, though, since the GC is supposed to be disabled.
>
> It turns out that almost all classes in dmd are emplaced.
> Whereas my bytecode visitor uses the regular new. Because it is fairly sizable, the GC is much more visible.
>
> That still does not explain why GC.disable seems to have no effect, though.

After a few more iterations the observed slowdown seems to disappear.
I am suspicious.
December 07, 2016
I made the change to continue and break handling.

Now ForStatements should be handled correctly for the most part.
I think there could still be a bug if you do a labeled continue on a for-statement.
I will make a change later that takes that into account as well.

Then we should be all set to implement call handling, I believe.
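
To make the labeled-continue case concrete, here is a small sketch of the kind of function that would exercise it. The function countPairs and the label outer are made-up names for illustration, not part of the newCTFE test suite. A labeled continue has to jump to the increment of the labeled outer for-statement, not the inner one.

```d
// Counts the pairs (i, j) with j <= i for 0 <= i, j < n.
uint countPairs(uint n)
{
    uint count;
    outer: for (uint i = 0; i < n; i++)
    {
        for (uint j = 0; j < n; j++)
        {
            if (j > i)
                continue outer; // runs the outer i++, then re-tests i < n
            count++;
        }
    }
    return count;
}

// Evaluated at compile time: 1 + 2 + 3 pairs for n == 3.
static assert(countPairs(3) == 6);
```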
December 08, 2016
On Wednesday, 7 December 2016 at 20:56:08 UTC, Stefan Koch wrote:
> I made the change to continue and break handling.
>
> Now ForStatements should be handled correctly for the most part.
> I think there could still be a bug if you do a labeled continue on a for-statement.
> I will make a change later that takes that into account as well.
>
> Then we should be all set to implement call handling, I believe.

Array literals containing CTFE-known variables are now supported:
uint[] fn(uint n)
{
  return [n, n, n];
}

static assert(fn(3).length == 3);
static assert(fn(3)[0] == 3);
static assert(fn(3)[1] == 3);
static assert(fn(3)[2] == 3);


December 08, 2016
On Thursday, 8 December 2016 at 01:52:13 UTC, Stefan Koch wrote:
>
> Array literals containing CTFE-known variables are now supported:
> uint[] fn(uint n)
> {
>   return [n, n, n];
> }
>
> static assert(fn(3).length == 3);
> static assert(fn(3)[0] == 3);
> static assert(fn(3)[1] == 3);
> static assert(fn(3)[2] == 3);

This also works with interspersed constants.
uint[] itrspary(uint n)
{
  return [1, n, 3];
}

static assert(itrspary(7).length == 3);
static assert(itrspary(7)[0] == 1);
static assert(itrspary(7)[1] == 7);
static assert(itrspary(7)[2] == 3);
December 08, 2016
On Thursday, 8 December 2016 at 02:22:31 UTC, Stefan Koch wrote:
> On Thursday, 8 December 2016 at 01:52:13 UTC, Stefan Koch wrote:
>>
>> Array literals containing CTFE-known variables are now supported:
>> uint[] fn(uint n)
>> {
>>   return [n, n, n];
>> }
>>
>> static assert(fn(3).length == 3);
>> static assert(fn(3)[0] == 3);
>> static assert(fn(3)[1] == 3);
>> static assert(fn(3)[2] == 3);
>
> This also works with interspersed constants.
> uint[] itrspary(uint n)
> {
>   return [1, n, 3];
> }
>
> static assert(itrspary(7).length == 3);
> static assert(itrspary(7)[0] == 1);
> static assert(itrspary(7)[1] == 7);
> static assert(itrspary(7)[2] == 3);

I fixed the bug in switchStatements (I think).
It was not a mismatch with dmd's sorting, but rather my own sloppy block-exit detection.
Using dmd's BlockExit visitor fixed it.
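
As an illustration of why block-exit detection matters for switch, here is a sketch with made-up names (classify is not one of the actual test cases). Each case body leaves its block in a different way, and the interpreter has to know which one applies to decide where control goes next.

```d
uint classify(uint x)
{
    switch (x)
    {
        case 0:
            return 10;      // leaves via return
        case 1:
            x += 5;
            goto case;      // explicit fall-through into case 2
        case 2:
            x *= 2;
            break;          // leaves via break
        default:
            x = 0;
            break;
    }
    return x;
}

static assert(classify(0) == 10);
static assert(classify(1) == 12); // (1 + 5) * 2
static assert(classify(2) == 4);
static assert(classify(9) == 0);
```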
December 08, 2016
I found the biggest performance bottleneck in newCTFE!

oldCtfe :
[root@localhost dmd]# time src/dmd -c ctfeTest.d testStringEq.d testStringLength.d testStruct.d testMultipleArrayLiterals.d

real	0m0.026s
user	0m0.020s
sys	0m0.003s

[root@localhost dmd]# time src/dmd -c ctfeTest.d testStringEq.d testStringLength.d testStruct.d testMultipleArrayLiterals.d -bc-ctfe

real	0m0.025s
user	0m0.020s
sys	0m0.003s

After Fixing

[root@localhost dmd]# time src/dmd -c ctfeTest.d testStringEq.d testStringLength.d testStruct.d testMultipleArrayLiterals.d  -bc-ctfe

real	0m0.019s
user	0m0.017s
sys	0m0.000s


December 08, 2016
Any reason for the infinite-depth update posting style?
I would have loved to see each update as a child of the root post with its own discussion tree.
Currently, the posts are quite unreadable in tree view (Thunderbird).


On 2016-10-31 14:29, Stefan Koch wrote:
> Hi Guys, since I got a few complaints about giving minor status updates
> in the announce group, I am opening this thread.
>



December 08, 2016
On Thursday, 8 December 2016 at 19:13:23 UTC, Stefan Koch wrote:
> I found the biggest performance bottleneck in newCTFE!
>
> oldCtfe :
> [root@localhost dmd]# time src/dmd -c ctfeTest.d testStringEq.d testStringLength.d testStruct.d testMultipleArrayLiterals.d
>
> real	0m0.026s
> user	0m0.020s
> sys	0m0.003s
>
Before Fixing :
> [root@localhost dmd]# time src/dmd -c ctfeTest.d testStringEq.d testStringLength.d testStruct.d testMultipleArrayLiterals.d -bc-ctfe
>
> real	0m0.025s
> user	0m0.020s
> sys	0m0.003s
>
> After Fixing
>
> [root@localhost dmd]# time src/dmd -c ctfeTest.d testStringEq.d testStringLength.d testStruct.d testMultipleArrayLiterals.d  -bc-ctfe
>
> real	0m0.019s
> user	0m0.017s
> sys	0m0.000s

Please note that with newCTFE, CTFE interpretation inside the frontend takes 10% of the compilation time,
whereas with the old interpreter it takes 50%.
December 08, 2016
I just wanted to post another performance comparison that does not test dmd's memory allocator more than anything else :)

[root@localhost dmd]# time src/dmd -c testSettingArrayLength.d  -bc-ctfe
2147385345u
536821761u
4294639619u

real	0m0.114s
user	0m0.110s
sys	0m0.003s
[root@localhost dmd]# time src/dmd -c testSettingArrayLength.d
2147385345u
536821761u
4294639619u

real	0m0.921s
user	0m0.843s
sys	0m0.077s

The results were obtained by running the following code:
uint MakeInitAndSumArray(uint length)
{
    uint result;
    uint[] arr;
    arr.length = length;

    while(length--)
    {
        arr[length] = length;
    }
    foreach(e;arr)
    {
      result += e;
    }

    return result;
}

pragma(msg, MakeInitAndSumArray(ushort.max));
pragma(msg, MakeInitAndSumArray(ushort.max/2));
pragma(msg, MakeInitAndSumArray(ushort.max*2));
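
As a sanity check on the three printed values, the loop sums 0 .. length-1, which is length*(length-1)/2 truncated to 32 bits. The sketch below uses a hypothetical helper, expectedSum, that is not part of the benchmark. It recomputes the closed form in 64 bits and truncates, which also shows that the third result has wrapped around uint.max.

```d
// Closed form of 0 + 1 + ... + (n - 1), truncated to 32 bits the same way
// the uint accumulator in the loop wraps around.
uint expectedSum(uint n)
{
    ulong wide = cast(ulong) n * (n - 1) / 2;
    return cast(uint) wide;
}

static assert(expectedSum(ushort.max)     == 2147385345u);
static assert(expectedSum(ushort.max / 2) == 536821761u);
// The untruncated value is 8589606915, which exceeds uint.max and wraps.
static assert(expectedSum(ushort.max * 2) == 4294639619u);
```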