CTFE Status (page 15)

On Wednesday, 7 December 2016 at 18:00:40 UTC, Stefan Koch wrote:
> I have an update about CTFE performance.
> I wondered for quite a while why newCTFE had a 5 millisecond overhead when compared to the old interpreter,
> since on my charts it had comparable or better stats.
> I finally figured it out.
>
> Because the interpreter is supposed to be CTFE-able itself, it uses a GC-allocated stack.
> And that in turn invokes the GC.
> The mark phase of the garbage collector matches up exactly with the extra time taken.
>
> I have no idea why this happens, though, since the GC is supposed to be disabled.

It turns out that almost all classes in dmd are emplaced,
whereas my bytecode visitor uses the regular new, because it is fairly sizable. So the GC is much more visible.

That still does not explain why GC.disable seems to have no effect, though.
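The constraint above (the interpreter must itself stay CTFE-able) is why the GC allocation is hard to avoid: C's malloc cannot be called during CTFE. A common D idiom is to branch on the built-in __ctfe variable so that only the compile-time path uses the GC. This is a sketch under the assumption that the stack is a plain array of uint slots; ValueStack and sumOfDoubledIndices are hypothetical names, and this is not how newCTFE is actually written.

```d
import core.stdc.stdlib : calloc, free;

// Hypothetical value stack (not newCTFE's actual code) for an interpreter
// that must itself be CTFE-able.
struct ValueStack
{
    uint[] slots;
    private bool fromMalloc;

    this(size_t n)
    {
        if (__ctfe)
        {
            // calloc/malloc cannot be called during CTFE, so use the GC here.
            slots = new uint[](n);
        }
        else if (n != 0)
        {
            // At run time, allocate outside the GC heap, so creating the
            // stack never triggers a collection.
            auto p = cast(uint*) calloc(n, uint.sizeof);
            assert(p !is null, "out of memory");
            slots = p[0 .. n];
            fromMalloc = true;
        }
    }

    // Frees the run-time allocation; the CTFE allocation is left to the GC.
    void release()
    {
        if (fromMalloc)
            free(slots.ptr);
        slots = null;
        fromMalloc = false;
    }
}

// Fills the stack with 0, 2, 4, ... and returns the sum of the slots.
uint sumOfDoubledIndices(size_t n)
{
    auto stack = ValueStack(n);
    foreach (i, ref slot; stack.slots)
        slot = cast(uint)(i * 2);
    uint sum;
    foreach (v; stack.slots)
        sum += v;
    stack.release();
    return sum;
}

// Evaluated by CTFE, so the GC branch is taken.
static assert(sumOfDoubledIndices(4) == 12);

void main()
{
    // Evaluated at run time, so the calloc branch is taken.
    assert(sumOfDoubledIndices(4) == 12);
}
```

The same function body serves both worlds; only the allocation strategy differs between compile time and run time.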

December 07, 2016

Re: CTFE Status

Posted by Stefan Koch
in reply to Stefan Koch

On Wednesday, 7 December 2016 at 18:50:20 UTC, Stefan Koch wrote:
> On Wednesday, 7 December 2016 at 18:00:40 UTC, Stefan Koch wrote:
>> I have an update about CTFE performance.
>> I wondered for quite a while why newCTFE had a 5 millisecond overhead when compared to the old interpreter,
>> since on my charts it had comparable or better stats.
>> I finally figured it out.
>>
>> Because the interpreter is supposed to be CTFE-able itself, it uses a GC-allocated stack.
>> And that in turn invokes the GC.
>> The mark phase of the garbage collector matches up exactly with the extra time taken.
>>
>> I have no idea why this happens, though, since the GC is supposed to be disabled.
>
> It turns out that almost all classes in dmd are emplaced,
> whereas my bytecode visitor uses the regular new, because it is fairly sizable. So the GC is much more visible.
>
> That still does not explain why GC.disable seems to have no effect, though.

After a few more iterations, the observed slowdown seems to disappear.
I am suspicious.

I made the change to continue and break handling.

Now ForStatements should be handled correctly for the most part.
I think there could still be a bug if you use a labeled continue on a for-statement.
I will make a change later that takes that into account as well.

Then we should be all set to implement call handling, I believe.
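A compile-time test for the labeled-continue case mentioned above could look like the following sketch. The function name and expected values are hypothetical (worked out by hand for this example), not taken from the newCTFE test suite; the point is that continue outer must resume at the increment of the outer for-loop, not the inner one.

```d
// Hypothetical CTFE test: `continue outer` must jump to the increment
// of the outer for-loop, skipping the rest of the inner loop.
uint sumBelowDiagonal(uint n)
{
    uint result;
    outer: for (uint i = 0; i < n; ++i)
    {
        for (uint j = 0; j < n; ++j)
        {
            if (j == i)
                continue outer; // abandon this row entirely
            result += j;
        }
    }
    return result;
}

// Rows i = 0..3 contribute 0, 0, 1 and 3 respectively.
static assert(sumBelowDiagonal(4) == 4);
static assert(sumBelowDiagonal(0) == 0);

void main()
{
    // The run-time result must agree with the CTFE result.
    assert(sumBelowDiagonal(4) == 4);
}
```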

On Wednesday, 7 December 2016 at 20:56:08 UTC, Stefan Koch wrote:
> I made the change to continue and break handling.
>
> Now ForStatements should be handled correctly for the most part.
> I think there could still be a bug if you use a labeled continue on a for-statement.
> I will make a change later that takes that into account as well.
>
> Then we should be all set to implement call handling, I believe.

Array literals with CTFE-known variables in them are now supported:

uint[] fn(uint n)
{
    return [n, n, n];
}

static assert(fn(3).length == 3);
static assert(fn(3)[0] == 3);
static assert(fn(3)[1] == 3);
static assert(fn(3)[2] == 3);

On Thursday, 8 December 2016 at 01:52:13 UTC, Stefan Koch wrote:
>
> Array literals with CTFE-known variables in them are now supported:
> uint[] fn(uint n)
> {
>     return [n, n, n];
> }
>
> static assert(fn(3).length == 3);
> static assert(fn(3)[0] == 3);
> static assert(fn(3)[1] == 3);
> static assert(fn(3)[2] == 3);

This works with interspersed constants as well:

uint[] itrspary(uint n)
{
    return [1, n, 3];
}

static assert(itrspary(7).length == 3);
static assert(itrspary(7)[0] == 1);
static assert(itrspary(7)[1] == 7);
static assert(itrspary(7)[2] == 3);

On Thursday, 8 December 2016 at 02:22:31 UTC, Stefan Koch wrote:
> On Thursday, 8 December 2016 at 01:52:13 UTC, Stefan Koch wrote:
>>
>> Array literals with CTFE-known variables in them are now supported:
>> uint[] fn(uint n)
>> {
>>     return [n, n, n];
>> }
>>
>> static assert(fn(3).length == 3);
>> static assert(fn(3)[0] == 3);
>> static assert(fn(3)[1] == 3);
>> static assert(fn(3)[2] == 3);
>
> This works with interspersed constants as well:
> uint[] itrspary(uint n)
> {
>     return [1, n, 3];
> }
>
> static assert(itrspary(7).length == 3);
> static assert(itrspary(7)[0] == 1);
> static assert(itrspary(7)[1] == 7);
> static assert(itrspary(7)[2] == 3);

I fixed the bug in switch statements (I think).
It was not a mismatch with dmd's sorting, but rather my own sloppy block-exit detection.
Using dmd's BlockExit visitor fixed it.
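D does not allow implicit fall-through between non-empty cases, so every case body has to leave its block some other way (break, return, goto case, ...), and dmd's block-exit analysis classifies how a statement can exit. A hypothetical CTFE test that mixes several of these exits might look like this; the function name and values are illustrative only, not from the newCTFE tests.

```d
// Hypothetical CTFE test: each case leaves its block in a different way.
uint classify(uint n)
{
    uint r;
    switch (n)
    {
        case 0:
            return 100;     // exits the whole function
        case 1:
            r = 10;
            break;          // exits the switch
        case 2:
            r = 20;
            goto case 3;    // explicit fall-through into case 3
        case 3:
            r += 3;
            break;
        default:
            r = 1;
            break;
    }
    return r;
}

static assert(classify(0) == 100);
static assert(classify(1) == 10);
static assert(classify(2) == 23);
static assert(classify(3) == 3);
static assert(classify(9) == 1);

void main()
{
    // The run-time result must agree with the CTFE result.
    assert(classify(2) == 23);
}
```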

I found the biggest performance bottleneck in newCTFE!

oldCtfe:
[root@localhost dmd]# time src/dmd -c ctfeTest.d testStringEq.d testStringLength.d testStruct.d testMultipleArrayLiterals.d

real 0m0.026s
user 0m0.020s
sys 0m0.003s

Before Fixing:
[root@localhost dmd]# time src/dmd -c ctfeTest.d testStringEq.d testStringLength.d testStruct.d testMultipleArrayLiterals.d -bc-ctfe

real 0m0.025s
user 0m0.020s
sys 0m0.003s

After Fixing:
[root@localhost dmd]# time src/dmd -c ctfeTest.d testStringEq.d testStringLength.d testStruct.d testMultipleArrayLiterals.d -bc-ctfe

real 0m0.019s
user 0m0.017s
sys 0m0.000s

Any reason for the infinite-depth update posting style? I would have loved to see each update as a child of the root post, with its own discussion tree. Currently, the posts are quite unreadable in tree view (Thunderbird).

On 2016-10-31 14:29, Stefan Koch wrote:
> Hi Guys, since I got a few complaints about giving minor status updates
> in the announce group, I am opening this thread.
>

On Thursday, 8 December 2016 at 19:13:23 UTC, Stefan Koch wrote:
> I found the biggest performance bottleneck in newCTFE!
>
> oldCtfe:
> [root@localhost dmd]# time src/dmd -c ctfeTest.d testStringEq.d testStringLength.d testStruct.d testMultipleArrayLiterals.d
>
> real 0m0.026s
> user 0m0.020s
> sys 0m0.003s
> Before Fixing:
> [root@localhost dmd]# time src/dmd -c ctfeTest.d testStringEq.d testStringLength.d testStruct.d testMultipleArrayLiterals.d -bc-ctfe
>
> real 0m0.025s
> user 0m0.020s
> sys 0m0.003s
>
> After Fixing:
>
> [root@localhost dmd]# time src/dmd -c ctfeTest.d testStringEq.d testStringLength.d testStruct.d testMultipleArrayLiterals.d -bc-ctfe
>
> real 0m0.019s
> user 0m0.017s
> sys 0m0.000s

Please note that, with newCTFE, CTFE inside the frontend takes 10% of the compilation time, whereas with the old interpreter it takes 50%.

I just wanted to post another performance comparison that does not test dmd's memory allocator more than anything else :)

[root@localhost dmd]# time src/dmd -c testSettingArrayLength.d -bc-ctfe
2147385345u
536821761u
4294639619u

real 0m0.114s
user 0m0.110s
sys 0m0.003s

[root@localhost dmd]# time src/dmd -c testSettingArrayLength.d
2147385345u
536821761u
4294639619u

real 0m0.921s
user 0m0.843s
sys 0m0.077s

Results are obtained by running the following code:

uint MakeInitAndSumArray(uint length)
{
    uint result;
    uint[] arr;
    arr.length = length;

    // Fill arr[i] = i, walking from the last index down to 0.
    while (length--)
    {
        arr[length] = length;
    }

    foreach (e; arr)
    {
        result += e;
    }

    return result;
}

pragma(msg, MakeInitAndSumArray(ushort.max));
pragma(msg, MakeInitAndSumArray(ushort.max/2));
pragma(msg, MakeInitAndSumArray(ushort.max*2));
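The three printed values can be checked by hand: the function stores i in arr[i] and sums the array, so for an argument n it computes 0 + 1 + ... + (n - 1) = n(n - 1)/2, reduced modulo 2^32 because result is a uint. That wrap-around is why ushort.max*2 (131070) prints 4294639619u rather than 8589606915. A small sketch of that closed form follows; closedFormSum is a hypothetical helper, not part of the original test.

```d
// Hypothetical helper: closed form of 0 + 1 + ... + (n - 1), computed in
// 64 bits and then truncated to 32 bits, like the uint accumulator above.
uint closedFormSum(uint n)
{
    if (n == 0)
        return 0;
    ulong m = n;
    return cast(uint)(m * (m - 1) / 2); // keeps only the low 32 bits
}

static assert(closedFormSum(ushort.max) == 2147385345u);
static assert(closedFormSum(ushort.max / 2) == 536821761u);
static assert(closedFormSum(ushort.max * 2) == 4294639619u); // wrapped modulo 2^32

void main()
{
    // The same function also works at run time.
    assert(closedFormSum(4) == 6);
}
```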
