August 04, 2020
On 8/4/20 12:01 PM, Avrina wrote:
> On Tuesday, 4 August 2020 at 12:47:54 UTC, Steven Schveighoffer wrote:
>> On 8/4/20 4:44 AM, Stefan Koch wrote:
>>> newCTFE, in the end, has a lot less benefit than I first assumed.
>>
>> If CTFE becomes way less expensive (in CPU usage and memory usage), then the template problem becomes easier as well, as we can do more CTFE to replace templates.
> 
> CTFE takes 500 ms for my project. It takes a total of about 10 seconds for the frontend to do everything, without -inline. The more significant problem is definitely templates: their expansion, and how everything is processed back into an AST. This is the case even if you run CTFE: even with "newCTFE", the result is still going to have to be expanded back into an AST, which is the core problem.

I have faced quite different problems (mostly with memory consumption). Part of the problem of current CTFE is that it consumes lots of memory needlessly (at least that's my recollection). That is part (but not all) of the reason we use recursive templates instead of CTFE to solve our compile-time computation problems.
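
As a toy illustration (hypothetical code, not what we actually ship), here is the same compile-time sum done both ways -- the template route creates one instantiation per step, while the CTFE route is a single function call:

// recursive template: one instantiation (and one symbol) per step
template SumTo(int n)
{
    static if (n == 0)
        enum SumTo = 0;
    else
        enum SumTo = n + SumTo!(n - 1);
}

// CTFE: an ordinary function evaluated at compile time, no instantiations
int sumTo(int n)
{
    int total;
    foreach (i; 1 .. n + 1)
        total += i;
    return total;
}

enum viaTemplate = SumTo!100;
enum viaCTFE = sumTo(100);
static assert(viaTemplate == viaCTFE);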

I don't know whether newCTFE fixes ALL the problems or not. But it will still help.

> 
>>> If I had had a good integrated profiler back then, and some of the code which I have access to now, I would probably never have started newCTFE, and would have tried to fix the template system itself.
>>
>> I still think newCTFE has worthwhile benefit, even if it alone cannot fix all the problems.
>>
>> I think newCTFE and type functions (along with Manu's ... DIP) would be a good combination to attack the problem.
>>
> 
> It doesn't help; it just introduces a whole lot more complexity into the compiler as well. Effectively it creates a second compiler within the compiler. It is much more complicated than the current solution, and I don't imagine the speedup is going to be that much, since in my case it will only be able to reduce the build time by at most 500 ms.

What is "It" that you are talking about?

I imagine part of the problem here is that CTFE is avoided because it doesn't deal with types and compile-time lists very well -- templates are the only solution there. As a result, CTFE is not used in many places where it really is a natural fit.

So really, if CTFE were more usable, it could replace much of the heavy template usage that is likely slowing your build down, and then an optimized CTFE becomes more relevant.
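
For example (a sketch using Phobos' AliasSeq; the template name is made up for illustration), mapping over a list of types has to be a recursive template today, because a CTFE function cannot take or return types:

import std.meta : AliasSeq;

// turn (int, char) into (int*, char*) -- only expressible as a template
template ToPointers(Ts...)
{
    static if (Ts.length == 0)
        alias ToPointers = AliasSeq!();
    else
        alias ToPointers = AliasSeq!(Ts[0]*, ToPointers!(Ts[1 .. $]));
}

static assert(is(ToPointers!(int, char)[0] == int*));
static assert(is(ToPointers!(int, char)[1] == char*));

With type functions, that could presumably become an ordinary loop over the list instead.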

-Steve
August 04, 2020
On Monday, 3 August 2020 at 20:36:51 UTC, Stefan Koch wrote:
> Hello Folks,
>
> I am currently integrating the tracy profiler (https://github.com/wolfpld/tracy)
> with dmd, so that tracy can be used for instrumented profiling instead of the
> profiler in druntime.
>
>
> The current progress is here: https://github.com/dlang/dmd/compare/master...UplinkCoder:tracy?expand=1
> There are still some teething problems, but I am confident that it will be useful soon.
>
> Cheers,
>
> Stefan


I agree with Webfreak on this. Tooling is very lacking in D, and a profiler like tracy (along with its ecosystem of tools) is a very necessary addition for code optimization.

As a side note, one of the most talked-about browser dev tools is the JavaScript profiler.

August 04, 2020
On Tuesday, 4 August 2020 at 16:32:06 UTC, Steven Schveighoffer wrote:
> Part of the problem of current CTFE is that it consumes lots of memory needlessly (at least that's my recollection).

Yes, indeed, but it's worth noting that with newer versions of the compiler *some* of it is freed between instances, and that if you preallocate memory and write your CTFE code carefully enough, this can be managed very effectively with the existing old CTFE.

I'm still for improving it, of course; it's just that we can do nicer things with the existing implementation if we're careful.
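
A minimal sketch of what I mean (hypothetical example; it assumes the usual old-interpreter behavior where every append reallocates):

string buildTable()
{
    // careful version: size the buffer once and fill it in place;
    // a naive `result ~= c;` loop would allocate on every append
    auto buf = new char[](256);
    foreach (i; 0 .. 256)
        buf[i] = cast(char) i;
    return buf.idup;
}

enum table = buildTable(); // enum forces evaluation at compile time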
August 04, 2020
On Tuesday, 4 August 2020 at 16:32:06 UTC, Steven Schveighoffer wrote:
> On 8/4/20 12:01 PM, Avrina wrote:
>> On Tuesday, 4 August 2020 at 12:47:54 UTC, Steven Schveighoffer wrote:
>>> On 8/4/20 4:44 AM, Stefan Koch wrote:
>>>> newCTFE, in the end, has a lot less benefit than I first assumed.
>>>
>>> If CTFE becomes way less expensive (in CPU usage and memory usage), then the template problem becomes easier as well, as we can do more CTFE to replace templates.
>> 
>> CTFE takes 500 ms for my project. It takes a total of about 10 seconds for the frontend to do everything, without -inline. The more significant problem is definitely templates: their expansion, and how everything is processed back into an AST. This is the case even if you run CTFE: even with "newCTFE", the result is still going to have to be expanded back into an AST, which is the core problem.
>
> I have faced quite different problems (mostly with memory consumption). Part of the problem of current CTFE is that it consumes lots of memory needlessly (at least that's my recollection). That is part (but not all) of the reason we use recursive templates instead of CTFE to solve our compile-time computation problems.

If you use -lowmem and still have memory issues, then newCTFE probably won't help either, if CTFE is what's eating the memory. The reason so much memory is being used is that at times the compiler is working backwards: even when it has a representation that is smaller and closer to what the final result should be, it has to convert it back into an AST.

> I don't know whether newCTFE fixes ALL the problems or not. But it will still help.

It will very likely introduce new problems as well.

>>>> If I had had a good integrated profiler back then, and some of the code which I have access to now, I would probably never have started newCTFE, and would have tried to fix the template system itself.
>>>
>>> I still think newCTFE has worthwhile benefit, even if it alone cannot fix all the problems.
>>>
>>> I think newCTFE and type functions (along with Manu's ... DIP) would be a good combination to attack the problem.
>>>
>> 
>> It doesn't help; it just introduces a whole lot more complexity into the compiler as well. Effectively it creates a second compiler within the compiler. It is much more complicated than the current solution, and I don't imagine the speedup is going to be that much, since in my case it will only be able to reduce the build time by at most 500 ms.
>
> What is "It" that you are talking about?
>
> I imagine part of the problem here is that CTFE is avoided because it doesn't deal with types and compile-time lists very well -- templates are the only solution there. As a result, CTFE is not used in many places where it really is a natural fit.
>
> So really, if CTFE were more usable, it could replace much of the heavy template usage that is likely slowing your build down, and then an optimized CTFE becomes more relevant.
>
> -Steve

It is newCTFE. I haven't looked at type functions; maybe they can help, but the larger issue is just how the compiler is structured. They can help in some cases, but not all. Part of what takes so much memory and build time in my project is in fact the runtime. I wonder how fast the compile would be with -betterC, but I'm not going to modify my project that much to find out.

Have you looked at newCTFE, how it is being implemented, and exactly what it affects? I think that's just excessive optimism and not how it will end up working (without heavy modification, more than what is already being done in regards to newCTFE).

August 04, 2020
On Tuesday, 4 August 2020 at 16:11:38 UTC, Stefan Koch wrote:
> It's not quite that AST insertion is slow. It's the fact that you
> have to do semantic processing piece by piece, which is expensive.

Can you elaborate a bit on this statement? Is this problem specific to `dmd`'s non-lazy implementation of semantic analysis, to D, or to templated statically typed languages in general?

Further, is this problem related to the frontend only?

> If you have completely semantically processed nodes, linking them into the tree is quite painless.

What do you mean by "semantically processed nodes"?
August 04, 2020
On Tuesday, 4 August 2020 at 20:54:01 UTC, Per Nordlöw wrote:
> What do you mean by "semantically processed nodes"?

I guess I understand what you mean by "semantically processed nodes". The other question remains unanswered, though.
August 05, 2020
On Tuesday, 4 August 2020 at 20:55:11 UTC, Per Nordlöw wrote:
> On Tuesday, 4 August 2020 at 20:54:01 UTC, Per Nordlöw wrote:
>> What do you mean by "semantically processed nodes"?
>
> I guess I understand what you mean by "semantically processed nodes". The other question remains unanswered, though.

Completely processed means just that: they are finished and can simply be linked into the tree.

i.e. they know what they are.

Take the example f(a, b).
That would mean we know the type, size, location, and meaning of f, a, and b,
as well as anything f, a, and b might refer to.
Determining this can be a very expensive process;
just searching scopes upwards to resolve the meaning of a name can take very long
if you have deep nesting (this happens in recursive templates, for example).
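
To make the f(a, b) example concrete (a hypothetical sketch, just to show the shape of the lookup):

void f(int x, int y) {}

void outer()
{
    int a;
    void middle()
    {
        int b;
        void inner()
        {
            // compiling this call means walking
            // inner -> middle -> outer -> module scope to find f,
            // and resolving a and b along the way
            f(a, b);
        }
        inner();
    }
    middle();
}

With recursive templates the same walk can start dozens of scopes deep, and it happens for every name.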

> Is this problem specific to `dmd`'s non-lazy implementation of semantic analysis, to D, or to templated statically typed languages in general?

That's a tricky question. I don't know.
It is my strong belief, however, that templates (static polymorphism) as used in C++ or D
are fundamentally hard to implement efficiently and fast.
Don't quote me on that unless I turn out to be right ;p
August 05, 2020
On Wednesday, 5 August 2020 at 10:34:55 UTC, Stefan Koch wrote:
> On Tuesday, 4 August 2020 at 20:55:11 UTC, Per Nordlöw wrote:
>> On Tuesday, 4 August 2020 at 20:54:01 UTC, Per Nordlöw wrote:
>
> i.e. they know what they are.
>
> Take the example f(a, b).
> That would mean we know the type, size, location, and meaning of f, a, and b,
> as well as anything f, a, and b might refer to.
> Determining this can be a very expensive process;
> just searching scopes upwards to resolve the meaning of a name can take very long
> if you have deep nesting (this happens in recursive templates, for example).

How much of a part do non-templated nested functions/classes/structs play in this?

And is it more about the scope where they are called or where they are defined in code?
August 05, 2020
On Wednesday, 5 August 2020 at 10:44:24 UTC, aberba wrote:
> On Wednesday, 5 August 2020 at 10:34:55 UTC, Stefan Koch wrote:
>> On Tuesday, 4 August 2020 at 20:55:11 UTC, Per Nordlöw wrote:
>>> On Tuesday, 4 August 2020 at 20:54:01 UTC, Per Nordlöw wrote:
>>
>> i.e. they know what they are.
>>
>> Take the example f(a, b).
>> That would mean we know the type, size, location, and meaning of f, a, and b,
>> as well as anything f, a, and b might refer to.
>> Determining this can be a very expensive process;
>> just searching scopes upwards to resolve the meaning of a name can take very long
>> if you have deep nesting (this happens in recursive templates, for example).
>
> How much of a part do non-templated nested functions/classes/structs play in this?
>
> And is it more about the scope where they are called or where they are defined in code?

It's all about the point of definition.
I doubt regular nested functions/aggregates ever have a nesting level over 20, which is when this stuff starts to matter.
August 05, 2020
On Tuesday, 4 August 2020 at 09:09:41 UTC, WebFreak001 wrote:
> I'm curious though, why does this need to be a compiler change instead of a library addition?

Ah sorry, I missed this question.

The interface druntime has to the profiler does not expose all the information tracy needs.
Tracy is a realtime frame profiler: you can attach the tracy frontend to your running program and see which functions currently take how long to execute. For that, tracy needs location information to be provided with each measurement.
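
Roughly (a sketch with names approximated from tracy's public C header, TracyC.h -- not the actual dmd patch), every zone begin/end pair carries a source-location record like this:

struct TracyCZoneCtx
{
    uint id;
    int active;
}

struct SourceLocationData // ___tracy_source_location_data in the C header
{
    const(char)* name;
    const(char)* func; // called "function" in C; that's a keyword in D
    const(char)* file;
    uint line;
    uint color;
}

extern (C) TracyCZoneCtx ___tracy_emit_zone_begin(
    const(SourceLocationData)* srcloc, int active);
extern (C) void ___tracy_emit_zone_end(TracyCZoneCtx ctx);

The druntime profiler hooks have no slot for that record, which is why this has to be done in the compiler.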