DMD as a library - recap, next steps
June 16, 2020
A few years ago, I worked on a project to refactor the dmd codebase so that it becomes easier to be used as a library. This has the advantage that tools that use dmd-as-a-lib will rely on the latest working compiler. From that perspective, I made several PRs:

1. Template the parser to remove reliance on modules that implement semantic analysis: https://github.com/dlang/dmd/pull/6625

2. Create ASTBase, an AST family that contains the minimum information to separate the parser from the rest of the compiler: https://github.com/dlang/dmd/pull/6836

3. A series of PRs to pull out all the semantic methods from AST nodes into visitors or free functions so that AST nodes in the compiler will replace the ASTBase family (which is essentially duplicated code):
    https://github.com/dlang/dmd/pull/7031
    https://github.com/dlang/dmd/pull/7048
    https://github.com/dlang/dmd/pull/7049
    https://github.com/dlang/dmd/pull/7114
    https://github.com/dlang/dmd/pull/7119
    https://github.com/dlang/dmd/pull/7122

4. I created visitors for semantic time analysis: https://github.com/dlang/dmd/pull/7411

At that point I hit some bugs that prevented me from moving forward. I started fixing them, and after a while I moved to another project, which left the dmd-as-a-library project in a half-baked state.

What remains to be done is to:

- Further strip the AST nodes of functions that require semantic analysis so that we can remove the code duplication in ASTBase
- Refactor dmd to offer a decent interface. Sometimes people argue against this point by saying that some changes affect the compilation time of the compiler. Honestly, it takes 5 seconds to compile dmd, even if we double that time I say it is worth it if we gain a decent interface for dmd-as-a-lib. On this point, me and Edi Staniloiu have been working with a bachelor student to see what interface is required to be able to use dmd-as-a-lib in tools in the ecosystem (like DCD).
- Add some visitors that make it easy for 3rd party tools to use compiler features.

In the meantime there were some PRs that regressed the state of dmd as a lib, PRs such as this one: https://github.com/dlang/dmd/pull/9010 . That PR makes it impossible for someone to override the type semantic. This moves things in the exact opposite direction of every PR showcased in this post and is a showstopper for moving https://github.com/dlang/dmd/pull/11265 forward in a consistent way.

Me and Edi are in the position where we can use bachelor students to do the heavy-lifting on helping this project to cross the finish line, however, it is critical that the Dlang Foundation leadership has a clear direction/vision on this.

So, how do we move forward?

Cheers,
RazvanN
June 16, 2020
On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
> A few years ago, I worked on a project to refactor the dmd codebase so that it becomes easier to be used as a library. This has the advantage that tools that use dmd-as-a-lib will rely on the latest working compiler.

Nice progress so far, but I would also like to note that there was an annoying change some time last winter that put the whole compiler configuration into private functions.

I know this is probably better addressed in dub instead, but it is just way less maintained than dmd.

Simply put, dmd-as-a-library can be incredibly useful for a custom code transformation step by serving as a dmd proxy. However, dub's compiler probing is too strict and relies on CTFE introspection results printed to standard output, and the change I mentioned currently forces one to copy-paste ~500 lines of option parsing and configuration code so that dub recognizes the proxy as an actual compiler.
(Last time I checked was in March.)

> On this point, me and Edi Staniloiu have been working with a bachelor student to see what interface is required to be able to use dmd-as-a-lib in tools in the ecosystem (like DCD).
> - Add some visitors that make it easy for 3rd party tools to use compiler features.
>

This would be awesome; it could even end the "IDE support sucks" complaints.
If dmd were able to recompile in memory on code updates and provide compilation database updates in under 500ms, it would be usable enough for LSP servers and other productivity tooling, while actually supporting every language feature, including template instance body inspection and UFCS.


June 16, 2020
On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:

> - Further strip the AST nodes of functions that require semantic analysis so that we can remove the code duplication in ASTBase
The AST without SemA is useless.


> - Refactor dmd to offer a decent interface. Sometimes people argue against this point by saying that some changes affect the compilation time of the compiler. Honestly, it takes 5 seconds to compile dmd, even if we double that time I say it is worth it if we gain a decent interface for dmd-as-a-lib.

No, if we double that time, it's a massive change to the rapid
development we can have right now.
Try compiling DMD with LDC and you'll see what I mean.

> In the meantime there were some PRs that regressed the state of dmd as a lib, PRs such as this one: https://github.com/dlang/dmd/pull/9010 . That PR makes it impossible for someone to override the type semantic.

Why would you ever need or want to OVERRIDE semantic?
If you override semantics then by definition you are no longer
in the same language space.

> Me and Edi are in the position where we can use bachelor students to do the heavy-lifting on helping this project to cross the finish line, however, it is critical that the Dlang Foundation leadership has a clear direction/vision on this.
>
> So, how do we move forward?

First establish a use case.
I would say that the ability to add custom static-analysis passes should be at the forefront, since that is what I would want a compiler as a library to do.

For example, I want to be able to ask:

For all functions in module a, give me the ones which call a function called malloc,
either directly or transitively, as far as you can see by the source code I gave you.
For all functions which only exist as declarations (you don't have the body), create a list and cross-reference it with the call graph of the selection we've got before.

For that to work you need to be able to run the compiler until just before code generation, and you need to be able to walk that type/identifier-resolved tree in a useful manner.

Some kind of C plugin api would be preferred.
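
The query described above can be sketched roughly as follows. This is only an illustration in Python, not dmd's API: the call graph is a hand-written dictionary standing in for whatever a compiler-as-a-library would extract from the resolved tree, and every name in it (`CALLS`, `callers_of`, `opaque_callees`) is hypothetical.

```python
from collections import deque

# Hypothetical call graph: caller -> set of callees. A real tool would build
# this from the compiler's type/identifier-resolved tree; here it is hand-written.
CALLS = {
    "a.readFile": {"a.allocBuffer", "c.fread"},
    "a.allocBuffer": {"malloc"},
    "a.sum": set(),
    "a.log": {"ext.write"},
}
# Functions whose body we have. Anything called but absent here is declaration-only.
DEFINED = set(CALLS)


def reaches(start, target, calls):
    """Return True if `start` calls `target` directly or transitively."""
    seen, queue = {start}, deque([start])
    while queue:
        for callee in calls.get(queue.popleft(), ()):
            if callee == target:
                return True
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


def callers_of(target, module_prefix, calls):
    """Functions of one module that reach `target`, as far as the bodies show."""
    return sorted(f for f in calls
                  if f.startswith(module_prefix) and reaches(f, target, calls))


def opaque_callees(functions, calls, defined):
    """Declaration-only functions reachable from any of `functions`."""
    out = set()
    for f in functions:
        seen, stack = {f}, [f]
        while stack:
            for callee in calls.get(stack.pop(), ()):
                if callee not in seen:
                    seen.add(callee)
                    stack.append(callee)
                    if callee not in defined:
                        out.add(callee)
    return sorted(out)
```

With this toy graph, `callers_of("malloc", "a.", CALLS)` selects `a.allocBuffer` and `a.readFile`, and `opaque_callees` on that selection lists the body-less functions (`c.fread`, `malloc`) that the analysis cannot see into.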

June 16, 2020
On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
> On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
>
>> - Further strip the AST nodes of functions that require semantic analysis so that we can remove the code duplication in ASTBase
> The AST without SemA is useless.

It is useful for serialization and source generation purposes at least. In other language communities, where they don't have the metaprogramming power of D, they do extensive source code generation, and most of the time you don't need much semantic analysis for that (sometimes you just need a way to do string interpolation on symbol names).

Also perhaps one of the goals is to completely replace the parser with something more fault-tolerant. For example see:

https://unallocated.com/blog/incremental-packrat-parsing-the-secret-to-fast-language-servers/

https://github.com/tree-sitter/tree-sitter
https://github.com/tree-sitter/tree-sitter/blob/master/docs/index.md

^^^
I suggest you watch the talks linked from the last page.

>
>> - Refactor dmd to offer a decent interface. Sometimes people argue against this point by saying that some changes affect the compilation time of the compiler. Honestly, it takes 5 seconds to compile dmd, even if we double that time I say it is worth it if we gain a decent interface for dmd-as-a-lib.
>
> No, if we double that time, it's a massive change to the rapid
> development we can have right now.
> Try compiling DMD with LDC and you'll see what I mean.

I agree. Though if a refactoring of this sort increases the compile-time by 2x we have serious problems elsewhere. I don't expect something like this to cost more than 1.15x in the worst case. And even then we should be able to find many ways to further decrease the compile-time to e.g. 0.85x.

>
>> In the meantime there were some PRs that regressed the state of dmd as a lib, PRs such as this one: https://github.com/dlang/dmd/pull/9010 . That PR makes it impossible for someone to override the type semantic.
>
> Why would you ever need or want to OVERRIDE semantic?
> If you override semantics then by definition you are no longer
> in the same language space.

I suggest you try using a language with an excellent implementation of an LSP (language server) like C# or TypeScript. You'd be amazed how well it works. It is able to make sense of all kinds of broken code (e.g. giving you auto-completion for a function with a missing closing curly brace). Most of these things are completely impossible to do with the current rigid nature of the dmd frontend.

Of course Razvan, Edi and Cristian are in a better position to answer, but I think that the main idea is that they don't want to change the language, but instead they want to be able to plug code in more parts of the compilation pipeline so the LSP can be notified when e.g. the compiler is visiting an overload set. For example, the compiler may stop looking when it finds the best overload match, while for the LSP you want to display overloads, as the user may have made a typo and so on. (Please don't read too much into this example, I may have made a mistake, but the general idea is that they need to be able to extract more info from the frontend.)

>> Me and Edi are in the position where we can use bachelor students to do the heavy-lifting on helping this project to cross the finish line, however, it is critical that the Dlang Foundation leadership has a clear direction/vision on this.
>>
>> So, how do we move forward?
>
> First establish a use case.
> I would say that the ability to add custom static-analysis passes should be at the forefront, since that is what I would want a compiler as a library to do.

The use case is implementing all of this API:

https://microsoft.github.io/language-server-protocol/specifications/specification-current/

Specifically, take a look at the "Language Features" section, e.g.:
https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocument_completion

> For example, I want to be able to ask:
>
> For all functions in module a, give me the ones which call a function called malloc,
> either directly or transitively, as far as you can see by the source code I gave you.
> For all functions which only exist as declarations (you don't have the body), create a list and cross-reference it with the call graph of the selection we've got before.
>
> For that to work you need to be able to run the compiler until just before code generation, and you need to be able to walk that type/identifier-resolved tree in a useful manner.
>
> Some kind of C plugin api would be preferred.

The set of C developers that want to write a D language server is much smaller than the set of D developers that want to do the same :D

But I agree on the general point that a well-defined and versioned API is much needed, just like it sucks when frontend changes break LDC and GDC.
June 16, 2020
On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
> On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
>
>> - Further strip the AST nodes of functions that require semantic analysis so that we can remove the code duplication in ASTBase
> The AST without SemA is useless.
>
It is not useless; the fact that libdparse exists and is used as a standalone library is proof of that.
>
>> - Refactor dmd to offer a decent interface. Sometimes people argue against this point by saying that some changes affect the compilation time of the compiler. Honestly, it takes 5 seconds to compile dmd, even if we double that time I say it is worth it if we gain a decent interface for dmd-as-a-lib.
>
> No, if we double that time, it's a massive change to the rapid
> development we can have right now.
> Try compiling DMD with LDC and you'll see what I mean.
>
The point here was that minor performance regressions that come out of refactorings should not be an obstacle if the refactorings offer a clear benefit from a dmd-as-a-lib standpoint.

>> In the meantime there were some PRs that regressed the state of dmd as a lib, PRs such as this one: https://github.com/dlang/dmd/pull/9010 . That PR makes it impossible for someone to override the type semantic.
>
> Why would you ever need or want to OVERRIDE semantic?
> If you override semantics then by definition you are no longer
> in the same language space.
>
>> Me and Edi are in the position where we can use bachelor students to do the heavy-lifting on helping this project to cross the finish line, however, it is critical that the Dlang Foundation leadership has a clear direction/vision on this.
>>
>> So, how do we move forward?
>
> First establish a use case.
> I would say that the ability to add custom static-analysis passes should be at the forefront, since that is what I would want a compiler as a library to do.
>
> For example, I want to be able to ask:
>
> For all functions in module a, give me the ones which call a function called malloc,
> either directly or transitively, as far as you can see by the source code I gave you.
> For all functions which only exist as declarations (you don't have the body), create a list and cross-reference it with the call graph of the selection we've got before.
>
> For that to work you need to be able to run the compiler until just before code generation, and you need to be able to walk that type/identifier-resolved tree in a useful manner.
>
Analyzing the AST is one scenario, but there are other situations:

1. You want to extend the language with some feature.
2. Semantic analysis mutates the AST in a way that makes it impossible for you to reason about what was there in the first place. One example here is an auto-complete tool that needs to be able to analyze incomplete code; if you do not override specific semantic methods, by the time you analyze the AST you will have error nodes that prevent you from doing any work.
3. You want to drop certain semantic passes because your tool does not need them.

Ideally we would offer maximum flexibility with dmd-as-a-lib. Currently, you are forced to run the full semantic analysis pass and hope for the best.
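
To illustrate the second scenario, here is a minimal sketch in Python (not D, and not dmd's actual classes): a stand-in semantic pass turns an unresolvable call into an error node, and a tool-side subclass overrides just that step so the node survives and the candidates can be reported. `Semantic`, `CompletionSemantic`, `OVERLOADS` and the node types are all hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Call:
    name: str   # callee name as written in the (possibly incomplete) source
    nargs: int  # number of arguments typed so far


@dataclass
class Error:
    message: str


# Hypothetical symbol table: function name -> accepted argument counts.
OVERLOADS = {"write": [1, 2], "writeln": [0, 1]}


class Semantic:
    """Stand-in for a compiler pass: an unresolved call becomes an error node."""

    def call(self, node):
        if node.nargs in OVERLOADS.get(node.name, []):
            return node
        return self.unresolved(node)

    def unresolved(self, node):
        return Error(f"no overload of {node.name} takes {node.nargs} argument(s)")


class CompletionSemantic(Semantic):
    """Tool-side override: keep the call node and record possible candidates."""

    def __init__(self):
        self.candidates = []

    def unresolved(self, node):
        # Collect every overload whose name starts with what was typed so far.
        self.candidates = [(name, n)
                           for name, counts in sorted(OVERLOADS.items())
                           if name.startswith(node.name)
                           for n in counts]
        return node  # the original node is preserved instead of an Error
```

Running the default pass on `Call("wri", 0)` yields an `Error`, whereas `CompletionSemantic` returns the call untouched and leaves the matching overloads in `candidates`. Only the failure path is replaced; the rest of the pass is reused as is.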

> Some kind of C plugin api would be preferred.

I don't understand what you are referring to.


June 16, 2020
On Tuesday, 16 June 2020 at 09:00:55 UTC, Petar Kirov [ZombineDev] wrote:
> On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
>> On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
>>
>>> - Further strip the AST nodes of functions that require semantic analysis so that we can remove the code duplication in ASTBase
>> The AST without SemA is useless.

BTW, I really don't get this obsession with bad OOP design in DMD. The AST classes should be just pure data. SemA should be functions that operate on this data. I can't find any good reason why one would put the SemA *implementation* inside the AST classes. Just imagine if the constructor of the FunctionDeclaration directly outputted x86 assembly :D

So this is why I think all logic should be moved from the AST classes to other functions/classes/modules.
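
A rough sketch of that separation, written in Python purely for illustration (the node types and the `type_of` function are invented for this example and have nothing to do with dmd's real AST): the nodes carry only data, and the analysis lives in a free function dispatched on the node type.

```python
from dataclasses import dataclass
from functools import singledispatch


# Pure-data AST nodes: no analysis logic lives in these classes.
@dataclass(frozen=True)
class IntLit:
    value: int


@dataclass(frozen=True)
class StrLit:
    value: str


@dataclass(frozen=True)
class Add:
    lhs: object
    rhs: object


# The "semantic analysis" is a separate function that dispatches on node type.
@singledispatch
def type_of(node):
    raise TypeError(f"unhandled node: {node!r}")


@type_of.register
def _(node: IntLit):
    return "int"


@type_of.register
def _(node: StrLit):
    return "string"


@type_of.register
def _(node: Add):
    lhs, rhs = type_of(node.lhs), type_of(node.rhs)
    if lhs != rhs:
        raise TypeError(f"cannot add {lhs} and {rhs}")
    return lhs
```

A tool that only needs the tree can import the node classes alone, and a tool that wants different analysis rules can supply its own function without touching the nodes.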



June 16, 2020
On Tuesday, 16 June 2020 at 09:00:55 UTC, Petar Kirov [ZombineDev] wrote:
> On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:
>> [...]

> It is useful for serialization and source generation purposes at least. In other language communities, where they don't have the metaprogramming power of D, they do extensive source code generation, and most of the time you don't need much semantic analysis for that (sometimes you just need a way to do string interpolation on symbol names).
>

Indeed!

> Also perhaps one of the goals is to completely replace the parser with something more fault-tolerant. For example see:
>
> https://unallocated.com/blog/incremental-packrat-parsing-the-secret-to-fast-language-servers/
>
> https://github.com/tree-sitter/tree-sitter
> https://github.com/tree-sitter/tree-sitter/blob/master/docs/index.md
>
> ^^^
> I suggest you watch the talks linked from the last page.
>

That is exactly what I had in mind.

>> [...]
>
> I agree. Though if a refactoring of this sort increases the compile-time by 2x we have serious problems elsewhere. I don't expect something like this to cost more than 1.15x in the worst case. And even then we should be able to find many ways to further decrease the compile-time to e.g. 0.85x.
>

You are correct. I was exaggerating for the sake of the argument. What I meant was: we are extremely fast, but we are lacking a compiler interface. Small performance regressions are acceptable if they offer a guaranteed benefit with regards to defining a good interface.

>
> I suggest you try using a language with an excellent implementation of an LSP (language server) like C# or TypeScript. You'd be amazed how well it works. It is able to make sense of all kinds of broken code (e.g. giving you auto-completion for a function with a missing closing curly brace). Most of these things are completely impossible to do with the current rigid nature of the dmd frontend.
>
> Of course Razvan, Edi and Cristian are in a better position to answer, but I think that the main idea is that they don't want to change the language, but instead they want to be able to plug code in more parts of the compilation pipeline so the LSP can be notified when e.g. the compiler is visiting an overload set. For example, the compiler may stop looking when it finds the best overload match, while for the LSP you want to display overloads, as the user may have made a typo and so on. (Please don't read too much into this example, I may have made a mistake, but the general idea is that they need to be able to extract more info from the frontend.)
>

You are entirely right. We have replaced all uses of libdparse and
other tools that mimic semantic analysis in DCD with dmd as a lib and
it works, but it requires that we override current semantic analysis that
is done for CallExps to be able to cope with semantic failures on incomplete code.

>> [...]
>
> The use case is implementing all of this API:
>
> https://microsoft.github.io/language-server-protocol/specifications/specification-current/
>
> Specifically, take a look at the "Language Features" section, e.g.:
> https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocument_completion
>
>> [...]
>
> The set of C developers that want to write a D language server is much smaller than the set of D developers that want to do the same :D
>
> But I agree on the general point that a well-defined and versioned API is much needed, just like it sucks when frontend changes break LDC and GDC.

Thanks for your reply. I feel that we are on the same page on this.
June 16, 2020
On Tuesday, 16 June 2020 at 09:15:24 UTC, RazvanN wrote:
>
> Thanks for your reply. I feel that we are on the same page on this.

My pleasure, please keep up the good work on this amazing project!
June 16, 2020
On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
> A few years ago, I worked on a project to refactor the dmd codebase so that it becomes easier to be used as a library. This has the advantage that tools that use dmd-as-a-lib will rely on the latest working compiler. From that perspective, I made several PRs:
>
> [...]
>
> Me and Edi are in the position where we can use bachelor students to do the heavy-lifting on helping this project to cross the finish line, however, it is critical that the Dlang Foundation leadership has a clear direction/vision on this.
>
> So, how do we move forward?
>
> Cheers,
> RazvanN

Very cool, thanks for all the work! I think, however, that increasing the compilation time significantly is a major no-go, at least when it affects all code, except of course if it allows us to do faster things like incremental compilation.

For architecture ideas you might want to check out how Microsoft implemented their Roslyn compiler platform: https://github.com/dotnet/roslyn/wiki/Roslyn-Overview

Just AST and visitors is not the most useful on its own, we have libdparse for this already which works fine too. Much more interesting is the semantic analysis.

If there is one thing I would want exposed by dmd for anything, be it completion, dynamic linting, navigation, etc., I would really, really want a symbol API. Much like dsymbol: one incremental database of all defined symbols (modules, types, aliases, parameters, template parameters, variables, etc.) with references, definitions, types (of variables and parameters), names and all traits information. This database would contain all symbols in the entire compilation unit, be aware of scopes at any given point and be able to update incrementally when files are added, removed or changed.

The semantic analysis needs to be incremental here too though, so symbols would need some kind of dependency graph for things using mixin or templates. Also it would be difficult to extract information from scopes like `version (Foo)` where version is not Foo.

But just this symbols database and some APIs to query it per location, per file or per symbol name would be enough to implement nearly all features a user would expect from tooling and more.
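
As a very rough picture of the query surface meant here, the following Python sketch keeps symbols per file so that one file can be re-indexed without touching the others, and answers lookups by location, by file and by name. It is only an assumption about what such an API could look like; `Symbol`, `SymbolIndex` and their fields are hypothetical, and real scope handling, references and dependency tracking for mixins and templates are left out entirely.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    kind: str   # e.g. "function", "variable", "module"
    file: str
    start: int  # first line covered by the symbol (inclusive)
    end: int    # last line covered by the symbol (inclusive)


class SymbolIndex:
    def __init__(self):
        self._by_file = defaultdict(list)

    def update_file(self, file, symbols):
        """Replace all symbols of one file; other files are left untouched."""
        self._by_file[file] = list(symbols)

    def remove_file(self, file):
        self._by_file.pop(file, None)

    def in_file(self, file):
        return list(self._by_file.get(file, []))

    def by_name(self, name):
        return [s for syms in self._by_file.values() for s in syms
                if s.name == name]

    def at(self, file, line):
        """Innermost symbol whose extent covers `line`, or None."""
        hits = [s for s in self._by_file.get(file, [])
                if s.start <= line <= s.end]
        return min(hits, key=lambda s: s.end - s.start, default=None)
```

The per-file bucket is the simplest form of incremental update: when an editor reports a changed file, only that bucket is rebuilt.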

Otherwise raw token access and AST visitors are all you really need to implement the rest, like formatting, highlighting, static linting and refactorings. It's important that you can somehow recover the whitespace and comments from tokens for refactoring and formatting though!
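
The point about whitespace and comments can be shown with a toy lossless tokenizer. This is a Python sketch with a deliberately simplistic, made-up token grammar (it is not D's lexer): every character ends up in some token, so joining the tokens reproduces the source exactly, and a rename touches identifiers only.

```python
import re

# Toy grammar: line comments, whitespace, words, and any other single character.
# Nothing is discarded, so the token stream is a lossless view of the source.
TOKEN = re.compile(
    r"(?P<comment>//[^\n]*)|(?P<space>\s+)|(?P<word>\w+)|(?P<punct>.)",
    re.DOTALL,
)


def tokenize(src):
    """Split `src` into (kind, text) pairs that cover every character."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(src)]


def rename(tokens, old, new):
    """Rename an identifier, leaving comments and whitespace untouched."""
    return [(kind, new if kind == "word" and text == old else text)
            for kind, text in tokens]


def render(tokens):
    return "".join(text for _, text in tokens)
```

Because comments and whitespace are ordinary tokens, a refactoring such as `render(rename(tokenize(src), "foo", "bar"))` keeps the original layout and does not rewrite text inside comments.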

For other use-cases, like a REPL, exposing APIs to the executable generator would also be cool.

So if we have a symbols API I'm happy and I think that will be the goal of any DCD replacement program too :p

Keep up the good work on this!
June 16, 2020
On Tuesday, 16 June 2020 at 08:16:25 UTC, Stefan Koch wrote:

> Why would you ever need or want to OVERRIDE semantic?
> If you override semantics then by definition you are no longer
> in the same language space.

It's useful to be able to do. It's not up to the compiler developers to come up with every single use case. And just because they cannot come up with a use case doesn't mean there aren't any good ones. If a language or a library could only be used for what the creator could think of, it would probably not be very useful at all.

I have a tool that has made some minor changes to the semantic phase to allow to infer attributes for all functions. Then it outputs all the attributes that can be attached to all functions of a given file.

--
/Jacob Carlborg
