June 16, 2020
On Tuesday, 16 June 2020 at 10:48:13 UTC, WebFreak001 wrote:

>
> very cool, thanks for all the work! I think however increasing the compilation time significantly is a major no-go, at least when it affects all code, except of course if it allows us to do faster things like incremental compilation.
>
Incremental compilation is out of the question for dmd as a library. To support it, we would need to write an implementation from scratch that takes care of all the various cases.

> For architecture ideas you might want to check out how Microsoft implemented their Roslyn compiler platform: https://github.com/dotnet/roslyn/wiki/Roslyn-Overview
>
> Just AST and visitors is not the most useful on its own, we have libdparse for this already which works fine too. Much more interesting is the semantic analysis.
>
But the whole point of dmd as a library is to make semantic analysis available through semantic visitors. For example, this PR
https://github.com/dlang/dmd/pull/11265 makes it possible to inherit the semantic visitors used in the compiler and override or extend their functionality.
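To sketch what that could look like for a tool (the class, method, and helper names below are illustrative only, not the exact API the PR exposes):

```d
// Hypothetical sketch: a tool inherits the compiler's semantic visitor
// and overrides only the node kind it cares about.
class CallSiteCollector : ExpressionSemanticVisitor
{
    override void visit(CallExp e)
    {
        // custom analysis before the regular semantic pass
        recordCallSite(e.loc); // hypothetical helper in the tool
        // then fall back to the compiler's own implementation
        super.visit(e);
    }
}
```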

> If there is one thing I would want exposed by dmd for anything, be it completion, dynamic linting, navigation, etc., I would really really want a symbol API. Much like dsymbol, an incremental database of all defined symbols (modules, types, aliases, parameters, template parameters, variables, etc.) with references, definitions, types (of variables and parameters), names and all traits information. This database would contain all symbols in the entire compilation unit, be aware of scopes at any given point and be able to incrementally update by adding/removing or changing files.
>

One step in that direction: https://github.com/dlang/dmd/pull/11092
With that, you can provide a function that pulls out all the symbols in a particular scope. It is not incremental, though.

> The semantic analysis needs to be incremental here too though, so symbols would need some kind of dependency graph for things using mixin or templates. Also it would be difficult to extract information from scopes like `version (Foo)` where version is not Foo.
>
> But just this symbols database and some APIs to query it per location, per file or per symbol name would be enough to implement nearly all features a user would expect from tooling and more.
>
> Otherwise raw token access and AST visitors is all you really need to implement the rest like formatting, highlighting, static linting and refactorings. It's important that you can somehow recover the whitespaces and comments from tokens for refactoring and formatting though!
>
> For other use-cases, like a REPL, exposing APIs to the executable generator would also be cool.
>
> So if we have a symbols API I'm happy and I think that will be the goal of any DCD replacement program too :p
>
We are working on this and will soon publish that work.

> Keep up the good work on this!


June 16, 2020
On Tuesday, 16 June 2020 at 09:00:55 UTC, Petar Kirov [ZombineDev] wrote:

> Also, perhaps one of the goals is to completely replace the parser with something more fault-tolerant. For example see:
>
> https://unallocated.com/blog/incremental-packrat-parsing-the-secret-to-fast-language-servers/
>
> https://github.com/tree-sitter/tree-sitter
> https://github.com/tree-sitter/tree-sitter/blob/master/docs/index.md
>
> ^^^
> I suggest you watch the talks linked from the last page.

I agree. The Eclipse Java compiler has quite a few "modes" in which you can run it: only run the parser, or run various levels of semantic analysis. It even allows you to build and run code that does not fully compile. For example, if the signature of a function compiles but its body does not, the compiler can replace the body with code that throws an exception. If that function is never called at runtime, everything works fine.
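To make the idea concrete, here is a minimal sketch in D of what such a compiler effectively emits (Eclipse does the equivalent for Java; the error message is made up):

```d
// Only the body of `broken` failed to compile, so a fault-tolerant
// compiler can emit the function as if it had been written like this:
int broken(int x)
{
    throw new Error("Unresolved compilation problem in broken()");
}

void main()
{
    // Runs fine, as long as `broken` is never actually called.
}
```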

--
/Jacob Carlborg


June 16, 2020
On Tuesday, 16 June 2020 at 09:07:19 UTC, RazvanN wrote:

> 2. Semantic analysis mutates the AST in a way that makes it impossible for you to reason about what was there in the first place.

Ideally the compiler should be modified to preserve all information through all phases of the compilation.

In my tool I had to put in quite a bit of extra effort to preserve the information I needed: basically, walking the AST twice, once before running the semantic analyzer and once after.
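As a rough sketch of that workflow (the function and visitor names are illustrative; dmd's actual frontend API may differ between versions):

```d
// Hypothetical sketch: walk the AST twice and correlate the results.
auto mod = parseModule("app.d");        // parse only, no semantic yet
mod.accept(new PreSemanticCollector);   // walk 1: the original AST
runFullSemantic(mod);                   // semantic analysis mutates the AST
mod.accept(new PostSemanticCollector);  // walk 2: the lowered AST
```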

--
/Jacob Carlborg
June 16, 2020
On Tuesday, 16 June 2020 at 04:13:10 UTC, RazvanN wrote:
> A few years ago, I worked on a project to refactor the dmd codebase so that it becomes easier to use as a library. This has the advantage that tools which use dmd-as-a-lib will rely on the latest working compiler. From that perspective, I made several PRs:
>
> What remains to be done is to:
>- Refactor dmd to offer a decent interface. Sometimes people
> argue against this point by saying that some changes affect the compilation time of the compiler. Honestly, it takes 5 seconds to compile dmd; even if we double that time, I say it is worth it if we gain a decent interface for dmd-as-a-lib. On this point, Edi Staniloiu and I have been working with a bachelor student to see what interface is required to be able to use dmd-as-a-lib in tools in the ecosystem (like DCD).
> - Add some visitors that make it easy for 3rd party tools to use compiler features.

I don't think the API is the most important part, as long as you can do what you need to do with the library. In my opinion there's a lot of functionality that is missing. For example, I've tried to use DMD as a library to do source code transformation. It falls very short in this area:

* AST nodes without locations
* Locations don't contain an end point/length
* Locations don't contain the buffer offset
* Indirect files are always read from disk. There's no option to make a full compilation purely from memory

> Edi and I are in a position where we can have bachelor students do the heavy lifting of helping this project cross the finish line; however, it is critical that the Dlang Foundation leadership has a clear direction/vision on this.

I agree, you need full buy-in from the leadership and the compiler developers. I think this will be very difficult.

> So, how do we move forward?

I think the way forward is to fork DMD to allow you to make the necessary changes as you see fit without having to bother with discussion and politics.

If you're lucky, you can merge changes from upstream easily. If you can't sync with upstream easily, you can treat it as a separate compiler and evolve it on its own.

I've already done this [1], if you're interested in collaborating.

[1] https://github.com/jacob-carlborg/ddc

--
/Jacob Carlborg
June 17, 2020
On Tuesday, 16 June 2020 at 11:34:43 UTC, Jacob Carlborg wrote:
> On Tuesday, 16 June 2020 at 09:07:19 UTC, RazvanN wrote:
>
>> 2. Semantic analysis mutates the AST in a way that makes it impossible for you to reason about what was there in the first place.
>
> Ideally the compiler should be modified to preserve all information through all phases of the compilation.
>
> In my tool I had to make quite a bit of extra effort to preserve the information I needed. Basically walking the AST twice, once before running the semantic analyzer and once after.
>
> --
> /Jacob Carlborg

Would it have been easier if you had the ability to override certain portions of the semantic analysis? What we are trying to push forward now is the ability to extend the semantic visitor and override/extend functionality as you wish. However, since some nodes have a lot of semantic code (CallExp alone has ~1000 lines), you would have to copy-paste a lot of code just to modify the part that interests you.
The advantage is that you perform semantic analysis only once.
June 17, 2020
On Tuesday, 16 June 2020 at 12:05:11 UTC, Jacob Carlborg wrote:

> I don't think the API is the most important part, as long as you can do what you need to do with the library. In my opinion there's a lot of functionality that is missing. For example, I've tried to use DMD as a library to do source code transformation. It falls very short in this area:
>
> * AST nodes without locations

This typically happens when the compiler rewrites segments of the AST. It creates new nodes but doesn't bother with locations, since that code is never meant to be seen by the user.

> * Locations don't contain an end point/length
> * Locations don't contain the buffer offset
> * Indirect files are always read from disk. There's no option to make a full compilation purely from memory
>
These are all valid points. I mostly thought of dmd as a library as a way to analyze the AST and output relevant information (e.g. for DCD), not as a tool to modify source code; however, I was expecting that the hdrgen visitor would help with that.
>
> I agree, you need full buy-in from the leadership and the compiler developers. I think this will be very difficult.
>

Currently, the dmd-as-a-library project is in a state of limbo. We all agree that it needs to be pushed forward, but we don't know exactly how. This should be a good start for discussions, I guess.

>> So, how do we move forward?
>
> I think the way forward is to fork DMD to allow you to make the necessary changes as you see fit without having to bother with discussion and politics.
>
> If you're lucky, you can merge changes from upstream easily. If you can't sync with upstream easily, you can treat it as a separate compiler and evolve it on its own.
>
> I've already done this, if you're interested in collaborating [1].
>

I am still hoping that we can make things work in the main compiler, but if that doesn't pan out then yes, collaborating on your fork is definitely the best alternative.

> [1] https://github.com/jacob-carlborg/ddc
>
> --
> /Jacob Carlborg


June 17, 2020
On 2020-06-17 06:58, RazvanN wrote:

> Would it have been easier if you had the ability to override certain portions of the semantic analysis?

Yes, definitely.

> What we are trying to push forward now is the ability to extend the semantic visitor and override/extend functionality as you wish. However, since some nodes have a lot of semantic code (CallExp alone has ~1000 lines), you would have to copy-paste a lot of code just to modify the part that interests you.
> The advantage is that you perform semantic analysis only once.

If it's possible to design the interface so that customization points can be added both before and after the original semantic implementation, with access to the original implementation, that would be a good start. It would suffice for my needs in the current tool.

Inheritance is a good example of this:

class Foo
{
    void foo() {}
}

class Bar : Foo
{
    override void foo()
    {
        // new code
        super.foo(); // call original implementation
        // new code
    }
}

The API doesn't need to use inheritance, but it's a good example that shows it is possible to add new code both before and after the original implementation. And you don't need to invoke the original implementation at all if you don't want to.
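For instance, a non-inheritance variant of the same before/after idea could be a pair of hook delegates wrapped around the original pass (a sketch only; all names are hypothetical, not a proposed dmd API):

```d
// The same customization points expressed with delegates.
struct SemanticHook
{
    void delegate(Expression e) before; // runs before the original pass
    void delegate(Expression e) after;  // runs after the original pass
}

void hookedSemantic(Expression e, SemanticHook hook)
{
    if (hook.before !is null) hook.before(e);
    originalSemantic(e); // the compiler's unmodified implementation
    if (hook.after !is null) hook.after(e);
}
```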

-- 
/Jacob Carlborg
June 17, 2020
On 2020-06-17 07:08, RazvanN wrote:

> This typically happens when the compiler rewrites segments of the AST. It creates nodes, but doesn't bother with location since that code is not meant to be seen by any user.

I wasn't referring to those cases. There are other cases I was thinking of:

@("bar")
void foo();

In the above code, the first location will point to, IIRC, the quote symbol (") and not the at sign (@). This is before running any semantic analysis.

> There are all valid points. I mostly thought about dmd as a lib as a way to analyze the AST and output relevant information (e.g. DCD), not as a tool to modify source code, however,

I don't know how far you've come with this. I'm certainly not an expert on the subject, and I've only glanced at the LSP specification. But if the compiler cannot compile all files from memory, the LSP server needs to either store the source code in temporary files or read the files directly from the project directory; the latter is what DCD does. In that case all files, except perhaps the one you're currently editing, need to be saved. That is, you cannot have multiple unsaved files and get correct results; you'll get stale data instead. Perhaps that's not very common, but you can definitely end up with multiple unsaved files after a global search-and-replace.

When it comes to start and end positions and buffer offsets, keep in mind that LSP does support modifying the source code with the "rename" feature [1]. If you don't know the buffer offset of a token, how would you know where to make the changes in the buffer? In this case you might get away with what the current compiler supports, because this feature only applies to identifiers and you do know the length of an identifier. But don't you need to know where in the buffer the identifier starts? I guess you could run the lexer again and count the number of bytes, but that seems quite inefficient for something the compiler should support out of the box.
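As a sketch of that fallback in plain D (no dmd dependency; it assumes 1-based line and column numbers, and that a column counts bytes, which only holds for ASCII source):

```d
// Fallback: recover a byte offset from a (line, column) pair by scanning
// the buffer, when the compiler doesn't store offsets itself.
size_t toOffset(const(char)[] source, size_t line, size_t column)
{
    size_t offset;
    size_t currentLine = 1;
    while (currentLine < line && offset < source.length)
    {
        if (source[offset] == '\n')
            ++currentLine;
        ++offset;
    }
    return offset + (column - 1);
}

unittest
{
    // line 2, column 5 of "int x;\nint y;\n" is the 'y'
    assert(toOffset("int x;\nint y;\n", 2, 5) == 11);
}
```

Doing this on every rename request costs an extra pass over the buffer, which is exactly the inefficiency in question; storing the offset in Loc up front would avoid it.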

Another feature that seems to depend on start and end positions, and possibly buffer offsets as well, is the "foldingRange" feature [2].

> I was expecting that the hdrgen visitor would help with that.

I haven't looked at the implementation of hdrgen, but if the lexer doesn't preserve the information, how would hdrgen get access to it?

> I am still hoping that we can work our way with the main compiler

Yeah, I don't want to wait anymore.


[1] https://microsoft.github.io/language-server-protocol/specifications/specification-3-14/#textDocument_rename

[2] https://microsoft.github.io/language-server-protocol/specifications/specification-3-14/#textDocument_foldingRange

-- 
/Jacob Carlborg