[SAoC] "Improving DMD as a Library" project thread
September 14
Hello!

My name is Mihaela Chirea and I am a 4th year Computer Engineering student at Politehnica University of Bucharest.

My interest in programming languages led me to attend a D workshop at Ideas and Projects Workshop in 2019 and D Summer School this year, both held by Eduard Staniloiu and Razvan Nitu. Topics like meta-programming and design by introspection made me curious about how these concepts were implemented, thus increasing my interest in compilers.

For this year's edition of SAoC I will be working on improving dmd as a library, mainly by cleaning up the AST nodes by moving the semantic elements to more suitable places and creating new visitors when needed.
After studying the current state of dmd and identifying the parts I will be working on, I have decided on the following plan:

- Getting used to the structure of the compiler by working on the nodes that don't contain that much semantic information:

Milestone 1:
    - aliasthis.d
    - attrib.d
    - statement.d
    - aggregate.d
    - cond.d
    - staticcond.d
    - nspace.d

- Working on the files where semantic elements either appear often, or where the functions in which they appear are used in many other places, so that more files would need changes:

Milestone 2:
    - mtype.d
    - dstruct.d
    - dclass.d
    - denum.d
    - dimport.d

Milestone 3:
    - dsymbol.d
    - expression.d
    - dmodule.d

Milestone 4:
    - declaration.d
    - func.d
    - dtemplate.d

However, small changes to this plan may be necessary since other changes to the compiler may raise unexpected issues for this project.

As far as time allows, and even after the end of this event, I would also like to work on creating a nice compiler interface, which would become much easier after this refactoring step.
I will be posting weekly updates regarding my progress on this project.

Thanks!
Mihaela
September 15
On Monday, 14 September 2020 at 12:47:42 UTC, Mihaela Chirea wrote:
> Hello!
>
> My name is Mihaela Chirea and I am a 4th year Computer Engineering student at Politehnica University of Bucharest.
>
> [...]
>
> Thanks!
> Mihaela

Good luck! It's a much needed improvement. I see you already joined the dlang Slack; if you have any questions, #dmd is the place to go.
October 28
Hello!

During the first week of working on this project I received multiple suggestions regarding other possible tasks that could better benefit the community. I started working on them from the second week but never clearly changed the milestones.

So, based mostly on Jacob Carlborg's suggestions[1], here are the new plans:

Milestone 2:
- Add the start location to the AST nodes that lack this information
- Bring all the dmd as a library features already existing in the compiler under DMDLIB
- Add the token size
- Add the end location to all nodes

Some of the issues I would like to tackle during the next milestones are:
- Add the possibility of analyzing source code that is only in memory
- Reduce the global state
- Don't generate TypeInfo when not needed (as suggested here[2])

So far, I haven't had the chance to study these last topics in detail, and I would appreciate any advice or opinions on how to start working on these tasks.

[1] https://github.com/dlang/dmd/pull/11788#issuecomment-698186023
[2] https://forum.dlang.org/post/iopxhnudlrgiqwjxzihe@forum.dlang.org
October 29
On Wednesday, 28 October 2020 at 19:08:01 UTC, Mihaela Chirea wrote:

> - Add the possibility of analyzing source code that is only in memory

I've started on this [1] (very rough work in progress), if you need any pointers.

[1] https://github.com/jacob-carlborg/ddc/commit/cee56ce3750701d593dd619b27d28f18e4929e72

--
/Jacob Carlborg
October 30
On Thursday, 29 October 2020 at 08:50:46 UTC, Jacob Carlborg wrote:
> On Wednesday, 28 October 2020 at 19:08:01 UTC, Mihaela Chirea wrote:
>
>> - Add the possibility of analyzing source code that is only in memory
>
> I've started on this [1] (very rough work in progress), if you need any pointers.
>
> [1] https://github.com/jacob-carlborg/ddc/commit/cee56ce3750701d593dd619b27d28f18e4929e72
>
> --
> /Jacob Carlborg

So right now, when the compiler is given a .d/.di file, it opens it, reads the contents and immediately lexes and parses the string, after which the string is discarded. If the contents of the file need to be changed or reanalyzed, then the whole process needs to be started from scratch. What you are proposing, Jacob, is that the contents of the file are stored somewhere for ease of reuse. Is that right?

Cheers,
RazvanN
October 30
On Friday, 30 October 2020 at 06:03:40 UTC, RazvanN wrote:

> So right now, when the compiler is given a .d/.di file, it opens it, reads the contents and immediately lexes and parses the string, after which the string is discarded. If the contents of the file need to be changed or reanalyzed, then the whole process needs to be started from scratch. What you are proposing, Jacob, is that the contents of the file are stored somewhere for ease of reuse. Is that right?

Kind of, or at least that's one of the reasons. The main idea is to separate the reading of a file from lexing and parsing it. We introduce a file manager (like a cache). The compiler will first check the file manager to see whether the file content is available, and otherwise read it from disk. The important part here is that it needs to be possible to pre-populate (and also update) the file manager with a file and its content. This would make it possible to do a full compilation from memory, without touching the disk.

The main reason for this is to be able to have the compiler receive file content data from other sources than disk. Two use cases for that would be:

* An LSP server (or similar tool) receiving the data over the network from an editor with unsaved files

* The data is already in memory, think a string literal. This is useful when writing tests

The other idea is, as you mentioned, to read from memory when reanalyzing a file that has already been read from disk. For example, if you want to get the tokens of an AST node, as the compiler stands now, you probably need to re-lex the file. But you don't want to re-read the file from disk, because it might have been updated in the meantime. For this use case, it's really important that the compiler reads the exact same file content as it did when it originally created the AST.
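
To make the lookup order concrete, here is a minimal sketch in D of what such a file manager could look like. It is only an illustration under stated assumptions: the names `FileManager`, `put` and `get` are hypothetical and do not correspond to anything in dmd or in the linked commit.

```d
import std.file : readText, remove, tempDir, write;
import std.path : buildPath;

/// Hypothetical file manager; all names are illustrative, not dmd's API.
struct FileManager
{
    private string[string] contents; // path -> content snapshot

    /// Pre-populate or update the stored content for `path`.
    void put(string path, string content)
    {
        contents[path] = content;
    }

    /// Return the stored content if present; otherwise read the file from
    /// disk once and keep that snapshot for all later lookups.
    string get(string path)
    {
        if (auto cached = path in contents)
            return *cached;
        auto text = readText(path);
        contents[path] = text;
        return text;
    }
}

void main()
{
    FileManager fm;

    // Source that exists only in memory (unsaved editor buffer, test literal).
    fm.put("unsaved.d", "void main() {}");
    assert(fm.get("unsaved.d") == "void main() {}");

    // A file read from disk keeps its original snapshot,
    // even if the file changes on disk afterwards.
    auto path = buildPath(tempDir(), "file_manager_sketch.d");
    write(path, "int a;");
    scope (exit) remove(path);

    assert(fm.get(path) == "int a;");
    write(path, "int b;");
    assert(fm.get(path) == "int a;");
}
```

The first half of `main` covers the in-memory use cases, and the second half covers the reanalysis case: the second `get` returns the content that was originally read, not what is currently on disk.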

Note, there's already a file cache [1], but that will not fit. It's not possible to pre-populate or update it. It also splits up the file into lines. The existing file cache [1] could perhaps take advantage of the new file manager.

Keep in mind that this new file manager needs to be used, not only when reading D files, but also when reading files through import expressions.

[1] https://github.com/dlang/dmd/blob/master/src/dmd/filecache.d

--
/Jacob Carlborg