Thread overview
[SAOC 2023] dfmt rewrite - Weekly update #1
Sep 22, 2023
Prajwal S N
Sep 22, 2023
M.M.
Sep 22, 2023
Imperatorn
Sep 24, 2023
deadalnix
Sep 25, 2023
RazvanN
Sep 25, 2023
Tim
Sep 26, 2023
RazvanN
Sep 26, 2023
RazvanN
Sep 25, 2023
max haughton
Sep 27, 2023
sighoya
September 22, 2023

Hi everyone,

For SAOC 2023, I'm working on refactoring dfmt to use the AST from DMD-as-a-library instead of libdparse.

The past week has been very interesting. I got up to speed with the dfmt codebase, and managed to do a 1-to-1 port of the lexer dependency from libdparse to DMD-as-a-library. Most parts were pretty straightforward, and the bulk of the work was replacing every tok!"<token>" instance with TOK.<token> and making sure the token coming from DMD was the same as what was previously being used. So far so good!

You can see the draft PR tracking the work here.

Going forward, my mentor and I have decided that it would be impractical to try and replace the parser directly, for multiple reasons:

  • It's a lot of work to replace the parser and use the DMD AST instead of libdparse's, and all of this work will happen without a working version of dfmt. If, at the end of this, dfmt is broken or refuses to compile, it could very well mean that all that effort went down the drain.
  • Doing a brute force replacement of the parser will prevent us from testing the transformation passes in dfmt individually, and also brings us back to the point above.

Hence, we've decided to do an incremental rewrite of the files that use the parser, initially with no passes (just to ensure the AST is being built in the first place), and then adding each pass along with relevant unit tests.

September 22, 2023

On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:

>

Hi everyone,

For SAOC 2023, I'm working on refactoring dfmt to use the AST from DMD-as-a-library instead of libdparse.

....

Good luck in the project!

September 22, 2023

On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:

>

Hi everyone,

For SAOC 2023, I'm working on refactoring dfmt to use the AST from DMD-as-a-library instead of libdparse.

[...]

Superb! We really really need people working on tooling ❤️

September 24, 2023

On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:

>

Hi everyone,

For SAOC 2023, I'm working on refactoring dfmt to use the AST from DMD-as-a-library instead of libdparse.

The past week has been very interesting. I got up to speed with the dfmt codebase, and managed to do a 1-to-1 port of the lexer dependency from libdparse to DMD-as-a-library. Most parts were pretty straightforward, and the bulk of the work was replacing every tok!"<token>" instance with TOK.<token> and making sure the token coming from DMD was the same as what was previously being used. So far so good!

You can see the draft PR tracking the work here.

Going forward, my mentor and I have decided that it would be impractical to try and replace the parser directly, for multiple reasons:

  • It's a lot of work to replace the parser and use the DMD AST instead of libdparse's, and all of this work will happen without a working version of dfmt. If, at the end of this, dfmt is broken or refuses to compile, it could very well mean that all that effort went down the drain.
  • Doing a brute force replacement of the parser will prevent us from testing the transformation passes in dfmt individually, and also brings us back to the point above.

Hence, we've decided to do an incremental rewrite of the files that use the parser, initially with no passes (just to ensure the AST is being built in the first place), and then adding each pass along with relevant unit tests.

I don't want to sound alarming or anything, but an AST is not really what you want to work with as a formatter.

The main reason is that you want to carry around a lot of information that the AST generally doesn't care about (comments, informations about layout, etc...). Consider the following example:

int a; // this is an int.
int b;

We immediately recognize that the comment refers to a. However:

int a;

// this is an int.
int b;

Now we recognize that the comment refers to b.

There is a lot of subtle semantic in there that is very hard to convey through an AST and are very hard to work with in that form.

There is a lot of prior art on the matter of code formatting, and the best explanation is probably the one from dartfmt's author: https://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/ . clang-format and many others do use this approach.

Shameless plug: sdfmt uses that approach. You can get it there: https://code.dlang.org/packages/sdc%3Asdfmt .

I understand this is probably out of scope to turn things around at this time, but holly hell, do we really need, as a community, to redo all the mistake other communities have done instead of learning from them, and, to add insult to injury, involve junior devs in that madness?

September 25, 2023

On Sunday, 24 September 2023 at 21:11:34 UTC, deadalnix wrote:

>

On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:

>

Hi everyone,

For SAOC 2023, I'm working on refactoring dfmt to use the AST from DMD-as-a-library instead of libdparse.

The past week has been very interesting. I got up to speed with the dfmt codebase, and managed to do a 1-to-1 port of the lexer dependency from libdparse to DMD-as-a-library. Most parts were pretty straightforward, and the bulk of the work was replacing every tok!"<token>" instance with TOK.<token> and making sure the token coming from DMD was the same as what was previously being used. So far so good!

You can see the draft PR tracking the work here.

Going forward, my mentor and I have decided that it would be impractical to try and replace the parser directly, for multiple reasons:

  • It's a lot of work to replace the parser and use the DMD AST instead of libdparse's, and all of this work will happen without a working version of dfmt. If, at the end of this, dfmt is broken or refuses to compile, it could very well mean that all that effort went down the drain.
  • Doing a brute force replacement of the parser will prevent us from testing the transformation passes in dfmt individually, and also brings us back to the point above.

Hence, we've decided to do an incremental rewrite of the files that use the parser, initially with no passes (just to ensure the AST is being built in the first place), and then adding each pass along with relevant unit tests.

I don't want to sound alarming or anything, but an AST is not really what you want to work with as a formatter.

The main reason is that you want to carry around a lot of information that the AST generally doesn't care about (comments, informations about layout, etc...). Consider the following example:

int a; // this is an int.
int b;

We immediately recognize that the comment refers to a. However:

int a;

// this is an int.
int b;

Now we recognize that the comment refers to b.

There is a lot of subtle semantic in there that is very hard to convey through an AST and are very hard to work with in that form.

But dmd attaches comments to declarations in the parsing phase. You can get the comments from the AST. Why is that not a good way to represent the data?

>

There is a lot of prior art on the matter of code formatting, and the best explanation is probably the one from dartfmt's author: https://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/ . clang-format and many others do use this approach.

The article states the following:

"At this point, you’re probably thinking, “Wait. What’s so hard about formatting?” After you’ve parsed, can’t you just walk the AST and pretty-print it with some whitespace?

If every statement fit within the column limit of the page, yup. It’s a piece of cake. (I think that’s what gofmt does.) But our formatter also keeps your code within the line length limit. That means adding line breaks (or “splits” as the formatter calls them), and determining the best place to add those is famously hard."

So the fundamental problem comes from the fact that the author wants to do smart stuff with line/column limits. dfmt does not even have a line limit option, so that problem does not apply here. Even if we were to add it, that could be viewed as a separate project of transitioning dfmt to a better representation.

>

Shameless plug: sdfmt uses that approach. You can get it there: https://code.dlang.org/packages/sdc%3Asdfmt .

I understand this is probably out of scope to turn things around at this time, but holly hell, do we really need, as a community, to redo all the mistake other communities have done instead of learning from them, and, to add insult to injury, involve junior devs in that madness?

As things stand dfmt already exists and is implemented by using an AST to output the formatted code. Maybe that is not the ideal approach and could be improved, but that has nothing to do with this project. This project is all about replacing a dependency (libdparse) with another one (dmd-as-a-lib) so that when dmd is upgraded, dfmt ideally would not be affected.

Re-engineering the formatter such that it uses a different technique represents a separate project that is orthogonal to the current effort.

September 25, 2023

On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:

>

Hi everyone,

For SAOC 2023, I'm working on refactoring dfmt to use the AST from DMD-as-a-library instead of libdparse.

[...]

Somewhat orthogonal but I'm tempted by the idea of all the "official" tooling being in with the compiler source. They should be together in the release, and one should never have to deal with them being out of sync.

September 25, 2023

On Monday, 25 September 2023 at 08:26:33 UTC, RazvanN wrote:

>

So the fundamental problem comes from the fact that the author wants to do smart stuff with line/column limits. dfmt does not even have a line limit option, so that problem does not apply here. Even if we were to add it, that could be viewed as a separate project of transitioning dfmt to a better representation.

I think dfmt has options for line limits:

    --soft_max_line_length
    --max_line_length
September 26, 2023

On Monday, 25 September 2023 at 16:12:51 UTC, Tim wrote:

>

On Monday, 25 September 2023 at 08:26:33 UTC, RazvanN wrote:

>

So the fundamental problem comes from the fact that the author wants to do smart stuff with line/column limits. dfmt does not even have a line limit option, so that problem does not apply here. Even if we were to add it, that could be viewed as a separate project of transitioning dfmt to a better representation.

I think dfmt has options for line limits:

    --soft_max_line_length
    --max_line_length

Yes, those refer to the length of a specific line, but the OP was also referring to constraints for the number of lines in a specific file, so when I said line limit I was referring to the limit of a line, rather to the limit of the number of lines in a file.

September 26, 2023

On Tuesday, 26 September 2023 at 07:50:59 UTC, RazvanN wrote:

>

Yes, those refer to the length of a specific line, but the OP was also referring to constraints for the number of lines in a specific file, so when I said line limit I was referring to the limit of a line, rather to the limit of the number of lines in a file.

*I was not referring to the limit of a line, rather ...

September 27, 2023

On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:

>

[...]

Wishing you great success for the project.