Thread overview
Adding D support to Clang format
Apr 28, 2023: Zachary Yedidia
Apr 28, 2023: max haughton
Apr 29, 2023: Johan
Apr 29, 2023: Johan
Apr 29, 2023: Guillaume Piolat
Apr 29, 2023: max haughton
Apr 29, 2023: Zachary Yedidia
Apr 29, 2023: max haughton
Apr 29, 2023: Johan
Apr 29, 2023: max haughton

Zachary Yedidia, April 28, 2023

Clang format is a high-quality auto-formatter developed by the LLVM project, mainly for use with C and C++, but it also supports Java, C#, JavaScript, JSON, Objective-C, Proto, TableGen, TextProto, and Verilog. I think it would be great to add support for D, and hopefully it shouldn't be too difficult since it is a C-like language. Clang format has good support for aligning tokens (comments, assignments, etc.), which is an important feature that existing D formatters (dfmt, sdfmt) don't support, and it has many configuration options. Clang format also uses an incomplete parser, so it is relatively resilient to new language syntax and doesn't need a full D parser (and, for what it's worth, it can format code with syntax errors).
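As a rough illustration of what "aligning tokens" means for assignments, here is a minimal sketch in Python. It is not how clang-format implements this (clang-format works on its own token stream and exposes the behavior through options such as `AlignConsecutiveAssignments`); the `align_assignments` function and its regex are hypothetical. It only treats a plain `=` as an assignment, leaves lines with `==`, `<=`, `+=` and similar untouched, and assumes the caller passes in a single run of consecutive lines.

```python
import re

# Hypothetical, deliberately naive pattern for "lhs = rhs" lines:
# the left-hand side may not contain '=', and an '=' that is part of
# '==', '<=', '+=', '!=' etc. is not treated as an assignment.
_ASSIGN = re.compile(r"^(\s*)([^=]*?\S)\s*(?<![+\-*/%&|^~<>!])=(?!=)\s*(.*)$")


def align_assignments(lines):
    """Pad left-hand sides so every simple assignment in the run has its '=' in one column."""
    matches = [_ASSIGN.match(line) for line in lines]
    widths = [len(m.group(2)) for m in matches if m]
    if not widths:
        return list(lines)  # nothing to align
    width = max(widths)
    out = []
    for line, m in zip(lines, matches):
        if m:
            indent, lhs, rhs = m.groups()
            out.append(f"{indent}{lhs.ljust(width)} = {rhs}")
        else:
            out.append(line)  # non-assignment lines pass through unchanged
    return out
```

For example, `int x = 1;` followed by `string name = "d";` comes out with both `=` signs in the same column, while a following `x += 2;` is passed through unchanged.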

I have started an implementation for D here: https://github.com/zyedidia/llvm-project/tree/clang-format-d, with some information about the implementation here (how to download/build/test): https://github.com/zyedidia/llvm-project/blob/clang-format-d/clang-format-d.md. There are still a number of things that are formatted incorrectly, so there is still work to do (I have added tests for the things I have noticed aren't working right). I'm not sure how much time I can dedicate to working on this, so any help would be appreciated. If we get to a point where this fork of clang-format has good support for D, then hopefully we could get these changes merged upstream into the LLVM project.

max haughton, April 28, 2023

On Friday, 28 April 2023 at 23:30:37 UTC, Zachary Yedidia wrote:

> incomplete parser

sdfmt already does that, and it is important. It's worth noting that Amaury, who wrote sdfmt, is an LLVM guy.

> [...]

Go for it, although I should say that nothing listed would convince me to switch.

Johan, April 29, 2023

On Friday, 28 April 2023 at 23:30:37 UTC, Zachary Yedidia wrote:

> Clang format is a high-quality auto-formatter developed by the LLVM project, mainly for use with C and C++, but it also supports Java, C#, JavaScript, JSON, Objective-C, Proto, TableGen, TextProto, and Verilog. I think it would be great to add support for D, and hopefully it shouldn't be too difficult since it is a C-like language.

clang-format is indeed a godsend.
I had also recently read that it can handle languages quite different from C, and was hoping for D support in the future.
It will be very nice for people (like me) who work on mixed C++/D codebases.

> I have started an implementation for D here: https://github.com/zyedidia/llvm-project/tree/clang-format-d, with some information about the implementation here (how to download/build/test): https://github.com/zyedidia/llvm-project/blob/clang-format-d/clang-format-d.md.

Great that you are picking this up and leading the effort!
Hope to be able to help out.

Tip: start discussing on the LLVM mailing list now how to upstream the work, to avoid having to rework certain pieces later. Obviously, adhere to the LLVM coding style (which you already do), but there may be other concerns that are less obvious.

cheers,
Johan

Johan, April 29, 2023

On Saturday, 29 April 2023 at 10:08:41 UTC, Johan wrote:

> On Friday, 28 April 2023 at 23:30:37 UTC, Zachary Yedidia wrote:
>
>> Clang format is a high-quality auto-formatter developed by the LLVM project, mainly for use with C and C++, but it also supports Java, C#, JavaScript, JSON, Objective-C, Proto, TableGen, TextProto, and Verilog. I think it would be great to add support for D, and hopefully it shouldn't be too difficult since it is a C-like language.
>
> clang-format is indeed a godsend.
> I had also recently read that it can handle languages quite different from C, and was hoping for D support in the future.
> It will be very nice for people (like me) who work on mixed C++/D codebases.
>
>> I have started an implementation for D here: https://github.com/zyedidia/llvm-project/tree/clang-format-d, with some information about the implementation here (how to download/build/test): https://github.com/zyedidia/llvm-project/blob/clang-format-d/clang-format-d.md.
>
> Great that you are picking this up and leading the effort!
> Hope to be able to help out.

I'm enthusiastic, but after some more thinking I think it's wise to talk with the sdfmt team and discuss which way to go: better to have everyone work on one project, rather than two. I'm very interested in working on a good formatter, but not two ;-)

Perhaps the upsides of clang-format (what is already there) are not so large.
A big downside of clang-format is that you are tied to LLVM's project (release schedule, way of working, community, ...). In order to format D code, you'd have to install clang-format (and the whole LLVM dependency?). And you are programming in C++, which is a pity for a project that is purely for D.
Because sdfmt has its own 'parser' (excellent!), it is quite a small program that can easily be built using dub. That makes the barrier to entry for developers very small (no need to build or test sdc itself).

-Johan

Guillaume Piolat, April 29, 2023

On Saturday, 29 April 2023 at 10:46:45 UTC, Johan wrote:

> I'm enthusiastic, but after some more thinking I think it's wise to talk with the sdfmt team and discuss which way to go: better to have everyone work on one project, rather than two. I'm very interested in working on a good formatter, but not two ;-)

I liked sdfmt when I tried it.
sdfmt doesn't support spaces as indentation (or even specifying the indent level), so it seems to contradict the "D style". So we have a tab vs space debate on our hands :) and even K&R vs Allman.

max haughton, April 29, 2023

On Saturday, 29 April 2023 at 14:32:02 UTC, Guillaume Piolat wrote:

> On Saturday, 29 April 2023 at 10:46:45 UTC, Johan wrote:
>
>> I'm enthusiastic, but after some more thinking I think it's wise to talk with the sdfmt team and discuss which way to go: better to have everyone work on one project, rather than two. I'm very interested in working on a good formatter, but not two ;-)
>
> I liked sdfmt when I tried it.
> sdfmt doesn't support spaces as indentation (or even specifying the indent level), so it seems to contradict the "D style". So we have a tab vs space debate on our hands :) and even K&R vs Allman.

You can tell sdfmt to use spaces in a config file.

D style is bad IMO (or at least has some warts); sdfmt's output is so much more efficient than that of (say) dfmt with its default settings.

Zachary Yedidia, April 29, 2023

On Saturday, 29 April 2023 at 10:46:45 UTC, Johan wrote:

> On Saturday, 29 April 2023 at 10:08:41 UTC, Johan wrote:
>
>> On Friday, 28 April 2023 at 23:30:37 UTC, Zachary Yedidia
>>
>> [...]
>
> I'm enthusiastic, but after some more thinking I think it's wise to talk with the sdfmt team and discuss which way to go: better to have everyone work on one project, rather than two. I'm very interested in working on a good formatter, but not two ;-)
>
> Perhaps the upsides of clang-format (what is already there) are not so large.
> A big downside of clang-format is that you are tied to LLVM's project (release schedule, way of working, community, ...). In order to format D code, you'd have to install clang-format (and the whole LLVM dependency?). And you are programming in C++, which is a pity for a project that is purely for D.
> Because sdfmt has its own 'parser' (excellent!), it is quite a small program that can easily be built using dub. That makes the barrier to entry for developers very small (no need to build or test sdc itself).
>
> -Johan

I think you make some good points. The clang-format codebase is a bit gnarly to work with (~20,000 lines of C++ without any clear separation/organization for formatting all the various languages). I wasn't aware of sdfmt until the foundation meeting forum post yesterday, but maybe it is a better direction to go in. Having briefly tried sdfmt, I think the main thing I miss is comment/assignment alignment. I was under the impression that this might be difficult to implement in sdfmt, but if not I think that would resolve a lot for me (sdfmt already does better than dfmt for me). It would also be great to have import sorting. Some other minor issues with sdfmt are: not many configuration options (though I found the defaults to be pretty good), no documentation on how to set the configuration options that do exist (i.e., the .sdfmt JSON file is not documented), and a lack of visibility of the project overall because it is hidden within SDC (probably why I was not aware of it until recently).

I think for now I'll just sit on my clang-format fork and see if there is interest/it's feasible to implement these things in sdfmt. If so, then I would be happy to use (and possibly contribute to) sdfmt instead.

max haughton, April 29, 2023

On Saturday, 29 April 2023 at 19:08:14 UTC, Zachary Yedidia wrote:

> On Saturday, 29 April 2023 at 10:46:45 UTC, Johan wrote:
>
>> On Saturday, 29 April 2023 at 10:08:41 UTC, Johan wrote:
>>
>>> On Friday, 28 April 2023 at 23:30:37 UTC, Zachary Yedidia
>>>
>>> [...]
>>
>> I'm enthusiastic, but after some more thinking I think it's wise to talk with the sdfmt team and discuss which way to go: better to have everyone work on one project, rather than two. I'm very interested in working on a good formatter, but not two ;-)
>>
>> Perhaps the upsides of clang-format (what is already there) are not so large.
>> A big downside of clang-format is that you are tied to LLVM's project (release schedule, way of working, community, ...). In order to format D code, you'd have to install clang-format (and the whole LLVM dependency?). And you are programming in C++, which is a pity for a project that is purely for D.
>> Because sdfmt has its own 'parser' (excellent!), it is quite a small program that can easily be built using dub. That makes the barrier to entry for developers very small (no need to build or test sdc itself).
>>
>> -Johan
>
> I think you make some good points. The clang-format codebase is a bit gnarly to work with (~20,000 lines of C++ without any clear separation/organization for formatting all the various languages). I wasn't aware of sdfmt until the foundation meeting forum post yesterday, but maybe it is a better direction to go in. Having briefly tried sdfmt, I think the main thing I miss is comment/assignment alignment. I was under the impression that this might be difficult to implement in sdfmt, but if not I think that would resolve a lot for me (sdfmt already does better than dfmt for me). It would also be great to have import sorting.

Why is import sorting useful in practice? Anything more than ~3 top-level imports is a pretty big red flag for me: D has local imports, so use them.

It wouldn't be hard to implement in theory, although keep in mind that sdfmt internally has no concept of an import other than in the "parser", so it might be more of an AST->AST type of thing (which is much, much easier to implement if you have a formatter for the output).

> Some other minor issues with sdfmt are: not many configuration options (though I found the defaults to be pretty good), no documentation on how to set the configuration options that do exist (i.e., the .sdfmt JSON file is not documented), and a lack of visibility of the project overall because it is hidden within SDC (probably why I was not aware of it until recently).

sdfmt not being particularly configurable is sort of by design.

> I think for now I'll just sit on my clang-format fork and see if there is interest/it's feasible to implement these things in sdfmt. If so, then I would be happy to use (and possibly contribute to) sdfmt instead.

As far as I'm aware, the sdfmt algorithm is basically a simplified take on the way clang-format works. Implementing the alignment stuff shouldn't be ridiculously hard, although I'm not sure how clang-format fits it into its decision-making/heuristics.

Johan, April 29, 2023

On Saturday, 29 April 2023 at 21:07:28 UTC, max haughton wrote:

> On Saturday, 29 April 2023 at 19:08:14 UTC, Zachary Yedidia wrote:
>
>> It would also be great to have import sorting.
>
> Why is import sorting useful in practice? Anything more than ~3 top-level imports is a pretty big red flag for me: D has local imports, so use them.

I think these kinds of discussions should be kept to a minimum: a formatter should not force a specific formatting taste on the user, but should instead provide options so that users can tailor it to their taste.
I would also appreciate import sorting (with an option to separate stdlib imports from user-library imports), including sorting the symbols within selective imports.
Another wish is grouping all UDAs either before or after the function.
If I think longer, I'm sure I have other wishes that clash with someone else's taste, like you had (https://github.com/snazzy-d/sdc/issues/231) ;-)
Hence options options options!
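The import-sorting wish above can be sketched roughly as follows: a minimal Python illustration, not an existing sdfmt or clang-format feature. It assumes that modules under `std.` and `core.` count as "stdlib" (the `STDLIB_PREFIXES` tuple and the `sort_imports` name are hypothetical), and it only handles the simple forms `import a.b;` and `import a.b : x, y;`. Renamed imports, comma-separated module lists, and `static`/`public` imports are rejected rather than guessed at. For C/C++, clang-format's closest analogue is include sorting and grouping (`SortIncludes`, `IncludeBlocks`).

```python
import re

# Hypothetical grouping rule (an assumption for illustration): modules
# starting with these prefixes are treated as the standard library.
STDLIB_PREFIXES = ("std.", "core.")

# Matches "import a.b;" and the selective form "import a.b : x, y;".
_IMPORT = re.compile(r"^import\s+([\w.]+)\s*(?::\s*(.+?))?\s*;$")


def sort_imports(lines):
    """Sort simple D import lines: stdlib group first, then the rest,
    with the symbols of each selective import sorted as well."""
    stdlib, other = [], []
    for line in lines:
        m = _IMPORT.match(line.strip())
        if not m:
            raise ValueError(f"not a simple import: {line!r}")
        module, symbols = m.groups()
        if symbols:
            names = sorted(s.strip() for s in symbols.split(","))
            text = f"import {module} : {', '.join(names)};"
        else:
            text = f"import {module};"
        group = stdlib if module.startswith(STDLIB_PREFIXES) else other
        group.append((module, text))
    # Sort each non-empty group by module name.
    blocks = [[t for _, t in sorted(g)] for g in (stdlib, other) if g]
    out = []
    for i, block in enumerate(blocks):
        if i:
            out.append("")  # blank line between the stdlib and user-library groups
        out.extend(block)
    return out
```

Because the sketch works on whole lines, it sidesteps the point raised earlier in the thread that sdfmt only knows about imports inside its parser; a real implementation would more likely operate on the parsed declarations.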

cheers,
Johan

max haughton, April 29, 2023

On Saturday, 29 April 2023 at 21:27:51 UTC, Johan wrote:

> On Saturday, 29 April 2023 at 21:07:28 UTC, max haughton wrote:
>
>> On Saturday, 29 April 2023 at 19:08:14 UTC, Zachary Yedidia wrote:
>>
>>> It would also be great to have import sorting.
>>
>> Why is import sorting useful in practice? Anything more than ~3 top-level imports is a pretty big red flag for me: D has local imports, so use them.
>
> I think these kinds of discussions should be kept to a minimum: a formatter should not force a specific formatting taste on the user, but should instead provide options so that users can tailor it to their taste.

In this case, I think import sorting only really matters if you are writing suboptimal code; e.g., some files in dmd currently have almost 100 lines of imports at the top.

> I would also appreciate import sorting (with an option to separate stdlib imports from user-library imports), including sorting the symbols within selective imports.
> Another wish is grouping all UDAs either before or after the function.
> If I think longer, I'm sure I have other wishes that clash with someone else's taste, like you had (https://github.com/snazzy-d/sdc/issues/231) ;-)
> Hence options options options!

I guess, but this felt inconsistent rather than merely not to my taste. For the most part, I genuinely don't care how the code is actually formatted, as long as it feels space-efficient (using sdfmt has made me hate Allman braces; other than that, not much to report) and isn't going to trip my eyes up.

At a scale larger than one person, formatting isn't really about aesthetics anyway; it's about uniformity (for both tools and people). As long as it isn't completely brain-damaged, I'm not that bothered about the format itself.

At the level of a team of programmers, consider debates about exactly how code should be formatted (which we have all had, and will keep having as long as there are programmers, and a bit longer than that, as long as people remember what digital computers are). By using a relatively inflexible formatter, you mostly eliminate the politics and distraction of these debates, whereas if you have to decide on a configuration, you risk just moving the distraction around. YMMV.

Following on from the above, if someone wants to implement more options in sdfmt, I don't see much of a problem (other than it being up to Amaury), but I just think people miss why (and when) formatters are a good idea.