How can we make it easier to experiment with the compiler?
May 23, 2021

I think there are many that would like to experiment with the compiler, but feel discouraged because they don't know how to approach it.

I think this not only comes down to documentation, but is also structural. In order to figure out what to improve, the best starting point is the challenges people have actually experienced.

The number one challenge I see is keeping track of DMD as it is released with new improvements. Basically, reapplying the changes made on the experimental branch on top of the main branch (aka "rebasing"?). I suspect that kills many efforts, meaning people create a fork, start making changes, but then a new version of DMD is released and the fork is left to dry in the sun, as rebasing is not fun. And well, a hobby that isn't fun is not a good hobby. :-D

Better internal compiler structure would help a lot with this. So a prioritized list for me would be:

  1. Have a clean separation between frontend and backend, one that is close to plug-and-play. That would allow people to inject a new high-level IR between frontend and backend, which could open up new and interesting optimizations and allow all the compilers to benefit from it.

  2. Break down source files into smaller units, so that stable parts are separated from unstable parts.

  3. More encapsulation and separation of responsibility.

  4. Switch to a more syntactical AST, possibly enabling AST macros in the future without too much hassle, then use an IR for real work.

  5. Use directories.

  6. Improved documentation.

  7. Tutorials.
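To make item 1 a little more concrete, here is a minimal D sketch of what a plug-and-play boundary could look like, assuming the frontend hands the backend one lowered structure. `LoweredModule`, `Backend`, `MidStage`, `compile`, `ListingBackend` and `DropUnusedStage` are all hypothetical names, not DMD APIs; the point is only that an optional stage can be slotted in without either side importing the other's internals.

```d
import std.algorithm.iteration : filter;
import std.algorithm.searching : startsWith;
import std.array : array, join;

// Hypothetical hand-off format produced by the frontend after semantic analysis.
struct LoweredModule
{
    string name;
    string[] functions;
}

// The only thing a backend gets to see (hypothetical interface).
interface Backend
{
    string generate(LoweredModule m);
}

// Optional stage between frontend and backend, e.g. a pass on a new high-level IR.
interface MidStage
{
    LoweredModule transform(LoweredModule m);
}

// Driver: runs the injected stages in order, then hands off to the backend.
string compile(LoweredModule m, MidStage[] stages, Backend backend)
{
    foreach (stage; stages)
        m = stage.transform(m);
    return backend.generate(m);
}

// Toy backend: emits a one-line textual "listing" instead of machine code.
class ListingBackend : Backend
{
    string generate(LoweredModule m)
    {
        return m.name ~ ": " ~ m.functions.join(", ");
    }
}

// Toy stage standing in for a real optimization: drops functions named "unused_*".
class DropUnusedStage : MidStage
{
    LoweredModule transform(LoweredModule m)
    {
        m.functions = m.functions.filter!(f => !f.startsWith("unused_")).array;
        return m;
    }
}

unittest
{
    auto m = LoweredModule("app", ["main", "unused_helper"]);
    MidStage stage = new DropUnusedStage;
    assert(compile(m, null, new ListingBackend) == "app: main, unused_helper");
    assert(compile(m, [stage], new ListingBackend) == "app: main");
}
```

Adding a new stage means writing one new class rather than editing the frontend or the backend, which is the property that would make a fork cheaper to keep rebased.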

What other items should be on the list?

Which items are feasible in the next 6 months?

May 24, 2021

On Sunday, 23 May 2021 at 06:12:30 UTC, Ola Fosheim Grøstad wrote:

> I think there are many that would like to experiment with the compiler, but feel discouraged because they don't know how to approach it.
>
> I think this not only comes down to documentation, but is also structural. In order to figure out what to improve, the best starting point is the challenges people have actually experienced.
>
> The number one challenge I see is keeping track of DMD as it is released with new improvements. Basically, reapplying the changes made on the experimental branch on top of the main branch (aka "rebasing"?).

(That is the correct terminology.) I suspect this is more of a problem for people who are less familiar with git, which may well include people wanting to play around with DMD, e.g. GSoC/SAoC students.
I know this was the case for me while developing dcompute, with the added difficulty of tracking LLVM on top of LDC (which was kept in sync with DMD).

> I suspect that kills many efforts, meaning people create a fork, start making changes, but then a new version of DMD is released and the fork is left to dry in the sun, as rebasing is not fun. And well, a hobby that isn't fun is not a good hobby. :-D

The solution to this is better git skills, not so much better compiler skills/knowledge of DMD, although a merge conflict in a critical piece of code is always a PITA. We now have Slack/Discord for people to ask these kinds of questions, and I'm sure they will get answered if they are trying to do something interesting or fix an annoying problem.

> Better internal compiler structure would help a lot with this. So a prioritized list for me would be:

Oh god, yes. The directory structure, or rather the lack thereof, is a really dire repellent for newcomers. I cannot overstate this. 173 files in dmd/src/dmd is completely unacceptable; however, Walter seems to like it this way and has struck down PRs trying to remedy this in the past (because it doesn't suit his editor configuration? or something like that).

We should have at least the following folders:
ast: ast_node, dsymbol, aggregate, et al.
semantic: semantic2, semantic3, ob, nogc, safe, et al.
visitors: parsetimevisitor, permissivevisitor, visitor, et al.
glue (backend-interfacing files): lib[.], scan[.], toir, s2ir, e2ir, et al.
lex: lexer, tokens, identifier, id, utf, et al.
headers: (alas, still needed until dtoh works well enough and has been stable for enough releases for GDC to bootstrap)

>   1. Have a clean separation between frontend and backend, one that is close to plug-and-play. That would allow people to inject a new high-level IR between frontend and backend, which could open up new and interesting optimizations and allow all the compilers to benefit from it.

See also https://mlir.llvm.org. I had a GSoC student try to do something with this; I don't think it got to a usable state, but this is about as state of the art as it gets and a very interesting research direction. Rust and Swift use multiple levels of IR.

Also, from what I understand, the pointer and liveness analysis that is part of DIP 1000/1040 (other Walter DIPs?) does something like this, but in a hacked-up, nonstandard manner.

>   2. Break down source files into smaller units, so that stable parts are separated from unstable parts.

Urgh. Dealing with 10,000-line files and 1,000-line functions is such a drain on trying to get stuff done (looking at you, expressionsem.d). However, this needs to be combined with directories/packages, or it will not improve the situation.

>   3. More encapsulation and separation of responsibility.
>
>   4. Switch to a more syntactical AST, possibly enabling AST macros in the future without too much hassle, then use an IR for real work.

That is a noble goal, but it would require a lot of changes, both in DMD and in downstream LDC and GDC, and in tools that consume the AST and expect it to be complete, not to mention designing said IR and redoing semantic analysis/transformations to work with it.

>   5. Use directories.

Yes!!! Sooo much yes! See above.

>   6. Improved documentation.
>
>   7. Tutorials.
>
> What other items should be on the list?

Try to make sure we use standard terminology, so that people can reliably search for things.

> Which items are feasible in the next 6 months?

Directories.

May 23, 2021
On 5/23/2021 7:25 PM, Nicholas Wilson wrote:
> Directories.

The #1 problem isn't directories, it's "every module imports every other module" that leaves one with nowhere to start.

We currently have:

  dmd
  dmd/root
  dmd/backend

I regularly fend off attempts to have dmd/root import files from dmd, and dmd/backend import files from dmd. I recently had to talk someone out of having dmd/backend import files from dmd/root.

In other words, a failure of encapsulation.

Let's look at one example, picked more or less because I've looked at it recently: dmd/target.d. The reason for its existence is to abstract target information. Its imports are:

  import dmd.argtypes_x86;
  import dmd.argtypes_sysv_x64;
  import core.stdc.string : strlen;
  import dmd.cond;
  import dmd.cppmangle;
  import dmd.cppmanglewin;
  import dmd.dclass;
  import dmd.declaration;
  import dmd.dscope;
  import dmd.dstruct;
  import dmd.dsymbol;
  import dmd.expression;
  import dmd.func;
  import dmd.globals;
  import dmd.id;
  import dmd.identifier;
  import dmd.mtype;
  import dmd.statement;
  import dmd.typesem;
  import dmd.tokens : TOK;
  import dmd.root.ctfloat;
  import dmd.root.outbuffer;
  import dmd.root.string : toDString;

If I want to understand the code, I have to understand half of the rest of the compiler. On a more abstract level, why on earth would a target abstraction need to know about AST nodes? At least half of these imports shouldn't be here, and if they are, the code needs to be redesigned.
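The "drags along" effect is the transitive closure of the import graph, and it is easy to measure once direct imports are known. A minimal D sketch, assuming the direct imports have already been extracted; the toy graph in the usage note is made up and only loosely borrows module names from the list above, it is not dmd's real graph. `ImportGraph` and `transitiveImports` are hypothetical names.

```d
import std.algorithm.sorting : sort;

// Direct imports per module: module name -> modules it imports.
alias ImportGraph = string[][string];

// Everything `root` drags in, directly or indirectly (depth-first walk),
// excluding `root` itself; sorted so the output is stable.
string[] transitiveImports(ImportGraph graph, string root)
{
    bool[string] seen;
    string[] stack = [root];
    while (stack.length > 0)
    {
        string current = stack[$ - 1];
        stack = stack[0 .. $ - 1];
        foreach (dep; graph.get(current, null))
        {
            if (dep != root && dep !in seen)
            {
                seen[dep] = true;
                stack ~= dep;
            }
        }
    }
    auto result = seen.keys;
    result.sort();
    return result;
}
```

With a toy graph where `target` imports `mtype` and `expression`, `expression` imports `statement`, and `mtype` imports `dsymbol`, importing `target` pulls in four modules, while a `lexer` that imports only `tokens` pulls in one. Each import removed from a module like `target` shrinks this set for every module that imports it.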

Recently I needed some target information in the ImportC lexer, and it would have been so easy to just import dmd.target. But then that drags along all the imports that I've really tried to avoid importing into the lexer.

Iain came up with a clever solution to use a template parameter.
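The exact shape of Iain's solution isn't given here, so the following is only a guess at the general technique, not the code that went into dmd: the lexer is parameterized on a type that supplies the target facts it needs, and whoever instantiates it (the driver) is the one that knows about the real target module. `CLexer`, `cLongSize`, `LP64Target` and `LLP64Target` are hypothetical names.

```d
// A lexer that needs one fact about the target (the size of C `long`) but
// must not import the target module. The fact arrives as a template
// parameter, so the lexer only depends on "some type with cLongSize".
struct CLexer(TargetInfo)
{
    // Type of an unsuffixed decimal integer literal, following the C rule
    // "first of int, long, long long that can represent the value".
    static string decimalLiteralType(ulong value)
    {
        enum ulong intMax = int.max;
        enum ulong longMax = TargetInfo.cLongSize == 8 ? long.max : int.max;
        enum ulong longLongMax = long.max;
        if (value <= intMax)
            return "int";
        if (value <= longMax)
            return "long";
        if (value <= longLongMax)
            return "long long";
        return null; // no standard type; a real lexer would issue a diagnostic
    }
}

// Hypothetical target descriptions, chosen by the driver.
struct LP64Target  { enum cLongSize = 8; } // e.g. 64-bit Linux
struct LLP64Target { enum cLongSize = 4; } // e.g. 64-bit Windows
```

The same literal `3_000_000_000` comes out as `long` for `CLexer!LP64Target` and as `long long` for `CLexer!LLP64Target`, yet the lexer's own import list stays empty.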

Note that Phobos suffers terribly from this disease (everything ultimately imports everything else), which makes it very hard to understand and debug.

Fixing this is not easy, it requires a lot of hard thinking about what a module *really* needs to do. But each success at eliminating an import makes it more understandable.

Creating a false hierarchy (an implied relationship that is instantly defeated by the imports) of files won't fix it.

A good rule of thumb is:

    *** Never import a file from an uplevel directory ***

Import sideways and down, never up.
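Because D module names mirror directories, this rule of thumb can be checked mechanically. A minimal D sketch, assuming one module per file with package names matching paths; `packageOf` and `importsUplevel` are hypothetical helpers. Taken literally, the rule treats dmd/backend importing dmd/root as a sideways import into a sibling directory, so the stricter policy described above for dmd/backend would need an extra rule on top.

```d
import std.algorithm.searching : startsWith;
import std.string : lastIndexOf;

// Package part of a fully qualified module name: "dmd.backend.cod1" -> "dmd.backend".
string packageOf(string moduleName)
{
    const i = moduleName.lastIndexOf('.');
    return i < 0 ? "" : moduleName[0 .. cast(size_t) i];
}

// True if `importer` reaches up into an ancestor directory to get `imported`.
// Same package (sideways) and subpackages (down) are allowed.
bool importsUplevel(string importer, string imported)
{
    string src = packageOf(importer);
    string dst = packageOf(imported);
    if (src == dst)
        return false;                 // sideways
    if (dst.length == 0)
        return true;                  // top-level module imported from inside a package
    return src.startsWith(dst ~ "."); // dst is a proper ancestor of src
}
```

For example, `dmd.backend.cod1` importing `dmd.mtype` is flagged, `dmd.target` importing `dmd.root.outbuffer` (down) is not, and `dmd.mtype` importing `dmd.expression` (sideways) is not.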
May 24, 2021
On Monday, 24 May 2021 at 02:56:14 UTC, Walter Bright wrote:
> On 5/23/2021 7:25 PM, Nicholas Wilson wrote:
>> Directories.
>
> The #1 problem isn't directories, it's "every module imports every other module" that leaves one with nowhere to start.
>
> We currently have:
>
>   dmd
>   dmd/root
>   dmd/backend
>
> I regularly fend off attempts to have dmd/root import files from dmd, and dmd/backend import files from dmd. I recently had to talk someone out of having dmd/backend import files from dmd/root.
>
> In other words, a failure of encapsulation.

This is a _completely_ orthogonal problem.

The symptoms are completely orthogonal, although easily confused: failure of encapsulation makes _reasoning_ about the _interconnectedness_ of code difficult, failure to package makes _exploration_ and _enumeration_ of code (files, functions, classes, data structures) more difficult.
The solutions, however, are cross-enabling: we can implement and _enforce_ policies like, say, "AST-node-implementing modules should not import semantic analysis modules" with reasonable confidence iff we have all the AST modules in one place and all the semantic analysis modules in one place.
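Once modules are grouped into packages, a policy like that reduces to a package-level rule that a CI script could check. A minimal D sketch, assuming the import edges have already been extracted; `dmd.ast` and `dmd.semantic` are the folders proposed earlier in this thread, not packages that exist in dmd today, and `Rule`, `inPackage` and `violations` are hypothetical names.

```d
import std.algorithm.searching : startsWith;

// A forbidden dependency between two packages, e.g. ast must not import semantic.
struct Rule
{
    string fromPackage;
    string toPackage;
}

// True if `moduleName` lives in package `pkg` or one of its subpackages.
bool inPackage(string moduleName, string pkg)
{
    return moduleName.startsWith(pkg ~ ".");
}

// Returns the (importer, imported) edges that break at least one rule.
string[2][] violations(string[2][] edges, Rule[] rules)
{
    string[2][] bad;
    foreach (edge; edges)
    {
        foreach (rule; rules)
        {
            if (inPackage(edge[0], rule.fromPackage) && inPackage(edge[1], rule.toPackage))
            {
                bad ~= edge;
                break; // one report per edge is enough
            }
        }
    }
    return bad;
}
```

Without packages, the same rule would have to enumerate every AST module and every semantic module by name, which is exactly the bookkeeping that tends to rot.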

The symptoms of failure of encapsulation I'm going to assume you are well aware of.
The symptoms of failure to use packages are as follows:
 * The sheer number of files in src/dmd makes it impossible to remember what each file is for. This problem is compounded by the fact that many files have names that do not describe well what they do, _especially_ to newcomers. Principal offending example: `ob`. Compare with names like `filecache`.
 * It is impossible to determine at a glance which files are related to each other:
   Is `foreachvar.d` an AST node? What about `dcast.d`? (No and no.)
   What's the difference between `glue.d` and `gluelayer.d`?
   Are `visitor.d`, `transitivevisitor.d`, `strictvisitor.d`, `parsetimevisitor.d` and `permissivevisitor.d` a complete list of the public visitor modules? (No.)
   Which of `cond.d` and `staticcond.d` is the AST node for a static condition? What does the other file do? (`cond.d`; semantic analysis.)
   What files do semantic analysis? Which files declare AST nodes? Which files interface with the backend (and consequently are not part of LDC or GDC)?
   Where is DMD's entry point?


> snip example
> Fixing this is not easy, it requires a lot of hard thinking about what a module *really* needs to do. But each success at eliminating an import makes it more understandable.

Fixing the lack-of-directories issue only requires thinking about what a module _is_, i.e. what package it belongs to: driver/frontend (mars, errors, etc.), lexer group (lex, parse, tokens, etc.), ast, semantic analysis, backend interfacing, backend, root.

> Creating a false hierarchy (an implied relationship that is instantly defeated by the imports)

You cannot seriously tell me with a straight face that, e.g., the AST is not a hierarchy and should not be grouped together.

> of files won't fix [failure to encapsulate].

Indeed, it fixes a different problem, but it makes fixing the failure to encapsulate much easier.

> A good rule of thumb is:
>
>     *** Never import a file from an uplevel directory ***
>
> Import sideways and down, never up.

Indeed. However, you can't do much of that with just

>   dmd
>   dmd/root
>   dmd/backend


May 24, 2021
On 5/23/21 10:56 PM, Walter Bright wrote:
> I recently had to talk someone out of having dmd/backend import files from dmd/root.

One problem with that is code duplication. There are two types, OutBuffer in the frontend and Outbuffer in the backend, that are 95% identical, yet duplicated. Recent improvements (two distinct ones) will need to be duplicated in the other, which is clearly not a good way to go.

How to address this problem?

I think all of us looking to improve dmd's architecture would be well served by reading this book:

https://amazon.com/gp/product/0135974445/

Read it closely, cover to cover. A lot of the principles in that book are either applied with good results (sadly not as often as one would hope), or not applied, with the expected poor outcome, in dmd's codebase. For example, this:

> A good rule of thumb is:
> 
>     *** Never import a file from an uplevel directory ***
> 
> Import sideways and down, never up. 

is an approximate formulation of a subset of the Dependency Inversion Principle:

https://en.wikipedia.org/wiki/Dependency_inversion_principle
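As a rough illustration of how dependency inversion could apply to the duplicated OutBuffer: the code that emits bytes depends only on a small interface naming the operations it uses, and one concrete buffer implements that interface, so the emitting code never imports the buffer's module. `ByteSink`, `ArrayBuffer`, `emitPrologue` and `emitRet` are hypothetical names, and this is a sketch of the principle, not code taken from dmd.

```d
// Abstraction shaped by the code that needs it (here: code emission). It
// names only the operations the emitter uses.
interface ByteSink
{
    void put(ubyte b);
    void put(const(ubyte)[] bytes);
}

// "Backend" code: depends on ByteSink only, never on a concrete buffer type.
void emitPrologue(ByteSink sink)
{
    static immutable ubyte[] prologue = [0x55, 0x48, 0x89, 0xE5]; // push rbp; mov rbp, rsp
    sink.put(prologue);
}

void emitRet(ByteSink sink)
{
    sink.put(cast(ubyte) 0xC3); // x86 `ret`
}

// One concrete buffer implementing the abstraction. Whoever wires the
// compiler together picks it; the emitter above never refers to it.
final class ArrayBuffer : ByteSink
{
    ubyte[] data;
    void put(ubyte b) { data ~= b; }
    void put(const(ubyte)[] bytes) { data ~= bytes; }
}
```

With a single buffer behind one interface, improvements land once instead of being copied between two near-identical types.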
May 24, 2021
On 5/24/21 1:15 AM, Nicholas Wilson wrote:
> Indeed, it fixes a different problem, but it makes fixing the failure to encapsulate much easier.

I think the best first step is to add `private` to the codebase. This is cheap to get into and informs any future refactoring. I find it confusing that people push for massive reorganization for years, but won't bother to create 50-line PRs that add `private` appropriately.
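For reference, `private` at module scope in D means "visible only inside this module", so marking helpers this way documents which declarations other modules must not depend on (code in the same module, including its unittests, still sees them). A minimal sketch of a module with a small public surface and private internals; `TargetSketch`, `isSupportedPointerSize` and `makeTarget` are hypothetical names.

```d
// Public surface of a hypothetical target-information module.
struct TargetSketch
{
    private uint ptrSize; // other modules cannot read or write this field directly

    this(uint ptrSize) { this.ptrSize = ptrSize; }

    uint pointerSize() const { return ptrSize; }
}

// Module-private helper: invisible to importers, so it can be renamed or
// deleted without touching any other file.
private bool isSupportedPointerSize(uint size)
{
    return size == 4 || size == 8;
}

// Public factory that validates its input before constructing.
TargetSketch makeTarget(uint ptrSize)
{
    assert(isSupportedPointerSize(ptrSize), "unsupported pointer size");
    return TargetSketch(ptrSize);
}
```

Each `private` added this way is a promise the compiler enforces, which is what makes a later split or move of the module safe to attempt.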
May 23, 2021
On 5/23/2021 10:15 PM, Nicholas Wilson wrote:
> This is a _completely_ orthogonal problem.

It's the same problem.

D's support for modules and packages is literally designed around matching the hierarchy of the source files.

Shuffling files around accomplishes nothing when every module imports every other module.
May 24, 2021
On Monday, 24 May 2021 at 06:58:48 UTC, Walter Bright wrote:
> On 5/23/2021 10:15 PM, Nicholas Wilson wrote:
>> This is a _completely_ orthogonal problem.
>
> It's the same problem.
>
> Shuffling files around accomplishes nothing when every module imports every other module.

Did you read _literally nothing else_ that I wrote?

Let me quote myself again so that you don't miss it:

> The symptoms are completely orthogonal, although easily confused: failure of encapsulation makes _reasoning_ about the _interconnectedness_ of code difficult, failure to package makes _exploration_ and _enumeration_ of code (files, functions, classes, data structures) more difficult.

Putting the modules into packages fixes EXACTLY the problem of horrible experience with exploration and enumeration. It explicitly does not fix failure of encapsulation because it is a _completely_ orthogonal set of symptoms.

> D's support for modules and packages is literally designed around matching the hierarchy of the source files.

Yes, and?

May 24, 2021
On Monday, 24 May 2021 at 02:56:14 UTC, Walter Bright wrote:
> On 5/23/2021 7:25 PM, Nicholas Wilson wrote:
>> Directories.
>
> The #1 problem isn't directories, it's "every module imports every other module" that leaves one with nowhere to start.
>
> We currently have:
>
>   dmd
>   dmd/root
>   dmd/backend
>
> I regularly fend off attempts to have dmd/root import files from dmd, and dmd/backend import files from dmd. I recently had to talk someone out of having dmd/backend import files from dmd/root.
>
> In other words, a failure of encapsulation.
>
> Let's look at one example, picked more or less because I've looked at it recently: dmd/target.d. The reason for its existence is to abstract target information. Its imports are:
>
>   import dmd.argtypes_x86;
>   import dmd.argtypes_sysv_x64;
>   import core.stdc.string : strlen;
>   import dmd.cond;
>   import dmd.cppmangle;
>   import dmd.cppmanglewin;
>   import dmd.dclass;
>   import dmd.declaration;
>   import dmd.dscope;
>   import dmd.dstruct;
>   import dmd.dsymbol;
>   import dmd.expression;
>   import dmd.func;
>   import dmd.globals;
>   import dmd.id;
>   import dmd.identifier;
>   import dmd.mtype;
>   import dmd.statement;
>   import dmd.typesem;
>   import dmd.tokens : TOK;
>   import dmd.root.ctfloat;
>   import dmd.root.outbuffer;
>   import dmd.root.string : toDString;
>
> If I want to understand the code, I have to understand half of the rest of the compiler. On a more abstract level, why on earth would a target abstraction need to know about AST nodes? At least half of these imports shouldn't be here, and if they are, the code needs to be redesigned.
>
> Recently I needed some target information in the ImportC lexer, and it would have been so easy to just import dmd.target. But then that drags along all the imports that I've really tried to avoid importing into the lexer.
>
> Iain came up with a clever solution to use a template parameter.
>
> Note that Phobos suffers terribly from this disease (everything ultimately imports everything else), which makes it very hard to understand and debug.
>
> Fixing this is not easy, it requires a lot of hard thinking about what a module *really* needs to do. But each success at eliminating an import makes it more understandable.
>
> Creating a false hierarchy (an implied relationship that is instantly defeated by the imports) of files won't fix it.
>
> A good rule of thumb is:
>
>     *** Never import a file from an uplevel directory ***
>
> Import sideways and down, never up.

A good enhancement to the language would be adding some sort of module declaration that just states the admitted import packages or modules. I know that could be done by an external tool, but I feel that this one is a common problem.
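Until something like that exists in the language, the external-tool route could start as small as this D sketch: a per-module list of admitted packages checked against the module's import declarations. The regex-based extraction is a crude assumption (a real tool would rather use the compiler's own parser or dependency output), and `importedModules` and `disallowedImports` are hypothetical names.

```d
import std.algorithm.searching : any, startsWith;
import std.regex : matchAll, regex;

// Module names from `import a.b.c;` and `import a.b : x;` declarations.
// Deliberately naive: ignores comma-separated lists and renamed imports,
// and does not skip comments or string literals.
string[] importedModules(string source)
{
    string[] result;
    foreach (m; matchAll(source, regex(`\bimport\s+([A-Za-z_][A-Za-z0-9_.]*)`)))
        result ~= m[1];
    return result;
}

// Imports in `source` not covered by `admitted`; an entry such as "dmd.root"
// admits that module or package and everything below it.
string[] disallowedImports(string source, string[] admitted)
{
    string[] bad;
    foreach (mod; importedModules(source))
    {
        if (!admitted.any!(a => mod == a || mod.startsWith(a ~ ".")))
            bad ~= mod;
    }
    return bad;
}
```

Applied to a lexer that admits only `dmd.root` and `dmd.tokens`, an accidental `import dmd.expression;` would be reported, which is the kind of drift described above for dmd.target.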


May 24, 2021

On Monday, 24 May 2021 at 02:25:33 UTC, Nicholas Wilson wrote:

> On Sunday, 23 May 2021 at 06:12:30 UTC, Ola Fosheim Grøstad wrote:
>> The number one challenge I see is keeping track of DMD as it is released with new improvements. Basically, reapplying the changes made on the experimental branch on top of the main branch (aka "rebasing"?).
>
> (That is the correct terminology.) I suspect this is more of a problem for people who are less familiar with git, which may well include people wanting to play around with DMD, e.g. GSoC/SAoC students.
> I know this was the case for me while developing dcompute, with the added difficulty of tracking LLVM on top of LDC (which was kept in sync with DMD).

>> I suspect that kills many efforts, meaning people create a fork, start making changes, but then a new version of DMD is released and the fork is left to dry in the sun, as rebasing is not fun. And well, a hobby that isn't fun is not a good hobby. :-D
>
> The solution to this is better git skills, not so much better compiler skills/knowledge of DMD, although a merge conflict in a critical piece of code is always a PITA. We now have Slack/Discord for people to ask these kinds of questions, and I'm sure they will get answered if they are trying to do something interesting or fix an annoying problem.

I think I should have used the term "boring" rather than "challenging".

I doubt that git skills would solve it, as I think it is more related to what a hobby is to people who are older and have a very long spare-time to-do list. Any "unproductive" and "unfun" chore will go to the bottom of the to-do list. My I-really-ought-to-do list is so long that it could fill up the rest of my life...

So it is basically easier to just stay on an outdated dmd-branch for a couple of years, rather than keeping track of it... which is not a good strategy.

Think of it like this: I have 2-5 hours a week for completely unnecessary but fun things like hacking a new IR + optimization in between DMD and LLVM. So, what should I do: do my taxes, rebase my fork, or watch Eurovision with family? Rebasing is down there with taxes, except I have to do the taxes eventually, just not this Saturday... (OK, so we watch Eurovision then, just to find out how bad it is? :-)

I think it would not be too difficult to get to a situation where you have well-defined entry points, hooks and layers that make it more of a plugin experience.

Examples of potential plug-and-play:

  1. Add new experimental syntax: The parser is quite close. It would not take a lot of work to encapsulate a manager of (file-extension, Parser) pairs that has no overhead (compile time). So if you want to extend the language as an experiment, just duplicate the parser, modify it and plug it in. This is a low-hanging fruit.

  2. Add new semantics: add a new file with functions with custom intrinsics that are somehow added to the runtime, and use your custom parser to lower your custom syntax to these custom runtime functions. Inject yourself between the frontend and backend (assuming a high-level IR), pick up the custom intrinsics and do the analysis/transforms you want.

  3. Add a new high-level optimization, like ARC: same as 2, except you only add new passes in a new file and possibly some new fields to the high-level IR. Then edit a config file that makes the pass available and executed at the right time (with respect to other passes).
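Item 1 could start out as little more than a table keyed on file extension. A minimal D sketch, assuming all parsers can share one signature; `ParserRegistry`, `ParseFn`, `parseD`, `parseExperimental` and the `.dx` extension are made up for illustration, and a string stands in for the AST. It uses a runtime table for brevity, whereas the zero-overhead, compile-time version suggested above would resolve the mapping statically.

```d
import std.path : extension;

// A parser turns source text into an AST; a string stands in for the AST here.
alias ParseFn = string function(string source);

// Hypothetical manager of (file extension, parser) pairs.
struct ParserRegistry
{
    private ParseFn[string] byExtension;

    void register(string ext, ParseFn parser)
    {
        byExtension[ext] = parser;
    }

    // Dispatches on the file extension, e.g. ".d" or ".dx".
    string parseFile(string fileName, string source)
    {
        auto parser = extension(fileName) in byExtension;
        if (parser is null)
            throw new Exception("no parser registered for " ~ fileName);
        return (*parser)(source);
    }
}

// Stock parser and an experimental copy with modified syntax (both stubs).
string parseD(string source) { return "D-AST(" ~ source ~ ")"; }
string parseExperimental(string source) { return "DX-AST(" ~ source ~ ")"; }
```

An experiment then lives in its own file and a single `register` call, so upstream changes to the stock parser rarely conflict with it.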

So, the basic idea is, that instead of modifying the compiler, you add new files to it and bring them into the compiler by hooks, configuration files etc.

Then you can also much more easily merge and combine contributions from many different extension authors, and easily replace one extension with a better one.
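For item 3, "executed at the right time (with respect to other passes)" amounts to ordering passes by declared constraints. One way to sketch that in D, assuming each plugin pass simply lists the passes it must run after (as a config entry might), is Kahn's topological sort; `PassSpec`, `schedule` and the pass names in the usage note are hypothetical.

```d
// A plugin pass plus the names of the passes that must run before it.
struct PassSpec
{
    string name;
    string[] after;
}

// Orders passes so each runs after everything it lists (Kahn's algorithm);
// ready passes are taken first-in, first-out.
// Throws on unknown pass names and on cyclic constraints.
string[] schedule(PassSpec[] specs)
{
    size_t[string] pending;      // pass -> prerequisites not yet scheduled
    string[][string] dependents; // pass -> passes waiting on it
    foreach (s; specs)
        pending[s.name] = s.after.length;
    foreach (s; specs)
    {
        foreach (dep; s.after)
        {
            if (dep !in pending)
                throw new Exception("unknown pass: " ~ dep);
            dependents[dep] ~= s.name;
        }
    }

    string[] ready;
    foreach (s; specs)
        if (pending[s.name] == 0)
            ready ~= s.name;

    string[] order;
    while (ready.length > 0)
    {
        string next = ready[0];
        ready = ready[1 .. $];
        order ~= next;
        foreach (d; dependents.get(next, null))
        {
            if (--pending[d] == 0)
                ready ~= d;
        }
    }
    if (order.length != specs.length)
        throw new Exception("cyclic pass ordering");
    return order;
}
```

Declaring `dce` after `arc-opt` and `inline`, and `arc-opt` after `inline`, yields `inline, arc-opt, dce` regardless of the order the entries appear in, so independent extensions can add passes without editing a central list.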

> Urgh. Dealing with 10,000-line files and 1,000-line functions is such a drain on trying to get stuff done (looking at you, expressionsem.d). However, this needs to be combined with directories/packages, or it will not improve the situation.

Yes, but one can create virtual directories. E.g. in some editors you can group files from different directories so it looks like they are in one directory. You can do something similar with "ln -s", but it isn't optimal...

>> Which items are feasible in the next 6 months?
>
> Directories.

Sounds like a good start. I still think the high-level IR is the most pressing one, as not having that abstraction makes adding new experimental semantics too time-consuming for hobbyists.

I had the idea that I could do ARC by adding intrinsics to LLVM, but Apple engineers strongly advised against it and strongly suggested working on a high level IR instead.

ARC is something well suited for hobbyists, as you can implement it in a gradual manner if you have a high-level IR (one tweak here, one tweak there).
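To illustrate the kind of gradual "tweak" meant here: on an IR that still has explicit retain/release operations, one classic ARC optimization is dropping a retain that is immediately cancelled by a release of the same value. A toy D sketch; the three-opcode `Instr` IR and `eraseAdjacentRetainRelease` are invented for the example and far simpler than anything a real compiler would use.

```d
// Toy high-level IR: just enough opcodes to show one ARC tweak.
struct Instr
{
    enum Op { retain, release, call }
    Op op;
    string operand; // variable for retain/release, callee for call
}

// Drops `retain x` when the very next surviving instruction is `release x`:
// nothing runs in between, so the pair cannot be observed and is a no-op.
// Treating the output as a stack also removes properly nested pairs.
Instr[] eraseAdjacentRetainRelease(Instr[] code)
{
    Instr[] result;
    foreach (ins; code)
    {
        if (ins.op == Instr.Op.release && result.length > 0
            && result[$ - 1].op == Instr.Op.retain
            && result[$ - 1].operand == ins.operand)
        {
            result = result[0 .. $ - 1]; // cancel the retain, skip the release
            continue;
        }
        result ~= ins;
    }
    return result;
}
```

A pair separated by a call is left alone, since the callee could observe the count; relaxing that safely is exactly the next small tweak an IR like this would allow.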

Anyway, I think more experimentation is needed. Say, if 1 out of 10 experiments made it into the main dmd, then there could be more interesting options that would make dmd stand out in the crowd.

IMHO, the key challenge is to make experimentation fun for people who have limited time (which happens as you get older).

Imagine if D could get some of the people who were active with D 10-15 years ago, but currently have very limited time, to create their own experiments? I am sure that many of those have grown into capable programmers since then, so that could be something to think about.

It has to be a fun experience throughout for people to spend those 3-4 spare hours a week on compiler hacking.
