May 24, 2021

On Monday, 24 May 2021 at 10:21:44 UTC, Walter Bright wrote:

> That doesn't really help, the dependencies are still there.

It makes it clear what they are for, which makes this statement:

> If I want to understand the code, I have to understand half of the rest of the compiler.

obsolete.

> It is not critical that we fix target.d. It's just that it would be better if its API was not AST nodes, but just values. Let the caller construct the AST node from the information provided.

The majority of the API are values, but it still needs to be fed AST information in order to make informative decisions.

For instance, how else would we be able to infer isReturnOnStack without a TypeFunction? Even GDC needs the completed TypeFunction, as I generate a tree on-the-fly and pass that to GCC's back-end API to get said information.
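
To make the point concrete, here is a rough sketch of the two API shapes being discussed (hypothetical declarations, not dmd's actual code):

```d
// Hypothetical sketch, for illustration only; these are not dmd's declarations.
// TypeFunction stands in for the semantic AST node describing a function type.
class Type { /* field layout, linkage, and so on */ }

class TypeFunction
{
    Type returnType;
    Type[] parameterTypes;
    bool isVariadic;
}

// The "values only" shape: the caller pre-digests the signature into plain
// data. The difficulty is deciding which values, since the answer can depend
// on the return type's full field layout and calling convention.
bool isReturnOnStackFromValues(size_t returnTypeSize, bool returnsAggregate);

// The AST-fed shape discussed above: the target inspects the completed
// TypeFunction (GDC lowers it to a GCC tree and asks GCC's back end).
bool isReturnOnStack(TypeFunction tf, bool needsThis);
```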

> Like what we did for the C parser. I was happy to have it not indirectly import everything in dmd when all it needed was a couple values.

I'm not saying any of this is easy.

Target's first goal of removing all global.params.isXXX fields was never going to be easy either. :-)

May 24, 2021
On Monday, 24 May 2021 at 10:34:35 UTC, Walter Bright wrote:
> On 5/24/2021 2:44 AM, Alexandru Ermicioi wrote:
>> They are not simple for new volunteers to dmd.
>
> You're right, they are not. They're optimized for the people who spend thousands of hours working on it.
>
> This inevitably happens with every profession, every discipline, and every project. A jargon specific to it grows up around it, for the convenience of the people who work on it every day. If the jargon is consistent and reasonably logical, it can be a great aid to understanding once one gets familiar with it.

Well, there is no dictionary for those abbreviations, and it is hard to decipher them when looking at kilometer-long code. That is my experience with dmd code:
1. I stumble on a compiler bug.
2. I file a bug report.
3. No one fixes it in a couple of days, so I think perhaps I can fix the bug myself, since it's not complicated and should be a couple of lines.
4. I download dmd and try to compile it somehow, because the dub build either froze or failed, but I eventually manage to by using the older build system dmd had.
5. Then, finally, I can start changing code?
6. No, first I have to find which module and class are responsible for that code, in an ocean of modules named after an ocean of abbreviations, or with misleading names.
7. Oh well, after wasting an hour or two of the three to four you have, you find it.
8. Then you look into the kilometer-long function. You seem to find the piece of code that might be the cause of the bug, and try to understand it better.
9. You read said code, trying to keep in mind the entire code flow you've read up to this point, and suddenly there is an 'aa'.
10. You try to figure out what 'aa' means, but fail, so you have to look at its declaration to learn its type and figure it out from that.
11. You find the variable's type and rejoice at deciphering 'aa' as 'associative array', yay.
12. Okay, let's go back to the line with 'aa'.
13. First, find said line again if for some reason your IDE didn't retain it.
14. Once there, you continue reading, but wait, what came before the line with 'aa'?
15. Damn, I forgot. Sigh, I have to read all the code again.


That is my experience with all the abbreviations in dmd, which are like an ocean.
It is OK to have a couple of well-defined and documented abbreviations, but not an ocean of them without any documentation. It is not my job to fix dmd; I wanted to do something when I had a couple of hours to invest. It is not rewarding when those couple of hours are wasted on deciphering abbreviations, without even getting to understand the flow of the code itself.

Please limit the use of abbreviations to a minimum, and document the ones that are used.
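
As a purely illustrative fragment (not actual dmd code), the difference a name makes at the point of use:

```d
// Illustrative only: the same associative array with an abbreviated name
// versus a descriptive one.
int[string] aa;              // the reader has to hunt for the declaration
int[string] lineCountByFile; // the intent is visible right where it is used
```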

>
> There are some reasonably well-encapsulated parts. The lexer, the parser, and the files in the root package. To understand the compiler, I'd start there.

Yet there is no official guidance on where to start. Also, please note that not all volunteers prefer reading source code and investing hours in understanding the architecture and inner workings, starting from the lexer or parser; some of them just want to fix a small bug and be done with it. That is extremely hard to do now.

Best regards,
Alexandru.


May 24, 2021

We need big changes.
We need a todo list (ordered by importance).
We need to split big files into directories.
Small refactoring is useless; big changes are necessary.
We should separate the stable parts of a big file from the unstable parts, and divide it into small files.
Going by the dependencies, start the changes from the most dependent parts.
Interfaces and function names need not change; only the organization has to change.
Nobody reads functions that are thousands of lines long.
No one reads >100kb source files, because they are too large.
We would just split up the large files, not modify the function implementations, because modifying the implementations is where mistakes are most likely.

May 24, 2021

On Monday, 24 May 2021 at 10:34:35 UTC, Walter Bright wrote:

> On 5/24/2021 2:44 AM, Alexandru Ermicioi wrote:
>> They are not simple for new volunteers to dmd.
>
> You're right, they are not. They're optimized for the people who spend thousands of hours working on it.
>
> This inevitably happens with every profession, every discipline, and every project. A jargon specific to it grows up around it, for the convenience of the people who work on it every day. If the jargon is consistent and reasonably logical, it can be a great aid to understanding once one gets familiar with it.
>
> Unfortunately, I have failed at my original design goal of making DMD a simple compiler. Reshuffling files around and renaming things will not help. What will help is better encapsulation - unfortunately, that is hard to do.
>
> There are some reasonably well-encapsulated parts. The lexer, the parser, and the files in the root package. To understand the compiler, I'd start there.

I seriously question the "optimized for the people who spend thousands of hours working on it" line, as I had a very intelligent person post on Slack asking what a particular function does, because there are no comments for said functions.

-Alex

May 24, 2021

On Monday, 24 May 2021 at 10:47:16 UTC, Johan Engelen wrote:

> My standpoint on the original topic of "make it easier to experiment with the compiler": I disagree with making the code more stable. If anything, we should be refactoring much more aggressively, changing function names etc.

Thank you for bringing us back on topic. Yes, or at least have a map of what is considered stable and well encapsulated and what is considered unstable and likely to change.

I don't believe this is a matter for git rebasing tooling/understanding. I just don't want to build directly on top of something that looks like it is likely to change (from a software engineering point of view).

I consider every hour spent on rebasing, dealing with regressions, etc., to be a loss, or more importantly "not fun". I only want to do "not fun" things if I can learn something from them.

D has to rely on hobbyists, so getting "not fun"/"no learning potential" out of the way is important.

> others too). The frontend source code is not nice, but I'm not drawn to fix it at all (even if paid for) because I am not ashamed by it as I would be if I would have some shared 'ownership' of it.

That is a bit harsh; of course, all code bases that have evolved over a long time have parts that are not nice, parts of LDC too.

Anyway, my main wish is just to be able to inject my own IR between the frontend and backend.

My feeling right now is that to do that I have to choose LDC and then heavily modify it. I sense that in the end I basically will end up with my own backend, something I don't want to maintain...

Think of it like LEGOs. The front end is a green brick and the back end a red brick. I want to insert a white brick between them. I don't want to modify the bricks more than "cleaning" the studs.

Another analogy: if the frontend is an engine, my IR is the transmission, and the backend is the wheels, then I don't mind that the current engine is oily and greasy; I leave that to other mechanics to clean up. Same with the wheels. I just want to be an expert on the transmission and evolve it from a manual transmission into a nice automatic transmission. Right now the engine is coupled directly to the wheels... which basically means being forced to drive in the same gear all the time.

I am less interested in getting my fingers greasy and am happy to leave that to others as long as I can focus on polishing the chrome on my transmission line...

(I believe many things could be done with an intermediary high-level IR, such as ARC, stackless coroutines, heap optimizations... LLVM is too low-level. The AST is too cumbersome.)
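
A minimal sketch of the kind of seam being asked for (hypothetical names, not an existing dmd/LDC API): the frontend hands a lowered module to zero or more IR passes before the backend ever sees it.

```d
// Hypothetical plug-in point between frontend and backend, for illustration only.
struct IRModule
{
    // Stand-in for whatever the mid-level representation would carry.
    string name;
}

interface HighLevelIRPass
{
    // Transform the frontend's lowered representation in place.
    void run(ref IRModule m);
}

// The ARC insertion, coroutine lowering, etc. mentioned above would live in
// classes like this, without touching the frontend or the backend.
class NoOpPass : HighLevelIRPass
{
    void run(ref IRModule m) { /* rewrite m here */ }
}

void compile(IRModule m, HighLevelIRPass[] passes)
{
    foreach (p; passes)
        p.run(m);
    // ...then hand m to the backend (DMD, LLVM, or GCC).
}
```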

May 24, 2021

On Monday, 24 May 2021 at 14:37:45 UTC, Ola Fosheim Grøstad wrote:

> I sense that in the end I basically will end up with my own backend, something I don't want to maintain...

I think you will end up with your own compiler :)

May 24, 2021

On Monday, 24 May 2021 at 12:38:59 UTC, zjh wrote:

> We need big changes.
> We need a todo list (ordered by importance).
> We need to split big files into directories.
> Small refactoring is useless; big changes are necessary.
> We should separate the stable parts of a big file from the unstable parts, and divide it into small files.
> Going by the dependencies, start the changes from the most dependent parts.
> Interfaces and function names need not change; only the organization has to change.
> Nobody reads functions that are thousands of lines long.
> No one reads >100kb source files, because they are too large.
> We would just split up the large files, not modify the function implementations, because modifying the implementations is where mistakes are most likely.

100 kB is, let's say, 2500 SLOC (or rather 1500 from the D-Scanner point of view); that's not too crazy. Many DMD source files are big because they contain a visitor.
Visitors can't be split across several files. Often you are only actually interested in a single method of a visitor, so the overall size of the source file does not matter.

Eventually what could be done for the biggest visitor methods is to extract parts of their content into several non-nested free functions, so that low-level implementation details, like control loops, are no longer visible and instead you just see do_this; do_that; with just a few, necessarily unavoidable, flow statements.

The problem is that extracting and splitting the content would be tedious because of the decade of more or less well-organized patchwork added to fix bugs.
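
Roughly the shape such an extraction might take (illustrative, hypothetical names, not actual dmd code):

```d
// Illustrative only: the body of a big visitor method reduced to calls into
// small, non-nested free functions.
class ForStatement { /* fields elided */ }

void checkLoopCondition(ForStatement s) { /* previously inline control flow */ }
void lowerIncrement(ForStatement s)     { /* previously inline control flow */ }
void rewriteBody(ForStatement s)        { /* previously inline control flow */ }

void visitForStatement(ForStatement s)
{
    // The low-level loops and branches live in the helpers above;
    // the visitor method itself now reads as do_this; do_that.
    checkLoopCondition(s);
    lowerIncrement(s);
    rewriteBody(s);
}
```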

PS: backticks are for inline code; to emphasize text, surround it with pairs of stars or pairs of underscores instead.

May 24, 2021

On Monday, 24 May 2021 at 19:42:00 UTC, user1234 wrote:

> On Monday, 24 May 2021 at 12:38:59 UTC, zjh wrote:
>> [...]
>
> 100 kB is, let's say, 2500 SLOC (or rather 1500 from the D-Scanner point of view); that's not too crazy. Many DMD source files are big because they contain a visitor.
> Visitors can't be split across several files. Often you are only actually interested in a single method of a visitor, so the overall size of the source file does not matter.
>
> Eventually what could be done for the biggest visitor methods is to extract parts of their content into several non-nested free functions, so that low-level implementation details, like control loops, are no longer visible and instead you just see do_this; do_that; with just a few, necessarily unavoidable, flow statements.

Actually, the visitors have been slowly getting converted into nested functions and a switch table.
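
The rough shape of that conversion (hypothetical types, not dmd's actual declarations): one free function, a switch over the node kind, and nested functions in place of Visitor overrides.

```d
// Illustrative only: what "nested functions and a switch table" means here.
enum NodeKind { add, call }

struct Node { NodeKind kind; }

void nodeSemantic(Node n)
{
    void visitAdd()  { /* was Visitor.visit(AddExp); operates on n */ }
    void visitCall() { /* was Visitor.visit(CallExp); operates on n */ }

    final switch (n.kind)
    {
        case NodeKind.add:  visitAdd();  break;
        case NodeKind.call: visitCall(); break;
    }
}
```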

May 24, 2021

On Monday, 24 May 2021 at 10:47:16 UTC, Johan Engelen wrote:

> [...]
>
> My standpoint on the original topic of "make it easier to experiment with the compiler": I disagree with making the code more stable. If anything, we should be refactoring much more aggressively, changing function names etc.

Yes. It's easier to understand shallow trees with modest leaves than arbitrary graphs with 1000+ LOC "leaves". Getting there will take some work. Fortunately, it looks like much of that work can be done "bottom up" i.e. incrementally.

When simplifying code, readability is a commonly applied metric. How long does it take for an intelligent but "outside" developer to understand the code? Another useful metric is the degree of dynamic dependence: could this code run in parallel? If not, why not?

Examining the ability to run in parallel can also be done "bottom up", and is at least as valuable for simplification/correctness as it is for parallel speedup potential. That said, a taskification that followed our, sometimes extreme, code expansion contours could yield speedups that coarser approaches to multi-threading do not. It could also bring vibe-style sanity in place of manually managed asynchrony where the dependencies are carried in your head.

When looking to foster task independence, building around dependency graphs which are immutable/committed in the interior and expanding/mutating/synchronizing at the frontier is one way to go. (`__traits(compiles, ...)` is interesting in this context...) The SDC people will have other ideas/experience to share if taskification becomes a thing.
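
For anyone unfamiliar with the trait mentioned above, a minimal example of what it does:

```d
// __traits(compiles, ...) speculatively asks "would this expression compile
// here?", which is exactly the kind of query that pulls extra semantic
// dependencies into whatever task is evaluating it.
static assert(__traits(compiles, 1 + 1));          // true: a valid expression
static assert(!__traits(compiles, undefinedName)); // false: unknown symbol
```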

Finally, my thanks again to the current front end crew and the LDC/dcompute crew. The tool chain may not be perfect but, boy, it's way better than falling back to C++/CUDA.

May 24, 2021

On Monday, 24 May 2021 at 15:16:34 UTC, sighoya wrote:

> On Monday, 24 May 2021 at 14:37:45 UTC, Ola Fosheim Grøstad wrote:
>> I sense that in the end I basically will end up with my own backend, something I don't want to maintain...
>
> I think you will end up with your own compiler :)

I think we need to learn from Apple and Microsoft; they are doing well not only because of resources, but because they let people be specialists in certain aspects of the compiler. D has people who have specialized in the GC and in LLVM, but it isn't a deliberate strategy... yet.

Building a racing car is not a one-man project...