Purity, memoization and parallelization of dmd
July 16, 2020
What's the status/progress on making dmd (completely) pure?

Is this task on somebody's agenda? If so, are there any big obstacles that currently have no clear solution, or is it just a very large pile of small ones?

And, in the long run, will a pure compiler (finally) enable caching/memoization of, for instance, template instantiations/ctfe-evaluations and, perhaps further into the future, parallelization of the compiler?
July 16, 2020
On Thursday, 16 July 2020 at 18:21:11 UTC, Per Nordlöw wrote:
> What's the status/progress on making dmd (completely) pure?
>
> Is this task on somebody's agenda? If so, are there any big obstacles that currently have no clear solution, or is it just a very large pile of small ones?
>
> And, in the long run, will a pure compiler (finally) enable caching/memoization of, for instance, template instantiations/ctfe-evaluations and, perhaps further into the future, parallelization of the compiler?

DMD uses mutable state for basically everything, so I don't think it is likely to ever be completely pure. I believe there is an ongoing effort to make individual functions pure when possible, though I'm not sure how much progress is being made.

Template instantiations are already cached (which actually causes buggy behavior, because they are not quite pure [1]).

[1] https://issues.dlang.org/show_bug.cgi?id=19458
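To make the caching concrete, here is a minimal sketch (not taken from dmd's source; `Squared` is just a made-up template): the compiler evaluates a template body once per unique argument list, and later uses refer to the same cached instance.

```d
// The body of Squared is analyzed once for the argument 4; the second
// use below refers to the same cached instance rather than re-running
// the instantiation.
template Squared(int n)
{
    enum Squared = n * n;
}

static assert(Squared!4 == 16);
enum alsoSixteen = Squared!4; // reuses the cached Squared!4 instance
```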
July 16, 2020
On Thursday, 16 July 2020 at 18:21:11 UTC, Per Nordlöw wrote:
> What's the status/progress on making dmd (completely) pure?
>
> Is this task on somebody's agenda? If so, are there any big obstacles that currently have no clear solution, or is it just a very large pile of small ones?
>
> And, in the long run, will a pure compiler (finally) enable caching/memoization of, for instance, template instantiations/ctfe-evaluations and, perhaps further into the future, parallelization of the compiler?

Natural obstacles with possible solutions are:

- Global variables (of course), which should be stored in structs and/or classes
- Debug printing (fixed by prefixing printf calls with `debug`; see the sketch after this list)
- File I/O, which can be wrapped in (fake-)pure input and output ranges that are lazily or eagerly forwarded to stdout and stderr
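To illustrate the debug-printing point: `debug` statements are exempt from purity checking in D, so a `pure` function can keep its diagnostic printf's. A minimal sketch (`addChecked` is a made-up example function):

```d
import core.stdc.stdio : printf;

// The debug statement may call the impure printf even though the
// function is pure; the output is only enabled when compiling with
// -debug.
int addChecked(int a, int b) pure
{
    debug printf("adding %d and %d\n", a, b);
    return a + b;
}
```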


Have any of the alternative (experimental) D compilers been written with these things in mind? I recall there being an experimental D compiler that is completely lazy.
July 16, 2020
On Thursday, 16 July 2020 at 18:55:23 UTC, Paul Backus wrote:

> Template instantiations are already cached (which actually causes buggy behavior, because they are not quite pure [1]).
>
> [1] https://issues.dlang.org/show_bug.cgi?id=19458

Good that you've posted this here.
I would have overlooked it otherwise.
This is a critical bug.

July 20, 2020
On Thursday, 16 July 2020 at 18:21:11 UTC, Per Nordlöw wrote:
> What's the status/progress on making dmd (completely) pure?
>
> Is this task on somebody's agenda? If so, are there any big obstacles that currently have no clear solution, or is it just a very large pile of small ones?
>
> And, in the long run, will a pure compiler (finally) enable caching/memoization of, for instance, template instantiations/ctfe-evaluations and, perhaps further into the future, parallelization of the compiler?

I don't think making the compiler parallel is particularly important since that should be handled at the build-system level (and if you use reggae, already is).
July 20, 2020
On Monday, 20 July 2020 at 10:39:08 UTC, Atila Neves wrote:
> On Thursday, 16 July 2020 at 18:21:11 UTC, Per Nordlöw wrote:
>> [...]
>
> I don't think making the compiler parallel is particularly important since that should be handled at the build-system level (and if you use reggae, already is).

When using meta-programming it is possible to build huge monolithic chains of dependencies which can't be broken up.

In that case using reggae doesn't help.
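For example (a contrived sketch, not from any real project), a recursive template chain like the one below must be instantiated entirely within one compiler invocation, so no build system can split the work across processes:

```d
// Each instance depends on the previous one, forming one long chain
// that cannot be broken up or parallelised externally.
template Chain(int n)
{
    static if (n == 0)
        enum Chain = 0;
    else
        enum Chain = Chain!(n - 1) + 1;
}

enum depth = Chain!200; // one monolithic instantiation chain
```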
July 20, 2020
On Monday, 20 July 2020 at 10:39:08 UTC, Atila Neves wrote:
> On Thursday, 16 July 2020 at 18:21:11 UTC, Per Nordlöw wrote:
>> [...]
>
> I don't think making the compiler parallel is particularly important since that should be handled at the build-system level (and if you use reggae, already is).

Build-system-level parallelism usually implies separate compilation (especially in the C++ world). However, if you're building at package-level granularity, in-compiler parallelism could actually be quite useful.
If you have a package where many of its modules import each other, module-level separate compilation can be quite inefficient. For example, if a module has an immutable variable that is the result of an expensive CTFE calculation, with separate compilation you would end up repeating the calculation every time this module is imported. With package-level compilation, it would be calculated only once.
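For instance (a hypothetical module, not from any real project), an initializer like the one below is forced through CTFE, and with module-by-module separate compilation every compiler invocation that processes the module repeats the work:

```d
module config; // hypothetical module imported by many others

// Deliberately expensive at compile time: naive recursive Fibonacci.
ulong fib(ulong n) pure
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Evaluated via CTFE by every compilation unit that compiles this
// module; package-level compilation would do it only once.
immutable ulong expensive = fib(25);
```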

I'd also say that build-system-level caching leaves a lot to be desired. At work we use various languages and frameworks where the compiler runs as a daemon process, listening for changes and then recompiling only the parts of the program that changed. How big the parts are depends on the compiler implementation - it could be file granularity, function granularity, or even statement/expression granularity.

C#:
https://github.com/dotnet/roslyn/wiki/EnC-Supported-Edits
https://joshvarty.com/2016/04/18/edit-and-continue-part-1-introduction/
https://joshvarty.com/2016/04/21/edit-and-continue-part-2-roslyn/

TS:
https://github.com/microsoft/TypeScript/wiki/Using-the-Compiler-API#incremental-build-support-using-the-language-services
https://github.com/microsoft/TypeScript/wiki/Using-the-Language-Service-API

Dart / Flutter:
https://flutter.dev/docs/development/tools/hot-reload
https://github.com/dart-lang/sdk/wiki/Hot-reload

Rust:
https://github.com/rust-lang/rfcs/blob/master/text/1298-incremental-compilation.md
https://blog.rust-lang.org/2016/09/08/incremental.html
https://internals.rust-lang.org/t/incremental-compilation-beta/4721
https://github.com/rust-lang/rust/issues/57968
https://blog.mozilla.org/nnethercote/2020/04/24/how-to-speed-up-the-rust-compiler-in-2020/
July 21, 2020
On Monday, 20 July 2020 at 12:58:39 UTC, Petar Kirov [ZombineDev] wrote:
> On Monday, 20 July 2020 at 10:39:08 UTC, Atila Neves wrote:
>> [...]
>
> Build-system-level parallelism usually implies separate compilation (especially in the C++ world).

Yes.

> However, if you're building at package-level granularity, in-compiler parallelism could actually be quite useful.


Yes.

> If you have a package where many of its modules import each other, module-level separate compilation can be quite inefficient.

Yes.

> For example, if a module has an immutable variable that is the result of an expensive CTFE calculation, with separate compilation you would end up repeating the calculation every time this module is imported. With package-level compilation, it would be calculated only once.

Correct. Which is why reggae defaults to building per package.

> I'd also say that build-system-level caching leaves a lot to be desired. At work we use various languages and frameworks where the compiler runs as a daemon process, listening for changes and then recompiling only the parts of the program that changed. How big the parts are depends on the compiler implementation - it could be file granularity, function granularity, or even statement/expression granularity.

That is my dream for D. If the compiler *is* the build system, then sure, parallelise the compiler. Currently, I don't see the point of even trying.

July 21, 2020
On Tuesday, 21 July 2020 at 11:37:16 UTC, Atila Neves wrote:
> [..]
>
> That is my dream for D. If the compiler *is* the build system, then sure, parallelise the compiler. Currently, I don't see the point of even trying.

In one of the web technologies we use at work, the compiler is used as a library by the build system to build a dependency graph (based on the imports) of all code and non-code assets. There is then a declarative way to describe the transformations (compilation, minification, media encoding, etc.) that need to be done on each part of the project. The linking step (like in C/C++) is implicit: it's as if you invoke the linker, which works backwards to figure out that, in order to link dependencies in the form of libraries A and B, it first needs to compile them with compilers X and Y.


July 21, 2020
On Tuesday, 21 July 2020 at 13:29:55 UTC, Petar Kirov [ZombineDev] wrote:
> On Tuesday, 21 July 2020 at 11:37:16 UTC, Atila Neves wrote:
>> [..]
>>
>> That is my dream for D. If the compiler *is* the build system, then sure, parallelise the compiler. Currently, I don't see the point of even trying.
>
> In one of the web technologies we use at work, the compiler is used as a library by the build system to build a dependency graph (based on the imports) of all code and non-code assets. There is then a declarative way to describe the transformations (compilation, minification, media encoding, etc.) that need to be done on each part of the project. The linking step (like in C/C++) is implicit: it's as if you invoke the linker, which works backwards to figure out that, in order to link dependencies in the form of libraries A and B, it first needs to compile them with compilers X and Y.

This increases the coupling between those toolchain projects, but in the end it works pretty well for end users like us. Of course, the compiler can still be used from the command line and we could use regular build systems like Make, but we would lose a lot if we went back to those "archaic" ways :D