Adding ccache-like output caching to dmd

Thread overview:
- Per Nordlöw, Dec 28, 2020
- Max Haughton, Dec 29, 2020
- Stefan Koch, Dec 29, 2020
- John Colvin, Dec 29, 2020
- Per Nordlöw, Dec 29, 2020
- John Colvin, Dec 29, 2020
- John Colvin, Dec 29, 2020
- drug, Dec 30, 2020
- Per Nordlöw, Dec 30, 2020
- Per Nordlöw, Dec 29, 2020
- Per Nordlöw, Dec 29, 2020
- Per Nordlöw, Dec 29, 2020
- Ali Çehreli, Dec 29, 2020
- Johan, Dec 29, 2020
December 28, 2020
Has anyone considered integrating ccache-like caching of output files into `dmd`, indexed by digests based on:

- environment variables,
- process arguments (which, in turn, decide the set of input files),
- input file contents (including imported files detected during the first uncached compile),
- the dmd compiler binary's fingerprint,
- ...and probably something more I've missed.

The initial call would store that list alongside the content hash and the resulting binary (or binaries).
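A minimal Python sketch of such a digest, assuming the caller has already read the compiler executable and every input file into memory. Which environment variables should count is left open in the proposal, so `env` is simply whatever subset the caller passes in; `cache_key` and `_feed` are hypothetical names, not anything that exists in dmd.

```python
import hashlib


def _feed(h, label: str, data: bytes) -> None:
    # Tag and length-prefix each field so that, e.g., args ["ab", "c"]
    # and ["a", "bc"] cannot produce the same byte stream.
    h.update(label.encode() + b"\0")
    h.update(len(data).to_bytes(8, "little"))
    h.update(data)


def cache_key(compiler_binary: bytes, args: list[str],
              env: dict[str, str], inputs: dict[str, bytes]) -> str:
    """Digest over the inputs listed in the proposal.

    compiler_binary: raw bytes of the compiler executable (its fingerprint).
    args:            command-line arguments, order preserved.
    env:             the environment variables deemed relevant (an
                     assumption: the proposal does not say which ones).
    inputs:          path -> contents of every input file, including
                     imports discovered on the first uncached compile.
    """
    h = hashlib.sha256()
    _feed(h, "compiler", hashlib.sha256(compiler_binary).digest())
    for arg in args:                  # argument order can matter
        _feed(h, "arg", arg.encode())
    for name in sorted(env):          # environment order must not matter
        _feed(h, "env-name", name.encode())
        _feed(h, "env-value", env[name].encode())
    for path in sorted(inputs):       # file order must not matter
        _feed(h, "path", path.encode())
        _feed(h, "content", inputs[path])
    return h.hexdigest()
```

Any change to the compiler bytes, an argument, a selected variable, or a file's contents yields a different key, which would then index the stored object files.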

If not, would anyone have any strong objections to adding this?
December 29, 2020
On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on
> [...]
>
> If not, would anyone have any strong objections against adding this?

If it's implemented in a sensible manner, I don't see why not. My only worry would be that dmd code tends to be a weird blend of C, C++, and Java; if the cache is properly encapsulated in a way that compartmentalizes the things that can go wrong, then go for it.
December 29, 2020
On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on
> [...]

The issue is that, because of string imports, you don't know the full set of files you are depending on, which means any change can cause any file to be required.
December 29, 2020
On 12/28/20 3:14 PM, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on

Related: https://forum.dlang.org/post/r812of$11n7$1@digitalmars.com

Ali

December 29, 2020
On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
> On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
>> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on
>> [...]
>
> The issue is that because of string imports you don't know the full set of files you are depending on.
> which means any change can cause any file to be required.

In general, it's unknown which files a given D build depends on until after the build has (mostly) happened. This is true for string imports, but also for regular imports.

Conceptually, we split the inputs into:

Y: inputs knowable only after compilation is done (set of the contents of all imported files, string or code)
X: inputs known ahead of time (e.g. the command line flags to DMD).

Object files are O.

The set of names of the files containing Y is referred to as S.

The compiler is then a pure function F(X, Y) -> O.

A real compiler invocation is C(X, [Y]) -> O, where [Y] means Y is implicit.

But the compiler can give us S, so we can instead say the compiler is C(X, [Y]) -> (O, S).

The only way S will change is if X or Y change.


It (roughly :-p ) follows that we can build a persistent nested map Hash(X) -> ((S, Hash(Y)) -> O).

We calculate Hash(X) before compiling and look it up in the map to get (S, Hash(Y)). If it's not there, you need to recompile and store a new entry in the outer map. If it is, read all the files in S and use them to calculate Hash(Y)'; if Hash(Y)' == Hash(Y), proceed to get O, otherwise recompile and store a new entry in the inner map.

Or something like that, you get the idea... It's not intractable; it's just a bit fiddly.
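The scheme above can be sketched in Python as an in-memory two-level map. This is an illustration under assumptions, not dmd code: `compile_fn` is a hypothetical stand-in for a real invocation C(X, [Y]) -> (O, S), `read_file` is a hypothetical reader returning `None` for a file that no longer exists, and X is modelled as a tuple of strings (e.g. the command-line flags).

```python
import hashlib
from typing import Callable, Optional

# Hypothetical signatures for this sketch:
#   compile_fn(x) -> (object_bytes, deps)   models C(X, [Y]) -> (O, S)
#   read_file(path) -> bytes or None        None = file no longer exists
CompileFn = Callable[[tuple[str, ...]], tuple[bytes, tuple[str, ...]]]
ReadFn = Callable[[str], Optional[bytes]]


def _digest(parts) -> str:
    h = hashlib.sha256()
    for p in parts:  # length-prefix each part to keep boundaries unambiguous
        h.update(len(p).to_bytes(8, "little"))
        h.update(p)
    return h.hexdigest()


def hash_x(x: tuple[str, ...]) -> str:
    return _digest(a.encode() for a in x)


def hash_y(s: tuple[str, ...], read_file: ReadFn) -> Optional[str]:
    """Hash(Y): names and contents of every file in S; None if one vanished."""
    parts = []
    for path in s:
        data = read_file(path)
        if data is None:
            return None
        parts += [path.encode(), data]
    return _digest(parts)


class TwoLevelCache:
    """Hash(X) -> {(S, Hash(Y)) -> O}, kept in memory for the sketch."""

    def __init__(self) -> None:
        self.outer: dict[str, dict[tuple[tuple[str, ...], str], bytes]] = {}
        self.compiles = 0  # counts real compiler invocations

    def build(self, x: tuple[str, ...], compile_fn: CompileFn,
              read_file: ReadFn) -> bytes:
        inner = self.outer.setdefault(hash_x(x), {})
        # Outer hit: re-hash each recorded S and compare with its Hash(Y).
        for (s, stored_hy), obj in inner.items():
            if hash_y(s, read_file) == stored_hy:
                return obj
        # Miss at either level: recompile, learn S, store a new inner entry.
        self.compiles += 1
        obj, s = compile_fn(x)
        s = tuple(sorted(s))
        hy = hash_y(s, read_file)
        if hy is not None:
            inner[(s, hy)] = obj
        return obj
```

A real implementation would persist the map on disk and evict stale entries. Note also that Hash(Y) is computed after compiling here, so a file edited while the compiler is running could be recorded with the wrong hash.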
December 29, 2020
On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on
> [...]

Or we could just use Nix [1] (TL;DR version: [2]) :P

That said, Nix deals mostly with high-level caching and won't help with incremental compilation.

Check out the previous efforts in this area: [3] [4]

[1]: https://edolstra.github.io/pubs/phd-thesis.pdf
[2]: https://nixos.org/guides/how-nix-works.html
[3]: https://www.youtube.com/watch?v=WHb7y3JYEBQ
[4]: https://github.com/dlang/dmd/pull/7843
December 29, 2020
On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
> On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
>> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on
>> [...]
>
> The issue is that because of string imports you don't know the full set of files you are depending on.
> which means any change can cause any file to be required.

If we pass the complete set of files (instead of relying on [string] import paths, which are not very precise), this is definitely doable.

Sure, the developer "experience" would be a bit clumsier, but that's not a big deal either: a wrapper tool could first compile your code with `dmd -i -makedeps` [1], save the currently known set of files, and then use that set for incremental compilation.

[1]: coming soon: https://github.com/dlang/dmd/pull/12049
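A wrapper along those lines would need to turn the dependency output into a file list. The sketch below assumes that output is a Make-style rule (`target: dep1 dep2 \` with backslash-newline continuations); the exact format of the then-pending switch is an assumption here, and paths containing spaces, a `:` (e.g. Windows drive letters), or Make escapes are not handled. Both function names are hypothetical.

```python
def parse_make_deps(text: str) -> dict[str, list[str]]:
    """Map each Make target to its whitespace-separated prerequisites.

    Assumed input shape (hypothetical for the dependency output):
        app.o: \\
          app.d \\
          views/banner.txt
    """
    # Normalise line endings, then join backslash-newline continuations.
    text = text.replace("\r\n", "\n").replace("\\\n", " ")
    rules: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                    # skip blank lines and comments
        target, sep, prereqs = line.partition(":")
        if not sep:
            continue                    # not a rule line
        rules.setdefault(target.strip(), []).extend(prereqs.split())
    return rules


def files_to_track(rules: dict[str, list[str]]) -> list[str]:
    """Union of all prerequisites: the file set the wrapper would save."""
    return sorted({dep for deps in rules.values() for dep in deps})
```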
December 29, 2020
On Tuesday, 29 December 2020 at 17:41:49 UTC, Petar Kirov [ZombineDev] wrote:
> On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
>> On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
>>> [...]
>>
>> The issue is that because of string imports you don't know the full set of files you are depending on.
>> which means any change can cause any file to be required.
>
> [...]

Edit: What John Colvin said :D
December 29, 2020
On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on
> [...]

FWIW, I feel this is much better handled by a build system that invokes the compiler, and not by the compiler itself. Handling the build environment, input/intermediate/output files (timestamps, interdependencies, etc.), invoking (or caching) the substep tools, and so on are core tasks of a build system. Caching would add a lot of non-core-task complexity to a compiler.

The more specific task of optimization and machine-code generation is cacheable by LDC (see `--cache`), but that is a much more limited task.

-Johan

December 29, 2020
On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
> The issue is that because of string imports you don't know the full set of files you are depending on.
> which means any change can cause any file to be required.

If dmd, during the initial (uncached) build, logs all the imported files, including string imports, and writes them to a cache description together with their individual content hashes, and we then pessimistically rebuild every time anything changes, I don't see how this can be an issue. Can you elaborate on which case I've missed?
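The per-file variant described here can be sketched as follows, assuming a JSON manifest (a hypothetical format) that maps each logged file, code or string import alike, to its SHA-256 digest; any missing or changed entry triggers a full rebuild. Compiler flags and the compiler fingerprint would still have to be part of the cache key; this only covers file contents. `write_manifest` and `needs_rebuild` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def write_manifest(manifest: Path, files: list[Path]) -> None:
    """Record every logged import (code or string) with its own hash."""
    entries = {str(f): hashlib.sha256(f.read_bytes()).hexdigest()
               for f in files}
    manifest.write_text(json.dumps(entries, sort_keys=True, indent=2))


def needs_rebuild(manifest: Path) -> bool:
    """Pessimistic check: any missing or changed file means rebuild."""
    if not manifest.exists():
        return True                 # no cache description: first build
    entries = json.loads(manifest.read_text())
    for name, digest in entries.items():
        path = Path(name)
        if not path.is_file():
            return True             # a recorded dependency disappeared
        if hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            return True             # contents changed since last build
    return False
```

One case this check cannot see is a file that did not exist at the last build but would now be picked up, e.g. a new module earlier on the import path shadowing a recorded one, since the manifest only lists files that were actually read.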