Adding ccache-like output caching to dmd - D Programming Language Discussion Forum

December 28, 2020

Adding ccache-like output caching to dmd

Posted by Per Nordlöw

Has anyone considered integrating into `dmd` a ccache-like caching of output files indexed by digests based on

- environment variables,
- process arguments which, in turn, decide
- input file contents (including import files detected upon first uncached compile)
- dmd compiler binary fingerprint
- ...probably something more I missed

Initial call stores that list alongside content hash and resulting binary(s).

If not, would anyone have any strong objections against adding this?
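
As a rough illustration of the digest described above, here is a minimal Python sketch (not dmd code). The function names, the use of SHA-256, and the choice of `DFLAGS` as the only relevant environment variable are all assumptions made for the example; a real cache would need to decide which variables and files actually influence the output.

```python
import hashlib


def file_digest(path):
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def cache_key(compiler_path, args, env, input_paths, env_names=("DFLAGS",)):
    """Combine everything assumed to influence the output into one digest.

    All names here are hypothetical; `env_names` lists the environment
    variables that are assumed to matter for the build.
    """
    h = hashlib.sha256()

    def feed(label, value):
        # Length-prefix each field so adjacent fields cannot run together.
        data = value.encode("utf-8")
        h.update(f"{label}:{len(data)}:".encode("utf-8"))
        h.update(data)

    feed("compiler", file_digest(compiler_path))   # compiler binary fingerprint
    for arg in args:                               # argument order is significant
        feed("arg", arg)
    for name in sorted(env_names):                 # selected environment variables
        feed("env", f"{name}={env.get(name, '')}")
    for path in sorted(input_paths):               # sources plus recorded imports
        feed("input", f"{path}={file_digest(path)}")
    return h.hexdigest()
```

The set of import files is only known after a first uncached compile, so `input_paths` would have to come from a list recorded during that run.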

December 29, 2020

Re: Adding ccache-like output caching to dmd

Posted by Max Haughton
in reply to Per Nordlöw

On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on
>
> - environment variables,
> - process arguments which, in turn, decide
> - input file contents (including import files detected upon first uncached compile)
> - dmd compiler binary fingerprint
> - ...probably something more I missed
>
> Initial call stores that list alongside content hash and resulting binary(s).
>
> If not, would anyone have any strong objections against adding this?

If it's implemented in a sensible manner I don't see why not. My only worry would be that dmd code tends to be a weird blend of C, C++, and Java - if the cache is properly wrapped up in a way that compartmentalizes the things that can go wrong then go for it.

December 29, 2020

Re: Adding ccache-like output caching to dmd

Posted by Stefan Koch
in reply to Per Nordlöw

On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on
>
> - environment variables,
> - process arguments which, in turn, decide
> - input file contents (including import files detected upon first uncached compile)
> - dmd compiler binary fingerprint
> - ...probably something more I missed
>
> Initial call stores that list alongside content hash and resulting binary(s).
>
> If not, would anyone have any strong objections against adding this?

The issue is that because of string imports you don't know the full set of files you are depending on, which means any change can cause any file to be required.

December 29, 2020

Re: Adding ccache-like output caching to dmd

Posted by Ali Çehreli
in reply to Per Nordlöw

On 12/28/20 3:14 PM, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on

Related: https://forum.dlang.org/post/r812of$11n7$1@digitalmars.com

Ali

December 29, 2020

Re: Adding ccache-like output caching to dmd

Posted by John Colvin
in reply to Stefan Koch

On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
> On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
>> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on
>>
>> - environment variables,
>> - process arguments which, in turn, decide
>> - input file contents (including import files detected upon first uncached compile)
>> - dmd compiler binary fingerprint
>> - ...probably something more I missed
>>
>> Initial call stores that list alongside content hash and resulting binary(s).
>>
>> If not, would anyone have any strong objections against adding this?
>
> The issue is that because of string imports you don't know the full set of files you are depending on.
> which means any change can cause any file to be required.

In general it's unknown what files a given D build depends on until after the build has (mostly) happened. This is true for string imports, but also for regular imports.

Conceptually we split inputs into:

Y: inputs knowable only after compilation is done (set of the contents of all imported files, string or code)
X: inputs known ahead of time (e.g. the command line flags to DMD).

Object files are O.

The set of names of the files that make up Y is referred to as S.

Compiler is then a pure function F(X, Y) -> O.

Real compiler invocation is C(X, [Y]) -> O where [Y] means Y is implicit.

But the compiler can give us S, so we can instead say compiler is C(X, [Y]) -> (O, S).

The only way S will change is if X or Y change.

It (roughly :-p ) follows that we can build a persistent nested map Hash(X) -> ((S, Hash(Y)) -> O).

We calculate Hash(X) before compiling and look it up in the map to get (S, Hash(Y)). If it's not there, then you need to compile and store a new entry in the outer map. If it is, then read all the files in S and use them to calculate Hash(Y)'. If Hash(Y)' == Hash(Y), proceed to get O; otherwise recompile and store a new entry in the inner map.

Or something like that, you get the idea... It's not intractable, it's just a bit fiddly.
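
The lookup described above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the map is an in-memory dict rather than a persistent store, `compile_fn` stands in for the real compiler invocation C(X, [Y]) -> (O, S), `read_file` is a hypothetical callback that returns a file's bytes, and X is assumed to be a tuple of strings such as the command-line flags.

```python
import hashlib


def digest(data):
    """Hex SHA-256 of a bytes value."""
    return hashlib.sha256(data).hexdigest()


def hash_files(names, read_file):
    """Hash(Y): one digest over the contents of every file named in S."""
    h = hashlib.sha256()
    for name in sorted(names):
        content = read_file(name)
        h.update(f"{name}:{len(content)}:".encode("utf-8"))
        h.update(content)
    return h.hexdigest()


class CompileCache:
    """Nested map Hash(X) -> ((S, Hash(Y)) -> O), kept in memory here."""

    def __init__(self, compile_fn, read_file):
        self.compile_fn = compile_fn   # hypothetical: x -> (output, file names)
        self.read_file = read_file     # hypothetical: file name -> bytes
        self.table = {}

    def build(self, x):
        """Return (output, was_cache_hit) for the known-ahead inputs x."""
        hx = digest(repr(x).encode("utf-8"))
        inner = self.table.setdefault(hx, {})
        for (s, hy), output in inner.items():
            try:
                if hash_files(s, self.read_file) == hy:
                    return output, True
            except (KeyError, OSError):
                continue               # a file recorded in S no longer exists
        output, s = self.compile_fn(x)
        s = tuple(sorted(s))
        # Hashing after the compile assumes the files did not change meanwhile.
        inner[(s, hash_files(s, self.read_file))] = output
        return output, False
```

Each outer entry can hold several (S, Hash(Y)) pairs, so reverting a file to an earlier state can hit an older inner entry instead of forcing a recompile.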

December 29, 2020

Re: Adding ccache-like output caching to dmd

Posted by Petar Kirov [ZombineDev]
in reply to Per Nordlöw

On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on
>
> - environment variables,
> - process arguments which, in turn, decide
> - input file contents (including import files detected upon first uncached compile)
> - dmd compiler binary fingerprint
> - ...probably something more I missed
>
> Initial call stores that list alongside content hash and resulting binary(s).
>
> If not, would anyone have any strong objections against adding this?

Or we could just use Nix [1] (TL;DR version - [2]) :P

That said, Nix mostly deals with high-level caching, and won't help with incremental compilation.

Check out the previous efforts in this area: [3] [4]

[1]: https://edolstra.github.io/pubs/phd-thesis.pdf
[2]: https://nixos.org/guides/how-nix-works.html
[3]: https://www.youtube.com/watch?v=WHb7y3JYEBQ
[4]: https://github.com/dlang/dmd/pull/7843

December 29, 2020

Re: Adding ccache-like output caching to dmd

Posted by Petar Kirov [ZombineDev]
in reply to Stefan Koch

On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
> On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
>> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on
>>
>> - environment variables,
>> - process arguments which, in turn, decide
>> - input file contents (including import files detected upon first uncached compile)
>> - dmd compiler binary fingerprint
>> - ...probably something more I missed
>>
>> Initial call stores that list alongside content hash and resulting binary(s).
>>
>> If not, would anyone have any strong objections against adding this?
>
> The issue is that because of string imports you don't know the full set of files you are depending on.
> which means any change can cause any file to be required.

If we pass the complete set of files (instead of relying on [string] import paths, which are not very precise), this is definitely doable.

Sure, the developer "experience" would be a bit more clumsy, but not a big deal either - a wrapper tool could first compile your code with `dmd -i -makedeps` [1] and then save the currently known set of files and then the incremental compilation would use it.

[1]: coming soon: https://github.com/dlang/dmd/pull/12049
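
A wrapper tool along those lines would need to read the dependency list back in. Below is a minimal Python sketch of that step, assuming the dependency output is a single Makefile-style rule (`target: dep dep ...` with backslash line continuations) and that paths contain no spaces. Both are assumptions for the example; the actual output format should be checked against the dmd version in use. `parse_make_deps` is a hypothetical helper name.

```python
import re


def parse_make_deps(text):
    """Return (target, [dependencies]) from one Makefile-style rule.

    Assumes a single rule and no whitespace inside paths.
    """
    # Join lines that end with a backslash continuation.
    joined = text.replace("\\\r\n", " ").replace("\\\n", " ")
    # Split at the first colon followed by whitespace, so a drive-letter
    # colon such as "C:\" is not mistaken for the rule separator.
    parts = re.split(r":\s+", joined.strip() + " ", maxsplit=1)
    if len(parts) != 2:
        raise ValueError("no 'target: deps' rule found")
    target, deps = parts
    return target.strip(), deps.split()
```

The returned list is the "currently known set of files" that the wrapper could store and hash before deciding whether to invoke the compiler again.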

December 29, 2020

Re: Adding ccache-like output caching to dmd

Posted by Petar Kirov [ZombineDev]
in reply to Petar Kirov [ZombineDev]

On Tuesday, 29 December 2020 at 17:41:49 UTC, Petar Kirov [ZombineDev] wrote:
> On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
>> On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
>>> [...]
>>
>> The issue is that because of string imports you don't know the full set of files you are depending on.
>> which means any change can cause any file to be required.
>
> If we pass the complete set of files (instead of using relying on [string] import paths, which not very precise), this definitely doable.
>
> Sure, the developer "experience" would be a bit more clumsy, but not a big deal either - a wrapper tool could first compile your code with `dmd -i -makedeps` [1] and then save the currently known set of files and then the incremental compilation would use it.
>
> [1]: coming soon: https://github.com/dlang/dmd/pull/12049

Edit: What John Colvin said :D

December 29, 2020

Re: Adding ccache-like output caching to dmd

Posted by Johan
in reply to Per Nordlöw

On Monday, 28 December 2020 at 23:14:02 UTC, Per Nordlöw wrote:
> Has anyone considered integrating into a `dmd` a ccache-like caching of output files indexed by digests based on
>
> - environment variables,
> - process arguments which, in turn, decide
> - input file contents (including import files detected upon first uncached compile)
> - dmd compiler binary fingerprint
> - ...probably something more I missed
>
> Initial call stores that list alongside content hash and resulting binary(s).
>
> If not, would anyone have any strong objections against adding this?

FWIW, I feel this is much better handled by a build system that invokes the compiler, and not by the compiler itself. Handling the build environment, input/intermediate/output files (timestamps, interdependencies etc.), invoking (or caching) the substep tool, ..., are core tasks of a build system tool. Caching would add a lot of non-core-task complexity to a compiler.

The specific task of optimization and machine code generation is cachable by LDC (see `--cache`), but that is a much more limited task.

-Johan

December 29, 2020

Re: Adding ccache-like output caching to dmd

Posted by Per Nordlöw
in reply to Stefan Koch

On Tuesday, 29 December 2020 at 12:49:45 UTC, Stefan Koch wrote:
> The issue is that because of string imports you don't know the full set of files you are depending on.
> which means any change can cause any file to be required.

If we, in dmd, during the initial (uncached) build, log all the imported files, including string imports, and output them to a cache description together with their individual content hashes, and pessimistically rebuild every time anything changes, I don't see how this can be an issue. Can you elaborate on which case I've missed?
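
To make the idea concrete, here is a minimal Python sketch of such a cache description, written outside of dmd. The JSON layout and the function names `write_manifest` and `changed_files` are assumptions for illustration only.

```python
import hashlib
import json


def sha256_file(path):
    """Hex SHA-256 of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def write_manifest(manifest_path, imported_files):
    """Record each imported file together with its content hash."""
    entries = {path: sha256_file(path) for path in imported_files}
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2, sort_keys=True)


def changed_files(manifest_path):
    """Return recorded files that are missing or whose contents differ."""
    with open(manifest_path, encoding="utf-8") as f:
        entries = json.load(f)
    changed = []
    for path, recorded in sorted(entries.items()):
        try:
            current = sha256_file(path)
        except OSError:
            current = None             # unreadable or deleted counts as changed
        if current != recorded:
            changed.append(path)
    return changed
```

A pessimistic rebuild would then be triggered whenever `changed_files` returns a non-empty list. Note that a check like this only looks at files that were recorded; it would not by itself notice a newly created file that changes how an import resolves.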

Copyright © 1999-2021 by the D Language Foundation