H. S. Teoh
Posted in reply to rikki cattermole
On Wed, Apr 27, 2022 at 03:57:24AM +1200, rikki cattermole via Digitalmars-d wrote:
> On 27/04/2022 3:35 AM, H. S. Teoh wrote:
[...]
> > What I mean is this: my projects often involve a main executable, which is the primary target of the project, plus several helpers, which are either secondary targets sharing most of the same sources, or code generators that create one or more targets required to compile the main executable. Occasionally, there may also be auxiliary targets like HTML pages, procedurally-generated images, and other resources.
[...]
> > As far as I know -- and if I'm wrong I'd be happy to be corrected -- dub is unable to handle the above (at least not natively -- I'd have to write my own code for building the non-D parts of the build AFAIK, which defeats the purpose of having a build system in the first place).
>
> Pre build commands.
>
> For D stuff in dub something like this works fine.
>
> "preBuildCommands": ["dub run package:tool -- args"]
Does this mean I have to create an entire subpackage just for this purpose? Or in fact, one subpackage per auxiliary target? If so, that would seem needlessly cumbersome for something that, in my mind, is a trivial additional node in the build graph.
Also, treating these auxiliary build targets as second-class citizens doesn't really sit right with me. I mean, after all, it all boils down to "build sources S1, S2, ... into targets T1, T2, ... by running command(s) C1, C2, ...". What if I decide to insert a postprocessing step in the middle of one of these build chains? E.g., after creating an HTML file, before installing it to the staging area, I decide that I want to run an HTML tidying utility on it? Does that mean I have to create another subpackage to represent this extra step?
> But what you are describing is something automatic, which is not currently supported.
What do you mean by "automatic"? These targets are generally not automatically inferrable, i.e., I'm not expecting that if I say "build xyz.html" dub would magically know that in order to build HTML files it needs to compile a.d, b.d, c.d into abc.exe and run abc.exe on xyz.template in order to produce xyz.html. Obviously these build steps must be explicitly stated somewhere.
But I do expect that build products generated by these steps would be smoothly integrated into the build, i.e., if "code.template" is preprocessed by some tool "helper.exe" to produce "code.d", then there should be a way to compile "code.d" into the main executable as well.
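To make the "sources, targets, commands" view concrete, here is a minimal sketch in Python (not dub, and not any existing build tool) of what I mean by every step being just another node in the build graph. The `Rule` type, the `build_order` function, and the file names `main.d`, `main.exe` and `helper.d` are made up for illustration, and the command strings are placeholders that are never executed. It only shows that once each step is declared explicitly, the ordering (helper first, then the generated `code.d`, then the main executable) falls out of the graph:

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Rule:
    """One build step: run `command` to turn `sources` into `targets`."""
    targets: tuple
    sources: tuple
    command: str  # illustrative only; this sketch never runs it


def build_order(rules):
    """Order rules so that each one comes after the rules producing its sources."""
    # Map every target file to the rule that produces it.
    producer = {t: r for r in rules for t in r.targets}
    # A rule depends on whichever rules generate its sources;
    # plain checked-in sources have no producer and add no edge.
    graph = {
        r: {producer[s] for s in r.sources if s in producer}
        for r in rules
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical project: a helper generates code.d, which is compiled into main.exe.
RULES = [
    Rule(("main.exe",), ("main.d", "code.d"), "dmd -ofmain.exe main.d code.d"),
    Rule(("code.d",), ("code.template", "helper.exe"),
         "./helper.exe code.template code.d"),
    Rule(("helper.exe",), ("helper.d",), "dmd -ofhelper.exe helper.d"),
]

if __name__ == "__main__":
    for rule in build_order(RULES):
        print(rule.command)
```

Inserting an extra postprocessing step in such a scheme would amount to adding one more `Rule` whose source is the previous step's target; nothing else has to change.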
[...]
> > - Network dependence (I'd *really* like for it *not* to depend on
> > internet access being available by default, only when I ask it
> > to). IIRC there's some switch or option that does this, it would
> > be nice if there was a local setting I could toggle to make this
> > automatic.
>
> https://dub.pm/settings
>
> So yeah settings file already supports this.
Which setting disables network lookup by default? Glancing at that page, it's not obvious which setting it is and what value I should set it to.
> > - Performance: is there an option to skip the expensive NP-complete
> > dependency resolution step at the beginning for faster turnaround
> > time? When I'm debugging something I do *not* want dub to do
> > anything except recompile local source, no network access, no
> > package dependency resolution, nothing, just *build* the darned
> > thing and leave it at that.
>
> I've had a look at this, it would take a good bit of refactoring to split this out into dub.selections.json *I think*.
>
> But yeah you're right, if nothing has changed it should be cached.
Not just that, when I'm recompiling a project during debugging, I don't want dub to look at the network *at all*. I don't care if upstream has released a critical zero-day exploit fix, I do NOT want the code to suddenly change from under me when I'm trying to trace down a segfault. I want it to just build the sources that are currently on the local machine, and that's it.
Also, sometimes if I'm on the road without internet access, I do not want to suddenly become unable to build my project.
> > - Reproducibility: if I change one source file out of a directory of
> > 50, I want the build system to be able to detect that one change,
> > determine the *minimum* sequence of actions to update current
> > targets, and run only those actions. After running these actions,
> > the targets should be in EXACTLY the same state as if I had
> > rebuilt the entire workspace from a clean checkout. And this
> > should NOT be dependent on the current state of the workspace (it
> > should know to overwrite stale intermediates, etc., so that the
> > final targets are in the correct state).
>
> I was questioning if the problem here is the compiler stuff, but it's not.
>
> However, I don't think that this should be the default. Processing all of those dates, caching them... yeah won't be cheap either.
Two comments here:
1) Dates should NOT be used as the basis for detecting changes, because
they're not reliable. Preferably some kind of checksum should be used (a
cheap one like md5 or CRC would do -- we don't need strong crypto
strength here). Why? Because sometimes, an updated timestamp does
*not* mean the file actually changed.
For example, if I `git checkout` a branch to look at something and
switch back later, the file may have been touched during the switch,
but afterwards its contents are identical to when it was last built.
In this case, targets that depend on that file do not need to be
rebuilt; they can be skipped entirely. This can sometimes lead to
better performance, e.g., if a commonly-imported module is touched in
this way, but you realize it hasn't actually changed, you can prune
away large parts of the build graph.
2) The performance issue has already been solved, see for example:
https://gittup.org/tup/
The underlying idea is: *don't* scan the entire source tree to detect
changes, use modern OS facilities (inotify, FileSystemWatcher, etc.)
to let the OS tell you when something changes. This allows the build
time to be O(n), where n is the size of the change, rather than O(N)
where N is the size of the workspace. This is important for
scalability to large projects where N is usually significantly larger
than n.
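As a rough illustration of point 1, here is a minimal Python sketch of checksum-based change detection. The names `file_digest` and `changed_sources` and the in-memory `cache` dict are assumptions for the example, not part of dub or tup. The list of paths to check is simply passed in; in a tup-like design that list would come from the OS change notifications mentioned in point 2, which this sketch does not implement.

```python
import hashlib


def file_digest(path):
    """Cheap content checksum; MD5 is used for speed, not for security."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_sources(paths, cache):
    """Return the paths whose contents differ from the digest stored in `cache`.

    `cache` maps path -> digest recorded at the last build and is updated
    in place. A file whose timestamp changed but whose contents did not
    is NOT reported, so targets depending on it can be skipped.
    """
    changed = []
    for path in paths:
        digest = file_digest(path)
        if cache.get(path) != digest:
            changed.append(path)
            cache[path] = digest
    return changed
```

With this, a file that was merely touched by a branch switch yields an empty result on the next check, so everything downstream of it can be pruned from the rebuild.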
T
--
An elephant: A mouse built to government specifications. -- Robert Heinlein