February 26, 2019
On Monday, 25 February 2019 at 22:55:18 UTC, H. S. Teoh wrote:
> On Mon, Feb 25, 2019 at 10:14:18PM +0000, Rubn via Digitalmars-d wrote:
>> On Monday, 25 February 2019 at 19:28:54 UTC, H. S. Teoh wrote:
> [...]
>> > <off-topic rant>
>> > This is a perfect example of what has gone completely wrong in the world
>> > of build systems. Too many assumptions and poor designs over an
>> > extremely simple and straightforward dependency graph walk algorithm,
>> > that turn something that ought to be trivial to implement into a
>> > gargantuan task that requires a dedicated job title like "build
>> > engineer".  It's completely insane, yet people accept it as a fact of
>> > life. It boggles the mind.
>> > </off-topic rant>
> [...]
>> I don't think it is as simple as you make it seem. Especially when you need to start adding components that need to be built that aren't source code.
>
> It's very simple. The build description is essentially a DAG whose nodes represent files (well, any product, really, but let's say files for a concrete example), and whose edges represent commands that transform input files into output files. All the build system has to do is to do a topological walk of this DAG, and execute the commands associated with each edge to derive the output from the input.
>
> This is all that's needed. The rest are all fluff.
>
> The basic problem with today's build systems is that they impose arbitrary assumptions on top of this simple DAG. For example, all input nodes are arbitrarily restricted to source code files, or in some bad cases, source code of some specific language or set of languages. Then they arbitrarily limit edges to be only compiler invocations and/or linker invocations.  So the result is that if you have an input file that isn't source code, or if the output file requires invoking something other than a compiler/linker, then the build system doesn't support it and you're left out in the cold.
>
> Worse yet, many "modern" build systems assume a fixed depth of paths in the graph, i.e., you can only compile source files into binaries, you cannot compile a subset of source files into an auxiliary utility that in turn generates new source files that are then compiled into an executable.  So automatic code generation is ruled out, preprocessing is ruled out, etc., unless you shoehorn all of that into the compiler invocation, which is a ridiculous idea.

What build systems are you talking about here? I mean, I can search for programs that do certain things and I'll most definitely find more subpar ones than spectacular ones, especially if they are free. So, just so we're on the same page, which build systems are you referring to?

> None of these restrictions are necessary, and they only needlessly limit what you can do with your build system.
>
> I understand that these assumptions are primarily to simplify the build description, e.g., by inferring dependencies so that you don't have to specify edges and nodes yourself (which is obviously impractical for large projects).  But these additional niceties ought to be implemented as a SEPARATE layer on top of the topological walk, and the user should not be arbitrarily prevented from directly accessing the DAG description.  The way so many build systems are designed is that either you have to do everything manually, like makefiles, which everybody hates, or the hood is welded shut and you can only do what the authors decide that you should be able to do and nothing else.
>
>
> [...]
>> It's easy to say build-systems are overly complicated until you actually work on a big project.
>
> You seem to think that I'm talking out of an ivory tower.  I assure you I know what I'm talking about.  I have written actual build systems that do things like this:
>
> - Compile a subset of source files into a utility;
>
> - Run said utility to transform certain input data files into source
>   code;
>
> - Compile the generated source code into executables;
>
> - Run said executables on other data files to transform the data into
>   PovRay scene files;
>
> - Run PovRay to produce images;
>
> - Run post-processing utilities on said images to crop / reborder them;
>
> - Run another utility to convert these images into animations;
>
> - Install these animations into a target directory.
>
> - Compile another set of source files into a different utility;
>
> - Run said utility on input files to transform them to PHP input files;
>
> - Run php-cli to generate HTML from said input files;
>
> - Install said HTML files into a target directory.
>
> - Run a network utility to retrieve the history of a specific log file
>   and pipe it through a filter to extract a list of dates.
>
> - Run a utility to transform said dates into a gnuplot input file for
>   generating a graph;
>
> - Run gnuplot to create the graph;
>
> - Run postprocessing image utilities to touch up the image;
>
> - Install the result into the target directory.

Yes, doing all those things isn't all that difficult; it really is just a matter of calling a different program to generate the file. The difficulty with build systems comes in when you have an extremely large project that takes a long time to build.

> None of the above are baked-in rules. The user is fully capable of specifying whatever transformation he wants on whatever inputs he wants to produce whatever output he wants.  No straitjackets, no stupid hacks to work around stupid build system limitations. Tell it how you want your inputs to be transformed into outputs, and it handles the rest for you.
>
> Furthermore, the build system is incremental: if I modify any of the above input files, it automatically runs the necessary commands to derive the updated output files AND NOTHING ELSE (i.e., it does not needlessly re-derive stuff that hasn't changed).  Better yet, if any of the intermediate output files are identical to the previous outputs, the build stops right there and does not needlessly recreate other outputs down the line.
>
> The build system is also reliable: running the build in a dirty workspace produces identical products as running the build in a fresh checkout.  I never have to worry about doing the equivalent of 'make clean; make', which is a stupid thing to have to do in 2019. I have a workspace that hasn't been "cleaned" for months, and running the build on it produces exactly the same outputs as a fresh checkout.

It really depends on what you are building. Working on DMD I don't have to do a clean; when doing a bisect, though, I effectively have to do a clean at every new commit.

> There's more I can say, but basically, this is the power that having direct access to the DAG can give you.  In this day and age, it's inexcusable not to be able to do this.
>
> Any build system that cannot do all of the above is a crippled build system that I will not use, because life is far too short to waste fighting with your build system rather than getting things done.
>
>
> T

The build systems I've used can do all that; the problem isn't the functionality so much as the ease of achieving that functionality. I just use a script and don't need a build system, but a full build of my project only takes 10 seconds, so I have that luxury.
February 25, 2019
On Mon, Feb 25, 2019 at 12:25 PM Andrei Alexandrescu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
> On 2/25/19 2:04 PM, Manu wrote:
> > On Mon, Feb 25, 2019 at 10:10 AM Andrei Alexandrescu via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> >>
> >> On 2/25/19 5:20 AM, Jacob Carlborg wrote:
> >>> On 2019-02-25 02:04, Manu wrote:
> >>>
> >>>> Why wouldn't you do it in the same pass as the .di output?
> >>>
> >>> * Separation of concerns
> >
> > Are we planning to remove .di output?
> >
> >>> * Simplifying the compiler ("simplifying" is not the correct description, rather avoid making the compiler more complex)
> >
> > It seems theoretically very simple to me; whatever the .di code looks
> > like, I can imagine a filter for isExternCorCPP() on candidate nodes
> > when walking the AST. Seems like a pretty simple tweak of the existing
> > code... but I haven't looked at it.
> > I suspect 1 line in the AST walk code, and 99% of the job, a big ugly
> > block that emits a C++ declaration instead of the D declaration?
> >
> >> Indeed so. There's also the network effect of tooling. Integrating within the compiler would be like the proverbial "giving someone a fish", whereas framing it as a tool that can be the first inspiring many others is akin to "teaching fishing".
> >
> > That sounds nice, but it's bollocks though; give me dtoh, i'm about 95% less likely to use it. It's easy to add a flag to the command line of our hyper-complex build, but reworking custom tooling into it, not so much.
>
> More like dog's ones, right? :o)
>
> There are indeed arguments going either way. The point is a universe of tools can be built based on the compiler as a library, of which only a minority should be realistically integrated within the compiler itself.

Right, but in this case, the technology is *already* in the compiler. I'm not suggesting a large new development, just a filter on the output of the existing pass with a bit of a re-format. That form would be so much more readily useful.

> That said, I'd take such work in either form!

Perhaps. But I'd like to encourage, as strongly as I can, a form that's useful to me... otherwise it's just a nice talking point and still no practical solution.
February 25, 2019
On Mon, Feb 25, 2019 at 05:24:00PM -0800, Manu via Digitalmars-d wrote:
> On Mon, Feb 25, 2019 at 2:55 PM H. S. Teoh via Digitalmars-d
[...]
> > It's very simple. The build description is essentially a DAG whose nodes represent files (well, any product, really, but let's say files for a concrete example), and whose edges represent commands that transform input files into output files. All the build system has to do is to do a topological walk of this DAG, and execute the commands associated with each edge to derive the output from the input.
> 
> Problem #1:
> You don't know the edges of the DAG until AFTER you run the compiler
> (ie, discovering imports/#includes, etc from the source code)

Yes, that's what scanners are for.  There can be standard scanners for common languages like C, C++, Java, C#, etc..  I didn't say that you have to write DAG nodes and edges by hand.  But my point is that prebaked automatic scanning rules of this sort should not *exclude* you from directly adding your own DAG nodes and edges.  Build systems like SCons offer an interface for building your own scanners, for example.
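
To illustrate, a custom scanner in SCons is just a small Python function attached to a file suffix (a rough sketch following the pattern in the SCons user guide; the '.foo' suffix and the include syntax are invented for the example):

    # SConstruct -- teaching SCons to find dependencies in a made-up language
    # (Environment and Scanner are provided by SCons in this scope)
    import re

    include_re = re.compile(r'^include\s+"(.+?)"', re.M)

    def foo_scan(node, env, path):
        # return the files this node depends on; SCons adds the
        # corresponding edges to the DAG for us
        return env.File(include_re.findall(node.get_text_contents()))

    env = Environment()
    env.Append(SCANNERS=Scanner(function=foo_scan, skeys=['.foo']))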


> You also want to run the build with all 64 cores in your machine.

Build systems like SCons offer parallel building out-of-the-box, and require no additional user intervention. That's proper design. Makefiles require special care when writing rules in order not to break, and you (last time I checked) have to explicitly specify which rules are parallelizable.  That's bad design.


> File B's build depends on file A's build output, but it can't know that until after it attempts (and fails) to build B...
> 
> How do you resolve this tension?

There is no tension. You just do a topological walk on the DAG and run the steps in order. If a step fails, all subsequent steps related to that target are aborted. (Any other products that didn't fail may still continue in that case.)  Parallelization works by identifying DAG nodes that aren't dependent on each other and running them in parallel. A proper build system handles this automatically without user intervention.

Unless you're talking about altering the DAG as you go -- SCons *does* in fact handle this case.  You just have to sequence your build steps such that any new products/targets that are introduced don't invalidate prior steps. A topological walk usually already solves this problem, as long as you don't ask for impossible things like building target A also adds a new dependency to unrelated target B. In the normal case, building A adds dependency to downstream target C (which depends on A), but that's no problem because the topological walk guarantees A is built before C, and by then, we already know of the new dependency and can handle it correctly.
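
To make that concrete, the core loop really is just this (a bare-bones sketch in Python using the standard library's graphlib; the targets and commands are invented, and real tools add scanning, caching, and much better error handling on top):

    import subprocess
    from concurrent.futures import ThreadPoolExecutor
    from graphlib import TopologicalSorter

    # The DAG: each target maps to (command, inputs). Plain source files
    # have no command; they are just leaf nodes.
    rules = {
        'gen':    (['cc', '-o', 'gen', 'gen.c'],            ['gen.c']),
        'data.c': (['./gen', 'data.txt', 'data.c'],         ['gen', 'data.txt']),
        'app':    (['cc', '-o', 'app', 'main.c', 'data.c'], ['main.c', 'data.c']),
    }

    ts = TopologicalSorter({t: inputs for t, (_, inputs) in rules.items()})
    ts.prepare()
    failed = set()

    def build(target):
        cmd, inputs = rules.get(target, (None, []))
        if cmd is None:                        # leaf node: nothing to run
            return target, True
        if any(i in failed for i in inputs):   # abort steps downstream of a failure
            return target, False
        return target, subprocess.run(cmd).returncode == 0

    with ThreadPoolExecutor(max_workers=8) as pool:
        while ts.is_active():
            # everything get_ready() hands us is mutually independent,
            # so the pool can run the whole wave in parallel
            for target, ok in pool.map(build, ts.get_ready()):
                if not ok:
                    failed.add(target)
                ts.done(target)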

I'm starting to sound like I'm promoting SCons as the best thing since sliced bread, but actually SCons has its own share of problems.  But I'm just using it as an example of a design that got *some* things right. A good number of things, in fact, in spite of the warts that still exist. It's a lot saner than, say, make, and that's my point.  Such a design is possible, and has been done (the multi-stage website build I described in my previous post, btw, is an SCons-based system -- it's not perfect, but already miles ahead of ancient junk like makefiles).


> There's no 'simple' solution to this problem that I'm aware of. You start to address this with higher-level structure, and that is not a 'simple DAG' anymore.

It's still a DAG.  You just have some fancy automatic scanning / generation at the higher level structure, but it all turns into a DAG in the end.  And here is my point: the build system should ALLOW the user to enter custom DAG nodes/edges as needed, rather than force the user to only use the available prebaked rules -- because there will always be a situation where you need to do something the build tool authors haven't thought of. You should always have the option of going under the hood when you need to. You should never be limited only to what the authors had in mind.  I have nothing against prebaked automatic scanners -- but that should not preclude writing your *own* custom scanners if you wanted to.  And it should not prevent you from adding rules to the DAG directly.
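
In SCons, that escape hatch is basically env.Command(): one line to add an arbitrary node and edge of your own to the DAG (the file names and the tool invocation here are made up for the example):

    # SConstruct -- a hand-written DAG edge: my own tool, my own files
    env = Environment()
    env.Command('sprites.bin', 'sprites.png',
                'python pack_sprites.py $SOURCE $TARGET')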

The correct design is always the one that empowers the user, not the one that spoonfeeds the user yet comes in a straitjacket.


> Now... whatever solution you concluded; express that in make, ninja, MSBuild, .xcodeproj...

The fact that doing all of this in make (or whatever else) is such a challenge is exactly proof of what I'm saying: these build systems are fundamentally b0rken, and for no good reason. All the technology necessary to make sane builds possible already exists.  It's just that too many build systems are still living in the 80's and refusing to move on.

And in the meantime, even better build systems are already being implemented, like Tup, where the build time is proportional to the size of change rather than the size of the workspace (an SCons wart).  Yet people still use make like it's still 1985, and people still invent build systems with antiquated designs like it's still 1985.


T

-- 
Give a man a fish, and he eats once. Teach a man to fish, and he will sit forever.
February 25, 2019
On Tue, Feb 26, 2019 at 01:33:50AM +0000, Rubn via Digitalmars-d wrote:
> On Monday, 25 February 2019 at 22:55:18 UTC, H. S. Teoh wrote:
[...]
> What build systems are you talking about here? I mean, I can search for programs that do certain things and I'll most definitely find more subpar ones than spectacular ones, especially if they are free. So, just so we're on the same page, which build systems are you referring to?

SCons is free, and does all of what I described and more.  It's not perfect, of course.  But it's miles better than, say, make -- for its unreliability and the tendency for makefiles to become unreadably complex and unmaintainable. Or dub, for forcing you to work a certain way and being unable to express things like multi-stage builds or non-compilation tasks.


[...]
> > - Compile a subset of source files into a utility;
> > 
> > - Run said utility to transform certain input data files into source
> >   code;
> > 
> > - Compile the generated source code into executables;
> > 
> > - Run said executables on other data files to transform the data into
> >   PovRay scene files;
> > 
> > - Run PovRay to produce images;
> > 
> > - Run post-processing utilities on said images to crop / reborder them;
> > 
> > - Run another utility to convert these images into animations;
> > 
> > - Install these animations into a target directory.
[...]
> Yes, doing all those things isn't all that difficult; it really is just a matter of calling a different program to generate the file.

And yet the above build is not expressible in dub.
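
For contrast, the first few stages of a pipeline like that are only a handful of lines in an SCons description (a simplified sketch; all file names are hypothetical):

    # SConstruct -- sketch of a multi-stage pipeline
    env = Environment()

    # 1. Compile a subset of sources into a utility
    gen = env.Program('datagen', ['datagen.c', 'common.c'])

    # 2. Run said utility to turn a data file into generated source code
    tables = env.Command('tables.c', ['data/tables.txt', gen],
                         './datagen data/tables.txt tables.c')

    # 3. Compile the generated source into the final executable;
    #    the DAG sequences all three stages automatically
    env.Program('render', ['main.c', tables])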


> The difficulty of build systems comes in when you have an extremely large project that takes a long time to build.

The above steps are part of a project I have whose full build takes about 5-6 hours.  But while working on it, the build turnaround time is about 10-15 seconds (and that's only because SCons didn't get one thing right: that build time should be proportional to changeset size, rather than the size of the entire workspace -- otherwise it would be more like 3-4 seconds).  *That's* what I call a sane build system.


[...]
> It really depends on what you are building. Working on DMD I don't have to do a clean; when doing a bisect, though, I effectively have to do a clean at every new commit.

Well exactly, that's the stupidity of it. You always have to 'make clean', "just to be sure", even if "most of the time" it works.  It's 2019, and algorithms for reliable builds have been known for at least a decade or more, yet we're still stuck in the dark ages of "occasionally I have to run make clean, and maybe I should do it right now 'cos I'm not sure if this bug is caused by out-of-sync object files or if it's a real bug".  Can you imagine how ridiculous it would be with the above 5-6 hour build script, if I had built that project out of makefiles?  I would get absolutely nothing done at all if every once in a while I have to `make clean` "just to be sure".

Thankfully, SCons is sane enough that I don't have to rerun the entire build for months on end -- actually, I never had to do it.  Even when there were big changes that cause almost the whole thing to rebuild, it was SCons that figured out that it had to rebuild everything; I never had to tell it to.  Every time I build, no matter what state the workspace was in, it would always update everything correctly.  I can even `git checkout <branch>` all over the place, and it doesn't lose track of how to update all relevant targets. I never have to hold its hand to get it to do the right thing, it Just Works(tm).  *That's* what I call a sane system.  (In spite of said SCons warts.)
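
The mechanism behind that isn't magic, either: SCons decides whether a target is up to date from content signatures of its inputs and build commands rather than from timestamps, which is why a git checkout or a stale file can't fool it. The idea in miniature (a toy Python illustration of the concept, not SCons code):

    import hashlib, json, os

    SIGDB = '.sigdb.json'   # hypothetical signature database

    def signature(path):
        # a content hash, unlike an mtime, doesn't lie after a
        # 'git checkout', a stray 'touch', or clock skew
        with open(path, 'rb') as f:
            return hashlib.sha256(f.read()).hexdigest()

    def stamp(inputs, command):
        return {'cmd': command, 'inputs': [signature(p) for p in inputs]}

    def up_to_date(target, inputs, command):
        db = json.load(open(SIGDB)) if os.path.exists(SIGDB) else {}
        return os.path.exists(target) and db.get(target) == stamp(inputs, command)

    def record(target, inputs, command):   # called after each successful step
        db = json.load(open(SIGDB)) if os.path.exists(SIGDB) else {}
        db[target] = stamp(inputs, command)
        json.dump(db, open(SIGDB, 'w'))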


> > There's more I can say, but basically, this is the power that having direct access to the DAG can give you.  In this day and age, it's inexcusable not to be able to do this.
> > 
> > Any build system that cannot do all of the above is a crippled build system that I will not use, because life is far too short to waste fighting with your build system rather than getting things done.
[...]
> The build systems I've used can do all that; the problem isn't the functionality so much as the ease of achieving that functionality.

Well, yes.  That's why I repeatedly say, a proper design should empower the user.  Easy things should be easy, and hard things should be possible.  It shouldn't be the case that easy things are hard (e.g. Manu's "if I have to run an extra step before compilation, I have to bend backwards and recite gibberish in encrypted Reverse Klingon to get make to do the right thing"), and hard things are either outright impossible, or practically impossible because it's so onerous you might as well not bother trying.


> I just use a script and don't need a build system, but a full build of my project only takes 10 seconds, so I have that luxury.

As I said, my website project takes about 5-6 hours for a full, clean build.  Anything less than a sane build system -- or a mostly-sane one (SCons does have its warts like I said) -- is simply not even worth my consideration.  Life is too short to have to take 6-hour coffee breaks every other day just because make is too dumb to produce reliable builds.


T

-- 
We are in class, we are supposed to be learning, we have a teacher... Is it too much that I expect him to teach me??? -- RL
February 27, 2019
On 2/25/19 2:28 PM, H. S. Teoh wrote:
> On Mon, Feb 25, 2019 at 11:04:56AM -0800, Manu via Digitalmars-d wrote:
>> On Mon, Feb 25, 2019 at 10:10 AM Andrei Alexandrescu via Digitalmars-d
>> <digitalmars-d@puremagic.com> wrote:
> [...]
>>> Indeed so. There's also the network effect of tooling. Integrating
>>> within the compiler would be like the proverbial "giving someone a
>>> fish", whereas framing it as a tool that can be the first inspiring
>>> many others is akin to "teaching fishing".
>>
>> That sounds nice, but it's bollocks though; give me dtoh, i'm about
>> 95% less likely to use it. It's easy to add a flag to the command line
>> of our hyper-complex build, but reworking custom tooling into it, not
>> so much.
>> I'm not a build engineer, and I have no idea how I'd wire a second
>> pass to each source compile if I wanted to. Tell me how to wire that
>> into VS? How do I wire that into XCode? How do I express that in the
>> scripts that emit those project formats, and also makefiles and ninja?
>> How do I express that the outputs (which are .h files) are correctly
>> expressed as inputs of dependent .cpp compile steps?
> [...]
> 
> <off-topic rant>
> This is a perfect example of what has gone completely wrong in the world
> of build systems. Too many assumptions and poor designs over an
> extremely simple and straightforward dependency graph walk algorithm,
> that turn something that ought to be trivial to implement into a
> gargantuan task that requires a dedicated job title like "build
> engineer".  It's completely insane, yet people accept it as a fact of
> life. It boggles the mind.
> </off-topic rant>
> 

Hear, hear. When adding another step to a build process ISN'T a simple "add a line to the script", then something has gone very, VERY wrong.

(Incidentally, this is part of why I've long since lost all patience for trying to use IDEs like VS, Eclipse, XCode, and whatnot. Life's too short to tolerate all that mess of complexity they turn a basic build into. I still *HATE*, with a passion, the fact that I have to put up with all that black-box-build bullshit when I use Unity3D - and don't even get me started on the complete and utter garbage that is MSBuild (used by Unity, naturally)).

HOWEVER:

All that said, when a single build needs to make multiple passes of *the same sources* through the compiler, that's clearly an architectural failure on the part of the tooling. We can argue all we want about how separate tools are technically superior, but if it means adding duplicate passes *and* extra complications to the user's build system, then it clearly ISN'T "technically superior"; it's just a different set of tradeoffs and yet another example of D letting perfect be the enemy of good.
February 27, 2019
On Wednesday, 27 February 2019 at 18:48:06 UTC, Nick Sabalausky (Abscissa) wrote:
> On 2/25/19 2:28 PM, H. S. Teoh wrote:
>> On Mon, Feb 25, 2019 at 11:04:56AM -0800, Manu via Digitalmars-d wrote:
>> <off-topic rant>
>> This is a perfect example of what has gone completely wrong in the world
>> of build systems. Too many assumptions and poor designs over an
>> extremely simple and straightforward dependency graph walk algorithm,
>> that turn something that ought to be trivial to implement into a
>> gargantuan task that requires a dedicated job title like "build
>> engineer".  It's completely insane, yet people accept it as a fact of
>> life. It boggles the mind.
>> </off-topic rant>
>> 
>
> Hear, hear. When adding another step to a build process ISN'T a simple "add a line to the script", then something has gone very, VERY wrong.
>

I strongly agree; please consider adding it to the compiler. I've had enough of insane build systems for one lifetime; the only build system I need is "dmd -i".

If you feel there is a need for a more modular approach, then I'd rather see the possibility of adding "end-user developed" dynamic library plugins to the compiler instead; that approach can also spur creativity from outside developers just as well as separate tools can.

