June 17, 2016
> However, I question the utility of even doing this in the first place. You miss out on the convenience of using the existing command line interface. And for what? Just so everything can be in D? Writing the same thing in Lua would be much prettier. I don't understand this dependency-phobia.

It comes from knowing that for most small to average-sized D projects you don't need a build _tool_ at all. If a full clean build takes 2 seconds, installing an extra tool to achieve the same thing a one-line shell script does is highly annoying.

Your reasoning about makefiles seems to be flavored by C++ realities. But my typical D makefile would look something like this:

build:
    dmd -ofbinary `find ./src -name '*.d'`

test:
    dmd -unittest -main -oftester `find ./src -name '*.d'`
    ./tester

deploy: build test
    scp ./binary server:

That means that I usually care neither about correctness nor about speed, only about a good cross-platform way to define pipelines. And for that, fetching a dedicated tool is simply too discouraging.

In my opinion that is why it is so hard for any new tool to take over make's place - they all put too much attention on complicated projects, but to get a self-sustaining network effect one has to prioritize small and simple projects. And ease of availability matters most there.
June 17, 2016
On Fri, Jun 17, 2016 at 09:00:45AM +0000, Atila Neves via Digitalmars-d-announce wrote:
> On Friday, 17 June 2016 at 06:18:28 UTC, H. S. Teoh wrote:
> > On Fri, Jun 17, 2016 at 05:41:30AM +0000, Jason White via Digitalmars-d-announce wrote: [...]
> > > Where Make gets slow is when checking for changes on a ton of files.  I haven't tested it, but I'm sure Button is faster than Make in this case because it checks for changed files using multiple threads.  Using the file system watcher can also bring this down to a near-zero time.
> > 
> > IMO using the file system watcher is the way to go. It's the only way to beat the O(n) pause at the beginning of a build as the build system scans for what has changed.
> 
> See, I used to think that, then I measured. tup uses fuse for this and that's exactly why it's fast. I was considering a similar approach with the reggae binary backend, and so I went and timed make, tup, ninja and itself on a synthetic project. Basically I wrote a program to write out source files to be compiled, with a runtime parameter indicating how many source files to write.
> 
> The most extensive tests I did were on a synthetic project of 30k source files. That's a lot bigger than the vast majority of developers are ever likely to work on. As a comparison, the 2.6.11 version of the Linux kernel had 17k files.

Today's software projects are much bigger than you seem to imply. For example, my work project *includes* the entire Linux kernel as part of its build process, and the size of the workspace is dominated by the non-Linux components. So 30k source files isn't exactly something totally far out.


> A no-op build on my laptop was about (from memory):
> 
> tup: <1s
> ninja, binary: 1.3s
> make: >20s
> 
> It turns out that just stat'ing everything is fast enough for pretty much everybody, so I just kept the simple algorithm. Bear in mind the Makefiles here were the simplest possible - doing anything that usually goes on in Makefileland would have made it far, far slower. I know: I converted a build system at work from make to hand-written ninja and its no-op builds went from nearly 2 minutes to 1s.

Problem: stat() isn't good enough when network file sharing is involved. It breaks correctness by introducing heisenbugs caused by (sometimes tiny) differences in local hardware clocks. It may also break if two versions of the same file share the same timestamp (often thought impossible, but quite possible with machine-generated files and a filesystem that doesn't have subsecond resolution -- and it's rare enough that when it does happen, people are left scratching their heads for many wasted hours).  To guarantee correctness you need to compute a digest of the file contents, not just compare timestamps.
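As a minimal sketch of the digest approach in D (purely illustrative -- not Button's or reggae's actual code; where the previously recorded digest comes from is left to the caller):

import std.digest.sha : sha256Of;
import std.file : read;

// Hash the file's bytes; identical contents give an identical digest,
// no matter what the filesystem claims about timestamps.
ubyte[32] contentDigest(string path)
{
    return sha256Of(cast(const(ubyte)[]) read(path));
}

// A file needs rebuilding only if its content digest differs from the
// one recorded after the last successful build.
bool changedSince(string path, const ubyte[32] lastKnownDigest)
{
    return contentDigest(path) != lastKnownDigest;
}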


> If you happen to be unlucky enough to work on a project so large you need to watch the file system, then use the tup backend I guess.
[...]

Yes, I'm pretty sure that describes a lot of software projects out there today. The scale of software these days is growing exponentially, and there's no sign of it slowing down.  Or maybe that's just an artifact of the field I work in? :-P


T

-- 
Never step over a puddle, always step around it. Chances are that whatever made it is still dripping.
June 17, 2016
On 06/17/2016 06:20 PM, H. S. Teoh via Digitalmars-d-announce wrote:
>> If you happen to be unlucky enough to work on a project so large you need to watch the file system, then use the tup backend I guess.
> [...]
> 
> Yes, I'm pretty sure that describes a lot of software projects out there today. The scale of software these days is growing exponentially, and there's no sign of it slowing down.  Or maybe that's just an artifact of the field I work in? :-P

In the server-side domain, projects are definitely getting smaller because the micro-service hype keeps growing (and that is one of the hypes I do actually support, btw).
June 17, 2016
On Friday, 17 June 2016 at 08:23:50 UTC, Atila Neves wrote:
> I agree, but CMake/ninja, tup, reggae/ninja, reggae/binary are all correct _and_ fast.

'Correct' referring to which standards? There is an interesting series of blog posts by Mike Shal:

http://gittup.org/blog/2014/03/6-clobber-builds-part-1---missing-dependencies/
http://gittup.org/blog/2014/05/7-clobber-builds-part-2---fixing-missing-dependencies/
http://gittup.org/blog/2014/06/8-clobber-builds-part-3---other-clobber-causes/
http://gittup.org/blog/2015/03/13-clobber-builds-part-4---fixing-other-clobber-causes/
June 17, 2016
On Fri, Jun 17, 2016 at 07:30:42PM +0000, Fool via Digitalmars-d-announce wrote:
> On Friday, 17 June 2016 at 08:23:50 UTC, Atila Neves wrote:
> > I agree, but CMake/ninja, tup, reggae/ninja, reggae/binary are all correct _and_ fast.
> 
> 'Correct' referring to which standards? There is an interesting series of blog posts by Mike Shal:
> 
> http://gittup.org/blog/2014/03/6-clobber-builds-part-1---missing-dependencies/
> http://gittup.org/blog/2014/05/7-clobber-builds-part-2---fixing-missing-dependencies/
> http://gittup.org/blog/2014/06/8-clobber-builds-part-3---other-clobber-causes/
> http://gittup.org/blog/2015/03/13-clobber-builds-part-4---fixing-other-clobber-causes/

To me, "correct" means:

- After invoking the build tool, the workspace *always* reflects a
  valid, reproducible build. Regardless of initial conditions, existence
  or non-existence of intermediate files, stale files, temporary files,
  or other detritus. Independent of environmental factors. Regardless of
  whether a previous build invocation was interrupted in the middle --
  the build system should be able to continue where it left off,
  reproduce any partial build products, and produce exactly the same
  products, bit for bit, as if it had not been interrupted before.

- If anything changes -- and I mean literally ANYTHING -- that might
  cause the build products to be different in some way, the build tool
  should detect that and update the affected targets accordingly the
  next time it's invoked.  "Anything" includes (but is not limited to):

   - The contents of source files, even if the timestamp stays
     identical to the previous version.

   - Change in compiler flags, or any change to the build script itself;

   - A new version of the compiler was installed on the system;

   - A system library was upgraded / a new library was installed that
     may get picked up at link time;

   - Change in environment variables that might cause some of the build
     commands to work differently (yes I know this is a bad thing -- it
     is not recommended to have your build depend on this, but the point
     is that if it does, the build tool ought to detect it).

   - Editing comments in a source file (what if there's a script that
     parses comments? Or ddoc?);

   - Reverting a patch (that may leave stray source files introduced by
     the patch).

   - Interrupting a build in the middle -- the build system should be
     able to detect any partially-built products and correctly rebuild
     them instead of picking up a potentially corrupted object in the
     next operation in the pipeline.

- As much as is practical, all unnecessary work should be elided. For
  example:

   - If I edit a comment in a source file, and there's an intermediate
     compile stage where an object file is produced, and the object file
     after the change is identical to the one produced by the previous
     compilation, then any further actions -- linking, archiving, etc.
     -- should not be done, because all products will be identical.

   - More generally, if my build consists of source file A, which gets
     compiled to intermediate product B, which in turn is used to
     produce final product C, then if A is modified, the build system
     should regenerate B. But if the new B is identical to the old B,
     then C should *not* be regenerated again.

      - Contrariwise, if modifications are made to B, the build system
        should NOT use the modified B to generate C; instead, it should
        detect that B is out-of-date w.r.t. A, and regenerate B from A
        first, and then proceed to generate C if it would be different
        from before.

   - Touching the timestamp of a source file or intermediate file should
     *not* cause the build system to rebuild that target, if the result
     will actually be bit-for-bit identical with the old product.

   - In spite of this work elision, the build system should still ensure
     that the final build products are 100% reproducible. That is, work
     is elided if and only if it is actually unnecessary; if a comment
     change actually causes something to change (e.g., ddocs are
     different now), then the build system must rebuild all affected
     subsequent targets.

- Assuming that a revision control system is in place, and a workspace
  is checked out on revision X with no further modifications, then
  invoking the build tool should ALWAYS, without any exceptions, produce
  exactly the same outputs, bit for bit.  I.e., if your workspace
  faithfully represents revision X in the RCS, then invoking the build
  tool will produce the exact same binary products as anybody else who
  checks out revision X, regardless of their initial starting
  conditions.

   - E.g., I may be on revision Y, then I run svn update -rX, and there
     may be stray intermediate files strewn around my workspace that are
     not in a fresh checkout of revision X; the build tool should still
     produce exactly the same products as a clean, fresh checkout of
     revision X.  This holds regardless of whether Y represents an older
     revision or a newer revision, or a different branch, etc.

   - In other words, the build system should be 100% reproducible at all
     times, and should not be affected by the existence (or
     non-existence) of any stale intermediate files.


By the above definition of correctness, Make (and pretty much anything based on it, that I know of) fails on several counts.  Systems like SCons come close to full correctness, and I believe tup can also be made correct in this way.  Make, however, by its very design cannot possibly meet all of the above requirements simultaneously, and thus fails my definition of correctness.
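As a rough illustration of two of the requirements above -- re-running a command when its flags change, and "early cutoff" when an intermediate product comes out bit-identical -- here is a minimal D sketch. The Rule type and the in-memory digest store are made up for the example; no real build tool's API is implied.

import std.array : join;
import std.digest.sha : sha256Of;
import std.file : exists, read;
import std.process : execute;

alias Digest = ubyte[32];

Digest fileDigest(string path)
{
    return sha256Of(cast(const(ubyte)[]) read(path));
}

struct Rule
{
    string[] command;  // e.g. ["dmd", "-c", "-ofa.o", "a.d"]
    string[] inputs;   // e.g. ["a.d"]
    string output;     // e.g. "a.o"
}

// Runs the rule if its command or inputs changed. Returns true only if
// the output's bytes actually changed, so dependents can be skipped
// when the result is bit-identical (early cutoff).
bool update(Rule rule, ref Digest[string] stored)
{
    // The fingerprint covers the command line as well as input contents,
    // so a flag change forces a re-run even when no source was touched.
    Digest fingerprint = sha256Of(rule.command.join("\0"));
    foreach (input; rule.inputs)
    {
        Digest d = fileDigest(input);
        fingerprint = sha256Of(fingerprint[], d[]);
    }

    string cmdKey = "cmd:" ~ rule.output;
    if (auto previous = cmdKey in stored)
        if (*previous == fingerprint && exists(rule.output))
            return false;  // up to date and the product is still there

    auto result = execute(rule.command);
    assert(result.status == 0, result.output);
    stored[cmdKey] = fingerprint;

    string outKey = "out:" ~ rule.output;
    Digest outDigest = fileDigest(rule.output);
    bool changed = (outKey !in stored) || stored[outKey] != outDigest;
    stored[outKey] = outDigest;
    return changed;
}

A real tool would also have to persist the digest store and discover implicit dependencies, but even this skeleton shows why neither timestamps nor "did the source change?" alone are sufficient.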


T

-- 
A bend in the road is not the end of the road unless you fail to make the turn. -- Brian White
June 17, 2016
On Monday, 30 May 2016 at 19:16:50 UTC, Jason White wrote:
>
> Note that this is still a ways off from being production-ready. It needs some polishing. Feedback would be most appreciated (file some issues!). I really want to make this one of the best build systems out there.
>

I found the beginning of the tutorial very clear. I really liked that it can produce a png of the build graph. I also liked the Lua build description for DMD. Much more legible than the makefile.

However, once I got to the "Going Meta: Building the Build Description" section of the tutorial, I got a little confused.

I found it a little weird that the JSON output towards the end of the tutorial doesn't always match up. Like, where did the .h files go from the inputs? (I get that they aren't needed for running gcc, but you should mention that.) Why is it displaying cc instead of gcc? I just feel like you might be able to split things up a little and provide a few more details. Like, this is how to do a base version, and then this is how you can customize what is displayed. Also, it's a little terse on the details of things like what cc.binary is doing. Always err on the side of explaining things too much rather than too little, IMO.
June 18, 2016
On Friday, 17 June 2016 at 20:59:46 UTC, jmh530 wrote:
> I found the beginning of the tutorial very clear. I really liked that it can produce a png of the build graph. I also liked the Lua build description for DMD. Much more legible than the makefile.
>
> However, once I got to the "Going Meta: Building the Build Description" section of the tutorial, I got a little confused.
>
> I found it a little weird that the JSON output towards the end of the tutorial doesn't always match up. Like, where did the .h files go from the inputs? (I get that they aren't needed for running gcc, but you should mention that.) Why is it displaying cc instead of gcc? I just feel like you might be able to split things up a little and provide a few more details. Like, this is how to do a base version, and then this is how you can customize what is displayed. Also, it's a little terse on the details of things like what cc.binary is doing. Always err on the side of explaining things too much rather than too little, IMO.

Thank you for the feedback! I'm glad someone has read the tutorial.

I'm not happy with that section either. I think I'll split it up and go into more depth, possibly moving it to a separate page. I also still need to write docs on the Lua parts (like cc.binary), but that API is subject to change.

Unlike most people, I kind of actually enjoy writing documentation.
June 18, 2016
On Friday, 17 June 2016 at 10:24:16 UTC, Dicebot wrote:
>> However, I question the utility of even doing this in the first place. You miss out on the convenience of using the existing command line interface. And for what? Just so everything can be in D? Writing the same thing in Lua would be much prettier. I don't understand this dependency-phobia.
>
> It comes from knowing that for most small to average-sized D projects you don't need a build _tool_ at all. If a full clean build takes 2 seconds, installing an extra tool to achieve the same thing a one-line shell script does is highly annoying.
>
> Your reasoning about makefiles seems to be flavored by C++ realities. But my typical D makefile would look something like this:
>
> build:
>     dmd -ofbinary `find ./src -name '*.d'`
>
> test:
>     dmd -unittest -main -oftester `find ./src -name '*.d'`
>     ./tester
>
> deploy: build test
>     scp ./binary server:
>
> That means that I usually care neither about correctness nor about speed, only about a good cross-platform way to define pipelines. And for that, fetching a dedicated tool is simply too discouraging.
>
> In my opinion that is why it is so hard for any new tool to take over make's place - they all put too much attention on complicated projects, but to get a self-sustaining network effect one has to prioritize small and simple projects. And ease of availability matters most there.

I agree that a sophisticated build tool isn't really needed for tiny projects, but it's still really nice to have one that can scale as the project grows. All too often, as a project gets bigger, the build system it uses buckles under the growing complexity, no one ever gets around to changing it because they're afraid of breaking something, and the problem just gets worse.

I realize you might be playing devil's advocate a bit, and I appreciate it. Let me propose another idea that might remove the extra dependency for new collaborators on a codebase while still providing access to a full-blown build system: add a sub-command to Button that produces a shell script to run the build. For example, `button shell -o build.sh`. Then just run `./build.sh` to build everything. I vaguely recall either Tup or Ninja having something like this.

The main downside is that it'd have to be committed every time the build changes. This could be automated with a bot, but it's still annoying. The upsides are that there is no need for any other external libraries or tools, and the superior build system can still be used by anyone who wants it.
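Here's a rough sketch of how such a sub-command could flatten the graph, assuming the tasks are already available in dependency order (the Task type and the ordering step are made up for illustration -- not Button's actual API):

import std.process : escapeShellCommand;
import std.stdio : File;

struct Task
{
    string[] command;  // one command line per node of the build graph
}

// Writes the build graph out as a plain shell script.
// Escaping here assumes a POSIX shell is the target.
void exportShellScript(const Task[] tasksInDependencyOrder, string path)
{
    auto script = File(path, "w");
    script.writeln("#!/bin/sh");
    script.writeln("set -e  # stop at the first failing command");

    // Emitting the commands in topological order turns the graph into a
    // sequential script: collaborators only need /bin/sh to run it, at
    // the cost of incremental rebuilds and parallelism.
    foreach (task; tasksInDependencyOrder)
        script.writeln(escapeShellCommand(task.command));
}

Since the generated script rebuilds everything from scratch every time, it fits the "commit it for people who don't have Button" use case rather than day-to-day development.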
June 18, 2016
On Friday, 17 June 2016 at 20:36:53 UTC, H. S. Teoh wrote:
> - Assuming that a revision control system is in place, and a
>   workspace is checked out on revision X with no further
>   modifications, then invoking the build tool should ALWAYS,
>   without any exceptions, produce exactly the same outputs, bit
>   for bit.  I.e., if your workspace faithfully represents
>   revision X in the RCS, then invoking the build tool will
>   produce the exact same binary products as anybody else who
>   checks out revision X, regardless of their initial starting
>   conditions.

Making builds bit-for-bit reproducible is really, really hard to do, particularly on Windows. Microsoft's C/C++ compiler embeds timestamps and other nonsense into the binaries so that every time you build, even when no source changed, you get a different binary. Google wrote a tool to help eliminate this non-determinism as a post-processing step called zap_timestamp[1]. I want to eventually include something like this with Button on Windows. I'll probably have to make a PE reader library first though.

Without reproducible builds, caching outputs doesn't work very well either.

Moral of the story is, if you're writing a compiler, for the sake of build systems everywhere, make the output deterministic! For consecutive invocations, without changing any source code, I want the hashes of the binaries to be identical every single time. DMD doesn't do this and it saddens me greatly.

[1] https://github.com/google/syzygy/tree/master/syzygy/zap_timestamp
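For reference, a crude determinism check could look something like this (illustrative D with placeholder directory names): build the same revision twice into two scratch directories, then compare the outputs byte for byte.

import std.file : dirEntries, exists, read, SpanMode;
import std.path : absolutePath, buildPath, relativePath;
import std.stdio : writeln;

// Compares every file produced by two consecutive builds of the same
// sources. Any mismatch means something non-deterministic (a timestamp,
// a random seed, an unordered hash table walk) leaked into the output.
void reportNondeterminism(string buildDirA, string buildDirB)
{
    string rootA = absolutePath(buildDirA);
    string rootB = absolutePath(buildDirB);

    foreach (entry; dirEntries(rootA, SpanMode.depth))
    {
        if (!entry.isFile)
            continue;

        string rel = relativePath(entry.name, rootA);
        string other = buildPath(rootB, rel);
        if (!exists(other))
        {
            writeln("missing from second build: ", rel);
            continue;
        }

        auto bytesA = cast(const(ubyte)[]) read(entry.name);
        auto bytesB = cast(const(ubyte)[]) read(other);
        if (bytesA != bytesB)
            writeln("non-deterministic output: ", rel);
    }
}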
June 18, 2016
On Sat, Jun 18, 2016 at 08:38:21AM +0000, Jason White via Digitalmars-d-announce wrote:
> On Friday, 17 June 2016 at 20:36:53 UTC, H. S. Teoh wrote:
> > - Assuming that a revision control system is in place, and a
> >   workspace is checked out on revision X with no further
> >   modifications, then invoking the build tool should ALWAYS,
> >   without any exceptions, produce exactly the same outputs, bit
> >   for bit.  I.e., if your workspace faithfully represents
> >   revision X in the RCS, then invoking the build tool will
> >   produce the exact same binary products as anybody else who
> >   checks out revision X, regardless of their initial starting
> >   conditions.
> 
> Making builds bit-for-bit reproducible is really, really hard to do, particularly on Windows. Microsoft's C/C++ compiler embeds timestamps and other nonsense into the binaries so that every time you build, even when no source changed, you get a different binary. Google wrote a tool to help eliminate this non-determinism as a post-processing step called zap_timestamp[1]. I want to eventually include something like this with Button on Windows. I'll probably have to make a PE reader library first though.

Even on POSIX, certain utilities also insert timestamps, which is very annoying. An SCons-based website that I developed years ago ran into this problem with ImageMagick. Fortunately there was a command-line option to suppress the timestamps, which made things saner.


> Without reproducible builds, caching outputs doesn't work very well either.

Yup.


> Moral of the story is, if you're writing a compiler, for the sake of build systems everywhere, make the output deterministic! For consecutive invocations, without changing any source code, I want the hashes of the binaries to be identical every single time. DMD doesn't do this and it saddens me greatly.
[...]

DMD doesn't? What does it do that isn't deterministic?


T

-- 
Elegant or ugly code as well as fine or rude sentences have something in common: they don't depend on the language. -- Luca De Vitis