Re: Dub, Cargo, Go, Gradle, Maven
February 16, 2018
On Mon, Feb 12, 2018 at 10:35:06AM +0000, Russel Winder via Digitalmars-d wrote:
> In all the discussion of Dub to date, it hasn't been pointed out that JVM building merged dependency management and build a long time ago. Historically:
> 
>   Make → Ant → Maven → Gradle
> 
> and Gradle can handle C++ as well as JVM language builds.
> 
> So the integration of package management and build as seen in Go, Cargo, and Dub is not a group of outliers. Could it be, then, that it is the right thing to do? After all, package management is a dependency management activity and build is a dependency management activity, so why separate them? Just have a single ADG to describe the whole thing.

I have no problem with using a single ADG/DAG to describe the whole thing.  However, a naïve implementation of this raises a few issues:

If a dependent node requires network access, it forces network access every time the DAG is updated.  This is slow, and also unreliable: the shape of the DAG could, in theory, change arbitrarily at any time outside the control of the user.  If I'm debugging a program, the very last thing I want to happen is that the act of building the software also pulls in new library versions that cause the location of the bug to shift, thereby ruining any progress I may have made on narrowing down its locus. It would be nice to locally cache such network-dependent nodes so that they are only refreshed on demand.
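
Just to make "refreshed on demand" concrete, here is a minimal sketch (Python; the cache path and function names are invented for illustration, not any real tool's API):

    import json, os, urllib.request

    CACHE_DIR = os.path.expanduser("~/.mybuild/cache")    # hypothetical location

    def resolve_remote_node(url, force_refresh=False):
        # Return metadata for a network-dependent DAG node, hitting the network
        # only when nothing is cached or a refresh is explicitly requested.
        path = os.path.join(CACHE_DIR, url.replace("/", "_"))
        if os.path.exists(path) and not force_refresh:
            with open(path) as f:
                return json.load(f)                        # no network access at all
        with urllib.request.urlopen(url) as resp:          # explicit, on-demand fetch
            data = json.loads(resp.read())
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(path, "w") as f:
            json.dump(data, f)                             # cache for offline builds
        return data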

Furthermore, a malicious external entity can introduce arbitrary changes into the DAG, e.g., hijack an intermediate DNS server so that network lookups get redirected to a malicious server which then adds dependencies on malware to your DAG.  The next time you update: boom, your software now contains a trojan horse. (Even better if you have integrated package dependencies with builds all the way to deployment: now all your customers have a copy of the trojan deployed on their machines, too.)  To mitigate this, some kind of security model would be required (e.g., verifiable server certificates, cryptographically signed package payloads).  Which adds to the cost of refreshing network nodes, and hence is another big reason why this should be done on-demand, NOT automatically every time you ask for a new software build.
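
The bare minimum of such a model, as a hedged sketch (the digest below is a placeholder, not a real package hash; a real system would also verify signatures and certificate chains):

    import hashlib

    # Digests pinned when the dependency was first vetted (placeholder values).
    PINNED_SHA256 = {
        ("libfoo", "1.0.2"): "0000000000000000000000000000000000000000000000000000000000000000",
    }

    def verify_payload(name, version, payload_bytes):
        # Refuse to add a downloaded payload to the DAG unless its content hash
        # matches the value recorded for that exact version.
        expected = PINNED_SHA256.get((name, version))
        if expected is None:
            raise RuntimeError("no pinned hash for %s %s; refusing to use it" % (name, version))
        if hashlib.sha256(payload_bytes).hexdigest() != expected:
            raise RuntimeError("hash mismatch for %s %s: possible tampering" % (name, version))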

Also, if the machine I'm working on happens to be offline, it would totally suck to be unable to build my project just because of that. The whole point of having a DAG is reliable builds, and having the graph depend on remote resources over an inherently unreliable network defeats the purpose.  That is why caching is basically mandatory, as is control over when the network is accessed.

And furthermore, one always has to be mindful of the occasional need to rollback.  Generally, source code control is used for the local source code component -- if you need to revert a change, just checkout an earlier revision from your repo.  But if a network resource that used to provide library X v1.0 now has moved on to X v2.0, and has dropped all support for v1.0 so that it is no longer downloadable from the server, then rollback is no longer possible.  You are now unable to reproduce a build you made 2 years ago.  (Which you might need to, if a customer environment is still running the old version and you need to debug it.) IOW, the network is inherently unreliable.  Some form of local caching / cache revision control is required.
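
The usual way out is a checked-in lockfile plus a local artifact cache: the lockfile pins exact versions, and the build consults only the cache those entries point at. A rough sketch of the idea (names, versions and paths invented for illustration):

    import os

    # What a lockfile committed alongside the source might record.
    # The point is that versions are exact and pinned.
    LOCKED = {
        "libfoo": "1.0.0",
        "libbar": "2.3.1",
    }

    CACHE_DIR = os.path.expanduser("~/.mybuild/artifacts")

    def missing_artifacts():
        # A two-year-old revision is only rebuildable if every artifact it pins
        # is still present in the local (or team-archived) cache.
        return [name for name, version in LOCKED.items()
                if not os.path.exists(
                    os.path.join(CACHE_DIR, "%s-%s.tar.gz" % (name, version)))]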


[...]
> Then, in a DevOps world, there is deployment, which is usually a dependency management task. Is a totally new tool doing ADG manipulation really needed for this?

My answer is: the ADG/DAG manipulation should be a *library*, a reusable component that can be integrated into diverse systems that require it. Multiple systems that implement functionality X is not necessarily a valid reason to argue for merging said systems into a single monolithic monster.  Rather, what it *does* suggest is to factor out functionality X so that it can be reused across said systems.
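
A sketch of what I mean by "a library" (Python, illustrative only): the generic part is just graph bookkeeping, and each consumer supplies its own notion of node and of "out of date":

    from collections import defaultdict

    class DAG:
        # Generic dependency graph: nodes are opaque keys,
        # add_edge(a, b) means "a depends on b".
        def __init__(self):
            self.deps = defaultdict(set)

        def add_edge(self, node, dependency):
            self.deps[node].add(dependency)
            self.deps[dependency]            # touch, so leaf nodes appear in the graph

        def topo_order(self):
            # Every dependency comes before the things that depend on it.
            order, seen = [], set()
            def visit(n):
                if n in seen:
                    return
                seen.add(n)
                for d in self.deps[n]:
                    visit(d)
                order.append(n)
            for n in list(self.deps):
                visit(n)
            return order

    # A package manager, a build tool, and a deployment tool can all reuse this;
    # they differ only in what a node is and how staleness is decided.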


[...]
> Merging ideas from Dub, Gradle, and Reggae into a project management tool for D (with C) projects is relatively straightforward to plan, albeit really quite a complicated project. Creating the core ADG processing is the first requirement. It has to deal with external dependencies, project build dependencies, and deployment dependencies.

Your last sentence already shows that such a project is ill-advised, because while all of them in an abstract sense reduce to nothing but DAG manipulation, that is not an argument for integrating all systems that happen to use DAGs as a core algorithm into a single monolithic system. Rather, it's an indication that DAG manipulation code ought to be a common library that's reused across systems that require such functionality, i.e., external dependencies, build dependencies, and deployment dependencies.

It's really very simple.  If your code has function X and function Y, and X and Y have a lot of code in common, it does not mean you should write function Z that can perform the role of both X and Y.  Rather, it means you should factor out the common parts into function W, and reuse W from X and Y.  (Alas, the former is seen all too often in large "enterprise" software, where functions start out being straightforward with a clean API, and end up being a monstrous chimera with 50 non-orthogonal, sometimes mutually-contradictory parameters, that can nevertheless do everything you want -- if you can only figure out what exactly each parameter means and which subset of parameters are actually relevant to what you want.)
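
In code form, the contrast is roughly this (a toy sketch, not taken from any real codebase):

    # The chimera: one function trying to be both X and Y, steered by flags.
    def z(data, as_report=False, as_csv=False, legacy=False, strict=None):
        ...

    # The factored version: W holds the shared logic, X and Y stay small and clear.
    def w(data):
        return sorted(d for d in data if d is not None)    # the common part

    def x(data):
        return "\n".join(str(d) for d in w(data))          # report output

    def y(data):
        return ",".join(str(d) for d in w(data))           # CSV output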

Similarly, if you have systems P, Q, and R, and they all have DAG manipulation as a common functionality, that is an argument for factoring out said DAG manipulation as a reusable component. It is not an argument for making a new system S that includes everything that P, Q, and R can do. (Unless S can also provide new functionality that P, Q, and R could not have been able to achieve without such integration.)


[...]
> (*) The O(N) vs. O(n), SCons vs. Tup thing that T raised in another
> thread is important, but actually it is an implementation matter of
> how you detect change; it isn't an algorithmic issue at a system
> design level. But it is important.

The O(N) vs. O(n) issue is actually very important once you generalize beyond the specifics of build dependencies, esp. if you start talking about network-dependent DAGs.  If a task has a DAG that depends on, say, 100 network nodes, then I absolutely do NOT want the dependency resolution tool to be querying all 100 nodes every time I ask for a refresh.  That's just ridiculously inefficient.  Rather, the tool should subscribe for updates from the network servers so that they inform it when their part of the DAG changes.  IOW, the amount of network traffic should be proportional to the number of *changes* in the remote nodes, NOT the *total* number of nodes.
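
A hedged sketch of the shape of that (no particular notification protocol implied; webhooks, long polling, whatever -- the point is only that refresh cost tracks the dirty set):

    class RemoteDAGMirror:
        # Local mirror of the network-dependent part of the DAG.
        # refresh() does work proportional to the number of *changed* nodes,
        # never to the total number of remote nodes.
        def __init__(self, fetch_node):
            self.fetch_node = fetch_node   # callback fetching one node's metadata
            self.nodes = {}                # node id -> cached metadata
            self.dirty = set()             # ids the servers reported as changed

        def on_change_notification(self, node_id):
            self.dirty.add(node_id)        # pushed by the server, not polled

        def refresh(self):
            for node_id in self.dirty:
                self.nodes[node_id] = self.fetch_node(node_id)
            changed, self.dirty = self.dirty, set()
            return changed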

Similarly, for deployment management, if my project has 100 installation targets (remote customer machines), each of which has 1000 entities (let's say files, like data files and executables), then I really do NOT want to have to scan all 1000 entities on all 100 installation targets, just to decide that only 50 files on 2 installation targets have changed.  I should be able to push out only the files that have changed, and not everything else. IOW, the size of the update should be proportional to the size of the change, NOT the total size of the entire deployment.  Otherwise it is simply not scalable and will quickly become impractical as project sizes grow.
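
Concretely, keeping a per-target manifest of content hashes is enough to get that behaviour: deciding what to push touches nothing on the target, and the bytes sent are proportional to the diff. A rough sketch (Python, illustrative):

    import hashlib, os

    def manifest(root):
        # Map each file under root (by relative path) to a hash of its contents.
        result = {}
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                result[os.path.relpath(path, root)] = digest
        return result

    def files_to_push(local_root, last_known_remote_manifest):
        # Only the files whose content differs from what the target last had.
        local = manifest(local_root)
        return [p for p, h in local.items()
                if last_known_remote_manifest.get(p) != h]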

If such considerations are not integrated into the system design at the top level, you can be sure that there will be inherent design flaws that preclude efficient implementation later on. IOW, DAG updates must be proportional to the size of the DAG change. Nowhere must there be any algorithm that requires scanning the entire DAG (unless the changeset covers the entire DAG).


T

-- 
Guns don't kill people. Bullets do.
February 16, 2018
On Friday, 16 February 2018 at 18:16:12 UTC, H. S. Teoh wrote:
> On Mon, Feb 12, 2018 at 10:35:06AM +0000, Russel Winder via Digitalmars-d wrote:
>> In all the discussion of Dub to date, it hasn't been pointed out that JVM building merged dependency management and build a long time ago. Historically:
>> 
>>   Make → Ant → Maven → Gradle
>> 
>> and Gradle can handle C++ as well as JVM language builds.
>> 
>> So the integration of package management and build as seen in Go, Cargo, and Dub is not a group of outliers. Could it be, then, that it is the right thing to do? After all, package management is a dependency management activity and build is a dependency management activity, so why separate them? Just have a single ADG to describe the whole thing.
>
> I have no problem with using a single ADG/DAG to describe the whole thing.  However, a naïve implementation of this raises a few issues:
>
> If a dependent node requires network access, it forces network access every time the DAG is updated.  This is slow, and also unreliable: the shape of the DAG could, in theory, change arbitrarily at any time outside the control of the user.

Oh, come on. Immutable artifacts make this matter trivial - the first step is resolution, where you figure out what things you already have by looking at metadata only, followed by downloading the world (or the part you do not have). Since specific versions never change, there is nothing to worry about. Java folks have had this for ages with Maven and its ilk.

Some targets like deploy may indeed not have a cheap "check if it's done" step. They may not really need one. (Though rsync does wonders at minimizing the work.)
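
Roughly what the resolution step amounts to, as a sketch (Python; the layout mimics Maven's default local repository under ~/.m2):

    import os

    LOCAL_REPO = os.path.expanduser("~/.m2/repository")

    def is_cached(group, artifact, version):
        # Resolution works from coordinates/metadata alone: anything already in
        # the local repository never needs the network again, because a published
        # (group, artifact, version) is immutable.
        path = os.path.join(LOCAL_REPO, group.replace(".", "/"),
                            artifact, version,
                            "%s-%s.jar" % (artifact, version))
        return os.path.exists(path)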

Also most if not all build systems will inevitably integrate all of the below in some way:
- compiler (internal as library or external as a build server)
- source code dependency resolution
- package dependency resolution
- package download or build
- execution of arbitrary tasks in form of plugins or external tools

Personally, I am in love with plugins and a general-purpose language available to define tasks. Scala's SBT may have many faults, but plugins and extensibility make it awesome.


February 16, 2018
On Friday, 16 February 2018 at 18:16:12 UTC, H. S. Teoh wrote:
> The O(N) vs. O(n) issue is actually very important once you

I understand what you are trying to say, but this usage of notation is very confusing. O(n) is exactly the same as O(N) if N relates to n by a given percentage.


February 16, 2018
On Fri, Feb 16, 2018 at 07:40:01PM +0000, Ola Fosheim Grøstad via Digitalmars-d wrote:
> On Friday, 16 February 2018 at 18:16:12 UTC, H. S. Teoh wrote:
> > The O(N) vs. O(n) issue is actually very important once you
> 
> I understand what you are trying to say, but this usage of notation is
> very confusing. O(n) is exactly the same as O(N) if N relates to n by
> a given percentage.

N = size of DAG
n = size of changeset

It's not a fixed percentage.


T

-- 
He who does not appreciate the beauty of language is not worthy to bemoan its flaws.
February 16, 2018
On Friday, 16 February 2018 at 19:40:07 UTC, H. S. Teoh wrote:
> On Fri, Feb 16, 2018 at 07:40:01PM +0000, Ola Fosheim Grøstad via Digitalmars-d wrote:
>> On Friday, 16 February 2018 at 18:16:12 UTC, H. S. Teoh wrote:
>> > The O(N) vs. O(n) issue is actually very important once you
>> 
>> I understand what you are trying to say, but this usage of notation is
>> very confusing. O(n) is exactly the same as O(N) if N relates to n by
>> a given percentage.
>
> N = size of DAG
> n = size of changeset
>
> It's not a fixed percentage.

Well, for this comparison to make sense asymptotically you have to consider how n grows when N grows towards infinity. Basically without relating n to N we don't get any information from O(n) vs O(N).

If you cannot bound n in terms of N (lower/upper) then O(n) is most likely either O(1)  or O(N) in relation to N... (e.g. there is a constant upper limit to how many files you modify manually, or you rebuild roughly everything)

Now, if you said that at most O(log N) files are changed, then you could have an argument in terms of big-oh.
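
Spelled out a bit more formally (my formalisation of the intended claim, not T's exact words):

    T_{\mathrm{scan}}(N) = \Theta(N), \qquad T_{\mathrm{incr}}(N) = \Theta(n(N))

    n(N) = o(N)      \implies T_{\mathrm{incr}} = o(T_{\mathrm{scan}})
    n(N) = \Theta(N) \implies T_{\mathrm{incr}} = \Theta(T_{\mathrm{scan}})

So the O(n) claim only buys something under the (usual, but unstated) assumption that changesets stay small relative to the whole DAG.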

February 16, 2018
On Fri, Feb 16, 2018 at 07:31:37PM +0000, Dmitry Olshansky via Digitalmars-d wrote:
> On Friday, 16 February 2018 at 18:16:12 UTC, H. S. Teoh wrote:
[...]
> > If a dependent node requires network access, it forces network access every time the DAG is updated.  This is slow, and also unreliable: the shape of the DAG could, in theory, change arbitrarily at any time outside the control of the user.
> 
> Oh, come on. Immutable artifacts make this matter trivial - the first step is resolution, where you figure out what things you already have by looking at metadata only, followed by downloading the world (or the part you do not have).  Since specific versions never change, there is nothing to worry about. Java folks have had this for ages with Maven and its ilk.

This assumes that the upstream server (1) consistently serves the same data for the same version -- which in principle will be the case, but unforeseen problems could break this assumption; (2) stores all versions forever, which is unlikely to be always true.

In any case, dependence on network access for every invocation of a build is unacceptable to me.


[...]
> Also most if not all build systems will inevitably integrate all of
> the below in some way:
> - compiler (internal as library or external as a build server)
> - source code dependency resolution
> - package dependency resolution
> - package download or build
> - execution of arbitrary tasks in form of plugins or external tools
> 
> Personally, I am in love with plugins and a general-purpose language available to define tasks. Scala's SBT may have many faults, but plugins and extensibility make it awesome.

Personally, I find that the most useful build systems are those that make no assumptions about how your products are built.  SCons is a good example of this: for example, currently I have a website completely built from the ground up by SCons, which includes tasks like generating datasets, 3D models, using a PHP filter to generate HTML, running a raytracer to generate images, post-processing generated images, creating a dataset from revision history and rendering graphs, running LaTeX to generate PDF documentation, etc., and installing the products of all of the foregoing into a staging directory that then gets rsync'd to the remote webserver.  Basically none of these steps involve the traditional invocation of a compiler or built-in source code dependency resolution. SCons has a very nice API for defining my own dependency resolver for custom data formats that can leverage all of the built-in scanning / dependency-resolving algorithms that come with SCons.
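
For the curious, that level of customization is only a few lines in the SConstruct. A trimmed-down sketch (the php command line and file names here are invented placeholders, not my actual setup):

    # SConstruct -- SCons files are ordinary Python
    env = Environment()

    # Custom builder: run a PHP filter to produce HTML from a .php source.
    php_html = Builder(action='php $SOURCE > $TARGET',
                       suffix='.html', src_suffix='.php')
    env.Append(BUILDERS={'PhpHtml': php_html})

    # A custom Scanner(function=...) can be attached the same way to teach
    # SCons about implicit dependencies inside custom data formats.

    pages = env.PhpHtml('index.php')      # index.php -> index.html
    env.Install('staging', pages)         # staging tree later rsync'd to the server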

I would not even consider any build system incapable of this level of customization.


T

-- 
Life begins when you can spend your spare time programming instead of watching television. -- Cal Keegan
February 17, 2018
On Friday, 16 February 2018 at 22:48:55 UTC, H. S. Teoh wrote:
> On Fri, Feb 16, 2018 at 07:31:37PM +0000, Dmitry Olshansky via Digitalmars-d wrote:
>> On Friday, 16 February 2018 at 18:16:12 UTC, H. S. Teoh wrote:
> [...]
>> > If a dependent node requires network access, it forces network access every time the DAG is updated.  This is slow, and also unreliable: the shape of the DAG could, in theory, change arbitrarily at any time outside the control of the user.
>> 
>> Oh, come on. Immutable artifacts make this matter trivial - the first step is resolution, where you figure out what things you already have by looking at metadata only, followed by downloading the world (or the part you do not have).  Since specific versions never change, there is nothing to worry about. Java folks have had this for ages with Maven and its ilk.
>
> This assumes that the upstream server (1) consistently serves the same data for the same version -- which in principle will be the case, but unforeseen problems could break this assumption;

Trivially enforced on public hubs such as Maven Central.

> (2) stores all versions forever, which is unlikely to be always true.


It is in fact true. After all, GitHub stores the whole git repo; why would storing all distinct versions be too much?


> In any case, dependence on network access for every invocation of a build is unacceptable to me.

Well, it is not in the scheme outlined. Only the clean build is.

> [...]
>> Also most if not all build systems will inevitably integrate all of
>> the below in some way:
>> - compiler (internal as library or external as a build server)
>> - source code dependency resolution
>> - package dependency resolution
>> - package download or build
>> - execution of arbitrary tasks in form of plugins or external tools
>> 
>> Personally, I am in love with plugins and a general-purpose language available to define tasks. Scala's SBT may have many faults, but plugins and extensibility make it awesome.
>
> Personally, I find that the most useful build systems are those that make no assumptions about how your products are built.

No thanks, I'd prefer convention over unique snowflake build scripts if there is a choice. Sometimes you have to go beyond the basics, but not too often.

> SCons is a good example of this: for example, currently I have a website completely built from the ground up by SCons, which includes tasks like generating datasets, 3D models, using a PHP filter to generate HTML, running a raytracer to generate images, post-processing generated images, creating a dataset from revision history and rendering graphs, running LaTeX to generate PDF documentation, etc., and installing the products of all of the foregoing into a staging directory that then gets rsync'd to the remote webserver.

All of that can be done by any of the modern tools with a full language + DSL at your disposal, e.g. Gradle or SBT. In a sense SCons is the same, but without resolving packages.


> Basically none of these steps involve the traditional invocation of a compiler or built-in source code dependency resolution. SCons has a very nice API for defining my own dependency resolver for custom data formats that can leverage all of the built-in scanning / dependency-resolving algorithms that come with SCons.
>
> I would not even consider any build system incapable of this level of customization.
>
>
> T


February 21, 2018
On Fri, 2018-02-16 at 10:16 -0800, H. S. Teoh via Digitalmars-d wrote:
> 
[…]
> If a dependent node requires network access, it forces network access every time the DAG is updated.  This is slow, and also unreliable: the shape of the DAG could, in theory, change arbitrarily at any time outside the control of the user.  If I'm debugging a program, the very last thing I want to happen is that the act of building the software also pulls in new library versions that cause the location of the bug to shift, thereby ruining any progress I may have made on narrowing down its locus. It would be nice to locally cache such network-dependent nodes so that they are only refreshed on demand.

As with all build systems that involve a network-accessed dependency provider, there has to be a local cache *and* a mechanism for not always doing network lookups. Some people just use fixed versions to stop this; others also employ a "no lookups" flag or separate the build from the checking of dependencies. Obviously there needs to be network access for some change events, but it should be well controlled.


> Furthermore, a malicious external entity can introduce arbitrary changes into the DAG, e.g., hijack an intermediate DNS server so that network lookups get redirected to a malicious server which then adds dependencies on malware to your DAG.  The next time you update: boom, your software now contains a trojan horse. (Even better if you have integrated package dependencies with builds all the way to deployment: now all your customers have a copy of the trojan deployed on their machines, too.)  To mitigate this, some kind of security model would be required (e.g., verifiable server certificates, cryptographically signed package payloads).  Which adds to the cost of refreshing network nodes, and hence is another big reason why this should be done on-demand, NOT automatically every time you ask for a new software build.

This is a problem for all extant systems, and until a solution is found nothing can be done.

> Also, if the machine I'm working on happens to be offline, it would totally suck to be unable to build my project just because of that. The whole point of having a DAG is reliable builds, and having the graph depend on remote resources over an inherently unreliable network defeats the purpose.  That is why caching is basically mandatory, as is control over when the network is accessed.

You have answered your own question, and all good build/dependency management systems already do this via some mechanism.

> And furthermore, one always has to be mindful of the occasional need to rollback.  Generally, source code control is used for the local source code component -- if you need to revert a change, just checkout an earlier revision from your repo.  But if a network resource that used to provide library X v1.0 now has moved on to X v2.0, and has dropped all support for v1.0 so that it is no longer downloadable from the server, then rollback is no longer possible.  You are now unable to reproduce a build you made 2 years ago.  (Which you might need to, if a customer environment is still running the old version and you need to debug it.) IOW, the network is inherently unreliable.  Some form of local caching / cache revision control is required.

I don't see this as a big issue since it is already possible in all
good systems.
[…]

-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk


February 21, 2018
On Fri, 2018-02-16 at 19:31 +0000, Dmitry Olshansky via Digitalmars-d wrote:
> 
[…]
> Personally, I am in love with plugins and a general-purpose language available to define tasks. Scala's SBT may have many faults, but plugins and extensibility make it awesome.

Maven has plugins but is seriously unwieldy, uses XML for project descriptions, and its dependency resolution algorithm is faulty. Gradle fixes most of the problems of Maven and can also do C++ building. It could also do D building, but given its focus on JCenter and Maven Central for dependency management, is it the right tool? It could be, as it would just require plugins to put Dub in play.

Gradle's dependency management is tried and tested, and works well even at massive scale.

-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk


February 21, 2018
On Fri, 2018-02-16 at 14:48 -0800, H. S. Teoh via Digitalmars-d wrote:
> […]
> 
> This assumes that the upstream server (1) consistently serves the same data for the same version -- which in principle will be the case, but unforeseen problems could break this assumption; (2) stores all versions forever, which is unlikely to be always true.

JCenter and Maven Central do indeed serve all versions of all packages ever stored for all time. It is critical to the JVM-verse that this is the case.

> In any case, dependence on network access for every invocation of a build is unacceptable to me.

Local caching deals with this.

[…]
> 
> 
> Personally, I find that the most useful build systems are those that make no assumptions about how your products are built.  SCons is a good example of this: for example, currently I have a website completely built from the ground up by SCons, which includes tasks like generating datasets, 3D models, using a PHP filter to generate HTML, running a raytracer to generate images, post-processing generated images, creating a dataset from revision history and rendering graphs, running LaTeX to generate PDF documentation, etc., and installing the products of all of the foregoing into a staging directory that then gets rsync'd to the remote webserver.  Basically none of these steps involve the traditional invocation of a compiler or built-in source code dependency resolution. SCons has a very nice API for defining my own dependency resolver for custom data formats that can leverage all of the built-in scanning / dependency-resolving algorithms that come with SCons.
> 
> I would not even consider any build system incapable of this level of customization.

I have been (still am?) a SCons fan but it has some serious problems in
some workflows.

I think all of the things you mention are solved in a system that has a general-purpose programming language to describe the project and the build. There is clearly a tension between specifying a project in a purely declarative way (cf. Cargo, Dub, Maven), where all build and deploy activities are effectively hardwired (though Maven has plugins to amend things), and systems that use a programming language. There are two types of the latter: those that use a programming language with a declarative internal DSL, e.g. SCons and Gradle, and those that use an external DSL, e.g. CMake and Meson. I much prefer using a programming language with an internal DSL generally, but then systems like CMake and Meson get forced on you, so you have to begin to like them for certain situations (*).


(*) Debian has a downer on SCons for building packages for example, and
Meson actually works very well in that context.


-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk

