Thread overview
Reducing the inter-dependencies (in Phobos and at large)
Apr 24, 2013
Dmitry Olshansky
Re: Reducing the inter-dependencies (in Phobos and at large)gei
Apr 24, 2013
renariko
Apr 24, 2013
Joshua Niehus
Apr 24, 2013
Dmitry Olshansky
Apr 24, 2013
qznc
Apr 24, 2013
Dmitry Olshansky
Apr 25, 2013
Zach the Mystic
Apr 26, 2013
Dmitry Olshansky
Apr 24, 2013
Johannes Pfau
Apr 26, 2013
Jonathan M Davis
Apr 26, 2013
Dmitry Olshansky
Apr 26, 2013
Jonathan M Davis
Apr 27, 2013
Dmitry Olshansky
April 24, 2013
Recently I've struggled again to integrate a module into Phobos, and the number of curious forward reference bugs made me think about a related but not equivalent problem.

Intro

Basically the import graph of Phobos is a rat's nest of mutual imports.
With most of the modules being template "toolkits" you shouldn't pay for what you don't use. Yet that isn't true in general, as a module may drag in other modules and with them all of the related static constructors and/or globals.

Adding even more fun to this: to get a "constraint checker" you have to import everything else from that module (and transitively from all the modules it imports, see the std.range example below).

One motivating example (though I believe there are many more beyond this one):

As some might know, a regular expression engine can be used both for generating sequences from a known pattern and for matching/finding the pieces of some data that match it.

In fact I have such functionality in std.regex, hidden from the public interface (not yet complete, I wasn't sure what kind of API to offer, etc.).

Now the *key* fact:

auto generate(RegEx, Random)(RegEx re, Random rng)
	if(isRegex!RegEx && isUniformRNG!Random)
{
...
}

Now given that innocent signature you have a dependency on the whole of std.random, *including* seeding the default random number generator at program start!

And recall that generating strings, while neat, is arguably a rarer need than pattern matching.

Same thing with std.datetime: to be able to accept any kind of Date as a template parameter (in your API) you have to pull in the whole of std.datetime *even if the user never ever calls this function* of your module's API.

And everyone and their granny depends on the full version of std.range,
which in turn depends on std.algorithm, which depends on std.conv, which depends on std.format(!), and that incidentally on std.uni.
BTW std.uni turns out to be a kind of sink that everybody imports sooner or later, if only for unittests; sadly it's mostly imported unconditionally.

And that is skipping the full rat's nest to preserve the brains of the reader.

After a couple of frustrating evenings dealing with it (multiplied by the bogus dmd) I've come up with the idea below.


Solution

First of all no compiler magic required (phew-ew!) :)

Not to mention the 2 obvious facts: smaller modules might help, as would putting imports that are used only by unit tests under version(unittest).
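A minimal sketch of the version(unittest) guard mentioned above. The function name unitCount is made up for illustration; the point is only that std.algorithm is imported solely when the program is built with -unittest.

```d
import std.traits : isSomeString;

// std.algorithm is needed only by the test below, so it is imported
// only when the program is compiled with -unittest.
version (unittest) import std.algorithm : equal;

/// Hypothetical helper: number of code units in any string type.
size_t unitCount(S)(S s) if (isSomeString!S)
{
    return s.length;
}

unittest
{
    assert(unitCount("abc") == 3);
    assert(equal("abc", "abc"));
}

void main() {}
```

An import placed directly inside the unittest block should have the same effect and keeps the dependency next to the code that needs it.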

What we need is to re-arrange the module hierarchy (and we need that anyway) so that we split off the "concept" part of modules to a separate package.

That way, modules that only need to use these duck-typed entities (IFF the user ever passes such an entity) can stick with importing only the concept part.

Applying that to the current layout would look like:
std.concept.range
std.concept.random
std.concept.* //every other module with any useful isXYZ constraint
std.* // stays as is

Any module that has a "concept" part then looks like this:

module std.xyz;
import std.concept.xyz;
... //the rest

And then other weakly-dependent modules (i.e. those that are satisfied with traits and duck-typed interfaces) can safely import std.concept.xyz instead of std.xyz. E.g. std.regex would import std.concept.random to get isUniformRNG and rely on the duck typing thus described to use it correctly.
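A rough single-file sketch of what a weakly-dependent client could look like. Everything here is an assumption made for illustration: isUniformRNGLike is a simplified stand-in for the real isUniformRNG (in the proposal it would sit in something like std.concept.random), and TinyLcg and pickLetter are made-up names. Note that nothing imports std.random.

```d
// Simplified, hypothetical stand-in for the constraint; in the proposal
// it would live in a small module such as std.concept.random.
enum bool isUniformRNGLike(R) = is(typeof((ref R r) {
    if (r.empty) {}       // can be asked whether it is exhausted
    r.popFront();         // can be advanced
    ulong x = r.front;    // yields an integral value
})) && is(typeof(R.isUniformRandom) : bool);

// A toy generator that satisfies the constraint purely by duck typing.
struct TinyLcg
{
    enum isUniformRandom = true;
    enum bool empty = false;
    private uint state = 1;
    @property uint front() const { return state; }
    void popFront() { state = state * 1_664_525u + 1_013_904_223u; }
}

// Client code: depends only on the constraint, not on any RNG module.
char pickLetter(Rng)(ref Rng rng) if (isUniformRNGLike!Rng)
{
    immutable c = cast(char)('a' + rng.front % 26);
    rng.popFront();
    return c;
}

void main()
{
    static assert(isUniformRNGLike!TinyLcg);
    static assert(!isUniformRNGLike!int);

    auto rng = TinyLcg();
    assert(pickLetter(rng) == 'b');   // initial state is 1, and 1 % 26 == 1
}
```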

The change is backwards compatible and introduces no breakage.
Only clean sugar-free interdependence of modules in Phobos.

Later, people (mostly library writers) can use e.g. std.concept.range
to avoid pulling in the full dependency tree when only constraints are needed. The technique can be touted as a coding guideline for template- and duck-type-heavy libraries.

Thoughts? Other ideas?

-- 
Dmitry Olshansky
April 24, 2013
On Wednesday, 24 April 2013 at 12:03:52 UTC, Dmitry Olshansky wrote:
> E.g. std.regex would import std.concept.random to get isUniformRNG and rely on duck typing thusly described to use it correctly.
> Thoughts? Other ideas?

How would this be different than limited imports such as:
import std.random: isUniformRNG;
?
April 24, 2013
On Wednesday, 24 April 2013 at 12:03:52 UTC, Dmitry Olshansky wrote:
> Basically an import graph of Phobos is a rat's nest of mutual imports.
> With the most of modules being template "toolkits" you shouldn't pay for what you don't use. Yet it isn't true in general as the module may drag in other modules and with that all of static constructors and/or globals related.
> Thoughts? Other ideas?

I think your concept idea introduces unnecessary complexity.

What are you actually worried about? Compile times? Program size? Startup time?

Is compile time a problem?

Program size should be handled by the compiler. It is much better at pruning dead code.

Startup time should be handled by the modules themselves. For example, std.random could initialize the global RNG only on demand.
April 24, 2013
On Wed, 24 Apr 2013 16:03:47 +0400,
Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

> Thoughts? Other ideas?
> 

Sounds good to me. We might extend this idea and also add interfaces to the concept modules. Tango did that and IIRC C# does this as well.
April 24, 2013
24-Apr-2013 19:56, Joshua Niehus wrote:
> On Wednesday, 24 April 2013 at 12:03:52 UTC, Dmitry Olshansky wrote:
>> E.g. std.regex would import std.concept.random to get isUniformRNG and
>> rely on duck typing thusly described to use it correctly.
>> Thoughts? Other ideas?
>
> How would this be different than limited imports such as:
> import std.random: isUniformRNG;
> ?

No matter how it looks to you this line means:
pull in whatever is std.random and make symbol isUniformRNG visible.

Since the compiler can't know (well, that might improve, but not anytime soon) that isUniformRNG is independent of the static ctors/dtors and globals in that module, it has to run both.

Strictly speaking what is required is breaking up modules more meaningfully.
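A small sketch of the effect described above, condensed into one file for brevity. The names seed, hasFront and Counter are made up. In the real scenario the static constructor would sit in the imported module; a selective import only narrows which names are visible, it does not stop that module's constructors from running at startup.

```d
// Hypothetical module-level state, standing in for a default RNG.
__gshared int seed;

shared static this()
{
    // Runs once at program start, before main(), no matter which
    // symbols of this module the program actually uses.
    seed = 42;
}

// The only piece a "constraints-only" client is interested in.
enum bool hasFront(R) = is(typeof(R.init.front));

struct Counter { int front; }

void main()
{
    static assert(hasFront!Counter);  // evaluated at compile time only
    static assert(!hasFront!int);
    assert(seed == 42);               // yet the constructor has already run
}
```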


-- 
Dmitry Olshansky
April 24, 2013
24-Apr-2013 20:08, qznc wrote:
> On Wednesday, 24 April 2013 at 12:03:52 UTC, Dmitry Olshansky wrote:
>> Basically an import graph of Phobos is a rat's nest of mutual imports.
>> With the most of modules being template "toolkits" you shouldn't pay
>> for what you don't use. Yet it isn't true in general as the module may
>> drag in other modules and with that all of static constructors and/or
>> globals related.
>> Thoughts? Other ideas?
>
> I think your concept idea introduces unnecessary complexity.

It reduces the complexity of full module imports and helps avoid circular dependencies. The problem is that the compiler can't know that all you need that module for is a few isolated templates.

If compiler sees this:

module abc;

import xyz;

That means that
a) abc depends on xyz and thus on all of the global state in xyz, if there is any. Cue the idea that globals are bad: you can't easily track their usage, esp. with the separate compilation model.

b) If xyz happens to use stuff from abc, the compiler has to "turn on" cross-module dependency checks and define the order of ctor/dtor evaluation.
More important in the current setting is the fact that it's not good at it and is conservative (as it may well stay forever).

c) It may be the case that both modules define ctors, and then you have a genuine circular dependency.

d) Another (bogus) case is that it may just as well throw its hands up in the air and spit out a bunch of forward reference bugs.

Arguably it could be framed as poor modularity in the Phobos design.
A specific problem with templates is that in order to get a "duck type" you pull in the whole innards of the module.

That's why I see that duck types could and should be peeled off from modules.

> What are you actually worried about? Compile times? Program size?
> Startup time?

It affects all of it.

First and foremost, unnecessary and unavoidable junk that resides in your program. Unnecessary "fake" dependencies on stuff your module doesn't need. In the end, with the current setting a single touch of, say, std.file pulls in a measurable amount of stuff (including ctors/dtors that are run at startup/shutdown) you never wanted.

>
> Is compile time a problem?

For libraries, modularity and minimal dependencies are cornerstones of good design. I'd add flexibility and the pay-as-you-go principle as other key concerns.

The fact that D compiles fast can always be undermined by the way we structure the code and dependencies.

> Program size should be handled by the compiler. It is much better at
> pruning dead code.

In case it knows the code is dead. Throwing a bunch of stuff at it that is actually interdependent (the way it's written) doesn't help. The fact that a lot of it is truly independent bears no relation to it, partly because of the compilation model.

>
> Startup time should be handled by the modules themselves. For example,
> std.random could initialize the global RNG only on demand.

Doesn't help with having global data for it. Not to mention that the said dead code for lazy initialization would always be pulled in. And the other guy who needs the global PRNG now has to go through an "is-it-inited-yet" hook _always_. Your suggestion is a net loss on both counts then.
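For reference, the on-demand scheme being discussed could be sketched roughly like this. All names (TinyLcg, defaultRng, initCount) are made up and this is not how std.random is written; it only shows that the global and the "is-it-inited-yet" check stay in the program either way.

```d
struct TinyLcg
{
    uint state;
    @property uint front() const { return state; }
    void popFront() { state = state * 1_664_525u + 1_013_904_223u; }
}

private TinyLcg globalRng;   // module-level, thread-local by default in D
private bool globalRngReady;
int initCount;               // only here to make the behaviour observable

ref TinyLcg defaultRng()
{
    if (!globalRngReady)     // this check is evaluated on *every* call
    {
        globalRng = TinyLcg(12_345);
        globalRngReady = true;
        ++initCount;
    }
    return globalRng;
}

void main()
{
    assert(initCount == 0);  // nothing happens until the first use
    defaultRng().popFront();
    defaultRng().popFront();
    assert(initCount == 1);  // initialized exactly once, checked twice
}
```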

To summarize your point: you don't care and/or don't see it as a problem. That's fine, but then it just doesn't affect you in any way.
For Phobos developers to work a bit harder to design a cleaner dependency chain, so that your code loads less junk, is a net gain.
All of that should concern Phobos guys and library writers.

Another question: did you ever read the C run-time sources? They are carefully modularized "by hand" so that if you want, say, printf you get only what you truly need for it.

We currently do a very bad job at this kind of thing and it's not entirely the compiler's fault.


-- 
Dmitry Olshansky
April 25, 2013
On Wednesday, 24 April 2013 at 19:33:51 UTC, Dmitry Olshansky wrote:
> 24-Apr-2013 20:08, qznc wrote:
>> What are you actually worried about? Compile times? Program size?
>> Startup time?
>
> It affects all of it.

I don't know if you are right, but I think the case would be made visible and compelling with some benchmarks showing the compile time, executable size, and run-time differences between the existing mode and the proposed mode.
April 26, 2013
On Wednesday, April 24, 2013 16:03:47 Dmitry Olshansky wrote:
> What we need is to re-arrange the module hierarchy (and we need that anyway) so that we split off the "concept" part of modules to a separate package.

> Thoughts? Other ideas?

I'm a bit divided on the idea. On the one hand, it allows us to reduce interdependencies. On the other hand, it's definitely complicating the module hierarchy. This whole idea is a bit like the .h/.cpp or .di/.d separation. On the whole, I prefer the model of shoving it all in one file, but given that Phobos is the standard library (of a _systems_ language no less), the added complication may very well be worth the benefits in dependency reduction.

Still, given that we're talking about templates here, most of it shouldn't end up in the generated executable or library if it's not used, and we've already been moving away from static constructors in Phobos, and global/module level variables should already be quite rare. And once Phobos is a shared library, the few global/module level variables we have should cost even less. So, I'm not sure that the extra complication is really worth it. If the compiler and linker are doing their job, the only real difference should be in how much the various modules in Phobos need to be parsed, which is very fast with dmd, and most programs of any size are going to pull in all of the dependencies anyway.

I'm inclined to avoid doing this if we don't really need to, but if there's a solid benefit to it, then it may be that we really should do something like this.

On a side note, given that we sometimes call eponymous templates like isForwardRange traits, calling the sub-module trait or traits rather than concepts might be better (probably std.trait given that std.traits is already taken, though that would probably then become std.trait.traits given that it's entirely made up of traits).

- Jonathan M Davis
April 26, 2013
26-Apr-2013 03:20, Zach the Mystic wrote:
> On Wednesday, 24 April 2013 at 19:33:51 UTC, Dmitry Olshansky wrote:
>> 24-Apr-2013 20:08, qznc wrote:
>>> What are you actually worried about? Compile times? Program size?
>>> Startup time?
>>
>> It affects all of it.
>
> I don't know if you are right, but I think the case would be made
> visible and compelling with some benchmarks showing the compile time,
> executable size, and run-time differences between the existing mode and
> the proposed mode.

So the keyword is evidence. I'll give it a go then.

-- 
Dmitry Olshansky