December 17, 2011
Re: Program size, linking matter, and static this()
On Friday, December 16, 2011 18:05:56 Andrei Alexandrescu wrote:
> On 12/16/11 5:50 PM, Jonathan M Davis wrote:
> http://en.wikipedia.org/wiki/Singleton_pattern
> 
> Second paragraph.

Valid points, but it's still useful under some circumstances. I don't actually 
use it very often personally. It just made sense here. Thanks for the link.

> You're using a stilted version of it. Most often the singleton object is
> created lazily upon the first access, whereas std.datetime creates the
> object (and therefore shotguns linkage with the garbage collector) even
> if never needed.
> 
> But what I'm trying here is to lift the level of discourse. The
> Singleton sounds like the solution of choice already presupposing that
> inheritance and polymorphism are good decisions. What I'm trying to say
> is that D should be rich enough to allow you considerable freedom in the
> design space, so we should have enough means to navigate around this one
> particular issue. I don't think we can say with a straight face we can't
> avoid use of static this inside std.datetime.

The only reason that it's not lazily loading is because of the purity issue and 
the fact that it would require a mutex. The mutex we can live with. pure can't 
be gotten around easily, but I'll figure it out.
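
To make the trade-off concrete, here is a minimal sketch of lazy loading behind a mutex. It is not std.datetime's code: `Zone` and `localZone` are hypothetical stand-ins, and the function is deliberately left impure, which is exactly the part that still needs solving.

```d
// Hypothetical stand-in for a time zone class; not std.datetime's TimeZone.
final class Zone
{
    string name;
    this(string name) { this.name = name; }
}

// Returns the process-wide instance, creating it on first use.
// Note: this function is NOT pure, because it touches global state.
const(Zone) localZone()
{
    static bool seenByThisThread;   // thread-local: D statics are TLS by default
    __gshared Zone instance;        // one copy shared by all threads

    if (!seenByThisThread)
    {
        synchronized                // the mutex: a global lock for this block
        {
            if (instance is null)
                instance = new Zone("local");
        }
        seenByThisThread = true;    // later calls on this thread skip the lock
    }
    return instance;
}

void main()
{
    assert(localZone() is localZone());   // same object every time
    assert(localZone().name == "local");
}
```

The thread-local flag means the lock is only taken on a thread's first call; after that the check is a plain read.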

As for the general design, SysTime needs to be able to dynamically adjust its 
value based on the time zone upon request (e.g. asking for the SysTime as a 
string or asking for that SysTime's year). That essentially requires that 
the set of functions required for the calculations be swappable (preferably as 
a group, since that's far cleaner). Encapsulating it in a class gives you that 
polymorphic behavior quite nicely and also groups the various functions quite 
nicely. It also gives you a nice place to put some stuff like the time zone's 
name. Sure, we could theoretically change it to be a struct which holds 
function pointers, but that seems to me like you're pretty much just trying to 
redesign classes that way. I think that the basic design is solid.
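
As a rough illustration of "swappable as a group", the sketch below uses hypothetical names (`Zone`, `Utc`, `FixedOffset`, `Stamp`, `toLocal`) rather than the real std.datetime API, and plain seconds rather than SysTime's internal representation. The point is only that replacing one object reference swaps every calculation at once.

```d
// Hypothetical stand-ins; std.datetime's real TimeZone interface is larger.
abstract class Zone
{
    abstract string name() const;
    abstract long toLocal(long utcSeconds) const;
}

final class Utc : Zone
{
    override string name() const { return "UTC"; }
    override long toLocal(long utcSeconds) const { return utcSeconds; }
}

final class FixedOffset : Zone
{
    private long offsetSeconds;
    this(long offsetSeconds) { this.offsetSeconds = offsetSeconds; }
    override string name() const { return "fixed"; }
    override long toLocal(long utcSeconds) const { return utcSeconds + offsetSeconds; }
}

// A time stamp that stores UTC and consults its zone only on request.
struct Stamp
{
    long utcSeconds;
    Zone zone;
    long localSeconds() const { return zone.toLocal(utcSeconds); }
}

void main()
{
    auto s = Stamp(3600, new Utc);
    assert(s.localSeconds == 3600);

    s.zone = new FixedOffset(7200);   // swaps name and calculations together
    assert(s.localSeconds == 10_800);
    assert(s.zone.name == "fixed");
}
```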

> > There would be fewer potential issues with circular dependencies if
> > std.datetime were broken up, but the consensus seems to be that we don't
> > want to do that. Regardless, if I find a way to lazily load the
> > singletons in spite of immutable and pure, then there won't be any more
> > need for the static constructors for them. There's still one for the
> > unit tests, but worse comes to worst, that functionality could be moved
> > to a function which is called by the first unittest block.
> 
> Maybe the choice of immutable and pure is too restrictive. How about
> making the object returned const?

SysTime holds an immutable TimeZone (currently with Rebindable). In theory, 
this should have the advantage of making it possible to pass a SysTime across 
threads with send and receive, but bugs in the compiler currently make it 
impossible to construct an immutable SysTime. So, all TimeZone objects are 
const, or they won't work with SysTime. And since there's not normally a reason 
to change any of the values in a TimeZone (they don't hold much data in the 
first place), that's really not a problem.

The only problem with making it immutable has to do with the singleton. I 
suppose that it could be changed to Rebindable!(immutable TimeZone) like in 
SysTime, but when I designed it, there didn't seem much point to that, since 
it had to be constructed at runtime and required a static constructor 
regardless. And I was trying to make absolutely as much in std.datetime pure 
as possible, which inevitably led to the singletons being pure. Making them 
impure makes it so that a variety of other functions can't be pure and would 
break code. I don't remember how much, however.

Regardless, to avoid breaking code, it has to be pure. It's possible that the 
code breakage would be worth it, but I'd have to mess around with it to see. 
With appropriate casts, pure can be subverted, but that's obviously ugly.
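
For readers who haven't seen the trick, a minimal sketch of subverting pure with a cast follows. `Zone`, `zoneImpl` and `zone` are hypothetical names, the mutex is omitted for brevity, and this is only an illustration of the technique, not a recommendation.

```d
// Hypothetical stand-in class.
final class Zone
{
    string name;
    this(string name) { this.name = name; }
}

// Impure: reads and writes a global on first use.
// (Not thread-safe as written; locking is left out to keep the sketch short.)
Zone zoneImpl()
{
    __gshared Zone instance;
    if (instance is null)
        instance = new Zone("local");
    return instance;
}

// Pure by assertion only: the cast tells the compiler to stop checking.
// The compiler may now assume purity, so this is only defensible when the
// function really does return the same value on every call.
const(Zone) zone() pure
{
    alias PureGetter = Zone function() pure;
    return (cast(PureGetter) &zoneImpl)();
}

void main()
{
    auto z = zone();
    assert(z.name == "local");
}
```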

> Under what circumstances it doesn't work,

I couldn't move the singletons out of std.datetime in that way. pure disallows 
it.

> and how would adding _more_
> support for _less_ safety be better than a glorified cast that you
> can use _today_?
>
> > Clearly, I'm not going to win any arguments on this, given that both you
> > and Walter are definitely opposed, but I definitely think that the
> > current situation with circular dependencies is one of D's major warts.
> 
> I'm not nailed to the floor. Any good arguments would definitely change
> my opinion.

I don't think that I have ever seen an _actual_ circular dependency when a 
program blows up because of it. It's always a case of the two modules doing 
completely unrelated stuff with their static constructors. It's generally 
incredibly obvious that there's no interdependency, but the compiler/runtime 
isn't smart enough to see that. And if you use static constructors much (which 
invariably happens if you have much in the way of immutable variables which 
are commonly used enough to put at module or class scope), you run into this 
problem fairly easily. And given the large amount of inter-module importing in 
Phobos, it's _very_ easy to run into the problem there if we use static 
constructors.

When such circular dependencies happen, it's a royal pain to sort out what's 
going on - especially if the modules don't import each other directly. The 
error messages have improved, but it's still nasty to sort out exactly what's 
happening. And then fixing it? Assuming that you can use the solution that some 
of Phobos' modules use by having a secondary module for the initialization, 
then there's a way to do it, but that solution is quite ugly IMHO, and 
regardless of that, it's _not_ in the least bit obvious. I don't know that I 
ever would have thought of it myself (maybe, maybe not).

So, the programmer is essentially faced with a situation where they have two 
modules with static constructors that they can clearly see are completely 
unrelated, but they're going to have to do some major refactoring to get 
around the issue that the compiler and runtime _aren't_ smart enough to see 
that the order in which the modules are initialized doesn't matter at all. _If_ 
they think of the solution that Phobos uses or are lucky enough to have 
someone else point it out to them _and_ it's actually possible to refactor 
the static constructor out like that, then the solution is doable, albeit 
arguably on the ugly side. But that's assuming a lot IMHO.

By contrast, we could have a simple feature that was explained in the 
documentation along with static constructors which made it easy to tell the 
compiler that the order doesn't matter - either by saying that it doesn't 
matter at all or that it doesn't matter in regards to a specific module. e.g.

@nodepends(std.file)
static this()
{
}

Now the code doesn't have to be redesigned to get around the fact that the 
compiler just isn't smart enough to figure it out on its own. Sure, the feature 
is potentially unsafe, but so are plenty of other features in D. The best 
situation would be if the compiler was smart enough to figure it out for 
itself, but barring that this definitely seems like a far cleaner solution than 
having to try and figure out how to break up some of the initialization code 
for a module into a separate module, especially when features such as 
immutable and pure tend to make such separation impossible without some nasty 
casts. It would just be way simpler to have a feature which allowed you to 
tell the compiler that there was no dependency.

I'd probably feel differently about this if static constructors tended to have 
actual interdependencies, but they are almost invariably used for initializing 
immutable variables and the like and have no dependencies on other modules at 
all. It's other stuff in the modules which has those interdependencies.

- Jonathan M Davis
December 17, 2011
Re: Program size, linking matter, and static this()
On 12/16/11 6:54 PM, Jonathan M Davis wrote:
> By contrast, we could have a simple feature that was explained in the
> documentation along with static constructors which made it easy to tell the
> compiler that the order doesn't matter - either by saying that it doesn't
> matter at all or that it doesn't matter in regards to a specific module. e.g.
>
> @nodepends(std.file)
> static this()
> {
> }
>
> Now the code doesn't have to be redesigned to get around the fact that the
> compiler just isn't smart enough to figure it out on its own. Sure, the feature
> is potentially unsafe, but so are plenty of other features in D.

That is hardly a good argument in favor of the feature :o).

One issue that you might not have considered is that this is more 
brittle than it might seem. Even though the dependency pattern is 
"painfully obvious" to the human at a point in time, maintenance work 
can easily change that, and in very non-obvious ways (e.g. dependency 
cycles spanning multiple modules). I've seen it happening in C++, and 
when you realize it, it's quite mind-boggling.

> The best
> situation would be if the compiler was smart enough to figure it out for
> itself, but barring that this definitely seems like a far cleaner solution than
> having to try and figure out how to break up some of the initialization code
> for a module into a separate module, especially when features such as
> immutable and pure tend to make such separation impossible without some nasty
> casts. It would just be way simpler to have a feature which allowed you to
> tell the compiler that there was no dependency.

I think the only right approach to this must be principled - either by 
CTFEing the constructor or by guaranteeing it calls no functions that 
may close a dependency cycle. Even without that, I'd say we're in very 
good shape.
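
As a small sketch of the CTFE route: a module-level immutable whose initializer is a function call must be evaluated at compile time, so no static constructor is needed. The names `squares` and `buildSquares` are made up for illustration.

```d
// Instead of:
//     immutable int[] squares;
//     static this() { squares = buildSquares(8); }
// the initializer below is evaluated by CTFE, so the module needs no
// static constructor for this table.
immutable int[] squares = buildSquares(8);

// An ordinary function; it runs at compile time when used as an initializer.
int[] buildSquares(size_t n) pure
{
    int[] result;
    foreach (i; 0 .. n)
        result ~= cast(int)(i * i);
    return result;
}

void main()
{
    static assert(squares[3] == 9);   // already known at compile time
    assert(squares.length == 8);
}
```

This only works when the initialization can actually run at compile time, which rules out anything that needs to query the environment at startup.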


Andrei
December 17, 2011
Re: Program size, linking matter, and static this()
Sean Kelly:

> On Dec 16, 2011, at 1:48 PM, Andrei Alexandrescu wrote:
> > Sure you meant static ubyte[__traits(classInstanceSize, T)]
> > and emplace :o).
> 
> Don't forget the 16 byte alignment :-)

Is it possible to support this in D2/D3?

align(16) static ubyte[__traits(classInstanceSize, T)] _localTime;

There are some situations where I'd like a static array to be aligned to 16 bytes.

Bye,
bearophile
December 17, 2011
Re: Program size, linking matter, and static this()
On Fri, 16 Dec 2011 19:29:18 +0100, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:

> Hello,
>
>
> Late last night Walter and I figured a few interesting tidbits of  
> information. Allow me to give some context, discuss them, and sketch a  
> few approaches for improving things.
>
> A while ago Walter wanted to enable function-level linking, i.e. only  
> get the needed functions from a given (and presumably large) module. So  
> he arranged things that a library contains many small object "files"  
> (that actually are generated from a single .d file and never exist on  
> disk, only inside the library file, which can be considered an archive  
> like tar). Then the linker would only pick the used object "files" from  
> the library and link those in. Unfortunately that didn't have nearly the  
> expected impact - essentially the size of most binaries stayed the same.  
> The mystery was unsolved, and Walter needed to move on to other things.
>
> One particularly annoying issue is that even programs that don't  
> ostensibly use anything from an imported module may balloon inexplicably  
> in size. Consider:
>
> import std.path;
> void main(){}
>
> This program, after stripping and all, has some 750KB in size. Removing  
> the import line reduces the size to 218KB. That includes the runtime  
> support, garbage collector, and such, and I'll consider it a baseline.  
> (A similar but separate discussion could be focused on reducing the  
> baseline size, but herein I'll consider it constant.)
>
> What we'd simply want is to be able to import stuff without blatantly  
> paying for what we don't use. If a program imports std.path and uses no  
> function from it, it should be as large as a program without the import.  
> Furthermore, the increase should be incremental - using 2-3 functions  
> from std.path should only increase the executable size by a little, not  
> suddenly link in all code in that module.
>
> But in experiments it seemed like program size would increase in sudden  
> amounts when certain modules were included. After much investigation we  
> figured that the following fateful causal sequence happened:
>
> 1. Some modules define static constructors with "static this()" or  
> "static shared this()", and/or static destructors.
>
> 2. These constructors/destructors are linked in automatically whenever a  
> module is included.
>
> 3. Importing a module with a static constructor (or destructor) will  
> generate its ModuleInfo structure, which contains static information  
> about all module members. In particular, it keeps virtual table pointers  
> for all classes defined inside the module.
>
> 4. That means generating ModuleInfo refers to all virtual functions defined  
> in that module, whether they're used or not.
>
> 5. The phenomenon is transitive, e.g. even if std.path has no static  
> constructors but imports std.datetime which does, a ModuleInfo is  
> generated for std.path too, in addition to the one for std.datetime. So  
> now classes inside std.path (if any) will be all linked in.
>
> 6. It follows that a module that defines classes which in turn use other  
> functions in other modules, and has static constructors (or includes  
> other modules that do) will balloon the size of the executable suddenly.
>
> There are a few approaches that we can use to improve the state of  
> affairs.
>
> A. On the library side, use static constructors and destructors  
> sparingly inside druntime and std. We can use lazy initialization  
> instead of compulsively initializing library internals. I think this is  
> often a worthy thing to do in any case (dynamic libraries etc) because  
> it only does work if and when work needs to be done at the small cost of  
> a check upon each use.
>
> B. On the compiler side, we could use a similar lazy initialization  
> trick to only refer to class methods in the module if they're actually  
> needed. I'm being vague here because I'm not sure what and how that can  
> needed. I'm being vague here because I'm not sure what and how that can  
> be done.
>
> Here's a list of all files in std using static cdtors:
>
> std/__fileinit.d
> std/concurrency.d
> std/cpuid.d
> std/cstream.d
> std/datebase.d
> std/datetime.d
> std/encoding.d
> std/internal/math/biguintcore.d
> std/internal/math/biguintx86.d
> std/internal/processinit.d
> std/internal/windows/advapi32.d
> std/mmfile.d
> std/parallelism.d
> std/perf.d
> std/socket.d
> std/stdiobase.d
> std/uri.d
>
> The majority of them don't do a lot of work and are not much used inside  
> phobos, so they don't blow up the executable. The main one that could  
> receive some attention is std.datetime. It has a few static ctors and a  
> lot of classes. Essentially just importing std.datetime or any std  
> module that transitively imports std.datetime (and there are many of  
> them) ends up linking in most of Phobos and blows the size up from the  
> 218KB baseline to 700KB.
>
> Jonathan, could I impose on you to replace all static cdtors in  
> std.datetime with lazy initialization? I looked through it and it  
> strikes me as a reasonably simple job, but I think you'd know better  
> what to do than me.
>
> A similar effort could be conducted to reduce or eliminate static cdtors  
> from druntime. I made the experiment of commenting them all, and that  
> reduced the size of the baseline from 218KB to 200KB. This is a good  
> amount, but not as dramatic as what we can get by working on  
> std.datetime.
>
>
> Thanks,
>
> Andrei

We'd need the linker to do any of this. Unreferenced symbols should be
emitted with some kind of vague linkage (multiobj partly does this).
I-reference-everything stuff like ModuleInfos should only create weak
references. This means that localClasses might contain only part of the
actual module. People can use the designated export attribute to force
unused symbols to be emitted.
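
For anyone who wants to see what the ModuleInfo records look like from inside a program, druntime lets you iterate over them. The sketch below is only an inspection aid (the helper name `countModules` is made up); it does not change what gets linked.

```d
import std.stdio : writefln;

// Counts the modules that have a ModuleInfo record in this executable,
// optionally printing each module's name and its number of local classes.
size_t countModules(bool print = false)
{
    size_t n;
    foreach (m; ModuleInfo)   // druntime hands us each registered ModuleInfo*
    {
        if (print)
            writefln("%-40s %s classes", m.name, m.localClasses.length);
        ++n;
    }
    return n;
}

void main()
{
    assert(countModules(true) > 0);
}
```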
December 17, 2011
Re: Program size, linking matter, and static this()
On Sat, 17 Dec 2011 07:09:50 +0100, Martin Nowak <dawg@dawgfoto.de> wrote:

> We'd need the linker to do any of this. Unreferenced symbols should be
> emitted with some kind of vague linkage (multiobj partly does this).
> I-reference-everything stuff like ModuleInfos should only create weak
> references. This means that localClasses

More concretely: if we emitted weakly defined symbols (null) for whatever a
ModuleInfo references, then the linker would not need to open further object
files to find a definition. But if another definition is linked in, it will
replace the weak definition. The program would then need to skip the dummy
symbols (null) at runtime.

> might contain only part of the actual module. People can use the
> designated export attribute to force unused symbols to be emitted.
December 17, 2011
Re: Program size, linking matter, and static this()
On 12/17/11 12:27 AM, Martin Nowak wrote:
>> We'd need the linker to do any of this. Unreferenced symbols should be
>> emitted with some kind of vague linkage (multiobj partly does this).
>> I-reference-everything stuff like ModuleInfos should only create weak
>> references. This means that localClasses
> More concretely: if we emitted weakly defined symbols (null) for whatever a
> ModuleInfo references, then the linker would not need to open further
> object files to find a definition. But if another definition is linked in,
> it will replace the weak definition. The program would then need to skip
> the dummy symbols (null) at runtime.

I think it would be awesome to exploit weak symbols.

Andrei
December 17, 2011
Re: Program size, linking matter, and static this()
On Sat, 17 Dec 2011 01:50:51 +0200, Jonathan M Davis <jmdavisProg@gmx.com>  
wrote:

> On Friday, December 16, 2011 17:13:49 Andrei Alexandrescu wrote:
>> Maybe there's an issue with the design. Maybe Singleton (the most damned
>> of all patterns) is not the best choice here. Or maybe the use of an
>> inheritance hierarchy with a grand total of 4 classes. Or maybe the
>> encapsulation could be rethought.
>>
>> The general point is, a design lives within a language. Any language is
>> going to disallow a few designs or make them unsuitable for particular
>> situation. This is, again, multiplied by the context: it's the standard
>> library.
>
> I don't know what's wrong with singletons. It's a great pattern in  
> certain
> circumstances.

I don't like patterns much, but when it comes to singleton I absolutely
hate it. Just ask yourself what it does to earn that fancy name. NOTHING.
It is nothing but hype from those who want to rule everything with one
paradigm. Generic solutions/rules/paradigms are our final target WHEN they
are elegant.

If you are using singleton in your C++/D (or any other M-P language) code,
do yourself a favor and trash that book you learned it from.

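---
class A {
    // Singleton-style factory: a static member function.
    static A make() { return new A; }
}

class B {}

// A plain free function doing the same job.
B makeB() { return new B; }
---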

What can A.make do that makeB cannot? (Other than creating objects of two
different types :P )
December 17, 2011
Re: Program size, linking matter, and static this()
Le 17/12/2011 00:18, maarten van damme a écrit :
> how did other languages solve this issue? I can't imagine D being the
> only language with static constructors, do they have that problem too?

AFAIK, as in D, it's best practice to avoid static constructors as much
as possible in Java, Python and, I imagine, C# as well, even though the
running order is well-defined.

The dependency injection design pattern seems to help here.
December 17, 2011
Re: Program size, linking matter, and static this()
Le 16/12/2011 22:45, Andrei Alexandrescu a écrit :
> On 12/16/11 3:38 PM, Trass3r wrote:
>> A related issue is phobos being an intermodule dependency monster.
>> A simple hello world pulls in almost 30 modules!
>> And std.stdio is supposed to be just a simple wrapper around C FILE.
> 
> In fact it doesn't (after yesterday's commit). The std code in hello,
> world is a minuscule 3KB. The rest of 218KB is druntime.
> 
> Once we solve the static constructor issue, function-level linking
> should take care of pulling only the minimum needed.
> 
> One interesting fact is that a lot of issues that I tended to take
> non-critically ("templates cause bloat", "intermodule dependencies cause
> bloat", "static linking creates large programs") looked a whole lot
> differently when I looked closer at causes and effects.
> 
> 
> Andrei

Fantastic ! :)
December 17, 2011
Re: Program size, linking matter, and static this()
Le 17/12/2011 02:39, Andrei Alexandrescu a écrit :
> On 12/16/11 6:54 PM, Jonathan M Davis wrote:
>> By contrast, we could have a simple feature that was explained in the
>> documentation along with static constructors which made it easy to tell
>> the
>> compiler that the order doesn't matter - either by saying that it doesn't
>> matter at all or that it doesn't matter in regards to a specific
>> module. e.g.
>>
>> @nodepends(std.file)
>> static this()
>> {
>> }
>>
>> Now the code doesn't have to be redesigned to get around the fact that
>> the
>> compiler just isn't smart enough to figure it out on its own. Sure,
>> the feature
>> is potentially unsafe, but so are plenty of other features in D.
>
> That is hardly a good argument in favor of the feature :o).
>
> One issue that you might have not considered is that this is more
> brittle than it might seem. Even though the dependency pattern is
> "painfully obvious" to the human at a point in time, maintenance work
> can easily change that, and in very non-obvious ways (e.g. dependency
> cycles spanning multiple modules). I've seen it happening in C++, and
> when you realize it it's quite mind-boggling.
>
>> The best
>> situation would be if the compiler was smart enough to figure it out for
>> itself, but barring that this definitely seems like a far cleaner
>> solution than
>> having to try and figure out how to break up some of the
>> initialization code
>> for a module into a separate module, especially when features such as
>> immutable and pure tend to make such separation impossible without
>> some nasty
>> casts. It would just be way simpler to have a feature which allowed
>> you to
>> tell the compiler that there was no dependency.
>
> I think the only right approach to this must be principled - either by
> CTFEing the constructor or by guaranteeing it calls no functions that
> may close a dependency cycle. Even without that, I'd say we're in very
> good shape.
>
>
> Andrei

Very good point. CTFE is improving with each version of dmd, and is a 
real alternative to static this(). It should be considered when 
appropriate; it has many benefits.