Jump to page: 1 2
Thread overview
Re: Compilation times and idiomatic D code
Jul 14
Seb
Jul 15
Enamex
Jul 15
Enamex
5 days ago
H. S. Teoh
4 days ago
Olivier FAURE
5 days ago
jmh530
5 days ago
Stefan Koch
July 05
On Wednesday, July 05, 2017 13:12:40 H. S. Teoh via Digitalmars-d wrote: [...]
> IOW, the very range-based idiom that has become one of the defining characteristics of modern D is negating the selling point of fast compilation.
>
> I vaguely remember there was talk about compressing symbols when they get too long... is there any hope of seeing this realized in the near future?

In this case, I think that it's more that Voldemort types are biting us than that ranges are biting us (though the Voldemort types are typically ranges, so in practice, using ranges ends up biting you).

https://issues.dlang.org/show_bug.cgi?id=15831

I think that there's a good argument that Voldemort types aren't actually a particularly good idea given these issues, but the work that's being done to compress symbols should at least reduce the problem.

- Jonathan M Davis

July 13
On Wed, Jul 05, 2017 at 09:45:55PM -0600, Jonathan M Davis via Digitalmars-d wrote: [...]
> In this case, I think that it's more that Voldemort types are biting us than that ranges are biting us (though the Voldemort types are typically ranges, so in practice, using ranges ends up biting you).
> 
> https://issues.dlang.org/show_bug.cgi?id=15831
> 
> I think that there's a good argument that Voldemort types aren't actually a particularly good idea given these issues, but the work that's being done to compress symbols should at least reduce the problem.
[...]

Here's the latest update on this bit of drama:

Today, I decided to sit down and refactor my code.  I've heard tell that Voldemort types tend to cause an explosion in symbol size, and today I thought it'd be interesting to see just how big of a difference this actually makes.

I could not believe my eyes.

Before the refactoring, my executable sizes (I have 2 executables in this codebase) were 37.3MB and 60.0MB, respectively.

After the refactoring, my executable sizes were, respectively, 8.2MB and 10.2MB. That's a 5x and 6x reduction in executable size, respectively. Whoa.

And all I did was to move Voldemort structs out into module-level private structs. I.e., this:

	auto myFunc(Args...)(Args args) {
		static struct Result {
			...
		}
		return Result(args);
	}

became this:

	private MyFuncImpl(Args...)
	{
		...
	}

	auto myFunc(Args...)(Args args) {
		return MyFuncImpl!Args(args);
	}

And you get an instant 5x to 6x reduction in executable size.

Now mind you, the ridiculous symbol size problem still exists; the executable still contains some huge symbols.  The difference is that now the longest symbol is "only" 86KB, whereas before the refactoring, the longest symbol was 777KB (!).  So the problem is less severe, but still present.  So symbol compression is still a very important part of the eventual solution.


T

-- 
Windows: the ultimate triumph of marketing over technology. -- Adrian von Bidder
July 13
On 7/13/17 3:23 PM, H. S. Teoh via Digitalmars-d wrote:
> Today, I decided to sit down and refactor my code.  I've heard tell that
> Voldemort types tend to cause an explosion in symbol size, and today I
> thought it'd be interesting to see just how big of a difference this
> actually makes.
> 
> I could not believe my eyes.
> 
> Before the refactoring, my executable sizes (I have 2 executables in
> this codebase) were 37.3MB and 60.0MB, respectively.
> 
> After the refactoring, my executable sizes were, respectively, 8.2MB and
> 10.2MB. That's a 5x and 6x reduction in executable size, respectively.
> Whoa.

Yep.

http://www.schveiguy.com/blog/2016/05/have-your-voldemort-types-and-keep-your-disk-space-too/

I was surprised as well at the reduction. I only noticed the problem because when I was debugging my library, the program was taking an unreasonably long time to *print a stack trace* (like multiple seconds).

-Steve
July 13
On Thu, Jul 13, 2017 at 04:16:50PM -0400, Steven Schveighoffer via Digitalmars-d wrote: [...]
> http://www.schveiguy.com/blog/2016/05/have-your-voldemort-types-and-keep-your-disk-space-too/
> 
> I was surprised as well at the reduction. I only noticed the problem because when I was debugging my library, the program was taking an unreasonably long time to *print a stack trace* (like multiple seconds).
[...]

Whoa.  I just tried your 'horcrux' trick, and discovered that it causes a further reduction in executable size!

Original sizes:
	38144296 bytes (38.1 MB)
	61412056 bytes (61.4 MB)

After factoring out Voldemort structs as module-global private structs:

	 8332352 bytes (8.3 MB)
	10376584 bytes (10.4 MB)

After further refactoring to use the "horcrux" trick:
	 8147392 bytes (8.1 MB)
	10124136 bytes (10.1 MB)

While this last reduction is rather modest relative to the total executable sizes, it still represents a reduction of 200KB and 300KB worth of symbols(!) each.  Given that this code base is a mere 5000+ LOC, I'd expect significant savings in larger template-heavy projects.


My guess as to reason for this reduction is because now you only have to instantiate one template for each template function call, rather than two:

	private struct MyFuncImpl(Args...) { ... }
	auto myFunc(Args...)(Args args) { return MyFuncImpl!Args(args); }

will instantiate two templates, MyFuncImpl and myFunc, each time with the same argument list; whereas:

	template myFunc(Args...)
	{
		auto myFunc(Args args) { return Impl(args); }
		struct Impl { ... }
	}

only requires a single template instantiation of myFunc, since Impl is instantiated along with the function itself.

There's also the slightly shorter names involved (`myFunc!Args.Impl` vs.
`MyFuncImpl!Args.MyFuncImpl`), which, given the O(n^2) or O(n^3) symbol
length dependency, can quickly add up to non-trivial amounts of savings.


T

-- 
Those who don't understand D are condemned to reinvent it, poorly. -- Daniel N
July 14
On Thu, Jul 13, 2017 at 03:27:31PM -0700, H. S. Teoh via Digitalmars-d wrote:
> On Thu, Jul 13, 2017 at 04:16:50PM -0400, Steven Schveighoffer via Digitalmars-d wrote: [...]
> > http://www.schveiguy.com/blog/2016/05/have-your-voldemort-types-and-keep-your-disk-space-too/
> [...]
> 
> Whoa.  I just tried your 'horcrux' trick, and discovered that it causes a further reduction in executable size!
> 
> Original sizes:
> 	38144296 bytes (38.1 MB)
> 	61412056 bytes (61.4 MB)
> 
> After factoring out Voldemort structs as module-global private structs:
> 
> 	 8332352 bytes (8.3 MB)
> 	10376584 bytes (10.4 MB)
> 
> After further refactoring to use the "horcrux" trick:
> 	 8147392 bytes (8.1 MB)
> 	10124136 bytes (10.1 MB)
[...]

Here's a further update to the saga of combating ridiculously large symbol sizes.

So yesterday I wrote a new module that also heavily uses UFCS chains. My initial draft of the module, once I linked it with the main program, particularly with a UFCS chain that has led to the 600MB executable sizes seen above, caused another explosion in symbol size that actually managed to reach 100MB in *one* symbol, triggering a DMD termination complaining about possible infinite template recursion. :-D  Funnier still, temporarily simplifying part of the chain to bring symbol sizes down, I eventually got it below 100MB but ended up with linker segfaults and ELF errors because the huge symbol was too ridiculously huge.

Eventually, it drove me to refactor two Phobos functions that are used heavily in my code: std.range.chunks and std.algorithm.joiner, using the same "horcrux" technique (see Phobos PRs #5610 and #5611). This, together with some further refactoring in my own code, eventually brought things down to the 20MB range of executable sizes.

Then an idea occurred to me: the reason these symbol sizes got so large, was because the UFCS chain preserves *all* type information about every component type used to build the final type. So, by necessity, the type name has to somehow encode all of this information in an unambiguous way.  Now, arguably, DMD's current mangling scheme is at fault because it contains too many repeating components, but even if you disregard that, the fact remains that if you have 50+ components in your overall UFCS chain, the symbol length cannot be less than 50*n where n is the average length of a single component's type name, plus some necessary overhead to account for the mangling scheme syntax. Let's say n is on average 20-25 characters, say round it up to 35 for mangling syntax, so you're still looking at upwards of 1700+ characters *minimum*.  That, coupled with the current O(n^2) / O(n^3) mangling scheme, you easily reach megabytes of symbol length.  We can compress the symbols all we want, but there's a limit as to how much compression will help. At the end of the day, you still have to represent those 50+ components *somehow*.

But what if most of this type information is actually *unnecessary*?  To use a range example, if all you care about at the end of the chain is that it's a forward range of ubyte, then why even bother with multi-MB symbols encoding type information that's never actually used?  Maybe a little type-erasure would help, via hiding those 50+ component types behind an opaque runtime OO polymorphic interface.  Phobos does have the facilities of this, in the form of the InputRange, ForwardRange, etc., interfaces in std.range.interfaces.  In my case, however, part of the chain uses another generic type (a kind of generalization of 2D arrays). But either way, the idea is simple: at the end of the UFCS chain, wrap it in a class object that inherits from a generic interface that encodes only the primitives of the 2D array concept, and the element type. The class object itself, of course, still must retain knowledge of the full-fledged type, but the trick is that if we insert this type erasure in the middle of the chain, then later components don't have to encode the type names of earlier components anymore.

So, tl;dr:

	// Before:
	auto r = input
	     .ufcsFunc1
	     .ufcsFunc2
	     .map!(e => e.ufcsFunc3.ufcsFunc4 ...)
	     ...
	     .ufcsFuncn
	     .flattenToRange
	     .rangeFunc1
	     .rangeFunc2
	     ...;

Result: 20MB executable size (600MB before the "horcrux" technique was
applied).

	// After:
	input.ufcsFunc1
	     .ufcsFunc2
	     .map!(e => e.ufcsFunc3.ufcsFunc4 ...)
	     ...
	     .ufcsFuncn
	     .wrapWithInterface	// <--- type erasure happens here
	     .flattenToRange
	     .rangeFunc1
	     .rangeFunc2
	     ...;

Result: 8MB executable size (!).

Of course, this comes at a runtime cost of an added level of indirection (now you have to call a virtual function through the interface to integrate with earlier components in the pipeline).  But at a 60% reduction in executable size, I think it's a worthwhile tradeoff.  :-)


T

-- 
What's a "hot crossed bun"? An angry rabbit.
July 14
On Friday, 14 July 2017 at 22:45:44 UTC, H. S. Teoh wrote:
> On Thu, Jul 13, 2017 at 03:27:31PM -0700, H. S. Teoh via Digitalmars-d wrote:
>> [...]
> [...]
>
> Here's a further update to the saga of combating ridiculously large symbol sizes.
>
> [...]

You will be excited to hear that my template work will fix the described problem as a side effect.
Without incurring run-time overhead. Or forcing the usage of OO wrappers.
The downside however is that the scheme I have in mind is very complex and might take over a year to implement.
July 14
On Fri, Jul 14, 2017 at 03:45:44PM -0700, H. S. Teoh via Digitalmars-d wrote: [...]
> Here's a further update to the saga of combating ridiculously large symbol sizes.
[...]
> 	     .wrapWithInterface	// <--- type erasure happens here
[...]

Some further thoughts about type erasure and UFCS chains.

Nobody says type erasure *requires* OO -- that's just an artifact of the way things are currently implemented.  Consider, for example, your generic UFCS chain:

	auto r = source
		.ufcsFunc1
		.ufcsFunc2
		.ufcsFunc3
		...
		.ufcsFuncn;

>From the POV of outside code, *nobody cares* what the specific types of
each stage of the UFCS pipeline are.  Only the code that implements ufcsFunc1, ufcsFunc2, etc., need to know.  Furthermore, suppose X is the specific type of the range that's returned by ufcsFunc3, in the middle of the pipeline.  What are the chances this exact type is going to be reused again later?  Very slim.  And if ufcsFunc2, say, takes an alias parameter that's instantiated with a lambda function, you can basically guarantee that this type X will *never* ever be repeated again, outside of this specific UFCS chain.

And at the end of the chain, typeof(r) really only needs to encode the fact that it implements the API returned by ufcsFuncn, e.g., the range methods, element type, etc.. But nobody will care how these API functions are implemented under the hood -- as far as outside code is concerned, typeof(r) might as well be an opaque serial number, and we'll be able to find the instantiated API functions that implement the methods of r.  IOW, the explicit types of all intermediate stages of the UFCS chain are irrelevant to the outside world.  Since they are template functions (most of the time) anyway, the compiler might as well just glue all their code together, and put it inside functions named like funcName0012345XYZ where the 0012345XYZ is some random unique string, basically a serial number, that identifies it as a method of r.

For that matter, typeof(r) might as well be ufcsChain0012345XYZ, since nobody outside the pipeline itself will care what its component types are.

So basically, the compiler could just automatically type-erase the type of such chains, given that (1) none of the UFCS functions "escape" knowledge about an intermediate type to the outside world, (2) the chain is essentially unique to the current function (esp. if one or more components of the chain has an alias parameter bound to a lambda argument), (3) most UFCS functions return opaque types anyway -- very few (i.e., almost none) Phobos algorithms, for example, allow you to recover the type of the previous component of the chain -- so it's not relevant what the specific types of the previous components are, and (4) therefore the outside world couldn't care less about the specific types that compose typeof(r).  It could be as simple as bumping a global counter and suffixing it to some compiler-reserved prefix, to make type names like __ufcsType_00000123.  This will be far shorter than any symbol compression applied to present-day symbols!

To summarize, here's a scheme for handling template expansion in UFCS chains that will produce extremely short symbols, *independently* of how long your UFCS chains are.  If the compiler sees a UFCS chain with these characteristics:

1) There are ≥ n components, for some arbitrary value of n (say n=5 or n=10, or something like that, so that we don't bother with trivial cases where the full type may actually be important);

2) Each component of the chain is a template function (this one is
almost a given);

3) At least one component guarantees a unique instantiation, e.g., an alias parameter bound to a lambda function;

then the compiler will:

a) Generate a unique serial number (package.module + global counter
should be good enough);

b) Assign a unique typename to the final return value, constructed using this serial number;

c) Essentially inline all template bodies of previous stages of the pipeline into the last one, using the same serial number to generate names for the aggregated symbols of the final stage.

This way, those ridiculously huge symbols aren't even generated to begin with, thereby alleviating the need for convoluted compression schemes.


T

-- 
I am not young enough to know everything. -- Oscar Wilde
July 14
On Friday, 14 July 2017 at 22:45:44 UTC, H. S. Teoh wrote:
> Result: 8MB executable size (!).
>
> Of course, this comes at a runtime cost of an added level of indirection (now you have to call a virtual function through the interface to integrate with earlier components in the pipeline).  But at a 60% reduction in executable size, I think it's a worthwhile tradeoff.  :-)

That's a very interesting idea!
However, I wouldn't want to insert this into code manually, maybe a simple flag-opt-in support in DMD that detects UFCS pipes with X components and then inserts the type erasure hack could solve most problems for the time being? That way it wouldn't create a WTF moment when explaining D code and when Stefan or someone else manages to implement something more sophisticated, no code would need an update.


July 15
On Friday, 14 July 2017 at 22:45:44 UTC, H. S. Teoh wrote:
> Here's a further update to the saga of combating ridiculously large symbol sizes.
>
> So yesterday I wrote a new module that also heavily uses UFCS chains. My initial draft of the module, once I linked it with the main program, particularly with a UFCS chain that has led to the 600MB executable sizes seen above, caused another explosion in symbol size that actually managed to reach 100MB in *one* symbol, triggering a DMD termination complaining about possible infinite template recursion. :-D  Funnier still, temporarily simplifying part of the chain to bring symbol sizes down, I eventually got it below 100MB but ended up with linker segfaults and ELF errors because the huge symbol was too ridiculously huge.
>
> Eventually, it drove me to refactor two Phobos functions that are used heavily in my code: std.range.chunks and std.algorithm.joiner, using the same "horcrux" technique (see Phobos PRs #5610 and #5611). This, together with some further refactoring in my own code, eventually brought things down to the 20MB range of executable sizes.
>
> Then an idea occurred to me: the reason these symbol sizes got so large, was because the UFCS chain preserves *all* type information about every component type used to build the final type. So, by necessity, the type name has to somehow encode all of this information in an unambiguous way.  Now, arguably, DMD's current mangling scheme is at fault because it contains too many repeating components, but even if you disregard that, the fact remains that if you have 50+ components in your overall UFCS chain, the symbol length cannot be less than 50*n where n is the average length of a single component's type name, plus some necessary overhead to account for the mangling scheme syntax. Let's say n is on average 20-25 characters, say round it up to 35 for mangling syntax, so you're still looking at upwards of 1700+ characters *minimum*.  That, coupled with the current O(n^2) / O(n^3) mangling scheme, you easily reach megabytes of symbol length.  We can compress the symbols all we want, but there's a limit as to how much compression will help. At the end of the day, you still have to represent those 50+ components *somehow*.
>
> But what if most of this type information is actually *unnecessary*?  To use a range example, if all you care about at the end of the chain is that it's a forward range of ubyte, then why even bother with multi-MB symbols encoding type information that's never actually used?  Maybe a little type-erasure would help, via hiding those 50+ component types behind an opaque runtime OO polymorphic interface.  Phobos does have the facilities of this, in the form of the InputRange, ForwardRange, etc., interfaces in std.range.interfaces.  In my case, however, part of the chain uses another generic type (a kind of generalization of 2D arrays). But either way, the idea is simple: at the end of the UFCS chain, wrap it in a class object that inherits from a generic interface that encodes only the primitives of the 2D array concept, and the element type. The class object itself, of course, still must retain knowledge of the full-fledged type, but the trick is that if we insert this type erasure in the middle of the chain, then later components don't have to encode the type names of earlier components anymore.
>
> T

I have some stupid questions:

- What does everyone mean when they say 'symbol' here? I'm probably misunderstanding symbols gravely or it's something that DMD in particular handles in a strange way.

- What type information are being kept because of UFCS chains? Doesn't that mechanism simply apply overload resolution then choose between the prefix and .method forms as appropriate, rewriting the terms?
    Then it's a problem of function invocation. I don't get what's happening here still. Does this tie to the Voldemort types problem? (=> are many of the functions in the chain returning custom types?) It doesn't make sense, especially if, from your indirection workaround, it looks like it would work around the same but without the bloat problem if we unlinked the chain into many intermediate temporary parts. So how is this a type information issue?

Thanks!
July 15
On Saturday, 15 July 2017 at 11:10:32 UTC, Enamex wrote:
> On Friday, 14 July 2017 at 22:45:44 UTC, H. S. Teoh wrote:
>> [...]
>
> I have some stupid questions:
>
> - What does everyone mean when they say 'symbol' here? I'm probably misunderstanding symbols gravely or it's something that DMD in particular handles in a strange way.
>
> [...]

A symbol is anything which has an address and can/could be picked up by the linker.
In other words a symbol is anything which has a name. (or could have a name)
« First   ‹ Prev
1 2