Jump to page: 1 24  
Page
Thread overview
A modest proposal: eliminate template code bloat
Apr 08, 2012
Dmitry Olshansky
Apr 08, 2012
Marco Leise
Apr 08, 2012
Dmitry Olshansky
Apr 08, 2012
Marco Leise
Apr 08, 2012
H. S. Teoh
Apr 08, 2012
Marco Leise
Apr 08, 2012
Dmitry Olshansky
Apr 08, 2012
H. S. Teoh
Apr 08, 2012
Dmitry Olshansky
Apr 08, 2012
Dmitry Olshansky
Apr 09, 2012
Daniel Murphy
Apr 09, 2012
Dmitry Olshansky
Apr 09, 2012
Somedude
Apr 09, 2012
Artur Skawina
Apr 09, 2012
H. S. Teoh
Apr 08, 2012
Artur Skawina
Apr 08, 2012
Dmitry Olshansky
Apr 08, 2012
Artur Skawina
Apr 08, 2012
Artur Skawina
Apr 08, 2012
Dmitry Olshansky
Apr 09, 2012
Daniel Murphy
Apr 09, 2012
H. S. Teoh
Apr 09, 2012
Marco Leise
Apr 09, 2012
Daniel Murphy
Apr 09, 2012
H. S. Teoh
Apr 08, 2012
Marco Leise
Apr 08, 2012
Walter Bright
Apr 08, 2012
Dmitry Olshansky
Apr 08, 2012
H. S. Teoh
April 08, 2012
I think it's been ages since I meant to ask why nobody (as in compiler vendors) does what I think is rather simple optimization.

In the short term the plan is to introduce a "link-time" flavored optimization at code generation or (better) link step.

For simplicity let's assume compiler does all of the following during generation of an object file.

1. Every time a function is generated (or pretty much any symbol) not only a size calculated but also a checksum* of it's data.
(If we go for link-time optimization we should find a place to stick it to in the object file)

2. Compiler maintains a hash-table of symbol_size ---> list( ~ array) of pairs (references to data, checksum) of all symbols with given size. Call it a duplicate table. Every function generated and more generally global immutable data should end up there.

3. After any function was generated compiler checks an entry in the duplicate table that matches size, followed by matching checksum and only then (if required) doing a straight memcmp. If it happens that there is a match compiler just throws generated code away and _aliases_ it's symbol to that of a matched entry.
(so there has to be an alias table if there isn't one already)

*Yes, checksum. I think it should be real simple and easy to parallel hash function. The original checksum is no reliable but some amount balancing and profiling are obviously required when picking this function.

Applicability:
It's not only const-immutable bloat, it can be alleviated with inout. Yet there are plenty of places the exact same code is being generated: e.g. sealed containers of int vs uint, std.find on dchar[] vs uint[]/int[] an so on.
In general, the coarse grained parametrization is the root of all evil and it is inevitable since we are just humans after all.

Notes:
1. If we do checksum calculation on the fly during codegen it gets at virtually no cost as the data is in CPU data cache. Preliminary version can avoid hacking this part of backend though.

2. By _alias_ I mean the ability of compiler to emit references to a given symbol as if it was some other symbol (should be really straight forward).

3. Linker have more data and is able to achieve colossal size savings, essentially running through the same algorithm before actually linking things. Again it's easily separable step and can be an opt-in.

4. Ironically the same exact thing works with any kind of immutable data structures. It looks like string pooling is superseded by this proposal.

Thoughts?

-- 
Dmitry Olshansky
April 08, 2012
Am Sun, 08 Apr 2012 15:01:56 +0400
schrieb Dmitry Olshansky <dmitry.olsh@gmail.com>:

> I think it's been ages since I meant to ask why nobody (as in compiler vendors) does what I think is rather simple optimization.
> 
> In the short term the plan is to introduce a "link-time" flavored optimization at code generation or (better) link step.
> 
> For simplicity let's assume compiler does all of the following during generation of an object file.
> 
> 1. Every time a function is generated (or pretty much any symbol) not
> only a size calculated but also a checksum* of it's data.
> (If we go for link-time optimization we should find a place to stick it
> to in the object file)
> 
> 2. Compiler maintains a hash-table of symbol_size ---> list( ~ array) of pairs (references to data, checksum) of all symbols with given size. Call it a duplicate table. Every function generated and more generally global immutable data should end up there.
> 
> 3. After any function was generated compiler checks an entry in the
> duplicate table that matches size, followed by matching checksum and
> only then (if required) doing a straight memcmp. If it happens that
> there is a match compiler just throws generated code away and _aliases_
> it's symbol to that of a matched entry.
> (so there has to be an alias table if there isn't one already)
> 
> *Yes, checksum. I think it should be real simple and easy to parallel hash function. The original checksum is no reliable but some amount balancing and profiling are obviously required when picking this function.
> 
> Applicability:
> It's not only const-immutable bloat, it can be alleviated with inout.
> Yet there are plenty of places the exact same code is being generated:
> e.g. sealed containers of int vs uint, std.find on dchar[] vs
> uint[]/int[] an so on.
> In general, the coarse grained parametrization is the root of all evil
> and it is inevitable since we are just humans after all.
> 
> Notes:
> 1. If we do checksum calculation on the fly during codegen it gets at
> virtually no cost as the data is in CPU data cache. Preliminary version
> can avoid hacking this part of backend though.
> 
> 2. By _alias_ I mean the ability of compiler to emit references to a given symbol as if it was some other symbol (should be really straight forward).
> 
> 3. Linker have more data and is able to achieve colossal size savings, essentially running through the same algorithm before actually linking things. Again it's easily separable step and can be an opt-in.
> 
> 4. Ironically the same exact thing works with any kind of immutable data structures. It looks like string pooling is superseded by this proposal.
> 
> Thoughts?

Thoughts? Nothing much. I thought of that a while ago, but as an external program, that finds function calls by disassembling and removing dead/duplicate code. So I agree with you. A similar feature is a CTFE cache (or general code cache) that checksums a function's source code and gets the compiled version from a cache.
Template bloat could be especially important to 'fix' on embedded systems. But I don't consider it important enough at the moment. :/   Let's wait till the bugs and important features are implemented or hack the compiler ourselves.

-- 
Marco

April 08, 2012
On Sun, Apr 08, 2012 at 03:01:56PM +0400, Dmitry Olshansky wrote:
> I think it's been ages since I meant to ask why nobody (as in compiler vendors) does what I think is rather simple optimization.
> 
> In the short term the plan is to introduce a "link-time" flavored optimization at code generation or (better) link step.

This would be incompatible with how current (non-dmd) linkers work. But I do like the idea. Perhaps if it works well, other linkers will adopt it? (Just like how the gcc linker adopted duplicate template code elimination due to C++ templates.)


> For simplicity let's assume compiler does all of the following during generation of an object file.
> 
> 1. Every time a function is generated (or pretty much any symbol)
> not only a size calculated but also a checksum* of it's data.
> (If we go for link-time optimization we should find a place to stick
> it to in the object file)

We'd have to make sure the checksum doesn't end up in the final executable though, otherwise the bloat may negate any gains we've made.


> 2. Compiler maintains a hash-table of symbol_size ---> list( ~ array) of pairs (references to data, checksum) of all symbols with given size. Call it a duplicate table. Every function generated and more generally global immutable data should end up there.
> 
> 3. After any function was generated compiler checks an entry in the
> duplicate table that matches size, followed by matching checksum and
> only then (if required) doing a straight memcmp. If it happens that
> there is a match compiler just throws generated code away and
> _aliases_ it's symbol to that of a matched entry.
> (so there has to be an alias table if there isn't one already)

I think you don't even need an alias table; IIRC the OS dynamic linker can easily handle symbols that have the same value (i.e. that point to the same place). All you have to do is to change the value of the duplicated symbols so that they all point to the same address.


[...]
> Applicability:
> It's not only const-immutable bloat, it can be alleviated with
> inout. Yet there are plenty of places the exact same code is being
> generated: e.g. sealed containers of int vs uint, std.find on
> dchar[] vs uint[]/int[] an so on.
> In general, the coarse grained parametrization is the root of all
> evil and it is inevitable since we are just humans after all.

I'm not sure I understand the last sentence there, but duplicate code elimination is definitely a big plus. It will also mean that we can use templates more freely without having to worry about template bloat.


> Notes:
> 1. If we do checksum calculation on the fly during codegen it gets
> at virtually no cost as the data is in CPU data cache. Preliminary
> version can avoid hacking this part of backend though.
> 
> 2. By _alias_ I mean the ability of compiler to emit references to a given symbol as if it was some other symbol (should be really straight forward).

Like I said, I think this isn't even necessary, if the compiler can just generate the same value for the duplicated symbols.


> 3. Linker have more data and is able to achieve colossal size savings, essentially running through the same algorithm before actually linking things. Again it's easily separable step and can be an opt-in.

This assumes the (maybe external) linker knows how to take advantage of the info. But IMO, linkers *should* be a lot smarter than they currently are, so I don't see a problem with this as long as a dumb linker will still produce a working executable (just without the space savings).

Alternatively we can have an external pre-link tool that scans a given set of object files and eliminates duplicated code (by turning duplicated symbols into external references in all but one of the instances), before we hand the files off to the OS's native linker. Come to think of it, this might be a good way to experiment with this idea before we commit lots of effort into integrating it with a real linker.


> 4. Ironically the same exact thing works with any kind of immutable data structures. It looks like string pooling is superseded by this proposal.
[...]

Not really... string pooling can take advantage of overlapping (sub)strings, but I don't think you can do that with code. But I think your idea has a lot of merit. I'm for making linkers smarter than they currently are.


T

-- 
It always amuses me that Windows has a Safe Mode during bootup. Does that mean that Windows is normally unsafe?
April 08, 2012
On 04/08/12 13:01, Dmitry Olshansky wrote:
> 3. After any function was generated compiler checks an entry in the duplicate table that matches size, followed by matching checksum and only then (if required) doing a straight memcmp. If it happens that there is a match compiler just throws generated code away and _aliases_ it's symbol to that of a matched entry.
> (so there has to be an alias table if there isn't one already)
[...]
> Thoughts?

Don't forget that this needs to work:

   static auto f(T)(T a) { return a; }
   assert(cast(void*)&f!int!=cast(void*)&f!uint);

artur
April 08, 2012
On 08.04.2012 18:21, Artur Skawina wrote:
> On 04/08/12 13:01, Dmitry Olshansky wrote:
>> 3. After any function was generated compiler checks an entry in the duplicate table that matches size, followed by matching checksum and only then (if required) doing a straight memcmp. If it happens that there is a match compiler just throws generated code away and _aliases_ it's symbol to that of a matched entry.
>> (so there has to be an alias table if there isn't one already)
> [...]
>> Thoughts?
>
> Don't forget that this needs to work:
>
>     static auto f(T)(T a) { return a; }
>     assert(cast(void*)&f!int!=cast(void*)&f!uint);
>
> artur

A reference to spec page plz.

-- 
Dmitry Olshansky
April 08, 2012
On 04/08/12 17:20, Dmitry Olshansky wrote:
> On 08.04.2012 18:21, Artur Skawina wrote:
>> On 04/08/12 13:01, Dmitry Olshansky wrote:
>>> 3. After any function was generated compiler checks an entry in the duplicate table that matches size, followed by matching checksum and only then (if required) doing a straight memcmp. If it happens that there is a match compiler just throws generated code away and _aliases_ it's symbol to that of a matched entry.
>>> (so there has to be an alias table if there isn't one already)
>> [...]
>>> Thoughts?
>>
>> Don't forget that this needs to work:
>>
>>     static auto f(T)(T a) { return a; }
>>     assert(cast(void*)&f!int!=cast(void*)&f!uint);

> A reference to spec page plz.

A reference to *a* D spec, please...

There isn't one, but that does not mean that common sense does not need to apply.

Do you really suggest making different template instantiations identical, just because the compiler happened to emit the same code? The situation is not very different from

   const int a = 1; const uint b = 1; assert(&a!=&b);

The user can take the addresses, compare them, use as AA keys, set breakpoints etc.

Note that my point is just that the compiler needs to emit a dummy so that the addresses remain unique, eg

   module.f!uint:
       jmp module.f!int

that only is used when taking the address. Even calling f!int()
instead of the uint version could be acceptable, as long as there
is a compiler option to turn this optimization off (think breakpoints).

artur
April 08, 2012
Am Sun, 8 Apr 2012 07:18:26 -0700
schrieb "H. S. Teoh" <hsteoh@quickfur.ath.cx>:

> We'd have to make sure the checksum doesn't end up in the final executable though, otherwise the bloat may negate any gains we've made.

Executables (and object files) are made up mostly of sections, some of which are 'special cased' to contain the code, zero initialized data, thread local storage etc. and some user defined. The checksums would most probably end up in their own section, like is already happening for debug info or comments. Using a tool like strip you can remove any section by its name.

Once a linker knows how to use the checksums, it would strip them by default.

-- 
Marco

April 08, 2012
On 4/8/12 10:59 AM, Artur Skawina wrote:
> On 04/08/12 17:20, Dmitry Olshansky wrote:
>> On 08.04.2012 18:21, Artur Skawina wrote:
>>> On 04/08/12 13:01, Dmitry Olshansky wrote:
>>>> 3. After any function was generated compiler checks an entry in the duplicate table that matches size, followed by matching checksum and only then (if required) doing a straight memcmp. If it happens that there is a match compiler just throws generated code away and _aliases_ it's symbol to that of a matched entry.
>>>> (so there has to be an alias table if there isn't one already)
>>> [...]
>>>> Thoughts?
>>>
>>> Don't forget that this needs to work:
>>>
>>>      static auto f(T)(T a) { return a; }
>>>      assert(cast(void*)&f!int!=cast(void*)&f!uint);
>
>> A reference to spec page plz.
>
> A reference to *a* D spec, please...
>
> There isn't one, but that does not mean that common sense does
> not need to apply.

Doesn't apply to C++.

Andrei


April 08, 2012
Am Sun, 08 Apr 2012 16:21:14 +0200
schrieb Artur Skawina <art.08.09@gmail.com>:

> On 04/08/12 13:01, Dmitry Olshansky wrote:
> > 3. After any function was generated compiler checks an entry in the duplicate table that matches size, followed by matching checksum and only then (if required) doing a straight memcmp. If it happens that there is a match compiler just throws generated code away and _aliases_ it's symbol to that of a matched entry.
> > (so there has to be an alias table if there isn't one already)
> [...]
> > Thoughts?
> 
> Don't forget that this needs to work:
> 
>    static auto f(T)(T a) { return a; }
>    assert(cast(void*)&f!int!=cast(void*)&f!uint);
> 
> artur

Do you actually rely on that behavior? It is the same as asking this to work, I think:

string a = "abc";
string b = "abcdef";
assert(a.ptr !is b.ptr);

There should be ways to logically check the unequality of your two functions, not by comparing their memory addresses.

-- 
Marco

April 08, 2012
On 08.04.2012 19:59, Artur Skawina wrote:
> On 04/08/12 17:20, Dmitry Olshansky wrote:
>> On 08.04.2012 18:21, Artur Skawina wrote:
>>> On 04/08/12 13:01, Dmitry Olshansky wrote:
>>>> 3. After any function was generated compiler checks an entry in the duplicate table that matches size, followed by matching checksum and only then (if required) doing a straight memcmp. If it happens that there is a match compiler just throws generated code away and _aliases_ it's symbol to that of a matched entry.
>>>> (so there has to be an alias table if there isn't one already)
>>> [...]
>>>> Thoughts?
>>>
>>> Don't forget that this needs to work:
>>>
>>>      static auto f(T)(T a) { return a; }
>>>      assert(cast(void*)&f!int!=cast(void*)&f!uint);
>
>> A reference to spec page plz.
>
> A reference to *a* D spec, please...

Yeah, sorry.
>
> There isn't one, but that does not mean that common sense does
> not need to apply.
>

There is common sense and that is (in my book):
don't tie up compiler's hands for no benefit.

> Do you really suggest making different template instantiations
> identical, just because the compiler happened to emit the same
> code? The situation is not very different from
>
>     const int a = 1; const uint b = 1; assert(&a!=&b);
>

I wouldn't expect this assert to always hold. Moreover (all things being equal) I would expect taking address of constant integers a poor practice.

> The user can take the addresses, compare them, use as AA keys,
> set breakpoints etc.
>

Yes, that's what I call poor practice ( I mean function pointer as AA _key_, seriously?). As for breakpoints, obviously one debugs non-optimized program (at least most of the time).

> Note that my point is just that the compiler needs to emit a dummy
> so that the addresses remain unique, eg
>
>     module.f!uint:
>         jmp module.f!int

Could work as a conservative option. But I don't think it's justified.

>
> that only is used when taking the address. Even calling f!int()
> instead of the uint version could be acceptable, as long as there
> is a compiler option to turn this optimization off (think breakpoints).
>

Yup, that's what optimizations are.

-- 
Dmitry Olshansky
« First   ‹ Prev
1 2 3 4