April 08, 2012
On 04/08/12 18:14, Andrei Alexandrescu wrote:
> On 4/8/12 10:59 AM, Artur Skawina wrote:
>> On 04/08/12 17:20, Dmitry Olshansky wrote:
>>> On 08.04.2012 18:21, Artur Skawina wrote:
>>>> On 04/08/12 13:01, Dmitry Olshansky wrote:
>>>>> 3. After any function was generated compiler checks an entry in the duplicate table that matches size, followed by matching checksum and only then (if required) doing a straight memcmp. If it happens that there is a match compiler just throws generated code away and _aliases_ it's symbol to that of a matched entry.
>>>>> (so there has to be an alias table if there isn't one already)
>>>> [...]
>>>>> Thoughts?
>>>>
>>>> Don't forget that this needs to work:
>>>>
>>>>      static auto f(T)(T a) { return a; }
>>>>      assert(cast(void*)&f!int!=cast(void*)&f!uint);
>>
>>> A reference to spec page plz.
>>
>> A reference to *a* D spec, please...
>>
>> There isn't one, but that does not mean that common sense does not need to apply.
> 
> Doesn't apply to C++.

Well, let's say it isn't specified. "Functions-are-not-objects" allows the compilers that merge the bodies to get away with it, because it's not explicitly illegal. But that does not mean that returning a different random pointer, which just happens to point to another identical code sequence, is a good idea.


On 04/08/12 18:30, Dmitry Olshansky wrote:
> There is common sense and that is (in my book):
> don't tie up compiler's hands for no benefit.

In this case the benefit from not requiring unique function addresses isn't very large (it only matters iff the address is taken).

Hmm, as long as the returned pointer points to functionally identical code, often everything will still be fine. In other cases doing this optimization *at all* will cause trouble. And i think it should be done (the benefits are obvious), An "unique" function attribute would be the solution; that would let us eat the cake, except when we want to have it. :)

artur
April 09, 2012
"Artur Skawina" <art.08.09@gmail.com> wrote in message news:mailman.1480.1333900846.4860.digitalmars-d@puremagic.com...
>
> Note that my point is just that the compiler needs to emit a dummy so that the addresses remain unique, eg
>
>   module.f!uint:
>       jmp module.f!int
>

Or use a nop slide before the start of the function.  Since we're modifying the object file format anyway, it would be trivial for the compiler to mark functions which have their address taken as needing a unique address.


April 09, 2012
"Dmitry Olshansky" <dmitry.olsh@gmail.com> wrote in message news:jlsmka$22ce$1@digitalmars.com...
>
> The refinement is merging prefixes and suffixes of course.
> And for that one needs to calculate hashes for all of prefixes and all of
> suffixes. I will define _all_ later on.
>

I think you'll find that this is better done in the compiler instead of the linker.  Merging prefixes is problematic because at some point you will need to work out which tail to execute, so you'll always need to modify the generated code.

Merging suffixes is easier, you can merge all returning blocks with
identical code, and then merge all blocks that always jump to the same
blocks, etc.
This will need to happen after code generation if you want to merge int/uint
code, which would be difficult in dmd's back end.

Merging functions with identical bodies is of course much easier, and can be done in the linker without needing to modify any code (just the relocations).


April 09, 2012
On Mon, Apr 09, 2012 at 10:59:26AM +1000, Daniel Murphy wrote:
> "Artur Skawina" <art.08.09@gmail.com> wrote in message news:mailman.1480.1333900846.4860.digitalmars-d@puremagic.com...
> >
> > Note that my point is just that the compiler needs to emit a dummy so that the addresses remain unique, eg
> >
> >   module.f!uint:
> >       jmp module.f!int
> >
> 
> Or use a nop slide before the start of the function.  Since we're modifying the object file format anyway, it would be trivial for the compiler to mark functions which have their address taken as needing a unique address.
[...]

Why is it so important to have unique addresses for functions?


T

-- 
BREAKFAST.COM halted...Cereal Port Not Responding. -- YHL
April 09, 2012
Am Sun, 8 Apr 2012 19:14:22 -0700
schrieb "H. S. Teoh" <hsteoh@quickfur.ath.cx>:

> On Mon, Apr 09, 2012 at 10:59:26AM +1000, Daniel Murphy wrote:
> > "Artur Skawina" <art.08.09@gmail.com> wrote in message news:mailman.1480.1333900846.4860.digitalmars-d@puremagic.com...
> > >
> > > Note that my point is just that the compiler needs to emit a dummy so that the addresses remain unique, eg
> > >
> > >   module.f!uint:
> > >       jmp module.f!int
> > >
> > 
> > Or use a nop slide before the start of the function.  Since we're modifying the object file format anyway, it would be trivial for the compiler to mark functions which have their address taken as needing a unique address.
> [...]
> 
> Why is it so important to have unique addresses for functions?
> 
> 
> T

I would like to know that use case as well, especially since I learned coding with the "copy&paste code is bad" phrase. It just seems odd to me to expect any constant data/code to have unique addresses when they can be merged or overlapped.

-- 
Marco

April 09, 2012
Le 08/04/2012 16:18, H. S. Teoh a écrit :
> On Sun, Apr 08, 2012 at 03:01:56PM +0400, Dmitry Olshansky wrote:
>> I think it's been ages since I meant to ask why nobody (as in compiler vendors) does what I think is rather simple optimization.
>>
>> In the short term the plan is to introduce a "link-time" flavored optimization at code generation or (better) link step.
> 
> This would be incompatible with how current (non-dmd) linkers work. But I do like the idea. Perhaps if it works well, other linkers will adopt it? (Just like how the gcc linker adopted duplicate template code elimination due to C++ templates.)
> 
> T
> 

Actually, in C++ (as well as D), the added benefit would be a greatly
improved compilation speed, wouldn't it ?
I bet if the idea works in D and proves increased compilation, compiler
writers would be very compelled to implement it in C++.
April 09, 2012
On 09.04.2012 5:11, Daniel Murphy wrote:
> "Dmitry Olshansky"<dmitry.olsh@gmail.com>  wrote in message
> news:jlsmka$22ce$1@digitalmars.com...
>>
>> The refinement is merging prefixes and suffixes of course.
>> And for that one needs to calculate hashes for all of prefixes and all of
>> suffixes. I will define _all_ later on.
>>
>
> I think you'll find that this is better done in the compiler instead of the
> linker.  Merging prefixes is problematic because at some point you will need
> to work out which tail to execute, so you'll always need to modify the
> generated code.

"Easy": just add a hidden pointer argument to functions that have merged prefix (call it dispatch). The prefix part of code is followed by an indirect jump to this pointer.
Compiler arranges so that every time function is called the correct dispatch address is passed behind the scenes.

BTW there are no extra checks and such it's one naked indirect jump, and it's totally predictable unlike say switch jump.
(well unless you use a few copy-paste-susceptible functions in the same loop that turn out to have prefixes merged)

It still implies that prefix merging should be applied with more care then suffix merging. Once this tested and working even merging arbitrary parts of functions is doable with this approach.

>
> Merging suffixes is easier, you can merge all returning blocks with
> identical code, and then merge all blocks that always jump to the same
> blocks, etc.
> This will need to happen after code generation if you want to merge int/uint
> code, which would be difficult in dmd's back end.
>

Any chance to fit this into IR-->CodeGen step? Like use alternative comparison then memcmp. Or better do basically the same algorithm but on map!(....)(IR) that morphs things so that some identical ops (e.g. uint/int == and !=) are considered the same. In fact this might be even faster then generating useless machine code!

> Merging functions with identical bodies is of course much easier, and can be
> done in the linker without needing to modify any code (just the
> relocations).
>
>


-- 
Dmitry Olshansky
April 09, 2012
On 04/09/12 08:21, Somedude wrote:
> Le 08/04/2012 16:18, H. S. Teoh a écrit :
>> On Sun, Apr 08, 2012 at 03:01:56PM +0400, Dmitry Olshansky wrote:
>>> I think it's been ages since I meant to ask why nobody (as in compiler vendors) does what I think is rather simple optimization.
>>>
>>> In the short term the plan is to introduce a "link-time" flavored optimization at code generation or (better) link step.
>>
>> This would be incompatible with how current (non-dmd) linkers work. But I do like the idea. Perhaps if it works well, other linkers will adopt it? (Just like how the gcc linker adopted duplicate template code elimination due to C++ templates.)
>>
>> T
>>
> 
> Actually, in C++ (as well as D), the added benefit would be a greatly
> improved compilation speed, wouldn't it ?
> I bet if the idea works in D and proves increased compilation, compiler
> writers would be very compelled to implement it in C++.
> 

They already do.

It's a very simple and trivial optimization, the question is only about programmer expectations. Every (memory) object having an unique address *is* a valuable feature with clear benefits. (C++ has functions as non-objects, that's why the compilers can get away with the optimization) Note that that does not actually mean that everything has to be placed at an unique address -- it only needs to behave *AS IF*, as long as the program can't tell the difference.


On 04/09/12 02:59, Daniel Murphy wrote:
> "Artur Skawina" <art.08.09@gmail.com> wrote in message news:mailman.1480.1333900846.4860.digitalmars-d@puremagic.com...
>>
>> Note that my point is just that the compiler needs to emit a dummy so that the addresses remain unique, eg
>>
>>   module.f!uint:
>>       jmp module.f!int
> 
> Or use a nop slide before the start of the function.  Since we're modifying the object file format anyway, it would be trivial for the compiler to mark functions which have their address taken as needing a unique address.

Nice idea. Given todays amounts of alignment noops emitted it would usually be completely free.

But I now think the optimization would be ok, and should even on by default for the case where the identical code sequence was generated from an identical token sequence. That would handle the template bloat issue while avoiding most of the problems; having non-unique addresses for this case should be harmless and would just need to be properly documented.

It's only the random-completely-unrelated-function-replacement that is
problematic - think such functions randomly appearing in the call chain,
confusing both downstream code and programmers looking at backtraces or
perf profiles, and breakpoints that magically appear out of nowhere at random.

artur
April 09, 2012
On Mon, Apr 09, 2012 at 08:21:08AM +0200, Somedude wrote:
> Le 08/04/2012 16:18, H. S. Teoh a écrit :
> > On Sun, Apr 08, 2012 at 03:01:56PM +0400, Dmitry Olshansky wrote:
> >> I think it's been ages since I meant to ask why nobody (as in compiler vendors) does what I think is rather simple optimization.
> >>
> >> In the short term the plan is to introduce a "link-time" flavored optimization at code generation or (better) link step.
> > 
> > This would be incompatible with how current (non-dmd) linkers work. But I do like the idea. Perhaps if it works well, other linkers will adopt it? (Just like how the gcc linker adopted duplicate template code elimination due to C++ templates.)
> > 
> > T
> > 
> 
> Actually, in C++ (as well as D), the added benefit would be a greatly
> improved compilation speed, wouldn't it ?
> I bet if the idea works in D and proves increased compilation, compiler
> writers would be very compelled to implement it in C++.

Exactly my point. I *want* to give incentive to toolchain devs to add these kinds of enhancements to linkers in general.


T

-- 
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth? -- Michael Beibl
April 09, 2012
"H. S. Teoh" <hsteoh@quickfur.ath.cx> wrote in message news:mailman.1518.1333937643.4860.digitalmars-d@puremagic.com...
>
> Why is it so important to have unique addresses for functions?
>

Just because I can't think of a use case doesn't mean nobody is relying on it!

But I guess there really isn't one.