August 30, 2013
On Friday, 30 August 2013 at 08:23:20 UTC, Benjamin Thaut wrote:
> So let me summarize the currently discussed solutions:
>
> 1) Use alias symbols
> Pros:
>  - Better usability (no additional command line parameters needed when compiling / linking against DLLs)
>
> Cons:
>  - Less efficient code for cross DLL function calls (one additional call instruction)
>  - Less efficient code for all accesses to global data, regardless of whether it ends up in a DLL or not. That means __gshared variables, global shared variables, and hidden global data like module info, type info, vtables, etc. (one additional level of indirection). It might be possible to avoid this in most cases using link-time optimization, but it is unlikely that we can get LTO to work easily.
>  - Additional level of indirection might confuse current debuggers
>

My understanding is that it can be optimized away. Or can it?

> 2) Use additional command line
> Pros:
>  - More efficient code
>
> Cons:
>  - Additional command line parameters needed when compiling / linking against a DLL. (can be hidden away inside sc.ini for phobos / druntime)

August 30, 2013
On 30.08.2013 11:17, deadalnix wrote:
> My understanding is that it can be optimized away. Or can it?

It can in C/C++ because Microsoft has really good LTO (link-time optimization). As Rainer Schuetze explained, the format of the object files used for LTO is proprietary and unknown to the public. So we cannot easily use that, and it's unlikely we can use it at all. Also, for 32-bit Windows it cannot be optimized away because OPTLINK does not do LTO.

In theory it could be optimized away, but in practice I doubt we are going to see this kind of optimization in dmd any time soon. And as dmd is the only viable option on Windows right now, I would prefer the better-performing second solution to this problem.


August 30, 2013
On Friday, 30 August 2013 at 09:33:25 UTC, Benjamin Thaut wrote:
> On 30.08.2013 11:17, deadalnix wrote:
>> My understanding is that it can be optimized away. Or can it?
>
> It can in C/C++ because Microsoft has really good LTO (link-time optimization). As Rainer Schuetze explained, the format of the object files used for LTO is proprietary and unknown to the public. So we cannot easily use that, and it's unlikely we can use it at all. Also, for 32-bit Windows it cannot be optimized away because OPTLINK does not do LTO.
>

win32 is kind of a dead end anyway. It is still very alive today, but we want to aim for the future. And it WILL disappear.

The object format is kind of annoying.

> In theory it could be optimized away, but in practice I doubt we are going to see this kind of optimization in dmd any time soon. And as dmd is the only viable option on Windows right now, I would prefer the better-performing second solution to this problem.

I understand. But sacrificing good design for immediate reward is bound to cost way more in the long run if D is successful. And if D fails, then it doesn't matter.
August 30, 2013
On 30.08.2013 11:52, deadalnix wrote:
> On Friday, 30 August 2013 at 09:33:25 UTC, Benjamin Thaut wrote:
>> On 30.08.2013 11:17, deadalnix wrote:
>>> My understanding is that it can be optimized away. Or can it?
>>
>> It can in C/C++ because Microsoft has really good LTO (link-time
>> optimization). As Rainer Schuetze explained, the format of the object
>> files used for LTO is proprietary and unknown to the public. So we
>> cannot easily use that, and it's unlikely we can use it at all. Also,
>> for 32-bit Windows it cannot be optimized away because OPTLINK does
>> not do LTO.
>>
>
> win32 is kind of a dead end anyway. It is still very alive today, but we
> want to aim for the future. And it WILL disappear.
>
> The object format is kind of annoying.
>
>> In theory it could be optimized away, but in practice I doubt we are
>> going to see this kind of optimization in dmd any time soon. And as
>> dmd is the only viable option on Windows right now, I would prefer the
>> better-performing second solution to this problem.
>
> I understand. But sacrificing good design for immediate reward is bound
> to cost way more in the long run if D is successful. And if D fails,
> then it doesn't matter.

I don't consider it good design to add lots of indirections just to remove a command line parameter from the compiler. It's not like it's going to make any difference in source code.
August 30, 2013

On 29.08.2013 21:09, Martin Nowak wrote:
> On 08/29/2013 08:30 PM, Rainer Schuetze wrote:
>> I meant the import part. How are accesses or data references to data
>> inside another shared library implemented?
>
> References from the data segment use absolute relocations.

Just remembered that we don't have that on Windows, so if we want to mimic what C++ does, we'll have to generate initialization code to be run at startup to fill in the proper pointers from the import table.
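
A rough sketch of what such startup code would have to do, written as portable C++ with the loader simulated by hand. The names `imp_shared_value`, `Node`, `load_dll`, and `fill_import_pointers` are all hypothetical and only illustrate the two steps; a real compiler would emit the second step itself.

```cpp
#include <cassert>

// Variable that lives in "the other DLL".
int shared_value = 42;

// Hypothetical import-table slot. The loader stores the variable's address
// here when the DLL is mapped; load_dll() simulates that step.
int* imp_shared_value = nullptr;
void load_dll() { imp_shared_value = &shared_value; }

// Data in our own module that wants to point at the imported variable.
// Without an absolute data relocation the address cannot be baked into the
// data segment, so the field starts out null ...
struct Node { int* target; };
Node node = { nullptr };

// ... and generated startup code copies it over from the import slot.
void fill_import_pointers() { node.target = imp_shared_value; }

void demo() {
    load_dll();              // step 1: loader resolves the import
    fill_import_pointers();  // step 2: startup code patches our data
    assert(node.target == &shared_value);
    assert(*node.target == 42);
}
```

The ordering matters: the patching code can only run after the loader has filled the slot, which is why it has to be part of module startup rather than a static initializer in the data segment.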
August 30, 2013
On Friday, 30 August 2013 at 06:23:02 UTC, Rainer Schuetze wrote:
>
> On 29.08.2013 21:09, Martin Nowak wrote:
>>
>> References from the data segment use absolute relocations.
>> References from PIC code use the GOT.
>
> So an indirection through the GOT is added to every access to a non-static global if you compile with -fPIC?

My understanding was references using the GOT have the same number of indirections as in the PLT -- that is, a one-time cost for the dynamic linker to dereference and replace an entry with the absolute address -- and the major difference is PLT is done lazily, while GOT is fixed up in its entirety at runtime.

Having said that, now I wonder if I'm misunderstanding the question?

-Wyatt
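
For readers following along, the lazy versus eager distinction can be sketched in plain C++. This is a simulation under stated assumptions, not how a dynamic linker is actually implemented: `plt_add`, `add_resolver`, and `got_value` are hypothetical names standing in for a PLT entry, its resolver stub, and a GOT entry.

```cpp
#include <cassert>

int resolver_runs = 0;

int real_add(int a, int b) { return a + b; }
int add_resolver(int a, int b);

// PLT-like slot: initially points at a resolver stub (lazy binding).
int (*plt_add)(int, int) = &add_resolver;

// The first call lands here, patches the slot, then forwards the call.
// Later calls go straight to real_add through the patched slot.
int add_resolver(int a, int b) {
    ++resolver_runs;
    plt_add = &real_add;
    return real_add(a, b);
}

// GOT-like slot: filled in eagerly before any user code runs.
int value = 7;
int* got_value = &value;

// Reading the variable still loads the address from the slot each time.
int read_value() { return *got_value; }

void demo() {
    assert(plt_add(1, 2) == 3);  // resolved on first use
    assert(plt_add(3, 4) == 7);  // slot already patched
    assert(resolver_runs == 1);  // the resolver ran exactly once
    assert(read_value() == 7);
}
```

Note that in this sketch the slot is resolved only once, but `read_value` still dereferences `got_value` on every access; whether that per-access load counts as the indirection being asked about is the open point in the exchange above.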
August 30, 2013
On 08/30/2013 12:50 PM, Rainer Schuetze wrote:
>> References from the data segment use absolute relocations.
>
> Just remembered that we don't have that on Windows, so if we want to
> mimic what C++ does, we'll have to generate initialization code to be
> run at startup to fill in the proper pointers from the import table.

This is also done at runtime by the loader. The additional startup time is one reason why exporting data in ABIs is bad. The other is copy relocations for writeable data (http://docs.oracle.com/cd/E19082-01/819-0690/chapter4-84604/index.html).
August 30, 2013
On 08/30/2013 08:20 AM, Rainer Schuetze wrote:
> It's a bit easier to see the code by adding debug symbols and using
> dumpbin /disasm main.exe.
>
> Unfortunately, this won't help us a lot, because the intermediate object
> files have some unknown format and in fact are just transferring some
> code representation to the linker that then invokes the compiler and
> optimizer on the full source code. This is very specific to the C/C++
> toolchain and I don't think we can take advantage of it.

This is sadly true for the dmd->COFF->VC link toolchain.
But we'll likely see this from GDC or LDC in the future.
Despite that, I don't think this is a huge performance issue at all.
We could put all the import data pointers in one section so that they are packed together and make efficient use of the CPU cache.
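
As a small illustration of the packing idea, the following C++ sketch keeps the pointers in one contiguous array and computes how many cache lines they span. The array `import_table` is a hypothetical stand-in for a dedicated section, and the 64-byte cache line size is an assumption, not something stated in the thread.

```cpp
#include <cassert>
#include <cstddef>

// Variables that would live in other DLLs.
int a = 1, b = 2, c = 3, d = 4;

// Hypothetical packed import table: all data pointers sit side by side,
// as they would if the compiler emitted them into a single section.
int* const import_table[] = { &a, &b, &c, &d };

constexpr std::size_t kSlotCount = sizeof(import_table) / sizeof(import_table[0]);
constexpr std::size_t kCacheLine = 64;  // assumed cache line size in bytes
constexpr std::size_t kSlotsPerLine = kCacheLine / sizeof(int*);

// Number of cache lines the table spans when it starts on a line boundary.
constexpr std::size_t lines_needed() {
    return (kSlotCount + kSlotsPerLine - 1) / kSlotsPerLine;
}

// Every access goes through a slot, but neighbouring slots share a line.
int sum_through_table() {
    int sum = 0;
    for (int* p : import_table) sum += *p;
    return sum;
}

void demo() {
    assert(sum_through_table() == 10);
    assert(sizeof(import_table) == kSlotCount * sizeof(int*));  // no gaps
    assert(lines_needed() == 1);
}
```

With 8-byte pointers, eight slots fit into one assumed 64-byte line, so code that touches several imported variables in a row would pay for one cache miss on the table rather than one per variable.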
August 30, 2013
On 30.08.2013 10:23, Benjamin Thaut wrote:
> 1) Use alias symbols

I'm convinced now that Martin Nowak's aliasing solution is the better solution, as there will be less runtime overhead than I originally thought. The usability will also be better. It also leaves room for further optimizations by doing some clever pointer patching, in case we find a solution that works on x64.

If no one vetoes this, I'm going to update the DIP with this aliasing solution and all other discussed changes (e.g. making export an attribute).

Kind Regards
Benjamin Thaut
September 01, 2013
I updated the DIP with all discussed content. Feedback is welcome.

http://wiki.dlang.org/DIP45

Kind Regards
Benjamin Thaut