Thread overview
Stripping Data Symbols (Win64)
Dec 28, 2015
Benjamin Thaut
Dec 30, 2015
Rainer Schuetze
Dec 30, 2015
Benjamin Thaut
Jan 01, 2016
Rainer Schuetze
Jan 04, 2016
Benjamin Thaut
December 28, 2015
My current work on the D compiler led me to the following test case, which I put through an unmodified version of dmd 2.069.2:

import core.stdc.stdio;

struct UnusedStruct
{
	int i = 3;
	float f = 4.0f;
};

class UnusedClass
{
	int i = 2;
	float f = 5.0f;
};

void main(string[] args)
{
  printf("Hello World!");
}

When compiling this on Windows with dmd -m64 main.d -L/MAP
and then inspecting the map file, I noticed that the following 4 data symbols end up in the final executable although they are never used.

 0003:00000a90       _D4main12UnusedStruct6__initZ 0000000140046a90     main.obj
 0003:00000ad0       _D4main11UnusedClass6__initZ 0000000140046ad0     main.obj
 0003:00000af0       _D4main11UnusedClass7__ClassZ 0000000140046af0     main.obj
 0003:00000ba0       _D4main11UnusedClass6__vtblZ 0000000140046ba0     main.obj

For the struct this is the initializer; for the class it's the initializer, class info and vtbl.

Is this behavior correct? Shouldn't UnusedStruct and UnusedClass be stripped completely from the binary? Is this somehow connected to the module info / object.factory?
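
To illustrate the object.factory part of the question, here is a minimal sketch (not something confirmed in this thread). Object.factory looks a class up by its fully qualified name at run time, so a class can be instantiated without any static reference to it in the code. That would be one possible reason for the class info, and through it the initializer and vtbl, to stay reachable. The helper name createByName is hypothetical, and the sketch assumes the file is compiled as module main.

```d
module main;

import core.stdc.stdio;

class UnusedClass
{
    int i = 2;
    float f = 5.0f;
}

// Hypothetical helper: creates an instance from the class name only.
// Object.factory returns null if no class with that name is found.
Object createByName(string name)
{
    return Object.factory(name);
}

void main()
{
    // UnusedClass is referenced only through a string here.
    Object o = createByName("main.UnusedClass");
    if (o is null)
        printf("class not found\n");
    else
        printf("class found\n");
}
```

This says nothing about the struct initializer, which cannot be reached this way.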

I noticed by looking at some object file dumps that dmd puts each function into its own section, but data symbols, like initializers, are all merged into the same section. Could this be the root issue?
December 30, 2015

On 28.12.2015 13:05, Benjamin Thaut wrote:
> My current work on the D compiler led me to the following test case,
> which I put through an unmodified version of dmd 2.069.2:
>
> import core.stdc.stdio;
>
> struct UnusedStruct
> {
>      int i = 3;
>      float f = 4.0f;
> };
>
> class UnusedClass
> {
>      int i = 2;
>      float f = 5.0f;
> };
>
> void main(string[] args)
> {
>    printf("Hello World!");
> }
>
> When compiling this on Windows with dmd -m64 main.d -L/MAP
> and then inspecting the map file, I noticed that the following 4 data
> symbols end up in the final executable although they are never used.
>
>   0003:00000a90       _D4main12UnusedStruct6__initZ 0000000140046a90
> main.obj
>   0003:00000ad0       _D4main11UnusedClass6__initZ 0000000140046ad0
> main.obj
>   0003:00000af0       _D4main11UnusedClass7__ClassZ 0000000140046af0
> main.obj
>   0003:00000ba0       _D4main11UnusedClass6__vtblZ 0000000140046ba0
> main.obj
>
> For the struct this is the initializer; for the class it's the
> initializer, class info and vtbl.
>
> Is this behavior correct? Shouldn't UnusedStruct and UnusedClass be
> stripped completely from the binary? Is this somehow connected to the
> module info / object.factory?
>

I noticed something similar recently when compiling a C file with /Gy, see https://github.com/D-Programming-Language/druntime/pull/1446#issuecomment-160880021

The compiler puts all functions into COMDATs, but they are all still linked in if only a single symbol is referenced, even if linked with /OPT:REF.

So I suspect this is not an issue with dmd, but the Microsoft linker. I still wonder whether the approach to use "function level linking" works at all for Win64.

> I noticed by looking at some object file dumps that dmd puts each
> function into its own section, but data symbols, like initializers, are
> all merged into the same section. Could this be the root issue?

Having all data in a single section misses some possible optimizations, and it might be the reason for the behavior in your case (you can check this with "dumpbin /all objectfile"), but the issue above does not contain any data.
December 30, 2015
On Wednesday, 30 December 2015 at 09:43:32 UTC, Rainer Schuetze wrote:
>
>
>
> I noticed something similar recently when compiling a C file with /Gy, see https://github.com/D-Programming-Language/druntime/pull/1446#issuecomment-160880021
>
> The compiler puts all functions into COMDATs, but they are all still linked in if only a single symbol is referenced, even if linked with /OPT:REF.
>
> So I suspect this is not an issue with dmd, but the Microsoft linker. I still wonder whether the approach to use "function level linking" works at all for Win64.
>
> > I noticed by looking at some object file dumps that dmd puts each function into its own section, but data symbols, like initializers, are all merged into the same section. Could this be the root issue?
>
> Having all data in a single section misses some possible optimizations, and it might be the reason for the behavior in your case (you can check this with "dumpbin /all objectfile"), but the issue above does not contain any data.

So if I understand this correctly, the Microsoft linker only strips unused COMDATs, and otherwise always pulls in the entire object file?

For me, the fact that stripping of individual data symbols does not work is actually a good thing: if it doesn't work, I can't break it. ;-)
January 01, 2016

On 30.12.2015 13:25, Benjamin Thaut wrote:
> On Wednesday, 30 December 2015 at 09:43:32 UTC, Rainer Schuetze
> wrote:
>>
>>
>>
>> I noticed something similar recently when compiling a C file with
>> /Gy, see
>> https://github.com/D-Programming-Language/druntime/pull/1446#issuecomment-160880021
>>
>>
>>
>> The compiler puts all functions into COMDATs, but they are all
>> still linked in if only a single symbol is referenced, even if
>> linked with /OPT:REF.
>>
>> So I suspect this is not an issue with dmd, but the Microsoft
>> linker. I still wonder whether the approach to use "function level
>> linking" works at all for Win64.
>>
>>> I noticed by looking at some object file dumps that dmd puts each
>>> function into its own section, but data symbols, like initializers,
>>> are all merged into the same section. Could this be the root issue?
>>
>> Having all data in a single section misses some possible
>> optimizations, and it might be the reason for the behavior in your
>>  case (you can check this with "dumpbin /all objectfile"), but the
>>  issue above does not contain any data.
>
> So if I understand this correctly, the Microsoft linker only strips
> unused COMDATs, and otherwise always pulls in the entire object
> file?

I tried to reproduce the issue just now, but failed to do so (both with a C file compiled with /Gy and with a D file). Only referenced COMDATs were included in the link, not other COMDATs in the same object file. Maybe it was an issue with my build script back then.

>
> For me stripping of individual data symbols not working is actually a
>  good thing, if it doesn't work, I can't break it. ;-)

Please note that building with -lib puts every function/declaration into its own object file inside the library, and unused class declarations are then no longer in the linked executable.
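
A test case along these lines could look like the sketch below. The module name mylib, the function usedFunction, and the file names in the comments are all assumptions made for illustration.

```d
// mylib.d: hypothetical library module for a -lib test case
module mylib;

// Referenced by the application, so it has to be linked in.
int usedFunction()
{
    return 42;
}

// Never referenced by the application.
class UnusedClass
{
    int i = 2;
    float f = 5.0f;
}

// Possible build steps (file names are assumptions):
//   dmd -m64 -lib mylib.d             builds mylib.lib
//   dmd -m64 app.d mylib.lib -L/MAP   builds app.exe and app.map
// where app.d contains:
//   import mylib;
//   void main() { assert(usedFunction() == 42); }
// Afterwards, search app.map for symbols starting with
// _D5mylib11UnusedClass to see whether the class was kept.
```

If the behavior described above holds, the map file should list usedFunction but none of the UnusedClass symbols.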
January 04, 2016
On Friday, 1 January 2016 at 13:57:01 UTC, Rainer Schuetze wrote:
>
> Please note that building with -lib puts every function/declaration into its own object file inside the library, and unused class declarations are no longer in the linked executable.

Ok, that is very good information. I should be able to build a test case out of that.

Kind Regards
Benjamin Thaut