May 22, 2015
https://issues.dlang.org/show_bug.cgi?id=14571

--- Comment #12 from Vladimir Panteleev <thecybershadow@gmail.com> ---
(In reply to Vladimir Panteleev from comment #11)
> Arrays in the data segment have a fixed address. This means that arr.ptr is actually known at compile time (whether the language exposes it or not).

Sorry, that's wrong. It's known at link time.

--
May 22, 2015
https://issues.dlang.org/show_bug.cgi?id=14571

--- Comment #11 from Vladimir Panteleev <thecybershadow@gmail.com> ---
(In reply to Walter Bright from comment #9)
> (In reply to Vladimir Panteleev from comment #7)
> > The problem is entirely with .init.
> 
> Which is a global variable, and has all the downsides of them. My advice still applies.

Certain data structures are much easier to implement using static arrays (for example, object pools with statically-sized chunks).
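
As a rough illustration of the object-pool case, here is a minimal C sketch of a pool chunk whose storage is a fixed-size array embedded in the struct. The names (`Chunk`, `Node`, `chunk_alloc`, `chunk_release`) and the chunk size are hypothetical and chosen only for this example; it is not taken from any code discussed in this report.

```c
#include <stddef.h>

enum { CHUNK_CAPACITY = 256 };  /* hypothetical chunk size */

typedef struct Node {
    int          value;
    struct Node *next_free;     /* links released nodes together */
} Node;

/* One chunk: the slots live inside the struct itself, so there is no
   separate allocation, pointer, or length field to manage. */
typedef struct Chunk {
    Node   slots[CHUNK_CAPACITY];
    size_t used;                /* slots handed out so far */
    Node  *free_list;           /* released slots available for reuse */
} Chunk;

/* Hand out a released slot if there is one, else the next unused slot.
   Returns NULL when the chunk is full. */
Node *chunk_alloc(Chunk *c)
{
    if (c->free_list != NULL) {
        Node *n = c->free_list;
        c->free_list = n->next_free;
        return n;
    }
    if (c->used < CHUNK_CAPACITY)
        return &c->slots[c->used++];
    return NULL;
}

/* Return a slot to the chunk's free list. */
void chunk_release(Chunk *c, Node *n)
{
    n->next_free = c->free_list;
    c->free_list = n;
}
```

A zero-filled `Chunk` is a valid empty chunk here, which is the property that makes the type's initial value interesting in this discussion.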

> Global arrays already have that extra indirection.

You mean TLS?

Arrays in the data segment have a fixed address. This means that arr.ptr is actually known at compile time (whether the language exposes it or not). arr[5] involves no address calculation at all during program execution as well.

> True, then it goes into the BSS segment, but the other issues still apply that are not fixable by changing the compiler (i.e. executable bloat).

BSS does not bloat executable size, only virtual memory.
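
The BSS point can be sketched in C as follows. This assumes a typical toolchain (for example GCC or Clang targeting ELF or PE), where zero-filled statics go to .bss and anything with a non-zero initializer goes to .data; the array names and the 1 MiB size are made up for illustration.

```c
#include <stddef.h>

enum { POOL_BYTES = 1 << 20 };  /* 1 MiB; size chosen only for illustration */

/* No initializer: zero-filled, so toolchains typically place it in .bss.
   The file records only the size, not a megabyte of zeros. */
static unsigned char zero_pool[POOL_BYTES];

/* A single non-zero element: the whole array typically goes to .data
   and is stored in full in the executable. */
static unsigned char filled_pool[POOL_BYTES] = { 1 };

/* Count non-zero bytes, so that both arrays are actually referenced. */
static size_t count_nonzero(const unsigned char *p, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            count++;
    return count;
}

size_t zero_pool_nonzero(void)
{
    return count_nonzero(zero_pool, sizeof zero_pool);
}

size_t filled_pool_nonzero(void)
{
    return count_nonzero(filled_pool, sizeof filled_pool);
}
```

Building the file with and without the `= { 1 }` initializer and comparing the output of the binutils `size` utility should show the array moving between the bss and data columns.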

--
May 22, 2015
https://issues.dlang.org/show_bug.cgi?id=14571

--- Comment #13 from Walter Bright <bugzilla@digitalmars.com> ---
Oh, and lest I forget, using large thread local static arrays is surely going to be a mistake, as it'll consume memory for every thread, and will make thread creation quite expensive.
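
To make the per-thread cost concrete, here is a small C11 sketch, assuming a compiler that supports `_Thread_local`. The names and the 1 MiB size are hypothetical. In D, module-level variables are thread-local by default unless marked `shared` or `__gshared`, so the first declaration corresponds to the default behaviour there.

```c
#include <stddef.h>

enum { SCRATCH_BYTES = 1 << 20 };  /* 1 MiB; illustrative size */

/* Thread-local: every thread gets its own copy of this array, set up
   when the thread is created. */
static _Thread_local unsigned char tls_scratch[SCRATCH_BYTES];

/* Process-wide: a single copy shared by all threads (access from more
   than one thread would need synchronization). */
static unsigned char shared_scratch[SCRATCH_BYTES];

unsigned char *tls_scratch_ptr(void)    { return tls_scratch; }
unsigned char *shared_scratch_ptr(void) { return shared_scratch; }

/* Address space reserved for the thread-local array across all threads. */
size_t tls_bytes_for(size_t thread_count)
{
    return thread_count * sizeof tls_scratch;
}

/* The shared array costs the same regardless of thread count. */
size_t shared_bytes_for(size_t thread_count)
{
    (void)thread_count;
    return sizeof shared_scratch;
}
```

How much of that reservation turns into committed memory, and how expensive thread creation becomes, depends on the platform's TLS implementation.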

--
May 22, 2015
https://issues.dlang.org/show_bug.cgi?id=14571

--- Comment #14 from Walter Bright <bugzilla@digitalmars.com> ---
(In reply to Vladimir Panteleev from comment #11)
> Certain data structures are much easier to implement using static arrays (for example, object pools with statically-sized chunks).
> 
> > Global arrays already have that extra indirection.
> 
> You mean TLS?

Take a look at the code generated for global data. In 64 bit code, it's all relative to the program counter. In 32 bit code, it's indirect because of shared library support (PIC).


> Arrays in the data segment have a fixed address. This means that arr.ptr is actually known at compile time (whether the language exposes it or not).

Link time, not compile time.


> arr[5] involves no address calculation at all during program execution as well.

That died a couple decades ago with the advent of DLLs. x86-64 code doesn't even have a direct addressing mode. Even the presumably direct addressing modes in 32-bit x86 are indirect because of the segment registers, and despite that, the CPU does such a good job of address pipelining that you'll never see the effect of using a register offset.


> BSS does not bloat executable size, only virtual memory.

I know, that's why I mentioned it. BSS is special.

--
May 22, 2015
https://issues.dlang.org/show_bug.cgi?id=14571

--- Comment #15 from Vladimir Panteleev <thecybershadow@gmail.com> ---
(In reply to Walter Bright from comment #14)
> Take a look at the code generated for global data. In 64 bit code, it's all relative to the program counter. In 32 bit code, it's indirect because of shared library support (PIC).

Not on Win32, though.

> That died a couple decades ago with the advent of DLLs.

DLLs are relocated at load time (and usually are linked with a base unlikely to conflict, so relocations are often not done). The hypothetical ptr[5] would be relocated as well.

> x86-64 code
> doesn't even have a direct addressing mode. Even the presumably direct
> addressing modes in 32-bit x86 are indirect because of the segment registers, and
> despite that, the CPU does such a good job of address pipelining that you'll
> never see the effect of using a register offset.

I would need to run some benchmarks to test this. A quick test shows that 64-bit code has dedicated CPU instructions for relative addressing of globals, whereas indexing arrays on the heap still requires two instructions (mov rax, arr followed by mov dword ptr [rax+idx*4], value).

> I know, that's why I mentioned it. BSS is special.

I understood your post as saying that executable bloat still applies even though the array goes into BSS.

--
May 23, 2015
https://issues.dlang.org/show_bug.cgi?id=14571

--- Comment #16 from Walter Bright <bugzilla@digitalmars.com> ---
(In reply to Vladimir Panteleev from comment #15)
> DLLs are relocated at load time (and usually are linked with a base unlikely to conflict, so relocations are often not done). The hypothetical ptr[5] would be relocated as well.

It goes through a relocation thunk. So does TLS.

> Not on Win32

Win32 is dead. Even phones are 64 bit processors, aren't they?

> I would need to run some benchmarks to test this. A quick test shows that 64-bit code has dedicated CPU instructions for relative addressing of globals, whereas indexing arrays on the heap still requires two instructions (mov rax, arr followed by mov dword ptr [rax+idx*4], value).

64 bit code indexes static data with the Program Counter.

Furthermore, if you're accessing large arrays, the cost of getting a pointer to the start of it is utterly swamped by accessing the data itself. Like I said, I bet if you do some benchmarking, you'd be hard pressed to find ANY improvement of static large arrays over allocated ones.
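
A starting point for such a comparison could look like the C sketch below: one large static array and one array of the same size allocated at startup, driven through identical fill and sum loops. All names and the table size are hypothetical, and no timing results are claimed here; the loops would need to be wrapped in a timer such as `clock()` and run repeatedly to measure anything.

```c
#include <stddef.h>
#include <stdlib.h>

enum { TABLE_LEN = 1 << 20 };   /* illustrative element count */

static int  static_table[TABLE_LEN];  /* fixed location in .bss */
static int *heap_table;               /* allocated once at startup */

/* Allocate the heap variant; returns 1 on success, 0 on failure. */
int tables_init(void)
{
    heap_table = calloc(TABLE_LEN, sizeof *heap_table);
    return heap_table != NULL;
}

void tables_free(void)
{
    free(heap_table);
    heap_table = NULL;
}

int *static_table_ptr(void) { return static_table; }
int *heap_table_ptr(void)   { return heap_table; }

/* Identical work for both variants: only the base address differs. */
void fill(int *t)
{
    for (size_t i = 0; i < TABLE_LEN; i++)
        t[i] = (int)(i & 0xFF);
}

long long sum(const int *t)
{
    long long total = 0;
    for (size_t i = 0; i < TABLE_LEN; i++)
        total += t[i];
    return total;
}
```

Inside each loop the base pointer is loaded once, after which both variants execute the same per-element memory accesses.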

--
May 23, 2015
https://issues.dlang.org/show_bug.cgi?id=14571

--- Comment #17 from Vladimir Panteleev <thecybershadow@gmail.com> ---
(In reply to Walter Bright from comment #16)
> (In reply to Vladimir Panteleev from comment #15)
> > DLLs are relocated at load time (and usually are linked with a base unlikely to conflict, so relocations are often not done). The hypothetical ptr[5] would be relocated as well.
> 
> It goes through a relocation thunk. So does TLS.

I don't know what a "relocation thunk" is (no Google hits), but all offsets are adjusted at load time, if that's what you mean.

> Win32 is dead. Even phones are 64 bit processors, aren't they?

Only the newer ones... and the phones don't use x86_64. Is the situation on ARM the same?

> Furthermore, if you're accessing large arrays, the cost of getting a pointer to the start of it is utterly swamped by accessing the data itself. Like I said, I bet if you do some benchmarking, you'd be hard pressed to find ANY improvement of static large arrays over allocated ones.

I think you are right. But I also think that this is a valid, working pattern in C/C++/Delphi programs, so it should be supported. There is value in having a 1:1 port of a program work as-is without refactorings to work around compiler limitations.

--
May 23, 2015
https://issues.dlang.org/show_bug.cgi?id=14571

--- Comment #18 from Manu <turkeyman@gmail.com> ---
(In reply to Walter Bright from comment #6)
> I'll fix this, but you should know that this necessarily creates large sections in the executable file of repeated data. For such large arrays, it is better to allocate and initialize them on program startup.

I reported it because it was a bug, and I agree it's not excellent practice.
That said though, Vladimir has already presented my thoughts; it's perfectly
valid code, people do it, code like this exists.
In my case, I am porting C++ code, and it's quite a lot of code... it is
difficult to refactor and port at the same time.


Regarding .init, this is something I never really thought about too much, but
I'm now really concerned. I have been concerned by classinfo's in the past, but
somehow init slipped under my radar.
I think a few things need to be considered and/or made possible.

1. How do I disable the existence of 'init' for a type? It's conceivable that I want to produce an uninitialised (and uninitialisable) type, like these ones I have here.

2. Any type with a static array naturally has an unreasonably large .init value; what optimisations are possible with relation to init? Can they be allocated+synthesised at init (*cough*) time, rather than built into the exe? An array is a series of repeated elements, so storing that full array in the binary is not only wasteful, but can only lead to disaster when people put a static array as a member of a type, and the length is large, or perhaps is fed from a 3rd party who doesn't have this specific consideration in mind (nor should they).

3. Can D effectively link-strip .init when it is un-referenced? How can we make this possible if there is something preventing it?

I'd love to spend some time working towards D binaries being the same
predictable size as C/C++ binaries. For some reason, despite my efforts, I
always seem to end up with D binaries that are easily 10 times the size of
their counterpart C binary.
In fact, I constantly find myself in the surprising situation where I create a D
interface for a C lib, which simply declares extern(C)'s along with minimal D
code for adaptation, no actual functional D code in sight, and the .lib it
produces is significantly larger than the entire C lib that it represents.
I have never taken the time to explore the problem, I suspect it's just
classinfo's and init values... are there other known bloat inducing problems?


> The way globals work on modern CPUs is you are not saving any execution time by using static data. Large static arrays is an artifact of FORTRAN.

This isn't about execution time, it's about perfectly valid code that looks
completely benign causing an unexpected explosion to your binary.
Not all programmers are aware or conscious of this sort of thing. It shouldn't
be so simple for an unsuspecting (junior?) programmer to make a mess like this,
and likely not understand what they've done, or that they've even done it at
all.

--
May 23, 2015
https://issues.dlang.org/show_bug.cgi?id=14571

--- Comment #19 from Vladimir Panteleev <thecybershadow@gmail.com> ---
(In reply to Manu from comment #18)
> 1. How do I disable the existence of 'init' for a type?

I think that if you make sure that all of the initial values are zero bytes, the compiler won't generate a .init block. Instead, the TypeInfo will have a null .init pointer, and the runtime will use that as a clue to simply do a memset instead of copying over the .init data when allocating new instances. You might be able to also use this to ensure that your complex types aren't accidentally creating .init blocks.
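
The memset-versus-copy behaviour described above can be modelled in a few lines of C. This is only a hypothetical model of the idea, not druntime's actual code: `TypeInitModel`, `init_image`, and `init_instance` are names invented for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-type initializer record. */
typedef struct TypeInitModel {
    const void *init_image;  /* NULL means "all initial bytes are zero" */
    size_t      size;        /* size of one instance in bytes */
} TypeInitModel;

/* Initialize freshly allocated storage for one instance. */
void init_instance(void *dst, const TypeInitModel *ti)
{
    if (ti->init_image == NULL)
        memset(dst, 0, ti->size);               /* no image stored in the binary */
    else
        memcpy(dst, ti->init_image, ti->size);  /* copy the stored image */
}
```

In this model, only types with a non-null `init_image` contribute an initializer image to the executable, and that image is as large as the type itself.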

> 2. Any type with a static array naturally has an unreasonably large .init value; what optimisations are possible with relation to init? Can they be allocated+synthesised at init (*cough*) time, rather than built into the exe?

Not at the moment, AFAIK.

> 3. Can D effectively link-strip .init when it is un-referenced? How can we make this possible if there is something preventing it?

Each .init would need to be in its own section to allow linker garbage collection. DMD doesn't seem to do this at the moment, though (at least not on Win32/Win64).

Whether to put things in individual sections is usually a trade-off between link time and resulting executable size. It would be great if DMD at least gave the user some control over this. gcc has e.g. -ffunction-sections and -fdata-sections.

> I'd love to spend some time working towards D binaries being the same predictable size as C/C++ binaries. For some reason, despite my efforts, I always seem to end up with D binaries that are easily 10 times the size of their counterpart C binary.

I agree, bloated executables are not nice. This becomes a real problem with proprietary/closed-source applications, since then the compiler is pulling in code and data that is never actually used, and which should not be present in the published executable.

> I have never taken the time to explore the problem, I suspect it's just classinfo's and init values... are there other known bloat inducing problems?

Yes.

- Static constructors pull in everything they reference.
- Object.factory requires that all classes that the compiler sees must be
instantiatable, which means pulling in their vtables, invariants, virtual
methods, and all their dependencies.
- Many things which could be emitted in separate sections are put in one
section. As a result, anything that's referenced within that section pulls in
everything else from it, and all their dependencies.
- There are probably other problems.

This is generally one of the more neglected aspects of D and the current implementations. People working on embedded D stuff are constantly running into the above problems as well.

--
May 24, 2015
https://issues.dlang.org/show_bug.cgi?id=14571

github-bugzilla@puremagic.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--