Thread overview
Generating struct .init at run time?
Jul 02, 2020
Ali Çehreli
Jul 02, 2020
IGotD-
Jul 02, 2020
Ali Çehreli
Jul 02, 2020
Patrick Schluter
Jul 02, 2020
kinke
Jul 02, 2020
Basile B.
Jul 02, 2020
Ali Çehreli
Jul 02, 2020
kinke
Jul 02, 2020
kinke
Jul 02, 2020
Ali Çehreli
July 02, 2020
Normally, struct .init values are known at compile time. Unfortunately, they add to binary size:

enum elementCount = 1024 * 1024;

struct S {
  double[elementCount] a;
}

void main() {
    S s;
    assert(typeid(S).initializer.length == double.sizeof * elementCount);
    assert(typeid(S).initializer.ptr !is null);
}

Both asserts pass: S.init is 800M and is embedded into the compiled program.

Of course, the solution is to define members with '= void':

enum elementCount = 1024 * 1024;

struct S {
  double[elementCount] a = void;  // <-- HERE
}

void main() {
    S s;
    assert(typeid(S).initializer.length == double.sizeof * elementCount);
    assert(typeid(S).initializer.ptr is null);
}

Now the program binary is 800M shorter. (Note .ptr is now null.) Also note that I did NOT use the following syntax because there is a dmd bug:

  auto s = S(); // Segfaults: https://issues.dlang.org/show_bug.cgi?id=21004

My question is: Is there a function that I can call to initialize 's' to the same .init value that the compiler would have used:

S sInit;

shared static this() {
  defaultInitValue(&sInit);  // Does this exist?
}

I can then use sInit to copy over the bytes of all S objects in the program. (Both the structs and their object instantiations are all code-generated; so there is no usability issue. There are thousands of structs and the current binary size is 2G! :) )

If not, I am planning on writing the equivalent of defaultInitValue() that will zero-init the entire struct and then overwrite float, double, char, wchar, and dchar members with their respective .init values, recursively. Does that make sense?
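
A minimal sketch of such a helper, assuming the members have no explicit initializers of their own (for example `int x = 5;`). The name defaultInitValue is the hypothetical one from the question, not a druntime or Phobos function, and Msg and Inner are made-up example types. Instead of zeroing first and then patching floating point and character members, it assigns each scalar its type-level .init, which gives the same result:

```d
import std.traits : isStaticArray;

// Hypothetical helper: rebuilds the default value member by member at run
// time, so the large S.init image is never read.
// Assumption: only type-level defaults (0, NaN, 0xFF, null) are wanted.
void defaultInitValue(T)(T* p)
{
    static if (is(T == struct))
    {
        // Recurse into every member of the struct.
        foreach (ref member; (*p).tupleof)
            defaultInitValue(&member);
    }
    else static if (isStaticArray!T)
    {
        // Recurse into every element of a static array.
        foreach (ref element; (*p)[])
            defaultInitValue(&element);
    }
    else
    {
        // Scalars, pointers, and slices: use the small per-type default.
        *p = T.init;
    }
}

// Made-up message types for illustration.
struct Inner { char c = void; float[2] f = void; }
struct Msg { int id; double[4] a = void; Inner inner; }

unittest
{
    import std.math : isNaN;

    auto m = new Msg;       // heap-allocated, as in the real code
    defaultInitValue(m);
    assert(m.id == 0);
    assert(isNaN(m.a[3]));
    assert(m.inner.c == char.init);
    assert(isNaN(m.inner.f[1]));
}
```

This can be tried with `dmd -unittest -main -run file.d`. The per-element recursion is simple rather than fast; for large arrays of scalars a slice assignment would be the obvious optimization.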

Ali
July 02, 2020
On Thursday, 2 July 2020 at 07:51:29 UTC, Ali Çehreli wrote:
>
> Both asserts pass: S.init is 800M and is embedded into the compiled program.
>

Not an answer to your problem, but what on earth are those extra 800MB? The array size is 8MB, so if the program just copied the data it would take only 8MB. Does the binary have this size even with the debugging info stripped?

Also, this is an obvious optimization that could be implemented: the program does an initialization loop instead of putting the data in the data segment when the array is above a certain size and all elements are supposed to have the same value.

July 02, 2020
On Thursday, 2 July 2020 at 07:51:29 UTC, Ali Çehreli wrote:
> Normally, struct .init values are known at compile time. Unfortunately, they add to binary size:
>
> [...]
memset() is the function you want. The initializer is an element generated in the data segment (or in a read-only segment) that will be copied to the variable by an internal call to memcpy. The same happens in C, except that the compilers are often clever and replace the copy with a memset().
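
A rough sketch of the memset approach for a heap-allocated object, assuming the member is declared with '= void' as in the first post. makeZeroed is a hypothetical helper name. Note that zero bytes give 0.0 for doubles, not D's default double.nan:

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memset;

enum elementCount = 1024 * 1024;

struct S
{
    double[elementCount] a = void; // keeps the 8 MB image out of the binary
}

// Hypothetical helper: allocates an S with malloc and zero-fills it.
S* makeZeroed()
{
    auto p = cast(S*) malloc(S.sizeof);
    if (p is null)
        return null;
    memset(p, 0, S.sizeof); // every double becomes 0.0
    return p;
}

unittest
{
    auto p = makeZeroed();
    assert(p !is null);
    scope (exit) free(p);
    assert(p.a[0] == 0.0 && p.a[$ - 1] == 0.0);
}
```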



July 02, 2020
On Thursday, 2 July 2020 at 07:51:29 UTC, Ali Çehreli wrote:
> Of course, the solution is to define members with '= void'

Since when? https://issues.dlang.org/show_bug.cgi?id=11331 and your https://issues.dlang.org/show_bug.cgi?id=16956 are still open.

For recent LDC versions, the 'solution' is to (statically) initialize the array with zeros, as fully zero-initialized structs don't feature any explicit .init symbols anymore.

> enum elementCount = 1024 * 1024;
>
> struct S {
>   double[elementCount] a = void;  // <-- HERE
> }
>
> void main() {
>     S s;
>     assert(typeid(S).initializer.length == double.sizeof * elementCount);
>     assert(typeid(S).initializer.ptr is null);
> }
>
> Now the program binary is 800M shorter.

So you're saying you have a *stack* that can deal with an 800M struct (assuming you used a different `elementCount` for the actual tests)?! Even 8 MB should be too large without extra compiler/linker options, as that's the default stack size on Linux IIRC (on Windows, 2 MB IIRC).

I don't think a struct should ever be that large, as it can probably only live on the heap anyway and only passed around by refs. I'd probably use a thin struct instead, containing and managing a `double[]` member (or `double[elementCount]*`).
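
A minimal sketch of such a thin struct, assuming a GC-allocated payload is acceptable. ThinS and create are hypothetical names; since D structs cannot define a default constructor, a factory function sets the fixed length here:

```d
enum elementCount = 1024 * 1024;

// Hypothetical wrapper: the struct holds only a slice (pointer + length),
// so its .init is tiny and the 8 MB payload lives on the GC heap.
struct ThinS
{
    double[] a;

    // Factory standing in for the default constructor D does not allow.
    static ThinS create()
    {
        ThinS s;
        s.a = new double[elementCount]; // elements start as double.nan
        return s;
    }
}

unittest
{
    import std.math : isNaN;

    auto s = ThinS.create();
    assert(s.a.length == elementCount);
    assert(isNaN(s.a[0]));
    static assert(ThinS.sizeof == (double[]).sizeof);
}
```

The trade-off is that a plain `ThinS s;` has an empty array until create() is used, so the code generator would have to route every construction through the factory.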
July 02, 2020
On Thursday, 2 July 2020 at 10:37:27 UTC, kinke wrote:
> I don't think a struct should ever be that large, as it can probably only live on the heap anyway and only passed around by refs. I'd probably use a thin struct instead, containing and managing a `double[]` member (or `double[elementCount]*`).

So right, but the compiler should definitely not crash.

July 02, 2020
On 7/2/20 2:37 AM, IGotD- wrote:

> what on earth are those extra 800MB?

I'm losing my mind. :) Of course it's just 8M. Too many digits for me to handle. :p

> Also, this is an obvious optimization that could be implemented: the
> program does an initialization loop instead of putting the data in the
> data segment when the array is above a certain size and all elements
> are supposed to have the same value.

+1

Ali

July 02, 2020
On 7/2/20 3:37 AM, kinke wrote:

> On Thursday, 2 July 2020 at 07:51:29 UTC, Ali Çehreli wrote:
>> Of course, the solution is to define members with '= void'
>
> Since when? https://issues.dlang.org/show_bug.cgi?id=11331 and your
> https://issues.dlang.org/show_bug.cgi?id=16956 are still open.

Wow! I didn't remember that one. According to its date, it was written when I was working for Weka. Apparently, ldc took care of it for them after all.

> For recent LDC versions, the 'solution' is to (statically) initialize
> the array with zeros, as fully zero-initialized structs don't feature
> any explicit .init symbols anymore.

What about floating point and char types? Their .init values are not all zeros in the D spec. (I don't think this matters in my case but still.)

> So you're saying you have a *stack* that can deal with an 800M struct

Sorry, my test code was too simplistic. The actual code constructs these objects in dynamic memory for that exact reason.

> I don't think a struct should ever be that large, as it can probably
> only live on the heap anyway and only passed around by refs. I'd
> probably use a thin struct instead, containing and managing a `double[]`
> member (or `double[elementCount]*`).

Exactly.

These structs are code-generated to reflect ROS interface message types. Just like in D, arrays have dynamic/static distinction in ROS so I blindly translated the types to D without remembering this .init issue.

The following are the options I am considering:

a) Move to ldc

b) As you and IGotD- suggest, define all members with '= void' and memset to zero at runtime. (I will decide whether to take care of char and floating point types specially, e.g. by setting doubles to NaN; this distinction may not be important in our use case.) Luckily, issue 16956 you mention above does not affect us because these are non-template structs.

c) Again, as you say, define static arrays as dynamic arrays and code-generate a default constructor that sets the length to the actual static length, which requires some magic as a default constructor cannot be defined for structs.

d) ?

Ali


July 02, 2020
On Thursday, 2 July 2020 at 15:20:23 UTC, Ali Çehreli wrote:
> According to its date, it was written when I was working for Weka. Apparently, ldc took care of it for them after all.

If so, then without them posting any issue beforehand or giving any feedback afterwards.

> > For recent LDC versions, the 'solution' is to (statically) initialize
> > the array with zeros, as fully zero-initialized structs don't feature
> > any explicit .init symbols anymore.
>
> What about floating point and char types? Their .init values are not all zeros in the D spec. (I don't think this matters in my case but still.)

That's why all you have to do, in order not to have recent LDC emit the struct's init symbol, is to initialize these members manually with zeros:

struct S { double[elementCount] a = 0; }
void foo() { S s; } // compiler does a memset

`= void` for members doesn't work and, I dare say, won't work anytime soon, if ever.
July 02, 2020
On Thursday, 2 July 2020 at 16:51:52 UTC, kinke wrote:
> `= void` for members doesn't work and, I dare say, won't work anytime soon, if ever.

I've quickly checked; `= void` for members has initialize-with-zeros semantics too, so with LDC, it's equivalent to `= 0` but applicable to user-defined types as well.
For DMD, `= void` for non-default-zero-initialized members can be used for the same effect. If all members are effectively zero-initialized, the init symbol isn't emitted, and the compiler initializes the whole struct with zeros. With `= 0`, DMD still emits the init symbol into the object file, but doesn't use it (at least not for stack allocations).

TLDR: Seems like initializing (all non-default-zero-initialized) members with `= void` is the portable solution to elide the init symbols *and* have the compiler initialize the whole struct with zeros, so a manual memset isn't required.
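
A small sketch of that pattern, assuming the behaviour described above for DMD and recent LDC (it relies on what the compilers do with `= void` members, not on a guarantee in the language spec). The `id` member and the NaN fix-up at the end are illustrative additions:

```d
enum elementCount = 1024 * 1024;

struct S
{
    int id;                        // already default-zero
    double[elementCount] a = void; // would otherwise default to NaN
}

unittest
{
    // With every member effectively zero, no init image should be emitted;
    // this is the same check as in the first post.
    assert(typeid(S).initializer.length == S.sizeof);
    assert(typeid(S).initializer.ptr is null);

    auto p = new S;      // heap allocation; 8 MB is too large for a default stack
    p.a[] = double.nan;  // only for structs where the NaN default matters
    assert(p.a[0] != p.a[0]); // NaN compares unequal to itself
}
```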
July 02, 2020
On 7/2/20 10:51 AM, kinke wrote:
> On Thursday, 2 July 2020 at 16:51:52 UTC, kinke wrote:
>> `= void` for members doesn't work and, I dare say, won't work anytime soon, if ever.
> 
> I've quickly checked; `= void` for members has initialize-with-zeros semantics too, so with LDC, it's equivalent to `= 0` but applicable to user-defined types as well.
> For DMD, `= void` for non-default-zero-initialized members can be used for the same effect. If all members are effectively zero-initialized, the init symbol isn't emitted, and the compiler initializes the whole struct with zeros. With `= 0`, DMD still emits the init symbol into the object file, but doesn't use it (at least not for stack allocations).
> 
> TLDR: Seems like initializing (all non-default-zero-initialized) members with `= void` is the portable solution to elide the init symbols *and* have the compiler initialize the whole struct with zeros, so a manual memset isn't required.

Thank you! I just checked: Even 2.084 behaves the same. I will deal with double.nan, etc. for structs where they matter.

Ali