August 04, 2020
On Tuesday, 4 August 2020 at 02:09:13 UTC, Andrei Alexandrescu wrote:
> Oh, yes forgot about that important efficiency matter. Yes it does look like we need a __trait after all.

How do you think it should be exposed? An initialization function the compiler generates? Some kind of range of ranges? (so like a representation of "4 bytes of zero, 5000 bytes uninitialized, 4 bytes of 4s". Though at that point a .tupleof may make more sense, just gotta account for hidden fields too like the class vtable pointer.)

I'm thinking the function is probably best, though then tweaking it becomes a compiler patch again. It would probably also want to be guaranteed to be inlined.

I don't know though, it is kinda tricky to actually account for those =void things.
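A minimal sketch of the "range of ranges" idea, in D, just to make it concrete. Nothing like this exists in druntime or the compiler; `RunKind`, `InitRun` and `applyRuns` are hypothetical names, and the byte-level encoding is an assumption:

```d
// Hypothetical run-based description of an object's initial bytes, in the
// spirit of "4 bytes of zero, 5000 bytes uninitialized, 4 bytes of 4s".
// Not a druntime or compiler API; all names are illustrative.
enum RunKind { zero, uninitialized, copy }

struct InitRun
{
    size_t length;          // number of bytes covered by this run
    RunKind kind;
    const(ubyte)[] data;    // source bytes, used only for RunKind.copy
}

// Writes the described initial state into `mem`. Uninitialized runs
// (fields declared `= void`) are skipped, so their bytes are left as-is.
void applyRuns(ubyte[] mem, const(InitRun)[] runs)
{
    size_t offset = 0;
    foreach (run; runs)
    {
        final switch (run.kind)
        {
            case RunKind.zero:
                mem[offset .. offset + run.length] = 0;
                break;
            case RunKind.uninitialized:
                break; // leave whatever bytes are already there
            case RunKind.copy:
                assert(run.data.length == run.length);
                mem[offset .. offset + run.length] = run.data[];
                break;
        }
        offset += run.length;
    }
    assert(offset == mem.length, "runs must cover the whole object");
}
```

A `= void` field becomes an `uninitialized` run, i.e. exactly the bytes that a single memcpy of an init image would needlessly overwrite.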
August 03, 2020
On 8/3/20 10:35 PM, Adam D. Ruppe wrote:
> On Tuesday, 4 August 2020 at 02:09:13 UTC, Andrei Alexandrescu wrote:
>> Oh, yes forgot about that important efficiency matter. Yes it does look like we need a __trait after all.
> 
> How do you think it should be exposed? An initialization function the compiler generates? Some kind of range of ranges? (so like a representation of "4 bytes of zero, 5000 bytes uninitialized, 4 bytes of 4s". Though at that point a .tupleof may make more sense, just gotta account for hidden fields too like the class vtable pointer.)
> 
> I'm thinking the function is probably best, though then tweaking it becomes a compiler patch again. It would probably also want to be guaranteed to be inlined.
> 
> I don't know though, it is kinda tricky to actually account for those =void things.

I'm an introspection junkie, so I just wish I had access to the initial value of every field. Come to think of it - a litmus test for introspection is that you can print out, during compilation, an exact definition of any data structure in the program. That is, you should be able to write a function:

printDefinition(T)

such that during compilation, given:

struct S {
    int a = 42;
    immutable double b;
    string c = "hi";
    char[100] d = void;
    void func(double);
    ...
}

then printDefinition!T would output S during compilation. (Without method bodies, but with all qualifiers and attributes and alignment directives and all.)

From that perspective, clearly there's a need for __traits(initializerString, T, "c") or __traits(initializerString, T, 2). It always returns a string containing the initializer value ("void" for void) so code can either print it or mixin it.

For S, __traits(initializerString, S, 0) returns "42", __traits(initializerString, S, 2) and __traits(initializerString, S, "c") return "\"hi\"", and so on.
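As a rough illustration of how far existing introspection gets toward printDefinition, here is a hypothetical `definitionSkeleton` (the name is made up for this sketch) that recovers field types and names via `.tupleof` and `__traits(identifier, ...)`. It stops exactly where the proposed __traits(initializerString, ...) would have to take over: no initializers, no `= void`, no methods, no attributes.

```d
// Hypothetical helper: rebuilds a partial definition of a struct from the
// introspection D already offers. Initializers, `= void`, methods and
// attributes are not recoverable this way.
string definitionSkeleton(T)() if (is(T == struct))
{
    string result = "struct " ~ T.stringof ~ " {\n";
    static foreach (i; 0 .. T.tupleof.length)
    {
        // Field type as spelled by the compiler, then the field name.
        result ~= "    " ~ typeof(T.tupleof[i]).stringof ~ " "
            ~ __traits(identifier, T.tupleof[i]) ~ ";\n";
    }
    return result ~ "}";
}
```

Used as `pragma(msg, definitionSkeleton!S());`, this prints the skeleton at compile time; the missing initializer column is the gap the proposed trait would fill.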

August 03, 2020
On 8/3/20 11:47 PM, Andrei Alexandrescu wrote:
> then printDefinition!T would output S during compilation.

s/printDefinition!T/printDefinition!S/
August 04, 2020
On Tuesday, 4 August 2020 at 03:47:44 UTC, Andrei Alexandrescu wrote:
> struct S {
>     int a = 42;
>     immutable double b;
>     string c = "hi";
>     char[100] d = void;
>     void func(double);
>     ...
> }
>
> then printDefinition!T would output S during compilation. (Without method bodies, but with all qualifiers and attributes and alignment directives and all.)
>
> From that perspective, clearly there's a need for __traits(initializerString, T, "c") or __traits(initializerString, T, 2). It always returns a string containing the initializer value ("void" for void) so code can either print it or mixin it.
>
> For S, __traits(initializerString, S, 0) returns "42", __traits(initializerString, S, 2) and __traits(initializerString, S, "c") return "\"hi\"", and so on.

The problem with initializerString is it doesn't play nice with mixins - when a field is of a type not defined or imported in the module that does the mixin, the compiler barfs.

Since the initializer must be a compile-time constant, can't we just have the __trait return the value, and void in the case of void-initialization? (if so, what do we do for fields not explicitly initialized?)
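For structs, a value-returning version can already be sketched with `T.init.tupleof`, which yields values rather than strings and therefore sidesteps the mixin scoping problem. `fieldDefault` is a hypothetical helper name. It leaves two gaps a trait would still need to cover: it cannot tell `= void` apart from an ordinary default, and it does not apply to classes, whose `.init` is a null reference.

```d
// Hypothetical helper: the default value of field `i` of struct T, as a
// value (no string, no mixin, no imports needed at the use site).
// Limitation: an `= void` field is indistinguishable from a real default.
auto fieldDefault(T, size_t i)() if (is(T == struct))
{
    return T.init.tupleof[i];
}
```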

--
  Simen
August 04, 2020
On Tuesday, 4 August 2020 at 02:03:34 UTC, Andrei Alexandrescu wrote:
> On 8/3/20 10:44 AM, Johan wrote:
>> On Monday, 3 August 2020 at 13:01:55 UTC, Andrei Alexandrescu wrote:
>>> Would it be effective to iterate through the .tupleof and initialize each in turn?
>> 
Possibly. IIRC, the spec obliges us to initialize the padding in-between address-aligned members as well, such that a memcmp works to compare structs. If that is true, then we have to initialize the padding as well, and a memcpy would be that much nicer.
>
> To play devil's advocate, the padding bytes should not have been changed by user code in the first place :o).

But the memory into which objects are placed will be tainted and thus the padding areas will not be the same for each object. (it's the same for =void members. All can be incorporated into the initializer function, but it's work.)

-Johan

August 04, 2020
Am Tue, 04 Aug 2020 09:31:16 +0000 schrieb Johan:

> On Tuesday, 4 August 2020 at 02:03:34 UTC, Andrei Alexandrescu wrote:
>> On 8/3/20 10:44 AM, Johan wrote:
>>> On Monday, 3 August 2020 at 13:01:55 UTC, Andrei Alexandrescu wrote:
>>>> Would it be effective to iterate through the .tupleof and initialize each in turn?
>>> 
>>> Possibly. IIRC, the spec obliges us to initialize the padding in-between address-aligned members as well, such that a memcmp works to compare structs. If that is true, then we have to initialize the padding as well, and a memcpy would be that much nicer.
>>
>> To play devil's advocate, the padding bytes should not have been changed by user code in the first place :o).
> 
> But the memory into which objects are placed will be tainted and thus the padding areas will not be the same for each object. (it's the same for =void members. All can be incorporated into the initializer function, but it's work.)
> 
> -Johan

I wonder whether an initial memset followed by initializing the members might be a good solution? The compiler backends may be clever enough to optimize the memset (e.g. if there are no gaps, the memset is completely redundant; if there is a single gap, explicitly filling that gap may be more efficient than zeroing everything, ...).

However, in some cases a memcpy which copies both the member initialization data and the padding may be better? I'm not sure how to decide which option is better when, or whether we can somehow have both...
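A minimal D sketch contrasting the two options, assuming an ordinary struct without `immutable` fields or `= void` members (`Padded`, `initZeroThenFields` and `initCopyImage` are made-up names). Option A zeroes everything and then stores each field; option B blits the compiler's init image obtained through `typeid(T).initializer()`, which may return a null pointer for all-zero types. Either way the padding ends up in a known state, which is what a memcmp-based comparison needs:

```d
import core.stdc.string : memcmp, memcpy, memset;

struct Padded
{
    ubyte tag = 1;   // typically followed by 3 bytes of padding
    int value = 7;
}

// Option A: zero the whole object (padding included), then store each field.
void initZeroThenFields(T)(ref T obj) if (is(T == struct))
{
    memset(&obj, 0, T.sizeof);
    static foreach (i; 0 .. T.tupleof.length)
        obj.tupleof[i] = T.init.tupleof[i];
}

// Option B: copy the whole initializer image, padding included.
void initCopyImage(T)(ref T obj) if (is(T == struct))
{
    const(void)[] image = typeid(T).initializer();
    if (image.ptr is null)          // all-zero types may have no image
        memset(&obj, 0, image.length);
    else
        memcpy(&obj, image.ptr, image.length);
}

unittest
{
    // Start from "tainted" memory, as when a buffer is reused.
    Padded a = void, b = void;
    memset(&a, 0xAB, Padded.sizeof);
    memset(&b, 0xCD, Padded.sizeof);
    initZeroThenFields(a);
    initCopyImage(b);
    assert(a.tag == 1 && a.value == 7);
    assert(b.tag == 1 && b.value == 7);
    // Both approaches leave the padding zeroed, so memcmp agrees.
    assert(memcmp(&a, &b, Padded.sizeof) == 0);
}
```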

-- 
Johannes
August 04, 2020
On 03.08.20 00:48, Stefan Koch wrote:
> On Sunday, 2 August 2020 at 22:25:19 UTC, Andrei Alexandrescu wrote:
>> I'm working on redoing typeid for classes without compiler magic, and stumbled upon the class initializer - the bytes blitted over the class before the constructor is called.
>>
>> Any ideas on how to do that via introspection? The fields are accessible, but not their default values.
>>
>> It seems like __traits(getInitializer, T) might be necessary.
> 
> So you are introducing new compiler magic in the form of __traits
> to replace the old compiler magic in the form of type-info?
> 
> What exactly is the goal of this?

Orthogonality of magic.
August 05, 2020
On Monday, 3 August 2020 at 14:44:38 UTC, Johan wrote:
>
> My current solution [*]: https://github.com/weka-io/druntime/blob/0dab4b0dc5cbccb891351095ff09b0558e3fbe06/src/core/internal/lifetime.d#L92-L140
>
> -Johan
>
> [*] Hits an obscure mangling bug, so doesn't quite work with Weka's codebase yet

This is the bug: https://issues.dlang.org/show_bug.cgi?id=21120

-Johan

August 05, 2020
Am Tue, 04 Aug 2020 10:13:53 +0000 schrieb Johannes Pfau:

> Am Tue, 04 Aug 2020 09:31:16 +0000 schrieb Johan:
> 
>> On Tuesday, 4 August 2020 at 02:03:34 UTC, Andrei Alexandrescu wrote:
>>> On 8/3/20 10:44 AM, Johan wrote:
>>>> On Monday, 3 August 2020 at 13:01:55 UTC, Andrei Alexandrescu wrote:
>>>>> Would it be effective to iterate through the .tupleof and initialize each in turn?
>>>> 
>>>> Possibly. IIRC, the spec obliges us to initialize the padding in-between address-aligned members as well, such that a memcmp works to compare structs. If that is true, then we have to initialize the padding as well, and a memcpy would be that much nicer.
>>>
>>> To play devil's advocate, the padding bytes should not have been changed by user code in the first place :o).
>> 
>> But the memory into which objects are placed will be tainted and thus the padding areas will not be the same for each object. (it's the same for =void members. All can be incorporated into the initializer function, but it's work.)
>> 
>> -Johan
> 
> I wonder whether an initial memset followed by initializing the members might be a good solution? The compiler backends may be clever enough to optimize the memset (e.g. if there are no gaps, the memset is completely redundant; if there is a single gap, explicitly filling that gap may be more efficient than zeroing everything, ...).
> 
> However, in some cases a memcpy which copies both the member initialization data and the padding may be better? I'm not sure how to decide which option is better when, or whether we can somehow have both...

A quick look at some generated ASM for C++ code suggests that GCC can "see through" memcpys if the copied data is "well known":

https://godbolt.org/z/jno9KM *

So if GCC actually knows which data will be memcpyed, it may rewrite the memcpy to assignments of statically known values. Or it may rewrite the memcpy into multiple assignments skipping holes, it may remove redundant writes (e.g. if a member is immediately written after initialization), ...


I'd therefore suggest the following:
1) Make all init symbols COMDAT: This ensures that if a symbol is actually needed (address taken, real memcpy call), it will be available. But if it is not needed, the compiler does not have to output the symbol. If it's required in multiple files, COMDAT will merge the symbols into one.

2) Ensure the compiler always knows the data of that symbol. This probably means during codegen, the initializer should never be an external symbol. It needs to be a COMDAT symbol with attached initializer expression. And the initializer data must always be fully available in .di files.

The two rules combined should allow the backend to choose the initialization method that is most appropriate for the target architecture.


To summarize, implementing "initializer functions" may prevent this optimization to some degree (this depends on inlining and other factors, though). So I'd probably prefer to keep compiler-generated initializer symbols for aggregates, but make sure that these symbols always have an initializer expression attached, so the backend can choose which one to use.

In addition, there needs to be some well-defined way for user code to initialize variables and trigger these optimizations. Most likely __builtin_memset(p, 0, size) and __builtin_memcpy(p, &T.init, T.sizeof) would be fine though.



* Interesting that the most efficient way to return a default-initialized aggregate on X86 by value is to just return an address to the initializer. I guess the ABI copies anyway in the caller...

-- 
Johannes
August 05, 2020
On Wednesday, 5 August 2020 at 13:40:16 UTC, Johannes Pfau wrote:
>
> I'd therefore suggest the following:
> 1) Make all init symbols COMDAT: This ensures that if a symbol is actually needed (address taken, real memcpy call), it will be available. But if it is not needed, the compiler does not have to output the symbol. If it's required in multiple files, COMDAT will merge the symbols into one.
>
> 2) Ensure the compiler always knows the data of that symbol. This probably means during codegen, the initializer should never be an external symbol. It needs to be a COMDAT symbol with attached initializer expression. And the initializer data must always be fully available in .di files.
>
> The two rules combined should allow the backend to choose the initialization method that is most appropriate for the target architecture.

What you are suggesting is pretty much exactly what the compilers already do, except that we don't expose the initialization symbol directly to the user (T.init is an rvalue and does not point to the initialization symbol), but through TypeInfo.initializer. Not exposing the initializer symbol to the user had a nice benefit: for cases where we never want to emit an initializer symbol (very large structs), we simply removed that symbol and started doing something else (memset zero), without breaking any user code. However, this only works for all-zero structs, because TypeInfo.initializer must return a slice ({init symbol, length}) to the data, or {null, length} for all-zero structs (the 'null' is what we started making use of). More complex cases cannot elide the symbol.

Initializer functions would allow us to tailor initialization for more complex cases (e.g. with =void holes, padding shenanigans, or non-zero-but-repetitive-constant double[1million] arrays, ...), without having to always turn on some backend optimizations (at -O0) and without having to expose a TypeInfo.initializer slice, but instead exposing a TypeInfo.initializer function pointer.
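To make the slice-versus-function distinction concrete, here is a hand-written sketch of what an initializer function could look like for a struct with a large, repetitive, non-zero array. `InitializerFn`, `Big`, `initBig` and `bigInitializer` are hypothetical names, not druntime API; the point is only that a fill loop replaces copying a 32 KiB init image:

```d
import core.stdc.string : memset;

// Hypothetical "initializer function" shape; not part of druntime.
alias InitializerFn = void function(void[] mem) nothrow @nogc;

struct Big
{
    double[4096] xs = 1.0;   // non-zero but repetitive constant (32 KiB)
    int count;               // zero-initialized
}

// Stand-in for what a compiler could generate for Big: a fill loop
// instead of a memcpy from a 32 KiB initializer symbol.
void initBig(void[] mem) nothrow @nogc
{
    assert(mem.length == Big.sizeof);
    auto big = cast(Big*) mem.ptr;
    foreach (ref x; big.xs)
        x = 1.0;
    // Zero `count` and any trailing padding in one go.
    memset(&big.count, 0, Big.sizeof - Big.count.offsetof);
}

// A per-type entry in the spirit of TypeInfo.initializer, but as a function.
immutable InitializerFn bigInitializer = &initBig;

unittest
{
    auto buf = new ubyte[Big.sizeof];
    buf[] = 0xFF;                  // simulate tainted memory
    bigInitializer(buf);
    auto big = cast(Big*) buf.ptr;
    assert(big.count == 0);
    assert(big.xs[0] == 1.0 && big.xs[$ - 1] == 1.0);
}
```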

-Johan