August 05, 2020
On Wed, 05 Aug 2020 14:36:37 +0000, Johan wrote:

> On Wednesday, 5 August 2020 at 13:40:16 UTC, Johannes Pfau wrote:
>>
>> I'd therefore suggest the following:
>> 1) Make all init symbols COMDAT: This ensures that if a symbol is
>> actually needed (address taken, real memcpy call), it will be available.
>> But if it is not needed, the compiler does not have to output the
>> symbol.
>> If it's required in multiple files, COMDAT will merge the symbols into
>> one.
>>
>> 2) Ensure the compiler always knows the data of that symbol. This probably means that, during codegen, the initializer should never be an external symbol. It needs to be a COMDAT symbol with an attached initializer expression. And the initializer data must always be fully available in .di files.
>>
>> The two rules combined should allow the backend to choose the initialization method that is most appropriate for the target architecture.
> 
> What you are suggesting is pretty much exactly what the compilers already do. Except that we don't expose the initialization symbol directly to the user (T.init is an rvalue, and does not point to the initialization symbol), but through TypeInfo.initializer. Not exposing the initializer symbol to the user had a nice benefit: for cases where we never want to emit an initializer symbol (very large structs), we simply removed that symbol and started doing something else (memset zero), without breaking any user code. However this only works for all-zero structs, because TypeInfo.initializer must return a slice ({init symbol, length}) to data or {null,length} for all-zero (the 'null' is what we started making use of). More complex cases cannot elide the symbol.
> 
> Initializer functions would allow us to tailor initialization for more complex cases (e.g. with =void holes, padding shenanigans, or non-zero-but-repetitive-constant double[1million] arrays, ...), without having to always turn on some backend optimizations (at -O0) and without having to expose a TypeInfo.initializer slice, but instead exposing a TypeInfo.initializer function pointer.
> 
> -Johan

But initializer symbols are currently not in COMDAT, or does LDC implement that? That's a crucial point, as it addresses Andrei's initializer bloat point. And it also means you can avoid emitting the symbol if it's never referenced. But if it is referenced, it will be available.

Initializer functions have the drawback that backends can no longer choose different strategies for -Os or -O2. All the other benefits you mention (=void holes, padding shenanigans, or non-zero-but-repetitive-constant double[1million] arrays, ...) can also be handled properly by the backend in the initializer-symbol case if the initializer expression is available to the backend. And you have to ensure that the initialization function can always be inlined, so without -O flags it may also lead to suboptimal code...

If the initializer optimizations depend on -O flags, it should also be possible to move the necessary backend steps into a separate pass which is executed even without optimization flags. Choosing to initialize using expressions vs. a symbol should not be an expensive step.

I don't see how an initializer function would be more flexible than that. In fact, you could generate the initializer function in the backend if information about the initialization expression is always preserved. Constructing an initializer function earlier (in the frontend, or D user code) removes information about the target architecture (-Os, memory available, efficient addressing of local constant data, ...). Because of that, I think the backend is the best place to implement this and the frontend should just provide the symbol initializer expression.
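To make that concrete, here's a rough sketch (made-up struct and function names, written as plain D rather than what a backend literally emits) of the two strategies a backend could choose between when it has the initializer expression:

----------
struct S
{
    int    a = 1;
    double b = 2.5;
}

// Strategy A: block copy from a (COMDAT) init symbol; attractive for -Os
// or for very large types.
void initByCopy(ref S s, ref const S initData)
{
    import core.stdc.string : memcpy;
    memcpy(&s, &initData, S.sizeof);
}

// Strategy B: per-field stores derived from the initializer expression;
// attractive for -O2 or small types, and no symbol is referenced at all.
void initByStores(ref S s)
{
    s.a = 1;
    s.b = 2.5;
}
----------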

-- 
Johannes
August 05, 2020
On Wednesday, 5 August 2020 at 16:08:59 UTC, Johannes Pfau wrote:
> On Wed, 05 Aug 2020 14:36:37 +0000, Johan wrote:
>
>> On Wednesday, 5 August 2020 at 13:40:16 UTC, Johannes Pfau wrote:
>>>
>>> I'd therefore suggest the following:
>>> 1) Make all init symbols COMDAT: This ensures that if a symbol is
>>> actually needed (address taken, real memcpy call), it will be available.
>>> But if it is not needed, the compiler does not have to output the
>>> symbol.
>>> If it's required in multiple files, COMDAT will merge the symbols into
>>> one.
>>>
>>> 2) Ensure the compiler always knows the data of that symbol. This probably means that, during codegen, the initializer should never be an external symbol. It needs to be a COMDAT symbol with an attached initializer expression. And the initializer data must always be fully available in .di files.
>>>
>>> The two rules combined should allow the backend to choose the initialization method that is most appropriate for the target architecture.
>> 
>> What you are suggesting is pretty much exactly what the compilers already do. Except that we don't expose the initialization symbol directly to the user (T.init is an rvalue, and does not point to the initialization symbol), but through TypeInfo.initializer. Not exposing the initializer symbol to the user had a nice benefit: for cases where we never want to emit an initializer symbol (very large structs), we simply removed that symbol and started doing something else (memset zero), without breaking any user code. However this only works for all-zero structs, because TypeInfo.initializer must return a slice ({init symbol, length}) to data or {null,length} for all-zero (the 'null' is what we started making use of). More complex cases cannot elide the symbol.
>> 
>> Initializer functions would allow us to tailor initialization for more complex cases (e.g. with =void holes, padding shenanigans, or non-zero-but-repetitive-constant double[1million] arrays, ...), without having to always turn on some backend optimizations (at -O0) and without having to expose a TypeInfo.initializer slice, but instead exposing a TypeInfo.initializer function pointer.
>> 
>> -Johan
>
> But initializer symbols are currently not in COMDAT, or does LDC implement that? That's a crucial point, as it addresses Andrei's initializer bloat point. And it also means you can avoid emitting the symbol if it's never referenced. But if it is referenced, it will be available.

It does not matter whether the initializer symbol is in COMDAT, because (currently) it has to be dynamically accessible (e.g. by a user of a compiled library, or by druntime GC object destroy code), and thus it cannot be determined at compile/link time whether it is referenced.
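To illustrate, here is a rough sketch (not the literal druntime code) of the kind of reset-to-init operation that only knows the dynamic type and therefore needs runtime access to the init data:

----------
void resetToInit(Object o)
{
    const ti   = typeid(o);          // dynamic TypeInfo of the object
    const init = ti.initializer();   // {ptr, length}; ptr is null for all-zero
    auto mem   = (cast(ubyte*) cast(void*) o)[0 .. init.length];
    if (init.ptr is null)
        mem[] = 0;                              // all-zero case, no symbol
    else
        mem[] = cast(const(ubyte)[]) init[];    // copy from the init symbol
}
----------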

> Initializer functions have the drawback that backends can no longer choose different strategies for -Os or -O2. All the other benefits you mention (=void holes, padding shenanigans, or non-zero-but-repetitive-constant double[1million] arrays, ...) can also be handled properly by the backend in the initializer-symbol case if the initializer expression is available to the backend. And you have to ensure that the initialization function can always be inlined, so without -O flags it may also lead to suboptimal code...

Backends can also turn an initializer function into a memcpy function.
It's perfectly fine if code is suboptimal without -O.
You can simply express more with a function than with a symbol (a symbol implies the function "memcpy(all)", whereas a function could do that and more).
How would you express =void using a symbol in an object file?
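For illustration, a sketch (made-up type and function) of what a function can express that a flat data symbol cannot:

----------
struct Sparse
{
    int          tag     = 7;
    double[1024] scratch = void;   // deliberately uninitialized
    int          crc     = 0;
}

// An init symbol is just Sparse.sizeof bytes of data: the =void hole still
// occupies space and gets copied. An init function can encode the intent
// directly and skip the hole:
void initSparse(ref Sparse s)
{
    s.tag = 7;
    s.crc = 0;
    // s.scratch intentionally left untouched
}
----------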

> If the initializer optimizations depend on -O flags, it should also be possible to move the necessary backend steps into a separate pass which is executed even without optimization flags. Choosing to initialize using expressions vs. a symbol should not be an expensive step.

Actually, this does sound like an expensive analysis to me (e.g. detecting the case of a large array with repetitive initialization inside a struct with a few other members). But maybe more practically, is it possible to enable/disable specific optimization passes for individual functions with the GCC backend at -O0? (We can't with LLVM.)

> I don't see how an initializer function would be more flexible than that. In fact, you could generate the initializer function in the backend if information about the initialization expression is always preserved. Constructing an initializer function earlier (in the frontend, or D user code) removes information about the target architecture (-Os, memory available, efficient addressing of local constant data, ...). Because of that, I think the backend is the best place to implement this and the frontend should just provide the symbol initializer expression.

I'm a little confused, because your last sentence is exactly what we currently do, with the terminology: frontend = DMD code that outputs a semantically analyzed AST; backend = DMD/GCC/LLVM codegen, possibly with a "glue layer" intermediate representation in between.
What I thought was being discussed in this thread is that we move the complexity out of the compilers (so out of the current backends) into druntime. For that, I think an initializer function is a good solution (similar to emitting a constructor function, rather than implementing that codegen inside the backend).

-Johan

August 06, 2020
On Wed, 05 Aug 2020 22:19:11 +0000, Johan wrote:


>> But initializer symbols are currently not in COMDAT, or does LDC implement that? That's a crucial point, as it addresses Andrei's initializer bloat point. And it also means you can avoid emitting the symbol if it's never referenced. But if it is referenced, it will be available.
> 
> It does not matter whether the initializer symbol is in COMDAT, because (currently) it has to be dynamically accessible (e.g. by a user of a compiled library, or by druntime GC object destroy code), and thus it cannot be determined at compile/link time whether it is referenced.

You're right, I forgot for a second that right now, the initializer symbol has to be accessible. So obviously making it COMDAT now is not possible; however, I think Andrei wanted to make most of that optional with the TypeInfo changes.

Regarding "e.g. by a user of a compiled library": That is exactly my point when I said the initializer _expression_ must always be available to the compiler, even for such precompiled libraries. And whenever an initializer is accessed in some code unit, the symbol should be generated and put into comdat.

This way, there can be exactly 0 or 1 instances of the initializer symbol, pay-as-you-go depending on whether it's used.
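As a sketch of what I mean by pay-as-you-go (made-up type and file split):

----------
// common.d
struct S { int a = 1; int b = 2; }   // non-zero init, so real data is needed

// unit1.d:  S makeA() { return S.init; }   // may require S's init symbol
// unit2.d:  S makeB() { return S.init; }   // may require it again
//
// With the init symbol in COMDAT, the linker folds any duplicate copies
// into one; if no compilation unit ever references it, no copy is emitted
// at all.
----------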

> 
>> Initializer functions have the drawback that backends can no longer choose different strategies for -Os or -O2. All the other benefits you mention (=void holes, padding shenanigans, or non-zero-but-repetitive-constant double[1million] arrays, ...) can also be handled properly by the backend in the initializer-symbol case if the initializer expression is available to the backend. And you have to ensure that the initialization function can always be inlined, so without -O flags it may also lead to suboptimal code...
> 
> Backends can also turn an initializer function into a memcpy function.

Yes, but as there's no symbol with a global name, the compiler has to place the data locally (as a local symbol or in code). Inline that function into two code units and you have unnecessarily duplicated initializer data.

Interestingly, I can't even get GCC to convert an initializer function into
a symbol: https://godbolt.org/z/b61fcs
The same problem exists for inlining though: it will lead to lots of
duplication bloat. So when using initializer functions, inlining should
probably not be forced, and there needs to be a global function symbol as
a fallback. OTOH we want the inliner to be able to actually inline
initializer functions in any case...


> It's perfectly fine if code is suboptimal without -O.
> You can simply express more with a function than with a symbol (a symbol
> implies the function "memcpy(all)", whereas a function could do that and
> more).

That's why I'm not talking about only a symbol; I'm talking about the symbol backed by an initializer expression. The initializer expression (StructInitializer / ExpInitializer) is essentially the code representation of the initializer, as complex or compact as it may be. But the symbol fallback (SymbolExp?) can be useful in some cases.

> How would you express =void using a symbol in an object file?

Obviously there has to be some data there: 0, random, whatever. But again, I don't want to require the symbols; I only want to have them as a fallback when needed.

Maybe I don't really understand the problem. Consider this code: https://explore.dgnu.org/z/_yixUX
----------
struct Large
{
    ubyte a = 42;
    size_t[64] blob = void;
    ubyte b = 10;
}

void foo()
{
    Large l;
}
----------

Because of the byte-by-byte struct comparison, the blob memory actually
has to be initialized to 0. Nevertheless, you can see that at -O0 the
backend does not reference the symbol and instead explicitly emits:
mov     BYTE PTR [rbp-528], 42
So it does not only see "the symbol"; it sees the individual field
initializers. If byte-by-byte comparison weren't a requirement, the
backend (GCC) could perfectly well initialize only a and b.

Now move struct Large into a different file: you'll see that GCC now "only sees the symbol" and copies from "_D1s5Large6__initZ".
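For reference, the split looks roughly like this (file names assumed):

----------
// s.d
module s;
struct Large
{
    ubyte      a    = 42;
    size_t[64] blob = void;
    ubyte      b    = 10;
}

// app.d
module app;
import s;
void foo()
{
    Large l;   // codegen in this unit only sees _D1s5Large6__initZ and
               // copies from it, instead of emitting per-field stores
}
----------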


I see two problems with this:
* We do not get the symbol-less initializer form when using multiple files.
  That's why I think the frontend should build the initializer expression
  (StructInitializer), which provides expressions to initialize all fields,
  even for aggregates in non-root modules.
* We always emit the initializer symbol and pay for the overhead
  ==> COMDAT.


Apart from that, there is also a GDC "bug" which seems to always emit the symbol-less initializer, if possible. It would be preferable to let the backend (GCC) choose which one to use, and according to some C++ experiments, that is possible. But it probably needs -O to choose the best solution.

> 
>> If the initializer optimizations depend on -O flags, it should also be possible to move the necessary backend steps into a separate pass which is executed even without optimization flags. Choosing to initialize using expressions vs. a symbol should not be an expensive step.
> 
> Actually, this does sound like an expensive analysis to me (e.g. detecting the case of a large array with repetitive initialization inside a struct with a few other members). But maybe more practically, is it possible to enable/disable specific optimization passes for individual functions with the GCC backend at -O0? (We can't with LLVM.)

Of course it depends on how far you go. Checking how much actual initialization data there is vs. =void and alignment holes is simple. Detecting a repetitive pattern like [1, 2, 3, 1, 2, 3, 1, 2, 3] would be quite difficult. But how is that different when done in the frontend?

However, I'm not arguing at all that we should just pass a flat data buffer to the glue code and let the glue code figure out how to reconstruct initialization code from that. I'm suggesting that we always pass both the COMDAT symbol and the initialization expression to the backend:

For GCC, we can simply pass any expression (I'm not sure if it has to be constant, i.e. computable at compile time) in the GCC GENERIC backend language to DECL_INITIAL for a variable. So if the initializer in D was this:
-------
struct Foo
{
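    // repeat() stands in for some CTFE-evaluable helper producing a
    // repetitive pattern; the exact function doesn't matter here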
    int[64] data = repeat(1, 3, 64);
}
-------

in theory, we should be able to just pass the initializer code in its GENERIC form to DECL_INITIAL. The GCC backend could then just generate the code for initialization.

So this then essentially is an initializer function, but of a more GCC-readable kind. In some cases (initialization of a global variable, maybe others) GCC would probably have to evaluate that code at compile time to obtain the data representation. That might be difficult, so maybe we have to handle this in the glue code and pass a complex expression/code-based initializer in places where we can execute code, but a data-based initializer where that's not possible.
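In D terms, the two situations I mean are roughly (made-up type):

----------
struct Blob
{
    int[64] data = 5;   // repetitive, non-zero initializer
}

Blob g;        // module-level variable: no code runs here, so the
               // initializer has to be materialized as data

void f()
{
    Blob l;    // local: the initializer code can simply run here,
               // e.g. as  l.data[] = 5;
}
----------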

Ideally, we pass both options to GCC and let GCC choose. The GCC backend code could be as simple as:

if (decl.initializer.isSymbol() &&
    decl.initializer.symbol.hasInitializerExpression())
    // TODO: When to use expr vs. symbol?
    initializer = decl.initializer.symbol.initializerExpression;

> 
>> I don't see how an initializer function would be more flexible than that. In fact, you could generate the initializer function in the backend if information about the initialization expression is always preserved. Constructing an initializer function earlier (in the frontend, or D user code) removes information about the target architecture (-Os, memory available, efficient addressing of local constant data, ...). Because of that, I think the backend is the best place to implement this and the frontend should just provide the symbol initializer expression.
> 
> I'm a little confused, because your last sentence is exactly what we currently do, with the terminology: frontend = DMD code that outputs a semantically analyzed AST; backend = DMD/GCC/LLVM codegen, possibly with a "glue layer" intermediate representation in between.

When I said backend there, I meant the architecture-dependent GCC backend, not the glue layer.

> What I thought was being discussed in this thread is that we move the complexity out of the compilers (so out of the current backends) into druntime. For that, I think an initializer function is a good solution (similar to emitting a constructor function, rather than implementing that codegen inside the backend).

But how is an initializer function, from the backend's point of view, different from a tree of StructInitializer / ExpInitializer? That tree is a 1:1 representation of the default initializer as written by the user. If you were to write an initializer function, wouldn't you just wrap that initializer tree in a statement and put it into a function?

But the backend would still have to do exactly the same code transformation, with the main difference that it now has to generate a function, inline that function, and it has less information about it (e.g. an initializer tree can be evaluated at compile time / is const in GCC terms, whereas a function may not necessarily be: side effects, ...).

So it seems to me, just passing the initializer tree from frontend to glue layer is the most information-preserving solution.





Reflecting on this some more, I guess I finally understand your point
about using a function. To summarize my points:

1. We do not get the expression initializer form when using multiple
   files. That's why I think the frontend should build the initializer
   expression (StructInitializer), which provides expressions to
   initialize all fields, even for aggregates in non-root modules.
2. We always emit the initializer symbol and pay for the overhead
   ==> COMDAT.
3. One thing I didn't consider so far: CTFE constant folding of
   expressions in the expression-based initializer. I guess that can
   destroy interesting information for the glue layer. So here we really
   want two things: a code-based initializer expression, which never does
   CTFE constant folding, and a folded/evaluated expression to initialize
   global variables (see the sketch below).
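A small sketch of point 3 (made-up helper):

----------
int[4] fill(int x) { int[4] r; r[] = x; return r; }

struct S
{
    int[4] a = fill(7);
}

// Folded/evaluated form, needed e.g. for globals of type S:
//     a = [7, 7, 7, 7]
// Non-CTFEd code form, which keeps the structure visible to the glue layer:
//     a = fill(7)   (or lowered to  a[] = 7;)
----------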

So I guess if we decide we never need the symbol and drop point 2, then point 3, a "non-CTFEd initializer expression", is probably pretty close to what you wanted as an initializer function. I just didn't think of it as a function...

OTOH, my point about using a symbol to unify initializer storage used in multiple invocations across code units would also apply to expression-based initializers: having a function there would actually allow saving space in some cases compared to always inlining the expression. So maybe a COMDAT, usually-inlined but optionally available function (e.g. for -Os) is a good idea...

I'm not sure if the GCC backend can handle an initializer function (with known body) as well as a DECL_INITIAL in non-optimizing cases though. Maybe this needs some backend engineering in GCC (DECL_FUNC(DECL_INITIAL(x) = ...))?
-- 
Johannes
August 06, 2020
Hi Johannes,
  Can you rewrite your email without all the GDC implementation details? Let's keep the discussion backend-agnostic.

The questions to solve are:
Q1 - What do we expose to the user? (an init symbol, an init function, typeid pointer to symbol/function for dynamic types... ?) User code should be able to reset an object to the init state. Currently user code can do that without compile-time knowledge of the dynamic type of an object.
Q2 - Do we want to take care of initialization in druntime or inside the compilers? (currently it is done inside the compilers, and each backend does things its own way as long as it complies with the answer to Q1. Array comparison was moved from the compilers into druntime. It's the same kind of discussion.).

At the moment, we only provide user code dynamic access to the initializer symbol through typeid.initializer. The idea in this thread was to add a way to have 'static' access that preserves type information (e.g. doing initialization by calling a druntime template function with the type as a template parameter).
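For example, a sketch of the kind of druntime template meant here (name and details are just an illustration, not a concrete API proposal):

----------
// 'Static' access: the type is a template parameter, so the compiler can
// pick the best initialization strategy per type.
void initAs(T)(ref T obj) @trusted
{
    static if (__traits(isZeroInit, T))
    {
        (cast(ubyte*) &obj)[0 .. T.sizeof] = 0;   // no init symbol needed
    }
    else
    {
        // The compiler is free to lower this to field stores, an inlined
        // pattern, or a copy from a COMDAT init symbol.
        T tmp = T.init;
        (cast(ubyte*) &obj)[0 .. T.sizeof] =
            (cast(const(ubyte)*) &tmp)[0 .. T.sizeof];
    }
}
----------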

cheers,
  Johan

August 06, 2020
On Thu, 06 Aug 2020 12:58:22 +0000, Johan wrote:

> Hi Johannes,
>    Can you rewrite your email without all the GDC implementation
> details? Let's keep the discussion backend-agnostic.
> 
> The questions to solve are:
> Q1 - What do we expose to the user? (an init symbol, an init function,
> typeid pointer to symbol/function for dynamic types... ?) User code
> should be able to reset an object to the init state.
> Currently user code can do that without compile-time knowledge of the
> dynamic type of an object.
> Q2 - Do we want to take care of initialization in druntime or inside the
> compilers? (currently it is done inside the compilers,
> and each backend does things its own way as long as it complies with the
> answer to Q1. Array comparison was moved from the compilers into
> druntime. It's the same kind of discussion.).
> 
> At the moment, we only provide user code dynamic access to the initializer symbol through typeid.initializer. The idea in this thread was to add a way to have 'static' access that preserves type information (e.g. doing initialization by calling a druntime template function with the type as a template parameter).
> 
> cheers,
>    Johan

Sorry, I guess that email got much longer than what I initially wanted to write.


In the following, I'll just call "variables with non-statically known type" "dynamic types".

Q1: Only an rvalue? I didn't know anything actually needs to get an initializer for a dynamic type. Where is this used, in the GC? If we really need that, we either need a pointer to a symbol or a function. I guess I'd agree the function is likely a better solution here. Maybe put it in the vtbl then, to get it out of TypeInfo.

I don't mind exposing a function to the user if it's pay-as-you-go, i.e. only emitted on demand. Using it for dynamic types, however, means we'll always need to emit that function. So if it's somehow possible, I'd rather get rid of getting the initializer for dynamic types completely.


Q2: In the compilers. My previous messages were only considering cases where the type is statically known. In that case, I think the compilers can do better than a runtime solution could (e.g. use code-based initializers for small types, remove redundant initialization, emit a single initializer function for large types where the initialization code may get too large (especially if duplicated), -Os vs. -O2, ...).


-- 
Johannes