How about some __initialize magic?

November 27, 2021

Posted by Stanislav Blinov

D lacks syntax for initializing the uninitialized. We can do this:

T stuff = T(args); // or new T(args);

but this?..

T* ptr = allocateForT();
// now what?.. Can't just do *ptr = T(args) - that's an assignment, not initialization!
// is T a struct? A union? A class? An int?.. Is it even a constructor call?..

This is, uh, "solved", using library functions - emplaceInitializer, emplace, copyEmplace, moveEmplace. The fact that there are four functions to do this should already ring a bell, but if one were to look at how e.g. emplace is implemented, there's lots and lots more to it - classes or structs? Constructor or no constructor? Postblit? Copy?.. And all the delegation... A single call to emplace may copy the bits around more than once. Talk about initializing a static array... Or look at emplaceInitializer, which the other three all depend upon: it is, currently, built on a hack just to avoid blowing up the stack (which is, ostensibly, what the previous, less hacky hack led to). The upcoming __traits(initSymbol) would help in removing the hack, but won't help CTFE any. At various points of their lives, these things even explicitly called memcpy, which is just... argh! And some still do (copyEmplace, I'm looking at you). Call into the CRT to blit an 8-byte struct? With statically known size and alignment? Just to sidestep the type system? Eh??? Much fun for copying arrays!
...And still, none of them would work in CTFE for many types, due to various implementation quirks (which include those very calls to memcpy, or reinterpret casts). This one could, potentially, be solved with more barbed wire and swear words, that is, code, but...
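
For reference, a minimal sketch of what the library route looks like today for a trivial struct. Pair and allocatePair are made-up names for illustration only; emplace, copyEmplace and moveEmplace are the core.lifetime functions, and emplace with no arguments stands in for emplaceInitializer here.

```d
import core.lifetime : copyEmplace, emplace, moveEmplace;
import core.stdc.stdlib : free, malloc;

// Hypothetical POD type with non-zero default field values.
struct Pair
{
    int a = 1;
    int b = 2;
}

// Hypothetical allocator returning raw, uninitialized storage.
Pair* allocatePair()
{
    auto p = cast(Pair*) malloc(Pair.sizeof);
    assert(p !is null, "allocation failed");
    return p;
}

void main()
{
    Pair* raw0 = allocatePair();
    Pair* raw1 = allocatePair();
    Pair* raw2 = allocatePair();
    Pair* raw3 = allocatePair();
    scope (exit) { free(raw0); free(raw1); free(raw2); free(raw3); }

    emplace(raw0);               // writes Pair.init
    assert(*raw0 == Pair(1, 2));

    emplace(raw1, 10, 20);       // field-wise initialization, Pair has no constructor
    assert(*raw1 == Pair(10, 20));

    copyEmplace(*raw1, *raw2);   // copy-initialize *raw2 from *raw1
    assert(*raw2 == *raw1);

    moveEmplace(*raw2, *raw3);   // move *raw2 into *raw3
    assert(*raw3 == Pair(10, 20));
}
```

Three differently named entry points for what is conceptually one operation, "initialize this storage", and that is with a type that has no constructors, postblits or destructors at all.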

Thing is, all those functions are re-implementing what the compiler can already do, but in a library. Or rather, come very close to doing that, but still don't really get there. C++ with its library solution does this better!

What if the language specified a "magic" function, called, say, __initialize, that would just do the right thing (tm)? Given an lvalue, it would instruct the compiler to generate code writing the initializer, blitting, copying, or calling the appropriate constructor with the arguments. And most importantly, it would work in CTFE regardless of type, and not require weird dances around T.init, dummy types involving extra argument copies, or manual fieldwise and elementwise blits (which is what one would have to do in order to e.g. make copyEmplace CTFE-able).

I.e:

// Write .init
T* raw0 = allocateForT();
// currently - emplaceInitializer(raw0);
(*raw0).__initialize;

// Initialize fields or call constructor, whichever is applicable for T(arg1, arg2)
T* raw1 = allocateForT();
// currently - raw1.emplace(forward!(arg1, arg2));
(*raw1).__initialize(forward!(arg1, arg2));

// Copy
T* raw2 = allocateForT();
// currently - copyEmplace(*raw1, *raw2);
(*raw2).__initialize(*raw1);

// Move
T* raw3 = allocateForT();
// currently - moveEmplace(*raw2, *raw3);
(*raw3).__initialize(move(*raw2));

// Could be called at runtime or during CTFE
auto createArray()
{
   // big array, don't initialize
   const(T)[1000] result = void;
   // exception handling omitted for brevity
   foreach (i, ref it; result)
   {
       // currently - `emplace`, which may fail to compile in CTFE
       it.__initialize(createIthElement(i));
   }
   return result;
}

// CTFE use case:
static auto array = createArray();
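
As a point of comparison, here is a sketch of the same run-time/CTFE pattern that does work today, but only because the element type is a plain int that can be default-initialized and then assigned. makeSquares is a made-up helper. With const(T) elements, as in createArray above, plain assignment is not available at all.

```d
// Hypothetical helper: builds a small table, usable at run time and in CTFE.
int[8] makeSquares()
{
    int[8] result; // default-initialized; plain assignment is fine for int
    foreach (i, ref it; result)
        it = cast(int) (i * i);
    return result;
}

// A static initializer forces compile-time evaluation of makeSquares.
static immutable int[8] squares = makeSquares();
static assert(squares[3] == 9);

void main()
{
    // The same function called at run time gives the same result.
    int[8] atRuntime = makeSquares();
    assert(atRuntime == squares);
}
```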

The wins are obvious - unified syntax, better error messages, CTFE support, less library voodoo failing at mimicking the compiler. The losses? I don't see any.

Note that I am not talking about yet another library function. This would not be a symbol in druntime, this would be compiler magic. Having that, emplaceInitializer, emplace and copyEmplace could be re-implemented in terms of __initialize, and eventually deprecated and removed. moveEmplace could linger until DIP1040 is implemented, tried, and proven. The move example, verbatim, would be pessimized compared to moveEmplace due to moving twice, which hopefully DIP1040 could solve.

I'm a bit hesitant to suggest how this should interact with @safe. On one hand, the established precedent is in emplace - it infers, and I'm leaning towards that, even though it can potentially invalidate existing state. On the other hand, because it can indeed invalidate existing state, it should be @system. But then it would require some additional facility just for inference, so it could be called @trusted correctly, otherwise it'd be useless. And that facility, whatever it is, better not be another library reincarnation of all required semantics. For example, something like a __traits(isSafeToInitWith, T, args). Whichever the approach, it should definitely infer all other attributes.

There are undoubtedly other things to consider. For example - classes. It would seem prudent for this hypothetical __initialize to be calling class ctors. On the other hand, a reference itself is just a POD, and generic code might indeed want to write null as opposed to attempting to call a default constructor. Then again, generic code still would have to specialize for classes... Thoughts welcome.
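
To make the class question concrete, here is a small sketch of how the existing emplace overloads already split the two meanings. Node is a made-up class; the malloc-backed buffer is just one convenient way to get suitably aligned storage.

```d
import core.lifetime : emplace;
import core.stdc.stdlib : free, malloc;

// Hypothetical class used only for illustration.
class Node
{
    int id;
    this(int id) { this.id = id; }
}

void main()
{
    // 1. Treating the reference as plain data: emplace writes null into it.
    Node slot = void;
    emplace(&slot);
    assert(slot is null);

    // 2. Constructing the object itself in caller-provided storage.
    enum size = __traits(classInstanceSize, Node);
    void* mem = malloc(size);
    assert(mem !is null, "allocation failed");
    scope (exit) free(mem);

    Node node = emplace!Node(mem[0 .. size], 5); // runs Node's constructor
    assert(node.id == 5);
}
```

Which of the two a single __initialize should pick for a class-typed lvalue is exactly the open question.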

What do you think? DIP this, yay or nay? Suggestions?..

November 28, 2021

Re: How about some __initialize magic?

Posted by kinke
in reply to Stanislav Blinov

On Saturday, 27 November 2021 at 21:56:05 UTC, Stanislav Blinov wrote:

> [...]
> Upcoming __traits(initSymbol) would help in removing the hack,

It's already removed in master.

> but won't help CTFE any. At various points of their lives, these things even explicitly called memcpy, which is just... argh! And some still do (copyEmplace, I'm looking at you). Call into CRT to blit a 8-byte struct? With statically known size and alignment? Just to sidestep type system? Eh???

Most optimizers recognize a memcpy call and its semantics, and try to avoid the lib call accordingly.
A slice copy (target[] = source[] with e.g. void[]-typed slices) is a memcpy with additional potential checks for matching length and no overlap (with enabled bounds checks IIRC), so memcpy avoids that overhead. It also works with -betterC; e.g., the aforementioned checks are implemented as a druntime helper function for LDC and so not available with -betterC.
I haven't checked, but if memcpy is the only real CTFE blocker for emplace at the moment, I guess one option would be extending the CTFE interpreter with a memcpy builtin, in order not to have to further uglify the existing library code.
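
A rough sketch of the two blitting routes being compared. Block and the two helper names are made up for illustration, and ubyte[] slices are used here rather than void[] ones to keep the example simple.

```d
import core.stdc.string : memcpy;

// Hypothetical fixed-size payload; both helpers produce the same bytes.
alias Block = ubyte[8];

Block copyViaSlice(ref const Block source)
{
    Block target;
    target[] = source[]; // array copy: lengths must match, ranges must not overlap
    return target;
}

Block copyViaMemcpy(ref const Block source)
{
    Block target;
    memcpy(target.ptr, source.ptr, Block.sizeof); // raw C runtime call, no checks of its own
    return target;
}

void main()
{
    Block source = [1, 2, 3, 4, 5, 6, 7, 8];
    assert(copyViaSlice(source) == source);
    assert(copyViaMemcpy(source) == source);
}
```

Whether either form ends up as an actual library call in the generated code depends on the compiler and optimization level.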

> What do you think? DIP this, yay or nay? Suggestions?..

I'm not convinced, I'm afraid. :) I've been thinking in the other direction, treating core.lifetime.{move,forward} as builtins for codegen (possibly restricted to function call argument expressions), in order to save work for the optimizer and reduce bloat in debug builds.

November 28, 2021

Re: How about some __initialize magic?

Posted by russhy
in reply to kinke

I would love to be able to do:


T* t = alloc();

(*t) = .{};

// or better
t.* = .{};

// then we could also go ahead and be able to do like:
t.* = .{ field_a: 1, field_b: 2 }

Basically relaxing that rule: https://dlang.org/spec/struct.html#static_struct_init

Other languages do that, and I love them

Don't let us stay behind because we refuse to move forward!

November 28, 2021

Re: How about some __initialize magic?

Posted by Stanislav Blinov
in reply to kinke

On Sunday, 28 November 2021 at 02:15:37 UTC, kinke wrote:

> On Saturday, 27 November 2021 at 21:56:05 UTC, Stanislav Blinov wrote:
>
>> [...]
>> Upcoming __traits(initSymbol) would help in removing the hack,
>
> It's already removed in master.

Cool!

> Most optimizers recognize a memcpy call and its semantics, and try to avoid the lib call accordingly.

I'd rather not leave this to "try". Not only because it's work that needn't be done, but also for debug performance. Exactly the stuff you talk about at the end of your post :D

> A slice copy (target[] = source[] with e.g. void[]-typed slices) is a memcpy with additional potential checks for matching length and no overlap (with enabled bounds checks IIRC), so memcpy avoids that overhead. It also works with -betterC; e.g., the aforementioned checks are implemented as a druntime helper function for LDC and so not available with -betterC.

Slice copies aren't needed :) Nor would they work in CTFE, as that requires reinterpret-casting a T to a slice.

> I haven't checked, but if memcpy is the only real CTFE blocker for emplace at the moment, I guess one option would be extending the CTFE interpreter by a memcpy builtin, in order not to have to further uglify the existing library code.

emplace is also deficient:

https://github.com/dlang/druntime/blob/2b7873da09c63761fe6e69dc4dd225c0844ed4e9/src/core/internal/lifetime.d#L31-L59

Also note that that's already one call down from emplace, and potentially could move the bits or copy the argument(s) again (to call the fake struct ctor), and then, of course, again, in implementation of that fake ctor. Same goes for the actual non-fake struct __ctor version. Initializing large structs or those having expensive copy ctors is no fun. -O build may help with some of that, of course, but again I'd rather this didn't need to be in the first place.

emplaceInitializer also may not work in all cases. Current one would fail on that mangling business, upcoming one - because __traits(initSymbol) gives you a void[], meaning a reinterpret cast is needed somewhere, meaning no dice for CTFE. And that means none of these guys would work when initializer is required, since everyone in the emplace family is dependent on emplaceInitializer. So CTFE-able implementation would be back to union fun. Except, of course, for classes, which is... questionable.

Making mem* functions available to CTFE would be a big improvement for sure, but it only solves half the problem (the other being reinterpret casts).

emplace in CTFE should fail for one reason only - if the ctor is not CTFE-able (i.e. that's caller's responsibility). So far, it may fail for reasons that are down to language plumbing :(

> What do you think? DIP this, yay or nay? Suggestions?..

A compiler extension? Wouldn't that require semantics to be the same? Surely you wouldn't want to artificially limit their implementation in compiler just because library versions are deficient?

I mean, I'm not against this idea, but AFAIUI that route mandates we make library versions more robust. Then again, why have four builtins where one can suffice? ;)

November 28, 2021

Re: How about some __initialize magic?

Posted by Stanislav Blinov
in reply to russhy

On Sunday, 28 November 2021 at 03:19:49 UTC, russhy wrote:

> I would love to be able to do:

This is orthogonal to this discussion. Even if concise initializer syntax that you suggest was allowed...

> T* t = alloc();
>
> (*t) = .{};

...that's an assignment. I.e. that would lower down to uninitializedGarbage.opAssign(T.init);. Destructing garbage and/or calling operators on garbage isn't exactly the way to success :)

Which is the crux of the problem in question, and why things like emplace exist in the first place.
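
A small sketch of that difference, using a made-up Tracked struct whose opAssign counts how often it runs. The assignment here is deliberately performed on an already constructed object, so the example itself stays well-defined; on raw memory the same opAssign call would be operating on garbage.

```d
import core.lifetime : emplace;
import core.stdc.stdlib : free, malloc;

// Hypothetical type that records every call to its assignment operator.
struct Tracked
{
    int value;
    static int assignments; // thread-local counter

    this(int v) { value = v; }

    void opAssign(Tracked rhs)
    {
        ++assignments;
        value = rhs.value;
    }
}

void main()
{
    auto p = cast(Tracked*) malloc(Tracked.sizeof);
    assert(p !is null, "allocation failed");
    scope (exit) free(p);

    // Initialization: constructs in place, opAssign is never involved.
    emplace(p, 42);
    assert(p.value == 42);
    assert(Tracked.assignments == 0);

    // Assignment: goes through opAssign on the existing object.
    *p = Tracked(7);
    assert(p.value == 7);
    assert(Tracked.assignments == 1);
}
```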

November 28, 2021

Re: How about some __initialize magic?

Posted by russhy
in reply to Stanislav Blinov

On Sunday, 28 November 2021 at 08:54:39 UTC, Stanislav Blinov wrote:

> On Sunday, 28 November 2021 at 03:19:49 UTC, russhy wrote:
>
>> I would love to be able to do:
>
> This is orthogonal to this discussion. Even if concise initializer syntax that you suggest was allowed...
>
>> T* t = alloc();
>>
>> (*t) = .{};
>
> ...that's an assignment. I.e. that would lower down to uninitializedGarbage.opAssign(T.init);. Destructing garbage and/or calling operators on garbage isn't exactly the way to success :)
>
> Which is the crux of the problem in question, and why things like emplace exist in the first place.

this is the exact same issue

this is exactly why i mentioned it

emplace is a library, it doesn't solve anything

it solves people's addiction to "import" things

if you tell people they need to import a package to do initialization, then the language is a failure

.{} wins over __initialize

there needs to be a movement to stop making syntax such a pain to write, and make things overall consistent

It's the same with enums

MyEnumDoingThings myEnumThatINeed = MyEnumDoingThings.SOMETHING_IS_NOT_RIGHT;

And now you want the same for everything else

(*raw1).__initialize(forward!(arg1, arg2));

more typing! templates!! more long lines!!! more slowness!!!!

November 28, 2021

Re: How about some __initialize magic?

Posted by russhy
in reply to Stanislav Blinov

On Sunday, 28 November 2021 at 08:54:39 UTC, Stanislav Blinov wrote:

> This is orthogonal to this discussion. Even if concise initializer syntax that you suggest was allowed

let's improve it then, let's play more with it

instead of introducing new functions/templates

i feel like this is the perfect place to have such improvements take place

November 28, 2021

Re: How about some __initialize magic?

Posted by Stanislav Blinov
in reply to russhy

On Sunday, 28 November 2021 at 16:36:05 UTC, russhy wrote:

> this is the exact same issue

No, it isn't.

> It's the same with enums

No, it isn't.

> And now you want the same for everything else

No, I don't.

> (*raw1).__initialize(forward!(arg1, arg2));
>
> more typing! templates!! more long lines!!! more slowness!!!!

Way off mark here.

>> This is orthogonal to this discussion. Even if concise initializer syntax that you suggest was allowed
>
> let's improve it then, let's play more with it
> instead of introducing new functions/templates
> i feel like this is the perfect place to have such improvements take place

This topic has nothing to do with what you're talking about.

November 28, 2021

Re: How about some __initialize magic?

Posted by russhy
in reply to Stanislav Blinov

On Sunday, 28 November 2021 at 19:30:11 UTC, Stanislav Blinov wrote:

> This topic has nothing to do with what you're talking about.

It does, you just don't understand what "we could improve it" means: relaxing its rules, and reusing the syntax for doing what you ask for

November 29, 2021

Re: How about some __initialize magic?

Posted by Stanislav Blinov
in reply to russhy

On Sunday, 28 November 2021 at 22:00:05 UTC, russhy wrote:

> On Sunday, 28 November 2021 at 19:30:11 UTC, Stanislav Blinov wrote:
>
>> This topic has nothing to do with what you're talking about.
>
> It does, you just don't understand what "we could improve it" mean; relaxing its rules, and reusing the syntax for doing what you ask for

Oh I have no doubt that there is indeed some lack of understanding here. So I'm going to try one last time. The problem in question lies in the assignment operator, not whatever's on the right hand side of it. It's absolutely irrelevant here how you spell the initializer.

First please understand the difference between initialization and assignment. Then read up on https://dlang.org/spec/declaration.html#void_init and then try to understand that assigning to uninitialized structs that have an explicit or implicit opAssign defined would involve using uninitialized values, which may lead to UB. And that is just one of the problems that existing library solutions address. The rest is spelled out in the first post.

Have fun with this little program:

import std.stdio;

void main() {
    File file = void;
    file = File.init; // File.init, .{}, BANANAS - doesn't matter, it's UB
}

So once again, if you want to discuss initializer syntax, feel free to create a topic for that as that is not what's in question here.
