IMO we need to make core.lifetime.{move,forward} compiler intrinsics, to enable further optimizations that aren't possible with a library solution.
Move
- semantics: move an lvalue to a new rvalue, at a new memory address, 'hijacking' the lvalue resources; the lvalue is reset to T.init (blit, not assignment!) afterwards
- will be complete with move ctor; syntax needs to be decided, but signature is (ref T) (yes, must be an explicit ref)
  - allows to opt out of the default blit (memcpy struct payload), e.g., to fix up interior pointers
  - move ctor interop with C++ should be doable (just getting the extern(C++) mangle right)
  - problem: handle/avoid all compiler-implicit moves/blits (would have to call move ctor and dtor now; emplace FTW!)
- would be nice as intrinsic:
  - not to have to import core.lifetime everywhere and end up with complicated template bloat for a basically trivial operation
  - potential optimization: elide lvalue reset to T.init and its destruction iff:
    - it is a local (can skip destruction)
    - and not used after the move
    - and the destruction of T.init is a noop (modulo mods to the struct's own payload), so its elision not observable
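To make the move semantics above concrete, here is a minimal sketch using today's library core.lifetime.move. The Handle struct, its released counter and the helper sourceIsResetAfterMove are made up for illustration. Note that the library version only resets the source when the type has a destructor or postblit, which is why Handle defines one.

```d
import core.lifetime : move;

// Hypothetical move-only type, for illustration only.
struct Handle
{
    int* p;                // stands in for an owned resource
    static int released;   // counts destructor runs that actually owned something

    @disable this(this);   // move-only: accidental copies fail to compile

    ~this()
    {
        if (p !is null)
            ++released;
    }
}

// Returns true if the moved-from lvalue ends up in its T.init state.
bool sourceIsResetAfterMove()
{
    auto a = Handle(new int(42));
    auto b = move(a);      // b hijacks a's payload
    return a.p is null && *b.p == 42;
}

void main()
{
    assert(sourceIsResetAfterMove());
    // a (reset to Handle.init) and b were both destructed, but only b owned the int
    assert(Handle.released == 1);
}
```

The destructor still runs for the reset a here; it just has nothing to release. That reset plus no-op destruction is exactly what the proposed intrinsic could elide for a local that isn't used after the move.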
When move isn't sufficient: perfect forwarding
forward must become an intrinsic:
- for vars with ref storage class: as-is, yields the original lvalue
- non-ref lvalues (NEW semantics): 're-interpret as rvalue' - no move, and accordingly no destruction after forwarding (because the rvalue will already be destructed earlier)
  - only valid for locals (incl. params), the destruction of other lvalues cannot be skipped
  - invalid/undefined to access the original lvalue after forwarding it (has been destructed already)
  - probably only valid:
    - as function call argument expressions (glue layer needs to treat it like a frontend-generated temporary, passing it directly by ref)
    - as assignment right-hand-sides, for move-assign (dst = forward!src; => dst.opAssign(forward!src);)
    - as return expressions, for move-constructions (but prefer NRVO if possible, for direct emplace)
- probably needs to keep template syntax (forward!x, not forward(x)) for backwards compatibility with druntime template
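For reference, this is roughly how the existing library forward behaves on an assignment right-hand side. It is a small sketch with made-up types X and Y, and it shows today's druntime behavior, not the proposed intrinsic.

```d
import core.lifetime : forward;

// Hypothetical types, for illustration only.
struct X
{
    int copies;
    this(this) { ++copies; }   // postblit counts copies
}

struct Y
{
    X x_;
    // forward used as the right-hand side of an assignment
    this()(auto ref X x) { x_ = forward!x; }
}

void main()
{
    X x;
    auto fromLvalue = Y(x);    // x is bound by ref: forward yields the lvalue, so x_ is a copy
    assert(x.copies == 0);
    assert(fromLvalue.x_.copies == 1);

    auto fromRvalue = Y(X());  // non-ref parameter: forward moves it, no copy
    assert(fromRvalue.x_.copies == 0);
}
```

In the rvalue case the library still has to move the parameter into a fresh rvalue. The proposed semantics would reinterpret the parameter itself as the rvalue instead.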
Let's take a look at an example:
import core.stdc.stdio;
import core.lifetime;

struct S {
    int x;
    this(int x) {
        this.x = x;
        printf("ctor: %p\n", &this);
    }
    this(this) {
        printf("copy: %p\n", &this);
    }
    ~this() {
        printf("dtor: %p\n", &this);
    }
}

void main() {
    {
        auto lval = S(1);
        printf("lval: %p\n", &lval);
        const r = bar1(lval);
        printf(" r: %p\n", &r);
    }
    {
        printf("\nrvalue:\n");
        const r = bar1(S(2));
        printf(" r: %p\n", &r);
    }
}

S bar1()(auto ref S s) {
    printf("bar1: %p\n", &s);
    return bar2(forward!s);
}

S bar2()(auto ref S s) {
    printf("bar2: %p\n", &s);
    return bar3(forward!s);
}

S bar3()(auto ref S s) {
    printf("bar3: %p\n", &s);
    return bar4(forward!s);
}

S bar4()(auto ref S s) {
    printf("bar4: %p, got a ref: %d\n", &s, __traits(isRef, s));
    return s; // copy parameter lvalue to return value
}
Output with DMD (and GDC), no backend optimizations:
ctor: 0x7ffebea26460
lval: 0x7ffebea26460
bar1: 0x7ffebea26460
bar2: 0x7ffebea26460
bar3: 0x7ffebea26460
bar4: 0x7ffebea26460, got a ref: 1
copy: 0x7ffebea263d0
r: 0x7ffebea26464
dtor: 0x7ffebea26464
dtor: 0x7ffebea26460
rvalue:
ctor: 0x7ffebea2647c
bar1: 0x7ffebea26488
bar2: 0x7ffebea26424
bar3: 0x7ffebea263e4
bar4: 0x7ffebea263a4, got a ref: 0
copy: 0x7ffebea26358
dtor: 0x7ffebea263a4
dtor: 0x7ffebea263e4
dtor: 0x7ffebea26424
dtor: 0x7ffebea26488
r: 0x7ffebea26478
dtor: 0x7ffebea26478
What we see is that the current core.lifetime.forward propagates the ref-ness of the parameter, but has to core.lifetime.move it in the non-ref case, creating 3 explicit moves + destructions.
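The extra destructions can also be counted without reading addresses. Below is a reduced sketch of the same trampoline pattern with a shorter chain. Counted, leaf, hop1, hop2 and dtorsFor are hypothetical names, and copies are disabled so that only moves remain.

```d
import core.lifetime : forward;
import core.stdc.stdio : printf;

// Hypothetical type and functions, for illustration only.
struct Counted
{
    int x;
    static int dtors;      // total destructor calls
    @disable this(this);   // rule out copies, so only moves remain
    ~this() { ++dtors; }
}

int leaf()(auto ref Counted c) { return c.x; }
int hop2()(auto ref Counted c) { return leaf(forward!c); }
int hop1()(auto ref Counted c) { return hop2(forward!c); }

// Number of destructor calls caused by one trip through the chain.
int dtorsFor(bool passRvalue)
{
    Counted.dtors = 0;
    if (passRvalue)
    {
        const x = hop1(Counted(7));   // every hop gets its own by-value parameter
        assert(x == 7);
    }
    else
    {
        auto lv = Counted(7);
        const x = hop1(lv);           // by ref all the way down
        assert(x == 7);
    }   // lv is destructed here
    return Counted.dtors;
}

void main()
{
    const viaRvalue = dtorsFor(true);
    const viaLvalue = dtorsFor(false);
    printf("dtor calls: rvalue chain %d, lvalue chain %d\n", viaRvalue, viaLvalue);
    assert(viaRvalue > viaLvalue);
}
```

In the rvalue chain each hop destructs its own moved-from parameter, while the lvalue chain only destructs the original once. The proposed forward semantics would remove that difference.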
We also see that there are compiler-implicit moves ('optimized', i.e., no reset+destruction of the moved-from value):
- when passing the S(2) rvalue to bar1 (not sure why, seems like a bug)
  - note the different addresses of ctor and bar1
- for the return values
  - the addresses of copy and r diverge (constructed @ 0x7ffebea26358, destructed @ 0x7ffebea26478)
With LDC, we at least already get perfectly forwarded return values (the addresses of copy and r are identical):
ctor: 0x7ffda922edbc
lval: 0x7ffda922edbc
bar1: 0x7ffda922edbc
bar2: 0x7ffda922edbc
bar3: 0x7ffda922edbc
bar4: 0x7ffda922edbc, got a ref: 1
copy: 0x7ffda922edb8
r: 0x7ffda922edb8
dtor: 0x7ffda922edb8
dtor: 0x7ffda922edbc
rvalue:
ctor: 0x7ffda922eda0
bar1: 0x7ffda922ed6c
bar2: 0x7ffda922ed1c
bar3: 0x7ffda922eccc
bar4: 0x7ffda922ecc8, got a ref: 0
copy: 0x7ffda922eda4
dtor: 0x7ffda922ecc8
dtor: 0x7ffda922eccc
dtor: 0x7ffda922ed1c
dtor: 0x7ffda922ed6c
r: 0x7ffda922eda4
dtor: 0x7ffda922eda4
The compiler needs to implement RVO (Return Value Optimization, as opposed to Named-RVO!) to enable perfect forwarding of the return values. In this example, r is allocated in main, then its address is passed and forwarded as a hidden pointer all the way to bar4, where it gets copy-constructed.
With the proposed forward semantics, we'd get perfect forwarding of the s parameters too, without the 3 explicit moves and destructions. The S(2) rvalue would be created in main, then passed and forwarded directly by ref all the way to bar4, where it would get destructed when the s param goes out of scope.
Cherry on top: Last-use optimization from DIP 1040
This would make the compiler automatically forward suitable lvalues. In the example, we wouldn't have to use a single explicit forward in the barN trampolines, and the copy-construction of the return value in the non-ref version of bar4 would be optimized to a move-construction (return forward!s).