IMO we need to make `core.lifetime.{move,forward}` compiler intrinsics, to enable further optimizations that aren't possible with a library solution.
Move
- semantics: move an lvalue to a new rvalue at a new memory address, 'hijacking' the lvalue's resources; the lvalue is reset to `T.init` afterwards (blit, not assignment!)
- will only be complete with a move ctor; the syntax needs to be decided, but the signature is `(ref T)` (yes, it must be an explicit `ref`)
  - allows opting out of the default blit (memcpy of the struct payload), e.g., to fix up interior pointers
  - move ctor interop with C++ should be doable (just a matter of getting the `extern(C++)` mangling right)
  - problem: handle/avoid all compiler-implicit moves/blits (they would have to call the move ctor and dtor now; emplace FTW!)
- would be nice as an intrinsic:
  - not having to import `core.lifetime` everywhere and ending up with complicated template bloat for a basically trivial operation
  - potential optimization: elide the lvalue reset to `T.init` and its destruction iff:
    - it is a local (can skip destruction)
    - and it is not used after the move
    - and the destruction of `T.init` is a no-op (modulo modifications of the struct's own payload), so that its elision is not observable
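To make the overhead concrete, here is a small sketch using today's library `core.lifetime.move` (not the proposed intrinsic). Per the druntime docs, `move` resets the source to `.init` if the type has a destructor or postblit, so the moved-from local is still alive and gets destructed at scope exit; that extra destructor call on a `T.init` value is what the optimization above would elide. The second scope shows why the default blit is wrong for a struct with an interior pointer (which the D spec discourages anyway), i.e., why a move ctor hook is wanted. `Tracked`, `SelfRef` and `dtorCalls` are hypothetical names for illustration only.

```d
import core.lifetime : move;

int dtorCalls; // counts destructor runs (module-level, thread-local)

// Hypothetical type with a destructor, so `move` resets the source to .init.
struct Tracked {
    int x;
    ~this() { ++dtorCalls; }
}

// Hypothetical type with an interior pointer; no destructor/postblit,
// so `move` just blits it and leaves the source untouched.
struct SelfRef {
    int value;
    int* self; // meant to point at this instance's own `value`
}

void main() {
    {
        auto a = Tracked(42);
        Tracked b = move(a); // payload blitted into b, a reset to Tracked.init
        assert(b.x == 42);
        assert(a.x == 0);    // a is Tracked.init now, but still alive
    }
    // Both b and the moved-from a got destructed: 2 calls for 1 logical value.
    assert(dtorCalls == 2);

    {
        SelfRef a;
        a.value = 1;
        a.self = &a.value;
        SelfRef b = move(a);        // plain blit, no hook to fix up `self`
        assert(b.self == &a.value); // still points into a ...
        assert(b.self != &b.value); // ... not into b
    }
}
```

With a move ctor taking the source by `ref`, `SelfRef` could re-point `self` at the new address instead of relying on the blit.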
When move isn't sufficient: perfect forwarding
`forward` must become an intrinsic:
- for vars with `ref` storage class: as-is, yields the original lvalue
- for non-ref lvalues (NEW semantics): 're-interpret as rvalue', i.e., no move, and accordingly no destruction after forwarding (because the rvalue will already have been destructed, e.g., by the callee)
  - only valid for locals (incl. params); the destruction of other lvalues cannot be skipped
  - invalid/undefined to access the original lvalue after forwarding it (it has already been destructed)
  - probably only valid:
    - as function call argument expressions (the glue layer needs to treat it like a frontend-generated temporary, passing it directly by ref)
    - as assignment right-hand sides, for move-assignment (`dst = forward!src;` => `dst.opAssign(forward!src);`)
    - as return expressions, for move-construction (but prefer NRVO if possible, for direct emplacement)
- probably needs to keep the template syntax (`forward!x`, not `forward(x)`) for backwards compatibility with the druntime template
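For comparison, a sketch of what today's library `core.lifetime.forward` does: for `ref` arguments it is a plain alias, for non-ref lvalues it `move`s, so the forwarded-from parameter is reset to `.init` and destructed once more when its function returns. The assignment part shows which `opAssign` overload `dst = forward!src` picks (lvalues prefer the `ref` overload). `S`, `sink`, `trampoline`, `assignFrom` and the counters are hypothetical; the asserted destructor count assumes the compiler doesn't destruct any extra temporaries for the `S(2)` argument, consistent with the DMD/LDC outputs in this post.

```d
import core.lifetime : forward;

int dtorCalls;     // counts destructor runs
bool sinkGotRef;   // whether `sink` received its argument by ref
string lastAssign; // which opAssign overload ran last

struct S {
    int x;
    ~this() { ++dtorCalls; }

    // lvalue right-hand side: copy-assignment
    void opAssign(ref S rhs) { x = rhs.x; lastAssign = "copy"; }
    // rvalue right-hand side: move-assignment (rhs is destructed on return)
    void opAssign(S rhs) { x = rhs.x; lastAssign = "move"; }
}

void sink()(auto ref S s) { sinkGotRef = __traits(isRef, s); }

void trampoline()(auto ref S s) {
    sink(forward!s);
    // Non-ref case: forward!s was a move, leaving S.init behind in s,
    // which still gets destructed when this function returns.
    static if (!__traits(isRef, s))
        assert(s.x == 0);
}

void assignFrom()(ref S dst, auto ref S src) {
    dst = forward!src; // i.e., dst.opAssign(forward!src)
}

void main() {
    auto lval = S(1);
    trampoline(lval);      // ref case: forward!s is just an alias
    assert(sinkGotRef && lval.x == 1);

    const before = dtorCalls;
    trampoline(S(2));      // non-ref case: forward!s moves s into sink
    assert(!sinkGotRef);
    assert(dtorCalls - before == 2); // sink's param + trampoline's reset s

    S dst;
    assignFrom(dst, lval); // lvalue source
    assert(lastAssign == "copy");
    assignFrom(dst, S(3)); // rvalue source, moved by forward
    assert(lastAssign == "move" && dst.x == 3);
}
```

Under the proposed semantics, the non-ref case would skip both the move and the second destruction of the reset `s`.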
Let's take a look at an example:
```d
import core.stdc.stdio;
import core.lifetime;

struct S {
    int x;
    this(int x) {
        this.x = x;
        printf("ctor: %p\n", &this);
    }
    this(this) {
        printf("copy: %p\n", &this);
    }
    ~this() {
        printf("dtor: %p\n", &this);
    }
}

void main() {
    {
        auto lval = S(1);
        printf("lval: %p\n", &lval);
        const r = bar1(lval);
        printf(" r: %p\n", &r);
    }
    {
        printf("\nrvalue:\n");
        const r = bar1(S(2));
        printf(" r: %p\n", &r);
    }
}

S bar1()(auto ref S s) {
    printf("bar1: %p\n", &s);
    return bar2(forward!s);
}

S bar2()(auto ref S s) {
    printf("bar2: %p\n", &s);
    return bar3(forward!s);
}

S bar3()(auto ref S s) {
    printf("bar3: %p\n", &s);
    return bar4(forward!s);
}

S bar4()(auto ref S s) {
    printf("bar4: %p, got a ref: %d\n", &s, __traits(isRef, s));
    return s; // copy parameter lvalue to return value
}
```
Output with DMD (and GDC), no backend optimizations:
```
ctor: 0x7ffebea26460
lval: 0x7ffebea26460
bar1: 0x7ffebea26460
bar2: 0x7ffebea26460
bar3: 0x7ffebea26460
bar4: 0x7ffebea26460, got a ref: 1
copy: 0x7ffebea263d0
 r: 0x7ffebea26464
dtor: 0x7ffebea26464
dtor: 0x7ffebea26460

rvalue:
ctor: 0x7ffebea2647c
bar1: 0x7ffebea26488
bar2: 0x7ffebea26424
bar3: 0x7ffebea263e4
bar4: 0x7ffebea263a4, got a ref: 0
copy: 0x7ffebea26358
dtor: 0x7ffebea263a4
dtor: 0x7ffebea263e4
dtor: 0x7ffebea26424
dtor: 0x7ffebea26488
 r: 0x7ffebea26478
dtor: 0x7ffebea26478
```
What we see is that the current `core.lifetime.forward` propagates the ref-ness of the parameter, but has to `core.lifetime.move` it in the non-ref case, creating 3 explicit moves + destructions.
We also see that there are compiler-implicit moves ('optimized', i.e., no reset + destruction of the moved-from value):
- when passing the `S(2)` rvalue to `bar1` (not sure why, seems like a bug); note the different addresses of `ctor` and `bar1`
- for the return values: the addresses of `copy` and `r` diverge (constructed @ 0x7ffebea26358, destructed @ 0x7ffebea26478)
With LDC, we at least already get perfectly forwarded return values (the addresses of `copy` and `r` are identical):
```
ctor: 0x7ffda922edbc
lval: 0x7ffda922edbc
bar1: 0x7ffda922edbc
bar2: 0x7ffda922edbc
bar3: 0x7ffda922edbc
bar4: 0x7ffda922edbc, got a ref: 1
copy: 0x7ffda922edb8
 r: 0x7ffda922edb8
dtor: 0x7ffda922edb8
dtor: 0x7ffda922edbc

rvalue:
ctor: 0x7ffda922eda0
bar1: 0x7ffda922ed6c
bar2: 0x7ffda922ed1c
bar3: 0x7ffda922eccc
bar4: 0x7ffda922ecc8, got a ref: 0
copy: 0x7ffda922eda4
dtor: 0x7ffda922ecc8
dtor: 0x7ffda922eccc
dtor: 0x7ffda922ed1c
dtor: 0x7ffda922ed6c
 r: 0x7ffda922eda4
dtor: 0x7ffda922eda4
```
The compiler needs to implement RVO (Return Value Optimization, which is different from Named RVO!) to enable perfect forwarding of the return values. In this example, `r` is allocated in `main`, then its address is passed and forwarded as a hidden pointer all the way to `bar4`, where it gets copy-constructed.
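The hidden-pointer mechanism can be spelled out as a manual lowering. This is only a sketch of the idea, not how any compiler literally implements it: an explicit `ref S result` parameter stands in for the hidden pointer, and the `barNLowered` functions, `copies` and `copiedAt` are hypothetical. It assumes a druntime recent enough to provide `core.lifetime.copyEmplace`, which copy-constructs into already-allocated, uninitialized memory.

```d
import core.lifetime : copyEmplace;

int copies;  // counts postblit runs
S* copiedAt; // address at which the last postblit ran

struct S {
    int x;
    this(this) { ++copies; copiedAt = &this; }
}

// Hypothetical lowering of `S bar4(ref S s) { return s; }` under RVO:
// the caller passes its result slot as an extra (normally hidden) parameter.
void bar4Lowered(ref S s, ref S result) {
    copyEmplace(s, result); // copy-construct s directly into the caller's slot
}

// The trampolines merely pass the result slot along; no temporaries involved.
void bar3Lowered(ref S s, ref S result) { bar4Lowered(s, result); }
void bar2Lowered(ref S s, ref S result) { bar3Lowered(s, result); }
void bar1Lowered(ref S s, ref S result) { bar2Lowered(s, result); }

void main() {
    auto lval = S(1);
    S r = void;           // uninitialized result slot, allocated in main
    bar1Lowered(lval, r); // roughly what `const r = bar1(lval);` becomes
    assert(r.x == 1);
    assert(copies == 1);    // exactly one copy ...
    assert(copiedAt == &r); // ... constructed in place, in main's r
}
```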
With the proposed `forward` semantics, we'd get perfect forwarding of the `s` parameters too, without the 3 explicit moves and destructions. The `S(2)` rvalue would be created in `main`, then passed and forwarded directly by ref all the way to `bar4`, where it would get destructed when the `s` param goes out of scope.
Cherry on top: Last-use optimization from DIP 1040
This would make the compiler automatically forward suitable lvalues. In the example, we wouldn't have to use a single explicit `forward` in the `barN` trampolines, and the copy-construction of the return value in the non-ref version of `bar4` would be optimized to a move-construction (`return forward!s`).