__rvalue and Move Semantics first draft
November 09
https://github.com/WalterBright/documents/blob/5dbf6728d7d0ae46a411c720ec41e3603310172b/rvalue.md

I gave up on the previous move DIP. This one is better.
November 09

Thanks, this is definitely a step in the right direction, getting us perfect forwarding. I very much like its simplicity. First thoughts wrt. the __rvalue builtin:

> This means that an __rvalue(lvalue expression) argument destroys the expression upon function return. Attempts to continue to use the lvalue expression are invalid. The compiler won't always be able to detect a use after being passed to the function, which means that the destructor for the object must reset the object's contents to its initial value, or at least a benign value.

What IMO needs to be stressed here is that there's always one implicit use of the original lvalue after the __rvalue usage - its destruction when going out of scope! So the dtor at the very least needs to make sure that it can handle a double-destruction, adjusting the payload to make the 2nd destruction a 'noop', not freeing effective resources twice etc.
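To make the double-destruction requirement concrete, here is a minimal sketch in today's D. `Buffer`, `frees` and `demo` are hypothetical names made up for illustration, and the explicit `b.__dtor()` call only stands in for the destruction of the moved rvalue; it is an assumption about how the proposal would behave, not something the draft specifies.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical resource-owning struct, for illustration only.
struct Buffer
{
    int* data;
    static int frees; // counts real deallocations

    this(int value)
    {
        data = cast(int*) malloc(int.sizeof);
        *data = value;
    }

    @disable this(this); // no copies, so ownership stays unique

    ~this()
    {
        if (data is null)
            return;      // second destruction: nothing left to release
        free(data);
        data = null;     // leave a benign state for any later destruction
        ++frees;
    }
}

int demo()
{
    Buffer.frees = 0;
    {
        auto b = Buffer(42);
        b.__dtor(); // stands in for the destruction of the __rvalue argument
    }               // scope exit destroys b a second time
    return Buffer.frees;
}
```

Without the `data = null` reset, the second destruction would free the same pointer again.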

And that's my only real problem with the proposal in its current shape - who's going to revise all existing code to check for problematic struct dtors that don't handle double-destruction, just in case someone applies __rvalue on one of these types, or a custom struct with those types as fields?

The proposed __rvalue is very similar to what I proposed in https://forum.dlang.org/thread/xnwhexrctbfgntfklzaf@forum.dlang.org, the proposed revised forward semantics in the non-ref-storage-class case. The main difference is that I went the suppress-2nd-destruction way, limiting its applicability to local variables (incl. params) only, where the destruction could be controlled via a magic destructor-guard variable for each local that might be __rvalue'd.

When going with the double destruction to keep things simpler and allow __rvalue for all lvalues (I guess PODs too, which aren't guaranteed to be passed by ref under the hood, and so might still be blitted or passed in registers, depending on the platform ABI), then I'd propose automatically performing a reset-blit to T.init after the function call (incl. the case where the callee threw - the rvalue has still been destructed in that case, so we still need to reset the payload for the 2nd destruction). This has a number of advantages:

  • No need to check and fix up all existing dtors.
  • Well-defined state of the lvalue after its usage as __rvalue - T.init -, not some nebulous 'initial value, or at least a benign value' (as proposed, the state the first destruction left the object in, or if the type has no dtor (not all non-PODs have a dtor), the state the callee left the object in).
  • Not paying the price for resets for every destruction, only after __rvalue usages. I guess the overall number of destructions is usually orders of magnitude greater than __rvalue usages.

Eliding the T.init reset and the 2nd destruction - in suited cases - could be implemented as an optimization later.
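The reset-blit idea can be emulated by hand in current D. This is only a sketch under assumptions: `Handle`, `consume` and `demo` are hypothetical names, `consume` takes its argument by `ref` and destroys it to imitate a callee that received an rvalue, and `core.lifetime.emplace` is used to blit `Handle.init` over the object without running its destructor. The `scope (exit)` covers the case where the callee throws.

```d
import core.lifetime : emplace;

// Hypothetical type whose destructor is NOT safe against double destruction.
struct Handle
{
    int id;              // 0 means "owns nothing"
    static int releases; // counts real releases

    ~this()
    {
        // Deliberately naive: never resets id.
        if (id != 0)
            ++releases;
    }
}

// Stand-in for a callee that got the object as an rvalue and
// destroys its parameter on return.
void consume(ref Handle h)
{
    h.__dtor();
}

int demo()
{
    Handle.releases = 0;
    {
        auto h = Handle(7);
        {
            // Reset-blit to Handle.init after the call, even if it throws.
            scope (exit) emplace(&h);
            consume(h);
        }
        assert(h.id == 0); // well-defined state after the "move"
    }   // second destruction sees Handle.init and releases nothing
    return Handle.releases;
}
```

The unmodified, naive destructor releases exactly once, because the second destruction only ever sees `Handle.init`.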


Wrt. safety, I think we should at least also mention the aliasing problem/danger:

```D
void callee(ref S x, S y) {
    assert(&x != &y);
}

void caller() {
    S lval;
    callee(lval, __rvalue(lval));
}
```
November 10
On 09/11/2024 10:33 PM, Walter Bright wrote:
> https://github.com/WalterBright/documents/blob/5dbf6728d7d0ae46a411c720ec41e3603310172b/rvalue.md
> 
> I gave up on the previous move DIP. This one is better.

This is a restatement of what I said yesterday at the monthly meeting.

I am significantly happier with this design; however:

1. We'll need to introduce a swap builtin, since we have no way to describe moves between parameters. This can come later, as it is an addition.
2. I am concerned that existing code that was not designed to accept a move will have values moved into it. Whitelisting via an attribute ``@move``, to say that this constructor/opAssign is designed to handle an incoming move, would be valuable.
3. Eliding destructor calls as an optimization should be done with type state analysis; it does not need its own dedicated DFA.

November 09

Oh, there's at least one problem with the `this(T)` move-ctor signature - C++ interop. C++ doesn't destroy the parameter, because it's an rvalue-ref. The proposed by-value signature in D however includes the destruction of the value-parameter as part of the move-construction. The same applies to move-assignment via `opAssign(T)`. So after calling a C++ move ctor/assignOp with an `__rvalue(x)` argument, the rvalue wasn't destructed, and its state is as the C++ callee left it. Automatically reset-blitting to `T.init` would be invalid in that case, as the moved-from lvalue might still have stuff to destruct.

November 09
Some great insights.

I suggest the most pragmatic implementation of your ideas is to follow each destructor call on an rvalue parameter with a blit of the .init value. It is only necessary if the rvalue has a destructor. The callee cannot know if an rvalue was passed using __rvalue, so it has to defensively do this anyway.

I also suggest that we maybe omit the blit for @system code, like we enable omitting array bounds checking in @system code. For efficiency, naturally!
November 09
I'm not sure it's a problem or a danger.

Timon mentioned the related problem with:

```
callee(__rvalue(s), __rvalue(s));
```

where s would be destroyed twice. This isn't always detectable:
```
S* ps = ...;
callee(__rvalue(*ps), __rvalue(*ps));
```
But can be rendered benign with the blit of S.init after the destructor call.

On 11/9/2024 6:32 AM, kinke wrote:
> Wrt. safety, I think we should at least also mention the aliasing problem/danger:
> ```D
> void callee(ref S x, S y) {
>      assert(&x != &y);
> }
> 
> void caller() {
>      S lval;
>      callee(lval, __rvalue(lval));
> }
> ```


November 09
On 11/9/2024 8:15 AM, Richard (Rikki) Andrew Cattermole wrote:
> 1. We'll need to introduce a swap builtin, since we have no way to describe moves between parameters. This can come later, as it is an addition.

Doesn't a swap function get arguments passed by `ref`?

> 2. I have the concern that existing code that is not designed to accept a move, will have a move into it. White listing via an attribute ``@move`` to say that this constructor/opAssign is designed to handle a move in would be valuable.

This can work, but if the users have to proactively add this attribute, I'm afraid we've failed.

> 3. Optimizing of eliding of destructors should be done with type state analysis, it does not need its own dedicated DFA.

The two are the same, aren't they?
November 09
On 11/9/2024 9:37 AM, kinke wrote:
> Oh, there's at least one problem with the `this(T)` move-ctor signature - C++ interop. C++ doesn't destroy the parameter, because it's an rvalue-ref. The proposed by-value signature in D however includes the destruction of the value-parameter as part of the move-construction. The same applies to move-assignment via `opAssign(T)`. So after calling a C++ move ctor/assignOp with an `__rvalue(x)` argument, the rvalue wasn't destructed, and its state is as the C++ callee left it. Automatically reset-blitting to `T.init` would be invalid in that case, as the moved-from lvalue might still have stuff to destruct.

We could disallow __rvalue arguments for calls to C++ functions?
November 10
On 10/11/2024 11:59 AM, Walter Bright wrote:
> On 11/9/2024 8:15 AM, Richard (Rikki) Andrew Cattermole wrote:
>> 1. We'll need to introduce a swap builtin, since we have no way to describe moves between parameters. This can come later, as it is an addition.
> 
> Doesn't a swap function get arguments passed by `ref`?

Yes, but for lifetime tracking, we need to be able to say the original value isn't here anymore.

```d
int* a, b;

int* c = a, d = b;

swap(a, b);

// c has same variable state as a
// d has same variable state as b
```

In general moving is easy:

```d
int* move(?initialized,reachable ref int* input) {
	return input;
}
```

But swap isn't.

```d
void swap(
     ?initialized,initialized @escape(b) ref int* a,
     ?initialized,initialized @escape(a) ref int* b);
```

>> 2. I have the concern that existing code that is not designed to accept a move, will have a move into it. White listing via an attribute ``@move`` to say that this constructor/opAssign is designed to handle a move in would be valuable.
> 
> This can work, but if the users have to proactively add this attribute, I'm afraid we've failed.

The alternative is to disallow passing __rvalue to any constructor/opAssign that lives in a D2 module and does not take its parameter by-ref.

Tie it to a new edition.

Any function being called that is by-ref will work the same.

```d
module thing 2025;

struct Foo {
	this(Foo input);
}

void main() {
	Foo f;
	Foo t = __rvalue(f); // move constructor call
}
```

```d
module thing 2;

struct Foo {
	this(Foo input);
	this(ref Foo input);
}

void main() {
	Foo f;
	Foo t = __rvalue(f); // copy constructor call
}
```

>> 3. Optimizing of eliding of destructors should be done with type state analysis, it does not need its own dedicated DFA.
> 
> The two are the same, aren't they?

Yes exactly.

When you converge (or at other known points), you'd look to see what the last destructor is and, if appropriate, set ``var.lastDestroy.disabled = true;``.

Type state analysis has the absolutely beautiful property that the builtin states are 100% correct even in ``@system`` code.

It is _always_ an error to dereference a null pointer.

It is _always_ a logic error to read from uninitialized memory.

So it'll be run on all code, which means we can rely on it to do eliding for stuff like this.

Same situation with RC.

```d
rc.opAddRef();
rc.opSubRef();
```

Same object, pair can be elided.

It is why the add needs to happen in the called function, because then it can be elided without cross-function analysis.
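A tiny sketch of why such a pair is a candidate for elision. `Counted`, `withPair` and `elided` are hypothetical names; only `opAddRef` and `opSubRef` come from the post, and their bodies here are assumed.

```d
// Hypothetical reference-counted payload; the method bodies are assumptions.
struct Counted
{
    int refs = 1;
    void opAddRef() { ++refs; }
    void opSubRef() { --refs; }
}

// What the code looks like before elision: an add/sub pair on one object.
int withPair()
{
    Counted rc;
    rc.opAddRef();
    rc.opSubRef();
    return rc.refs;
}

// What it looks like after the pair has been elided.
int elided()
{
    Counted rc;
    return rc.refs;
}
```

Both functions observe the same count, which is what makes dropping the pair safe when both calls are visible in the same function body.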

November 10
On 10/11/2024 11:44 AM, Walter Bright wrote:
> Timon mentioned the related problem with:
> 
> `callee(__rvalue(s), __rvalue(s));`
> 
> where s would be destroyed twice. This isn't always detectable:

Break it down into an IR:

```
a = __rvalue(s)
b = __rvalue(s)
callee(a, b)
```

This is what type state analysis sees at an IR level.

```
// s must be >=initialized
a = s
// s is reachable which is < initialized

// s must be >= initialized, ERROR
b = s
```

We don't need to solve type state analysis here ;)

But it does tell us that, as a language feature, this depends on type state analysis working correctly, so it can't be turned on until then.
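As a toy model of the check above (not a real analysis pass), the state transitions can be written out in D. The state names `reachable` and `initialized` and their ordering follow the IR comments; `TypeState`, `Tracker`, `declare`, `moveFrom` and `demo` are hypothetical names invented for this sketch.

```d
// Ordered so that reachable < initialized, as in the IR comments.
enum TypeState { reachable, initialized }

// Hypothetical per-variable state table.
struct Tracker
{
    TypeState[string] states;

    // A freshly declared variable starts out initialized.
    void declare(string name)
    {
        states[name] = TypeState.initialized;
    }

    // Models `x = __rvalue(name)`: the source must be >= initialized,
    // and afterwards it is only reachable. Returns false on error.
    bool moveFrom(string name)
    {
        if (states[name] < TypeState.initialized)
            return false;
        states[name] = TypeState.reachable;
        return true;
    }
}

bool[] demo()
{
    Tracker t;
    t.declare("s");
    // a = __rvalue(s) succeeds, b = __rvalue(s) is rejected.
    return [t.moveFrom("s"), t.moveFrom("s")];
}
```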
