Thread overview
Feedback Thread: DIP 1040--Copying, Moving, and Forwarding--Community Review Round 1
Mar 05, 2021
Mike Parker
Mar 05, 2021
Ben Jones
Mar 05, 2021
Paul Backus
Mar 08, 2021
Walter Bright
Mar 08, 2021
deadalnix
Mar 05, 2021
tsbockman
Mar 06, 2021
deadalnix
Mar 08, 2021
Walter Bright
Mar 08, 2021
Timon Gehr
Mar 08, 2021
Imperatorn
Mar 08, 2021
Tobias Pankrath
Mar 08, 2021
Timon Gehr
Mar 08, 2021
Imperatorn
Mar 08, 2021
deadalnix
Mar 08, 2021
Mike Parker
Mar 10, 2021
Atila Neves
Mar 11, 2021
tsbockman
Mar 16, 2021
Dukc
Mar 20, 2021
Walter Bright
Mar 18, 2021
Elronnd
Mar 20, 2021
Walter Bright
Mar 18, 2021
deadalnix
Mar 20, 2021
Walter Bright
Mar 20, 2021
Walter Bright
Mar 20, 2021
Mike Parker
Mar 23, 2021
Manu
Mar 23, 2021
Max Haughton
Mar 23, 2021
deadalnix
Mar 24, 2021
Manu
Mar 24, 2021
Max Haughton
March 05, 2021
This is the feedback thread for the first round of Community Review of DIP 1040, "Copying, Moving, and Forwarding".

===================================
**THIS IS NOT A DISCUSSION THREAD**

Posts in this thread must adhere to the feedback thread rules outlined in the Reviewer Guidelines (and listed at the bottom of this post).

https://github.com/dlang/DIPs/blob/master/docs/guidelines-reviewers.md

That document also provides guidelines on contributing feedback to a DIP review. Please read it before posting here. If you would like to discuss this DIP, please do so in the discussion thread:

https://forum.dlang.org/post/ncoqnixvllbjsxdzbyxi@forum.dlang.org
==================================

You can find DIP 1040 here:

https://github.com/dlang/DIPs/blob/a9c553b0dbab1c2983a801b5e89b51c5c33d5180/DIPs/DIP1040.md

The review period will end at 11:59 PM ET on March 19, or when I make a post declaring it complete. Feedback posted to this thread after that point may be ignored.

At the end of this review round, the DIP will be moved into the Post-Community Round 1 state. Significant revisions resulting from this review round may cause the DIP manager to require another round of Community Review, otherwise the DIP will be queued for the Final Review.

==================================
Posts in this thread that do not adhere to the following rules will be deleted at the DIP author's discretion:

* All posts must be a direct reply to the DIP manager's initial post, with only two exceptions:

    - Any commenter may reply to their own posts to retract feedback contained in the original post

    - The DIP author may (and is encouraged to) reply to any feedback solely to acknowledge the feedback with agreement or disagreement (preferably with supporting reasons in the latter case)

* Feedback must be actionable, i.e., there must be some action the DIP author can choose to take in response to the feedback, such as changing details, adding new information, or even retracting the proposal.

* Feedback related to the merits of the proposal rather than to the contents of the DIP (e.g., "I'm against this DIP.") is allowed in Community Review (not Final Review), but must be backed by supporting arguments (e.g., "I'm against this DIP because..."). The supporting arguments must be reasonable. Obviously frivolous arguments waste everyone's time.

* Feedback should be clear and concise, preferably listed as bullet points (those who take the time to do an in-depth review and provide feedback in the form of answers to the questions in the documentation linked above will receive much gratitude). Information irrelevant to the DIP or which is not provided in service of clarifying the feedback is unwelcome.
March 05, 2021
On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
> This is the feedback thread for the first round of Community Review of DIP 1040, "Copying, Moving, and Forwarding".
>

Minor bug: The functions `gun` and `sun` in the last use section look like they're missing an S parameter.


March 05, 2021
On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
> This is the feedback thread for the first round of Community Review of DIP 1040, "Copying, Moving, and Forwarding".

In the section on "Last Use":

> Module level functions that contain nested functions or lambda functions that access an outer local variable will not be subject to last use dataflow analysis. The reason is that a pointer to the nested function or lambda could be escaped from the containing function

This rule has the potential to cause "spooky action at a distance," where a change to one part of a function silently causes the meaning of seemingly-unrelated code in the same function to change. For example:

    struct EMO { /* ... */ }

    version (Before)
    void example()
    {
        int a;
        EMO b;
        fun(a);
        gun(b); // last use of b -> move
    }

    version (After)
    void example()
    {
        int a;
        EMO b;
        void helper() { fun(a); } // DFA disabled here
        helper();
        gun(b); // no DFA -> copy
    }

Ideally, only variables that are actually accessed from nested functions would have their last-use analysis disabled. In the example above, that would include `a`, which is accessed in `helper`, but not `b`.
March 05, 2021
On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
> This is the feedback thread for the first round of Community Review of DIP 1040, "Copying, Moving, and Forwarding".

There should be at least one example demonstrating WHY a non-default move constructor might be useful, not just a syntactical description of HOW to declare one. How about including an example struct with an interior pointer that needs to be updated whenever it is moved or copied?
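Something along these lines would do (a rough sketch; the small-string type is made up, and the by-value move-constructor form is only my reading of the DIP's proposed syntax):

    struct SmallString
    {
        size_t len;
        char[16] buf;
        char* ptr;  // interior pointer: points into buf when the data fits, else to the heap

        // copy constructor (existing D syntax): duplicate heap data, or re-point into our own buf
        this(ref return scope SmallString rhs)
        {
            len = rhs.len;
            buf = rhs.buf;
            ptr = len <= buf.length ? buf.ptr : rhs.ptr[0 .. len].dup.ptr;
        }

        // move constructor as proposed by the DIP (by-value parameter): after the raw move
        // the interior pointer still refers to the old location, so rebase it onto our own
        // buffer; heap pointers stay valid and need no fixup
        this(SmallString rhs)
        {
            len = rhs.len;
            buf = rhs.buf;
            ptr = len <= buf.length ? buf.ptr : rhs.ptr;
        }
    }

That would show concretely why a bitwise move alone is not enough for such a type.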
March 06, 2021
On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
> This is the feedback thread for the first round of Community Review of DIP 1040, "Copying, Moving, and Forwarding".

First, very good proposal (in fact, I sent something very similar to Andrei, Walter and Atila in 2019, and I hope this had some influence over this proposal). For completeness, I will paste this email at the end of this reply.

It covers some subtle but important points, notably:
 - The argument is invalid after this move, and is not destructed.
 - Moving an EMO means that the responsibility for destruction of that value moves along with it.

This is something I stressed at the time, because it has profound consequences at the ABI level. It is unfortunate that this design goal - while achieved - has been dropped from the rationale. To stress how important it is, let me contrast with C++.

In C++, the caller remains in charge of the destruction of an object. This means that structs that are not trivially destructible, movable, or copyable need to be passed by reference. This introduces many indirections in the generated code and, worse, adds a ton of pressure on the alias analysis, which prevents the optimizer from doing its job properly.

The second downside is that structs always need to have a "null" state. They need to remain destructible, even after the callee moves the object. This forces the callee to do extra work to set the moved-from object to a null state, and extra work from the caller to check for the null state during destruction. A concrete example: a lock that locks a mutex in its constructor and unlocks it in its destructor must now check whether the mutex has been nulled in its destructor, introducing an unnecessary branch in every path that unlocks the mutex.
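To make the lock example concrete, here is a minimal sketch (RawMutex is just a stand-in for whatever mutex type is used):

    struct RawMutex { void lock() { /* ... */ } void unlock() { /* ... */ } }

    struct LockGuard
    {
        RawMutex* mtx;  // must admit a null "moved-from" state under the C++ model

        this(RawMutex* m) { mtx = m; mtx.lock(); }

        ~this()
        {
            if (mtx !is null)  // branch paid on every unlock, only to guard the moved-from case
                mtx.unlock();
        }
    }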

These are serious downsides of the C++ way, and this proposal should make it explicit in its rationale that both problems must be avoided:
 - the ability to pass non-PODs by value, even at the ABI level;
 - the ability to have movable structs without a "null" state.

Another aspect of the proposal that needs improvement is the "Last Use" section. The description of last use is incomplete - and maybe doesn't adopt the right approach, because listing all scenarios is always going to be a challenge. One such scenario is the case of divergent branches:

S s;
if (...) {
    // Even though it is
    fun(s);
} else {
    // ...
}

The way to cover all bases is to go for a fixed point analysis. A fixed point analysis approach would go as follows: define a state for every local variable, then follow the control flow, updating the state of each variable. When a merge happens in the control flow, check whether the paths through which you arrive at the merge point have different states; if so, apply a fix and reprocess the affected branches of the control flow, until all merge points are consistent - you have reached a fixed point.

Adopting such an approach will ensure the behavior is defined not only for current control flow constructs, but also for whatever might come up in the future. In the pasted email, I go into more detail about what a fixed point analysis could look like for this. While it is likely not formal enough to make it into the DIP as is, it makes for a good starting point, IMO.
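To give an idea of the overall shape, here is a sketch of such a fixed point loop (all names and the state representation are mine, not the DIP's):

    enum VarState { destroyed, alive, consumed }

    // One basic block of the control flow graph; stateOut records the state of
    // each local at the end of the block.
    struct Block
    {
        size_t[] preds;             // indices of predecessor blocks
        VarState[string] stateOut;
    }

    // Reprocess blocks until no merge changes any state: the fixed point.
    void solve(Block[] blocks,
               VarState[string] delegate(ref Block, VarState[string]) transfer,
               VarState delegate(VarState, VarState) merge)
    {
        bool changed = true;
        while (changed)
        {
            changed = false;
            foreach (ref b; blocks)
            {
                // merge the states flowing in from all predecessors
                VarState[string] incoming;
                foreach (p; b.preds)
                    foreach (name, st; blocks[p].stateOut)
                        incoming[name] = name in incoming ? merge(incoming[name], st) : st;

                auto outgoing = transfer(b, incoming);  // apply the block's own uses/moves/destructions
                if (outgoing != b.stateOut)
                {
                    b.stateOut = outgoing;
                    changed = true;                     // downstream merges must be revisited
                }
            }
        }
    }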

======

Hi both of you,

This is a topic I discussed with Atila and Andrei informally, and now I'd like to loop you in, Walter. If this gains traction, it'll be time for a DIP, but I'd really like to gather some feedback beforehand, as putting everything into a format that is formal enough is seriously time consuming.

As I assume Walter already knows, C++ copy/destruction semantics are actually quite an important barrier to optimization. For the sake of making the point, I'm going to use unique_ptr and shared_ptr as examples, along with what we imagine the equivalents in D would look like. We will also assume that the compiler inlines ctors/dtors and copies.

Let's consider this sample code:

void foo(unique<T> ptr);

unique<T> t = ...;
foo(move(t));

The first thing to consider is that unique<T>, as well as shared<T>, are not PODs. This is rather obvious, as this is the whole point of having them; if we wanted a POD, we'd use a raw pointer. In C++, non-PODs must have an address. In D, they must have an address for the ctor to run, but after this, the compiler is free to do whatever it wants, as D structs are movable by default.

An interesting side effect of that fact is that non-PODs are ALWAYS passed by reference in C++, never by value. The compiler creates a temporary in the caller in which the copy is stored. The code is lowered to something that looks like this:

foo(T** ptrref);

T* t = ...;
T* tmp = nullptr;
swap(t, tmp);                    // "move": t's value goes into the temporary
foo(&tmp);                       // callee gets a reference to the temporary
if (tmp != null) destroy(tmp);   // caller still destroys the temporary...
if (t != null) destroy(t);       // ...and the (now null) original

The compiler will have no problem figuring out that t is always null, and will therefore optimize away its destruction, but the destruction of tmp remains. A second-order problem is that EVERY struct must have some sort of 'null' state. While it is fairly obvious what that is for a pointer (!), it is the source of many traps when it comes to correctness for other types of structs. Imagine a lock, for instance: it now has to check every time whether the mutex it refers to has been nulled or not, introducing a branch every time you unlock the mutex.

Now imagine that foo does something like:

void foo(unique<T> ptr) {
  global = move(ptr);
}

Now foo ends up having to go through an indirection to get the value of ptr, which right there is 3 cycles at best, a cache miss at worst. But worse, it MUST store null into ptr, which the caller will then dutifully check is null before deciding not to destroy anything. The caller will do so in 100% of cases, yet the compiler is unable to do anything to optimize it away. In addition, store-to-load forwarding has some bizarre performance edge cases on many CPUs (that is, loading from an address that is still in the store buffer and hasn't hit the cache yet), so it is something you'd like to see optimized away.

If we replace the unique<T> with a shared<T> in the example, it gets worse, because the increment/decrement operations cannot be optimized away at all. This results in code where you have to sprinkle calls to move all over the place, and it's a guarantee that you'll forget one at some point and end up with a ton of unexpected copies.

Here is what I propose.

1/ The ABI for non-PODs is exactly the same as for PODs. It means that the responsibility for destruction falls on the foo function in our example, not on the caller. This means the argument can be passed in registers in the case of a unique pointer. The obvious gotcha is that the address of the this pointer will be different in foo than it was when the copy constructor was called. D provides no guarantee on that front anyway, so I think we are good there.

Another side effect is that we lose the "russian doll" nesting of construction/destruction when things are moved to another function. However, by specifying that copies need to be made in left-to-right order for parameters, and destruction runs from the last parameter to the first, we end up with this remaining true almost all the time, which is probably the right tradeoff.

In our example, it means that foo can receive the unique<T> in a register; the compiler can then see that the value when we exit the function is always null and optimize away the destruction.

2/ Decide when to copy using the following algorithm:

We go over the function body, keeping track of a status per lvalue that exists: destroyed, alive, or consumed. All parameters, globals, or values reached through indirections are considered initially alive. All locals are considered destroyed initially, then alive once the constructor has run/the initial value is set.

When an lvalue is converted into an rvalue, for instance when returned, when passed as an argument, when assigned to another lvalue, etc., the following happens:
 - if it is alive, it is marked consumed.
 - if it is consumed, then we backtrack to the point at which it was consumed and insert a copy operation. The lvalue is considered alive again after the copy, and the algorithm restarts from there.
 - if it is destroyed, then this is an error (this should basically never happen and is likely a compiler bug).

When an lvalue needs to be destroyed, the compiler does as follows:
 - if it is alive, call the destructor.
 - if it is consumed, do nothing.
 - if it is already destroyed, then once again, you probably have a compiler bug.

This is relatively straightforward, but there is one more tricky situation: when you reach a "merge" point in the control flow, for instance after an if/then/else construct. In this case, a variable can have a different state for each code path. The resolution happens as follows (see the sketch after the list):
 - alive/alive: all good.
 - alive/destroyed: error. Some code path did not initialize the variable, or you have a compiler bug.
 - alive/consumed: backtrack the consumed path and insert copies. The lvalue is marked alive.
 - consumed/destroyed: mark destroyed.
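In code, that resolution might look roughly like this (the representation is mine):

    enum State { destroyed, alive, consumed }

    // Resolve one variable's state at a control-flow merge point, following the
    // rules above. needCopy means the consumed path must be backtracked and a
    // copy inserted; error means some path failed to initialize the variable.
    State mergeAt(State a, State b, out bool needCopy, out bool error)
    {
        needCopy = false;
        error = false;
        if (a == b)
            return a;                   // alive/alive etc.: all good
        if ((a == State.alive && b == State.destroyed) ||
            (a == State.destroyed && b == State.alive))
        {
            error = true;               // alive/destroyed
            return State.destroyed;
        }
        if (a == State.alive || b == State.alive)
        {
            needCopy = true;            // alive/consumed: patch in a copy on the consumed path
            return State.alive;
        }
        return State.destroyed;         // consumed/destroyed
    }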

When an lvalue is assigned to, its previous value is moved to a temporary and then destroyed immediately after the assignment finishes. The temporary also keeps the state, so destruction only occurs when the variable was alive.

All rvalues must be immediately returned, assigned, passed as arguments, or destroyed by calling the destructor.

Existing optimizations such as RVO naturally arise out of this process, but this is much more generic and powerful, as it generates the minimal number of copies possible assuming structs are movable, which is what we want.

In the case of loops, the state tracked by the algorithm at the end of the loop can affect the merge at the top of the loop, so you may have to pass over the loop twice. Likewise, backtracking may cause you to go over some code twice, but if you play with the idea you'll figure out that it converges pretty damn quickly and that it is not easy to come up with examples requiring many passes. It ALWAYS converges, if for no other reason than that it inserts a copy at each point a C++ compiler would.

In practice, this means that

shared<T> t = ...;
foo(t);

would simply move t's ownership to foo and never generate a copy. However, if one later does

shared<T> t = ...;
foo(t);
bar(t);

Then the call to foo will be patched by the compiler to insert a copy. In both cases, t's destructor will never be called, because it is consumed by the end of the control flow, so destruction is a no-op.

This is really where we need to go, IMO. We avoid so many copies by basically moving by default that a solution such as RC will be significantly better than C++'s, and on par with stuff like ARC.

In addition, the same algorithm can be used to ensure that all fields are initialized in constructors. They start in the destroyed state and must all be in the alive state by the end of the constructor's execution.
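For instance, in a constructor like the following sketch (hypothetical types), the field is alive on one path and destroyed on the other at the end-of-constructor merge, which the same analysis would flag:

    struct SomeResource { ~this() { /* release */ } }
    SomeResource acquire() { return SomeResource(); }

    struct Widget
    {
        SomeResource r;

        this(bool flag)
        {
            if (flag)
                r = acquire();  // r: alive on this path
            // r is never initialized on the other path, so the merge at the end of
            // the constructor sees alive/destroyed and reports an error under this scheme
        }
    }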
March 08, 2021
On 3/6/2021 12:28 PM, deadalnix wrote:
> First, very good proposal (in fact, I sent something very similar to Andrei, Walter and Atila in 2019, and I hope this had some influence over this proposal). For completeness, I will paste this email at the end of this reply.
> 
> It covers some subtle but important points, notably:
>   - The argument is invalid after this move, and is not destructed.
>   - Moving an EMO means that the responsibility for destruction of that value moves along with it.
> 
> This is something I stressed at the time, because it has profound consequences at the ABI level. It is unfortunate that this design goal - while achieved - has been dropped from the rationale. To stress how important it is, let me contrast with C++.
> 
> In C++, the caller remains in charge of the destruction of an object. This means that structs that are not trivially destructible, movable, or copyable need to be passed by reference. This introduces many indirections in the generated code and, worse, adds a ton of pressure on the alias analysis, which prevents the optimizer from doing its job properly.

> The second downside is that structs always need to have a "null" state. They need to remain destructible, even after the callee moves the object. This forces the callee to do extra work to set the moved-from object to a null state, and extra work from the caller to check for the null state during destruction. A concrete example: a lock that locks a mutex in its constructor and unlocks it in its destructor must now check whether the mutex has been nulled in its destructor, introducing an unnecessary branch in every path that unlocks the mutex.
> 
> These are serious downsides of the C++ way, and this proposal should make it explicit in its rationale that both problems must be avoided:
>   - the ability to pass non-PODs by value, even at the ABI level;
>   - the ability to have movable structs without a "null" state.

Ok, good idea, although to be clear this is implicit in the Description.

> Another aspect of the proposal that needs improvement is the "Last Use" section. The description of last use is incomplete - and maybe doesn't adopt the right approach, because listing all scenarios is always going to be a challenge. One such scenario is the case of divergent branches:
> 
> S s;
> if (...) {
>      // Even though it is
>      fun(s);
> } else {
>      // ...
> }
> 
> The way to cover all bases is to go for a fixed point analysis. A fixed point analysis approach would go as follows: define a state for every local variable, then follow the control flow, updating the state of each variable. When a merge happens in the control flow, check whether the paths through which you arrive at the merge point have different states; if so, apply a fix and reprocess the affected branches of the control flow, until all merge points are consistent - you have reached a fixed point.

That's exactly how the @live analysis is currently implemented. I didn't know it was called a fixed point analysis.

The downside of this is that the data flow analysis in @live is a bit expensive. Perhaps the DIP should allow for the DFA being optional, and assume the worst case (i.e. never last use) for an initial implementation.


> Adopting such an approach will ensure the behavior is defined not only for current control flow constructs, but also for whatever might come up in the future. In the pasted email, I go into more detail about what a fixed point analysis could look like for this. While it is likely not formal enough to make it into the DIP as is, it makes for a good starting point, IMO.

The idea with DFA is to not think in terms of if, while, foreach, etc., but to think instead of flow as nodes connected by edges. In other words, replace all the structured code with gotos. Anything that can be lowered to gotos will then work naturally.
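For example (hypothetical snippet; cond, use, and done stand in for arbitrary code):

    // structured form
    while (cond())
        use(s);
    done(s);

    // what the DFA works on, conceptually: nodes connected by edges
    Ltop:
        if (!cond()) goto Lend;
        use(s);
        goto Ltop;
    Lend:
        done(s);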

Andrei and I have discussed "last use" optimizations for over a decade :-) but I always balked because it needed DFA.

Thanks for reposting the email. It's full of good information.
March 08, 2021
On 3/5/2021 11:15 AM, Paul Backus wrote:
> On Friday, 5 March 2021 at 12:20:36 UTC, Mike Parker wrote:
>> This is the feedback thread for the first round of Community Review of DIP 1040, "Copying, Moving, and Forwarding".
> 
> In the section on "Last Use":
> 
>> Module level functions that contain nested functions or lambda functions that access an outer local variable will not be subject to last use dataflow analysis. The reason is that a pointer to the nested function or lambda could be escaped from the containing function
> 
> This rule has the potential to cause "spooky action at a distance," where a change to one part of a function silently causes the meaning of seemingly-unrelated code in the same function to change. For example:
> 
>      struct EMO { /* ... */ }
> 
>      version (Before)
>      void example()
>      {
>          int a;
>          EMO b;
>          fun(a);
>          gun(b); // last use of b -> move
>      }
> 
>      version (After)
>      void example()
>      {
>          int a;
>          EMO b;
>          void helper() { fun(a); } // DFA disabled here
>          helper();
>          gun(b); // no DFA -> copy
>      }
> 
> Ideally, only variables that are actually accessed from nested functions would have their last-use analysis disabled. In the example above, that would include `a`, which is accessed in `helper`, but not `b`.

It is possible to do the DFA through nested functions. I plan on doing that with @live functions.

What the DIP should make clear, however, is that changing copies to moves should be regarded as implementation dependent, i.e. it is a mistake if the user attempts to rely on a particular sequence of moves and copies. Much like relying on this for NRVO is a mistake. NRVO has always been an optional optimization, and if it happens or not is difficult to determine just by looking at the code. In practice, the only problem that has cropped up has been pressure to find more cases where NRVO can be applied, so I think we're good with labeling the copy=>move semantics as an implementation-dependent optimization.
March 08, 2021
On 08.03.21 11:21, Walter Bright wrote:
>>
>>
>> The way to cover all bases is to go for a fixed point analysis. A fixed point analysis approach would go as follows: define a state for every local variable, then follow the control flow, updating the state of each variable. When a merge happens in the control flow, check whether the paths through which you arrive at the merge point have different states; if so, apply a fix and reprocess the affected branches of the control flow, until all merge points are consistent - you have reached a fixed point.
> 
> That's exactly how the @live analysis is currently implemented. I didn't know it was called a fixed point analysis.

The general framework is also called abstract interpretation, where the merge operation is called a join of abstract elements. There's also a meet operation that can introduce information from conditionals into the state. (E.g. if you want to do value range propagation this can be useful.) In general, abstract interpretation does not converge to a fixed point in finite time; in that case you need to introduce some sort of widening operator. In practice, this is often the main challenge in making it work.
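For instance, for value range propagation the abstract elements could be intervals; then a join and a simple widening might look like this (a sketch, my own representation):

    struct Interval { long lo, hi; }

    // join: the smallest interval containing both inputs (used at merge points)
    Interval join(Interval a, Interval b)
    {
        import std.algorithm : min, max;
        return Interval(min(a.lo, b.lo), max(a.hi, b.hi));
    }

    // widening: if a bound is still growing after a join, push it straight to the
    // extreme so analysis of a loop terminates in finitely many iterations
    Interval widen(Interval prev, Interval next)
    {
        return Interval(next.lo < prev.lo ? long.min : prev.lo,
                        next.hi > prev.hi ? long.max : prev.hi);
    }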
March 08, 2021
On Monday, 8 March 2021 at 11:28:53 UTC, Timon Gehr wrote:
> On 08.03.21 11:21, Walter Bright wrote:
>>> [...]
>> 
>> That's exactly how the @live analysis is currently implemented. I didn't know it was called a fixed point analysis.
>
> The general framework is also called abstract interpretation, where the merge operation is called a join of abstract elements. There's also a meet operation that can introduce information from conditionals into the state. (E.g. if you want to do value range propagation this can be useful.) In general, abstract interpretation does not converge to a fixed point in finite time; in that case you need to introduce some sort of widening operator. In practice, this is often the main challenge in making it work.

Just curious, is there any connection here to symbolic execution?
March 08, 2021
On Monday, 8 March 2021 at 14:01:33 UTC, Imperatorn wrote:
>
> Just curious, is there any connection here to symbolic execution?

When I look at the Wikipedia article about symbolic execution, it looks a lot like symbolic model checking, and Patrick Cousot has somewhere shown that model checking can be seen as a subform of abstract interpretation.
