Escape Analysis & Owner Escape Analysis

Escape Analysis & Owner Escape Analysis
Aug 24 Richard (Rikki) Andrew Cattermole
Aug 24 Doigt
Aug 24 IchorDev
Aug 24 Richard (Rikki) Andrew Cattermole
Aug 29 IchorDev
Aug 29 Richard (Rikki) Andrew Cattermole
Aug 30 IchorDev
Sep 03 Richard (Rikki) Andrew Cattermole
Sep 03 Dennis
Sep 04 Richard (Rikki) Andrew Cattermole
Sep 04 Dennis
Sep 04 Richard (Rikki) Andrew Cattermole
Sep 04 jmh530
Sep 05 IchorDev
Sep 05 Richard (Rikki) Andrew Cattermole
Sep 22 IchorDev
Sep 22 Richard (Rikki) Andrew Cattermole
Sep 04 IchorDev
Sep 04 Richard (Rikki) Andrew Cattermole

August 24

Posted by Richard (Rikki) Andrew Cattermole

Permalink

Richard (Rikki) Andrew Cattermole

Permalink

As a follow-up to the recent DIP1000 meeting where it was agreed to start compiling a list of its failings and that inference is important to such a design, I am also posting my proposal for a complete replacement.

To those who have tried DIP1000 and then dumped it, I am interested to know how you find the owner escape analysis of this proposal in terms of being restrictiveness.
Please evaluate it, so I can know if there is a pattern that needs resolving (if possible).

Latest: https://gist.github.com/rikkimax/0c1de705bf6d9dbc1869d60baee0fa81

Current: https://gist.github.com/rikkimax/0c1de705bf6d9dbc1869d60baee0fa81/c369cb4e9416298c6cf348915205959ee272a5f8

This proposal is an attempted potential replacement for both DIP1000 and @live.
It is in recognition that we may not be able to get DIP1000 and @live fully functional in making memory that is borrowed from owners tracked within @safe code.
To do this, an escape set is described per parameter to describe the relationship of an input to its output.
An output is one that is tied to one or more inputs, and an input is any pointer that is stored in some place.

I do want to emphasize at this point that the escape set should be more or less be perfect in terms of inference due to inference as being a side effect of the verification. Rather than a separate process.
Unless you go virtual, or lack a body the need to annotate should be minimal.
A nice side effect of this, is that the compiler will be able to promote memory to the stack without you annotating scope.

Escape analysis provides guarantees to its caller on the relationship to outputs for each function parameter. It is cross-scope aware.
Owner escape analysis provides guarantees to the callee that the inputs for each output will remain valid for the life of the output value.
They are complimentary of each other, enabling each to be simpler.

Here is a reference counted type example, although this can equally apply to any other pointer type:

struct RC {
    int* borrow() @escapevia(return);
}

RC first(/* @escapevia(return) */ RC input) {
    int* borrowed1 = input.borrow();
    // input is an owner, and therefore protected due to the borrow borrowed1
    input = RC.init; // Error

    int* borrowed2 = second(borrowed1);
    // borrowed1 is an owner, and therefore protected due to the borrow borrowed2
    borrowed1 = null; // Error

    return input;
}

int* second(/* @escapevia(return) */ int* second) {
    writeln(*second);
    return second;
}

How it interacts with const:

struct S {
      int field;

@safe:

    bool isNull() const {
        return false;
    }

    void makeNull() {
    }
}

S s;
int* field = &s.field;

writeln(s.isNull); // ok
s.makeNull(); // Error: Variable `s` has a borrow and may not be mutated by calling `makeNull`.

This also works when not the this pointer, but instead is a function parameter by-ref:

void func(ref const S s) {
 	s = S(2); // Error: cannot modify `const` expression `s`
}

August 24

Re: Escape Analysis & Owner Escape Analysis

Posted by Doigt
in reply to Richard (Rikki) Andrew Cattermole

Permalink

Doigt

Posted in reply to Richard (Rikki) Andrew Cattermole

Permalink

Hey, this is something I can actually understand. DIP1000 is so confusing in the semantics it uses to express the code that I couldn't even get started on it and actually try to use it. This in comparison, is a design that is much more concise and easy to wrap your head around.

August 24

Re: Escape Analysis & Owner Escape Analysis

Posted by IchorDev
in reply to Richard (Rikki) Andrew Cattermole

Permalink

IchorDev

Posted in reply to Richard (Rikki) Andrew Cattermole

Permalink

On Saturday, 24 August 2024 at 12:20:13 UTC, Richard (Rikki) Andrew Cattermole wrote:

Unfortunately I never used DIP1000 much. I do like the look of your pattern though.

int* borrow() @escapevia(return);

So this escapevia(return) is the same as return?

    int* borrowed1 = input.borrow();
   // input is an owner, and therefore protected due to the borrow borrowed1
   input = RC.init; // Error
   int* borrowed2 = second(borrowed1);
   // borrowed1 is an owner, and therefore protected due to the borrow borrowed2
   borrowed1 = null; // Error

I can imagine this could cause very long annoying chains of ‘ugh just let me reuse this pointer’. Disallowing modifying the pointer itself is really odd, but I see why it’s done—otherwise you might have to reckon with there now being several ‘owners’ created downstream. I just wish there was a nice way around that.

If this pattern will genuinely solve the various issues people have with DIP100, then I hope it gets implemented.

August 25

Re: Escape Analysis & Owner Escape Analysis

Posted by Richard (Rikki) Andrew Cattermole
in reply to IchorDev

Permalink

Richard (Rikki) Andrew Cattermole

Posted in reply to IchorDev

Permalink

On 25/08/2024 1:15 AM, IchorDev wrote:
> On Saturday, 24 August 2024 at 12:20:13 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> To those who have tried DIP1000 and then dumped it, I am interested to know how you find the owner escape analysis of this proposal in terms of being restrictiveness.
>> Please evaluate it, so I can know if there is a pattern that needs resolving (if possible).
> 
> Unfortunately I never used DIP1000 much. I do like the look of your pattern though.

Thanks!

>> ```d
>> int* borrow() @escapevia(return);
>> ```
> 
> So this `escapevia(return)` is the same as `return`?

Yes, except no.

``return`` maps to either the return value or the this pointer.

>> ```d
>>     int* borrowed1 = input.borrow();
>>    // input is an owner, and therefore protected due to the borrow borrowed1
>>    input = RC.init; // Error
>>    int* borrowed2 = second(borrowed1);
>>    // borrowed1 is an owner, and therefore protected due to the borrow borrowed2
>>    borrowed1 = null; // Error
>> ```
> 
> I can imagine this could cause very long annoying chains of ‘ugh just let me reuse this pointer’. Disallowing modifying the pointer itself is really odd, but I see *why* it’s done—otherwise you might have to reckon with there now being several ‘owners’ created downstream. I just wish there was a nice way around that.

You can assign to a variable, its just can't have any owners.

I.e. this will work:

```d
int* borrowed = acquire(owner);

borrowed = new int;
```

August 29

Re: Escape Analysis & Owner Escape Analysis

Posted by IchorDev
in reply to Richard (Rikki) Andrew Cattermole

Permalink

IchorDev

Posted in reply to Richard (Rikki) Andrew Cattermole

Permalink

On Saturday, 24 August 2024 at 13:20:36 UTC, Richard (Rikki) Andrew Cattermole wrote:

You can assign to a variable, its just can't have any owners.

I.e. this will work:

int* borrowed = acquire(owner);

borrowed = new int;

That’s what I meant—there are situations where you can’t reassign the pointer even though it’s not referenced.

Also, a couple of minor suggestions:
First one, which is a bit silly: I assumed the way to indicate return via ref/out parameters would be @escapevia(ref) or @escapevia(out). Using the parameter name makes more sense, but having a way to apply the escape to all ref/out parameters would be neat. Again, not exactly a showstopper.
Second thing: @escapevia is very long (especially when combined with its identifiers), and doesn’t even sound grammatically correct—it should be @escapesvia, as in ‘int x escapes via return’. If we don’t care about it reading correctly then @escapeset makes more sense—that’s what the DIP refers to it as—and the natural shortening would be @escape. Of course, I’d prefer something really short like @esc because typing is painful (I’m not really typing this message) but also because with identifiers like __parameters my fully-attributed library function signatures will look like utter earwax. I know you’ll say ‘but they can be inferred’, but unfortunately documentation generators don’t read between the lines like that; and I want my users to be able to know what parameters my functions escape without reading my function bodies or having to pragma(msg, typeof(someFunction)).

August 30

Re: Escape Analysis & Owner Escape Analysis

Posted by Richard (Rikki) Andrew Cattermole
in reply to IchorDev

Permalink

Richard (Rikki) Andrew Cattermole

Posted in reply to IchorDev

Permalink

On 30/08/2024 10:06 AM, IchorDev wrote:
> On Saturday, 24 August 2024 at 13:20:36 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> You can assign to a variable, its just can't have any owners.
>>
>> I.e. this will work:
>>
>> ```d
>> int* borrowed = acquire(owner);
>>
>> borrowed = new int;
>> ```
> 
> That’s what I meant—there are situations where you can’t reassign the pointer even though it’s not referenced.
> 
> Also, a couple of minor suggestions:
> First one, which is a bit silly: I assumed the way to indicate return via `ref`/`out` parameters would be `@escapevia(ref)` or `@escapevia(out)`. Using the parameter name makes more sense, but having a way to apply the escape to all `ref`/`out` parameters would be neat. Again, not exactly a showstopper.

How often do you have multiple ref/out parameters and will be escaping to all of them for the same input parameter?

Currently I don't believe a special case is necessary for this, so needs to be more than just a nice to have.

> Second thing: `@escapevia` is very long (especially when combined with its identifiers), and doesn’t even sound grammatically correct—it should be `@escapesvia`, as in ‘int x escapes via return’. If we don’t care about it reading correctly then `@escapeset` makes more sense—that’s what the DIP refers to it as—and the natural shortening would be `@escape`. Of course, I’d prefer something **really** short like `@esc` because typing is painful (I’m not really typing this message) but also because with identifiers like `__parameters` my fully-attributed library function signatures will look like utter earwax. I know you’ll say ‘but they can be inferred’, but unfortunately documentation generators don’t read between the lines like that; and I want my users to be able to know what parameters my functions escape without reading my function bodies or having to `pragma(msg, typeof(someFunction))`.

I could do ``@escape(...)``.

But yeah, if you're annotating good chance you'll want to do one parameter per line. Not ideal, but if you want control, some sacrifices towards convenience is gonna happen.

August 30

Re: Escape Analysis & Owner Escape Analysis

Posted by IchorDev
in reply to Richard (Rikki) Andrew Cattermole

Permalink

IchorDev

Posted in reply to Richard (Rikki) Andrew Cattermole

Permalink

On Thursday, 29 August 2024 at 23:48:37 UTC, Richard (Rikki) Andrew Cattermole wrote:

I could do @escape(...).

Sounds good.

September 03

Re: Escape Analysis & Owner Escape Analysis

Posted by Richard (Rikki) Andrew Cattermole
in reply to Richard (Rikki) Andrew Cattermole

Permalink

Richard (Rikki) Andrew Cattermole

Posted in reply to Richard (Rikki) Andrew Cattermole

Permalink

I've done an almost complete rewrite, I expect this to be close to the final version:

- Globals only need to be loaded from, after that they may become an owner
- I changed how ``scope`` works, it is no longer a flag, its instead a relationship strength, establishing a relationship strength is what changed the majority of the document.
- New attribute ``@move`` inferred, allows for functions like swap and move to be modelled without you needing to add said attribute (DIP1000 doesn't do any of this).
- Acknowledgement that DIP1000 attributes can live side by side these ones, meaning the migration from DIP1000 would be a smooth one.

Current: https://gist.github.com/rikkimax/0c1de705bf6d9dbc1869d60baee0fa81/5dba16b3b1fbe250b31ae237dda2ffc7b66f9399

As I believe this is more less complete, I'll include a copy here.

--------------------------------------------------------------------------------------------------------------

# Escape Analysis
| Field           | Value            |
|-----------------|-----------------------------------------------------------------|
| DIP:            | (number/id -- assigned by DIP Manager)            |
| Author:         | Richard (Rikki) Andrew Cattermole <firstname@lastname.co.nz>                        |
| Implementation: | (links to implementation PR if any)            |
| Status:         | Draft            |

## Abstract

The movement of pointers within a program graph easily escapes known points that own that memory within the call stack of a given thread. This logical error can result in program termination or undetected corruption of the program state. This proposal is a new attempt at preventing this corruption within the ``@safe`` code.

## Contents
* [Rationale](#rationale)
* [Prior Work](#prior-work)
* [Description](#description)
* [Breaking Changes and Deprecations](#breaking-changes-and-deprecations)
* [Reference](#reference)
* [Copyright & License](#copyright--license)
* [Reviews](#reviews)

## Rationale

In a review of the existing escape analysis solution implemented in D's reference compiler DIP1000, there is one major limitation of what it models and assumption growth to facilitate functionality.

The implementation of DIP1000 models a single output variable per function, this is the return value or if ``void`` the first parameter (could be the ``this`` pointer). In practice functions typically have more than one output, this includes mutable pointers in, ``ref`` and ``out`` function parameters.

```d
int* /* output */ func();

struct S {
	int* /* output */ method1();
	void method2() /* output */;
}
```

The relationship between parameters is modelled using the ``return ref`` and ``return scope`` attributes. These communicate to the compiler the varying input and how it relates to the output for that parameter.

Needing two different attributes to determine the relationship status between parameters has been highly incommunicable to experienced programmers.

Due to it not being able to model multiple outputs, a lot of typical D code cannot be safely represented using DIP1000. The design does not protect you from extending past the modelled subset of the language.

To resolve both of these core issues in the existing design, an escape set must be modelled per parameter. While this resolves the callee's side, it does not protect the caller from misusing the callee.  The design DIP1000 attempts to solve this by modelling the relationship between parameters using the two different attributes.

Another solution to this problem is to utilize the information provided by escapes and inverse it, given an output and given the inputs that form it, protect the inputs so that nothing can invalidate the output. This resulted in the proposal that was ``@live``, an opt-in analysis that does not communicate to either the callee or caller any guarantees cross-function, making it functionally irrelevant to the guarantees of DIP1000.

An opt-in solution to ownership does not allow for reference counting to occur safely. To safely do this, the referenced counted owner must be pinned and made effectively read-only so that both a reference to it and the borrowed resource may be passed around. This was a [blocker](https://forum.dlang.org/post/v0eu64$23bj$1@digitalmars.com) determined by Walter and Timon for adding reference counting to the language.

Furthermore without the entry point to escape analysis having analysis associated with it, there is no differentiation of what can constitute of a safe to borrow from source and what can't be. An example of this is with a global, in the case of a variable thread local storage, it is possible in fully ``@safe`` code with DIP1000 turned on to cause a segfault.

```d
import std;

int* tlsGlobal;

@safe:

void main() {
    tlsGlobal = new int(2);
    assert(*tlsGlobal == 2);

    toCall();
    assert(*tlsGlobal == 2); // Segfault
}

void toCall() {
    tlsGlobal = null;
}
```

## Prior Work

Escape analysis as a subject matter is primarily an [analysis of graphs](https://dl.acm.org/doi/10.1145/320385.320400). How they are mutated and who owns them at what places. Modelling this can be an expensive set of operations in the form of data flow analysis. For sufficient and best experience, a full program analysis is needed with a full graph of manipulation and logic therein analysed.

Native programming languages do not align themselves to full program analysis, due to the separate compilation model. D is a native programming language that uses this model almost exclusively. For this reason, it cannot use a full program analysis and full program graph analysis for modelling escaping. Instead, a flattened view of the graph must be representable inside a function signature.

At the time of this proposal, a solution for escape analysis has been implemented in the D reference compiler that is commonly referred to by its DIP number, DIP1000. This does not cover memory ownership guarantees, instead ``@live`` as an opt-in attribute enables some localized to the given function guarantees.

In Rust ownership is a [transfer based system](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html), so that only one variable has any ownership of memory. In contrast to D, where this is modelled and attempting to enforce this would not match how garbage collected memory would be used. Further [guarantees](https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html) are given, in that when a borrow occurs from an owner, only one mutable borrow is allowed in a given scope. This complements the ownership transfer system as it guarantees nobody else has the potential for aliasing.

## Description

This proposal introduces escape analysis, owner escape analysis along with a way to know if a variable associated with an argument has changed its value post function call.

What escape analysis and its complement owner escape analysis does, is it protects against invalidation of memory ownership whilst one or more borrows exist.

The grammar changes for the new function are described here, removal of DIP1000 and ``@live`` are done in its own heading ``Removal of Existing Language Elements``.

```diff
AtAttribute:
+    @ EscapeAttribute
+    @ move

ParameterAttributes:
+    @ EscapeAttribute

+ EscapeAttribute:
+    escape ( Identifiers )
+    escape ( )
+    escape

FuncAttr:
+    FuncAttrMove

+ FuncAttrMove:
+    No
```

The semantic analysis for both analysis, is done at the same time as they are guarantees provided in complement of each other and do not exist in isolation. A switch should be provided to disable this analysis should a use case is required to not perform it in the form of ``--disable-memorysafety``. If it is not set, it will be enabled for a given edition and above by default. For any edition below this will include the inferring of ``@escape`` and ``@move`` attribute, however no errors will be generated for either attribute.

There is some potential for the escape set on a parameter to be explosive in nature for mangling. At this time no specific mangling scheme is suggested, but it is allowed in this proposal for one to be implemented.

### Relationships

An expression is said to have a set of relationships between its inputs and outputs,  with some kind of transformation applied to the inputs to get the outputs. This is described using the formula ``T(inputs...) = (outputs...)``.

A function prototype, or function pointer declares this relationship, without providing the transformation function. An example of a transformation in the form of an identity function is given:

```d
int* identity(/* has relationship to return */ int* input) => input;
```

The return value of the ``identity`` function is the output and has a relationship to the ``input`` parameter.

Each relationship of an input to its outputs can be described as having a strength. These strengths are:

- No relationship
- Weak, comes from input and influences output
- Strong, requires the input to be valid for the output to be valid

For example, to get a strong relationship you can take a pointer to stack memory:

```d
int input;
/*has a strong relationship*/ int* output = &var;
```

In this example, the variable ``input`` must outlive the variable ``output``, if you don't you will get stack corruption.

Another way you can get a strong relationship is to assign a copy of the value that is stored in an already strong relationship variable.

```d
/*has a strong relationship*/ int* input = ...;
int* output = var; // has a strong relationship
```

The way to explicitly establish the link between a variable and its initializer as strong is to annotate it with ``scope``.

```d
scope int* input = ...;
int* output = var; // has a strong relationship
```

These three behaviors of establishing a relationship between two variables also applies when a containing type is in play:

```d
struct Input {
	int* ptr;
	int field;
}

Input input1 = ...;
int* output1 = &input1.field; // has a strong relationship between `output1` and `input1`

scope Input input2 = ...;
int* output2 = input2.ptr; // has a strong relationship between `output2` and `input2`
```

The default relationship of a function's outputs to its inputs is weak, unless the argument for a given input has a strong relationship. In the following example the ``input`` variable has a strong relationship to the stack. So when it gets the output from the ``identity`` function call it has a strong relationship to its input too.

```d
scope input = new int;
int* output = identity(input);
// `output` variable has a strong relationship to `input` variable
```

A way to require that a given input has a strong relationship to its outputs is by marking a function parameter as scope.

```d
int* strongIdentity(/* has relationship to return */ scope int* input) => input;
```

Now you can take an input that does not have a strong relationship, and require the output has a strong relationship to it!

```d
int* input = new int;
int* output = strongIdentity(input);
// `output` variable has a strong relationship to `input` variable
```

A weak relationship between an input and output, does not limit the output. It only establishes that there is an relationship to be had. This can be quite useful to composed types like tuples and static arrays:

```d
int* transformation(int* input) {
	int*[2] array;
	array[0] = input; // `array` has a weak relationship to `input`
	array[1] = new int; // GC allocation has no relationships without a constructor call or initializer to form one
	return array[0];
}
```

In the above example, the output value of the function will not be constrained by the variable ``array``. But the weak relationship from the ``array`` variable to ``input`` parameter, will be inherited by the return value, giving the following prototype:

```d
int* transformation(/* has relationship to return */ int* input);
```

Setting up a relationship within a function body is one thing, but to do it within a function signature? That is much harder. In the following example, a weak relationship is established where an input pointer is stored inside an output static array.

```d
void assignElement(ref int*[2] output, /* has a relationship to output */ int* input) {
	output[0] = input;
	output[1] = new int;
}
```

The previous examples in this heading used a raw pointer ``int*`` to establish relationships, the full list of types that affect the formation of relationships are:

- Slices: ``T[]``
- Raw pointers: ``T*``
- Associative arrays: ``T[U]``
- Pointer-containing fields or elements:
	- Structs
	- Unions
	- Static arrays
	- Tuples
- Any by-ref input parameter or variable will have an implicitly strong relationship to its output if it is also by-ref.

Other types, that behave as a value type like ``int`` are unaffected and do not establish a relationship. This means that they may be interacted with without establishing a relationship.

### Escape set

Previously this proposal has limited the terminology to establishing a relationship between a given input to its outputs. In this heading the method for describing this relationship in code is presented.

A new attribute is provided, ``@escape(...)``, within the brackets an escape set is provided using the following identifiers as elements within it:

- Nothing
- ``return``
- ``this``, also applies to the context pointer of a delegate.
- ``__unknown``, for exceptions, and globals.
- ``__parameters``, for all parameters, except the current one.
- Any function parameter names.

This attribute may be placed on function parameters and on the function which is represents the ``this`` pointer.

When the attribute is missing its escape set, it defaults the escape set to ``@escape(return, this, __unknown, __parameters)``. When the annotation is missing, this will indicate that it is to be inferred.

```d
alias D = T delegate(@escape U input1, @escape(return) V input2) @escape(return);

T freeFunction(@escape U input1, @escape(return) V input2);

class C {
	T method(@escape U input1, @escape(return) V input2) @escape(return);
}
```

The escape set only applies to types that are pointers. Non-pointers do not have an escape set, and therefore no ``@escape`` attribute. Function parameters that are non-pointers will have their attribute removed if it is specified. The ``this`` pointer is always a pointer type, even for structs.

For easier reading the empty escape set may be elided for variables that are marked ``scope``. A non-empty escape set must remain on the variable and cannot be elided.

```d
void doSomething(@escape() scope int* input);
```

Will become:

```d
void doSomething(scope int* input);
```

When a function pointer or delegate does not have an annotation for an escape set, it is assumed to be the empty set. This is a safe assumption thanks to the type system enforcing it.

```d
alias F = int* function(int*);

int* identity(@escape(return) int* input){
	return input;
}

F func = &identity; // Error: Variable `func` has type `int* function(int*)` and cannot be assigned `int* function(@escape(return) int*)`
```

Functions that are ``@trusted`` have their function signatures inferred for escapes, but will not error within the body or when the body does not match the signature. For ``@safe`` functions these are inferred but will error within the body and when the signature does not match the body. Lastly ``@system`` functions will not be analysed for escapes and any annotation of escapes upon its signature will be ignored.

The compiler has no way to assume what an escape set contains for a function declaration without a body. To verify it there is an implicit assumption that the linker will catch it by comparing symbol names with the help of mangling. To prevent accidental assumptions creeping into ``@safe`` code, any function without a body that is not fully annotated for the escape sets, will be downgraded to ``@system``. The following function declaration would be treated as if it wasn't annotated as ``@safe``.

```d
int* someFunction(int*, @escape() int*) @safe;
```

But this will be ``@safe``:

```d
int* someFunction(scope int*, @escape() int*) @safe;
```

Not all ABI's support name mangling of escape sets. By taking the responsibility of escape annotation requirement off the linker, this guarantees the compiler is able to provide stronger guarantees for memory safety analysis without the linker providing a backdoor using innocuous looking code.

When ``scope`` is placed upon a variable, it requires that when a variable is converged to not escape into unknown locations. This means that ``__unknown`` is not allowed to appear in the escape set. This also applies when a weak relationship parameter is upgraded to strong by the argument.

```d
void func1(@escape(__unknown) scope int* ptr); // Error the parameter `ptr` cannot have an escape set that includes `__unknown` and be marked as having a strong relationship `scope`
void func2(@escape(__unknown) int* ptr);

scope int* ptr;
func2(ptr); // Error variable `ptr` has a strong relation and cannot be escaped out through a `__unknown` parameter
```

Overriden methods in classes must have an escape set per parameter that is less than or equal to the parent method's set.

```d
class Parent {
	int* method() @escape(return);
}

class Child : Parent {
	override int* method() @escape(return, __unknown); // Error: the escape set for the `this` pointer on `method` must be equal or lesser than the parent which is `return` not `return, __unknown`
}
```

#### An Input Changed its Known Value

Sometimes an argument will have its value changed from the input. This is quite important for by-ref parameters who may have its value being tracked. To indicate to the compiler that it should not consider the value prior to a call is the same as the one after, the attribute ``@move`` on a parameter will indicate it will have changed. Common functions that demonstrate this behavior are ``swap`` and ``move``.

```d
T move(T)(@move @escape(return) T input) {
	return input;
}
```

At most one escape in the escape set of a parameter, to an output that has only one input may be used to allow the compiler to track movement of a given value between function calls.

```d
void swap(T)(@move @escape(input2) ref T input1,
			 @move @escape(input1) ref T input2) {
	T temp = input1;
	input1 = input2;
	input2 = temp;
}

int* a, b;
swap(a, b);
// Compiler can see that b is in a
// Compiler can see that a is in b
```

All ``out`` parameters will have ``@move`` applied to it automatically and need not be programmer applied.

If the ``@move`` attribute is applied to a parameter that is not by-ref, templated or the parameter type does not have move constructors it is an error.

As an attribute ``@move`` may be inferred if the compiler can see that the input was changed for a given parameter at the end of the called function's body.

```d
struct Unique {
	int* ptr;

	this(/*@move*/ ref Unique other) {
		this.ptr = other.ptr;
		other = Unique.init; // the input into `other` was changed
	}
}
```

#### Analysis

The goal of escape analysis, is to have an accurate accounting of where inputs go to their outputs and how to converge it between scopes. It provides protection from false assumptions on lifetimes creeping into ``@safe`` code.

An example of two scopes, whereupon assignment resets the escape set of an inner variable:

```d
int* outer;

{
	int* inner = ...;
	outer = inner;
	// @escape(outer) inner
	
	// Converge `outer` with any owners of `inner` lifetimes
	inner = ...;
	// @escape() inner
}
```

When converging on multiple sets instead of taking the minimum set and erroring, the analysis will take the maximum set of all the scopes:

```d
int* func(int* input) {
	if (input is null) {
		return new int;
		// @escape() input
	} else {
		return input;
		// @escape(return) input
	}
	// @escape(return) input
}
```

Elements in an array, fields in a class/struct/union are conflated with the variable that stores them in. Supporting awareness and the differentiation of each of these cases is not included in this proposal but a subset coudl be done.

```d
struct S {
	int* field;
}

void handle(int* ptr) {
	S s;
	s.field = ptr;
	// @escape(s) ptr
}
```

The point of convergence matters for lifetime analysis. It occurs like regular function destructor cleanup for a given scope. It happens in reverse order of the declarations. This has consequences, it allows a variable that has a strong relationship, to grow its escape set during its scope, but be a lot smaller at the end.

```d
struct S {
    int* field;
}

int* acquire(ref S s) @safe {
    return s.field;
}

void caller() @safe {
    int x = 2;
    S s = S(&x);

    *acquire(s) = 3;
}
```

Is equivalent to:

```d
struct S {
    int* field;

    this(@escape(this) int* field) @safe {
        this.field = field;
    }
}

int* acquire(@escape(return) ref S s) @safe {
    return s.field;
}

void caller() @safe {
    int x = 2;

    scope xPtr = &x;
    // @escape(xPtr) x, escape set cannot grow

    S s = S(xPtr);
    // @escape(s) xPtr

    int* fooReturn = acquire(s);
    // @escape(fooReturn) s

    *fooReturn = 3;

    __cleanup(fooReturn); // Cleanup code from compiler such as destructors get injected here
    // @escape() s
    __cleanup(s); // Cleanup code from compiler such as destructors get injected here
    // @escape() xPtr
    __cleanup(xPtr); // Cleanup code from compiler such as destructors get injected here
    // @escape() x
    __cleanup(x); // Cleanup code from compiler such as destructors get injected here
    // x escape set is empty, therefore ok
}
```

### Owner Escape Analysis

Seeing what variable contributes to another (or becomes), is one thing, but that does not provide guarantees in of itself. For guarantees to be made the relationship between variables must be made inversely. This inverse relationship describes an output variable as being a borrow to one or more owner input variables.

To establish a borrow, a variable must have one or more relationships to it that are strong.

```d
int owner;
int* borrowed = &owner;
// `borrowed` has a strong relationship to `owner`
```

Function calls:

```d
int* identity(/*@escape(return)*/ int* input) {
	return input;
}

int owner;
int* borrowed = identity(&owner); // Due to `&owner` `borrowed` has a strong relationship to `owner`
```

Borrowed memory is only ever valid, as long as the owners are not mutated. Mutation of the owners could unmap the borrowed memory, or change it in such a way that the program becomes corrupted. When a borrow is seen, the compiler protects the owner from mutation by requiring it to be "effectively const" as long as borrows exist. It cannot be assigned to, or be passed to methods or functions mutably.

```d
struct Top {
	int* field;
}

void func(ref Top owner) @safe {
	int* field = owner.field;
	// owner is now effectively const, it cannot be mutated
	
	owner = Top.init; // Error: The variable `owner` has a borrow and cannot be mutated
	owner.field = null; // Error: The variable `owner` has a borrow and cannot be mutated

	if (field !is null) {
		writeln(*field);
		*field = 2; // ok, fully mutable
	}
}
```

When converging between multiple scopes, the borrowed variables must have the same value in it.

```d
int owner;
int* borrowed;

if (random() > 0.5) {
	borrowed = &owner;
} else {

}
// Error: Variable `borrowed` has two different values in it, it can be owned by `owner` and be null
```

Side effects from method calls must be prevented, otherwise it will be possible to invalidate a borrow unknowingly. An existing language element for this is for checking against mutability, whereby mutable is disallowed but non-mutable allowed.

```d
struct S {
	int field;

@safe:

	bool isNull() const {
		return false;
	}

	void makeNull() {
	}
}

S s;
int* field = &s.field;

writeln(s.isNull); // ok
s.makeNull(); // Error: Variable `s` has a borrow and may not be mutated by calling `makeNull`.
```

The attribute ``@move`` indicates that a function call will mutate the input, and therefore if there are borrows from that variable to error.

```d
void someConsumer(@move scope ref int* input);

int* owner = ...;
int** borrowed = &owner;
someConsumer(owner); // Error: Variable `owner` has a borrow and cannot be moved into the parameter as it would invalidate the borrows
```

#### Global Variables

Not all variables can be tracked throughout a program's lifecycle. Global variables including those in thread local storage, can appear in any point in the call stack multiple times. Pinning of specific values into existance cannot occur for a global for this reason. It can be changed out from under you with no way to prevent it in ``@safe`` code.

Loading a value that is a pointer (including structs with pointer fields), into another will apply a flag onto that variable to say it contains global memory. This corresponds with the ``__unknown`` relationship argument.

```d
int* global;

void func() {
	int* ptr = global;
	// is a global `ptr`
}
```

This is useful information to have, as it informs any memory that tries to contribute to it, that it will be escaped out through ``__unknown`` lifetime.

```d
int** global;

void func() {
	int** globalPtr = global;
	// is a global `globalPtr`

	int value;
	int* ptr = &value;
	// Variable `ptr` is owned by the stack

	*globalPtr = ptr; // Error: variable `ptr` which has a shorter lifetime cannot be placed into globally accessible memory in `globalPtr`
}
```

It isn't limited to a single call frame, it can protect against cross-function scopes as well.

```d
int** global;

void caller() {
	int** globalPtr = global;
	// is a global `globalPtr`

	int value;
	int* ptr = &value;
	// Variable `ptr` is owned by the stack

	called(globalPtr, ptr); // Error: Variable `ptr` which is owned by the stack would escape into a longer lifetime memory that is globally accessible `globalPtr`
}

void called(@escape() int** globalPtr, @escape(globalPtr) int* ptr) {
	*globalPtr = ptr;
}
```

### Removal of Existing Language Elements

The language design elements that are being removed are DIP1000 and ``@live``. Together these attempted to do this proposal but in a non-integrated way that has shown minimal adoption.

```diff
Attribute:
-   return

AtAttribute:
-    @ live

FuncAttr:
- FuncAttrReturn
- FuncAttrLive

- FuncAttrReturn:
-	Nj

- FuncAttrLive:
-	Nm
```

No timeline is specified for removal.

## Breaking Changes and Deprecations

DIP1000 will not be able to be turned on at the same time as this proposal.
Any syntax specific (such as ``return`` attribute) to DIP1000 will break.

Any new semantic analysis would only cause errors to be applied to a new edition and would not affect the base D2 language.

During the transition period from DIP1000 to this proposal, the attributes from each proposal that is not active do not contribute to mangling. This enables attributes from each proposal to live side by side to keep a code base compiling.

## Reference
- [Shape Analysis](https://en.wikipedia.org/wiki/Shape_analysis_(program_analysis)) (type state & memory escapes)
- [@system variables DIP](https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1035.md)
- [# Compositional pointer and escape analysis for Java programs](https://dl.acm.org/doi/10.1145/320385.320400)
- [4.1.  What is Ownership?](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html)
- [4.2.  References and Borrowing](https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html)

## Copyright & License
Copyright (c) 2024 by the D Language Foundation

Licensed under [Creative Commons Zero 1.0](https://creativecommons.org/publicdomain/zero/1.0/legalcode.txt)

## History
The DIP Manager will supplement this section with links to forum discussions and a summary of the formal assessment.

September 03

Re: Escape Analysis & Owner Escape Analysis

Posted by Dennis
in reply to Richard (Rikki) Andrew Cattermole

Permalink

Dennis

Posted in reply to Richard (Rikki) Andrew Cattermole

Permalink

On Tuesday, 3 September 2024 at 03:00:20 UTC, Richard (Rikki) Andrew Cattermole wrote:

I've done an almost complete rewrite, I expect this to be close to the final version:

The description is getting clearer every revision, props for that. But it's also becoming increasingly hard for me to rhyme the proposal with the complaints of DIP1000.

Most of the DIP is spent on the 'multiple outputs' problem for separate compilation, inventing a meticulous function signature syntax to capture all kinds of possible assignments between parameters, globals, and the return value. And while this does solve the limitation that a swap function on scope values being impossible with DIP1000, it doesn't address other woes:

Simple safe D

A common sentiment was "I don't care for @nogc @safe, keep the language simple by just using the GC or go @system". While it may be hard to believe for some, DIP1000 is not a breaking change in theory and leaves GC-based code alone. This proposal however breaks @safe code by design - both DIP1000-based code using scope pointers (because of new syntax) and 'regular' GC-based code (because added @live-like semantics).

I agree @live being opt-in per function is unsound, but forcing "effectively const" semantics everywhere in a new edition is not going to please people just happily using the GC.

Attribute soup

return and scope annotations are noisy / confusing, but this proposal adds more and jumbles the existing ones in a way that's not necessarily easier to understand. For a simple int* f(int* x) function, the parameter attributes change in the following way**:

DIP1000	Escape Analysis
`return ref scope`	`scope @escape(return)`
`return ref`	impossible***
`return scope`	`@escape(return)`
`scope`	`@escape()` / `scope`

It solves the return scope and scope return problem, but might have problems of its own:

scope now means two unrelated things: 'strong relationship' and 'default empty escape set'
@escape is the opposite of @escape(), which could be confusing

** I might be wrong, but if so, that really doesn't bode well for the 'communicability' aspect of the lifetime attributes, which the DIP tries to address

*** That's what I take from "Error the parameter ptr cannot have an escape set that includes __unknown and be marked as having a strong relationship scope"

Composability with respect to structs

Explicitly unaddressed

Elements in an array, fields in a class/struct/union are conflated with the variable that stores them in.

Transitive scope

Not mentioned.

All in all, I feel the DIP is too focussed on addressing one issue (multiple outputs) while neglecting others. The most pressing issue is that many people simply don't want D to become like Rust. DIP1000 and @live at least leave 'regular' GC-based D mostly alone: just don't take the address of local variables in @safe functions and you're good. It would be really good if whatever 'escape analysis' D ends up boasting (if any), it would be for the benefit of specialized library types (e.g. RefCounted(T)) without complicating common pointer/array operations in @safe code.

September 04

Re: Escape Analysis & Owner Escape Analysis

Posted by Richard (Rikki) Andrew Cattermole
in reply to Dennis

Permalink

Richard (Rikki) Andrew Cattermole

Posted in reply to Dennis

Permalink

On 04/09/2024 4:37 AM, Dennis wrote:
> On Tuesday, 3 September 2024 at 03:00:20 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> I've done an almost complete rewrite, I expect this to be close to the final version:
> 
> The description is getting clearer every revision, props for that. But it's also becoming increasingly hard for me to rhyme the proposal with the complaints of DIP1000.
> 
> Most of the DIP is spent on the 'multiple outputs' problem for separate compilation, inventing a meticulous function signature syntax to capture all kinds of possible assignments between parameters, globals, and the return value. And while this does solve the limitation that a `swap` function on `scope` values being impossible with DIP1000, it doesn't address other woes:
> 
> ### Simple safe D
> 
> A common sentiment was "I don't care for @nogc @safe, keep the language simple by just using the GC or go @system". While it may be hard to believe for some, DIP1000 [is not a breaking change in theory](https://forum.dlang.org/post/gnuekdxflffjhwlnnwqr@forum.dlang.org) and leaves GC-based code alone. This proposal however breaks @safe code by design - both DIP1000-based code using `scope` pointers (because of new syntax) and 'regular' GC-based code (because added @live-like semantics).
> 
> I agree @live being opt-in per function is unsound, but forcing "effectively const" semantics everywhere in a new edition is not going to please people just happily using the GC.

I can only see us going in one of two directions over this:

- Add a temporally safe D attribute that goes above ``@safe``, so that when you need it you have it, and when you don't you can use ``@safe`` instead.
- Add an effects system.

I don't care which of the two directions we go in, I've done an ideas post over the first. However I suspect the first is the one we as a community may like the best as it silo's the extra protection without forcing effects annotations on everyone else.

Mutation has the side effect of invalidating borrows, it's the only one we have, therefore only one in proposal.

It would be an easy enough swap to change ``@safe`` to ``@tsafe``. But that isn't a decision we need to make here. We can make that prior to launch.

But I do want to make a point here, owner escape analysis only kicks in and forces effectively const on the owner if:

1. You take a pointer to stack memory
2. You receive memory that has a strong relationship (perhaps done explicitly for reference counting!)
3. You take a pointer to a field of struct/class/union

The first two are already provided by DIP1000. That isn't new.
The third is new.

What matters about this, is as long as you are not doing pointer arithmetic (like taking a pointer, or by-ref), you can use GC memory freely without restriction. In a way its a hole in the design, but an intentional one as it makes for a very good user experience and doesn't really have a lot of down sides.

I was going to fill in that hole, but ``@system`` variables covers it enough that I kinda just went meh.

> ### Attribute soup
> 
> `return` and `scope` annotations are noisy / confusing, but this proposal adds more and jumbles the existing ones in a way that's not necessarily easier to understand. For a simple `int* f(int* x)` function, the parameter attributes change in the following way**:
> 
> | DIP1000              | Escape Analysis                 |
> |----------------------|---------------------------------|
> | `return ref scope`   | `scope @escape(return)`         |
> | `return ref`         | impossible***                   |
> | `return scope`       | `@escape(return)`               |
> | `scope`              | `@escape()` / `scope`           |
> 
> It solves the `return scope` and `scope return` problem, but might have problems of its own:
> - `scope` now means two unrelated things: 'strong relationship' and 'default empty escape set'

This is the same meaning it has today with DIP1000. Just reworded. By itself it matches the definition prior to DIP1000 too.

So this is inherently well understood.

If you have a parameter or variable that is only ``scope`` it may still compile with this proposal without changes. If it doesn't go awry of owner escape analysis and doesn't compile, I'd like to know!

> - `@escape` is the opposite of `@escape()`, which could be confusing

Originally I was going to make this to mean 'inferred', but it's better if everything gets inferred by default.

It needs to mean something, so got an alternative?

> ** I might be wrong, but if so, that really doesn't bode well for the 'communicability' aspect of the lifetime attributes, which the DIP tries to address

With DIP1000, the attribute elicits both the strength and the escape set in the same attribute, with this it does not.

``@escape`` tells you where it can go, ``scope`` upgrades the relationship to a strong one.

Giving ``scope`` a default escape set is to allow it to match existing understanding, which does help with communicability.

So I do disagree with the statement that this is not aiding in communicability, its a lot easier to communicate one thing per attribute, rather than trying to communicate two things. With subtle differences between similarly looking ones.

> *** That's what I take from "Error the parameter `ptr` cannot have an escape set that includes `__unknown` and be marked as having a strong relationship `scope`"

Yes you are correct.

It inherently describes that there is an owner of the pointer being passed in and that it needs to be protected (somehow).

If you were allowed to take a pointer to a by-ref variable and then store it some place you are most likely escaping a pointer. And that would not be a good thing. This should not be allowed in ``@safe``, and if it does that's a bug.

> ### Composability with respect to structs
> 
> Explicitly unaddressed
> 
>> Elements in an array, fields in a class/struct/union are conflated with the variable that stores them in.

I'm going to need an example of what you think is not addressed here.

From my perspective the field gets conflated with its containing instance variable and that covers composability.

> ### Transitive scope
> 
> Not mentioned.

``scope`` is not transitive, at least as far as the language knows transitive to mean.

Taking a value out of a field of a struct would establish a weak relationship between the resulting variable and the containing struct instance variable.

```d
struct S {
	int* field;
}

void handle(int* ptr) {
	S s;
	s.field = ptr;
	// @escape(s) ptr
}
```

```d
struct Input {
	int* ptr;
	int field;
}

Input input1 = ...;
int* output1 = &input1.field; // has a strong relationship between `output1` and `input1`

scope Input input2 = ...;
int* output2 = input2.ptr; // has a strong relationship between `output2` and `input2`
```

This works because a weak relationship can be upgraded to a strong relationship, without the function being annotated as such based upon the argument.

As a result cross-function guarantees are maintained and therefore transitively.

Okay this needs elaborating.

"The attribute ``scope`` is not transitive. Instead it relies upon cross-function analysis to make guarantees for fields access/mutation and function calls. If any expression causes an output to exist, this will inherently have a strong relationship and therefore can be typed as ``scope``."

Uploaded.

> All in all, I feel the DIP is too focussed on addressing one issue (multiple outputs) while neglecting others. The most pressing issue is that many people simply don't want D to become like Rust. DIP1000 and @live at least leave 'regular' GC-based D mostly alone: just don't take the address of local variables in `@safe` functions and you're good. It would be really good if whatever 'escape analysis' D ends up boasting (if any), it would be for the benefit of specialized library types (e.g. `RefCounted(T)`) without complicating common pointer/array operations in `@safe` code.

I focus upon multiple outputs, because to make flattening to a function signature to work, you have to do this. If you don't you are not going to model enough code, and will be going against the literature on this subject making it harder to use.

Pointer arithmetic is already disallowed in ``@safe``, in a lot of ways _any_ taking of a pointer is unsafe without some form of escape analysis. This makes it safe to do both consistently.

I don't know how we could make pointers safer without throwing owner escape analysis at it.

Top | Forum index | About this forum

Forums

Simple safe D

Attribute soup

Composability with respect to structs

Transitive scope