DIP1000: 'return scope' ambiguity and why you can't make opIndex work

Jun 18, 2021 Dennis
Jun 18, 2021 Steven Schveighoffer
Jul 06, 2021 Per Nordlöw
Jun 18, 2021 Ola Fosheim Grøstad
Jun 18, 2021 jmh530
Jun 19, 2021 Bradley Chatha
Jun 19, 2021 Ola Fosheim Grøstad
Jun 19, 2021 Dennis
Jun 19, 2021 Dukc
Jun 19, 2021 Ola Fosheim Grøstad
Jun 18, 2021 ag0aep6g
Jun 18, 2021 Dennis
Jun 19, 2021 Ola Fosheim Grøstad
Jun 18, 2021 Dukc
Jun 19, 2021 ag0aep6g
Jun 19, 2021 Dukc
Jun 19, 2021 ag0aep6g
Jun 19, 2021 Dukc
Jun 19, 2021 Dennis
Jun 19, 2021 Dukc
Jun 19, 2021 Dennis
Jun 19, 2021 Dennis
Jun 21, 2021 Dukc
Jun 21, 2021 Dennis
Jun 21, 2021 Dukc
Jun 21, 2021 nkm1
Jul 05, 2021 Walter Bright
Jul 05, 2021 ag0aep6g
Jul 05, 2021 Walter Bright
Jul 05, 2021 claptrap
Jul 06, 2021 Walter Bright
Jul 06, 2021 claptrap
Jul 06, 2021 Ola Fosheim Grøstad
Jul 06, 2021 Dennis
Jul 06, 2021 Walter Bright
Jul 06, 2021 Dennis
Jul 06, 2021 Walter Bright

June 18, 2021

Posted by Dennis

You may have seen my previous dip1000 posts:

Consider this part 3 of the "fixing dip1000" series, but it's about a different bug.

Background

dip25 and dip1000 are supposed to provide simple lifetime tracking that's still good enough to be useful. In the previous thread, Atila Neves mentioned that lifetime annotations like Rust's are to be avoided. But is it simple?

On Wednesday, 26 May 2021 at 15:29:32 UTC, Paul Backus wrote:

Of course, D's vision here is severely hampered in practice by the poor quality of its documentation (raise your hand if you can explain what "return ref parameter semantics with additional scope parameter semantics" actually means). But that's the idea.

Working on dip1000 made me finally able to "raise my hand", so here's how it works:

Function parameters of a type with pointers have three possible lifetimes: infinite, scope, or return scope. You might have heard that scope is "not transitive" and think that there's only one layer to it. However, the key insight is that there are actually two layers when ref comes into play: the parameter's address then has a lifetime of its own, in addition to the value's. It can be demonstrated with a linked list:

@safe:
struct Node {
    int x;
    Node* next;
}

// First layer: returning the address of the node
int* get0(return ref Node node) {
    return &node.x;
}

// Second layer: returning a value of the node
int* get1(ref return scope Node node) {
    return &node.next.x;
}

// Third layer and beyond: this is where scope checking ends
int* get2(ref scope Node node) {
    return &node.next.next.x;
}

The lifetimes are determined as follows:

Lifetime        `ref` address            value of pointer type
infinite        never                    default
current scope   default                  with `scope` keyword
return scope    with `return` keyword    with `return scope`
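The table can be read as a small decision rule. Purely as an illustration, here is a hypothetical D helper that encodes it (the names Lifetime, Layer and lifetimeOf are made up for this sketch and are not part of dmd). It assumes, as the v1/v3 examples that follow show, that return alone on a pointer value behaves like return scope.

```d
// Hypothetical helper: encodes the lifetime table above as described in this
// post (mid-2021 dmd). Not a compiler API; names are illustrative only.
enum Lifetime { infinite, currentScope, returnScope }

// Which layer of a parameter we are asking about.
enum Layer { refAddress, pointerValue }

Lifetime lifetimeOf(Layer layer, bool hasReturn, bool hasScope) @safe pure nothrow @nogc
{
    if (layer == Layer.refAddress)
    {
        // A `ref` is inherently scope: `scope` adds nothing, `return` extends it.
        return hasReturn ? Lifetime.returnScope : Lifetime.currentScope;
    }
    // On a pointer value, `return` implies `scope` (v1 is equivalent to v3).
    if (hasReturn)
        return Lifetime.returnScope;
    return hasScope ? Lifetime.currentScope : Lifetime.infinite;
}
```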

A few code examples:

@safe:
int* v0(             int* x) {return x;} // allowed, no lifetime restrictions
int* v1(return       int* x) {return x;} // allowed, returned value is `scope`
int* v2(       scope int* x) {return x;} // not allowed, x is `scope`
int* v3(return scope int* x) {return x;} // allowed, equivalent to v1

int* r0(       ref int x) {return &x;} // not allowed, `ref` is always scope
int* r1(scope  ref int x) {return &x;} // not allowed, `scope` does nothing here
int* r2(return ref int x) {return &x;} // allowed, return applies to `ref`

As you can see, scope always applies to the pointer value and not to the ref, since ref is inherently scope. No ambiguity there. But what if we have a ref int*: does return apply to the address of the ref or the int* value?

That's where those confusing lines from the specification come in, which distinguish "return ref semantics" and "return scope semantics". It turns out there are three important factors: whether the function's return type is ref, whether the parameter is ref, and whether the parameter is annotated scope. Here's a table:

Does the return attribute apply to the parameter's ref or the pointer value?

return type / parameter            with `scope`    without `scope`
`ref` return type / `ref` param    `ref`           `ref`
value return type / `ref` param    value           `ref`
`ref` return type / value param    value           value
value return type / value param    value           value
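Row by row, the table collapses to a single condition: return binds to the ref only when the parameter is ref and either the function returns by ref or the parameter is not scope. A hypothetical D helper encoding that condition is sketched below (returnBindsTo and Binding are illustrative names, not compiler internals). It reflects the rules as described here for mid-2021 dmd; later compiler releases may have revised them, so check the current spec before relying on it.

```d
// Hypothetical helper: encodes "what does `return` bind to?" from the table
// above (mid-2021 dmd rules as described in this post). Illustrative only.
enum Binding { toRef, toValue }

Binding returnBindsTo(bool refReturnType, bool refParam, bool hasScope) @safe pure nothrow @nogc
{
    // Only a `ref` parameter has a `ref` layer that `return` could bind to.
    if (!refParam)
        return Binding.toValue;
    // With a `ref` parameter, `return` goes to the `ref` unless the function
    // returns by value AND the parameter is also `scope`.
    return (refReturnType || !hasScope) ? Binding.toRef : Binding.toValue;
}
```

For a member function, this is passed by ref, so a ref return type combined with scope yields Binding.toRef, leaving the scope part of this without return.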

If you're still confused, I don't blame you: I'm still confusing myself regularly when reading signatures with return and ref. Anyway, is this difficulty problematic?

On Wednesday, 15 May 2019 at 08:32:09 UTC, Walter Bright wrote:

On 5/15/2019 12:21 AM, Dukc wrote:

Could be worth a try even without docs, but in the long run we definitely need some explaining.

True, but I've tried fairly hard with the error messages. Please post your experiences with them.

Also, there shouldn't be any caveats with using it. If it passes the compiler, it should be good to go. (Much like const and pure.)

All you need to do is see if the compiler complains, try adding return and/or scope, and see if the errors go away. Well...

@safe:
struct S {
    int x;
}

int* f(ref return scope S s) {
    return &s.x; // Error: returning `&s.x` escapes a reference to parameter `s`
                 // perhaps annotate the parameter with `return`
}

That's a confusing supplemental error: the parameter is already annotated return. The actual problem is that return applies to the value, not the ref parameter, since the function doesn't return by ref.

struct T {
    int x;
    int* y; // <- pointer member added
}

int* g(ref return scope T t) {
    return &t.x; // No error
}

And now the compiler accepts invalid code. Indeed, even the compiler doesn't always know what the return storage class actually applies to. See bugzilla issue 21868.

The issue

While fixing issue 21868, the CI uncovered that the dub package 'automem' relies on the current accepts-invalid behavior. Here's the reduced code:

struct Vector {
    float[] _elements;
    ref float opIndex(size_t i) scope return {
        return this._elements[i];
    }
}

With the patch I made, the error becomes:

source/automem/vector.d(212,25): Error: scope parameter `this` may not be returned
source/automem/vector.d(212,25):        note that `return` applies to `ref`, not the value

My new supplemental error message is working, yay! But how to fix it?
One way is to pass the Vector by value instead of by reference, but opIndex must be a member function to work as an operator overload and member functions pass this by reference. Another way is to return by value instead of by reference, but that means accessing array elements introduces a copy, and &vector[0] won't work anymore.
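A minimal sketch of the second workaround, assuming the reduced Vector above: opIndex returns float by value, and writes go through an opIndexAssign overload. The opIndexAssign member is an addition for illustration; it is not part of the reduced automem code.

```d
// Sketch of the return-by-value workaround for the reduced `Vector`.
// `opIndexAssign` is added for illustration; it is not from automem.
struct Vector
{
    float[] _elements;

    // Returns a copy of the element, so nothing derived from `this` escapes
    // and no `return` annotation is needed.
    float opIndex(size_t i) const scope @safe
    {
        return _elements[i];
    }

    // `v[i] = x` is rewritten to `v.opIndexAssign(x, i)`, so writes still
    // work even though `opIndex` no longer returns a `ref`.
    void opIndexAssign(float value, size_t i) scope @safe
    {
        _elements[i] = value;
    }
}
```

The cost is the one described above: v[i] is now a copy, so &v[0] no longer compiles because opIndex returns an rvalue.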

dip1000 simply can't express a 'return scope' opIndex returning by ref.

So it turns out the double duty of the return storage class is neither simple, nor expressive enough. Do you have any ideas how to move forward, and express the Vector.opIndex method without making the attribute soup worse? Keep in mind that dip25 (with return ref) is already in the language, but dip1000 (with return scope) is still behind a preview switch.

June 18, 2021

Re: DIP1000: 'return scope' ambiguity and why you can't make opIndex work

Posted by Steven Schveighoffer
in reply to Dennis

On 6/18/21 11:44 AM, Dennis wrote:
> If you're still confused, I don't blame you: I'm still confusing myself regularly when reading signatures with `return` and `ref`.

I have a headache reading this post, and it makes me want to never use DIP1000.

We are creeping towards having as much confusion and pain as Rust, without the benefit.

I strongly believe we should implement DIP1000 in an expressive manner, instead of relying on confusing conventions -- just make a type constructor to signify lifetime management and be done.

-Steve

June 18, 2021

Re: DIP1000: 'return scope' ambiguity and why you can't make opIndex work

Posted by Ola Fosheim Grøstad
in reply to Dennis

On Friday, 18 June 2021 at 15:44:02 UTC, Dennis wrote:

If you're still confused, I don't blame you: I'm still confusing myself regularly when reading signatures with return and ref. Anyway, is this difficulty problematic?

I am getting the same feeling from this as I am getting from certain aspects in C++ (e.g. intricate details of constructors).

Thank you for explaining it, but I also think I will not remember it. I think stuff like this is what programmers will throw into a bucket labeled "I will figure this out later" and just apply keywords until it compiles...

I've suggested that one might want to make the function signatures more readable and keep "auxiliary stuff" on a separate line:

https://forum.dlang.org/thread/nzwobsazsawxvxbxhoue@forum.dlang.org

I personally think explicit lifetimes are easier to read, because I don't actually have to remember what keywords signify.

It also makes it possible to expand the capabilities of the compiler over time.

June 18, 2021

Re: DIP1000: 'return scope' ambiguity and why you can't make opIndex work

Posted by ag0aep6g
in reply to Dennis

On Friday, 18 June 2021 at 15:44:02 UTC, Dennis wrote:

Does the return attribute apply to the parameter's ref or the pointer value?

return type / parameter            with `scope`    without `scope`
`ref` return type / `ref` param    `ref`           `ref`
value return type / `ref` param    value           `ref`
`ref` return type / value param    value           value
value return type / value param    value           value

[...]

Here's the reduced code:

struct Vector {
    float[] _elements;
    ref float opIndex(size_t i) scope return {
        return this._elements[i];
    }
}

With the patch I made, the error becomes:

source/automem/vector.d(212,25): Error: scope parameter `this` may not be returned
source/automem/vector.d(212,25):        note that `return` applies to `ref`, not the value

Geez, this isn't easy. I had to go step by step to make sense of that error, so maybe this can help others understand:

opIndex has a half-hidden parameter: return ref scope this. Depending on opIndex's return type, the return part of the this parameter can bind either to its ref part or to its scope part. In pseudo code, it can be either (return ref) (not-return scope) this or (not-return ref) (return scope) this.

opIndex has a ref return type. According to the table above, that means return binds to the ref part of ref scope this. I.e., it's (return ref) (not-return scope) this.

(return ref) this means opIndex may return a ref to this or this._elements (same address).

(not-return scope) this means it cannot return a ref to the elements of this._elements, because that would be returning a scope pointer which hasn't been annotated with return.

As far as I understand, opIndex could return &this._elements[i] by value. Then the return would bind to the scope part of ref scope this, making &this._elements[i] a return scope pointer. But float* would be an awkward return type for opIndex.
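A minimal sketch of that alternative, assuming the reduced Vector from the quoted post; ptrAt is a hypothetical name. Under the table's rules, a value return type with ref this and scope makes return bind to the value, so the returned pointer's lifetime is tied to this._elements rather than to the address of this.

```d
// Sketch: return a pointer to the element by value instead of a `ref`.
// `ptrAt` is a hypothetical member name used only for illustration.
struct Vector
{
    float[] _elements;

    // Value return type + `ref this` + `scope`: per the table, `return`
    // binds to the value, making `this._elements` `return scope`.
    float* ptrAt(size_t i) return scope @safe
    {
        return &this._elements[i];
    }
}
```

Naming it opIndex instead would make v[i] evaluate to a float*, which is the awkwardness mentioned above.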

Geez, this isn't easy.

June 18, 2021

Re: DIP1000: 'return scope' ambiguity and why you can't make opIndex work

Posted by jmh530
in reply to Ola Fosheim Grøstad

On Friday, 18 June 2021 at 17:02:41 UTC, Ola Fosheim Grøstad wrote:

[snip]

I've suggested that one might want to make the function signatures more readable and keep "auxiliary stuff" on a separate line:

https://forum.dlang.org/thread/nzwobsazsawxvxbxhoue@forum.dlang.org

I personally think explicit lifetimes are easier to read, because I don't actually have to remember what keywords signify.

It also makes it possible to expand the capabilities of the compiler over time.

I am sympathetic to this. scope is relatively simple, but once you start getting into more combinations it requires a bit of mental energy.

June 18, 2021

Re: DIP1000: 'return scope' ambiguity and why you can't make opIndex work

Posted by Dennis
in reply to ag0aep6g

On Friday, 18 June 2021 at 17:04:02 UTC, ag0aep6g wrote:

Geez, this isn't easy.

I know right? When I started to get the hang of it I was like "I should write a tutorial about this" followed closely by "how am I going to explain this in one go to someone who hasn't spelunked dmd/escape.d and looked at the relevant spec a dozen times?"

For this post I hoped to get across the idea that dmd has concepts of 'escaping by reference' for ref int and 'escaping by value' for int*, and that it currently sometimes goes wrong when you mix them. But there is so much more to cover:

* constructors act like they return this by ref, but still have return scope semantics
* out acts like ref
* in acts like... I don't know. With -preview=in it's implementation-defined whether it's ref scope or just scope, so is it also implementation-defined what return applies to then?
* auto ref... I don't know how that works internally.
* ref in foreach is actually not inherently scope like ref parameters are, and it has its own hole.
* when scope is inferred, could it change the meaning of return to apply to the value instead of the ref?
* ... who knows what I missed

Learning a complex system could be rewarding if afterwards you can write expressive code with lifetime tracking, but in the case of dip1000, after all your learning efforts you still can't write a routine that splits a scope string into a scope(string)[] because dip1000 simply can't express that.

June 18, 2021

Re: DIP1000: 'return scope' ambiguity and why you can't make opIndex work

Posted by Dukc
in reply to Dennis

On Friday, 18 June 2021 at 15:44:02 UTC, Dennis wrote:

[snip]

Wow, if nothing else you're doing a great job documenting DIP1000 with your posts. Thanks!

With regular pointers and ref parameters, I think we should change the semantics of scope ref to be simply the same as ref, i.e. no binding of scope to the underlying pointer. Other than that, the semantics you explained are understandable IMO.

I'd prefer to call the return scope storage class just a return storage class. Your post shows they are the same except for the corner cases with ref scope I just recommended ditching. Do you agree?

Of course, we also need to be able to annotate the this pointer as return. Simplest answer IMO: make the return storage class on a function declaration always bind to the this argument, with a compiler error if there is none. A return storage class for the returned value makes no sense anyway.

June 19, 2021

Re: DIP1000: 'return scope' ambiguity and why you can't make opIndex work

Posted by Ola Fosheim Grøstad
in reply to Dennis

On Friday, 18 June 2021 at 18:31:40 UTC, Dennis wrote:

I think this is the most significant issue. There is no way to extend it later without making signatures even more complicated.

June 19, 2021

Re: DIP1000: 'return scope' ambiguity and why you can't make opIndex work

Posted by ag0aep6g
in reply to Dennis

On 18.06.21 17:44, Dennis wrote:
> So it turns out the double duty of the `return` storage class is neither simple, nor expressive enough. Do you have any ideas how to move forward, and express the `Vector.opIndex` method without making the attribute soup worse? Keep in mind that dip25 (with `return ref`) is already in the language, but dip1000 (with `return scope`) is still behind a preview switch.

A quick and easy fix could be introducing `return(ref)` and `return(scope)`, allowing the programmer to pick what `return` binds to. Then `opIndex` can be written this way:

----
ref float opIndex(size_t i) return(scope) {
    return this._elements[i];
}
----

But:

* That's still hard to figure out, especially with methods because `ref this` is invisible.
* It doesn't address the underlying issues: one level of `scope` is not enough, and treating `ref` differently from other indirections is confusing.

I'm afraid DIPs 25 and 1000 are falling short.

June 19, 2021

Re: DIP1000: 'return scope' ambiguity and why you can't make opIndex work

Posted by Bradley Chatha
in reply to Ola Fosheim Grøstad

On Friday, 18 June 2021 at 17:02:41 UTC, Ola Fosheim Grøstad wrote:

I've suggested that one might want to make the function signatures more readable and keep "auxiliary stuff" on a separate line:

https://forum.dlang.org/thread/nzwobsazsawxvxbxhoue@forum.dlang.org

I personally think explicit lifetimes are easier to read, because I don't actually have to remember what keywords signify.

It also makes it possible to expand the capabilities of the compiler over time.

Being able to perform explicit, sort of 'algebra-esque' expressions of lifetime seems like a much more reasonable idea than the current magical keyword combinations.

What are the chances, though, that the path/syntax can be changed at this point, mostly in regards to convincing people? Not just for this suggestion, but any suggestion/criticism towards DIP 1000 in general?

My main worry is that we'll end up with an inflexible, hard to understand system that doesn't even do the job right. Yet another tacked on feature for the language, etc.

I've not been terribly optimistic for quite a while now about the general direction things like this end up going, so I'm not getting my hopes up in any way.
