DIP25/DIP1000: My thoughts round 2

September 02, 2018

Posted by Chris M.

Round 2 because I had this whole thing typed up, and then my power went out on me right before I posted. I was much happier with how that one was worded too.


Basically I'd like to go over at length one of the issues I see with these DIPs (though I think it applies more to DIP1000), namely return parameters and what we could do to make them stronger. I will say I do not have the chops to go implement these ideas myself, even if I had approval and support. This is more to get my thoughts out there and see what other people think about them (frankly I'd be putting this in Study if it wasn't a ghost town over there).


First I'm going to go over DIP25 as I understand it for background, stealing some examples from the DIP page. Let's start with the following.

ref int id(ref int x) {
    return x; // pass-through function that does nothing
}

ref int fun() {
    int x;
    return id(x); // escape the address of local variable
}


The id() function just takes and returns a variable by ref, which is perfectly legal. However, it is open to abuse. As you see in fun(), id() is used to escape a reference to a local variable, which is obviously not desired behavior. The issue is how do we tell fun(), from id()'s signature alone, "id() will return a reference to whatever you pass it, one way or another. Make sure you don't give id()'s return value to something that'll outlive the argument you pass to id()" (though we need to say this in more concise terms, obviously). DIP25 solves this pretty nicely with return parameters:


// now this function is banned, since it has a ref parameter and returns by ref
ref int wrongId(ref int x) {
    return x; // ERROR! Cannot return a ref, please use "return ref"
}

// this is fine however
ref int id(return ref int x) {
    return x;
}

ref int fun() {
    int x;
    static int y;
    return id(x); // no, wait, since we're returning to a scope that'll outlive x, this errors at compile-time. Thanks return ref
    return id(y); // fine, sure, y lives forever
}


fun() now knows the return value of id() cannot outlive the argument it passes to id(). This allows us to disallow certain undesired behavior at compile-time, which is great.
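
To make the accepted case concrete, here is a minimal runnable sketch of the same idea. The names `counter` and `globalCounter` are made up for illustration and are not from the DIP page; the only DIP25 feature used is the `return ref` parameter shown above.

```d
import std.stdio : writeln;

// `return ref` tells every caller that the result may refer to `x`.
ref int id(return ref int x) @safe
{
    return x;
}

// Hypothetical module-level variable: it outlives any function call.
int counter;

// Accepted: the argument handed to id() outlives globalCounter()'s caller.
ref int globalCounter() @safe
{
    return id(counter);
}

void main() @safe
{
    globalCounter() = 41; // writes through the returned reference
    globalCounter()++;
    writeln(counter);     // counter is now 42

    int local = 1;
    id(local) += 1;       // fine: the reference is only used while `local` is alive
    writeln(local);       // local is now 2
}
```

The rejected case is the one already shown in fun(): returning id(x) for a local x would hand the caller a reference that outlives x.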

With that in mind, let's move on to DIP1000. Namely, I'm looking at this issue Walter filed.

https://issues.dlang.org/show_bug.cgi?id=19097

I'll try to detail it here (and steal more examples, thanks Mike :*) ). It has to do with the same principles I outlined above for DIP25, only this time we're using pointers rather than refs.

First example, which works as expected


int* frank(return scope int* p) { return p; } // basically id()

void main()
{
    // lifetimes end in reverse order from which they are declared
    int* p;  // `p`'s lifetime is longer than `i`'s
    int i;   // `i`'s lifetime is longer than `q`'s
    int* q;  // `q`'s lifetime is the shortest

    q = frank(&i); // ok because `i`'s lifetime is longer than `q`'s
    p = frank(&i); // error because `i`'s lifetime is shorter than `p`'s
}


frank() marks its parameter as return, to signal to main() that wherever main() puts frank()'s return value, it can't outlive what main() passed as an argument to frank(). All fine and dandy.
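
For anyone who wants to poke at this, below is a small sketch of the accepted half of that example. As an assumption on my part, main() is deliberately left without @safe so the snippet builds even without the DIP1000 switch (`-dip1000` at the time of writing, `-preview=dip1000` in later DMD releases); the lifetime checks discussed here only apply to @safe code compiled with that switch.

```d
// `return scope`: `p` is not stored anywhere that outlives the call,
// except that it may come back out through the return value.
int* frank(return scope int* p) @safe
{
    return p;
}

void main()
{
    int* p;      // declared first, so it outlives `i`
    int i = 7;
    int* q;      // declared last, so `i` outlives it

    q = frank(&i);   // accepted: `q` goes away before `i` does
    *q += 1;
    assert(i == 8);  // the write went through to `i`

    // p = frank(&i); // per the example above, this is the line that is
    //                // rejected in @safe code once DIP1000 checking is on
}
```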

Second example (I'd pay closer attention to betty()'s definition here)


void betty(ref scope int* r, return scope int* p)
{
    r = p; // (1) Error: scope variable `p` assigned to `r` with longer lifetime
}

void main()
{
    int* p;
    int i;
    int* q;

    betty(q, &i); // (2) ok
    betty(p, &i); // (3) should be error
}


Hang on, why can't I compile betty(), when it's doing the same thing as frank(), only putting the return value in the first parameter rather than returning it? No reason, I absolutely should be able to compile and use betty(). So the question becomes, how can betty() tell main(), that what main() passes as the first argument to betty() can't outlive what's passed as the second argument? Marking the second parameter return does not work here, as that only ties its lifetime to the return value. It can't be used on arbitrary parameters. How to resolve this?

Walter's solution is as follows. If a function is void, and its first parameter is ref, apply the "return" annotation to the first parameter rather than the return value of the function. Using these conditions, betty() now compiles, and main() errors at (3) as expected. However I find this solution too restrictive. While it fits many functions within Phobos, we are tying users to this special case and forcing them to unnecessarily refactor their code around it. What if I don't want it to be void and want the function to return something as well? What if I want to return via the second parameter? This just seems to be setting up another trap for users to fall into.

I talked about this in the "Is @safe still a work-in-progress?" thread, but I'll repeat it here again. There is a cleaner way to do this. I'll demonstrate using some borrowed Rust syntax, but remember the syntax doesn't matter too much here so much as the idea. Rather than using "return", we instead annotate the parameters like so


void betty(ref scope int*'a r, scope int*'a p) // okay it's not pretty
{
    r = p; // cool, p's lifetime is tied to r's lifetime
}

void main()
{
    int* p;
    int i;
    int* q;

    betty(q, &i); // (2) ok
    betty(p, &i); // (3) error
}


Good, these are the results I expect. What if I want to output to the second parameter?


void betty(scope int*'a r, ref scope int*'a p)
{
    p = r; // cool, p's lifetime is tied to r's lifetime
}

void main()
{
    int* p;
    int i;
    int* q;

    betty(&i, q); // (2) ok
    betty(&i, p); // (3) error
}


Nice, that'll work too

Here's frank()


int*'a frank(scope int*'a p) { return p; } // basically id()

void main()
{
    // lifetimes end in reverse order from which they are declared
    int* p;  // `p`'s lifetime is longer than `i`'s
    int i;   // `i`'s lifetime is longer than `q`'s
    int* q;  // `q`'s lifetime is the shortest

    q = frank(&i); // ok because `i`'s lifetime is longer than `q`'s
    p = frank(&i); // error because `i`'s lifetime is shorter than `p`'s
}


These annotations are much more flexible since they can be moved any which way around the function signature, and have the added benefit of visually tying together lifetimes. For further consistency, the same syntax could also be extended back to DIP25:


ref'a int id(ref'a int x) {
    return x;
}


Hopefully that was coherent. Again, this is mostly for me to get my thoughts out there, but I'm also interested in what other people think about this.

On Sunday, 2 September 2018 at 05:14:58 UTC, Chris M. wrote:
> Hopefully that was coherent. Again, this is mostly for me to get my thoughts out there, but I'm also interested in what other people think about this.

Thanks! Please add anything you think is missing to https://github.com/dlang/dlang.org/pull/2453 since Walter doesn't seem to be interested.

Rust's lifetime syntax is noisy - the scope name is repeated, and why require a name if it's usually not given a meaningful one (`a`)?

Rust is more limited semantically due to unique mutability, so it may have different requirements for function signatures to D. (I think they recently tweaked the rules on how lifetimes can be inferred).

My syntax for parameters that may get aliased to another parameter is to write the parameter number that may escape it in its scope attribute:

On Sunday, 2 September 2018 at 05:14:58 UTC, Chris M. wrote:
> void betty(ref scope int*'a r, scope int*'a p) // okay it's not pretty

void betty(ref scope int* r, scope(1) int* p);

p is documented as (possibly) escaped in parameter 1.

> void betty(scope int*'a r, ref scope int*'a p)

void betty(scope(2) int* r, ref scope int* p);

I think my syntax is lightweight, clearer than Walter's `return` for void functions PR, but just as expressive as your examples.

> int*'a frank(scope int*'a p) { return p; } // basically id()

I'd keep `return scope` for p.

There's also:

void swap(ref scope(2) T a, ref scope(1) T b);

swap(r[0], r[1]);

Arguments to a,b must have the same lifetime. Without support for this, we might need to use e.g. `swapAt(r, 0, 1)` instead of indexing throughout range algorithms.

On Tuesday, 4 September 2018 at 16:36:20 UTC, Nick Treleaven wrote:
> Rust's lifetime syntax is noisy - the scope name is repeated, and why require a name if it's usually not given a meaningful one (`a`)?
>
> Rust is more limited semantically due to unique mutability, so it may have different requirements for function signatures to D. (I think they recently tweaked the rules on how lifetimes can be inferred).

As I was typing this up I was thinking about how Rust's rules with how an object can be borrowed would affect it, but I can't think of any examples off the top of my head.

> My syntax for parameters that may get aliased to another parameter is to write the parameter number that may escape it in its scope attribute:
>
> On Sunday, 2 September 2018 at 05:14:58 UTC, Chris M. wrote:
>> void betty(ref scope int*'a r, scope int*'a p) // okay it's not pretty
>
> void betty(ref scope int* r, scope(1) int* p);
>
> p is documented as (possibly) escaped in parameter 1.
>
>> void betty(scope int*'a r, ref scope int*'a p)
>
> void betty(scope(2) int* r, ref scope int* p);
>
> I think my syntax is lightweight, clearer than Walter's `return` for void functions PR, but just as expressive as your examples.

I wouldn't disagree, it's much cleaner than what I had.

>> int*'a frank(scope int*'a p) { return p; } // basically id()
>
> I'd keep `return scope` for p.

That's true, it'd also allow your other syntax to be fitted over retroactively.

> There's also:
>
> void swap(ref scope(2) T a, ref scope(1) T b);
>
> swap(r[0], r[1]);
>
> Arguments to a,b must have the same lifetime. Without support for this, we might need to use e.g. `swapAt(r, 0, 1)` instead of indexing throughout range algorithms.

That's a good example.

On Tuesday, 4 September 2018 at 16:36:20 UTC, Nick Treleaven wrote:
> My syntax for parameters that may get aliased to another parameter is to write the parameter number that may escape it in its scope attribute:
>
> On Sunday, 2 September 2018 at 05:14:58 UTC, Chris M. wrote:
>> void betty(ref scope int*'a r, scope int*'a p) // okay it's not pretty
>
> void betty(ref scope int* r, scope(1) int* p);
>
> p is documented as (possibly) escaped in parameter 1.

Would using parameter names instead of numbers work? As an unfamiliar reader, it wouldn't be clear at all to me what `scope(1)` meant, but `scope(r) int* p` would at least suggest that there's some connection between `p` and `r`.

On Wednesday, 5 September 2018 at 01:06:47 UTC, Paul Backus wrote:
> On Tuesday, 4 September 2018 at 16:36:20 UTC, Nick Treleaven wrote:
>> My syntax for parameters that may get aliased to another parameter is to write the parameter number that may escape it in its scope attribute:
>>
>> On Sunday, 2 September 2018 at 05:14:58 UTC, Chris M. wrote:
>>> void betty(ref scope int*'a r, scope int*'a p) // okay it's not pretty
>>
>> void betty(ref scope int* r, scope(1) int* p);
>>
>> p is documented as (possibly) escaped in parameter 1.
>
> Would using parameter names instead of numbers work? As an unfamiliar reader, it wouldn't be clear at all to me what `scope(1)` meant, but `scope(r) int* p` would at least suggest that there's some connection between `p` and `r`.

It's indeed imho better, as numbered parameters are a pita. Any change is annoying and fragile. I cannot count how often in C I had issues with annotations like __attribute__((nonnull(5,9))) and __attribute__((format(printf, 3, 4))) when I had to change the parameters.

On Sunday, 2 September 2018 at 05:14:58 UTC, Chris M. wrote:
> Hopefully that was coherent. Again, this is mostly for me to get my thoughts out there, but I'm also interested in what other people think about this.

Somewhat related, I was reading through this thread on why we can't do ref variables and thought this was interesting. A lot of these use cases could be prevented. I tacked my own comments on with //**

https://forum.dlang.org/post/aqvtunmdqfkrsvzlgcet@forum.dlang.org

struct S {
    return ref int r;
}

//ref local variable/stack, Ticking timebomb
//compiler may refuse
//** nope, never accept this
void useRef(ref S input, int r) {
    input.r = r; //** error
}

//should be good, right?
S useRef2(S input, return ref int r) { //Can declare @safe, right???
    input.r = r; //maybe, maybe not. //** sure we can
    return S;
}

//Why should indirect care if it's local/stack or heap?
//** someone double-check my rationale here, but it should be fine
S indirect(return ref int r) {
    return useRef2(S(), r);
}

//local variables completely okay to ref! Right?
//** Nope! Reject! indirect2() knows whatever receives the return value can't outlive r
S indirect2() {
    int r;
    return useRef2(S(), r);
}

S someScope() {
    int* pointer = new int(31); //i think that's right
    int local = 127;
    S s;

    //reference to calling stack! (which may be destroyed now);
    //Or worse it may silently work for a while
    //** or the function never gets compiled
    useRef(s, 99);
    assert(s.r == 99);
    return s;

    s = useRef2(s, pointer); //or is it *pointer? //** no clue what to say about this one
    assert(s.r == 31); //good so far if it passes correctly
    return s; //good, heap allocated

    s = useRef2(s, local); //** fine here, local outlives s
    assert(s.r == 127); //good so far (still local)
    return s; //Ticking timebomb! //** but we reject it here

    s = indirect(local); //** fine here, local outlives s
    assert(s.r == 127); //good so far (still local)
    return s; //timebomb! //** reject again

    s = indirect2(); //** never accepted in the first place
    return s; //already destroyed! Unknown consequences!
}
