February 26, 2015
On Thursday, 26 February 2015 at 17:56:14 UTC, Zach the Mystic wrote:
> On Wednesday, 25 February 2015 at 21:26:33 UTC, Marc Schütz wrote:
>> IIRC H.S. Teoh suggested a change to the compilation model. I think he wants to expand the minimal compilation unit to a library or executable. In that case, inference for all kinds of attributes will be available in many more circumstances; explicit annotation would only be necessary for exported symbols.
>
> You probably mean Dicebot:
>
> http://forum.dlang.org/post/otejdbgnhmyvbyaxatsk@forum.dlang.org

You're right! And I just (again wrongly) implicated Martin Nowak in this, too :-P

>
>> Anyway, it is a good idea to enable scope semantics implicitly for all references involved in @safe code. As far as I understand it, this is something you suggest, right? It will eliminate annotations except in cases where a parameter is returned, which - as you note - will probably be acceptable, because it's already been suggested in DIP25.
>
> Actually you could eliminate `return` parameters as well, I think. If the compiler has the body of a function, which it usually does, then there shouldn't be a need to mark *any* of the covariant function or parameter attributes. I think it's the kind of thing which should "Just Work" in all these cases.

Agreed. I had the export/import case in mind, where you don't have the function body. The signature then needs to contain `return` parameters, although `scope` would be implied by `@safe`.

>> I also think it is too coarse. Even variables declared at the same lexical scope have different lifetimes, because they are destroyed in reverse order of declaration. This is relevant if they contain references and have destructors that access the references; we need to make sure that no reference to a destroyed variable can be kept in a variable whose destructor hasn't yet run.
>
> It might be too coarse. We could reserve a few more bits for depth-constant declaration order. At the same time, it doesn't seem *that* urgent to me. But maybe I'm naive about this. Everything is being destroyed anyway, so what's the real danger?

struct A {
    B* b;
    ~this() {
        b.doSomething();
    }
}

struct B {
    void doSomething();
}

void foo() {
    A a;      // declscope(1)
    B b;      // declscope(1)
    a.b = &b; // refscope(1) <= declscope(1): OK
    // end of scope:
    // `b` is destroyed
    // `a`'s destructor is called
    // => you're calling a method on a destroyed object
}

Basically, every variable needs to get its own declscope; all declscopes form a strict hierarchy (no partial overlaps).
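A sketch of how per-variable declscopes could reject the example above (the sub-numbered notation is hypothetical, not current D):

```d
void foo() {
    A a;      // declscope(1.1): destroyed last, destructor runs after b's
    B b;      // declscope(1.2): destroyed first
    a.b = &b; // error under per-variable scopes:
              // refscope(1.2) > declscope(1.1), so `a` could outlive `b`
}
```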

>
>>> Principle 5: It's always un@safe to copy a declaration scope from a higher scopedepth to a reference variable stored at lower scopedepth. DIP69 tries to banish this type of thing only in `scope` variables, but I'm not afraid to banish it in all @safe code period:
>>
>> For backwards compatibility reasons, it might be better to restrict it to `scope` variables. But as all references in @safe code should be implicitly `scope`, this would mostly have the same effect.
>
> I guess this is the "Language versus Legacy" issue. I think D's strength is in its language, not its huge legacy codebase. Therefore, I find myself going with the #pleasebreakourcode crowd, for the sake of extending D's lead where it shines.

I am too, actually, but it would be a really hard sell.

> I'm not sure all references in safe code need to be `scope` - that would break a lot of code in and of itself, right?

Not sure how much would be affected. I actually suspect that most of it already behaves as if it were scope, with the exception of newly allocated memory. But those should ideally be "owned" instead.

But you're right, there still needs to be an opt-out possibility, most likely `static`.

>>> Principle 10: You'll probably have noticed that all scopes accumulate each other according to lexical ordering, and that's good news, because any sane person assigns and return references in lexical order.
>>
>> As you say, that's broken. But why does it need to be in lexical order in the first place? I would simply analyze the entire function first, assign reference scopes, and disallow circular relations (like `a = b; b = a;`).
>
> T* fun(T* a, T** b) {
>   T* c = new T;
>   c = a;
>   *b = c;
>   return c;
> }

Algorithm for inference of ref scopes (= parameter annotations):

1) Each variable, parameter, and the return value gets a ref scope (or ref depth). A ref scope can either be another variable (including `return` and `this`) or `static`.

2) The initial ref scope of variables is themselves.

3) Each time a variable (or something reachable through a variable) is assigned (returning is assignment to the return value), i.e. for each location in the function where an assignment happens, the new ref scope will be:

3a) the scope of the source, if it is larger or equal to the old scope

3b) otherwise (for disjunct scopes, or assignment from smaller to larger scope), it is an error (could potentially violate guarantees)

4) If a source scope refers to a variable (apart from the destination itself), for which not all assignments have been processed yet, it is put into a queue, to be evaluated later. For code like `a = b; b = a;` there can be dependency cycles. Such code will be disallowed.

How exactly the scope of a complex expression has to be computed is left open here.

In the end, if there was no error, all variables, parameters and the return value will have a minimum reference scope assigned. If that scope is the variable itself, they can be inferred as `scope`. If it is a parameter, that parameter gets an `out!identifier` or `return` annotation.

Note that the order in which the "assignments" occur inside the function doesn't matter. This is more restrictive than strictly necessary, but it's certainly ok in most cases, easy to work around when not, and it doesn't require data/control flow analysis.
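A minimal sketch of the cycle case in rule 4 (hypothetical trace, using the S(x) notation):

```d
// The ref scopes of a and b depend on each other, so no
// evaluation order resolves them; such code is disallowed.
void cycle(T* p, T* q) {
    T* a = p; // S(a) = p
    T* b = q; // S(b) = q
    a = b;    // S(a) now depends on S(b)
    b = a;    // S(b) depends on S(a): dependency cycle => error
}
```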

(By the way: inference cannot work for recursive functions.)

Your example:

T* fun(T* a, T** b) {
    // => S(a) = a
    // => S(b) = b
    // => S(return) = <doesn't matter>
    T* c; // == (T*).init == null
    // => S(c) = c
    c = new T;
    // `new` returns static, which is wider than c
    // => S(c) = static
    c = a;
    // => invalid, narrowing not allowed
    // (this is what I asked about, and now I
    // see why it's necessary)
    // let's assume it didn't happen, so that
    // the next two statements work
    *b = c;
    // => S(b) = S(c) = static
    return c;
    // => S(return) = S(c) = static
}

This algorithm can also be modified slightly to allow only partial inference (only of some variables, e.g. locals, when the parameters have already been explicitly annotated), as well as for checking whether the assignments are valid in this case.

I'm a bit tired now, so maybe this contains glaring mistakes, but if so, I hope they can be fixed :-) I hope it's clear what I'm trying to do here.

Something else that needs consideration: What happens when parameters alias each other? I think it is ok, because the checking phase will naturally prohibit calling functions in a way that would break the guarantees, but I haven't thought it through completely.

>> It's not so simple at all. For full-blown unique ownership, there needs to be some kind of borrow-checking like in Rust. I have some ideas how a simple borrow-checker can be implemented without much work (without data flow analysis as Rust does). It's basically my "const borrowing" idea (whose one flaw incidentally cannot be triggered by unique types, because it is conditioned on the presence of aliasing).
>>
>> There are still some things in the proposal that I'm sure can be simplified. We probably don't need new keywords like `noscope`. I'm not even sure the concept itself is needed.
>
> Unless you want to flat out ban copying a parameter reference to a global in @safe code, you will need `noscope`, or, as you suggested, `static`.

You're right, it's necessary.

> I'm actually thinking of reusing `noscope` as a function attribute (`@noscope` perhaps) which says that the function may return a heap or global reference. This is all that's necessary to complete an ownership system. If a scope has exactly 1 "mystery" bit set, and is known not to come from the heap or a global, then you know that it *must* contain a reference to exactly the parameter for which the mystery bit is set. You know exactly what it contains == ownership.

I will have to think about this, but I believe you cannot express such concepts as deadalnix's islands, or "const borrowing". But maybe, if we're lucky, I'm wrong :-)
February 26, 2015
On Thursday, 26 February 2015 at 20:46:07 UTC, deadalnix wrote:
> Consider :
>
> void foo(T** a) {
>     T** b = a; // OK
>     T*  = ...;
>     *b = c; // Legal because of your transitive clause,
>             // but not safe as a can have an
>             // arbitrary large lifetime.
> }

This example's incomplete, but I can guess you meant something like this:

void foo(T** a) {
    T** b = a; // OK
    T d;
    T* c = &d;
    *b = c; // Legal because of your transitive clause,
            // but not safe as a can have an
            // arbitrary large lifetime.
}

> This shows that anything you reach through an indirection can have anything from the same lifetime as the indirection up to an infinite lifetime (and anything in between). When using it as an lvalue, you should consider the largest possible lifetime; when using it as an rvalue, you should consider the smallest (this is the only way to be safe).

I'm starting to see what you mean. I guess it's only applicable to variables with double (or more) indirections (e.g. T**, T***, etc.), since only they can lose information with transitive scopes. Looks like we need a new rule: variables assigning to one of their double indirections cannot acquire a scope-depth greater than (or lifetime less than) their current one. Does that fix the problem?
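Applied to the example above, the proposed rule would check something like this (hypothetical semantics, not current D):

```d
void foo(T** a) @safe {
    T** b = a; // ok: b aliases a's indirection, same scope-depth
    T d;       // local: scope-depth(1)
    T* c = &d; // c refers to depth 1
    *b = c;    // rejected by the proposed rule: storing c through b
               // would let *a acquire a greater scope-depth than a's
}
```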
February 27, 2015
On Thursday, 26 February 2015 at 22:45:19 UTC, Zach the Mystic wrote:
> I'm starting to see what you mean. I guess it's only applicable to variables with double (or more) indirections (e.g. T**, T***, etc.), since only they can lose information with transitive scopes. Looks like we need a new rule: variables assigning to one of their double indirections cannot acquire a scope-depth greater than (or lifetime less than) their current one. Does that fix the problem?

Cool. I think that can work (I'm not 100% convinced, but at least something close to that should work). But that is probably too limiting.

Hence the proposed differentiation of lvalue and rvalues.
February 27, 2015
On Thursday, 26 February 2015 at 21:33:53 UTC, Marc Schütz wrote:
> On Thursday, 26 February 2015 at 17:56:14 UTC, Zach the Mystic wrote:
>> On Wednesday, 25 February 2015 at 21:26:33 UTC, Marc Schütz wrote:
> struct A {
>     B* b;
>     ~this() {
>         b.doSomething();
>     }
> }
>
> struct B {
>     void doSomething();
> }
>
> void foo() {
>     A a;      // declscope(1)
>     B b;      // declscope(1)
>     a.b = &b; // refscope(1) <= declscope(1): OK
>     // end of scope:
>     // `b` is destroyed
>     // `a`'s destructor is called
>     // => you're calling a method on a destroyed object
> }
>
> Basically, every variable needs to get its own declscope; all declscopes form a strict hierarchy (no partial overlaps).

Well, technically you only need one per variable with a destructor. Fortunately, this doesn't seem hard to add. Just another few bits, allowing as many declarations with destructors as seem necessary (4 bits = 15 variables, 5 bits = 31 variables, etc.), with the last being treated conservatively as unsafe. (I think anyone declaring 31+ variables with destructors in a function, and taking the addresses of those variables has bigger problems than memory safety!)

>> I guess this is the "Language versus Legacy" issue. I think D's strength is in its language, not its huge legacy codebase. Therefore, I find myself going with the #pleasebreakourcode crowd, for the sake of extending D's lead where it shines.
>
> I am too, actually, but it would be a really hard sell.

But look, Walter and Andrei were fine with adding `return ref` parameters. There's hope yet!

>> I'm not sure all references in safe code need to be `scope` - that would break a lot of code in and of itself, right?
>
> Not sure how much would be affected. I actually suspect that most of it already behaves as if it were scope, with the exception of newly allocated memory. But those should ideally be "owned" instead.
>
> But you're right, there still needs to be an opt-out possibility, most likely `static`.

I don't even have a use for `scope` itself in my proposal. The only risk I'm running is a lot of false positives -- safe constructs which the detection mechanism conservatively treats as unsafe because it can't follow the program logic. Still, it's hard for me to imagine even these appearing very much. And they can be put into @trusted lambdas -- all @trusted functions are treated as if they copy no references, effectively canceling any parameter attributes to the contrary.

>> T* fun(T* a, T** b) {
>>  T* c = new T;
>>  c = a;
>>  *b = c;
>>  return c;
>> }
>
> Algorithm for inference of ref scopes (= parameter annotations):
>
> 1) Each variable, parameter, and the return value gets a ref scope (or ref depth). A ref scope can either be another variable (including `return` and `this`) or `static`.
>
> 2) The initial ref scope of variables is themselves.

Actually, no. The *declaration* scope is the variable itself. The initial ref scope is whatever the variable is initialized with, or just null if nothing. We could even have a bit for "could be null". You might get some null-checking out of this for free. But then you'd need more attributes in the signature to indicate "could be null!" But crashing due to null is not considered a safety issue (I think!), so I haven't gone there yet.

> 3) Each time a variable (or something reachable through a variable) is assigned (returning is assignment to the return value), i.e. for each location in the function where an assignment happens, the new ref scope will be:
>
> 3a) the scope of the source, if it is larger or equal to the old scope

If scope depth is >= 1, you inherit the maximum of the source and the target. If it's 0, you do a bitwise OR on the mystery scopes (unless the compiler can easily prove it doesn't need to), so you can accumulate all possible origins of the assigned-to scope.
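Zach's accumulation step could be sketched as a small merge function (all names here are hypothetical; the real compiler representation would differ):

```d
// Hypothetical encoding: a depth plus one "mystery" bit per parameter.
struct RefScope {
    uint depth;   // 0 = global/heap, 1+ = nesting depth of locals
    uint mystery; // bit i set: value may reference parameter i
}

RefScope accumulate(RefScope target, RefScope source) {
    RefScope r;
    // depth >= 1: inherit the maximum of source and target
    r.depth = target.depth > source.depth ? target.depth : source.depth;
    // bitwise OR the mystery bits to accumulate all possible origins
    r.mystery = target.mystery | source.mystery;
    return r;
}
```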

> 3b) otherwise (for disjunct scopes, or assignment from smaller to larger scope), it is an error (could potentially violate guarantees)

I don't have "disjunct scopes". There's just greater than and less than. The mystery scopes are for figuring out what the parameter attributes are, and in the absence of inference, causing errors in safe code for the parameters not being accurately marked.

> 4) If a source scope refers to a variable (apart from the destination itself), for which not all assignments have been processed yet, it is put into a queue, to be evaluated later. For code like `a = b; b = a;` there can be dependency cycles. Such code will be disallowed.

No, my system is simpler. I want to make this proposal appealing from the implementation side as well as from the language side. You analyze the code in lexical order:

T* dum(T* a) {
  T* b = a; // b accumulates a
  return b; // okay... lexical ordering, b has a only
  T c;
  b = &c; // now b accumulates scopedepth(1);
  return b; // error here, but *only* here
}

The whole process relies on accumulating the scopes as the compiler encounters them. There are cases, such as branching conditionals combined with goto labels, or the beginnings of loops, where the logical order can differ from the lexical order. Only *these* cases are pushed onto an array and revisited when the branching conditional is complete. Because it's more likely (possibly mathematically certain) to catch all problems, these statements are "reheated" in reverse order. My reasoning for this is to keep compiler passes to a minimum, to save compilation time. In theory, all the scope assignments could be traversed again and again until no scope was left unturned, so to say, but I wanted to come up with something with what you call an O(1) compilation time.
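For instance, inside a loop the logical order can run behind the lexical order, which is where the "reheating" would kick in (a sketch of the proposed checking, not current compiler behavior):

```d
T* loopy(T* a) {
    T* b = a;       // b accumulates a
    foreach (i; 0 .. 2) {
        T* r = b;   // revisited ("reheated") after the loop body is
                    // complete: b may also hold &c from a prior pass
        T c;
        b = &c;     // b accumulates scopedepth(1)
    }
    return b;       // error: b may refer to the expired local c
}
```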

Honestly, it's almost impossible to say what the tax in compilation time will be until something's implemented (something I learned from Walter).

> How exactly the scope of a complex expression has to be computed is left open here.

If you call a function, the return value (if a reference) will have a scope which can be deduced from the function signature. You inherit the scope of what you pass accordingly, and pass those scopes on to the next function (if you're in a function chain), or the "out!" parameters, if need be:

T* fun(return T* a, T* b, out!b T** c); // signature only

void gun() {
  T e; // local
  T* f;
  T** g = new T*;
  f = fun(&e, f, g); // f inherits scope of &e; g inherits f
}

The results of a called function are just inherited as indicated by the function signature. I don't know what other kinds of "complex expression" you are referring to.

> In the end, if there was no error, all variables, parameters and the return value will have a minimum reference scope assigned. If that scope is the variable itself, they can be inferred as `scope`. If it is a parameter, that parameter get an `out!identifier` or `return` annotation.

The function's final return scope is used to assign "return" to the parameter attributes for the final function signature, in the case of attribute inference, and the parameter attributes are used to deduce the return scope when the function is called.

> Note that the order in which the "assignments" occur inside the function doesn't matter. This is more restrictive than strictly necessary, but it's certainly ok in most cases, easy to work around when not, and it doesn't require data/control flow analysis.

This is different from my proposal. I aim to just go in lexical order, with a little extra work done when lexical order is detected as possibly differing from the logical order (e.g. in a conditional inside a loop).

> (By the way: inference cannot work for recursive functions.)

I would like to see a "best effort" approach taken for solving the problem of recursive function inference. I think a function should be considered "innocent until proven guilty" as regards 'pure', for example. It's one of those things which seems like it's really hard to screw up. How could a function which is otherwise pure become impure just because it calls itself?

T hun(...) {
  [no impure code]
  hun(...);
  [no impure code]
}

I may be wrong, but I can't figure out how this function could magically become impure just because it calls itself. The same goes for the other attributes. And you can use the same trick, of pushing questionable expressions onto a stack or array, and just revisiting them at the end of the function to check for attribute violations. But I admit I don't really understand why attributes can't be inferred with recursive calls in the general case. Maybe somebody can explain to me what I'm missing here.
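The "innocent until proven guilty" idea could work as provisional inference: assume the attribute at the recursive call site, infer the rest of the body, then confirm the assumption. A sketch of the idea (not how DMD actually behaves):

```d
int hun(int n) {
    // 1) provisionally treat the recursive call site as pure;
    // 2) check the rest of the body for impure operations;
    // 3) nothing else is impure, so the assumption is consistent
    //    and the function can be inferred pure.
    if (n <= 1) return 1;  // no impure code
    return n * hun(n - 1); // recursive call: provisionally pure
}
```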

> Your example:
>
> T* fun(T* a, T** b) {
>     // => S(a) = a
>     // => S(b) = b
>     // => S(return) = <doesn't matter>
>     T* c; // == (T*).init == null
>     // => S(c) = c
>     c = new T;
>     // `new` returns static, which is wider than c

`c`'s reference hasn't been assigned until now, so it's neither wider nor narrower. We're not tracking null references yet, so I'm just treating them like they're global.

>     // => S(c) = static
>     c = a;
>     // => invalid, narrowing not allowed
>     // (this is what I asked about, and now I
>     // see why it's necessary)

Actually this is fine, I think. Even if `c` inherited something narrower than "new T" (i.e. depth 1), it would be fine, because it would now be considered depth(1) and could no longer be copied to anything with depth <1. It might or might not store a global, but for safety reasons it must now be treated with the narrowest scope it could possibly have. The error now would be if you copied it *back* to a parameter or a global. (Difference between `c`'s declaration scope `&c` = (1), and its reference scope = null, until otherwise assigned.)

>     // let's assume it didn't happen, so that
>     // the next two statements work
>     *b = c;
>     // => S(b) = S(c) = static
>     return c;
>     // => S(return) = S(c) = static
> }

This would be fine, since your code only has a `new T` and a `T*` parameter copied to c so far. In the case of inference, the function now infers: "T* fun(return T* a, out!a T** b)". In the absence of inference, it gives errors on both counts (in @safe code of course, as always). And we're not tracking null yet (which is a different issue), so I won't worry about that. Also, in non-branching code, the compiler could actually know that c was no longer null at this time.

> Something else that needs consideration: What happens when parameters alias each other? I think it is ok, because the checking phase will naturally prohibit calling functions in a way that would break the guarantees, but I haven't thought it through completely.

I'm not sure what you mean. I don't think it's a problem.

>> I'm actually thinking of reusing `noscope` as a function attribute (`@noscope` perhaps) which says that the function may return a heap or global reference. This is all that's necessary to complete an ownership system. If a scope has exactly 1 "mystery" bit set, and is known not to come from the heap or a global, then you know that it *must* contain a reference to exactly the parameter for which the mystery bit is set. You know exactly what it contains == ownership.
>
> I will have to think about this, but I believe you cannot express such concepts as deadalnix's islands, or "const borrowing". But maybe, if we're lucky, I'm wrong :-)

We'll see!
February 27, 2015
On Friday, 27 February 2015 at 00:44:21 UTC, deadalnix wrote:
> On Thursday, 26 February 2015 at 22:45:19 UTC, Zach the Mystic wrote:
>> I'm starting to see what you mean. I guess it's only applicable to variables with double (or more) indirections (e.g. T**, T***, etc.), since only they can lose information with transitive scopes. Looks like we need a new rule: variables assigning to one of their double indirections cannot acquire a scope-depth greater than (or lifetime less than) their current one. Does that fix the problem?
>
> Cool. I think that can work (I'm not 100% convinced, but at least something close to that should work). But that is probably too limiting.
>
> Hence the proposed differentiation of lvalue and rvalues.

Yeah, wasn't completely clear. I meant to say:

Variables assigning to one of their double indirections cannot acquire a scope-depth greater than (or lifetime less than) their current longest-lived one. Also, bear in mind, a parameter could be an "lvalue":

void fun(T* a, T** b) {
  *b = a;
}

I guess it's just better to use "source" and "target" than "lvalue" and "rvalue".

Also bear in mind that in the worst case scenario, any code can be made to work by putting it into the newly approved-of idiom: The @trusted Lambda! We want a safety mechanism conservative enough to catch all failures, accurate enough to avoid too many false positives (thus minimizing @trusted lambdas), easy enough to implement, and which doesn't tax compile time too heavily. The magic Four! I still have a few doubts (recursive inference, for example, which can probably be improved), but not too many.
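The idiom referred to is an immediately-called @trusted lambda, which is valid D; where the conservative checker produces a false positive, the escape hatch would look something like this (hypothetical scenario):

```d
@safe T* identity(T* a) {
    // Suppose the conservative checker wrongly flags this return as
    // an escaping reference; we verify it by hand and vouch for it:
    return () @trusted {
        return a;
    }();
}
```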
February 27, 2015
It is necessary to distinguish lvalues and rvalues, as it is not just assignment. Passing things as `ref` parameters, for instance, needs to follow these rules.
February 27, 2015
I put my own version into the Wiki, building on yours:
http://wiki.dlang.org/User:Schuetzm/scope2

It's quite similar to what you propose (at least as far as I understand it), and there are a few further user-facing simplifications, and provisions for backward compatibility. I intentionally kept it as concise as possible; there are neither justifications for particular decisions, nor any implementation details, nor examples. These can be added later.

For me, it's important to keep the implementation details and algorithms separate from the basic workings. Otherwise it's hard for me to fully understand it in all aspects.
February 27, 2015
On Friday, 27 February 2015 at 22:10:11 UTC, Marc Schütz wrote:
> I put my own version into the Wiki, building on yours:
> http://wiki.dlang.org/User:Schuetzm/scope2
>
> It's quite similar to what you propose (at least as far as I understand it), and there are a few further user-facing simplifications, and provisions for backward compatibility. I intentionally kept it as concise as possible; there are neither justifications for particular decisions, nor any implementation details, nor examples. These can be added later.

I like this phrase: "Because all relevant information about lifetimes is contained in the function signature..." This keeps seeming more and more important to me. There's no other place functions can "talk" to each other -- and they *really* need to talk to each other for any of these advanced features to work well. I'm pretty sure it's really the function signature which needs designing -- what to add, what can be deduced (and therefore not added), and how to express them all elegantly and simply. And of course, my favorite Castle in the Sky: attribute inference!

I won't really know how your proposal works until I see code examples.

> For me, it's important to keep the implementation details and algorithms separate from the basic workings. Otherwise it's hard for me to fully understand it in all aspects.

Okay, but hopefully some examples are forthcoming, because they help *me* think.
February 27, 2015
I think I have an inference algorithm that works. It can infer the required scope levels for local variables given the constraints of function parameters, and it can even infer the annotations for the parameters (in template functions). It can also cope with local variables that are explicitly declared as `scope`, though these are mostly unnecessary.

Interestingly, the rvalue/lvalue problem deadalnix found is only relevant during assignment checking, but not during inference. That's because we are free to widen the scope of variables that are to be inferred as needed.

It's based on the following principles:

* We start with the minimum possible scope a variable may have, which is empty for local variables, and its own lifetime for parameters.
* When a scoped value is stored somewhere, it is then reachable through the destination. Therefore, assuming the source's scope is fixed, the destination's scope must be widened to accommodate the source's scope.
* From the opposite viewpoint, a value that is to be stored somewhere must have at least the destination's scope. Therefore, assuming the destination's scope is fixed, the source's scope needs to be widened accordingly.
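A small trace of these principles (hypothetical notation; `scope(x)` denotes the inferred reference scope):

```d
T* gun(T* p) {     // scope(p) starts as p's own lifetime
    T* local;      // scope(local) starts empty
    local = p;     // store: widen scope(local) to include scope(p)
    return local;  // store into the return value: scope(p) must be
                   // widened to cover it, so p is inferred `return`
}
```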

I haven't formalized it yet, but I posted a very detailed step-by-step demonstration on my wiki talk page (nicer to read because it has syntax highlighting):
http://wiki.dlang.org/User_talk:Schuetzm/scope2

I will also add examples how return and static annotations are handled.
February 27, 2015
On Friday, 27 February 2015 at 23:05:39 UTC, Zach the Mystic wrote:
> On Friday, 27 February 2015 at 22:10:11 UTC, Marc Schütz wrote:
>> I put my own version into the Wiki, building on yours:
>> http://wiki.dlang.org/User:Schuetzm/scope2
>>
>> It's quite similar to what you propose (at least as far as I understand it), and there are a few further user-facing simplifications, and provisions for backward compatibility. I intentionally kept it as concise as possible; there are neither justifications for particular decisions, nor any implementation details, nor examples. These can be added later.
>
> I like this phrase: "Because all relevant information about lifetimes is contained in the function signature..." This keeps seeming more and more important to me. There's no other place functions can "talk" to each other -- and they *really* need to talk to each other for any of these advanced features to work well. I'm pretty sure it's really the function signature which needs designing -- what to add, what can be deduced (and therefore not added), and how to express them all elegantly and simply. And of course, my favorite Castle in the Sky: attribute inference!
>
> I won't really know how your proposal works until I see code examples.
>
>> For me, it's important to keep the implementation details and algorithms separate from the basic workings. Otherwise it's hard for me to fully understand it in all aspects.
>
> Okay, but hopefully some examples are forthcoming, because they help *me* think.

Yes, definitely! I already started with the inference algorithm, see the other post. But I'll go to bed now, it's already past midnight.