Jump to page: 1 26  
Page
Thread overview
My Reference Safety System (DIP???)
Feb 25, 2015
Zach the Mystic
Feb 25, 2015
deadalnix
Feb 26, 2015
Zach the Mystic
Feb 26, 2015
Zach the Mystic
Feb 26, 2015
Zach the Mystic
Feb 26, 2015
deadalnix
Feb 26, 2015
Zach the Mystic
Feb 27, 2015
deadalnix
Feb 27, 2015
Zach the Mystic
Feb 27, 2015
deadalnix
Feb 27, 2015
Marc Schütz
Feb 27, 2015
deadalnix
Feb 28, 2015
Marc Schütz
Feb 28, 2015
Marc Schütz
Mar 01, 2015
deadalnix
Mar 02, 2015
Marc Schütz
Feb 28, 2015
Zach the Mystic
Feb 28, 2015
Marc Schütz
Feb 25, 2015
Marc Schütz
Feb 25, 2015
H. S. Teoh
Feb 26, 2015
Marc Schütz
Feb 26, 2015
Zach the Mystic
Feb 26, 2015
Marc Schütz
Feb 27, 2015
Zach the Mystic
Feb 27, 2015
Marc Schütz
Feb 27, 2015
Zach the Mystic
Feb 27, 2015
Marc Schütz
Feb 28, 2015
Marc Schütz
Mar 01, 2015
Zach the Mystic
Mar 01, 2015
Zach the Mystic
Mar 01, 2015
Marc Schütz
Mar 01, 2015
Zach the Mystic
Mar 02, 2015
deadalnix
Mar 02, 2015
Zach the Mystic
Mar 02, 2015
Zach the Mystic
Mar 02, 2015
deadalnix
Mar 02, 2015
Zach the Mystic
Mar 02, 2015
deadalnix
Mar 02, 2015
Marc Schütz
Mar 02, 2015
deadalnix
Mar 03, 2015
Marc Schütz
Mar 02, 2015
Zach the Mystic
Mar 02, 2015
deadalnix
Mar 02, 2015
Zach the Mystic
Mar 02, 2015
deadalnix
Mar 02, 2015
Zach the Mystic
Mar 03, 2015
deadalnix
Mar 03, 2015
Zach the Mystic
Mar 03, 2015
deadalnix
Mar 03, 2015
deadalnix
Mar 02, 2015
Marc Schütz
February 25, 2015
So I've been thinking about how to do safety for a while, and this is how I would do it if I got to start from scratch. I think it can be harnessed to D, but I'm worried that people will be confused by it, or that there might be a show-stopping use case I haven't thought of, or that it is simply too cumbersome to be taken seriously, but I'll make a DIP when it overcomes these three obstacles.

I'm feeding off the momentum built by the approval of DIP25, and off of other recent `scope` proposals:
http://wiki.dlang.org/DIP25
http://wiki.dlang.org/User:Schuetzm/scope
http://wiki.dlang.org/DIP69

This system goes farther than either DIP25 or DIP69 towards complete safety, but is simpler and easier to implement I (I think) than Mark Schutz's and deadalnix's proposal. It is not an ownership or reference counting system, but can serve as the foundation to one. Which leads to...

Principle 1: Memory safety is indispensable to ownership, but not the other way around. Memory safety focuses on all the things which *might* happen, and casts a wide net, akin to an algebraic union, whereas ownership targets specific things, focuses on what *will* happen, and is akin to the algebraic intersection of things. I will therefore present the memory safety system first, leave grafting an ownership system on top of it for later.

Principle 2: The Function is the key unit of memory safety. The compiler must never need to leave the function it is compiling to verify that it is safe. This means that no information important to safety can be excluded from the signatures of the functions that the compiling function is calling. This principle has already been conceded in part by Walter and Andrei's acceptance of `return ref` parameters in DIP25, which simply implements the most common use case where safety is needed. Here I am taking this principle to the extreme, in the interest of total safety. But speaking of function signatures,

Principle 3: Extra function and parameter attributes are the tradeoff for great memory safety. There is no other way to support both encapsulation of control flow (Principle 2) and the separate-compilation model (indispensable to D). Function signatures pay the price for this with their expanding size. I try to create the new attributes for the rare case, as opposed to the common one, so that they don't appear very often.

Principle 4: Scopes. My system has its own notion of scopes. They are compile time information, used by the compiler to ensure safety. Every declaration which holds data at runtime must have a scope, called its "declaration scope". Every reference type (defined below in Principle 6) will have an additional scope called its "reference scope". A scope consists of a very short bit array, with a minimum of approximately 16 bits and reasonable maximum of 32, let's say. For this proposal I'm using 16, in order to emphasize this system's memory efficiency. 32 bits would not change anything fundamental, only allow the compiler to be a little more precise about what's safe and what's not, which is not a big deal since it conservatively defaults to @system when it doesn't know.

So what are these bits? Reserve 4 bits for an unsigned integer (range 0-15) I call "scopedepth". Scopedepth is easier for me to think about than lifetime, of which it is simply the inverse, with (0) scopedepth being infinite lifetime, 1 having a lifetime at function scope, etc. Anyway, a declaration's scopedepth is determined according to logic similar that found in DIP69 and Mark Schutz's proposal:

int r; // declaration scopedepth(0)

void fun(int a /*scopedepth(0)*/) {
  int b; // depth(1)
  {
    int c; // depth(2)
    {
      int d; // (3)
    }
    {
      int e; // (3)
    }
  }
  int f; // (1)
}

Principle 5: It's always un@safe to copy a declaration scope from a higher scopedepth to a reference variable stored at lower scopedepth. DIP69 tries to banish this type of thing only in `scope` variables, but I'm not afraid to banish it in all @safe code period:

void gun() @safe {
  T* t; // t's declaration depth: 1
  T u;
  {
    T* uu = &u; // fine, this is normal
    T tt;
    t = &tt; // t's reference depth: 2, error, un@safe
  }
  // now t is corrupted
}

So you'd have to enclose "t = &tt;" above in a @trusted lambda or a @system block. The truth is, it is absurd to copy the address of something with shorter lifetime into something with longer lifetime... what use would you ever have for it in the longer-lived variable? I'm therefore simplifying the system by making all instances of this unsafe.

Looking at Principle 5, I realize I forgot:

Principle 6: Reference variables: Any data which stores a reference is a "reference variable". That includes any pointer, class instance, array/slice, `ref` parameter, or any struct containing any of those. For the sake of simplicity, I boil _all_ of these down to "T*" in this proposal. All reference types are effectively the _same_ in this regard. DIP25 does not indicate that it has any interest in expanding beyond `ref` parameters. But all reference types are unsafe in exactly the same way as `ref` is. (By the way, see footnote [1] for why I think `ref` is much different from `scope`). I don't understand the restriction of dIP25 to `ref` paramteres only. Part of my system is to expand `return` parameter to all reference types.

Principle 7: In this system, all scopes are *transitive*: any reference type with double indirections inherits the scope of the outermost reference. Think of it this way:

T** grun() {
  T** tpp = new T*; // reference scopedepth(0)
  return tpp; // fine, safe

  static T st; // decl depth(0)
  T* tp = &st; // ref depth(0)
  *tpp = tp;
  return tpp; // safe, all depths still 0

  T t; // decl depth(1)
  tp = &t; // tp reference depth now (1)
  *tpp = &tp; // safe, depths all 1
  return tpp; // un@safe
}

If a reference type contains *any* pointer, no matter how indirect, to a local scope, the *whole* type is corrupted when the scope finishes.

Principle 8: Any time a reference is copied, the reference scope inherits the *maximum* of the two scope depths:

T* gru() {
  static T st; // decl depth(0)
  T t; // decl depth(1)
  T* tp = &t; // ref depth(1)
  tp = &st; // ref depth STILL (1)
  return tp; // error!
}

If you have ever loaded a reference with a local scope, it retains that scope level permanently, ensuring the safety of the reference.

Whatever your worries about scopedepth, I want to introduce the purpose of the other 12 bits in a scope.

I said a scope consisted of 16 bits, and I only used 4 so far. What are the other 12 for, then? Simple, we need one bit for each of the function's parameters. Let's reserve 8 bits for them. All references copied to or from the 8th parameter or above are treated as if they copied to *all* of them. Very few functions will do this, so we paint them all with a broad brush, for safety reasons. (Likewise, all scopedepths above 15 are treated the same.)

We have 4 bits left. These are for the "special" parameters: One for the implicit `this` parameter of member functions, one bit for the context of a nested function, one special bit to symbolize access to or from global or heap variables, and one bit left over in case I missed something. Remember, the "luxury" version would have a whole 32, or even 64 bits to play around with, but 16 will suffice in most cases.

Each of the functions parameters is initialized with its own bit set. All these bits represent "mystery scopes" -- that is, we don't know what their scope is in the calling function, but:

Principle 8: We don't need to know! For all intents and purposes, a reference parameter has infinite lifetime for the duration of the function it is compiled in. Whenever we copy any reference, we do a bitwise OR on *all* of the mystery scopes. The new reference accumulates every scope it has ever had access to, directly or indirectly.

T* fun(T* a, T* b, T** c) {
  // the function's "return scope" accumulates `a` here
  return a;
  T* d = b; // `d's reference scope accumulates `b`

  // the return scope now accumulates `b` from `d`
  return d;

  *c = d; // now mutable parameter `c` gets `d`

  static T* t;
  *t = b; // this might be safe, but only the caller can know
}

All this accumulation results in the implicit function signature:

T* fun(return T* a, // DIP25
       return noscope T* d, // DIP25 and DIP71
       out!b T** c  // from DIP71
       ) @safe;

(See footnote [2] for a comment on on the `out!` and `noscope` attributes.)

Principle 9: When calling a function, DIP25 (expanded to all reference types) in combination with DIP71 gives you everything you need to know to ensure total memory safety. If we have a function signature:

T* gun(return T* a, noscope T* b, out!b T** c) @safe;

T* hun(return T* a1, T** b2) {
  T t;
  T* tp, tp2;
  tp = new T; // depth zero
  tp2 = gun(a1,  // tp2 accumulates a1 based on fun()'s signature
           tp, // okay to copy a new T to a global pointer
           b2); // b2 now loaded with tp's global only scope
  return tp2; // okay, all we have so far is a1, marked `return`

  tp = &t; // tp now loaded with local t's scope
  return gun(tp, // error, gun() inherits tp's local scope
             tp2, // tp2 has a1 only right now
             b2, // error, b2 not marked `out!a1`
}

The point is that there's nothing gun() can do to corrupt hun() on its own, since all its exits are blocked.

Principle 10: You'll probably have noticed that all scopes accumulate each other according to lexical ordering, and that's good news, because any sane person assigns and return references in lexical order. The fun part of this proposal is that for 99.99% of uses the safety mechanism will catch the load ordering accurately on the first pass, with hardly any compiler effort. It's safe because it accumulates and never loses information. But there is a way to break this system, although there are only two types of people who would ever do it: malicious programmers trying to break the safety system, and fools. This is how you do it:

T* what() {
  T t;
  T* yay;
  foreach(i; 1..4) {
    if (i == 3)
      yay = new T;
    else if (i == 2)
      return yay;
    else if (i == 1)
      yay = &t;
  }
}

The good news is that even this kind of malicious coding can be detected. The bad news is that checking for this 0.01% of code may take up an unfriendly amount of compile time. Here's the way I thought of to check even for this malicious code:

The lexical ordering can only be different from the logical order of execution when one is inside a branching conditional which is inside a "jumpback" situation, where the code can be revisited. A jumpback can only occur after a jump label has been found (rare), or inside a loop (common). Anytime a reference is copied under the potentially dangerous condition, push the statement that copied it onto a stack. When the end of the conditional has been reached, revisit each statement in reverse order and "reheat" the relevant scopes.

Aside from this unfortunate "gotcha", D would be 100% memory safe with this system (at least in single-threaded code -- exceptions and thread safety different issues I haven't fully thought through).

Conclusion

1. With this system as foundation, an effective ownership system is easily within reach. Just confine the outgoing scopes to a single parameter and no globals, and you have your ownership. You might need another (rare) function attribute to help with this, and a storage class (e.g. `scope`, `unique`) to give you an error when you do something wrong, but the groundwork is 90% laid.

2. Do I realize that it's weird dressing up function parameters with so much information about what they do? Yes I do. But I think it's important to see what 100% safety would actually look like, even if it's rejected on account of being too burdensome. And it wouldn't even *be* burdensome if attribute inference were made uniform throughout the language. The function signatures could then appear dressed up in their full glory typically only in compiler generated interface files, and other places where programmers, not compilers, wanted them. Anyway, this is my reference safety system. Pop it with your needles!

[1] The problems with `ref` come from the fact that it is the only storage class which changes the way a program works without giving you an error:

void notRef(/*ref*/ int a) { ++a; }
void yesRef(  ref   int a) { ++a; }

void test() {
  int a = 0;
  yesRef(a); // a == 1
  notRef(a); // a still 1
}

Both yesRef() and notRef() are accepted, but it changes what happens which one you use. Adding or subtracting any other attribute will at most give you an error, but won't silently change things. `ref`, an "immutable pointer with value semantics," is a complicated beast, a type but not a type. I say this because `scope` and its variants are not so complicated. `scope` is like most other attributes. All is does is help the compiler optimize things and generate errors when misused. Its presence or absence will never change what the program actually does, and therefore it should not be lumped together with the problems associated with `ref`. [End 1]

[2] Since the discussion to DIP71:

http://forum.dlang.org/post/xjhvpmjrlwhhgeqyoipv@forum.dlang.org

...which proposes `out!` and `noscope` parameters as a way of warning the caller what is done inside the function, I have started to consider the issue of ownership in addition to reference safety. I'm not wedded to the name `noscope` in the role I proposed for it. Mark Schutz suggested reusing keyword `static` instead, to indicate that a reference is copied to a global variable. This may be wise, in light of the fact that an ownership system may require something like `noscope` for a subtly different purpose. But there's no point in discussing details unless the whole proposal gains traction first. [End 2]
February 25, 2015
On Wednesday, 25 February 2015 at 01:12:15 UTC, Zach the Mystic wrote:
> So what are these bits? Reserve 4 bits for an unsigned integer (range 0-15) I call "scopedepth". Scopedepth is easier for me to think about than lifetime, of which it is simply the inverse, with (0) scopedepth being infinite lifetime, 1 having a lifetime at function scope, etc. Anyway, a declaration's scopedepth is determined according to logic similar that found in DIP69 and Mark Schutz's proposal:
>
> int r; // declaration scopedepth(0)
>
> void fun(int a /*scopedepth(0)*/) {
>   int b; // depth(1)
>   {
>     int c; // depth(2)
>     {
>       int d; // (3)
>     }
>     {
>       int e; // (3)
>     }
>   }
>   int f; // (1)
> }
>

You have element of differing lifetime at scope depth 0 so far.

> Principle 5: It's always un@safe to copy a declaration scope from a higher scopedepth to a reference variable stored at lower scopedepth. DIP69 tries to banish this type of thing only in `scope` variables, but I'm not afraid to banish it in all @safe code period:
>
> void gun() @safe {
>   T* t; // t's declaration depth: 1
>   T u;
>   {
>     T* uu = &u; // fine, this is normal
>     T tt;
>     t = &tt; // t's reference depth: 2, error, un@safe
>   }
>   // now t is corrupted
> }
>

Bingo. However, when you throw goto into the mix, weird thing happens. The general idea is good but need refining.

> Principle 6: Reference variables: Any data which stores a reference is a "reference variable". That includes any pointer, class instance, array/slice, `ref` parameter, or any struct containing any of those. For the sake of simplicity, I boil _all_ of these down to "T*" in this proposal. All reference types are effectively the _same_ in this regard. DIP25 does not indicate that it has any interest in expanding beyond `ref` parameters. But all reference types are unsafe in exactly the same way as `ref` is. (By the way, see footnote [1] for why I think `ref` is much different from `scope`). I don't understand the restriction of dIP25 to `ref` paramteres only. Part of my system is to expand `return` parameter to all reference types.
>

Bingo 2!

> Principle 7: In this system, all scopes are *transitive*: any reference type with double indirections inherits the scope of the outermost reference. Think of it this way:
>

It is more complex than that, and this is where most proposals fail short (including this one and DIP69). If you want to disallow the assignment of a reference to something with a short lifetime, you can't consider scope transitive when used as a lvalue. You can, however, consider it transitive when used as an rvalue.

The more general rule is that you want to consider the largest possible lifetime of an lvalue, and the smallest possible one for an rvalue.

When going through an indirection, that will differ, unless we choose to tag all indirections, which is undesirable.

> Principle 8: Any time a reference is copied, the reference scope inherits the *maximum* of the two scope depths:
>

That makes control flow analysis easier, so I can buy this :)

> Principle 8: We don't need to know! For all intents and purposes, a reference parameter has infinite lifetime for the duration of the function it is compiled in. Whenever we copy any reference, we do a bitwise OR on *all* of the mystery scopes. The new reference accumulates every scope it has ever had access to, directly or indirectly.
>

That would allow to copy a parameter reference to a global, which is dead unsafe.

There is some goodness in there. Please address my comment and tell me if I'm wrong, but I think you didn't covered all bases.
February 25, 2015
I didn't yet have much time to look at it closely enough, but I'll already make some comments.

On Wednesday, 25 February 2015 at 01:12:15 UTC, Zach the Mystic wrote:
> Principle 3: Extra function and parameter attributes are the tradeoff for great memory safety. There is no other way to support both encapsulation of control flow (Principle 2) and the separate-compilation model (indispensable to D). Function signatures pay the price for this with their expanding size. I try to create the new attributes for the rare case, as opposed to the common one, so that they don't appear very often.

IIRC H.S. Teoh suggested a change to the compilation model. I think he wants to expand the minimal compilation unit to a library or executable. In that case, inference for all kinds of attributes will be available in many more circumstances; explicit annotation would only be necessary for exported symbols.

Anyway, it is a good idea to enable scope semantics implicitly for all references involved in @safe code. As far as I understand it, this is something you suggest, right? It will eliminate annotations except in cases where a parameter is returned, which - as you note - will probably be acceptable, because it's already been suggested in DIP25.

>
> Principle 4: Scopes. My system has its own notion of scopes. They are compile time information, used by the compiler to ensure safety. Every declaration which holds data at runtime must have a scope, called its "declaration scope". Every reference type (defined below in Principle 6) will have an additional scope called its "reference scope". A scope consists of a very short bit array, with a minimum of approximately 16 bits and reasonable maximum of 32, let's say. For this proposal I'm using 16, in order to emphasize this system's memory efficiency. 32 bits would not change anything fundamental, only allow the compiler to be a little more precise about what's safe and what's not, which is not a big deal since it conservatively defaults to @system when it doesn't know.

This bitmask seems to be mostly an implementation detail. AFAIU, further below you're introducing some things that make it visible to the user. I'm not convinced this is a good idea; it looks complicated for sure.

I also think it is too coarse. Even variables declared at the same lexical scope have different lifetimes, because they are destroyed in reverse order of declaration. This is relevant if they contain references and have destructors that access the references; we need to make sure that no reference to a destroyed variable can be kept in a variable whose destructor hasn't yet run.

>
> So what are these bits? Reserve 4 bits for an unsigned integer (range 0-15) I call "scopedepth". Scopedepth is easier for me to think about than lifetime, of which it is simply the inverse, with (0) scopedepth being infinite lifetime, 1 having a lifetime at function scope, etc. Anyway, a declaration's scopedepth is determined according to logic similar that found in DIP69 and Mark Schutz's proposal:
>
> int r; // declaration scopedepth(0)
>
> void fun(int a /*scopedepth(0)*/) {

(Already pointed out by deadalnix.) Why do parameters have the same depth as globals?

>   int b; // depth(1)
>   {
>     int c; // depth(2)
>     {
>       int d; // (3)
>     }
>     {
>       int e; // (3)
>     }
>   }
>   int f; // (1)
> }
>
> Principle 5: It's always un@safe to copy a declaration scope from a higher scopedepth to a reference variable stored at lower scopedepth. DIP69 tries to banish this type of thing only in `scope` variables, but I'm not afraid to banish it in all @safe code period:

For backwards compatibility reasons, it might be better to restrict it to `scope` variables. But as all references in @safe code should be implicitly `scope`, this would mostly have the same effect.

> Principle 6: Reference variables: Any data which stores a reference is a "reference variable". That includes any pointer, class instance, array/slice, `ref` parameter, or any struct containing any of those. For the sake of simplicity, I boil _all_ of these down to "T*" in this proposal. All reference types are effectively the _same_ in this regard. DIP25 does not indicate that it has any interest in expanding beyond `ref` parameters. But all reference types are unsafe in exactly the same way as `ref` is. (By the way, see footnote [1] for why I think `ref` is much different from `scope`). I don't understand the restriction of dIP25 to `ref` paramteres only. Part of my system is to expand `return` parameter to all reference types.

Fully agree with the necessity to apply it to all kinds of references, of course.

> Principle 8: Any time a reference is copied, the reference
  ^^^^^^^^^^^
  Principle 7 ?
> scope inherits the *maximum* of the two scope depths:
>
> T* gru() {
>   static T st; // decl depth(0)
>   T t; // decl depth(1)
>   T* tp = &t; // ref depth(1)
>   tp = &st; // ref depth STILL (1)
>   return tp; // error!
> }
>
> If you have ever loaded a reference with a local scope, it retains that scope level permanently, ensuring the safety of the reference.

Why is this rule necessary? Can you show an example what could go wrong without it? I assume it's just there to ease implementation (avoids the need for data flow analysis)?

> T* fun(T* a, T* b, T** c) {
>   // the function's "return scope" accumulates `a` here
>   return a;
>   T* d = b; // `d's reference scope accumulates `b`
>
>   // the return scope now accumulates `b` from `d`
>   return d;
>
>   *c = d; // now mutable parameter `c` gets `d`
>
>   static T* t;
>   *t = b; // this might be safe, but only the caller can know
> }
>
> All this accumulation results in the implicit function signature:
>
> T* fun(return T* a, // DIP25
>        return noscope T* d, // DIP25 and DIP71
>        out!b T** c  // from DIP71
>        ) @safe;

I supposed that's about attribute inference?

> Principle 10: You'll probably have noticed that all scopes accumulate each other according to lexical ordering, and that's good news, because any sane person assigns and return references in lexical order.

As you say, that's broken. But why does it need to be in lexical order in the first place? I would simply analyze the entire function first, assign reference scopes, and disallow circular relations (like `a = b; b = a;`).

> Conclusion
>
> 1. With this system as foundation, an effective ownership system is easily within reach. Just confine the outgoing scopes to a single parameter and no globals, and you have your ownership. You might need another (rare) function attribute to help with this, and a storage class (e.g. `scope`, `unique`) to give you an error when you do something wrong, but the groundwork is 90% laid.

It's not so simple at all. For full-blown unique ownership, there needs to be some kind of borrow-checking like in Rust. I have some ideas how a simple borrow-checker can be implemented without much work (without data flow analysis as Rust does). It's basically my "const borrowing" idea (whose one flaw incidentally cannot be triggered by unique types, because it is conditioned on the presence of aliasing).

There are still some things in the proposal that I'm sure can be simplified. We probably don't need new keywords like `noscope`. I'm not even sure the concept itself is needed.

That all said, I think you're on the right track. The fact that you don't require a new type modifier will make Walter very happy. This looks pretty good!
February 25, 2015
On Wed, Feb 25, 2015 at 09:26:31PM +0000, via Digitalmars-d wrote: [...]
> On Wednesday, 25 February 2015 at 01:12:15 UTC, Zach the Mystic wrote:
> >Principle 3: Extra function and parameter attributes are the tradeoff for great memory safety. There is no other way to support both encapsulation of control flow (Principle 2) and the separate-compilation model (indispensable to D). Function signatures pay the price for this with their expanding size. I try to create the new attributes for the rare case, as opposed to the common one, so that they don't appear very often.
> 
> IIRC H.S. Teoh suggested a change to the compilation model. I think he wants to expand the minimal compilation unit to a library or executable. In that case, inference for all kinds of attributes will be available in many more circumstances; explicit annotation would only be necessary for exported symbols.

I don't remember making any such suggestion... the closest I can think of is the idea that attribute inference should always be done, and saved as part of the emitted object file(s), perhaps even in generated .di files that contain all inferred attributes. When importing some module, the compiler would read the inferred attributes from the saved information. Programmers won't even need to write any attributes except when they want to override the compiler's inference, but the code will automatically get the benefit of all inferred attributes. Library users would also benefit by having all inferred attributes available in the auto-generated .di files. This can be made to work regardless of what the minimal compilation unit is.

Automatic inference also frees us from the concern that functions have too many attributes -- if the compiler will automatically infer most of them for us, we can freely add all sorts of attributes without worrying that it will become impractically verbose to write. Saving this info as part of the object file also lets the compiler take advantage of these extra attributes even when source code isn't available, or perform whole-program optmizations based on them.


T

-- 
They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to Kill
February 26, 2015
On Wednesday, 25 February 2015 at 18:08:55 UTC, deadalnix wrote:
> On Wednesday, 25 February 2015 at 01:12:15 UTC, Zach the Mystic wrote:
>> int r; // declaration scopedepth(0)
>>
>> void fun(int a /*scopedepth(0)*/) {
>>  int b; // depth(1)
>>  {
>>    int c; // depth(2)
>>    {
>>      int d; // (3)
>>    }
>>    {
>>      int e; // (3)
>>    }
>>  }
>>  int f; // (1)
>> }
>>
>
> You have element of differing lifetime at scope depth 0 so far.

Sorry for the delay.

I made a mistake. Parameter `a` will have a *declaration* scope of 1, just like int b above. It's *reference* scope will have depth 0, with the "mystery" bit for the first parameter set.

>> Principle 5: It's always un@safe to copy a declaration scope from a higher scopedepth to a reference variable stored at lower scopedepth. DIP69 tries to banish this type of thing only in `scope` variables, but I'm not afraid to banish it in all @safe code period:
>>
>> void gun() @safe {
>>  T* t; // t's declaration depth: 1
>>  T u;
>>  {
>>    T* uu = &u; // fine, this is normal
>>    T tt;
>>    t = &tt; // t's reference depth: 2, error, un@safe
>>  }
>>  // now t is corrupted
>> }
>>
>
> Bingo. However, when you throw goto into the mix, weird thing happens. The general idea is good but need refining.

I addressed this further down, in Principle 10. My proposed solution has the compiler detecting the presence of code which could both 1) be visited again (through a jump label or a loop) and 2) is in a branching condition. In these cases it pushes any statement which copies a reference onto a special stack. When the branching condition finishes, it revisits the stack, "reheating" the scopes in reverse order. If there is a way to defeat this technique, it must be very convoluted, since the scopes do nothing but accumulate possibilities. It may even be mathematically impossible.

>> Principle 7: In this system, all scopes are *transitive*: any reference type with double indirections inherits the scope of the outermost reference. Think of it this way:
>>
>
> It is more complex than that, and this is where most proposals fail short (including this one and DIP69). If you want to disallow the assignment of a reference to something with a short lifetime, you can't consider scope transitive when used as a lvalue. You can, however, consider it transitive when used as an rvalue.
>
> The more general rule is that you want to consider the largest possible lifetime of an lvalue, and the smallest possible one for an rvalue.
>
> When going through an indirection, that will differ, unless we choose to tag all indirections, which is undesirable.

I'm unclear about what you're saying. Can you give an example in code?

>> Principle 8: Any time a reference is copied, the reference scope inherits the *maximum* of the two scope depths:
>>
>
> That makes control flow analysis easier, so I can buy this :)
>
>> Principle 8: We don't need to know! For all intents and purposes, a reference parameter has infinite lifetime for the duration of the function it is compiled in. Whenever we copy any reference, we do a bitwise OR on *all* of the mystery scopes. The new reference accumulates every scope it has ever had access to, directly or indirectly.
>>
>
> That would allow to copy a parameter reference to a global, which is dead unsafe.

Actually, it's not unsafe, so long as you have the parameter attribute `noscope` (or possibly `static`) working for you:

void fun(T* a) {
  static T* t;
  *t = a; // this might be safe
}

The truth is, this *might* be safe. It's only unsafe if the parameter `a` is located on the stack. From within the function, the compiler can't possibly know this. But if it forces you to mark `a` with `noscope` (or is allowed to infer the same), it tells the caller all it needs to know about `a`. Simply put, it's an error to pass a local to a `noscope` parameter. And it runs all the way down: any parameter which it itself passed to a `noscope` parameter must also be marked `noscope`. (Note: I'm actually preferring the name `static` at this point, but using `noscope` for consistency):

T* fun(noscope T* a) {
  static T* t;
  *t = a; // this might be safe
}

void tun(T* b) {
  T c;
  fun(&c); // error, local
  fun(b); // error, unless b also marked (or inferred) `noscope`
}

> There is some goodness in there. Please address my comment and tell me if I'm wrong, but I think you didn't covered all bases.

The only base I'm really worried about is the lvalue vs rvalue base. Hopefully we can fix that!
February 26, 2015
On Thursday, 26 February 2015 at 16:40:27 UTC, Zach the Mystic wrote:
>>> int r; // declaration scopedepth(0)
>>>
>>> void fun(int a /*scopedepth(0)*/) {
>>> int b; // depth(1)
>>> {
>>>   int c; // depth(2)
>>>   {
>>>     int d; // (3)
>>>   }
>>>   {
>>>     int e; // (3)
>>>   }
>>> }
>>> int f; // (1)
>>> }
>>>
>>
>> You have element of differing lifetime at scope depth 0 so far.
>
> Sorry for the delay.
>
> I made a mistake. Parameter `a` will have a *declaration* scope of 1, just like int b above. It's *reference* scope will have depth 0, with the "mystery" bit for the first parameter set.

That is, `a` would have such a reference scope is it were a reference type... :-)
February 26, 2015
On Thursday, 26 February 2015 at 16:42:30 UTC, Zach the Mystic wrote:
> That is, `a` would have such a reference scope is it were a reference type... :-)

s/is/if/

I seem to be making one more mistake for every mistake I correct.
February 26, 2015
On Wednesday, 25 February 2015 at 21:26:33 UTC, Marc Schütz wrote:
> IIRC H.S. Teoh suggested a change to the compilation model. I think he wants to expand the minimal compilation unit to a library or executable. In that case, inference for all kinds of attributes will be available in many more circumstances; explicit annotation would only be necessary for exported symbols.

You probably mean Dicebot:

http://forum.dlang.org/post/otejdbgnhmyvbyaxatsk@forum.dlang.org

> Anyway, it is a good idea to enable scope semantics implicitly for all references involved in @safe code. As far as I understand it, this is something you suggest, right? It will eliminate annotations except in cases where a parameter is returned, which - as you note - will probably be acceptable, because it's already been suggested in DIP25.

Actually you could eliminate `return` parameters as well, I think. If the compiler has the body of a function, which it usually does, then there shouldn't be a need to mark *any* of the covariant function or parameter attributes. I think it's the kind of thing which should "Just Work" in all these cases.

>> Principle 4: Scopes. My system has its own notion of scopes. They are compile time information, used by the compiler to ensure safety. Every declaration which holds data at runtime must have a scope, called its "declaration scope". Every reference type (defined below in Principle 6) will have an additional scope called its "reference scope". A scope consists of a very short bit array, with a minimum of approximately 16 bits and reasonable maximum of 32, let's say. For this proposal I'm using 16, in order to emphasize this system's memory efficiency. 32 bits would not change anything fundamental, only allow the compiler to be a little more precise about what's safe and what's not, which is not a big deal since it conservatively defaults to @system when it doesn't know.
>
> This bitmask seems to be mostly an implementation detail.

I guess I'm trying to win over the people who might think the system will cost too much memory or compilation time.

> AFAIU, further below you're introducing some things that make it visible to the user.

The only things I'm making visible to the user are things which *must* appear in the function signature for the sake of the separate compilation model. Everything else would be invisible, except the occasional false positive, where something actually safe is thought unsafe (the solution being to enclose the statement in a @trusted black or lambda).

> I'm not convinced this is a good idea; it looks complicated for sure.

It's not that complicated. My main fear is that it's too simple! Some of the logic may seem complicated, but the goal is to make it possible to compile a function without having to visit any other function. Everything is figured out "in house".

> I also think it is too coarse. Even variables declared at the same lexical scope have different lifetimes, because they are destroyed in reverse order of declaration. This is relevant if they contain references and have destructors that access the references; we need to make sure that no reference to a destroyed variable can be kept in a variable whose destructor hasn't yet run.

It might be too coarse. We could reserve a few more bits for depth-constant declaration order. At the same, time, it doesn't seem *that* urgent to me. But maybe I'm naive about this. Everything is being destroyed anyway, so what's the real danger?

>> Principle 5: It's always un@safe to copy a declaration scope from a higher scopedepth to a reference variable stored at lower scopedepth. DIP69 tries to banish this type of thing only in `scope` variables, but I'm not afraid to banish it in all @safe code period:
>
> For backwards compatibility reasons, it might be better to restrict it to `scope` variables. But as all references in @safe code should be implicitly `scope`, this would mostly have the same effect.

I guess this is the "Language versus Legacy" issue. I think D's strength is in it's language, not its huge legacy codebase. Therefore, I find myself going with the #pleasebreakourcode crowd, for the sake of extending D's lead where it shines. I'm not sure all references in safe code need to be `scope` - that would break a lot of code unto itself, right?

>> Principle 8: Any time a reference is copied, the reference
>   ^^^^^^^^^^^
>   Principle 7 ?
>> scope inherits the *maximum* of the two scope depths:
>>
>> T* gru() {
>>  static T st; // decl depth(0)
>>  T t; // decl depth(1)
>>  T* tp = &t; // ref depth(1)
>>  tp = &st; // ref depth STILL (1)
>>  return tp; // error!
>> }
>>
>> If you have ever loaded a reference with a local scope, it retains that scope level permanently, ensuring the safety of the reference.
>
> Why is this rule necessary? Can you show an example what could go wrong without it? I assume it's just there to ease implementation (avoids the need for data flow analysis)?

You're right. It's only necessary when code is branching. My proposal could be amended as such.

>> T* fun(T* a, T* b, T** c) {
>>  // the function's "return scope" accumulates `a` here
>>  return a;
>>  T* d = b; // `d's reference scope accumulates `b`
>>
>>  // the return scope now accumulates `b` from `d`
>>  return d;
>>
>>  *c = d; // now mutable parameter `c` gets `d`
>>
>>  static T* t;
>>  *t = b; // this might be safe, but only the caller can know
>> }
>>
>> All this accumulation results in the implicit function signature:
>>
>> T* fun(return T* a, // DIP25
>>       return noscope T* d, // DIP25 and DIP71
>>       out!b T** c  // from DIP71
>>       ) @safe;
>
> I supposed that's about attribute inference?

Well, that, and in the absence of inference, errors in @safe functions.

>> Principle 10: You'll probably have noticed that all scopes accumulate each other according to lexical ordering, and that's good news, because any sane person assigns and return references in lexical order.
>
> As you say, that's broken. But why does it need to be in lexical order in the first place? I would simply analyze the entire function first, assign reference scopes, and disallow circular relations (like `a = b; b = a;`).

T* fun(T* a, T** b) {
  T* c = new T;
  c = a;
  *b = c;
  return c;
}

Both `b` and the "return scope" need to pick up that they are from `a` (the end result being the signature "T* fun(return T* a, out!a T** b);"). If `c` is returned first, the return scope will only inherit what c was declared with. It won't pick up that it also has `a's scope. What underlying mechanism would you have the compiler use to allow for these chains of references? (Note that I haven't yet suggested the final attribute which would imbue the return scope with heap or global references, and thus this possibility is not yet contained in the function signature.)

>> Conclusion
>>
>> 1. With this system as foundation, an effective ownership system is easily within reach. Just confine the outgoing scopes to a single parameter and no globals, and you have your ownership. You might need another (rare) function attribute to help with this, and a storage class (e.g. `scope`, `unique`) to give you an error when you do something wrong, but the groundwork is 90% laid.
>
> It's not so simple at all. For full-blown unique ownership, there needs to be some kind of borrow-checking like in Rust. I have some ideas how a simple borrow-checker can be implemented without much work (without data flow analysis as Rust does). It's basically my "const borrowing" idea (whose one flaw incidentally cannot be triggered by unique types, because it is conditioned on the presence of aliasing).
>
> There are still some things in the proposal that I'm sure can be simplified. We probably don't need new keywords like `noscope`. I'm not even sure the concept itself is needed.

Unless you want to flat out ban copying a parameter reference to a global in @safe code, you will need `noscope`, or, as you suggested, `static`. I'm actually thinking of reusing `noscope` as a function attribute (`@noscope` perhaps) which says that the function may return a heap or global reference. This is all that's necessary to complete an ownership system. If a scope has exactly 1 "mystery" bit set, and is known not to come from the heap or a global, then you know that it *must* contain a reference to exactly the parameter for which the mystery bit is set. You know exactly what it contains == ownership.

> That all said, I think you're on the right track. The fact that you don't require a new type modifier will make Walter very happy. This looks pretty good!

Thanks.
February 26, 2015
On Wednesday, 25 February 2015 at 23:33:57 UTC, H. S. Teoh wrote:
> On Wed, Feb 25, 2015 at 09:26:31PM +0000, via Digitalmars-d wrote:
> [...]
>> On Wednesday, 25 February 2015 at 01:12:15 UTC, Zach the Mystic wrote:
>> >Principle 3: Extra function and parameter attributes are the tradeoff
>> >for great memory safety. There is no other way to support both
>> >encapsulation of control flow (Principle 2) and the
>> >separate-compilation model (indispensable to D). Function signatures
>> >pay the price for this with their expanding size. I try to create the
>> >new attributes for the rare case, as opposed to the common one, so
>> >that they don't appear very often.
>> 
>> IIRC H.S. Teoh suggested a change to the compilation model. I think he
>> wants to expand the minimal compilation unit to a library or
>> executable. In that case, inference for all kinds of attributes will
>> be available in many more circumstances; explicit annotation would
>> only be necessary for exported symbols.
>
> I don't remember making any such suggestion...

I'm sorry then... I've pulled this from the back of my mind, and I'm sure something similar was actually suggested (not as a formal proposal, mind you). Maybe it was Martin Nowak, because he's working on DIP45 (export)? But better not to speculate, lest more innocent people get accused of proposing things ;-)

> the closest I can think
> of is the idea that attribute inference should always be done, and saved
> as part of the emitted object file(s), perhaps even in generated .di
> files that contain all inferred attributes. When importing some module,
> the compiler would read the inferred attributes from the saved
> information. Programmers won't even need to write any attributes except
> when they want to override the compiler's inference, but the code will
> automatically get the benefit of all inferred attributes. Library users
> would also benefit by having all inferred attributes available in the
> auto-generated .di files. This can be made to work regardless of what
> the minimal compilation unit is.
>
> Automatic inference also frees us from the concern that functions have
> too many attributes -- if the compiler will automatically infer most of
> them for us, we can freely add all sorts of attributes without worrying
> that it will become impractically verbose to write. Saving this info as
> part of the object file also lets the compiler take advantage of these
> extra attributes even when source code isn't available, or perform
> whole-program optmizations based on them.

Yes, I fully agree with that. The one thing that's then missing is a way to disable automatic inference (for stable interfaces); `export` fits that mold.
February 26, 2015
On Thursday, 26 February 2015 at 16:40:27 UTC, Zach the Mystic wrote:
> I'm unclear about what you're saying. Can you give an example in code?
>

See below.

>> That would allow to copy a parameter reference to a global, which is dead unsafe.
>
> Actually, it's not unsafe, so long as you have the parameter attribute `noscope` (or possibly `static`) working for you:
>

Consider :

void foo(T** a) {
    T** b = a; // OK
    T*  = ...;
    *b = c; // Legal because of your transitive clause,
            // but not safe as a can have an
            // arbitrary large lifetime.
}

This show that anything you reach through an indirection can have from the same lifetime as the indirection up to an infinite lifetime (and anything in between). When using it as an lvalue, you should consider the largest possible lifetime, when using it as an rvalue, you should consider the smallest (this is the only way to be safe).
« First   ‹ Prev
1 2 3 4 5 6