scope escaping
Feb 06, 2014
Adam D. Ruppe
February 06, 2014
Let's see if we can make this work in two steps: first, making the existing scope storage class work, and second, considering making it the default.

First, let's define it. A scope reference may never escape its scope. This means:

0) Note that scope is irrelevant on value types. I believe it is also mostly irrelevant on references to immutable data (such as strings) since they are de facto value types. (A side effect of this: immutable stack data is wrong.... which is an arguable point, since correct enforcement of slices into it would let you maintain the immutable illusion. Hmm, both sides have good points.)

Nevertheless, while the immutable reference can be debated, scope definitely doesn't matter on value types. While it might be there, I think it should just be a no-op.

1) It or its address must never be assigned to a higher scope. (The compiler currently disallows rebinding scope variables, which I think does achieve this, but is more blunt than it needs to be. If we want to disable rebinding, let's do that on a type-by-type basis e.g. disabling postblit on a unique ptr.)

void foo() {
   int[] outerSlice;
   {
      scope int[] innerSlice = ...;
      outerSlice = innerSlice; // error
      innerSlice = innerSlice[1 .. $]; // I think this should be ok
   }
}

Parameters and return values are considered the same level for this, since the parameter and return value both belong to the caller. So:

int[] foo() {
   int[15] staticBuffer;
   scope int[] slice = staticBuffer[];
   return slice; // illegal, return value is one level higher than inner function
}

// OK, you aren't giving the caller anything they don't already have
scope char[] strchr(scope char[] s, char[]) { return s; }

It is acceptable to pass it to a lower scope.

int average(in int[]); // in == const scope

void foo() {
    int[15] staticBuffer;
    scope int[] slice = staticBuffer[];
    int avg = average(slice); // OK, passing to inner scope is fine
}


The results of slice.ptr and &slice on a scope slice must themselves be scope. Yes, scope MUST work on function return values as well as parameters and variables. This is an absolute necessity for any degree of sanity, which I'll talk about more in my next numbered point.


BTW I keep using slices into static buffers here because that's the main real-world concern we should keep in mind. A static buffer is a strictly-scoped owned container built right into the language. We know it is wrong to return a reference to stack data, we know why. Conversely, we have a pretty good idea about what *can* work with it. Scope, if we do it right, should statically catch misuses of static array slices while allowing proper uses.

So when in doubt about something, ask: does this make sense when referring to a static buffer slice?
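
A minimal runnable sketch of the "passing to a lower scope" case from point 1. The body of average is an assumption (the post only declares its prototype), and the fill value 2 is arbitrary. The slice of the static buffer is only ever handed downward, which is the use the rule is meant to keep legal.

```d
import std.stdio : writeln;

// Hypothetical body for the `average` prototype in the post.
// `in` promises the callee will neither modify nor keep `values`.
int average(in int[] values)
{
    if (values.length == 0)
        return 0;
    long sum = 0;
    foreach (v; values)
        sum += v;
    return cast(int)(sum / cast(long) values.length);
}

void main()
{
    int[15] staticBuffer = 2;            // every element set to 2
    scope int[] slice = staticBuffer[];  // borrowed view of stack memory
    int avg = average(slice);            // passing to an inner scope is fine
    writeln(avg);                        // prints 2
}
```

Returning slice from the function that owns staticBuffer, or storing it in a longer-lived variable, is the case the rule is meant to reject.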

2) scope must be carried along with the variable at every step of its life. (In this sense, it starts to look more like a type constructor than a storage class, but I think it is slightly different still.)

void foo() {
   int[] outerSlice;
   {
       int[16] staticBuffer;
       scope int[] innerSlice = staticBuffer[]; // OK
       int[] cheatingSlice = innerSlice; // uh oh, no good because...
       outerSlice = cheatingSlice; // ...it enables this
   }
}


A potential workaround is to require every assignment to also be scope.

       scope int[] cheatingSlice = innerSlice; // OK
       outerSlice = cheatingSlice; // this is still disallowed, so cool

It is very important that this also applies through function return values, since otherwise:

T identity(T)(scope T t) { return t; }

can and will escape references. Consider strchr on a static stack array. We do NOT want that to return a pointer to the stack memory after it ceases to exist.

Thus, that identity function should be illegal, with an error like "cannot return scope from a non-scope function". We'll allow it by marking the return value as scope as well. (Again, this sounds a lot like a type constructor.)
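
To make the strchr concern concrete, here is a small sketch. findChar is a hypothetical helper, not a library function; it returns a slice of its argument, so the result aliases whatever the caller passed in. The code below only uses the result in the same scope as the buffer, which is the legal pattern. Under the proposed rule the return value would carry scope, so it could not be stored anywhere that outlives stackBuffer.

```d
import std.stdio : writeln;

// Hypothetical strchr-like helper: returns the tail of `s` starting at
// the first occurrence of `c`, or null when `c` is absent.
// The result points into the caller's data, never into fresh memory.
char[] findChar(char[] s, char c)
{
    foreach (i, ch; s)
        if (ch == c)
            return s[i .. $];
    return null;
}

void main()
{
    char[5] stackBuffer = "hello";
    // `tail` aliases stackBuffer, so it is kept at the same level.
    scope char[] tail = findChar(stackBuffer[], 'l');
    writeln(tail); // prints "llo"
}
```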


3) structs are considered reference types if ANY of their members are reference types (unless specifically noted otherwise, see my following post about default and encapsulation for details). Thus, the scope rules may apply to them:

struct Holder {
   int[] foo;
}

Holder h;
void test(scope int[] f) {
    h.foo = f; // must be an error, f is escaping to global scope directly
    h = Holder(f); // this must also be an error, f is escaping indirectly
}

The constructed Holder inside would have to inherit the scopiness of f. This might be the trickiest part of getting this right (though it is kinda neatly solved if scope is default :) )

a) A struct constructed with a scope variable itself must be scope, and thus all the rules apply to it.

b) Assigning to a struct which is not scope, even if it is a local variable, must not be permitted.

Holder h2;
h2.foo = f; // this isn't escaping the scope, but is dropping scope

Just as if we had a local variable of type int[].

We may make the struct scope:

scope Holder h2;
h2.foo = f; // OK

c) Calling methods on a struct which may escape the scope is wrong. Ideally, `this` would always be scope... in fact, I think that's the best way to go. An alternative though might be to restrict calling of non-pure functions. Pure functions don't allow mutation of non-scope data in the first place, so they shouldn't be able to escape references.
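
Rules 3a and 3b can be sketched as follows. The commented-out lines are the ones the proposed rules would reject; the rest is the pattern they would allow. The main function and the value 42 are assumptions added so the sketch runs.

```d
struct Holder
{
    int[] foo;
}

Holder h; // module-level, outlives every call to test

void test(scope int[] f)
{
    // h.foo = f;       // direct escape to a global: must be an error
    // h = Holder(f);   // indirect escape via a constructed struct: also an error
    scope Holder h2 = Holder(f); // a Holder built from scope data is itself scope
    h2.foo[0] = 42;              // using it inside this scope is fine
}

void main()
{
    int[4] staticBuffer;
    test(staticBuffer[]);
    assert(staticBuffer[0] == 42); // the write went through the borrowed slice
}
```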




I think that covers what I want. Note that this is not necessarily @safe:

struct C_Array(T) { /* grows with malloc */ scope T* borrow() {} }

C_Array!int i;
int* b = i.borrow;
i ~= 10; // might realloc...
// leaving b dangling


So it isn't necessarily @safe. I think it *would* be @safe with static arrays. BTW static array slicing should return scope as should most user defined containers. But with user-defined types, @safety is still in the hands of the programmer. Reallocing with a non-sealed reference should always be considered @trusted.


Stand by for my next post which will discuss making it default, with a few more points relevant to the whole concept.
February 06, 2014
Making scope the default
=======================


There are five points to discuss:

1) All variables are assumed to be marked with scope implicitly

2) The exception is structs with a special annotation which marks
   that they encapsulate a resource. An encapsulated resource
   explicitly marked scope at the usage site is STILL scope, but
   it will not implicitly inherit the scopiness of the member reference.

@encapsulated_resource
struct RefCounted(T) {
    T t; // the scopiness of this would not be propagated to RefCounted itself
}

This lets us write structs to manage raw pointers (etc.) as an escape from
the rules. Note you may also write @encapsulated_resource struct Borrowed(T){}
as an escape from the rules. Using this would of course be at your own risk,
analogous to @trusted code.

3) Built-in allocations return GC!T instead of T. GC!T's definition is:

@encapsulated_resource
struct GC(T) {
    private T _managed_payload;
    /* @force_inline */
    /* implicit scope return value */
    @safe nothrow inout(T) borrow() inout { return _managed_payload; }
    alias borrow this;
}

NOTE: if inout(T) there doesn't work for const correctness, we need to fix
const on wrapped types; an orthogonal issue.

If you don't care about ownership, the alias this gives you a naked borrowed
reference whenever needed. If you do care about ownership:

auto foo = new Foo();
static assert(is(typeof(foo) == GC!Foo));

letting you store it with confidence without additional steps or assumptions.

When passing to a template, if you want to explicitly borrow it, you might
write borrow. Otherwise, IFTI will see the whole GC!T type. This is important
if we want to write owned identity templates.

If an argument is scope, ownership is irrelevant. We might strip it off but
I don't think that's necessary... might help avoid template bloat though.
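
A rough sketch of how the wrapper from point 3 behaves, written so it compiles as ordinary D. Several things here are assumptions: @encapsulated_resource does not exist, so it appears only as a comment; gcNew is a hypothetical stand-in for what `new` would return under the proposal; Foo and readValue are illustration only. The borrow method is marked inout so that inout(T) is accepted as a return type.

```d
import std.stdio : writeln;

class Foo
{
    int value = 7;
}

// @encapsulated_resource (hypothetical annotation, not real D)
struct GC(T)
{
    private T _managed_payload;

    // Hands out the naked reference; under the proposal this would be scope.
    inout(T) borrow() inout @safe nothrow { return _managed_payload; }
    alias borrow this;
}

// Hypothetical stand-in for `new T()` returning GC!T.
GC!T gcNew(T)()
{
    return GC!T(new T());
}

// A function that does not care about ownership takes the plain type.
int readValue(Foo f)
{
    return f.value;
}

void main()
{
    auto foo = gcNew!Foo();
    static assert(is(typeof(foo) == GC!Foo)); // ownership is visible in the type
    writeln(readValue(foo)); // alias this borrows implicitly; prints 7
    writeln(foo.value);      // member access also goes through borrow; prints 7
}
```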

4) All other types remain the same. Yes, typeof(this) == T, NEVER GC!T.
   Again, remember the rule of thumb: would this work with a static stack
   buffer?

   class Foo { Foo getMe() { return this; } }
   ubyte[__traits(classInstanceSize, Foo)] buffer;
   Foo f = emplace!Foo(buffer); // ok so far, f is scope
   GC!Foo gc = f.getMe(); // obviously wrong, f is not GC

   The object does not control its own allocation, so it does not own
   its own memory. Thus, `this` is *always* borrowed.

   Does this work if building a tree:

   class Tree { Tree[] children; Tree addChild(Tree t) { children ~= t; } }

   addChild there would *not* compile, since it escapes the t into the object's
   scope. Tree would need to know ownership: make children and addChild take
   GC!Tree instead, for example, then it will work.

   What if addChild wants to set t.parent = this; ? That wouldn't be possible
   (without using a trust-me borrowed!T wrapper)... and while this would break
   some of my code... I say unto you, such code was already broken, because
   the parent might be emplaced on a stack buffer!

   GC!Tree child = new Tree();
   {
       ubyte[...] stack;
       Owned!Tree parent = emplace!Tree(stack[]);
       parent.addChild(child);
   }
   child.parent; // bug city


   Instead, addChild should request its own ownership.

   Tree addChild(GC!Tree child, GC!Tree _this) {
       children ~= child;
       child.parent = _this;
   }


   Then, the buggy scenario above does not compile, while making it possible
   to do the correct thing, storing a (verified) GC reference in the
   object graph.


   I understand that would be a bit of a pain, but you agree it is more correct,
   yes? So that might be worthwhile breakage (especially since we're talking
   about potentially large breakage already.)


5) Interaction with @safe is something we can debate. @safe works best with
   the GC, but if we play our scope cards right, memory corruption via stack
   stuff can be statically eliminated too, thus making some variants of emplace
   @safe too. So I don't think even @safe functions can assume this == GC, and
   even if they could, we shouldn't since it limits us from legitimate
   optimizations.

   So I think the @safe rules should stay exactly as they are now. Wrapper
   structs that do things like malloc/realloc might be @system because it
   would still be possible for a borrowed pointer to be invalidated when
   they realloc (note this is not the case with GC, which is @safe even
   through growth reallocations). So @safe and scope are separate issues.
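
The remark about GC growth can be checked with a small sketch. The function name and loop count are assumptions. A pointer borrowed from a GC array stays valid after the array grows, because the old block remains alive while something still refers to it. After growth the pointer and the array may no longer alias, but the pointer never dangles.

```d
// Returns true if the borrowed pointer still reads the original value
// after the array has been appended to many times.
bool borrowSurvivesGrowth()
{
    int[] arr = [1, 2, 3];
    int* borrowed = &arr[0];   // borrow a pointer into the GC block

    foreach (i; 0 .. 10_000)
        arr ~= i;              // may move arr to a larger block

    // The old block is kept alive by `borrowed`, so this read is valid.
    return *borrowed == 1;
}

void main()
{
    assert(borrowSurvivesGrowth());
}
```

A malloc/realloc-backed container gives no such guarantee, which is why its growth path would have to be @system or @trusted.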
February 06, 2014
On 6 Feb 2014 16:56, "Adam D. Ruppe" <destructionator@gmail.com> wrote:
> Making scope the default
> =======================
> [...]

I just stumbled upon Rust's memory management scheme yesterday and it seemed similar to this.

On first glance, I really like it.


February 06, 2014
On Thursday, 6 February 2014 at 18:29:48 UTC, Matej Nanut wrote:
> I just stumbled upon Rust's memory management scheme yesterday and it seemed similar to this.

Yeah, I haven't used Rust but I have read about it, and the more I think about it, the more I realize it really isn't that new - it is just formalizing what we already do as programmers.

Escaping a reference to stack data is always wrong. We know this and try not to do it. The language barely helps with this though - we're on our own. We can't even be completely sure a reference actually is GC since it might be on the stack without us realizing it.

So what the Rust system and my proposal (which I'm pretty sure is simpler than the Rust one - it doesn't catch all the problems, but should be easier to implement and use for the majority of cases) do is try to get the language to help us get this right.

It's the same thing with error handling. In C, you know you have to clean up after a failed operation and you have to do it yourself. This is often done by checking return values and jumping to cleanup code with goto. In D, we have struct destructors, scope(failure), and exceptions to help us do the same task with less work and more confidence.
February 06, 2014
On Thursday, 6 February 2014 at 15:53:01 UTC, Adam D. Ruppe wrote:
>
> Making scope the default
>

How about letting the compiler decide what's best in the default case?

 · if a global reference to the variable escapes or a reference is returned by a function ⇒ GC-allocated
 · otherwise ⇒ scoped to where the last reference to the variable is seen by static analysis
February 06, 2014
Another idea. I would totally love that behaviour.

void foo(scope int[] arg) { ... }

foo([1, 2, 3, 4]); // allocates the array literal on the stack, because it is scoped.

Kind Regards
Benjamin Thaut
February 06, 2014
On Thursday, 6 February 2014 at 20:17:25 UTC, Benjamin Thaut wrote:
> Another idea. I would totally love that behaviour.
>
> void foo(scope int[] arg) { ... }
>
> foo([1, 2, 3, 4]); // allocates the array literal on the stack, because it is scoped.
>
> Kind Regards
> Benjamin Thaut

+1
February 06, 2014
On Thursday, 6 February 2014 at 20:17:25 UTC, Benjamin Thaut wrote:
> foo([1, 2, 3, 4]); // allocates the array literal on the stack, because it is scoped.

Absolutely. In fact, generically, any scope item could be moved to the stack. We were just discussing in the chat room how scope = stack allocation and scope = don't escape the reference actually go hand in hand; they are not two separate features, stack allocation is an optimization enabled by the restriction... and the restriction is required by the optimization to maintain memory safety.
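
For comparison, this is roughly what the optimization amounts to when written by hand. The function sum and the variable names are assumptions for illustration. The array lives in a static buffer in the caller's frame, and only a slice of it is passed down, which is safe exactly because the callee promises not to keep it.

```d
import std.stdio : writeln;

// `scope` says the callee will not store `arg` past the call.
int sum(scope const(int)[] arg)
{
    int total = 0;
    foreach (v; arg)
        total += v;
    return total;
}

void main()
{
    int[4] onStack = [1, 2, 3, 4]; // static array in main's stack frame
    writeln(sum(onStack[]));       // prints 10
}
```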
February 06, 2014
On Thursday, 6 February 2014 at 19:24:19 UTC, Elie Morisse wrote:
> How about letting the compiler decide what's best in the default case?

The problem there is the compiler would have to look at the big picture to make an informed decision, and big picture decisions are generally hard to implement.

Determining whether it is GC or not automatically would require analysis of the function body, tracing where each reference ends up, and looking at other functions it gets passed to (which might not be possible if you have only the prototype without a body). Things like pure can help with it, but generally, I don't think the compiler can make a smart decision.