draft proposal for ref counting in D

This is an email conversation we had last summer. It's of general interest, so reposted here with permission. We didn't reach any conclusions, but there's a lot of good stuff in here, and it's particularly relevant to other recent threads.

October 09, 2013

Re: draft proposal for ref counting in D

Posted by Walter Bright
in reply to Walter Bright

This is based on n.g. discussions and ideas from you guys. I'll redo it as a DIP if it passes the smoke test from y'all.

----------------------------------------------------------------------
    Adding Reference Counting to D

D currently supports manual memory management and generalized GC. Unfortunately, the pausing
and memory consumption inherent in GC are not acceptable for many programs, and manual
memory management is error-prone, tedious, and unsafe. A third style, reference counting (RC),
addresses this. Common implementations are COM's AddRef/Release, Objective-C's ARC,
and C++'s shared_ptr<>.

None of these three schemes is guaranteed to be memory-safe; they all require the programmer
to conform to a protocol (even O-C's ARC). Stepping outside of the protocol results in
memory corruption. A D implementation must make it possible to use ref counted objects
in code marked as @safe, although it will be necessary for the implementation of those
objects to be unsafe.

Some aspects of a D implementation are inevitable:

1. Avoid any requirement for more pointer types. This would cause drastic increases in
complexity for both the compiler and the user. It may make generic code much more difficult
to write.

2. Decay of a ref-counted pointer to a non-ref-counted pointer is unsafe, and can only
be allowed (in @safe code) in circumstances where it can be statically proven to be safe.

3. A ref counted object is inherently a reference type, not a value type.

4. The compiler needs to know about ref counted types.


==Proposal==

If a class contains the following methods, in either itself or a base class, it is
an RC class:


    T AddRef();
    T Release();
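Since an RC class is recognized purely by having these members, the check resembles how D recognizes ranges by their primitives. A minimal sketch of that check as a library trait; `isRefCounted`, `Base`, `Derived` and `Plain` are hypothetical names, and `int` is an arbitrary choice for `T`:

```d
// Hypothetical trait in the style of std.range's isInputRange: a class is
// "RC" if AddRef() and Release() can be called on it, whether they are
// declared in the class itself or inherited from a base class.
enum bool isRefCounted(C) = is(C == class) && is(typeof((C c) {
    c.AddRef();
    c.Release();
}));

class Base
{
    private int refs;
    int AddRef() { return ++refs; }   // T is int here; the proposal leaves T open
    int Release() { return --refs; }
}

class Derived : Base {}   // inherits both methods, so it is RC too

class Plain {}            // has neither method: an ordinary class

static assert(isRefCounted!Base);
static assert(isRefCounted!Derived);
static assert(!isRefCounted!Plain);

void main()
{
    auto d = new Derived;
    assert(d.AddRef() == 1);
    assert(d.Release() == 0);
}
```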

An RC class is like a regular D class with these additional semantics:

1. In @safe code, casting (implicit or explicit) to a base class that does not
have both AddRef() and Release() is an error.

2. Initialization of a class reference causes a call to AddRef().

3. Assignment to a class reference causes a call to Release() on its original value
and AddRef() on the new value.

4. Null checks are done before calling any AddRef() or Release().

5. Upon scope exit of all RC variables or temporaries, a call to Release() is performed,
analogously to the destruction of struct variables and temporaries.

6. If a class or struct contains RC fields, calls to Release() for those fields will
be added to the destructor, and a destructor will be created if one doesn't exist already.

7. If a closure is created that contains RC fields, either a compile time error will be
generated or a destructor will be created for it.

8. Explicit calls to AddRef/Release will not be allowed in @safe code.

9. A call to AddRef() will be added to any argument passed as a parameter.

10. Function returns have an AddRef() already done to the return value.

11. The compiler can elide any AddRef()/Release() calls it can prove are redundant.

12. AddRef() is not called when passed as the implicit 'this' reference.

13. Taking the address of, or passing by reference, any fields of an RC object
is not allowed in @safe code. Passing a field of RC type by reference is allowed.

14. RC objects will still be allocated on the GC heap - this means that a normal
GC run will reap RC objects that are in a cycle, and RC objects will get automatically
scanned for heap references with no additional action required by the user.
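Points 2, 4, 5 and 9 can be watched in action with a library emulation. The sketch below is not the proposal (which wants the compiler to do this for plain class references, without a new pointer type); it is a hypothetical `Rc` struct whose constructor, postblit and destructor make, by hand, roughly the calls the compiler would insert around a toy `Widget` class:

```d
// Toy RC class that just keeps a count.
class Widget
{
    int refs;
    int AddRef() { return ++refs; }
    int Release() { return --refs; }
}

// Hypothetical wrapper that performs, by hand, roughly the calls the
// proposal would have the compiler insert around a plain class reference.
struct Rc(C)
{
    private C obj;

    this(C c)
    {
        obj = c;
        if (obj !is null)   // point 4: null check first
            obj.AddRef();   // point 2: initialization
    }

    this(this)              // runs on copies, e.g. an argument passed by value
    {
        if (obj !is null)
            obj.AddRef();   // point 9
    }

    ~this()
    {
        if (obj !is null)
            obj.Release();  // point 5: scope exit
    }

    C get() { return obj; }
}

// The by-value parameter is a copy: it holds its own reference while the
// function runs and releases it when the function returns.
int refsSeenInside(Rc!Widget w)
{
    return w.get().refs;
}

void main()
{
    auto w = new Widget;
    {
        auto r = Rc!Widget(w);
        assert(w.refs == 1);
        assert(refsSeenInside(r) == 2);
        assert(w.refs == 1);
        Rc!Widget empty;    // null reference: no calls are made
    }
    assert(w.refs == 0);
}
```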


==Existing Code==

D COM objects already have AddRef() and Release(). This proposal should not break
that code; it will just mean that extra AddRef()/Release() calls are made.
Calling AddRef()/Release() should never have been allowed in @safe code anyway.

Any other existing uses of AddRef()/Release() will break.

==Arrays==

Built-in arrays have no place to put a reference count. Ref counted arrays would hence
become a library type, based on a ref counted class with overloaded operators for
the array operations.
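A minimal sketch of what such a library type might look like, assuming a hypothetical `RcArray` class whose count lives in the object and whose `opIndex`/`opDollar` overloads stand in for built-in array syntax:

```d
// Hypothetical library RC array: the count lives in a class object,
// because a built-in slice has nowhere to store one.
class RcArray(T)
{
    private T[] data;
    private int refs;

    this(size_t n) { data = new T[n]; }

    int AddRef() { return ++refs; }

    int Release()
    {
        if (--refs == 0)
            data = null;  // a real implementation would free the storage here
        return refs;
    }

    // Overloaded operators standing in for the built-in array operations.
    ref T opIndex(size_t i) { return data[i]; }
    size_t opDollar() const { return data.length; }
    size_t length() const { return data.length; }
}

void main()
{
    auto a = new RcArray!int(3);
    a.AddRef();
    a[0] = 5;              // assigns through the ref returned by opIndex
    a[$ - 1] = 7;          // $ maps to opDollar
    assert(a[0] == 5 && a[2] == 7 && a.length == 3);
    a.Release();           // count drops to zero, storage is dropped
    assert(a.length == 0);
}
```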

==Results==

This is a very flexible approach, allowing for support of general RC objects, as well
as specific support for COM objects and Objective-C ARC. AddRef()/Release()'s implementation
is entirely up to the user or library writer.

@safe code can be guaranteed to be memory safe, as long as AddRef()/Release() are correctly
implemented.

October 09, 2013

Re: draft proposal for ref counting in D

Steven Schveighoffer wrote:

Looks like a good start.

A few things:

1. Proposal point 3, you need to AddRef first, THEN Release the original. Reason being, you could be assigning a reference the same value as before. In this case, if you decrement *first*, you will decrement the reference, which might reduce it to 0, and free the object before you increment.  In most cases, the AddRef and Release are elided, so it's not bad if you do this.
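The self-assignment hazard can be reproduced with a toy object. In this sketch, `Obj`, `assignReleaseFirst` and `assignAddRefFirst` are hypothetical, and "freeing" only sets a flag so the use-after-free can be observed rather than crashing:

```d
// Toy RC object; "freeing" only sets a flag so a use-after-free is observable.
class Obj
{
    int refs;
    bool freed;
    bool usedAfterFree;

    int AddRef()
    {
        if (freed)
            usedAfterFree = true;
        return ++refs;
    }

    int Release()
    {
        if (--refs == 0)
            freed = true;  // a real implementation would deallocate here
        return refs;
    }
}

// Order in the draft's point 3: Release the old value, then AddRef the new one.
void assignReleaseFirst(ref Obj lhs, Obj rhs)
{
    if (lhs !is null) lhs.Release();
    if (rhs !is null) rhs.AddRef();
    lhs = rhs;
}

// Corrected order: AddRef the new value before Releasing the old one.
void assignAddRefFirst(ref Obj lhs, Obj rhs)
{
    if (rhs !is null) rhs.AddRef();
    if (lhs !is null) lhs.Release();
    lhs = rhs;
}

void main()
{
    // Self-assignment when the variable holds the only reference.
    auto a = new Obj;
    a.AddRef();
    Obj x = a;
    assignReleaseFirst(x, x);
    assert(a.freed && a.usedAfterFree);   // freed, then touched again

    auto b = new Obj;
    b.AddRef();
    Obj y = b;
    assignAddRefFirst(y, y);
    assert(!b.freed && b.refs == 1);      // count goes 1 -> 2 -> 1
}
```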

I wonder if it's not a good idea to have a RefAssign function that takes two RC objects and does a check to see if they are the same before doing AddRef and Release to help with this issue. Calls where the compiler can prove they are the same value can be elided.
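A sketch of such a RefAssign as a plain D template, with hypothetical names `refAssign` and `Obj`; the identity check makes self-assignment a no-op, and for distinct objects it still does AddRef before Release:

```d
// Toy RC object that counts its references.
class Obj
{
    int refs;
    int AddRef() { return ++refs; }
    int Release() { return --refs; }
}

// Hypothetical RefAssign: when both sides already refer to the same object,
// neither AddRef() nor Release() is needed.
void refAssign(C)(ref C lhs, C rhs)
{
    if (lhs is rhs)
        return;
    if (rhs !is null) rhs.AddRef();
    if (lhs !is null) lhs.Release();
    lhs = rhs;
}

void main()
{
    auto a = new Obj;
    auto b = new Obj;
    a.AddRef();
    Obj r = a;          // stands for a variable already holding a's reference

    refAssign(r, r);    // same object: no calls at all
    assert(a.refs == 1);

    refAssign(r, b);    // different object: AddRef b, then Release a
    assert(r is b && a.refs == 0 && b.refs == 1);
}
```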

2. AddRef and Release should be treated not as function calls, but as callables. That is, if for some reason AddRef and Release should be aliases, this does not detract from the solution. The only requirement should be that they cannot be UFCS, as that doesn't make any sense (one cannot add reference counting to an object after the fact).

I'm thinking of the Objective-C objects, whose functions are "release" and "retain". It would be good to use the names Objective-C coders are used to.

3. Objective-C ARC uses a mechanism called auto-release pools that help cut down on the release/retain calls. It works like this:

@autoreleasepool {  // create a new pool
     NSString *str = [NSString stringWithFormat:@"an int: %d", 1];
     @autoreleasepool { // create a new pool on the "pool stack"
           NSDate *date = [NSDate date];
           {
               NSDate *date2 = [NSDate date];
           }
     } // auto-release date and date2
} // auto-release str

In this way, it's not the pointer going out of scope that releases the object, it's the release pool going out of scope that releases the object. These release pools themselves are simply RC objects that call release on all their objects (in fact, prior to the @autoreleasepool directive, you had to manually create and destroy these pools).

The benefit of this model is that basically, inside a pool, you can move around auto-released objects at will, pass them into functions, return them from functions, etc, without having to retain or release them for each assignment. It's kind of like a mini-GC.
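Stripped of the Objective-C specifics, a pool is just a list of pending Release() calls performed together when the pool ends. A minimal D sketch, assuming a hypothetical `Obj` that is created holding one reference and a hypothetical `AutoReleasePool` with `add`/`drain` members; it models only the batching, not ARC's elision analysis:

```d
// Toy RC object that starts out holding the one reference its creator gets.
class Obj
{
    int refs = 1;
    int AddRef() { return ++refs; }
    int Release() { return --refs; }
}

// Hypothetical pool: objects handed to it are released together when it is
// drained or goes out of scope, not when each variable goes out of scope.
struct AutoReleasePool
{
    private Obj[] pending;

    @disable this(this);  // copying the pool would release everything twice

    void add(Obj o) { pending ~= o; }

    void drain()
    {
        foreach (o; pending)
            o.Release();
        pending = null;
    }

    ~this() { drain(); }
}

void main()
{
    auto a = new Obj;
    auto b = new Obj;
    {
        AutoReleasePool pool;
        pool.add(a);
        pool.add(b);
        Obj tmp = a;   // moving references around inside the pool:
        tmp = b;       // no AddRef()/Release() calls are made
        assert(a.refs == 1 && b.refs == 1);
    }                  // pool ends: both objects are released here
    assert(a.refs == 0 && b.refs == 0);
}
```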

It works by convention. For any function that returns an RC object:

if it's called 'new...', 'init...', 'alloc...' or 'copy...', then the object is assumed to be returned with its retain count incremented on behalf of the calling scope. This means, if you assign it to a member variable, for instance, you do not have to retain the object again, and if it goes out of scope, you must call release on it.

All other functions return 'auto-released' objects, that is, objects which have been queued in the latest autorelease pool. The compiler knows this and can elide more of the releases and retains.

Would this be a mechanism that's worth putting in? I think it goes really well with something like TempAlloc.  I don't think we should use convention, though...

-Steve

October 09, 2013

Re: draft proposal for ref counting in D

On 6/25/2013 1:19 PM, Steven Schveighoffer wrote:
> Looks like a good start.
>
> A few things:
>
> 1. Proposal point 3, you need to AddRef first, THEN Release the original. Reason being, you could be assigning a reference the same value as before. In this case, if you decrement *first*, you will decrement the reference, which might reduce it to 0, and free the object before you increment.  In most cases, the AddRef and Release are elided, so it's not bad if you do this.

Yeah, I got that backwards, and I should know better.

>
> I wonder if it's not a good idea to have a RefAssign function that takes two RC objects and does a check to see if they are the same before doing AddRef and Release to help with this issue. Calls where the compiler can prove they are the same value can be elided.

I'd like to see how far we get with just AddRef/Release, and get their semantics right first.

>
> 2. AddRef and Release should be treated not as function calls, but as callables. That is, if for some reason AddRef and Release should be aliases, this does not detract from the solution.

Yes, just like the names for Ranges are used.

>   Only requirement should be that they cannot be UFCS, as that doesn't make any sense (one cannot add reference counting after the fact to an object).

That would be covered by disallowing explicit calls to AddRef/Release in @safe code.

>
> I'm thinking of the Objective-C objects, whose functions are "release" and "retain". It would be good to use the names Objective-C coders are used to.

AddRef/Release are the COM names. It's trivial to have one wrap the other. I picked AddRef/Release because I'm familiar with their semantics, and am not with O-C.
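As an illustration of one wrapping the other, a class could implement the Objective-C names and expose the COM names as aliases. `NSLikeObject` is a hypothetical toy class, not a real Cocoa type:

```d
// Objective-C-style names carry the implementation...
class NSLikeObject
{
    private int count = 1;  // the creator starts out owning one reference

    NSLikeObject retain() { ++count; return this; }
    void release() { --count; }
    int retainCount() const { return count; }

    // ...and the COM names the proposal looks for are simply aliases.
    alias AddRef = retain;
    alias Release = release;
}

void main()
{
    auto o = new NSLikeObject;
    o.AddRef();                 // same as o.retain()
    assert(o.retainCount == 2);
    o.Release();                // same as o.release()
    assert(o.retainCount == 1);
}
```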


>
> 3. Objective-C ARC uses a mechanism called auto-release pools that help cut down on the release/retain calls. It works like this:
>
> @autoreleasepool {  // create a new pool
>       NSString *str = [NSString stringWithFormat:@"an int: %d", 1];
>       @autoreleasepool { // create a new pool on the "pool stack"
>             NSDate *date = [NSDate date];
>             {
>                 NSDate *date2 = [NSDate date];
>             }
>       } // auto-release date and date2
> } // auto-release str
>
> In this way, it's not the pointer going out of scope that releases the object, it's the release pool going out of scope that releases the object. These release pools themselves are simply RC objects that call release on all their objects (in fact, prior to the @autoreleasepool directive, you had to manually create and destroy these pools).
>
> The benefit of this model is that basically, inside a pool, you can move around auto-released objects at will, pass them into functions, return them from functions, etc, without having to retain or release them for each assignment. It's kind of like a mini-GC.
>
> It works by convention, that any function that returns a RC object:
>
> if it's called 'new...' or 'init...' or 'alloc...' or 'copy...', then the object is assumed returned with its retain count incremented on behalf of the calling scope. This means, if you assign it to a member variable, for instance, you do not have to retain the object again, and if it goes out of scope, you must call release on it.
>
> All other functions return 'auto-released' objects, or objects which have queued in the latest auto release pool. The compiler knows this and can elide more of the releases and retains.
>
> Would this be a mechanism that's worth putting in? I think it goes really well with something like TempAlloc.  I don't think we should use convention, though...

I agree with not relying on convention. But also reserving the new*, init*, alloc* and copy* namespaces seems excessive for D.

As for autoreleasepool, it is relying on a convention that its fields are not leaking. I don't see how we can enforce this.

October 10, 2013

Re: draft proposal for ref counting in D

I have overlooked addressing what happens when you pass an RC ref to a pure function. Is the pure function allowed to call AddRef()/Release()? Not sure.

October 10, 2013

Re: draft proposal for ref counting in D

Steven Schveighoffer wrote:

That's a resounding yes.

Consider that this is allowed:

class X {}

struct S
{
   X foo;
   void setFoo(X newfoo) pure {foo = newfoo;}
}

If X is ref-counted, you HAVE to increment the ref count.

The only issue here is that ref counting may have to access global data. But we already have exceptions for memory management, even for strongly pure functions.
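A sketch showing that Steve's setter can stay pure with the counting written out by hand, assuming X keeps its count inside the object. D's weak purity already lets a pure function modify data reachable through its mutable parameters (including `this`), so AddRef()/Release() that touch only that count can be pure themselves. An implementation that touches global state, such as an allocator, is the separate case mentioned above:

```d
// RC class whose count lives in the object, so AddRef/Release can be pure.
class X
{
    private int refs;
    int AddRef() pure nothrow @safe { return ++refs; }
    int Release() pure nothrow @safe { return --refs; }
    int count() const pure nothrow @safe { return refs; }
}

struct S
{
    X foo;

    // Steve's setter, with the calls the proposal would insert written out.
    void setFoo(X newfoo) pure
    {
        if (newfoo !is null) newfoo.AddRef();
        if (foo !is null) foo.Release();
        foo = newfoo;
    }
}

void main()
{
    auto a = new X;
    auto b = new X;
    S s;
    s.setFoo(a);
    assert(a.count == 1);
    s.setFoo(b);               // AddRef(b), then Release(a)
    assert(a.count == 0 && b.count == 1);
}
```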

-Steve

On Jun 25, 2013, at 4:48 PM, Walter Bright wrote:

> I have overlooked addressing what happens when you pass an RC ref to a pure function. Is the pure function allowed to call AddRef()/Release()? Not sure.

October 10, 2013

Re: draft proposal for ref counting in D

Updated incorporating Steven's suggestion, and some comments about shared/const/mutable/purity.

-------------------------------------------------------------

    Adding Reference Counting to D

D currently supports manual memory management and generalized GC. Unfortunately, the pausing
and memory consumption inherent in GC are not acceptable for many programs, and manual
memory management is error-prone, tedious, and unsafe. A third style, reference counting (RC),
addresses this. Common implementations are COM's AddRef/Release, Objective-C's ARC,
and C++'s shared_ptr<>.

None of these three schemes is guaranteed to be memory-safe; they all require the programmer
to conform to a protocol (even O-C's ARC). Stepping outside of the protocol results in
memory corruption. A D implementation must make it possible to use ref counted objects
in code marked as @safe, although it will be necessary for the implementation of those
objects to be unsafe.

Some aspects of a D implementation are inevitable:

1. Avoid any requirement for more pointer types. This would cause drastic increases in
complexity for both the compiler and the user. It may make generic code much more difficult
to write.

2. Decay of a ref-counted pointer to a non-ref-counted pointer is unsafe, and can only
be allowed (in @safe code) in circumstances where it can be statically proven to be safe.

3. A ref counted object is inherently a reference type, not a value type.

4. The compiler needs to know about ref counted types.


==Proposal==

If a class contains the following methods, in either itself or a base class, it is
an RC class:


    T AddRef();
    T Release();

An RC class is like a regular D class with these additional semantics:

1. In @safe code, casting (implicit or explicit) to a base class that does not
have both AddRef() and Release() is an error.

2. Initialization of a class reference causes a call to AddRef().

3. Assignment to a class reference causes a call to AddRef() on the new value
followed by a call to Release() on its original value.

4. Null checks are done before calling any AddRef() or Release().

5. Upon scope exit of all RC variables or temporaries, a call to Release() is performed,
analogously to the destruction of struct variables and temporaries.

6. If a class or struct contains RC fields, calls to Release() for those fields will
be added to the destructor, and a destructor will be created if one doesn't exist already.

7. If a closure is created that contains RC fields, either a compile time error will be
generated or a destructor will be created for it.

8. Explicit calls to AddRef/Release will not be allowed in @safe code.

9. A call to AddRef() will be added to any argument passed as a parameter.

10. Function returns have an AddRef() already done to the return value.

11. The compiler can elide any AddRef()/Release() calls it can prove are redundant.

12. AddRef() is not called when passed as the implicit 'this' reference.

13. Taking the address of, or passing by reference, any fields of an RC object
is not allowed in @safe code. Passing a field of RC type by reference is allowed.

14. RC objects will still be allocated on the GC heap - this means that a normal
GC run will reap RC objects that are in a cycle, and RC objects will get automatically
scanned for heap references with no additional action required by the user.

15. The class implementor will be responsible for deciding whether or not to support
sharing. Casting to shared is already disallowed in @safe code, so this is only
viable in @system code.

16. RC objects cannot be const or immutable.

17. Can RC objects be arguments to pure functions?
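Point 16 follows from AddRef()/Release() having to write to the count. A small compile-time check, using a hypothetical class `C`, shows that a const or immutable reference simply cannot make the call:

```d
// Toy RC class: AddRef()/Release() must write to the count.
class C
{
    private int refs;
    int AddRef() { return ++refs; }
    int Release() { return --refs; }
}

// Counting through a mutable reference compiles...
static assert(__traits(compiles, (C c) { c.AddRef(); }));

// ...but through const or immutable it does not, because AddRef()
// is not a const member function.
static assert(!__traits(compiles, (const C c) { c.AddRef(); }));
static assert(!__traits(compiles, (immutable C c) { c.AddRef(); }));

void main()
{
    auto c = new C;
    assert(c.AddRef() == 1);
    assert(c.Release() == 0);
}
```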

==Existing Code==

D COM objects already have AddRef() and Release(). This proposal should not break
that code; it will just mean that extra AddRef()/Release() calls are made.
Calling AddRef()/Release() should never have been allowed in @safe code anyway.

Any other existing uses of AddRef()/Release() will break.

==Arrays==

Built-in arrays have no place to put a reference count. Ref counted arrays would hence
become a library type, based on a ref counted class with overloaded operators for
the array operations.

==Results==

This is a very flexible approach, allowing for support of general RC objects, as well
as specific support for COM objects and Objective-C ARC. AddRef()/Release()'s implementation
is entirely up to the user or library writer.

@safe code can be guaranteed to be memory safe, as long as AddRef()/Release() are correctly
implemented.

October 10, 2013

Re: draft proposal for ref counting in D

Steven Schveighoffer wrote:

On Jun 25, 2013, at 4:44 PM, Walter Bright wrote:

>
> On 6/25/2013 1:19 PM, Steven Schveighoffer wrote:
>>
>> Would this be a mechanism that's worth putting in? I think it goes really well with something like TempAlloc.  I don't think we should use convention, though...
>
> I agree with not relying on convention. But also reserving the new*, init*, alloc* and copy* namespaces seems excessive for D.
>
> As for autoreleasepool, it is relying on a convention that its fields are not leaking. I don't see how we can enforce this.

I don't think the autoreleasepool is relying on convention; it's simply giving the compiler a way to elide careful tracking of temporaries' reference counts.

It was definitely of more use when manual reference counting was done, because one only had to worry about retaining non-temporary data in that case.

But the compiler can make the same optimizations (and does in the ARC version of Objective-C).

Consider the following code:

NSString *str = [NSString stringWithFormat:@"%d", 5];

// translating to D, that would be something like:
NSString str = NSString.stringWithFormat("%d", 5);

stringWithFormat is a class method that gives you back a temporary string.  You are not asserting ownership, you are just assigning to a variable.

Now, if you wanted to do some fancy stuff with str, we could do:

{
    NSString str2;

    {
        NSString str = NSString.stringWithFormat("%d", 5);
        if(condition)
            str2 = str;
        if(otherCondition)
        {
            NSString str3 = str;
            str = NSString.stringWithFormat("%d", 6);
        }
    }

    str2 = str;
}

Now, in all this mess, how is the compiler to sort out the AddRefs and Releases?  Most likely, it will end up adding more than it needs to.

But with autorelease pool, it's like you did this:

AutoReleasePool arp;
{
    NSString str2;

    {
        NSString str = NSString.stringWithFormat("%d", 5);
        arp.add(str);
        if(condition)
            str2 = str;
        if(otherCondition)
        {
            NSString str3 = str;
            str = NSString.stringWithFormat("%d", 6);
            arp.add(str);
        }
    }

    str2 = str;
}
arp.drain(); // releases both strings, regardless of which (now out-of-scope) variables referred to them

Essentially, they are only retained when created, and because they go out of scope, they are no longer needed.

The compiler can surmise that, because the fields aren't leaving the scope, it doesn't have to retain them. If it does see a field leaving the scope, it adds a retain.

Then, it can release them all at once.

In fact, this could be done automatically, but you have to allocate a place to put these 'scheduled for release' things. In Cocoa, the main event loop has an auto release pool, and you can add them manually wherever you wish for more fine-grained memory management (that is, if you wanted to free objects before you left the event loop).

Note that in Objective-C, they use those naming conventions to determine whether an object is auto-released or not.  But we could make sure it's *always* auto-released, as we don't have the historical requirements that Objective-C has.  The question is, does it make sense to use this technique to "lump together" deallocations instead of conservatively calling retain/release wherever you assign variables (like C++ shared_ptr)?  And a further question is whether the compiler should pick those points, or whether they should be picked manually.

-Steve

October 10, 2013

Re: draft proposal for ref counting in D

If autoreleasepool is just a handy way to lump together Release() calls, then that is quite unnecessary if the compiler inserts calls to Release() automatically. If it is, instead, a promise that members of autoreleasepool do not leak references outside of that object, then that is very problematic for D to guarantee, and guarantee it D must. I.e., it's "escape analysis" in another disguise.

I think the compiler should pick where to put the Release() calls, that is the whole point of ARC. If the compiler can do sufficient escape analysis to determine that the calls can be elided, so much the better.

October 10, 2013

Re: draft proposal for ref counting in D

On Jun 25, 2013, at 5:28 PM, Walter Bright wrote:

> If autoreleasepool is just a handy way to lump together Release() calls, then that is quite unnecessary if the compiler inserts calls to Release() automatically. If it is, instead, a promise that members of autoreleasepool do not leak references outside of that object, then this is very problematic for D to guarantee such - and guarantee it it must. I.e. it's "escape analysis" in another disguise.
>
> I think the compiler should pick where to put the Release() calls, that is the whole point of ARC. If the compiler can do sufficient escape analysis to determine that the calls can be elided, so much the better.

I'm not sure exactly what is required for ARC to guarantee proper memory management (whether it requires flow-analysis or not), but it seems to work quite well for Objective-C. I think it helps minimize the expensive release/retain calls when you can just say "oh, someone else will clean that up later", just like you can with a GC.

It might be good for someone who knows the ARC eliding techniques that clang uses to explain how they work.  We certainly shouldn't ignore those techniques.

-Steve
