October 10, 2013
Rainer Schuetze wrote:

On 27.06.2013 15:50, Michel Fortin wrote:
>
> On 27 June 2013 at 8:03, "Rainer Schuetze" <r.sagitario@gmx.de> wrote:
>
>> class C
>> {
>>   C readThis();
>>   void writeThis(ref C c);
>> }
>>
>> where the function can include the necessary locks, e.g.
>>
>> class C
>> {
>>   int refcnt;
>>
>>   C readThis()
>>   {
>>     synchronized(this)
>>     {
>>       refcnt++;
>>       return this;
>>     }
>>   }
>>   void writeThis(ref C c)
>>   {
>>     synchronized(c)
>>     {
>>        C x = c;
>>        c = this;
>>        if (--c.refcnt == 0)
>>          delete c;
>>     }
>>   }
>> }
>
> There's an error in this code. You must synchronize on the lock
> protecting the pointer, not on the lock at the other end of the
> pointer's value.

You're right (I was about to run to a meeting when writing this). Then, readThis will also need a reference to the pointer. Another, more obvious bug is that it should read

        if (--x.refcnt == 0)
          delete x;

> Also, you only need to do this if the pointer pointing to the object
> is shared. If the pointer is thread-local, assignment does not need
> to be atomic. And if the object itself is thread-local, not even the
> reference counter needs to be atomic.
>

True, these issues only happen with shared pointers. But remember that fields in shared objects are also shared.

I also have a hard time imagining how the code above works when reading pointers that live in registers or writing to "references to registers". These are never shared, so they could have simpler implementations.
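
A minimal sketch of the corrected scheme discussed above, under stated assumptions: the `Slot` class and its `read`/`write` methods are hypothetical names, not part of any proposal. The lock belongs to the pointer variable rather than to the object it points to, the read takes the pointer's location into account, and the release applies to the old value `x`. The counter is made atomic here because one object may be referenced from several slots, each with its own lock.

```d
import core.atomic : atomicLoad, atomicOp;

// Hypothetical ref-counted class; all names here are illustrative.
class C
{
    shared int refcnt; // atomic, since several slots may point at one object
}

// A shared pointer variable bundled with the lock that protects it.
final class Slot
{
    private C ptr;

    // Read the pointer and retain its target in one critical section.
    C read()
    {
        synchronized (this)
        {
            if (ptr !is null)
                atomicOp!"+="(ptr.refcnt, 1);
            return ptr;
        }
    }

    // Store newVal, then release the previous target (x, not the new value).
    void write(C newVal)
    {
        synchronized (this)
        {
            C x = ptr;
            if (newVal !is null)
                atomicOp!"+="(newVal.refcnt, 1);
            ptr = newVal;
            if (x !is null && atomicOp!"-="(x.refcnt, 1) == 0)
                destroy(x); // `delete` is deprecated in current D
        }
    }
}
```

A thread-local pointer would not need the `synchronized` blocks at all, which is the simpler implementation mentioned for pointers that are never shared.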


October 10, 2013
Michel Fortin wrote:

On 27 June 2013 at 13:04, Walter Bright wrote:

> I don't think we should do a fully ref counted GC anyway.

Speaking of the GC, you should probably rethink this point:

> 14. RC objects will still be allocated on the GC heap - this means that a normal
> GC run will reap RC objects that are in a cycle, and RC objects will get automatically
> scanned for heap references with no additional action required by the user.

If you allocate the object from the GC heap, the GC will collect it regardless of its reference count. That's fine as long as all the retaining pointers are visible to the GC. But if you're defining a COM object, likely that's because you'll pass a pointer to an external API, and this API might store the pointer somewhere not scanned by the GC. This API will call AddRef to make sure the object is retained, but if the GC doesn't see that pointer on its heap it'll deallocate and next time external code uses the object everything goes boom! So that doesn't work.

If instead you allocate the object outside of the GC heap and your object contains pointers to the GC heap, you'll need to add roots to the GC for any pointer variable in the object. (This is what DMD/Objective-C currently does.) There's no way to detect cycles with that scheme, but it is simple.

We could use a hybrid scheme with two reference counts: one for internal references that the GC can see and one for external references that the GC cannot see. The GC cannot collect an object if the external reference count is non-zero. If the external count is zero, it can collect the object if the internal reference count reaches zero or if it becomes unreachable from any root. This allows detection of cycles, as long as this cycle is only made of internal references. Care must be taken about incrementing/decrementing the right reference count depending on the context, which sounds tricky.
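
The collection rule for the two-counter scheme can be written down as a small predicate. This is only a sketch: `Counts` and `mayCollect` are hypothetical names, and the reachability flag stands in for whatever the GC's mark phase would report.

```d
// Hypothetical pair of counters kept per object.
struct Counts
{
    uint internal; // references the GC can see
    uint external; // references held by code the GC cannot scan
}

// True if the object may be reclaimed under the rule described above.
bool mayCollect(Counts c, bool reachableFromRoots)
{
    if (c.external != 0)
        return false; // pinned by an external owner
    return c.internal == 0 || !reachableFromRoots;
}
```

The second clause is what lets a cycle of internal references be reclaimed: its internal counts never reach zero, but the GC finds it unreachable.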

Or we could use a somewhat less hybrid scheme where we have one reference count and the only thing it does is prevent objects from being deallocated. This can be implemented as one global hash table in which you put all objects that have a non-zero reference count. Since this hash table is scanned by the GC, anything in it will never be collected. This will also detect internal cycles like the previous two-counter scheme, but it doesn't allow immediate deallocation, as it waits for the GC to deallocate. (This is similar to how it worked in my defunct D/Objective-C bridge that did not rely on tweaking the compiler.)
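
A rough sketch of the hash-table variant, with hypothetical names (`pinned`, `retain`, `release`, `isPinned`). For brevity the table here is a thread-local associative array; a truly global table would need `__gshared` storage plus a lock, which is omitted.

```d
// Objects with a non-zero count live in this table. The table is allocated
// on the GC heap, so the GC scans it and keeps every key alive.
// Assumption: thread-local for simplicity, not a real global.
private size_t[void*] pinned;

void retain(Object o)
{
    if (o is null) return;
    auto key = cast(void*) o;
    if (auto count = key in pinned)
        ++*count;
    else
        pinned[key] = 1;
}

void release(Object o)
{
    if (o is null) return;
    auto key = cast(void*) o;
    auto count = key in pinned;
    assert(count !is null, "release without a matching retain");
    if (--*count == 0)
        pinned.remove(key); // unpinned; the GC decides when to free it
}

bool isPinned(Object o)
{
    return (cast(void*) o in pinned) !is null;
}
```

Dropping the last count only removes the pin, so deallocation still waits for a collection, as noted above.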

October 10, 2013
Walter Bright wrote:

On 6/27/2013 11:38 AM, Michel Fortin wrote:
> On 27 June 2013 at 13:04, Walter Bright wrote:
>
>> I don't think we should do a fully ref counted GC anyway.
> Speaking of the GC, you should probably rethink this point:
>
>> 14. RC objects will still be allocated on the GC heap - this means that a normal
>> GC run will reap RC objects that are in a cycle, and RC objects will get automatically
>> scanned for heap references with no additional action required by the user.
> If you allocate the object from the GC heap, the GC will collect it regardless of its reference count. That's fine as long as all the retaining pointers are visible to the GC. But if you're defining a COM object, likely that's because you'll pass a pointer to an external API, and this API might store the pointer somewhere not scanned by the GC. This API will call AddRef to make sure the object is retained, but if the GC doesn't see that pointer on its heap it'll deallocate and next time external code uses the object everything goes boom! So that doesn't work.

We already require that if you're going to pass a pointer to any GC allocated data to external code, that you retain a pointer. I see no additional issue with requiring this for COM objects created on the GC heap.

>
> If instead you allocate the object outside of the GC heap and your object contains pointers to the GC heap, you'll need to add roots to the GC for any pointer variable in the object. (This is what DMD/Objective-C currently does.) There's no way to detect cycles with that scheme, but it is simple.

Yes, but that's a lot harder (and more error-prone) than simply requiring the programmer to retain a pointer as I outlined above.

>
> We could use a hybrid scheme with two reference counts: one for internal references that the GC can see and one for external references that the GC cannot see. The GC cannot collect an object if the external reference count is non-zero. If the external count is zero, it can collect the object if the internal reference count reaches zero or if it becomes unreachable from any root. This allows detection of cycles, as long as this cycle is only made of internal references. Care must be taken about incrementing/decrementing the right reference count depending on the context, which sounds tricky.

That also seems far more complex than what I proposed.

>
> Or we could use a somewhat less hybrid scheme where we have one reference count and the only thing it does is prevent objects from being deallocated. This can be implemented as one global hash table in which you put all objects that have a non-zero reference count. Since this hash table is scanned by the GC, anything in it will never be collected. This will also detect internal cycles like the previous two-counter scheme, but it doesn't allow immediate deallocation, as it waits for the GC to deallocate. (This is similar to how it worked in my defunct D/Objective-C bridge that did not rely on tweaking the compiler.)
>

I'd really like to stick to the shared_ptr<T> model. (A global hash table also is not so simple when factoring in loading and unloading DLLs.) Of course, for the O-C bridge, you can implement it as required to be compatible with O-C.

October 10, 2013
Michel Fortin wrote:

On 27 June 2013 at 15:32, Walter Bright wrote:

> On 6/27/2013 11:38 AM, Michel Fortin wrote:
>>> 14. RC objects will still be allocated on the GC heap - this means that a normal
>>> GC run will reap RC objects that are in a cycle, and RC objects will get automatically
>>> scanned for heap references with no additional action required by the user.
>> If you allocate the object from the GC heap, the GC will collect it regardless of its reference count. That's fine as long as all the retaining pointers are visible to the GC. But if you're defining a COM object, likely that's because you'll pass a pointer to an external API, and this API might store the pointer somewhere not scanned by the GC. This API will call AddRef to make sure the object is retained, but if the GC doesn't see that pointer on its heap it'll deallocate and next time external code uses the object everything goes boom! So that doesn't work.
>
> We already require that if you're going to pass a pointer to any GC allocated data to external code, that you retain a pointer. I see no additional issue with requiring this for COM objects created on the GC heap.


Perhaps it's just me, but I'd say that if you need to anticipate how long to keep the object alive when you pass it to some external code, it completely defeats the purpose of said external code calling AddRef and Release.

With the scheme you propose, reference counting would be useful inside D code as a way to deallocate some classes of objects early without waiting for a GC scan. The GC can collect cycles for those objects.

People who intend to pass COM objects to external code, however, should allocate those objects outside of the GC heap. They should also add member pointers as GC roots. There is no cycle detection for those objects: if done right it could be made memory safe, but cycles will leak.

Maybe that could work.
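
A sketch of what allocating such an object outside the GC heap might look like. `Widget`, `createWidget` and `destroyWidget` are hypothetical names, and the class is only COM-like (it does not derive from a real `IUnknown`). The instance lives in `malloc` memory that is registered with `GC.addRange`, so the GC scans its member pointers without owning the object itself.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

// Hypothetical COM-like class; not derived from a real IUnknown.
class Widget
{
    static int live;   // instances currently alive, for demonstration only
    int refs = 1;      // the creator holds the first reference
    int[] payload;     // points into the GC heap

    this(int[] payload) { this.payload = payload; ++live; }
    ~this() { --live; }

    void AddRef() { ++refs; }

    void Release()
    {
        if (--refs == 0)
            destroyWidget(this); // `this` must not be touched afterwards
    }
}

Widget createWidget(int[] payload)
{
    enum size = __traits(classInstanceSize, Widget);
    void* mem = malloc(size);
    if (mem is null)
        assert(0, "out of memory");
    GC.addRange(mem, size); // let the GC see the member pointers
    return emplace!Widget(mem[0 .. size], payload);
}

void destroyWidget(Widget w)
{
    void* mem = cast(void*) w;
    destroy(w);          // run the destructor
    GC.removeRange(mem); // stop scanning before the memory goes away
    free(mem);
}
```

Because the GC never frees this memory, an external holder that called `AddRef` keeps the object alive without any D-side pointer, at the cost that a cycle through such objects leaks.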

October 10, 2013
Walter Bright wrote:

On 6/27/2013 1:15 PM, Michel Fortin wrote:
> On 27 June 2013 at 15:32, Walter Bright wrote:
>
>> On 6/27/2013 11:38 AM, Michel Fortin wrote:
>>>> 14. RC objects will still be allocated on the GC heap - this means that a normal
>>>> GC run will reap RC objects that are in a cycle, and RC objects will get automatically
>>>> scanned for heap references with no additional action required by the user.
>>> If you allocate the object from the GC heap, the GC will collect it regardless of its reference count. That's fine as long as all the retaining pointers are visible to the GC. But if you're defining a COM object, likely that's because you'll pass a pointer to an external API, and this API might store the pointer somewhere not scanned by the GC. This API will call AddRef to make sure the object is retained, but if the GC doesn't see that pointer on its heap it'll deallocate and next time external code uses the object everything goes boom! So that doesn't work.
>> We already require that if you're going to pass a pointer to any GC allocated data to external code, that you retain a pointer. I see no additional issue with requiring this for COM objects created on the GC heap.
>
> Perhaps it's just me, but I'd say that if you need to anticipate how long to keep the object alive when you pass it to some external code, it completely defeats the purpose of said external code calling AddRef and Release.
>
> With the scheme you propose, reference counting would be useful inside D code as a way to deallocate some classes of objects early without waiting for a GC scan. The GC can collect cycles for those objects.
>
> People who intend to pass COM objects to external code, however, should allocate those objects outside of the GC heap. They should also add member pointers as GC roots. There is no cycle detection for those objects: if done right it could be made memory safe, but cycles will leak.
>
> Maybe that could work.
>

Nothing about the proposal acts to prevent one from constructing COM objects any way they wish, including using malloc/free and managing it all themselves. All COM objects require is an implementation of the COM interface, which says nothing at all beyond having a pointer to an AddRef() and Release().

If you are building a COM object that is to be fired and forgotten into the void of unknown external code, I don't think there's any automated replacement for thinking carefully about it and constructing it accordingly. D's memory safety guarantees cannot, of course, cover unknown and unknowable external code.

What I'm trying to accomplish with this proposal is:

1. A way to do ref-counted memory allocation for specific objects
2. Do it in a guaranteed memory safe manner (at least for the user of those objects)
3. Do it in a way that does not interfere with people who want to use the GC or do manual memory management
4. Not impose penalties on non-refcounted code
5. Do it in a way that offers a similar performance and overhead profile to C++'s shared_ptr<T>
6. Do it in a way that makes it usable to construct COM objects, and work with NSObject's
7. Not require pointer annotations
8. Address the most common "why I can't use D" complaint

What I'm not trying to accomplish is:

1. Replacing all memory allocation in D with ref counting

October 10, 2013
Michel Fortin wrote:

On 27 June 2013 at 16:56, Walter Bright wrote:

> What I'm trying to accomplish with this proposal is:
>
> 1. A way to do ref-counted memory allocation for specific objects
> 2. Do it in a guaranteed memory safe manner (at least for the user of those objects)
> 3. Do it in a way that does not interfere with people who want to use the GC or do manual memory management
> 4. Not impose penalties on non-refcounted code
> 5. Do it in a way that offers a similar performance and overhead profile to C++'s shared_ptr<T>
> 6. Do it in a way that makes it usable to construct COM objects, and work with NSObject's
> 7. Not require pointer annotations
> 8. Address the most common "why I can't use D" complaint
>
> What I'm not trying to accomplish is:
>
> 1. Replacing all memory allocation in D with ref counting

That list is great for limiting the scope of your DIP. Make sure you include it in the DIP.

So if we return to the core of it, here are the problems that still need solving:

1. Depending on the reference counting scheme implemented, it might be more efficient to have a single operation for an assignment (retain a/release b) operation. I think that should be allowed.
2. If the pointer variable is shared, assignment must be atomic (done under a lock, and it must always be the same lock for a given pointer, obviously).
3. If the pointer variable is shared, reading its value must be done atomically with a retain too.

Here's a suggestion for problem number 1 above:

	class MyObject
	{
		// user-implemented
		static void opRetain(MyObject var);  // must accept null
		static void opRelease(MyObject var); // must accept null

		// optional (showing default implementation below)
		// this can be made faster for some implementations of ref-counting
		// only call it for an assignment, not for constructing/destructing the pointer
		// (notably for Objective-C)
		static void opPtrAssign(ref MyObject var, MyObject newVal) {
			opRetain(newVal);
			opRelease(var);
			var = newVal;
		}
	}

This maps one-to-one to the underlying functions for Objective-C ARC.

I don't have a solution for the shared case. We do in fact have a tail-shared problem here. If I have a shared(MyObject), the pointer is shared along with the object. When the pointer itself is shared, we need a lock to access it reliably, and that lock can only be provided by the outer context.

If we had a way to express tail-shared, then we could repeat the above three functions for tail-shared object pointers and it'd work reliably for that.

October 10, 2013
Michel Fortin wrote:

On 27 June 2013 at 18:35, Michel Fortin wrote:

> 	class MyObject
> 	{
> 		// user-implemented
> 		static void opRetain(MyObject var);  // must accept null
> 		static void opRelease(MyObject var); // must accept null
>
> 		// optional (showing default implementation below)
> 		// this can be made faster for some implementations of ref-counting
> 		// only call it for an assignment, not for constructing/destructing the pointer
> 		// (notably for Objective-C)
> 		static void opPtrAssign(ref MyObject var, MyObject newVal) {
> 			opRetain(newVal);
> 			opRelease(var);
> 			var = newVal;
> 		}
> 	}
>
> This maps one-to-one to the underlying functions for Objective-C ARC.


Actually, I made a small error in opRetain. To match Objective-C ARC it should return the retained object:

	static MyObject opRetain(MyObject var);  // must accept null

and the default implementation for opPtrAssign would then become:

	static void opPtrAssign(ref MyObject var, MyObject newVal) {
		newVal = opRetain(newVal);
		opRelease(var);
		var = newVal;
	}

One reason is that Objective-C blocks (equivalent of delegate literals) are stack allocated. If you call retain on a block, it'll make a copy on the heap and return that copy.

Another reason for opRetain to return the object is to enable tail-call optimization for cases like this one:

	NSObject x;

	NSObject getX() {
		return x; // D ARC should insert an implicit opRetain here
	}

Of course it doesn't necessarily need to work that way, but it'd certainly make it easier to integrate with Objective-C if it worked that way.
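
To see how the three hooks interact, here is a sketch with a plain integer counter. The compiler does not call these functions today, so the calls are written out by hand; the `refs` and `freed` members are assumptions added purely for illustration.

```d
// Hypothetical class implementing the three hooks with a simple counter.
class MyObject
{
    int refs;
    static int freed; // counts objects whose count hit zero (demo only)

    // Must accept null; returns the retained object.
    static MyObject opRetain(MyObject var)
    {
        if (var !is null)
            ++var.refs;
        return var;
    }

    // Must accept null.
    static void opRelease(MyObject var)
    {
        if (var !is null && --var.refs == 0)
            ++freed; // a real implementation would deallocate here
    }

    // Default assignment: retain the new value before releasing the old
    // one, so that assigning a variable to itself never drops to zero.
    static void opPtrAssign(ref MyObject var, MyObject newVal)
    {
        newVal = opRetain(newVal);
        opRelease(var);
        var = newVal;
    }
}
```

The retain-then-release order is the important design choice: swapping the two calls would free the object on a self-assignment when the count is one.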

October 10, 2013
Rainer Schuetze wrote:

On 25.06.2013 23:00, Walter Bright wrote:
> 4. Null checks are done before calling any AddRef() or Release().

Here is another nitpick that needs to be addressed: As mentioned in the implementation of ComObject, invariants (and out contracts) must not be called when returning from Release if it is ok to actually delete the object.

October 10, 2013
Rainer Schuetze wrote:

On 28.06.2013 00:35, Michel Fortin wrote:
> So if we return to the core of it, here are the problems that still
> need solving:
>
> 1. Depending on the reference counting scheme implemented, it might
> be more efficient to have a single operation for an assignment
> (retain a/release b) operation. I think that should be allowed.

> 2. If the pointer variable is shared assignment must be atomic (done
> under a lock, and it must always be the same lock for a given
> pointer, obviously).

> 3. If the pointer variable is shared, reading its value must be done
> atomically with a retain too.

I just had an idea, maybe it is obvious and just distracts, but I thought it might be worth sharing:

Instead of defining methods on the class type, we could also redefine the reference type. The compiler detects a type declaration "reference_type" in the class declaration and replaces all references to that class with that type.

class C
{
    alias shared_ptr!C reference_type;
}

C c = new C;

is lowered to

shared_ptr!C c = new C;

"new C" returns a shared_ptr!C as well.

It is then up to the implementation of shared_ptr to define what member functions to call for reference counting and to deal with proper shared semantics in assignments. It can also define whether opCall should increment the reference count or not. For most of the needed functionality, struct semantics work out-of-the-box.

Some immediate gotchas:

- In a class hierarchy, you would want to define the reference_type in the base class only, so maybe it has to be a template. I'm not sure implicit casting to base class reference types and interfaces can be implemented.

- the implementation of the shared_ptr template will have to be able to deal with the "raw" reference, so that might need some type modifier/annotation. I think this might also be true for the addRef/release version, if the implementation is not just working on the refcount, but is also calling other functions.

- To elide redundant reference counting, the compiler will need annotations here, too. Move semantics of structs might reduce the number of reference count operations already, though.
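
As a sketch of how far plain struct semantics already go, here is a hand-written wrapper of the kind the lowering would produce. `SharedPtr`, `Node`, `addRef` and `release` are hypothetical names; nothing here is generated by the compiler, and the counter is not thread safe.

```d
// Hypothetical ref-counted class exposing the two counting functions.
class Node
{
    int refs;
    void addRef() { ++refs; }
    void release() { --refs; } // deallocation at zero omitted
}

// Hand-written stand-in for the reference type the compiler would substitute.
struct SharedPtr(T) if (is(T == class))
{
    private T raw; // the "raw" reference mentioned in the second gotcha

    this(T obj)
    {
        raw = obj;
        if (raw !is null)
            raw.addRef();
    }

    this(this) // postblit: runs on the copy after a bitwise copy
    {
        if (raw !is null)
            raw.addRef();
    }

    ~this()
    {
        if (raw !is null)
            raw.release();
    }

    T get() { return raw; }
}
```

Construction, copying and scope exit are covered by the constructor, postblit and destructor, and D generates a matching assignment operator from the latter two. Conversion to a base class reference type is not addressed by this sketch.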

October 10, 2013
Rainer Schuetze wrote:

On 28.06.2013 09:07, Walter Bright wrote:
> Will add to proposal.
>
> On 6/27/2013 11:27 PM, Rainer Schuetze wrote:
>> On 25.06.2013 23:00, Walter Bright wrote:
>>> 4. Null checks are done before calling any AddRef() or Release().
>>
>> Here is another nitpick that needs to be addressed: As mentioned in
>> the implementation of ComObject, invariants (and out contracts) must
>> not be called when returning from Release, if it is ok to actually
>> delete the object.
>>


Sorry to produce these drop by drop, but while writing the last mail, I noticed another issue to think about:

What happens if the class also implements interfaces? A reference of the interface type must do reference counting as well. So the interface must also define AddRef and Release. This is currently true for COM-interfaces derived from IUnknown, but not for other interfaces.
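
A small sketch of what this requirement would mean for a non-COM interface. `RefCounted`, `Shape` and `Square` are hypothetical names; the point is only that a reference of the interface type can reach the counting functions because the interface itself declares them.

```d
// Hypothetical base interface carrying the counting functions.
interface RefCounted
{
    void AddRef();
    void Release();
}

// Any interface whose references are counted has to derive from it.
interface Shape : RefCounted
{
    double area();
}

class Square : Shape
{
    private double side;
    int refs;

    this(double side) { this.side = side; }

    void AddRef() { ++refs; }
    void Release() { --refs; } // deallocation at zero omitted
    double area() { return side * side; }
}
```

An interface that does not derive from something like `RefCounted` gives the compiler nothing to call when a reference of that interface type is copied or dropped.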