October 11, 2013
On Thursday, 10 October 2013 at 02:28:13 UTC, Walter Bright wrote:
> Steven Schveighoffer wrote:
>
> On Jun 30, 2013, at 8:18 PM, Walter Bright wrote:
>
> >
> > I very much want to avoid requiring atomic counts - it's a
> major performance penalty. Note that if the GC is reaping a cycle, nobody else is referencing the object, so this should not be an issue.
>
> I think you didn't understand what Michel was saying.
>
> Take for example:
>
> A->B->C->A
>
> this is a cycle.  Imagine that nobody else is pointing at A, B or C.  Fine.  The GC starts to collect this cycle.
>
> But let's say that D is not being collected *AND* B has a reference to D.
>
> B could be getting destroyed in one thread, and decrementing D's reference count, while someone else in another thread is incrementing/decrementing D's reference count.
>
> I agree that RC optimally is thread-local.  But if you involve the GC, then ref incs and decs have to be atomic.

I think this ties into the requirement that after the GC collects thread-local objects, they must be finalized by the thread that owns them (assuming it's still alive).  What's missing is some way to track what thread owns an object.  This isn't super difficult to add in the simple case, but if we allow thread-local objects to be transferred between threads, then the transferral of ownership has to be communicated to the GC.  Assuming for the moment that's not a problem though, I think RC updates could be non-atomic for thread-local data.
October 12, 2013
On Thursday, 10 October 2013 at 08:55:00 UTC, Robert Schadek
wrote:
> On 10/10/2013 03:45 AM, Walter Bright wrote:
>> Rainer Schuetze wrote:
>>
>> You have to put the lock around the pair of AddRef and Release, but if
>> the compiler already splits this into two function calls, this cannot
>> be done in the implementation.
> I would imagine the counter to be manipulated with atomic_add_and_fetch
> operations, so no locks are required.

On shared objects, yes. Local objects need no atomics at all.
October 12, 2013

On 12.10.2013 04:16, inout wrote:
> On Thursday, 10 October 2013 at 08:55:00 UTC, Robert Schadek
> wrote:
>> On 10/10/2013 03:45 AM, Walter Bright wrote:
>>> Rainer Schuetze wrote:
>>>
>>> You have to put the lock around the pair of AddRef and Release, but if
>>> the compiler already splits this into two function calls, this cannot
>>> be done in the implementation.
>> I would imagine the counter to be manipulated with atomic_add_and_fetch
>> operations, so no locks are required.
>
> On shared objects, yes. Local objects need no atomics at all.

Atomic increments/decrements are not good enough for shared references. See this example from later in the discussion:

Consider a global shared reference R that holds the last reference to an object O. One thread exchanges the reference with another reference P while another thread reads the reference into S.

shared(C) R = O;      ; refcnt of O is 1

in pseudo-assembly missing null-checks:

Thread1 (R = P)        Thread2 (S = R)

                       mov ecx,[R]
                       ; thread suspended
mov eax,[P]
inc [eax].refcnt
mov ebx,[R]
mov [R],eax
dec [ebx].refcnt      ; refcnt of O now 0
jnz done
call delete_ebx
                       ; thread resumed
                       inc [ecx].refcnt
done:

The increment on [ecx].refcnt modifies garbage.
October 12, 2013
On 2013-10-12 06:16:17 +0000, Rainer Schuetze <r.sagitario@gmx.de> said:

> On 12.10.2013 04:16, inout wrote:
>> On Thursday, 10 October 2013 at 08:55:00 UTC, Robert Schadek
>> wrote:
>>> I would imagine the counter to be manipulated with atomic_add_and_fetch
>>> operations, so no locks are required.
>> 
>> On shared objects, yes. Local objects need no atomics at all.
> 
> Atomic increments/decrements are not good enough for shared references. See this example from later in the discussion:
> 
> Consider a global shared reference R that holds the last reference to an object O. One thread exchanges the reference with another reference P while another thread reads the reference into S.
> 
> shared(C) R = O;      ; refcnt of O is 1
> 
> in pseudo-assembly missing null-checks:
> 
> Thread1 (R = P)        Thread2 (S = R)
> 
>                         mov ecx,[R]
>                         ; thread suspended
> mov eax,[P]
> inc [eax].refcnt
> mov ebx,[R]
> mov [R],eax
> dec [ebx].refcnt      ; refcnt of O now 0
> jnz done
> call delete_ebx
>                         ; thread resumed
>                         inc [ecx].refcnt
> done:
> 
> The increment on [ecx].refcnt modifies garbage.

I think you are mixing shared references with shared objects. Atomic increment/decrement will work fine for shared objects with unshared pointers to it. But if the pointer to the object is itself shared and available from two threads (as in your example) then you need to properly serialize reads and writes to that pointer (with a mutex or through other means).

Of course, then you fall into the problem that in D you are not able to set different attributes to an object and to its reference. (Hint: const(Object)ref)

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

October 12, 2013
I've made short roundup of the different features/requirements that have been mentioned (may have forgotten some and added some) as this thread has a complexity that presumably makes it quite hard to follow for most readers. I have also attached an estimated priority for each requirement based on the discussion and my own experiences.

 - Memory safety [very important but also very difficult/limiting]
     Disallow escaping uncounted references to reference counted
     memory. Keywords: pure, scope, isolated/owned
 - COM compatible [important]
     Needs to support internal reference counting using
     AddRef/ReleaseRef, while obeying to the call convention
 - Objective-C compatible [important]
     Weak references, manual memory management and autorelease pools are
     some keywords here
 - Handle reference cycles [nice to have]
     Requires GC memory for storing the instances
 - Weak pointers [critical]
     Only briefly mentioned, but critical for many data structures
     (e.g. graphs or caches) requires external reference counting
 - Not require two separate counts for COM [mildly important]
     Using GC memory would require a second reference count for the
     D side of things, which is not desirable for multiple reasons
 - Support type qualifiers [critical]
     All usual type qualifiers and conversions should work as expected.
     This is not possible in a pure template based solution.
 - Support non-class types [nice to have]
     Although slightly limiting, classes provide a convenient
     abstraction and will arguably capture >90% use cases just fine
 - Support referencing fields/slices [nice to have]
     Letting references to members escape in a safe way would greatly
     increase the flexibility, but ties it tightly to the GC
 - Allow manual memory management [critical]
     For COM/Obj-C and any situation where external code is to take
     responsibility of the ref counting this needs to be an option
 - Not require new annotations [important]
     Getting this to work without introducing new keywords/syntax is
     strongly preferred by Walter
 - Safe shared(RefCountedType) variables [important]
     There needs to be a way to make shared references thread-safe
 - Support OOP reasonably well [important]
     The usual up and down casts should work and calling functions of
     base classes needs to be safe

Please mention any points that I have forgotten so that we have some kind of unit test against which proposed designs can be checked.
October 12, 2013
To support most of the requirements we need to offer some control over the reference type. Forcing the reference to be a pointer to the class instance precludes proper weak references and makes thread-safety difficult.

Rainer Schütze's proposal [1] looked promising, but didn't quite work out. However, by going a bit further, I think this approach can be fixed and will provide all the flexibility needed to implement solutions that can satisfy any of those requirements.

The basic idea is the same: Any reference to a class that is recognized as being reference counted is replaced by a struct that performs the reference counting using RAII (e.g. std.typecons.RefCounted). This allows any reference counting scheme to be implemented (internal/external, support weak refs or not, global counter table, GC memory or malloc, etc.).

TL;DR Let some code speak instead of a full blown spec first:

struct RefCounted(T) { /* ... */ }

// @referenceType!RefCounted
class C {
	// the presence of a C.ReferenceType template makes the class a
	// reference counted class
	// Note: A @referenceType UDA as above defined in object.d
        //       could be a cleaner/more explicit alternative
	alias ReferenceType = RefCounted;

	// C is now internally renamed to __ref_C to avoid ambiguities
	// and "C" itself is an alias for ReferenceType!__ref_C
	pragma(msg, C.stringof); // "RefCounted!__ref_C"

	void method()
	{
		// The "this" pointer, too, is of the reference counted
		// type. caveats: how to handle private fields? COM call
 		// convention?
		pragma(msg, typeof(this)); // "RefCounted!__ref_C"

		// Alternative: allow only pure functions to avoid
		// escaping references

		// Another alternative is to make 'scope' powerful
		// enough and use that:
		pragma(msg, typeof(this)); // "scope __ref_C"
	}
}

// The reference type itself is never const/immutable
// to enable reference counting for qualified types
C c; // -> RefCounted!__ref_C
const(C) d; // -> RefCounted!(const(__ref_C))

// To support the usual implicit conversions, some substitution is
// needed, since we have no implicit cast support for UDTs in the
// language
d = c; // d = typeof(D)(c)
       // or
       // d = typeof(D).implicitCastFrom(c)
       // or
       // d = typeof(C).implicitCastTo!D

// shared, however, is applied to the reference count itself (and
// transitively to the object) to force a thread-safety - or rather to
// avoid accidental use of unsafe implementations for shared references)
shared(const(C)) e; // -> shared(RefCounted!(const(__ref_C)))

---

Caveat: Changing the "this" pointer from '__ref_C' to 'RefCounted!__ref_C' has implications on the calling convention, which needs to be taken into account when COM objects are involved. A simple COMPtr-like struct that only contains the target pointer may be enough here, though.

Also, to guarantee memory safety, some additional measures need to be taken to avoid escaping plain references to refcounted memory. One solution is the use of isolated/owned types, another is to make 'scope' not only check for shallow reference escaping, but also check for escaping of references to fields (similar thing but behaves differently). Both combined would of course be ideal. I think this is an issue that is mostly orthogonal to the refcount topic. See also the corresponding thread [2].

[1]: http://forum.dlang.org/thread/l34lei$255v$1@digitalmars.com?page=5#post-l352nk:242g3b:246:40digitalmars.com
[2]: http://forum.dlang.org/thread/kluaojijixhwigoujeip@forum.dlang.org
October 12, 2013
On Saturday, 12 October 2013 at 06:16:24 UTC, Rainer Schuetze wrote:
> in pseudo-assembly missing null-checks:
>
> Thread1 (R = P)        Thread2 (S = R)
>
>                        mov ecx,[R]
>                        ; thread suspended

You need an sequentially consistent write. You also need to increment the refcount BEFORE ! This codegen is incorrect.

> mov eax,[P]
> inc [eax].refcnt

Same here.

> mov ebx,[R]
> mov [R],eax
> dec [ebx].refcnt      ; refcnt of O now 0
> jnz done
> call delete_ebx
>                        ; thread resumed
>                        inc [ecx].refcnt
> done:
>
> The increment on [ecx].refcnt modifies garbage.

This can be done atomically (even with eventually consistent increment, don't need full sequential consistency here, you still need fully sequential consistent decrement).
October 13, 2013
On Saturday, 12 October 2013 at 14:03:26 UTC, Sönke Ludwig wrote:
> I've made short roundup of the different features/requirements that have been mentioned (may have forgotten some and added some) as this thread has a complexity that presumably makes it quite hard to follow for most readers. I have also attached an estimated priority for each requirement based on the discussion and my own experiences.
>
>  - Memory safety [very important but also very difficult/limiting]
>      Disallow escaping uncounted references to reference counted
>      memory. Keywords: pure, scope, isolated/owned

We have a first missing block here :D

I do think this is mandatory.

>  - Objective-C compatible [important]
>      Weak references, manual memory management and autorelease pools are
>      some keywords here

Can someone explain me what an autorelease pool is ?

>  - Handle reference cycles [nice to have]
>      Requires GC memory for storing the instances

The good new is that we can (and IMO should) layer ref counting on top of GC. This is the only way to make both work nicely together and have safety net for leakage.

>  - Weak pointers [critical]
>      Only briefly mentioned, but critical for many data structures
>      (e.g. graphs or caches) requires external reference counting

Easy for RefCounted but become tricky for GC stuff.

>  - Not require two separate counts for COM [mildly important]
>      Using GC memory would require a second reference count for the
>      D side of things, which is not desirable for multiple reasons

Can you explain that one a bit more ? Especially how it require 2 count.

>  - Support type qualifiers [critical]
>      All usual type qualifiers and conversions should work as expected.
>      This is not possible in a pure template based solution.

Here we have the second piece missing. We need a way to tail qualify template.

>  - Not require new annotations [important]
>      Getting this to work without introducing new keywords/syntax is
>      strongly preferred by Walter

We have 2 missing bloc to make that work (from a language perpective, many runtime/compiler magic is also required).

>  - Support OOP reasonably well [important]
>      The usual up and down casts should work and calling functions of
>      base classes needs to be safe
>

I see no way to make that work without increasing the scope of scope (huhuhu :P)

> Please mention any points that I have forgotten so that we have some kind of unit test against which proposed designs can be checked.

You did an excellent job here. I guess we have 2 missing piece (and an incomplete one in the name of scope) to get sorted out and we can get something really nice here !
October 13, 2013
On 2013-10-13 01:15:49 +0000, "deadalnix" <deadalnix@gmail.com> said:

> Can someone explain me what an autorelease pool is ?

A basic concept in Objective-C to make manual reference counting bearable. As things are moving to *automatic* reference counting now autorelease pools are becoming less important, but they remain there for backward compatibility and autoreleased objects must be handled correctly by ARC following to the existing conventions.

The concept is to have functions return autoreleased objects, objects pending release. Each time you autorelease an object, instead of the counter being decremented immediately, the object gets added to the autorelease pool and the pool decrement the counter later when it gets drained. So, when the caller gets an autoreleased object, it doesn't have to decrement the counter when it stopped using the object as a temporary, the object will be cleaned up automatically later, generally at the next iteration of the event loop. You only need to retain the object if you're storing it somewhere else than a local variable.

So this is what made manual reference counting bearable in Objective-C. Autorelease pool support is only useful and needed for correctly implementing ARC for Objective-C object types.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

October 13, 2013

On 12.10.2013 14:16, Michel Fortin wrote:
> On 2013-10-12 06:16:17 +0000, Rainer Schuetze <r.sagitario@gmx.de> said:
>
>> On 12.10.2013 04:16, inout wrote:
>>> On Thursday, 10 October 2013 at 08:55:00 UTC, Robert Schadek
>>> wrote:
>>>> I would imagine the counter to be manipulated with atomic_add_and_fetch
>>>> operations, so no locks are required.
>>>
>>> On shared objects, yes. Local objects need no atomics at all.
>>
>> Atomic increments/decrements are not good enough for shared
>> references. See this example from later in the discussion:
>>
>> Consider a global shared reference R that holds the last reference to
>> an object O. One thread exchanges the reference with another reference
>> P while another thread reads the reference into S.
>>
>> shared(C) R = O;      ; refcnt of O is 1
>>
>> in pseudo-assembly missing null-checks:
>>
>> Thread1 (R = P)        Thread2 (S = R)
>>
>>                         mov ecx,[R]
>>                         ; thread suspended
>> mov eax,[P]
>> inc [eax].refcnt
>> mov ebx,[R]
>> mov [R],eax
>> dec [ebx].refcnt      ; refcnt of O now 0
>> jnz done
>> call delete_ebx
>>                         ; thread resumed
>>                         inc [ecx].refcnt
>> done:
>>
>> The increment on [ecx].refcnt modifies garbage.
>
> I think you are mixing shared references with shared objects. Atomic
> increment/decrement will work fine for shared objects with unshared
> pointers to it. But if the pointer to the object is itself shared and
> available from two threads (as in your example) then you need to
> properly serialize reads and writes to that pointer (with a mutex or
> through other means).

I agree, that's why I used the term "shared reference", too ;-)

If you are using only shared objects, but not shared references, you'll have to use message passing coming with its own set of synchronization operations that are not easily made lock-free.

>
> Of course, then you fall into the problem that in D you are not able to
> set different attributes to an object and to its reference. (Hint:
> const(Object)ref)
>

I think being able to distinguish these types is missing from the type system. It's even worse for the class describing TypeInfo object as it is used for both the reference and the object.