October 10, 2013
On 6/30/2013 4:35 PM, Michel Fortin wrote:
> On June 30, 2013, at 18:11, Walter Bright wrote:
>
>> On 6/30/2013 3:05 PM, Michel Fortin wrote:
>>> On 2013-06-30 at 16:32, Walter Bright wrote:
>>>
>>>> Amended as:
>>>>
>>>> 6. If a class or struct contains RC fields, calls to Release() for those fields will
>>>> be added to the destructor, and a destructor will be created if one doesn't exist already.
>>>> Release() implementations should take care to not destroy objects that are already destroyed,
>>>> which can happen if the objects are allocated on the GC heap and the GC removes a cycle of
>>>> refcounted objects.
>>> Good advice. But... how do you implement that? For one thing, I doubt there's an API in the GC you can query for deleted objects, and if there was it'd be inefficient to call it for every call to Release. And also, will a virtual call to a function of a destroyed object work in the first place? It all seems quite fragile to me.
>> The GC doesn't actually delete anything while it is doing a collection cycle. So the refcount could simply be checked.
>
> ... checked and decremented, and if it reaches zero in the thread the GC is currently running then it doesn't have to delete the object as, in theory, it should be destructed as part of the same run. Ok, I get it now.
>
> You should add a requirement that the reference counter be atomic, because the GC can run in any thread and you still need to decrement the counters of referenced objects in the destructor.

I very much want to avoid requiring atomic counts - it's a major performance penalty. Note that if the GC is reaping a cycle, nobody else is referencing the object, so this should not be an issue.

>
> Honestly, I think it'd be much easier if the runtime provided its own base object you could use for reference counting with the GC to collect cycles. The provided implementation could rely on internal details of the GC since both would be part of druntime. There isn't much room for alternate implementations when the GC is involved anyway.
>
October 10, 2013
Steven Schveighoffer wrote:

On Jun 30, 2013, at 8:18 PM, Walter Bright wrote:

>
> I very much want to avoid requiring atomic counts - it's a major performance penalty. Note that if the GC is reaping a cycle, nobody else is referencing the object, so this should not be an issue.

I think you didn't understand what Michel was saying.

Take for example:

A->B->C->A

this is a cycle.  Imagine that nobody else is pointing at A, B or C.  Fine.  The GC starts to collect this cycle.

But let's say that D is not being collected *AND* B has a reference to D.

B could be getting destroyed in one thread, and decrementing D's reference count, while someone else in another thread is incrementing/decrementing D's reference count.

I agree that RC optimally is thread-local.  But if you involve the GC, then ref incs and decs have to be atomic.

I don't think this is that bad.  iOS on ARM which has terrible atomic primitives uses atomic reference counts.

If you do NOT involve the GC and are careful about cycles, then you could potentially have a RC solution that does not require atomics.  But that would have to be a special case, with the danger of having cycles.

-Steve
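
A minimal sketch of the scenario above, written in C++ rather than D and not taken from druntime. The names `Counted` and `simulate_concurrent_release` are hypothetical. It assumes D starts with two references (one from B, one from a live mutator), that B's destructor runs on the collector's thread, and that the mutator keeps taking and dropping references at the same time. With an atomic counter no update is lost; with a plain `int` the same code would be a data race.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for object D: survives the collection, referenced by B.
struct Counted {
    std::atomic<int> refs{1};  // starts with B's reference

    void retain() { refs.fetch_add(1, std::memory_order_relaxed); }

    // Returns true when the caller has just dropped the last reference.
    bool release() { return refs.fetch_sub(1, std::memory_order_acq_rel) == 1; }
};

// One thread acts as the mutator, the other as the thread the GC happens to
// run B's destructor on. Returns D's final reference count.
int simulate_concurrent_release() {
    Counted d;
    d.retain();  // the mutator's own long-lived reference; count is now 2

    std::thread mutator([&d] {
        for (int i = 0; i < 100000; ++i) {
            d.retain();   // temporary reference taken...
            d.release();  // ...and dropped again
        }
    });
    std::thread collector([&d] {
        d.release();  // B's destructor drops B's reference to D
    });

    mutator.join();
    collector.join();

    // Only the mutator's long-lived reference is left.
    assert(d.refs.load() == 1);
    return d.refs.load();
}
```

Calling `simulate_concurrent_release()` from `main` exercises both threads; the cost being argued about in this thread is the `fetch_add`/`fetch_sub` pair on every retain and release.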
October 10, 2013
Michel Fortin wrote:
On June 30, 2013, at 20:25, Steven Schveighoffer wrote:

> A->B->C->A
>
> this is a cycle.  Imagine that nobody else is pointing at A, B or C.  Fine. The GC starts to collect this cycle.
>
> But let's say that D is not being collected *AND* B has a reference to D.
>
> B could be getting destroyed in one thread, and decrementing D's reference count, while someone else in another thread is incrementing/decrementing D's reference count.
>
> I agree that RC optimally is thread-local.  But if you involve the GC, then ref incs and decs have to be atomic.

Exactly what I was trying to explain. Thanks.

> I don't think this is that bad.  iOS on ARM which has terrible atomic primitives uses atomic reference counts.

Moreover iOS uses a single spinlock to protect a global hash table containing all reference counts.

> If you do NOT involve the GC and are careful about cycles, then you could potentially have a RC solution that does not require atomics.  But that would have to be a special case, with the danger of having cycles.

Not involving the GC is quite difficult: you need to be absolutely sure you have no pointer pointing to that thread-local ref-counted object anywhere in the GC heap. Unfortunately, there's no way to guarantee statically what is part of the GC heap and what is not, so any non-atomic reference counter is not @safe.
October 10, 2013
On 6/30/2013 5:25 PM, Steven Schveighoffer wrote:
> On Jun 30, 2013, at 8:18 PM, Walter Bright wrote:
>
>> I very much want to avoid requiring atomic counts - it's a major performance penalty. Note that if the GC is reaping a cycle, nobody else is referencing the object, so this should not be an issue.
> I think you didn't understand what Michel was saying.
>
> Take for example:
>
> A->B->C->A
>
> this is a cycle.  Imagine that nobody else is pointing at A, B or C.  Fine. The GC starts to collect this cycle.
>
> But let's say that D is not being collected *AND* B has a reference to D.
>
> B could be getting destroyed in one thread, and decrementing D's reference count, while someone else in another thread is incrementing/decrementing D's reference count.
>
> I agree that RC optimally is thread-local.  But if you involve the GC, then ref incs and decs have to be atomic.

This is actually a problem right now with the GC, as destructors may be run in another thread than they belong in. The situation you describe is not worse or better than that, it's the same thing. The solution is to run the destructors in the same thread the objects belong in.

>
> I don't think this is that bad.  iOS on ARM which has terrible atomic primitives uses atomic reference counts.

It's bad. ARM is not the only processor out there.
October 10, 2013
Steven Schveighoffer wrote:
On Jun 30, 2013, at 10:08 PM, Michel Fortin wrote:

> On June 30, 2013, at 20:25, Steven Schveighoffer wrote:
>> I don't think this is that bad.  iOS on ARM which has terrible atomic primitives uses atomic reference counts.
>
> Moreover iOS uses a single spinlock to protect a global hash table containing all reference counts.

Hearing this, I actually find it amazing how well it works :)

>> If you do NOT involve the GC and are careful about cycles, then you could potentially have a RC solution that does not require atomics.  But that would have to be a special case, with the danger of having cycles.
>
> Not involving the GC is quite difficult: you need to be absolutely sure you have no pointer pointing to that thread-local ref-counted object anywhere in the GC heap. Unfortunately, there's no way to guarantee statically what is part of the GC heap and what is not, so any non-atomic reference counter is not @safe.

This is true. I was thinking of a garbage-collected RC object referring to an RC object; I wasn't thinking of a fully GC object referring to an RC object.

In terms of pure functions and possibly a @nogc attribute, this might be a possibility.  Maybe at some point we have a @nogcref attribute we attach to specific *types* so the compiler prevents you from storing any references to that type in the GC.

I think it is important to reserve the possibility of having cases where RC inc/dec is not atomic.  Especially where we have D's type system identifying what is shared and what is not.  Especially when there is the possibility of thread-local GCs.

-Steve
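
One way to picture "keeping the option open" is a counter whose storage is chosen by a type-level tag. This is only a C++ sketch under assumptions, not a proposal for D syntax: the tags `ThreadLocal` and `Shared`, the `RefCount` template, and the helper `retain_twice_release_once` are all hypothetical, with `Shared` standing in for what D's `shared` qualifier would tell the compiler.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// Hypothetical policy tags. In D this information would come from the type
// system (shared vs. unshared) rather than from an explicit tag.
struct ThreadLocal {};
struct Shared {};

template <typename Policy>
class RefCount {
    // Shared objects get an atomic counter; thread-local ones a plain int.
    using Storage = typename std::conditional<
        std::is_same<Policy, Shared>::value, std::atomic<int>, int>::type;

    Storage count_{1};

public:
    void retain() { ++count_; }

    // Returns true when the last reference has just been dropped.
    bool release() { return --count_ == 0; }

    int value() const { return count_; }
};

// Same sequence of operations under either policy.
template <typename Policy>
int retain_twice_release_once() {
    RefCount<Policy> rc;        // count 1
    rc.retain();                // count 2
    rc.retain();                // count 3
    bool last = rc.release();   // count 2
    assert(!last);
    return rc.value();
}
```

Both `retain_twice_release_once<ThreadLocal>()` and `retain_twice_release_once<Shared>()` go through the same interface; only the second pays for atomic operations. The sketch says nothing about the harder question raised above, namely how to prove that a `ThreadLocal` object is never reachable from the GC heap.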
October 10, 2013
Steven Schveighoffer wrote:

On Jun 30, 2013, at 10:26 PM, Walter Bright wrote:

>
> On 6/30/2013 5:25 PM, Steven Schveighoffer wrote:
>> On Jun 30, 2013, at 8:18 PM, Walter Bright wrote:
>>
>>> I very much want to avoid requiring atomic counts - it's a major performance penalty. Note that if the GC is reaping a cycle, nobody else is referencing the object, so this should not be an issue.
>> I think you didn't understand what Michel was saying.
>>
>> Take for example:
>>
>> A->B->C->A
>>
>> this is a cycle.  Imagine that nobody else is pointing at A, B or C.  Fine.  The GC starts to collect this cycle.
>>
>> But let's say that D is not being collected *AND* B has a reference to D.
>>
>> B could be getting destroyed in one thread, and decrementing D's reference count, while someone else in another thread is incrementing/decrementing D's reference count.
>>
>> I agree that RC optimally is thread-local.  But if you involve the GC, then ref incs and decs have to be atomic.
>
> This is actually a problem right now with the GC, as destructors may be run in another thread than they belong in. The situation you describe is not worse or better than that, it's the same thing. The solution is to run the destructors in the same thread the objects belong in.

I think that's a tall order presently.  For instance, on linux, the threads are all stopped using a signal.  It's a very bad idea to run destructors in a signal handler.

What it seems like you are saying is that a prerequisite for ref counting is to have thread-local GC working.  If that is the case, we need to start a thread-local GC "thread" before this goes any further.

>>
>> I don't think this is that bad.  iOS on ARM which has terrible atomic primitives uses atomic reference counts.
>
> It's bad. ARM is not the only processor out there.


Pragmatically, I think if D targets x86 variants and ARM, it is well-situated in the mainstream of existing devices.  Yes, it would be nice if it could also target more obscure platforms, but if the concern is that ref counting works poorly on those, I don't think we are any worse off than today.  Note that we can keep the options open, and implement atomic RC now without many headaches.

-Steve
October 10, 2013
Michel Fortin wrote:
On June 30, 2013, at 22:26, Walter Bright wrote:

> This is actually a problem right now with the GC, as destructors may be run in another thread than they belong in. The situation you describe is not worse or better than that, it's the same thing. The solution is to run the destructors in the same thread the objects belong in.

Indeed. Maybe that could work. How ironic that we can't implement RC efficiently because of the GC.

That said, it strongly favors having a base RC object implementation in druntime, where it can be kept in sync with the GC.
October 10, 2013
On 6/30/2013 7:36 PM, Steven Schveighoffer wrote:
> On Jun 30, 2013, at 10:26 PM, Walter Bright wrote:
>
>> On 6/30/2013 5:25 PM, Steven Schveighoffer wrote:
>>> On Jun 30, 2013, at 8:18 PM, Walter Bright wrote:
>>>
>>>> I very much want to avoid requiring atomic counts - it's a major performance penalty. Note that if the GC is reaping a cycle, nobody else is referencing the object, so this should not be an issue.
>>> I think you didn't understand what Michel was saying.
>>>
>>> Take for example:
>>>
>>> A->B->C->A
>>>
>>> this is a cycle.  Imagine that nobody else is pointing at A, B or C.  Fine.  The GC starts to collect this cycle.
>>>
>>> But let's say that D is not being collected *AND* B has a reference to D.
>>>
>>> B could be getting destroyed in one thread, and decrementing D's reference count, while someone else in another thread is incrementing/decrementing D's reference count.
>>>
>>> I agree that RC optimally is thread-local.  But if you involve the GC, then ref incs and decs have to be atomic.
>> This is actually a problem right now with the GC, as destructors may be run in another thread than they belong in. The situation you describe is not worse or better than that, it's the same thing. The solution is to run the destructors in the same thread the objects belong in.
> I think that's a tall order presently.  For instance, on linux, the threads are all stopped using a signal.  It's a very bad idea to run destructors in a signal handler.
>
> What it seems like you are saying is that a prerequisite for ref counting is to have thread-local GC working.  If that is the case, we need to start a thread-local GC "thread" before this goes any further.

Not really. This doesn't make anything worse. Also, the proposed solution to this issue is to post the "destruct" list to the appropriate thread, and that thread runs it next time it calls the GC.

>
>>> I don't think this is that bad.  iOS on ARM which has terrible atomic primitives uses atomic reference counts.
>> It's bad. ARM is not the only processor out there.
>
> Pragmatically, I think if D targets x86 variants and ARM, it is well-situated in the mainstream of existing devices.  Yes, it would be nice if it could also target more obscure platforms, but if the concern is that ref counting works poorly on those, I don't think we are any worse off than today.  Note that we can keep the options open, and implement atomic RC now without many headaches.
>

We don't need to require atomic RC for these.
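
A rough C++ sketch of the "post the destruct list to the owning thread" idea, under assumptions: `DestructQueue`, `post`, `run_pending`, and `demo_deferred_release` are hypothetical names, and nothing here reflects how druntime is or would be implemented. The collecting thread only appends work to a mutex-protected list; the owning thread runs that work later, so the reference count itself can stay a plain `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical per-thread mailbox of destructors waiting to be run.
class DestructQueue {
    std::mutex m_;
    std::vector<std::function<void()>> pending_;

public:
    // Called by whichever thread performed the collection.
    void post(std::function<void()> dtor) {
        std::lock_guard<std::mutex> lock(m_);
        pending_.push_back(std::move(dtor));
    }

    // Called by the owning thread, e.g. the next time it enters the GC.
    // Returns how many destructors were run.
    std::size_t run_pending() {
        std::vector<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(m_);
            batch.swap(pending_);
        }
        for (auto& dtor : batch) dtor();  // runs outside the lock
        return batch.size();
    }
};

// B is collected on another thread, but its destructor (which drops B's
// reference to D) is run by the owner. Returns D's final count.
int demo_deferred_release() {
    int d_refs = 2;  // plain int: only the owning thread ever touches it
    DestructQueue owner_queue;

    std::thread gc([&] {
        // The collector does not decrement; it only posts the work.
        owner_queue.post([&d_refs] { --d_refs; });
    });
    gc.join();

    std::size_t ran = owner_queue.run_pending();  // owner thread runs it
    assert(ran == 1);
    return d_refs;
}
```

The trade-off in this sketch is that synchronization moves from every retain/release to one lock per posted batch, at the price of destructors running later than the collection that found the garbage.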
October 10, 2013
On 6/30/2013 7:47 PM, Michel Fortin wrote:
> On June 30, 2013, at 22:26, Walter Bright wrote:
>
>> This is actually a problem right now with the GC, as destructors may be run in another thread than they belong in. The situation you describe is not worse or better than that, it's the same thing. The solution is to run the destructors in the same thread the objects belong in.
> Indeed. Maybe that could work. How ironic that we can't implement RC efficiently because of the GC.
>
> That said, it strongly favors having a base RC object implementation in druntime, where it can be kept in sync with the GC.
>

The GC doesn't need to know about it.
October 10, 2013
Steven Schveighoffer wrote:

On Jul 1, 2013, at 3:11 AM, Walter Bright wrote:

>
> On 6/30/2013 7:36 PM, Steven Schveighoffer wrote:
>> I think that's a tall order presently.  For instance, on linux, the threads are all stopped using a signal.  It's a very bad idea to run destructors in a signal handler.
>>
>> What it seems like you are saying is that a prerequisite for ref counting is to have thread-local GC working.  If that is the case, we need to start a thread-local GC "thread" before this goes any further.
>
> Not really. This doesn't make anything worse. Also, the proposed solution to this issue is to post the "destruct" list to the appropriate thread, and that thread runs it next time it calls the GC.

I really urge you to make this a separate project.  It's not trivial. Logically, it's sound, but the implementation will be very difficult.  I also think Sean (and probably others) should be involved for that discussion.

>>
>> Pragmatically, I think if D targets x86 variants and ARM, it is well-situated in the mainstream of existing devices.  Yes, it would be nice if it could also target more obscure platforms, but if the concern is that ref counting works poorly on those, I don't think we are any worse off than today.  Note that we can keep the options open, and implement atomic RC now without many headaches.
>>
>
> We don't need to require atomic RC for these.

I didn't say that.  I said we could implement atomic RC without any changes to the GC, and worry about optimizing with non-atomic RC later.  As long as we make it *possible*.

-Steve