October 13, 2013

On 12.10.2013 20:31, deadalnix wrote:
> On Saturday, 12 October 2013 at 06:16:24 UTC, Rainer Schuetze wrote:
>> in pseudo-assembly missing null-checks:
>>
>> Thread1 (R = P)        Thread2 (S = R)
>>
>>                        mov ecx,[R]
>>                        ; thread suspended
>
> You need an sequentially consistent write. You also need to increment
> the refcount BEFORE ! This codegen is incorrect.

How do you increment the counter without reading its address?

>
>> mov eax,[P]
>> inc [eax].refcnt
>
> Same here.

Same here ;-)

>
>> mov ebx,[R]
>> mov [R],eax
>> dec [ebx].refcnt      ; refcnt of O now 0
>> jnz done
>> call delete_ebx
>>                        ; thread resumed
>>                        inc [ecx].refcnt
>> done:
>>
>> The increment on [ecx].refcnt modifies garbage.
>
> This can be done atomically (even with eventually consistent increment,
> don't need full sequential consistency here, you still need fully
> sequential consistent decrement).

According to the "Handbook of Garbage Collection" by Richard Jones eager lock-free reference counting can only be done with a cas2 operation modifying two seperate locations atomically (algorithm 18.2 "Eager reference counting with CompareAndSwap is broken"). This might be the quoted paper: http://scholr.ly/paper/2199608/lock-free-reference-counting

Unfortunately the CAS2 operation does not exist in most processors.
October 13, 2013
On Sunday, 13 October 2013 at 07:03:39 UTC, Rainer Schuetze wrote:
> How do you increment the counter without reading its address?
>

I assumed that the reference count was in a struct with the data, and refcounted point to it.

In this case, if you remove the pointer via a sequencially consistent write (while keeping a local copy internally) and THEN decrement the counter, the other thread will access another object (or skip on a null check). Granted the read is sequencially consistent.
October 13, 2013
On 2013-10-13 06:54:04 +0000, Rainer Schuetze <r.sagitario@gmx.de> said:

> I agree, that's why I used the term "shared reference", too ;-)
> 
> If you are using only shared objects, but not shared references, you'll have to use message passing coming with its own set of synchronization operations that are not easily made lock-free.

For one of my projects I implemented a shared pointer like this. It uses the pointer value itself as a spin lock with the assumption that -1 is an invalid pointer value:

1. read pointer value
2. if read value is -1 go to line 1 (spin)
3. compare and swap (previously read value <-> -1)
4. if failure go to line 1 (spin)
// now pointer is "locked", its value is -1 but we have a copy of the original
5. copy pointer locally or assign to it (and update counter)
6. write back pointer value atomically to replace the -1

No mutex, but there's a spin lock so it's not good if there's contention.

That said, I find it extremely rare to want a shared pointer that isn't already protected by a mutex alongside other variables, or that isn't propagated using some form of message passing.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca

October 13, 2013
On Sunday, 13 October 2013 at 07:03:39 UTC, Rainer Schuetze wrote:
>
> According to the "Handbook of Garbage Collection" by Richard Jones eager lock-free reference counting can only be done with a cas2 operation modifying two seperate locations atomically (algorithm 18.2 "Eager reference counting with CompareAndSwap is broken"). This might be the quoted paper: http://scholr.ly/paper/2199608/lock-free-reference-counting
>
> Unfortunately the CAS2 operation does not exist in most processors.

I suppose it's worth noting that Boost (and now standard C++) has a shared_ptr that works across threads and the implementation I've seen doesn't use a mutex.  In fact, I think the Boost one doesn't even use CAS on x86, though it's been quite a few years so my memory could be wrong on that last detail.
October 13, 2013
On 10/13/13 11:19, deadalnix wrote:
> On Sunday, 13 October 2013 at 07:03:39 UTC, Rainer Schuetze wrote:
>> How do you increment the counter without reading its address?
>>
> 
> I assumed that the reference count was in a struct with the data, and refcounted point to it.
> 
> In this case, if you remove the pointer via a sequencially consistent write (while keeping a local copy internally) and THEN decrement the counter, the other thread will access another object (or skip on a null check). Granted the read is sequencially consistent.

No, if you have two (or more) threads concurrently accessing the object,
it is possible that one threads reads the pointer, then sleeps before
incrementing the count. Then another thread comes and /destroys/ the
object, the memory is reused for something else, or even unmapped.
Then the first thread wakes up and incs the counter, which is no longer
there, causing a crash or data corruption.

But this is only a problem for shared objects, which are accessed without any locking -- it's not a common case at all, and can be dealt with by simply taking a lock *before* reading the reference. (there are many much more complex solutions such as CAS2 or RCU based ones).

artur
October 13, 2013
Am 13.10.2013 17:15, schrieb Sean Kelly:
> On Sunday, 13 October 2013 at 07:03:39 UTC, Rainer Schuetze wrote:
>>
>> According to the "Handbook of Garbage Collection" by Richard Jones
>> eager lock-free reference counting can only be done with a cas2
>> operation modifying two seperate locations atomically (algorithm 18.2
>> "Eager reference counting with CompareAndSwap is broken"). This might
>> be the quoted paper:
>> http://scholr.ly/paper/2199608/lock-free-reference-counting
>>
>> Unfortunately the CAS2 operation does not exist in most processors.
>
> I suppose it's worth noting that Boost (and now standard C++) has a
> shared_ptr that works across threads and the implementation I've seen
> doesn't use a mutex.  In fact, I think the Boost one doesn't even use
> CAS on x86, though it's been quite a few years so my memory could be
> wrong on that last detail.

I didn't read the paper, but I'd suspect that the paper refers to the case where both, the reference count _and_ the reference is thread-safe, since the boost/c++ shared_ptr only has a thread-safe reference count after all.
October 14, 2013

On 13.10.2013 19:05, Sönke Ludwig wrote:
> Am 13.10.2013 17:15, schrieb Sean Kelly:
>> On Sunday, 13 October 2013 at 07:03:39 UTC, Rainer Schuetze wrote:
>>>
>>> According to the "Handbook of Garbage Collection" by Richard Jones
>>> eager lock-free reference counting can only be done with a cas2
>>> operation modifying two seperate locations atomically (algorithm 18.2
>>> "Eager reference counting with CompareAndSwap is broken"). This might
>>> be the quoted paper:
>>> http://scholr.ly/paper/2199608/lock-free-reference-counting
>>>
>>> Unfortunately the CAS2 operation does not exist in most processors.
>>
>> I suppose it's worth noting that Boost (and now standard C++) has a
>> shared_ptr that works across threads and the implementation I've seen
>> doesn't use a mutex.  In fact, I think the Boost one doesn't even use
>> CAS on x86, though it's been quite a few years so my memory could be
>> wrong on that last detail.
>
> I didn't read the paper, but I'd suspect that the paper refers to the
> case where both, the reference count _and_ the reference is thread-safe,
> since the boost/c++ shared_ptr only has a thread-safe reference count
> after all.

I haven't read it either, but AFAICT the cas2 operation is used two modify the pointer and the reference count at the same time atomically.

I just checked boost::shared_ptr, it uses cas operations on the reference counts. It has the same problem as described in my example, see the read/write example 3 here: http://www.boost.org/doc/libs/1_54_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety

boost::shared_ptr is also unsafe with respect to calling member functions through "->" as it doesn't increment the reference count.
October 14, 2013
On Monday, 14 October 2013 at 17:43:33 UTC, Rainer Schuetze wrote:
>
>
> On 13.10.2013 19:05, Sönke Ludwig wrote:
>> Am 13.10.2013 17:15, schrieb Sean Kelly:
>>> On Sunday, 13 October 2013 at 07:03:39 UTC, Rainer Schuetze wrote:
>>>>
>>>> According to the "Handbook of Garbage Collection" by Richard Jones
>>>> eager lock-free reference counting can only be done with a cas2
>>>> operation modifying two seperate locations atomically (algorithm 18.2
>>>> "Eager reference counting with CompareAndSwap is broken"). This might
>>>> be the quoted paper:
>>>> http://scholr.ly/paper/2199608/lock-free-reference-counting
>>>>
>>>> Unfortunately the CAS2 operation does not exist in most processors.
>>>
>>> I suppose it's worth noting that Boost (and now standard C++) has a
>>> shared_ptr that works across threads and the implementation I've seen
>>> doesn't use a mutex.  In fact, I think the Boost one doesn't even use
>>> CAS on x86, though it's been quite a few years so my memory could be
>>> wrong on that last detail.
>>
>> I didn't read the paper, but I'd suspect that the paper refers to the
>> case where both, the reference count _and_ the reference is thread-safe,
>> since the boost/c++ shared_ptr only has a thread-safe reference count
>> after all.
>
> I haven't read it either, but AFAICT the cas2 operation is used two modify the pointer and the reference count at the same time atomically.
>
> I just checked boost::shared_ptr, it uses cas operations on the reference counts. It has the same problem as described in my example, see the read/write example 3 here: http://www.boost.org/doc/libs/1_54_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety
>
> boost::shared_ptr is also unsafe with respect to calling member functions through "->" as it doesn't increment the reference count.

I'm totally out of my depth here but can't you store the reference count adjacent to the pointer and use CMPXCHG16B
October 14, 2013

On 13.10.2013 13:48, Michel Fortin wrote:
> On 2013-10-13 06:54:04 +0000, Rainer Schuetze <r.sagitario@gmx.de> said:
>
>> I agree, that's why I used the term "shared reference", too ;-)
>>
>> If you are using only shared objects, but not shared references,
>> you'll have to use message passing coming with its own set of
>> synchronization operations that are not easily made lock-free.
>
> For one of my projects I implemented a shared pointer like this. It uses
> the pointer value itself as a spin lock with the assumption that -1 is
> an invalid pointer value:
>
> 1. read pointer value
> 2. if read value is -1 go to line 1 (spin)
> 3. compare and swap (previously read value <-> -1)
> 4. if failure go to line 1 (spin)
> // now pointer is "locked", its value is -1 but we have a copy of the
> original
> 5. copy pointer locally or assign to it (and update counter)
> 6. write back pointer value atomically to replace the -1
>
> No mutex, but there's a spin lock so it's not good if there's contention.
>
> That said, I find it extremely rare to want a shared pointer that isn't
> already protected by a mutex alongside other variables, or that isn't
> propagated using some form of message passing.
>

Locking is very bad if you have threads at different priorities as it might introduce priority inversion. Spinning is probably even worse in that scenario.

At work, I use shared pointers all the time to pass information to a real time audio thread. The scheme uses triple-buffering of pointers for a lock free safe transport from/to the real time thread.

Not having to worry about these low-level locking stuff is one of the good aspects about garbage collecting.
October 14, 2013
On Monday, 14 October 2013 at 17:50:01 UTC, John Colvin wrote:
> I'm totally out of my depth here but can't you store the reference count adjacent to the pointer and use CMPXCHG16B

I think that can work if the refcount is stored with the pointer (ie. if a fat pointer is used) and not in the object. But that would defeat the whole point of the refcount :(

It seems that Rainer Schuetze is right and that the implementation that boost::shared_ptr uses is not safe.