May 07, 2013 Re: Low-Lock Singletons In D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Monday, 6 May 2013 at 18:46:56 UTC, Andrei Alexandrescu wrote:
> Any concurrent operation (in this case read from one thread and write from another) requires a handshake between threads, most often in the form of a release write coupled with an acquire read. Whenever the handshake is absent but concurrent operations on shared memory do occur, the code is broken. The beauty of the TLS-based pattern is that in the steady state there's no need for a shared read and handshake.
>
> Andrei
Hmm, are you referring to the same lack of a barrier that the others are also referring to?
As far as I can see, there shouldn't be a need for any other handshake in this example.
As long as the object is fully initialized before _static is written to (easy enough with just a memory barrier), there is no penalty for subsequent reads whatsoever.
Right?
|
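For reference, the release/acquire handshake mentioned in the quote can be sketched roughly as follows. This is only an illustration in C++11 atomics, not D, and the names `payload`, `ready`, `publish`, and `try_consume` are hypothetical; none of them come from the thread.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state: a plain payload guarded by an atomic flag.
int payload = 0;
std::atomic<bool> ready{false};

// Writer side: fill in the payload first, then publish with a release store.
// Writes made before the release store cannot be reordered after it.
void publish(int value) {
    // One-shot: this sketch assumes publish is called at most once.
    assert(!ready.load(std::memory_order_relaxed));
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Reader side: an acquire load that observes `true` synchronizes with the
// release store, so the payload written before it is guaranteed to be visible.
bool try_consume(int& out) {
    if (ready.load(std::memory_order_acquire)) {
        out = payload;
        return true;
    }
    return false;  // nothing published yet; `out` is left untouched
}
```

If either side is weakened to a plain, non-atomic access, the two threads no longer synchronize and the reader may legally observe the flag without the payload.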
May 07, 2013 Re: Low-Lock Singletons In D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mehrdad | 07-May-2013 10:47, Mehrdad wrote:
> On Monday, 6 May 2013 at 18:56:08 UTC, Dmitry Olshansky wrote:
>
> Thanks for the detailed explanation!
>
>> And now compiler/CPU decides to optimize/execute out of order (again,
>> it's an illustration) it as:
>>
>> lock _static_mutex;
>> x = alloc int;
>> //even if that's atomic
>> static_ = x;
>> // BOOM! somebody not locking mutex may already
>> // see static_ in "half-baked" state
>> x[0] = 42;
>> unlock _static_mutex;
>
> That's exactly the same as the classic double-checked lock bug, right?
>
Yeah, and that was my point to begin with - your method doesn't bring anything new. It's the same as the one with null and 'if-null-check', with the same issues, and it requires atomics or barriers.
> As I wrote in my original code -- and as you also mentioned yourself --
> isn't it trivially fixed with a memory barrier?
>
> Like maybe replacing
>
> _static = new ActualValue<T>();
>
> with
>
> var value = new ActualValue<T>();
> _ReadWriteBarrier();
> _static = value;
>
> Wouldn't this make it correct?
It would, but then it's the same as the old fixed double-checked locking. Barriers hurt the performance that we were after to begin with.
Now it would be interesting to measure the speed of this TLS low-lock vs atomic-load/memory barrier + mutex. This measurement is absent from the blog post, but Andrei claims a memory barrier on each access is too slow.
--
Dmitry Olshansky
|
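The "old fixed double-checked locking" referred to above might look roughly like this when written with C++11 atomics. It is a sketch under stated assumptions, not code from the thread: `Widget` and `WidgetSingleton` are hypothetical names, and C++ stands in for D purely for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct Widget {          // hypothetical payload type
    int value = 0;
};

class WidgetSingleton {  // hypothetical name
public:
    static Widget& instance() {
        // Fast path: this acquire load pairs with the release store below.
        Widget* p = ptr_.load(std::memory_order_acquire);
        if (p == nullptr) {
            std::lock_guard<std::mutex> guard(mutex_);
            // Second check under the lock: another thread may have won the race.
            p = ptr_.load(std::memory_order_relaxed);
            if (p == nullptr) {
                p = new Widget;  // intentionally never freed in this sketch
                p->value = 42;   // finish initialization first...
                // ...then publish. The release store keeps the writes above
                // from becoming visible after the pointer does.
                ptr_.store(p, std::memory_order_release);
            }
        }
        assert(p != nullptr);
        return *p;
    }

private:
    static std::atomic<Widget*> ptr_;
    static std::mutex mutex_;
};

std::atomic<Widget*> WidgetSingleton::ptr_{nullptr};
std::mutex WidgetSingleton::mutex_;
```

The acquire load sits on every access, which is the per-access cost being argued about here. On x86 it typically compiles to an ordinary load, while weaker architectures may need an extra barrier instruction.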
May 07, 2013 Re: Low-Lock Singletons In D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mehrdad | On 5/7/13 2:50 AM, Mehrdad wrote:
> On Monday, 6 May 2013 at 18:46:56 UTC, Andrei Alexandrescu wrote:
>> Any concurrent operation (in this case read from one thread and write
>> from another) requires a handshake between threads, most often in the
>> form of a release write coupled with an acquire read. Whenever the
>> handshake is absent but concurrent operations on shared memory do
>> occur, the code is broken. The beauty of the TLS-based pattern is that
>> in the steady state there's no need for a shared read and handshake.
>>
>> Andrei
>
>
>
> Hmm, are you referring to the same lack of a barrier that the others are
> also referring to?
>
>
> As far as I can see, there shouldn't be a need for any other handshake
> in this example.
>
> As long as the object is fully initialized before _static is written to
> (easy enough with just a memory barrier), there is no penalty for
> subsequent reads whatsoever.
>
> Right?
No. A tutorial on memory consistency models would be too long to insert here. I don't know of a good online resource, does anyone?
Andrei
|
May 07, 2013 Re: Low-Lock Singletons In D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Tue, 07 May 2013 09:25:36 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> No. A tutorial on memory consistency models would be too long to insert here. I don't know of a good online resource, does anyone?
In essence, a read requires an acquire memory barrier, a write requires a release memory barrier, but in this case, we only need to be concerned if the value we get back is not valid (i.e. NullValue).
Once in steady state, there is no need to acquire (as long as the write is atomic, the read value will either be NullValue or ActualValue, not something else).
The code in the case of NullValue must be handled very carefully with the correct memory barriers (Given the fact that you should only execute once, just insert a full memory barrier). But that is not the steady state.
It's not a revolutionary design, it's basic double-checked locking (implemented in an unnecessarily complex way). It can be done right, but it's still really difficult to get right.
The benefit of David's method is that it's REALLY easy to get right, and is MM independent.
-Steve
|
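For comparison, the TLS-based pattern credited to David in this thread can be approximated as below. This is a hedged C++ rendering, not the original D code: `Config`, `ConfigSingleton`, and the member names are hypothetical, and the sketch assumes the pattern is "a thread-local boolean checked before each access, with a lock on the slow path".

```cpp
#include <cassert>
#include <mutex>

struct Config {          // hypothetical payload type
    int value = 42;
};

class ConfigSingleton {  // hypothetical name
public:
    static Config& instance() {
        // Thread-local flag: reading it never touches shared mutable state,
        // so the steady-state path needs no atomics and no barriers.
        if (!instantiated_) {
            // Slow path, taken at most once per thread. The mutex supplies
            // all of the inter-thread ordering.
            std::lock_guard<std::mutex> guard(mutex_);
            if (instance_ == nullptr) {
                instance_ = new Config;  // intentionally never freed in this sketch
            }
            instantiated_ = true;
        }
        // Safe to read without the lock: this thread has already passed
        // through the mutex once, and instance_ never changes afterwards.
        assert(instance_ != nullptr);
        return *instance_;
    }

private:
    static thread_local bool instantiated_;  // one copy per thread
    static Config* instance_;                // shared; written only under the lock
    static std::mutex mutex_;
};

thread_local bool ConfigSingleton::instantiated_ = false;
Config* ConfigSingleton::instance_ = nullptr;
std::mutex ConfigSingleton::mutex_;
```

Correctness here rests only on the mutex, which is why the pattern does not depend on the details of the memory model. The price is one lock acquisition per thread plus a thread-local check on every access.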
May 07, 2013 Re: Low-Lock Singletons In D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mehrdad | On Tuesday, 7 May 2013 at 06:50:16 UTC, Mehrdad wrote:
> As far as I can see, there shouldn't be a need for any other handshake in this example.
>
> As long as the object is fully initialized before _static is written to (easy enough with just a memory barrier), there is no penalty for subsequent reads whatsoever.
>
> Right?
The issue is that the write to _static might never appear on the other threads, thus leading to multiple instances being created - even though it is atomic in the sense that you never end up reading a pointer with e.g. only half of the bytes updated.
David
|
May 07, 2013 Re: Low-Lock Singletons In D | ||||
---|---|---|---|---|
| ||||
Posted in reply to David Nadlinger | On Tue, 07 May 2013 10:33:13 -0400, David Nadlinger <see@klickverbot.at> wrote:
> On Tuesday, 7 May 2013 at 06:50:16 UTC, Mehrdad wrote:
>> As far as I can see, there shouldn't be a need for any other handshake in this example.
>>
>> As long as the object is fully initialized before _static is written to (easy enough with just a memory barrier), there is no penalty for subsequent reads whatsoever.
>>
>> Right?
>
> The issue is that the write to _static might never appear on the other threads, thus leading to multiple instances being created - even though it is atomic in the sense that you never end up reading a pointer with e.g. only half of the bytes updated.
I don't think that is an issue. The write to _static is protected by a lock, which should present a consistent view of it inside the lock.
Mehrdad corrected his version right away that you need to check the value inside the lock again. This essentially is classic double-checked locking, with one check being a virtual table lookup.
-Steve
|
May 07, 2013 Re: Low-Lock Singletons In D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On 5/7/13 10:31 AM, Steven Schveighoffer wrote:
> On Tue, 07 May 2013 09:25:36 -0400, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> No. A tutorial on memory consistency models would be too long to
>> insert here. I don't know of a good online resource, does anyone?
>
> In essence, a read requires an acquire memory barrier, a write requires
> a release memory barrier, but in this case, we only need to be concerned
> if the value we get back is not valid (i.e. NullValue).
>
> Once in steady state, there is no need to acquire (as long as the write
> is atomic, the read value will either be NullValue or ActualValue, not
> something else).
There's always a need to acquire so as to figure whether the steady state has been entered.
Andrei
|
May 07, 2013 Re: Low-Lock Singletons In D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Tue, 07 May 2013 11:30:12 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> On 5/7/13 10:31 AM, Steven Schveighoffer wrote:
>> On Tue, 07 May 2013 09:25:36 -0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> No. A tutorial on memory consistency models would be too long to
>>> insert here. I don't know of a good online resource, does anyone?
>>
>> In essence, a read requires an acquire memory barrier, a write requires
>> a release memory barrier, but in this case, we only need to be concerned
>> if the value we get back is not valid (i.e. NullValue).
>>
>> Once in steady state, there is no need to acquire (as long as the write
>> is atomic, the read value will either be NullValue or ActualValue, not
>> something else).
>
> There's always a need to acquire so as to figure whether the steady state has been entered.
Not really. Whether it is entered or not is dictated by the vtable. Even classic double-check locking doesn't need an acquire outside the lock. Even if your CPU's view of the variable is outdated, the check after the memory barrier inside the lock only occurs once. After that, steady state is achieved. All subsequent reads need no memory barriers, because the singleton object will never change after that.
The only thing we need to guard against is non-atomic writes, and out of order writes of the static variable (fixed with a memory barrier). Instruction ordering OUTSIDE the lock is irrelevant, because if we don't get the "steady state" value (not null), then we go into the lock to perform the careful initialization with barriers.
I think aligned native word writes are atomic, so we don't have to worry about that.
But I think we've spent enough time on this solution. Yes, double-checked locking can be done, but David's pattern is far easier to implement, understand, and explain. It comes at a small cost of checking a boolean before each access of the initialized data. His benchmarks show a very small performance penalty. And another LARGE benefit is you don't have to pull out your obscure (possibly challenged) memory model book/blog post or the CPU spec to prove it :)
Hmm... you might be able to mitigate the penalty by storing the actual object reference instead of a bool in the _instantiated variable. Then a separate load is not required.
David?
-Steve
|
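The variation suggested at the end of this post, caching the object reference itself in thread-local storage instead of a boolean, could be sketched like this. It is one possible reading of the suggestion, written in C++ for illustration; `Logger`, `LoggerSingleton`, `local_`, and `shared_` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>

struct Logger {          // hypothetical payload type
    int level = 3;
};

class LoggerSingleton {  // hypothetical name
public:
    static Logger& instance() {
        // The thread-local pointer doubles as the "instantiated" flag, so the
        // steady-state path is a single thread-local load.
        Logger* p = local_;
        if (p == nullptr) {
            // Slow path, taken at most once per thread, fully under the lock.
            std::lock_guard<std::mutex> guard(mutex_);
            if (shared_ == nullptr) {
                shared_ = new Logger;  // intentionally never freed in this sketch
            }
            p = shared_;
            local_ = p;  // cache for this thread only
        }
        assert(p != nullptr);
        return *p;
    }

private:
    static thread_local Logger* local_;  // per-thread cached pointer
    static Logger* shared_;              // shared; read and written only under the lock
    static std::mutex mutex_;
};

thread_local Logger* LoggerSingleton::local_ = nullptr;
Logger* LoggerSingleton::shared_ = nullptr;
std::mutex LoggerSingleton::mutex_;
```

Compared with the boolean version, the shared pointer is never read outside the lock at all. Whether this is measurably faster would need benchmarking, which the thread does not provide.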
May 07, 2013 Re: Low-Lock Singletons In D | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Tuesday, 7 May 2013 at 16:14:50 UTC, Steven Schveighoffer wrote:
> Not really. Whether it is entered or not is dictated by the vtable. Even classic double-check locking doesn't need an acquire outside the lock. Even if your CPU's view of the variable is outdated, the check after the memory barrier inside the lock only occurs once. After that, steady state is achieved. All subsequent reads need no memory barriers, because the singleton object will never change after that.
>
> The only thing we need to guard against is non-atomic writes, and out of order writes of the static variable (fixed with a memory barrier). Instruction ordering OUTSIDE the lock is irrelevant, because if we don't get the "steady state" value (not null), then we go into the lock to perform the careful initialization with barriers.
>
> I think aligned native word writes are atomic, so we don't have to worry about that.
>
That is incorrect as the thread not going into the lock can see a partially initialized object.
|
May 07, 2013 Re: Low-Lock Singletons In D | ||||
---|---|---|---|---|
| ||||
Posted in reply to deadalnix | On Tue, 07 May 2013 12:30:05 -0400, deadalnix <deadalnix@gmail.com> wrote:
> On Tuesday, 7 May 2013 at 16:14:50 UTC, Steven Schveighoffer wrote:
>> Not really. Whether it is entered or not is dictated by the vtable. Even classic double-check locking doesn't need an acquire outside the lock. Even if your CPU's view of the variable is outdated, the check after the memory barrier inside the lock only occurs once. After that, steady state is achieved. All subsequent reads need no memory barriers, because the singleton object will never change after that.
>>
>> The only thing we need to guard against is non-atomic writes, and out of order writes of the static variable (fixed with a memory barrier). Instruction ordering OUTSIDE the lock is irrelevant, because if we don't get the "steady state" value (not null), then we go into the lock to perform the careful initialization with barriers.
>>
>> I think aligned native word writes are atomic, so we don't have to worry about that.
>>
>
> That is incorrect as the thread not going into the lock can see a partially initialized object.
The memory barrier prevents that. You don't store the variable until the object is initialized. That is the whole point.
-Steve
|