May 07, 2013 Re: Low-Lock Singletons In D
Posted in reply to Dmitry Olshansky

On Tue, 07 May 2013 16:02:08 -0400, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

> 07-May-2013 23:49, Andrei Alexandrescu wrote:
>> On 5/7/13 12:46 PM, Steven Schveighoffer wrote:
>>> On Tue, 07 May 2013 12:30:05 -0400, deadalnix <deadalnix@gmail.com> wrote:
>> [snip]
>>>> That is incorrect as the thread not going into the lock can see a partially initialized object.
>>>
>>> The memory barrier prevents that. You don't store the variable until the object is initialized. That is the whole point.
>>
>> A memory barrier is not a one-way thing, i.e. not only the writer must do it. Any operation on shared memory is a handshake between the writer and the reader. If the reader doesn't do its bit, it can see the writes out of order no matter what the writer does.
>>
> Exactly.

So the memory barrier ensures that neither the compiler nor the processor can re-order the stores to memory.

But you are saying that the *reader* must also put in a memory barrier, otherwise it might see the stores out of order.

It does not make sense to me: how does the reader see a half-initialized object, when the only way it sees anything is when it's stored to main memory *and* the stores are done in order?

So that must not be what it is doing. What it must be doing is storing out of order, BUT placing a mechanism that prevents reading the memory until the "release" is done? Kind of like a minuscule mutex lock. So while it is out-of-order writing the object data, it holds a lock on the reference pointer to that data, so anyone using acquire cannot read it yet?

That actually makes sense to me. Is that the case?

> Returning to the confusing point.
>
> On x86 things are actually muddied by stronger-than-required hardware guarantees. And only because of this is there no need for an explicit read barrier (nor does x86 have one) in this case. Still, the read operation has to be marked specifically (volatile, asm block, whatever) to ensure the _compiler_ does the right thing (no reordering across it).

I would think the model Mehrdad used would be sufficient to prevent caching of the data, since it is a virtual, no?

-Steve
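To make the writer/reader handshake discussed above concrete, here is a minimal D sketch using core.atomic. The type and function names are made up for illustration, and the MemoryOrder-based atomicLoad/atomicStore overloads are assumed to be available in the druntime in use:

import core.atomic : atomicLoad, atomicStore, MemoryOrder;

class Config { int value; }

shared Config gConfig;  // pointer published from one thread to others

// Writer: fully initialize the object, then publish it with a release store,
// so the field writes cannot be reordered past the pointer store.
void publish()
{
    auto c = new Config;
    c.value = 42;                                            // initialize first
    atomicStore!(MemoryOrder.rel)(gConfig, cast(shared) c);  // then publish
}

// Reader: the acquire load is the reader's half of the handshake; without it,
// a weakly ordered machine (or the compiler) may satisfy the field loads
// before/independently of the pointer load and hand back stale data.
int consume()
{
    auto c = atomicLoad!(MemoryOrder.acq)(gConfig);
    return c is null ? -1 : c.value;
}

On x86 the hardware already gives plain loads acquire-like ordering, which is why the reader's side is so easy to overlook; the read still has to be marked so the compiler does not reorder or cache it.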
May 07, 2013 Re: Low-Lock Singletons In D
Posted in reply to Mehrdad

On 5/7/13 5:57 PM, Mehrdad wrote:
> On Tuesday, 7 May 2013 at 19:49:30 UTC, Andrei Alexandrescu wrote:
>> A memory barrier is not a one-way thing, i.e. not only the writer must do it. Any operation on shared memory is a handshake between the writer and the reader. If the reader doesn't do its bit, it can see the writes out of order no matter what the writer does.
>>
>> Andrei
>
> Andrei, I still don't understand:
>
> The writer is ensuring that writes to memory are happening _after_ the object is initialized and _before_ the reference to the old object is modified, via a memory barrier.

The writer is only half of the equation. The reader has its own cache to worry about and its own loading order.

> Unless you're claiming that a memory barrier _doesn't_ do what it's supposed to (i.e., the memory module is executing writes out-of-order even though the processor is issuing them in the correct order), there is no way for _anyone_ to see a partially initialized object anywhere...

I'm not claiming, I'm destroying :o). There is. I know it's confusing. You may want to peruse the reading materials linked by others.


Andrei
May 07, 2013 Re: Low-Lock Singletons In D
Posted in reply to Steven Schveighoffer

On 5/7/13 6:12 PM, Steven Schveighoffer wrote:
> So the memory barrier ensures that neither the compiler nor the processor can re-order the stores to memory.
>
> But you are saying that the *reader* must also put in a memory barrier, otherwise it might see the stores out of order.

Yes. (One detail is that there are several kinds of barriers, such as acquire and release.)

> It does not make sense to me: how does the reader see a half-initialized object, when the only way it sees anything is when it's stored to main memory *and* the stores are done in order?

There are several actors involved: two processors, each with its own cache, and the main memory. There are several cache coherency protocols, which are reflected in different programming models.

> So that must not be what it is doing. What it must be doing is storing out of order, BUT placing a mechanism that prevents reading the memory until the "release" is done? Kind of like a minuscule mutex lock. So while it is out-of-order writing the object data, it holds a lock on the reference pointer to that data, so anyone using acquire cannot read it yet?
>
> That actually makes sense to me. Is that the case?

Not at all. A memory barrier dictates operation ordering. It doesn't do interlocking. One of Herb's slides shows very nicely how memory operations can't be moved ahead of an acquire and past a release.

>> Returning to the confusing point.
>>
>> On x86 things are actually muddied by stronger-than-required hardware guarantees. And only because of this is there no need for an explicit read barrier (nor does x86 have one) in this case. Still, the read operation has to be marked specifically (volatile, asm block, whatever) to ensure the _compiler_ does the right thing (no reordering across it).
>
> I would think the model Mehrdad used would be sufficient to prevent caching of the data, since it is a virtual, no?

No.


Andrei
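For contrast, the classic broken double-checked locking that this sub-thread keeps circling back to looks roughly like this in D (an illustrative sketch, not code from the article):

class MySingleton
{
    int x;
}

__gshared MySingleton instance_;

MySingleton getBroken()
{
    if (instance_ is null)       // plain load: no acquire on the fast path
    {
        synchronized
        {
            if (instance_ is null)
                instance_ = new MySingleton;  // store may become visible before the fields
        }
    }
    return instance_;            // another thread can get a not-fully-visible object
}

The unlock at the end of the synchronized block gives the writer its release ordering, but a thread that skips the lock does a plain read with no acquire, which is exactly the failure mode deadalnix and Walter describe.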
May 07, 2013 Re: Low-Lock Singletons In D
Posted in reply to Andrei Alexandrescu

On Tuesday, 7 May 2013 at 22:44:28 UTC, Andrei Alexandrescu wrote:
> The writer is only half of the equation. The reader has its own cache to worry about and its own loading order.

Oooh! So basically this is the scenario you're referring to?

1. The reader has the uninitialized data in its cache
2. The writer writes the new data and a pointer to the new data
3. The reader sees a new pointer and attempts to load the new data
4. The reader receives stale data from its cache

In that case....

> I'm not claiming, I'm destroying :o)

... well done!!

> There is. I know it's confusing. You may want to peruse the reading materials linked by others.

Ok, thanks!
May 07, 2013 Re: Low-Lock Singletons In D
Posted in reply to Andrei Alexandrescu

On 5/7/13 3:44 PM, Andrei Alexandrescu wrote:
> On 5/7/13 5:57 PM, Mehrdad wrote:
>> On Tuesday, 7 May 2013 at 19:49:30 UTC, Andrei Alexandrescu wrote:
>>> A memory barrier is not a one-way thing, i.e. not only the writer must do it. Any operation on shared memory is a handshake between the writer and the reader. If the reader doesn't do its bit, it can see the writes out of order no matter what the writer does.
>>>
>>> Andrei
>>
>> Andrei, I still don't understand:
>>
>> The writer is ensuring that writes to memory are happening _after_ the object is initialized and _before_ the reference to the old object is modified, via a memory barrier.
>
> The writer is only half of the equation. The reader has its own cache to worry about and its own loading order.
>
>> Unless you're claiming that a memory barrier _doesn't_ do what it's supposed to (i.e., the memory module is executing writes out-of-order even though the processor is issuing them in the correct order), there is no way for _anyone_ to see a partially initialized object anywhere...
>
> I'm not claiming, I'm destroying :o). There is. I know it's confusing. You may want to peruse the reading materials linked by others.
>
> Andrei

(this might be a repeat, I've only skimmed this thread)

Section 8.2 of this doc is a good read: http://download.intel.com/products/processor/manual/325384.pdf

There are a couple of allowed re-orderings that are not what most people expect.

Later,
Brad
May 08, 2013 Re: Low-Lock Singletons In D
Posted in reply to Andrei Alexandrescu

On Tue, 07 May 2013 18:49:19 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> On 5/7/13 6:12 PM, Steven Schveighoffer wrote:
>> So that must not be what it is doing. What it must be doing is storing out of order, BUT placing a mechanism that prevents reading the memory until the "release" is done? Kind of like a minuscule mutex lock. So while it is out-of-order writing the object data, it holds a lock on the reference pointer to that data, so anyone using acquire cannot read it yet?
>>
>> That actually makes sense to me. Is that the case?
>
> Not at all. A memory barrier dictates operation ordering. It doesn't do interlocking. One of Herb's slides shows very nicely how memory operations can't be moved ahead of an acquire and past a release.

I will have to watch those some time, when I have 3 hours to kill :)

Thanks for sticking with me and not dismissing this out of hand. This is actually the first time I have felt close to grasping this concept.

-Steve
May 08, 2013 Re: Low-Lock Singletons In D
Posted in reply to dsimcha

On Sun, 05 May 2013 22:35:27 -0400, dsimcha <dsimcha@yahoo.com> wrote:
> On the advice of Walter and Andrei, I've written a blog article about the low-lock Singleton pattern in D. This is a previously obscure pattern that uses thread-local storage to make Singletons both thread-safe and efficient and was independently invented by at least me and Alexander Terekhov, an IBM researcher. However, D's first-class treatment of thread-local storage means the time has come to move it out of obscurity and possibly make it the standard way to do Singletons.
>
> Article:
> http://davesdprogramming.wordpress.com/2013/05/06/low-lock-singletons/
>
> Reddit:
> http://www.reddit.com/r/programming/comments/1droaa/lowlock_singletons_in_d_the_singleton_pattern/
Pulling this out from the depths of this discussion:
David, the current pattern protects the read of the __gshared singleton with a thread-local boolean. This means that to check whether the value is valid, we do:
if(!instantiated_)
{
    ... // thread-safe initialization of instance_
    instantiated_ = true;
}
return instance_;
This requires a load of the boolean, and then a load of the instance.
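Spelled out with its declarations, the pattern is roughly the following (a sketch; the article's exact code may differ in details such as where the flag is set):

class MySingleton
{
    static MySingleton get()
    {
        if (!instantiated_)                      // cheap thread-local check
        {
            synchronized (MySingleton.classinfo) // at most one thread constructs
            {
                if (instance_ is null)
                    instance_ = new MySingleton;
            }
            instantiated_ = true;                // this thread has now seen a valid instance_
        }
        return instance_;
    }

private:
    this() {}
    static bool instantiated_;        // thread-local: D statics are TLS by default
    __gshared MySingleton instance_;  // the single shared instance
}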
I wonder if it might be more efficient to store a copy of the instance instead of the bool? This would only require one load for the check:
if(!instantiated_instance_) // a TLS copy of the instance
{
    ... // thread-safe initialization of instance_
    instantiated_instance_ = instance_;
}
return instantiated_instance_;
I think in the steady state this devolves to the "unsafe" case, which of course is safe by that time. Might this account for at least some of dmd's poorer performance?
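Spelled out, the proposed variant would look something like this (again a sketch of the idea, not measured code):

class MySingleton
{
    static MySingleton get()
    {
        if (instantiated_instance_ is null)      // single TLS load on the fast path
        {
            synchronized (MySingleton.classinfo)
            {
                if (instance_ is null)
                    instance_ = new MySingleton;
            }
            instantiated_instance_ = instance_;  // cache the reference per thread
        }
        return instantiated_instance_;           // no second load of instance_
    }

private:
    this() {}
    static MySingleton instantiated_instance_;  // TLS copy of the shared reference
    __gshared MySingleton instance_;            // the single shared instance
}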
-Steve
May 08, 2013 Re: Low-Lock Singletons In D
Posted in reply to Steven Schveighoffer

On Tuesday, 7 May 2013 at 22:12:17 UTC, Steven Schveighoffer wrote:
> On Tue, 07 May 2013 16:02:08 -0400, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:
>
>> 07-May-2013 23:49, Andrei Alexandrescu wrote:
>>> On 5/7/13 12:46 PM, Steven Schveighoffer wrote:
>>>> On Tue, 07 May 2013 12:30:05 -0400, deadalnix <deadalnix@gmail.com> wrote:
>> [snip]
>>>>> That is incorrect as the thread not going into the lock can see a partially initialized object.
>>>>
>>>> The memory barrier prevents that. You don't store the variable until the object is initialized. That is the whole point.
>>>
>>> A memory barrier is not a one-way thing, i.e. not only the writer must do it. Any operation on shared memory is a handshake between the writer and the reader. If the reader doesn't do its bit, it can see the writes out of order no matter what the writer does.
>>>
>> Exactly.
>
> So the memory barrier ensures that neither the compiler nor the processor can re-order the stores to memory.
>
> But you are saying that the *reader* must also put in a memory barrier, otherwise it might see the stores out of order.
>
> It does not make sense to me: how does the reader see a half-initialized object, when the only way it sees anything is when it's stored to main memory *and* the stores are done in order?

Short answer: because reads can be done out of order and/or served from cached values. The reader must do its part of the job as well.

You may want to read this series of articles; they are very interesting:

http://preshing.com/20120913/acquire-and-release-semantics
May 24, 2013 Re: Low-Lock Singletons In D
Posted in reply to Walter Bright

On Monday, 6 May 2013 at 17:58:19 UTC, Walter Bright wrote:
> On 5/6/2013 6:14 AM, Max Samukha wrote:
>> FWIW, I played with a generalized form of this pattern long ago, something like
>> (typing from memory):
>
> And, that's the classic double checked locking bug!
D man is the bug! I simply failed to insert a line that flags the
thread-local guard. Sorry for that.
The Nullable thing was an impromptu way to avoid the ugly specializations
I used for nullable and non-nullable types in my original
implementation. Note that the Nullable is not Phobos' Nullable -
the latter incurs unnecessary overhead for types that are already
nullable. Maybe the use of Nullable is overkill and a
__gshared boolean (redundant in the case of a nullable type) would
suffice.
David mentioned the trick on this NG years ago. It is well known
and understood, and would rarely be needed if D could properly do
eager initialization of global state. :P
Anyway, I popped up mainly to point out that the pattern should
not be limited to classes.
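One hypothetical shape for such a generalized form (not Max's original code, just the same TLS-guard-plus-lock idea applied to an arbitrary payload) could be:

T* lazyGlobal(T, alias make)()
{
    static bool ready;     // thread-local guard; D statics are TLS by default
    __gshared T* payload;  // shared slot, written exactly once

    if (!ready)
    {
        synchronized       // at most one thread runs the initializer
        {
            if (payload is null)
            {
                auto p = new T;
                *p = make();  // build the value fully before publishing the pointer
                payload = p;
            }
        }
        ready = true;         // the line that flags the thread-local guard
    }
    return payload;
}

// Usage with a plain struct, no class required:
// struct Settings { int threads; }
// auto s = lazyGlobal!(Settings, () => Settings(4))();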
May 24, 2013 Re: Low-Lock Singletons In D
Posted in reply to Max Samukha

Max Samukha:
> Note that the Nullable is not Phobos' Nullable -
> the latter incurs unnecessary overhead for types that are already nullable.
In Bugzilla I have suggested some improvements for Nullable, but in Phobos there is already an alternative Nullable that avoids that overhead:
struct Nullable(T, T nullValue);
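For illustration, usage of that overload looks roughly like this (the -1 sentinel is an arbitrary choice for the example):

import std.typecons : Nullable;

void main()
{
    // Sentinel-based Nullable: no extra bool field; -1 plays the role of "null".
    Nullable!(int, -1) maybeIndex;
    assert(maybeIndex.isNull);

    maybeIndex = 3;
    assert(!maybeIndex.isNull && maybeIndex.get == 3);
}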
Bye,
bearophile