May 06, 2013 Re: Low-Lock Singletons In D
Posted in reply to Mehrdad

On 5/6/13 2:25 PM, Mehrdad wrote:
> On Monday, 6 May 2013 at 13:33:54 UTC, Andrei Alexandrescu wrote:
>> It's well known. Needs a memory barrier on each access, so it's slower.
>
>
> Hmm, are you saying I'm missing a memory barrier I should have written,
> or are you saying I already have a memory barrier which I'm not seeing?
>
>
> The only memory barrier I have is during initialization, and after that
> only read operations occur.
Any concurrent operation (in this case a read from one thread and a write from another) requires a handshake between threads, most often in the form of a release write coupled with an acquire read. Whenever the handshake is absent but concurrent operations on shared memory do occur, the code is broken. The beauty of the TLS-based pattern is that in the steady state there's no need for a shared read and handshake.
Andrei
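Andrei's description maps directly to code. Below is a C++ rendering of the TLS-based pattern (the names `Resource`, `g_inst`, and `get` are mine, not from the article): the steady state is a plain thread-local read with no shared access at all, and the one-time slow path performs the release/acquire handshake he describes.

```cpp
#include <atomic>
#include <mutex>

struct Resource { int data = 7; };

std::atomic<Resource*> g_inst{nullptr};
std::mutex g_mx;

Resource* get() {
    // Steady state: once this thread has seen the instance, a plain TLS
    // read suffices -- no shared read, no handshake (Andrei's point).
    thread_local Resource* cached = nullptr;
    if (cached) return cached;

    // Slow path: the handshake. Acquire pairs with the release below.
    Resource* p = g_inst.load(std::memory_order_acquire);
    if (!p) {
        std::lock_guard<std::mutex> lock(g_mx);
        p = g_inst.load(std::memory_order_relaxed);
        if (!p) {
            p = new Resource();
            // Release write: publishes the fully constructed object.
            g_inst.store(p, std::memory_order_release);
        }
    }
    cached = p;
    return p;
}
```

Each thread pays for the atomic load at most once; every later call in that thread is an ordinary (non-atomic) read of `cached`.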
May 06, 2013 Re: Low-Lock Singletons In D
Posted in reply to Mehrdad

On Monday, 6 May 2013 at 18:23:51 UTC, Mehrdad wrote:
>> One way is to ensure write is atomic w.r.t. that particular read operation
> Aren't pointer writes always atomic?
No. But *aligned* word-sized writes *on x86* are.
David
May 06, 2013 Re: Low-Lock Singletons In D
Posted in reply to Mehrdad

06-May-2013 22:23, Mehrdad wrote:
> On Monday, 6 May 2013 at 11:21:37 UTC, Dmitry Olshansky wrote:
>> Yes, but the 1st processor may read _static exactly when the 2nd is inside the lock and writing to that field. Then chances are it will read whatever partial state there is written. Surely it can: the 1st processor takes the lock and writes while the second one reads static_ to call Get.
>
> It's a single pointer, there is no partial state -- it's either written or it isn't.

True... e.g. on x86, if that word is aligned. But the core of the problem is not only that word; it's the fact that the processor (and the compiler, and whatnot) is free to re-order read/write ops inside that locked region. Potentially this could lead to epically nasty things.

struct Static
{
    int value;
    Static(int v) { value = v; }
}

Now suppose:

lock (_static)
{
    _static = new Static(42); // or something like that
}

is compiled down to this "C--" code (quite realistically):

lock _static_mutex;
x = alloc int;
x[0] = 42;
static_ = x;
unlock _static_mutex;

And now the compiler/CPU decides to optimize/execute it out of order (again, as an illustration):

lock _static_mutex;
x = alloc int;
// even if that's atomic
static_ = x;
// BOOM! somebody not locking the mutex may already
// see static_ in a "half-baked" state
x[0] = 42;
unlock _static_mutex;

Observe that:

a) This can happen and nothing prevents it. In fact the compiler should feel free to optimize inside the locked section, just not across it.
b) The chance that it happens is non-zero and rises with the complexity of the construction.
c) It depends on the kind of hardware(!) used, as in OoO vs. non-OoO. So it would work more reliably on old Atoms, if that is of any comfort ;). That is, if your compiler isn't equally smart and doing nasty things behind your back ("Sorry, I just scheduled instructions optimally...").

>> One way is to ensure the write is atomic w.r.t. that particular read operation
>
> Aren't pointer writes always atomic?

In short - no. Even not counting the world of legal re-ordering, unaligned writes have this nasty habit of partially written values becoming visible at the wrong time. Truth be told, relying on this kind of "not ever likely to go wrong" behavior is the very reason explicit atomics are encouraged, even when they are NOPs on your current architecture. That, or the faster/harder stuff - barriers.

Speaking of barriers - a barrier is exactly the kind of thing that would disallow moving the write x[0] = 42 past the line static_ = x;. Simply put, barriers act as a "no reordering across this point" line for specific operations (depending on the barrier type, or for all of them). Needless to say, this hurts performance.

-- 
Dmitry Olshansky
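To make the point concrete, here is a minimal C++ sketch (the names are mine, not Dmitry's) of the publication pattern he describes, with the barrier expressed as a release store: neither the compiler nor the CPU may sink `x[0] = 42` below the store that publishes the pointer, and the acquire load on the reader side completes the pairing.

```cpp
#include <atomic>
#include <mutex>

std::atomic<int*> static_{nullptr};
std::mutex static_mutex;

void publish() {
    std::lock_guard<std::mutex> lock(static_mutex);
    int* x = new int;
    x[0] = 42;  // must become visible before the pointer does
    // The release store is the barrier: the x[0] = 42 write cannot be
    // reordered past this line by the compiler or the CPU.
    static_.store(x, std::memory_order_release);
}

int read_value() {
    // Acquire load pairs with the release store; a non-null pointer
    // guarantees x[0] already holds 42.
    int* p = static_.load(std::memory_order_acquire);
    return p ? p[0] : -1;
}
```

Without the `release`/`acquire` pair, the "BOOM" interleaving above is legal: a reader could observe the pointer before the pointee's contents.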
May 06, 2013 Re: Low-Lock Singletons In D
Posted in reply to dsimcha

On Monday, 6 May 2013 at 02:35:33 UTC, dsimcha wrote:
> On the advice of Walter and Andrei, I've written a blog article about the low-lock Singleton pattern in D. This is a previously obscure pattern that uses thread-local storage to make Singletons both thread-safe and efficient, and was independently invented by at least me and Alexander Terekhov, an IBM researcher. However, D's first-class treatment of thread-local storage means the time has come to move it out of obscurity and possibly make it the standard way to do Singletons.
>
> Article:
> http://davesdprogramming.wordpress.com/2013/05/06/low-lock-singletons/
>
> Reddit:
> http://www.reddit.com/r/programming/comments/1droaa/lowlock_singletons_in_d_the_singleton_pattern/

Thanks! I want to make a module with template mixins that implement some common idioms - singleton being one of them (http://forum.dlang.org/thread/fofbrlqruxbevnxchxdp@forum.dlang.org). I'm going to use your version for implementing the singleton idiom.
May 06, 2013 Re: Low-Lock Singletons In D
Posted in reply to dsimcha

On 5/6/13, dsimcha <dsimcha@yahoo.com> wrote:
> On the advice of Walter and Andrei, I've written a blog article about the low-lock Singleton pattern in D.
Personally this covers it for me:
final abstract class singleton
{
static:
    void foo() { }
    int f;
}

void main()
{
    singleton.foo();
    singleton.f = 1;
}
Never needed anything more complex than that. :p
May 06, 2013 Re: Low-Lock Singletons In D
Posted in reply to Andrej Mitrovic

On Monday, 6 May 2013 at 21:13:24 UTC, Andrej Mitrovic wrote:
> On 5/6/13, dsimcha <dsimcha@yahoo.com> wrote:
>> On the advice of Walter and Andrei, I've written a blog article
>> about the low-lock Singleton pattern in D.
>
> Personally this covers it for me:
>
> final abstract class singleton
> {
> static:
>     void foo() { }
>     int f;
> }
>
> void main()
> {
>     singleton.foo();
>     singleton.f = 1;
> }
>
> Never needed anything more complex than that. :p
Although that's only a single-per-thread-ton :P
May 06, 2013 Re: Low-Lock Singletons In D
Posted in reply to Diggory

On 5/6/13, Diggory <diggsey@googlemail.com> wrote:
> Although that's only a single-per-thread-ton :P

My bad.

> final abstract class singleton
> {
> __gshared:
>     void foo() { }
>     int f;
> }
May 06, 2013 Re: Low-Lock Singletons In D
Posted in reply to Andrej Mitrovic

On Mon, 06 May 2013 17:36:00 -0400, Andrej Mitrovic <andrej.mitrovich@gmail.com> wrote:
> On 5/6/13, Diggory <diggsey@googlemail.com> wrote:
>> Although that's only a single-per-thread-ton :P
>
> My bad.
>
>> final abstract class singleton
>> {
>> __gshared:
>>     void foo() { }
>>     int f;
>> }
The point being missed is the lazy creation. Imagine it's not just an int but a whole large resource that takes a long time to create/open.

If all we cared about were access, and you *know* you are going to need it, you could just create it eagerly via shared static this(). But that is not what the singleton pattern is for.
-Steve
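For comparison, the lazy-creation requirement Steve describes is exactly what a C++11 function-local static ("magic static") provides: construction happens on first use, exactly once, even under concurrent first calls. A hypothetical sketch (the `BigResource` name and its cost are invented for illustration):

```cpp
#include <string>

struct BigResource {
    std::string contents;
    // Imagine this constructor opens files, connects sockets, etc.
    BigResource() : contents("expensive to build") {}
};

BigResource& instance() {
    // Not constructed at program startup (the eager shared static this()
    // approach); constructed on the first call, with thread-safe
    // initialization guaranteed by the compiler.
    static BigResource r;
    return r;
}
```

A program that never calls `instance()` never pays the construction cost, which is the whole point of the pattern.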
May 07, 2013 Re: Low-Lock Singletons In D
Posted in reply to David Nadlinger

On Monday, 6 May 2013 at 18:51:12 UTC, David Nadlinger wrote:
> On Monday, 6 May 2013 at 18:23:51 UTC, Mehrdad wrote:
>>> One way is to ensure write is atomic w.r.t. that particular read operation
>> Aren't pointer writes always atomic?
>
> No. But *aligned* word-sized writes *on x86* are.
>
> David
Right, but this was specifically meant for x86 (as I mentioned in the OP) and pointers are aligned by default. :)
May 07, 2013 Re: Low-Lock Singletons In D
Posted in reply to Dmitry Olshansky

On Monday, 6 May 2013 at 18:56:08 UTC, Dmitry Olshansky wrote:

Thanks for the detailed explanation!

> And now the compiler/CPU decides to optimize/execute it out of order (again, as an illustration):
>
> lock _static_mutex;
> x = alloc int;
> // even if that's atomic
> static_ = x;
> // BOOM! somebody not locking the mutex may already
> // see static_ in a "half-baked" state
> x[0] = 42;
> unlock _static_mutex;

That's exactly the same as the classic double-checked locking bug, right?

As I wrote in my original code -- and as you also mentioned yourself -- isn't it trivially fixed with a memory barrier? Like maybe replacing

_static = new ActualValue<T>();

with

var value = new ActualValue<T>();
_ReadWriteBarrier();
_static = value;

Wouldn't this make it correct?

>> Aren't pointer writes always atomic?
>
> In short - no. Even not counting the world of legal re-ordering, unaligned writes

But my example was completely aligned...
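Mehrdad's proposed fix does work on x86 under MSVC semantics: `_ReadWriteBarrier` is a compiler-only fence, but x86 hardware already keeps stores in order, so constructing into a local and publishing afterwards closes the hole there. The portable way to spell the same idea is a release store. A C++ sketch of the repaired double-checked lock (the names are mine, not from the thread):

```cpp
#include <atomic>
#include <mutex>

struct ActualValue { int payload = 42; };

std::atomic<ActualValue*> g_static{nullptr};
std::mutex g_lock;

ActualValue* get_instance() {
    // Classic double-checked locking, repaired with acquire/release.
    ActualValue* p = g_static.load(std::memory_order_acquire);
    if (!p) {
        std::lock_guard<std::mutex> lock(g_lock);
        p = g_static.load(std::memory_order_relaxed);
        if (!p) {
            // Construct fully, then publish. The release store plays the
            // role of Mehrdad's _ReadWriteBarrier, but also constrains the
            // CPU on weakly ordered architectures, not just the compiler.
            p = new ActualValue();
            g_static.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

The difference matters off x86: on ARM or POWER a compiler-only barrier is not enough, because the hardware itself may reorder the payload write past the pointer publication.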
Copyright © 1999-2021 by the D Language Foundation