May 19, 2013
On Saturday, 18 May 2013 at 16:58:19 UTC, Idan Arye wrote:
> OK, I implemented everything and made a pull request: https://github.com/D-Programming-Language/phobos/pull/1294

Nice, but the singleton implementation seems somewhat over-complicated, and the low-lock singleton is broken, possibly as a result of the former problem.

You need two variables of type T (assuming T is the target class). One should be __gshared, the other thread-local. When "instance()" is called, first check if the thread-local variable is non-null, if so just return it. Otherwise enter a synchronized block, check if the __gshared variables is null (and if so initialise it) then copy its value to the thread-local variable and finally return it.

For a "hasInstance" method to make any sense, the caller must be able to synchronize on the same object that "instance()" uses to synchronize on when it accesses the __gshared variable. Otherwise the return value is out of date before it's even been returned.
May 19, 2013
On Sunday, 19 May 2013 at 08:36:24 UTC, Diggory wrote:
> On Saturday, 18 May 2013 at 16:58:19 UTC, Idan Arye wrote:
>> OK, I implemented everything and made a pull request: https://github.com/D-Programming-Language/phobos/pull/1294
>
> Nice, but the singleton implementation seems somewhat over-complicated, and the low-lock singleton is broken, possibly as a result of the former problem.
>
> You need two variables of type T (assuming T is the target class). One should be __gshared, the other thread-local. When "instance()" is called, first check if the thread-local variable is non-null, if so just return it. Otherwise enter a synchronized block, check if the __gshared variables is null (and if so initialise it) then copy its value to the thread-local variable and finally return it.
>
> For a "hasInstance" method to make any sense, the caller must be able to synchronize on the same object that "instance()" uses to synchronize on when it accesses the __gshared variable. Otherwise the return value is out of date before it's even been returned.

There is no point in saving a thread local reference to the global instance. The `__gshared` instance is never changed once initialized, so if we saved a thread local reference, it would *always* be either null or the same as the `__gshared` one - which means that if the local reference is not null, there is no difference between returning the local and the global references.

`hasInstance` does not need no synchronization - it would just slow it down. Synchronization is redundant in readonly and writeonly scenarios - and this is a readonly scenario. A single read is atomic with or without a synchronization.

At any rate, using my implementation was broekn - I forgot to set the thread local boolean instantiation indicator to true(which would mean there will always be a lock!). I fixed it. Thanks for pointing that out!
May 19, 2013
>
> There is no point in saving a thread local reference to the global instance. The `__gshared` instance is never changed once initialized, so if we saved a thread local reference, it would *always* be either null or the same as the `__gshared` one - which means that if the local reference is not null, there is no difference between returning the local and the global references.
>
> `hasInstance` does not need no synchronization - it would just slow it down. Synchronization is redundant in readonly and writeonly scenarios - and this is a readonly scenario. A single read is atomic with or without a synchronization.
>
> At any rate, using my implementation was broekn - I forgot to set the thread local boolean instantiation indicator to true(which would mean there will always be a lock!). I fixed it. Thanks for pointing that out!

With regard to using a boolean instead of storing the instance thread locally - you're still reading from a mutable __gshared variable with no synchronisation on the reader's side, and that is always a bug. It may work in most cases but due to instruction reordering and differences between architectures there's no guarantee of that.

It's also less efficient as you have to read both the thread-local boolean and the __gshared instance. Since the thread-local boolean is likely going to use a word anyway you may as well store the instance in there instead.

Single reads are NOT atomic. On x86 word-aligned reads *happen* to be atomic, and even that is not guaranteed on other architectures. The main advantage of the low-lock singleton idea is that it is completely independent of architecture (there are more efficient ways if the architecture is known).

With respect to "hasInstance", what is a possible use-case where synchronisation is not required?
May 19, 2013
On Sunday, 19 May 2013 at 20:35:06 UTC, Diggory wrote:
>>
>> There is no point in saving a thread local reference to the global instance. The `__gshared` instance is never changed once initialized, so if we saved a thread local reference, it would *always* be either null or the same as the `__gshared` one - which means that if the local reference is not null, there is no difference between returning the local and the global references.
>>
>> `hasInstance` does not need no synchronization - it would just slow it down. Synchronization is redundant in readonly and writeonly scenarios - and this is a readonly scenario. A single read is atomic with or without a synchronization.
>>
>> At any rate, using my implementation was broekn - I forgot to set the thread local boolean instantiation indicator to true(which would mean there will always be a lock!). I fixed it. Thanks for pointing that out!
>
> With regard to using a boolean instead of storing the instance thread locally - you're still reading from a mutable __gshared variable with no synchronisation on the reader's side, and that is always a bug. It may work in most cases but due to instruction reordering and differences between architectures there's no guarantee of that.
>
> It's also less efficient as you have to read both the thread-local boolean and the __gshared instance. Since the thread-local boolean is likely going to use a word anyway you may as well store the instance in there instead.
>
> Single reads are NOT atomic. On x86 word-aligned reads *happen* to be atomic, and even that is not guaranteed on other architectures. The main advantage of the low-lock singleton idea is that it is completely independent of architecture (there are more efficient ways if the architecture is known).
>
> With respect to "hasInstance", what is a possible use-case where synchronisation is not required?

Reading an aligned word-sized value should be atomic, pointers are word-sized, and compilers usually align variables - so I do believe it's safe to treat reads as atomic.

And even if they weren't - there should be no problem regarding `hasInstance`, because it returns a boolean value - true or false. That means there are for options:

1) `hasInstance` returns true when the instance was already initialized.
2) `hasInstance` returns false when the instance was not yet initialized.
3) `hasInstance` returns false while the instance is being initialized in another thread.
4) `hasInstance` returns true while the instance is being initialized in another thread.

Obviously we have no problem with 1 and 2, but 3 and 4 seem problematic. I claim they are not!

Let's take a look at 3. `hasInstance` returns false even thought an instance is currently being initialized. But if `hasInstace` was invoked a nanosecond earlier it would have also returned false, and this time it would be correct. In both cases(nanosecond earlier and nanosecond later), by the time thread that called `hasInstance` has reach the next command the singleton will be instantiated and the answer of `hasInstance` will be outdated. So - scenario 3 does not introduce any new bugs.

Scenario 4, however, seems to be a bit more problematic - when `hasInstance` returns true even though the singleton has not been fully instantiated. What does that mean? well, the only way a read or a write can be non-atomic is if the memory is read/written in parts:
 - Maybe the read split - maybe `hasInstance` read half the variable, then the other thread written, then `hasInstance` read the other half.
 - Maybe the write split - maybe the other thread wrote half the variable, then `hasInstance` read it, then the other thread wrote the other half.
 - And maybe both of them split.
At any rate, no matter was was split, the result is the same - `hasInstance` will return true because what it read from the global reference is not null(even though it's not a full reference). This seems bad - `hasInstance` tells us that the singleton is already instantiated even though the other thread didn't finish writing the reference before it was invoked!

Well, I say it doesn't matter. By the time `hasInstance` will return it's value and the code can act upon it, the thread that finished the singleton will surely have finished writing that single global reference!

And even if it didn't - we can rest assured that we can rely on `hasInstance`'s answer, because it tell us that the singleton can no longer be initialized, and that by the time we need to access it, it will already be initialized. And even if by then the other thread would still not finished writing the reference - maybe because the thread that called `hasInstance` is being run on a 3.5GHz core and the thread that initiates the singleton instance is being run by a monkey with an abacus - once you try to actually access that instance you will enter that synchronization block that will hold you until the monkey finish writing the reference and it is ready to use.



Beside `hasInstance`, there is two other place where I am reading the `__gshared` reference without synchronization - both at `instance()`. One is a null check in case singleton does not support implicit instantiation - and now that I look at it, maybe I should put a synchronization there(though I still don't think it's needed). The other one is the return statement - but by the time we reach it, we know that the instantiation was already completed, and we know that the the reference will never change again until the end of the process - in other words, we know there will not be any more writes to this `__gshared` reference, and that means we can read it safely without synchronizations just we can safely read immutable values without synchronization.
May 19, 2013
On Sunday, 19 May 2013 at 20:35:06 UTC, Diggory wrote:
> It's also less efficient as you have to read both the thread-local boolean and the __gshared instance. Since the thread-local boolean is likely going to use a word anyway you may as well store the instance in there instead.

This is a good point. I've changed the local indicator to a local reference.

I've also added the local check under `instance()`. Until now, I relayed on `singleton_tryInitInstance()` to do that check, but I figured out that it's better to do an extra local check on first call(in each thread) than to do an extra method call for every call afterward. Also, when that check is inside another method, the optimization you suggested won't work(unless the compiler decides to inline it)
May 20, 2013
In your logic you're assuming that the order of operations is maintained - without the correct memory barriers or synchronisation both the compiler and CPU are free to completely reorder any operations you do. That's why it's always a bug to access mutable shared data without synchronisation - any read not using some form of synchronisation could give you back any value that the memory is ever set to during the course of the program.
May 20, 2013
On Monday, 20 May 2013 at 05:39:42 UTC, Diggory wrote:
> In your logic you're assuming that the order of operations is maintained - without the correct memory barriers or synchronisation both the compiler and CPU are free to completely reorder any operations you do. That's why it's always a bug to access mutable shared data without synchronisation - any read not using some form of synchronisation could give you back any value that the memory is ever set to during the course of the program.

It can't be THAT chaotic. Neither the compiler nor the CPU will perform a check on a value and then go back in time to fetch an old version of that value and return it. This kind of behavior would break non-threaded code.
May 20, 2013
On Monday, 20 May 2013 at 06:53:34 UTC, Idan Arye wrote:
> On Monday, 20 May 2013 at 05:39:42 UTC, Diggory wrote:
>> In your logic you're assuming that the order of operations is maintained - without the correct memory barriers or synchronisation both the compiler and CPU are free to completely reorder any operations you do. That's why it's always a bug to access mutable shared data without synchronisation - any read not using some form of synchronisation could give you back any value that the memory is ever set to during the course of the program.
>
> It can't be THAT chaotic. Neither the compiler nor the CPU will perform a check on a value and then go back in time to fetch an old version of that value and return it. This kind of behavior would break non-threaded code.

Of course it's possible, for example the code may produce the expected result if some invariant holds which does in fact hold if there was a single thread running, but with multiple threads the invariant is broken. Or more simply - the fact remains that you are writing on one thread (correctly using synchronisation) and reading from another (not using synchronisation) and synchronisation is required on both the read and the write. The compiler/CPU is then free to reorder the reads under the assumption that the value won't change, and this assumption is clearly wrong.

Basically most of your argument is just hoping that it will behave the way you want, but without having any real guarantees, and that's not the way to write thread-safe code, especially if it's going to be part of the standard library.
May 20, 2013
On Monday, 20 May 2013 at 11:19:44 UTC, Diggory wrote:
> Of course it's possible, for example the code may produce the expected result if some invariant holds which does in fact hold if there was a single thread running, but with multiple threads the invariant is broken. Or more simply - the fact remains that you are writing on one thread (correctly using synchronisation) and reading from another (not using synchronisation) and synchronisation is required on both the read and the write. The compiler/CPU is then free to reorder the reads under the assumption that the value won't change, and this assumption is clearly wrong.

I do not assume that the compiler or the CPU will not change the order of reads in the unsynchronized thread - I showed that the result of `hasInstance()` is not affected by such reordering! `hasInstance()` has a single read to __gshared memory, and the only thing that can effect the result of that read is a write to that memory, which is done *once* in the synchronized thread. That means I should only care when the read in `hasInstance()` happens related to that write.

I have shown that whether the read happens before the write, or the write happens before the read, or they happen at the same time, or the write is split and the read is done between the two parts of the write, or the other way around, or if both the read and write are split and their parts are performed alternately - no matter what, `hasInstance()` yields a result I can rely on.

Since `hasInstance()` produces reliable results even if it gets mixed in the timeframe of the instantiation in another thread - I see no reason to do a costly synchronization to prevent the mixing.



I have tried to form a similar proof for the static branch in `instance()` that handles the no-default-constructor case, and realized that this one does need synchronization, because the compiler might decide to set the reference before it runs the initialization of the object. Even though the result it will return will be the correct reference to the instance, the instance object itself might not be ready.

So, I'm making that part synchronized, simply to force `instance()` to wait until the instance object has finished it's instantiation.
May 20, 2013
20-May-2013 19:28, Idan Arye пишет:
> On Monday, 20 May 2013 at 11:19:44 UTC, Diggory wrote:
>> Of course it's possible, for example the code may produce the expected
>> result if some invariant holds which does in fact hold if there was a
>> single thread running, but with multiple threads the invariant is
>> broken. Or more simply - the fact remains that you are writing on one
>> thread (correctly using synchronisation) and reading from another (not
>> using synchronisation) and synchronisation is required on both the
>> read and the write. The compiler/CPU is then free to reorder the reads
>> under the assumption that the value won't change, and this assumption
>> is clearly wrong.
>
> I do not assume that the compiler or the CPU will not change the order
> of reads in the unsynchronized thread - I showed that the result of
> `hasInstance()` is not affected by such reordering! `hasInstance()` has
> a single read to __gshared memory, and the only thing that can effect
> the result of that read is a write to that memory, which is done *once*
> in the synchronized thread. That means I should only care when the read
> in `hasInstance()` happens related to that write.
>

Long story short - re-read the discussion in the Low-lock thread again:
http://forum.dlang.org/thread/pelhvaxwjzhehdjtpsav@forum.dlang.org


-- 
Dmitry Olshansky