bool Associative Array Synchronized
March 20, 2014 - Etienne
I'd like to "cowboy it" on an AA that looks like this:

__gshared bool[string] m_mutex;

I think it'll be much faster for my code because this AA could need to be checked and switched possibly millions of times per second and I wouldn't want to slow it down with a mutex protecting it.

I'm thinking the bool would be synchronized at the hardware level anyway, since it's just a bit being flipped or returned. Am I right to assume this, and (maybe an assembly guru can answer) is it safe to use?
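Concretely, the unsynchronized check-and-flip I have in mind would look something like this (tryLock is just an illustrative name):

__gshared bool[string] m_mutex;

// check-and-flip with no locking -- the pattern in question
bool tryLock(string key)
{
    if (auto p = key in m_mutex)   // pointer into the AA's storage
    {
        if (*p)
            return false;          // already held
        *p = true;                 // flip the bit
        return true;
    }
    m_mutex[key] = true;           // first use inserts a key: this mutates AA internals
    return true;
}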

Thanks
March 20, 2014 - Steven Schveighoffer
On Thu, 20 Mar 2014 13:43:58 -0400, Etienne <etcimon@gmail.com> wrote:

> I'd like to "cowboy it" on an AA that looks like this:
>
> __gshared bool[string] m_mutex;
>
> I think it'll be much faster for my code because this AA could need to be checked and switched possibly millions of times per second and I wouldn't want to slow it down with a mutex protecting it.
>
> I'm thinking the bool would be synchronized at the hardware level anyway, since it's just a bit being flipped or returned. Am I right to assume this, and (maybe an assembly guru can answer) is it safe to use?

No, it's not safe. With memory reordering, there is no "safe" unless you use mutexes or low-level atomics.

Long story short, the compiler, the processor, or the memory cache can effectively reorder operations, making one thread see things happen in a different order than they are written/executed on another thread. There are no guarantees.
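If each value really is just a flag, core.atomic gives you both the atomicity and the ordering for that one flag -- though it does nothing for the AA lookup around it. A minimal sketch (the function names are just for illustration):

import core.atomic;

shared bool flag;

// release store: earlier writes can't be reordered past it
void set() { atomicStore!(MemoryOrder.rel)(flag, true); }

// acquire load: later reads can't be reordered before it
bool isSet() { return atomicLoad!(MemoryOrder.acq)(flag); }

// atomic test-and-set: true only if we flipped false -> true
bool tryAcquire() { return cas(&flag, false, true); }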

-Steve
March 20, 2014 - John Colvin
On Thursday, 20 March 2014 at 17:43:58 UTC, Etienne wrote:
> I'd like to "cowboy it" on an AA that looks like this:
>
> __gshared bool[string] m_mutex;
>
> I think it'll be much faster for my code because this AA could need to be checked and switched possibly millions of times per second and I wouldn't want to slow it down with a mutex protecting it.
>
> I'm thinking the bool would be synchronized at the hardware level anyway, since it's just a bit being flipped or returned. Am I right to assume this, and (maybe an assembly guru can answer) is it safe to use?
>
> Thanks

On x86 only?

If all you do is read/write from/to each location, then you are guaranteed atomicity (for naturally aligned values). Adding/removing members? Forget it.

Also, atomicity is not a strong enough guarantee for implementing a mutex, which is what I assume you are trying to do. You need ordering guarantees as well.

Watch both of these:

http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2
http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-2-of-2

and then look at core.atomic

and then give up and use a lock :p
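For reference, the boring-but-correct version -- one lock around the whole AA (a rough sketch; aaLock/getFlag/setFlag are names I made up):

import core.sync.mutex;

__gshared Mutex aaLock;
__gshared bool[string] flags;

shared static this() { aaLock = new Mutex; }

bool getFlag(string key)
{
    aaLock.lock();
    scope (exit) aaLock.unlock();
    auto p = key in flags;
    return p !is null && *p;
}

void setFlag(string key, bool value)
{
    aaLock.lock();
    scope (exit) aaLock.unlock();
    flags[key] = value;
}

An uncontended lock/unlock is typically tens of nanoseconds, so measure before assuming it can't keep up with millions of operations per second.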
March 20, 2014 - Etienne
On 2014-03-20 1:58 PM, John Colvin wrote:
>
> Also, atomicity is not a strong enough guarantee for implementing a
> mutex, which is what I assume you are trying to do. You need ordering
> guarantees as well.

Heh. Will do!
March 20, 2014 - Etienne
On 2014-03-20 1:52 PM, Steven Schveighoffer wrote:
> On Thu, 20 Mar 2014 13:43:58 -0400, Etienne <etcimon@gmail.com> wrote:
>
>> I'd like to "cowboy it" on an AA that looks like this:
>>
>> __gshared bool[string] m_mutex;
>>
>> I think it'll be much faster for my code because this AA could need to
>> be checked and switched possibly millions of times per second and I
>> wouldn't want to slow it down with a mutex protecting it.
>>
>> I'm thinking the bool would be synchronized at the hardware level
>> anyway, since it's just a bit being flipped or returned. Am I right to
>> assume this, and (maybe an assembly guru can answer) is it safe to use?
>
> No, it's not safe. With memory reordering, there is no "safe" unless you
> use mutexes or low-level atomics.
>
> Long story short, the compiler, the processor, or the memory cache can
> effectively reorder operations, making one thread see things happen in a
> different order than they are written/executed on another thread. There
> are no guarantees.
>
> -Steve

Right, I was assuming it was always ordered, but modern processor pipelines are different I guess.
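For anyone else following along, here's the textbook demonstration of what Steve means, as I understand it (both loads can see 0, which no sequential ordering of the four statements permits):

import core.thread;

__gshared int x, y, r1, r2;

void threadA() { x = 1; r1 = y; } // plain store, then plain load
void threadB() { y = 1; r2 = x; }

void main()
{
    auto a = new Thread(&threadA);
    auto b = new Thread(&threadB);
    a.start(); b.start();
    a.join();  b.join();
    // Across many runs you can see r1 == 0 && r2 == 0: each store sat
    // in a store buffer while the other thread's load read the old
    // value. No interleaving of the four statements allows that.
    import std.stdio : writeln;
    writeln(r1, " ", r2);
}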
March 20, 2014 - Chris Williams
On Thursday, 20 March 2014 at 18:06:18 UTC, Etienne wrote:
> Right, I was assuming it was always ordered, but modern processor pipelines are different I guess.

Even without rearranging the order of your code, your bit exists in RAM but all the logic takes place in a CPU register, meaning that any time you operate with the bit, there's at least two copies. When the CPU switches threads (which could be at any point), it empties the CPU into RAM. If the thread it's switching to then tries to interact with the bit, it's going to create a third copy. Now whichever of the two threads is slower to write back to the original location is going to smash the other's.
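You can watch that smashing happen with the classic lost-update demo (an int rather than a bool, but the mechanism is the same):

import core.thread;
import std.stdio : writeln;

__gshared int counter; // deliberately unsynchronized

void bump()
{
    foreach (i; 0 .. 1_000_000)
        ++counter; // load, add, store: three steps, not one
}

void main()
{
    auto a = new Thread(&bump);
    auto b = new Thread(&bump);
    a.start(); b.start();
    a.join();  b.join();
    writeln(counter); // almost never 2000000 -- updates were smashed
}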
March 20, 2014 - H. S. Teoh
On Thu, Mar 20, 2014 at 06:39:10PM +0000, Chris Williams wrote:
> On Thursday, 20 March 2014 at 18:06:18 UTC, Etienne wrote:
> >Right, I was assuming it was always ordered, but modern processor pipelines are different I guess.
> 
> Even without rearranging the order of your code, your bit exists in RAM but all the logic takes place in a CPU register, meaning that any time you operate with the bit, there's at least two copies. When the CPU switches threads (which could be at any point), it empties the CPU into RAM. If the thread it's switching to then tries to interact with the bit, it's going to create a third copy. Now whichever of the two threads is slower to write back to the original location is going to smash the other's.

Furthermore, the CPU does not access bits directly; it processes them as (at least) bytes. To set/clear a bit in memory location X, the CPU has to first read X into a register (at least 8 bits long), update the bit, and write it back into memory. If two threads are simultaneously operating on different bits that reside in the same byte, whichever CPU runs last will smash whatever the first CPU wrote.

Let's say you start with 00000000b in location X, and CPU1 wants to set the first bit and CPU2 wants to set the second bit. Both CPUs initially read 00000000b into their registers; then the first CPU sets the first bit, so it becomes 00000001b (in register), and the second CPU sets the second bit, so it becomes 00000010b (in register). Now CPU1 writes 00000001b back to memory, followed by CPU2 writing 00000010b back to memory. Now what CPU1 did has been smashed by CPU2's write.
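If you really did need to flip individual bits in a shared byte, core.atomic's read-modify-write operations make the whole sequence indivisible -- a minimal sketch:

import core.atomic;

shared ubyte bits;

// atomicOp performs the load-OR-store as one indivisible step, so two
// threads setting different bits can no longer smash each other.
void setBit(uint n)
{
    atomicOp!"|="(bits, cast(ubyte)(1 << n));
}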

Now, the current AA implementation doesn't actually pack bits like this, so this particular problem doesn't actually happen, but other similar problems will occur if you add/remove keys from the AA -- two CPUs will try to update internal AA pointers simultaneously, and end up trashing it.


T

-- 
What doesn't kill me makes me stranger.
March 20, 2014 - Etienne
On 2014-03-20 2:47 PM, H. S. Teoh wrote:
> On Thu, Mar 20, 2014 at 06:39:10PM +0000, Chris Williams wrote:
>> On Thursday, 20 March 2014 at 18:06:18 UTC, Etienne wrote:
>>> Right, I was assuming it was always ordered, but modern processor
>>> pipelines are different I guess.
>>
>> Even without rearranging the order of your code, your bit exists in
>> RAM but all the logic takes place in a CPU register, meaning that
>> any time you operate with the bit, there's at least two copies. When
>> the CPU switches threads (which could be at any point), it empties
>> the CPU into RAM. If the thread it's switching to then tries to
>> interact with the bit, it's going to create a third copy. Now
>> whichever of the two threads is slower to write back to the original
>> location is going to smash the other's.
>
> Furthermore, the CPU does not access bits directly; it processes them as
> (at least) bytes. To set/clear a bit in memory location X, the CPU has
> to first read X into a register (at least 8 bits long), update the bit,
> and write it back into memory. If two threads are simultaneously
> operating on different bits that reside in the same byte, whichever CPU
> runs last will smash whatever the first CPU wrote.
>
> Let's say you start with 00000000b in location X, and CPU1 wants to set
> the first bit and CPU2 wants to set the second bit. Both CPUs initially
> read 00000000b into their registers; then the first CPU sets the first
> bit, so it becomes 00000001b (in register), and the second CPU sets the
> second bit, so it becomes 00000010b (in register). Now CPU1 writes
> 00000001b back to memory, followed by CPU2 writing 00000010b back to
> memory. Now what CPU1 did has been smashed by CPU2's write.
>
> Now, the current AA implementation doesn't actually pack bits like this,
> so this particular problem doesn't actually happen, but other similar
> problems will occur if you add/remove keys from the AA -- two CPUs will
> try to update internal AA pointers simultaneously, and end up trashing
> it.
>
>
> T
>

Hmm, my misunderstanding comes from not taking into account everything I knew about CPU caches. As a side note, I think I heard somewhere that abstractions don't flatten out the learning curve at all: it's like a bigger gun, but you still need to know all the basics to avoid shooting yourself with it. So true. Well, thanks for the explanation there :)