Thread overview
Need help with communication between multiple threads
Feb 20, 2007
Chad J
Feb 20, 2007
kris
Feb 20, 2007
Sean Kelly
Feb 21, 2007
Chad J
Feb 21, 2007
kris
Feb 21, 2007
Chad J
Feb 21, 2007
kris
Feb 21, 2007
kris
Feb 21, 2007
Sean Kelly
Feb 21, 2007
Chad J
Feb 21, 2007
Sean Kelly
February 20, 2007
I'm a bit of a newbie to this whole multithreading thing, so I'm hoping someone can help me with this.

First, what I know (or what I think I know):  So I've been reading about this, and apparently there's this problem where two threads that are reading a value from the same address in memory at the same time may end up with two completely different values.  Apparently this is because when you write something, it may just end up in the cache and not be updated in global memory.  Also, when reading, you may end up with something that is just an outdated copy in the cache, and not the actual thing from global memory.  But it doesn't stop there: apparently x86 computers are very forgiving about this stuff, so if you make a mistake you won't know it until your program is run on some more obscure hardware.

Now then, my question:  In D, how do I ensure that when I write something, the write is to global memory, and when I read something, the read is from global memory?

Some more info:  This comes up because I am trying to write a Timer class for Tango, and it will include timers that trigger events at a later date, which requires multithreading.  So it'd be most helpful if I could accomplish this using only D features and/or Tango.
February 20, 2007
Chad J wrote:
> I'm a bit of a newbie to this whole multithreading thing, so I'm hoping someone can help me with this.
> 
> First, what I know (or what I think I know):  So I've been reading about this, and apparently there's this problem where two threads that are reading a value from the same address in memory at the same time may end up with two completely different values.  Apparently this is because when you write something, it may just end up in the cache and not be updated in global memory.  Also, when reading, you may end up with something that is just an outdated copy in the cache, and not the actual thing from global memory.  But it doesn't stop there, apparently x86 computers are very forgiving on this stuff so if you make a mistake you won't know it until your program is run on some more obscure hardware.
> 
> Now then, my question:  In D, how do I ensure that when I write something, the write is to global memory, and when I read something, the read is from global memory?
> 
> Some more info:  This comes up because I am trying to write a Timer class for Tango, and it will include timers that trigger events at a later date, which requires multithreading.  So it'd be most helpful if I could accomplish this using only D features and/or Tango.

Basically, you need to protect the value from contention between two threads. There are a number of ways to do this:

1) using native D facilities via the synchronized keyword: expose a getter and setter method, and have them both synch on the same object/lock. This is a fairly heavyweight resolution, but it would work.

2) get under the covers and utilize a mutex, semaphore, or some other classical synchronization construct exposed by the OS itself. Tango will provide a cross-platform way of doing this in the next release. This is potentially lighter weight than #1.

3) use CPU-specific instructions to ensure value access is atomic. This is what Sean has exposed in the Atomic module within Tango. It is a lightweight and low-overhead solution, and works by locking the bus for the duration of the read/write access.

4) use a small discrete unit for the value. If value is just a byte, the underlying hardware will usually treat it as an indivisible unit, giving you the desired result (similar to #3). However, there are memory barriers involved also, which D respects via the "volatile" keyword. Beyond that, there may be issues with cache-reconciliation on a multi-core device, so this approach is generally not recommended.
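
As a rough illustration of option 2, here is a minimal sketch that assumes present-day D's druntime (core.sync.mutex and core.thread), which postdates this thread and is not the tango.locks API mentioned above. The GuardedValue class and its method names are hypothetical.

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Hypothetical holder for a value shared between threads.
final class GuardedValue
{
    private Mutex lock;     // wraps the OS-level mutex
    private uint current;   // the value being protected

    this(uint initial)
    {
        lock = new Mutex;
        current = initial;
    }

    uint get()
    {
        lock.lock();
        scope (exit) lock.unlock();   // released on every exit path
        return current;
    }

    void add(uint delta)
    {
        lock.lock();
        scope (exit) lock.unlock();
        current += delta;             // read-modify-write under the lock
    }
}

void main()
{
    auto total = new GuardedValue(0);
    Thread[] workers;
    foreach (i; 0 .. 4)
    {
        workers ~= new Thread({
            foreach (j; 0 .. 1000)
            {
                total.add(1);
            }
        });
    }
    foreach (w; workers) w.start();
    foreach (w; workers) w.join();
    assert(total.get() == 4000);
}
```

Every read and write goes through the same mutex, so the four workers cannot interleave inside add.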

- Kris
February 20, 2007
kris wrote:
> 
> 4) use a small discrete unit for the value. If value is just a byte, the underlying hardware will usually treat it as an indivisible unit, giving you the desired result (similar to #3). However, there are memory barriers involved also, which D respects via the "volatile" keyword. Beyond that, there may be issues with cache-reconciliation on a multi-core device, so this approach is generally not recommended.

The "volatile" keyword is somewhat tricky, as it's effectively a memory barrier for compiler optimizations only.  It will prevent the compiler from moving loads/stores across the volatile region during optimization, but it does not add any barrier instructions to the generated ASM.  I think it will also affect whether register caching of loads occurs, etc, which is occasionally necessary if you're performing a busy wait on a shared variable.  I.e., under normal circumstances:

    while( i == 0 )
    {
        // do nothing
    }

it's clear to the compiler that the value of 'i' will not change during the loop, so the code could theoretically be transformed into:

    if( i == 0 )
    {
        while( true )
        {
            // do nothing
        }
    }

which is an equivalent sequence of operations.  In C++, this is called the "as if" rule: the compiler can do whatever the heck it wants so long as the behavior meets expectations /within the context of the virtual machine described for the language/.  For C++, this virtual machine is single-threaded so the above transformation is legal.  I believe the same is currently true of D, though D provides "volatile" to tell the compiler "I don't care what fancy stuff you think you can do to this code to make it faster.  Don't do it.  I know more than you do about what's going on here."
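
A minimal sketch of such a busy wait, assuming present-day D's core.atomic rather than the "volatile" statement discussed here. The name `ready` is hypothetical. Because every iteration goes through atomicLoad, the compiler may not collapse the loop into the single-check form shown above.

```d
import core.atomic : atomicLoad, atomicStore;
import core.thread : Thread;

// Hypothetical flag: set by a worker thread, polled by main.
shared int ready = 0;

void main()
{
    auto worker = new Thread({
        atomicStore(ready, 1);   // publish the change to other threads
    });
    worker.start();

    // Each pass performs a real load of `ready`.
    while (atomicLoad(ready) == 0)
    {
        Thread.yield();          // give the worker a chance to run
    }

    worker.join();
    assert(atomicLoad(ready) == 1);
}
```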

That said, progress is being made towards defining a multithreaded virtual machine for C++, and once it is settled I suspect the D model will follow the C++ model in spirit, if perhaps not exactly.

By the way, for any who are interested, Doug Lea described a memory model last month that I think has tremendous promise, regardless of whether it's chosen for C++.  He describes it here:

http://www.decadentplace.org.uk/pipermail/cpp-threads/2007-January/001287.html


Sean
February 21, 2007
kris wrote:
> 
> Basically, you need to protect the value from contention between two threads. There are a number of ways to do this:
> 
> 1) using native D facilities via the synchronized keyword: expose a getter and setter method, and have them both synch on the same object/lock. This is a fairly heavyweight resolution, but it would work.
> 

So would something like this do the trick?

class Mutex
{
    uint pointlessVariable;
}
Mutex mutex;

uint m_value = 42; // The thing to be protected.
uint value() // getter
{
    synchronized(mutex)
        return m_value;
}
uint value(uint newbie) // setter
{
    synchronized(mutex)
        m_value = newbie;

    return newbie;
}

Or am I supposed to do something else like put m_value inside the mutex class?

> 2) get under the covers and utilize a mutex, semaphore, or some other classical synchronization construct exposed by the OS itself. Tango will provide a cross-platform way of doing this in the next release. This is potentially lighter weight than #1
> 

I suppose I'll wait for that release and see what happens.

> 3) use CPU-specific instructions to ensure value access is atomic. This is what Sean has exposed in the Atomic module within Tango. It is a lightweight and low-overhead solution, and works by locking the bus for the duration of the read/write access.
> 

This sounds cool, but I don't quite understand how to use the Atomic module - what is msync and which value of it do I pick to make things work?  I made a post about this in the Tango forum in case it's more appropriate to discuss there.

> 4) use a small discrete unit for the value. If value is just a byte, the underlying hardware will usually treat it as an indivisible unit, giving you the desired result (similar to #3). However, there are memory barriers involved also, which D respects via the "volatile" keyword. Beyond that, there may be issues with cache-reconciliation on a multi-core device, so this approach is generally not recommended.
> 

Right, I'll just stay away from that.

> - Kris

When I was porting Phobos to work on ARM-WinCE, it was very helpful to be able to discard a module without breaking other parts of the lib, namely in the case of that module requiring a broken language feature or inline assembly (inline asm is not currently available with arm-wince-pe-gdc, and I don't feel like learning ARM asm yet).  That said, Atomic looks like it will be very broken on ARM in its current state.  I also benchmarked synchronized reads vs atomic reads and yeah, synchronized was much slower (I picked "whatever makes it compile" values for msync).  So I'll probably implement a version using only synchronization and a version that uses Atomic instead whenever possible.
February 21, 2007
Chad J wrote:
> When I was porting Phobos to work on ARM-WinCE, it was very helpful to be able to discard a module without breaking other parts of the lib, namely in the case of that module requiring a broken language feature or inline assembly 

If the underlying OS APIs are present, then the upcoming tango.locks ought to work on WinCE. I'd imagine this to be your best bet, or to go with synchronized instead :)

Which approach you choose is ultimately down to the manner in which you need to share the entity.
February 21, 2007
Chad J wrote:
> kris wrote:
>>
>> Basically, you need to protect the value from contention between two threads. There are a number of ways to do this:
>>
>> 1) using native D facilities via the synchronized keyword: expose a getter and setter method, and have them both synch on the same object/lock. This is a fairly heavyweight resolution, but it would work.
>>
> 
> So would something like this do the trick?
> 
> class Mutex
> {
>     uint pointlessVariable;
> }
> Mutex mutex;
> 
> uint m_value = 42; // The thing to be protected.
> uint value() // getter
> {
>     synchronized(mutex)
>         return m_value;
> }
> uint value(uint newbie) // setter
> {
>     synchronized(mutex)
>         m_value = newbie;
> 
>     return newbie;
> }
> 
> Or am I supposed to do something else like put m_value inside the mutex class?

You could use synchronized with no arguments and everything will work fine.  The default behavior for free functions is to synchronize on a hidden global object.  Alternately:

    Object valueLock = new Object;

    uint m_value = 42; // The thing to be protected.

    uint value() // getter
    {
        synchronized(valueLock)
            return m_value;
    }

    uint value(uint newbie) // setter
    {
        synchronized(valueLock)
            m_value = newbie;
        return newbie;
    }

This works if you want to synch only specific functions with respect to one another.
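
On the question of putting m_value inside a class: a minimal sketch of that variant, where the object itself serves as the lock for its getter and setter. The ProtectedValue name is hypothetical, and the usage in main is only a single-threaded check of the interface.

```d
// Hypothetical wrapper: the instance is both the data holder and the lock.
class ProtectedValue
{
    private uint m_value = 42; // The thing to be protected.

    uint value() // getter
    {
        synchronized (this)
            return m_value;
    }

    uint value(uint newbie) // setter
    {
        synchronized (this)
            m_value = newbie;
        return newbie;
    }
}

void main()
{
    auto v = new ProtectedValue;
    assert(v.value() == 42);
    v.value(7);
    assert(v.value() == 7);
}
```

Each instance carries its own lock, so two separate ProtectedValue objects never contend with each other.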

>> 3) use CPU-specific instructions to ensure value access is atomic. This is what Sean has exposed in the Atomic module within Tango. It is a lightweight and low-overhead solution, and works by locking the bus for the duration of the read/write access.
> 
> This sounds cool, but I don't quite understand how to use the Atomic module - what is msync and which value of it do I pick to make things work?  I made a post about this in the Tango forum in case it's more appropriate to discuss there.

The Tango forums are probably more appropriate, but I can give a quick summary here (I'm on my way out the door as I write this). tango.core.Atomic does essentially two things: it ensures that any operation it performs is atomic, and it provides methods to control memory ordering regarding such operations.  The latter issue is somewhat complicated, but suffice to say that msync.seq is the safest option and should be used in most situations.  So for the above:

    uint m_value = 42;

    uint value() // getter
    {
        return atomicLoad!(msync.seq)( m_value );
    }

    uint value(uint newbie) // setter
    {
        atomicStore!(msync.seq)( m_value, newbie );
        return newbie;
    }

For data which will always be modified atomically, a wrapper struct is also provided:

    Atomic!(uint) m_value;

    // Atomic really needs a ctor, but this should work
    // for "fast" construction.
    m_value.store!(msync.raw)( 42 );

    uint value() // getter
    {
        return m_value.load!(msync.seq);
    }

    uint value(uint newbie) // setter
    {
        m_value.store!(msync.seq)( newbie );
        return newbie;
    }
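
To make the memory-ordering side a little more concrete, here is a hedged sketch using present-day D's core.atomic, not the tango.core.Atomic API shown above. It assumes that MemoryOrder.raw, MemoryOrder.acq and MemoryOrder.rel play roles comparable to the msync values discussed in this thread. The `payload` and `published` names are hypothetical.

```d
import core.atomic : atomicLoad, atomicStore, MemoryOrder;
import core.thread : Thread;

shared uint payload = 0;        // data written before the flag
shared bool published = false;  // flag that announces the data

void main()
{
    auto producer = new Thread({
        // Plain atomic write of the data, no ordering requested.
        atomicStore!(MemoryOrder.raw)(payload, 42u);
        // Release store: earlier writes may not move after it.
        atomicStore!(MemoryOrder.rel)(published, true);
    });
    producer.start();

    // Acquire load: later reads may not move before it.
    while (!atomicLoad!(MemoryOrder.acq)(published))
    {
        Thread.yield();
    }

    // Having observed the flag, the payload write is visible too.
    assert(atomicLoad!(MemoryOrder.raw)(payload) == 42);
    producer.join();
}
```

When in doubt, leaving out the template argument selects the sequentially consistent default, which matches the advice above that the seq option is the safest.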

Please note that Atomic currently only supports x86, but if there's a demand for it then I may add support for other architectures.  If this happens, it will probably be under Posix (and not Win32), since I'm not entirely sure about out-of-the-box assembler support with DMD/Win32.

> When I was porting Phobos to work on ARM-WinCE, it was very helpful to be able to discard a module without breaking other parts of the lib, namely in the case of that module requiring a broken language feature or inline assembly (inline asm is not currently available with arm-wince-pe-gdc, and I don't feel like learning ARM asm yet).  That said, Atomic looks like it will be very broken on ARM in its current state.  I also benchmarked synchronized reads vs atomic reads and yeah, synchronized was much slower (I picked "whatever makes it compile" values for msync).  So I'll probably implement a version using only synchronization and a version that uses Atomic instead whenever possible.

See above :-)  Atomic won't work on ARM without additional code.

By the way, it may also eventually be necessary to add a hardware instruction for ordering load operations on x86, since I'm becoming convinced that load reordering is actually allowed by the IA-32 spec (and it may actually be done on some AMD CPUs).  I've been resisting this until now because it will slow down synchronized loads substantially for what may be only a small portion of the x86 hardware in production.  So if you (or anyone) decide to use Atomic as-is and see weird behavior with atomicLoad using msync.acq or msync.hlb, please let me know.


Sean
February 21, 2007
Sean Kelly wrote:
> Chad J wrote:
> 
>> kris wrote:
>>
>>>
>>> Basically, you need to protect the value from contention between two threads. There are a number of ways to do this:
>>>
>>> 1) using native D facilities via the synchronized keyword: expose a getter and setter method, and have them both synch on the same object/lock. This is a fairly heavyweight resolution, but it would work.
>>>
>>
>> So would something like this do the trick?
>>
>> class Mutex
>> {
>>     uint pointlessVariable;
>> }
>> Mutex mutex;
>>
>> uint m_value = 42; // The thing to be protected.
>> uint value() // getter
>> {
>>     synchronized(mutex)
>>         return m_value;
>> }
>> uint value(uint newbie) // setter
>> {
>>     synchronized(mutex)
>>         m_value = newbie;
>>
>>     return newbie;
>> }
>>
>> Or am I supposed to do something else like put m_value inside the mutex class?
> 
> 
> You could use synchronized with no arguments and everything will work fine.  The default behavior for free functions is to synchronize on a hidden global object.  Alternately:
> 
>     Object valueLock = new Object;
> 
>     uint m_value = 42; // The thing to be protected.
> 
>     uint value() // getter
>     {
>         synchronized(valueLock)
>             return m_value;
>     }
> 
>     uint value(uint newbie) // setter
>     {
>         synchronized(valueLock)
>             m_value = newbie;
>         return newbie;
>     }
> 
> This works if you want to synch only specific functions with respect to one another.
> 
>>> 3) use CPU-specific instructions to ensure value access is atomic. This is what Sean has exposed in the Atomic module within Tango. It is a lightweight and low-overhead solution, and works by locking the bus for the duration of the read/write access.
>>
>>
>> This sounds cool, but I don't quite understand how to use the Atomic module - what is msync and which value of it do I pick to make things work?  I made a post about this in the Tango forum in case it's more appropriate to discuss there.
> 
> 
> The Tango forums are probably more appropriate, but I can give a quick summary here (I'm on my way out the door as I write this). tango.core.Atomic does essentially two things: it ensures that any operation it performs is atomic, and it provides methods to control memory ordering regarding such operations.  The latter issue is somewhat complicated, but suffice to say that msync.seq is the safest option and should be used in most situations.  So for the above:
> 
>     uint m_value = 42;
> 
>     uint value() // getter
>     {
>         return atomicLoad!(msync.seq)( m_value );
>     }
> 
>     uint value(uint newbie) // setter
>     {
>         atomicStore!(msync.seq)( m_value, newbie );
>         return newbie;
>     }
> 
> For data which will always be modified atomically, a wrapper struct is also provided:
> 
>     Atomic!(uint) m_value;
> 
>     // Atomic really needs a ctor, but this should work
>     // for "fast" construction.
>     m_value.store!(msync.raw)( 42 );
> 
>     uint value() // getter
>     {
>         return m_value.load!(msync.seq);
>     }
> 
>     uint value(uint newbie) // setter
>     {
>         m_value.store!(msync.seq)( newbie );
>         return newbie;
>     }
> 
> Please note that Atomic currently only supports x86, but if there's a demand for it then I may add support for other architectures.  If this happens, it will probably be under Posix (and not Win32), since I'm not entirely sure about out-of-the-box assembler support with DMD/Win32.
> 
>> When I was porting Phobos to work on ARM-WinCE, it was very helpful to be able to discard a module without breaking other parts of the lib, namely in the case of that module requiring a broken language feature or inline assembly (inline asm is not currently available with arm-wince-pe-gdc, and I don't feel like learning ARM asm yet).  That said, Atomic looks like it will be very broken on ARM in its current state.  I also benchmarked synchronized reads vs atomic reads and yeah, synchronized was much slower (I picked "whatever makes it compile" values for msync).  So I'll probably implement a version using only synchronization and a version that uses Atomic instead whenever possible.
> 
> 
> See above :-)  Atomic won't work on ARM without additional code.
> 
> By the way, it may also eventually be necessary to add a hardware instruction for ordering load operations on x86, since I'm becoming convinced that load reordering is actually allowed by the IA-32 spec (and it may actually be done on some AMD CPUs).  I've been resisting this until now because it will slow down synchronized loads substantially for what may be only a small portion of the x86 hardware in production.  So if you (or anyone) decide to use Atomic as-is and see weird behavior with atomicLoad using msync.acq or msync.hlb, please let me know.
> 
> 
> Sean

Cool, thanks for the info and this handy low-overhead threading tool.

I didn't have any problems with msync.acq or msync.hlb so far, but when loading using msync.seq I get a Win32 Exception.  I created a ticket about this.
February 21, 2007
kris wrote:
> Chad J wrote:
> 
>> When I was porting Phobos to work on ARM-WinCE, it was very helpful to be able to discard a module without breaking other parts of the lib, namely in the case of that module requiring a broken language feature or inline assembly 
> 
> 
> If the underlying OS APIs are present, then the upcoming tango.locks ought to work on WinCE. I'd imagine this to be your best bet, or to go with synchronized instead :)
> 
> Which approach you choose is ultimately down to the manner in which you need to share the entity.

Alright.

I'm starting to think it would be handy if modules that only work on some platforms (like Atomic and possibly Locks) would expose a const bool variable that is set to true if the module is supported on the hardware, and false if it isn't.  That way I could version different blocks of code by that, rather than trying to guess what will compile on the different platforms.
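
One way this idea could look, as a sketch only: a compile-time boolean that client code tests with `static if`. The `atomicSupported` flag is hypothetical and is defined locally here for illustration; neither Tango nor druntime is claimed to export it. The atomic branch assumes present-day D's core.atomic.

```d
// Hypothetical capability flag a platform-specific module could export.
version (X86)
    enum bool atomicSupported = true;
else version (X86_64)
    enum bool atomicSupported = true;
else
    enum bool atomicSupported = false;

static if (atomicSupported)
{
    import core.atomic : atomicLoad, atomicStore;

    shared uint m_value = 42;

    uint value() { return atomicLoad(m_value); }
    uint value(uint newbie) { atomicStore(m_value, newbie); return newbie; }
}
else
{
    // Portable fallback built only on the synchronized statement.
    __gshared uint m_value = 42;
    __gshared Object valueLock;

    shared static this() { valueLock = new Object; }

    uint value() { synchronized (valueLock) return m_value; }
    uint value(uint newbie)
    {
        synchronized (valueLock)
            m_value = newbie;
        return newbie;
    }
}

void main()
{
    assert(value() == 42);
    value(7);
    assert(value() == 7);
}
```

Callers see the same getter and setter on every platform; only the branch chosen at compile time differs.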
February 21, 2007
Chad J wrote:
> kris wrote:
> 
>> Chad J wrote:
>>
>>> When I was porting Phobos to work on ARM-WinCE, it was very helpful to be able to discard a module without breaking other parts of the lib, namely in the case of that module requiring a broken language feature or inline assembly 
>>
>>
>>
>> If the underlying OS APIs are present, then the upcoming tango.locks ought to work on WinCE. I'd imagine this to be your best bet, or to go with synchronized instead :)
>>
>> Which approach you choose is ultimately down to the manner in which you need to share the entity.
> 
> 
> Alright.
> 
> I'm starting to think it would be handy if modules that only work on some platforms (like Atomic and possibly Locks) would expose a const bool variable that is set to true if the module is supported on the hardware, and false if it isn't.  That way I could version different blocks of code by that, rather than trying to guess what will compile on the different platforms.

That's a very good point. Some kind of mechanism would be very convenient.
February 21, 2007
Chad J wrote:
> kris wrote:
> 
>> Chad J wrote:
>>
>>> When I was porting Phobos to work on ARM-WinCE, it was very helpful to be able to discard a module without breaking other parts of the lib, namely in the case of that module requiring a broken language feature or inline assembly 
>>
>>
>>
>> If the underlying OS APIs are present, then the upcoming tango.locks ought to work on WinCE. I'd imagine this to be your best bet, or to go with synchronized instead :)
>>
>> Which approach you choose is ultimately down to the manner in which you need to share the entity.
> 
> 
> Alright.
> 
> I'm starting to think it would be handy if modules that only work on some platforms (like Atomic and possibly Locks) would expose a const bool variable that is set to true if the module is supported on the hardware, and false if it isn't.  That way I could version different blocks of code by that, rather than trying to guess what will compile on the different platforms.

If you're serious about having Tango run on WinCE, come and chat with us on the IRC channel?

- Kris