August 04, 2008 Multicores and Publication Safety
"What memory fences are useful for on multiprocessors; and why you should care, even if you're not an assembly programmer."
http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
August 05, 2008 Re: Multicores and Publication Safety
Posted in reply to Walter Bright | Walter Bright wrote:
> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
There seems to be a cadre of reddit readers who immediately vote down anything on D. That can be counteracted if the community votes them up!
August 05, 2008 Re: Multicores and Publication Safety
Posted in reply to Walter Bright | "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g7855a$2sd3$1@digitalmars.com...
> "What memory fences are useful for on multiprocessors; and why you should care, even if you're not an assembly programmer."
>
> http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
>
> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32.
http://www.intel.com/products/processor/manuals/318147.pdf
The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
August 05, 2008 Re: Multicores and Publication Safety
Posted in reply to Jb | Jb wrote:
> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g7855a$2sd3$1@digitalmars.com...
>> "What memory fences are useful for on multiprocessors; and why you should care, even if you're not an assembly programmer."
>>
>> http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
>>
>> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
>
> None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32.
>
> http://www.intel.com/products/processor/manuals/318147.pdf
>
> The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
>
Pay very close attention to sections 2.3 and 2.4 of that document.
August 05, 2008 Re: Multicores and Publication Safety
Posted in reply to Jb | Jb wrote:
> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g7855a$2sd3$1@digitalmars.com...
>> "What memory fences are useful for on multiprocessors; and why you should care, even if you're not an assembly programmer."
>>
>> http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
>>
>> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
>
> None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32.
>
> http://www.intel.com/products/processor/manuals/318147.pdf
>
> The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
Not true. The actual behavior of IA-32 processors has been hotly debated, but it's been established that at least certain AMD processors may reorder loads. Also, even under the PCsc model it is completely legal to "hoist" loads above stores, or equivalently, to "sink" stores below loads.
In short, unless you've *really* done your homework I suggest being very careful with respect to lock-free programming, i.e. always perform fully sequenced operations just to be safe. Tango has had a module providing such operations from the start, and it looks like Phobos2 may get one fairly soon as well.
Sean
August 05, 2008 Re: Multicores and Publication Safety
Posted in reply to Brad Roberts | Brad Roberts wrote:
> Jb wrote:
>> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g7855a$2sd3$1@digitalmars.com...
>>> "What memory fences are useful for on multiprocessors; and why you should care, even if you're not an assembly programmer."
>>>
>>> http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
>>>
>>> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
>> None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32.
>>
>> http://www.intel.com/products/processor/manuals/318147.pdf
>>
>> The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
>>
>
> Pay very close attention to sections 2.3 and 2.4 of that document.
2.4 is the most interesting aspect of PC. It means that you can run into situations like this:
// Thread A
x = 1;

// Thread B
if( x == 1 )
    y = 1;

// Thread C
if( y == 1 )
    assert( x == 1 ); // may fail
Alex Terekhov came up with a sneaky solution for this based on how the IA-32 spec says CAS is currently implemented:
// Thread A
x = 1;

// Thread B
t = CAS( x, 0, 0 );
if( t == 1 )
    y = 1;

// Thread C
if( y == 1 )
    assert( x == 1 ); // true
In essence, Intel currently implements CAS by either storing the new value /or/ re-storing the old value based on the result of the comparison, and because all stores from a single processor are ordered, Thread C is therefore guaranteed to see the store to x before the store to y.
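For anyone who wants to experiment with this, here is a minimal C++ sketch of the same three threads. It is not code from this thread: it assumes C++11 `std::atomic`, the names `cas_read` and `run_once` are hypothetical, and `CAS( x, 0, 0 )` is modeled with `compare_exchange_strong`. Note that in C++ the guarantee comes from the language's default sequentially consistent ordering rather than from the Intel implementation detail described above; on x86, compilers typically emit a locked `cmpxchg` for this call.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// x and y mirror the variables in the post; everything else is hypothetical.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Models t = CAS( x, 0, 0 ): the swap writes 0 only if the value is
// already 0, so the stored value never changes. When the comparison
// fails, compare_exchange_strong copies the observed value into `expected`.
int cas_read(std::atomic<int>& v) {
    int expected = 0;
    v.compare_exchange_strong(expected, 0);
    return expected;  // the value that was observed
}

// Runs threads A, B and C once, then waits for all of them.
void run_once() {
    x.store(0);
    y.store(0);
    std::thread a([] { x.store(1); });                        // Thread A
    std::thread b([] { if (cas_read(x) == 1) y.store(1); });  // Thread B
    std::thread c([] {                                        // Thread C
        if (y.load() == 1)
            assert(x.load() == 1);  // holds under seq_cst atomics
    });
    a.join();
    b.join();
    c.join();
}
```

Calling `run_once()` in a loop exercises the interleavings; the assertion in Thread C should never fire, because every operation here uses the default sequentially consistent ordering.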
As cool as I find the above solution, however, I do hope that this helps to demonstrate the complexity of lock-free programming. It also shows just how complex analysis of this stuff is. Even with the full source code available it would take some doing for a compiler to recognize a problem similar to the above.
Sean
August 05, 2008 Re: Multicores and Publication Safety
Posted in reply to Sean Kelly | Sean Kelly wrote:
> Brad Roberts wrote:
>> Jb wrote:
>>> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g7855a$2sd3$1@digitalmars.com...
>>>> "What memory fences are useful for on multiprocessors; and why you should care, even if you're not an assembly programmer."
>>>>
>>>> http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
>>>>
>>>> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
>>> None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32.
>>>
>>> http://www.intel.com/products/processor/manuals/318147.pdf
>>>
>>> The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
>>>
>>
>> Pay very close attention to sections 2.3 and 2.4 of that document.
>
> 2.4 is the most interesting aspect of PC. It means that you can run into situations like this:
>
> // Thread A
> x = 1;
>
> // Thread B
> if( x == 1 )
>     y = 1;
>
> // Thread C
> if( y == 1 )
>     assert( x == 1 ); // may fail
>
> Alex Terekhov came up with a sneaky solution for this based on how the IA-32 spec says CAS is currently implemented:
>
> // Thread A
> x = 1;
>
> // Thread B
> t = CAS( x, 0, 0 );
> if( t == 1 )
>     y = 1;
>
> // Thread C
> if( y == 1 )
>     assert( x == 1 ); // true
>
> In essence, Intel currently implements CAS by either storing the new value /or/ re-storing the old value based on the result of the comparison, and because all stores from a single processor are ordered, Thread C is therefore guaranteed to see the store to x before the store to y.
>
> As cool as I find the above solution, however, I do hope that this helps to demonstrate the complexity of lock-free programming. It also shows just how complex analysis of this stuff is. Even with the full source code available it would take some doing for a compiler to recognize a problem similar to the above.
>
>
> Sean
For that example, section 2.8 kicks in: locked instructions (such as CAS) help constrain ordering.
So, in summary: reordering is real, even on x86-class hardware. To make life even more interesting, there are also various CPU bugs that make things even worse. See this thread (unconfirmed info, but interesting nonetheless) on the linux-kernel mailing list:
http://www.ussg.iu.edu/hypermail/linux/kernel/0808.0/0882.html
Whee,
Brad
August 05, 2008 Re: Multicores and Publication Safety
Posted in reply to Brad Roberts | "Brad Roberts" <braddr@puremagic.com> wrote in message news:mailman.10.1217908384.1156.digitalmars-d@puremagic.com...
> Jb wrote:
>> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g7855a$2sd3$1@digitalmars.com...
>>> "What memory fences are useful for on multiprocessors; and why you should care, even if you're not an assembly programmer."
>>>
>>> http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
>>>
>>> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
>>
>> None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32.
>>
>> http://www.intel.com/products/processor/manuals/318147.pdf
>>
>> The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
>>
>
> Pay very close attention to sections 2.3 and 2.4 of that document.
They don't override 2.1, they complement it. I.e.:
*Stores cannot be reordered with other stores*
*Loads cannot be reordered with other loads*
x = 1;
ready = 1;
This happens in order whether or not a load is reordered with those stores. You can't have a situation where a processor sees the write to "ready" before it sees the write to "x".
What Bartosz said, "writes to memory can be completed out of order and", is not true on x86.
What 2.3 is saying is that a later load could be reordered before either store, but it still can't be reordered before the store to 'x' and after the store to 'ready', because the order of those stores cannot be changed. If it gets reordered before the store to 'x' it implicitly gets reordered before the store to 'ready'. That's the whole point of the ordering of stores / loads being enforced.
Regarding 2.4: what this is saying is that there may be a delay between processors seeing each other's stores, not that they can be seen out of order. Processor 1 may see its own write to 'x' before processor 2 does, but processor 2 still won't see the write to 'ready' before the write to 'x'.
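The `x` / `ready` publication pattern argued over here can be written so that it does not depend on any one processor's rules. What follows is a minimal sketch, assuming C++11 atomics rather than anything posted in this thread; `publisher`, `consumer` and `publish_and_read` are hypothetical names. The release store and acquire load state the required order explicitly; on x86 they typically compile to plain moves, which is consistent with the store-store and load-load guarantees quoted above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int x = 0;                  // plain payload, as in the post
std::atomic<int> ready{0};  // publication flag

void publisher() {
    x = 1;                                      // write the payload first
    ready.store(1, std::memory_order_release);  // then publish it
}

int consumer() {
    // Spin until the flag is visible. The acquire load pairs with the
    // release store, so the write to x is visible once ready == 1.
    while (ready.load(std::memory_order_acquire) != 1)
        std::this_thread::yield();
    return x;
}

// Hypothetical driver: runs one publisher and one consumer.
int publish_and_read() {
    x = 0;
    ready.store(0);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(publisher);
    p.join();
    c.join();
    assert(seen == 1);  // the consumer never sees ready without x
    return seen;
}
```

If both memory orders are weakened to `std::memory_order_relaxed`, the language no longer promises this result, even though x86 hardware may happen to behave the same way.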
August 05, 2008 Re: Multicores and Publication Safety
Posted in reply to Sean Kelly | "Sean Kelly" <sean@invisibleduck.org> wrote in message news:g78man$17sb$1@digitalmars.com...
> Jb wrote:
>> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g7855a$2sd3$1@digitalmars.com...
>>> "What memory fences are useful for on multiprocessors; and why you should care, even if you're not an assembly programmer."
>>>
>>> http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
>>>
>>> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
>>
>> None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32.
>>
>> http://www.intel.com/products/processor/manuals/318147.pdf
>>
>> The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
>
> Not true. The actual behavior of IA-32 processors has been hotly debated, but it's been established that at least certain AMD processors may reorder loads.
That's news to me.
> Also, even under the PCsc model it is completely legal to "hoist" loads above stores, or equivalently, to "sink" stores below loads.
Yes, but as long as stores are not reordered with other stores, and loads are not reordered with other loads, that kind of reordering won't result in the situation Bartosz described.
August 05, 2008 Re: Multicores and Publication Safety
Posted in reply to Jb | Jb wrote:
> What Bartosz said.. "writes to memory can be completed out of order and"
>
> Is not true on x86.
It's risky to write such code, however, because:
1. someone else may try to port it to another processor, and then be mystified as to why it breaks
2. Intel may change this behavior on future x86's, which means your code will break years from now
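One way to guard against both risks is to spell out the ordering with explicit fences, so the compiler emits whatever each target needs. This is a minimal sketch under stated assumptions, not code from the article: it uses C++11 `std::atomic_thread_fence`, and `payload`, `published`, `writer`, `reader` and `run_fenced` are hypothetical names. On x86 these two fences typically cost no instructions, while on more weakly ordered processors they become real barriers.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                     // data being published
std::atomic<bool> published{false};  // flag, accessed with relaxed ordering

void writer() {
    payload = 42;
    // Release fence: the write to payload may not be reordered past
    // the flag store that follows.
    std::atomic_thread_fence(std::memory_order_release);
    published.store(true, std::memory_order_relaxed);
}

int reader() {
    while (!published.load(std::memory_order_relaxed))
        std::this_thread::yield();
    // Acquire fence: the read of payload may not be reordered before
    // the flag load that precedes it.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Hypothetical driver: runs one writer and one reader.
int run_fenced() {
    payload = 0;
    published.store(false);
    int seen = -1;
    std::thread r([&seen] { seen = reader(); });
    std::thread w(writer);
    w.join();
    r.join();
    assert(seen == 42);
    return seen;
}
```

The design choice is to put the ordering requirement in the source rather than in the programmer's knowledge of one processor, so a port to another architecture keeps the same guarantee.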
Copyright © 1999-2021 by the D Language Foundation