August 05, 2008
"Walter Bright" <newshound1@digitalmars.com> wrote in message news:g795mq$25jq$1@digitalmars.com...
> Jb wrote:
>> What Bartosz said... "writes to memory can be completed out of order and"
>>
>> Is not true on x86.
>
> It's risky to write such code, however, because:
>
> 1. someone else may try to port it to another processor, and then be mystified as to why it breaks

You can't design or write your code based on the idea that someone who doesn't know what they are doing will try to modify it later. And if they are unaware of memory ordering, they are likely unaware of alignment and atomicity, and probably don't understand the subtleties of synchronization, and a whole bunch of other issues.

I'm not saying every Joe Bloggs programmer should know about memory ordering and use it where they can to avoid more expensive synchronization primitives. But the compiler and stdlib, or multithreading libraries, should know about it. I don't think the compiler should be dumping memory fences all over the place on the assumption that they might be needed by the x86 processors of 2012.
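
As a rough sketch of what "the library knows about it" could look like, here is the publication pattern written with C++0x-style atomics. The names (`Payload`, `g_ready`, `publish`, `consume`, `publish_and_consume`) are made up for illustration and are not taken from Bartosz's article. The writer publishes with a release store and the reader picks it up with an acquire load. On x86 a compiler would typically emit plain moves for both, with no fence instruction; on a more weakly ordered processor the same source gets whatever barriers it needs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload and flag, used only for illustration.
struct Payload { int value = 0; };

Payload g_payload;                  // plain, non-atomic data
std::atomic<bool> g_ready{false};   // publication flag

// Writer: fill in the data, then publish it with a release store.
// The release store keeps the write to g_payload from moving after it.
void publish(int v) {
    g_payload.value = v;
    g_ready.store(true, std::memory_order_release);
}

// Reader: wait until the acquire load sees the flag, then read the data.
// The acquire load keeps the read of g_payload from moving before it.
int consume() {
    while (!g_ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return g_payload.value;
}

// Runs one reader and one writer and returns what the reader saw.
int publish_and_consume(int v) {
    g_ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread reader([&] { seen = consume(); });
    std::thread writer([v] { publish(v); });
    writer.join();
    reader.join();
    assert(seen == v);  // release/acquire guarantees the payload is visible
    return seen;
}
```

The point of the sketch is only that the ordering requirement lives in the source, and the code generator decides per target whether it costs anything.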


> 2. Intel may change this behavior on future x86's, which means your code will break years from now

I don't think they could, because I think a lot of code probably already relies on it. And I think it's likely that the new commitment to strong memory ordering from both AMD and Intel (both have PDFs regarding 64-bit that specify it) is mainly because they realize it is needed to help progress with multi-core.




August 05, 2008
Jb wrote:
> "Sean Kelly" <sean@invisibleduck.org> wrote in message news:g78man$17sb$1@digitalmars.com...
>> Jb wrote:
>>> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g7855a$2sd3$1@digitalmars.com...
>>>> "What memory fences are useful for on multiprocessors; and why you should care, even if you're not an assembly programmer."
>>>>
>>>> http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
>>>>
>>>> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
>>> None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32.
>>>
>>> http://www.intel.com/products/processor/manuals/318147.pdf
>>>
>>> The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
>> Not true.  The actual behavior of IA-32 processors has been hotly debated, but it's been established that at least certain AMD processors may reorder loads.
> 
> That's news to me.
> 
> 
>> Also, even under the PCsc model it is completely legal to "hoist" loads above stores, or equivalently, to "sink" stores below loads.
> 
> Yes, but as long as stores are not reordered with other stores, and loads not reordered with other loads, then that kind of re-ordering won't result in the situation Bartosz described.

True enough.  It's mostly an issue with creating mutexes and the like.
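
For the mutex case, a minimal sketch of the store/load pattern that Dekker-style locks depend on, again in C++0x-style atomics (`flag_a`, `flag_b` and `count_both_zero` are hypothetical names). Each thread raises its own flag and then looks at the other's. With sequentially consistent operations the two threads can never both read 0. With only release/acquire ordering that outcome is allowed, even on x86, because the load may be hoisted above the earlier store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> flag_a{0};
std::atomic<int> flag_b{0};

// Runs the store-then-load test `iterations` times and returns how often
// both threads saw the other's flag still at 0. With seq_cst operations
// this count must be 0.
int count_both_zero(int iterations) {
    assert(iterations > 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        flag_a.store(0, std::memory_order_seq_cst);
        flag_b.store(0, std::memory_order_seq_cst);
        int saw_a = -1;
        int saw_b = -1;
        std::thread t1([&] {
            flag_a.store(1, std::memory_order_seq_cst);     // raise my flag
            saw_b = flag_b.load(std::memory_order_seq_cst); // check the other
        });
        std::thread t2([&] {
            flag_b.store(1, std::memory_order_seq_cst);
            saw_a = flag_a.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();
        if (saw_a == 0 && saw_b == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

On x86 the seq_cst store is where the cost shows up: compilers typically emit an `xchg` or a `mov` followed by `mfence` for it, while the loads stay plain moves.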


Sean
August 05, 2008
Jb wrote:
> "Sean Kelly" <sean@invisibleduck.org> wrote in message news:g78man$17sb$1@digitalmars.com...
>> Jb wrote:
>>> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g7855a$2sd3$1@digitalmars.com...
>>>> "What memory fences are useful for on multiprocessors; and why you should care, even if you're not an assembly programmer."
>>>>
>>>> http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
>>>>
>>>> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
>>> None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32.
>>>
>>> http://www.intel.com/products/processor/manuals/318147.pdf
>>>
>>> The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
>> Not true.  The actual behavior of IA-32 processors has been hotly debated, but it's been established that at least certain AMD processors may reorder loads.
> 
> That's news to me.

I don't know that this was ever confirmed with anyone at AMD, but it did come up in the C++0x talks and I believe the Linux kernel accounts for it.


Sean
August 05, 2008
"Sean Kelly" <sean@invisibleduck.org> wrote in message news:g79ugv$mdd$1@digitalmars.com...
> Jb wrote:
>> "Sean Kelly" <sean@invisibleduck.org> wrote in message news:g78man$17sb$1@digitalmars.com...
>>> Jb wrote:
>>>> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g7855a$2sd3$1@digitalmars.com...
>>>>> "What memory fences are useful for on multiprocessors; and why you should care, even if you're not an assembly programmer."
>>>>>
>>>>> http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
>>>>>
>>>>> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
>>>> None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32.
>>>>
>>>> http://www.intel.com/products/processor/manuals/318147.pdf
>>>>
>>>> The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
>>> Not true.  The actual behavior of IA-32 processors has been hotly debated, but it's been established that at least certain AMD processors may reorder loads.
>>
>> That's news to me.
>
> I don't know that this was ever confirmed with anyone at AMD, but it did come up in the C++0x talks and I believe the Linux kernel accounts for it.

I did a bit of googling and it does seem older AMDs were less strongly ordered. It seems to be SSE/3DNow! non-temporal stores particularly. But it looks like they have gone for strong ordering with AMD64.

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf

From 7.2 : Multiprocessor Memory Ordering.

"Loads do not pass previous loads (loads are not re-ordered). Stores do not pass previous stores (stores are not re-ordered)"

Although, skim-reading more of chapter 7, it looks like they might do reordering behind the scenes, or "such that the appearance of in-order execution is maintained" as they say.

My guess is that strong ordering, or at least the appearance of it, is an important factor in multi-core CPUs scaling well.


August 05, 2008
Jb wrote:
> "Sean Kelly" <sean@invisibleduck.org> wrote in message news:g79ugv$mdd$1@digitalmars.com...
>> Jb wrote:
>>> "Sean Kelly" <sean@invisibleduck.org> wrote in message news:g78man$17sb$1@digitalmars.com...
>>>> Jb wrote:
>>>>> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g7855a$2sd3$1@digitalmars.com...
>>>>>> "What memory fences are useful for on multiprocessors; and why you should care, even if you're not an assembly programmer."
>>>>>>
>>>>>> http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
>>>>>>
>>>>>> http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
>>>>> None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32.
>>>>>
>>>>> http://www.intel.com/products/processor/manuals/318147.pdf
>>>>>
>>>>> The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
>>>> Not true.  The actual behavior of IA-32 processors has been hotly debated, but it's been established that at least certain AMD processors may reorder loads.
>>> That's news to me.
>> I don't know that this was ever confirmed with anyone at AMD, but it did come up in the C++0x talks and I believe the Linux kernel accounts for it.
> 
> I did a bit of googling and it does seem older AMDs were less strongly ordered. It seems to be SSE/3DNow! non-temporal stores particularly. But it looks like they have gone for strong ordering with AMD64.
> 
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
> 
> From 7.2 : Multiprocessor Memory Ordering.
> 
> "Loads do not pass previous loads (loads are not re-ordered). Stores do not pass previous stores
> (stores are not re-ordered)"
> 
> Although, skim-reading more of chapter 7, it looks like they might do reordering behind the scenes, or "such that the appearance of in-order execution is maintained" as they say.

At least AMD and Intel have figured out how to separate discussion of implementation issues from visible behavior. The original IA-32 spec was an absolute disaster in this respect. I'm also encouraged that the memory model has been both fully specified and strengthened to PCsc or better. The x86 has always been pretty easy to deal with and it's nice to see that this will continue to be true. I suppose my only question at this point is how the official memory barrier instructions apply to normal (non-SSE) instruction ordering. I don't suppose the recent specs say anything about this?

> My guess is that strong ordering, or at least the appearance of it, is an important factor in multi-core CPUs scaling well.

Yup.  And the Intel announcement makes the very good point that it's a huge factor in performance per watt as well.  Strengthening the memory model and shrinking the pipeline allows for a tremendous amount of logic hardware to simply be thrown away, which means smaller, cooler, more energy-efficient CPUs.  My big question now is how computers will be built in the coming years... will we have a few traditional (fast) cores plus a general-purpose parallel computing cluster?  I suppose I should read that Intel paper posted yesterday.


Sean
August 05, 2008
Sean Kelly wrote:
> My big question now is how computers will be built in the coming years... will we have a few traditional (fast) cores plus a general-purpose parallel computing cluster?
> 
> Sean

Interesting you should bring this up. I was just reading an article yesterday about the "Cell Broadband Engine" used in the Playstation 3.

It features one general-purpose 64-bit PowerPC chip (the "Power Processor Element") and eight co-processing cores (the "Synergistic Processing Units"), each with a 128-bit SIMD architecture.

So, at least from the perspective of IBM and Sony, the answer is "yes".

--benji
August 06, 2008
Jb Wrote:

> 
> "Walter Bright" <newshound1@digitalmars.com> wrote in message news:g795mq$25jq$1@digitalmars.com...
> > Jb wrote:
> >> What Bartosz said... "writes to memory can be completed out of order and"
> >>
> >> Is not true on x86.
> >
> > It's risky to write such code, however, because:
> >
> > 1. someone else may try to port it to another processor, and then be mystified as to why it breaks
> 
> You can't design or write your code based on the idea that someone who doesn't know what they are doing will try to modify it later. And if they are unaware of memory ordering, they are likely unaware of alignment and atomicity, and probably don't understand the subtleties of synchronization, and a whole bunch of other issues.
> 
> I'm not saying every Joe Bloggs programmer should know about memory ordering and use it where they can to avoid more expensive synchronization primitives. But the compiler and stdlib, or multithreading libraries, should know about it. I don't think the compiler should be dumping memory fences all over the place on the assumption that they might be needed by the x86 processors of 2012.

The model the compiler uses is to generate code "as if" fences were inserted everywhere. The compiler may, however, as part of optimization and generating code for a particular CPU, elide as many as it can.


> > 2. Intel may change this behavior on future x86's, which means your code will break years from now
> 
> I don't think they could, because I think a lot of code probably already relies on it. And I think it's likely that the new commitment to strong memory ordering from both AMD and Intel (both have PDFs regarding 64-bit that specify it) is mainly because they realize it is needed to help progress with multi-core.

I think that is because the current language technology is deficient. We aim to fix that with D :-)

August 06, 2008
"Walter Bright" <walter@nospammm-digitalmars.com> wrote in message news:g7b7h1$aeb$1@digitalmars.com...
>>
>> > 2. Intel may change this behavior on future x86's, which means your code will break years from now
>>
>> I don't think they could, because I think a lot of code probably already relies on it. And I think it's likely that the new commitment to strong memory ordering from both AMD and Intel (both have PDFs regarding 64-bit that specify it) is mainly because they realize it is needed to help progress with multi-core.
>
> I think that is because the current language technology is deficient. We aim to fix that with D :-)

FWIW I think you're right.

But a little more help from the hardware would be nice as well. I'd like to see "lock-free" (non-blocking) synchronization made a bit easier, something like a double CAS.
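
For reference, this is roughly what we get from a single-word CAS today, sketched in C++0x-style atomics. It is only the push half of a lock-free stack, and `Node`, `g_head`, `push` and `push_and_count` are made-up names. It does not show a double CAS; it is just the baseline that such an instruction would extend.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical singly linked node for a lock-free stack.
struct Node {
    int value;
    Node* next;
};

std::atomic<Node*> g_head{nullptr};

// Push with a single-word CAS loop. If another thread changed the head in
// the meantime, compare_exchange_weak reloads the current head into
// node->next and the loop retries.
void push(int value) {
    Node* node = new Node{value, g_head.load(std::memory_order_relaxed)};
    while (!g_head.compare_exchange_weak(node->next, node,
                                         std::memory_order_release,
                                         std::memory_order_relaxed)) {
    }
}

// Pushes `per_thread` values from each of `threads` threads, then detaches
// the whole list and counts and frees the nodes on the calling thread.
int push_and_count(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                push(i);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    int count = 0;
    Node* n = g_head.exchange(nullptr, std::memory_order_acquire);
    while (n != nullptr) {
        Node* next = n->next;
        delete n;
        n = next;
        ++count;
    }
    return count;
}
```

Push is the easy half. A concurrent pop with only a single-word CAS has to deal with the ABA problem, which is one of the places where a wider or two-location CAS would make life simpler.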

