February 01, 2010
On 1-feb-10, at 23:43, Michel Fortin wrote:

> On 2010-02-01 at 17:26, Fawzi Mohamed wrote:
>
>> On 1-feb-10, at 23:10, Andrei Alexandrescu wrote:
>>
>>> Fawzi Mohamed wrote:
>>>> I gave a quick read to the claim that reading needs a lock
>>>> because otherwise the value might never be updated; as far as I
>>>> know, this is wrong.
>>>> Yes, the view of one thread might lag behind that of another, but
>>>> not indefinitely so.
>>>> The main reason to put the sync is to ensure that one sees a
>>>> consistent view of the value.
>>>> If a value is always updated in an atomic way then the sync is
>>>> not needed.
>>>
>>> It's a classic that if you read (without handshake) a value in a loop thinking you're spin-waiting, various compiler and processor optimizations will cache the value and spin forever. The synchronization in there is needed for the handshake, and can be optimized by the compiler when only a simple barrier is needed.
>>
>> you don't need a barrier, you need a volatile statement to avoid compiler optimizations
>
> Is that something the compiler could do in an optimization pass, I mean determining when a barrier isn't necessary and volatile is enough? For instance when periodically checking a shared variable in a loop?

I think there is some confusion about what barriers do: they just introduce a partial ordering. In general a compiler cannot automatically decide to remove them.

The handshake Andrei was talking about really isn't the point here.
The thing is that the compiler should avoid some optimizations, in the
sense that it should always load from memory (and not store it in a
register and read it from there).
A barrier alone has nothing to do with that, but a read barrier can
ensure that what is visible after the read is for sure the status
after a corresponding write barrier.
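A minimal C++ sketch of this point (the list is discussing D, so the names and the use of std::atomic here are illustrative, not the book's code): the loop must perform a fresh load from memory on every iteration, and the acquire load pairs with the writer's release store exactly as a read barrier pairs with a write barrier.

```cpp
#include <atomic>
#include <thread>

// Illustrative sketch (C++ analogue of the D discussion): the consumer
// spin-waits on `ready`. If `ready` were a plain bool, the compiler could
// legally hoist the load out of the loop and spin forever; std::atomic
// forces a real load from memory on every iteration, and the
// acquire/release pair supplies the read/write-barrier ordering.
std::atomic<bool> ready{false};
int payload = 0;

int spin_and_read() {
    while (!ready.load(std::memory_order_acquire)) {
        // fresh load from memory on every pass
    }
    return payload;  // safe: acquire pairs with the producer's release
}

int demo() {
    std::thread producer([] {
        payload = 42;                                  // write the data...
        ready.store(true, std::memory_order_release);  // ...then publish
    });
    int v = spin_and_read();
    producer.join();
    return v;
}
```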

Fawzi
>
> -- 
> Michel Fortin
> michel.fortin at michelf.com
> http://michelf.com/
>
>
>
> _______________________________________________
> dmd-concurrency mailing list
> dmd-concurrency at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/dmd-concurrency

February 01, 2010
On Mon, 01 Feb 2010 17:11:57 -0500, Andrei Alexandrescu <andrei at erdani.com> wrote:

> Robert Jacques wrote:
>> Scope needs to prevent escaping to any scope, which most importantly
>> includes other scope variables. i.e.:
>>  shared int[] x;
>> int[] y;
>> swap(x,y);
>>  Causes an unshared array to become shared.
>
> That won't typecheck for other reasons.
>
> There might be an issue, but the above doesn't describe it.

Okay. The problem I see is that a local version of T and a shared version of T might be swapped, which would cause local data to escape its thread. The problem is that scope doesn't know anything about the scope it contains, so it can't say whether assignment between the two is valid. It's similar to const: const doesn't know if the data is mutable or immutable, so it takes the conservative view (no mutation). Whereas scope doesn't know if something is local or shared, so it has to take the conservative view (no assignment/escapes).

February 01, 2010
Fawzi Mohamed wrote:
> you don't need a barrier, you need a volatile statement to avoid compiler optimizations

That depends on the platform. On contemporary Intel machines I agree there's no need for a barrier.

Andrei

February 02, 2010
On 2-feb-10, at 01:34, Andrei Alexandrescu wrote:

> Fawzi Mohamed wrote:
>> you don't need a barrier, you need a volatile statement to avoid compiler optimizations
>
> That depends on the platform. On contemporary Intel machines I agree there's no need for a barrier.

as I explained in another post (which I wrote several hours ago but
which unfortunately was only just sent; I closed the computer too
fast...), I think that you don't really understand what barriers do.
Barriers introduce a partial ordering; they don't necessarily
guarantee that the value in a local cache is immediately "synchronized".
The important thing (in the case of a single value that is accessed
atomically, as in the example we are discussing) is that it is really
read from memory (even if through caches) and not from a register.
Normally barriers are "global", but Itanium began toying with more
local barriers (which don't force the update of the whole cache); even
that does not change the basic idea.
I am not aware of any hardware needing the kind of handshake you
describe.
Even Alpha, which is the only processor that needs the extremely
annoying dependent load barriers (I really hope those never come
back), does not need it.

Fawzi
>
> Andrei
>

February 01, 2010
Fawzi Mohamed wrote:
> 
> On 2-feb-10, at 01:34, Andrei Alexandrescu wrote:
> 
>> Fawzi Mohamed wrote:
>>> you don't need a barrier, you need a volatile statement to avoid compiler optimizations
>>
>> That depends on the platform. On contemporary Intel machines I agree there's no need for a barrier.
> 
> as I explained in another post (which I wrote several hours ago but which unfortunately was only just sent; I closed the computer too fast...), I think that you don't really understand what barriers do.

It's possible I am not expressing myself clearly. My understanding is described at length in this article:

http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf


Andrei
February 02, 2010
On 2-feb-10, at 04:59, Andrei Alexandrescu wrote:

> Fawzi Mohamed wrote:
>> On 2-feb-10, at 01:34, Andrei Alexandrescu wrote:
>>> Fawzi Mohamed wrote:
>>>> you don't need a barrier, you need a volatile statement to avoid compiler optimizations
>>>
>>> That depends on the platform. On contemporary Intel machines I agree there's no need for a barrier.
>> as I explained in another post (which I wrote several hours ago but which unfortunately was only just sent; I closed the computer too fast...), I think that you don't really understand what barriers do.
>
> It's possible I am not expressing myself clearly. My understanding is described at length in this article:
>
> http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf

yes, in that case you need a barrier because you expect that some other
memory (the one that holds the singleton) is initialized *if* you are
able to read the value of the pointer pointing to it.
Thus you need to introduce a partial ordering that guarantees that you
will never see the pointer to the memory before seeing the
initialized memory.
You need a partial ordering on the memory writes/reads.
This is not connected (in general; on some hardware it may be) to
reading an up-to-date value from memory.
What breaks access to a single atomic value is not the absence of a
barrier, but putting it in a register and not bothering to load it
each time (something that a compiler might do on some occasions as an
optimization).
I hope the issue is clearer now.
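The pointer-publication ordering described above can be sketched as follows (a C++ illustration with made-up names, not code from the thread): the writer initializes the object before publishing the pointer with a release store, and the reader's acquire load guarantees it can never observe the pointer without also observing the initialized contents. The barrier orders the accesses; it does not make the pointer visible "immediately".

```cpp
#include <atomic>
#include <thread>

// Sketch of safe publication: initialize first, publish second.
struct Singleton { int value; };

std::atomic<Singleton*> instance{nullptr};

void publish() {
    Singleton* p = new Singleton;
    p->value = 99;                                 // initialize first...
    instance.store(p, std::memory_order_release);  // ...then publish
}

int demo() {
    std::thread writer(publish);
    Singleton* p;
    while ((p = instance.load(std::memory_order_acquire)) == nullptr) {
        // spin until the pointer is published
    }
    int v = p->value;  // guaranteed 99: acquire pairs with release
    writer.join();
    delete p;
    return v;
}
```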

Fawzi
February 01, 2010
Fawzi Mohamed wrote:
> 
> On 2-feb-10, at 04:59, Andrei Alexandrescu wrote:
> 
>> Fawzi Mohamed wrote:
>>> On 2-feb-10, at 01:34, Andrei Alexandrescu wrote:
>>>> Fawzi Mohamed wrote:
>>>>> you don't need a barrier, you need a volatile statement to avoid compiler optimizations
>>>>
>>>> That depends on the platform. On contemporary Intel machines I agree there's no need for a barrier.
>>> as I explained in another post (which I wrote several hours ago but which unfortunately was only just sent; I closed the computer too fast...), I think that you don't really understand what barriers do.
>>
>> It's possible I am not expressing myself clearly. My understanding is described at length in this article:
>>
>> http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
> 
> yes, in that case you need a barrier because you expect that some other
> memory (the one that holds the singleton) is initialized *if* you are
> able to read the value of the pointer pointing to it.
> Thus you need to introduce a partial ordering that guarantees that you
> will never see the pointer to the memory before seeing the
> initialized memory.
> You need a partial ordering on the memory writes/reads.
> This is not connected (in general; on some hardware it may be) to
> reading an up-to-date value from memory.
> What breaks access to a single atomic value is not the absence of a
> barrier, but putting it in a register and not bothering to load it each
> time (something that a compiler might do on some occasions as an
> optimization).
> I hope the issue is clearer now.

I suggest we resume and start with describing what was unclear, then specifying the steps we need to take to clarify things better in the draft. Thanks!

Andrei

February 02, 2010
On 2-feb-10, at 05:12, Fawzi Mohamed wrote:

>
> On 2-feb-10, at 04:59, Andrei Alexandrescu wrote:
>
>> Fawzi Mohamed wrote:
>>> On 2-feb-10, at 01:34, Andrei Alexandrescu wrote:
>>>> Fawzi Mohamed wrote:
>>>>> you don't need a barrier, you need a volatile statement to avoid compiler optimizations
>>>>
>>>> That depends on the platform. On contemporary Intel machines I agree there's no need for a barrier.
>>> as I explained in another post (which I wrote several hours ago but which unfortunately was only just sent; I closed the computer too fast...), I think that you don't really understand what barriers do.
>>
>> It's possible I am not expressing myself clearly. My understanding is described at length in this article:
>>
>> http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
>
> yes, in that case you need a barrier because you expect that some
> other memory (the one that holds the singleton) is initialized *if*
> you are able to read the value of the pointer pointing to it.
> Thus you need to introduce a partial ordering that guarantees that
> you will never see the pointer to the memory before seeing
> the initialized memory.
> You need a partial ordering on the memory writes/reads.
> This is not connected (in general; on some hardware it may be) to
> reading an up-to-date value from memory.
> What breaks access to a single atomic value is not the absence of
> a barrier, but putting it in a register and not bothering to load it
> each time (something that a compiler might do on some occasions as
> an optimization).
> I hope the issue is clearer now.

By the way, the article you linked is correct and points to a real problem with naive double-checked locking patterns, but it is not relevant in the current context. The details of concurrency are difficult; I also get confused at times :).

Fawzi
February 02, 2010
> I suggest we resume and start with describing what was unclear, then specifying the steps we need to take to clarify things better in the draft. Thanks!

p 26-27
"All operations on _balance are now protected by acquiring _guard. It
may seem
there is no need to protect balance with _guard because a double can
be read atomi-
cally, but protection must be there for subtle reasons that shall come
forth later. In brief,
due to today?s aggressive optimizing compilers and relaxed memory
models, all access
to shared data must entail some handshake between threads; no
handshake means a
thread could call obj.balance repeatedly and always get a cached copy
form some pro-
cessor?s cache that never, ever gets updated, in spite of other
threads? frenetic use of
deposit and withdraw. ( This is one of the ways in which modern
multithreading de?es
intuition and confuses programmers versed in classic multithreading.)"

From the previous discussion it should be clear that, as written, this
is wrong.
Indeed it would be better to have a barrier there if the update were
not atomic (in the general case), but there is no need for one in this
specific case, as access is atomic.
It is wrong that a processor's cache might never be updated; it will
be updated at some point, and a barrier (in general) will not force
the update to be performed immediately.
A problem that can come up is that the variable is promoted to a
register in a loop and is never reloaded from memory.
To avoid this one can use volatile. This issue is a priori
disconnected from the presence of barriers (well, one would hope that
a correct compiler disables "upgrading to registers" for variables in
a locked section crossing its boundary, but a priori it doesn't have
to be like that, and especially with just memory barriers, I would not
trust current compilers to always do the correct thing without a
"volatile").

Fawzi

February 01, 2010
On Feb 1, 2010, at 8:41 PM, Fawzi Mohamed wrote:

>> I suggest we resume and start with describing what was unclear, then specifying the steps we need to take to clarify things better in the draft. Thanks!
> 
> p 26-27
> "All operations on _balance are now protected by acquiring _guard. It may seem
> there is no need to protect balance with _guard because a double can be read atomi-
> cally, but protection must be there for subtle reasons that shall come forth later. In brief,
> due to today?s aggressive optimizing compilers and relaxed memory models, all access
> to shared data must entail some handshake between threads; no handshake means a
> thread could call obj.balance repeatedly and always get a cached copy form some pro-
> cessor?s cache that never, ever gets updated, in spite of other threads? frenetic use of
> deposit and withdraw. ( This is one of the ways in which modern multithreading de?es
> intuition and confuses programmers versed in classic multithreading.)"
> 
> From the previous discussion it should be clear that, as written, this is wrong.
> Indeed it would be better to have a barrier there if the update were not atomic (in the general case), but there is no need for one in this specific case, as access is atomic.
> It is wrong that a processor's cache might never be updated; it will be updated at some point, and a barrier (in general) will not force the update to be performed immediately.

Yup.  It's true that a LOCK prefix triggered a bus lock once upon a time, but it doesn't escape the cache any longer.  This would have been as close to a handshake as I can come up with.

> A problem that can come up is that the variable is promoted to a register in a loop and is never reloaded from memory.
> To avoid this one can use volatile. This issue is a priori disconnected from the presence of barriers (well, one would hope that a correct compiler disables "upgrading to registers" for variables in a locked section crossing its boundary, but a priori it doesn't have to be like that, and especially with just memory barriers, I would not trust current compilers to always do the correct thing without a "volatile").

Sadly, volatile is deprecated in D 2.0.  You pretty much have to use inline asm and rely on the fact that Walter has said D compilers should never optimize in or across asm code.
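The same asm-based trick can be sketched in C++ (hedged: this uses the GCC/Clang extended-asm extension, not standard C++, and is offered only as an analogue of the D workaround described above). An empty asm statement with a "memory" clobber acts as a compiler-only barrier: it forces the compiler to discard register-cached values and reload from memory, without emitting any hardware fence instruction.

```cpp
// GCC/Clang-only sketch of a compiler barrier, analogous to relying on
// "D compilers never optimize in or across asm code".
static inline void compiler_barrier() {
    asm volatile("" ::: "memory");  // no instructions; just a clobber
}

int demo() {
    int x = 1;
    compiler_barrier();  // compiler must re-read x from memory after this
    x += 1;
    compiler_barrier();  // and must not cache x across this point either
    return x;
}
```

Unlike a hardware fence, this constrains only the compiler, so on weakly ordered processors it is not a substitute for real read/write barriers.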