November 15, 2012
On 14/11/2012 21:01, Sean Kelly wrote:
> On Nov 14, 2012, at 6:32 AM, Andrei Alexandrescu<SeeWebsiteForEmail@erdani.org>  wrote:
>>
>> This is a simplification of what should be going on. The core.atomic.{atomicLoad, atomicStore} functions must be intrinsics so the compiler generate sequentially consistent code with them (i.e. not perform certain reorderings). Then there are loads and stores with weaker consistency semantics (acquire, release, acquire/release, and consume).
>
> No.  These functions all contain volatile asm blocks.  If the compiler respected the "volatile" it would be enough.

It is sufficient for single-core machines and mostly correct for x86, but it isn't enough in general.

volatile isn't for concurrency; it's for memory-mapped I/O.
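What volatile is actually meant for can be sketched like this, using core.volatile (which was added to druntime after this discussion); the device address is purely hypothetical:

```d
// Sketch: what volatile is for - memory-mapped I/O, not concurrency.
import core.volatile : volatileLoad;

bool deviceReady()
{
    uint* statusReg = cast(uint*) 0x4000_0000; // hypothetical status register

    // volatileLoad forbids the compiler from caching the value in a
    // register or eliding the read, but it emits no fence and gives
    // no ordering guarantees between threads.
    return (volatileLoad(statusReg) & 1) != 0;
}
```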
November 15, 2012
On 15.11.2012 11:52, Manu wrote:
> On 15 November 2012 12:14, Jacob Carlborg <doob@me.com
> <mailto:doob@me.com>> wrote:
>
>     On 2012-11-15 10:22, Manu wrote:
>
>         Not to repeat my prev post... but in reply to Walter's take on
>         it, it
>         would be interesting if 'shared' just added implicit lock()/unlock()
>         methods to do the mutex acquisition and then remove the cast
>         requirement, but have the language runtime assert that the object is
>         locked whenever it is accessed (this guarantees the safety in a more
>         useful way, the casts are really annoying). I can't imagine a
>         simpler and
>         more immediately useful solution.
>
>
>     How about implementing a library function, something like this:
>
>     shared int i;
>
>     lock(i, (x) {
>          // operate on x
>     });
>
>     * "lock" will acquire a lock
>     * Cast away shared for "i"
>     * Call the delegate with the now plain "int"
>     * Release the lock
>
>     http://pastebin.com/tfQ12nJB
>
>
> Interesting concept. Nice idea, could certainly be useful, but it
> doesn't address the problem as directly as my suggestion.
> There are still many problem situations, for instance, any time a
> template is involved. The template doesn't know to do that internally,
> but under my proposal, you lock it prior to the workload, and then the
> template works as expected. Templates won't just break and fail whenever
> shared is involved, because assignments would be legal. They'll just
>         assert that the thing is locked at the time, which is the programmer's
> responsibility to ensure.


I managed to make a simple example that works with the current implementation:

http://dpaste.dzfl.pl/27b6df62

http://forum.dlang.org/thread/k7orpj$1tt5$1@digitalmars.com?page=4#post-k7s0gs:241h45:241:40digitalmars.com

It seems to me that solving this shared issue cannot be done purely at the compiler level; it will require runtime support. In fact, I don't see how it can be done properly without a way to say "this lock must be held when accessing this variable".

http://dpaste.dzfl.pl/edbd3e10
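For reference, a minimal sketch of the lock-and-cast helper described above might look like this (the name withLock and the single global mutex are illustrative simplifications, not the pastebin code; a real version would associate a lock per variable or group of variables):

```d
import core.sync.mutex : Mutex;

__gshared Mutex glock;

shared static this() { glock = new Mutex; }

// Acquire the lock, strip shared for the duration of the delegate,
// release on scope exit. Safe only while the lock is held and the
// plain reference does not escape the delegate.
void withLock(T)(ref shared T var, scope void delegate(ref T) dg)
{
    glock.lock();
    scope (exit) glock.unlock();
    dg(*cast(T*) &var);
}

shared int i;

void example()
{
    withLock(i, (ref int x) { x += 1; }); // operate on a plain int
}
```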
November 15, 2012
On Thursday, November 15, 2012 14:32:47 Manu wrote:
> On 15 November 2012 13:38, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> I don't really see the difference, other than, as you say, the cast is
> explicit.
> Obviously the possibility for the situation you describe exists, it's
> equally possible with the cast, except this way, the usage pattern is made
> more convenient, the user has a convenient way to control the locks and
> most importantly, it would work with templates.
> That said, this sounds like another perfect application of 'scope'. Perhaps
> only scope parameters can receive a locked, shared thing... that would
> mechanically protect you against escape.

You could make casting away const implicit too, which would make some code easier, but it would be a disaster, because the programmer wouldn't have a clue that it's happening in many cases, and the code would end up being very, very wrong. Implicitly casting away shared would put you in the same boat. _Maybe_ you could get away with it in very restricted circumstances where both pure and scope are being used, but then it becomes so restrictive that it's nearly useless anyway. And again, it would be hidden from the programmer, when this is something that _needs_ to be explicit. Having implicit locks happen on you could really screw with any code trying to do explicit locks, as would be needed anyway in all but the most basic cases.

> > 2. It's often the case that you need to lock/unlock groups of stuff together
> > such that locking specific variables is often of limited use and would just
> > introduce pointless extra locks when dealing with multiple variables. It would
> > also increase the risk of deadlocks, because you wouldn't have much - if any -
> > control over what order locks were acquired in when dealing with multiple
> > shared variables.
> 
> Your fear is precisely the state we're in now, except it puts all the work
> on the user to create and use the synchronisation objects, and also to
> assert that things are locked when they are accessed.
> I'm just suggesting some reasonably simple change that would make the
> situation more usable and safer immediately, short of waiting for all these
> fantastic designs being discussed having time to simmer and manifest.

Except that with your suggestion, you're introducing potential deadlocks which are outside of the programmer's control, and you're introducing extra overhead with those locks (both in terms of memory and in terms of the runtime costs). Not to mention, it would probably cause all kinds of issues for something like shared int* to have a mutex with it, because then its size is completely different from int*. It also would cause even worse problems when that shared int* was cast to int* (aside from the size issues), because all of the locking that was happening for the shared int* was invisible. If you want automatic locks, then use synchronized classes. That's what they're for.

Honestly, I really don't buy into the idea that it makes sense for shared to magically make multi-threaded code work without the programmer worrying about locks. Making it so that it's well-defined as to what's atomic is great for code that has any chance of being lock-free, but it's still up to the programmer to understand when locks are and aren't needed and how to use them correctly. I don't think that it can possibly work for it to be automatic. It's far too easy to introduce deadlocks, and it would only work in the simplest of cases anyway, meaning that the programmer needs to understand and properly solve the issues regardless. And if the programmer has to understand it all to get it right, why bother adding the extra overhead and deadlock potential caused by automatically locking anything? D provides some great synchronization primitives. People should use them.

I think that the only things that shared really needs to solve are:

1. Indicating to the compiler via the type system that the object is not thread-local. This properly segregates shared and unshared code and allows the compiler to take advantage of thread locality for optimizations and avoid optimizations with shared code that screw up threading (e.g. double-checked locking won't work if the compiler does certain optimizations).

2. Making it explicit and well-defined as part of the language which operations can be assumed to be atomic (even if that set of operations is very small, having it be well-defined is valuable).

3. Ensuring sequential consistency so that it's possible to do lock-free code when atomic operations permit it and so that there are fewer weird issues due to undefined behavior.
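As a sketch of what points 2 and 3 ask for, here is the kind of well-defined atomic interface involved, written against core.atomic as it exists today (the ordering enum has been renamed since 2012, so the names here are the modern ones):

```d
import core.atomic : atomicLoad, atomicStore, atomicOp, MemoryOrder;

shared int counter;
shared bool ready;

void producer()
{
    atomicOp!"+="(counter, 1);                  // atomic read-modify-write
    atomicStore!(MemoryOrder.rel)(ready, true); // release: prior writes visible
}

bool consumer()
{
    if (atomicLoad!(MemoryOrder.acq)(ready))    // acquire: pairs with the release
        return atomicLoad(counter) > 0;         // sequentially consistent by default
    return false;
}
```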

- Jonathan M Davis
November 15, 2012
On 14/11/2012 23:21, Andrei Alexandrescu wrote:
> On 11/14/12 12:00 PM, Sean Kelly wrote:
>> On Nov 14, 2012, at 6:16 AM, Andrei
>> Alexandrescu<SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> On 11/14/12 1:20 AM, Walter Bright wrote:
>>>> On 11/13/2012 11:37 PM, Jacob Carlborg wrote:
>>>>> If the compiler should/does not add memory barriers, then is there a
>>>>> reason for
>>>>> having it built into the language? Can a library solution be enough?
>>>>
>>>> Memory barriers can certainly be added using library functions.
>>>
>>> The compiler must understand the semantics of barriers such as e.g.
>>> it doesn't hoist code above an acquire barrier or below a release
>>> barrier.
>>
>> That was the point of the now deprecated "volatile" statement. I still
>> don't entirely understand why it was deprecated.
>
> Because it's better to associate volatility with data than with code.
>

Happy to see I'm not alone on that one.

Plus, volatile and sequential consistency are two different beasts. Volatile means no register promotion and no load/store reordering by the compiler. That is required, but not sufficient, for concurrency.
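The distinction shows up in the classic store-buffer litmus test. A sketch using today's core.atomic, where MemoryOrder.raw approximates what volatile alone gives you (program order preserved by the compiler, no CPU fence):

```d
import core.atomic : atomicLoad, atomicStore, MemoryOrder;

shared int x, y;

int thread1()
{
    atomicStore!(MemoryOrder.raw)(x, 1);
    return atomicLoad!(MemoryOrder.raw)(y);
}

int thread2()
{
    atomicStore!(MemoryOrder.raw)(y, 1);
    return atomicLoad!(MemoryOrder.raw)(x);
}

// Even on x86, store buffering allows BOTH calls to return 0 when run
// concurrently. Only sequentially consistent operations
// (MemoryOrder.seq, the default) rule that outcome out.
```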
November 15, 2012
On 15/11/2012 10:08, Manu wrote:
> The Nintendo Wii for instance, not an unpopular machine, only sold 130
> million units! Does not have synchronisation instructions in the
> architecture (insane, I know, but there it is. I've had to spend time
> working around this in the past).
> I'm sure it's not unique in this way.
>

Can you elaborate on that?
November 15, 2012
On 14/11/2012 22:09, Walter Bright wrote:
> On 11/14/2012 7:08 AM, Andrei Alexandrescu wrote:
>> On 11/14/12 6:39 AM, Alex Rønne Petersen wrote:
>>> On 14-11-2012 15:14, Andrei Alexandrescu wrote:
>>>> On 11/14/12 1:19 AM, Walter Bright wrote:
>>>>> On 11/13/2012 11:56 PM, Jonathan M Davis wrote:
>>>>>> Being able to have double-checked locking work would be valuable, and
>>>>>> having
>>>>>> memory barriers would reduce race condition weirdness when locks
>>>>>> aren't used
>>>>>> properly, so I think that it would be desirable to have memory
>>>>>> barriers.
>>>>>
>>>>> I'm not saying "memory barriers are bad". I'm saying that having the
>>>>> compiler blindly insert them for shared reads/writes is far from the
>>>>> right way to do it.
>>>>
>>>> Let's not be hasty. That works for Java and C#, and is allowed in C++.
>>>>
>>>> Andrei
>>>>
>>>>
>>>
>>> I need some clarification here: By memory barrier, do you mean x86's
>>> mfence, sfence, and lfence?
>>
>> Sorry, I was imprecise. We need to (a) define intrinsics for loading
>> and storing
>> data with high-level semantics (a short list: acquire, release,
>> acquire+release,
>> and sequentially-consistent) and THEN (b) implement the needed code
>> generation
>> appropriately for each architecture. Indeed on x86 there is little
>> need to
>> insert fence instructions, BUT there is a definite need for the
>> compiler to
>> prevent certain reorderings. That's why implementing shared data
>> operations
>> (whether implicit or explicit) as sheer library code is NOT possible.
>>
>>> Because as Walter said, inserting those blindly when unnecessary can
>>> lead to terrible performance because it practically murders
>>> pipelining.
>>
>> I think at this point we need to develop a better understanding of
>> what's going
>> on before issuing assessments.
>
> Yes. And also, I agree that having something typed as "shared" must
> prevent the compiler from reordering them. But that's separate from
> inserting memory barriers.
>

I'm sorry but that is dumb.

What is the point of ensuring that the compiler does not reorder loads/stores if the CPU is allowed to do so?
November 15, 2012
On 15.11.2012 05:32, Andrei Alexandrescu wrote:
> On 11/14/12 7:24 PM, Jonathan M Davis wrote:
>> On Thursday, November 15, 2012 03:51:13 Jonathan M Davis wrote:
>>> I have no idea what we want to do about this situation though. Regardless of what we do with memory barriers and the like, it has no impact on whether casts are required. And I think that introducing the shared equivalent of const would be a huge mistake, because then most code would end up being written using that attribute, meaning that all code essentially has to be treated as shared from the standpoint of compiler optimizations. It would almost be the same as making everything shared by default again. So, as far as I can see, casting is what we're forced to do.
>>
>> Actually, I think that what it comes down to is that shared works nicely when you have a type which is designed to be shared, and it encapsulates everything that it needs. Where it starts requiring casting is when you need to pass it to other stuff.
>>
>> - Jonathan M Davis
> 
> TDPL 13.14 explains that inside synchronized classes, top-level shared is automatically lifted.
> 
> Andrei

There are three problems I currently see with this:

 - It's not actually implemented
 - It's not safe because unshared references can be escaped or dragged in
 - Synchronized classes provide no way to avoid the automatic locking in certain methods, but often it is necessary to have more fine-grained control for efficiency reasons, or to avoid deadlocks
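For context, a TDPL-13.14-style synchronized class looks like the sketch below; whether the field access inside the method actually compiles without casts is exactly the unimplemented lifting noted in the first point:

```d
// Sketch of a synchronized class per TDPL 13.14. TDPL says that inside
// the methods the top-level shared qualifier of the fields is lifted;
// as noted above, compilers did not actually implement that lifting.
synchronized class Account
{
    private double balance;

    void deposit(double amount)
    {
        // The whole method body runs under the object's hidden mutex,
        // so this read-modify-write would be race-free across threads.
        balance += amount;
    }

    double read() { return balance; }
}
```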

November 15, 2012
On 15 November 2012 15:00, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> On Thursday, November 15, 2012 14:32:47 Manu wrote:
> > On 15 November 2012 13:38, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
>
> > I don't really see the difference, other than, as you say, the cast is
> > explicit.
> > Obviously the possibility for the situation you describe exists, it's
> > equally possible with the cast, except this way, the usage pattern is made
> > more convenient, the user has a convenient way to control the locks and
> > most importantly, it would work with templates.
> > That said, this sounds like another perfect application of 'scope'. Perhaps
> > only scope parameters can receive a locked, shared thing... that would
> > mechanically protect you against escape.
>
> You could make casting away const implicit too, which would make some code
> easier, but it would be a disaster, because the programmer wouldn't have a
> clue that it's happening in many cases, and the code would end up being very,
> very wrong. Implicitly casting away shared would put you in the same boat.


... no, they're not even the same thing. const things cannot be changed. Shared things are still mutable, and perfectly compatible with other non-shared mutable things; they just have some access control requirements.

> _Maybe_ you could get away with it in very restricted circumstances where
> both pure and scope are being used, but then it becomes so restrictive that
> it's nearly useless anyway. And again, it would be hidden from the
> programmer, when this is something that _needs_ to be explicit. Having
> implicit locks happen on you could really screw with any code trying to do
> explicit locks, as would be needed anyway in all but the most basic cases.
>

I think you must have misunderstood my suggestion; I certainly didn't
suggest locking would be implicit.
All locks would be explicit, all I suggested is that shared things would
gain an associated mutex, and an implicit assert that said mutex is locked
whenever it is accessed, rather than deny assignment between
shared/unshared things.

You could use lock methods, or a nice alternative would be to submit them to some sort of synchronised scope like luka illustrates.

I'm of the opinion that for the time being, explicit lock control is mandatory (anything else is a distant dream), and atomic primitives may not be relied upon.
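A library-level sketch of that idea follows. The names (Locked, access) are invented for illustration, and a real design would also need to make the held flag itself thread-safe, e.g. by tying it to the owning thread:

```d
import core.sync.mutex : Mutex;

// A shared value that carries its own mutex and asserts, at access
// time, that the lock is currently held - the invariant the proposal
// wants the runtime to check on every access.
struct Locked(T)
{
    private shared T value;
    private Mutex mtx;
    private bool held;

    this(T init) { value = init; mtx = new Mutex; }

    void lock()   { mtx.lock();   held = true;  }
    void unlock() { held = false; mtx.unlock(); }

    ref T access()
    {
        assert(held, "Locked!T accessed without holding its lock");
        return *cast(T*) &value; // plain T, valid only while locked
    }
}

void example()
{
    auto x = Locked!int(0);
    x.lock();
    scope (exit) x.unlock();
    x.access() += 1; // ordinary int access, guarded by the assert
}
```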

> > > 2. It's often the case that you need to lock/unlock groups of stuff together
> > > such that locking specific variables is often of limited use and would just
> > > introduce pointless extra locks when dealing with multiple variables. It would
> > > also increase the risk of deadlocks, because you wouldn't have much - if any -
> > > control over what order locks were acquired in when dealing with multiple
> > > shared variables.
> >
> > Your fear is precisely the state we're in now, except it puts all the work
> > on the user to create and use the synchronisation objects, and also to
> > assert that things are locked when they are accessed.
> > I'm just suggesting some reasonably simple change that would make the
> > situation more usable and safer immediately, short of waiting for all these
> > fantastic designs being discussed having time to simmer and manifest.
>
> Except that with your suggestion, you're introducing potential deadlocks
> which
> are outside of the programmer's control, and you're introducing extra
> overhead
> with those locks (both in terms of memory and in terms of the runtime
> costs).
> Not to mention, it would probably cause all kinds of issues for something
> like
> shared int* to have a mutex with it, because then its size is completely
> different from int*. It also would cause even worse problems when that
> shared
> int* was cast to int* (aside from the size issues), because all of the
> locking
> that was happening for the shared int* was invisible. If you want automatic
> locks, then use synchronized classes. That's what they're for.
>
> Honestly, I really don't buy into the idea that it makes sense for shared
> to
> magically make multi-threaded code work without the programmer worrying
> about
> locks. Making it so that it's well-defined as to what's atomic is great for
> code that has any chance of being lock-free, but it's still up to the
> programmer to understand when locks are and aren't needed and how to use
> them
> correctly. I don't think that it can possibly work for it to be automatic.
> It's far too easy to introduce deadlocks, and it would only work in the
> simplest of cases anyway, meaning that the programmer needs to understand
> and
> properly solve the issues anyway. And if the programmer has to understand
> it
> all to get it right, why bother adding the extra overhead and deadlock
> potential caused by automatically locking anything? D provides some great
> synchronization primitives. People should use them.
>

To all above:
You've completely misunderstood my suggestion. It's basically the same as
luka's.
It's not that hard: shared just helps the user do what they already do, by
associating a lock primitive and implicitly asserting that it is locked when
accessed.
No magic should be performed on the user's behalf.

> I think that the only things that shared really needs to solve are:
>
> 1. Indicating to the compiler via the type system that the object is not
> thread-local. This properly segregates shared and unshared code and allows
> the
> compiler to take advantage of thread locality for optimizations and avoid
> optimizations with shared code that screw up threading (e.g. double-checked
> locking won't work if the compiler does certain optimizations).
>
> 2. Making it explicit and well-defined as part of the language which
> operations can be assumed to be atomic (even if that set of operations is
> very small, having it be well-defined is valuable).
>
> 3. Ensuring sequential consistency so that it's possible to do lock-free
> code
> when atomic operations permit it and so that there are fewer weird issues
> due
> to undefined behavior.
>
> - Jonathan M Davis
>


November 15, 2012
On 11/15/12 1:08 AM, Manu wrote:
> On 14 November 2012 19:54, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org <mailto:SeeWebsiteForEmail@erdani.org>>
> wrote:
>     Yah, the whole point here is that we need something IN THE LANGUAGE
>     DEFINITION about atomicLoad and atomicStore. NOT IN THE IMPLEMENTATION.
>
>     THIS IS VERY IMPORTANT.
>
>
> I won't outright disagree, but this seems VERY dangerous to me.
>
> You need to carefully study all popular architectures, and consider that
> if the language is made to depend on these primitives, and the
> architecture doesn't support it, or support that particular style of
> implementation (fairly likely), then D will become incompatible with a
> huge number of architectures on that day.

All contemporary languages that are serious about concurrency support atomic primitives one way or another. We must too. There's no two ways about it.

[snip]
> Side note: I still think a convenient and fairly practical solution is
> to make 'shared' things 'lockable'; where you can lock()/unlock() them,
> and assignment to/from shared things is valid (no casting), but a
> runtime assert insists that the entity is locked whenever it is
> accessed.

This (IIUC) is conflating mutex-based synchronization with memory models and atomic operations. I suggest we postpone anything related to that for the sake of staying focused.


Andrei
November 15, 2012
On 11/15/2012 1:06 AM, Walter Bright wrote:
> On 11/14/2012 3:14 AM, Benjamin Thaut wrote:
>> A small code example which would break as soon as we allow destructing
>> of shared
>> value types would really be nice.
>
> I hate to repeat myself, but:
>
> Thread 1:
>      1. create shared object
>      2. pass reference to that object to Thread 2
>      3. destroy object
>
> Thread 2:
>      1. manipulate that object

Aren't structs typically copied anyway?

Reference would imply pointer then. If the struct is on the stack (weird but could be) then the thread that created it destroys the object once. The thing is as unsafe as escaping a pointer is.

Personally I think that shared stuff allocated on the stack is here-be-dragons @system code in any case.

Otherwise it's GC's responsibility to destroy heap allocated struct when there are no references to it.

What's so puzzling about it?

BTW, currently GC-allocated structs do not have their destructors called at all. The bug is, however, _minor_ ...

http://d.puremagic.com/issues/show_bug.cgi?id=2834

-- 
Dmitry Olshansky