January 04, 2010
Jason House wrote:
> On Jan 4, 2010, at 12:00 PM, Andrei Alexandrescu <andrei at erdani.com> wrote:
> 
>> 
>> ... subject to the "tail-shared" exemption that I'll discuss at a later point.
> 
> I wish you'd stop giving teasers like that. It feels like we can't have a discussion because a) you haven't tried to share your perspective b) you're too busy to have the conversation anyway
> 
> I'm probably way off with my impression...

In the words of Wolf in Pulp Fiction: "If I'm curt it's because time is of the essence".

The tail-shared exemption is very simple and deducible, so I didn't think it was necessary to give that detail at this point: inside a synchronized method, we know the current object is locked but the indirectly-accessed memory is not. So although the object's type is still shared, accesses to direct fields of the object will have their memory barriers lifted. This is just a compiler optimization that doesn't affect semantics.
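To make the exemption concrete, here's a hedged C++ analogue (the class, names, and thread counts are mine, not from the discussion): while an object's own lock is held, plain non-atomic accesses to its direct fields are already race-free, so no per-access barriers are needed -- the lock's acquire/release does the fencing.

```cpp
// Sketch only: std::mutex stands in for the object's monitor, and the
// plain int field stands in for a "direct field" whose barriers the
// compiler could elide inside a synchronized method.
#include <mutex>
#include <thread>
#include <vector>

class Counter {
    std::mutex m;   // plays the role of the object's monitor
    int value = 0;  // a "direct field": plain int, not an atomic

public:
    // analogous to a synchronized method: all access goes through the lock
    void bump() {
        std::lock_guard<std::mutex> lock(m);
        ++value;  // plain load/store; safe because the lock serializes access
    }

    int get() {
        std::lock_guard<std::mutex> lock(m);
        return value;
    }
};

int run_counter(int threads, int bumps) {
    Counter c;
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i)
        workers.emplace_back([&] {
            for (int j = 0; j < bumps; ++j) c.bump();
        });
    for (auto& t : workers) t.join();
    return c.get();
}
```

Indirectly-reached memory (anything behind a pointer in `value`'s place) would get no such protection from the lock, which is exactly why the exemption is "tail"-shaped.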

and:

> Sadly, this is a side effect of a simplistic handling of shared. Shared is more like "here be dragons, here's an ice cube in case one breathes fire on you". Nearly all protection / correctness verification is missing and left for D3 or beyond. Message passing lets shared-aware code remain in the trusted code base...

This is speculation. Please stop it. We plan to define clean semantics for shared.


Andrei
January 04, 2010
On Jan 4, 2010, at 3:13 PM, Graham St Jack wrote:

> What is the plan to stop "shared" from spreading through all of an application's code?
> 
> My own preference is to adopt an approach that severely limits the number of shared objects, perhaps by using a keyword like "shareable" that means: "this object is shared, but it is completely ok to call its synchronized methods from non-shared methods/objects".
> 
> This approach facilitates message-passing, and can very simply handle plenty of multi-threaded use-cases. A shareable object whose externally accessible methods are all synchronized and DO NOT accept or return any references to mutable data should be callable from any thread with complete safety, WITHOUT polluting the calling code with "shared".


I'd planned to ask this a bit later in the discussion, but I've been wondering: do all methods need to be labeled as shared or synchronized, or only the public (and possibly protected) ones?  If I have:

    class A
    {
        void fnA() synchronized { fnB(); } // 1
        void fnB() shared { fnC(); } // 2
        private void fnC() {}
    }

Since fnC() may only be accessed through A's public interface, all of which is ostensibly safe, does fnC() have to be synchronized or shared?  I can see an argument for not allowing case 2, but it does seem safe for fnC() to be called by fnA() at least.  I know D uses recursive locks so there's no functional barrier to making fnC() synchronized, but this seems like it could be horribly slow.  I suppose the compiler could elide additional lock_acquire() calls though, based on static analysis.
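The recursive-lock point above can be sketched in C++ (names and structure are mine; std::recursive_mutex stands in for D's recursive monitor): a public locked method can call another locked method without deadlocking, and the "elided lock" optimization corresponds to a private helper that simply assumes the lock is already held.

```cpp
// Hedged sketch. fnA() and fnB() both acquire the same lock; the second
// acquire on the same thread just bumps a recursion count. fnC_locked()
// is the elided-lock variant: a private helper whose contract is that
// the caller already holds m.
#include <mutex>

class A {
    std::recursive_mutex m;
    int state = 0;

    // private helper: caller must already hold m (no acquire at all)
    void fnC_locked() { ++state; }

public:
    void fnB() {
        std::lock_guard<std::recursive_mutex> lock(m);  // re-entrant acquire
        fnC_locked();
    }

    void fnA() {
        std::lock_guard<std::recursive_mutex> lock(m);
        fnB();  // no deadlock: recursive_mutex tolerates nested locking
    }

    int get() {
        std::lock_guard<std::recursive_mutex> lock(m);
        return state;
    }
};

int demo() {
    A a;
    a.fnA();  // nested acquire, one increment
    a.fnB();  // single acquire, one increment
    return a.get();
}
```

The private-helper pattern is what the hoped-for static analysis would produce automatically: the redundant inner acquires disappear, leaving only the outermost one.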

> Within such a shareable object, we can use low-level stuff like mutexes, semaphores and conditions to build the desired behaviour, wrapping it up and presenting a clean interface.

Since mutexes can override a class' monitor, it already works to do this:

    class A
    {
        this() {
            mut = new Mutex( this );
            cond = new Condition( mut );
        }

        void fnA() synchronized { // locks "mut"
            cond.wait(); // unlocks the lock acquired when fnA() was entered and blocks, etc.
        }

        Mutex mut; Condition cond;
    }

I'm hoping this library-level trick will be enough to allow most of the classic synchronization mechanisms to be used in D 2.0.
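The Mutex/Condition interplay described above has a direct C++ analogue (class and method names are mine): cond.wait() atomically releases the lock acquired on entry and reacquires it before returning, which is the only reason a signaling thread can take the same mutex while the waiter blocks.

```cpp
// Hedged sketch of the "Condition shares the synchronized method's lock"
// idiom. waitForSignal() plays the role of the synchronized fnA() above.
#include <condition_variable>
#include <mutex>
#include <thread>

class Gate {
    std::mutex m;
    std::condition_variable cond;
    bool ready = false;

public:
    // like fnA(): enter "synchronized", then block on the condition
    bool waitForSignal() {
        std::unique_lock<std::mutex> lock(m);
        cond.wait(lock, [&] { return ready; });  // releases m while blocked
        return ready;
    }

    void signal() {
        {
            std::lock_guard<std::mutex> lock(m);  // acquirable only because
            ready = true;                         // the waiter released m
        }
        cond.notify_one();
    }
};

bool demo() {
    Gate g;
    bool saw = false;
    std::thread waiter([&] { saw = g.waitForSignal(); });
    g.signal();
    waiter.join();
    return saw;
}
```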

> Re synchronized containers - I don't like the idea at all. That is going down the path of having many shared objects, which is notoriously difficult to get right, especially for non-experts. IMO, shared objects should be small in number, and serve as boundaries between threads, which otherwise play in their own separate sand-pits.

A good pathological but functionally correct example is the Thread class in core.thread.  Thread instances should really be labeled as shared, but I know that when this happens the compiler will throw a fit.  The thing is, while some of the methods are already synchronized, there are quite a few which are not and don't need to be, nor do the variables they access need to be made lock-free.  Even weirder is:

    class Thread
    {
        this( void function() fn ) { m_fn = fn; }
        void start() {
            thread_start( &threadFn, this );
        }
        private void run() { m_fn(); }
        private void function() m_fn;
    }

    extern (C) void threadFn( void* x ) {
        Thread t = cast(Thread) x;
        t.run();
    }

Perhaps the shared-violating design here should simply be swept under the rug as TCB implementation details?

> Dare I say it - go's emphasis on channels is very appealing to me, and should be something that is easy to do in D - even though it is too limiting to have it as the only tool in the box.

Channels are a lot like Hoare's CSP model, correct?  i.e. you define specific channels where threads may communicate and possibly even what data types may be passed?  This could be built on top of an Erlang-style API without too much fuss, and I've been aiming for the Erlang-style API as the foundation for message passing.  In short, I agree, but that's a conversation for a bit later :-)
January 04, 2010
I figure that the main way to do multithreaded programming in D will be with the message passing library that Sean is building. A user using that should never see shared. Shared is for those cases that cannot be done with message passing.

I think Go's channels are just message passing. The fundamental problem with them in Go is that Go has no concept of immutability, so there's no way to guarantee safety.
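That channels reduce to message passing is easy to see in a hedged C++ sketch (the Channel class is mine, a deliberately minimal stand-in for Go's built-in): a channel is just a locked queue plus a condition variable, and passing messages by value (or as immutable data) is what keeps sender and receiver from ever sharing mutable state.

```cpp
// Minimal unbounded channel: send() copies/moves the message in under a
// lock; receive() blocks until something is available. Because the
// message crosses by value, no mutable state is shared afterwards --
// which is the guarantee Go cannot make for reference types.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

template <typename T>
class Channel {
    std::mutex m;
    std::condition_variable cond;
    std::queue<T> q;

public:
    void send(T msg) {
        { std::lock_guard<std::mutex> lock(m); q.push(std::move(msg)); }
        cond.notify_one();
    }

    T receive() {
        std::unique_lock<std::mutex> lock(m);
        cond.wait(lock, [&] { return !q.empty(); });
        T msg = std::move(q.front());
        q.pop();
        return msg;
    }
};

int demo() {
    Channel<int> ch;
    std::thread producer([&] { ch.send(42); });
    int got = ch.receive();
    producer.join();
    return got;
}
```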

Graham St Jack wrote:
> What is the plan to stop "shared" from spreading through all of an application's code?
>
> My own preference is to adopt an approach that severely limits the number of shared objects, perhaps by using a keyword like "shareable" that means: "this object is shared, but it is completely ok to call its synchronized methods from non-shared methods/objects".
>
> This approach facilitates message-passing, and can very simply handle plenty of multi-threaded use-cases. A shareable object whose externally accessible methods are all synchronized and DO NOT accept or return any references to mutable data should be callable from any thread with complete safety, WITHOUT polluting the calling code with "shared".
>
> Within such a shareable object, we can use low-level stuff like mutexes, semaphores and conditions to build the desired behaviour, wrapping it up and presenting a clean interface.
>
> Re synchronized containers - I don't like the idea at all. That is going down the path of having many shared objects, which is notoriously difficult to get right, especially for non-experts. IMO, shared objects should be small in number, and serve as boundaries between threads, which otherwise play in their own separate sand-pits.
>
> Dare I say it - go's emphasis on channels is very appealing to me, and should be something that is easy to do in D - even though it is too limiting to have it as the only tool in the box.
>
> Thanks,
> Graham.
>
> Andrei Alexandrescu wrote:
>> This may be easiest to answer for people with extensive experience with Java's threading model. Consider the following D code:
>>
>> class A {
>>     void foo() synchronized;
>>     void bar() shared;
>>     void baz();
>> }
>>
>> If we have an object of type A, all three methods are callable. If we have an object of type shared(A), baz() is not callable. There is a remaining problem with foo() and bar(): the first uses lock-based synchronization, the second uses a sort of free threading (e.g. lock-free) that the compiler is unable to control. It follows that calls to foo() and bar() from different threads may easily create race conditions.
>>
>> I think the compiler should enforce that a class either defines synchronized methods, shared methods, but not both. So the class writer must decide upfront whether they use lock-based or lock-free threading with a given class. Does this make sense? Is it too restrictive?
>>
>> Walter asked - what if we take this one step further and force a class to define _all_ methods either synchronized, shared, or neither? In that case the keyword could be moved to the top:
>>
>> shared class A { ... all methods shared ... }
>> synchronized class B { ... all methods synchronized ... }
>> class C { ... shared/synchronized not allowed ... }
>>
>> Has experience with Java shown that it's best to design classes for concurrency in separation from single-threaded classes? Effective Java doesn't mention that. But then there are all these methods synchronizedCollection, synchronizedList etc. that wrap the corresponding collections (which probably don't have synchronized methods). I'm reading up on this, but if there's someone on this list who could save us all some time, please chime in.
>>
>>
>> Thanks,
>>
>> Andrei
>> _______________________________________________
>> dmd-concurrency mailing list
>> dmd-concurrency at puremagic.com
>> http://lists.puremagic.com/mailman/listinfo/dmd-concurrency
>
> _______________________________________________
> dmd-concurrency mailing list
> dmd-concurrency at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/dmd-concurrency
>
>
January 04, 2010
Sean Kelly wrote:
> On Jan 4, 2010, at 9:00 AM, Andrei Alexandrescu wrote:
>> The only things that work on shared objects are:
>>
>> * calls to synchronized or shared methods, if any;
>>
>> * reading if the object is word-size or less;
>>
>> * writing if the object is word-size or less.
> 
> Cool!  It's perhaps a minor issue right now, but it would be nice if RMW operations could be performed via library functions.  Hopefully all that's required is to accept a "ref shared T" and then write ASM for the machinery from there?  ie. Is there any need for compiler changes to support this?

Yes, that's the plan. In fact I have proposed an even more Draconian plan: disallow even direct reads and writes to shared objects. To perform them, user code would have to invoke the intrinsics sharedRead and sharedWrite. Then it's very clear and easy to identify where barriers are inserted, and the semantics of the program are easily definable: the program preserves the sequences of calls to sharedRead and sharedWrite.

Consider your example:

shared int x;
...
++x;

The putative user notices that that doesn't work, so she's like, meh, I'll do this then:

int y = x;
++y;
x = y;

And the user comes away with the impression that the D compiler is a bit dumb. Of course, that doesn't avoid the race condition. Requiring the user to call atomicIncrement(x) would clearly be an improvement, but even this would be an improvement:

int y = sharedRead(x);
++y;
sharedWrite(y, x);

When writing such code the user inevitably hits on the documentation for the two intrinsics, which clearly defines their guarantees: only the sequence of sharedRead and sharedWrite calls is preserved. At that point, inspecting the code and understanding how it works becomes much easier.
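For comparison, here's a hedged C++ sketch of the same hazard (the helper names are mine; sharedRead/sharedWrite and atomicIncrement are D proposals, spelled with std::atomic here): each individual access in the three-step version is atomic, yet the sequence still loses updates, while a single atomic read-modify-write does not.

```cpp
// racy_increment() mirrors the user's y = x; ++y; x = y; rewrite: every
// load and store is individually atomic (barriers and all), but another
// thread can slip in between them. atomic_increment() mirrors
// atomicIncrement(x): one indivisible RMW.
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> x{0};

void racy_increment() {
    int y = x.load();   // sharedRead-style access
    ++y;                // another thread may increment x right here...
    x.store(y);         // ...so this store can overwrite its update
}

void atomic_increment() {
    x.fetch_add(1);     // single indivisible read-modify-write
}

int count_with(void (*op)(), int threads, int reps) {
    x.store(0);
    std::vector<std::thread> ts;
    for (int i = 0; i < threads; ++i)
        ts.emplace_back([=] { for (int j = 0; j < reps; ++j) op(); });
    for (auto& t : ts) t.join();
    return x.load();
}
```

The racy version typically tallies fewer than threads*reps increments; the atomic version always tallies exactly that many.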

> So if I have:
> 
>     class A
>     {
>         void fn() shared { x = 5; }
>         int x;
>     }
> 
> Is this legal?  If the type of the object doesn't change then I'd guess that I won't be allowed to access non-shared fields inside a shared function?

Shared automatically propagates to fields, so typeof((new shared(A)).x) is shared int. Of course that's not the case right now; the typeof expression doesn't even compile :o).


Andrei
January 04, 2010
On Jan 4, 2010, at 3:58 PM, Andrei Alexandrescu wrote:

> Sean Kelly wrote:
>> On Jan 4, 2010, at 9:00 AM, Andrei Alexandrescu wrote:
>>> The only things that work on shared objects are:
>>> 
>>> * calls to synchronized or shared methods, if any;
>>> 
>>> * reading if the object is word-size or less;
>>> 
>>> * writing if the object is word-size or less.
>> Cool!  It's perhaps a minor issue right now, but it would be nice if RMW operations could be performed via library functions.  Hopefully all that's required is to accept a "ref shared T" and then write ASM for the machinery from there?  ie. Is there any need for compiler changes to support this?
> 
> Yes, that's the plan. In fact I have proposed an even more Draconian plan: disallow even direct reads and writes to shared objects. To perform them, user code would have to invoke the intrinsics sharedRead and sharedWrite. Then it's very clear and easy to identify where barriers are inserted, and the semantics of the program are easily definable: the program preserves the sequences of calls to sharedRead and sharedWrite.

You've mentioned this before, and I really like the idea.  This makes the atomic ops readily apparent, which seems like a good thing.  I guess this could mess with template functions a bit, but since you really need custom algorithms to do nearly anything safely with shared variables, this is probably a good thing as well.

> Consider your example:
> 
> shared int x;
> ...
> ++x;
> 
> The putative user notices that that doesn't work, so she's like, meh, I'll do this then:
> 
> int y = x;
> ++y;
> x = y;
> 
> And the user remains with this impression that the D compiler is a bit dumb.

Ack!  I hadn't thought of that.

> Of course that doesn't avoid the race condition though. If the user would have to call atomicIncrement(x) that would be clearly an improvement, but even this would be an improvement:
> 
> int y = sharedRead(x);
> ++y;
> sharedWrite(y, x);
> 
> When writing such code the user inevitably hits on the documentation for the two intrinsics, which clearly define their guarantees: only the sequence of sharedRead and sharedWrite is preserved. At that point, inspecting the code and understanding how it works is improved.

Exactly.  Once they're in the module, one would hope that they'll notice the sharedIncrement() routine as well.

>> So if I have:
>>    class A
>>    {
>>        void fn() shared { x = 5; }
>>        int x;
>>    }
>> Is this legal?  If the type of the object doesn't change then I'd guess that I won't be allowed to access non-shared fields inside a shared function?
> 
> Shared automatically propagates to fields, so typeof((new shared(A)).x) is shared int. Of course that's not the case right now; the typeof expression doesn't even compile :o).

Hm... but what if fn() were synchronized instead of shared?  Making x shared in that instance seems wasteful.  I had thought that perhaps a shared function would simply only be allowed to access shared variables, and possibly call synchronized functions:

    class A {
        void fnA() shared { x = 5; } // ok, x is shared
        void fnB() shared { y = 5; } // not ok, y is not shared
        void fnC() synchronized { y = 5; } // ok, non-shared ops are ok if synchronized
        shared int x;
        int y;
    }
January 04, 2010
Sean Kelly wrote:
>>> So if I have:
>>>    class A
>>>    {
>>>        void fn() shared { x = 5; }
>>>        int x;
>>>    }
>>> Is this legal?  If the type of the object doesn't change then I'd guess that I won't be allowed to access non-shared fields inside a shared function?
>> Shared automatically propagates to fields, so typeof((new shared(A)).x) is shared int. Of course that's not the case right now; the typeof expression doesn't even compile :o).
> 
> Hm... but what if fn() were synchronized instead of shared?  Making x shared in that instance seems wasteful.  I had thought that perhaps a shared function would simply only be allowed to access shared variables, and possibly call synchronized functions:
> 
>     class A {
>         void fnA() shared { x = 5; } // ok, x is shared
>         void fnB() shared { y = 5; } // not ok, y is not shared
>         void fnC() synchronized { y = 5; } // ok, non-shared ops are ok if synchronized
>         shared int x;
>         int y;
>     }

Aha! You've just discovered the tail-shared exemption: inside a synchronized method, direct fields can be accessed without barriers (neither implicit nor explicit) although technically their type is still shared. Fortunately the compiler has all the information it needs to elide those barriers.


Andrei

January 05, 2010
Walter Bright wrote:
> I figure that the main way to do multithreaded programming in D will be with the message passing library that Sean is building. A user using that should never see shared. Shared is for those cases that cannot be done with message passing.
That's great, and I agree completely.

However, we need the tools to be able to write libraries of our own without any special back-door tricks. Therefore, we need a way to create shared objects that provide safe touching-points between threads, without seeing shared pop up everywhere.
>
> I think Go's channels are just message passing. The fundamental problem with it in Go is Go has no concept of immutability, so there's no way to guarantee safety.
Agreed - immutable is essential.
January 04, 2010
On Jan 4, 2010, at 4:12 PM, Andrei Alexandrescu wrote:

> Sean Kelly wrote:
>>>> So if I have:
>>>>   class A
>>>>   {
>>>>       void fn() shared { x = 5; }
>>>>       int x;
>>>>   }
>>>> Is this legal?  If the type of the object doesn't change then I'd guess that I won't be allowed to access non-shared fields inside a shared function?
>>> Shared automatically propagates to fields, so typeof((new shared(A)).x) is shared int. Of course that's not the case right now; the typeof expression doesn't even compile :o).
>> Hm... but what if fn() were synchronized instead of shared?  Making x shared in that instance seems wasteful.  I had thought that perhaps a shared function would simply only be allowed to access shared variables, and possibly call synchronized functions:
>>    class A {
>>        void fnA() shared { x = 5; } // ok, x is shared
>>        void fnB() shared { y = 5; } // not ok, y is not shared
>>        void fnC() synchronized { y = 5; } // ok, non-shared ops are ok if synchronized
>>        shared int x;
>>        int y;
>>    }
> 
> Aha! You've just discovered the tail-shared exemption: inside a synchronized method, direct fields can be accessed without barriers (neither implicit nor explicit) although technically their type is still shared. Fortunately the compiler has all the information it needs to elide those barriers.

Oh, I see.  I can't decide if it's weird that this optimization means that the sharedRead() and sharedWrite() functions wouldn't be necessary, but I'm leaning towards saying that it's actually a good thing since the changed behavior is obvious.

The only catch with the approach above (and you've mentioned this before) is:

    class A {
        void fnA() shared { x = 5; }
        void fnB() synchronized { x = 6; }
        int x;
    }

I had thought that explicitly labeling variables as shared would sidestep this by requiring ops on the vars to always be atomic.  An alternative (as you've said before) would be to not allow shared and synchronized methods to both be used in a class, but it's pretty common that I'll want to do something like this:

    class A {
        void fnA() synchronized { sharedWrite( flag, true ); }
        bool fnB() shared { return sharedRead( flag ); }
        shared bool flag;
    }

Maybe my explicitly declaring flag as shared somehow provides an exemption?  Or did you have another idea for a way around this?
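Sean's mixed class has a straightforward C++ analogue (names mine; C++ has no shared/synchronized qualifiers, so std::atomic marks the explicitly shared field): a field declared atomic can be written from a lock-holding method and read lock-free from anywhere, which is essentially the exemption he's asking for.

```cpp
// flag is the explicitly "shared" member: writes from the synchronized
// method use a release store (sharedWrite analogue), lock-free reads use
// an acquire load (sharedRead analogue). The mutex guards whatever other
// state the class might have.
#include <atomic>
#include <mutex>
#include <thread>

class A {
    std::mutex m;
    std::atomic<bool> flag{false};  // the explicitly shared field

public:
    void fnA() {                    // "synchronized" method
        std::lock_guard<std::mutex> lock(m);
        flag.store(true, std::memory_order_release);
    }

    bool fnB() const {              // "shared" method: no lock taken
        return flag.load(std::memory_order_acquire);
    }
};

bool demo() {
    A a;
    std::thread writer([&] { a.fnA(); });
    writer.join();                  // writer finished; reader must see true
    return a.fnB();
}
```

Declaring the field atomic is precisely the "exemption from the exemption": its accesses keep their barriers even inside the locked method, so the lock-free reader stays correct.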
January 04, 2010
On Jan 4, 2010, at 6:45 PM, Andrei Alexandrescu <andrei at erdani.com> wrote:

> Jason House wrote:
>> On Jan 4, 2010, at 12:00 PM, Andrei Alexandrescu <andrei at erdani.com> wrote:
>>> ... subject to the "tail-shared" exemption that I'll discuss at a later point.
>> I wish you'd stop giving teasers like that. It feels like we can't have a discussion because a) you haven't tried to share your perspective b) you're too busy to have the conversation anyway I'm probably way off with my impression...
>
> In the words of Wolf in Pulp Fiction: "If I'm curt it's because time
> is
> of the essence".
>
> The tail-shared exemption is very simple and deducible so I didn't
> think
> it was necessary to give that detail at this point: inside a
> synchronized method, we know the current object is locked but the
> indirectly-accessed memory is not. So although the object's type is
> still shared, accessing direct fields of the object will have their
> memory barriers lifted. This is just a compiler optimization that
> doesn't affect semantics.

You're right that it is easy. I think I assumed too much from the name.

Even so, this style of optimization doesn't necessarily align very well with user intention. My best example of this is D's garbage collector. It uses a single lock for far more data than just head access to "this".

Actually, when I think about it, the optimization you mention is sometimes incorrect. See below.



>
> and:
>
>> Sadly, this is a side effect of a simplistic handling of shared. Shared is more like "here be dragons, here's an ice cube in case one breaths fire on you". Nearly all protection / correctness verification are missing and left for D3 or beyond. Message passing lets shared aware code remain in the trusted code base...
>
> This is speculation. Please stop it. We plan to define clean semantics for shared.


I don't think it's speculation. I'll try to list a few things I
consider to be supporting facts:
1. Shared does not encode which lock should be held when accessing
data. There are 3 big categories here: lock-free, locked by monitor
for "this", and locked by something else.
2. Shared means that there might be simultaneous attempts to use the
data. The compiler can't infer which objects are intended to use the
same lock, and  can't optimize away fences. Similarly, the compiler
can't detect deviations from programmer intent.
3. All shared data is a candidate for lock-free access. The compiler
can't detect which objects the programmer intended to be lock-free and
can't detect use of lock-free variables in a lock-held manner.
4. Variables that the programmer intended to only be accessed when a
specific lock is held can't be inferred by the compiler.  Because of
this, the compiler can't detect any failure to lock scenarios.
5. Certain lock-free variables always require sequential consistency
(and the compiler can't/doesn't infer this). This is what I referred
to above. This can be important if there are shared but not
synchronized methods, or some other code uses a member variable in a
lock-free (or incorrectly locked) manner. Again, the compiler doesn't
detect such distributed accesses and so the optimization you mentioned
is invalid.
6. There are no known ways to validate lock-free logic, and the
current shared design implicitly allows it everywhere.

Does that help? Currently, shared only facilitates segregating race-free thread-local data/logic from the (here be dragons) world of shared access, which lacks all verification of proper locking. Even Bartosz's more complex scheme has a here-be-dragons subset that he labeled lock-free... I can back that last part up if it also feels like speculation.
January 04, 2010
Sean Kelly wrote:
> On Jan 4, 2010, at 4:12 PM, Andrei Alexandrescu wrote:
> 
>> Sean Kelly wrote:
>>>>> So if I have:
>>>>>   class A
>>>>>   {
>>>>>       void fn() shared { x = 5; }
>>>>>       int x;
>>>>>   }
>>>>> Is this legal?  If the type of the object doesn't change then I'd guess that I won't be allowed to access non-shared fields inside a shared function?
>>>> Shared automatically propagates to fields, so typeof((new shared(A)).x) is shared int. Of course that's not the case right now; the typeof expression doesn't even compile :o).
>>> Hm... but what if fn() were synchronized instead of shared?  Making x shared in that instance seems wasteful.  I had thought that perhaps a shared function would simply only be allowed to access shared variables, and possibly call synchronized functions:
>>>    class A {
>>>        void fnA() shared { x = 5; } // ok, x is shared
>>>        void fnB() shared { y = 5; } // not ok, y is not shared
>>>        void fnC() synchronized { y = 5; } // ok, non-shared ops are ok if synchronized
>>>        shared int x;
>>>        int y;
>>>    }
>> Aha! You've just discovered the tail-shared exemption: inside a synchronized method, direct fields can be accessed without barriers (neither implicit or explicit) although technically their type is still shared. Fortunately the compiler has all the information it needs to elide those barriers.
> 
> Oh, I see.  I can't decide if it's weird that this optimization means that the sharedRead() and sharedWrite() functions wouldn't be necessary, but I'm leaning towards saying that it's actually a good thing since the changed behavior is obvious.

Same here. All - please advise.

> The only catch with the approach above (and you've mentioned this before) is:
> 
>     class A {
>         void fnA() shared { x = 5; }
>         void fnB() synchronized { x = 6; }
>         int x;
>     }
> 
> I had thought that explicitly labeling variables as shared would sidestep this by requiring ops on the vars to always be atomic.  An alternative (as you've said before) would be to not allow shared and synchronized methods to both be used in a class, but it's pretty common that I'll want to do something like this:
> 
>     class A {
>         void fnA() synchronized { sharedWrite( flag, true ); }
>         bool fnB() shared { return sharedRead( flag ); }
>         shared bool flag;
>     }
> 
> Maybe my explicitly declaring flag as shared somehow provides an exemption?  Or did you have another idea for a way around this?

Hmmm... if a field is shared in a non-shared object, it means you're not using the object's own lock to control access to that field. (You can e.g. assume that other threads have the address of that field.) So that field, inside a synchronized method, will not benefit from the tail-shared exemption and will be subject to all the limitations specific to e.g. a shared global bool.


Andrei