June 11, 2012
Re: valid uses of shared
On Mon, 11 Jun 2012 09:39:40 -0400, Artur Skawina <art.08.09@gmail.com>  
wrote:

> On 06/11/12 14:07, Steven Schveighoffer wrote:
>> However, allocating another heap block to do sharing, in my opinion, is  
>> worth the extra cost.  This way, you have clearly separated what is  
>> shared and what isn't.
>>
>> You can always cast to get around the limitations.
>
> "clearly separating what is shared and what isn't" *is* exactly what
> tagging the data with 'shared' does.


I posted a response, it showed up in the online forums, but for some  
reason didn't show up in my nntp client...

If you missed it, it is here.

http://forum.dlang.org/post/op.wfqtz5u0eav7ka@steves-laptop

-Steve
June 11, 2012
Re: valid uses of shared
On 06/11/12 16:57, Steven Schveighoffer wrote:
> On Mon, 11 Jun 2012 09:39:40 -0400, Artur Skawina <art.08.09@gmail.com> wrote:
> 
>> On 06/11/12 14:07, Steven Schveighoffer wrote:
>>> However, allocating another heap block to do sharing, in my opinion, is worth the extra cost.  This way, you have clearly separated what is shared and what isn't.
>>>
>>> You can always cast to get around the limitations.
>>
>> "clearly separating what is shared and what isn't" *is* exactly what
>> tagging the data with 'shared' does.
> 
> 
> I posted a response, it showed up in the online forums, but for some reason didn't show up in my nntp client...
> 
> If you missed it, it is here.
> 
> http://forum.dlang.org/post/op.wfqtz5u0eav7ka@steves-laptop

The mailing list delivered it too.

I'm against disallowing things that are not unsafe as such and have valid
use cases, so we will probably not agree about that.

I considered the GC/mempool implications before arguing for allowing 'shared'
fields inside unshared aggregates - the compiler has enough knowledge to pick
the right pool, if it ever decides to treat "local" data differently. I'm not
sure doing that would be a good idea in cases where the lifetime of an object
cannot be determined statically. But deciding to use a global pool can always
be done by checking if a shared field exists.

artur
June 11, 2012
Re: valid uses of shared
On Mon, 11 Jun 2012 09:41:37 -0400, Artur Skawina <art.08.09@gmail.com>  
wrote:

> On 06/11/12 14:11, Steven Schveighoffer wrote:
>> On Mon, 11 Jun 2012 07:56:12 -0400, Artur Skawina <art.08.09@gmail.com>  
>> wrote:
>>
>>> On 06/11/12 12:35, Steven Schveighoffer wrote:
>>
>>>> I wholly disagree.  In fact, keeping the full qualifier intact  
>>>> *enforces* incorrect code, because you are forcing shared semantics  
>>>> on literally unshared data.
>>>>
>>>> Never would this start ignoring shared on data that is truly shared.   
>>>> This is why I don't really get your argument.
>>>>
>>>> If you could perhaps explain with an example, it might be helpful.
>>>
>>> *The programmer* can then treat shared data just like unshared.  
>>> Because every
>>> load and every store will "magically" work. I'm afraid that after more  
>>> than
>>> two or three people touch the code, the chances of it being correct  
>>> would be
>>> less than 50%...
>>> The fact that you can not (or shouldn't be able to) mix shared and  
>>> unshared
>>> freely is one of the main advantages of shared-annotation.
>>
>> If shared variables aren't doing the right thing with loads and stores,  
>> then we should fix that.
>
> Where do you draw the line?
>
> shared struct S {
>    int i;
>    void* p;
>    SomeStruct s;
>    ubyte[256] a;
> }
>
> shared(S)* p = ... ;
>
> auto v1 = p.i;
> auto v2 = p.p;
> auto v3 = p.s;
> auto v4 = p.a;
> auto v5 = p.i++;
>
> Are these operations on shared data all safe? Note that if these
> accesses would be protected by some lock, then the 'shared' qualifier
> wouldn't really be needed - compiler barriers, that make sure it all
> happens while this thread holds the lock, would be enough. (even the
> order of operations doesn't usually matter in that case and enforcing
> one would in fact add overhead)

No, they should not be all safe, I never suggested that.  It's impossible  
to engineer a one-size-fits-all for accessing shared variables, because it  
doesn't know what mechanism you are going to use to protect it.  As you  
say, once this data is protected by a lock, memory barriers aren't  
needed.  But requiring a lock is too heavy handed for all cases.  This is  
a good point to make about the current memory-barrier attempts, they just  
aren't comprehensive enough, nor do they guarantee pretty much anything  
except simple loads and stores.

Perhaps the correct way to implement shared semantics is to not allow  
access *whatsoever* (except taking the address of a shared piece of data),  
unless you:

a) lock the block that contains it
b) use some library feature that uses casting-away of shared to accomplish  
the correct thing.  For example, atomicOp.

None of this can prevent deadlocks, but it does create a way to prevent  
deadlocks.

If this was the case, stack data would be able to be marked shared, and  
you'd have to use option b (it would not be in a block).  Perhaps for  
simple data types, when memory barriers truly are enough, and a  
shared(int) is on the stack (and not part of a container), straight loads  
and stores would be allowed.
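
To make option b concrete, here's roughly what that looks like today with core.atomic (the names are just for illustration, and the rule about shared stack data is still hypothetical):

   import core.atomic : atomicOp;

   void worker(shared(int)* counter)
   {
      // no direct read or write of *counter; go through the library,
      // which performs the operation atomically
      atomicOp!"+="(*counter, 1);
   }

   void owner()
   {
      shared int hits;      // shared data living on the stack
      worker(&hits);        // in real code this pointer would also be
                            // handed to other threads
   }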

Now, would you agree that:

auto v1 = synchronized p.i;

might be a valid mechanism?  In other words, assuming p is lockable,  
synchronized p.i locks p, then reads i, then unlocks p, and the result  
type is unshared?

Also, inside synchronized(p), p becomes tail-shared, meaning all data  
contained in p is unshared, all data referred to by p remains shared.

In this case, we'd need a new type constructor (e.g. locked) to formalize  
the type.

Make sense?

-Steve
June 11, 2012
Re: valid uses of shared
>> Are these operations on shared data all safe? Note that if these
>> accesses would be protected by some lock, then the 'shared' qualifier
>> wouldn't really be needed - compiler barriers, that make sure it all
>> happens while this thread holds the lock, would be enough. (even the
>> order of operations doesn't usually matter in that case and enforcing
>> one would in fact add overhead)
>
> No, they should not be all safe, I never suggested that. It's impossible
> to engineer a one-size-fits-all for accessing shared variables, because
> it doesn't know what mechanism you are going to use to protect it. As
> you say, once this data is protected by a lock, memory barriers aren't
> needed. But requiring a lock is too heavy handed for all cases. This is
> a good point to make about the current memory-barrier attempts, they
> just aren't comprehensive enough, nor do they guarantee pretty much
> anything except simple loads and stores.
>
> Perhaps the correct way to implement shared semantics is to not allow
> access *whatsoever* (except taking the address of a shared piece of
> data), unless you:
>
> a) lock the block that contains it
> b) use some library feature that uses casting-away of shared to
> accomplish the correct thing. For example, atomicOp.
>
It may be a good idea. Though I half-expect reads and writes to be 
atomic. Yet things like this are a funky trap:
shared int x; //global
...
x = x + func();
//Booom! read-modify-write and not atomic, should have used x += func()

So the a-b set of rules could be more reasonable than it seems.
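
For completeness, a minimal sketch of the atomic version with today's core.atomic (func is just a placeholder):

   import core.atomic : atomicOp;

   shared int x;                    // global

   int func() { return 1; }         // placeholder

   void bump()
   {
      // x = x + func();            // non-atomic read-modify-write: racy
      atomicOp!"+="(x, func());     // atomic read-modify-write
   }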

> None of this can prevent deadlocks, but it does create a way to prevent
> deadlocks.
>
> If this was the case, stack data would be able to be marked shared, and
> you'd have to use option b (it would not be in a block). Perhaps for
> simple data types, when memory barriers truly are enough, and a
> shared(int) is on the stack (and not part of a container), straight
> loads and stores would be allowed.
>
> Now, would you agree that:
>
> auto v1 = synchronized p.i;
>
> might be a valid mechanism? In other words, assuming p is lockable,
> synchronized p.i locks p, then reads i, then unlocks p, and the result
> type is unshared?
>
> Also, inside synchronized(p), p becomes tail-shared, meaning all data
> contained in p is unshared, all data referred to by p remains shared.
>
> In this case, we'd need a new type constructor (e.g. locked) to
> formalize the type.
>
> Make sense?
>

While I've missed a good portion of this thread I think we should 
explore this direction. Shared has to be connected with locks/synchronized.

-- 
Dmitry Olshansky
June 11, 2012
Re: valid uses of shared
On Mon, 11 Jun 2012 13:42:37 -0400, Dmitry Olshansky  
<dmitry.olsh@gmail.com> wrote:

>> a) lock the block that contains it
>> b) use some library feature that uses casting-away of shared to
>> accomplish the correct thing. For example, atomicOp.
>>
> It may be a good idea. Though I half-expect reads and writes to be  
> atomic. Yet things like this are a funky trap:
> shared int x; //global
> ...
> x = x + func();
> //Booom! read-modify-write and not atomic, should have used x += func()

We cannot prevent data races such as these (though we may be able to  
disable specific cases like this), since you can always split out this  
expression into multiple valid ones.  Also, you can hide details in  
functions:

x = func(x);

But we can say that you cannot *read or write* a shared variable  
non-atomically.  That is a goal I think is achievable by the type system  
and the language.  Non-atomic access arguably has no real-world value,  
ever, whereas the above may be valid in some cases (maybe you know more  
semantically about the application than the compiler can glean).
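
To illustrate what "only atomic reads and writes" could mean in practice, using today's core.atomic (the rule itself is only a proposal here):

   import core.atomic : atomicLoad, atomicStore;

   shared int flag;

   void publish()   { atomicStore(flag, 1); }          // explicit atomic write
   bool published() { return atomicLoad(flag) == 1; }  // explicit atomic read

   // a plain `flag = 1;` or `auto f = flag;` would be rejected under such
   // a rule, forcing the intent to be spelled out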

>
> While I've missed a good portion of this thread I think we should  
> explore this direction. Shared has to be connected with  
> locks/synchronized.

Yes, I agree.  If shared and synchronized are not connected somehow, the  
point of both seems rather lost.

As this was mostly a brainstorming post, I'll restate what I think as a  
reply to the original post, since my views have definitely changed.

-Steve
June 11, 2012
Re: valid uses of shared
On 06/11/12 19:27, Steven Schveighoffer wrote:
> On Mon, 11 Jun 2012 09:41:37 -0400, Artur Skawina <art.08.09@gmail.com> wrote:
> 
>> On 06/11/12 14:11, Steven Schveighoffer wrote:
>>> On Mon, 11 Jun 2012 07:56:12 -0400, Artur Skawina <art.08.09@gmail.com> wrote:
>>>
>>>> On 06/11/12 12:35, Steven Schveighoffer wrote:
>>>
>>>>> I wholly disagree.  In fact, keeping the full qualifier intact *enforces* incorrect code, because you are forcing shared semantics on literally unshared data.
>>>>>
>>>>> Never would this start ignoring shared on data that is truly shared.  This is why I don't really get your argument.
>>>>>
>>>>> If you could perhaps explain with an example, it might be helpful.
>>>>
>>>> *The programmer* can then treat shared data just like unshared. Because every
>>>> load and every store will "magically" work. I'm afraid that after more than
>>>> two or three people touch the code, the chances of it being correct would be
>>>> less than 50%...
>>>> The fact that you can not (or shouldn't be able to) mix shared and unshared
>>>> freely is one of the main advantages of shared-annotation.
>>>
>>> If shared variables aren't doing the right thing with loads and stores, then we should fix that.
>>
>> Where do you draw the line?
>>
>> shared struct S {
>>    int i;
>>    void* p;
>>    SomeStruct s;
>>    ubyte[256] a;
>> }
>>
>> shared(S)* p = ... ;
>>
>> auto v1 = p.i;
>> auto v2 = p.p;
>> auto v3 = p.s;
>> auto v4 = p.a;
>> auto v5 = p.i++;
>>
>> Are these operations on shared data all safe? Note that if these
>> accesses would be protected by some lock, then the 'shared' qualifier
>> wouldn't really be needed - compiler barriers, that make sure it all
>> happens while this thread holds the lock, would be enough. (even the
>> order of operations doesn't usually matter in that case and enforcing
>> one would in fact add overhead)
> 
> No, they should not be all safe, I never suggested that.  It's impossible to engineer a one-size-fits-all for accessing shared variables, because it doesn't know what mechanism you are going to use to protect it.  As you say, once this data is protected by a lock, memory barriers aren't needed.  But requiring a lock is too heavy handed for all cases.  This is a good point to make about the current memory-barrier attempts, they just aren't comprehensive enough, nor do they guarantee pretty much anything except simple loads and stores.
> 
> Perhaps the correct way to implement shared semantics is to not allow access *whatsoever* (except taking the address of a shared piece of data), unless you:
> 
> a) lock the block that contains it
> b) use some library feature that uses casting-away of shared to accomplish the correct thing.  For example, atomicOp.

Exactly; this is what I've been after the whole time. And I think it can be done
in most cases without casting away shared, for example by allowing the safe
conversions from/to shared of the results of expressions involving shared data,
but only under certain circumstances, e.g. in methods with a shared 'this'.
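
A tiny sketch of that last case, just as illustration (the type and names are made up; today the atomic helpers do the heavy lifting):

   import core.atomic : atomicLoad, atomicOp;

   struct Counter
   {
      private int count;

      // methods with a shared `this`: only these bodies deal with the
      // shared-ness, callers just call them
      void increment() shared { atomicOp!"+="(count, 1); }
      int  get() shared       { return atomicLoad(count); }
   }

   shared Counter c;   // usable from any thread via c.increment() / c.get()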


> None of this can prevent deadlocks, but it does create a way to prevent deadlocks.
> 
> If this was the case, stack data would be able to be marked shared, and you'd have to use option b (it would not be in a block).  Perhaps for simple data types, when memory barriers truly are enough, and a shared(int) is on the stack (and not part of a container), straight loads and stores would be allowed.

Why? Consider the case of a function that directly or indirectly launches a few
threads and gives them the address of some local shared object. If the current
thread also accesses this object, which has to be possible, then it must obey
the same rules.


> Now, would you agree that:
> 
> auto v1 = synchronized p.i;
> 
> might be a valid mechanism?  In other words, assuming p is lockable, synchronized p.i locks p, then reads i, then unlocks p, and the result type is unshared?

I think I would prefer

  auto v1 = synchronized(p).i;

ie for the synchronized expression to lock the object, return an unshared
reference, and the object be unlocked once this ref goes away. RAII. ;)

Which would then also allow for

  {
     auto unshared_p = synchronized(p);
     auto v1 = unshared_p.i;
     auto v2 = unshared_p.p;
     // etc
  }

and with a little more syntax sugar it could turn into

  synchronized (unshared_p = p) {
     auto v1 = unshared_p.i;
     auto v2 = unshared_p.p;
     // etc
  }


The problem with this is that it only unshares the head, which
I think isn't enough. Hmm. One approach would be to allow

  shared struct S {
     ubyte* data; 
     AStruct *s1;
     shared AnotherStruct *s2;
     shared S* next;
  }

and for synchronized(s){} to drop 'shared' from any field that
isn't also marked as shared. IOW treat any 'unshared' field as
owned by the object. (an alternative could be to tag the fields
that should be unshared instead)

> Also, inside synchronized(p), p becomes tail-shared, meaning all data contained in p is unshared, all data referred to by p remains shared.
> 
> In this case, we'd need a new type constructor (e.g. locked) to formalize the type.

I should have read to the end, I guess. :)

You mean something like I described above, only done by mutating
the type of 'p'? That might work too.

But I need to think about this some more.

Why would we need 'locked'?


> Make sense?

More and more.

artur
June 11, 2012
Re: valid uses of shared
On Mon, 11 Jun 2012 15:23:56 -0400, Artur Skawina <art.08.09@gmail.com>  
wrote:

> On 06/11/12 19:27, Steven Schveighoffer wrote:

>> Perhaps the correct way to implement shared semantics is to not allow  
>> access *whatsoever* (except taking the address of a shared piece of  
>> data), unless you:
>>
>> a) lock the block that contains it
>> b) use some library feature that uses casting-away of shared to  
>> accomplish the correct thing.  For example, atomicOp.
>
> Exactly; this is what I've been after the whole time. And I think it can  
> be done
> in most cases without casting away shared, for example by allowing the  
> safe
> conversions from/to shared of the results of expressions involving shared  
> data,
> but only under certain circumstances, e.g. in methods with a shared 'this'.

Good, I'm glad we are starting to come together.

>> None of this can prevent deadlocks, but it does create a way to prevent  
>> deadlocks.
>>
>> If this was the case, stack data would be able to be marked shared, and  
>> you'd have to use option b (it would not be in a block).  Perhaps for  
>> simple data types, when memory barriers truly are enough, and a  
>> shared(int) is on the stack (and not part of a container), straight  
>> loads and stores would be allowed.
>
> Why? Consider the case of a function that directly or indirectly launches  
> a few
> threads and gives them the address of some local shared object. If the  
> current
> thread also accesses this object, which has to be possible, then it must  
> obey
> the same rules.

I think this is possible for what I prescribed.  You need a special  
construct for locking and using shared data on the stack (for instance  
Lockable!S).
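
A rough sketch of what a Lockable!S library type could look like (just the idea; it glosses over how 'shared' itself would be applied to the wrapper):

   import core.sync.mutex : Mutex;

   struct Lockable(T)
   {
      private T payload;   // only reachable while holding the mutex
      private Mutex mtx;

      this(T initial)
      {
         payload = initial;
         mtx = new Mutex;
      }

      // run dg while holding the lock; dg sees the payload as unshared
      void withLock(scope void delegate(ref T) dg)
      {
         mtx.lock();
         scope(exit) mtx.unlock();
         dg(payload);
      }
   }

   // usage:
   //    auto hits = Lockable!int(0);
   //    hits.withLock((ref int h) { h += 1; });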

Another possible option is to consider the stack frame as the "container",  
and if it contains any shared data, put in a hidden mutex.

In order to do this correctly, we need a way to hook synchronized properly  
from library code.

>> Now, would you agree that:
>>
>> auto v1 = synchronized p.i;
>>
>> might be a valid mechanism?  In other words, assuming p is lockable,  
>> synchronized p.i locks p, then reads i, then unlocks p, and the result  
>> type is unshared?
>
> I think I would prefer
>
>    auto v1 = synchronized(p).i;

This kind of makes synchronized a type constructor, which it is not.

> ie for the synchronized expression to lock the object, return an unshared
> reference, and the object be unlocked once this ref goes away. RAII. ;)
>
> Which would then also allow for
>
>    {
>       auto unshared_p = synchronized(p);
>       auto v1 = unshared_p.i;
>       auto v2 = unshared_p.p;
>       // etc
>    }

I think this can be done, but I would not want to use synchronized.  One  
of the main benefits of synchronized is it's a block attribute, not a type  
attribute.  So you can't actually abuse it.

The locked type I specify below might fit the bill.  But it would have to  
be hard-tied to the block.  In other words, we would have to make *very*  
certain it would not escape the block.  Kind of like inout.

>> Also, inside synchronized(p), p becomes tail-shared, meaning all data  
>> contained in p is unshared, all data referred to by p remains shared.
>>
>> In this case, we'd need a new type constructor (e.g. locked) to  
>> formalize the type.
>
> I should have read to the end, I guess. :)
>
> You mean something like I described above, only done by mutating
> the type of 'p'? That might work too.

Right, any accesses to p *inside* the block "magically" become locked(S)  
instead of shared(S).  We have to make certain locked(S) instances cannot  
escape, and we already do something like this with inout -- just don't  
allow members or static variables to be typed as locked(T).

I like replacing the symbol because then it doesn't allow you access to  
the outer symbol (although you can get around this, it should be made  
difficult).  As long as the locks are reentrant, it shouldn't pose a large  
problem, but obviously you should try and avoid locking the same data over  
and over again.

One interesting thing: synchronized methods now would mark this as  
locked(typeof(this)) instead of typeof(this).  So you can *avoid* the  
locking and unlocking code while calling member functions, while  
preserving it for the first call.

This is important -- you don't want to escape a reference to the unlocked  
type somewhere.

-Steve
June 11, 2012
Re: valid uses of shared
On 06/11/12 22:21, Steven Schveighoffer wrote:
>>> Now, would you agree that:
>>>
>>> auto v1 = synchronized p.i;
>>>
>>> might be a valid mechanism?  In other words, assuming p is lockable, synchronized p.i locks p, then reads i, then unlocks p, and the result type is unshared?
>>
>> I think I would prefer
>>
>>    auto v1 = synchronized(p).i;
> 
> This kind of makes synchronized a type constructor, which it is not.

Yes; the suggestion was to also allow synchronized /expressions/, in addition
to statements. 


>> ie for the synchronized expression to lock the object, return an unshared
>> reference, and the object be unlocked once this ref goes away. RAII. ;)
>>
>> Which would then also allow for
>>
>>    {
>>       auto unshared_p = synchronized(p);
>>       auto v1 = unshared_p.i;
>>       auto v2 = unshared_p.p;
>>       // etc
>>    }
> 
> I think this can be done, but I would not want to use synchronized.  One of the main benefits of synchronized is it's a block attribute, not a type attribute.  So you can't actually abuse it.

There's a precedent, mixin expressions.

However, there's no need to invent new constructs, as this already works:

   {
      auto unshared_p = p.locked;
      auto v1 = unshared_p.i;
      auto v2 = unshared_p.p;
      // etc
   }

and does not require compiler or language changes.

I'm using this idiom with mutexes and semaphores; the 'locked' implementation
is *extremely* fragile. It's very easy to confuse the compiler, which then
spits out nonsensical error messages and refuses to cooperate. But the above
should already be possible; only the return type could be problematic, and keeping
'p' opaque would be best. I'll play with this when I find some time.
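
The general shape of such a 'locked' helper, assuming the target struct carries its own Mutex in a field named 'mtx' (that field name is just for this sketch; it is not the implementation I'm actually using):

   import core.sync.mutex : Mutex;

   struct Locked(T)
   {
      private T* payload;     // unshared view, valid while the lock is held
      private Mutex mtx;

      @disable this(this);    // not copyable, so the lock can't leak via copies

      this(shared(T)* p)
      {
         auto raw = cast(T*) p;   // strip shared to reach the mutex
         mtx = raw.mtx;           // assumes T has a Mutex field named 'mtx'
         mtx.lock();
         payload = raw;
      }

      ~this() { mtx.unlock(); }

      ref T get() { return *payload; }
      alias get this;              // forwards unshared_p.i, unshared_p.p, ...
   }

   auto locked(T)(shared(T)* p) { return Locked!T(p); }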

But 'synchronized' and 'shared' are really two different things, I probably
shouldn't have used your original example as a base, as it only added to the
confusion, sorry.

'synchronized' allows you to implement critical sections.
'shared' is just a way to mark some data as needing special treatment.

If all accesses to an object are protected by 'synchronized', either
explicitly or implicitly (by using a struct or class marked as
synchronized) then you don't need to mark the data as 'shared' at all.
It would be pointless - the thread that owns the lock also owns the
data.

'shared' is what lets you implement the locking primitives used by
synchronized and various lock-free schemes. (right now 'shared' alone
isn't powerful enough, yes)
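
For example, a toy spinlock needs nothing beyond 'shared' and the atomic primitives (not production quality, just to show the layer 'shared' lives at):

   import core.atomic : atomicStore, cas;

   struct SpinLock
   {
      private shared bool busy;

      void lock() shared
      {
         while (!cas(&busy, false, true)) { /* spin until we grab it */ }
      }

      void unlock() shared
      {
         atomicStore(busy, false);
      }
   }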

You can use one or the other, sometimes even both, but they are not
directly tied to each other. So there's no need for 'synchronized'
to unshare anything, at least not in the simple mutex case.
Accessing objects both with and without holding a lock is extremely
rare.


> The locked type I specify below might fit the bill.  But it would have to be hard-tied to the block.  In other words, we would have to make *very* certain it would not escape the block.  Kind of like inout.     

   void f(scope S*);
   ...
   {
      auto locked_p = p.locked;
      f(locked_p.s);
   }

Requiring the signature to be 'void f(locked S*);' would not be a good
idea; this must continue to work and introducing another type would
exclude all code not specifically written with it in mind, like
practically all libraries.


> This is important -- you don't want to escape a reference to the unlocked type somewhere.

Yes, but it needs another solution. 'scope' might be enough, but right
now we'd have to trust the programmer completely...

(It's about not leaking refs to *inside* the locked object, not just
'p' (or 'locked_p') itself)

artur
June 11, 2012
Re: valid uses of shared
On 06/12/12 00:00, Artur Skawina wrote:
> On 06/11/12 22:21, Steven Schveighoffer wrote:
>>>> Now, would you agree that:
>>>>
>>>> auto v1 = synchronized p.i;
>>>>
>>>> might be a valid mechanism?  In other words, assuming p is lockable, synchronized p.i locks p, then reads i, then unlocks p, and the result type is unshared?

What I think you want is relatively simple, something like this:

  struct synchronized(m) S {
     int i;
     void *p;
     Mutex m;
  }

and then for S to be completely opaque, unless inside a synchronized
statement. So

  S* s = ...
  auto v1 = s.i; // "Error: access to 's.i' requires synchronization"
  synchronized (s) {
     auto v2 = s.i;
     // ...
  }
  auto v3 = s.p; // "Error: access to 's.p' requires synchronization"

and there's no 'shared' involved at all. 

Provided that no reference to a locked 's' can escape, this should be
enough to solve this problem.

Preventing the leaks while not unnecessarily restricting what can be done
inside the synchronized block would be a different problem. The obvious
solution would be to treat all refs gotten from or via 's' as scoped (and
trust the programmer; with time the enforcement can be improved), but
sometimes you will actually want to remove objects from a synchronized
container - so that must be possible too.

artur
June 12, 2012
Re: valid uses of shared
On 08/06/2012 01:51, Steven Schveighoffer wrote:
> 2. shared value types.

2. You can have value types on the heap, or value types that point to shared 
data.