January 02, 2015 What exactly shared means? | ||||
|---|---|---|---|---|
| ||||
I always thought that shared should be used to make a variable global across threads (similar to __gshared) with some synchronization protection. But this code doesn't work (the app gets stuck in _aaGetX or _aaRehash):
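    import std.math : log;
    import std.parallelism : parallel;

    shared double[size_t] logsA;

    void main() {
        auto logs = new double[1_000_000];
        foreach(i, ref elem; parallel(logs, 4)) {
            elem = log(i + 1.0);
            logsA[i] = elem; // unsynchronized write to the shared AA from several threads
        }
    }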
But when I add a synchronized block, it is OK:
    import std.math : log;
    import std.parallelism : parallel;

    shared double[size_t] logsA;

    void main() {
        auto logs = new double[1_000_000];
        foreach(i, ref elem; parallel(logs, 4)) {
            elem = log(i + 1.0);
            synchronized { // only one thread at a time may touch logsA
                logsA[i] = elem;
            }
        }
    }
| ||||
January 02, 2015 Re: What exactly shared means? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Daniel Kozak | On Friday, 2 January 2015 at 11:47:47 UTC, Daniel Kozak wrote:
> I always think that shared should be use to make variable global across threads (similar to __gshared) with some synchronize protection. But this code doesn't work (app is stuck on _aaGetX or _aaRehash ):
>
> But when I add synchronized block it is OK:
I am not aware of any changes since the following thread (see the second post):
http://forum.dlang.org/thread/brpbjefcgauuzguyiiwr@forum.dlang.org#post-mailman.679.1336909909.24740.digitalmars-d-learn:40puremagic.com
So AFAIK "shared" is currently nothing more than a compiler hint, despite the documentation suggesting otherwise (the second to last paragraph of the "__gshared" doc compares it to "shared", see http://dlang.org/attribute.html).
My current understanding is that you either use "__gshared" and do your own synchronisation, or you use thread-local storage, i.e. do not use "shared", but I would be happy to be proven wrong on that point.
BTW, you can use the following search to find more information (it is what I used to find the thread linked above):
https://www.google.com/?#q=site:forum.dlang.org+shared
| |||
January 02, 2015 Re: What exactly shared means? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Daniel Kozak | On Friday, January 02, 2015 11:47:46 Daniel Kozak via Digitalmars-d-learn wrote:
> I always think that shared should be use to make variable global across threads (similar to __gshared) with some synchronize protection. But this code doesn't work (app is stuck on _aaGetX or _aaRehash ):
>
> shared double[size_t] logsA;
>
> void main() {
>
> auto logs = new double[1_000_000];
>
> foreach(i, ref elem; parallel(logs, 4)) {
> elem = log(i + 1.0);
> logsA[i]= elem;
> }
> }
>
>
> But when I add synchronized block it is OK:
>
> shared double[size_t] logsA;
>
> void main() {
>
> auto logs = new double[1_000_000];
>
> foreach(i, ref elem; parallel(logs, 4)) {
> elem = log(i + 1.0);
> synchronized {
> logsA[i]= elem;
> }
> }
> }
Objects in D default to being thread-local. __gshared and shared both make it so that they're not thread-local. __gshared does it without actually changing the type, making it easier to use but also dangerous to use, because it makes it easy to violate the compiler's guarantees, because it'll treat it like a thread-local variable with regards to optimizations and whatnot. It's really only meant for use with C global variable declarations, but plenty of folks end up using it for more, because it avoids having the compiler complain at them like it does with shared. Regardless, if you use __gshared, you need to make sure that you protect it against being accessed by multiple threads at once using mutexes or synchronized blocks or whatnot.
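The thread-local default described above can be observed directly. The following is a minimal sketch, not part of the original post; the names tlsCounter, globalCounter and runChild are hypothetical. It assumes only core.thread from druntime, and it relies on Thread.join to order the child's writes before the parent's reads, which is why the __gshared variable needs no further locking here.

```d
import core.thread : Thread;

int tlsCounter;              // module-level variable: thread-local by default
__gshared int globalCounter; // one instance for the whole process, type is still plain int

/// Starts a child thread that writes to both variables and returns
/// the value of tlsCounter that the child saw when it started.
int runChild()
{
    int seen = -1;
    auto t = new Thread({
        seen = tlsCounter;   // the child's own copy, freshly initialized
        tlsCounter = 42;     // changes only the child's copy
        globalCounter = 42;  // changes the single process-wide instance
    });
    t.start();
    t.join();                // after join, the child's writes are visible here
    return seen;
}

void main()
{
    tlsCounter = 1;
    globalCounter = 1;
    assert(runChild() == 0);     // the child did not see the parent's 1
    assert(tlsCounter == 1);     // the parent's copy is untouched
    assert(globalCounter == 42); // the __gshared write is visible
}
```

With more than one thread writing to globalCounter at the same time, a mutex or synchronized block would be required, exactly as the post says.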
shared does not add any more synchronization or automatic mutex-locking or anything like that than __gshared does (IIRC, there is some talk in TDPL about shared adding memory barriers - which __gshared wouldn't do - but that hasn't been implemented and probably never would be, because it would be too expensive with regards to efficiency). However, unlike __gshared, shared _does_ alter the type of the variable, so the compiler will treat it differently. That way, it won't do stuff like optimize code under the assumption that the object is thread-local like it can do with non-shared objects. It makes it clear which objects are thread-local and which aren't and enforces that with the type system. In principle, this is great, since it clearly separates thread-local and non-thread local objects and protects you against treating a non-thread local object as if it were thread-local. And as long as you're writing code which operates specifically on shared variables rather than trying to use "normal" code with them, it works great. The problem is that you inevitably want to do things like use a function that takes thread-local variables on a shared object - e.g. if a type is written to be used as thread-local, then none of its member functions are shared, and none of them can be used by a shared object, which obviously makes using such a type as shared to be a bit of a pain.
In principle, D is supposed to provide ways to safely convert shared objects to thread-local ones - i.e. when the compiler can guarantee that the object is protected by a mutex or synchronized block or whatnot. The main way that this was proposed is what TDPL describes with regards to synchronized classes. The member variables of a synchronized class would be shared, and protected by the class, since all of its member functions would be synchronized, and no direct access to the member variables would be allowed, guaranteeing that any time the member variables were accessed, it would be within a synchronized function, meaning that the compiler could guarantee that all access to the member variables was protected by a mutex. So, the compiler would then be able to safely strip away the outermost layer of shared, allowing you to theoretically use the member variables with normal functions.
However, that only strips away the outermost layer (since that's all the compiler could guarantee was protected), which frequently wouldn't be enough, and it requires creating entire synchronized types just to use shared objects. So, the efficacy of the idea is questionable IMHO, much as the motivation is a good one (only removing shared when the compiler can guarantee that only one thread can access it). However, synchronized classes have yet to be implemented (only synchronized functions), so we don't currently have the ability to have the outermost layer of shared be stripped away like that. There is currently no place in the language where the compiler is able to guarantee that a shared object is sufficiently protected against access from multiple threads at once for it to be able to automatically remove shared under any circumstances.
The _only_ way to strip it away at this point is to cast it away explicitly. So, right now, what you're forced to do is something like
    shared T foo = funcThatReturnsSharedT();
    synchronized(someObj)
    {
        // be sure at this point that all other code that accesses foo
        // also synchronizes on someObj before accessing it.
        auto bar = cast(T)foo;
        // do something with bar like call normal member functions or
        // call normal free functions on it that don't take shared.
        // be sure at this point that there are no other thread-local
        // references to foo/bar remaining after whatever has been done to it
        // in this synchronized block has been done to it. All references
        // to it outside the synchronized block must be shared.
    }
    // now, there should only be shared references to foo.
Obviously, this is error-prone in that it's up to you to make sure that all accesses to the shared object are protected and that no thread-local reference to it escapes a synchronized block. Ideally, the compiler would be able to determine that shared could be stripped from the object within the synchronized block, but it has no way of knowing that all other references to it are properly protected as well (unlike it would with synchronized classes if they existed), so it can't do that. It's up to you to explicitly protect access to the shared variable and to make sure that no thread-local references to it escape that protection.
So, ultimately, when using shared, you currently have one of two options:
1. Only ever use a shared object with code that is specifically written for shared objects (e.g. classes where all of the member variables are shared). This avoids having to cast away shared, but you still need to use synchronized blocks or mutexes to protect access to any shared objects, and it can mean having to duplicate code that works with thread-local objects.
2. Cast away shared within a synchronized block (or when a mutex is locked) like in the example above.
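As an illustration of option 2, here is a sketch of how the code from the original question might be written. It is an assumption about one reasonable shape, not something taken from the thread: the helper name store is hypothetical, and the array is shortened to 10_000 elements to keep the run quick. The cast goes through a pointer so that an insertion which allocates or rehashes the table updates logsA itself rather than a thread-local copy of the AA reference.

```d
import std.math : log;
import std.parallelism : parallel;

shared double[size_t] logsA; // the shared AA from the original question

/// Hypothetical helper: the only code path that writes to logsA.
void store(size_t i, double value)
{
    synchronized // one lock for this statement, shared by all threads
    {
        // Inside the lock, treat the AA as thread-local by casting away shared.
        auto local = cast(double[size_t]*) &logsA;
        (*local)[i] = value;
        // No thread-local reference escapes this block.
    }
}

void main()
{
    auto logs = new double[10_000];
    foreach (i, ref elem; parallel(logs, 4))
    {
        elem = log(i + 1.0);
        store(i, elem);
    }
    // The parallel loop has finished, so only this thread touches logsA now.
    auto result = cast(double[size_t]*) &logsA;
    assert((*result).length == logs.length);
    assert((*result)[0] == 0.0); // log(1.0)
}
```

The safety of the cast rests entirely on the convention that nothing else reads or writes logsA while worker threads are running; the compiler does not check it.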
Ideally, the situation would be better than this, and we're pretty much all in agreement that we want to improve it, but we have yet to actually come up with a better solution. The unfortunate result is that a lot of folks just use __gshared rather than write code explicitly for shared objects or cast away shared within synchronized blocks. But while using shared "properly" is currently far more annoying than it should be, IMHO it's well worth the extra protection you get of knowing when objects are shared or thread-local. It only comes at the cost of having to make sure that all accesses to the variable are properly protected by a mutex or synchronized block and having to cast away shared within that area of protection, and except for the casting, that's exactly what you have to do in languages like C++ and Java anyway, except that in D, you know exactly what code involves shared objects, and it's nicely segregated, whereas in C++, it could easily be anywhere in your code and you wouldn't know it, because it's not part of the type system at all. So, even if shared is not yet where we want it to be, it's still a significant improvement over the likes of C++ and Java IMHO. But hopefully, the situation in D will improve in the future so that using shared isn't quite as unwieldy.
- Jonathan M Davis
| |||
January 02, 2015 Re: What exactly shared means? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Jonathan M Davis | On Friday, 2 January 2015 at 13:14:14 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:
> Objects in D default to being thread-local. __gshared and shared both make it so that they're not thread-local. __gshared does it without actually changing the type, making it easier to use but also dangerous to use, because it makes it easy to violate the compiler's guarantees, because it'll treat it like a thread-local variable with regards to optimizations and whatnot.
I'm pretty sure that's not true. __gshared corresponds to C-style globals, which are *not* assumed to be thread-local (see below).
> shared does not add any more synchronization or automatic mutex-locking or anything like that than __gshared does (IIRC, there is some talk in TDPL about shared adding memory barriers - which __gshared wouldn't do - but that hasn't been implemented and probably never would be, because it would be too expensive with regards to efficiency). However, unlike __gshared, shared _does_ alter the type of the variable, so the compiler will treat it differently. That way, it won't do stuff like optimize code under the assumption that the object is thread-local like it can do with non-shared objects.
Are you sure about all this optimisation stuff? I had (perhaps wrongly) assumed that __gshared and shared variables in D guaranteed Sequential Consistency for Data Race Free (SCDRF) and nothing more, just like all normal variables in C, C++ and Java.
Thread-local variables (i.e. everything else in D) could in theory be loosened up to allow some *extra* optimisations that C/C++/Java etc. don't normally support (because they would risk violating SCDRF), but I don't know how much of this is taken advantage of currently.
If I'm correct, then the advice to users would be "Use __gshared and pretend you're writing C/C++/Java, or use shared and do exactly the same but with type-system support for your convenience/frustration".
| |||
January 02, 2015 Re: What exactly shared means? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to John Colvin | On 1/2/15 2:47 PM, John Colvin wrote:
> Are you sure about all this optimisation stuff? I had (perhaps wrongly) assumed that __gshared and shared variables in D guaranteed Sequential Consistency for Data Race Free (SCDRF) and nothing more, just like all normal variables in C, C++ and Java.
There is nothing special about __gshared other than where it is put.
Real simple test:
    import core.stdc.stdlib : rand;

    __gshared int x;

    void main()
    {
        int xlocal;
        int *xp = (rand() % 2) ? &x : &xlocal;

        *xp = 5;
    }
tell me how the compiler can possibly know anything about what type of data xp points at?
But with shared, the type itself carries the hint that the data is shared between threads. At this point, this guarantees nothing in terms of races and ordering, which is why shared is so useless. In fact the only useful aspect of shared is that data not marked as shared is guaranteed thread local.
> If I'm correct, then the advice to users would be "Use __gshared and pretend you're writing C/C++/Java, or use shared and do exactly the same but with type-system support for your convenience/frustration".
Use __gshared for accessing C globals, and otherwise only if you know what you are doing. There are many aspects of D that make assumptions based on whether a type is shared or not.
-Steve
| |||
January 02, 2015 Re: What exactly shared means? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Friday, 2 January 2015 at 20:32:51 UTC, Steven Schveighoffer wrote:
> On 1/2/15 2:47 PM, John Colvin wrote:
>
>>
>> Are you sure about all this optimisation stuff? I had (perhaps wrongly)
>> assumed that __gshared and shared variables in D guaranteed Sequential
>> Consistency for Data Race Free (SCDRF) and nothing more, just like all
>> normal variables in C, C++ and Java.
>
> There is nothing special about __gshared other than where it is put.
>
> Real simple test:
>
> __gshared int x;
>
> void main()
> {
> int xlocal;
> int *xp = (rand() % 2) ? &x : &xlocal;
>
> *xp = 5;
> }
>
> tell me how the compiler can possibly know anything about what type of data xp points at?
>
> But with shared, the type itself carries the hint that the data is shared between threads. At this point, this guarantees nothing in terms of races and ordering, which is why shared is so useless. In fact the only useful aspect of shared is that data not marked as shared is guaranteed thread local.
>
>> If I'm correct, then the advice to users would be "Use __gshared and
>> pretend you're writing C/C++/Java, or use shared and do exactly the same
>> but with type-system support for your convenience/frustration".
>
> Use __gshared for accessing C globals, and otherwise only if you know what you are doing. There are many aspects of D that make assumptions based on whether a type is shared or not.
>
> -Steve
Perhaps a more precise statement of affairs would be this:
All variables/data are SC-DRF with the exception of static variables and globals, which are thread-local. `shared` exists only to express via the type-system the necessity of thread-safe usage, without prescribing or implementing said usage.
Hmm. I went in to writing that thinking "shared isn't so bad". Now I've thought about it, it is pretty damn useless. What's the point of knowing that data is shared without knowing how to safely use it? I guess it protects against completely naive usage.
Couldn't we have thread-safe access encapsulated within types a-la std::atomic?
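For what it's worth, a wrapper along those lines can be sketched on top of druntime's core.atomic. The Atomic struct below is hypothetical (it is not a Phobos or druntime type) and is restricted to integral types to keep the sketch small; atomicLoad, atomicStore and atomicOp are the existing core.atomic functions, used with their default sequentially consistent ordering.

```d
import core.atomic : atomicLoad, atomicOp, atomicStore;
import std.parallelism : parallel;
import std.range : iota;

/// Hypothetical wrapper in the spirit of std::atomic.
/// All members are shared, so they can only be called on shared instances.
struct Atomic(T) if (__traits(isIntegral, T))
{
    private T value; // seen as shared(T) inside the shared member functions

    T load() shared { return atomicLoad(value); }
    void store(T v) shared { atomicStore(value, v); }

    /// Atomically adds delta and returns the new value.
    T add(T delta) shared { return atomicOp!"+="(value, delta); }
}

shared Atomic!int hits; // hypothetical counter shared by all worker threads

void main()
{
    // Each iteration may run on a different thread of the task pool.
    foreach (i; parallel(iota(1000)))
        hits.add(1);
    assert(hits.load() == 1000);
}
```

Because every access goes through the atomic member functions, the type encapsulates the safe usage that a bare shared int leaves to the caller.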
| |||
January 02, 2015 Re: What exactly shared means? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to John Colvin | On Friday, 2 January 2015 at 21:06:03 UTC, John Colvin wrote:
> Hmm. I went in to writing that thinking "shared isn't so bad". Now I've thought about it, it is pretty damn useless. What's the point of knowing that data is shared without knowing how to safely use it? I guess it protects against completely naive usage.
The real issue with "shared" is that objects may change status during runtime based on the state of the program.
What you really want to know is when a parameter is "local", that is, guaranteed to not be accessed by another thread during the execution of the function. If so you open up for optimizations.
| |||
January 02, 2015 Re: What exactly shared means? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Ola Fosheim Grøstad | On Friday, 2 January 2015 at 22:10:36 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 2 January 2015 at 21:06:03 UTC, John Colvin wrote:
>> Hmm. I went in to writing that thinking "shared isn't so bad". Now I've thought about it, it is pretty damn useless. What's the point of knowing that data is shared without knowing how to safely use it? I guess it protects against completely naive usage.
>
> The real issue with "shared" is that objects may change status during runtime based on the state of the program.
>
> What you really want to know is when a parameter is "local", that is, guaranteed to not be accessed by another thread during the execution of the function. If so you open up for optimizations.
What significant optimisations does SC-DRF actually prevent?
| |||
January 02, 2015 Re: What exactly shared means? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to John Colvin | On Friday, January 02, 2015 19:47:50 John Colvin via Digitalmars-d-learn wrote:
> On Friday, 2 January 2015 at 13:14:14 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:
> > Objects in D default to being thread-local. __gshared and
> > shared both make
> > it so that they're not thread-local. __gshared does it without
> > actually
> > changing the type, making it easier to use but also dangerous
> > to use,
> > because it makes it easy to violate the compiler's guarantees,
> > because it'll
> > treat it like a thread-local variable with regards to
> > optimizations and
> > whatnot.
>
> I'm pretty sure that's not true. __gshared corresponds to C-style globals, which are *not* assumed to be thread-local (see below).
No, the type system will treat __gshared like a thread-local variable. It gets put in shared memory like a C global would be, but __gshared isn't actually part of the type, so the compiler has no way of knowing that it's anything other than a thread-local variable - which is precisely why it's so dangerous to use it instead of shared. For instance,
    __gshared int* foo;
    void main()
    {
        foo = new int;
        int* bar = foo;
    }
will compile just fine, whereas if you used shared, it wouldn't.
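The difference can be checked at compile time. This is a small sketch with hypothetical variable names (gPtr, sPtr); it uses only built-in is-expressions and __traits(compiles), so nothing here depends on a library.

```d
__gshared int* gPtr; // storage class only: the declared type stays int*
shared int* sPtr;    // shared is part of the type: shared(int*)

// __gshared leaves no trace in the type system.
static assert(is(typeof(gPtr) == int*));

// shared changes the type of the variable.
static assert(is(typeof(sPtr) == shared(int*)));

// A shared pointer does not implicitly convert to a thread-local one,
// so the assignment from the post is rejected for the shared variable.
static assert(!is(shared(int*) : int*));
static assert(!__traits(compiles, { int* bar = sPtr; }));

void main()
{
    gPtr = new int;
    int* bar = gPtr; // accepted: the compiler sees an ordinary int*
    assert(bar is gPtr);
}
```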
- Jonathan M Davis
| |||
January 02, 2015 Re: What exactly shared means? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Friday, January 02, 2015 15:32:51 Steven Schveighoffer via Digitalmars-d-learn wrote:
> In fact the
> only useful aspect of shared is that data not marked as shared is
> guaranteed thread local.
That and the fact that you're supposed to be able to know which portions of your program are operating on shared data quite easily that way, as opposed to it potentially being scattered everywhere through a program like it can be in languages like C++ or Java. But it definitely doesn't provide any of the kinds of compiler guarantees that we all wanted it to. The result is that it's arguably a bit like C++'s const in that it helps, but it really doesn't ultimately provide strong guarantees. Personally, I think that we're still far better off using shared the way it is than using __gshared or being stuck with what C++ and the like have, but there's no question that it's not where we want it to be.
- Jonathan M Davis
| |||
Copyright © 1999-2021 by the D Language Foundation