Jump to page: 1 2 3
Thread overview
Mallocator and 'shared'
Feb 10, 2017
bitwise
Feb 11, 2017
Michael Coulombe
Feb 12, 2017
bitwise
Feb 13, 2017
Moritz Maxeiner
Feb 13, 2017
ag0aep6g
Feb 13, 2017
Moritz Maxeiner
Feb 13, 2017
ag0aep6g
Feb 13, 2017
Kagamin
Feb 13, 2017
Moritz Maxeiner
Feb 14, 2017
Johannes Pfau
Feb 14, 2017
Moritz Maxeiner
Feb 14, 2017
Moritz Maxeiner
Feb 14, 2017
Johannes Pfau
Feb 14, 2017
Kagamin
Feb 14, 2017
Johannes Pfau
Feb 14, 2017
Kagamin
Feb 14, 2017
Kagamin
Feb 14, 2017
Moritz Maxeiner
Feb 15, 2017
Kagamin
Feb 15, 2017
Moritz Maxeiner
Feb 11, 2017
Nicholas Wilson
February 10, 2017
https://github.com/dlang/phobos/blob/cd7846eb96ea7d2fa65ccb04b4ca5d5b0d1d4a63/std/experimental/allocator/mallocator.d#L63-L65

Looking at Mallocator, the use of 'shared' doesn't seem correct to me.

The logic stated in the comment above is that 'malloc' is thread safe, and therefore all methods of Mallocator can be qualified with 'shared'.

I thought that qualifying a method as 'shared' meant that it _can_ touch shared memory, and is therefore _not_ thread safe.


The following program produces this error:
"Error: shared method Mallocator.allocate is not callable using a non-shared object"

import std.experimental.allocator.mallocator;

int main(string[] argv) {
    Mallocator m;
    m.allocate(64);
    return 0;
}

And the above error is because it would be un(thread)safe to call those methods from a non-shared context, due to the fact that they may access shared memory.

Am I wrong here?

February 11, 2017
On Friday, 10 February 2017 at 23:57:18 UTC, bitwise wrote:
> https://github.com/dlang/phobos/blob/cd7846eb96ea7d2fa65ccb04b4ca5d5b0d1d4a63/std/experimental/allocator/mallocator.d#L63-L65
>
> Looking at Mallocator, the use of 'shared' doesn't seem correct to me.
>
> The logic stated in the comment above is that 'malloc' is thread safe, and therefore all methods of Mallocator can be qualified with 'shared'.
>
> I thought that qualifying a method as 'shared' meant that it _can_ touch shared memory, and is therefore _not_ thread safe.
>
>
> The following program produces this error:
> "Error: shared method Mallocator.allocate is not callable using a non-shared object"
>
> import std.experimental.allocator.mallocator;
>
> int main(string[] argv) {
>     Mallocator m;
>     m.allocate(64);
>     return 0;
> }
>
> And the above error is because it would be un(thread)safe to call those methods from a non-shared context, due to the fact that they may access shared memory.
>
> Am I wrong here?

A shared method means that it can only be called on a shared instance of the struct/class, which will have shared fields. A shared method should be logically thread-safe, but that cannot be guaranteed by the compiler. A non-shared method can touch shared memory, and thus should be thread-safe if it does, but can only be called on a non-shared instance with possibly non-shared fields.

shared/non-shared methods don't mix because you generally need to use different, less-efficient instructions and algorithms to be thread-safe and scalable in a shared method. In the case of Mallocator, there are no fields so as far as I can tell the attribute doesn't do much except for documentation and for storing references to it in other shared structs/objects.
February 11, 2017
On Friday, 10 February 2017 at 23:57:18 UTC, bitwise wrote:
> https://github.com/dlang/phobos/blob/cd7846eb96ea7d2fa65ccb04b4ca5d5b0d1d4a63/std/experimental/allocator/mallocator.d#L63-L65
>
> Looking at Mallocator, the use of 'shared' doesn't seem correct to me.
>
> [...]

IIRC you're supposed to use `Mallocator.instance` as it is a singleton.
February 12, 2017
On Saturday, 11 February 2017 at 04:32:37 UTC, Michael Coulombe wrote:
> On Friday, 10 February 2017 at 23:57:18 UTC, bitwise wrote:
>> [...]
>
> A shared method means that it can only be called on a shared instance of the struct/class, which will have shared fields. A shared method should be logically thread-safe, but that cannot be guaranteed by the compiler. A non-shared method can touch shared memory, and thus should be thread-safe if it does, but can only be called on a non-shared instance with possibly non-shared fields.
>
> shared/non-shared methods don't mix because you generally need to use different, less-efficient instructions and algorithms to be thread-safe and scalable in a shared method. In the case of Mallocator, there are no fields so as far as I can tell the attribute doesn't do much except for documentation and for storing references to it in other shared structs/objects.

Thanks for the explanation, but I'm still confused.

It seems like you're saying that 'shared' should mean both 'thread safe' and 'not thread safe' depending on context, which doesn't make sense.

Example:

shared A a;

struct A {
    int x, y;

    void foo() shared {
        a.x = 1;
    }
}

int main(string[] argv) {
    a.x = 5;
    a.y = 5;
    a.foo();
    return 0;
}

Qualifying 'a' with 'shared' means that it's shared between threads, which means that accessing it is _not_ thread safe. Since the method 'foo' accesses 'a', 'foo' is also _not_ thread safe. Given that both the data and the method are 'shared', a caller should know that race conditions are possible and that they should aquire a lock before accessing either of them...or so it would seem.

I imagine that qualifying a method with 'shared' should mean that it can access shared data, and hence, is _not_ thread safe. This prevent access to 'shared' data from any non 'shared' context, without some kind of bridge/cast that a programmer would use when they knew that they had aquired the lock or ownership of the data. Although this is what would make sense to me, it doesn't seem to match with the current implementation of 'shared', or what you're saying.

It seems that methods qualified with 'shared' may be what you're suggesting matches up with the 'bridge' I'm trying to describe, but again, using the word 'shared' to mean both 'thread safe' and 'not thread safe' doesn't make sense. Firstly, because the same keyword should not mean two strictly opposite things. Also, if a 'shared' method is supposed to be thread safe, then the fact that it has access to shared data is irrelevant to the caller. So 'shared' as a method qualifier doesn't really make sense. What would make more sense is to have a region where 'shared' data could be accessed - Maybe something like this:

struct S {
    shared int x;
    Lock lk;

    private void addNum(int n) shared {
        x += num;
    }

    int add(int a, int b)
    {
        shared {
            lk.lock();
            addNum(a);
            addNum(b);
            lk.unlock();
        }
    }
}

So above,
1) 'x' would be shared, and mutating it would not thread safe.
2) 'addNum' would have access to 'shared' data, and also be non-thread-safe
3) 'x' and 'addNum' would be inaccessible from 'add' since they're 'shared'
4) a 'shared' block inside 'add' would allow access to 'x' or 'addNum', with the responsibility being on the programmer to lock.
5) alternatively, 'shared' data could be accessed from within a 'synchronized' block.

I thought 'shared' was a finished feature, but it's starting to seem like it's a WIP. This kind of feature seems like it has great potential, but is mostly useless in it's current state. After more testing with shared, it seems that 'shared' data is mutable from many contexts, from which it would be unsafe to mutate it without locking first, which basically removes any gauruntee that would make 'shared' useful.

Again, tell me if I'm wrong here, but there seems to be a lot of holes in 'shared'.

  Thanks
February 13, 2017
On Sunday, 12 February 2017 at 20:08:05 UTC, bitwise wrote:
> It seems that methods qualified with 'shared' may be what you're suggesting matches up with the 'bridge' I'm trying to describe, but again, using the word 'shared' to mean both 'thread safe' and 'not thread safe' doesn't make sense. [...]

For essentially all that follows, refer to [1][2]
`shared` (as well as `__gshared`) on a variable has the semantics of multiple threads sharing one single memory location for that variable (i.e. it will not be put into thread local storage). Accessing such data directly is inherently not thread safe. Period. You will need some form of synchronization (see [3]) to access such data in a thread safe manner.
Now, `shared` is supposed to additionally provide memory barriers, so that reads/writes on such variables are guaranteed not to be reordered in a way that breaks your algorithm; remember, the compiler (and also later the cpu when it reorders the opcode) is allowed to reorder reads/writes to a memory location to be more efficient, as long as doing so won't change the logic as the compiler/or cpu sees it. Example:

__gshared int a = 0;

// thread 1:
a = 1;
int b = a;
writeln(b);

// thread 2:
a += 1;

In the above, you may expect `b` to be either 1, or 2, depending on how the cpu interleaves the memory access, but it can, in fact, also be 0, since neither the compiler, nor the cpu can detect any reason as to why `a = 1` should need to come before `int b = a` and may thus reorder the write and the read. Memory barriers prevent such reordering in the cpu and if we had made `a` `shared` those barriers would've been supposed to be emitted by the compiler (in addition to not reordering them itself). Unfortunately, that emission is not implemented.

From [4]:
> Non-static member functions can have, in addition to the usual FunctionAttributes, the attributes const, immutable, shared, or inout. These attributes apply to the hidden this parameter.

Thus a member function being `shared` means nothing more than that the instance it is called on must also be `shared`, i.e.

class Foo
{
    shared void bar();
}

Foo foo;
foo.bar();    // this is illegal, `foo` (the hidden `this` of `bar`) is not shared

shared Foo foobar;
foobar.bar(); // this is legal, since `foobar` is shared

That's it, there are no two meanings of `shared` depending on some context, there is only one: The data in question, which is either the attributed variable, or the object/instance of the member function being attributed, is shared between threads and accessing it directly is not thread safe.

On Sunday, 12 February 2017 at 20:08:05 UTC, bitwise wrote:
> I thought 'shared' was a finished feature, but it's starting to seem like it's a WIP.

I prefer the term "unfinished" since "WIP" implies that it's being worked on. AFAIK there's no one currently working on implementing what's missing in the compiler frontend with regards to the spec.

On Sunday, 12 February 2017 at 20:08:05 UTC, bitwise wrote:
> This kind of feature seems like it has great potential, but is mostly useless in it's current state.

I share that opinion and generally either use `__gshared` if I absolutely have to share data via shared memory and design carefully to avoid all the potential issues, or - which I much prefer - use message passing: `std.concurrency` is your friend.

On Sunday, 12 February 2017 at 20:08:05 UTC, bitwise wrote:
> After more testing with shared, it seems that 'shared' data is mutable from many contexts, from which it would be unsafe to mutate it without locking first, which basically removes any gauruntee that would make 'shared' useful.

As pointed out above that's to be expected, since that's its job. Regarding guarantees: Since D treats data as thread local by default, you need either `shared` or `__gshared` to have mutable shared (intra-process) memory (ignoring OS facilities for inter-process shared memory). The main advantage is not in data being `shared`/`__gshared`, but in the guarantees that all the other (unattributed, thread local) data gets: Each thread has its own copies and any optimisations applied to code that accesses them need not consider multiple threads (I'd wager this is a significant boon towards D's fast compile times).
If you only talk about useful benefits of `shared` over `__gshared`, if the spec were properly implemented, the useful properties would include you not needing to worry about memory barriers. Other useful guaranties are the more rigorous type checks, when compared to `__gshared`, which are supposed to prevent you from committing some of the common mistakes occurring in non-sequential programming (see, e.g. the code example with `class Foo` above).

On Sunday, 12 February 2017 at 20:08:05 UTC, bitwise wrote:
> Again, tell me if I'm wrong here, but there seems to be a lot of holes in 'shared'.

There are holes in the implementation of `shared`; it's spec, however, is complete and coherent.

[1] https://dlang.org/faq.html#shared_guarantees
[2] https://dlang.org/spec/attribute.html#shared
[3] https://tour.dlang.org/tour/en/multithreading/synchronization-sharing
[4] https://dlang.org/spec/class.html
February 13, 2017
On Sunday, 12 February 2017 at 20:08:05 UTC, bitwise wrote:
> It seems like you're saying that 'shared' should mean both 'thread safe' and 'not thread safe' depending on context, which doesn't make sense.

Makes sense to me: A `shared` variable is shared among threads. Accesses are not thread-safe. When a function has a `shared` parameter, it's expected to access that variable in a thread-safe manner. A `shared` method is a function with a `shared` `this` parameter.

Considering the alternative, that functions are not expected to ensure thread-safety on their `shared` parameters, that would mean you have to ensure it at the call site. And then what would be the point of marking the parameter `shared`, if thread-safety is already ensured?

> Example:
>
> shared A a;
>
> struct A {
>     int x, y;
>
>     void foo() shared {
>         a.x = 1;
>     }
> }
>
> int main(string[] argv) {
>     a.x = 5;
>     a.y = 5;
>     a.foo();
>     return 0;
> }
>
> Qualifying 'a' with 'shared' means that it's shared between threads, which means that accessing it is _not_ thread safe.

Yup. In my opinion, non-atomic accesses like that should be rejected by the compiler. But you can write race conditions with only atomic stores/loads, so in the end it's the programmer's responsibility to write correct code.

> Since the method 'foo' accesses 'a', 'foo' is also _not_ thread safe.

Well, yes, that `foo` isn't thread-safe. But it should be written differently so that it is.

> Given that both the data and the method are 'shared', a caller should know that race conditions are possible and that they should aquire a lock before accessing either of them...or so it would seem.

But when you have the lock, you can safely call any method, including non-`shared` ones. I see no point in distinguishing `shared` and unshared methods then.

Non-`shared` methods are obviously not safe to call on `shared` objects. So `shared` methods must be other thing: safe.

> I imagine that qualifying a method with 'shared' should mean that it can access shared data, and hence, is _not_ thread safe.

Every function/method can access shared data. They're all expected to do it safely. The `shared` attribute just qualifies the `this` reference.

> This prevent access to 'shared' data from any non 'shared' context, without some kind of bridge/cast that a programmer would use when they knew that they had aquired the lock or ownership of the data. Although this is what would make sense to me, it doesn't seem to match with the current implementation of 'shared', or what you're saying.

It wouldn't exactly "prevent" it, would it? The compiler can't check that you've got the correct lock. It would be expected of the programmer to do so before calling the `shared` method. That's easy to get wrong.

When `shared` methods are safe themselves, you can't get the calls to them wrong. The ugly is nicely contained. To call an unsafe method, you have to cast and that's a good indicator that you're entering the danger zone.

> It seems that methods qualified with 'shared' may be what you're suggesting matches up with the 'bridge' I'm trying to describe, but again, using the word 'shared' to mean both 'thread safe' and 'not thread safe' doesn't make sense.

Maybe don't think of it meaning "safe" or "unsafe" then. It just means "shared".

A `shared` variable is just that: shared. The way you deal with it can be thread-safe or not. Everyone is expected to deal with it safely, though. "Everyone" includes `shared` methods.

> Firstly, because the same keyword should not mean two strictly opposite things. Also, if a 'shared' method is supposed to be thread safe, then the fact that it has access to shared data is irrelevant to the caller.

Non-`shared` methods are not thread-safe. They expect unshared data. You can still call them on shared objects, though, with a cast. And when you've ensured thread-safety beforehand, it's perfectly fine to do so.

If `shared` methods were unsafe too, then that would only allow calling unsafe code without a cast. Doesn't seem like an improvement.

> So 'shared' as a method qualifier doesn't really make sense. What would make more sense is to have a region where 'shared' data could be accessed - Maybe something like this:
>
> struct S {
>     shared int x;
>     Lock lk;
>
>     private void addNum(int n) shared {
>         x += num;
>     }
>
>     int add(int a, int b)
>     {
>         shared {
>             lk.lock();
>             addNum(a);
>             addNum(b);
>             lk.unlock();
>         }
>     }
> }
>
> So above,
> 1) 'x' would be shared, and mutating it would not thread safe.

As it is now.

> 2) 'addNum' would have access to 'shared' data, and also be non-thread-safe

Today, non-`shared` methods are unsafe, and they can access shared data just like `shared` methods. But I imagine you'd have `shared` methods alter `shared` data freely, without casting.

> 3) 'x' and 'addNum' would be inaccessible from 'add' since they're 'shared'

As it is now. Can't just call a `shared` method from a non-`shared` one.

> 4) a 'shared' block inside 'add' would allow access to 'x' or 'addNum', with the responsibility being on the programmer to lock.

So the `shared` block as a whole is thread-safe and it's the programmer's duty to make sure of that. While today, a `shared` method as a whole is thread-safe and it's the programmer's duty to make sure of that. Not much of a difference, is it?

> 5) alternatively, 'shared' data could be accessed from within a 'synchronized' block.
>
> I thought 'shared' was a finished feature, but it's starting to seem like it's a WIP. This kind of feature seems like it has great potential, but is mostly useless in it's current state.

That may be so or not. I don't think you've made an argument for "unfinished" or "useless", though. You've argued "inconsistent", and maybe "surprising" or simply "bad". I wouldn't expect further development of the feature to meet your vision.

> After more testing with shared, it seems that 'shared' data is mutable from many contexts, from which it would be unsafe to mutate it without locking first, which basically removes any gauruntee that would make 'shared' useful.

There is no guarantee of thread-safety, yes. There cannot be, as far as I understand, because the compiler cannot know which operations must happen without interruption.

However, as I've said above, I'd like non-atomic accesses of `shared` variables to be rejected. Non-atomic increment is being rejected, so it makes no sense to me that non-atomic writes and reads are allowed.

Overall, I think `shared` is solid. It's not magic. It mainly prevents you from accidentally doing unsafe stuff by highlighting shared data and forcing you to cast or use special functions that deal safely with shared data.
February 13, 2017
On 02/13/2017 01:27 AM, Moritz Maxeiner wrote:
> __gshared int a = 0;
>
> // thread 1:
> a = 1;
> int b = a;
> writeln(b);
>
> // thread 2:
> a += 1;
>
> In the above, you may expect `b` to be either 1, or 2, depending on how
> the cpu interleaves the memory access, but it can, in fact, also be 0,
> since neither the compiler, nor the cpu can detect any reason as to why
> `a = 1` should need to come before `int b = a` and may thus reorder the
> write and the read.

This doesn't make sense to me. b depends on a. If I run thread 1 alone, I can expect b to be 1, no?  Thread 2 can then a) read 0, write 1; or b) read 1, write 2. How can b be 0 when the writeln is executed?

An example like this makes more sense to me:

----
shared int a = 0, b = 0;

// Thread 1:
a = 1;
b = 2;

// Thread 2:
writeln(a + b);
----

One might think that thread 2 cannot print "2", because from the order of statements the numbers must be 0 + 0 or 1 + 0 or 1 + 2. But thread 1 might execute `b = 2` before `a = 1`, because the order doesn't matter to thread 1. So 0 + 2 is possible, too.
February 13, 2017
On Monday, 13 February 2017 at 01:30:57 UTC, ag0aep6g wrote:
> This doesn't make sense to me. b depends on a. If I run thread 1 alone, I can expect b to be 1, no?  Thread 2 can then a) read 0, write 1; or b) read 1, write 2. How can b be 0 when the writeln is executed?
> 
> An example like this makes more sense to me:
>
> ----
> shared int a = 0, b = 0;
>
> // Thread 1:
> a = 1;
> b = 2;
>
> // Thread 2:
> writeln(a + b);
> ----
>
> One might think that thread 2 cannot print "2", because from the order of statements the numbers must be 0 + 0 or 1 + 0 or 1 + 2. But thread 1 might execute `b = 2` before `a = 1`, because the order doesn't matter to thread 1. So 0 + 2 is possible, too.

You're right, of course, and I shall do well to remember not to think up examples for non-sequential code at such an hour, I am sorry. Thank you for providing a correct example plus explanation. The rest of my post still stands, though.
In recompense I shall provide another example, this one translated from Wikipedia[1] instead:

__gshared int f = 0, x = 0;

// thread 1
while (f == 0);
// Memory barrier required here
writeln(x)

// thread 2
x = 42;
// Memory barrier required here
f = 1;

The above demonstrates a case where you do need a memory barrier:
thread 1 and 2 have a consumer/producer relationship, where thread 1 wants to consume a value from thread 2 via `x`, using `f` to be notified that `x` is ready to be consumed;
Without memory barriers at both of the indicated lines the cpu is free to reorder either thread:
The first is required so that thread 1 doesn't get reordered to consume before being notified and the second so that thread 2 doesn't get reordered to signal thread 1 before having produced.

If we had made `f` and `x` `shared` instead of `__gshared` the spec would require (at least) the two indicated memory barriers being emitted. Currently, though, it won't and for this case (AFAIK) `shared` won't get you any benefit I'm aware of over `__gshared`. You'll still need to add those memory barriers (probably using inline assembler, though I'm not sure what's the best way is in D, since I usually just stick with message passing).

[1] https://en.wikipedia.org/wiki/Memory_barrier
February 13, 2017
On Sunday, 12 February 2017 at 20:08:05 UTC, bitwise wrote:
> Given that both the data and the method are 'shared', a caller should know that race conditions are possible and that they should aquire a lock before accessing either of them...or so it would seem.

Thread unsafe methods shouldn't be marked shared, it doesn't make sense. If you don't want to provide thread-safe interface, don't mark methods as shared, so they will not be callable on a shared instance and thus the user will be unable to use the shared object instance and hence will know the object is thread unsafe and needs manual synchronization.

ps Memory barriers are a bad idea because they don't defend from a race condition, but they look like they do :) use std.concurrency for a simple and safe concurrency, that's what it's made for.
February 13, 2017
On Monday, 13 February 2017 at 14:20:05 UTC, Kagamin wrote:
> Thread unsafe methods shouldn't be marked shared, it doesn't make sense. If you don't want to provide thread-safe interface, don't mark methods as shared, so they will not be callable on a shared instance and thus the user will be unable to use the shared object instance and hence will know the object is thread unsafe and needs manual synchronization.

To be clear: While I might, in general, agree that using shared methods only for thread safe methods seems to be a sensible restriction, neither language nor compiler require it to be so; and absence of evidence of a useful application is not evidence of absence.

On Monday, 13 February 2017 at 14:20:05 UTC, Kagamin wrote:
> ps Memory barriers are a bad idea because they don't defend from a race condition, but they look like they do :)

There are two very common pitfalls in non-sequential programming with regards to reads/writes to memory shared between threads:
Issue 1: Sequencing/Interleaving of several threads into the logical memory access order
Issue 2: Reordering of code within one thread

Code that changes semantics because of issue 1 has race conditions; fixing it requires synchronization primitives, such as locking opcode, transactional memory, etc.

Code that changes semantics because of issue 2 may or may not have race conditions, but it definitely requires memory barriers.

Claiming that memory barriers are a bad idea because they don't defend against race conditions, but look like they do (when that's what synchronization is for) is similar enough to saying airbags in cars are a bad idea because they don't keep your body in place, but look like they do (when that's what seat belts are for).
My point here being that I don't understand what made you state that memory barriers look like they deal with race conditions, as they have nothing to do with that.

To be clear: Synchronization (the fix for race conditions) does not help you to deal with issue 2. If my last example had instead been

---
__gshared int f = 0, x = 0;
Object monitor;

// thread 1
synchronized (monitor) while (f == 0);
// Memory barrier required here
synchronized (monitor) writeln(x)

// thread 2
synchronized (monitor) x = 42;
// Memory barrier required here
synchronized (monitor) f = 1;
---

you'd still need those memory barriers. Also note that the synchronization in the above is not needed in terms of semantics. The code has no race conditions, all permutations of the (interleaved) memory access order yield the same output from thread 1. Also, since synchronization primitives and memory barriers have different runtime costs, depending on your hardware support and how they are translated to that support from D, there's no "one size fits all" solution on the low level we're on here.

My opinion on the matter of `shared` emitting memory barriers is that either the spec and documentation[1] should be updated to reflect that sequential consistency is a non-goal of `shared` (and if that is decided this should be accompanied by an example of how to add memory barriers yourself), or it should be implemented. Though leaving it in the current "not implemented, no comment / plan on whether/when it will be implemented" state seems to have little practical consequence - since no one seems to actually work on this level in D - and I can thus understand why dealing with that is just not a priority.

On Monday, 13 February 2017 at 14:20:05 UTC, Kagamin wrote:
> use std.concurrency for a simple and safe concurrency, that's what it's made for.

I agree, message passing is considerably less tricky and you're unlikely to shoot yourself in the foot. Nonetheless, there are valid use cases where the overhead of MP may not be acceptable.

[1] https://dlang.org/faq.html#shared_guarantees
« First   ‹ Prev
1 2 3