synchronized class bugs?
April 07, 2020
I've been playing around with synchronized classes. My example is the following dummy class:

    synchronized class Shared {
    public:
        void increment(int value) {
            //(cast(Shared) this).x += value;
            x += value;
        }

        void decrement(int value) {
            //(cast(Shared) this).x -= value;
            x -= value;
        }

        shared int get() { return x; }

    private:
        int x;
    }

Then I'm calling increment and decrement many times from parallel threads so that they *should* cancel out, leaving x back at 0 in the end. With the implicit synchronization through D monitors, this should work.

Bug number 1: dmd doesn't translate this without casting away shared:

sharedtest.d(14): Error: read-modify-write operations are not allowed for shared variables. Use core.atomic.atomicOp!"+="(this.x, value) instead.

The += and -= operators are safe because they run inside locked monitors. The emitted code contains calls to _d_monitorenter and _d_monitorexit. The compiler should understand this.
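
For reference, the variant the compiler demands would look like this (a sketch using core.atomic; the atomics are redundant while the monitor is held):

    import core.atomic : atomicOp;

    synchronized class Shared {
    public:
        void increment(int value) {
            atomicOp!"+="(x, value); // atomic RMW, even though the monitor is already locked
        }

        void decrement(int value) {
            atomicOp!"-="(x, value);
        }

        shared int get() { return x; }

    private:
        int x;
    }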

Bug number 2: even when casting away shared, the end result is wrong when using multiple threads. Thus, the monitor locking code in druntime cannot be correct. It does have an effect, though: omitting synchronized on Shared results in values that are wide of the mark, while including it results in small differences. This suggests that a memory fence might be missing somewhere.

The observed reality contradicts the documentation in many ways and is inconsistent with itself. What is going on here?


April 07, 2020
On Tuesday, 7 April 2020 at 15:18:41 UTC, Gregor Mückl wrote:
> I've been playing around with synchronized classes. My example is the following dummy class:
>
>     synchronized class Shared {
>     public:
>         void increment(int value) {
>             //(cast(Shared) this).x += value;
>             x += value;
>         }
>
>         void decrement(int value) {
>             //(cast(Shared) this).x -= value;
>             x -= value;
>         }
>
>         shared int get() { return x; }
>
>     private:
>         int x;
>     }
>
> Then I'm calling increment and decrement many times from parallel threads so that they *should* cancel out, leaving x back at 0 in the end. With the implicit synchronization through D monitors, this should work.
>
> Bug number 1: dmd doesn't translate this without casting away shared:
>
> sharedtest.d(14): Error: read-modify-write operations are not allowed for shared variables. Use core.atomic.atomicOp!"+="(this.x, value) instead.
>
> The += and -= operators are safe because they run inside locked monitors. The emitted code contains calls to _d_monitorenter and _d_monitorexit. The compiler should understand this.
>
> Bug number 2: even when casting away shared, the end result is wrong when using multiple threads. Thus, the monitor locking code in druntime cannot be correct. It does have an effect, though: omitting synchronized on Shared results in values that are wide of the mark, while including it results in small differences. This suggests that a memory fence might be missing somewhere.
>
> The observed reality contradicts the documentation in many ways and is inconsistent with itself. What is going on here?

Correct me if I'm wrong, but aren't all synchronized classes protected with a mutex? In that case atomic operations are pointless, as all methods are protected by the mutex anyway.

April 07, 2020
On Tuesday, 7 April 2020 at 16:11:12 UTC, IGotD- wrote:
> Correct me if I'm wrong, but aren't all synchronized classes protected with a mutex? In that case atomic operations are pointless, as all methods are protected by the mutex anyway.

That's what I'm trying to say. They should be protected, and the code for locking and unlocking is generated (tested on dmd, gdc and ldc). But

- the compiler tries to enforce atomics in this context anyway, even though they are pointless due to the mutex
- the mutex doesn't lead to proper locking behavior at runtime; when calling increment and decrement in many threads in parallel, the result is wrong.


April 07, 2020
On Tuesday, 7 April 2020 at 16:18:53 UTC, Gregor Mückl wrote:
> On Tuesday, 7 April 2020 at 16:11:12 UTC, IGotD- wrote:
>> Correct me if I'm wrong, but aren't all synchronized classes protected with a mutex? In that case atomic operations are pointless, as all methods are protected by the mutex anyway.
>
> That's what I'm trying to say. They should be protected, and the code for locking and unlocking is generated (tested on dmd, gdc and ldc). But
>
> - the compiler tries to enforce atomics in this context anyway, even though they are pointless due to the mutex
> - the mutex doesn't lead to proper locking behavior at runtime; when calling increment and decrement in many threads in parallel, the result is wrong.

I think this is correct: within the mutex you should be allowed to cast and do whatever operation you want, atomic or not.

The wrong result sounds more like a bug in the locking itself. One way to check is to try this on another operating system and see if you get a similar result.
April 07, 2020
On Tuesday, 7 April 2020 at 16:33:13 UTC, IGotD- wrote:
> On Tuesday, 7 April 2020 at 16:18:53 UTC, Gregor Mückl wrote:
>> On Tuesday, 7 April 2020 at 16:11:12 UTC, IGotD- wrote:
>>> Correct me if I'm wrong, but aren't all synchronized classes protected with a mutex? In that case atomic operations are pointless, as all methods are protected by the mutex anyway.
>>
>> That's what I'm trying to say. They should be protected, and the code for locking and unlocking is generated (tested on dmd, gdc and ldc). But
>>
>> - the compiler tries to enforce atomics in this context anyway, even though they are pointless due to the mutex
>> - the mutex doesn't lead to proper locking behavior at runtime; when calling increment and decrement in many threads in parallel, the result is wrong.
>
> I think this is correct: within the mutex you should be allowed to cast and do whatever operation you want, atomic or not.
>

I don't think you understand: the compiler actively prohibits the use of non-atomic operations in synchronized functions. That makes no sense.

> The wrong result sounds more like a bug in the locking itself. One way to check is to try this on another operating system and see if you get a similar result.

The results are wrong with dmd 2.088 on Windows, and with dmd 2.091, gdc 9.3.0 and ldc 1.20.1 on Linux.

April 08, 2020
On Tuesday, 7 April 2020 at 23:34:13 UTC, Gregor Mückl wrote:
> On Tuesday, 7 April 2020 at 16:33:13 UTC, IGotD- wrote:
>
> I don't think you understand: the compiler actively prohibits the use of non-atomic operations in synchronized functions. That makes no sense.

Sorry, I misunderstood. I agree, that doesn't make any sense at all.
April 08, 2020
On 4/7/20 11:18 AM, Gregor Mückl wrote:
> I've been playing around with synchronized classes. My example is the following dummy class:
> 
>      synchronized class Shared {
>      public:
>          void increment(int value) {
>              //(cast(Shared) this).x += value;
>              x += value;
>          }
> 
>          void decrement(int value) {
>              //(cast(Shared) this).x -= value;
>              x -= value;
>          }
> 
>          shared int get() { return x; }
> 
>      private:
>          int x;
>      }
> 
> Then I'm calling increment and decrement many times from parallel threads so that they *should* cancel out, leaving x back at 0 in the end. With the implicit synchronization through D monitors, this should work.
> 
> Bug number 1: dmd doesn't translate this without casting away shared:
> 
> sharedtest.d(14): Error: read-modify-write operations are not allowed for shared variables. Use core.atomic.atomicOp!"+="(this.x, value) instead.
> 
> The += and -= operators are safe because they run inside locked monitors. The emitted code contains calls to _d_monitorenter and _d_monitorexit. The compiler should understand this.

The requirement for atomicOp is separate from the synchronization. The synchronization is only sound because you made it sound.

In fact, if you did:

auto getPtr() { return &x; }

Then the synchronization all of a sudden does not apply to that variable.

This is why we are moving to REQUIRING casting away shared to read/write data -- you have to do it anyway.
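
To make the hazard concrete (my illustration, assuming Shared gained the getPtr above): a caller can now modify x without ever taking the monitor:

    shared Shared s = new shared Shared();
    auto p = s.getPtr();   // the address of x escapes the synchronized class
    *cast(int*) p += 1;    // unguarded read-modify-write, races with increment()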

> Bug number 2: even when casting away shared, the end result is wrong when using multiple threads. Thus, the monitor locking code in druntime cannot be correct. It does have an effect, though: omitting synchronized on Shared results in values that are wide of the mark, while including it results in small differences. This suggests that a memory fence might be missing somewhere.
> 
> The observed reality contradicts the documentation in many ways and is inconsistent with itself. What is going on here?

synchronized classes call _d_monitorenter and _d_monitorexit, which lock OS primitives (and initialize the monitor if necessary). These have not been touched in years, and depend on the OS mutex which should be rock-solid. Looking at the history, it looks like the last significant change was https://github.com/dlang/druntime/commit/70b3536ddd06e8e1afb269b88ed3298ec03b8798#diff-d609308f6ce1b06db16e214b30db6f74 in 2015.
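
Conceptually, each synchronized method lowers to something like this (a rough sketch, not the exact compiler output):

    void increment(int value) {
        _d_monitorenter(this);            // lazily allocates the monitor on first use
        scope(exit) _d_monitorexit(this); // released on every exit path
        x += value;
    }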

I've definitely depended on synchronized functions and haven't had issues, but that was quite a while ago.

The only place where I can reasonably assume there might be an issue is the initialization.

To test this, try initializing the lock FIRST before spawning multiple threads (just call one of the functions), and see if it helps. If so, it might be a bug in the initialization routines.
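
Something like this is what I mean (a sketch):

    shared Shared s = new shared Shared();
    s.increment(0); // the first call forces _d_monitorenter from the main thread,
    s.decrement(0); // so the monitor exists before any worker thread touches it
    // ... only now spawn the threads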

Also if you find an issue, it may be specific to your OS.

-Steve
April 08, 2020
On Wednesday, 8 April 2020 at 13:43:08 UTC, Steven Schveighoffer wrote:
> This is why we are moving to REQUIRING casting away shared to read/write data -- you have to do it anyway.
>

OK, I understand the rationale, even though I don't like the solution very much. You shouldn't really use atomic ops when locks on the affected variables are held. That's bad practice.


>> Bug number 2: even when casting away shared, the end result is wrong when using multiple threads. Thus, the monitor locking code in druntime cannot be correct. It does have an effect, though: omitting synchronized on Shared results in values that are wide of the mark, while including it results in small differences. This suggests that a memory fence might be missing somewhere.
>> 
>> The observed reality contradicts the documentation in many ways and is inconsistent with itself. What is going on here?
>
> synchronized classes call _d_monitorenter and _d_monitorexit, which lock OS primitives (and initialize the monitor if necessary). These have not been touched in years, and depend on the OS mutex which should be rock-solid. Looking at the history, it looks like the last significant change was https://github.com/dlang/druntime/commit/70b3536ddd06e8e1afb269b88ed3298ec03b8798#diff-d609308f6ce1b06db16e214b30db6f74 in 2015.
>

The calls to _d_monitorenter and _d_monitorexit are in the generated code, and I looked up their implementation before posting to make sure my understanding is correct.

I would also assume mutexes to be rock solid. This is why I am in utter disbelief about the actual behavior of the toy program (see code at the end).

> I've definitely depended on synchronized functions and haven't had issues, but that was quite a while ago.
>
> The only place where I can reasonably assume there might be an issue is the initialization.
>
> To test this, try initializing the lock FIRST before spawning multiple threads (just call one of the functions), and see if it helps. If so, it might be a bug in the initialization routines.
>

I added such function calls and that didn't change anything.

> Also if you find an issue, it may be specific to your OS.
>

Well, as I've mentioned before, I've reproduced this on Linux and Windows with dmd, ldc and gdc. So it's not OS specific, or at least not entirely.

> -Steve

The full program code is below the line. You can run it on run.dlang.org to reproduce the problem.

----

import core.thread;

import std.stdio;

version = Sync;

version(Sync) {
    pragma(msg, "synchronized");

    synchronized class Shared {
    public:
        void increment(int value) {
            (cast(Shared) this).x += value;
        }

        void decrement(int value) {
            (cast(Shared) this).x -= value;
        }

        shared int get() { return x; }

    private:
        int x;
    }
} else {
    pragma(msg, "non-synchronized");

    class Shared {
    public:
        shared void increment(int value) {
            (cast(Shared) this).x += value;
        }

        shared void decrement(int value) {
            (cast(Shared) this).x -= value;
        }

        shared int get() { return x; }

    private:
        int x;
    }
}

void main()
{
    shared Shared s = new shared Shared();

    s.increment(1);
    s.decrement(1);
    writeln(s.get());

    Thread[] threads = new Thread[10];
    for(int i = 0; i < threads.length; i++) {
        threads[i] = new Thread({
            for(int j = 0; j < 1_000_000; j++) {
                s.increment(i);
            }
            for(int j = 0; j < 1_000_000; j++) {
                s.decrement(i);
            }
        });
        threads[i].start();
    }

    for(int i = 0; i < threads.length; i++) {
        threads[i].join();
    }

    writeln(s.get());
}

April 08, 2020
On 4/8/20 6:50 PM, Gregor Mückl wrote:
> On Wednesday, 8 April 2020 at 13:43:08 UTC, Steven Schveighoffer wrote:
>> This is why we are moving to REQUIRING casting away shared to read/write data -- you have to do it anyway.
>>
> 
> OK, I understand the rationale, even though I don't like the solution very much. You shouldn't really use atomic ops when locks on the affected variables are held. That's bad practice.

No, you misunderstand. The correct path is to cast away shared for those variables that you manually ensure cannot be touched outside your class, not to use atomic ops.

But the compiler has no way to prove you did it right, which is why the casting is required.

The expectation is that frameworks that use this requirement will emerge that make this much more convenient.
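
For instance, one could build a small helper along these lines (purely hypothetical, not an existing druntime API) to hide the cast behind a name that documents the assumption:

    // Hypothetical helper: strips shared from a reference. Only valid when the
    // caller guarantees exclusive access, e.g. inside a synchronized method.
    ref T assumeUnshared(T)(ref shared T value) @system {
        return *cast(T*) &value;
    }

With that, increment in your class would read assumeUnshared(x) += value; instead of spelling out the cast.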

> 
> The full program code is below the line. You can run it on run.dlang.org to reproduce the problem.

OK, will look to see what I can find.

-Steve
April 08, 2020
On 4/8/20 6:50 PM, Gregor Mückl wrote:

> void main()
> {
>      shared Shared s = new shared Shared();
> 
>      s.increment(1);
>      s.decrement(1);
>      writeln(s.get());
> 
>      Thread[] threads = new Thread[10];
>      for(int i = 0; i < threads.length; i++) {
>          threads[i] = new Thread({
>              for(int j = 0; j < 1_000_000; j++) {
>                  s.increment(i);
>              }
>              for(int j = 0; j < 1_000_000; j++) {
>                  s.decrement(i);
>              }
>          });
>          threads[i].start();
>      }
> 
>      for(int i = 0; i < threads.length; i++) {
>          threads[i].join();
>      }
> 
>      writeln(s.get());
> }
> 

The issue is not synchronization of the class data, but your promiscuous usage of the stack frame ;)

Try this instead for your loop:

    for(int i = 0; i < threads.length; i++) {
        threads[i] = (int val) { return new Thread({
            for(int j = 0; j < 1_000_000; j++) {
                s.increment(val);
            }
            for(int j = 0; j < 1_000_000; j++) {
                s.decrement(val);
            }
        });}(i);
        threads[i].start();
    }

Why doesn't yours work? Because you are starting your threads and then changing the value of i as you iterate the loop.

This works, because it captures the i value as a new variable in a closure, so `val` remains constant while `i` increments.
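
The capture behavior is easy to see in isolation (a minimal sketch; the exact output depends on thread timing):

    import core.thread;
    import std.stdio;

    void main() {
        Thread[] ts;
        for (int i = 0; i < 3; i++) {
            // every delegate captures the single loop variable i by reference
            ts ~= new Thread({ writeln(i); });
            ts[$ - 1].start();
        }
        foreach (t; ts) t.join();
        // typically prints 3 3 3 (or a mix), not 0 1 2
    }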

-Steve