October 16, 2016 Speed of synchronized | ||||
---|---|---|---|---|
| ||||
Hi,

for an exercise I had to implement a thread safe counter. This is what I came up with:

---SNIP---
import std.stdio;
import core.thread;
import std.conv;
import std.datetime;
static import core.atomic;
import core.sync.mutex;

int NR_OF_THREADS = 100;
int NR_OF_INCREMENTS = 10000;

interface Counter {
  void increment() shared;
  long get() shared;
}

class ThreadUnsafeCounter : Counter {
  long counter;
  void increment() shared {
    counter++;
  }
  long get() shared {
    return counter;
  }
}

class ThreadSafe1Counter : Counter {
  private long counter;
  synchronized void increment() shared {
    counter++;
  }
  long get() shared {
    return counter;
  }
}

class ThreadSafe2Counter : Counter {
  private long counter;
  __gshared Mutex lock; // http://forum.dlang.org/post/rzyooanimrynpmqlywmf@forum.dlang.org
  this() shared {
    lock = new Mutex;
  }
  void increment() shared {
    synchronized (lock) {
      counter++;
    }
  }
  long get() shared {
    return counter;
  }
}

class AtomicCounter : Counter {
  private long counter;
  void increment() shared {
    core.atomic.atomicOp!"+="(this.counter, 1);
  }
  long get() shared {
    return counter;
  }
}

void main() {
  void runWith(Counter)() {
    shared Counter counter = new shared Counter();
    void doIt() {
      Thread[] threads;
      for (int i=0; i<NR_OF_THREADS; ++i) {
        threads ~= new Thread({
          for (int i=0; i<NR_OF_INCREMENTS; ++i) {
            counter.increment();
          }
        });
      }
      foreach (Thread t; threads) {
        t.start();
      }
      foreach (Thread t; threads) {
        t.join();
      }
    }
    auto duration = benchmark!(doIt)(1);
    writeln(typeid(counter), ": got: ", counter.get(),
            " expected: ", NR_OF_THREADS * NR_OF_INCREMENTS,
            " in ", to!Duration(duration[0]));
  }
  runWith!(AtomicCounter)();
  runWith!(ThreadSafe1Counter)();
  runWith!(ThreadSafe2Counter)();
  runWith!(ThreadUnsafeCounter)();

  void doIt2() {
    auto mutex = new Mutex;
    int numThreads = NR_OF_THREADS;
    int numTries = NR_OF_INCREMENTS;
    int lockCount = 0;

    void testFn() {
      for( int i = 0; i < numTries; ++i ) {
        synchronized( mutex ) {
          ++lockCount;
        }
      }
    }

    auto group = new ThreadGroup;
    for( int i = 0; i < numThreads; ++i )
      group.create( &testFn );
    group.joinAll();
    assert( lockCount == numThreads * numTries );
  }
  auto duration = benchmark!(doIt2)(1);
  writeln("from example got: ", to!Duration(duration[0]));
}
---SNIP---

For completeness I added also the example from core.sync.mutex (https://dlang.org/phobos/core_sync_mutex.html) at the end.

My question now is, why is each mutex based thread safe variant so slow compared to a similar java program? The only hint could be something like: https://blogs.oracle.com/dave/entry/java_util_concurrent_reentrantlock_vs that mentions, that there is some magic going on underneath.
For the atomic and the non thread safe variant, the d solution seems to be twice as fast as my java program, for the locked variant, the java program seems to be 40 times faster?

btw. I run the code with dub run --build=release

Thanks in advance,
Christian
|
October 16, 2016 Re: Speed of synchronized | ||||
---|---|---|---|---|
| ||||
Posted in reply to Christian Köstlin | On Sunday, 16 October 2016 at 08:41:26 UTC, Christian Köstlin wrote:
> Hi,
>
> for an exercise I had to implement a thread safe counter. This is what I came up with:
>
> [...]

Could you try that:

class ThreadSafe3Counter: Counter{
  private long counter;
  private core.sync.mutex.Mutex mtx;

  public this() shared{
    mtx = cast(shared)( new core.sync.mutex.Mutex );
  }

  void increment() shared {
    (cast()mtx).lock();
    scope(exit){ (cast()mtx).unlock(); }

    core.atomic.atomicOp!"+="(this.counter, 1);
  }

  long get() shared {
    return counter;
  }
}

Unfortunately, there are some stupid design decisions in D about "shared", and some people do not want to accept that.

For example, while you are holding a mutex you shouldn't be forced to use atomicOp there. As a programmer, you know that the access is protected already. That is a loss of performance in the long run.
|
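The remark about not needing atomicOp while a mutex is held can be illustrated with a minimal sketch. This is not code from the thread: `LockedPlainCounter` and `runLockedPlain` are hypothetical names, and casting away `shared` is only sound under the assumption that every access to the field happens while the mutex is held.

```d
import core.sync.mutex : Mutex;
import core.thread : ThreadGroup;

// Hypothetical counter for illustration; not one of the classes posted in the thread.
class LockedPlainCounter
{
    private long counter;
    private Mutex mtx;

    this() shared
    {
        mtx = cast(shared) new Mutex;
    }

    void increment() shared
    {
        auto m = cast() mtx;          // strip `shared` to call the lock methods
        m.lock();
        scope (exit) m.unlock();

        // The mutex already serializes access, so a plain increment through
        // an unshared pointer is used here instead of atomicOp.
        auto p = cast(long*) &counter;
        ++*p;
    }

    long get() shared
    {
        auto m = cast() mtx;
        m.lock();
        scope (exit) m.unlock();
        return *cast(long*) &counter;
    }
}

// Hypothetical helper: runs `threads` threads that each increment `increments` times.
long runLockedPlain(int threads, int increments)
{
    auto c = new shared LockedPlainCounter();
    auto group = new ThreadGroup;
    foreach (i; 0 .. threads)
    {
        group.create({
            foreach (j; 0 .. increments)
                c.increment();
        });
    }
    group.joinAll();
    return c.get();
}

void main()
{
    import std.stdio : writeln;
    writeln("got: ", runLockedPlain(4, 10_000));
}
```

Reading the value in `get` under the same lock keeps the assumption above true for every access; whether the plain increment is measurably faster than atomicOp under a lock is not something the thread measured.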
October 17, 2016 Re: Speed of synchronized | ||||
---|---|---|---|---|
| ||||
Posted in reply to Christian Köstlin | On 16.10.2016 at 10:41, Christian Köstlin via Digitalmars-d-learn wrote:
> My question now is, why is each mutex based thread safe variant so slow
> compared to a similar java program? The only hint could be something
> like:
> https://blogs.oracle.com/dave/entry/java_util_concurrent_reentrantlock_vs that
> mentions, that there is some magic going on underneath.
> For the atomic and the non thread safe variant, the d solution seems to
> be twice as fast as my java program, for the locked variant, the java
> program seems to be 40 times faster?
>
> btw. I run the code with dub run --build=release
>
> Thanks in advance,
> Christian
Can you post your timings (both D and Java)? And can you post your java code?
|
October 17, 2016 Re: Speed of synchronized | ||||
---|---|---|---|---|
| ||||
Posted in reply to Daniel Kozak | On 17/10/16 06:55, Daniel Kozak via Digitalmars-d-learn wrote:
> On 16.10.2016 at 10:41, Christian Köstlin via Digitalmars-d-learn wrote:
>
>> My question now is, why is each mutex based thread safe variant so slow
>> compared to a similar java program? The only hint could be something
>> like:
>> https://blogs.oracle.com/dave/entry/java_util_concurrent_reentrantlock_vs
>> that
>> mentions, that there is some magic going on underneath.
>> For the atomic and the non thread safe variant, the d solution seems to
>> be twice as fast as my java program, for the locked variant, the java
>> program seems to be 40 times faster?
>>
>> btw. I run the code with dub run --build=release
>>
>> Thanks in advance,
>> Christian
> Can you post your timings (both D and Java)? And can you post your java
> code?

Hi,

thanks for asking. I attached my java and d sources.
Both try to do more or less the same thing. They spawn 100 threads that call increment on a counter object 10000 times. The implementation of the counter object is exchanged between an obviously broken thread unsafe implementation, some with atomic operations, and some with mutex implementations.

to run java call ./gradlew clean build
->
counter.AtomicIntCounter@25992ae3 expected: 2000000 got: 1000000 in: 22ms
counter.AtomicLongCounter@2539f946 expected: 2000000 got: 1000000 in: 17ms
counter.ThreadSafe2Counter@527d56c2 expected: 2000000 got: 1000000 in: 33ms
counter.ThreadSafe1Counter@6fd8b1a expected: 2000000 got: 1000000 in: 173ms
counter.ThreadUnsafeCounter@6bb33878 expected: 2000000 got: 562858 in: 10ms

obviously the unsafe implementation is fastest, followed by atomics. the version with reentrant locks performs very well, whereas the implementation with synchronized is the slowest.

to run d call dub test (please note that the dub test build is configured like this: buildType "unittest" { buildOptions "releaseMode" "optimize" "inline" "unittests" "debugInfo" }, so it should be release code speed and quality).
->
app.AtomicCounter: got: 1000000 expected: 1000000 in 23 ms, 852 μs, and 6 hnsecs
app.ThreadSafe1Counter: got: 1000000 expected: 1000000 in 3 secs, 673 ms, 232 μs, and 6 hnsecs
app.ThreadSafe2Counter: got: 1000000 expected: 1000000 in 3 secs, 684 ms, 416 μs, and 2 hnsecs
app.ThreadUnsafeCounter: got: 690073 expected: 1000000 in 8 ms and 540 μs
from example got: 3 secs, 806 ms, and 258 μs

here again, the unsafe implementation is the fastest, atomic performs in the same ballpark as java, only the thread safe variants are far off.

thanks for looking into this,
best regards,
christian
|
October 17, 2016 Re: Speed of synchronized | ||||
---|---|---|---|---|
| ||||
Posted in reply to tcak | On 16/10/16 19:50, tcak wrote:
> On Sunday, 16 October 2016 at 08:41:26 UTC, Christian Köstlin wrote:
>> Hi,
>>
>> for an exercise I had to implement a thread safe counter. This is what I came up with:
>>
>> [...]
>
> Could you try that:
>
> class ThreadSafe3Counter: Counter{
> private long counter;
> private core.sync.mutex.Mutex mtx;
>
> public this() shared{
> mtx = cast(shared)( new core.sync.mutex.Mutex );
> }
>
> void increment() shared {
> (cast()mtx).lock();
> scope(exit){ (cast()mtx).unlock(); }
>
> core.atomic.atomicOp!"+="(this.counter, 1);
> }
>
> long get() shared {
> return counter;
> }
> }
>
>
> Unfortunately, there are some stupid design decisions in D about "shared", and some people do not want to accept that.
>
> For example, while you are holding a mutex you shouldn't be forced to use atomicOp there. As a programmer, you know that the access is protected already. That is a loss of performance in the long run.
thanks for the implementation. I think this is nicer than using __gshared. I think using atomic operations and mutexes at the same time does not make any sense; it should be one or the other.
thanks,
Christian
|
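The "one or the other" point can be sketched with a counter that relies on atomics alone and takes no lock. This is a minimal sketch rather than code from the thread: `AtomicOnlyCounter` and `runAtomicOnly` are hypothetical names, and the use of `atomicLoad` in `get` is an addition that the posted `AtomicCounter` does not have.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : ThreadGroup;

// Hypothetical counter for illustration: atomics only, no mutex.
class AtomicOnlyCounter
{
    private long counter;

    void increment() shared
    {
        // Atomic read-modify-write; no lock is taken.
        atomicOp!"+="(counter, 1);
    }

    long get() shared
    {
        // Atomic read, so the load cannot tear on targets where a plain
        // 64-bit read is not atomic.
        return atomicLoad(counter);
    }
}

// Hypothetical helper: runs `threads` threads that each increment `increments` times.
long runAtomicOnly(int threads, int increments)
{
    auto c = new shared AtomicOnlyCounter();
    auto group = new ThreadGroup;
    foreach (i; 0 .. threads)
    {
        group.create({
            foreach (j; 0 .. increments)
                c.increment();
        });
    }
    group.joinAll();
    return c.get();
}

void main()
{
    import std.stdio : writeln;
    writeln("got: ", runAtomicOnly(4, 10_000));
}
```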
October 17, 2016 Re: Speed of synchronized | ||||
---|---|---|---|---|
| ||||
Posted in reply to Christian Köstlin | On 17.10.2016 at 07:55, Christian Köstlin via Digitalmars-d-learn wrote:
> to run java call ./gradlew clean build
> ->
> counter.AtomicIntCounter@25992ae3 expected: 2000000 got: 1000000 in: 22ms
> counter.AtomicLongCounter@2539f946 expected: 2000000 got: 1000000 in: 17ms
> counter.ThreadSafe2Counter@527d56c2 expected: 2000000 got: 1000000 in: 33ms
> counter.ThreadSafe1Counter@6fd8b1a expected: 2000000 got: 1000000 in: 173ms
> counter.ThreadUnsafeCounter@6bb33878 expected: 2000000 got: 562858 in: 10ms
>
I am still unable to get your java code working:
[kozak@dajinka threads]$ ./gradlew clean build
:clean
:compileJava
:processResources UP-TO-DATE
:classes
:jar
:assemble
:compileTestJava
:processTestResources UP-TO-DATE
:testClasses
:test
:check
:build
BUILD SUCCESSFUL
Total time: 3.726 secs
How can I run it?
|
October 17, 2016 Re: Speed of synchronized | ||||
---|---|---|---|---|
| ||||
Posted in reply to Daniel Kozak | On Monday, 17 October 2016 at 06:38:08 UTC, Daniel Kozak wrote:
> On 17.10.2016 at 07:55, Christian Köstlin via Digitalmars-d-learn wrote:
>
>>[...]
> I am still unable to get your java code working:
> [kozak@dajinka threads]$ ./gradlew clean build
> :clean
> :compileJava
> :processResources UP-TO-DATE
> :classes
> :jar
> :assemble
> :compileTestJava
> :processTestResources UP-TO-DATE
> :testClasses
> :test
> :check
> :build
>
> BUILD SUCCESSFUL
>
> Total time: 3.726 secs
>
>
> How can I run it?

I have it, it is in build/test-results/test/TEST-counter.CounterTest.xml
|
October 17, 2016 Re: Speed of synchronized | ||||
---|---|---|---|---|
| ||||
Posted in reply to Christian Köstlin | On 16.10.2016 at 10:41, Christian Köstlin via Digitalmars-d-learn wrote:
> Hi,
>
> for an exercise I had to implement a thread safe counter.
> This is what I came up with:
> ....
>
> btw. I run the code with dub run --build=release
>
> Thanks in advance,
> Christian

So I have done some testing, on my pc:

Java result
counter.AtomicLongCounter@7ff5e7d8 expected: 2000000 got: 1000000 in: 83ms
counter.ThreadSafe2Counter@59b44e4b expected: 2000000 got: 1000000 in: 77ms
counter.ThreadSafe1Counter@2e5f6b4b expected: 2000000 got: 1000000 in: 154ms
counter.ThreadUnsafeCounter@762b155d expected: 2000000 got: 730428 in: 13ms

and my D results (code: http://dpaste.com/3QFXACY ):
snip.AtomicCounter: got: 1000000 expected: 1000000 in 77 ms and 783 μs
snip.ThreadSafe1Counter: got: 1000000 expected: 1000000 in 287 ms, 727 μs, and 3 hnsecs
snip.ThreadSafe2Counter: got: 1000000 expected: 1000000 in 281 ms, 117 μs, and 1 hnsec
snip.ThreadSafe3Counter: got: 1000000 expected: 1000000 in 158 ms, 480 μs, and 2 hnsecs
snip.ThreadUnsafeCounter: got: 1000000 expected: 1000000 in 6 ms, 682 μs, and 1 hnsec

so atomic is same as in Java
pthread_mutex is same speed as java synchronized
D mutexes and D synchronized are almost same, I believe that if I could setup same attrs as in pthread version it will be around 160ms too.

Unsafe is almost same for D and java. Only java ReentrantLock seems to work better. I believe there is some trick, so it will end up not using mutexes in the end at all. For example consider this change in D code:

void doIt(alias counter)() {
  auto thg = new ThreadGroup();
  for (int i=0; i<NR_OF_THREADS; ++i) {
    thg.create(&threadFuncBody!(counter));
  }
  thg.joinAll();
}

change it to

void doIt(alias counter)() {
  auto thg = new ThreadGroup();
  for (int i=0; i<NR_OF_THREADS; ++i) {
    auto tc = thg.create(&threadFuncBody!(counter));
    tc.join();
  }
}

and results are:

snip.AtomicCounter: got: 1000000 expected: 1000000 in 22 ms, 251 μs, and 6 hnsecs
snip.ThreadSafe1Counter: got: 1000000 expected: 1000000 in 46 ms, 146 μs, and 3 hnsecs
snip.ThreadSafe2Counter: got: 1000000 expected: 1000000 in 44 ms, 961 μs, and 5 hnsecs
snip.ThreadSafe3Counter: got: 1000000 expected: 1000000 in 42 ms, 512 μs, and 8 hnsecs
snip.ThreadUnsafeCounter: got: 1000000 expected: 1000000 in 2 ms, 108 μs, and 5 hnsecs
|
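The two join strategies compared above can be put side by side in one self-contained program. This is a minimal sketch under stated assumptions, not the dpaste code referenced in the post: `worker`, `timeRun`, `lock`, `total`, and `incrementsPerThread` are hypothetical names, and timing uses `core.time.MonoTime` rather than the thread's `benchmark` call. Joining each thread right after `create` means only one worker runs at a time, so the mutex is never contended.

```d
import core.sync.mutex : Mutex;
import core.thread : ThreadGroup;
import core.time : Duration, MonoTime;

// Hypothetical globals for this sketch.
__gshared Mutex lock;
__gshared long total;

enum incrementsPerThread = 10_000;

shared static this()
{
    lock = new Mutex;
}

void worker()
{
    foreach (i; 0 .. incrementsPerThread)
    {
        synchronized (lock)
        {
            ++total;
        }
    }
}

// Runs `threads` workers and returns the elapsed wall-clock time.
// With joinImmediately set, each thread finishes before the next one starts.
Duration timeRun(int threads, bool joinImmediately)
{
    total = 0;
    auto group = new ThreadGroup;
    auto start = MonoTime.currTime;
    foreach (i; 0 .. threads)
    {
        auto t = group.create(&worker);   // create() also starts the thread
        if (joinImmediately)
            t.join();
    }
    if (!joinImmediately)
        group.joinAll();
    return MonoTime.currTime - start;
}

void main()
{
    import std.stdio : writeln;
    writeln("all threads contend:  ", timeRun(100, false));
    writeln("one thread at a time: ", timeRun(100, true));
}
```

The absolute numbers will depend on the machine and OS; the sketch only makes the difference between contended and uncontended locking observable.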
October 17, 2016 Re: Speed of synchronized | ||||
---|---|---|---|---|
| ||||
Posted in reply to Daniel Kozak | On 17/10/16 14:09, Daniel Kozak via Digitalmars-d-learn wrote:
> On 16.10.2016 at 10:41, Christian Köstlin via Digitalmars-d-learn wrote:
>> Hi,
>>
>> for an exercise I had to implement a thread safe counter.
>> This is what I came up with:
>> ....
>>
>> btw. I run the code with dub run --build=release
>>
>> Thanks in advance,
>> Christian
> So I have done some testing, on my pc:
> Java result
> counter.AtomicLongCounter@7ff5e7d8 expected: 2000000 got: 1000000 in: 83ms
> counter.ThreadSafe2Counter@59b44e4b expected: 2000000 got: 1000000 in: 77ms
> counter.ThreadSafe1Counter@2e5f6b4b expected: 2000000 got: 1000000 in:
> 154ms
> counter.ThreadUnsafeCounter@762b155d expected: 2000000 got: 730428 in: 13ms
>
> and my D results (code: http://dpaste.com/3QFXACY ):
> snip.AtomicCounter: got: 1000000 expected: 1000000 in 77 ms and 783 μs
> snip.ThreadSafe1Counter: got: 1000000 expected: 1000000 in 287 ms, 727
> μs, and 3 hnsecs
> snip.ThreadSafe2Counter: got: 1000000 expected: 1000000 in 281 ms, 117
> μs, and 1 hnsec
> snip.ThreadSafe3Counter: got: 1000000 expected: 1000000 in 158 ms, 480
> μs, and 2 hnsecs
> snip.ThreadUnsafeCounter: got: 1000000 expected: 1000000 in 6 ms, 682
> μs, and 1 hnsec
>
> so atomic is same as in Java
> pthread_mutex is same speed as java synchronized
> D mutexes and D synchronized are almost same, I believe that if I could
> setup same attrs as in pthread version it will be around 160ms too.
>
> Unsafe is almost same for D and java. Only java ReentrantLock seems to work better. I believe there is some trick, so it will end up not using mutexes in the end at all. For example consider this change in D code:
>
> void doIt(alias counter)() {
> auto thg = new ThreadGroup();
> for (int i=0; i<NR_OF_THREADS; ++i) {
> thg.create(&threadFuncBody!(counter));
> }
> thg.joinAll();
> }
>
> change it to
>
> void doIt(alias counter)() {
> auto thg = new ThreadGroup();
> for (int i=0; i<NR_OF_THREADS; ++i) {
> auto tc = thg.create(&threadFuncBody!(counter));
> tc.join();
> }
> }
>
> and results are:
>
> snip.AtomicCounter: got: 1000000 expected: 1000000 in 22 ms, 251 μs, and
> 6 hnsecs
> snip.ThreadSafe1Counter: got: 1000000 expected: 1000000 in 46 ms, 146
> μs, and 3 hnsecs
> snip.ThreadSafe2Counter: got: 1000000 expected: 1000000 in 44 ms, 961
> μs, and 5 hnsecs
> snip.ThreadSafe3Counter: got: 1000000 expected: 1000000 in 42 ms, 512
> μs, and 8 hnsecs
> snip.ThreadUnsafeCounter: got: 1000000 expected: 1000000 in 2 ms, 108
> μs, and 5 hnsecs
>
>
>
>
>
thank you for looking into it.
this seems to be quite good.
I did expect something along those lines, but got the mentioned numbers on
my OS X macbook. perhaps it's an OS X glitch.
|
October 17, 2016 Re: Speed of synchronized | ||||
---|---|---|---|---|
| ||||
Posted in reply to Christian Köstlin | On 17/10/16 14:44, Christian Köstlin wrote:
> On 17/10/16 14:09, Daniel Kozak via Digitalmars-d-learn wrote:
>> [...]
> thank you for looking into it.
> this seems to be quite good.
> I did expect something along those lines, but got the mentioned numbers on
> my OS X macbook. perhaps it's an OS X glitch.
>

Thanks for the hint about the OS. I reran the tests on a linux machine, and there everything is fine!

linux dlang code:
app.AtomicCounter: got: 1000000 expected: 1000000 in 24 ms, 387 μs, and 3 hnsecs
app.ThreadSafe1Counter: got: 1000000 expected: 1000000 in 143 ms, 534 μs, and 9 hnsecs
app.ThreadSafe2Counter: got: 1000000 expected: 1000000 in 159 ms, 685 μs, and 1 hnsec
app.ThreadUnsafeCounter: got: 399937 expected: 1000000 in 9 ms and 556 μs
from example got: 156 ms, 198 μs, and 9 hnsecs

linux java code:
counter.CounterTest > testAtomicIntCounter STANDARD_OUT
counter.AtomicIntCounter@1f2a2347 expected: 1000000 got: 1000000 in: 29ms
counter.CounterTest > testAtomicLongCounter STANDARD_OUT
counter.AtomicLongCounter@675ad891 expected: 1000000 got: 1000000 in: 24ms
counter.CounterTest > testThreadSafe2Counter STANDARD_OUT
counter.ThreadSafe2Counter@3043c6d2 expected: 1000000 got: 1000000 in: 38ms
counter.CounterTest > testThreadSafeCounter STANDARD_OUT
counter.ThreadSafe1Counter@bac4ba3 expected: 1000000 got: 1000000 in: 145ms
counter.CounterTest > testThreadUnsafeCounter STANDARD_OUT
counter.ThreadUnsafeCounter@2fe82bf8 expected: 1000000 got: 433730 in: 9ms

Could someone check the numbers on another OS X machine? Unfortunately I only have one available.

Thanks in advance!
|
Copyright © 1999-2021 by the D Language Foundation