February 17, 2015 LLVM and TLS
I've noticed that on my Windows 7 development machine, switching between TLS and non-TLS storage has minimal impact on performance (when using DMD). I haven't tried LDC yet. However, on a MacBook Pro, which uses clang (LLVM) for the linker, using TLS has a huge performance impact (much, much slower). Does anyone know if this is because of the way LLVM handles TLS storage? I'll have to try LDC on my Windows machine, but maybe one of you knows offhand whether LLVM has performance problems with TLS storage. Thanks!
February 17, 2015 Re: LLVM and TLS
Posted in reply to Jonathan Marler
"Jonathan Marler" <johnnymarler@gmail.com> writes:
> I've noticed that on my Windows 7 development machine, switching between TLS and non-TLS storage has minimal impact on performance (when using DMD). I haven't tried LDC yet. However, on a MacBook Pro, which uses clang (LLVM) for the linker, using TLS has a huge performance impact (much, much slower). Does anyone know if this is because of the way LLVM handles TLS storage? I'll have to try LDC on my Windows machine, but maybe one of you knows offhand whether LLVM has performance problems with TLS storage. Thanks!
Last time I checked, DMD still did not use the native OS X TLS support but had its own solution. Try LDC and see if the performance improves, because LDC uses native OS X TLS.
--
Dan
February 17, 2015 Re: LLVM and TLS
Posted in reply to Dan Olson
On Tuesday, 17 February 2015 at 06:16:04 UTC, Dan Olson wrote:
> Try LDC and see if the performance improves because LDC uses OS X native TLS.
Is there more information available about OS X's TLS support and how it is implemented in LDC? What version of OS X is required? I'd very much like to use it for DMD/druntime too, so that we can move forward with the shared library support.
February 17, 2015 Re: LLVM and TLS
Posted in reply to Martin Nowak
On 2015-02-17 07:47, Martin Nowak wrote:
> Is there more information available about OS X's TLS support and how it
> is implemented in LDC? What version of OS X is required? I'd very much
> like to use it for DMD/druntime too, so that we can move forward with the
> shared library support.

I've created an issue for this; there is some information about the implementation in the issue [1]. OS X 10.7 or later is required, but I'm pretty sure we can backport it to 10.6 if we really want/need to.

[1] https://issues.dlang.org/show_bug.cgi?id=9476#c2

--
/Jacob Carlborg
February 17, 2015 Re: LLVM and TLS
Posted in reply to Jonathan Marler
On Tuesday, 17 February 2015 at 02:41:12 UTC, Jonathan Marler wrote:
> I've noticed that on my Windows 7 development machine, switching between TLS and non-TLS storage has minimal impact on performance (when using DMD). I haven't tried LDC yet. However, on a MacBook Pro, which uses clang (LLVM) for the linker, using TLS has a huge performance impact (much, much slower). Does anyone know if this is because of the way LLVM handles TLS storage? I'll have to try LDC on my Windows machine, but maybe one of you knows offhand whether LLVM has performance problems with TLS storage. Thanks!

It has little to do with the linker or LLVM. DMD doesn't use the native TLS APIs on OS X, as Dan says, because OS X didn't have native TLS back then:

http://www.drdobbs.com/architecture-and-design/implementing-thread-local-storage-on-os/228701185

Druntime has since been updated to call pthread_setspecific and pthread_getspecific, but maybe that's still slower than non-TLS on OS X:

https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d#L151

As Dan noted, David got LDC working with the since-added, undocumented TLS (i.e. TLV) functions on OS X:

https://github.com/ldc-developers/druntime/blob/ldc/src/ldc/osx_tls.c

On Tuesday, 17 February 2015 at 06:47:12 UTC, Martin Nowak wrote:
> On Tuesday, 17 February 2015 at 06:16:04 UTC, Dan Olson wrote:
>> Try LDC and see if the performance improves because LDC uses OS X native TLS.
>
> Is there more information available about OS X's TLS support and how it is implemented in LDC? What version of OS X is required? I'd very much like to use it for DMD/druntime too, so that we can move forward with the shared library support.

The functions David used were added in 10.7.
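To illustrate why pthread-based emulation can cost so much more than a plain global, here is a minimal sketch, assuming a simplified model in which all emulated thread-local variables live in one per-thread block reached through a single pthread key. The names tlsKey, TlsBlock, initEmulatedTls and currentBlock are hypothetical, and the actual druntime code linked above differs in detail; the point is only that every access turns into a function call plus a pthread_getspecific lookup instead of a single load or store. It uses druntime's core.sys.posix.pthread bindings, so it is POSIX-only.

```d
// Hypothetical sketch of pthread-key based TLS emulation (POSIX only).
import core.stdc.stdlib : calloc;
import core.sys.posix.pthread;
import std.stdio : writeln;

// One key for the whole emulated TLS block of every thread.
__gshared pthread_key_t tlsKey;

// Per-thread block; `counter` stands in for a thread-local `size_t tlsGlobal;`.
struct TlsBlock
{
    size_t counter;
}

void initEmulatedTls()
{
    // A null destructor keeps the sketch short; the per-thread blocks leak
    // at thread exit, which a real runtime would have to handle.
    pthread_key_create(&tlsKey, null);
}

// Returns the calling thread's block, allocating it on first use.
// Every emulated TLS access pays for this call plus pthread_getspecific.
TlsBlock* currentBlock()
{
    auto blk = cast(TlsBlock*) pthread_getspecific(tlsKey);
    if (blk is null)
    {
        blk = cast(TlsBlock*) calloc(1, TlsBlock.sizeof);
        pthread_setspecific(tlsKey, blk);
    }
    return blk;
}

void main()
{
    initEmulatedTls();
    foreach (size_t i; 0 .. 10)
        currentBlock().counter = i; // each store goes through pthread_getspecific
    writeln(currentBlock().counter); // prints 9
}
```

With native TLS the compiler can instead emit a direct (or cheaply cached) access to the variable, which is why LDC's numbers later in the thread come out closer to __gshared.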
February 17, 2015 Re: LLVM and TLS
Posted in reply to Jonathan Marler
Hi Jonathan!
On Tuesday, 17 February 2015 at 02:41:12 UTC, Jonathan Marler wrote:
> I've noticed that on my Windows 7 development machine, switching between TLS and non-TLS storage has minimal impact on performance (when using DMD). I haven't tried LDC yet. However, on a MacBook Pro, which uses clang (LLVM) for the linker, using TLS has a huge performance impact (much, much slower). Does anyone know if this is because of the way LLVM handles TLS storage? I'll have to try LDC on my Windows machine, but maybe one of you knows offhand whether LLVM has performance problems with TLS storage. Thanks!
On Windows, LLVM uses the segment registers for TLS storage (fs: for 32-bit and gs: for 64-bit). There is no other overhead.
Regards,
Kai
February 18, 2015 Re: LLVM and TLS
Posted in reply to Jacob Carlborg
I've created a simple program to demonstrate the issue. The performance cost of TLS vs __gshared is over one and a half orders of magnitude!

import std.stdio;
import std.datetime;

size_t tlsGlobal;
__gshared size_t sharedGlobal;

void main(string[] args)
{
    runTest(3, 10_000_000);
}

void runTest(size_t runCount, size_t loopCount)
{
    writeln("--------------------------------------------------");
    StopWatch sw;
    for(auto runIndex = 0; runIndex < runCount; runIndex++) {
        writefln("run %s (loopcount %s)", runIndex + 1, loopCount);

        sw.reset();
        sw.start();
        for(size_t i = 0; i < loopCount; i++) {
            tlsGlobal = i;
        }
        sw.stop();
        writefln(" TLS : %s milliseconds", sw.peek.msecs);

        sw.reset();
        sw.start();
        for(size_t i = 0; i < loopCount; i++) {
            sharedGlobal = i;
        }
        sw.stop();
        writefln(" Shared: %s milliseconds", sw.peek.msecs);
    }
}

--------------------------------------------------
Output:
--------------------------------------------------
run 1 (loopcount 10000000)
 TLS : 104 milliseconds
 Shared: 3 milliseconds
run 2 (loopcount 10000000)
 TLS : 97 milliseconds
 Shared: 4 milliseconds
run 3 (loopcount 10000000)
 TLS : 99 milliseconds
 Shared: 3 milliseconds
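For readers less familiar with D: a module-level variable such as tlsGlobal is thread-local by default, so every thread gets its own copy, while __gshared gives a single process-wide copy with no synchronization. A minimal sketch of that difference follows; the names perThread, processWide and worker are hypothetical and not part of the benchmark.

```d
import core.thread : Thread;
import std.stdio : writeln;

size_t perThread = 1;             // module-level: thread-local by default in D
__gshared size_t processWide = 1; // __gshared: one unsynchronized copy for all threads

// Runs in a second thread and writes to both variables.
void worker()
{
    perThread = 42;   // changes only the worker thread's own copy
    processWide = 42; // changes the single global copy
}

void main()
{
    auto t = new Thread(&worker);
    t.start();
    t.join();
    writeln(perThread);   // prints 1: the main thread's copy is untouched
    writeln(processWide); // prints 42
}
```

That per-thread copy is what the runtime has to locate on each access, which is where the emulated-versus-native TLS cost difference shows up.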
February 18, 2015 Re: LLVM and TLS
Posted in reply to Jonathan Marler
On 2015-02-18 02:41, Jonathan Marler wrote:
> I've created a simple program to demonstrate the issue. The performance
> cost of TLS vs __gshared is over one and a half orders of magnitude!

It would be nice to have a comparison in C as well, which does use the native TLS implementation.

--
/Jacob Carlborg
February 18, 2015 Re: LLVM and TLS
Posted in reply to Jonathan Marler
"Jonathan Marler" <johnnymarler@gmail.com> writes:
> I've created a simple program to demonstrate the issue. The performance cost of TLS vs __gshared is over one and a half orders of magnitude!
> --snip--

I ran it on my MacBook to compare the DMD and LDC 2.066.1 versions. With LDC, I had to put an empty asm instruction in the for loops; otherwise the optimizer removed all but the last write and the timing looked really good (0 milliseconds)! LDC's TLS time relative to __gshared is a bit better than DMD's.

$ dmd -O timetls.d
$ ./timetls
--------------------------------------------------
run 1 (loopcount 10000000)
 TLS : 93 milliseconds
 Shared: 6 milliseconds
run 2 (loopcount 10000000)
 TLS : 91 milliseconds
 Shared: 6 milliseconds
run 3 (loopcount 10000000)
 TLS : 92 milliseconds
 Shared: 4 milliseconds

$ ldmd2 -O3 timetls.d
$ ./timetls
--------------------------------------------------
run 1 (loopcount 10000000)
 TLS : 21 milliseconds
 Shared: 3 milliseconds
run 2 (loopcount 10000000)
 TLS : 22 milliseconds
 Shared: 5 milliseconds
run 3 (loopcount 10000000)
 TLS : 20 milliseconds
 Shared: 3 milliseconds
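Instead of an empty asm statement, one way to keep the optimizer from collapsing each loop into a single final write is druntime's volatileStore. The following is a minimal sketch, assuming a recent compiler where core.volatile and the newer std.datetime.stopwatch module are available; timeTlsStores and timeSharedStores are hypothetical helper names. One caveat: volatileStore forces every store to be emitted, but the compiler may still compute &tlsGlobal once outside the loop, so with native TLS this mostly measures stores rather than repeated TLS address lookups.

```d
import core.volatile : volatileStore;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

size_t tlsGlobal;              // thread-local (D's default)
__gshared size_t sharedGlobal; // one process-wide copy

// Times loopCount volatile stores into the thread-local variable, in ms.
long timeTlsStores(size_t loopCount)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. loopCount)
        volatileStore(&tlsGlobal, i); // volatile: earlier stores cannot be dropped
    return sw.peek.total!"msecs";
}

// Same loop against the __gshared variable.
long timeSharedStores(size_t loopCount)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. loopCount)
        volatileStore(&sharedGlobal, i);
    return sw.peek.total!"msecs";
}

void main()
{
    foreach (run; 1 .. 4)
        writefln("run %s  TLS: %s ms  Shared: %s ms",
                 run, timeTlsStores(10_000_000), timeSharedStores(10_000_000));
}
```

No timings are claimed here; they depend on the compiler, flags and platform, as the numbers in this thread show.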
February 18, 2015 Re: LLVM and TLS
Posted in reply to Dan Olson
On Wednesday, 18 February 2015 at 17:03:38 UTC, Dan Olson wrote:
> LDC __gshared versus TLS time is a bit better than DMD.
>
> $ dmd -O timetls.d
> $ ./timetls
> --------------------------------------------------
> run 1 (loopcount 10000000)
>  TLS : 93 milliseconds
>  Shared: 6 milliseconds
> run 2 (loopcount 10000000)
>  TLS : 91 milliseconds
>  Shared: 6 milliseconds
> run 3 (loopcount 10000000)
>  TLS : 92 milliseconds
>  Shared: 4 milliseconds
>
> $ ldmd2 -O3 timetls.d
> $ ./timetls
> --------------------------------------------------
> run 1 (loopcount 10000000)
>  TLS : 21 milliseconds
>  Shared: 3 milliseconds
> run 2 (loopcount 10000000)
>  TLS : 22 milliseconds
>  Shared: 5 milliseconds
> run 3 (loopcount 10000000)
>  TLS : 20 milliseconds
>  Shared: 3 milliseconds

That's quite a bit better. If I run this using DMD on Windows, I get almost the same performance for TLS and __gshared:

dmd test.d
--------------------------------------------------
run 1 (loopcount 10000000)
 TLS : 28 milliseconds
 Shared: 25 milliseconds
run 2 (loopcount 10000000)
 TLS : 28 milliseconds
 Shared: 25 milliseconds
run 3 (loopcount 10000000)
 TLS : 27 milliseconds
 Shared: 25 milliseconds

If I turn on optimization, they both take 7 milliseconds.
Copyright © 1999-2021 by the D Language Foundation