TLS/GC issues on macOS
January 03, 2019
There appears to be a GC visibility issue of TLS sections on macOS when multiple threads are involved. I added a test case to what looks like the same issue: https://github.com/ldc-developers/ldc/issues/2187

The question is, does anyone have enough insight on the TLS implementation to make a quick diagnosis or fix?

This is biting us pretty hard in our upcoming commercial project (DMD has codegen issues on macOS), so any help would be highly appreciated.
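
For readers unfamiliar with this failure mode, the following is a minimal sketch of the kind of check that exposes it. It is not the test case attached to the linked issue; the names `Payload`, `expectedMagic`, `fillTls` and `tlsReferenceSurvivesCollection` are made up for illustration. A secondary thread stores its only reference to a GC-allocated object in a thread-local variable, forces a collection, and then checks that the object is intact. On a runtime that scans TLS correctly this always passes; on an affected runtime it may fail, although a stray copy of the reference in a register or on the stack can mask the problem.

```d
import core.memory : GC;
import core.thread : Thread;

// Hypothetical payload; the magic value lets us notice if the block was recycled.
final class Payload
{
    int magic;
    this(int magic) { this.magic = magic; }
}

enum int expectedMagic = 0x5EED;

// Module-level variables are thread-local by default in D, so each thread
// keeps its own copy of this reference in its TLS block.
Payload tlsRef;

// Allocate in a separate function so that, ideally, no copy of the reference
// lingers on the worker's stack (the compiler gives no guarantee of that).
void fillTls()
{
    tlsRef = new Payload(expectedMagic);
}

// Runs the check in a secondary thread and reports whether the object
// referenced only from TLS was still intact after a collection.
bool tlsReferenceSurvivesCollection()
{
    bool intact;
    auto t = new Thread({
        fillTls();
        GC.collect();
        // Allocate same-sized objects to reuse any slot that was wrongly freed.
        foreach (i; 0 .. 10_000)
        {
            auto junk = new Payload(0);
        }
        intact = tlsRef.magic == expectedMagic;
    });
    t.start();
    t.join();
    return intact;
}

void main()
{
    assert(tlsReferenceSurvivesCollection(), "TLS-referenced object was collected");
}
```

The check runs in a secondary thread on purpose, since the report concerns the case where multiple threads are involved.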
January 03, 2019
On 2019-01-03 13:58, Sönke Ludwig wrote:
> There appears to be a GC visibility issue of TLS sections on macOS when multiple threads are involved. I added a test case to what looks like the same issue: https://github.com/ldc-developers/ldc/issues/2187
> 
> The question is, does anyone have enough insight on the TLS implementation to make a quick diagnosis or fix?
> 
> This is biting us pretty hard in our upcoming commercial project (DMD has codegen issues on macOS), so any help would be highly appreciated.

I haven't tested anything, but I noticed this [1] difference between LDC and DMD.

[1] https://github.com/ldc-developers/druntime/blob/4e745768c79f6b72f03af108a845145e382e71e4/src/rt/sections_elf_shared.d#L1005-L1011

-- 
/Jacob Carlborg
January 03, 2019
On 3 Jan 2019, at 19:35, Jacob Carlborg via digitalmars-d-ldc wrote:
> I haven't tested anything, but I noticed this [1] difference between LDC and DMD.
>
> [1] https://github.com/ldc-developers/druntime/blob/4e745768c79f6b72f03af108a845145e382e71e4/src/rt/sections_elf_shared.d#L1005-L1011

Doesn't DMD still use the old, non-shared-library rt.sections implementation on OS X?

 — David

January 03, 2019
On 3 Jan 2019, at 12:58, Sönke Ludwig via digitalmars-d-ldc wrote:
> There appears to be a GC visibility issue of TLS sections on macOS when multiple threads are involved. I added a test case to what looks like the same issue: https://github.com/ldc-developers/ldc/issues/2187

Try this patch:

```
diff --git a/src/rt/sections_elf_shared.d b/src/rt/sections_elf_shared.d
index a7b3336a..bc84a116 100644
--- a/src/rt/sections_elf_shared.d
+++ b/src/rt/sections_elf_shared.d
@@ -296,6 +296,20 @@ else
         {
             _tlsRanges = cast(Array!(void[])*)calloc(1, Array!(void[]).sizeof);
             _tlsRanges || assert(0, "Could not allocate TLS range storage");
+
+            version (Shared) {} else
+            {
+                version (linux)
+                {
+                    // Nothing to do; glibc allocates the TLS area for additional
+                    // threads at the beginning of the stack space, so they will already
+                    // be scanned.
+                }
+                else version (OSX)
+                {
+                    _tlsRanges.insertBack(getTLSRange(&dummyTlsSymbol));
+                }
+            }
         }
         return _tlsRanges;
     }
```

Still trying to figure out how much of that code can be stripped away for non-Shared builds (it's a mess, as DMD partially supports shared libraries on Linux with a static runtime), or how on earth nobody has found this before.

Best,
David
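
To illustrate what "scanned" means in the patch comment, here is a rough user-level analogue, not something proposed in this thread: a thread registers the address of one of its own thread-local variables as a GC range, which is conceptually what inserting the thread's TLS range into `_tlsRanges` achieves for the whole TLS block inside the runtime. `GC.addRange` and `GC.removeRange` are the `core.memory` calls; `Payload`, `tlsRef` and `registeredTlsSurvives` are hypothetical names.

```d
import core.memory : GC;
import core.thread : Thread;

// Hypothetical payload with a value we can verify after a collection.
final class Payload
{
    int magic;
    this(int magic) { this.magic = magic; }
}

// Thread-local by default: each thread has its own slot.
Payload tlsRef;

// Registers the calling thread's slot with the GC, then checks that the
// object referenced from it is still intact after a collection.
bool registeredTlsSurvives()
{
    bool intact;
    auto t = new Thread({
        // Ask the GC to scan this thread's copy of the variable.
        GC.addRange(&tlsRef, tlsRef.sizeof);
        // Unregister before the thread and its TLS block go away.
        scope (exit) GC.removeRange(&tlsRef);

        tlsRef = new Payload(42);
        GC.collect();
        intact = tlsRef.magic == 42;
    });
    t.start();
    t.join();
    return intact;
}

void main()
{
    assert(registeredTlsSurvives());
}
```

On a runtime that already scans TLS, the extra registration is redundant but harmless, because the slot is simply scanned twice.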
January 06, 2019
On 2019-01-03 20:55, David Nadlinger wrote:

> Doesn't DMD still use the old, non-shared-library rt.sections implementation on OS X?

Yes.

-- 
/Jacob Carlborg
January 07, 2019
On 03.01.2019 at 21:18, David Nadlinger wrote:
> On 3 Jan 2019, at 12:58, Sönke Ludwig via digitalmars-d-ldc wrote:
>> There appears to be a GC visibility issue of TLS sections on macOS when multiple threads are involved. I added a test case to what looks like the same issue: https://github.com/ldc-developers/ldc/issues/2187
> 
> Try this patch:
> 
> (...)
> 
> Still trying to figure out how much of that code can be stripped away for non-Shared builds (it's a mess, as DMD partially supports shared libraries on Linux with a static runtime), or how on earth nobody has found this before.
> 
> Best,
> David

That appears to fix it indeed. I still have to run the actual test case, but the application runs fine now.

Many thanks for the quick solution!
January 07, 2019
On 7 Jan 2019, at 8:02, Sönke Ludwig via digitalmars-d-ldc wrote:
> That appears to fix it indeed. I still have to run the actual test case, but the application runs fine now.
>
> Many thanks for the quick solution!

Okay, good – I'll see about upstreaming a proper fix asap.

 — David