March 09, 2014
On Sunday, 9 March 2014 at 05:38:07 UTC, Joakim wrote:
> On Saturday, 8 March 2014 at 22:44:16 UTC, David Nadlinger wrote:
>> Well, yes and no. I was specifically referring to keeping the normal
>> TLS infrastructure (i.e. %gs-based addressing on Linux/x86) in place
>> and just replacing the part that Glibc does (but Bionic doesn't) with
>> a piece of code in druntime. __tls_get_addr isn't necessarily used on
>> x86.
>
> While Android/X86 TLS does use the %gs register (https://github.com/android/platform_bionic/blob/master/libc/private/__get_tls.h#L45), that's not portable and I'd like to try Android/ARM after this, so I'll stick with the pthread_(get|set)specific calls to wrap it:
>
> https://github.com/android/platform_bionic/blob/master/libc/bionic/pthread_key.cpp

You mention "replacing the part that Glibc does (but Bionic doesn't) with a piece of code in druntime."  Just to be clear, you're referring to accessing TLS variables using an offset into the initialization image, which is what ___tls_get_addr from druntime does in Walter's packed TLS approach, right?  If not, I'm not sure exactly what you're referring to.  With all this TLS stuff split up between the compiler, linker, and runtime linker, often undocumented or poorly documented in the latter two cases, it's been confusing to follow the TLS code path to see what's happening.
March 09, 2014
On 2014-03-09 07:04, Joakim wrote:

> I wondered earlier why you weren't just using Walter's packed TLS
> approach and now I see why, ldc doesn't use it.  Looks like Apple hasn't
> ported the TLV functions which ldc uses to iOS yet either, so you're out
> of luck there too.  I guess you'll have to port Walter's approach to ldc
> to get TLS working on iOS:

I think it would be possible to implement the missing TLV functions ourselves in druntime. Hopefully this would allow us to use the same TLS approach on both OS X and iOS.

-- 
/Jacob Carlborg
March 09, 2014
On Sunday, 9 March 2014 at 09:55:33 UTC, Jacob Carlborg wrote:
> On 2014-03-09 07:04, Joakim wrote:
>
>> I wondered earlier why you weren't just using Walter's packed TLS
>> approach and now I see why, ldc doesn't use it.  Looks like Apple hasn't
>> ported the TLV functions which ldc uses to iOS yet either, so you're out
>> of luck there too.  I guess you'll have to port Walter's approach to ldc
>> to get TLS working on iOS:
>
> I think it would be possible to implement the missing TLV functions ourselves in druntime. Hopefully this would allow us to use the same TLS approach on both OS X and iOS.

OK, I assumed OS support was necessary, maybe not.

On Saturday, 8 March 2014 at 18:16:58 UTC, Joakim wrote:
> On Saturday, 8 March 2014 at 14:25:43 UTC, David Nadlinger wrote:
>> However, there is a third option which might be worth investigating, namely re-implementing at least parts of the necessary runtime linker features in druntime and continuing to use the same scheme as on GNU Linux/x86. This depends on %gs not being used in another way, etc. though.
> I tried to reuse the existing dl_iterate_phdr approach on Android, but then I noticed that the dl_phdr_info struct defined in bionic doesn't include the dlpi_tls_modid and dlpi_tls_data members.  However, now that you mention it, maybe those aren't strictly necessary, as long as I'm not worried about shared libraries.  I'll look into it further.

Speaking of OS support, I just tried this and I was able to access the TLS initialization image using dl_phdr_info on Android/x86.  Those dlpi_tls_* members are not necessary, though I'm guessing dlpi_tls_modid would be for shared library support.  Now I just have to figure out some way to have the TLS relocations access the initialization image, presumably the way Walter does it for dmd/OSX.
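
As a rough illustration of locating the initialization image this way, here is a minimal C sketch that walks the program headers with dl_iterate_phdr and picks out the PT_TLS segment. It only touches dlpi_addr, dlpi_phdr and dlpi_phnum, not the dlpi_tls_* members. The names struct tls_image, find_tls_image, tls_canary and read_tls_canary are made up for this example, and it assumes a libc whose <link.h> declares dl_iterate_phdr and reports the main program first (glibc documents this; I have not checked Bionic).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc only declares dl_iterate_phdr with this */
#endif
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical record describing the TLS initialization image. */
struct tls_image {
    const void *start;   /* address of the initialized data (.tdata) */
    size_t      filesz;  /* bytes that must be copied from the image */
    size_t      memsz;   /* full per-thread size, including zeroed .tbss */
    int         found;   /* non-zero once a PT_TLS segment was seen */
};

/* A thread-local with a non-zero initializer, so that the executable
 * actually carries a PT_TLS segment (__thread is a GCC/Clang extension). */
__thread int tls_canary = 42;

int read_tls_canary(void)
{
    return tls_canary;
}

/* Callback for dl_iterate_phdr: inspect one loaded object. */
static int find_tls_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct tls_image *img = data;
    (void)size;

    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_TLS) {
            img->start  = (const void *)(info->dlpi_addr + ph->p_vaddr);
            img->filesz = ph->p_filesz;
            img->memsz  = ph->p_memsz;
            img->found  = 1;
            break;
        }
    }
    /* A non-zero return stops the iteration. Assumption: the first object
     * visited is the main program, so shared libraries are ignored. */
    return 1;
}

struct tls_image find_tls_image(void)
{
    struct tls_image img = { NULL, 0, 0, 0 };
    dl_iterate_phdr(find_tls_cb, &img);
    assert(!img.found || img.memsz >= img.filesz);
    return img;
}
```

Calling find_tls_image() once at startup would give the runtime the address and size it needs to copy for each thread; shared libraries are deliberately left out, as in the experiment above.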
March 09, 2014
On 2014-03-09 11:11, Joakim wrote:

> OK, I assumed OS support was necessary, maybe not.

Well, yes. In this case the OS support comes in the form of the dynamic linker. We can do the same as the dynamic linker does in druntime. I don't know if it helps but the dynamic linker on OS X has code for tlv_get_addr for ARM, but it's disabled.

-- 
/Jacob Carlborg
March 09, 2014
On 9 Mar 2014, at 8:36, Joakim wrote:
> You mention "replacing the part that Glibc does (but Bionic doesn't) with a piece of code in druntime."  Just to be clear, you're referring to accessing TLS variables using an offset into the initialization image, which is what ___tls_get_addr from druntime does in Walter's packed TLS approach, right?  If not, I'm not sure exactly what you're referring to.  With all this TLS stuff split up between the compiler, linker, and runtime linker, often undocumented or poorly documented in the latter two cases, it's been confusing to follow the TLS code path to see what's happening.

There are several possible ABIs for thread-local storage. For the sake of this argument, let's assume that our particular system works like the Linux/x86 implementation or Walter's OS X approach in that the TLS storage area is simply a flat block of memory where the individual variables reside at some offset. Then, there is still the question of how the application knows a) the base address of the block and b) the offset of the variable of interest.

In Walter's OS X implementation, both are taken care of by __tls_get_addr, which expects a pointer into the section where the TLS initialization data is stored. On e.g. Linux/x86_64, however, the base address is stored in %fs, and the offset is provided by special linker relocations (which essentially evaluate to the offset of a given symbol from the beginning of the initialization image). No extra function calls are inserted by the compiler here to access TLS data, and the (C) runtime is not directly involved for the accesses.
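
To make the function-call variant concrete, here is a minimal C sketch of the address translation such a helper would perform: take a pointer into the initialization image, turn it into an offset, and apply that offset to the calling thread's private block. The name packed_tls_addr and its parameters are hypothetical; this is not the actual druntime code, just the base-plus-offset idea under the flat-block assumption made above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical analog of a __tls_get_addr-style helper for a flat TLS block.
 *
 *   image_start  - start of the TLS initialization image
 *   image_size   - size of the image in bytes
 *   thread_block - the calling thread's private copy of the image
 *   p            - pointer to a variable's slot inside the image
 *
 * Returns the address of the same variable inside thread_block.
 */
void *packed_tls_addr(const void *image_start, size_t image_size,
                      void *thread_block, const void *p)
{
    /* b) the offset of the variable of interest */
    uintptr_t offset = (uintptr_t)p - (uintptr_t)image_start;
    assert(offset < image_size);

    /* a) the base address of this thread's block, plus that offset */
    return (char *)thread_block + offset;
}
```

With register-based schemes the same addition happens inline (segment base plus a relocation-provided constant), which is why no call shows up in the generated code there.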

For an overview of the different models, see http://www.akkadia.org/drepper/tls.pdf (which is the most comprehensive document I could find, in spite of what you might think about the author).

But regardless of what model is chosen, there is still the issue of actually setting up a copy of the data for each thread during initialization. This is what I was referring to when I mentioned "replacing the part that Glibc does (but Bionic doesn't) with a piece of code in druntime".
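
A minimal C sketch of that setup step, assuming the runtime already knows where the initialization image is and how large it is: each thread lazily gets its own heap copy, with the initialized part copied over and the remainder zeroed, and the copy is remembered in a pthread key. The function name thread_tls_block and the filesz/memsz parameters are hypothetical names for this example, not an existing druntime or libc interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_key_t  tls_key;
static pthread_once_t tls_once = PTHREAD_ONCE_INIT;

/* Create the key once; free() runs on each thread's block at thread exit. */
static void tls_make_key(void)
{
    int rc = pthread_key_create(&tls_key, free);
    assert(rc == 0);
    (void)rc;
}

/*
 * Return the calling thread's private TLS block, creating it on first use.
 *
 *   image  - start of the initialization image
 *   filesz - number of initialized bytes to copy from the image
 *   memsz  - total block size; the tail beyond filesz is zero-filled
 *
 * Returns NULL if allocation or key storage fails.
 */
void *thread_tls_block(const void *image, size_t filesz, size_t memsz)
{
    assert(memsz >= filesz);
    pthread_once(&tls_once, tls_make_key);

    void *block = pthread_getspecific(tls_key);
    if (block == NULL) {
        block = calloc(1, memsz ? memsz : 1);   /* zeroed, covers the tail */
        if (block == NULL)
            return NULL;
        memcpy(block, image, filesz);           /* initialized part */
        if (pthread_setspecific(tls_key, block) != 0) {
            free(block);
            return NULL;
        }
    }
    return block;
}
```

Allocating lazily on first access avoids having to hook thread creation, at the cost of a key lookup on every access.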

So, if %gs works as expected on Android and the linker supports the necessary relocations, then it might be an option to use the existing TLS implementation in LLVM and simply provide the missing bits in druntime. On the other hand, if you choose to go with an entirely different TLS scheme (such as the DMD OS X implementation), you need to figure out how to change the codegen to emit the extra function calls to your __tls_get_addr analog, etc. Looking at llvm/lib/Target/ARM/ARMISelLowering.cpp, there might actually be a working implementation for this in LLVM already (which I didn't realize before), so this route would not necessarily be more complex than going with a different scheme. You'd probably just need to provide the __tls_get_addr implementation in druntime and figure out how LLVM emits the TLS image and how to get its base address.

Hope this helps,
David
March 09, 2014
On Sunday, 9 March 2014 at 16:12:19 UTC, David Nadlinger wrote:
> On 9 Mar 2014, at 8:36, Joakim wrote:
>> You mention "replacing the part that Glibc does (but Bionic doesn't) with a piece of code in druntime."  Just to be clear, you're referring to accessing TLS variables using an offset into the initialization image, which is what ___tls_get_addr from druntime does in Walter's packed TLS approach, right?  If not, I'm not sure exactly what you're referring to.  With all this TLS stuff split up between the compiler, linker, and runtime linker, often undocumented or poorly documented in the latter two cases, it's been confusing to follow the TLS code path to see what's happening.
>
> There are several possible ABIs for thread-local storage. For the sake of this argument, let's assume that our particular system works like the Linux/x86 implementation or Walter's OS X approach in that the TLS storage area is simply a flat block of memory where the individual variables reside at some offset. Then, there is still the question of how the application knows a) the base address of the block and b) the offset of the variable of interest.
>
> In Walter's OS X implementation, both are taken care of by __tls_get_addr, which expects a pointer into the section where the TLS initialization data is stored. On e.g. Linux/x86_64, however, the base address is stored in %fs, and the offset is provided by special linker relocations (which essentially evaluate to the offset of a given symbol from the beginning of the initialization image). No extra function calls are inserted by the compiler here to access TLS data, and the (C) runtime is not directly involved for the accesses.
>
> For an overview of the different models, see http://www.akkadia.org/drepper/tls.pdf (which is the most comprehensive document I could find, in spite of what you might think about the author).

Yeah, I've had that pdf loaded in my browser for the last couple months, skimmed some of it initially and I've been slowly going through it in more detail.  I tried simply loading a binary built using bracketed sections and the linker's current TLS relocations, i.e. no extra function calls, in Android/x86 and I got some other random data in the resulting TLS initialization image.  I think this is because bionic stores the pthread_setspecific-created void* pointers in the normal TLS area, so you can't just use the TLS relocations that dmd and the gold linker generate for linux/x86 on Android/x86, i.e. using the %gs register directly.

I have no opinion on the author, should I? ;)

> But regardless of what model is chosen, there is still the issue of actually setting up a copy of the data for each thread during initialization. This is what I was referring to when I mentioned "replacing the part that Glibc does (but Bionic doesn't) with a piece of code in druntime".

I was finally able to access a proper initialization image created by dmd in druntime on Android/x86 a couple hours back, by using dl_phdr_info similarly to what is done on linux now.

> So, if %gs works as expected on Android and the linker supports the necessary relocations, then it might be an option to use the existing TLS implementation in LLVM and simply provide the missing bits in druntime. On the other hand, if you choose to go with an entirely different TLS scheme (such as the DMD OS X implementation), you need to figure out how to change the codegen to emit the extra function calls to your __tls_get_addr analog, etc. Looking at llvm/lib/Target/ARM/ARMISelLowering.cpp, there might actually be a working implementation for this in LLVM already (which I didn't realize before), so this route would not necessarily be more complex than going with a different scheme. You'd probably just need to provide the __tls_get_addr implementation in druntime and figure out how LLVM emits the TLS image and how to get its base address.

I think this is the best route, with the advantage that if my ___tls_get_addr uses pthread_(get|set)specific, it will likely just work on ARM too.  I thought I'd have to get ldc to generate slightly different IR to do this, but it'd be great if llvm already does this.  I had briefly looked at X86ISelLowering.cpp but not the ARM one, I'll see what it does.

> Hope this helps,
> David

Yeah, I think we're on the same page, thanks for the explanation.  I've just been learning about TLS recently, so I wasn't sure before.
March 17, 2014
On Sunday, 9 March 2014 at 18:23:00 UTC, Joakim wrote:
> On Sunday, 9 March 2014 at 16:12:19 UTC, David Nadlinger wrote:
>> So, if %gs works as expected on Android and the linker supports the necessary relocations, then it might be an option to use the existing TLS implementation in LLVM and simply provide the missing bits in druntime. On the other hand, if you choose to go with an entirely different TLS scheme (such as the DMD OS X implementation), you need to figure out how to change the codegen to emit the extra function calls to your __tls_get_addr analog, etc. Looking at llvm/lib/Target/ARM/ARMISelLowering.cpp, there might actually be a working implementation for this in LLVM already (which I didn't realize before), so this route would not necessarily be more complex than going with a different scheme. You'd probably just need to provide the __tls_get_addr implementation in druntime and figure out how LLVM emits the TLS image and how to get its base address.
> I think this is the best route, with the advantage that if my ___tls_get_addr uses pthread_(get|set)specific, it will likely just work on ARM too.  I thought I'd have to get ldc to generate slightly different IR to do this, but it'd be great if llvm already does this.  I had briefly looked at X86ISelLowering.cpp but not the ARM one, I'll see what it does.

Alright, I looked into the ARM and X86 assembly lowering source and it appears that those __tls_get_addr calls are simply the ones put in for the dynamic thread models.  I tried hijacking those ___tls_get_addr calls by compiling all code as PIC, which forces a dynamic thread model in llvm that puts in the __tls_get_addr function calls, and then building as a shared library, which causes the gold linker to disable any linker optimizations that remove those calls.  However, the resulting shared library would not run because there are still a few TLS relocations from the GOT for the dynamic linker to execute and the Android dynamic linker doesn't do those TLS relocations.

So that was a dead end; looks like it's back to the packed TLS approach and having ldc generate IR that calls my __tls_get_addr manually.
March 20, 2014
On Monday, 17 March 2014 at 10:25:22 UTC, Joakim wrote:
> So that was a dead end; looks like it's back to the packed TLS approach and having ldc generate IR that calls my __tls_get_addr manually.

Since packed TLS looks like the way this needs to be done, any chance one of the ldc developers might be able to toss this off?

This is the first time I've ever tinkered with a compiler, so it will very likely take me longer than it would take one of you.  Right now, I'm looking at hacking dmd to do this, as that seems like the fastest route to get something working, but obviously ldc will need it too for Android/ARM and the dmd patch is not going to be reusable for ldc.

If not, not a big deal, I'm sure I'll get something working eventually.
March 27, 2014
Any TLS progress out there in LDC-land?

To pass thread/fiber unittests on iOS, I put in a temporary workaround using pthread_get/setspecific directly for the two threadlocals (Thread.sm_this and Fiber.sm_this).  Now I can pass 74 of 85 druntime/phobos unittests on iOS.
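
For anyone curious what that kind of workaround looks like, here is a minimal C sketch of the general pattern (not the actual druntime patch): a single thread-local pointer is replaced by a pthread key plus a getter and a setter. The names struct thread_obj, set_this and get_this are hypothetical stand-ins for something like Thread.sm_this.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime's per-thread object. */
struct thread_obj {
    int id;
};

static pthread_key_t  sm_this_key;
static pthread_once_t sm_this_once = PTHREAD_ONCE_INIT;

/* Create the key once; no destructor, the runtime owns the object. */
static void sm_this_make_key(void)
{
    int rc = pthread_key_create(&sm_this_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Replaces "sm_this = t" on a real thread-local variable. */
void set_this(struct thread_obj *t)
{
    pthread_once(&sm_this_once, sm_this_make_key);
    int rc = pthread_setspecific(sm_this_key, t);
    assert(rc == 0);
    (void)rc;
}

/* Replaces a read of "sm_this"; NULL until set_this runs on this thread. */
struct thread_obj *get_this(void)
{
    pthread_once(&sm_this_once, sm_this_make_key);
    return pthread_getspecific(sm_this_key);
}
```

This only scales to a handful of pointer-sized variables, since every thread-local needs its own key and accessors, which is why it is a stopgap rather than a replacement for emulated TLS.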

If nobody is working on the emulated TLS for LDC, I will give it a try. Nothing to lose.
-- 
Dan
March 27, 2014
On Thursday, 27 March 2014 at 16:01:31 UTC, Dan Olson wrote:
> Any TLS progress out there in LDC-land?

I've been familiarizing myself with the relevant dmd backend source, but haven't tried anything yet.

> To pass thread/fiber unittests on iOS, I put in a temporary workaround
> using pthread_get/setspecific directly for the two threadlocals
> (Thread.sm_this and Fiber.sm_this).  Now I can pass 74 of 85
> druntime/phobos unittests on iOS.

I thought about doing the same, but didn't bother since I was able to get all of druntime's unit tests to pass by using Android's limited and flaky TLS support, left over from the linux kernel.

> If nobody is working on the emulated TLS for LDC, I will give it a try.
> Nothing to lose.

Whatever I do to implement packed TLS in the dmd backend is not going to work for ldc anyway, so nothing stopping you from making your own effort.  You will have to patch llvm also, if the weak symbols bug David pointed out is still around in llvm 3.5.  Let us know what approach you take.