March 27, 2014
On 27 Mar 2014, at 17:01, Dan Olson wrote:
> If nobody is working on the emulated TLS for LDC, I will give it a try. Nothing to lose.

Would be great – I don't think anybody else is working on this right now.

David
March 30, 2014
"Joakim" <joakim@airpost.net> writes:

> On Thursday, 27 March 2014 at 16:01:31 UTC, Dan Olson wrote:
>
>> If nobody is working on the emulated TLS for LDC, I will give it a
>> try.
>> Nothing to lose.
>
> Whatever I do to implement packed TLS in the dmd backend is not going to work for ldc anyway, so nothing stopping you from making your own effort.  You will have to patch llvm also, if the weak symbols bug David pointed out is still around in llvm 3.5.  Let us know what approach you take.

The approach I started with was to make LLVM do the work.  I read through all of the comments in this thread and decided this might be the most fun.

ARMISelLowering.cpp has TLS disabled for all but ELF targets.  I commented out an assertion blocking other targets to see what would happen for iOS (Mach-O).  To my surprise, I found that Mach-O TLS sections are generated (__thread_vars, __thread_data, .tbss) and populated with the D thread local vars.

The load/store instructions were treating TLS vars like global data though.  So I looked at the Mach-O X86 version and saw what it is trying to do.  LLVM coding is still a mystery to me, but after many hours today I managed to hack together something that would turn this D code

module tlsd;
int a;

void test()
{
  a += 4;   // access a
}

into this:

	movw	r0, :lower16:(__D4tlsd1ai-(LPC4_0+4))
	movt	r0, :upper16:(__D4tlsd1ai-(LPC4_0+4))
LPC4_0:
	add	r0, pc
	blx	___tls_get_addr
	ldr	r1, [r0]
	adds	r1, #4
	str	r1, [r0]

...


.tbss __D4tlsd1ai$tlv$init, 4, 2

	.section	__DATA,__thread_vars,thread_local_variables
	.globl	__D4tlsd1ai
__D4tlsd1ai:
	.long	__tlv_bootstrap
	.long	0
	.long	__D4tlsd1ai$tlv$init


The following link helped explain what is going on with the __thread_vars data layout.

http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalVariables.c

Mach-O dyld replaces tlv_bootstrap (thunk) with tlv_get_addr in the
TLVDescriptor (__thread_vars).  My LLVM hack for now is just doing a
direct call to __tls_get_addr instead of indirect to tlv_get_addr.  For
proof of concept (one thread only), I have __tls_get_addr hard wired as
follows:

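import core.stdc.stdio : printf;

extern (C)
{
    // Mirrors the three-word descriptor emitted into __thread_vars.
    struct TLVDescriptor
    {
        void* function(TLVDescriptor*) thunk;
        uint key;
        uint offset;
    }

    //void* tlv_get_addr(TLVDescriptor* d)
    //void* __tls_get_addr(void* ptr)
    void* __tls_get_addr(TLVDescriptor* tlvd)
    {
        // One shared block: only valid while a single thread is running.
        __gshared static ubyte[512] data;

        printf("__tls_get_addr %p \n", tlvd);
        printf("thunk %p, key %u, offset %u\n",
               tlvd.thunk, tlvd.key, tlvd.offset);
        return data.ptr + tlvd.offset;
    }

    // Stub: each descriptor references this symbol, but the hack never calls it.
    void _tlv_bootstrap()
    {
        assert(false, "Should not get here");
    }
}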

It looks promising.  Next step is to add in some realistic runtime support.  Not sure if I will base it on dmd's sections-osx or the Apple dyld.  Probably a hybrid.

Eventually will need some help getting the LLVM changes clean instead of my hack job.

Now that I've gone down this path a bit, I am beginning to wonder if changing LLVM to support iOS thread locals will have issues.  Would LLVM want changes that affect Darwin/Mach-O (Apple's turf)?  I suppose they could be optional.
-- 
Dan
March 30, 2014
On Sunday, 30 March 2014 at 08:22:15 UTC, Dan Olson wrote:
> "Joakim" <joakim@airpost.net> writes:
>
>> On Thursday, 27 March 2014 at 16:01:31 UTC, Dan Olson wrote:
>>
>>> If nobody is working on the emulated TLS for LDC, I will give it a
>>> try.
>>> Nothing to lose.
>>
>> Whatever I do to implement packed TLS in the dmd backend is not going
>> to work for ldc anyway, so nothing stopping you from making your own
>> effort.  You will have to patch llvm also, if the weak symbols bug
>> David pointed out is still around in llvm 3.5.  Let us know what
>> approach you take.
>
> The approach I started with was to make LLVM do the work.  I read
> through all of the comments in this thread and decided this might be the
> most fun.
>
> ARMISelLowering.cpp has TLS disabled for all but ELF targets.  I
> commented out an assertion blocking other targets to see what would
> happen for iOS (Mach-O).  To my surprise, found that Mach-O tls sections
> are generated (__thread_vars, __thread_data, .tbss) and populated with
> the D thread local vars.
Nice find, I guess it helps that they have a desktop OS that does it differently.

> The load/store instructions were treating TLS vars like global data
> though.  So I looked at the Mach-O X86 version and saw what it is trying
> to do.  LLVM coding is still a mystery to me, but managed after many
> hours today to hack together something that would turn this D code
>
> module tlsd;
> int a;
>
> void test()
> {
>   a += 4;   // access a
> }
>
> into this:
>
> 	movw	r0, :lower16:(__D4tlsd1ai-(LPC4_0+4))
> 	movt	r0, :upper16:(__D4tlsd1ai-(LPC4_0+4))
> LPC4_0:
> 	add	r0, pc
> 	blx	___tls_get_addr
> 	ldr	r1, [r0]
> 	adds	r1, #4
> 	str	r1, [r0]
>
> ...
>
>
> .tbss __D4tlsd1ai$tlv$init, 4, 2
>
> 	.section	__DATA,__thread_vars,thread_local_variables
> 	.globl	__D4tlsd1ai
> __D4tlsd1ai:
> 	.long	__tlv_bootstrap
> 	.long	0
> 	.long	__D4tlsd1ai$tlv$init
>
>
> The following link helped explain what is going on with the
> __thread_vars data layout.
>
> http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalVariables.c
>
> Mach-O dyld replaces tlv_bootstrap (thunk) with tlv_get_addr in the
> TLVDescriptor (__thread_vars).  My LLVM hack for now is just doing a
> direct call to __tls_get_addr instead of indirect to tlv_get_addr.  For
> proof of concept (one thread only), I have __tls_get_addr hard wired as
> follows:
>
> extern (C)
> {
>     struct TLVDescriptor
>     {
> 	void*  function(TLVDescriptor*) thunk;
> 	uint	key;
> 	uint	offset;
>     }
>
>     //void* tlv_get_addr(TLVDescriptor* d)
>     //void* __tls_get_addr(void* ptr)
>     void* __tls_get_addr(TLVDescriptor* tlvd)
>     {
>         __gshared static ubyte data[512];
>
>         printf("__tls_get_addr %p \n", tlvd);
>         printf("thunk %p, key %u, offset %u\n",
>                tlvd.thunk, tlvd.key, tlvd.offset);
>         return data.ptr + tlvd.offset;
>     }
>
>     void _tlv_bootstrap()
>     {
>         assert(false, "Should not get here");
>     }
> }
>
> It looks promising.  Next step is to add in some realistic runtime
> support.  Not sure if I will base it on dmd's sections-osx or the Apple
> dyld.  Probably a hybrid.

Have you experimented to see which of that TLV stuff from OS X iOS actually supports?  The iOS dyld could be pretty different.  We don't know, since they don't release the source for the iOS core like they do for OS X, i.e. is tlv_get_addr even available in the iOS dyld, and does it execute other possible TLS relocations?  Only way to find out is to try it, or somehow inspect their iOS binaries. ;) Their source does show an ARM assembly implementation of tlv_get_addr, but it's commented out:

http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalHelpers.s

I wonder if it'd be easier to pack your own Mach-O sections rather than figuring out how to access all their sections and reimplementing their TLV functions, assuming they're not available.  You might even be able to do it as an llvm patch since the relevant lib/MC/ files where llvm packs the TLS data into Mach-O sections seem pretty straightforward.

> Eventually will need some help getting the LLVM changes clean instead of
> my hack job.
>
> Now that I've gone down this path a bit, I am beginning to wonder if
> changing LLVM to support iOS thread locals will have issues.  Would LLVM
> want changes that affect Darwin/Mach-O (Apple's turf)?  I suppose they
> could be optional.
I've never submitted anything to llvm, so this isn't based on anything other than speculation, but I doubt they would accept such a patch.  That doesn't mean we can't use it though. ;)
March 30, 2014
On 2014-03-30 10:22, Dan Olson wrote:
> "Joakim" <joakim@airpost.net> writes:
>
>> On Thursday, 27 March 2014 at 16:01:31 UTC, Dan Olson wrote:
>>
>>> If nobody is working on the emulated TLS for LDC, I will give it a
>>> try.
>>> Nothing to lose.
>>
>> Whatever I do to implement packed TLS in the dmd backend is not going
>> to work for ldc anyway, so nothing stopping you from making your own
>> effort.  You will have to patch llvm also, if the weak symbols bug
>> David pointed out is still around in llvm 3.5.  Let us know what
>> approach you take.
>
> The approach I started with was to make LLVM do the work.  I read
> through all of the comments in this thread and decided this might be the
> most fun.
>
> ARMISelLowering.cpp has TLS disabled for all but ELF targets.  I
> commented out an assertion blocking other targets to see what would
> happen for iOS (Mach-O).  To my surprise, found that Mach-O tls sections
> are generated (__thread_vars, __thread_data, .tbss) and populated with
> the D thread local vars.
>
> The load/store instructions were treating TLS vars like global data
> though.  So I looked at the Mach-O X86 version and saw what it is trying
> to do.  LLVM coding is still a mystery to me, but managed after many
> hours today to hack together something that would turn this D code
>
> module tlsd;
> int a;
>
> void test()
> {
>    a += 4;   // access a
> }
>
> into this:
>
> 	movw	r0, :lower16:(__D4tlsd1ai-(LPC4_0+4))
> 	movt	r0, :upper16:(__D4tlsd1ai-(LPC4_0+4))
> LPC4_0:
> 	add	r0, pc
> 	blx	___tls_get_addr
> 	ldr	r1, [r0]
> 	adds	r1, #4
> 	str	r1, [r0]
>
> ...
>
>
> .tbss __D4tlsd1ai$tlv$init, 4, 2
>
> 	.section	__DATA,__thread_vars,thread_local_variables
> 	.globl	__D4tlsd1ai
> __D4tlsd1ai:
> 	.long	__tlv_bootstrap
> 	.long	0
> 	.long	__D4tlsd1ai$tlv$init
>
>
> The following link helped explain what is going on with the
> __thread_vars data layout.
>
> http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalVariables.c
>
> Mach-O dyld replaces tlv_bootstrap (thunk) with tlv_get_addr in the
> TLVDescriptor (__thread_vars).  My LLVM hack for now is just doing a
> direct call to __tls_get_addr instead of indirect to tlv_get_addr.  For
> proof of concept (one thread only), I have __tls_get_addr hard wired as
> follows:
>
> extern (C)
> {
>      struct TLVDescriptor
>      {
> 	void*  function(TLVDescriptor*) thunk;
> 	uint	key;
> 	uint	offset;
>      }
>
>      //void* tlv_get_addr(TLVDescriptor* d)
>      //void* __tls_get_addr(void* ptr)
>      void* __tls_get_addr(TLVDescriptor* tlvd)
>      {
>          __gshared static ubyte data[512];
>
>          printf("__tls_get_addr %p \n", tlvd);
>          printf("thunk %p, key %u, offset %u\n",
>                 tlvd.thunk, tlvd.key, tlvd.offset);
>          return data.ptr + tlvd.offset;
>      }
>
>      void _tlv_bootstrap()
>      {
>          assert(false, "Should not get here");
>      }
> }
>
> It looks promising.  Next step is to add in some realistic runtime
> support.  Not sure if I will base it on dmd's sections-osx or the Apple
> dyld.  Probably a hybrid.

I would follow the native TLS implementation in OS X, i.e. using "tlv_get_addr", as closely as possible. In theory it should be possible to move the code from threadLocalVariables.c and threadLocalHelpers.s directly into druntime.

Hopefully that would mean the same code for generating TLS access could be used both on OS X and iOS.

-- 
/Jacob Carlborg
March 30, 2014
"Joakim" <joakim@airpost.net> writes:

> Have you experimented with seeing which of that TLV stuff from OS X
> that iOS actually supports?  The iOS dyld could be pretty different.
> We don't know since they don't release the source for the iOS core
> like they do for OS X, ie is tlv_get_addr even available in the iOS
> dyld and does it execute other possible TLS relocations?  Only way to
> find out is to try it, or somehow inspect their iOS binaries. ;) Their
> source does show an ARM assembly implementation of tlv_get_addr but
> it's commented out:
> http://www.opensource.apple.com/source/dyld/dyld-210.2.3/src/threadLocalHelpers.s

I did try it in an iOS app.  The function _tlv_bootstrap is unresolved when I link in Xcode using the current iPhoneSDK.  That is why I had to provide a stub.   Pretty sure tlv functions are not available.

> I wonder if it'd be easier to pack your own Mach-O sections rather than figuring out how to access all their sections and reimplementing their TLV functions, assuming they're not available.  You might even be able to do it as an llvm patch since the relevant lib/MC/ files where llvm packs the TLS data into Mach-O sections seem pretty straightforward.

I think we can use their sections and it did not take long to figure out.  Here is what an example link map has for one of my test apps:

0x0004E22C	0x00000084	__DATA	__thread_vars
0x0004E2B0	0x0000000C	__DATA	__thread_data
0x0004E2BC	0x00000024	__DATA	__thread_bss

The __thread_vars section has a TLVDescriptor for each thread local.  It is used for caching the pthread key (for pthread_getspecific/pthread_setspecific) and holds the variable's offset into the thread local chunk of memory, which can be initialized by copying __thread_data and __thread_bss (or just zero-filling the latter).
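To make the descriptor scheme described above concrete, here is a minimal sketch of a per-thread lookup built on pthread keys. It is not the dyld code and not what my LLVM hack calls today. The names TlsImage, tlsImage, createTlsKey and tlvGetAddr are made up for illustration, and the descriptor fields are widened to size_t so the sketch also compiles on a 64-bit host. In a real runtime the template pointer and sizes would come from the __thread_data and __thread_bss sections of the image.

```d
import core.stdc.stdlib : abort, calloc, free;
import core.stdc.string : memcpy;
import core.sys.posix.pthread;

struct TLVDescriptor;
extern (C) alias TlvThunk = void* function(TLVDescriptor*);

// Same three-word shape as the entries in __thread_vars.
struct TLVDescriptor
{
    TlvThunk thunk; // accessor the compiled code would call
    size_t key;     // cached pthread key
    size_t offset;  // offset of the variable in the per-thread block
}

// Hypothetical description of the initial TLS image.
struct TlsImage
{
    const(void)* data; // initialized part (what __thread_data holds)
    size_t dataSize;
    size_t bssSize;    // zero-filled part (what __thread_bss reserves)
}

__gshared TlsImage tlsImage;

// Hypothetical helper: one key for the image, block freed at thread exit.
size_t createTlsKey()
{
    pthread_key_t key;
    if (pthread_key_create(&key, &free) != 0)
        abort();
    return key;
}

extern (C) void* tlvGetAddr(TLVDescriptor* d)
{
    auto key = cast(pthread_key_t) d.key;
    void* block = pthread_getspecific(key);
    if (block is null)
    {
        // First access on this thread: calloc zero-fills the bss part,
        // then the initialized data is copied over the front.
        block = calloc(1, tlsImage.dataSize + tlsImage.bssSize);
        if (block is null)
            abort();
        memcpy(block, tlsImage.data, tlsImage.dataSize);
        pthread_setspecific(key, block);
    }
    return cast(ubyte*) block + d.offset;
}
```

Each thread pays for one allocation on its first access, and after that a lookup is a pthread_getspecific call plus an add. The garbage collector does not know about these blocks yet, so that still has to be hooked in.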

> I've never submitted anything to llvm, so not really based on anything than speculation, but I doubt they would accept such a patch, doesn't mean we can't use it though. ;)

Another thing, Apple might consider the tlv functions and thread local sections a reserved API.

A long way off from submitting anything to App Store.  With the way things change, tlv may show up in a near future sdk, then this just becomes a bridge.
-- 
Dan
March 30, 2014
Jacob Carlborg <doob@me.com> writes:

> I would follow the native TLS implementation in OS X, i.e. using "tlv_get_addr", as close as possible. In theory it should be possible to move the code from threadLocalVariables.c and threadLocalHelpers.s directly in to druntime.
>
> Hopefully that would mean the same code for generating TLS access could be used both on OS X and iOS.

Do you think we can just drop the dyld code into druntime? It should work with perhaps some modifications, but I am not familiar with the Apple opensource license. I should read it. It is BSD-like, right? Would still need to hook in the garbage collector so it scans the thread local memory.  I'll have to try it tonight.
-- 
Dan
March 30, 2014
"Joakim" <joakim@airpost.net> writes:

> I wonder if it'd be easier to pack your own Mach-O sections rather than figuring out how to access all their sections and reimplementing their TLV functions, assuming they're not available.  You might even be able to do it as an llvm patch since the relevant lib/MC/ files where llvm packs the TLS data into Mach-O sections seem pretty straightforward.

Thinking about this some more. It probably makes sense to have an optional approach that can be used on any target that does not have native TLS. This current approach for iOS will only work for Mach-O. I wonder if the LLVM folks are working toward a generic TLS without OS support.
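For what a format-independent fallback could look like, here is a rough sketch. This is only one possible shape and not anything LLVM provides as far as this thread knows. The compiler would emit one control object per thread-local variable and call a runtime function to get its address. EmuTlsControl, EmuTlsTable, emuTlsInit and emuTlsGetAddress are all hypothetical names. The sketch assumes every variable has a nonzero size, and it takes a lock on every access purely to keep the code short.

```d
import core.stdc.stdlib : abort, calloc, realloc;
import core.stdc.string : memcpy, memset;
import core.sys.posix.pthread;

// One control object per thread-local variable (hypothetical layout).
struct EmuTlsControl
{
    size_t size;       // size of the variable in bytes, assumed > 0
    const(void)* init; // initial value, or null for zero fill
    size_t index;      // 1-based slot number, 0 until first use
}

// Per-thread table of lazily allocated variable copies.
struct EmuTlsTable
{
    size_t length;
    void** slots;
}

__gshared pthread_key_t emuKey;
__gshared pthread_mutex_t emuLock;
__gshared size_t emuNextIndex;

// Must run once before any access, e.g. from runtime startup.
void emuTlsInit()
{
    if (pthread_key_create(&emuKey, null) != 0 ||
        pthread_mutex_init(&emuLock, null) != 0)
        abort();
}

void* emuTlsGetAddress(EmuTlsControl* c)
{
    // Give the variable a slot the first time any thread touches it.
    pthread_mutex_lock(&emuLock);
    if (c.index == 0)
        c.index = ++emuNextIndex;
    immutable idx = c.index;
    pthread_mutex_unlock(&emuLock);

    auto table = cast(EmuTlsTable*) pthread_getspecific(emuKey);
    if (table is null)
    {
        table = cast(EmuTlsTable*) calloc(1, EmuTlsTable.sizeof);
        if (table is null)
            abort();
        pthread_setspecific(emuKey, table);
    }
    if (table.length < idx)
    {
        // Grow this thread's table and clear the new slots.
        auto grown = cast(void**) realloc(table.slots, idx * (void*).sizeof);
        if (grown is null)
            abort();
        memset(grown + table.length, 0, (idx - table.length) * (void*).sizeof);
        table.slots = grown;
        table.length = idx;
    }
    void** slot = &table.slots[idx - 1];
    if (*slot is null)
    {
        // First access on this thread: allocate and initialize the copy.
        *slot = calloc(1, c.size);
        if (*slot is null)
            abort();
        if (c.init !is null)
            memcpy(*slot, c.init, c.size);
    }
    return *slot;
}
```

Nothing here depends on Mach-O, ELF or COFF section layout, which is the attraction. The cost is a function call and a table lookup per access, and the sketch never frees the per-thread copies.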
-- 
Dan
March 30, 2014
On Sunday, 30 March 2014 at 15:24:53 UTC, Dan Olson wrote:
> "Joakim" <joakim@airpost.net> writes:
> I think we can use their sections and it did not take long to figure
> out.  Here is what an example link map has for one of my test apps:
>
> 0x0004E22C	0x00000084	__DATA	__thread_vars
> 0x0004E2B0	0x0000000C	__DATA	__thread_data
> 0x0004E2BC	0x00000024	__DATA	__thread_bss
>
> The _thread_vars section has a TLVDescriptor for each thread local.  It
> is used for caching the pthread_get/set key and has the variable offset
> into the thread local chunk of memory that can be initialized by copying
> _thread_data and _thread_bss (or just zerofill it).
---snip---
> A long way off from submitting anything to App Store.  With the way
> things change, tlv may show up in a near future sdk, then this just
> becomes a bridge.

Hmm, you and Jacob are probably right, it may be better to just follow what they do.

On Sunday, 30 March 2014 at 15:34:08 UTC, Dan Olson wrote:
> Jacob Carlborg <doob@me.com> writes:
>
>> I would follow the native TLS implementation in OS X, i.e. using
>> "tlv_get_addr", as close as possible. In theory it should be possible
>> to move the code from threadLocalVariables.c and threadLocalHelpers.s
>> directly in to druntime.
>>
>> Hopefully that would mean the same code for generating TLS access
>> could be used both on OS X and iOS.
>
> Do you think we can just drop the dyld code into druntime? It should work
> with perhaps some modifications, but I am not familiar with the Apple
> opensource license. I should read it. It is BSD-like right?

I think the APSL is more similar to the CDDL, which was Sun's license for OpenSolaris and much of their open-source contributions, and requires that source is provided for APS-licensed files.  I think you could always add an APS-licensed file to druntime and the licenses would not clash, but that would make druntime not completely Boost-licensed anymore, as the APSL has more requirements than the minimal Boost license.  It's probably best to just reimplement the necessary functions yourself.

> Would still
> need to hook in the garbage collector so it scans the thread local
> memory.  I'll have to try it tonight.

David did this for the TLV code on OS X a year back, should be pretty straightforward to do something similar to what he did.

On Sunday, 30 March 2014 at 15:44:52 UTC, Dan Olson wrote:
> "Joakim" <joakim@airpost.net> writes:
>
>> I wonder if it'd be easier to pack your own Mach-O sections rather
>> than figuring out how to access all their sections and reimplementing
>> their TLV functions, assuming they're not available.  You might even
>> be able to do it as an llvm patch since the relevant lib/MC/ files
>> where llvm packs the TLS data into Mach-O sections seem pretty
>> straightforward.
>
> Thinking about this some more. It probably makes sense to have an
> optional approach that can be used on any target that does not have
> native TLS. This current approach for iOS will only work for Mach-O. I
> wonder if the LLVM folks are working toward a generic TLS without OS
> support.

Doesn't look like it, plus it'll need to be specialized for each object format, like Mach-O, ELF, or COFF, anyway.

After looking at the relevant llvm source for packing sections to see how it was working for you with Mach-O, I wonder if I could patch some of the existing llvm files for packing TLS data into ELF and get the TLS variables packed easily that way.  I'll try that approach at some point.
March 31, 2014
On 30/03/14 17:34, Dan Olson wrote:

> Do you think we can just drop the dyld code into druntime?

Yes, with minor modifications. The TLS related code in dyld is pretty much self contained. I don't see dyld using any functionality that isn't available to a regular application.

> It should work with perhaps some modifications, but I am not familiar with the Apple
> opensource license. I should read it. It is BSD-like right?

The license is a completely different issue. The safest would be to re-implement the code. One person can document the existing code and someone else can do the implementation.

Regardless of the license, you can still give it a try to see if the technical parts work.

> Would still need to hook in the garbage collector so it scans the thread local
> memory.  I'll have to try it tonight.

You'll just need to add a call to druntime in one of the functions in the dyld TLS code. Have a look at:

https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d

-- 
/Jacob Carlborg
March 31, 2014
On 31 Mar 2014, at 8:25, Jacob Carlborg wrote:
> You'll just need to add a call to druntime in one of the functions in the dyld TLS code. Have a look at:
>
> https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_osx.d

More specifically, for the DMD TLS emulation implementation, this is done in the initTLSRanges() function, which forwards to getTLSBlock(). IIRC, initTLSRanges() is only called for new threads. For the main thread, the TLS range is included in the GC ranges detected in initSections().

For LDC on OS X, which makes use of the 10.7+ system-level TLS implementation, the place where this is handled is https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/rt/sections_ldc.d#L296. _d_dyld_getTLSRange uses an undocumented dyld API function (dyld_enumerate_tlv_storage) to get the actual TLS memory range on the current thread: https://github.com/ldc-developers/druntime/blob/a08f158618eb5d06c42bd4746b782312e937f6b3/src/ldc/osx_tls.c.
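For a quick experiment before wiring anything into the sections code, a hand-allocated TLS block can also be made visible to the collector through the public core.memory API. This is only a sketch of that alternative and not how sections_osx.d or sections_ldc.d do it, since those hand TLS ranges to the thread scanning code. allocTlsBlock and freeTlsBlock are hypothetical helper names.

```d
import core.memory : GC;
import core.stdc.stdlib : calloc, free;

// Allocate a zeroed per-thread block and ask the GC to scan it for pointers.
void* allocTlsBlock(size_t size)
{
    void* block = calloc(1, size);
    if (block !is null)
        GC.addRange(block, size);
    return block;
}

// Unregister the block before freeing it so the GC never scans freed memory.
void freeTlsBlock(void* block)
{
    if (block is null)
        return;
    GC.removeRange(block);
    free(block);
}
```

A thread-exit hook (for instance a pthread key destructor) would need to call freeTlsBlock, otherwise the range stays registered after the thread is gone.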

David