Implementing native TLS on OS X in DMD
January 08, 2016
It might be a bit odd to ask this question in the LDC newsgroup, but since LDC already supports native TLS on OS X, I was hoping to get some help here.

I've implemented native TLS on OS X in DMD to the best of my knowledge. The data in the sections looks correct, the assembly looks correct, and I've updated druntime to use the same code as LDC does in this regard. Everything seems to work correctly in the simple cases I've tried.

But I have an issue when the garbage collector runs, in particular when running the DMD test suite. The failing test is this one [1]. I get a segmentation fault (in the debugger, a range error) here [2], after executing the outer loop once. I highly suspect that the garbage collector collects "_chars" [3] (or its content) too early, since the destructor of SomeClass [4] is executed. If I make "_chars" __gshared, it doesn't crash. If I remove the call to the GC [5], it doesn't crash.

I've been trying to debug this but I don't have much knowledge in this area. What I have found out is that "_chars" is included in the range returned by _d_dyld_getTLSRange [6]. I've been trying to debug the GC, and it looks like "_chars" is marked twice, before crashing. Or at least a range where "_chars" is included.

One thing that worries me, though, is that the range returned by _d_dyld_getTLSRange for LDC is quite a lot larger (around 3500) than for DMD (around 650). But I noticed that LDC has a couple of additional TLS symbols that DMD doesn't have. If I recall correctly, they looked like they were related to exception handling.

Any ideas what could be wrong, or suggestions for how to debug this further?

[1] https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L401

[2] https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L410

[3] https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L388

[4] https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L372

[5] https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L413

[6] https://github.com/ldc-developers/druntime/blob/ldc/src/rt/sections_ldc.d#L432

-- 
/Jacob Carlborg
January 08, 2016
On 8 Jan 2016, at 8:37, Jacob Carlborg via digitalmars-d-ldc wrote:
> I've been trying to debug this but I don't have much knowledge in this area. What I have found out is that "_chars" is included in the range returned by _d_dyld_getTLSRange [6]. I've been trying to debug the GC, and it looks like "_chars" is marked twice, before crashing. Or at least a range where "_chars" is included.

It's been a while since I initially looked into getting the TLS to work, but did you check that _chars is properly aligned (i.e. to 8 bytes on x86_64)? A misaligned variable is one way the GC could miss the pointer even though the global is contained in a root range.

If that's not it, I'd just continue trying to figure out which objects exactly are collected (not marked) and why.

> If I recall correctly, they looked like they were related to exception handling.

There is currently a per-thread cache for exception handling metadata, yes. It contains a subtle bug, though (related to moving fibers between threads), and will probably go away.

 — David
January 08, 2016
On 2016-01-08 16:32, David Nadlinger via digitalmars-d-ldc wrote:

> It's been a while since I initially looked into getting the TLS to work,
> but did you check that _chars is properly aligned (i.e. to 8 bytes on
> x86_64)? This would be one way how the GC could miss the pointer even
> though the global is contained in a root range.

That seemed to be the issue, it works now. Awesome :) thanks. A couple of follow-up questions:

* I'm looking at the assembly output of LDC, it looks like LDC aligns to the size of the type, i.e. "int" to 4 and "long" to 8 and so on, is that the case?

* It looks like it only uses the above form of alignment if the symbol is placed in the __thread_bss section, i.e. doesn't have an initializer. Does that make sense? If it has an initializer and is placed in the __thread_data section, it will have an alignment of 3 or 4, depending on the size of the variable.

-- 
/Jacob Carlborg
January 08, 2016
On 2016-01-08 17:40, Jacob Carlborg wrote:

Adding the assembly for convenience

> * I'm looking at the assembly output of LDC, it looks like LDC aligns
> to the size of the type, i.e. "int" to 4 and "long" to 8 and so on, is
> that the case?

Without initializer:

.tbss __D4main1ai$tlv$init, 4, 3

BTW, do you know what the above 3 is?

> * It looks like it only uses the above form of alignment if the symbol
> is placed in the __thread_bss section, i.e. doesn't have an initializer.
> Does that make sense? If it has an initializer and is placed in the
> __thread_data section, it will have an alignment of 3 or 4, depending on
> the size of the variable.

With initializer:

	.section	__DATA,__thread_data,thread_local_regular
	.align	3
__D4main1ai$tlv$init:
	.long	4

-- 
/Jacob Carlborg
January 09, 2016
Jacob Carlborg <doob@me.com> writes:

> On 2016-01-08 17:40, Jacob Carlborg wrote:
>
> Adding the assembly for convenience
>
>> * I'm looking at the assembly output of LDC, it looks like LDC aligns to the size of the type, i.e. "int" to 4 and "long" to 8 and so on, is that the case?
>
> Without initializer:
>
> .tbss __D4main1ai$tlv$init, 4, 3
>
> BTW, do you know what the above 3 is?

3 is alignment like .p2align (power of 2 alignment).
2^3 in this case (8-byte)

>> * It looks like it only uses the above form of alignment if the symbol is placed in the __thread_bss section, i.e. doesn't have an initializer. Does that make sense? If it has an initializer and is placed in the __thread_data section, it will have an alignment of 3 or 4, depending on the size of the variable.
>
> With initializer:
>
> 	.section	__DATA,__thread_data,thread_local_regular
> 	.align	3
> __D4main1ai$tlv$init:
> 	.long	4

Same 8-byte alignment (OSX .align is synonym for .p2align).

The tbss and tdata declarations match.
January 09, 2016
Dan Olson <gorox@comcast.net> writes:

> Jacob Carlborg <doob@me.com> writes:
>
>> On 2016-01-08 17:40, Jacob Carlborg wrote:
>>
>> Adding the assembly for convenience
>>
>>> * I'm looking at the assembly output of LDC, it looks like LDC aligns to the size of the type, i.e. "int" to 4 and "long" to 8 and so on, is that the case?
>>
>> Without initializer:
>>
>> .tbss __D4main1ai$tlv$init, 4, 3
>>
>> BTW, do you know what the above 3 is?
>
> 3 is alignment like .p2align (power of 2 alignment).
> 2^3 in this case (8-byte)
>
>>> * It looks like it only uses the above form of alignment if the symbol is placed in the __thread_bss section, i.e. doesn't have an initializer. Does that make sense? If it has an initializer and is placed in the __thread_data section, it will have an alignment of 3 or 4, depending on the size of the variable.
>>
>> With initializer:
>>
>> 	.section	__DATA,__thread_data,thread_local_regular
>> 	.align	3
>> __D4main1ai$tlv$init:
>> 	.long	4
>
> Same 8-byte alignment (OSX .align is synonym for .p2align).
>
> The tbss and tdata declarations match.

Just re-reading and it looks like alignments in your example are too big for a 4-byte type, assuming var is an int.  .align only needs to be 2 here.

$ cat tls.c
__thread int x;
__thread int y = 42;

$ clang -S tls.c
$ cat tls.s
	.section	__TEXT,__text,regular,pure_instructions
	.macosx_version_min 10, 10
	.section	__DATA,__thread_data,thread_local_regular
	.align	2                       ## @y
_y$tlv$init:
	.long	42                      ## 0x2a

	.section	__DATA,__thread_vars,thread_local_variables
	.globl	_y
_y:
	.quad	__tlv_bootstrap
	.quad	0
	.quad	_y$tlv$init

.tbss _x$tlv$init, 4, 2                 ## @x

	.globl	_x
_x:
	.quad	__tlv_bootstrap
	.quad	0
	.quad	_x$tlv$init


.subsections_via_symbols
January 09, 2016
On Saturday, 9 January 2016 at 20:07:34 UTC, Dan Olson wrote:
> Just re-reading and it looks like alignments in your example are too big for a 4-byte type, assuming var is an int.  .align only needs to be 2 here.

This is probably due to https://github.com/kinke/ldc/commit/a39997d326f0d3da353d8b9f27ffd559e6fcc5d7.
January 10, 2016
On 2016-01-09 20:48, Dan Olson wrote:

>> .tbss __D4main1ai$tlv$init, 4, 3
>>
>> BTW, do you know what the above 3 is?
>
> 3 is alignment like .p2align (power of 2 alignment).
> 2^3 in this case (8-byte)

I thought the four was the alignment. If the three is the alignment, then what is the four? The size of the variable?


> Same 8-byte alignment (OSX .align is synonym for .p2align).
>
> The tbss and tdata declarations match.

Ah, ok. If the second number (3) above is the alignment then it makes sense.

-- 
/Jacob Carlborg
January 10, 2016
On 2016-01-09 21:07, Dan Olson wrote:

> Just re-reading and it looks like alignments in your example are too big
> for a 4-byte type, assuming var is an int.  .align only needs to be 2 here.

The output was from LDC. I noticed that Clang and LDC behave differently.

-- 
/Jacob Carlborg
January 11, 2016
kinke <noone@nowhere.com> writes:

> On Saturday, 9 January 2016 at 20:07:34 UTC, Dan Olson wrote:
>> Just re-reading and it looks like alignments in your example are too big for a 4-byte type, assuming var is an int.  .align only needs to be 2 here.
>
> This is probably due to https://github.com/kinke/ldc/commit/a39997d326f0d3da353d8b9f27ffd559e6fcc5d7.

I haven't carefully read the commit yet.  Is the extra alignment intended for all variable declarations?  It probably is not a big issue, but the following:

ubyte a,b,c,d,e,f,g,h;

uses 64 bytes versus the 8 bytes from before.

-- 
Dan