March 18, 2012
Supporting emulated tls
I thought about supporting emulated tls a little. The GCC emutls.c
implementation currently can't work with the gc, as every TLS variable
is allocated individually and therefore we don't have a contiguous
memory region for the gc. I think these are the possible solutions:

* Try to fix GCC's emutls to allocate all tls memory for a module
 (application/shared object) at once. That's the best solution
 and native TLS works this way, but I'm not sure if we can extract
 enough information from the runtime linker to make this work (we
 need at least the combined size of all tls variables).

* Provide a callback in GCC's emutls which is called after every
 allocation. This could call GC.addRange for every variable, but I
 guess adding huge amounts of ranges is slow.

* Make it possible to register a custom allocator for GCC's emutls (not
 sure if possible, as this would have to be set up very early in
 application startup). Then allocate the memory directly from the GC
 (but this memory should only be scanned, not collected) 

* Replace the calls to malloc in emutls.c with a custom, region-based
 memory allocator. (This is not a perfect solution though: we can
 always end up needing more memory than the region holds.)

* Do not use GCC's emutls at all, roll a custom solution. This could be
 compatible with / based on dmd's tls emulation for OSX. Most of the
 implementation is in core.thread, all that's necessary is to group
 the tls data into a _tls_data_array and call ___tls_get_addr for
 every tls access. I'm not sure if this can be done in the
 'middle-end' though and it doesn't support shared libraries yet.
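
To illustrate the last option, here is a minimal C sketch of the idea. It is not dmd's or druntime's actual code. It assumes that all TLS variables of a module are laid out in one initial-value image (the role of the _tls_data_array) and that every thread gets a private copy of it, so the GC only needs one range per thread. The names tls_image, thread_tls and emu_tls_get_addr are made up for the example, and the per-thread state is passed in explicitly; a real runtime would look it up itself, for instance through its thread object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: every TLS variable of the module is a field of one
 * struct, so the initial values form a single contiguous image. */
struct tls_image {
    int    counter;
    double scale;
    void  *gc_pointer;   /* the kind of field the GC has to scan */
};

/* Initial values, playing the role of the _tls_data_array. */
static const struct tls_image tls_init_image = { 7, 1.5, NULL };

/* Per-thread state. A real runtime would keep this in its thread object. */
struct thread_tls {
    void  *block;  /* this thread's private copy of the image */
    size_t size;   /* length of the copy, i.e. the range to scan */
};

/* Called once when a thread starts: copy the image into fresh memory.
 * Returns 0 on success and -1 if the allocation fails. */
int thread_tls_init(struct thread_tls *t)
{
    t->size = sizeof tls_init_image;
    t->block = malloc(t->size);
    if (t->block == NULL)
        return -1;
    memcpy(t->block, &tls_init_image, t->size);
    return 0;
}

/* Stand-in for ___tls_get_addr: map an offset in the image to the
 * corresponding address in the given thread's copy. */
void *emu_tls_get_addr(const struct thread_tls *t, size_t offset)
{
    assert(offset < t->size);
    return (char *)t->block + offset;
}

/* Called when the thread exits. */
void thread_tls_destroy(struct thread_tls *t)
{
    free(t->block);
    t->block = NULL;
    t->size = 0;
}
```

With this layout the range [block, block + size) is the only thing that has to be handed to the GC for a thread. Shared libraries would each need their own image, which this sketch does not cover.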
March 18, 2012
Re: Supporting emulated tls
On 18 March 2012 11:32, Johannes Pfau <nospam@example.com> wrote:
> I thought about supporting emulated tls a little. The GCC emutls.c
> implementation currently can't work with the gc, as every TLS variable
> is allocated individually and therefore we don't have a contiguous
> memory region for the gc. I think these are the possible solutions:
>
> * Try to fix GCC's emutls to allocate all tls memory for a module
>  (application/shared object) at once. That's the best solution
>  and native TLS works this way, but I'm not sure if we can extract
>  enough information from the runtime linker to make this work (we
>  need at least the combined size of all tls variables).
>
> * Provide a callback in GCC's emutls which is called after every
>  allocation. This could call GC.addRange for every variable, but I
>  guess adding huge amounts of ranges is slow.
>

Painfully slow.


> * Make it possible to register a custom allocator for GCC's emutls (not
>  sure if possible, as this would have to be set up very early in
>  application startup). Then allocate the memory directly from the GC
>  (but this memory should only be scanned, not collected)
>
> * Replace the calls to malloc in emutls.c with a custom, region based
>  memory allocator. (This is not a perfect solution though, it can
>  always happen that we'll need more memory)
>
>
>
> * Do not use GCC's emutls at all, roll a custom solution. This could be
>  compatible with / based on dmd's tls emulation for OSX. Most of the
>  implementation is in core.thread, all that's necessary is to group
>  the tls data into a _tls_data_array and call ___tls_get_addr for
>  every tls access. I'm not sure if this can be done in the
>  'middle-end' though and it doesn't support shared libraries yet.
>

If we are going to fix TLS, I'd rather it be in the most platform
agnostic way possible, if it could be helped. That would mean also
scrapping the current implementation on Linux (just tries to mimic
what dmd does, and has corner cases where it doesn't always get it
right).




-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
March 18, 2012
Re: Supporting emulated tls
On 18-03-2012 12:32, Johannes Pfau wrote:
> I thought about supporting emulated tls a little. The GCC emutls.c
> implementation currently can't work with the gc, as every TLS variable
> is allocated individually and therefore we don't have a contiguous
> memory region for the gc. I think these are the possible solutions:
>
> * Try to fix GCC's emutls to allocate all tls memory for a module
>    (application/shared object) at once. That's the best solution
>    and native TLS works this way, but I'm not sure if we can extract
>    enough information from the runtime linker to make this work (we
>    need at least the combined size of all tls variables).
>
> * Provide a callback in GCC's emutls which is called after every
>    allocation. This could call GC.addRange for every variable, but I
>    guess adding huge amounts of ranges is slow.

We should avoid this if possible, yes. A small root set is desirable.

>
> * Make it possible to register a custom allocator for GCC's emutls (not
>    sure if possible, as this would have to be set up very early in
>    application startup). Then allocate the memory directly from the GC
>    (but this memory should only be scanned, not collected)

Such an allocator would probably just allocate a decently-sized memory 
block from libc and add it as a root range (rather than individual 
word-sized roots). The memory doesn't necessarily have to be allocated 
with the GC.
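
A minimal C sketch of such an allocator, under stated assumptions: gc_add_range is a hypothetical stand-in for GC.addRange (here it only counts calls), CHUNK_SIZE is an arbitrary choice, and there would be one struct region per thread. Nothing here is part of emutls.c. It also shows the limitation mentioned for the region-based option: when a chunk fills up, a second chunk and a second range are needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096u

/* Hypothetical stand-in for GC.addRange; here it only counts calls. */
static size_t ranges_registered;

void gc_add_range(void *start, size_t size)
{
    (void)start;
    (void)size;
    ranges_registered++;
}

struct region {
    char  *chunk;  /* current chunk, NULL until first use */
    size_t used;   /* bytes handed out from the current chunk */
};

/* Round n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Hand out zeroed storage for one TLS variable. A new chunk is taken from
 * libc, and registered as one range, only when the current one is full.
 * Alignment is relative to the chunk start, so it only holds for
 * alignments that calloc itself guarantees. */
void *region_alloc(struct region *r, size_t size, size_t align)
{
    size_t start;

    if (size == 0 || size > CHUNK_SIZE)
        return NULL;  /* this sketch does not handle oversized objects */
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;  /* not a power of two */

    start = round_up(r->used, align);
    if (r->chunk == NULL || start + size > CHUNK_SIZE) {
        char *chunk = calloc(1, CHUNK_SIZE);
        if (chunk == NULL)
            return NULL;
        gc_add_range(chunk, CHUNK_SIZE);
        r->chunk = chunk;   /* older chunks stay alive and registered */
        start = 0;
    }
    r->used = start + size;
    return r->chunk + start;
}
```

The number of ranges then grows with the number of chunks rather than with the number of TLS variables, which keeps the root set small.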

>
> * Replace the calls to malloc in emutls.c with a custom, region based
>    memory allocator. (This is not a perfect solution though, it can
>    always happen that we'll need more memory)
>
>
>
> * Do not use GCC's emutls at all, roll a custom solution. This could be
>    compatible with / based on dmd's tls emulation for OSX. Most of the
>    implementation is in core.thread, all that's necessary is to group
>    the tls data into a _tls_data_array and call ___tls_get_addr for
>    every tls access. I'm not sure if this can be done in the
>    'middle-end' though and it doesn't support shared libraries yet.
>


-- 
- Alex
March 18, 2012
Re: Supporting emulated tls
On Sun, 18 Mar 2012 12:21:51 +0000,
Iain Buclaw <ibuclaw@ubuntu.com> wrote:

> On 18 March 2012 11:32, Johannes Pfau <nospam@example.com> wrote:
> > I thought about supporting emulated tls a little. The GCC emutls.c
> > implementation currently can't work with the gc, as every TLS
> > variable is allocated individually and therefore we don't have a
> > contiguous memory region for the gc. I think these are the possible
> > solutions:
> >
> > * Try to fix GCC's emutls to allocate all tls memory for a module
> >  (application/shared object) at once. That's the best solution
> >  and native TLS works this way, but I'm not sure if we can extract
> >  enough information from the runtime linker to make this work (we
> >  need at least the combined size of all tls variables).
> >
> > * Provide a callback in GCC's emutls which is called after every
> >  allocation. This could call GC.addRange for every variable, but I
> >  guess adding huge amounts of ranges is slow.
> >
> 
> Painfully slow.
> 
> 
> > * Make it possible to register a custom allocator for GCC's emutls
> > (not sure if possible, as this would have to be set up very early in
> >  application startup). Then allocate the memory directly from the GC
> >  (but this memory should only be scanned, not collected)
> >
> > * Replace the calls to malloc in emutls.c with a custom, region
> > based memory allocator. (This is not a perfect solution though, it
> > can always happen that we'll need more memory)
> >
> >
> >
> > * Do not use GCC's emutls at all, roll a custom solution. This
> > could be compatible with / based on dmd's tls emulation for OSX.
> > Most of the implementation is in core.thread, all that's necessary
> > is to group the tls data into a _tls_data_array and call
> > ___tls_get_addr for every tls access. I'm not sure if this can be
> > done in the 'middle-end' though and it doesn't support shared
> > libraries yet.
> >
> 
> If we are going to fix TLS, I'd rather it be in the most platform
> agnostic way possible, if it could be helped. That would mean also
> scrapping the current implementation on Linux (just tries to mimic
> what dmd does, and has corner cases where it doesn't always get it
> right).

You mean getting rid of __tls_beg and __tls_end? I'd also like to
remove those, but:

TLS is mostly object-format specific (not as much OS specific). The ELF
implementation lays out the TLS data for a module (module = shared
library or the application) in a contiguous way. The details are
described in "ELF Handling For Thread-Local
Storage" (www.akkadia.org/drepper/tls.pdf).

The GC requires the TLS blocks to be contiguous. That is not the case
for GCC's emulated TLS, which is what causes the issues there.

For native TLS/ELF this requirement is met, but the GC also has to know
the start and the size of the TLS sections. Although the runtime
linker has this information, there's no standard way to access it. So
we could:

* Add a custom extension API to the C libraries. We'd need at least: a
 'tls_range dl_get_tls_range(void *handle)' function related to the
 dl* set of functions in the runtime linker, and a 'tls_range
 dl_get_tls_range2(struct dl_phdr_info *info)' to be used with
 dl_iterate_phdr. We also need some way to get the tls range for the
 application, 'get_app_tls_range' (although some libcs also return
 the application module in dl_iterate_phdr).

This seems to be the best way, but we'd have to patch every C library
and it would take some time till those updated C libraries are widely
deployed.

The other solution is to hook directly into each C library's non-public
(and maybe non-stable!) API. For example, the structure returned by BSD
libc's dl_iterate_phdr and dlopen has these fields:

int tlsindex;		/* Index in DTV for this module */
void *tlsinit;		/* Base address of TLS init block */
size_t tlsinitsize;	/* Size of TLS init block for this module */
size_t tlssize;	/* Size of TLS block for this module */
size_t tlsoffset;	/* Offset of static TLS block for this module */
size_t tlsalign;	/* Alignment of static TLS block */

tlsindex gives us the start-address of the TLS for every thread, as
long as we know how to compute the TLS address from the TP (thread
pointer) and the dtv index (there are basically 2 methods, described in
"ELF Handling For Thread-Local Storage") and tlssize gives us the size.


However, there doesn't seem to be a painless way to do this...
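
For comparison, a hedged sketch of what the non-standard route can look like with glibc's public dl_iterate_phdr. As far as I know, glibc's struct dl_phdr_info carries a dlpi_tls_data field with the address of the calling thread's TLS block for each module, and the size can be read from the module's PT_TLS program header. That this works the same way on other C libraries is an assumption. The names tls_range, tls_scan, scan_tls_ranges and tls_probe are made up for the example.

```c
#define _GNU_SOURCE   /* dl_iterate_phdr is a non-standard extension */
#include <assert.h>
#include <link.h>
#include <stddef.h>

#define MAX_TLS_RANGES 64

struct tls_range {
    void  *start;  /* TLS block of the calling thread for this module */
    size_t size;   /* p_memsz of the module's PT_TLS segment */
};

struct tls_scan {
    struct tls_range ranges[MAX_TLS_RANGES];
    size_t count;
};

/* A variable that forces the main program to have a PT_TLS segment. */
__thread int tls_probe = 42;

/* Callback invoked by dl_iterate_phdr once per loaded module. */
static int collect_tls(struct dl_phdr_info *info, size_t size, void *data)
{
    struct tls_scan *scan = data;
    size_t i;

    /* Older C libraries pass a shorter struct without the TLS fields. */
    if (size < offsetof(struct dl_phdr_info, dlpi_tls_data) + sizeof(void *))
        return 0;

    for (i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_TLS || info->dlpi_tls_data == NULL)
            continue;   /* no TLS, or not yet allocated for this thread */
        if (scan->count < MAX_TLS_RANGES) {
            scan->ranges[scan->count].start = info->dlpi_tls_data;
            scan->ranges[scan->count].size  = ph->p_memsz;
            scan->count++;
        }
    }
    return 0;   /* zero means: keep iterating over the remaining modules */
}

/* Fill scan with the TLS ranges visible to the calling thread. */
void scan_tls_ranges(struct tls_scan *scan)
{
    scan->count = 0;
    dl_iterate_phdr(collect_tls, scan);
}
```

The result only describes the thread that calls scan_tls_ranges, so a runtime would have to run it in every thread, and again after each dlopen. That is part of why this does not count as painless.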
March 18, 2012
Re: Supporting emulated tls
On 2012-03-18 12:32, Johannes Pfau wrote:
> I thought about supporting emulated tls a little. The GCC emutls.c
> implementation currently can't work with the gc, as every TLS variable
> is allocated individually and therefore we don't have a contiguous
> memory region for the gc. I think these are the possible solutions:

Why not use the native TLS implementation when available and roll our 
own, like DMD on Mac OS X, when none exists?

BTW, I think it would be possible to emulate TLS in a very similar way 
to how it's implemented natively for ELF.

-- 
/Jacob Carlborg
March 18, 2012
Re: Supporting emulated tls
On 2012-03-18 19:39, Johannes Pfau wrote:

> You mean getting rid of __tls_beg and __tls_end? I'd also like to
> remove those, but:

__tls_beg and __tls_end are not used on Mac OS X any more:

https://github.com/D-Programming-Language/druntime/commit/73cf2c150665cb17d9365a6e3d6cf144d76312d6

https://github.com/D-Programming-Language/dmd/commit/054c525edba048ad7829dd5ec2d8d9261a6517c3

> TLS is mostly object-format specific (not as much OS specific). The ELF
> implementation lays out the TLS data for a module (module = shared
> library or the application) in a contiguous way. The details are
> described in "ELF Handling For Thread-Local
> Storage" (www.akkadia.org/drepper/tls.pdf).
>

Mac OS X 10.7+ supports TLS natively, but I don't know where to find
documentation about it. It's always possible to look at the source code.

-- 
/Jacob Carlborg
March 19, 2012
Re: Supporting emulated tls
On Sun, 18 Mar 2012 21:57:57 +0100,
Jacob Carlborg <doob@me.com> wrote:

> On 2012-03-18 12:32, Johannes Pfau wrote:
> > I thought about supporting emulated tls a little. The GCC emutls.c
> > implementation currently can't work with the gc, as every TLS
> > variable is allocated individually and therefore we don't have a
> > contiguous memory region for the gc. I think these are the possible
> > solutions:
> 
> Why not use the native TLS implementation when available and roll our 
> own, like DMD on Mac OS X, when none exists?

That's what we (mostly) do right now. We have 2 issues:

* Our own, emulated TLS support is implemented in GCC. This means it's
 also used in C, which is great. Also GCC's emulated tls needs
 absolutely no special features in the runtime linker, compile time
 linker or language frontends. It's very portable and works with all
 weird combinations of dynamic libraries, dlopen, etc.
 But it has one quirk: It doesn't allocate TLS memory in a contiguous
 way, every tls variable is allocated using malloc. This means we
 can't pass a range to the GC for the tls variables. So we can't
 support this emutls in the GC.

* The other issue with native TLS is that using bracketing with
 __tls_beg and __tls_end has corner cases where it doesn't work. We'd
 need an alternative to locate the TLS memory addresses and TLS sizes.
 But there's no standard or public API to do that.

> BTW, I think it would be possible to emulate TLS in a very similar
> way to how it's implemented natively for ELF.
> 

I don't think it's that easy. For example, how would you assign module
ids? For native TLS this is partially done by the compile time linker
(for the main application and libraries that are always loaded), but if
no native TLS is available, we can't rely on the linker to do that. We
also need some way to get the current module id in running code.

And how do we get the TLS initialization data? If we placed it into an
array, like DMD does on OSX, we could use dlsym for dlopened libraries,
but what about initially loaded libraries?

Say you have application 'app', which depends on 'liba' and 'libb'. All
of these have TLS data. Maybe we could implement something using
dl_iterate_phdr, but that's a nonstandard extension.

Compare that to GCC's emulation, which is probably slow, but 'just
works' everywhere (except for the GC :-( ).
March 19, 2012
Re: Supporting emulated tls
On Sun, 18 Mar 2012 22:06:41 +0100,
Jacob Carlborg <doob@me.com> wrote:

> On 2012-03-18 19:39, Johannes Pfau wrote:
> 
> > You mean getting rid of __tls_beg and __tls_end? I'd also like to
> > remove those, but:
> 
> __tls_beg and __tls_end are not used on Mac OS X any more:
> 
> https://github.com/D-Programming-Language/druntime/commit/73cf2c150665cb17d9365a6e3d6cf144d76312d6
> 
> https://github.com/D-Programming-Language/dmd/commit/054c525edba048ad7829dd5ec2d8d9261a6517c3

Yes, but OSX still uses emulated tls. With the way dmd emulates TLS
it's possible to remove __tls_beg and __tls_end, but for native TLS
those symbols are still needed. However, as the runtime linker (ld.so)
has got the necessary information, it's possible that OSX even offers an
API to access it. It's just that most C libraries don't provide a way to
get the TLS segment sizes and the (per thread) addresses of the TLS
blocks.

> > TLS is mostly object-format specific (not as much OS specific). The
> > ELF implementation lays out the TLS data for a module (module =
> > shared library or the application) in a contiguous way. The details
> > are described in "ELF Handling For Thread-Local
> > Storage" (www.akkadia.org/drepper/tls.pdf).
> >
> 
> Mac OS X 10.7+ supports TLS natively. But I don't know where to find
> documentation about it. It's always possible to look at the source code.
> 

Then it's probably already supported by GCC/GDC. But having working
emulated TLS would be nice for many other architectures. Native TLS is
not that widespread.
March 19, 2012
Re: Supporting emulated tls
On 19 March 2012 08:15, Johannes Pfau <nospam@example.com> wrote:
> On Sun, 18 Mar 2012 21:57:57 +0100,
> Jacob Carlborg <doob@me.com> wrote:
>
>> On 2012-03-18 12:32, Johannes Pfau wrote:
>> > I thought about supporting emulated tls a little. The GCC emutls.c
>> > implementation currently can't work with the gc, as every TLS
>> > variable is allocated individually and therefore we don't have a
>> > contiguous memory region for the gc. I think these are the possible
>> > solutions:
>>
>> Why not use the native TLS implementation when available and roll our
>> own, like DMD on Mac OS X, when none exists?
>
> That's what we (mostly) do right now. We have 2 issues:
>
> * Our own, emulated TLS support is implemented in GCC. This means it's
>  also used in C, which is great. Also GCC's emulated tls needs
>  absolutely no special features in the runtime linker, compile time
>  linker or language frontends. It's very portable and works with all
>  weird combinations of dynamic libraries, dlopen, etc.
>  But it has one quirk: It doesn't allocate TLS memory in a contiguous
>  way, every tls variable is allocated using malloc. This means we
>  can't pass a range to the GC for the tls variables. So we can't
>  support this emutls in the GC.
>

As far as my thought process goes, the only way (implementable in the GDC
frontend) to force a contiguous layout of all TLS symbols is to pack
them up ourselves into a struct that is accessible via a single global
module-level variable.  Then, in the .ctor section, the module adds this
range to the GC.  This should be enough for it to work with shared
libraries too, however I'm sure there are quite a few details I am
missing here that would block this from working. :)
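
A rough C sketch of this packing idea, assuming GCC or Clang for the constructor attribute. register_gc_range is a hypothetical stand-in for GC.addRange, and struct module_tls is what the frontend would have to generate; neither exists in GDC as written here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GC.addRange: remembers the last range. */
static void  *last_range_start;
static size_t last_range_size;
static size_t registered_count;

void register_gc_range(void *start, size_t size)
{
    last_range_start = start;
    last_range_size = size;
    registered_count++;
}

/* What the frontend would generate: all TLS variables of one module become
 * fields of a single struct. */
struct module_tls {
    int   call_count;
    void *cache;       /* may point into GC memory, so it must be scanned */
    char  name[16];
};

/* One thread-local object per module. With emulated TLS this is a single
 * allocation per thread, so the fields are contiguous by construction. */
static __thread struct module_tls this_module_tls = { 0, NULL, "demo" };

/* Runs before main (or when the shared library is loaded) and registers
 * the block of the thread that loads the module. */
__attribute__((constructor))
static void module_tls_register(void)
{
    register_gc_range(&this_module_tls, sizeof this_module_tls);
}

/* A TLS access is then an ordinary field access. */
int bump_call_count(void)
{
    return ++this_module_tls.call_count;
}
```

One of the missing details is visible here: the constructor only runs in the thread that loads the module, so every other thread would have to register its own copy when it starts.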


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
March 19, 2012
Re: Supporting emulated tls
On 2012-03-19 09:15, Johannes Pfau wrote:
> On Sun, 18 Mar 2012 21:57:57 +0100,
> Jacob Carlborg <doob@me.com> wrote:
>
>> On 2012-03-18 12:32, Johannes Pfau wrote:
>>> I thought about supporting emulated tls a little. The GCC emutls.c
>>> implementation currently can't work with the gc, as every TLS
>>> variable is allocated individually and therefore we don't have a
>>> contiguous memory region for the gc. I think these are the possible
>>> solutions:
>>
>> Why not use the native TLS implementation when available and roll our
>> own, like DMD on Mac OS X, when none exists?
>
> That's what we (mostly) do right now. We have 2 issues:
>
> * Our own, emulated TLS support is implemented in GCC. This means it's
>    also used in C, which is great. Also GCC's emulated tls needs
>    absolutely no special features in the runtime linker, compile time
>    linker or language frontends. It's very portable and works with all
>    weird combinations of dynamic libraries, dlopen, etc.
>    But it has one quirk: It doesn't allocate TLS memory in a contiguous
>    way, every tls variable is allocated using malloc. This means we
>    can't pass a range to the GC for the tls variables. So we can't
>    support this emutls in the GC.

Ok, I see.

> * The other issue with native TLS is that using bracketing with
>    __tls_beg and __tls_end has corner cases where it doesn't work. We'd
>    need an alternative to locate the TLS memory addresses and TLS sizes.
>    But there's no standard or public API to do that.

On Mac OS X they are actually not needed. Don't know about other platforms.

>> BTW, I think it would be possible to emulate TLS in a very similar
>> way to how it's implemented natively for ELF.
>>
>
> I don't think it's that easy. For example, how would you assign module
> ids? For native TLS this is partially done by the compile time linker
> (for the main application and libraries that are always loaded), but if
> no native TLS is available, we can't rely on the linker to do that. We
> also need some way to get the current module id in running code.

As I understand it, the native ELF implementation uses assembly to
access the current module id. This describes it for FreeBSD:

http://people.freebsd.org/~marcel/tls.html

This is how ___tls_get_addr is implemented on FreeBSD ELF i386:

https://bitbucket.org/freebsd/freebsd-head/src/4e8f50fe2f05/libexec/rtld-elf/i386/reloc.c#cl-355

> And how do we get the TLS initialization data? If we placed it into an
> array, like DMD does on OSX, we could use dlsym for dlopened libraries,
> but what about initially loaded libraries?

In the same way it's done in the native implementation. Isn't it 
possible to access all loaded libraries?

> Say you have application 'app', which depends on 'liba' and 'libb'. All
> of these have TLS data. Maybe we could implement something using
> dl_iterate_phdr, but that's a nonstandard extension.

Ok. Mac OS X has a function called
"_dyld_register_func_for_add_image"; I guess other OSes don't have a
corresponding function? In general, all this stuff is very low level and
nonstandard.

https://developer.apple.com/library/mac/#documentation/developertools/Reference/MachOReference/Reference/reference.html#jumpTo_53

> Compare that to GCC's emulation, which is probably slow, but 'just
> works' everywhere (except for the GC :-( ).

Yeah, that's a big advantage.

In general I was hoping that the work done by the dynamic loader to
set up TLS could be moved to druntime.
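
To make that concrete, a small C sketch of an ELF-like scheme done in the runtime instead of the loader. It assumes module ids are simply handed out in registration order and that each thread keeps a vector of lazily created blocks, one per module. The (module, offset) pair mirrors the argument that the native __tls_get_addr receives; everything else (tls_register_module, struct dtv, dtv_get_addr, MAX_MODULES) is made up for the example and is not druntime code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MODULES 8

/* Registered once per module (application or shared library). */
struct tls_module {
    const void *init_image;  /* initial values of the module's TLS data */
    size_t      size;        /* size of the module's TLS block */
};

static struct tls_module modules[MAX_MODULES];
static size_t module_count;

/* Per-thread vector with one lazily allocated block per module. */
struct dtv {
    void *block[MAX_MODULES];
};

/* The (module, offset) pair that identifies one TLS variable. */
struct tls_index {
    size_t module;
    size_t offset;
};

/* Assign the next free id to a module; returns (size_t)-1 when full. */
size_t tls_register_module(const void *init_image, size_t size)
{
    if (module_count == MAX_MODULES)
        return (size_t)-1;
    modules[module_count].init_image = init_image;
    modules[module_count].size = size;
    return module_count++;
}

/* Return the address of a variable in the given thread's copy, creating the
 * module's block on first use. Returns NULL on bad input or out of memory. */
void *dtv_get_addr(struct dtv *dtv, const struct tls_index *ti)
{
    const struct tls_module *m;

    if (ti->module >= module_count || ti->offset >= modules[ti->module].size)
        return NULL;
    m = &modules[ti->module];
    if (dtv->block[ti->module] == NULL) {
        void *block = malloc(m->size);
        if (block == NULL)
            return NULL;
        memcpy(block, m->init_image, m->size);
        dtv->block[ti->module] = block;   /* one contiguous range per module */
    }
    return (char *)dtv->block[ti->module] + ti->offset;
}
```

The open questions from the thread remain: who calls tls_register_module for libraries that are loaded at startup, and how generated code learns its own module id without help from the linker.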

-- 
/Jacob Carlborg