January 03, 2012
On Tue, 03 Jan 2012 18:27:56 +0100, Sean Kelly <sean@invisibleduck.org> wrote:

> The trick seems to be mapping in TLS (on OSX anyway) and running static ctors at the right time. Are there other issues as well?
>
I was hoping to hook thread-local module ctors into TLS initialization,
which is already done lazily, but the semantics of 'static this()'
allow running arbitrary code, so the right time currently is before any
code/data from that library can be accessed by this particular thread.
This makes it necessary to initialize all library dependencies as well.

Implementing dynamic TLS support for OSX might lead to some useful findings.
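
The ordering constraint described above (run a library's thread-local ctors, and those of its dependencies, before the current thread touches it) can be sketched roughly as follows. This is only an illustration in C++, not druntime code: `Library`, `ensureThreadInit`, `recordInit` and `initOrder` are hypothetical names, and a real implementation would have to obtain the dependency list from the runtime linker, which is exactly the information libc/libdl does not hand out.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical description of a loaded library (not a druntime type).
struct Library {
    std::string name;
    std::vector<Library*> deps;    // direct dependencies
    void (*threadCtor)(Library&);  // stands in for thread-local 'static this()'
};

// Per-thread log of the order in which thread ctors ran (illustration only).
std::vector<std::string>& initOrder() {
    thread_local std::vector<std::string> order;
    return order;
}

// Example thread ctor: just records that it ran for this library.
void recordInit(Library& lib) { initOrder().push_back(lib.name); }

// Run the thread ctors of 'lib' and of everything it depends on,
// at most once per calling thread, dependencies first.
void ensureThreadInit(Library& lib) {
    // One set per thread: which libraries this thread has already handled.
    thread_local std::unordered_set<const Library*> done;
    // Insert before recursing so that dependency cycles terminate.
    if (!done.insert(&lib).second) return;
    for (Library* dep : lib.deps) ensureThreadInit(*dep);
    if (lib.threadCtor) lib.threadCtor(lib);
}
```

Calling `ensureThreadInit(plugin)` from a second thread would repeat the walk for that thread only, which is the per-thread work that a plain image-load callback never gets a chance to do.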

> On Jan 3, 2012, at 8:53 AM, "Martin Nowak" <dawg@dawgfoto.de> wrote:
>
>> On Tue, 03 Jan 2012 08:20:38 +0100, Jacob Carlborg <doob@me.com> wrote:
>>
>>> On 2012-01-02 21:57, Martin Nowak wrote:
>>>> On Mon, 02 Jan 2012 20:38:50 +0100, Jacob Carlborg <doob@me.com> wrote:
>>>>
>>>>> On 2012-01-02 20:20, Martin Nowak wrote:
>>>>>> I think that I'll defer the support for runtime loading of shared
>>>>>> libraries (plugins)
>>>>>> in favor of getting linked shared library support done now.
>>>>>> There are several issues that require more thought.
>>>>>>
>>>>>> - Per-thread initialization of modules is somewhat tricky.
>>>>>> Doing it in Runtime.loadLibrary requires knowledge of shared library
>>>>>> dependencies
>>>>>> because different threads might share dependencies but this is not
>>>>>> provided by libc/libdl.
>>>>>>
>>>>>> - Libraries must not be unloaded as long as GC-managed class
>>>>>> instances still exist because
>>>>>> finalization fails otherwise.
>>>>>>
>>>>>> - Getting symbols through mangled names is difficult/unstable.
>>>>>>
>>>>>> - D libraries used by a C library should provide proper runtime
>>>>>> initialization
>>>>>> even if the C library is used by a D application.
>>>>>>
>>>>>> Any ideas or use-cases for plugins are welcome.
>>>>>>
>>>>>> martin
>>>>>
>>>>>
>>>>> - Initializing module infos
>>>>> - Initializing exception handling tables
>>>>> - Running module constructors
>>>>> - Initializing TLS
>>>>>
>>>>> Then also unload all this when the library is unloaded.
>>>>>
>>>> It seems that libraries can't be unloaded deterministically,
>>>> because GC finalization still references them.
>>>>
>>>>> On Mac OS X, can't "_dyld_register_func_for_add_image" be used? Then
>>>>> it will work, hopefully, transparently for the user. D libraries used
>>>>> by C wouldn't need any different handling. Because they will be linked
>>>>> with druntime it can initialize everything with the help of
>>>>> "_dyld_register_func_for_add_image".
>>>>>
>>>>
>>>> That was the approach I took and it is partly a dead-end.
>>>>
>>>> I have a mechanism similar to _dyld_register_func_for_add_image
>>>> but runtime loaders have no notion of per-thread initialization,
>>>> i.e. when two threads load the same library only the first one will
>>>> actually cause the image to be loaded.
>>>> This implies that the second thread would need to check all
>>>> dependencies of the loaded library to do the initialization.
>>>> I've written something along these lines but it requires
>>>> exploiting/rewriting part of the runtime linker.
>>>> Using dlmopen on Linux would be a terribly inefficient hack
>>>> around this issue; it allows loading libraries multiple times.
>>>
>>> I'm not quite sure I understand. Most of the things that should be done, initializing module infos and so on, should only be done once.
>>>
>> Yes most, but not all.
>> The core issue here is that C++'s __thread doesn't allow dynamic initializers,
>> thus there is no infrastructure to do such things. And really a clean approach
>> would be to extend libc/ld.so.
January 03, 2012
On Tue, 03 Jan 2012 18:44:27 +0100, Jacob Carlborg <doob@me.com> wrote:

> On 2012-01-03 17:53, Martin Nowak wrote:
>> Yes most, but not all.
>> The core issue here is that C++'s __thread doesn't allow dynamic
>> initializers,
>> thus there is no infrastructure to do such things. And really a clean
>> approach
>> would be to extend libc/ld.so.
>
> First, __thread isn't supported in Mac OS X (if we're talking about that). For all the operating systems that do support TLS I'm pretty sure that TLS and dynamic libraries work.
>
> This documentation:
>
> http://www.akkadia.org/drepper/tls.pdf
>
> mentions several different TLS modes, some used for dynamic libraries and some used for other cases.
>

C++ takes a very simple approach here.
You can have
    __thread int a = 3;
but not
    __thread int a = geta();
    error: 'a' is thread-local and so cannot be dynamically initialized
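
For comparison, a minimal sketch assuming a C++11 compiler: the standard `thread_local` keyword, unlike the `__thread` extension shown above, does accept a dynamic initializer, and the initializer runs once per thread. `geta` is the function from the example above; `getaCalls` and `readA` are names made up here for illustration.

```cpp
#include <cassert>

// Counts how often the initializer ran (illustration only, not thread-safe).
static int getaCalls = 0;

int geta() {
    ++getaCalls;
    return 3;
}

// Rejected, as quoted above:
//     __thread int a = geta();
// Accepted in C++11: the initializer runs once in each thread,
// before that thread's first use of 'a'.
thread_local int a = geta();

// Reading 'a' through a function makes the first use explicit.
int readA() { return a; }
```

How the compiler arranges for the initializer to run (typically a guard check on access) is an implementation detail and is not provided by the runtime linker.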
January 03, 2012
On 2012-01-03 18:27, Sean Kelly wrote:
> The trick seems to be mapping in TLS (on OSX anyway) and running static ctors at the right time. Are there other issues as well?

TLS has at least been an issue to me when trying to implement support for dynamic libraries on Mac OS X. I have no idea how the ___tls_get_addr function should be implemented to support dynamic libraries, especially not since we're rolling our own implementation and there's no documentation to follow.

I don't think there's any problems with the static constructors.

I've posted about my problems before at the DMD internals mailing list:

http://dfeed.kimsufi.thecybershadow.net/discussion/post/F2166CBA-4F77-49F7-9949-9F3666C12840@me.com

-- 
/Jacob Carlborg
January 03, 2012
On 2012-01-03 18:51, Martin Nowak wrote:
> C++ takes a very simple approach here.
> You can have
> __thread int a = 3;
> but not
> __thread int a = geta();
> error: 'a' is thread-local and so cannot be dynamically initialized

Oh, you mean dynamic like that. I thought you meant accessing "a" from a dynamic library.

-- 
/Jacob Carlborg
January 03, 2012
On Tue, 03 Jan 2012 18:56:56 +0100, Jacob Carlborg <doob@me.com> wrote:

> On 2012-01-03 18:27, Sean Kelly wrote:
>> The trick seems to be mapping in TLS (on OSX anyway) and running static ctors at the right time. Are there other issues as well?
>
> TLS has at least been an issue to me when trying to implement support for dynamic libraries on Mac OS X. I have no idea how the ___tls_get_addr function should be implemented to support dynamic libraries, especially not since we're rolling our own implementation and there's no documentation to follow.
>
> I don't think there's any problems with the static constructors.
>
> I've posted about my problems before at the DMD internals mailing list:
>
> http://dfeed.kimsufi.thecybershadow.net/discussion/post/F2166CBA-4F77-49F7-9949-9F3666C12840@me.com
>

Without support from the OS (TLS segment register)
or from the linker (module index as dynamic TLS relocations),
any implementation will be severely flawed or inefficient.

As a workaround you could determine the library that called
__tls_get_addr from the return address, e.g. through __builtin_return_address(0),
and search for it among the .text ranges of all loaded libraries.
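
A rough sketch of that lookup, under stated assumptions: it uses `dl_iterate_phdr`, which exists on Linux/glibc but not on OS X (there one would have to walk the dyld image list instead), and it relies on the GCC/Clang builtin `__builtin_return_address`. `ImageQuery`, `findImageContaining` and `callerImage` are hypothetical names introduced for this example.

```cpp
#include <link.h>   // dl_iterate_phdr, dl_phdr_info, ElfW (glibc)
#include <cassert>
#include <cstddef>
#include <cstdint>

// Search state handed to the callback (hypothetical helper type).
struct ImageQuery {
    std::uintptr_t addr;  // address to look up
    const char*    name;  // dlpi_name of the containing image, if found
    bool           found;
};

// Called once per loaded image; checks the executable PT_LOAD segments.
static int findImageCallback(dl_phdr_info* info, std::size_t, void* data) {
    ImageQuery* q = static_cast<ImageQuery*>(data);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type != PT_LOAD || !(ph.p_flags & PF_X)) continue;
        std::uintptr_t begin = info->dlpi_addr + ph.p_vaddr;
        std::uintptr_t end   = begin + ph.p_memsz;
        if (q->addr >= begin && q->addr < end) {
            q->name  = info->dlpi_name;  // "" for the main executable
            q->found = true;
            return 1;                    // non-zero stops the iteration
        }
    }
    return 0;
}

// Returns true if 'p' lies inside the code of some loaded image.
bool findImageContaining(const void* p, const char** nameOut) {
    ImageQuery q = { reinterpret_cast<std::uintptr_t>(p), nullptr, false };
    dl_iterate_phdr(findImageCallback, &q);
    if (q.found && nameOut) *nameOut = q.name;
    return q.found;
}

// Identifies the image of whoever called this function.
// noinline keeps the return address inside the real caller.
__attribute__((noinline)) bool callerImage(const char** nameOut) {
    return findImageContaining(__builtin_return_address(0), nameOut);
}
```

A linear walk over all images on every TLS access is the kind of cost that makes this a workaround rather than a solution; a real implementation would at least cache the result per call site or per image.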
January 03, 2012
__thread is supported under Lion via Clang.

January 03, 2012
Shouldn't be terrible then. Have a routine in the lib that returns a reference to whatever, and have the library map it in. Unloading would be tricky though, for the reasons you mention. Probably possible though by copying the stuff to be mapped in into GCed memory. Possibly even simply have the GC track that memory in a way similar to how Andrei suggested we handle mmap.

January 03, 2012
I thought C++0x allowed TLS class instances?

January 03, 2012
On Tue, 03 Jan 2012 22:49:57 +0100, Sean Kelly <sean@invisibleduck.org> wrote:

> Shouldn't be terrible then. Have a routine in the lib that returns a reference to whatever, and have library map it in. Unloading would be tricky though, for the reasons you mention. Probably possible though by copying the stuff to be mapped in into GCed memory.
This is a bad solution: it would require relocating all classinfo
pointers at runtime and, even worse, moving class initializers into a writable
segment, thus reducing process memory sharing.

> Possibly even simply have the GC track that memory in
> a way similar to how Andrei suggested we handle mmap.
>
What exactly does he suggest?
But extending the GC seems like a feasible way.
This could be done by a very general interface of the garbage collector.

GC.trackRange(void* p, size_t sz, void function(void* p) finalizer);

OTOH it will be difficult w.r.t. performance.
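
To make the idea concrete, here is a minimal sketch of what such a range-tracking interface might look like, written in C++ rather than D and not tied to any actual druntime GC API. Everything in it is an assumption for illustration: `RangeTracker`, `mark`, `sweep`, `countingFinalizer` and `finalizedCount` are hypothetical, and the collector's marking phase is reduced to explicit `mark` calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

using Finalizer = void (*)(void*);

// Hypothetical registry of non-GC memory ranges the collector should watch.
class RangeTracker {
    struct Range { std::size_t size; Finalizer fin; bool marked; };
    std::map<std::uintptr_t, Range> ranges_;  // keyed by start address

public:
    // Counterpart of the proposed GC.trackRange(p, sz, finalizer).
    void trackRange(void* p, std::size_t sz, Finalizer fin) {
        ranges_[reinterpret_cast<std::uintptr_t>(p)] = Range{sz, fin, false};
    }

    // Called by the marking phase for each candidate pointer it finds.
    // Returns true if the pointer falls into a tracked range.
    bool mark(const void* ptr) {
        std::uintptr_t a = reinterpret_cast<std::uintptr_t>(ptr);
        auto it = ranges_.upper_bound(a);     // first range starting after a
        if (it == ranges_.begin()) return false;
        --it;                                 // last range starting at or before a
        if (a - it->first >= it->second.size) return false;
        it->second.marked = true;
        return true;
    }

    // After marking: finalize and drop every range nobody referenced,
    // and clear the marks of the survivors. Returns the number finalized.
    std::size_t sweep() {
        std::size_t n = 0;
        for (auto it = ranges_.begin(); it != ranges_.end();) {
            if (it->second.marked) {
                it->second.marked = false;
                ++it;
            } else {
                if (it->second.fin) it->second.fin(reinterpret_cast<void*>(it->first));
                it = ranges_.erase(it);
                ++n;
            }
        }
        return n;
    }
};

// Demonstration finalizer: counts how many ranges were released.
static int finalizedCount = 0;
void countingFinalizer(void*) { ++finalizedCount; }
```

For a shared library the finalizer would be the place to unload the image once no pointer into it survives a collection. The performance concern shows up in `mark`: every candidate pointer costs an extra ordered lookup, here logarithmic in the number of tracked ranges.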

January 03, 2012
That's roughly what he suggested. It's in an old thread, either here or the druntime mailing list. The idea was to have the GC be in charge of releasing even non-GC memory to ensure that no dangling reference issues exist, IIRC.
