Thread overview
Segmentation fault in runTlsDtors
Jun 25, 2021
Ali Çehreli
Jun 25, 2021
rikki cattermole
Jun 26, 2021
Ali Çehreli
Jul 01, 2021
Max Samukha
Jul 01, 2021
Ali Çehreli
June 25, 2021
I need your help with sporadic segfaults.

Players:

* dmd 2.096 (but I've seen similar issues in the past with earlier versions as well)

* A D library with extern(C) functions that call rt_init() and rt_term(), which I think are needed for the library's use with Python

* A D program that uses said library (would calling rt_init() and rt_term() cause harm in this case?) (Using the library with Python works fine.)


The segfault happens when the program is shutting down. Here is a stack trace from a core dump:

[Current thread is 1 (Thread 0x7fb1ef95e700 (LWP 20010))]
(gdb) bt
#0 0x00007fb240a1c698 in _D2rt5minfo__T17runModuleFuncsRevSQBgQBg11ModuleGroup11runTlsDtorsMFZ9__lambda1ZQCoMFAxPyS6object10ModuleInfoZv () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
#1 0x00007fb240a1c0b1 in rt.minfo.ModuleGroup.runTlsDtors() () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
#2 0x00007fb240a1c411 in _D2rt5minfo16rt_moduleTlsDtorUZ14__foreachbody1MFKSQBx19sections_elf_shared3DSOZi () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
#3 0x00007fb240a1ddf2 in _D2rt19sections_elf_shared3DSO14opApplyReverseFMDFKSQByQByQBgZiZi () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
#4 0x00007fb240a1c3f1 in rt_moduleTlsDtor () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
#5 0x00007fb240a0a401 in thread_entryPoint () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
#6 0x00007fb23fcf56db in start_thread (arg=0x7fb1ef95e700) at pthread_create.c:463
#7 0x00007fb23f80671f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


In case it is relevant, here are the library initialization and deinitialization functions, which I think are needed e.g. for use from Python:

// Imports needed by the two functions below
import core.runtime : rt_init, rt_term;
import core.stdc.stdio;  // fprintf and the C stderr
import core.stdc.stdlib : abort;

// The initialization function of the library
pragma (crt_constructor)
extern (C)
void lib_init() {
  const err = rt_init();
  enum success = 1;  // Yes, backwards.
  if (err != success) {
    fprintf(core.stdc.stdio.stderr, "Failed to initialize D runtime.\n");
    abort();
  }
}

// The deinitialization function of the library
pragma (crt_destructor)
extern (C)
void lib_deinit() {
  const err = rt_term();
  enum success = 1;  // Yes, backwards.
  if (err != success) {
    fprintf(core.stdc.stdio.stderr, "Failed to deinitialize D runtime.\n");
    // Intentionally not aborting in a destructor.
  }
}


The segmentation fault is sporadic, likely due to a race condition. Is it related to my code? Can I work around it? Can I reduce the likelihood of it happening?

The few places where I define a '~this' function are not used in this program, so I rule out allocating memory in a destructor as the cause.

Thank you,
Ali

June 26, 2021
This may not help, but try LDC's AddressSanitizer support.

It might give you more information, with stack traces, about the lifetime of the memory involved in the segfault.
June 25, 2021

On 6/25/21 10:55 AM, Ali Çehreli wrote:

> I need your help with sporadic segfaults.
>
> Players:
>
> * dmd 2.096 (but I've seen similar issues in the past with earlier versions as well)
>
> * A D library with extern(C) functions that call rt_init() and rt_term(), which I think are needed for the library's use with Python
>
> * A D program that uses said library (would calling rt_init() and rt_term() cause harm in this case?) (Using the library with Python works fine.)

rt_init and rt_term are reentrant: you can call them as many times as you like, as long as you call rt_init first and call rt_term exactly as many times as you called rt_init.
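A minimal sketch of that pairing rule, assuming it runs inside an ordinary D program where the runtime has already been initialized before main starts. The helper name balancedInitTerm is hypothetical and only serves the illustration; rt_init and rt_term are the declarations from core.runtime, and both return 1 on success.

```d
import core.runtime : rt_init, rt_term;

/// Hypothetical helper: calls rt_init() `n` times, then rt_term() `n` times,
/// and reports whether every call returned the success value 1.
bool balancedInitTerm(int n)
{
    bool ok = true;
    foreach (i; 0 .. n)
    {
        if (rt_init() != 1)   // each call adds one reference to the runtime
            ok = false;
    }
    foreach (i; 0 .. n)
    {
        if (rt_term() != 1)   // each call releases one reference
            ok = false;
    }
    return ok;
}

void main()
{
    // The runtime was initialized before main was entered, so these nested
    // pairs never release the last reference and the runtime stays usable.
    assert(balancedInitTerm(3));
}
```

Under this reading, a crt_constructor/crt_destructor pair that calls rt_init() once and rt_term() once is balanced on its own; what the sketch cannot show is the order in which such a pair runs relative to the program's own startup and shutdown.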

> The segfault happens when the program is shutting down. Here is a stack trace from a core dump:
>
> [Current thread is 1 (Thread 0x7fb1ef95e700 (LWP 20010))]
> (gdb) bt
> #0 0x00007fb240a1c698 in _D2rt5minfo__T17runModuleFuncsRevSQBgQBg11ModuleGroup11runTlsDtorsMFZ9__lambda1ZQCoMFAxPyS6object10ModuleInfoZv () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
> #1 0x00007fb240a1c0b1 in rt.minfo.ModuleGroup.runTlsDtors() () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
> #2 0x00007fb240a1c411 in _D2rt5minfo16rt_moduleTlsDtorUZ14__foreachbody1MFKSQBx19sections_elf_shared3DSOZi () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
> #3 0x00007fb240a1ddf2 in _D2rt19sections_elf_shared3DSO14opApplyReverseFMDFKSQByQByQBgZiZi () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
> #4 0x00007fb240a1c3f1 in rt_moduleTlsDtor () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
> #5 0x00007fb240a0a401 in thread_entryPoint () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
> #6 0x00007fb23fcf56db in start_thread (arg=0x7fb1ef95e700) at pthread_create.c:463
> #7 0x00007fb23f80671f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Hm... maybe try compiling Phobos/druntime in debug mode. Line numbers would be helpful.

It's interesting though, the segfault is not happening in a static destructor, but rather the function that runs the destructors (seems like a nested function).

Have you tried running demangle on these to see what they really are?
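As a sketch of what that could look like, the snippet below feeds two of the mangled names from the backtrace to std.demangle.demangle from Phobos. The array name frames is hypothetical. No expected output is shown because the result depends on how well the installed Phobos handles these particular symbols; demangle returns its argument unchanged when it cannot decode a name.

```d
import std.demangle : demangle;
import std.stdio : writeln;

// Mangled names copied from frames #2 and #3 of the backtrace above.
immutable string[] frames = [
    "_D2rt5minfo16rt_moduleTlsDtorUZ14__foreachbody1MFKSQBx19sections_elf_shared3DSOZi",
    "_D2rt19sections_elf_shared3DSO14opApplyReverseFMDFKSQByQByQBgZiZi",
];

void main()
{
    foreach (name; frames)
    {
        // Prints the readable form, or the input itself if decoding fails.
        writeln(demangle(name));
    }
}
```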

> If related, here are the library initialization and deinitialization functions, which I think are needed e.g. for using from Python:
>
> // The initialization function of the library
> pragma (crt_constructor)
> extern (C)
> void lib_init() {
>   const err = rt_init();
>   enum success = 1;  // Yes, backwards.
>   if (err != success) {
>     fprintf(core.stdc.stdio.stderr, "Failed to initialize D runtime.");
>     abort();
>   }
> }
>
> // The deinitialization function of the library
> pragma (crt_destructor)
> extern (C)
> void lib_deinit() {
>   const err = rt_term();
>   enum success = 1;  // Yes, backwards.
>   if (err != success) {
>     fprintf(core.stdc.stdio.stderr, "Failed to deinitialize D runtime.");
>     // Intentionally not aborting in a destructor.
>   }
> }
>
> The segmentation fault is sporadic, likely due to a race condition. Is it related to my code? Can I work around it? Can I reduce the likelihood of it happening?

Are you running any other CRT destructors that might use D constructs? Note that CRT destructors and constructors do not run in any specific order, unlike D constructors and destructors.

> The few places where I define a '~this' function are not used in this program, so I rule out allocating memory in a destructor as the cause.

Allocating memory in a destructor would not cause this problem.

-Steve

June 25, 2021
On 6/25/21 11:21 AM, Steven Schveighoffer wrote:

> rt_init and rt_term are reentrant, you can call rt_term and rt_init as
> many times as you like, as long as you call rt_init first, and rt_term
> as many times as you called rt_init.

Cool. That matches my understanding.

>> The segfault happens when the program is shutting down. Here is a
>> stack trace from a core dump:
>>
>> [Current thread is 1 (Thread 0x7fb1ef95e700 (LWP 20010))]
>> (gdb) bt
>> #0 0x00007fb240a1c698 in
>> _D2rt5minfo__T17runModuleFuncsRevSQBgQBg11ModuleGroup11runTlsDtorsMFZ9__lambda1ZQCoMFAxPyS6object10ModuleInfoZv
>> () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>> #1 0x00007fb240a1c0b1 in rt.minfo.ModuleGroup.runTlsDtors() () from
>> /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>> #2 0x00007fb240a1c411 in
>> _D2rt5minfo16rt_moduleTlsDtorUZ14__foreachbody1MFKSQBx19sections_elf_shared3DSOZi
>> () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>> #3 0x00007fb240a1ddf2 in
>> _D2rt19sections_elf_shared3DSO14opApplyReverseFMDFKSQByQByQBgZiZi ()
>> from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>> #4 0x00007fb240a1c3f1 in rt_moduleTlsDtor () from
>> /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>> #5 0x00007fb240a0a401 in thread_entryPoint () from
>> /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96
>> #6 0x00007fb23fcf56db in start_thread (arg=0x7fb1ef95e700) at
>> pthread_create.c:463
>> #7 0x00007fb23f80671f in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> Hm... maybe try compiling Phobos/druntime in debug mode. Line numbers
> would be helpful.
>
> It's interesting though, the segfault is not happening in a static
> destructor, but rather the function that runs the destructors (seems
> like a nested function).
>
> Have you tried running demangle on these to see what they really are?

I can see runTlsDtors() in frame 0. Assuming it runs the destructors of my TLS objects, then the culprit may be me. (See below.)

And why are we inside starting a thread? Is that a GC thread? I can't imagine my program starting a thread when the program is shutting down. (?)

> Are you running any other CRT destructors that might use D constructs?

No. There is only one pair to initialize the library.

Again, the library is used by a D program, but the program does not load the library explicitly. The build uses CMake, where the library is specified as a dependency, so I assume it is linked and loaded automatically.

I just had a worry: I am not even sure whether a given function is used from the library or whether it is compiled and used from the module that the program inevitably imports. For example, if the library has a c_api.d module, the D program imports it anyway, and that module imports the other modules it depends on. :) So, perhaps my D program does not even use the library, in which case perhaps rt_term may be a problem. (?)

>> The couple of places where I define any '~this' function is not used
>> in this program. So, I rule out my allocating memory in a destructor.
>
> Allocating memory in a destructor would not cause this problem.

I am reminded of ~this() functions (any kind: struct, class, static, and shared static) because the segfault happens during runTlsDtors(). Does that execute my code? Am I doing things in destructors that I should not be doing? But again, the only destructors I defined are not in this program. (The only one that's in this program is in a unittest, which is excluded by 'version(unittest)'.)
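For what it is worth, here is a small sketch of the kind of user code that runs on that path, assuming the usual druntime behavior in which thread_entryPoint calls rt_moduleTlsDtor once a thread's function has returned. A module-level 'static ~this()' (without 'shared') is a thread-local destructor and runs once per terminating thread. The names tlsDtorRuns, idle, and tlsDtorRunsForOneThread are hypothetical.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;

// Counts how many times this module's thread-local destructor has run.
shared int tlsDtorRuns;

// Thread-local module destructor: runs once for every thread that terminates.
static ~this()
{
    atomicOp!"+="(tlsDtorRuns, 1);
}

// Thread body with no work; only the thread's exit path matters here.
void idle() {}

/// Hypothetical helper: runs one short-lived thread and returns how many
/// thread-local destructor calls were observed by the time join() returned.
int tlsDtorRunsForOneThread()
{
    immutable before = atomicLoad(tlsDtorRuns);
    auto worker = new Thread(&idle);
    worker.start();
    worker.join();   // returns only after the worker's entry point has finished
    return atomicLoad(tlsDtorRuns) - before;
}

void main()
{
    // Exactly one run: the worker's own thread-local destructor.
    assert(tlsDtorRunsForOneThread() == 1);
}
```

If the program and its dependencies define no such thread-local destructors, the loop in runTlsDtors still walks every module's ModuleInfo, so a fault there may point at the module list itself rather than at any user destructor. That last part is a guess, not something the backtrace proves.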

> -Steve

Thank you,
Ali

July 01, 2021
On Saturday, 26 June 2021 at 02:14:50 UTC, Ali Çehreli wrote:

>
> And why are we inside starting a thread? Is that a GC thread? I can't imagine my program starting a thread when the program is shutting down. (?)

We just haven't exited the process's main thread yet, which was created with this call at line 95: https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/x86_64/clone.S.html
July 01, 2021
On 7/1/21 12:51 PM, Max Samukha wrote:
> On Saturday, 26 June 2021 at 02:14:50 UTC, Ali Çehreli wrote:
> 
>>
>> And why are we inside starting a thread? Is that a GC thread? I can't imagine my program starting a thread when the program is shutting down. (?)
> 
> We just haven't exited the process's main thread yet, which was created with this call at line 95: https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/x86_64/clone.S.html 
> 

Thanks.

I came here to report that I've worked around this issue by not linking with the library but including its modules in the program that segfaulted.

The main difference in this case is the lack of the library's c_api.d file, which did automatic library initialization and deinitialization. Of course, I'm not sure whether that was the cause but I am happy that it was a fairly simple workaround which involved just the build configuration file.

Ali