Thread overview
druntime thread (from foreach parallel?) cleanup bug
Nov 01, 2022
mw
Nov 01, 2022
H. S. Teoh
Nov 01, 2022
Ali Çehreli
Nov 01, 2022
H. S. Teoh
Nov 01, 2022
mw
Nov 01, 2022
mw
Nov 01, 2022
Imperatorn
November 01, 2022

My program received signal SIGSEGV, Segmentation fault.

Its simplified structure looks like this:

void foo() {
  ...
  writeln("done");  // saw this got printed!
}

int main() {
  foo();
  return 0;
}

So the program crashed just before exiting. I suspect druntime has a thread-cleanup bug somewhere (maybe related to parallel foreach) that is unrelated to my own code. This kind of bug is hard to reproduce, so I'm not sure whether I should file an issue.

I'm using: LDC - the LLVM D compiler (1.30.0) on x86_64.

Under gdb, here is the threads info (for the record):

Thread 11 "xxx" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x1555553df700 (LWP 36258)]
__GI___res_iclose (free_addr=true, statp=0x1555553dfdb8) at res-close.c:103
103 res-close.c: No such file or directory.

(gdb) info threads
  Id   Target Id         Frame
  1    Thread 0x155555515000 (LWP 36244) "lt" 0x0000155550af1d2d in __GI___pthread_timedjoin_ex (threadid=23456246527744, thread_return=0x0, abstime=0x0, block=<optimized out>) at pthread_join_common.c:89
* 11   Thread 0x1555553df700 (LWP 36258) "lt" __GI___res_iclose (free_addr=true, statp=0x1555553dfdb8) at res-close.c:103
  17   Thread 0x155544817700 (LWP 36264) "lt" 0x0000155550afac70 in __GI___nanosleep (requested_time=0x155544810e90, remaining=0x155544810ea8) at ../sysdeps/unix/sysv/linux/nanosleep.c:28

(gdb) thread 1
[Switching to thread 1 (Thread 0x155555515000 (LWP 36244))]
#0 0x0000155550af1d2d in __GI___pthread_timedjoin_ex (threadid=23456246527744, thread_return=0x0, abstime=0x0, block=<optimized out>) at pthread_join_common.c:89
89 pthread_join_common.c: No such file or directory.
(gdb) where
#0 0x0000155550af1d2d in __GI___pthread_timedjoin_ex (threadid=23456246527744, thread_return=0x0, abstime=0x0, block=<optimized out>) at pthread_join_common.c:89
#1 0x0000555555fb94f8 in core.thread.osthread.joinLowLevelThread(ulong) ()
#2 0x0000555555fd7210 in _D4core8internal2gc4impl12conservativeQw3Gcx15stopScanThreadsMFNbZv ()
#3 0x0000555555fd3ae7 in _D4core8internal2gc4impl12conservativeQw3Gcx4DtorMFZv ()
#4 0x0000555555fd3962 in _D4core8internal2gc4impl12conservativeQw14ConservativeGC6__dtorMFZv ()
#5 0x0000555555fc2ce7 in rt_finalize2 ()
#6 0x0000555555fc0056 in rt_term ()
#7 0x0000555555fc0471 in _D2rt6dmain212_d_run_main2UAAamPUQgZiZ6runAllMFZv ()
#8 0x0000555555fc0356 in _d_run_main2 ()
#9 0x0000555555fc01ae in _d_run_main ()
#10 0x0000555555840c02 in main (argc=2, argv=0x7fffffffe188) at //home/zhou/project/ldc2-1.30.0-linux-x86_64/bin/../import/core/internal/entrypoint.d:42
#11 0x0000155550163b97 in __libc_start_main (main=0x555555840be0 <main>, argc=2, argv=0x7fffffffe188, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe178)
at ../csu/libc-start.c:310
#12 0x00005555556dccca in _start ()

(gdb) thread 11
[Switching to thread 11 (Thread 0x1555553df700 (LWP 36258))]
#0 __GI___res_iclose (free_addr=true, statp=0x1555553dfdb8) at res-close.c:103
103 res-close.c: No such file or directory.
(gdb) where
#0 __GI___res_iclose (free_addr=true, statp=0x1555553dfdb8) at res-close.c:103
#1 res_thread_freeres () at res-close.c:138
#2 0x00001555502de8c2 in __libc_thread_freeres () at thread-freeres.c:29
#3 0x0000155550af0700 in start_thread (arg=0x1555553df700) at pthread_create.c:476
#4 0x0000155550263a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) thread 17
[Switching to thread 17 (Thread 0x155544817700 (LWP 36264))]
#0 0x0000155550afac70 in __GI___nanosleep (requested_time=0x155544810e90, remaining=0x155544810ea8) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
28 ../sysdeps/unix/sysv/linux/nanosleep.c: No such file or directory.
(gdb) where
#0 0x0000155550afac70 in __GI___nanosleep (requested_time=0x155544810e90, remaining=0x155544810ea8) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1 0x0000555555fb8c3b in _D4core6thread8osthread6Thread5sleepFNbNiSQBo4time8DurationZv ()
#2 0x0000555555d9a0c2 in _D4hunt4util8DateTimeQj25_sharedStaticCtor_L406_C5FZ9__lambda4MFZv () at home/zhou/.dub/packages/hunt-1.7.16/hunt/source/hunt/util/DateTime.d:430
#3 0x0000555555fb89f4 in thread_entryPoint ()
#4 0x0000155550af06db in start_thread (arg=0x155544817700) at pthread_create.c:463
#5 0x0000155550263a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

November 01, 2022
On Tue, Nov 01, 2022 at 05:19:56PM +0000, mw via Digitalmars-d-learn wrote:
> My program received signal SIGSEGV, Segmentation fault.
> 
> Its simplified structure looks like this:
> 
> ```
> void foo() {
>   ...
>   writeln("done");  // saw this got printed!
> }
> 
> int main() {
>   foo();
>   return 0;
> }
> 
> ```

Can you show a code snippet that includes the parallel foreach?  Because the above code snippet is over-simplified to the point it's impossible to tell what the original problem might be, since obviously calling a function that calls writeln would not crash the program.

Maybe try running Digger to reduce the code for you?


T

-- 
Never step over a puddle, always step around it. Chances are that whatever made it is still dripping.
November 01, 2022
On 11/1/22 10:27, H. S. Teoh wrote:

> Maybe try running Digger to reduce the code for you?

Did you mean dustmite? It is accessible as 'dub dustmite <destination-path>', but I haven't used it.

My guess for the segmentation fault is that the OP is executing destructor code that assumes some members are alive. If so, the code should be moved from destructors to functions to be called like obj.close(). But it's just a guess...
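
To illustrate the suggestion, here is a minimal sketch of moving cleanup out of a destructor and into an explicit `close()` call. The `Journal` class and its members are hypothetical and not taken from the OP's program; the only assumption relied on is that a GC-run destructor must not touch other GC-allocated members, because they may already have been collected.

```d
import std.stdio : writeln;

// Hypothetical class for illustration; not taken from the OP's program.
final class Journal
{
    private string[] entries;  // GC-allocated member
    private bool closed;

    void add(string entry)
    {
        assert(!closed, "Journal used after close()");
        entries ~= entry;
    }

    // Explicit cleanup, called at a point where all members are still valid.
    // Returns the number of entries flushed; calling it twice is harmless.
    size_t close()
    {
        if (closed)
            return 0;
        closed = true;
        immutable flushed = entries.length;
        foreach (e; entries)
            writeln("flush: ", e);
        entries = null;
        return flushed;
    }

    ~this()
    {
        // Intentionally empty: when the GC runs this finalizer, `entries`
        // may already have been collected, so it must not be touched here.
    }
}

void main()
{
    auto j = new Journal;
    scope (exit) j.close();  // deterministic cleanup when main's scope ends
    j.add("first");
    j.add("second");
}
```

With `scope (exit)`, the cleanup runs while the object and its members are certainly alive, rather than at some unspecified point during runtime shutdown.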

Ali

November 01, 2022
> Can you show a code snippet that includes the parallel foreach?

(It's just a very straightforward foreach on an array; as I said it may not be relevant.)


And I just noticed, one of the thread traces points here:

https://github.com/huntlabs/hunt/blob/master/source/hunt/util/DateTime.d#L430

```
class DateTime {
  shared static this() {
    ...
    dateThread.isDaemon = true;  // not sure if this is related
  }
}
```

In the comments, it says: "BUG: ... crashed". Looks like someone ran into this (cleanup) issue already, but was unable to fix it.

Anyway I logged an issue there:

https://github.com/huntlabs/hunt/issues/96


November 01, 2022
On Tue, Nov 01, 2022 at 10:37:57AM -0700, Ali Çehreli via Digitalmars-d-learn wrote:
> On 11/1/22 10:27, H. S. Teoh wrote:
> 
> > Maybe try running Digger to reduce the code for you?
> 
> Did you mean dustmite? It is accessible as 'dub dustmite <destination-path>', but I haven't used it.

Oh yes, sorry, I meant dustmite, not digger. :-P


> My guess for the segmentation fault is that the OP is executing destructor code that assumes some members are alive. If so, the code should be moved from destructors to functions to be called like obj.close(). But it's just a guess...
[...]

Yes, that's a common gotcha.


T

-- 
We are in class, we are supposed to be learning, we have a teacher... Is it too much that I expect him to teach me??? -- RL
November 01, 2022

On 11/1/22 1:47 PM, mw wrote:

> > Can you show a code snippet that includes the parallel foreach?
>
> (It's just a very straightforward foreach on an array; as I said it may not be relevant.)
>
> And I just noticed, one of the thread traces points here:
>
> https://github.com/huntlabs/hunt/blob/master/source/hunt/util/DateTime.d#L430
>
> class DateTime {
>    shared static this() {
>      ...
>      dateThread.isDaemon = true;  // not sure if this is related
>    }
> }
>
> In the comments, it says: "BUG: ... crashed". Looks like someone ran into this (cleanup) issue already, but was unable to fix it.
>
> Anyway I logged an issue there:
>
> https://github.com/huntlabs/hunt/issues/96

Oh yeah, isDaemon detaches the thread from the GC. Don't do that unless you know what you are doing.

-Steve

November 01, 2022

On Tuesday, 1 November 2022 at 18:18:45 UTC, Steven Schveighoffer wrote:

> Oh yeah, isDaemon detaches the thread from the GC. Don't do that unless you know what you are doing.

As discussed on discord, this isn't true actually. All it does is prevent the thread from being joined before exiting the runtime.

What is likely happening is, the runtime shuts down. That thread is still running, but the D runtime is gone. So it eventually starts trying to do something (like let's say, access thread local storage), and it's gone. Hence the segfault.
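
One way to avoid that race, as a rough sketch only: give the background thread a stop flag and join it before the program returns from main, so that nothing is still running when the runtime is torn down. All names here (`keepRunning`, `ticks`, `startTicker`, `stopTicker`) are hypothetical; this is not hunt's code and not necessarily how the library should be fixed.

```d
import core.atomic : atomicLoad, atomicOp, atomicStore;
import core.thread : Thread;
import core.time : msecs;
import std.stdio : writeln;

// All names below are hypothetical; this is not hunt's code.
shared bool keepRunning = true;
shared int ticks = 0;

// Starts a background thread that "ticks" until it is asked to stop.
Thread startTicker()
{
    auto t = new Thread({
        while (atomicLoad(keepRunning))
        {
            atomicOp!"+="(ticks, 1);
            Thread.sleep(5.msecs);
        }
    });
    // isDaemon is left at its default (false), so the runtime waits for
    // this thread instead of shutting down while it is still running.
    t.start();
    return t;
}

// Asks the thread to stop and waits until it has really finished.
void stopTicker(Thread t)
{
    atomicStore(keepRunning, false);
    t.join();
}

void main()
{
    auto t = startTicker();
    Thread.sleep(30.msecs);
    stopTicker(t);
    writeln("ticker stopped after ", atomicLoad(ticks), " ticks");
}
```

The trade-off is that a non-daemon thread that is never told to stop keeps the process alive, so the explicit `stopTicker` call is part of the design, not an optional extra.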

-Steve

November 01, 2022

On Tuesday, 1 November 2022 at 18:18:45 UTC, Steven Schveighoffer wrote:

> > And I just noticed, one of the thread traces points here:
> >
> > https://github.com/huntlabs/hunt/blob/master/source/hunt/util/DateTime.d#L430
> >
> > class DateTime {
> >    shared static this() {
> >      ...
> >      dateThread.isDaemon = true;  // not sure if this is related
> >    }
> > }
> >
> > In the comments, it says: "BUG: ... crashed". Looks like someone ran into this (cleanup) issue already, but was unable to fix it.
> >
> > Anyway I logged an issue there:
> >
> > https://github.com/huntlabs/hunt/issues/96
>
> Oh yeah, isDaemon detaches the thread from the GC. Don't do that unless you know what you are doing.

Maybe the hunt library author doesn't know. (My code does not directly use this library; it got pulled in by some other dependencies.)

Currently, the isDaemon doc does not mention this:

https://dlang.org/library/core/thread/threadbase/thread_base.is_daemon.html

Sets the daemon status for this thread. While the runtime will wait for all normal threads to complete before tearing down the process, daemon threads are effectively ignored and thus will not prevent the process from terminating. In effect, daemon threads will be terminated automatically by the OS when the process exits.

Maybe we should add to the doc?

BTW, what exactly is going wrong with their code?

I see that the tick() call inside the anonymous dateThread accesses these two stack variables of shared static this():

https://github.com/huntlabs/hunt/blob/master/source/hunt/util/DateTime.d#L409

    Appender!(char[])[2] bufs;
    const(char)[][2] targets;

Why does this tick() call work after the static this() finished in a normal run?

Why does the problem only show up when the program finishes?
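
On the first question, one possible explanation (an assumption about hunt's code, not something confirmed in this thread): when a nested function or delegate that uses local variables escapes the enclosing function, D allocates those locals in a closure on the GC heap rather than on the stack, so they outlive the function call. A minimal sketch with hypothetical names (`makeCounter`, `count`, `tick`):

```d
import std.stdio : writeln;

// Hypothetical helper: returns a delegate that keeps using a local
// variable after the enclosing function has returned.
int delegate() makeCounter()
{
    int count = 0;                   // looks like a stack variable

    int tick() { return ++count; }   // nested function capturing `count`

    // Because the delegate escapes, the compiler places `count` in a
    // GC-allocated closure instead of the stack frame.
    return &tick;
}

void main()
{
    auto next = makeCounter();
    writeln(next()); // prints 1
    writeln(next()); // prints 2
}
```

If the same mechanism applies to `bufs` and `targets`, they would live on the GC heap, which would be consistent with the code working during a normal run and only failing around runtime shutdown.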

November 01, 2022

On Tuesday, 1 November 2022 at 19:49:47 UTC, mw wrote:

> On Tuesday, 1 November 2022 at 18:18:45 UTC, Steven Schveighoffer wrote:
> > [...]
>
> Maybe the hunt library author doesn't know. (My code does not directly use this library; it got pulled in by some other dependencies.)
>
> [...]

Please, if you see anything in the docs that needs to be updated, make a PR right away <3

Documentation saves lives!

The times I have thought "I'll do it later" have been too many.