Thread overview
Can you reproduce this threading bug?
FeepingCreature (Jun 14, 2019)
ag0aep6g (Jun 14, 2019)
Alex (Jun 14, 2019)
FeepingCreature (Jun 14, 2019)
Antonio Corbi (Jun 14, 2019)
Jacob Carlborg (Jun 14, 2019)
rikki cattermole (Jun 15, 2019)
Re: Can you reproduce this threading bug? (issue 19978)
FeepingCreature (Jun 17, 2019)
June 14, 2019
Consider the following code:

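void main()
{
    import core.thread : Thread;

    // Start a daemon thread with an empty body, then return from main
    // immediately without joining it.
    with (new Thread({ })) { isDaemon = true; start; }
}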

On Linux, this builds and runs. Most of the time. Maybe 99% of the time.

But if you run it in a loop:

while true; do ./test || break; done

You may see that it segfaults after a few seconds. At least, it does for me on 2.080.0, Linux 4.18.0-20 x86_64.

This is obviously quite bad. Any ideas?
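
The loop above stops at the first failure but throws away how the process died. A small variant, sketched here as a hypothetical `run_until_failure` shell function (not part of any D tooling), also reports how many clean runs came first and whether the failing run was killed by a signal. It assumes a bash- or dash-style shell, where a child killed by signal n exits with status 128 + n (139 for SIGSEGV, 134 for SIGABRT).

```shell
# Hypothetical helper (not part of any D tool): rerun a command until it
# fails, then report how many clean runs came first and how the failing
# run ended. Assumes a bash/dash-style shell, where a child killed by
# signal n exits with status 128 + n.
run_until_failure() {
    runs=0
    while :; do
        if "$@"; then
            runs=$((runs + 1))
        else
            status=$?
            break
        fi
    done
    if [ "$status" -gt 128 ]; then
        echo "failed after $runs clean runs: killed by signal $((status - 128))"
    else
        echo "failed after $runs clean runs: exit status $status"
    fi
    return "$status"
}

# Usage with the reproducer above:
# run_until_failure ./test
```

For a segfaulting run this prints a line of the form `failed after N clean runs: killed by signal 11`, and the function returns the failing run's exit status so it can still be used in scripts.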
June 14, 2019
On 14.06.19 18:36, FeepingCreature wrote:
> You may see that it segfaults after a few seconds. At least, it does for me on 2.080.0, Linux 4.18.0-20 x86_64.

Can reproduce. DMD 2.086.0, Linux 5.0.0-16-generic x86_64
June 14, 2019
On Friday, 14 June 2019 at 16:36:04 UTC, FeepingCreature wrote:
> You may see that it segfaults after a few seconds. At least, it does for me on 2.080.0, Linux 4.18.0-20 x86_64.

Can reproduce. DMD64 D Compiler v2.086.0, MacOs 10.13.6; Darwin Kernel Version 17.7.0; x86_64
June 14, 2019
Happens for me at home too, with ldc2-1.11 on Linux 4.14.111.

I think with the Mac user reporting in, we can exclude a kernel or glibc issue. Damn.
June 14, 2019
On Friday, 14 June 2019 at 18:41:41 UTC, FeepingCreature wrote:
> Happens for me at home too, with ldc2-1.11 on 4.14.111.
>
> I think with the Mac user reporting in, we can exclude a kernel or glibc issue. Damn.

Using Arch Linux:

Linux hal9000 5.1.9-zen1-1-zen #1 ZEN SMP PREEMPT Tue Jun 11 16:19:25 UTC 2019 x86_64 GNU/Linux

And dmd:
dmd --version
DMD64 D Compiler v2.086.0


I compiled with "dmd -g" and ran the same loop inside gdb (while true; do gdb -ex run -ex q ./ttest || break; done). This is the stack trace I get:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff7b1c700 (LWP 16006)]

Thread 2 "ttest" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7ffff7b1c700 (LWP 16006)]
0x00007ffff7f6708a in __lll_unlock_wake () from /usr/lib/libpthread.so.0
A debugging session is active.

	Inferior 1 [process 15984] will be killed.

Quit anyway? (y or n) n
Not confirmed.
(gdb) bt
#0  0x00007ffff7f6708a in __lll_unlock_wake () from /usr/lib/libpthread.so.0
#1  0x00007ffff7f61a66 in __pthread_mutex_unlock_usercnt () from /usr/lib/libpthread.so.0
#2  0x000055555559083d in _D4core4sync5mutex5Mutex__T14unlock_nothrowTCQBrQBpQBnQBkZQBfMFNbNiNeZv ()
#3  0x000055555558f78f in _D4core6thread6Thread3addFNbNiCQBdQBbQxbZv ()
#4  0x00005555555a0088 in thread_entryPoint ()
#5  0x00007ffff7f5da92 in start_thread () from /usr/lib/libpthread.so.0
#6  0x00007ffff7d1ccd3 in clone () from /usr/lib/libc.so.6

Hope this helps.
Antonio
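
One caveat about the trace above: gdb stopped on SIGUSR1, which (as far as I know) druntime uses by default on Linux to suspend threads for a garbage collection, with SIGUSR2 used to resume them. That stop may therefore show the daemon thread being paused rather than the fault itself. The sketch below reruns the program under gdb in batch mode, passes those two signals through silently, and only prints output once gdb reports some other signal. `catch_crash_in_gdb` is a hypothetical helper name; the sketch assumes gdb is on the PATH and that the binary was built with `dmd -g`.

```shell
# Hypothetical helper: rerun a program under gdb in batch mode until it
# stops on a signal other than SIGUSR1/SIGUSR2, then print gdb's output,
# including the backtrace. Assumes druntime uses SIGUSR1/SIGUSR2 for GC
# thread suspend/resume, so gdb is told to pass those through silently.
catch_crash_in_gdb() {
    while :; do
        out=$(gdb -q -batch \
            -ex 'handle SIGUSR1 nostop noprint' \
            -ex 'handle SIGUSR2 nostop noprint' \
            -ex run -ex bt "$1" 2>&1)
        case $out in
            *"received signal"*)
                printf '%s\n' "$out"
                return 0
                ;;
        esac
    done
}

# Usage, with a binary built by "dmd -g":
# catch_crash_in_gdb ./ttest
```

Like the original loop, this runs until a failure shows up, so it will spin forever if the program never crashes (or if gdb cannot be started).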

June 14, 2019
On 2019-06-14 18:36, FeepingCreature wrote:
> You may see that it segfaults after a few seconds. At least, it does for me on 2.080.0, Linux 4.18.0-20 x86_64.

On macOS I get a mixture of the following:

Aborting from src/core/sync/mutex.d(149) Error: pthread_mutex_destroy failed.Abort trap: 6

Aborting from src/core/sync/mutex.d(149) Error: pthread_mutex_destroy failed.Segmentation fault: 11

Aborting from Segmentation fault: 11
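
Since the failures come in several forms, it can help to count how often each one occurs. A sketch, assuming a bash- or dash-style shell, of a hypothetical `tally_exit_statuses` function that runs a command a fixed number of times and counts each exit status. On both macOS and Linux, 134 corresponds to the abort (signal 6) and 139 to the segfault (signal 11) in the messages above.

```shell
# Hypothetical helper: run a command N times and count how often each exit
# status occurs. 0 is a clean run; in bash/dash-style shells, 128 + n means
# the run was killed by signal n (134 = SIGABRT, 139 = SIGSEGV).
tally_exit_statuses() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        if "$@" >/dev/null 2>&1; then
            echo 0
        else
            echo "$?"
        fi
        i=$((i + 1))
    done | sort -n | uniq -c
}

# Usage: tally_exit_statuses 1000 ./test
# Each output line is "count status", as printed by uniq -c.
```

The program's own output (such as druntime's "Aborting from ..." message) is discarded, so only the exit statuses are counted.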



It's pretty easy to reproduce, but when I tried in a debugger I couldn't reproduce the segfault. Here is what the crash reporter logged, though:

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
abort() called

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	0x00007fff4fe8cb66 __pthread_kill + 10
1   libsystem_pthread.dylib       	0x00007fff50057080 pthread_kill + 333
2   libsystem_c.dylib             	0x00007fff4fde81ae abort + 127
3   main                          	0x00000001033978e2 _D4core8internal5abortQgFNbNiNfMAyaMQemZv + 262
4   main                          	0x0000000103394e73 thread_term + 259
5   main                          	0x00000001033a7268 rt_term + 88
6   main                          	0x00000001033a796c _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZv + 208
7   main                          	0x00000001033a7848 _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ7tryExecMFMDFZvZv + 36
8   main                          	0x00000001033a77a8 _d_run_main + 764
9   main                          	0x0000000103384e72 main + 34
10  libdyld.dylib                 	0x00007fff4fd3c015 start + 1

Thread 1:
0   main                          	0x0000000103394ab3 _D4core6thread6Thread6removeFNbNiCQBgQBeQBaZv + 63
1   main                          	0x0000000103393922 thread_entryPoint + 526
2   libsystem_pthread.dylib       	0x00007fff50054661 _pthread_body + 340
3   libsystem_pthread.dylib       	0x00007fff5005450d _pthread_start + 377
4   libsystem_pthread.dylib       	0x00007fff50053bf9 thread_start + 13

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x00007fff88782380  rcx: 0x00007ffeec87b158  rdx: 0x0000000000000000
  rdi: 0x0000000000000307  rsi: 0x0000000000000006  rbp: 0x00007ffeec87b190  rsp: 0x00007ffeec87b158
   r8: 0x000000000000000a   r9: 0x0000000000000011  r10: 0x0000000000000000  r11: 0x0000000000000206
  r12: 0x0000000000000307  r13: 0x00007ffeec87b3c6  r14: 0x0000000000000006  r15: 0x000000000000002d
  rip: 0x00007fff4fe8cb66  rfl: 0x0000000000000206  cr2: 0x000000010343d088

Logical CPU:     6
Error Code:      0x00000004
Trap Number:     14

-- 
/Jacob Carlborg
June 15, 2019
On 15/06/2019 4:36 AM, FeepingCreature wrote:
> You may see that it segfaults after a few seconds. At least, it does for me on 2.080.0, Linux 4.18.0-20 x86_64.

Cannot reproduce under Windows 10 with dmd 2.082.0 or ldc2 1.12.0-beta2.

But this doesn't say much: Windows has different costs for threads and processes. There was a good second between runs.
June 17, 2019
Filed as 19978! (Darn, I was hoping for 20000.)

https://issues.dlang.org/show_bug.cgi?id=19978

Thanks, everyone, for the help in ruling out the kernel, the backend and the stdlib as the source!