Thread overview
[Issue 4890] GC.collect() deadlocks multithreaded program.
Sep 04, 2014
Sean Kelly
Sep 04, 2014
Sean Kelly
Sep 04, 2014
Sean Kelly
Sep 04, 2014
Sean Kelly
Sep 05, 2014
badlink
Sep 05, 2014
badlink
Sep 05, 2014
Marco Leise
Sep 05, 2014
Sobirari Muhomori
Sep 05, 2014
Sean Kelly
Sep 05, 2014
Sobirari Muhomori
Sep 05, 2014
Sean Kelly
Sep 06, 2014
Tomash Brechko
Sep 06, 2014
Tomash Brechko
Sep 06, 2014
Sean Kelly
Sep 06, 2014
safety0ff.bugz
Sep 08, 2014
Sean Kelly
Sep 10, 2014
Brad Roberts
Oct 01, 2014
badlink
Oct 01, 2014
badlink
Oct 02, 2014
Sobirari Muhomori
Oct 04, 2014
Martin Nowak
Oct 05, 2014
Sean Kelly
Oct 05, 2014
badlink
Oct 06, 2014
badlink
September 04, 2014
https://issues.dlang.org/show_bug.cgi?id=4890

andrea.9940@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |industry
             Status|RESOLVED                    |REOPENED
                 CC|                            |andrea.9940@gmail.com
         Resolution|FIXED                       |---
           Severity|normal                      |regression

--- Comment #10 from andrea.9940@gmail.com ---
This bug is present in DMD 2.066 on Arch Linux (kernel 3.14.17-1-lts x86_64,
GNU libc 2.19).
The originally posted code still deadlocks. Even with the j.sleep line
uncommented, it never prints a ".", which means GC.collect() never returns:

import core.thread, core.memory, std.stdio;

class Job : Thread {
 this() {
   super(&run);
 }

 private void run() {
   while (true) write("*");
 }
}

void main() {
 Job j = new Job;
 j.start();

 //j.sleep(dur!"msecs"(1));

 GC.collect();

 while(true) write(".");
}

--
September 04, 2014

--- Comment #11 from Sean Kelly <sean@invisibleduck.org> ---
My initial guess is that this has something to do with the changes for critical regions, since the collection algorithm seemed quite solid before that.  I'll try to reproduce it on my end, though.  What would be really useful is for whoever encounters this to trap it in a debugger and include stack traces of all relevant threads.  Something has to be blocked on a lock or a signal somewhere, but without knowing which one, there's little that can be done.

--
September 04, 2014

--- Comment #12 from Sean Kelly <sean@invisibleduck.org> ---
Um... I may be wrong about what I just said.  It looks like someone added a delegate call within the signal handler that coordinates collections on Linux.  There's a decent chance that the GC is allocating a closure (a stack frame moved to the heap) within that signal handler, which would be Very Bad.
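The concern above hinges on when D allocates a closure for a delegate. As a general illustration only (this is not druntime's actual code, and `callOnce`, `countWithScopeDelegate`, and `makeCounter` are hypothetical names), the sketch below contrasts a delegate passed to a `scope` parameter, which keeps the captured frame on the stack, with an escaping delegate, which forces the frame onto the GC heap.

```d
import std.stdio : writeln;

// Hypothetical helper: invokes `dg` once. The `scope` storage class promises
// that `dg` does not escape, so the caller's frame can stay on the stack
// instead of being copied into a GC-allocated closure.
void callOnce(scope void delegate() nothrow @nogc dg) nothrow @nogc
{
    dg();
}

// Accepted under @nogc: the lambdas capture `counter`, but because they are
// passed to a `scope` parameter, no GC closure is allocated.
int countWithScopeDelegate() nothrow @nogc
{
    int counter = 0;
    callOnce(() { ++counter; });
    callOnce(() { ++counter; });
    return counter;
}

// Returning a delegate that captures a local means `counter` must outlive
// this call, so the frame is allocated on the GC heap. Marking this
// function @nogc would make the compiler reject it.
int delegate() makeCounter()
{
    int counter = 0;
    return () => ++counter;
}

void main()
{
    writeln(countWithScopeDelegate()); // prints 2

    auto next = makeCounter();
    next();
    writeln(next());                   // prints 2
}
```

Code that runs inside a suspend handler can be marked `@nogc` so the compiler itself refuses any hidden closure allocation there.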

--
September 04, 2014

--- Comment #13 from andrea.9940@gmail.com ---
Just tested: the bug is not present on Windows (DMD 2.066).

--
September 04, 2014

--- Comment #14 from Sean Kelly <sean@invisibleduck.org> ---
It's likely as I said.  GC collections work differently on different platforms.  Both Windows and OS X use a kernel call to suspend threads and inspect their stacks.  On other Unix platforms (like Linux), suspension is done via signals, and signal handlers are VERY restrictive in what they can safely do.  Either way, having a thread try to allocate something from the GC inside this suspend handler is a guaranteed deadlock.  If this is really what's going on, I'm amazed that D on Linux works at all.  Maybe it really is something else...

I'm setting up a new Linux VM, so I should be able to reproduce this shortly.

--
September 04, 2014

--- Comment #15 from Sean Kelly <sean@invisibleduck.org> ---
Okay, I can't reproduce this using the provided code on Oracle Linux 64-bit. If someone has a reliable repro, please let me know.

--
September 05, 2014

--- Comment #16 from badlink <andrea.9940@gmail.com> ---
(In reply to Sean Kelly from comment #15)
> Okay, I can't reproduce this using the provided code on Oracle Linux 64-bit. If someone has a reliable repro, please let me know.

My Linux machine runs Arch Linux with the 3.14.17-1-lts x86_64 kernel and GNU
libc 2.19.
Oracle Linux is quite different: it uses the 3.8.13 x86_64 kernel and glibc
2.17
(http://www.oracle.com/us/technologies/linux/product/specifications/index.html).
Try Manjaro Linux, which is based on Arch but comes with a ready-made desktop
environment (just run `pacman -S dlang-dmd` to get DMD).

--
September 05, 2014

--- Comment #17 from badlink <andrea.9940@gmail.com> ---
Created attachment 1416
  --> https://issues.dlang.org/attachment.cgi?id=1416&action=edit
stack trace

--
September 05, 2014

Marco Leise <Marco.Leise@gmx.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Marco.Leise@gmx.de

--- Comment #18 from Marco Leise <Marco.Leise@gmx.de> ---
*** Issue 10351 has been marked as a duplicate of this issue. ***

--
September 05, 2014

--- Comment #19 from Sobirari Muhomori <dfj1esp02@sneakemail.com> ---
(In reply to badlink from comment #17)
> stack trace

Hmm... if a thread hangs on a mutex, does it handle signals?
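POSIX answers this question: a thread blocked in `pthread_mutex_lock` still runs signal handlers, and on return from the handler it resumes waiting for the mutex as if it had not been interrupted. The D sketch below checks this on Linux under stated assumptions: it uses druntime's `core.sys.posix` bindings and a raw pthread (so the worker is not registered with the GC), and it picks `SIGHUP` arbitrarily because druntime uses `SIGUSR1`/`SIGUSR2` by default for its own suspend/resume signals. The names `onSignal`, `worker`, and `handlerRunsWhileBlocked` are hypothetical.

```d
import core.atomic : atomicLoad, atomicStore;
import core.sys.posix.pthread;
import core.sys.posix.semaphore;
import core.sys.posix.signal;
import core.sys.posix.sys.types : pthread_t;
import core.thread : Thread;
import core.time : msecs;

__gshared pthread_mutex_t mtx;
__gshared sem_t handlerRan;   // posted from inside the signal handler
shared bool workerGotLock;

// Signal handler: does only async-signal-safe work (sem_post); no GC, no locks.
extern (C) void onSignal(int) nothrow @nogc
{
    sem_post(&handlerRan);
}

// Worker: blocks on the mutex that the main thread is holding.
extern (C) void* worker(void*) nothrow @nogc
{
    pthread_mutex_lock(&mtx);
    atomicStore(workerGotLock, true);
    pthread_mutex_unlock(&mtx);
    return null;
}

/// Returns true if the handler ran while the worker could not yet own the mutex.
bool handlerRunsWhileBlocked()
{
    atomicStore(workerGotLock, false);

    sigaction_t sa;
    sa.sa_handler = &onSignal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGHUP, &sa, null);

    sem_init(&handlerRan, 0, 0);
    pthread_mutex_init(&mtx, null);
    pthread_mutex_lock(&mtx);          // hold the mutex so the worker blocks

    pthread_t tid;
    pthread_create(&tid, null, &worker, null);
    Thread.sleep(100.msecs);           // demo only: let the worker reach the lock

    pthread_kill(tid, SIGHUP);
    sem_wait(&handlerRan);             // returns only after the handler has run
    immutable stillBlocked = !atomicLoad(workerGotLock);

    pthread_mutex_unlock(&mtx);        // worker resumes waiting, then acquires
    pthread_join(tid, null);
    pthread_mutex_destroy(&mtx);
    sem_destroy(&handlerRan);
    return stillBlocked && atomicLoad(workerGotLock);
}

void main()
{
    import std.stdio : writeln;
    writeln(handlerRunsWhileBlocked()); // prints true
}
```

Because the handler only calls `sem_post`, which POSIX lists as async-signal-safe, it cannot deadlock on the mutex the worker is waiting for. A handler that called into the GC would have no such guarantee.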

--