September 05, 2014
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #20 from Sean Kelly <sean@invisibleduck.org> ---
It should.  Not doing so seems pretty broken.  But in this particular kernel it seems like maybe signals are ignored in this situation.

What's happening specifically is that one thread is blocked on the mutex protecting the GC, and another thread holds that lock and is attempting a collection.

I could change this code to use a spin lock instead, but the same problem could crop up with any mutex if I understand the problem correctly.  I'm kind of curious to see whether the Boehm GC deadlocks in a similar situation with this kernel.  It should, since last time I checked it coordinated collections the exact same way on Linux.

--
September 05, 2014
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #21 from Sobirari Muhomori <dfj1esp02@sneakemail.com> ---
This mutex protects various global data like the list of threads in core.thread, not GC.

--
September 05, 2014
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #22 from Sean Kelly <sean@invisibleduck.org> ---
Yes I misspoke somewhat.  The GC acquires the lock to the global thread list while collecting to ensure that everything remains in a consistent state while the collection takes place.  In this case the GC already holds this lock and Thread.start() is blocked on it waiting to add the new thread to the list.

--
September 06, 2014
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #23 from Tomash Brechko <tomash.brechko@gmail.com> ---
I think the order of events is such that pthread_create() is followed by
pthread_kill() from the main thread before the new thread has had any chance to
run.  In this case there are reports that the new thread may miss signals on
Linux:
http://stackoverflow.com/questions/14827509/does-the-new-thread-exist-when-pthread-create-returns

I think the POSIX intent is that pthread_kill() should work once you have the
thread ID, i.e. it's a bug in (some versions of) the Linux kernel.  Maybe the
signal is raised first and then pending signals are cleared (per POSIX) for the
new thread when it starts; or the signal does not become pending because it is
not blocked, but is not delivered either because the thread is not really
running yet.  That said, on my 3.15.10 kernel pthread_kill() after
pthread_create() always works in C, and I don't have a D compiler at the moment
to check whether I can still reproduce the original problem.  OTOH issue 10351
is marked as a duplicate, but it's not clear whether the threads involved there
are newly created.
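
For reference, the kind of C check mentioned above might look roughly like the
sketch below.  This is not the test that was actually run; the function and
handler names are hypothetical, and SIGUSR1 is an arbitrary choice.  It creates
a thread, signals it immediately with pthread_kill(), and reports whether the
handler ever ran.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Thread body: poll the flag for up to roughly 2 seconds. */
static void *wait_for_signal(void *arg)
{
    (void)arg;
    struct timespec ts = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    for (int i = 0; i < 200 && !got_signal; ++i)
        nanosleep(&ts, NULL);   /* returns early (EINTR) when a signal arrives */
    return NULL;
}

/* Hypothetical helper: returns 1 if the signal reached the new thread,
 * 0 if it was missed, -1 if the setup itself failed. */
static int signal_right_after_create(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    got_signal = 0;
    pthread_t tid;
    if (pthread_create(&tid, NULL, wait_for_signal, NULL) != 0)
        return -1;

    /* Signal immediately, most likely before the new thread has been scheduled. */
    int kill_rc = pthread_kill(tid, SIGUSR1);

    int join_rc = pthread_join(tid, NULL);
    assert(join_rc == 0);       /* tid is valid and joinable at this point */
    (void)join_rc;

    return (kill_rc == 0 && got_signal) ? 1 : 0;
}
```

On a kernel that behaves as POSIX seems to intend, signal_right_after_create()
should return 1; a return of 0 would correspond to the missed-signal case
described in the linked reports.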

On a side note, in thread_entryPoint() there's a place:

 // NOTE: isRunning should be set to false after the thread is
 // removed or a double-removal could occur between this
 // function and thread_suspendAll.
 Thread.remove( obj );
 obj.m_isRunning = false;

Note that if thread_suspendAll() is called after remove() but before the
assignment, you will still have a double removal.  This shouldn't be related to
the bug in question, however.

--
September 06, 2014
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #24 from Tomash Brechko <tomash.brechko@gmail.com> ---
Now I see that I was wrong about double removal; please ignore that part.

--
September 06, 2014
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #25 from Sean Kelly <sean@invisibleduck.org> ---
Hrm... at one point thread_entryPoint called Thread.add to add itself, but I think the add was moved to Thread.start at some point to deal with a race.  I had a comment in Thread.start explaining the rationale, but it looks like Thread.start has been heavily edited and the comment is gone.  Either way, having Thread.start call Thread.add *after* pthread_create is totally wrong, as it leaves a window for the thread to exist and be allocating memory but be unknown to the GC.

I think I'll have to roll back thread.d to find my original comments and see how it used to be implemented.  Something was clearly changed here, but there's no longer enough info to tell exactly what.

I've got to say that seeing these and other changes in core.thread without careful documentation of what was changed and why it was done is very frustrating.  There's simply no way to unit test for the existence or lack of deadlocks, and the comments in this module were built up over years of bug fixes to explain each situation and why the code was the way it was.  If someone changes the code in this module they *must* be absolutely sure of what they are doing and document accordingly.
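
To illustrate the window described above, here is a minimal C sketch of one way to close it: hold the thread-list lock across both thread creation and registration, so that a collector taking the same lock sees either neither step or both.  This is not druntime code and not necessarily how Thread.start was or is implemented; list_lock, thread_list, start_registered and snapshot_thread_count are hypothetical stand-ins, and removal from the list on thread exit is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Hypothetical stand-ins for the runtime's global thread list and its lock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t thread_list[MAX_THREADS];
static size_t thread_count = 0;

/* Create a thread and register it while holding list_lock the whole time.
 * Returns 0 on success, or the error number from pthread_create(). */
static int start_registered(pthread_t *out, void *(*fn)(void *), void *arg)
{
    pthread_mutex_lock(&list_lock);
    assert(thread_count < MAX_THREADS);     /* sketch only: fixed capacity */
    int rc = pthread_create(out, NULL, fn, arg);
    if (rc == 0)
        thread_list[thread_count++] = *out; /* registered before the lock drops */
    pthread_mutex_unlock(&list_lock);
    return rc;
}

/* Stand-in for the collector: takes the same lock and returns how many
 * threads it would have to suspend. */
static size_t snapshot_thread_count(void)
{
    pthread_mutex_lock(&list_lock);
    size_t n = thread_count;
    pthread_mutex_unlock(&list_lock);
    return n;
}

/* Trivial thread body used only to exercise the sketch. */
static void *worker(void *arg)
{
    return arg;
}
```

The new thread may begin running before start_registered() releases the lock, but anything that inspects the list under list_lock cannot observe it as running and unregistered.  The cost is that the lock is held for the duration of pthread_create().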

--
September 06, 2014
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #26 from safety0ff.bugz <safety0ff.bugz@gmail.com> ---
(In reply to Sean Kelly from comment #25)
>
> I think I'll have to roll back thread.d to find my original comments and see how it used to be implemented.  Something was clearly changed here, but there's no longer enough info to tell exactly what.

This change? https://github.com/D-Programming-Language/druntime/commit/7a731ffe0869dc

--
September 08, 2014
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #27 from Sean Kelly <sean@invisibleduck.org> ---
Earlier than that.

--
September 10, 2014
https://issues.dlang.org/show_bug.cgi?id=4890

Brad Roberts <braddr@puremagic.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |braddr@puremagic.com

--- Comment #28 from Brad Roberts <braddr@puremagic.com> ---
Might not be related, but for reference, bug 13416

--
October 01, 2014
https://issues.dlang.org/show_bug.cgi?id=4890

--- Comment #29 from badlink <andrea.9940@gmail.com> ---
Also present in DMD 2.067.0-b1.
Stacktrace of the sample program in comment 10: http://pastebin.com/4mudSeEX

--