June 06, 2004
What I have here is only a theory, so if I'm completely off base just beat me with a banana and I'll shut up ;) I've been seeing some hangs in some of the multithreaded code I've been working on, and browsing through the stack traces and the Phobos code suggests the following.

Executive Summary: A thread that is already waiting will not respond to
Thread.pauseAll(), causing Thread.pauseAll() to never return (stuck in
sem_wait). The garbage collector calls Thread.pauseAll(), triggering the
problem.

The longer version:

My suspicion right now is this: when I call pthread_cond_wait, it suspends the current process, using sigsuspend (which ignores all signals but those specified). When Thread.pauseAll suspends a thread, it does so by sending SIGUSR1. If the thread is already in a wait state, the thread will *stay suspended* (i.e. it will ignore SIGUSR1, because it's waiting for SIGUSR2).

Then, Thread.pauseAll() will sit in sem_wait(), waiting for all the
threads to acknowledge being suspended. However, my user thread - already
waiting - doesn't acknowledge, because it ignored the signal (being already
suspended).

This means that Thread.pauseAll() never completes, and all threads are left in a wait state that is impossible for them to leave.

Any thoughts?

Mike Swieton
__
Freedom lies in being bold.
	- Robert Frost

June 06, 2004
Well, I've been unable to duplicate this in a small example, so it's probably not a bug. Not in DM code, anyway ;)
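
For reference, here is a minimal sketch in C of the kind of small test that can check the theory. It is not the Phobos code, only an imitation of the handshake as described in the original post: one thread blocks in pthread_cond_wait, another sends it SIGUSR1 with pthread_kill, and the signal handler acknowledges with sem_post. The names (thread_acks_while_waiting, on_sigusr1, waiter, ack_sem, ready_sem) are made up for illustration, and a 5 second sem_timedwait is used in place of a plain sem_wait so that the test reports a missing acknowledgement instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical names throughout; this only imitates the handshake described above. */
static sem_t ack_sem;    /* posted by the signal handler ("I got the pause request") */
static sem_t ready_sem;  /* posted by the waiter just before it starts waiting */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int done;

static void on_sigusr1(int sig)
{
    (void)sig;
    sem_post(&ack_sem);  /* sem_post is async-signal-safe */
}

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    sem_post(&ready_sem);
    while (!done)                        /* loop also covers spurious wakeups */
        pthread_cond_wait(&cond, &mtx);
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Returns 1 if the waiting thread acknowledged SIGUSR1, 0 on timeout, -1 on setup error. */
int thread_acks_while_waiting(void)
{
    struct sigaction sa;
    struct timespec deadline;
    pthread_t t;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    if (sem_init(&ack_sem, 0, 0) != 0 || sem_init(&ready_sem, 0, 0) != 0)
        return -1;
    done = 0;
    if (pthread_create(&t, NULL, waiter, NULL) != 0)
        return -1;

    /* Wait until the waiter holds the mutex and is about to wait. */
    while (sem_wait(&ready_sem) != 0 && errno == EINTR) { }
    /* The waiter only releases the mutex inside pthread_cond_wait, so once
       we can take it, the waiter has entered the wait. */
    pthread_mutex_lock(&mtx);
    pthread_mutex_unlock(&mtx);

    if (pthread_kill(t, SIGUSR1) != 0)
        return -1;

    /* Wait for the acknowledgement, but give up after 5 seconds. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 5;
    do {
        rc = sem_timedwait(&ack_sem, &deadline);
    } while (rc != 0 && errno == EINTR);

    /* Release the waiter and clean up. */
    pthread_mutex_lock(&mtx);
    done = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mtx);
    pthread_join(t, NULL);
    sem_destroy(&ack_sem);
    sem_destroy(&ready_sem);
    return rc == 0 ? 1 : 0;
}
```

Call thread_acks_while_waiting() from main and build with -pthread. A result of 1 means the thread ran its SIGUSR1 handler even though it was blocked in pthread_cond_wait, which would argue against the theory on that platform; a result of 0 would be consistent with it. The outcome may well depend on the threading library in use, so a result on one system says nothing certain about another.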

On Sat, 05 Jun 2004 21:22:59 -0400, Mike Swieton wrote:

> What I have here is only a theory, so if I'm completely off base just beat me with a banana and I'll shut up ;) I've been seeing some hangs in some of the multithreaded code I've been working on, and my browsing through stack traces and phobos code suggests this.
> 
> Executive Summary: A thread that is already waiting will not respond to
> Thread.pauseAll(), causing Thread.pauseAll() to never return (stuck in
> sem_wait). The garbage collector calls Thread.pauseAll(), triggering the
> problem.
> 
> The longer version:
> 
> My suspicion right now is this: when I call pthread_cond_wait, it suspends the current process, using sigsuspend (which ignores all signals but those specified). When Thread.pauseAll suspends a thread, it does so by sending SIGUSR1. If the thread is already in a wait state, the thread will *stay suspended* (i.e. it will ignore SIGUSR1, because it's waiting for SIGUSR2).
> 
> Then, Thread.pauseAll() will sit in sem_wait(), waiting for all the
> threads to acknowledge being suspended. However, my user thread - already
> waiting - doesn't acknowledge, because it ignored the signal (being already
> suspended).
> 
> This means that Thread.pauseAll() never completes, and all threads are left in a wait state that is impossible for them to leave.
> 
> Any thoughts?
> 
> Mike Swieton
> __
> Freedom lies in being bold.
> 	- Robert Frost

-- 

Mike Swieton
__
But it is vital to remember that information - in the sense of raw data - is
not knowledge; that knowledge is not wisdom; and that wisdom is not foresight.
	- Sir Arthur C Clarke