September 07, 2008 I think race condition exists in tango & phobos gc code | ||||
---|---|---|---|---|
| ||||
I have a program written in D that runs 24/7. I found that it blocks once or twice a week (without using any CPU). Whenever I use strace to check whether it is blocked in a system call, it continues running (strange?), and I can also resume it with kill -SIGUSR2, so I think this situation may be associated with the GC.
But why strace? I checked the strace code and found that it causes SIGSTOP to be sent, and I found that SIGSTOP cannot be blocked by a signal mask.
Then I checked the library, and I think the problem may be caused by the following execution order:

thread A:                                thread B:
fullcollect
thread_suspendAll
suspend                                  thread_suspendHandler
                                         sem_post( &suspendCount );
ret from sem_wait( &suspendCount );
do collect
thread_resumeAll
!! this signal would be lost
pthread_kill( t.m_addr, SIGUSR2 )
                                         sigsuspend( &sigres );

Thread B would block because the SIGUSR2 is lost.
Then I checked the Phobos code, and the code is similar.
Now I'm trying to use a semaphore to do the resume, and I will check whether my program runs correctly.
Any suggestions?
|
September 08, 2008 Re: I think race condition exists in tango & phobos gc code | ||||
---|---|---|---|---|
| ||||
Posted in reply to redsea | redsea wrote:
> I have a program written in D that runs 24/7. I found that it blocks once or twice a week (without using any CPU). Whenever I use strace to check whether it is blocked in a system call, it continues running (strange?),
>
> and I can also resume it with kill -SIGUSR2, so I think this situation may be associated with the GC. But why strace? I checked the strace code and found that it causes SIGSTOP to be sent, and I found that SIGSTOP cannot be blocked by a signal mask.
>
> Then I checked the library, and I think the problem may be caused by the following execution order:
>
> thread A:                                thread B:
> fullcollect
> thread_suspendAll
> suspend                                  thread_suspendHandler
>                                          sem_post( &suspendCount );
>
> ret from sem_wait( &suspendCount );
> do collect
> thread_resumeAll
> !! this signal would be lost
> pthread_kill( t.m_addr, SIGUSR2 )
>                                          sigsuspend( &sigres );
>
> Thread B would block because the SIGUSR2 is lost.
SIGUSR2 shouldn't be lost. Tango sets sa_mask for the signal handlers to tell the OS to block all signals while the handler is processing. The call to sigsuspend is supposed to manually change that for the signals requested.
> Then I checked the Phobos code, and the code is similar.
>
> Now I'm trying to use a semaphore to do the resume, and I will check whether my program runs correctly.
Thanks, please do. If it really is a problem I'd be happy to change it.
Sean
|
September 09, 2008 Re: I think race condition exists in tango & phobos gc code | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | Sean Kelly Wrote:
> SIGUSR2 shouldn't be lost. Tango sets sa_mask for the signal handlers to tell the OS to block all signals while the handler is processing. The call to sigsuspend is supposed to manually change that for the signals requested.
>
> > Then I checked the Phobos code, and the code is similar.
> >
> > Now I'm trying to use a semaphore to do the resume, and I will check whether my program runs correctly.
>
> Thanks, please do. If it really is a problem I'd be happy to change it.
I wrote a small program that calls kill and sigsuspend in the order I mentioned before, and the signal is not lost. So the real cause must be hidden deeper.
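A minimal sketch of that kind of test is shown here in C, since the calls involved are POSIX ones. It is not the program referred to above and not Tango's code: the names `suspend_handler`, `resume_handler`, and `demo_early_resume` are made up for illustration, SIGUSR1 is assumed as the suspend signal, and `raise` stands in for `pthread_kill` so the example stays single-threaded. It installs both handlers with a full `sa_mask`, sends the "resume" signal from inside the suspend handler before `sigsuspend` is reached, and checks that the signal stays pending and is delivered once `sigsuspend` unblocks it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t resume_runs = 0;     /* times the SIGUSR2 handler ran */
static volatile sig_atomic_t delivered_early = 0; /* 1 if SIGUSR2 ran before sigsuspend */

/* Stand-in for the resume handler: it only needs to run so sigsuspend returns. */
static void resume_handler(int sig)
{
    (void)sig;
    resume_runs++;
}

/* Stand-in for the suspend handler; sa_mask keeps SIGUSR2 blocked while it runs. */
static void suspend_handler(int sig)
{
    sigset_t wait_mask;
    (void)sig;

    /* Simulate the "resume" signal arriving before sigsuspend is reached. */
    raise(SIGUSR2);
    delivered_early = (resume_runs != 0); /* blocked, so it should only be pending */

    /* Atomically unblock SIGUSR2 and wait; a pending SIGUSR2 is delivered here. */
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIGUSR2);
    sigsuspend(&wait_mask);
}

/* Returns 1 if the early signal was kept pending and then delivered exactly once. */
int demo_early_resume(void)
{
    struct sigaction sa;
    int rc;

    resume_runs = 0;
    delivered_early = 0;

    memset(&sa, 0, sizeof sa);
    sigfillset(&sa.sa_mask);          /* block all signals while a handler runs */
    sa.sa_handler = resume_handler;
    rc = sigaction(SIGUSR2, &sa, NULL);
    assert(rc == 0);
    sa.sa_handler = suspend_handler;
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    raise(SIGUSR1);                   /* "suspend": returns after suspend_handler finishes */

    return (!delivered_early && resume_runs == 1) ? 1 : 0;
}
```

If `demo_early_resume()` returns 1, the signal sent before `sigsuspend` was not lost, which matches the observation above. The sketch only covers this one interleaving in a single thread; it says nothing about other paths in a real collector.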
The version that uses a semaphore is finished, but I have to wait for the administrator to test and upload the program.
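For readers wondering what a semaphore-based resume could look like, here is a rough sketch in C. It is an assumption about the general shape, not the version mentioned above and not Tango or Phobos code: `suspend_ack`, `resume_sem`, `worker_ready`, and `demo_semaphore_resume` are hypothetical names, and SIGUSR1 is assumed as the suspend signal. One caveat: POSIX lists `sem_post` as async-signal-safe but not `sem_wait`, so waiting on a semaphore inside a handler relies on platform behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>

static sem_t suspend_ack;   /* posted by the suspended thread */
static sem_t resume_sem;    /* posted by the collecting thread to resume */
static sem_t worker_ready;  /* posted once the worker is running */
static volatile sig_atomic_t worker_resumed = 0;

/* sem_wait that retries if it is interrupted by a signal. */
static void sem_wait_retry(sem_t *s)
{
    while (sem_wait(s) == -1 && errno == EINTR) { }
}

/* Suspend handler: acknowledge, then park on a semaphore instead of sigsuspend. */
static void suspend_handler(int sig)
{
    int saved_errno = errno;
    (void)sig;
    sem_post(&suspend_ack);
    sem_wait_retry(&resume_sem);  /* caveat: not async-signal-safe per POSIX */
    worker_resumed = 1;
    errno = saved_errno;
}

/* Stand-in for an application thread: spins until it has been resumed once. */
static void *worker(void *arg)
{
    (void)arg;
    sem_post(&worker_ready);
    while (!worker_resumed)
        sched_yield();
    return NULL;
}

/* Suspends and resumes one worker; returns 1 if the worker was resumed. */
int demo_semaphore_resume(void)
{
    pthread_t t;
    struct sigaction sa;

    worker_resumed = 0;
    sem_init(&suspend_ack, 0, 0);
    sem_init(&resume_sem, 0, 0);
    sem_init(&worker_ready, 0, 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = suspend_handler;
    sigfillset(&sa.sa_mask);      /* block all signals while the handler runs */
    sigaction(SIGUSR1, &sa, NULL);

    pthread_create(&t, NULL, worker, NULL);
    sem_wait_retry(&worker_ready);

    pthread_kill(t, SIGUSR1);     /* suspend the worker */
    sem_wait_retry(&suspend_ack); /* wait until it has acknowledged */
    /* ... the collection would run here ... */
    sem_post(&resume_sem);        /* resume: the count persists even if posted early */

    pthread_join(t, NULL);
    sem_destroy(&suspend_ack);
    sem_destroy(&resume_sem);
    sem_destroy(&worker_ready);
    return worker_resumed;
}
```

The point of this design is that a semaphore keeps its count, so a resume posted before the worker reaches its wait still takes effect. On older toolchains the example may need to be built with `-pthread`.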
I will do more checking.
Thanks for your opinions.
|
September 11, 2008 I'm wrong. Re: I think race condition exists in tango & phobos gc code | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | Sean Kelly Wrote:
>
> SIGUSR2 shouldn't be lost. Tango sets sa_mask for the signal handlers to tell the OS to block all signals while the handler is processing. The call to sigsuspend is supposed to manually change that for the signals requested.
I was wrong.
The program actually has two components, a client and a server, and both are multi-threaded. I was told that both components had the same problem.
After checking, I found that the client version is correct and runs stably, so the bug must have nothing to do with Tango.
Sorry!
|
Copyright © 1999-2021 by the D Language Foundation