Thread overview
Race conditions in critical.c?
Sep 20, 2008
Mike Hearn
Re: Russian D community: dprogramming.ru
Sep 22, 2008
Janderson
Sep 25, 2008
Mike Hearn
Sep 25, 2008
Mike Hearn
Sep 26, 2008
Fawzi Mohamed
Sep 30, 2008
Bartosz Milewski
Oct 01, 2008
Sean Kelly
Oct 01, 2008
Sean Kelly
Oct 01, 2008
Fawzi Mohamed
September 20, 2008
I was reading the code to the phobos runtime library and saw the following code in critical.c:

void _d_criticalenter(D_CRITICAL_SECTION *dcs)
{
    if (!dcs->next)
    {
        EnterCriticalSection(&critical_section.cs);
        if (!dcs->next) // if, in the meantime, another thread didn't set it
        {
            dcs->next = dcs_list;
                 <----
            dcs_list = dcs;
            InitializeCriticalSection(&dcs->cs);
        }
        LeaveCriticalSection(&critical_section.cs);
    }
    EnterCriticalSection(&dcs->cs);
}

There's a race condition here - consider a thread that is suspended at the marked point. Another thread then attempts to lock the critsec. It sees that dcs->next is set and calls EnterCriticalSection on an uninitialized CRITICAL_SECTION.

The Linux code has the same problem, and then another one as well - the initialization of the "critsec lock" is itself lazy and is not protected by anything at all. The result is potentially multiple initializations of that lock, and because atexit does not coalesce multiple registrations, a potentially catastrophic multiple invocation of _STD_critical_term.

It appears this code (for both platforms) needs to be scrapped and rewritten. I'd start by checking that lazy initialization of critsecs is actually needed - do D compilers really generate tons of potentially unnecessary D_CRITICAL_SECTIONs?

thanks -mike
September 22, 2008
Digited wrote:
> thunderbird "rules"... every post option does the same thing... sorry. :(

You can cancel/remove your own messages in Thunderbird if you wish :)

-Joel
September 25, 2008
Mike Hearn Wrote:

> I was reading the code to the phobos runtime library and saw the following code in critical.c:
> 
> void _d_criticalenter(D_CRITICAL_SECTION *dcs)
> {
>     if (!dcs->next)
>     {
>         EnterCriticalSection(&critical_section.cs);
>         if (!dcs->next) // if, in the meantime, another thread didn't set it
>         {
>             dcs->next = dcs_list;
>                  <----
>             dcs_list = dcs;
>             InitializeCriticalSection(&dcs->cs);
>         }
>         LeaveCriticalSection(&critical_section.cs);
>     }
>     EnterCriticalSection(&dcs->cs);
> }
> 
> There's a race condition here - consider a thread that is suspended at the marked point. Another thread then attempts to lock the critsec. It sees that dcs->next is set and calls EnterCriticalSection on an uninitialized CRITICAL_SECTION.
> 
> The Linux code has the same problem, and then another one as well - the initialization of the "critsec lock" is itself lazy and is not protected by anything at all. The result is potentially multiple initializations of that lock, and because atexit does not coalesce multiple registrations, a potentially catastrophic multiple invocation of _STD_critical_term.
> 
> It appears this code (for both platforms) needs to be scrapped and rewritten. I'd start by checking that lazy initialization of critsecs is actually needed - do D compilers really generate tons of potentially unnecessary D_CRITICAL_SECTIONs?
> 
> thanks -mike

September 25, 2008
Sorry, hit return too early. So is anybody going to take a look at this? I just noticed there is a Bugzilla, should I file a bug?

Mike Hearn  Wrote:

> Mike Hearn Wrote:
> 
> > I was reading the code to the phobos runtime library and saw the following code in critical.c:
> > 
> > void _d_criticalenter(D_CRITICAL_SECTION *dcs)
> > {
> >     if (!dcs->next)
> >     {
> >         EnterCriticalSection(&critical_section.cs);
> >         if (!dcs->next) // if, in the meantime, another thread didn't set it
> >         {
> >             dcs->next = dcs_list;
> >                  <----
> >             dcs_list = dcs;
> >             InitializeCriticalSection(&dcs->cs);
> >         }
> >         LeaveCriticalSection(&critical_section.cs);
> >     }
> >     EnterCriticalSection(&dcs->cs);
> > }
> > 
> > There's a race condition here - consider a thread that is suspended at the marked point. Another thread then attempts to lock the critsec. It sees that dcs->next is set and calls EnterCriticalSection on an uninitialized CRITICAL_SECTION.
> > 
> > The Linux code has the same problem, and then another one as well - the initialization of the "critsec lock" is itself lazy and is not protected by anything at all. The result is potentially multiple initializations of that lock, and because atexit does not coalesce multiple registrations, a potentially catastrophic multiple invocation of _STD_critical_term.
> > 
> > It appears this code (for both platforms) needs to be scrapped and rewritten. I'd start by checking that lazy initialization of critsecs is actually needed - do D compilers really generate tons of potentially unnecessary D_CRITICAL_SECTIONs?
> > 
> > thanks -mike
> 

September 26, 2008
On 2008-09-25 23:23:10 +0200, Mike Hearn  <mike@plan99.net> said:

> Sorry, hit return too early. So is anybody going to take a look at this? I just noticed there is a Bugzilla, should I file a bug?

You are right dcs->next should be set only after the initialization is performed.
do submit a bug on bugzilla
	http://d.puremagic.com/issues/

Well spotted!

Fawzi
> 
> Mike Hearn  Wrote:
> 
>> Mike Hearn Wrote:
>> 
>>> I was reading the code to the phobos runtime library and saw the following code in critical.c:
>>> 
>>> void _d_criticalenter(D_CRITICAL_SECTION *dcs)
>>> {
>>> if (!dcs->next)
>>> {
>>> EnterCriticalSection(&critical_section.cs);
>>> if (!dcs->next) // if, in the meantime, another thread didn't set it
>>> {
>>> dcs->next = dcs_list;
>>> <----
>>> dcs_list = dcs;
>>> InitializeCriticalSection(&dcs->cs);
>>> }
>>> LeaveCriticalSection(&critical_section.cs);
>>> }
>>> EnterCriticalSection(&dcs->cs);
>>> }
>>> 
>>> There's a race condition here - consider a thread that is suspended at the marked point. Another thread then attempts to lock the critsec. It sees that dcs->next is set and calls EnterCriticalSection on an uninitialized CRITICAL_SECTION.
>>> 
>>> The Linux code has the same problem, and then another one as well - the initialization of the "critsec lock" is itself lazy and is not protected by anything at all. The result is potentially multiple initializations of that lock, and because atexit does not coalesce multiple registrations, a potentially catastrophic multiple invocation of _STD_critical_term.
>>> 
>>> It appears this code (for both platforms) needs to be scrapped and rewritten. I'd start by checking that lazy initialization of critsecs is actually needed - do D compilers really generate tons of potentially unnecessary D_CRITICAL_SECTIONs?
>>> 
>>> thanks -mike


September 30, 2008
When I was rewriting monitor.c in D, I noticed the race in critical.c but decided not to work on it. Instead I've been trying to convince Walter to scrap critical.c altogether. Here's why:

How do you interpret the following fragment of code? (It's a simplified version of actual code written by a D programmer.)

class Foo {
   int i, j;
   void a() {
      synchronized {
         int x = ++i;
         j = x;
      }
   }
   void b() {
      synchronized {
         int x = --i;
         j = x;
      }
   }
}

Does it guarantee that i==j? If you say yes, you subscribe to one of the schools of thought that argue that "synchronized" in this context either means synchronized(this) or synchronized(this.classinfo), or synchronized(SomeHiddenGlobalSingleton).

Wrong! The code in critical.c creates a separate new critical section for every synchronized block of code. So a() and b() use _different_ locks and their execution may interleave.

If you think this is bizarre bordering on reckless I'm with you.

Mike Hearn wrote:
>>>
>>> I was reading the code to the phobos runtime library and saw the following code in critical.c:
>>>
>>> void _d_criticalenter(D_CRITICAL_SECTION *dcs)
>>> {
>>>     if (!dcs->next)
>>>     {
>>>         EnterCriticalSection(&critical_section.cs);
>>>         if (!dcs->next) // if, in the meantime, another thread didn't set it
>>>         {
>>>             dcs->next = dcs_list;
>>>                  <----               dcs_list = dcs;                  InitializeCriticalSection(&dcs->cs);
>>>         }
>>>         LeaveCriticalSection(&critical_section.cs);
>>>     }
>>>     EnterCriticalSection(&dcs->cs);
>>> }
>>>
>>> There's a race condition here - consider a thread that is suspended at the marked point. Another thread then attempts to lock the critsec. It sees that dcs->next is set and calls EnterCriticalSection on an uninitialized CRITICAL_SECTION.
>>>
>>> The Linux code has the same problem, and then another one as well - the initialization of the "critsec lock" is itself lazy and is not protected by anything at all. The result is potentially multiple initializations of that lock, and because atexit does not coalesce multiple registrations, a potentially catastrophic multiple invocation of _STD_critical_term.
>>>
>>> It appears this code (for both platforms) needs to be scrapped and rewritten. I'd start by checking that lazy initialization of critsecs is actually needed - do D compilers really generate tons of potentially unnecessary D_CRITICAL_SECTIONs?
>>>
>>> thanks -mike
> 
October 01, 2008
Bartosz Milewski wrote:
> 
> If you think this is bizarre bordering on reckless I'm with you.

Yup, it's completely useless given that POSIX only provides guarantees for visibility, etc, for threads entering the *same* mutex.  So one couldn't even argue that this approach would work as some strange sort of memory barrier.


Sean
October 01, 2008
Sean Kelly wrote:
> Bartosz Milewski wrote:
>>
>> If you think this is bizarre bordering on reckless I'm with you.
> 
> Yup, it's completely useless given that POSIX only provides guarantees for visibility, etc, for threads entering the *same* mutex.  So one couldn't even argue that this approach would work as some strange sort of memory barrier.

Oh, and I haven't fixed critical.c for the same reason.  Given how they work, does *anyone* use this feature?


Sean
October 01, 2008
On 2008-10-01 06:49:11 +0200, Sean Kelly <sean@invisibleduck.org> said:

> Sean Kelly wrote:
>> Bartosz Milewski wrote:
>>> 
>>> If you think this is bizarre bordering on reckless I'm with you.
>> 
>> Yup, it's completely useless given that POSIX only provides guarantees for visibility, etc, for threads entering the *same* mutex.  So one couldn't even argue that this approach would work as some strange sort of memory barrier.
> 
> Oh, and I haven't fixed critical.c for the same reason.  Given how they work, does *anyone* use this feature?
> 
> 
> Sean

I think that it should either throw an exception "unsupported" or be corrected.

it is possible to use it correctly if you have a function that is called in parallel with only one critical section, or when the whole function is synchronized.

I checked my code and I prefer explicit locks, so indeed I haven't used, but both tango and phobos have some synchronized statements.

Fawzi