D1 garbage collector + threads + malloc = garbage?
December 11, 2009 (Bane)
A bug with GC fullCollect() in a multithreaded environment. Grauzone explained it here: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610

A test case that freezes the app (tested on 1.053):


import std.c.stdlib;
import std.stdio;
import std.gc;
import std.thread;
import std.c.time;


class Foo {
  void* p;
  this(){
    synchronized(this.classinfo) {  // same lock the destructor takes
      writefln("malloc begin");
      p = std.c.stdlib.malloc(1024*1024);
      writefln("malloc finished");
    }
  }

  ~this(){
    synchronized(this.classinfo){  // runs from inside the collector
      writefln("free begin");
      std.c.stdlib.free(p);
      writefln("free finished");
    }
  }
}

class MyThread : Thread {
  int run(){
    while(true){
      new Foo;
      msleep(1);
    }
    return 0;
  }
}

void main(){
  for(int i=0; i<10; i++){
    (new MyThread).start;
  }

  while(true){
    writefln("collect begin");
    std.gc.fullCollect;
    writefln("collect finished");
    msleep(1000);
  }
}



Can it be fixed, or at least mentioned somewhere in the docs as a known limitation of the D1 GC? It would save a lot of time for some people (too late for me :).
December 12, 2009 (Leandro Lucarella)
Bane, on December 11 at 06:00 you wrote:
> A bug with GC fullCollect() in a multithreaded environment. Grauzone explained it here: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610
> 
> [...]
> 
> Can it be fixed, or at least mentioned somewhere in the docs as a known limitation of the D1 GC? It would save a lot of time for some people (too late for me :).

The patch is very simple (as pointed out by Grauzone); I guess it should make it into D1 (I hope):

------------------------------------------------------------
diff --git a/phobos/internal/gc/gcx.d b/phobos/internal/gc/gcx.d
index b0d2821..7b1a7a3 100644
--- a/phobos/internal/gc/gcx.d
+++ b/phobos/internal/gc/gcx.d
@@ -2033,8 +2033,6 @@ struct Gcx
            }
        }

-       Thread.resumeAll();
-
        // Free up everything not marked
        debug(COLLECT_PRINTF) printf("\tfree'ing\n");
        size_t freedpages = 0;
@@ -2194,6 +2192,8 @@ struct Gcx
        debug(COLLECT_PRINTF) printf("recovered pages = %d\n", recoveredpages);
        debug(COLLECT_PRINTF) printf("\tfree'd %u bytes, %u pages from %u pools\

+       Thread.resumeAll();
+
        return freedpages + recoveredpages;
     }

------------------------------------------------------------
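In other words, the patch moves Thread.resumeAll() from between the mark and sweep phases to the very end of the collection, so the world stays stopped while the sweep runs finalizers. A rough sketch of the resulting phase ordering (mark/sweep are illustrative stand-ins, not the actual gcx.d source):

// Simplified sketch of Gcx.fullcollect() after the patch:
//
//   Thread.pauseAll();    // stop the world
//   mark();               // trace reachable memory from the roots
//   sweep();              // free unmarked pages, running ~this()
//   Thread.resumeAll();   // moved here by the patch; previously it
//                         // ran *before* the sweep, so finalizers
//                         // executed while the other threads were
//                         // already running again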

I wanted to reproduce it to make a bug report and attach the patch, but
I couldn't. I replaced msleep() with usleep() (and multiplied the values
by 1000) because I'm on Linux, but that's all I changed. I have an old AMD
Athlon 1.8GHz. How do you reproduce it?
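
For reference, the substitution amounts to something like this small shim (a hedged sketch: it assumes D1's std.c.linux.linux declares usleep(), which takes microseconds where msleep() takes milliseconds):

version (linux)
{
    import std.c.linux.linux;   // assumed home of usleep() in D1 Phobos

    // drop-in replacement for the msleep() calls in the test case
    void msleep(uint ms)
    {
        usleep(ms * 1000);      // milliseconds -> microseconds
    }
}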

I want to be sure the bug is present before the patch and fixed after the patch before submitting the patch via Bugzilla.

Thanks!

December 12, 2009 (Sean Kelly)
Leandro Lucarella Wrote:

> Bane, on December 11 at 06:00 you wrote:
> > A bug with GC fullCollect() in a multithreaded environment. Grauzone explained it here: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610

To be perfectly safe, it's a tiny bit trickier than that.  You have to always lock on allocations instead of just locking the collection cycle in single-threaded apps.  I think the reason for this is that once the lock is acquired it must be held until the current thread exits the GC code, and the "synchronized" statement only allows you to lock for the length of the collect() call (because the start and end must be within the same scope).  However, this /may/ actually be a separate bug with the D 1.0 GC.  I really can't recall for sure if this was an issue with collecting when all the other threads were suspended.
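
A minimal sketch of the scoping limitation described above (gcLock and collectUnderLock are hypothetical names, not the actual Phobos internals):

import std.gc;

Object gcLock;   // stand-in for the GC's global lock

static this() { gcLock = new Object; }

void collectUnderLock()
{
    // `synchronized` must begin and end in the same scope, so the
    // lock can cover the fullCollect() call but cannot be acquired
    // here and released later, elsewhere inside the GC code.
    synchronized (gcLock)
    {
        std.gc.fullCollect();
    }
    // Hence the note above: allocations must always take the same
    // lock, so the collector can hold it until it fully exits GC code.
}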
December 12, 2009 (Walter Bright)
Leandro Lucarella wrote:
> I want to be sure the bug is present before the patch and fixed after the
> patch before submiting the patch via Bugzilla.

It's a good fix, I checked in a patch for it.
December 12, 2009 (Walter Bright)
Sean Kelly wrote:
> [...]
> 
> To be perfectly safe, it's a tiny bit trickier than that.  You have
> to always lock on allocations instead of just locking the collection
> cycle in single-threaded apps.  I think the reason for this is that
> once the lock is acquired it must be held until the current thread
> exits the GC code, and the "synchronized" statement only allows you
> to lock for the length of the collect() call (because the start and
> end must be within the same scope).  However, this /may/ actually be
> a separate bug with the D 1.0 GC.  I really can't recall for sure if
> this was an issue with collecting when all the other threads were
> suspended.

Yes, the lock on the gc must still be held until the collection work is completed, but the other threads can be unfrozen once the mark phase is complete.
December 12, 2009 (Bane)
Leandro Lucarella Wrote:

> [...]
> 
> I wanted to reproduce it to make a bug report and attach the patch, but
> I couldn't. I replaced msleep() with usleep() (and multiplied the values
> by 1000) because I'm on Linux, but that's all I changed. I have an old AMD
> Athlon 1.8GHz. How do you reproduce it?

The home computer I tested it on is WinXP with a Phenom quad core, but the bug is also confirmed on CentOS 5.2 + Intel dual core and CentOS 5.2 + Intel Xeon.

The bug has been present for a long time, that's for sure. It took me a while to track it down because I thought it was just me abusing the keyboard, not a bug in Phobos. The worst thing about this bug is that there is no output or exception when it happens - just that dreaded deadlock :)

One thing I noted - if you reduce the memory requested in that malloc call from 1 MB to 1 KB or less (or increase the sleep time between calls), the chances of the bug occurring drop drastically, so in most applications this will probably never happen.
December 12, 2009 (Leandro Lucarella)
Walter Bright, on December 11 at 20:51 you wrote:
> Leandro Lucarella wrote:
> >I want to be sure the bug is present before the patch and fixed after the patch before submitting the patch via Bugzilla.
> 
> It's a good fix, I checked in a patch for it.

Great, thanks!

December 12, 2009 (Leandro Lucarella)
Leandro Lucarella, on December 12 at 11:18 you wrote:
> Walter Bright, on December 11 at 20:51 you wrote:
> > Leandro Lucarella wrote:
> > >I want to be sure the bug is present before the patch and fixed after the patch before submitting the patch via Bugzilla.
> > 
> > It's a good fix, I checked in a patch for it.
> 
> Great, thanks!

If you could add a small description of what the commit is about, besides pointing to the NG discussion/Bugzilla number, as you are doing with DMD now, it would be really great! =)

December 12, 2009 (Sean Kelly)
Walter Bright Wrote:

> [...]
> 
> Yes, the lock on the gc must still be held until the collection work is completed, but the other threads can be unfrozen once the mark phase is complete.

Hm... so if a dtor creates a new thread, then that thread could enter and "lock" the GC while the first thread is still completing its allocation.  Couldn't this cause problems?
December 12, 2009 (Bane)
> I wanted to reproduce it to make a bug report and attach the patch, but
> I couldn't. I replaced msleep() with usleep() (and multiplied the values
> by 1000) because I'm on Linux, but that's all I changed. I have an old AMD
> Athlon 1.8GHz. How do you reproduce it?
> 
> I want to be sure the bug is present before the patch and fixed after the patch before submitting the patch via Bugzilla.
> 
> Thanks!

I forgot: for Linux, just remove the msleep lines completely. It will deadlock soon after program start. It seems that Linux is more resistant to this bug than Windows.