Thread overview | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
December 11, 2009 D1 garbage collector + threads + malloc = garbage? | ||||
---|---|---|---|---|
| ||||
Bug with GC fullCollect() in multithreaded environment. Grauzone explained it here http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610 Test case that freezes app (tested on 1.053): import std.c.stdlib; import std.stdio; import std.gc; import std.thread; import std.c.time; class Foo { void* p; this(){ synchronized(this.classinfo) { writefln("malloc begin"); p = std.c.stdlib.malloc(1024*1024); writefln("malloc finished"); } } ~this(){ synchronized(this.classinfo){ writefln("free begin"); std.c.stdlib.free(p); writefln("free finished"); } } } class MyThread : Thread { int run(){ while(true){ new Foo; msleep(1); } return 0; } } void main(){ for(int i=0; i<10; i++){ (new MyThread).start; } while(true){ writefln("collect begin"); std.gc.fullCollect; writefln("collect finished"); msleep(1000); } } Can it be fixed or adleast mentioned somewhere in docs as known limitation of D1 GC? It would save a lot of time to some people (too late for me :). |
December 12, 2009 Re: D1 garbage collector + threads + malloc = garbage? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Bane | Bane, el 11 de diciembre a las 06:00 me escribiste: > Bug with GC fullCollect() in multithreaded environment. Grauzone explained it here http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610 > > Test case that freezes app (tested on 1.053): > > > import std.c.stdlib; > import std.stdio; > import std.gc; > import std.thread; > import std.c.time; > > > class Foo { > void* p; > this(){ > synchronized(this.classinfo) { > writefln("malloc begin"); > p = std.c.stdlib.malloc(1024*1024); > writefln("malloc finished"); > } > } > > ~this(){ > synchronized(this.classinfo){ > writefln("free begin"); > std.c.stdlib.free(p); > writefln("free finished"); > } > } > } > > class MyThread : Thread { > int run(){ > while(true){ > new Foo; > msleep(1); > } > return 0; > } > } > > void main(){ > for(int i=0; i<10; i++){ > (new MyThread).start; > } > > while(true){ > writefln("collect begin"); > std.gc.fullCollect; > writefln("collect finished"); > msleep(1000); > } > } > > > > Can it be fixed or adleast mentioned somewhere in docs as known limitation of D1 GC? It would save a lot of time to some people (too late for me :). The patch is very simple (as pointed by Grauzone), I guess it should make it into D1 (I hope): ------------------------------------------------------------ diff --git a/phobos/internal/gc/gcx.d b/phobos/internal/gc/gcx.d index b0d2821..7b1a7a3 100644 --- a/phobos/internal/gc/gcx.d +++ b/phobos/internal/gc/gcx.d @@ -2033,8 +2033,6 @@ struct Gcx } } - Thread.resumeAll(); - // Free up everything not marked debug(COLLECT_PRINTF) printf("\tfree'ing\n"); size_t freedpages = 0; @@ -2194,6 +2192,8 @@ struct Gcx debug(COLLECT_PRINTF) printf("recovered pages = %d\n", recoveredpages); debug(COLLECT_PRINTF) printf("\tfree'd %u bytes, %u pages from %u pools\ + Thread.resumeAll(); + return freedpages + recoveredpages; } ------------------------------------------------------------ I wanted to reproduce it to make a bug report to attach the patch but I couldn't. I replaced msleep() with usleep() (and multiplied the values by 1000) because I'm in Linux, but that's all I changed. I have an old AMD Athlon 1.8GHz. How do you reproduce it? I want to be sure the bug is present before the patch and fixed after the patch before submiting the patch via Bugzilla. Thanks! -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Nos retiramos hasta la semana que viene reflexionando sobre nuestras vidas: "Qué vida de mier'... Qué vida de mier'!" -- Sidharta Kiwi |
December 12, 2009 Re: D1 garbage collector + threads + malloc = garbage? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Leandro Lucarella | Leandro Lucarella Wrote:
> Bane, el 11 de diciembre a las 06:00 me escribiste:
> > Bug with GC fullCollect() in multithreaded environment. Grauzone explained it here http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610
To be perfectly safe, it's a tiny bit trickier than that. You have to always lock on allocations instead of just locking the collection cycle in single-threaded apps. I think the reason for this is that once the lock is acquired it must be held until the current thread exits the GC code, and the "synchronized" statement only allows you to lock for the length of the collect() call (because the start and end must be within the same scope). However, this /may/ actually be a separate bug with the D 1.0 GC. I really can't recall for sure if this was an issue with collecting when all the other threads were suspended.
|
December 12, 2009 Re: D1 garbage collector + threads + malloc = garbage? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Leandro Lucarella | Leandro Lucarella wrote:
> I want to be sure the bug is present before the patch and fixed after the
> patch before submiting the patch via Bugzilla.
It's a good fix, I checked in a patch for it.
|
December 12, 2009 Re: D1 garbage collector + threads + malloc = garbage? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | Sean Kelly wrote:
> Leandro Lucarella Wrote:
>
>> Bane, el 11 de diciembre a las 06:00 me escribiste:
>>> Bug with GC fullCollect() in multithreaded environment. Grauzone
>>> explained it here
>>> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610
>>>
>
> To be perfectly safe, it's a tiny bit trickier than that. You have
> to always lock on allocations instead of just locking the collection
> cycle in single-threaded apps. I think the reason for this is that
> once the lock is acquired it must be held until the current thread
> exits the GC code, and the "synchronized" statement only allows you
> to lock for the length of the collect() call (because the start and
> end must be within the same scope). However, this /may/ actually be
> a separate bug with the D 1.0 GC. I really can't recall for sure if
> this was an issue with collecting when all the other threads were
> suspended.
Yes, the lock on the gc must still be held until the collection work is completed, but the other threads can be unfrozen once the mark phase is complete.
|
December 12, 2009 Re: D1 garbage collector + threads + malloc = garbage? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Leandro Lucarella | Leandro Lucarella Wrote:
> Bane, el 11 de diciembre a las 06:00 me escribiste:
> > Bug with GC fullCollect() in multithreaded environment. Grauzone explained it here http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610
> >
> > Test case that freezes app (tested on 1.053):
> >
> >
> > import std.c.stdlib;
> > import std.stdio;
> > import std.gc;
> > import std.thread;
> > import std.c.time;
> >
> >
> > class Foo {
> > void* p;
> > this(){
> > synchronized(this.classinfo) {
> > writefln("malloc begin");
> > p = std.c.stdlib.malloc(1024*1024);
> > writefln("malloc finished");
> > }
> > }
> >
> > ~this(){
> > synchronized(this.classinfo){
> > writefln("free begin");
> > std.c.stdlib.free(p);
> > writefln("free finished");
> > }
> > }
> > }
> >
> > class MyThread : Thread {
> > int run(){
> > while(true){
> > new Foo;
> > msleep(1);
> > }
> > return 0;
> > }
> > }
> >
> > void main(){
> > for(int i=0; i<10; i++){
> > (new MyThread).start;
> > }
> >
> > while(true){
> > writefln("collect begin");
> > std.gc.fullCollect;
> > writefln("collect finished");
> > msleep(1000);
> > }
> > }
> >
> >
> >
> > Can it be fixed or adleast mentioned somewhere in docs as known limitation of D1 GC? It would save a lot of time to some people (too late for me :).
>
> The patch is very simple (as pointed by Grauzone), I guess it should make it
> into D1 (I hope):
>
> ------------------------------------------------------------
> diff --git a/phobos/internal/gc/gcx.d b/phobos/internal/gc/gcx.d index b0d2821..7b1a7a3 100644
> --- a/phobos/internal/gc/gcx.d
> +++ b/phobos/internal/gc/gcx.d
> @@ -2033,8 +2033,6 @@ struct Gcx
> }
> }
>
> - Thread.resumeAll();
> -
> // Free up everything not marked
> debug(COLLECT_PRINTF) printf("\tfree'ing\n");
> size_t freedpages = 0;
> @@ -2194,6 +2192,8 @@ struct Gcx
> debug(COLLECT_PRINTF) printf("recovered pages = %d\n", recoveredpages);
> debug(COLLECT_PRINTF) printf("\tfree'd %u bytes, %u pages from %u pools\
>
> + Thread.resumeAll();
> +
> return freedpages + recoveredpages;
> }
>
> ------------------------------------------------------------
>
> I wanted to reproduce it to make a bug report to attach the patch but
> I couldn't. I replaced msleep() with usleep() (and multiplied the values
> by 1000) because I'm in Linux, but that's all I changed. I have an old AMD
> Athlon 1.8GHz. How do you reproduce it?
>
> I want to be sure the bug is present before the patch and fixed after the patch before submiting the patch via Bugzilla.
>
> Thanks!
>
> --
> Leandro Lucarella (AKA luca) http://llucax.com.ar/
> ----------------------------------------------------------------------
> GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
> ----------------------------------------------------------------------
> Nos retiramos hasta la semana que viene reflexionando sobre nuestras
> vidas: "Qué vida de mier'... Qué vida de mier'!"
> -- Sidharta Kiwi
Home computer I tested it in is winxp with phenom quad core, but this bug is confirmed with centos 5.2 + intel dual core and centos 5.2 + intel xeon.
Bug is present for a long time, thats for sure, It took me a while to track it because I thought its just me abusing keyboard, not a bug in phobos. Worst thing with this bug is that there is no output or exception when it happens - just that dreaded deadlock :)
Once thing I noted - if you reduce memory required in that malloc call from 1 MB to 1 KB or less (or increase sleep time between calls), chances for bug to occur drop drastically, so in most applications this probably will never happen.
|
December 12, 2009 Re: D1 garbage collector + threads + malloc = garbage? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Walter Bright, el 11 de diciembre a las 20:51 me escribiste: > Leandro Lucarella wrote: > >I want to be sure the bug is present before the patch and fixed after the patch before submiting the patch via Bugzilla. > > It's a good fix, I checked in a patch for it. Great, thanks! -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- You should've seen her face. It was the exact same look my father gave me when I told him I wanted to be a ventriloquist. -- George Constanza |
December 12, 2009 Re: D1 garbage collector + threads + malloc = garbage? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Leandro Lucarella | Leandro Lucarella, el 12 de diciembre a las 11:18 me escribiste: > Walter Bright, el 11 de diciembre a las 20:51 me escribiste: > > Leandro Lucarella wrote: > > >I want to be sure the bug is present before the patch and fixed after the patch before submiting the patch via Bugzilla. > > > > It's a good fix, I checked in a patch for it. > > Great, thanks! If you could add a small description of what the commit is about besides pointing the NG discussion/Bugzilla number, as you are doing with DMD now, it would be really great! =) -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Le pedí que me enseñe a usar el mouse Pero solo quiere hablarme del Bauhaus Le pregunté si era chorra o rockera Me dijo "Gertrude Stein era re-tortillera" Me hizo mucho mal la cumbiera intelectual |
December 12, 2009 Re: D1 garbage collector + threads + malloc = garbage? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Walter Bright Wrote:
> Sean Kelly wrote:
> > Leandro Lucarella Wrote:
> >
> >> Bane, el 11 de diciembre a las 06:00 me escribiste:
> >>> Bug with GC fullCollect() in multithreaded environment. Grauzone
> >>> explained it here
> >>> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610
> >>>
> >
> > To be perfectly safe, it's a tiny bit trickier than that. You have to always lock on allocations instead of just locking the collection cycle in single-threaded apps. I think the reason for this is that once the lock is acquired it must be held until the current thread exits the GC code, and the "synchronized" statement only allows you to lock for the length of the collect() call (because the start and end must be within the same scope). However, this /may/ actually be a separate bug with the D 1.0 GC. I really can't recall for sure if this was an issue with collecting when all the other threads were suspended.
>
> Yes, the lock on the gc must still be held until the collection work is completed, but the other threads can be unfrozen once the mark phase is complete.
Hm... so if a dtor creates a new thread then that thread could enter and "lock" the GC while the first thread is still completing its allocation. Couldn't this cause problems?
|
December 12, 2009 Re: D1 garbage collector + threads + malloc = garbage? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Leandro Lucarella | > I wanted to reproduce it to make a bug report to attach the patch but
> I couldn't. I replaced msleep() with usleep() (and multiplied the values
> by 1000) because I'm in Linux, but that's all I changed. I have an old AMD
> Athlon 1.8GHz. How do you reproduce it?
>
> I want to be sure the bug is present before the patch and fixed after the patch before submiting the patch via Bugzilla.
>
> Thanks!
I forgot: for linux just remove msleep lines completely. It will deadlock soon after program start. Seems that linux is more resistant to this bug than windows.
|
Copyright © 1999-2021 by the D Language Foundation