Antti-Ville Tuuainen Passes GSoC Final Evaluation (page 2)

On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne Petersen wrote: > Yes, but parallelization of the mark phase is fairly trivial, and something we should probably look into. Ironically, Antti-ville's original proposal involved parallelization. This was scrapped because after rtinfo was added, we agreed that precise heap scanning was more important and looked newly feasible.

On 23-08-2012 16:47, dsimcha wrote: > On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne Petersen wrote: >> Yes, but parallelization of the mark phase is fairly trivial, and >> something we should probably look into. > > Ironically, Antti-ville's original proposal involved parallelization. > This was scrapped because after rtinfo was added, we agreed that precise > heap scanning was more important and looked newly feasible. Oh, I agree it's more important. The GC is reasonably fast as-is, but has severe issues with false pointers. Parallel marking is just in the "nice to have" category. -- Alex Rønne Petersen alex@lycus.org http://lycus.org

On Thursday, 23 August 2012 at 14:49:11 UTC, Alex Rønne Petersen wrote: > On 23-08-2012 16:47, dsimcha wrote: >> On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne Petersen wrote: >>> Yes, but parallelization of the mark phase is fairly trivial, and >>> something we should probably look into. >> >> Ironically, Antti-ville's original proposal involved parallelization. >> This was scrapped because after rtinfo was added, we agreed that precise >> heap scanning was more important and looked newly feasible. > > Oh, I agree it's more important. The GC is reasonably fast as-is, but has severe issues with false pointers. Parallel marking is just in the "nice to have" category. Funny, I've just read an interview from Narihiro Nakamura who is working on parallel marking in Ruby: http://rubysource.com/narihiro-nakamura-rubys-gc-innovator/ BR, renoX

August 23, 2012

Re: Antti-Ville Tuuainen Passes GSoC Final Evaluation

Posted by Leandro Lucarella
in reply to Alex Rønne Petersen

Permalink

Leandro Lucarella

Posted in reply to Alex Rønne Petersen

Permalink

Alex Rønne Petersen, el 23 de August a las 16:38 me escribiste:
> On 23-08-2012 15:21, Rory McGuire wrote:
> >On Thu, Aug 23, 2012 at 2:51 PM, dsimcha <dsimcha@yahoo.com <mailto:dsimcha@yahoo.com>> wrote:
> >
> >    Basically, the idea is to store information about what is and isn't
> >    a pointer at the pool level instead of at the block level.  My
> >    attempt from a long time ago at precise heap scanning, and
> >    Antti-Ville's first attempt, stored meta-data at the end of every
> >    allocated block.  This worked well for large arrays, but was
> >    terribly inefficient for smaller allocations and made the GC code
> >    even messier than it already is.  The overhead was a fixed
> >    (void*).sizeof bits per block.  Now, each pool has a bit array that
> >    contains one bit for every possible aligned pointer.  The overhead
> >    is always 1 bit for every (void*).sizeof bytes no matter how large
> >    or small the block is.
> >
> >
> >Am I correct in thinking that this is still single threaded stop the world?
>
> Yes, but parallelization of the mark phase is fairly trivial, and something we should probably look into.
>
> The GC will probably always be STW unless we get compiler support for inserting GC barriers.

Feel free to use my work[1] to have a cloning collector which is not STW
in the classical sense (it just STW for the time it takes to pause the
threads and fork() the process), at least on *nixes. Works fairly
well[2], even in production code. There's is only one issue[3], but
a nasty one, because of a glibc internal lock shared between fork() and
other glibc functions like malloc and the GC using signals to interrupt
the threads. I hope I can fix this problem eventually (the ideal for me
would be to find another way to pause the threads).

[1] http://www.dsource.org/projects/tango/browser/trunk/tango/core/rt/gc/cdgc
[2] http://www.llucax.com.ar/blog/blog/post/-1a4bdfba
[3] http://www.dsource.org/projects/tango/ticket/2087

--
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Si por el chancho fuera, se autocomería con chimichurri Worshestershire!

On 2012-08-23 16:38, Alex Rønne Petersen wrote: > Yes, but parallelization of the mark phase is fairly trivial, and > something we should probably look into. > > The GC will probably always be STW unless we get compiler support for > inserting GC barriers. Would a thread local GC be possible, and desirable? To my understanding, which is not much, that means the GC will only run in one thread (or multiple) and only needs to stop that/those thread(s). That also means it only need to search for dead objects in the heap/storage area for that particular thread (and where these point to). If I understand this correctly this would be perfect for D since everything is thread local by default. There's also a global heap for global objects or objects shared between threads. -- /Jacob Carlborg

On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne Petersen wrote: > On 23-08-2012 15:21, Rory McGuire wrote: >> On Thu, Aug 23, 2012 at 2:51 PM, dsimcha <dsimcha@yahoo.com >> <mailto:dsimcha@yahoo.com>> wrote: >> >> Basically, the idea is to store information about what is and isn't >> a pointer at the pool level instead of at the block level. My >> attempt from a long time ago at precise heap scanning, and >> Antti-Ville's first attempt, stored meta-data at the end of every >> allocated block. This worked well for large arrays, but was >> terribly inefficient for smaller allocations and made the GC code >> even messier than it already is. The overhead was a fixed >> (void*).sizeof bits per block. Now, each pool has a bit array that >> contains one bit for every possible aligned pointer. The overhead >> is always 1 bit for every (void*).sizeof bytes no matter how large >> or small the block is. >> >> >> Am I correct in thinking that this is still single threaded stop the world? > > Yes, but parallelization of the mark phase is fairly trivial, and something we should probably look into. Funny coincidence, I've just read an interview from Narihiro Nakamura who is working on parallel marking in Ruby: http://rubysource.com/narihiro-nakamura-rubys-gc-innovator/ BR, renoX

On Fri, 24 Aug 2012 08:27:09 +0200, Jacob Carlborg <doob@me.com> wrote: > On 2012-08-23 16:38, Alex Rønne Petersen wrote: > >> Yes, but parallelization of the mark phase is fairly trivial, and >> something we should probably look into. >> >> The GC will probably always be STW unless we get compiler support for >> inserting GC barriers. > > Would a thread local GC be possible, and desirable? To my understanding, which is not much, that means the GC will only run in one thread (or multiple) and only needs to stop that/those thread(s). That also means it only need to search for dead objects in the heap/storage area for that particular thread (and where these point to). > > If I understand this correctly this would be perfect for D since everything is thread local by default. > > There's also a global heap for global objects or objects shared between threads. It certainly would be possible. But I believe it requires thread-local heaps, and some way of keeping track of when an object changes owning thread. Perhaps a mark phase could run on each thread's heap, and unreferenced objects be moved to a global list of potentially dead objects. If after all threads have run a mark, none have claimed the objects, they're collected. Then again, I'm hardly a GC architect. -- Simen

On 2012-08-24 09:32, Simen Kjaeraas wrote: > It certainly would be possible. But I believe it requires thread-local > heaps, and some way of keeping track of when an object changes owning > thread. Yeah, I was think about that. Are thread local heaps a problem? > Perhaps a mark phase could run on each thread's heap, and unreferenced > objects be moved to a global list of potentially dead objects. If after > all threads have run a mark, none have claimed the objects, they're > collected. > > Then again, I'm hardly a GC architect. Me neither. -- /Jacob Carlborg

Forums