Thread overview
[phobos] Gcx: Would we ever want more than one?
May 13, 2011
David Simcha
May 13, 2011
Walter Bright
May 13, 2011
David Simcha
May 15, 2011
Sean Kelly
May 15, 2011
David Simcha
May 15, 2011
Brad Roberts
May 15, 2011
Jonathan M Davis
May 15, 2011
Brad Roberts
May 15, 2011
Jonathan M Davis
May 16, 2011
Sean Kelly
May 16, 2011
Robert Jacques
May 12, 2011
I'm looking to get rid of the global malloc lock for small memory allocations.  A major barrier to pulling this off within the current GC design is the fact that Gcx is a struct, which suggests the possibility of having more than one instance and makes it more difficult to create thread-local objects.  Is there any reason why we would ever want more than one garbage collector instance?  If not, why is Gcx a struct instead of just a bunch of __gshared variables?
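A minimal sketch of the two designs in question — a free list behind a global lock versus a thread-local free list that needs no lock at all. This is C++ rather than D, purely so the example is self-contained, and the names are hypothetical, not druntime's:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>

struct Node { Node* next; };

static std::mutex g_lock;                        // the "global malloc lock"
static Node* g_freeList = nullptr;               // shared free list (contended)

static thread_local Node* t_freeList = nullptr;  // per-thread (uncontended)

// Contended path: every small allocation takes the global lock.
void* allocGlobal(std::size_t sz) {
    std::lock_guard<std::mutex> guard(g_lock);
    if (g_freeList) { Node* n = g_freeList; g_freeList = n->next; return n; }
    return std::malloc(sz < sizeof(Node) ? sizeof(Node) : sz);
}

// Lock-free path: only the owning thread ever touches t_freeList.
void* allocLocal(std::size_t sz) {
    if (t_freeList) { Node* n = t_freeList; t_freeList = n->next; return n; }
    return std::malloc(sz < sizeof(Node) ? sizeof(Node) : sz);
}

void freeLocal(void* p) {
    Node* n = static_cast<Node*>(p);
    n->next = t_freeList;
    t_freeList = n;
}
```

The catch, as the rest of the thread shows, is where the thread-local state lives when the collector's own state is an instance of a struct rather than a set of globals.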
May 12, 2011

On 5/12/2011 8:01 PM, David Simcha wrote:
> I'm looking to get rid of the global malloc lock for small memory allocations.  A major barrier to pulling this off within the current GC design is the fact that Gcx is a struct, which suggests the possibility of having more than one instance and makes it more difficult to create thread-local objects.  Is there any reason why we would ever want more than one garbage collector instance?  If not, why is Gcx a struct instead of just a bunch of __gshared variables?
>

You can get multiple Gcx instances when you're connecting DLL instances together. That's why druntime gives you the means to pick one.

Also, grouping the GC's state together in a single struct is good encapsulation practice, rather than leaving it a scattered collection of globals.
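A rough sketch of the pattern being described — hypothetical names, in C++ for brevity; druntime's real counterpart is, as I understand it, its GC proxy mechanism. Because the state lives in a struct, each DLL can carry its own instance and the runtime can repoint everything at one chosen instance:

```cpp
#include <cassert>
#include <cstddef>

// All collector state lives in one struct, so instances are possible.
struct Gcx {
    std::size_t bytesAllocated = 0;
    void* alloc(std::size_t sz) { bytesAllocated += sz; return nullptr; } // stub
};

static Gcx instanceA;            // e.g. the EXE's collector
static Gcx instanceB;            // e.g. a DLL's collector
static Gcx* current = &instanceA;

// Hypothetical hook: route all allocation through the chosen instance.
void gc_select(Gcx* g) { current = g; }
```

With `__gshared` globals instead, there would be exactly one collector per image and no way to redirect.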
May 12, 2011
On 5/12/2011 11:05 PM, Walter Bright wrote:
>
>
> On 5/12/2011 8:01 PM, David Simcha wrote:
>> I'm looking to get rid of the global malloc lock for small memory allocations.  A major barrier to pulling this off within the current GC design is the fact that Gcx is a struct, which suggests the possibility of having more than one instance and makes it more difficult to create thread-local objects.  Is there any reason why we would ever want more than one garbage collector instance?  If not, why is Gcx a struct instead of just a bunch of __gshared variables?
>>
>
> You can get multiple Gcx instances when you're connecting DLL instances together. That's why druntime gives you the means to pick one.
>
> Also, grouping the GC's state together in a single struct is good encapsulation practice, rather than leaving it a scattered collection of globals.
>

Crap.  This means I'm going to have to get creative and figure out a way to get storage that's local to both a Gcx instance and a thread.  Definitely doable, but not pretty.
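One way to get storage local to both a collector instance and a thread (a hypothetical sketch, again in C++ for self-containment): key a thread-local table by the Gcx instance's address. Lookups need no lock because each thread owns its own table:

```cpp
#include <cassert>
#include <unordered_map>

struct FreeList { void* head = nullptr; int length = 0; };
struct Gcx {};  // stand-in for the real collector state

// One table per thread; one entry per Gcx instance within that thread.
static thread_local std::unordered_map<const Gcx*, FreeList> t_lists;

FreeList& localList(const Gcx& gc) {
    return t_lists[&gc];  // created on first use, private to this thread
}
```

The hash lookup on every allocation is the "not pretty" part; with a single known instance it would collapse to a plain thread-local variable.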

May 14, 2011
Technically, you want a free list per core. I don't know how practical it is to figure that out though.

Sent from my iPhone

On May 12, 2011, at 8:14 PM, David Simcha <dsimcha at gmail.com> wrote:

> On 5/12/2011 11:05 PM, Walter Bright wrote:
>> 
>> 
>> On 5/12/2011 8:01 PM, David Simcha wrote:
>>> I'm looking to get rid of the global malloc lock for small memory allocations.  A major barrier to pulling this off within the current GC design is the fact that Gcx is a struct, which suggests the possibility of having more than one instance and makes it more difficult to create thread-local objects.  Is there any reason why we would ever want more than one garbage collector instance?  If not, why is Gcx a struct instead of just a bunch of __gshared variables?
>>> 
>> 
>> You can get multiple Gcx instances when you're connecting DLL instances together. That's why druntime gives you the means to pick one.
>> 
>> Also, grouping the GC's state together in a single struct is good encapsulation practice, rather than leaving it a scattered collection of globals.
>> 
> 
> Crap.  This means I'm going to have to get creative and figure out a way to get storage that's local to both a Gcx instance and a thread.  Definitely doable, but not pretty.
> 
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
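A hypothetical sketch of the per-core idea: a fixed array of free lists indexed by whichever core the caller happens to be running on. Querying the current core is the OS-specific part — `sched_getcpu` exists on Linux; this sketch falls back to slot 0 elsewhere:

```cpp
#include <cassert>
#ifdef __linux__
#include <sched.h>   // sched_getcpu() -- Linux-specific
#endif

constexpr int kMaxCores = 64;

struct CoreList { void* head = nullptr; };
static CoreList coreLists[kMaxCores];

int currentCoreSlot() {
#ifdef __linux__
    int cpu = sched_getcpu();          // may fail and return -1
    if (cpu >= 0) return cpu % kMaxCores;
#endif
    return 0;                          // unknown platform or query failed
}

CoreList& myCoreList() { return coreLists[currentCoreSlot()]; }
```

Note one wrinkle: a thread can migrate between querying the core and touching the list, so unlike truly thread-local storage, each per-core list still needs some cheap synchronization (a CAS or spinlock) — it just almost never contends in practice.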
May 14, 2011
On 5/14/2011 8:28 PM, Sean Kelly wrote:
> Technically, you want a free list per core. I don't know how practical it is to figure that out though.
>
> Sent from my iPhone
>
> On May 12, 2011, at 8:14 PM, David Simcha<dsimcha at gmail.com>  wrote:

The idea being that, if you have a free list per core, there will almost never be any contention in practice, even if  you have way more threads than cores?
May 14, 2011
On 5/14/2011 7:02 PM, David Simcha wrote:
> On 5/14/2011 8:28 PM, Sean Kelly wrote:
>> Technically, you want a free list per core. I don't know how practical it is to figure that out though.
>>
>> Sent from my iPhone
>>
>> On May 12, 2011, at 8:14 PM, David Simcha<dsimcha at gmail.com>  wrote:
> 
> The idea being that, if you have a free list per core, there will almost never be any contention in practice, even if you have way more threads than cores?

Ideally neither contention nor cache swapping: the free list would stay in the L1 or L2 cache of the core directly involved with the allocations.  A thread-centric list, even when uncontended, could still wander between cores and thus between the caches associated with them.

A serious micro-optimization, but...
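The cache concern in code: if adjacent per-core slots share a cache line, a write from one core evicts the line from another core's L1/L2 even with no logical contention ("false sharing"). Padding each slot out to a cache line keeps them apart — 64 bytes is a common line size, though not universal, so that number is an assumption here:

```cpp
#include <cassert>
#include <cstddef>

// alignas(64) forces each slot onto its own cache line, so cores
// updating neighboring slots never invalidate each other's lines.
struct alignas(64) PaddedFreeList {
    void* head = nullptr;
    int length = 0;
};

static PaddedFreeList perCore[8];   // one slot per core, line-separated
```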
May 14, 2011
On 2011-05-14 19:09, Brad Roberts wrote:
> On 5/14/2011 7:02 PM, David Simcha wrote:
> > On 5/14/2011 8:28 PM, Sean Kelly wrote:
> >> Technically, you want a free list per core. I don't know how practical it is to figure that out though.
> >> 
> >> Sent from my iPhone
> > 
> >> On May 12, 2011, at 8:14 PM, David Simcha<dsimcha at gmail.com>  wrote:
> > The idea being that, if you have a free list per core, there will almost never be any contention in practice, even if you have way more threads than cores?
> 
> Ideally neither contention nor cache swapping: the free list would stay in the L1 or L2 cache of the core directly involved with the allocations.  A thread-centric list, even when uncontended, could still wander between cores and thus between the caches associated with them.
> 
> A serious micro-optimization, but...

But we're always serious about our micro-optimizations! ;)

Yes, anything we can reasonably do to make the GC more efficient is a good thing. Java already gets plenty of flak for its GC (mostly undeserved at this point), and it has an efficient one. We don't, so improving how well ours performs is definitely desirable.

- Jonathan M Davis
May 14, 2011
On 5/14/2011 7:44 PM, Jonathan M Davis wrote:
> On 2011-05-14 19:09, Brad Roberts wrote:
>> On 5/14/2011 7:02 PM, David Simcha wrote:
>>> On 5/14/2011 8:28 PM, Sean Kelly wrote:
>>>> Technically, you want a free list per core. I don't know how practical it is to figure that out though.
>>>>
>>>> Sent from my iPhone
>>>
>>>> On May 12, 2011, at 8:14 PM, David Simcha<dsimcha at gmail.com>  wrote:
>>> The idea being that, if you have a free list per core, there will almost never be any contention in practice, even if you have way more threads than cores?
>>
>> Ideally neither contention nor cache swapping: the free list would stay in the L1 or L2 cache of the core directly involved with the allocations.  A thread-centric list, even when uncontended, could still wander between cores and thus between the caches associated with them.
>>
>> A serious micro-optimization, but...
> 
> But we're always serious about our micro-optimizations! ;)
> 
> Yes, anything we can reasonably do to make the GC more efficient is a good thing. Java already gets plenty of flak for its GC (mostly undeserved at this point), and it has an efficient one. We don't, so improving how well ours performs is definitely desirable.
> 
> - Jonathan M Davis

Macro before micro though.
May 14, 2011
On 2011-05-14 20:12, Brad Roberts wrote:
> On 5/14/2011 7:44 PM, Jonathan M Davis wrote:
> > On 2011-05-14 19:09, Brad Roberts wrote:
> >> On 5/14/2011 7:02 PM, David Simcha wrote:
> >>> On 5/14/2011 8:28 PM, Sean Kelly wrote:
> >>>> Technically, you want a free list per core. I don't know how practical it is to figure that out though.
> >>>> 
> >>>> Sent from my iPhone
> >>> 
> >>>> On May 12, 2011, at 8:14 PM, David Simcha<dsimcha at gmail.com>  wrote:
> >>> The idea being that, if you have a free list per core, there will almost never be any contention in practice, even if you have way more threads than cores?
> >> 
> >> Ideally neither contention nor cache swapping: the free list would stay in the L1 or L2 cache of the core directly involved with the allocations.  A thread-centric list, even when uncontended, could still wander between cores and thus between the caches associated with them.
> >> 
> >> A serious micro-optimization, but...
> > 
> > But we're always serious about our micro-optimizations! ;)
> > 
> > Yes, anything we can reasonably do to make the GC more efficient is a good thing. Java already gets plenty of flak for its GC (mostly undeserved at this point), and it has an efficient one. We don't, so improving how well ours performs is definitely desirable.
> > 
> > - Jonathan M Davis
> 
> Macro before micro though.

Oh, definitely. But in this case it sounds like we're talking about a fairly major design decision with regard to thread pools, so we want to consider all of the ramifications when deciding what best to do. Such micro-optimizations factor in, though obviously any macro-optimizations the decision affects take precedence.

- Jonathan M Davis
May 16, 2011
On May 14, 2011, at 7:09 PM, Brad Roberts wrote:

> On 5/14/2011 7:02 PM, David Simcha wrote:
>> On 5/14/2011 8:28 PM, Sean Kelly wrote:
>>> Technically, you want a free list per core. I don't know how practical it is to figure that out though.
>>> 
>>> Sent from my iPhone
>>> 
>>> On May 12, 2011, at 8:14 PM, David Simcha<dsimcha at gmail.com>  wrote:
>> 
>> The idea being that, if you have a free list per core, there will almost never be any contention in practice, even if you have way more threads than cores?
> 
> Ideally neither contention nor cache swapping: the free list would stay in the L1 or L2 cache of the core directly involved with the allocations.  A thread-centric list, even when uncontended, could still wander between cores and thus between the caches associated with them.
> 
> A serious micro-optimization, but...

I mentioned it mostly because it seemed an option worth exploring if a free list per thread turned out to be very difficult for some reason.  A fixed array of free lists, one per core, would be easy if there were a way to determine which core the caller is executing on.  We may have to figure out the per-thread stuff anyway, though, since non-shared data needs to be finalized by its owner thread.  Again, this could be done by the owner core instead, but only if we could ensure that threads don't move between cores.
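A hedged sketch (C++, hypothetical names) of what the owner-thread hand-off could look like: stamp each block with its allocating thread, finalize in place when the releasing thread is the owner, and otherwise queue the block for the owner to drain later:

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

struct Block {
    std::thread::id owner;   // thread that allocated the block
    bool finalized = false;
};

static std::mutex qLock;
static std::map<std::thread::id, std::vector<Block*>> deferred;

void release(Block* b) {
    if (b->owner == std::this_thread::get_id()) {
        b->finalized = true;                   // safe: we are the owner
    } else {
        std::lock_guard<std::mutex> g(qLock);  // rare case, so a lock is fine
        deferred[b->owner].push_back(b);       // hand off to the owner thread
    }
}

void drainDeferred() {                         // each thread calls periodically
    std::vector<Block*> mine;
    {
        std::lock_guard<std::mutex> g(qLock);
        mine.swap(deferred[std::this_thread::get_id()]);
    }
    for (Block* b : mine) b->finalized = true; // finalize on the owner thread
}
```

The same shape would work per core instead of per thread, but only under the pinning assumption above — otherwise the "owner core" of a block is meaningless once its thread migrates.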