Gcx: Would we ever want more than one? (page 2)

May 16, 2011
[phobos] Gcx: Would we ever want more than one?
Posted by Robert Jacques
in reply to Sean Kelly
On Mon, 16 May 2011 14:17:00 -0400, Sean Kelly <sean at invisibleduck.org> wrote:

> On May 14, 2011, at 7:09 PM, Brad Roberts wrote:
>
>> On 5/14/2011 7:02 PM, David Simcha wrote:
>>> On 5/14/2011 8:28 PM, Sean Kelly wrote:
>>>> Technically, you want a free list per core. I don't know how practical it is to figure that out though.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On May 12, 2011, at 8:14 PM, David Simcha <dsimcha at gmail.com> wrote:
>>>
>>> The idea being that, if you have a free list per core, there will
>>> almost never be any contention in practice, even if
>>> you have way more threads than cores?
>>
>> Ideally neither contention nor cache swapping.  It'd stay in the l1 or
>> l2 of the core directly involved with the
>> allocations.  By being thread centric even if not contended it could
>> still wander between cores and thus the caches
>> associated with them.
>>
>> A serious micro-optimization, but..
>
> I mentioned it mostly because it seemed an option worth exploring if a free list per thread turned out to be very difficult for some reason.  A fixed array of free lists, one per core, would be easy if there were a way to determine which core the caller was being executed by.  We may have to figure out the per-thread stuff anyway though, since non-shared data needs to be finalized by its owner thread.  Again, this could be done by the owner core instead, but only if we could ensure that threads don't move between cores.

Regarding thread-specific finalization, this does seem to gum things up a bit. The issue I see is that all objects to be finalized need to be placed onto some kind of free-list (which each thread would then process later) while preserving the object's layout. Objects currently consist of {vtable,monitor,data...}. That doesn't really leave any room for a) a next-object pointer or b) a block-info pointer (which might be used for fine-grained-locking/lock-free solutions).

One option is to re-use the monitor for a next pointer. Objects with a valid monitor would be finalized globally and zeroed before being placed on the local free list and 'local' objects would have the next pointer/monitor re-nulled prior to finalization. I see one potential corner case with this. If an object synchronizes on, or calls a synchronized method on, another object during its finalizer, then (possibly silent) corruption could occur. Now doing this is a) accessing "references [that] may no longer be valid" according to the spec and b) extremely rare (shared/synchronized objects generally would have a valid monitor prior to sweeping and would be finalized 'globally' not 'locally'). Yes, this is a bug in the user's code, but it's a bug that today will segfault or run correctly, not corrupt things.
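
To make the monitor-as-next-pointer idea concrete, here is a rough sketch in plain C. It is only a model of the idea, not druntime code: the Object struct just mirrors the {vtable,monitor,data...} layout, and FinalizeList, finalize_list_push and finalize_list_pop are names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the object header {vtable, monitor, data...}.
   Illustrative only; not the actual runtime layout or API. */
typedef struct Object {
    void *vtable;
    void *monitor;  /* doubles as the 'next' link while queued for finalization */
} Object;

/* Hypothetical per-thread list of objects awaiting local finalization. */
typedef struct FinalizeList {
    Object *head;
} FinalizeList;

/* Queue an object for its owner thread. Only objects with a null monitor
   qualify; objects with a live monitor would be finalized globally instead. */
static void finalize_list_push(FinalizeList *list, Object *obj) {
    assert(obj->monitor == NULL);
    obj->monitor = list->head;  /* reuse the monitor slot as the next pointer */
    list->head = obj;
}

/* Pop one object and re-null the monitor slot so the finalizer sees a
   clean header. Returns NULL when the list is empty. */
static Object *finalize_list_pop(FinalizeList *list) {
    Object *obj = list->head;
    if (obj == NULL) {
        return NULL;
    }
    list->head = (Object *)obj->monitor;
    obj->monitor = NULL;
    return obj;
}
```

The corner case is visible here: if a finalizer synchronizes on an object that is still sitting in the list, the runtime would treat that object's link as a monitor (or overwrite the link with a real one), which is where the silent corruption would come from.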

Storing a block-info pointer as part of the free-list node provides a nice performance gain and allows for finer-grained locking. However, direct substitution won't work as there is no room inside {vtable,monitor/next*,data...} for a block-info*. One option would be to place the block-info* at the end of the object's allocation chunk. This would effectively mean adding an extra word to finalized objects for the purpose of allocation size.
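
A rough sketch of the trailing block-info* idea, again in plain C and again only a model: BlockInfo, chunk_size_for, set_block_info and get_block_info are hypothetical names, and real GC size classes would round the request up further than this does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for whatever per-block metadata the GC keeps. */
typedef struct BlockInfo {
    size_t size;
} BlockInfo;

/* Allocation size for a finalized object: the object itself, rounded up
   to a whole number of words, plus one extra word for the block-info*. */
static size_t chunk_size_for(size_t object_size) {
    size_t word = sizeof(void *);
    size_t rounded = (object_size + word - 1) / word * word;
    return rounded + word;
}

/* Store the block-info* in the last word of the chunk. */
static void set_block_info(void *chunk, size_t chunk_size, BlockInfo *info) {
    assert(chunk_size >= sizeof info);
    memcpy((char *)chunk + chunk_size - sizeof info, &info, sizeof info);
}

/* Read the block-info* back from the last word of the chunk. */
static BlockInfo *get_block_info(const void *chunk, size_t chunk_size) {
    BlockInfo *info;
    assert(chunk_size >= sizeof info);
    memcpy(&info, (const char *)chunk + chunk_size - sizeof info, sizeof info);
    return info;
}
```

The front of the chunk keeps the {vtable,monitor/next*,data...} layout untouched; the only cost is the extra word in chunk_size_for, which may or may not push an object into the next size class.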