D's Memory Model (page 3)

February 10, 2010
[dmd-concurrency] D's Memory Model
Posted by Robert Jacques
in reply to Fawzi Mohamed
Permalink
Robert Jacques
Posted in reply to Fawzi Mohamed
Permalink
On Wed, 10 Feb 2010 12:28:39 -0500, Fawzi Mohamed <fawzi at gmx.ch> wrote:
> On 10-feb-10, at 17:41, Sean Kelly wrote:
>> On Feb 10, 2010, at 3:59 AM, Fawzi Mohamed wrote:
>>> On 10-feb-10, at 03:14, Robert Jacques wrote:
>>>> On Tue, 09 Feb 2010 05:12:29 -0500, Fawzi Mohamed <fawzi at gmx.ch> wrote:
>>>>> On 8-feb-10, at 19:15, Robert Jacques wrote:
>>>>>> On Mon, 08 Feb 2010 06:07:36 -0500, Fawzi Mohamed <fawzi at gmx.ch> wrote:
>>>>
>>>> [snip]
>>>>
>>>>>> Concurrent collectors trade throughput (GC time / program time) for program responsiveness. They also often trade program correctness for latency, as they are notoriously hard to get correct for languages with mutation. While, there are some very elegance concurrent GCs for functional languages, the high level descriptions of the Boehm concurrent GC seems to be inherently racy, while Java concurrent GCs tend to be highly instrumented and lock-tastic (mark-bit locking on every ref assignment).
>>>>> that can be done with atomic ops...
>>>>
>>>> It can only be done correctly using a DCAS, (not dwCAS), so isn't implementable on any mainstream chip. So you have to lock the mark bits on an object. This can be done as a spin-lock using atomic ops, but that doesn't change the number or frequency they have to be taken.
>>>
>>> I might miss something (and I don't know what DCAS is) but as you always go in one direction (unmarked -> marked) I don't see why something like (for 32 bit)
>>>
>>> mark(uint[]bitfield,uint bit){
>>> 	auto idx=(bit+31)/32;
>>> 	auto mask=1>>(bit&0x1F);
>>> 	volatile auto oldV=bitfield[idx];
>>> 	while((oldV & mask)==0){
>>> 		auto oldV=atomicCAS(bitfield[idx],oldV | mask,oldV);
>>> 	}
>>> }
>>>
>>> this can be extended if you have 3 states (white,gray,black) for
>>> concurrent GC or to refCounting.
>>> You have collisions only when two threads try to modify the same
>>> memory location, and even then given that the function is so fast to
>>> compute it should not be a big problem.
>>> False sharing makes things worse, but also that should not be
>>> something that cannot be made acceptable distributing a bit the mark
>>> bits in the memory.
>>>
>>> At the end of the mark phase you need a memory barrier in each working thread, and a semaphore (or some flag) to ensure that all changes have global visibility, but that is it.
>>
>> This works in a world where all reference updates are done via in- language operations.  But D supports both extern (C) functions and inline assembly, both of which defeat incremental collection.
>
> this works with a stop the world approach as well as now (and is parallel).
>
> I agree that to use that in a concurrent GC then you need some way to
> ensure that an update to a reference does not loose references.
> That is possible only if you have some control on the references
> updates, so either that has to be exposed to external programs, or you
> declare programs that modify references transferring the reference to an
> object from a memory location to another (simply loosing a reference or
> adding one is not a problem, only the combination is), as unsafe to be
> run during a GC mark phase if you use a concurrent GC and do not notify
> it of the change.
> Not the ideal solution, but that is all you can do: ether you stop the
> threads, or some control on the modifications has to be done...

The DCAS, or double compare and swap, does a compare and swap at two different memory locations simultaneously. The problem is that in a concurrent/incremental GC, you have to propagate the mark bits correctly. And to do this you have to atomically set the mark-bits and the assigned reference at the same time...although in writing this I think I figured out at least two ways of doing this correctly without a DCAS. One involves recursively propagating the mark-bits yourself, which could cause the thread to pause for a long time and retain excess memory, or adding the new node to a to-mark list, which would add a some serialization.
Forums