Thread overview
Concurrent No-Pause GC Option (idea)
jordan4ibanez (Aug 21, 2022)
frame (Aug 21, 2022)
Sergey (Aug 21, 2022)
frame (Aug 22, 2022)
Krzysztof Jajeśnica (Aug 25, 2022)
frame (Aug 26, 2022)
August 21, 2022

I had an idea on the GC for CPUs that have more than one core. So basically all of them.

So I learned on the Discord that the current GC runs in parallel if a multi-core setup is detected, but it still pauses everything to make sure nothing is changed (thanks to adr for the info).

Now I had the idea that perhaps this could be fixed. It is simple as an idea, but I am sure I am about to find out it is horrifically complex in implementation.

So the current assumption is that the memory addresses of objects do not get moved around in the heap, and the heap grows and shrinks as needed. This opens an opportunity to create a concurrent no-pause garbage collector. My literal idea was this:

So like, when an object in, let's say, the main thread has its destructor called, or the thread notices that it will never talk to this piece of data again, it basically tells the external GC thread "hey, this piece of data and all of its children are done" and just keeps on going with its execution routine.

So the main thread creates an object like so:

Object data = new Object();

This is great: that means data has its own little house on the heap, and it will never move from this spot until it "has to move out".

So we make it move out:

Object data = new Object();
data = new Object();

Well, there we go: data now points at a new object. But what happened to the old one? With the concurrent GC, the main thread has silently told the GC to collect the old object and everything it owns.

The same thing happens if it goes out of scope:

{
    Object data = new Object();
    // data destroy is sent to concurrent gc to free space
}

The same thing will happen if the piece of data is in an array, or if it's held in another object (in theory). But this was just an extremely simple example. Ideally, the pointer table in the thread will never be able to see the data location of the destroyed object, even if it still exists in memory.

The concurrent GC would simply always be listening, and the only micro-pauses I could see would come from having to add those memory addresses to the GC's stack. Perhaps that could be avoided by having a concurrent linked list somehow.

This is coming from someone new to D; I am still learning much about it and trying to help build the ecosystem and language as much as I can. I would love to know your thoughts.

August 21, 2022

On Sunday, 21 August 2022 at 00:11:55 UTC, jordan4ibanez wrote:

> This is coming from someone new to D; I am still learning much about it and trying to help build the ecosystem and language as much as I can. I would love to know your thoughts.

You are probably thinking of a reference-counted (RC) GC, where memory can be collected as soon as the last reference is gone.

The drawbacks of this approach are that you need more memory for each allocated variable, since you have to track the references, and that increasing or decreasing those reference counts has a runtime impact wherever a reference changes hands: e.g. a pointer to data is passed to another object while the original pointer goes out of scope, or a common slice ([]), as seen with strings, is handed from function to function, maybe even recursively.

The current GC doesn't do this, keeping that impact low. It only needs to stop the threads when it collects in a cycle, which doesn't always happen, and you can even help organize the GC's cleanup work by manually calling GC.collect() at points that suit your program.

If you are interested in this stuff, I suggest you dig through the General forum section, where the pros & cons of other GC implementations and their side effects are heavily discussed :D

August 21, 2022

On Sunday, 21 August 2022 at 00:11:55 UTC, jordan4ibanez wrote:

> I had an idea on the GC for CPUs that have more than one core. So basically all of them.
>
> So I learned on the Discord that the current GC runs in parallel if a multi-core setup is detected, but it still pauses everything to make sure nothing is changed, thanks to adr.

Isn't it the same as what was already proposed earlier? https://dlang.org/blog/2019/07/22/symmetry-autumn-of-code-experience-report-porting-a-fork-based-gc/

August 22, 2022

On Sunday, 21 August 2022 at 21:42:20 UTC, Sergey wrote:

> On Sunday, 21 August 2022 at 00:11:55 UTC, jordan4ibanez wrote:
>
>> I had an idea on the GC for CPUs that have more than one core. So basically all of them.
>>
>> So I learned on the Discord that the current GC runs in parallel if a multi-core setup is detected, but it still pauses everything to make sure nothing is changed, thanks to adr.
>
> Isn't it the same as what was already proposed earlier? https://dlang.org/blog/2019/07/22/symmetry-autumn-of-code-experience-report-porting-a-fork-based-gc/

The mentioned fork variant would be nearly the same as the current GC; it tracks nothing. It suggests scanning in the forked process and then telling the mutator threads in the parent which blocks are free again.

I don't know why a fork is necessary; couldn't that be done in a normal background thread as well? And it doesn't eliminate the problem that data can be used in another thread after the scanner has marked it as free.

Actually, there is already a fork-based implementation merged in the runtime as an alternative to parallel marking, but it still stops other threads, of course.

August 25, 2022

On Monday, 22 August 2022 at 18:00:42 UTC, frame wrote:

> I don't know why a fork is necessary. That can be done in a normal background thread also? And it doesn't eliminate the problem that data can be used in another thread after the scanner marked it as free.

AFAIK forking does indeed eliminate the problem of another thread using the data after scanner marks it as free.

A forked process does not share its address space with the parent process (it gets a copy of the parent's address space), so any memory accesses done in the application process after the fork() will not influence the scanning process. If the scanner process marks a block as free, that means it was definitely unreachable in the application process prior to the fork(), so the application process has no way to access the block (apart from cases where you hide the pointer from the GC on purpose, but that's a problem for almost all GC implementations, not just the fork() one).

August 26, 2022

On Thursday, 25 August 2022 at 19:06:15 UTC, Krzysztof Jajeśnica wrote:

> On Monday, 22 August 2022 at 18:00:42 UTC, frame wrote:
>
> AFAIK forking does indeed eliminate the problem of another thread using the data after the scanner marks it as free.
>
> A forked process does not share its address space with the parent process (it gets a copy of the parent's address space), so any memory accesses done in the application process after the fork() will not influence the scanning process. If the scanner process marks a block as free, that means it was definitely unreachable in the application process prior to the fork(), so the application process has no way to access the block (apart from cases where you hide the pointer from the GC on purpose, but that's a problem for almost all GC implementations, not just the fork() one)

I think it does share the pages until a write operation is performed on the parent or child side; then the writer gets a fresh private copy of the touched page (copy-on-write). So the child needs to commit IDs(?), indexes(?), whatever, back to the parent, and the parent can map them to its memory blocks [or something like that].

That is, of course, way better than a background thread, as it creates an isolated snapshot.

But consider: the parent calls GC.free() manually while the child is running; the child decides the block can be freed, but the parent has meanwhile continued and reused that block (I don't know if the GC actually does this, but I think so). The parent must ignore the supplied ID in this case, or it would collect memory that is in use again. The GC needs to work differently for this.