October 18, 2019
These questions probably need some context: I'm working on an interpreter that will manage memory via reference counted struct types. To deal with the problem of strong reference cycles retaining memory indefinitely, weak references or recursive teardowns have to be used where appropriate.

To help detect memory leaks from within the interpreter, I'd also like to employ core.memory.GC in the following fashion:

* keep core.memory.GC off (disabled) by default, but nonetheless allocate objects from GC memory
* provide a function that can find (and reclaim) retained unreachable object graphs that contain strong reference cycles
* the main purpose of this function is to find and report such instances, not to reclaim memory. Retained graphs should be reported as warnings on stderr, so that the program can be fixed manually, e.g. by weakening some refs in the proper places
* the function will rely on GC.collect to find unreachable objects
* the function will *always* be called implicitly when a program terminates
* the function should also be explicitly callable from any point within a program.

Now my questions:

Is it safe to assume that a call to GC.collect will be handled synchronously (and won't return early)?

Is there a way to ensure that GC.collect will never run unless called explicitly (even in out of memory situations)?

Is it possible and is it OK to print to stderr while the GC is collecting (e.g. from @nogc code, using functions from core.stdc.stdio)?

Could I implement my function by introducing a shared global flag which is set prior to calling GC.collect and reset afterwards, so that any destructor can determine whether it has been invoked by a "flagged" call to GC.collect and act accordingly?

Alternatively: do I need to implement such a flag, or is there already a way in which a destructor can determine whether it has been invoked by the GC?

Thanks for any help!
October 18, 2019
On Friday, October 18, 2019 10:54:55 AM MDT Roland Hadinger via Digitalmars-d-learn wrote:
> These questions probably need some context: I'm working on an interpreter that will manage memory via reference counted struct types. To deal with the problem of strong reference cycles retaining memory indefinitely, weak references or recursive teardowns have to be used where appropriate.
>
> To help detect memory leaks from within the interpreter, I'd also like to employ core.memory.GC in the following fashion:
>
> * keep core.memory.GC off (disabled) by default, but nonetheless
> allocate objects from GC memory
> * provide a function that can find (and reclaim) retained
> unreachable object graphs that contain strong reference cycles
> * the main purpose of this function is to find and report such
> instances, not to reclaim memory. Retained graphs should be
> reported as warnings on stderr, so that the program can be fixed
> manually, e.g. by weakening some refs in the proper places
> * the function will rely on GC.collect to find unreachable objects
> * the function will *always* be called implicitly when a program
> terminates
> * the function should also be explicitly callable from any point
> within a program.
>
> Now my questions:
>
> Is it safe to assume that a call to GC.collect will be handled synchronously (and won't return early)?

D's GC is a stop-the-world GC. Every thread managed by the GC is stopped when a thread runs a collection, so with the current GC a call to GC.collect should not return until the collection is done.

> Is there a way to ensure that GC.collect will never run unless called explicitly (even in out of memory situations)?

The GC only runs a collection either when you explicitly tell it to or when you try to allocate memory using the GC, and it determines that it should run a collection. Disabling the GC normally prevents a collection from running, though per the documentation, it sounds like it may still run if the GC actually runs out of memory. I had thought that it prevented collections completely, but that's not what the documentation says. I don't know what the current implementation does.
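For illustration, here is a minimal sketch of the "disabled by default, collect only on request" setup. It assumes the default druntime GC. `collectNow` is a hypothetical helper name, not anything druntime provides. As noted above, GC.disable is documented as not ruling out a collection in an out-of-memory situation, so this does not guarantee that a collection never runs implicitly.

```d
import core.memory : GC;

/// Hypothetical helper: requests one explicit, full collection.
void collectNow()
{
    GC.collect();
}

void main()
{
    // Turn off automatic collections. Per the core.memory documentation
    // the runtime may still collect if it considers that necessary,
    // e.g. when it runs out of memory.
    GC.disable();

    // Allocation from the GC heap still works while the GC is disabled.
    auto scratch = new ubyte[](4096);
    scratch[0] = 1;

    // An explicit request is still honored.
    collectNow();
}
```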

> Is it possible and is it OK to print to stderr while the GC is collecting (e.g. from @nogc code, using functions from core.stdc.stdio)?

No code in any thread managed by the GC is run while a collection is running unless it's code that's triggered by the collection itself (e.g. a finalizer being called on an object that's being collected - and even that isn't supposed to access GC-allocated objects, because the GC might have already destroyed them - e.g. in the case of a cycle). If you want code to run at the same time as a GC collection, it's going to have to be in a thread that is not attached to the GC, and at that point, you shouldn't be accessing _anything_ that's managed by the GC unless you have a guarantee that what you're accessing won't be collected. And even then, you shouldn't be mutating any of it.

Also, @nogc doesn't say anything about whether the code accesses GC-allocated objects. It just means that it's not allowed to access most GC functions, which usually just means that it doesn't allocate anything using the GC and that it doesn't risk running a collection. So, just because a function is @nogc doesn't necessarily mean that it's safe to run it from a thread that isn't managed by the GC while a collection is running.
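A small sketch of that distinction: the function below is @nogc, yet it happily reads memory that was allocated by the GC. The names `sum` and `data` are made up for the example.

```d
// @nogc forbids GC allocation inside the function body. It says nothing
// about where the memory behind `values` came from.
int sum(const(int)[] values) @nogc nothrow
{
    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}

void main()
{
    int[] data = new int[](3); // allocated on the GC heap
    data[] = 2;                // set every element to 2
    assert(sum(data) == 6);    // the @nogc function reads GC memory
}
```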

> Could I implement my function by introducing a shared global flag which is set prior to calling GC.collect and reset afterwards, so that any destructor can determine whether it has been invoked by a "flagged" call to GC.collect and act accordingly?

You should be able to do that, but then the destructor can't be pure (though as I understand it, there's currently a compiler bug with pure destructors anyway which causes them to not be called), and when a destructor is run as a finalizer, it shouldn't be accessing any other GC-allocated objects, because the GC might have actually destroyed them already at that point. Finalizers really aren't supposed to do much of anything other than managing what lives in an object directly or managing non-GC-allocated resources. Regardless, anything that really should be operating as a destructor rather than a finalizer has to live on the stack, since finalizers won't be run until a collection occurs. If you're explicitly running them yourself via your own reference counting, then you don't have that problem, but if there's any chance that a destructor is going to be run as a finalizer by the GC, then you have to write your destructors / finalizers with the idea that that could happen.
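A rough sketch of the flag idea follows, under several assumptions. `leakCheckActive`, `Node`, `makeCycle` and `reportLeaks` are hypothetical names. A class stands in for the interpreter's objects, which in the original question are reference-counted structs. The finalizer touches nothing but its own object and the C stdio stream. Because the GC scans the stack conservatively, an unreachable cycle is not guaranteed to be found by any particular collection, so the warning may or may not be printed on a given run.

```d
import core.memory : GC;
import core.stdc.stdio : fprintf, stderr;

// Hypothetical flag; __gshared so that all threads see one variable.
__gshared bool leakCheckActive;

// Hypothetical object type that is normally released by reference
// counting and should never be left for the GC to clean up.
class Node
{
    Node next;

    ~this()
    {
        // May run as a finalizer during a collection: do not follow
        // `next` (it may already be destroyed) and do not allocate.
        if (leakCheckActive)
            fprintf(stderr, "warning: leaked Node at %p\n", cast(void*) this);
    }
}

// Builds an unreachable two-node cycle.
void makeCycle()
{
    auto a = new Node;
    auto b = new Node;
    a.next = b;
    b.next = a;
}

// Sets the flag, runs one explicit collection, then resets the flag.
void reportLeaks()
{
    leakCheckActive = true;
    scope (exit) leakCheckActive = false;
    GC.collect();
}

void main()
{
    GC.disable();
    makeCycle();
    reportLeaks();
}
```

Finalizers that run during runtime shutdown see the flag cleared and stay silent, so only objects found by `reportLeaks` are reported.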

> Alternatively: do I need to implement such a flag, or is there already a way in which a destructor can determine whether it has been invoked by the GC?
>
> Thanks for any help!

Honestly, the way things are set up, destructors aren't supposed to know or care about whether they're being run by the GC as a finalizer. So, the GC isn't going to provide that kind of functionality. What you're looking to do is pretty much a giant hack from the perspective of the GC and likely to be pretty dangerous to attempt. I suspect that what would make a lot more sense would be to create a custom build of druntime to run which specifically printed out what wasn't freed when the program shut down rather than trying to hack around how the GC works. Alternatively, you could just ditch the GC entirely and then use valgrind to see what didn't get freed to catch cycles (or other screw-ups that resulted in memory not being freed).

Having the GC take care of cycles for you isn't necessarily a problem, but having the GC report on what's alive or not is tricky business, particularly since it's supposed to keep anything that the program still has access to alive.

Another thing to consider is that some language features outright require the GC (e.g. closures and anything with dynamic arrays involving allocation), and if you truly don't want to use the GC for that stuff, it's probably going to be easier to require that your program not use the GC at all than to try to have it just manage cycles.

Regardless, if you really want to go forward with something like you're proposing here, you'll probably need to get answers from one of the few GC experts around here.

- Jonathan M Davis