GC BlkAttr clarification. Programming in D pages 671, 672. About GC

3 days ago

Posted by Brother Bill

It appears that D has multiple ways of determining which created objects are subject to GC.

For class instances, new creates a new class instance, returning a pointer to it. While "alive", other pointers may also point to the class instance. Once no pointers are pointing to this class instance, it may be garbage collected.

OOP languages such as Eiffel can keep track of pointers to class instances. They then do a generational lookup or do a thorough mark and sweep.

C, C++ and D can play shenanigans with pointers, such as casting them to size_t, which hides them from the GC.

In addition, D provides several ways to allocate and free memory.

GC.calloc can allocate memory for a slice of MyClass instances.
The developer may run GC.free to free the allocated memory.
But GC may perform its own garbage collection of GC allocated memory blocks.

Let's look at each attribute: (confirm if my analysis is right, otherwise correct)

FINALIZE - just before GC reclaims the memory, such as with GC.free,
call destructors, aka finalizers.

NO_SCAN - There may be false positives regarding byte values that look like 'new'
allocated pointers. This can result in 'garbage' memory not being collected.
If we are CERTAIN that this memory block doesn't contain any pointers to
'new' SomeClass allocated memory, then mark as NO_SCAN.

      Question 1: if GC-calloc has allocated MyClass that has a string 'name' member,
      which may expand in size, would we still properly apply NO_SCAN?

      Question 2: if GC-calloc has allocated MyClass, which may allocate
      new MyStudent(...), would that mean 'don't apply NO_SCAN'?

NO_MOVE - For GC.realloc, if increasing memory allocated, and it's not available,
throw 'MEMORY_NOT_AVAILABLE' exception.

APPENDABLE - For D internal runtime use. Don't mark this yourself.

NO_INTERIOR - This says that only the base address of the block may be a target address
of other GC allocated pointers.
All other possible pointers are 'false' pointers.

          Question 3: How is this different than NO_SCAN?

Perhaps I am missing the fundamentals of various D garbage collectors.

Are there any articles that expound on this issue?

3 days ago

Re: GC BlkAttr clarification. Programming in D pages 671, 672. About GC

Posted by H. S. Teoh
in reply to Brother Bill

On Wed, Sep 03, 2025 at 07:56:03PM +0000, Brother Bill via Digitalmars-d-learn wrote:
> It appears that D has multiple ways of determining which created objects are subject to GC.

No.  The GC knows which memory address ranges it manages, and any pointer that falls outside of those ranges will be ignored, since it cannot possibly be GC-allocated.

> For class instances, new creates a new class instance, returning a pointer to it.  While "alive", other pointers may also point to the class instance.  Once no pointers are pointing to this class instance, it may be garbage collected.

Correct.

[...]
> C, C++ and D can play shenanigans with pointers, such as casting them to size_t, which hides them from the GC.

D's current GC is conservative, meaning that any value it sees that looks like it might be a pointer value, will be regarded as a pointer value.

There is an optional precise GC that has been implemented, that can be turned on with compiled-in options or command line options, which uses a slightly less conservative scheme.

[...]
> GC.calloc can allocate memory for a slice of MyClass instances.  The developer may run GC.free to free the allocated memory.  But GC may perform its own garbage collection of GC allocated memory blocks.

I might also add that D's GC *does not run* unless the user program tries to allocate GC memory. Unlike languages like Java, where the GC may be implicitly triggered by the runtime in the background, D's GC lies dormant until you perform a GC allocation and it decides that there isn't enough free memory left and it's time to run a collection cycle.

IOW, if you don't want the GC to run, simply don't allocate any more GC memory, and no collection cycles will run (unless you call it yourself via GC.collect).
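A rough way to observe this, assuming a druntime recent enough to provide `GC.profileStats()`: the hypothetical helper `collectionsDuring` below counts how many collection cycles complete while some work runs. `sumTo` is likewise just an illustrative name. Work that performs no GC allocation should report zero cycles, while an explicit `GC.collect()` forces one.

```d
import core.memory : GC;

/// Runs `work` and returns how many GC collection cycles completed meanwhile.
/// (Hypothetical helper; relies on GC.profileStats().numCollections.)
size_t collectionsDuring(scope void delegate() work)
{
    const before = GC.profileStats().numCollections;
    work();
    return GC.profileStats().numCollections - before;
}

/// Pure arithmetic with no GC allocation (enforced by @nogc),
/// so it gives the GC no opportunity to start a cycle.
int sumTo(int n) @nogc nothrow
{
    int total = 0;
    foreach (i; 0 .. n)
        total += i;
    return total;
}
```

Calling `collectionsDuring` with a delegate that only calls `sumTo` should return 0 in a single-threaded program; passing a delegate that calls `GC.collect()` should return at least 1.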

> Let's look at each attribute:  (confirm if my analysis is right,
> otherwise correct)
> 
> FINALIZE - just before GC reclaims the memory, such as with GC.free,
>            call destructors, aka finalizers.

This bit is probably best left untouched by user code, and left to the runtime to figure out when/how to use it.

> NO_SCAN - There may be false positives regarding byte values that look like 'new' allocated pointers.  This can result in 'garbage' memory not being collected.  If we are CERTAIN that this memory block doesn't contain any pointers to 'new' SomeClass allocated memory, then mark as NO_SCAN.

Correct.  Though if you're writing idiomatic D code, you'll almost never need to worry about this.  Whenever you allocate an array whose elements are PODs (without any pointers), the allocator will automatically mark the memory NO_SCAN so that the GC doesn't waste time scanning such blocks.  So things like implicit string allocations will be marked NO_SCAN, etc.  If you're allocating an array or object that contains indirections, then NO_SCAN will not be set, so the GC will scan the interior of such blocks for pointers to other live objects.

>           Question 1: if GC-calloc has allocated MyClass that has a
>           string 'name' member, which may expand in size, would we
>           still properly apply NO_SCAN?

You need to understand that string members are pointer/size pairs.  The content of the string is never stored inside the object's memory block itself.

An empty string does not have any associated allocated memory, and when you assign to the string, whether or not it will be scanned depends on where it came from.  If it came from a string literal, it will be in the program's static memory, and the GC never scans that (neither does it have any GC flags like NO_SCAN).  If the string comes from GC-allocated memory, then that memory by default will have been marked NO_SCAN because string data is assumed not to contain any pointer values, only character values. (This is why it's a bad idea to mask a pointer by, e.g., converting it into a string representation and storing it inside a string.  Because the GC won't scan such strings, it may mistakenly collect a live object thinking that it's dead, if the only references to the object are inside such strings.)
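The points above can be checked with a small sketch. The helpers `isGcManaged` and `isNoScan` and the struct `Person` are hypothetical names introduced for illustration; the only runtime calls used are `GC.addrOf` and `GC.getAttr` from `core.memory`.

```d
import core.memory : GC;

/// True if `p` points into memory managed by the GC.
bool isGcManaged(const void* p)
{
    return GC.addrOf(p) !is null;   // addrOf yields null for non-GC memory
}

/// True if `p` points into a GC block whose NO_SCAN attribute is set.
bool isNoScan(const void* p)
{
    auto base = GC.addrOf(p);       // getAttr wants the block's base address
    return base !is null && (GC.getAttr(base) & GC.BlkAttr.NO_SCAN) != 0;
}

/// Hypothetical type: the struct stores only a pointer/length pair,
/// never the characters themselves.
struct Person
{
    string name;
}

static assert(string.sizeof == 2 * size_t.sizeof);
static assert(Person.sizeof == string.sizeof);
```

With these helpers, a string literal's `.ptr` should not be GC-managed, a `.dup` of it should live in a NO_SCAN block, and a `new Person(...)` block should not be NO_SCAN because it holds the string's pointer.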

>           Question 2: if GC-calloc has allocated MyClass, which may
>           allocate new MyStudent(...), would that mean 'don't apply
>           NO_SCAN'?

It's very simple.  If a memory block may contain pointers, then it should not be NO_SCAN.  If a memory block never contains any pointers, then it can (should) be marked NO_SCAN.

When does a memory block contain pointers?  When the object that lives in it contains references, such as references to other objects, to GC-allocated strings, arrays, etc.  If the object only contains PODs, then there are no pointers and it may be safely marked NO_SCAN.

But again, I'd like to repeat that user code rarely needs to bother with NO_SCAN or other GC flags.  The default implementation of `new` will automatically do the right thing for you.  You only need to fiddle with GC flags if you're doing something unusual, like emplacing an object in memory you manually allocated (as opposed to memory allocated by `new`), or if you're dealing with void[] arrays (possibly constructed externally) where the runtime doesn't know the actual type of the data.
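For the unusual case of emplacing an object in manually allocated GC memory, here is a minimal sketch, assuming the type is a struct and that `std.traits.hasIndirections` is an acceptable way to decide the flag. The names `makeInGcBlock`, `Point` and `Label` are hypothetical. Note that blocks allocated this way do not get their destructors run by the GC.

```d
import core.lifetime : emplace;
import core.memory : GC;
import std.traits : hasIndirections;

/// Allocates zeroed GC memory for one T and constructs a T in it.
/// NO_SCAN is set only when T's layout contains no pointers.
T* makeInGcBlock(T, Args...)(auto ref Args args)
if (is(T == struct))
{
    enum uint attr = hasIndirections!T ? 0 : GC.BlkAttr.NO_SCAN;
    void* mem = GC.calloc(T.sizeof, attr);
    return emplace(cast(T*) mem, args);
}

struct Point { int x, y; }             // hypothetical POD: no pointers
struct Label { string text; int id; }  // hypothetical: holds a pointer/length pair
```

`makeInGcBlock!Point(3, 4)` should land in a NO_SCAN block, while `makeInGcBlock!Label("abc", 7)` should not, since the GC must scan it to keep the string data alive. A plain `new Point(3, 4)` makes the same choice without any of this.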

Normal D code does not need to fiddle with GC flags.

> NO_MOVE - For GC.realloc, if increasing memory allocated, and it's not available, throw 'MEMORY_NOT_AVAILABLE' exception.

Correct. You might want to use this flag if you have non-D code that might be holding pointers to this memory block, e.g., if you passed a pointer to some D array to C code which retains it in some C-managed pointer, and the C code expects the array to still be there later.

It's not very often that such situations come up, though.  When passing GC-allocated data to C code, it's generally a good idea to keep a reference to it inside D code so that the GC can find the reference anyway.  Since D doesn't have a moving GC, this is really all you need to do.  Again, unless you're doing something unusual, you probably don't need to touch the NO_MOVE flag.
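If keeping a D-side reference is awkward, `core.memory` also offers `GC.addRoot` and `GC.removeRoot`. A minimal sketch, where `shareWithC` and `reclaimFromC` are hypothetical wrapper names and the C library is left out entirely:

```d
import core.memory : GC;

/// Registers `p` as a GC root so the block stays alive while
/// foreign code holds the only pointer to it. Returns `p` for chaining.
void* shareWithC(void* p)
{
    GC.addRoot(p);
    return p;
}

/// Undoes shareWithC once the foreign code no longer uses `p`;
/// after this the block is collectable again when unreferenced.
void reclaimFromC(void* p)
{
    GC.removeRoot(p);
}
```

Every `shareWithC` needs a matching `reclaimFromC`, otherwise the block is never collected.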

> APPENDABLE - For D internal runtime use.  Don't mark this yourself.

Yes.

> NO_INTERIOR - This says that only the base address of the block may be a target address of other GC allocated pointers.  All other possible pointers are 'false' pointers.

Correct.  This flag might be useful if you know that your program will only ever point to the head of the block, and you wish to optimize GC performance by letting it skip over values that look like pointers, but aren't (because they point to the interior of a NO_INTERIOR block, so the GC knows it can't be a real pointer value).  But I'd recommend not bothering with this flag unless you have a GC performance issue, and you're sure that this specific situation is the cause of said performance issue.  Otherwise you're just wasting time playing with GC flags that don't really make a significant difference.

[...]
>               Question 3: How is this different than NO_SCAN?

NO_SCAN means don't look for pointer values inside the memory block.

NO_INTERIOR means ignore pointers (outside this block) that point to the
interior of this block.
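As a sketch of how the two flags combine, the hypothetical function `allocHeadOnlyTable` below requests both: NO_SCAN describes what is stored inside the block (no outgoing pointers), NO_INTERIOR describes how the rest of the program refers to it (only by its base address). Whether a given GC implementation honours NO_INTERIOR for a block of a given size is an implementation detail not covered here.

```d
import core.memory : GC;

/// Allocates a raw table of `bytes` bytes.
/// NO_SCAN:     the table's contents hold no pointers, so don't scan them.
/// NO_INTERIOR: we promise to keep a pointer to the first byte alive;
///              values that merely land inside the table may be ignored.
void* allocHeadOnlyTable(size_t bytes)
{
    return GC.malloc(bytes, GC.BlkAttr.NO_SCAN | GC.BlkAttr.NO_INTERIOR);
}
```

The caller must hold on to the returned pointer itself. Keeping only a slice that starts partway into the block would break the NO_INTERIOR promise and could let the GC free the table.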

> Perhaps I am missing the fundamentals of various D garbage collectors.
[...]

It's really very simple.  D uses a mark-and-sweep collector. That means that at the start of every GC collection cycle, it starts with a set of roots: pointers that represent active references to memory, such as CPU registers, pointers on the runtime stack, global variables that contain pointers, etc.  Then it recursively walks these pointers, and every GC-allocated object that it reaches will be marked as live. The contents of these objects are scanned for more pointers to other objects, and so on. At the end of the cycle, any object that isn't marked live is unreachable from the program's roots, so it must be dead and can be collected.
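A toy model may make the two phases concrete. This is not druntime's implementation, only a sketch: objects are entries in an array, "pointers" are array indices, and `Obj`, `mark` and `sweep` are hypothetical names.

```d
/// Toy heap object: `refs` holds the indices of the objects it points to.
struct Obj
{
    size_t[] refs;
    bool marked;
}

/// Mark phase: flags every object reachable from `roots`.
void mark(Obj[] heap, const size_t[] roots)
{
    size_t[] worklist = roots.dup;
    while (worklist.length)
    {
        const i = worklist[$ - 1];
        worklist = worklist[0 .. $ - 1];
        if (heap[i].marked)
            continue;               // already visited
        heap[i].marked = true;
        worklist ~= heap[i].refs;   // scan the object's contents for more pointers
    }
}

/// Sweep phase: returns the indices of objects that were never marked.
size_t[] sweep(const Obj[] heap)
{
    size_t[] dead;
    foreach (i, ref o; heap)
        if (!o.marked)
            dead ~= i;
    return dead;
}
```

In this model a NO_SCAN block corresponds to an object whose `refs` the mark phase never needs to look at. Note that an object can point at live objects and still be dead: only reachability from the roots counts.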

D's GC is conservative by default, meaning that it does not assume anything about the structure of data in allocated blocks. Any pointer-size aligned integer values that look like they are pointers, will be treated as pointers.  (The docs officially discourage deliberately storing pointers in integer variables, though, since this would break the optional precise GC that *does* make certain assumptions about where pointer values might be located inside allocated blocks.) Of course, as I said earlier, the GC knows which memory ranges it manages, so any pointer values that are outside of these ranges will be ignored as irrelevant.

The various GC flags are simply hints that let you influence the scanning process to some extent. The NO_SCAN bit means that upon reaching this block, don't bother scanning its contents to find more pointers (because there are none). The NO_INTERIOR bit means that if the GC finds a pointer-like value that looks like it points to the inside of this block, ignore it as a non-pointer, because pointers to this block only ever point to its head (the supposed pointer is actually not a real pointer, but an integer value that happens to have a pointer-like value).

The other flags have very specific uses that, if you don't know what they actually do, you probably don't need them and shouldn't touch them.

T

-- 
Bomb technician: If I'm running, try to keep up.

2 days ago

Re: GC BlkAttr clarification. Programming in D pages 671, 672. About GC

Posted by Steven Schveighoffer
in reply to H. S. Teoh

On Wednesday, 3 September 2025 at 23:05:45 UTC, H. S. Teoh wrote:

> On Wed, Sep 03, 2025 at 07:56:03PM +0000, Brother Bill via Digitalmars-d-learn wrote:
> [...]
> > C, C++ and D can play shenanigans with pointers, such as casting them to size_t, which hides them from the GC.
>
> D's current GC is conservative, meaning that any value it sees that looks like it might be a pointer value, will be regarded as a pointer value.
>
> There is an optional precise GC that has been implemented, that can be turned on with compiled-in options or command line options, which uses a slightly less conservative scheme.

The recommendation is to avoid storing the only reference to an allocated block in a size_t.

Even without the precise collector, the GC distinguishes pointer-containing blocks from no-pointer blocks. This means that it's quite easy to accidentally store the only copy of a pointer in a size_t that will not be scanned, even with the conservative GC.

You should only store pointers as size_t "if you know what you are doing". Otherwise do not do this.

It is fine to make a temporary copy of a pointer to a size_t for example to examine the bits inside. This should leave the original pointer alone.
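A minimal sketch of that harmless kind of use, with `isAligned` as a hypothetical helper name: the integer copy is a short-lived local, and the original pointer stays where the GC can see it.

```d
/// True if `p` is aligned to `alignment` bytes (`alignment` must be a power of two).
bool isAligned(const void* p, size_t alignment) @nogc nothrow pure
{
    const bits = cast(size_t) p;   // temporary integer copy; `p` remains a pointer
    return (bits & (alignment - 1)) == 0;
}
```

What should be avoided is the reverse situation, where the `size_t` is the only place the address survives.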

> [...]
> > GC.calloc can allocate memory for a slice of MyClass instances. The developer may run GC.free to free the allocated memory. But GC may perform its own garbage collection of GC allocated memory blocks.

GC.free is going to free the memory. It will NOT run finalizers. It will not collect it again later. I want to make that clear.

If you do not explicitly free the memory, and it becomes garbage, then the GC will collect it.
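If the destructor does need to run before an explicit free, it has to be invoked separately. A small sketch, where `Resource` and `destroyAndFree` are hypothetical names and the static counter exists only to make the destructor call observable:

```d
import core.memory : GC;

/// Hypothetical class whose destructor we want to observe.
class Resource
{
    static int finalized;   // counts destructor runs (thread-local)
    ~this() { ++finalized; }
}

/// Runs the destructor explicitly, then hands the block back to the GC.
void destroyAndFree(Resource r)
{
    auto mem = cast(void*) r;
    destroy(r);     // runs ~this(); GC.free on its own would not
    GC.free(mem);   // releases the memory immediately
}
```

After `destroyAndFree`, any remaining references to the object are dangling, so this is only safe when the caller knows it holds the last one.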

As for a slice of MyClass instances: if you mean a slice of data that contains the fields of an array of classes, you should be very cautious. The GC is not equipped to call finalizers on such a structure, so you will likely run into lifetime issues.

For classes, I'd just stick with new.

For structs, you can quite easily allocate an array of structs, and the GC can support finalization of that. I also recommend just using new.

> > Let's look at each attribute: (confirm if my analysis is right, otherwise correct)
> >
> > FINALIZE - just before GC reclaims the memory, such as with GC.free, call destructors, aka finalizers.
>
> This bit is probably best left untouched by user code, and left to the runtime to figure out when/how to use it.

In the latest compiler (2.111), this has been changed to a bit that requests finalization upon allocation. The GC uses this bit and the typeinfo passed in to determine the correct action. This differs from before, when the bit was an implementation detail and you had to know exactly what you were asking for.

I do agree that you should basically leave this alone. But for sure the new treatment of the bit is more robust than before.

Note: changing bits after allocation does not take this into account; at that point you are modifying implementation details. I really would like to get rid of these bits completely and use a more reliable API (having a set of implementation bits as an option is quite dangerous).

> > NO_SCAN - There may be false positives regarding byte values that look like 'new' allocated pointers. This can result in 'garbage' memory not being collected. If we are CERTAIN that this memory block doesn't contain any pointers to 'new' SomeClass allocated memory, then mark as NO_SCAN.
>
> Correct. Though if you're writing idiomatic D code, you'll almost never need to worry about this. Whenever you allocate an array whose elements are PODs (without any pointers), the allocator will automatically mark the memory NO_SCAN so that the GC doesn't waste time scanning such blocks. So things like implicit string allocations will be marked NO_SCAN, etc. If you're allocating an array or object that contains indirections, then NO_SCAN will not be set, so the GC will scan the interior of such blocks for pointers to other live objects.

I will add that the concern of scanning non-pointers is pretty much obsolete with 64-bit addressing. It's still important to use NO_SCAN, as it's quite common to allocate large blocks of data that are just bytes (e.g. load a file). You don't want to waste time scanning that, even if there are no false-positives to be found in there.
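For the large-byte-buffer case, a minimal sketch using `GC.calloc` directly; `allocRawBytes` is a hypothetical name. In ordinary code `new ubyte[](n)` gets the same NO_SCAN treatment automatically, so the manual call is only worth it when the memory must come from `GC.calloc` or `GC.malloc` anyway.

```d
import core.memory : GC;

/// Allocates `n` zeroed bytes in a block the GC will not scan for pointers.
ubyte[] allocRawBytes(size_t n)
{
    auto p = cast(ubyte*) GC.calloc(n, GC.BlkAttr.NO_SCAN);
    return p[0 .. n];
}
```

Such a buffer must never be used to hold the only reference to another GC object, since the GC will not look inside it.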

> >       Question 1: if GC-calloc has allocated MyClass that has a
> >       string 'name' member, which may expand in size, would we
> >       still properly apply NO_SCAN?

I would say no. A string contains a pointer, so the block holding it should be scanned.

> >       Question 2: if GC-calloc has allocated MyClass, which may
> >       allocate new MyStudent(...), would that mean 'don't apply
> >       NO_SCAN'?
>
> It's very simple. If a memory block may contain pointers, then it should not be NO_SCAN. If a memory block never contains any pointers, then it can (should) be marked NO_SCAN.

100% correct.

> Normal D code does not need to fiddle with GC flags.

Great advice!

> > NO_MOVE - For GC.realloc, if increasing memory allocated, and it's not available, throw 'MEMORY_NOT_AVAILABLE' exception.
>
> Correct. You might want to use this flag if you have non-D code that might be holding pointers to this memory block, e.g., if you passed a pointer to some D array to C code which retains it in some C-managed pointer, and the C code expects the array to still be there later.
>
> It's not very often that such situations come up, though. When passing GC-allocated data to C code, it's generally a good idea to keep a reference to it inside D code so that the GC can find the reference anyway. Since D doesn't have a moving GC, this is really all you need to do. Again, unless you're doing something unusual, you probably don't need to touch the NO_MOVE flag.

No, this is not correct. NO_MOVE is supposed to mean that a moving GC cannot move this block (and fix up pointers to it).

Given that we have a conservative GC, which scans the stack conservatively including C stacks, and we will always have one, I would say this bit should just be deprecated.

Indeed, it is completely ignored in the current GC.

> > APPENDABLE - For D internal runtime use. Don't mark this yourself.
>
> Yes.

Also improved with D 2.111. The APPENDABLE bit is now an input to malloc that tells the GC this is an array (including adjusting the size to deal with padding space). The GC now handles array runtime features directly, and so it understands what this means.

So in fact, this is a bit you can set, and there are currently unexposed GC interface functions that can be used to manage the array. They have not yet been exposed in core.memory, because we are not sure if these are the final interfaces we want.

However, allocating an array with this bit will do exactly what you expect (and managing the resulting array with the normal array management functions such as appending or capacity will work).

I do still recommend using new.

> > NO_INTERIOR - This says that only the base address of the block may be a target address of other GC allocated pointers. All other possible pointers are 'false' pointers.

Yes, though I would say it like:

"only pointers found while scanning that point to the exact target address may be considered pointers to the block."

Again, this is really only of great use in 32-bit addressing.

> > Perhaps I am missing the fundamentals of various D garbage collectors.
> [...]
>
> The various GC flags are simply hints that let you influence the scanning process to some extent. The NO_SCAN bit means that upon reaching this block, don't bother scanning its contents to find more pointers (because there are none). The NO_INTERIOR bit means that if the GC finds a pointer-like value that looks like it points to the inside of this block, ignore it as a non-pointer, because pointers to this block only ever point to its head (the supposed pointer is actually not a real pointer, but an integer value that happens to have a pointer-like value).
>
> The other flags have very specific uses that, if you don't know what they actually do, you probably don't need them and shouldn't touch them.

Flags you should be able to use:

NO_SCAN
FINALIZE
APPENDABLE
NO_INTERIOR (very cautiously)

Do not use any other bits directly. A future version of D likely will migrate these into function parameters instead of providing bits.

-Steve
