Thread overview
Copying/heap compacting GC
Mar 24, 2004
Stewart Gordon
Mar 24, 2004
Ben Hinkle
Mar 25, 2004
Stewart Gordon
Mar 24, 2004
Stephan Wienczny
Mar 25, 2004
Cloud9 Virtual
May 05, 2005
Kevin Bealer
May 06, 2005
Stewart Gordon
March 24, 2004
http://www.digitalmars.com/d/garbage.html

"#  Modern garbage collectors are far more advanced now than the older, slower ones. Generational, copying collectors eliminate much of the inefficiency of early mark and sweep algorithms.
# Modern garbage collectors do heap compaction. Heap compaction tends to reduce the number of pages actively referenced by a program, which means that memory accesses are more likely to be cache hits and less swapping."

The fact that this is given in the context of D suggests that these kinds of GC are valid implementations for D.

The trouble is that copying and heap compaction naturally change the memory addresses to which pointers point.  So before this can work, the "pointers being pointers and not pointers being not pointers" needs to be considered further.

The isuses I can see are:

- Unions.  Obviously, if it doesn't know whether something is a pointer or not, it can't know whether to reassign it when the suspected referent is moved.

- Crocky APIs.  std.c.windows.windows defines handles as pointers. Maybe they are, just not into memory that an application is going to manipulate directly.  Are they?  If not, they could clash with pointers into the GC heap, and a GC pass would then screw up the handle. Something else that's cast into a pointer type but blatantly isn't is numeric resource IDs.

- Pointers into the GC heap held by foreign code.  Obviously you need a pointer on the D side as well.  But if the GC moves the object, while the D side pointer'll be updated, the foreign one surely won't.

- std.gc.addRange.  Would it be at all straightforward to know what's a pointer and what isn't within an added range?  (Or is this only meant for adding arrays of pointers?  This isn't crystal clear.)

Maybe someone has ideas....

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment.  Please keep replies on the 'group where everyone may benefit.
March 24, 2004
On Wed, 24 Mar 2004 09:16:20 +0000, Stewart Gordon <smjg_1998@yahoo.com> wrote:

>http://www.digitalmars.com/d/garbage.html
>
>"#  Modern garbage collectors are far more advanced now than the older,
>slower ones. Generational, copying collectors eliminate much of the
>inefficiency of early mark and sweep algorithms.
># Modern garbage collectors do heap compaction. Heap compaction tends to
>reduce the number of pages actively referenced by a program, which means
>that memory accesses are more likely to be cache hits and less swapping."
>
>The fact that this is given in the context of D suggests that these kinds of GC are valid implementations for D.
>
>The trouble is that copying and heap compaction naturally change the memory addresses to which pointers point.  So before this can work, the "pointers being pointers and not pointers being not pointers" needs to be considered further.

I worry about this, too, though I suspect most any GC algorithm can be
adapted to work with "ambiguous pointers". The price is GC complexity
and probably some performance hit. I'm not a GC expert though so
take my answers with a huge grain of salt.

>The isuses I can see are:
>
>- Unions.  Obviously, if it doesn't know whether something is a pointer or not, it can't know whether to reassign it when the suspected referent is moved.

There are "mostly-copying" collectors which, I think, know about
data blocks that can be moved and data blocks that can't. That
means the GC knows about the union pieces of a data structure
as well as the "normal" pointer pieces. Basically it makes the
GC and runtime type info requirements more complex but since
unions probably don't show up all that much it still means
the collector copies and compacts most of the time.

>- Crocky APIs.  std.c.windows.windows defines handles as pointers. Maybe they are, just not into memory that an application is going to manipulate directly.  Are they?  If not, they could clash with pointers into the GC heap, and a GC pass would then screw up the handle. Something else that's cast into a pointer type but blatantly isn't is numeric resource IDs.

I assume the GC can tell the difference between pointers into the GC heap and pointers elsewhere.

>- Pointers into the GC heap held by foreign code.  Obviously you need a pointer on the D side as well.  But if the GC moves the object, while the D side pointer'll be updated, the foreign one surely won't.

I guess the GC would need an API to mark a data block as being unmovable.

>- std.gc.addRange.  Would it be at all straightforward to know what's a pointer and what isn't within an added range?  (Or is this only meant for adding arrays of pointers?  This isn't crystal clear.)

I assume that addRange should work with arbitrary data structures and not just arrays of pointers. In any case the addRange'd data would be unmovable and the things pointed to by the data would be unmovable.

>Maybe someone has ideas....
>
>Stewart.

March 24, 2004
Do you know a copying gc algorithm we could use?
What should be used incremental or not?

Stephan
March 25, 2004
Stephan Wienczny wrote:
> 
> Do you know a copying gc algorithm we could use?
> What should be used incremental or not?
> 
> Stephan
For a definitive treatment of all publicly available garbage collecting algorithms get this:

http://www.amazon.com/exec/obidos/tg/detail/-/0471941484/qid=1080203603/sr=1-9/ref=sr_1_9/102-8193337-4992900?v=glance&s=books

There are a couple of proprietary/patented strategies out there that are not in the book; I can provide pointers if anyone wants them. BTW there is nothing wrong with the stock mark&sweep algorithm if you don't care about real-time collection.
March 25, 2004
Ben Hinkle wrote:
> On Wed, 24 Mar 2004 09:16:20 +0000, Stewart Gordon <smjg_1998@yahoo.com> wrote:
<snip>
>> - Crocky APIs.  std.c.windows.windows defines handles as pointers.
>>  Maybe they are, just not into memory that an application is going
>> to manipulate directly.  Are they?  If not, they could clash with
>> pointers into the GC heap, and a GC pass would then screw up the
>> handle. Something else that's cast into a pointer type but
>> blatantly isn't is numeric resource IDs.
> 
> I assume the GC can tell the difference between pointers into the GC
> heap and pointers elsewhere.

What has this to do with pointers that aren't pointers at all?

>> - Pointers into the GC heap held by foreign code.  Obviously you
>> need a pointer on the D side as well.  But if the GC moves the
>> object, while the D side pointer'll be updated, the foreign one
>> surely won't.
> 
> I guess the GC would need an API to mark a data block as being unmovable.
<snip>

Guess you're right.  makeImmovable and makeMovable?  Or unmakeImmovable?  Maybe setImmovable and clearImmovable?

And should the set and clear calls have to be numerically balanced in order to reinstate movability?

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the
unfortunate victim of intensive mail-bombing at the moment.  Please keep
replies on the 'group where everyone may benefit.
May 05, 2005
In article <c3rjl6$1nds$1@digitaldaemon.com>, Stewart Gordon says...
>
>http://www.digitalmars.com/d/garbage.html
>
>"#  Modern garbage collectors are far more advanced now than the older,
>slower ones. Generational, copying collectors eliminate much of the
>inefficiency of early mark and sweep algorithms.
># Modern garbage collectors do heap compaction. Heap compaction tends to
>reduce the number of pages actively referenced by a program, which means
>that memory accesses are more likely to be cache hits and less swapping."
>
>The fact that this is given in the context of D suggests that these kinds of GC are valid implementations for D.
>
>The trouble is that copying and heap compaction naturally change the memory addresses to which pointers point.  So before this can work, the "pointers being pointers and not pointers being not pointers" needs to be considered further.
>
>The isuses I can see are:
>
>- Unions.  Obviously, if it doesn't know whether something is a pointer or not, it can't know whether to reassign it when the suspected referent is moved.

Two potential solutions:  If D does not specify the exact layout of a union type, then D could track which field of the union you're using right now.  This is often done by the programmer, but D could do it internally.  It could be an additional debug-mode-only assertion() like array indices.  It would die if you tried to read read a different union field than you recently stored to.

More likely: in the few cases that you have unions, just pin down the disputed memory and move everything else.  Areas of memory allocated via malloc() and added to the GC via addRange() would also be included in the "immobile" category.

This is harder because it requires a "mostly copyin" collector, but I don't think it's fatal.  Some of the simplicity of a moving collector is lost.

>- Crocky APIs.  std.c.windows.windows defines handles as pointers. Maybe they are, just not into memory that an application is going to manipulate directly.  Are they?  If not, they could clash with pointers into the GC heap, and a GC pass would then screw up the handle. Something else that's cast into a pointer type but blatantly isn't is numeric resource IDs.

If you work with these you could allocate them via "malloc()" or similar, and they would act like a giant union as above.

>- Pointers into the GC heap held by foreign code.  Obviously you need a pointer on the D side as well.  But if the GC moves the object, while the D side pointer'll be updated, the foreign one surely won't.

Umm... pin them down by putting them in a union as above.... did I just invent a new kinda "locked memory" pointer?

>- std.gc.addRange.  Would it be at all straightforward to know what's a pointer and what isn't within an added range?  (Or is this only meant for adding arrays of pointers?  This isn't crystal clear.)
>
>Maybe someone has ideas....
>
>Stewart.
>
>-- 
>My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment.  Please keep replies on the 'group where everyone may benefit.

I think the thing is to ask yourself, WHY is a moving collector better than a non-moving collector?  To get the whole story, you can read one of the "gc surveys".  A famous one: (ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps).

But the gist as I remember it, is two fold:

A. Structures like trees and linked lists are iterated "in order", both in the program and in the GC.  This means that consecutive list nodes and the data they point to (if any) get moved into the new locations IN ORDER.  This is great for cache efficiency because memory access is almost array-like if you do it sequentially.  All the nodes of a list get moved into consequetive spots.

--> Essentially the garbage collector becomes a heap data structure optimizer.

B. Solves most internal fragmentation problems.  Non-moving collectors sometimes end up in states where most of the free memory is divided into small pieces that are small.  If this happens, the memory can be exhausted.  There is plenty of space, but not enough in any one place.  Copying collectors are normally immune to this effect, which can be a boon for real-time systems that could otherwise die from this effect "one in a million times".

So.... if you can get a 90 % copying collector (ie "conservative" compaction), it might be good enough.  A few unions stuck here and there will not be fatal to this plan, as long as the "fully understood" D objects are movable.

Kevin



May 06, 2005
Kevin Bealer wrote:
> In article <c3rjl6$1nds$1@digitaldaemon.com>, Stewart Gordon says...
<snip>
>>- Unions.  Obviously, if it doesn't know whether something is a pointer or not, it can't know whether to reassign it when the suspected referent is moved.
> 
> Two potential solutions:  If D does not specify the exact layout of a union
> type, then D could track which field of the union you're using right now. 

Structs and unions are designed to be compatible with the C ABI.  As such, there's no room for that extra piece of information.

> This is often done by the programmer, but D could do it internally.  It could be an
> additional debug-mode-only assertion() like array indices.  It would die if you
> tried to read read a different union field than you recently stored to.
> 
> More likely: in the few cases that you have unions, just pin down the disputed
> memory and move everything else.  Areas of memory allocated via malloc() and
> added to the GC via addRange() would also be included in the "immobile"
> category.

The difficulty is that the GC would need to know in advance whether a given piece of heap memory is pointed to from a union.  The essence of a copying GC is that it collects garbage by copying everything to newly allocated memory and then deallocating the old memory.  And so if it's copied a certain block and then it stumbles on a pointer to it in a union, it'll have to change back all the pointers it's changed.

> This is harder because it requires a "mostly copyin" collector, but I don't
> think it's fatal.  Some of the simplicity of a moving collector is lost.
> 
>>- Crocky APIs.  std.c.windows.windows defines handles as pointers. Maybe they are, just not into memory that an application is going to manipulate directly.  Are they?  If not, they could clash with pointers into the GC heap, and a GC pass would then screw up the handle. Something else that's cast into a pointer type but blatantly isn't is numeric resource IDs.
> 
> If you work with these you could allocate them via "malloc()" or similar, and
> they would act like a giant union as above.

You mean I should store all my resource IDs in malloc'd memory?  I'm not sure that that's always possible, nor how it would help.

>>- Pointers into the GC heap held by foreign code.  Obviously you need a pointer on the D side as well.  But if the GC moves the object, while the D side pointer'll be updated, the foreign one surely won't.
> 
> Umm... pin them down by putting them in a union as above.... did I just invent a
> new kinda "locked memory" pointer?

As I just said, having unions as the way of pinning is going to cause difficulties.  Better to add a specific pinning feature to D.

<snip>
> I think the thing is to ask yourself, WHY is a moving collector better than a
> non-moving collector?  To get the whole story, you can read one of the "gc
> surveys".  A famous one: (ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps).
> 
> But the gist as I remember it, is two fold:
> 
> A. Structures like trees and linked lists are iterated "in order", both in the
> program and in the GC.  This means that consecutive list nodes and the data they
> point to (if any) get moved into the new locations IN ORDER.  This is great for
> cache efficiency because memory access is almost array-like if you do it
> sequentially.  All the nodes of a list get moved into consequetive spots.

Assuming that there aren't other ways into the structures that may have been picked up first.

> --> Essentially the garbage collector becomes a heap data structure optimizer.
> 
> B. Solves most internal fragmentation problems.  Non-moving collectors sometimes
> end up in states where most of the free memory is divided into small pieces that
> are small.  If this happens, the memory can be exhausted.  There is plenty of
> space, but not enough in any one place.  Copying collectors are normally immune
> to this effect, which can be a boon for real-time systems that could otherwise
> die from this effect "one in a million times".
<snip>

I guess that just about every heap allocation system would have a cluster size.  So if your allocated objects aren't too big, then this won't be too much of a problem.  OTOH if you are working with large arrays, and these are competing for space with smaller things, then there is indeed this issue.

And as long as you don't use up so much memory that the copying GC has no room to manoeuvre....

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on the 'group where everyone may benefit.