View mode: basic / threaded / horizontal-split · Log in · Help
March 24, 2004
Copying/heap compacting GC
http://www.digitalmars.com/d/garbage.html

"#  Modern garbage collectors are far more advanced now than the older, 
slower ones. Generational, copying collectors eliminate much of the 
inefficiency of early mark and sweep algorithms.
# Modern garbage collectors do heap compaction. Heap compaction tends to 
reduce the number of pages actively referenced by a program, which means 
that memory accesses are more likely to be cache hits and less swapping."

The fact that this is given in the context of D suggests that these 
kinds of GC are valid implementations for D.

The trouble is that copying and heap compaction naturally change the 
memory addresses to which pointers point.  So before this can work, the 
"pointers being pointers and not pointers being not pointers" needs to 
be considered further.

The isuses I can see are:

- Unions.  Obviously, if it doesn't know whether something is a pointer 
or not, it can't know whether to reassign it when the suspected referent 
is moved.

- Crocky APIs.  std.c.windows.windows defines handles as pointers. 
Maybe they are, just not into memory that an application is going to 
manipulate directly.  Are they?  If not, they could clash with pointers 
into the GC heap, and a GC pass would then screw up the handle. 
Something else that's cast into a pointer type but blatantly isn't is 
numeric resource IDs.

- Pointers into the GC heap held by foreign code.  Obviously you need a 
pointer on the D side as well.  But if the GC moves the object, while 
the D side pointer'll be updated, the foreign one surely won't.

- std.gc.addRange.  Would it be at all straightforward to know what's a 
pointer and what isn't within an added range?  (Or is this only meant 
for adding arrays of pointers?  This isn't crystal clear.)

Maybe someone has ideas....

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.
March 24, 2004
Re: Copying/heap compacting GC
On Wed, 24 Mar 2004 09:16:20 +0000, Stewart Gordon
<smjg_1998@yahoo.com> wrote:

>http://www.digitalmars.com/d/garbage.html
>
>"#  Modern garbage collectors are far more advanced now than the older, 
>slower ones. Generational, copying collectors eliminate much of the 
>inefficiency of early mark and sweep algorithms.
># Modern garbage collectors do heap compaction. Heap compaction tends to 
>reduce the number of pages actively referenced by a program, which means 
>that memory accesses are more likely to be cache hits and less swapping."
>
>The fact that this is given in the context of D suggests that these 
>kinds of GC are valid implementations for D.
>
>The trouble is that copying and heap compaction naturally change the 
>memory addresses to which pointers point.  So before this can work, the 
>"pointers being pointers and not pointers being not pointers" needs to 
>be considered further.

I worry about this, too, though I suspect most any GC algorithm can be
adapted to work with "ambiguous pointers". The price is GC complexity
and probably some performance hit. I'm not a GC expert though so
take my answers with a huge grain of salt.

>The isuses I can see are:
>
>- Unions.  Obviously, if it doesn't know whether something is a pointer 
>or not, it can't know whether to reassign it when the suspected referent 
>is moved.

There are "mostly-copying" collectors which, I think, know about
data blocks that can be moved and data blocks that can't. That
means the GC knows about the union pieces of a data structure 
as well as the "normal" pointer pieces. Basically it makes the
GC and runtime type info requirements more complex but since 
unions probably don't show up all that much it still means 
the collector copies and compacts most of the time. 

>- Crocky APIs.  std.c.windows.windows defines handles as pointers. 
>Maybe they are, just not into memory that an application is going to 
>manipulate directly.  Are they?  If not, they could clash with pointers 
>into the GC heap, and a GC pass would then screw up the handle. 
>Something else that's cast into a pointer type but blatantly isn't is 
>numeric resource IDs.

I assume the GC can tell the difference between pointers into
the GC heap and pointers elsewhere.

>- Pointers into the GC heap held by foreign code.  Obviously you need a 
>pointer on the D side as well.  But if the GC moves the object, while 
>the D side pointer'll be updated, the foreign one surely won't.

I guess the GC would need an API to mark a data block as being
unmovable.

>- std.gc.addRange.  Would it be at all straightforward to know what's a 
>pointer and what isn't within an added range?  (Or is this only meant 
>for adding arrays of pointers?  This isn't crystal clear.)

I assume that addRange should work with arbitrary data structures
and not just arrays of pointers. In any case the addRange'd data
would be unmovable and the things pointed to by the data would
be unmovable.

>Maybe someone has ideas....
>
>Stewart.
March 24, 2004
Re: Copying/heap compacting GC
Do you know a copying gc algorithm we could use?
What should be used incremental or not?

Stephan
March 25, 2004
Re: Copying/heap compacting GC
Stephan Wienczny wrote:
> 
> Do you know a copying gc algorithm we could use?
> What should be used incremental or not?
> 
> Stephan
For a definitive treatment of all publicly available garbage collecting 
algorithms get this:

http://www.amazon.com/exec/obidos/tg/detail/-/0471941484/qid=1080203603/sr=1-9/ref=sr_1_9/102-8193337-4992900?v=glance&s=books

There are a couple of proprietary/patented strategies out there that are 
not in the book; I can provide pointers if anyone wants them. BTW there 
is nothing wrong with the stock mark&sweep algorithm if you don't care 
about real-time collection.
March 25, 2004
Re: Copying/heap compacting GC
Ben Hinkle wrote:
> On Wed, 24 Mar 2004 09:16:20 +0000, Stewart Gordon 
> <smjg_1998@yahoo.com> wrote:
<snip>
>> - Crocky APIs.  std.c.windows.windows defines handles as pointers.
>>  Maybe they are, just not into memory that an application is going
>> to manipulate directly.  Are they?  If not, they could clash with
>> pointers into the GC heap, and a GC pass would then screw up the
>> handle. Something else that's cast into a pointer type but
>> blatantly isn't is numeric resource IDs.
> 
> I assume the GC can tell the difference between pointers into the GC
> heap and pointers elsewhere.

What has this to do with pointers that aren't pointers at all?

>> - Pointers into the GC heap held by foreign code.  Obviously you
>> need a pointer on the D side as well.  But if the GC moves the
>> object, while the D side pointer'll be updated, the foreign one
>> surely won't.
> 
> I guess the GC would need an API to mark a data block as being 
> unmovable.
<snip>

Guess you're right.  makeImmovable and makeMovable?  Or unmakeImmovable? 
 Maybe setImmovable and clearImmovable?

And should the set and clear calls have to be numerically balanced in 
order to reinstate movability?

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the
unfortunate victim of intensive mail-bombing at the moment.  Please keep
replies on the 'group where everyone may benefit.
May 05, 2005
Re: Copying/heap compacting GC
In article <c3rjl6$1nds$1@digitaldaemon.com>, Stewart Gordon says...
>
>http://www.digitalmars.com/d/garbage.html
>
>"#  Modern garbage collectors are far more advanced now than the older, 
>slower ones. Generational, copying collectors eliminate much of the 
>inefficiency of early mark and sweep algorithms.
># Modern garbage collectors do heap compaction. Heap compaction tends to 
>reduce the number of pages actively referenced by a program, which means 
>that memory accesses are more likely to be cache hits and less swapping."
>
>The fact that this is given in the context of D suggests that these 
>kinds of GC are valid implementations for D.
>
>The trouble is that copying and heap compaction naturally change the 
>memory addresses to which pointers point.  So before this can work, the 
>"pointers being pointers and not pointers being not pointers" needs to 
>be considered further.
>
>The isuses I can see are:
>
>- Unions.  Obviously, if it doesn't know whether something is a pointer 
>or not, it can't know whether to reassign it when the suspected referent 
>is moved.

Two potential solutions:  If D does not specify the exact layout of a union
type, then D could track which field of the union you're using right now.  This
is often done by the programmer, but D could do it internally.  It could be an
additional debug-mode-only assertion() like array indices.  It would die if you
tried to read read a different union field than you recently stored to.

More likely: in the few cases that you have unions, just pin down the disputed
memory and move everything else.  Areas of memory allocated via malloc() and
added to the GC via addRange() would also be included in the "immobile"
category.

This is harder because it requires a "mostly copyin" collector, but I don't
think it's fatal.  Some of the simplicity of a moving collector is lost.

>- Crocky APIs.  std.c.windows.windows defines handles as pointers. 
>Maybe they are, just not into memory that an application is going to 
>manipulate directly.  Are they?  If not, they could clash with pointers 
>into the GC heap, and a GC pass would then screw up the handle. 
>Something else that's cast into a pointer type but blatantly isn't is 
>numeric resource IDs.

If you work with these you could allocate them via "malloc()" or similar, and
they would act like a giant union as above.

>- Pointers into the GC heap held by foreign code.  Obviously you need a 
>pointer on the D side as well.  But if the GC moves the object, while 
>the D side pointer'll be updated, the foreign one surely won't.

Umm... pin them down by putting them in a union as above.... did I just invent a
new kinda "locked memory" pointer?

>- std.gc.addRange.  Would it be at all straightforward to know what's a 
>pointer and what isn't within an added range?  (Or is this only meant 
>for adding arrays of pointers?  This isn't crystal clear.)
>
>Maybe someone has ideas....
>
>Stewart.
>
>-- 
>My e-mail is valid but not my primary mailbox, aside from its being the 
>unfortunate victim of intensive mail-bombing at the moment.  Please keep 
>replies on the 'group where everyone may benefit.

I think the thing is to ask yourself, WHY is a moving collector better than a
non-moving collector?  To get the whole story, you can read one of the "gc
surveys".  A famous one: (ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps).

But the gist as I remember it, is two fold:

A. Structures like trees and linked lists are iterated "in order", both in the
program and in the GC.  This means that consecutive list nodes and the data they
point to (if any) get moved into the new locations IN ORDER.  This is great for
cache efficiency because memory access is almost array-like if you do it
sequentially.  All the nodes of a list get moved into consequetive spots.

--> Essentially the garbage collector becomes a heap data structure optimizer.

B. Solves most internal fragmentation problems.  Non-moving collectors sometimes
end up in states where most of the free memory is divided into small pieces that
are small.  If this happens, the memory can be exhausted.  There is plenty of
space, but not enough in any one place.  Copying collectors are normally immune
to this effect, which can be a boon for real-time systems that could otherwise
die from this effect "one in a million times".

So.... if you can get a 90 % copying collector (ie "conservative" compaction),
it might be good enough.  A few unions stuck here and there will not be fatal to
this plan, as long as the "fully understood" D objects are movable.

Kevin
May 06, 2005
Re: Copying/heap compacting GC
Kevin Bealer wrote:
> In article <c3rjl6$1nds$1@digitaldaemon.com>, Stewart Gordon says...
<snip>
>>- Unions.  Obviously, if it doesn't know whether something is a pointer 
>>or not, it can't know whether to reassign it when the suspected referent 
>>is moved.
> 
> Two potential solutions:  If D does not specify the exact layout of a union
> type, then D could track which field of the union you're using right now. 

Structs and unions are designed to be compatible with the C ABI.  As 
such, there's no room for that extra piece of information.

> This is often done by the programmer, but D could do it internally.  It could be an
> additional debug-mode-only assertion() like array indices.  It would die if you
> tried to read read a different union field than you recently stored to.
> 
> More likely: in the few cases that you have unions, just pin down the disputed
> memory and move everything else.  Areas of memory allocated via malloc() and
> added to the GC via addRange() would also be included in the "immobile"
> category.

The difficulty is that the GC would need to know in advance whether a 
given piece of heap memory is pointed to from a union.  The essence of a 
copying GC is that it collects garbage by copying everything to newly 
allocated memory and then deallocating the old memory.  And so if it's 
copied a certain block and then it stumbles on a pointer to it in a 
union, it'll have to change back all the pointers it's changed.

> This is harder because it requires a "mostly copyin" collector, but I don't
> think it's fatal.  Some of the simplicity of a moving collector is lost.
> 
>>- Crocky APIs.  std.c.windows.windows defines handles as pointers. 
>>Maybe they are, just not into memory that an application is going to 
>>manipulate directly.  Are they?  If not, they could clash with pointers 
>>into the GC heap, and a GC pass would then screw up the handle. 
>>Something else that's cast into a pointer type but blatantly isn't is 
>>numeric resource IDs.
> 
> If you work with these you could allocate them via "malloc()" or similar, and
> they would act like a giant union as above.

You mean I should store all my resource IDs in malloc'd memory?  I'm not 
sure that that's always possible, nor how it would help.

>>- Pointers into the GC heap held by foreign code.  Obviously you need a 
>>pointer on the D side as well.  But if the GC moves the object, while 
>>the D side pointer'll be updated, the foreign one surely won't.
> 
> Umm... pin them down by putting them in a union as above.... did I just invent a
> new kinda "locked memory" pointer?

As I just said, having unions as the way of pinning is going to cause 
difficulties.  Better to add a specific pinning feature to D.

<snip>
> I think the thing is to ask yourself, WHY is a moving collector better than a
> non-moving collector?  To get the whole story, you can read one of the "gc
> surveys".  A famous one: (ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps).
> 
> But the gist as I remember it, is two fold:
> 
> A. Structures like trees and linked lists are iterated "in order", both in the
> program and in the GC.  This means that consecutive list nodes and the data they
> point to (if any) get moved into the new locations IN ORDER.  This is great for
> cache efficiency because memory access is almost array-like if you do it
> sequentially.  All the nodes of a list get moved into consequetive spots.

Assuming that there aren't other ways into the structures that may have 
been picked up first.

> --> Essentially the garbage collector becomes a heap data structure optimizer.
> 
> B. Solves most internal fragmentation problems.  Non-moving collectors sometimes
> end up in states where most of the free memory is divided into small pieces that
> are small.  If this happens, the memory can be exhausted.  There is plenty of
> space, but not enough in any one place.  Copying collectors are normally immune
> to this effect, which can be a boon for real-time systems that could otherwise
> die from this effect "one in a million times".
<snip>

I guess that just about every heap allocation system would have a 
cluster size.  So if your allocated objects aren't too big, then this 
won't be too much of a problem.  OTOH if you are working with large 
arrays, and these are competing for space with smaller things, then 
there is indeed this issue.

And as long as you don't use up so much memory that the copying GC has 
no room to manoeuvre....

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
the 'group where everyone may benefit.
Top | Discussion index | About this forum | D home