View mode: basic / threaded / horizontal-split · Log in · Help
October 25, 2004
gc question
I have a question about garbage that I'm curious if anyone can help out
with. I'm wondering what percentage of memory allocations contain pointers.
A string, for example, doesn't contain pointers. An array of pointers does
contain pointers. What's the typical breakdown? I'm curious because I'm
experimenting with the GC and if the "new" operator can tell if the thing
it is allocating doesn't have any pointers then it can allocate it from a
region of the GC that doesn't get scanned for pointers - hence speed up the
marking phase of a collection. I'm guessing some paper exists out there
that has studied this but I can't find anything useful.

thanks,
Ben
October 25, 2004
Re: gc question
In article <clho8q$pnt$1@digitaldaemon.com>, Ben Hinkle says...
>
>I have a question about garbage that I'm curious if anyone can help out
>with. I'm wondering what percentage of memory allocations contain pointers.
>A string, for example, doesn't contain pointers. An array of pointers does
>contain pointers. What's the typical breakdown? I'm curious because I'm
>experimenting with the GC and if the "new" operator can tell if the thing
>it is allocating doesn't have any pointers then it can allocate it from a
>region of the GC that doesn't get scanned for pointers - hence speed up the
>marking phase of a collection. I'm guessing some paper exists out there
>that has studied this but I can't find anything useful.

I'm not sure if this link contains a paper that directly applies, but it does
have a lot of good information: http://www.cs.utexas.edu/users/oops/papers.html


Sean
October 26, 2004
Re: gc question
Sean Kelly wrote:

> In article <clho8q$pnt$1@digitaldaemon.com>, Ben Hinkle says...
>>
>>I have a question about garbage that I'm curious if anyone can help out
>>with. I'm wondering what percentage of memory allocations contain
>>pointers. A string, for example, doesn't contain pointers. An array of
>>pointers does contain pointers. What's the typical breakdown? I'm curious
>>because I'm experimenting with the GC and if the "new" operator can tell
>>if the thing it is allocating doesn't have any pointers then it can
>>allocate it from a region of the GC that doesn't get scanned for pointers
>>- hence speed up the marking phase of a collection. I'm guessing some
>>paper exists out there that has studied this but I can't find anything
>>useful.
> 
> I'm not sure if this link contains a paper that directly applies, but it
> does have a lot of good information:
> http://www.cs.utexas.edu/users/oops/papers.html
> 
> 
> Sean

nice link. Looking at the Boehm collector it has an API malloc_atomic that
is used for memory that doesn't have any pointers in it. I'll try poking
around with GDC to see if it can be used (if it isn't used already). I'd
also like to try some of the other hooks like incremental GC and parallel
marking etc.

-Ben
October 28, 2004
Re: gc question
Ben Hinkle wrote:

> I have a question about garbage that I'm curious if anyone can help out
> with. I'm wondering what percentage of memory allocations contain
> pointers. A string, for example, doesn't contain pointers. An array of
> pointers does contain pointers. What's the typical breakdown? I'm curious
> because I'm experimenting with the GC and if the "new" operator can tell
> if the thing it is allocating doesn't have any pointers then it can
> allocate it from a region of the GC that doesn't get scanned for pointers
> - hence speed up the marking phase of a collection. I'm guessing some
> paper exists out there that has studied this but I can't find anything
> useful.
> 
> thanks,
> Ben

I can't help directly with the question, but I wanted to mention that I
think you are definately on the right track as far as improving D through
improving the GC marking phase.

I've been benchmarking some stuff and have noticed that the DMD compiler
generally puts out some very competitive code for operations that don't
involve a great deal of de/allocation - sometimes a bit better than Intel
C/++ (in my book that's outstanding), for operations on primitive types,
conditional jumps, vectors, etc. GDC is competitive also - seems to be
pretty close to DMD overall, which I think really validates the work Walter
is doing on both the language design and compiler, as well as David
Friedman's work on GDC.

All that to underscore that the one area where D seems to lag right now is
in GC allocation, and that seems to be strongly tied to the collection that
kicks in before the heap is expanded:

gcx.d/malloc()
...
#   if (!gcx.fullcollectshell())    // collect to find a new page
#   {
#       //gcx.newPool(1);
#   }
...

If you drill into that further, most of that time spent during the
collection is during the sweep phase (obviously).

If some way could be found to improve that, then I think that would really
help D v1.0 get out of the gate well. An improvement here would obviously
effect a lot of important things - string/array concatentation, object
instantiation, AA perf., stream operations, etc..

- Dave
October 28, 2004
Re: gc question
In article <clpedn$l7p$1@digitaldaemon.com>, Dave says...
>

[snip]

>I can't help directly with the question, but I wanted to mention that I
>think you are definately on the right track as far as improving D through
>improving the GC marking phase.

[snip]

>If you drill into that further, most of that time spent during the
>collection is during the sweep phase (obviously).

That should have been "...during the mark phase...".
October 28, 2004
Re: gc question
In article <clpedn$l7p$1@digitaldaemon.com>, Dave says...
>
>[snip]
>
>All that to underscore that the one area where D seems to lag right now is
>in GC allocation, and that seems to be strongly tied to the collection that
>kicks in before the heap is expanded:

I suppose you could always disable/enable the more sensitive and critical
sections of your code to side-step this.  Of course, it just puts of the
inevidable, so its really a band-aid approach at best.

Off the top of my head, the only optimization I can think of here is to opt for
an incremental collector design (provided that D isn't one already).  At least
then your overhead per collection phase will be /closer/ to constant than it is
at present.

BTW, has anyone done anything to make the GC more pluggable?  Anyone out there
done any GC hacking on D already?  I'm curious to see what has been, and what
could be, done in this area.

The reason why I bring this up is that any performance concern for something as
fundamental as the GC is going to have a penalty for a certain category of
applications.  Optional GC types is one way to satisfy the wide range of memory
managment strategies needed by a language platform such as D.

If one could compile-in a different flavor of GC (stop-and-collect, incremental,
compacting, generational/copying, lazy-as-hell, none-at-all?) you could at least
pick the set of consequences that you're stuck with.

EricAnderton at --yahoo--
October 28, 2004
Re: gc question
In article <clrn6d$flm$1@digitaldaemon.com>, pragma says...
>
>BTW, has anyone done anything to make the GC more pluggable?  Anyone out there
>done any GC hacking on D already?  I'm curious to see what has been, and what
>could be, done in this area.

Not really, though I've been gutting Phobos in a more general sense.  I would
like to modularize all the internals, but haven't quite gotten this far yet--I'm
still working on separating out a minimal D runtime from the rest of Phobos.
Follow up in the Ares forum or here if you're so inclined.


Sean
October 28, 2004
Re: gc question
pragma wrote:
> In article <clpedn$l7p$1@digitaldaemon.com>, Dave says...
> 
>>[snip]
>>
>>All that to underscore that the one area where D seems to lag right now is
>>in GC allocation, and that seems to be strongly tied to the collection that
>>kicks in before the heap is expanded:
> 
> I suppose you could always disable/enable the more sensitive and critical
> sections of your code to side-step this.  Of course, it just puts of the
> inevidable, so its really a band-aid approach at best.

I'd like to quibble a bit here.  The cost of performing a GC sweep 
(given our current, mark-and-sweep collector) is roughly propotional to 
the current *active* set.  So if you put off the GC run for a while, and 
generate a lot more garbage in the meantime (but don't increase the 
active set of your program) then running a GC sweep at some later date 
is no more costly than now.

That is, by delaying the GC sweep a while, you aren't necessarily making 
the next GC sweep any longer.

Of course, as garbage piles up, that means that you have more and more 
destructors to run, so there is some increased cost, but the total cost 
here (sum over the lifetime of the program) is proportional to the 
number of garbage objects you create, and has nothing to do with the 
frequency with which the GC runs.

What this all means is that, IMHO, you should try to limit the number of 
times when the GC runs, and when it runs, you want it to run when you 
have a small working set.  In other words, if you have an algorithm that 
creates lots and lots of objects, and then releases most of them, you 
might run the GC before and after the algorithm, but leave it disabled 
during the algorithm.
October 29, 2004
Re: gc question
I think you could get this optimization with something like:

:class MemHandle {
:  byte * data;
:  uint   length;
:  this(uint x) { data = malloc(length=x); }
:  ~this() { free(data); }
:}

I think the malloc'd area should be free of GC interactions (it won't be
scanned) unless the GC's "addRange" is called.  This object will ensure
collection happens (so keep a pointer to it).

Maybe it would be cleaner to directly support objects that have "in" pointers
but not "out" pointers (char arrays for example).

Kevin

In article <clho8q$pnt$1@digitaldaemon.com>, Ben Hinkle says...
>
>I have a question about garbage that I'm curious if anyone can help out
>with. I'm wondering what percentage of memory allocations contain pointers.
>A string, for example, doesn't contain pointers. An array of pointers does
>contain pointers. What's the typical breakdown? I'm curious because I'm
>experimenting with the GC and if the "new" operator can tell if the thing
>it is allocating doesn't have any pointers then it can allocate it from a
>region of the GC that doesn't get scanned for pointers - hence speed up the
>marking phase of a collection. I'm guessing some paper exists out there
>that has studied this but I can't find anything useful.
>
>thanks,
>Ben
Top | Discussion index | About this forum | D home