Thread overview
Profiling Garbage Collector
Nov 29, 2007
Chad J
Nov 29, 2007
Kris
Nov 29, 2007
Chad J
Nov 29, 2007
Sean Kelly
Nov 29, 2007
Chad J
Nov 30, 2007
Robert Fraser
November 29, 2007
As I contemplated the challenge of determining whether or not a library (not an entire program) causes heap activity and whether or not it leaves garbage for the gc, I decided that having a profiling garbage collector would be really useful for such tasks.

Such a profiling collector would have the following features:
- For each type in the program, it would track how many times that type is allocated, how many times it is manually deleted, and how many times it is collected by the garbage collector.
- It can return a string containing the above information.  Also, it'd be nice if the gc could summarize by saying which types were "leaked" most frequently.
- Ideally, it not only uses type information but also has some help from the compiler.  Thus it not only knows about types, but can keep track of every single allocation on a file by file, line by line, basis.  That way finding leaks would be as simple as reading "the allocation of Foo at line 42 in file bar.d was collected 1762 times".  This may require some extra origination information to be stored in each object/array/etc for programs undergoing GC profiling.
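
A minimal sketch of the per-type bookkeeping in the list above, written in Python rather than D to keep it short. The hook names (`on_allocate`, `on_delete`, `on_collect`) and the classes `TypeStats` and `TypeProfile` are hypothetical; they are not Tango or Phobos API, and a real collector would have to call the hooks itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TypeStats:
    allocated: int = 0   # times the type was allocated
    deleted: int = 0     # times it was manually deleted
    collected: int = 0   # times the collector reclaimed it ("leaked")


class TypeProfile:
    """Per-type counters; the on_* methods are hypothetical GC hooks."""

    def __init__(self):
        self.stats = defaultdict(TypeStats)

    def on_allocate(self, type_name):
        self.stats[type_name].allocated += 1

    def on_delete(self, type_name):
        self.stats[type_name].deleted += 1

    def on_collect(self, type_name):
        self.stats[type_name].collected += 1

    def most_leaked(self, limit=3):
        # Types left for the collector most often come first.
        ranked = sorted(self.stats.items(),
                        key=lambda kv: kv[1].collected, reverse=True)
        return [(name, s.collected) for name, s in ranked[:limit]
                if s.collected > 0]

    def report(self):
        # One line per type, sorted by type name.
        lines = [f"{name}: allocated={s.allocated} "
                 f"deleted={s.deleted} collected={s.collected}"
                 for name, s in sorted(self.stats.items())]
        return "\n".join(lines)
```

`report()` gives the string form of the counters, and `most_leaked()` gives the summary of which types were left to the collector most often.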

Well, other than spouting the idea and refusing to implement it, I am wondering - has anyone made this kind of thing yet?
November 29, 2007
There was a conversation about this just the other week, for Tango. It's on the cards, with some really slick features :)


"Chad J" <gamerChad@_spamIsBad_gmail.com> wrote in message news:fil6a2$2nu0$1@digitalmars.com...
> As I contemplated the challenge of determining whether or not a library (not an entire program) causes heap activity and whether or not it leaves garbage for the gc, I decided that having a profiling garbage collector would be really useful for such tasks.
>
> Such a profiling collector would have the following features:
> - For each type in the program, it would track how many times that type is
> allocated, how many times it is manually deleted, and how many times it is
> collected by the garbage collector.
> - It can return a string containing the above information.  Also, it'd be
> nice if the gc could summarize by saying which types were "leaked" most
> frequently.
> - Ideally, it not only uses type information but also has some help from
> the compiler.  Thus it not only knows about types, but can keep track of
> every single allocation on a file by file, line by line, basis.  That way
> finding leaks would be as simple as reading "the allocation of Foo at line
> 42 in file bar.d was collected 1762 times".  This may require some extra
> origination information to be stored in each object/array/etc for programs
> undergoing GC profiling.
>
> Well, other than spouting the idea and refusing to implement it, I am wondering - has anyone made this kind of thing yet?


November 29, 2007
Kris wrote:
> There was a conversation about this just the other week, for Tango. It's on the cards, with some really slick features :)

Awesome!
November 29, 2007
Chad J wrote:
> As I contemplated the challenge of determining whether or not a library (not an entire program) causes heap activity and whether or not it leaves garbage for the gc, I decided that having a profiling garbage collector would be really useful for such tasks.
> 
> Such a profiling collector would have the following features:
> - For each type in the program, it would track how many times that type is allocated, how many times it is manually deleted, and how many times it is collected by the garbage collector.
> - It can return a string containing the above information.  Also, it'd be nice if the gc could summarize by saying which types were "leaked" most frequently.
> - Ideally, it not only uses type information but also has some help from the compiler.  Thus it not only knows about types, but can keep track of every single allocation on a file by file, line by line, basis.  That way finding leaks would be as simple as reading "the allocation of Foo at line 42 in file bar.d was collected 1762 times".  This may require some extra origination information to be stored in each object/array/etc for programs undergoing GC profiling.
> 
> Well, other than spouting the idea and refusing to implement it, I am wondering - has anyone made this kind of thing yet?

Tracking "leaked" objects is quite easy to do in Tango.  Check out GC.collectHandler in tango.core.Memory.  There is currently no way to track objects that were manually deleted, but it wouldn't be difficult to add a similar hook for that.


Sean
November 29, 2007
Sean Kelly wrote:
> 
> Tracking "leaked" objects is quite easy to do in Tango.  Check out GC.collectHandler in tango.core.Memory.  There is currently no way to track objects that were manually deleted, but it wouldn't be difficult to add a similar hook for that.
> 
> 
> Sean

Yeah, when I thought of this I checked Tango and saw that.  I like that collect handler feature a lot.  I don't think it covers profiling, though: even assuming I implement all of the logic and pass it to the Tango GC, it still doesn't handle non-object entities such as the ubiquitous array.

Well, you guys have probably discussed this already so feel free to ignore the below.  I feel like rambling.  I'm having fun with this :)

I also realized that even though the compiler might not be able to supply line and file info for an allocation, the library can still do a stack trace and discover the name of the function that caused the allocation.  This does, of course, assume debugging info that allows stack tracing - something that already exists.

So perhaps the GC/Tango needs these things to pull it off:
- An allocation handler.
- A collection handler.  (partially done)
- A deletion handler.
- A copying handler, if the GC wants to do copying.  This will be necessary to persist origination info.
- Maybe a reallocation handler, or maybe this can be thought of as a kind of allocation.
- All handlers must disclose the address(es) involved.
- All handlers must disclose full runtime type information.
- Each type (allocation/collection/deletion/copying) of handler must handle ANY kind of heap activity, not just objects.  Specialized handlers may be built on top of that.
- A stack trace function.  Preferably it lets you select which frames you want to see.
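
To make the shape of that list concrete, here is a rough sketch in Python (not D) of a general hook registry. `HeapEvent`, `HeapHooks`, `arrays_only` and the event kinds are all hypothetical names, not anything Tango provides; `type_name` stands in for full runtime type information, and a real GC would be the one calling `fire()`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

KINDS = ("allocate", "collect", "delete", "copy")


@dataclass(frozen=True)
class HeapEvent:
    kind: str                          # one of KINDS
    address: int                       # address of the block involved
    type_name: str                     # stand-in for runtime type information
    size: int                          # block size in bytes
    new_address: Optional[int] = None  # destination, for "copy" events only


class HeapHooks:
    """Hypothetical registry; a real GC would call fire() itself."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, kind, handler):
        if kind not in KINDS:
            raise ValueError(f"unknown heap event kind: {kind}")
        self._handlers[kind].append(handler)

    def fire(self, event):
        # Every handler sees every block of this kind, object or not.
        for handler in self._handlers[event.kind]:
            handler(event)


def arrays_only(handler):
    # Specialized handler built on the general one: forwards array events only.
    def filtered(event):
        if event.type_name.endswith("[]"):
            handler(event)
    return filtered
```

The general handlers see objects and arrays alike; `arrays_only` shows how a specialized handler could be layered on top instead of being built into the GC.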

All of those are useful in general, but also just happen to be the right combination of stuff to implement a powerful GC profiler.  The profiler itself can then be implemented in a separate module or somesuch.

The profiler can keep track of where each individual chunk of memory was allocated by means of an (associative?) array that maps addresses onto the function names that allocated them.  Then when a deletion/collection occurs, it can just look up the name of the function that caused the corresponding allocation.
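
A small sketch of that address-to-origin map, again in Python rather than D. `OriginProfiler` and its `on_*` hooks are hypothetical, and `sys._getframe` (a CPython-specific call) only stands in for a real stack trace facility; the addresses are plain integers supplied by the caller.

```python
import sys
from collections import Counter


class OriginProfiler:
    """Maps block addresses to the function that allocated them."""

    def __init__(self):
        self.origin = {}            # address -> allocating function name
        self.collected = Counter()  # function name -> blocks left to the GC
        self.deleted = Counter()    # function name -> blocks deleted manually

    def on_allocate(self, address, skip=1):
        # Stand-in for a stack trace: take the name of the frame
        # `skip` levels above this hook (1 = the direct caller).
        self.origin[address] = sys._getframe(skip).f_code.co_name

    def on_collect(self, address):
        self._retire(address, self.collected)

    def on_delete(self, address):
        self._retire(address, self.deleted)

    def on_copy(self, old_address, new_address):
        # A copying collector moves blocks, so the origin must follow them.
        if old_address in self.origin:
            self.origin[new_address] = self.origin.pop(old_address)

    def _retire(self, address, counter):
        # Look up who allocated the block, then forget the address.
        name = self.origin.pop(address, None)
        if name is not None:
            counter[name] += 1
```

After a run, `collected` says which functions left the most blocks for the collector, which is the per-function version of the "collected 1762 times" report.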

I would love to see even just a stack trace function in Tango.  Even better if Exception calls it and dumps the trace to the console so I can know where I screwed up.
November 30, 2007
Chad J wrote:
> I would love to see even just a stack trace function in Tango.  Even better if Exception calls it and dumps the trace to the console so I can know where I screwed up.

There already is a stack trace "hook" in Tango. You can use Flectioned to get exception stack traces for it.