Thread overview
how to use GC as a leak detector? i.e. get some help info from GC?
May 24, 2009: nobody
May 24, 2009: Jason House
May 24, 2009: nobody
May 24, 2009: Frits van Bommel
May 25, 2009: Leandro Lucarella
feature request expanded class object? Re: how to use GC as a leak detector? i.e. get some help info from GC?
May 25, 2009: nobody
May 25, 2009: Leandro Lucarella
May 24, 2009: nobody
May 24, 2009: Nick Sabalausky
May 24, 2009: dsimcha
May 25, 2009: nobody
May 25, 2009: dsimcha
May 25, 2009: nobody
May 25, 2009: Brad Roberts
May 25, 2009: nobody
May 25, 2009: Leandro Lucarella
May 25, 2009: nobody
May 25, 2009: Leandro Lucarella
May 25, 2009: nobody
May 25, 2009: Leandro Lucarella
May 24, 2009
Hi,

I'm writing a data processing program in D, which deals with large amounts of small objects. One of the things I found is that D's GC is horribly slow in such situations. I tried my program with the GC enabled and disabled (with some manual deletes). The GC-disabled version (2 min) is ~100 times faster than the GC-enabled version (4 hours)!

But of course the GC-disabled version still leaks memory: it soon exceeds the machine's memory limit when I try to process more data, while the GC-enabled version doesn't have that problem.

So my plan is to use the GC-disabled version with manual deletes. But it is very hard to find all the memory leaks. I'm wondering: is there any way to use the GC as a leak detector? Can the GC-enabled version tell me which objects get collected, so I can manually delete them in my GC-disabled version? Thanks!


May 24, 2009
nobody Wrote:

> So my plan is to use the GC disabled version with manual deletes. But it was very hard to find all the memory leaks.

Why not use valgrind? With the GC disabled, it should give accurate results.
May 24, 2009
Theoretically, you could recompile the GC to write to a log file any time it frees anything.

For data processing, though, you really want a fixed memory buffer. You must be taking a big hit from all the allocations and frees; if at all possible, get rid of them.
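A minimal sketch of the fixed-buffer idea in present-day D2, assuming the records can be processed in batches; `Record` and `sumInBatches` are hypothetical names for illustration. The buffer is allocated once and a fill index is reset after each batch, so the processing loop itself does no further allocation.

```d
import std.stdio : writeln;

// Hypothetical record type standing in for the program's real data.
struct Record { long value; }

// Feed n generated records through one fixed buffer of bufLen slots
// (bufLen must be > 0). The buffer is allocated once and reused for
// every batch, so the loop performs no further GC allocations.
long sumInBatches(size_t n, size_t bufLen)
{
    auto buffer = new Record[](bufLen); // the only allocation
    size_t used = 0;
    long total = 0;

    // Process the current batch, then "free" it by resetting the index.
    void flush()
    {
        foreach (ref r; buffer[0 .. used])
            total += r.value;
        used = 0;
    }

    foreach (i; 0 .. n)
    {
        buffer[used++] = Record(cast(long) i);
        if (used == buffer.length)
            flush();
    }
    flush(); // final, possibly partial batch
    return total;
}

void main()
{
    writeln(sumInBatches(10_000, 1024)); // prints 49995000
}
```

The buffer size is the knob here: larger batches mean fewer flushes, at the cost of a bigger fixed footprint.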

Also, if you're allocating buffers of memory (e.g. for the data), you can tell the GC not to scan them.  This will probably solve the problem of the GC being so slow.
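In present-day D2, the "don't scan this" hint is the `GC.BlkAttr.NO_SCAN` attribute from `core.memory`. Here is a minimal sketch, assuming the buffer holds only raw bytes and never pointers (`allocRawBuffer` is a hypothetical helper). The GC still frees such a block once it is unreachable; it just never walks its contents looking for pointers.

```d
import core.memory : GC;
import std.stdio : writeln;

// Allocate n bytes the collector will free when unreachable but will
// never scan for pointers. Only safe for pointer-free data: a GC
// reference stored in here would not keep its target alive.
ubyte[] allocRawBuffer(size_t n)
{
    auto p = cast(ubyte*) GC.malloc(n, GC.BlkAttr.NO_SCAN);
    return p[0 .. n];
}

void main()
{
    auto raw = allocRawBuffer(16 * 1024 * 1024);

    // Arrays of pointer-free element types created with `new` get the
    // same attribute automatically, so this is rarely needed by hand.
    auto samples = new double[](1_000_000);

    // GC.query accepts interior pointers and reports the block's attributes.
    writeln((GC.query(raw.ptr).attr & GC.BlkAttr.NO_SCAN) != 0);     // true
    writeln((GC.query(samples.ptr).attr & GC.BlkAttr.NO_SCAN) != 0); // true
}
```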

-[Unknown]


May 24, 2009
== Quote from Jason House (jason.james.house@gmail.com)'s article
> Why not use valgrind? With the GC disabled, it should give accurate results.

Strangely enough, I did try valgrind with the GC-disabled version. It didn't report anything useful.

That's why I'm puzzled: does D's GC do something special?

The GC-disabled version runs out of memory at 3 GB, but the GC-enabled version stays at ~800 MB throughout the run.
May 24, 2009
"nobody" <no@where.com> wrote in message news:gvc5q7$2bc3$1@digitalmars.com...
> I'm writing a data processing program in D, which deals with large amounts
> of small objects.

Depending on exactly how your program works, another common technique that might help is to manage free pools manually, i.e., allocate a bunch of objects up front, and instead of letting one get GCed when you're done with it, hold on to it, note that it's available for reuse, and reuse it instead of allocating a new one. Or, allocate one big chunk of memory and put your small objects in there. This sort of thing is typically done for particle systems.
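A minimal free-pool sketch along those lines, in present-day D2. `Particle` and `Pool` are hypothetical names. Objects handed back with `release` are kept on a simple stack and returned by later `acquire` calls, so a steady-state workload stops allocating.

```d
import std.stdio : writeln;

// Hypothetical small object standing in for the program's own classes.
final class Particle
{
    double x = 0, y = 0;
    void reset(double x, double y) { this.x = x; this.y = y; }
}

// Free pool: released objects are kept for reuse instead of being left
// for the GC, and acquire() only calls `new` when the pool is empty.
struct Pool(T) if (is(T == class))
{
    private T[] items;    // storage for released objects
    private size_t count; // number of objects currently available
    size_t created;       // how many objects were ever allocated

    T acquire()
    {
        if (count > 0)
        {
            auto obj = items[--count];
            items[count] = null; // the pool no longer holds this one
            return obj;
        }
        ++created;
        return new T;
    }

    void release(T obj)
    {
        if (count == items.length)
            items.length = items.length ? items.length * 2 : 16;
        items[count++] = obj;
    }
}

void main()
{
    Pool!Particle pool;
    foreach (frame; 0 .. 1000)
    {
        Particle[8] live;
        foreach (ref p; live)
        {
            p = pool.acquire();
            p.reset(frame, frame); // the caller re-initializes reused objects
        }
        foreach (p; live)
            pool.release(p);
    }
    writeln(pool.created); // prints 8: objects are recycled across frames
}
```

Because reused objects keep their old state, the pool leaves re-initialization to the caller (`reset` here) rather than guessing what a "fresh" object should look like.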


May 24, 2009
== Quote from Unknown W. Brackets (unknown@simplemachines.org)'s article
> Theoretically, you could recompile the GC to write to a log file any time it frees anything.

Is it possible to recompile Phobos so that the GC writes to a log whenever it frees something? I guess I would also need the type info of the object being freed.


May 24, 2009
nobody wrote:
> == Quote from Jason House (jason.james.house@gmail.com)'s article
>> Why not use valgrind? With the GC disabled, it should give accurate results.
> 
> Strange enough, indeed I have tried valgrind with the GC disabled version.  It
> didn't report anything useful.
> 
> That's why I'm puzzled, does D's GC do something special?

The GC allocates memory directly from the OS; it doesn't use malloc/free and friends. It does this even when the GC is "disabled", which just means that collections won't happen. (Disabling the GC doesn't change the method of allocation.)
Valgrind probably doesn't detect those OS calls (and almost certainly doesn't know about the GC calls).

If you're using Tango, you can link to the 'stub' GC instead of the normal ('basic') one. The stub GC doesn't actually collect; it passes calls on to malloc/calloc/realloc/free instead. That should make Valgrind work.
(Something similar probably applies if you're using D2 with druntime.)
May 24, 2009
== Quote from nobody (no@where.com)'s article
> I'm writing a data processing program in D, which deals with large amounts of
> small objects.

I've dealt with a bunch of somewhat similar situations in code I've written. Here are some tips that others haven't already mentioned, which might be less drastic than going with fully manual memory management:

One thing you could try is disabling the GC (this really just disables automatic running of the collector) and running it manually at points where you know it makes sense. For example, you could just insert a GC.collect() statement at the end of every iteration of your main loop.
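In present-day D2 (`core.memory`), that pattern looks roughly like this sketch. `processBatch` is a hypothetical stand-in for one iteration of the real main loop. `GC.disable()` only stops automatic collections (the runtime may still collect if it would otherwise run out of memory), while the explicit `GC.collect()` runs at a point the program chooses.

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical unit of work that creates short-lived garbage.
size_t processBatch(size_t batch)
{
    auto tmp = new int[](1000);
    tmp[] = cast(int) batch;
    return tmp.length;
}

// Run the main loop with automatic collections disabled, collecting
// explicitly once per iteration.
size_t runMainLoop(size_t iterations)
{
    GC.disable();             // stop automatic collections...
    scope (exit) GC.enable(); // ...and restore them however we leave

    size_t processed = 0;
    foreach (batch; 0 .. iterations)
    {
        processed += processBatch(batch);
        GC.collect();         // collect only at this chosen point
    }
    return processed;
}

void main()
{
    writeln(runMainLoop(100)); // prints 100000
}
```

Collecting on every iteration is the simplest form of the idea; collecting only every N iterations trades a higher peak footprint for fewer collections.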

Another thing to try is avoiding appending to arrays.  If you know the length in advance, you can get pretty good speedups by pre-allocating the array instead of appending using the ~= operator.
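A small sketch contrasting the approaches (the function names are hypothetical). When the final length is known, allocating once with `new` avoids the repeated grow-and-copy that `~=` may trigger; when it is only roughly known, D2's `reserve` pre-sizes the capacity while still allowing `~=`.

```d
import std.stdio : writeln;

// Grows one element at a time; the runtime may have to reallocate and
// copy the array several times along the way.
int[] squaresAppend(size_t n)
{
    int[] result;
    foreach (i; 0 .. n)
        result ~= cast(int)(i * i);
    return result;
}

// Length known in advance: a single allocation, then fill in place.
int[] squaresPrealloc(size_t n)
{
    auto result = new int[](n);
    foreach (i, ref x; result)
        x = cast(int)(i * i);
    return result;
}

// Length roughly known: reserve capacity once, then append into it.
int[] squaresReserve(size_t n)
{
    int[] result;
    result.reserve(n);
    foreach (i; 0 .. n)
        result ~= cast(int)(i * i);
    return result;
}

void main()
{
    writeln(squaresPrealloc(5)); // prints [0, 1, 4, 9, 16]
}
```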

You can safely delete specific objects manually even when the GC is enabled.  For very large objects with trivial lifetimes, this is probably worth doing.  First of all, the GC will run less frequently.  Secondly, D's GC is partially conservative, meaning that occasionally memory will not be freed when it should be.  The probability of this happening is proportional to the size of the memory block.
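For reference, the `delete` keyword used in this thread has since been deprecated in D2. Here is a minimal present-day sketch of freeing one object eagerly while the GC stays enabled. It uses `destroy` (which runs the destructor) followed by `GC.free` (which returns the memory without finalizing it). `Chunk` and `useAndFreeEagerly` are hypothetical names. As with `delete`, this is only safe when no other reference to the object survives.

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical large object whose lifetime is short and well understood.
final class Chunk
{
    static int finalized;  // destructor runs so far (thread-local static)
    ubyte[4096] payload;
    ~this() { ++finalized; }
}

// Allocate, use, and free a Chunk immediately instead of waiting for a
// collection. Safe only if no other reference to the object escapes.
void useAndFreeEagerly()
{
    auto c = new Chunk;
    c.payload[0] = 1;        // ... real work would go here ...

    destroy(c);              // run the destructor now
    GC.free(cast(void*) c);  // release the memory; GC.free does not finalize
    c = null;                // don't keep a dangling reference around
}

void main()
{
    useAndFreeEagerly();
    writeln(Chunk.finalized); // prints 1
}
```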

Lastly, I've been working on a generic second stack/mark-release allocator for D2, called TempAlloc. It's useful when you need to temporarily allocate memory in last-in, first-out order but can't use the call stack for whatever reason. I've also implemented a few basic data structures (hash tables and hash sets) that are specifically designed for this allocator. Right now it's coevolving with my dstats statistics lib. If you want to try it, or at least look at it and give me some feedback, that would help: I'd like to eventually get it to the point where it can be added to Phobos and/or Tango. See http://svn.dsource.org/projects/dstats/docs/alloc.html .
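TempAlloc's actual API is documented at the link in that post. To illustrate only the mark/release idea, here is a minimal hypothetical region allocator. `Region`, `alloc`, `mark` and `release` are made-up names, not TempAlloc's. Allocations are carved off the top of one preallocated buffer, and releasing a mark frees everything allocated after it in O(1), in LIFO order.

```d
import std.stdio : writeln;
import std.traits : hasIndirections;

// Hypothetical mark/release ("second stack") region allocator.
// The backing ubyte[] is not scanned by the GC, so only pointer-free
// types may be allocated from it.
struct Region
{
    private ubyte[] buf;
    private size_t top;  // offset of the first free byte

    this(size_t capacity) { buf = new ubyte[](capacity); }

    // Carve n uninitialized Ts off the top of the region.
    T[] alloc(T)(size_t n) if (!hasIndirections!T)
    {
        immutable base = cast(size_t) buf.ptr;
        immutable a = T.alignof;
        // Round the absolute address up to T's alignment.
        immutable start = (base + top + a - 1) / a * a - base;
        immutable end = start + n * T.sizeof;
        if (end > buf.length)
            assert(0, "region exhausted");
        top = end;
        return cast(T[]) buf[start .. end];
    }

    size_t mark() const { return top; }

    // Free everything allocated since `m` was taken, in O(1).
    void release(size_t m)
    {
        assert(m <= top, "marks must be released in LIFO order");
        top = m;
    }
}

void main()
{
    auto region = Region(1 << 20);
    immutable m = region.mark();

    auto xs = region.alloc!double(1000);
    xs[] = 1.0;
    auto ys = region.alloc!int(10);
    ys[] = 2;

    region.release(m); // xs and ys are now invalid; their space is reusable
    writeln(region.mark() == m); // prints true
}
```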
May 25, 2009
> One thing you could try is disabling the GC (this really just disables automatic
> running of the collector) and run it manually at points that you know make sense.
>  For example, you could just insert a GC.collect() statement at the end of every
> run of your main loop.
> Another thing to try is avoiding appending to arrays.  If you know the length in
> advance, you can get pretty good speedups by pre-allocating the array instead of
> appending using the ~= operator.
> You can safely delete specific objects manually even when the GC is enabled.  For
> very large objects with trivial lifetimes, this is probably worth doing.  First of
> all, the GC will run less frequently.  Secondly, D's GC is partially conservative,
> meaning that occasionally memory will not be freed when it should be.  The
> probability of this happening is proportional to the size of the memory block.

I have tried all of these. With the GC enabled and run only periodically in the main loop, the memory still grows faster than I expected when I feed more data into the program. Then I manually deleted some specific objects. However, the program started to fail randomly.

Has anyone experienced similar issues, i.e., with the GC on, you define your own dtor for a certain class and call delete manually on certain objects?

The program fails at random stages, with a stack trace showing GC calls like:

 0x0821977a in _D2gc3gcx3Gcx16fullcollectshellMFZk ()

I suspect the GC is buggy when mixed with manual deletes.
May 25, 2009
== Quote from nobody (no@where.com)'s article
> The program fails at random stages, with some stack trace showing some GC calls
> like:
>  0x0821977a in _D2gc3gcx3Gcx16fullcollectshellMFZk ()
> I suspected the GC is buggy when mixed with manual deletes.

I personally have not experienced this.  Please be more specific:

D1 or D2?
If D1, Phobos or Tango?
DMD, LDC, or GDC?
Compiler version?

Also, please file a bug report, especially if you can create a concise, reproducible test case.