February 03, 2012
On Thu, 02 Feb 2012 19:31:50 -0600, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> Speaking of GC improvements, I googled around a bit and found this
> thread from 2 years ago:
>
> 	http://www.digitalmars.com/d/archives/digitalmars/D/Is_there_a_modern_GC_for_D_105967.html
>
> Especially interesting is this paper:
>
> 	Nonintrusive Cloning Garbage Collector with Stock Operating
> 	System Support.
> 	        http://www.cs.purdue.edu/homes/grr/snapshot-gc.ps
>
> Has anything been done along these lines since that time? Seems like
> this particular GC algorithm is exactly the kind we need for D. It's a
> conservative mark-and-sweep algo with a very low overhead(*), mark phase
> concurrent with mutator thread(s), and lazy incremental sweeping at
> allocation time.  Synchronization is automatically done by default OS
> kernel-space mechanisms (copy on write memory pages).
>
> More to the point, how easy/hard is it to switch between GCs in the
> current D implementation(s)? I think it would be helpful if these kinds
> of experimental GCs were available in addition to the current default
> GC, and people can play around with them and find out which one(s) are
> the cream of the crop. Otherwise we're just bottlenecked at a small
> number of people who can actually play with GC algos for D -- which
> means improvements will be slow.
>
>
> (*) True on *nix systems anyway, but judging from comments in
> that thread, Windoze also was acquiring equivalent functionality -- and
> this being 2 years ago, I'm assuming it's available now.

So to answer your question: yes, someone has made one of these types of GC for D, called CDGC. No, it doesn't look like Windows will support this anytime soon. And cloning GCs don't solve the problems of large heaps or of soft/hard realtime constraints like game render threads, and they actually make the embarrassingly long pauses (when they happen) longer. Now, CDGC, as I understand it, is better written than our current GC and would probably improve things, but I don't see it as the final end goal.
February 25, 2012
On Thu, 02 Feb 2012 15:44:56 +0000, Richard Webb wrote:



With the xml package (xmlp) and its linked-node DOM, the GC is likely to fail at cleanup.  I divided the generated test file, with its 2-layer elements, into 5-, 50-, 500-, and 5000-sized files.  I put a mixin on the Node class to keep static counts of Node objects created and deleted.
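A minimal sketch of that kind of counting mixin (the names are illustrative, not the actual xmlp code): it bumps a static counter on construction and another on destruction, so comparing the two after a GC.collect shows how many instances the collector actually reclaimed. The counters are plain statics, so this assumes a single-threaded test.

```d
import core.memory : GC;
import std.stdio;

/// Illustrative counting mixin (not the actual xmlp code): tracks how
/// many instances of the enclosing class were constructed and
/// destructed, so GC effectiveness can be checked after a collect.
mixin template GCStats()
{
    static size_t created;
    static size_t deleted;

    this() { ++created; }
    ~this() { ++deleted; }

    static size_t live() { return created - deleted; }
}

class Node
{
    mixin GCStats;
    Node parent;       // back pointer, as in the linked DOM
    Node[] children;
}

void main()
{
    foreach (i; 0 .. 1000)
        cast(void) new Node;

    GC.collect();
    // How many of the 1000 did the collector actually finalize?
    writefln("created %s, deleted %s, still live %s",
             Node.created, Node.deleted, Node.live);
}
```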

I also set the document to be read multiple times, calling GC.collect between each read, and timed it. The increase in read time, together with the increase in GC.collect time, was strong evidence that the GC was unable to clean it up, even though, on my naive code inspection, the previous document instance was no longer reachable from GC "roots", i.e. it should have been directly replaced on the stack.
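The measurement loop was along these lines (a sketch, not the actual test harness; parseDocument stands in for the real xmlp parse call): if the heap really is being cleaned up, neither the parse time nor the collect time should grow across runs.

```d
import core.memory : GC;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio;

/// Time repeated parses with a GC.collect between each one.  Growing
/// parse or collect times across runs suggest the previous document
/// was never reclaimed.
void measure(void delegate() parseDocument, int runs = 5)
{
    foreach (run; 0 .. runs)
    {
        auto sw = StopWatch(AutoStart.yes);
        parseDocument();
        sw.stop();
        immutable parseMs = sw.peek.total!"msecs";

        sw.reset();
        sw.start();
        GC.collect();
        sw.stop();
        immutable collectMs = sw.peek.total!"msecs";

        writefln("run %s: parse %s ms, GC.collect %s ms",
                 run, parseMs, collectMs);
    }
}
```

Usage would be something like `measure(() { doc = loadDocument("test.xml"); });`, where loadDocument is whatever parse entry point is under test.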

The linkdom rather too faithfully copies the Java DOM model, with doubly-linked pointers.  Elements have an "owner document" (I felt like ditching this).  All element children derive from ChildNode (Element, Text, CDATA and Comment) and have a pointer to the parent_ node (could ditch this, maybe, for CharData).   Anyway, it's a pretty linked-up conventional model. Java's GC has evidently got around the problems of collecting XML documents.
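The rough shape being described (field names illustrative, not the real linkdom declarations): every node points up at its parent and at its owner document, and the document points back down, so the whole tree is one strongly connected cluster as far as a tracing collector is concerned.

```d
/// Illustrative shape of the linkdom linkage (not the real xmlp types).
class Document
{
    Element root;            // down link into the tree
}

class ChildNode
{
    Document ownerDocument;  // every node knows its document
    ChildNode parent_;       // and its parent: the back pointers
}

class Element : ChildNode
{
    ChildNode[] children;    // forward links close the cycles
}

void main()
{
    auto doc = new Document;
    doc.root = new Element;
    doc.root.ownerDocument = doc;

    auto text = new ChildNode;
    text.parent_ = doc.root;
    text.ownerDocument = doc;
    doc.root.children ~= text;

    // Dropping `doc` leaves a cycle the GC must trace as one unit.
    assert(text.parent_ is doc.root && doc.root.ownerDocument is doc);
}
```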

When I looked at the C code for libxml2, for instance, all the nodes have that sort of interlocked linkage. I once tried to see how far I could get (on Windows) calling libxml2 from D.  I did not get very far, and gave up after crashes and a headache.

On the GC cleanup stats, the 5-node version of the generated test was cleanable, but at 50,000, well, the poor GC could not delete a single Node object. I've only done this with dmd2_32 on Windows; 64-bit may be different. It's also the GitHub release of 2.058-162, and may not be in a releasable state as far as the GC goes.

The original std.xml seemed to fare best on GC cleanup, but still developed troubles at each larger document size.   std.xmlp.arraydom was similar.  Both of these have no back pointers to parents and use an array to hold children.

I thought of just nulling the back pointers, and wrote a version of an explode method. I tried this with the Node version doing, or not doing, a delete.  With a big document, it currently has to do a delete, or the GC will not clean up everything.
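A sketch of what I mean by explode, on an illustrative Node type (not the actual xmlp signature): recurse to the leaves, null the back pointers, and drop the child arrays, leaving only small non-referencing pieces.

```d
class Node
{
    Node parent;      // back pointer that ties the tree into cycles
    Node[] children;

    /// Break this subtree into non-referencing pieces so the GC can
    /// reclaim each node independently.
    void explode()
    {
        foreach (c; children)
        {
            c.explode();
            c.parent = null;  // sever the back pointer
        }
        children = null;      // sever the forward links
    }
}

void main()
{
    auto root = new Node;
    foreach (i; 0 .. 3)
    {
        auto c = new Node;
        c.parent = root;
        root.children ~= c;
    }

    root.explode();
    assert(root.children is null);
}
```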

I turned the GC stats counters into a mixin template and put them on std.xml1, on std.arraydom, and on any intermediate objects from the parsing that I wanted to confirm were not hanging around. For too many of them, I found it necessary to chase them down and force a delete. (I now prefer explode, because delete implies a complete cleanup, and for me explode means to split into little non-referencing pieces that the GC can manage.)

This means I use too many pointer-linked structures for the GC. It may be just as difficult for the RedBlack tree in std.container; I haven't checked.

For me one lesson, at least, is: do not assume that a set of interlocked classes with circular references can always be isolated in a single full GC collection and properly deleted. For internal temporaries that will never be seen by external code, it seems a good idea to explode them.

For large, complex tree structures, it might take less CPU time to explode them than to make this particular version of the GC work it out with less information. Again: do explode.

After exploding, repeated runs and bigger loads ran optimally.


So I read with interest the other posts on mechanisms of GC and how to integrate with the compiler to tag classes and stack layout, so as to know pointers precisely. I particularly liked the suggestion of a mixin on each class for GC layout.
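A hedged sketch of what that per-class layout mixin could look like (this is not an actual druntime interface, and a real precise collector would still need a way to consume the table): record the byte offsets of the pointer-bearing fields so the collector could scan only those words instead of the whole object conservatively.

```d
import std.traits : isDynamicArray, isPointer;

/// Sketch of a "GC layout" mixin (not an actual druntime interface):
/// lists the byte offsets of fields that can hold GC pointers.
mixin template GCLayout()
{
    static size_t[] pointerOffsets()
    {
        alias T = typeof(this);
        size_t[] offs;
        foreach (i, FT; typeof(T.tupleof))
        {
            // Class references, raw pointers, and dynamic arrays all
            // embed pointers the collector would need to trace.
            static if (is(FT == class) || isPointer!FT || isDynamicArray!FT)
                offs ~= T.tupleof[i].offsetof;
        }
        return offs;
    }
}

class Node
{
    mixin GCLayout;
    int id;          // not a pointer: a precise collector could skip it
    Node parent;
    Node[] children;
}

void main()
{
    // Two pointer-bearing fields: `parent` and `children`.
    assert(Node.pointerOffsets().length == 2);
}
```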

What could be done, even ad hoc, is to gather GC stats on various kinds of complex self-referring classes, to examine the effectiveness of GC collection. This could even be a compiler flag, or a version. It was useful for me in spotting where I could reuse a class rather than construct a new one.

A lot of D applications may run once and exit, so the GC doesn't matter so much. But for long-running games and servers, verified GC behaviour will bring more confidence.

Aggressive measurement may allow better detection of where and when the GC fails, and a better assessment of the effects of GC code changes.


GC is a tough problem, and a hard one to build a whole language around.


