Write-up over on https://gist.github.com/FeepingCreature/a47a3daed89d905668da08effaa4d6cd . I'll duplicate the content here as well, but I'm not sure whether GitHub will be happy with the externally hosted images. If the graphs don't load, go to the Gist instead.
The D GC
The D GC is tuned by default to trade memory for performance.
This can be clearly seen in the default heap size target of 2.0,
i.e. the GC will prefer to just allocate more memory until less than half the heap memory is alive.
But with long-running user-triggered processes, memory can be more at a premium than CPU is, and larger heaps also mean slower collection runs.
Can we tweak GC parameters to make D programs use less memory? More importantly, what is the effect of doing so?
There are two important parameters:
`heapSizeFactor` defines the target "used heap to live memory" ratio. It defaults to 2.0, the heap size target mentioned above.
`maxPoolSize` defines the maximum pool size, which is the unit by which D allocates (and releases) memory from the operating system.
So resident memory usage will generally grow in units of `maxPoolSize`.
You can manually vary these parameters by passing `--DRT-gcopt="heapSizeFactor:1.1 maxPoolSize:8"` to any D program.
As a reference program, I'll use my heap fragmentation/GC leak testcase from
"Why does this simple program leak 500MB of RAM?".
So here's a diagram of RSS memory usage and program runtime as I adjust `heapSizeFactor` (on the X axis).
We can clearly see a few things:
- the D GC is extremely random in actual heap usage (as expected for a system without per-thread pools), but becomes less so as collections get more frequent
- you can get a significant improvement in memory usage for very little cost
- something wild happens between
Clearly, using a linear scale was a mistake. Let's try a different progression, defined by `1 + 1 / (1.1 ^ x)`:
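To make the progression concrete: x = 0 reproduces the default of 2.0, and the value approaches 1 as x grows, which spreads the interesting region near 1 evenly across the X axis. A quick sketch:

```d
import std.stdio : writefln;

void main()
{
    // x = 0 gives the default of 2.0; larger x crowds toward 1.0.
    foreach (x; [0, 5, 10, 20, 30, 40])
        writefln!"x=%2d  heapSizeFactor=%.4f"(x, 1 + 1 / (1.1 ^^ x));
}
```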
I've added four runs with different `maxPoolSize` settings. Several additional things become clear:
- the exponential scale was the right way to go
- GC CPU usage goes up more slowly than memory goes down, indicating a significant potential benefit.
Interestingly, adjustments between 2 and 1.1 seem to have very little effect.
Pretty much the only thing that matters is the number of zeroes after the decimal point, and maybe the final digit.
For instance, if you're willing to accept a doubling of GC cost for a halving of RAM, you should tune your `heapSizeFactor` to 1.002.
Annoyingly, there seems to be no benefit from `maxPoolSize`. The reduction in memory that you attain with smaller pools is almost exactly cancelled out by the increased CPU use, so you could gain the same reduction by just running the GC more often via `heapSizeFactor`. Still, good to know.
Note that this benchmark was performed with an extremely GC-hungry program. Performance impact and benefit may vary with the type of process. Nonetheless, I'll be attempting, and advocating, to run all but the most CPU-hungry of our services with a significantly lower `heapSizeFactor`.
Why do more aggressive GC runs reduce total memory used? I can't help but think it's down to heap fragmentation. D's GC is non-moving: once an object is allocated, it has to stay at that address until it is freed. As a result, for programs that mix long-lived and short-lived allocations, such as "anything that parses with std.json" and "anything that uses threads at all", a pool that was only needed at peak memory usage may be kept alive by a small number of surviving allocations. In that case, more frequent GC runs allow the program to pack more actually-alive content into the pools already allocated, reducing the peak use and thus fragmentation. In the long run it averages out, but in the long run I restart the service because it uses too much memory anyway.
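A minimal sketch of that allocation pattern (illustrative only, not the linked testcase):

```d
import std.stdio : writeln;

// A few long-lived survivors interleaved with bulk garbage can pin
// otherwise-dead pools, since a non-moving GC can't evacuate them.
ubyte[][] longLived;

void main()
{
    foreach (i; 0 .. 10_000)
    {
        // Short-lived bulk allocation: garbage by the next iteration,
        // but it grows the heap if no collection has run recently.
        auto scratch = new ubyte[](64 * 1024);
        scratch[0] = cast(ubyte) i;

        // Rare survivor: lands in whatever pool is current at peak usage
        // and keeps that pool resident after the garbage around it dies.
        if (i % 100 == 0)
            longLived ~= new ubyte[](16);
    }
    writeln("survivors: ", longLived.length);
}
```

Since a pool is only returned to the operating system once nothing in it is alive, each tiny survivor can hold a whole pool resident; collecting more often keeps survivors packed into fewer pools in the first place.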
At any rate, without fundamental changes to the language, such as finding ways to make at least some allocations movable, there isn't anything to be done. For now, the default `heapSizeFactor` of 2 may be good for benchmarks, but for long-running server processes, I suspect it makes the GC look worse than it is.