I don't get the GC. (heapSizeFactor followup)
FeepingCreature
January 16, 2023

A follow-up to https://forum.dlang.org/thread/befrzndhowlwnvlqcoxx@forum.dlang.org

I don't get it.

Okay, this is how I understand the conservative GC to work:

  • do allocations
  • allocations increment usedSmallPages/usedLargePages
  • at some point, we cross smallCollectThreshold/largeCollectThreshold
    • then we do a fullcollect, i.e. mark and sweep
    • then set the new threshold to usedSmallPages * heapSizeFactor, which defaults to 2
      • plus some smooth-decay magic so the threshold doesn't drop too fast
    • unlock and resume

Rinse and repeat.
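As a sketch, the threshold update described above might look like this (this is not druntime's actual code; the function name and the smoothing rule are illustrative):

```d
// Illustrative sketch of the collect-threshold update described above.
// Not druntime's implementation; the smoothing rule is made up to show
// the "don't drop the threshold too fast" idea.
size_t updateThreshold(size_t usedSmallPages, size_t oldThreshold,
                       double heapSizeFactor = 2.0)
{
    // Target: collect again once used pages grow past used * factor.
    auto target = cast(size_t)(usedSmallPages * heapSizeFactor);

    // Smooth decay: if the new target is lower than the old threshold,
    // only move partway down rather than snapping to it.
    if (target < oldThreshold)
        return (oldThreshold + target) / 2;
    return target;
}
```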

We have a service. It's pretty std.json intensive, it handles a lot of networking on startup, and it has about 30 threads. When we start it up with everything at default settings, it uses 3.3GB.

This would indicate, given heapSizeFactor=2 by default, that the high-water mark of used pages is 1.6GB.

We've tried setting heapSizeFactor to 0.25. This is not quite equivalent to running the GC on every allocation (smallCollectThreshold < usedSmallPages), but AIUI it first runs the GC every time it would allocate a new pool. It's pretty aggressive; something like 70% of our startup performance goes to GC. But, after startup, the service sits at RSS 961MB.
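For reference, heapSizeFactor can be set through the runtime's GC options, either on the command line (`--DRT-gcopt=heapSizeFactor:0.25`) or baked into the binary via druntime's `rt_options` hook:

```d
// Pin the GC option at build time via druntime's rt_options hook.
// Equivalent to passing --DRT-gcopt=heapSizeFactor:0.25 at run time.
extern (C) __gshared string[] rt_options = ["gcopt=heapSizeFactor:0.25"];
```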

Here's the confusing part. RSS 961MB is an upper bound on the live memory. If that's correct, the smallCollectThreshold after startup should be at most 2 × 961MB ≈ 1.9GB, right? So the untuned GC should not let the process grow much above 1.9GB, right? But it's 3.3GB, well beyond that.

I mean, wrong, because we might get unlucky and run a collection at a moment when a lot of temporary memory is live. But for that to explain it, we'd need twice as much temporary memory truly live during startup as after startup, right? And if I watch the RSS during startup with heapSizeFactor=0.25, I never see it above 961MB. And it can't be that network queues run emptier during startup with heapSizeFactor=0.25 than without, because the process is a lot slower with heapSizeFactor=0.25; at the default setting it should clear queues faster, not slower!

So what is the GC doing?!

January 17, 2023
Right goal, wrong questions.

The process's memory consumption may not be what the GC is consuming. You need to measure that first before concluding that the GC is doing something wrong.

For instance, a badly behaving buddy allocator in malloc could double memory usage in the way you're seeing, so swapping out malloc may give the results you want. But first, rule out the GC.
January 17, 2023

On Monday, 16 January 2023 at 23:50:35 UTC, Richard (Rikki) Andrew Cattermole wrote:

> Right goal, wrong questions.
>
> The process's memory consumption may not be what the GC is consuming. You need to measure that first before concluding that the GC is doing something wrong.
>
> For instance, a badly behaving buddy allocator in malloc could double memory usage in the way you're seeing, so swapping out malloc may give the results you want. But first, rule out the GC.

Okay, queried GC stats instead of looking at RSS. Now I randomly get 2.6GB after start without heapSizeFactor, whatever, random variation. But the interesting thing is:

  • heapSizeFactor=0.25: usedSize 704MB freeSize 202MB
  • heapSizeFactor=2 (default): usedSize 1568MB freeSize 810MB

So why is twice as much GC memory reachable without heapSizeFactor?
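The numbers above can be read from the GC's own accounting via core.memory (a minimal sketch; the MB formatting is just for readability):

```d
// Minimal sketch: query the GC's own accounting instead of process RSS.
import core.memory : GC;
import std.stdio : writefln;

void printGCStats()
{
    // GC.stats returns core.memory.GC.Stats with usedSize and freeSize.
    auto s = GC.stats;
    writefln("usedSize %s MB, freeSize %s MB",
             s.usedSize / (1024 * 1024), s.freeSize / (1024 * 1024));
}
```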

January 17, 2023

On Tuesday, 17 January 2023 at 06:42:33 UTC, FeepingCreature wrote:

> Okay, queried GC stats instead of looking at RSS. Now I randomly get 2.6GB after start without heapSizeFactor, whatever, random variation. But the interesting thing is:
>
>   • heapSizeFactor=0.25: usedSize 704MB freeSize 202MB
>   • heapSizeFactor=2 (default): usedSize 1568MB freeSize 810MB
>
> So why is twice as much GC memory reachable without heapSizeFactor?

Okay hang on no.

If I actually call GC.collect before measuring, I do get the proper usedSize 652MB freeSize 2302MB. So the GC insists that it transitively needed 3GB? It seems the GC's claim is "because your program ran faster, it did something that ballooned its used memory to 3GB before it sank back down." ... Right?

But that's impossible. This service is entirely network triggered. It can't use more live memory by running faster.

Could the GC's used-memory estimate have gotten messed up somehow/somewhere? I don't see where that could be in the code though.
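The collect-then-measure step above could look something like this (a sketch; the logging is illustrative):

```d
// Sketch: force a full collection before sampling, so freeSize reflects
// what the GC can actually reclaim rather than what it chose to keep around.
import core.memory : GC;
import std.stdio : writefln;

void measureAfterCollect()
{
    GC.collect();
    auto s = GC.stats;
    writefln("after collect: usedSize %s MB, freeSize %s MB",
             s.usedSize / (1024 * 1024), s.freeSize / (1024 * 1024));
}
```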