GC memory fragmentation
April 11, 2021

Hi,
we're using vibe-d (on Linux) for a long-running REST API server and have a problem with constantly growing memory until the system kills the process with the OOM killer.

The first guess was a memory leak, so we've added a periodic call to:

GC.collect();
GC.minimize();
malloc_trim(0);

When called very often (i.e. every 500ms), the service's memory stays stable and doesn't grow, so it doesn't look like a memory leak (or at least not one fast enough to explain how the service can go from 90MB to 900MB in 2 days).

But this is very bad for performance, so we've extended the interval to, for example, 30 seconds, and now memory still goes up (not as dramatically, but still).
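
For illustration, a minimal sketch of one way to schedule such a periodic cleanup (a plain background thread here; a vibe.d timer would work just as well, and malloc_trim is a glibc extension so it has to be declared by hand):

import core.memory : GC;
import core.thread : Thread;
import core.time : Duration, seconds;

// glibc extension, not bound by druntime
extern(C) int malloc_trim(size_t pad) nothrow @nogc;

void startCleanupThread(Duration interval)
{
    auto t = new Thread({
        while (true)
        {
            Thread.sleep(interval);
            GC.collect();      // full collection
            GC.minimize();     // return free pools to the OS where possible
            malloc_trim(0);    // ask glibc malloc to trim its own heap too
        }
    });
    t.isDaemon = true; // don't keep the process alive because of this thread
    t.start();
}

// e.g. startCleanupThread(30.seconds);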

The GC stats when it grew within one such 30s interval are these (logged after GC.collect() and GC.minimize()):

GC.stats: usedSize=9994864, freeSize=92765584, total=102760448
GC.stats: usedSize=11571456, freeSize=251621120, total=263192576

Before the growth it had 88MB of free space out of 98MB total allocated.
After 30s it had 239MB free out of 251MB allocated.

So it wastes a lot of free space that it can't return to the OS for some reason.

Can these numbers be caused by memory fragmentation? There are probably a lot of small allocations (PostgreSQL queries, result processing and REST API JSON serialization).

The only explanation that makes some sense is that some operation requires allocations that can't be fulfilled from the current memory pool (e.g. due to fragmentation within it), so the data is allocated in a new memory segment that can't be returned afterwards because it still holds 'live' data. But that should be freed too at some point, and the GC should be able to minimize (provided another request doesn't cause an allocation in the 'wrong' page again).

But still, the ratio of used to free memory seems wrong; it's a whopping 95% free space that can't be minimized :-o. I have a hard time imagining fragmentation alone causing that.

Are there any tools that can help diagnose this further?

Also note that malloc_trim is not called by the GC itself, and since the internally used malloc manages its own memory pool, it has its own quirks when returning unused memory to the OS (in some cases it does so only on free()).
See for example: http://notes.secretsauce.net/notes/2016/04/08_glibc-malloc-inefficiency.html

The behavior of malloc can be controlled with mallopt, in particular:

M_TRIM_THRESHOLD
    When the amount of contiguous free memory at the top of the heap grows sufficiently large, free(3) employs sbrk(2) to release this memory back to the system. (This can be useful in programs that continue to execute for a long period after freeing a significant amount of memory.) The M_TRIM_THRESHOLD parameter specifies the minimum size (in bytes) that this block of memory must reach before sbrk(2) is used to trim the heap.

    The default value for this parameter is 128*1024.

So the default is a 128kB block of free heap memory before the trim activates, yet when malloc_trim is called manually, a much larger amount of memory is often returned. That's puzzling too :)
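
For completeness, the threshold can also be lowered at runtime; a hypothetical sketch (mallopt and malloc_trim are glibc extensions declared by hand, and the M_TRIM_THRESHOLD value of -1 is taken from glibc's malloc.h):

// glibc extensions, not bound by druntime
extern(C) int mallopt(int param, int value) nothrow @nogc;
extern(C) int malloc_trim(size_t pad) nothrow @nogc;

enum M_TRIM_THRESHOLD = -1; // parameter id from glibc's malloc.h

void tuneMallocTrim()
{
    // Trim the heap whenever more than ~1MB of contiguous free space
    // accumulates at its top (the default threshold is 128*1024 bytes).
    mallopt(M_TRIM_THRESHOLD, 1024 * 1024);
}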

April 11, 2021

On Sunday, 11 April 2021 at 09:10:22 UTC, tchaloupka wrote:

>

Hi,
we're using vibe-d (on Linux) for a long-running REST API server and have a problem with constantly growing memory until the system kills the process with the OOM killer.

One thing that comes to mind: is your application compiled as 32-bit? The garbage collector is much more likely to leak memory with a 32-bit address space, since it is much more likely for a random int to appear to be a pointer into the interior of a block of GC-allocated memory.

April 11, 2021

On Sunday, 11 April 2021 at 12:20:39 UTC, Nathan S. wrote:

>

One thing that comes to mind: is your application compiled as 32-bit? The garbage collector is much more likely to leak memory with a 32-bit address space, since it is much more likely for a random int to appear to be a pointer into the interior of a block of GC-allocated memory.

Nope, it's a 64-bit build.
I've also tried switching to the precise GC, with the same result.

Tom

April 11, 2021

On Sunday, 11 April 2021 at 13:50:12 UTC, tchaloupka wrote:

>

On Sunday, 11 April 2021 at 12:20:39 UTC, Nathan S. wrote:

>

One thing that comes to mind: is your application compiled as 32-bit? The garbage collector is much more likely to leak memory with a 32-bit address space, since it is much more likely for a random int to appear to be a pointer into the interior of a block of GC-allocated memory.

Nope, it's a 64-bit build.
I've also tried switching to the precise GC, with the same result.

Tom

Try tweaking the GC; you could first try the precise one?

Also try enabling GC profiling with profile:1 to find out what exactly eats the memory. I suspect you have a memory leak somewhere, maybe in one of the native libs you are using for your SQL queries?


extern(C) __gshared string[] rt_options = [
    "gcopt=gc:precise heapSizeFactor:1.2 profile:1"
];

April 12, 2021

I should have added

https://dub.pm/package-format-json#build-types

profileGC as a build option in your dub file.

That will generate a profile_gc file once the program exits: a table that lists all the allocations.
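
For example, a hypothetical dub.json fragment (assuming the rest of the file already exists):

{
    "buildOptions": ["profileGC"]
}

Alternatively, the built-in profile-gc build type (dub build --build=profile-gc) should achieve the same.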

April 12, 2021

On Sunday, 11 April 2021 at 09:10:22 UTC, tchaloupka wrote:

>

Hi,
we're using vibe-d (on Linux) for a long-running REST API server and have a problem with constantly growing memory until the system kills the process with the OOM killer.

Do you have a manual GC.free() in your code, maybe with a larger array?

>

The only explanation that makes some sense is that some operation requires allocations that can't be fulfilled from the current memory pool (e.g. due to fragmentation within it), so the data is allocated in a new memory segment that can't be returned afterwards because it still holds 'live' data. But that should be freed too at some point, and the GC should be able to minimize (provided another request doesn't cause an allocation in the 'wrong' page again).

If so, setting minPoolSize:1 could help you control the memory usage, and if this memory is kept, you could inspect what's inside the particular pool item (at least for GC-allocated stuff).
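
For example, via the same rt_options mechanism mentioned elsewhere in this thread (the option string here is just an illustration):

extern(C) __gshared string[] rt_options = [
    "gcopt=minPoolSize:1"
];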

April 12, 2021

On Sunday, 11 April 2021 at 09:10:22 UTC, tchaloupka wrote:

>

Hi,
we're using vibe-d (on Linux) for a long-running REST API server and have a problem with constantly growing memory until the system kills the process with the OOM killer.

[...]

But this is very bad for performance, so we've extended the interval to, for example, 30 seconds, and now memory still goes up (not as dramatically, but still).

[...]

So it wastes a lot of free space that it can't return to the OS for some reason.

Can these numbers be caused by memory fragmentation? There are probably a lot of small allocations (PostgreSQL queries, result processing and REST API JSON serialization).

We have similar problems: we see memory usage alternate between plateauing and slowly growing, until it hits the configured maximum memory for that job and the orchestrator kills it (we run multiple instances and have good failover).

I have reduced the problem by refactoring some of our GC usage, but it still persists.

On a side note, it would also be good if the GC could be made aware of the maximum memory it is allotted, so that it knows it needs to do more aggressive collections when nearing it.

April 12, 2021

On Sunday, 11 April 2021 at 09:10:22 UTC, tchaloupka wrote:

>

Hi,
we're using vibe-d (on Linux) for a long-running REST API server and have a problem with constantly growing memory until the system kills the process with the OOM killer.

[...]

Looks like the GC needs some love

April 12, 2021

On Monday, 12 April 2021 at 07:03:02 UTC, Sebastiaan Koppe wrote:

>

On a side note, it would also be good if the GC could be made aware of the maximum memory it is allotted, so that it knows it needs to do more aggressive collections when nearing it.

I'm surprised there is no such functionality available.
It doesn't sound to me like it's that difficult to implement.

April 12, 2021

On Monday, 12 April 2021 at 20:50:49 UTC, Per Nordlöw wrote:

>

I'm surprised there is no such functionality available.
It doesn't sound to me like it's that difficult to implement.

AFAICT, we're looking for a way to call GC.collect() when GC.Stats.usedSize [1] reaches a certain threshold.

I wonder how other GC-backed languages handle this.

[1] https://dlang.org/phobos/core_memory.html#.GC.Stats.usedSize
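
Something along these lines, for instance; a minimal sketch where the limit itself is a made-up value that would in practice come from the container/orchestrator configuration:

import core.memory : GC;

// Hypothetical soft limit for illustration only.
enum size_t softLimit = 512 * 1024 * 1024;

void collectIfNearLimit()
{
    // Collect eagerly once the GC's used size approaches the soft limit.
    if (GC.stats.usedSize > softLimit - softLimit / 10)
    {
        GC.collect();
        GC.minimize();
    }
}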
