Thread overview
Manually freeing up memory
Nov 07, 2012
bearophile
Nov 07, 2012
bearophile
Nov 07, 2012
H. S. Teoh
Nov 08, 2012
Marco Leise
Nov 10, 2012
Rob T
November 07, 2012
Hello all,

I'm doing some work with a fairly large dataset.  For various reasons it's convenient to import it first as simply an array of data points which is then used to generate other data structures (actually, technically it's an array of data points plus a couple of associative arrays, which ideally would instead be sets; but I think that's a minor detail).

Once the various data structures are in place, it's possible to discard the initial array data.  It would be very desirable to free up the memory allocated, as it's a very large amount.  However, I can't work out how to do this.

I've tried calling destroy() on the input data, with and without a subsequent GC.collect(), but the program's memory usage still remains at its peak level. This is a shame, because that peak memory usage only needs to last for a short part of the program's total runtime, and it seems only polite to other computer users to give back the excess memory.
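Part of the explanation may be what `destroy()` does with an array. For a dynamic array it only resets the one variable it is given to its `.init` value (`null`), and any other slice of the same block keeps that block alive. Here is a minimal sketch; the helper name and the `Link` alias are hypothetical.

```d
import std.typecons : Tuple;

alias Link = Tuple!(size_t, size_t);

// Hypothetical helper: demonstrates that destroy() on a dynamic array only
// resets the variable it is given; other slices of the same memory survive.
bool[2] destroyOnlyNullsTheVariable()
{
    auto links = new Link[](1_000);
    auto other = links;           // a second reference to the same GC block

    destroy(links);               // sets `links` to Link[].init, i.e. null

    // The block is still reachable through `other`, so a GC.collect()
    // at this point could not reclaim it.
    return [links is null, other.length == 1_000];
}
```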

Can anyone advise?  I would rather not disable the GC entirely as there's lots of Phobos I want to be able to use -- but I'd really like it if I could indicate categorically to the GC, "these objects and arrays need to be deleted and the memory freed _now_".

Thanks and best wishes,

      -- Joe
November 07, 2012
Joseph Rushton Wakeling:

> Can anyone advise?  I would rather not disable the GC entirely as there's lots of Phobos I want to be able to use -- but I'd really like it if I could indicate categorically to the GC, "these objects and arrays need to be deleted and the memory freed _now_".

One solution is to allocate the original array on the C heap. Another solution is to allocate it normally from the GC heap and then use GC.free().
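Here is a minimal sketch of the C-heap option. It assumes the elements are plain values, like the `Tuple!(size_t, size_t)` links mentioned later in the thread. The function names are hypothetical.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;
import std.typecons : Tuple;

alias Link = Tuple!(size_t, size_t);

// Sketch: allocate the large temporary array on the C heap. The GC neither
// owns nor scans this block, and free() releases it at a point you choose.
// Caveat: the block must not hold the only reference to GC-allocated data,
// because the GC cannot see inside it.
Link[] mallocLinks(size_t n)
{
    auto p = cast(Link*) malloc(n * Link.sizeof);
    if (p is null)
        onOutOfMemoryError();
    return p[0 .. n];
}

void freeLinks(ref Link[] links)
{
    free(links.ptr);
    links = null;   // don't leave a dangling slice behind
}
```

Whether free() then shrinks the process as seen by the OS is up to the C allocator, which is the caveat raised further down the thread.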

Maybe a third option is to use a memory-mapped file for the first array.
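Here is a rough sketch of the memory-mapped option using Phobos' `std.mmfile`. The scratch file name and the summing step are made-up placeholders for the real work. The idea is that the pages belong to the mapping rather than to the GC heap, so unmapping releases them.

```d
import std.file : remove, tempDir;
import std.mmfile : MmFile;
import std.path : buildPath;
import std.typecons : Tuple;

alias Link = Tuple!(size_t, size_t);

// Sketch: back the temporary array with a memory-mapped scratch file
// instead of the GC heap. Destroying the MmFile unmaps it, so those pages
// stop counting against the process. The file name is a made-up example.
size_t sumViaMmap(size_t n)
{
    auto path = buildPath(tempDir(), "links_scratch.bin");
    scope (exit) remove(path);               // runs last: delete the scratch file

    auto mm = new MmFile(path, MmFile.Mode.readWriteNew, n * Link.sizeof, null);
    scope (exit) destroy(mm);                // runs first: ~this() unmaps the file

    auto links = cast(Link[]) mm[];          // view the mapping as an array of links
    foreach (i, ref l; links)
        l = Link(i, 2 * i);

    size_t total;
    foreach (l; links)
        total += l[1];
    return total;
}
```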

Bye,
bearophile
November 07, 2012
On 11/07/2012 03:17 PM, bearophile wrote:
> One solution is to allocate the original array on the C heap. Another solution
> is to allocate it normally from the GC heap and then use GC.free().

Well, what I've got is something like this:

      import core.memory : GC;

      auto raw = rawInput();      /* loads data and outputs a struct containing
                                     the array of data */
      auto data = rawToData(raw); // converts the raw input to the data structures
      GC.free(raw.links.ptr);     // _should_ free up the allocated memory?

... but despite the GC.free(), memory usage stays at peak level for the rest of the runtime of the function.

I tried preceding the free() with a destroy(raw) or destroy(raw.links) also to no avail.

> Maybe a third option is to use a memory-mapped file for the first array.

That's an interesting thought, which I'll look into.  Another thought was to dump the data into an SQL DB and read/sample from there as necessary, but IIRC the SQL support available for D is somewhat limited right now ... ?
November 07, 2012
Joseph Rushton Wakeling:

> ... but despite the GC.free(), memory usage stays at peak level for the rest of the runtime of the function.

GC.free() usually works. Some memory allocators never give memory back to the OS until the process ends, even though that memory is free for the process to reuse (this often happens with Python on Windows).

If I'm right, then if you allocate more memory in the same program after GC.free(), the total memory used by the process will not increase.
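Here is a sketch of that experiment from the GC's point of view. One assumption: `GC.stats()` only exists in later druntime releases, not in 2012. It also describes the GC heap, not the process size the OS reports, so for the latter you would still watch the process in top or Task Manager.

```d
import core.memory : GC;

// Sketch: record the GC heap's used size after allocating a large block,
// after freeing it, and after allocating a block of the same size again.
size_t[3] usedAcrossFreeAndRealloc(size_t bytes)
{
    auto a = new ubyte[](bytes);
    immutable usedAfterAlloc = GC.stats().usedSize;

    GC.free(GC.addrOf(a.ptr));   // release the block via its base address
    a = null;
    immutable usedAfterFree = GC.stats().usedSize;

    auto b = new ubyte[](bytes); // may reuse the freed pages instead of growing the heap
    immutable usedAfterRealloc = GC.stats().usedSize;
    b[0] = 1;                    // touch b so it is clearly in use

    return [usedAfterAlloc, usedAfterFree, usedAfterRealloc];
}
```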

Bye,
bearophile
November 07, 2012
On Wed, Nov 07, 2012 at 06:12:52PM +0100, bearophile wrote:
> Joseph Rushton Wakeling:
> 
> >... but despite the GC.free(), memory usage stays at peak level for the rest of the runtime of the function.
> 
> GC.free() usually works. Some memory allocators don't give back the memory to the OS, no matter what, until the process is over, despite that memory is free for the process to use in other ways (this is what often happens in Python on Windows).
[...]

I think on POSIX systems, malloc/free often does not return freed memory to the OS; it just gets reused by the process later on.

If you want to return memory back to the OS, you could call sbrk()... but that is highly *NOT* recommended unless you know exactly what you're doing, and you know the innards of your C library (*and* D runtime) like the back of your hand. But it *is* the "hardcore" way of doing it. :-)

An easier workaround might be to fork() a process that constructs whatever data structures you need, transmits them to the main process somehow, and then exits. If I understand it correctly, the large memory allocations will be confined to the child process, and that memory will be returned to the OS once the child exits. (Note that you have to use fork(), not threads, because threads share the same process memory, so you end up with the same problem.)
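Here is a minimal POSIX-only sketch of the fork() idea, using druntime's POSIX bindings. The child builds the big array, reduces it to a small result, sends that result through a pipe, and exits. Summing squares is a hypothetical stand-in for the real "build the data structures" step. A real program would need to serialize a much richer result than one integer.

```d
import core.sys.posix.sys.wait : waitpid;
import core.sys.posix.unistd : _exit, close, fork, pipe, read, write;
import std.exception : errnoEnforce;

// Sketch: run the memory-hungry step in a child process. When the child
// exits, all of its memory goes back to the OS; the parent never touches
// the large allocation.
size_t computeInChild(size_t n)
{
    int[2] fds;
    errnoEnforce(pipe(fds) == 0, "pipe failed");

    immutable pid = fork();
    errnoEnforce(pid >= 0, "fork failed");

    if (pid == 0)
    {
        // Child: allocate freely, write the result, then _exit() so the
        // child skips D runtime shutdown and the parent's atexit handlers.
        close(fds[0]);
        auto big = new size_t[](n);
        foreach (i, ref x; big)
            x = i * i;
        size_t childResult;
        foreach (x; big)
            childResult += x;
        write(fds[1], &childResult, childResult.sizeof);
        _exit(0);
    }

    // Parent: read the small result and reap the child.
    close(fds[1]);
    size_t result;
    immutable got = read(fds[0], &result, result.sizeof);
    close(fds[0]);
    int status;
    waitpid(pid, &status, 0);
    errnoEnforce(got == result.sizeof, "short read from child");
    return result;
}
```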


T

-- 
Question authority. Don't ask why, just do it.
November 07, 2012
On 11/07/2012 06:53 PM, H. S. Teoh wrote:
> I think on Posix systems, malloc/free does not return freed memory back
> to the OS, it just gets reused by the process later on.

I have to say that in this program, it looks like the memory usage keeps increasing even after the free(), even though theoretically the amount it's possible to free up would dwarf any subsequent memory requirements.

Using GC.minimize() seems to return a very small amount of memory to the OS, depending on which compiler is used, but nowhere near the amount it should theoretically be possible to hand back.
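One detail that may matter for the `GC.free(raw.links.ptr)` call earlier in the thread: `GC.free()` is documented to do nothing when given a pointer into the interior of a block. For large arrays, druntime may keep bookkeeping data at the start of the block, so `links.ptr` is not necessarily the block's base address. The sketch below resolves the base with `GC.addrOf()` first, then collects and calls `GC.minimize()`. The function name is hypothetical, and how much the process actually shrinks still depends on the druntime version and on fragmentation.

```d
import core.memory : GC;

// Sketch: hand a large GC block back explicitly, then ask the GC to return
// wholly unused pools to the OS.
void releaseLarge(ref ubyte[] buf)
{
    // GC.free() ignores interior pointers; GC.addrOf() maps buf.ptr to the
    // start of its block first (a no-op if it already is the start).
    GC.free(GC.addrOf(buf.ptr));
    buf = null;        // don't keep a dangling slice to the freed block
    GC.collect();      // also reclaim anything else that has become unreachable
    GC.minimize();     // return free pools to the operating system
}
```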

> An easier workaround might be to fork() a process that constructs
> whatever data structures you need, transmits that to the main process
> somehow, then exit. If I understand it correctly, the large memory
> allocations will be restricted to the child process, which will get
> returned to the OS once it exits. (Note that you have to use fork(), not
> threads, because threads share memory in the same process so you end up
> with the same problem.)

Nice thought!  I'll have a look at doing this.

November 08, 2012
Am Wed, 07 Nov 2012 19:56:35 +0100
schrieb Joseph Rushton Wakeling <joseph.wakeling@webdrake.net>:

> On 11/07/2012 06:53 PM, H. S. Teoh wrote:
> > I think on Posix systems, malloc/free does not return freed memory back to the OS, it just gets reused by the process later on.
> 
> I have to say that in this program, it looks like the memory usage keeps increasing even after the free(), even though theoretically the amount it's possible to free up would dwarf any subsequent memory requirements.

Could it be that you still hold a reference to the raw memory
in your data structures? A slice would be a typical candidate:
s.name = raw[a .. b];
You probably checked that already...
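Here is a small sketch of that pitfall. A slice shares memory with the array it came from, so a record holding `raw[a .. b]` keeps the whole raw block reachable. Copying with `.idup` (or `.dup` for mutable data) gives the record its own small allocation. `Record`, `sliceName` and `copyName` are hypothetical names.

```d
// Sketch: two ways of storing part of the raw input in a derived record.
struct Record
{
    string name;
}

Record sliceName(string raw, size_t a, size_t b)
{
    return Record(raw[a .. b]);        // aliases raw: pins the whole raw block
}

Record copyName(string raw, size_t a, size_t b)
{
    return Record(raw[a .. b].idup);   // independent copy: raw can be collected
}
```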

-- 
Marco

November 08, 2012
On 11/08/2012 05:50 AM, Marco Leise wrote:
> Could it be that you still hold a reference to the raw memory
> in your data structures ? A slice would be a typical candidate:
> s.name = raw[a .. b];
> You probably checked that already...

I don't _think_ so, although there is a point where data is passed to another struct something like this:

  foreach(link; raw.links)   // raw is struct, links is array
      data.add(link.expand); // each entry in links is a Tuple!(size_t, size_t)

where add() takes as input a pair of size_t's.  I assumed the values here would be copied.  I've tried tweaking it to take out the link.expand and it makes no difference.
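If `add()` stores the two `size_t` arguments in its own storage, the values are indeed copied. `Tuple.expand` passes the fields by value, and `foreach (link; ...)` already iterates over copies. The sketch below assumes such an `add()`, since the real one isn't shown. The `Data` and `build` names are hypothetical.

```d
import std.typecons : Tuple;

alias Link = Tuple!(size_t, size_t);

// Hypothetical stand-in for `data` in the post: it stores plain size_t
// values in its own array, so nothing in it points back into the links.
struct Data
{
    size_t[2][] edges;

    void add(size_t a, size_t b)
    {
        size_t[2] e = [a, b];
        edges ~= e;              // copies the two integers into Data's storage
    }
}

Data build(Link[] links)
{
    Data data;
    foreach (link; links)        // `link` is a copy of each element
        data.add(link.expand);   // expand passes both fields by value
    return data;
}
```

Overwriting the source array after `build()` leaves `data` unchanged, which is a quick way to confirm that no references were retained.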
November 10, 2012
On Thursday, 8 November 2012 at 04:51:00 UTC, Marco Leise wrote:
> Could it be that you still hold a reference to the raw memory
> in your data structures ? A slice would be a typical candidate:

Good point. I find that with GC'd memory, you have to diligently keep track of where and when your references get dropped, to ensure no stray references are left behind by mistake.

I find that apps built with GC languages like Java tend to suffer from severe memory leaks, perhaps because memory stays referenced without the programmer being aware of it.

I come from a C++ background, so I am painfully aware that I cannot lower my guard just because there's a GC kicking about. In fact, I find myself more concerned than ever, because I'm never certain when the GC will kick in, whether it will do the job correctly, and so forth.
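One way to apply that discipline to the case in this thread is to confine the big input to a single function. Once that function returns, no variable, field or closure still refers to it. The types and functions below are hypothetical stand-ins for the ones discussed above.

```d
import core.memory : GC;

// Hypothetical stand-ins for the types in this thread.
struct RawInput { size_t[] links; }
struct Data { size_t total; }

Data rawToData(const RawInput raw)
{
    Data d;
    foreach (x; raw.links)
        d.total += x;            // copy out plain values, keep no slices
    return d;
}

// Sketch: the raw input never escapes this function.
Data loadAndConvert(size_t n)
{
    auto raw = RawInput(new size_t[](n));
    foreach (i, ref x; raw.links)
        x = i;
    auto data = rawToData(raw);
    raw = RawInput.init;         // drop the reference explicitly as well
    return data;
}

Data pipeline(size_t n)
{
    auto data = loadAndConvert(n);
    // The raw block is now unreachable and may be reclaimed here, although a
    // conservative GC can still be misled by stale values on the stack.
    GC.collect();
    return data;
}
```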


--rt