September 03, 2006
Serg Kovrov wrote:
> Walter Bright wrote:
>> Not necessarily. There's a difference between physical memory and virtual memory. The OS will automatically take away from the process any unused physical memory, although the process retains it as virtual memory. If the process explorer watches physical memory, then that's what it's seeing.
> I'm not trying to argue here, but to understand. I did some searching on the subject of memory usage. So far, I see it as I described before: to measure an application's memory usage, the best way is to watch the 'private bytes' counter in Perfmon (or in other tools that, I believe, use Perfmon, such as Process Explorer or FAR's Process list plugin).

If those tools are measuring physical memory, that has no connection to whether memory is returned to the operating system or not.

> For example, quote from http://shsc.info/WindowsMemoryManagement:
>> ... is what's called "private bytes" by the Performance control panel.
>> It's memory that's private to that process; generally it's the amount
>> of RAM the process has asked for in order to store runtime data.
> 
> And as I said before, C++'s 'delete' (not sure about all implementations, but at least GCC/mingw and VC++2005) does free memory from this 'private bytes' counter.
> 
> You said it is not really returned to the OS, but goes into some pool. OK, could you explain further: is it my application's pool, or some shared pool?

Your application's pool, which is implemented by the runtime library. You can see a sample implementation of this in K&R's book "The C Programming Language". malloc/free/new/delete are just functions that are part of your program.
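
As a rough illustration of what "your application's pool" means, here is a minimal sketch in C of a fixed-size block pool. It is neither the K&R allocator nor any real runtime library's code: every name in it (pool_alloc, pool_free, pool_system_requests, BLOCK_SIZE, POOL_BLOCKS) is made up for this example, and plain malloc stands in for whatever call would obtain memory from the operating system. What it tries to show is that a freed block goes back onto a list inside the program and is handed out again by the next allocation, without the system being involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names and sizes throughout; not any real runtime's code. */
#define BLOCK_SIZE  64      /* every block has the same size, for simplicity */
#define POOL_BLOCKS 1024    /* blocks obtained from the system in one request */

typedef union block {
    union block *next;                  /* link, used only while the block is free */
    unsigned char payload[BLOCK_SIZE];  /* storage handed to the caller */
} block;

static block *free_list = NULL;         /* blocks available for reuse */
static size_t system_requests = 0;      /* how many times we went to the system */

/* Obtain one large region and thread all of its blocks onto the free list. */
static int pool_grow(void) {
    block *region = malloc(POOL_BLOCKS * sizeof(block)); /* stand-in for an OS call */
    if (region == NULL) return 0;
    system_requests++;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        region[i].next = free_list;
        free_list = &region[i];
    }
    return 1;
}

/* Hand out one block, growing the pool only when the free list is empty. */
void *pool_alloc(void) {
    if (free_list == NULL && !pool_grow()) return NULL;
    assert(free_list != NULL);
    block *b = free_list;
    free_list = b->next;
    return b;
}

/* A freed block goes back on the list; nothing is returned to the system. */
void pool_free(void *p) {
    if (p == NULL) return;
    block *b = p;
    b->next = free_list;
    free_list = b;
}

size_t pool_system_requests(void) { return system_requests; }
```

With this sketch, allocating a block, freeing it, and allocating again yields the same address, and pool_system_requests() stays at 1 the whole time.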


> What I mean is: my application allocated and used this memory, but it does not anymore; it has been 'delete'd. The application will not need new memory from the OS for some time, or possibly ever. Can this freed (deleted) memory be used elsewhere? Please note, I am talking about the C++ implementations mentioned here that actually affect the 'private bytes' counter.

To understand this, you really need to understand how demand-paged virtual memory works, and the difference between virtual and physical memory: http://en.wikipedia.org/wiki/Virtual_memory


September 03, 2006
Walter Bright wrote:
> If those tools are measuring physical memory, that has no connection to whether memory is returned to the operating system or not.

Private Bytes is a virtual memory counter.

-- 
serg.
September 03, 2006
Serg Kovrov wrote:
> Walter Bright wrote:
>> If those tools are measuring physical memory, that has no connection to whether memory is returned to the operating system or not.
> 
> Private Bytes is a virtual memory counter.

You might find this helpful from http://blogs.msdn.com/ricom/archive/2005/08/01/446329.aspx :

"The Private Bytes counter reports the commit charge of the process. That is to say, the
amount of space that has been allocated  in the swap file to hold the contents of the
private memory in the event that it is swapped out.  Note: I'm avoiding the word "reserved"
because of possible confusion with virtual memory in the reserved state which is not
committed.

So, if we are concerned with space allocated in the swap file then the Private Bytes counter
is right on the money.  However, that is not usually what we are concerned about.  We're
much more interested in memory pressure caused by copies of private bytes in multiple
processes.  That is to say we are concerned about the physical memory that has been allocated
to hold those private bytes.

Why might/does the operating system allocate space in the swap file to hold the contents of
memory whose contents have never become resident and may never become resident?  The answer
is not actually so complicated:  Windows cannot deliver an "out of memory" exception/error
just because you tried writing to, for instance, a static variable.  The swap space must be
pre-allocated at a reasonable time (such as loading a DLL) so that we can deliver an error
result at a reasonable time -- the time at which the virtual addresses changed from
reserved to committed.

So if we had some static data that could be modified, but usually isn't modified in
practice, and maybe isn't even read, you would find that the Private Bytes counter
overstated the cost of that storage.  The only real cost is the small bit of housekeeping
to track allocated space in the swap file.  No private memory need be allocated.

In contrast, the output of vadump deals directly in what is resident and so, if memory is
tight, it may understate the true memory cost because some things may have already been
swapped out by the time you ask for the current resident pages. However, if memory is
abundant then swapping is not going to happen and you can get a true picture of all the
required pages from vadump."

September 04, 2006
Walter Bright wrote:
> Serg Kovrov wrote:
>> Walter, thanks for your kind and comprehensive answer, but I still do not get this 'malloc/free do not return memory to OS' part.
>>
>> I just tried the simplest example with GCC (mingw): I allocated a 1 MB string with 'new'. Then in Process Explorer (or in FAR's Process list) I can see my process's 'Private bytes' usage increase by the same amount. When I free this string with 'delete', I see that 'Private bytes' usage drops to the same level it was at before. Doesn't this mean that the memory is returned to the OS?
> 
> Not necessarily. There's a difference between physical memory and virtual memory. The OS will automatically take away from the process any unused physical memory, although the process retains it as virtual memory. If the process explorer watches physical memory, then that's what it's seeing.

I think Serg is right. I don't know what program he used to monitor  the mem usage ("process explorer", is that the Windows Task Manager?), but my test confirms what he's saying.

I made a small C program that allocs 100Mb and then frees it. I monitored it with Windows XP Task Manager process list, looking at the fields "Mem Usage" (physical mem) and "VM Size". Compiling with GCC (3.2.3), it does free back the memory to the OS (VM Size decreases after the free). I tried the program with DMC but the same does not happen: the memory is not returned after the free. I don't have visual C available right now so I didn't try that one.

After a google search:
http://www.gnu.org/software/libc/manual/html_node/Freeing-after-Malloc.html
"Occasionally, free can actually return memory to the operating system and make the process smaller."
Evidently the Windows version of libc also does the same. And seems it's smart enough to return the mem not just when the top of the heap is all free (what I was expecting) but also with free pages in the middle of the heap. (the test program allocs two segments and frees the first only)


---- test program ----
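#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int main(void) {
	char buf[80];
	
	char* ptr = (char*) malloc(100000000);
	assert(ptr != NULL);

	char* ptr2 = (char*) malloc(100000000);
	assert(ptr2 != NULL);

	// The test expects the second block to sit above the first one
	assert(ptr2 > ptr);

	
	printf("Mem Allocated\n"); fflush(stdout);
	fgets(buf, sizeof buf, stdin); // wait for Enter; check the counters now
	// Touch one byte every 4000 bytes, just to see the physical mem usage
	int i;
	for(i = 0; i < 20000; i++)
		ptr[i*4000] = 'X';

	printf("Memory Written\n"); fflush(stdout);
	fgets(buf, sizeof buf, stdin);

	// Free only the first block; ptr2 stays allocated above it
	free(ptr);
	printf("ptr1 free'd\n"); fflush(stdout);

	fgets(buf, sizeof buf, stdin);
	return 0;
}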

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
September 04, 2006
* Bruno Medeiros:
> I don't know what program he used to monitor  the mem usage
> ("process explorer", is that the Windows Task Manager?), but my test confirms what he's saying.

I use Sysinternals Process Explorer as a substitute for Windows Task Manager. I believe it reads Perfmon counters for that purpose. Great tool, BTW. Worth looking at:
http://www.sysinternals.com/Utilities/ProcessExplorer.html

-- 
serg.
September 04, 2006
Bruno Medeiros wrote:
> I think Serg is right. I don't know what program he used to monitor  the mem usage ("process explorer", is that the Windows Task Manager?), but my test confirms what he's saying.
> 
> I made a small C program that allocs 100Mb and then frees it. I monitored it with Windows XP Task Manager process list, looking at the fields "Mem Usage" (physical mem) and "VM Size". Compiling with GCC (3.2.3), it does free back the memory to the OS (VM Size decreases after the free). I tried the program with DMC but the same does not happen: the memory is not returned after the free. I don't have visual C available right now so I didn't try that one.
> 
> After a google search:
> http://www.gnu.org/software/libc/manual/html_node/Freeing-after-Malloc.html
> "Occasionally, free can actually return memory to the operating system and make the process smaller."
> Evidently the Windows version of libc also does the same. And seems it's smart enough to return the mem not just when the top of the heap is all free (what I was expecting) but also with free pages in the middle of the heap. (the test program allocs two segments and frees the first only)

What your program is doing is allocating a gigantic chunk. It is a reasonable thing for new() to have a fork in it, and for gigantic chunks allocate/free them by calling the OS directly and not attempting to manage it. Thus you would see the behavior you see.

Try allocating a large number of small chunks, and free them.
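
To make the "fork" idea concrete, here is a minimal sketch in C, under stated assumptions. It is not the code of DMC, GCC, or any other real runtime: the names (sketch_alloc, sketch_free, held_from_system) and both constants are hypothetical, and malloc/free stand in for the direct operating system calls. Requests at or above the threshold are obtained and released one by one, so the amount held from the system drops as soon as they are freed; smaller requests are carved out of an arena that is obtained once and kept, with freed blocks parked on a list for reuse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names and sizes below are hypothetical, chosen for illustration. */
#define LARGE_THRESHOLD ((size_t)256 * 1024)   /* requests this big bypass the pool */
#define ARENA_SIZE      ((size_t)1024 * 1024)  /* pool memory, obtained only once */

typedef union header {
    struct {
        size_t size;          /* usable bytes that follow the header */
        int large;            /* 1 if the block came straight from the system */
        union header *next;   /* free-list link for small blocks */
    } info;
    max_align_t align;        /* C11: keeps the payload suitably aligned */
} header;

static unsigned char *arena = NULL;   /* the pool's backing memory */
static size_t arena_used = 0;
static header *small_free = NULL;     /* freed small blocks, kept for reuse */
static size_t system_bytes = 0;       /* memory currently held from the system */

size_t held_from_system(void) { return system_bytes; }

void *sketch_alloc(size_t size) {
    header *h;
    if (size >= LARGE_THRESHOLD) {
        /* Large path: one system request per block (malloc is a stand-in). */
        h = malloc(sizeof(header) + size);
        if (h == NULL) return NULL;
        system_bytes += size;
        h->info.size = size;
        h->info.large = 1;
        return h + 1;
    }
    /* Small path: round up so that every header stays aligned. */
    size = (size + sizeof(header) - 1) / sizeof(header) * sizeof(header);
    /* First fit: reuse a previously freed small block if one is big enough. */
    for (header **link = &small_free; *link != NULL; link = &(*link)->info.next) {
        if ((*link)->info.size >= size) {
            h = *link;
            *link = h->info.next;
            return h + 1;
        }
    }
    if (arena == NULL) {
        arena = malloc(ARENA_SIZE);   /* the only system request for small blocks */
        if (arena == NULL) return NULL;
        system_bytes += ARENA_SIZE;
    }
    assert(arena_used % sizeof(header) == 0);
    if (ARENA_SIZE - arena_used < sizeof(header) + size) return NULL;
    h = (header *)(arena + arena_used);
    arena_used += sizeof(header) + size;
    h->info.size = size;
    h->info.large = 0;
    return h + 1;
}

void sketch_free(void *p) {
    if (p == NULL) return;
    header *h = (header *)p - 1;
    if (h->info.large) {
        system_bytes -= h->info.size;  /* handed back to the system right away */
        free(h);
    } else {
        h->info.next = small_free;     /* stays inside the process for reuse */
        small_free = h;
    }
}
```

In this sketch, freeing a block of LARGE_THRESHOLD bytes brings held_from_system() back down immediately, while freeing a 100-byte block leaves it unchanged at ARENA_SIZE. A monitoring tool watching such a process would see the first kind of free and not the second.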
September 05, 2006
Walter Bright wrote:
> Bruno Medeiros wrote:
>> I think Serg is right. I don't know what program he used to monitor  the mem usage ("process explorer", is that the Windows Task Manager?), but my test confirms what he's saying.
>>
>> I made a small C program that allocs 100Mb and then frees it. I monitored it with Windows XP Task Manager process list, looking at the fields "Mem Usage" (physical mem) and "VM Size". Compiling with GCC (3.2.3), it does free back the memory to the OS (VM Size decreases after the free). I tried the program with DMC but the same does not happen: the memory is not returned after the free. I don't have visual C available right now so I didn't try that one.
>>
>> After a google search:
>> http://www.gnu.org/software/libc/manual/html_node/Freeing-after-Malloc.html 
>>
>> "Occasionally, free can actually return memory to the operating system and make the process smaller."
>> Evidently the Windows version of libc also does the same. And seems it's smart enough to return the mem not just when the top of the heap is all free (what I was expecting) but also with free pages in the middle of the heap. (the test program allocs two segments and frees the first only)
> 
> What your program is doing is allocating a gigantic chunk. It is a reasonable thing for new() to have a fork in it, and for gigantic chunks allocate/free them by calling the OS directly and not attempting to manage it. Thus you would see the behavior you see.
> 
> Try allocating a large number of small chunks, and free them.

Hum, I modified the program and tried 4 more tests:

Allocating 10000 (sequential) chunks of size 10000 bytes. Free them all.
-> All the 100Mb of memory is returned to the OS.

Allocating 100000 (sequential) chunks of size 1000 bytes. Free them all.
-> All the 100Mb of memory is returned to the OS.

Allocating 10000 (sequential) chunks of size 10000 bytes. Free only half of them, the even numbered ones, so that the total freed memory is not contiguous.
-> Of the 50Mb memory free'd, about 30Mb of memory is returned to the OS. Expected due to page segmentation/rounding.

Allocating 100000 (sequential) chunks of size 1000 bytes. Again free only the even numbered ones.
-> 50Mb memory is free'd, but no memory is returned to the OS. Expected due to page segmentation/rounding.

So it seems glibc does its best, it returns any page if it is all free. (hum, and I'm curious to what the results in VC++ are, if anyone tries it, do post the results)
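
The page rounding in the last two results can be checked with a small calculation. The sketch below is a simplified model, not a description of what the allocator really does: it assumes 4096-byte pages, chunks packed back to back from a page-aligned address, and no allocator headers between them. The function name whole_free_pages and its parameters are made up for this example. It counts the pages that lie entirely inside freed chunks, since only such pages could be handed back.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the pages that lie entirely inside freed chunks.
 * Model assumptions (hypothetical): chunks are packed back to back starting
 * at a page-aligned address, there are no allocator headers, and chunk i is
 * freed exactly when i % stride == 0.
 */
size_t whole_free_pages(size_t chunk_size, size_t num_chunks,
                        size_t stride, size_t page_size) {
    assert(chunk_size > 0 && stride > 0 && page_size > 0);
    size_t total = chunk_size * num_chunks;
    size_t pages = 0;
    for (size_t start = 0; start + page_size <= total; start += page_size) {
        /* Chunks touched by this page: the one holding its first byte
           through the one holding its last byte. */
        size_t first = start / chunk_size;
        size_t last = (start + page_size - 1) / chunk_size;
        int all_free = 1;
        for (size_t c = first; c <= last; c++) {
            if (c % stride != 0) { all_free = 0; break; }
        }
        pages += all_free;
    }
    return pages;
}
```

Under these assumptions, whole_free_pages(1000, 100000, 2, 4096) is 0, because every page overlaps at least one chunk that is still in use. For whole_free_pages(10000, 10000, 2, 4096), each freed chunk contains one or two whole pages, which works out to a little under 30 MB of the 50 MB freed, in line with the measurement.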

------ test program ------
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

#define CHUNKSIZE 10000
#define NUMCHUNKS 10000
#define CHUNKINC 1 //use CHUNKINC 2 for even numbered freeing

// static, so that large NUMCHUNKS values do not overflow the stack
static char* ptrs[NUMCHUNKS];
static char* ptrs2[NUMCHUNKS];

int main(void) {
	char buf[666];
	int i;
	
	printf("NumChunks: %d Size: %d Inc: %d\n", NUMCHUNKS, CHUNKSIZE, CHUNKINC);
	fflush(stdout); fgets(buf, sizeof buf, stdin);
	
	// First set of chunks: these are the ones freed later
	for(i = 0; i < NUMCHUNKS; ++i) {
		ptrs[i] = (char*) malloc(CHUNKSIZE);
		assert(ptrs[i] != NULL);
	}

	// Second set, never freed, so the first set is not at the top of the heap
	for(i = 0; i < NUMCHUNKS; ++i) {
		ptrs2[i] = (char*) malloc(CHUNKSIZE);
		assert(ptrs2[i] != NULL);
		assert(ptrs2[i] > ptrs[NUMCHUNKS-1]);
	}
	printf("Mem Allocated\n"); fflush(stdout); fgets(buf, sizeof buf, stdin);
	
/*	for(i = 0; i < NUMCHUNKS; i++)
		ptrs[i][0] = 'X';
	printf("Memory Written\n"); fflush(stdout);	fgets(buf, sizeof buf, stdin);
*/

	for(i = 0; i < NUMCHUNKS; i += CHUNKINC) {
		free(ptrs[i]);
	}
	printf("Mem free'd\n"); fflush(stdout); fgets(buf, sizeof buf, stdin);
	return 0;
}

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
September 05, 2006
* Bruno Medeiros:
> Walter Bright wrote:
>> What your program is doing is allocating a gigantic chunk. It is a reasonable thing for new() to have a fork in it, and for gigantic chunks allocate/free them by calling the OS directly and not attempting to manage it. Thus you would see the behavior you see.
>>
>> Try allocating a large number of small chunks, and free them.
> 
> Hum, I modified the program and tried 4 more tests:
> 
> Allocating 10000 (sequential) chunks of size 10000 bytes. Free them all.
> -> All the 100Mb of memory is returned to the OS.
> 
> Allocating 100000 (sequential) chunks of size 1000 bytes. Free them all.
> -> All the 100Mb of memory is returned to the OS.
> 
> Allocating 10000 (sequential) chunks of size 10000 bytes. Free only half of them, the even numbered ones, so that the total freed memory is not contiguous.
> -> Of the 50Mb memory free'd, about 30Mb of memory is returned to the OS. Expected due to page segmentation/rounding.
> 
> Allocating 100000 (sequential) chunks of size 1000 bytes. Again free only the even numbered ones.
> -> 50Mb memory is free'd, but no memory is returned to the OS. Expected due to page segmentation/rounding.
> 
> So it seems glibc does its best, it returns any page if it is all free. (hum, and I'm curious to what the results in VC++ are, if anyone tries it, do post the results)

I can't be sure, but it's possible that the Microsoft runtime library's 'delete' implementation calls _heapmin() [http://msdn2.microsoft.com/en-us/library/fc7etheh.aspx] for that purpose. I read about it some time ago in a post on comp.os.ms-windows.programmer. The actual discussion was about delete not returning memory to the OS =) I think it really depends on some other factors. The point was that 'delete' (or 'free') *could* return memory to the OS, but this is not guaranteed.

Phobos' GC has a stub function, std.gc.minimize(), which is probably intended for the same purpose. I hope Walter can provide an update on its real purpose and implementation plans.

-- 
serg.
September 05, 2006
10_000 chunks of 10_000 (freeing everything - CHUNKINC = 1):

On startup...
	Mem usage: 612k
	VM Size: 240k

After alloc:
	Mem usage: 119,504k
	VM Size: 195k

After free:
	Mem usage: 60,272k
	VM Size 98,296k


DMD version... same constants.

Startup...
	Mem usage: 1,304k
	VM Size: 696k

After alloc (which takes much longer):
	Mem usage: 248,208k
	VM Size: 248,208k

After free: same as above.

I used the below source.

-[Unknown]


import std.c.stdio;

const CHUNKSIZE = 10_000;
const NUMCHUNKS = 10_000;
const CHUNKINC = 1; // Use CHUNKINC 2 for even numbered freeing.

int main()
{
    char buf[666];
    int i;
    ubyte*[NUMCHUNKS] ptrs;

    printf("NumChunks: %d Size: %d Inc: %d\n", NUMCHUNKS, CHUNKSIZE, CHUNKINC);
    fflush(stdout); gets(buf);

    for (i = 0; i < NUMCHUNKS; ++i)
    {
        ptrs[i] = new ubyte[CHUNKSIZE];
        assert (ptrs[i] != null);
    }

    ubyte*[NUMCHUNKS] ptrs2;

    for(i = 0; i < NUMCHUNKS; ++i) {
        ptrs2[i] = new ubyte[CHUNKSIZE];
        assert (ptrs2[i] != null);
        assert (ptrs2[i] > ptrs[NUMCHUNKS - 1]);
    }
    printf("Mem Allocated\n"); fflush(stdout); gets(buf);

/*
    for (i = 0; i < NUMCHUNKS; i++)
        ptrs[i][0] = 42;
    printf("Memory Written\n"); fflush(stdout); gets(buf);
*/

    for (i = 0; i < NUMCHUNKS; i += CHUNKINC)
    {
        delete ptrs[i];
    }
    printf("Mem free'd\n"); fflush(stdout); gets(buf);

    return 0;
}

September 05, 2006
Unknown W. Brackets wrote:
> 10_000 chunks of 10_000 (freeing everything - CHUNKINC = 1):
> 
> On startup...
>     Mem usage: 612k
>     VM Size: 240k
> 
> After alloc:
>     Mem usage: 119,504k
>     VM Size: 195k
>
Typo there, should be about VM Size: 195,000k.

> After free:
>     Mem usage: 60,272k
>     VM Size 98,296k
> 

You forgot to mention which compiler (and compiler version) was used. *g*


> 
> DMD version... same constants.

DMD? Why not just use DMC?

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D