Thread overview
Memory allocation in D
Nov 29, 2007
Marius Muja
Nov 29, 2007
BCS
Nov 29, 2007
Sean Kelly
Nov 29, 2007
Sean Kelly
Nov 29, 2007
Craig Black
Nov 30, 2007
Sean Kelly
Nov 30, 2007
Robert Fraser
Nov 30, 2007
Marius Muja
Dec 01, 2007
Sean Kelly
Dec 01, 2007
Marius Muja
Nov 30, 2007
Jason House
Dec 01, 2007
Sean Kelly
Dec 01, 2007
Sean Kelly
Dec 19, 2007
James Dennett
Dec 19, 2007
Sean Kelly
Dec 01, 2007
Oskar Linde
Dec 01, 2007
Sean Kelly
Nov 29, 2007
Jérôme M. Berger
Nov 29, 2007
Jérôme M. Berger
November 29, 2007
Hi,

I'm working on a program that is computationally and memory intensive. I have noticed that if I allocate the memory using D's new operator I can allocate only about half as much memory compared to C's malloc (and also the memory allocation is much slower). Does anybody know why D's memory allocator uses twice as much memory compared to malloc?

I used the following test program to test this:

version (Tango){
import tango.core.Memory;
import tango.stdc.stdlib;
import tango.io.Stdout;
}
else
{
import std.stdio;
import std.c.stdlib;
import std.gc;	// needed for std.gc.disable() in main()
}

const int ALLOC_SIZE = 1024*1024;

void test_d_alloc()
{
	// Allocate through the GC; intentionally never freed, so the
	// loop below measures how much memory each allocator can hand out.
	float[] a = new float[ALLOC_SIZE];
}

void test_malloc()
{
	// The same amount through C's malloc, also intentionally leaked.
	float* ptr = cast(float*)malloc(ALLOC_SIZE*float.sizeof);
	if (ptr is null) throw new Exception("Out of memory");
}

void main()
{
	version (Tango) {
		GC.disable();
	} else {
		std.gc.disable();
	}
	int i=0;
	while(true) {
		version (Tango) {
			Stdout.formatln("{}",++i);
		} else {
			writefln(++i);
		}
// 		test_d_alloc();
		test_malloc();
	}
}


When the above program uses the test_d_alloc() function it executes the loop about half as many times compared to when it's using the test_malloc() function (until it terminates with an out of memory error). Also when the allocation is done in smaller increments (ALLOC_SIZE is smaller) the test_d_alloc() version segfaults after allocating the maximum amount of memory (3G) instead of terminating with a nice OutOfMemory error.

Marius
November 29, 2007
Reply to Marius,

> Hi,
> 
> I'm working on a program that is computationally and memory intensive.
> I have noticed that if I allocate the memory using D's new operator I
> can allocate only about half as much memory compared to C's malloc
> (and also the memory allocation is much slower). Does anybody know why
> D's memory allocator uses twice as much memory compared to malloc?
> 
[...]
>
> When the above program uses the test_d_alloc() function it executes
> the loop about half as many times compared to when it's using the
> test_malloc() function (until it terminates with an out of memory
> error). Also when the allocation is done in smaller increments
> (ALLOC_SIZE is smaller) the test_d_alloc() version segfaults after
> allocating the maximum amount of memory (3G) instead of terminating
> with a nice OutOfMemory error.
> 

If you are using Windows and the D new version allocates 3GB of RAM, then it's using everything it can get. No matter what, you only have about 75% of the address space, so I would suspect that the malloc version isn't actually allocating as much as it should. Or am I reading something wrong.


November 29, 2007
Marius Muja wrote:
> Hi,
> 
> I'm working on a program that is computationally and memory intensive. I have noticed that if I allocate the memory using D's new operator I can allocate only about half as much memory compared to C's malloc (and also the memory allocation is much slower). Does anybody know why D's memory allocator uses twice as much memory compared to malloc?
> 
> I used the following test program to test this:
> 
> version (Tango){
> import tango.core.Memory;
> import tango.stdc.stdlib;
> import tango.io.Stdout;
> }
> else
> {
> import std.stdio;
> import std.c.stdlib;
> }
> 
> const int ALLOC_SIZE = 1024*1024;
> 
> void test_d_alloc()
> {
>     float[] a = new float[ALLOC_SIZE];
> }

With D as it is now, this will call GC.malloc(2049), which will allocate one page of memory per call.  The D runtime adds 1 to every allocation made with "new"; according to Walter, this is to prevent certain confusing errors.  It allocates a page per call because the D GC allocates from pools devoted to fixed-size blocks, which grow in powers of two up to 4096 bytes (one page on most systems).  Allocations beyond 4096 bytes are "big" allocations, and a multiple of 4096 bytes will be allocated to fit the requested size.  For these allocations, when the memory is released I believe the pages go back into a free pool.  I'm not sure offhand whether empty pages used for smaller blocks do or not.

You might want to try calling GC.malloc() instead and compare results.


Sean
November 29, 2007
This thread inspired me to make a number of changes to the GC and such in Tango.  They should improve efficiency in terms of eliminating wasted space and avoiding collections in some situations.  They'll probably take a few days, so stay tuned.


Sean
November 29, 2007
Marius Muja wrote:
> Hi,
> 
> I'm working on a program that is computationally and memory intensive. I have noticed that if I allocate the memory using D's new operator I can allocate only about half as much memory compared to C's malloc (and also the memory allocation is much slower). Does anybody know why D's memory allocator uses twice as much memory compared to malloc?
> 
> I used the following test program to test this:
> 
> ... snip ...
> 
> When the above program uses the test_d_alloc() function it executes the
> loop about half as many times compared to when it's using the
> test_malloc() function (until it terminates with an out of memory
> error). Also when the allocation is done in smaller increments
> (ALLOC_SIZE is smaller) the test_d_alloc() version segfaults after
> allocating the maximum amount of memory (3G) instead of terminating with
> a nice OutOfMemory error.
> 
	Since you are using Linux, I would like to point out that a call to
malloc may succeed even if there is no memory available at the time.
The program will only fail when it actually tries to access the
memory. Since the D new fills the array with zeroes, it will fail
immediately upon reaching the memory limit. The standard malloc
however may appear to work much longer unless you actually try to
use the memory.

		Jerome
--
+------------------------- Jerome M. BERGER ---------------------+
|    mailto:jeberger@free.fr      | ICQ:    238062172            |
|    http://jeberger.free.fr/     | Jabber: jeberger@jabber.fr   |
+---------------------------------+------------------------------+
November 29, 2007
"Sean Kelly" <sean@f4.ca> wrote in message news:fin4fl$1h7v$1@digitalmars.com...
> This thread inspired me to make a number of changes to the GC and such in Tango.  They should improve efficiency in terms of eliminating wasted space and avoiding collections in some situations.  They'll probably take a few days, so stay tuned.
>
>
> Sean

Cool!  Maybe you could show some benchmarks when you get done?


November 29, 2007
Jérôme M. Berger wrote:
> Since the D new fills the array with zeroes,
                                       ^^^^^^

	Sorry, this should be "nans" of course, but the main point remains
valid...

		Jerome
--
+------------------------- Jerome M. BERGER ---------------------+
|    mailto:jeberger@free.fr      | ICQ:    238062172            |
|    http://jeberger.free.fr/     | Jabber: jeberger@jabber.fr   |
+---------------------------------+------------------------------+
November 30, 2007
Craig Black wrote:
> "Sean Kelly" <sean@f4.ca> wrote in message news:fin4fl$1h7v$1@digitalmars.com...
>> This thread inspired me to make a number of changes to the GC and such in Tango.  They should improve efficiency in terms of eliminating wasted space and avoiding collections in some situations.  They'll probably take a few days, so stay tuned.
> 
> Cool!  Maybe you could show some benchmarks when you get done? 

Unfortunately, I think I'm not going to commit these changes.  I had some clever ideas for how to replace the "allocate an extra byte" functionality with an approach that would have resulted in less wasted space, but it's become more complicated than the added efficiency is worth.

Basically, I was hoping to keep an extra free page at the end of the allocated memory region used by the process.  This assumes that all memory used by the process is contiguous however, or at least that the intervening pages are valid.  But neither assumption is reasonable, and while some sort of fancy manual fix may have worked with some memory management APIs, it wouldn't have worked with others.

For what it's worth, my concern with the current behavior is that its worst-case scenario coincides with typical usage.  It is extremely common for a programmer to use array sizes that are powers of two, and this is the situation which results in the maximum unused space per block (because the runtime adds 1 to the size of all array allocations passed to the GC).  Totally stinks, and I'm running out of ideas.  I don't suppose someone can offer a suggestion for how to address this, other than "just always allocate exactly what they request and if they get a page fault then too bad for them?"


Sean
November 30, 2007
Sean Kelly wrote:
> For what it's worth, my concern with the current behavior is that its worst-case scenario coincides with typical usage.  It is extremely common for a programmer to use array sizes that are powers of two, and this is the situation which results in the maximum unused space per block (because the runtime adds 1 to the size of all array allocations passed to the GC).  Totally stinks, and I'm running out of ideas.  I don't suppose someone can offer a suggestion for how to address this, other than "just always allocate exactly what they request and if they get a page fault then too bad for them?"
> 
> 
> Sean

I'm sure you've probably read it, but I'll post it anyway:

http://www.cs.purdue.edu/homes/grr/ISMM98-grr-mps-cef.pdf

In particular:
"... [Objects] slightly larger than a page boundary lead to
significant internal fragmentation in the page-based large
object allocator as well. The high internal fragmentation
associated with both large small objects and small large
objects suggest that one should create an intermediate class
of medium objects."
November 30, 2007
Sean Kelly wrote:

> For what it's worth, my concern with the current behavior is that its worst-case scenario coincides with typical usage.  It is extremely common for a programmer to use array sizes that are powers of two, and this is the situation which results in the maximum unused space per block (because the runtime adds 1 to the size of all array allocations passed to the GC).  Totally stinks, and I'm running out of ideas.  I don't suppose someone can offer a suggestion for how to address this, other than "just always allocate exactly what they request and if they get a page fault then too bad for them?"

That was my problem also. I was using array sizes that were powers of two, and because of the extra 1 many memory pages were wasted. For example, in one of my tests I was allocating float arrays of size 1024 (4096 bytes) which were taking 2 pages instead of 1 because of the extra byte added, and that's where I observed that with D's new operator I can only allocate half as much memory as with malloc.
Oddly, the same problem occurs if I allocate much larger arrays of exactly 1024*1024 floats (4MB). It seems that instead of 4MB, D's runtime allocates 8MB for each of these arrays. Is D's runtime using 4MB pages for large memory allocations?


Marius