Thread overview
Code to show
Mar 05, 2008
bearophile
Mar 05, 2008
Marius Muja
Mar 06, 2008
bearophile
Mar 06, 2008
Bill Baxter
Mar 07, 2008
Marius Muja
March 05, 2008
One of the most important topics of D I don't understand yet regards what the (Phobos) GC does when I try to allocate memory in more manual ways or in mixed situations.
In this post I show some simple code, if you have some time I'd like you to take a look at it and tell me what possibile problems/bugs (mostly related to GC/memory) it may have (it's not fully tested yet, but it's short enough).
(Some days from now I'll try to post a second piece of code (regarding a Deque implementation of mine) with a more complex GC/memory situation).


I have read the "Allocating Uninitialized Arrays On The Stack" part in the D docs:
>The uninitialized data that is on the stack will get scanned by the garbage collector looking for any references to allocated memory. Since the uninitialized data consists of old D stack frames, it is highly likely that some of that garbage will look like references into the GC heap, and the GC memory will not get freed. This problem really does happen, and can be pretty frustrating to track down.<

That talks about the stack only, but is this problem possible for Garbage-Collected memory on the heap too?

In a small bioinformatics benchmark program the time taken to allocate a 40 MB matrix (about 3200^2 of int) was too much long for my tastes, so I have created the following very experimental code:


import std.c.stdlib: cmalloc = malloc, cfree = free;
import std.gc: gcmalloc = malloc, gcrealloc = realloc, hasNoPointers;

T[] NewVoidGCArray(T, bool pointers=true)(int n) {
    auto pt = cast(T*)gcmalloc(n * T.sizeof);
    static if (!pointers)
        hasNoPointers(pt);
    return pt[0 .. n];
}

void delVoidGCArray(T)(ref T a) {
    gcrealloc(a.ptr, 0);
    a = null;
}

T[][] NewVoidGCMatrix(T, bool pointers=true)(int nr, int nc) {
    auto pt = cast(T*)gcmalloc(nr * nc * T.sizeof);
    static if (!pointers)
        hasNoPointers(pt);
    auto result = new T[][](nr);
    int pos;
    foreach(ref row; result) {
        row = pt[pos .. pos+nc];
        pos += nc;
    }
    return result;
}

void delVoidGCMatrix(T)(ref T[][] a) {
    gcrealloc(a.ptr, 0);
    a = null;
}

//-----------------------------------

T[] NewVoidCArray(T)(int n) {
    auto pt = cast(T*)cmalloc(n * T.sizeof);
    return pt[0 .. n];
}

void delVoidCArray(T)(ref T a) {
    cfree(a.ptr);
    a = null;
}

T[][] NewVoidCMatrix(T)(int nr, int nc) {
    auto pt = cast(T*)cmalloc(nr * nc * T.sizeof);
    auto result = new T[][](nr);
    int pos;
    foreach(ref row; result) {
        row = pt[pos .. pos+nc];
        pos += nc;
    }
    return result;
}

void deleteVoidCMatrix(T)(ref T[][] a) {
    cfree(a.ptr);
    a = null;
}


Notes:
- If you spot some wrong or dangerous things please tell me.
- I know most programs spend most time using data structures, and not just creating them, so in most programs this can't speed up much the code.
- Even if this code is "safe" (and I am far from sure of that), it's longer than the normal D code, so I am going to use it only in very special situations, in speed-critical situations, and probably never in long-ish programs that must be realiable enough.
- If I use such NewVoid[C|CG][Matrix|Array] I am going to fill them as soon as possible anyway, because void data is a bomb anyway, and it requires care.
- If that T type is non-reference data (like int, float, etc) then hasNoPointers() is useful. If T is an object, then the bool "pointers" must be true (elsewhere I have written a recursive template that automatically tells if a type is/contains a refence type or not, but I haven't used it here) to allow the GC manage those object referces correctly, keeping them alive if I remove their reference elsewhere, etc.
- I presume putting D objects in one of those arrays allocated from the C heap (NewVoidCArray/ NewVoidCMatrix) is a too much dangerous thing to do, I don't know if it's possible, but I think I'll always avoid to do it.
- The size of those array/matrices can be defined at run time. And their memory is contigous, this means a 3200^2 matrix requires about less than about 41 MB, instead of about 52 (because the sizes from the memory pool become rounded, etc). Being contigous maybe there is a bit more cache coherence.
- Those arrays/matrices can't change size once created, and those matrices must be rectangular, but I can often live with this. They can be used with the normal D syntax, avoiding the slow down D has when you create a struct to wrap such things, with the operator overloading to use [] etc.
- All those slices, like pt[0 .. n] don't actually copy the memory, so they are fast. The matrix uses an array of arrays, this wastes some memory, but I can live with that, and it allows me a standard syntax and to avoid structs/classes (another possible solution is to use pointers, so I can use the C syntax, but I lose the handy .length attribute, and the nice array bound cheeks when not in -release mode).
- With one of this functions, I have written short code that's as fast as C code, but half the lines of code (and it produces the correct result). Probably I'll show you the interesting benchmark I was working on.

Once I'll slowly start to understand how the GC works, I'll be able to create more complex data structures for D, like the Deque I am working on.

Bye and thank you for any help you can give,
bearophile
March 05, 2008
I'm using something very similar in my speed-critical code:


/**
 * Allocates (using C's malloc) a generic type T.
 *
 * Params:
 *     count = number of instances to allocate.
 * Returns: pointer (of type T*) to memory buffer
 */
public T* allocate(T)(int count = 1)
{
	T* mem = cast(T*) tango.stdc.stdlib.malloc(T.sizeof*count);
	return mem;
}

/**
 * Allocates (using C's malloc) a one dimensional array.
 * Params:
 *     count = array size
 * Returns: new array
 */
public T[] allocate(T : T[])(int count)
{
	T* mem = cast(T*) tango.stdc.stdlib.malloc(count*T.sizeof);		
	return mem[0..count];
}

/**
 * Allocates (using C's malloc) a two dimensional array.
 *
 * Params:
 *     rows = rows in the new array
 *     cols = cols in the new array, if cols==-1 then this function
 *     		allocates a one dimensional array of empty arrays
 * Returns: the new two dimensional array
 */
public T[][] allocate(T : T[][])(int rows, int cols = -1)
{
	if (cols == -1) {
		T[]* mem = cast(T[]*) tango.stdc.stdlib.malloc(rows*(T[]).sizeof);
		return mem[0..rows];
	}
	else {
		//if (rows & 1) rows++; // for 16 byte allignment	
		void* mem = tango.stdc.stdlib.malloc(rows*(T[]).sizeof+rows*cols*T.sizeof);
		T[]* index = cast(T[]*) mem;
		T* mat = cast(T*) (mem+rows*(T[]).sizeof);
		
		for (int i=0;i<rows;++i) {
			index[i] = mat[0..cols];
			mat += cols;
		}
		
		return index[0..rows];
	}
}

/**
 * Template to check is a type is an array.
 */
private template isArray(T)
{
    static if( is( T U : U[] ) )
        const isArray = true;
    else
        const isArray = false;
}

/**
 * Frees allocated memory.
 * Params:
 *     ptr = variable to free.
 */
public void free(T)(T ptr)
{
	static if ( isArray!(T) ) {
		tango.stdc.stdlib.free(cast(void*) ptr.ptr);
	}
	else {
		tango.stdc.stdlib.free(cast(void*) ptr);
	}
}
March 06, 2008
Marius Muja:
> I'm using something very similar in my speed-critical code:

Thank you for showing. Your code is more refined (mine was an experimental version, not refined yet, little tested, dirty, etc) and it shows a couple ideas I too may adopt/copy.

But beside such secondary improvements, my main question was: if the element of those arrays  are object references, or pointers to GC structs, aren't such arrays going to produce GC problems when they aren't initialized yet? They can contain pointers that the GC may add to the roots... even if that memory block is essentially empty still.

So I presume I must always disable the GC looking for pointers on them, and make the GC look for pointers only after I have filled the whole array (and it contains reference types, but I presume the GC is smart enough to stop looking at it if a block is filled with just non-reference types).

And note that such data structure is simple, it's just an 1D/2D array. When data structures are more complex I don't know anymore how the GC will manage them in such mixed manual/automatic situations... Programming in C is much simpler, because there's no a hidden subystem (the GC) that works in mysterious ways when your code is running.

Bye,
bearophile
March 06, 2008
bearophile wrote:
> Marius Muja:
>> I'm using something very similar in my speed-critical code:
> 
> Thank you for showing. Your code is more refined (mine was an experimental version, not refined yet, little tested, dirty, etc) and it shows a couple ideas I too may adopt/copy.
> 
> But beside such secondary improvements, my main question was: if the element of those arrays  are object references, or pointers to GC structs, aren't such arrays going to produce GC problems when they aren't initialized yet? They can contain pointers that the GC may add to the roots... even if that memory block is essentially empty still. 
> 
> So I presume I must always disable the GC looking for pointers on them, and make the GC look for pointers only after I have filled the whole array (and it contains reference types, but I presume the GC is smart enough to stop looking at it if a block is filled with just non-reference types).

> And note that such data structure is simple, it's just an 1D/2D
> array. When data structures are more complex I don't know anymore how
> the GC will manage them in such mixed manual/automatic situations...
> Programming in C is much simpler, because there's no a hidden
> subystem (the GC) that works in mysterious ways when your code is
> running.

Sorry I didn't read your post in detail, but if think C-like behavior is easier, then why don't you just use std.c.stdlib.{malloc,calloc,alloca,realloc,free} ?

--bb
March 07, 2008
bearophile wrote:
> Marius Muja:
>> I'm using something very similar in my speed-critical code:
> 
> Thank you for showing. Your code is more refined (mine was an experimental version, not refined yet, little tested, dirty, etc) and it shows a couple ideas I too may adopt/copy.
> 
> But beside such secondary improvements, my main question was: if the element of those arrays  are object references, or pointers to GC structs, aren't such arrays going to produce GC problems when they aren't initialized yet? They can contain pointers that the GC may add to the roots... even if that memory block is essentially empty still. 
> 
> So I presume I must always disable the GC looking for pointers on them, and make the GC look for pointers only after I have filled the whole array (and it contains reference types, but I presume the GC is smart enough to stop looking at it if a block is filled with just non-reference types).

I usually disable the GC in all speed-critical code and I manage memory manually.

> 
> And note that such data structure is simple, it's just an 1D/2D array. When data structures are more complex I don't know anymore how the GC will manage them in such mixed manual/automatic situations... Programming in C is much simpler, because there's no a hidden subystem (the GC) that works in mysterious ways when your code is running.
> 
> Bye,
> bearophile