January 09, 2007
Frits van Bommel wrote:
> Ralf Schneider wrote:
>> It doesn't seem so hard to me to let the compiler set such an attribute on arrays without pointers...
> 
> It's pretty hard if you can't modify the compiler ;).
> Sean can't modify what DMD does (at least, not directly). He _can_ (directly) modify what the runtime library does by replacing it, which is what he's done.
> He (or anyone else, for that matter) might be able to implement this in GDC though...
> And Walter might be convinced to implement it in DMD (or, if it's front-end only code, accept a patch that implements it).
> 
> Of course, that still leaves arrays of structs, which may contain both pointers and non-pointers.
> What we really need is a way for the GC to know what the type of the memory is, or at least where the pointers are. This may be possible by adding this info to TypeInfo & subclasses[1]. But then every memory block would need a pointer to the relevant TypeInfo (or some condensed form of this information, like flags for "only pointers" and "no pointers", with a TypeInfo pointer only if both are false).
> This would definitely require some sort of compiler support though; it would need to generate appropriate type information for structs, objects (the actual memory, not the references) and unions.
> It would then need to supply this information to the GC in the runtime, requiring extra code generation.
> 
> 
> [1]: This could be as simple as supplying a member function that performs a callback for every offset containing a pointer.

According to a conversation I had with Walter a month or two ago, this is the direction we are going. With this sort of type information, other, more modern GC implementations will become possible, one of which I hope to write. :D
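
To make footnote [1] concrete, here is a minimal hand-written sketch of the kind of pointer map such a callback could report. Everything here is hypothetical -- today's TypeInfo has no such member, and in practice the compiler would generate this, not the programmer:

    struct Node
    {
        int    value; // not a pointer: the GC could skip this word
        Node*  next;  // must be scanned
        char[] name;  // a slice is {length, ptr}; only ptr needs scanning
    }

    // What a generated per-type callback for Node might report:
    // one call per field offset that can hold a pointer.
    void forEachPointerOffset(void delegate(size_t offset) dg)
    {
        dg(Node.next.offsetof);
        dg(Node.name.offsetof + size_t.sizeof); // the slice's data pointer
    }

A GC armed with this information could follow exactly those offsets instead of treating every word of a Node as a potential pointer.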
January 09, 2007
Oskar Linde wrote:
> Oskar Linde wrote:
>> After having fought a while with D programs with runaway memory leaks, I've unfortunately had to come to the conclusion that the D GC is not ready for production use. The problem is what I'd call "spurious pointers". That is, random data (strings, numbers, image data, audio or whatever) appearing to the GC to be full of pointers all over the memory space.
> 
> I feel kind of bad for making it sound like this is a problem related specifically to the D garbage collector. It is rather a general and apparently well-known problem with all conservative garbage collectors. The D garbage collector is still excellent for a large domain of problems.
> 
> Lots of people seem to be having similar problems though, so a better understanding of under what conditions a conservative garbage collector will and will not work seems to be called for.
> 
> I'd guess that a lot of the problems we have with the conservative GC are because we are at the end of the lifetime of the 32-bit address space. As time has passed, our programs have used a larger and larger portion of the available memory space. As one anonymous %u said, one solution would be "increasing the amount of memory needed for pointers by the factor two", which I interpret as moving to a 64-bit memory space.
> 
> I did some calculations to better understand what is going on. I've probably made several errors, though.
> 
> <theoretical rambling>
> The probability of one single uniformly random spurious pointer preventing the collection of an object of size s is
> 
> p = s/B,
> 
> with B being the size of the address space (2^b for b = 32 or 64 bits). On a 64 bit architecture, the probability is obviously drastically reduced.
> 
> If g is the amount (in bytes) of independent random data an application needs, the risk of an object of size s not being collected is
> 
> P(g) = 1 - (1 - s/B) ^ (g/(b/8))
> 
> As one can see, the risk increases considerably as the objects get bigger. Increasing the amount of random data the application contains reduces the size of objects the GC will be able to handle satisfactorily.
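> 
> To put a number on that (my arithmetic from the formula above, so treat it as an illustration): for s = 4096 bytes on a 32-bit system, s/B = 4096/2^32, roughly 1e-6. A single MB of random data contains 2^20/4 = 262144 word-sized values, so P = 1 - (1 - 1e-6)^262144, or about 22%. Even one megabyte of random words has roughly one chance in five of pinning any given 4 KB object.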
> 
> My example was kind of nasty in that it not only accumulated additional random garbage in each iteration, but also caused heap fragmentation. A kinder example would always create objects of the same size. The simpler example would be:
> 
> * start with a static set of g bytes of random data
> * for each iteration create n objects of size s (for simplicity, let each object contain s bytes of random data)
> * after each iteration, none of the n objects are used anymore. This is a good time to call gc.fullCollect();
> 
> After each such iteration, i, P_i*n_i objects will remain uncollected and will be added to the pool of random spurious pointers. I disregard the small portion of objects that are uncollected because of spurious pointers appearing in other objects left uncollected in the same iteration. If this portion were significant, you'd really have a problem anyway, and it would show up in later iterations.
> 
> g_{i+1} ~ g_i + P_i*n_i*s.
> 
> I generously assume that the safe memory areas of the collected objects are reused by the allocator in the next iteration. A few of those will be made unsafe by the recently added spurious pointers from uncollected objects, but for the most part those memory areas are now safe, and the objects allocated/collected from them can be disregarded. The number of objects that need to be allocated in unsafe memory areas for the next iteration becomes:
> 
> n_{i+1} = P_i*n_i + P(P_i*n_i*s)*(n_0 - n_i)
> 
> This will of course not perfectly predict the real behavior, as it relaxes the problem away from discrete units, but it should still adequately model the sample application. Using this, I tried to find the breaking point, i.e. the size at which objects need to be explicitly freed instead of left to the GC.
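> 
> In code, one step of this iteration looks something like the following (a sketch of the model itself in present-day D syntax, not of the test program):
> 
>     import std.math;
> 
>     // P(g): probability that an object of size s is pinned by g bytes of
>     // uniformly random data in a B-byte address space with b-bit words.
>     double P(double g, double s, double B, int b)
>     {
>         return 1.0 - pow(1.0 - s / B, g / (b / 8));
>     }
> 
>     // One step of the model: updates g and n in place.
>     void step(ref double g, ref double n, double n0, double s)
>     {
>         const double B = 4294967296.0;      // 2^32: 32-bit address space
>         double leaked = P(g, s, B, 32) * n; // objects left uncollected
>         n = leaked + P(leaked * s, s, B, 32) * (n0 - n);
>         g += leaked * s;  // leaked objects join the pool of spurious data
>     }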
> 
> "Verification":
> Rewriting my original example to do a std.gc.fullCollect() every 10th iteration, with the following parameters:
> g = 20 MB
> n = 10
> 
> the model suggests allocating objects of size:
> s = 2000 would result in almost no space overhead, stable at 20 MB
> s = 3000 would result in a slight but stable space overhead
> s = 4000 would cause runaway memory usage
> 
> running the program results in:
> s = 2000: memory usage is stable at 20 MB
> s = 3000: results in a small, apparently unbounded memory leak
> s = 4000: results in an unbounded memory leak
> 
> The model appears to be at least in the correct ballpark for this sample.
> 
> Theoretical results:
> Those tables show the maximum object size the GC will be able to handle with different static amounts of "random" data. By "being able to handle", I mean that the application doesn't use more than 2x the required amount of memory. The exact breaking point is very unstable, and going above it rapidly results in uncontrolled memory consumption.
> 
> 32 bit arch, 100 objects
> 
> 1 MB data    8000 bytes
> 5 MB data    4500 bytes
> 10 MB data    3000 bytes
> 20 MB data    2000 bytes
> 100 MB data     700 bytes
> 500 MB data     200 bytes
> 1000 MB data     100 bytes
> 
> 32 bit arch, 1000 objects
> 
> 1 MB data    3400 bytes
> 5 MB data    2200 bytes
> 10 MB data    1700 bytes
> 20 MB data    1200 bytes
> 100 MB data     500 bytes
> 500 MB data     150 bytes
> 1000 MB data     100 bytes
> 
> 32 bit arch, 10000 objects
> 
> 1 MB data    1300 bytes
> 5 MB data    1000 bytes
> 10 MB data     800 bytes
> 20 MB data     600 bytes
> 100 MB data     300 bytes
> 500 MB data     100 bytes
> 1000 MB data      75 bytes
> 
> Those figures need to be taken with a couple of grains of salt, but they should give an indication of the object size at which one needs to manually handle object lifetimes.
> 
> As a comparison -- the 64 bit haven:
> 
> 64 bit arch, 100 objects
> 2 GB data    1.5 GB
> 100 GB data    600 MB
> 1 TB data    200 MB
> 
> 64 bit arch, 1000 objects
> 2 GB data    350 MB
> 100 GB data    250 MB
> 1 TB data    150 MB
> 
> 64 bit arch, 10000 objects
> 2 GB data    100 MB
> 100 GB data     75 MB
> 1 TB data     50 MB
> </theoretical rambling>
> 
> Summary:
> 
> As far as I can see, what I have to do to avoid memory leaks with a conservative GC is one of the following:
> 
> 1. move to a 64 bit architecture
> 2. manually handle all objects larger than a few hundred bytes (see above)
> 3. hide all non-pointer data from the GC
> 
> It is a shame #2 is such a pain and that D doesn't offer any help such as automatic ref-counting. Not having automatic ref-counting also prevents neat solutions such as transparent CoW and automatic handling of scarce resources.
> 
> If I didn't have a strong belief that automatic ref-counting will be addressed soon, I'd definitely consider going back to C++. Luckily, before I'd give up waiting, 64 bit architectures will probably be in the great majority. ;)
> 
> /Oskar

Don't hold your breath for a reference counting implementation from DMD. Walter doesn't want to add any overhead to assignments.
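
To see where that overhead comes from: under reference counting, every reference assignment has to be lowered to something like the following (an illustrative lowering with a hypothetical Obj type and refs field, not anything DMD actually emits):

    class Obj { int refs = 1; }

    // What the compiler would have to emit for every "r = s;"
    // if references were counted:
    void assignRef(ref Obj r, Obj s)
    {
        if (s !is null) s.refs++;          // retain the new referent
        if (r !is null && --r.refs == 0)
            destroy(r);                    // last reference is gone
        r = s;
    }

That is two branches and a read-modify-write (plus a possible destructor call) on what used to be a single pointer copy.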
January 09, 2007
Kyle Furlong wrote:

> Oskar Linde wrote:
>> Oskar Linde wrote:

<snip>

> Don't hold your breath for a reference counting implementation from DMD. Walter doesn't want to add any overhead to assignments.

Wouldn't it be possible to have classes declared "refcounted" in the same way as scope? Then the extra cost would only apply to members of those classes, possibly by disallowing upcasts to non-refcounted superclasses.

example:

refcounted class foo {
}

...

void bar() {
    foo f = new foo(); // refcount is one
    auto k = f;        // refcount is two
    k = null;          // back to one
    // f goes out of scope here: the refcount reaches zero and the object is deleted
}

January 09, 2007
Frits van Bommel wrote:

<snip>

> Of course, that still leaves arrays of structs, which may contain both pointers and non-pointers.
> What we really need is a way for the GC to know what the type of the memory is, or at least where the pointers are. This may be possible by adding this info to TypeInfo & subclasses[1]. But then every memory block would need a pointer to the relevant TypeInfo (or some condensed form of this information, like flags for "only pointers" and "no pointers", with a TypeInfo pointer only if both are false).
> This would definitely require some sort of compiler support though; it would need to generate appropriate type information for structs, objects (the actual memory, not the references) and unions.
> It would then need to supply this information to the GC in the runtime, requiring extra code generation.

An interim alternative that came up in conversation would be to provide similar functionality via template functions.  With the .tupleof property, it's possible to obtain a very accurate picture of what data should be scanned.  This would mean using a custom function instead of 'new', but it would certainly work:

    MyClass m = create!(MyClass)( a, b, c );

The create function above could be written using TypeTuples and other magic to allow direct parameter passing to the class ctor, making the routine just as efficient as an in-language solution.  Similar functions could be written for arrays, exploiting some tricks in the language:

    T[] resize(T)( T[] val, size_t len ) { ... }
    int[] buf;
    buf.resize( 1024 );

Again, perhaps not as clean as an in-language solution, but it could be done today.
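
As a concrete illustration, a compilable sketch of such a create function, written against present-day druntime and Phobos (GC.BlkAttr.NO_SCAN, std.conv.emplace -- APIs that postdate this discussion) rather than the runtime described here:

    import core.memory : GC;
    import std.conv : emplace;
    import std.meta : anySatisfy;
    import std.traits : FieldTypeTuple, hasIndirections;

    // Allocate a class instance, marking the memory NO_SCAN when none of
    // T's own fields can hold a pointer, then forward args to T's ctor.
    // (A real version would also have to walk base-class fields.)
    T create(T, Args...)(Args args) if (is(T == class))
    {
        enum needsScan = anySatisfy!(hasIndirections, FieldTypeTuple!T);
        enum size = __traits(classInstanceSize, T);
        uint attrs = needsScan ? 0 : GC.BlkAttr.NO_SCAN;
        void* mem = GC.malloc(size, attrs);
        return emplace!T(mem[0 .. size], args);
    }

A class containing only ints or floats then never has its body scanned, which is exactly the property the conservative collector is missing here.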

I'll look into doing something like this for Tango prior to release.  It shouldn't be too difficult, assuming the .tupleof property works as described (some experimentation this morning suggests that it may not).


Sean
January 09, 2007
Sean Kelly wrote:
> Similar functions could be written for arrays, exploiting some tricks in the language:
> 
>     T[] resize(T)( T[] val, size_t len ) { ... }

Err, this should be:

    T[] resize(T)( inout T[] val, size_t len ) { ... }
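
Filling in the body under the same assumptions as above (present-day GC.BlkAttr.NO_SCAN, with `ref` as the current spelling of `inout` parameters), a sketch:

    import core.memory : GC;
    import std.traits : hasIndirections;

    // Reallocate val to len elements, putting the data in a NO_SCAN
    // block when T cannot contain pointers, and copy the old contents.
    T[] resize(T)(ref T[] val, size_t len)
    {
        uint attrs = hasIndirections!T ? 0 : GC.BlkAttr.NO_SCAN;
        auto p = cast(T*) GC.malloc(len * T.sizeof, attrs);
        size_t keep = val.length < len ? val.length : len;
        p[0 .. keep] = val[0 .. keep];
        val = p[0 .. len];
        return val;
    }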

Sean
January 09, 2007
Sean Kelly wrote:
> An interim alternative that came up in conversation would be to provide similar functionality via template functions.  With the .tupleof property, it's possible to obtain a very accurate picture of what data should be scanned.  This would mean using a custom function instead of 'new', but it would certainly work:

I actually tried something like that a while back. Unfortunately DMD kept segfaulting on me...
Just tried to compile that code again, still segfaults. I should probably try cutting it down to something bugzilla-worthy.
January 09, 2007
Sean Kelly wrote:
> I'll look into doing something like this for Tango prior to release.  It shouldn't be too difficult, assuming the .tupleof property works as described (some experimentation this morning suggests that it may not).
> 
> 
> Sean

Some years ago we did a comparison between two PostScript (not GS) interpreters, not rendering to disk but generating loads of data. The Harlequin RIP beat the other by large factors, which we found out was due, among other things, to its garbage collector. The same group also built the GC for LispWorks, and it is now available as open source. I don't know how it's licensed, but I know it was also used in Harlequin's Dylan (for those of you who remember the language).

Maybe you should look at some of their ideas; the link is below.

http://www.ravenbrook.com/project/mps/

Zz
January 09, 2007
I would think you are fine if you pick your pivot at random() and seed the random number generator at the top of your program with microseconds-since-1970 or whatever the convenient chaotic local number is.

Or you could probably just use D's [].sort method -- I think Walter said it uses quick-sort but checks for poor performance and switches to a heap sort automatically.
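
For reference, a randomized pivot is a one-line change to a plain quicksort. A sketch in present-day D (std.random.uniform is already seeded unpredictably per thread; in 2007 one would seed the RNG by hand as described above):

    import std.algorithm.mutation : swapAt;
    import std.random : uniform;

    void quicksort(T)(T[] a)
    {
        if (a.length < 2) return;
        // The randomized step: move a uniformly chosen element into the
        // pivot slot, so no fixed input is reliably the worst case.
        a.swapAt(a.length - 1, uniform(size_t(0), a.length));
        size_t store = 0; // Lomuto partition around a[$ - 1]
        foreach (i; 0 .. a.length - 1)
            if (a[i] < a[$ - 1])
                a.swapAt(i, store++);
        a.swapAt(store, a.length - 1);
        quicksort(a[0 .. store]);
        quicksort(a[store + 1 .. $]);
    }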

Kevin
January 09, 2007
Kyle Furlong wrote on 2007-01-09:
> Oskar Linde wrote:
>> Oskar Linde wrote:

<snip>

>> If I didn't have a strong belief that automatic ref-counting will be addressed soon, I'd definitely consider going back to C++. Luckily, before I'd give up waiting, 64 bit architectures will probably be in the great majority. ;)

> Don't hold your breath for a reference counting implementation from DMD. Walter doesn't want to add any overhead to assignments.

Reference counting implementations would cause a lot of pain and
overhead with array slices. A sweeping GC that has different
pools for non-pointer and may-contain-pointer memory areas is easier to
implement and causes less overhead. In addition, reference counting can
cause interesting problems on multi-core systems.
Yes, given advance information, all sweeping GCs can be exploited,
though user data shouldn't usually be a problem, as most of it ends up
in the no-pointer pool. The x86 stack, however, is a potential attack vector.

Thomas


January 09, 2007
Kevin Bealer wrote:
> I would think you are fine if you pick your pivot at random() and seed the random
> number generator at the top of your program with microseconds-since-1970 or
> whatever the convenient chaotic local number is.

IIRC randomizing the input is a rather general solution to this kind of problem (not just with Quicksort).

> Or you could probably just use D's [].sort method -- I think Walter said it uses
> quick-sort but checks for poor performance and switches to a heap sort automatically.

That would be a variant of introsort. I believe I mentioned it.