January 09, 2007
== Quote from Bill Baxter (dnewsgroup@billbaxter.com)'s article
> Here's a slightly less contrived version of Oskar's gc test.
> import std.math;
> import std.random;
> import std.stdio;
> void main() {
>      // The real memory use, ~40 mb
>      double[] data;
>      data.length = 5_000_000;
>      foreach(i, inout x; data) {
>          x = sin(cast(double)i/data.length);
>          //x = 1;
>      }
>      int count = 0;
>      int gcount = 0;
>      while(1) {
>          // simulate reading a few kb of data
>          double[] incoming;
>          incoming.length = 1000 + rand() % 5000;
>          foreach(i, inout x; incoming) {
>              x = sin(cast(double)i/incoming.length);
>              //x = 5;
>          }
>          // do something with the data...
>          // print status message every so often
>          count += incoming.length;
>          if (count > 1_000_000) {
>              count = 0;
>              gcount++;
>              writefln("%s processed", gcount);
>          }
>      }
> }
> This one uses doubles instead of uints and the data is the sin of some
> number.  These are _very_ realistic values for numeric data to have.
> The same effect can be seen.  Instead of hovering around 40MB, the
> memory use grows and grows and performance slows and slows.
> This seems to be a very big issue.  The GC seems to be pretty much
> useless right now if you're going to have a lot of floating point data
> in your app.
> --bb
> Oskar Linde wrote:
> > After having fought a while with D programs with runaway memory leaks, I've unfortunately had to come to the conclusion that the D GC is not ready for production use. The problem is what I'd call "spurious pointers". That is random data (strings, numbers, image data, audio or whatever) appearing to the GC to be full of pointers to all over the memory space.
> >
> > Consider this simple program. It is designed to have a memory footprint of about 20 mb and then continuously process data.
> >

Agreed. This needs to be changed. Is the GC in that tango library any better?
January 09, 2007
%u wrote:
> == Quote from Bill Baxter (dnewsgroup@billbaxter.com)'s article
>> Here's a slightly less contrived version of Oskar's gc test.
>> import std.math;
>> import std.random;
>> import std.stdio;
>> void main() {
>>      // The real memory use, ~40 mb
>>      double[] data;
>>      data.length = 5_000_000;
>>      foreach(i, inout x; data) {
>>          x = sin(cast(double)i/data.length);
>>          //x = 1;
>>      }
>>      int count = 0;
>>      int gcount = 0;
>>      while(1) {
>>          // simulate reading a few kb of data
>>          double[] incoming;
>>          incoming.length = 1000 + rand() % 5000;
>>          foreach(i, inout x; incoming) {
>>              x = sin(cast(double)i/incoming.length);
>>              //x = 5;
>>          }
>>          // do something with the data...
>>          // print status message every so often
>>          count += incoming.length;
>>          if (count > 1_000_000) {
>>              count = 0;
>>              gcount++;
>>              writefln("%s processed", gcount);
>>          }
>>      }
>> }
>> This one uses doubles instead of uints and the data is the sin of some
>> number.  These are _very_ realistic values for numeric data to have.
>> The same effect can be seen.  Instead of hovering around 40MB, the
>> memory use grows and grows and performance slows and slows.
>> This seems to be a very big issue.  The GC seems to be pretty much
>> useless right now if you're going to have a lot of floating point data
>> in your app.
>> --bb
>> Oskar Linde wrote:
>>> After having fought a while with D programs with runaway memory leaks,
>>> I've unfortunately had to come to the conclusion that the D GC is not
>>> ready for production use. The problem is what I'd call "spurious
>>> pointers". That is random data (strings, numbers, image data, audio or
>>> whatever) appearing to the GC to be full of pointers to all over the
>>> memory space.
>>>
>>> Consider this simple program. It is designed to have a memory footprint
>>> of about 20 mb and then continuously process data.
>>>
> 
> Agreed. This needs to be changed. Is the GC in that tango
> library any better?

It's a modified version of the DMD GC.  The "don't scan blocks containing elements smaller than pointer size" feature is built-in, and there is user-level control of that behavior on a per-block basis, among other things.  But it's still the same old mark/sweep GC at heart.


Sean
January 09, 2007
Sean Kelly wrote:
> It's a modified version of the DMD GC.  The "don't scan blocks containing elements smaller than pointer size" feature is built-in, and there is user-level control of that behavior on a per-block basis, among other things.  But it's still the same old mark/sweep GC at heart.

Does the new GC allow setting a hook to be informed of when a given object was collected? (I need that)

Luís
January 09, 2007
Luís Marques wrote:
> Sean Kelly wrote:
>> It's a modified version of the DMD GC.  The "don't scan blocks containing elements smaller than pointer size" feature is built-in, and there is user-level control of that behavior on a per-block basis, among other things.  But it's still the same old mark/sweep GC at heart.
> 
> Does the new GC allow setting a hook to be informed of when a given object was collected? (I need that)
> 
> Luís

This is possible with these functions in Object:
final void notifyRegister(void delegate(Object) dg);
final void notifyUnRegister(void delegate(Object) dg);
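
For example (a minimal sketch; it assumes the registered delegate is invoked when the object is finalized):

    import std.gc;
    import std.c.stdio;

    class Tracker
    {
        void notify(Object o)
        {
            // this runs during finalization, so a real handler should
            // avoid allocating; printf is safe enough here
            printf("object finalized\n");
        }
    }

    void main()
    {
        auto t = new Tracker;
        auto o = new Object;
        o.notifyRegister(&t.notify);  // t.notify is called when o is finalized
        o = null;
        std.gc.fullCollect();         // should now trigger the notification
    }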
January 09, 2007
Lutger wrote:
> Luís Marques wrote:
>> Sean Kelly wrote:
>>> It's a modified version of the DMD GC.  The "don't scan blocks containing elements smaller than pointer size" feature is built-in, and there is user-level control of that behavior on a per-block basis, among other things.  But it's still the same old mark/sweep GC at heart.
>>
>> Does the new GC allow setting a hook to be informed of when a given object was collected? (I need that)
> 
> This is possible with these functions in Object:
> final void notifyRegister(void delegate(Object) dg);
> final void notifyUnRegister(void delegate(Object) dg);

This option is available.  There is also a global hook that can be called when an object is collected by the GC (as opposed to destroyed deterministically via delete):

    bool myCollectHandler( Object o )
    {
        if( o.classinfo.name == "MyObject" )
        {
            (cast(MyObject) o).dispose();
            return false;
        }
        return true;
    }

    setCollectHandler( &myCollectHandler );

Returning false from the hook tells the GC not to finalize the object--i.e. destroy its monitor but don't call its dtor.  The purpose of this method is twofold: to detect when memory is "leaked" and to allow objects to be cleaned up differently when disposed of via delete than when collected by the GC.


Sean
January 09, 2007
Frits van Bommel wrote:
> Of course, if you're only concerned about malicious users sending
> worst-case data (and not about worst-case data appearing randomly)
> there's an easy fix: randomize the data in O(n) :).

Yeah, sure. You can also just pick a random element per iteration (instead of the first one or the middle one).
January 09, 2007
Bill Baxter wrote:
> Here's a slightly less contrived version of Oskar's gc test.
> 
> import std.math;
> import std.random;
> import std.stdio;
> 
> void main() {
>     // The real memory use, ~40 mb
>     double[] data;
>     data.length = 5_000_000;
>     foreach(i, inout x; data) {
>         x = sin(cast(double)i/data.length);
>         //x = 1;
>     }
>     int count = 0;
>     int gcount = 0;
>     while(1) {
>         // simulate reading a few kb of data
>         double[] incoming;
>         incoming.length = 1000 + rand() % 5000;
>         foreach(i, inout x; incoming) {
>             x = sin(cast(double)i/incoming.length);
>             //x = 5;
>         }
>         // do something with the data...
> 
>         // print status message every so often
>         count += incoming.length;
>         if (count > 1_000_000) {
>             count = 0;
>             gcount++;
>             writefln("%s processed", gcount);
>         }
>     }
> }
> 
> 
> 
> This one uses doubles instead of uints and the data is the sin of some number.  These are _very_ realistic values for numeric data to have. The same effect can be seen.  Instead of hovering around 40MB, the memory use grows and grows and performance slows and slows.
> 
> This seems to be a very big issue.  The GC seems to be pretty much useless right now if you're going to have a lot of floating point data in your app.

For what it's worth, I ran the test above with the modified GC in Tango, for 10000 iterations of the "while(1)" loop.  The default behavior roughly matched Phobos, with an 89 second run time and over 340MB of memory consumed and growing steadily.  Then I told the GC to not scan the arrays using the following calls:

    gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
    gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );

A test with these changes in place dropped the run time to 7 seconds with 43MB of memory consumed and not growing.

I grant that this isn't quite as nice as if D just figured out whether to scan the block using TypeInfo, but at least it gives the programmer a way to customize GC behavior somewhat to tune application performance.  The only stipulation with the current implementation is that block attributes will not be preserved if an array is resized.  This isn't terribly difficult to fix, but I haven't done so yet.
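
Note that the placement of the second call matters: incoming is a fresh allocation on every pass of the loop, so its attribute has to be set each time, right after the length is set.  Roughly (a sketch; imports elided, gc/GC as used above):

    void main() {
        double[] data;
        data.length = 5_000_000;
        gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );  // once is enough here
        // ... fill data ...
        while(1) {
            double[] incoming;
            incoming.length = 1000 + rand() % 5000;
            // a fresh block each pass, so it must be marked each pass
            gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );
            // ... fill and process incoming as in the original test ...
        }
    }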


Sean
January 09, 2007
> For what it's worth, I ran the test above with the modified GC in Tango, for 10000 iterations of the "while(1)" loop.  The default behavior roughly matched Phobos, with an 89 second run time and over 340MB of memory consumed and growing steadily.  Then I told the GC to not scan the arrays using the following calls:
>
>     gc.setAttr( data.ptr, GC.BlkAttr.NO_SCAN );
>     gc.setAttr( incoming.ptr, GC.BlkAttr.NO_SCAN );
>
> A test with these changes in place dropped the run time to 7 seconds with 43MB of memory consumed and not growing.
>
> I grant that this isn't quite as nice as if D just figured out whether to scan the block using TypeInfo, but at least it gives the programmer a way to customize GC behavior somewhat to tune application performance. The only stipulation with the current implementation is that block attributes will not be preserved if an array is resized.  This isn't terribly difficult to fix, but I haven't done so yet.

It doesn't seem so hard to me to have the compiler set such an attribute on arrays without pointers...
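
Until then, a library wrapper can fake it.  A sketch (the helper name is made up; it assumes the gc.setAttr/GC.BlkAttr.NO_SCAN API from Sean's post and hard-codes a check the compiler could derive from the element type):

    // allocate an array and mark it NO_SCAN when the element type is
    // known to contain no pointers
    T[] newNoScanArray(T)(size_t n)
    {
        T[] arr;
        arr.length = n;
        static if (is(T == float) || is(T == double) || is(T == real) ||
                   is(T == int) || is(T == uint))
            gc.setAttr(arr.ptr, GC.BlkAttr.NO_SCAN);
        return arr;
    }

    // usage (note Sean's caveat: the attribute is lost on resize)
    double[] data = newNoScanArray!(double)(5_000_000);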

- Ralf


January 09, 2007
Oskar Linde wrote:
> After having fought a while with D programs with runaway memory leaks, I've unfortunately had to come to the conclusion that the D GC is not ready for production use. The problem is what I'd call "spurious pointers". That is random data (strings, numbers, image data, audio or whatever) appearing to the GC to be full of pointers to all over the memory space.

I feel kind of bad for making it sound like this is a problem related specifically to the D garbage collector. It is rather a general and apparently well known problem of all conservative garbage collectors. The D garbage collector is still excellent for a large domain of problems.

Lots of people seem to be having similar problems though, so a better understanding of under what conditions a conservative garbage collector will and will not work seems to be called for.

I'd guess that a lot of the problems we have with the conservative GC arise because we are at the end of the lifetime of the 32-bit address space. As time has passed, our programs have used a larger and larger portion of the available memory space. As one anonymous %u said, one solution would be "increasing the amount of memory needed for pointers by the factor two", which I interpret as moving to a 64-bit memory space.

I did some calculations to better understand what is going on. I've probably made several errors though.

<theoretical rambling>
The probability of one single uniformly random spurious pointer preventing the collection of an object of size s is

p = s/B,

with B being the size of the address space (2^b for b = 32 or 64 bits). On a 64 bit architecture, the probability is obviously drastically reduced.

If g is the amount (in bytes) of independent random data an application needs, the risk of an object of size s not being collected is

P(g) = 1 - (1 - s/B) ^ (g/(b/8))

As one can see, the risk increases considerably as the objects get bigger. Increasing the amount of random data the application contains reduces the size of objects the GC will be able to handle satisfactorily.
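
In plain D, for playing with the numbers:

    import std.math;
    import std.stdio;

    // risk of an object of size s bytes staying uncollected, given g
    // bytes of uniformly random data on a b-bit architecture
    real P(real g, real s, int b)
    {
        real B = pow(2.0L, cast(real) b);  // size of the address space
        real words = g / (b / 8);          // pointer-sized words in g bytes
        return 1.0L - pow(1.0L - s / B, words);
    }

    void main()
    {
        writefln("32 bit, s = 500:  %s", P(20e6, 500, 32));   // ~0.44
        writefln("32 bit, s = 2000: %s", P(20e6, 2000, 32));  // ~0.90
        writefln("64 bit, s = 2000: %s", P(20e6, 2000, 64));  // ~3e-10
    }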

My example was kind of nasty in that it not only accumulated additional random garbage in each iteration, but also caused heap fragmentation. A kinder example would always create objects of the same size. The simpler example would be:

* start with a static set of g bytes of random data
* for each iteration create n objects of size s (for simplicity, let each object contain s bytes of random data)
* after each iteration, none of the n objects are used anymore. This is a good time to call gc.fullCollect();
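
Spelled out as code (a sketch, using std.gc.fullCollect; the constants match the verification below):

    import std.gc;
    import std.random;

    void main()
    {
        const size_t g = 20_000_000;  // bytes of static random data
        const int    n = 10;          // objects per iteration
        const size_t s = 3000;        // object size in bytes

        // the static set of random data
        uint[] data;
        data.length = g / uint.sizeof;
        foreach (inout x; data)
            x = rand();

        while (1)
        {
            // create n objects of size s, each full of random data
            byte[][n] objs;
            foreach (inout o; objs)
            {
                o = new byte[s];
                foreach (inout b; o)
                    b = cast(byte) rand();
            }
            // the objects are dead now; drop the references so that
            // only spurious pointers can keep them alive
            foreach (inout o; objs)
                o = null;
            std.gc.fullCollect();
        }
    }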

After each such iteration, i, P_i*n_i objects will remain uncollected, and will be added to the pool of random spurious pointers. I disregard the small portion of objects that are uncollected because of spurious pointers appearing in other objects left uncollected in the same iteration. If this portion were significant, you'd really have a problem anyway, and it would show up in later iterations.

g_{i+1} ~ g_i + P_i*n_i*s.

I generously assume that the safe memory areas of the collected objects are reused by the allocator in the next iteration. A few of those will be made unsafe by the recently added spurious pointers from uncollected objects, but for the most part those memory areas are now safe, and the objects allocated/collected from them can be disregarded. The number of objects that need to be allocated in unsafe memory areas for the next iteration becomes:

n_{i+1} = P_i*n_i + P(P_i*n_i*s)*(n_0 - n_i)

This will of course not perfectly predict the real behavior, as it relaxes the problem away from discrete units, but should still adequately model the sample application. Using this, I tried to find the breaking point, i.e. at what size objects need to be explicitly freed instead of left to the GC.

"Verification":
Rewriting my original example to do a std.gc.fullCollect() every 10th iteration, giving the following parameters:
g = 20 MB
n = 10

the model suggests allocating objects of size:
s = 2000 would result in almost no space overhead, stable at 20 mb
s = 3000 would result in a slight but stable space overhead,
s = 4000 would cause a run-away memory usage

running the program results in:
s = 2000 memory usage is stable at 20 mb
s = 3000 results in a small but apparently unbounded memory leak
s = 4000 results in an unbounded memory leak

The model appears to be at least in the correct ballpark for this sample.

Theoretical results:
Those tables show the maximum object size the GC will be able to handle with different static amounts of "random" data. By "being able to handle", I mean that the application doesn't use more than 2x the required amount of memory. The exact breaking point is very unstable and going above it rapidly results in uncontrolled memory consumption.

32 bit arch, 100 objects

1 MB data	8000 bytes
5 MB data	4500 bytes
10 MB data	3000 bytes
20 MB data	2000 bytes
100 MB data	 700 bytes
500 MB data	 200 bytes
1000 MB data	 100 bytes

32 bit arch, 1000 objects

1 MB data	3400 bytes
5 MB data	2200 bytes
10 MB data	1700 bytes
20 MB data	1200 bytes
100 MB data	 500 bytes
500 MB data	 150 bytes
1000 MB data	 100 bytes

32 bit arch, 10000 objects
1 MB data	1300 bytes
5 MB data	1000 bytes
10 MB data	 800 bytes
20 MB data	 600 bytes
100 MB data	 300 bytes
500 MB data	 100 bytes
1000 MB data	  75 bytes

Those figures need to be taken with a couple of grains of salt, but they should give an indication of the object size at which one needs to manually handle object lifetimes.

As a comparison -- the 64 bit haven:

64 bit arch, 100 objects
2 GB data	1.5 GB
100 GB data	600 MB
1 TB data	200 MB

64 bit arch, 1000 objects
2 GB data	350 MB
100 GB data	250 MB
1 TB data	150 MB

64 bit arch, 10000 objects
2 GB data	100 MB
100 GB data	 75 MB
1 TB data	 50 MB
</theoretical rambling>

Summary:

As far as I can see, what I have to do to avoid memory leaks with a conservative GC is one of the following:

1. move to a 64 bit architecture
2. manually handle all objects larger than a few hundred bytes (see above)
3. hide all non-pointer data from the GC (e.g. as sketched below)
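
For #3, the escape hatch today is to keep bulk data in memory the GC never scans, e.g. on the C heap. A sketch (manual lifetime management, as in #2; also note the GC won't see pointers stored in such a block, so this is only safe for pure non-pointer data):

    import std.c.stdlib;

    // bulk numeric data on the C heap: it is outside the GC heap, so
    // its contents are never scanned for spurious pointers
    double[] hideFromGC(size_t n)
    {
        double* p = cast(double*) malloc(n * double.sizeof);
        return p[0 .. n];
    }

    void unhide(double[] arr)
    {
        free(arr.ptr);
    }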

It is a shame that #2 is such a pain and that D doesn't offer any help such as automatic ref-counting. Not having automatic ref-counting also prevents neat solutions such as transparent CoW and automatic handling of scarce resources.

If I didn't have a strong belief that automatic ref-counting will be addressed soon, I'd definitely consider going back to C++. Luckily, before I'd give up waiting, 64-bit architectures will probably be in the great majority. ;)

/Oskar
January 09, 2007
Ralf Schneider wrote:
> It doesn't seem so hard to me to have the compiler set such an attribute on arrays without pointers...

It's pretty hard if you can't modify the compiler ;).
Sean can't modify what DMD does (at least, not directly). He _can_ (directly) modify what the runtime library does by replacing it, which is what he's done.
He (or anyone else, for that matter) might be able to implement this in GDC though...
And Walter might be convinced to implement it in DMD (or, if it's front-end only code, accept a patch that implements it).

Of course, that still leaves arrays of structs, which may contain both pointers and non-pointers.
What we really need is a way for the GC to know what the type of the memory is, or at least where the pointers are. This may be possible by adding this info to TypeInfo & subclasses[1]. But then every memory block would need a pointer to the relevant TypeInfo (or some condensed form of this information, like flags for "only pointers" and "no pointers", with TypeInfo pointer only if both are false).
This would definitely require some sort of compiler support though; it would need to generate appropriate type information for structs, objects (the actual memory, not the references) and unions.
It would then need to supply this information to the GC in the runtime, requiring extra code generation.


[1]: This could be as simple as supplying a member function that performs a callback for every offset containing a pointer.
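
Something like this, for instance (a hypothetical shape, obviously; nothing like it exists in TypeInfo today):

    // hypothetical interface for [1]: dg is called once per offset
    // within an instance that holds a pointer
    interface PreciseTypeInfo
    {
        void pointerOffsets(void delegate(size_t offset) dg);
    }

    // the GC's mark phase could then scan a block precisely:
    void markPrecisely(void* base, PreciseTypeInfo ti)
    {
        ti.pointerOffsets(delegate(size_t off)
        {
            void* p = *cast(void**)(cast(byte*) base + off);
            // ...mark p, but only if it points into the GC heap...
        });
    }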