January 08, 2007
Frits van Bommel wrote:
> If you don't want to use them at all, I think the closest you'd get would involve implementing thread-local heaps.
> But that would still require a mutex if you want to allow deleting an object from a different thread than it was allocated in...

Now that I think about it, maybe it would be possible to eliminate that mutex as well. Hmm...
January 08, 2007
Oskar Linde wrote:

> After having fought a while with D programs with runaway memory leaks, I've unfortunately had to come to the conclusion that the D GC is not ready for production use. The problem is what I'd call "spurious pointers". That is random data (strings, numbers, image data, audio or whatever) appearing to the GC to be full of pointers to all over the memory space.
> 
> The definite solution has to be a GC that only scans memory containing pointers. Sean's patches to make the GC skip scanning memory known to contain elements smaller than sizeof(void*) will probably help tremendously. (I'd just have to make sure I'm not using dchar[] strings, float or double data, or the DMD associative array implementation)

That's what precise GC is all about.

I think it's the single biggest problem of the present D implementations. Some of my measurements have shown that on memory-intensive applications current Phobos GC could be 10x slower than MS.NET 2.0's GC: http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digi talmars.D&artnum=43991

-- 
AKhropov
January 08, 2007
Frits van Bommel wrote:
> Sean Kelly wrote:
>> Assuming this is a multithreaded app, you have to pay for two mutex locks for every concat operation.  So an expression like:
>>
>> a = b ~ c ~ d;
>>
>> would result in six locks of the mutex protecting the GC to perform allocations (I think anyway--I don't believe chained concats have special handling).  I think this would also result in two discarded temporary buffers for the GC to clean up later.  And since the Phobos GC scans all such blocks by default...
> 
> Chained concats *do* have special handling. See phobos/internal/arraycat.d, _d_arraycatn() (first function in the file, can't miss it).
> This function takes element size and number of arrays, followed by C-style vararg arrays.
> And yes, it only allocates once :).

Oh good :)  I knew about the varargs but hadn't given the code a close enough look to be sure.

> By the way, how do you figure six locks? At two locks per concat, that code only has two concats so even without above special handling wouldn't it still only lock four times?

I miscounted :p  It would have been four without the special handling.

> Also: Why two locks per concat at all? A concat only requires one allocation, so does a single alloc require two locks?
> (I haven't looked much into the GC code, so this is just curiosity)

The code in internal/gc/gc.d first checks the size of the array to see if a realloc is necessary, then it performs the realloc.  So worst case you're stuck with two locks per operation.  However, this may not be the case in the arraycat routines.  It's been a while since I've looked at them.

>> My patch will address the scanning issue, but it won't address the mutex issue--there's really no easy way to avoid that one.
> 
> What exactly do you mean by "the mutex issue"? Is using mutexes at all the problem, or is it locking them too often per operation (i.e. more than once per alloc)?

Too often per operation.  So given the vararg stuff you mentioned above, this isn't an issue IMO.  As it is, locks should only be acquired if the app is multithreaded, so this cost is only incurred if it's necessary. I think Phobos might actually lock in a few instances where it isn't strictly necessary, but I can't recall for certain.

> If you don't want to use them at all, I think the closest you'd get would involve implementing thread-local heaps.
> But that would still require a mutex if you want to allow deleting an object from a different thread than it was allocated in...

No it wouldn't--give each per-thread heap a lock-free free list.  But per-thread heaps for a GC are somewhat complicated.  I think you'd have to implement something like a read/write lock where the "writer" is actually the thread that wants to collect.


Sean
January 08, 2007
== Quote from Oskar Linde (oskar.lindeREM@OVEgmail.com)'s article
> This means that each time you rely on the GC (array appending/resizing, Phobos function calls etc), you have a potential memory leak.

Wrong. You have an intentional memory leak, from which the GC cannot recover you.

The GC was designed to eventually free coders from memory leak accidents. To enable this all variables are initialized by default-- -because this way no random data, including pointers, can survive.

In total your toy program establishes exactly that environment for which the GC is known to fail.

There is an easy solution for such environments by increasing the amount of memory needed for pointers by the factor two---or halving the available memory if memory bounds are tight and one takes a look from the other side.

But even this solution will not prevent the GC from sucking in all data swapped out, if the GC is not coupled with the virtual memory manager.
January 08, 2007
Oskar Linde wrote:
> After having fought a while with D programs with runaway memory leaks, I've unfortunately had to come to the conclusion that the D GC is not ready for production use. The problem is what I'd call "spurious pointers". That is random data (strings, numbers, image data, audio or whatever) appearing to the GC to be full of pointers to all over the memory space.

Some time ago I had the exact same problem, and I tried to convince Walter of how bad this problem is. Others have said the same thing before you and me. To Walter, "in practice, all integers vary from 0 to 100" or something like that, implying that the problem does not exist in fact or is not relevant.

I should also mention that this is considered a serious security issue, on the same class of the QuickSort algorithm.

For those not aware of secure programming practices: using QuickSort to sort data input by a remote user is considered a bad programming practice, because although QuickSort is O(n*log(n)) on average, there is a set of input data that makes the algorithm O(n^2). Knowing the implementation, an attacker may feed the application with carefully crafted data that exploits this weakness.

In the D case, knowing how data are laid on memory, it is easy for a remote attacker to feed data that, read as pointers, would point to large (and otherwise temporary) blocks of data, making them leak. Stack base randomization and other similar techniques helps preventing this, but IMO, just hides the problem.

Another problem in current GC implementation, IMO, is that memory space once allocated is never given back to the OS when disposed. What makes this worse is that it is common for programs to use a big amount of memory during initialization (like parsing XML data files), and never use it again. Again, Walter already knows of this and doesn't think it is a problem.
January 08, 2007
Honestly, it was a long time ago, so remembering is difficult. I didn't have SVN installed back then :) I'm pretty sure it wasn't multi-threaded. I think that we launched multiple processes. (eg. not mutexes either) I know I used a lot of assoc arrays, and I also used a lot of concatenation. The size of the code was quite huge too.. at least 8000 lines.

It could be a combination of those factors, and it may have been as early of a version of 0.73. I dunno dude, I just remember it sucked.

I have not really been having bad experiences with the GC lately, accept that, it's annoying at times with the peaks. Every request will give a response time of 2ms, accept when the GC runs, and then it'll give a 50ms response. I suppose that's not that horrible, but it is a 25x difference.

What I've wanted to do for a while now is to be able to set the size of the pool of memory, and also want the variable to be set telling me some form of information of how close the next collect is.. whether it's largest memory block avail, or % fragmentation, or something, so I can take the processes and tell them to stop accepting requests (and let one of the other processes handle it) and run the GC so the GC doesn't run while accepting requests.

I guess I can just estimate based off of the amount of requests it has processed... but the other way sounds way cooler.


Sean Kelly wrote:
> kenny wrote:
>> I also have experienced bad GC performance. I found it to be because of the ~ operator on strings. The program is a daemon, and after it had been running for a while, memory usage gets truly horrific, and performance degrades very bad.
>>
>> This was back on 0.140, so things may have changed since then...
> 
> Probably not.  Assuming this is a multithreaded app, you have to pay for two mutex locks for every concat operation.  So an expression like:
> 
> a = b ~ c ~ d;
> 
> would result in six locks of the mutex protecting the GC to perform allocations (I think anyway--I don't believe chained concats have special handling).  I think this would also result in two discarded temporary buffers for the GC to clean up later.  And since the Phobos GC scans all such blocks by default...
> 
> My patch will address the scanning issue, but it won't address the mutex issue--there's really no easy way to avoid that one.
> 
> 
> Sean
January 08, 2007
Sean Kelly wrote:
> Frits van Bommel wrote:
>> Also: Why two locks per concat at all? A concat only requires one allocation, so does a single alloc require two locks?
>> (I haven't looked much into the GC code, so this is just curiosity)
> 
> The code in internal/gc/gc.d first checks the size of the array to see if a realloc is necessary, then it performs the realloc.  So worst case you're stuck with two locks per operation.  However, this may not be the case in the arraycat routines.  It's been a while since I've looked at them.

Normal concats always allocate, so there's no need to check the size (and so they don't).
You're probably thinking of appends.

>> If you don't want to use them at all, I think the closest you'd get would involve implementing thread-local heaps.
>> But that would still require a mutex if you want to allow deleting an object from a different thread than it was allocated in...
> 
> No it wouldn't--give each per-thread heap a lock-free free list.  But per-thread heaps for a GC are somewhat complicated.  I think you'd have to implement something like a read/write lock where the "writer" is actually the thread that wants to collect.

I replied to myself about 10 minutes later when I realized it was probably possible :).
January 08, 2007
Miles wrote:
> I should also mention that this is considered a serious security issue,
> on the same class of the QuickSort algorithm.
> 
> For those not aware of secure programming practices: using QuickSort to
> sort data input by a remote user is considered a bad programming
> practice, because although QuickSort is O(n*log(n)) on average, there is
> a set of input data that makes the algorithm O(n^2). Knowing the
> implementation, an attacker may feed the application with carefully
> crafted data that exploits this weakness.

Of course, if you're only concerned about malicious users sending worst-case data (and not about worst-case data appearing randomly) there's an easy fix: randomize the data in O(n) :).
Don't laugh, they taught me that in school. And it works, Quicksort will remain O(n*log(n)) on average with that simple step performed beforehand, no matter the input data.
You'll need to trust your randomizer though, if the attacker can influence *that* you're just screwed and should use introsort or merge sort or something else with guaranteed O(n*log(n)) behavior if that's really what you want...
January 08, 2007
Oskar Linde wrote:
> After having fought a while with D programs with runaway memory leaks,
> I've unfortunately had to come to the conclusion that the D GC is not
> ready for production use. The problem is what I'd call "spurious
> pointers". That is random data (strings, numbers, image data, audio or
> whatever) appearing to the GC to be full of pointers to all over the
> memory space.


Wow.  That may be the problem I'm having too.  I have some radial basis function interpolation code that I put together just before the holiday break, which I observed to be leaking like mad.  First thing I need to do today after catching up on mail is to figure out why, but maybe you've just answered the question for me.

--bb
January 09, 2007
Here's a slightly less contrived version of Oskar's gc test.

import std.math;
import std.random;
import std.stdio;

void main() {
    // The real memory use, ~40 mb
    double[] data;
    data.length = 5_000_000;
    foreach(i, inout x; data) {
        x = sin(cast(double)i/data.length);
        //x = 1;
    }
    int count = 0;
    int gcount = 0;
    while(1) {
        // simulate reading a few kb of data
        double[] incoming;
        incoming.length = 1000 + rand() % 5000;
        foreach(i, inout x; incoming) {
            x = sin(cast(double)i/incoming.length);
            //x = 5;
        }
        // do something with the data...

        // print status message every so often
        count += incoming.length;
        if (count > 1_000_000) {
            count = 0;
            gcount++;
            writefln("%s processed", gcount);
        }
    }
}



This one uses doubles instead of uints and the data is the sin of some number.  These are _very_ realistic values for numeric data to have. The same effect can be seen.  Instead of hovering around 40MB, the memory use grows and grows and performance slows and slows.

This seems to be a very big issue.  The GC seems to be pretty much useless right now if you're going to have a lot of floating point data in your app.

--bb

Oskar Linde wrote:
> After having fought a while with D programs with runaway memory leaks, I've unfortunately had to come to the conclusion that the D GC is not ready for production use. The problem is what I'd call "spurious pointers". That is random data (strings, numbers, image data, audio or whatever) appearing to the GC to be full of pointers to all over the memory space.
> 
> Consider this simple program. It is designed to have a memory footprint of about 20 mb and then continuously process data.
>