February 19, 2011 GC.forget() (Was: O(N) Garbage collection?)
Posted in reply to Steven Schveighoffer

Yeh, I rebuilt the same model in my head over the past few hours (like you, I had a good mental model of the GC at one point but have slowly forgotten it). Unfortunately it looks like there's no easy fix. It also seems like gcbits are allocated as 1 bit for every 16 bytes of heap space, no matter what, and that a scan touches all of these, no matter what. This is another place for O(N) to creep in. About the only thing that would fix these is a rewrite, since addressing these problems would have ripple effects everywhere in the GC.
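(A back-of-the-envelope illustration of that bookkeeping overhead, using the 1-bit-per-16-bytes figure above; the arithmetic is mine, per bit set, not something taken from the GC source.)

import std.stdio;

void main()
{
    enum ulong heapBytes = 6UL << 30;       // roughly the 6 GB heap from the original post
    enum ulong gcBits    = heapBytes / 16;  // one bookkeeping bit per 16 bytes of heap
    enum ulong gcMiB     = gcBits / 8 / (1 << 20);
    // Prints: 402653184 bits = 48 MiB walked on every scan, even for NO_SCAN data
    writefln("%s bits = %s MiB walked on every scan, even for NO_SCAN data", gcBits, gcMiB);
}
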
The obvious but ugly workaround is to use the C heap for large, long-lived data structures, especially ones that live until the program terminates (because then you don't have to worry about freeing them). This sucks horribly, though, because it forces you to think about memory management in exactly the kind of way the GC is meant to automate, and prevents the use of standard library functions and builtin array appending, etc. for building said data structures.
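(A minimal sketch of that workaround, mine rather than anything from this thread, assuming the data holds no pointers into the GC heap so no GC.addRange call is needed:)

import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;

// A long-lived buffer the GC never sees: it is neither scanned nor swept, so it
// adds nothing to collection time, but freeing it is entirely your problem.
char[] cHeapBuffer(size_t length)
{
    auto p = cast(char*) malloc(length);
    if (p is null) onOutOfMemoryError();
    return p[0 .. length];
}

void main()
{
    auto seq = cHeapBuffer(100_000_000);  // e.g. one big DNA string
    scope(exit) free(seq.ptr);            // the manual bookkeeping the GC is supposed to automate
    seq[] = 'A';
}
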
Maybe what we need to remedy this is a function called GC.forget(). This function would allow you to build your data structures as regular GC objects. Then you'd call forget on the head pointer, and the GC would remove all information about the data structure (or move it to a completely unscanned, etc. forgottenData data structure, in which case you could call remind() to bring it back to normal GC memory), transitively if it's something more than just a simple array. The data would now act as if it was allocated by the C heap.
In a way this is poor man's generational GC. It's poor man's in the sense that it requires the user to manually specify which objects are long-lived. On the other hand, it could be better than generational GC in some ways because you wouldn't need tons of compiler infrastructure or pay tons of performance penalties to update the GC data structures on simple pointer assignment.
Then again, I don't imagine GC.forget being particularly easy to implement either, except in the simplest case where you have a huge array that takes up a whole pool. You'd have to do things like split pools in half if forget() was called on an object in the middle of a pool.
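(Purely hypothetical sketch of the interface being proposed; nothing like this exists in core.memory, and the names, signatures and stub bodies below are just my guess at its shape:)

// Hypothetical stand-in for the proposed GC.forget()/GC.remind().
struct ForgetfulGC
{
    /// Transitively detach everything reachable from `root` from the GC's
    /// scan/collect bookkeeping, so collections no longer pay for it.
    /// The memory then behaves as if it had come from the C heap.
    static void forget(void* root) { /* would split pools and move metadata here */ }

    /// Undo forget(): re-register the data structure as ordinary GC memory.
    static void remind(void* root) { /* would restore the saved metadata */ }
}

void example()
{
    auto genes = new string[](1_000_000);  // built with ordinary GC allocations
    ForgetfulGC.forget(genes.ptr);         // frozen: future scans skip it entirely
    ForgetfulGC.remind(genes.ptr);         // opt back in if it must become collectable again
}
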
On 2/19/2011 5:34 PM, Steven Schveighoffer wrote:
> On Sat, 19 Feb 2011 00:03:27 -0500, dsimcha <dsimcha@yahoo.com> wrote:
>
>> I've been trying out D's new 64-bit compiler and a serious barrier to
>> using it effectively seems to be abysmal garbage collection
>> performance with large heaps. It seems like the time for a garbage
>> collection to run scales linearly with the size of the heap *even if
>> most of the heap is marked as NO_SCAN*. I'm running a program with a
>> heap size of ~6GB, almost all of which is strings (DNA sequences),
>> which are not scanned by the GC. It's spending most of its time in GC,
>> based on pausing it every once in a while in GDB and seeing what's at
>> the top of the stack.
>>
>> Here's a test program and the results for a few runs.
>>
>> import std.stdio, std.datetime, core.memory, std.conv;
>>
>> void main(string[] args) {
>>     if(args.length < 2) {
>>         stderr.writeln("Need size.");
>>         return;
>>     }
>>
>>     immutable mul = to!size_t(args[1]);
>>     auto ptr = GC.malloc(mul * 1_048_576, GC.BlkAttr.NO_SCAN);
>>
>>     auto sw = StopWatch(autoStart);
>>     GC.collect();
>>     immutable msec = sw.peek.msecs;
>>     writefln("Collected a %s megabyte heap in %s milliseconds.",
>>              mul, msec);
>> }
>>
>> Outputs for various sizes:
>>
>> Collected a 10 megabyte heap in 1 milliseconds.
>> Collected a 50 megabyte heap in 4 milliseconds.
>> Collected a 200 megabyte heap in 16 milliseconds.
>> Collected a 500 megabyte heap in 41 milliseconds.
>> Collected a 1000 megabyte heap in 80 milliseconds.
>> Collected a 5000 megabyte heap in 397 milliseconds.
>> Collected a 10000 megabyte heap in 801 milliseconds.
>> Collected a 30000 megabyte heap in 2454 milliseconds.
>> Collected a 50000 megabyte heap in 4096 milliseconds.
>>
>> Note that these tests were run on a server with over 100 GB of
>> physical RAM, so a shortage of physical memory isn't the problem.
>> Shouldn't GC be O(1) with respect to the size of the unscanned portion
>> of the heap?
>
> Having recently constructed the GC model in my head (and it's rapidly
> deteriorating from there, believe me), here is a stab at what I see
> could be a bottleneck.
>
> The way the GC works is you have this giant loop (in pseudocode):
> bool changed;
> while(changed)
> {
>     changed = false;
>
>     foreach(memblock in heap)
>     {
>         if(memblock.marked && memblock.containsPointers)
>             foreach(pointer in memblock)
>             {
>                 auto memblock2 = heap.findBlock(pointer);
>                 if(memblock2 && !memblock2.marked)
>                 {
>                     memblock2.mark();
>                     changed = true;
>                 }
>             }
>     }
> }
>
> So you can see two things. First, every iteration of the outer while
> loop loops through *all* memory blocks, even if they do not contain
> pointers. This has a non-negligible cost.
> Second, there looks like the potential for the while loop to mark one,
> or at least a very small number, of blocks, so the algorithm worst case
> degenerates into O(n^2). This may not be happening, but it made me a
> little uneasy.
>
> The part in my mind that already deteriorated is whether marked blocks
> which have already been scanned are scanned again. I would guess not,
> but if that's not the case, that would be a really easy thing to fix.
>
> Also note that the findPointer function is a binary search I think. So
> you are talking O(lg(n)) there, not O(1).
>
> Like I said, I may not remember exactly how it works.
>
> -steve

February 19, 2011 Re: GC.forget() (Was: O(N) Garbage collection?)
Posted in reply to dsimcha

...and actually, forget() would only work for arrays of primitives, because if the object has pointers, you can change what these point to after calling forget() and Bad Things Can Happen.
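(A tiny illustration of that hazard; hypothetical, since forget() does not exist, so the commented-out call stands in for the proposed API:)

class Node { Node next; }

void hazard()
{
    auto head = new Node;
    // GC.forget(head);      // hypothetical: head's block is no longer scanned

    head.next = new Node;    // the new Node is now referenced only from unscanned
                             // memory, so the next collection is free to reclaim it
                             // while head.next still points at it: a dangling pointer.
}
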
On 2/19/2011 6:17 PM, dsimcha wrote:
> Yeh, I rebuilt the same model in my head over the past few hours (like
> you, I had a good mental model of the GC at one point but have slowly
> forgotten it). Unfortunately it looks like there's no easy fix. It also
> seems like gcbits are allocated as 1 bit for every 16 bytes of heap
> space, no matter what, and that a scan touches all of these, no matter
> what. This is another place for O(N) to creep in. About the only thing
> that would fix these is a rewrite, since addressing these problems would
> have ripple effects everywhere in the GC.
>
> The obvious but ugly workaround is to use the C heap for large,
> long-lived data structures, especially ones that live until the program
> terminates (because then you don't have to worry about freeing them).
> This sucks horribly, though, because it forces you to think about memory
> management in exactly the kind of way the GC is meant to automate, and
> prevents the use of standard library functions and builtin array
> appending, etc. for building said data structures.
>
> Maybe what we need to remedy this is a function called GC.forget(). This
> function would allow you to build your data structures as regular GC
> objects. Then you'd call forget on the head pointer, and the GC would
> remove all information about the data structure (or move it to a
> completely unscanned, etc. forgottenData data structure, in which case
> you could call remind() to bring it back to normal GC memory),
> transitively if it's something more than just a simple array. The data
> would now act as if it was allocated by the C heap.
>
> In a way this is poor man's generational GC. It's poor man's in the
> sense that it requires the user to manually specify which objects are
> long-lived. On the other hand, it could be better than generational GC
> in some ways because you wouldn't need tons of compiler infrastructure
> or pay tons of performance penalties to update the GC data structures on
> simple pointer assignment.
>
> Then again, I don't imagine GC.forget being particularly easy to
> implement either, except in the simplest case where you have a huge
> array that takes up a whole pool. You'd have to do things like split
> pools in half if forget() was called on an object in the middle of a pool.
>
> On 2/19/2011 5:34 PM, Steven Schveighoffer wrote:
>> On Sat, 19 Feb 2011 00:03:27 -0500, dsimcha <dsimcha@yahoo.com> wrote:
>>
>>> I've been trying out D's new 64-bit compiler and a serious barrier to
>>> using it effectively seems to be abysmal garbage collection
>>> performance with large heaps. It seems like the time for a garbage
>>> collection to run scales linearly with the size of the heap *even if
>>> most of the heap is marked as NO_SCAN*. I'm running a program with a
>>> heap size of ~6GB, almost all of which is strings (DNA sequences),
>>> which are not scanned by the GC. It's spending most of its time in GC,
>>> based on pausing it every once in a while in GDB and seeing what's at
>>> the top of the stack.
>>>
>>> Here's a test program and the results for a few runs.
>>>
>>> import std.stdio, std.datetime, core.memory, std.conv;
>>>
>>> void main(string[] args) {
>>>     if(args.length < 2) {
>>>         stderr.writeln("Need size.");
>>>         return;
>>>     }
>>>
>>>     immutable mul = to!size_t(args[1]);
>>>     auto ptr = GC.malloc(mul * 1_048_576, GC.BlkAttr.NO_SCAN);
>>>
>>>     auto sw = StopWatch(autoStart);
>>>     GC.collect();
>>>     immutable msec = sw.peek.msecs;
>>>     writefln("Collected a %s megabyte heap in %s milliseconds.",
>>>              mul, msec);
>>> }
>>>
>>> Outputs for various sizes:
>>>
>>> Collected a 10 megabyte heap in 1 milliseconds.
>>> Collected a 50 megabyte heap in 4 milliseconds.
>>> Collected a 200 megabyte heap in 16 milliseconds.
>>> Collected a 500 megabyte heap in 41 milliseconds.
>>> Collected a 1000 megabyte heap in 80 milliseconds.
>>> Collected a 5000 megabyte heap in 397 milliseconds.
>>> Collected a 10000 megabyte heap in 801 milliseconds.
>>> Collected a 30000 megabyte heap in 2454 milliseconds.
>>> Collected a 50000 megabyte heap in 4096 milliseconds.
>>>
>>> Note that these tests were run on a server with over 100 GB of
>>> physical RAM, so a shortage of physical memory isn't the problem.
>>> Shouldn't GC be O(1) with respect to the size of the unscanned portion
>>> of the heap?
>>
>> Having recently constructed the GC model in my head (and it's rapidly
>> deteriorating from there, believe me), here is a stab at what I see
>> could be a bottleneck.
>>
>> The way the GC works is you have this giant loop (in pseudocode):
>> bool changed;
>> while(changed)
>> {
>>     changed = false;
>>
>>     foreach(memblock in heap)
>>     {
>>         if(memblock.marked && memblock.containsPointers)
>>             foreach(pointer in memblock)
>>             {
>>                 auto memblock2 = heap.findBlock(pointer);
>>                 if(memblock2 && !memblock2.marked)
>>                 {
>>                     memblock2.mark();
>>                     changed = true;
>>                 }
>>             }
>>     }
>> }
>>
>> So you can see two things. First, every iteration of the outer while
>> loop loops through *all* memory blocks, even if they do not contain
>> pointers. This has a non-negligible cost.
>> Second, there looks like the potential for the while loop to mark one,
>> or at least a very small number, of blocks, so the algorithm worst case
>> degenerates into O(n^2). This may not be happening, but it made me a
>> little uneasy.
>>
>> The part in my mind that already deteriorated is whether marked blocks
>> which have already been scanned are scanned again. I would guess not,
>> but if that's not the case, that would be a really easy thing to fix.
>>
>> Also note that the findPointer function is a binary search I think. So
>> you are talking O(lg(n)) there, not O(1).
>>
>> Like I said, I may not remember exactly how it works.
>>
>> -steve
>

February 19, 2011 Re: GC.forget() (Was: O(N) Garbage collection?)
Posted in reply to dsimcha

dsimcha:
> Yeh, I rebuilt the same model in my head over the past few hours (like you, I had a good mental model of the GC at one point but have slowly forgotten it).
For serious D programmers having such model in the head is so important that I'd like a short page with such explanations & pseudocode in the D site :-)
Bye,
bearophile

February 20, 2011 Re: O(N) Garbage collection?
Posted in reply to retard

"retard" <re@tard.com.invalid> wrote in message news:ijp7pa$1d34$1@digitalmars.com...
> Sat, 19 Feb 2011 14:32:27 -0500, dsimcha wrote:
>
>> On 2/19/2011 12:50 PM, Ulrik Mikaelsson wrote:
>>> Just a thought; I guess the references to the non-GC-scanned strings are held in GC-scanned memory, right? Are the number of such references also increased linearly?
>>
>> Well, first of all, the benchmark I posted seems to indicate otherwise. Second of all, I was running this program before on yeast DNA and it was ridiculously fast. Then I tried to do the same thing on human DNA and it became slow as molasses. Roughly speaking, w/o getting into the biology much, I've got one string for each gene. Yeast have about 1/3 as many genes as humans, but the genes are on average about 100 times smaller. Therefore, the difference should be at most a small constant factor and in actuality it's a huge constant factor.
>>
>> Note: I know I could make the program in question a lot more space efficient, and that's what I ended up doing. It works now. It's just that it was originally written for yeast, where space efficiency is obviously not a concern, and I would have liked to just try a one-off calculation on the human genome without having to rewrite portions of it.
>
> Probably one reason for this behavior is the lack of testing. My desktop only has 24 GB of DDR3. I have another machine with 16 GB of DDR2, but don't know how to combine the address spaces via clustering. This would also horribly drag down GC performance. Even JVM is badly tuned for larger systems, they might use the Azul Java runtimes instead..

*Only* 24GB of DDR3, huh? :)

Makes me feel like a pauper: I recently upgraded from 1GB to 2GB of DDR1 ;) (It actually had been 2GB a few years ago, but I cannibalized half of it to build my Linux box.)

Out of curiosity, what are you running on that? (Multiple instances of Crysis? High-definition voxels?)

February 20, 2011 Re: O(N) Garbage collection?
Posted in reply to dsimcha | "dsimcha" <dsimcha@yahoo.com> wrote in message news:ijp61d$1bu1$1@digitalmars.com... > On 2/19/2011 12:50 PM, Ulrik Mikaelsson wrote: >> Just a thought; I guess the references to the non-GC-scanned strings are held in GC-scanned memory, right? Are the number of such references also increased linearly? > > Well, first of all, the benchmark I posted seems to indicate otherwise. Second of all, I was running this program before on yeast DNA and it was ridiculously fast. Then I tried to do the same thing on human DNA and it became slow as molasses. Roughly speaking, w/o getting into the biology much, I've got one string for each gene. Yeast have about 1/3 as many genes as humans, but the genes are on average about 100 times smaller. Therefore, the difference should be at most a small constant factor and in actuality it's a huge constant factor. > Out of curiosity, roughly how many, umm "characters" (I forget the technical term for each T, G, etc), are in each yeast gene, and how many genes do they have? (Humans have, umm, was it 26? My last biology class was ages ago.) > Note: I know I could make the program in question a lot more space efficient, and that's what I ended up doing. It works now. It's just that it was originally written for yeast, where space efficiency is obviously not a concern, and I would have liked to just try a one-off calculation on the human genome without having to rewrite portions of it. | |||

February 20, 2011 Re: O(N) Garbage collection?
Posted in reply to Nick Sabalausky

On 2/19/2011 10:21 PM, Nick Sabalausky wrote:
> Out of curiosity, roughly how many, umm "characters" (I forget the technical
> term for each T, G, etc), are in each yeast gene, and how many genes do they
> have? (Humans have, umm, was it 26? My last biology class was ages ago.)
>
It varies massively, but you can compute the averages yourself. There are about 6,000 yeast genes and about 12 million nucleotides (the technical term for "characters"). Humans have about 20k to 25k genes, and a total of about 3 billion nucleotides, though a lot of this is intergenic regions (stuff that isn't genes).

February 20, 2011 Re: GC.forget() (Was: O(N) Garbage collection?)
Posted in reply to bearophile

"bearophile" <bearophileHUGS@lycos.com> wrote in message news:ijpkh8$232r$1@digitalmars.com...
> dsimcha:
>
>> Yeh, I rebuilt the same model in my head over the past few hours (like you, I had a good mental model of the GC at one point but have slowly forgotten it).
>
> For serious D programmers having such model in the head is so important that I'd like a short page with such explanations & pseudocode in the D site :-)
>

I seem to remember finding out at one point (the hard way) that arrays of primitives are scanned for pointers unless you explicitly tell the GC not to do so (for each array). A long time ago, this caused me major headaches when I tried to write a batch processing tool for image files. I never actually verified this, but my understanding was that the file and image data contained false pointers that prevented the data from completed images from being collected. So the program just happily kept munching away at more and more memory (and presumably, more false pointers) with each image until after only a fairly small number of files it ground to a sputtering halt under the weight of its own memory footprint.
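(For reference, a small sketch of how that hint can be handed to the GC; this is my example, and the runtime may already allocate arrays of pointer-free element types as NO_SCAN, so treat it as belt and braces:)

import core.memory;

void main()
{
    // A big buffer of raw image bytes. If the GC treats it as possibly containing
    // pointers, random byte patterns in it can act as false pointers and pin
    // otherwise-dead objects.
    auto pixels = new ubyte[](32 * 1024 * 1024);

    // Tell the GC this block never needs to be scanned for pointers.
    GC.setAttr(pixels.ptr, GC.BlkAttr.NO_SCAN);

    // Or allocate with the attribute up front, as the benchmark upthread does:
    auto raw = GC.malloc(32 * 1024 * 1024, GC.BlkAttr.NO_SCAN);
}
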

February 20, 2011 Re: O(N) Garbage collection?
Posted in reply to dsimcha | "dsimcha" <dsimcha@yahoo.com> wrote in message news:ijq2bl$2opj$1@digitalmars.com... > On 2/19/2011 10:21 PM, Nick Sabalausky wrote: >> Out of curiosity, roughly how many, umm "characters" (I forget the >> technical >> term for each T, G, etc), are in each yeast gene, and how many genes do >> they >> have? (Humans have, umm, was it 26? My last biology class was ages ago.) >> > > It varies massively, but you can compute the averages yourself. There are about 6,000 yeast genes and about 12 million nucleotides (the technical term for "characters"). Humans have about 20k to 25k genes, and a total of about 3 billion nucleotides, though a lot of this is intergenic regions (stuff that isn't genes). Ahh, I was confusing "gene" with "chromosome" (and probably still got the exact number wrong ;) ). Interesting stuff. And that certianly explains the computation-practicality difference of yeast vs human: approx 12MB vs 3GB, assuming ASCII/UTF-8. | |||

February 20, 2011 Re: O(N) Garbage collection?
Posted in reply to Nick Sabalausky

Sat, 19 Feb 2011 22:15:52 -0500, Nick Sabalausky wrote:
> "retard" <re@tard.com.invalid> wrote in message news:ijp7pa$1d34$1@digitalmars.com...
>> Sat, 19 Feb 2011 14:32:27 -0500, dsimcha wrote:
>>
>>> On 2/19/2011 12:50 PM, Ulrik Mikaelsson wrote:
>>>> Just a thought; I guess the references to the non-GC-scanned strings are held in GC-scanned memory, right? Are the number of such references also increased linearly?
>>>
>>> Well, first of all, the benchmark I posted seems to indicate
>>> otherwise.
>>> Second of all, I was running this program before on yeast DNA and it
>>> was ridiculously fast. Then I tried to do the same thing on human DNA
>>> and it became slow as molasses. Roughly speaking, w/o getting into
>>> the biology much, I've got one string for each gene. Yeast have about
>>> 1/3 as many genes as humans, but the genes are on average about 100
>>> times smaller. Therefore, the difference should be at most a small
>>> constant factor and in actuality it's a huge constant factor.
>>>
>>> Note: I know I could make the program in question a lot more space efficient, and that's what I ended up doing. It works now. It's just that it was originally written for yeast, where space efficiency is obviously not a concern, and I would have liked to just try a one-off calculation on the human genome without having to rewrite portions of it.
>>
>> Probably one reason for this behavior is the lack of testing. My desktop only has 24 GB of DDR3. I have another machine with 16 GB of DDR2, but don't know how to combine the address spaces via clustering. This would also horribly drag down GC performance. Even JVM is badly tuned for larger systems, they might use the Azul Java runtimes instead..
>
> *Only* 24GB of DDR3, huh? :)
>
> Makes me feel like a pauper: I recently upgraded from 1GB to 2GB of DDR1 ;) (It actually had been 2GB a few years ago, but I cannablized half of it to build my Linux box.)
>
> Out of curiosity, what are you running on that? (Multiple instances of Crysis? High-definition voxels?)
The largest processes are virtual machines, application servers, a web server, an IDE environment, and several compiler instances running in parallel. The web browser also seems to have use for a gigabyte or two these days. As I recall, the memory was cheaper when I bought it than it is now. It's also cheaper than DDR2 or DDR (per gigabyte).

February 20, 2011 Re: O(N) Garbage collection?
Posted in reply to Nick Sabalausky

Nick Sabalausky <a@a.a> wrote:
> Ahh, I was confusing "gene" with "chromosome" (and probably still got the
> exact number wrong ;) ).

That you did. Humans have 23 chromosome pairs. But now you know!

--
Simen