Jump to page: 1 2
Thread overview
Quantifying the Performance of Garbage Collection vs. Explicit Memory Management
Mar 17, 2008
Craig Black
Mar 17, 2008
BCS
Mar 17, 2008
Craig Black
Mar 17, 2008
renoX
Mar 17, 2008
Craig Black
Mar 18, 2008
renoX
Mar 17, 2008
Sean Kelly
Mar 17, 2008
Sean Kelly
Mar 17, 2008
Craig Black
Mar 18, 2008
Vladimir Panteleev
Mar 18, 2008
renoX
Mar 19, 2008
Craig Black
Mar 19, 2008
Sean Kelly
Mar 19, 2008
Frits van Bommel
Mar 19, 2008
Sean Kelly
Mar 19, 2008
Frits van Bommel
Mar 18, 2008
Robert Fraser
Mar 17, 2008
BCS
Re: Quantifying the Performance of Garbage Collection vs. Explicit
Mar 17, 2008
Dan
Mar 18, 2008
renoX
March 17, 2008
Interesting article:
http://lambda-the-ultimate.org/node/2552

-Craig
March 17, 2008
Reply to Craig,

> Interesting article:
> http://lambda-the-ultimate.org/node/2552
> -Craig
> 

That test sounds bogus to me. The results for manual memory management are effectively unbeatable. I rather suspect that most program lose the last reference to memory about the same time you can tell it won't be used (so it's almost as quick as possible there) and it has no overhead at all including the accounting needed to keep track of the memory. Add back in the post processing time and then tell us the numbers!

What I want to see is: rebuild a Java app with smart pointer or something, or how about spec a large function block as detailed as you can without knowing if you will be using GC or not and then implement it both ways (sever times by sever people). Then compare those. 


March 17, 2008
"BCS" <ao@pathlink.com> wrote in message news:55391cb32a83e8ca55bd841aba20@news.digitalmars.com...
> Reply to Craig,
>
>> Interesting article:
>> http://lambda-the-ultimate.org/node/2552
>> -Craig
>>
>
> That test sounds bogus to me. The results for manual memory management are effectively unbeatable. I rather suspect that most program lose the last reference to memory about the same time you can tell it won't be used (so it's almost as quick as possible there) and it has no overhead at all including the accounting needed to keep track of the memory. Add back in the post processing time and then tell us the numbers!
>
> What I want to see is: rebuild a Java app with smart pointer or something, or how about spec a large function block as detailed as you can without knowing if you will be using GC or not and then implement it both ways (sever times by sever people). Then compare those.

If I understand the article correctly, I think you are missing the main point of it.  It is not merely saying ,"We ran conclusive test: GC loses to explicit memory management!"  They went much further than that, and identified the primary bottleneck in modern GC implementations.  Namely, that GC doesn't cooperate well with virtual memory.  GC typically scans way more memory than the rest of the application, and consequently causes the most cache misses/page faults.  This article is not bashing GC.  I think it is indicating to us how we can best improve it.  Perhaps some GC algorithm could be developed that is optimized specifically to produce the fewer page faults and cache misses.

Thoughts?

-Craig

March 17, 2008
Craig Black a écrit :
> "BCS" <ao@pathlink.com> wrote in message news:55391cb32a83e8ca55bd841aba20@news.digitalmars.com...
>> Reply to Craig,
>>
>>> Interesting article:
>>> http://lambda-the-ultimate.org/node/2552
>>> -Craig
>>>
>>
>> That test sounds bogus to me. The results for manual memory management are effectively unbeatable. I rather suspect that most program lose the last reference to memory about the same time you can tell it won't be used (so it's almost as quick as possible there) and it has no overhead at all including the accounting needed to keep track of the memory. Add back in the post processing time and then tell us the numbers!
>>
>> What I want to see is: rebuild a Java app with smart pointer or something, or how about spec a large function block as detailed as you can without knowing if you will be using GC or not and then implement it both ways (sever times by sever people). Then compare those.
> 
> If I understand the article correctly, I think you are missing the main point of it.  It is not merely saying ,"We ran conclusive test: GC loses to explicit memory management!"  They went much further than that, and identified the primary bottleneck in modern GC implementations.  Namely, that GC doesn't cooperate well with virtual memory.  GC typically scans way more memory than the rest of the application, and consequently causes the most cache misses/page faults.  This article is not bashing GC.  I think it is indicating to us how we can best improve it.  Perhaps some GC algorithm could be developed that is optimized specifically to produce the fewer page faults and cache misses.
> 
> Thoughts?


The same authors have made a VM friendly GC, it is linked at the end the article..
http://lambda-the-ultimate.org/node/2391

The are two drawback of their GC:
1) The OS needs to be modified so that the VM and the GC communicate.

2) It is a moving GC, in the original paper they compares their GC with moving and non-moving variant and the moving one win.

For Java this isn't an issue, but for D it is: working with C's library wouldn't work if the GC is a moving one..

I don't know how to fix this one..

Regards,
renoX



> 
> -Craig
> 
March 17, 2008
> The are two drawback of their GC:
> 1) The OS needs to be modified so that the VM and the GC communicate.

Perhaps not.  There may be other ways to detect how the VM for virtual memory (not virtual machine) works without specific OS queries.  It could be part of the application initialization to determine this.  I don't know enough details to know whether this is possible.

> 2) It is a moving GC, in the original paper they compares their GC with moving and non-moving variant and the moving one win.
>
> For Java this isn't an issue, but for D it is: working with C's library wouldn't work if the GC is a moving one..
>
> I don't know how to fix this one..
>

Perhaps we need to transition away from reliance on non-GC friendly C
library functions?
IMO, a moving GC for D should a high priority, and well worth the extra
work.

-Craig


March 17, 2008
Craig Black wrote:
> 
> If I understand the article correctly, I think you are missing the main point of it.  It is not merely saying ,"We ran conclusive test: GC loses to explicit memory management!"  They went much further than that, and identified the primary bottleneck in modern GC implementations.  Namely, that GC doesn't cooperate well with virtual memory.  GC typically scans way more memory than the rest of the application, and consequently causes the most cache misses/page faults.  This article is not bashing GC.  I think it is indicating to us how we can best improve it.  Perhaps some GC algorithm could be developed that is optimized specifically to produce the fewer page faults and cache misses.
> 

I didn't read the whole article (only the segments that were quoted) so I may have missed the point. The quoting article seem to me to be using this to claim that GC looses. If that's not the main point... Ok, my bad.

> Thoughts?

I built a version of lisp a few months back that needed some sort of libsegv library for it's GC. it sounds like it might be of use. Something like only scanning pages that have changed and caching the rest.

> 
> -Craig
> 
March 17, 2008
== Quote from renoX (renosky@free.fr)'s article
> The same authors have made a VM friendly GC, it is linked at the end the
> article..
> http://lambda-the-ultimate.org/node/2391
> The are two drawback of their GC:
> 1) The OS needs to be modified so that the VM and the GC communicate.
> 2) It is a moving GC, in the original paper they compares their GC with
> moving and non-moving variant and the moving one win.
> For Java this isn't an issue, but for D it is: working with C's library
> wouldn't work if the GC is a moving one..
> I don't know how to fix this one..

There is published research regarding VM-aware garbage collectors, and creating one for D wouldn't be terribly difficult.  Portability is more of an issue, as you've mentioned.  I believe plugging one into Windows isn't terribly difficult, but for *nix it can require a kernel re-compile to get access to the proper hooks.  Because of this, I don't think that VM-aware GCs are the best general purpose solution, even though their performance can be quite good.


Sean
March 17, 2008
== Quote from Sean Kelly (sean@invisibleduck.org)'s article
> There is published research regarding VM-aware garbage collectors, and creating one for D wouldn't be terribly difficult.

Oops.  That blog links to one of the papers I was thinking of, so I guess you're already aware of it :-)


Sean
March 17, 2008
"Sean Kelly" <sean@invisibleduck.org> wrote in message news:frmhe4$20jp$1@digitalmars.com...
> == Quote from renoX (renosky@free.fr)'s article
>> The same authors have made a VM friendly GC, it is linked at the end the
>> article..
>> http://lambda-the-ultimate.org/node/2391
>> The are two drawback of their GC:
>> 1) The OS needs to be modified so that the VM and the GC communicate.
>> 2) It is a moving GC, in the original paper they compares their GC with
>> moving and non-moving variant and the moving one win.
>> For Java this isn't an issue, but for D it is: working with C's library
>> wouldn't work if the GC is a moving one..
>> I don't know how to fix this one..
>
> There is published research regarding VM-aware garbage collectors, and
> creating one for D wouldn't be
> terribly difficult.  Portability is more of an issue, as you've mentioned.
> I believe plugging one into
> Windows isn't terribly difficult, but for *nix it can require a kernel
> re-compile to get access to the
> proper hooks.  Because of this, I don't think that VM-aware GCs are the
> best general purpose solution,
> even though their performance can be quite good.
>
>
> Sean

I don't think the *nix kernel recompile issue should be a show stopper.  I would be very happy with a prototype that was Windows only.  Once we have it working and can measure the benefits, then tackle the *nix issues.  A fast GC would really help D to be competitive.  It would be nice if the D community was leading GC innovation (like Tango just did with the XML parser).

-Craig


March 17, 2008
BCS Wrote:

> Craig Black wrote:
> > 
> > If I understand the article correctly, I think you are missing the main point of it.  It is not merely saying ,"We ran conclusive test: GC loses to explicit memory management!"  They went much further than that, and identified the primary bottleneck in modern GC implementations.  Namely, that GC doesn't cooperate well with virtual memory.  GC typically scans way more memory than the rest of the application, and consequently causes the most cache misses/page faults.  This article is not bashing GC.  I think it is indicating to us how we can best improve it.  Perhaps some GC algorithm could be developed that is optimized specifically to produce the fewer page faults and cache misses.

I typically use GC, but when I don't it gets linked anyways.  Bloat.

As for preventing cache/page misses...

Some blocks will be in cache (a), some pages will be in memory (b), and some will be on swap (c).
Set (b) and set (c) can be differentiated by OS accounting.
(a) is a subset of (b) does not overlap (c).
Set (a) and set (b) cannot be differentiated by any explicit accounting.

- Any system which allowed you to not analyze memory stored on swap will definitely improve performance.

- Performing a cache miss causes the requested data not to be available for a guestimated 100 cycles.  Prefetching batches of memory that you want to analyze, and analyzing it as it actually arrives means you could dramatically reduce cache miss overhead.

- Not allowing pointers to be implicitly cast from other types might allow you to store pointers in the GC to all pointers, and store simple patterns for arrays of structs containing pointers and arrays of pointers.  This could let you skip any pages that clearly don't have pointer information.

This would become sensible if pointers were treated as a typedef'd fully qualified integer type (with shift, add etc working on them), and an alias of '*'.  As we found with C, allowing one type to be used to mean two different things is bad, so I would recommend a p32 and a p64.

In fact, causing all pointers and array {ptr,length} structs to be colocated where possible in a program would *dramatically* improve GC performance as only a few pages would ever need to be brought into cache during a GC phase - the same that would probably already be in cache during normal program operation.

This is probably already true; but the effect would be more dramatic if the GC didn't need to analyze the contents of a pointer because it knew the contents themselves didn't contain another pointer?

Best Regards,
Dan
« First   ‹ Prev
1 2