| Thread overview | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
March 17, 2008 Quantifying the Performance of Garbage Collection vs. Explicit Memory Management | ||||
|---|---|---|---|---|
| ||||
Interesting article: http://lambda-the-ultimate.org/node/2552 -Craig | ||||
March 17, 2008 Re: Quantifying the Performance of Garbage Collection vs. Explicit Memory Management | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Craig Black | Reply to Craig,
> Interesting article:
> http://lambda-the-ultimate.org/node/2552
> -Craig
>
That test sounds bogus to me. The results for manual memory management are effectively unbeatable. I rather suspect that most program lose the last reference to memory about the same time you can tell it won't be used (so it's almost as quick as possible there) and it has no overhead at all including the accounting needed to keep track of the memory. Add back in the post processing time and then tell us the numbers!
What I want to see is: rebuild a Java app with smart pointer or something, or how about spec a large function block as detailed as you can without knowing if you will be using GC or not and then implement it both ways (sever times by sever people). Then compare those.
| |||
March 17, 2008 Re: Quantifying the Performance of Garbage Collection vs. Explicit Memory Management | ||||
|---|---|---|---|---|
| ||||
Posted in reply to BCS | "BCS" <ao@pathlink.com> wrote in message news:55391cb32a83e8ca55bd841aba20@news.digitalmars.com... > Reply to Craig, > >> Interesting article: >> http://lambda-the-ultimate.org/node/2552 >> -Craig >> > > That test sounds bogus to me. The results for manual memory management are effectively unbeatable. I rather suspect that most program lose the last reference to memory about the same time you can tell it won't be used (so it's almost as quick as possible there) and it has no overhead at all including the accounting needed to keep track of the memory. Add back in the post processing time and then tell us the numbers! > > What I want to see is: rebuild a Java app with smart pointer or something, or how about spec a large function block as detailed as you can without knowing if you will be using GC or not and then implement it both ways (sever times by sever people). Then compare those. If I understand the article correctly, I think you are missing the main point of it. It is not merely saying ,"We ran conclusive test: GC loses to explicit memory management!" They went much further than that, and identified the primary bottleneck in modern GC implementations. Namely, that GC doesn't cooperate well with virtual memory. GC typically scans way more memory than the rest of the application, and consequently causes the most cache misses/page faults. This article is not bashing GC. I think it is indicating to us how we can best improve it. Perhaps some GC algorithm could be developed that is optimized specifically to produce the fewer page faults and cache misses. Thoughts? -Craig | |||
March 17, 2008 Re: Quantifying the Performance of Garbage Collection vs. Explicit Memory Management | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Craig Black | Craig Black a écrit : > "BCS" <ao@pathlink.com> wrote in message news:55391cb32a83e8ca55bd841aba20@news.digitalmars.com... >> Reply to Craig, >> >>> Interesting article: >>> http://lambda-the-ultimate.org/node/2552 >>> -Craig >>> >> >> That test sounds bogus to me. The results for manual memory management are effectively unbeatable. I rather suspect that most program lose the last reference to memory about the same time you can tell it won't be used (so it's almost as quick as possible there) and it has no overhead at all including the accounting needed to keep track of the memory. Add back in the post processing time and then tell us the numbers! >> >> What I want to see is: rebuild a Java app with smart pointer or something, or how about spec a large function block as detailed as you can without knowing if you will be using GC or not and then implement it both ways (sever times by sever people). Then compare those. > > If I understand the article correctly, I think you are missing the main point of it. It is not merely saying ,"We ran conclusive test: GC loses to explicit memory management!" They went much further than that, and identified the primary bottleneck in modern GC implementations. Namely, that GC doesn't cooperate well with virtual memory. GC typically scans way more memory than the rest of the application, and consequently causes the most cache misses/page faults. This article is not bashing GC. I think it is indicating to us how we can best improve it. Perhaps some GC algorithm could be developed that is optimized specifically to produce the fewer page faults and cache misses. > > Thoughts? The same authors have made a VM friendly GC, it is linked at the end the article.. http://lambda-the-ultimate.org/node/2391 The are two drawback of their GC: 1) The OS needs to be modified so that the VM and the GC communicate. 2) It is a moving GC, in the original paper they compares their GC with moving and non-moving variant and the moving one win. For Java this isn't an issue, but for D it is: working with C's library wouldn't work if the GC is a moving one.. I don't know how to fix this one.. Regards, renoX > > -Craig > | |||
March 17, 2008 Re: Quantifying the Performance of Garbage Collection vs. Explicit Memory Management | ||||
|---|---|---|---|---|
| ||||
Posted in reply to renoX | > The are two drawback of their GC: > 1) The OS needs to be modified so that the VM and the GC communicate. Perhaps not. There may be other ways to detect how the VM for virtual memory (not virtual machine) works without specific OS queries. It could be part of the application initialization to determine this. I don't know enough details to know whether this is possible. > 2) It is a moving GC, in the original paper they compares their GC with moving and non-moving variant and the moving one win. > > For Java this isn't an issue, but for D it is: working with C's library wouldn't work if the GC is a moving one.. > > I don't know how to fix this one.. > Perhaps we need to transition away from reliance on non-GC friendly C library functions? IMO, a moving GC for D should a high priority, and well worth the extra work. -Craig | |||
March 17, 2008 Re: Quantifying the Performance of Garbage Collection vs. Explicit Memory Management | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Craig Black | Craig Black wrote: > > If I understand the article correctly, I think you are missing the main point of it. It is not merely saying ,"We ran conclusive test: GC loses to explicit memory management!" They went much further than that, and identified the primary bottleneck in modern GC implementations. Namely, that GC doesn't cooperate well with virtual memory. GC typically scans way more memory than the rest of the application, and consequently causes the most cache misses/page faults. This article is not bashing GC. I think it is indicating to us how we can best improve it. Perhaps some GC algorithm could be developed that is optimized specifically to produce the fewer page faults and cache misses. > I didn't read the whole article (only the segments that were quoted) so I may have missed the point. The quoting article seem to me to be using this to claim that GC looses. If that's not the main point... Ok, my bad. > Thoughts? I built a version of lisp a few months back that needed some sort of libsegv library for it's GC. it sounds like it might be of use. Something like only scanning pages that have changed and caching the rest. > > -Craig > | |||
March 17, 2008 Re: Quantifying the Performance of Garbage Collection vs. Explicit Memory Management | ||||
|---|---|---|---|---|
| ||||
Posted in reply to renoX | == Quote from renoX (renosky@free.fr)'s article
> The same authors have made a VM friendly GC, it is linked at the end the
> article..
> http://lambda-the-ultimate.org/node/2391
> The are two drawback of their GC:
> 1) The OS needs to be modified so that the VM and the GC communicate.
> 2) It is a moving GC, in the original paper they compares their GC with
> moving and non-moving variant and the moving one win.
> For Java this isn't an issue, but for D it is: working with C's library
> wouldn't work if the GC is a moving one..
> I don't know how to fix this one..
There is published research regarding VM-aware garbage collectors, and creating one for D wouldn't be terribly difficult. Portability is more of an issue, as you've mentioned. I believe plugging one into Windows isn't terribly difficult, but for *nix it can require a kernel re-compile to get access to the proper hooks. Because of this, I don't think that VM-aware GCs are the best general purpose solution, even though their performance can be quite good.
Sean
| |||
March 17, 2008 Re: Quantifying the Performance of Garbage Collection vs. Explicit Memory Management | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | == Quote from Sean Kelly (sean@invisibleduck.org)'s article
> There is published research regarding VM-aware garbage collectors, and creating one for D wouldn't be terribly difficult.
Oops. That blog links to one of the papers I was thinking of, so I guess you're already aware of it :-)
Sean
| |||
March 17, 2008 Re: Quantifying the Performance of Garbage Collection vs. Explicit Memory Management | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | "Sean Kelly" <sean@invisibleduck.org> wrote in message news:frmhe4$20jp$1@digitalmars.com... > == Quote from renoX (renosky@free.fr)'s article >> The same authors have made a VM friendly GC, it is linked at the end the >> article.. >> http://lambda-the-ultimate.org/node/2391 >> The are two drawback of their GC: >> 1) The OS needs to be modified so that the VM and the GC communicate. >> 2) It is a moving GC, in the original paper they compares their GC with >> moving and non-moving variant and the moving one win. >> For Java this isn't an issue, but for D it is: working with C's library >> wouldn't work if the GC is a moving one.. >> I don't know how to fix this one.. > > There is published research regarding VM-aware garbage collectors, and > creating one for D wouldn't be > terribly difficult. Portability is more of an issue, as you've mentioned. > I believe plugging one into > Windows isn't terribly difficult, but for *nix it can require a kernel > re-compile to get access to the > proper hooks. Because of this, I don't think that VM-aware GCs are the > best general purpose solution, > even though their performance can be quite good. > > > Sean I don't think the *nix kernel recompile issue should be a show stopper. I would be very happy with a prototype that was Windows only. Once we have it working and can measure the benefits, then tackle the *nix issues. A fast GC would really help D to be competitive. It would be nice if the D community was leading GC innovation (like Tango just did with the XML parser). -Craig | |||
March 17, 2008 Re: Quantifying the Performance of Garbage Collection vs. Explicit | ||||
|---|---|---|---|---|
| ||||
Posted in reply to BCS | BCS Wrote:
> Craig Black wrote:
> >
> > If I understand the article correctly, I think you are missing the main point of it. It is not merely saying ,"We ran conclusive test: GC loses to explicit memory management!" They went much further than that, and identified the primary bottleneck in modern GC implementations. Namely, that GC doesn't cooperate well with virtual memory. GC typically scans way more memory than the rest of the application, and consequently causes the most cache misses/page faults. This article is not bashing GC. I think it is indicating to us how we can best improve it. Perhaps some GC algorithm could be developed that is optimized specifically to produce the fewer page faults and cache misses.
I typically use GC, but when I don't it gets linked anyways. Bloat.
As for preventing cache/page misses...
Some blocks will be in cache (a), some pages will be in memory (b), and some will be on swap (c).
Set (b) and set (c) can be differentiated by OS accounting.
(a) is a subset of (b) does not overlap (c).
Set (a) and set (b) cannot be differentiated by any explicit accounting.
- Any system which allowed you to not analyze memory stored on swap will definitely improve performance.
- Performing a cache miss causes the requested data not to be available for a guestimated 100 cycles. Prefetching batches of memory that you want to analyze, and analyzing it as it actually arrives means you could dramatically reduce cache miss overhead.
- Not allowing pointers to be implicitly cast from other types might allow you to store pointers in the GC to all pointers, and store simple patterns for arrays of structs containing pointers and arrays of pointers. This could let you skip any pages that clearly don't have pointer information.
This would become sensible if pointers were treated as a typedef'd fully qualified integer type (with shift, add etc working on them), and an alias of '*'. As we found with C, allowing one type to be used to mean two different things is bad, so I would recommend a p32 and a p64.
In fact, causing all pointers and array {ptr,length} structs to be colocated where possible in a program would *dramatically* improve GC performance as only a few pages would ever need to be brought into cache during a GC phase - the same that would probably already be in cache during normal program operation.
This is probably already true; but the effect would be more dramatic if the GC didn't need to analyze the contents of a pointer because it knew the contents themselves didn't contain another pointer?
Best Regards,
Dan
| |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply