October 02, 2006 Is a moving GC really needed?

I've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble?

Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)

I've experienced the problems of memory fragmentation first hand. In the project I'm working on (3D visualization software) I've had to track out-of-memory errors, which turned out to be because of virtual memory fragmentation. At some point, even a malloc/VirtualAlloc (the MS CRT's malloc directly calls VirtualAlloc for big memory blocks) for 80MB failed. Our problems were resolved by reserving a huge block (~1GB) of virtual memory at application start-up, to prevent third-party DLLs from fragmenting the virtual address space.

One of the reasons we ran into problems with memory fragmentation was that Windows by default gives a process only 2GB of virtual address space. Using the large-address-aware flag passed to the linker, however, it is possible to get up to 4GB of virtual address space. That's an extra 2GB of contiguous virtual address space! In the (near) future we'll have 2^64 bytes of virtual address space, which "should be enough for anyone".

Is the extra complexity and run-time overhead of a moving GC worth the trouble, at this point in time?

L.
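
For illustration, a minimal sketch of the reserve-at-startup workaround described above, assuming druntime's Windows bindings (core.sys.windows.windows); the function name and the 1GB figure are just examples, not the project's actual code:

import core.sys.windows.windows;

// Reserve (but don't commit) one large contiguous region of address space
// early, before third-party DLLs get a chance to fragment it. Sub-ranges
// can be committed later with MEM_COMMIT as they are actually needed.
enum size_t RESERVE_SIZE = 1024u * 1024 * 1024; // ~1GB, example figure

void* reserveAddressSpace()
{
    void* base = VirtualAlloc(null, RESERVE_SIZE, MEM_RESERVE, PAGE_NOACCESS);
    if (base is null)
        throw new Exception("could not reserve address space");
    return base;
}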

October 02, 2006 Re: Is a moving GC really needed?

Posted in reply to Lionello Lunesu

On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio@lunesu.remove.com> wrote:
> I've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble?
>
> Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)
>
I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.
It may also improve caching as allocated memory is more together.

October 02, 2006 Re: Is a moving GC really needed?

Posted in reply to Chris Miller

Chris Miller wrote:
> On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio@lunesu.remove.com> wrote:
>
>> I've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble?
>>
>> Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)
>>
>
> I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.
> It may also improve caching as allocated memory is more together.

Also, the moving part can (and most likely will) lead to the implementation of a generational GC, which should be able to reduce the time spent scanning when large amounts of memory are allocated (I believe today's Java GCs often operate with 6 generations).

All GC traits have pros and cons; the current D GC is as close as you come to a naive implementation. Different types of applications need different traits, so having different GCs available will be necessary to secure a future for D.

--
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi

October 02, 2006 Re: Is a moving GC really needed?

Posted in reply to Chris Miller

Chris Miller wrote:
> On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio@lunesu.remove.com> wrote:
>
>> I've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble?
>>
>> Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)
>>
>
> I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.

Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)

> It may also improve caching as allocated memory is more together.

This I understand, but what's the granularity (if that's the correct term) of a CPU cache?

http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory."

I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed together..

L.

October 02, 2006 Re: Is a moving GC really needed?

Posted in reply to Lionello Lunesu

Lionello Lunesu wrote:
> I've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble?
>
> Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)
>
> I've experienced the problems of memory fragmentation first hand. In the project I'm working on (3D visualization software) I've had to track out-of-memory errors, which turned out to be because of virtual memory fragmentation. At some point, even a malloc/VirtualAlloc (the MS CRT's malloc directly calls VirtualAlloc for big memory blocks) for 80MB failed. Our problems were resolved by reserving a huge block (~1GB) of virtual memory at application start-up, to prevent third-party DLLs from fragmenting the virtual address space.
>
> One of the reasons we ran into problems with memory fragmentation was that Windows by default gives a process only 2GB of virtual address space. Using the large-address-aware flag passed to the linker, however, it is possible to get up to 4GB of virtual address space. That's an extra 2GB of contiguous virtual address space! In the (near) future we'll have 2^64 bytes of virtual address space, which "should be enough for anyone".
>
> Is the extra complexity and run-time overhead of a moving GC worth the trouble, at this point in time?
While I'm no expert, I doubt a moving GC is even possible in a systems language like D.
First, if you move things around, you obviously need to be precise when updating pointers, lest all hell breaks loose. But how do you update
union {
    int a;
    void* b;
}
? While you could forbid overlap of pointers and non-pointer data, what about custom allocators, assembler, C libraries (including OS runtime!), etc.?
And second, for the generational case, you need an efficient way to track references from older objects to newer objects, otherwise you need to scan them all, defeating the point of having generations in the first place. While a JIT-compiled language/runtime can relatively easily (and efficiently) do this by injecting appropriate code into older objects, I think it's practically impossible to do so with native code.
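
The mechanism being referred to here is typically a write barrier plus a remembered set or card table: every pointer store into the old generation records the affected location, so a minor collection only scans those recorded spots. A rough sketch of the idea, with hypothetical names and sizes, not an actual D runtime API:

// Hypothetical card-marking write barrier. The heap is divided into
// fixed-size "cards"; a pointer store into an old-generation object marks
// the card containing the written slot, and a minor collection scans only
// the marked cards for old-to-young references.
enum CARD_SIZE = 512;          // bytes per card (example figure)

__gshared ubyte[] cardTable;   // one byte per card, 1 = dirty
__gshared void*   heapBase;    // start of the managed heap

void writePointer(void** slot, void* newValue)
{
    *slot = newValue;          // the actual store
    auto card = (cast(size_t) slot - cast(size_t) heapBase) / CARD_SIZE;
    cardTable[card] = 1;       // this card may now point into the young generation
}
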
I've no idea how to overcome those without involving the end-user (programmer) and/or losing quite a lot of speed during normal operation, which I'm quite sure are not acceptable trade-offs.
On the bright side, I believe there's considerably less need to heap-allocate in D than, say, in Java, and even when used, one can overcome a bad (slow) GC in many cases (with stuff like malloc/free, delete, etc.), so the performance of GC is not as critical.
If the compiler/GC were improved to differentiate between atomic and non-atomic data (the latter contains pointers to other data, the former doesn't), so memory areas that can't contain pointers don't get scanned at all*, I think I'd already be quite happy with the state of things..
xs0
*) that may already be the case, but last time I checked it wasn't :)
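
Later versions of the D runtime expose exactly this distinction through allocation attributes; a minimal sketch, assuming druntime's core.memory API:

import core.memory : GC;

// Allocate a pointer-free buffer. BlkAttr.NO_SCAN tells the collector that
// the block cannot contain pointers, so it is skipped entirely during the
// mark phase.
void* allocPointerFree(size_t bytes)
{
    return GC.malloc(bytes, GC.BlkAttr.NO_SCAN);
}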

October 02, 2006 Re: Is a moving GC really needed?

Posted in reply to Lionello Lunesu

Lionello Lunesu wrote:
> Chris Miller wrote:
>> On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio@lunesu.remove.com> wrote:
>>> Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)
>>>
>>
>> I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.
>
> Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)

"Normal" allocators have to deal with fragmentation. With a moving GC you can allocate everything to the top of the heap and rely on the GC to shrink the heap when you run out of space. So the allocator can basically be "increment top-of-heap-pointer by size of allocation and return old value" (synchronized if in a multi threaded environment, of course). That's pretty much the most efficient allocator possible (as long as the GC doesn't need to run, anyway ;) )

>> It may also improve caching as allocated memory is more together.
>
> This I understand, but what's the granularity (if that's the correct term) of a CPU cache?
>
> http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory."
>
> I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed

IIRC CPUs typically have multiple leveled caches, and the later levels have larger cache lines. Not sure how big they get though.
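
A minimal sketch of that bump-pointer scheme, illustrative only (a real collector also deals with alignment categories, large objects, and triggering a collection when the region is exhausted):

// Bump-pointer allocation: the free part of the heap is one contiguous
// region, so an allocation is just "advance a pointer". When the region
// runs out, a moving GC compacts the live objects and resets the pointer.
// A real implementation would also synchronize this in a multithreaded
// program.
__gshared ubyte* heapTop;   // next free byte
__gshared ubyte* heapEnd;   // end of the region obtained from the OS

void* bumpAlloc(size_t size)
{
    size = (size + 15) & ~cast(size_t) 15;   // keep allocations 16-byte aligned
    if (heapTop + size > heapEnd)
        return null;                         // out of space: a real GC would collect here
    auto p = heapTop;
    heapTop += size;                         // the whole allocator is this increment
    return p;
}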

October 02, 2006 Re: Is a moving GC really needed?

Posted in reply to Frits van Bommel

Frits van Bommel wrote:
> Lionello Lunesu wrote:
>> Chris Miller wrote:
>>> On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio@lunesu.remove.com> wrote:
>>>> Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)
>>>>
>>>
>>> I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.
>>
>> Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)
>
> "Normal" allocators have to deal with fragmentation.
> With a moving GC you can allocate everything to the top of the heap and rely on the GC to shrink the heap when you run out of space.
> So the allocator can basically be "increment top-of-heap-pointer by size of allocation and return old value" (synchronized if in a multi threaded environment, of course).
> That's pretty much the most efficient allocator possible (as long as the GC doesn't need to run, anyway ;) )
>

And then if you add 'generations' of objects by 'age', you can effectively scan only the 'youngest' generation to recover some good percentage of garbage, and only scan the rest if memory is still tight. Seems to make for a pretty good strategy for many programs, including interactive ones (at least for imperative languages).

I think the .NET GC does a really good job from what I've seen, and IIRC it is a moving / generational GC. So is Sun's latest (again, IIRC).

>>> It may also improve caching as allocated memory is more together.
>>
>> This I understand, but what's the granularity (if that's the correct term) of a CPU cache?
>>
>> http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory."
>>
>> I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed
>
> IIRC CPUs typically have multiple leveled caches, and the later levels have larger cache lines. Not sure how big they get though.

October 02, 2006 Re: Is a moving GC really needed?

Posted in reply to Lionello Lunesu

Lionello Lunesu wrote:
> I've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble?
>
> Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)

Fragmentation shouldn't be an issue with the correct allocator strategy, see:
http://citeseer.ist.psu.edu/johnstone97memory.html

I think the real reason is to speed up collections, as generational GCs (a class of moving GCs) begin by scanning only a segment of allocated memory which contains 'new' allocations. Only if no free space is found does it bother to scan older generations. On each collection, 'new' memory that is still alive typically graduates to an older generation. The idea behind this strategy is that memory which has been around for a while will probably be around for a long time, while a large number of allocations (in function temporaries) are discarded and available for collection almost immediately.

> Is the extra complexity and run-time overhead of a moving GC worth the trouble, at this point in time?

Probably not, but it's more a goal to make D compatible with moving GCs than to write one at the moment. But doing so is a big pain in some cases--default implementations of toHash and opCmp being particularly sore points.

Sean
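
A rough sketch of that graduation policy, with hypothetical names just to make the flow concrete (a real collector tracks generations by memory region rather than by object lists):

// Minor collection over the young generation only: dead objects are
// reclaimed immediately, and objects that survive a few minor collections
// are promoted to the old generation, which is scanned far less often.
enum PROMOTE_AFTER = 2;   // minor collections an object must survive

struct Obj { uint survived; /* ... payload ... */ }

bool isReachable(Obj* o) { return true; }   // placeholder: a real GC traces the roots

void minorCollect(ref Obj*[] young, ref Obj*[] old)
{
    Obj*[] stillYoung;
    foreach (o; young)
    {
        if (!isReachable(o))
            continue;                // unreachable young object: reclaimed right away
        if (++o.survived >= PROMOTE_AFTER)
            old ~= o;                // graduates to the old generation
        else
            stillYoung ~= o;         // stays young, scanned again next time
    }
    young = stillYoung;
}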

October 02, 2006 Re: Is a moving GC really needed?

Posted in reply to xs0

xs0 wrote:
>
> While I'm no expert, I doubt a moving GC is even possible in a systems language like D.
>
> First, if you move things around, you obviously need to be precise when updating pointers, lest all hell breaks loose. But how do you update
>
> union {
>     int a;
>     void* b;
> }
>
> ? While you could forbid overlap of pointers and non-pointer data, what about custom allocators, assembler, C libraries (including OS runtime!), etc.?

For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the GC which member is selected at any moment during program execution. Whenever an assignment is done to the union, code is inserted to update the union's status. So your union would look more like this:

enum StructStatus {
    a,
    b,
}

struct {
    StructStatus status; // or size_t, whichever works
    union {
        int a;
        void* b;
    }
}

Now the GC can be precise with unions. Notice also the enum, which would be nice to make available to userland - AFAIK many unions are coded in a struct like that, so this will not be a loss in memory usage for those cases, provided D exposes the implicit union information. At any rate, unions seem pretty rare, so no one would notice the extra memory usage.

Not sure how custom allocators mess up the GC; I thought these were just on their own anyway. If a pointer to something is outside of the GC heap, the GC doesn't bother changing it or collecting it or moving anything.

Assembler is a bit tricky; maybe someone smarter than I am can handle it better, but here's a shot with some pseudo-asm:

struct Foo {
    int member1;
    int member2;
}

Foo bar;

...

Foo* foo = &bar;
int extracted;

// foo spotted in the assembly block, never mind the context;
// as such, foo gets pinned.
asm {
    mov EAX, foo;        // EAX = foo;
    add EAX, 4;          // EAX += 4 (offset of member2);
    mov EAX, [EAX];      // EAX = *EAX;
    mov extracted, EAX;  // extracted = foo.member2;
}
// foo is unpinned here

As for C libraries, it seems like the same thing as custom allocators. The C heap is outside of the GC's jurisdiction and won't be moved or manipulated in any way. C code that handles D objects will have to be careful, and the callee D code will have to pin the objects before they go out into the unknown.

> On the bright side, I believe there's considerably less need to heap-allocate in D than, say, in Java, and even when used, one can overcome a bad (slow) GC in many cases (with stuff like malloc/free, delete, etc.), so the performance of GC is not as critical.

structs are teh rulez. I'm still not comfortable with manual memory management in D though, mostly because the standard lib (Phobos) is built with GC in mind and will probably leak the hell out of my program if I trust it too far. Either that or I have to roll my own functions, which sucks, or I have to be stuck with std.c, which also sucks because it's not nearly as nice as Phobos IMO. Mostly I agree with this, though.

Also, I wonder: if I were to make a tool that does escape analysis on your program, and it finds that a number of classes can either be stack-allocated or safely deleted after they reach a certain point in the code, would this change the effectiveness of a generational GC? Perhaps part of why young objects die so often is because they are temporary things that can often be safely deleted at the end of scope or some such.

> If the compiler/GC were improved to differentiate between atomic and non-atomic data (the latter contains pointers to other data, the former doesn't), so memory areas that can't contain pointers don't get scanned at all*, I think I'd already be quite happy with the state of things..
>
> xs0
>
> *) that may already be the case, but last time I checked it wasn't :)

I'd love this optimization. It doesn't seem too horribly hard to do either. The GC needs a new heap and a new allocation function, and the compiler needs to be trained to use the new allocation function.

October 03, 2006 Re: Is a moving GC really needed?

Posted in reply to Frits van Bommel

Frits van Bommel wrote:
> Lionello Lunesu wrote:
>> Chris Miller wrote:
>>> On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio@lunesu.remove.com> wrote:
>>>> Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)
>>>>
>>>
>>> I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.
>>
>> Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)
>
> "Normal" allocators have to deal with fragmentation.
> With a moving GC you can allocate everything to the top of the heap and rely on the GC to shrink the heap when you run out of space.
> So the allocator can basically be "increment top-of-heap-pointer by size of allocation and return old value" (synchronized if in a multi threaded environment, of course).
> That's pretty much the most efficient allocator possible (as long as the GC doesn't need to run, anyway ;) )
Imagine doing that with 2^64 bytes of virtual memory! Maybe you don't even need to compact (although, small blocks will keep entire pages in memory.. bad idea)
L.