October 04, 2006
Lionello Lunesu wrote:
> Chris Miller wrote:
>> On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu <lio@lunesu.remove.com> wrote:
>>
>>> I've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble?
>>>
>>> Being able to move memory blocks reduces memory fragmentation, am I correct? Is this the only reason? (For the remainder of this post, I'm assuming it is.)
>>>
>>
>> I believe it allows for very fast allocations, almost as fast as stack allocation. This is something I'm very interested in.
> 
> Why would it allocate faster than a non-moving GC? (Ignoring for the moment the cost of moving.)
> 
>> It may also improve caching as allocated memory is more together.
> 
> This I understand, but what's the granularity (if that's the correct term) of a CPU cache?
> 
> http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache keeps copies of 64 byte lines of memory."
> 
> I guess the chances are pretty low for two blocks to be moved close together and be cached together, and be accessed together..
> 
> L.

If you think about it from the point of view of disk blocks or pages it makes more sense, but even cache lines could benefit some.

What happens is that normal applications typically have linked lists and arrays of pointers to sub-structures.  Moving collectors normally traverse these objects depth-first, copying the data into a destination area as they scan, rather than merely shifting blocks to fill gaps as you might guess.

Copying a linked list in depth-first order means that the nodes of the list are copied to a new region of memory - and end up *sequential* in memory as a result.  From then on, they are sequential and contiguous, packed into nearly as few pages as possible.

Later, as nodes are added to the middle of the list, the list gets 'scattered', but is always recollected and re-sequentialized during the next sweep.
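To make the claim concrete, here is a toy sketch in C (not D's actual collector; `evacuate` and `build_scattered` are invented names for illustration): evacuating a scattered list into a fresh arena in traversal order leaves the nodes adjacent in memory.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node { int value; struct Node *next; } Node;

/* Build a list whose nodes come from separate, scattered allocations. */
static Node *build_scattered(int n) {
    Node *head = NULL;
    for (int v = n - 1; v >= 0; v--) {
        Node *node = malloc(sizeof *node);
        node->value = v;
        node->next = head;
        head = node;
    }
    return head;
}

/* "Evacuate" the list depth-first into a contiguous arena, the way a
   copying collector copies live data into to-space: the i-th node
   visited lands in the i-th slot, so the list becomes sequential. */
static Node *evacuate(const Node *head, Node *arena) {
    Node *new_head = NULL, *prev = NULL;
    size_t i = 0;
    for (const Node *n = head; n != NULL; n = n->next) {
        Node *copy = &arena[i++];
        copy->value = n->value;
        copy->next = NULL;
        if (prev) prev->next = copy; else new_head = copy;
        prev = copy;
    }
    return new_head;
}
```

After `evacuate`, walking the list touches consecutive addresses, so consecutive nodes share cache lines and pages.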

The same is true for arrays of pointers, binary trees, and many other structures.  It's probably less important for hash tables and "value" arrays.  It also matters more when memory is full, because missing a disk page costs far more than missing a cache line.

Java programmers are taught that "ArrayList" is the best list for most applications -- it's basically C++'s std::vector -- so arrays have become the norm in Java and C++, and the argument for moving collectors is probably weaker now.

For LISP like languages (postscript, lisp, scheme, sml, ocaml), this is probably a bigger deal.

Kevin
October 04, 2006
Chad J > wrote:
> xs0 wrote:
>>
>> While I'm no expert, I doubt a moving GC is even possible in a systems language like D.
>>
>> First, if you move things around, you obviously need to be precise when updating pointers, lest all hell breaks loose. But how do you update
>>
>> union {
>>     int a;
>>     void* b;
>> }
>>
>> ? While you could forbid overlap of pointers and non-pointer data, what about custom allocators, assembler, C libraries (including OS runtime!), etc.?
> 
> For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the gc which member is selected at any moment during program execution.  Whenever an assignment is done to the union, code is inserted to update the union's status.  So your union would look more like this:
> 
> enum StructStatus
> {
>   a,
>   b,
> }
> 
> struct
> {
>   StructStatus status; //or size_t, whichever works
> 
>   union
>   {
>     int a;
>     void* b;
>   }
> }

Well, you can't simply mandate that, I think.. It can be the case that the exact structure of a struct/union is mandated by requirements external to the application..

And, even if it was an option, it doesn't really solve the precision issue. To be precise, the GC must know the internal structure of anything on its heap, which I don't think is possible unless the language was severely restricted or it was required that the programmer supplies that information manually whenever the purpose of a part of memory changes from pointer to non-pointer and vice-versa. Both options suck quite a lot.


> Not sure how custom allocators mess up the GC, I thought these were just on their own anyways.  If a pointer to something is outside of the GC heap, the GC doesn't bother changing it or collecting it or moving anything.

That's true, but a custom allocator can use the GC heap if it wants. For example, if I know I'll be allocating a million linked list nodes, and will then release them all at once, it can be considerably faster to allocate a large byte[] and use chunks of it for my nodes. But how can the GC tell which of those bytes are pointers and which aren't? At the beginning none are, but later some may be..
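The situation xs0 describes can be sketched in C (hypothetical names; a real D allocator would slice a GC-allocated byte[] instead): a pool that hands out node-sized chunks of one untyped buffer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ListNode { int value; struct ListNode *next; } ListNode;

enum { POOL_NODES = 1024 };

/* One big buffer of untyped bytes. A GC scanning it has no static type
   information: nothing says which byte ranges will hold pointers. */
static _Alignas(ListNode) unsigned char pool[POOL_NODES * sizeof(ListNode)];
static size_t used;

/* Bump-pointer allocation: one compare and one increment per node,
   which is why allocating from a compacted region is so cheap. */
static ListNode *pool_alloc(void) {
    if (used == POOL_NODES) return NULL;
    return (ListNode *)(pool + (used++) * sizeof(ListNode));
}

/* Releasing all the nodes at once is just resetting the cursor. */
static void pool_reset(void) { used = 0; }
```

As soon as a node's `next` field is assigned, real pointers live inside what the GC's type information still calls plain bytes, which is exactly the precision problem.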


> Assembler is a bit tricky, maybe someone smarter than I can handle it better, but here's a shot with some pseudo-asm:
> 
> struct Foo
> {
>   int member1;
>   int member2;
> }
> Foo bar;
> 
> ...
> 
> Foo* foo = &bar;
> int extracted;
> // foo spotted in the assembly block, never mind the context
> // as such, foo gets pinned.
> asm
> {
>   mov EAX, foo;         // EAX = foo;
>   add EAX, 4;           // EAX += 4;
>   mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2;
> }
> // foo is unpinned here

I'm sure simple cases can be detected, but am also fairly certain it's impossible to generally determine which registers hold pointers and which don't, and/or what will get referenced and what won't.


> As for C libraries, it seems like the same thing as custom allocators. The C heap is outside of the GC's jurisdiction and won't be moved or manipulated in any way.  C code that handles D objects will have to be careful, and the callee D code will have to pin the objects before they go out into the unknown.

I meant precision again, not C's heap. Even if compiled D code somehow notifies the GC what's on the stack, the C code definitely won't. Then, how can the GC tell which parts of the stack point to its own heap, and which are just integers that happen to look like they could point to the heap?


> Also, I wonder, if I were to make a tool that does escape analysis on your program, then finds that a number of classes can either be stack allocated or safely deleted after they reach a certain point in the code, then would this change the effectiveness of a generational GC? Perhaps part of why young objects die so often is because they are temporary things that can often be safely deleted at the end of scope or some such.

Well, I think it's quite hard to guess whether the result would be better or worse, speed-wise. Obviously, allocating those objects on the stack would speed things up, while using generations in the GC would be less effective, because the GC would see fewer short-lived objects. I've no idea, however, how those two effects would combine.. Most probably it depends very much on the actual application (as do most things ;)


xs0
October 04, 2006
Lionello Lunesu wrote:
> I've noticed that some design decisions are made with the possibility of a moving GC in mind. Will D indeed end up with a moving GC? If so, why? Is a moving GC really worth the extra trouble?

	Some people claim that a program that runs for a long time (a web server, servlet container, etc.) can fragment memory so badly that allocations eventually fail even though plenty of memory is still free.

	That is not just theory. It was reported to me by a close friend who works for a big company which I will not name: a new JVM with a GC that moved objects solved the problem their customer was having.

	A GC that moves objects can compact memory and avoid the fragmentation.

	I also don't see how a moving GC can coexist with D's features. I know that a non-moving GC can lead people to produce code that is not portable across compiler implementations. I have seen this happen with code for SmartEiffel versus ISE Eiffel, for example: using SmartEiffel, people cached pointers to Eiffel objects from C DLLs and the like. That worked with SmartEiffel (non-moving GC), but not with ISE Eiffel (moving GC).


marcio
October 04, 2006
Marcio wrote:
> 
>     I also don't see how a moving GC can live with the D features. I know that having a non-moving GC can have people produce code that is not portable across compiler implementations. I have seen this happen with code for SmartEiffel versus ISE Eiffel for example. Using SmartEiffel, people cached pointers to Eiffel objects from C DLLs etc. Worked in SmartEiffel (non-moving GC), but not in ISE Eiffel (moving GC).

I'll admit I've wondered about this.  Without sufficient type information the GC has no way of knowing whether a particular 32-bit (or 64-bit) value that *could be* a pointer actually *is* a pointer and therefore should be altered when a move occurs.  Assuming the type information is present however, it should be simple enough to state that pointers must be typed as such (ie. it is illegal to store a pointer in a byte[4], for example).  In fact, I think D already has this restriction in place.


Sean
October 05, 2006
xs0 wrote:
> Chad J > wrote:
> 
>> xs0 wrote:
>>
>>>
>>> While I'm no expert, I doubt a moving GC is even possible in a systems language like D.
>>>
>>> First, if you move things around, you obviously need to be precise when updating pointers, lest all hell breaks loose. But how do you update
>>>
>>> union {
>>>     int a;
>>>     void* b;
>>> }
>>>
>>> ? While you could forbid overlap of pointers and non-pointer data, what about custom allocators, assembler, C libraries (including OS runtime!), etc.?
>>
>>
>> For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the gc which member is selected at any moment during program execution.  Whenever an assignment is done to the union, code is inserted to update the union's status.  So your union would look more like this:
>>
>> enum StructStatus
>> {
>>   a,
>>   b,
>> }
>>
>> struct
>> {
>>   StructStatus status; //or size_t, whichever works
>>
>>   union
>>   {
>>     int a;
>>     void* b;
>>   }
>> }
> 
> 
> Well, you can't simply mandate that, I think.. It can be the case that the exact structure of a struct/union is mandated by requirements external to the application..
> 

I think you can though, the same way that all D classes are prefixed with a pointer to a vtable and a 'monitor', or the same way that dynamic arrays carry not just a pointer but also a length.

> And, even if it was an option, it doesn't really solve the precision issue. To be precise, the GC must know the internal structure of anything on its heap, which I don't think is possible unless the language was severely restricted or it was required that the programmer supplies that information manually whenever the purpose of a part of memory changes from pointer to non-pointer and vice-versa. Both options suck quite a lot.
> 

But it can be precise with said union.  It will know at compile time where all of the unions are stored, in the same way it knows where pointers are stored.  Then, when it looks at the unions, it determines at runtime whether or not they contain pointers by using the extra member info.  Either the member info says the current member is a pointer, or it says that the current member isn't a pointer.  There is no maybe.

IMO this is not a severe restriction, and it does not require any programmer intervention.  The compiler writes all of the code that sets unions' member info variables, and to the programmer the member info is readonly and needs no maintenance.
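Hand-written in C for illustration (in Chad's proposal the D compiler would emit the tag bookkeeping itself, so `set_a`/`set_b` here stand in for compiler-inserted code):

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { HOLDS_A, HOLDS_B } Tag;

/* The union plus the discriminant the compiler would maintain. */
typedef struct {
    Tag tag;
    union {
        int   a;
        void *b;
    } u;
} TaggedUnion;

/* The "inserted code" on every assignment: the store and the tag
   update always happen together. */
static void set_a(TaggedUnion *t, int v)   { t->u.a = v; t->tag = HOLDS_A; }
static void set_b(TaggedUnion *t, void *p) { t->u.b = p; t->tag = HOLDS_B; }

/* A precise (or moving) GC can now answer exactly, with no "maybe":
   does this slot currently hold a pointer it must trace and update? */
static bool holds_pointer(const TaggedUnion *t) { return t->tag == HOLDS_B; }
```

The cost is one extra word per union plus one extra store per assignment, which is the trade-off being debated here.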

> 
>> Not sure how custom allocators mess up the GC, I thought these were just on their own anyways.  If a pointer to something is outside of the GC heap, the GC doesn't bother changing it or collecting it or moving anything.
> 
> 
> That's true, but a custom allocator can use the GC heap if it wants. For example, if I know I'll be allocating a million linked list nodes, and will then release them all at once, it can be considerably faster to allocate a large byte[] and use chunks of it for my nodes. But how can the GC tell which of those bytes are pointers and which aren't? At the beginning none are, but later some may be..
> 

Hmmm.  In the case of the allocator I think it is the allocator writer's responsibility to use gc.addRange() on all of the pointers that they toss into the byte[].  What bugs me now though, is other arbitrary data cases, like std.boxer.

So far the only answer I can think of is to make people dealing with arbitrarily typed data be very careful about gc.addRange'ing and gc.removeRange'ing any references that go into or out of their arbitrarily typed data, which does suck.

> 
>> Assembler is a bit tricky, maybe someone smarter than I can handle it better, but here's a shot with some pseudo-asm:
>>
>> struct Foo
>> {
>>   int member1;
>>   int member2;
>> }
>> Foo bar;
>>
>> ...
>>
>> Foo* foo = &bar;
>> int extracted;
>> // foo spotted in the assembly block, never mind the context
>> // as such, foo gets pinned.
>> asm
>> {
>>   mov EAX, foo;         // EAX = foo;
>>   add EAX, 4;           // EAX += 4;
>>   mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2;
>> }
>> // foo is unpinned here
> 
> 
> I'm sure simple cases can be detected, but am also fairly certain it's impossible to generally determine which registers hold pointers and which don't, and/or what will get referenced and what won't.
> 

Yeah, I realized stuff like ASM that rewrites/rearranges the stack would mess up the GC.  Perhaps we just need a comprehensive list of dos, don'ts, and workarounds for D asm programming?

> 
>> As for C libraries, it seems like the same thing as custom allocators. The C heap is outside of the GC's jurisdiction and won't be moved or manipulated in any way.  C code that handles D objects will have to be careful, and the callee D code will have to pin the objects before they go out into the unknown.
> 
> 
> I meant precision again, not C's heap. Even if compiled D code somehow notifies the GC what's on the stack, the C code definitely won't. Then, how can the GC tell which parts of the stack point to its own heap, and which are just integers that happen to look like they could point to the heap?
> 

It already knows where pointers are in each frame of the stack for D code; can't we make it also know which frames aren't D code at all? It would then skip the C frames entirely. Any D code that could let references out into C code (the only way I know of is extern(C)) should pin the object on the D side and keep a reference until execution has returned to D.

> 
>> Also, I wonder, if I were to make a tool that does escape analysis on your program, then finds that a number of classes can either be stack allocated or safely deleted after they reach a certain point in the code, then would this change the effectiveness of a generational GC? Perhaps part of why young objects die so often is because they are temporary things that can often be safely deleted at the end of scope or some such.
> 
> 
> Well, I think it's quite hard to guess whether the result would be better or worse, speed-wise. Obviously, allocating those objects on the stack would speed things up, while using generations in the GC would be less effective, because the GC would see less short-lived objects. I've no idea, however, how those two effects would combine.. Most probably it depends very much on the actual application (as do most things ;)
> 
> 
> xs0
October 06, 2006
Marcio wrote:
>     I also don't see how a moving GC can live with the D features. I know that having a non-moving GC can have people produce code that is not portable across compiler implementations.

In C#, a block of code can't create or access pointers unless it uses the "unsafe" keyword. And, any reference types within the method body have to be specifically pinned at one memory location (using a "fixed" block) whenever a pointer might be accessing that memory.

It works really well. The vast majority of my code uses the default GC behavior, but every once in a while, I need to do some pointer manipulation, and I temporarily tell the garbage collector to refrain from moving my objects.

I think a similar system could work in D.

--Benji Smith
October 06, 2006
Chad J > wrote:
> xs0 wrote:
>> Chad J > wrote:
>>
>>> xs0 wrote:
>>>
>>>>
>>>> While I'm no expert, I doubt a moving GC is even possible in a systems language like D.
>>>>
>>>> First, if you move things around, you obviously need to be precise when updating pointers, lest all hell breaks loose. But how do you update
>>>>
>>>> union {
>>>>     int a;
>>>>     void* b;
>>>> }
>>>>
>>>> ? While you could forbid overlap of pointers and non-pointer data, what about custom allocators, assembler, C libraries (including OS runtime!), etc.?
>>>
>>>
>>> For the union, I might suggest a more acceptable tradeoff - mandate that some data be inserted before every union to tell the gc which member is selected at any moment during program execution.  Whenever an assignment is done to the union, code is inserted to update the union's status.  So your union would look more like this:
>>>
>>> enum StructStatus
>>> {
>>>   a,
>>>   b,
>>> }
>>>
>>> struct
>>> {
>>>   StructStatus status; //or size_t, whichever works
>>>
>>>   union
>>>   {
>>>     int a;
>>>     void* b;
>>>   }
>>> }
>>
>>
>> Well, you can't simply mandate that, I think.. It can be the case that the exact structure of a struct/union is mandated by requirements external to the application..
>>
> 
> I think you can though, the same way that all D classes are prepended by a pointer to a vtable and a 'monitor', or the same way that dynamic arrays have not just a pointer, but an added length variable.
> 
>> And, even if it was an option, it doesn't really solve the precision issue. To be precise, the GC must know the internal structure of anything on its heap, which I don't think is possible unless the language was severely restricted or it was required that the programmer supplies that information manually whenever the purpose of a part of memory changes from pointer to non-pointer and vice-versa. Both options suck quite a lot.
>>
> 
> But it can be precise with said union.  It will know at compile time where all of the unions are stored, in the same way it knows where pointers are stored.  Then, when it looks at the unions, it determines at runtime whether or not they contain pointers by using the extra member info.  Either the member info says the current member is a pointer, or it says that the current member isn't a pointer.  There is no maybe.
> 
> IMO this is not a severe restriction, and it does not require any programmer intervention.  The compiler writes all of the code that sets unions' member info variables, and to the programmer the member info is readonly and needs no maintenance.
> 
>>
>>> Not sure how custom allocators mess up the GC, I thought these were just on their own anyways.  If a pointer to something is outside of the GC heap, the GC doesn't bother changing it or collecting it or moving anything.
>>
>>
>> That's true, but a custom allocator can use the GC heap if it wants. For example, if I know I'll be allocating a million linked list nodes, and will then release them all at once, it can be considerably faster to allocate a large byte[] and use chunks of it for my nodes. But how can the GC tell which of those bytes are pointers and which aren't? At the beginning none are, but later some may be..
>>
> 
> Hmmm.  In the case of the allocator I think it is the allocator writer's responsibility to use gc.addRange() on all of the pointers that they toss into the byte[].  What bugs me now though, is other arbitrary data cases, like std.boxer.
> 
> So far the only answer I can think of is to make people dealing with arbitrarily typed data be very careful about gc.addRange'ing and gc.removeRange'ing any references that go into or out of their arbitrarily typed data, which does suck.
> 
>>
>>> Assembler is a bit tricky, maybe someone smarter than I can handle it better, but here's a shot with some pseudo-asm:
>>>
>>> struct Foo
>>> {
>>>   int member1;
>>>   int member2;
>>> }
>>> Foo bar;
>>>
>>> ...
>>>
>>> Foo* foo = &bar;
>>> int extracted;
>>> // foo spotted in the assembly block, never mind the context
>>> // as such, foo gets pinned.
>>> asm
>>> {
>>>   mov EAX, foo;         // EAX = foo;
>>>   add EAX, 4;           // EAX += 4;
>>>   mov extracted, [EAX]; // extracted = *EAX; or extracted = foo.member2;
>>> }
>>> // foo is unpinned here
>>
>>
>> I'm sure simple cases can be detected, but am also fairly certain it's impossible to generally determine which registers hold pointers and which don't, and/or what will get referenced and what won't.
>>
> 
> Yeah, I realized stuff like ASM that rewrites/rearranges the stack would mess up the GC.  Perhaps we just need a comprehensive list of dos, don'ts, and workarounds for D asm programming?
> 
>>
>>> As for C libraries, it seems like the same thing as custom allocators. The C heap is outside of the GC's jurisdiction and won't be moved or manipulated in any way.  C code that handles D objects will have to be careful, and the callee D code will have to pin the objects before they go out into the unknown.
>>
>>
>> I meant precision again, not C's heap. Even if compiled D code somehow notifies the GC what's on the stack, the C code definitely won't. Then, how can the GC tell which parts of the stack point to its own heap, and which are just integers that happen to look like they could point to the heap?
>>
> 
> It already knows where pointers are in each frame of the stack for D code, can't we make it also know which frames aren't D code at all?  For the C code frames it doesn't look at them at all.  Any D code that could let references out into C code (only way I know of is with extern(C)) should pin that object on the D side and keep a reference until execution has returned to D.
> 
>>
>>> Also, I wonder, if I were to make a tool that does escape analysis on your program, then finds that a number of classes can either be stack allocated or safely deleted after they reach a certain point in the code, then would this change the effectiveness of a generational GC? Perhaps part of why young objects die so often is because they are temporary things that can often be safely deleted at the end of scope or some such.
>>
>>
>> Well, I think it's quite hard to guess whether the result would be better or worse, speed-wise. Obviously, allocating those objects on the stack would speed things up, while using generations in the GC would be less effective, because the GC would see less short-lived objects. I've no idea, however, how those two effects would combine.. Most probably it depends very much on the actual application (as do most things ;)
>>
>>
>> xs0

Why do you need a 100% moving GC anyway? Wouldn't it be possible to combine a moving GC with mark-and-sweep for those cases (like unions and asm) where you just can't be sure? In other words, a 'conservative moving GC'. (Might be a nightmare to implement, of course).
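Such hybrids exist; Bartlett's "mostly-copying" collectors work roughly this way, pinning objects that are only conservatively referenced (from unions, asm, the C stack) and moving the rest. A toy C sketch of that pin-or-evacuate decision (names invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Obj {
    int value;
    bool pinned;           /* conservatively referenced: may not move  */
    struct Obj *forwarded; /* where the object lives after collection  */
} Obj;

/* Hybrid sweep: unpinned objects are evacuated into to-space and get
   a forwarding pointer so precise references can be updated; pinned
   objects stay in place and forward to themselves. Returns the number
   of objects that moved. */
static size_t compact(Obj *from, size_t n, Obj *to) {
    size_t moved = 0;
    for (size_t i = 0; i < n; i++) {
        if (from[i].pinned) {
            from[i].forwarded = &from[i];
        } else {
            to[moved] = from[i];
            from[i].forwarded = &to[moved];
            moved++;
        }
    }
    return moved;
}
```

Fragmentation is then bounded by the pinned objects alone, while everything the GC can identify precisely still gets compacted.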