gc and Object.toHash() (page 2)

Ben Hinkle wrote: > Here's my guess: > if the GC can tell what is an object reference (for example by allocating > from special heaps that store only objects) then it can get at the class > info and the class info can store a mask that tells the GC which fields > can be updated if the referant is moved. > > That would mean dynamic arrays don't get moved and pointers to structs (well, actually, pointers in general) don't get moved. I don't agree: the question should not be what kind of data you move, but where the reference is stored. References in object are easy to be found, but what about references on the stack or inside of structures/arrays? If the GC wants to move just a single object, it has to be sure to know about *all* references to this object, so it needs to know exactly what piece of data on the stack is a reference or not. It would need to do a complete backtrace and interpret each stackframe according to the corresponding function. I really would like to know what idea Walter has in mind about that. I doubt that it can be done with reasonable effort, in which case it would really make sense to drop that concept from the specs and rather guarantee that references stay fixed.

Norbert Nemec wrote: > Ben Hinkle wrote: > >> Here's my guess: >> if the GC can tell what is an object reference (for example by allocating >> from special heaps that store only objects) then it can get at the class >> info and the class info can store a mask that tells the GC which fields >> can be updated if the referant is moved. >> >> That would mean dynamic arrays don't get moved and pointers to structs (well, actually, pointers in general) don't get moved. > > I don't agree: the question should not be what kind of data you move, but where the reference is stored. > > References in object are easy to be found, but what about references on the stack or inside of structures/arrays? If the GC wants to move just a single object, it has to be sure to know about *all* references to this object, so it needs to know exactly what piece of data on the stack is a reference or not. It would need to do a complete backtrace and interpret each stackframe according to the corresponding function. Agreed. I was thinking any ambiguous object reference from outside the object heap would make the object unmovable but I never considered that only garbage would be moveable then :-) Maybe pure object references on the stack (or maybe everywhere... who knows) can be stored in a giant list. That would make the size of a reference bigger since it stores a pointer to the next reference but it would make traversing the set of references easy. > I really would like to know what idea Walter has in mind about that. I doubt that it can be done with reasonable effort, in which case it would really make sense to drop that concept from the specs and rather guarantee that references stay fixed.

May 24, 2004

Re: gc and Object.toHash()

Posted by Kevin Bealer
in reply to Ben Hinkle

Permalink

Kevin Bealer

Posted in reply to Ben Hinkle

Permalink

In article <c8sri0$28a7$1@digitaldaemon.com>, Ben Hinkle says...
>
>Norbert Nemec wrote:
>
>> Ben Hinkle wrote:
>> 
>>> Here's my guess:
>>> if the GC can tell what is an object reference (for example by allocating
>>> from special heaps that store only objects) then it can get at the class
>>> info and the class info can store a mask that tells the GC which fields
>>> can be updated if the referant is moved.
>>> 
>>> That would mean dynamic arrays don't get moved and pointers to structs (well, actually, pointers in general) don't get moved.
>> 
>> I don't agree: the question should not be what kind of data you move, but where the reference is stored.
>> 
>> References in object are easy to be found, but what about references on the stack or inside of structures/arrays? If the GC wants to move just a single object, it has to be sure to know about *all* references to this object, so it needs to know exactly what piece of data on the stack is a reference or not. It would need to do a complete backtrace and interpret each stackframe according to the corresponding function.
>
>Agreed. I was thinking any ambiguous object reference from outside the
>object heap would make the object unmovable but I never considered that
>only garbage would be moveable then :-)
>Maybe pure object references on the stack (or maybe everywhere... who knows)
>can be stored in a giant list. That would make the size of a reference
>bigger since it stores a pointer to the next reference but it would make
>traversing the set of references easy.

Removing or adding such a pointer is simple, if the list is changed to a doubly linked list.. but the overhead is annoying for single threaded, and much worse for multithreaded, programs.  You could also use a bitmap of stack frames.

I would guess the solution involves sectioning memory into parts that are "fully understood" and "partially understood".  Fully understood memory is D objects where the pointers and non-pointers are clearly known from the code.

During a sweep, objects are classified as dead, moveable, or fixed.  Dead objects are removed because no pointer is found to them.  Moveable objects have pointers, but only from "fully understood" memory.  "Fixed" objects are all objects in the partial heap, plus "fully understood" objects that have a pointer or *possible* pointer, from partially understood memory.

Since most code will not use pointers.  In practice, it doesn't hurt much to leave these objects where they are and move other objects around them.  This is similar to the "defrag" of a FAT filesystem.  You can't move files that are open, so you leave them in place.  You get 99% of the benefit.

This technique is partly based on a cultural distinction.  In C, programmers expect that they know at least as much as the language does about what memory layout is.  The language helps out, but is only a servant.  D programmers will have different expectations; the language runtime controls the heap and knows enough to do its own accounting.  It's like an armed shopkeeper.

Kevin

>> I really would like to know what idea Walter has in mind about that. I doubt that it can be done with reasonable effort, in which case it would really make sense to drop that concept from the specs and rather guarantee that references stay fixed.
>

Kevin Bealer wrote: > I would guess the solution involves sectioning memory into parts that are > "fully > understood" and "partially understood". Fully understood memory is D > objects where the pointers and non-pointers are clearly known from the > code. > ... That might really be an idea! Object references on the stack usually either point to huge objects, short-time objects or to the top of object hierarchies. In those cases where memory-defragmentation really would matter, references are mostly between objects that are saved on the heap for a long time. Furthermore, defragmentation really is only necessary for long-running programs like interactive programs or daemons. These should be fairly easy to optimize for defragmentation, but just trying not to keep too many references in variables on the stack during the top loop. Additionally, the GC could simply offer a way to mark/unmark variables on the stack as references, so their content may be moved as well. (This should not happen automatically, since it would mean quite some overhead, but doing it manually in important cases would not be a problem.

"Brian Hammond" <d at brianhammond dot comBrian_member@pathlink.com> wrote in message news:c8s0hv$v1n$1@digitaldaemon.com... > So Walter, if I understand correctly, the current Object.toHash() implementation > is valid with the current garbage collector but would be broken using a new, > copying collector. Yes, that's right. > If and when the GC impl. is updated, will Object.toHash() be > updated as well? I'd imagine yes, Object.toHash() will always work correctly > else it should be abstract :) > > Just wondering if I need to rewrite code now, when the GC impl. changes, or > never. Thanks. To be safe, right now I'd implement your own toHash() for your derived types.

In article <c8t1s1$2hv7$1@digitaldaemon.com>, Norbert Nemec says... > >Kevin Bealer wrote: > >> I would guess the solution involves sectioning memory into parts that are >> "fully >> understood" and "partially understood". Fully understood memory is D >> objects where the pointers and non-pointers are clearly known from the >> code. >> ... > >That might really be an idea! Object references on the stack usually either point to huge objects, short-time objects or to the top of object hierarchies. In those cases where memory-defragmentation really would matter, references are mostly between objects that are saved on the heap for a long time. > >Furthermore, defragmentation really is only necessary for long-running programs like interactive programs or daemons. These should be fairly easy to optimize for defragmentation, but just trying not to keep too many references in variables on the stack during the top loop. > >Additionally, the GC could simply offer a way to mark/unmark variables on the stack as references, so their content may be moved as well. (This should not happen automatically, since it would mean quite some overhead, but doing it manually in important cases would not be a problem. > Other ideas: The GC could prefer to collect when the stack is "shallow" relative to recent activity. This would cause collections at the end of "natural breaks" in processing. The stack itself could be "well understood" as follows: each function/method has a stack frame with a certain layout. A bitmap of that layout, i.e. "1" bits mean this-is-a-reference, and "0" bits mean "this is piece-of-data", would be created at compile time. When the GC "took over" it would determine which bitmap applied to each frame. If you think about it, a struct is like a stack frame, is like an object's data section. Programs have code and data elements; you can move "local variables" in an object into and out of the object body, to make different parts of it more or less persistent. So, everything that can be done with an object can be done with a stack frame and vice versa. Languages like ruby even allow "closures" where the stack frame stays around after the program exits because a nested function still has references to variables in it. This (potentially) requires garbage collection of the stack frame, which is not fast. It also means (depending on implementation) there really is no stack: just a lot of floating objects that serve as contexts for each other. There are some neat writeups on this. I don't remember where, but it was on the page for "parrot", the perl "assembler language". Slightly mind blowing, if you ask me. Kevin

Forums