Thread overview
Should all Arrays be more like Objects
Jan 22, 2002
Mike Wynn
Jan 22, 2002
Walter
Jan 23, 2002
Russ Lewis
Jan 23, 2002
Mike Wynn
Jan 23, 2002
Russ Lewis
Jan 24, 2002
Walter
Jan 24, 2002
Russ Lewis
Jan 25, 2002
Walter
January 22, 2002
As currently implemented Static arrays are on the stack, and you can without
warnings pass them to functions or constructors as dynamic arrays, which may
cause the dynamic array reference to remain live after the stack frame has
gone.
more alarming than this is my hypothosis that static arrays within objects
are not noticed by the GC and thus if I hold onto a reference to an array
within an object but not the object. The array will may be freed from under
my feet by the GC.

I understand the need for structures and static arrays, and the desire to
get down and dirty. but you can do that and remain safe if constrained a
little.
as point of reference C and C++ have the register keyword, and in C is means
behaves LIKE a register (not IS A register) but like one; so it has no
address
but in C++ it means optimise this value's use, the former is IMHO a much
nicer behaviour, as a programmer I can restrict what I can do with the
variable allowing the compiler to optimise if it feels in the mood to do so.

if the objectives of D are to reduce the pit falls when writing OO
programmes then
and classes as all dynamic ...
either all Arrays and Structures are dynamic too
or an automatic or stack attribute for method parameters
two rules (I think), references to stack items can not be stored in dynamic
items
and references to stack items of the current frame can not be put into any
items parameters refer to.
so a stack parameter can only be put into stack items of the current frame
or passed to methods that take stack parameters without manual intervention.
The programmer would have to create a dynamic copy of the static item. i.e.
dup the array.
because D is static, the compiler could do the analysis for you.

Mike.


January 22, 2002
"Mike Wynn" <mike.wynn@l8night.co.uk> wrote in message news:a2k4gh$cc7$1@digitaldaemon.com...
> As currently implemented Static arrays are on the stack, and you can
without
> warnings pass them to functions or constructors as dynamic arrays, which
may
> cause the dynamic array reference to remain live after the stack frame has gone.

Yes, that is a general problem with anything on the stack you take a pointer to.


> more alarming than this is my hypothosis that static arrays within objects are not noticed by the GC and thus if I hold onto a reference to an array within an object but not the object. The array will may be freed from
under
> my feet by the GC.

No, the GC will recognize that case.


> I understand the need for structures and static arrays, and the desire to
> get down and dirty. but you can do that and remain safe if constrained a
> little.
> as point of reference C and C++ have the register keyword, and in C is
means
> behaves LIKE a register (not IS A register) but like one; so it has no
> address
> but in C++ it means optimise this value's use, the former is IMHO a much
> nicer behaviour, as a programmer I can restrict what I can do with the
> variable allowing the compiler to optimise if it feels in the mood to do
so.
>
> if the objectives of D are to reduce the pit falls when writing OO
> programmes then
> and classes as all dynamic ...
> either all Arrays and Structures are dynamic too
> or an automatic or stack attribute for method parameters
> two rules (I think), references to stack items can not be stored in
dynamic
> items
> and references to stack items of the current frame can not be put into any
> items parameters refer to.
> so a stack parameter can only be put into stack items of the current frame
> or passed to methods that take stack parameters without manual
intervention.
> The programmer would have to create a dynamic copy of the static item.
i.e.
> dup the array.
> because D is static, the compiler could do the analysis for you.

Those are good points, and I thought a long time about them in the design of D. I eventually came to the conclusion that needing to make copies of arrays and objects all the time was too much overhead.




January 23, 2002
The GC could check reference counts on stack objects as they go out of scope. You could throw a "ReferencesRemainException" if somebody is still pointing to it.

This is much like my idea that delete should throw an exception if references remain....

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]


January 23, 2002
As I read it the GC is a Mark and sweep, conservative collector.
so no ref counts to use, to add them would waste lots of time for all the
cases when not required and to only have them for stack arrays would require
the same analysis as for automatic promotion to heap arrays.

so the only way to see if it's live is to walk the whole heap every time you
exit a non leaf method that has passed an array as a parameter.
and to put more spanners in the works, a conservative collector can get it
wrong so it may not be live, you may just have an int with the same value.

Who gets the exception., the thread who's stack has been used or the
thread(s) with the array.

instead of going to the expence of trying to detect it, why no disallow it,
or make sure it never happens
if all array where objects except those within objects as the GC keeps the
parent alive
you never have a problem.
the compiler can optimise arrays in leaf methods, and with good memory and
one that does not coalesce too argessively then the right size block will be
available.

Mike.

"Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:3C4EE3D3.30B44B51@deming-os.org...
> The GC could check reference counts on stack objects as they go out of
scope.
> You could throw a "ReferencesRemainException" if somebody is still
pointing to
> it.
>
> This is much like my idea that delete should throw an exception if
references
> remain....
>
> --
> The Villagers are Online! villagersonline.com
>
> .[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
> .[ (a version.of(English).(precise.more)) is(possible) ]
> ?[ you want.to(help(develop(it))) ]
>
>




January 23, 2002
Mike Wynn wrote:

> As I read it the GC is a Mark and sweep, conservative collector.
> so no ref counts to use, to add them would waste lots of time for all the
> cases when not required and to only have them for stack arrays would require
> the same analysis as for automatic promotion to heap arrays.

You're right, sorry.  I forgot about it.  I keep thinking about it as a reference-counting collector because I think that it should be one :)

NOTE TO WALTER: David Bacon of IBM Research has implemented a reference counting collector that does loop detection without any other algorithms.  That means no mark-and-sweep, stack traversal, or anything like that.  It's called Recycler and is implemented on the Jalapeno JVM.

http://www.research.ibm.com/people/d/dfb   is his home page, and http://www.research.ibm.com/people/d/dfb/papers.html#Bacon01Java describes how he does it.

> instead of going to the expence of trying to detect it, why no disallow it,
> or make sure it never happens
> if all array where objects except those within objects as the GC keeps the
> parent alive
> you never have a problem.
> the compiler can optimise arrays in leaf methods, and with good memory and
> one that does not coalesce too argessively then the right size block will be
> available.

I thought about proposing a function pointer type modifier that would allow you to declare pointers (passed as parameters) that couldn't be saved after the function returns...syntax error if you did otherwise.  However, that breaks this code:

int StartJob(char command[]);   // returns an id
int WaitForJobToFinish(int ID);
void MyFunc()
{
    char command = "do stuff";
    int id = StartJob(command);
    WaitForJobToFinish(id);
};

You want to ensure that all pointers to 'command' are gone before MyFunc()
exits.  But we can't because the thing we're calling has to save a pointer to it
between the calls to StartJob() and WaitForJobToFinish()...unless we force
StartJob() to copy the buffer, but then we're getting unnecessary overhead
again.

I'm open to ideas here...

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]


January 24, 2002
"Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:3C4F034C.BD4D051A@deming-os.org...
> NOTE TO WALTER: David Bacon of IBM Research has implemented a reference
counting
> collector that does loop detection without any other algorithms.  That
means no
> mark-and-sweep, stack traversal, or anything like that.  It's called
Recycler
> and is implemented on the Jalapeno JVM.
>
> http://www.research.ibm.com/people/d/dfb   is his home page, and http://www.research.ibm.com/people/d/dfb/papers.html#Bacon01Java describes
how
> he does it.

Thanks! But I don't think reference counting can work with D, since the only pointer to an object might be an interior pointer. I don't see how that can work with reference counts.


January 24, 2002
"Walter" <walter@digitalmars.com> wrote in message news:a2nt3q$2v4f$2@digitaldaemon.com...

> Thanks! But I don't think reference counting can work with D, since the
only
> pointer to an object might be an interior pointer. I don't see how that
can
> work with reference counts.

   Well... There's an option that sounds interesting to me: assuming that
pointers are somewhat deprecated, as they should be in a higher-level
language, they could be implemented more inefficiently as a pointer to an
object, and an offet within that object. Then, reference counting would
work.

   This could be even more interesting if you consider the possibility of
implementing an extra level on indirection in object pointers (references),
where the pointer is an index into an array of active "real object
pointers". In that case, you can easily implement a self-compacting heap.

   Anyway, it's an idea.

   Oh, and you could still have pointers into real memory by making the
object be at position zero. For OS programming (if you still want to have
that as an option), you could also have a special type (call it "address")
which would be just an untyped raw pointer (just like in assembler), and
would be used with casts.

Salutaciones,
                         JCAB



January 24, 2002
Walter wrote:

> Thanks! But I don't think reference counting can work with D, since the only pointer to an object might be an interior pointer. I don't see how that can work with reference counts.

I've been thinking about that.  I see several possible solutions, none of which is optimal.  Be sure to read (5), which I think has the most promise:

SIMPLE IDEAS:

1) All pointers are actually double pointers; one element points to the actual thing being referenced, while one points to the garbage-collectable thing which it is inside of;

2) When setting (or unsetting) a pointer, do a lookup into a table of objects to find what the pointer is inside of.  If this was a hash table, then you would have a reasonably good chance of getting an immediate hit as long as the pointer is NOT an interior pointer...

MORE COMPLEX ONES:

4) Require that all reference-counted objects be on n-byte boundaries (perhaps
32 byte boundaries).  Then, you can find the start of the object as follows:
    a) Put a magic value as the first 32 bits of every reference counted object
    b) Have a memory table (as in #2), but include in the object a pointer to
that table.
    c) When looking for the "root" of an object, first go back to the 32-byte
boundary and check for the magic value.  If you don't find it, keep going back
32 bytes at a time until you do.
    d) When you find a magic value, guess that it's the root, and look at the
"pointer into the memory table."  Confirm that this pointer is to a valid
address in the table and that the table at that entry point points back to this
object.  If so, then you have the object root.  If not, return to (c) and
continue.

5) Make all members of an object also be garbage collected.  However, you set a
flag (or a pointer or offset value) that indicates that the method of "garbage
collecting" a member is simply to decrement the reference count of the parent
object.
    a) When using a pointer to an object to get a pointer to a member of it, you
check the count on the child first.  If this count is 0, then you increment the
count on the parent as well as the child, as you are making the child "refer" to
the parent.
    b) Later, when you lose the pointer to the child, you decrement its count.
If the child ends up "garbage collected", then you don't delete the
memory...instead, you decrement the count on the parent.  Then the parent might
or might not get collected.
    c) Dynamic arrays will have to be implemented as 12 bytes instead of 8; each
array has a pointer to a "root" (root of the entire array, not this slice of
it), and a pointer to the "root of this slice", and a length field.  Thus, you
can always know where to find the reference count when an array reference is
destroyed.
    d) You can't have any GC overhead inside structs (so while you can GC a
struct, you can't GC its members).  Also, it's not practical to have a GC record
for every 'int' inside a class.  Perhaps if you take a pointer to such elements,
you have to uses a special non-GC pointer.  This isn't actually so bad, if you
think about it...
    e) If you absolutely demand GC-ing members of a class:   For small members
of classes, you could group them into n-byte chunks (each of which starts with a
GC record) and require that all classes be allocated on a similar boundary.
Thus, except for pointers into a struct (which have to be non-GC pointers), you
can always find a GC record by taking the pointer and rounding down to the
n-byte boundary.

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]


January 25, 2002
The solutions you propose I think will work, but there'll be considerable added overhead in both execution time and memory consumption. Also, some of the interoperability with C objects will no longer be possible.

I've written some programs using conservative GC and benchmarked them against fast reference counting schemes (written by others). My GC outperformed them, so I'm not sold on the idea. The GC I used is the one that comes with the Phobos library source, and it's a pretty basic one. Upgrading it to a generational collector would produce a big increment in performance.

-Walter

"Russ Lewis" <spamhole-2001-07-16@deming-os.org> wrote in message news:3C5039CF.397B4A12@deming-os.org...
> Walter wrote:
>
> > Thanks! But I don't think reference counting can work with D, since the
only
> > pointer to an object might be an interior pointer. I don't see how that
can
> > work with reference counts.
>
> I've been thinking about that.  I see several possible solutions, none of
which
> is optimal.  Be sure to read (5), which I think has the most promise:
>
> SIMPLE IDEAS:
>
> 1) All pointers are actually double pointers; one element points to the
actual
> thing being referenced, while one points to the garbage-collectable thing
which
> it is inside of;
>
> 2) When setting (or unsetting) a pointer, do a lookup into a table of
objects to
> find what the pointer is inside of.  If this was a hash table, then you
would
> have a reasonably good chance of getting an immediate hit as long as the
pointer
> is NOT an interior pointer...
>
> MORE COMPLEX ONES:
>
> 4) Require that all reference-counted objects be on n-byte boundaries
(perhaps
> 32 byte boundaries).  Then, you can find the start of the object as
follows:
>     a) Put a magic value as the first 32 bits of every reference counted
object
>     b) Have a memory table (as in #2), but include in the object a pointer
to
> that table.
>     c) When looking for the "root" of an object, first go back to the
32-byte
> boundary and check for the magic value.  If you don't find it, keep going
back
> 32 bytes at a time until you do.
>     d) When you find a magic value, guess that it's the root, and look at
the
> "pointer into the memory table."  Confirm that this pointer is to a valid address in the table and that the table at that entry point points back to
this
> object.  If so, then you have the object root.  If not, return to (c) and
> continue.
>
> 5) Make all members of an object also be garbage collected.  However, you
set a
> flag (or a pointer or offset value) that indicates that the method of
"garbage
> collecting" a member is simply to decrement the reference count of the
parent
> object.
>     a) When using a pointer to an object to get a pointer to a member of
it, you
> check the count on the child first.  If this count is 0, then you
increment the
> count on the parent as well as the child, as you are making the child
"refer" to
> the parent.
>     b) Later, when you lose the pointer to the child, you decrement its
count.
> If the child ends up "garbage collected", then you don't delete the memory...instead, you decrement the count on the parent.  Then the parent
might
> or might not get collected.
>     c) Dynamic arrays will have to be implemented as 12 bytes instead of
8; each
> array has a pointer to a "root" (root of the entire array, not this slice
of
> it), and a pointer to the "root of this slice", and a length field.  Thus,
you
> can always know where to find the reference count when an array reference
is
> destroyed.
>     d) You can't have any GC overhead inside structs (so while you can GC
a
> struct, you can't GC its members).  Also, it's not practical to have a GC
record
> for every 'int' inside a class.  Perhaps if you take a pointer to such
elements,
> you have to uses a special non-GC pointer.  This isn't actually so bad, if
you
> think about it...
>     e) If you absolutely demand GC-ing members of a class:   For small
members
> of classes, you could group them into n-byte chunks (each of which starts
with a
> GC record) and require that all classes be allocated on a similar
boundary.
> Thus, except for pointers into a struct (which have to be non-GC
pointers), you
> can always find a GC record by taking the pointer and rounding down to the n-byte boundary.
>
> --
> The Villagers are Online! villagersonline.com
>
> .[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
> .[ (a version.of(English).(precise.more)) is(possible) ]
> ?[ you want.to(help(develop(it))) ]
>
>