October 09, 2010 GC interpreting integer values as pointers

Hi!

In my D programs I am having problems with objects not getting finalised although there is no reference anymore. It turned out that this is caused by integers which happen to have values corresponding to pointers into the heap. So I wrote a test program to check the GC behaviour concerning integer values:

----------------------------------------
import std.stdio;
import core.memory;

class C {
  string s;
  this(string s) { this.s=s; }
  ~this() { writeln(s); }
}

struct S {
  uint r;
  this(uint x) { r = x; }
}

class X {
  C c;
  uint r;
  S s;
  uint[int] a;
  uint* p;
  this() {
    c = new C("reference");
    new C("no reference");
    r = cast(uint) cast(void*) new C("uint");
    s = S(cast(uint) cast(void*) new C("struct"));
    a[0] = cast(uint) cast(void*) new C("AA");
    p = new uint;
    *p = (cast(uint) cast(void*) new C("new uint"));
  }
}

void main(string[] args) {
  X x = new X;
  GC.collect();
  writefln("========== %s, %x, %x, %x, %x", x.c.s, x.r, x.s.r, x.a[0], *x.p);
}
----------------------------------------

This writes:

new uint
no reference
========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
AA
struct
uint
reference

So in most but not all situations the integer value keeps the object from getting finalised. This observation corresponds to the effects I saw in my programs.

I find this rather unfortunate. Is this known, documented behaviour? In a typical program there are such integer values all over the place. How should such values be stored to avoid unwanted interaction with the GC?

Thanks,
Ivo
October 11, 2010 Re: GC interpreting integer values as pointers
Posted in reply to Ivo Kasiuk

== Quote from Ivo Kasiuk (i.kasiuk@gmx.de)'s article
> Hi!
~snip
> ----------------------------------------
> This writes:
> new uint
> no reference
> ========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
> AA
> struct
> uint
> reference
> So in most but not all situations the integer value keeps the object from getting finalised. This observation corresponds to the effects I saw in my programs.
> I find this rather unfortunate. Is this known, documented behaviour? In a typical program there are such integer values all over the place. How should such values be stored to avoid unwanted interaction with the GC?
> Thanks,
> Ivo

In D1:

import std.stdio;
import std.gc;

class C {
  string s;
  this(string s) { this.s=s; }
  ~this() { writefln(s); }
}

class X {
  C c;
  uint r;
  uint[int] a;
  uint* p;
  this() {
    c = new C("reference");
    new C("no reference");
    r = cast(uint) cast(void*) new C("uint");
    a[0] = cast(uint) cast(void*) new C("AA");
    p = new uint;
    *p = (cast(uint) cast(void*) new C("new uint"));
  }
}

void main(string[] args) {
  X x = new X;
  std.gc.fullCollect();
  writefln("========== %s, %x, %x, %x", x.c.s, x.r, x.a[0],*x.p);
}

Writes:

no reference
========== reference, ad3fd0, ad3fb0, ad3f90
new uint << ;)
AA
uint
reference
October 11, 2010 Re: GC interpreting integer values as pointers
Posted in reply to %u

> ~snip
> > ----------------------------------------
> > This writes:
> > new uint
> > no reference
> > ========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
> > AA
> > struct
> > uint
> > reference
...
> > Thanks,
> > Ivo
>
> In D1:
...
> Writes:
> no reference
> ========== reference, ad3fd0, ad3fb0, ad3f90
> new uint << ;)
> AA
> uint
> reference

Thanks for trying it out in D1.

So, summing up this means that:
- In most cases, memory is by default scanned for pointers regardless of the actual data types.
- In D2, newly allocated memory for a non-pointer data type (like "new uint" or "new uint[10]") is not scanned by default.
- In D1, you have to use hasNoPointers if you want some memory not to be scanned.

Is this observation correct?

And what about structs/classes that have integer fields as well as pointer/reference fields?
And what about associative arrays - apparently these are scanned even if the type is uint?

Ivo
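For anyone who wants to check the second point directly rather than infer it from finaliser output, here is a minimal sketch. It assumes a D2 compiler with druntime's core.memory; the helper isNoScan and the structs OnlyInts and WithPointer are made-up names for illustration and do not come from the posts above. It asks the GC for the attributes of a few freshly allocated blocks and reports whether NO_SCAN is set.

```d
import core.memory : GC;
import std.stdio : writefln;

// Hypothetical illustration types: one without pointers, one with.
struct OnlyInts    { uint a; uint b; }
struct WithPointer { uint a; uint* q; }

/// Returns true if the GC block containing p carries the NO_SCAN attribute.
/// GC.query accepts interior pointers, so array .ptr values are fine here.
bool isNoScan(void* p)
{
    return (GC.query(p).attr & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    auto single = new uint;          // plain integer on the heap
    auto array  = new uint[10];      // array of integers
    auto ints   = new OnlyInts;      // struct with integer fields only
    auto mixed  = new WithPointer;   // struct mixing an integer and a pointer

    writefln("new uint        NO_SCAN: %s", isNoScan(single));
    writefln("new uint[10]    NO_SCAN: %s", isNoScan(array.ptr));
    writefln("new OnlyInts    NO_SCAN: %s", isNoScan(ints));
    writefln("new WithPointer NO_SCAN: %s", isNoScan(mixed));
}
```

The mixed struct is the interesting case for the question above: if its block is not marked NO_SCAN, the integer field a sits in memory that the collector scans.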
October 11, 2010 Re: GC interpreting integer values as pointers
Posted in reply to Ivo Kasiuk

== Quote from Ivo Kasiuk (i.kasiuk@gmx.de)'s article
> > ~snip
> > > ----------------------------------------
> > > This writes:
> > > new uint
> > > no reference
> > > ========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
> > > AA
> > > struct
> > > uint
> > > reference
> ...
> > > Thanks,
> > > Ivo
> >
> > In D1:
> ...
> > Writes:
> > no reference
> > ========== reference, ad3fd0, ad3fb0, ad3f90
> > new uint << ;)
> > AA
> > uint
> > reference
> Thanks for trying it out in D1.
> So, summing up this means that:
> - In most cases, memory is by default scanned for pointers regardless of the actual data types.
> - In D2, newly allocated memory for a non-pointer data type (like "new uint" or "new uint[10]") is not scanned by default.

Isn't p a pointer data type?
I didn't even know I could do "i = new int;" :D

> - In D1, you have to use hasNoPointers if you want some memory not to be scanned.
> Is this observation correct?
> And what about structs/classes that have integer fields as well as pointer/reference fields?
> And what about associative arrays - apparently these are scanned even if the type is uint?
> Ivo

I added the struct again and also ran without the enclosing X class.

With X :
no reference
========== reference, ad3fd0, ad3fc0, ad3fa0, ad3f80
new uint
AA
struct
uint
reference

Without X :
no reference
========== reference, ad2fd0, ad2fc0, ad2fa0, ad2f80
new uint

--
import std.stdio;
import std.gc;

class C {
  string s;
  this(string s) { this.s=s; }
  ~this() { writefln(s); }
}

struct S {
  uint r;
  static S opCall(uint x) {
    S s;
    s.r = x;
    return s;
  }
}

class X{
  C c;
  uint r;
  S s;
  uint[int] a;
  uint* p;
  this() {
    c = new C("reference");
    new C("no reference");
    r = cast(uint) cast(void*) new C("uint");
    s = S(cast(uint) cast(void*) new C("struct"));
    a[0] = cast(uint) cast(void*) new C("AA");
    p = new uint;
    *p = (cast(uint) cast(void*) new C("new uint"));
  }
}

void main(string[] args) {
  /+
  c = new C("reference");
  new C("no reference");
  r = cast(uint) cast(void*) new C("uint");
  s = S(cast(uint) cast(void*) new C("struct"));
  a[0] = cast(uint) cast(void*) new C("AA");
  p = new uint;
  *p = (cast(uint) cast(void*) new C("new uint"));
  +/
  X x = new X;
  std.gc.fullCollect();
  writefln("========== %s, %x, %x, %x, %x", x.c.s, x.r, x.s.r, x.a[0],*x.p);
  //writefln("========== %s, %x, %x, %x, %x", c.s, r, s.r, a[0],*p);
}
October 11, 2010 Re: GC interpreting integer values as pointers
Posted in reply to %u

> > > ~snip
> > > > ----------------------------------------
> > > > This writes:
> > > > new uint
> > > > no reference
> > > > ========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
> > > > AA
> > > > struct
> > > > uint
> > > > reference
> > ...
> > > > Thanks,
> > > > Ivo
> > >
> > > In D1:
> > ...
> > > Writes:
> > > no reference
> > > ========== reference, ad3fd0, ad3fb0, ad3f90
> > > new uint << ;)
> > > AA
> > > uint
> > > reference
> > Thanks for trying it out in D1.
> > So, summing up this means that:
> > - In most cases, memory is by default scanned for pointers regardless of the actual data types.
> > - In D2, newly allocated memory for a non-pointer data type (like "new uint" or "new uint[10]") is not scanned by default.
> Isn't p a pointer data type?
> I didn't even know I could do "i = new int;" :D

What I mean is that p is pointing to data which has a simple data type (not a struct/class/union) that is not a pointer/reference type. For instance, with "p = new uint[10]" the compiler knows that the newly allocated memory that p points to does not contain any pointers. With D2, that seems to cause the memory not to be scanned.

> > - In D1, you have to use hasNoPointers if you want some memory not to be scanned.
> > Is this observation correct?
> > And what about structs/classes that have integer fields as well as pointer/reference fields?
> > And what about associative arrays - apparently these are scanned even if the type is uint?
> > Ivo
>
> I added the struct again and also ran without the enclosing X class.
>
> With X :
> no reference
> ========== reference, ad3fd0, ad3fc0, ad3fa0, ad3f80
> new uint
> AA
> struct
> uint
> reference
>
> Without X :
> no reference
> ========== reference, ad2fd0, ad2fc0, ad2fa0, ad2f80
> new uint
...

No surprises with the struct.

And the "Without X" example... I am not sure, with the variables all in the current stack frame that might be a special case. What about global variables instead:

...
C c;
uint r;
S s;
uint[int] a;
uint* p;
uint[] arr;

void f() {
  c = new C("reference");
  new C("no reference");
  r = cast(uint) cast(void*) new C("uint");
  s = S(cast(uint) cast(void*) new C("struct"));
  a[0] = cast(uint) cast(void*) new C("AA");
  p = new uint;
  *p = (cast(uint) cast(void*) new C("new uint"));
  arr = new uint[1];
  arr[0] = (cast(uint) cast(void*) new C("array"));
}

void main(string[] args) {
  f();
  GC.collect();
  writefln("========== %s, %x, %x, %x, %x, %x", c.s, r, s.r, a[0], *p, arr[0]);
}

That gives me (with D2):

array
new uint
no reference
========== reference, f74c3e20, f74c3e10, f74c3df0, f74c3dd0, f74c3db0
AA
struct
uint
reference
October 12, 2010 Re: GC interpreting integer values as pointers
Posted in reply to Ivo Kasiuk

== Quote from Ivo Kasiuk (i.kasiuk@gmx.de)'s article
> > I added the struct again and also ran without the enclosing X class.
> >
> > With X :
> > no reference
> > ========== reference, ad3fd0, ad3fc0, ad3fa0, ad3f80
> > new uint
> > AA
> > struct
> > uint
> > reference
> >
> > Without X :
> > no reference
> > ========== reference, ad2fd0, ad2fc0, ad2fa0, ad2f80
> > new uint
> ...
> No surprises with the struct.
> And the "Without X" example... I am not sure, with the variables all in the current stack frame that might be a special case. What about global variables instead:

Actually, those were global variables: I simply commented out the encapsulating class and constructor. But I left all the allocation in the main.. would that matter?

> ...
> C c;
> uint r;
> S s;
> uint[int] a;
> uint* p;
> uint[] arr;
> void f() {
>   c = new C("reference");
>   new C("no reference");
>   r = cast(uint) cast(void*) new C("uint");
>   s = S(cast(uint) cast(void*) new C("struct"));
>   a[0] = cast(uint) cast(void*) new C("AA");
>   p = new uint;
>   *p = (cast(uint) cast(void*) new C("new uint"));
>   arr = new uint[1];
>   arr[0] = (cast(uint) cast(void*) new C("array"));
> }
> void main(string[] args) {
>   f();
>   GC.collect();
>   writefln("========== %s, %x, %x, %x, %x, %x", c.s, r, s.r, a[0], *p, arr[0]);
> }
> That gives me (with D2):
> array
> new uint
> no reference
> ========== reference, f74c3e20, f74c3e10, f74c3df0, f74c3dd0, f74c3db0
> AA
> struct
> uint
> reference
October 14, 2010 Re: GC interpreting integer values as pointers
Posted in reply to Ivo Kasiuk

On Sat, 09 Oct 2010 15:51:37 -0400, Ivo Kasiuk <i.kasiuk@gmx.de> wrote:

> Hi!
>
> In my D programs I am having problems with objects not getting finalised although there is no reference anymore. It turned out that this is caused by integers which happen to have values corresponding to pointers into the heap. So I wrote a test program to check the GC behaviour concerning integer values:
>
[snip]

> So in most but not all situations the integer value keeps the object from getting finalised. This observation corresponds to the effects I saw in my programs.
>
> I find this rather unfortunate. Is this known, documented behaviour? In a typical program there are such integer values all over the place. How should such values be stored to avoid unwanted interaction with the GC?

Yes, D's garbage collector is a conservative garbage collector. One which doesn't have this problem is called a precise garbage collector.

There are two problems here. First, D has unions, so it is impossible for the GC to determine if a union contains an integer or a pointer.

Second problem is the granularity of scanning. A memory block is scanned as if every n bits (n being your architecture) is a pointer, or there are no pointers. This is determined by a bit associated with the block (the NO_SCAN bit).

If you allocate a memory block that contains at least one pointer, then all the words in the memory block are considered to be pointers by the GC. There is a (continually updated) patch which allows the GC to be semi-precise. That is, the type information of the memory block will be linked to it. This will allow precise scanning except for unions. Once this is integrated, the false pointer problem will be much less prevalent.

-Steve
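The per-block NO_SCAN bit described above can also be set by hand, which is one way to keep pointer-like integers away from the collector. The following is only a sketch under that assumption: allocUnscanned is a hypothetical helper name, while GC.malloc, GC.getAttr, GC.setAttr, GC.clrAttr and GC.BlkAttr.NO_SCAN are from druntime's core.memory.

```d
import core.memory : GC;

/// Hypothetical helper: allocates n uints in a GC block flagged NO_SCAN,
/// so the collector will not treat the stored values as pointers.
uint[] allocUnscanned(size_t n)
{
    auto p = cast(uint*) GC.malloc(n * uint.sizeof, GC.BlkAttr.NO_SCAN);
    auto values = p[0 .. n];
    values[] = 0;              // GC.malloc does not clear the memory
    return values;
}

void main()
{
    auto values = allocUnscanned(4);
    values[0] = 0xf7490e20;    // looks like a heap address, but is never scanned
    assert((GC.getAttr(values.ptr) & GC.BlkAttr.NO_SCAN) != 0);

    // The bit can also be toggled on a block that already exists.
    auto block = GC.malloc(64);               // scanned by default
    assert((GC.getAttr(block) & GC.BlkAttr.NO_SCAN) == 0);
    GC.setAttr(block, GC.BlkAttr.NO_SCAN);    // stop scanning this block
    assert((GC.getAttr(block) & GC.BlkAttr.NO_SCAN) != 0);
    GC.clrAttr(block, GC.BlkAttr.NO_SCAN);    // scan it again
    assert((GC.getAttr(block) & GC.BlkAttr.NO_SCAN) == 0);
}
```

Because the flag applies to the whole block, this only helps when the integers live in a block of their own; an integer field next to a real reference in the same object is still scanned.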
October 14, 2010 Re: GC interpreting integer values as pointers
Posted in reply to Steven Schveighoffer

Steven Schveighoffer:
> There are two problems here. First, D has unions, so it is impossible for the GC to determine if a union contains an integer or a pointer.
D has unions, and sometimes normal C-style unions are useful. But in many situations when you have a union you also keep a tag that represents the type, so in many of those situations you may use the tagged union of Phobos, std.variant.Algebraic (if the Phobos implementation is good enough, currently unfinished and not good enough yet) and the D GC may be aware and read and use the tag of an Algebraic union to know at runtime what's the type. This improves the GC precision a little.
Bye,
bearophile
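To make the "union plus tag" idea concrete, here is a hand-rolled sketch. It is not std.variant.Algebraic and not anything the GC currently understands; IntOrPtr, fromInt, fromPtr and holdsPointer are hypothetical names chosen for illustration. The tag is the piece of runtime information a tag-aware collector would have to consult before deciding whether the payload is a pointer.

```d
/// Hypothetical tagged union: the tag records which union member is live.
struct IntOrPtr
{
    enum Tag { integer, pointer }

    Tag tag;
    union
    {
        size_t asInt;   // valid when tag == Tag.integer
        void*  asPtr;   // valid when tag == Tag.pointer
    }

    static IntOrPtr fromInt(size_t v)
    {
        IntOrPtr r;
        r.tag = Tag.integer;
        r.asInt = v;
        return r;
    }

    static IntOrPtr fromPtr(void* p)
    {
        IntOrPtr r;
        r.tag = Tag.pointer;
        r.asPtr = p;
        return r;
    }

    /// What a tag-aware scanner would ask before following the payload.
    bool holdsPointer() const { return tag == Tag.pointer; }
}

void main()
{
    int x;
    auto a = IntOrPtr.fromInt(0xad3fd0);   // integer that merely looks like an address
    auto b = IntOrPtr.fromPtr(&x);         // a real pointer

    assert(!a.holdsPointer);
    assert(b.holdsPointer && b.asPtr == cast(void*) &x);
}
```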
October 14, 2010 Re: GC interpreting integer values as pointers
Posted in reply to bearophile

On Thu, 14 Oct 2010 12:39:33 -0400, bearophile <bearophileHUGS@lycos.com> wrote:
> Steven Schveighoffer:
>
>> There are two problems here. First, D has unions, so it is impossible for
>> the GC to determine if a union contains an integer or a pointer.
>
> D has unions, and sometimes normal C-style unions are useful. But in many situations when you have a union you also keep a tag that represents the type, so in many of those situations you may use the tagged union of Phobos, std.variant.Algebraic (if the Phobos implementation is good enough, currently unfinished and not good enough yet) and the D GC may be aware and read and use the tag of an Algebraic union to know at runtime what's the type. This improves the GC precision a little.
Unions are rare enough that I think this may not be worth doing. But yes, it could be had.
-Steve
October 14, 2010 Re: GC interpreting integer values as pointers
Posted in reply to Steven Schveighoffer

> On Sat, 09 Oct 2010 15:51:37 -0400, Ivo Kasiuk <i.kasiuk@gmx.de> wrote:
>
> > Hi!
> >
> > In my D programs I am having problems with objects not getting finalised although there is no reference anymore. It turned out that this is caused by integers which happen to have values corresponding to pointers into the heap. So I wrote a test program to check the GC behaviour concerning integer values:
> >
>
> [snip]
>
> > So in most but not all situations the integer value keeps the object from getting finalised. This observation corresponds to the effects I saw in my programs.
> >
> > I find this rather unfortunate. Is this known, documented behaviour? In a typical program there are such integer values all over the place. How should such values be stored to avoid unwanted interaction with the GC?
>
> Yes, D's garbage collector is a conservative garbage collector. One which doesn't have this problem is called a precise garbage collector.
>
> There are two problems here. First, D has unions, so it is impossible for the GC to determine if a union contains an integer or a pointer.
>
> Second problem is the granularity of scanning. A memory block is scanned as if every n bits (n being your architecture) is a pointer, or there are no pointers. This is determined by a bit associated with the block (the NO_SCAN bit).
>
> If you allocate a memory block that contains at least one pointer, then all the words in the memory block are considered to be pointers by the GC. There is a (continually updated) patch which allows the GC to be semi-precise. That is, the type information of the memory block will be linked to it. This will allow precise scanning except for unions. Once this is integrated, the false pointer problem will be much less prevalent.
>
> -Steve
Thanks! This absolutely makes sense. It is basically a trade-off between
precision and efficiency of the GC.
Slowly, I am learning all the little details of D's garbage collection.
It is more complicated than it seems at first, but understanding it
better greatly helps to write better programs in terms of memory
management.
There is one case though that I am still not sure about: associative arrays. It seems that keys as well as values in AAs are scanned for pointers even if both are integer types. How can I tell the GC that I do not want them to be scanned? I know about the NO_SCAN flag but what memory region should it be applied to in this case?
BTW: considering the "conservative" scanning, the implementation of Object.toHash() is somewhat interesting:
hash_t toHash()
{
    // BUG: this prevents a compacting GC from working, needs to be fixed
    return cast(hash_t)cast(void*)this;
}
So an object's hash value will keep the GC from freeing the object, if that value is scanned. But as the comment indicates, this implementation needs to be changed anyway (I am eager to see the result). A compacting GC probably gives rise to some whole new problems.
Ivo
Copyright © 1999-2021 by the D Language Foundation