A few considerations on garbage collection - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » A few considerations on garbage collection

Thread overview

A few considerations on garbage collection
Apr 30, 2014 Andrei Alexandrescu
Apr 30, 2014 Andrei Alexandrescu
Apr 30, 2014 Orvid King
Apr 30, 2014 bearophile
Apr 30, 2014 Vladimir Panteleev
Apr 30, 2014 Dmitry Olshansky
Apr 30, 2014 Andrei Alexandrescu
Apr 30, 2014 Vladimir Panteleev
Apr 30, 2014 Andrei Alexandrescu
May 01, 2014 Steven Schveighoffer
May 01, 2014 Steven Schveighoffer
May 01, 2014 Jonathan M Davis
May 05, 2014 Marco Leise
May 05, 2014 Orvid King

April 30, 2014

A few considerations on garbage collection

Posted by Andrei Alexandrescu

Andrei Alexandrescu

I'm mulling over a couple of design considerations for allocators, and was thinking of the following restriction:

1. No out-of-bounds tricks and no pointer arithmetic. Consider:

int[] a = new int[1000];
a = a[250 .. 750];
int* p = a[500 .. $].ptr;

Subsequently the GC should be within its rights to deallocate any memory within the first and last 250 integers allocated, even though in theory the user may get to them by using pointer arithmetic.

In particular that means once a slice is shrunk, there's no growing back unless another slice is around.

I think the current GC already does that.

2. The same may be the case for classes WITHOUT destructors. Consider:

class A
{
   int[1000] a;
   int b;
   int[1000] c;
}
int* p = &(new A).b;

The collector should be allowed to deallocate any memory except b's own, even though that means the class has "holes" in it. The current GC does not do that.

2. However, the same shall not be the case for classes with destructors. Consider:

class B
{
   int[1000] a;
   int b;
   int[1000] c;
   ~this() { ... }
}
int* p = &(new B).b;

This class has a destructor, so it will be kept around in its entirety if an internal pointer is held.

3. Classes meant to have destructors called at collection will ALWAYS have been allocated with new (i.e. won't start in the middle of some other allocation). In other words, only class objects created with new will be properly collected. Those forced in odd places with emplace() are the responsibility of the user.

4. Currently, struct objects created with new do NOT have their destructors called during collection. I think this may continue, meaning that structs created with new are somewhat low-level and are meant to be destroyed and deallocated manually.

5. This brings up arrays of structs. As far as I understand, destructors will never be called for these, even after all references are gone:

struct S { ~this() { ... } }
auto a = new S[100];

Unlike (4), arrays of structs are high-level and frequently used. I think we must do something about it, so I plan to support calling destructors for arrays of structs.

6. The point above brings to mind more radical possibilities, such as making all arrays reference-counted and allowing compulsive deallocation when the reference counter goes down to zero. That would rule out things like escaping pointers to data inside arrays, which is quite extreme. But probably worth bringing up in a brainstorming. If we disallow statically constructs that take addresses we may get away with it.

Please chime in with your thoughts.

Andrei

April 30, 2014

Re: A few considerations on garbage collection

Posted by Andrei Alexandrescu
in reply to Andrei Alexandrescu

Andrei Alexandrescu

Posted in reply to Andrei Alexandrescu

On 4/30/14, 8:33 AM, Andrei Alexandrescu wrote:
>
> int[] a = new int[1000];
> a = a[250 .. 750];
> int* p = a[500 .. $].ptr;

Here I meant:

int[] a = new int[1000];
int* p = a[500 .. $].ptr;
a = a[250 .. 750];


Andrei

April 30, 2014

Re: A few considerations on garbage collection

Posted by Orvid King
in reply to Andrei Alexandrescu

Orvid King

Posted in reply to Andrei Alexandrescu

On Wednesday, 30 April 2014 at 15:33:29 UTC, Andrei Alexandrescu wrote:
> 4. Currently, struct objects created with new do NOT have their destructors called during collection. I think this may continue, meaning that structs created with new are somewhat low-level and are meant to be destroyed and deallocated manually.
>
> 5. This brings up arrays of structs. As far as I understand, destructors will never be called for these, even after all references are gone:
>
> struct S { ~this() { ... } }
> auto a = new S[100];
>
> Unlike (4), arrays of structs are high-level and frequently used. I think we must do something about it, so I plan to support calling destructors for arrays of structs.

Although it would be a breaking change, I am intending to propose that the destructors of heap-allocated structs be handled in the same way as destructors of classes in my GC implementation. I am also intending to propose a guarantee that, if the program exits normally, all destructors will be invoked before the application ends. To help facilitate this I intend to have a thread dedicated to calling destructors. I'm still working on typing out the entire design of the GC, but I don't currently see a reason (design-wise) why I'd have to change these things.

April 30, 2014

Re: A few considerations on garbage collection

Posted by bearophile
in reply to Andrei Alexandrescu

bearophile

Posted in reply to Andrei Alexandrescu

Andrei Alexandrescu:

> Subsequently the GC should be within its rights to deallocate any memory within the first and last 250 integers allocated, even though in theory the user may get to them by using pointer arithmetic.

Such use of point arithmetic is not uncommon.


> I think the current GC already does that.

If this is true then this needs to be documented in bold text.

Bye,
bearophile

April 30, 2014

Re: A few considerations on garbage collection

Posted by Vladimir Panteleev
in reply to Andrei Alexandrescu

Vladimir Panteleev

Posted in reply to Andrei Alexandrescu

On Wednesday, 30 April 2014 at 15:33:29 UTC, Andrei Alexandrescu wrote:
> I think the current GC already does that.

I don't think the current GC can either recognize slices or deallocate parts of an object.

April 30, 2014

Re: A few considerations on garbage collection

Posted by Dmitry Olshansky
in reply to Andrei Alexandrescu

Dmitry Olshansky

Posted in reply to Andrei Alexandrescu

30-Apr-2014 19:33, Andrei Alexandrescu пишет:
> I'm mulling over a couple of design considerations for allocators, and
> was thinking of the following restriction:
>
> 1. No out-of-bounds tricks and no pointer arithmetic. Consider:
>
> int[] a = new int[1000];
> a = a[250 .. 750];
> int* p = a[500 .. $].ptr;
>
> Subsequently the GC should be within its rights to deallocate any memory
> within the first and last 250 integers allocated, even though in theory
> the user may get to them by using pointer arithmetic.
>
> In particular that means once a slice is shrunk, there's no growing back
> unless another slice is around.
>
> I think the current GC already does that.

It doesn't and CAN'T. As long as there is a pointer into the block it stays. There are plenty of ways to shoot yourself in the foot if this rule is not respected. Anyhow, for starters, a conservative GC doesn't know what is a slice when ptr/length sits in registers!

>
> 2. The same may be the case for classes WITHOUT destructors. Consider:
>
> class A
> {
>     int[1000] a;
>     int b;
>     int[1000] c;
> }
> int* p = &(new A).b;
>
> The collector should be allowed to deallocate any memory except b's own,
> even though that means the class has "holes" in it. The current GC does
> not do that.
>
> 2. However, the same shall not be the case for classes with destructors.
> Consider:
>
> class B
> {
>     int[1000] a;
>     int b;
>     int[1000] c;
>     ~this() { ... }
> }
> int* p = &(new B).b;
>
> This class has a destructor, so it will be kept around in its entirety
> if an internal pointer is held.

Does it make *anything* simpler/better? I don't see significant gains in any sane code.

>
> 3. Classes meant to have destructors called at collection will ALWAYS
> have been allocated with new (i.e. won't start in the middle of some
> other allocation). In other words, only class objects created with new
> will be properly collected. Those forced in odd places with emplace()
> are the responsibility of the user.
>
OK

> 4. Currently, struct objects created with new do NOT have their
> destructors called during collection. I think this may continue, meaning
> that structs created with new are somewhat low-level and are meant to be
> destroyed and deallocated manually.

IIRC they do, it's only arrays of such that doesn't. Anyhow having such a dangerous construct built-in (new = resource leak) in the language becomes questionable.

>
> 5. This brings up arrays of structs. As far as I understand, destructors
> will never be called for these, even after all references are gone:
>
> struct S { ~this() { ... } }
> auto a = new S[100];
>
> Unlike (4), arrays of structs are high-level and frequently used. I
> think we must do something about it, so I plan to support calling
> destructors for arrays of structs.

Making the point about NOT calling destructor that much more schizophrenic. Either do it properly or not at all.

>
> 6. The point above brings to mind more radical possibilities, such as
> making all arrays reference-counted and allowing compulsive deallocation
> when the reference counter goes down to zero.

So passing them around becomes costly, and down towards the RCslice we march.

> That would rule out things
> like escaping pointers to data inside arrays, which is quite extreme.

Discussions about such haven't went anywhere good. Attempts to redesign slices to be ref-counted soon lead to RC-ing everything.

BTW there could be sane code in the wild that may have non-reclaimable cycles even if ONLY array slices would become ref-counted.

> But probably worth bringing up in a brainstorming. If we disallow
> statically constructs that take addresses we may get away with it.
>
> Please chime in with your thoughts.
>
> Andrei


-- 
Dmitry Olshansky

April 30, 2014

Re: A few considerations on garbage collection

Posted by Andrei Alexandrescu
in reply to Dmitry Olshansky

Andrei Alexandrescu

Posted in reply to Dmitry Olshansky

On 4/30/14, 11:15 AM, Dmitry Olshansky wrote:
> It doesn't and CAN'T. As long as there is a pointer into the block it
> stays. There are plenty of ways to shoot yourself in the foot if this
> rule is not respected. Anyhow, for starters, a conservative GC doesn't
> know what is a slice when ptr/length sits in registers!

Yah, if you find a register possibly pointing somewhere in the middle of an array, all bets are off.

But let's say that's not the case and all that's pointing in a 1M elements int[] is a int[] of 5 elements somewhere in the middle. Do we want to support these?

s = s.ptr[-1 .. $ + 1]

and such?

Andrei

April 30, 2014

Re: A few considerations on garbage collection

Posted by Vladimir Panteleev
in reply to Andrei Alexandrescu

Vladimir Panteleev

Posted in reply to Andrei Alexandrescu

On Wednesday, 30 April 2014 at 20:29:37 UTC, Andrei Alexandrescu wrote:
> On 4/30/14, 11:15 AM, Dmitry Olshansky wrote:
>> It doesn't and CAN'T. As long as there is a pointer into the block it
>> stays. There are plenty of ways to shoot yourself in the foot if this
>> rule is not respected. Anyhow, for starters, a conservative GC doesn't
>> know what is a slice when ptr/length sits in registers!
>
> Yah, if you find a register possibly pointing somewhere in the middle of an array, all bets are off.
>
> But let's say that's not the case and all that's pointing in a 1M elements int[] is a int[] of 5 elements somewhere in the middle. Do we want to support these?
>
> s = s.ptr[-1 .. $ + 1]
>
> and such?

I guess this is related to the reason Java switched its substring method to make a copy instead of hold a reference?

April 30, 2014

Re: A few considerations on garbage collection

Posted by Andrei Alexandrescu
in reply to Vladimir Panteleev

Andrei Alexandrescu

Posted in reply to Vladimir Panteleev

On 4/30/14, 1:34 PM, Vladimir Panteleev wrote:
> On Wednesday, 30 April 2014 at 20:29:37 UTC, Andrei Alexandrescu wrote:
>> On 4/30/14, 11:15 AM, Dmitry Olshansky wrote:
>>> It doesn't and CAN'T. As long as there is a pointer into the block it
>>> stays. There are plenty of ways to shoot yourself in the foot if this
>>> rule is not respected. Anyhow, for starters, a conservative GC doesn't
>>> know what is a slice when ptr/length sits in registers!
>>
>> Yah, if you find a register possibly pointing somewhere in the middle
>> of an array, all bets are off.
>>
>> But let's say that's not the case and all that's pointing in a 1M
>> elements int[] is a int[] of 5 elements somewhere in the middle. Do we
>> want to support these?
>>
>> s = s.ptr[-1 .. $ + 1]
>>
>> and such?
>
> I guess this is related to the reason Java switched its substring method
> to make a copy instead of hold a reference?

It it related, but not necessarily a motivator. I came to this naturally while designing allocators. -- Andrei

May 01, 2014

Re: A few considerations on garbage collection

Posted by Steven Schveighoffer
in reply to Andrei Alexandrescu

Steven Schveighoffer

Posted in reply to Andrei Alexandrescu

On Wed, 30 Apr 2014 16:29:38 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 4/30/14, 11:15 AM, Dmitry Olshansky wrote:
>> It doesn't and CAN'T. As long as there is a pointer into the block it
>> stays. There are plenty of ways to shoot yourself in the foot if this
>> rule is not respected. Anyhow, for starters, a conservative GC doesn't
>> know what is a slice when ptr/length sits in registers!
>
> Yah, if you find a register possibly pointing somewhere in the middle of an array, all bets are off.

I think Dmitry's point is that the GC is not aware of the length portion of a slice, only the pointer portion. By definition, a pointer at a block could refer to the whole block.

In other words, in your original example, it is aware that 'a' points at the block, but not that a's length is 500. Theoretically, you could assume that a pointer may only lock all data *after* the pointer in the block. In other words, the first 250 ints could be reallocated if not referred to. But I think this is too limiting a behavior, for not much gain.

>
> But let's say that's not the case and all that's pointing in a 1M elements int[] is a int[] of 5 elements somewhere in the middle. Do we want to support these?

I think we are never ever going to get away from the requirement of users to be at least cognizant of these situations.

Consider this:

int[] x;

x.reserve(1_000_000);

should the GC at this point just refuse, citing its rule that you just can't ask for that much memory until you want to use it? Or maybe the GC comes along and says "you're not using that, I'm going to take it back," when your goal explicitly is to pre-allocate such data.

> s = s.ptr[-1 .. $ + 1]

I can think of a very important use case that does this -- interfaces. Possibly you could allow this exception, but I don't think it's worth disallowing a user type that might mimic such useful behavior for some low-level purpose.

-Steve

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation