Thread overview
Does the GC consider slice lengths?
Apr 01, 2022
Ali Çehreli
Apr 01, 2022
Ali Çehreli
Apr 01, 2022
Ali Çehreli
Apr 01, 2022
H. S. Teoh
Apr 01, 2022
Ali Çehreli
Apr 02, 2022
Salih Dincer
April 01, 2022
void main() {
  auto a = new int[1_000_000];

  auto b = a[0..1];  // Hold on to the first element

  a = a[$-1..$];     // Drop all but the last element

  // Are the middle elements gone, or is b keeping
  // them alive just because it still holds the very
  // first pointer?

  // Attempt to access original elements through
  // a pointer to the first element. Bug?
  auto c = b.ptr[0..1_000_000];
}

One assumption might be that the GC remembers the single large allocation as one and all of the elements are still available? But if it considers slices and their lengths specially, then it would see that the mid section is not referenced any more.

I would not do that but one wonders... :)

A related question is whether the memory for earlier elements are freed as they are popped off from the front. I know by experience that the memory for earlier elements are indeed freed. So one can use D's arrays as queues without memory issues: Add to the end, pop from the front...

Ali
April 01, 2022

On 4/1/22 12:24 PM, Ali Çehreli wrote:

>

void main() {
  auto a = new int[1_000_000];

  auto b = a[0..1];  // Hold on to the first element

  a = a[$-1..$];     // Drop all but the last element

  // Are the middle elements gone, or is b keeping
  // them alive just because it still holds the very
  // first pointer?

  // Attempt to access original elements through
  // a pointer to the first element. Bug?
  auto c = b.ptr[0..1_000_000];

Not a bug, the array still exists and is allocated (the pointer is keeping it alive). Remember, the GC manages the memory, not the slice.

e.g. if this were an array of structs with destructors, those destructors would not be called just because they aren't referenced by some slice.

>

}

One assumption might be that the GC remembers the single large allocation as one and all of the elements are still available? But if it considers slices and their lengths specially, then it would see that the mid section is not referenced any more.

It's not a consideration for the GC, but possibly the compiler. The GC has no visibility into slices that hold references to its memory block. It only has knowledge of pointers to its memory block (and then only indirectly via scanning).

Only the compiler could consider this distinction. As of yet, I don't think D has stated one way or another whether this could be exploited.

But thinking about how a non-native array type might be implemented, I don't think there's any possible way D could consider a reference to one element to not be a reference to all elements.

>

A related question is whether the memory for earlier elements are freed as they are popped off from the front. I know by experience that the memory for earlier elements are indeed freed. So one can use D's arrays as queues without memory issues: Add to the end, pop from the front...

No, because removing the reference does not mean there are no other references.

-Steve

April 01, 2022
On 4/1/22 10:46, Steven Schveighoffer wrote:

>> But if
>> it considers slices and their lengths specially, then it would see
>> that the mid section is not referenced any more.
>
> It's not a consideration for the GC, but possibly the compiler. The GC
> has no visibility into slices that hold references to its memory block.
> It only has knowledge of pointers to its memory block (and then only
> indirectly via scanning).

Yes but if it recognized slices in addition to pointers, it could consider their lengths to reuse the middle section, no? I understand the role of the compiler but as GC is part of the D runtime, it feels like it could own the concept of slices.

>> A related question is whether the memory for earlier elements are
>> freed as they are popped off from the front. I know by experience that
>> the memory for earlier elements are indeed freed. So one can use D's
>> arrays as queues without memory issues: Add to the end, pop from the
>> front...
>
> No, because removing the reference does not mean there are no other
> references.

Agreed but if the array is a private queue of a struct with just one reference, then the earlier memory would be freed.

However, my mind was not working well in the morning: Combining with what you said above, of course it is never the "previous parts" of the array that are freed, rather the old allocations of the array that are freed. (Provided there are no references.)

Ali

April 01, 2022

On 4/1/22 4:15 PM, Ali Çehreli wrote:

>

On 4/1/22 10:46, Steven Schveighoffer wrote:

> >

But if
it considers slices and their lengths specially, then it would see
that the mid section is not referenced any more.

It's not a consideration for the GC, but possibly the compiler. The GC
has no visibility into slices that hold references to its memory block.
It only has knowledge of pointers to its memory block (and then only
indirectly via scanning).

Yes but if it recognized slices in addition to pointers, it could consider their lengths to reuse the middle section, no? I understand the role of the compiler but as GC is part of the D runtime, it feels like it could own the concept of slices.

While possible, there are several problems:

  1. The GC is performant (!) because it only has to check whether a block is live or not. If it had to also mark every single element (which is based on the type) of the block as live or dead, the performance takes a steep nosedive, along with space usage (1 bit per element?).
  2. The GC somehow would have to manage that "middle" block of memory. How do you free part of a block?
  3. By default, memory is untyped. I think even with precise scanning, the stack is untyped. The GC only scans pointers because it has no way of interpreting the length that's contained somewhere nearby. A lot of machinery would have to be added to make this happen.
> > >

A related question is whether the memory for earlier elements are
freed as they are popped off from the front. I know by experience that
the memory for earlier elements are indeed freed. So one can use D's
arrays as queues without memory issues: Add to the end, pop from the
front...

No, because removing the reference does not mean there are no other
references.

Agreed but if the array is a private queue of a struct with just one reference, then the earlier memory would be freed.

However, my mind was not working well in the morning: Combining with what you said above, of course it is never the "previous parts" of the array that are freed, rather the old allocations of the array that are freed. (Provided there are no references.)

Yes, I didn't read fully your question. The "front" elements that you see freed are the old copies that are discarded when a reallocation occurs. It's still an entire block that's managed, not portions of it.

-Steve

April 01, 2022
On Fri, Apr 01, 2022 at 01:15:01PM -0700, Ali Çehreli via Digitalmars-d-learn wrote:
> On 4/1/22 10:46, Steven Schveighoffer wrote:
> 
> >> But if it considers slices and their lengths specially, then it would see that the mid section is not referenced any more.
> >
> > It's not a consideration for the GC, but possibly the compiler. The GC has no visibility into slices that hold references to its memory block.  It only has knowledge of pointers to its memory block (and then only indirectly via scanning).
> 
> Yes but if it recognized slices in addition to pointers, it could consider their lengths to reuse the middle section, no? I understand the role of the compiler but as GC is part of the D runtime, it feels like it could own the concept of slices.
[...]

The GC trades in blocks of memory.  Being conservative, it makes no assumptions about the semantics of pointers that it encounters when it scans for references, other than the fact that it's referencing an allocated block of memory. It doesn't know what a slice is -- which, in D, is stored as a struct containing a pointer and a length: the GC makes no assumptions about the meaning of the length value that just happens to occur adjacently to the pointer. It could be part of a slice, or it could be a completely unrelated variable that has no relation to the pointer; such values are disregarded.

In fact, unless precise scanning is enabled, the GC doesn't even know which values are pointers and which are integers; it just conservatively assumes that every integer-like value is a potential pointer. Only if they don't look like a valid address, or point to something outside the GC heap, or to an unallocated block, then the GC will ignore them.  This is why in 32-bit programs these "false pointers" can sometimes cause the GC not to collect blocks that are actually already unreferenced (you happen to have an integer variable whose value coincidentally looks like a pointer into the block).

So as long as one reference to the allocated block still exists, the GC will not collect it. It cannot tell whether the program might have stored an index into an array inside this block in an integer variable that will later be added to the remaining reference; the GC would have no way of telling which integer variable this might be, so it must conservatively assume that as long as a reference to the block still exists, the entire block is still alive (other parts of the block may still be accessed by adding an index value, etc.).

If you really only intend on keeping a small chunk of an existing large allocated block, one solution is to .dup or .idup the slice you want to keep, and null all other references to the old block (or let them go out of scope).

//

Of course, one could argue that a more precise scanning could help the GC minimize blocks whose sole remaining reference is a slice that covers only a small subset of the entire block. This would involve significant complications to the scanning algorithm, though, which may have performance implications. E.g., what constitutes a slice? Only built-in slices? So pointer + separately-kept index is not allowed? Then we have a potential buffer overrun bug in code that's actually not buggy, if some code holds a pointer to the block with some separate index value, then the GC comes along and thinks it can free the part of the block that the index indirectly refers to.

I'm sure someone can come up with a way to solve this problem, but the question is whether the payoff is worth the likely performance hit to GC collection times.


T

-- 
Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
April 01, 2022
On 4/1/22 13:55, Steven Schveighoffer wrote:

> The GC only scans pointers because it has no way
> of interpreting the length that's contained somewhere nearby.

Makes sense. (I figured that out myself by thinking after sending my message. :/)

Ali

April 01, 2022
On 4/1/22 14:11, H. S. Teoh wrote:
> the
> question is whether the payoff is worth the likely performance hit to GC
> collection times.

No! :) Thank you for the insights.

Ali

April 02, 2022

On Friday, 1 April 2022 at 16:24:33 UTC, Ali Çehreli wrote:

>

void main() {
auto a = new int[1_000_000];

auto b = a[0..1]; // Hold on to the first element

a = a[$-1..$]; // Drop all but the last element

// Are the middle elements gone, or is b keeping
// them alive just because it still holds the very
// first pointer?

// Attempt to access original elements through
// a pointer to the first element. Bug?
auto c = b.ptr[0..1_000_000];
}

One assumption might be that the GC remembers the single large allocation as one and all of the elements are still available? But if it considers slices and their lengths specially, then it would see that the mid section is not referenced any more.
[...]

import std.stdio;

// -- I couldn't type 32 here --v
enum testLimit = 1024 * 1024 * 31;
void main()
{
  char[] first ="a-123".dup;
  char* oneHalf = &first[2];      // =>[1]

  first.length = testLimit;
  first[$-1] = 'z';

  // legal? Ok.

  auto sliceLeft = first[0..2];   // "a-"
  auto sliceRight = first[$-2..$];// "�z"

  // legal? Ok.

  assert(first.length == testLimit);
  first = sliceLeft ~ sliceRight;
  assert(first.length == 4);

  char[] second = "b-321".dup;
  second.length = testLimit/2;

  first.writeln; // "a-�z"
  first.length = testLimit/2;
  first[0..2].writeln;

  // legal? Ok.

  auto numsMid = oneHalf[0..3];  // "123"
  numsMid.writeln;
  second[0..2].writeln;          // "b-"
}

In this 2nd test, I couldn't get the memory up to 62 MBytes. It throws the following error:
(not a compiler error!)

>

core.exception.OutOfMemoryError@src/core/exception.d(702)

In the same system configuration and in my previous attempt, I had allocated all of the memory to an array. Let's get to the conclusion here: We need to do something to get the used memory back. But what?

SDB@79