Thread overview
GC scan for pointers
Mar 09, 2016
Gerald Jansen
Mar 09, 2016
Adam D. Ruppe
Mar 09, 2016
Chris Wright
Mar 10, 2016
thedeemon
Mar 11, 2016
Gerald Jansen
March 09, 2016
I've studied [1] and [2] but don't understand everything there. Hence these dumb questions:

Given

  enum n = 100_000_000; // some big number
  auto a = new ulong[](n);
  auto b = new char[8][](n);
  struct S { ulong x; char[8] y; }
  auto c = new S[](n);

will the large memory blocks allocated for a, b and/or c actually be scanned for pointers to GC-allocated memory during a garbage collection? If so, why?

[1] http://p0nce.github.io/d-idioms/#How-the-D-Garbage-Collector-works
[2] http://dlang.org/garbage.html

March 09, 2016
On Wednesday, 9 March 2016 at 15:14:02 UTC, Gerald Jansen wrote:
> will the large memory blocks allocated for a, b and/or c actually be scanned for pointers to GC-allocated memory during a garbage collection? If so, why?

No. The GC knows that the element type has no pointers in it, so it will not scan those blocks for them.

If it was a struct with a pointer, it might be scanned though. Or static arrays of int on the stack will also be scanned, since the GC doesn't actually know much about local variables - it conservatively assumes anything on the stack might be a pointer.

But large arrays are rarely on the stack so I think it is an ok situation.

See the GC block attr flags:

http://dpldocs.info/experimental-docs/core.memory.GC.BlkAttr.html
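
To make those flags concrete, here is a minimal sketch that sets NO_SCAN by hand through core.memory. The 4096-byte size and the helper name hasNoScan are made up for illustration; the arrays from the question get their flags from the runtime rather than from a call like this.

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical helper: does the block starting at p carry NO_SCAN?
// GC.getAttr expects the base address of a GC block, which is what
// GC.malloc returns.
bool hasNoScan(void* p)
{
    return (GC.getAttr(p) & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    // Opaque bytes: the collector is told not to look for pointers here.
    void* raw = GC.malloc(4096, GC.BlkAttr.NO_SCAN);

    // No attribute given: the block is scanned during a collection.
    void* scanned = GC.malloc(4096);

    writeln("raw:     NO_SCAN = ", hasNoScan(raw));
    writeln("scanned: NO_SCAN = ", hasNoScan(scanned));
}
```

The first line should report true and the second false. The flag is a property of the memory block, not of any variable that points at it.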

March 09, 2016
On Wed, 09 Mar 2016 15:50:43 +0000, Adam D. Ruppe wrote:
> Or static
> arrays of int on the stack will also be scanned, since the GC doesn't
> actually know much about local variables

It's especially tricky because compilers can reuse memory on the stack. For instance, if I use one variable in the first half of a function, stop using that variable, and start using another one, the compiler can save me some stack space by putting them at the same address.

Plus it's a bit more straightforward to make a performant check for whether a type might contain a pointer than for whether a stack frame might contain one. With types, it takes one pointer dereference. With stack frames, you have to look through some dictionary stored somewhere.
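
A minimal sketch of the per-type check, assuming it corresponds to the flags property of druntime's TypeInfo, whose lowest bit is documented as "GC should scan for pointers". The names Flat, Sliced and mayHoldPointers are invented for this example; std.traits.hasIndirections gives the compile-time view of the same question.

```d
import std.stdio : writeln;
import std.traits : hasIndirections;

struct Flat   { ulong x; char[8] y; } // same layout as S in the question
struct Sliced { ulong x; char[] y; }  // char[] is a (length, pointer) pair

// Run-time view: bit 0 of TypeInfo.flags is set when the type may
// contain pointers that the GC has to follow.
bool mayHoldPointers(T)()
{
    return (typeid(T).flags & 1) != 0;
}

void main()
{
    // Compile-time view of the same property.
    static assert(!hasIndirections!ulong);
    static assert(!hasIndirections!(char[8]));
    static assert(!hasIndirections!Flat);
    static assert( hasIndirections!Sliced);

    writeln("ulong:   ", mayHoldPointers!ulong);
    writeln("char[8]: ", mayHoldPointers!(char[8]));
    writeln("Flat:    ", mayHoldPointers!Flat);
    writeln("Sliced:  ", mayHoldPointers!Sliced);
}
```

Only Sliced should report true, because a slice carries a pointer to its data.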

March 10, 2016
On Wednesday, 9 March 2016 at 15:14:02 UTC, Gerald Jansen wrote:
> I've studied [1] and [2] but don't understand everything there. Hence these dumb questions:
>
> Given
>
>   enum n = 100_000_000; // some big number
>   auto a = new ulong[](n);
>   auto b = new char[8][](n);
>   struct S { ulong x; char[8] y; }
>   auto c = new S[](n);
>
> will the large memory blocks allocated for a, b and/or c actually be scanned for pointers to GC-allocated memory during a garbage collection? If so, why?

I've just tested it with my GC tracker ( https://bitbucket.org/infognition/dstuff ), all 3 allocations go with flags APPENDABLE | NO_SCAN which means these blocks will not be scanned.

But if you define S as
  struct S { ulong x; char[] y; }
so that there is a pointer inside, then it gets allocated with just the APPENDABLE flag, i.e. it will be scanned.
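
Without the tracker, something similar can be checked with GC.query from core.memory. This is only a sketch, not the tool linked above: S2 and attrOf are illustrative names, and n is reduced. GC.query is used rather than GC.getAttr because the latter is documented to return zero for pointers into the interior of a block, and the first element of a large array may not sit at the block's base address.

```d
import core.memory : GC;
import std.stdio : writefln;

struct S  { ulong x; char[8] y; } // from the question: no pointers
struct S2 { ulong x; char[] y; }  // variant with a slice, renamed for this sketch

// Attribute bits of the GC block that holds arr's elements.
uint attrOf(T)(T[] arr)
{
    return GC.query(cast(void*) arr.ptr).attr;
}

// Print which of the two flags mentioned in this thread are set.
void report(string name, uint attr)
{
    writefln("%-10s NO_SCAN=%s APPENDABLE=%s", name,
             (attr & GC.BlkAttr.NO_SCAN) != 0,
             (attr & GC.BlkAttr.APPENDABLE) != 0);
}

void main()
{
    enum n = 100_000; // smaller than in the question, same element types

    report("ulong[]",   attrOf(new ulong[](n)));
    report("char[8][]", attrOf(new char[8][](n)));
    report("S[]",       attrOf(new S[](n)));
    report("S2[]",      attrOf(new S2[](n)));
}
```

If the runtime behaves as described in this post, the first three lines show NO_SCAN set and the S2 line does not.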

March 11, 2016
On Thursday, 10 March 2016 at 10:58:41 UTC, thedeemon wrote:
> On Wednesday, 9 March 2016 at 15:14:02 UTC, Gerald Jansen wrote:

>>   enum n = 100_000_000; // some big number
>>   auto a = new ulong[](n);
>>   auto b = new char[8][](n);
>>   struct S { ulong x; char[8] y; }
>>   auto c = new S[](n);
>>
>> will the large memory blocks allocated for a, b and/or c actually be scanned for pointers to GC-allocated memory during a garbage collection? If so, why?
>
> I've just tested it with my GC tracker ( https://bitbucket.org/infognition/dstuff ), all 3 allocations go with flags APPENDABLE | NO_SCAN which means these blocks will not be scanned.
>
> But if you define S as
>   struct S { ulong x; char[] y; }
> so there is some pointer inside, then it gets allocated with just APPENDABLE flag, i.e. it will be scanned then.

Thanks for the very clear answer. Adam too. This alleviates much of my fear of GC performance issues for processing largish datasets in memory with traditional loops, even with multiple threads. Of course, it depends on wasting some memory to avoid char[] fields, but that is often a reasonable trade-off for the kind of data I need to process.
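
The trade-off can be sketched as follows, assuming short text values are stored NUL-padded in the fixed char[8] field. The helpers pack and unpack are hypothetical names, and the padding convention is just one possible choice.

```d
import std.stdio : writeln;
import std.traits : hasIndirections;

struct S { ulong x; char[8] y; }

// The fixed-width field keeps S free of pointers and 16 bytes wide.
static assert(!hasIndirections!S);
static assert(S.sizeof == 16);

// Copy at most 8 bytes of s into a field, padding the rest with '\0'.
// Truncation is by bytes, so it can cut a multi-byte UTF-8 sequence.
char[8] pack(const(char)[] s)
{
    char[8] buf = '\0';
    immutable len = s.length < buf.length ? s.length : buf.length;
    buf[0 .. len] = s[0 .. len];
    return buf;
}

// Return the stored text without the padding (allocates a new string).
string unpack(const char[8] buf)
{
    size_t len = 0;
    while (len < buf.length && buf[len] != '\0')
        ++len;
    return buf[0 .. len].idup;
}

void main()
{
    auto rec = S(42, pack("gene7"));
    writeln(unpack(rec.y)); // prints gene7
}
```

Every record pays for the full 8 bytes whether or not the text needs them, and longer values are truncated; in exchange, an S[] block holds no pointers for the collector to follow.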