Question to GC experts: NO_SCAN for a part of the block
June 04, 2017
Suppose I need to allocate several dynamic arrays of different types and/or sizes. An obvious choice is to do just that: perform N allocations.
But given that I know all the sizes, and also am pretty sure that each of the arrays will have the same lifespan, why would I do N allocations when I can do just one? After all, I can ask the GC for a sufficiently-sized memory block and put my arrays in it.

But I don't necessarily want the GC to scan *all* my arrays for pointers. The problem is, with the current GC, it seems impossible to mark only a part of a block as NO_SCAN. Note that I'm not really talking about precise GC with full type information here, but about addRange() pointing within a GC-allocated block. A short example with two arrays, one holding class references, the other holding bytes:

---

import core.memory : GC;
import std.stdio;
import std.typecons;

class C {
    int i;
    this(int i) {
        this.i = i;
    }

    ~this() {
        writeln("C(", i, ") dtor");
    }
}

auto obvious(int numCs, int numBytes) {
    // obvious solution: two arrays, two allocations
    return tuple!("Cs", "bytes")(new C[numCs], new byte[numBytes]);
}

auto scanned(int numCs, int numBytes) {

    // one allocation for both arrays,
    // for simplicity not dealing with alignment/out of memory here.
    // allocate with default attributes, scanned block
    auto memory = GC.malloc(numCs*C.sizeof + numBytes*byte.sizeof);
    auto cs = (cast(C*)memory)[0..numCs];
    cs[] = C.init;
    auto bytes = (cast(byte*)(memory + numCs*C.sizeof))[0..numBytes];
    bytes[] = byte.init;
    return tuple!("Cs", "bytes")(cs, bytes);
}

auto selective(int numCs, int numBytes) {

    // one allocation for both arrays,
    // for simplicity not dealing with alignment/out of memory here.
    // explicitly ask for NO_SCAN block,
    // to not scan bytes for pointers
    auto memory = GC.malloc(numCs*C.sizeof + numBytes*byte.sizeof,
            GC.BlkAttr.NO_SCAN);
    auto cs = (cast(C*)memory)[0..numCs];
    cs[] = C.init;
    // add scanning range for references
    GC.addRange(cs.ptr, cs.length*C.sizeof, typeid(C));
    auto bytes = (cast(byte*)(memory + numCs*C.sizeof))[0..numBytes];
    bytes[] = byte.init;
    return tuple!("Cs", "bytes")(cs, bytes);
}

void main() {

    int numCs = 4; // comes at runtime from elsewhere
    int numBytes = 32; // comes at runtime from elsewhere

    auto arrays1 = obvious(numCs, numBytes);

    int counter;
    foreach (ref e; arrays1.Cs)
        e = new C(counter++); // dtors will be called

    auto arrays2 = scanned(numCs, numBytes);
    foreach (ref e; arrays2.Cs)
        e = new C(counter++); // dtors will be called

    auto arrays3 = selective(numCs, numBytes);
    foreach (ref e; arrays3.Cs)
        e = new C(counter++); // dtors will not be called
}

---

Should this work, and if not, why?
June 04, 2017
On 06/04/2017 03:08 AM, Stanislav Blinov wrote:
> ---
> 
> import core.memory : GC;
> import std.stdio;
> import std.typecons;
> 
> class C {
>      int i;
>      this(int i) {
>          this.i = i;
>      }
> 
>      ~this() {
>          writeln("C(", i, ") dtor");
>      }
> }
[...]
> auto selective(int numCs, int numBytes) {
[...]
>      auto memory = GC.malloc(numCs*C.sizeof + numBytes*byte.sizeof,
>              GC.BlkAttr.NO_SCAN);
>      auto cs = (cast(C*)memory)[0..numCs];
>      cs[] = C.init;
>      // add scanning range for references
>      GC.addRange(cs.ptr, cs.length*C.sizeof, typeid(C));
>      auto bytes = (cast(byte*)(memory + numCs*C.sizeof))[0..numBytes];
>      bytes[] = byte.init;
>      return tuple!("Cs", "bytes")(cs, bytes);
> }
> 
> void main() {
> 
>      int numCs = 4; // comes at runtime from elsewhere
>      int numBytes = 32; // comes at runtime from elsewhere
[...]
>      int counter;
[...]
>      auto arrays3 = selective(numCs, numBytes);
>      foreach (ref e; arrays3.Cs)
>          e = new C(counter++); // dtors will not be called
> }
> 
> ---
> 
> Should this work, and if not, why?

As far as I can tell, the `addRange` call works correctly, but maybe too well in a sense. It keeps the `new`ed `C`s alive as long as `arrays3.Cs` has pointers to them. And `arrays3.Cs` has those pointers until the very end.

If you add `GC.removeRange(arrays3.Cs.ptr);` at the end of `main`, the dtors show up. Overwriting `arrays3.Cs`'s elements with `null`s also works. My guess is that the `removeRange` call isn't done automatically before the final run of the GC. Maybe it should be?
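
For instance, either of these at the very end of `main` does the trick (a minimal sketch against the example above):

---
GC.removeRange(arrays3.Cs.ptr); // unregister the range; the GC no longer scans it
// or, alternatively:
arrays3.Cs[] = null;            // drop the references so nothing keeps the Cs alive
---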

But I have a vague memory that the GC isn't required to call destructors on everything in the final run. Or maybe it's not guaranteed that there is a final run when the program ends? Anyway, that would mean everything's working as intended, and you just can't rely on destructors like that.
June 04, 2017
On Sunday, 4 June 2017 at 09:38:45 UTC, ag0aep6g wrote:

>> Should this work, and if not, why?
>
> As far as I can tell, the `addRange` call works correctly, but maybe too well in a sense. It keeps the `new`ed `C`s alive as long as `arrays3.Cs` has pointers to them. And `arrays3.Cs` has those pointers until the very end.

Yeah, after playing around with the code a bit, shuffling the calls, making new allocations, etc., I saw those dtors indeed being run. I was just expecting the behavior to be the same as for normal 'new'ed arrays, but I guess there are nuances.

> If you add `GC.removeRange(arrays3.Cs.ptr);` at the end of `main`, the dtors show up.

Yeah, well, calling it manually sort of defeats the purpose :)

> Overwriting `arrays3.Cs`'s elements with `null`s also works. My guess is that the `removeRange` call isn't done automatically before the final run of the GC. Maybe it should be?

If at all possible, I think that'd be good, if only for consistency.

> But I have a vague memory that the GC isn't required to call destructors on everything in the final run. Or maybe it's not guaranteed that there is a final run when the program ends?

That's what puzzled me, seeing as dtors from the other two arrays ran at the end. I understand how particular dtors may not be run due to circular or self-references, but there are none in this case.

> Anyway, that would mean everything's working as intended, and you just can't rely on destructors like that.

Seems it is, and that is great. Now all that's left is some benchmarking to see if saving on allocations actually bears fruit.

Thanks!
June 05, 2017
On 6/4/17 5:57 AM, Stanislav Blinov wrote:
> On Sunday, 4 June 2017 at 09:38:45 UTC, ag0aep6g wrote:
>
>>> Should this work, and if not, why?
>>
>> As far as I can tell, the `addRange` call works correctly, but maybe
>> too well in a sense. It keeps the `new`ed `C`s alive as long as
>> `arrays3.Cs` has pointers to them. And `arrays3.Cs` has those pointers
>> until the very end.
>
> Yeah, after playing around with the code a bit, shuffling the calls,
> making new allocations, etc., I saw those dtors indeed being run. I was
> just expecting the behavior to be the same as for normal 'new'ed arrays,
> but I guess there are nuances.
>
>> If you add `GC.removeRange(arrays3.Cs.ptr);` at the end of `main`, the
>> dtors show up.
>
> Yeah, well, calling it manually sort of defeats the purpose :)

adding a range marks it as having pointers to scan, AND stores a reference to that array, so it won't be GC collected (nor will anything it points to). The intention is for it to be used on non-GC memory, like C malloc'd memory, where it doesn't matter that the GC is pointing at it.

I would say that you are better off allocating 2 arrays -- one with NO_SCAN where you put your non-pointer-containing data, and one without the flag to put your other data. This is similar to your "selective" function, but instead of allocating 1 array, with a tuple of slices into it, just allocate 2 arrays and return the tuple of those 2 arrays.
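
Roughly, something like this (an illustrative sketch, not dcollections code; it reuses the class `C` and the imports from the example above, and ignores alignment for simplicity, as in the original):

---
// One scanned block for the pointer-containing array, one NO_SCAN block
// shared by all pointer-free arrays (here bytes and ints).
auto grouped(int numCs, int numBytes, int numInts) {
    auto cs = new C[numCs]; // ordinary GC array, scanned for pointers
    auto raw = cast(byte*) GC.malloc(numBytes + numInts * int.sizeof,
            GC.BlkAttr.NO_SCAN);
    auto bytes = raw[0 .. numBytes];
    bytes[] = 0;
    auto ints = (cast(int*)(raw + numBytes))[0 .. numInts];
    ints[] = 0;
    return tuple!("Cs", "bytes", "ints")(cs, bytes, ints);
}
---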

In my dcollections library, the allocator I use uses malloc for non-pointer-containing arrays and the GC for everything that could contain pointers, and then divvies up the array into the exact sizes I want. It works quite well, but the dtor of the container needs to deallocate the malloc'd one.

-Steve
June 05, 2017
On Monday, 5 June 2017 at 13:11:25 UTC, Steven Schveighoffer wrote:

> adding a range marks it as having pointers to scan, AND stores a reference to that array, so it won't be GC collected (nor will anything it points to). The intention is for it to be used on non-GC memory, like C malloc'd memory, where it doesn't matter that the GC is pointing at it.
>

Huh?

https://dlang.org/phobos/core_memory.html#.GC.addRange :

> Note that p[0 .. sz] is treated as an opaque range of memory assumed to be suitably managed by the caller. In particular, if p points into a GC-managed memory block, addRange does not mark this block as live.

Is that paragraph wrong?

> I would say that you are better off allocating 2 arrays -- one with NO_SCAN where you put your non-pointer-containing data, and one without the flag to put your other data. This is similar to your "selective" function, but instead of allocating 1 array, with a tuple of slices into it, just allocate 2 arrays and return the tuple of those 2 arrays.

Then it's just the obvious() function. The whole point of the exercise is to make one GC allocation instead of N :) But still GC, so as not to put additional responsibility on the caller.

June 06, 2017
On Monday, 5 June 2017 at 23:30:00 UTC, Stanislav Blinov wrote:
> On Monday, 5 June 2017 at 13:11:25 UTC, Steven Schveighoffer wrote:
>
>> adding a range marks it as having pointers to scan, AND stores a reference to that array, so it won't be GC collected (nor will anything it points to). The intention is for it to be used on non-GC memory, like C malloc'd memory, where it doesn't matter that the GC is pointing at it.
>>
>
> Huh?
>
> https://dlang.org/phobos/core_memory.html#.GC.addRange :
>
>> Note that p[0 .. sz] is treated as an opaque range of memory assumed to be suitably managed by the caller. In particular, if p points into a GC-managed memory block, addRange does not mark this block as live.
>
> Is that paragraph wrong?

No, I am wrong. I assumed that because the GC is part of the static space, its metadata would also be scanned. But that data is allocated using C malloc, so it's not scanned. This actually makes any partial insertion using addRange more dangerous, because you have to take care to keep a pointer to that block somewhere.

>
>> I would say that you are better off allocating 2 arrays -- one with NO_SCAN where you put your non-pointer-containing data, and one without the flag to put your other data. This is similar to your "selective" function, but instead of allocating 1 array, with a tuple of slices into it, just allocate 2 arrays and return the tuple of those 2 arrays.
>
> Then it's just the obvious() function. The whole point of the exercise is to make one GC allocation instead of N :) But still GC, so as not to put additional responsibility on the caller.

No, 2 allocations instead of N.

-Steve


June 06, 2017
On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:

>>> Note that p[0 .. sz] is treated as an opaque range of memory assumed to be suitably managed by the caller. In particular, if p points into a GC-managed memory block, addRange does not mark this block as live.
>>
>> Is that paragraph wrong?
>
> No, I am wrong. I assumed that because the GC is part of the static space, its metadata would also be scanned. But that data is allocated using C malloc, so it's not scanned. This actually makes any partial insertion using addRange more dangerous, because you have to take care to keep a pointer to that block somewhere.

Do you mean an explicit pointer to the block itself? That one is being kept with the first array.
Or do you mean the pointer passed to addRange? That one would not necessarily belong to the first array. But I thought the GC does track pointers that point into block interiors. After all, that's how slices work.
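
For illustration (a trivial sketch, not from this thread):

---
auto arr = new int[100];
auto tail = arr[50 .. $]; // interior pointer: `tail` alone keeps the whole block alive
---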

>> Then it's just the obvious() function. The whole point of the exercise is to make one GC allocation instead of N :) But still GC, so as not to put additional responsibility on the caller.
>
> No, 2 allocations instead of N.

In this example. But obviously I'd like to make this generic, i.e.:

struct S { /* ... */ }

auto arrays = allocateArrays!(int, "ints", S, "Ss", void*, "pointers")(numInts, numSs, numPtrs);

So it won't necessarily be 2. And of course the function would calculate the alignment, initialize the contents (or not, e.g. via passing a dummy type Uninitialized!T), etc.

I do have a use case for allocating up to 4 arrays at once, for example.
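
A minimal, non-generic sketch of the idea for just two element types, assuming a scanned block (the name `allocatePair` is made up for illustration; named fields, Uninitialized!T support and so on are left out):

---
import core.memory : GC;
import std.typecons : tuple;

// Hypothetical helper: two arrays of different element types in one GC block.
// The block uses default attributes (scanned), so it is only appropriate when
// at least one of the element types may contain pointers.
auto allocatePair(A, B)(size_t numA, size_t numB) {
    enum al = B.alignof;
    // round the start of the B array up to B's alignment
    size_t offB = (numA * A.sizeof + al - 1) & ~(al - 1);
    auto memory = GC.malloc(offB + numB * B.sizeof);
    auto as = (cast(A*) memory)[0 .. numA];
    as[] = A.init;
    auto bs = (cast(B*) (memory + offB))[0 .. numB];
    bs[] = B.init;
    return tuple!("first", "second")(as, bs);
}
---

Usage would look like `auto arrs = allocatePair!(C, byte)(numCs, numBytes);`, giving `arrs.first` and `arrs.second`.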
June 06, 2017
On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:

> No, 2 allocations instead of N.

Oh, I misunderstood you. You mean two blocks total, for scanned and non-scanned data. This could indeed work for cases when more than two arrays are needed. Still, it's one extra allocation.

June 06, 2017
On 6/6/17 6:36 AM, Stanislav Blinov wrote:
> On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:
>
>> No, 2 allocations instead of N.
>
> Oh, I misunderstood you. You mean two blocks total, for scanned and
> non-scanned data. This could indeed work for cases when more than two
> arrays are needed. Still, it's one extra allocation.
>

True. But O(2) is better than O(N) :)

-Steve
June 06, 2017
On Tuesday, 6 June 2017 at 12:07:28 UTC, Steven Schveighoffer wrote:
> On 6/6/17 6:36 AM, Stanislav Blinov wrote:
>> On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:
>>
>>> No, 2 allocations instead of N.
>>
>> Oh, I misunderstood you. You mean two blocks total, for scanned and
>> non-scanned data. This could indeed work for cases when more than two
>> arrays are needed. Still, it's one extra allocation.
>>
>
> True. But O(2) is better than O(N) :)
>
> -Steve

Heh, this isn't malloc. We could end up in a situation where even one allocation is slower than N, depending on when they happen :)

Actually, separate blocks might be a decent choice: there's another problem with sharing blocks between different types, in that finalizers can't be run. So such a function would either not support types with finalizers at all, or would have to allocate separate blocks for such types anyway. I don't know; I need to actually implement and measure to see whether it's even worth the trouble.