Thread overview | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
June 04, 2017 Question to GC experts: NO_SCAN for a part of the block | ||||
---|---|---|---|---|
| ||||
Suppose I need to allocate several dynamic arrays of different types and/or sizes. An obvious choice is to do just that: perform N allocations. But given that I know all the sizes, and also am pretty sure that each of the arrays will have the same lifespan, why would I do N allocations when I can do just one? After all, I can ask the GC for a sufficiently-sized memory block and put my arrays in it. But I don't necessarily want the GC to scan *all* my arrays for pointers. The problem is, with current GC, it seems that it's impossible to mark only a part of the block as NO_SCAN. Note that I'm not really talking about precise GC here with full type information, but about addRange() pointing within a GC-allocated block. A short example, two arrays, one holding class references, the other - bytes: --- import core.memory : GC; import std.stdio; import std.typecons; class C { int i; this(int i) { this.i = i; } ~this() { writeln("C(", i, ") dtor"); } } auto obvious(int numCs, int numBytes) { // obvious solution: two arrays, two allocations return tuple!("Cs", "bytes")(new C[numCs], new byte[numBytes]); } auto scanned(int numCs, int numBytes) { // one allocation for both arrays, // for simplicity not dealing with alignment/out of memory here. // allocate with default attributes, scanned block auto memory = GC.malloc(numCs*C.sizeof + numBytes*byte.sizeof); auto cs = (cast(C*)memory)[0..numCs]; cs[] = C.init; auto bytes = (cast(byte*)(memory + numCs*C.sizeof))[0..numBytes]; bytes[] = byte.init; return tuple!("Cs", "bytes")(cs, bytes); } auto selective(int numCs, int numBytes) { // one allocation for both arrays, // for simplicity not dealing with alignment/out of memory here. // explicitly ask for NO_SCAN block, // to not scan bytes for pointers auto memory = GC.malloc(numCs*C.sizeof + numBytes*byte.sizeof, GC.BlkAttr.NO_SCAN); auto cs = (cast(C*)memory)[0..numCs]; cs[] = C.init; // add scanning range for references GC.addRange(cs.ptr, cs.length*C.sizeof, typeid(C)); auto bytes = (cast(byte*)(memory + numCs*C.sizeof))[0..numBytes]; bytes[] = byte.init; return tuple!("Cs", "bytes")(cs, bytes); } void main() { int numCs = 4; // comes at runtime from elsewhere int numBytes = 32; // comes at runtime from elsewhere auto arrays1 = obvious(numCs, numBytes); int counter; foreach (ref e; arrays1.Cs) e = new C(counter++); // dtors will be called auto arrays2 = scanned(numCs, numBytes); foreach (ref e; arrays2.Cs) e = new C(counter++); // dtors will be called auto arrays3 = selective(numCs, numBytes); foreach (ref e; arrays3.Cs) e = new C(counter++); // dtors will not be called } --- Should this work, and if not, why? |
June 04, 2017 Re: Question to GC experts: NO_SCAN for a part of the block | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stanislav Blinov | On 06/04/2017 03:08 AM, Stanislav Blinov wrote: > --- > > import core.memory : GC; > import std.stdio; > import std.typecons; > > class C { > int i; > this(int i) { > this.i = i; > } > > ~this() { > writeln("C(", i, ") dtor"); > } > } [...] > auto selective(int numCs, int numBytes) { [...] > auto memory = GC.malloc(numCs*C.sizeof + numBytes*byte.sizeof, > GC.BlkAttr.NO_SCAN); > auto cs = (cast(C*)memory)[0..numCs]; > cs[] = C.init; > // add scanning range for references > GC.addRange(cs.ptr, cs.length*C.sizeof, typeid(C)); > auto bytes = (cast(byte*)(memory + numCs*C.sizeof))[0..numBytes]; > bytes[] = byte.init; > return tuple!("Cs", "bytes")(cs, bytes); > } > > void main() { > > int numCs = 4; // comes at runtime from elsewhere > int numBytes = 32; // comes at runtime from elsewhere [...] > int counter; [...] > auto arrays3 = selective(numCs, numBytes); > foreach (ref e; arrays3.Cs) > e = new C(counter++); // dtors will not be called > } > > --- > > Should this work, and if not, why? As far as I can tell, the `addRange` call works correctly, but maybe too well in a sense. It keeps the `new`ed `C`s alive as long as `arrays3.Cs` has pointers to them. And `arrays3.Cs` has those pointers until the very end. If you add `GC.removeRange(arrays3.Cs.ptr);` at the of `main`, the dtors show up. Overwriting `arrays3.Cs`'s elements with `null`s also works. My guess is that the `removeRange` call isn't done automatically before the final run of the GC. Maybe it should be? But I have a vague memory that the GC isn't required to call destructors on everything in the final run. Or maybe it's not guaranteed that there is a final run when the program ends? Anyway, that would mean everything's working as intended, and you just can't rely on destructors like that. |
June 04, 2017 Re: Question to GC experts: NO_SCAN for a part of the block | ||||
---|---|---|---|---|
| ||||
Posted in reply to ag0aep6g | On Sunday, 4 June 2017 at 09:38:45 UTC, ag0aep6g wrote: >> Should this work, and if not, why? > > As far as I can tell, the `addRange` call works correctly, but maybe too well in a sense. It keeps the `new`ed `C`s alive as long as `arrays3.Cs` has pointers to them. And `arrays3.Cs` has those pointers until the very end. Yeah, after playing around with the code a bit, shuffling the calls, making new allocations, etc., I saw those dtors indeed being run. I was just expecting the behavior to be the same as for normal 'new'ed arrays, but I guess there are nuances. > If you add `GC.removeRange(arrays3.Cs.ptr);` at the of `main`, the dtors show up. Yeah, well, calling it manually sort of defeats the purpose :) > Overwriting `arrays3.Cs`'s elements with `null`s also works. My guess is that the `removeRange` call isn't done automatically before the final run of the GC. Maybe it should be? If at all possible, I think that'd be good, if only for consistency. > But I have a vague memory that the GC isn't required to call destructors on everything in the final run. Or maybe it's not guaranteed that there is a final run when the program ends? That's what puzzled me, seeing as dtors from the other two arrays ran at the end. I understand how particular dtors may not be run due to circular or self-references, but there are none in this case. > Anyway, that would mean everything's working as intended, and you just can't rely on destructors like that. Seems it is, and that is great. Now all that's left is some benchmarking to see if saving on allocations actually yields any fruit. Thanks! |
June 05, 2017 Re: Question to GC experts: NO_SCAN for a part of the block | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stanislav Blinov | On 6/4/17 5:57 AM, Stanislav Blinov wrote:
> On Sunday, 4 June 2017 at 09:38:45 UTC, ag0aep6g wrote:
>
>>> Should this work, and if not, why?
>>
>> As far as I can tell, the `addRange` call works correctly, but maybe
>> too well in a sense. It keeps the `new`ed `C`s alive as long as
>> `arrays3.Cs` has pointers to them. And `arrays3.Cs` has those pointers
>> until the very end.
>
> Yeah, after playing around with the code a bit, shuffling the calls,
> making new allocations, etc., I saw those dtors indeed being run. I was
> just expecting the behavior to be the same as for normal 'new'ed arrays,
> but I guess there are nuances.
>
>> If you add `GC.removeRange(arrays3.Cs.ptr);` at the of `main`, the
>> dtors show up.
>
> Yeah, well, calling it manually sort of defeats the purpose :)
adding a range marks it as having pointers to scan, AND stores a reference to that array, so it won't be GC collected (nor will anything it points to). The intention is for it to be used on non-GC memory, like C malloc'd memory, where it doesn't matter that the GC is pointing at it.
I would say that you are better off allocating 2 arrays -- one with NO_SCAN where you put your non-pointer-containing data, and one without the flag to put your other data. This is similar to your "selective" function, but instead of allocating 1 array, with a tuple of slices into it, just allocate 2 arrays and return the tuple of those 2 arrays.
In my dcollections library, the allocator I use uses malloc for non-pointer containing arrays, and GC for everything that could contain pointers, and then divvy up the array into the exact sizes I want. Works quite well, but the dtor of the container needs to deallocate the malloc'd one.
-Steve
|
June 05, 2017 Re: Question to GC experts: NO_SCAN for a part of the block | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Monday, 5 June 2017 at 13:11:25 UTC, Steven Schveighoffer wrote: > adding a range marks it as having pointers to scan, AND stores a reference to that array, so it won't be GC collected (nor will anything it points to). The intention is for it to be used on non-GC memory, like C malloc'd memory, where it doesn't matter that the GC is pointing at it. > Huh? https://dlang.org/phobos/core_memory.html#.GC.addRange : > Note that p[0 .. sz] is treated as an opaque range of memory assumed to be suitably managed by the caller. In particular, if p points into a GC-managed memory block, addRange does not mark this block as live. Is that paragraph wrong? > I would say that you are better off allocating 2 arrays -- one with NO_SCAN where you put your non-pointer-containing data, and one without the flag to put your other data. This is similar to your "selective" function, but instead of allocating 1 array, with a tuple of slices into it, just allocate 2 arrays and return the tuple of those 2 arrays. Then it's just the obvious() function. The whole point of the exercise is to make one GC allocation instead of N :) But still GC, so as not to put additional responsibility on the caller. |
June 06, 2017 Re: Question to GC experts: NO_SCAN for a part of the block | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stanislav Blinov | On Monday, 5 June 2017 at 23:30:00 UTC, Stanislav Blinov wrote: > On Monday, 5 June 2017 at 13:11:25 UTC, Steven Schveighoffer wrote: > >> adding a range marks it as having pointers to scan, AND stores a reference to that array, so it won't be GC collected (nor will anything it points to). The intention is for it to be used on non-GC memory, like C malloc'd memory, where it doesn't matter that the GC is pointing at it. >> > > Huh? > > https://dlang.org/phobos/core_memory.html#.GC.addRange : > >> Note that p[0 .. sz] is treated as an opaque range of memory assumed to be suitably managed by the caller. In particular, if p points into a GC-managed memory block, addRange does not mark this block as live. > > Is that paragraph wrong? No I am wrong. I assumed that because the gc is part of the static space it's metadata would also be scanned. But the data is allocated using c malloc, so it's not scanned. This makes any partial insertion using addrange more dangerous actually because you have to take care to keep a pointer to that block somewhere > >> I would say that you are better off allocating 2 arrays -- one with NO_SCAN where you put your non-pointer-containing data, and one without the flag to put your other data. This is similar to your "selective" function, but instead of allocating 1 array, with a tuple of slices into it, just allocate 2 arrays and return the tuple of those 2 arrays. > > Then it's just the obvious() function. The whole point of the exercise is to make one GC allocation instead of N :) But still GC, so as not to put additional responsibility on the caller. No, 2 allocations instead of N. -Steve |
June 06, 2017 Re: Question to GC experts: NO_SCAN for a part of the block | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote: >>> Note that p[0 .. sz] is treated as an opaque range of memory assumed to be suitably managed by the caller. In particular, if p points into a GC-managed memory block, addRange does not mark this block as live. >> >> Is that paragraph wrong? > > No I am wrong. I assumed that because the gc is part of the static space it's metadata would also be scanned. But the data is allocated using c malloc, so it's not scanned. This makes any partial insertion using addrange more dangerous actually because you have to take care to keep a pointer to that block somewhere You mean explicitly pointer to the block itself? It is being kept with the first array. Or you mean the pointer passed to addRange? That one would not necessarily belong to the first array. But I thought the GC does track pointers that point to block interiors. After all, that's how slices work. >> Then it's just the obvious() function. The whole point of the exercise is to make one GC allocation instead of N :) But still GC, so as not to put additional responsibility on the caller. > > No, 2 allocations instead of N. In this example. But obviously I'd like to make this generic, i.e.: struct S { /* ... */ } auto arrays = allocateArrays!(int, "ints", numInts, S, "Ss", numSs, void*, "pointers", numPtrs); So it won't necessarily be 2. And of course the function would calculate the alignment, initialize the contents (or not, e.g. via passing a dummy type Uninitialized!T), etc. I do have a use case for allocating up to 4 arrays at once, for example. |
June 06, 2017 Re: Question to GC experts: NO_SCAN for a part of the block | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:
> No, 2 allocations instead of N.
Oh, I misunderstood you. You mean two blocks total, for scanned and non-scanned data. This could indeed work for cases when more than two arrays are needed. Still, it's one extra allocation.
|
June 06, 2017 Re: Question to GC experts: NO_SCAN for a part of the block | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stanislav Blinov | On 6/6/17 6:36 AM, Stanislav Blinov wrote:
> On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:
>
>> No, 2 allocations instead of N.
>
> Oh, I misunderstood you. You mean two blocks total, for scanned and
> non-scanned data. This could indeed work for cases when more than two
> arrays are needed. Still, it's one extra allocation.
>
True. But O(2) is better than O(N) :)
-Steve
|
June 06, 2017 Re: Question to GC experts: NO_SCAN for a part of the block | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On Tuesday, 6 June 2017 at 12:07:28 UTC, Steven Schveighoffer wrote:
> On 6/6/17 6:36 AM, Stanislav Blinov wrote:
>> On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:
>>
>>> No, 2 allocations instead of N.
>>
>> Oh, I misunderstood you. You mean two blocks total, for scanned and
>> non-scanned data. This could indeed work for cases when more than two
>> arrays are needed. Still, it's one extra allocation.
>>
>
> True. But O(2) is better than O(N) :)
>
> -Steve
Heh, this isn't malloc. We could end up in a situation when even one allocation is slower than N, depending on when they happen :)
Actually, separate blocks might be a decent choice: there's another problem with sharing blocks for different data in that finalizers can't be run. So such a function would either not support types with finalizers at all, or would have to allocate separate blocks for such types anyway. Don't know, need to actually implement and measure if it's even worth the trouble.
|
Copyright © 1999-2021 by the D Language Foundation