April 05, 2012 Slices and GC | ||||
|---|---|---|---|---|
| ||||
Recently I've been working on some projects that involve parsing binary files. I've mainly been using std.file.read() to get the whole file as a huge array and then extracting slices. I had initially assumed that the GC would free any chunks of the array that didn't end up being referenced by these slices, but after reading some more, it looks like the whole array is kept in memory even if only a few elements are actually referenced. Is this actually the case? If so, might the language be extended to handle this situation? | ||||
April 05, 2012 Re: Slices and GC | ||||
|---|---|---|---|---|
| ||||
Posted in reply to BLM | On Thursday, 5 April 2012 at 15:00:04 UTC, BLM wrote:
> Recently I've been working on some projects that involve parsing binary files. I've mainly been using std.file.read() to get the whole file as a huge array and then extracting slices. I had initially assumed that the GC would free any chunks of the array that didn't end up being referenced by these slices, but after reading some more, it looks like the whole array is kept in memory even if only a few elements are actually referenced. Is this actually the case? If so, might the language be extended to handle this situation?
The GC can't really know which parts of the array you're using. For example, your only reference to the array might be a pointer, and you might be traversing the array in either direction, only keeping count of the remaining bytes until the array boundary.
Consider .dup-ing the slices you're going to need, or using std.mmfile to map the file into memory - in that case, the OS won't load the unnecessary parts of the file into memory in the first place.
| |||
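The two suggestions above could be sketched roughly as follows. This is only an illustration: the function names `extractHeader` and `extractViaMmFile` and their parameters are made up for the example, and it assumes the interesting data is a known byte range of the file.

```d
import std.file : read;
import std.mmfile : MmFile;

/// Reads the whole file, then copies out only the part that is needed.
/// (Hypothetical helper; `headerLen` must not exceed the file size.)
ubyte[] extractHeader(string path, size_t headerLen)
{
    auto whole = cast(ubyte[]) read(path);   // one large GC allocation
    // .dup makes an independent copy, so the returned array does not
    // keep the large `whole` block alive once this function returns.
    return whole[0 .. headerLen].dup;
}

/// Maps the file instead of reading it, and copies out one byte range.
/// (Hypothetical helper; the range must lie inside the file.)
ubyte[] extractViaMmFile(string path, size_t offset, size_t len)
{
    auto mm = new MmFile(path);              // read-only mapping
    scope (exit) destroy(mm);                // unmap before returning
    auto view = cast(const(ubyte)[]) mm[offset .. offset + len];
    // Copy the bytes out, because `view` is only valid while the file is mapped.
    return view.dup;
}
```

In both cases the cost is one copy of the bytes that are actually kept. With the memory-mapped version, the parts of the file that are never touched need not be loaded at all.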
April 05, 2012 Re: Slices and GC | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | On Thursday, 5 April 2012 at 15:30:45 UTC, Vladimir Panteleev wrote:
>
> The GC can't really know which parts of the array you're using. For example, your only reference to the array might be a pointer, and you might be traversing the array in either direction, only keeping count of the remaining bytes until the array boundary.
>
> Consider .dup-ing the slices you're going to need, or using std.mmfile to map the file into memory - in that case, the OS won't load the unnecessary parts of the file into memory in the first place.
I had considered using .dup, but I wanted to minimize overhead. I should probably look into std.mmfile or pull the data out in smaller chunks that the GC can handle individually.
If the GC can distinguish between pointers and slices, it should theoretically be able to prune an array that is only referenced by slices, but I'm not sure how well that would fit into the current GC system.
| |||
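Pulling the data out in smaller chunks might look something like the sketch below, which reads only the requested range into its own small allocation. The helper name `readRange` is invented for this example, and it assumes the offsets of the interesting regions are already known.

```d
import std.stdio : File;

/// Reads `len` bytes starting at `offset` into a freshly allocated buffer.
/// (Hypothetical helper; `len` must be non-zero, since rawRead rejects
/// an empty buffer.)
ubyte[] readRange(string path, long offset, size_t len)
{
    auto f = File(path, "rb");
    f.seek(offset);                 // position at the start of the region
    auto buf = new ubyte[](len);    // small, independent GC allocation
    // rawRead returns the part of `buf` that was actually filled,
    // which is shorter than `len` if the file ends early.
    return f.rawRead(buf);
}
```

Each returned array is its own GC block, so the collector can free it individually once nothing references it.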
April 05, 2012 Re: Slices and GC | ||||
|---|---|---|---|---|
| ||||
Posted in reply to BLM | On Thursday, April 05, 2012 17:00:03 BLM wrote:
> Recently I've been working on some projects that involve parsing binary files. I've mainly been using std.file.read() to get the whole file as a huge array and then extracting slices. I had initially assumed that the GC would free any chunks of the array that didn't end up being referenced by these slices, but after reading some more, it looks like the whole array is kept in memory even if only a few elements are actually referenced. Is this actually the case? If so, might the language be extended to handle this situation?
http://dlang.org/d-array-article.html
- Jonathan M Davis
| |||
April 05, 2012 Re: Slices and GC | ||||
|---|---|---|---|---|
| ||||
Posted in reply to BLM | On 05.04.2012 20:35, BLM wrote:
> On Thursday, 5 April 2012 at 15:30:45 UTC, Vladimir Panteleev wrote:
>
>>
>> The GC can't really know which parts of the array you're using. For example, your only reference to the array might be a pointer, and you might be traversing the array in either direction, only keeping count of the remaining bytes until the array boundary.
>>
>> Consider .dup-ing the slices you're going to need, or using std.mmfile to map the file into memory - in that case, the OS won't load the unnecessary parts of the file into memory in the first place.
>
> I had considered using .dup, but I wanted to minimize overhead. I should probably look into std.mmfile or pull the data out in smaller chunks that the GC can handle individually.
Another idea is to copy the interesting parts of the original chunk out to a separate storage array. This array will contain your sliced-out data, just packed more tightly. If you have an upper bound on the percentage of useful bytes, then you can get away without extra allocations.
The tricky part is reallocating this storage array: that leaves existing slices pointing at the old block (which also keeps the GC from deallocating it). A workaround would be to use pure index-based "slices" that work on this block only.
> If the GC can distinguish between pointers and slices, it should theoretically be able to prune an array that is only referenced by slices, but I'm not sure how well that would fit into the current GC system.
-- Dmitry Olshansky
| |||
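The index-based "slices" idea could be sketched along these lines. The names `Span` and `PackedStore` are invented for this illustration and are not part of any library. The point is that a span stores only an offset and a length, so it stays valid even when the storage array is reallocated.

```d
/// Index-based "slice": an offset and a length into a shared storage array.
/// (Hypothetical type for illustration.)
struct Span
{
    size_t offset;
    size_t length;
}

/// Packs the kept data tightly into a single array.
/// (Hypothetical type for illustration.)
struct PackedStore
{
    ubyte[] storage;

    /// Copies `data` to the end of the storage and returns its span.
    Span put(const(ubyte)[] data)
    {
        auto s = Span(storage.length, data.length);
        storage ~= data;   // may reallocate; spans are unaffected
        return s;
    }

    /// Resolves a span against the current storage array.
    const(ubyte)[] get(Span s) const
    {
        return storage[s.offset .. s.offset + s.length];
    }
}
```

A slice returned by `get` should be used and then dropped rather than stored, since holding on to it would pin an old block after a later reallocation. If an upper bound on the kept bytes is known, calling `reserve` on `storage` up front should avoid reallocation entirely.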
Copyright © 1999-2021 by the D Language Foundation