Thread overview
[D-runtime] Metadata for memory
Jul 30, 2010  Steve Schveighoffer
Jul 30, 2010  Fawzi Mohamed
Jul 30, 2010  Fawzi Mohamed
Jul 30, 2010  Steve Schveighoffer
Jul 30, 2010  Fawzi Mohamed

July 30, 2010
Steve Schveighoffer
People are working diligently on incorporating precise scanning into the GC.

I have added safe array appending.

Both of these require metadata associated with the memory block to assist the runtime in correctly performing its tasks.  So far, the solution has been to eat a small piece of the data block allocated.  For precise scanning, it's a bitmask or pointer-to-bitmask indicating which fields are pointers.  For safe array appending, it's a length field indicating the "used" length of the block.

One of the problems the developers working on precise scanning have pointed out is that we are both eating into the memory block, and there is the potential that we both try to use the same part of it, unaware of the other's usage.  Also, in large blocks I store the array length at the beginning of the block, which can screw up precise scanning, since the length can offset the alignment of the bitmask.

I think it's probably time to think about how we can allocate metadata in an orderly fashion.

The requirements are that:

1. the overhead is as low as possible.  Array appending uses 1 byte for all blocks <= 256 bytes.  A bitmask for a small block (16 bytes) only needs 3-4 bits, but may consume 4 bytes if a pointer to a bitmask is required.

2. alignment of data is preserved.  This is done mostly via putting the data at the end.  But storing outside the block is also a possibility.

3. Individual metadata does not conflict with others.

I think the GC is going to have to be aware of the array appending, since it must be aware of the length field (if stored in the block), and it could potentially save many cycles by only scanning 'used' data.

On storing the array append length at the beginning of large blocks:  A large block can be extended in place, meaning that the block size (not just the used size) can increase.  This means the end of the block can move.  With my current array append code, for shared array appends, you would have to lock the entire append function to ensure this didn't happen in the middle of the append.  I query for the block size once, and if it changes after I do the query, the length would be screwed up.  This might be a necessary evil, but who is appending large amounts of data to shared blocks anyway?  So biting the shared bullet and locking the entire shared append function might take care of the issues with storing at the beginning (scanning skipping the length, bitmask offset by n bytes).  The GC is going to have to be aware of moving the bitmask anyways.
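
Roughly, the locking I have in mind would look something like this (just a sketch -- the lock, the helper name, and the exact layout are invented for illustration, not the actual druntime code; it assumes the "used" length is stored as a size_t at the start of a large block):

import core.memory : GC;
import core.stdc.string : memcpy;
import core.sync.mutex : Mutex;

__gshared Mutex sharedAppendLock;           // one global lock for shared appends
shared static this() { sharedAppendLock = new Mutex; }

// Append raw bytes to a shared array living in a large block whose "used"
// length is stored at the start of the block.  Everything from querying the
// block size to updating the stored length happens under the lock, so an
// in-place extension by another thread can't move the end of the block out
// from under us mid-append.
void sharedAppendBytes(ref void* base, const(void)[] data)
{
    sharedAppendLock.lock();
    scope(exit) sharedAppendLock.unlock();

    enum pad = size_t.sizeof;               // space taken by the stored length
    auto used = *cast(size_t*) base;        // current "used" length
    auto info = GC.query(base);             // block base, size and attributes

    if (pad + used + data.length > info.size)
        base = GC.realloc(base, pad + used + data.length);

    memcpy(cast(ubyte*) base + pad + used, data.ptr, data.length);
    *cast(size_t*) base = used + data.length;   // publish the new length last
}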

I think the funkiest situation is really the small blocks (16 or 32 bytes), where you want overhead to be small but you need to store enough info.  With a 16-byte block, I think we can squeeze both the bitmask and length into one byte: essentially 4 bits for length, and 3 bits for a bitmask (the 4th word is incomplete, so only 3 bits are needed).  For a 32-byte block, we would need 2 bytes, one for the 7-bit bitmask and one for the length.  At 64 bytes, we can have 16 bits for the bitmask and 1 byte for the length.  At 128 bytes and above, a bitmask would consist of 32 bits anyway, so we might as well start using a pointer (may be different for 64-bit CPUs), and still only a byte is needed for the length.
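
For the 16-byte case, the packing could look something like this (illustrative only -- the bit positions and names are made up, not a settled layout):

// One metadata byte per 16-byte block: low 4 bits hold the "used" length
// (0-15), the next 3 bits hold the pointer bitmask for the first three
// words (the 4th word is partly occupied by this byte, so it can never be
// a whole pointer).
ubyte packSmallMeta(size_t usedLength, ubyte ptrMask)
{
    assert(usedLength < 16 && ptrMask < 8);
    return cast(ubyte)((ptrMask << 4) | usedLength);
}

void unpackSmallMeta(ubyte meta, out size_t usedLength, out ubyte ptrMask)
{
    usedLength = meta & 0xF;
    ptrMask    = (meta >> 4) & 0x7;
}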

The order of storage should always be array length, then bitmask, to keep alignment proper and allow the most usage of space.  And depending on the attribute bits set on the block (APPENDABLE, NO_SCAN), one or both of these may be omitted.
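
Concretely, the presence and placement could be decided like this (again only a sketch with invented names; appendable/noScan stand for the APPENDABLE and NO_SCAN attribute bits):

// Which metadata a block carries and where it sits, using the order
// proposed above: the array length first, then the bitmask, packed at the
// very end of the block.  A block that isn't APPENDABLE has no length; a
// NO_SCAN block has no bitmask.
struct MetaLayout
{
    size_t lengthOffset  = size_t.max;   // size_t.max == "not present"
    size_t bitmaskOffset = size_t.max;
}

MetaLayout metaLayout(size_t blockSize, bool appendable, bool noScan,
                      size_t lengthBytes, size_t bitmaskBytes)
{
    MetaLayout l;
    size_t metaSize = (appendable ? lengthBytes : 0)
                    + (noScan ? 0 : bitmaskBytes);
    size_t pos = blockSize - metaSize;          // metadata lives at the end

    if (appendable) { l.lengthOffset  = pos; pos += lengthBytes; }
    if (!noScan)    { l.bitmaskOffset = pos; }
    return l;
}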

We also should get Bugzilla 4487 addressed: allocating an "array" whenever a single struct is desired is wasteful.

Anyone have any other ideas?  Any other points to make?

-Steve




July 30, 2010
Fawzi Mohamed
On 30-Jul-10, at 15:53, Steve Schveighoffer wrote:

> [...]
>
> The requirements are that:
>
> 1. the overhead is as low as possible.  Array appending uses 1 byte for all blocks <= 256 bytes.  A bitmask for a small block (16 bytes) only needs 3-4 bits, but may consume 4 bytes if a pointer to a bitmask is required.
>
> 2. alignment of data is preserved.  This is done mostly via putting the data at the end.  But storing outside the block is also a possibility.
>
> 3. Individual metadata does not conflict with others.
>
> [...]

I think that the main points are there: alignment, small overhead and conflict avoidance.

Actually, I don't see the need for more info about a block than its size and character (equivalent to the typeid of what was allocated).  Anything else of interest can be derived from that.

Given this, one could think about hardcoding that in some way in the block.  The rest should be stored elsewhere imho (one can expect that the table used to reach the other info will be cached when used, as it should be relatively small).

Still, I would give more thought to also storing those two pieces of information outside the block.  The pool already knows the size, and one could add a (uint/size_t?) per element to store the character.

Fawzi
July 30, 2010
Fawzi Mohamed
> [...]
> I think the funkiest situation is really the small blocks (16 or 32 bytes), where you want overhead to be small but you need to store enough info.  With a 16-byte block, I think we can squeeze both the bitmask and length into one byte: essentially 4 bits for length, and 3 bits for a bitmask (the 4th word is incomplete, so only 3 bits are needed).  For a 32-byte block, we would need 2 bytes, one for the 7-bit bitmask and one for the length.  At 64 bytes, we can have 16 bits for the bitmask and 1 byte for the length.  At 128 bytes and above, a bitmask would consist of 32 bits anyway, so we might as well start using a pointer (may be different for 64-bit CPUs), and still only a byte is needed for the length.

Especially in these cases, external storage with a bitmap for the whole pool is advantageous.
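
For example (a sketch with invented names): one bit per word over the whole pool, so on 32-bit a 16-byte block costs only 4 bits of external metadata and nothing inside the block itself.

// Pool-wide pointer bitmap kept outside the blocks: bit i says whether the
// i-th word of the pool holds a pointer.
struct PoolBitmap
{
    void*    baseAddr;   // start of the pool's memory
    size_t[] bits;       // 1 bit per word of the pool

    bool isPointerWord(const void* wordAddr) const
    {
        enum bitsPerWord = 8 * size_t.sizeof;
        auto word = (cast(size_t) wordAddr - cast(size_t) baseAddr) / size_t.sizeof;
        return ((bits[word / bitsPerWord] >> (word % bitsPerWord)) & 1) != 0;
    }

    void setPointerWord(const void* wordAddr, bool isPtr)
    {
        enum bitsPerWord = 8 * size_t.sizeof;
        auto word = (cast(size_t) wordAddr - cast(size_t) baseAddr) / size_t.sizeof;
        auto mask = size_t(1) << (word % bitsPerWord);
        if (isPtr) bits[word / bitsPerWord] |= mask;
        else       bits[word / bitsPerWord] &= ~mask;
    }
}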

July 30, 2010
Steve Schveighoffer



----- Original Message ----
> From: Fawzi Mohamed <fawzi at gmx.ch>
>
> Actually, I don't see the need for more info about a block than its size and character (equivalent to the typeid of what was allocated).  Anything else of interest can be derived from that.

The "used" length of an array is not a static value.  It needs to be stored separately from the typeinfo.  The block size is identified by the pool, or by doing a linear search for large blocks (this also needs to be addressed at some point).

> Given this, one could think about hardcoding that in some way in the block.  The rest should be stored elsewhere imho (one can expect that the table used to reach the other info will be cached when used, as it should be relatively small).
>
> Still, I would give more thought to also storing those two pieces of information outside the block.  The pool already knows the size, and one could add a (uint/size_t?) per element to store the character.

Note that, as I said above, the "used size" must be stored somewhere besides the "character".  Also note that for arrays, a pad byte is inherent in the memory block, to prevent a pointer just past the end of the array from pointing into the next block (which might possibly be unallocated).  So we are already using a byte inside the block for something; we might as well make some use of it.

For non-arrays, I'm unsure.  An external pointer might be fine, but for small blocks it can be up to 25% overhead if you are storing a size_t (e.g. a 4-byte size_t alongside a 16-byte block is 4/16 = 25%).

-Steve




July 30, 2010
Fawzi Mohamed
On 30-Jul-10, at 16:44, Steve Schveighoffer wrote:

> [...]
>
> For non-arrays, I'm unsure.  An external pointer might be fine, but for small blocks it can be up to 25% overhead if you are storing a size_t.

Yes, I realized as much after sending it, hence my previous repost.
Exactly in those cases a separate bitmap for the whole pool is much more advantageous, and the size could be handled the same way: one would use a special encoding for the pools of small objects (one could use even less than one byte per block for B_16, B_32, ...).
Being "external", no clashes are possible...
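
For the small pools, something along these lines (illustrative only, names invented): in a B_16 pool the block size is implied by the pool itself, so e.g. a 4-bit "used" length per block is enough, two blocks per byte of external storage.

// Side array of 4-bit "used" lengths for a pool of 16-byte blocks: two
// blocks share one byte, i.e. half a byte of metadata per block, all of it
// outside the blocks themselves.
struct SmallPoolLengths
{
    ubyte[] nibbles;   // one 4-bit length per block, packed two per byte

    size_t usedLength(size_t blockIdx) const
    {
        auto b = nibbles[blockIdx / 2];
        return (blockIdx & 1) ? (b >> 4) : (b & 0xF);
    }

    void setUsedLength(size_t blockIdx, size_t len)
    {
        assert(len < 16);
        auto b = nibbles[blockIdx / 2];
        nibbles[blockIdx / 2] = (blockIdx & 1)
            ? cast(ubyte)((b & 0x0F) | (len << 4))
            : cast(ubyte)((b & 0xF0) | len);
    }
}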

Fawzi