BitArray implementation issue

Jul 22, 2014

safety0ff

Jul 23, 2014

H. S. Teoh

Jul 23, 2014

Jul 23, 2014

Jul 23, 2014

Jul 23, 2014

Jul 23, 2014

Currently the implementation of BitArray's length setter is badly broken. Its implementation contains this code segment: if (newdim != olddim) { auto b = ptr[0 .. olddim]; b.length = newdim; // realloc ptr = b.ptr; if (newdim & (bitsPerSizeT-1)) // Set any pad bits to 0 ptr[newdim - 1] &= ~(~0 << (newdim & (bitsPerSizeT-1))); } There are many bugs in the last 2 lines: 1) If we're increasing the length, ptr[newdim-1] is necessarily zero, so we're doing nothing. 2) If we're reducing the length, we're essentially clearing arbitrary bits whenever the branch is taken, furthermore: 3) On 64 bit, we're always clearing the upper 32 bits of the last word. 4) On 64 bit, we have undefined behaviour from the left shift. While trying to fix this the question was raised: should we even clear any bits? The underlying array can be shared between BitArray's (assignment behaves like dynamic array assignment.) This means that clearing the bits might affect other BitArray's if the extension does not cross a size_t boundary. So it is difficult to mimic D dynamic array semantics at a bit level without a reference counting mechanism. Furthermore, a BitArray can function as an arbitrary data reinterpreter via the function: void init(void[] v, size_t numbits) Help is needed for deciding whether setting length should zero the new bits (even if it means stomping other BitArrays.) Currently I've implemented zero'ing bits upon extension in my PR [1], but it is stalled on this design issue. [1] https://github.com/D-Programming-Language/phobos/pull/2249

July 23, 2014

Re: BitArray implementation issue

Posted by H. S. Teoh
in reply to safety0ff

Permalink

H. S. Teoh

Posted in reply to safety0ff

Permalink

On Tue, Jul 22, 2014 at 09:16:35PM +0000, safety0ff via Digitalmars-d wrote:
> Currently the implementation of BitArray's length setter is badly broken. Its implementation contains this code segment:
> 
>     if (newdim != olddim) {
>         auto b = ptr[0 .. olddim];
>         b.length = newdim;    // realloc
>         ptr = b.ptr;
>         if (newdim & (bitsPerSizeT-1)) // Set any pad bits to 0
>             ptr[newdim - 1] &= ~(~0 << (newdim & (bitsPerSizeT-1)));
>     }
> 
> There are many bugs in the last 2 lines:
> 1) If we're increasing the length, ptr[newdim-1] is necessarily zero,
> so we're doing nothing.
> 2) If we're reducing the length, we're essentially clearing arbitrary
> bits whenever the branch is taken, furthermore:
> 3) On 64 bit, we're always clearing the upper 32 bits of the last
> word.
> 4) On 64 bit, we have undefined behaviour from the left shift.

Wow. Looks like this is in dire need of some TLC.

> While trying to fix this the question was raised: should we even clear any bits?

You might want to consider implementing a way of tracking how many bits in the final word are valid. That way, you can correctly trigger reallocation if the user tries to write to a bit beyond the current end of the array (instead of stomping over other BitArray's data), and you don't have to clear any bits except newly-allocated words. Assuming we want to keep dynamic-array-like behaviour for BitArray, that is. I think it's a good idea.

> The underlying array can be shared between BitArray's (assignment
> behaves like dynamic array assignment.)
> This means that clearing the bits might affect other BitArray's if the
> extension does not cross a size_t boundary.
> So it is difficult to mimic D dynamic array semantics at a bit level
> without a reference counting mechanism.

I think it's best to keep a bit count of the number of valid bits at the end of the array, in addition to the number of words in the underlying array. You might also want to consider keeping a bit count of the number of valid bits at the *start* of the array, so that it's possible to bit-level slicing in a transparent way.

> Furthermore, a BitArray can function as an arbitrary data
> reinterpreter via the function:
> void init(void[] v, size_t numbits)
> 
> Help is needed for deciding whether setting length should zero the new bits (even if it means stomping other BitArrays.)

It shouldn't. Keeping track of the number of valid bits in the last word should solve this problem.

> Currently I've implemented zero'ing bits upon extension in my PR [1], but it is stalled on this design issue.
> 
> [1] https://github.com/D-Programming-Language/phobos/pull/2249

I think the best solution is to keep an additional count of the number of valid bits in the last word of the array. Then you don't have to worry about stomping other BitArrays, and you can potentially implement bit-level slicing too.

T

-- 
IBM = I Blame Microsoft

On Tue, Jul 22, 2014 at 05:57:48PM -0700, H. S. Teoh via Digitalmars-d wrote: [....] > You might want to consider implementing a way of tracking how many bits in the final word are valid. That way, you can correctly trigger reallocation if the user tries to write to a bit beyond the current end of the array (instead of stomping over other BitArray's data), and you don't have to clear any bits except newly-allocated words. Assuming we want to keep dynamic-array-like behaviour for BitArray, that is. I think it's a good idea. Hmm. It looks like the code is already counting bits rather than words. So you should already have enough information to implement the correct solution (take the current length modulo the number of bits per word, and that tells you which bits in the last word are valid). T -- Let's not fight disease by killing the patient. -- Sean 'Shaleh' Perry

On Wednesday, 23 July 2014 at 00:59:38 UTC, H. S. Teoh via Digitalmars-d wrote: > > You might want to consider implementing a way of tracking how many bits > in the final word are valid. That way, you can correctly trigger > reallocation if the user tries to write to a bit beyond the current end > of the array (instead of stomping over other BitArray's data), and you > don't have to clear any bits except newly-allocated words. Assuming we > want to keep dynamic-array-like behaviour for BitArray, that is. I think > it's a good idea. In brief, you're suggesting to always realloc on extension, no matter if it's a sub-word extension. This solves the stomping issue nicely, but it would cause a lot of GC churn in a concatenation loop.

On Wednesday, 23 July 2014 at 02:33:39 UTC, safety0ff wrote: > > In brief, you're suggesting to always realloc on extension, no matter if it's a sub-word extension. > This solves the stomping issue nicely Actually, you'd need to allocate and copy on each extension, not realloc, because realloc would do nothing on subword extension.

On Wednesday, 23 July 2014 at 00:59:38 UTC, H. S. Teoh via Digitalmars-d wrote: > > I think it's best to keep a bit count of the number of valid bits at the > end of the array, in addition to the number of words in the underlying > array. You might also want to consider keeping a bit count of the number > of valid bits at the *start* of the array, so that it's possible to > bit-level slicing in a transparent way. > Sorry, being too eager to reply. We would need to use a secondary pointer to the count to avoid breaking code that uses the void init(void[], size_t) function which allows arbitrary bit twiddling of the passed in array.

Forums