BitArray/BitFields - Review (page 2) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Learn » BitArray/BitFields - Review (page 2)

July 28, 2012

Re: BitArray/BitFields - Review

Posted by Era Scarecrow
in reply to Jonathan M Davis

Era Scarecrow

Posted in reply to Jonathan M Davis

On Saturday, 28 July 2012 at 21:30:21 UTC, Jonathan M Davis wrote:

> Well, it seems that my comment was somewhat misplaced in that BitArray isn't a range but a container. My comment was directed at opSlice on ranges. Container types should return whatever range type is appropriate. That will usually be a struct of some variety. I'd have to study BitArray and your changes to determine what the appropriate type was here, but it sounds like you were talking about duping the contents with opSlice, which would be highly irregular. Normally, ranges over containers are light wrappers which can iterate through and potentially alter the elements in the container, and duping up an array as the range type doesn't work like that. But depending on what BitArray is doing exactly, it might be appropriate. I don't know.

 An array suggests you can control the size of the array, however if you are forced to work in 32increments (or more if 64bit), that makes it highly hard to control without making it a slice to begin with. Consider without slices (range control)

 BitArray ba; //size 32/64bit

 ba.length = 42; //life, the universe and everything
 assert(ba.length == 42); //fails, can't set length except by increments of 32/64

 ba ~= true; //what's the size now? Or are all appending operators invalid unless it's a slice?

 foreach(b; ba) {
   //fails, can't go over range.
 }

 auto slice = ba[]; //converts to BitSlice

 foreach(b; slice) {
    //works
 }

 BitSlice func(BitArray ba);
 BitSlice func(BitSlice ba); //two versions of function needed? If convertable between the two, then you only lose the begining/ending length offsets.

 T func(T)(T ba) if (is(T : BitArray)) // correct? Potential loss of information.

 BitSlice x = ba[1 .. 10];
 BitArray y = func(x); //range lost, beginning/ending at max regardless of size



 Honestly that doesn't add up to what I call an 'array'. It should look like an array, act like an array, and besides not being able to specify the array as separate from the entire BitArray (Ie const(int)) it's otherwise should not be distinguishable from an array.

July 28, 2012

Re: BitArray/BitFields - Review

Posted by Era Scarecrow
in reply to Era Scarecrow

Era Scarecrow

Posted in reply to Era Scarecrow

On Saturday, 28 July 2012 at 21:39:56 UTC, Era Scarecrow wrote:
> BitArray (Ie const(int)) it's otherwise should not be

 My filter/formatter got me again...

  Honestly that doesn't add up to what I call an 'array'. It should look like an array, act like an array, and besides not being able to specify the array as separate from the entire BitArray (ie: const(int)[]) it's otherwise should be indistinguishable from an array template-wise.

July 28, 2012

Re: BitArray/BitFields - Review

Posted by Dmitry Olshansky
in reply to Era Scarecrow

Dmitry Olshansky

Posted in reply to Era Scarecrow

On 29-Jul-12 01:39, Era Scarecrow wrote:
> On Saturday, 28 July 2012 at 21:30:21 UTC, Jonathan M Davis wrote:
>
>> Well, it seems that my comment was somewhat misplaced in that BitArray
>> isn't a range but a container. My comment was directed at opSlice on
>> ranges. Container types should return whatever range type is
>> appropriate. That will usually be a struct of some variety. I'd have
>> to study BitArray and your changes to determine what the appropriate
>> type was here, but it sounds like you were talking about duping the
>> contents with opSlice, which would be highly irregular. Normally,
>> ranges over containers are light wrappers which can iterate through
>> and potentially alter the elements in the container, and duping up an
>> array as the range type doesn't work like that. But depending on what
>> BitArray is doing exactly, it might be appropriate. I don't know.
>
>   An array suggests you can control the size of the array, however if
> you are forced to work in 32increments (or more if 64bit), that makes it
> highly hard to control without making it a slice to begin with. Consider
> without slices (range control)

A strange jump to conclusions? Why can't array know it's size in bits and capacity in words?

>
>   BitArray ba; //size 32/64bit
>
>   ba.length = 42; //life, the universe and everything
>   assert(ba.length == 42); //fails, can't set length except by
> increments of 32/64
>
>   ba ~= true; //what's the size now? Or are all appending operators
> invalid unless it's a slice?

43 ? Why should it use 32/64 bit increments?

>
>   foreach(b; ba) {
>     //fails, can't go over range.
>   }

works as foreach takes [] when needed automagically :)
>
>   auto slice = ba[]; //converts to BitSlice
>
>   foreach(b; slice) {
>      //works
>   }
>
>   BitSlice func(BitArray ba);
>   BitSlice func(BitSlice ba); //two versions of function needed? If
> convertable between the two, then you only lose the begining/ending
> length offsets.
>

Just make it a template if it works with any range.

>   T func(T)(T ba) if (is(T : BitArray)) // correct? Potential loss of
> information.
T func(R)(R ba)
 if(isRandomAccessRange!R && is(Unqual!(ElementType!R) == bool))

Would be far more general. As a bonus it will work with built-in arrays of bool or anything compatible.

>
>   BitSlice x = ba[1 .. 10];
>   BitArray y = func(x); //range lost, beginning/ending at max regardless
> of size
>

Certain functions would need to take a BitArray if they need to change it size, append, prepand etc.

>   Honestly that doesn't add up to what I call an 'array'. It should look
> like an array, act like an array, and besides not being able to specify
> the array as separate from the entire BitArray (Ie const(int)) it's
> otherwise should not be distinguishable from an array.

It's damn hard to emulate built-in array. And I'd rather not to see this emulation half-working.

And in say C++, std::vector (and most arrays in other languages I know about) is not sliceble at all or slices are different type, or they do copy the whole thing. So I'm not sure what kind of array you are used to.  D's built-in arrays are rare beasts with their own issues, even with the fair deal of built-in language/runtime support they have.

-- 
Dmitry Olshansky

July 28, 2012

Re: BitArray/BitFields - Review

Posted by Era Scarecrow
in reply to Dmitry Olshansky

Era Scarecrow

Posted in reply to Dmitry Olshansky

On Saturday, 28 July 2012 at 21:49:45 UTC, Dmitry Olshansky wrote:
> On 29-Jul-12 01:39, Era Scarecrow wrote:
>> An array suggests you can control the size of the array, however if you are forced to work in 32increments (or more if 64bit), that makes it highly hard to control without making it a slice to begin with. Consider without slices (range control)
>
> A strange jump to conclusions? Why can't array know it's size in bits and capacity in words?

 Knowing the size based on if it's compact or the array.length * bits would be easy, but to do slices you drop the beginning/ending reference pointers, locking it at 32/64bits. If you keep that information, why not just include slicing along with the rest of it?

>>
>>  BitArray ba; //size 32/64bit
>>
>>  ba.length = 42; //life, the universe and everything
>>  assert(ba.length == 42); //fails, can't set length except by
>> increments of 32/64
>>
>>  ba ~= true; //what's the size now? Or are all appending operators
>> invalid unless it's a slice?
>
> 43 ? Why should it use 32/64 bit increments?

 See above.

> works as foreach takes [] when needed automagically :)

 I thought about that after I submitted it, so my mistake on that.

> Just make it a template if it works with any range.
>
> T func(R)(R ba)
>  if(isRandomAccessRange!R && is(Unqual!(ElementType!R) == bool))
>
> Would be far more general. As a bonus it will work with built-in arrays of bool or anything compatible.

 True, but you'd still need to specifically convert the array to a slice first. So you'd call it func(ba[]), or func(ba.range) or func(ba[0 .. $]). Kinda seems silly to require the extra bit there.

>>  BitSlice x = ba[1 .. 10];
>>  BitArray y = func(x); //range lost, beginning/ending at max regardless
>> of size
>>
> Certain functions would need to take a BitArray if they need to change it size, append, prepand etc.

 See first reply.

> It's damn hard to emulate built-in array. And I'd rather not to see this emulation half-working.
>
> And in say C++, std::vector (and most arrays in other languages I know about) is not sliceble at all or slices are different type, or they do copy the whole thing. So I'm not sure what kind of array you are used to. D's built-in arrays are rare beasts with their own issues, even with the fair deal of built-in language/runtime support they have.

 Obviously it can't emulate it perfectly, but doing all the basic operations should be acceptable. I don't know all of what arrays require, but opIndex and length are the largest part to my opinion, which I guess is just randomAccessRange (or whatever it's called)

July 28, 2012

Re: BitArray/BitFields - Review

Posted by Dmitry Olshansky
in reply to Era Scarecrow

Dmitry Olshansky

Posted in reply to Era Scarecrow

On 29-Jul-12 02:08, Era Scarecrow wrote:
> On Saturday, 28 July 2012 at 21:49:45 UTC, Dmitry Olshansky wrote:
>> On 29-Jul-12 01:39, Era Scarecrow wrote:
>>> An array suggests you can control the size of the array, however if
>>> you are forced to work in 32increments (or more if 64bit), that makes
>>> it highly hard to control without making it a slice to begin with.
>>> Consider without slices (range control)
>>
>> A strange jump to conclusions? Why can't array know it's size in bits
>> and capacity in words?
>
>   Knowing the size based on if it's compact or the array.length * bits
> would be easy, but to do slices you drop the beginning/ending reference
> pointers, locking it at 32/64bits.

Mhm? Not getting you it at all. What's locking it ?

What's wrong with:
struct BitArray
{

union {
	struct {
		size_t[] data;
		ulong length;	//in bits

//length in words would be just
// (length + size_t.sizeof*4) / // (size_t.sizeof*8);
// as it's a power of 2 it should get optimized

	}
	size_t[2+ulong.sizeof/size_t.sizeof] compact;

// and say 1 bit is used for isCompact would be great if it could use // the least significant bit of pointer from data[], dunno if it's easy enough
}

}

then Range would be:

struct Range{
	BitArray* array;
	ulong start, end;
}

to mark [start..end] piece of it?

If you keep that information, why not
> just include slicing along with the rest of it?
>

I'm not seeing slicing easily integrated just because length is known.

-- 
Dmitry Olshansky

July 28, 2012

Re: BitArray/BitFields - Review

Posted by Era Scarecrow
in reply to Dmitry Olshansky

Era Scarecrow

Posted in reply to Dmitry Olshansky

On Saturday, 28 July 2012 at 22:19:05 UTC, Dmitry Olshansky wrote:
> Mhm? Not getting you it at all. What's locking it ?
>
> What's wrong with:
> struct BitArray
> {
>
> union {
> 	struct {
> 		size_t[] data;
> 		ulong length;	//in bits
>
> //length in words would be just
> // (length + size_t.sizeof*4) / // (size_t.sizeof*8);
> // as it's a power of 2 it should get optimized
>
> 	}
> 	size_t[2+ulong.sizeof/size_t.sizeof] compact;
>
> // and say 1 bit is used for isCompact would be great if it could use // the least significant bit of pointer from data[], dunno if it's easy enough

 I'd rather not rely on the pointer at all as a reference for if it's compact. If the array is changed later to a template that could accept bytes, then where does that leave you? When the slice doesn't need to know more of the beginning, it can re-slice it to a lesser more appropriate size. It may very well soon contain a third entry in the union for main operations and use the larger size_t just for canUseBulk (Actually that sounds better to me).

 Besides, separating the slice suggests 'length' is not part of the union struct. Sure it's convertible to [0 .. length], but it's no longer convertible back to BitArray (Unless you make a whole new array and conversion/copying which takes a lot of time). If you do that, why not do BitSlice from the beginning which holds BitArray, but then if you do that why make them separate at all? It doesn't make sense to me. I can see making a separate range for the front/popFront/empty if you really wanted to use those, but it wouldn't change how it would work.

> then Range would be:
>
> struct Range{
> 	BitArray* array;
> 	ulong start, end;
> }
>
> to mark [start..end] piece of it?
>
>> If you keep that information, why not just include slicing along with the rest of it?
>
> I'm not seeing slicing easily integrated just because length is known.

 Beginning and ending are known, so slicing can be added at no extra cost.

July 28, 2012

Re: BitArray/BitFields - Review

Posted by bearophile
in reply to Era Scarecrow

bearophile

Posted in reply to Era Scarecrow

Era Scarecrow:

> BitFields:
>  I've concentrated on adding default values.

Good, that's useful.


Regarding BitArray I have written some ER:

http://d.puremagic.com/issues/show_bug.cgi?id=4123
http://d.puremagic.com/issues/show_bug.cgi?id=4124
http://d.puremagic.com/issues/show_bug.cgi?id=4717
http://d.puremagic.com/issues/show_bug.cgi?id=7487
http://d.puremagic.com/issues/show_bug.cgi?id=7488
http://d.puremagic.com/issues/show_bug.cgi?id=7490

Related:
http://d.puremagic.com/issues/show_bug.cgi?id=6697

Bye,
bearophile

July 28, 2012

Re: BitArray/BitFields - Review

Posted by Era Scarecrow
in reply to bearophile

Era Scarecrow

Posted in reply to bearophile

On Saturday, 28 July 2012 at 22:43:22 UTC, bearophile wrote:
> Era Scarecrow:
>
>> BitFields:
>> I've concentrated on adding default values.
>
> Good, that's useful.
>
>
> Regarding BitArray I have written some ER:
>
> http://d.puremagic.com/issues/show_bug.cgi?id=4123
> http://d.puremagic.com/issues/show_bug.cgi?id=4124
> http://d.puremagic.com/issues/show_bug.cgi?id=4717
> http://d.puremagic.com/issues/show_bug.cgi?id=7487
> http://d.puremagic.com/issues/show_bug.cgi?id=7488
> http://d.puremagic.com/issues/show_bug.cgi?id=7490

4123 - fixed
4124 - Set/reset all bits ie:
  BitArray ba = BitArray(100);
  ba[] = true;
Others haven't been done yet.

4717 - Same as 4124, none others done yet.
7487 - Done. When prepending it extends the array to size_t extra and slowly backtracks until it needs to do it again. To go further, have to disable re-slicing of the allocated array.

Or perhaps a 'start at end' during constructor stage. Hmmm..

7488 - Done, union used and is 'compact' version (by default or resized and can fit)

7490 - haven't touched yet. :(

> Related:
> http://d.puremagic.com/issues/show_bug.cgi?id=6697

 Not so sure. Could make a multi-dimentional one that returns slices to various sections, but  that's iffy. I'd need an example of how you'd use it with BitArray before I could build a solution.

July 28, 2012

Re: BitArray/BitFields - Review

Posted by Dmitry Olshansky
in reply to Era Scarecrow

Dmitry Olshansky

Posted in reply to Era Scarecrow

On 29-Jul-12 02:31, Era Scarecrow wrote:
> On Saturday, 28 July 2012 at 22:19:05 UTC, Dmitry Olshansky wrote:
>> Mhm? Not getting you it at all. What's locking it ?
>>
>> What's wrong with:
>> struct BitArray
>> {
>>
>> union {
>>     struct {
>>         size_t[] data;
>>         ulong length;    //in bits
>>
>> //length in words would be just
>> // (length + size_t.sizeof*4) / // (size_t.sizeof*8);
>> // as it's a power of 2 it should get optimized
>>
>>     }
>>     size_t[2+ulong.sizeof/size_t.sizeof] compact;
>>
>> // and say 1 bit is used for isCompact would be great if it could use
>> // the least significant bit of pointer from data[], dunno if it's
>> easy enough
>
>   I'd rather not rely on the pointer at all as a reference for if it's
> compact.

No problem, I redesigned a bit:

struct BitArray
{

union {
    struct {
        ulong length;    //in bits, the top bit used as isCompact
        size_t[] data;
    }
    struct{
	ushort smallLength; //here as well
	ubyte[size_t.sizeof*2+ulong.sizeof-2] compact;
    }
}

Up 2^63 of bits would occupy 2^60 of bytes should be good enough for the years to come.
 I'm not seeing it as a problem as at very least processors have to actually be able to address more then 42 (or was it 48?) bits that they can now. (see AMD or Intel or ARM datasheets on 64 bit chips)

If the array is changed later to a template that could accept
> bytes, then where does that leave you?
> When the slice doesn't need to
> know more of the beginning,

???
Slice can start at some random offset

 it can re-slice it to a lesser more
> appropriate size.

no idea what's you are talking about here.

 It may very well soon contain a third entry in the
> union for main operations and use the larger size_t just for canUseBulk
> (Actually that sounds better to me).

Again I'm having hard time to get what follows from where. Can you expand on this point please?
Preferably with some code as it was far easier to reason about.


>   Besides, separating the slice suggests 'length' is not part of the
> union struct.

How? All containers know their length.

Sure it's convertible to [0 .. length], but it's no longer
> convertible back to BitArray (Unless you make a whole new array and
> conversion/copying which takes a lot of time).

Yes it's not convertible. And it's no problem and is intended. Otherwise like you said it doesn't make a lot of sense.

>If you do that, why not
> do BitSlice from the beginning which holds BitArray, but then if you do
> that why make them separate at all?
> It doesn't make sense to me. I can
> see making a separate range for the front/popFront/empty if you really
> wanted to use those, but it wouldn't change how it would work.
>

>> then Range would be:
>>
>> struct Range{
>>     BitArray* array;
>>     ulong start, end;
>> }
>>
>> to mark [start..end] piece of it?
>>
>>> If you keep that information, why not just include slicing along with
>>> the rest of it?
>>
>> I'm not seeing slicing easily integrated just because length is known.
>
>   Beginning and ending are known, so slicing can be added at no extra cost.
Begin takes extra ulong I'm not sure it is needed in the container itself.

-- 
Dmitry Olshansky

July 28, 2012

Re: BitArray/BitFields - Review

Posted by Era Scarecrow
in reply to Dmitry Olshansky

Era Scarecrow

Posted in reply to Dmitry Olshansky

On Saturday, 28 July 2012 at 23:02:17 UTC, Dmitry Olshansky wrote:
> On 29-Jul-12 02:31, Era Scarecrow wrote:
>> On Saturday, 28 July 2012 at 22:19:05 UTC, Dmitry Olshansky wrote:
>>> Mhm? Not getting you it at all. What's locking it ?
>>>
>>> What's wrong with:
>>> struct BitArray
>>> {
>>>
>>> union {
>>>    struct {
>>>        size_t[] data;
>>>        ulong length;    //in bits
>>>
>>> //length in words would be just
>>> // (length + size_t.sizeof*4) / // (size_t.sizeof*8);
>>> // as it's a power of 2 it should get optimized
>>>
>>>    }
>>>    size_t[2+ulong.sizeof/size_t.sizeof] compact;
>>>
>>> // and say 1 bit is used for isCompact would be great if it could use
>>> // the least significant bit of pointer from data[], dunno if it's easy enough
>>
>> I'd rather not rely on the pointer at all as a reference for if it's compact.
>
> No problem, I redesigned a bit:
>
> struct BitArray
> {
>
> union {
>     struct {
>         ulong length;    //in bits, the top bit used as isCompact
>         size_t[] data;
>     }
>     struct{
> 	ushort smallLength; //here as well
> 	ubyte[size_t.sizeof*2+ulong.sizeof-2] compact;
>     }
> }
>
> Up 2^63 of bits would occupy 2^60 of bytes should be good enough for the years to come.
>  I'm not seeing it as a problem as at very least processors have to actually be able to address more then 42 (or was it 48?) bits that they can now. (see AMD or Intel or ARM datasheets on 64 bit chips)

2^63, or drop 8 so it takes up no more than 2^55 bytes of memory. Yes that's quite a large chunk.

 Curiously enough the current structure is already smaller than the proposed unions.

>> If the array is changed later to a template that could accept bytes, then where does that leave you? When the slice doesn't need to know more of the beginning,

> ???
> Slice can start at some random offset

 Yes they can, but so can a normal array (as they are pretty much the same)

>>  it can re-slice it to a lesser more appropriate size.
>
> no idea what's you are talking about here.

 As for requested compact to hold the starting/ending offsets at 1 byte each, obviously slicing at say 1024 to 2048 of a 10,000 length bit array, that won't work (0-127). If it's larger than size_t in bits, it will subtract that size and internally change the pointer of the array it actually points to, until it can fit that new value.

start/end slice: [32 .. 64]
equals: size_t array[1 .. 2]

>> It may very well soon contain a third entry in the union for main operations and use the larger size_t just for canUseBulk (Actually that sounds better to me).
>
> Again I'm having hard time to get what follows from where. Can you expand on this point please? Preferably with some code as it was far easier to reason about.

A question earlier was brought up about endienness for the patch, that is not part of what it can handle, so:

ba[100] = 1; //then save the array
assert(ba[100]); //fails on opposite endienness when loaded

HOWEVER! if individual bits are handled in a byte array reference, then the above would succeed, even with the larger size_t ones all working with canUseBulk.

>> Besides, separating the slice suggests 'length' is not part of the union struct.
>
> How? All containers know their length.

 Yes they do, they also know where they start, thereby giving start/ending, which allows slicing. So why separate slicing?

>> Sure it's convertible to [0 .. length], but it's no longer convertible back to BitArray (Unless you make a whole new array and conversion/copying which takes a lot of time).
>
> Yes it's not convertible. And it's no problem and is intended. Otherwise like you said it doesn't make a lot of sense.

>>> If you do that, why not do BitSlice from the beginning which holds BitArray, but then if you do that why make them separate at all? It doesn't make sense to me. I can see making a separate range for the front/popFront/empty if you really wanted to use those, but it wouldn't change how it would work.

> Begin takes extra ulong I'm not sure it is needed in the container itself.

 No it doesn't. bitfields are used to handle the beginning/ending offsets, equaling a total of 14 bits. It's the most confusing part (aside from [offset + sliceoffset + i[ formulas). canExpand and isCompact fill the other two bits.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation