Bits and Bobs and Bools

Feb 10, 2005

Anders F Björklund

Feb 10, 2005

Walter

Feb 10, 2005

Feb 10, 2005

Feb 10, 2005

Feb 10, 2005

Feb 10, 2005

Feb 11, 2005

February 10, 2005

Bits and Bobs and Bools

Posted by Anders F Björklund

Permalink

Anders F Björklund

Permalink

Back to our good old D friend, the "bit" data type...
(with the two constant bits: true == 1 and false == 0)

While D will not ever have a boolean type like e.g. Java,
it doesn't hurt if the alternative "bit" is documented ?

Here is my attempt... Feel free to comment or extend.


The first thing that is good to know is that bit
is a "unsigned" data type. That is, if you convert
true to an integer, then you get a 1 and not a -1.

(having -1 as true might sound a bit silly, but has its
merits since that means all bits are set when converting
to a larger int type. This convention is in use elsewhere)

The only "universal" convention is that false is 0,
and this is also the .init value for the bit type.
(in boolean contexts, "null" has a value of 0 too)

The second that is good to know is that single bits
are stored in bytes, and in the least significant bit.
That is, "true" is equal to 0x01 when read as a byte.


There are a bunch of things about bits that are
"implementation defined", and that's not all that
good for a language that defines e.g. int sizes ?

It would be much simpler if these were defined ?
This would also allow external languages, such as
C, to access any bit variables in D structures...

http://www.digitalmars.com/d/interface.html:
> Data Type Compatibility
> 
> D type    C type
> bit       no equivalent

Which is not true, it should be "signed char" in C/C++ ?
(but C "bool" can't be used, since it might be 4 bytes)

IMPLEMENTATION DETAIL: (D)

    assert(typeid(typeof(true)) == typeid(bit));
    assert(typeid(typeof(false)) == typeid(bit));
    assert(bit.sizeof == 1);
    assert(true == 1);
    assert(false == 0);


Then, bit arrays. For efficiency reasons, these are
stored in 32-bit (register sized) chunks, rounded up.
Since D mandates a 32-bit minimum CPU, might as well
make this implementation detail fixed in the D spec ?

This means that bit[1].sizeof == 4, same as bit[32],
while as bit[256] occupies 32 bytes - for the data
(as usual there is some overhead for length and ptr)
Dividing by 32 is the same as shifting right by 5.

IMPLEMENTATION DETAIL: (C)

    unsigned int *data;

void alloc(unsigned int bitdim)
{
    unsigned int allocdim = (bitdim + 31) / 32; // rounding up
    data = (unsigned int *) calloc(allocdim, sizeof(unsigned int));
}

void set(unsigned int bitnum)
{
    data[bitnum >> 5] |= 1L << (bitnum & 31);
}

void clear(unsigned int bitnum)
{
    data[bitnum >> 5] &= ~(1L << (bitnum & 31));
}

int test(unsigned int bitnum)
{
    return data[bitnum >> 5] & (1L << (bitnum & 31));
}


Within the chunks, the bytes are stored in little-endian byte order
but still using normal bit direction (LSB first) within each byte.
That is, the first bit encountered is the least significant,
and this differs from e.g. how the bit integer literals work...

>void main()
>{
>  printf("%02x\n", 0b10000000);
>  printf("%02x\n", 0b01000000);
>  printf("%02x\n", 0b00100000);
>  printf("%02x\n", 0b00010000);
>  printf("%02x\n", 0b00001000);
>  printf("%02x\n", 0b00000100);
>  printf("%02x\n", 0b00000010);
>  printf("%02x\n", 0b00000001);
>}

80
40
20
10
08
04
02
01

That's the reverse bit order, compared to what bit[] uses:
(note that the byte ordering is the same on all platforms,
just interpreted differently when loaded as 32-bit integers,
This byte order needs to be finalized, and written in spec)

>void main()
>{
>  bit[32] ba;
>
>  for (int i = 0; i < 32; i++)
>  {
>    ba[] = false;
>    ba[i] = true;
>    printf("%08lx\n", *(cast(uint*) ba.ptr));
>  }
>}

version (LittleEndian) // e.g. X86
{
00000001
00000002
00000004
00000008
00000010
00000020
00000040
00000080
00000100
00000200
00000400
00000800
00001000
00002000
00004000
00008000
00010000
00020000
00040000
00080000
00100000
00200000
00400000
00800000
01000000
02000000
04000000
08000000
10000000
20000000
40000000
80000000
}
version (BigEndian) // e.g. PPC
{
01000000
02000000
04000000
08000000
10000000
20000000
40000000
80000000
00010000
00020000
00040000
00080000
00100000
00200000
00400000
00800000
00000100
00000200
00000400
00000800
00001000
00002000
00004000
00008000
00000001
00000002
00000004
00000008
00000010
00000020
00000040
00000080
}


Bit pointers (bit*) are allowed, but can only address
the first bit of each byte, directly. This means that
you can *not* take the address of individual bits in an
array, just to the start of the bit[] array itself...
(this can easily be accessed by using the .ptr property,
or take the address of an individual bit variable : &b)

However, you can then use this bit pointer with the
regular index, as in bp[4], to set individual bits...
(just as you could do with a bit[] bit array ,as well)

But incrementing is a little funny, it will be divided
by eight in order to fit the next whole byte address.

>  bit[32] ba;
>  bit* bp = ba.ptr;
>
>  printf("%x\n", bp);
>  bp += 4;
>  printf("%x\n", bp);
>  bp += 4;
>  printf("%x\n", bp);
>  bp += 8;
>  printf("%x\n", bp);

// an example, actual pointer values might differ
bffffaa8
bffffaa8
bffffaa8
bffffaa9

For similar reasons, slices of bit arrays are only
allowed when the lower bound starts on an even byte
(that is: bit number is a multiple of 8, "% 8 == 0")

Misaligned bit array slices throws ArrayBoundsError,
or silently does nothing for the -release builds.
Appending to bit arrays might not be implemented.


bool: (bit)
Having bit as the default boolean type in D now means
that there can be only *one* value for true: 1 (one),
all other non-zero values are converted when casting.

wbool: (byte)
While the shift and bit operators involved in getting
and setting individual bits are pretty fast, it might be
yet faster to just use byte arrays instead of bit arrays.

dbool: (int)
And since individual bytes might need masking to
fit into a register, it might be faster still to
use the int type to eg. return bits from functions.

IMPLEMENTATION DETAIL: (D)

alias   bit  bool; // boolean (true/false)
alias  byte wbool; // wide boolean (like wchar)
alias   int dbool; // double boolean (like dchar)

byte c = cast(byte) 256; // c is now equal to "0"
bit b = cast(bit) 2;     // b is now equal to "true"
int i = cast(int) b;     // i is now equal to "1"


I will probably stick this whole long post on Wiki4D.
(under the GNU Free Documentation License, that is)

--anders

"Anders F Björklund" <afb@algonet.se> wrote in message news:cufvfa$2aet$1@digitaldaemon.com... > I will probably stick this whole long post on Wiki4D. (under the GNU Free Documentation License, that is) Please do, and thanks. It's a good article.

Walter wrote: >>I will probably stick this whole long post on Wiki4D. > > Please do, and thanks. It's a good article. Thank you, it's now at: http://www.prowiki.org/wiki4d/wiki.cgi?BitsAndBools Here was some of the questions raised: - Should the byte order of bit[] be "fixed", like it seems to be now ? (which could be just an accident...) Or should the uint's use native byte endianness instead ? - Taking the address of a bit in an array seems to a) not complain and b) give a bogus address ? (that is: bit[] ba; bit *bp = &ba[10];) My suggestion is that it should return: ba + (10 / 8) == ba + 1 (rounded to byte) ? Similar to using : bp = ba; bp += 10; - wbool and dbool could be added, as "official" aliases perhaps ? They could all go in std.stdbool... wbool[8191] flags; // from sieve.d dbool opEquals(Object o); // from object.d I've also taken back my previous "boolean" suggestions... The bit type is it for D, with the bool alias for shortness. This probably also nixes the "isnot" keyword, as in: assert(object isnot null); which becomes assert(object); After all, D is about performance and favorite C terseness... If you want cuddly syntax and slowness, you know where Java is. --anders

Deaf ears prepare: bit is a useless wart and should be removed. There, I said it. Now ignore me. "Anders F Björklund" <afb@algonet.se> wrote in message news:cugafi$2mrr$1@digitaldaemon.com... > Walter wrote: > >>>I will probably stick this whole long post on Wiki4D. >> >> Please do, and thanks. It's a good article. > > Thank you, it's now at: http://www.prowiki.org/wiki4d/wiki.cgi?BitsAndBools > > Here was some of the questions raised: > > - Should the byte order of bit[] be "fixed", like it > seems to be now ? (which could be just an accident...) > Or should the uint's use native byte endianness instead ? > > - Taking the address of a bit in an array seems to > a) not complain and b) give a bogus address ? > (that is: bit[] ba; bit *bp = &ba[10];) > > My suggestion is that it should return: > ba + (10 / 8) == ba + 1 (rounded to byte) ? > Similar to using : bp = ba; bp += 10; > > - wbool and dbool could be added, > as "official" aliases perhaps ? > They could all go in std.stdbool... > > wbool[8191] flags; // from sieve.d > dbool opEquals(Object o); // from object.d > > > I've also taken back my previous "boolean" suggestions... The bit type is it for D, with the bool alias for shortness. > > This probably also nixes the "isnot" keyword, as in: > assert(object isnot null); which becomes assert(object); > > After all, D is about performance and favorite C terseness... If you want cuddly syntax and slowness, you know where Java is. > > > --anders

Hear, Hear! For that deaf ear! :-) In article <cugdv5$2r8g$2@digitaldaemon.com>, Matthew says... > >Deaf ears prepare: > > bit is a useless wart and should be removed. > >There, I said it. Now ignore me. > >"Anders F Björklund" <afb@algonet.se> wrote in message news:cugafi$2mrr$1@digitaldaemon.com... >> Walter wrote: >> >>>>I will probably stick this whole long post on Wiki4D. >>> >>> Please do, and thanks. It's a good article. >> >> Thank you, it's now at: http://www.prowiki.org/wiki4d/wiki.cgi?BitsAndBools >> >> Here was some of the questions raised: >> >> - Should the byte order of bit[] be "fixed", like it >> seems to be now ? (which could be just an accident...) >> Or should the uint's use native byte endianness instead ? >> >> - Taking the address of a bit in an array seems to >> a) not complain and b) give a bogus address ? >> (that is: bit[] ba; bit *bp = &ba[10];) >> >> My suggestion is that it should return: >> ba + (10 / 8) == ba + 1 (rounded to byte) ? >> Similar to using : bp = ba; bp += 10; >> >> - wbool and dbool could be added, >> as "official" aliases perhaps ? >> They could all go in std.stdbool... >> >> wbool[8191] flags; // from sieve.d >> dbool opEquals(Object o); // from object.d >> >> >> I've also taken back my previous "boolean" suggestions... The bit type is it for D, with the bool alias for shortness. >> >> This probably also nixes the "isnot" keyword, as in: >> assert(object isnot null); which becomes assert(object); >> >> After all, D is about performance and favorite C terseness... If you want cuddly syntax and slowness, you know where Java is. >> >> >> --anders > >

Matthew wrote: > Deaf ears prepare: > > bit is a useless wart and should be removed. > > There, I said it. Now ignore me. I think we had that conversion already :-) I would like a boolean type too, but I don't consider it important enough to drop D for. And it's pretty obvious that Walter won't change this lemon, so I made some limonade... Just consider it the _Bool of D, right ? --anders PS. However, this statement is nonsense and should go: http://www.digitalmars.com/d/overview.html: > Bit type > > The fundamental data type is the bit, and D has a bit > data type. This is most useful in creating arrays of bits: > > bit[] foo; Bit is *not* the fundamental data type, but byte is... (these days, one could even consider "int" fundamental)

On Thu, 10 Feb 2005 16:43:38 +0100, Anders F Björklund wrote: [snip] > > Then, bit arrays. For efficiency reasons, these are > stored in 32-bit (register sized) chunks, rounded up. > Since D mandates a 32-bit minimum CPU, might as well > make this implementation detail fixed in the D spec ? > > This means that bit[1].sizeof == 4, same as bit[32], > while as bit[256] occupies 32 bytes - for the data > (as usual there is some overhead for length and ptr) > Dividing by 32 is the same as shifting right by 5. > While this might work for bits, it doesn't for booleans. An array of individually addressable boolean values is a useful construct. bool[6] Flags; Flags[2] = true; Flags[4] = False; if (Flags[3] || Flags[5]) . . . foobar( Flags[1] ); void foobar(inout bool aSwitch) { . . . } etc... -- Derek Melbourne, Australia

Derek wrote: >>This means that bit[1].sizeof == 4, same as bit[32], >>while as bit[256] occupies 32 bytes - for the data >>(as usual there is some overhead for length and ptr) >>Dividing by 32 is the same as shifting right by 5. > > While this might work for bits, it doesn't for booleans. An array of > individually addressable boolean values is a useful construct. So use wbool instead ? Individual bit pointers just don't exist. alias byte wbool; (well, I do know that Stewart coded an implementation, but IRL... http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/495) There is no spoon. --anders

Forums