Bits again. A proposal. (page 3)

October 16, 2004

Re: Bits again. A proposal.

Posted by Anders F Björklund
in reply to Charles Hixson

Charles Hixson wrote:

>> But D's practice of calling the boolean type "bit" is *not good*.
>> The sooner it can be changed, the better! It could *work* the same.
> 
> But currently D's type IS bit.  bool is an alias for convenience only.  

And I think that is just plain wrong. It should be the other way around.

If it's all about storage, then we might just as well do away with all
the character types too ? char => ubyte, wchar => ushort, dchar => uint

No, my request is for "bool" to be a proper language keyword in D.

> And currently bit arrays are packed, and thus bit[8] is equivalent to a bit addressable byte.  This can be quite useful, but since I don't know your boundaries about how you think of bit and how you think of bool, I can't claim that there isn't an overlap, but to my mind, if you care how it's packed, then it's a bit type, otherwise, bool is probably a decent label.

I think that using a single bit for a boolean is an elegant solution,
even if it does have a lot of pain implied on the implementation front...

As for my own "boundaries", I happen to think that:

-"bool" is a boolean type, that can have one of the values "true" or "false"
  when you assign an integer to a bool, the end result is: b = (i != 0)

-"bit" is an integer type, size 1 bit, that can contain numbers 1 and 0
  when you assign an integer to a bit, the end result is: b = (i & 1)
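
To make the two assignment rules above concrete, here is a minimal sketch in C. The function names int_to_bool and int_to_bit are made up for illustration and are not part of D or of any library; the bodies are just the two expressions given above.

```c
#include <stdbool.h>

/* "bool" rule: any non-zero integer becomes true, i.e. b = (i != 0). */
static bool int_to_bool(int i)
{
    return i != 0;
}

/* "bit" rule: only the lowest bit survives, i.e. b = (i & 1).
   The cast makes the masking well defined for negative inputs. */
static unsigned int_to_bit(int i)
{
    return (unsigned)i & 1u;
}
```

The two rules disagree for every even non-zero input: int_to_bool(2) is true, while int_to_bit(2) is 0.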

Some people (me included) think that integers and booleans should not be
assignable at all, but that's another discussion... (about type-safety)

And even if you use a "char" or an "int" to store values of type "bool",
in the end it can only hold two result values: zero and non-zero... :-)

Currently D has a boolean type, called "bit". And *that's* confusing.
(*especially* for all C99 and C++ programmers that are used to "bool")

--anders

Charles Hixson wrote:

> And currently bit arrays are packed, and thus bit[8] is equivalent to a bit addressable byte. This can be quite useful, [...]

I'm not sure how I would use that ?

My first attempt crashed the compiler:

void main()
{
   union U { ubyte bite; bit[8] bits; }

   U.bite = 0x80;
   foreach (bit b; U.bits) {
     printf(" %d", b ? 1 : 0);
   }
}

> bitarray.d: In function `main':
> bitarray.d:5: internal compiler error: in d_expand_expr, at d/d-glue.cc:3000

The second attempt shows different sizes:

void main()
{
   ubyte bite;
   bit[8] bits;

   printf("byte: %d\n", bite.sizeof);
   printf("bits: %d\n", bits.sizeof);
}

> byte: 1
> bits: 4

But maybe bit arrays have some other use I'm not aware of ?
(and what about the "nybble" type ? nybble[2] hex_byte;)

Just that it all feels so Pascal to me: "Bytes as bit sets"
Isn't sub-byte manipulation what the bit operators are for ?

Currently, I just view bit[] as a nice hack to store arrays
of (1-bit) flags in an efficient format, just as char[] is a
way of storing arrays of (32-bit) Unicode code points efficiently...
(then again, one *could* just use ubyte[] and dchar[] too ?)

--anders

Anders F Björklund wrote:

> Charles Hixson wrote:
>
>> And currently bit arrays are packed, and thus bit[8] is equivalent to a bit addressable byte. This can be quite useful, [...]
>
> I'm not sure how I would use that ?
>
> My first attempt crashed the compiler:
>
> void main()
> {
>    union U { ubyte bite; bit[8] bits; }
>
>    U.bite = 0x80;
>    foreach (bit b; U.bits) {
>      printf(" %d", b ? 1 : 0);
>    }
> }
>
>> bitarray.d: In function `main':
>> bitarray.d:5: internal compiler error: in d_expand_expr, at
>> d/d-glue.cc:3000

It works fine if you replace

   union U { ubyte bite; bit[8] bits; }

with

   union U_t { ubyte bite; bit[8] bits; }
   U_t U;

Your code was trying to access a type like a variable. The compiler shouldn't error, though, so I'd go ahead and post that example to D.bugs so that Walter can try to fix it.

> The second attempt shows different sizes:
>
> void main()
> {
>    ubyte bite;
>    bit[8] bits;
>
>    printf("byte: %d\n", bite.sizeof);
>    printf("bits: %d\n", bits.sizeof);
> }
>
>> byte: 1
>> bits: 4

Looks like bit arrays are packed into ints not bytes - so the sizeof will always be a multiple of 4. Where does this matter?

Ben Hinkle wrote:

> It works fine if you replace
>    union U { ubyte bite; bit[8] bits; }
> with
>    union U_t { ubyte bite; bit[8] bits; }
>    U_t U;
> Your code was trying to access a type like a variable. The compiler
> shouldn't error, though, so I'd go ahead and post that example to D.bugs so
> that Walter can try to fix it.

Argh, you are right... Thinking in C, I guess. (or not at all)

Also discovered that bit arrays are little-endian. (LSB first)

--anders

Ben Hinkle wrote:

>>void main()
>>{
>>   ubyte bite;
>>   bit[8] bits;
>>
>>   printf("byte: %d\n", bite.sizeof);
>>   printf("bits: %d\n", bits.sizeof);
>>}
>>
>>>byte: 1
>>>bits: 4
>
> Looks like bit arrays are packed into ints not bytes - so the sizeof will
> always be a multiple of 4. Where does this matter?

Not at all, I guess... Changing to int shows that you are right:

void main()
{
   uint bite;
   bit[32] bits;

   printf("int: %d\n", bite.sizeof);
   printf("bit: %d\n", bits.sizeof);
}

> int: 4
> bit: 4

--anders
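
The sizes printed in this exchange (bit[8] reports 4, bit[32] reports 4) fit a simple rounding rule. The following C sketch assumes that static bit arrays are rounded up to whole 32-bit words; that is an inference from the printed numbers, not something the thread states as a guarantee, and packed_bits_sizeof is a hypothetical helper name.

```c
#include <stddef.h>

/* Bytes occupied by nbits packed bits, ASSUMING storage is rounded up
   to whole 32-bit (4-byte) words. Inferred from the observed sizes. */
static size_t packed_bits_sizeof(size_t nbits)
{
    return ((nbits + 31) / 32) * 4;
}
```

Under that assumption, 8 bits and 32 bits both take 4 bytes, and the first size to need 8 bytes is 33 bits.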

Anders F Björklund wrote:

> Also discovered that bit arrays are little-endian. (LSB first)

They are on x86 hardware anyway. I would be surprised if this were preserved for big-endian machines.

Sean

I wrote:

>> Looks like bit arrays are packed into ints not bytes - so the
>> sizeof will always be a multiple of 4. Where does this matter?
>
> Not at all, I guess...

It wasn't obvious to me what the sizes were,
so I checked the current D implementation...

Bit variables are stored in a byte, unless they occur in arrays.
Then they are instead packed into blocks of 32 bits, for speed :

> void Bits::set(unsigned bitnum)
> {
>     data[bitnum / 32] |= 1 << (bitnum & 31);
> }
>
> void Bits::clear(unsigned bitnum)
> {
>     data[bitnum / 32] &= ~(1 << (bitnum & 31));
> }
>
> int Bits::test(unsigned bitnum)
> {
>     return data[bitnum / 32] & (1 << (bitnum & 31));
> }

That should be "bitnum >> 5", but the compiler
should be smart enough to optimize it away...

struct bit_dynamic { bit[] bits; }
struct bit_static { bit[2] bits; }
struct bit_fields { bit a; bit b; }

> bit_dynamic.sizeof: 8 (Dynamic arrays are the usual length+pointer)
> bit_static.sizeof: 4
> bit_fields.sizeof: 2

So if you union a ubyte and a bit[8], the union occupies 4 bytes.
Same (!) if you union a uint and a bit[32]: 4 bytes.
(ulong with bit[64] is the expected 8 bytes)

Pointers to bits are funny, they *do* work if
you access a byte-stored single bit var - but
not if you try to access a single bit in an array ?

> void main()
> {
>   static bit[32] t = 0;
>
>   t[5] = 1;
>   for (int i = 0; i < 32; i++) {
>     bit *p = &(t[i]);
>     printf("%d ", (*p) ? 1 : 0);
>   }
>   printf("\n");
>
>   static ubyte[32] b = 0;
>
>   b[5] = 1;
>   for (int i = 0; i < 32; i++) {
>     ubyte *p = &(b[i]);
>     printf("%d ", (*p) ? 1 : 0);
>   }
>   printf("\n");
> }

Probably for the same reasons that bit[] slices
have problems, pointers only know of whole bytes
(while they need to know a 0-7 bit offset too) ?

--anders
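
For readers who want to try the word-packed layout outside the compiler sources, here is a rough C sketch of the same set/clear/test idea. BitSet64 and the bits_* names are hypothetical, the capacity is fixed at 64 bits to keep it short, and there is no bounds checking: the caller is assumed to keep bitnum below 64. It uses the shift form (bitnum >> 5) mentioned above and an unsigned 1 so that setting bit 31 does not shift into the sign bit.

```c
#include <stdint.h>

/* Hypothetical fixed-size bit set: 64 bits packed into two 32-bit words. */
typedef struct {
    uint32_t data[2];
} BitSet64;

/* Set bit number bitnum (0-63): word index is bitnum / 32, written as a shift. */
static void bits_set(BitSet64 *s, unsigned bitnum)
{
    s->data[bitnum >> 5] |= UINT32_C(1) << (bitnum & 31);
}

/* Clear bit number bitnum (0-63). */
static void bits_clear(BitSet64 *s, unsigned bitnum)
{
    s->data[bitnum >> 5] &= ~(UINT32_C(1) << (bitnum & 31));
}

/* Return 1 if the bit is set, else 0 (normalized, unlike the raw mask test). */
static int bits_test(const BitSet64 *s, unsigned bitnum)
{
    return (int)((s->data[bitnum >> 5] >> (bitnum & 31)) & 1u);
}
```

Setting bit 5 in an all-zero set leaves 0x20 in the first word, which matches the LSB-first numbering discussed in this thread.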

Sean Kelly wrote:

>> Also discovered that bit arrays are little-endian. (LSB first)
>
> They are on x86 hardware anyway. I would be surprised if this
> were preserved for big-endian machines.

This was on a big-endian machine... But I only meant within the byte,
that is: bit[0] sets 0x01 and bit[7] sets 0x80 of the byte...

If you do things like unions or casts, then it'll probably preserve
the native endian of the platform (since it just copies the bytes)

--anders
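
The within-byte numbering described here (bit[0] sets 0x01, bit[7] sets 0x80) can be written as a mask computation. A small C sketch, with hypothetical function names, and with the MSB-first variant included only for contrast:

```c
#include <stdint.h>

/* Mask for bit index 0-7 when index 0 is the least significant bit,
   matching the observation that bit[0] sets 0x01 and bit[7] sets 0x80. */
static uint8_t lsb_first_mask(unsigned index)
{
    return (uint8_t)(1u << (index & 7u));
}

/* For contrast only: the mask an MSB-first layout would use. */
static uint8_t msb_first_mask(unsigned index)
{
    return (uint8_t)(0x80u >> (index & 7u));
}
```

This numbering is about positions inside one byte and is separate from the byte order of the platform, which is the distinction the post draws.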

Anders F Björklund wrote:

> Pointers to bits are funny, they *do* work if
> you access a byte-stored single bit var - but
> not if you try access a single bit in an array ?
...
> Probably for the same reasons that bit[] slices
> has problems, pointers only knows of whole bytes
> (while they need to know a 0-7 bit offset too) ?

Yup. There have been some proposals for addressing this issue (no pun intended) but all seemed a bit kludgy.

Sean
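
To illustrate what "a pointer that also knows a 0-7 bit offset" might look like, here is one possible shape in C. This is not any of the proposals referred to above, whose details are not given in the thread; BitPtr and the bitptr_* functions are invented for this sketch, and it addresses storage byte by byte with LSB-first numbering rather than in 32-bit words.

```c
#include <stdint.h>

/* Hypothetical "bit pointer": a byte address plus a bit offset in that byte. */
typedef struct {
    uint8_t *byte;     /* which byte holds the bit */
    unsigned offset;   /* 0-7, where 0 is the least significant bit */
} BitPtr;

/* Build a pointer to bit number bitnum of the buffer starting at base. */
static BitPtr bitptr_at(uint8_t *base, unsigned bitnum)
{
    BitPtr p = { base + (bitnum >> 3), bitnum & 7u };
    return p;
}

/* Read the pointed-to bit as 0 or 1. */
static int bitptr_load(BitPtr p)
{
    return (*p.byte >> p.offset) & 1;
}

/* Write 0 or 1 through the pointer without disturbing neighbouring bits. */
static void bitptr_store(BitPtr p, int value)
{
    if (value)
        *p.byte |= (uint8_t)(1u << p.offset);
    else
        *p.byte &= (uint8_t)~(1u << p.offset);
}
```

The cost is visible in the struct: such a pointer is wider than a plain address and every load or store needs a shift and a mask, which may be part of why the approach felt kludgy.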
