Back to basics: integer types
November 23, 2004
Hi..

Another thought I had...

I like the type size-suffix for fixed-sized integers (int8, int16, int32, etc) and this got me thinking about the most basic of types: bit.

Wouldn't it be nice if "bit[8]" behaved the same as byte / int8, and bit[16] the same as the current int16, etc. ? I probably need more arguments than "it's nice" to get this one through, but I'll think about it some more while you guys fire back at me.

And, am I correct in understanding that subsequent bits in a struct will always be packed?

struct something
{
    bit    first;
    bit    second;
};

I.e. what's the size of this struct?

If they will be packed, then I've got a (new?) argument for "bool": sizeof(bool) being 1 (mostly for compatibility with old code/structs as I don't think there's any speed gain. Well, passing "bit second" from the above struct to a method taking bool/bit would need a shift operation not needed if "second" was a real bool).
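To make the shift argument concrete, here is a small C sketch (C rather than D, and the struct and function names are invented purely for illustration): extracting a flag from a packed byte needs a shift and a mask, while a byte-sized flag can be passed along directly.

#include <stdio.h>
#include <stdint.h>

/* Packed layout: both flags share one byte; reading "second" needs a shift and a mask. */
struct PackedFlags {
    uint8_t bits;                       /* bit 0 = first, bit 1 = second */
};

/* Unpacked layout: each flag gets its own byte; reading "second" is a plain load. */
struct ByteFlags {
    uint8_t first;
    uint8_t second;
};

static int takes_bool(int b) { return b ? 1 : 0; }

int main(void)
{
    struct PackedFlags p = { 0x2 };          /* second = 1 */
    struct ByteFlags   b = { 0, 1 };

    takes_bool((p.bits >> 1) & 1);           /* extra shift + mask */
    takes_bool(b.second);                    /* direct byte access */

    printf("%u %u\n", (unsigned)sizeof p, (unsigned)sizeof b);   /* 1 vs 2 bytes */
    return 0;
}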

-- 
Lio.

-- Get the CACert root certificate (and a personal one) at http://cacert.org/


November 23, 2004
Lionello Lunesu wrote:

> I like the type size-suffix for fixed-sized integers (int8, int16, int32, etc) and this got me thinking about the most basic of types: bit.

The supported type name "spelling" is: int8_t, int16_t, int32_t...

See the D module: import std.stdint;

> Wouldn't it be nice if "bit[8]" behaved the same as byte / int8, and bit[16] the same as the current int16, etc. ? I probably need more arguments than "it's nice" to get this one through, but I'll think about it some more while you guys fire back at me.

bit[#] is optimized for speed, so it's a multiple of 4 bytes (32 bits)

Had it been optimized for storage, it would have worked like you wrote

> And, am I correct in understanding that subsequent bits in a struct will always be packed?

No. You could have tested it, too ?

> struct something
> {
>     bit    first;
>     bit    second;
> };
> 
> I.e. what's the size of this struct?

2 bytes.

> If they will be packed, then I've got a (new?) argument for "bool": sizeof(bool) being 1 (mostly for compatibility with old code/structs as I don't think there's any speed gain. Well, passing "bit second" from the above struct to a method taking bool/bit would need a shift operation not needed if "second" was a real bool).

There is no "real" boolean type in D.
(Like one that only takes true/false?)


sizeof(bool) is 0.125, 1 or 4 bytes, depending on who you ask.
The different implementations are fondly known as bit, wbit and dbit

Otherwise you need sub-byte pointers to access &something.second,
and that's a huge pain to implement and to pass around the block.
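For illustration, a C sketch of what such a sub-byte pointer would have to carry (the type and helper names here are made up): a byte address plus a bit index, with every load and store going through a shift and mask instead of a plain memory access.

#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint8_t *byte;   /* which byte the bit lives in */
    unsigned shift;  /* which bit within that byte (0..7) */
} bitptr;

static int bit_load(bitptr p)
{
    return (*p.byte >> p.shift) & 1;
}

static void bit_store(bitptr p, int value)
{
    if (value)
        *p.byte |= (uint8_t)(1u << p.shift);
    else
        *p.byte &= (uint8_t)~(1u << p.shift);
}

int main(void)
{
    uint8_t storage = 0;
    bitptr second = { &storage, 1 };   /* "&something.second", in spirit */
    bit_store(second, 1);
    printf("%d\n", bit_load(second));  /* prints 1 */
    return 0;
}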


Personally, I'm not sure that bit *is* the fundamental data type...

Similar to how the triangle is the fundamental type in 3D graphics,
I think the byte has become the fundamental data type. For speed ?

--anders


PS.
Bit is not an integer data type in D. It's a pseudo-bool data type.
See my previous rants and the endless discussions on this newsgroup.
November 23, 2004

Anders F Björklund wrote:
> Lionello Lunesu wrote:
> There is no "real" boolean type in D.
> (Like one that only takes true/false?)
> 
> 
> sizeof(bool) is 0.125, 1 or 4 bytes, depending on who you ask. The different implementations are fondly known as bit, wbit and dbit
> 
> Otherwise you need sub-byte pointers to access &something.second, and that's a huge pain to implement and to pass around the block.
> 
> 
> Personally, I'm not sure that bit *is* the fundamental data type...
> 
> Similar to how the triangle is the fundamental type in 3D graphics, I think the byte has become the fundamental data type. For speed ?
..
> PS.
> Bit is not an integer data type in D. It's a pseudo-bool data type.
> See my previous rants and the endless discussions on this newsgroup.

I agree.

While bit may be the fundamental unit of information, it is _not_, I repeat, _not_ the fundamental data type in computers.

Contrary to public belief (IT professionals and laymen alike), these
are not the same thing.

Computer hardware manufacturers have known this for more than 50 years. If they hadn't, then the memory address space would be a single bit string.

Corollary (slightly off the issue)

This would lead to two things: (a) memory pointers could point to any bit, and (b) the different CPU sizes (today 16, 32 and 64 bits) could actually be _any_ number of bits. Thus, we could have a 17-bit CPU or a 67-bit CPU.

Probably these computers would fetch several bits at once, say, those 17 bits on a "17-bit" CPU. But not necessarily.

Opcodes could be of varying size without wasting memory. The CPU would know where (at which bit-address) to fetch the next instruction.

Opcodes (and entire applications) would exist to process bit-streams, a la audio CD reading. (Ever seen '1 bit processor' written on a CD player?)

Most probably different parts of a CPU would be capable of handling different size "words", depending on the usage. For example, US-ASCII operations hardware would be on 7 bit circuits, and graphics operations hardware 24 bits.

Oh, and an amazing thing with the memory: arbitrary size "DMA like" data transfers (say between main memory and graphics memory) would take only one clock cycle for the _entire_ transfer !!!! <diabolical laughter> This I leave as an exercise for the reader to figure out! </>

Yes, very nice. But practical issues in hardware design and implementation have led us to use a fixed word size and memory addressing by the byte.

(Actually, the word size usually determines the number of bits fetched, so on some hardware architectures the _actual_ granularity of addressing is not the same as the _apparent_ granularity. E.g. we may have a CPU that is only capable of addressing memory "as 32-bit entities", but then, using various kludges, it pretends to be able to do it by the byte (= 8 bits). And this only because systems programmers' minds are stuck on the byte as the "only" size.)
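As a rough C sketch of that kludge (the function and variable names are invented, and little-endian byte numbering is assumed), a byte load on such a machine would be emulated with a full word load plus a shift and a mask:

#include <stdio.h>
#include <stdint.h>

static uint8_t emulated_byte_load(const uint32_t *memory, unsigned byte_address)
{
    uint32_t word  = memory[byte_address / 4];   /* the only load the hardware can do */
    unsigned shift = (byte_address % 4) * 8;     /* position of the byte within the word */
    return (uint8_t)((word >> shift) & 0xFF);
}

int main(void)
{
    uint32_t memory[2] = { 0x44332211u, 0x88776655u };
    printf("%02X\n", emulated_byte_load(memory, 6));   /* prints 77 */
    return 0;
}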

Back to the issue at hand

It is regrettable that this was not thought of in the early days of D. As a result we have this Bit data type that lingers on, haunting the entire D community.

Were I to become "the God of the D world", on the First Day I would throw away the Bit data type.

On the second day I would create a Boolean data type, and write in the specification that its size would be architecture dependent, and suggest the fastest integral size would be used. (Eg, probably 32 bits on Pentiums.)

Then I would seriously consider forbidding int casts to/from Boolean. You'd have to write

myBoolean = ( myInt != 0 );
myInt2 = myBoolean ? 1 : 0;

I would not disapprove of arrays of Booleans.

I would not even disallow union wizardry with int on top
of packed Boolean, but I would seriously discourage it.

Of course the compiler writer would be free to use whatever intermediate values in Boolean calculations etc., but the programmer should perceive Booleans as distinct from 1 and 0.

And of course, programmers would still be free to do bitwise logical ops between ints. But that is _really_ unrelated to this Boolean issue. Isn't it!




November 23, 2004
In article <cnvusn$1hh3$1@digitaldaemon.com>, Georg Wrede says...
>
>(Actually, the word size usually determines the number of bits fetched, so on some hardware architectures the _actual_ granularity of addressing is not the same as the _apparent_ granularity. E.g. we may have a CPU that is only capable of addressing memory "as 32-bit entities", but then, using various kludges, it pretends to be able to do it by the byte (= 8 bits). And this only because systems programmers' minds are stuck on the byte as the "only" size.)

Technically, since a byte is defined as "the smallest addressable unit" in C/C++, I would expect a byte on such machines to be 32 bits.  But then I suppose we've gotten so used to 8 bit bytes that perhaps the kludges would be expected? In any case, D imposes size constraints on all types, so in our world a byte would be 8 bits.
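A small C aside (standard C, nothing D-specific): sizeof(char) is 1 by definition, and the number of bits in that smallest addressable unit is CHAR_BIT from <limits.h>, which must be at least 8 but could indeed be 32 on the machine described above.

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* char is the smallest addressable unit; its width is CHAR_BIT bits. */
    printf("sizeof(char) = %u, CHAR_BIT = %d\n", (unsigned)sizeof(char), CHAR_BIT);
    return 0;
}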


>On the second day I would create a Boolean data type, and write in the specification that its size would be architecture dependent, and suggest the fastest integral size would be used. (Eg, probably 32 bits on Pentiums.)

..which would make the boolean type the only D data type with an indeterminate size.  I would almost rather require that such a type always be 8 or 32 bits, unless all size restrictions were eliminated (and I don't expect that to happen).


Sean


November 23, 2004
Sean Kelly wrote:
> Technically, since a byte is defined as "the smallest addressable unit" in
> C/C++, I would expect a byte on such machines to be 32 bits.  But then I suppose
> we've gotten so used to 8 bit bytes that perhaps the kludges would be expected?
> In any case, D imposes size constraints on all types, so in our world a byte
> would be 8 bits.

Well, to quibble with your quibble: on most 32bit machines, 8bit values are individually addressable.  Even on architectures that require aligned reads & writes, they often use byte-addressing, and byte-level primitives are sometimes available.

November 23, 2004
Georg Wrede wrote:

> Back to the issue at hand

A somewhat dead and beaten issue, but I'll bite...

> It is regrettable that this was not thought of in the
> early days of D. As a result we have this Bit data type
> that lingers haunting the entire D community.

I don't think the problem is that it wasn't thought of...
Maybe bit array slices got a tad harder to implement, but.

> Would I become "the God in D world", on the First Day, I
> would throw away the Bit data type.

The current Deity being the proud bit creator, of course :-)

> On the second day I would create a Boolean data type, and
> write in the specification that its size would be
> architecture dependent, and suggest the fastest integral
> size would be used. (Eg, probably 32 bits on Pentiums.)

Going with the main "fixed-size" idiom of D, one could
probably fix it at, say, 1 byte too - without crying.

Either way, the main point is that there would be a
boolean type that only accepts true and false values.

> Then I would seriously consider forbidding int casts
> to/from Boolean. You'd have to write
> 
> myBoolean = ( myInt != 0 );
> myInt2 = myBoolean ? 1 : 0;

Just as in Java... (the language, not the VM or OS)
Also required: myBoolean = ( myPointer != null );

Note: some people *hate* these "extra" steps

> I would not disapprove of arrays of Booleans.
> 
> I would not even disallow union wizardry with int on top
> of packed Boolean, but I would seriously discourage it.

I'm assuming you mean packed bit arrays. Java does those
with an Object class, but language support isn't all bad.

If it hadn't been for the pain to get the pointers and
slices implemented, that is. Probably not worth it, IMHO.

> Of course the compiler writer would be free to use whatever
> intermediate values in Boolean calculations etc., but the
> programmer should percieve Booleans as distinct from 1 and 0.

Heck, I'd settle for true and false being distinct from 1 and 0.
(currently they are hard-coded as bit literals in the D compiler)

But the fact remains that D chose to side with C and C++ on logic.
(or actually more with C, since there is no "bool" primitive type)

And since zero is false and non-zero is true, who's really boolean?

> And of course, programmers would still be free to do bitwise
> logical ops between ints. But that is _really_ unrelated to
> this Boolean issue. Isn't it!

It is.

A related issue, however, is what type conditionals etc. expect.
Had there been a boolean type, it's natural to make them require it.

IMHO, Java got this right. (again, the language, not the religion...)
See http://java.sun.com/docs/books/jls/ for that language's details.


Anyway, D is a whole other language and has another such specification.
And in *that* spec, there's three string types and three boolean types.

string types: char[], wchar[], dchar[]
boolean types: bit, wbit (byte), dbit (int)


You get to pick one of each, to optimize for the task at hand.
It's not a fatal flaw... More of a "missed opportunity", really?

"alias bit bool; alias char[] string; goto on_with_our_lives;"
See also http://www.prowiki.org/wiki4d/wiki.cgi?BooleanNotEquBit

--anders

November 23, 2004
In article <co03i7$1nmp$1@digitaldaemon.com>, Russ Lewis says...
>
>Sean Kelly wrote:
>> Technically, since a byte is defined as "the smallest addressable unit" in C/C++, I would expect a byte on such machines to be 32 bits.  But then I suppose we've gotten so used to 8 bit bytes that perhaps the kludges would be expected? In any case, D imposes size constraints on all types, so in our world a byte would be 8 bits.
>
>Well, to quibble with your quibble: on most 32bit machines, 8bit values are individually addressable.  Even on architectures that require aligned reads & writes, they often use byte-addressing, and byte-level primitives are sometimes available.

Understood.  I was actually referring to Georg's imaginary 32 bit machine.


Sean


November 24, 2004
Hi..

> bit[#] is optimized for speed, so it's a multiple of 4 bytes (32 bits)

From what I recall, accessing a byte is just as fast as accessing a dword (on >=386). Accessing a word (short, 16-bit) will result in a penalty. Is speed really _the_ reason for a bit[8] (fixed-size array) to allocate 32 bits (independent of the data that follows it)? For bit[] (dynamically sized) I understand it: it will definitely be faster, because accessing a dword is not slower and you do less growing.

> No. You could have tested it, too ?

Yes yes, I should have tested it. But I'm one of those "patient lurkers": reading this newsgroup while VC++ is busy building. Anyway, I did not want to end up discussing differences between dmd and gdc, and they might differ in this respect, so I thought I'd keep it theoretical. (Also something I could have tested :-/ )

> There is no "real" boolean type in D.
> (Like one that only takes true/false?)

Oh no, I've started the discussion again? I really didn't mean to.

> Otherwise you need sub-byte pointers to access &something.second, and that's a huge pain to implement and to pass around the block.

Yes, agreed. That's one point where bool and bit would differ: bool* being valid, but bit* not.

> Personally, I'm not sure that bit *is* the fundamental data type...

Theoretically it still is. I mean, it's the basic granularity; there's nothing smaller than a bit. The quark equivalent in information technology.

> Similar to how the triangle is the fundamental type in 3D graphics, I think the byte has become the fundamental data type. For speed ?

Ah, practically yes, but triangles are not theoretically the fundamental type of 3D geometry. Recall ray-tracers: they hardly ever used triangles. But in practice (especially in OGL/D3D-type rendering) triangles are the basis. This shouldn't prevent anybody from discussing 'the basics' though.

I also do think that 3D rendering based on triangles is just an efficient hack, a modern approximation to ray-tracing. I wouldn't be surprised if we get rid of them in the future (NURBS and other parameter-based geometries are gaining ground again).

Lionello.


November 24, 2004
Lionello Lunesu wrote:

>>bit[#] is optimized for speed, so it's a multiple of 4 bytes (32 bits)
> 
> From what I recall, accessing a byte is just as fast as accessing a dword (on >=386). Accessing a word (short, 16-bit) will result in a penalty. Is speed really _the_ reason for a bit[8] (fixed-size array) to allocate 32 bits (independent of the data that follows it)? For bit[] (dynamically sized) I understand it: it will definitely be faster, because accessing a dword is not slower and you do less growing.

It is faster since you can load/store 32 bits at one time, when setting several. Otherwise it would load/store 8 bits at a time, by using bytes.

Like this C code: (where "/ 32" converts to ">> 5")

>     unsigned *data;
[...]
> void Bits::set(unsigned bitnum)
> {
>     data[bitnum / 32] |= 1 << (bitnum & 31);
> }
> 
> void Bits::clear(unsigned bitnum)
> {
>     data[bitnum / 32] &= ~(1 << (bitnum & 31));
> }
> 
> int Bits::test(unsigned bitnum)
> {
>     return data[bitnum / 32] & (1 << (bitnum & 31));
> }

bit[] is similar, just has a count parameter and then a pointer to data.
(which is again allocated in 32-bit chunks, just as it is for bit[#])

You would have to ask Walter what the *real* reason is, though ?

http://www.digitalmars.com/d/arrays.html just says:

>  Bit vectors can be constructed:
> 
> 	bit[10] x;		// array of 10 bits
> 
> The amount of storage used up is implementation dependent.
> Implementation Note: on Intel CPUs it would be rounded up to the next 32 bit size.
> 
> 	x.length		// 10, number of bits
> 	x.size			// 4,  bytes of storage
> 	
> 
> So, the size per element is not (x.size / x.length).

It's similar on PowerPC CPUs, by the way. At least with gdc today.
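As a C sketch of the rounding the spec describes (the helper name is invented), storage for an n-bit vector is simply rounded up to whole 32-bit words:

#include <stdio.h>

static unsigned storage_bytes(unsigned nbits)
{
    unsigned words = (nbits + 31) / 32;   /* round up to the next 32-bit chunk */
    return words * 4;
}

int main(void)
{
    printf("%u\n", storage_bytes(10));    /* 4 - matches x.size for bit[10] */
    printf("%u\n", storage_bytes(8));     /* 4 - the bit[8] case discussed earlier */
    printf("%u\n", storage_bytes(33));    /* 8 */
    return 0;
}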


> Oh no, I've started the discussion again? I really didn't mean to.

It goes back under the rug then... "Never mind the little bump" :-)

>>Otherwise you need sub-byte pointers to access &something.second,
>>and that's a huge pain to implement and to pass around the block.
> 
> Yes, agreed. That's one point where bool and bit would differ: bool* being valid, but bit* not.

bit* is valid too. It's just that it doesn't work properly with arrays?

But you can take the address of a bit field, and use it as "inout" too.
(the dirty secret of course being that single bits are stored as bytes)

>>Personally, I'm not sure that bit *is* the fundamental data type...
> 
> Theoretically it still is. I mean, it's the basic granularity, there's nothing smaller than bit. The quarks equivalent in information technology.

No argument there.

But the difference between theory and practice
is a lot greater in practice than in theory... ;-)

>>Similar to how the triangle is the fundamental type in 3D graphics,
>>I think the byte has become the fundamental data type. For speed ?
> 
> Ah, practically yes, but triangles are not theoretically the fundamental type of 3D geometry. Recall ray-tracers, they hardly ever used triangles. But in practice (especially in OGL/D3D-type rendering) triangles are the basis. This shouldn't prevent anybody from discussing 'the basics' though.

Of course not. Pixels are still the end result...

Just that when drawing fast 3D animations, one
does it a tri/quad at a time in hardware - instead
of submitting each pixel and rendering in software.

Similarly, we can access bits using bytes or ints -
plus the various bit operators like and, or and shifts -
and have the hardware accumulate the bits in memory?

> I also do think that 3D rendering based on triangles is just an efficient hack, a modern approximation to ray-tracing. I wouldn't be surprised if we get rid of them in the future (NURBS and other parameter-based geometries are gaining ground again).

There is some cool stuff possible with "Cg",
http://developer.nvidia.com/page/cg_main.html
(C for Graphics) Apple uses it in CoreImage :
http://www.apple.com/macosx/tiger/core.html

No triangles there, just plain old pixels...

As a side note, they are now using float pixels
(i.e. RGB values are now floats, not integers).
This works better when applying several filters.

But that's not really about "integer types" is it,
--anders
November 25, 2004
Hi.

> It is faster since you can load/store 32 bits at one time, when setting several. Otherwise it would load/store 8 bits at a time, by using bytes.

Yeah, but what if the array size is fixed, as is the case with bit[8]? Why would you even want to fetch 32 bits, if you know you only need 8 and fetching a byte is just as fast?

> You would have to ask Walter what the *real* reason is, though ?

Actually, I was hoping he'd provide an answer to my question :-S (Q: why does bit[8] allocate 32 bits?)

>> Oh no, I've started the discussion again? I really didn't mean to.
> It goes back under the rug then... "Never mind the little bump" :-)

It's amazing how the bool/string issue pops up every other week. Walter would only have to hard-code the aliases, just to get it over with :-)

>> Yes, agreed. That's one point where bool and bit would differ: bool* being valid, but bit* not.
>
> bit* is valid too. It's just that it doesn't work properly with arrays?
>
> But you can take the address of a bit field, and use it as "inout" too. (the dirty secret of course being that single bits are stored as bytes)

You mean stored as ints? If not, that would be even stranger: bit allocating a byte, but bit[x] allocating at least 4 bytes. Meaning there's a difference between bit and bit[1]. Is bit[1] allowed at all? Is an array of one element allowed in general? I will test it myself :-)

> But the difference between theory and practice
> is a lot greater in practice than in theory... ;-)

How true. When designing anything, the "theory" should be checked once in a while though.

> There is some cool stuff possible with "Cg",

Hmm, "D for Graphics" comes to mind. Actually, considering their syntax, they might as well rename it :-)

> But that's not really about "integer types" is it,

Hmm, let's declare a floating-bit, fbit, and define float as fbit[32] :-) Don't reply to this, it's a joke..

Lionello.

