March 14

On Thursday, 14 March 2024 at 23:39:33 UTC, Liam McGillivray wrote:
> On Thursday, 14 March 2024 at 01:58:46 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> [...]
>
> I tried to rework the functions to use bitwise operations, but it was difficult to figure out the correct logic. I decided that it's not worth the hassle, so I just changed the value storage from `bool[3]` to `ubyte`. Now it works much more like your version.
> https://github.com/LiamM32/Open_Emblem/blob/c2014ab3f77e89c0cedcd6dbf7f8362ebfac33a9/source/common.d
>
> I did a little reading, so now I understand what it means when you have `&= 7`. But I want to ask, is this faster than `%= 8`? If not, I would like to change it to the latter for readability.

`%= 8` will be codegened using slower instructions without optimizations enabled, but with `&= 7` you directly get the right instruction, which does not involve integer division. See https://godbolt.org/z/74vbba5aG

March 15
On 15/03/2024 12:47 PM, Basile B. wrote:
> On Thursday, 14 March 2024 at 23:39:33 UTC, Liam McGillivray wrote:
>> On Thursday, 14 March 2024 at 01:58:46 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>> [...]
>>
>> I tried to rework the functions to use bitwise operations, but it was difficult to figure out the correct logic. I decided that it's not worth the hassle, so I just changed the value storage from `bool[3]` to `ubyte`. Now it works much more like your version.
>> https://github.com/LiamM32/Open_Emblem/blob/c2014ab3f77e89c0cedcd6dbf7f8362ebfac33a9/source/common.d
>>
>> I did a little reading, so now I understand what it means when you have `&= 7`. But I want to ask, is this faster than `%= 8`? If not, I would like to change it to the latter for readability.
> 
> `%= 8` will be codegened using slower instructions without optimizations enabled, but with `&= 7` you directly get the right instruction, which does not involve integer division. See https://godbolt.org/z/74vbba5aG

Yes, it'll depend upon how smart the compiler is at optimizing and it may not occur in non-optimizing builds.

The modulo instructions are based upon division, which is an incredibly expensive operation.

https://stackoverflow.com/a/8022107

The division instruction on Haswell for integers ranges from 9 cycles for 8-bit operands all the way up to 36 cycles for 64-bit ones.
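
As a quick sanity check, for unsigned operands the two forms always agree, so the choice is purely between readability and what gets emitted in unoptimized builds; a minimal sketch:

    // For unsigned x, x % 8 and x & 7 are always equal; only the
    // generated instructions differ when optimizations are disabled.
    unittest
    {
        foreach (uint x; 0 .. 256)
            assert((x % 8) == (x & 7));
    }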
March 15

On Friday, 15 March 2024 at 00:00:01 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 15/03/2024 12:47 PM, Basile B. wrote:
>> On Thursday, 14 March 2024 at 23:39:33 UTC, Liam McGillivray wrote:
>>> [...]
>>
>> `%= 8` will be codegened using slower instructions without optimizations enabled, but with `&= 7` you directly get the right instruction, which does not involve integer division. See https://godbolt.org/z/74vbba5aG
>
> Yes, it'll depend upon how smart the compiler is at optimizing and it may not occur in non-optimizing builds.

Indeed, GDC (so very likely GCC too, or whatever language uses it as a backend...) does it even without optimizations enabled (https://godbolt.org/z/Ke7c54Gqj).

That's not very surprising. If you look at the LLVM bug tracker under the "missed optimisations" tag, you'll see a lot of report bodies saying "GDC does that but we don't", even if here it's a bit different, as optimizations are not enabled.

March 14
On Thu, Mar 14, 2024 at 11:39:33PM +0000, Liam McGillivray via Digitalmars-d-learn wrote: [...]
> I tried to rework the functions to use bitwise operations, but it was difficult to figure out the correct logic. I decided that it's not worth the hassle, so I just changed the value storage from `bool[3]` to `ubyte`.
[...]

Just wanted to note that while in theory bool[3] could be optimized by the compiler for compact storage, what you're most likely to get is 3 bytes, one for each bool, or perhaps even 3 ints (24 bytes). When dealing with units of data smaller than a byte, you generally need to do it manually, because memory is not addressable by individual bits, making it difficult to implement things like slicing an array of bool. So the compiler is most likely to simplify things by making it an array of bytes rather than emit complex bit manipulation code to make up for the lack of bit-addressability in the underlying hardware.
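
To make the storage point concrete, here's a small sketch; the static asserts show the byte-per-bool layout, and `std.bitmanip.BitArray` is the library's packed alternative when you really want one bit per flag:

    import std.bitmanip : BitArray;

    static assert(bool.sizeof == 1);        // each bool occupies a full byte
    static assert((bool[3]).sizeof == 3);   // so bool[3] is 3 bytes, not 3 bits

    unittest
    {
        BitArray bits;
        bits.length = 3;       // three packed bits
        bits[2] = true;
        assert(bits[2] && !bits[0] && !bits[1]);
    }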

Using bit operators like others have pointed out in this thread is probably the best way to implement what you want.


T

-- 
LINUX = Lousy Interface for Nefarious Unix Xenophobes.
March 15

On Tuesday, 12 March 2024 at 05:38:03 UTC, Liam McGillivray wrote:
> I am in need of a data type for holding direction information; one of 8 directions on a single axis. They are named in terms of compass directions. If D had a 4-bit datatype, I would just use this and do `+=2` whenever I want to change the datatype, but it doesn't.
>
> Perhaps this would be a good programming challenge for someone more experienced than me. Make a data type (probably a struct) for holding one of 8 directional values using 3 bits. It should accept the use of increment operators to change the angle.
>
> Ideally (but optionally), it should work as an enum of the same name; "Direction".
>
> Here's a unittest that it should ideally pass:

D actually supports both 3- and 4-bit integers. People will likely warn you of minor portability risks... but if you target a limited number of platforms and prefer concise, readable code, then it's a textbook example for bitfields. The risk can easily be mitigated by having a unittest to catch the error directly (if you try to port to some exotic platform).

dmd -preview=bitfields

(Some lines stolen from Rikki)

struct Direction
{
    private uint value : 3;
    alias this = value;

    enum Direction N  = Direction(0);
    enum Direction NE = Direction(1);
    enum Direction E  = Direction(2);
    enum Direction SE = Direction(3);
    enum Direction S  = Direction(4);
    enum Direction SW = Direction(5);
    enum Direction W  = Direction(6);
    enum Direction NW = Direction(7);
}
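
Assuming the preview bit field wraps on overflow the way a C bit field does (and that the unittest sits in the same module, so it can touch the private field), the wrap-around rotation comes for free; a rough sketch:

    unittest
    {
        Direction d = Direction.NW;   // value == 7
        d.value += 1;                 // 3-bit field: 7 + 1 wraps back to 0
        assert(d.value == 0);         // i.e. north again
    }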
March 16

On Friday, 15 March 2024 at 00:21:42 UTC, H. S. Teoh wrote:
> On Thu, Mar 14, 2024 at 11:39:33PM +0000, Liam McGillivray via Digitalmars-d-learn wrote: [...]
>> I tried to rework the functions to use bitwise operations, but it was difficult to figure out the correct logic. I decided that it's not worth the hassle, so I just changed the value storage from `bool[3]` to `ubyte`.
>> [...]
>
> Just wanted to note that while in theory bool[3] could be optimized by the compiler for compact storage, what you're most likely to get is 3 bytes, one for each bool, or perhaps even 3 ints (24 bytes). When dealing with units of data smaller than a byte, you generally need to do it manually, because memory is not addressable by individual bits, making it difficult to implement things like slicing an array of bool. So the compiler is most likely to simplify things by making it an array of bytes rather than emit complex bit manipulation code to make up for the lack of bit-addressability in the underlying hardware.
>
> Using bit operators like others have pointed out in this thread is probably the best way to implement what you want.
>
> T

I'm curious as to what "manual implementation" would mean, since clearly making my own struct with `bool[3]` doesn't count. Does D have features for precise memory manipulation?

Anyway, I'm surprised that D has a special operator `&=` for doing bit manipulation on integers, especially given that the steps to convert an int into a bool array are more complicated. I would imagine the former would be a rather niche thing.

March 16

On Friday, 15 March 2024 at 17:25:09 UTC, Daniel N wrote:
> On Tuesday, 12 March 2024 at 05:38:03 UTC, Liam McGillivray wrote:
>> [...]
>
> D actually supports both 3- and 4-bit integers. People will likely warn you of minor portability risks... but if you target a limited number of platforms and prefer concise, readable code, then it's a textbook example for bitfields. The risk can easily be mitigated by having a unittest to catch the error directly (if you try to port to some exotic platform).
>
> dmd -preview=bitfields
>
> (Some lines stolen from Rikki)
>
> struct Direction
> {
>     private uint value : 3;
>     alias this = value;
>
>     enum Direction N  = Direction(0);
>     enum Direction NE = Direction(1);
>     enum Direction E  = Direction(2);
>     enum Direction SE = Direction(3);
>     enum Direction S  = Direction(4);
>     enum Direction SW = Direction(5);
>     enum Direction W  = Direction(6);
>     enum Direction NW = Direction(7);
> }

Oh wow! That looks so clean and elegant, aside from the `: 3` part being easy to miss and not very readable for those not aware of this feature. This would be so simple. If I used this, I wouldn't even need to make a struct and write the operator overload functions; just make an enum for a 3-bit uint.

Based on the words in the command and a quick search, I'm guessing that this is an experimental feature that has not yet been accepted as part of the language. Perhaps I shouldn't use this then, just in case it gets pulled and someone who discovers my project in the future ends up with a build error that they don't know how to solve. This seems like a bigger problem than the extra memory the current ubyte version takes up, which is probably quite small for a computer of today anyway.

I suppose this would be a nice feature for the language to have if the syntax were reworked. Perhaps something like `uint!3 value;` would be better.

March 16
On Sat, Mar 16, 2024 at 09:16:51PM +0000, Liam McGillivray via Digitalmars-d-learn wrote:
> On Friday, 15 March 2024 at 00:21:42 UTC, H. S. Teoh wrote:
[...]
> > When dealing with units of data smaller than a byte, you generally need to do it manually, because memory is not addressable by individual bits, making it difficult to implement things like slicing an array of bool.
[...]
> I'm curious as to what "manual implementation" would mean, since clearly making my own struct with `bool[3]` doesn't count. Does D have features for precise memory manipulation?

Manual implementation as in you would deal with the machine representation in terms of bytes, or more likely, uints (on modern CPUs, even though bytes are individually addressable, the hardware actually works in terms of a larger unit, typically a 4-byte 32-bit unit or an 8-byte 64-bit unit), using bitwise operators to manipulate the bits the way you want to.


> Anyway, I'm surprised that D has a special operator `&=` for doing bit manipulation on integers, especially given that the steps to convert an int into a bool array are more complicated. I would imagine the former would be a rather niche thing.

You should understand that bitwise operators are directly implemented in hardware, and thus operators like &, |, ^, <<, >>, ~, etc., typically map directly to individual CPU instructions. As such, they are very fast, and preferred when you're doing bit-level manipulations.  At this level, you typically do not work with individual bits per se, but with machine words (typically 32-bit or 64-bit units).  Bitwise operators operate on all 32 or 64 bits at once, so performance-aware code typically manipulates all these bits simultaneously rather than individually.  Of course, using suitable bit-masking you *can* address individual bits, but the hardware instructions themselves typically work with all 32/64 bits at once.

Here's a simple example. Suppose you have 3 bits you want to store. Since the machine doesn't have a 3-bit built-in type, you typically just use the next larger available size, either a ubyte (8 bits) if you want compact storage, or if compactness isn't a big issue just a uint (32 bits, you just ignore the other 29 bits that you don't need). So you'd declare the storage something like this:

	uint myBits;

Bits are usually indexed from 0, so bit 0 is the first position, bit 1 is the second position, and so on.  So to set the first bit to 1, you'd do:

	myBits |= 0b001;

Note that at the machine level, this operator works on all 32 bits at the same time. Most of the bits remain unchanged, though, because bitwise OR does not change the original value if the operand is 0. So the overall effect is that the first bit is set.

To set the first bit to 0, there isn't a direct operator that does that, but you can take advantage of the behaviour of bitwise AND, in which any bit which is 0 in the operand will get cleared, everything else remains unchanged. So you'd do this:

	myBits &= 0b110;

Now, since we don't really care about the other 29 bits, we could write this as follows instead, to make our intent clearer:

	myBits &= ~0b001;

The ~ operator flips all the bits, so this is equivalent to writing:

	myBits &= 0b1111_1111_1111_1111_1111_1111_1111_1110;

Writing it with ~ also has the advantage that should we later decide to add another bit to our "bit array", we don't have to update the code; whereas if we'd used `myBits &= 0b110;` then we'd need to change it to `myBits &= 0b1110;` otherwise our new 4th bit may get unexpectedly cleared when we only wanted to clear the first bit.

Now, what if we wanted to set both the 1st and 3rd bits?  In a hypothetical bit array implementation, we'd do the equivalent of:

	bool[3] myBits;
	myBits[0] = 1;
	myBits[2] = 1;

However, in our uint approach, we can cut the number of operations by half, because the CPU is already operating on the entire 32 bits of the uint at once -- so there's no need to have two instructions to set two individual bits when we could just do it all in one:

	myBits |= 0b101; // look, ma! both bits set at once!

Similarly, to clear the 1st and 3rd bits simultaneously, we simply write:

	myBits &= ~0b101; // clear both bits in 1 instruction!

Of course, when we only have 3 bits to work with, the savings isn't that significant.  However, if you have a larger bit array, say you need an array of 32 bits, this can speed your code up by 32x, because you're taking advantage of the fact that the hardware is already operating on all 32 bits at the same time.  On 64-bit CPUs, you can speed it up by 64x because the CPU operates on all 64 bits simultaneously, so you can manipulate an entire array of 64 bits in a single instruction, which is 64x faster than if you looped over an array of bool with 64 iterations.
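
To make that concrete, here's a rough illustration with hypothetical helpers: clearing every even-numbered flag, once as a loop over bool[64] and once as a single AND on a ulong:

	// Loop version: 64 iterations, one bool at a time.
	void clearEvenLoop(ref bool[64] flags)
	{
	    foreach (i; 0 .. 64)
	        if ((i & 1) == 0)
	            flags[i] = false;
	}

	// Bitwise version: one AND does all of them at once. The mask has 1s
	// only in the odd positions, so every even-numbered bit is cleared.
	void clearEvenMask(ref ulong flags)
	{
	    flags &= 0xAAAA_AAAA_AAAA_AAAAUL;
	}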


T

-- 
Without outlines, life would be pointless.
March 17
Bit fields are currently going through the DIP process, although because ImportC already has them, it's just a matter of turning them on and adding the parser stuff.

However, there is a major drawback to them, and it is why you'll still need to use a struct: you can't take a pointer to a bit field.
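
A small sketch of the limitation, assuming the -preview=bitfields syntax shown earlier in the thread: the wrapping struct is addressable even though the field inside it is not.

    struct Direction
    {
        private uint value : 3;   // needs -preview=bitfields
    }

    unittest
    {
        Direction d;
        Direction* p = &d;        // fine: address of the wrapping struct
        // auto q = &d.value;     // would not compile: can't take the address of a bit field
    }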