bigEndian in std.bitmanip
October 31

Hello,

Why isn't Endian.littleEndian the default setting for read() in std.bitmanip?

Okay, we can easily change this if we want (I could use enum LE in the example), and the data can also be reversed with data.retro.array().

void main()
{
  import std.conv : hexString;
  string helloD = hexString!"48656C6C6F204421";
  // compile time converted literal string -^

  import std.string : format;
  auto hexF = helloD.format!"%(%02X%)";

  import std.digest: toHexString;
  auto arr = cast(ubyte[])"Hello D!";

  auto hex = arr.toHexString;
  assert(hex == hexF);

  import std.stdio : writeln;
  hex.writeln(": ", helloD);
// 48656C6C6F204421: Hello D!
  assert(helloD == "Hello D!");

  auto data = arr.readBytes!size_t;
  data.code.writeln(": ", data.bytes);
// 2397076564600448328: Hello D!
}

template readBytes(T, R)
{
  union Bytes
  {
    T code;
    char[T.sizeof] bytes;
  }
  import std.bitmanip;
  enum LE = Endian.littleEndian;

  auto readBytes(ref R data)
  {
   import std.range : retro, array;
   auto reverse = data.retro.array;
   return Bytes(reverse.read!T);
  }
}

However, I think it is not compatible with Union. Thanks...

SDB@79

October 31
On Tuesday, October 31, 2023 4:09:53 AM MDT Salih Dincer via Digitalmars-d-learn wrote:
> Hello,
>
> Why isn't Endian.littleEndian the default setting for read() in
> std.bitmanip?

Why would you expect little endian to be the default? The typical thing to do when encoding integral values in a platform-agnostic manner is to use big endian, not little endian. Either way, it supports both big endian and little endian, so if your use case requires little endian, you can do that. You just have to specify the endianness, and if you find that to be too verbose, you can create a wrapper to use in your own code.
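
As a rough illustration of the wrapper idea, here is a minimal sketch. The name readLE is hypothetical and not part of Phobos; it only forwards to std.bitmanip.read with the endianness fixed to little endian.

```d
import std.bitmanip : read;
import std.system : Endian;

// Hypothetical convenience wrapper (not a Phobos function):
// same as read, but with little endian as the fixed byte order.
T readLE(T, R)(ref R range)
{
    return range.read!(T, Endian.littleEndian);
}

void main()
{
    ubyte[] a = [0x00, 0x00, 0x01, 0x02];
    ubyte[] b = a.dup;

    // Default behaviour: the bytes are interpreted as big endian.
    assert(a.read!uint == 0x0000_0102);
    assert(a.length == 0); // read consumes the bytes it decodes

    // The wrapper interprets the same bytes as little endian.
    assert(b.readLE!uint == 0x0201_0000);
}
```

Since read takes its range by ref and pops the bytes it decodes, the sketch works on two separate copies of the input.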

- Jonathan M Davis



October 31

On Tuesday, 31 October 2023 at 10:24:56 UTC, Jonathan M Davis wrote:
> On Tuesday, October 31, 2023 4:09:53 AM MDT Salih Dincer via Digitalmars-d-learn wrote:
> > Hello,
> >
> > Why isn't Endian.littleEndian the default setting for read() in
> > std.bitmanip?
>
> Why would you expect little endian to be the default? The typical thing to do when encoding integral values in a platform-agnostic manner is to use big endian, not little endian...

Because when we build a structure with a union, the bytes end up in reverse order relative to the index of the static array (bytes); I showed this above. I also have a convenience template like this:

template readBytes(T, bool big = false, R)
{        // pair endian version 2.0
  import bop = std.bitmanip;

  static if(big)
    enum E = bop.Endian.bigEndian;
  else
    enum E = bop.Endian.littleEndian;

  auto readBytes(ref R dat)
   => bop.read!(T, E)(dat);
}

Sorry for taking up more of your time, since I have already solved the problem with readBytes(). Thank you for your answer, but there is one more problem, or even two! The library's read(), which is the second function, conflicts with std.write. Yes, there are many solutions to this, but all it does is read bytes. However, you can insert 4 ushorts into one ulong.

Don't you think the name of the function should be readBytes, not read? Because it doesn't work with any type other than ubyte[]!

SDB@79

October 31

On Tuesday, 31 October 2023 at 10:09:53 UTC, Salih Dincer wrote:
> Hello,
>
> Why isn't Endian.littleEndian the default setting for read() in std.bitmanip?


It might make sense to change since little endian is the most common when it comes to hardware. But big endian is most common when it comes to networking. So I guess it depends on your view of what is most common. Interacting with your local hardware or networking.
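
To make the hardware versus network distinction concrete, here is a small sketch using the conversion helpers from std.bitmanip. The values are purely illustrative; the same integer produces a different byte sequence depending on the order requested, regardless of the host CPU.

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian, nativeToLittleEndian;

void main()
{
    uint value = 0x0102_0304;

    // Network-style (big endian) layout: most significant byte first.
    ubyte[4] net = nativeToBigEndian(value);
    ubyte[4] expectedNet = [0x01, 0x02, 0x03, 0x04];
    assert(net == expectedNet);

    // Little endian layout: least significant byte first.
    ubyte[4] le = nativeToLittleEndian(value);
    ubyte[4] expectedLe = [0x04, 0x03, 0x02, 0x01];
    assert(le == expectedLe);

    // Converting back restores the original value on any host.
    assert(bigEndianToNative!uint(net) == value);
}
```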

October 31
On Tuesday, October 31, 2023 8:23:28 AM MDT Salih Dincer via Digitalmars-d-learn wrote:
> On Tuesday, 31 October 2023 at 10:24:56 UTC, Jonathan M Davis wrote:
> > On Tuesday, October 31, 2023 4:09:53 AM MDT Salih Dincer via Digitalmars-d-learn wrote:
> >> Hello,
> >>
> >> Why isn't Endian.littleEndian the default setting for read() in
> >> std.bitmanip?
> >
> > Why would you expect little endian to be the default? The typical thing to do when encoding integral values in a platform-agnostic manner is to use big endian, not little endian...
>
> Because when we create a structure with a Union, it does reverse insertion with according to the static array(bytes) index; I showed this above.

I fail to see what the situation with the union has to do with anything. Sure, you can convert between an array of bytes and an int with a union if you want to, but what that does is going to be dependent on your local architecture. read and its related functions in std.bitmanip are architecture-independent. So, they will convert from little endian or big endian regardless of what your local architecture is. You would typically use it on ranges of bytes that come from the network or from serialized data. The most common scenario there is likely to be that they'll be in big endian, because that's what platform-independent binary formats typically do, but you can explicitly tell read that the range is in little endian if your range of bytes happens to be in little endian. Both scenarios can occur, and it supports both. It just defaults to big endian, because that's the more common scenario when dealing with binary formats.
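
A minimal sketch of that difference, with a hypothetical union named Overlay and a hypothetical helper viaUnion (neither comes from the thread's code): the union result depends on the host's byte order, while read returns the same value everywhere for a given endianness argument.

```d
import std.bitmanip : read;
import std.system : Endian;

// Hypothetical union used only for this illustration.
union Overlay
{
    uint code;
    ubyte[4] bytes;
}

// Reinterprets four bytes through the union; the result is host-dependent.
uint viaUnion(ubyte[4] raw)
{
    Overlay o;
    o.bytes = raw;
    return o.code;
}

void main()
{
    ubyte[4] raw = [0x01, 0x02, 0x03, 0x04];

    // The union simply reuses the memory, so the CPU decides the value.
    version (LittleEndian)
        assert(viaUnion(raw) == 0x0403_0201);
    else
        assert(viaUnion(raw) == 0x0102_0304);

    // read decodes according to the requested order on every platform.
    ubyte[] be = raw[].dup;
    ubyte[] le = raw[].dup;
    assert(be.read!uint == 0x0102_0304);
    assert(le.read!(uint, Endian.littleEndian) == 0x0403_0201);
}
```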

> I also have a convenience template like this:
> ```d
> template readBytes(T, bool big = false, R)
> {        // pair endian version 2.0
>    import bop = std.bitmanip;
>
>    static if(big)
>      enum E = bop.Endian.bigEndian;
>    else
>      enum E = bop.Endian.littleEndian;
>
>    auto readBytes(ref R dat)
>     => bop.read!(T, E)(dat);
> }
> ```
> Sorry to give you extra engage because I already solved the
> problem with readBytes(). Thank you for your answer, but there is
> 1 more problem, or even 2! The read() in the library, which is
> 2nd function, conflicts with std.write. Yeah, there are many
> solutions to this, but what it does is just read bytes. However,
> you can insert 4 ushorts into one ulong.
>
> Don't you think the name of the function should be readBytes, not read?  Because it doesn't work with any type other than ubyte[]!

D's module system makes it so that names do not need to be unique across modules, and this is not the only case in Phobos where multiple modules use the same function name. It's easy enough to import only the functions you're using or to rename them via the import if you happen to be importing from multiple modules containing functions with the same name. E.g., if you want to do

import std.bitmanip : readBytes = read;

then you can.
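
For instance, a short sketch of a renamed import in use (the buffer contents are made up for the example):

```d
import std.bitmanip : readBytes = read; // read is visible only as readBytes
import std.stdio : writeln;

void main()
{
    ubyte[] buffer = [0x00, 0x2A, 0xFF];

    // Decodes two bytes as a big endian ushort (the default order).
    auto value = buffer.readBytes!ushort;
    assert(value == 42);

    // The decoded bytes were consumed; one byte is left in the buffer.
    assert(buffer.length == 1 && buffer[0] == 0xFF);

    writeln(value);
}
```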

- Jonathan M Davis



November 02

On Tuesday, 31 October 2023 at 14:43:43 UTC, Imperatorn wrote:
> It might make sense to change since little endian is the most common when it comes to hardware. But big endian is most common when it comes to networking. So I guess it depends on your view of what is most common. Interacting with your local hardware or networking.

I realized that I had to base my preference on what is most common. But I have to use a union, so I have to choose littleEndian, because it is compatible with both the union and HexString. My test code works perfectly, as seen below. I'm grateful to everyone who helped here and on the other thread.

enum sampleText = "Hello D!"; // length <= 8 char

void main()
{
  //import sdb.string : UnionBytes;
  mixin UnionBytes!size_t;
  bytes.init = sampleText;

  import std.digest: toHexString;
  auto hexF = bytes.cell.toHexString;
  assert(hexF == "48656C6C6F204421");

  import std.string : format;
  auto helloD = sampleText.format!"%(%02X%)";
  assert(hexF == helloD);

  import std.stdio;
  bytes.code.writeln(": ",  helloD); /* Prints:

  2397076564600448328: 48656C6C6F204421      */

  import std.conv : hexString;
  static assert(sampleText == hexString!"48656C6C6F204421");

  //import sdb.string : readBytes;
  auto code = bytes.cell.readBytes!size_t;
  assert(code == bytes.code);

  bytes.init = code;
  code.writeln(": ", bytes); /* Prints:

  2397076564600448328: Hello D!      */

  assert(bytes[] == [72, 101, 108, 108, 111, 32, 68, 33]);

  //import sdb.string : HexString
  import std.algorithm : each; // needed for hex.each below
  auto str = "0x";
  auto hex = HexString!size_t(bytes.code);
  hex.each!(chr => str ~= chr);
  str.writeln; // 0x48656C6C6F204421
}

My core template (UnionBytes) is initialized like this, and underneath I have the readBytes template, which also works with static arrays:

// ...
      import std.range : front, popFront;
      size_t i;
      do // new version: range support
      {
        char chr;                  // default init: 0xFF
        chr &= str.front;          // masking
        code |= T(chr) << (i * 8); // shifting
        str.popFront;              // next char
      } while(++i < size);
    }

    auto opCast(Cast : T)() const
      => code;

    auto opCast(Cast : string)() const
      => this.format!"%s";

    auto toString(void delegate(in char[]) sink) const
      => sink.formattedWrite("%s", cast(char[])cell);

  }
  UnionBytes bytes;     // for mixin
}

template readBytes(T, bool big = false, R)
{        // pair endian version 2.1
  import std.bitmanip;

  static if(big) enum E = Endian.bigEndian;
  else enum E = Endian.littleEndian;

  import std.range : ElementType;
  alias ET = ElementType!R;

  auto readBytes(ref R dat)
  {
    auto data = cast(ET[])dat;
    return read!(T, E)(data);
  }
}

SDB@79

November 02

On Thursday, 2 November 2023 at 11:29:05 UTC, Salih Dincer wrote:
> On Tuesday, 31 October 2023 at 14:43:43 UTC, Imperatorn wrote:
> > It might make sense to change since little endian is the most common when it comes to hardware. But big endian is most common when it comes to networking. So I guess it depends on your view of what is most common. Interacting with your local hardware or networking.
>
> I realized that I had to make my prefer based on the most common. But I have to use Union. That's why I have to choose little.Endian. Because it is compatible with both Union and HexString. My test code works perfectly as seen below. I'm grateful to everyone who helped here and on the other thread.

Nice to hear you found a solution. Little endian is most common in hardware but big endian is most common in networking, so defining a default endianness can be tricky.