Integer precision of function return types
September 26

Should a function like

```d
uint parseHex(in char ch) pure nothrow @safe @nogc {
	switch (ch) {
	case '0': .. case '9':
		return ch - '0';
	case 'a': .. case 'f':
		return 10 + ch - 'a';
	case 'A': .. case 'F':
		return 10 + ch - 'A';
	default:
		assert(0, "Non-hexadecimal character");
	}
}
```

instead return an ubyte?

September 26

On Thursday, 26 September 2024 at 06:53:12 UTC, Per Nordlöw wrote:
> Should a function like
>
> ```d
> uint parseHex(in char ch) pure nothrow @safe @nogc {
> 	switch (ch) {
> 	case '0': .. case '9':
> 		return ch - '0';
> 	case 'a': .. case 'f':
> 		return 10 + ch - 'a';
> 	case 'A': .. case 'F':
> 		return 10 + ch - 'A';
> 	default:
> 		assert(0, "Non-hexadecimal character");
> 	}
> }
> ```
>
> instead return an ubyte?

It will only matter if it's stored; on the stack, or with the very probable inlining optimizations, it should just be as simple as possible so you don't confuse the optimizer.
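
To illustrate the "stored" case, here is a minimal sketch. The buffer names and the 1024-element size are made up for illustration; the point is only that the element type decides the footprint once results are kept in memory:

```d
void main()
{
    // Hypothetical buffers for 1024 parsed hex digits (names and size are illustrative).
    uint[1024] wide;    // 4 bytes per stored digit
    ubyte[1024] narrow; // 1 byte per stored digit

    static assert(wide.sizeof == 4096);
    static assert(narrow.sizeof == 1024);
}
```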

September 26

On Thursday, 26 September 2024 at 06:53:12 UTC, Per Nordlöw wrote:
> Should a function like
>
> ```d
> uint parseHex(in char ch) pure nothrow @safe @nogc {
> 	switch (ch) {
> 	case '0': .. case '9':
> 		return ch - '0';
> 	case 'a': .. case 'f':
> 		return 10 + ch - 'a';
> 	case 'A': .. case 'F':
> 		return 10 + ch - 'A';
> 	default:
> 		assert(0, "Non-hexadecimal character");
> 	}
> }
> ```
>
> instead return an ubyte?

I have no conclusive answer:

  • From an ABI PoV it does not matter: it's AL vs. EAX, i.e. the same "parent" register.
  • From a self-documenting PoV I'd use `ubyte`. But then you hit the problem of promotion of `ch - ...` and you have to cast each of them.
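
A minimal sketch of the second point, assuming the return type is simply switched to `ubyte` (the name `parseHexUbyte` is made up here). Since `ch - '0'` is promoted to `int`, each return needs its own cast:

```d
ubyte parseHexUbyte(char ch) pure nothrow @safe @nogc
{
    switch (ch)
    {
    case '0': .. case '9':
        // A bare `return ch - '0';` is rejected: the int result does not narrow implicitly.
        return cast(ubyte)(ch - '0');
    case 'a': .. case 'f':
        return cast(ubyte)(10 + ch - 'a');
    case 'A': .. case 'F':
        return cast(ubyte)(10 + ch - 'A');
    default:
        assert(0, "Non-hexadecimal character");
    }
}

void main()
{
    assert(parseHexUbyte('7') == 7);
    assert(parseHexUbyte('c') == 12);
    assert(parseHexUbyte('F') == 15);

    // The uncast expression does not convert to ubyte on its own.
    static assert(!__traits(compiles, { char c; ubyte b = c - '0'; }));
}
```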
September 26

On Thursday, 26 September 2024 at 06:53:12 UTC, Per Nordlöw wrote:
> Should a function like
>
> ```d
> uint parseHex(in char ch) pure nothrow @safe @nogc {
> 	switch (ch) {
> 	case '0': .. case '9':
> 		return ch - '0';
> 	case 'a': .. case 'f':
> 		return 10 + ch - 'a';
> 	case 'A': .. case 'F':
> 		return 10 + ch - 'A';
> 	default:
> 		assert(0, "Non-hexadecimal character");
> 	}
> }
> ```
>
> instead return an ubyte?

When I use standard library facilities, I try to use ubyte; for example:

(See "toggle comment" in the section...)

```d
void parseFromHexString(R)(out R hex, const(char)[] str)
{
  import std.algorithm : map, copy;
  import std.conv      : to;
  import std.range     : front, chunks;

  alias T = typeof(hex.front);
  str.chunks(T.sizeof * 2)
     .map!(bin => bin
     .to!T(16))
     .copy(hex[]);
}

import std.stdio;
void main()
{
  enum hex = "48656C6C6F2044202620576F726C6421";
  enum size = hex.length / 2;

  auto sample = imported!"std.conv".hexString!hex;
  sample.writeln; // Hello D & World!

  enum hexStr = x"48656C6C6F2044202620576F726C6421";
  hexStr.writeln; // Hello D & World!
  assert(is(typeof(hexStr) == string));

  immutable int[] intStr = x"48656C6C6F2044202620576F726C6421";
  intStr.writeln; // [1214606444, 1864385568, 639653743, 1919706145]


  int[size] buf;
  buf.parseFromHexString(hex);
  buf.writeln;

  //char[size] buff; /*
  ubyte[size] buff;/* please toggle comment with above */
  buff.parseFromHexString("BADEDE");
  buff.writeln;
}
```

But when I write my own functions, I have control and I do what I want. You can also use `char` below; `ubyte` is not a problem either:

```d
auto toHexDigit(char value)
{
  if(value > 9) value += 7;
  return '0' + value;
}

auto toHexString(R)(R str)
{
  string result;

  char a, b;
  foreach(char c; str)
  {
    a = c / 16; b = c % 16;
    result ~= a.toHexDigit;
    result ~= b.toHexDigit;
  }
  return result;
}

void main() {  assert(sample.toHexString == hex); }
```

SDB@79

September 26

On Thursday, 26 September 2024 at 06:53:12 UTC, Per Nordlöw wrote:
> Should a function like
>
> ```d
> uint parseHex(in char ch) pure nothrow @safe @nogc {
> 	switch (ch) {
> 	case '0': .. case '9':
> 		return ch - '0';
> 	case 'a': .. case 'f':
> 		return 10 + ch - 'a';
> 	case 'A': .. case 'F':
> 		return 10 + ch - 'A';
> 	default:
> 		assert(0, "Non-hexadecimal character");
> 	}
> }
> ```
>
> instead return an ubyte?

```d
ubyte parseHex(immutable char ch) pure nothrow @safe @nogc {
	switch (ch) {
	case '0': .. case '9':
		return (ch - '0') & 0x0F;
	case 'a': .. case 'f':
		return (10 + ch - 'a') & 0x0F;
	case 'A': .. case 'F':
		return (10 + ch - 'A') & 0x0F;
	default:
		assert(0, "Non-hexadecimal character");
	}
}
```

I’d say yes, use ubyte. I also did two things:

  • `(…) & 0x0F` to enable value-range propagation. Essentially, the compiler understands that the result of `&` can never exceed a non-negative operand, and one operand is `0x0F`, which fits in a `ubyte`; therefore the expression converts implicitly. Prefer implicit conversions whenever they let you avoid a `cast`. With `cast`, in general, you can do bad things, whereas the compiler only allows safe conversions implicitly, even in `@system` code. Your code is marked `@safe`, but this is general advice.
  • I removed `in` from the parameter and used `immutable`. As of now, the `in` storage class just means `const`, but with the `-preview=in` and `-preview=dip1000` switches combined it also means `scope`, and `scope` means something to DIP1000, which can become dangerous in `@system` code. Do not use `in` unless you know exactly why you’re using it.
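
A small sketch of the value-range propagation point, outside of `parseHex`. The variable names are made up for illustration: masking bounds the result to 0 .. 15, so it narrows without a cast, while an unbounded `int` expression does not.

```d
void main()
{
    int wide = 0x1234;

    // Accepted without a cast: the mask limits the possible values to 0 .. 15.
    ubyte low = wide & 0x0F;
    assert(low == 4);

    // The same rule, checked at compile time.
    static assert( __traits(compiles, { int x; ubyte b = x & 0x0F; }));
    // Without the mask, the possible range is that of int, so narrowing is rejected.
    static assert(!__traits(compiles, { int x; ubyte b = x + 1; }));
}
```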

Also, for what it’s worth, you could use an in and out contract.
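
A possible shape for that, as a sketch only. The name `parseHexChecked` and the contract messages are made up, and using `std.ascii.isHexDigit` as the precondition is my assumption about what the `in` contract would check:

```d
import std.ascii : isHexDigit;

ubyte parseHexChecked(immutable char ch) pure nothrow @safe @nogc
in (isHexDigit(ch), "Non-hexadecimal character")
out (result; result < 16, "Result must be a single hex digit")
{
    switch (ch)
    {
    case '0': .. case '9':
        return (ch - '0') & 0x0F;
    case 'a': .. case 'f':
        return (10 + ch - 'a') & 0x0F;
    case 'A': .. case 'F':
        return (10 + ch - 'A') & 0x0F;
    default:
        assert(0); // unreachable when the `in` contract holds
    }
}

void main()
{
    assert(parseHexChecked('0') == 0);
    assert(parseHexChecked('a') == 10);
    assert(parseHexChecked('F') == 15);
}
```

Keep in mind that contracts are typically compiled out in release builds, so they document and check the assumption rather than replace the `default` branch.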

September 27
Per Nordlöw via Digitalmars-d-learn wrote:
> Should a function like
> 
> ```d
> uint parseHex(in char ch) pure nothrow @safe @nogc {
>      switch (ch) {
>      case '0': .. case '9':
>          return ch - '0';
>      case 'a': .. case 'f':
>          return 10 + ch - 'a';
>      case 'A': .. case 'F':
>          return 10 + ch - 'A';
>      default:
>          assert(0, "Non-hexadecimal character");
>      }
> }
> ```
> 
> instead return an ubyte?

What about using 'auto' as the return type?
I tried it and it seemed to work OK.

Wondering if there are any good reasons to use auto,
or bad reasons why not to use auto here?
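
For reference, a quick check of what `auto` infers in this case (the name `parseHexAuto` is made up for this sketch). Every `return` expression is a `char` subtraction, which is promoted to `int`, so the inferred return type is `int` rather than `uint` or `ubyte`:

```d
auto parseHexAuto(in char ch) pure nothrow @safe @nogc
{
    switch (ch)
    {
    case '0': .. case '9':
        return ch - '0';
    case 'a': .. case 'f':
        return 10 + ch - 'a';
    case 'A': .. case 'F':
        return 10 + ch - 'A';
    default:
        assert(0, "Non-hexadecimal character");
    }
}

// The inferred return type follows the promoted expressions.
static assert(is(typeof(parseHexAuto('0')) == int));

void main()
{
    assert(parseHexAuto('b') == 11);
}
```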

September 27
On Friday, 27 September 2024 at 04:23:32 UTC, thinkunix wrote:
> 
> What about using 'auto' as the return type?
> I tried it and it seemed to work OK.
>
> Wondering if there are any good reasons to use auto,
> or bad reasons why not to use auto here?

You have started a style debate that will last a week, great work

Auto is fantastic and everyone should use it more
September 27
monkyyy via Digitalmars-d-learn wrote:
> On Friday, 27 September 2024 at 04:23:32 UTC, thinkunix wrote:
>>
>> What about using 'auto' as the return type?
>> I tried it and it seemed to work OK.
>>
>> Wondering if there are any good reasons to use auto,
>> or bad reasons why not to use auto here?
> 
> You have started a style debate that will last a week, great work

That was not my intent.  It was an honest question.  I'm here to learn
and not looking to start debates or for attitude.

I've seen a lot of "use auto everywhere" especially in C++ and was
wondering where the D community stands on its use.  Is it generally
favored or not?

Personally, I think auto makes understanding code harder for humans.
But in this example, it seemed like auto was a good fit.
September 27
On Fri, Sep 27, 2024 at 04:13:45PM -0400, thinkunix via Digitalmars-d-learn wrote: [...]
> I've seen a lot of "use auto everywhere" especially in C++ and was wondering where the D community stands on its use.  Is it generally favored or not?
> 
> Personally, I think auto makes understanding code harder for humans. But in this example, it seemed like auto was a good fit.

In idiomatic D, you'd use `auto` when either (1) you don't care what the type is, you just want whatever value you get to be shoved into a variable, or (2) you *shouldn't* care what the type is, because your code shouldn't be depending on it, e.g., when you're using Voldemort types std.algorithm-style.

The reason for (2) is that in UFCS chains, the only thing you really care about is what kind of range it is that you're dealing with, and maybe the element type.  What exactly the container type is, is unimportant, and in fact, stating it explicitly is detrimental to maintenance because the library that gave you that type may change the concrete type in the future while retaining the same range and element type.  So by naming an explicit type for the range, you introduce a potential needless breakage in your code when you next upgrade the library.  Instead, use `auto` to let the compiler figure out what the concrete type is; as long as it conforms to the expected range semantics and has a compatible element type, your code will continue to work as before.

This applies not only to library upgrades, but also to code maintenance, e.g., if you decide to reshuffle the elements of a UFCS chain to fix a bug or introduce a new feature.  If explicit types were always used, every such change would entail finding out and updating the type of every component in the chain -- for long chains, this quickly becomes onerous and unmaintainable.  Instead, use `auto` to let the compiler figure it all out for you, and make your code independent of the concrete type so that you can simply move things around just by cutting and pasting, and you don't have to worry about updating every single referenced type.
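
A minimal sketch of what that looks like in practice (the chain itself and the variable name are made up for illustration). The code only relies on the result being an input range of `int`, never on the concrete library type:

```d
import std.algorithm : equal, filter, map;
import std.range : ElementType, iota, isInputRange;

void main()
{
    // The concrete type is a library-internal template instance; `auto` avoids spelling it out.
    auto squaresOfEvens = iota(1, 10)
        .filter!(n => n % 2 == 0)
        .map!(n => n * n);

    // What the code actually depends on: range kind and element type.
    alias R = typeof(squaresOfEvens);
    static assert(isInputRange!R);
    static assert(is(ElementType!R == int));

    assert(squaresOfEvens.equal([4, 16, 36, 64]));
}
```

Swapping the order of `filter` and `map`, or inserting another stage, changes the concrete type but leaves the `auto` declaration untouched.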


T

-- 
How do you solve any equation?  Multiply both sides by zero.
September 27
On Thursday, September 26, 2024 12:53:12 AM MDT Per Nordlöw via Digitalmars-d-learn wrote:
> Should a function like
>
> ```d
> uint parseHex(in char ch) pure nothrow @safe @nogc {
>   switch (ch) {
>   case '0': .. case '9':
>       return ch - '0';
>   case 'a': .. case 'f':
>       return 10 + ch - 'a';
>   case 'A': .. case 'F':
>       return 10 + ch - 'A';
>   default:
>       assert(0, "Non-hexadecimal character");
>   }
> }
> ```
>
> instead return an ubyte?

I would argue that ubyte would be better, because it's guaranteed to fit into a ubyte, but if it returns uint, then anyone who wants to assign it to a ubyte will need to cast it, whereas you can just do the casts right here (which could mean a lot less casting if this function is used much). Not only that, but you'd be doing the casts in the code that controls the result, so if something ever changes that makes it so that the type needs to change (e.g. you make it operate on dchar instead of char), you won't end up with callers casting to ubyte when the result then doesn't actually fit into a ubyte - whereas if parseHex's function signature changes from returning ubyte to returning ushort or uint or whatnot, then the change would be caught at compile time with any code that assigned the result to a ubyte.

Now, I'm guessing that it wouldn't ever make sense to change this particular function in a way that the return type needed to change, and returning uint should ultimately work just fine, but I think that restricting the surface area where narrowing casts are likely to happen will ultimately reduce the risk of bugs, and I think that it's pretty clear that there will be less casting overall if the casting is done here instead of at the call site unless the function is barely ever used.
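
To make the call-site difference concrete, here is a sketch with two hypothetical variants (`parseHexWide` and `parseHexNarrow` are names I made up for illustration). The single cast lives inside the narrow variant, and callers of the wide one each need their own:

```d
uint parseHexWide(char ch) pure nothrow @safe @nogc
{
    switch (ch)
    {
    case '0': .. case '9': return ch - '0';
    case 'a': .. case 'f': return 10 + ch - 'a';
    case 'A': .. case 'F': return 10 + ch - 'A';
    default: assert(0, "Non-hexadecimal character");
    }
}

// The one narrowing cast, next to the code that guarantees the value fits.
ubyte parseHexNarrow(char ch) pure nothrow @safe @nogc
{
    return cast(ubyte) parseHexWide(ch);
}

void main()
{
    ubyte a = parseHexNarrow('f');             // no cast at the call site
    ubyte b = cast(ubyte) parseHexWide('f');   // every caller has to cast
    uint  c = parseHexNarrow('f');             // widening still works implicitly
    assert(a == 15 && b == 15 && c == 15);

    // Assigning the uint result to a ubyte without a cast is a compile-time error.
    static assert(!__traits(compiles, { ubyte x = parseHexWide('f'); }));
}
```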

- Jonathan M Davis



