Is this a bug or a VERY sneaky case?

December 25, 2021

Posted by rempas

First of all, I would like to ask if there is a better place for this kind of post. I'm 99% sure that I have found a bug, but I still want to ask, just to be sure that nothing is going on that I don't know about. So yeah, if there is a place meant for this kind of thing, please let me know.

OK, so I'm writing a library, and one of its functions, which converts an integer to a string (char*) and returns it, has a weird bug. The problem appears when I negate the given number, add one to it, and assign the result to an unsigned long (ulong) variable. The line is: ulong fnum = -num + 1;. So it first negates num and then adds 1 to the result. To test that my function works, I'm using the macros from "limits.h" to check the smallest and largest possible values for each type. Everything seems to work great except for "INT_MIN". I don't know why, but this one doesn't work as expected. What's weird (and why I'm 99% sure it's a bug) is that if I split that one line of code into two separate statements, it works (even though it should be the same thing under the hood). What's the change? Well, we go from:

ulong fnum = -num + 1; to ulong fnum = -num; fnum++;

which, like I said, should be the exact same thing! So what are your thoughts? Even if there is more going on here than I know about, it doesn't make sense to me that everything else (including "LONG_MIN", which is larger in magnitude) works and "INT_MIN" doesn't. Also keep in mind that I'm compiling with LDC2, because the library uses GCC inline assembly syntax, so I can't compile with DMD.

In case someone wants to see the full function, you can check it below:

import core.memory;

import core.stdc.stdio;
import core.stdc.stdlib;
import core.stdc.limits;

alias u8  = ubyte;
alias i8  = byte;
alias u16 = ushort;
alias i16 = short;
alias u32 = uint;
alias i32 = int;
alias u64 = ulong;
alias i64 = long;

enum U8_MAX  = 255;
enum U16_MAX = 65535;
enum U32_MAX = 4294967295;
enum U64_MAX = 18446744073709551615;

enum I8_MIN  = -128;
enum I8_MAX  = 127;
enum I16_MIN = -32768;
enum I16_MAX = 32767;
enum I32_MIN = -2147483648;
enum I32_MAX = 2147483647;
enum I64_MIN = -9223372036854775808;
enum I64_MAX = 9223372036854775807;

enum is_same(alias value, T) = is(typeof(value) == T);

char* to_str(T)(T num, u8 base) {
  if (num == 0) return cast(char*)"0";

  bool min_num = false;

  // Digit count for each size
  // That's not the full code, only the one for
  // signed numbers which is what we want for now
  static if (is_same!(num, i8)) {
    enum buffer_size = 5;
  } else static if (is_same!(num, i16)) {
    enum buffer_size = 7;
  } else static if (is_same!(num, i32)) {
    enum buffer_size = 12;
  } else {
    enum buffer_size = 21;
  }

  // Overflow check
  static if (is_same!(num, i8)) {
    if (num == I8_MIN) {
      min_num = true;
      ++num;
    }
  } else static if (is_same!(num, i16)) {
    if (num == I16_MIN) {
      min_num = true;
      ++num;
    }
  } else static if (is_same!(num, i32)) {
    if (num == I32_MIN) {
      min_num = true;
      ++num;
    }
  } else {
    if (num == I64_MIN) {
      min_num = true;
      ++num;
    }
  }

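  char* buf = cast(char*)pureMalloc(buffer_size);
  i32 i = buffer_size - 1; // last valid index of the buffer
  buf[i--] = '\0';         // terminate, since the result is printed with %s
  u64 fnum;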

  if (num < 0) {
    if (min_num) {
      fnum = -num + 1; // This line causes the error
      // It works when split into two separate statements:
      // fnum = -num;
      // fnum++;
    }
    else fnum = -num;
  }
  else fnum = num;

  for(; fnum && i; --i, fnum /= base)
    buf[i] = "0123456789abcdef"[fnum % base];

  if (num < 0) {
    buf[i] = '-';
    return buf + i;
  }

  return buf + (i+1);
}

extern (C) void main() {
  printf("The value is %d\n",         INT_MIN);
  printf("The value is %s\n",  to_str(INT_MIN, 10));
  exit(0);
}

December 25, 2021

Re: Is this a bug or a VERY sneaky case?

Posted by Temtaime
in reply to rempas

To get correct results use

fnum = u64(-num) + 1;

there's no bug.

Also get rid of your *_MAX, *_MIN.
Use u8.max, i8.min built-in properties etc.
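
A minimal sketch of the difference between the two forms, assuming (as in the original function) that `num` is an `int` that has already been bumped to `int.min + 1`. The helper names `negPlusOneNarrow` and `negPlusOneWide` are hypothetical and only serve the illustration; `int.min` is the built-in property mentioned above.

```d
import core.stdc.stdio : printf;

// Mirrors the failing line: the addition is done in 32-bit int
// arithmetic, wraps around to int.min, and only then is the
// (negative) result sign-extended to ulong.
ulong negPlusOneNarrow(int num) {
    ulong fnum = -num + 1;
    return fnum;
}

// Widen first, then add: the addition is done in 64-bit arithmetic,
// so there is nothing to wrap.
ulong negPlusOneWide(int num) {
    ulong fnum = ulong(-num) + 1;
    return fnum;
}

void main() {
    int num = int.min + 1; // what the function holds after ++num
    printf("narrow: %llu\n", negPlusOneNarrow(num));
    printf("wide:   %llu\n", negPlusOneWide(num));
}
```

The two-statement version from the original post (`fnum = -num; fnum++;`) behaves like `negPlusOneWide`, because the increment is applied to the `ulong` variable rather than to an `int` expression.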

December 25, 2021

Re: Is this a bug or a VERY sneaky case?

Posted by rempas
in reply to Temtaime

On Saturday, 25 December 2021 at 13:39:14 UTC, Temtaime wrote:

> To get correct results use
>
> fnum = u64(-num) + 1;
>
> there's no bug.

Weird. It really works, but why isn't this necessary for the other types as well?

> Also get rid of your *_MAX, *_MIN.
> Use u8.max, i8.min built-in properties etc.

Oh, I didn't know about that! Thanks!

December 25, 2021

Re: Is this a bug or a VERY sneaky case?

Posted by Rumbu
in reply to Temtaime

On Saturday, 25 December 2021 at 13:39:14 UTC, Temtaime wrote:

> To get correct results use
>
> fnum = u64(-num) + 1;
>
> there's no bug.
>
> Also get rid of your *_MAX, *_MIN.
> Use u8.max, i8.min built-in properties etc.

And the maximum length of an integral value can be obtained using log10(n)+1; you don't need a bunch of static ifs.

December 25, 2021

Re: Is this a bug or a VERY sneaky case?

Posted by Rumbu
in reply to rempas

On Saturday, 25 December 2021 at 14:55:29 UTC, rempas wrote:

> On Saturday, 25 December 2021 at 13:39:14 UTC, Temtaime wrote:
>
> > To get correct results use
> >
> > fnum = u64(-num) + 1;
> >
> > there's no bug.
>
> Weird. It truly works but why isn't this necessary for other types as well?

Because others are promoted to int according to integer promotion rules.

https://dlang.org/spec/type.html#integer-promotions

so -num + 1 gets compiled as cast(int)(-num) + cast(int)1 as long as num is byte, short, ubyte, ushort, bool, char or wchar.
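
As a hedged illustration of the promotion rule, the sketch below checks the type of the intermediate expression for each signed width and then runs the same line for `short`, `int` and `long`. The template `negPlusOne` is a hypothetical stand-in for the line in the original function. It also suggests why `long` appears to work: the expression wraps there too, but a wrapped `long` has the same 64-bit pattern as the intended `ulong` value.

```d
import core.stdc.stdio : printf;

// For byte and short operands, adding the int literal 1 is carried
// out in 32-bit int arithmetic, so nothing can overflow.
static assert(is(typeof(-byte.init  + 1) == int));
static assert(is(typeof(-short.init + 1) == int));
// int stays int, so int.max + 1 wraps before the value is widened.
static assert(is(typeof(-int.init   + 1) == int));
// long stays long: it wraps as well, but to the same 64 bits as 2^63.
static assert(is(typeof(-long.init  + 1) == long));

// Hypothetical helper mirroring the line from the original function.
ulong negPlusOne(T)(T num) {
    return -num + 1;
}

void main() {
    printf("short: %llu\n", negPlusOne(cast(short)(short.min + 1)));
    printf("int:   %llu\n", negPlusOne(int.min + 1));
    printf("long:  %llu\n", negPlusOne(long.min + 1));
}
```

Only the `int` case produces a value that differs from the magnitude of the type's minimum.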

> > Also get rid of your *_MAX, *_MIN.
> > Use u8.max, i8.min built-in properties etc.
>
> Oh, I didn't know about that! Thanks!

December 25, 2021

Re: Is this a bug or a VERY sneaky case?

Posted by rempas
in reply to Rumbu

On Saturday, 25 December 2021 at 17:29:37 UTC, Rumbu wrote:

> And max length of a integral can be obtained using log10(n)+1, you don't need a bunch of static ifs.

Now you've found just the guy to talk to about maths... Would you mind giving a demonstration?

December 25, 2021

Re: Is this a bug or a VERY sneaky case?

Posted by Rumbu
in reply to rempas

On Saturday, 25 December 2021 at 20:50:12 UTC, rempas wrote:

> On Saturday, 25 December 2021 at 17:29:37 UTC, Rumbu wrote:
>
> > And max length of a integral can be obtained using log10(n)+1, you don't need a bunch of static ifs.
>
> Now you found the guy to talk about maths.... Could you mind giving a demonstration?

It's common sense: log10(n) means "give me the power to which 10 must be raised to obtain n". And we know that 10^x is written as 1 followed by x zeroes, so truncating log10(n) gives the number of digits after the leading one. You add 1 to account for that leading digit.

So if we take ubyte.max as an example, we have log10(255) ≈ 2.4. Truncated to an int, you get 2. Add 1 and you obtain 3, the exact length of 255.

Or you can take 1000: log10(1000) = 3; add 1 and you obtain 4, the exact length of 1000.
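
One possible way to turn this into code, as a sketch only: the hypothetical helper `digitCount` below returns floor(log_base(n)) + 1 for n > 0, but obtains it by repeated integer division instead of a floating-point logarithm, so it is exact and can be evaluated at compile time. The `bufferSize` template is likewise an assumption; it adds room for a sign and a terminator and reproduces the sizes hard-coded in the original function.

```d
// Number of digits needed to write n in the given base. For n > 0 this
// equals floor(log_base(n)) + 1, computed with integer division only.
size_t digitCount(ulong n, uint base = 10)
in (base >= 2)
{
    size_t digits = 1;
    while (n >= base) {
        n /= base;
        ++digits;
    }
    return digits;
}

// Buffer size for any value of T: digits + optional sign + terminator.
template bufferSize(T, uint base = 10) {
    enum bool isSigned = T.min < 0;
    // Largest magnitude T can hold: |T.min| if signed, T.max otherwise.
    enum ulong maxMagnitude = isSigned ? cast(ulong) T.max + 1 : T.max;
    enum size_t bufferSize =
        digitCount(maxMagnitude, base) + (isSigned ? 1 : 0) + 1;
}

// Evaluated entirely at compile time, so there is no runtime cost.
static assert(digitCount(255) == 3);
static assert(digitCount(1000) == 4);
static assert(bufferSize!byte == 5);
static assert(bufferSize!short == 7);
static assert(bufferSize!int == 12);
static assert(bufferSize!long == 21);

void main() {}
```

Because everything is resolved through `enum` and `static assert`, the generated code is the same as with hand-written constants.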

December 26, 2021

Re: Is this a bug or a VERY sneaky case?

Posted by rempas
in reply to Rumbu

On Saturday, 25 December 2021 at 21:43:39 UTC, Rumbu wrote:

> So if we take as an example ubyte.max, we have log10(255) = 2.4. Truncated as int, you get 2. Add 1 and you obtain 3, the exact length of 255.
>
> Or you can take 1000. log10(1000) = 3, add 1, you obtain 4, the exact length of 1000.

I get it now! You want to replace the static ifs for the enum "buffer_size" (which is a name I'm going to change). However, is the algorithm built in? If I have to write it myself, I'll have to spend time working out how, and I will also end up with more code than I eliminate. Unless of course there is still something I don't understand...

December 28, 2021

Re: Is this a bug or a VERY sneaky case?

Posted by WebFreak001
in reply to rempas

On Sunday, 26 December 2021 at 06:34:14 UTC, rempas wrote:

> On Saturday, 25 December 2021 at 21:43:39 UTC, Rumbu wrote:
>
> > So if we take as an example ubyte.max, we have log10(255) = 2.4. Truncated as int, you get 2. Add 1 and you obtain 3, the exact length of 255.
> >
> > Or you can take 1000. log10(1000) = 3, add 1, you obtain 4, the exact length of 1000.

log10 (or, more correctly for all bases, log(num, base) or log(num)/log(base)) is the mathematically correct answer for positive numbers, but I think that in practice its major disadvantage (performance) in code like this outweighs the advantage (memory saving) of being exact, given that the integers here range at most from -2^63 to 2^64-1.

I think your current code is good as it is. (for base 10 at least) I don't think you will really save any memory by leaving out the spare bytes, the malloc might add way more overhead. There is no need to overthink this really and the static if cases for the different data-types are a good enough tradeoff for saving a few bytes of memory for no extra work needed at runtime.

You could better improve your code performance & memory usage by looking into better allocation strategies for the small memory blocks you allocate. But you should only really need this when your custom to_str function is called a massive amount of times. (for an int -> string function like this I could imagine it being worthwhile in certain scenarios though)

For base 2 the biggest number would then be 64 characters, still very manageable.

Some tips I think I would rather suggest to you based on that code: (for style and to avoid bugs, not changing performance or memory usage much)

- use contracts for stuff like the base to indicate that only bases 2..16 are allowed:
  char* to_str(T)(T num, u8 base) in(base >= 2 && base <= 16) { ...
  (nice for documentation and catches accidental bugs in development, in release builds these checks are omitted - which is part of the reason why you should never catch AssertError, Error or Throwable!)
- use D's datatypes, not your own ones (if you want others to look/work on your code too it's better to use the common names for stuff) - but ofc staying consistent across your code is more important
- use is(T == ubyte) etc. instead of your custom is_same!(val, ubyte) (same reason as above, people need to read the definition of is_same first)
- work with slices, not with pointers (if you plan to use your code from D, it's much cleaner and avoids bugs! does not need a trailing null terminator and works with @safe code)
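
Put together, these tips might look roughly like the sketch below. It is not the library's actual function: `toChars`, its caller-supplied buffer and the 65-byte minimum are assumptions made for the illustration. The magnitude is taken in 64-bit unsigned arithmetic, which also sidesteps the overflow discussed earlier in the thread.

```d
import core.stdc.stdio : printf;

// Hypothetical slice-based variant: writes into a caller-supplied
// buffer and returns the part of it that holds the text.
char[] toChars(T)(T num, char[] buf, ubyte base = 10)
if (is(T == byte) || is(T == short) || is(T == int) || is(T == long))
in (base >= 2 && base <= 16)
in (buf.length >= 65) // worst case: 64 binary digits plus a sign
{
    // 0 - x in ulong arithmetic yields the magnitude even for T.min.
    ulong mag = num < 0 ? 0UL - cast(ulong) cast(long) num
                        : cast(ulong) num;
    size_t i = buf.length;
    do {
        buf[--i] = "0123456789abcdef"[cast(size_t)(mag % base)];
        mag /= base;
    } while (mag != 0);
    if (num < 0) buf[--i] = '-';
    return buf[i .. $]; // no terminator needed: the slice knows its length
}

void main() {
    char[65] storage;
    char[] s = toChars(int.min, storage[]);
    // %.*s prints exactly s.length characters.
    printf("%.*s\n", cast(int) s.length, s.ptr);
}
```

Returning a slice of the caller's buffer also avoids the allocation, which is the part most likely to dominate the cost of a function like this.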

December 30, 2021

Re: Is this a bug or a VERY sneaky case?

Posted by rempas
in reply to WebFreak001

On Tuesday, 28 December 2021 at 15:25:30 UTC, WebFreak001 wrote:

Thanks a lot for your huge answer! Yeah, I don't want my program to pay a big price in runtime so I agree with you that "static ifs" are fine. Now I will quote some other stuff but in general thanks a lot

> I think your current code is good as it is. (for base 10 at least). For base 2 the biggest number would then be 64 characters, still very manageable.

Yeah, you are right, and this is how bugs happen without me noticing them. I will fix it right away!

> use contracts for stuff like the base to indicate, that only bases 2..16 are allowed:
> char* to_str(T)(T num, u8 base) in(base >= 2 && base <= 16) { ...
> (nice for documentation and catches accidental bugs in development, in release builds these checks are omitted - which is part of the reason why you should never catch AssertError, Error or Throwable!)

Thanks! Yeah tbh I don't know a lot of D "very advanced" stuff yet and I only use the features that I need. This looks awesome but one problem with it is that it will not allow me to give a custom message in case of a failure and it will also not print the line from where the function was called but rather the line of the function itself. So I don't know...
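
For reference, expression contracts accept an optional message as a second argument, the same way `assert` does. Reporting the caller's location is a separate matter; one common approach, sketched here under the assumption of a regular D `main` with the runtime available, is to take `__FILE__` and `__LINE__` as default arguments, which are evaluated at the call site. The names `digitFor` and `requireBase` are hypothetical.

```d
import core.exception : AssertError;
import core.stdc.stdio : printf;

// Contracts with custom failure messages.
char digitFor(uint value, ubyte base)
in (base >= 2 && base <= 16, "base must be in the range 2..16")
in (value < base, "value must be a single digit in the given base")
{
    return "0123456789abcdef"[value];
}

// Default arguments of __FILE__ and __LINE__ take the caller's
// location, so the error points at the offending call. Unlike a
// contract, this check stays active in release builds.
void requireBase(ubyte base,
                 string file = __FILE__, size_t line = __LINE__)
{
    if (base < 2 || base > 16)
        throw new AssertError("base must be in the range 2..16",
                              file, line);
}

void main() {
    requireBase(16); // valid, returns normally
    printf("%c\n", digitFor(11, 16));
}
```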

> use D's datatypes, not your own ones (if you want others to look/work on your code too it's better to use the common names for stuff) - but ofc staying consistent across your code is more important

I suppose you mean "str", right? In that case, I would love to use D's "string" if it wasn't immutable by default. It annoys me A LOT when a language tries to "protect" me from myself. There is a lot of other stuff that I would like "string" to have, but I wouldn't mind so much if string was mutable by default (or, even better, if string literals were "char*" like in C and could be cast automatically, so I could use char[] without needing a ".dup"). Another thing is that I'm making a library, so people will probably read the definition of "str" out of interest anyway and learn it if they want to use the library. And even for people who only want to contribute to a specific place in the code and not use the library (and why would you do that anyway?), the way "str" is used in the code is similar to how "string" is used (for example, both have a ".ptr" property to get the actual pointer), so I don't really think there is a problem with that.

> use is(T == ubyte) etc. instead of your custom is_same!(val, ubyte) (same reason as above, people need to read the definition of is_same first)

This is a simple definition actually, so why is it such a big deal? Also, we shouldn't use "is(T == ubyte)" but "is(typeof(val) == ubyte)" instead, just like I'm doing. This is because in variadic functions each argument can have a different type, so a single "T" will not work (I made this mistake and people told me, so that's how I know). So why should we have to type this much when we can automate it with a simple definition? There is also one for checking if a type is a number (integer), a floating point, a string (including my "str") etc. I don't find these hard to learn and memorize, so I don't see a reason not to make my (our) life easier and just use "macros". This is the main reason I use D and not Vox in the first place (that, and the fact that in Vox you cannot fully work with variadic functions yet).
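
A small sketch of the variadic case, with a hypothetical function `countInts`: iterating over the argument sequence is unrolled at compile time, so `typeof(arg)` can be tested per argument. Indexing the type sequence (`Args[i]`) gives the same answer, which means both spellings are available.

```d
import core.stdc.stdio : printf;

// Counts how many of the arguments have exactly the type int.
size_t countInts(Args...)(Args args) {
    size_t n = 0;
    foreach (i, arg; args) { // unrolled once per argument
        // The per-argument type is available either way.
        static assert(is(Args[i] == typeof(arg)));
        static if (is(typeof(arg) == int)) ++n;
    }
    return n;
}

void main() {
    printf("%zu\n", countInts(1, 2L, "x", 3));
}
```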

> work with slices, not with pointers (if you plan to use your code from D, it's much cleaner and avoids bugs! does not need a trailing null terminator and works with @safe code)

Slices are objects though, and this means paying a runtime cost versus just using a plain variable. Also, the only places I used pointers are C-style strings (char*, or u8* in my custom "str") and one member (_count) in my custom "str" struct. All of these cases were checked very carefully and are very specific. People who use the library should rarely need pointers; that is what I'm aiming for with my library, and why I don't use "libc". However! When it comes to the library itself, I want to go as low level as possible and have a library that is as performant as possible.
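
On the cost question, a slice is a plain pair of length and pointer, with no header or allocation of its own; the sketch below checks that at compile time and wraps malloc'd memory in a slice without involving the GC. The helper `allocChars` is hypothetical. Bounds checks on indexing are a separate concern and, as far as I know, can be controlled with the compiler's `-boundscheck` switch.

```d
import core.stdc.stdlib : malloc, free;
import core.stdc.stdio : printf;

// A slice occupies exactly one size_t plus one pointer.
static assert((char[]).sizeof == size_t.sizeof + (char*).sizeof);

// Hypothetical helper: view malloc'd memory through a slice.
char[] allocChars(size_t n) {
    auto p = cast(char*) malloc(n);
    return p is null ? null : p[0 .. n];
}

void main() {
    char[] buf = allocChars(12);
    if (buf is null) return;
    buf[] = '-'; // fill every element; the length travels with the slice
    printf("%.*s\n", cast(int) buf.length, buf.ptr);
    free(buf.ptr);
}
```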
