Thread overview
Right way to show numbers in binary/hex/octal in your opinion?
Dec 25, 2021
rempas
Dec 25, 2021
Paul Backus
Dec 26, 2021
rempas
Dec 26, 2021
Siarhei Siamashka
Dec 30, 2021
rempas
Dec 27, 2021
Rumbu
Dec 27, 2021
Siarhei Siamashka
Dec 27, 2021
Rumbu
Dec 28, 2021
Siarhei Siamashka
Dec 29, 2021
Rumbu
Dec 30, 2021
rempas
Dec 30, 2021
rikki cattermole
Dec 30, 2021
rempas
Dec 30, 2021
rempas
December 25, 2021

So I have this function that converts a number to a string and it can return it in any base you want. However, for negative numbers, the result may not be what you expect for bases other than decimal if you are used to "printf" or "writef". Let's say that we want to convert the number -10 from decimal to hex. There are two possible results (check here).

The first one is to negate the number, convert it to hex and then add a "-" in front of it. So the result will be: -a (which is what my function does).

The second one is what "printf" and "writef" do, which is to use 2's complement. So in this case, we first convert the number to binary using 2's complement and then we convert that binary number to hex. In this example, -10 in binary is 1111111111110110, which is fff6 in hex. However, for some reason "printf" and "writef" return fffffff6, so go figure....
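
To make this concrete, here is a minimal sketch of the behavior described above (both calls print the 32-bit two's complement form of -10):

import core.stdc.stdio : printf;
import std.stdio : writefln;

void main() {
  printf("%x\n", -10); // prints fffffff6
  writefln("%x", -10); // prints fffffff6
}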

Here are the advantages of these two methods in my humble opinion:

First method:

  1. It is what my function does and I would obviously prefer not to change it. Also, implementing the other behavior would need a lot of work (and of course I would first have to figure out how to do it) and it would make the function much slower.
  2. It is easier on the eyes, as it makes it more obvious whether the number is signed or not and therefore what the equivalent is in other number systems.

Second method:

  1. It is probably what people would expect and what makes more sense scientifically, as decimal was supposed to be the only base that will make sense for humans to read and hence the only base that has the "-" character.

Anyway, I don't even know why it would be practically useful to print a number in another base, so what are your thoughts?

December 25, 2021

On Saturday, 25 December 2021 at 21:12:33 UTC, rempas wrote:

> The first one is to negate the number, convert it to hex and then add a "-" in front of it. So the result will be: -a (which is what my function does).

This is the correct result.

> The second one is what "printf" and "writef" do, which is to use 2's complement. So in this case, we first convert the number to binary using 2's complement and then we convert that binary number to hex. In this example, -10 in binary is 1111111111110110, which is fff6 in hex. However, for some reason "printf" and "writef" return fffffff6, so go figure....

The two's complement representation of a number depends on the width of the integer type you're using. For the number -10, the 16-bit two's complement is fff6 and the 32-bit two's complement is fffffff6.

The website you link to in your post uses 16-bit integers, but D uses 32-bit ints by default. That's where the difference comes from.
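
A quick way to see this (a sketch; it assumes std.format keeps the 16-bit width of the short argument rather than promoting it):

import std.stdio;

void main() {
  writefln("%x", cast(short) -10); // prints fff6 (16-bit two's complement)
  writefln("%x", -10);             // prints fffffff6 (-10 is a 32-bit int)
}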

> Anyway, I don't even know why it would be practically useful to print a number in another base, so what are your thoughts?

If you implement the first method, users can still get the two's complement representation very easily with (for example) cast(uint) -10. On the other hand, if you implement the second method, it's a lot trickier to get the non-two's-complement result. So I think the first method is the better choice here.
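
For example, here is a sketch of the first method with a hypothetical toStr helper (the name and signature are made up for illustration), showing how a caller can still opt into the two's complement view with a cast:

import std.conv : to;
import std.stdio;

// Hypothetical "first method" conversion: sign prefix plus magnitude.
// (Sketch only; it ignores the value == long.min corner case.)
string toStr(long value, uint base) {
  if (value < 0)
    return "-" ~ (cast(ulong) -value).to!string(base);
  return (cast(ulong) value).to!string(base);
}

void main() {
  writeln(toStr(-10, 16));            // prints -A
  writeln(toStr(cast(uint) -10, 16)); // prints FFFFFFF6 (two's complement view)
}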

December 26, 2021

On Saturday, 25 December 2021 at 22:16:06 UTC, Paul Backus wrote:

> The two's complement representation of a number depends on the width of the integer type you're using. For the number -10, the 16-bit two's complement is fff6 and the 32-bit two's complement is fffffff6.
>
> The website you link to in your post uses 16-bit integers, but D uses 32-bit ints by default. That's where the difference comes from.

Interesting. It's funny that I want to create a system library but don't know a lot of low-level stuff yet. Though this also makes it a great and even more enjoyable journey :P

> If you implement the first method, users can still get the two's complement representation very easily with (for example) cast(uint) -10. On the other hand, if you implement the second method, it's a lot trickier to get the non-two's-complement result. So I think the first method is the better choice here.

That's great! It's much easier for me to just keep the first behavior anyway, so that will do! I also posted on the cboard forum and got the same answers, so this is how we'll do it! Thanks a lot for your time and happy holidays!

December 26, 2021

On Saturday, 25 December 2021 at 21:12:33 UTC, rempas wrote:

> Second method:
>
>   1. It is probably what people would expect and what makes more sense scientifically, as decimal was supposed to be the only base that will make sense for humans to read and hence the only base that has the "-" character.

I don't think that this makes any sense. Numbers can be negative regardless of the base that is used to show them. If "decimal was supposed to be the only base that will make sense for humans to read", then why bother implementing "this function that converts a number to a string and it can return it in any base you want"?

BTW, this is how Ruby and Crystal languages handle conversion between strings and integers:

puts -10.to_s(16)     # prints -a
puts -10.to_s(2)      # prints -1010
puts 0x100.to_s(10)   # prints 256
puts "10".to_i(2)     # prints 2
puts "-100".to_i(8)   # prints -64

Basically, strings have a ".to_i" method and integers have a ".to_s" method. A single optional argument specifies the base (between 2 and 36) and defaults to 10.
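
For comparison, a sketch of the closest Phobos equivalents and how they currently differ from the Ruby/Crystal behavior (based on the behavior discussed later in this thread):

import std.conv : to;
import std.stdio;

void main() {
  writeln((-10).to!string(16)); // prints FFFFFFF6 (two's complement), not "-A"
  writeln(0x100.to!string(10)); // prints 256
  writeln("10".to!int(2));      // prints 2
  // "-100".to!int(8) currently throws an exception on the '-' (see below)
}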

December 27, 2021

On Saturday, 25 December 2021 at 21:12:33 UTC, rempas wrote:

> So I have this function that converts a number to a string and it can return it in any base you want. However, for negative numbers, the result ...

When people are dumping numbers to strings in any other base than 10, they are expecting to see the internal representation of that number. Since the sign doesn't have a reserved bit in the representation of integrals (as it does for floats), for me it doesn't make any sense if I see a negative sign before a hex, octal or binary value.

The trickiest value for integrals is the one with only the most significant bit set (e.g. 0x80). This can be -128 for a byte, but 128 for any type other than byte. Now, if we go the other way around and put a minus before 0x80, how do we convert it back to byte? If we assume that 0x80 is always 128, -0x80 will be -128 and can fit in a byte. On the other hand, you cannot store +0x80 in a byte because it is out of range.
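
A small D illustration of that ambiguity (a sketch):

import std.stdio;

void main() {
  // The same bit pattern 0x80 is -128 as a byte but 128 as any wider type.
  byte b = cast(byte) 0x80;
  int  i = 0x80;
  writeln(b); // prints -128
  writeln(i); // prints 128
  // byte b2 = 0x80; // error: cannot implicitly convert 128 to byte
}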

This is also an issue in Phobos:

https://issues.dlang.org/show_bug.cgi?id=20452
https://issues.dlang.org/show_bug.cgi?id=18290

December 27, 2021

On Monday, 27 December 2021 at 06:55:37 UTC, Rumbu wrote:

> When people are dumping numbers to strings in any other base than 10, they are expecting to see the internal representation of that number.

Different people may have different expectations, and their expectations may not be the same as yours.

How does this "internal representation" logic make sense for bases that are not powers of 2? Okay, base 10 is a special snowflake, but what about the others?

If dumping numbers to strings in base 16 is intended to show their internal representation, then why are non-negative numbers not padded with zeroes on the left side (like the negative numbers are padded with Fs) when converted using Dlang's to!string?

As for my expectations, each digit in a base 3 number may be used to represent a chosen branch in a ternary tree (similar to how each digit in a base 2 number may represent a chosen branch in a binary tree). The other bases are useful in a similar way. This has nothing to do with the internal representation.
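
For instance, a sketch of that reading of base-3 digits (taking the left-to-right digit order as root-first branch choices is just one possible convention):

import std.stdio;

void main() {
  // Node 17 in a ternary tree: each base-3 digit picks a branch
  // (0 = left, 1 = middle, 2 = right), most significant digit first.
  int n = 17;
  int[] digits;
  for (; n > 0; n /= 3)
    digits = (n % 3) ~ digits; // prepend so the most significant digit comes first
  writeln(digits); // prints [1, 2, 2] (17 is 122 in base 3)
}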

> Since the sign doesn't have a reserved bit in the representation of integrals (as it does for floats), for me it doesn't make any sense if I see a negative sign before a hex, octal or binary value.

Why does the internal representation have to leak out and cause artificial troubles/inconsistencies, when these troubles/inconsistencies are trivially avoidable?

> The trickiest value for integrals is the one with only the most significant bit set (e.g. 0x80). This can be -128 for a byte, but 128 for any type other than byte. Now, if we go the other way around and put a minus before 0x80, how do we convert it back to byte? If we assume that 0x80 is always 128, -0x80 will be -128 and can fit in a byte. On the other hand, you cannot store +0x80 in a byte because it is out of range.

I don't understand what the problem is here. It can easily be solved by having a unit test which verifies that "-0x80" gets correctly converted to -128. Or have I missed something?

> This is also an issue in Phobos:
>
> https://issues.dlang.org/show_bug.cgi?id=20452
> https://issues.dlang.org/show_bug.cgi?id=18290

To me this looks very much like just self-inflicted damage and historical baggage, entirely caused by making wrong choices in the past.

December 27, 2021

On Monday, 27 December 2021 at 09:55:46 UTC, Siarhei Siamashka wrote:

> On Monday, 27 December 2021 at 06:55:37 UTC, Rumbu wrote:
>
> > When people are dumping numbers to strings in any other base than 10, they are expecting to see the internal representation of that number.
>
> Different people may have different expectations, and their expectations may not be the same.

Your expectations must be congruent with the host architecture, otherwise you can get surprises (like the ones in Phobos). The architecture has a limited domain and a certain way to represent numbers; they are not infinite. Otherwise computers would have to perform math ops on strings, and you don't want that for performance reasons.

> I don't understand what the problem is here. It can easily be solved by having a unit test which verifies that "-0x80" gets correctly converted to -128. Or have I missed something?

How can you convert 0x8000_0000_0000_0000 to long?

And if your response is "use a ulong", I have another one: how do you convert -0x8000_0000_0000_0000 to ulong?

> > This is also an issue in Phobos:
> >
> > https://issues.dlang.org/show_bug.cgi?id=20452
> > https://issues.dlang.org/show_bug.cgi?id=18290
>
> To me this looks very much like just self-inflicted damage and historical baggage, entirely caused by making wrong choices in the past.

No, it's just the fact that Phobos doesn't use the same convention for both directions of conversion. When converting from number to string, it uses the internal representation (2's complement). When converting from string to number, it uses the "human readable" convention.

December 28, 2021

On Monday, 27 December 2021 at 12:48:52 UTC, Rumbu wrote:

> How can you convert 0x8000_0000_0000_0000 to long?
>
> And if your response is "use a ulong", I have another one: how do you convert -0x8000_0000_0000_0000 to ulong?

If you actually care about overflow safety, then both of these conversion attempts are invalid and should raise an exception or allow the error to be handled in some other fashion. For example, the Crystal language ensures overflow safety and even provides two varieties of string-to-integer conversion methods (the one with '?' in its name returns nil on error, the other raises an exception):

puts "0000000000000000".to_i64?(16)  || "failed"  # prints 0

puts "8000000000000000".to_i64?(16)  || "failed"  # prints "failed"
puts "-8000000000000000".to_u64?(16) || "failed"  # prints "failed"
puts "8000000000000000".to_u64?(16)  || "failed"  # prints 9223372036854775808
puts "-8000000000000000".to_i64?(16) || "failed"  # prints -9223372036854775808

# Unhandled exception: Invalid Int64: 8000000000000000 (ArgumentError)
puts "8000000000000000".to_i64(16)

And Dlang is doing a similar job, though it doesn't seem to be able to handle negative base 16 numbers:

import std;

void main() {
  // prints 9223372036854775808
  writeln("8000000000000000".to!ulong(16));
  // Exception: Unexpected '-' when converting from type string to type long
  writeln("-8000000000000000".to!long(16));
}

If you want to get rid of overflow errors, then please consider using a larger 128-bit type or a bigint. Or figure out what the source of this out-of-range input is and fix the problem there.

But if you don't care about overflow safety, then it's surely possible to implement another library and define conversion operations that wrap around any arbitrarily large input until it fits into the valid range for the target data type. Using this definition, "0x8000_0000_0000_0000" converted to long will become -9223372036854775808 and "-0x8000_0000_0000_0000" converted to ulong will become 9223372036854775808. I think that this is incorrect, but it mimics the two's complement wraparound semantics and some people may like it.
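
A sketch of what such wraparound parsing could look like (the helper name is hypothetical; it deliberately allows the overflow that Phobos' to refuses):

import std.conv : to;
import std.stdio;

// Hypothetical: parse a hex string into a long with two's complement
// wraparound instead of an overflow error.
long wrapToLong(string s) {
  bool neg;
  if (s.length && s[0] == '-') { neg = true; s = s[1 .. $]; }
  ulong magnitude = s.to!ulong(16);   // parse the magnitude as unsigned
  long result = cast(long) magnitude; // reinterpret, wrapping around if needed
  return neg ? -result : result;
}

void main() {
  writeln(wrapToLong("8000000000000000")); // prints -9223372036854775808
  writeln(wrapToLong("-a"));               // prints -10
}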

> > > This is also an issue in Phobos:
> > >
> > > https://issues.dlang.org/show_bug.cgi?id=20452
> > > https://issues.dlang.org/show_bug.cgi?id=18290
> >
> > To me this looks very much like just self-inflicted damage and historical baggage, entirely caused by making wrong choices in the past.
>
> No, it's just the fact that Phobos doesn't use the same convention for both directions of conversion. When converting from number to string, it uses the internal representation (2's complement). When converting from string to number, it uses the "human readable" convention.

The "internal representation" is ambiguous. You can't even figure out if FFFF is a positive or a negative number:

import std;

void main() {
  short a = -1;
  writeln(a.to!string(16)); // prints "FFFF"
  long b = 65535;
  writeln(b.to!string(16)); // prints "FFFF"
}

Both -1 and 65535 become exactly the same string after conversion. How are you going to convert it back?

Also you haven't provided any answer to my questions from the earlier message, so I'm repeating them again:

  1. How does this "internal representation" logic make sense for bases that are not powers of 2?

  2. If dumping numbers to strings in base 16 is intended to show their internal representation, then why are non-negative numbers not padded with zeroes on the left side (like the negative numbers are padded with Fs) when converted using Dlang's to!string?

December 29, 2021

On Tuesday, 28 December 2021 at 23:45:17 UTC, Siarhei Siamashka wrote:

> On Monday, 27 December 2021 at 12:48:52 UTC, Rumbu wrote:
>
> > How can you convert 0x8000_0000_0000_0000 to long?
> >
> > And if your response is "use a ulong", I have another one: how do you convert -0x8000_0000_0000_0000 to ulong?
>
> If you actually care about overflow safety, then both of these conversion attempts are invalid and should raise an exception or allow the error to be handled in some other fashion. For example, the Crystal language ensures overflow safety and even provides two varieties of string-to-integer conversion methods (the one with '?' in its name returns nil on error, the other raises an exception):

I don't care about overflows; I care about the fact that D must use the same method when it converts numbers to strings and the other way around.

Currently D dumps byte.min in hex as "80". But it throws an overflow exception when I try to get my byte back from "80".
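
A minimal reproduction of that round trip (a sketch, assuming current Phobos behavior):

import std.conv : to;
import std.stdio;

void main() {
  byte b = byte.min;          // -128
  string s = b.to!string(16); // "80" (the two's complement dump)
  writeln(s);
  // Throws a ConvOverflowException: "80" parses as 128, which is > byte.max.
  writeln(s.to!byte(16));
}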

Fun fact: when I wrote my decimal library, I had millions of expected values in a file and some of the decimal-int conversions failed according to the tests. The error source was not me, but this line: https://github.com/rumbu13/decimal/blob/a6bae32d75d56be16e82d37af0c8e4a7c08e318a/src/test/test.d#L152. It took me some time to dig through the test file and realise that among the values there are some strings that cannot be parsed in D (the ones starting with "8").

Yes, dumping it as "-80" could be a solution, but the standard library does not even parse the "-" today for bases other than 10.

> If you want to get rid of overflow errors, then please consider using a larger 128-bit type or a bigint. Or figure out what the source of this out-of-range input is and fix the problem there.

That's why I gave you the "long" example. We don't have a 128-bit type (yet). That was the idea in the first place: the language has a limited range of numbers. And when we do get cent, we will lack a 256-bit type.

> import std;
>
> void main() {
>   short a = -1;
>   writeln(a.to!string(16)); // prints "FFFF"
>   long b = 65535;
>   writeln(b.to!string(16)); // prints "FFFF"
> }
>
> Both -1 and 65535 become exactly the same string after conversion. How are you going to convert it back?

I would like to think that I know exactly what kind of value I am expecting to read.

> Also you haven't provided any answer to my questions from the earlier message, so I'm repeating them again:
>
>   1. How does this "internal representation" logic make sense for bases that are not powers of 2?

Here you have a point :) I never thought about bases other than powers of 2.

>   2. If dumping numbers to strings in base 16 is intended to show their internal representation, then why are non-negative numbers not padded with zeroes on the left side (like the negative numbers are padded with Fs) when converted using Dlang's to!string?

They are not padded with Fs; that's exactly what the number holds in memory as bits.

We are on the same side here, the current to/parse implementation is not the best we can get.

Happy New Year :)

December 30, 2021

On Sunday, 26 December 2021 at 17:35:12 UTC, Siarhei Siamashka wrote:

> I don't think that this makes any sense. Numbers can be negative regardless of the base that is used to show them. If "decimal was supposed to be the only base that will make sense for humans to read", then why bother implementing "this function that converts a number to a string and it can return it in any base you want"?

When I say that it is the only base that makes sense to read, I mean that it is the base we learn to read and write in, and that it is used for general purposes and not for specific tasks (like shortening memory addresses, for example). Binary is also very common because it is what the machine understands, and it is also very easy to go from hex to binary and back. This is also why they bothered adding an official way of representing negative binary numbers and doing mathematical operations with them (the most significant bit indicates whether the number is negative or positive).

> BTW, this is how Ruby and Crystal languages handle conversion between strings and integers:
>
> puts -10.to_s(16)     # prints -a
> puts -10.to_s(2)      # prints -1010
> puts 0x100.to_s(10)   # prints 256
> puts "10".to_i(2)     # prints 2
> puts "-100".to_i(8)   # prints -64
>
> Basically, strings have a ".to_i" method and integers have a ".to_s" method. A single optional argument specifies the base (between 2 and 36) and defaults to 10.

This is exactly how my library handles them too!
