February 07

FWIW, if you want the C# array idiom:

// Returns the length as a signed int, like C#'s Array.Length.
int count(T)(in T[] a)
{
	// In debug builds, verify the length actually fits in an int.
	debug assert(a.length == cast(int) a.length);
	return cast(int) a.length;
}

// Same idea for long, like C#'s Array.LongLength.
long lcount(T)(in T[] a)
{
	// In debug builds, verify the length survives the sign change.
	debug assert(long(a.length) >= 0);
	return long(a.length);
}
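
Hypothetical usage via UFCS, so the loop can index with a plain int (the names here are mine):

void main()
{
	int[] a = [10, 20, 30];
	for (int i = 0; i < a.count; ++i)
	{
		// i and a.count are both int: no signed/unsigned mixing in the condition.
	}
}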
February 07
On Thursday, 6 February 2025 at 09:10:41 UTC, Walter Bright wrote:
> [I'm not sure why a new thread was created?]
>
> This comes up now and then. It's an attractive idea, and seems obvious. But I've always been against it for multiple reasons.
>
> 1. Pascal solved this issue by not allowing any implicit conversions. The result was casts everywhere, which made the code ugly. I hate ugly code.

I hate ugly code too, but I'd rather have explicit casts.

> 3. Is `1` a signed int or an unsigned int?

In Haskell, it could be either, and the type would be inferred. Or the programmer chooses:

1 :: Int

> 4. What happens with `p[i]`? If p is the beginning of a memory object, we want i to be unsigned. If p points to the middle, we want i to be signed. What should be the type of `p - q`? signed or unsigned?

Good questions.
February 07
On Thursday, 6 February 2025 at 20:44:46 UTC, Walter Bright wrote:
> Having a function that searches an array for a value and returns the index of the array if found, and -1 if not found, is not a good practice.
>
> An index being returned should be size_t, and the not-found value should be size_t.max.
>
[...]

Or, keeping size_t, make the first index of an array 1 rather than 0, and return 0 if not found, like malloc returning null on failure.

A first array index of 1 also eliminates a fruitful source of off-by-one errors.
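
A minimal sketch of such a 1-based search; the name indexOf1 and its convention are hypothetical:

// Hypothetical 1-based search: returns the 1-based index of the first
// match, or 0 if not found, so the result stays a size_t.
size_t indexOf1(T)(in T[] haystack, in T needle)
{
	foreach (i, e; haystack)
		if (e == needle)
			return i + 1;
	return 0;
}

unittest
{
	assert([3, 5, 7].indexOf1(5) == 2);
	assert([3, 5, 7].indexOf1(9) == 0); // "not found", like malloc's null
}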


February 13

On Thursday, 6 February 2025 at 20:52:53 UTC, Walter Bright wrote:

> On 2/6/2025 7:18 AM, Quirin Schroll wrote:
>> 1. Micro-lossy narrowing conversions:
>>    * int/uint → float
>>    * long/ulong → float/double
>
> We already do VRP checks for cases:
>
> float f = 1; // passes
> float g = 0x1234_5678; // fails

I didn’t know that, but I hardly ever use floating-point types.

However, that’s not exactly VRP, but a useful check that compile-time-known values are representable in the target type. VRP means that while you normally need a cast to assign an int to a ubyte, you can assign `myInt & 0xFF` to a ubyte without a cast. You can assign any run-time int to a float.
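
For illustration, a minimal example of the distinction (myInt stands for any run-time value):

void vrpExamples(int myInt)
{
	ubyte a = myInt & 0xFF; // OK: VRP proves the result fits in a ubyte
	// ubyte b = myInt;     // error: would need cast(ubyte)myInt
	float f = myInt;        // OK: any run-time int converts to float
}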

What you’re pointing out is that “micro-lossy narrowing conversions” are a compile error if they’re definitely occurring.

February 14

On Thursday, 6 February 2025 at 09:10:41 UTC, Walter Bright wrote:

> [I'm not sure why a new thread was created?]
>
> This comes up now and then. It's an attractive idea, and seems obvious. But I've always been against it for multiple reasons.
>
> 1. Pascal solved this issue by not allowing any implicit conversions. The result was casts everywhere, which made the code ugly. I hate ugly code.

Let me guess: Pascal has no value-range propagation?

> 2. Java solved this by not having an unsigned type. People went to great lengths to emulate unsigned behavior. Eventually, the Java people gave up and added it.

Java 23 still does not have unsigned types, though. There are only operations that essentially reinterpret the bits of signed integer types as unsigned integers and do operations on them (e.g. Integer.divideUnsigned). Signed and unsigned division, modulo, and comparison are completely different operations.

> 3. Is 1 a signed int or an unsigned int?

Ideally, it has its own type that implicitly converts to anything that can be initialized by the constant. Of course, typeof() must return something; there are three options:

  • typeof(1) is typeof(1), similar to typeof(null)
  • typeof(1) is __static_integer (cf. Zig’s comptime_int)
  • typeof(1) is int, which makes it indistinguishable from a run-time expression

D chose the third option. None of those is a bad choice; tradeoffs everywhere.
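
A minimal example of what D’s choice means in practice:

void main()
{
	static assert(is(typeof(1) == int)); // a literal is just an int...
	ubyte b = 1;  // ...but VRP lets constants that fit convert freely
	// short s = 70_000; // error: 70_000 cannot fit in a short
}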

> 4. What happens with p[i]? If p is the beginning of a memory object, we want i to be unsigned. If p points to the middle, we want i to be signed. What should be the type of p - q? signed or unsigned?

Two questions, two answers.

> What happens with p[i]?

That’s a vague question. If p is a slice: range error if i is signed and negative. If p is a pointer: it’s *(p + i), and if i is signed and negative, so be it. typeof(p + i) is typeof(p), so there shouldn’t be a problem.
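
For illustration (the names are mine):

void main()
{
	int[] a = [1, 2, 3];
	int i = -1;
	// auto x = a[i];  // range error: -1 converts to size_t.max
	int* p = &a[1];    // p points into the middle of the object
	assert(p[i] == 1); // *(p + (-1)) reads a[0], as intended
}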

> What should be the type of p - q? signed or unsigned?

Signed. If p and q are compile-time constants, so is p - q, and if it’s nonnegative, it converts to unsigned types.

While it would be annoying for sure, it does make sense to use a function for pointer subtraction when one assumes the difference to be positive: unsignedDifference(p, q). It would assert that the result is in fact positive or zero and return a size_t. The cool thing about it is that if you expect an unsigned result and happen to be wrong, you’ll find out quicker than otherwise.
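
A minimal sketch of such a helper, following the description above:

/// Pointer subtraction that asserts the difference is not negative.
size_t unsignedDifference(T)(const(T)* p, const(T)* q)
{
	ptrdiff_t d = p - q;
	assert(d >= 0, "unsignedDifference: p must not be behind q");
	return cast(size_t) d;
}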

> 5. We rely on 2's complement overflow semantics to get the same behavior if i is signed or unsigned, most of the time.

As I see it, 2’s complement for both signed and unsigned arithmetic is a straightforward choice D made to keep @safe useful. If D made any of them UB, it would exclude part of basic arithmetic from @safe, because @safe bans every operation that can introduce UB. It’s essentially why pointer arithmetic is banned in @safe: ++p might push p outside an array, which is UB. D offers slices as a safe (because checked) alternative to pointers.
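
For illustration, both signed and unsigned overflow are defined wraparound in D:

@safe unittest
{
	int  i = int.max;
	uint u = uint.max;
	assert(i + 1 == int.min); // defined wraparound in D (UB in C)
	assert(u + 1 == 0);       // defined in both languages
}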

> 6. Casts are a blunt instrument that impair readability and can cause unexpected behavior when changing a type in a refactoring. High quality code avoids the use of explicit casts as much as possible.

In my experience, when signed and unsigned are mixed, it points to a design issue. I had this experience a couple of times working on an older C++ codebase.

> 7. C behavior on this is extremely well known.

Making something valid in C do something it can’t do in C is a bad idea and invites bugs, that is true. Making questionable C constructs errors, prima facie, isn’t.

AFAICT, D for the most part sticks to: if it looks like C, it behaves like C or doesn’t compile. Banning signed-to-unsigned conversions (unless VRP proves it’s okay) simply falls into the latter box.

> 8. The Value Range Propagation feature was a brilliant solution, that resolved most issues with implicit signed and unsigned conversions, without causing any problems.

Of course VRP is great. For the most part, it means that if an implicit conversion compiles, it’s because nothing weird happens: no data can be lost, etc. Signed-to-unsigned conversion breaks this expectation that VRP in fact co-created.

> 9. Array bounds checking tends to catch the usual bugs with conflating signed with unsigned. Array bounds checking is a total winner of a feature.

It’s generally good. Almost no one complains about it.

> Andrei and I went around and around on this, pointing out the contradictions. There was no solution. There is no "correct" answer for integer 2's complement arithmetic.

I don’t really know what that means. Integer types in C, and in most languages derived from it (D included), have this oddity that addition, subtraction, and multiplication are 2’s complement, but division and modulo are not (cast(uint)(-10 / 3) and cast(uint)-10 / 3 are different). Mathematically speaking, integers in D are neither values modulo 2ⁿ nor a section of ℤ.
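
Concretely:

@safe unittest
{
	assert(cast(uint)(-10 / 3) == 4_294_967_293); // divide signed, then reinterpret
	assert(cast(uint)-10 / 3 == 1_431_655_762);   // reinterpret, then divide unsigned
}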

> Here's what I do:
>
> 1. use unsigned if the declaration should never be negative.
>
> 2. use size_t for all pointer offsets
>
> 3. use ptrdiff_t for deltas of size_t that could go negative
>
> 4. otherwise, use signed
>
> Stick with those and most of the problems will be avoided.

Sounds reasonable.
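
For concreteness, a small sketch of those guidelines in code (all names are mine):

uint itemCount;    // 1. a count should never be negative

void copy(byte* dst, const(byte)* src, size_t n) // 2. size_t for pointer offsets
{
	foreach (i; 0 .. n)
		dst[i] = src[i];
}

ptrdiff_t drift(size_t before, size_t after) // 3. deltas of size_t can go negative
{
	return cast(ptrdiff_t)(after - before);
}

int temperature = -40; // 4. otherwise, signed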

February 14

On Thursday, 6 February 2025 at 16:39:26 UTC, Kagamin wrote:

> On Monday, 3 February 2025 at 18:40:20 UTC, Atila Neves wrote:
>> https://forum.dlang.org/post/pbhjffbxdqpdwtmcbikh@forum.dlang.org
>
> I agree with Bjarne, the problem is entirely caused by abuse of unsigned integers as positive numbers. And deprecation of implicit conversion is impossible due to this abuse: signed and unsigned integers will be mixed everywhere because signed integers are proper numbers and unsigned integers are everywhere due to abuse.

What would be a “proper number”? At best, signed and unsigned types represent various slices of the infinite integers.

> Counterexample is C#, which uses signed integers in almost all interfaces, and it just works.

C# uses signed integers because not all CLR languages support unsigned types. There’s a CLSCompliantAttribute that warns you if you expose unsigned integers in your public API. That said, the case for 8-bit types is reversed: C#’s byte type is unsigned, and sbyte is the signed, non-CLS-compliant variant.

February 15

On Friday, 14 February 2025 at 00:09:14 UTC, Quirin Schroll wrote:

> What would be a “proper number”? At best, signed and unsigned types represent various slices of the infinite integers.

The problem is that they are incompatible slices that you have to mix due to the abuse of unsigned integers everywhere. At best, an unsigned integer gives you an extra bit, but in practice that doesn’t cut it: when you want a bigger integer, you use a much wider integer, not a one-bit-bigger one.

> C# uses signed integers because not all CLR languages support unsigned types.

It demonstrates that the problem is due to abuse of unsigned integers.

February 17
size_t is just an alias declaration. The compiler does not actually know it exists.
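
Indeed, druntime's object.d declares it as an ordinary alias, roughly:

alias size_t = typeof(int.sizeof);
alias ptrdiff_t = typeof(cast(void*)0 - cast(void*)0);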
February 17
On 2/6/2025 8:26 PM, Richard (Rikki) Andrew Cattermole wrote:
> That could resolve this quite nicely.

For popcount, not for anything else. There are a lot of functions with `int` or `uint` parameters where the sign is meaningless to their operation.

February 17
On 2/7/2025 4:50 AM, Atila Neves wrote:
> I hate ugly code too, but I'd rather have explicit casts.

Pascal required explicit casts. It sounded like a good idea. After a while, I hated it. It was so nice switching to C and leaving that behind.

(Did I mention that explicit casts also hide errors introduced by refactoring?)