Deprecate implicit conversion between signed and unsigned integers (page 4)

February 17

Re: Deprecate implicit conversion between signed and unsigned integers

Posted by Walter Bright
in reply to Quirin Schroll

On 2/13/2025 4:00 PM, Quirin Schroll wrote:
> Signed and unsigned multiplication, division and modulo are completely different operations.

Signed and unsigned multiplication produce the exact same bit pattern result. Division and modulo are indeed different.
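A minimal sketch of that claim for 32-bit operands. The helper names `sameProductBits` and `sameQuotientBits` are made up for illustration; only the built-in operators are exercised.

```d
/// True if signed and unsigned multiplication yield the same 32 bits.
bool sameProductBits(int a, int b)
{
    const int  s = a * b;                       // signed multiply, wraps on overflow
    const uint u = cast(uint) a * cast(uint) b; // unsigned multiply, wraps modulo 2^32
    return cast(uint) s == u;
}

/// True if signed and unsigned division yield the same 32 bits.
bool sameQuotientBits(int a, int b)
{
    const int  s = a / b;
    const uint u = cast(uint) a / cast(uint) b;
    return cast(uint) s == u;
}

unittest
{
    assert(sameProductBits(-1, 7));
    assert(sameProductBits(int.min, -3));
    assert(sameProductBits(123_456, -654_321));

    assert(sameQuotientBits(6, 3));   // both operands non-negative: same result
    assert(!sameQuotientBits(-6, 3)); // -2 versus 0xFFFF_FFFA / 3
}
```

This can be tried with `dmd -unittest -main -run example.d`.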

> None of those are a bad choice; tradeoffs everywhere.

It's always tradeoffs.

>> 4. What happens with `p[i]`? If `p` is the beginning of a memory object, we want `i` to be unsigned. If `p` points to the middle, we want `i` to be signed. What should be the type of `p - q`? signed or unsigned?
> 
> Two questions, two answers.
> 
>> What happens with `p[i]`?
> 
> That’s a vague question. If `p` is a slice, range error if `i` is signed and negative. If `p` is a pointer, it’s `*(p + i)` and if `i` is signed and negative, so be it. `typeof(p + i)` is `typeof(p)`, so there shouldn’t be a problem.

Sorry, I meant `p` as a pointer. I use `a` as an array (or slice). A pointer can move forward or backwards, so the index is signed. A slice cannot back up, so the index is unsigned. A slice can be converted to a pointer. So then what, is the index signed or unsigned? There's no answer for that.

>> What should be the type of `p - q`? signed or unsigned?
> 
> Signed.

That doesn't work if the array is bigger than the int range, or happens to straddle `int.max`. (The garbage collector can run into this.)

> While it would be annoying for sure, it does make sense to use a function for pointer subtraction when one assumes the difference to be positive: `unsignedDifference(p, q)` It would assert that the result is in fact positive or zero and return a `size_t`. The cool thing about it is that if you expect an unsigned result and happen to be wrong, you’ll find out quicker than otherwise.

I'm sorry, all these extra baggage and rules about signed and unsigned makes it harder to use, not easier.

> As I see it, 2’s complement for both signed and unsigned arithmetic is a straightforward choice D made to keep `@safe` useful.

D's type system preceded @safe by many years :-/

> If D made any of them UB, it would exclude part of basic arithmetic from `@safe` because `@safe` bans every operation that *can* introduce UB.

@safe only bans memory corruption. 2's complement arithmetic is not UB.

> It’s essentially why pointer arithmetic is banned in `@safe`, since `++p` might push `p` outside an array, which is UB. D offers slices as a safe (because checked) alternative to pointers.

`--p` and `++p` are always unsafe whether the implicit conversions are there or not.

>> 6. Casts are a blunt instrument that impair readability and can cause unexpected behavior when changing a type in a refactoring. High quality code avoids the use of explicit casts as much as possible.
> 
> In my experience, when signed and unsigned are mixed, it points to a design issue.
> I had this experience a couple of times working on an older C++ codebase.

Hence my suggestions.

I look at it this way. D is a systems programming language. A requirement for being successful at it is understanding 2's complement arithmetic, including what wraparound is.

It's not that dissimilar to the requirement of some understanding of how floating point code works and its limitations, otherwise grief will be your inevitable companion.

Also that a bool is a one bit integer arithmetic type.

I know there are languages that attempt to hide all this stuff, but D isn't one of them.

February 17

Re: Deprecate implicit conversion between signed and unsigned integers

Posted by Atila Neves
in reply to Walter Bright

On Monday, 17 February 2025 at 08:30:44 UTC, Walter Bright wrote:
> On 2/7/2025 4:50 AM, Atila Neves wrote:
>> I hate ugly code too, but I'd rather have explicit casts.
>
> Pascal required explicit casts. It sounded like a good idea. After a while, I hated it. It was so nice switching to C and leaving that behind.
>
> (Did I mention that explicit casts also hide errors introduced by refactoring?)

`cast(typeof(foo)) bar`?

February 17

Re: Deprecate implicit conversion between signed and unsigned integers

Posted by Quirin Schroll
in reply to Walter Bright

On Monday, 17 February 2025 at 09:01:45 UTC, Walter Bright wrote:
> On 2/13/2025 4:00 PM, Quirin Schroll wrote:
>> Signed and unsigned multiplication, division and modulo are completely different operations.
>
> Signed and unsigned multiplication produce the exact same bit pattern result. Division and modulo are indeed different.

You’re right, I was mistaken. I thought multiplication by −1 had to be different than multiplication by `T.max`, but it’s not.

>> None of those are a bad choice; tradeoffs everywhere.
>
> It's always tradeoffs.

Sometimes, there are better things.

>>> What happens with `p[i]`? If `p` is the beginning of a memory object, we want `i` to be unsigned. If `p` points to the middle, we want `i` to be signed. What should be the type of `p - q`? signed or unsigned?
>>
>> Two questions, two answers.
>>
>>> What happens with `p[i]`?
>>
>> That’s a vague question. If `p` is a slice, range error if `i` is signed and negative. If `p` is a pointer, it’s `*(p + i)` and if `i` is signed and negative, so be it. `typeof(p + i)` is `typeof(p)`, so there shouldn’t be a problem.
>
> Sorry, I meant `p` as a pointer. I use `a` as an array (or slice). A pointer can move forward or backwards, so the index is signed. A slice cannot back up, so the index is unsigned. A slice can be converted to a pointer. So then what, is the index signed or unsigned? There's no answer for that.

The index already has a type. The operation `p + i` can support signed and unsigned `i` via overloading. I really don’t see the problem. You’re not inferring the type of the index because of the operation.

>>> What should be the type of `p - q`? signed or unsigned?
>>
>> Signed.
>
> That doesn't work if the array is bigger than the int range, or happens to straddle `int.max`. (The garbage collector can run into this.)

Why would the GC use `int`? Unless, of course, it happens to equal `ptrdiff_t`? Those are conceptually different.

The general problem is, basically, that differences of n-bit integers require n+1 bits to represent. That problem is not inherent to unsigned values, it’s just more obvious because 1 − 2 can’t be represented. In signed world, `-2 − int.max` doesn’t fit in an `int` either. Making them signed doesn’t fix differences of indices totally, only differences of non-negative values.
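A small sketch of the n+1 bits point for n = 32, where a 64-bit `long` has room for every difference. `exactDifference` is a made-up name for illustration.

```d
/// Exact difference of two 32-bit signed values, computed in 64 bits.
long exactDifference(int a, int b)
{
    return cast(long) a - b;
}

/// Exact difference of two 32-bit unsigned values, computed in 64 bits.
long exactDifference(uint a, uint b)
{
    return cast(long) a - cast(long) b;
}

unittest
{
    // An unsigned difference below zero wraps around in 32 bits.
    uint one = 1, two = 2;
    assert(one - two == uint.max);
    assert(exactDifference(one, two) == -1);

    // A signed difference below int.min wraps around as well.
    int a = -2, b = int.max;
    assert(a - b == int.max);
    assert(exactDifference(a, b) == -2_147_483_649L);
}
```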

>> While it would be annoying for sure, it does make sense to use a function for pointer subtraction when one assumes the difference to be positive: `unsignedDifference(p, q)` It would assert that the result is in fact positive or zero and return a `size_t`. The cool thing about it is that if you expect an unsigned result and happen to be wrong, you’ll find out quicker than otherwise.
>
> I'm sorry, all these extra baggage and rules about signed and unsigned makes it harder to use, not easier.

It’s much harder to write bugs when signed and unsigned are separated.
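For concreteness, one way such an `unsignedDifference` could look. This is only a sketch: the signature is an assumption, no such function exists in Phobos, and both pointers are assumed to point into the same memory block.

```d
/// Number of elements between `q` and `p`, asserting that `p` is not before `q`.
size_t unsignedDifference(T)(T* p, T* q)
{
    assert(p >= q, "pointer difference would be negative");
    return cast(size_t) (p - q); // p - q is a ptrdiff_t element count
}

unittest
{
    int[8] buf;
    int* begin = buf.ptr;
    int* mid = begin + 5;

    assert(unsignedDifference(mid, begin) == 5);
    assert(unsignedDifference(begin, begin) == 0);
}
```

Calling it with the arguments swapped fails the assertion in non-release builds instead of yielding a huge `size_t`.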

>> As I see it, 2’s complement for both signed and unsigned arithmetic is a straightforward choice D made to keep `@safe` useful.
>
> D's type system preceded @safe by many years :-/

My argument isn’t so much about history, but UB. Java does the same.

>> If D made any of them UB, it would exclude part of basic arithmetic from `@safe` because `@safe` bans every operation that *can* introduce UB.
>
> @safe only bans memory corruption.

In the language design space, there’s no difference between UB and memory corruption because memory corruption is a form of UB and any UB can lead to memory corruption (by definition really). Therefore, speaking about memory corruption is equivalent to speaking about UB generally.

D’s @safe bans all UB (by intent at least). If it didn’t, it would allow for memory corruption; it doesn’t matter if it’s directly or indirectly.

> 2's complement arithmetic is not UB.

Of course it’s not. The alternative to 2’s complement is UB (practically speaking). There are some odd platforms with a negative representation that’s not 2’s complement, but D supports none of them.

What I’m saying is, when designing a programming language, your choices for integer overflow are: 2’s complement or UB. D chose 2’s complement overall (also Java), C/C++ chose 2’s complement for unsigned and UB for signed, Zig chose UB overall.

Guaranteeing 2’s complement means the operation is well-defined for all inputs, but the optimizer can do less. Tradeoffs everywhere.

Even before @safe, having all operations on integers well-defined (maybe ignore division by zero) has positives that I guess you saw.

Historically speaking, had D taken the C/C++ or Zig route, there would be no @safe because if basic operations on integers can be UB, adding a feature like @safe makes no sense.

>> It’s essentially why pointer arithmetic is banned in `@safe`, since `++p` might push `p` outside an array, which is UB. D offers slices as a safe (because checked) alternative to pointers.
>
> `--p` and `++p` are always unsafe whether the implicit conversions are there or not.

What I find interesting is that:

- For pointers, it’s obvious to almost anyone that slices are a win because of bounds checking, even though it comes with a dual cost: The length has to be stored and indexing operations have to be range-checked.
- For integer operations, people seem to be hesitant to range-check them, even though that comes only with the cost of doing the check; no bound has to be stored.

It’s not that 2’s complement doesn’t have its place; what I am saying is: The language constructs should be as close to the intuition of the programmer as possible. I for one know when I’m making deliberate use of the bit representation of integers; without checks, however, I’m making use of the bit representation of integers with every operation, most of the time when I don’t intend to.

Most of the time, the fact that integers are binary is conceptually irrelevant.

>>> Casts are a blunt instrument that impair readability and can cause unexpected behavior when changing a type in a refactoring. High quality code avoids the use of explicit casts as much as possible.
>>
>> In my experience, when signed and unsigned are mixed, it points to a design issue.
>> I had this experience a couple of times working on an older C++ codebase.
>
> Hence my suggestions.

One cannot apply suggestions retroactively to a huge codebase that’s >15 years old.

One can, however, ban narrowing conversions and discover the problematic spots in compile errors and address them properly.

> I look at it this way. D is a systems programming language. A requirement for being successful at it is understanding 2's complement arithmetic, including what wraparound is.

While I agree that it is true and that I would exclude anyone from being called a competent programmer who doesn’t understand 2’s complement, I find myself rarely thinking about indices and whatnot as something other than an integer with a limited range. For hashing and some other algorithms, you do think of those as elements of an ordered unitary ring with an operation referred to as “division with remainder.”

D inherited its types from C and C inherited them from the operations of machines. It wouldn’t have occurred to the creators of C to provide different types for doing boolean logic, integer arithmetic, indexing arithmetic a.k.a. addressing, and bit operations. All of these happen in the same kinds of registers; to most people, however, a boolean value isn’t an integer (even C added _Bool and then bool); a number isn’t an index, and an index isn’t a bit-vector. To most people, size_t means more than “alias to the bit-width unsigned integer type the same size as addresses,” but conceptualizes sizes of memory or indices into arrays (in memory). Nobody would use a size_t to model the age of something; age is a number (within some range) and not an index.

What’s the difference between `i << 1` and `i * 2`? From the low-level perspective, literally none after optimization. However, in code, those encode very different intents.

D is a low-level and a high-level language. From the higher levels, mixing bit-vectors and numbers is usually a mistake. The language requiring to state that, yes, that’s indeed what you want isn’t exactly bad.

> It's not that dissimilar to the requirement of some understanding of how floating point code works and its limitations, otherwise grief will be your inevitable companion.
>
> Also that a bool is a one bit integer arithmetic type.

I wonder why D has a 1-bit integer type which is conceptually a boolean value, but no general n-bit integer types? C23 added `_BitInt(n)`, and `_BitInt(1)` is not `bool` (which C23 made a proper type).

> I know there are languages that attempt to hide all this stuff, but D isn't one of them.

There’s a difference between hiding and not needlessly exposing.

Making the implicit conversion of int to and from uint an error isn’t hiding things akin to Java hiding its pointers.

Narrowing implicit conversions warrant a warning in C and C++ and rightly so – it is likely a mistake and a local fix is available (use an explicit cast); brace-initialization in C++ outright bans it. By the design of D, it should be an error. Alternatives are:

- Redesign so the error doesn’t even come up anymore.
- Assert, then cast. (If you’re “really sure” it can’t fail.)
- Use a throwing narrowing conversion function. (If you’re “mostly sure” it can’t fail.)
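A sketch of the last two alternatives for an `int` to `uint` conversion. The wrapper names `assertThenCast` and `checkedNarrow` are made up for illustration; `std.conv.to` and `ConvOverflowException` are the existing Phobos facilities.

```d
import std.conv : to, ConvOverflowException;
import std.exception : assertThrown;

/// "Really sure": state the assumption, then cast.
uint assertThenCast(int value)
{
    assert(value >= 0, "value must be non-negative");
    return cast(uint) value;
}

/// "Mostly sure": std.conv.to throws if the value does not fit the target type.
uint checkedNarrow(int value)
{
    return value.to!uint;
}

unittest
{
    assert(assertThenCast(42) == 42u);
    assert(checkedNarrow(42) == 42u);
    assertThrown!ConvOverflowException(checkedNarrow(-1));
}
```

The assert disappears in `-release` builds, while the `to!uint` check stays, which is roughly the “really sure” versus “mostly sure” distinction.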

February 17

Re: Deprecate implicit conversion between signed and unsigned integers

Posted by Paul Backus
in reply to Walter Bright

On Monday, 17 February 2025 at 09:01:45 UTC, Walter Bright wrote:
> On 2/13/2025 4:00 PM, Quirin Schroll wrote:
>> If D made any of them UB, it would exclude part of basic arithmetic from `@safe` because `@safe` bans every operation that *can* introduce UB.
>
> @safe only bans memory corruption. 2's complement arithmetic is not UB.

Dividing an integer by zero is UB according to the D spec [1], and it is allowed in @safe code.

[1] https://dlang.org/spec/expression.html#division

February 17

Re: Deprecate implicit conversion between signed and unsigned integers

Posted by Nick Treleaven
in reply to Walter Bright

On Monday, 17 February 2025 at 08:30:44 UTC, Walter Bright wrote:
> (Did I mention that explicit casts also hide errors introduced by refactoring?)

In this case, we can use these with IFTI instead of explicit casts:

https://dlang.org/phobos/std_conv.html#signed
https://dlang.org/phobos/std_conv.html#unsigned
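A short sketch of what that looks like. The variable names are illustrative only; the point is that the result type follows the operand, so no target type is written out.

```d
import std.conv : signed, unsigned;

unittest
{
    int i = -1;
    auto u = unsigned(i);   // IFTI deduces int, result is uint
    static assert(is(typeof(u) == uint));
    assert(u == uint.max);  // same bits, reinterpreted

    long big = 5;
    auto ub = unsigned(big); // the result type follows the operand: ulong
    static assert(is(typeof(ub) == ulong));
    assert(ub == 5);

    uint w = uint.max;
    auto s = signed(w);     // uint becomes int
    static assert(is(typeof(s) == int));
    assert(s == -1);
}
```

If `i` is later widened from `int` to `long`, `unsigned(i)` becomes `ulong` without any edit at the call site, whereas `cast(uint) i` would silently truncate.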

February 17

Re: Deprecate implicit conversion between signed and unsigned integers

Posted by Walter Bright
in reply to Atila Neves

On 2/17/2025 1:06 AM, Atila Neves wrote:
>> (Did I mention that explicit casts also hide errors introduced by refactoring?)
> 
> `cast(typeof(foo)) bar`?

That can work, but when best practices mean adding more code, the result is usually failure.

Also, what if `foo` changes to something not anticipated by that cast?

February 17

Re: Deprecate implicit conversion between signed and unsigned integers

Posted by Walter Bright
in reply to Nick Treleaven

On 2/17/2025 1:11 PM, Nick Treleaven wrote:
> In this case, we can use these with IFTI instead of explicit casts:
> 
> https://dlang.org/phobos/std_conv.html#signed
> https://dlang.org/phobos/std_conv.html#unsigned

Yes (those were Andrei's initiative).

Up to a point. An explicit use of a signed template doesn't work if one is refactoring to an unsigned type.

February 17

Re: Deprecate implicit conversion between signed and unsigned integers

Posted by Walter Bright
in reply to Paul Backus

On 2/17/2025 7:07 AM, Paul Backus wrote:
> On Monday, 17 February 2025 at 09:01:45 UTC, Walter Bright wrote:
>> @safe only bans memory corruption. 2's complement arithmetic is not UB.
> 
> Dividing an integer by zero is UB according to the D spec [1], and it is allowed in @safe code.
> 
> [1] https://dlang.org/spec/expression.html#division

That's correct. But it's not memory corruption, and requiring casts doesn't address it.

The usual result is a signal is generated. These can be intercepted at the user's discretion.

The compiler will flag an error if it can statically determine that the divisor is zero. Runtime checks could be added, but since other languages don't do that, it would put D at a competitive disadvantage.

As always, there are tradeoffs.

February 17

Re: Deprecate implicit conversion between signed and unsigned integers

Posted by Walter Bright
in reply to DLearner

On 2/7/2025 1:04 PM, DLearner wrote:
> Or, maintaining size_t, make first index of an array 1 not 0, and return 0 if not found.
> Like malloc.
> 
> First array index is 1 also eliminates a fruitful source of off-by-one errors.

That's FORTRAN style. It would break about every piece of D code.

February 18

Re: Deprecate implicit conversion between signed and unsigned integers

Posted by Paul Backus
in reply to Walter Bright

On Tuesday, 18 February 2025 at 00:33:27 UTC, Walter Bright wrote:
> On 2/17/2025 7:07 AM, Paul Backus wrote:
>> Dividing an integer by zero is UB according to the D spec [1], and it is allowed in @safe code.
>> 
>> [1] https://dlang.org/spec/expression.html#division
>
> That's correct. But it's not memory corruption, and requiring casts doesn't address it.
>
> The usual result is a signal is generated. These can be intercepted at the user's discretion.

An optimizing compiler (like LDC or GDC) is allowed to generate code that produces memory corruption if a division by zero would occur. So this is absolutely a hole in @safe.

If the compiler could guarantee that a signal would be generated on division by zero, that would be sufficient to close the safety hole.

> The compiler will flag an error if it can statically determine that the divisor is zero. Runtime checks could be added, but since other languages don't do that, it would put D at a competitive disadvantage.

An alternative solution that does not require giving up any runtime performance would be to require @safe code to use std.checkedint for dividing integers.
