Jump to page: 1 24  
Page
Thread overview
May 20

I just spent about 15 minutes working on a problem where an unsigned subtraction converted to a float caused a totally unexpected value.

I had to printf debug this, because it was happening only at a certain point.

The code was something like this:

foreach(i, v; messages) // messages is a string array
{
   float vpos = 600 - 30 * (messages.length - i);
   DrawText(messages.ptr, 100, vpos.to!int, 20, Colors.WHITE);
}

This is raylib code, and what ends up happening is that once messages.length exceeds 20, you get into a situation where 600 - 30 * (messages.length - i) wraps to ulong.max - 29. And obviously, that doesn't convert to an int.

So the diabolical thing about this, is that I focused on the messages.length - i because that is unsigned math, but there is zero chance this could wrap. So I was puzzled how this could be causing overflow.

But the unsigned-ness percolates throughout the whole expression!

I was brainstorming how we could flag this as error prone without flooding all projects with warnings that won't (usually) be a problem.

I thought of this possible filter:

  1. Unsigned subtraction happens
  2. The result is used as a signed number
  3. The signed number's range encloses the range of all values of the unsigned value.

So that last point might seem puzzling, why do we not care about converting signed/unsigned integers of the same width? We don't care because actually it works out exactly as expected:

e.g.

uint x = 100;
int v = x - 200;
assert(v == -100); // OK!

But if you assign to a type which actually can represent all the uint (or at least the range of uint), then you get bombs:

int x = 100;
double v = x - 200;
assert(v == -100); // BOOM
long v2 = x - 200;
assert(v2 == -100); // BOOM

So why not just flag all unsigned/signed conversions? The problem with this idea (and technically it is a sound idea) is that it will flag so many things that are likely fine. Flagging all conversions just is going to be extremely noisy, and require casts everywhere. Subtraction is where the problems are.

In cases where the compiler knows that unsigned subtraction won't wrap, it can also skip the warning.

Thoughts?

-Steve

May 20

On Tuesday, 20 May 2025 at 03:11:47 UTC, Steven Schveighoffer wrote:

>

I just spent about 15 minutes working on a problem where an unsigned subtraction converted to a float caused a totally unexpected value.

I had to printf debug this, because it was happening only at a certain point.

The code was something like this:

foreach(i, v; messages) // messages is a string array
{
   float vpos = 600 - 30 * (messages.length - i);
   DrawText(messages.ptr, 100, vpos.to!int, 20, Colors.WHITE);
}

Typo there, should have been DrawText(v.ptr, 100, ...)

-Steve

May 19
On Monday, May 19, 2025 9:11:47 PM Mountain Daylight Time Steven Schveighoffer via Digitalmars-d wrote:
> So why not just flag all unsigned/signed conversions? The problem with this idea (and technically it is a sound idea) is that it will flag so many things that are likely fine. Flagging all conversions just is going to be extremely noisy, and require casts everywhere. Subtraction is where the problems are.
>
> In cases where the compiler knows that unsigned subtraction won't wrap, it can also skip the warning.
>
> Thoughts?

Personally, I think that we should abolish all warnings in general. So, I'm very much against adding more. If you have warnings, ultimately, you either get stuck with a list of warnings that you have to wade through to see if there's anything that actually needs to be fixed, or you're forced to "fix" them even when they're not broken. I very much do not want the compiler spitting out stuff that I have to ignore or "fix" just to shut it up. And the fact that we currently have -w makes it all that much worse, because it means that warnings can change the results for type introspection and thus actually change what a program does rather than just warn the programmer of a potential problem. But even with -w being removed (or changed to behave like -wi), I still think that it's a mistake for the compiler to have warnings. Sometimes, a warning will show an actual problem, but in general, it just adds more stuff that programmers have to "fix" to shut the compiler up. If a problem is truly bad enough that it merits telling the programmer about it, then we should be looking at a reasonable set of conditions which are clearly an error and should be treated that way.

So, if we can come up with a compiler error that unequivocally indicates bad logic that actually does need to be fixed, then I'm all for adding it, but I think that anything along the lines of a warning should be left to a linting tool.

And really, this particular issue comes from the general issues of implicit conversions between signed and unsigned. Such implicit conversions are sometimes useful, but they also easily cause bugs which can be hard to find. I do wonder if we'd be better off if we just treated converting between signed and unsigned as the narrowing conversion that it is, or whether it would just create a need for too many casts. VRP would help, but it's not very sophisticated, so I don't know if it would help enough to matter. I suspect that with my code at least, making it an error would be absolutely fine, but it's likely to be very dependent on the kind of code that's written. Either way, from what I recall, Walter is very much against treating converting signedness as a narrowing conversion (even thouhgh it is), so I doubt that he'd agree to it.

- Jonathan M Davis




May 20

On Tuesday, 20 May 2025 at 03:11:47 UTC, Steven Schveighoffer wrote:

>

Thoughts?

-Steve

Alt view: provide .ilength as a signed equivalent that would be here ptrdiff_t

May 20

On Tuesday, 20 May 2025 at 11:12:03 UTC, Guillaume Piolat wrote:

>

On Tuesday, 20 May 2025 at 03:11:47 UTC, Steven Schveighoffer wrote:

>

Thoughts?

-Steve

Alt view: provide .ilength as a signed equivalent that would be here ptrdiff_t

Such a great idea! Even though it is strange having 2 properties meaning the same, only the type being different, I would actually really like having a feature like that: it doesn't break anything and solve actual problems.

I was even saying like yesterday the same problems I find about not having integer division vs floating division operator. As I'm writing code more iteratively, that seems to become quite common in my day.

May 20

On Tuesday, 20 May 2025 at 11:12:03 UTC, Guillaume Piolat wrote:

>

On Tuesday, 20 May 2025 at 03:11:47 UTC, Steven Schveighoffer wrote:

>

Thoughts?

Alt view: provide .ilength as a signed equivalent that would be here ptrdiff_t

In this case, that wouldn't help. Because everything signed is promoted to unsigned.

-Steve

May 20

On Tuesday, 20 May 2025 at 15:20:10 UTC, Steven Schveighoffer wrote:

>

On Tuesday, 20 May 2025 at 11:12:03 UTC, Guillaume Piolat wrote:

>

On Tuesday, 20 May 2025 at 03:11:47 UTC, Steven Schveighoffer wrote:

>

Thoughts?

Alt view: provide .ilength as a signed equivalent that would be here ptrdiff_t

In this case, that wouldn't help. Because everything signed is promoted to unsigned.

-Steve

The only cure that doesn't break everything is getting rid of unsigned at every API level, the mindset "this is always >= 0 so should be unsigned" is what cause all those unsigned bugs.

But it's too late for .length

May 20

On Tuesday, 20 May 2025 at 15:20:10 UTC, Steven Schveighoffer wrote:

>

In this case, that wouldn't help. Because everything signed is promoted to unsigned.

-Steve

Well that also doesn't help that foreach prefer unsigned by default...

May 21
Counter proposal: throw data flow analysis at it, rather than change the language.

It would be able to catch things like:

```d
int[] array = [1, 2];
size_t offset;

foreach(i; 0 .. array.length) {
	offset = i + 1;
}

array[offset]; // Error out of bounds!
```

But not:

```d
void func(int[] array, size_t offset) {
	array[offset];
}
```

To catch the second would require false positives to be acceptable, and the default pain tolerance for the D community is lower.

Yes this is more complex to implement than a language change, but the support can be improved over time, it isn't fixed, and we can have the slower DFA that will error out in the latter example.

May 20
This is a common issue. Unfortunately, nobody has come up with a solution to this in the last 45 years. Since every combination of signed and unsigned has well-defined behavior, prohibiting one of those behaviors is going to break a lot of code. Changing the conversion rules will break a lot of existing behavior. There's no way around it.

There is hope, however. Try the following rules:

1. use `size_t` for all indices and pointer offsets

2. use `ptrdiff_t` for all deltas of the form `size_t - size_t`

Given:
```d
float vpos = 600 - 30 * (messages.length - i);
```
the types are:

```
float = int - int * (size_t - size_t);
```
The rvalue integral promotion rules turn everything into size_t, which is unsigned.

The difficulty arises from `(messages.length - i)` which is computing a delta between two `size_t`s being another `size_t`. What you'd like is the result of the expression being a `ptrdiff_t`.
```d
ptrdiff_t delta = messages.length - i;
float ypos = 600 - delta;
```
That should give you coherent and robust results. No forced cast is needed. (Forced casts should be avoided, as they can hide bugs.)

You could also do this:
```d
foreach_reverse(ptrdiff_t i, v; messages) // messages is a string array
{
    float vpos = 600 - 30 * i;
    DrawText(messages.ptr, 100, vpos.to!int, 20, Colors.WHITE);
}
```
but one could argue it's a bit too clever.

P.S. since size_t is just an alias for uint or ulong, a special status for size_t is impractical. So you cannot just say that `size_t - size_t` should be typed as `ptrdiff_t`.

P.P.S. Some languages, like Java, decided the solution is to not have an unsigned type. This worked until programmers resorted to desperate measures to fake an unsigned type.

P.P.P.S. Forced casting results in hidden risk of losing significant bits. Not a good plan for robust code.
« First   ‹ Prev
1 2 3 4