May 20
The original concept of D was to have zero warnings. Code would be accepted or rejected, no wishy-washy semantics.

Well, after a few years, warnings crept in with the best of intentions. We wound up exactly in the soup I tried to avoid.
May 20
On 5/20/2025 9:36 AM, Guillaume Piolat wrote:
> Well, it also doesn't help that foreach prefers unsigned by default...

For an index, foreach always defaults to size_t.
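
A minimal sketch of what that looks like in practice (the variable names are made up): the untyped index is `size_t`, so subtracting from it wraps at zero rather than going negative.

```d
void main()
{
    import std.stdio : writeln;

    int[] a = [10, 20, 30];
    foreach (i, v; a)                        // i is size_t here, not int
    {
        auto j = i - 1;                      // wraps to size_t.max when i == 0
        writeln(typeof(i).stringof, " ", j); // "ulong ..." on 64-bit targets
    }
}
```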
May 21
On 21/05/2025 4:44 PM, Walter Bright wrote:
> This is a common issue. Unfortunately, nobody has come up with a solution to this in the last 45 years. Since every combination of signed and unsigned has well-defined behavior, prohibiting one of those behaviors is going to break a lot of code. Changing the conversion rules will break a lot of existing behavior. There's no way around it.

`$ gcc -fanalyzer file.c`

```c
#include <stdio.h>

void test(unsigned int len, int* ptr) {
    for(int i = 0; i < len; i++) {    /* signed i compared against unsigned len */
        int j = i - 1;                /* j == -1 on the first iteration */
        printf("%d\n", ptr[j]);       /* out-of-bounds read when j < 0 */
    }
}

int main() {
    int val = 0;
    test(1, &val);
    return 0;
}
```

Some highlights of the output (it's a giant dump of awesomeness):

```
<source>:6:15: warning: stack-based buffer under-read [CWE-127] [-Wanalyzer-out-of-bounds]
out-of-bounds read from byte -4 till byte -1 but 'val' starts at byte 0
```

Clang-analyzer doesn't appear to have a solution to this (yet), but gcc's analyzer does appear to catch the obvious scenario here.

May 21
Interesting example! Yes, the DFA done by this dials it up a notch, and it will catch some errors. Some points:

1. it shouldn't issue a warning - it should issue an error. If the programmer wanted this code to execute anyway, he could rely on point 2 to defeat the DFA and do an out-of-bounds read. But I have no influence over C; the C community can do what they want

2. it's the old halting problem again. No matter how good the DFA is, it cannot solve the problem in general. It's the same limitation that statically detecting null pointer dereferences has

3. D's approach of bounds-checked arrays does solve the problem in the general case (at some cost to runtime performance), as the sketch below illustrates. A more advanced DFA could help in removing unnecessary bounds checks.
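
For comparison, a rough D rendering of the C example above (a sketch, not a definitive translation): the same off-by-one index, but in a non-`-release` build the bounds check turns the silent under-read into a thrown `RangeError`.

```d
import std.stdio : writeln;

void test(int[] arr)
{
    foreach (i; 0 .. arr.length)
    {
        auto j = i - 1;      // size_t arithmetic: wraps when i == 0
        writeln(arr[j]);     // bounds check throws RangeError here
    }
}

void main()
{
    int[] val = [0];
    test(val);
}
```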
May 21

On Wednesday, 21 May 2025 at 04:44:35 UTC, Walter Bright wrote:

> This is a common issue. Unfortunately, nobody has come up with a solution to this in the last 45 years.

Well, that doesn't mean we shouldn't try. AFAIK, the `if(cond);` issue existed for decades before we fixed it as well.

> Since every combination of signed and unsigned has well-defined behavior, prohibiting one of those behaviors is going to break a lot of code.

How much code? Is it worth it? Won't it find bugs in existing code that are dormant?

> Changing the conversion rules will break a lot of existing behavior. There's no way around it.

I don't want to change any existing conversion rules. I want to flag likely error-prone behavior. Like `if(cond);`.

> There is hope, however. Try the following rules:

This completely misses the point. There are solutions, obviously. An inexperienced programmer is going to run headlong into this wall and spend hours/days dealing with the fallout.

What I want is for the compiler to tell me when I likely got it wrong.
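
For instance, something like this (hypothetical names) compiles silently today, and is exactly the kind of thing a warning could flag:

```d
void main()
{
    import std.stdio : writeln;

    int offset = -1;
    uint limit = 10;

    // offset is implicitly converted to uint, so -1 becomes uint.max and the
    // test is false, even though -1 < 10 mathematically.
    if (offset < limit)
        writeln("in range");
    else
        writeln("out of range");   // this branch runs
}
```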

The errors for `if(a = b)` and `if(cond);` have been a godsend for me. Every time I hit one of those errors, I thank the D gods that I was just saved an hour of head scratching. And I have 30 years' experience developing software!

> P.P.S. Some languages, like Java, decided the solution is to not have an unsigned type. This worked until programmers resorted to desperate measures to fake an unsigned type.

This is not as dramatic as you make it sound. I think it was the correct decision.

But obviously, we can't go that route now.

> P.P.P.S. Forced casting results in hidden risk of losing significant bits. Not a good plan for robust code.

Use `to` instead of `cast` if you are worried about losing bits. Which should be just about never. C developers get through life just fine, and lose bits left and right.
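
A minimal sketch of the difference (assuming `std.conv`; the values are made up):

```d
void main()
{
    import std.conv : to, ConvOverflowException;
    import std.stdio : writeln;

    long big = 5_000_000_000;      // does not fit in an int

    int a = cast(int) big;         // silently truncates to 705032704
    writeln(a);

    try
    {
        int b = big.to!int;        // checks the value instead of truncating
        writeln(b);
    }
    catch (ConvOverflowException e)
    {
        writeln("to!int refused the lossy conversion");
    }
}
```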

-Steve

May 21

On Wednesday, 21 May 2025 at 04:44:35 UTC, Walter Bright wrote:

> This is a common issue. Unfortunately, nobody has come up with a solution to this in the last 45 years. Since every combination of signed and unsigned has well-defined behavior, prohibiting one of those behaviors is going to break a lot of code. Changing the conversion rules will break a lot of existing behavior. There's no way around it.

None of this has prevented you from supporting safe by default. Heck, I thought the whole purpose behind your work on the borrow checker was to break code.

May 21
Some good arguments. But it comes down to a value judgement between competing interests.

The `if (cond);` has no useful purpose, and so making it illegal is a no-brainer. I'm surprised that the C/C++ Standard still hasn't at least deprecated it.

An unsigned type is of less utility in a language like Java that is not a systems programming language (trying to make a memory allocator with no unsigned types is going to wind up pretty ugly). But, people still fake unsigned types and use them.

I, too, have found that the prohibition of `if (a = b)` has been of benefit, with pretty much zero downside. Such a construction, even if intended, will always look suspicious, so professional code should eschew it.

While you are correct that `(messages.length - i)` will never wrap, any subtraction in unsigned math has the potential for wrapping, as you wrote; here it is the `600 -` that's doing it. I've been programming for a very long time, and reflexively double-check any subtractions (particularly in for-loops that count down to zero), and any `<` `<=` `>` `>=` comparisons. They're all potential signed/unsigned bugs.
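
One of those count-down cases, as a sketch (the function is made up): with an unsigned index, `i >= 0` is always true, so the loop wraps past zero and the next access fails the bounds check.

```d
import std.stdio : writeln;

void printBackwards(const int[] arr)
{
    // arr.length - 1 itself wraps if arr is empty; i >= 0 never becomes false.
    for (size_t i = arr.length - 1; i >= 0; i--)
        writeln(arr[i]);
}

void main()
{
    printBackwards([3, 2, 1]);   // prints 1, 2, 3, then throws RangeError
}
```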

There are a lot more signed/unsigned interactions than just subtraction. Addressing one raises the question of what to do about the others. Do we really want to flag a conversion from unsigned to floating point?

Back in 1989 when the first C standard was being developed, half the existing compilers used "sign preserving" integral conversions, while the other half used "value preserving". Couldn't have both, one set of compilers was going to lose. But never on the table was getting rid of mixed signed/unsigned expressions. They were too valuable. (Value preserving was selected.)
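
A small D illustration of what "value preserving" means in practice, since D adopted the same rule (values are made up): the small unsigned type promotes to `int` and the expression can go negative, whereas under the unsigned-preserving rule it would have wrapped the way the `uint` case does.

```d
void main()
{
    import std.stdio : writeln;

    ushort us = 1;
    auto r = us - 2;                       // ushort promotes to int (value preserving)
    writeln(typeof(r).stringof, " ", r);   // prints: int -1

    uint u = 1;
    auto s = u - 2;                        // no wider signed promotion: stays unsigned
    writeln(typeof(s).stringof, " ", s);   // prints: uint 4294967295
}
```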

Another factor is that I've used languages that required explicit casting for mixed signed/unsigned expressions. I understood the reasoning behind it. But it was just unpleasant and the code looked ugly. Just not worth it.

P.S. This is one of those perennial topics that regularly comes up.
May 21
On Wednesday, 21 May 2025 at 18:19:19 UTC, Walter Bright wrote:
> (trying to make a memory allocator with no unsigned types is going to wind up pretty ugly).
>

Not really. You can just follow the instructions here to emulate unsigned in a principled way: https://www.nayuki.io/page/unsigned-int-considered-harmful-for-java
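
One standard way to emulate an unsigned comparison on a signed-only type is to flip the sign bits first (equivalently, offset by the minimum value); a quick D check of that identity (the helper name is made up):

```d
// Flipping the sign bit maps int.min..int.max onto 0..uint.max in order,
// so a signed comparison of the flipped values matches an unsigned compare.
bool unsignedLess(int a, int b)
{
    return (a ^ int.min) < (b ^ int.min);
}

void main()
{
    foreach (a; [int.min, -1, 0, 1, int.max])
        foreach (b; [int.min, -1, 0, 1, int.max])
            assert(unsignedLess(a, b) == (cast(uint) a < cast(uint) b));
}
```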

It's not much harder than your rules that effectively ban the minus operator for unsigned arithmetic.
May 21
On Wednesday, 21 May 2025 at 18:19:19 UTC, Walter Bright wrote:
>
> Back in 1989 when the first C standard was being developed, half the existing compilers used "sign preserving" integral conversions, while the other half used "value preserving". Couldn't have both, one set of compilers was going to lose. But never on the table was getting rid of mixed signed/unsigned expressions. They were too valuable. (Value preserving was selected.)

It has been a long time since I last used it, but I vaguely recall that
sign preserving was what K&R specified, and what the Bell Labs / AT&T compilers did.

Hence this was actually a change by the ANSI committee.

(Maybe the then existing value preserving compilers did so for a good reason)

DF
May 21
On Wednesday, 21 May 2025 at 19:19:37 UTC, Derek Fawcus wrote:
> It has been a long time since I last used it, but I vaguely recall that
> sign preserving was what K&R specified, and what the Bell Labs / AT&T compilers did.
>
> Hence this was actually a change by the ANSI committee.
>
> (Maybe the then existing value preserving compilers did so for a good reason)

Yup: https://home.nvg.org/~skars/programming/C_Rationale.pdf

(pages 40 and 41)

"After much discussion, the Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving."