January 29, 2022

On Saturday, 29 January 2022 at 01:06:45 UTC, Siarhei Siamashka wrote:

> Modern programming languages tend to have separate operators or intrinsics for wrapping and non-wrapping (trap on overflow) arithmetic operations.

Well, in C++ "unsigned int" is defined to have modular arithmetic. The problem in C++ isn't that it is modular, but that people use "unsigned int" as a value-range type rather than when they actually want modular arithmetic. This is a design influenced by classic machine-language instructions rather than by principled thinking about math. The problem would probably have been much smaller if the type had been named "modular int" and not "unsigned int"! Syntax matters. :-D

In Ada I believe this is done explicitly, which clearly is much better.

As for practical problems: In C++ you can still enable signed overflow checks. In D you cannot even do that, because in D all integer operations are defined to be modular.

Interestingly, the main complaint about modular arithmetic in C++ is not about correctness, however, but about poor optimizations. As a result, thoughtfully designed C++ libraries avoid using unsigned integers in interfaces.

Why does modular arithmetic perform worse than regular arithmetic? It is often much more difficult (in some cases impossible) to optimize computations that are mapped onto a circle than computations that are mapped onto a straight line!
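
To make this concrete, here is a hand-written sketch (not the output of any particular compiler) of the kind of transformation I mean: with ordinary value-range reasoning the compiler may fold the branch away and widen the counter to a 64-bit index, but with modular semantics the counter may wrap, so neither transformation is valid.

    // Illustration only: the comments describe what an optimizer may assume
    // under each semantics, not what any specific D compiler actually does.
    long sum(int[] a, int n)
    {
        long s = 0;
        for (int i = 0; i < n; i += 2)
        {
            // Non-wrapping ("straight line") view: i starts at 0 and only grows,
            // so `i >= 0` is always true, the check can be removed, and i can be
            // widened once to a 64-bit index.
            // Modular ("circle") view: i += 2 may wrap past int.max to a negative
            // value, so the check and the repeated sign extension have to stay.
            if (i >= 0)
                s += a[i];
        }
        return s;
    }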

This is something that D should fix!!

> I think that it's only a matter of time until processors start adding the missing instructions to make this fast.

No. It isn't crazy slow because of the check. It is crazy slow because it prevents optimizations in complex expressions.

In theory you could compute the overflow as a separate expression and do speculative computations, then switch to a slow path on overflow, but that would be more of a high level approach than a system level approach. In low level programming the programmer wants the code to map to machine language instructions without blowing up the code size in ways that are hard to predict. You want some transparency in how the code you write maps to the hardware instructions.
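
A rough sketch of that idea, using druntime's core.checkedint to collect the overflow flag on the side (the slow path itself is left as a placeholder):

    import core.checkedint : adds;

    // Speculative fast path: add with wrap-around semantics, but accumulate an
    // overflow flag, and only fall back to a slow path (arbitrary precision,
    // saturation, ...) if the flag fired.
    int sumOrFallback(const int[] xs)
    {
        bool overflow = false;
        int acc = 0;
        foreach (x; xs)
            acc = adds(acc, x, overflow);   // sets overflow to true, never resets it
        if (!overflow)
            return acc;                     // fast path: nothing overflowed
        assert(0, "slow path goes here");   // placeholder
    }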

To make overflow checks really fast you need a much more advanced type system with constraints, so that the compiler can know what values an integer picked up from the heap can have.
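
Something along these lines, as a very rough sketch; the Bounded type here is hypothetical, not an existing library, and a real design would also have to check that the combined range still fits the underlying integer:

    // Hypothetical range-constrained integer: the value range is part of the
    // type, so an addition's result range is known statically and needs no
    // runtime overflow check.
    struct Bounded(long lo, long hi)
    {
        long value;
        invariant { assert(lo <= value && value <= hi); }

        Bounded!(lo + lo2, hi + hi2) opBinary(string op : "+", long lo2, long hi2)
                                             (Bounded!(lo2, hi2) rhs) const
        {
            return typeof(return)(value + rhs.value);
        }
    }

    unittest
    {
        auto a = Bounded!(0, 100)(42);
        auto b = Bounded!(0, 10)(7);
        auto c = a + b;                                   // no runtime check needed
        static assert(is(typeof(c) == Bounded!(0, 110))); // range tracked by the type
        assert(c.value == 49);
    }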

January 29, 2022

On Saturday, 29 January 2022 at 10:39:10 UTC, Ola Fosheim Grøstad wrote:

> As for practical problems: In C++ you can still enable signed overflow checks. In D you cannot even do that, because in D all integer operations are defined to be modular.

Yes, this is a problem and C++ is doing a somewhat better job here.

Imagine an alternative reality, where processors supported efficient modular indexing in arrays and had special instructions for this (specialized DSP processors might have this even now in some form). So that a[i % a.length] has no performance disadvantages compared to a[i]. My guess is that the D language design in this alternative reality would define array indexing as a modular operation and consider the "memory safety" goal perfectly achieved (no out of bounds accesses, yay!). Ignoring the fact that landing on a wrong array index is a source of all kinds of hard-to-debug problems, leading to wrong computations or even security vulnerabilities.

> Interestingly, the main complaint about modular arithmetic in C++ is not about correctness, however, but about poor optimizations.

I'm not sure if I can fully agree with that. Correctness is a major concern in C++.

> As a result, thoughtfully designed C++ libraries avoid using unsigned integers in interfaces.

My understanding is that the primary source of unsigned types in applications is (or at least used to be) the size_t type, which exists because a memory buffer may technically span more than half of the address space, at least on a 32-bit system. API functions that process memory buffers have to deal with size_t, and this is a headache. But now with 64-bit processors this is less of an issue, and buffer sizes can become signed with no practical loss of functionality.

> Why does modular arithmetic perform worse than regular arithmetic? It is often much more difficult (in some cases impossible) to optimize computations that are mapped onto a circle than computations that are mapped onto a straight line!

Yes, things are a bit tricky here. Modular arithmetic has some really nice math properties: https://math.stackexchange.com/questions/27336/associativity-commutativity-and-distributivity-of-modulo-arithmetic

When doing modular arithmetic, an expression a + b - c can always be safely transformed into a - c + b. But if we are dealing with potential integer overflow traps, then reordering operations in an expression can't be done safely anymore (will a + b overflow and cause an exception? or will a - c underflow and cause an exception?).
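
A concrete example of the problem, sketched with druntime's core.checkedint (which reports overflow through a flag instead of trapping): with a = int.max and b = c = 1, a + b - c overflows in the middle, while a - c + b does not, even though both have the same mathematical value.

    import core.checkedint : adds, subs;

    void main()
    {
        int a = int.max, b = 1, c = 1;
        bool overflow;

        // a + b - c: the intermediate a + b wraps, so a trapping implementation
        // would raise an error here...
        overflow = false;
        int r1 = subs(adds(a, b, overflow), c, overflow);
        assert(overflow);                   // the first addition overflowed

        // ...while the reordered a - c + b never leaves the int range.
        overflow = false;
        int r2 = adds(subs(a, c, overflow), b, overflow);
        assert(!overflow && r2 == int.max);
    }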

> This is something that D should fix!!

Do you have a good suggestion?

> > I think that it's only a matter of time until processors start adding the missing instructions to make this fast.

> No. It isn't crazy slow because of the check. It is crazy slow because it prevents optimizations in complex expressions.

> In theory you could compute the overflow as a separate expression and do speculative computations, then switch to a slow path on overflow, but that would be more of a high level approach than a system level approach. In low level programming the programmer wants the code to map to machine language instructions without blowing up the code size in ways that are hard to predict. You want some transparency in how the code you write maps to the hardware instructions.

We lost this transparency a long time ago. Compilers are allowed to optimize out big parts of expressions. Integer divisions by a constant are replaced by multiplications and shifts, etc. Functions are inlined, loops are unrolled and/or vectorized.

> To make overflow checks really fast you need a much more advanced type system with constraints, so that the compiler can know what values an integer picked up from the heap can have.

Well, the reality is that this is not just a theoretical discussion anymore. Trapping of arithmetic overflows already exists in the existing programming languages. And programming languages will keep evolving to handle it even better in the future.

January 30, 2022

On Saturday, 29 January 2022 at 12:23:28 UTC, Siarhei Siamashka wrote:

> performance disadvantages compared to a[i]. My guess is that the D language design in this alternative reality would define array indexing as a modular operation and consider the "memory safety" goal perfectly achieved (no out of bounds accesses, yay!).

Yes, the focus on memory safety is too narrow and can be counterproductive.

> I'm not sure if I can fully agree with that. Correctness is a major concern in C++.

It is a concern, but the driving force for switching from unsigned int to signed int appears to be more about enabling optimization. At least that is my impression.

Another issue is that having multiple integer types can lead to multiple instances of templates, which is kinda pointless.

> My understanding is that the primary source of unsigned types in applications is (or at least used to be) the size_t type, which exists because a memory buffer may technically span more than half of the address space, at least on a 32-bit system.

Yes, sizeof returns size_t, which is unsigned. I think that has more to do with history than with practical programming. But there is no reason for modern containers to return unsigned (or rather, modular) ints. Well, other than being consistent with the STL, but I'm not sure why that would be important.

> > This is something that D should fix!!
>
> Do you have a good suggestion?

How many programs rely on signed wrap-around? Probably none. You could just make a breaking change and provide a compiler flag for getting the old behaviour, then another compiler flag for trapping on overflow.

> We lost this transparency a long time ago. Compilers are allowed to optimize out big parts of expressions. Integer divisions by a constant are replaced by multiplications and shifts, etc. Functions are inlined, loops are unrolled and/or vectorized.

But you can control most of those with hints in the source code or with compilation flags.

There is a big difference between optimizations that lead to faster code (or at least code with consistent performance) and code gen that leads to uneven performance. Sometimes hardware is bad at consistent performance too, like computations with float values near zero (denormal numbers), and that is very unpopular among realtime/audio-programmers. You don't want the compiler to add more such issues.

> Well, the reality is that this is not just a theoretical discussion anymore. Trapping of arithmetic overflows already exists in the existing programming languages. And programming languages will keep evolving to handle it even better in the future.

Yes, but trapping on overflow has always existed in languages geared towards higher-level programming. C and its descendants are the outliers.

It is true, though, that processor speed and branch prediction have made it more attractive also for those who aim at lower-level programming.

The best solution for a modern language is probably to:

  1. Improve the type system so that the compiler can more often prove that overflow can never happen for an expression. This can also lead to better optimizations.

  2. Make signed overflow checks the default, but provide an inline annotation to disable them (see the sketch below).
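
For point 2, opt-in checked integers already exist in Phobos (std.experimental.checkedint; API quoted from memory, so double-check the docs). The proposal above is essentially to flip the default:

    import std.experimental.checkedint : checked;

    void main()
    {
        auto x = checked(int.max);   // Checked!int with the default Abort hook
        auto y = x + 1;              // aborts at runtime instead of silently wrapping
    }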

I think in general that optimizing all code paths for performance is kinda pointless. Usually the performance-critical parts are limited to a smaller set of functions.

January 30, 2022

On Friday, 28 January 2022 at 18:39:54 UTC, deadalnix wrote:

> On Friday, 28 January 2022 at 02:15:51 UTC, Paul Backus wrote:
>
> > https://shafik.github.io/c++/2021/12/30/usual_arithmetic_confusions.html
>
> For what it is worth, the first example is fixed in D.
>
> The multiplication one is particularly interesting. The other ones are really banal wrap-around behavior, which you really can't do without if you want to make anything fast, IMO.
>
> I'm surprised there aren't more examples of signed -> unsigned conversion, because that one is a real mind fuck.

I wholeheartedly agree with the latter notion. The dreaded signed-to-unsigned conversion has definitely bitten me more than once in D. And not only conversion: writing something innocent like auto var = arr.length, because auto types are generally more geared towards later code changes, then doing arithmetic with var... one subtraction, and boom! You get an overflow all of a sudden.
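
For instance, something like this (just an illustration):

    import std.stdio;

    void main()
    {
        int[] arr;                            // empty array
        auto var = arr.length;                // var is size_t, i.e. unsigned
        writeln(var - 1);                     // 18446744073709551615 on 64-bit, not -1
        writeln(cast(long) arr.length - 1);   // -1, as intended
    }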

At least in 64-bit programs, I don't really see the benefit of the sizes being unsigned anymore. Even C++ has cont.ssize() now for signed size.

Ivan Kazmenko.

January 30, 2022

On Saturday, 29 January 2022 at 12:23:28 UTC, Siarhei Siamashka wrote:

> So that a[i % a.length] has no performance disadvantages compared to a[i]. My guess is that the D language design in this alternative reality would define array indexing as a modular operation and consider the "memory safety" goal perfectly achieved (no out of bounds accesses, yay!). Ignoring the fact that landing on a wrong array index is a source of all kinds of hard-to-debug problems, leading to wrong computations or even security vulnerabilities.

I doubt that. There are many things in the design of D and Phobos that prevent bugs that aren't related to memory safety, sometimes even at the expense of efficiency. For example, integers are initialized by default even though that is not necessary for @safe, and in certain cases the optimizer cannot remove an unnecessary initialization, which makes the code slower. You override the default when necessary. As with bounds checks on by default, this is the best design.

January 31, 2022
On Saturday, 29 January 2022 at 10:39:10 UTC, Ola Fosheim Grøstad wrote:
> As for practical problems: In C++ you can still enable signed overflow checks. In D you cannot even do that, because in D all integer operations are defined to be modular.
>
> Interestingly, the main complaint about modular arithmetic in C++ is not about correctness, however, but about poor optimizations.

I don't buy this, even for a moment, and challenge anyone to demonstrate a significant (say, >5%) change in overall performance for a non-trivial C or C++ program from using -fwrapv.
January 31, 2022

On Monday, 31 January 2022 at 05:06:00 UTC, Elronnd wrote:

> I don't buy this, even for a moment, and challenge anyone to demonstrate a significant (say, >5%) change in overall performance for a non-trivial C or C++ program from using -fwrapv.

In high-level template code you get conditionals that are always false; if you don't remove those, then the code size will increase. A good optimizer encourages you to write more high-level code.

January 31, 2022
On Monday, 31 January 2022 at 06:36:37 UTC, Ola Fosheim Grøstad wrote:
> In high-level template code you get conditionals that are always false; if you don't remove those, then the code size will increase. A good optimizer encourages you to write more high-level code.

I have no doubt it comes up _at all_. What I am saying is that I do not believe it has an _appreciable_ effect on any real software.
January 31, 2022

On Monday, 31 January 2022 at 07:33:00 UTC, Elronnd wrote:

> I have no doubt it comes up at all. What I am saying is that I do not believe it has an appreciable effect on any real software.

Not if you work around it, but ask yourself: is it a good idea to design your language in such a way that the compiler is unable to remove this:

if (x < x + 1) { … }

Probably not.

January 31, 2022

On Sunday, 30 January 2022 at 17:22:18 UTC, Ivan Kazmenko wrote:

> At least in 64-bit programs, I don't really see the benefit of the sizes being unsigned anymore. Even C++ has cont.ssize() now for signed size.

In C++20 they have provided a function std::ssize(container) that returns a signed integer type. That way you can write templates that assume regular arithmetic and that will also work with old containers. Of course, C++ is a language where you need to learn idioms to program well… which makes it a challenging language to pick up. Smaller languages would be better off with breaking changes, I think.

(I don't think they have changed the existing containers?)
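
For comparison, a trivial D helper in the same spirit could look like this (the name ssize here is just for illustration, it is not an existing Phobos function):

    // Return the length as a signed integer so that ordinary arithmetic applies.
    ptrdiff_t ssize(T)(const T[] arr) @safe pure nothrow @nogc
    {
        return cast(ptrdiff_t) arr.length;
    }

    unittest
    {
        int[] a;
        assert(a.ssize - 1 == -1);   // no wrap-around surprise
    }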