March 27, 2021
On Saturday, 27 March 2021 at 21:02:39 UTC, jmh530 wrote:
> Are you familiar with how Zig handles overflow [1]? They error on overflow by default, but they have additional functions and operators to handle when you want to do wraparound.

Thanks for the link; I hadn't seen Zig's take before. It agrees with my conclusions from developing checkedint: assume the user wants normal integer math by default, signal an error somehow when it fails, and wrap overflow only when this is explicitly requested.

It's not just about reliability vs. performance; it's also about making the intended semantics of the code clear:

0) Is overflow wrapped on purpose?
1) Did the programmer somehow prove that overflow cannot occur for all valid inputs?
2) Was the programmer desperate enough for speed to knowingly write incorrect code?
3) Was the programmer simply ignorant or forgetful of this problem?
4) Did the programmer willfully ignore overflow because it is "not the cause of enough problems to be that concerning"?

Most code written in C/D/etc. leaves the answers to these questions a mystery for the reader to puzzle out. In contrast, code written using a system like Zig's is far less likely to confuse or mislead the reader.
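For a rough sketch of what that explicitness can look like in today's D (a hypothetical snippet; `checked` is std.experimental.checkedint's convenience constructor, whose default hook aborts on overflow):

    import std.experimental.checkedint : checked;

    void demo()
    {
        auto total = checked(int.max); // default hook: abort the program on overflow
        // total += 1;                 // would be reported, not silently wrapped

        int wrapped = int.max;
        wrapped += 1;                  // wraps to int.min; nothing marks this as intentional
    }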

> Nevertheless, I agree that the ship has sailed for D2 on this.

Yes.

> [1] https://ziglang.org/documentation/master/#Integer-Overflow

March 29, 2021
On Saturday, 27 March 2021 at 03:25:04 UTC, Walter Bright wrote:

> 4. fast integer arithmetic is fundamental to fast code, not a mere micro-optimization. Who wants an overflow check on every pointer increment?

The point is that most CPUs already perform the overflow check, regardless of whether the language does anything with the result.
Unfortunately, such CPUs don't raise an interrupt on overflow, so we end up checking twice.
The best option is of course for the language to provide safe arithmetic, but that requires full semantic support in the type system.
What about providing two operators for integer arithmetic instead, one safe and one unsafe?
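As a stopgap, druntime's core.checkedint already exposes the CPU's overflow flag to library code; a minimal sketch (the wrapper name `addChecked` is mine, purely for illustration):

    import core.checkedint : adds;

    /// Returns a + b; sets `overflow` if the result wrapped.
    int addChecked(int a, int b, ref bool overflow)
    {
        // Compilers can treat adds() as an intrinsic, testing the CPU's
        // existing overflow flag rather than re-deriving the condition.
        return adds(a, b, overflow);
    }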
March 29, 2021
On 3/24/21 10:57 AM, Q. Schroll wrote:
> On Wednesday, 24 March 2021 at 11:20:52 UTC, Berni44 wrote:
>> On Tuesday, 23 March 2021 at 21:22:18 UTC, Walter Bright wrote:
>>> It's been there long enough.
>>
>> Isn't that by now true for everything in std.experimental? I ask because I've got the feeling that std.experimental doesn't work as expected. To me it looks more or less like an attic where stuff is put and then forgotten. Maybe the approach we used for sumtype is the better one...
> 
> I have no idea why std.experimental is a thing to begin with. It sounds like a bad idea, and it turned out to be one. Moving stuff around in a standard library isn't without disadvantages: either the public import stays behind as a historic artifact or a deprecation is needed, both of which should be avoided. There are cases where it's fine, like splitting a module into a package.

It's there because we wanted a place for new parts of Phobos to develop without becoming set in stone. The reason it's called "std.experimental" is to disclose explicitly that it is meant to be experimental, subject to breaking changes. Otherwise you get things like javax.

In practice, it turned out to be less helpful than originally planned, which is why we haven't put anything new in it for a long, long time. Take, for instance, std.experimental.allocator. At one point a fundamental design change happened (which is perfectly allowed). But of course code had come to depend on it, and that code was now broken. So stdx.allocator was born (see https://code.dlang.org/packages/stdx-allocator) to allow depending on specific versions of std.experimental.allocator without having to freeze yourself at a specific Phobos version.

It's important to note that std.experimental predates code.dlang.org, which I think is the better way to develop libraries that might eventually be included in Phobos (see for instance std.sumtype).

> 
> Can std.experimental packages be removed without deprecation?

In a word, yes. It's experimental; anything is possible. I would recommend we either deprecate and remove everything in it, turning those modules into dub packages, or promote them to full-fledged Phobos packages.

-Steve
March 29, 2021
On 3/27/21 3:42 AM, tsbockman wrote:
> With good inlining and optimization, even a library solution generally slows integer math code down by less than a factor of two. (I expect a language solution could do even better.)
> 
> This is significant, but nowhere near big enough to move the bottleneck in most code away from I/O, memory, floating-point, or integer math for which wrapping is semantically correct (like hashing or encryption). In those cases where integer math code really is the bottleneck, there are often just a few hot spots where the automatic checks in some inner loop need to be replaced with manual checks outside the loop.

This claim seems speculative. A factor of two for a fundamental class of operations is very large, not just "significant". We're talking about e.g. 1 cycle for addition, and it was a big deal when it was introduced back in the early 2000s. Checked code is larger, meaning more pressure on the scarce I-cache in large programs - and that's not going to be visible in microbenchmarks. And "I/O is slow anyway" is exactly what drove the development of C++ catastrophically slow iostreams.
March 29, 2021
On Monday, 29 March 2021 at 14:47:15 UTC, Steven Schveighoffer wrote:
> It's important to note that std.experimental predates code.dlang.org, which I think is the better way to develop libraries that might eventually be included in Phobos (see for instance std.sumtype).

I was intrigued and dug a bit into the forum history:
- the idea for std.experimental was floated in 2011 (!)
- debates about its merits vs. popularity on code.dlang.org happened in 2014
- the first module accepted was std.experimental.logger, in 2015, after an unusually long review time (and after being a DUB package for a while)
- followed by std.experimental.allocator in 2015
- std.experimental.checkedint was added in Apr 2017, at the same time std.experimental.ndslice was removed from Phobos (its development continues on DUB)
- the stdx-allocator DUB package was created in Nov 2017; today it has a 4.2 score on DUB

March 29, 2021
On Monday, 29 March 2021 at 16:41:12 UTC, Andrei Alexandrescu wrote:
> Checked code is larger, meaning more pressure on the scarce
> I-cache in large programs - and that's not going to be visible
> in microbenchmarks.

This is true. But, at the moment I don't have an easy way to quantify the size of that effect.

> And "I/O is slow anyway" is exactly what drove the development of C++ catastrophically slow iostreams.

That's really not what I said, though. What I actually said is:

0) The performance of hot code is usually limited by something other than semantically non-wrapping integer arithmetic.
1) When non-wrapping integer arithmetic is the bottleneck, the compiler should usually be able to optimize away most of the cost of checking for overflow.
2) When the compiler cannot optimize away most of the cost, programmers can usually do so manually.
3) Programmers could still disable the checks entirely wherever they consider the performance gain worth the damage done to correctness/reliability.
4) Outside of hot code, the cost isn't significant.

You're picking on (0), but the validity of my claim (that checked arithmetic by default wouldn't hurt performance much) depends mainly upon the truth of (4), plus either the truth of (1) or the willingness and ability of programmers to take advantage of (2) and (3).
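As an illustration of (2), here is a hypothetical hot loop where one check up front replaces a per-iteration check (a sketch, not a benchmark):

    ulong sumBytes(const(ubyte)[] data)
    {
        // The largest possible total is ubyte.max * data.length, so a single
        // bound check outside the loop covers every iteration.
        assert(data.length <= ulong.max / ubyte.max);

        ulong total = 0;
        foreach (b; data)
            total += b;   // cannot overflow, given the bound above
        return total;
    }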
March 29, 2021
On 3/29/2021 9:41 AM, Andrei Alexandrescu wrote:
> On 3/27/21 3:42 AM, tsbockman wrote:
>> With good inlining and optimization, even a library solution generally slows integer math code down by less than a factor of two. (I expect a language solution could do even better.)
>>
>> This is significant, but nowhere near big enough to move the bottleneck in most code away from I/O, memory, floating-point, or integer math for which wrapping is semantically correct (like hashing or encryption). In those cases where integer math code really is the bottleneck, there are often just a few hot spots where the automatic checks in some inner loop need to be replaced with manual checks outside the loop.
> 
> This claim seems speculative. A factor of two for a fundamental class of operations is very large, not just "significant". We're talking about e.g. 1 cycle for addition, and it was a big deal when it was introduced back in the early 2000s. Checked code is larger, meaning more pressure on the scarce I-cache in large programs - and that's not going to be visible in microbenchmarks. And "I/O is slow anyway" is exactly what drove the development of C++ catastrophically slow iostreams.

With the LEA instruction, which can do adds and some multiplies in one operation, this calculation often comes at zero cost, as it uses the address-calculation logic that runs in parallel.

LEA does not set any flags or include any overflow detection logic.

Just removing that optimization will result in significant slowdowns.
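For illustration, the kind of address arithmetic that optimizing x86-64 compilers typically fold into a single LEA (a hypothetical snippet; exact codegen depends on compiler and flags):

    size_t elementAddress(size_t base, size_t i)
    {
        // base + index*scale + displacement: commonly emitted as a single LEA,
        // which sets no flags and therefore cannot report overflow.
        return base + i * 8 + 24;
    }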

Yes, bugs happen because of overflows. The worst consequence of this is memory corruption bugs in the form of undersized allocations and subsequent buffer overflows (from malloc(numElems * sizeElem)). But D's buffer overflow protection features mitigate this.
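A minimal sketch of that allocation-size guard, using druntime's core.checkedint.mulu (the function name `allocArray` is hypothetical):

    import core.checkedint : mulu;
    import core.stdc.stdlib : malloc;

    void* allocArray(size_t numElems, size_t sizeElem)
    {
        bool overflow = false;
        const size_t bytes = mulu(numElems, sizeElem, overflow);
        if (overflow)
            return null; // the multiply wrapped; a plain malloc(numElems * sizeElem)
                         // would have silently returned an undersized block
        return malloc(bytes);
    }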

D's integral promotion rules (bytes and shorts are promoted to ints before doing arithmetic) get rid of the bulk of likely overflows. (It's ironic that the integral promotion rules are much maligned and considered a mistake; I don't share that opinion, and this is one of the reasons why.)
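A small example of the effect:

    void promotion()
    {
        byte a = 100, b = 100;
        int sum = a + b;    // operands are promoted to int before the add
        assert(sum == 200); // no byte-sized wraparound to -56
    }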

In my experience, there are very few places in real code where overflow is a possibility. They usually come in the form of unexpected input, such as overly large files, or specially crafted malicious input. I've inserted checks in DMD's implementation where overflow is a risk.

Placing the burden of checks everywhere is a poor tradeoff.

It isn't even clear what the behavior on overflows should be. Error? Wraparound? Saturation? std.experimental.checkedint enables the user to make this decision on a case-by-case basis. The language properly defaults to the simplest and fastest choice - wraparound.
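For instance, something along these lines (assuming the Throw and Saturate hooks behave as documented in std.experimental.checkedint):

    import std.experimental.checkedint : checked, Saturate, Throw;

    void policies()
    {
        auto sat = checked!Saturate(int.max);
        sat += 1;                      // clamps at int.max instead of wrapping

        auto strict = checked!Throw(int.max);
        // strict += 1;                // would throw instead of wrapping
    }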

BTW, Rust does have optional overflow protection; it's turned off for release builds. This is pretty good evidence that the performance cost of such checks is not worth it. It also does not do integral promotion, so Rust code is far more vulnerable to overflows.
March 29, 2021
On Monday, 29 March 2021 at 20:00:03 UTC, Walter Bright wrote:
> D's integral promotion rules (bytes and shorts are promoted to ints before doing arithmetic) get rid of the bulk of likely overflows. (It's ironic that the integral promotion rules are much maligned and considered a mistake, I don't share that opinion, and this is one of the reasons why.)

Well...sometimes they do:

    import std.stdio : writeln;

    auto result = int.max + int.max;
    writeln(typeof(result).stringof); // int
    writeln(result); // -2

The main issue with D's integer promotion rules is that they're inconsistent. Sometimes truncating the result of an expression requires an explicit cast, and sometimes it doesn't.
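A small example of that inconsistency (behavior as of current DMD, to the best of my knowledge):

    void inconsistent()
    {
        byte a = 1, b = 2;

        // byte c = a + b;          // error: cannot implicitly convert int to byte
        byte c = cast(byte)(a + b); // an explicit cast is required here...

        a += b;                     // ...yet the equivalent truncation is accepted here
    }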
March 29, 2021
On Monday, 29 March 2021 at 20:00:03 UTC, Walter Bright wrote:
> It isn't even clear what the behavior on overflows should be. Error? Wraparound? Saturation?

It only seems unclear because you have accepted the idea that computer code "integer" operations may differ from mathematical integer operations in arbitrary ways. Otherwise, the algorithm is simple:

    if(floor(mathResult) <= codeResult && codeResult <= ceil(mathResult))
        return codeResult;
    else
        signalErrorSomehow();

Standard mathematical integer addition does not wrap around or saturate. When someone really wants an operation that wraps around or saturates (not just for speed's sake), then that is a different operation and should use a different name and/or type(s), to avoid sowing confusion and ambiguity throughout the codebase for readers and compilers.

All of the integer behavior that people complain about violates this in some way: wrapping overflow, incorrect signed-unsigned comparisons, confusing/inconsistent implicit conversion rules, undefined behavior of various more obscure operations for certain inputs, etc.

Mathematical integers are a more familiar, simpler, easier to reason about abstraction. When we use this abstraction, we can draw upon our understanding and intuition from our school days, use common mathematical laws and formulas with confidence, etc. Of course the behavior of the computer cannot fully match this infinite abstraction, but it could at least tell us when it is unable to do what was asked of it, instead of just silently doing something else.
March 29, 2021
On Mon, Mar 29, 2021 at 10:47:49PM +0000, tsbockman via Digitalmars-d wrote:
> On Monday, 29 March 2021 at 20:00:03 UTC, Walter Bright wrote:
> > It isn't even clear what the behavior on overflows should be. Error? Wraparound? Saturation?
> 
> It only seems unclear because you have accepted the idea that computer code "integer" operations may differ from mathematical integer operations in arbitrary ways.

The only thing at fault here is the name "integer". `int` in D is defined to be a 32-bit machine word. The very specification of "32-bit" already implies modulo 2^32. Meaning, this is arithmetic modulo 2^32; it is NOT a mathematical infinite-capacity integer. Ditto for the other built-in integral types. When you typed `int` you already signed up for all of the "unintuitive" behaviour that has been the standard behaviour of built-in machine words since the 70's and 80's. They *approximate* mathematical integers, but they are certainly NOT the same thing as mathematical integers, and this is *by definition*.

If you want mathematical integers, you should be using std.bigint or something similar instead.
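A minimal example with std.bigint:

    import std.bigint : BigInt;

    void big()
    {
        auto x = BigInt(long.max);
        x += 1;                                     // no wraparound
        assert(x == BigInt("9223372036854775808")); // one past long.max
    }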


> Otherwise, the algorithm is simple:
> 
>     if(floor(mathResult) <= codeResult && codeResult <= ceil(mathResult))
>         return codeResult;
>     else
>         signalErrorSomehow();

Implementing such a scheme would introduce so much overhead that it would render the `int` type essentially useless for systems programming. Or for any application where performance is important, for that matter.


> Standard mathematical integer addition does not wrap around or saturate.  When someone really wants an operation that wraps around or saturates (not just for speed's sake), then that is a different operation and should use a different name and/or type(s), to avoid sowing confusion and ambiguity throughout the codebase for readers and compilers.

The meaning of +, -, *, /, % for built-in machine words has been the one in modulo 2^n arithmetic since the early days when computers were first invented.  This isn't going to change anytime soon in a systems language.  It doesn't matter what you call them; if you don't like the use of the symbols +, -, *, / for anything other than "standard mathematical integers", make your own language and call them something else. But they are the foundational hardware-supported operations upon which more complex abstractions are built; without them, you wouldn't even be capable of arithmetic in the first place.

It's unrealistic to impose pure mathematical definitions on limited-precision hardware numbers.  Sooner or later, any programmer must come to grips with what's actually implemented in hardware, not what he imagines some ideal utopian hardware would implement.  It's like people complaining that IEEE floats are "buggy" or otherwise behave in strange ways.  That's because they're NOT mathematical real numbers. But they *are* a useful approximation of mathematical real numbers -- if used correctly.  That requires learning to work with what's implemented in the hardware rather than imposing mathematical ideals on an abstraction that requires laborious (i.e., inefficient) translations to fit the ugly hardware reality.

If you don't like the "oddness" of hardware-implemented types, there's always the option of using std.bigint, or software like Mathematica or similar that frees you from needing to worry about the ugly realities of the hardware. Just don't expect the same kind of performance you will get by using the hardware types directly.


> All of the integer behavior that people complain about violates this in some way: wrapping overflow, incorrect signed-unsigned comparisons, confusing/inconsistent implicit conversion rules, undefined behavior of various more obscure operations for certain inputs, etc.
> 
> Mathematical integers are a more familiar, simpler, easier to reason about abstraction. When we use this abstraction, we can draw upon our understanding and intuition from our school days, use common mathematical laws and formulas with confidence, etc. Of course the behavior of the computer cannot fully match this infinite abstraction, but it could at least tell us when it is unable to do what was asked of it, instead of just silently doing something else.

It's easy to invent idealized abstractions that are easy to reason about, but which require unnatural contortions to implement efficiently in hardware.  A programming language like D that claims to be a systems programming language needs to be able to program the hardware directly, not to impose some ideal abstractions that do not translate nicely to hardware and that therefore require a lot of complexity on the part of the compiler to implement, and on top of that incurs poor runtime performance.

To quote Knuth:

    People who are more than casually interested in computers should
    have at least some idea of what the underlying hardware is like.
    Otherwise the programs they write will be pretty weird.
    -- D. Knuth

Again, if you expect mathematical integers, use std.bigint. Or MathCAD or similar. The integral types defined in D are raw hardware types of fixed bit length -- which by definition operate according to modulo 2^n arithmetic. The "peculiarities" of the hardware types are inevitable, and I seriously doubt this is going to change anytime in the foreseeable future.  By using `int` instead of `BigInt`, the programmer has already implicitly accepted the "weird" hardware behaviour, and must be prepared to deal with the consequences.  Just as when you use `float` or `double` you already signed up for IEEE semantics, like it or not. (I don't, but I also recognize that it's unrealistic to expect the hardware type to match up 100% with the mathematical ideal.) If you don't like that, use one of the real arithmetic libraries out there that let you work with "true" mathematical reals that aren't subject to the quirks of IEEE floating-point numbers. Just don't expect anything that will be competitive performance-wise.

Like I said, the only real flaw here is the choice of the name `int` for a hardware type that's clearly NOT an unbounded mathematical integer. It's too late to rename it now, but basically it should be thought of as `intMod32bit` rather than `integerInTheMathematicalSense`. Once you mentally translate `int` into "32-bit 2's-complement binary word in a hardware register", everything else naturally follows.


T

-- 
They pretend to pay us, and we pretend to work. -- Russian saying