Checked integer type API design poll

Sep 17, 2015

tsbockman

I have written some poll questions concerning the design trade-offs involved in making a good `SafeInt`/`CheckedInt` type. They are about the actual semantics of the API, not the internals, nor bike-shedding about names.

(`SafeInt` and `CheckedInt` are integer data types which use `core.checkedint` to guard against overflow, divide-by-zero, etc. Links to current work-in-progress versions by:

1) Robert Schadek (burner): https://github.com/D-Programming-Language/phobos/pull/3389
2) Myself (tsbockman): https://github.com/tsbockman/CheckedInt)

For the purposes of this poll please assume the following (based on my own extensive testing):

1) Code using checked operations will take about **1.5x longer to run** than unchecked code. (If you compile with GDC, anyway; DMD and LDC are another story...)
2) The main design decision with a significant runtime performance cost is whether to throw exceptions or not. With some optimization, the hit is modest, but noticeable.
3) Even if the API uses exceptions in some places, it can still be used in `nothrow @nogc` code, at the cost of some extra typing.

Two further points I would ask the reader to consider:

* A checked integer type is fundamentally semantically different from an unchecked type. The difference is of similar magnitude to that of floating-point vs fixed-point.
* It might be wise to read the entire poll before answering it - the questions are all related in some way.

The poll results are here, if you wish to preview the questions: http://polljunkie.com/poll/kzrije/checked-integer-type-behaviour/view

When you are ready, please take the poll yourself: http://polljunkie.com/poll/cytdbq/checked-integer-type-behaviour

Thanks for your time.
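For readers who have not used `core.checkedint` directly, here is a minimal sketch of the flag-based style it offers, which is what makes `nothrow @nogc` use possible. The helper name `mulAdd` is hypothetical and is not part of either library linked above; only `adds` and `muls` come from `core.checkedint`.

```d
import core.checkedint : adds, muls;

// Hypothetical helper: computes a * b + c and reports overflow through a
// flag instead of an exception, so it stays usable in nothrow @nogc code.
int mulAdd(int a, int b, int c, ref bool overflow) pure nothrow @nogc @safe
{
    // The core.checkedint functions set the flag on overflow and never
    // clear it, so a single "sticky" flag can cover a chain of operations.
    const product = muls(a, b, overflow);
    return adds(product, c, overflow);
}

void main()
{
    bool overflow = false;
    const ok = mulAdd(1000, 1000, 1, overflow);
    assert(!overflow && ok == 1_000_001);

    overflow = false;
    const wrapped = mulAdd(int.max, 2, 0, overflow);
    assert(overflow); // the multiplication overflowed; `wrapped` is not meaningful
}
```

The "extra typing" mentioned in point 3 is visible here: the caller has to declare, pass, and test the flag by hand, which is exactly the bookkeeping a checked type would hide.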

On Thursday, 17 September 2015 at 08:40:17 UTC, Kagamin wrote:
> As I understand, items with bigger average score are less popular?

For the ranking/sorting questions, yes. I think the number next to each item is the mean rank given to it by the group. A "1.0" would mean that people unanimously ranked that option as best.

I see that someone selected "All of that and more, not listed here" in response to "How many math functions need versions with checked integer support?" If anyone selects that option, please leave a comment explaining what other code you think needs explicit support for checked integer types.

September 18, 2015

Re: Checked integer type API design poll

Posted by tsbockman
in reply to tsbockman

This is in reply to the following comment which someone left on PollJunkie:

> I only need signed checked types and unsigned unchecked types.
> Regarding conversion from signed to unsigned, NaN should be mapped
> to T.max (thus neither garbage nor throwing an exception). And bitwise
> operators aren't very useful on signed types - whatever you want to do
> with them can be done on unsigned values and then casted to signed if
> that makes any sense at all.

First off, thanks to you and everyone else who took the time to fill out the poll. The feedback is helpful, and I will likely modify the design of `CheckedInt` in response.

For bitwise operators, it is clear from this poll, and the previous discussion (http://forum.dlang.org/thread/mfbsfkbkczrvtaqssbip@forum.dlang.org), that there is a fairly even split between people who are strongly in favour and people who are strongly opposed. We'd better just make them optional, or perhaps hidden; I will give some thought to the best way to do this.

With respect to mapping NaN to T.max (or T.min for signed types), this general idea has been part of Robert Schadek's work from the beginning; my version makes some use of it as well.

T.max (unsigned) and T.min (signed) are both good candidates for a sentinel value; the question is, is a NaN sentinel value really the right solution for the public API?

Pros:

* As an internal storage format for a `CheckedInt` type, the use of a sentinel value to represent NaN is very memory efficient, which is good for arrays and such. This will almost certainly be a part of the final design.

* A value like T.max would tend to stand out to an experienced programmer during debugging.

Cons:

* All it takes is a single unchecked arithmetic operation - say, `++` - and suddenly the sentinel value is gone, turned into garbage which may be very hard to distinguish from legitimate data at a glance.

* Sentinel values are not type-safe. In general, there is no way for an algorithm which accepts an unchecked `uint` as an input to tell if `uint.max` means "NaN" or "4294967295". An algorithm which needs to be NaN-aware should use a checked type, instead.
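Both cons can be reproduced with plain `uint` arithmetic. The following is only an illustration: it assumes `uint.max` is the value a failed cast would return, and the variable names are made up for the example.

```d
void main()
{
    // Assumption for illustration: uint.max is the NaN sentinel.
    enum uint nanSentinel = uint.max;

    // What a failed cast to an unchecked uint would hand back.
    uint failedCast = nanSentinel;

    // A single unchecked operation destroys the sentinel:
    // uint.max + 1 wraps around to 0, which looks like ordinary data.
    uint afterIncrement = failedCast;
    ++afterIncrement;
    assert(afterIncrement == 0);
    assert(afterIncrement != nanSentinel);

    // A legitimate value is bit-for-bit identical to the sentinel, so code
    // receiving a plain uint cannot tell the two cases apart.
    uint legitimate = 4_294_967_295U;
    assert(legitimate == nanSentinel);
}
```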

Of course, if you always immediately manually check the result of casts against the sentinel value, you would be OK. But then, why not maintain type safety by just manually checking `CheckedInt.isNaN` before casting?
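A rough sketch of the "check `isNaN` before casting" pattern follows. The `Checked` struct is a hypothetical, heavily simplified stand-in written for this example; it is not the API of `CheckedInt` or of the Phobos pull request, and it assumes `int.min` is reserved as the NaN sentinel (so `int.min` itself cannot be stored as a value).

```d
import core.checkedint : adds;

// Hypothetical, simplified checked integer; int.min is reserved as NaN.
struct Checked
{
    private int payload;
    private enum int sentinel = int.min;

    this(int value) pure nothrow @nogc @safe { payload = value; }

    bool isNaN() const pure nothrow @nogc @safe { return payload == sentinel; }

    // Addition propagates NaN and turns overflow into NaN.
    Checked opBinary(string op : "+")(Checked rhs) const pure nothrow @nogc @safe
    {
        Checked result;
        if (isNaN || rhs.isNaN)
        {
            result.payload = sentinel;
            return result;
        }
        bool overflow = false;
        const sum = adds(payload, rhs.payload, overflow);
        result.payload = overflow ? sentinel : sum;
        return result;
    }

    // Raw access; the caller is expected to have tested isNaN first.
    int raw() const pure nothrow @nogc @safe { return payload; }
}

void main()
{
    const sum = Checked(int.max) + Checked(1);
    assert(sum.isNaN);

    // Decide what NaN means *before* leaving the checked type.
    const int plain = sum.isNaN ? 0 : sum.raw;
    assert(plain == 0);

    const fine = Checked(2) + Checked(3);
    assert(!fine.isNaN && fine.raw == 5);
}
```

The check happens while the value still has a type that knows about NaN, so nothing downstream ever has to guess whether an unchecked `int` is a sentinel.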

In light of all this, I believe that guaranteeing the return of a sentinel value from failed casts is not worth the trouble, versus just making it undefined behaviour (returning garbage). Either way, the difference in both safety and performance is trivial, so I won't argue about this point if many others disagree.

Throwing an exception, on the other hand, has a clear benefit in preventing silent bugs; albeit one that is perhaps not worth the moderate performance cost and mandatory GC use.

As I said in my first post, even if exceptions are a part of the final design, it will certainly still be possible to avoid them when desired. Moreover, exceptions only naturally come up when *mixing* checked and unchecked types; neither exceptions, nor frequent explicit `isNaN` checks are required as long as code is *consistent* about using only checked integer types (or floating-point).

On the other hand, if we *don't* include exceptions in the API, then it becomes impossible to use the checked integer types in a way that is verified to be safe by the compiler - particularly if the silently-failing cast is made implicit.
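To make the trade-off concrete, here is a minimal sketch of the two conversion styles side by side. All three function names are hypothetical, and the payload is again assumed to use `int.min` as the NaN sentinel; this is an illustration of the idea, not either library's actual conversion API.

```d
// Hypothetical throwing conversion: a NaN can never leak out silently,
// but allocating the exception needs the GC.
int toIntOrThrow(int payload) pure @safe
{
    if (payload == int.min)
        throw new Exception("checked integer is NaN");
    return payload;
}

// Hypothetical nothrow @nogc alternative: more typing at the call site,
// and the caller is free to ignore the returned flag.
bool tryToInt(int payload, out int result) pure nothrow @nogc @safe
{
    if (payload == int.min)
        return false;
    result = payload;
    return true;
}

// With the throwing version, the compiler rejects a nothrow caller that
// forgets to handle the failure, so the handling is verified.
int toIntOrZero(int payload) nothrow @safe
{
    try
    {
        return toIntOrThrow(payload);
    }
    catch (Exception)
    {
        return 0;
    }
}

void main()
{
    bool threw = false;
    try
    {
        const v = toIntOrThrow(int.min);
    }
    catch (Exception e)
    {
        threw = true;
    }
    assert(threw);

    int value;
    assert(tryToInt(42, value) && value == 42);
    assert(!tryToInt(int.min, value));

    assert(toIntOrZero(int.min) == 0);
    assert(toIntOrZero(7) == 7);
}
```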
