Jump to page: 1 2
Thread overview
Contracts, Undefined Behavior, and Defensive,Programming
Jun 13, 2020
Walter Bright
Jun 13, 2020
Johannes Pfau
Jun 13, 2020
Jesse Phillips
Jun 13, 2020
Johannes Pfau
Jun 14, 2020
Jesse Phillips
Jun 14, 2020
Timon Gehr
Jun 13, 2020
Stanislav Blinov
Jun 13, 2020
Johannes Pfau
Jun 15, 2020
Dukc
Jun 15, 2020
Paul Backus
Jun 15, 2020
Paul Backus
Jun 15, 2020
Timon Gehr
Jun 15, 2020
Dukc
Jun 17, 2020
German Diago
Jun 14, 2020
Timon Gehr
Jun 14, 2020
Mark
Jun 14, 2020
Timon Gehr
Jun 14, 2020
Timon Gehr
Jun 18, 2020
Kagamin
June 12, 2020
A short and sweet paper:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf

It echoes Walter's opinions on contracts vs. runtime checking, to a tee. He has been derided on occasion in this group for such views. The paper is a good rationale from an independent source.
June 12, 2020
On 6/12/2020 8:46 PM, Andrei Alexandrescu wrote:
> A short and sweet paper:
> 
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf
> 
> It echoes Walter's opinions on contracts vs. runtime checking, to a tee. He has been derided on occasion in this group for such views. The paper is a good rationale from an independent source.

The authors present the case better than I did :-)
June 13, 2020
Am Fri, 12 Jun 2020 23:46:38 -0400 schrieb Andrei Alexandrescu:

> A short and sweet paper:
> 
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf
> 
> It echoes Walter's opinions on contracts vs. runtime checking, to a tee. He has been derided on occasion in this group for such views. The paper is a good rationale from an independent source.

A good article. Although there have sometimes been discussions about it, I think the difference between input validation and contract checking is now well understood by most people participating in discussions here.


However, this article also gives some vocabulary which might allow us to discuss the assert/assume conflict, where there is no uniform position in the community, in a clearer and more precise way. Consider this example:

------------------------------------------------------------------------
/**
 * returns (1 + 2 + ... + n) / n
 */
double sumDiv(ubyte n) @safe
{
    double accum = 0;

    for (size_t i = 0; i <= n; i++)
    {
        accum += i;
        if (i == n)
            return accum / i;
    }
    // DMD requires this to compile the code, cause it does not detect
    // the above loop covers all possible inputs.
    // GDC and LDC optimizers remove this assert.
    assert(0);
}
------------------------------------------------------------------------

For n = 0, this function causes soft (library) undefined behavior according to the article: 0.0/0 is defined to give NaN, but it's not a valid result for the algorithm above. Please note that the if block in the loop will return for all possible input values. The assert(0) is only needed because of compiler limitations, but never executed.

So we apply defensive programming and add this contract:
----------------------------
double sumDiv(ubyte n)
in (n > 0)
----------------------------


This now nicely detects the error in debug mode. However, the D spec says
"The compiler is free to assume the assert expression is true and optimize
subsequent code accordingly." and "Undefined  Behavior: The subsequent
execution of the program after an assert contract is false.": https://
dlang.org/spec/contracts.html
I didn't remember that the spec was that upfront about this. Let's see
how this affects our example:


* If the compiler interprets in(n > 0) as assume(n > 0), this means that
the first iteration of the for loop with i = 0 has no effect: The if
condition is always false (n != 0) and the accum += 0 has no effect.
* So it may rewrite the loop bounds to start at i = 1
* If this optimized function is called with n = 0, the loop body and the
return is never executed
* Remember that the assert(0) at the end could already be removed by the
optimizer
* I guess all bets are off what happens now


So using definsive programming we turned soft UB into hard UB which will probably crash the program. It could also cause memory corruption, run into an infinite loop, .... And please note that all this even happens in a @safe function.

So because of this, I think it's a really bad idea to conflate assume and assert meanings.

-- 
Johannes
June 13, 2020
On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:
>
>
> So using definsive programming we turned soft UB into hard UB which will probably crash the program. It could also cause memory corruption, run into an infinite loop, .... And please note that all this even happens in a @safe function.
>
> So because of this, I think it's a really bad idea to conflate assume and assert meanings.

The article talks about this. It states that soft undefined behavior can lead to hard UB.

I don't think it tries to argue this is OK. What it talks about is the importance of validation of external input. Saying that if we have well guarded inputs will allow us to detect program bugs through assertion seems a little optimistic.

However I do think emphasizing the input validation early is something important and severely lacking in the web development side. Though it is also hard to define where the system input vs external input boundaries are.

The concept that a method should provide a form with input validation and one with defensive programming is interesting.
June 13, 2020
On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:

> ------------------------------------------------------------------------
> /**
>  * returns (1 + 2 + ... + n) / n
>  */
> double sumDiv(ubyte n) @safe
> {
>     double accum = 0;
>
>     for (size_t i = 0; i <= n; i++)
>     {
>         accum += i;
>         if (i == n)
>             return accum / i;
>     }
>     // DMD requires this to compile the code, cause it does not detect
>     // the above loop covers all possible inputs.
>     // GDC and LDC optimizers remove this assert.
>     assert(0);
> }

> For n = 0, this function causes soft (library) undefined behavior according to the article: 0.0/0 is defined to give NaN, but it's not a valid result for the algorithm above.

According to the algorithm, as presented, NaN is a valid result, and this so-called "soft" UB does not exist. It is the documentation of `sumDiv` that is incorrect, or rather, incomplete - it presupposes `n` being non-zero, but otherwise does not make any statement wrt the interface (i.e. usage) of the function, so we shall turn to the interface. The function is @safe, so it promises to handle all safe inputs, which certainly include 0, without causing UB, and only return safe values, which NaN is. Optimizers, therefore, must not "optimize" this function into causing "hard" UB (such as removing the 0th iteration without inserting a `ret`).

If we only saw the declaration, without definition, i.e.:

/**
 * returns (1 + 2 + ... + n) / n
 */
double sumDiv(ubyte n) @safe;

Nowhere is it stated what happens if input is 0. But still, it has a safe interface, it is @safe, and so must return a safe value, which NaN is. No UB, "soft" or "hard".

If `sumDiv` expects *the callers* to *never* pass 0, it must either:

- throw an Exception (which its interface allows)
- throw an Error (and become nothrow)
- profess so via its interface:

> double sumDiv(NonZero!ubyte n) @safe @nogc nothrow;
> //or, if it must never return a NaN:
> NeverNaN!double sumDiv(NonZero!ubyte n) @safe @nogc /+/*perhaps*/nothrow+/;

...given hypothetical `NonZero` and `NeverNaN` types. Then the responsibility of checking for 0 is delegated away from the function body, and optimizers may indeed @safe-ly remove the 0th iteration.

> So we apply defensive programming and add this contract:
> ----------------------------
> double sumDiv(ubyte n)
> in (n > 0)

Dropping @safe? Then yes, there may be UB. But if @safe is to be kept, there's a case to be made to error out on that contract as being an invalid contract.

There's a "but" though, which concerns that stipulation on `assert` which you quoted from the spec. But that "but" has to do with semantics of `sumDiv` being available during compilation, which is not always possible because compilers cop out to "opaque" object files too early (i.e. in case if it's a library function, compiled into a separate object file, with only the declaration present during compilation). However, since, according to Walter, I "misunderstand" what linkers do (nevermind the fact that it's very much still about compilation and not linking), I'll say no more on that particular subject. Suffice it to say that in such a case, indeed, conflating `assert` and `assume` is just wrong.
June 13, 2020
Am Sat, 13 Jun 2020 14:41:19 +0000 schrieb Jesse Phillips:

> On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:
>>
>>
>> So using definsive programming we turned soft UB into hard UB which will probably crash the program. It could also cause memory corruption, run into an infinite loop, .... And please note that all this even happens in a @safe function.
>>
>> So because of this, I think it's a really bad idea to conflate assume and assert meanings.
> 
> The article talks about this. It states that soft undefined behavior can lead to hard UB.
> 
> I don't think it tries to argue this is OK.

I think you're refering to the strlen example on page 4? It is true that soft UB **may** lead to hard UB in general. If this is caused by application code, as in that example, it is something @safe has to (and can) take care of.

In application code, soft UB may be limited to local effects. Consider an online banking application showing the current date somewhere: If some code manages to assign 25 to the hours part of the date, that certainly violates some semantic guarantees. But if the date is only shown to the user, the worst thing that'll happen is that the user may see an incorrect date. And if all code is @safe, we're still guaranteed not to get memory corruption (hard UB).

But now if I add an assert(day <= 24) somewhere with best defensive programming intentions, the compiler may turn that into hard UB now. And hard UB could even include memory corruption setting your account balance to a negative value.


As assert "[...] checks (along with any action taken should a check fail) are, however,never a part of a function’s contract, cannot be relied upon by the clients..." they subvert @safe checking when used as assumes:

An example:

void a(int n) @safe
{
    assert (n > 0);
}

void b() @safe
{
    a(-1);
}

The compiler can not prevent the a(-1) call in b, as it does not know the semantic invariant demanded by a(). However, in a(), the assert (n > 0) get's ransformed to assume(n > 0) in release mode, which will lead to safe code causing hard UB.

Without assume, points where soft UB can turn into hard UB should all be covered by @safe (at least for memory corruption).


To summarize and rephrase this: the encapsulation unit of @safe is the function boundary. The parameter definitions (including this pointer) and return type and their modifiers are the information the compiler uses to implement it's safety checks. For all possible combinations of these input values, the @safe function may not cause memory corruption.

asserts then narrow the function contract by describing semantic limitations and @safe is not able to enforce this, as this information is not part of the function signature. This is fine, as @safe will still prevent soft UB from escalating to hard UB. But if you now use assert() as assume(), you introduce hard UB for some possible input combinations, subverting @safe guarantees.



> [...]
I agree with everything else you said ;-)





-- 
Johannes
June 13, 2020
Am Sat, 13 Jun 2020 15:04:45 +0000 schrieb Stanislav Blinov:

> On Saturday, 13 June 2020 at 08:52:18 UTC, Johannes Pfau wrote:
> 
>> 
------------------------------------------------------------------------
>> /**
>>  * returns (1 + 2 + ... + n) / n */
>> double sumDiv(ubyte n) @safe {
>>     double accum = 0;
>>
>>     for (size_t i = 0; i <= n; i++)
>>     {
>>         accum += i;
>>         if (i == n)
>>             return accum / i;
>>     }
>>     // DMD requires this to compile the code, cause it does not
>> detect
>>     // the above loop covers all possible inputs.
>>     // GDC and LDC optimizers remove this assert.
>>     assert(0);
>> }
> 
>> For n = 0, this function causes soft (library) undefined behavior according to the article: 0.0/0 is defined to give NaN, but it's not a valid result for the algorithm above.
> 
> According to the algorithm, as presented, NaN is a valid result, and
> this so-called "soft" UB does not exist.
> It is the documentation of `sumDiv` that is incorrect, or rather

Well, then the documentation could just state that 0 is not a valid input. This is just for illustration, the point is that the algorithm does not want to make any assumption about the result for n=0, so it should not be called with n=0.

> If `sumDiv` expects *the callers* to *never* pass 0, it must either:
> 
> - throw an Exception (which its interface allows)
> - throw an Error (and become nothrow)
> - profess so via its interface:

If you can encode such information in types, it has the benefit of the compiler being able to check it statically. But this misses the point, the article is about semantic constraints which are not encoded in types.


>> So we apply defensive programming and add this contract:
>> ----------------------------
>> double sumDiv(ubyte n)
>> in (n > 0)
> 
> Dropping @safe? Then yes, there may be UB. But if @safe is to be kept, there's a case to be made to error out on that contract as being an invalid contract.

Just a copy and paste error. Keep the @safe, it compiles and can cause hard UB.

What is your definition for an invalid contract? Even if you explicitly did if (n == 0) return; after the assert, the compiler would just remove it. The assert is perfectly fine if the algorithm is specified to not be defined for n=0. The problem is using assert() like assume() to optimize: assume(x) is unsafe when x is a function parameter with no additional restrictions. How could the compiler even detect or flag such an 'invalid' contract?

> 
> There's a "but" though, which concerns that stipulation on `assert` which you quoted from the spec. But that "but" has to do with semantics of `sumDiv` being available during compilation, which is not always possible because compilers cop out to "opaque" object files too early (i.e. in case if it's a library function, compiled into a separate object file, with only the declaration present during compilation). However, since, according to Walter, I "misunderstand" what linkers do (nevermind the fact that it's very much still about compilation and not linking), I'll say no more on that particular subject. Suffice it to say that in such a case, indeed, conflating `assert` and `assume` is just wrong.

I'm not sure what you're arguing for? My whole point is that the spec ("the compiler is free to assume the assert expression is true and optimize subsequent code accordingly.") is unsound and subverts @safe code. The example just shows how that can happen.

It is true that this argument can be generalized to be explained by the fact that @safe enforces it's guarantees based on the function interface (which is available in .di), whereas semantic contracts (asserts in function bodys) may not be available. See my  reply to Jesse for some more thoughts about that.

-- 
Johannes
June 14, 2020
On 13.06.20 05:46, Andrei Alexandrescu wrote:
> A short and sweet paper:
> 
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf
> 
> It echoes Walter's opinions on contracts vs. runtime checking, to a tee.

I don't think so. For example, the paper makes a distinction between library and language UB.

> He has been derided on occasion in this group for such views. The paper is a good rationale from an independent source.

I am not sure if that includes me, because there are points in this paper whose negation Walter has occasionally used as a straw man to dismiss other points I made.
June 14, 2020
On Saturday, 13 June 2020 at 03:46:38 UTC, Andrei Alexandrescu wrote:
> A short and sweet paper:
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1743r0.pdf
>
> It echoes Walter's opinions on contracts vs. runtime checking, to a tee. He has been derided on occasion in this group for such views. The paper is a good rationale from an independent source.

Interesting paper.

They mention that some preconditions can't be checked, or aren't feasible/worthwhile to check. While this is true, I think in most circumstances it is possible (and quite useful) to limit the effects of out-of-contract function invocations (or "soft UB" as they call it) by providing "guaranteed postconditions", for lack of a better expression. To borrow their example, consider:

void sort(std::vector<int> &vec, Comparator comp);
// Precondition: $comp provides a strict weak ordering on integers.
// Postcondition: $vec is sorted with respect to the order dictacted by $comp.
// Guaranteed postconditions (even if preconditions are unfulfilled): No data is modified except that which is accessible through $vec, no exceptions are thrown, no memory corruption occurs and no language-level UB occurs.

In other words, invoking sort with an invalid comparator might mess up the given vector, but at least no "catastrophes" occur. Luckily, in D we have attributes (pure, @safe, nothrow) that basically impose such useful "guaranteed postconditions" and are mechanically checkable.
June 14, 2020
On 14.06.20 14:49, Mark wrote:
> 
> "guaranteed postconditions", for lack of a better expression.

Or more generally, it would actually be useful to have multiple pairs of pre- and postconditions, where each postcondition is established iff the corresponding precondition is established. This would also give a more natural way to inherit contracts. Your "guaranteed postcondition" would be a pre-/postcondition pair where the precondition is `true`.

> In other words, invoking sort with an invalid comparator might mess up the given vector, but at least no "catastrophes" occur. Luckily, in D we have attributes (pure, @safe, nothrow) that basically impose such useful "guaranteed postconditions" and are mechanically checkable.

I wish. Right now, a failing assertion causes _language_ UB, even though assertions are allowed in @safe code. It's a hole in the mechanical checking.
« First   ‹ Prev
1 2