Jump to page: 1 2
Thread overview
Unit tests, asserts and contradictions in the spec
Feb 06, 2019
Jacob Carlborg
Feb 06, 2019
H. S. Teoh
Feb 06, 2019
Paul Backus
Feb 07, 2019
H. S. Teoh
Feb 07, 2019
Paul Backus
Feb 07, 2019
H. S. Teoh
Feb 07, 2019
John Colvin
Feb 07, 2019
John Colvin
Feb 07, 2019
H. S. Teoh
Feb 08, 2019
John Colvin
Feb 08, 2019
Jonathan M Davis
Feb 08, 2019
Atila Neves
Feb 10, 2019
Jacob Carlborg
Feb 10, 2019
Jacob Carlborg
Feb 10, 2019
Jonathan M Davis
Feb 10, 2019
Jacob Carlborg
Feb 07, 2019
jmh530
Feb 07, 2019
H. S. Teoh
Feb 08, 2019
Atila Neves
Feb 08, 2019
H. S. Teoh
February 06, 2019
In this pull request [1] there's some discussion related to the topic of this post, which I would now like to make more visible.

The spec mentions that if an assertion fails the program enters invalid state:

"1. The first AssignExpression must evaluate to true. If it does not, an Assert Failure has occurred and the program enters an Invalid State." [2]

And:

"Undefined Behavior: Once in an Invalid State the behavior of the continuing execution of the program is undefined." [2]

Currently `assert` throws an `AssertError` if an assertion fails. This is mentioned in the spec to be implementation defined:

"Implementation Defined: The behavior if the first AssertExpression is evaluated and is false is also typically set with a compiler switch and may include these options:
5. throwing the AssertError exception in the D runtime library" [2]

The spec also mentions that `assert` has different semantics in `unittest` blocks or `in` contract:

"3. AssertExpression has different semantics if it is in a unittest or in contract." [2]

And:

"3. Unlike AssertExpressions used elsewhere, the assert is not assumed to hold, and upon assert failure the program is still in a defined state." [3]

Subclasses of `Error` are not supposed to be caught because they represent unrecoverable runtime errors. I failed to find anything about this in the spec but the documentation for the `Error` class specifies that:

"This represents the category of Throwable objects that are not safe to catch and handle. In principle, one should not catch Error objects, as they represent unrecoverable runtime errors. Certain runtime guarantees may fail to hold when these errors are thrown, making it unsafe to continue execution after catching them." [4]

For as long as I can remember the D runtime has been catching Throwables (exceptions and errors) [5]. This might be ok since the application will terminate shortly after. The default unit test runner will also catch Throwables and continue running the tests in a test fails.

So the problem is that you're not supposed to catch Errors, but you need to do that to implement a unit test runner that will continue after a failed test. If we only look at what's specified for `assert`, it should be fine because according to the spec if an assertion fails in a `unittest` block the program is still in a valid state.

The specific problem for this pull request [1] is to allocate memory in a function that needs to be `@nogc`, `nothrow`, `pure` and `@safe`.

[1] https://github.com/dlang/druntime/pull/2479
[2] https://dlang.org/spec/expression.html#assert_expressions
[3] https://dlang.org/spec/unittest.html
[4] https://dlang.org/phobos/object.html#.Error
[5] https://github.com/dlang/druntime/blob/bc940316b4cd7cf6a76e34b7396de2003867fbef/src/rt/dmain2.d#L480-L484

-- 
/Jacob Carlborg
February 06, 2019
On Wed, Feb 06, 2019 at 10:02:05PM +0100, Jacob Carlborg via Digitalmars-d wrote: [...]
> The spec mentions that if an assertion fails the program enters invalid state:
> 
> "1. The first AssignExpression must evaluate to true. If it does not, an Assert Failure has occurred and the program enters an Invalid State." [2]
> 
> And:
> 
> "Undefined Behavior: Once in an Invalid State the behavior of the continuing execution of the program is undefined." [2]
[...]
> "3. AssertExpression has different semantics if it is in a unittest or in contract." [2]
> 
> And:
> 
> "3. Unlike AssertExpressions used elsewhere, the assert is not assumed to hold, and upon assert failure the program is still in a defined state." [3]
> 
> Subclasses of `Error` are not supposed to be caught because they represent unrecoverable runtime errors. I failed to find anything about this in the spec but the documentation for the `Error` class specifies that:
> 
> "This represents the category of Throwable objects that are not safe to catch and handle. In principle, one should not catch Error objects, as they represent unrecoverable runtime errors. Certain runtime guarantees may fail to hold when these errors are thrown, making it unsafe to continue execution after catching them." [4]
[...]

This is a big can o' worms I'm not sure we're ready to deal with just yet.  Generally, everywhere except in unittests we expect assert failures to basically terminate the program instantly, because it's in undefined state, which is UB by definition in the spec, and UB is bad because there's no guarantee any further operations will actually do what you think it does.  E.g., the assert failure could have been caused by a hacker triggering an invalid state and inserting shellcode into memory somewhere that the code is now about to access.  Continuing to run means the hacker will probably compromise your system.

In unittests, however, we've had the convention of using asserts for test failures since the early days.  Generally, UB caused by asserts in this context probably wouldn't have the same risk as above, but it does bring into question the wisdom of continuing to run after a unittest has failed.  The trouble is that the assert may have happened deep inside the call stack, and since it throws AssertError, that means stack unwinding code is potentially never run (or there is no unwinding code to begin with because the assert is inside a nothrow function -- it's allowed to throw non-Exception Errors from nothrow functions because it's assumed to be fatal).  So you may end up with unreleased resources and cleanups that ought to have happened but never did.  In the rare case where this matters, it can cause problems with running unittests.

Worst of all, however, is the fact that contract failures also use assert to signal the failure of a contract.  One might think this makes sense -- because a contract failure means the program has entered invalid state, so we're ready to abort anyway.  However, there's one nasty catch: overriding a base class function with an in-contract is defined by the spec to *loosen* the contract. I.e., the base class contract may fail but it's not assumed to be invalid state / UB land, because the derived class is allowed to loosen the contract.

But this means that if the contract failed because of some deeply-nested assert while evaluating the contract. E.g., the contract calls some function which in turns calls a bunch of stuff, and somewhere in there an assert fails. AssertError is thrown, possibly through nothrow boundaries and causing non-cleanup of various resources and states, before it reaches the base class contract that then forwards the problem to the derived class contract.  The derived contract says everything checks out, so we continue running.  In spite of the resource leak caused by the implicit thrown AssertError.


T

-- 
Notwithstanding the eloquent discontent that you have just respectfully expressed at length against my verbal capabilities, I am afraid that I must unfortunately bring it to your attention that I am, in fact, NOT verbose.
February 06, 2019
On Wednesday, 6 February 2019 at 21:40:38 UTC, H. S. Teoh wrote:
> [...]
>
> In unittests, however, we've had the convention of using asserts for test failures since the early days.  Generally, UB caused by asserts in this context probably wouldn't have the same risk as above, but it does bring into question the wisdom of continuing to run after a unittest has failed.  The trouble is that the assert may have happened deep inside the call stack, and since it throws AssertError, that means stack unwinding code is potentially never run [...]
>
> [...]
>
> But this means that if the contract failed because of some deeply-nested assert while evaluating the contract. E.g., the contract calls some function which in turns calls a bunch of stuff, and somewhere in there an assert fails. AssertError is thrown [...]

It seems like the fundamental problem here is the failure distinguish between a failed assertion *in the actual body* of a contract or unit test, and one that occurs deeper in the stack, in code called by the contract or unit test. Only assertions that are part of the unit test/contract body itself should have their failure treated specially, since they're the only ones written with the special unit test/contract assertion semantics in mind.

Probably the easiest way to distinguish different types of assertions would be to have them throw different AssertError subclasses (e.g., UnittestAssertError, ContractAssertError), which the runtime could then check for specifically. This would allow existing code that expects AssertError, like custom test runners, to continue to "work", with the aforementioned caveats.
February 06, 2019
On Wed, Feb 06, 2019 at 10:33:27PM +0000, Paul Backus via Digitalmars-d wrote:
> On Wednesday, 6 February 2019 at 21:40:38 UTC, H. S. Teoh wrote:
> > [...]
> > In unittests, however, we've had the convention of using asserts for
> > test failures since the early days.  Generally, UB caused by asserts
> > in this context probably wouldn't have the same risk as above, but
> > it does bring into question the wisdom of continuing to run after a
> > unittest has failed.  The trouble is that the assert may have
> > happened deep inside the call stack, and since it throws
> > AssertError, that means stack unwinding code is potentially never
> > run [...]
> > 
> > [...]
> > But this means that if the contract failed because of some
> > deeply-nested assert while evaluating the contract. E.g., the
> > contract calls some function which in turns calls a bunch of stuff,
> > and somewhere in there an assert fails. AssertError is thrown [...]
> 
> It seems like the fundamental problem here is the failure distinguish between a failed assertion *in the actual body* of a contract or unit test, and one that occurs deeper in the stack, in code called by the contract or unit test. Only assertions that are part of the unit test/contract body itself should have their failure treated specially, since they're the only ones written with the special unit test/contract assertion semantics in mind.
> 
> Probably the easiest way to distinguish different types of assertions would be to have them throw different AssertError subclasses (e.g., UnittestAssertError, ContractAssertError), which the runtime could then check for specifically. This would allow existing code that expects AssertError, like custom test runners, to continue to "work", with the aforementioned caveats.

Hmm. A common practice when writing complex contracts that are frequently used is to factor them out into common functions:

	auto myComplexFunc(Args...)(Args args)
	in {
		checkArgs(args);
	}
	do { ... }

	auto anotherComplexFunc(Args...)(Args args)
	in {
		checkArgs(args);
	}
	do { ... }

	void checkArgs(Args...)(Args args) {
		foreach (arg; args) {
			assert(someCondition(arg));
		}
	}

Since checkArgs is outside the contract scope, the compiler wouldn't know to emit code to throw ContractAssertError rather than AssertError.

Of course, one could argue that checkArgs ought to return bool, and the assert moved into the contract itself.  That's probably a reasonable thing to do, but existing code may not follow such a practice.


Furthermore, the resource leak problem is still not really fixed:

	nothrow unittest {
		auto res = acquireResources();
		scope(exit) res.release();

		void runTest(Resource r) nothrow {
			doSomething(r);
			assert(r.success);
		}

		// Presumably run through all test cases
		foreach (r; res) {
			runTest(r);
		}
	}

Since the nested helper function is marked nothrow, a failed assert throwing an AssertError or UnittestAssertError may bypass the unittest's unwinding code (the compiler assumes runTest never throws, so never installs stack unwinding code around it), which means the scope(exit) never triggers and the resource is leaked.

This is a relatively benign example; imagine if you have a unittest that calls alloca() and relies on cleanup code to adjust the stack pointer properly.  A failed assert bypassing stack unwinding will then corrupt your stack. (Of course, this is a problem with throwing Errors in general, not just for unittests. But it shows that throwing a UnittestAssertError instead of AssertError does not necessarily solve the problem.)


T

-- 
Try to keep an open mind, but not so open your brain falls out. -- theboz
February 07, 2019
On Thursday, 7 February 2019 at 01:04:15 UTC, H. S. Teoh wrote:
> [...]
>
> Of course, one could argue that checkArgs ought to return bool, and the assert moved into the contract itself.  That's probably a reasonable thing to do, but existing code may not follow such a practice.

Having a function call that's UB outside a contract but fine inside one seems pretty suspect to me, so I would definitely argue in favor of this change.

> [...]
>
> Since the nested helper function is marked nothrow, a failed assert throwing an AssertError or UnittestAssertError may bypass the unittest's unwinding code (the compiler assumes runTest never throws, so never installs stack unwinding code around it), which means the scope(exit) never triggers and the resource is leaked.

I guess the implication here is that assert shouldn't count as nothrow in unit tests? That won't help non-nested helper functions, of course, but they also won't throw UnittestAssertError (in this hypothetical), so that should be okay, from a safety perspective.

These don't seem like insurmountable problems, but the possibility that existing code might *rely* on AssertErrors from deep in the call stack being caught by the runtime is one I hadn't considered. Since there's no way to distinguish the legitimate uses of that "feature" from the actually-broken ones, it's hard to imagine a solution that fixes the latter without breaking the former.
February 07, 2019
On Thu, Feb 07, 2019 at 06:51:23AM +0000, Paul Backus via Digitalmars-d wrote:
> On Thursday, 7 February 2019 at 01:04:15 UTC, H. S. Teoh wrote:
[...]
> > Since the nested helper function is marked nothrow, a failed assert throwing an AssertError or UnittestAssertError may bypass the unittest's unwinding code (the compiler assumes runTest never throws, so never installs stack unwinding code around it), which means the scope(exit) never triggers and the resource is leaked.
> 
> I guess the implication here is that assert shouldn't count as nothrow in unit tests? That won't help non-nested helper functions, of course, but they also won't throw UnittestAssertError (in this hypothetical), so that should be okay, from a safety perspective.
> 
> These don't seem like insurmountable problems, but the possibility that existing code might *rely* on AssertErrors from deep in the call stack being caught by the runtime is one I hadn't considered. Since there's no way to distinguish the legitimate uses of that "feature" from the actually-broken ones, it's hard to imagine a solution that fixes the latter without breaking the former.

Yeah, this is why I said this was a can o' worms. There's probably a solution lurking in there somewhere, but it's complicated, has messy edge cases, and I'm not sure how likely it will be adopted.  It's all not so bad when you follow the "assert failure == immediately terminate program" paradigm strictly; but once you step outside of that, all sorts of problems await.

Of course, on the flip side, unittests that acquire global / external resources that need cleanup, etc., are IMNSHO a code smell. The code should be structured in such a way that every unittest is self-contained and does not affect global state, and especially not external OS state. (There's a place for such tests, but they're hardly *unittests* anymore.)  If the code can't be tested that way, then IMO it's suffers from bad design and needs to be refactored.

If all unittests were written this way, then your proposed solution would totally make sense.  Unfortunately, reality is a lot messier than that.  :-(


T

-- 
PNP = Plug 'N' Pray
February 07, 2019
On Thursday, 7 February 2019 at 15:43:34 UTC, H. S. Teoh wrote:
> On Thu, Feb 07, 2019 at 06:51:23AM +0000, Paul Backus via Digitalmars-d wrote:
>> On Thursday, 7 February 2019 at 01:04:15 UTC, H. S. Teoh wrote:
> [...]
>> > Since the nested helper function is marked nothrow, a failed assert throwing an AssertError or UnittestAssertError may bypass the unittest's unwinding code (the compiler assumes runTest never throws, so never installs stack unwinding code around it), which means the scope(exit) never triggers and the resource is leaked.
>> 
>> I guess the implication here is that assert shouldn't count as nothrow in unit tests? That won't help non-nested helper functions, of course, but they also won't throw UnittestAssertError (in this hypothetical), so that should be okay, from a safety perspective.
>> 
>> These don't seem like insurmountable problems, but the possibility that existing code might *rely* on AssertErrors from deep in the call stack being caught by the runtime is one I hadn't considered. Since there's no way to distinguish the legitimate uses of that "feature" from the actually-broken ones, it's hard to imagine a solution that fixes the latter without breaking the former.
>
> Yeah, this is why I said this was a can o' worms. There's probably a solution lurking in there somewhere, but it's complicated, has messy edge cases, and I'm not sure how likely it will be adopted.  It's all not so bad when you follow the "assert failure == immediately terminate program" paradigm strictly; but once you step outside of that, all sorts of problems await.
>
> Of course, on the flip side, unittests that acquire global / external resources that need cleanup, etc., are IMNSHO a code smell. The code should be structured in such a way that every unittest is self-contained and does not affect global state, and especially not external OS state. (There's a place for such tests, but they're hardly *unittests* anymore.)  If the code can't be tested that way, then IMO it's suffers from bad design and needs to be refactored.
>
> If all unittests were written this way, then your proposed solution would totally make sense.  Unfortunately, reality is a lot messier than that.  :-(
>
>
> T

A fork-based unittest runner would solve some problems without having to restart the process (could be expensive startup) or have people re-write their tests to use a new type of assert.

The process is started, static constructors are run setting up anything needed, the process is then forked & the tests run in the fork until death from success or assert, somehow communicating the index of the last successful test to the runner (let's call that tests[i]). Then if i < test.length - 1 do another fork and start from tests[i + 2] to skip the one that failed.

There are probably corner cases where you wouldn't want this behavior, but I can't think of one off the top of my head.
February 07, 2019
On Thursday, 7 February 2019 at 16:49:38 UTC, John Colvin wrote:
> There are probably corner cases where you wouldn't want this behavior, but I can't think of one off the top of my head.

I have no idea what the behaviour of the fork alternatives is like on Windows, so it might not fly there.
February 07, 2019
On Thu, Feb 07, 2019 at 04:49:38PM +0000, John Colvin via Digitalmars-d wrote: [...]
> A fork-based unittest runner would solve some problems without having to restart the process (could be expensive startup) or have people re-write their tests to use a new type of assert.
> 
> The process is started, static constructors are run setting up anything needed, the process is then forked & the tests run in the fork until death from success or assert, somehow communicating the index of the last successful test to the runner (let's call that tests[i]). Then if i < test.length - 1 do another fork and start from tests[i + 2] to skip the one that failed.
> 
> There are probably corner cases where you wouldn't want this behavior, but I can't think of one off the top of my head.

One case is where the unittests depend on the state of the filesystem, e.g., they all write to the same temp file as part of the testing process. I don't recommend this practice, though, for obvious reasons.

(In fact, I'm inclined to say unittests should *not* depend on the filesystem at all; IMNSHO `import std.stdio;` is a code smell in unittests. The code should be refactored such that it is testable without involving the actual filesystem (simulated filesystems can be used for testing FS-dependent logic, if the API is designed properly). But I bet most D unittests aren't written up to that standard!)


T

-- 
Talk is cheap. Whining is actually free. -- Lars Wirzenius
February 07, 2019
On Thursday, 7 February 2019 at 15:43:34 UTC, H. S. Teoh wrote:
> [snip]
>
> Of course, on the flip side, unittests that acquire global / external resources that need cleanup, etc., are IMNSHO a code smell.

That's a good point. I should probably just stick pure on every unittest block by default.
« First   ‹ Prev
1 2