July 29, 2016
On Friday, 29 July 2016 at 07:01:35 UTC, Walter Bright wrote:
> On 7/28/2016 11:07 PM, Jack Stouffer wrote:
>> you're making a decision on the user's behalf that coverage % is
>> unimportant without knowing their circumstances.
>
> Think of it like the airspeed indicator on an airplane. There is no right or wrong airspeed. The pilot reads the indicated value, interprets it in the context of what the other instruments say, APPLIES GOOD JUDGMENT, and flies the airplane.
>
> You won't find many pilots willing to fly without one.

Maybe it would help to give more than one value, e.g. the actual code coverage (functions and branches executed in the actual program) separately from the commands executed in the unit tests. So you would have

100% code coverage
95% total commands executed (but don't worry!)
July 29, 2016
On 7/29/16 3:01 AM, Walter Bright wrote:
> On 7/28/2016 11:07 PM, Jack Stouffer wrote:
>> you're making a decision on the user's behalf that coverage % is
>> unimportant without knowing their circumstances.
>
> Think of it like the airspeed indicator on an airplane. There is no
> right or wrong airspeed. The pilot reads the indicated value, interprets
> it in the context of what the other instruments say, APPLIES GOOD
> JUDGMENT, and flies the airplane.
>
> You won't find many pilots willing to fly without one.
>

What if the gauge were airspeed added to fuel level, and you didn't get a gauge for each individually?

-Steve
July 29, 2016
On Friday, 29 July 2016 at 07:01:35 UTC, Walter Bright wrote:
> The pilot reads the indicated value, interprets it in the context of what the other instruments say, APPLIES GOOD JUDGMENT, and flies the airplane.

Continuing with this metaphor: in this situation you're not the pilot making the judgement; you're the aerospace engineer deciding that the speedometer in the plane can be off by several hundred m/s and that it's no big deal.

Yes, every measurement in the real world has a margin of error. But since we're dealing with computers, this is one of the rare situations where a perfect number can actually be obtained and presented to the user.

> There is no right or wrong airspeed.

The right one is the actual speed of the plane and the wrong one is every other number.
July 30, 2016
On Friday, 29 July 2016 at 05:49:01 UTC, Jonathan M Davis wrote:
> On Thursday, July 28, 2016 22:12:58 Walter Bright via Digitalmars-d wrote:
>> As soon as we start taking the % coverage too seriously, we are in trouble. It's never going to be cut and dried what should be tested and what is unreasonable to test, and I see no point in arguing about it.
>>
>> The % is a useful indicator, that is all. It is not a substitute for thought.
>>
>> As always, use good judgement.
>
> True, but particularly when you start doing stuff like trying to require that modules have 100% coverage - or that the coverage not be reduced by a change - it starts mattering - especially if it's done with build tools. The current situation is far from the end of the world, but I definitely think that we'd be better off if we fixed some of these issues so that the percentage reflected the amount of the actual code that's covered rather than having unit tests, assert(0) statements, invariants, etc. start affecting code coverage when they aren't what you're trying to cover at all.
>
> - Jonathan M Davis

Yep, especially because I think we agree that "coverage [should] not be reduced by a change" unless there is a pretty good reason to do so.

It could have the negative effect that people stop using such techniques (e.g. debugging code in unittests, invariants, ...) because they come to be seen as a code smell.
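
For concreteness, here is a minimal sketch (a hypothetical function, not from Phobos) of the kind of line being described: the assert(0) is deliberately unreachable, yet it shows up as an unexecuted line in the coverage listing and drags the percentage down.

// A minimal sketch (hypothetical function): the assert(0) can never be
// reached by a correct caller, yet the coverage report counts it as an
// executable line, so the file can no longer show 100%.
int sign(int x)
{
    if (x > 0) return 1;
    if (x < 0) return -1;
    if (x == 0) return 0;
    assert(0, "unreachable"); // never executed, but counted against coverage
}

unittest
{
    assert(sign(5) == 1);
    assert(sign(-5) == -1);
    assert(sign(0) == 0);
}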
August 04, 2016
On Thursday, 28 July 2016 at 23:14:42 UTC, Walter Bright wrote:
> On 7/28/2016 3:15 AM, Johannes Pfau wrote:
>> And as a philosophical question: Is code coverage in unittests even a
>> meaningful measurement?
>
> Yes. I've read all the arguments against code coverage testing. But in my usage of it for 30 years, it has been a dramatic and unqualified success in improving the reliability of shipping code.

Have you read this?

http://www.linozemtseva.com/research/2014/icse/coverage/

Atila
August 04, 2016
On 8/4/2016 1:13 AM, Atila Neves wrote:
> On Thursday, 28 July 2016 at 23:14:42 UTC, Walter Bright wrote:
>> On 7/28/2016 3:15 AM, Johannes Pfau wrote:
>>> And as a philosophical question: Is code coverage in unittests even a
>>> meaningful measurement?
>>
>> Yes. I've read all the arguments against code coverage testing. But in my
>> usage of it for 30 years, it has been a dramatic and unqualified success in
>> improving the reliability of shipping code.
>
> Have you read this?
>
> http://www.linozemtseva.com/research/2014/icse/coverage/

I've seen the reddit discussion of it. I don't really understand from reading the paper how they arrived at their test suites, but I suspect that may have a lot to do with the poor correlations they produced.

Unittests have uncovered lots of bugs for me, and code that was unittested had far, far fewer bugs showing up after release. The bugs that did turn up tended to be based on misunderstandings of the requirements.

For example, the Warp project was fully unittested from the ground up. I credit that for the remarkably short development time and the near-complete absence of bugs in the shipped product.

Unittests also enabled fearless rejiggering of the data structures while trying to make Warp run faster. Code that isn't unittested tends to stick with the first design out of fear.
August 04, 2016
On Thursday, 4 August 2016 at 10:24:39 UTC, Walter Bright wrote:
> On 8/4/2016 1:13 AM, Atila Neves wrote:
>> On Thursday, 28 July 2016 at 23:14:42 UTC, Walter Bright wrote:
>>> On 7/28/2016 3:15 AM, Johannes Pfau wrote:
>>>> And as a philosophical question: Is code coverage in unittests even a
>>>> meaningful measurement?
>>>
>>> Yes. I've read all the arguments against code coverage testing. But in my
>>> usage of it for 30 years, it has been a dramatic and unqualified success in
>>> improving the reliability of shipping code.
>>
>> Have you read this?
>>
>> http://www.linozemtseva.com/research/2014/icse/coverage/
>
> I've seen the reddit discussion of it. I don't really understand from reading the paper how they arrived at their test suites, but I suspect that may have a lot to do with the poor correlations they produced.

I think I read the paper around a year ago, so my memory is fuzzy. From what I remember, they analysed existing test suites. What I do remember is having the impression that it was done well.

> Unittests have uncovered lots of bugs for me, and code that was unittested had far, far fewer bugs showing up after release. <snip>

No argument there; as far as I'm concerned, unit tests = good thing (TM).

I think measuring unit test code coverage is a good idea, but only so it can be looked at to find lines that really should have been covered but weren't. What I take issue with is two things:

1. Code coverage metric targets (especially if the target is 100%).  This leads to inane behaviours such as "testing" a print function (which itself was only used in testing) to meet the target. It's busywork that accomplishes nothing.

2. Using the code coverage numbers as a measure of unit test quality. This was always obviously wrong to me; I was glad that the research I linked to confirmed my opinion, and as far as I know (I'd be glad to be proven wrong), nobody else has published anything to convince me otherwise.

Code coverage, as a measure of test quality, is fundamentally broken. It measures coupling between the production code and the tests, which is never a good idea. Consider:

int div(int i, int j) { return i + j; }
unittest { div(3, 2); }

100% coverage, utterly wrong. Fine, no asserts is "cheating":

int div(int i, int j) { return i / j; }
unittest { assert(div(4, 2) == 2); }

100% coverage. No check for division by 0. Oops.

This is obviously a silly example, but the main idea is: coverage doesn't measure the quality of the checks the tests actually perform. Weak tests serve only as sanity checks, and the only way I've seen so far to make sure the tests themselves are good is mutation testing.
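
To make the mutation idea concrete, here is a minimal sketch (the mutant below is hand-written, not the output of any tool): a mutation tool would swap the division for another operator and rerun the tests; a test without real assertions lets the mutant survive, while a proper assertion kills it.

// A hand-written "mutant" of div: the division has been replaced by subtraction.
int divMutant(int i, int j) { return i - j; }

unittest
{
    // The assertion-free test above cannot tell this mutant from the real
    // div, so the mutant survives and exposes how weak the test is.
    divMutant(3, 2);

    // A test with a real assertion would kill it:
    // assert(divMutant(4, 2) == 2); // fails for the mutant, passes for div
}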

Atila

August 04, 2016
On 8/4/2016 12:04 PM, Atila Neves wrote:
> What I take issue with is two things:
>
> 1. Code coverage metric targets (especially if the target is 100%).  This leads
> to inane behaviours such as "testing" a print function (which itself was only
> used in testing) to meet the target. It's busywork that accomplishes nothing.

Any metric that is blindly followed results in counterproductive edge cases. That doesn't mean the metric is pointless, however; it just means that "good judgment" is necessary.

I don't think anyone can quote me on a claim that 100% coverage is required. I have said things like uncovered code requires some sort of credible justification. Being part of the test harness is a credible justification, as are assert(0)'s not being executed.

Leaving the multi-codepoint Unicode pathway untested probably has no credible justification.


> 2. Using the code coverage numbers as a measure of unit test quality. This was
> always obviously wrong to me, I was glad that the research I linked to confirmed
> my opinion, and as far as I know (I'd be glad to be proven wrong), nobody else
> has published anything to convince me otherwise.
>
> Code coverage, as a measure of test quality, is fundamentally broken. It
> measures coupling between the production code and the tests, which is never a
> good idea. Consider:

All that means is that code coverage is necessary, but not sufficient. Even just executing code and not testing the results has *some* value, in that it verifies that the code doesn't crash and that it is not dead code.
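
As a minimal sketch of that point (a hypothetical function, not from any project): the unittest below asserts nothing, yet merely executing it verifies that the exercised path doesn't crash and isn't dead code.

// A minimal sketch: the test has no assertions, but executing the call
// still proves this path doesn't trip the bounds check.
int firstElement(int[] a)
{
    return a[0]; // RangeError at runtime if a is empty
}

unittest
{
    firstElement([1, 2, 3]); // no assert, but verifies the path doesn't crash
    // firstElement([]);     // would abort with a RangeError, again without any assert
}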

----

One of the interesting differences between D and C++ is that D requires template bodies to have valid syntax, while C++ requires template bodies to be both syntactically correct and partially semantically correct. The justification for the latter is so that the user won't see semantic errors when instantiating templates, but I interpret that as "so I can ship templates that were never instantiated", a justification that is unsupportable in my view :-)
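
To illustrate the D side of that difference with a minimal sketch (hypothetical names): the template body below contains a semantic error, but since it parses, it compiles fine until someone actually instantiates it.

// The body is syntactically valid, so D accepts the declaration even though
// doesNotExist is undefined; the error is only diagnosed on instantiation.
T twice(T)(T x)
{
    return doesNotExist(x);
}

void main()
{
    // Uncommenting the next line turns the never-instantiated template into
    // a compile-time error:
    // auto y = twice(21);
}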
August 04, 2016
In adding some overflow detection to Phobos, I discovered that some allocations were never called by the unittests. Adding a unittest for those paths, I discovered those paths didn't work at all for any cases.

I'm not giving up coverage testing anytime soon, regardless of what some study claims :-)
August 05, 2016
On Friday, 5 August 2016 at 02:37:35 UTC, Walter Bright wrote:
> In adding some overflow detection to Phobos, I discovered that some allocations were never called by the unittests. Adding a unittest for those paths, I discovered those paths didn't work at all for any cases.
>
> I'm not giving up coverage testing anytime soon, regardless of what some study claims :-)

:)

Like I said, measuring coverage is important; what isn't is using it as a measure of the quality of the tests themselves. The other important thing is to decide whether or not certain lines are worth covering, which of course you can only do if you have the coverage data!

Mutation testing could have found those code paths you just mentioned, BTW: you'd always get surviving mutants for those paths.

Atila