Thread overview
-cov LOC is inadequate for 1 liner branching; need a metric based on branching
Feb 05, 2018
Timothee Cour
Feb 11, 2018
timotheecour
Feb 11, 2018
Walter Bright
Feb 11, 2018
Timothee Cour
Feb 11, 2018
Walter Bright
Feb 11, 2018
Timothee Cour
Feb 12, 2018
Walter Bright
Feb 12, 2018
Johan Engelen
February 05, 2018
just filed https://issues.dlang.org/show_bug.cgi?id=18377:

`dmd -cov -run main.d` shows 100% coverage; this is misleading since a branch is not taken:

```
void main(){
  int a;
  if(false) a+=10;
}
```

how about adding a `-covmode=[loc|branch]` that would allow either reporting LOC coverage or branch coverage?

branch coverage would report number of branches taken at least once / total number of branches.

It would not only address the above issue, but it is IMO a much better metric for coverage, less sensitive to 'overcounting' of large blocks in main code branches (size of code block in a branch is irrelevant as far as testing is concerned); eg:

```
int fun(int x){
  if(x<0)
    return fun2();  // accounts for 1 LOC and 1 branch
  // long block of non-branching code here... // accounts for 10 LOC
and 1 branch
}
```

NOTE: branches would include anything that allows more than 1 code
path (eg: switch, if)
February 11, 2018
On Monday, 5 February 2018 at 19:32:37 UTC, Timothee Cour wrote:
> just filed https://issues.dlang.org/show_bug.cgi?id=18377:

@wilzbach
> It probably makes sense to have a look at how other languages are dealing with this. Seems to be a common problem

some links:
* https://www.ncover.com/support/docs/extras/code-coverage/understanding-branch-coverage => "Why do we need the Branch Coverage metric?"
* http://coverage.readthedocs.io/en/coverage-4.4.2/branch.html => "coverage run --branch myprog.py"

February 11, 2018
On 2/5/2018 11:32 AM, Timothee Cour wrote:
> just filed https://issues.dlang.org/show_bug.cgi?id=18377:
> 
> `dmd -cov -run main.d` shows 100% coverage; this is misleading since a
> branch is not taken:
> 
> ```
> void main(){
>    int a;
>    if(false) a+=10;
> }
> ```

Consider how -cov works today:

  2|  x = 3; y = 4;
  1|  ++x;

The first line has a count of 2, because there are two statements and each contributes one increment to the line.

  1|  x = 3; // reference count
  2|  if (true && false) c;
  3|  if (true && true) c;
  1|  if (false && b) c;

The sequence points are each counted separately. So, by comparing with the 'reference count', you can see which sequence points are executed. Also, if one finds that unattractive, the code can be organized like:

  1|  if (true &&
  1|      false)
  0|     c;

and the separate counts will be revealed instead of aggregated.

I agree that this is not ideal, however:

1. it works
2. it is simple and robust
3. the display to the user is simple
4. it's easy to aggregate multiple runs together with simple text processing code
5. one can 'fix' it with a stylistic change in the formatting of the source code

Any further improvement would be a large increase in complexity of the implementation, and I don't know of reasonable way to present this to the user in a textual format.

Is it worth it? I don't think so. Like builtin unittests, the big win with -cov is it is *trivial* to use, which encourages its adoption. It's a 99% solution, with 99% of the benefits, with 1% of the implementation effort. We should be expending effort elsewhere than putting an additional 99% effort to squeeze out that last 1% of benefit.
February 11, 2018
I think you're missing my main point: it's explained here
https://www.ncover.com/support/docs/extras/code-coverage/understanding-branch-coverage
but the gist is that line based coverage is over-counting:
```
if(A)
  // 100 lines of code
else
  // 1 line of code
```
gives a line coverage of ~ 99% vs a branch coverage of ~50%
(assuming `else` branch never covered in unittests)

What matters as far as bugs are concerned is that 50% of cases are
covered. Increasing the size of the `if(A)` branch increases line
coverage (which is irrelevant) but not branch coverage.


On Sun, Feb 11, 2018 at 1:32 PM, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 2/5/2018 11:32 AM, Timothee Cour wrote:
>>
>> just filed https://issues.dlang.org/show_bug.cgi?id=18377:
>>
>> `dmd -cov -run main.d` shows 100% coverage; this is misleading since a branch is not taken:
>>
>> ```
>> void main(){
>>    int a;
>>    if(false) a+=10;
>> }
>> ```
>
>
> Consider how -cov works today:
>
>   2|  x = 3; y = 4;
>   1|  ++x;
>
> The first line has a count of 2, because there are two statements and each contributes one increment to the line.
>
>   1|  x = 3; // reference count
>   2|  if (true && false) c;
>   3|  if (true && true) c;
>   1|  if (false && b) c;
>
> The sequence points are each counted separately. So, by comparing with the 'reference count', you can see which sequence points are executed. Also, if one finds that unattractive, the code can be organized like:
>
>   1|  if (true &&
>   1|      false)
>   0|     c;
>
> and the separate counts will be revealed instead of aggregated.
>
> I agree that this is not ideal, however:
>
> 1. it works
> 2. it is simple and robust
> 3. the display to the user is simple
> 4. it's easy to aggregate multiple runs together with simple text processing
> code
> 5. one can 'fix' it with a stylistic change in the formatting of the source
> code
>
> Any further improvement would be a large increase in complexity of the implementation, and I don't know of reasonable way to present this to the user in a textual format.
>
> Is it worth it? I don't think so. Like builtin unittests, the big win with -cov is it is *trivial* to use, which encourages its adoption. It's a 99% solution, with 99% of the benefits, with 1% of the implementation effort. We should be expending effort elsewhere than putting an additional 99% effort to squeeze out that last 1% of benefit.
February 11, 2018
On 2/11/2018 1:55 PM, Timothee Cour wrote:
> I think you're missing my main point: it's explained here
> https://www.ncover.com/support/docs/extras/code-coverage/understanding-branch-coverage
> but the gist is that line based coverage is over-counting:
> ```
> if(A)
>    // 100 lines of code
> else
>    // 1 line of code
> ```
> gives a line coverage of ~ 99% vs a branch coverage of ~50%
> (assuming `else` branch never covered in unittests)
> 
> What matters as far as bugs are concerned is that 50% of cases are
> covered. Increasing the size of the `if(A)` branch increases line
> coverage (which is irrelevant) but not branch coverage.

I understand that point. The -cov coverage percentage is the line coverage, not the sequence point coverage. (Hence it will never be greater than 100%, and it will never underestimate the coverage. It would be more accurately termed an "upper bound" on the coverage.)

And yes, that makes it fuzzy in proportion to how many lines contain multiple sequence points. Eliminating that fuzziness does require a vast increase in the complexity of the -cov implementation.
February 11, 2018
> -cov coverage percentage is the line coverage, not the sequence point coverage [...] makes it fuzzy in proportion to how many lines contain multiple sequence points

Based on your comment I'm pretty sure you're still not getting my point, so apologies if I was unclear and let me try to explain better:

it's not about `line coverage` vs `sequence point coverage`, as that difference is not very large (indeed, just 'fuzzy').

It's about `line coverage` vs `branch coverage` (see exact definition in linked article), that difference is very large in practice.

here's my example, but more concretely explained:
```
void main(int a){
if(a>0){
  statement1(); // line 3
  statement2(); // line 4
...
  statement100(); // line 102
} else{
  statement101(); // line 104
}
}
unittest{ fun(1); }
```
* line coverage is around 99%.
* sequence point coverage is also 99% (and would be close to that if
some lines had multiple statements)
* branch coverage is 50%.

This is not an artificial example, this is the common case.

What's more, code instrumentation to enable branch coverage is not more complex to implement compared to line coverage (I would even venture it's less complex and less costly).



On Sun, Feb 11, 2018 at 2:32 PM, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 2/11/2018 1:55 PM, Timothee Cour wrote:
>>
>> I think you're missing my main point: it's explained here
>>
>> https://www.ncover.com/support/docs/extras/code-coverage/understanding-branch-coverage
>> but the gist is that line based coverage is over-counting:
>> ```
>> if(A)
>>    // 100 lines of code
>> else
>>    // 1 line of code
>> ```
>> gives a line coverage of ~ 99% vs a branch coverage of ~50%
>> (assuming `else` branch never covered in unittests)
>>
>> What matters as far as bugs are concerned is that 50% of cases are
>> covered. Increasing the size of the `if(A)` branch increases line
>> coverage (which is irrelevant) but not branch coverage.
>
>
> I understand that point. The -cov coverage percentage is the line coverage, not the sequence point coverage. (Hence it will never be greater than 100%, and it will never underestimate the coverage. It would be more accurately termed an "upper bound" on the coverage.)
>
> And yes, that makes it fuzzy in proportion to how many lines contain multiple sequence points. Eliminating that fuzziness does require a vast increase in the complexity of the -cov implementation.
February 11, 2018
On 2/11/2018 3:01 PM, Timothee Cour wrote:
> It's about `line coverage` vs `branch coverage` (see exact definition
> in linked article),

Ok, I see now, thanks for your patience and the clarification.


> that difference is very large in practice.

Only if one cares about the percentage metric. Pragmatically, I'm much more interested in exactly what was not covered rather than the percentage, and any good QA department should be focused on that rather than the percent metric.

BTW, when I wrote the -cov implementation, I threw in the % metric just for fun <g>.


> What's more, code instrumentation to enable branch coverage is not
> more complex to implement compared to line coverage (I would even
> venture it's less complex and less costly).

I'm not convinced of that, as you've still got the:

   if (a) b;

problem. Furthermore, in the first link they alluded to the thrown exception problem, with some gobbledy-gook answer on how they dealt with it (they punted). Sequence point coverage doesn't have that problem.

Next, in the python linked example, they found it necessary to add a bunch of pragma's to give needed information to the branch counter. I have a hard time seeing anyone using that unless they're forced to by management. The idea with dmd's coverage is you type -cov and you get a coverage report. Done.

The coverage report is a simple text file that is both readable by itself and trivially parse-able with a program for those who really want a GUI presentation of the results, or who want to integrate it with github (as has been done for us, yay!).

(Back in the olden daze, a coverage analyzer was a separate product that came with a 200 page manual. Who bought it? Managers. Who used it? Nobody. You'd see the manual on programmers' shelves, still in its shrink wrap.)

---

I'm actually glad to see people complain about -cov and want to improve it. It shows it is successful - people are using it! As Bjarne Stroustrup once said, there are two kinds of products - ones people complain about, and ones people don't use.
February 12, 2018
On Monday, 5 February 2018 at 19:32:37 UTC, Timothee Cour wrote:
> just filed https://issues.dlang.org/show_bug.cgi?id=18377:

If I may add something semi-relevant to the current discussion, have a look here:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html

It is non-trivial to add llvm-cov support to LDC, but half the work is already done in LDC [*].
The current instrumentation implementation in Clang and LDC for PGO/coverage only instruments branch points, and is therefore faster than DMD's `-cov` (also present in LDC) but less precise: it assumes an exception-less execution flow. It's however quite easy to add precision and deal with exception-flow, at the cost of execution speed of course. It'll be interesting to investigate the cost/benefit! Let me know if you're interested in working on it.

A copy from Clang's documentation page, from which some cool ideas/capabilities can be glanced (needs monospaced font):
```
    1|   20|#define BAR(x) ((x) || (x))
                           ^20     ^2
    2|    2|template <typename T> void foo(T x) {
    3|   22|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
                                   ^22     ^20  ^20^20
    4|    2|}
------------------
| void foo<int>(int):
|      2|    1|template <typename T> void foo(T x) {
|      3|   11|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
|                                     ^11     ^10  ^10^10
|      4|    1|}
------------------
| void foo<float>(int):
|      2|    1|template <typename T> void foo(T x) {
|      3|   11|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
|                                     ^11     ^10  ^10^10
|      4|    1|}
------------------
```

cheers,
  Johan

[*] PGO instrumentation is the same, I can give some guidance to anybody who wants to work on it.