What does Coverity/clang static analysis actually do?
October 01, 2009
I've been interested in having the D compiler take advantage of the flow analysis in the optimizer to do some more checking. Coverity and clang get a lot of positive press about doing this, but any details of exactly *what* they do have been either carefully hidden (in Coverity's case) or undocumented (clang's page on this is blank). All I can find is marketing hype and a lot of vague handwaving.

Here is what I've been able to glean about what they detect, from much time spent with Google and from my knowledge of how data flow analysis works (a short sketch illustrating a few of these follows the list):

1. dereference of NULL pointers (all reaching definitions of a pointer are NULL)

2. possible dereference of NULL pointers (some reaching definitions of a pointer are NULL)

3. use of uninitialized variables (no reaching definition)

4. dead assignments (assignment of a value to a variable that is never subsequently used)

5. dead code (code that can never be executed)

6. array overflows

7. proper pairing of allocate/deallocate function calls

8. improper use of signed integers (who knows what this actually is)
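
To make the list concrete, here is a small sketch of my own (not drawn from either tool) of D code that would trip several of these checks; the names and values are made up:

    import std.stdio;

    void sketch(bool flag)
    {
        int* q = null;
        *q = 1;                  // check 1: every reaching definition of q is null

        int* p = new int;
        if (flag)
            p = null;
        *p = 2;                  // check 2: some reaching definitions of p are null

        int x = 3;               // check 4: dead assignment, the 3 is overwritten
        x = 4;                   //          before it is ever read
        writeln(x);

        return;
        writeln("unreachable");  // check 5: dead code that can never execute
    }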


Frankly, this is not an impressive list. These issues are discoverable using standard data flow analysis, and in fact such analysis is already part of Digital Mars' optimizer. Here is the current state of it for dmd (a short sketch follows the list):

1. Optimizer discovers it, but ignores the information. Due to the recent thread on it, I added a report for it for D (still ignored for C). The downside is I can no longer use *cast(char*)0=0 to drop me into the debugger, but I can live with that as assert(0) will do the same thing.

2. Optimizer collects the info, but ignores it, because people are annoyed by false positives.

3. Optimizer detects and reports it. Irrelevant for D, though, because variables are always initialized. The =void case is rare enough to be irrelevant.

4. Dead assignments are automatically detected and removed. I'm not convinced this should be reported, as dead assignments can legitimately appear in machine-generated source code. False positives annoy the heck out of users.

5. Dead code is detected and silently removed by the optimizer. The dmd front end will complain about dead code.

6. Arrays are solidly covered by a runtime check. There is code in the optimizer to detect many cases of overflows at compile time, but the code is currently disabled because the runtime check covers 100% of the cases.

7. Not done because it requires the user to specify what the paired functions are. Given this info, it is rather simple to graft onto existing data flow analysis.

8. D2 has acquired some decent checking for this.
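
For points 1 and 6, a minimal sketch of what that looks like in practice (the function names here are just illustrative):

    // Point 6: out-of-bounds indexing is caught by the runtime check.
    void store(int[] a, size_t i)
    {
        a[i] = 1;    // raises an array bounds error at runtime if i >= a.length
    }

    // Point 1: halting deliberately, without writing through a null pointer.
    void die()
    {
        assert(0);   // stops the program, serving the same purpose as *cast(char*)0 = 0
    }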


There's a lot of hoopla about these static checkers, but I'm not impressed by them based on what I can find out about them. What do you know about what these checkers do that is not on this list? Any other kinds of checking that would be great to implement?

D's dead code checking has been an encouraging success, and I think people will like the null dereference checks. More along these lines will be interesting.
October 01, 2009
Hello Walter,

> Frankly, this is not an impressive list. These issues are discoverable
> using standard data flow analysis, and in fact are part of Digital
> Mars' optimizer. Here is the current state of it for dmd:
> 
> 1. Optimizer discovers it, but ignores the information. Due to the
> recent thread on it, I added a report for it for D (still ignored for
> C). The downside is I can no longer use *cast(char*)0=0 to drop me
> into the debugger, but I can live with that as assert(0) will do the
> same thing.

nice

> 4. Dead assignments are automatically detected and removed. I'm not
> convinced this should be reported, as it can legitimately happen when
> generating source code. Generating false positives annoy the heck out
> of users.

vote++ on silent

> 6. Arrays are solidly covered by a runtime check. There is code in the
> optimizer to detect many cases of overflows at compile time, but the
> code is currently disabled because the runtime check covers 100% of
> the cases.


I'd advocate for running any compile-time check that never generates false positives, even if the runtime checks would catch the same thing. I'd rather know sooner than later.
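
For example (a hypothetical case, not a claim about what dmd reports today), an index that is a compile-time constant into a fixed-size array could be rejected before the code ever runs, with no possibility of a false positive:

    void g()
    {
        int[3] a;
        a[5] = 1;    // provably out of bounds; flagging this at compile time
                     // can never be a false positive
    }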


October 01, 2009
Walter Bright wrote, on October 1 at 11:21:
> I've been interested in having the D compiler take advantage of the flow analysis in the optimizer to do some more checking. Coverity and clang get a lot of positive press about doing this, but any details of exactly *what* they do have been either carefully hidden (in Coverity's case) or undocumented (clang's page on this is blank). All I can find is marketing hype and a lot of vague handwaving.

Clang is still in development; it will be released with LLVM for the first time in the upcoming 2.6 version. The C and Objective-C support is supposed to be fairly mature, but I guess documenting the static analyzer is not very high on their priority list (maybe this will change after the release).

You can ask on the Clang mailing list; the Clang developers (and LLVM developers in general) are very receptive.

> 1. Optimizer discovers it, but ignores the information. Due to the
> recent thread on it, I added a report for it for D (still ignored
> for C). The downside is I can no longer use *cast(char*)0=0 to drop
> me into the debugger, but I can live with that as assert(0) will do
> the same thing.

There are breakpoints too, you know? =P

> There's a lot of hoopla about these static checkers, but I'm not impressed by them based on what I can find out about them. What do you know about what these checkers do that is not on this list? Any other kinds of checking that would be great to implement?
> 
> D's dead code checking has been an encouraging success, and I think people will like the null dereference checks. More along these lines will be interesting.

You can take a look at sparse too; AFAIK it is used by the Linux kernel: http://www.kernel.org/pub/software/devel/sparse/

And Cppcheck: http://sourceforge.net/apps/mediawiki/cppcheck/index.php?title=Main_Page

There is a list of tools at Wikipedia too: http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis


-- 
Leandro Lucarella (AKA luca)                      http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
They love me like I was a brother
They protect me, listen to me
They dug me my very own garden
Gave me sunshine, made me happy

Nice dream, nice dream
Nice dream
October 01, 2009
Nick Sabalausky wrote:
> "Walter Bright" <newshound1@digitalmars.com> wrote in message 
>> 2. possible dereference of NULL pointers (some reaching definitions of a pointer are NULL)
>> 2. Optimizer collects the info, but ignores this, because people are annoyed by false positives.
>>
> 
> If you mean something like this:
> 
> Foo f;
> if(cond)
>     f = new Foo();
> f.bar();
> 
> Then I *want* the compiler to tell me. C# does this and I've never been annoyed by it, in fact I've always appreciated it. I'm not aware of any other C# user that has a problem with that either. If that's not what you mean though, then could you elaborate?

The problem crops up when there are two connected variables:

  void foo(bool flag)
  {
    char* p = null;
    if (flag)
        p = "hello";
    ...
    if (flag)
        bar(*p);
  }

The code is logically correct; no null pointer dereference is possible. However, the data flow analysis will see the *p and find two reaching definitions for p, null and "hello", even though only one of them actually reaches it.

Hence the false positive. To eliminate the false error report, the user would have to insert a redundant null check.
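
Concretely, the silenced version would look something like this sketch (bar here is just a stand-in for whatever uses the pointer):

    void bar(char) {}     // stand-in for the function in the example above

    void foo(bool flag)
    {
        const(char)* p = null;
        if (flag)
            p = "hello";
        // ...
        if (flag)
        {
            if (p !is null)   // redundant: p can never be null on this path,
                bar(*p);      // but only this extra check silences the tool
        }
    }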

Does this happen in practice? Yes.
October 01, 2009
On Thu, Oct 1, 2009 at 1:38 PM, Nick Sabalausky <a@a.a> wrote:

>> 2. possible dereference of NULL pointers (some reaching definitions of a
>> pointer are NULL)
>> 2. Optimizer collects the info, but ignores this, because people are
>> annoyed by false positives.
>>
>
> If you mean something like this:
>
> Foo f;
> if(cond)
>    f = new Foo();
> f.bar();

Probably more like

Foo f;
createAndSetOutParam(f);
f.bar();

or

Foo f;
if (a > 10) { f = new Foo(10); }
...
if (a > 10) {
   f.bar();
}

Or some complex intertwined gotos or any number of other things that would be hard for the compiler to verify without a lot of effort.

I think I'd want to try this out as an optional warning for a while to see how annoying the false positives really are in practice.  Still, it seems like there's a subset of cases where it can be proved with 100% certainty that the code is wrong.  Just reporting those cases would be a big win, I think.
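
A minimal sketch of that provably-wrong subset, where every reaching definition is null (relying on the fact that class references in D default to null):

    class Foo { void bar() {} }

    void definitelyWrong()
    {
        Foo f;       // default-initialized to null
        f.bar();     // every path reaches this call with f still null,
                     // so reporting it can never be a false positive
    }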

> Then I *want* the compiler to tell me. C# does this and I've never been annoyed by it, in fact I've always appreciated it. I'm not aware of any other C# user that has a problem with that either. If that's not what you mean though, then could you elaborate?

How does C# handle the cases above?


>> 6. array overflows
>> 6. Arrays are solidly covered by a runtime check. There is code in the
>> optimizer to detect many cases of overflows at compile time, but the code
>> is currently disabled because the runtime check covers 100% of the cases.
>>
>
> I'm puzzled by why you would prefer to leave this entirely runtime when some of it can be detected at compile-time. Normally you agree that catching something at compile-time is better whenever possible. So shouldn't the array overflows that can be detected at compile-time be detected at compile-time? I would certainly prefer that.

I agree. Run-time checks are only useful if the code is on a path that actually runs.  Compile-time checks are useful even if the code is on the branch not taken.

--bb
October 01, 2009
Walter Bright wrote:

[snip]

We need to have passable flow analysis to typecheck creation of immutable objects.
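
A rough sketch of why (a hypothetical constructor, not a claim about what the compiler currently accepts or rejects):

    class C
    {
        immutable int x;

        this(bool flag)
        {
            if (flag)
                x = 1;   // set on only one path; deciding whether this is a
                         // valid one-time initialization on every path through
                         // the constructor takes flow analysis
        }
    }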

Andrei
October 01, 2009
Walter Bright wrote:
> There's a lot of hoopla about these static checkers, but I'm not impressed by them based on what I can find out about them. What do you know about what these checkers do that is not on this list? Any other kinds of checking that would be great to implement?

I'm not familiar with C/C++ static checkers.

Eclipse's Java compiler has decent support for code checks.  I'm copying the list of items it can check here (except for those that seem Java-specific), in case it's of interest.  For each of the below, you can choose whether it's an error, a warning, or ignored.  I've included their defaults.

Code Style:

Non-static access to static member: Warning
Indirect access to static member: Ignored
Unqualified access to instance field: Ignored
Undocumented empty block: Ignored
Method with a constructor name: Warning
Parameter assignment: Ignored

Potential programming problems:

Assignment has no effect (e.g. 'x = x'): Warning
Possible accidental boolean assignment (e.g. 'if (a = b)'): Ignored
'finally' does not complete normally: Warning
Empty statement: Ignored
Hidden catch block: Warning
Inexact type match for vararg arguments: Warning
Enum type constant not covered on 'switch': Ignored
'switch' case fall-through: Ignored
Null pointer access: Warning
Potential null pointer access: Ignored
Comparing identical values ('x == x'): Warning
Missing synchronized modifier on inherited method: Ignored
Class overrides 'equals()' but not 'hashCode()': Ignored
Dead code (e.g. 'if (false)'): Warning

Name shadowing and conflicts:

Field declaration hides another field or variable: Ignored
Local variable declaration hides another field or variable: Ignored
	(If not ignored) Include constructor or setter method parameters
Type parameter hides another type: Warning
Method does not override package visible method: Warning
Interface method conflicts with protected 'Object' method: Warning

Unnecessary code:

Local variable is never read: Warning
Parameter is never read: Ignored
	(If not ignored) Ignore in overriding and implementing methods
Unused import: Warning
Unused local or private member: Warning
Redundant null check: Ignored
Unnecessary 'else' statement: Ignored
Unnecessary cast or 'instanceof' operation: Ignored
Unused 'break' or 'continue' label: Warning
Redundant super interface: Ignored

I understand DMD's policy on warnings, but some of the above checks are reasons why I've grown to like some warnings.  Of course it helps that Eclipse's compiler is most often used with its IDE.
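
To give a sense of what a few of these mean, here is an illustrative D sketch of the kinds of patterns such checks target (not a claim about what dmd reports today):

    int global;

    void patterns(int a)
    {
        int unused;          // local variable is never read

        a = a;               // assignment has no effect

        if (a == a) {}       // comparing identical values

        if (false)
            global = 1;      // dead code
    }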

-Dave
October 01, 2009
Nick Sabalausky wrote:
>> 3. use of uninitialized variables (no reaching definition)
>> 3. Optimizer detects and reports it. Irrelevant for D, though, because variables are always initialized. The =void case is rare enough to be irrelevant.
>>
> 
> D variable default-initialization is absolutely no different from your scenario of a programmer blindly tossing in =0 to shut up a compiler, *except* that the programmer is never even given the opportunity to do the right thing. This is *bad*. I *want* variables that haven't been manually inited to be statically treated as uninited. C# does this and it works great.

The difference is that the maintenance programmer won't be left puzzling over why there is an explicit assignment to the variable that is never used. The point of default initialization is consistency in the resulting behavior. Also, the optimizer will remove nearly all of the default initializers when they turn out to be dead assignments.
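
For reference, a small sketch of the default-initialization behavior under discussion, with =void as the rare opt-out:

    void defaults()
    {
        int i;            // default-initialized to 0
        double d;         // default-initialized to NaN
        char* p;          // default-initialized to null
        int u = void;     // explicitly left uninitialized (the rare opt-out)
    }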

Anyhow, I think this issue was beaten to death in the previous thread on null dereference. I don't wish to divert this thread into rediscussing it; let's stick to the question of what other kinds of bug-detecting data flow analyses there are.


>> 4. dead assignments (assignment of a value to a variable that is never subsequently used)
>> 4. Dead assignments are automatically detected and removed. I'm not convinced this should be reported, as it can legitimately happen when generating source code. Generating false positives annoy the heck out of users.
>>
> 
> I'll agree with you here. But it might be nice to have an option to just simply report them anyway for when the programmer wants to see if there's any of these around that he can clean up.

I congenitally dislike optional warnings, as I've pontificated at length about here before <g>. The problem is it makes for a wishy-washy definition of the language, and muddies what is legal versus illegal. D needs to advance the state of the art with clear thinking about what is legal and what isn't. Warnings and false positives are failures of language design.


>> 6. array overflows
>> 6. Arrays are solidly covered by a runtime check. There is code in the optimizer to detect many cases of overflows at compile time, but the code is currently disabled because the runtime check covers 100% of the cases.
>>
> 
> I'm puzzled by why you would prefer to leave this entirely runtime when some of it can be detected at compile-time. Normally you agree that catching something at compile-time is better whenever possible. So shouldn't the array overflows that can be detected at compile-time be detected at compile-time? I would certainly prefer that.

Because when I implemented it, I discovered that the compile-time check rarely (if ever) caught actual bugs; the real problems could only be caught at runtime. Normally one uses things like foreach, which automatically generates code that cannot overflow the array.
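
For example, the usual idiom sidesteps the bounds question entirely:

    int sum(int[] a)
    {
        int total;
        foreach (x; a)      // the loop is bounded by the array itself,
            total += x;     // so no out-of-bounds index is possible
        return total;
    }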



I generally regard as evil:

1. bugs that have erratic, random, irreproducible symptoms

2. source code that doesn't have an obvious purpose (this includes code inserted solely to suppress a false positive or warning)

I regard as undesirable:

3. wishy-washy warnings

4. overzealous compiler messages that are more akin to nagging than finding actual bugs
October 01, 2009
Walter Bright wrote:
> Anyhow, I think this issue was beaten to death in the previous thread on null dereference. I don't wish to divert this thread into rediscussing 

I don't want to make this "discussion" even more complicated, but some people (such as me) like the existing default initialization behavior, and find the C# behavior annoying.  It doesn't really have to do with the null pointer problem either.

I especially like that it's consistent with the initialization of member variables in structs/classes.
October 01, 2009
Walter Bright wrote:

> Nick Sabalausky wrote:
>> "Walter Bright" <newshound1@digitalmars.com> wrote in message
>>> 2. possible dereference of NULL pointers (some reaching definitions of a
>>> pointer are NULL)
>>> 2. Optimizer collects the info, but ignores this, because people are
>>> annoyed by false positives.
>>>
>> 
>> If you mean something like this:
>> 
>> Foo f;
>> if(cond)
>>     f = new Foo();
>> f.bar();
>> 
>> Then I *want* the compiler to tell me. C# does this and I've never been annoyed by it, in fact I've always appreciated it. I'm not aware of any other C# user that has a problem with that either. If that's not what you mean though, then could you elaborate?
> 
> The problem crops up when there are two connected variables:
> 
>    void foo(bool flag)
>    {
>      char* p = null;
>      if (flag)
>          p = "hello";
>      ...
>      if (flag)
>          bar(*p);
>    }
> 
> The code is logically correct, there is no null pointer dereference possible. However, the data flow analysis will see the *p and see two reaching definitions for p: null and "hello", even though only one actually reaches.
> 
> Hence the false positive. To eliminate the false error report, the user would have to insert a redundant null check.
> 
> Does this happen in practice? Yes.

How hard is this to implement? I ask because I would suggest trying it out to see how much it catches versus how annoying it is. In VB.NET I get quite a few false positives, but fewer in C#. It's all about how it fits with the rest of the language; VB.NET doesn't have a ternary operator, for example. In D you have less need for pointers and generally a much more expressive vocabulary at your disposal than in other C-family languages.

