October 01, 2009
David Gileadi wrote:
> Eclipse's Java compiler has decent support for code checks.  I'm copying the list of items it can check here (except for those that seem Java-specific), in case it's of interest.

This is the kind of thing I was asking for. Thanks! There are some good ideas in there.
October 01, 2009
Lutger wrote:
> How hard is this to implement? I ask because I would suggest trying it out and seeing how much it catches vs. how annoying it is. In VB.NET I get quite a few false positives, but fewer in C#. It's all about how it fits with the rest of the language. VB.NET doesn't have a ternary operator, for example. In D you have less need for pointers and generally a much more expressive vocabulary at your disposal than in other C-family languages. 

I did implement it long ago, then disabled it because of too many false positives. It was more annoying than useful.
October 01, 2009
David Gileadi wrote:

....
> Non-static access to static member: Warning

This one has helped me catch some bugs in .NET; it's really nice, especially with overloaded functions.
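
Roughly the pattern it catches -- a minimal D-flavored sketch with hypothetical names:

    struct Counter
    {
        static int total;   // one copy, shared by every instance
        int count;          // per-instance
    }

    void bump(Counter* c)
    {
        c.total += 1;   // reads like a per-instance update, but it mutates the one
                        // shared 'total'; a non-static-access warning flags this line
        c.count += 1;   // this one really is per-instance
    }

    void main()
    {
        auto a = new Counter;
        auto b = new Counter;
        bump(a);
        bump(b);
        // Counter.total is now 2 -- easy to miss when it's written as a.total
        // or b.total instead of Counter.total
    }
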
October 01, 2009
On Thu, 1 Oct 2009, Walter Bright wrote:

> I've been interested in having the D compiler take advantage of the flow analysis in the optimizer to do some more checking. Coverity and clang get a lot of positive press about doing this, but any details of exactly *what* they do have been either carefully hidden (in Coverity's case) or undocumented (clang's page on this is blank). All I can find is marketing hype and a lot of vague handwaving.
> 
> Here is what I've been able to glean from much time spent with google on what they detect and my knowledge of how data flow analysis works:
> 

Snipped a lot of the detail, because that's not really what makes the tools interesting.  There are a couple of things that do, in my opinion -- speaking from a little experience, having used Fortify and looked at Coverity a couple of times over the years (and I would be using it if it weren't so much more expensive than Fortify).

1) Rich flow control.  They go well beyond what's typically done by compilers during their optimization passes.  They tend to be whole-code in scope and actually DO the parts that are hard, like cross-expression variable value tracking similar to a couple of examples in this thread. Function boundaries are no obstacle to them.  The only obstacle is where source isn't provided.

2) Due to working with whole source bases, the UI for managing the data produced is critical to overall usability.  A lot of time goes into making it easy to manage the output.. both for single runs and for cross-run flow of data.  Some examples:

   * suppression of false positives,
   * graphing of issue trends
   * categorization of issue types

3) Rule creation.  The core engine usually generates some digested dataset upon which rules are evaluated.  The systems come with a built-in set that does the sorts of things already talked about.  In addition they come with the ability to develop new rules specific to your application and business needs.  For example:

   * tracking of taint from user data
   * what data is acceptable to log to files (for example NOT credit-cards)

4) They're expected to be slower than compilation, so it's ok to do things that are computationally prohibitive to do during compilation cycles.

----

I've seen these tools detect some amazingly subtle bugs in C and C++ code. They're particularly handy in messy code.  They can help find memory leaks where the call graphs are arbitrarily obscure, and sites where NULL pointers are passed into a function that dereferences them without a null check, even when the call graph has many layers.
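
A minimal D-style sketch of that layered null-dereference pattern (hypothetical names throughout):

    struct Config { int threshold; }

    // none of these layers checks for null
    int readThreshold(Config* c) { return c.threshold; }
    int layer2(Config* c) { return readThreshold(c); }
    int layer1(Config* c) { return layer2(c); }

    // may return null when the name is unknown
    Config* lookup(string name)
    {
        if (name == "default")
        {
            auto c = new Config;
            c.threshold = 42;
            return c;
        }
        return null;
    }

    void main()
    {
        auto cfg = lookup("missing");   // comes back null
        layer1(cfg);                    // the dereference happens three calls away,
                                        // inside readThreshold
    }

A per-expression checker sees nothing wrong with any single line; tracking the value of cfg across expressions and through the call chain is what catches it.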

Yes, rigid contract systems and richer type systems can help reduce the need for some of these sorts of checks, but as we all know, there's tradeoffs.


That help?

Later,
Brad

October 02, 2009
Brad Roberts wrote:
> 1) Rich flow control.  They go well beyond what's typically done by compilers during their optimization passes.  They tend to be whole-code in scope and actually DO the parts that are hard, like cross-expression variable value tracking similar to a couple of examples in this thread.  Function boundaries are no obstacle to them.  The only obstacle is where source isn't provided.

Modern compiler optimizers (including dmc and dmd) DO do cross-expression variable tracking. They just don't often do it inter-function (called inter-procedural analysis) because of time and memory constraints, not that it is technically more difficult.

C and C++ do have some issues with inter-procedural analysis because the compiler view of them, at heart, is single file. D is much more conducive to that because the compiler sees an arbitrarily large part of the source code, and can obviously see all of it if the user so desires.


> 2) Due to working with whole source bases, the UI for managing the data produced is critical to overall usability.  A lot of time goes into making it easy to manage the output.. both for single runs and for cross-run flow of data.  Some examples:
> 
>    * suppression of false positives, 

I'd rather do a better job and not have false positives.

>    * graphing of issue trends

That's a crock <g>.

>    * categorization of issue types

I'm not convinced that is of critical value.


> 3) Rule creation.  The core engine usually generates some digested dataset upon which rules are evaluated.  The systems come with a built-in set that does the sorts of things already talked about.  In addition they come with the ability to develop new rules specific to your application and business needs.  For example:
> 
>    * tracking of taint from user data
>    * what data is acceptable to log to files (for example NOT credit-cards)

There have been several proposals for user-defined attributes for types; I think that is better than having some external rule file.
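
With something along those lines, the two rule examples above could hypothetically be written next to the data they describe. A rough sketch only -- not any existing syntax, and the marker names are made up:

    struct Tainted   {}   // hypothetical marker: data that came from the user
    struct Sensitive {}   // hypothetical marker: data that must never be logged

    struct Payment
    {
        @Sensitive string cardNumber;
        string invoiceId;
    }

    void log(string msg) { /* append to the application log */ }

    void handle(@Tainted string userInput, Payment p)
    {
        log(p.invoiceId);       // fine
        // log(p.cardNumber);   // a "don't log @Sensitive data" rule rejects this
        // query(userInput);    // and a taint rule rejects this, absent sanitizing
    }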


> 4) They're expected to be slower than compilation, so it's ok to do things that are computationally prohibitive to do during compilation cycles.

I agree.

> ----
> 
> I've seen these tools detect some amazingly subtle bugs in C and C++ code.  They're particularly handy in messy code.  They can help find memory leaks where the call graphs are arbitrarily obscure, and sites where NULL pointers are passed into a function that dereferences them without a null check, even when the call graph has many layers.

Once you get the data flow analysis equations right, they'll detect it every time regardless of how subtle or layered the call graph is.
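
(Roughly, these are the standard forward equations, solved per basic block B and iterated to a fixed point:

    in(B)  = ∪ { out(P) : P a predecessor of B }
    out(B) = gen(B) ∪ (in(B) - kill(B))

with null-pointer tracking being one particular choice of what flows through them.)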

> Yes, rigid contract systems and richer type systems can help reduce the need for some of these sorts of checks, but as we all know, there's tradeoffs.
> 
> 
> That help?

Yes, very much. In particular, I wasn't sure Coverity did inter-procedural analysis.
October 02, 2009
Walter Bright wrote:
> Brad Roberts wrote:
>> 1) Rich flow control.  They go well beyond what's typically done by compilers during their optimization passes.  They tend to be whole-code in scope and actually DO the parts that are hard, like cross-expression variable value tracking similar to a couple of examples in this thread.  Function boundaries are no obstacle to them.  The only obstacle is where source isn't provided.
> 
> Modern compiler optimizers (including dmc and dmd) DO do cross-expression variable tracking. They just don't often do it inter-function (called inter-procedural analysis) because of time and memory constraints, not that it is technically more difficult.

Exactly my point.. compilers tend to make the opposite trade-off from tools like Coverity.  Not that compilers can't or don't.  They just usually do a small subset of what is possible in the grander sense of 'possible'.

> C and C++ do have some issues with inter-procedural analysis because the compiler view of them, at heart, is single file. D is much more conducive to that because the compiler sees an arbitrarily large part of the source code, and can obviously see all of it if the user so desires.

Neither C nor C++ as a language is _required_ to do single-file compilation. That's just what most compilers do.  In fact, gcc/g++ have been capable of doing whole-app compilation for a couple of years now -- though not many people use it that way as far as I can tell.  See also the exact same issue as above.. it's a trade-off.

>> 2) Due to working with whole source bases, the UI for managing the data produced is critical to overall usability.  A lot of time goes into making it easy to manage the output.. both for single runs and for cross-run flow of data.  Some examples:
>>
>>    * suppression of false positives,
> 
> I'd rather do a better job and not have false positives.

Of course you would.. everyone would.  It's a meaningless statement since no one would ever contradict it with any seriousness.  At the risk of repeating myself.. same tradeoffs again.

>>    * graphing of issue trends
> 
> That's a crock <g>.

Uh, whatever.  Most of the rest of us humans respond much better to pictures and trends than to raw numbers.  Show me some visual indication of the quality of my code (ignoring the arguments about the validity of such graphs) and I can pretty much guarantee that I'll work to improve that measure.  Nearly everyone I've ever worked with behaves similarly.. once they agree that the statistic being measured is useful.  One of the best examples is percent of code covered by unit tests.  The same applies to number of non-false positive issues discovered through static analysis.

A specific case of this:  At Informix we had a step in our build process that ran lint (yes, it's ancient, but this was a decade ago and the practice was at least a decade old before I got there).  New warnings weren't tolerated. The build automatically reported any delta over the previous build.  It was standard practice and kept the code pretty darned clean.

>>    * categorization of issue types
> 
> I'm not convinced that is of critical value.

You don't need to be.  You view too many things in black/white terms.

>> 3) Rule creation.  The core engine usually generates some digested dataset upon which rules are evaluated.  The systems come with a built-in set that does the sorts of things already talked about.  In addition they come with the ability to develop new rules specific to your application and business needs.  For example:
>>
>>    * tracking of taint from user data
>>    * what data is acceptable to log to files (for example NOT
>> credit-cards)
> 
> There have been several proposals for user-defined attributes for types; I think that is better than having some external rule file.

Again, this is where trade-offs come in.  If it can be done cheaply enough to warrant being done during compilation and is accurate enough in small-scope analysis.. yay.  But sometimes you still want to do things more completely, even when they take more time.

>> 4) They're expected to be slower than compilation, so it's ok to do things that are computationally prohibitive to do during compilation cycles.
> 
> I agree.
> 
>> ----
>>
>> I've seen these tools detect some amazingly subtle bugs in C and C++ code.  They're particularly handy in messy code.  They can help find memory leaks where the call graphs are arbitrarily obscure, and sites where NULL pointers are passed into a function that dereferences them without a null check, even when the call graph has many layers.
> 
> Once you get the data flow analysis equations right, they'll detect it every time regardless of how subtle or layered the call graph is.
> 
>> Yes, rigid contract systems and richer type systems can help reduce the need for some of these sorts of checks, but as we all know, there's tradeoffs.
>>
>>
>> That help?
> 
> Yes, very much. In particular, I wasn't sure coverity did inter-procedural analysis.

That's essentially all it does, if you boil away all the other interesting stuff layered on top of that core, but it does it very well and with a lot of tooling around it.

Later,
Brad
October 02, 2009
Nick Sabalausky wrote:
> If you accept the idea of a compiler (like DMD) having rudimentary built-in optional versions of normally separate tools like profiling, unittesting, doc generation, etc., and you accept that lint tools are perfectly fine tools to use (as I think you do, or am I mistaken?), then I don't see what would make lint tools an exception to the "built-ins are ok" attitude (especially since a separate one would require a lot of redundant parsing/analysis.)

This is a more general comment on your post (and similar ones by others, it's a recurring theme):

Consider the Bible. It's long and complicated, and by careful examination of it you can find a verse here and there to justify *any* behavior.

D is complicated, and is founded on principles that are not orthogonal - they are often at odds with each other. Any attempt to take one particular aspect of D's behavior and use it as a rule to impose elsewhere is surely doomed to conflict with some other rule.

The only reasonable way forward is to evaluate each idea not only in terms of all of D's principles, but also on its own merits, and throw in one's best judgment.

Nearly a decade with D has now shown that some ideas and choices were dead wrong, but others were more right than I even dreamed <g>.
October 02, 2009
Brad Roberts wrote:
>>> * graphing of issue trends
>> That's a crock <g>.
> 
> Uh, whatever.  Most of the rest of us humans respond much better to
> pictures and trends than to raw numbers.  Show me some visual
> indication of the quality of my code (ignoring the arguments about
> the validity of such graphs) and I can pretty much guarantee that
> I'll work to improve that measure.  Nearly everyone I've ever worked
> with behaves similarly.. once they agree that the statistic being measured is useful.  One of the best examples is percent of code
> covered by unit tests.  The same applies to number of non-false
> positive issues discovered through static analysis.
> 


A long time ago, the company I worked for decided to put up a huge chart
on the wall that everyone could see, and every day the current bug count
was plotted on it. The idea was to show a downward trend.

It wasn't very long (a few days) before this scheme completely backfired:

1. engineers stopped submitting new bug reports

2. the engineers and QA would argue about what was a bug and what wasn't

3. multiple bugs would get combined into one bug report so it only
counted once

4. if a bug was "X is not implemented", then when X was implemented,
there might be 3 or 4 bugs against X. Therefore, X did not get implemented.

5. there was a great rush to submit half-assed fixes before the daily
count was made

6. people would invent bugs for which they would simultaneously submit fixes (look ma, I fixed all these bugs!)

7. arguing about it started to consume a large fraction of the
engineering day, including the managers who were always called in to
resolve the disputes

In other words, everyone figured out they were being judged on the
graph, not the quality of the product, and quickly changed their
behavior to "work the graph" rather than the quality.

To the chagrin of the QA staff, management finally tore down the chart.

Note that nobody involved in this was a moron. They all knew exactly what was happening, it was simply irresistible.

> A specific case of this:  At Informix we had a step in our build process
> that ran lint (yes, it's ancient, but this was a decade ago and the
> practice was at least a decade old before I got there).  New warnings
> weren't tolerated.  The build automatically reported any delta over the
> previous build.  It was standard practice and kept the code pretty
> darned clean.

I think that's something different - it's not graphing or trending the data.
October 02, 2009
Walter Bright wrote:
> Brad Roberts wrote:
>>>> * graphing of issue trends
>>> That's a crock <g>.
>>
>> Uh, whatever.  Most of the rest of us humans respond much better to pictures and trends than to raw numbers.  Show me some visual indication of the quality of my code (ignoring the arguments about the validity of such graphs) and I can pretty much guarantee that I'll work to improve that measure.  Nearly everyone I've ever worked with behaves similarly.. once they agree that the statistic being measured is useful.  One of the best examples is percent of code covered by unit tests.  The same applies to number of non-false positive issues discovered through static analysis.
>>
> 
> 
> A long time ago, the company I worked for decided to put up a huge chart on the wall that everyone could see, and every day the current bug count was plotted on it. The idea was to show a downward trend.
> 
> It wasn't very long (a few days) before this scheme completely backfired:
> 
> 1. engineers stopped submitting new bug reports
> 
> 2. the engineers and QA would argue about what was a bug and what wasn't
> 
> 3. multiple bugs would get combined into one bug report so it only counted once
> 
> 4. if a bug was "X is not implemented", then when X was implemented,
> there might be 3 or 4 bugs against X. Therefore, X did not get implemented.
> 
> 5. there was a great rush to submit half-assed fixes before the daily count was made
> 
> 6. people would invent bugs for which they would simultaneously submit fixes (look ma, I fixed all these bugs!)
> 
> 7. arguing about it started to consume a large fraction of the engineering day, including the managers who were always called in to resolve the disputes
> 
> In other words, everyone figured out they were being judged on the graph, not the quality of the product, and quickly changed their behavior to "work the graph" rather than the quality.
> 
> To the chagrin of the QA staff, management finally tore down the chart.
> 
> Note that nobody involved in this was a moron. They all knew exactly what was happening, it was simply irresistible.

The existence of a bad case doesn't disprove usefulness in general.  Yes, I agree that the number of bugs is a bad metric to measure all by itself.

Water can drown a person, but that doesn't make it something to avoid.

Sigh,
Brad
October 02, 2009
Hello Walter,

>> 3) Rule creation.  The core engine usually generates some digested
>> dataset upon which rules are evaluated.  The systems come with a
>> built-in set that does the sorts of things already talked about.  In
>> addition they come with the ability to develop new rules specific to
>> your application and business needs.  For example:
>> 
>> * tracking of taint from user data
>> * what data is acceptable to log to files (for example NOT
>> credit-cards)
> There have been several proposals for user-defined attributes for
> types; I think that is better than having some external rule file.
> 

For open source and libs, yes. For proprietary code bases, I'd say it's about a wash. Having it in another file could make the language/code base easier to read and also allow a much more powerful rules language (because it doesn't have to fit in the host language). And because only you will be maintaining the code, needing another tool (that you already have) and another build step isn't much of an issue.