Notes from C++ static analysis - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Notes from C++ static analysis

Thread overview

Notes from C++ static analysis
Jun 26, 2013 bearophile
Jun 26, 2013 H. S. Teoh
Jun 26, 2013 Adam D. Ruppe
Jun 26, 2013 dennis luehring
Jun 26, 2013 H. S. Teoh
Jun 26, 2013 Adam D. Ruppe
Jun 26, 2013 Marco Leise
Jun 26, 2013 Adam D. Ruppe
Jun 26, 2013 Dmitry Olshansky
Jun 26, 2013 H. S. Teoh
Jun 26, 2013 Dmitry Olshansky
Jun 26, 2013 Paulo Pinto
Jun 26, 2013 H. S. Teoh
Jun 27, 2013 Paulo Pinto
Jun 26, 2013 Walter Bright
Jun 26, 2013 Walter Bright
Jun 26, 2013 Timon Gehr
Jun 27, 2013 Walter Bright
Jun 27, 2013 Peter Williams
Jun 27, 2013 Andrei Alexandrescu
Jun 27, 2013 Paulo Pinto
Jun 27, 2013 monarch_dodra
Jun 26, 2013 Jonathan M Davis
Jun 27, 2013 Andrei Alexandrescu
Jun 27, 2013 Paulo Pinto
Jun 26, 2013 Adam D. Ruppe
Jun 26, 2013 H. S. Teoh
Jun 27, 2013 Walter Bright
Jun 26, 2013 Andrei Alexandrescu
Jun 26, 2013 dennis luehring
Jun 26, 2013 dennis luehring
Jun 27, 2013 Andrei Alexandrescu
Jun 26, 2013 Andrej Mitrovic
Jun 27, 2013 Andrei Alexandrescu
Jun 27, 2013 Andrej Mitrovic
Jun 27, 2013 Andrei Alexandrescu
Jun 27, 2013 Andrej Mitrovic
Jun 27, 2013 Andrei Alexandrescu
Jun 27, 2013 Andrej Mitrovic
Jun 27, 2013 Adam D. Ruppe
Jun 27, 2013 Andrej Mitrovic
Jun 27, 2013 bearophile
Jun 28, 2013 Peter Williams
Jun 27, 2013 Andrej Mitrovic
Jun 27, 2013 bearophile
Jun 27, 2013 Adam D. Ruppe
Jun 27, 2013 Andrej Mitrovic
Jun 27, 2013 Andrei Alexandrescu
Jun 27, 2013 Peter Williams
Jun 27, 2013 H. S. Teoh
Jun 27, 2013 Peter Williams
Jun 27, 2013 Andrei Alexandrescu
Jun 27, 2013 Jonathan M Davis
Jun 27, 2013 bearophile
Jun 27, 2013 monarch_dodra
Jun 26, 2013 bearophile
Jun 27, 2013 Andrei Alexandrescu
Jun 27, 2013 Jonathan M Davis
Jun 27, 2013 Andrei Alexandrescu
Jun 27, 2013 Jonathan M Davis
Jun 27, 2013 Andrei Alexandrescu
Jun 27, 2013 Andrei Alexandrescu
Jun 27, 2013 deadalnix
Jun 27, 2013 Adam D. Ruppe
Jun 27, 2013 Jonathan M Davis
Jun 28, 2013 Peter Williams
Jun 27, 2013 Nicolas Sicard
Jun 27, 2013 Artur Skawina
Jun 26, 2013 Walter Bright
Jun 27, 2013 renoX
Jun 28, 2013 Jonathan M Davis
Jun 28, 2013 Peter Williams
Jun 28, 2013 Walter Bright
Jun 28, 2013 Jonathan M Davis

June 26, 2013

Notes from C++ static analysis

Posted by bearophile

bearophile

An interesting blog post found through Reddit:

http://randomascii.wordpress.com/2013/06/24/two-years-and-thousands-of-bugs-of-/

The post is about the heavy usage of static analysis on lot of C++ code. They have a Python script that shows new warnings only the first time they appear in the code base. This is a simple but very useful memory, to solve one of the most important downsides of warnings.

The article groups bugs in some different categories. Some of the D code below is derived from the article.

- - - - - - - - - - - - - - - - - -

Format strings:

The most common problem they find are errors in the format string of printf-like functions (despite the code is C++):

>The top type of bug that /analyze finds is format string errors – mismatches between printf-style format strings and the corresponding arguments. Sometimes there is a missing argument, sometimes there is an extra argument, and sometimes the arguments don’t match, such as printing a float, long or ‘long long’ with %d.<

Such errors in D are less bad, because writef("%d",x) is usable for all kind of integral values. On the other hand this D program prints just "10" with no errors, ignoring the second x:

import std.stdio;
void main() {
    size_t x = 10;
    writefln("%d", x, x);
}

In a modern statically typed language I'd like such code to give a compile-time error.

This is how how Rust gets this right:

println(fmt!("hello, %d", j))

https://github.com/mozilla/rust/blob/master/src/libsyntax/ext/fmt.rs
https://github.com/Aatch/rust-fmt

In D it can be written a safe function that needs no extra static analysis:

ctWritefln!"%d"(x, x);

- - - - - - - - - - - - - - - - - -

Variable shadowing:

This is a much less common problem in D because this code gives a errors:

void main() {
    bool result = true;
    if (true) {
        bool result = false;
    }
    foreach (i; 0 .. 10) {
        foreach (i; 0 .. 20) {
        }
    }
    for (int i = 0; i < 10; i++) {
        for (int i = 0; i < 20; i++) {
        }
    }
}


test.d(4): Error: is shadowing declaration test.main.result
test.d(7): Error: is shadowing declaration test.main.i
test.d(11): Error: is shadowing declaration test.main.i


There are some situations where this doesn't help, but they are not common in idiomatic D code:

void main() {
    int i, j;
    for (i = 0; i < 10; i++) {
        for (i = 0; i < 20; i++) {
        }
    }
}


In D this is one case similar to variable shadowing, that the compiler doesn't help you with:

class Foo {
    int x, y, z, w;
    this(in int x_, in int y_, in int z_, in int w_) {
        this.x = x_;
        this.y = y_;
        this.z = z;
        this.w = w_;
    }
}
void main() {
    auto f = new Foo(1, 2, 3, 4);
}


I believe the compile should give some warning there:
http://d.puremagic.com/issues/show_bug.cgi?id=3878

- - - - - - - - - - - - - - - - - -

Logic bugs:


bool someFunction() { return true; }
uint getFlags() { return uint.max; }
void main() {
    uint kFlagValue = 2u ^^ 14;
    if (someFunction() || getFlags() | kFlagValue) {}
}


The D compiler gives no warnings. from the article:

>The code above is an expensive and misleading way to go "if ( true )". Visual Studio gave a clear warning that described the problem well:

    warning C6316: Incorrect operator:  tested expression is constant and non-zero.  Use bitwise-and to determine whether bits are set.<


See:
http://msdn.microsoft.com/en-us/library/f921xb29.aspx

A simpler example:

enum INPUT_VALUE = 2;
void f(uint flags) {
    if (flags | INPUT_VALUE) {}
}


I have just added it to Bugzilla:
http://d.puremagic.com/issues/show_bug.cgi?id=10480



Another problem:

void main() {
    bool bOnFire = true;
    float angle = 20.0f + bOnFire ? 5.0f : 10.0f;
}


D compiler gives no warnings.

Visual Studio gave:

>warning C6336: Arithmetic operator has precedence over question operator, use parentheses to clarify intent.
warning C6323: Use of arithmetic operator on Boolean type(s).<


See:
http://msdn.microsoft.com/en-us/library/ms182085.aspx

I opened an ER lot of time ago, "Require parenthesization of ternary operator when compounded":
http://d.puremagic.com/issues/show_bug.cgi?id=8757

- - - - - - - - - - - - - - - - - -

Signed, unsigned, and tautologies:

Currently this gives no warnings:


>This code would have been fine if both a and b were signed – but one of them wasn’t, making this operation nonsensical.<


import std.algorithm: max;
void main() {
    int a = -10;
    uint b = 5;
    auto result = max(0, a - b);
}



>We had quite a few places where we were checking to see if unsigned variables were less than zero -- now we have fewer.<

This is a well known problem, it's an issue in Bugzilla since lot of time, and it seems there is no simple way to face it in D.

Bye,
bearophile

June 26, 2013

Re: Notes from C++ static analysis

Posted by H. S. Teoh
in reply to bearophile

H. S. Teoh

Posted in reply to bearophile

On Wed, Jun 26, 2013 at 08:08:08PM +0200, bearophile wrote:
> An interesting blog post found through Reddit:
> 
> http://randomascii.wordpress.com/2013/06/24/two-years-and-thousands-of-bugs-of-/
[...]
> The most common problem they find are errors in the format string of printf-like functions (despite the code is C++):

None of my C++ code uses iostream. I still find stdio.h more comfortable to use, in spite of its many problems. One of the most annoying features of iostream is the abuse of operator<< and operator>> for I/O. Format strings are an ingenious idea sorely lacking in the iostream department (though admittedly the way it was implemented in stdio is rather unsafe, due to the inability of C to do many compile-time checks).

> >The top type of bug that /analyze finds is format string errors – mismatches between printf-style format strings and the corresponding arguments. Sometimes there is a missing argument, sometimes there is an extra argument, and sometimes the arguments don’t match, such as printing a float, long or ‘long long’ with %d.<
> 
> Such errors in D are less bad, because writef("%d",x) is usable for
> all kind of integral values.

Less bad? Actually, IME format strings in D are amazingly useful! You can pretty much use %s 99% of the time, because static type inference works so well in D! The only time I actually write anything other than %s is when I need to specify floating-point formatting options, like %precision, or scientific format vs. decimal, etc..

Then throw in the array formatters %(...%), and D format strings will totally blow C's stdio out of the water.

> On the other hand this D program prints
> just "10" with no errors, ignoring the second x:
> 
> import std.stdio;
> void main() {
>     size_t x = 10;
>     writefln("%d", x, x);
> }
> 
> In a modern statically typed language I'd like such code to give a compile-time error.

This looks like a bug to me. Please file one. :)

[...]
> There are some situations where this doesn't help, but they are not common in idiomatic D code:
> 
> void main() {
>     int i, j;
>     for (i = 0; i < 10; i++) {
>         for (i = 0; i < 20; i++) {
>         }
>     }
> }

I don't think this particular error is compiler-catchable. Sometimes, you *want* the nested loop to reuse the same index (though probably not in exactly the formulation as above, most likely the inner loop will omit the i=0 part). The compiler can't find such errors unless it reads the programmer's mind.

> In D this is one case similar to variable shadowing, that the compiler doesn't help you with:
> 
> class Foo {
>     int x, y, z, w;
>     this(in int x_, in int y_, in int z_, in int w_) {
>         this.x = x_;
>         this.y = y_;
>         this.z = z;
>         this.w = w_;
>     }
> }

Yeah, this one bit me before. Really hard. I had code that looked like this:

	class C {
		int x;
		this(int x) {
			x = f(x);	// ouch
		}
		int f(int x) { ... }
	}

This failed horribly, so I rewrote the //ouch line to:

	this.x = x;

But that is still very risky, since in a member function that doesn't shadow x, the above line is equivalent to this.x = this.x.

Anyway, in the end I decided that naming member function arguments after member variables is a Very Stupid Idea, and that it should never be done. It would be nice if the D compiler rejected such code.

[...]
> Logic bugs:
[...]
> enum INPUT_VALUE = 2;
> void f(uint flags) {
>     if (flags | INPUT_VALUE) {}
> }
> 
> 
> I have just added it to Bugzilla: http://d.puremagic.com/issues/show_bug.cgi?id=10480
[...]

Huh? Shouldn't that be (flags & ~INPUT_VALUE)?

How would the compiler catch such cases in general, though? I mean, like in arbitrarily complex boolean expressions.

T

-- 
It said to install Windows 2000 or better, so I installed Linux instead.

June 26, 2013

Re: Notes from C++ static analysis

Posted by Adam D. Ruppe
in reply to bearophile

Adam D. Ruppe

Posted in reply to bearophile

On Wednesday, 26 June 2013 at 18:08:10 UTC, bearophile wrote:
> In D this is one case similar to variable shadowing, that the compiler doesn't help you with:
>         this.z = z;

I'd argue that assigning something to itself is never useful.

June 26, 2013

Re: Notes from C++ static analysis

Posted by H. S. Teoh
in reply to Adam D. Ruppe

H. S. Teoh

Posted in reply to Adam D. Ruppe

On Wed, Jun 26, 2013 at 08:57:46PM +0200, Adam D. Ruppe wrote:
> On Wednesday, 26 June 2013 at 18:08:10 UTC, bearophile wrote:
> >In D this is one case similar to variable shadowing, that the compiler doesn't help you with:
> >        this.z = z;
> 
> I'd argue that assigning something to itself is never useful.

Unless opAssign does something unusual.

But yeah, that's bad practice and the compiler should warn about it. The reason it doesn't, though, IIRC is because of generic code, where it would suck to have to special-case when two template arguments actually alias the same thing.

T

-- 
People say I'm arrogant, and so I am!!

June 26, 2013

Re: Notes from C++ static analysis

Posted by Adam D. Ruppe
in reply to H. S. Teoh

Adam D. Ruppe

Posted in reply to H. S. Teoh

On Wednesday, 26 June 2013 at 18:54:17 UTC, H. S. Teoh wrote:
>> import std.stdio;
>> void main() {
>>     size_t x = 10;
>>     writefln("%d", x, x);
>> }
>> 
>> In a modern statically typed language I'd like such code to give a compile-time error.
>
> This looks like a bug to me. Please file one. :)

Not necessarily, since you might want a format string to be a runtime variable, like when doing translations. I could live with there being another function that does runtime though.

Things might be confusing too because of positional parameters (%1$d). You might offer something that isn't necessarily used:

config.dateFormat = "%3$d/%2$d";
writeln(config.dateFormat, year, month, day);

> Anyway, in the end I decided that naming member function arguments after member variables is a Very Stupid Idea,

Blargh, I do it a lot. But I would be ok with the lhs of a member when there's a parameter of the same name requiring that you call it this.x.

> How would the compiler catch such cases in general, though? I mean, like in arbitrarily complex boolean expressions.

The Microsoft compiler warned about it, after constant folding, working out to if(1).

I'm a little concerned that it would complain about some false positives though, which can be quite deliberate in D, like if(__ctfe).

June 26, 2013

Re: Notes from C++ static analysis

Posted by Andrei Alexandrescu
in reply to bearophile

Andrei Alexandrescu

Posted in reply to bearophile

On 6/26/13 11:08 AM, bearophile wrote:
> On the other hand this D program prints just
> "10" with no errors, ignoring the second x:
>
> import std.stdio;
> void main() {
> size_t x = 10;
> writefln("%d", x, x);
> }
>
> In a modern statically typed language I'd like such code to give a
> compile-time error.

Actually this is good because it allows to customize the format string to print only a subset of available information (I've actually used this).

> This is how how Rust gets this right:
>
> println(fmt!("hello, %d", j))
>
> https://github.com/mozilla/rust/blob/master/src/libsyntax/ext/fmt.rs
> https://github.com/Aatch/rust-fmt

This is potentially inefficient because it creates a string instead of formatting straight in the output buffer.

Andrei

June 26, 2013

Re: Notes from C++ static analysis

Posted by dennis luehring
in reply to Adam D. Ruppe

dennis luehring

Posted in reply to Adam D. Ruppe

Am 26.06.2013 21:07, schrieb Adam D. Ruppe:
> On Wednesday, 26 June 2013 at 18:54:17 UTC, H. S. Teoh wrote:
>>> import std.stdio;
>>> void main() {
>>>     size_t x = 10;
>>>     writefln("%d", x, x);
>>> }
>>>
>>> In a modern statically typed language I'd like such code to
>>> give a compile-time error.
>>
>> This looks like a bug to me. Please file one. :)
>
> Not necessarily, since you might want a format string to be a
> runtime variable, like when doing translations. I could live with
> there being another function that does runtime though.

then you normaly quote the % with %% or something else to inactivate it - thats much more clean then just to allow it for this corner case out of the box

June 26, 2013

Re: Notes from C++ static analysis

Posted by dennis luehring
in reply to Andrei Alexandrescu

dennis luehring

Posted in reply to Andrei Alexandrescu

Am 26.06.2013 21:33, schrieb Andrei Alexandrescu:
> On 6/26/13 11:08 AM, bearophile wrote:
>> On the other hand this D program prints just
>> "10" with no errors, ignoring the second x:
>>
>> import std.stdio;
>> void main() {
>> size_t x = 10;
>> writefln("%d", x, x);
>> }
>>
>> In a modern statically typed language I'd like such code to give a
>> compile-time error.
>
> Actually this is good because it allows to customize the format string
> to print only a subset of available information (I've actually used this).

why is there always a tiny need for such tricky stuff - isn't that only usefull in very rare cases

June 26, 2013

Re: Notes from C++ static analysis

Posted by dennis luehring
in reply to dennis luehring

dennis luehring

Posted in reply to dennis luehring

Am 26.06.2013 21:53, schrieb dennis luehring:
> Am 26.06.2013 21:33, schrieb Andrei Alexandrescu:
>> On 6/26/13 11:08 AM, bearophile wrote:
>>> On the other hand this D program prints just
>>> "10" with no errors, ignoring the second x:
>>>
>>> import std.stdio;
>>> void main() {
>>> size_t x = 10;
>>> writefln("%d", x, x);
>>> }
>>>
>>> In a modern statically typed language I'd like such code to give a
>>> compile-time error.
>>
>> Actually this is good because it allows to customize the format string
>> to print only a subset of available information (I've actually used this).
>
> why is there always a tiny need for such tricky stuff - isn't that only
> usefull in very rare cases
>

or better said - could then someone add a description to writefln why there is a need that writefln can "handle" more values then asked in the format-string - maybe with an example that realy shows the usefullness of this feature - and why an simple enum + if/else can't handle this also very elegant

June 26, 2013

Re: Notes from C++ static analysis

Posted by H. S. Teoh
in reply to Adam D. Ruppe

H. S. Teoh

Posted in reply to Adam D. Ruppe

On Wed, Jun 26, 2013 at 09:07:30PM +0200, Adam D. Ruppe wrote:
> On Wednesday, 26 June 2013 at 18:54:17 UTC, H. S. Teoh wrote:
> >>import std.stdio;
> >>void main() {
> >>    size_t x = 10;
> >>    writefln("%d", x, x);
> >>}
> >>
> >>In a modern statically typed language I'd like such code to give a compile-time error.
> >
> >This looks like a bug to me. Please file one. :)
> 
> Not necessarily, since you might want a format string to be a runtime variable, like when doing translations. I could live with there being another function that does runtime though.
[...]

Wait, I thought we were talking about *compile-time* warnings for extraneous arguments to writefln. If the format string is not known at compile-time, then there's nothing to be done, and as you said, it's arguably better to allow more arguments than format specifiers if you're doing i18n.

But if the format string is known at compile-time, and there are extraneous arguments, then it should be a warning / error.

T

-- 
People tell me that I'm skeptical, but I don't believe it.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation