Jump to page: 1 2
Thread overview
Fuzzed - a program to find DMDFE parser crash
Dec 15
Basile B.
Dec 15
Basile B.
Dec 16
Basile B.
Dec 16
Basile B.
Dec 16
Basile B.
December 15
Fuzzed [1] is a simple fuzzer for the D programming language. It allows to detect sequences of tokens that crash the parser. While the D front end is not yet used to make tools, if this ever happens the parser will have to accept invalid code. As experienced with dparse, invalid code tend to crash more a parser because of a cognitive bias that lead us, "hoomans", to prove that things work rather than the opposite.

You can run it on one your core, report the crasher programs to the project issue tracker or fix them yourself:

> gdb dmd
> run <the_crasher>
> bt

And then try to see what happens in the parser at the location pointed on top of the back trace. Note that you'll need to build dmd debug version.

The time to write this announce, already 5 "crashers" found.

[1] https://github.com/BBasile/fuzzed
December 15
On Saturday, 15 December 2018 at 11:29:45 UTC, Basile B. wrote:
> Fuzzed [1] is a simple fuzzer for the D programming language.

Are you familiar with libFuzzer and LDC's integration?
https://johanengelen.github.io/ldc/2018/01/14/Fuzzing-with-LDC.html
You can feed libFuzzer with a dictionary of keywords to speed up the initial fuzzing phase, where the keywords are the tokens strings that you use.
Besides finding crashes, it's also good to enable ASan to find memory-related bugs that by luck didn't crash the program.

> The time to write this announce, already 5 "crashers" found.

Great :)

The other day I was reminded of OSS Fuzz and that it'd be nice if we would setup fuzzing for the frontend and phobos there...

-Johan


December 15
On Saturday, 15 December 2018 at 11:29:45 UTC, Basile B. wrote:
> Fuzzed [1] is a simple fuzzer for the D programming language. It allows to detect sequences of tokens that crash the parser. While the D front end is not yet used to make tools, if this ever happens the parser will have to accept invalid code. As experienced with dparse, invalid code tend to crash more a parser because of a cognitive bias that lead us, "hoomans", to prove that things work rather than the opposite.
>

Nice. In my experience fuzzing parses works very well. I have good memories with afl. So much so that I once wrote a wrapper around it to handle running it distributed.

See https://github.com/skoppe/afl-dist
Could use a readme and a how-to though.

December 15
On Saturday, 15 December 2018 at 14:22:48 UTC, Johan Engelen wrote:
> On Saturday, 15 December 2018 at 11:29:45 UTC, Basile B. wrote:
>> Fuzzed [1] is a simple fuzzer for the D programming language.
>
> Are you familiar with libFuzzer and LDC's integration?
> https://johanengelen.github.io/ldc/2018/01/14/Fuzzing-with-LDC.html

No, but i'm not that surprised to see that a fuzzer already exists.
I may have even seen this article but completely forgot it.

> You can feed libFuzzer with a dictionary of keywords to speed up the initial fuzzing phase, where the keywords are the tokens strings that you use.
> Besides finding crashes, it's also good to enable ASan to find memory-related bugs that by luck didn't crash the program.
>
>> The time to write this announce, already 5 "crashers" found.
>
> Great :)

I have about 40 now

>
> The other day I was reminded of OSS Fuzz and that it'd be nice if we would setup fuzzing for the frontend and phobos there...
>
> -Johan

I started looking at a crasher:

   typeof function function in

which crashes in hdrgen. Actually i realize that i don't like the D parser. In many cases it checks for errors but continues parsing unconditionally.

In the example, "in" leads to an null contract that the pretty formatter dereferences at some point, but parsing should have stopped after "typeof" since there is no left paren. Now take a look at typeof sub parser


    AST.TypeQualified parseTypeof()
    {
        AST.TypeQualified t;
        const loc = token.loc;

        nextToken();
        check(TOK.leftParentheses); // <--  why continuing if the check fails?
        if (token.value == TOK.return_)
        {
            nextToken();
            t = new AST.TypeReturn(loc);
        }
        else
        {
            AST.Expression exp = parseExpression();
            t = new AST.TypeTypeof(loc, exp);
        }
        check(TOK.rightParentheses);
        return t;
    }

I think this is what Walter calls "AST poisoning" (never understood how it worked before today). And the whole parser is like this.

This poisoning kills the interest of using a fuzzer. 99% of the crashes will be in hdrgen.
December 15
On Saturday, 15 December 2018 at 15:37:19 UTC, Basile B. wrote:
> I think this is what Walter calls "AST poisoning" (never understood how it worked before today). And the whole parser is like this.
>
> This poisoning kills the interest of using a fuzzer. 99% of the crashes will be in hdrgen.

As is common with fuzzing, you'll need to ensure the program crashes. Sometimes that requires some tweaking.

Regardless, you still have the input to investigate.
December 15
On Sat, 15 Dec 2018 21:09:12 +0000, Sebastiaan Koppe wrote:
> On Saturday, 15 December 2018 at 15:37:19 UTC, Basile B. wrote:
>> I think this is what Walter calls "AST poisoning" (never understood how it worked before today). And the whole parser is like this.
>>
>> This poisoning kills the interest of using a fuzzer. 99% of the crashes will be in hdrgen.
> 
> As is common with fuzzing, you'll need to ensure the program crashes. Sometimes that requires some tweaking.
> 
> Regardless, you still have the input to investigate.

I think the point is that DMD tries to recover from parsing failures in order to provide additional error messages. But those parsing failures leave the parser in an invalid state, and invalid states are fertile ground for crashes.

The way to fix this is to replace the entire parser and get rid of the idea of AST poisoning; at the first error, you give up on parsing the entire file. From there, you can try recovering from specific errors with proper testing.
December 15
On 12/15/2018 3:29 AM, Basile B. wrote:
> The time to write this announce, already 5 "crashers" found.

Great! Please post them to bugzilla.
December 15
On 12/15/2018 2:48 PM, Neia Neutuladh wrote:
> The way to fix this is to replace the entire parser and get rid of the
> idea of AST poisoning; at the first error, you give up on parsing the
> entire file. From there, you can try recovering from specific errors with
> proper testing.

DMD tries to continue parsing after a syntax error, but it does not attempt semantic analysis if there were any errors.

December 16
On Saturday, 15 December 2018 at 22:48:01 UTC, Neia Neutuladh wrote:
> The way to fix this is to replace the entire parser and get rid of the idea of AST poisoning; at the first error, you give up on parsing the entire file. From there, you can try recovering from specific errors with proper testing.

You can still continue parsing after an error but right now many sub-parsers always return an AstNode instead of null. The parser on null sub parser result could go to the end of the scope or to the next statement, depending on what it expected, and continue from there. That being said this wouldn't always work, e.g when a semi colon or a curly brace misses.

Simple example:

    struct Foo
    {
        int a, b
        string c; // error because a type identifier part wasn't expected ...
    }             // ... we're in a aggr body so consume toks past the curly brace

    struct Bar
    {
    }



December 16
On Sunday, 16 December 2018 at 01:57:17 UTC, Walter Bright wrote:
> On 12/15/2018 2:48 PM, Neia Neutuladh wrote:
>> The way to fix this is to replace the entire parser and get rid of the
>> idea of AST poisoning; at the first error, you give up on parsing the
>> entire file. From there, you can try recovering from specific errors with
>> proper testing.
>
> DMD tries to continue parsing after a syntax error, but it does not attempt semantic analysis if there were any errors.

The problem i underlined is more that, like in the code that parses typeof, a non null node is returned even if some expectations are not verified when parsing.

I'm not sure of what is the right fix. fixing the ast pretty printer or the parser ?
« First   ‹ Prev
1 2