Thread overview
DMD library available as DUB package
Jul 18, 2017
Jacob Carlborg
Jul 18, 2017
Suliman
Jul 18, 2017
Dukc
Jul 18, 2017
Jacob Carlborg
Jul 18, 2017
Meta
Jul 18, 2017
NotSpooky
Jul 19, 2017
Andrea Fontana
Jul 30, 2017
Johan Engelen
Jul 31, 2017
Johan Engelen
July 18, 2017
During the DConf hackathon I set out to create a DUB package for DMD to be used as a library. This has finally been merged [1] and is available here [2]. It contains the lexer and the parser.

A minimal example:

#!/usr/bin/env dub
/++ dub.sdl:
name "dmd_lexer_example"
dependency "dmd" version="~master"
+/

void main()
{
    import ddmd.lexer;
    import ddmd.tokens;
    import std.stdio;

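    // D string literals are stored with a trailing '\0', which the lexer relies on.
    immutable sourceCode = "void test() {} // foobar";
    // Arguments: file name, buffer, begin offset, end offset,
    // doDocComment, commentToken.
    scope lexer = new Lexer("test", sourceCode.ptr, 0, sourceCode.length, false, false);

    // Print the kind (a TOK value) of each token until the end of input.
    while (lexer.nextToken != TOKeof)
        writeln(lexer.token.value);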
}

[1] https://github.com/dlang/dmd/pull/6771
[2] http://code.dlang.org/packages/dmd

-- 
/Jacob Carlborg
July 18, 2017
Could you explain where it can be helpful?
July 18, 2017
On Tuesday, 18 July 2017 at 12:35:10 UTC, Suliman wrote:
> Could you explain where it can be helpful?

For tools such as source code formatters: they don't have to write their own parser if they can use a library like this one.
July 18, 2017
On 2017-07-18 14:35, Suliman wrote:
> Could you explain where it can be helpful?

As Dukc said, for tools that need to analyze D source code.

-- 
/Jacob Carlborg
July 18, 2017
On Tuesday, 18 July 2017 at 12:07:27 UTC, Jacob Carlborg wrote:
> During the dconf hackathon I set out to create a DUB package for DMD to be used as a library. This has finally been merged [1] and is available here [2]. It contains the lexer and the parser.

Nice, I was not aware that DMD as a library was so close to being a reality.
July 18, 2017
On Tuesday, 18 July 2017 at 12:07:27 UTC, Jacob Carlborg wrote:
> During the dconf hackathon I set out to create a DUB package for DMD to be used as a library. This has finally been merged [1] and is available here [2]. It contains the lexer and the parser.

Awesome, was waiting for this.

July 19, 2017
On Tuesday, 18 July 2017 at 12:07:27 UTC, Jacob Carlborg wrote:
> During the dconf hackathon I set out to create a DUB package for DMD to be used as a library. This has finally been merged [1] and is available here [2]. It contains the lexer and the parser.

Great news! I thought it wasn't ready yet.
July 30, 2017
On Tuesday, 18 July 2017 at 12:07:27 UTC, Jacob Carlborg wrote:
> During the dconf hackathon I set out to create a DUB package for DMD to be used as a library. This has finally been merged [1] and is available here [2]. It contains the lexer and the parser.

This is great news of course!

But I have some bad news ;-)
Now that the lexer is nicely separated, it is very easy for me to test-drive libFuzzer+AddressSanitizer on it and... expect many bug reports in the next few days. I am testing this code:

```
void fuzzDMDLexer(const(char*) data, size_t length)
{
    import ddmd.lexer;
    import ddmd.tokens;

    scope lexer = new Lexer("test", data, 0, length, false, false);
    lexer.nextToken;

    // Touch every token until the end of input.
    do {
        auto drop = lexer.token.value;
    }
    while (lexer.nextToken != TOKeof);
}
```

A short list of heap-buffer-overflow bugs (the `data` and `length` parameters are consistent with each other):
1. length == 0
2. data == "\n" (line feed, 0xa)
3. data == "only_ascii*" (the problem is that nothing follows the "*")
4. data == "%%"
5. data == "*ô"
6. data == "\nÜÜÜ"
7. data == "\x0a''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''"
8. data == ")\xf7"

`void scan(Token* t)` is to blame for most of the bugs I have found so far. See, e.g., line 980, which causes bug 3:
https://github.com/dlang/dmd/blob/154aa1bfd36333a8777d571e39690511e670bfcf/src/ddmd/lexer.d#L979-L980

Example of stacktrace (bug 8):
```
==11222==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000952 at pc 0x0001028915b5 bp 0x7fff5d3941f0 sp 0x7fff5d3941e8
READ of size 1 at 0x602000000952 thread T0
    #0 0x1028915b4 in _D4ddmd5lexer5Lexer9decodeUTFMFZk lexer.d:2314
    #1 0x102887cae in _D4ddmd5lexer5Lexer4scanMFPS4ddmd6tokens5TokenZv lexer.d:1019
    #2 0x102875089 in _D4ddmd5lexer5Lexer9nextTokenMFZE4ddmd6tokens3TOK lexer.d:222
    #3 0x1028c5d20 in _D9fuzzlexer12fuzzDMDLexerFxPhmZv fuzzlexer.d:31
```

I am very excited to see the fuzzer+asan working so nicely!
:-)
  Johan


July 31, 2017
On Sunday, 30 July 2017 at 23:41:40 UTC, Johan Engelen wrote:
> On Tuesday, 18 July 2017 at 12:07:27 UTC, Jacob Carlborg wrote:
>> During the dconf hackathon I set out to create a DUB package for DMD to be used as a library. This has finally been merged [1] and is available here [2]. It contains the lexer and the parser.
>
> This is great news of course!
>
> But I have some bad news ;-)
> Now that the Lexer nicely separated, it is very easy for me to testdrive libFuzzer+AddressSanitizer on the lexer and... Expect many bug reports in the next days.

OK, this wasn't entirely fair.
1. I didn't read the API: the buffer needs to be null-terminated.
2. With a fix [1] to prevent reading beyond the input buffer, I have yet to find a new bug.
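A minimal sketch of the harness-side fix, assuming the fuzzer hands over a raw `const(ubyte)*`/`size_t` pair that is not null-terminated: copy the input into a buffer one byte longer and write a `'\0'` sentinel at the end. The helper names `nullTerminatedCopy` and `identLength` are hypothetical; `identLength` is a toy sentinel-driven loop, not DMD's scanner, included only to show why a missing terminator turns into a read past the end of a heap buffer.

```d
import std.ascii : isAlphaNum;

/// Hypothetical helper: copies `length` bytes from `data` into a new
/// GC-allocated buffer and appends a '\0' sentinel.
/// `data` may be null when `length` is 0.
char[] nullTerminatedCopy(const(ubyte)* data, size_t length)
{
    auto buf = new char[length + 1];
    buf[0 .. length] = cast(const(char)[]) data[0 .. length];
    buf[length] = '\0';
    return buf;
}

/// Toy sentinel-driven scanner (not DMD's code): returns the length of the
/// identifier starting at `p`. There is no bounds check; the loop stops only
/// at a non-identifier character such as the '\0' sentinel, so on input like
/// "only_ascii" with no terminator it would read past the end of the buffer.
size_t identLength(const(char)* p)
{
    size_t n;
    while (p[n] == '_' || isAlphaNum(p[n]))
        ++n;
    return n;
}

unittest
{
    auto raw = cast(immutable(ubyte)[]) "ok*";
    auto buf = nullTerminatedCopy(raw.ptr, raw.length);
    assert(buf.length == raw.length + 1 && buf[$ - 1] == '\0');
    assert(identLength(buf.ptr) == 2);   // stops at '*'

    auto tail = nullTerminatedCopy(raw.ptr, 2); // "ok", no '*' after it
    assert(identLength(tail.ptr) == 2);  // stops at the sentinel
}
```

The unittest can be run with `dmd -unittest -main -run file.d`. In a harness like the one above, the lexer would then receive `buf.ptr` as the buffer and `length` (not `buf.length`) as the end offset, matching how the first example passes `sourceCode.length`, which excludes the literal's terminator.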

The fuzzer is running now... I wonder how long it takes to find the next bug, if any.

-Johan

[1] https://github.com/dlang/dmd/pull/7050