Goldie Parsing System v0.5 - Speed
May 18, 2011
Nick Sabalausky
Goldie Parsing System v0.5 is now out. This version focuses mainly on speed improvements.

== Links: ==

Homepage and Documentation:
    http://www.semitwist.com/goldie/

Prepackaged Downloads:
    http://www.dsource.org/projects/goldie/browser/downloads

== New in v0.5: ==

    - Improved lexing/parsing speed by about 5x-6x.

    - Small additional speedup when lexing languages with large character sets (such as Unicode).

    - GRMC: Grammar Compiler: Supports {All Valid} character set.

    - GRMC: Grammar Compiler: Complex grammars are compiled to CGT up to about 4x-8x faster.

    - GRMC: Grammar Compiler: Verbose (-v) flag shows each step and amount of time taken.

    - Parse Anything: No more unhandled exception when parsing a source with an error.

    - Fixed to work with DMD 2.053 (still works with 2.052, too).

There are still more optimizations that can be done, but I felt this was enough to warrant a new release.


May 18, 2011
On 18.05.2011 05:47, Nick Sabalausky wrote:
> Goldie Parsing System v0.5 is now out. This version focuses mainly on speed
> improvements.

Great work.

Is it possible to generate a parser for D with this?

Regards,
Stephan
May 18, 2011
"Stephan" <spam@extrawurst.org> wrote in message news:ir05te$tbd$1@digitalmars.com...
> On 18.05.2011 05:47, Nick Sabalausky wrote:
>> Goldie Parsing System v0.5 is now out. This version focuses mainly on speed
>> improvements.
>>
>
> Great work.
>

Thanks :)

> Is it possible to generate a parser for D with this?
>

It should be possible to write a grammar that handles most of D. But there would be some awkwardness, plus corner cases that can't really be handled right without some enhancements I haven't put in yet.

For example:

- Nested comments aren't yet officially supported. GOLD (which Goldie is based on) will support them in the currently-in-beta v4.2 ( http://www.devincook.com/goldparser/v4.2.htm ). I intend to make Goldie fully compatible with all the new GOLD v4.2 features, but I just haven't gotten to them yet. In the meantime, what you can do is lex the D source first, then go through the resulting token array and remove everything from each "/+" token to its matching "+/" token (there will be a bunch of junk in between, including some error tokens; you can just rip it all out), and then send the result through the parser.

- Another comment-related thing that'll be fixed with the v4.2 enhancements: Currently, GOLD and Goldie handle (non-nested) block comments by actually lexing what's inside the comment (and ignoring any errors). Normally this works out fine, but it does lead to occasional edge cases where the "*/" isn't handled right.

- D relies on certain disambiguation rules. For instance: "a*b" could be either a multiplication expression or a pointer declaration. D handles this by saying "if something can be either an expression or a declaration, then always interpret it as (umm...actually I forget which one it always chooses, but it's always that same one)". Goldie (and GOLD) currently don't have any conflict resolution. If you try to create a grammar that has such an ambiguity, you'll just get a "reduce-reduce conflict" error or run into "shift-reduce" conflicts. The way to work around this is to design the grammar to completely conflate the two notions, so instead of having <Expression> and <Declaration>, you'd just have something like <ExprOrDecl>. Unfortunately, this isn't always easy: it tends to obfuscate the grammar, it makes the nonterminals less meaningful, and it creates much more work for your semantics pass. I do intend to solve this, but it'll probably be a very non-trivial matter. More discussion (possibly a bit technical) on this issue is here: http://groups.google.com/group/gold-parsing-system/browse_thread/thread/5959e0cfef76ce68
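
The token-stripping workaround for nested comments mentioned above can be sketched roughly like this. It's a language-neutral illustration in Python, not Goldie's actual API: tokens are assumed to be plain strings, and the name strip_nested_comments is made up for the example. The depth counter is what pairs each "/+" with its matching "+/".

```python
def strip_nested_comments(tokens, open_tok="/+", close_tok="+/"):
    """Return tokens with every /+ ... +/ region removed, honoring nesting.

    Hypothetical helper: `tokens` is a list of token strings such as a
    lexer might produce. Real Goldie tokens are objects, not strings.
    """
    kept = []
    depth = 0  # number of "/+" comments currently open
    for tok in tokens:
        if tok == open_tok:
            depth += 1
        elif tok == close_tok and depth > 0:
            depth -= 1
        elif depth == 0:
            # Outside any comment: keep the token. A stray "+/" also ends
            # up here, so the parser gets a chance to report it.
            kept.append(tok)
        # Otherwise we are inside a comment: drop the token, error tokens included.
    if depth != 0:
        raise ValueError("unterminated nested comment")
    return kept
```

The filtered list is what would then be handed to the parser.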

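To give a feel for the extra work a merged <ExprOrDecl> nonterminal pushes into the semantics pass, here's a toy sketch in Python (again, not Goldie code; the node class, the resolve function, and the set of known type names are all made up for illustration). The parser records only the shape "left * right", and a later pass decides what it was. This toy uses a symbol table lookup, which is just one possible strategy and not the fixed rule D applies.

```python
from dataclasses import dataclass


@dataclass
class ExprOrDecl:
    """Ambiguous parse node for source text of the form `left * right`."""
    left: str
    right: str


def resolve(node, known_types):
    """Classify an ambiguous node once type names are known.

    Assumption for this toy: if `left` names a known type, the node is a
    pointer declaration; otherwise it is treated as a multiplication.
    """
    if node.left in known_types:
        return ("decl", node.left + "*", node.right)
    return ("mul", node.left, node.right)
```

For example, resolve(ExprOrDecl("Foo", "p"), {"Foo"}) classifies the node as a declaration, while the same call with an unknown left-hand name classifies it as a multiplication.
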
FWIW, Goldie does include a lex-only grammar for D2, which could be used as a starting point (although I might have gotten some edge cases wrong regarding the decimal literals; also, this grammar is currently ASCII-only, but that can easily be changed):

http://www.dsource.org/projects/goldie/browser/tags/v0.5/lang/dlex.grm