March 11, 2012
On 11-03-2012 00:28, Philippe Sigaud wrote:
> Hello,
>
> I created a new Github project, Pegged, a Parsing Expression Grammar
> (PEG) generator in D.
>
> https://github.com/PhilippeSigaud/Pegged
>
> docs: https://github.com/PhilippeSigaud/Pegged/wiki
>
> PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar
>
> The idea is to give the generator a PEG with the standard syntax. From
> this grammar definition, a set of related parsers will be created, to be
> used at runtime or compile time.
>
> Usage
> -----
>
> To use Pegged, just call the `grammar` function with a PEG and mix it
> in. For example:
>
>
> import pegged.grammar;
> import std.stdio; // needed for the writeln call below
>
> mixin(grammar("
> Expr <- Factor AddExpr*
> AddExpr <- ('+'/'-') Factor
> Factor <- Primary MulExpr*
> MulExpr <- ('*'/'/') Primary
> Primary <- Parens / Number / Variable / '-' Primary
>
> Parens <- '(' Expr ')'
> Number <~ [0-9]+
> Variable <- Identifier
> "));
>
>
>
> This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for
> basic arithmetic expressions with operator precedence ('*' and '/' bind
> stronger than '+' or '-'). `Identifier` is a pre-defined parser
> recognizing your basic C-style identifier. Recursive or mutually
> recursive rules are OK (no left recursion for now).
>
> To use a parser, use the `.parse` method. It will return a parse tree
> containing the calls to the different rules:
>
> // Parsing at compile-time:
> enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6");
>
> pragma(msg, parseTree1.capture);
> writeln(parseTree1);
>
> // And at runtime too:
> auto parseTree2 = Expr.parse(" 0 + 123 - 456 ");
> assert(parseTree2.capture == ["0", "+", "123", "-", "456"]);
>
>
>
> Features
> --------
>
> * The complete set of PEG operators is implemented.
> * Pegged can parse its input at compile time and generate a complete
> parse tree at compile time. In a word: compile-time string (read: D
> code) transformation and generation.
> * You can also parse at runtime, lucky you.
> * Use a standard and readable PEG syntax as a DSL, not a bunch of
> templates that hide the parser in noise.
> * But you can use expression templates if you want, as parsers are all
> available as such. Pegged is implemented as an expression template, and
> what's good for the library writer is sure OK for the user too.
> * Some useful additional operators are there too: a way to discard
> matches (thus dumping them from the parse tree), to push captures on a
> stack, to accept matches that are equal to another match
> * Adding new parsers is easy.
> * Grammars are composable: you can put different
> `mixin(grammar(rules));` in a module and then grammars and rules can
> refer to one another. That way, you can have utility grammars providing
> their functionalities to other grammars.
> * That's why Pegged comes with some pre-defined grammars (JSON, etc).
> * Grammars can be dumped in a file to create a D module.
>
> More advanced features, outside the standard PEG perimeter, are there to
> bring more power to the mix:
>
> * Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The
> previous rule defines a parametrized parser taking two other parsers
> (namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s.
> * Named captures: any parser can be named with the `=` operator. The
> parse tree generated by the parser (so, also its matches) is delivered
> to the user in the output. Other parsers in the grammar see the named
> captures too.
> * Semantic actions can be added to any rule in a grammar. Once a rule
> has matched, its associated action is called on the rule output and
> passed as final result to other parsers further up the grammar. Do what
> you want to the parse tree. If the passed actions are delegates, they
> can access external variables.
>
>
> Philippe
>

Question: Are the generated parsers, AST nodes, etc classes or structs?

-- 
- Alex
March 11, 2012
On 3/11/12 24:28 , Philippe Sigaud wrote:
> Hello,
>
> I created a new Github project, Pegged, a Parsing Expression Grammar
> (PEG) generator in D.
>
>
> Philippe
>

Very cool!

Quick question, you mention the ability to opt-out of the space-insensitivity, where might one find this?

Thanks!
March 11, 2012
On 11-03-2012 00:28, Philippe Sigaud wrote:
> Hello,
>
> I created a new Github project, Pegged, a Parsing Expression Grammar
> (PEG) generator in D.
>
>
> Philippe
>

By the way, bootstrap.d seems to fail to build at the moment:

../pegged/utils/bootstrap.d(1433): found ':' when expecting ')' following template argument list
../pegged/utils/bootstrap.d(1433): members expected
../pegged/utils/bootstrap.d(1433): { } expected following aggregate declaration
../pegged/utils/bootstrap.d(1433): semicolon expected, not '!'
../pegged/utils/bootstrap.d(1433): Declaration expected, not '!'
../pegged/utils/bootstrap.d(1466): unrecognized declaration

-- 
- Alex
March 11, 2012
On 11-03-2012 16:02, Alex Rønne Petersen wrote:
> On 11-03-2012 00:28, Philippe Sigaud wrote:
>> Hello,
>>
>> I created a new Github project, Pegged, a Parsing Expression Grammar
>> (PEG) generator in D.
>>
>>
>> Philippe
>>
>
> By the way, bootstrap.d seems to fail to build at the moment:
>
> ../pegged/utils/bootstrap.d(1433): found ':' when expecting ')' following template argument list
> ../pegged/utils/bootstrap.d(1433): members expected
> ../pegged/utils/bootstrap.d(1433): { } expected following aggregate declaration
> ../pegged/utils/bootstrap.d(1433): semicolon expected, not '!'
> ../pegged/utils/bootstrap.d(1433): Declaration expected, not '!'
> ../pegged/utils/bootstrap.d(1466): unrecognized declaration
>

Also, I have sent a pull request to fix the build on 64-bit: https://github.com/PhilippeSigaud/Pegged/pull/1

-- 
- Alex
March 11, 2012
>> On Sun, Mar 11, 2012 at 00:34, Alex Rønne Petersen <xtzgzorex@gmail.com> wrote:

[Parsing C?]
>> I think so. But you'd have to add some semantic actions to deal with typedefs and macros.
>
>
> Oh, I should have mentioned I only meant the actual language (ignoring
> the preprocessor).

OK. I admit I downloaded the C spec online, but was a bit taken aback by the size of it. Most of it was the definition of the standard library, but still...

> Why do you need semantic actions for typedefs though? Can't you defer
> resolution of types until after parsing?

Yes, that's the way I'd do it. But some people seem to want to do it while parsing. Maybe it blocks some parsing if the parser encounters an identifier where there should be a type?
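
For what it is worth, the classic case is a statement such as `T * x;`: in C it declares `x` as a pointer to `T` when `T` is a typedef name, and it is a multiplication expression otherwise. Below is a tiny D sketch of the decision that a parser (or a later pass) has to make. `Kind` and `classifyStarStatement` are hypothetical names used only for illustration, and the table of typedef names is assumed to be collected elsewhere:

```d
// What a statement of the form `A * B;` turns out to be.
enum Kind { declaration, expression }

// Hypothetical helper: decides how to read `first * something;`.
// typedefNames is assumed to hold every typedef name seen so far.
Kind classifyStarStatement(string first, string[] typedefNames)
{
    foreach (name; typedefNames)
        if (name == first)
            return Kind.declaration;   // `T * x;` declares a pointer
    return Kind.expression;            // otherwise it is a product
}
```

If the table is only filled in after parsing, the parser has to keep both readings (or a neutral node) until then, which is the deferred approach mentioned above.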


March 11, 2012
alex:
> Question: Are the generated parsers, AST nodes, etc classes or structs?

They are structs. See:

https://github.com/PhilippeSigaud/Pegged/wiki/Parse-Trees


March 11, 2012
On 11-03-2012 18:06, Philippe Sigaud wrote:
>>> On Sun, Mar 11, 2012 at 00:34, Alex Rønne Petersen
>>> <xtzgzorex@gmail.com> wrote:
>
> [Parsing C?]
>>> I think so. But you'd have to add some semantic actions to deal with
>>> typedefs and macros.
>>
>> Oh, I should have mentioned I only meant the actual language
>> (ignoring the preprocessor).
>
> OK. I admit I downloaded the C spec online, but was a bit taken aback by
> the size of it. Most of it was the definition of the standard library,
> but still...
>
>> Why do you need semantic actions for typedefs though? Can't you defer
>> resolution of types until after parsing?
>
> Yes, that's the way I'd do it. But some people seem to want to do it while
> parsing. Maybe it blocks some parsing if the parser encounters an
> identifier where there should be a type?
>
>

Hm, I don't *think* C has such ambiguities but I could well be wrong. In any case, if it can handle the non-ambiguous case, that's enough for me. :)

-- 
- Alex
March 11, 2012
> Quick question, you mention the ability to opt-out of the
> space-insensitivity, where might one find this?

Yes, undocumented. Use the '>' operator.

You know, I introduced space-insensitivity recently to simplify some rules, and it keeps coming back to bite me.

For example

Line <- (!EOL .)* EOL

The catch is that the (!EOL .) sequence accepts spaces (and therefore line terminators)
between the !EOL and the '.', so the line terminator gets skipped instead of ending the line.

Crap.

So, I keep writing

Line <- (!EOL > .)* EOL

And I'm more and more convinced that it was a bad move on my part. Or, at least, I should give the user a way to opt out for an entire rule.
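
The effect can be reproduced without Pegged. The following is a minimal hand-written D sketch, not Pegged's actual code: `matchLine`, `skipSpaces` and `atEOL` are hypothetical helpers, EOL is assumed to be a single '\n', and the `skipInside` flag stands in for the implicit space skipping between the two elements of the sequence.

```d
import std.ascii : isWhite;

// Hypothetical helper: skip blanks (line terminators included), the way
// a space-insensitive sequence is assumed to do between its elements.
size_t skipSpaces(string s, size_t i)
{
    while (i < s.length && isWhite(s[i])) ++i;
    return i;
}

// True if an end-of-line marker starts at position i
// (assumption: EOL is just '\n' in this sketch).
bool atEOL(string s, size_t i)
{
    return i < s.length && s[i] == '\n';
}

// Hand-written equivalent of   Line <- (!EOL .)* EOL
// skipInside == true  : blanks are skipped between `!EOL` and `.`
// skipInside == false : the two are glued together, as with `!EOL > .`
// Returns the number of characters consumed, or -1 on failure.
ptrdiff_t matchLine(string s, bool skipInside)
{
    size_t i = 0;
    while (true)
    {
        if (atEOL(s, i)) break;            // !EOL fails: stop repeating
        size_t j = skipInside ? skipSpaces(s, i) : i;
        if (j >= s.length) break;          // `.` needs one character
        i = j + 1;                         // `.` consumes that character
    }
    if (!atEOL(s, i)) return -1;           // the final EOL must match
    return cast(ptrdiff_t)(i + 1);
}
```

On the input "a \nb\n", `matchLine(input, false)` returns 3 (it stops after the first line), while `matchLine(input, true)` returns 5: the skipped blanks swallow the first '\n', so both lines are consumed as one. On "a \n" the space-skipping variant fails outright and returns -1.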


March 11, 2012
> Also, I have sent a pull request to fix the build on 64-bit:
> https://github.com/PhilippeSigaud/Pegged/pull/1

Merged, thanks!


March 11, 2012
> By the way, bootstrap.d seems to fail to build at the moment:
>
> ../pegged/utils/bootstrap.d(1433): found ':' when expecting ')' following template argument list
> ../pegged/utils/bootstrap.d(1433): members expected
> ../pegged/utils/bootstrap.d(1433): { } expected following aggregate declaration
> ../pegged/utils/bootstrap.d(1433): semicolon expected, not '!'
> ../pegged/utils/bootstrap.d(1433): Declaration expected, not '!'
> ../pegged/utils/bootstrap.d(1466): unrecognized declaration

Hmm, it compiled for me a few hours ago. I'll see if I broke something while pushing.

I'll also try to make the whole grammar-modification process easier. Since users can modify Pegged's own grammar, I might as well make that fluid and easy to do.

I'll put the Pegged grammar as a string in a separate module and create a function that does the rest: modify the string, and it will recompile the entire grammar for you.