March 11, 2012
> Hm, I don't *think* C has such ambiguities but I could well be wrong. In
any case, if it can handle the non-ambiguous case, that's enough for me.

I wanted to tackle D this week, but I might as well begin with C :)

Do you happen to have any handy and readable EBNF grammar for C? At least for D, I have dlang.org.


March 11, 2012
On 11-03-2012 18:19, Philippe Sigaud wrote:
>  > Hm, I don't *think* C has such ambiguities but I could well be wrong.
> In any case, if it can handle the non-ambiguous case, that's enough for me.
>
> I wanted to tackle D this week, but I might as well begin with C :)
>
> Do you happen to have any handy and readable EBNF grammar for C? At
> least for D, I have dlang.org <http://dlang.org>.
>

Yep, see: http://ssw.jku.at/Coco/ (C.atg)

Coco/R is more or less EBNF.

-- 
- Alex
March 11, 2012
On 11-03-2012 18:17, Philippe Sigaud wrote:
>  > By the way, bootstrap.d seems to fail to build at the moment:
>  >
>  > ../pegged/utils/bootstrap.d(1433): found ':' when expecting ')'
> following template argument list
>  > ../pegged/utils/bootstrap.d(1433): members expected
>  > ../pegged/utils/bootstrap.d(1433): { } expected following aggregate
> declaration
>  > ../pegged/utils/bootstrap.d(1433): semicolon expected, not '!'
>  > ../pegged/utils/bootstrap.d(1433): Declaration expected, not '!'
>  > ../pegged/utils/bootstrap.d(1466): unrecognized declaration
>
> Hmm, it compiled for me a few hours ago. I'll see if I broke something
> while pushing.
>
> I'll also try to make the whole grammar-modification process easier.
> Since users can modify Pegged own grammar, I might as well make that
> fluid and easy to do.
>
> I'll put the Pegged grammar as a string in a separate module and create
> a function that does the rest: modify the string, it will recompile the
> entire grammar for you.
>

Is bootstrap.d currently essential to actually use Pegged?

-- 
- Alex
March 11, 2012
On 3/11/12 3:02 AM, Philippe Sigaud wrote:
> There is an operator to drop unnecessary nodes (':').

Good.

> Apart from that,
> a Pegged grammar is a self-contained entity: it automatically cuts
> nodes coming from other grammars, to simplify the tree (it keeps the
> matcheds substrings, of course). I plan to code different type of
> grammars, to play with the way the deal with nodes coming from outside
> rules and grammar composition.

Not getting this, but picture me with an intense look and nodding.

> As for the root, the PEG rule is that the first rule in the grammar is
> the root. But currently, I deliberately made my grammars unsealed, in
> that the user can call any rule inside the grammar to begin parsing:
>
> mixin(grammar("
>      A<- B C
>      B<- ...
>      C<-
> "));
>
> A.parse("some input"); // standard way to do it, the parse tree will
> be rooted on an 'A' node.
> B.parse("some input"); // also possible, the parse tree will be rooted
> on a 'B' node.

That sounds cool, but how do you say that in the construct

Expression <- Term "+" Term

the "+" should become the root?


Andrei
March 12, 2012
On 11-03-2012 00:28, Philippe Sigaud wrote:
> Hello,
>
> I created a new Github project, Pegged, a Parsing Expression Grammar
> (PEG) generator in D.
>
> https://github.com/PhilippeSigaud/Pegged
>
> docs: https://github.com/PhilippeSigaud/Pegged/wiki
>
> PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar
>
> The idea is to give the generator a PEG with the standard syntax. From
> this grammar definition, a set of related parsers will be created, to be
> used at runtime or compile time.
>
> Usage
> -----
>
> To use Pegged, just call the `grammar` function with a PEG and mix it
> in. For example:
>
>
> import pegged.grammar;
>
> mixin(grammar("
> Expr <- Factor AddExpr*
> AddExpr <- ('+'/'-') Factor
> Factor <- Primary MulExpr*
> MulExpr <- ('*'/'/') Primary
> Primary <- Parens / Number / Variable / '-' Primary
>
> Parens <- '(' Expr ')'
> Number <~ [0-9]+
> Variable <- Identifier
> "));
>
>
>
> This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for
> basic arithmetic expressions with operator precedence ('*' and '/' bind
> stronger than '+' or '-'). `Identifier` is a pre-defined parser
> recognizing your basic C-style identifier. Recursive or mutually
> recursive rules are OK (no left recursion for now).
>
> To use a parser, use the `.parse` method. It will return a parse tree
> containing the calls to the different rules:
>
> // Parsing at compile-time:
> enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6");
>
> pragma(msg, parseTree1.capture);
> writeln(parseTree1);
>
> // And at runtime too:
> auto parseTree2 = Expr.parse(" 0 + 123 - 456 ");
> assert(parseTree2.capture == ["0", "+", "123", "-", "456"]);
>
>
>
> Features
> --------
>
> * The complete set of PEG operators are implemented
> * Pegged can parse its input at compile time and generate a complete
> parse tree at compile time. In a word: compile-time string (read: D
> code) transformation and generation.
> * You can parse at runtime also, you lucky you.
> * Use a standard and readable PEG syntax as a DSL, not a bunch of
> templates that hide the parser in noise.
> * But you can use expression templates if you want, as parsers are all
> available as such. Pegged is implemented as an expression template, and
> what's good for the library writer is sure OK for the user too.
> * Some useful additional operators are there too: a way to discard
> matches (thus dumping them from the parse tree), to push captures on a
> stack, to accept matches that are equal to another match
> * Adding new parsers is easy.
> * Grammars are composable: you can put different
> `mixin(grammar(rules));` in a module and then grammars and rules can
> refer to one another. That way, you can have utility grammars providing
> their functionalities to other grammars.
> * That's why Pegged comes with some pre-defined grammars (JSON, etc).
> * Grammars can be dumped in a file to create a D module.
>
> More advanced features, outside the standard PEG perimeter are there to
> bring more power in the mix:
>
> * Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The
> previous rule defines a parametrized parser taking two other parsers
> (namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s.
> * Named captures: any parser can be named with the `=` operator. The
> parse tree generated by the parser (so, also its matches) is delivered
> to the user in the output. Other parsers in the grammar see the named
> captures too.
> * Semantic actions can be added to any rule in a grammar. Once a rule
> has matched, its associated action is called on the rule output and
> passed as final result to other parsers further up the grammar. Do what
> you want to the parse tree. If the passed actions are delegates, they
> can access external variables.
>
>
> Philippe
>

Hm, since ' is used in the grammar of Pegged, how do I express it in my grammar spec? Is there a predefined rule for it, or is \' supposed to work?

-- 
- Alex
March 12, 2012
On 12-03-2012 02:27, Alex Rønne Petersen wrote:
> On 11-03-2012 00:28, Philippe Sigaud wrote:
>> Hello,
>>
>> I created a new Github project, Pegged, a Parsing Expression Grammar
>> (PEG) generator in D.
>>
>> https://github.com/PhilippeSigaud/Pegged
>>
>> docs: https://github.com/PhilippeSigaud/Pegged/wiki
>>
>> PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar
>>
>> The idea is to give the generator a PEG with the standard syntax. From
>> this grammar definition, a set of related parsers will be created, to be
>> used at runtime or compile time.
>>
>> Usage
>> -----
>>
>> To use Pegged, just call the `grammar` function with a PEG and mix it
>> in. For example:
>>
>>
>> import pegged.grammar;
>>
>> mixin(grammar("
>> Expr <- Factor AddExpr*
>> AddExpr <- ('+'/'-') Factor
>> Factor <- Primary MulExpr*
>> MulExpr <- ('*'/'/') Primary
>> Primary <- Parens / Number / Variable / '-' Primary
>>
>> Parens <- '(' Expr ')'
>> Number <~ [0-9]+
>> Variable <- Identifier
>> "));
>>
>>
>>
>> This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for
>> basic arithmetic expressions with operator precedence ('*' and '/' bind
>> stronger than '+' or '-'). `Identifier` is a pre-defined parser
>> recognizing your basic C-style identifier. Recursive or mutually
>> recursive rules are OK (no left recursion for now).
>>
>> To use a parser, use the `.parse` method. It will return a parse tree
>> containing the calls to the different rules:
>>
>> // Parsing at compile-time:
>> enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6");
>>
>> pragma(msg, parseTree1.capture);
>> writeln(parseTree1);
>>
>> // And at runtime too:
>> auto parseTree2 = Expr.parse(" 0 + 123 - 456 ");
>> assert(parseTree2.capture == ["0", "+", "123", "-", "456"]);
>>
>>
>>
>> Features
>> --------
>>
>> * The complete set of PEG operators are implemented
>> * Pegged can parse its input at compile time and generate a complete
>> parse tree at compile time. In a word: compile-time string (read: D
>> code) transformation and generation.
>> * You can parse at runtime also, you lucky you.
>> * Use a standard and readable PEG syntax as a DSL, not a bunch of
>> templates that hide the parser in noise.
>> * But you can use expression templates if you want, as parsers are all
>> available as such. Pegged is implemented as an expression template, and
>> what's good for the library writer is sure OK for the user too.
>> * Some useful additional operators are there too: a way to discard
>> matches (thus dumping them from the parse tree), to push captures on a
>> stack, to accept matches that are equal to another match
>> * Adding new parsers is easy.
>> * Grammars are composable: you can put different
>> `mixin(grammar(rules));` in a module and then grammars and rules can
>> refer to one another. That way, you can have utility grammars providing
>> their functionalities to other grammars.
>> * That's why Pegged comes with some pre-defined grammars (JSON, etc).
>> * Grammars can be dumped in a file to create a D module.
>>
>> More advanced features, outside the standard PEG perimeter are there to
>> bring more power in the mix:
>>
>> * Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The
>> previous rule defines a parametrized parser taking two other parsers
>> (namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s.
>> * Named captures: any parser can be named with the `=` operator. The
>> parse tree generated by the parser (so, also its matches) is delivered
>> to the user in the output. Other parsers in the grammar see the named
>> captures too.
>> * Semantic actions can be added to any rule in a grammar. Once a rule
>> has matched, its associated action is called on the rule output and
>> passed as final result to other parsers further up the grammar. Do what
>> you want to the parse tree. If the passed actions are delegates, they
>> can access external variables.
>>
>>
>> Philippe
>>
>
> Hm, since ' is used in the grammar of Pegged, how do I express it in my
> grammar spec? Is there a predefined rule for it, or is \' supposed to work?
>

Ah, Quote it is. I also noticed there is DoubleQuote. Is " used for anything in the Pegged grammar?

-- 
- Alex
March 13, 2012
>>
>
> Admittedly I have not heard of PEGs before, so I'm curious: Is this powerful enough to parse a language such as C?

I've just read a few articles referenced from this page, and the second link was by someone who had done java 1.5, the second link
http://bford.info/packrat/
http://www.romanredz.se/papers/FI2007.pdf

It is interesting but that article left me with some questions about the implementation in order to make it useful for my needs.

I had done an experiment with mvel expression evaluation in java and gotten good improvements relative to homebrew expression evaluators.

However, the mvel expressions are missing the ability to express array operations clearly, which is something that is very clear in D, and my particular need is to enable the user to express array operations.

With this pegged embedded parser, it appears to me you could provide a group of  context symbols as part of a language definition, similar to providing a list of reserved words, so that the parsing of the user's expression would also validate the symbols used.

Also, I've been reading David Simcha's parallel_algorithm.d, here:
https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d
and in the parallelArrayOp portion, he has presented a way to turn the D array expressions into code that is executed in parallel on multicore systems. That is something I'd want to use, but the manual lexing requirement is a bit clunky, and the rules are unclear to me,  so it seems to me a combination with the pegged parser could make that more accessible.
Thanks,
Jay






March 13, 2012
On Tuesday, 13 March 2012 at 05:25:38 UTC, Jay Norwood wrote:
>
>>>
>>
>> Admittedly I have not heard of PEGs before, so I'm curious: Is this powerful enough to parse a language such as C?
>
> I've just read a few articles referenced from this page, and the second link was by someone who had done java 1.5, the second link
> http://bford.info/packrat/
> http://www.romanredz.se/papers/FI2007.pdf

Also in the later paper he did a C parser, so I suppose that is the answer ...

http://www.romanredz.se/papers/FI2008.pdf
March 13, 2012
On 03/10/2012 03:28 PM, Philippe Sigaud wrote:
> Hello,
>
> I created a new Github project, Pegged, a Parsing Expression Grammar
> (PEG) generator in D.
>
> https://github.com/PhilippeSigaud/Pegged
>
> docs: https://github.com/PhilippeSigaud/Pegged/wiki

Just WOW!

Nice to have on your WIKI would be a EBNF to PEG sheet.

Wirth EBNF     	Pegged
A = BC.		A <- B C
A = B|C.	A <- C / C
A = [B].	A <- B?
A = {B}.	A <- B*


Having EBNF expressed in Pegged ...

EBNF <- Procuction+
Production <- Identifier '=' Expression '.'
Expression <- Term ( '|' Term)*
Term <- Factor Factor*
Factor <- 	Identifier /
		Literal /
		'[' Expression ']'
		 / '{' Expression }'/
		'(' Expression ')'
lowerCase  <- [a-z]
upperCase  <- [A-Z]
Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)*
Literal <- ("'" .+ "'" /  '"' .+ '"')

Due to the fact that EBNF can be expressed in EBNF it should be possible to parse an arbitrary EBNF file and generate PEG output.
What do you think ?

BTW.. Is my EBNF PEG description correct ? TIA Bjoern
March 13, 2012
On 12.03.2012 16:43, bls wrote:
> On 03/10/2012 03:28 PM, Philippe Sigaud wrote:
>> Hello,
>>
>> I created a new Github project, Pegged, a Parsing Expression Grammar
>> (PEG) generator in D.
>>
>> https://github.com/PhilippeSigaud/Pegged
>>
>> docs: https://github.com/PhilippeSigaud/Pegged/wiki
>
> Just WOW!
>
> Nice to have on your WIKI would be a EBNF to PEG sheet.
>
> Wirth EBNF Pegged
> A = BC. A <- B C
> A = B|C. A <- C / C

Maybe A <- B / C. And even then it's not exactly equivalent if the grammar was ambiguous.
Imagine: B <- a, C <- aa

-- 
Dmitry Olshansky