Pegged, a Parsing Expression Grammar (PEG) generator in D (page 6) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Announce » Pegged, a Parsing Expression Grammar (PEG) generator in D (page 6)

March 15, 2012

Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

Posted by Timon Gehr
in reply to Jay Norwood

Timon Gehr

Posted in reply to Jay Norwood

On 03/15/2012 12:09 AM, Jay Norwood wrote:
> On Saturday, 10 March 2012 at 23:28:42 UTC, Philippe Sigaud wrote:
>> * Grammars can be dumped in a file to create a D module.
>>
>
>
> In reading the D spec I've seen a few instance where there are infered
> items such as auto for variables and various parameters in templates. I
> presume your D grammar will have to have some rules to be able to infer
> these as well.
>
> It seems like it would not be a big step to output what are the infered
> proper statements as the .capture output. Is that correct? I think that
> would be helpful in some cases to be able to view the fully expanded
> expression.

This is not what a parser does. Resolving symbols and assigning types to expressions is part of semantic analysis.

March 17, 2012

Re: Pegged, From EBNF to PEG

Posted by Philippe Sigaud
in reply to Dmitry Olshansky

Philippe Sigaud

Posted in reply to Dmitry Olshansky

On Wed, Mar 14, 2012 at 10:48, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

>> That's one of the caveats on PEG. That and greedy operators.
>>
>> 'a*a' never succeeds because 'a*' consumes all the available a's.
>>
>
> Hey, wait, I thought there has to be backtrack here, i.e. when second 'a' finally fails?

PEG only do local backtracking in 'or' choices, not in sequences. I think that's because the original author was dealing with packrat parsing and its O(input-size) guarantee.

I remember trying compile-time backtracking in another similar library in D 1-2 years ago and getting lots of pb. I might add that in Pegged, but I don't know the consequences. How do you do that in std.regex?

(nice article btw, I learnt some about regexes)

March 17, 2012

Re: Pegged, From EBNF to PEG

Posted by Dmitry Olshansky
in reply to Philippe Sigaud

Dmitry Olshansky

Posted in reply to Philippe Sigaud

On 17.03.2012 10:59, Philippe Sigaud wrote:
> On Wed, Mar 14, 2012 at 10:48, Dmitry Olshansky<dmitry.olsh@gmail.com>  wrote:
>
>>> That's one of the caveats on PEG. That and greedy operators.
>>>
>>> 'a*a' never succeeds because 'a*' consumes all the available a's.
>>>
>>
>> Hey, wait, I thought there has to be backtrack here, i.e. when second 'a'
>> finally fails?
>
> PEG only do local backtracking in 'or' choices, not in sequences. I
> think that's because the original author was dealing with packrat
> parsing and its O(input-size) guarantee.

Ok, let's agree on fact that semantically
a* is :
As <- a As / a

and a*? is this:
As <- a / a As

Now that's local ;)

>
> I remember trying compile-time backtracking in another similar library
> in D 1-2 years ago and getting lots of pb. I might add that in Pegged,
> but I don't know the consequences. How do you do that in std.regex?

There are two fundamental ways to get around the problem, the easiest
to implement is to use a stack to record a position in input + number of alternative/production (I call it a PC - program counter) every time there is a "fork" like that. Then when the current path ultimately fails pop stack, reset input and go over again until you either match or empty the stack. That's how to avoid deeeep recursion that happens here.

The problematic thing now is combinatorial explosion of
a* a* a* .... //simplified, but you get the idea

The trick to avoid this sort of crap is to
1) agree that whatever matches first has higher priority (left-most match) that's a simple disambiguation rule.
2) record what positions + pc you place on stack e.g. use a set, or in fact, a bitmap to push every unique combination  (pc,index) only once.

Voila! Now you have a linear worst-case algorithm with (M*N) complexity where M is number of all possible PCs, and N is number of all possible indexes in input.

Now if we put aside all crap talk about mathematical purity and monads, and you name it, a packrat is essentially this, a dynamic programming trick applied to recursive top-down parsing. The trick could be extended even more to avoid extra checks on input characters, but the essence is this "memoization".

>
> (nice article btw, I learnt some about regexes)

Thanks, I hope it makes them more popular.
Might as well keep me busy fixing bugs :)

-- 
Dmitry Olshansky

March 17, 2012

Re: Pegged, From EBNF to PEG

Posted by Philippe Sigaud
in reply to Dmitry Olshansky

Philippe Sigaud

Posted in reply to Dmitry Olshansky

On Sat, Mar 17, 2012 at 10:09, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

> Ok, let's agree on fact that semantically
> a* is :
> As <- a As / a
>
> and a*? is this:
> As <- a / a As
>
> Now that's local ;)

It's local, yes. But the pb is with   Expr <- A* B C D, when D fails. PEG sequences don't backtrack.

>> I remember trying compile-time backtracking in another similar library in D 1-2 years ago and getting lots of pb. I might add that in Pegged, but I don't know the consequences. How do you do that in std.regex?
>
>
>
> There are two fundamental ways to get around the problem
(snip)

Thanks for the explanations. Does std.regex implement this?

> Now if we put aside all crap talk about mathematical purity and monads, and you name it, a packrat is essentially this, a dynamic programming trick applied to recursive top-down parsing. The trick could be extended even more to avoid extra checks on input characters, but the essence is this "memoization".

I plan to store indices anyway. I might add this in the future. A read something on using PEGs to parse Java and C and there was no need to do a complete memoization: storing the last parse result as a temporary seemed to be enough.

>> (nice article btw, I learnt some about regexes)
>
>
> Thanks, I hope it makes them more popular.
> Might as well keep me busy fixing bugs :)

As people said on Reddit, you should make more whooping sounds about CT-regex. That was what wowed me and pushed me to tackle CT-parsing again.

March 17, 2012

Re: Pegged, From EBNF to PEG

Posted by Dmitry Olshansky
in reply to Philippe Sigaud

Dmitry Olshansky

Posted in reply to Philippe Sigaud

On 17.03.2012 18:11, Philippe Sigaud wrote:
> On Sat, Mar 17, 2012 at 10:09, Dmitry Olshansky<dmitry.olsh@gmail.com>  wrote:
>
>> Ok, let's agree on fact that semantically
>> a* is :
>> As<- a As / a
>>
>> and a*? is this:
>> As<- a / a As
>>
>> Now that's local ;)
>
> It's local, yes. But the pb is with   Expr<- A* B C D, when D fails.
> PEG sequences don't backtrack.

I'd argue they do. As I see it as:
Expr <- As B C D / B C D
As <- A / A As
(or use an epsilon production for As, is it allowed in pegged ?)

>
>
>>> I remember trying compile-time backtracking in another similar library
>>> in D 1-2 years ago and getting lots of pb. I might add that in Pegged,
>>> but I don't know the consequences. How do you do that in std.regex?
>>
>>
>>
>> There are two fundamental ways to get around the problem
> (snip)
>
> Thanks for the explanations. Does std.regex implement this?
>

It does the second way that I haven't told about, and had hard time to implement in C-T :)
And the simple version of what I presented (i.e. stack stuff) is used in CT-regex and as fallback in R-T. The problem with doing a bitmap memoization is that regex often used to _search_ things in large inputs. Some compromise and/or thresholds have to be estimated and found.
Parsing on the contrary guaranties that the whole input is used or in error, hence bitmap is not wasted.

>
>> Now if we put aside all crap talk about mathematical purity and monads, and
>> you name it, a packrat is essentially this, a dynamic programming trick
>> applied to recursive top-down parsing. The trick could be extended even more
>> to avoid extra checks on input characters, but the essence is this
>> "memoization".
>
> I plan to store indices anyway. I might add this in the future. A read
> something on using PEGs to parse Java and C and there was no need to
> do a complete memoization: storing the last parse result as a
> temporary seemed to be enough.
>

Well even straight no-memorization is somehow enough, it's just a way to ensure no extra work is done. C & Java are not much of backtracky grammars.

>>> (nice article btw, I learnt some about regexes)
>>
>>
>> Thanks, I hope it makes them more popular.
>> Might as well keep me busy fixing bugs :)
>
> As people said on Reddit, you should make more whooping sounds about
> CT-regex. That was what wowed me and pushed me to tackle CT-parsing
> again.

It's just I'm also well aware of how much luck it (still) takes to fly ;)
The asserts that compare C-T vs R-T go off too often for my taste, other then this the memory usage during compile is too high.
I recall once dmd had GC re-enabled for brief period, dmd used no more then ~64Mb doing real crazy ct-regex stuff.

OK, back to C-T parsing, I have one crazy idea that I can't get away from - add operator precedence grammar into the mix. From what I observe it should integrate into PEG smoothly. Then it would make "military-grade" hybrid that uses operator precedence parsing of expressions and the like. Few more tricks and it may beat some
existing parser generators.

See this post, where I tried to describe that idea early on:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=159562

I might catch spare time to go about adding it myself, the only tricky thing is to embed plain semantic actions, as AST generation would be more or less the same.

-- 
Dmitry Olshansky

March 17, 2012

Re: Pegged, From EBNF to PEG

Posted by Philippe Sigaud
in reply to Dmitry Olshansky

Philippe Sigaud

Posted in reply to Dmitry Olshansky

On Sat, Mar 17, 2012 at 15:48, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

>> PEG sequences don't backtrack.
>
>
> I'd argue they do. As I see it as:
> Expr <- As B C D / B C D
> As <- A / A As

That's what people doing Regex-to-PEG translations do, yes. But it's not the spontaneous behavior of A* B C D in PEG.

But that means I could add a switch to transform the expressions that way.

> (or use an epsilon production for As, is it allowed in pegged ?)

I called it 'Eps', it's a predefined parser that always matches and
consumes nothing.
I used the greek epsilon at the beginning (ε) but thought that many
people would shout at this :)

> OK, back to C-T parsing, I have one crazy idea that I can't get away from -
> add operator precedence grammar into the mix. From what I observe it should
> integrate into PEG smoothly. Then it would make "military-grade" hybrid that
> uses operator precedence parsing of expressions and the like. Few more
> tricks and it may beat some
> existing parser generators.
>
> See this post, where I tried to describe that idea early on: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=159562

I remember reading this. But I don't feel I'm up to it for now.

> I might catch spare time to go about adding it myself, the only tricky thing is to embed plain semantic actions, as AST generation would be more or less the same.

Cool!

March 23, 2012

Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

Posted by Martin Nowak
in reply to Philippe Sigaud

Martin Nowak

Posted in reply to Philippe Sigaud

On Sun, 11 Mar 2012 00:28:42 +0100, Philippe Sigaud <philippe.sigaud@gmail.com> wrote:

> Hello,
>
> I created a new Github project, Pegged, a Parsing Expression Grammar (PEG) generator in D.
>
> https://github.com/PhilippeSigaud/Pegged
>
> docs: https://github.com/PhilippeSigaud/Pegged/wiki
>
> PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar
>
> The idea is to give the generator a PEG with the standard syntax.  From this grammar definition, a set of related parsers will be created, to be used at runtime or compile time.
>

Just wanted to draw your attention on a pull request of mine.
https://github.com/D-Programming-Language/dmd/pull/426

Using an added option (-M|Mf|Md) dmd will gather all mixin
strings into a file. This is very helpful for better error
messages and debugging.

August 07, 2013

Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

Posted by Elie Morisse
in reply to Philippe Sigaud

Elie Morisse

Posted in reply to Philippe Sigaud

Hi Philippe,

I wrote a Kate(the official text editor of KDE, also used in KDevelop) syntax highlighting file for Pegged:

http://www.homo-nebulus.fr/dlang/pegged.xml

Screenshot:
http://imgur.com/sb3jFqs.png

Of course for the syntax highlighting to work the grammar must be in a separate file and included like this in a .d file:

  mixin(grammar(import("Tales.pgd")));

August 09, 2013

Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

Posted by Tom Compton
in reply to Alex Rønne Petersen

Tom Compton

Posted in reply to Alex Rønne Petersen

On Sunday, 11 March 2012 at 12:40:34 UTC, Alex Rønne Petersen
wrote:

> Question: Are the generated parsers, AST nodes, etc classes or structs?

Why is this important?

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation