Let's stop parser Hell (page 10)

On 07-Jul-12 22:29, Chad J wrote: > On 07/07/2012 01:01 PM, Roman D. Boiko wrote: >> On Saturday, 7 July 2012 at 16:56:06 UTC, Chad J wrote: >>> Yeah, it's good to hear this notion reinforced. I had this suspicion >>> that the packrat parser is not necessarily the best/fastest solution, >>> mostly because of the large allocation that has to happen before you >>> get O(n) performance. Thus I figured that pegged might eventually use >>> different parsing strategies underneath it all, possibly with a lot of >>> special-casing and clever hand-tuned and profiled optimizations. At >>> least that's what makes sense to me. >> >> At the very least, we could use DFA instead of backtracking where >> possible. This is the approach implemented in ANTLR, but I wanted to >> introduce them before I knew about existence of the latter, simply >> because this would likely produce the fastest parsers possible. > > These were my thoughts exactly, although somewhat unsubstantiated in my > case ;) Yup the nice thing about ANTLR is the usage of DFA. e.g. it's used for disambiguating alternatives that LL(k) could never do. -- Dmitry Olshansky

July 07, 2012

Re: Let's stop parser Hell

Posted by Dmitry Olshansky
in reply to Chad J

Permalink

Dmitry Olshansky

Posted in reply to Chad J

Permalink

On 07-Jul-12 20:56, Chad J wrote:
> On 07/07/2012 12:37 PM, Roman D. Boiko wrote:
>> On Saturday, 7 July 2012 at 16:27:00 UTC, Philippe Sigaud wrote:
>>
>>> Note that PEG does not impose to use packrat parsing, even though it
>>> was developed to use it. I think it's a historical 'accident' that put
>>> the two together: Bryan Ford thesis used the two together.
>>>
>>> Note that many PEG parsers do not rely on packrat (Pegged does not).
>>> There are a bunch of articles on Bryan Ford's website by a guy
>>> writting a PEG parser for Java, and who found that storing the last
>>> rules was enought to get a slight speed improvement, buth that doing
>>> anymore sotrage was detrimental to the parser's overall efficiency.
>>
>> That's great! Anyway I want to understand the advantages and limitations
>> of both Pegged and ANTLR, and probably study some more techniques. Such
>> research consumes a lot of time but can be done incrementally along with
>> development.
>
> Yeah, it's good to hear this notion reinforced.  I had this suspicion
> that the packrat parser is not necessarily the best/fastest solution,
> mostly because of the large allocation that has to happen before you get
> O(n) performance.  Thus I figured that pegged might eventually use
> different parsing strategies underneath it all, possibly with a lot of
> special-casing and clever hand-tuned and profiled optimizations.  At
> least that's what makes sense to me.

Another idea that I've never realized is to add operator precedence grammar to pegged. Of course it should fit naturally with traditional PEG, for instance taking responsibility for parsing expressions.

-- 
Dmitry Olshansky

On Saturday, 7 July 2012 at 19:29:47 UTC, Dmitry Olshansky wrote: > Another idea that I've never realized is to add operator precedence grammar to pegged. Of course it should fit naturally with traditional PEG, for instance taking responsibility for parsing expressions. But that's already available by explicitly defining expression grammar via nested rules. See for example the examples/dgrammar.d in Pegged. This way, for example, multiplication has precedence over addition. (It looks like I misunderstood you. Did I?)

On 07-Jul-12 23:35, Roman D. Boiko wrote: > On Saturday, 7 July 2012 at 19:29:47 UTC, Dmitry Olshansky wrote: >> Another idea that I've never realized is to add operator precedence >> grammar to pegged. Of course it should fit naturally with traditional >> PEG, for instance taking responsibility for parsing expressions. > > But that's already available by explicitly defining expression grammar > via nested rules. See for example the examples/dgrammar.d in Pegged. > This way, for example, multiplication has precedence over addition. (It > looks like I misunderstood you. Did I?) Yes, I've meant something completely different ;) http://en.wikipedia.org/wiki/Operator-precedence_grammar -- Dmitry Olshansky

On 07/07/2012 09:35 PM, Roman D. Boiko wrote: > On Saturday, 7 July 2012 at 19:29:47 UTC, Dmitry Olshansky wrote: >> Another idea that I've never realized is to add operator precedence >> grammar to pegged. Of course it should fit naturally with traditional >> PEG, for instance taking responsibility for parsing expressions. > > But that's already available by explicitly defining expression grammar > via nested rules. See for example the examples/dgrammar.d in Pegged. > This way, for example, multiplication has precedence over addition. (It > looks like I misunderstood you. Did I?) http://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method

On Saturday, 7 July 2012 at 19:50:37 UTC, Timon Gehr wrote: > http://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method OK, at least I didn't misunderstand. So my question was whether the alternative that I described and which exists in PEG is somehow worse than OPP. Wikipedia seems to provide an answer to that: OPP tend to be simpler. (I didn't investigate this topic further.)

On 7/7/12 1:01 PM, Roman D. Boiko wrote: > On Saturday, 7 July 2012 at 16:56:06 UTC, Chad J wrote: >> Yeah, it's good to hear this notion reinforced. I had this suspicion >> that the packrat parser is not necessarily the best/fastest solution, >> mostly because of the large allocation that has to happen before you >> get O(n) performance. Thus I figured that pegged might eventually use >> different parsing strategies underneath it all, possibly with a lot of >> special-casing and clever hand-tuned and profiled optimizations. At >> least that's what makes sense to me. > > At the very least, we could use DFA instead of backtracking where > possible. This is the approach implemented in ANTLR, but I wanted to > introduce them before I knew about existence of the latter, simply > because this would likely produce the fastest parsers possible. Doesn't ANTLR use full-fledged character-level LL(*) parsing even in the tokenizer? Andrei

On Saturday, 7 July 2012 at 19:58:37 UTC, Roman D. Boiko wrote: > On Saturday, 7 July 2012 at 19:50:37 UTC, Timon Gehr wrote: >> http://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method > > OK, at least I didn't misunderstand. So my question was whether the alternative that I described and which exists in PEG is somehow worse than OPP. Wikipedia seems to provide an answer to that: OPP tend to be simpler. (I didn't investigate this topic further.) OK, now I agree, the need to perform several nested calls in order to parse some expression is costly enough to consider OPP superior to a general PEG for expressions. At first I was surprised when found that D doesn't define operator precedence explicitly, but instead provides a hierarchy of expression types. Then I somehow concluded that the approaches are equivalent. (I started learning parsing techniques only in February '12.) Since then I never reconsidered my past conclusions.

On 7/7/12 3:17 PM, Dmitry Olshansky wrote: > I'll have to point out that the whole point about integrated lexing is > moot. It's more of liability then benefit. At very least it's just > implementation curiosity not advantage. Interesting. I'll have to ask for more details on why. Andrei

On Saturday, 7 July 2012 at 20:04:21 UTC, Andrei Alexandrescu wrote: > Doesn't ANTLR use full-fledged character-level LL(*) parsing even in the tokenizer? Since I didn't understand your question I assume that my statement was somehow incorrect (likely because I made some wrong assumptions about ANTLR). I didn't know about its existence until today and still don't understand it completely. What I think I understood is that it uses DFA for deciding which grammar rule to apply instead of doing backtracking. I also think that it uses DFA for low-level scanning (I'm not sure). The idea to introduce DFA for both determining which rule to apply and lexing of terminal symbols appeared to me much earlier, and the suggestion to introduce them into Pegged is one of options which I think could extremely improve performance.

Forums