View mode: basic / threaded / horizontal-split · Log in · Help
July 07, 2012
Re: Let's stop parser Hell
On 07-Jul-12 22:29, Chad J wrote:
> On 07/07/2012 01:01 PM, Roman D. Boiko wrote:
>> On Saturday, 7 July 2012 at 16:56:06 UTC, Chad J wrote:
>>> Yeah, it's good to hear this notion reinforced. I had this suspicion
>>> that the packrat parser is not necessarily the best/fastest solution,
>>> mostly because of the large allocation that has to happen before you
>>> get O(n) performance. Thus I figured that pegged might eventually use
>>> different parsing strategies underneath it all, possibly with a lot of
>>> special-casing and clever hand-tuned and profiled optimizations. At
>>> least that's what makes sense to me.
>>
>> At the very least, we could use DFA instead of backtracking where
>> possible. This is the approach implemented in ANTLR, but I wanted to
>> introduce them before I knew about existence of the latter, simply
>> because this would likely produce the fastest parsers possible.
>
> These were my thoughts exactly, although somewhat unsubstantiated in my
> case ;)

Yup the nice thing about ANTLR is the usage of DFA. e.g. it's used for 
disambiguating alternatives that LL(k) could never do.

-- 
Dmitry Olshansky
July 07, 2012
Re: Let's stop parser Hell
On 07-Jul-12 20:56, Chad J wrote:
> On 07/07/2012 12:37 PM, Roman D. Boiko wrote:
>> On Saturday, 7 July 2012 at 16:27:00 UTC, Philippe Sigaud wrote:
>>
>>> Note that PEG does not impose to use packrat parsing, even though it
>>> was developed to use it. I think it's a historical 'accident' that put
>>> the two together: Bryan Ford thesis used the two together.
>>>
>>> Note that many PEG parsers do not rely on packrat (Pegged does not).
>>> There are a bunch of articles on Bryan Ford's website by a guy
>>> writting a PEG parser for Java, and who found that storing the last
>>> rules was enought to get a slight speed improvement, buth that doing
>>> anymore sotrage was detrimental to the parser's overall efficiency.
>>
>> That's great! Anyway I want to understand the advantages and limitations
>> of both Pegged and ANTLR, and probably study some more techniques. Such
>> research consumes a lot of time but can be done incrementally along with
>> development.
>
> Yeah, it's good to hear this notion reinforced.  I had this suspicion
> that the packrat parser is not necessarily the best/fastest solution,
> mostly because of the large allocation that has to happen before you get
> O(n) performance.  Thus I figured that pegged might eventually use
> different parsing strategies underneath it all, possibly with a lot of
> special-casing and clever hand-tuned and profiled optimizations.  At
> least that's what makes sense to me.

Another idea that I've never realized is to add operator precedence 
grammar to pegged. Of course it should fit naturally with traditional 
PEG, for instance taking responsibility for parsing expressions.

-- 
Dmitry Olshansky
July 07, 2012
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 19:29:47 UTC, Dmitry Olshansky wrote:
> Another idea that I've never realized is to add operator 
> precedence grammar to pegged. Of course it should fit naturally 
> with traditional PEG, for instance taking responsibility for 
> parsing expressions.

But that's already available by explicitly defining expression 
grammar via nested rules. See for example the examples/dgrammar.d 
in Pegged. This way, for example, multiplication has precedence 
over addition. (It looks like I misunderstood you. Did I?)
July 07, 2012
Re: Let's stop parser Hell
On 07-Jul-12 23:35, Roman D. Boiko wrote:
> On Saturday, 7 July 2012 at 19:29:47 UTC, Dmitry Olshansky wrote:
>> Another idea that I've never realized is to add operator precedence
>> grammar to pegged. Of course it should fit naturally with traditional
>> PEG, for instance taking responsibility for parsing expressions.
>
> But that's already available by explicitly defining expression grammar
> via nested rules. See for example the examples/dgrammar.d in Pegged.
> This way, for example, multiplication has precedence over addition. (It
> looks like I misunderstood you. Did I?)

Yes, I've meant something completely different ;)
http://en.wikipedia.org/wiki/Operator-precedence_grammar

-- 
Dmitry Olshansky
July 07, 2012
Re: Let's stop parser Hell
On 07/07/2012 09:35 PM, Roman D. Boiko wrote:
> On Saturday, 7 July 2012 at 19:29:47 UTC, Dmitry Olshansky wrote:
>> Another idea that I've never realized is to add operator precedence
>> grammar to pegged. Of course it should fit naturally with traditional
>> PEG, for instance taking responsibility for parsing expressions.
>
> But that's already available by explicitly defining expression grammar
> via nested rules. See for example the examples/dgrammar.d in Pegged.
> This way, for example, multiplication has precedence over addition. (It
> looks like I misunderstood you. Did I?)

http://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method
July 07, 2012
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 19:50:37 UTC, Timon Gehr wrote:
> http://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method

OK, at least I didn't misunderstand. So my question was whether 
the alternative that I described and which exists in PEG is 
somehow worse than OPP. Wikipedia seems to provide an answer to 
that: OPP tend to be simpler. (I didn't investigate this topic 
further.)
July 07, 2012
Re: Let's stop parser Hell
On 7/7/12 1:01 PM, Roman D. Boiko wrote:
> On Saturday, 7 July 2012 at 16:56:06 UTC, Chad J wrote:
>> Yeah, it's good to hear this notion reinforced. I had this suspicion
>> that the packrat parser is not necessarily the best/fastest solution,
>> mostly because of the large allocation that has to happen before you
>> get O(n) performance. Thus I figured that pegged might eventually use
>> different parsing strategies underneath it all, possibly with a lot of
>> special-casing and clever hand-tuned and profiled optimizations. At
>> least that's what makes sense to me.
>
> At the very least, we could use DFA instead of backtracking where
> possible. This is the approach implemented in ANTLR, but I wanted to
> introduce them before I knew about existence of the latter, simply
> because this would likely produce the fastest parsers possible.

Doesn't ANTLR use full-fledged character-level LL(*) parsing even in the 
tokenizer?

Andrei
July 07, 2012
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 19:58:37 UTC, Roman D. Boiko wrote:
> On Saturday, 7 July 2012 at 19:50:37 UTC, Timon Gehr wrote:
>> http://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method
>
> OK, at least I didn't misunderstand. So my question was whether 
> the alternative that I described and which exists in PEG is 
> somehow worse than OPP. Wikipedia seems to provide an answer to 
> that: OPP tend to be simpler. (I didn't investigate this topic 
> further.)
OK, now I agree, the need to perform several nested calls in 
order to parse some expression is costly enough to consider OPP 
superior to a general PEG for expressions.

At first I was surprised when found that D doesn't define 
operator precedence explicitly, but instead provides a hierarchy 
of expression types. Then I somehow concluded that the approaches 
are equivalent. (I started learning parsing techniques only in 
February '12.) Since then I never reconsidered my past 
conclusions.
July 07, 2012
Re: Let's stop parser Hell
On 7/7/12 3:17 PM, Dmitry Olshansky wrote:
> I'll have to point out that the whole point about integrated lexing is
> moot. It's more of liability then benefit. At very least it's just
> implementation curiosity not advantage.

Interesting. I'll have to ask for more details on why.

Andrei
July 07, 2012
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 20:04:21 UTC, Andrei Alexandrescu 
wrote:
> Doesn't ANTLR use full-fledged character-level LL(*) parsing 
> even in the tokenizer?

Since I didn't understand your question I assume that my 
statement was somehow incorrect (likely because I made some wrong 
assumptions about ANTLR). I didn't know about its existence until 
today and still don't understand it completely. What I think I 
understood is that it uses DFA for deciding which grammar rule to 
apply instead of doing backtracking. I also think that it uses 
DFA for low-level scanning (I'm not sure).

The idea to introduce DFA for both determining which rule to 
apply and lexing of terminal symbols appeared to me much earlier, 
and the suggestion to introduce them into Pegged is one of 
options which I think could extremely improve performance.
6 7 8 9 10 11 12 13 14
Top | Discussion index | About this forum | D home