Let's stop parser Hell (page 17) - D Programming Language Discussion Forum

July 09, 2012

Re: Let's stop parser Hell

Posted by Daniel Murphy
in reply to Jonathan M Davis

"Jonathan M Davis" <jmdavisProg@gmx.com> wrote in message news:mailman.190.1341818983.31962.digitalmars-d@puremagic.com...
>>
>> I'm pretty sure UFCS affects lexing or parsing. How else would this be legal:
>>
>> 4.foo();
>
> That definitely wouldn't affect lexing, because it doesn't affect the
> tokens at
> all.

Not true.  This used to be lexed as '4.f' 'oo'. (I think)
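To illustrate how a lexer rule could produce that split, here is a minimal Python sketch. It is not DMD's lexer; the token rules `OLD_FLOAT` and `NEW_FLOAT` and the `lex` helper are hypothetical, written only to show that a greedy float rule with an `f` suffix swallows the start of `foo`, while a rule that refuses a dot followed by an identifier character leaves `4`, `.`, `foo` as separate tokens.

```python
import re

# Hypothetical token rules for illustration only (not DMD's actual lexer).
OLD_FLOAT = re.compile(r"\d+\.\d*[fFL]?")              # accepts "4." and "4.f"
NEW_FLOAT = re.compile(r"\d+\.(?![A-Za-z_])\d*[fFL]?")  # dot must not start an identifier
INT = re.compile(r"\d+")
IDENT = re.compile(r"[A-Za-z_]\w*")
PUNCT = re.compile(r"[().;]")


def lex(src, float_rule):
    """Split src into (kind, text) tokens, trying the float rule first."""
    rules = [("float", float_rule), ("int", INT), ("ident", IDENT), ("punct", PUNCT)]
    tokens, pos = [], 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        for kind, rule in rules:
            m = rule.match(src, pos)
            if m:
                tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character at offset {pos}")
    return tokens


old_tokens = lex("4.foo()", OLD_FLOAT)  # first token is the float "4.f"
new_tokens = lex("4.foo()", NEW_FLOAT)  # first token is the int "4"
```

Under the first rule the call can never be parsed as a member access on `4`, so whether `4.foo()` is legal does depend on the lexer.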

July 10, 2012

Re: Let's stop parser Hell

Posted by Roman D. Boiko
in reply to Roman D. Boiko

On Saturday, 7 July 2012 at 16:37:56 UTC, Roman D. Boiko wrote:
>> Note that PEG does not require packrat parsing, even though it was developed to use it. I think it's a historical 'accident' that put the two together: Bryan Ford's thesis used the two together.
>>
>> Note that many PEG parsers do not rely on packrat (Pegged does not).
>> There are a bunch of articles on Bryan Ford's website by a guy
>> writing a PEG parser for Java, who found that storing the last rules was enough to get a slight speed improvement, but that doing any more storage was detrimental to the parser's overall efficiency.
>
> That's great! Anyway I want to understand the advantages and limitations of both Pegged and ANTLR, and probably study some more techniques. Such research consumes a lot of time but can be done incrementally along with development.

One disadvantage of Packrat parsers I mentioned was problematic error recovery (according to the article from ANTLR website). After some additional research, I found that it is not a critical problem. To find the exact place of error (from parser's perspective, not user's) one only needs to remember the farthest successfully parsed position (among several backtracking attempts) and the reason that it failed.
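A minimal Python sketch of that bookkeeping, assuming a hand-written backtracking parser rather than Pegged or any packrat implementation. The `Parser` class, its toy grammar (`call <- "f(" number "," number ")" / "f(" number ")"`) and the `parse` helper are all hypothetical. Every failed match reports its position, and only the farthest position and what was expected there are kept.

```python
class Parser:
    """Toy backtracking parser that remembers the farthest failure."""

    def __init__(self, text):
        self.text = text
        self.farthest = 0      # farthest offset at which a match failed
        self.expected = set()  # what was expected at that offset

    def fail(self, pos, what):
        # Keep only failures at the farthest offset seen so far.
        if pos > self.farthest:
            self.farthest, self.expected = pos, {what}
        elif pos == self.farthest:
            self.expected.add(what)
        return None

    def lit(self, s):
        def match(pos):
            if self.text.startswith(s, pos):
                return pos + len(s)
            return self.fail(pos, repr(s))
        return match

    def number(self, pos):
        end = pos
        while end < len(self.text) and self.text[end].isdigit():
            end += 1
        return end if end > pos else self.fail(pos, "number")

    def seq(self, pos, *steps):
        for step in steps:
            pos = step(pos)
            if pos is None:
                return None
        return pos

    def call(self, pos):
        # Ordered choice: try each alternative from the same start position.
        alternatives = (
            (self.lit("f("), self.number, self.lit(","), self.number, self.lit(")")),
            (self.lit("f("), self.number, self.lit(")")),
        )
        for steps in alternatives:
            end = self.seq(pos, *steps)
            if end is not None:
                return end
        return None


def parse(text):
    p = Parser(text)
    end = p.call(0)
    if end == len(text):
        return "ok"
    if end is not None:
        p.fail(end, "end of input")
    return f"error at offset {p.farthest}: expected {' or '.join(sorted(p.expected))}"
```

For the input `f(1,x)` the second alternative fails earlier (at the missing `)`), but the reported error comes from the first alternative, which got farther before failing on `x`.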

It is also possible to rerun parsing with some additional heuristics after failing, thus enabling advanced error repair scenarios.

Since Pegged doesn't use the Packrat algorithm, this solution might be either irrelevant or inapplicable, but I doubt that there will be any fundamental problem with error recovery.

An unpleasant debugging experience, however, is likely for any parser that uses backtracking heavily.

July 10, 2012

Re: Let's stop parser Hell

Posted by Philippe Sigaud
in reply to Roman D. Boiko

On Tue, Jul 10, 2012 at 12:41 PM, Roman D. Boiko <rb@d-coding.com> wrote:

> One disadvantage of Packrat parsers I mentioned was problematic error recovery (according to the article from ANTLR website). After some additional research, I found that it is not a critical problem. To find the exact place of error (from parser's perspective, not user's) one only needs to remember the farthest successfully parsed position (among several backtracking attempts) and the reason that it failed.

IIRC, that's what I encoded in Pegged's (admittedly limited) error reporting: remember the farthest error.

> It is also possible to rerun parsing with some additional heuristics after failing, thus enabling advanced error repair scenarios.

Do people really want error-repairing parsers? I want my parsers to tell me something is bad, and, optionally, to suggest a possible repair, but definitely *not* to automatically repair an inferred error and continue happily.

July 10, 2012

Re: Let's stop parser Hell

Posted by Timon Gehr
in reply to Philippe Sigaud

On 07/10/2012 09:14 PM, Philippe Sigaud wrote:
> On Tue, Jul 10, 2012 at 12:41 PM, Roman D. Boiko <rb@d-coding.com> wrote:
>
>
>> One disadvantage of Packrat parsers I mentioned was problematic error
>> recovery (according to the article from ANTLR website). After some
>> additional research, I found that it is not a critical problem. To find the
>> exact place of error (from parser's perspective, not user's) one only needs
>> to remember the farthest successfully parsed position (among several
>> backtracking attempts) and the reason that it failed.
>
> IIRC, that's what I encoded in Pegged's (admittedly limited) error
> reporting: remember the farthest error.
>
>> It is also possible to rerun parsing with some additional heuristics after
>> failing, thus enabling advanced error repair scenarios.
>
> Do people really want error-repairing parsers? I want my parsers to
> tell me something is bad, and, optionally, to suggest a possible
> repair, but definitely *not* to automatically repair an inferred error
> and continue happily.

FWIW, this is what most HTML parsers are doing.

July 10, 2012

Re: Let's stop parser Hell

Posted by Philippe Sigaud
in reply to Timon Gehr

On Tue, Jul 10, 2012 at 9:25 PM, Timon Gehr <timon.gehr@gmx.ch> wrote:

>> Do people really want error-repairing parsers? I want my parsers to tell me something is bad, and, optionally, to suggest a possible repair, but definitely *not* to automatically repair an inferred error and continue happily.
>
>
> FWIW, this is what most HTML parsers are doing.

Ah, right. I can get it for HTML/XML. JSON also, maybe.
I was thinking of parsing a programming language (C, D, etc)

Consider me half-convinced :)

July 10, 2012

Re: Let's stop parser Hell

Posted by Roman D. Boiko
in reply to Philippe Sigaud

On Tuesday, 10 July 2012 at 19:41:29 UTC, Philippe Sigaud wrote:
> On Tue, Jul 10, 2012 at 9:25 PM, Timon Gehr <timon.gehr@gmx.ch> wrote:
>
>>> Do people really want error-repairing parsers? I want my parsers to
>>> tell me something is bad, and, optionally, to suggest a possible
>>> repair, but definitely *not* to automatically repair an inferred error
>>> and continue happily.
>>
>>
>> FWIW, this is what most HTML parsers are doing.
>
> Ah, right. I can get it for HTML/XML. JSON also, maybe.
> I was thinking of parsing a programming language (C, D, etc)
>
> Consider me half-convinced :)

It would still generate errors, but it would enable a lot of useful functionality: autocompletion, refactoring, symbol documentation in a tooltip, displaying method overloads with parameters as you type, go to definition, etc.

July 10, 2012

Re: Let's stop parser Hell

Posted by Jonathan M Davis
in reply to Timon Gehr

On Tuesday, July 10, 2012 21:25:52 Timon Gehr wrote:
> On 07/10/2012 09:14 PM, Philippe Sigaud wrote:
> > On Tue, Jul 10, 2012 at 12:41 PM, Roman D. Boiko <rb@d-coding.com> wrote:
> >> One disadvantage of Packrat parsers I mentioned was problematic error
> >> recovery (according to the article from ANTLR website). After some
> >> additional research, I found that it is not a critical problem. To find
> >> the
> >> exact place of error (from parser's perspective, not user's) one only
> >> needs
> >> to remember the farthest successfully parsed position (among several
> >> backtracking attempts) and the reason that it failed.
> > 
> > IIRC, that's what I encoded in Pegged's (admittedly limited) error reporting: remember the farthest error.
> > 
> >> It is also possible to rerun parsing with some additional heuristics
> >> after
> >> failing, thus enabling advanced error repair scenarios.
> > 
> > Do people really want error-repairing parsers? I want my parsers to tell me something is bad, and, optionally, to suggest a possible repair, but definitely *not* to automatically repair an inferred error and continue happily.
> 
> FWIW, this is what most HTML parsers are doing.

Which is horrible. You pretty much have to with HTML because of the horrid decision that it should be parsed so laxly by browsers, but pretty much nothing else should do that. Either it's correct or it's not. Having the compiler "fix" your code would cause far more problems than it would ever fix.

- Jonathan M Davis

July 10, 2012

Re: Let's stop parser Hell

Posted by Roman D. Boiko
in reply to Jonathan M Davis

On Tuesday, 10 July 2012 at 20:25:12 UTC, Jonathan M Davis wrote:
> On Tuesday, July 10, 2012 21:25:52 Timon Gehr wrote:
>> FWIW, this is what most HTML parsers are doing.
>
> Which is horrible. You pretty much have to with HTML because of the horrid
> decision that it should be parsed so laxly by browsers, but pretty much
> nothing else should do that. Either it's correct or it's not. Having the
> compiler "fix" your code would cause far more problems than it would ever fix.

Not having control over the parser or the source code is what causes problems. The ability to deliver useful functionality (see my post above) is a different use case.

July 10, 2012

Re: Let's stop parser Hell

Posted by Jacob Carlborg
in reply to Jonathan M Davis

On 2012-07-10 22:25, Jonathan M Davis wrote:

> Which is horrible. You pretty much have to with HTML because of the horrid
> decision that it should be parsed so laxly by browsers, but pretty much
> nothing else should do that. Either it's correct or it's not. Having the
> compiler "fix" your code would cause far more problems than it would ever fix.

I'm not sure, but I think he was referring to a kind of error reporting technique used by compilers. Example:

int foo ()
{
   int a = 3 // note the missing semicolon
   return a;
}

Instead of going completely mad because of the missing semicolon, the parser will basically insert a semicolon, report the error and then happily continue parsing. I think this makes it easier to find later errors and less likely to report spurious errors caused by an earlier one.
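A rough Python sketch of that kind of recovery, under heavy simplifying assumptions: statements are just runs of tokens ending in `;`, and the only recovery rule is that a keyword which can only start a statement (`int` or `return` here) implies a missing `;` before it. The names `tokenize`, `parse_statements` and `STARTERS` are made up for this example and do not come from any real compiler.

```python
import re

# Keywords that can only appear at the start of a statement in this toy language.
STARTERS = {"int", "return"}


def tokenize(src):
    """Split source into words, '=' and ';' (everything else is ignored)."""
    return re.findall(r"\w+|[=;]", src)


def parse_statements(tokens):
    """Group tokens into statements, recovering from a missing ';'.

    Returns (statements, errors). On a missing ';' the error is recorded
    and parsing continues as if the semicolon had been present.
    """
    statements, errors, current = [], [], []
    for tok in tokens:
        if tok == ";":
            statements.append(current)
            current = []
        elif tok in STARTERS and current:
            # A new statement starts before the previous one was closed.
            errors.append(f"expected ';' before '{tok}'")
            statements.append(current)  # behave as if ';' had been there
            current = [tok]
        else:
            current.append(tok)
    if current:
        errors.append("expected ';' at end of input")
        statements.append(current)
    return statements, errors


statements, errors = parse_statements(tokenize("int a = 3 return a;"))
```

The input is still rejected, since `errors` is non-empty, but the `return` statement is parsed normally instead of being swallowed by the broken declaration.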

-- 
/Jacob Carlborg

July 10, 2012

Re: Let's stop parser Hell

Posted by Dmitry Olshansky
in reply to Jonathan M Davis

On 11-Jul-12 00:25, Jonathan M Davis wrote:
> On Tuesday, July 10, 2012 21:25:52 Timon Gehr wrote:
>> On 07/10/2012 09:14 PM, Philippe Sigaud wrote:
>>> On Tue, Jul 10, 2012 at 12:41 PM, Roman D. Boiko <rb@d-coding.com> wrote:
>>>> One disadvantage of Packrat parsers I mentioned was problematic error
>>>> recovery (according to the article from ANTLR website). After some
>>>> additional research, I found that it is not a critical problem. To find
>>>> the
>>>> exact place of error (from parser's perspective, not user's) one only
>>>> needs
>>>> to remember the farthest successfully parsed position (among several
>>>> backtracking attempts) and the reason that it failed.
>>>
>>> IIRC, that's what I encoded in Pegged's (admittedly limited) error
>>> reporting: remember the farthest error.
>>>
>>>> It is also possible to rerun parsing with some additional heuristics
>>>> after
>>>> failing, thus enabling advanced error repair scenarios.
>>>
>>> Do people really want error-repairing parsers? I want my parsers to
>>> tell me something is bad, and, optionally, to suggest a possible
>>> repair, but definitely *not* to automatically repair an inferred error
>>> and continue happily.
>>
>> FWIW, this is what most HTML parsers are doing.
>
> Which is horrible. You pretty much have to with HTML because of the horrid
> decision that it should be parsed so laxly by browsers, but pretty much
> nothing else should do that. Either it's correct or it's not. Having the
> compiler "fix" your code would cause far more problems than it would ever fix.
>

BTW, clang does this, and even more at the semantic level. It's known to have won legions of users because of that (well, not only that, but good diagnostics in general).


-- 
Dmitry Olshansky

Copyright © 1999-2021 by the D Language Foundation