Let's stop parser Hell (page 18) - D Programming Language Discussion Forum

July 10, 2012

Re: Let's stop parser Hell

Posted by Jonathan M Davis
in reply to Jacob Carlborg

On Tuesday, July 10, 2012 22:40:17 Jacob Carlborg wrote:
> On 2012-07-10 22:25, Jonathan M Davis wrote:
> > Which is horrible. You pretty much have to with HTML because of the horrid decision that it should be parsed so laxly by browsers, but pretty much nothing else should do that. Either it's correct or it's not. Having the compiler "fix" your code would cause far more problems than it would ever fix.
> I'm not sure, but I think he was referring to a kind of error reporting technique used by compilers. Example:
> 
> int foo ()
> {
>     int a = 3 // note the missing semicolon
>     return a;
> }
> 
> Instead of the parser going completely mad because of the missing semicolon, it will basically insert a semicolon, report the error and then happily continue parsing. I think this will make it easier to find later errors and less likely to report incorrect errors due to a previous error.

Well, giving an error, continuing to parse, and giving a partial result can be useful (and you give a prime example of that), but "fixing" the problem (e.g. by inserting the semicolon) and not considering it to be an error would be a _huge_ mistake IMHO. And that's essentially what happens with HTML.

- Jonathan M Davis

July 10, 2012

Re: Let's stop parser Hell

Posted by Timon Gehr
in reply to Jonathan M Davis

On 07/10/2012 10:53 PM, Jonathan M Davis wrote:
> On Tuesday, July 10, 2012 22:40:17 Jacob Carlborg wrote:
>> On 2012-07-10 22:25, Jonathan M Davis wrote:
>>> Which is horrible. You pretty much have to with HTML because of the horrid
>>> decision that it should be parsed so laxly by browsers, but pretty much
>>> nothing else should do that. Either it's correct or it's not. Having the
>>> compiler "fix" your code would cause far more problems than it would ever
>>> fix.
>> I'm not sure but I think he was referring to a kind of error reporting
>> technique used by compilers. Example:
>>
>> int foo ()
>> {
>>     int a = 3 // note the missing semicolon
>>     return a;
>> }
>>
>> Instead of the parser going completely mad because of the missing
>> semicolon, it will basically insert a semicolon, report the error and
>> then happily continue parsing. I think this will make it easier to find
>> later errors and less likely to report incorrect errors due to a
>> previous error.
>
> Well, giving an error, continuing to parse, and giving a partial result can be
> useful (and you give a prime example of that), but "fixing" the problem (e.g. by
> inserting the semicolon) and not considering it to be an error would be a
> _huge_ mistake IMHO.

This is actually precisely what many of the more recent curly-brace-
and-semicolon languages have been doing with regard to semicolons.
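
To make that concrete: the post does not name the languages, so take Go as an assumed example. Its lexer adds a semicolon after a line's last token when that token is an identifier, a literal, one of a few keywords, or a closing bracket. Below is a rough Python sketch of that kind of rule. The token pattern, the keyword list and the `insert_semicolons` helper are all hypothetical simplifications, not any real compiler's code.

```python
import re

# Hypothetical, simplified token pattern: identifiers, integers, ++/--, punctuation.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\+\+|--|[^\s\w]")

# Deliberately incomplete keyword list (an assumption for this sketch).
KEYWORDS = {"if", "else", "for", "func", "var",
            "return", "break", "continue", "fallthrough"}
ENDING_KEYWORDS = {"return", "break", "continue", "fallthrough"}
ENDING_PUNCT = {"++", "--", ")", "]", "}"}


def ends_statement(tok):
    """True if a line ending in `tok` should get an implicit semicolon."""
    if tok in KEYWORDS:
        return tok in ENDING_KEYWORDS
    if tok in ENDING_PUNCT:
        return True
    # Identifiers and integer literals start with a word character.
    return tok[0].isalnum() or tok[0] == "_"


def insert_semicolons(source):
    """Tokenize line by line, appending ';' where the rule above applies."""
    out = []
    for line in source.splitlines():
        toks = TOKEN.findall(line)
        out.extend(toks)
        if toks and ends_statement(toks[-1]):
            out.append(";")
    return out


print(insert_semicolons("var a = 3\nreturn a"))
```

The point of the design is that the rule lives in the lexer and is part of the language definition, so a missing semicolon at the end of such a line is simply not an error.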

July 10, 2012

Re: Let's stop parser Hell

Posted by deadalnix
in reply to Christophe Travert

On 09/07/2012 10:14, Christophe Travert wrote:
> deadalnix, in message (digitalmars.D:171330), wrote:
>> D isn't 100% CFG. But it is close.
>
> What makes D fail to be a CFG?

type[something] <= something can be a type or an expression.
typeid(something) <= same here
identifier!(something) <= again

July 10, 2012

Re: Let's stop parser Hell

Posted by Timon Gehr
in reply to deadalnix

On 07/11/2012 01:16 AM, deadalnix wrote:
> On 09/07/2012 10:14, Christophe Travert wrote:
>> deadalnix, in message (digitalmars.D:171330), wrote:
>>> D isn't 100% CFG. But it is close.
>>
>> What makes D fail to be a CFG?
>
> type[something] <= something can be a type or an expression.
> typeid(something) <= same here
> identifier!(something) <= again

'something' is context-free:

something ::= type | expression.

July 11, 2012

Re: Let's stop parser Hell

Posted by Jacob Carlborg
in reply to Jonathan M Davis

On 2012-07-10 22:53, Jonathan M Davis wrote:

> Well, giving an error, continuing to parse, and giving a partial result can be
> useful (and you give a prime example of that), but "fixing" the problem (e.g. by
> inserting the semicolon) and not considering it to be an error would be a
> _huge_ mistake IMHO. And that's essentially what happens with HTML.

No, that is _not_ what happens with HTML. With HTML, the browser does _not_ output the error and continues as if it were valid code. As far as I know, up until HTML 5, the spec didn't say what should happen with invalid code.

This is just an error-handling strategy, which is an implementation detail. It will not change what is and what isn't valid code. Would you prefer to get just the first error when compiling? Fix the error, compile again, get a new error, and so on.

-- 
/Jacob Carlborg

July 11, 2012

Re: Let's stop parser Hell

Posted by Jonathan M Davis
in reply to Jacob Carlborg

On Wednesday, July 11, 2012 08:41:53 Jacob Carlborg wrote:
> On 2012-07-10 22:53, Jonathan M Davis wrote:
> > Well, giving an error, continuing to parse, and giving a partial result can be useful (and you give a prime example of that), but "fixing" the problem (e.g. by inserting the semicolon) and not considering it to be an error would be a _huge_ mistake IMHO. And that's essentially what happens with HTML.
> No, that is _not_ what happens with HTML. With HTML, the browser does _not_ output the error and continues as if it were valid code. As far as I know, up until HTML 5, the spec didn't say what should happen with invalid code.
> 
> This is just an error-handling strategy, which is an implementation detail. It will not change what is and what isn't valid code. Would you prefer to get just the first error when compiling? Fix the error, compile again, get a new error, and so on.

??? I guess that I wasn't clear. I mean that with HTML, it ignores errors. The browser doesn't spit out errors. It just guesses at what you really meant and displays that. It "fixes" the error for you, which is a horrible design IMHO. Obviously, we're stuck with it for HTML, but it should not be replicated with anything else.

This is in contrast to your example of outputting an error and continuing to parse as best it can in order to provide more detail and more error messages but _not_ ultimately considering the parsing successful. _That_ is useful. HTML's behavior is not.
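
As a rough illustration of that distinction, here is a minimal Python sketch of a toy statement parser that recovers from a missing semicolon but never reports success once an error has been seen. Everything in it (the `ParseResult` type, `parse_statements`, the `STARTERS` set, the token pattern) is made up for this example and is not how any real D front end works.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy lexer: identifiers, integers, single punctuation characters.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")

# Assumption for this sketch: these tokens can only begin a new statement.
STARTERS = {"int", "return"}


@dataclass
class ParseResult:
    statements: list = field(default_factory=list)  # partial result
    errors: list = field(default_factory=list)      # every error found

    @property
    def ok(self):
        # Recovery never turns a failed parse into a successful one.
        return not self.errors


def parse_statements(source):
    tokens = TOKEN.findall(source)
    result = ParseResult()
    pos = 0
    while pos < len(tokens):
        if tokens[pos] == ";":  # empty statement
            pos += 1
            continue
        start = pos
        pos += 1
        # A statement runs until ';' or until something that must start a new one.
        while (pos < len(tokens) and tokens[pos] != ";"
               and tokens[pos] not in STARTERS):
            pos += 1
        result.statements.append(tokens[start:pos])
        if pos < len(tokens) and tokens[pos] == ";":
            pos += 1
        else:
            # Report the error, act as if the ';' were there, keep going.
            result.errors.append(f"expected ';' after '{tokens[pos - 1]}'")
    return result


r = parse_statements("int a = 3\nreturn a;")
print(r.statements, r.errors, r.ok)
```

The caller still gets both statements and can go on to report later errors, but `ok` stays false, so nothing downstream treats the input as valid. The HTML approach would correspond to dropping the `errors` list entirely.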

- Jonathan M Davis

July 11, 2012

Re: Let's stop parser Hell

Posted by Christophe Travert
in reply to Timon Gehr

Timon Gehr, in message (digitalmars.D:171814), wrote:
> On 07/11/2012 01:16 AM, deadalnix wrote:
>> On 09/07/2012 10:14, Christophe Travert wrote:
>>> deadalnix, in message (digitalmars.D:171330), wrote:
>>>> D isn't 100% CFG. But it is close.
>>>
>>> What makes D fail to be a CFG?
>>
>> type[something] <= something can be a type or an expression.
>> typeid(something) <= same here
>> identifier!(something) <= again
> 
> 'something' is context-free:
> 
> something ::= type | expression.

Do you have to know whether something is a type or an expression in order to do simple parsing? The language had better not require this; otherwise, simple parsing is not possible without looking at all forward references and imported files.

July 11, 2012

Re: Let's stop parser Hell

Posted by Jacob Carlborg
in reply to Jonathan M Davis

On 2012-07-11 08:52, Jonathan M Davis wrote:

> ??? I guess that I wasn't clear. I mean that with HTML, it ignores errors. The
> browser doesn't spit out errors. It just guesses at what you really meant and
> displays that. It "fixes" the error for you, which is a horrible design IMHO.
> Obviously, we're stuck with it for HTML, but it should not be replicated with
> anything else.
>
> This is in contrast to your example of outputting an error and continuing to
> parse as best it can in order to provide more detail and more error messages
> but _not_ ultimately considering the parsing successful. _That_ is useful.
> HTML's behavior is not.

OK, I see. It seems we mean the same thing.

-- 
/Jacob Carlborg

July 11, 2012

Re: Let's stop parser Hell

Posted by Timon Gehr
in reply to Christophe Travert

On 07/11/2012 10:23 AM, Christophe Travert wrote:
> Timon Gehr, in message (digitalmars.D:171814), wrote:
>> On 07/11/2012 01:16 AM, deadalnix wrote:
>>> On 09/07/2012 10:14, Christophe Travert wrote:
>>>> deadalnix, in message (digitalmars.D:171330), wrote:
>>>>> D isn't 100% CFG. But it is close.
>>>>
>>>> What makes D fail to be a CFG?
>>>
>>> type[something] <= something can be a type or an expression.
>>> typeid(something) <= same here
>>> identifier!(something) <= again
>>
>> 'something' is context-free:
>>
>> something ::= type | expression.
>
> Do you have to know whether something is a type or an expression for simple
> parsing?

No. Some token sequences can be both a type and an expression based on
the context (the CFG is ambiguous), but that is the analyser's business.
Parsing D code does not require any kind of analysis.
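
One way to picture this split, as a rough Python sketch rather than how any actual D compiler is structured: the parser records `base[arg]` with the argument left as "type or expression", and a later semantic step with a symbol table decides which it is. In D, `int[Foo]` is an associative array type if `Foo` names a type and a static array type if `Foo` is a constant expression. The class names, `parse_bracketed` and `classify` below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ambiguous:
    """A token sequence that is grammatically valid as a type and as an expression."""
    name: str


@dataclass
class Bracketed:
    """Parse tree for `base[arg]`, built without any symbol lookup."""
    base: str
    arg: Ambiguous


def parse_bracketed(text):
    """Purely syntactic step: needs no imports and no forward references."""
    base, _, rest = text.partition("[")
    if not rest.endswith("]"):
        raise SyntaxError(f"expected ']' in {text!r}")
    return Bracketed(base.strip(), Ambiguous(rest[:-1].strip()))


def classify(node, types):
    """Semantic step: `types` stands in for the analyser's symbol table."""
    if node.arg.name in types:
        return "associative array type"  # e.g. int[Key]
    return "static array type"           # e.g. int[N]


node = parse_bracketed("int[Foo]")
print(node)
print(classify(node, types={"Foo"}))
print(classify(node, types=set()))
```

The same parse tree comes out either way; only `classify` needs to know what `Foo` is.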

> The language had better not require this; otherwise, simple
> parsing is not possible without looking at all forward references and
> imported files.

July 11, 2012

Re: Let's stop parser Hell

Posted by David Piepgrass
in reply to Timon Gehr

On Tuesday, 10 July 2012 at 23:49:58 UTC, Timon Gehr wrote:
> On 07/11/2012 01:16 AM, deadalnix wrote:
>> On 09/07/2012 10:14, Christophe Travert wrote:
>>> deadalnix, in message (digitalmars.D:171330), wrote:
>>>> D isn't 100% CFG. But it is close.
>>> What makes D fail to be a CFG?
>> type[something] <= something can be a type or an expression.
>> typeid(something) <= same here
>> identifier!(something) <= again
>
> 'something' is context-free:
>
> something ::= type | expression.

I don't see how "type | expression" is context-free. The input "Foo" could be a type or an expression; you can't tell which without looking at the context.
