Thread overview
Recursive descent is dangerous
Feb 28, 2005
Manfred Nowak
Feb 28, 2005
Stewart Gordon
Feb 28, 2005
Walter
Mar 04, 2005
Ilya Minkov
Mar 12, 2005
Manfred Nowak
February 28, 2005
Recursive descent parsing  is dangerous, especially when there is no formal grammar at hand.

<cite  href="http://www.digitalmars.com/d/expression.html">
MulExpression:
		UnaryExpression
		MulExpression * UnaryExpression
UnaryExpression:
		* UnaryExpression
		NewExpression
NewExpression:
		new BasicType Stars
Stars:
		nothing
		*
		* Stars
</cite>

Do you see the ambiguousness of this grammar? A recursive descent parser, as currently used, does not see it.

Consider this derivation according to the grammar above, which seems to be excerpted from the recursive descent parser:

    MulExpression
--> MulExpression           * UnaryExpression
--> UnaryExpression         * UnaryExpression
--> * UnaryExpression       * * UnaryExpression
--> * * UnaryExpression     * * UnaryExpression
--> * * NewExpression       * * UnaryExpression
--> * * new BasicType Stars * * UnaryExpression
--> * * new int       *     * * Identifier

Then the following should be flawlessly parsed:

<code>
void main(){
  int** x;
  x=new int*;
  int* y;
  y= *x;

  int z= **x * *y;
  z= **new int* * *y;
}
</code>

But it cannot be parsed because of the ambiguity, which the recursive descent parser is not aware of and therefore parses the code in question as

--> **new BasicType Stars Identifier

which cannot be derived from the startsymbol.

Also parenthesing does not help

  Z= **(new int*) * *y;

because somehow int-promotion is made and the error mesage

 "can only * a pointer, not a 'int'"

is generated.

-manfred
February 28, 2005
Manfred Nowak wrote:
> Recursive descent parsing  is dangerous, especially when there is no formal grammar at hand.

Ambiguous grammars are dangerous, before you get into the question of how to parse them.

> <cite  href="http://www.digitalmars.com/d/expression.html">
> MulExpression:
> 		UnaryExpression
> 		MulExpression * UnaryExpression
> UnaryExpression:
> 		* UnaryExpression
> 		NewExpression
> NewExpression:
> 		new BasicType Stars
> Stars:
> 		nothing
> 		*
> 		* Stars
> </cite>
> 
> Do you see the ambiguousness of this grammar? A recursive descent parser, as currently used, does not see it.
<snip>

Yes.  I even see the ambiguity of a simple statement like

    qwert * yuiop;

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on the 'group where everyone may benefit.
February 28, 2005
The ambiguity of:

    a * b

is resolved in the parser by using the rule "if it parses as a declaration, it is a declaration." Hence,

    a * b;

resolves as b being declared as a pointer to a.


March 04, 2005
Manfred Nowak wrote:
> Recursive descent parsing  is dangerous, especially when there is no formal grammar at hand.

If one uses a recursive descent parser generator, it would bark and make you fix. I just had COCO/R bark at me yesterday for a similar reason - something that one cannot actually detect visually in a complex grammar. More recent recursive descent parser generators offer disambiguation on the basis of more lookahead or other means.

> <cite  href="http://www.digitalmars.com/d/expression.html">
> MulExpression:
> 		UnaryExpression
> 		MulExpression * UnaryExpression
> UnaryExpression:
> 		* UnaryExpression
> 		NewExpression
> NewExpression:
> 		new BasicType Stars
> Stars:
> 		nothing
> 		*
> 		* Stars
> </cite>
> 
> Do you see the ambiguousness of this grammar? A recursive descent parser, as currently used, does not see it.

This grammar need not represent the grammar on which DMD is actually built. However, this part might.

> Consider this derivation according to the grammar above, which seems to be excerpted from the recursive descent parser:
> 
>     MulExpression
> --> MulExpression           * UnaryExpression
> --> UnaryExpression         * UnaryExpression
> --> * UnaryExpression       * * UnaryExpression
> --> * * UnaryExpression     * * UnaryExpression
> --> * * NewExpression       * * UnaryExpression
> --> * * new BasicType Stars * * UnaryExpression
> --> * * new int       *     * * Identifier
> 
> Then the following should be flawlessly parsed:
> 
> <code>
> void main(){
>   int** x;
>   x=new int*;
>   int* y;
>   y= *x;
> 
>   int z= **x * *y;
>   z= **new int* * *y;
> }
> </code>
> 
> But it cannot be parsed because of the ambiguity, which the recursive descent parser is not aware of and therefore parses the code in question as
> 
> --> **new BasicType Stars Identifier
> 
> which cannot be derived from the startsymbol.

A recursive descent parser never arrives at parsing this expression completely. To the contrary, an LR parser would parse this subexpression completely, and might later have problems deriving it from the root. However, since these decisions in an LR parser are made statically, a table generator would bark at you.

As opposed to an LR parser one can quite nicely use the information on what we are trying to parse - the grammar can have separate sets of simiar rules for stuff which can be confusingly similar but can be disambiguated by where it may appear. This is not possible wlth bottom-up parsing.

There are 2 other techniques I am aware of. One is GLR, of which i believe it uses a combination of top-down and bottom-up techniques - and is moderately fast - however i cannot see how it can solve the problem. Another is Earley parser which yuilds all possible derivations, but is probably not fast enough to be used for such a big task as D.

> Also parenthesing does not help
> 
>   Z= **(new int*) * *y;
> 
> because somehow int-promotion is made and the error mesage
> 
>  "can only * a pointer, not a 'int'"
> 
> is generated.

Int-promotion? I see the problem but i don't see how it happens. Perhaps it is an unrelated bug or perhaps the grammar needs tweaking.

Anyway, what do you propose? How would a different type of parser solve this problem? - well, apart from Earley parser.

Please feel free to correct me if you feel confident, i'm not very solid on the parsing ground and could have mixed something up, and am too tired to pick up the reference at the moment.

-eye
March 12, 2005
Ilya Minkov <minkov@cs.tum.edu> wrote:

[...]
> Int-promotion? I see the problem but i don't see how it happens. Perhaps it is an unrelated bug or perhaps the grammar needs tweaking.
> 
> Anyway, what do you propose? How would a different type of parser solve this problem? - well, apart from Earley parser.
[...]

I agree that this might be an unrelated bug.

I detected this ambiguity, while trying to establish a LR-grammar for D. So my proposal is a dual one.

First: use hand coded recursive descent parsers only if you have also a formal grammar at hand, from which you can detect ambiguities by the according tools preferable before you hand code the parser.

Second: my LR-grammar can be disambiguated by intoducing the need for parentheses around new-expressions if followed by the multiplicative operator `*'.

-manfred