Jump to page: 1 2
Thread overview
Walter - Discrepancy between EqualExpression spec and implementation
May 29, 2006
xs0
Jun 02, 2006
renox
May 29, 2006
James Dunne
Jun 07, 2006
Walter Bright
Jun 16, 2006
Lionello Lunesu
May 28, 2006
I'm writing a small scripting language with a syntax based on D, and as such, I'm basing some of the parsing code off the the DMD frontend source. I've found a discrepancy between the spec and the code, though, and I'm not sure if it's a bug or if it was changed in the code on purpose and the spec just hasn't been updated.

For EqualExpressions, the following grammar is given:

EqualExpression:
    RelExpression
    EqualExpression == RelExpression
    EqualExpression != RelExpression
    EqualExpression is RelExpression
    EqualExpression !is RelExpression

Notice that the left-hand operand of the operators is an EqualExpression.

However, in the code (parse.c line 4365), instead of using parseEqualExp() to get the left-hand operand as I would have expected, it instead uses parseRelExp().  Meaning that the code implements the following grammar instead:

EqualExpression:
    RelExpression
    RelExpression == RelExpression
    RelExpression != RelExpression
    RelExpression is RelExpression
    RelExpression !is RelExpression

I'm pretty sure only Walter can answer this one!


May 28, 2006
"Jarrett Billingsley" <kb3ctd2@yahoo.com> wrote in message news:e5d40i$vp4$1@digitaldaemon.com...
> I'm writing a small scripting language with a syntax based on D, and as such, I'm basing some of the parsing code off the the DMD frontend source. I've found a discrepancy between the spec and the code, though, and I'm not sure if it's a bug or if it was changed in the code on purpose and the spec just hasn't been updated.

I just realized that most of the parsing functions follow this same pattern (of not following the spec).  The only functions that seem to follow the spec are parseAssignExp() and parseCondExp().


May 29, 2006
Forgive me if I am off-base here, but the following code:

int main()
{
	int i = 0;
	bool b = i == 0 == true !is false;

	return 0;
}

Compiles without errors.  This would suggest to me that it is being parsed according to the spec, roughly.  Perhaps it's just not following that exact BNF (but implementing the same difference, e.g. outside EqualExpression)?

-[Unknown]


> "Jarrett Billingsley" <kb3ctd2@yahoo.com> wrote in message news:e5d40i$vp4$1@digitaldaemon.com...
>> I'm writing a small scripting language with a syntax based on D, and as such, I'm basing some of the parsing code off the the DMD frontend source. I've found a discrepancy between the spec and the code, though, and I'm not sure if it's a bug or if it was changed in the code on purpose and the spec just hasn't been updated.
> 
> I just realized that most of the parsing functions follow this same pattern (of not following the spec).  The only functions that seem to follow the spec are parseAssignExp() and parseCondExp(). 
> 
> 
May 29, 2006
"Unknown W. Brackets" <unknown@simplemachines.org> wrote in message news:e5djuj$1dik$1@digitaldaemon.com...

> This would suggest to me that it is being parsed according to the spec, roughly.  Perhaps it's just not following that exact BNF (but implementing the same difference, e.g. outside EqualExpression)?

I'm thinking about it.  It's very weird, but I think I've got it.  I think all Walter's done with this code is factored out a recursive step; in this case, calling parseEqualExp() is the same as calling parseRelExp().

Recursive descent parsers are weird.


May 29, 2006
> Recursive descent parsers are weird. 

Heh, I think it's quite the opposite, with recursive descent you can actually figure out what's going on. Have you seen a yacc-generated parser (it's not recursive descent but "shift/reduce", LALR or whatever that's called)? Now that's really weird:

http://search.cpan.org/src/RGARCIA/perl-5.9.3/perly.tab
http://search.cpan.org/src/RGARCIA/perl-5.9.3/perly.c

Of course, the input file is much more readable, but you can't compare C++ code in one case with grammar definition code in the other case, can you? :)


xs0
May 29, 2006
Jarrett Billingsley wrote:
> "Jarrett Billingsley" <kb3ctd2@yahoo.com> wrote in message news:e5d40i$vp4$1@digitaldaemon.com...
> 
>>I'm writing a small scripting language with a syntax based on D, and as such, I'm basing some of the parsing code off the the DMD frontend source. I've found a discrepancy between the spec and the code, though, and I'm not sure if it's a bug or if it was changed in the code on purpose and the spec just hasn't been updated.
> 
> 
> I just realized that most of the parsing functions follow this same pattern (of not following the spec).  The only functions that seem to follow the spec are parseAssignExp() and parseCondExp(). 
> 
> 

I've taken a serious look into this too :)  Here's my answer:

See the while loops implemented in most of those functions?

Grammar rule:
-------------
OrOrExpression:
	AndAndExpression
	OrOrExpression || AndAndExpression

DMD implementation:
-------------------
Expression *Parser::parseOrOrExp()
{   Expression *e;
    Expression *e2;
    Loc loc = this->loc;

    e = parseAndAndExp();
    while (token.value == TOKoror)
    {
	nextToken();
	e2 = parseAndAndExp();
	e = new OrOrExp(loc, e, e2);
    }
    return e;
}

So, on the first iteration we either get an AndAndExpression alone, or an OrOrExpression with the left expression being an AndAndExpression and the right expression being an AndAndExpression.

The left-recursive descent rule 'OrOrExpression || AndAndExpression' means to loop over the left side until a subsequent token is NOT the '||' token.  Thus, we can only get an OrOrExpression out of two AndAndExpressions next to each other, separated by the '||' token, or just an AndAndExpression alone.  So, the code does implement the spec correctly.  All the other expression rules follow this pattern.

As to the exceptional case of AssignExpression:

Grammar rule:
-------------
AssignExpression:
	ConditionalExpression
	ConditionalExpression = AssignExpression
	ConditionalExpression += AssignExpression
	ConditionalExpression -= AssignExpression
	ConditionalExpression *= AssignExpression
	ConditionalExpression /= AssignExpression
	ConditionalExpression %= AssignExpression
	ConditionalExpression &= AssignExpression
	ConditionalExpression |= AssignExpression
	ConditionalExpression ^= AssignExpression
	ConditionalExpression ~= AssignExpression
	ConditionalExpression <<= AssignExpression
	ConditionalExpression >>= AssignExpression
	ConditionalExpression >>>= AssignExpression

DMD implementation:
(I've expanded out the #define shortcut for readability)
-------------------
Expression *Parser::parseAssignExp() {
    Expression *e;
    Expression *e2;
    Loc loc;

    e = parseCondExp();
    while (1) {
        loc = this->loc;
        switch (token.value) {
            case TOKassign:  nextToken(); e2 = parseAssignExp(); e = new AssignExp(loc,e,e2); continue;
            case TOKaddass:  nextToken(); e2 = parseAssignExp(); e = new AddAssignExp(loc,e,e2); continue;
            case TOKminass:  nextToken(); e2 = parseAssignExp(); e = new MinAssignExp(loc,e,e2); continue;
            case TOKmulass:  nextToken(); e2 = parseAssignExp(); e = new MulAssignExp(loc,e,e2); continue;
            case TOKdivass:  nextToken(); e2 = parseAssignExp(); e = new DivAssignExp(loc,e,e2); continue;
            case TOKmodass:  nextToken(); e2 = parseAssignExp(); e = new ModAssignExp(loc,e,e2); continue;
            case TOKandass:  nextToken(); e2 = parseAssignExp(); e = new AndAssignExp(loc,e,e2); continue;
            case TOKorass:   nextToken(); e2 = parseAssignExp(); e = new OrAssignExp(loc,e,e2); continue;
            case TOKxorass:  nextToken(); e2 = parseAssignExp(); e = new XorAssignExp(loc,e,e2); continue;
            case TOKshlass:  nextToken(); e2 = parseAssignExp(); e = new ShlAssignExp(loc,e,e2); continue;
            case TOKshrass:  nextToken(); e2 = parseAssignExp(); e = new ShrAssignExp(loc,e,e2); continue;
            case TOKushrass: nextToken(); e2 = parseAssignExp(); e = new UshrAssignExp(loc,e,e2); continue;
            case TOKcatass:  nextToken(); e2 = parseAssignExp(); e = new CatAssignExp(loc,e,e2); continue;

            default:
            break;
        }
        break;
    }
    return e;
}

Here, we see that e2 is always evaluated as an AssignExpression, thus this rule is right-recursive, not left-recursive as the other pattern. Again, note the while loop here.

I'm not sure how to make it clearer, but if you have any questions don't hesitate to ask.

P.S. - I've also used this code pattern as a template for my own language parser and it does work quite well.

-- 
Regards,
James Dunne
May 30, 2006
"James Dunne" <james.jdunne@gmail.com> wrote in message news:e5g01q$1l9r$1@digitaldaemon.com...

> The left-recursive descent rule 'OrOrExpression || AndAndExpression' means to loop over the left side until a subsequent token is NOT the '||' token. Thus, we can only get an OrOrExpression out of two AndAndExpressions next to each other, separated by the '||' token, or just an AndAndExpression alone.  So, the code does implement the spec correctly.  All the other expression rules follow this pattern.

Ahhhahaha... I see.  That's kind of what my mind was working towards when I was thinking about it.

> Here, we see that e2 is always evaluated as an AssignExpression, thus this rule is right-recursive, not left-recursive as the other pattern. Again, note the while loop here.

I thought it might have something to do with left- and right-associativity, as it's only the assignment operators which are right-associative (and the postfix operators too, but you don't see that with the way the D parser is structured).  I used a different form of expression parsing before which required that you have the associativity of each operator known beforehand (along with the precedence, as using different associativity changed how you parsed the LHS and RHS of an operator, just like in the recursive descent method).  The other method was a bit more terse and mathematical, but had the advantage of smaller code.  But recursive descent is just.. cool, if a bit weird.


June 02, 2006
Unknown W. Brackets wrote:

> Forgive me if I am off-base here, but the following code:
> 
> int main()
> {
>     int i = 0;
>     bool b = i == 0 == true !is false;
> 
>     return 0;
> }
> 
> Compiles without errors.

Urgh, if I see someone coding like this, let's just say that I won't be polite..

At some point I wonder if it wouldn't be better if in the language all the operators shouldn't have the same priority, just to force the programmers to use parenthesis!

The only downside I see is a big pitfall for the beginners and perhaps less opportunity for optimisation by the compiler depending of the implementation..

Regards,
RenoX
June 03, 2006
I have been known to revert checkins for code line that, haha.

Still, I like that I can write something like this:

x = original_x * cos(angle) - original_y * sin(angle);
y = original_y * cos(angle) + original_x * sin(angle);

And know it will work as I expect.  For this reason, I love operator precedence.

But I agree that "x == y == z" is just wrong.  I even take it farther and can't stand when people do "x = y = 0;", even if it is fine code.

Everyone has their own preferences.

-[Unknown]


> Urgh, if I see someone coding like this, let's just say that I won't be polite..
> 
> At some point I wonder if it wouldn't be better if in the language all the operators shouldn't have the same priority, just to force the programmers to use parenthesis!
> 
> The only downside I see is a big pitfall for the beginners and perhaps less opportunity for optimisation by the compiler depending of the implementation..
> 
> Regards,
> RenoX
June 07, 2006
Jarrett Billingsley wrote:
> I'm writing a small scripting language with a syntax based on D, and as such, I'm basing some of the parsing code off the the DMD frontend source. I've found a discrepancy between the spec and the code, though, and I'm not sure if it's a bug or if it was changed in the code on purpose and the spec just hasn't been updated.
> 
> For EqualExpressions, the following grammar is given:
> 
> EqualExpression:
>     RelExpression
>     EqualExpression == RelExpression
>     EqualExpression != RelExpression
>     EqualExpression is RelExpression
>     EqualExpression !is RelExpression
> 
> Notice that the left-hand operand of the operators is an EqualExpression.
> 
> However, in the code (parse.c line 4365), instead of using parseEqualExp() to get the left-hand operand as I would have expected, it instead uses parseRelExp().  Meaning that the code implements the following grammar instead:
> 
> EqualExpression:
>     RelExpression
>     RelExpression == RelExpression
>     RelExpression != RelExpression
>     RelExpression is RelExpression
>     RelExpression !is RelExpression
> 
> I'm pretty sure only Walter can answer this one! 

It's implemented as (simplified):

Expression *Parser::parseEqualExp()
{   Expression *e;
    Expression *e2;
    e = parseRelExp();
    while (1)
    {   enum TOK value = token.value;

        switch (value)
        {
            case TOKequal:
            case TOKnotequal:
                nextToken();
                e2 = parseRelExp();
                e = new EqualExp(value, loc, e, e2);
                continue;

            default:
                break;
        }
        break;
    }
    return e;
}

which means that:
	a == b == c == d
is parsed as:
	(((a == b) == c) == d)
which I think you'll see matches the grammar. The while loop makes it left-associative.
« First   ‹ Prev
1 2