D grammar (yacc format)

Jan 28, 2004

Sammy

Jan 28, 2004

Ben Hinkle

Jan 29, 2004

Sammy

Jan 29, 2004

Sammy

Jan 29, 2004

Sammy

I began to write a yacc style grammar for D. I am using the parse.c from the dmd source. Each function is being translated to a piece of the grammar. Each piece is added to the source file using some tag ( /**@Grammar ) Eg: /**@Grammar * * %token "^" TOKxor * * XorExp : AndExp * | XorExp "^" AndExp * ; * **/ Expression *Parser::parseXorExp() { Expression *e; Expression *e2; Loc loc = this->loc; e = parseAndExp(); while (token.value == TOKxor) { nextToken(); e2 = parseAndExp(); e = new XorExp(loc, e, e2); } return e; } I dont know if the final grammar will be LALR, nor if it will be unabiguous. If everything is ok, i will use bison to process the grammar. if not, there are other parser generators who can handle anbiguous and/or non-lalr grammars ( eg: btyacc, accent, msta, D'Parser ) I would like to see the grammar maintained ( i can do that ) and distributed with DMD ( i cant do that :-) . I also have a lexer (made with flex), who needs to be updated. What do you think ?

Wow, this would be awesome! Think you'll end up using somethign besides bison though , D'Parser looks really cool. C "Sammy" <none@here.com> wrote in message news:bv941u$1e1e$1@digitaldaemon.com... > I began to write a yacc style grammar for D. > > I am using the parse.c from the dmd source. > Each function is being translated to a piece of > the grammar. Each piece is added to the source > file using some tag ( /**@Grammar ) > > Eg: > > /**@Grammar > * > * %token "^" TOKxor > * > * XorExp : AndExp > * | XorExp "^" AndExp > * ; > * > **/ > > Expression *Parser::parseXorExp() > { Expression *e; > Expression *e2; > Loc loc = this->loc; > > e = parseAndExp(); > while (token.value == TOKxor) > { > nextToken(); > e2 = parseAndExp(); > e = new XorExp(loc, e, e2); > } > return e; > } > > I dont know if the final grammar will be LALR, nor > if it will be unabiguous. > > If everything is ok, i will use bison to process the grammar. > if not, there are other parser generators who can > handle anbiguous and/or non-lalr grammars > ( eg: btyacc, accent, msta, D'Parser ) > > I would like to see the grammar maintained ( i can do that ) > and distributed with DMD ( i cant do that :-) . > > I also have a lexer (made with flex), who needs to be > updated. > > What do you think ? > >

Interestingly GCC's C++ frontend switched from using a yacc generated parser to a hand-written parser in GCC-3.4. I don't know all the reasons why but you could probably Google around to find out why. Personally I find the 3.4 hand-written parser is easier to understand. -Ben "Sammy" <none@here.com> wrote in message news:bv941u$1e1e$1@digitaldaemon.com... > I began to write a yacc style grammar for D. > > I am using the parse.c from the dmd source. > Each function is being translated to a piece of > the grammar. Each piece is added to the source > file using some tag ( /**@Grammar ) > > Eg: > > /**@Grammar > * > * %token "^" TOKxor > * > * XorExp : AndExp > * | XorExp "^" AndExp > * ; > * > **/ > > Expression *Parser::parseXorExp() > { Expression *e; > Expression *e2; > Loc loc = this->loc; > > e = parseAndExp(); > while (token.value == TOKxor) > { > nextToken(); > e2 = parseAndExp(); > e = new XorExp(loc, e, e2); > } > return e; > } > > I dont know if the final grammar will be LALR, nor > if it will be unabiguous. > > If everything is ok, i will use bison to process the grammar. > if not, there are other parser generators who can > handle anbiguous and/or non-lalr grammars > ( eg: btyacc, accent, msta, D'Parser ) > > I would like to see the grammar maintained ( i can do that ) > and distributed with DMD ( i cant do that :-) . > > I also have a lexer (made with flex), who needs to be > updated. > > What do you think ? > >

"Ben Hinkle" <bhinkle4@juno.com> wrote in message news:bv9etc$21ib$1@digitaldaemon.com... > Interestingly GCC's C++ frontend switched from using a yacc generated parser > to a hand-written parser in GCC-3.4. I don't know all the reasons why but you could probably Google around to find out why. Personally I find the 3.4 > hand-written parser is easier to understand. > > -Ben > I am not sure, and i dont remember where i read this, but I think its because the parser is faster, simpler and as you said, easier to understand ( and possibly to implement ). Implement the parser, any way you want. But if you want to have a clear picture of the language, i think you need a grammar. If you use EBNF, instead of pure BNF, you will end up with a grammar fitting on 2 or 3 pages instead of the 67 pages of the file parser.c So you are right, but i still find its still and interesting exercice. - S

"Ben Hinkle" <bhinkle4@juno.com> wrote in message news:bv9etc$21ib$1@digitaldaemon.com... > Interestingly GCC's C++ frontend switched from using a yacc generated parser > to a hand-written parser in GCC-3.4. I don't know all the reasons why but you could probably Google around to find out why. Personally I find the 3.4 > hand-written parser is easier to understand. > > -Ben > Found this short explaination on : http://gnu.essentkabel.com/software/gcc/gcc-3.4/changes.html GCC 3.4 Release Series. Changes, New Features, and Fixes C++ A hand-written recursive-descent C++ parser has replaced the YACC-derived C++ parser from previous GCC releases. The new parser contains much improved infrastructure needed for better parsing of C++ source codes, handling of extensions, and clean separation (where possible) between proper semantics analysis and parsing. The new parser fixes many bugs that were found in the old parser. - S

January 29, 2004

Re: D grammar (yacc format)

Posted by Sammy
in reply to Ben Hinkle

Permalink

Sammy

Posted in reply to Ben Hinkle

Permalink

"Ben Hinkle" <bhinkle4@juno.com> wrote in message news:bv9etc$21ib$1@digitaldaemon.com...
> Interestingly GCC's C++ frontend switched from using a yacc generated
parser
> to a hand-written parser in GCC-3.4. I don't know all the reasons why but you could probably Google around to find out why. Personally I find the
3.4
> hand-written parser is easier to understand.
>
> -Ben
>

Sorry, to post again.
All those explanation tells that its a very good choice
to use a hand written parser...

From: http://compilers.iecc.com/comparch/article/02-01-022
----------------------------------------------------------------------
Re: C++ parsing : what's new ?
From comp.compilers

From:  Zack Weinberg <zackw@panix.com>
Newsgroups:  comp.compilers
Date:  4 Jan 2002 00:09:35 -0500
Organization:  PANIX -- Public Access Networks Corp.
References:  01-12-179
Keywords:  C++, parse
Posted-Date:  04 Jan 2002 00:09:35 EST

Peter H. Froehlich <pfroehli@ics.uci.edu> writes:
>On Saturday, December 29, 2001, at 10:30 , Travis Nixon wrote:
>
>> "Martin von Loewis" <loewis@informatik.hu-berlin.de> wrote in message
>>> The gcc grammar is currently being rewritten, from a bison-based one to a hand-written recursive-descent parser.
>>
>> Is there a discussion online anywhere about the reasons for doing this?
>
>As I recall the motivation is performance. The people doing the rewrite expect the recursive descent parser to be way faster than the table-driven one they have now. However, I sure would *not* want to write a parser like that for a grammar as hairy as C/C++. :-)

The motivations are correctness, performance, and better error recovery, in that order.

The existing gcc grammar for C++ does not interpret all language constructs correctly; this has been judged unfixable while continuing to use yacc. There are already horrible kludges to get certain constructs mostly right (look at cp/spew.c, if you want to know) that become unnecessary with a recursive descent parser.

A hand-rolled recursive descent parser has a lot more information accessible to it. This should permit it to make better backtracking decisions. Bad backtracking is the major cause of bad performance in the existing parser. It will certainly permit better error recovery.

Note that this approach is known to work - the Edison Design Group (EDG)'s C/C++ front end uses a hand-written recursive descent parser, and it does all the above things better than g++. As someone who's worked on both front ends, I can also say that I found it _easier_ to work with the recursive descent parser.

zw

Forums