Official D Grammar
Brian Schott
April 01, 2013
I've pretty much finished up my work on the std.d.lexer module. I am waiting for the review queue to make some progress on the other (three?) modules being reviewed before starting a thread on it.

In the meantime I've started some work on an AST module for Phobos that contains the data types necessary to build up a parser module, so that we can have a standard set of code to build D dev tools off of. I decided to work directly from the standard on dlang.org for this, to make sure that my module is correct and that the standard is actually correct.

I've seen several threads on this newsgroup complaining about the state of the standard and unfortunately this will be another one.

1) Grammar defined in terms of things that aren't tokens. Take, for example, PropertyDeclaration. It's defined as an "@" token followed by... what? "safe"? It's not a real token. It's an identifier. You can't parse this based on checking the token type. You have to check the type and the value.
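
To illustrate, a parser working from the lexer's output ends up doing something like this (a rough sketch; the token struct and the list of names below are placeholders, not the real std.d.lexer API):

  // Hypothetical token representation, just for illustration.
  enum TokenType { at, identifier /* , ... */ }
  struct Token { TokenType type; string text; }

  // "@safe" cannot be recognized from token types alone: "safe" is lexed as an
  // ordinary identifier, so the parser also has to look at the identifier's text.
  bool startsPropertyDeclaration(const(Token)[] tokens)
  {
      if (tokens.length < 2
          || tokens[0].type != TokenType.at
          || tokens[1].type != TokenType.identifier)
          return false;
      immutable name = tokens[1].text;
      return name == "property" || name == "safe" || name == "trusted"
          || name == "system" || name == "disable";
  }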

2) Grammar references rules that don't exist. UserDefinedAttribute is defined in terms of CallExpression, but CallExpression doesn't exist elsewhere in the grammar. BaseInterfaceList is defined in terms of InterfaceClasses, but that rule is never defined.

3) Unnecessary rules. KeyExpression, ValueExpression, ScopeBlockStatement, DeclarationStatement, ThenStatement, ElseStatement, Test, Increment, Aggregate, LwrExpression, UprExpression, FirstExp, LastExp, StructAllocator, StructDeallocator, EnumTag, EnumBaseType, EmptyEnumBody, ConstraintExpression, MixinIdentifier, etc... are all defined in terms of only one other rule.
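
For instance, the grammar currently contains wrapper rules along these lines (paraphrased from dlang.org), each of which does nothing but forward to a single other rule:

  KeyExpression:
      AssignExpression

  ValueExpression:
      AssignExpression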

I think that we need to be able to create a grammar description that:
* Fits into a single file, so that a tool implementer does not need to collect bits of the grammar from the various pages on dlang.org.
* Can be verified to be correct by an existing tool such as Bison, Goldie, JavaCC, <your favorite here> with a small number of changes.
* Is part of the dmd/dlang repositories on github and gets updated every time the language changes.

I'm willing to work on this if there's a good chance it will actually be implemented. Thoughts?
Walter Bright
April 02, 2013
On 4/1/2013 4:18 PM, Brian Schott wrote:
> I've pretty much finished up my work on the std.d.lexer module. I am waiting for
> the review queue to make some progress on the other (three?) modules being
> reviewed before starting a thread on it.
>
> In the meantime I've started some work on an AST module for Phobos that contains
> the data types necessary to build up a parser module so that we can have a
> standard set of code build D dev tools off of. I decided to work directly from
> the standard on dlang.org for this to make sure that my module is correct and
> that the standard is actually correct.
>
> I've seen several threads on this newsgroup complaining about the state of the
> standard and unfortunately this will be another one.
>
> 1) Grammar defined in terms of things that aren't tokens. Take, for example,
> PropertyDeclaration. It's defined as an "@" token followed by... what? "safe"?
> It's not a real token. It's an identifier. You can't parse this based on
> checking the token type. You have to check the type and the value.

True, do you have a suggestion?

>
> 2) Grammar references rules that don't exist. UserDefinedAttribute is defined in
> terms of CallExpression, but CallExpression doesn't exist elsewhere in the
> grammar. BaseInterfaceList is defined in terms of InterfaceClasses, but that
> rule is never defined.

Yes, this needs to be fixed.

>
> 3) Unnecessary rules. KeyExpression, ValueExpression, ScopeBlockStatement,
> DeclarationStatement, ThenStatement, ElseStatement, Test, Increment, Aggregate,
> LwrExpression, UprExpression, FirstExp, LastExp, StructAllocator,
> StructDeallocator, EnumTag, EnumBaseType, EmptyEnumBody, ConstraintExpression,
> MixinIdentifier, etc... are all defined in terms of only one other rule.

Using these makes documentation easier, and I don't think it harms anything.


> I think that we need to be able to create a grammar description that:
> * Fits in to a single file, so that a tool implementer does not need to collect
> bits of the grammar from the various pages on dlang.org.
> * Can be verified to be correct by an existing tool such as Bison, Goldie,
> JavaCC, <your favorite here> with a small number of changes.
> * Is part of the dmd/dlang repositories on github and gets updated every time
> the language changes.
>
> I'm willing to work on this if there's a good chance it will actually be
> implemented. Thoughts?

I suggest doing this as a sequence of pull requests, not doing just one big one.
Stewart Gordon
April 02, 2013
On 02/04/2013 00:18, Brian Schott wrote:
<snip>
> I think that we need to be able to create a grammar description that:
> * Fits in to a single file, so that a tool implementer does not need to
> collect bits of the grammar from the various pages on dlang.org.
> * Can be verified to be correct by an existing tool such as Bison,
> Goldie, JavaCC, <your favorite here> with a small number of changes.
> * Is part of the dmd/dlang repositories on github and gets updated every
> time the language changes.
<snip>

Indeed, the published grammar needs to be thoroughly checked against what DMD is actually doing, and any discrepancies fixed (or filed in Bugzilla to be fixed in due course).  And then they need to be kept in sync.

Has the idea of using a parser generator to build D's parsing code been rejected in the past, or is hand-coding just the way Walter decided to do it?  Is the code any more efficient than what a typical parser generator would generate?

And all disambiguation rules (such as "if it's parseable as a DeclarationStatement, it's a DeclarationStatement") need to be made explicit as part of the grammar.  I suppose this is where using Bison or similar would help, as it would point out any ambiguities in the grammar that need rules to resolve them.
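
For instance, if I understand the intended behaviour correctly, that rule is what decides cases like:

  a * b;   // taken as a declaration of b with type a*, not as a
           // multiplication expression statement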

Stewart.
Jacob Carlborg
April 02, 2013
On 2013-04-02 15:21, Stewart Gordon wrote:

> Indeed, the published grammar needs to be thoroughly checked against
> what DMD is actually doing, and any discrepancies fixed (or filed in
> Bugzilla to be fixed in due course).  And then they need to be kept in
> sync.
>
> Has the idea of using a parser generator to build D's parsing code been
> rejected in the past, or is hand-coding just the way Walter decided to
> do it?  Is the code any more efficient than what a typical parser
> generator would generate?
>
> And all disambiguation rules (such as "if it's parseable as a
> DeclarationStatement, it's a DeclarationStatement") need to be made
> explicit as part of the grammar.  I suppose this is where using Bison or
> similar would help, as it would point out any ambiguities in the grammar
> that need rules to resolve them.

I'm wondering if it's possible to mechanically check that what's in the grammar is how DMD actually behaves.

-- 
/Jacob Carlborg
Tobias Pankrath
April 02, 2013
> I'm wondering if it's possibly to mechanically check that what's in the grammar is how DMD behaves.

Take the grammar, (randomly) generate strings with it, and check whether DMD complains. You'd need a parse-only, don't-check-semantics flag, though.

This will not check whether DMD parses the strings correctly, nor whether invalid strings are rejected. But it would be a start.
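
A rough sketch of what I mean. The toy grammar below is made up just for illustration, and the --parse-only switch is hypothetical (DMD has no such flag today, which is exactly the problem):

  import std.file : write;
  import std.process : execute;
  import std.random : uniform;
  import std.stdio : writeln;

  // Toy grammar: rule name -> alternatives -> sequence of symbols.
  // Symbols that name a rule are expanded; everything else is emitted as-is.
  string[][][string] grammar;

  static this()
  {
      grammar["Statements"] = [["Statement"], ["Statement", " ", "Statement"]];
      grammar["Statement"]  = [["if (", "Expression", ") {}"],
                               ["while (", "Expression", ") {}"]];
      grammar["Expression"] = [["1 + 2"], ["foo"],
                               ["Expression", " * ", "Expression"]];
  }

  string generate(string rule, int depth = 0)
  {
      auto alts = grammar[rule];
      // Past a certain depth, always take the first (simplest) alternative
      // so that the recursion terminates.
      auto alt = depth > 6 ? alts[0] : alts[uniform(0, alts.length)];
      string result;
      foreach (symbol; alt)
          result ~= (symbol in grammar) ? generate(symbol, depth + 1) : symbol;
      return result;
  }

  void main()
  {
      foreach (attempt; 0 .. 100)
      {
          auto source = "void main() { " ~ generate("Statements") ~ " }";
          write("generated.d", source);
          // Hypothetical flag: DMD would need a parse-only mode for this to work.
          auto dmd = execute(["dmd", "--parse-only", "generated.d"]);
          if (dmd.status != 0)
              writeln("DMD rejected grammar-generated input:\n", source);
      }
  }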

Bruno Medeiros
April 06, 2013
On 02/04/2013 00:18, Brian Schott wrote:
> I've pretty much finished up my work on the std.d.lexer module. I am
> waiting for the review queue to make some progress on the other (three?)
> modules being reviewed before starting a thread on it.
>
> In the meantime I've started some work on an AST module for Phobos that
> contains the data types necessary to build up a parser module so that we
> can have a standard set of code build D dev tools off of. I decided to
> work directly from the standard on dlang.org for this to make sure that
> my module is correct and that the standard is actually correct.
>
> I've seen several threads on this newsgroup complaining about the state
> of the standard and unfortunately this will be another one.
>
> 1) Grammar defined in terms of things that aren't tokens. Take, for
> example, PropertyDeclaration. It's defined as an "@" token followed
> by... what? "safe"? It's not a real token. It's an identifier. You can't
> parse this based on checking the token type. You have to check the type
> and the value.
>
> 2) Grammar references rules that don't exist. UserDefinedAttribute is
> defined in terms of CallExpression, but CallExpression doesn't exist
> elsewhere in the grammar. BaseInterfaceList is defined in terms of
> InterfaceClasses, but that rule is never defined.
>
> 3) Unnecessary rules. KeyExpression, ValueExpression,
> ScopeBlockStatement, DeclarationStatement, ThenStatement, ElseStatement,
> Test, Increment, Aggregate, LwrExpression, UprExpression, FirstExp,
> LastExp, StructAllocator, StructDeallocator, EnumTag, EnumBaseType,
> EmptyEnumBody, ConstraintExpression, MixinIdentifier, etc... are all
> defined in terms of only one other rule.
>
> I think that we need to be able to create a grammar description that:
> * Fits in to a single file, so that a tool implementer does not need to
> collect bits of the grammar from the various pages on dlang.org.
> * Can be verified to be correct by an existing tool such as Bison,
> Goldie, JavaCC, <your favorite here> with a small number of changes.
> * Is part of the dmd/dlang repositories on github and gets updated every
> time the language changes.
>
> I'm willing to work on this if there's a good chance it will actually be
> implemented. Thoughts?

Interesting thread. I've been working on a hand-written D parser (in Java, for the DDT IDE) and I too have found a slew of grammar spec issues, some of them more serious than the ones you mentioned above. In some cases what the grammar spec says is actually unclear, or downright wrong. For example, here's one off of my notes:

  void func(int foo() { } );

The spec says that is parsable (basically a function declaration in the parameter list), which makes no sense, and DMD doesn't accept it. Some cases are trickier, since it's not clear whether the syntax should be accepted or not (sometimes it might make sense but not be allowed).

These issues make things a bit harder for developing tools that require D language parsers. But the whole grammar spec is so messy that I've been unsure whether it's worth filing bug reports (would they be addressed?). There is also the problem that even if these issues are fixed now, the spec could easily fall out of date again, unless we have some system to test it. Like you mentioned, ideally we would have a grammar spec in a format usable by a grammar/parser-generator tool, so that correctness could be verified more easily.
(it doesn't guarantee no spec bugs, but it makes it much harder for them to be there)
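
To show the kind of format I mean, here is a made-up fragment in roughly Bison/yacc notation (not the real D rules):

  %token IDENTIFIER
  %%
  base_interface_list
      : ':' interface_list
      ;
  interface_list
      : IDENTIFIER
      | interface_list ',' IDENTIFIER
      ;

A tool like bison would then at least complain about rules that are referenced but never defined, and report conflicts in the grammar, which is exactly the kind of checking the dlang.org pages never get.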


-- 
Bruno Medeiros - Software Engineer
Bruno Medeiros
April 06, 2013
On 02/04/2013 00:18, Brian Schott wrote:
> I've pretty much finished up my work on the std.d.lexer module. I am
> waiting for the review queue to make some progress on the other (three?)
> modules being reviewed before starting a thread on it.
>

BTW, even in the lexer spec I've found an issue. How does this parse:
  5.blah
According to the spec (maximal munch technique), it should be FLOAT then IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming the latter is the correct behavior, so you can write stuff like 123.init, but that should be clarified.
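
In other words, the lexer needs a special case with one character of lookahead after the integer digits. Roughly (a sketch of the decision only, not actual DMD or std.d.lexer code):

  // What to do when a '.' follows the digits of an integer literal.
  // Based on what DMD appears to do:
  //   "5.2"    -> one FLOAT token
  //   "5..2"   -> INTEGER, then "..", then INTEGER
  //   "5.blah" -> INTEGER, DOT, IDENTIFIER
  bool dotContinuesFloat(char next)
  {
      if (next == '.')
          return false;               // leave ".." for the next token
      if (next >= '0' && next <= '9')
          return true;                // keep lexing a float literal
      return false;                   // emit INTEGER; the '.' becomes its own token
  }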


-- 
Bruno Medeiros - Software Engineer
Jonathan M Davis
April 06, 2013
On Saturday, April 06, 2013 16:21:12 Bruno Medeiros wrote:
> On 02/04/2013 00:18, Brian Schott wrote:
> > I've pretty much finished up my work on the std.d.lexer module. I am waiting for the review queue to make some progress on the other (three?) modules being reviewed before starting a thread on it.
> 
> BTW, even in the lexer spec I've found an issue. How does this parse:
> 5.blah
> According to the spec (maximal munch technique), it should be FLOAT then
> IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming
> the lastest is the correct behavior, so you can write stuff like
> 123.init, but that should be clarified.

It would definitely have to be INTEGER DOT IDENTIFIER due to UFCS, so it sounds like the spec wasn't updated like it should have been.
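
For example, both of these compile and rely on 5 and 123 lexing as an integer followed by a separate dot:

  import std.conv : to;

  unittest
  {
      assert(5.to!string() == "5");   // UFCS call on an integer literal
      assert(123.init == 0);          // .init on a literal needs the same lexing
  }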

- Jonathan M Davis
Artur Skawina
April 06, 2013
On 04/06/13 17:21, Bruno Medeiros wrote:
> On 02/04/2013 00:18, Brian Schott wrote:
>> I've pretty much finished up my work on the std.d.lexer module. I am waiting for the review queue to make some progress on the other (three?) modules being reviewed before starting a thread on it.
>>
> 
> BTW, even in the lexer spec I've found an issue. How does this parse:
>   5.blah
> According to the spec (maximal munch technique), it should be FLOAT then IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming the lastest is the correct behavior, so you can write stuff like 123.init, but that should be clarified.

"1..2", "1.ident" and a float literal with '_' after the '.' are the
DecimalFloat cases that I immediately ran into when doing a lexer based on
the dlang grammar. It's obvious to a human how these should be handled, but
code generators aren't that smart... But they are good at catching mistakes
like these.
Actually, that last case is even more "interesting"; http://dlang.org/lex.html
has "1_2_3_4_5_6_._5_6_7_8" as a valid example, which of course it's not
("_5_6_7_8" is a valid identifier), but there is no reason do disallow
"1_2_3_4_5_6_.5_6_7_8".

> that should be clarified.

These are just grammar bugs that could easily be fixed.  Then there are things that are less obvious but shouldn't really be controversial, like allowing empty HexString literals.

Then there's the enhancement category.
Looking through my comments, I think the only deliberate change from dlang.org
that I have is in DelimitedString -- there is no reason to forbid q"/abc/def/";
there are no back-compat issues, as it couldn't have existed in legacy D code.

artur
Bruno Medeiros
April 06, 2013
On 06/04/2013 20:52, Artur Skawina wrote:
> On 04/06/13 17:21, Bruno Medeiros wrote:
>> On 02/04/2013 00:18, Brian Schott wrote:
>>> I've pretty much finished up my work on the std.d.lexer module. I am
>>> waiting for the review queue to make some progress on the other (three?)
>>> modules being reviewed before starting a thread on it.
>>>
>>
>> BTW, even in the lexer spec I've found an issue. How does this parse:
>>    5.blah
>> According to the spec (maximal munch technique), it should be FLOAT then IDENTIFIER. But DMD parses it as INTEGER DOT IDENTIFIER. I'm assuming the lastest is the correct behavior, so you can write stuff like 123.init, but that should be clarified.
>
> "1..2", "1.ident" and a float literal with '_' after the '.' are the
> DecimalFloat cases that I immediately ran into when doing a lexer based on
> the dlang grammar. It's obvious to a human how these should be handled, but
> code generators aren't that smart... But they are good at catching mistakes
> like these.

The "1..2" is actually mentioned in the spec:
"An exception to this rule is that a .. embedded inside what looks like two floating point literals, as in 1..2, is interpreted as if the .. was separated by a space from the first integer."
so it's there, even if it can be missed.

But unless I missed it, the spec is incorrect for the "1.ident" or "1_2_3_4_5_6_._5_6_7_8" cases as there is no exception mentioned there... and it's not always 100% obvious to a human how these should be handled. Or maybe that's just me :)

-- 
Bruno Medeiros - Software Engineer