April 29, 2013
On Friday, 26 April 2013 at 20:44:12 UTC, Brian Schott wrote:
> I'm beginning to think that ANTRL is not up to the task either. I've somehow managed to get the grammar to the point where it correctly parses several phobos modules but takes a half hour to do so.

I think I'm done with this grammar for now. If there's any interest, I can use it to create syntax diagrams for inclusion on dlang.org.

I'll also be filing a series of bug reports over the next few days. (Two recent examples: 0.0L is not valid according to the lexical standard, and deprecated("string") is not documented anywhere)
April 29, 2013
On 4/28/13 8:16 PM, Brian Schott wrote:
> On Friday, 26 April 2013 at 20:44:12 UTC, Brian Schott wrote:
>> I'm beginning to think that ANTRL is not up to the task either. I've
>> somehow managed to get the grammar to the point where it correctly
>> parses several phobos modules but takes a half hour to do so.
>
> I think I'm done with this grammar for now. If there's any interest, I
> can use it to create syntax diagrams for inclusion on dlang.org.

Yes! This is awesome. Thanks for this work.

> I'll also be filing a series of bug reports over the next few days. (Two
> recent examples: 0.0L is not valid according to the lexical standard,
> and deprecated("string") is not documented anywhere)

Great.


Andrei
May 01, 2013
On Friday, 26 April 2013 at 20:44:12 UTC, Brian Schott wrote:
> On Saturday, 20 April 2013 at 08:31:34 UTC, Brian Schott wrote:
>> This uses ANTLR, as the other parser generators can't handle D's grammar.
>
> I'm beginning to think that ANTRL is not up to the task either. I've somehow managed to get the grammar to the point where it correctly parses several phobos modules but takes a half hour to do so.

Do you know, which parts of the grammar make the parser so slow?
Do you use any suspicious features of ANTLR?
May 01, 2013
On Wed, 2013-05-01 at 17:34 +0200, Tobias Pankrath wrote:
> On Friday, 26 April 2013 at 20:44:12 UTC, Brian Schott wrote:
> > On Saturday, 20 April 2013 at 08:31:34 UTC, Brian Schott wrote:
> >> This uses ANTLR, as the other parser generators can't handle D's grammar.

Which other parser generators? cf. for example, http://en.wikipedia.org/wiki/Comparison_of_parser_generators

> > I'm beginning to think that ANTRL is not up to the task either. I've somehow managed to get the grammar to the point where it correctly parses several phobos modules but takes a half hour to do so.

LL(k) can sometimes do this. As can LALR(1) of course.

> Do you know, which parts of the grammar make the parser so slow? Do you use any suspicious features of ANTLR?

There isn't just one ANTLR: ANTLR 2, ANTLR 3, and ANTLR 4 are very different beasties.

Not many of the parser generators generate D code :-(

-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


May 02, 2013
On 26/04/2013 21:44, Brian Schott wrote:
> On Saturday, 20 April 2013 at 08:31:34 UTC, Brian Schott wrote:
>> This uses ANTLR, as the other parser generators can't handle D's grammar.
>
> I'm beginning to think that ANTRL is not up to the task either. I've
> somehow managed to get the grammar to the point where it correctly
> parses several phobos modules but takes a half hour to do so.

Half hour??? That seems like a massive amount of time, even if the generated parser is very suboptimal and is doing a lot of unnecessary backtracking. I mean, even if the parser has quadratic performance over typical usage source files, it seems like a lot.

To be honest, that's one of the reasons that put me off working with ANLTR. It seems easy to create a parser with ANTLR, but to create an efficient, well-behaved parser it looks quite complicated, in the sense that you can't abstract yourself from what is happening under the hood... you have to read a lot of theory and documention to learn the innards of ANTLR, and understand what kind of code it's actually generating, and how it processes input. (at moments it feels like you have to take a degree to learn how to use it effectively...)

-- 
Bruno Medeiros - Software Engineer
May 02, 2013
On Thu, 2013-05-02 at 17:44 +0100, Bruno Medeiros wrote: […]
> To be honest, that's one of the reasons that put me off working with ANLTR. It seems easy to create a parser with ANTLR, but to create an efficient, well-behaved parser it looks quite complicated, in the sense that you can't abstract yourself from what is happening under the hood... you have to read a lot of theory and documention to learn the innards of ANTLR, and understand what kind of code it's actually generating, and how it processes input. (at moments it feels like you have to take a degree to learn how to use it effectively...)

The Groovy parser is an ANTLR 2.7.7 grammar, it works well and quickly.
I think the trick is to work with the LL(k) idioms and avoid letting
LALR(1) thoughts creep in.

-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


May 02, 2013
On Thursday, 2 May 2013 at 16:44:44 UTC, Bruno Medeiros wrote:
> (at moments it feels like you have to take a degree to learn how to use it effectively...)

That or buy the documentation. Or both...
May 03, 2013
On Thursday, 2 May 2013 at 17:13:46 UTC, Russel Winder wrote:
> On Thu, 2013-05-02 at 17:44 +0100, Bruno Medeiros wrote:
> […]
>> To be honest, that's one of the reasons that put me off working with ANLTR. It seems easy to create a parser with ANTLR, but to create an efficient, well-behaved parser it looks quite complicated, in the sense that you can't abstract yourself from what is happening under the hood... you have to read a lot of theory and documention to learn the innards of ANTLR, and understand what kind of code it's actually generating, and how it processes input. (at moments it feels like you have to take a degree to learn how to use it effectively...)
>
> The Groovy parser is an ANTLR 2.7.7 grammar, it works well and quickly.
> I think the trick is to work with the LL(k) idioms and avoid letting
> LALR(1) thoughts creep in.

LALR(1) is O(n) last time I looked. Both Ll(k) and LALR(k) should never backtrack. Dunno what Antlr is doing.
May 13, 2013
I wrote a small program to take the grammar and generate input to the dot program from Graphviz. You can generate the diagrams with some scripts available on Github or view a prebuilt version here:

http://hackerpilot.org/d/dgrammar.zip

Please take a look at this and let me know if you find any mistakes.
June 03, 2013
As threatened at DConf, I've started filing bugs against the grammar specification. Anyone interested can track bug 10233[1], which I've marked as blocked by the various issues I've been finding.

As usual, my best guess at D's actual grammar[2] is located here[3].

Top secret project is hinted at here[4].

[1] http://d.puremagic.com/issues/show_bug.cgi?id=10233
[2] I disclaim all responsibility for injuries sustained while looking at the functionDefinition rule.
[3] https://github.com/Hackerpilot/DGrammar/blob/master/D.g4
[4] http://hackerpilot.github.io/experimental/std_lexer/phobos/parser.html