Jump to page: 1 2 3
Thread overview
D Parsing (again)/ D grammar
Oct 02, 2014
Vladimir Kazanov
Oct 02, 2014
Daniel Kozak
Oct 02, 2014
Vladimir Kazanov
Oct 02, 2014
Vladimir Kazanov
Oct 02, 2014
Cliff
Oct 02, 2014
Vladimir Kazanov
Oct 02, 2014
Cliff
Oct 02, 2014
Vladimir Kazanov
Oct 02, 2014
Philippe Sigaud
Oct 02, 2014
Vladimir Kazanov
Oct 02, 2014
Philippe Sigaud
Oct 02, 2014
Vladimir Kazanov
Oct 03, 2014
Philippe Sigaud
Oct 03, 2014
Vladimir Kazanov
Oct 03, 2014
Philippe Sigaud
Oct 03, 2014
Vladimir Kazanov
Oct 03, 2014
Philippe Sigaud
Oct 03, 2014
Vladimir Kazanov
Oct 02, 2014
Vladimir Kazanov
Oct 03, 2014
Philippe Sigaud
Oct 02, 2014
Vladimir Kazanov
October 02, 2014
Hello, everyone!

Recently, I have been doing some experimental work related to generalized parser generators. It is not yet ready for a public release, I am not even sure it will ever be. Anyway, right now I am considering adding D as a target for parser generation. Actually, it's more than that: the grand target is to be able to generate a D language parser in D. With Python, for example, it is rather easy, as I can just mechanically transform the original grammar into a suitable form and be done with it.

The generator itself is quite powerful, theoretically it should be able to handle all context-free grammar (see http://dotat.at/tmp/gll.pdf for theory).

There is a grammar page on dlang.org (http://dlang.org/grammar.html). I have also found a related discussion in the forum (http://forum.dlang.org/thread/bwsofbnigfbrxwouiobj@forum.dlang.org). From the discussion I found out that D parser is a hand-made RD-parser with "a few tricks"(c).

So, my questions:

1. How realistic is the grammar? Is it a real one or just a sketch of sorts? Is it something developers use to build the parser?

2. There's also a D grammar for Pegged. How good is it? Can it be used as a starting point to build a parser?
October 02, 2014
V Thu, 02 Oct 2014 13:49:10 +0000
Vladimir Kazanov via Digitalmars-d <digitalmars-d@puremagic.com>
napsáno:

> Hello, everyone!
> 
> Recently, I have been doing some experimental work related to generalized parser generators. It is not yet ready for a public release, I am not even sure it will ever be. Anyway, right now I am considering adding D as a target for parser generation. Actually, it's more than that: the grand target is to be able to generate a D language parser in D. With Python, for example, it is rather easy, as I can just mechanically transform the original grammar into a suitable form and be done with it.
> 
> The generator itself is quite powerful, theoretically it should be able to handle all context-free grammar (see http://dotat.at/tmp/gll.pdf for theory).
> 
> There is a grammar page on dlang.org
> (http://dlang.org/grammar.html). I have also found a related
> discussion in the forum
> (http://forum.dlang.org/thread/bwsofbnigfbrxwouiobj@forum.dlang.org).
>  From the discussion I found out that D parser is a hand-made
> RD-parser with "a few tricks"(c).
> 
> So, my questions:
> 
> 1. How realistic is the grammar? Is it a real one or just a sketch of sorts? Is it something developers use to build the parser?
> 
> 2. There's also a D grammar for Pegged. How good is it? Can it be used as a starting point to build a parser?

https://github.com/Hackerpilot/DGrammar

October 02, 2014
On Thursday, 2 October 2014 at 13:49:12 UTC, Vladimir Kazanov wrote:
> The generator itself is quite powerful, theoretically it should be able to handle all context-free grammar (see http://dotat.at/tmp/gll.pdf for theory).

Cool, GLL is the way to go IMO, but I am also looking at Earley-parsers. What is the advantage of GLL over Earley if you use a parser generator? I think they both are O(3) or something like that?

> From the discussion I found out that D parser is a hand-made RD-parser with "a few tricks"(c).

I think D is close to LL(2) for the most part. But I suppose a GLL parser could allow keywords to be used as symbol names in most cases? That would be nice.

October 02, 2014
On Thursday, 2 October 2014 at 14:08:09 UTC, Daniel Kozak via Digitalmars-d wrote:

>
> https://github.com/Hackerpilot/DGrammar

Thank you, this definitely looks like something I was looking for.

Any known pitfalls here..?
October 02, 2014
On Thursday, 2 October 2014 at 15:01:13 UTC, Ola Fosheim Grøstad wrote:
>
> Cool, GLL is the way to go IMO, but I am also looking at Earley-parsers. What is the advantage of GLL over Earley if you use a parser generator? I think they both are O(3) or something like that?

They are somewhat similar in terms of asymptotic complexity on complicated examples. Constant is better though. But there's a nice property of all generalized parsers: for LL (for GLL) and  LR (for GLR) parts of grammars they go almost as fast as LL/LR parsers do. On ambiguities they slow down, of course.

There are four properties I really like:

1. GLL should be faster than Earley's (even the modern incarnations of it), but this is something I have yet to test.

2. It is fully general.

3. The automatically generated code repeats the original grammar structure - the same way recursive decent parsers do.

4. The core parser is still that simple LL/RD parser I can practically debug.

This comes at a price, as usual... I would not call it obvious :-) But nobody can say that modern Earley's flavours are trivial.

>> From the discussion I found out that D parser is a hand-made RD-parser with "a few tricks"(c).
>
> I think D is close to LL(2) for the most part. But I suppose a GLL parser could allow keywords to be used as symbol names in most cases? That would be nice.

This is possible, I guess, the same way people do it in GLR parsers.
October 02, 2014
On Thursday, 2 October 2014 at 15:47:04 UTC, Vladimir Kazanov wrote:
> On Thursday, 2 October 2014 at 15:01:13 UTC, Ola Fosheim Grøstad wrote:
>>
>> Cool, GLL is the way to go IMO, but I am also looking at Earley-parsers. What is the advantage of GLL over Earley if you use a parser generator? I think they both are O(3) or something like that?
>
> They are somewhat similar in terms of asymptotic complexity on complicated examples. Constant is better though. But there's a nice property of all generalized parsers: for LL (for GLL) and  LR (for GLR) parts of grammars they go almost as fast as LL/LR parsers do. On ambiguities they slow down, of course.
>
> There are four properties I really like:
>
> 1. GLL should be faster than Earley's (even the modern incarnations of it), but this is something I have yet to test.
>
> 2. It is fully general.
>
> 3. The automatically generated code repeats the original grammar structure - the same way recursive decent parsers do.
>
> 4. The core parser is still that simple LL/RD parser I can practically debug.
>
> This comes at a price, as usual... I would not call it obvious :-) But nobody can say that modern Earley's flavours are trivial.
>
>>> From the discussion I found out that D parser is a hand-made RD-parser with "a few tricks"(c).
>>
>> I think D is close to LL(2) for the most part. But I suppose a GLL parser could allow keywords to be used as symbol names in most cases? That would be nice.
>
> This is possible, I guess, the same way people do it in GLR parsers.

What has steered you down the path of writing your own parser generator as opposed to using an existing one such as ANTLR?  Were there properties you wanted that it didn't have, or performance, or...?
October 02, 2014
On Thursday, 2 October 2014 at 15:47:04 UTC, Vladimir Kazanov wrote:
> 3. The automatically generated code repeats the original grammar structure - the same way recursive decent parsers do.
>
> 4. The core parser is still that simple LL/RD parser I can practically debug.

This sounds nice! Does this mean that it would be possible to use your parser generator to create a skeleton which is then manipulated manually or is there non-local complexities that makes manual edits risky?

One reason I believe GLL is the right way to go is that I think RD makes it easier to generate good error messages suitable for display (to the end user).
October 02, 2014
On Thursday, 2 October 2014 at 17:17:53 UTC, Cliff wrote:

>
> What has steered you down the path of writing your own parser generator as opposed to using an existing one such as ANTLR?  Were there properties you wanted that it didn't have, or performance, or...?

Like I said in the introducing post, this is a personal experiment of sorts. I am aware of most alternatives, such as ANTLR's ALL(*) and many, MANY others. :) And I would never write something myself as a part of my full-time job.

But right now I am writing an article on generalized parsers, toying with implementations I could lay my hands on, implementing others. GLL is a rather exotic LL flavor which looks attractive in theory. I want to see it in practice.
October 02, 2014
On Thursday, 2 October 2014 at 17:43:45 UTC, Vladimir Kazanov wrote:
> On Thursday, 2 October 2014 at 17:17:53 UTC, Cliff wrote:
>
>>
>> What has steered you down the path of writing your own parser generator as opposed to using an existing one such as ANTLR?  Were there properties you wanted that it didn't have, or performance, or...?
>
> Like I said in the introducing post, this is a personal experiment of sorts. I am aware of most alternatives, such as ANTLR's ALL(*) and many, MANY others. :) And I would never write something myself as a part of my full-time job.
>
> But right now I am writing an article on generalized parsers, toying with implementations I could lay my hands on, implementing others. GLL is a rather exotic LL flavor which looks attractive in theory. I want to see it in practice.

Very cool - post the GitHub or equivalent when you get the chance (assuming you are sharing).  This is an area of interest for me as well.
October 02, 2014
On Thursday, 2 October 2014 at 17:39:55 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 2 October 2014 at 15:47:04 UTC, Vladimir Kazanov wrote:
>> 3. The automatically generated code repeats the original grammar structure - the same way recursive decent parsers do.
>>
>> 4. The core parser is still that simple LL/RD parser I can practically debug.
>
> This sounds nice! Does this mean that it would be possible to use your parser generator to create a skeleton which is then manipulated manually or is there non-local complexities that makes manual edits risky?

Chances are that I will be able to get the original GLL parser generator from one of algorithm authors (Adrian Johnstone). He's really helpful here. From that point, I will only have to add a frontend for generating a concrete parser, starting with Python - it already has a fully working grammar. Hopefully, I will also be able to find a suitable grammar for D, it is always a pleasure to play with the language.

Otherwise, I will continue my current effort - implementing the GLL parser generator in D.

Anyway, right now, from what I see in the papers, it looks like it is quite possible, yes. Again, requires a bit of understanding (nothing special, really), but compared to dumb LALR-style tables it's nothing.

> One reason I believe GLL is the right way to go is that I think RD makes it easier to generate good error messages suitable for display (to the end user).

This is one of the reasons I prefer LL/RD parsers :-) They are easy to follow.
« First   ‹ Prev
1 2 3