View mode: basic / threaded / horizontal-split · Log in · Help
July 08, 2012
Re: Let's stop parser Hell
On 08-Jul-12 14:48, Tobias Pankrath wrote:
> On Sunday, 8 July 2012 at 09:14:32 UTC, Dmitry Olshansky wrote:
>> On 08-Jul-12 13:05, Tobias Pankrath wrote:
>>>>> Yup, LL(*) is my favorite so far.
>>>>
>>>> That's Terence Parr's discovery, right? I've always liked ANTLR, so if
>>>> PEGs turn out to have issues LL(*) sounds like a promising alternative.
>>>
>>> We should also consider using GLR if LL(*) doesn't work.
>>
>> GLR ... are you serious? It still does parsing in n^3 if I'm not
>> mistaken.
>
> Since GLR can parse any CFG the upper bound is in O(n^3), but the actual
> performance seems to depend on the grammar.
>
>  From Elkhound [1]
>> It should be competitive with Bison for LALR (fragements of) grammars,
>> and degrade gracefully from there. On the scale of grammar
>> nondeterminism, from none
>> (LALR) to some to lots, "some" is the niche Elkhound is going after.
>> These goals are driven by Elkhound's primary application, the Elsa C++
>> Parser. In essence, Elkhound came about because I wanted to apply
>> automatic parsing technology to parsing C++, but found exiting systems
>> inadequate.
>
> So it seems to me that it is not worse than PEG if you have a grammar with
> reasonable many ambiguities.
>
> Performance is one thing, but you should be able to express your
> language in the underlying formalism without too many hurdles.
>

The problem is that say LL(*) can be easily tweaked in place to run 
various semantic actions as it parse the source. I think GLR is harder 
to make do anything other then "parse & say it's fine".


-- 
Dmitry Olshansky
July 08, 2012
Re: Let's stop parser Hell
On 2012-07-07 18:49, deadalnix wrote:

> I tried that. This is almost impossible, dmd's parser and AST are very
> tightly mixed with dmd's internals.

I suspected that.

-- 
/Jacob Carlborg
July 08, 2012
Re: Let's stop parser Hell
> The problem is that say LL(*) can be easily tweaked in place to 
> run various semantic actions as it parse the source. I think 
> GLR is harder to make do anything other then "parse & say it's 
> fine".

Did you look at Elkhound? According to the tutorial you can use 
actions
in the same way as if you were using an (LA)LR parser. Dunno if 
it's sufficient, but with Elkhound an C++-Parser has been written.

See also 
http://stackoverflow.com/questions/428892/what-parser-generator-do-you-recommend 
where Ira Baxter (maybe you know him) makes a case for DMS, which 
uses GLR internally. Here is a google tech talk of Ira explaining 
the DMS http://www.youtube.com/watch?v=C-_dw9iEzhA .

I'm not very experienced with parser generators. Just trying to 
put all options on the table.
July 08, 2012
Re: Let's stop parser Hell
On 2012-07-07 19:53, Jonathan M Davis wrote:

> There are multiple issues here. The one that Andrei is raising is the fact
> that D isn't properly formalized. Essentially, the compiler _is_ the spec, and
> while the online spec _mostly_ follows it, it doesn't entirely, and the online
> spec isn't always as precise as it needs to be regardless. With a fully
> formalized spec, it should be possible to fully implement a D compiler from
> the spec alone, and I don't think that that's currently possible.

That's been a problem for a long time.

> Writing a D lexer and parser (if not a full-blown compiler) from scratch would
> help highlight the places in the spec which are lacking, and having it be part
> of Phobos would arguably increase Walter's incentive to make sure that the
> spec is in line with the compiler (and vice versa) so that stuff _other_ than
> the compiler which is based on the spec would be able to match the compiler.
>
> Clang is in a _completely_ different situation. It's a C/C++ compiler, and both
> C and C++ already have official, formalized specs which Clang conforms to (or is
> supposed to anyway). Clang has no control over the spec at all. It merely
> implements it. So, there is no issue of trying to keep other tools or
> compilers in line with Clang due to it being the language's spec like we
> effectively have with dmd. It may help the tools which use Clang to be fully in
> line with Clang and not have to worry about whether Clang implements the spec
> slightly differently, but in theory, if they all follow the spec correctly,
> that isn't in issue (though obviously in practice it can be).

That should be true for D as well when the spec is complete.

> In D's case, all of the major D compilers use the same frontend, which helps
> compatability but harms the spec, because there's less incentive to keep it
> precise and  up-to-date. So, from the perspective of the spec, implementing
> the D lexer and parser for Phobos separately from dmd would be of great
> benefit.

That might be true.

> IMHO, the reason that porting dmd's lexer and parser would be of great benefit
> is primarily maintenance. It makes it much easier to keep Phobos' lexer and
> parser in line with dmd, making discrepencies less likely, but it arguably
> reduces the incentive to improve the spec.

But then it sounds like the best solution would be not to have a 
lexer/parser based on DMD and instead making sure the spec is correct.

> The benefits of having a lexer and parser as a library (regardless of whether
> it's from scratch or a port from dmd) are primarly derived from the fact that
> it makes it much easier to create tools which use them. Such tools no longer
> have to write their own lexers or parsers.

Of course, that's the whole point.

> If the compiler uses the same library, it has the added benefit of making it so
> that any tool using the library will be in sync with the compiler, but if the
> spec is properly formalized and up-to-date, and the library is kep up-to-date
> with _it_, that shouldn't really be a problem. You still have the debate as to
> whether it's better to have a separate implementation based on the spec
> (thereby making it more likely that the spec is correct) or whether it's
> better to have the compiler share the implementation so that the library is
> guaranteed to match the compiler (though not necessarily the spec), but I
> think that that's a separate debate from whether we should have the lexer and
> parser as a library.

What keeps popping up in my head is the scenario where users are 
complaining about the frontend in Phobos not behaving as their compiler. 
If this is due to they are out of sync, bugs or not following the spec 
doesn't matter.

I still thinks D is changing too much to have two separate 
implementations of the compiler and a library like this.

> In all honesty though, I would be surprised if you could convince Walter to
> switch dmd's frontend to Phobos' lexer and parser even once they've been
> implemented. So, while I agree that there are benefits in doing so, I'm not
> sure how much chance you have of ever getting any traction with that.

That is perhaps true.

> Another thing to consider is that that might make it so that gdc and ldc
> couldn't share the same frontend with dmd (assuming that they have to keep
> their frontends in C or C++ -  I don't know if they do) - but if so, that
> would increase the incentive for the spec to be correct if dmd ever started
> using a D frontend.

-- 
/Jacob Carlborg
July 08, 2012
Re: Let's stop parser Hell
On Sunday, 8 July 2012 at 09:14:32 UTC, Dmitry Olshansky wrote:
> On 08-Jul-12 13:05, Tobias Pankrath wrote:
>>>> Yup, LL(*) is my favorite so far.
>>>
>>> That's Terence Parr's discovery, right? I've always liked 
>>> ANTLR, so if
>>> PEGs turn out to have issues LL(*) sounds like a promising 
>>> alternative.
>>
>> We should also consider using GLR if LL(*) doesn't work.
>
> GLR ... are you serious? It still does parsing in n^3 if I'm 
> not mistaken.

http://www.antlr.org/papers/LL-star-PLDI11.pdf describes some 
disadvantages of GLR with respect to LL(*).
July 08, 2012
Re: Let's stop parser Hell
On Sunday, July 08, 2012 14:13:18 Jacob Carlborg wrote:
> On 2012-07-07 19:53, Jonathan M Davis wrote:
> > There are multiple issues here. The one that Andrei is raising is the fact
> > that D isn't properly formalized. Essentially, the compiler _is_ the spec,
> > and while the online spec _mostly_ follows it, it doesn't entirely, and
> > the online spec isn't always as precise as it needs to be regardless.
> > With a fully formalized spec, it should be possible to fully implement a
> > D compiler from the spec alone, and I don't think that that's currently
> > possible.
> 
> That's been a problem for a long time.

True, but Andrei is basically arguing that porting dmd's lexer and parser to D 
would reduce the incentive to fix this, whereas having a new lexer/parser would 
encourage fixing it (though he's arguing for purely using a generative parser 
rather than writing a D-specific one).

> > IMHO, the reason that porting dmd's lexer and parser would be of great
> > benefit is primarily maintenance. It makes it much easier to keep Phobos'
> > lexer and parser in line with dmd, making discrepencies less likely, but
> > it arguably reduces the incentive to improve the spec.
> 
> But then it sounds like the best solution would be not to have a
> lexer/parser based on DMD and instead making sure the spec is correct.

That would be the question. Given enough time, what I'd probably want to do 
would be to port dmd's lexer and parser to D in order to properly figure it all 
out, updating the spec in the process, and _then_ go write one from scratch, 
possibly basing some of it on how dmd did it, possibly not. But I don't really 
have the time for any of that right now, and most of the focus right now from 
people interested in parsing seems to be on pegged and parser generators 
(which are very cool and in some ways much more interesting, but I seriously 
question that that's the performant way to go if you're looking to parse D 
specifically). So, who knows what we're going to get out of this and when we'll 
get it.

> What keeps popping up in my head is the scenario where users are
> complaining about the frontend in Phobos not behaving as their compiler.
> If this is due to they are out of sync, bugs or not following the spec
> doesn't matter.
> 
> I still thinks D is changing too much to have two separate
> implementations of the compiler and a library like this.

Well, for the lexer and parser, we're probably okay (or very close to it). As 
Daniel pointed out elsewhere in this thread, that part of the frontend doesn't 
change much (particularly the lexer). There's definitely still some churn, but 
it's nothing like it used to be.

- Jonathan M Davis
July 08, 2012
Re: Let's stop parser Hell
On Sunday, 8 July 2012 at 20:06:07 UTC, Jonathan M Davis wrote:
> most of the focus right now from people interested in parsing 
> seems to be on pegged and parser generators (which are very 
> cool and in some ways much more interesting, but I seriously 
> question that that's the performant way to go if you're looking 
> to parse D specifically).

Can you provide a *specific* example of performance optimization 
which a custom D compiler would have, but parser generator would 
be unlikely to catch up.
July 08, 2012
Re: Let's stop parser Hell
On Sunday, July 08, 2012 22:15:26 Roman D. Boiko wrote:
> On Sunday, 8 July 2012 at 20:06:07 UTC, Jonathan M Davis wrote:
> > most of the focus right now from people interested in parsing
> > seems to be on pegged and parser generators (which are very
> > cool and in some ways much more interesting, but I seriously
> > question that that's the performant way to go if you're looking
> > to parse D specifically).
> 
> Can you provide a *specific* example of performance optimization
> which a custom D compiler would have, but parser generator would
> be unlikely to catch up.

It's been too long since I was actively working on parsers to give any 
details, but it is my understanding that because a hand-written parser is 
optimized for a specific grammar, it's going to be faster. Also, look at dmd 
and dmc vs other compilers. They use hand-written parsers and are generally 
much faster than their competitors.

One thing to remember about hand-written parsers vs generative ones though is 
that they usually are completely different in terms of the type of parser that 
you write (e.g. hand-written parsers are generally recursive-decent parser 
whereas generative ones usually use bottom-up parsers). So, that could have a 
large impact on performance as well (in either direction).

To be clear though, I have _no_ problem with having a generative parser in 
Phobos (or having other 3rd party options available). Parsers like pegged are 
extremely cool and extremely useful. It's just that it's my understanding that 
well-written hand-written parsers are faster than generated ones, so I think 
that it would be benecial to have a hand-written parser for D in Phobos _in 
addition_ to a general, generative solution.

But to fully prove that a hand-written one would be faster, we'd of course 
have to have actual solutions to compare. And if the API for a D-specific 
parser in Phobos is designed well enough, and it somehow proved that a 
generative solution was faster, then the hand-written one could be replaced by 
the generative one underneat the hood.

- Jonathan M Davis
July 08, 2012
Re: Let's stop parser Hell
On Sunday, 8 July 2012 at 21:03:41 UTC, Jonathan M Davis wrote:
> It's been too long since I was actively working on parsers to 
> give any details, but it is my understanding that because a 
> hand-written parser is optimized for a specific grammar, it's 
> going to be faster.

My aim is to find out any potential bottlenecks and ensure that 
those are possible to get rid of. So, let's try.

I believe it would not hurt generality or quality of a parser 
generator if it contained sews for inserting custom (optimized) 
code where necessary, including those needed to take advantage of 
some particular aspects of D grammar. Thus I claim that 
optimization for D grammar is possible.

> Also, look at dmd and dmc vs other compilers. They use 
> hand-written parsers and are generally much faster than their 
> competitors.

Due to which particular design or implementation decisions? Would 
it be possible to name a few which are relevant in this context?

> One thing to remember about hand-written parsers vs generative 
> ones though is that they usually are completely different in 
> terms of the type of parser that you write (e.g. hand-written 
> parsers are generally recursive-decent parser whereas 
> generative ones usually use bottom-up parsers).

So far discussion goes in favor of LL(*) parser like ANTLR, which 
is top-down recursive-descent. Either Pegged will be optimized 
with LL(*) algorithms, or a new parser generator created.
July 08, 2012
Re: Let's stop parser Hell
> I believe it would not hurt generality or quality of a parser 
> generator if it contained sews for inserting custom (optimized)

s/sews/seams
11 12 13 14 15 16 17 18 19
Top | Discussion index | About this forum | D home