July 08, 2012
Re: Let's stop parser Hell
On Sunday, 8 July 2012 at 21:22:39 UTC, Roman D. Boiko wrote:
> On Sunday, 8 July 2012 at 21:03:41 UTC, Jonathan M Davis wrote:
>> It's been too long since I was actively working on parsers to 
>> give any details, but it is my understanding that because a 
>> hand-written parser is optimized for a specific grammar, it's 
>> going to be faster.
>
> My aim is to find out any potential bottlenecks and ensure that 
> those are possible to get rid of. So, let's try.
>
> I believe it would not hurt generality or quality of a parser 
> generator if it contained seams for inserting custom (optimized) 
> code where necessary, including those needed to take advantage 
> of some particular aspects of D grammar. Thus I claim that 
> optimization for D grammar is possible.

I'm convinced that the output of a parser generator (PG) can be 
very nearly as fast as hand-written code. ANTLR's output (last I 
checked) was not ideal, but the one I planned to make (a few 
years ago) would have produced faster code.

By default the PG's output will not be the speed of hand-written 
code, but the user can optimize it. Assuming an ANTLR-like PG, 
the user can inspect the original output looking for inefficient 
lookahead, or cases where the parser looks for rare cases before 
common cases, and then improve the grammar and insert ... I 
forget all the ANTLR terminology ... syntactic predicates or 
whatever, to optimize the parser.
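To make that concrete, here is a rough sketch of the kind of hand-optimization described above (my own illustration with hypothetical names, not actual ANTLR output): a recursive-descent rule that dispatches on a single token of lookahead, checking common cases before falling through to rare ones, instead of trying every alternative in grammar order.

```python
# Illustrative sketch, not generated parser output: statement dispatch
# on one token of lookahead, common cases first.

def parse_statement(tokens, pos):
    """Parse one statement starting at pos; return (node_kind, new_pos)."""
    tok = tokens[pos]
    # Common cases resolved with a cheap O(1) check on the lookahead token...
    if tok == "if":
        return ("if_stmt", pos + 1)
    if tok == "while":
        return ("while_stmt", pos + 1)
    # ...rare or ambiguous cases are only reached when the cheap checks fail.
    return ("expr_stmt", pos + 1)
```

A generated parser that instead tried each alternative in declaration order would pay the full backtracking cost on every statement; reordering by frequency is exactly the sort of tweak a user can make after inspecting the output.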

> So far discussion goes in favor of LL(*) parser like ANTLR, 
> which is top-down recursive-descent. Either Pegged will be 
> optimized with LL(*) algorithms, or a new parser generator 
> created.

Right, for instance I am interested in writing a top-down PG 
because I understand them better and prefer the top-down approach 
due to its flexibility (semantic actions, allowing custom code) 
and understandability (the user can realistically understand the 
output; in fact readability would be a specific goal of mine).
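As a sketch of what "readable output" means here (hand-written illustration, not any PG's actual output): one function per grammar rule, with a semantic action inlined where the rule matches, for the toy grammar expr := NUM (('+'|'-') NUM)*.

```python
# Minimal recursive-descent parser of the shape a readable top-down PG
# might emit: one function per rule, semantic action (evaluation) inline.

def parse_expr(tokens):
    pos = 0

    def num():
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        return value

    result = num()
    while pos < len(tokens) and tokens[pos] in "+-":
        op = tokens[pos]
        pos += 1
        rhs = num()
        # Semantic action: fold the operator as soon as it is parsed.
        result = result + rhs if op == "+" else result - rhs
    return result
```

Because the control flow mirrors the grammar, a user can realistically step through it in a debugger or insert custom code at any point.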

Roman, regarding what you were saying to me earlier:
>In stage 2 you have only performed some basic analysis, like, 
>e.g., matched braces to define some hierarchy. This means that 
>at the time when you find a missing brace, for example, you 
>cannot tell anything more than that braces don't match.

Stage 2 actually can tell more than just "a brace is missing 
somewhere", because so many languages are C-like. Given this 
situation:

   frob (c &% x)
      blip # gom;
   }

It doesn't need to know what language this is to tell where the 
brace belongs. Even in a more nebulous case like:

   frob (c &% x) bar @ lic
      blip # gom;
   }

probably the brace belongs at the end of the first line.
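A language-agnostic heuristic of this kind might look like the following sketch (my own illustration, not any shipping tool): when an unmatched `}` turns up, walk back to the nearest earlier line whose indentation is no deeper than the stray brace and suggest that the missing `{` belongs there.

```python
# Sketch of a stage-2 brace heuristic: no knowledge of the language,
# only brace counting and indentation.

def guess_missing_open_brace(lines):
    """Return the index of the line where a missing '{' probably belongs,
    or None if the braces balance."""
    depth = 0
    for i, line in enumerate(lines):
        for ch in line:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth < 0:
                    indent = len(line) - len(line.lstrip())
                    # Walk back to the closest non-blank line indented no
                    # deeper than the stray '}' and point at it.
                    for j in range(i - 1, -1, -1):
                        if lines[j].strip() and \
                           len(lines[j]) - len(lines[j].lstrip()) <= indent:
                            return j
                    return 0
    return None
```

On the `frob` example above, this points at the first line, matching the intuition that the brace belongs at the end of it.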

Perhaps your point is that there are situations where a parser 
that knows the "entire" grammar could make a better guess about 
where the missing brace/paren belongs. That's certainly true.

However, just because it can guess better, doesn't mean it can 
reinterpret the code based on that guess. I mean, I don't see any 
way to "back up" a parser by an arbitrary amount. A hypothetical 
stage 2 would probably be hand-written and could realistically 
back up and insert a brace/paren anywhere that the heuristics 
dictate, because it is producing a simple data structure and it 
doesn't need to do any semantic actions as it parses. A "full" 
parser, on the other hand, has done a lot of work that it can't 
undo, so the best it can do is report to the user "line 54: 
error: brace mismatch; did you forget a brace on line 13?" The 
heuristic is still helpful, but it has already parsed lines 13 to 
54 in the wrong context (and, in some cases, has already spit 
out a series of error messages that are unrelated to the user's 
actual mistake).

> As I demonstrated in some examples, it could get the output 
> which implies incorrect structure

I was unable to find the examples you refer to... this thread's 
getting a little unwieldy :)
July 08, 2012
Re: Let's stop parser Hell
David, I would suggest you create a proof-of-concept prototype 
and pay attention to some non-trivial use cases. This way you'd 
likely be able to reduce major risks significantly before making 
any serious time investments.
July 09, 2012
Re: Let's stop parser Hell
On 2012-07-08 22:05, Jonathan M Davis wrote:

> Well, for the lexer and parser, we're probably okay (or very close to it). As
> Daniel pointed out elsewhere in this thread, that part of the frontend doesn't
> change much (particularly the lexer). There's definitely still some churn, but
> it's nothing like it used to be.

It doesn't feel like that; didn't we get lambdas or UFCS in the last release?

-- 
/Jacob Carlborg
July 09, 2012
Re: Let's stop parser Hell
On 2012-07-08 22:15, Roman D. Boiko wrote:

> Can you provide a *specific* example of a performance optimization which a
> custom D compiler would have, but which a parser generator would be unlikely
> to match?

I'm no expert in this field, but I've heard that it's harder to get good 
error reporting with a parser generator.

-- 
/Jacob Carlborg
July 09, 2012
Re: Let's stop parser Hell
On Monday, July 09, 2012 08:33:31 Jacob Carlborg wrote:
> On 2012-07-08 22:05, Jonathan M Davis wrote:
> > Well, for the lexer and parser, we're probably okay (or very close to it).
> > As Daniel pointed out elsewhere in this thread, that part of the frontend
> > doesn't change much (particularly the lexer). There's definitely still
> > some churn, but it's nothing like it used to be.
> 
> It doesn't feel like that; didn't we get lambdas or UFCS in the last release?

lambdas are a bit older than that IIRC, and I don't think that UFCS actually 
affects the lexer or parser. Yes, changes are still made, but they're 
increasingly rare, and most of the stuff being changed is on the semantic level 
(most of which is bug fixes). So, out of all of the parts of the compiler to 
duplicate, those are the least likely to have to have major changes made to 
them later (especially the lexer).

- Jonathan M Davis
July 09, 2012
Re: Let's stop parser Hell
On 2012-07-09 08:39, Jonathan M Davis wrote:

> lambdas are a bit older than that IIRC, and I don't think that UFCS actually
> affects the lexer or parser. Yes, changes are still made, but they're
> increasingly rare, and most of the stuff being changed is on the semantic level
> (most of which is bug fixes). So, out of all of the parts of the compiler to
> duplicate, those are the least likely to have to have major changes made to
> them later (especially the lexer).

I'm pretty sure UFCS affects lexing or parsing. How else would this be 
legal:

4.foo();

-- 
/Jacob Carlborg
July 09, 2012
Re: Let's stop parser Hell
On Monday, July 09, 2012 09:16:41 Jacob Carlborg wrote:
> On 2012-07-09 08:39, Jonathan M Davis wrote:
> > lambdas are a bit older than that IIRC, and I don't think that UFCS
> > actually affects the lexer or parser. Yes, changes are still made, but
> > they're increasingly rare, and most of the stuff being changed is on the
> > semantic level (most of which is bug fixes). So, out of all of the parts
> > of the compiler to duplicate, those are the least likely to have to have
> > major changes made to them later (especially the lexer).
> 
> I'm pretty sure UFCS affects lexing or parsing. How else would this be
> legal:
> 
> 4.foo();

That definitely wouldn't affect lexing, because it doesn't affect the tokens at 
all. Whether it affects the parser would depend on the grammar. There's a good 
chance it would affect parsing, but it might not. It depends on how numeric 
literals are treated. In most cases though, UFCS definitely wouldn't affect 
parsing at all, because it's purely a matter of symbol resolution, so if 
changes had to be made to the parser, they would almost certainly have been 
minimal.
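
To make the lexing point concrete, here is a toy tokenizer (an illustration, not dmd's actual lexer) showing that `4.foo()` lexes cleanly with no UFCS-specific rule, provided the lexer only consumes a `.` into a numeric literal when a digit follows it:

```python
import re

# Toy lexer: a '.' becomes part of a number only when a digit follows,
# so `4.foo()` falls out as INT DOT IDENT LPAREN RPAREN with no special
# handling. (Illustrative sketch, not dmd's lexer.)
TOKEN_RE = re.compile(r"""
    (?P<NUM>\d+\.\d+|\d+)       # float only if a digit follows the dot
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<DOT>\.)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
""", re.VERBOSE)

def lex(src):
    """Return the token kinds found in src, in order."""
    return [m.lastgroup for m in TOKEN_RE.finditer(src)]
```

With this rule, `4.foo()` and `4.0` already tokenize differently, which is why UFCS on literals is a symbol-resolution question rather than a lexing one.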

Yes, some changes are still being made to the parser, but since the language 
is mostly frozen, almost all of the changes have gone into bug fixes, which are 
generally semantic issues and don't affect lexing or parsing at all. So, while 
there would still be some maintenance issues in keeping a separate lexer and 
parser in line with dmd, I really don't think that it would take much at this 
point. Most of the work would be in getting them to match dmd when they're 
first written.

- Jonathan M Davis
July 09, 2012
Re: Let's stop parser Hell
deadalnix, in message (digitalmars.D:171330), wrote:
> D isn't 100% CFG. But it is close.

What makes D fail to be a CFG?
July 09, 2012
Re: Let's stop parser Hell
On Monday, 9 July 2012 at 06:35:38 UTC, Jacob Carlborg wrote:
> I'm no expert in this field, but I've heard that it's harder to 
> get good error reporting with a parser generator.

Good point!
July 09, 2012
Re: Let's stop parser Hell
On 07/09/2012 08:33 AM, Jacob Carlborg wrote:
> On 2012-07-08 22:05, Jonathan M Davis wrote:
>
>> Well, for the lexer and parser, we're probably okay (or very close to
>> it). As
>> Daniel pointed out elsewhere in this thread, that part of the frontend
>> doesn't
>> change much (particularly the lexer). There's definitely still some
>> churn, but
>> it's nothing like it used to be.
>
> It doesn't feel like that; didn't we get lambdas or UFCS in the last release?
>

Those changes are trivial to incorporate into a well-written parser.