View mode: basic / threaded / horizontal-split · Log in · Help
July 07, 2012
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 10:05:30 UTC, Dmitry Olshansky wrote:
> I can tell you that they are not slower then one another in 
> principle. Quality of implementations trumps every theoretical 
> aspect here. LALR is usually fast even if implemented by book 
> but they are hard to optimize futher and quite restrictive on 
> "semantic extensions".
>
> Proper CFG parsers all are liner-time anyway.

Exactly, that is also my point. I think that Pegged can be 
heavily optimized in performance, and there are no fundamental 
issues which would make it inherently slower than LALR or a 
hand-written D-specific parser.
July 07, 2012
Re: Let's stop parser Hell
On 2012-07-07 03:12, Jonathan M Davis wrote:

> Now, the issue of a "strong, dependable formalization of D's syntax" is
> another thing entirely. Porting dmd's lexer and parser to Phobos would keep
> the Phobos implementation in line with dmd much more easily and avoid
> inconsistencies in the language definition and the like. However, if we write a
> new lexer and parser specifically for Phobos which _doesn't_ port the lexer or
> parser from dmd, then that _would_ help drive making the spec match the
> compiler (or vice versa). So, I agree that could be a definite argument for
> writing a lexer and parser from scratch rather than porting the one from dmd,
> but I don't buy the bit about it smothering parser generators at all. I think
> that the use cases are completely different.

I think the whole point of having a compiler as a library is that the 
compiler should use the library as well. Otherwise the two will get out 
of sync.

Just look at Clang, LLVM, LLDB and Xcode, they took the correct 
approach. Clang and LLVM (and I think LLDB) are available as libraries. 
Then the compiler, debugger (lldb) and IDE uses these libraries as part 
of their implementation. They don't have their own implementation that 
is similar to the libraries, making it "easy" to stay in sync. They 
_use_ the libraries as libraries.

This is what DMD and Phobos should do as well. If it's too complicated 
to port the lexer/parser to D then it would be better, at least as a 
first step, to modify DMD as needed. Create a C API for DMD and then 
create D bindings to be put into Phobos.

-- 
/Jacob Carlborg
July 07, 2012
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 10:26:39 UTC, Roman D. Boiko wrote:
> I think that Pegged can be heavily optimized in performance, 
> and there are no
> fundamental issues which would make it inherently slower than 
> LALR or a hand-written D-specific parser.

Hmm... found an interesting article: 
http://www.antlr.org/papers/LL-star-PLDI11.pdf

It describes some disadvantages of Packrat parsing, like problems 
with debugging and error recovery. These are important for DCT, 
so I'll have to perform additional research.
July 07, 2012
Re: Let's stop parser Hell
On 07-Jul-12 15:30, Roman D. Boiko wrote:
> On Saturday, 7 July 2012 at 10:26:39 UTC, Roman D. Boiko wrote:
>> I think that Pegged can be heavily optimized in performance, and there
>> are no
>> fundamental issues which would make it inherently slower than LALR or
>> a hand-written D-specific parser.
>
> Hmm... found an interesting article:
> http://www.antlr.org/papers/LL-star-PLDI11.pdf
>
> It describes some disadvantages of Packrat parsing, like problems with
> debugging and error recovery. These are important for DCT, so I'll have
> to perform additional research.

Yup, LL(*) is my favorite so far. As for debugging and error recovery 
they are always a problem.

-- 
Dmitry Olshansky
July 07, 2012
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 11:33:18 UTC, Dmitry Olshansky wrote:
> Yup, LL(*) is my favorite so far. As for debugging and error 
> recovery they are always a problem.

It's interesting, that I also wanted to use DFA for non-terminals 
(similarly to LL(*)), but was considering their usage as pure 
performance optimization. Concerns raised in the article seem to 
be very reasonable, I will likely reconsider my previous plans...

This forum is an endless source of high-quality feed-back for me.
July 07, 2012
Re: Let's stop parser Hell
> I can tell you that they are not slower then one another in 
> principle. Quality of implementations trumps every theoretical 
> aspect here. LALR is usually fast even if implemented by book 
> but they are hard to optimize futher and quite restrictive on 
> "semantic extensions".
>
> Proper CFG parsers all are liner-time anyway.

To be picky here:
The languages that can be parsed in linear time are a strict 
subset of CFGs.

However I do agree, that error handling and flexibility are more 
important than raw speed. I don't want to get a concrete syntax 
tree full of unneeded productions, like the ones you get when you 
encode precedence rules in your grammar.
July 07, 2012
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 14:17:54 UTC, Tobias Pankrath wrote:
>> Proper CFG parsers all are liner-time anyway.
>
> To be picky here:
> The languages that can be parsed in linear time are a strict 
> subset of CFGs.
Also any PEG defines a language with linear-time parsing. Some 
non-context-free languages can be described with PEG.
July 07, 2012
Re: Let's stop parser Hell
On 07/05/2012 08:28 AM, Roman D. Boiko wrote:
> Pegged may eventually become standard, if it will be performanceoptimized and a bit more customizable

Interesting, I thought you were hand-writing this stuff.

I'm a fan of pegged and made some pull requests, one of which was aimed 
at making it more customizable by allowing the user to define what type 
gets used as a parse tree node, thus allowing one to potentially use 
their parse tree as an AST (and maybe do a couple other things too). 
It's a WIP, but the proof of concept is done: DMD can handle the extra 
templating at compile time, so it works.

What kind of things did you want in terms of customizability?
July 07, 2012
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 15:42:30 UTC, Chad J wrote:
> On 07/05/2012 08:28 AM, Roman D. Boiko wrote:
>> Pegged may eventually become standard, if it will be 
>> performanceoptimized and a bit more customizable
>
> Interesting, I thought you were hand-writing this stuff.
>
> I'm a fan of pegged and made some pull requests, one of which 
> was aimed at making it more customizable by allowing the user 
> to define what type gets used as a parse tree node, thus 
> allowing one to potentially use their parse tree as an AST (and 
> maybe do a couple other things too). It's a WIP, but the proof 
> of concept is done: DMD can handle the extra templating at 
> compile time, so it works.
>
> What kind of things did you want in terms of customizability?

There are many possible customization point which would make it a 
better fit for my project while also being useful in general.

The most critical was the one you implemented: ability to define 
a custom parse tree. I also needed the ability to use immutable 
structures (almost) everywhere, while currently they must be 
mutable. Also it would be great to avoid UTF conversions (instead 
use functions and types templated by the string type). I also 
wanted to add ability to reuse parts of previously generated AST 
which correspond to source code that has not been changed, or to 
identical source code parsed previously. (This would allow 
incremental parsing, e.g., while user is typing.) The latter 
would require to track source code positions separately, or even 
generating them on demand (this is the feature actively 
criticized by deadalnix in some previous discussions but central 
to DCT architecture). AST would only hold information about 
widths of its nodes.

I've also written some notes (10-15 suggestions) while studying 
Pegged code, which will be shared later.

However, as I found a few hours ago, Packrat parsing (typically 
used to handle PEG) has serious disadvantages: it complicates 
debugging because of frequent backtracking, it has problems with 
error recovery, and typically disallows to add actions with side 
effects (because of possibility of backtracking). These are 
important enough to reconsider my plans of using Pegged. I will 
try to analyze whether the issues are so fundamental that I (or 
somebody else) will have to create an ANTLR-like parser instead, 
or whether it is possible to introduce changes into Pegged that 
would fix these problems.
July 07, 2012
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 15:39:52 UTC, Roman D. Boiko wrote:
> On Saturday, 7 July 2012 at 14:17:54 UTC, Tobias Pankrath wrote:
>>> Proper CFG parsers all are liner-time anyway.
>>
>> To be picky here:
>> The languages that can be parsed in linear time are a strict 
>> subset of CFGs.
> Also any PEG defines a language with linear-time parsing. Some 
> non-context-free languages can be described with PEG.

Interesting, I thought that PEG ⊂ CFG holds.
3 4 5 6 7 8 9 10 11
Top | Discussion index | About this forum | D home