July 08, 2012
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 22:40:05 UTC, Andrei Alexandrescu
wrote:

> http://www.antlr.org/wiki/display/~admin/ANTLR+v4+lexers

Thanks, nice article.

Also found another post:
http://www.antlr.org/wiki/display/~admin/2008/11/30/Example+tree+rewriting+with+patterns,
which is related to some other discussion in this thread :)
July 08, 2012
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 22:35:37 UTC, Roman D. Boiko wrote:
> On Saturday, 7 July 2012 at 22:25:00 UTC, David Piepgrass wrote:
>> This is all true, but forgetting a brace commonly results in a 
>> barrage of error messages anyway. Code that guesses what you 
>> meant obviously won't work all the time, and phase 3 would 
>> have to take care not to emit an error message about a "{" 
>> token that doesn't actually exist (that was merely 
>> "guessed-in"). But at least it's nice for a parser to be 
>> /able/ to guess what you meant; for a typical parser it would 
>> be out of the question, upon detecting an error, to back up 
>> four source lines, insert a brace and try again.
>
> So you simply admit that error recovery is difficult to 
> implement. For me, it is a must-have, and thus throwing away 
> information is bad.

I'm not seeing any tremendous error-handling difficulty in my 
idea. Anyway, I missed the part about information being thrown 
away...?
July 08, 2012
Re: Let's stop parser Hell
On 07/07/2012 21:17, Dmitry Olshansky wrote:
> On 07-Jul-12 22:23, Andrei Alexandrescu wrote:
>> On 7/7/12 6:24 AM, Roman D. Boiko wrote:
>>> On Saturday, 7 July 2012 at 09:06:57 UTC, Roman D. Boiko wrote:
>>>> http://stackoverflow.com/questions/11373644/performance-of-parsers-peg-vs-lalr1-or-llk
>>>>
>>>
>>> So far it looks like LALR parsers may have lower constant factors than
>>> Packrat.
>>>
>>> The difference could be minimized by paying attention to parsing of
>>> terminal symbols, which was already in my plans. It is not necessary to
>>> strictly follow the Packrat parsing algorithm.
>>>
>>> The benefits of Pegged, in my view, are its support of Parsing
>>> Expression Grammar (PEG) and compile-time evaluation. It is easily
>>> extensible and modifiable.
>>
>> Isn't it also a benefit that lexing and parsing are integrated? With
>> traditional LALR you need a separate tokenizer.
>>
>
> I'll have to point out that the whole point about integrated lexing is
> moot. It's more of a liability than a benefit. At the very least it's an
> implementation curiosity, not an advantage.
>

Many usages only need a lexer.
July 08, 2012
Re: Let's stop parser Hell
>> Yeah, with a tree-transforming parser, I imagine the same thing,
>> except my current [fantasy] is to convert a certain subset of D to
>> multiple other languages automatically. Then I could write libraries
>> that can easily be used by an astonishingly large audience. I
>> certainly would like to see D targeting Android, but that's best done
>> directly from D to ARM.
>>
> That does sound very cool.  Possibly difficult though, due to 
> having to cater to the lowest-common-denominator in all of your 
> API designs.  No templated functions or ranges in your API, 
> that's for sure.  I'm sure there are some things where this is 
> very doable though; it probably depends on what kind of 
> libraries you are writing.

Well, for templates, in general, it would be necessary to 
instantiate a particular set of templates and explicitly give 
them names in the target language. So for instance, I could 
define a Point!T struct in D, sure, but then I'd have to tell the 
language converter to create target-language specializations: in 
C#, PointD=Point!double, PointI=Point!int, etc. If the target 
were C++, the template could be translated to a C++ template, 
Point<T>, as long as there aren't any "static ifs" or other 
things that can't be translated. Notably, if a template P!T 
depends on another template Q!T, then P!T cannot be translated to 
a C++/C# P<T> unless Q!T was also translated as Q<T>.
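
To sketch the idea in code (hypothetical; the converter and the alias 
convention are mine, not an existing tool):

// A D template whose instantiations would be exported under explicit
// names in the target language.
struct Point(T)
{
    T x, y;
    Point opBinary(string op : "+")(Point rhs)
    {
        return Point(x + rhs.x, y + rhs.y);
    }
}

// Tell the converter which concrete instantiations to emit, e.g. as
// C# structs named PointD and PointI:
alias PointD = Point!double; // -> struct PointD { double x, y; ... }
alias PointI = Point!int;    // -> struct PointI { int x, y; ... }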

Adapting standard libraries could no doubt be a gigantic problem. 
I don't know how to begin to think about doing that.

But for ranges in particular, I think the concept is too 
important to leave out of public interfaces. So I'd port the 
major range data structures to the target languages, most likely 
by hand, so that they could be used by converted code.
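
The port is small because converted code mostly needs the basic 
input-range protocol (empty / front / popFront). As a sketch, here is 
the protocol on the D side; a hand-written equivalent (say, a 
hypothetical C# interface IInputRange<E>) is all a target language 
would need:

// A minimal input range; any type with these three members works
// with the range algorithms.
struct Iota
{
    int i, end;
    @property bool empty() const { return i >= end; }
    @property int front() const { return i; }
    void popFront() { ++i; }
}

unittest
{
    import std.algorithm : equal;
    assert(Iota(0, 3).equal([0, 1, 2]));
}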

> As for D targeting Android, my intent is really to target X 
> where X is any CPU/OS combo you can think of.  I want to be 
> able to get D, the language, not necessarily phobos or other 
> niceties, to work on any platform, and to do so without much 
> work.  Cross-compiling to a new platform that has never been 
> cross-compiled before should require zero coding.

I understand. Conversion to C is an effective last resort. And, 
well, I hear a lot of compilers have even used it as a standard 
practice. I guess you'd be stuck with refcounting, though.

> I think that the D-directly-to-ARM is the current approach for 
> cross-compiling.  I critique it for its underwhelming lack of 
> results.

Yeah. I assume it involves weird object-file formats, calling 
conventions and ABIs. I guess very few want to get involved with 
that stuff, and very few have the slightest clue where to begin, 
myself included.

> (2) suffer from integration problems if you try to compile the 
> expressions in separate files before compiling the rest of the 
> front-end.

Absolutely, I love language-integrated metaprogramming. Without 
it you end up with complicated build environments, and I hate 
those, cuz there isn't a single standard build environment that 
everybody likes. I think people should be able to just load up 
their favorite IDE and add all source files to the project and It 
Just Works. Or on the command line, do dmd *.d or whatever. Oh, 
and the ability to run the same code at meta-compile-time, 
compile-time and run-time, also priceless.
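
The compile-time/run-time half of that already works in D today; a 
small self-contained example:

// One function, evaluated both by the compiler (CTFE) and at run time.
ulong fib(ulong n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

enum ulong atCompileTime = fib(10); // forced CTFE; no run-time cost

void main()
{
    import std.stdio : writeln;
    immutable atRunTime = fib(10);  // ordinary run-time call
    writeln(atCompileTime, " ", atRunTime); // prints: 55 55
}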
July 08, 2012
Re: Let's stop parser Hell
"Jonathan M Davis" <jmdavisProg@gmx.com> wrote in message 
news:mailman.163.1341683651.31962.digitalmars-d@puremagic.com...
> [snip]
>
> In all honesty though, I would be surprised if you could convince Walter 
> to
> switch dmd's frontend to Phobos' lexer and parser even once they've been
> implemented. So, while I agree that there are benefits in doing so, I'm 
> not
> sure how much chance you have of ever getting any traction with that.
>
> - Jonathan M Davis

A more realistic alternative is to modify dmd to optionally (at build/link 
time) use Phobos' lexer and/or parser, which would allow them to be run 
against the test suite.  Parser and lexer changes are relatively rare in the 
compiler, so keeping two implementations in sync is not really that big a 
deal.
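
A version switch would be enough for that. The module names below are 
made up, since neither lexer exists as a swappable module yet:

version (UsePhobosLexer)
    import std.d.lexer;  // hypothetical Phobos module
else
    import dmd.lexer;    // hypothetical dmd-internal module

// Build the compiler both ways and run the same test suite:
//   dmd -version=UsePhobosLexer ...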
July 08, 2012
Re: Let's stop parser Hell
On Sunday, 8 July 2012 at 01:15:31 UTC, David Piepgrass wrote:
> I'm not seeing any tremendous error-handling difficulty in my 
> idea. Anyway, I missed the part about information being thrown 
> away...?

(I will use the word "you" to refer to some abstract person or 
compiler.)

Error handling could mean either error reporting and stopping 
after the first error, or reporting several errors and continuing 
parsing + semantic analysis whenever possible, so that the user 
would get partial functionality like code completion, information 
about method overloads, or "go to definition" / find all usages, 
etc.

The first method is the less powerful option, but it still requires 
constructing a meaningful error message that can help the user.

There are many possible error recovery strategies. For example, you 
might decide to insert some token, based on the parsing information 
available up to the moment the error is discovered, if that would fix 
the problem and allow parsing to continue. Another option is to skip a 
series of subsequent tokens (treating them as whitespace, for example) 
until the parser is able to continue its work. There are also many 
heuristics for typical error situations.

All of these can only be performed if the parser gathers the syntactic 
information that can be inferred from the source code according to the 
grammar rules. A small sketch of the two strategies follows.
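
For illustration, here is a minimal D sketch of those two strategies; 
all of the names (Token, TokenKind, Parser) are hypothetical, not taken 
from any real front end:

enum TokenKind { identifier, semicolon, rbrace, eof }

struct Token { TokenKind kind; size_t line; }

struct Parser
{
    Token[] tokens;
    size_t pos;

    // Strategy 1: if the expected token is absent, report the error,
    // synthesize ("guess in") the token, and keep parsing.
    Token expect(TokenKind kind)
    {
        if (tokens[pos].kind == kind)
            return tokens[pos++];
        report("expected token missing", tokens[pos].line);
        return Token(kind, tokens[pos].line); // synthesized token
    }

    // Strategy 2 (panic mode): skip tokens as if they were whitespace
    // until a synchronization point such as ';' or '}' is reached.
    void synchronize()
    {
        while (tokens[pos].kind != TokenKind.semicolon
            && tokens[pos].kind != TokenKind.rbrace
            && tokens[pos].kind != TokenKind.eof)
            ++pos;
    }

    void report(string msg, size_t line) { /* collect a diagnostic */ }
}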

In stage 2 you have only performed some basic analysis, such as 
matching braces to define a rough hierarchy. This means that when you 
find a missing brace, for example, you cannot say anything more than 
that the braces don't match. And if the user inserts a brace in an 
incorrect location, the best you can report is either that braces are 
missing in two different places, or that a brace is missing in one 
place and redundant in another. In many cases you won't even notice 
that a brace is misplaced, and you will pass the resulting tree on to 
the third stage. You get no hint from the grammar about the exact 
locations where some token should be present; the snippet below 
illustrates this.
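
For example, in the following erroneous snippet the braces balance 
perfectly, so a stage-2 brace matcher sees nothing wrong:

void f()
{
    if (cond)
}                        // stray '}': closes f() too early
    foreach (i; 0 .. 10)
    {
    }                    // braces still balance overall

The matcher pairs the first '}' with f's '{' and the last '}' with the 
foreach's '{', then hands stage 3 a tree in which the foreach sits at 
module scope. Whatever error stage 3 eventually reports points nowhere 
near the real mistake.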

Now, stage 3 depends heavily on the output of stage 2. As I 
demonstrated in some examples, it could receive output that implies an 
incorrect structure, even though nothing was flagged in the previous 
stage. You would need to analyze so much information to find the real 
root of a problem that you would effectively be duplicating 
(implicitly) the logic of the previous stage, but with combinatorial 
growth in complexity.

The problems you would need to deal with are much more complex than I 
have described here. So you wouldn't be able to deliver error recovery 
at all, or, if you could, it would either be trivial or require 
needlessly complex logic. Breaking a system at the wrong boundary 
(where the number of dependencies crossing the boundary is large) 
always introduces artificial complexity.

What I described above is an illustration of what I call information 
loss. I may not have described everything as clearly as needed, but my 
goal was not to provide a thorough and verifiable analysis; I 
speculated and simplified a lot. If you decide to ignore this, that is 
not a problem, and I don't claim that you will fail any of your goals. 
Just accept that __for me__ this approach is not a viable option. 
Everything written above is IMHO, and there may be many ways to resolve 
these problems with varying degrees of success.
July 08, 2012
Re: Let's stop parser Hell
>> Yup, LL(*) is my favorite so far.
>
> That's Terence Parr's discovery, right? I've always liked 
> ANTLR, so if PEGs turn out to have issues LL(*) sounds like a 
> promising alternative.

We should also consider using GLR if LL(*) doesn't work.
July 08, 2012
Re: Let's stop parser Hell
On 08-Jul-12 13:05, Tobias Pankrath wrote:
>>> Yup, LL(*) is my favorite so far.
>>
>> That's Terence Parr's discovery, right? I've always liked ANTLR, so if
>> PEGs turn out to have issues LL(*) sounds like a promising alternative.
>
> We should also consider using GLR if LL(*) doesn't work.

GLR ... are you serious? It still does parsing in n^3 if I'm not mistaken.

-- 
Dmitry Olshansky
July 08, 2012
Re: Let's stop parser Hell
On Sunday, 8 July 2012 at 09:14:32 UTC, Dmitry Olshansky wrote:
> On 08-Jul-12 13:05, Tobias Pankrath wrote:
>>>> Yup, LL(*) is my favorite so far.
>>>
>>> That's Terence Parr's discovery, right? I've always liked 
>>> ANTLR, so if
>>> PEGs turn out to have issues LL(*) sounds like a promising 
>>> alternative.
>>
>> We should also consider using GLR if LL(*) doesn't work.
>
> GLR ... are you serious? It still does parsing in n^3 if I'm 
> not mistaken.

Since GLR can parse any CFG, the worst case is O(n^3), but the actual 
performance depends on the grammar.

From Elkhound [1]
> It should be competitive with Bison for LALR (fragments of)
> grammars, and degrade gracefully from there. On the scale of
> grammar nondeterminism, from none (LALR) to some to lots, "some"
> is the niche Elkhound is going after.
> These goals are driven by Elkhound's primary application, the
> Elsa C++ Parser. In essence, Elkhound came about because I wanted
> to apply automatic parsing technology to parsing C++, but found
> existing systems inadequate.

So it seems to me that it is no worse than PEG as long as the grammar 
has only a moderate number of ambiguities.

Performance is one thing, but you should be able to express your 
language in the underlying formalism without too many hurdles.
July 08, 2012
Re: Let's stop parser Hell
> From Elkhound [1]

http://scottmcpeak.com/elkhound/sources/elkhound/algorithm.html