July 08, 2012
On 08-Jul-12 14:48, Tobias Pankrath wrote:
> On Sunday, 8 July 2012 at 09:14:32 UTC, Dmitry Olshansky wrote:
>> On 08-Jul-12 13:05, Tobias Pankrath wrote:
>>>>> Yup, LL(*) is my favorite so far.
>>>>
>>>> That's Terence Parr's discovery, right? I've always liked ANTLR, so if
>>>> PEGs turn out to have issues LL(*) sounds like a promising alternative.
>>>
>>> We should also consider using GLR if LL(*) doesn't work.
>>
>> GLR ... are you serious? It still does parsing in n^3 if I'm not
>> mistaken.
>
> Since GLR can parse any CFG the upper bound is in O(n^3), but the actual
> performance seems to depend on the grammar.
>
>  From Elkhound [1]
>> It should be competitive with Bison for LALR (fragements of) grammars,
>> and degrade gracefully from there. On the scale of grammar
>> nondeterminism, from none
>> (LALR) to some to lots, "some" is the niche Elkhound is going after.
>> These goals are driven by Elkhound's primary application, the Elsa C++
>> Parser. In essence, Elkhound came about because I wanted to apply
>> automatic parsing technology to parsing C++, but found exiting systems
>> inadequate.
>
> So it seems to me that it is not worse than PEG if you have a grammar with
> reasonable many ambiguities.
>
> Performance is one thing, but you should be able to express your
> language in the underlying formalism without too many hurdles.
>

The problem is that say LL(*) can be easily tweaked in place to run various semantic actions as it parse the source. I think GLR is harder to make do anything other then "parse & say it's fine".


-- 
Dmitry Olshansky


July 08, 2012
On 2012-07-07 18:49, deadalnix wrote:

> I tried that. This is almost impossible, dmd's parser and AST are very
> tightly mixed with dmd's internals.

I suspected that.

-- 
/Jacob Carlborg


July 08, 2012
> The problem is that say LL(*) can be easily tweaked in place to run various semantic actions as it parse the source. I think GLR is harder to make do anything other then "parse & say it's fine".

Did you look at Elkhound? According to the tutorial you can use actions
in the same way as if you were using an (LA)LR parser. Dunno if it's sufficient, but with Elkhound an C++-Parser has been written.

See also http://stackoverflow.com/questions/428892/what-parser-generator-do-you-recommend where Ira Baxter (maybe you know him) makes a case for DMS, which uses GLR internally. Here is a google tech talk of Ira explaining the DMS http://www.youtube.com/watch?v=C-_dw9iEzhA .

I'm not very experienced with parser generators. Just trying to put all options on the table.
July 08, 2012
On 2012-07-07 19:53, Jonathan M Davis wrote:

> There are multiple issues here. The one that Andrei is raising is the fact
> that D isn't properly formalized. Essentially, the compiler _is_ the spec, and
> while the online spec _mostly_ follows it, it doesn't entirely, and the online
> spec isn't always as precise as it needs to be regardless. With a fully
> formalized spec, it should be possible to fully implement a D compiler from
> the spec alone, and I don't think that that's currently possible.

That's been a problem for a long time.

> Writing a D lexer and parser (if not a full-blown compiler) from scratch would
> help highlight the places in the spec which are lacking, and having it be part
> of Phobos would arguably increase Walter's incentive to make sure that the
> spec is in line with the compiler (and vice versa) so that stuff _other_ than
> the compiler which is based on the spec would be able to match the compiler.
>
> Clang is in a _completely_ different situation. It's a C/C++ compiler, and both
> C and C++ already have official, formalized specs which Clang conforms to (or is
> supposed to anyway). Clang has no control over the spec at all. It merely
> implements it. So, there is no issue of trying to keep other tools or
> compilers in line with Clang due to it being the language's spec like we
> effectively have with dmd. It may help the tools which use Clang to be fully in
> line with Clang and not have to worry about whether Clang implements the spec
> slightly differently, but in theory, if they all follow the spec correctly,
> that isn't in issue (though obviously in practice it can be).

That should be true for D as well when the spec is complete.

> In D's case, all of the major D compilers use the same frontend, which helps
> compatability but harms the spec, because there's less incentive to keep it
> precise and  up-to-date. So, from the perspective of the spec, implementing
> the D lexer and parser for Phobos separately from dmd would be of great
> benefit.

That might be true.

> IMHO, the reason that porting dmd's lexer and parser would be of great benefit
> is primarily maintenance. It makes it much easier to keep Phobos' lexer and
> parser in line with dmd, making discrepencies less likely, but it arguably
> reduces the incentive to improve the spec.

But then it sounds like the best solution would be not to have a lexer/parser based on DMD and instead making sure the spec is correct.

> The benefits of having a lexer and parser as a library (regardless of whether
> it's from scratch or a port from dmd) are primarly derived from the fact that
> it makes it much easier to create tools which use them. Such tools no longer
> have to write their own lexers or parsers.

Of course, that's the whole point.

> If the compiler uses the same library, it has the added benefit of making it so
> that any tool using the library will be in sync with the compiler, but if the
> spec is properly formalized and up-to-date, and the library is kep up-to-date
> with _it_, that shouldn't really be a problem. You still have the debate as to
> whether it's better to have a separate implementation based on the spec
> (thereby making it more likely that the spec is correct) or whether it's
> better to have the compiler share the implementation so that the library is
> guaranteed to match the compiler (though not necessarily the spec), but I
> think that that's a separate debate from whether we should have the lexer and
> parser as a library.

What keeps popping up in my head is the scenario where users are complaining about the frontend in Phobos not behaving as their compiler. If this is due to they are out of sync, bugs or not following the spec doesn't matter.

I still thinks D is changing too much to have two separate implementations of the compiler and a library like this.

> In all honesty though, I would be surprised if you could convince Walter to
> switch dmd's frontend to Phobos' lexer and parser even once they've been
> implemented. So, while I agree that there are benefits in doing so, I'm not
> sure how much chance you have of ever getting any traction with that.

That is perhaps true.

> Another thing to consider is that that might make it so that gdc and ldc
> couldn't share the same frontend with dmd (assuming that they have to keep
> their frontends in C or C++ -  I don't know if they do) - but if so, that
> would increase the incentive for the spec to be correct if dmd ever started
> using a D frontend.

-- 
/Jacob Carlborg


July 08, 2012
On Sunday, 8 July 2012 at 09:14:32 UTC, Dmitry Olshansky wrote:
> On 08-Jul-12 13:05, Tobias Pankrath wrote:
>>>> Yup, LL(*) is my favorite so far.
>>>
>>> That's Terence Parr's discovery, right? I've always liked ANTLR, so if
>>> PEGs turn out to have issues LL(*) sounds like a promising alternative.
>>
>> We should also consider using GLR if LL(*) doesn't work.
>
> GLR ... are you serious? It still does parsing in n^3 if I'm not mistaken.

http://www.antlr.org/papers/LL-star-PLDI11.pdf describes some disadvantages of GLR with respect to LL(*).
July 08, 2012
On Sunday, July 08, 2012 14:13:18 Jacob Carlborg wrote:
> On 2012-07-07 19:53, Jonathan M Davis wrote:
> > There are multiple issues here. The one that Andrei is raising is the fact that D isn't properly formalized. Essentially, the compiler _is_ the spec, and while the online spec _mostly_ follows it, it doesn't entirely, and the online spec isn't always as precise as it needs to be regardless. With a fully formalized spec, it should be possible to fully implement a D compiler from the spec alone, and I don't think that that's currently possible.
> 
> That's been a problem for a long time.

True, but Andrei is basically arguing that porting dmd's lexer and parser to D would reduce the incentive to fix this, whereas having a new lexer/parser would encourage fixing it (though he's arguing for purely using a generative parser rather than writing a D-specific one).

> > IMHO, the reason that porting dmd's lexer and parser would be of great benefit is primarily maintenance. It makes it much easier to keep Phobos' lexer and parser in line with dmd, making discrepencies less likely, but it arguably reduces the incentive to improve the spec.
> 
> But then it sounds like the best solution would be not to have a lexer/parser based on DMD and instead making sure the spec is correct.

That would be the question. Given enough time, what I'd probably want to do would be to port dmd's lexer and parser to D in order to properly figure it all out, updating the spec in the process, and _then_ go write one from scratch, possibly basing some of it on how dmd did it, possibly not. But I don't really have the time for any of that right now, and most of the focus right now from people interested in parsing seems to be on pegged and parser generators (which are very cool and in some ways much more interesting, but I seriously question that that's the performant way to go if you're looking to parse D specifically). So, who knows what we're going to get out of this and when we'll get it.

> What keeps popping up in my head is the scenario where users are complaining about the frontend in Phobos not behaving as their compiler. If this is due to they are out of sync, bugs or not following the spec doesn't matter.
> 
> I still thinks D is changing too much to have two separate implementations of the compiler and a library like this.

Well, for the lexer and parser, we're probably okay (or very close to it). As Daniel pointed out elsewhere in this thread, that part of the frontend doesn't change much (particularly the lexer). There's definitely still some churn, but it's nothing like it used to be.

- Jonathan M Davis
July 08, 2012
On Sunday, 8 July 2012 at 20:06:07 UTC, Jonathan M Davis wrote:
> most of the focus right now from people interested in parsing seems to be on pegged and parser generators (which are very cool and in some ways much more interesting, but I seriously question that that's the performant way to go if you're looking to parse D specifically).

Can you provide a *specific* example of performance optimization which a custom D compiler would have, but parser generator would be unlikely to catch up.
July 08, 2012
On Sunday, July 08, 2012 22:15:26 Roman D. Boiko wrote:
> On Sunday, 8 July 2012 at 20:06:07 UTC, Jonathan M Davis wrote:
> > most of the focus right now from people interested in parsing seems to be on pegged and parser generators (which are very cool and in some ways much more interesting, but I seriously question that that's the performant way to go if you're looking to parse D specifically).
> 
> Can you provide a *specific* example of performance optimization which a custom D compiler would have, but parser generator would be unlikely to catch up.

It's been too long since I was actively working on parsers to give any details, but it is my understanding that because a hand-written parser is optimized for a specific grammar, it's going to be faster. Also, look at dmd and dmc vs other compilers. They use hand-written parsers and are generally much faster than their competitors.

One thing to remember about hand-written parsers vs generative ones though is that they usually are completely different in terms of the type of parser that you write (e.g. hand-written parsers are generally recursive-decent parser whereas generative ones usually use bottom-up parsers). So, that could have a large impact on performance as well (in either direction).

To be clear though, I have _no_ problem with having a generative parser in Phobos (or having other 3rd party options available). Parsers like pegged are extremely cool and extremely useful. It's just that it's my understanding that well-written hand-written parsers are faster than generated ones, so I think that it would be benecial to have a hand-written parser for D in Phobos _in addition_ to a general, generative solution.

But to fully prove that a hand-written one would be faster, we'd of course have to have actual solutions to compare. And if the API for a D-specific parser in Phobos is designed well enough, and it somehow proved that a generative solution was faster, then the hand-written one could be replaced by the generative one underneat the hood.

- Jonathan M Davis
July 08, 2012
On Sunday, 8 July 2012 at 21:03:41 UTC, Jonathan M Davis wrote:
> It's been too long since I was actively working on parsers to give any details, but it is my understanding that because a hand-written parser is optimized for a specific grammar, it's going to be faster.

My aim is to find out any potential bottlenecks and ensure that those are possible to get rid of. So, let's try.

I believe it would not hurt generality or quality of a parser generator if it contained sews for inserting custom (optimized) code where necessary, including those needed to take advantage of some particular aspects of D grammar. Thus I claim that optimization for D grammar is possible.

> Also, look at dmd and dmc vs other compilers. They use hand-written parsers and are generally much faster than their competitors.

Due to which particular design or implementation decisions? Would it be possible to name a few which are relevant in this context?

> One thing to remember about hand-written parsers vs generative ones though is that they usually are completely different in terms of the type of parser that you write (e.g. hand-written parsers are generally recursive-decent parser whereas generative ones usually use bottom-up parsers).

So far discussion goes in favor of LL(*) parser like ANTLR, which is top-down recursive-descent. Either Pegged will be optimized with LL(*) algorithms, or a new parser generator created.


July 08, 2012
> I believe it would not hurt generality or quality of a parser generator if it contained sews for inserting custom (optimized)

s/sews/seams