November 05, 2013
On Monday, 4 November 2013 at 13:20:01 UTC, Timothee Cour wrote:
> for lexing there's already dscanner we could use (while we wait for perhaps
> a autogenerated lexer);
> so I think priority is on the autogenerated parser (dscanner has one but
> hand designed), where it's still unknown what will work well.

Yes, that tool has two properties:
1) It works now. Not Soon(tm). You can download it, compile it, and use it to dump the AST of your D code in just a minute or two.
2) It wasn't built THE ONE TRUE WAY.

But we should take a step back first. Before we try to implement a parser for D's grammar, we need to figure out what exactly D's grammar is.

Seriously. We don't have a real grammar for D. We have the language spec on dlang.org, but it isn't complete, consistent, or updated when the language changes. Want examples? I have a tracker for them here: http://d.puremagic.com/issues/show_bug.cgi?id=10233

There's also my project here: https://github.com/Hackerpilot/DGrammar, but it's not official and I keep finding differences between it and the behavior of DMD.

Why am I the only one who thinks this is a problem?
November 05, 2013
On Tuesday, 5 November 2013 at 07:27:10 UTC, Brian Schott wrote:
> Why am I the only one who thinks this is a problem?

Ignore this line if your name is "Kenji".
November 05, 2013
Brian Schott wrote:

> Why am I the only one who thinks this is a problem?

meetoo.

-manfred
November 05, 2013
05-Nov-2013 10:36, Chad Joan пишет:
> On Friday, 1 November 2013 at 21:22:46 UTC, Andrei Alexandrescu wrote:
>> ...
>> Bugs stopping Pegged from going forward should receive high priority.
>> I also encourage others to participate to that and similar work.
>>
>>
>> Andrei
>
[snip]

> I was working on a parser-engine-as-a-library that could be used to as
> optimized internals for Pegged, as well as any other tools that need to
> recognize these common patterns.
>
> The idea was to expose an API like this:
>
>      string makeParser()
>      {
>          auto builder = new ParserBuilder!char;
>          builder.pushSequence();
>              builder.literal('x');
>              builder.pushMaybe();
>                  builder.literal('y');
>              builder.pop();
>          builder.pop();
>          return builder.toDCode("callMe");
>      }

I was also toying with the idea of exposing Builder interface for std.regex. But push/pop IMHO are better be implicitly designed-out:

auto re = atom('x').star(charClass(unicode.Letter),atom('y')).build();

... and letting the nesting be explicit.

Is the same as:
auto re = regex(`x(?:\p{L}y)*`);

Aimed for apps/libs that build regular expressions anyway and have no need in textual parser.

> That snippet would create a parser that recognizes the grammar 'x' (
> 'y'? ).
> The current fledgling implementation creates this parser:
> http://pastebin.com/MgSqWXE2
>
> Of course, no one would be expected to write grammars like that. It
> would be the job of tools like Pegged or std.regex to package it up in
> nice syntax that is easy to use.

I thought to provide some building blocks for that with new std.uni. Not quite everything I wanted, but now at least there is one set of wheels less to reinvent.

[snip]

> Another fun thought: PEGs can have look-behind that includes any regular
> elements without any additional algorithmic complexity. Just take all of
> the look-behinds in the grammar, mash them together into one big
> regular-expression using regular alternation (|), and then have the
> resulting automaton consume in lock-step with the PEG parser.  Whenever
> the PEG parser needs to do a lookbehind, it just checks to see if the
> companion automaton is in a matching state for the capture it needs.

Sounds quite slow to do it "just in case". Also complete DFAs tend to be mm quite big.

What ANTLR does is similar technique - a regular lookahead to resolve ambiguity in the grammar (implicitly). A lot like LL(k) but with unlimited length (so called LL(*)). Of course, it generates LL(k) disambiguation where possible, then LL(*), failing that the usual backtracking.

>
> *sigh*, I feel like I could write a paper on this stuff if I were in
> grad school right now.  Alas, I am stuck doing 50-60 hours a week of
> soul-sucking business programming.

I heard Sociomantic is hiring D programmers for coding some awesome stuff, you may as well apply :)

> Well, then again, my understanding
> is that even though I can think of things that seem like they would make
> interesting topics for publishable papers, reality would have the profs
> conscript me to do completely different things that are possibly just as
> inane as the business programming.
>
Speaking for my limited experience - at times it's like that.

> I worry that the greater threat to good AST manipulation tools in D is a
> lack of free time, and not the DMD bugs as much.

Good for you I guess, my developments in related area are blocked still :(

> I hope this is useful to someone!


-- 
Dmitry Olshansky
November 05, 2013
On Tue, Nov 5, 2013 at 5:45 AM, Chad Joan <chadjoan@gmail.com> wrote:

>  Use the repetition operator(s) and turn the resulting array into
>> whatever tree you like.  In practice, I have never had a problem with this.
>>
>> I have used both Pegged and have written an SQL parser at work (not
>> exhaustive, just what's needed) that uses C macros as PEG elements.
>>  Tangent: Using C macros for this worked surprisingly well and allowed me
>> to avoid an extra step in the build process (I do not have access to D for
>> that project).  Pegged is still much more scalable, optimizable, and the
>> grammars are a lot less ugly/finicky.
>>
>
That made my day, thanks!


November 05, 2013
On Tue, Nov 5, 2013 at 3:54 PM, Dmitry Olshansky <dmitry.olsh@gmail.com>wrote:

>
>
>
> I was also toying with the idea of exposing Builder interface for std.regex. But push/pop IMHO are better be implicitly designed-out:
>
> auto re = atom('x').star(charClass(unicode.Letter),atom('y')).build();
>
> ... and letting the nesting be explicit.
>
> Is the same as:
> auto re = regex(`x(?:\p{L}y)*`);
>
> Aimed for apps/libs that build regular expressions anyway and have no need in textual parser.
>
>
Another possible advantage is to reference external names inside your construction, thus naming other regexen or refencing external variables to deposit backreferences inside them. All in all, to get a regex construct that can interact with the external word.



>
> What ANTLR does is similar technique - a regular lookahead to resolve
> ambiguity in the grammar (implicitly). A lot like LL(k) but with unlimited
> length (so called LL(*)). Of course, it generates LL(k) disambiguation
> where possible, then LL(*), failing that the usual backtracking.
>

I liked that idea since the author added it to ANTLR, but I never used it
since.
I wonder whether that can be implemented inside another parser generator or
if it uses some specific-to-ANTLR internal machinery.



>
>
>> *sigh*, I feel like I could write a paper on this stuff if I were in grad school right now.  Alas, I am stuck doing 50-60 hours a week of soul-sucking business programming.
>>
>
> I heard Sociomantic is hiring D programmers for coding some awesome stuff, you may as well apply :)




Well, I do 50-60 hours a week of fulfilling and challenging work, but the free time at the end is the same ;) I feel Chad's pain, though.




>  Well, then again, my understanding
>> is that even though I can think of things that seem like they would make interesting topics for publishable papers, reality would have the profs conscript me to do completely different things that are possibly just as inane as the business programming.
>>
>> Speaking for my limited experience - at times it's like that.
>
>
> I worry that the greater threat to good AST manipulation tools in D is a
>> lack of free time, and not the DMD bugs as much.
>>
>
> Good for you I guess, my developments in related area are blocked still :(
>
>

Walter is far from convinced that AST manipulation is a good thing. You would have to convince him first. His fear is that it will lead to unreadable code, and everyone using her own personnal version of D.

AFAICT, nothing of the sort happened in Lisp (I mean, Lispers have balkanization, but *not* due to AST manipulation).


November 05, 2013
On Tue, Nov 5, 2013 at 8:27 AM, Brian Schott <briancschott@gmail.com> wrote:

> But we should take a step back first. Before we try to implement a parser for D's grammar, we need to figure out what exactly D's grammar is.
>
> Seriously. We don't have a real grammar for D. We have the language spec on dlang.org, but it isn't complete, consistent, or updated when the language changes. Want examples? I have a tracker for them here: http://d.puremagic.com/issues/show_bug.cgi?id=10233
>
> There's also my project here: https://github.com/Hackerpilot/DGrammar, but it's not official and I keep finding differences between it and the behavior of DMD.
>
> Why am I the only one who thinks this is a problem?
>

You're not alone in this, but Walter's answer is that we must make pull requests on the website if we find errors in the online grammar. The problem is not knowing what the intended behavior is. When the reference compiler and the grammar disagree, who I am to say which is wrong?

We could also see if some slight modification to the grammar could push it
into LALR(1) territory.


November 05, 2013
On 11/02/2013 05:06 AM, Manfred Nowak wrote:
> One prerequisite for every PEG-Parser is, that the language has to be
> designed to be without any ambiguity.

AFAIK Parser expression grammars handle ambiguity through prioritization.
You'd still need to rewrite grammar rules that involve left-recursion or hidden left recursion.
November 05, 2013
On 11/01/2013 09:59 PM, Philippe Sigaud wrote:
> The examples directory shows different grammars, from JSON to XML to C to D.

Sounds promising.
I think I found a way to implement a D REPL but are lacking a reliable way to classify code snippets (expressions, declarations, statements) and get introduced symbol names.
Do you think it will be able to handle that (performance is not an issue)?

If there is no parser I either have to use a crude heuristic
or I could use machine learning to classify the code based on token histograms (after all we have at least 2 D lexers).
November 05, 2013
On 11/01/2013 09:59 PM, Philippe Sigaud wrote:
>
> I used it with the D grammar, but the one on the website is woefully
> inadequate (some mistakes, out of date compared to the compiler and
> written in a somewhat convoluted style full of left-recursion). The
> shortcomings are that the generated parser is quite slow compared to
> other D parsers.
>
> That comes from my coding, of course: Pegged generates a simple
> recursive descent parser. I guess I could push for a better engine, but
> I'm waiting for CTFE to get a bit better.
>
> The advantages are nice, though: full parse tree, semantic actions, and
> so on.

Like many others I'm hoping for a nice general parser generator for quite some time.
So I'm also asking specifically about your insights on PEGs.
From what I know they do have some memory issues due to the backtrace memoization (only for deep parse trees though, which aren't common in programming languages). Also it seems that the research community has some issues to formalize PEGs and what grammars they are capable to handle.
Also why is the expr grammar example so complicated?