May 28, 2012
On 2012-05-27 22:15, F i L wrote:
> I'm not sure I follow all the details of what Andrei's suggesting and
> what's being talked about here, this parser/lexer stuff is still very
> new to me, so this may be a bit off-topic. However, I thought I'd weigh
> in on something I was very impressed with about the Nimrod language's
> direct AST access/manipulation.
>
> Nim has a "template" which is very much like D's mixin templates, example:
>
> # Nim
> template foo(b: string) =
>   var bar = b
>
> block main:
>   foo("test")
>   assert(bar == "test")
>
> and the equivalent in...
>
> // D
> mixin template foo(string b) {
>     auto bar = b;
> }
>
> void main() {
>     mixin foo("test");
>     assert(bar == "test");
> }
>
> which is all very straightforward stuff; the cool part comes with Nim's
> macros. Nim has two unique types: expr & stmt (expression &
> statement). They're direct AST structures which can be passed to
> template/macro procedures and arbitrarily mutated. Example:
>
> macro foo(s: stmt): stmt =
>   result = newNimNode(nnkStmtList)
>   for i in 1 .. s.len-1:
>     var str = s[i].toStrLit()
>     result.add(newCall("echo", str))
>
> block main:
>   foo:
>     bar
>     baz
>
> the above code prints:
> "
> bar
> baz
> "
>
> **Some notes: result is what's returned, and the reason you can use
> "foo" with a statement body is that any macro/template whose last
> parameter is of type 'stmt' can be called with block semantics, similar
> to how UFCS works with the first parameter.**
>
> The above *might* look like the following in D:
>
> macro foo(ASTNode[] stmts...) {
>     ASTNode[] result;
>     foreach (s; stmts) {
>         auto str = s.toASTString();
>         result ~= new ASTCall!"writeln"(str);
>     }
>     return result;
> }
>
> void main() {
>     foo {
>         bar;
>         baz;
>     }
> }
>
> This kind of direct AST manipulation + body semantics opens the doors
> for a lot of cool things. If you read through Nim's lib documentation
> you'll notice that many of the "language" features are actually just
> library procedures in the system module, which is included by default.
> That's great because contributions to the *language* can be made very
> easily. Also,
> the infrastructure to read/write AST is no-doubt immensely useful for
> IDE support and other such dev tools.
>
> I'm not a huge fan of everything in Nimrod, but this is something they
> definitely got right, and I think D could gain from their experience.

This is a very cool feature of Nimrod. It makes it possible to move several language features into the library:

* synchronized
* scope
* foreach (possibly)

-- 
/Jacob Carlborg
May 28, 2012
On Sun, May 27, 2012 at 11:13 PM, John Belmonte <john@neggie.net> wrote:
> I'm wondering if people have seen LPeg.  It's a Lua library, but the design is interesting in that patterns are first class objects which can be composed with operator overloading.
>
> http://www.inf.puc-rio.br/~roberto/lpeg/

This is exactly the approach followed by C++ Boost::Spirit, with, alas, the limitations of Algol-family operators and precedence: no postfix '*' (Kleene star) operator. It seems that LPeg has the same problem.

I played with this idea with my own Pegged (https://github.com/PhilippeSigaud/Pegged), but I wasn't quite convinced by the result, exactly for the reason above. Also, when looking at real-world Spirit examples, I was a bit disappointed by the resulting syntax: it's not *that* readable for complicated expressions. In fact, that's exactly why I decided to follow the DSL road with Pegged, so as to obtain exactly the PEG syntax, with the real operators and their standard precedence.
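The operator-overloading compromise is easy to see in a toy sketch. The following Python mock-up (hypothetical names, not LPeg's or Spirit's actual API) composes first-class patterns with `+` and `|`, but the Kleene star has to become a method call, since no host-language operator is postfix:

```python
class Pat:
    """A first-class pattern: fn(text, pos) -> new position, or None on failure."""
    def __init__(self, fn):
        self.fn = fn
    def __add__(self, other):          # sequence: a + b
        def seq(t, i):
            j = self.fn(t, i)
            return other.fn(t, j) if j is not None else None
        return Pat(seq)
    def __or__(self, other):           # ordered choice: a | b
        def alt(t, i):
            j = self.fn(t, i)
            return j if j is not None else other.fn(t, i)
        return Pat(alt)
    def many(self):                    # Kleene star -- a method, not a postfix '*'
        def rep(t, i):
            while True:
                j = self.fn(t, i)
                if j is None:
                    return i
                i = j
        return Pat(rep)
    def match(self, text):
        return self.fn(text, 0)

def lit(s):
    # A literal-string pattern.
    return Pat(lambda t, i: i + len(s) if t.startswith(s, i) else None)

# The PEG ("a" / "b")* "c" ends up spelled as:
ab_star_c = (lit("a") | lit("b")).many() + lit("c")
```

The `.many()` spelling in place of a bare postfix `*` is exactly the readability loss under discussion.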

Btw, if you're interested in expression templates, yesterday I uploaded a very preliminary project to GitHub for constructing, and then manipulating, expressions:

auto e = expr(); // The null expression
auto f = 2*e + 1 - (e~"abc")*3; // f is an expression template that encodes the right-hand side.

Then, it's up to the user to define what the expression represents.

https://github.com/PhilippeSigaud/Expression-Tree
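For readers unfamiliar with the technique, here is a minimal sketch of the expression-tree idea (in Python, with made-up names; the `~` concatenation from the D snippet is omitted since Python has no binary `~`): overloaded operators build a tree instead of computing a value, and interpretation is deferred to the user.

```python
class Expr:
    def __init__(self, op=None, args=()):
        self.op, self.args = op, args      # op is None -> the null expression
    def __add__(self, o):  return Expr('+', (self, _wrap(o)))
    def __radd__(self, o): return Expr('+', (_wrap(o), self))
    def __sub__(self, o):  return Expr('-', (self, _wrap(o)))
    def __mul__(self, o):  return Expr('*', (self, _wrap(o)))
    def __rmul__(self, o): return Expr('*', (_wrap(o), self))
    def show(self):
        # One possible interpretation: render the tree back as text.
        if self.op is None:
            return "e"
        return "(" + self.op.join(a.show() for a in self.args) + ")"

class Leaf(Expr):
    def __init__(self, v):
        super().__init__()
        self.v = v
    def show(self):
        return str(self.v)

def _wrap(x):
    return x if isinstance(x, Expr) else Leaf(x)

e = Expr()            # the null expression
f = 2*e + 1 - e*3     # a tree encoding the right-hand side; nothing is evaluated
```

`show()` here is just one interpretation; the point is that `f` is data, so the user can walk it, simplify it, or compile it however they like.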
May 28, 2012
On Monday, 28 May 2012 at 12:27:09 UTC, Philippe Sigaud wrote:
> I played with this idea with my own Pegged
> (https://github.com/PhilippeSigaud/Pegged), but I wasn't quite
> convinced by the result, exactly for the reason above. Also, when
> looking at real-world Spirit examples, I was a bit disappointed by the
> resulting syntax: it's not *that* readable for complicated
> expressions. In fact, that's exactly why I decided to follow the DSL
> road with Pegged, so as to obtain exactly the PEG syntax, with the
> real operators and their standard precedence.

Fair enough.  I did notice the following in the markdown PEG, though, which could benefit from first-class patterns:


# Parsers for different kinds of block-level HTML content.
# This is repetitive due to constraints of PEG grammar.

HtmlBlockOpenAddress <- '<' Spnl ("address" / "ADDRESS") Spnl HtmlAttribute* '>'
HtmlBlockCloseAddress <- '<' Spnl '/' ("address" / "ADDRESS") Spnl '>'
HtmlBlockAddress <- HtmlBlockOpenAddress (HtmlBlockAddress /
   !HtmlBlockCloseAddress .)* HtmlBlockCloseAddress

HtmlBlockOpenBlockquote <- '<' Spnl ("blockquote" / "BLOCKQUOTE") Spnl
   HtmlAttribute* '>'
HtmlBlockCloseBlockquote <- '<' Spnl '/' ("blockquote" / "BLOCKQUOTE") Spnl '>'
HtmlBlockBlockquote <- HtmlBlockOpenBlockquote (HtmlBlockBlockquote /
   !HtmlBlockCloseBlockquote .)* HtmlBlockCloseBlockquote

.
.
.
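One way to see why first-class patterns (or parameterized rules) would help: the repeated rule trios differ only in the tag name, so they could be stamped out from a single template. A rough Python sketch of that idea (hypothetical helper names, generating the grammar text rather than patterns):

```python
# Each HtmlBlock* trio differs only in the tag name; generate them
# from one template instead of writing them out by hand.
TEMPLATE = """\
HtmlBlockOpen{Tag} <- '<' Spnl ("{tag}" / "{TAG}") Spnl HtmlAttribute* '>'
HtmlBlockClose{Tag} <- '<' Spnl '/' ("{tag}" / "{TAG}") Spnl '>'
HtmlBlock{Tag} <- HtmlBlockOpen{Tag} (HtmlBlock{Tag} /
   !HtmlBlockClose{Tag} .)* HtmlBlockClose{Tag}"""

def html_block_rules(tag):
    # Stamp out one open/close/block rule trio for a given tag name.
    return TEMPLATE.format(Tag=tag.capitalize(), tag=tag, TAG=tag.upper())

grammar = "\n\n".join(html_block_rules(t) for t in ("address", "blockquote"))
```

With first-class patterns, the same factoring would happen at the pattern level rather than by string templating.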

May 29, 2012
On Mon, May 28, 2012 at 11:42 PM, John Belmonte <john@neggie.net> wrote:
> Fair enough.  I did notice the following in the markdown PEG though which could benefit from first class patterns:
>
>
> # Parsers for different kinds of block-level HTML content.
> # This is repetitive due to constraints of PEG grammar.
>
> HtmlBlockOpenAddress <- '<' Spnl ("address" / "ADDRESS") Spnl HtmlAttribute* '>'
> HtmlBlockCloseAddress <- '<' Spnl '/' ("address" / "ADDRESS") Spnl '>'
> HtmlBlockAddress <- HtmlBlockOpenAddress (HtmlBlockAddress /
>    !HtmlBlockCloseAddress .)* HtmlBlockCloseAddress
>
> HtmlBlockOpenBlockquote <- '<' Spnl ("blockquote" / "BLOCKQUOTE") Spnl
>    HtmlAttribute* '>'
> HtmlBlockCloseBlockquote <- '<' Spnl '/' ("blockquote" / "BLOCKQUOTE") Spnl '>'
> HtmlBlockBlockquote <- HtmlBlockOpenBlockquote (HtmlBlockBlockquote /
>    !HtmlBlockCloseBlockquote .)* HtmlBlockCloseBlockquote
>
>

You're exactly right! Nice catch.

I took this PEG from another GitHub project (I hope I put the attribution somewhere?) and that was before Pegged accepted parameterized rules. I could indeed drastically factor that code.
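With parameterized rules, the factored grammar might look something along these lines (a hypothetical sketch only; Pegged's actual parameterized-rule syntax may differ):

```
HtmlOpen(tag, TAG)  <- '<' Spnl (tag / TAG) Spnl HtmlAttribute* '>'
HtmlClose(tag, TAG) <- '<' Spnl '/' (tag / TAG) Spnl '>'
HtmlBlock(tag, TAG) <- HtmlOpen(tag, TAG)
                       (HtmlBlock(tag, TAG) / !HtmlClose(tag, TAG) .)*
                       HtmlClose(tag, TAG)

HtmlBlockAddress    <- HtmlBlock("address", "ADDRESS")
HtmlBlockBlockquote <- HtmlBlock("blockquote", "BLOCKQUOTE")
```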
June 01, 2012
On 2012-05-28 02:31, d coder wrote:
>
>     Generally a parser generated by another tool, accepting tokens,
>     returns an abstract syntax tree, but in this example it returns
>     the evaluated value.
>     In other words, it does lexical analysis, parsing, and (type)
>     conversion all at once.
>     If you simply want an abstract syntax tree, ctpg may be a little
>     painful to use.
>
>
>   Hello Youkei
>
> I am trying to use CTPG for compile time parsing for a DSL I am working
> on. I have tried the examples you created in the examples directory.
>
> I would like the parser to effect some side effects. For this purpose, I
> tried including the parser mixin into a class, but I got a strange error
> saying:
>
> Error: need 'this' to access member parse
>
> I have copied the code I am trying to compile at the end of the email.
> Let me know what I could be doing wrong here.
>
> Regards
> - Puneet
>
>
> import ctpg;
> import std.array: join;
> import std.conv: to;
>
> class Foo
> {
>    int result;
>    mixin(generateParsers(q{
>          int root = mulExp $;
>
>          int mulExp =
>            primary !"*" mulExp >> (lhs, rhs){ return lhs * rhs; }
>          / primary;
>
>          int primary = !"(" mulExp !")" / [0-9]+ >> join >> to!int;
>        }));
>
>    void frop() {
>      result = parse!root("5*8");
>    }
> }
>
>
> void main(){
>    Foo foo = new Foo();
>    foo.frop();
> }
>

Hello Puneet,

Thank you for your report. I fixed it; now CTPG creates a static function as a parser.
But I'm afraid this fix doesn't help you, because I don't understand what the side effect you mentioned is.
Can you show me some examples that include the side effect?

Thanks,
Hisayuki Mima
June 01, 2012
On Tuesday, 28 February 2012 at 07:59:16 UTC, Andrei Alexandrescu wrote:
> I'm starting a new thread on this because I think the matter is of strategic importance.
>
> We all felt for a long time that there's a lot of potential in CTFE, and potential applications have been discussed more than a few times, ranging from formatting strings parsed to DSLs and parser generators.
>
> Such feats are now approaching fruition because a number of factors converge:
>
> * Dmitry Olshansky's regex library (now in Phobos) generates efficient D code straight from regexen.
>
> * The scope and quality of CTFE has improved enormously, making more advanced uses possible and even relatively easy (thanks Don!)
>
> * Hisayuki Mima implemented a parser generator in only 3000 lines of code (sadly, no comments or documentation yet :o))
>
> * With the occasion of that announcement we also find out Philippe Sigaud has already a competing design and implementation of a parser generator.
>
> This is the kind of stuff I've had an eye on for the longest time. I'm saying it's of strategic importance because CTFE technology, though not new and already available with some languages, has unique powers when combined with other features of D. With CTFE we get to do things that are quite literally impossible to do in other languages.
>
> We need to have an easy-to-use, complete, seamless, and efficient lexer-parser generator combo in Phobos, pronto. The lexer itself could use a character-level PEG or a classic automaton, and emit tokens for consumption by a parser generator. The two should work in perfect tandem (no need for glue code). At the end of the day, defining a complete lexer+parser combo for a language should be just a few lines longer than the textual representation of the grammar itself.
>
> What do you all think? Let's get this project off the ground!
>
>
> Thanks,
>
> Andrei

Great!  So what's the next step?  Do we wait for the maintainers of one of the CTFE parser gen packages to drop it in the Phobos Review Queue?  Do a reimplementation for Phobos?

We could attack this in pieces.  Start with a lexer/FSA generator (like Ragel but using CTFE) - this will make it much easier to consume many wire protocols, for starters (I found it very easy to make a simple HTTP client using Ragel), and will be quite useful on its own.  Then extend it into a lexer for a parser generator.
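The "lexer generator" half of that plan is small enough to sketch. A real Ragel/CTFE version would compile the token definitions into an automaton at compile time; the following Python stand-in (hypothetical token names, regex-backed rather than a true FSA) just shows the spec-in, tokenizer-out shape:

```python
import re

def make_lexer(token_specs):
    # token_specs: list of (name, regex) pairs, tried in priority order.
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in token_specs)
    scanner = re.compile(pattern)
    def lex(text):
        tokens, pos = [], 0
        while pos < len(text):
            m = scanner.match(text, pos)
            if not m:
                raise SyntaxError(f"bad input at position {pos}")
            if m.lastgroup != "WS":        # drop whitespace tokens
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
        return tokens
    return lex

# A tiny token set for arithmetic-style input:
lex = make_lexer([
    ("NUM", r"[0-9]+"),
    ("ID",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",  r"[*+()/-]"),
    ("WS",  r"\s+"),
])
```

The output token stream is what a parser generator would then consume, which is the "perfect tandem" Andrei is asking for.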
June 01, 2012
On 6/1/12 8:39 AM, Ken wrote:
> Great! So what's the next step? Do we wait for the maintainers of one of
> the CTFE parser gen packages to drop it in the Phobos Review Queue? Do a
> reimplementation for Phobos?
>
> We could attack this in pieces. Start with a lexer/FSA generator (like
> Ragel but using CTFE) - this will make it much easier to consume many
> wire protocols, for starters (I found it very easy to make a simple HTTP
> client using Ragel), and will be quite useful on its own. Then extend it
> into a lexer for a parser generator.

I think this core strength of the language should be properly supplemented by library support, so I'd be very interested to look at Phobos proposals. The proposals should come with sample grammars of nontrivial size, ideally a parser for the entire D language.

There might be things to fix in the compiler, e.g. memory consumption.


Andrei