Lexer and parser generators using CTFE (page 2) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Lexer and parser generators using CTFE (page 2)

February 28, 2012

Re: Lexer and parser generators using CTFE

Posted by deadalnix
in reply to H. S. Teoh

deadalnix

Posted in reply to H. S. Teoh

Le 28/02/2012 21:02, H. S. Teoh a écrit :
> OTOH, perhaps once we start writing a lexer/parser generator, that will
> force us to fix the I/O and allocator system. Without a good project to
> motivate us to fix these tedious issues, it seems that we just lose
> inspiration to do it after a while.
>

std.conainer is already a good candidate for allocators.

Back to the topic, this is a great idea, but isn't it building on weak bases ? We still have issue with const correctness, or postblit, and on lib side, we lack allocators, error handling is poor, containers are poor, etc . . .

February 28, 2012

Re: Lexer and parser generators using CTFE

Posted by Martin Nowak
in reply to H. S. Teoh

Martin Nowak

Posted in reply to H. S. Teoh

On Tue, 28 Feb 2012 21:02:24 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> On Tue, Feb 28, 2012 at 07:46:04PM +0100, Martin Nowak wrote:
> [...]
>> I wrote a generic lexer generator some time ago.
>> It already let to some compiler O(N^2) optimizations, because the token
>> declarations sneak into the mangling :(.
>> I also finally added a workaround for a remaining CTFE bug (#6815).
>>
>> https://gist.github.com/1255439 - lexer generator
>> https://gist.github.com/1262321 - complete and fast D lexer
>
> Cool! I'll have to take a look at this sometime.
>
>
> [...]
>> <PERSONAL OPINION
>> The hassle of providing good error messages and synthesizing parse results
>> in a generic parser outweigh the benefit of a declarative grammar.
>> /PERSONAL OPINION>
>
> But things like lex/yacc have been useful throughout the years. With D's
> delegates, lexer/parser action rules should be even cleaner, no?
>
Yacc does work but at the price of an additional build step and total automaton obfuscation.
And even at that price the results are still hard to maintain klingon sources.
http://supercollider.git.sourceforge.net/git/gitweb.cgi?p=supercollider/supercollider;a=blob;f=lang/LangSource/Bison/lang11d

I won't deny that the combination of CTFE text processing and static introspection could
improve on this. It could be made more feasible by some conventions, e.g. parse result
always uses structs or classes and built-in arrays.

----

class Module
{
    this(Declaration[]);
}

class StructDeclaration : Declaration
{
    enum _enbf = "struct $1=Identifier { $2=Declaration* }";

    this(Identifier, Declaration[]);
}

...

Parser!(Module, StructDeclaration, ...) parser;
Module m = parser.parse(read("file.d"));

February 28, 2012

Re: Lexer and parser generators using CTFE

Posted by H. S. Teoh
in reply to Martin Nowak

H. S. Teoh

Posted in reply to Martin Nowak

On Tue, Feb 28, 2012 at 10:03:42PM +0100, Martin Nowak wrote: [...]
> I won't deny that the combination of CTFE text processing and static introspection could improve on this. It could be made more feasible by some conventions, e.g. parse result always uses structs or classes and built-in arrays.

Excellent idea, I like this.


> class Module
> {
>     this(Declaration[]);
> }
> 
> class StructDeclaration : Declaration
> {
>     enum _enbf = "struct $1=Identifier { $2=Declaration* }";
> 
>     this(Identifier, Declaration[]);
> }
> 
> ...
> 
> Parser!(Module, StructDeclaration, ...) parser;
> Module m = parser.parse(read("file.d"));

I like this! Definitely an improvement over yacc syntax.


T

-- 
Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn

February 29, 2012

Re: Lexer and parser generators using CTFE

Posted by Christopher Bergqvist
in reply to CTFE-4-the-win

Christopher Bergqvist

Posted in reply to CTFE-4-the-win

On Tuesday, 28 February 2012 at 08:36:13 UTC, CTFE-4-the-win wrote:
> On Tuesday, 28 February 2012 at 07:59:16 UTC, Andrei Alexandrescu
> wrote:
>> I'm starting a new thread on this because I think the matter is of strategic importance.
>>
>> We all felt for a long time that there's a lot of potential in CTFE, and potential applications have been discussed more than a few times, ranging from formatting strings parsed to DSLs and parser generators.
>>
>> Such feats are now approaching fruition because a number of factors converge:
>>
>> * Dmitry Olshansky's regex library (now in Phobos) generates efficient D code straight from regexen.
>>
>> * The scope and quality of CTFE has improved enormously, making more advanced uses possible and even relatively easy (thanks Don!)
>>
>> * Hisayuki Mima implemented a parser generator in only 3000 lines of code (sadly, no comments or documentation yet :o))
>>
>> * With the occasion of that announcement we also find out Philippe Sigaud has already a competing design and implementation of a parser generator.
>>
>> This is the kind of stuff I've had an eye on for the longest time. I'm saying it's of strategic importance because CTFE technology, though not new and already available with some languages, has unique powers when combined with other features of D. With CTFE we get to do things that are quite literally impossible to do in other languages.
>>
>> We need to have a easy-to-use, complete, seamless, and efficient lexer-parser generator combo in Phobos, pronto. The lexer itself could use a character-level PEG or a classic automaton, and emit tokens for consumption by a parser generator. The two should work in perfect tandem (no need for glue code). At the end of the day, defining a complete lexer+parser combo for a language should be just a few lines longer than the textual representation of the grammar itself.
>>
>> What do you all think? Let's get this project off the ground!
>>
>>
>> Thanks,
>>
>> Andrei
>
> Definitely, I applaud this initiative! I've long been of the
> opinion that CTFE parsing is D's killer-feature, which would
> allow me to "sneak" D into a "nameless above average size
> company". ;)

I agree that the current direction of D in this area is impressive.  However, I fail to see a killer-feature in generating a lexer-parser generator at compile-time instead of run-time.

A run-time generator would benefit from not having to execute within the limited CTFE environment and would always be on-par in that respect.  A compile time generator would internalize the generation and compilation of the result (with possible glue-code), simplifying the build process somewhat.

What am I failing to pick up on?

February 29, 2012

Re: Lexer and parser generators using CTFE

Posted by Jonathan M Davis
in reply to Christopher Bergqvist

Jonathan M Davis

Posted in reply to Christopher Bergqvist

On Wednesday, February 29, 2012 02:16:12 Christopher Bergqvist wrote:
> I agree that the current direction of D in this area is impressive. However, I fail to see a killer-feature in generating a lexer-parser generator at compile-time instead of run-time.
> 
> A run-time generator would benefit from not having to execute within the limited CTFE environment and would always be on-par in that respect. A compile time generator would internalize the generation and compilation of the result (with possible glue-code), simplifying the build process somewhat.
> 
> What am I failing to pick up on?

Well, one cool thing would be that a compiler could effectively compile itself - or at least the lexer and parser would compile themselves. You wouldn't have to run a program to generate them. It could be self-contained.

- Jonathan M Davis

February 29, 2012

Re: Lexer and parser generators using CTFE

Posted by Piotr Szturmaj
in reply to Jonathan M Davis

Piotr Szturmaj

Posted in reply to Jonathan M Davis

Jonathan M Davis wrote:
> On Wednesday, February 29, 2012 02:16:12 Christopher Bergqvist wrote:
>> I agree that the current direction of D in this area is
>> impressive. However, I fail to see a killer-feature in
>> generating a lexer-parser generator at compile-time instead of
>> run-time.
>>
>> A run-time generator would benefit from not having to execute
>> within the limited CTFE environment and would always be on-par in
>> that respect. A compile time generator would internalize the
>> generation and compilation of the result (with possible
>> glue-code), simplifying the build process somewhat.
>>
>> What am I failing to pick up on?
>
> Well, one cool thing would be that a compiler could effectively compile itself
> - or at least the lexer and parser would compile themselves. You wouldn't have
> to run a program to generate them. It could be self-contained.

CTFE code can be much slower than native one, and I would like to see some kind of compiler cache for such code.

February 29, 2012

Re: Lexer and parser generators using CTFE

Posted by James Miller
in reply to Christopher Bergqvist

James Miller

Posted in reply to Christopher Bergqvist

On 29 February 2012 14:16, Christopher Bergqvist <spambox0@digitalpoetry.se> wrote:
> On Tuesday, 28 February 2012 at 08:36:13 UTC, CTFE-4-the-win wrote:
>>
>> On Tuesday, 28 February 2012 at 07:59:16 UTC, Andrei Alexandrescu wrote:
>>>
>>> I'm starting a new thread on this because I think the matter is of strategic importance.
>>>
>>> We all felt for a long time that there's a lot of potential in CTFE, and potential applications have been discussed more than a few times, ranging from formatting strings parsed to DSLs and parser generators.
>>>
>>> Such feats are now approaching fruition because a number of factors converge:
>>>
>>> * Dmitry Olshansky's regex library (now in Phobos) generates efficient D
>>> code straight from regexen.
>>>
>>> * The scope and quality of CTFE has improved enormously, making more advanced uses possible and even relatively easy (thanks Don!)
>>>
>>> * Hisayuki Mima implemented a parser generator in only 3000 lines of code
>>> (sadly, no comments or documentation yet :o))
>>>
>>> * With the occasion of that announcement we also find out Philippe Sigaud has already a competing design and implementation of a parser generator.
>>>
>>> This is the kind of stuff I've had an eye on for the longest time. I'm saying it's of strategic importance because CTFE technology, though not new and already available with some languages, has unique powers when combined with other features of D. With CTFE we get to do things that are quite literally impossible to do in other languages.
>>>
>>> We need to have a easy-to-use, complete, seamless, and efficient lexer-parser generator combo in Phobos, pronto. The lexer itself could use a character-level PEG or a classic automaton, and emit tokens for consumption by a parser generator. The two should work in perfect tandem (no need for glue code). At the end of the day, defining a complete lexer+parser combo for a language should be just a few lines longer than the textual representation of the grammar itself.
>>>
>>> What do you all think? Let's get this project off the ground!
>>>
>>>
>>> Thanks,
>>>
>>> Andrei
>>
>>
>> Definitely, I applaud this initiative! I've long been of the opinion that CTFE parsing is D's killer-feature, which would allow me to "sneak" D into a "nameless above average size company". ;)
>
>
> I agree that the current direction of D in this area is impressive.  However, I fail to see a killer-feature in generating a lexer-parser generator at compile-time instead of run-time.
>
> A run-time generator would benefit from not having to execute within the limited CTFE environment and would always be on-par in that respect.  A compile time generator would internalize the generation and compilation of the result (with possible glue-code), simplifying the build process somewhat.
>
> What am I failing to pick up on?

As far as I can tell, the advantage of compile-time parser-generation is that you get significantly better start-up times for the program. Generating a lexer/parser at runtime produces signficant overhead. Also, if you want to do a yacc-style action system, then it being created at compile-time means that you don't have the overhead of using anonymous functions or delegates at runtime, you can just give it a raw string of D code.

The CTFE environment is not that limited, and has enough functionality to produce a lexer/parser.

Complex language definitions would be benefited by CTFE parser-generation since the load times to create the generator can be quite long, a problem for programs that may not be long-running (a lint tool for example).

--
James Miller

February 29, 2012

Re: Lexer and parser generators using CTFE

Posted by H. S. Teoh

H. S. Teoh

On Tue, Feb 28, 2012 at 08:25:21PM -0500, Jonathan M Davis wrote:
> On Wednesday, February 29, 2012 02:16:12 Christopher Bergqvist wrote:
> > I agree that the current direction of D in this area is impressive. However, I fail to see a killer-feature in generating a lexer-parser generator at compile-time instead of run-time.
> > 
> > A run-time generator would benefit from not having to execute within the limited CTFE environment and would always be on-par in that respect. A compile time generator would internalize the generation and compilation of the result (with possible glue-code), simplifying the build process somewhat.
> > 
> > What am I failing to pick up on?
> 
> Well, one cool thing would be that a compiler could effectively compile itself - or at least the lexer and parser would compile themselves. You wouldn't have to run a program to generate them. It could be self-contained.
[...]

You also eliminate possibly lengthy program startup initialization. Which can add up, if the program is run many times. Plus, if something can already be computed at compile-time, why waste time & resources doing it at runtime?


T

-- 
Без труда не выловишь и рыбку из пруда.

February 29, 2012

Re: Lexer and parser generators using CTFE

Posted by d coder

d coder

Attachments:

text/html part

>
>
> > On Wednesday, February 29, 2012 02:16:12 Christopher Bergqvist wrote:
> > > I agree that the current direction of D in this area is impressive. However, I fail to see a killer-feature in generating a lexer-parser generator at compile-time instead of run-time.
> > >
>

CTFE parsing is especially useful for DSEL (Domain Specific Embedded Languages) or internal DSLs. The advantages are:

1. Syntactic errors (in the parsed constructs)  are given out at compile
time.
2. D reflections are available only at compile time. Referencing the
variables/identifiers in the parsed subset of DSL with the mainstream D
code is impossible without reflections in place.

Regards
- Puneet

February 29, 2012

Re: Lexer and parser generators using CTFE

Posted by Nick Sabalausky
in reply to H. S. Teoh

Nick Sabalausky

Posted in reply to H. S. Teoh

"H. S. Teoh" <hsteoh@quickfur.ath.cx> wrote in message news:mailman.215.1330472867.24984.digitalmars-d@puremagic.com...
> On Tue, Feb 28, 2012 at 10:03:42PM +0100, Martin Nowak wrote: [...]
>> I won't deny that the combination of CTFE text processing and static introspection could improve on this. It could be made more feasible by some conventions, e.g. parse result always uses structs or classes and built-in arrays.
>
> Excellent idea, I like this.
>
>
>> class Module
>> {
>>     this(Declaration[]);
>> }
>>
>> class StructDeclaration : Declaration
>> {
>>     enum _enbf = "struct $1=Identifier { $2=Declaration* }";
>>
>>     this(Identifier, Declaration[]);
>> }
>>
>> ...
>>
>> Parser!(Module, StructDeclaration, ...) parser;
>> Module m = parser.parse(read("file.d"));
>
> I like this! Definitely an improvement over yacc syntax.
>

In Goldie, I've taken an inverted approach, which IMHO is easier to use: The types are automatically generated from the grammar, not the other way around. So applying that approach to the above code, it'd be more like this:

mixin genGrammar!("myGrammar", `
    Identifier = [a-zA-Z_][a-zA-Z_0-9]*
    Module = Declaration+
    Declaration = StructDeclaration
    StructDeclaration = 'struct' Identifier '{' Declaration* '}'
`);

Which generates these classes:

Parser!"myGrammar"
Symbol!("myGrammar.Identifier")
Symbol!("myGrammar.Module")
Symbol!("myGrammar.Declaration")
Symbol!("myGrammar.StructDeclaration")

and/or these:

Parser_myGrammar
Symbol_myGrammar!"Identifier"
Symbol_myGrammar!"Module"
Symbol_myGrammar!"Declaration"
Symbol_myGrammar!"StructDeclaration"

would could then be aliased by the user however they wanted:

alias Symbol_myGrammar MySym;

And there can still be hooks (delegates, subclassing, whatever) to add customized behavior/functionality.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation