View mode: basic / threaded / horizontal-split · Log in · Help
February 28, 2012
Re: Lexer and parser generators using CTFE
Le 28/02/2012 21:02, H. S. Teoh a écrit :
> OTOH, perhaps once we start writing a lexer/parser generator, that will
> force us to fix the I/O and allocator system. Without a good project to
> motivate us to fix these tedious issues, it seems that we just lose
> inspiration to do it after a while.
>

std.conainer is already a good candidate for allocators.

Back to the topic, this is a great idea, but isn't it building on weak 
bases ? We still have issue with const correctness, or postblit, and on 
lib side, we lack allocators, error handling is poor, containers are 
poor, etc . . .
February 28, 2012
Re: Lexer and parser generators using CTFE
On Tue, 28 Feb 2012 21:02:24 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx>  
wrote:

> On Tue, Feb 28, 2012 at 07:46:04PM +0100, Martin Nowak wrote:
> [...]
>> I wrote a generic lexer generator some time ago.
>> It already let to some compiler O(N^2) optimizations, because the token
>> declarations sneak into the mangling :(.
>> I also finally added a workaround for a remaining CTFE bug (#6815).
>>
>> https://gist.github.com/1255439 - lexer generator
>> https://gist.github.com/1262321 - complete and fast D lexer
>
> Cool! I'll have to take a look at this sometime.
>
>
> [...]
>> <PERSONAL OPINION
>> The hassle of providing good error messages and synthesizing parse  
>> results
>> in a generic parser outweigh the benefit of a declarative grammar.
>> /PERSONAL OPINION>
>
> But things like lex/yacc have been useful throughout the years. With D's
> delegates, lexer/parser action rules should be even cleaner, no?
>
Yacc does work but at the price of an additional build step and total  
automaton obfuscation.
And even at that price the results are still hard to maintain klingon  
sources.
http://supercollider.git.sourceforge.net/git/gitweb.cgi?p=supercollider/supercollider;a=blob;f=lang/LangSource/Bison/lang11d

I won't deny that the combination of CTFE text processing and static  
introspection could
improve on this. It could be made more feasible by some conventions, e.g.  
parse result
always uses structs or classes and built-in arrays.

----

class Module
{
    this(Declaration[]);
}

class StructDeclaration : Declaration
{
    enum _enbf = "struct $1=Identifier { $2=Declaration* }";

    this(Identifier, Declaration[]);
}

...

Parser!(Module, StructDeclaration, ...) parser;
Module m = parser.parse(read("file.d"));
February 28, 2012
Re: Lexer and parser generators using CTFE
On Tue, Feb 28, 2012 at 10:03:42PM +0100, Martin Nowak wrote:
[...]
> I won't deny that the combination of CTFE text processing and static
> introspection could improve on this. It could be made more feasible by
> some conventions, e.g. parse result always uses structs or classes and
> built-in arrays.

Excellent idea, I like this.


> class Module
> {
>     this(Declaration[]);
> }
> 
> class StructDeclaration : Declaration
> {
>     enum _enbf = "struct $1=Identifier { $2=Declaration* }";
> 
>     this(Identifier, Declaration[]);
> }
> 
> ...
> 
> Parser!(Module, StructDeclaration, ...) parser;
> Module m = parser.parse(read("file.d"));

I like this! Definitely an improvement over yacc syntax.


T

-- 
Unix was not designed to stop people from doing stupid things, because
that would also stop them from doing clever things. -- Doug Gwyn
February 29, 2012
Re: Lexer and parser generators using CTFE
On Tuesday, 28 February 2012 at 08:36:13 UTC, CTFE-4-the-win 
wrote:
> On Tuesday, 28 February 2012 at 07:59:16 UTC, Andrei 
> Alexandrescu
> wrote:
>> I'm starting a new thread on this because I think the matter 
>> is of strategic importance.
>>
>> We all felt for a long time that there's a lot of potential in 
>> CTFE, and potential applications have been discussed more than 
>> a few times, ranging from formatting strings parsed to DSLs 
>> and parser generators.
>>
>> Such feats are now approaching fruition because a number of 
>> factors converge:
>>
>> * Dmitry Olshansky's regex library (now in Phobos) generates 
>> efficient D code straight from regexen.
>>
>> * The scope and quality of CTFE has improved enormously, 
>> making more advanced uses possible and even relatively easy 
>> (thanks Don!)
>>
>> * Hisayuki Mima implemented a parser generator in only 3000 
>> lines of code (sadly, no comments or documentation yet :o))
>>
>> * With the occasion of that announcement we also find out 
>> Philippe Sigaud has already a competing design and 
>> implementation of a parser generator.
>>
>> This is the kind of stuff I've had an eye on for the longest 
>> time. I'm saying it's of strategic importance because CTFE 
>> technology, though not new and already available with some 
>> languages, has unique powers when combined with other features 
>> of D. With CTFE we get to do things that are quite literally 
>> impossible to do in other languages.
>>
>> We need to have a easy-to-use, complete, seamless, and 
>> efficient lexer-parser generator combo in Phobos, pronto. The 
>> lexer itself could use a character-level PEG or a classic 
>> automaton, and emit tokens for consumption by a parser 
>> generator. The two should work in perfect tandem (no need for 
>> glue code). At the end of the day, defining a complete 
>> lexer+parser combo for a language should be just a few lines 
>> longer than the textual representation of the grammar itself.
>>
>> What do you all think? Let's get this project off the ground!
>>
>>
>> Thanks,
>>
>> Andrei
>
> Definitely, I applaud this initiative! I've long been of the
> opinion that CTFE parsing is D's killer-feature, which would
> allow me to "sneak" D into a "nameless above average size
> company". ;)

I agree that the current direction of D in this area is 
impressive.  However, I fail to see a killer-feature in 
generating a lexer-parser generator at compile-time instead of 
run-time.

A run-time generator would benefit from not having to execute 
within the limited CTFE environment and would always be on-par in 
that respect.  A compile time generator would internalize the 
generation and compilation of the result (with possible 
glue-code), simplifying the build process somewhat.

What am I failing to pick up on?
February 29, 2012
Re: Lexer and parser generators using CTFE
On Wednesday, February 29, 2012 02:16:12 Christopher Bergqvist wrote:
> I agree that the current direction of D in this area is
> impressive. However, I fail to see a killer-feature in
> generating a lexer-parser generator at compile-time instead of
> run-time.
> 
> A run-time generator would benefit from not having to execute
> within the limited CTFE environment and would always be on-par in
> that respect. A compile time generator would internalize the
> generation and compilation of the result (with possible
> glue-code), simplifying the build process somewhat.
> 
> What am I failing to pick up on?

Well, one cool thing would be that a compiler could effectively compile itself 
- or at least the lexer and parser would compile themselves. You wouldn't have 
to run a program to generate them. It could be self-contained.

- Jonathan M Davis
February 29, 2012
Re: Lexer and parser generators using CTFE
Jonathan M Davis wrote:
> On Wednesday, February 29, 2012 02:16:12 Christopher Bergqvist wrote:
>> I agree that the current direction of D in this area is
>> impressive. However, I fail to see a killer-feature in
>> generating a lexer-parser generator at compile-time instead of
>> run-time.
>>
>> A run-time generator would benefit from not having to execute
>> within the limited CTFE environment and would always be on-par in
>> that respect. A compile time generator would internalize the
>> generation and compilation of the result (with possible
>> glue-code), simplifying the build process somewhat.
>>
>> What am I failing to pick up on?
>
> Well, one cool thing would be that a compiler could effectively compile itself
> - or at least the lexer and parser would compile themselves. You wouldn't have
> to run a program to generate them. It could be self-contained.

CTFE code can be much slower than native one, and I would like to see 
some kind of compiler cache for such code.
February 29, 2012
Re: Lexer and parser generators using CTFE
On 29 February 2012 14:16, Christopher Bergqvist
<spambox0@digitalpoetry.se> wrote:
> On Tuesday, 28 February 2012 at 08:36:13 UTC, CTFE-4-the-win wrote:
>>
>> On Tuesday, 28 February 2012 at 07:59:16 UTC, Andrei Alexandrescu
>> wrote:
>>>
>>> I'm starting a new thread on this because I think the matter is of
>>> strategic importance.
>>>
>>> We all felt for a long time that there's a lot of potential in CTFE, and
>>> potential applications have been discussed more than a few times, ranging
>>> from formatting strings parsed to DSLs and parser generators.
>>>
>>> Such feats are now approaching fruition because a number of factors
>>> converge:
>>>
>>> * Dmitry Olshansky's regex library (now in Phobos) generates efficient D
>>> code straight from regexen.
>>>
>>> * The scope and quality of CTFE has improved enormously, making more
>>> advanced uses possible and even relatively easy (thanks Don!)
>>>
>>> * Hisayuki Mima implemented a parser generator in only 3000 lines of code
>>> (sadly, no comments or documentation yet :o))
>>>
>>> * With the occasion of that announcement we also find out Philippe Sigaud
>>> has already a competing design and implementation of a parser generator.
>>>
>>> This is the kind of stuff I've had an eye on for the longest time. I'm
>>> saying it's of strategic importance because CTFE technology, though not new
>>> and already available with some languages, has unique powers when combined
>>> with other features of D. With CTFE we get to do things that are quite
>>> literally impossible to do in other languages.
>>>
>>> We need to have a easy-to-use, complete, seamless, and efficient
>>> lexer-parser generator combo in Phobos, pronto. The lexer itself could use a
>>> character-level PEG or a classic automaton, and emit tokens for consumption
>>> by a parser generator. The two should work in perfect tandem (no need for
>>> glue code). At the end of the day, defining a complete lexer+parser combo
>>> for a language should be just a few lines longer than the textual
>>> representation of the grammar itself.
>>>
>>> What do you all think? Let's get this project off the ground!
>>>
>>>
>>> Thanks,
>>>
>>> Andrei
>>
>>
>> Definitely, I applaud this initiative! I've long been of the
>> opinion that CTFE parsing is D's killer-feature, which would
>> allow me to "sneak" D into a "nameless above average size
>> company". ;)
>
>
> I agree that the current direction of D in this area is impressive.
>  However, I fail to see a killer-feature in generating a lexer-parser
> generator at compile-time instead of run-time.
>
> A run-time generator would benefit from not having to execute within the
> limited CTFE environment and would always be on-par in that respect.  A
> compile time generator would internalize the generation and compilation of
> the result (with possible glue-code), simplifying the build process
> somewhat.
>
> What am I failing to pick up on?

As far as I can tell, the advantage of compile-time parser-generation
is that you get significantly better start-up times for the program.
Generating a lexer/parser at runtime produces signficant overhead.
Also, if you want to do a yacc-style action system, then it being
created at compile-time means that you don't have the overhead of
using anonymous functions or delegates at runtime, you can just give
it a raw string of D code.

The CTFE environment is not that limited, and has enough functionality
to produce a lexer/parser.

Complex language definitions would be benefited by CTFE
parser-generation since the load times to create the generator can be
quite long, a problem for programs that may not be long-running (a
lint tool for example).

--
James Miller
February 29, 2012
Re: Lexer and parser generators using CTFE
On Tue, Feb 28, 2012 at 08:25:21PM -0500, Jonathan M Davis wrote:
> On Wednesday, February 29, 2012 02:16:12 Christopher Bergqvist wrote:
> > I agree that the current direction of D in this area is impressive.
> > However, I fail to see a killer-feature in generating a lexer-parser
> > generator at compile-time instead of run-time.
> > 
> > A run-time generator would benefit from not having to execute within
> > the limited CTFE environment and would always be on-par in that
> > respect. A compile time generator would internalize the generation
> > and compilation of the result (with possible glue-code), simplifying
> > the build process somewhat.
> > 
> > What am I failing to pick up on?
> 
> Well, one cool thing would be that a compiler could effectively
> compile itself - or at least the lexer and parser would compile
> themselves. You wouldn't have to run a program to generate them. It
> could be self-contained.
[...]

You also eliminate possibly lengthy program startup initialization.
Which can add up, if the program is run many times. Plus, if something
can already be computed at compile-time, why waste time & resources
doing it at runtime?


T

-- 
Без труда не выловишь и рыбку из пруда.
February 29, 2012
Re: Lexer and parser generators using CTFE
>
>
> > On Wednesday, February 29, 2012 02:16:12 Christopher Bergqvist wrote:
> > > I agree that the current direction of D in this area is impressive.
> > > However, I fail to see a killer-feature in generating a lexer-parser
> > > generator at compile-time instead of run-time.
> > >
>

CTFE parsing is especially useful for DSEL (Domain Specific Embedded
Languages) or internal DSLs. The advantages are:

1. Syntactic errors (in the parsed constructs)  are given out at compile
time.
2. D reflections are available only at compile time. Referencing the
variables/identifiers in the parsed subset of DSL with the mainstream D
code is impossible without reflections in place.

Regards
- Puneet
February 29, 2012
Re: Lexer and parser generators using CTFE
"H. S. Teoh" <hsteoh@quickfur.ath.cx> wrote in message 
news:mailman.215.1330472867.24984.digitalmars-d@puremagic.com...
> On Tue, Feb 28, 2012 at 10:03:42PM +0100, Martin Nowak wrote:
> [...]
>> I won't deny that the combination of CTFE text processing and static
>> introspection could improve on this. It could be made more feasible by
>> some conventions, e.g. parse result always uses structs or classes and
>> built-in arrays.
>
> Excellent idea, I like this.
>
>
>> class Module
>> {
>>     this(Declaration[]);
>> }
>>
>> class StructDeclaration : Declaration
>> {
>>     enum _enbf = "struct $1=Identifier { $2=Declaration* }";
>>
>>     this(Identifier, Declaration[]);
>> }
>>
>> ...
>>
>> Parser!(Module, StructDeclaration, ...) parser;
>> Module m = parser.parse(read("file.d"));
>
> I like this! Definitely an improvement over yacc syntax.
>

In Goldie, I've taken an inverted approach, which IMHO is easier to use: The 
types are automatically generated from the grammar, not the other way 
around. So applying that approach to the above code, it'd be more like this:

mixin genGrammar!("myGrammar", `
   Identifier = [a-zA-Z_][a-zA-Z_0-9]*
   Module = Declaration+
   Declaration = StructDeclaration
   StructDeclaration = 'struct' Identifier '{' Declaration* '}'
`);

Which generates these classes:

Parser!"myGrammar"
Symbol!("myGrammar.Identifier")
Symbol!("myGrammar.Module")
Symbol!("myGrammar.Declaration")
Symbol!("myGrammar.StructDeclaration")

and/or these:

Parser_myGrammar
Symbol_myGrammar!"Identifier"
Symbol_myGrammar!"Module"
Symbol_myGrammar!"Declaration"
Symbol_myGrammar!"StructDeclaration"

would could then be aliased by the user however they wanted:

alias Symbol_myGrammar MySym;

And there can still be hooks (delegates, subclassing, whatever) to add 
customized behavior/functionality.
1 2 3 4 5 6
Top | Discussion index | About this forum | D home