Lexer and parser generators using CTFE (page 5)

On 2/29/12 10:03 AM, Hisayuki Mima wrote: > A minimum of documentation is now available here: > https://github.com/youkei/ctpg/wiki/Home-en Great start, you may want to aggressively add examples. Then people would love using ctpg and will write more documentation, tutorials etc. Andrei

"Philippe Sigaud" <philippe.sigaud@gmail.com> wrote in message news:mailman.229.1330508922.24984.digitalmars-d@puremagic.com... Le 29 févr. 2012 07:20, "Nick Sabalausky" <a@a.a> a écrit : > >> >> Just gave it a quick shot. It was looking like it might not be too bad, >but >> then I hit: >> >> Assertion failure: 'ctfeStack.stackPointer() == 0' on line 4823 in file >> 'interpret.c' >> >> Bleh. >> >> (This was with DMD 2.058) > >Yeah, I had the very same yesterday :( > >Also, another one on line 94 in interpret.c 'v->ctfeSomethin' failing. > >Too bad. > >In my case, I found a workaround: I was doing > >array[] ~= SomeStruct(args); > >which asserts at CTFE. > >But: > >auto s = SomeStructs(args); >array[] ~= s; > >works. > Ooh, cool! I know exactly where I'm doing that!

"Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message news:jilkj6$1a50$2@digitalmars.com... > On 2/28/12 11:15 PM, Nick Sabalausky wrote: >> In Goldie, I've taken an inverted approach, which IMHO is easier to use: >> The >> types are automatically generated from the grammar, not the other way >> around. So applying that approach to the above code, it'd be more like >> this: >> >> mixin genGrammar!("myGrammar", ` >> Identifier = [a-zA-Z_][a-zA-Z_0-9]* >> Module = Declaration+ >> Declaration = StructDeclaration >> StructDeclaration = 'struct' Identifier '{' Declaration* '}' >> `); >> >> Which generates these classes: >> >> Parser!"myGrammar" >> Symbol!("myGrammar.Identifier") >> Symbol!("myGrammar.Module") >> Symbol!("myGrammar.Declaration") >> Symbol!("myGrammar.StructDeclaration") >> >> and/or these: >> >> Parser_myGrammar >> Symbol_myGrammar!"Identifier" >> Symbol_myGrammar!"Module" >> Symbol_myGrammar!"Declaration" >> Symbol_myGrammar!"StructDeclaration" >> >> would could then be aliased by the user however they wanted: >> >> alias Symbol_myGrammar MySym; >> >> And there can still be hooks (delegates, subclassing, whatever) to add customized behavior/functionality. > > I think this is the right approach. > What? We agree? ;)

"Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message news:jiljf7$18em$1@digitalmars.com... > On 2/28/12 9:57 AM, H. S. Teoh wrote: >> On Tue, Feb 28, 2012 at 01:59:21AM -0600, Andrei Alexandrescu wrote: >>> This is the kind of stuff I've had an eye on for the longest time. I'm saying it's of strategic importance because CTFE technology, though not new and already available with some languages, has unique powers when combined with other features of D. With CTFE we get to do things that are quite literally impossible to do in other languages. >> >> For example? > > For example integrated lexer and parser generators! > ...With statically-checked symbol names that can include arbitrary characters: Sym!"+" Sym!"{" Sym!"<Foo>" etc...

"Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message news:jiljsg$193s$1@digitalmars.com... > On 2/28/12 1:52 PM, Dmitry Olshansky wrote: >> - have reasonable compile times and memory consumption (though it will >> only improve over time) > > Yes. I guess PEGs have problems there. > Probably LR, too, unless you build the state tables in a separate prior build step. >> There is prototype of interactive regex matcher that >> works directly on stream (buried in std.regex), it even passed dry-run >> unittests back then. Though I had to postpone till I/O is sorted out. I >> really loved Steven's design with it's easy access to buffer and well >> thought out primitives, hope it will come about sometime soon. > > An interactive regex would be a dream come true to me... > One thing I've been wanting to see in a lexer's regex would be a way to match things like D's nested comments or: qEOS fancy string blah blah EOS Support for either one would probably render it "not *technically* a regex" but it should be entirely possible by just adding some extra data/processing to the states.

"Nick Sabalausky" <a@a.a> wrote in message news:jilmhr$1dul$1@digitalmars.com... > "Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message news:jiljsg$193s$1@digitalmars.com... >> On 2/28/12 1:52 PM, Dmitry Olshansky wrote: >>> - have reasonable compile times and memory consumption (though it will >>> only improve over time) >> >> Yes. I guess PEGs have problems there. >> > > Probably LR, too, unless you build the state tables in a separate prior build step. > Erm, I mean LALR specifically.

On 2/29/12 11:15 AM, Nick Sabalausky wrote: > "Andrei Alexandrescu"<SeeWebsiteForEmail@erdani.org> wrote in message > news:jiljsg$193s$1@digitalmars.com... >> On 2/28/12 1:52 PM, Dmitry Olshansky wrote: >>> - have reasonable compile times and memory consumption (though it will >>> only improve over time) >> >> Yes. I guess PEGs have problems there. >> > > Probably LR, too, unless you build the state tables in a separate prior > build step. It's been a while since I read about PEGs (the packrat parser paper), but think it's a more fundamental issue with PEGs because they need to memoize several possible parses of the input. Andrei

February 29, 2012

Re: Lexer and parser generators using CTFE

Posted by Nick Sabalausky
in reply to Andrei Alexandrescu

Permalink

Nick Sabalausky

Posted in reply to Andrei Alexandrescu

Permalink

"Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message news:jilnjg$1eu9$3@digitalmars.com...
> On 2/29/12 11:15 AM, Nick Sabalausky wrote:
>> "Andrei Alexandrescu"<SeeWebsiteForEmail@erdani.org>  wrote in message news:jiljsg$193s$1@digitalmars.com...
>>> On 2/28/12 1:52 PM, Dmitry Olshansky wrote:
>>>> - have reasonable compile times and memory consumption (though it will
>>>> only improve over time)
>>>
>>> Yes. I guess PEGs have problems there.
>>>
>>
>> Probably LR, too, unless you build the state tables in a separate prior build step.
>
> It's been a while since I read about PEGs (the packrat parser paper), but think it's a more fundamental issue with PEGs because they need to memoize several possible parses of the input.
>

I don't know much about packrat parsers, but what I meant is that generating LALR tables (which are then used by an LALR parser) can be very computationally expensive: For simple grammars, it's trivial, but for more complex "typical programming langauge" grammars, it can easily take upwards of quite a few minutes when *not* done in CTFE (at least on my underpowered machine). Although maybe my implementation and GOLD's implementation both just have some additional hidden scaling issue that isn't inherent to the algorithms?

Actually using an *already*-generated LALR table (which only needs to be generated once per grammar, not once per input) to parse doesn't have major inherent efficiency issues compared to other parsing algorithms, AFAIK.

On Wed, 29 Feb 2012 17:41:19 +0100, Timon Gehr <timon.gehr@gmx.ch> wrote: > On 02/28/2012 07:46 PM, Martin Nowak wrote: >> >> https://gist.github.com/1255439 - lexer generator >> https://gist.github.com/1262321 - complete and fast D lexer >> > > Well, it is slower at lexing than DMD at parsing. What is the bottleneck? No, it's as fast as dmd's lexer. Writing the tokens to stdout takes a lot of time though. Just disable the "writeln(tok);" in the main loop. There is also an --echo option which recreates the complete source from the tokens.

Forums