View mode: basic / threaded / horizontal-split · Log in · Help
February 29, 2012
Re: Lexer and parser generators using CTFE
On 2/29/12 10:03 AM, Hisayuki Mima wrote:
> A minimum of documentation is now available here:
> https://github.com/youkei/ctpg/wiki/Home-en

Great start, you may want to aggressively add examples. Then people 
would love using ctpg and will write more documentation, tutorials etc.

Andrei
February 29, 2012
Re: Lexer and parser generators using CTFE
"Philippe Sigaud" <philippe.sigaud@gmail.com> wrote in message 
news:mailman.229.1330508922.24984.digitalmars-d@puremagic.com...
Le 29 févr. 2012 07:20, "Nick Sabalausky" <a@a.a> a écrit :
>
>>
>> Just gave it a quick shot. It was looking like it might not be too bad,
>but
>> then I hit:
>>
>> Assertion failure: 'ctfeStack.stackPointer() == 0' on line 4823 in file
>> 'interpret.c'
>>
>> Bleh.
>>
>> (This was with DMD 2.058)
>
>Yeah, I had the very same yesterday :(
>
>Also, another one on line 94 in interpret.c 'v->ctfeSomethin' failing.
>
>Too bad.
>
>In my case, I found a workaround: I was doing
>
>array[] ~= SomeStruct(args);
>
>which asserts at CTFE.
>
>But:
>
>auto s = SomeStructs(args);
>array[] ~= s;
>
>works.
>

Ooh, cool! I know exactly where I'm doing that!
February 29, 2012
Re: Lexer and parser generators using CTFE
"Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message 
news:jilkj6$1a50$2@digitalmars.com...
> On 2/28/12 11:15 PM, Nick Sabalausky wrote:
>> In Goldie, I've taken an inverted approach, which IMHO is easier to use: 
>> The
>> types are automatically generated from the grammar, not the other way
>> around. So applying that approach to the above code, it'd be more like 
>> this:
>>
>> mixin genGrammar!("myGrammar", `
>>      Identifier = [a-zA-Z_][a-zA-Z_0-9]*
>>      Module = Declaration+
>>      Declaration = StructDeclaration
>>      StructDeclaration = 'struct' Identifier '{' Declaration* '}'
>> `);
>>
>> Which generates these classes:
>>
>> Parser!"myGrammar"
>> Symbol!("myGrammar.Identifier")
>> Symbol!("myGrammar.Module")
>> Symbol!("myGrammar.Declaration")
>> Symbol!("myGrammar.StructDeclaration")
>>
>> and/or these:
>>
>> Parser_myGrammar
>> Symbol_myGrammar!"Identifier"
>> Symbol_myGrammar!"Module"
>> Symbol_myGrammar!"Declaration"
>> Symbol_myGrammar!"StructDeclaration"
>>
>> would could then be aliased by the user however they wanted:
>>
>> alias Symbol_myGrammar MySym;
>>
>> And there can still be hooks (delegates, subclassing, whatever) to add
>> customized behavior/functionality.
>
> I think this is the right approach.
>

What? We agree? ;)
February 29, 2012
Re: Lexer and parser generators using CTFE
On 2/29/12 11:05 AM, Nick Sabalausky wrote:
> What? We agree? ;)

It's 2012 after all.

Andrei
February 29, 2012
Re: Lexer and parser generators using CTFE
"Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message 
news:jiljf7$18em$1@digitalmars.com...
> On 2/28/12 9:57 AM, H. S. Teoh wrote:
>> On Tue, Feb 28, 2012 at 01:59:21AM -0600, Andrei Alexandrescu wrote:
>>> This is the kind of stuff I've had an eye on for the longest time.
>>> I'm saying it's of strategic importance because CTFE technology,
>>> though not new and already available with some languages, has unique
>>> powers when combined with other features of D. With CTFE we get to do
>>> things that are quite literally impossible to do in other languages.
>>
>> For example?
>
> For example integrated lexer and parser generators!
>

...With statically-checked symbol names that can include arbitrary 
characters:

Sym!"+"
Sym!"{"
Sym!"<Foo>"
etc...
February 29, 2012
Re: Lexer and parser generators using CTFE
"Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message 
news:jiljsg$193s$1@digitalmars.com...
> On 2/28/12 1:52 PM, Dmitry Olshansky wrote:
>> - have reasonable compile times and memory consumption (though it will
>> only improve over time)
>
> Yes. I guess PEGs have problems there.
>

Probably LR, too, unless you build the state tables in a separate prior 
build step.

>> There is prototype of interactive regex matcher that
>> works directly on stream (buried in std.regex), it even passed dry-run
>> unittests back then. Though I had to postpone till I/O is sorted out. I
>> really loved Steven's design with it's easy access to buffer and well
>> thought out primitives, hope it will come about sometime soon.
>
> An interactive regex would be a dream come true to me...
>

One thing I've been wanting to see in a lexer's regex would be a way to 
match things like D's nested comments or:

qEOS fancy string blah blah EOS

Support for either one would probably render it "not *technically* a regex" 
but it should be entirely possible by just adding some extra data/processing 
to the states.
February 29, 2012
Re: Lexer and parser generators using CTFE
"Nick Sabalausky" <a@a.a> wrote in message 
news:jilmhr$1dul$1@digitalmars.com...
> "Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message 
> news:jiljsg$193s$1@digitalmars.com...
>> On 2/28/12 1:52 PM, Dmitry Olshansky wrote:
>>> - have reasonable compile times and memory consumption (though it will
>>> only improve over time)
>>
>> Yes. I guess PEGs have problems there.
>>
>
> Probably LR, too, unless you build the state tables in a separate prior 
> build step.
>

Erm, I mean LALR specifically.
February 29, 2012
Re: Lexer and parser generators using CTFE
On 2/29/12 11:15 AM, Nick Sabalausky wrote:
> "Andrei Alexandrescu"<SeeWebsiteForEmail@erdani.org>  wrote in message
> news:jiljsg$193s$1@digitalmars.com...
>> On 2/28/12 1:52 PM, Dmitry Olshansky wrote:
>>> - have reasonable compile times and memory consumption (though it will
>>> only improve over time)
>>
>> Yes. I guess PEGs have problems there.
>>
>
> Probably LR, too, unless you build the state tables in a separate prior
> build step.

It's been a while since I read about PEGs (the packrat parser paper), 
but think it's a more fundamental issue with PEGs because they need to 
memoize several possible parses of the input.

Andrei
February 29, 2012
Re: Lexer and parser generators using CTFE
"Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message 
news:jilnjg$1eu9$3@digitalmars.com...
> On 2/29/12 11:15 AM, Nick Sabalausky wrote:
>> "Andrei Alexandrescu"<SeeWebsiteForEmail@erdani.org>  wrote in message
>> news:jiljsg$193s$1@digitalmars.com...
>>> On 2/28/12 1:52 PM, Dmitry Olshansky wrote:
>>>> - have reasonable compile times and memory consumption (though it will
>>>> only improve over time)
>>>
>>> Yes. I guess PEGs have problems there.
>>>
>>
>> Probably LR, too, unless you build the state tables in a separate prior
>> build step.
>
> It's been a while since I read about PEGs (the packrat parser paper), but 
> think it's a more fundamental issue with PEGs because they need to memoize 
> several possible parses of the input.
>

I don't know much about packrat parsers, but what I meant is that generating 
LALR tables (which are then used by an LALR parser) can be very 
computationally expensive: For simple grammars, it's trivial, but for more 
complex "typical programming langauge" grammars, it can easily take upwards 
of quite a few minutes when *not* done in CTFE (at least on my underpowered 
machine). Although maybe my implementation and GOLD's implementation both 
just have some additional hidden scaling issue that isn't inherent to the 
algorithms?

Actually using an *already*-generated LALR table (which only needs to be 
generated once per grammar, not once per input) to parse doesn't have major 
inherent efficiency issues compared to other parsing algorithms, AFAIK.
February 29, 2012
Re: Lexer and parser generators using CTFE
On Wed, 29 Feb 2012 17:41:19 +0100, Timon Gehr <timon.gehr@gmx.ch> wrote:

> On 02/28/2012 07:46 PM, Martin Nowak wrote:
>>
>> https://gist.github.com/1255439 - lexer generator
>> https://gist.github.com/1262321 - complete and fast D lexer
>>
>
> Well, it is slower at lexing than DMD at parsing. What is the bottleneck?

No, it's as fast as dmd's lexer.

Writing the tokens to stdout takes a lot of time though.
Just disable the "writeln(tok);" in the main loop.
There is also an --echo option which recreates the complete
source from the tokens.
1 2 3 4 5 6 7 8 9
Top | Discussion index | About this forum | D home