Thread overview
Anyone interested in a Spirit for D?
Oct 18, 2006  Walter Bright
Oct 18, 2006  Pragma
Oct 18, 2006  J Duncan
Oct 18, 2006  Walter Bright
Oct 18, 2006  Bill Baxter
Oct 18, 2006  Richard Koch
Oct 19, 2006  Bill Baxter
Oct 19, 2006  Paolo Invernizzi
Oct 19, 2006  Walter Bright
Oct 18, 2006  Bill Baxter
Oct 18, 2006  Walter Bright
Oct 18, 2006  Bill Baxter
Oct 19, 2006  BCS
Oct 19, 2006  Kristian
Oct 19, 2006  Walter Bright
Oct 19, 2006  Jacques
Oct 19, 2006  BCS
Oct 19, 2006  Karen Lanrap
Oct 18, 2006  Pragma
Oct 19, 2006  Bill Baxter
Oct 19, 2006  pragma
October 18, 2006
Along the lines of Don's regexp template metaprograms, is anyone interested in a Spirit-like parser generator capability in D?

http://spirit.sourceforge.net/

Apparently, someone has gotten Spirit to work with C#:

http://www.codeproject.com/useritems/spart.asp
October 18, 2006
Walter Bright wrote:
> Along the lines of Don's regexp template metaprograms, is anyone interested in a Spirit-like parser generator capability in D?
> 
> http://spirit.sourceforge.net/
> 
> Apparently, someone has gotten Spirit to work with C#:
> 
> http://www.codeproject.com/useritems/spart.asp

Now there's an idea!

Words of caution to follow:

FWIW, I looked into doing this years ago, and didn't get too far.  The biggest hurdle, aside from the limitations of templates at the time, was a lack of unary operators to override.  In particular, not being able to override unary '*' and '!' caused some cosmetic problems.

The only other major hangup I had was not having IFTI so I could instantiate templates transparently.  This feature alone could close the gap on most of Spirit's usage of C++ templates.  At a minimum, it means that a D programmer could get very close to the cosmetic appeal of Spirit (operator problems aside).

Don't get me wrong: I'm not a nay-sayer here.  I think this is a very doable and worthwhile suggestion by Walter.  Folks should take it seriously. But it will require some design compromises and changes from the original - IMO, it'll probably require more of a re-write than a port.

-- 
- EricAnderton at yahoo
October 18, 2006
Pragma wrote:
> Walter Bright wrote:
>> Along the lines of Don's regexp template metaprograms, is anyone interested in a Spirit-like parser generator capability in D?
>>
>> http://spirit.sourceforge.net/
>>
>> Apparently, someone has gotten Spirit to work with C#:
>>
>> http://www.codeproject.com/useritems/spart.asp
> 
> Now there's an idea!
> 
> Words of caution to follow:
> 
> FWIW, I looked into doing this years ago, and didn't get too far.  The biggest hurdle, aside from the limitations of templates at the time, was a lack of unary operators to override.  In particular, not being able to override unary '*' and '!' caused some cosmetic problems.
> 
> The only other major hangup I had was not having IFTI so I could instantiate templates transparently.  This feature alone could close the gap on most of Spirit's usage of C++ templates.  At a minimum, it means that a D programmer could get very close to the cosmetic appeal of Spirit (operator problems aside).
> 
> Don't get me wrong: I'm not a nay-sayer here.  I think this is a very doable and worthwhile suggestion by Walter.  Folks should take it seriously. But it will require some design compromises and changes from the original - IMO, it'll probably require more of a re-write than a port.
> 

Yeah, it's a good idea, but my first thought was "is that even possible?" It won't be Spirit, but a lexer in the, uh, spirit of Spirit :)
October 18, 2006
Pragma wrote:
> Words of caution to follow:
> 
> FWIW, I looked into doing this years ago, and didn't get too far.  The biggest hurdle, aside from the limitations of templates at the time, was a lack of unary operators to override.  In particular, not being able to override unary '*' and '!' caused some cosmetic problems.

I think the operator overloading aspect of Spirit is only a minor part of the implementation - in fact, just a pretty shell around it. It could all be done using functional notation.
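
To make the functional-notation point concrete, here is a minimal sketch in present-day D of Spirit-style combinators written as plain functions. The names `Parser`, `ch`, `digit`, `seq` and `star` are invented for illustration and are not taken from Spirit or from any existing D library; `digit` stands in for Spirit's `num_p` and matches a single digit only.

```d
import std.ascii : isDigit;

/// A parser takes the input and a start index. It returns the index just
/// past the match, or -1 when it does not match.
alias Parser = ptrdiff_t delegate(string input, size_t pos);

/// Matches one specific character.
Parser ch(char c)
{
    return delegate ptrdiff_t(string s, size_t p) {
        if (p < s.length && s[p] == c)
            return p + 1;
        return -1;
    };
}

/// Matches one decimal digit.
Parser digit()
{
    return delegate ptrdiff_t(string s, size_t p) {
        if (p < s.length && isDigit(s[p]))
            return p + 1;
        return -1;
    };
}

/// Matches `a` followed by `b`.
Parser seq(Parser a, Parser b)
{
    return delegate ptrdiff_t(string s, size_t p) {
        immutable mid = a(s, p);
        return mid < 0 ? -1 : b(s, cast(size_t) mid);
    };
}

/// Kleene star: matches `a` zero or more times, so it never fails.
Parser star(Parser a)
{
    return delegate ptrdiff_t(string s, size_t p) {
        for (;;)
        {
            immutable next = a(s, p);
            // Stop on failure, or on an empty match to avoid looping forever.
            if (next < 0 || cast(size_t) next == p)
                return p;
            p = cast(size_t) next;
        }
    };
}

unittest
{
    // Spirit: num_p >> *(ch_p(',') >> num_p), here with single digits.
    auto list = seq(digit(), star(seq(ch(','), digit())));
    assert(list("1,2,3", 0) == 5);
    assert(list("1,2,", 0) == 3); // the trailing comma is left unconsumed
    assert(list("x", 0) == -1);
}
```

Nothing here needs operator overloading; the nesting of calls carries the same structure that Spirit expresses with '>>' and prefix '*'.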

> The only other major hangup I had was not having IFTI so I could instantiate templates transparently.  This feature alone could close the gap on most of Spirit's usage of C++ templates.  At a minimum, it means that a D programmer could get very close to the cosmetic appeal of Spirit (operator problems aside).

I think it would be worth looking at again. The C# version of it doesn't use operator overloading or even templates.


> Don't get me wrong: I'm not a nay-sayer here.  I think this is a very doable and worthwhile suggestion by Walter.  Folks should take it seriously. But it will require some design compromises and changes from the original - IMO, it'll probably require more of a re-write than a port.

I think it would be a complete rewrite.

The reason I'm interested in it for D is that:

1) it's a pretty cool library
2) it's one of Boost's most popular ones
3) it's been touted as a reason why D is no good and C++ roolz
4) it's popular enough to have been a driving force behind improvements in C++ compilers
5) it would surely improve D
6) and last, and most importantly, it's very useful
October 18, 2006
Walter Bright wrote:
> Along the lines of Don's regexp template metaprograms, is anyone interested in a Spirit-like parser generator capability in D?
> 
> http://spirit.sourceforge.net/

Now that would be useful I think.

Take this example from the Spirit intro of code to make a parser for a list of real numbers:
   r = real_p >> *(ch_p(',') >> real_p);

In EBNF that's just:
   real_number ("," real_number)*

In C++ you have to get creative with the operator overloading there (prefix '*' is used to denote the regexp Kleene star, '>>' is used to separate tokens).

But given Don's experiments with compile-time text parsing in D, it's conceivable that in D the above parser could just be created with:

   r = make_parser("real_number (',' real_number)*");

I.e. use the EBNF version directly in a string literal that gets parsed at compile time.
That would be pretty cool.

Though, you know, even thinking about Boost::Spirit, I have to wonder if it really is necessary.  From the intro it says that its primary use is "extremely small micro-parsers", not a full-blown language processor. But if that's the target then the runtime overhead of translating the EBNF description to a parser would be pretty trivial.  So I guess the real benefit of a compile-time parser-generator is that your grammar can be _verified_ at compile-time.

I wonder if it would be any easier to make a compile-time grammar verifier than a full-blown parser generator?   Then just do the parser-generating at runtime.
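
As a rough illustration of the verify-at-compile-time, build-at-run-time split, the sketch below uses compile-time function evaluation, a feature of present-day D compilers. `grammarError`, `RuntimeParser` and `makeParser` are hypothetical names, and the check is deliberately shallow (balanced parentheses and terminated quoted literals), far short of a real EBNF verifier.

```d
/// Returns an error message for a malformed grammar string, or "" when the
/// string passes two shallow checks: balanced parentheses and terminated
/// single-quoted literals.
string grammarError(string g)
{
    int depth = 0;
    bool inLiteral = false;
    foreach (c; g)
    {
        if (c == '\'')
            inLiteral = !inLiteral;
        else if (!inLiteral)
        {
            if (c == '(')
                ++depth;
            else if (c == ')' && --depth < 0)
                return "unmatched ')'";
        }
    }
    if (inLiteral)
        return "unterminated literal";
    return depth == 0 ? "" : "unmatched '('";
}

/// Hypothetical stand-in for a parser that is built at run time.
struct RuntimeParser
{
    string grammar;
}

/// The grammar is a template argument, so it is checked during compilation.
/// Building the parser itself is left to run time (not shown).
RuntimeParser makeParser(string grammar)()
{
    enum err = grammarError(grammar); // evaluated at compile time
    static assert(err.length == 0, "bad grammar: " ~ err);
    return RuntimeParser(grammar);
}

unittest
{
    auto r = makeParser!"real_number (',' real_number)*"();
    assert(r.grammar.length > 0);

    // The same function also runs at compile time on its own.
    static assert(grammarError("a (b") == "unmatched '('");
    static assert(grammarError("a 'b") == "unterminated literal");
    // Writing makeParser!"a (b"() instead would stop the build with
    // "bad grammar: unmatched '('".
}
```

The verifier is an ordinary function, so the same code could also check grammar strings that only become known at run time.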


---
heh heh, this is fun.  From one of the code examples:

  typedef
   alternative<alternative<space_parser, sequence<sequence<
   strlit<const char*>, kleene_star<difference<anychar_parser,
   chlit<char> > > >, chlit<char> > >, sequence<sequence<
   strlit<const char*>, kleene_star<difference<anychar_parser,
   strlit<const char*> > > >, strlit<const char*> > >
   skip_t;

   skip_t skip;

That monster type signature was determined by deliberately forcing a compiler error and then copy-pasting the type from the resulting error message.  Too funny.  (Note that this was given not as the main way to use the library but as a way to eliminate some of the code bloat all the templates lead to -- another reason to not try to generate the parser at compile-time, but just verify it.)

At any rate the Spirit documentation seems to be rife with juicy comments of the form "yes it looks funky, but we're stuck with C++ here".  So it's a good place to get ideas for how to make things better.

--bb
October 18, 2006
Walter Bright wrote:
> I think it would be worth looking at again. The C# version of it doesn't use operator overloading or even templates.

Huh.  Very interesting.  Here's the example:

// spirit:
num_p >> *( ch_p(',') >> num_p)

// C#
Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))

Though it's definitely not as easy to read, I think I might actually prefer the C# version.  Part of the annoyance with Boost's super-clever use of operator-overloading is that it can be a real pain to discover things because they don't have real names.

I bet the C# version could be compacted with some aliases or imports (assuming C# has these):
  Seq( Digit, Start( Seq(Ch(','), Digit)))

That doesn't look too bad to me.

Still it would rock the world if you could just do:
  parser("digit (',' digit)*");
and have the grammar be verified at compile-time.

> I think it would be a complete rewrite.
> 
> The reason I'm interested in it for D is that:
> 
> 1) it's a pretty cool library
> 2) it's one of Boost's most popular ones
> 3) it's been touted as a reason why D is no good and C++ roolz
> 4) it's popular enough to have been a driving force behind improvements in C++ compilers
> 5) it would surely improve D
> 6) and last, and most importantly, it's very useful

Excellent reasons.

--bb
October 18, 2006
Bill Baxter wrote:
> Walter Bright wrote:
>> I think it would be worth looking at again. The C# version of it doesn't use operator overloading or even templates.
> 
> Huh.  Very interesting.  Here's the example:
> 
> // spirit:
> num_p >> *( ch_p(',') >> num_p)
> 
> // C#
> Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))
> 
> Though it's definitely not as easy to read, I think I might actually prefer the C# version.  Part of the annoyance with Boost's super-clever use of operator-overloading is that it can be a real pain to discover things because they don't have real names.
> 
> I bet the C# version could be compacted with some aliases or imports (assuming C# has these):
>   Seq( Digit, Start( Seq(Ch(','), Digit)))
> 
> That doesn't look too bad to me.
> 
> Still it would rock the world if you could just do:
>   parser("digit (',' digit)*");
> and have the grammar be verified at compile-time.
> 
>> I think it would be a complete rewrite.
>>
>> The reason I'm interested in it for D is that:
>>
>> 1) it's a pretty cool library
>> 2) it's one of Boost's most popular ones
>> 3) it's been touted as a reason why D is no good and C++ roolz
>> 4) it's popular enough to have been a driving force behind improvements in C++ compilers
>> 5) it would surely improve D
>> 6) and last, and most importantly, it's very useful
> 
> Excellent reasons.
> 
> --bb

All that is cool, but (I know I am the dummy here) readability as in BNF is something that eludes me. Better to go for Coco?

richard
October 18, 2006
Bill Baxter wrote:
> But given Don's experiments with compile-time text parsing in D, it's conceivable that in D the above parser could just be created with:
> 
>    r = make_parser("real_number (',' real_number)*");
> 
> I.e. use the EBNF version directly in a string literal that gets parsed at compile time.
> That would be pretty cool.

Yes, it would be. But there's a catastrophic problem with it. Spirit enables code snippets to be attached to terminals by overloading the [] operator. If the EBNF was all in a string literal, this would be impossible.

> Though, you know, even thinking about Boost::Spirit, I have to wonder if it really is necessary.  From the intro it says that its primary use is "extremely small micro-parsers", not a full-blown language processor. But if that's the target then the runtime overhead of translating the EBNF description to a parser would be pretty trivial.  So I guess the real benefit of a compile-time parser-generator is that your grammar can be _verified_ at compile-time.

I disagree. I think the real benefit is avoiding reliance on an add-on tool. Such tools are a nuisance, making archival, maintenance, etc., clumsy.

> At any rate the Spirit documentation seems to be rife with juicy comments of the form "yes it looks funky, but we're stuck with C++ here".  So it's a good place to get ideas for how to make things better.

Yup.
October 18, 2006
Bill Baxter wrote:
> [snip]
>
> Though, you know, even thinking about Boost::Spirit, I have to wonder if it really is necessary.  From the intro it says that its primary use is "extremely small micro-parsers", not a full-blown language processor. But if that's the target then the runtime overhead of translating the EBNF description to a parser would be pretty trivial.  So I guess the real benefit of a compile-time parser-generator is that your grammar can be _verified_ at compile-time.

From what I gather, that's the major benefit, other than a "self-documenting design".  All the "prettiness" of using a near-EBNF syntax in C++ code gets you close enough to actual EBNF that it's apparent what it does and how it functions.

However, the only problem with composing this as an EBNF compile-time parser is that you can't attach actions to arbitrary terminals without some sort of binding lookup.  I'm not saying it's impossible, but it'll be a little odd to use until we get some stronger reflection support.

But what you're suggesting could just as easily be a Compile-Time rendition of Enki. It's quite possible to pull off.  Especially if you digest the grammar one production at a time so as to side-step any recursion depth limitations when processing the parser templates. :)

auto grammar = new Parser(
  Production!("Number ::= NumberPart {NumberPart}",
    // binding attached to production ('all' is supplied by default?)
    void function(char[] all){
      writefln("Parsed Number: %s",all);
    }
  ),
  Production!("NumberPart ::= Sep | Digit "),
  Production!("Digit ::= 0|1|2|3|4|5|6|7|8|9"),
  Production!("Sep ::= '_' | ','")
);

// call specifying start production
grammar.parse("Number",myInput);

Depending on how you'd like the call bindings to go, you could probably go about as complex as what Enki lets you get away with.  But you'll have to accept a 'soft' binding in there someplace, hence you lose the type/name checking benefits of being at compile time.

> 
> I wonder if it would be any easier to make a compile-time grammar verifier than a full blown parser generator?   Then just do the parser-generating at runtime.

Maybe I don't fully understand, but I don't think there's a gain there.  If you've already gone through the gyrations of parsing the BNF expression, it's hardly any extra trouble to do something at each step of the resulting parse tree*.

(* of course template-based parsers use the call-tree as a parse-tree but that's beside the point)

-- 
- EricAnderton at yahoo
October 18, 2006
Walter Bright wrote:
> Bill Baxter wrote:
>> But given Don's experiments with compile-time text parsing in D, it's conceivable that in D the above parser could just be created with:
>>
>>    r = make_parser("real_number (',' real_number)*");
>>
>> I.e. use the EBNF version directly in a string literal that gets parsed at compile time.
>> That would be pretty cool.
> 
> Yes, it would be. But there's a catastrophic problem with it. Spirit enables code snippets to be attached to terminals by overloading the [] operator. If the EBNF was all in a string literal, this would be impossible.

But maybe you could allow the user to access those terminals via strings:
   r.lookup_terminal("real_number").add_action(&func);
or just
   r.add_action("real_number", &func);


>> So I guess the real benefit of a compile-time parser-generator is that your grammar can be _verified_ at compile-time.
> 
> I disagree. I think the real benefit is avoiding reliance on an add-on tool. Such tools are a nuisance, making archival, maintenance, etc., clumsy.

Hmm.  Well, if no external tools is the main benefit, then simply making Lex/Yacc (or more appropriately, Enki) into a library should be sufficient.  I guess you do need some way to attach code to terminals at runtime, but that's doable via various existing callback mechanisms. The machinery needed is basically the same as signals/slots.  You just need to be able to do something like
        connect(ASTreeNode.accept(), mycode);
 at runtime.

Then you should be able to get this kind of thing to work:

   auto r = make_parser_node("real_number (',' real_number)*");
   r.add_action("real_number", &func);

using nothing but runtime parsing of the grammar to build your AST.  No fancy templates needed, except perhaps in adding the callback to &func.
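
A minimal sketch of the string-keyed `add_action` idea in D follows. The class `ListParser` and its methods are hypothetical, and the grammar `real_number (',' real_number)*` is hard-wired here rather than built from the grammar text, so only the name-to-callback binding is illustrated. A real number is taken to be digits with at most one '.', with no sign or exponent.

```d
import std.ascii : isDigit;

/// Hypothetical parser for the fixed grammar  real_number (',' real_number)*.
/// Building the parser from the grammar string is not shown.
class ListParser
{
    alias Action = void delegate(string matched);
    private Action[][string] actions; // callbacks keyed by terminal name

    /// Attaches a callback to a terminal by name (a run-time, "soft" binding).
    void addAction(string terminal, Action a)
    {
        actions[terminal] ~= a;
    }

    /// Calls every callback registered under `terminal`, if any.
    private void fire(string terminal, string text)
    {
        if (auto list = terminal in actions)
            foreach (a; *list)
                a(text);
    }

    /// Consumes digits with at most one '.'. Returns the end index, or
    /// `pos` itself when no digit was seen.
    private size_t realNumber(string s, size_t pos)
    {
        size_t i = pos;
        bool dot = false, digits = false;
        for (; i < s.length; ++i)
        {
            if (isDigit(s[i]))
                digits = true;
            else if (s[i] == '.' && !dot)
                dot = true;
            else
                break;
        }
        return digits ? i : pos;
    }

    /// Returns true when the whole input matches the grammar. Actions fire
    /// as each number is matched, even if a later part of the input fails.
    bool parse(string s)
    {
        size_t pos = 0;
        for (;;)
        {
            immutable end = realNumber(s, pos);
            if (end == pos)
                return false;
            fire("real_number", s[pos .. end]);
            if (end == s.length)
                return true;
            if (s[end] != ',')
                return false;
            pos = end + 1;
        }
    }
}

unittest
{
    string[] seen;
    auto r = new ListParser;
    r.addAction("real_number", (string m) { seen ~= m; });
    assert(r.parse("1.5,2,0.25"));
    assert(seen == ["1.5", "2", "0.25"]);
    assert(!r.parse("1,,2"));
}
```

A misspelled terminal name passed to `addAction` is simply never fired, which is the cost of binding by string rather than by symbol.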

That kind of thing could be done in C++ too.

--bb