October 19, 2006
Walter Bright wrote:
> Bill Baxter wrote:
> 
>> But given Don's experiments with compile-time text parsing in D, it's conceivable that in D the above parser could just be created with:
>>
>>    r = make_parser("real_number (',' real_number)*");
>>
>> I.e. use the EBNF version directly in a string literal that gets parsed at compile time.
>> That would be pretty cool.
> 
> 
> Yes, it would be. But there's a catastrophic problem with it. Spirit enables code snippets to be attached to terminals by overloading the [] operator. If the EBNF was all in a string literal, this would be impossible.


How about delegate literals for code snippets?


template parse(char[] rulename, char[] rule, T, T delegate(T[])[] act)
{
	// mixin allows this to be used in the scope
	// parse!(rulename)(out T);

	// for rule ==
	// "AssignExpression | AssignExpression ',' Expression",
	// parse!(rulename) expands to something like this

		// mixin a specialization
	bool parse!(rulename)(out T ret)
	{
		T[2] set;
		if(rule!("AssignExpression")(set[0]))
		{
			ret = act[0](set);
			return true;
		}
		if(rule!("AssignExpression")(set[0]) &&
			rule!("Expression")(set[1]))
		{
			ret = act[1](set);
			return true;
		}
		return false;
	}
	
}

///used something like this

class Parser : RootParser
{
	mixin parse("Expression",
			"
			AssignExpression |
			AssignExpression ',' Expression
			",
		Expr,
		[
		(Expr[] e){return e[0];},
		(Expr[] e){return new Expr(e[0], e[1]);}
		]
		);

	mixin parse("AssignExpression",
			"
			ConditionalExpression |
			ConditionalExpression '=' AssignExpression |
			ConditionalExpression '+=' AssignExpression
			"
		Expr,
		[
		(Expr[] e){return e[0];},
		(Expr[] e){return new AssignExpr(e[0], e[1]);},
		(Expr[] e){return new AssignExpr(e[0], e[1]);}
		]
		);

}

//// I've never used mixins so I most likely have something wrong in there
October 19, 2006
Pragma wrote:
> Bill Baxter wrote:
>> [snip]
>  >
>> Though, you know, even thinking about Boost::Spirit, I have to wonder if it really is necessary.  From the intro it says that its primary use is "extremely small micro-parsers", not a full-blown language processor. But if that's the target then the runtime overhead of translating the EBNF description to a parser would be pretty trivial.  So I guess the real benefit of a compile-time parser-generator is that your grammar can be _verified_ at compile-time.
> 
>  From what I gather, that's the major benefit, other than a "self-documenting design".  All the "prettyness" of using a near EBNF syntax in C++ code gets you close enough to actual EBNF that it's apparent what and how it functions.
> 
> However, the only problem with composing this as an EBNF compile-time parser, is that you can't attach actions to arbitrary terminals without some sort of binding lookup.  I'm not saying it's impossible, but it'll be a little odd to use until we get some stronger reflection support.
> 
> But what you're suggesting could just as easily be a Compile-Time rendition of Enki. It's quite possible to pull off.  Especially if you digest the grammar one production at a time as to side-step any recursion depth limitations when processing the parser templates. :)

Yes!  Sounds like we're thinking along the same lines here.  But if Walter's right, that the compile-time verification is not a big deal, then it would be even simpler.

Actually it sounds very similar to the way writing shader code for OpenGL/Direct3D works.  You have to compile the code to use it, but conveniently compilation is so fast that you can do it at run-time easily.  Or if you prefer, you can still precompile it.  What I like to do is set up my IDE to go ahead and precompile my shaders just so I can check for errors at compile time, but then I use the runtime compilation in the end anyway because that makes some things easier -- like modifying the code on the fly.

It actually works pretty well I think.  The only difference between shader code and grammar code is that shader code doesn't need to make any callbacks.  But callbacks aren't hard.

> auto grammar = new Parser(
>   Production!("Number ::= NumberPart {NumberPart}",
>     // binding attached to production ('all' is supplied by default?)
>     void function(char[] all){
>       writefln("Parsed Number: %s",all);
>     }
>   ),
>   Production!("NumberPart ::= Sep | Digit "),
>   Production!("Digit ::= 0|1|2|3|4|5|6|7|8|9"),
>   Production!("Sep ::= '_' | ','")
> );
> 
> // call specifying start production
> grammar.parse("Number",myInput);

That's one way to do it, but I think you could also allow bindings to be attached after the fact:

 auto grammar = new Parser(
     "Number ::= NumberPart {NumberPart}
      NumberPart ::= Sep | Digit
      Digit ::= 0|1|2|3|4|5|6|7|8|9
      Sep ::= '_' | ','");

 grammer.attach("Number",
     // binding attached to production ('all' is supplied by default?)
     void function(char[] all){
       writefln("Parsed Number: %s",all);
     });

This is _exactly_ how parameter binding works in shader code.  Just here the value we're binding is a function pointer instead of a texture coordinate or something.
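
To make that concrete, here is a bare-bones sketch of what a runtime-built parser with attach-after-the-fact bindings might look like.  Parser, attach and parse are just assumed names (not an existing library), and the parse body is a stub -- the grammar-to-rules step is completely hand-waved:

    import std.stdio;

    // Hypothetical runtime-built parser; only attach/parse echo the example
    // above, everything else is invented for illustration.
    class Parser
    {
        // production name -> action to call when that production matches
        private void function(char[] all)[string] actions;

        this(string grammar)
        {
            // imagine the EBNF in `grammar` being turned into rules here
        }

        void attach(string production, void function(char[] all) action)
        {
            actions[production] = action;   // late binding, like a shader uniform
        }

        bool parse(string start, char[] input)
        {
            // stub: pretend the start production matched the whole input
            if (auto act = start in actions)
                (*act)(input);
            return true;
        }
    }

    void main()
    {
        auto grammar = new Parser("Number ::= NumberPart {NumberPart} ...");
        grammar.attach("Number", function(char[] all) {
            writefln("Parsed Number: %s", all);
        });
        grammar.parse("Number", "1,234_567".dup);
    }

The interesting bit is that attach() is nothing more than filling in a table of function pointers, which is exactly the shader-uniform analogy.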

> Depending on how you'd like the call bindings to go, you could probably go about as complex as what Enki lets you get away with.  But you'll have to accept a 'soft' binding in there someplace, hence you lose the type/name checking benefits of being at compile time.

I'll have to take your word for it.  You mean in Enki you can say that Number has to output something convertible to 'real'?

>> I wonder if it would be any easier to make a compile-time grammar verifier than a full blown parser generator?   Then just do the parser-generating at runtime.
> 
> Maybe I don't fully understand, but I don't think there's a gain there.  If you've already gone through the gyrations of parsing the BNF expression, it's hardly any extra trouble to do something at each step of the resulting parse tree*.
> 
> (* of course template-based parsers use the call-tree as a parse-tree but that's besides the point)

Yeh, I was just talking crap.  I thought maybe you might be able to save some bookkeeping if all you cared about was that the grammar made a valid tree, but didn't care about its output.  But probably it's the other way around.  Checking validity is the hard part, not making a tree.

--bb
October 19, 2006
Richard Koch wrote:
> Bill Baxter wrote:
>> Walter Bright wrote:

>> // spirit:
>> num_p >> *( ch_p(',') >> num_p)
>>
>> // C#
>> Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))
>>
>>
> all that is cool, but (i know i am the dummy here) readability as in bnf is something that eludes me. better to go for coco?
> 

You mean this coco?
http://www.ssw.uni-linz.ac.at/Coco/

Not sure what you mean by coco being more readable than BNF.  Coco's grammar looks pretty much like BNF to me.

----- from Taste.atg -----
VarDecl                  (. wchar_t* name; int type; .)
= Type<type>
  Ident<name>            (. tab->NewObj(name, var, type); .)
  { ',' Ident<name>      (. tab->NewObj(name, var, type); .)
  } ';'.
--------------------------

As far as I can tell that's just good old EBNF with some annotations.
   VarDecl ::= Type Ident ("," Ident)* ";"

--bb
October 19, 2006
Bill Baxter wrote:
> Pragma wrote:
>> Bill Baxter wrote:
>>> [snip]
>>  >
>>> Though, you know, even thinking about Boost::Spirit, I have to wonder if it really is necessary.  From the intro it says that its primary use is "extremely small micro-parsers", not a full-blown language processor. But if that's the target then the runtime overhead of translating the EBNF description to a parser would be pretty trivial.  So I guess the real benefit of a compile-time parser-generator is that your grammar can be _verified_ at compile-time.
>>
>>  From what I gather, that's the major benefit, other than a "self-documenting design".  All the "prettyness" of using a near EBNF syntax in C++ code gets you close enough to actual EBNF that it's apparent what and how it functions.
>>
>> However, the only problem with composing this as an EBNF compile-time parser, is that you can't attach actions to arbitrary terminals without some sort of binding lookup.  I'm not saying it's impossible, but it'll be a little odd to use until we get some stronger reflection support.
>>
>> But what you're suggesting could just as easily be a Compile-Time rendition of Enki. It's quite possible to pull off.  Especially if you digest the grammar one production at a time as to side-step any recursion depth limitations when processing the parser templates. :)
> 
> Yes!  Sounds like we're thinking along the same lines here.  But if Walter's right, that the compile-time verification is not a big deal, then it would be even simpler.
> 
> Actually it sounds very similar to the way writing shader code for OpenGL/Direct3D works.  You have to compile the code to use it, but conveniently compilation is so fast that you can do it at run-time easily.  Or if you prefer, you can still precompile it.  What I like to do is set up my IDE to go ahead and precompile my shaders just so I can check for errors at compile time, but then I use the runtime compilation in the end anyway because that makes some things easier -- like modifying the code on the fly.
> 
> It actually works pretty well I think.  The only difference between shader code and grammar code is that shader code doesn't need to make any callbacks.  But callbacks aren't hard.
> 
>> auto grammar = new Parser(
>>   Production!("Number ::= NumberPart {NumberPart}",
>>     // binding attached to production ('all' is supplied by default?)
>>     void function(char[] all){
>>       writefln("Parsed Number: %s",all);
>>     }
>>   ),
>>   Production!("NumberPart ::= Sep | Digit "),
>>   Production!("Digit ::= 0|1|2|3|4|5|6|7|8|9"),
>>   Production!("Sep ::= '_' | ','")
>> );
>>
>> // call specifying start production
>> grammar.parse("Number",myInput);
> 
> That's one way to do it, but I think you could also allow bindings to be attached after the fact:
> 
>  auto grammar = new Parser(
>      "Number ::= NumberPart {NumberPart}
>       NumberPart ::= Sep | Digit
>       Digit ::= 0|1|2|3|4|5|6|7|8|9
>       Sep ::= '_' | ','");
> 
>  grammar.attach("Number",
>      // binding attached to production ('all' is supplied by default?)
>      void function(char[] all){
>        writefln("Parsed Number: %s",all);
>      });
> 
> This is _exactly_ how parameter binding works in shader code.  Just here the value we're binding is a function pointer instead of a texture coordinate or something.
> 
>> Depending on how you'd like the call bindings to go, you could probably go about as complex as what Enki lets you get away with.  But you'll have to accept a 'soft' binding in there someplace, hence you lose the type/name checking benefits of being at compile time.
> 
> I'll have to take your word for it.  You mean in Enki you can say that Number has to output something convertible to 'real'?

Yes and no.  The parser generator has a good deal of flexibility built-in, including a pseudo-variant type that tries to perform conversions wherever possible.  For instance, if we re-wrote the production for Number like so:

Number
  = real handleNumber(whole,fraction)
  ::= (NumberPart {NumberPart}):whole '.'
      (NumberPart {NumberPart}):fraction;

... Enki would emit code that binds the chars traversed for 'whole' and 'fraction', and passes those on to a function called 'handleNumber' that returns a real.  That return value is passed up the parse chain so that other terminals can bind to it:

Foobar = writeMe(foo) ::= Number:foo;

And so on.
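
For those who haven't seen Enki output, here is a rough guess at the *shape* of what might get emitted for that Number production.  The helper names (matchChar, matchDigits, parseNumber) are made up for illustration; the real generated code will look different:

    import std.ascii : isDigit;
    import std.conv  : to;

    // Stand-ins for the generated terminal matchers.
    bool matchChar(ref char[] s, char c)
    {
        if (s.length && s[0] == c) { s = s[1 .. $]; return true; }
        return false;
    }

    bool matchDigits(ref char[] s, out char[] lexeme)
    {
        size_t i;
        while (i < s.length && (isDigit(s[i]) || s[i] == '_' || s[i] == ','))
            ++i;                              // NumberPart ::= Sep | Digit
        if (i == 0) return false;
        lexeme = s[0 .. i];
        s = s[i .. $];
        return true;
    }

    // The user-supplied semantic action named in the production.
    real handleNumber(char[] whole, char[] fraction)
    {
        char[] digits(char[] part)
        {
            char[] r;
            foreach (ch; part) if (isDigit(ch)) r ~= ch;
            return r;
        }
        return to!real(digits(whole) ~ "." ~ digits(fraction));
    }

    // Roughly what the generated rule could look like: bind the slices,
    // call the action, and hand its result up the parse chain.
    bool parseNumber(ref char[] input, out real value)
    {
        char[] whole, fraction;
        if (!matchDigits(input, whole))    return false;   // (...):whole
        if (!matchChar(input, '.'))        return false;   // '.'
        if (!matchDigits(input, fraction)) return false;   // (...):fraction
        value = handleNumber(whole, fraction);
        return true;
    }

(The real thing handles errors, alternation, the variant conversions mentioned above, and so on -- this is only meant to show where 'whole', 'fraction' and handleNumber end up.)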

> 
>>> I wonder if it would be any easier to make a compile-time grammar verifier than a full blown parser generator?   Then just do the parser-generating at runtime.
>>
>> Maybe I don't fully understand, but I don't think there's a gain there.  If you've already gone through the gyrations of parsing the BNF expression, it's hardly any extra trouble to do something at each step of the resulting parse tree*.
>>
>> (* of course template-based parsers use the call-tree as a parse-tree but that's besides the point)
> 
> Yeh, I was just talking crap.  I thought maybe you might be able to save some bookkeeping if all you cared about was that the grammar made a valid tree, but didn't care about its output.  But probably it's the other way around.  Checking validity is the hard part, not making a tree.
> 
> --bb
October 19, 2006
Walter Bright wrote:

> I think it would be worth looking at again. The C# version of it doesn't use operator overloading or even templates.

> 3) it's been touted as a reason why D is no good and C++ roolz

As a Spirit user, I can tell you that readability is one of the most important factors. So, IMHO, if you want to target your 'point 3', I think you must have at least the same amount of operator overloading you can have in C++.

Or C++ guys will continue to argue that D is no good and C++ roolz. ;-P

---
Paolo Invernizzi
October 19, 2006
Paolo Invernizzi wrote:
> Walter Bright wrote:
> 
>> I think it would be worth looking at again. The C# version of it doesn't use operator overloading or even templates.
> 
>> 3) it's been touted as a reason why D is no good and C++ roolz
> 
> As a Spirit user, I can tell you that readability is one of the most important factors. So, IMHO, if you want to target your 'point 3', I think you must have at least the same amount of operator overloading you can have in C++.
> 
> Or C++ guys will continue to argue that D is no good and C++ roolz. ;-P

You're probably right.
October 19, 2006
On Thu, 19 Oct 2006 00:12:41 +0300, Walter Bright <newshound@digitalmars.com> wrote:

> Bill Baxter wrote:
>> But given Don's experiments with compile-time text parsing in D, it's conceivable that in D the above parser could just be created with:
>>     r = make_parser("real_number (',' real_number)*");
>>  I.e. use the EBNF version directly in a string literal that gets parsed at compile time.
>> That would be pretty cool.
>
> Yes, it would be. But there's a catastrophic problem with it. Spirit enables code snippets to be attached to terminals by overloading the [] operator. If the EBNF was all in a string literal, this would be impossible.
>

Well, couldn't one use arrays to define actions? For example:

    r = make_parser("real_number[0] (',' real_number[1])*");

    auto array = [&firstAction,   // [0]
                  &generalAction  // [1]
                 ];
    r.attach(array);


You could also give string ids for actions:

    r = make_parser("real_number[myaction] (',' real_number[real])*");

    array[] = {"myaction", &firstAction,
               "real", &generalAction
               };
    r.attach(array);
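
Roughly how that could hang together at runtime -- make_parser doesn't exist, so this sketch uses a made-up RuntimeParser class, and fire() stands in for whatever the generated rules would call when a tagged terminal matches:

    import std.stdio;

    alias Action = void function(char[] matched);

    // Hypothetical runtime parser using the "[N] tags in the grammar" idea.
    class RuntimeParser
    {
        private Action[] actions;   // indexed by the [N] tags

        this(string grammar)
        {
            // translate the EBNF plus its [N]/[name] markers into rules here
        }

        void attach(Action[] acts) { actions = acts; }

        // called when the terminal tagged [i] matches
        void fire(size_t i, char[] matched)
        {
            if (i < actions.length && actions[i] !is null)
                actions[i](matched);
        }
    }

    void firstAction(char[] m)   { writefln("first:   %s", m); }
    void generalAction(char[] m) { writefln("general: %s", m); }

    void main()
    {
        auto r = new RuntimeParser("real_number[0] (',' real_number[1])*");
        r.attach([&firstAction, &generalAction]);
        r.fire(0, "3.14".dup);   // normally the parser would do this itself on a match
    }

The attach step is just swapping in an array of function pointers, so the grammar string and the action table stay completely separate.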
October 19, 2006
Not a bad idea.
October 19, 2006
Sorry about that garbled post, let's try that again:

Walter Bright wrote:

> Bill Baxter wrote:
>
>> But given Don's experiments with compile-time text parsing in D, it's conceivable that in D the above parser could just be created with:
>>
>>    r = make_parser("real_number (',' real_number)*");
>>
>> I.e. use the EBNF version directly in a string literal that gets parsed at compile time.
>> That would be pretty cool.
>
>
>
> Yes, it would be. But there's a catastrophic problem with it. Spirit enables code snippets to be attached to terminals by overloading the [] operator. If the EBNF was all in a string literal, this would be impossible.



How about delegate literals for code snippets?


template parse(char[] rulename, char[] rule, T, T delegate(T[])[] act)
{
    // mixin allows this to be used in the scope
    // parse!(rulename)(out T);

    // for rule ==
    // "AssignExpression | AssignExpression ',' Expression",
    // parse!(rulename) expands to something like this

        // mixin a specialization
    bool parse!(rulename)(out T ret)
    {
        T[2] set;
        if(rule!("AssignExpression")(set[0]))
        {
            ret = act[0](set);
            return true;
        }
        if(rule!("AssignExpression")(set[0]) &&
            rule!("Expression")(set[1]))
        {
            ret = act[1](set);
            return true;
        }
        return false;
    }

}

///used something like this

class Parser : RootParser
{
    mixin parse("Expression",
            "
            AssignExpression |
            AssignExpression ',' Expression
            ",
        Expr,
        [
        (Expr[] e){return e[0];},
        (Expr[] e){return new Expr(e[0], e[1]);}
        ]
        );

    mixin parse("AssignExpression",
            "
            ConditionalExpression |
            ConditionalExpression '=' AssignExpression |
            ConditionalExpression '+=' AssignExpression
            "
        Expr,
        [
        (Expr[] e){return e[0];},
        (Expr[] e){return new AssignExpr(e[0], e[1]);},
        (Expr[] e){return new AssignExpr(e[0], e[1]);}
        ]
        );

}

//// I've never used mixins so I most likely have something wrong in there
October 19, 2006
Walter Bright wrote:

> I disagree. I think the real benefit is avoiding reliance on an add-on tool. Such tools are a nuisance; making archival, maintenance, etc., clumsy.

I do not see why transferring a tool into a library should reduce any clumsiness.

The process of developing compilers is still more an art than an engineering discipline, and who can claim that the D community is able to establish such an engineering principle by the mere fact that D has templates and mixins?