Thread overview
"spirit" like template parser generator in 100 Lines of code (almost)
Dec 18, 2006
BCS
Dec 19, 2006
BCS
Dec 19, 2006
Don Clugston
Dec 19, 2006
BCS
More Thoughts [was: "spirit" in 100LOC]
Dec 19, 2006
BCS
December 18, 2006
(The almost is with regards to it working not the 100 LOC, if all blank lines and comments are removed it is exactly 100 LOC.)

It actual doesn't work because of several things. However, some of these are bugs (I think) and the rest I am putting forward as feature requests.

What I have actually done is make an almost working prototype of a template system that generates functions for a recursive decent parser. It is to be mixed into a class or struct. The parameter for the template is a string containing almost normal BNF reduction definitions.

It extends BNF by allowing "|" for "or" but doesn't implement optional or repeated terms.

It isn't implemented yet, but actions will be defined by providing specializations for the "Action" template.

Example:

<code>
struct P
{
 char[] data;
 int pos;

 PObject Action(char[]string:"ADD_ACT")(PObject l, PObject e, PObject r)
 {
  Numeric ln = cast(Numeric) l;
  Numeric rn = cast(Numeric) r;
  return new AddExp(ln, rn);
 }
 ... //some more actions

 PObject ParseRule(char[] string : "PLUS")()
 {
  if (data[pos] == '+')
  {
   pos++;
   return PObject.pass;
  }
  return PObject.fail;
 }
 ... // more terminals

// ADD_EXP is the name of the reduction
// MUL2ADD, ADD_ACT, SUB_ACT, CAT2ADD are action names
 mixin Parser!("ADD_EXP :
      MUL2ADD / MUL_EXP |
      ADD_ACT / ADD_EXP PLUS MUL_EXP |
      SUB_ACT / ADD_EXP MINUS MUL_EXP |
      CAT2ADD / CAT_EXP;"); // also has more non terminals.
}

vod main()
{
 P p;
 p.data = GetData();
 p.pos = 0;
 if(ret = cast(AddExp)p.ParseRule!("ADD_EXP")())
  //good
}
</code>

(The templates are at the bottom of the post)

Now for why it doesn't work.

===Bugs:

-- Mixins don't work quite right. --
this for instance doesn't work unless the second line in foo is removed:

template First(char[] string) { const char[] First = string[0..1] }
template Temp(char[] string)
{
 static if(string != "")
 {
  void TempFn(char[] s : First!(string)) {}
  mixin Temp!(string[1..$]);
 }
}
struct S
{
 mixin Temp!("ab");
 void Foo()
 {
  TempFn!("a")();
  TempFn!("b")();// this fails
 }
}

what seems to be happening is that only the first iteration of the template is actual mixed in. Use of pragma(msg,...) seems to show that template does recurse correctly.

I think this is a bug

-- The parameter on a foreach isn't const. --
Actual this doesn't stop anything from working, it just makes some things more difficult.

foreach(i;TupleMaker!())
 Template!(i)();

becomes:

alias TupleMaker!() tempTuple
foreach(j,_;tempTuple)
{
 const i = tempTuple[j];
 Template!()();
}

This is not a trivial complaint. While I was working on my BigNum template I noticed that the const variable adds ops to the function even with the -O flag.


===New Feature Needed To Make This Work.


-- Abstract Templates --

The idea here is that a template marked as abstract would have no generic implementation, only specializations. Furthermore the compiler would assume that any given specializations it is asked for exists and leave it up to the linker to find the correct implementation. this would allow cyclical calls by specializations of a template.

abstract template foo(char[] s)
{
 void foo(uint);
}

template foo(char[] s : "A")
{
 void foo(uint i)
 {  // nothing generated here
  if(i) foo!("B")(i-1);
 }
}

template foo(char[] s : "B") // foo!("B") generated here.
{
 void foo(uint i)
 {
  if(i>1) foo!("A")(i-2);
  if(i>1) foo!("C")(0); // will cause a link error
 }
}

-- "static foreach" at same level as "static if" --

two parts:
 make it "static forach"
 allow it the same places as "static if"

template foo(char[] string)
{
 static foreach(i;BreakBy!(string,','))
  mixin Other!(i);
}

this isn't strictly needed to make my template work, but it makes it a whole lot cleaner.


===================


Now for what you all have been waiting for!!!!!

origonal formatting at:
http://www.webpages.uidaho.edu/~shro8822/dspirit.d

<code>
/************
 Discard leading whitespace
*/
template DropWhite(char[] string)
{
 static if(string == "")
  alias string DropWhite;
 else static if(string[0] == ' ' || string[0] == '\t' || string[0] == '\r' || string[0] == '\n')
  alias DropWhite!(string[1..$]) DropWhite;
 else
  alias string DropWhite;
}

/************
 return a slice upto the first whitespace char
*/
template UseTillWhite(char[] string)
{
 static if(string == "" || string[0] == ' ' || string[0] == '\t' || string[0] == '\r' || string[0] == '\n')
  const char[] UseTillWhite = "";
 else
  const char[] UseTillWhite = string[0] ~ UseTillWhite!(string[1..$]);
}

/************
 return a slice starting with the first whitespace char
*/
template DiscardTillWhite(char[] string)
{
 static if(string == "" || string[0] == ' ' || string[0] == '\t' || string[0] == '\r' || string[0] == '\n')
  alias string DiscardTillWhite;
 else
  alias DiscardTillWhite!(string[1..$]) DiscardTillWhite;
}

/************
 Return a slice up-to but not including the first instance of t.
*/
template UseTill(char[] string, char t)
{
 static if(string == "" || string[0] == t)
  const char[] UseTill = "";
 else
  const char[] UseTill = string[0..1] ~ UseTill!(string[1..$],t);
}

/***********
 Return a slice starting after the first instance of t and containing the rest of the string.
*/
template DiscardTill(char[] string, char t)
{
 static if(string == "")
  const char[] DiscardTill = "";
 else static if(string[0] == t)
  const char[] DiscardTill = string[1..$];
 else
  const char[] DiscardTill = DiscardTill!(string[1..$],t);
}

/***********
 discard [ ]*
 then return [a-zA-Z_][a-zA-Z0-9_]*
*/
template GetID(char[] string) { alias GetID_A!(DropWhite!(string)) GetID; }
template GetID_A(char[] string)
{
 static if(string == "")
  const char[] GetID_A = "";
 else static if('_' == string[0] || ('a' <= string[0] && string[0] <= 'z') || ('A' <= string[0] && string[0] <= 'Z'))
  const char[] GetID_A = string[0..1] ~ GetID_B!(string[1..$]);
 else
  const char[] GetID_A = "";
}
template GetID_B(char[] string)
{
 static if(string == "")
  const char[] GetID_B = "";
 else static if('_' == string[0] || ('a' <= string[0] && string[0] <= 'z') || ('A' <= string[0] && string[0] <= 'Z') || ('0' <= string[0] && string[0] <= '9'))
  const char[] GetID_B = string[0..1] ~ GetID_B!(string[1..$]);
 else
  const char[] GetID_B = "";
}

/***********
 mixin temp with {V + 'string' broken up by 'c'}
*/
template BreakBy(char[] string, char d, V...)
{
 static if(string == "")
  alias V BreakBy;
 else static
  alias BreakBy!(DiscardTill!(string,d), d, V, UseTill!(string,d)) BreakBy;
}

/***********
 mixin temp with {V + 'string' broken up by 'c'}
*/
template SplitWhite(char[] string, V...)
{
 static if(DropWhite!(string).length == 0)
  alias V SplitWhite;
 else
  alias SplitWhite!(DropWhite!(DiscardTillWhite!(string)), V, UseTillWhite!(DropWhite!(string))) SplitWhite;
}

/****
PARSER  : RULE PARSER | ;
RULE  : ID.name ":" CASE ALT ";";
ALT  : "|" CASE ALT | ;
CASE  : ID.action "/" SEQUENCE;
SEQUENCE : ID SEQUENCE | ;
*/
template Parser(char[] string)
{
 static if(string != "") // test for PARSER.2
 {
   // get RULE
  private static const char[] rule = DropWhite!(UseTill!(string, ';'));

   // report
  pragma(msg, rule);

   // instance RULE
  bool ParseRule(char[] name : GetID!(rule))()
  {
    // record start location
   uint start = this.pos;
    // try caes
   alias BreakBy!(DropWhite!(DiscardTill!(rule,':')),'|') cases;
   caseLoop: foreach(ci,c;cases)
   {
    const char[] Case = cases[ci];
    pragma(msg, "caseLoop: "~Case);
     // return to start location
    this.pos = start;

     // allocate storage for returns
    alias SplitWhite!(DiscardTill!(Case,'/')) clauses;

     // try clauses
    foreach(index, cl;clauses)
    {
     const char[] clause = clauses[index];
     pragma(msg, "clauseLoop: "~clause);
      // parse item
     bool fail = ParseRule!(GetID!(clause))();
      // if failed try next case
     if(fail) continue caseLoop;
      // otherwise try next clause
    }
     // enact reduction when all clauses fit.
    return false;
   }

   return true;
  }
   // recur on remainder of PARSER
  mixin Parser!(DiscardTill!(string, ';'));
 }
}
</code>
December 19, 2006
More work and some other thoughts.

BCS wrote:
>[...]
> What I have actually done is make an almost working prototype of a template system that generates functions for a recursive decent parser. It is to be mixed into a class or struct. The parameter for the template is a string containing almost normal BNF reduction definitions.
> 
>[...]
>
> It isn't implemented yet, but actions will be defined by providing specializations for the "Action" template.
>

This is now done in one form, I'm not sure I quite like it though...

[...]
> Now for why it doesn't work.
> 
> ===Bugs:
> 
> -- Mixins don't work quite right. --
> 
> what seems to be happening is that only the first iteration of the template is actual mixed in. Use of pragma(msg,...) seems to show that template does recurse correctly.
> 
> I think this is a bug

A bit more work and it seems that a mixin can't add new specializations to a template in the scope that is mixed into.

template a(){template b(char : 'c'){uint i;}}

struct S
{
	template b(char : 'd'){uint i;}
	mixin a!();
	// only b!('d') avalable
	// b!('c') masked by 'd' version
}

>[...]
> ===New Feature Needed To Make This Work.
> 
> 
> -- Abstract Templates --
> 
> The idea here is that a template marked as abstract would have no generic implementation, only specializations. Furthermore the compiler would assume that any given specializations it is asked for exists and leave it up to the linker to find the correct implementation. this would allow cyclical calls by specializations of a template.
> 

After a bit of playing around I'm not sure that this is strictly needed in this case. The order that templates are evaluated in may make it unneeded. I'd have to play around with it more to know for sure. However having the concept of an "abstract template" would remove this from the realm of "implementation specific details".

source:
http://www.webpages.uidaho.edu/~shro8822/dspirit.d

exmaple:
http://www.cs.uidaho.edu/~temp0092/test_t.d
December 19, 2006
BCS wrote:
> -- The parameter on a foreach isn't const. --
> Actual this doesn't stop anything from working, it just makes some things more difficult.
> 
> foreach(i;TupleMaker!())
>  Template!(i)();
> 
> becomes:
> 
> alias TupleMaker!() tempTuple
> foreach(j,_;tempTuple)
> {
>  const i = tempTuple[j];
>  Template!()();
> }
> 
> This is not a trivial complaint. While I was working on my BigNum template I noticed that the const variable adds ops to the function even with the -O flag.

Another alternative workaround is this:
-------
template countTo(int a, A...)
{
    static if (a==0) alias A countTo;
    else static if (a&1) alias countTo!(a-1, void, A) countTo;
    else alias countTo!(a/2, A, A) countTo;
}

foreach(j,_; countTo!(tempTuple.length)) {
  const i = tempTuple[j];
}
In fact, if you can calculate the value of 'i' based only on 'j', this can be a much more general solution, for example for your Bignum template. It is very quick, can cope with numbers as high as 20000 or so, and doesn't generate any runtime code (in fact, it can't, because there's nothing you can do with a tuple of 20000 voids, except count it).
It works well for cases like:

foreach(i,_; countTo!(s.length)) {
  static if (s[i]=='+') { ... }
  ...
}
allowing you to iterate over the contents of a string.
There are some pretty cool things that work here -- you can add labels in one iteration, and generate goto statements (or asm jmp instructions) in a different iteration. I don't know how that works, but it's awesome because it lets you turn a compile-time string into a run-time state machine...
December 19, 2006
Don Clugston wrote:
> Another alternative workaround is this:
> -------
> template countTo(int a, A...)
> {
>     static if (a==0) alias A countTo;
>     else static if (a&1) alias countTo!(a-1, void, A) countTo;
>     else alias countTo!(a/2, A, A) countTo;
> }
> 
> foreach(j,_; countTo!(tempTuple.length)) {
>   const i = tempTuple[j];
> }
> In fact, if you can calculate the value of 'i' based only on 'j', this can be a much more general solution, for example for your Bignum template. It is very quick, can cope with numbers as high as 20000 or so, and doesn't generate any runtime code (in fact, it can't, because there's nothing you can do with a tuple of 20000 voids, except count it).
> It works well for cases like:
> 
> foreach(i,_; countTo!(s.length)) {
>   static if (s[i]=='+') { ... }
>   ...
> }
> allowing you to iterate over the contents of a string.
> There are some pretty cool things that work here -- you can add labels in one iteration, and generate goto statements (or asm jmp instructions) in a different iteration. I don't know how that works, but it's awesome because it lets you turn a compile-time string into a run-time state machine...

That would be interesting to try. In fact the use of a "make dummy tuple list" meta function was the next thing I planed to do for BigNum. (BTW, my first hack at it soaked up my whole system with a~=5k)

OTOH that still doesn't solve the "instance for each" case. For that you still need a local const. Anyway, neat stuff.




December 19, 2006
After a bit of consideration and puzzling, I think I have figured out what is going on with the mixins, templates and specializations. From this, I'm not sure if what is happening is a bug or not. However it definitely prevents my program from working and blocks off a whole set of functionality for D templates.

*What I'm trying to do:*

I'm trying to mixin several function templates into the same scope, all with the same name, but differently specialized. This would effectively allow for identifier generation from strings.

void Foo(char[] s: "hello")(){}
void Foo(char[] s: "world")(){}

void fn(char[] string)()
{
 Foo!(string)();	// go find a Foo to call
}

The usefulness of this comes in when the string that is used comes from template processing. A dumb example of this could convert a string to an expression evaluation (why someone would want to do that...)

Eval!("SET this FROM that USING something over there")();
goes to:
Val!("this")=Act!("that")(Val!("something"),Val!("over"),Val!("there"));

*What is Happening That Prevents This:*

Things are getting mixed in for the most part. However it seems specializations for a template must all reside in the same "mixin scope". Supposedly through the use of named mixins and aliases, overloading of functions using more than one mixin can work. However this doesn't seem to work when generating specializations. This seems to be the same problem as experienced by Bill Baxter, namely that function templates aren't functions, but templates with functions in them. This causes problems because some things that work for functions don't work for templates. Importantly, templates mixed in from different scope don't overload.

*What Would Be Needed To Allow This:*

Two things would be needed. Firstly, some means to make a mixed in template (a template inside of a template that is used as a mixin) overload with other templates at the scope it is mixed into. Without this, none of this will be of much use.
	Secondly, A problems involving order-of-instancing shows up with templates as it is. If two template specialization refer to each other, there is the possibility that one of them will try to instance the other before the specialization for the other is created. The result would be either a compile error or an attempt to instance the wrong template. On the other hand, if I'm not missing something, a compiler would also be correct to lazy evaluate specializations, in which case this problem may well not occur. Some way to ensure the desired behavior here would be needed.
	One solution would be to define an "abstract template" at the outermost scope. This template would have no implementation, but would specify what symbols are to be generated. Any use of this template would skip the argument deduction rules for templates and just assume that an exact specialization will exist. If one is not specified, then an error is generated later, probably at link time.

Template Mapper(char[] from, char[] to)
{
 template Map(char[] string: from)
 {
  const char[] Map = to;
 }
}

struct Foo
{
 abstract template Map(char[])
 {
  const char[] Map;
 }

 void fn()
 {
  writeln("%s", Map!("hello"));	// maps "hello" to "world"
  writeln("%s", Map!("good"));	// link error
 }

 mixin Mapper!("hello", "world");
}

[1] see mixin.html down at the bottom
[2] digitalmars.D.learn:"Q about template function overloads"
[3] see "Argument Deduction" in template.html