October 17, 2006
Walter Bright wrote:
> Don Clugston wrote:
>> The question is -- would this be worthwhile? I'm really not interested in making another toy.
>> It's straightforward, but tedious, and would double the length of std.regexp.
>> Would the use of templates be such a turn-off that people wouldn't use it?
>> Do the benefits exceed the cost?
> 
> Yes, for the following reasons:
> 
> 1) I think it would make for faster regexp's, more than just by omitting the compile step. That's because the bytecoded program wouldn't need to be interpreted.

I haven't worked out how to do backtracking sensibly when using mixins, so what I'm doing is simply compiling to the bytecoded program at compile time. Later on, it would be possible to add another step which looks for simple pattern types, and generates specific code for them (my guess is that 90% of all regexps use just one of a few simple categories).

> 2) When I show the current one to people, as opposed to Eric Niebler's C++ one, the response is "but the D one is just a toy" with the implication that D is just a toy.

<raise eyebrows> I wonder if anyone is actually using the C++ regexp in production code? Personally I get the feeling that so much of boost is extremely clever, but not terribly practical.

> 3) I wrote std.regexp long before people needed or asked for it. I knew it would eventually become a critically needed module, and that came to pass. It was nice that it was there when they needed it, and the time passed had ensured that it was a solid piece of code.
> 
> 4) Your crafting of the current toy version was the catalyst for a big leap forward in D's templating system. Doing a professional one may expose critical problems that need to be fixed, and even if no such flaws are discovered, it would prove that D's TMP capabilities are up to scratch.
> 
> Sure, a lot of people are turned off by templates. That's one of my motivations for making things like associative arrays usable without templates. But for the people who do use templates, this can be a very big deal.

There's a potential piece of syntax sugar that IMHO would be a killer feature. If it were possible to overload a function based on whether a parameter is a compile-time literal, or a variable, the use of template metaprogramming could become completely invisible to the user.

In my code, the situation is basically like this:
---------
class RegExp
{
  char [] program;

  void makeProgramAtCompileTime(char [] pattern)()
  {
	program = RegExpCompile!(pattern);
  }

  void makeProgramAtRunTime(char [] pattern)
  {
    program = runtimecompile(pattern);
  }
}
---------

Imagine if there were some way to create an alias
makeProgram(char [] pattern, int otherparameters)

which became
makeProgramAtCompileTime!(pattern)(otherparameters);
if pattern were a compile time literal,
but became
makeProgramAtRunTime(pattern, otherparameters);
if pattern were a variable.

Then the user wouldn't even need to know that they were using templates.
It would also let you do some pretty amazing stuff.
Consider something as basic as sin(x).

real sin_compiletime(real x)()
{
  static if (x==0) return x; // we can do this one
  else return sin_runtime(x); // everything else is too hard, pass it to the runtime version.
}

Which is much easier than training the optimiser that it can replace sin(0) with 0. For more complex situations, you could perform optimisations that are technically infeasible for a normal optimiser.
You could also catch more errors at compile time:

real sqrt_compiletime(real x)()
{
  static assert(x>=0, "Error, square root of negative number");
  return sqrt_runtime(x);
}

There probably aren't many cases where this kind of optimisation would actually be worth doing, but for those cases, the benefit could be considerable.

One possible syntax would be something like:

real sqrt(template real x)
{
  static if (is(x: const)) return sqrt_compiletime!(x);
  else return sqrt_runtime(x);
}

Another might be "template alias".

Something to think about; not a proposal at present.
October 17, 2006
Don Clugston wrote:
> Walter Bright wrote:
> 
>> Don Clugston wrote:
>>
>> 2) When I show the current one to people, as opposed to Eric Niebler's C++ one, the response is "but the D one is just a toy" with the implication that D is just a toy.
> 
> <raise eyebrows> I wonder if anyone is actually using the C++ regexp in production code? Personally I get the feeling that so much of boost is extremely clever, but not terribly practical.

That's certainly been my experience with boost.  It's hit-or-miss. shared_ptr stuff -- great.  functional and serialization -- clever, but ultimately a big waste of my time.  It's easier to write the dang ugly for loop than to figure out the right incantation of _1's and _2's and binders to get functional::map working.  I really wish boost had some sort of feedback form at the bottom of every page so people could rate the packages and make comments.  Kind of like the MathWorks site for user-contributed Matlab code.  Could be a good thing for D-source too.

--bb
October 17, 2006
rm wrote:
> Don Clugston wrote:
>> In the past, Eric and I both developed compile-time regex engines, but
>> they were proof-of-concept rather than something you'd actually use in
>> production code. I think this also applies to C++ metaprogramming regexp
>> engines, too.
>>
>> I've had a bit of play around with the regexp code in Phobos, and have
>> convinced myself that it would be straightforward to create a
>> compile-time wrapper for the existing engine.
>>
>> Usage could be something like:
>> --------
>> void main()
>> {
>>     char [] s = "abcabcabab";
>>          // case insensitive search
>>     foreach(m; rexSearch!("ab+", "i")(s))
>>     {
>>         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
>>     }
>> }
>> --------
>>
>> It would behave *exactly* like the existing std.regexp, except that
>> compilation into the internal form would happen via template
>> metaprogramming, so that
>> (1) all errors would be caught at compile time, and
>> (2) there'd be a minor speedup because the compilation step would not
>> happen at runtime, and
>> (3) otherwise it wouldn't be any faster than the existing regexp.
>> However, there'd be no template code bloat, either.
>>
>> Existing code would be unchanged. You could even write:
>>
>> Regexp a = StaticRegExp!("ab?(ab*)+", "g");
>>
>> (assign a pre-compiled regular expression to an existing phobos RegExp).
>>
>> There's potentially a greater speedup possible, because the Regexp class
>> could become a struct, with no need for any dynamic memory allocation;
>> but if this was done, mixing runtime and compile-time regexps together
>> wouldn't be as seamless. And of course there's load of room for future
>> enhancement.
>>
>> BUT...
>>
>> The question is -- would this be worthwhile? I'm really not interested
>> in making another toy.
>> It's straightforward, but tedious, and would double the length of
>> std.regexp.
>> Would the use of templates be such a turn-off that people wouldn't use it?
>> Do the benefits exceed the cost?
> 
> I'm not so far as looking into the current regexp module.
> But otoh I've already done some of the homework:
> 
> template findChar(char[] stringToSearch, char charToFind)
> {
>   static
>     if ( stringToSearch.length == 0
>        || stringToSearch[0] == charToFind )
>       const int findChar = 0;
>     else
>       const int findChar
>          = 1 + findChar!( stringToSearch[1..stringToSearch.length]
>                         , charToFind);
> }
> 
> gives the position of the char in the string, but if the position ==
> length of stringToSearch, charToFind is not present.
> 
> I've got some others as well, I can parse an string literal into an
> integer :-)
> 
> I'm willing to give a hand if you want.

Thanks.
Some of my old code is on dsource, in ddl/meta; it was much more difficult back then, when there were so many compiler bugs.
meta.nameof is the code I'm proudest of.

I have the basic framework in, and I've made progress on the character class stuff, which I think is the most complicated part (since it's full of bit masking operations on arrays). So far everything has been quite straightforward, and no innovation has been required.

A fantastically helpful thing you could do immediately would be to improve the code coverage of the unit tests in std.regexp. Currently it's only about 40%. For my purposes, I only need high test coverage, not necessarily sensible tests (I'll just test the compiled output). Sorry that it's not a very sexy job.
(Hint: we need some tests for the undocumented features like "a*?").
October 17, 2006
Don Clugston wrote:
> There's a potential piece of syntax sugar that IMHO would be a killer feature. If it were possible to overload a function based on whether a parameter is a compile-time literal, or a variable, the use of template metaprogramming could become completely invisible to the user.

That is a good idea.
October 17, 2006
Don Clugston wrote:
> 
> Imagine if there were some way to create an alias
> makeProgram(char [] pattern, int otherparameters)
> 
> which became
> makeProgramAtCompileTime!(pattern)(otherparameters);
> if pattern were a compile time literal,
> but became
> makeProgramAtRunTime(pattern, otherparameters);
> if pattern were a variable.
> 
> Then the user wouldn't even need to know that they were using templates.

I've always thought that meta-heavy libraries like Blitz++ were a noble attempt at something the language didn't really support, but the ability to mix compile and runtime code using the same call syntax would completely change things.

> One possible syntax would be something like:
> 
> real sqrt(template real x)
> {
>   static if (is(x: const)) return sqrt_compiletime!(x);
>   else return sqrt_runtime(x);
> }

This is pretty clear and straightforward.  Something along these lines would be a killer feature for 2.0.


Sean
October 18, 2006
On Tue, 17 Oct 2006 12:23:33 -0700, Walter Bright wrote:

> Don Clugston wrote:
>> There's a potential piece of syntax sugar that IMHO would be a killer feature. If it were possible to overload a function based on whether a parameter is a compile-time literal, or a variable, the use of template metaprogramming could become completely invisible to the user.
> 
> That is a good idea.

That put a crazy idea into my head.

disclaimer: My education is in physics not computer science, I have no idea of how to write a compiler so my view may be very wrong ;-)

I have previously suggested that D handle functions on compile-time
literal by making them inline and let the optimizer evaluate them.

Would it be possible to combine the template-engine and the optimizer
such that one of them could be refactored away ??
October 18, 2006
I don't know.
October 18, 2006
Knud Sørensen wrote:
> On Tue, 17 Oct 2006 12:23:33 -0700, Walter Bright wrote:
> 
>> Don Clugston wrote:
>>> There's a potential piece of syntax sugar that IMHO would be a killer feature. If it were possible to overload a function based on whether a parameter is a compile-time literal, or a variable, the use of template metaprogramming could become completely invisible to the user.
>> That is a good idea.
> 
> That put a crazy idea into my head.
> 
> disclaimer: My education is in physics not computer science, I have no idea of how to write a compiler so my view may be very wrong ;-)
> 
> I have previously suggested that D handle functions on compile-time
> literal by making them inline and let the optimizer evaluate them.  Would it be possible to combine the template-engine and the optimizer
> such that one of them could be refactored away ?? 
This is a very interesting question and I've thought a fair bit about this.

Three short answers:

1. This would be easy to do (ie it is only heavy syntax sugar) for a limited subset of the language (this subset would exclude reference semantics).

2. It would be possible to do for almost the entire language, but would require a real interpreter at compile time, which means a fair bit more work (but still possible and very appealing).

3. Neither of these two mechanisms would be enough to do away with the primary motivation for templates -- parameterised types. However, they would remove the need for a Turing-complete template system.


<rant>
Long answers:

1. Value semantics are inherently easier for a compiler to handle, because evaluation of them is just done by inlining, const folding and const propagation. Right now, an automated conversion of normal D functions to template form is pretty much possible, using the following rules:
   - convert every function to a template
   - convert every assignment to a variable into static single assignment form:
        a = 5;
        a = 6;
            becomes
        const a__1 = 5;
        const a__2 = 6;
   - convert function calls into template instantiations
        a = foo();
            becomes
        const a__1 = foo!().__val;
   - convert returns into
       const __val = whatever;
   - now that everything is constant, we can use static if instead of if everywhere. So, just convert all ifs into static ifs.
   - convert loops into recursive functions:
      while (condition)
      {  ... }
        becomes
      template __Loop1(local_variables....)
      {
         // Do stuff
         static if (condition) __val = _Loop10!(new_local_variables).__val;
      }

This basically does it but, since the template system only allows value semantics, this conversion must only allow them, so it means:
  - no pointers
  - no classes
  - no index-setting/slice-setting of arrays
  - no inout/out parameters
and also
  - no assembly
  - no C code
because the template system can't be expected to handle them.


2. By implementing a real interpreter in the compiler, you can handle reference semantics. The only things you can't handle are
  - assembly
  - other languages
  - global non-const variables (which should generally be avoided anyway)




Both of these systems would incorporate a way of saying, 'evaluate this at compile time', even if it were just as simple as declaring the variable const, using it as a template parameter or in a static if:

  version (good) { const int N = 80; }
  else { const int N = 16 }
  const foo = sqrt(N); // foo is 4
  static if (sqrt(foo) > 3) // that expression must be a compile-time constant
    alias ubyte MyByte;
  else
    alias byte MyByte;
  List!(foo, MyByte) myList;

The main motivations for this are, (in decreasing order of importance):
  - ability to use runtime functions for static if and template parameters (where static if is important because of
  - early catching of errors (eg incorrect regexes)
  - efficiency (eg not having to compile the regex at runtime; also, having cheap reflection at compile time)



In my opinion, fully supporting an interpreter at compile time simply turns metaprogramming into normal D with the requirement that it be optimized away by the compiler, and the extensibility and opportunities that this could allow is a huge motivation. However, being a very large amount of work with a lot of details, there is no hope of it being completed any time soon, but I think the motivation is so much that it is worth devoting an entire release of D (maybe 2.0 or 3.0) to being a 'metaprogramming release.'

</rant>

Cheers,

Reiner
October 18, 2006
Reiner Pope wrote:
> Knud Sørensen wrote:
>> On Tue, 17 Oct 2006 12:23:33 -0700, Walter Bright wrote:
>>
>>> Don Clugston wrote:

> Three short answers:
> 
> 1. This would be easy to do (ie it is only heavy syntax sugar) for a limited subset of the language (this subset would exclude reference semantics).
> 
> 2. It would be possible to do for almost the entire language, but would require a real interpreter at compile time, which means a fair bit more work (but still possible and very appealing).
> 
> 3. Neither of these two mechanisms would be enough to do away with the primary motivation for templates -- parameterised types. However, they would remove the need for a Turing-complete template system.
...
> Both of these systems would incorporate a way of saying, 'evaluate this at compile time', even if it were just as simple as declaring the variable const, using it as a template parameter or in a static if:
> 
>   version (good) { const int N = 80; }
>   else { const int N = 16 }
>   const foo = sqrt(N); // foo is 4
>   static if (sqrt(foo) > 3) // that expression must be a compile-time constant
>     alias ubyte MyByte;
>   else
>     alias byte MyByte;
>   List!(foo, MyByte) myList;
> 
> The main motivations for this are, (in decreasing order of importance):
>   - ability to use runtime functions for static if and template parameters (where static if is important because of
>   - early catching of errors (eg incorrect regexes)
>   - efficiency (eg not having to compile the regex at runtime; also, having cheap reflection at compile time)

This is exactly what I've been hoping someone would invent too.  It seems like the next logical step in template metaprogramming.  I think Walter's hip to the idea too.  Soon after the last big discussion about metaprogramming stuff and making it more like a true compile-time programming language, "static if" appeared.  I think that was something that Walter realized was "low-hanging fruit", which could quickly be implemented today and have a big impact now without waiting around till D 3.0.

So, I think it's vaguely on the D roadmap somewhere, it's just a huge undertaking and will take time.

--bb
October 21, 2006
Don Clugston wrote:
> 
> It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that
> (1) all errors would be caught at compile time, and
> (2) there'd be a minor speedup because the compilation step would not happen at runtime, and
> (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either.
> 

Whoa, "internal form" and "bytecoded program"? Out of curiosity, for those ignorant on the matter, like me, what kind of processing is done when creating a regexp, in terms of this internal form you speak of? Is it converted to a simple internal form, or something more complex? Bytecoded program seems pretty complex stuff, especially for a regexp (isn't the translation direct) ?

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D