Thread overview
Would there be interest in a SERIOUS compile-time regex parser?
Oct 16, 2006
Don Clugston
Oct 16, 2006
rm
Oct 17, 2006
Don Clugston
Oct 16, 2006
Hasan Aljudy
Oct 16, 2006
Andrey Khropov
Oct 16, 2006
Pragma
Oct 16, 2006
Walter Bright
Oct 16, 2006
rm
Oct 17, 2006
Don Clugston
Oct 17, 2006
Bill Baxter
Oct 17, 2006
Walter Bright
Oct 18, 2006
Knud Sørensen
Oct 18, 2006
Walter Bright
Oct 18, 2006
Reiner Pope
Oct 18, 2006
Bill Baxter
Oct 17, 2006
Sean Kelly
Oct 16, 2006
Michael Butscher
Oct 17, 2006
BCS
Oct 17, 2006
Georg Wrede
Oct 21, 2006
Bruno Medeiros
Oct 21, 2006
Don Clugston
Oct 26, 2006
Bruno Medeiros
Oct 27, 2006
Don Clugston
Oct 24, 2006
Benji Smith
October 16, 2006
In the past, Eric and I both developed compile-time regex engines, but they were proof-of-concept rather than something you'd actually use in production code. I think the same applies to the C++ metaprogramming regexp engines.

I've had a bit of a play around with the regexp code in Phobos, and have convinced myself that it would be straightforward to create a compile-time wrapper for the existing engine.

Usage could be something like:
--------
void main()
{
	char[] s = "abcabcabab";
	// case insensitive search
	foreach(m; rexSearch!("ab+", "i")(s))
	{
		writefln("%s[%s]%s", m.pre, m.match(0), m.post);
	}
}
--------

It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that
(1) all errors would be caught at compile time, and
(2) there'd be a minor speedup because the compilation step would not happen at runtime, and
(3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either.
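
For anyone reading this thread later: the newer std.regex module in Phobos (which postdates this discussion) has a ctRegex template that takes the pattern and flags as template arguments, so a malformed pattern is rejected by the compiler. The sketch below uses that module, not the rexSearch wrapper proposed above, which remains hypothetical. The helper name allHits is only illustrative.

```d
import std.regex : ctRegex, matchAll;
import std.stdio : writeln;

// Illustrative helper (not part of Phobos): collects every match of a
// pattern that was compiled while the program was being built.
string[] allHits(string input)
{
    // Pattern and flags are template arguments, so a syntax error in the
    // pattern is reported at compile time rather than at runtime.
    auto re = ctRegex!("ab+", "i");
    string[] hits;
    foreach (m; matchAll(input, re))
        hits ~= m.hit;
    return hits;
}

void main()
{
    writeln(allHits("abcABcabab"));
}
```

Whether this also runs faster than a runtime-compiled pattern depends on the engine and the input, so it is worth measuring rather than assuming.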

Existing code would be unchanged. You could even write:

RegExp a = StaticRegExp!("ab?(ab*)+", "g");

(assign a pre-compiled regular expression to an existing Phobos RegExp).

There's potentially a greater speedup possible, because the RegExp class could become a struct, with no need for any dynamic memory allocation; but if this were done, mixing runtime and compile-time regexps together wouldn't be as seamless. And of course there's loads of room for future enhancement.

BUT...

The question is -- would this be worthwhile? I'm really not interested in making another toy.
It's straightforward, but tedious, and would double the length of std.regexp.
Would the use of templates be such a turn-off that people wouldn't use it?
Do the benefits exceed the cost?
October 16, 2006
Don Clugston wrote:
> In the past, Eric and I both developed compile-time regex engines, but they were proof-of-concept rather than something you'd actually use in production code. I think this also applies to C++ metaprogramming regexp engines, too.
> 
> I've had a bit of play around with the regexp code in Phobos, and have convinced myself that it would be straightforward to create a compile-time wrapper for the existing engine.
> 
> Usage could be something like:
> --------
> void main()
> {
>     char [] s = "abcabcabab";
>          // case insensitive search
>     foreach(m; rexSearch!("ab+", "i")(s))
>     {
>         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
>     }
> }
> --------
> 
> It would behave *exactly* like the existing std.regexp, except that
> compilation into the internal form would happen via template
> metaprogramming, so that
> (1) all errors would be caught at compile time, and
> (2) there'd be a minor speedup because the compilation step would not
> happen at runtime, and
> (3) otherwise it wouldn't be any faster than the existing regexp.
> However, there'd be no template code bloat, either.
> 
> Existing code would be unchanged. You could even write:
> 
> Regexp a = StaticRegExp!("ab?(ab*)+", "g");
> 
> (assign a pre-compiled regular expression to an existing phobos RegExp).
> 
> There's potentially a greater speedup possible, because the Regexp class could become a struct, with no need for any dynamic memory allocation; but if this was done, mixing runtime and compile-time regexps together wouldn't be as seamless. And of course there's load of room for future enhancement.
> 
> BUT...
> 
> The question is -- would this be worthwhile? I'm really not interested
> in making another toy.
> It's straightforward, but tedious, and would double the length of
> std.regexp.
> Would the use of templates be such a turn-off that people wouldn't use it?
> Do the benefits exceed the cost?

I haven't got as far as looking into the current regexp module yet. But on the other hand, I've already done some of the homework:

template findChar(char[] stringToSearch, char charToFind)
{
  // Recursion stops at a match or at the end of the string.
  static if (stringToSearch.length == 0
             || stringToSearch[0] == charToFind)
    const int findChar = 0;
  else
    const int findChar
        = 1 + findChar!(stringToSearch[1..stringToSearch.length],
                        charToFind);
}

This gives the position of the char in the string. If the result equals the length of stringToSearch, charToFind is not present.
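
A quick way to check that behaviour is with static assert. The version below is a sketch that restates the template in present-day D syntax, which is an assumption on my part: string replaces char[] and enum replaces const.

```d
// Same recursion as the template above, restated in D2 syntax.
template findChar(string stringToSearch, char charToFind)
{
    static if (stringToSearch.length == 0
               || stringToSearch[0] == charToFind)
        enum int findChar = 0;
    else
        enum int findChar
            = 1 + findChar!(stringToSearch[1 .. stringToSearch.length],
                            charToFind);
}

// Found: the result is the zero-based index of the first occurrence.
static assert(findChar!("hello", 'h') == 0);
static assert(findChar!("hello", 'l') == 2);

// Not found: the result equals the length of the string.
static assert(findChar!("hello", 'z') == 5);

void main() {}
```

All of the checks run during compilation, so a wrong expectation stops the build.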

I've got some others as well. I can parse a string literal into an integer :-)

I'm willing to give a hand if you want.

roel

October 16, 2006
I only speak for myself, but here are my 2 cents anyway:

I don't see myself ever needing this kind of thing.
As for the extra speedup, well, I've never been a speed freak; I'd always rather sacrifice performance in favor of code readability.

So I don't see this to be worthwhile.

Don Clugston wrote:
> In the past, Eric and I both developed compile-time regex engines, but they were proof-of-concept rather than something you'd actually use in production code. I think this also applies to C++ metaprogramming regexp engines, too.
> 
> I've had a bit of play around with the regexp code in Phobos, and have convinced myself that it would be straightforward to create a compile-time wrapper for the existing engine.
> 
> Usage could be something like:
> --------
> void main()
> {
>     char [] s = "abcabcabab";
>          // case insensitive search
>     foreach(m; rexSearch!("ab+", "i")(s))
>     {
>         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
>     }
> }
> --------
> 
> It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that
> (1) all errors would be caught at compile time, and
> (2) there'd be a minor speedup because the compilation step would not happen at runtime, and
> (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either.
> 
> Existing code would be unchanged. You could even write:
> 
> Regexp a = StaticRegExp!("ab?(ab*)+", "g");
> 
> (assign a pre-compiled regular expression to an existing phobos RegExp).
> 
> There's potentially a greater speedup possible, because the Regexp class could become a struct, with no need for any dynamic memory allocation; but if this was done, mixing runtime and compile-time regexps together wouldn't be as seamless. And of course there's load of room for future enhancement.
> 
> BUT...
> 
> The question is -- would this be worthwhile? I'm really not interested in making another toy.
> It's straightforward, but tedious, and would double the length of std.regexp.
> Would the use of templates be such a turn-off that people wouldn't use it?
> Do the benefits exceed the cost?
October 16, 2006
Don Clugston wrote:

> Do the benefits exceed the cost?

I think yes. IMHO the compile-time version should be the default in std.regexp.

And it's also a good use case for showing D's features to the world at large.

-- 
AKhropov
October 16, 2006
Don Clugston wrote:
> In the past, Eric and I both developed compile-time regex engines, but they were proof-of-concept rather than something you'd actually use in production code. I think this also applies to C++ metaprogramming regexp engines, too.
> 
> I've had a bit of play around with the regexp code in Phobos, and have convinced myself that it would be straightforward to create a compile-time wrapper for the existing engine.
> 
> Usage could be something like:
> --------
> void main()
> {
>     char [] s = "abcabcabab";
>          // case insensitive search
>     foreach(m; rexSearch!("ab+", "i")(s))
>     {
>         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
>     }
> }
> --------
> 
> It would behave *exactly* like the existing std.regexp, except that compilation into the internal form would happen via template metaprogramming, so that
> (1) all errors would be caught at compile time, and
> (2) there'd be a minor speedup because the compilation step would not happen at runtime, and
> (3) otherwise it wouldn't be any faster than the existing regexp. However, there'd be no template code bloat, either.
> 
> Existing code would be unchanged. You could even write:
> 
> Regexp a = StaticRegExp!("ab?(ab*)+", "g");
> 
> (assign a pre-compiled regular expression to an existing phobos RegExp).
> 
> There's potentially a greater speedup possible, because the Regexp class could become a struct, with no need for any dynamic memory allocation; but if this was done, mixing runtime and compile-time regexps together wouldn't be as seamless. And of course there's load of room for future enhancement.
> 
> BUT...
> 
> The question is -- would this be worthwhile? I'm really not interested in making another toy.
> It's straightforward, but tedious, and would double the length of std.regexp.
> Would the use of templates be such a turn-off that people wouldn't use it?
> Do the benefits exceed the cost?

It's always difficult to foresee the ramifications of anything that is new in this sense.  I'm curious as to how many people have used the Boost implementation... maybe that would give you an idea of how much real-world potential it has.

FWIW, I could use it within Enki for regular expressions, when they're used in an input grammar.  It would yield some nice speed increases for regex-heavy designs, w/o having to re-implement it all over again for Enki's sake.  The only requirement I have is that it must handle UTF correctly, preferably UTF-32 or a templated char type.

-- 
- EricAnderton at yahoo
October 16, 2006
Don Clugston wrote:
> The question is -- would this be worthwhile? I'm really not interested in making another toy.
> It's straightforward, but tedious, and would double the length of std.regexp.
> Would the use of templates be such a turn-off that people wouldn't use it?
> Do the benefits exceed the cost?

Yes, for the following reasons:

1) I think it would make for faster regexps, more than just by omitting the compile step. That's because the bytecoded program wouldn't need to be interpreted.

2) When I show the current one to people, as opposed to Eric Niebler's C++ one, the response is "but the D one is just a toy" with the implication that D is just a toy.

3) I wrote std.regexp long before people needed or asked for it. I knew it would eventually become a critically needed module, and that came to pass. It was nice that it was there when they needed it, and the time that had passed ensured that it was a solid piece of code.

4) Your crafting of the current toy version was the catalyst for a big leap forward in D's templating system. Doing a professional one may expose critical problems that need to be fixed, and even if no such flaws are discovered, it would prove that D's TMP capabilities are up to scratch.
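
Point 1 can be illustrated on a much smaller scale than a full regex engine. When the pattern is a template argument, the compiler can emit straight-line comparisons instead of a loop that walks a pattern at runtime. The sketch below handles only a literal prefix, not regex syntax; matchesPrefix is a hypothetical name, and it assumes a compiler that supports static foreach.

```d
// Hypothetical helper: tests whether `input` begins with the literal
// `pattern`. Because `pattern` is a template argument, `static foreach`
// unrolls the comparison into one `if` per character at compile time,
// so no pattern data is read or interpreted at runtime.
bool matchesPrefix(string pattern)(const(char)[] input)
{
    if (input.length < pattern.length)
        return false;
    static foreach (i, c; pattern)
    {
        if (input[i] != c)
            return false;
    }
    return true;
}

void main()
{
    import std.stdio : writeln;
    writeln(matchesPrefix!"ab"("abc"));
    writeln(matchesPrefix!"ab"("ba"));
}
```

Each distinct pattern produces its own function instance, which is the code-size cost that comes with this approach.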

Sure, a lot of people are turned off by templates. That's one of my motivations for making things like associative arrays usable without templates. But for the people who do use templates, this can be a very big deal.

I think it should be in a separate module, say, regexp_static or something like that. Ideally, the user can switch between the two just by changing the module name in his code, and compare.
October 16, 2006
Don Clugston wrote:
> The question is -- would this be worthwhile? I'm really not interested
> in making another toy.
> It's straightforward, but tedious, and would double the length of
> std.regexp.

Maybe it would be simpler, but similarly effective, to add some persistence functionality to std.regexp.RegExp.

This could mean converting an instance of RegExp to a byte array, caching it in a file, and later loading the array back to recreate the instance.

So the "compilation" of the RE to its opcodes is done only once and program initialization becomes faster.
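
A minimal sketch of that round trip, assuming the compiled form can be exposed as a flat byte array. CompiledPattern, save and load are hypothetical names; std.regexp does not offer this interface.

```d
import std.file : read, remove, tempDir, write;
import std.path : buildPath;

// Hypothetical stand-in for a compiled regex; the opcode bytes here are
// placeholders, not real std.regexp opcodes.
struct CompiledPattern
{
    ubyte[] program;
}

// Store the opcode buffer in a cache file.
void save(CompiledPattern p, string path)
{
    write(path, p.program);
}

// Rebuild the pattern from the cache file without recompiling.
CompiledPattern load(string path)
{
    return CompiledPattern(cast(ubyte[]) read(path));
}

void main()
{
    auto path = buildPath(tempDir(), "pattern.cache");
    ubyte[] ops = [1, 2, 3];
    auto original = CompiledPattern(ops);
    save(original, path);
    auto restored = load(path);
    assert(restored.program == original.program);
    remove(path);
}
```

A real cache would also need some way to detect a stale file, for example when the opcode format changes between library versions.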


Michael
October 16, 2006
Walter Bright wrote:
> Don Clugston wrote:
>> The question is -- would this be worthwhile? I'm really not interested
>> in making another toy.
>> It's straightforward, but tedious, and would double the length of
>> std.regexp.
>> Would the use of templates be such a turn-off that people wouldn't use
>> it?
>> Do the benefits exceed the cost?
> 
> Yes, for the following reasons:
> 
> 1) I think it would make for faster regexp's, more than just by omitting the compile step. That's because the bytecoded program wouldn't need to be interpreted.
> 
> 2) When I show the current one to people, as opposed to Eric Niebler's C++ one, the response is "but the D one is just a toy" with the implication that D is just a toy.
> 
> 3) I wrote std.regexp long before people needed or asked for it. I knew it would eventually become a critically needed module, and that came to pass. It was nice that it was there when they needed it, and the time passed had ensured that it was a solid piece of code.
> 
> 4) Your crafting of the current toy version was the catalyst for a big leap forward in D's templating system. Doing a professional one may expose critical problems that need to be fixed, and even if no such flaws are discovered, it would prove that D's TMP capabilities are up to scratch.
> 
> Sure, a lot of people are turned off by templates. That's one of my motivations for making things like associative arrays usable without templates. But for the people who do use templates, this can be a very big deal.
> 
> I think it should be in a separate module, say, regexp_static or something like that. Ideally, the user can switch between the two just by changing the module name in his code, and compare.

To code stuff like this, it would really come in handy if there were a compiler switch that traces which template instantiations are going on, and with which arguments.

For the moment I'm putting lots of pragmas in to trace this stuff manually, but it's so verbose ;-)
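
For anyone who hasn't tried it, the manual tracing looks roughly like this. It is a sketch in present-day D syntax; countChar is just an illustrative template, not something from the regexp work.

```d
// Counts occurrences of `c` in `s` at compile time and reports each
// instantiation while the compiler is running.
template countChar(string s, char c)
{
    // Emitted by the compiler once per distinct set of arguments.
    pragma(msg, "countChar!(\"", s, "\", '", c, "')");

    static if (s.length == 0)
        enum int countChar = 0;
    else
        enum int countChar
            = (s[0] == c ? 1 : 0) + countChar!(s[1 .. s.length], c);
}

static assert(countChar!("abab", 'b') == 2);

void main() {}
```

The messages appear in the compiler output, one per instantiation, which is exactly what gets verbose once the templates grow.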

roel
October 17, 2006
Don Clugston wrote:
> BUT...
> 
> The question is -- would this be worthwhile? I'm really not interested in making another toy.
> It's straightforward, but tedious, and would double the length of std.regexp.
> Would the use of templates be such a turn-off that people wouldn't use it?
> Do the benefits exceed the cost?

I have an idea for a project that would benefit from it.
October 17, 2006
Don Clugston wrote:
> However, there'd be no template code bloat, either.
> 
> Existing code would be unchanged. You could even write:
> 
> Regexp a = StaticRegExp!("ab?(ab*)+", "g");
> 
> (assign a pre-compiled regular expression to an existing phobos RegExp).

IMHO, this would be a Killer App in template programming.

And definitely one of the show-off gems in D.