Thread overview
RegExps and templates
Feb 27, 2005
xs0
Feb 27, 2005
zwang
Feb 27, 2005
xs0
Feb 27, 2005
Georg Wrede
Feb 28, 2005
xs0
February 27, 2005
Hello!

I was reading some of the RegExp posts and a thought occured to me - would it be possible to express regular expressions using meta-programming techniques? If so, it would be fairly easy to do compiler support for them - it would just rewrite the RE as a template instance, and optimization and whatnot would be inherited automatically.

To support RE also at run-time, it may be neat to have the templates work in two modes - either as static functions or as runtime objects, to allow runtime use.

It would also allow construction of REs by explicitly instancing templates, and the whole infrastructure could be expanded by implementing new templates that do matching, as long as they implemented some specified interface (this would have no compiler support, of course).

Now, the questions are:
- is this possible at all? I'm no RE expert, so I can't answer that..
- what would be the required interfaces?
- what would the templates look like?
- if everything else is worked out, is Walter willing to include this in the language?

Finally, for a partial example (with only matching):

interface REPart
{
// returns true if there is a match
// pos gets updated to where the match ends, or left alone if not
    bit isMatch(dchar[] text, inout int pos);
}

class MatchChar(dchar C) : REPart
{
// this is the static part:
    static bit isMatch(dchar[] text, inout int pos)
    {
        if (text[pos]==C) {
            ++pos;
            return 1;
        }
        return 0;
    }

// this is the dynamic part:
    dchar val;
    this(dchar val)
    {
        this.val=val;
    }

    bit isMatch(dchar[] text, inout int pos)
    {
        if (text[pos]==val) {
            ++pos;
            return 1;
        }
        return 0;
    }

}

class MatchCharRange(dchar A, dchar B) : REPart
{
    static bit isMatch(dchar[] text, inout int pos)
    {
        if (A<=text[pos] && text[pos]<=B) {
           ++pos;
           return 1;
        }
        return 0;
    }
// and the same for dynamic support
}

class MatchCount(int min, int max, SUB) : REPart
{
    static bit isMatch(dchar[] text, inout int pos)
    {
        int tmppos=pos;
        int count=0;
        while (SUB.isMatch(text, tmppos))
            count++;

        if (count>=min && count<=max) {
            pos=tmppos;
            return 1;
        }
        return 0;
    }
}

class MatchEither(A,B) : REPart
{
    static bit isMatch(dchar[] text, inout int pos)
    {
        int posA=pos;
        bit matchA=A.isMatch(text, posA);

        int posB=pos;
        bit matchB=B.isMatch(text, posB);

        if (matchA && matchB) { // match the longer one
            pos=posA>posB?posA:posB;
            return 1;
        }

        if (matchA || matchB) {
            pos=matchA?posA:posB;
            return 1;
        }

        return 0;
    }
}

class MatchTwo(A,B) : REPart
{
    static bit isMatch(dchar[] text, inout int pos)
    {
        int tmppos=pos;
        if (A.isMatch(text, tmppos) && B.isMatch(text, tmppos)) {
            pos=tmppos;
            return 1;
        }
        return 0;
    }
}


And now, something like

[0-9]+([.][0-9]+)?

would become

MatchTwo!(
    MatchCount!(1, int.max,
        MatchCharRange!('0', '9')
    ),
    MatchCount!(0, 1,
        MatchTwo!(
            MatchChar!('.'),
            MatchCount!(1, int.max,
                MatchCharRange!('0', '9')
            )
        )
    )
).isMatch(..)



Thoughts?


xs0
February 27, 2005
Interesting.  But what's the benefit of using template-based RE?

Isn't "MatchTwo!(MatchCount!(1, int.max, MatchCharRange!('0', '9')),
MatchCount!(0, 1, MatchTwo!(MatchChar!('.'), MatchCount!(1, int.max,
MatchCharRange!('0', '9')))))" overly verbose?
What's wrong with the good old succinct form "[0-9]+([.][0-9]+)?"?



xs0 wrote:
> Hello!
> 
> I was reading some of the RegExp posts and a thought occured to me - would it be possible to express regular expressions using meta-programming techniques? If so, it would be fairly easy to do compiler support for them - it would just rewrite the RE as a template instance, and optimization and whatnot would be inherited automatically.
> 
> To support RE also at run-time, it may be neat to have the templates work in two modes - either as static functions or as runtime objects, to allow runtime use.
> 
> It would also allow construction of REs by explicitly instancing templates, and the whole infrastructure could be expanded by implementing new templates that do matching, as long as they implemented some specified interface (this would have no compiler support, of course).
> 
> Now, the questions are:
> - is this possible at all? I'm no RE expert, so I can't answer that..
> - what would be the required interfaces?
> - what would the templates look like?
> - if everything else is worked out, is Walter willing to include this in the language?
> 
> Finally, for a partial example (with only matching):
> 
> interface REPart
> {
> // returns true if there is a match
> // pos gets updated to where the match ends, or left alone if not
>     bit isMatch(dchar[] text, inout int pos);
> }
> 
> class MatchChar(dchar C) : REPart
> {
> // this is the static part:
>     static bit isMatch(dchar[] text, inout int pos)
>     {
>         if (text[pos]==C) {
>             ++pos;
>             return 1;
>         }
>         return 0;
>     }
> 
> // this is the dynamic part:
>     dchar val;
>     this(dchar val)
>     {
>         this.val=val;
>     }
> 
>     bit isMatch(dchar[] text, inout int pos)
>     {
>         if (text[pos]==val) {
>             ++pos;
>             return 1;
>         }
>         return 0;
>     }
> 
> }
> 
> class MatchCharRange(dchar A, dchar B) : REPart
> {
>     static bit isMatch(dchar[] text, inout int pos)
>     {
>         if (A<=text[pos] && text[pos]<=B) {
>            ++pos;
>            return 1;
>         }
>         return 0;
>     }
> // and the same for dynamic support
> }
> 
> class MatchCount(int min, int max, SUB) : REPart
> {
>     static bit isMatch(dchar[] text, inout int pos)
>     {
>         int tmppos=pos;
>         int count=0;
>         while (SUB.isMatch(text, tmppos))
>             count++;
> 
>         if (count>=min && count<=max) {
>             pos=tmppos;
>             return 1;
>         }
>         return 0;
>     }
> }
> 
> class MatchEither(A,B) : REPart
> {
>     static bit isMatch(dchar[] text, inout int pos)
>     {
>         int posA=pos;
>         bit matchA=A.isMatch(text, posA);
> 
>         int posB=pos;
>         bit matchB=B.isMatch(text, posB);
> 
>         if (matchA && matchB) { // match the longer one
>             pos=posA>posB?posA:posB;
>             return 1;
>         }
> 
>         if (matchA || matchB) {
>             pos=matchA?posA:posB;
>             return 1;
>         }
> 
>         return 0;
>     }
> }
> 
> class MatchTwo(A,B) : REPart
> {
>     static bit isMatch(dchar[] text, inout int pos)
>     {
>         int tmppos=pos;
>         if (A.isMatch(text, tmppos) && B.isMatch(text, tmppos)) {
>             pos=tmppos;
>             return 1;
>         }
>         return 0;
>     }
> }
> 
> 
> And now, something like
> 
> [0-9]+([.][0-9]+)?
> 
> would become
> 
> MatchTwo!(
>     MatchCount!(1, int.max,
>         MatchCharRange!('0', '9')
>     ),
>     MatchCount!(0, 1,
>         MatchTwo!(
>             MatchChar!('.'),
>             MatchCount!(1, int.max,
>                 MatchCharRange!('0', '9')
>             )
>         )
>     )
> ).isMatch(..)
> 
> 
> 
> Thoughts?
> 
> 
> xs0
February 27, 2005
Well, the benefit is that it gets compiled to native code that handles exactly that RE. The idea is to make it simple to support in the compiler, otherwise it will never happen, I guess... And naturally, if it was compiler-supported, you would only write "[0-9]+([.][0-9]+)?" with a char or two indicating it was a RE :)


xs0

zwang wrote:
> Interesting.  But what's the benefit of using template-based RE?
> 
> Isn't "MatchTwo!(MatchCount!(1, int.max, MatchCharRange!('0', '9')),
> MatchCount!(0, 1, MatchTwo!(MatchChar!('.'), MatchCount!(1, int.max,
> MatchCharRange!('0', '9')))))" overly verbose?
> What's wrong with the good old succinct form "[0-9]+([.][0-9]+)?"?
February 27, 2005
Did you read the thread "What is a regular expression" (started feb 16)? How is this related to it?

I mean, is this in addition to that, or instead of it? Or something else?

xs0 wrote:
> Well, the benefit is that it gets compiled to native code that handles exactly that RE. The idea is to make it simple to support in the compiler, otherwise it will never happen, I guess... And naturally, if it was compiler-supported, you would only write "[0-9]+([.][0-9]+)?" with a char or two indicating it was a RE :)
> 
> 
> xs0
> 
> zwang wrote:
> 
>> Interesting.  But what's the benefit of using template-based RE?
>>
>> Isn't "MatchTwo!(MatchCount!(1, int.max, MatchCharRange!('0', '9')),
>> MatchCount!(0, 1, MatchTwo!(MatchChar!('.'), MatchCount!(1, int.max,
>> MatchCharRange!('0', '9')))))" overly verbose?
>> What's wrong with the good old succinct form "[0-9]+([.][0-9]+)?"?
February 28, 2005
Georg Wrede wrote:
> Did you read the thread "What is a regular expression" (started feb 16)? How is this related to it?
> 
> I mean, is this in addition to that, or instead of it? Or something else?

Yes I did, that's what got me started. In this thread I was trying to focus on implementation, rather than syntax or other topics. Unfortunately, it seems it was premature, as there are hardly any replies :)


xs0