Thread overview
Problem with RegExp
Jan 02, 2008
Matthew
Jan 02, 2008
Russell Lewis
Jan 02, 2008
Matthew
Jan 02, 2008
dennis luehring
Jan 02, 2008
Matthew
Jan 02, 2008
dennis luehring
Jan 03, 2008
Matthew
Jan 03, 2008
Tom
Jan 03, 2008
Matthew
January 02, 2008
I was playing around with RegExp and noticed that it does not work the way I think it should. The behavior is the same with both 1.0 and 2.0.

import std.stdio;
import std.regexp;

void main (char [][] args) {
    string text = "Why doesn't it find the sssss's?";

    RegExp pattern = new RegExp(r"[^\s]+");
   //Notice the escape code in the expression.

    RegExp list = pattern.search(text);

    foreach(m; list) {
        writefln(m.match(0));
    }

}

The regular expression should match one or more non-whitespace characters. On my computer the whitespace characters don't match, but neither do lowercase s's. Interestingly enough, if I try the following:

RegExp pattern = new RegExp(r"[^\W]+");

I get the exact same behavior, except that now capital W's aren't matched instead of lowercase s's. I don't know if this is a bug or not, as I've never used the RegExp class before, but I wonder if it does this for everyone?
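
For reference, here is a minimal sketch of the behavior the post expects, run through Python's `re` module rather than D's std.regexp. Python is an assumption chosen only as an independent engine to compare against, and the variable names are illustrative. It says nothing about how std.regexp is implemented.

```python
import re

text = "Why doesn't it find the sssss's?"

# [^\s]+ : one or more characters that are NOT whitespace.
non_space_runs = re.findall(r"[^\s]+", text)

# [^\W]+ : one or more characters that are NOT non-word characters,
# which is the same set as \w+.
word_runs = re.findall(r"[^\W]+", text)

print(non_space_runs)
print(word_runs)
```

In this engine the lowercase s's and capital W's stay inside the matches, which is what the original post expected from RegExp.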
January 02, 2008
Matthew wrote:
> I was playing around with RegExp and noticed that it does not work the way I think it should. The behavior is the same with both 1.0 and 2.0.
> 
> import std.stdio;
> import std.regexp;
> 
> void main (char [][] args) {
>     string text = "Why doesn't it find the sssss's?";
> 
>     RegExp pattern = new RegExp(r"[^\s]+");    //Notice the escape code in the expression.
> 
>     RegExp list = pattern.search(text);
> 
>     foreach(m; list) {
>         writefln(m.match(0));
>     }
> 
> }
> 
> The regular expression should match one or more non-whitespace characters. On my computer the whitespace characters don't match, but neither do lowercase s's. Interestingly enough, if I try the following:
> 
> RegExp pattern = new RegExp(r"[^\W]+");
> 
> I get the exact same behavior, except that now capital W's aren't matched instead of lowercase s's. I don't know if this is a bug or not, as I've never used the RegExp class before, but I wonder if it does this for everyone?

Do you need a double-backslash?
January 02, 2008
Russell Lewis Wrote:

> Matthew wrote:
> > I was playing around with RegExp and noticed that it does not work the way I think it should. The behavior is the same with both 1.0 and 2.0.
> > 
> > import std.stdio;
> > import std.regexp;
> > 
> > void main (char [][] args) {
> >     string text = "Why doesn't it find the sssss's?";
> > 
> >     RegExp pattern = new RegExp(r"[^\s]+");
> >    //Notice the escape code in the expression.
> > 
> >     RegExp list = pattern.search(text);
> > 
> >     foreach(m; list) {
> >         writefln(m.match(0));
> >     }
> > 
> > }
> > 
> > The regular expression should match one or more non-whitespace characters. On my computer the whitespace characters don't match, but neither do lowercase s's. Interestingly enough, if I try the following:
> > 
> > RegExp pattern = new RegExp(r"[^\W]+");
> > 
> > I get the exact same behavior, except that now capital W's aren't matched instead of lowercase s's. I don't know if this is a bug or not, as I've never used the RegExp class before, but I wonder if it does this for everyone?
> 
> Do you need a double-backslash?

I did until I put the r before the quote: r"". I still got the same behavior, though.
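
To illustrate the raw-string point, here is a small sketch in Python, whose `r"..."` literals behave analogously to D's `r"..."` strings in that backslashes are not treated as escapes. Using Python here is an assumption for illustration only, and the names `raw` and `escaped` are hypothetical.

```python
# A raw literal keeps the backslash as written.
raw = r"[^\s]+"

# A normal literal needs the backslash doubled to produce the same text.
escaped = "[^\\s]+"

# Both spell the same six characters: [ ^ \ s ] +
print(raw == escaped)
print(len(raw))
```

Since both spellings hand the regex engine the same pattern text, switching between them would not be expected to change the matching behavior.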

January 02, 2008
> void main (char [][] args) {
>     string text = "Why doesn't it find the sssss's?";
> 
>     RegExp pattern = new RegExp(r"[^\s]+");    //Notice the escape code in the expression.
> 
>     RegExp list = pattern.search(text);
> 
>     foreach(m; list) {
>         writefln(m.match(0));
>     }
> 
> }

I think it's a bug.

\s matches invisible chars AND the char s

[^\s] seems to be interpreted like [^s\s]
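
One way to probe this hypothesis is to write the suspected class out by hand in a different engine and compare its output with what RegExp prints. The sketch below uses Python's `re` module, which is an assumption made only for comparison. It shows what `[^s\s]+` would produce and does not demonstrate what std.regexp does internally.

```python
import re

text = "Why doesn't it find the sssss's?"

# The pattern as written in the original post.
intended = re.findall(r"[^\s]+", text)

# The suspected misreading: the letter s is excluded along with whitespace.
suspected = re.findall(r"[^s\s]+", text)

print(intended)
print(suspected)
```

If the D program prints the same pieces as `suspected`, with every lowercase s missing, that would be consistent with the escape being read as both the whitespace class and a literal s.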
January 02, 2008
dennis luehring Wrote:

> > void main (char [][] args) {
> >     string text = "Why doesn't it find the sssss's?";
> > 
> >     RegExp pattern = new RegExp(r"[^\s]+");
> >    //Notice the escape code in the expression.
> > 
> >     RegExp list = pattern.search(text);
> > 
> >     foreach(m; list) {
> >         writefln(m.match(0));
> >     }
> > 
> > }
> 
> I think it's a bug.
> 
> \s matches invisible chars AND the char s
> 
> [^\s] seems to be interpreted like [^s\s]

I didn't want to say it was a bug, because whenever I do that I get jinxed and it ends up being my own code, but equivalent code in C# doesn't seem to show the problem.

using System;
using System.Text.RegularExpressions;

class MyClass {
    public static void Main (String [] args) {
        string text = "Does it find the ssssss's?";
        Regex pattern = new Regex(@"[^\s]+");

        foreach (Match m in pattern.Matches(text)) {
            Console.WriteLine(m);
        }
    }

}

Now I'm going to continue writing my new D program because if I have to write public static void Main one more time I think I'm just gonna snap.
January 02, 2008
Maybe we can use the regex test suite from the Perl source:

\t\op\regexp.t - test program
\t\op\re_tests - test cases
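
As a rough sketch of the table-driven idea, the snippet below runs a small list of (pattern, subject, expected matches) cases and collects the failures. The cases are hypothetical ones written for this illustration and are not taken from Perl's re_tests file, whose format is not reproduced here. Python is used as an assumption, and `CASES` and `run_cases` are made-up names.

```python
import re

# Hypothetical cases: (pattern, subject, expected list of matches).
CASES = [
    (r"[^\s]+", "a ss b", ["a", "ss", "b"]),
    (r"[^\W]+", "W w!", ["W", "w"]),
    (r"\s+", "a \t b", [" \t "]),
]

def run_cases(cases):
    """Return a list of (pattern, subject, expected, actual) for each failing case."""
    failures = []
    for pattern, subject, expected in cases:
        actual = re.findall(pattern, subject)
        if actual != expected:
            failures.append((pattern, subject, expected, actual))
    return failures

print(run_cases(CASES))
```

The same table could be fed to any regex implementation, so a port of the runner would show at a glance which cases a given library gets wrong.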

January 03, 2008
Matthew wrote:
> I was playing around with RegExp and noticed that it does not work the way I think it should. The behavior is the same with both 1.0 and 2.0.
> ...

RegExp has been broken for quite some time now. Search Bugzilla and you'll see.

--
Tom;
January 03, 2008
Tom Wrote:

> Matthew wrote:
> > I was playing around with RegExp and noticed that it does not work the way I think it should. The behavior is the same with both 1.0 and 2.0.
> > ...
> 
> RegExp has been broken for quite some time now. Search Bugzilla and you'll see.
> 
> --
> Tom;

So I've noticed since posting.

January 03, 2008
dennis luehring Wrote:

> Maybe we can use the regex test suite from the Perl source:
> 
> \t\op\regexp.t - test program
> \t\op\re_tests - test cases
> 

I don't have Perl or its sources on my computer. It's a Windows machine and I just got it, so it really doesn't have anything on it.

But apparently, from the bug reports, RegExp is pretty buggy. Apparently both the Phobos and Tango regex implementations are pretty buggy, so it would probably fail in many places. If I were ambitious I might try to write another regex lib; however, I'm sure someone else is already working on this. Perhaps, if it is really that important, a well-tested C or C++ regex library could just be wrapped for D. That would save a lot of time.