Thread overview
Regular expression woes
Jan 17, 2007
just jeff
Jan 17, 2007
Lionello Lunesu
Jan 17, 2007
just jeff
Jan 18, 2007
just jeff
Jan 18, 2007
Lionello Lunesu
Jan 19, 2007
just jeff
Jan 20, 2007
Lionello Lunesu
Jan 20, 2007
Frits van Bommel
Jan 24, 2007
just jeff
January 17, 2007
Is this a bug, or am I misunderstanding something? The code...

# import std.stdio;
# import std.regexp;
#
# int main(char[][] args) {
#     char[] string = "xfooxxxxxfoox";
#     writefln("Greedy matching:");
#     foreach (RegExp match; RegExp("x.*x").search(string))
#         writefln("%s[%s]%s", match.pre, match.match(0), match.post);
#     writefln("Conservative matching:");
#     foreach (RegExp match; RegExp("x.*?x").search(string))
#         writefln("%s[%s]%s", match.pre, match.match(0), match.post);
#     return 0;
# }

...compiled under GDC 0.21 (using the Phobos version that ships therewith) yields:

Greedy matching:
[xfooxxxxx]foox
Conservative matching:
[xfoox]xxxxfoox
xfoox[xx]xxfoox
xfooxxx[xx]foox

The latter part (conservative matching) makes plenty of sense to me, but I thought the former should have matched the whole string (i.e. read "[xfooxxxxxfoox]").

Is this behaviour intended?

Thanks. :)
January 17, 2007
When searching for x.*x in xfooxxxxxfoox, VisualStudio 2005 matches the entire string:

[xfooxxxxxfoox]
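
As an independent cross-check, here is a minimal sketch using Python's `re` module. This is an assumption for illustration only: it is a different engine from both D's std.regexp and the Visual Studio search dialog, so it shows what a common backtracking engine does, not what std.regexp is meant to do. The names `text`, `greedy` and `lazy` are made up for the example.

```python
import re

text = "xfooxxxxxfoox"

# Greedy: ".*" first consumes as much as it can, then backtracks
# only far enough for the final "x" to match.
greedy = re.search(r"x.*x", text)

# Lazy ("conservative"): ".*?" consumes as little as it can.
lazy = re.search(r"x.*?x", text)

print("greedy:", greedy.group(0))
print("lazy:  ", lazy.group(0))
```

In this engine the greedy pattern spans the entire input, while the lazy one stops at the first closing x.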

Also, the following two appear to be missing from "conservative matching":

xfoo[xx]xxxfoox
xfooxxxx[xfoox]

L.
January 17, 2007
Lionello Lunesu Wrote:

> (...)
> Also, the following two appear to be missing from "conservative matching":
> 
> xfoo[xx]xxxfoox
> xfooxxxx[xfoox]

I wouldn't have expected them to be found; I had thought standard regex behavior was not to find overlapping matches (i.e. to start searching again just past the end of any match it finds).

I'm at work at the moment (and unfortunately without my laptop), so the only library I have available to test that on is the VBA one that comes with Access (*shudders* :P), but that doesn't find those two matches either.
January 18, 2007
I just tested Java, and it doesn't return the extra matches either.
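
The same comparison can be sketched in Python's `re` module (again an assumption for illustration, not std.regexp, Java or VS2005). `re.finditer` resumes scanning just past the end of each match, so it only reports non-overlapping matches. Wrapping the pattern in a lookahead is one common way to also see the overlapping candidates, because the lookahead itself consumes no characters. The helper `bracketed` is hypothetical and exists only for this example.

```python
import re

text = "xfooxxxxxfoox"

def bracketed(start, end):
    # Render a match the way the D program does: pre[match]post.
    return f"{text[:start]}[{text[start:end]}]{text[end:]}"

# Standard iteration: each search resumes where the previous match ended.
non_overlapping = [
    bracketed(m.start(), m.end())
    for m in re.finditer(r"x.*?x", text)
]

# Lookahead trick: the zero-width outer match advances one character
# at a time, and group 1 captures the lazy match starting at that position.
overlapping = [
    bracketed(m.start(1), m.end(1))
    for m in re.finditer(r"(?=(x.*?x))", text)
]

for line in non_overlapping:
    print("non-overlapping:", line)
for line in overlapping:
    print("overlapping:    ", line)
```

The non-overlapping list is the same three lines the D program printed under "Conservative matching"; the two extra matches mentioned earlier only show up in the overlapping list.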

I'll trawl through std.regexp when I get home to see if I can find what's going on.

Any inspiration would be appreciated. I presume the default in std.regexp -is- supposed to be a greedy match, and not some strange sort of half-way match? Perhaps I presume too much? o_0
January 18, 2007
"just jeff" <psychobrat@gmail.com> wrote in message news:eom9pa$tjm$1@digitaldaemon.com...
> Lionello Lunesu Wrote:
>
>> (...)
>> Also, the following two appear to be missing from "conservative
>> matching":
>>
>> xfoo[xx]xxxfoox
>> xfooxxxx[xfoox]
>
> I wouldn't have expected them to be found; I had thought standard regex
> behavior was not to find overlapping matches (i.e. to start searching
> again just past the end of any match it finds).

VS2005 did find them, using x.@x

L.
January 19, 2007
> VS2005 did find them, using x.@x

Ack, I can't find any documentation on the use of "@". Funny, that; I've never had much luck with Microsoft's documentation at all... ;)

Care to elaborate?
January 20, 2007
"just jeff" <jeffrparsons@optusnet.com.au> wrote in message news:eopti4$lan$1@digitaldaemon.com...
>> VS2005 did find them, using x.@x
>
> Ack, I can't find any documentation on the use of "@". Funny, that; I've never had much luck with Microsoft's documentation at all... ;)
>
> Care to elaborate?

http://msdn2.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx

But, interestingly, the .NET framework uses the same .*?

http://msdn2.microsoft.com/en-us/library/3206d374(VS.80).aspx


January 20, 2007
Lionello Lunesu wrote:
> "just jeff" <jeffrparsons@optusnet.com.au> wrote in message news:eopti4$lan$1@digitaldaemon.com...
>>> VS2005 did find them, using x.@x
>> Ack, I can't find any documentation on the use of "@". Funny, that; I've never had much luck with Microsoft's documentation at all... ;)
>>
>> Care to elaborate?
> 
> http://msdn2.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx
> 
> But, interestingly, the .NET framework uses the same .*?
> 
> http://msdn2.microsoft.com/en-us/library/3206d374(VS.80).aspx

Looks like the .NET framework uses the "standard" syntax. The reason VS uses a different syntax is probably that it's meant for searching source code, where characters such as *, ( and ) are common in C-like languages. Those characters are likely to be searched for quite often, and having to escape them every time is inconvenient. @, { and } are probably searched for a lot less, so they are arguably better choices for meta-characters in this context.
January 24, 2007
Could somebody confident in the way std.regexp should work please confirm whether or not this is a bug?