January 17, 2007 Regular expression woes | ||||
---|---|---|---|---|
| ||||
Is this a bug, or am I misunderstanding something? The code...
# import std.stdio;
# import std.regexp;
#
# int main(char[][] args) {
#     char[] string = "xfooxxxxxfoox";
#     writefln("Greedy matching:");
#     foreach (RegExp match; RegExp("x.*x").search(string))
#         writefln("%s[%s]%s", match.pre, match.match(0), match.post);
#     writefln("Conservative matching:");
#     foreach (RegExp match; RegExp("x.*?x").search(string))
#         writefln("%s[%s]%s", match.pre, match.match(0), match.post);
#     return 0;
# }
...compiled under GDC 0.21 (using the Phobos version that ships therewith) yields:
Greedy matching:
[xfooxxxxx]foox
Conservative matching:
[xfoox]xxxxfoox
xfoox[xx]xxfoox
xfooxxx[xx]foox
The latter part (conservative matching) makes plenty of sense to me, but I thought the former should have matched the whole string (i.e. read "[xfooxxxxxfoox]"). Is this behaviour intended? Thanks. :)
|
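For comparison, here is a minimal sketch of the same experiment against a different engine. It uses Python's `re` module purely as a stand-in reference, not `std.regexp`, so it only shows what a conventional backtracking engine reports for these two patterns. The helper name `bracketed` is made up for this illustration; it mimics the `pre[match]post` output format of the D program above.

```python
import re

text = "xfooxxxxxfoox"

def bracketed(pattern, s):
    """Return each non-overlapping match rendered as pre[match]post."""
    return [
        f"{s[:m.start()]}[{m.group(0)}]{s[m.end():]}"
        for m in re.finditer(pattern, s)
    ]

# Greedy: ".*" grabs as much as it can, then backs off to the last "x".
greedy = bracketed(r"x.*x", text)

# Lazy ("conservative"): ".*?" stops at the first "x" that completes a match.
lazy = bracketed(r"x.*?x", text)

print("Greedy matching:")
print("\n".join(greedy))
print("Conservative matching:")
print("\n".join(lazy))
```

In this engine the greedy pattern produces a single match covering the whole string, while the lazy pattern produces the same three matches as the D output quoted above.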
January 17, 2007 Re: Regular expression woes | ||||
---|---|---|---|---|
| ||||
Posted in reply to just jeff | When searching for x.*x in xfooxxxxxfoox, Visual Studio 2005 matches the entire string:
[xfooxxxxxfoox]
Also, the following two appear to be missing from "conservative matching":
xfoo[xx]xxxfoox
xfooxxxx[xfoox]
L.
|
January 17, 2007 Re: Regular expression woes | ||||
---|---|---|---|---|
| ||||
Posted in reply to Lionello Lunesu | Lionello Lunesu Wrote:
> (...)
> Also, the following two appear to be missing from "conservative matching":
>
> xfoo[xx]xxxfoox
> xfooxxxx[xfoox]
I wouldn't have expected them to be found; I had thought standard regex behavior was not to find overlapping matches (i.e. to start searching again just past the end of any match it finds).
I'm at work at the moment (and unfortunately without my laptop), so the only library I have available to test that on is the VBA one that comes with Access (*shudders* :P), but that doesn't find those two matches either.
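The difference between the two readings can be sketched in a few lines. This again uses Python's `re` module as a stand-in engine (an assumption, not what `std.regexp` or Visual Studio does internally). A plain scan resumes just past the end of each match, while wrapping the pattern in a lookahead, `(?=(...))`, is one common trick for listing a match at every starting position.

```python
import re

text = "xfooxxxxxfoox"

# Standard scan: the search resumes just past the end of each match,
# so matches never overlap.
non_overlapping = [(m.start(), m.group(0)) for m in re.finditer(r"x.*?x", text)]

# Lookahead scan: the outer match is zero-width, so the engine advances one
# character at a time and reports the lazy match starting at each position.
overlapping = [(m.start(), m.group(1)) for m in re.finditer(r"(?=(x.*?x))", text)]

print(non_overlapping)
print(overlapping)
```

The overlapping list includes matches starting at offsets 4 and 8, which correspond to the two candidates quoted above (`xfoo[xx]xxxfoox` and `xfooxxxx[xfoox]`); the non-overlapping scan skips them because those characters were already consumed by an earlier match.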
|
January 18, 2007 Re: Regular expression woes | ||||
---|---|---|---|---|
| ||||
Posted in reply to just jeff | I just tested Java, and it doesn't return the extra matches either. I'll trawl through std.regexp when I get home to see if I can find what's going on. Any inspiration would be appreciated. I presume the default in std.regexp -is- supposed to be a greedy match, and not some strange sort of half-way match? Perhaps I presume too much? o_0 |
January 18, 2007 Re: Regular expression woes | ||||
---|---|---|---|---|
| ||||
Posted in reply to just jeff | "just jeff" <psychobrat@gmail.com> wrote in message news:eom9pa$tjm$1@digitaldaemon.com...
> Lionello Lunesu Wrote:
>
>> (...)
>> Also, the following two appear to be missing from "conservative matching":
>>
>> xfoo[xx]xxxfoox
>> xfooxxxx[xfoox]
>
> I wouldn't have expected them to be found; I had thought standard regex behavior was not to find overlapping matches (i.e. to start searching again just past the end of any match it finds).
VS2005 did find them, using x.@x
L.
|
January 19, 2007 Re: Regular expression woes | ||||
---|---|---|---|---|
| ||||
Posted in reply to Lionello Lunesu | > VS2005 did find them, using x.@x
Ack, I can't find any documentation on the use of "@". Funny, that; I've never had much luck with Microsoft's documentation at all... ;)
Care to elaborate?
|
January 20, 2007 Re: Regular expression woes | ||||
---|---|---|---|---|
| ||||
Posted in reply to just jeff | "just jeff" <jeffrparsons@optusnet.com.au> wrote in message news:eopti4$lan$1@digitaldaemon.com...
>> VS2005 did find them, using x.@x
>
> Ack, I can't find any documentation on the use of "@". Funny, that; I've never had much luck with Microsoft's documentation at all... ;)
>
> Care to elaborate?
http://msdn2.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx
But, interestingly, the .NET framework uses the same .*?
http://msdn2.microsoft.com/en-us/library/3206d374(VS.80).aspx
|
January 20, 2007 Re: Regular expression woes | ||||
---|---|---|---|---|
| ||||
Posted in reply to Lionello Lunesu | Lionello Lunesu wrote:
> "just jeff" <jeffrparsons@optusnet.com.au> wrote in message news:eopti4$lan$1@digitaldaemon.com...
>>> VS2005 did find them, using x.@x
>> Ack, I can't find any documentation on the use of "@". Funny, that; I've never had much luck with Microsoft's documentation at all... ;)
>>
>> Care to elaborate?
>
> http://msdn2.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx
>
> But, interestingly, the .NET framework uses the same .*?
>
> http://msdn2.microsoft.com/en-us/library/3206d374(VS.80).aspx
Looks like the .NET framework uses the "standard" syntax. The reason VS uses a different syntax is probably that it's meant for searching source code, where characters like *, ( and ) are common in C-like languages. They are therefore likely to be searched for quite often, and excessive escaping is inconvenient. @, { and } are probably searched for a lot less, so they are arguably better choices for metacharacters in this context.
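To make the correspondence concrete, here is a rough sketch in Python. It assumes, as the posts above suggest, that `@` in the VS find dialog plays the role of the lazy `*?` quantifier. The function `vs_find_to_standard` is hypothetical and handles only that one metacharacter; it is not a full translator for the VS find syntax.

```python
import re

def vs_find_to_standard(pattern: str) -> str:
    """Hypothetical helper: rewrite each unescaped '@' as the lazy '*?'.

    Backslash escapes (such as a literal '\\@') are copied through untouched.
    Every other VS-specific metacharacter is left as-is.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # keep the escape pair intact
            i += 2
        elif ch == "@":
            out.append("*?")              # assumed meaning: minimal zero-or-more
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

standard = vs_find_to_standard("x.@x")
first = re.search(standard, "xfooxxxxxfoox")
print(standard, first.group(0))
```

With that assumption, `x.@x` becomes `x.*?x`, and the first match in the test string is the short `xfoox` rather than the whole string.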
|