Thread overview
std.regex bug? My regex doesn't match what it's supposed to.
Feb 03, 2011, Alex Folland
Feb 03, 2011, Alex Folland
Feb 03, 2011, Alex Folland
Feb 03, 2011, Stanislav Blinov
Feb 03, 2011, Alex Folland
Feb 03, 2011, Stanislav Blinov
February 03, 2011
I'm using std.regex from Phobos 2, which I heard was relatively new.  My regex is supposed to match a time to start playback in a game replay's file name (usually user-written).  It's very adaptive and works perfectly on http://regextester.com but doesn't match properly with Phobos.

I wrote a test program which displays filenames and the matched timecodes.
It's located here: http://lex.clansfx.co.uk/projects/wagametimecodes.d

The regex (might have to widen your mail client to see it properly):

((start|begin|enter|play(back)?)[\s_-]*)?((@|at|from)[\s_-]*)?(\d+([\.\-']|[\s_-]*|m(in(ute)?)?|[\s_-]*and[\s_-]*)*(s(ec(ond)?)?)?){1,3}
\_____________Ignore this part.__It works perfectly._________/\_______________This part is supposed to match time codes._______________/

The problematic string:

Guaton_at_9min59sec.WAgame

regextester.com matches "at_9min59sec" altogether perfectly, which is what I want to happen.
std.regex matches "at_9" and "59s", which I don't want to happen.

std.regex was matching "at_9min59s" before I changed the way it finds variations of "minute" and "second" from "[ms][inutecond]*" to its current method.  It was better before.  Now the numbers aren't even joined.

All in all, I'm pretty sure this is a std.regex bug, but I don't want to waste Andrei's time if it's not, since I'm not that experienced.
February 03, 2011
I figured something out, at least.  I had forgotten to use backslashes before the hyphens in the [...]s.  That makes the matches link together as expected, but it still doesn't make "s(ec(ond)?)?" match "sec" like it should.  It just matches "s".

For example, with std.regex, the following regex doesn't match the full string below it.

(\d+([\.\-\s_']|and|m(in(ute)?s?)?|s(ec(ond)?s?)?)*){1,3}
9min59sec24

It does match on http://regextester.com .  This is pretty clearly a bug at this point.  I don't see what else I could be doing wrong.
February 03, 2011
I figured out the bug.  Inside a set of square brackets, \s doesn't match whitespace.  It matches s instead.  I'm uncertain exactly how the ECMA-262 part 15.10 regular expression specification is meant to handle that situation.
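
For what it's worth, ECMA-262 treats \s inside a character class as the whitespace class, not as a literal "s". Below is a minimal sketch of how one might check that in isolation. It assumes a more recent Phobos than the one discussed in this thread, one where std.regex provides matchFirst; the helper name "matches" is made up for illustration.

```d
import std.regex : matchFirst, regex;
import std.stdio : writefln;

// Hypothetical helper: true if `pattern` matches somewhere in `input`.
bool matches(string input, string pattern)
{
    return !matchFirst(input, regex(pattern)).empty;
}

void main()
{
    // The pattern is anchored so it has to match the whole one-character input.
    // If [\s] means whitespace, the first check succeeds and the second fails.
    writefln("matches a space:    %s", matches(" ", `^[\s]$`));
    writefln("matches letter 's': %s", matches("s", `^[\s]$`));
}
```

If the second line reports true, the engine is treating \s inside the brackets as a plain "s".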
February 03, 2011
03.02.2011 18:03, Alex Folland wrote:
> I figured out the bug. Inside a set of square brackets, \s doesn't match whitespace. It matches s instead. I'm uncertain exactly how the ECMA-262 part 15.10 regular expression specification is meant to handle that situation.
It does match for me:

foreach(m; match("a b c d e", regex("[a-z][\\s]?")))
{
    writefln("%s[%s]%s", m.pre, m.hit, m.post);
}
February 03, 2011
On 2011-02-03 10:21, Stanislav Blinov wrote:
> 03.02.2011 18:03, Alex Folland wrote:
>> I figured out the bug. Inside a set of square brackets, \s doesn't
>> match whitespace. It matches s instead. I'm uncertain exactly how the
>> ECMA-262 part 15.10 regular expression specification is meant to
>> handle that situation.
> It does match for me:
>
> foreach(m; match("a b c d e", regex("[a-z][\\s]?")))
> {
> writefln("%s[%s]%s", m.pre, m.hit, m.post);
> }

Okay, now actually try the test I suggested.  I found it was working in other sections too, but not in this test which has another "s" section it's supposed to look for.

Since it's broken, you'll see 2 matches instead of 1.

module main;

import std.stdio,std.regex;

void main()
{
  foreach(m; match("9min59sec24", regex(`(\d+([\s_]|and|m(in(ute)?s?)?|s(ec(ond)?s?)?)*){1,3}`, "gi")))
    writefln("%s[%s]%s", m.pre, m.hit, m.post);
  return;
}
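
The same test can also be written so that it counts the matches instead of relying on reading the output by eye. This is only a sketch: it assumes a later Phobos release in which std.regex offers matchAll, and the names "timecodePattern" and "allHits" are invented for this example.

```d
import std.algorithm : map;
import std.array : array;
import std.regex : matchAll, regex;
import std.stdio : writeln;

// Reduced pattern from the test program above.
enum timecodePattern = `(\d+([\s_]|and|m(in(ute)?s?)?|s(ec(ond)?s?)?)*){1,3}`;

// Hypothetical helper: collects the text of every non-overlapping match.
string[] allHits(string input, string pattern)
{
    // "i" makes the match case-insensitive; matchAll already finds all
    // matches, so the "g" flag is not needed here.
    return matchAll(input, regex(pattern, "i")).map!(m => m.hit).array;
}

void main()
{
    auto hits = allHits("9min59sec24", timecodePattern);
    // One hit covering the whole string is the expected result;
    // two hits would reproduce the problem described in this thread.
    writeln(hits.length, " match(es): ", hits);
}
```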
February 03, 2011
03.02.2011 19:08, Alex Folland wrote:
> On 2011-02-03 10:21, Stanislav Blinov wrote:
>> 03.02.2011 18:03, Alex Folland wrote:
>>> I figured out the bug. Inside a set of square brackets, \s doesn't
>>> match whitespace. It matches s instead. I'm uncertain exactly how the
>>> ECMA-262 part 15.10 regular expression specification is meant to
>>> handle that situation.
>> It does match for me:
>>
>> foreach(m; match("a b c d e", regex("[a-z][\\s]?")))
>> {
>> writefln("%s[%s]%s", m.pre, m.hit, m.post);
>> }
>
> Okay, now actually try the test I suggested.  I found it was working in other sections too, but not in this test which has another "s" section it's supposed to look for.
>
> Since it's broken, you'll see 2 matches instead of 1.
>
> module main;
>
> import std.stdio,std.regex;
>
> void main()
> {
>   foreach(m; match("9min59sec24", regex(`(\d+([\s_]|and|m(in(ute)?s?)?|s(ec(ond)?s?)?)*){1,3}`, "gi")))
>     writefln("%s[%s]%s", m.pre, m.hit, m.post);
>   return;
> }

Oh, yes, I see it now.