[Issue 5511] New: std.regex optional capture with no-match cause error
January 31, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5511

           Summary: std.regex optional capture with no-match cause error
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Keywords: patch
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: himana.karasu@orange.fr


--- Comment #0 from karasu <himana.karasu@orange.fr> 2011-01-31 05:33:58 PST ---
version used: 2.051
A matching optional capture works:

auto ms = match("ab", "(a(.*))?(b)");
assert(ms.captures.length == 4); // Ok
assert(ms.captures[0] == "ab"); // Ok
assert(ms.captures[1] == "a"); // Ok
assert(ms.captures[2] == ""); // Ok
assert(ms.captures[3] == "b"); // Ok


But if the optional capture doesn't match:

auto ms = match("b", "(a(.*))?(b)"); // same issue with pattern "(a(.*))*(b)"
assert(ms.captures.length == 4); // Fails: length == 1
assert(ms.captures[0] == "b"); // Ok
assert(ms.captures[1] == ""); // core.exception.AssertError@regex.d(1724): 1
assert(ms.captures[2] == ""); // core.exception.AssertError@regex.d(1724): 1
assert(ms.captures[3] == "b"); // core.exception.AssertError@regex.d(1724): 1

In Captures.length (line 1713 in v2.051):
        @property size_t length()
        {
            foreach (i; 0 .. matches.length)
            {
                if (matches[i].startIdx >= input.length) return i;
            }
            return matches.length;
        }

For matches[1] and matches[2], startIdx == endIdx == startIdx.max, so the loop above returns at i == 1, even though matches[3] is fine: startIdx == 0 and endIdx == 1.
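
To make the early return concrete, here is a minimal sketch that lifts the quoted loop into a free function. The names Span and capturesLength are hypothetical stand-ins for illustration, not the Phobos types, and the indices are assumed to be size_t.

```d
// Hypothetical stand-in for the (startIdx, endIdx) pair of one capture.
struct Span { size_t startIdx; size_t endIdx; }

// Same loop as the quoted Captures.length, as a free function.
size_t capturesLength(const Span[] matches, string input)
{
    foreach (i; 0 .. matches.length)
    {
        // An unset group has startIdx == size_t.max, which is always
        // >= input.length, so counting stops at the first unset group.
        if (matches[i].startIdx >= input.length) return i;
    }
    return matches.length;
}

void main()
{
    auto unset = Span(size_t.max, size_t.max);
    // State described in the report for input "b" and pattern "(a(.*))?(b)".
    auto matches = [Span(0, 1), unset, unset, Span(0, 1)];
    assert(capturesLength(matches, "b") == 1); // group 3 is never reached
}
```

The loop treats the first unset group as the end of the list, so any matched group after it is cut off.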

In RegexMatch.trymatch (line 2397 in v2.051), startIdx and endIdx are set only
when the group matches:
            case engine.REparen:
                // ... pass
                if (!trymatch(pop, pop + len))
                    goto Lnomatch;
                pmatch[n + 1].startIdx = ss;
                pmatch[n + 1].endIdx = src;
                pc = pop + len;
                break;

And in RegexMatch.test (line 1905 in v2.051), startIdx and endIdx are initialized to -1 (that is, max) for each match:

            foreach (i; 0 .. engine.re_nsub + 1)
            {
                pmatch[i].startIdx = -1;
                pmatch[i].endIdx = -1;
            }
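
As a small aside on why -1 and max are the same value here: assuming the index fields are size_t, assigning -1 wraps around to size_t.max. The helper name unsetIndex is hypothetical.

```d
// Assumes the index type is size_t, as the comparison with max suggests.
size_t unsetIndex()
{
    size_t idx = -1; // implicit signed-to-unsigned conversion wraps around
    return idx;
}

void main()
{
    assert(unsetIndex() == size_t.max);
}
```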


In RegexMatch.test (line 1905 in v2.051), initializing startIdx and endIdx to startindex instead of max seems to fix the problem:

            foreach (i; 0 .. engine.re_nsub + 1)
            {
                pmatch[i].startIdx = startindex;
                pmatch[i].endIdx = startindex;
            }

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
January 31, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5511



--- Comment #1 from karasu <himana.karasu@orange.fr> 2011-01-31 07:07:08 PST ---
Initializing startIdx and endIdx isn't enough if an optional capture that doesn't match contains a sub-capture that does match. The match state needs to be saved before parsing the parentheses and restored if the group doesn't match.

in RegexMatch.trymatch, case engine.REparen (line 2397 in v2.051)
replace lines 2404-2405:
                if (!trymatch(pop, pop + len))
                    goto Lnomatch;

by:
                if (!psave)
                {
                    psave = cast(regmatch_t *)alloca(
                        (engine.re_nsub + 1) * regmatch_t.sizeof);
                }
                memcpy(psave, pmatch.ptr,
                        (engine.re_nsub + 1) * regmatch_t.sizeof);
                if (!trymatch(pop, pop + len)) {
                    memcpy(pmatch.ptr, psave,
                            (engine.re_nsub + 1) * regmatch_t.sizeof);
                    goto Lnomatch;
                }
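
The save/restore idea can be sketched in isolation. This is a simplified model, not the Phobos code: Span, failingGroupBody and tryOptionalGroup are hypothetical names, and a heap copy (dup) replaces the alloca/memcpy pair from the patch for brevity.

```d
// Hypothetical stand-in for one capture's (startIdx, endIdx) pair.
struct Span { size_t startIdx; size_t endIdx; }

// Models an outer group whose inner sub-capture matches before the
// outer group as a whole fails.
bool failingGroupBody(Span[] pmatch)
{
    pmatch[2] = Span(1, 2); // inner capture recorded
    return false;           // outer group then fails
}

bool tryOptionalGroup(Span[] pmatch, bool restoreOnFailure)
{
    auto saved = pmatch.dup; // snapshot before entering the parentheses
    if (!failingGroupBody(pmatch))
    {
        if (restoreOnFailure)
            pmatch[] = saved[]; // undo writes made by the failed attempt
        return false;
    }
    return true;
}

void main()
{
    auto unset = Span(size_t.max, size_t.max);

    auto without = [unset, unset, unset, unset];
    tryOptionalGroup(without, false);
    assert(without[2] == Span(1, 2)); // stale inner capture survives

    auto withRestore = [unset, unset, unset, unset];
    tryOptionalGroup(withRestore, true);
    assert(withRestore[2] == unset); // inner capture rolled back
}
```

Without the restore, the inner capture written by the failed attempt leaks into the final result. This is the case that changing the initial value alone cannot cover.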

May 27, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5511


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh@gmail.com


--- Comment #2 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2011-05-27 15:13:01 PDT ---
*** Issue 5805 has been marked as a duplicate of this issue. ***

June 06, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5511


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |petevik38@yahoo.com.au


--- Comment #3 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2011-06-06 08:18:29 PDT ---
*** Issue 5019 has been marked as a duplicate of this issue. ***

June 06, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5511


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


--- Comment #4 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2011-06-06 08:27:40 PDT ---
Fixed in version 2.053 https://github.com/D-Programming-Language/phobos/commit/ee612d047c8c8a840fb601180306f65ec28c7853
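
One way to check the reported case on a current compiler is sketched below. It assumes a recent Phobos and uses matchFirst, which is newer than both the report and the 2.053 fix. The helper name groups is hypothetical.

```d
import std.regex;

// Collects every capture group of the first match into an array.
string[] groups(string input, string pattern)
{
    auto m = matchFirst(input, regex(pattern));
    string[] result;
    foreach (i; 0 .. m.length)
        result ~= m[i]; // an unmatched group is expected to come back empty
    return result;
}

void main()
{
    // The failing case from the report: the optional group does not match.
    assert(groups("b", "(a(.*))?(b)") == ["b", "", "", "b"]);
    // The case that already worked in 2.051.
    assert(groups("ab", "(a(.*))?(b)") == ["ab", "a", "", "b"]);
}
```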
