[Issue 5169] New: Add(?:) Non-capturing parentheses group support to std.regex
November 05, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=5169

           Summary: Add(?:) Non-capturing parentheses group support to
                    std.regex
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: dmitry.olsh@gmail.com


--- Comment #0 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2010-11-05 09:35:15 PDT ---
Intro: Non-capturing parentheses group part of a regex so that operators such as quantifiers and alternation apply to the group as a whole, but they capture nothing and create no backreferences.

Examples:
//A very dumb example, matches abcabcabc, no backrefs created
(?:abc){3}
//A decent attempt to snatch the href field of an <a> HTML tag, without unnecessary
//backrefs:
<(?:a|A)(?:[^<>]*)href *= *"?([^"<> ]*)"?(?:[^<>]*)>
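To make the numbering difference concrete, here is a minimal sketch, assuming a std.regex that already accepts (?:) (a build with the attached patch, or a later Phobos release). The helpers slotCount and groupText are hypothetical names made up for this illustration, and matchFirst / Captures.length come from the later std.regex API rather than the 2010-era one.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: number of capture slots of a successful match,
// counting slot 0 (the whole match).
size_t slotCount(string pattern, string input)
{
    return matchFirst(input, regex(pattern)).length;
}

// Hypothetical helper: text of capture group `group`,
// or null when the pattern does not match at all.
string groupText(string pattern, string input, size_t group)
{
    auto m = matchFirst(input, regex(pattern));
    return m.empty ? null : m[group];
}

void main()
{
    enum tag          = `<a href = http://www.google.com  id="G"/>`;
    enum nonCapturing = `<(?:a|A)(?:[^<>]*)href *= *"?([^"<> ]*)"?(?:[^<>]*)>`;
    enum capturing    = `<(a|A)([^<>]*)href *= *"?([^"<> ]*)"?([^<>]*)>`;

    // With (?:...) the URL is the only group, so it is group 1.
    writeln(slotCount(nonCapturing, tag), " ", groupText(nonCapturing, tag, 1));
    // With plain (...) the same URL is pushed down to group 3.
    writeln(slotCount(capturing, tag), " ", groupText(capturing, tag, 3));
    // (?:abc){3} repeats the group but exposes only the whole match.
    writeln(slotCount(`(?:abc){3}`, "abcabcabc"));
}
```

The point of the sketch is only the group numbering: the href value stays at index 1 no matter how many grouping parentheses the rest of the pattern needs.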

Rationale: the ECMA-262 standard, referenced at
http://www.digitalmars.com/d/2.0/phobos/std_regex.html
requires support for this construct. Sooner or later we should get rid of the caveat
"however, some of the very advanced forms may behave slightly differently",
especially since in this case the fix is simple. See the attached patch.

Backtracking is also costly; see the benchmark code and results
 (run with the proposed patch applied):
//===bench.d===
import std.regex, std.stdio, std.datetime;

void main(){
    // r1 uses non-capturing groups; only the href value is captured
    auto r1 = regex(`(?:a|A)(?:[^<>]*)href *= *"?([^"<> ]*)"?(?:[^<>]*)>`,"g");
    // r2 is the same pattern with every group capturing
    auto r2 = regex(`(a|A)([^<>]*)href *= *"?([^"<> ]*)"?([^<>]*)>`,"g");
    void nobackref(){
        match(`<a href = http://www.google.com  id="G"/>`,r1).hit;
    }
    void backref(){
        match(`<a href = http://www.google.com  id="G"/>`,r2).hit;
    }
    // time 1_000 calls of each function
    auto bench = benchmark!(nobackref,backref)(1_000);
    writeln("No backref:   ",bench[0].milliseconds);
    writeln("With backref: ",bench[1].milliseconds);
}
//======
Results on my machine in milliseconds (min .. max over 10 runs):
No backref:   256.955 .. 267.341
With backref: 580.636 .. 587.187

P.S. I rebuilt Phobos (on Windows) and ran the unittests; output:
C:\dmd2\src\phobos>unittest.exe
 --- std.socket(660) broken test ---
 (std.socket.HostException: Address family mismatch)
9abc5a5a12345678
args.length = 1
args[0] = 'C:\dmd2\src\phobos\unittest.exe~T'
Vendor string:    AuthenticAMD
Processor string: AMD Phenom(tm) II X4 940 Processor
Signature:        Family=16 Model=4 Stepping=2
Features:         MMX FXSR SSE SSE2 SSE3 3DNow! 3DNow!+ MMX+ AMD64 HTT
Multithreading:   4 threads / 4 cores

Success!

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
November 05, 2010

--- Comment #1 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2010-11-05 09:36:40 PDT ---
Created an attachment (id=800)
Patch for regex.d, enables (?:) regex syntax as per ECMA262

February 25, 2011

Jerry Quinn <jlquinn@optonline.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jlquinn@optonline.net
           Severity|enhancement                 |normal


--- Comment #2 from Jerry Quinn <jlquinn@optonline.net> 2011-02-25 08:17:20 PST ---
Changing from enhancement to a bug, as std.regex is supposed to support ECMA-262.

March 01, 2011

--- Comment #3 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2011-03-01 07:52:03 PST ---
For a fuller regex feature request, with a patch, see: http://d.puremagic.com/issues/show_bug.cgi?id=5673

June 06, 2011

Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


--- Comment #4 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2011-06-06 02:27:19 PDT ---
*** This issue has been marked as a duplicate of issue 5673 ***
