Thread overview
July 05, 2009 [Issue 3136] New: Incorrect and strange behavior of std.regexp.RegExp if using a pattern with optional prefix and suffix longer than 1 char
http://d.puremagic.com/issues/show_bug.cgi?id=3136

           Summary: Incorrect and strange behavior of std.regexp.RegExp if using a pattern with optional prefix and suffix longer than 1 char
           Product: D
           Version: 2.030
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Severity: major
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: marcellognani@gmail.com

It seems that std.regexp.RegExp gets confused if I use a pattern with an optional prefix and suffix whose character class contains more than one character.

An expression of the form ([A]{0,2})(C)([D]{0,2}) matches all of "AC", "BC", "CD", "CE", "ACD", "BCE", "ABCDE" and "C" (as expected).

An expression of the form ([AB]{0,2})(C)([DE]{0,2}) or ([AB]?[AB]?)(C)([DE]?[DE]?) fails (incorrectly and unexpectedly) on some of the cases above (for example, both "CD" and "CE").

Here is the code:

---
import std.regexp;
import std.stdio;

void main()
{
    RegExp eTest;

    // Compiles `pattern` with the global ("g") attribute.
    void SetExp(string pattern)
    {
        eTest = new RegExp(pattern, "g");
        writeln("Testing expression ", pattern);
    }

    // Runs the current expression on `s` and prints every capture.
    void TryString(string s)
    {
        writeln("Trying on string\"", s, "\":");
        auto captures = eTest.exec(s);
        if (captures.length)
        {
            writeln("Success!");
            foreach (uint i, string capture; captures)
                writeln(i, "): \"", capture, "\"");
        }
        else
        {
            writeln("Failure!");
        }
    }

    SetExp(r"([A]{0,2})(C)([D]{0,2})");
    TryString("AC"); TryString("BC"); TryString("CD");
    TryString("CE"); TryString("ACD"); TryString("BCE");
    TryString("ABCDE"); TryString("C"); TryString("F");

    SetExp(r"([AB]{0,2})(C)([DE]{0,2})");
    TryString("AC"); TryString("BC"); TryString("CD");
    TryString("CE"); TryString("ACD"); TryString("BCE");
    TryString("ABCDE"); TryString("C"); TryString("F");

    SetExp(r"([AB]?[AB]?)(C)([DE]?[DE]?)");
    TryString("AC"); TryString("BC"); TryString("CD");
    TryString("CE"); TryString("ACD"); TryString("BCE");
    TryString("ABCDE"); TryString("C"); TryString("F");
}
---

Here is the output:

---
Testing expression ([A]{0,2})(C)([D]{0,2})
Trying on string"AC":
Success!
0): "AC"
1): "A"
2): "C"
3): ""
Trying on string"BC":
Success!
0): "C"
1): ""
2): "C"
3): ""
Trying on string"CD":
Success!
0): "CD"
1): ""
2): "C"
3): "D"
Trying on string"CE":
Success!
0): "C"
1): ""
2): "C"
3): ""
Trying on string"ACD":
Success!
0): "ACD"
1): "A"
2): "C"
3): "D"
Trying on string"BCE":
Success!
0): "C"
1): ""
2): "C"
3): ""
Trying on string"ABCDE":
Success!
0): "CD"
1): ""
2): "C"
3): "D"
Trying on string"C":
Success!
0): "C"
1): ""
2): "C"
3): ""
Trying on string"F":
Failure!
Testing expression ([AB]{0,2})(C)([DE]{0,2})
Trying on string"AC":
Success!
0): "AC"
1): "A"
2): "C"
3): ""
Trying on string"BC":
Success!
0): "BC"
1): "B"
2): "C"
3): ""
Trying on string"CD":
Failure!
Trying on string"CE":
Failure!
Trying on string"ACD":
Success!
0): "ACD"
1): "A"
2): "C"
3): "D"
Trying on string"BCE":
Success!
0): "BCE"
1): "B"
2): "C"
3): "E"
Trying on string"ABCDE":
Success!
0): "ABCDE"
1): "AB"
2): "C"
3): "DE"
Trying on string"C":
Failure!
Trying on string"F":
Failure!
Testing expression ([AB]?[AB]?)(C)([DE]?[DE]?)
Trying on string"AC":
Success!
0): "AC"
1): "A"
2): "C"
3): ""
Trying on string"BC":
Success!
0): "BC"
1): "B"
2): "C"
3): ""
Trying on string"CD":
Failure!
Trying on string"CE":
Failure!
Trying on string"ACD":
Success!
0): "ACD"
1): "A"
2): "C"
3): "D"
Trying on string"BCE":
Success!
0): "BCE"
1): "B"
2): "C"
3): "E"
Trying on string"ABCDE":
Success!
0): "ABCDE"
1): "AB"
2): "C"
3): "DE"
Trying on string"C":
Failure!
Trying on string"F":
Failure!
---

Kind regards,
Marcello Gnani

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 08, 2009 [Issue 3136] Incorrect and strange behavior of std.regexp.RegExp if using a pattern with optional prefix and suffix longer than 1 char
Posted in reply to marcellognani@gmail.com

http://d.puremagic.com/issues/show_bug.cgi?id=3136

--- Comment #1 from Marcello Gnani <marcellognani@gmail.com> 2009-07-08 12:06:26 PDT ---
I had time to investigate further; the problem is caused by an incorrect optimization that Phobos performs on the optional prefix.

The RegExp constructor calls "public void compile(string pattern, string attributes)", which builds a correct internal RegExp program; it then tries to optimize that program by calling the "void optimize()" function. In this function, while the REbit opcode is being optimized (the opcode that implements the prefix match when the prefix character class has more than one letter), the optionality of the prefix is lost, which leads to the incorrect behavior reported above.

The simplest patch I came up with is a small change to the "int starrchars(Range r, const(ubyte)[] prog)" function (which is called by "optimize"). The current code is:

. . .
    case REnm:
    case REnmq:
        // len, n, m, ()
        len = (cast(uint *)&prog[i + 1])[0];
        n = (cast(uint *)&prog[i + 1])[1];
        m = (cast(uint *)&prog[i + 1])[2];
        pop = &prog[i + 1 + uint.sizeof * 3];
        if (!starrchars(r, pop[0 .. len]))
            return 0;
        if (n)
            return 1;
        i += 1 + uint.sizeof * 3 + len;
        break;
. . .

The function should instead return 0 when the n operand of the REnm opcode is 0 (this replaces the line before the break statement). That avoids inserting the first filter that kills the optionality:

. . .
    case REnm:
    case REnmq:
        // len, n, m, ()
        len = (cast(uint *)&prog[i + 1])[0];
        n = (cast(uint *)&prog[i + 1])[1];
        m = (cast(uint *)&prog[i + 1])[2];
        pop = &prog[i + 1 + uint.sizeof * 3];
        if (!starrchars(r, pop[0 .. len]))
            return 0;
        if (n)
            return 1;
        return 0; // was: i += 1 + uint.sizeof * 3 + len;
        break;    // now unreachable
. . .

I tried it, and it works now. This may also fix some other regexp bugs that are still open.

Best regards,
Marcello Gnani
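In general terms, the "first filter" discussed above is a precomputed set of characters that any match must begin with, which lets the matcher skip input positions that cannot start a match. The sketch below works over a simplified and entirely hypothetical program representation (Node, firstChars and setToString are illustrative names, not Phobos internals and not the actual starrchars). It shows how such a set can be computed without losing optionality: an element whose minimum repetition count is 0 contributes its characters but does not end the scan, and if every element is optional no filter is built at all.

```d
import std.stdio : writeln;

// Hypothetical, simplified program element: a character class repeated
// between min and max times. It only models the "REnm around a class"
// situation; it is NOT the Phobos bytecode layout.
struct Node
{
    string chars; // characters accepted by the class, e.g. "AB"
    uint min;     // minimum repetitions; 0 means the element is optional
    uint max;     // maximum repetitions (not needed by the filter)
}

// Marks in `set` every character a match can start with.
// Returns false when no filter can be built because every element is
// optional (the pattern can match the empty string); the matcher must
// then try every input position.
bool firstChars(const(Node)[] prog, ref bool[256] set)
{
    foreach (node; prog)
    {
        foreach (char c; node.chars)
            set[c] = true;
        if (node.min > 0)
            return true; // mandatory element: later elements cannot start a match
        // Optional element: a match may also start with whatever follows,
        // so keep scanning instead of stopping here.
    }
    return false;
}

// Renders the set as a sorted string, e.g. "ABC".
string setToString(const ref bool[256] set)
{
    string s;
    foreach (i, present; set)
        if (present)
            s ~= cast(char) i;
    return s;
}

void main()
{
    // ([AB]{0,2})(C)([DE]{0,2}) from the report, ignoring the capture groups.
    const(Node)[] prog = [Node("AB", 0, 2), Node("C", 1, 1), Node("DE", 0, 2)];
    bool[256] set;
    immutable ok = firstChars(prog, set);
    writeln(ok, " ", setToString(set)); // prints "true ABC"
}
```

For the reported pattern the set comes out as {A, B, C}. A filter reduced to {A, B} would never attempt a match starting at C, which is consistent with the failures reported above: "CD", "CE" and "C" fail, while every input with an A or B directly before the C succeeds. Marcello's patch takes the more conservative route of returning 0 (no filter) whenever n is 0.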
October 11, 2009 [Issue 3136] Incorrect and strange behavior of std.regexp.RegExp if using a pattern with optional prefix and suffix longer than 1 char
Posted in reply to marcellognani@gmail.com

http://d.puremagic.com/issues/show_bug.cgi?id=3136

Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |andrei@metalanguage.com
         AssignedTo|nobody@puremagic.com        |andrei@metalanguage.com
June 05, 2011 [Issue 3136] Incorrect and strange behavior of std.regexp.RegExp if using a pattern with optional prefix and suffix longer than 1 char
Posted in reply to marcellognani@gmail.com

http://d.puremagic.com/issues/show_bug.cgi?id=3136

Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|andrei@metalanguage.com     |dmitry.olsh@gmail.com

--- Comment #2 from Andrei Alexandrescu <andrei@metalanguage.com> 2011-06-05 08:11:26 PDT ---
Reassigning to Dmitry.
June 06, 2011 [Issue 3136] Incorrect and strange behavior of std.regexp.RegExp if using a pattern with optional prefix and suffix longer than 1 char
Posted in reply to marcellognani@gmail.com

http://d.puremagic.com/issues/show_bug.cgi?id=3136

Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #3 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2011-06-06 08:03:48 PDT ---
Fixed for std.regex:
https://github.com/D-Programming-Language/phobos/commit/9afb00e36b625322d7f1d8ec0fbd876c2b5c03fc
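The resolution above refers to std.regex. std.regexp was later removed from Phobos, so the original test program no longer builds with current compilers. Below is a minimal sketch of the same checks written against std.regex, assuming a recent DMD/Phobos; `pattern` and `firstMatch` are illustrative names, not from the thread. It uses the second pattern from the report, the one that std.regexp failed on for "CD", "CE" and "C".

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Second pattern from the report (std.regexp returned no match for
// "CD", "CE" and "C" with it).
enum pattern = `([AB]{0,2})(C)([DE]{0,2})`;

// Returns the leftmost match of `pattern` in `s`, or null if there is none.
string firstMatch(string s)
{
    auto m = matchFirst(s, regex(pattern));
    return m.empty ? null : m.hit;
}

void main()
{
    foreach (s; ["AC", "BC", "CD", "CE", "ACD", "BCE", "ABCDE", "C", "F"])
    {
        auto m = matchFirst(s, regex(pattern));
        if (m.empty)
            writeln(s, ": no match");
        else
            writeln(s, ": \"", m.hit, "\" groups: \"", m[1], "\" \"",
                    m[2], "\" \"", m[3], "\"");
    }
}
```

Because [DE] accepts E, the expected whole match for "CE" with this pattern is "CE", not "C" as with the first, single-letter pattern.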
Copyright © 1999-2021 by the D Language Foundation