Thread overview
[Issue 2108] New: regexp.d: The greedy dotstar isn't so greedy
May 15, 2008: d-bugmail
Oct 18, 2009: David Simcha
May 06, 2010: Jesse Phillips
Apr 18, 2011: Dmitry Olshansky
Jun 05, 2011: Dmitry Olshansky
Jun 06, 2011: Dmitry Olshansky
May 15, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2108

           Summary: regexp.d: The greedy dotstar isn't so greedy
           Product: D
           Version: 2.012
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: bugzilla@digitalmars.com
        ReportedBy: nyphbl8d@gmail.com


As far as I'm aware, ".*" should be greedy by default and become non-greedy when changed to ".*?".  As it stands now, both ".*" and ".*?" are non-greedy when it comes to std.regexp and I have found no way to make ".*" greedy, flags or otherwise.  This can be seen by using "<packet>text</packet><packet>text</packet>" as the buffer to match against and "<packet.*/packet>" as the pattern.  When I use this with std.regexp.search, it only matches the first opening and closing tag instead of the outer set.  I just hope this isn't my lack of regex-fu coming back to haunt me.
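For reference, the behaviour the reporter expected can be checked against the current `std.regex` module. This is a minimal sketch that assumes a recent Phobos where `matchFirst` and `Captures.hit` are available. The report itself used `std.regexp.search`, which this sketch does not exercise.

```d
import std.regex;
import std.stdio;

void main()
{
    // The buffer and pattern from the report.
    enum buffer = "<packet>text</packet><packet>text</packet>";

    // Greedy: `.*` first consumes the rest of the input, then backtracks
    // only as far as needed, so the match ends at the LAST "/packet>".
    auto greedy = matchFirst(buffer, regex(r"<packet.*/packet>"));
    assert(greedy.hit == "<packet>text</packet><packet>text</packet>");

    // Non-greedy: `.*?` consumes as little as possible, so the match ends
    // at the FIRST "/packet>".
    auto nonGreedy = matchFirst(buffer, regex(r"<packet.*?/packet>"));
    assert(nonGreedy.hit == "<packet>text</packet>");

    writeln("greedy:     ", greedy.hit);
    writeln("non-greedy: ", nonGreedy.hit);
}
```

The two patterns differ only in `.*` versus `.*?`. Both matches still start at the leftmost "<packet", so the quantifier only decides where the match ends.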


-- 

October 11, 2009


Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |andrei@metalanguage.com
         AssignedTo|nobody@puremagic.com        |andrei@metalanguage.com


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
October 18, 2009


David Simcha <dsimcha@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |swadenator@gmail.com


--- Comment #1 from David Simcha <dsimcha@yahoo.com> 2009-10-18 07:44:51 PDT ---
*** Issue 2487 has been marked as a duplicate of this issue. ***

-- 
May 06, 2010


Jesse Phillips <Jesse.K.Phillips+D@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Jesse.K.Phillips+D@gmail.com
         OS/Version|Linux                       |All


--- Comment #2 from Jesse Phillips <Jesse.K.Phillips+D@gmail.com> 2010-05-06 14:30:40 PDT ---
This is also an issue on Windows with std.regex using DMD 2.043.

But I would like to add that it is always greedy before literal text. The first assert will fail because the match was not non-greedy; the second shows what the result should be.

import std.regex;

void main() {
   assert(match("Hello there you silly person you.",
     regex(r"\b.+? you .+\w")).hit != "Hello there you silly");

   assert(match("Hello there you silly person you.",
     regex(r"\b.+? you .+\w")).hit == "there you silly person");
}

-- 
April 18, 2011


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh@gmail.com


--- Comment #3 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2011-04-18 13:54:37 PDT ---
(In reply to comment #2)
> This is also an issue in Windows with std.regex using DMD 2.043
> 
> But I would like to add that it is always greedy prior to text. The first assert will fail since it was not non-greedy and the second is what it should be.
> 
> import std.regex;
> 
> void main() {
>    assert(match("Hello there you silly person you.",
>      regex(r"\b.+? you .+\w")).hit != "Hello there you silly");
> 
>    assert(match("Hello there you silly person you.",
>      regex(r"\b.+? you .+\w")).hit == "there you silly person");
> }

Actually it should be
 assert(match("Hello there you silly person you.",
      regex(r"\b.+? you .+\w")).hit == "Hello there you silly person you");

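Two points: \b also matches at the beginning of input (if the first char is
\w), and the last .+ is greedy. It runs to the end of the input and then
backtracks only until \w can match, i.e. at the final 'u'; since '.' is
certainly not a \w, the trailing '.' is left out. That gives exactly the
match above.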
Also tested at:
http://www.regextester.com/
http://www.regular-expressions.info/javascriptexample.html
... etc.
P.S. The patch is coming ;)
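To make the reasoning above concrete, here is a minimal sketch that runs the pattern from comment #2 through `std.regex` and checks each step. It assumes a recent Phobos with `matchFirst` and the `Captures` properties `pre`, `hit` and `post`; it is an illustration, not the patch mentioned in the thread.

```d
import std.regex;
import std.stdio;

void main()
{
    enum text = "Hello there you silly person you.";
    auto m = matchFirst(text, regex(r"\b.+? you .+\w"));

    // 1. `\b` already matches at offset 0 because 'H' is a word character,
    //    so the leftmost match starts at the very beginning of the input.
    assert(m.pre == "");

    // 2. The lazy `.+?` stops at "Hello there", the first point where
    //    " you " can follow.
    // 3. The greedy `.+` runs to the end of the input, then backtracks until
    //    `\w` can match the final 'u'; the trailing '.' is not a word
    //    character and stays outside the match.
    assert(m.hit == "Hello there you silly person you");
    assert(m.post == ".");

    writeln(m.hit);
}
```

The `pre` and `post` checks only show where the match starts and ends; `hit` alone answers the question raised in comment #2.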

-- 
June 05, 2011


Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|andrei@metalanguage.com     |dmitry.olsh@gmail.com


--- Comment #4 from Andrei Alexandrescu <andrei@metalanguage.com> 2011-06-04 17:48:52 PDT ---
Reassigning to Dmitry.

-- 
June 05, 2011



--- Comment #5 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2011-06-05 00:09:28 PDT ---
I'd gladly close this issue, since it now works correctly in std.regex. But the report was filed against std.regexp. Should I close it?

-- 
June 05, 2011



--- Comment #6 from Andrei Alexandrescu <andrei@metalanguage.com> 2011-06-05 06:18:54 PDT ---
Yes. Please also update the changelog.dd file.

-- 
June 06, 2011


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


--- Comment #7 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2011-06-06 08:02:43 PDT ---
Fixed for std.regex: https://github.com/D-Programming-Language/phobos/commit/9afb00e36b625322d7f1d8ec0fbd876c2b5c03fc

-- 