Thread overview
[Issue 1750] New: RegExp: lack of support for wchar, dchar; lack of lookingAt() method
Dec 26, 2007
d-bugmail
Sep 27, 2010
Marcin Kuszczak
Mar 12, 2012
Dmitry Olshansky
December 26, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1750

           Summary: RegExp: lack of support for wchar, dchar; lack of
                    lookingAt() method
           Product: D
           Version: 2.008
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: bugzilla@digitalmars.com
        ReportedBy: aarti@interia.pl


1. RegExp should work for at least wchar & dchar. Maybe also for integral array
types (e.g. int[]).

2. There is no bool lookingAt() method, which tries to match the string at its beginning and returns immediately if it does not match. For reference: http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html Currently it is very inefficient to match a pattern in an incoming stream of data. A solution with lookingAt() would be much faster.


-- 

October 11, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=1750


Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |andrei@metalanguage.com
         AssignedTo|nobody@puremagic.com        |andrei@metalanguage.com


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=1750



--- Comment #1 from Andrei Alexandrescu <andrei@metalanguage.com> 2010-09-26 11:37:42 PDT ---
The new RegEx supports wchar and dchar. Regarding lookingAt(), I'm unclear: how is it different from searching for a pattern starting with the anchor "^"?

-- 
September 27, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=1750



--- Comment #2 from Marcin Kuszczak <aarti@interia.pl> 2010-09-27 11:02:35 PDT ---
lookingAt() can be used on streams without needing to get the whole string from the stream. Also, ^ cannot be used for matching a specific pattern in a stream: you cannot assume that your input starts right after a line end. The input may not even be split into lines.

-- 
March 12, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=1750


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


--- Comment #4 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-03-12 01:45:41 PDT ---
Ok. Meant to do it for ages.
The second point raised in this bug report has no proof and, in fact, is
invalid.
The truth of the matter is that, looking through all of Java's regex
documentation, I observe:
1. There is no such thing as regex on a stream in Java; all objects it works on
are 3 variants of character buffers, i.e. wrapped arrays and their ilk.
2. lookingAt is indeed equivalent to prepending '^' to a regex pattern, and as
far as performance concerns go both versions should use the same optimization,
namely the "no search" optimization. And at least the current std.regex does
optimize for '^' _somewhere_ at the start, e.g. silly things like "(^...)..."
still get optimized.
3. Due to implementation details of Java-style regex there is no way it can
work directly on a stream and keep all its syntax features, even if it tried to
do so; this is a problem common to all backtracking engines. And yes, in some
cases it has to walk the entire input to make sure it matched what it should
match.
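
To illustrate point 2, here is a minimal Java sketch comparing Matcher.lookingAt() with find() on the same pattern prefixed by '^'. The class name LookingAtDemo and the helpers viaLookingAt and viaAnchor are made up for this example. It assumes default flags (no MULTILINE) and the default anchoring bounds, and it wraps the pattern in a non-capturing group so that '^' applies to the whole pattern rather than only the first alternative. It shows the two calls agreeing on a few inputs; it says nothing about performance.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only Pattern and Matcher come from the JDK.
class LookingAtDemo {
    // True if `pattern` matches some prefix of `input`.
    static boolean viaLookingAt(String pattern, String input) {
        return Pattern.compile(pattern).matcher(input).lookingAt();
    }

    // True if the same pattern, anchored as ^(?:pattern), is found in `input`.
    // Without Pattern.MULTILINE, '^' matches only at the start of the input,
    // not after line terminators.
    static boolean viaAnchor(String pattern, String input) {
        return Pattern.compile("^(?:" + pattern + ")").matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"abc123", "123abc", "x\nabc"};
        for (String in : inputs) {
            // Print both results side by side for each input.
            System.out.println(viaLookingAt("abc", in) + " " + viaAnchor("abc", in));
        }
    }
}
```

In this sketch both helpers accept "abc123" and reject "123abc" and "x\nabc"; the last input is rejected by the anchored version because '^' is not treated as a line anchor unless MULTILINE is set.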

Marking as fixed, as the first point of the report was solved long ago; the second is invalid as is. It does raise a good point, however one that was accounted for already.

-- 