May 02, 2006 Restrictions in std.regexp?
Hi,

the documentation of std.regexp is somewhat sparse, so I tried to find out a few things on my own. There seems to be no way to do lookaheads and lookbehinds. This:

    RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");

should find "CD" as a match, but it yields a runtime error:

    Error: *+? not allowed in atom

Is there any other way to get this working or am I just out of luck with the current implementation?

op

May 02, 2006 Re: Restrictions in std.regexp?
Posted in reply to Olaf Pohlmann | Olaf Pohlmann wrote:
> Hi,
>
> the documentation of std.regexp is somewhat sparse, so I tried to find out a few things on my own. There seems to be no way to do lookaheads and lookbehinds. This:
>
> RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");
>
> should find "CD" as a match, but it yields a runtime error:
Use "AB(CD)EF" and re.match(1) ??
I'm very inexperienced with regexp, mind you :S
L.

May 02, 2006 Re: Restrictions in std.regexp?
Posted in reply to Olaf Pohlmann | On Tue, 02 May 2006 23:39:13 +1000, Olaf Pohlmann <op@nospam.org> wrote:
> Hi,
>
> the documentation of std.regexp is somewhat sparse, so I tried to find out a few things on my own. There seems to be no way to do lookaheads and lookbehinds. This:
>
> RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");
>
> should find "CD" as a match, but it yields a runtime error:
>
> Error: *+? not allowed in atom
>
> Is there any other way to get this working or am I just out of luck with the current implementation?

I can't tell what it is you are trying to do, but it seems that the RE syntax you are expecting is not what has been implemented. See http://www.digitalmars.com/ctg/regular.html for details.

Are you looking for an optional "AB" followed by "CD" followed by an optional "EF"? If so, try

    RegExp re = search("ABCDEF", "(AB)?(CD)(EF)?");

Here is a sample program ...

    import std.stdio;
    import std.regexp;

    void main()
    {
        RegExp re = search("AXCDEFGHI", "(AB)?(CD)(EF)?");
        writefln("PRE: %s", re.pre());
        writefln("MATCH: %s", re.match(0));
        writefln("SUB1: %s", re.match(1));
        writefln("SUB2: %s", re.match(2)); // this should be 'CD'
        writefln("SUB3: %s", re.match(3));
        writefln("POST: %s", re.post());
    }

--
Derek Parnell
Melbourne, Australia
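
For comparison, here is a minimal sketch of what the optional-group pattern captures on the same sample input. It uses Python's re module only because Python also appears in this thread; that is an assumption made for illustration, and std.regexp may not behave identically. The variable names (pre, whole, groups, post) are hypothetical stand-ins for re.pre(), re.match(0), re.match(1..3) and re.post().

```python
import re

# Same input and pattern as the D sample program above.
text = "AXCDEFGHI"
m = re.search(r"(AB)?(CD)(EF)?", text)

pre = text[:m.start()]   # text before the match
whole = m.group(0)       # the entire match
groups = m.groups()      # (group 1, group 2, group 3); an unmatched group is None
post = text[m.end():]    # text after the match

print(pre, whole, groups, post)
```

In this sketch the whole match also contains the optional "EF", so only group 2 isolates "CD"; the optional "AB" group does not take part in the match at all.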

May 02, 2006 Re: Restrictions in std.regexp?
Posted in reply to Derek Parnell | Derek Parnell wrote:
> Are you looking for an optional "AB" followed by "CD" followed by an optional "EF" ?

No. I'm looking for a string that is preceded and followed by well-defined other strings. The match should *not* return the whole sequence but only what is in the middle. It's actually about parsing some kind of text markup. If it was HTML like "<body><h1>Welcome</h1></body>", it should allow me to retrieve only the "Welcome". If you just use some grouping, the match will be the whole <h1> element, so you have to extract the content in a 2nd step. The regexp with lookahead and lookbehind works fine in Python:

    import re
    html = "<body>\n<h1>Welcome</h1>\n</body>"
    m = re.search(r"(?<=\<h1\>).*?(?=\</h1\>)", html)
    html[m.start():m.end()]

This prints 'Welcome'. The regexp is a bit hard to read, so see http://docs.python.org/lib/re-syntax.html for a description.

Now, I can retrieve the whole h1 element with the D version of regexps and then do another scan for the content, but it would be nice to get it in one step, like in the Python version.

op
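
The difference between the two approaches described above can be sketched in Python, side by side. This is only an illustration using Python's re module, not std.regexp; the names around and grouped are hypothetical.

```python
import re

html = "<body>\n<h1>Welcome</h1>\n</body>"

# Lookbehind/lookahead: the tags are checked but not consumed,
# so the whole match is just the content between them.
around = re.search(r"(?<=<h1>).*?(?=</h1>)", html)

# Capturing group: the whole match includes the tags,
# and group 1 holds the content between them.
grouped = re.search(r"<h1>(.*?)</h1>", html)

print(around.group(0))
print(grouped.group(0))
print(grouped.group(1))
```

Both searches are a single step; with grouping, the content is read from group 1 of the same match object rather than from the whole match.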

May 02, 2006 Re: Restrictions in std.regexp?
Posted in reply to Derek Parnell | Derek Parnell wrote:
> RegExp re = search("ABCDEF", "(AB)?(CD)(EF)?");

Oops, this is actually very close to the solution, just drop both '?'. It's even more readable than what I tried before:

    import std.stdio;
    import std.regexp;

    void main()
    {
        char[] html = "<body>\n<h1>Welcome</h1>\n</body>";
        RegExp re = search(html, r"(\<h1\>)(.*?)(\</h1\>)");
        if (re !is null)
            writefln("%s", re.match(2));
    }

op
Copyright © 1999-2021 by the D Language Foundation