Restrictions in std.regexp?

Olaf Pohlmann (May 02, 2006)
Lionello Lunesu (May 02, 2006)
Derek Parnell (May 02, 2006)
Olaf Pohlmann (May 02, 2006)
Olaf Pohlmann (May 02, 2006)

Hi,

the documentation of std.regexp is somewhat sparse, so I tried to find out a few things on my own. There seems to be no way to do lookaheads and lookbehinds. This:

	RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");

should find "CD" as a match, but it yields a runtime error:

	Error: *+? not allowed in atom

Is there any other way to get this working or am I just out of luck with the current implementation?

op
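For comparison, the following shows what the lookaround pattern above is meant to do, sketched with Python's `re` module (which does support lookbehind and lookahead; std.regexp, judging by the error, does not). This is a Python stand-in, not std.regexp code: the assertions `(?<=AB)` and `(?=EF)` require the surrounding text but leave it out of the match.

```python
import re

# Lookbehind (?<=AB) and lookahead (?=EF) check the surrounding text
# without including it in the match, so group(0) is only "CD".
pattern = re.compile(r"(?<=AB)CD(?=EF)")

m = pattern.search("ABCDEF")
print(m.group(0), m.span())  # CD (2, 4)

# Without the required context on both sides, there is no match at all.
print(pattern.search("XBCDEF"))  # None
print(pattern.search("ABCDXF"))  # None
```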

Olaf Pohlmann wrote:
> Hi,
>
> the documentation of std.regexp is somewhat sparse, so I tried to find out a few things on my own. There seems to be no way to do lookaheads and lookbehinds. This:
>
> 	RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");
>
> should find "CD" as a match, but it yields a runtime error:

Use "AB(CD)EF" and re.match(1) ??

I'm very inexperienced with regexp, mind you :S

L.
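Lionello's capture-group idea can be sketched in Python's `re` as a stand-in for std.regexp, assuming std.regexp's `re.match(n)` returns the n-th parenthesized submatch (as Derek's sample program in the thread suggests). The whole match still includes the context; only group 1 is the part of interest.

```python
import re

# Match the context too, but capture only the middle part in group 1.
m = re.search(r"AB(CD)EF", "ABCDEF")

whole = m.group(0)   # the entire match, context included
middle = m.group(1)  # only the captured part

print(whole)      # ABCDEF
print(middle)     # CD
print(m.span(1))  # (2, 4): the same position a lookaround match would report
```

The difference from lookaround is where the answer lives: in a group rather than in the match itself.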

May 02, 2006

Re: Restrictions in std.regexp?

Posted by Derek Parnell
in reply to Olaf Pohlmann

On Tue, 02 May 2006 23:39:13 +1000, Olaf Pohlmann <op@nospam.org> wrote:

> Hi,
>
> the documentation of std.regexp is somewhat sparse, so I tried to find out a few things on my own. There seems to be no way to do lookaheads and lookbehinds. This:
>
> 	RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");
>
> should find "CD" as a match, but it yields a runtime error:
>
> 	Error: *+? not allowed in atom
>
> Is there any other way to get this working or am I just out of luck with the current implementation?

I can't tell what it is you are trying to do, but it seems that the RE syntax you are expecting is not what has been implemented. See http://www.digitalmars.com/ctg/regular.html for details.

Are you looking for an optional "AB" followed by "CD" followed by an optional "EF" ?

If so try

    RegExp re = search("ABCDEF", "(AB)?(CD)(EF)?");

Here is a sample program ...

import std.stdio;
import std.regexp;

void main()
{
  RegExp re = search("AXCDEFGHI", "(AB)?(CD)(EF)?");

  writefln("PRE: %s", re.pre());
  writefln("MATCH: %s", re.match(0));
  writefln("SUB1: %s", re.match(1));
  writefln("SUB2: %s", re.match(2));  // this should be 'CD'
  writefln("SUB3: %s", re.match(3));
  writefln("POST: %s", re.post());
}

-- 
Derek Parnell
Melbourne, Australia
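Derek's sample program translates roughly to Python as below; this is an illustrative stand-in, not std.regexp. Python has no `pre()`/`post()`, so slicing before `m.start()` and after `m.end()` takes their place. Python reports an optional group that did not participate as `None`; what std.regexp returns in that case is not shown in the thread.

```python
import re

text = "AXCDEFGHI"
m = re.search(r"(AB)?(CD)(EF)?", text)

print("PRE:  ", text[:m.start()])  # AX
print("MATCH:", m.group(0))        # CDEF
print("SUB1: ", m.group(1))        # None: the optional (AB) did not take part
print("SUB2: ", m.group(2))        # CD
print("SUB3: ", m.group(3))        # EF
print("POST: ", text[m.end():])    # GHI
```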

Derek Parnell wrote:
> Are you looking for an optional "AB" followed by "CD" followed by an optional "EF" ?

No. I'm looking for a string that is preceded and followed by well-defined other strings. The match should *not* return the whole sequence but only what is in the middle.

It's actually about parsing some kind of text markup. If it were HTML like "<body><h1>Welcome</h1></body>", it should allow me to retrieve only the "Welcome". If you just use grouping, the match will be the whole <h1> element, so you have to extract the content in a second step.

The regexp with lookahead and lookbehind works fine in Python:

    import re
    html = "<body>\n<h1>Welcome</h1>\n</body>"
    match = re.search(r"(?<=\<h1\>).*?(?=\</h1\>)", html)
    print(html[match.start():match.end()])

This prints "Welcome". The regexp is a bit hard to read, so see http://docs.python.org/lib/re-syntax.html for a description.

Now, I can retrieve the whole h1 element with the D version of regexps and then do another scan for the content, but it would be nice to get it in one step, like in the Python version.

op
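The two approaches Olaf contrasts can be put side by side in Python; the helper names `two_step` and `one_step` are hypothetical, and stripping the tags with `re.sub` is just one possible choice for the "second scan". Note that `.` does not match a newline by default, which is fine here because the heading sits on one line.

```python
import re

html = "<body>\n<h1>Welcome</h1>\n</body>"

def two_step(text):
    """Find the whole <h1> element first, then scan it again for the content."""
    element = re.search(r"<h1>.*?</h1>", text)
    if element is None:
        return None
    # Second pass: strip the opening and closing tags from the matched element.
    return re.sub(r"</?h1>", "", element.group(0))

def one_step(text):
    """Lookbehind/lookahead require the tags but leave them out of the match."""
    m = re.search(r"(?<=<h1>).*?(?=</h1>)", text)
    return None if m is None else m.group(0)

print(two_step(html))  # Welcome
print(one_step(html))  # Welcome
```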

Derek Parnell wrote:
>     RegExp re = search("ABCDEF", "(AB)?(CD)(EF)?");

Oops, this is actually very close to the solution: just drop both '?'. It's even more readable than what I tried before:

import std.stdio;
import std.regexp;

void main()
{
  char[] html = "<body>\n<h1>Welcome</h1>\n</body>";
  RegExp re = search(html, r"(\<h1\>)(.*?)(\</h1\>)");
  if (re !is null)
    writefln("%s", re.match(2));  // the content between the tags
}

op
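The same capture-group pattern, written for Python's `re` as a stand-in, is shown below. The backslashes before `<` and `>` are dropped because Python's `re` treats the bare angle brackets as literals. The `finditer` loop over several headings goes beyond what the thread shows and is only an illustration; the lazy `.*?` is what keeps each heading separate.

```python
import re

html = "<body>\n<h1>Welcome</h1>\n<p>text</p>\n<h1>Second</h1>\n</body>"

# Same shape as the D version: tags in groups 1 and 3, content in group 2.
# The lazy .*? stops at the first </h1>, so each heading is matched separately.
pattern = re.compile(r"(<h1>)(.*?)(</h1>)")

first = pattern.search(html)
if first is not None:
    print(first.group(2))  # Welcome

# finditer walks every non-overlapping match, like repeating the search.
headings = [m.group(2) for m in pattern.finditer(html)]
print(headings)  # ['Welcome', 'Second']
```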
