Thread overview
Restrictions in std.regexp?
May 02, 2006
Olaf Pohlmann
May 02, 2006
Lionello Lunesu
May 02, 2006
Derek Parnell
May 02, 2006
Olaf Pohlmann
May 02, 2006
Olaf Pohlmann
May 02, 2006
Hi,

the documentation of std.regexp is somewhat sparse, so I tried to find out a few things on my own. There seems to be no way to do lookaheads and lookbehinds. This:

	RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");

should find "CD" as a match, but it yields a runtime error:

	Error: *+? not allowed in atom

Is there any other way to get this working or am I just out of luck with the current implementation?



op
May 02, 2006
Olaf Pohlmann wrote:
> Hi,
> 
> the documentation of std.regexp is somewhat sparse, so I tried to find out a few things on my own. There seems to be no way to do lookaheads and lookbehinds. This:
> 
>     RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");
> 
> should find "CD" as a match, but it yields a runtime error:

Use "AB(CD)EF" and re.match(1) ??
I'm very inexperienced with regexp, mind you :S

L.
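
For comparison, here is a minimal sketch of this suggestion in Python, the other language used in this thread. It only illustrates the idea that a capture group can stand in for the lookbehind and lookahead; it says nothing about how std.regexp behaves, and the variable names are made up for the example.

```python
import re

text = "ABCDEF"

# Lookaround version from the original question: the match itself is only "CD".
look = re.search(r"(?<=AB)CD(?=EF)", text)

# Capture-group version: the overall match is "ABCDEF", group 1 is the middle part.
grouped = re.search(r"AB(CD)EF", text)

print(look.group(0))     # CD
print(grouped.group(0))  # ABCDEF
print(grouped.group(1))  # CD

# Both approaches point at the same slice of the input.
print(look.span() == grouped.span(1))  # True
```

The trade-off is that the overall match now includes the delimiters, so the caller has to ask for group 1 instead of the whole match.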
May 02, 2006
On Tue, 02 May 2006 23:39:13 +1000, Olaf Pohlmann <op@nospam.org> wrote:

> Hi,
>
> the documentation of std.regexp is somewhat sparse, so I tried to find out a few things on my own. There seems to be no way to do lookaheads and lookbehinds. This:
>
> 	RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");
>
> should find "CD" as a match, but it yields a runtime error:
>
> 	Error: *+? not allowed in atom
>
> Is there any other way to get this working or am I just out of luck with the current implementation?

I can't tell what it is you are trying to do but it seems that the RE syntax you are expecting is not what has been implemented. See http://www.digitalmars.com/ctg/regular.html for details.

Are you looking for an optional "AB" followed by "CD" followed by an optional "EF" ?

If so try

    RegExp re = search("ABCDEF", "(AB)?(CD)(EF)?");

Here is a sample program ...

import std.stdio;
import std.regexp;

void main()
{
  RegExp re = search("AXCDEFGHI", "(AB)?(CD)(EF)?");

  writefln("PRE: %s", re.pre());
  writefln("MATCH: %s", re.match(0));
  writefln("SUB1: %s", re.match(1));
  writefln("SUB2: %s", re.match(2));  // this should be 'CD'
  writefln("SUB3: %s", re.match(3));
  writefln("POST: %s", re.post());
}
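
A rough Python counterpart of the sample program, as a sketch only: it assumes Python's `re` treats this particular pattern the same way, which is not verified against std.regexp here. It shows what the optional groups do on the sample input, and that the same pattern with both '?' removed does not match it at all.

```python
import re

text = "AXCDEFGHI"

# Optional "AB", required "CD", optional "EF".
m = re.search(r"(AB)?(CD)(EF)?", text)

print(text[:m.start()])  # AX    (text before the match)
print(m.group(0))        # CDEF  (whole match)
print(m.group(1))        # None  (optional "AB" did not take part)
print(m.group(2))        # CD
print(m.group(3))        # EF
print(text[m.end():])    # GHI   (text after the match)

# With both delimiters required, "AXCDEFGHI" no longer matches.
strict = re.search(r"(AB)(CD)(EF)", text)
print(strict)            # None
```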

-- 
Derek Parnell
Melbourne, Australia
May 02, 2006
Derek Parnell wrote:
> Are you looking for an optional "AB" followed by "CD" followed by an optional "EF" ?

No. I'm looking for a string that is preceded and followed by well-defined other strings. The match should *not* return the whole sequence but only what is in the middle. It's actually about parsing some kind of text markup. If it was html like "<body><h1>Welcome</h1></body>" it should allow me to retrieve only the "Welcome". If you just use some grouping the match will be the whole <h1> element, so you have to extract the content in a second step. The regexp with lookahead and lookbehind works fine in Python:

import re
html = "<body>\n<h1>Welcome</h1>\n</body>"
# raw string, so the backslashes reach the regex engine unchanged
m = re.search(r"(?<=\<h1\>).*?(?=\</h1\>)", html)
print(html[m.start():m.end()])

This prints 'Welcome'.

The regexp is a bit hard to read, so see http://docs.python.org/lib/re-syntax.html for a description.

Now, I can retrieve the whole h1 element with the D version of regexps and then do another scan for the content but it would be nice to get it in one step, like in the Python version.


op
May 02, 2006
Derek Parnell wrote:
>     RegExp re = search("ABCDEF", "(AB)?(CD)(EF)?");

Oops, this is actually very close to the solution, just drop both '?'. It's even more readable than what I tried before:

import std.stdio;
import std.regexp;

void main()
{
    char[] html = "<body>\n<h1>Welcome</h1>\n</body>";
    RegExp re = search(html, r"(\<h1\>)(.*?)(\</h1\>)");
    if (re !is null)
        writefln("%s", re.match(2));
}
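
The same one-step extraction, sketched in Python for comparison with the lookaround version posted earlier. One assumption made here: only the middle part is wrapped in a group, since the delimiters do not need to be captured, so the content is group 1 rather than group 2.

```python
import re

html = "<body>\n<h1>Welcome</h1>\n</body>"

# Delimiters are matched but not captured; group 1 is the element content.
m = re.search(r"<h1>(.*?)</h1>", html)
if m is not None:
    print(m.group(1))                  # Welcome
    print(html[m.start(1):m.end(1)])   # Welcome (same slice, via group offsets)
```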



op