negative assertion support for RegExp?
August 13, 2005
Is there any D library that offers regular expressions with negative assertion support?

There seems to be no documented way to use negative assertions in Phobos' regular expressions. (http://digitalmars.com/ctg/regular.html)

Usually the syntax "(?!doNotMatch)" is used for that on Linux systems.


Thomas


--- sample code ---
import std.regexp;
import std.stdio;

int main(){
	char[] log=
		"IP:127.0.0.1; USER:some; additional info\n"
		"IP:123.3.8.17; USER:other; additional info\n";

	char[] pattern = "^(?!IP:(127[.]0[.]0[.]1)); USER:([^;@]*);";
	char[] format = "; USER:$2@$1;";
	char[] attributes = "g";

	char[] filtered = sub(log, pattern, format, attributes);

	writef("---unfiltered---\n%s\n", log);
	writef("---filtered---\n%s\n", filtered);

	return 0;
}

/* Expected Output:

---unfiltered---
IP:127.0.0.1; USER:some; additional info
IP:123.3.8.17; USER:other; additional info

---filtered---
IP:127.0.0.1; USER:some; additional info
IP:123.3.8.17; USER:other@123.3.8.17; additional info

*/
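
For comparison, here is a minimal sketch of the same filter written against present-day Phobos, assuming std.regex (the successor of std.regexp), whose documented syntax includes the negative lookahead "(?!...)". It is not something the 2005 std.regexp could run. The capture group is moved outside the assertion, because a group inside a negative lookahead is never set when the overall match succeeds. The helper name filterLog is made up for this illustration.

```d
import std.regex : regex, replaceAll;
import std.stdio : write;

// Hypothetical helper: appends "@<ip>" to the user name on every
// line whose IP is not 127.0.0.1.
string filterLog(string log)
{
    // (?!127[.]0[.]0[.]1;) rejects the loopback address without
    // consuming any input. The "m" flag lets ^ match at the start
    // of each line instead of only at the start of the input.
    auto re = regex(`^IP:(?!127[.]0[.]0[.]1;)([^;]*); USER:([^;@]*);`, "m");

    // $1 is the IP address, $2 the user name.
    return replaceAll(log, re, "IP:$1; USER:$2@$1;");
}

void main()
{
    string log =
        "IP:127.0.0.1; USER:some; additional info\n" ~
        "IP:123.3.8.17; USER:other; additional info\n";

    write(filterLog(log));
}
```

With this input the first line should pass through unchanged and only the second line should gain the "@123.3.8.17" suffix.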
August 14, 2005
Thomas Kühne <thomas-dloop@kuehne.THISISSPAM.cn> wrote:

[...]
> Is there any D library that offers regular expressions with negative assertion support?
[...]

Why do you need such? With a little bit of programming with split, find and rfind you should be able to use std.regexp for that purpose.

-manfred
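
A minimal sketch of that idea, for the simple sample only: the log is split into lines, an explicit prefix test stands in for the negative assertion, and an ordinary regex without assertions does the rewriting. It assumes present-day Phobos (std.regex, std.string.splitLines) rather than the 2005 std.regexp API, and filterLog is a hypothetical helper name.

```d
import std.algorithm.searching : startsWith;
import std.array : join;
import std.regex : regex, replaceFirst;
import std.stdio : writeln;
import std.string : splitLines;

// Hypothetical helper: appends "@<ip>" to the user name on every
// line that does not come from 127.0.0.1, without any lookahead.
string filterLog(string log)
{
    // Plain regex: $1 is the IP address, $2 the user name.
    auto re = regex(`^IP:([^;]*); USER:([^;@]*);`);

    string[] result;
    foreach (line; log.splitLines())
    {
        // This prefix test replaces the negative assertion.
        if (line.startsWith("IP:127.0.0.1;"))
            result ~= line;
        else
            result ~= replaceFirst(line, re, "IP:$1; USER:$2@$1;");
    }
    // Note: the trailing newline of the input is not preserved.
    return result.join("\n");
}

void main()
{
    string log =
        "IP:127.0.0.1; USER:some; additional info\n" ~
        "IP:123.3.8.17; USER:other; additional info\n";

    writeln(filterLog(log));
}
```

Each additional exclusion needs its own explicit test in the loop, so the code grows with the number of conditions.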
August 14, 2005
In article <ddmogt$19aq$1@digitaldaemon.com>, Manfred Nowak says...
>
>Thomas Kühne <thomas-dloop@kuehne.THISISSPAM.cn> wrote:
>
>[...]
>> Is there any D library that offers regular expressions with negative assertion support?
>[...]
>
>Why do you need such? With a little bit of programming with split, find and rfind you should be able to use std.regexp for that purpose.

To save himself that bit of programming? ;) Regexes are currently somewhat limited in phobos. I find myself missing Perl features all the time.

--AJG.


August 14, 2005
Hi Thomas,

Actually, I ported PCRE version 5 to D about a month ago when Walter told me phobos didn't support named groups. AFAIK it works correctly; I compiled the test program (a version of grep) and it didn't show any errors.

The only problem is that it's not object-oriented (it's the C API).

Anyway, I'm going to upload the code and maybe you can use that. You can find example code in main.d. All you need essentially is:

# import pcre;

And off you go. If you have Build you can do:

% build main

And that's it.

Let me know if you find it useful. If there's enough interest, I could develop a D-based OO interface for it, and maybe Walter will consider it for inclusion in phobos to replace the old regex.

Some technical notes:

I ported the code with SUPPORT_UTF8, but _not_ with SUPPORT_UCP because that was just a lot of bloat. Also, the LINK_SIZE I selected was 2, the default.

Here's the link:

http://pantheon.yale.edu/~ajg36/pcre.zip

Enjoy!
--AJG.





August 14, 2005
AJG wrote:
> In article <ddmogt$19aq$1@digitaldaemon.com>, Manfred Nowak says...
> 
>>Thomas Kühne <thomas-dloop@kuehne.THISISSPAM.cn> wrote:
>>
>>[...]
>>
>>>Is there any D library that offers regular expressions with negative assertion support?
>>
>>[...]
>>
>>Why do you need such? With a little bit of programming with split, find and rfind you should be able to use std.regexp for that purpose.
> 
> 
> To save himself that bit of programming? ;) Regexes are currently somewhat limited in phobos. I find myself missing Perl features all the time.

What I gave was a very simple regex. The production ones are nested, include alternatives and contain more than one negative assertion.

Thomas
August 14, 2005
Thomas Kühne <thomas-dloop@kuehne.THISISSPAM.cn> wrote:

[...]
> What I gave was a very simple regex. The production ones are nested, include alternatives and contain more than one negative assertion.
[...]

Then I do not believe that an approach with REs and "assertions" is feasible, first in terms of run-time requirements, but also in terms of development and maintenance time, because you are implementing some sort of lexer/parser for a language for which you have neither an explicit formal grammar nor definitions of the lexical tokens.

I do not know the details of the implementation of PCRE, but I do not believe that a tool whose emphasis is on REs incidentally also implements an LALR parser.

-manfred
August 14, 2005
On Sun, 14 Aug 2005 08:02:41 +0000 (UTC), AJG wrote:

> Hi Thomas,
> 
> Actually, I ported PCRE version 5 to D about a month ago when Walter told me phobos didn't support named groups. AFAIK it works correctly; I compiled the test program (a version of grep) and it didn't show any errors.
> 
> The only problem is that it's not object-oriented (it's the C API).

I don't see that O-O is a requirement. A simple procedural API is quite satisfactory.

-- 
Derek Parnell
Melbourne, Australia
14/08/2005 11:07:28 PM
August 14, 2005
AJG wrote:
> Hi Thomas,
> 
> Actually, I ported PCRE version 5 to D about a month ago when Walter told me phobos didn't support named groups. AFAIK it works correctly; I compiled the test program (a version of grep) and it didn't show any errors.
> 
> The only problem is that it's not object-oriented (it's the C API).
> 
> Anyway, I'm going to upload the code and maybe you can use that. You can find example code in main.d. All you need essentially is:
> 
> # import pcre;
> 
> And off you go. If you have Build you can do:
> 
> % build main
> 
> And that's it.
> 
> Let me know if you find it useful. If there's enough interest, I could develop a D-based OO interface for it, and maybe Walter will consider it for inclusion in phobos to replace the old regex.
> 
> Some technical notes:
> 
> I ported the code with SUPPORT_UTF8, but _not_ with SUPPORT_UCP because that was just a lot of bloat. Also, the LINK_SIZE I selected was 2, the default.
> 
> Here's the link:
> 
> http://pantheon.yale.edu/~ajg36/pcre.zip

Thanks for the code :)))

The main.d sample requires two small changes:

line 1
< private import pcre_c;
> private import pcre;

line 8
< pcre *re;
> pcre.pcre *re;

I think PCRE_D - after a bit of clean up and some unittests - might become a valuable Phobos addon.

Thomas