Thread overview
regexp (reluctant)
Aug 09, 2005
Fredrik Olsson
Aug 09, 2005
Chris Sauls
Aug 10, 2005
Walter
Aug 10, 2005
Fredrik Olsson
Aug 11, 2005
Walter
August 09, 2005
Since no one has replied to my "export to .h" post, and Google yields no matches, I have begun coding a tool that generates C header files from D exports. It seems fitting to use D for the task. I would usually use Ruby, but hey, it is some practice at D and another resource for the community ;).


I have defined this regexp:
const char[] redoccom = "\\s*(\\/\\*[\\*\\!].*?\\*\\/)?\\s*";

Or in more "readable" form:
  \s*(\/\*[\*\!].*?\*\/)?\s*

The documentation for regexp does not mention reluctant quantifiers. Usually (exp)* means match as much of (exp) as you can, and (exp)*? means match as little of (exp) as you can. That is not the problem here, although I think the documentation should state whether the *, ? and {} quantifiers are greedy or reluctant.
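For readers unsure of the distinction, here is a minimal sketch of greedy versus reluctant matching. It assumes D2's `std.regex` module (the later replacement for the `std.regexp` module discussed in this thread), so it illustrates the usual semantics rather than documenting what `std.regexp` did. The helper name `leftmost` and the sample text are hypothetical.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: the leftmost match of pattern in text ("" if none).
string leftmost(string text, string pattern)
{
    auto m = matchFirst(text, regex(pattern));
    return m.empty ? "" : m.hit;
}

void main()
{
    // Hypothetical input: two short '*'-delimited runs on one line.
    string text = "*a* *b*";

    // Greedy: .* runs to the end, then backs off to the LAST '*'.
    writeln(leftmost(text, `\*.*\*`));  // *a* *b*

    // Reluctant: .*? grows one character at a time and stops at the FIRST '*'.
    writeln(leftmost(text, `\*.*?\*`)); // *a*
}
```

The greedy result is the "first comment to last comment" overreach described further down; the reluctant form stops at the first closing delimiter.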

For those who read regular expressions: this is a simplistic match for an optional documentation comment in code, in one of these two forms (the content is ignored entirely, as long as there are no nested comments):
  /**
    Foo
  */
or
  /*!
    Bar
  */
It has a capture group for the actual comment. This is useful, for example, as:
new RegExp(redoccom ~ "export", "m");
to find exported members in a file along with their documentation.

Anyhow, with or without the reluctant quantifier, D does not give me the result I expect. I use SubEthaEdit with its default regexp syntax (Ruby) to verify the matches and the capture groups. (The only catch is that with greedy quantifiers the match runs from the start of the very first doc comment to the end of the very last one, which is exactly what the reluctant quantifier is there to prevent.)

Regards
	Fredrik Olsson
August 09, 2005
Fredrik Olsson wrote:
> I have defined this regexp:
> const char[] redoccom = "\\s*(\\/\\*[\\*\\!].*?\\*\\/)?\\s*";
> 
> Or in more "readable" form:
>   \s*(\/\*[\*\!].*?\*\/)?\s*

This is such a little thing, but you should probably use WYSIWYG string literals for regexps, to help with readability.  (I know I do.)

# const char[] RE_DOC_COM = r"\s*(\/\*[\*\!].*?\*\/)?\s*"c;

Or:

# const char[] RE_DOC_COM = `\s*(\/\*[\*\!].*?\*\/)?\s*`c;
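As a quick check that the WYSIWYG spellings really are the same pattern as the original escaped one, here is a small sketch. It uses `enum` instead of `const char[]` so the comparisons are guaranteed to run at compile time; the names `RE_ESCAPED` and `RE_BACKTICK` are hypothetical.

```d
// Three spellings of the doc-comment pattern from this thread.
// r"..." and `...` are WYSIWYG literals: backslashes are kept as-is,
// so no doubling is needed.
enum string RE_ESCAPED  = "\\s*(\\/\\*[\\*\\!].*?\\*\\/)?\\s*";
enum string RE_DOC_COM  = r"\s*(\/\*[\*\!].*?\*\/)?\s*"c;
enum string RE_BACKTICK = `\s*(\/\*[\*\!].*?\*\/)?\s*`c;

// Checked by the compiler: all three spell the same pattern.
static assert(RE_ESCAPED == RE_DOC_COM);
static assert(RE_DOC_COM == RE_BACKTICK);

void main() {}
```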

-- Chris Sauls
August 10, 2005
Can you flesh it out a bit with some minimal sample text, the result you get with regexp, and what the correct result should be? Also, can you try to simplify the regular expression as much as possible?


August 10, 2005
Walter wrote:
> Can you flesh it out a bit with some minimal sample text, the result you get
> with regexp, and what the correct result should be? Also, can you try to
> simplify the regular expression as much as possible?
> 
> 

This is as short as I can get it while keeping the behavior intact:
/* BEGIN: regexp.d */

import std.regexp;
import std.stdio;

int main(char[][] args) {

  RegExp re = new RegExp(r"\s*(\*.*?\*)?\s*", null);

  char[][] ms = re.match("*\n foo\n * bar");

  foreach(char[] m; ms) {
    writefln("'%s'", m); // pass m as an argument, not as the format string
  }

  return 0;
}

/* END: regexp.d */

And an actual compile/run session:
peylow@imanicken:~$ gdc regexp.d -o regexp; ./regexp
''
''
peylow@imanicken:~$

Expected compile/run session:
peylow@imanicken:~$ gdc regexp.d -o regexp; ./regexp
'*
 foo
 * '
'*
 foo
 *'
peylow@imanicken:~$

If I remove the newlines and search "* foo * bar" instead, I correctly get:
peylow@imanicken:~$ gdc regexp.d -o regexp; ./regexp
'* foo * '
'* foo *'
peylow@imanicken:~$

So it seems that "." does not match any character: it does not match newline.
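A minimal sketch of the newline issue, assuming D2's `std.regex` (the later replacement for `std.regexp`; the older module may behave differently in detail). In `std.regex`, `.` does not match `'\n'` unless the `"s"` flag is given, and `[\s\S]` is a common way to spell "any character, including newline". The helper name `firstMatch` is hypothetical.

```d
import std.regex;
import std.stdio;

// The minimal pattern from the post, and a variant where [\s\S] replaces '.'.
enum dotPattern = `\s*(\*.*?\*)?\s*`;
enum anyPattern = `\s*(\*[\s\S]*?\*)?\s*`;

// Hypothetical helper: whole text of the first match ("" if there is none).
string firstMatch(string input, string pattern, string flags = "")
{
    auto m = matchFirst(input, regex(pattern, flags));
    return m.empty ? "" : m.hit;
}

void main()
{
    string text = "*\n foo\n * bar";

    // '.' stops at the first '\n', so the optional group is skipped and
    // the match at position 0 is empty, as in the session above.
    writefln("'%s'", firstMatch(text, dotPattern)); // prints ''

    // [\s\S] crosses the newlines: the match runs up to and including "* ".
    writefln("'%s'", firstMatch(text, anyPattern));

    // Same result with the original pattern plus the "s" (single-line) flag.
    writefln("'%s'", firstMatch(text, dotPattern, "s"));
}
```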

Regards
	Fredrik Olsson
August 11, 2005
Thanks, I can work with that.