August 09, 2005 regexp (reluctant)
Since no one has replied to my "export to .h" post, and Google yields no matches, I have begun coding a tool that exports D declarations to a C header file. And I see it fitting to use D for the task. I would usually use Ruby, but hey, some practice at D and another resource for the community ;).

I have defined this regexp:
const char[] redoccom = "\\s*(\\/\\*[\\*\\!].*?\\*\\/)?\\s*";
Or in more "readable" form:
\s*(\/\*[\*\!].*?\*\/)?\s*

The documentation for regexp does not mention reluctant quantifiers. Usually (exp)* would mean find the largest match you can of (exp), and (exp)*? would mean the smallest match of (exp). Well, this is not the problem, even though I think the documentation should state whether the *, ? and {} quantifiers are greedy or reluctant.

For those who read regular expressions, this is a simplistic match for an optional documentation comment in code, in one of these two forms (with total ignorance of content, as long as there are no nested comments):
/** Foo */
or
/*! Bar */
with a capture group for the actual comment. Useful for example as:
new RegExp(redoccom ~ "export", "m");
to find exported members in a file along with relevant documentation.

Anyhow. With or without the reluctant quantifier, D does not give me the result I expect. I use SubEthaEdit with its default regexp syntax (Ruby) to verify the matches and the capture groups. (The only problem is that with greedy quantifiers it matches from the start of the very first doc comment to the end of the very last comment, which is what reluctant quantifiers are required to compensate for.)

Regards
Fredrik Olsson
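To make the greedy versus reluctant distinction above concrete, here is a minimal sketch in Python rather than D. Python's `re` module is used only because its quantifier semantics are well documented; it says nothing about how D's std.regexp behaves. The sample text and the names `SAMPLE`, `GREEDY` and `RELUCTANT` are made up for illustration, and `re.DOTALL` is switched on so that comments could span lines.

```python
import re

# Hypothetical sample text: two documentation comments, each before an export.
SAMPLE = "/** Foo */ export int foo();\n/*! Bar */ export int bar();\n"

# Core of the doc-comment pattern from the post, without the optional wrapper.
# The only difference between the two is `.*` (greedy) versus `.*?` (reluctant).
GREEDY = re.compile(r"/\*[*!].*\*/", re.DOTALL)
RELUCTANT = re.compile(r"/\*[*!].*?\*/", re.DOTALL)

greedy_hits = GREEDY.findall(SAMPLE)
reluctant_hits = RELUCTANT.findall(SAMPLE)

# Greedy: a single hit running from the first "/**" to the last "*/".
print(greedy_hits)
# Reluctant: one hit per comment.
print(reluctant_hits)
```

The greedy form swallows everything between the first opener and the last closer, which is the behaviour described for the editor check; the reluctant form stops at the nearest closer.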

August 09, 2005 Re: regexp (reluctant)
Posted in reply to Fredrik Olsson
Fredrik Olsson wrote:
> I have defined this regexp:
> const char[] redoccom = "\\s*(\\/\\*[\\*\\!].*?\\*\\/)?\\s*";
>
> Or in more "readable" form:
> \s*(\/\*[\*\!].*?\*\/)?\s*

This is such a little thing, but you should probably use WYSIWYG string literals for regexps, to help with readability. (I know I do.)
# const char[] RE_DOC_COM = r"\s*(\/\*[\*\!].*?\*\/)?\s*"c;
Or:
# const char[] RE_DOC_COM = `\s*(\/\*[\*\!].*?\*\/)?\s*`c;

-- Chris Sauls
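The equivalence of the escaped and the WYSIWYG spelling is easy to check in any language with raw string literals. A small sketch in Python, chosen only for illustration (the D literals meant above are `r"..."` and backticks); `ESCAPED` and `RAW` are made-up names, and the test string is hypothetical.

```python
import re

# The pattern written with doubled backslashes, as in the original post.
ESCAPED = "\\s*(\\/\\*[\\*\\!].*?\\*\\/)?\\s*"

# The same pattern as a raw string: what you see is what the regex engine gets.
RAW = r"\s*(\/\*[\*\!].*?\*\/)?\s*"

# Both literals denote exactly the same sequence of characters.
same_text = (ESCAPED == RAW)

# Hypothetical input: a doc comment followed by an exported declaration.
captured = re.match(RAW, "/** Foo */ export").group(1)

print(same_text)
print(captured)
```

Only the readability of the source changes; the regex engine receives the same pattern either way.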

August 10, 2005 Re: regexp (reluctant)
Posted in reply to Fredrik Olsson
Can you flesh it out a bit with some minimal sample text, the result you get with regexp, and what the correct result should be? Also, can you try to simplify the regular expression as much as possible?

August 10, 2005 Re: regexp (reluctant)
Posted in reply to Walter
Walter wrote:
> Can you flesh it out a bit with some minimal sample text, the result you get
> with regexp, and what the correct result should be? Also, can you try and
> simplify the regular expression as much as possible?

This is as short as I can get it with the behavior still intact:
/* BEGIN: regexp.d */
import std.regexp;
import std.stdio;

int main(char[][] args) {
    RegExp re = new RegExp(r"\s*(\*.*?\*)?\s*", null);
    char[][] ms = re.match("*\n foo\n * bar");
    // Print each returned match string wrapped in single quotes.
    foreach (char[] m; ms) {
        writefln("'" ~ m ~ "'");
    }
    return 0;
}
/* END: regexp.d */
And an actual compile/run session:
peylow@imanicken:~$ gdc regexp.d -o regexp; ./regexp
''
''
peylow@imanicken:~$
Expected compile/run session:
peylow@imanicken:~$ gdc regexp.d -o regexp; ./regexp
'*
 foo
 * '
'*
 foo
 *'
peylow@imanicken:~$
If I remove the newlines in the string and search in "* foo * bar" then I correctly get:
peylow@imanicken:~$ gdc regexp.d -o regexp; ./regexp
'* foo * '
'* foo *'
peylow@imanicken:~$
So it seems that "." does not match every character: it misses newline.
Regards
Fredrik Olsson
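As a hedged cross-check outside D, the same pattern and input can be run through Python's `re` module. This is not std.regexp, so it only shows that the observed output is what one would get from an engine whose `.` excludes newline. The helper `whole_and_group` and the other names are made up for this sketch, and `re.DOTALL` is Python's switch for letting `.` match newline.

```python
import re

PATTERN = r"\s*(\*.*?\*)?\s*"    # the reduced pattern from the post
TEXT = "*\n foo\n * bar"           # the multi-line input from the post


def whole_and_group(pattern, text, flags=0):
    """Return (whole match, capture group 1) for a match anchored at the start."""
    m = re.match(pattern, text, flags)
    return m.group(0), m.group(1)


# Default: "." does not match "\n", so the optional group cannot match at all
# and the whole match is empty.
default_result = whole_and_group(PATTERN, TEXT)

# With DOTALL, "." also matches "\n" and the group spans all three lines.
dotall_result = whole_and_group(PATTERN, TEXT, re.DOTALL)

# Without newlines in the input, the default mode already works.
one_line_result = whole_and_group(PATTERN, "* foo * bar")

print(default_result)
print(dotall_result)
print(one_line_result)
```

The default-mode result is an empty match, which lines up with the two empty strings printed in the actual session; the DOTALL result lines up with the expected session.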

August 11, 2005 Re: regexp (reluctant)
Posted in reply to Fredrik Olsson
Thanks, I can work with that.
Copyright © 1999-2021 by the D Language Foundation