Thread overview

August 07, 2014 Very Stupid Regex question

Please consider the following:

    import std.regex;

    string s1 = "PREabcdPOST";
    string s2 = "PREabPOST";

    string[] srar = ["ab", "abcd"];
    // this cannot be constructed in any particular order

    foreach (sr; srar)
    {
        auto r = regex(sr, "g");
        auto m = matchFirst(s1, r);
        if (!m.empty) break;
        // this one matches ab
        // but I want this to match abcd
        // and for s2 I want to match ab
    }

Obviously there are ways, like counting the match length and then using the maximum length instead of breaking as soon as a match is found.

Are there any other, better ways?

August 07, 2014 Re: Very Stupid Regex question

Posted in reply to seany

On Thursday, 7 August 2014 at 16:05:17 UTC, seany wrote:
> Please consider the following:
>
> string s1 = "PREabcdPOST";
> string s2 = "PREabPOST";
>
>
> string[] srar = ["ab", "abcd"];
> // this cannot be constructed in any particular order
>
> foreach(sr; srar)
> {
>
> auto r = regex(sr, "g");
> auto m = matchFirst(s1, r);
> if (!m.empty) break;
> // this one matches ab
> // but I want this to match abcd
> // and for s2 I want to match ab
>
> }
>
> obviously there are ways like counting the match length, and then using the maximum length, instead of breaking as soon as a match is found.
>
> Are there any other better ways?
It's not clear to me what exactly you want, but:
Are the regexes in `srar` related? That is, does one regex always include the previous one as a prefix? Then you can use optional matches:
/ab(cd)?/
This will match "abcd" if it is there, but will also match "ab" otherwise.

August 07, 2014 Re: Very Stupid Regex question

Posted in reply to seany

On Thu, 07 Aug 2014 16:05:16 +0000, seany wrote:
> obviously there are ways like counting the match length, and then using the maximum length, instead of breaking as soon as a match is found.
>
> Are there any other better ways?

You're not really using regexes properly. You want to greedily match as much as possible in this case, e.g.:

    void main()
    {
        import std.regex;
        auto re = regex("ab(cd)?");
        assert("PREabcdPOST".matchFirst(re).hit == "abcd");
        assert("PREabPOST".matchFirst(re).hit == "ab");
    }

August 07, 2014 Re: Very Stupid Regex question

Posted in reply to Justin Whear

On Thursday, 7 August 2014 at 16:12:59 UTC, Justin Whear wrote:
> On Thu, 07 Aug 2014 16:05:16 +0000, seany wrote:
>
>> obviously there are ways like counting the match length, and then using
>> the maximum length, instead of breaking as soon as a match is found.
>>
>> Are there any other better ways?
>
> You're not really using regexes properly. You want to greedily match as
> much as possible in this case, e.g.:
>
> void main()
> {
> import std.regex;
> auto re = regex("ab(cd)?");
> assert("PREabcdPOST".matchFirst(re).hit == "abcd");
> assert("PREabPOST".matchFirst(re).hit == "ab");
>
> }
Thing is, abcd is read from a file, and at compile time I don't know whether cd will be there at all, or whether it should be ab(ef) instead.

August 07, 2014 Re: Very Stupid Regex question

Posted in reply to seany

On Thu, Aug 07, 2014 at 04:49:05PM +0000, seany via Digitalmars-d-learn wrote:
> On Thursday, 7 August 2014 at 16:12:59 UTC, Justin Whear wrote:
> >On Thu, 07 Aug 2014 16:05:16 +0000, seany wrote:
> >
> >>obviously there are ways like counting the match length, and then using the maximum length, instead of breaking as soon as a match is found.
> >>
> >>Are there any other better ways?
> >
> >You're not really using regexes properly. You want to greedily match as much as possible in this case, e.g.:
> >
> >void main()
> >{
> >    import std.regex;
> >    auto re = regex("ab(cd)?");
> >    assert("PREabcdPOST".matchFirst(re).hit == "abcd");
> >    assert("PREabPOST".matchFirst(re).hit == "ab");
> >
> >}
>
> Thing is, abcd is read from a file, and at compile time I don't know whether cd will be there at all, or whether it should be ab(ef) instead.

So basically you have a file containing regex patterns, and you want to find the longest match among them?

One way to do this is to combine them at runtime:

    import std.algorithm : sort;
    import std.format : format;
    import std.regex;

    string[] patterns = ... /* read from file, etc. */;

    // Longer patterns match first
    patterns.sort!((a,b) => a.length > b.length);

    // Build regex
    string regexStr = "%((%(%c%))%||%)".format(patterns);
    auto re = regex(regexStr);

    ...
    // Run matches against input
    char[] input = ...;
    auto m = input.match(re);
    auto matchedString = m.captures[0];

T

--
When solving a problem, take care that you do not become part of the problem.
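
For anyone who wants to try the combined-alternation idea end to end, here is a minimal self-contained sketch. It assumes a hard-coded pattern list in place of the file and assumes every pattern is a plain literal (no regex metacharacters); it builds the alternation with `map` and `join` rather than the format string above, and `buildAlternation` is a hypothetical helper name.

```d
import std.algorithm : map, sort;
import std.array : join;
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: builds one alternation regex string from
/// `patterns`, longest first. Assumes every pattern is a plain literal.
string buildAlternation(string[] patterns)
{
    // Copy so the caller's array keeps its original order.
    auto sorted = patterns.dup;
    // Longer patterns first, because std.regex tries alternatives left to right.
    sorted.sort!((a, b) => a.length > b.length);
    // Wrap each pattern in a group and join the groups with '|'.
    return sorted.map!(p => "(" ~ p ~ ")").join("|");
}

void main()
{
    // Hypothetical stand-in for the patterns read from the file.
    string[] patterns = ["ab", "abcd"];

    auto regexStr = buildAlternation(patterns);
    auto re = regex(regexStr);

    writeln(regexStr);                         // (abcd)|(ab)
    writeln("PREabcdPOST".matchFirst(re).hit); // abcd
    writeln("PREabPOST".matchFirst(re).hit);   // ab
}
```

Sorting by length is what makes the literal case work: at any given position the longer literal is tried before its prefix. That reasoning does not carry over to patterns with quantifiers, which is the objection raised in the next reply.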

August 07, 2014 Re: Very Stupid Regex question

Posted in reply to H. S. Teoh

On Thu, 07 Aug 2014 10:22:37 -0700, H. S. Teoh via Digitalmars-d-learn wrote:
>
> So basically you have a file containing regex patterns, and you want to find the longest match among them?

> // Longer patterns match first
> patterns.sort!((a,b) => a.length > b.length);
>
> // Build regex
> string regexStr = "%((%(%c%))%||%)".format(patterns);
> auto re = regex(regexStr);

This only works if the patterns are simple literals. E.g. the pattern 'a+' might match a longer sequence than 'aaa'. If you're out for the longest possible match, iteratively testing each pattern is probably the way to go.
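
Testing each pattern separately, which is also what the original question's length-counting idea amounts to, might look like the sketch below. `longestMatch` is a hypothetical helper name; it compares only each pattern's first (leftmost) match by length, so a longer match further right in the input can win over a shorter one further left, and on equal lengths the earlier pattern wins.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: runs every pattern against `input` and returns
/// the longest of their first (leftmost) matches, or null if none match.
/// On equal lengths, the pattern that appears earlier in the list wins.
string longestMatch(string input, string[] patterns)
{
    string best = null;
    foreach (p; patterns)
    {
        auto m = matchFirst(input, regex(p));
        if (!m.empty && (best is null || m.hit.length > best.length))
            best = m.hit;
    }
    return best;
}

void main()
{
    auto patterns = ["ab", "abcd"];
    writeln(longestMatch("PREabcdPOST", patterns)); // abcd
    writeln(longestMatch("PREabPOST", patterns));   // ab

    // Unlike sorting by pattern length, this also handles real regexes:
    writeln(longestMatch("xaaaaay", ["aaa", "a+"])); // aaaaa
}
```

Each call recompiles every pattern; if the pattern list is fixed, compiling the `Regex` objects once and reusing them would avoid that cost.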

August 07, 2014 Re: Very Stupid Regex question

Posted in reply to Justin Whear

On Thu, Aug 07, 2014 at 05:33:42PM +0000, Justin Whear via Digitalmars-d-learn wrote:
> On Thu, 07 Aug 2014 10:22:37 -0700, H. S. Teoh via Digitalmars-d-learn wrote:
>
> >
> > So basically you have a file containing regex patterns, and you want to find the longest match among them?
>
> > // Longer patterns match first
> > patterns.sort!((a,b) => a.length > b.length);
> >
> > // Build regex
> > string regexStr = "%((%(%c%))%||%)".format(patterns);
> > auto re = regex(regexStr);
>
> This only works if the patterns are simple literals. E.g. the pattern 'a+' might match a longer sequence than 'aaa'. If you're out for the longest possible match, iteratively testing each pattern is probably the way to go.

Hmm, you're right. I was a bit disappointed to find out that the | operator in std.regex (and also in Perl's regex) doesn't do longest-match but first-match. :-( I had always thought it did longest-match, like in lex/flex.

I wish we could extend std.regex to allow longest-match for alternations... but there may be performance consequences.

T

--
There's light at the end of the tunnel. It's the oncoming train.
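
The first-match behaviour of `|` described above is easy to check directly. The inputs in this small sketch are chosen for illustration: swapping the order of the alternatives changes the result, and a length-sorted alternation such as `(aaa)|(a+)` still stops at the shorter match.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // The first alternative that matches wins, even if a later one is longer.
    writeln("PREabcdPOST".matchFirst(regex("ab|abcd")).hit); // ab
    writeln("PREabcdPOST".matchFirst(regex("abcd|ab")).hit); // abcd

    // Sorting by pattern length puts "aaa" before "a+", but "a+" could
    // have matched more text; first-match semantics still stop at "aaa".
    writeln("xaaaaay".matchFirst(regex("(aaa)|(a+)")).hit); // aaa
    writeln("xaaaaay".matchFirst(regex("a+")).hit);         // aaaaa
}
```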

August 07, 2014 Re: Very Stupid Regex question

On Thu, Aug 07, 2014 at 10:42:13AM -0700, H. S. Teoh via Digitalmars-d-learn wrote:
[...]
> Hmm, you're right. I was a bit disappointed to find out that the | operator in std.regex (and also in Perl's regex) doesn't do longest-match but first-match. :-( I had always thought it did longest-match, like in lex/flex.
>
> I wish we could extend std.regex to allow longest-match for alternations... but there may be performance consequences.

https://issues.dlang.org/show_bug.cgi?id=13268

T

--
Valentine's Day: an occasion for florists to reach into the wallets of nominal lovers in dire need of being reminded to profess their hypothetical love for their long-forgotten.

August 07, 2014 Re: Very Stupid Regex question

Posted in reply to H. S. Teoh

On Thursday, 7 August 2014 at 18:16:11 UTC, H. S. Teoh via Digitalmars-d-learn wrote:
>
> https://issues.dlang.org/show_bug.cgi?id=13268
>
>
> T
Thank you soooooooooo much!!
Copyright © 1999-2021 by the D Language Foundation