View mode: basic / threaded / horizontal-split · Log in · Help
April 21, 2012
Nested RegEx
Hi guys,

I'm trying to use std.regex to parse a string like the following:

string myString = "preOuter {if condition1} content1 {if condition2} content2
{elseif condition3} content3 {else}any other content{/if}{/if} postOuter";

Is there any chance to use std.regex to parse the string above? I currently
used the following expression:

auto r = regex(`(.*?)\{if:(?P<condition>(.+?))\}(?P<content>(.*))(\{/if\})
(.*)`, "g");

but it doesn't fit my nested if- and else-statements correctly.

Thanks in advance for any suggestions!
April 21, 2012
Re: Nested RegEx
On 21.04.2012 21:24, nrgyzer wrote:
> Hi guys,
>
> I'm trying to use std.regex to parse a string like the following:
>
> string myString = "preOuter {if condition1} content1 {if condition2} content2
> {elseif condition3} content3 {else}any other content{/if}{/if} postOuter";
>

Simply put pure regex is incapable of arbitrary-nested if-else 
statements. In a sense if... /if is a case of balanced parens problem 
and it's widely know to form non-regular language.

The way out is to either preconstruct regex pattern for a given maximum 
depth or just use handwritten parser (it may use regex for parts of 
input internally).

One day I may add push-pop stack extensions (that allow this kind of 
thing) into regex but I'd have to think hard to make it efficient.

(look at .NET regex they kind of have syntax for this but it's too 
underpowered for my taste)

> Is there any chance to use std.regex to parse the string above? I currently
> used the following expression:
>
> auto r = regex(`(.*?)\{if:(?P<condition>(.+?))\}(?P<content>(.*))(\{/if\})
> (.*)`, "g");
>
> but it doesn't fit my nested if- and else-statements correctly.
>
> Thanks in advance for any suggestions!


-- 
Dmitry Olshansky
April 21, 2012
Re: Nested RegEx
On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:
> On 21.04.2012 21:24, nrgyzer wrote:
> >Hi guys,
> >
> >I'm trying to use std.regex to parse a string like the following:
> >
> >string myString = "preOuter {if condition1} content1 {if condition2} content2
> >{elseif condition3} content3 {else}any other content{/if}{/if} postOuter";
> >
> 
> Simply put pure regex is incapable of arbitrary-nested if-else
> statements. In a sense if... /if is a case of balanced parens
> problem and it's widely know to form non-regular language.

This is of theoretical interest to me. Very often I find myself wanting
a concise way to express patterns with nested matching delimiters, but
regexes can't do it. But to jump to a full-blown stack-based language
seems like an overkill to me: stack languages can express *much* more
than just nested delimiters, most of which is difficult to encapsulate
in a nice, concise syntax like regexes. All I want is a simple way to
express the kind of simple nesting that matching delimiters give.


[...]
> One day I may add push-pop stack extensions (that allow this kind of
> thing) into regex but I'd have to think hard to make it efficient.
[...]

Plus, have a nice concise syntax for it (otherwise I might as well just
write a recursive descent parser for it in the first place).


T

-- 
The day Microsoft makes something that doesn't suck is probably the day
they start making vacuum cleaners... -- Slashdotter
April 21, 2012
Re: Nested RegEx
On 21.04.2012 22:46, H. S. Teoh wrote:
> On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:
>> On 21.04.2012 21:24, nrgyzer wrote:
>>> Hi guys,
>>>
>>> I'm trying to use std.regex to parse a string like the following:
>>>
>>> string myString = "preOuter {if condition1} content1 {if condition2} content2
>>> {elseif condition3} content3 {else}any other content{/if}{/if} postOuter";
>>>
>>
>> Simply put pure regex is incapable of arbitrary-nested if-else
>> statements. In a sense if... /if is a case of balanced parens
>> problem and it's widely know to form non-regular language.
>
> This is of theoretical interest to me. Very often I find myself wanting
> a concise way to express patterns with nested matching delimiters, but
> regexes can't do it. But to jump to a full-blown stack-based language
> seems like an overkill to me: stack languages can express *much* more
> than just nested delimiters, most of which is difficult to encapsulate
> in a nice, concise syntax like regexes. All I want is a simple way to
> express the kind of simple nesting that matching delimiters give.
>

Recursive descent is not particularly bad unless minimal "grammar 
descent depth" is high. Example:
a+b-c
uses a quite a lot of recursive calls for grammar with e.g. 10+ operator 
precedence levels.


I'm thinking of merging operator precedence parser with regex might be a 
happy marriage you know.

Back to OP topic something along the lines of this will do it (beware of 
stack overflow):

void parseIf(){
	static int ifNest;
	if(input.startWith("{if")){
		ifNest++;
		scope(exit) ifNest--;
		enforce(ifNest < 10000, "conservative stack overflow");
		parseCond(input[2..$-1]);//regex that does condition
		enforce(input[$-1] == '}', "close that if");
		parseIf();//parse body or another nested if
		//parseElse();//goes here, as does elif
	}
	else
		parseBody();// the regex you used before
}


>
> [...]
>> One day I may add push-pop stack extensions (that allow this kind of
>> thing) into regex but I'd have to think hard to make it efficient.
> [...]
>
> Plus, have a nice concise syntax for it (otherwise I might as well just
> write a recursive descent parser for it in the first place).
>

Concise syntax & lots of power is a priority actually, because I can 
detect if push-pop stuff is used or not and pick the right engine for 
the job. So different big-oh complexity is not important.

-- 
Dmitry Olshansky
April 24, 2012
Re: Nested RegEx
== Auszug aus Dmitry Olshansky (dmitry.olsh@gmail.com)'s Artikel
> On 21.04.2012 22:46, H. S. Teoh wrote:
> > On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:
> >> On 21.04.2012 21:24, nrgyzer wrote:
> >>> Hi guys,
> >>>
> >>> I'm trying to use std.regex to parse a string like the following:
> >>>
> >>> string myString = "preOuter {if condition1} content1 {if condition2}
content2
> >>> {elseif condition3} content3 {else}any other content{/if}{/if} postOuter";
> >>>
> >>
> >> Simply put pure regex is incapable of arbitrary-nested if-else
> >> statements. In a sense if... /if is a case of balanced parens
> >> problem and it's widely know to form non-regular language.
> >
> > This is of theoretical interest to me. Very often I find myself wanting
> > a concise way to express patterns with nested matching delimiters, but
> > regexes can't do it. But to jump to a full-blown stack-based language
> > seems like an overkill to me: stack languages can express *much* more
> > than just nested delimiters, most of which is difficult to encapsulate
> > in a nice, concise syntax like regexes. All I want is a simple way to
> > express the kind of simple nesting that matching delimiters give.
> >
> Recursive descent is not particularly bad unless minimal "grammar
> descent depth" is high. Example:
> a+b-c
> uses a quite a lot of recursive calls for grammar with e.g. 10+ operator
> precedence levels.
> I'm thinking of merging operator precedence parser with regex might be a
> happy marriage you know.
> Back to OP topic something along the lines of this will do it (beware of
> stack overflow):
> void parseIf(){
> 	static int ifNest;
> 	if(input.startWith("{if")){
> 		ifNest++;
> 		scope(exit) ifNest--;
> 		enforce(ifNest < 10000, "conservative stack overflow");
> 		parseCond(input[2..$-1]);//regex that does condition
> 		enforce(input[$-1] == '}', "close that if");
> 		parseIf();//parse body or another nested if
> 		//parseElse();//goes here, as does elif
> 	}
> 	else
> 		parseBody();// the regex you used before
> }
> >
> > [...]
> >> One day I may add push-pop stack extensions (that allow this kind of
> >> thing) into regex but I'd have to think hard to make it efficient.
> > [...]
> >
> > Plus, have a nice concise syntax for it (otherwise I might as well just
> > write a recursive descent parser for it in the first place).
> >
> Concise syntax & lots of power is a priority actually, because I can
> detect if push-pop stuff is used or not and pick the right engine for
> the job. So different big-oh complexity is not important.

Thanks guys... I just solved the problem by using indexOf() from std.string for
parsing my if-statements. So... in principle, I can have unlimited nested if-
statements, but I think it's much more difficult (and slower) than using any
regex-expression.
Top | Discussion index | About this forum | D home