Thread overview
regex with literal (i.e. automatically replace '(' with '\(', etc.)
May 29, 2013
Timothee Cour
May 29, 2013
timotheecour
May 30, 2013
Diggory
May 30, 2013
Timothee Cour
May 30, 2013
Diggory
May 30, 2013
Timothee Cour
May 30, 2013
Dmitry Olshansky
May 30, 2013
Diggory
May 30, 2013
Dmitry Olshansky
May 29, 2013
See below:

import std.stdio;
import std.regex;

void main(){
"h(i".replace!(a=>a.hit~a.hit)(regex(`h\(`,"g")).writeln; // this works, but I need to specify the escape manually
// "h(i".replace!(a=>a.hit~a.hit)(regex(`h(`,"gl")).writeln; // I'd like this to work with a flag, say 'l' (lowercase L) as in 'literal'.
}

Note: std.array.replace doesn't work here because I want to use
std.regex's replace with a delegate, as above.
This is especially useful when the pattern passed to regex is itself an
input argument (i.e. unknown in advance) and we want to escape it properly.

Alternatively (and perhaps more generally), could we have a function:
string toRegexLiteral(string){
// replace all regex special characters (like '(') with their escaped equivalents
}


May 29, 2013
something like this, which we should have in std.regex:

string escapeRegex(string a){
	import std.string;
	enum transTable = ['[' : `\[`, '|' : `\|`, '*': `\*`, '+': `\+`, '?': `\?`, '(': `\(`, ')': `\)`];
	return translate(a, transTable);
}
string escapeRegexReplace(string a){
	import std.string;
//	enum transTable = ['$' : `$$`, '\\' : `\\`];
	enum transTable = ['$' : `$$`];
	return translate(a, transTable);
}

unittest{
	string a=`asdf(def[ghi]+*|)`;
	assert(match(a,regex(escapeRegex(a))).hit==a);
	string b=`$aa\/$ $$#@$\0$1#$@%#@%=+_`;
	auto s=replace(a,regex(escapeRegex(a)),escapeRegexReplace(b));
	assert(s==b);
}




May 30, 2013
On Wednesday, 29 May 2013 at 23:33:30 UTC, timotheecour wrote:
> something like this, which we should have in std.regex:
>
> string escapeRegex(string a){
> 	import std.string;
> 	enum transTable = ['[' : `\[`, '|' : `\|`, '*': `\*`, '+': `\+`, '?': `\?`, '(': `\(`, ')': `\)`];
> 	return translate(a, transTable);
> }
> string escapeRegexReplace(string a){
> 	import std.string;
> //	enum transTable = ['$' : `$$`, '\\' : `\\`];
> 	enum transTable = ['$' : `$$`];
> 	return translate(a, transTable);
> }
>
> unittest{
> 	string a=`asdf(def[ghi]+*|)`;
> 	assert(match(a,regex(escapeRegex(a))).hit==a);
> 	string b=`$aa\/$ $$#@$\0$1#$@%#@%=+_`;
> 	auto s=replace(a,regex(escapeRegex(a)),escapeRegexReplace(b));
> 	assert(s==b);
> }

That would be good (although you missed a few :P)

Technically any working "escapeRegex" would also function as a valid "escapeRegexReplace", although it might be slightly faster to have a specialised one.
May 30, 2013
ok, here it is:

https://github.com/timotheecour/dtools/blob/master/dtools/util/util.d#L78
I simplified the implementation and added the missing escape symbols. Is any
symbol still missing?
I based it on http://dlang.org/phobos/std_regex.html, table
entry '\c where c is one of', but that entry was incomplete. I also note that
the table entry 'any character except' is incomplete.

> Technically any working "escapeRegex" would also function as a valid
> "escapeRegexReplace", although it might be slightly faster to have a specialised one.

Not sure, because they escape differently (\$ vs $$).

Shall I do a pull request for std.regex?



May 30, 2013
On Thursday, 30 May 2013 at 06:50:06 UTC, Timothee Cour wrote:
> ok, here it is:
>
> https://github.com/timotheecour/dtools/blob/master/dtools/util/util.d#L78
> simplified implementation and added missing escape symbols. Any symbol
> missing?
> I was basing myself based on http://dlang.org/phobos/std_regex.html, table
> entry '\c where c is one of', but that was incomplete. I'm also noting that
> table entry 'any character except' is also incomplete.
>
>> Technically any working "escapeRegex" would also function as a valid
> "escapeRegexReplace", although it might be slightly faster to have a
> specialised one.
>
> not sure, because they escape differently (\$ vs $$).

According to this: http://dlang.org/phobos/std_regex.html#.replace you can use the same escape sequences for both (\c -> c in the replacement string).
May 30, 2013
> According to this: http://dlang.org/phobos/std_regex.html#.replace you
> can use the same escape sequences for both (\c -> c in the replacement
> string).

Your suggestion does not work; try it yourself by replacing the $$ with \$
in my code. Is that a bug in the std.regex docs?
E.g.:
replace("",regex(``),`\$`);
=> invalid format string in regex replace

However everything works fine with $$, see my code above.
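
For readers following along, here is a minimal sketch of the difference being reported. It assumes a Phobos version in which std.regex.replace takes (input, regex, format) and in which an incomplete `$` sequence raises an Exception, as the error message above suggests; the sample strings are made up for illustration.

```d
import std.exception : assertThrown;
import std.regex : regex, replace;

void main()
{
    // "$$" in the format string produces a single literal '$'.
    assert(replace("cost: X", regex("X"), "$$") == "cost: $");

    // `\$` is not treated as an escape here: the backslash is copied
    // through and the trailing '$' starts an incomplete sequence, which
    // is expected to fail with "invalid format string in regex replace".
    assertThrown(replace("cost: X", regex("X"), `\$`));
}
```

So, at least under these assumptions, the pattern side escapes with a backslash while the replacement side doubles the dollar sign.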


May 30, 2013
30-May-2013 14:24, Timothee Cour wrote:
> > According to this: http://dlang.org/phobos/std_regex.html#.replace
> > you can use the same escape sequences for both (\c -> c in the
> > replacement string).
>
> Your suggestion does not work; try for yourself by replacing the $$ by
> \$ in my code. Is that a bug in std.regex' doc?
> eg:
> replace("",regex(``),`\$`);
> => invalid format string in regex replace
>

Indeed, the replace format string is a different beast. I can't recall whether I took it from the original std.regex or devised this $$ myself.

At any rate, replace(fmt, `\$`, "$$") would work, as would the same thing with replace from std.string. So I feel it's a bit of a stretch to include a function for such a narrow case.
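
A rough sketch of the plain-string route, for illustration only. The helper name escapeReplacement is hypothetical (not part of Phobos), and this version doubles every bare '$' using std.array.replace rather than rewriting `\$`; the imports are renamed to keep the two replace functions apart.

```d
import std.array : arrayReplace = replace;
import std.regex : regex, regexReplace = replace;

/// Hypothetical helper: doubles every '$' so that the text is inserted
/// literally when used as a std.regex replacement format string.
string escapeReplacement(string text)
{
    return arrayReplace(text, "$", "$$");
}

void main()
{
    // Text that would otherwise be read as capture references.
    string literal = `$1 costs $$ and \0`;
    auto result = regexReplace("PLACEHOLDER", regex("PLACEHOLDER"),
                               escapeReplacement(literal));
    assert(result == literal); // the text comes back unchanged
}
```

Only '$' needs handling in this sketch because the backslash has no special meaning in the replacement format, which matches the behaviour reported earlier in the thread.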


-- 
Dmitry Olshansky
May 30, 2013
30-May-2013 10:49, Timothee Cour wrote:
> ok, here it is:
>
> https://github.com/timotheecour/dtools/blob/master/dtools/util/util.d#L78
> simplified implementation and added missing escape symbols. Any symbol
> missing?
> I was basing myself based on http://dlang.org/phobos/std_regex.html,
> table entry '\c where c is one of', but that was incomplete. I'm also
> noting that table entry 'any character except' is also incomplete.

One thing missing is '.', which should become '\.'.

>
>  > Technically any working "escapeRegex" would also function as a valid
> "escapeRegexReplace", although it might be slightly faster to have a
> specialised one.
>
> not sure, because they escape differently (\$ vs $$).
>
> shall i do a pull request for std.regex?

Yes, please. It was a blind spot for a long time. Strictly speaking, I think that a generic escaping routine would work:

auto escape(S1, S2, C)(S1 src, S2 escapables, C escape='\\')
	if(isSomeString!S1 && isSomeString!S2 && isSomeChar!C)
{
	....
}

Do we have something like this in std.string?
Then all we need is a convenience wrapper in std.regex?

BTW unescape is as important.
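
One possible shape for such a routine, as a hedged sketch rather than a proposed Phobos API. It is simplified to plain string arguments instead of the templated signature above, and the metacharacter list in the escapeRegex wrapper is an assumption that should be checked against the std.regex grammar. An unescape counterpart is not shown.

```d
import std.algorithm : canFind;
import std.array : appender;
import std.regex : match, regex;

/// Hypothetical generic routine: prefixes every character of `src`
/// that occurs in `escapables` with the escape character `esc`.
string escape(string src, string escapables, char esc = '\\')
{
    auto result = appender!string();
    foreach (dchar c; src)
    {
        if (escapables.canFind(c))
            result.put(esc);
        result.put(c);
    }
    return result.data;
}

/// Hypothetical convenience wrapper for regex patterns.
/// The set of metacharacters below is an assumption.
string escapeRegex(string src)
{
    return escape(src, `\.[]{}()*+?|^$`);
}

void main()
{
    assert(escapeRegex("h(i") == `h\(i`);
    assert(escapeRegex("a.b") == `a\.b`);

    // An escaped string should match itself literally.
    string a = `asdf(def[ghi]+*|).`;
    assert(match(a, regex(escapeRegex(a))).hit == a);
}
```

Keeping the escapable set as a parameter means the same routine could serve other escaping needs; only the wrapper knows about regex syntax.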


-- 
Dmitry Olshansky
May 30, 2013
> Your suggestion does not work; try for yourself by replacing the $$ by \$
> in my code. Is that a bug in std.regex' doc?
> eg:
> replace("",regex(``),`\$`);
> => invalid format string in regex replace
>
> However everything works fine with $$, see my code above.

Either the doc or the code should probably be changed then so they are consistent.