October 13, 2011
On 2011-10-12 21:41, Dmitry Olshansky wrote:
> On 12.10.2011 23:32, kennytm wrote:
>> Dmitry Olshansky<dmitry.olsh@gmail.com> wrote:
>>> Fresh version of documentation is here:
>>> http://blackwhale.github.com/
>>>
>>> This fixes all typos reported so far, adds missing overload of replace
>>> (ouch!) and introduces a brand new syntax table.
>>
>> The '.' really matches any character, including the new line '\n'?
>
> Hm, yes. Is that a problem?

Shouldn't "." exclude newlines? I think this is a good reference:

http://www.regular-expressions.info/reference.html

Which says:

Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too.

-- 
/Jacob Carlborg
October 13, 2011
On 13.10.2011 8:38, Andrei Alexandrescu wrote:
> On 10/12/11 9:50 PM, Jesse Phillips wrote:
>> On Wed, 12 Oct 2011 23:35:49 +0000, kennytm wrote:
>>
>>> Most regex flavors don't match '\n' by default unless you supply the "s"
>>> flag -- including ECMAScript (well it doesn't even provide the "s" flag
>>> to allow '.' to match all characters).
>>
> >> Really? Since when? I didn't know there were any that didn't match \n. If
> >> you want to match everything except a newline, use [^\n].
>
> Kenny's right.
>
> http://www.regular-expressions.info/dot.html
>
> Engines have special options for multiline.
>

The funny thing is that multiline mode affects only the ^ and $ anchors, while single-line mode affects only whether . matches \r and \n. So it's entirely possible to use both at the same time.

But anyway, I guess I have to bite the bullet: add the 's' option and make the classic semantics the default.
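
A minimal sketch of how the two modes combine, assuming a present-day std.regex in which the "s" (single-line) and "m" (multi-line) flags behave as described above. The helper finds is hypothetical and exists only for this illustration:

```d
import std.regex : regex, matchFirst;

/// Hypothetical helper: true if `pattern`, compiled with `flags`,
/// finds a match somewhere in `input`.
bool finds(string input, string pattern, string flags = "")
{
    return !matchFirst(input, regex(pattern, flags)).empty;
}

void main()
{
    enum text = "a\nb";

    // Classic default: '.' does not match the line break.
    assert(!finds(text, `a.b`));
    // Single-line mode only changes what '.' matches.
    assert(finds(text, `a.b`, "s"));

    // Default anchors match only at the boundaries of the input.
    assert(!finds(text, `^b$`));
    // Multi-line mode only changes where ^ and $ match.
    assert(finds(text, `^b$`, "m"));

    // The two flags are independent, so they can be combined.
    assert(finds(text, `^a.b$`, "sm"));
}
```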

BTW, in Unicode an end of line is much more than just \r or \n; among other things it includes the "unbreakable" two-codepoint sequence '\r\n'. I wonder whether any engine lets . match in the middle of \r\n, and whether engines also stop on any of the other end-of-line characters.


-- 
Dmitry Olshansky
October 15, 2011
On 12.10.2011 22:17, Dmitry Olshansky wrote:
> Fresh version of documentation is here:
> http://blackwhale.github.com/
>
> This fixes all typos reported so far, adds missing overload of replace
> (ouch!) and introduces a brand new syntax table.
>

Updated, with single-line mode and a few documentation fixes.
Source code is still here:
https://github.com/blackwhale/phobos

-- 
Dmitry Olshansky
October 22, 2011
Please note that the review ends this weekend, in just 32 hours, at which point voting will begin. Please do not wait for the vote to criticize the library.

Updated documentation: http://blackwhale.github.com/

On Sat, 08 Oct 2011 19:56:32 +0000, Jesse Phillips wrote:

> Hello everyone,
> 
> I have taken the role of review manager of the std.regex replacement by Dmitry Olshansky. The review period begins now 2011-10-8 and will end on 2011-10-23 at midnight UTC. A voting thread to include into Phobos will be held after review assuming such is appropriate. The Voting period is one week.
> 
> Please note that you can try FReD as part of Phobos (Code) or by itself
> (Package of FReD) which includes docs.
> 
> Doc:
> 
> http://nascent.freeshell.org/fred/doc/
> 
> Code:
> 
> https://github.com/blackwhale/phobos MASTER
> 
> Package of FReD:
> 
> https://github.com/downloads/blackwhale/FReD/FReD.zip
> 
> Remember this will be replacing the current std.regex and is intended to be a drop in replacement. This project is also part of GSoC.
> 
> Dmitry, I ask that you apply this patch to posix.mak (adding to internal
> modules).
> 
> --- a/posix.mak
> +++ b/posix.mak
> @@ -184,7 +184,8 @@ std/c/, fenv locale math process stdarg stddef stdio stdlib
>  time wcharh)
>  EXTRA_MODULES += $(EXTRA_DOCUMENTABLES) $(addprefix \
>         std/internal/math/, biguintcore biguintnoasm biguintx86 \
> -       gammafunction errorfunction) std/internal/processinit
> +       gammafunction errorfunction) std/internal/processinit \
> +       std/internal/uni std/internal/uni_tab
> 
>  # Aggregate all D modules relevant to this build
>  D_MODULES = crc32 $(STD_MODULES) $(EXTRA_MODULES) $(STD_NET_MODULES)

October 22, 2011
I haven't followed the discussion closely, and I cannot really comment on the core regex functionality, but I did actually use FReD as a replacement of a buggy std.regex once.

In that case I wanted to have a lazily created static regex, but I did not find an official way to test whether a Regex has been initialized:

	static Regex!char re;
	if(!isInitializedRE(re))
		re = regex(r"^(.*)\(([0-9]+)\):(.*)$");

So I implemented isInitializedRE() as "re.ir !is null" for std.regex and "re.captures() > 0" for FReD, but that defeats the idea of a "drop-in replacement".

I think both versions rely on implementation specifics; maybe there should be a documented way to test whether a Regex has been initialized.

I also noticed that "auto match(R, RegEx)(R input, RegEx re);" appears twice in the documentation, and the same goes for "bmatch". I guess they should not appear together with the string versions.

Rainer

October 22, 2011
On 22.10.2011 20:56, Rainer Schuetze wrote:
> I haven't followed the discussion closely, and I cannot really comment
> on the core regex functionality, but I did actually use FReD as a
> replacement of a buggy std.regex once.
>
> In that case I wanted to have a lazily created static regex, but I did
> not find an official way to test whether a Regex has been initialized:
>
> static Regex!char re;
> if(!isInitializedRE(re))
> re = regex(r"^(.*)\(([0-9]+)\):(.*)$");
>
> So I implemented isInitializedRE() as "re.ir !is null" for std.regex and
> "re.captures() > 0" for fred, but that fails for being a "drop-in
> replacement".

Coincidentally, you can still access the re.ir property this way.
Wow, I wonder how far with backwards compatibility I can go :)

In both cases this relies on undocumented features.
Even now I can suggest a more portable and entirely generic way:

if(re == Regex!(char).init)
{
//create re
}

Though that risks doing more work than needed.

>
> I think, both versions use implementation specifics, maybe there should
> be a documented way to test for being initialized.
>

Definitely. How about adding an empty property plus opCast to bool? With that you'd get:
if(!re)
{
//create re
}

and a bit more verbose:
if(re.empty)
{
//create re
}
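
A minimal sketch of the lazily created static regex from the original question, written against the proposed empty property. It assumes a std.regex version that actually provides Regex.empty (present-day Phobos documents one); the function name lineRegex and the sample input are made up for the illustration:

```d
import std.regex : Regex, regex, matchFirst;

/// Hypothetical accessor: returns the regex, compiling it on first use.
Regex!char lineRegex()
{
    static Regex!char re;   // thread-local; starts out as Regex!char.init
    if (re.empty)           // no compiled pattern stored yet
        re = regex(r"^(.*)\(([0-9]+)\):(.*)$");
    return re;
}

void main()
{
    // A "file(line): message" string in the shape the pattern expects.
    auto m = matchFirst("foo.d(42): some message", lineRegex());
    assert(!m.empty);
    assert(m[1] == "foo.d");
    assert(m[2] == "42");
    assert(m[3] == " some message");

    // A second call reuses the already compiled regex.
    assert(matchFirst("no line number here", lineRegex()).empty);
}
```

The check uses only the documented property, so it does not depend on internals such as re.ir.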

> I also noticed, that "auto match(R, RegEx)(R input, RegEx re);" appears
> twice in the documentation, same for "bmatch". I guess they should not
> appear together with the string versions.
>

I gather that happens because there is another overload specifically for C-T regexes. Its docs state just that, but without the template constraint the signatures are the same, so it can indeed cause some confusion.
Maybe it would be better to just combine docs together, and leave one overload undocumented.

-- 
Dmitry Olshansky
October 23, 2011
On Oct 22, 2011, at 12:05 PM, Dmitry Olshansky wrote:

> On 22.10.2011 20:56, Rainer Schuetze wrote:
>> […]
>> I think, both versions use implementation specifics, maybe there should
>> be a documented way to test for being initialized.
>> 
> 
> Definitely. How about adding an empty property + opCast to bool, with that you'd get:
> if(!re)
> {
> //create re
> }

I think this is better, should one ever want to switch to a plain pointer… Also, you need less thinking if it works the same way as for classes.

> and a bit more verbose:
> if(re.empty)
> {
> //create re
> }

October 23, 2011

On 22.10.2011 21:05, Dmitry Olshansky wrote:
> On 22.10.2011 20:56, Rainer Schuetze wrote:
>> I haven't followed the discussion closely, and I cannot really comment
>> on the core regex functionality, but I did actually use FReD as a
>> replacement of a buggy std.regex once.
>>
>> In that case I wanted to have a lazily created static regex, but I did
>> not find an official way to test whether a Regex has been initialized:
>>
>> static Regex!char re;
>> if(!isInitializedRE(re))
>> re = regex(r"^(.*)\(([0-9]+)\):(.*)$");
>>
>> So I implemented isInitializedRE() as "re.ir !is null" for std.regex and
>> "re.captures() > 0" for fred, but that fails for being a "drop-in
>> replacement".
>
> Coincidentally, you still can access re.ir property in this way.
> Wow, I wonder how far with backwards compatibility I can go :)
>
> In both cases this relies on undocumented features.
> Even now I can suggest a more portable and entirely generic way:
>
> if(re == Regex!(char).init)
> {
> //create re
> }
>
> Though that risks doing more work than needed.
>
>>
>> I think, both versions use implementation specifics, maybe there should
>> be a documented way to test for being initialized.
>>
>
> Definitely. How about adding an empty property + opCast to bool, with
> that you'd get:
> if(!re)
> {
> //create re
> }
>
> and a bit more verbose:
> if(re.empty)
> {
> //create re
> }

I think this might be confused with normal usage, like "is this regex the empty string?" (Is "" a valid regex?). Maybe a more explicit "valid()" predicate would be fine.

>
>> I also noticed, that "auto match(R, RegEx)(R input, RegEx re);" appears
>> twice in the documentation, same for "bmatch". I guess they should not
>> appear together with the string versions.
>>
>
> I gather that happens because there is another overload specifically for
> C-T regexes. Its docs state just that, but without the template
> constraint the signatures are the same, so it can indeed cause some confusion.
> Maybe it would be better to just combine docs together, and leave one
> overload undocumented.
>

As RegEx is a template argument here, it can stand for both Regex and StaticRegex, and that should be mentioned. Whether it has two different implementations is an implementation detail that does not need to bother the user.

If you want to keep the second entries, I'd recommend renaming the argument to StaticRegEx.
October 23, 2011
On 23.10.2011 11:28, Rainer Schuetze wrote:
>
>
> On 22.10.2011 21:05, Dmitry Olshansky wrote:
>> On 22.10.2011 20:56, Rainer Schuetze wrote:
>>> I haven't followed the discussion closely, and I cannot really comment
>>> on the core regex functionality, but I did actually use FReD as a
>>> replacement of a buggy std.regex once.
>>>
>>> In that case I wanted to have a lazily created static regex, but I did
>>> not find an official way to test whether a Regex has been initialized:
>>>
>>> static Regex!char re;
>>> if(!isInitializedRE(re))
>>> re = regex(r"^(.*)\(([0-9]+)\):(.*)$");
>>>
>>> So I implemented isInitializedRE() as "re.ir !is null" for std.regex and
>>> "re.captures() > 0" for fred, but that fails for being a "drop-in
>>> replacement".
>>
>> Coincidentally, you still can access re.ir property in this way.
>> Wow, I wonder how far with backwards compatibility I can go :)
>>
>> In both cases this relies on undocumented features.
>> Even now I can suggest a more portable and entirely generic way:
>>
>> if(re == Regex!(char).init)
>> {
>> //create re
>> }
>>
>> Though that risks doing more work than needed.
>>
>>>
>>> I think, both versions use implementation specifics, maybe there should
>>> be a documented way to test for being initialized.
>>>
>>
>> Definitely. How about adding an empty property + opCast to bool, with
>> that you'd get:
>> if(!re)
>> {
>> //create re
>> }
>>
>> and a bit more verbose:
>> if(re.empty)
>> {
>> //create re
>> }
>
> I think, this might be confused with normal usage, like "is this regex
> the empty string?" (Is "" a valid regex?). Maybe a more explicit
> "valid()" predicate would be fine.

"" is a valid regex that matches anywhere; with the global flag it will match before every codepoint plus once at the end.
I'm not sure 'valid' is a good name; it may mislead users into checking it all over the place, e.g.:
auto r = regex("blah");
if(r.valid())
...
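
A minimal sketch of the distinction under discussion, assuming a std.regex version that provides the empty property (present-day Phobos documents one): a Regex that was never assigned differs from one compiled from the empty pattern "". The variable names are made up for the illustration:

```d
import std.regex : Regex, regex, matchFirst;

void main()
{
    Regex!char unset;          // never assigned: holds no compiled pattern
    assert(unset.empty);

    auto blank = regex("");    // compiled from the empty pattern
    assert(!blank.empty);      // so it counts as initialized

    // The empty pattern still matches: a zero-width match in the input.
    auto m = matchFirst("abc", blank);
    assert(!m.empty);
    assert(m.hit.length == 0);
}
```

Here the property answers "has a pattern been compiled?" rather than "is the pattern the empty string?".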


>
>>
>>> I also noticed, that "auto match(R, RegEx)(R input, RegEx re);" appears
>>> twice in the documentation, same for "bmatch". I guess they should not
>>> appear together with the string versions.
>>>
>>
>> I gather that happens because there is another overload specifically for
>> C-T regexes. Its docs state just that, but without the template
>> constraint the signatures are the same, so it can indeed cause some
>> confusion.
>> Maybe it would be better to just combine docs together, and leave one
>> overload undocumented.
>>
>
> As RegEx is a template argument here, it can stand for both Regex and
> StaticRegex, and that should be mentioned. Whether it has two different
> implementations is an implementation detail that does not need to bother
> the user.

OK, will do.

>
> If you want to keep the second entries, I'd recommend renaming the
> argument to StaticRegEx.


-- 
Dmitry Olshansky
October 25, 2011
On 22.10.2011 at 21:05, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:

> Definitely. How about adding an empty property + opCast to bool, with that you'd get:
> if(!re)
> {
> //create re
> }

It is nice that you *can* do this,

> and a bit more verbose:
> if(re.empty)
> {
> //create re
> }

but I prefer a descriptive name here. Otherwise I'd assume 're' is a pointer or a boolean, and it is also harder to look up in the documentation.