[Issue 7260] New: "g" on default in std.regex.match
January 09, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7260

           Summary: "g" on default in std.regex.match
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: bearophile_hugs@eml.cc


--- Comment #0 from bearophile_hugs@eml.cc 2012-01-09 13:52:08 PST ---
D2 code:


import std.stdio: write, writeln;
import std.regex: regex, match;

void main() {
    string text = "abc312de";

    foreach (c; text.match("1|2|3|4"))
        write(c, " ");
    writeln();

    foreach (c; text.match(regex("1|2|3|4", "g")))
        write(c, " ");
    writeln();
}


It outputs (DMD 2.058 Head):

["3"]
["3"] ["1"] ["2"]


In my code I have seen that usually the "g" option (that means "repeat over the whole input") is what I want.

So what do you think about making "g" the default?



Note: I have not marked this issue as "enhancement" because of this comment by Dmitry Olshansky (found by drey_ on IRC #D):

http://dfeed.kimsufi.thecybershadow.net/discussion/thread/jc9hrl$2lpp$1@digitalmars.com#post-jc9mag:2430tq:241:40digitalmars.com

> Yet I have to issue yet another warning about new std.regex compared with old one:
> 
> import std.stdio;
> import std.regex;
> 
> void main() {
>     string src = "4.5.1";
>     foreach (c; match(src, regex(r"(\d+)")))
>         writeln(c.hit);
> }
> 
> Previously this would find all matches; now it finds only the first one. To get all of the matches, use the "g" option.
> 
> Seems like 100% compatibility was next to impossible.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
February 24, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7260


Dmitry Olshansky <dmitry.olsh@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh@gmail.com


--- Comment #1 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2012-02-24 12:21:44 PST ---
I dunno how to "fix" this bug. "g" by default implies there is a way to override
it. regex("blah","") ?
Leaving it as is now breaks old codebases that rely on "g" (though there should
be more legacy std.regexp code out there).
Making "g" the default affects old code only inside foreach and generic
constructs that show all matches or iterate over them; that's rare but non-zero.

Another way would be to ditch the current API, which is not ideal btw ;)

February 24, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7260



--- Comment #2 from bearophile_hugs@eml.cc 2012-02-24 12:45:08 PST ---
(In reply to comment #1)
> I dunno how to "fix" this bug. "g" by default implies there is a way to override
> it. regex("blah","") ?
> Leaving it as is now breaks old codebases that rely on "g" (though there should
> be more legacy std.regexp code out there).
> Making "g" the default affects old code only inside foreach and generic
> constructs that show all matches or iterate over them; that's rare but non-zero.
> 
> Another way would be to ditch the current API, which is not ideal btw ;)

Fully ditching the currently used API is probably too much.

A possible idea:
regex("blah") <<== repeat over the whole input.
regex("blah","") <<== repeat over the whole input.
regex("blah","g") <<== repeat over the whole input.
regex("blah","d") <<== doesn't repeat over the whole input.


So far you have done good work on the regular expression implementation, so I trust your work. Thank you.

April 19, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7260


SomeDude <lovelydear@mailmetrash.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lovelydear@mailmetrash.com
           Severity|normal                      |enhancement


April 19, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7260


bearophile_hugs@eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |normal


--- Comment #3 from bearophile_hugs@eml.cc 2012-04-19 15:18:13 PDT ---
This is not an enhancement request (I consider it more like a little Phobos
regression).

January 25, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=7260


bearophile_hugs@eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|"g" on default in           |"g" on default in std.regex
                   |std.regex.match             |


--- Comment #4 from bearophile_hugs@eml.cc 2013-01-24 19:21:14 PST ---
If changing std.regex.regex is not possible, then an alternative solution is to introduce the new little function "std.regex.re", that repeats on default, that is like:

re(someString) === regex(someString, "g")

re(someString, "d") === regex(someString, "dg")
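
A minimal sketch of what such a helper could look like, assuming it simply forwards to std.regex.regex. The name re and its signature are hypothetical (they are not part of Phobos), and "i" is used below only as an example of an extra flag. Since std.regex appears to reject a flag that is given twice, the sketch appends "g" only when it is missing:

```d
import std.algorithm.searching : canFind;
import std.regex : match, regex;

// Hypothetical helper: `re` is NOT part of std.regex.
// It builds a pattern that always carries the "g" (repeat) flag.
auto re(string pattern, string flags = "")
{
    // Assumption: a duplicated flag is rejected, so add "g" only when absent.
    return regex(pattern, flags.canFind('g') ? flags : flags ~ "g");
}

unittest
{
    string[] hits;
    // With "g" set, match() iterates over every match in the input.
    foreach (c; "abc312de".match(re("1|2|3|4")))
        hits ~= c.hit;
    assert(hits == ["3", "1", "2"]);

    // Extra flags are passed through; "i" makes the match case-insensitive.
    assert("ABC".match(re("b", "i")).front.hit == "B");
}
```

The choice of mode still travels with the pattern object here, so a caller who wants only the first match would have to take .front of the result.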

January 25, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=7260



--- Comment #5 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2013-01-25 12:22:46 PST ---
(In reply to comment #4)
> If changing std.regex.regex is not possible, then an alternative solution is to introduce the new little function "std.regex.re", that repeats on default, that is like:
> 
> re(someString) === regex(someString, "g")
> 
> re(someString, "d") === regex(someString, "dg")

Frankly this is stupid (sorry). Obviously the wrong turn is that people (rightfully so) associate "find all" vs "find first" with the operation, that is "match"/"replace", and not with the "regex" as in the pattern itself.

Personally I think that we had better go with explicit overrides on "match"/"replace"/etc. and very slowly deprecate the "g" switch.

What the override will look like is then up for debate.

match(someString, pattern).all //range of all matches
match(someString, pattern).first //only the first one
match(someString, pattern) // using the "g" flag to decide


Or pass the override as optional parameter to match:

match(someString, pattern, Regex.all);
match(someString, pattern, Regex.first);
match(someString, pattern); //use the flag

I'll probably open a poll to pick the better one.
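
For reference, recent Phobos releases provide std.regex.matchFirst and std.regex.matchAll, where the mode is chosen at the call site rather than through the "g" flag. A minimal sketch of that explicit style; firstHit and allHits are hypothetical helper names introduced only for this example:

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.regex : matchAll, matchFirst, regex;

// Hypothetical helper: text of the first match, or null when nothing matches.
string firstHit(string text, string pattern)
{
    auto m = matchFirst(text, regex(pattern));
    return m.empty ? null : m.hit;
}

// Hypothetical helper: text of every non-overlapping match, in input order.
string[] allHits(string text, string pattern)
{
    return matchAll(text, regex(pattern)).map!(m => m.hit).array;
}

unittest
{
    // No "g" flag anywhere: the function name alone selects the mode.
    assert(firstHit("abc312de", "1|2|3|4") == "3");
    assert(allHits("abc312de", "1|2|3|4") == ["3", "1", "2"]);
}
```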

March 10, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=7260



--- Comment #6 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2013-03-10 10:43:30 PDT ---
(In reply to comment #4)
> If changing std.regex.regex is not possible, then an alternative solution is to introduce the new little function "std.regex.re", that repeats on default, that is like:
> 
> re(someString) === regex(someString, "g")
> 
> re(someString, "d") === regex(someString, "dg")

Here is a plan based on one of my previous ideas that I think is clean enough, given the circumstances and the fact that e.g. this Perl-ism is fairly popular in certain circles.

(Namely attaching mode of operation to the pattern itself as in
/`pattern`/`mode-suffix`).

What we do is at first specify that "g" serves only as the intended default "mode" of this pattern.

Then introduce a simple and elegant way to explicitly specify what mode of matching to use: first, all or the default for this pattern.

Then your code looks like this (I'm still pondering better names/ways for
overriding the default):

void main() {
    string text = "abc312de";

    foreach (c; text.match("1|2|3|4").first)
        write(c, " ");
    writeln();

    foreach (c; text.match(regex("1|2|3|4")).all) //could use string pattern as above
        write(c, " ");
    writeln();
}

Then I'd try to do the same with replace. No overrides used would imply "use whatever the default mode is".

How does it sound?

Then we place a nice bold warning that use of the "g" option is discouraged, is provided only for compatibility, and is going to be deprecated in the future.

A year later and depending on the mood of people it gets finally deprecated and slowly shifted towards oblivion.

I'll probably cross-post this to NG to collect opinions since this is the largest pain point of the otherwise fine interface.

March 10, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=7260



--- Comment #7 from bearophile_hugs@eml.cc 2013-03-10 11:09:31 PDT ---
(In reply to comment #5)

> match(someString, pattern).all //range of all matches
> match(someString, pattern).first //only the first one
> match(someString, pattern) // using the "g" flag to decide


(In reply to comment #6)

> No overrides used would imply "use whatever the default mode is".
> 
> How does it sound?
> 
> Then we place a nice bold warning that use of the "g" option is discouraged, is provided only for compatibility, and is going to be deprecated in the future.
> 
> A year later and depending on the mood of people it gets finally deprecated and slowly shifted towards oblivion.

Once "g" is deprecated what is match(someString, pattern) (without all and
first) doing?

March 10, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=7260



--- Comment #8 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2013-03-10 11:54:55 PDT ---
(In reply to comment #7)
> (In reply to comment #5)
> 
> > match(someString, pattern).all //range of all matches
> > match(someString, pattern).first //only the first one
> > match(someString, pattern) // using the "g" flag to decide
> 
> 
> (In reply to comment #6)
> 
> > No overrides used would imply "use whatever the default mode is".
> > 
> > How does it sound?
> > 
> > Then we place a nice bold warning that use of the "g" option is discouraged, is provided only for compatibility, and is going to be deprecated in the future.
> > 
> > A year later and depending on the mood of people it gets finally deprecated and slowly shifted towards oblivion.
> 
> Once "g" is deprecated what is match(someString, pattern) (without all and
> first) doing?

Could go both ways. The other possibility I just thought about is:

match(...).first - is the same as current match(...).front i.e. simplify
interface for the case when 1 match is needed
match(...).all - the same as current match(... with "g" overridden) i.e. a range

Then once "g" is off we could make .all a nop.

An alternative is to make it an opaque object that has only 2 methods, .first/.all.

The third alternative is to add alias this to make .first implicit. I feel it won't work reliably with range-based templates as it would make it "2 ranges in one".

So only the first 2 are viable. I'd go with the 1st, which gets upgraded to the second once people forget about the "g" switch entirely.
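
A rough sketch of the "opaque object with only .first/.all" variant, written against the matchFirst and matchAll functions found in recent Phobos. LazyMatch and lazyMatch are hypothetical names used only for illustration; nothing here is the actual std.regex interface:

```d
import std.regex : matchAll, matchFirst, regex, Regex;

// Hypothetical wrapper: deliberately NOT a range, so the caller
// has to pick a mode before anything can be iterated.
struct LazyMatch
{
    private string input;
    private Regex!char pattern;

    // Captures of the first match only (empty if there is none).
    auto first() { return matchFirst(input, pattern); }

    // Range over all non-overlapping matches.
    auto all() { return matchAll(input, pattern); }
}

// Hypothetical entry point standing in for match(...).
LazyMatch lazyMatch(string input, string pattern)
{
    return LazyMatch(input, regex(pattern));
}

unittest
{
    auto m = lazyMatch("abc312de", "1|2|3|4");
    assert(m.first.hit == "3");

    string[] hits;
    foreach (c; m.all)
        hits ~= c.hit;
    assert(hits == ["3", "1", "2"]);
}
```

Because the wrapper is not itself a range, it avoids the "2 ranges in one" ambiguity of the alias this variant, at the cost of breaking code that iterates the result of match directly.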
