February 17, 2009
Andrei Alexandrescu:
> Terrific. I prefer "regex" to "regexp" because it's easier to pronounce, particularly if you're a foreigner. "Regex" sounds like a frog utterance by a forest lake, "regexp" sounds like nothing in particular.

I'd like std.re :-)

Bye,
bearophile
February 17, 2009
On Tue, 17 Feb 2009 10:36:06 -0800, Andrei Alexandrescu wrote:

> I'm quite unhappy with the API of std.regexp.

I was so happy with using it I wrote my own simplified regex ;-)

> In the upcoming releases of D 2.0 there will be rather dramatic breaking changes of phobos. I just wanted to ask whether y'all could stomach yet another rewritten API or you'd rather use std.regexp as it is for the time being.

If your changes are going to make things better for coding and maintenance then go for it.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
February 18, 2009
On Wed, Feb 18, 2009 at 7:44 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Bill Baxter wrote:
>>
>> I think this choice is not so much available with D1, plus the constructor situation with D1 is less than ideal.  Given that, I think the choice of class for RegEx was appropriate.  But if the struct problems are all going away in D2, then that's great.  Sounds like you're saying we'll really be able to use D structs just like one uses a non-polymorphic C++ class.  If so, then that's super.
>
> I lost that perspective when criticizing RegExp, you're right. But still the API is lousy - every single time I am using a RegExp, I find myself fumbling through the thoroughly overlapping primitives in the documentation, and never seem to find an idiom that's simple, comfortable, and memorable.

Ok.  I'm certainly not in love with the API either.  Though, the only
RegEx API I've ever used that I felt totally comfortable with was
Perl's, which in large part is syntax instead of an API.  Python's
syntax I have to look over the documentation every time I use it, too.
 Maybe it's because of the "matching" vs "searching" distinction that
I find impossible to remember.
(http://docs.python.org/library/re.html)

--bb
February 18, 2009
Andrei Alexandrescu wrote:
> Walter Bright wrote:
>> Andrei Alexandrescu wrote:
>>> I was thinking of moving older stuff to etc, is that ok?
>>
>> Yes. But you should also rename the new one, perhaps to std.regex. That way, legacy code will refuse to compile, rather than compile wrongly.
> 
> Terrific. I prefer "regex" to "regexp" because it's easier to pronounce, particularly if you're a foreigner. "Regex" sounds like a frog utterance by a forest lake, "regexp" sounds like nothing in particular.
> 
> Andrei

It sounds to me like a frog who, immediately post-utterance, just got gigged.  I guess that makes "regex" sound even better... as it's still alive (sounding).

-- Chris Nicholson-Sauls
-- Who so far agrees with pretty much everything you've said, and therefore has no real contribution...
February 18, 2009
Jarrett Billingsley wrote:
> On Tue, Feb 17, 2009 at 3:16 PM, BCS <ao@pathlink.com> wrote:
>> could this be transitioned to CTFE? you could even have a debug mode that
>> delays till runtime
>>
>> RegEx matcher = new CTFERegEx!("some regex");
>>
>>
>> class CTFERegEx(char[] regex) : RegEx
>> {
>>      debug(NoCTFE)  static char[] done;
>>      else     static const char[] done = CTFECompile(regex);
>>
>>      public this()
>>      {
>>         debug(NoCTFE) if(done == null) done = CTFECompile(regex);
>>
>>         super(done);
>>      }
>> }
> 
> For what it's worth the Tango regexes actually have a method to output
> a D function that will implement the regex after it's compiled.  So
> you _could_ precompile the regex into D code and use that.

A feature which I *adore* by the way.  So long as the precompiled regex is "guaranteed" to run at best possible performance (hand-rolled, hand-optimized solutions notwithstanding) I for one prefer them.

-- Chris Nicholson-Sauls
February 18, 2009
Bill Baxter:
>Python's syntax I have to look over the documentation every time I use it, too. Maybe it's because of the "matching" vs "searching" distinction that I find impossible to remember.<

I agree, I too need the Python docs every time I want to use something more than the basics. The syntax for capturing groups is bad too (groups? group? itersomething? etc). I proposed an improvement (using [5] to grab the 5th group instead of calling group()), but it was not implemented. Such syntax is possible in D too *hint*. It's because of situations like this that I say that designing a good API for std.re isn't easy at all. It will require care, brain, and maybe two or more tries :-)
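
For readers who don't have the Python docs open, here is a minimal sketch of the group-access calls being complained about. The helper name `parse_pairs` and the sample strings are made up for illustration. Note that index access on match objects did arrive later (Python 3.6 and newer), so the last assertion would not have worked at the time of this thread.

```python
import re

# Hypothetical helper: collect every key=value pair found in a string.
def parse_pairs(text):
    pair = re.compile(r"(\w+)=(\d+)")
    # finditer() yields one match object per non-overlapping match;
    # groups() returns all captured subgroups as a tuple.
    return [m.groups() for m in pair.finditer(text)]

m = re.search(r"(\w+)=(\d+)", "width=80")
assert m.group(0) == "width=80"        # group(0) is the whole match
assert m.group(1) == "width"           # group(n) is the n-th subgroup
assert m.groups() == ("width", "80")   # groups() is every subgroup at once
assert m[2] == "80"                    # index access, Python 3.6+ only
```

Three spellings for closely related operations is roughly the overlap the post objects to.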

Bye,
bearophile
February 18, 2009
On Tue, Feb 17, 2009 at 7:13 PM, Bill Baxter <wbaxter@gmail.com> wrote:
>
> Ok.  I'm certainly not in love with the API either.  Though, the only
> RegEx API I've ever used that I felt totally comfortable with was
> Perl's, which in large part is syntax instead of an API.  Python's
> syntax I have to look over the documentation every time I use it, too.
>  Maybe it's because of the "matching" vs "searching" distinction that
> I find impossible to remember.
> (http://docs.python.org/library/re.html)
>

Is there ever a situation where you want to use a single regexp for both matching _and_ searching?  And if not, couldn't you just use ^ to anchor it?  I never understood why Python's API makes such a distinction.
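
As a reference point for the question, a small sketch of how Python's `re` behaves. The names `NUMBER`, `ANCHORED` and `MULTI` are illustrative only. It suggests that `^` plus `search()` covers the simple case, and that the two diverge once `re.MULTILINE` is involved.

```python
import re

NUMBER = re.compile(r"\d+")
ANCHORED = re.compile(r"^\d+")
MULTI = re.compile(r"^\d+", re.MULTILINE)

text = "abc 123"

# match() only succeeds if the pattern matches at the start of the string.
assert NUMBER.match(text) is None
# search() scans forward for the first position where the pattern matches.
assert NUMBER.search(text).group() == "123"

# For single-line input, anchoring with ^ makes search() act like match().
assert ANCHORED.search(text) is None
assert ANCHORED.search("123 abc").group() == "123"

# With re.MULTILINE, ^ also matches right after a newline,
# while match() still only tries the very start of the string.
assert MULTI.search("abc\n456").group() == "456"
assert MULTI.match("abc\n456") is None
```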
February 18, 2009
Jarrett Billingsley wrote:
> On Tue, Feb 17, 2009 at 7:13 PM, Bill Baxter <wbaxter@gmail.com> wrote:
>> Ok.  I'm certainly not in love with the API either.  Though, the only
>> RegEx API I've ever used that I felt totally comfortable with was
>> Perl's, which in large part is syntax instead of an API.  Python's
>> syntax I have to look over the documentation every time I use it, too.
>>  Maybe it's because of the "matching" vs "searching" distinction that
>> I find impossible to remember.
>> (http://docs.python.org/library/re.html)
>>
> 
> Is there ever a situation where you want to use a single regexp for
> both matching _and_ searching?  And if not, couldn't you just use ^ to
> anchor it?  I never understood why Python's API makes such a
> distinction.

Ehm, that's odd. You'd think that after Perl has set the precedent, it would be hard to do major goofs in designing a regex API.

By the way, the more I dig into std.regexp, the stiffer the hair on my neck gets. Get this: the API offers both global functions and member functions, with both RegExp and plain string arguments. The latter are carefully designed to maximize the number of clashes, potential confusions, and errors when using both std.string and std.regexp.

But wait, there's more. The API defines the following functions that all ostensibly do some sort of mattern patching (sic): find, search, test, match, and exec. I wish I were kidding. There's some opIndex and opEquals thrown in for good measure. Knuth wouldn't know what each of them does after studying them for a week and then watching an episode from "The Bachelor". And get this: global search() does not do what member search() does. Nope. Global search() does what member test() does. I have only contempt for such designs.


Andrei
February 18, 2009
On Wed, Feb 18, 2009 at 11:38 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> Jarrett Billingsley wrote:
>>
>> On Tue, Feb 17, 2009 at 7:13 PM, Bill Baxter <wbaxter@gmail.com> wrote:
>>>
>>> Ok.  I'm certainly not in love with the API either.  Though, the only
>>> RegEx API I've ever used that I felt totally comfortable with was
>>> Perl's, which in large part is syntax instead of an API.  Python's
>>> syntax I have to look over the documentation every time I use it, too.
>>>  Maybe it's because of the "matching" vs "searching" distinction that
>>> I find impossible to remember.
>>> (http://docs.python.org/library/re.html)
>>>
>>
>> Is there ever a situation where you want to use a single regexp for both matching _and_ searching?  And if not, couldn't you just use ^ to anchor it?  I never understood why Python's API makes such a distinction.
>
> Ehm, that's odd. You'd think that after Perl has set the precedent, it would be hard to do major goofs in designing a regex API.
>
> By the way, the more I dig into std.regexp, the stiffer the hair on my neck gets. Get this: the API offers both global functions and member functions, with both RegExp and plain string arguments. The latter are carefully designed to maximize the number of clashes, potential confusions, and errors when using both std.string and std.regexp.

All I know is that I found one incantation that works and I've been copy-pasting that ever since. :-)

> But wait, there's more. The API defines the following functions that all ostensibly do some sort of mattern patching (sic): find, search, test, match, and exec. I wish I were kidding. There's some opIndex and opEquals thrown in for good measure. Knuth wouldn't know what each of them does after studying them for a week and then watching an episode from "The Bachelor". And get this: global search() does not do what member search() does. Nope. Global search() does what member test() does. I have only contempt for such designs.

Maybe "design" is too strong a word.  Most Phobos modules seem to have been put together rather hastily in order to fill a pressing need. Often *something* is better than nothing at all, even if the something is not so great.

--bb
February 18, 2009
On Tue, Feb 17, 2009 at 9:38 PM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> By the way, the more I dig into std.regexp, the stiffer the hair on my neck gets. Get this: the API offers both global functions and member functions, with both RegExp and plain string arguments. The latter are carefully designed to maximize the number of clashes, potential confusions, and errors when using both std.string and std.regexp.
>
> But wait, there's more. The API defines the following functions that all ostensibly do some sort of mattern patching (sic): find, search, test, match, and exec. I wish I were kidding. There's some opIndex and opEquals thrown in for good measure. Knuth wouldn't know what each of them does after studying them for a week and then watching an episode from "The Bachelor". And get this: global search() does not do what member search() does. Nope. Global search() does what member test() does. I have only contempt for such designs.

Well I don't mean to, uh, toot my own horn but.. I recently bound libpcre to MiniD and came up with a relatively simple but powerful and orthogonal API.

http://www.dsource.org/projects/minid/wiki/Addons/PcreLib#LibraryReference

The regex object has a single "subject" string at a time, the string that it's matching against. The subject is set with "search", and "test" does everything else.  All other functions are basically defined in terms of those two.  "test" looks for the next match of the regex in the subject and returns true if it matched.  "match" returns match groups (0 for the whole regex and 1..n for subgroups, as well as string indices for named subgroups).  opApply is just a quicker way of writing something like:

re.search(someSubject)

while(re.test())
    // use re.match to get matches

You'll notice that opApply is also just defined in terms of test.
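
To make the shape of that API concrete, here is a rough Python sketch of a stateful regex object in the same spirit. It is not the MiniD binding: the class name `SubjectRegex`, the method bodies, and the way empty matches are handled are all assumptions built on Python's `re` module, following only the description above.

```python
import re

class SubjectRegex:
    """Hypothetical stateful wrapper: search() sets the subject, test() advances."""

    def __init__(self, pattern):
        self._re = re.compile(pattern)
        self._subject = ""
        self._pos = 0
        self._last = None

    def search(self, subject):
        # Set the subject string and rewind to its start.
        self._subject = subject
        self._pos = 0
        self._last = None
        return self

    def test(self):
        # Look for the next match in the subject; True if one was found.
        if self._pos > len(self._subject):
            self._last = None
            return False
        m = self._re.search(self._subject, self._pos)
        self._last = m
        if m is None:
            return False
        # Step past empty matches so repeated test() calls always terminate.
        self._pos = m.end() if m.end() > m.start() else m.end() + 1
        return True

    def match(self, group=0):
        # Group 0 is the whole match, 1..n are subgroups, strings are named groups.
        if self._last is None:
            raise ValueError("no current match; call test() first")
        return self._last.group(group)

    def __iter__(self):
        # Stand-in for opApply: defined purely in terms of test().
        while self.test():
            yield self

r = SubjectRegex(r"(?P<key>\w+)=(\d+)").search("a=1, b=22")
keys = [m.match("key") for m in r]
assert keys == ["a", "b"]
```

The point of the design is that a caller only has to remember three names, and iteration falls out of `test()` for free.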

I've found it far more intuitive than other APIs.  I've never used Perl and I doubt I ever will, though.