earthquake changes of std.regexp to come
February 17, 2009
I'm quite unhappy with the API of std.regexp. It's a chaotic design that provides a hodgepodge of functionality and tries to use as many synonyms of "to find" in the dictionary (e.g. search, match). I could swear Walter never really cared for using regexps, and that is felt throughout the design: it fills the bullet point but it's asinine to use.

Besides std.regexp only works with (narrow) strings and we want it to work on streams of all widths and structures. One pet complaint I have is that std.regexp puts a class around it all as if everybody's favorite pastime would be to inherit Regexp and override some random function in it.

In the upcoming releases of D 2.0 there will be rather dramatic breaking changes of phobos. I just wanted to ask whether y'all could stomach yet another rewritten API or you'd rather use std.regexp as it is for the time being.


Andrei
February 17, 2009
Don't be too hard on the good Walter, please :-) One good thing about his designs (in D1) is that they are often simple to use: they give you back much more than you put into them. D2 seems to ask much more of the programmer.

I agree that the regex API in Phobos is not very good, but I think designing a good one is quite hard.


> I just wanted to ask whether y'all could stomach yet another rewritten API or you'd rather use std.regexp as it is for the time being.

I have no problem accepting changes here too. D2 is already essentially another language compared to D1.

The regexes of D1 Phobos have problems bigger than just the API: in the past I have found some common cases where matching is O(n^2) or worse.

You can see a case of such behaviour here (look at my comments, which show what parts are slow; I have also commented out versions that are more logical but much slower):
http://shootout.alioth.debian.org/debian/benchmark.php?test=regexdna&lang=gdc&id=4

If you want to test that code you can generate test data with this other code: http://shootout.alioth.debian.org/debian/benchmark.php?test=fasta&lang=dlang&id=1

Bye,
bearophile
February 17, 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> I'm quite unhappy with the API of std.regexp. It's a chaotic design that
> provides a hodgepodge of functionality and tries to use as many synonyms
> of "to find" in the dictionary (e.g. search, match). I could swear
> Walter never really cared for using regexps, and that is felt throughout
> the design: it fills the bullet point but it's asinine to use.
> Besides std.regexp only works with (narrow) strings and we want it to
> work on streams of all widths and structures. One pet complaint I have
> is that std.regexp puts a class around it all as if everybody's favorite
> pastime would be to inherit Regexp and override some random function in it.
> In the upcoming releases of D 2.0 there will be rather dramatic breaking
> changes of phobos. I just wanted to ask whether y'all could stomach yet
> another rewritten API or you'd rather use std.regexp as it is for the
> time being.
> Andrei

As I've said before, anyone who can't stomach breaking changes w/o complaining has no business using D2 at this point.  I'd rather deal with the aggravation of stuff breaking in the short run to have a nice language and libraries to go with it in the long run.

This whole concept of ranges as you've created them seems to have achieved the holy grail of both making simple things simple and complex things possible, where "complex things" includes needing code to be efficient, so I can see your reason for wanting to redo all kinds of stuff in them.  This compares favorably to C++ STL iterators, which are very flexible and efficient but a huge PITA to use for simple things because the syntax is so low-level and ugly, and to the D1/early D2 way, which gives beautiful, simple notation for the more common cases (basic dynamic arrays), at the expense of flexibility when doing more complicated things like streams, chaining, strides, etc.
February 17, 2009
On Wed, Feb 18, 2009 at 3:36 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> Besides std.regexp only works with (narrow) strings and we want it to work on streams of all widths and structures. One pet complaint I have is that std.regexp puts a class around it all as if everybody's favorite pastime would be to inherit Regexp and override some random function in it.

So what do you think it should be, a struct?
That would imply to me that everybody's favorite pastime is making
value copies of regex structures, when in fact nobody does that.

Regex is a class in order to give it reference semantics and provide encapsulation of some re-usable state.  Maybe it should be a final class, but my impression is "final class" doesn't really work in D.

--bb
February 17, 2009
== Quote from dsimcha (dsimcha@yahoo.com)'s article
> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> > I'm quite unhappy with the API of std.regexp. It's a chaotic design that
> > provides a hodgepodge of functionality and tries to use as many synonyms
> > of "to find" in the dictionary (e.g. search, match). I could swear
> > Walter never really cared for using regexps, and that is felt throughout
> > the design: it fills the bullet point but it's asinine to use.
> > Besides std.regexp only works with (narrow) strings and we want it to
> > work on streams of all widths and structures. One pet complaint I have
> > is that std.regexp puts a class around it all as if everybody's favorite
> > pastime would be to inherit Regexp and override some random function in it.
> > In the upcoming releases of D 2.0 there will be rather dramatic breaking
> > changes of phobos. I just wanted to ask whether y'all could stomach yet
> > another rewritten API or you'd rather use std.regexp as it is for the
> > time being.
> > Andrei
> As I've said before, anyone who can't stomach breaking changes w/o complaining has
> no business using D2 at this point.  I'd rather deal with the aggravation of stuff
> breaking in the short run to have a nice language and libraries to go with it in
> the long run.
> This whole concept of ranges as you've created them seems to have achieved the
> holy grail of both making simple things simple and complex things possible, where
> "complex things" includes needing code to be efficient, so I can see your reason
> for wanting to redo all kinds of stuff in them.  This compares favorably to C++
> STL iterators, which are very flexible and efficient but a huge PITA to use for
> simple things because the syntax is so low-level and ugly, and to the D1/early D2
> way, which gives beautiful, simple notation for the more common cases (basic
> dynamic arrays), at the expense of flexibility when doing more complicated things
> like streams, chaining, strides, etc.

BTW, can you elaborate on how arrays, both builtin and any library versions, will work when everything is finalized?
February 17, 2009
On Wed, Feb 18, 2009 at 3:36 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> I'm quite unhappy with the API of std.regexp. It's a chaotic design that provides a hodgepodge of functionality and tries to use as many synonyms of "to find" in the dictionary (e.g. search, match). I could swear Walter never really cared for using regexps, and that is felt throughout the design: it fills the bullet point but it's asinine to use.
>
> Besides std.regexp only works with (narrow) strings and we want it to work on streams of all widths and structures. One pet complaint I have is that std.regexp puts a class around it all as if everybody's favorite pastime would be to inherit Regexp and override some random function in it.
>
> In the upcoming releases of D 2.0 there will be rather dramatic breaking changes of phobos. I just wanted to ask whether y'all could stomach yet another rewritten API or you'd rather use std.regexp as it is for the time being.
>

Btw, I've got no problems with you breaking the API of 2.0 either. Though you might consider moving the current implementation to std.deprecated.regex and leaving it there for a year with a pragma(msg, "This module is deprecated").

That way making a quick fix to broken code is just a matter of inserting ".deprecated" into your import statements.

--bb
February 17, 2009
Reply to Andrei,

> I'm quite unhappy with the API of std.regexp. It's a chaotic design
> that provides a hodgepodge of functionality and tries to use as many
> synonyms of "to find" in the dictionary (e.g. search, match). I could
> swear Walter never really cared for using regexps, and that is felt
> throughout the design: it fills the bullet point but it's asinine to
> use.
> 
> Besides std.regexp only works with (narrow) strings and we want it to
> work on streams of all widths and structures. One pet complaint I have
> is that std.regexp puts a class around it all as if everybody's
> favorite pastime would be to inherit Regexp and override some random
> function in it.
> 
> In the upcoming releases of D 2.0 there will be rather dramatic
> breaking changes of phobos. I just wanted to ask whether y'all could
> stomach yet another rewritten API or you'd rather use std.regexp as it
> is for the time being.
> 
> Andrei
> 

For what it's worth, I have a partial clone of the .NET API built on top of PCRE. I would have to ask my boss, but I expect I could donate it if anyone wants to use it as a basis.


February 17, 2009
On Tue, Feb 17, 2009 at 8:39 PM, BCS <ao@pathlink.com> wrote:
> For what it's worth, I have a partial clone of the .NET API built on top of PCRE. I would have to ask my boss, but I expect I could donate it if anyone wants to use it as a basis.

Actually, I was wondering why nobody is considering real regular languages anymore, the kind that can be compiled to a normal finite state recognizer or transducer. While these may not be as fancy as Perl-like extensions, they are much faster, and it's easier to do fun stuff such as composition.
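
To make the composition point concrete, here is a minimal Python sketch of the textbook product construction, which combines two finite state recognizers into one that accepts only what both accept. The `DFA` class, the `intersect` helper and the two example automata are hypothetical names made up for this illustration; nothing here is taken from Phobos, Tango or PCRE.

```python
class DFA:
    """A complete deterministic finite automaton over a fixed alphabet."""

    def __init__(self, alphabet, transitions, start, accepting):
        self.alphabet = alphabet
        self.transitions = transitions  # dict: (state, symbol) -> state
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, text):
        # One table lookup per input symbol: linear time, no backtracking.
        state = self.start
        for ch in text:
            state = self.transitions[(state, ch)]
        return state in self.accepting


def intersect(a, b):
    """Product construction: accept exactly the strings both a and b accept."""
    start = (a.start, b.start)
    transitions = {}
    seen = {start}
    todo = [start]
    while todo:
        sa, sb = todo.pop()
        for ch in a.alphabet:
            nxt = (a.transitions[(sa, ch)], b.transitions[(sb, ch)])
            transitions[((sa, sb), ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    accepting = {s for s in seen if s[0] in a.accepting and s[1] in b.accepting}
    return DFA(a.alphabet, transitions, start, accepting)


# Example automata over the alphabet {a, b}.
even_as = DFA("ab", {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ends_b = DFA("ab", {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})

both = intersect(even_as, ends_b)
print(both.accepts("aab"))  # True: two a's, ends in b
print(both.accepts("ab"))   # False: odd number of a's
```

Backreferences and similar Perl-style extensions fall outside the regular languages, so a pattern that uses them cannot be compiled to an automaton of this kind.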

Take care,
Daniel
February 17, 2009
bearophile wrote:
> I agree that the API of regexes in Phobos is not much good, but I think designing a good API for it is quite hard.

So steal one, rather than invent something new. My suggestion would be to expose the DFA object, as in Plan 9’s library (documentation at <http://plan9.bell-labs.com/magic/man2html/2/regexp>, implementation at <http://plan9.bell-labs.com/sources/plan9/sys/src/libregexp/>, discussion and links to a Unix implementation at <http://swtch.com/~rsc/regexp/>).

Simple API:
• regcomp: Compile a regexp DFA;
• regexec: Apply it to a string, returning a slice of the string that
matches the first hit (or an array of slices if parenthesized
expressions are used); and
• regsub: Apply substitutions to subexpressions of the matching slice.
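
The shape of that three-call API can be sketched in a few lines of Python on top of the standard `re` module. The function names are borrowed from the list above, but the Python signatures and return types are assumptions chosen for the illustration; they mirror neither the Plan 9 C prototypes nor anything in std.regexp.

```python
import re


def regcomp(pattern):
    """Compile the pattern once so it can be reused across many inputs."""
    return re.compile(pattern)


def regexec(prog, text):
    """Return the first hit as a list of slices, or None if nothing matches.

    Element 0 is the whole match; elements 1.. are the parenthesized groups.
    """
    m = prog.search(text)
    if m is None:
        return None
    return [m.group(0), *m.groups()]


def regsub(template, slices):
    r"""Replace \0..\9 in template with the corresponding slice from regexec."""
    def pick(m):
        # A group that did not take part in the match is None; use "".
        return slices[int(m.group(1))] or ""
    return re.sub(r"\\(\d)", pick, template)


prog = regcomp(r"(\w+)@(\w+)\.org")
hit = regexec(prog, "write to joe@example.org now")
print(hit)                         # ['joe@example.org', 'joe', 'example']
print(regsub(r"\1 at \2", hit))    # joe at example
```

The compiled object is just a value handed from one call to the next, so the caller decides how long it lives and how often it is reused.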

—Joel Salomon
February 17, 2009
On Tue, Feb 17, 2009 at 2:47 PM, Daniel de Kok <me@danieldk.org> wrote:
> On Tue, Feb 17, 2009 at 8:39 PM, BCS <ao@pathlink.com> wrote:
>> For what it's worth, I have a partial clone of the .NET API built on top of PCRE. I would have to ask my boss, but I expect I could donate it if anyone wants to use it as a basis.
>
> Actually, I was wondering why nobody is considering real regular languages anymore, the kind that can be compiled to a normal finite state recognizer or transducer. While these may not be as fancy as Perl-like extensions, they are much faster, and it's easier to do fun stuff such as composition.

Tango's regex engine is just that.  It uses a tagged NFA method. http://www.dsource.org/projects/tango/docs/current/tango.text.Regex.html

The problem with this method is that while it's certainly faster to match, it's MUCH slower to compile.  There are no pathological matches; only pathological compiles ;)  I'm talking 60-70 seconds to compile a more complex regex.  This might be an acceptable tradeoff for when you need to compile a regex in a long-running app like a server, but it's completely unacceptable for most small, Perl-like text munging programs.

Unless of course this slowdown is unique to Tango's implementation of this method!
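
Whatever the cause in Tango's case, there is a general textbook reason why compiling to a deterministic automaton can be expensive: the subset construction may need exponentially many states. The Python sketch below counts the reachable DFA states for the classic pattern (a|b)*a(a|b){n}. The helper names are hypothetical and the code says nothing about how Tango's engine actually works.

```python
def nfa_for_nth_from_end(n):
    """NFA for (a|b)*a(a|b){n}: states 0..n+1, with state n+1 accepting."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n + 1):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta, 0, {n + 1}


def count_dfa_states(delta, start, alphabet="ab"):
    """Run the subset construction and count the reachable DFA states."""
    start_set = frozenset([start])
    seen = {start_set}
    todo = [start_set]
    while todo:
        current = todo.pop()
        for ch in alphabet:
            nxt = frozenset(t for s in current for t in delta.get((s, ch), ()))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)


for n in (1, 5, 10):
    delta, start, _accepting = nfa_for_nth_from_end(n)
    # The NFA has n + 2 states; the DFA needs 2 ** (n + 1).
    print(n, count_dfa_states(delta, start))
```

For this family the NFA grows linearly with n while the DFA doubles at every step, which is the kind of trade-off described above: matching is a single pass over the input, but building the matcher can dominate the running time of a short script.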