[draft] New std.regex walkthrough (page 2)

March 14, 2012

Re: [draft] New std.regex walkthrough

Posted by Dmitry Olshansky
in reply to Brad Anderson

On 14.03.2012 0:32, Brad Anderson wrote:
> On Tue, Mar 13, 2012 at 1:27 PM, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:
>
>     For a couple of releases we have a new revamped std.regex, that as
>     far as I'm concerned works nicely, thanks to my GSOC commitment last
>     summer. Yet there was certain dark trend around std.regex/std.regexp
>     as both had severe bugs, missing documentation and what not, enough
>     to consider them unusable or dismiss prematurely.
>
>     It's about time to break this gloomy aura, and show that std.regex
>     is actually easy to use, that it does the thing and has some nice
>     extras.
>
>     Link: http://blackwhale.github.com/regular-expression.html
>
>     Comments are welcome from experts and newbies alike, in fact it
>     should encourage people to try out a few tricks ;)
>
>     This is intended as replacement for an article on dlang.org
>     about outdated (and soon to disappear) std.regexp:
>     http://dlang.org/regular-expression.html
>
>     [Spoiler] one example relies on a parser bug being fixed (blush):
>     https://github.com/D-Programming-Language/phobos/pull/481
>     Well, it was a specific lookahead inside lookaround so that's not
>     severe bug ;)
>
>     P.S. I've been following through a bunch of new bug reports
>     recently, thanks to everyone involved :)
>
>
>     --
>     Dmitry Olshansky
>
>
> Second paragraph:
> - "..,expressions, though one though one should..." has too many "though
> one"s
>
> Third paragraph:
> - "...keeping it's implementation..." should be "its"
> - "We'll see how close to built-ins one can get this way." was kind of
> confusing.  I'd consider just doing away with the distinction between
> built in and non-built in regex since it's an implementation detail most
> programmers who use it don't even need to know about.  Maybe say that it
> is not built in and explain why that is a neat thing to have (meaning,
> the language itself is powerful enough to express it in user code).
>

Yeah, the point about built-in vs library is kind of dangling in the air for now. Will see how to wrap it up.

> Fourth paragraph:
> - "...article you'd have..." should probably be "you'll" or, preferably,
> "you will".
> - "...utilize it's API..." should be "its"
> - "yet it's not required to get an understanding of the API." I'd
> probably change this to "...yet it's not required to understand the API"
>
> Lost track of which paragraph:
> - "... that allows writing a regex pattern in it's natural notation"
> another "its"
> - "trying to match special characters like" I'd write "trying to match
> special regex characters like" for clarity
> - "over input like e.g. search or simillar" I'd remove the e.g., write
> search as "search()" to show it's a function in other languages and fix
> the spelling of similar :P
> - "An element type is Captures for the string type being used, it is a
> random access range." I just found this confusing.  Not sure what it's
> trying to say.
> - "I won't go into full detail of the range conception, suffice to say,"
> I'd change "conception" to "concept" and remove "suffice to say". (It's
> a shame we don't have a range article we can link to).
> - "At that time ancors like" misspelled "anchors"

All to the point and fixed.
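
For anyone else puzzled by the "element type is Captures" sentence, here is a minimal sketch of what it seems to be describing: iterating over match() yields one Captures per match, and each Captures can be indexed like an array. The helper name swapPairs and the sample input are made up for illustration; they are not from the article.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: turns "key=value" pairs into "value:key" strings.
string[] swapPairs(string text)
{
    string[] result;
    // With the "g" flag, match() yields every match, not just the first.
    foreach (c; match(text, regex(r"([a-z]+)=([0-9]+)", "g")))
    {
        // c is a Captures!string: c[0] is the whole match,
        // c[1] and c[2] are the two parenthesized groups.
        result ~= c[2] ~ ":" ~ c[1];
    }
    return result;
}

void main()
{
    writeln(swapPairs("width=80 height=24")); // prints ["80:width", "24:height"]
}
```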

> - "Needless to say, one need not" I'd remove the "Needless to say,"
> because I think it's actually important to say :P

It's not important, as it has no effect on matching if there are no anchors. It's just cleaner for the reader, because it alerts along the lines of "hm, this guy doesn't know what multi-line is, let's stay sharp and watch out for other problems".

> - "replace(text, regex(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})","g"),
> "--");" Is this code example correct?  It references $1, $2, etc. in the
> explanatory paragraph below but they are nowhere to be found.

Damnable DDoc ate my dollars!
And that's inside a source code section; any ideas on how to avoid this mess?
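
Until the page is fixed, here is a sketch of what the call presumably looks like with the dollars restored. The replacement string "$3-$1-$2" is an assumption (the rendered page only shows "--"), as are the mm/dd/yyyy input convention and the helper name toIsoDates; only the pattern itself is taken from the quoted line.

```d
import std.regex;
import std.stdio;

// Hypothetical helper name; the pattern is the one quoted above.
string toIsoDates(string text)
{
    auto datePattern = regex(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})", "g");
    // Assumed format string: $1 = month, $2 = day, $3 = year.
    // The "g" flag makes replace() rewrite every date, not just the first.
    return replace(text, datePattern, "$3-$1-$2");
}

void main()
{
    writeln(toIsoDates("Drafted 03/13/2012, replied 03/14/2012."));
    // prints: Drafted 2012-03-13, replied 2012-03-14.
}
```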

> - When you are explaining named captures it sounds like you are about to
> show them in the subsequent code example but you are actually showing
> what it'd look like without them which was a bit confusing.
> - Maybe some more words on what lookaround/lookahead do as I was lost.
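
On these two points, a small sketch may help, with the caveat that it is not the article's example: the helper names and inputs below are invented. The first function uses named captures via the (?P<name>...) syntax, the second uses a positive lookahead (?=...), which requires the following text to be present without making it part of the match.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: pulls the year out of an mm/dd/yyyy date by group name.
string yearOf(string date)
{
    auto re = regex(r"(?P<month>[0-9]{1,2})/(?P<day>[0-9]{1,2})/(?P<year>[0-9]{4})");
    auto m = match(date, re);
    // Named groups are looked up with a string index instead of a number.
    return m.empty ? "" : m.captures["year"];
}

// Hypothetical helper: finds the first number that is followed by " USD".
string usdAmount(string text)
{
    // (?= USD) checks that " USD" comes next but does not consume it,
    // so the matched text (hit) contains only the digits.
    auto m = match(text, regex(r"[0-9]+(?= USD)"));
    return m.empty ? "" : m.hit;
}

void main()
{
    writeln(yearOf("03/14/2012"));            // prints 2012
    writeln(usdAmount("200 EUR or 100 USD")); // prints 100
}
```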

> - "Amdittedly, barrage of ? and ! makes regex rather obscure, more then
> it's actually is. However" should be "Admittedly, the barrage of ? and !
> makes the regex rather obscure, more than it actually is.".  Maybe
> change "obscure" to a different adjective. Perhaps "complex looking" or
> "complicated". (note I've removed the "However" as the upcoming sentence
> isn't contradicting what you just said.)
> - "Needless to say it's", again, I think it's rather important to say :P

Here I concur ;)

> - "Run-time version took around 10-20us on my machine, admittedly no
> statistics." here, borrow this "µ" :P.  Also, I'd get rid of "admittedly
> no statistics".
> - "meaningful tasks, it's features" another "its"
> - "together it's major" and another :P

Yeah, that's an "it's" killing parade :)

> - "...flexible tools: match, replace, spliter" should be spelled "splitter"
>
>
> Great article.  I didn't even know about the replacement delegate
> feature which is something I've often wished I could use in other regex
> systems.  D and Phobos need more articles like this.  We should have a
> link to it from the std.regex documentation once this is added to the
> website.
>

Thanks again.
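
For readers who have not seen the replacement delegate feature Brad mentions, a minimal sketch: replace can take a callable as a template argument, and that callable receives the Captures for each match and returns the replacement text. The function names and the number-doubling task are invented for illustration.

```d
import std.conv : to;
import std.regex;
import std.stdio;

// Called once per match; m.hit is the matched text.
string doubled(Captures!(string) m)
{
    return to!string(to!int(m.hit) * 2);
}

// Hypothetical helper: doubles every integer found in the text.
string doubleNumbers(string text)
{
    // The replacement is computed by doubled() instead of a format string.
    return replace!(doubled)(text, regex(r"[0-9]+", "g"));
}

void main()
{
    writeln(doubleNumbers("3 apples and 10 pears")); // prints 6 apples and 20 pears
}
```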


-- 
Dmitry Olshansky

On 14.03.2012 0:54, H. S. Teoh wrote:
> On Tue, Mar 13, 2012 at 11:27:57PM +0400, Dmitry Olshansky wrote:
>> For a couple of releases we have a new revamped std.regex, that as
>> far as I'm concerned works nicely, thanks to my GSOC commitment last
>> summer. Yet there was certain dark trend around std.regex/std.regexp
>> as both had severe bugs, missing documentation and what not, enough
>> to consider them unusable or dismiss prematurely.
>>
>> It's about time to break this gloomy aura, and show that std.regex
>> is actually easy to use, that it does the thing and has some nice
>> extras.
>>
>> Link: http://blackwhale.github.com/regular-expression.html
>>
>> Comments are welcome from experts and newbies alike, in fact it
>> should encourage people to try out a few tricks ;)
> [...]
>
> Yay! Updated docs is always a good thing. I'd like to do some
> copy-editing to make it nicer to read. (Hope you don't mind my extensive
> revisions, I'm trying to make the docs as professional as possible.)
> My revisions are in straight text under the quoted sections, and inline
> comments are enclosed in [].
>
>
>> Introduction
>>
>> String processing is a kind of daily routine that most applications do
>> in a one way or another. It should come as no wonder that many
>> programming languages have standard libraries stoked with specialized
>> functions for common needs.
>
> String processing is a common task performed by many applications. Many
> programming languages come with standard libraries that are equipped
> with a variety of functions for common string processing needs.
>

I like equipped ;)

>
>> The D programming language standard library among others offers a nice
>> assortment in std.string and generic ones from std.algorithm.
>
> The D programming language standard library also offers a nice
> assortment of such functions in std.string, as well as generic functions
> in std.algorithm that can also work with strings.
>
>
>> Still no amount of fixed functionality could cover all needs, as
>> naturally flexible text data needs flexible solutions.
>
> Still no amount of predefined string functions could cover all needs.
> Text data is very flexible by nature, and so needs flexible solutions.
>
>
>> Here is where regular expressions come in handy, often succinctly
>> called as regexes.
>
> This is where regular expressions, or regexes for short, come in.
>
>
>> Simple yet powerful language for defining patterns of strings, put
>> together with a substitution mechanism, forms a Swiss Army knife of
>> text processing.
>
> Regexes are a simple yet powerful language for defining patterns of
> strings, and when integrated with a substitution mechanism, forms a
> Swiss Army knife of text processing.
>
>
>> It's considered so useful that a number of languages provides built-in
>> support for regular expressions, though one though one should not jump
>> to conclusion that built-in implies faster processing or more
>> features. It's all about getting more convenient and friendly syntax
>> for typical operations and usage patterns.
>
> It's considered so useful that a number of languages provides built-in
> support for regular expressions. (This doesn't necessarily mean,
> however, that built-in implies faster processing or more features. It's
> more a matter of providing a more convenient and friendly syntax for
> typical operations and usage patterns.)
>
> [I think it's better to put the second part in parentheses, since it's
> not really the main point of this doc.]

I think putting that much in parens is a bad idea, but your wording is clearly superior.

>
>
>> The D programming language provides a standard library module
>> std.regex.
>
> [OK]
>
>
>> Being a highly expressive systems language, it opens a possibility to
>> get a good look and feel via core features, while keeping it's
>> implementation within the language.
>
> Being a highly expressive systems language, D allows regexes to be
> implemented within the language itself, yet still have the same level of
> readability and usability that a built-in implementation would provide.
>

Nice!

>
>> We'll see how close to built-ins one can get this way.
>
> We will see below how close to built-in regexes we can achieve.
>
>
>> By the end of article you'd have a good understanding of regular
>> expression capabilities in this library, and how to utilize it's API
>> in a most straightforward way.
>
> By the end of this article, you will have a good understanding of the
> regular expression capabilities offered by this library, and how to
> utilize its API in the most straightforward way.
>
>
>
>> Examples in this article assume the reader has fairly good
>> understanding of regex elements, yet it's not required to get an
>> understanding of the API.
>
> Examples in this article assume that the reader has fairly good
> understanding of regex elements, but this is not required to get an
> understanding of the API.
>
> [I'll do this much for now. More to come later.]
>

Thanks.

-- 
Dmitry Olshansky
