May 09, 2008 hyperlink regular expression pattern
I want to split an HTML anchor tag into its constituent parts. I have a regular expression pattern that works with .NET's Regex class, but not with std.regexp - it errors out with "*+? not allowed in atom". I think this means something in the pattern is non-standard. Here's my code:

if (auto m = std.regexp.search(
    "<a href=\"www.google.com\">Google</a>",
    r"<a.*?href=[""'](?<url>.*?)[""'].*?>(?<name>.*?)</a>"))
{
    string url = m.match(1);
    string name = m.match(2);
}

The problematic parts are "?<url>" and "?<name>" - but not being a whiz with regular expressions, I don't know what to use instead. Perhaps someone's got a better pattern they could post?

John.
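For readers on a current compiler: D2's Phobos replaced std.regexp with std.regex, which does support named groups, but its documentation spells them `(?P<name>...)` (Python style) rather than .NET's `(?<name>...)`. The following is a minimal sketch assuming D2 and std.regex; `parseAnchor` is a hypothetical helper, and it assumes the href value is wrapped in single or double quotes.

```d
import std.regex;
import std.stdio;
import std.typecons : Tuple;

// Hypothetical helper (not from the thread): returns the href value and the
// link text of the first <a> tag in `html`, or empty strings if none is found.
// Assumes the href value is wrapped in single or double quotes.
Tuple!(string, "url", string, "name") parseAnchor(string html)
{
    // (?P<url>...) and (?P<name>...) are std.regex's syntax for named groups.
    auto re = regex(`<a[^>]*?href=["'](?P<url>[^"']*)["'][^>]*>(?P<name>.*?)</a>`);
    if (auto m = matchFirst(html, re))
        return typeof(return)(m["url"], m["name"]);
    return typeof(return).init;
}

void main()
{
    auto link = parseAnchor(`<a href="www.google.com">Google</a>`);
    writeln(link.url);  // www.google.com
    writeln(link.name); // Google
}
```

Named groups are only a convenience here: the same groups are still reachable by number as `m[1]` and `m[2]`, which is what the std.regexp code in this thread relies on.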
May 09, 2008 Re: hyperlink regular expression pattern
Posted in reply to John C

> Perhaps someone's got a better pattern they could post?

this works for me (dmd 1.029)

import std.regexp;

void main()
{
    if (auto m = std.regexp.search(
        "<a href=\"www.google.com:8080/dfs?a1=1&a2=2\">This is Google link</a>",
        // group 1 = opening quote (may be empty), group 2 = URL,
        // \1 = the same quote closing the URL, group 3 = link text
        "<a[^>]+href=(['\"]?)(.*?)\\1.*?>(.*)</a>"))
    {
        // print the whole match (0) followed by the capture groups
        for(int i=0; i<10; i++)
        {
            printf("%d=\"%.*s\"\n", i, m.match(i));
        }
    }
}
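The same numbered-group pattern, ported as a sketch to D2's std.regex (an assumption: current Phobos no longer ships std.regexp). It collects the whole match and each group so the numbering is easy to see; `linkParts` is a hypothetical name, and `Captures.length` counts the whole match plus every group.

```d
import std.regex;
import std.stdio;

// novice2's pattern, unchanged apart from using a WYSIWYG string.
// Group 1 = opening quote (may be empty), \1 = that same quote closing the
// value, group 2 = URL, group 3 = link text.
enum linkPattern = `<a[^>]+href=(['"]?)(.*?)\1.*?>(.*)</a>`;

// Hypothetical helper: returns [whole match, quote, URL, link text],
// or an empty array when there is no match.
string[] linkParts(string html)
{
    string[] parts;
    if (auto m = matchFirst(html, regex(linkPattern)))
        foreach (i; 0 .. m.length)
            parts ~= m[i];
    return parts;
}

void main()
{
    auto parts = linkParts(`<a href="www.google.com:8080/dfs?a1=1&a2=2">This is Google link</a>`);
    foreach (i, p; parts)
        writefln(`%d="%s"`, i, p);
}
```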
May 09, 2008 Re: hyperlink regular expression pattern
Posted in reply to novice2

novice2 wrote:
> > Perhaps someone's got a better pattern they could post?
>
> this works for me (dmd 1.029)
>
> import std.regexp;
>
> void main()
> {
> if (auto m = std.regexp.search(
> "<a href=\"www.google.com:8080/dfs?a1=1&a2=2\">This is Google link</a>",
> "<a[^>]+href=(['\"]?)(.*?)\\1.*?>(.*)</a>"))
> {
> for(int i=0; i<10; i++)
> {
> printf("%d=\"%.*s\"\n", i, m.match(i));
> }
> }
> }
>
Thanks - that seems to extract the href and text. What about getting other attributes like name and title, as in this link:
<a href=\"www.google.com\" name=\"googleLink\" title=\"Click Me\">Google Link</a>
May 09, 2008 Re: hyperlink regular expression pattern
Posted in reply to John C

John C wrote:
> Thanks - that seems to extract the href and text.

You got what you asked for :) "A good question is half the answer." :)

> What about getting other attributes like name and title, as in this link:
>
> <a href=\"www.google.com\" name=\"googleLink\" title=\"Click Me\">Google Link</a>

IMHO, it can't be done with a single regexp match, because the attributes can appear in any order. Instead, get the whole attribute string of the <a> tag, then iterate over the attributes in it, something like below. But sorry, it can't capture the values of attributes without quotes. Maybe the non-regexp functions in std.string would be better for parsing attributes.

//////
import std.regexp;
import std.stdio;

void main()
{
    // Step 1: group 1 = the tag's attribute string, group 2 = the link text.
    if (auto m = std.regexp.search("<anothertag><a href=\"www.google.com:8080/dfs?a1=1&a2=2\" name='google Link' color=red title=\"Click Me\"\">This is Google link</a></anothertag>",
        "<a(\\s.*?)>(.*?)</a>"))
    {
        writefln("tag attributes: \"%s\"", m.match(1));
        writefln("tag content: \"%s\"", m.match(2));
        // Step 2: each match is name=value; group 2 is the optional quote,
        // and \2 requires the same quote to close the value (group 3).
        foreach(s; RegExp("(\\S+?)=(['\"]?)(.*?)\\2").search(m.match(1)))
        {
            writefln("found attribute: name=\"%s\", value=\"%s\"", s.match(1), s.match(3));
        }
    }
}
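The same two-step idea (grab the attribute string, then iterate over it) can be extended to unquoted values such as color=red with an alternation. This is a minimal sketch assuming D2's std.regex rather than D1's std.regexp; `parseAttributes` is a hypothetical helper, attribute names are assumed to be word characters, and it is still not a real HTML parser (no entity decoding, and a `>` inside a quoted value would end the tag early).

```d
import std.regex;
import std.stdio;

// Hypothetical helper (not from the thread): splits an attribute string such as
//   href="x" name='y' color=red
// into name/value pairs. Accepts double-quoted, single-quoted and unquoted values.
string[string] parseAttributes(string attrs)
{
    // Group 1 = name; groups 2/3/4 = double-quoted, single-quoted, unquoted value.
    auto re = regex(`(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))`);
    string[string] result;
    foreach (m; matchAll(attrs, re))
    {
        // Only one of the three value groups participates in each match.
        string value = m[2].length ? m[2] : m[3].length ? m[3] : m[4];
        result[m[1]] = value;
    }
    return result;
}

void main()
{
    auto html = `<a href="www.google.com:8080/dfs?a1=1&a2=2" name='google Link' color=red title="Click Me">This is Google link</a>`;
    // Step 1: group 1 = attribute string, group 2 = link text.
    if (auto m = matchFirst(html, regex(`<a(\s[^>]*)>(.*?)</a>`)))
    {
        writefln(`tag content: "%s"`, m[2]);
        // Step 2: iterate over the attributes (AA iteration order is unspecified).
        foreach (name, value; parseAttributes(m[1]))
            writefln(`found attribute: name="%s", value="%s"`, name, value);
    }
}
```

Because the result is an associative array, the attributes may print in a different order than they appear in the tag; look them up by name (for example `attrs["title"]`) when order doesn't matter.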
Copyright © 1999-2021 by the D Language Foundation