July 03, 2017
On Thursday, 29 June 2017 at 05:30:28 UTC, Patrick Schluter wrote:
> Ouch, parsing html or xml with regular expressions is problematic.
> What people generally don't realize is that the > is not required to be encoded as entity when in the data. This means that <thing attr="Hello >"> or
> <data>></data> are absolutely legal. Regular expressions may break when they encounter them.

Yes, and that is only the beginning: "<" is also legal inside a CDATA section, and markup can be defined inside an entity, so elements can hide behind an entity reference in the main text. I'm sure there are more gotchas. So, if you parse XML, use a real XML parser.
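
To make the cases above concrete, here is a rough sketch in Python (not from this thread; the sample document, the naive `<[^>]+>` pattern and the helper names `naive_tags` and `parse_values` are made up for illustration). It runs a typical tag-matching regex and the standard library's `xml.etree.ElementTree` parser over a document containing a ">" in an attribute, a bare ">" as element content, and a "<" inside a CDATA section.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document combining the cases discussed above:
# ">" inside an attribute value, ">" as character data, "<" inside CDATA.
DOC = (
    '<root>'
    '<thing attr="Hello >"/>'
    '<data>></data>'
    '<raw><![CDATA[1 < 2]]></raw>'
    '</root>'
)

# A typical naive "match a tag" pattern (an assumption about what people write).
NAIVE_TAG = re.compile(r"<[^>]+>")


def naive_tags(text):
    """Return everything the naive regex believes is a tag."""
    return NAIVE_TAG.findall(text)


def parse_values(text):
    """Extract the same information with a real XML parser."""
    root = ET.fromstring(text)
    return {
        "attr": root.find("thing").get("attr"),
        "data": root.find("data").text,
        "raw": root.find("raw").text,
    }


if __name__ == "__main__":
    # The regex cuts the <thing> tag short at the ">" inside the attribute
    # and reports the CDATA section as if it were a tag.
    print(naive_tags(DOC))
    # The parser returns the attribute, the ">" and the CDATA text intact.
    print(parse_values(DOC))
```

The regex stops at the first ">" it sees, so the `<thing>` tag is truncated in the middle of its attribute value, while the parser handles all three cases. Entity-hidden markup is left out of the sketch, but it fails the regex approach for the same reason: the regex never sees the replacement text.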
