July 03, 2017
Re: My simple implementation of PHP strip_tags()

Posted in reply to Patrick Schluter

On Thursday, 29 June 2017 at 05:30:28 UTC, Patrick Schluter wrote:
> Ouch, parsing html or xml with regular expressions is problematic.
> What people generally don't realize is that the > is not required to be encoded as an entity when it appears in the data. This means that <thing attr="Hello >"> or
> <data>></data> are absolutely legal. Regular expressions may break when they encounter them.
Yes, and that is only the beginning: "<" is also legal inside a CDATA section, and elements can be encoded as entities and thus hidden in the text content. I'm sure there are more gotchas. So, if you parse XML, use a real XML parser.
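A minimal sketch (in Python for brevity, not D) contrasting a naive regex tag stripper with one built on an actual XML parser. The regex `<[^>]*>` and the names `naive_strip_tags` and `parser_strip_tags` are hypothetical illustrations, not the strip_tags() implementation under discussion. Python's standard `xml.etree.ElementTree` stands in for "a real XML parser". It only accepts well-formed XML, so tag soup HTML would need an HTML parser instead, and the Python docs warn that its XML modules are not hardened against maliciously constructed input.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical naive stripper: deletes anything from "<" up to the next ">".
NAIVE_TAG = re.compile(r"<[^>]*>")


def naive_strip_tags(text: str) -> str:
    """Remove tag-like substrings with a regex (fragile, for comparison only)."""
    return NAIVE_TAG.sub("", text)


def parser_strip_tags(xml: str) -> str:
    """Return the concatenated character data of a well-formed XML document.

    The parser handles '>' inside attribute values, CDATA sections and
    entity decoding, so no hand-written tag matching is needed.
    """
    root = ET.fromstring(xml)
    return "".join(root.itertext())


if __name__ == "__main__":
    samples = [
        '<thing attr="Hello >">text</thing>',  # '>' inside an attribute value
        "<data>></data>",                       # bare '>' in character data
        "<r><![CDATA[a < b]]></r>",             # '<' inside a CDATA section
        "<p>1 &lt; 2</p>",                      # escaped markup in the text
    ]
    for s in samples:
        print(repr(naive_strip_tags(s)), repr(parser_strip_tags(s)))
```

This particular regex happens to cope with <data>></data>; it is the ">" inside the attribute value that breaks it (leaving `">text`). It also deletes the whole CDATA section, including its text, and leaves `&lt;` undecoded, whereas the parser yields `text`, `a < b` and `1 < 2`.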