Thread overview
My simple implementation of PHP strip_tags()
Jun 28, 2017
aberba
Jun 28, 2017
Vladimir Panteleev
Jun 28, 2017
aberba
Jun 28, 2017
Adam D. Ruppe
Jun 28, 2017
Vladimir Panteleev
Jun 28, 2017
aberba
Jun 28, 2017
Vladimir Panteleev
Jun 28, 2017
aberba
Jun 29, 2017
Patrick Schluter
Jun 30, 2017
Márcio Martins
June 28, 2017
I wanted strip_tags() for sanitization in vibe.d, so I set out to find an algorithm for it and came across this JavaScript library at https://github.com/ericnorris/striptags/blob/master/src/striptags.js, which is quite popular judging by the number of likes and forks. As I looked through it, I didn't like the cumbersome approach it used, so I tried to implement it my own way. This is what I lazily did. It turned out to be so simple that I thought I could use some opinions. Note that I didn't add a `tag_replacement` parameter, but that's about one line of code.

string stripTags(string input, in string[] allowedTags = [])
{
	import std.regex: Captures, replaceAll, ctRegex;

	auto regex = ctRegex!(`</?(\w*)>`);

	string regexHandler(Captures!(string) match)
	{
		string insertSlash(in string tag)
		in
		{
			assert(tag.length, "Argument must contain one or more characters");
		}
		body
		{
			return tag[0..1] ~ "/" ~ tag[1..$];
		}

		bool allowed = false;
		foreach (tag; allowedTags)
		{
			if (tag == match.hit || insertSlash(tag) == match.hit)
			{
				allowed = true;
				break;
			}
		}
		return allowed ? match.hit : "";
	}

	return input.replaceAll!(regexHandler)(regex);
}

unittest
{
	assert(stripTags("<html><b>bold</b></html>") == "bold");
	assert(stripTags("<html><b>bold</b></html>", ["<html>"]) == "<html>bold</html>");
}



I'm not sure the tag-matching regex I used is the best, though.
June 28, 2017
On Wednesday, 28 June 2017 at 18:08:12 UTC, aberba wrote:
> I wanted strip_tags() for sanitization

Careful. If you don't implement this correctly (and it may be surprisingly difficult to), you may expose your site to XSS attacks.

Instead of stripping tags, you may want to encode HTML entities ('<' -> "&lt;" etc.)
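
A minimal sketch of the entity-encoding approach, for illustration only. The name `escapeHtml` is hypothetical and not part of Phobos or vibe.d; in real code a framework's own escaping helper would normally be preferable. It assumes the output lands in HTML text or a quoted attribute value; other contexts (URLs, inline scripts) need different encoding.

```d
import std.array : replace;

/// Hypothetical helper: replaces the five HTML-significant characters
/// with entities. '&' is handled first so that the entities produced by
/// the later replacements are not escaped a second time.
string escapeHtml(string input)
{
	return input
		.replace("&", "&amp;")
		.replace("<", "&lt;")
		.replace(">", "&gt;")
		.replace(`"`, "&quot;")
		.replace("'", "&#39;");
}

unittest
{
	assert(escapeHtml(`<script src="x.js">`) == `&lt;script src=&quot;x.js&quot;&gt;`);
	assert(escapeHtml("a & b") == "a &amp; b");
}
```

Because nothing is removed, the user's text is displayed exactly as typed and the browser never interprets any of it as markup.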

> 	auto regex = ctRegex!(`</?(\w*)>`);

This will not capture <script src="...">.
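
A small check of that claim, using the same pattern with the runtime `regex` constructor instead of `ctRegex` (an assumption made only to keep compile times short; the pattern is unchanged). The file name `evil.js` is made up for the example.

```d
import std.regex : matchFirst, regex, replaceAll;

unittest
{
	auto tagPattern = regex(`</?(\w*)>`);

	// A bare tag is matched...
	assert(!matchFirst("<b>bold</b>", tagPattern).empty);

	// ...but an opening tag with an attribute is not matched at all.
	assert(matchFirst(`<script src="evil.js">`, tagPattern).empty);

	// Stripping therefore removes only the closing tag and leaves the
	// dangerous opening tag in place.
	auto stripped = replaceAll(`<script src="evil.js">alert(1)</script>`, tagPattern, "");
	assert(stripped == `<script src="evil.js">alert(1)`);
}
```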

June 28, 2017
On Wednesday, 28 June 2017 at 18:51:41 UTC, Vladimir Panteleev wrote:
> On Wednesday, 28 June 2017 at 18:08:12 UTC, aberba wrote:
>> I wanted strip_tags() for sanitization
>
> Careful. If you don't implement this correctly (and it may be surprisingly difficult to), you may expose your site to XSS attacks.
>
> Instead of stripping tags, you may want to encode HTML entities ('<' -> "&lt;" etc.)
>
>> 	auto regex = ctRegex!(`</?(\w*)>`);
>
> This will not capture <script src="...">.


I'm already using prepared statements thoroughly. strip_tags() has its own uses besides making input safe for DB storage.
June 28, 2017
On Wednesday, 28 June 2017 at 19:14:19 UTC, aberba wrote:
> I'm already using prepared statements thoroughly. strip_tags() has its own uses besides making input safe for DB storage.

Prepared statements fight SQL injection at save time. HTML encoding is about fighting XSS when displaying stuff to the browser.

XSS is when some user inserts a script on your site that another user's browser then loads and executes as that user.

Personally, I'd never use a strip_tags function. I'd actually parse the HTML, work at the DOM level, then re-output it with proper encoding for whatever context it is being used in.
June 28, 2017
On Wednesday, 28 June 2017 at 19:14:19 UTC, aberba wrote:
> I'm already using prepared statements thoroughly. strip_tags() has its own uses besides making input safe for DB storage.

Nothing to do with DB storage! XSS and SQL injections are two very distinct classes of vulnerabilities.

Please read this ASAP: https://en.wikipedia.org/wiki/Cross-site_scripting
June 28, 2017
On Wednesday, 28 June 2017 at 19:21:35 UTC, Vladimir Panteleev wrote:
> On Wednesday, 28 June 2017 at 19:14:19 UTC, aberba wrote:
>> I'm already using prepared statements thoroughly. strip_tags() has its own uses besides making input safe for DB storage.
>
> Nothing to do with DB storage! XSS and SQL injections are two very distinct classes of vulnerabilities.
>
> Please read this ASAP: https://en.wikipedia.org/wiki/Cross-site_scripting

Ha ha. I will strip out <script> tags in the regex. It's better to get rid of tags where they are not needed, for clients other than a browser. Please criticize the stripTags() implementation.
June 28, 2017
On Wednesday, 28 June 2017 at 19:50:44 UTC, aberba wrote:
>> Please read this ASAP: https://en.wikipedia.org/wiki/Cross-site_scripting
>
> Ha ha. I will strip out <script> tags in the regex. It's better to get rid of tags where they are not needed, for clients other than a browser. Please criticize the stripTags() implementation.

I see you've ignored my advice.

Please, at least read this section:

https://en.wikipedia.org/wiki/Cross-site_scripting#Safely_validating_untrusted_HTML_input

June 28, 2017
On Wednesday, 28 June 2017 at 19:58:31 UTC, Vladimir Panteleev wrote:
> On Wednesday, 28 June 2017 at 19:50:44 UTC, aberba wrote:
>>> Please read this ASAP: https://en.wikipedia.org/wiki/Cross-site_scripting
>>
>> Ha ha. I will strip out <script> tags in the regex. It's better to get rid of tags where they are not needed, for clients other than a browser. Please criticize the stripTags() implementation.
>
> I see you've ignored my advice.
>
> Please, at least read this section:
>
> https://en.wikipedia.org/wiki/Cross-site_scripting#Safely_validating_untrusted_HTML_input

My bad. I will read it.
June 29, 2017
On Wednesday, 28 June 2017 at 18:08:12 UTC, aberba wrote:
> I wanted strip_tags() for sanitization in vibe.d, so I set out to find an algorithm for it and came across this JavaScript library at
>
> string stripTags(string input, in string[] allowedTags = [])
> {
> 	import std.regex: Captures, replaceAll, ctRegex;
>
> 	auto regex = ctRegex!(`</?(\w*)>`);
>
Ouch, parsing HTML or XML with regular expressions is problematic.
What people generally don't realize is that > is not required to be encoded as an entity when it appears in the data. This means that <thing attr="Hello >"> and
<data>></data> are both perfectly legal. Regular expressions may break when they encounter them.

http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx/
https://blog.codinghorror.com/parsing-html-the-cthulhu-way/
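
To illustrate the two inputs above, here is a rough sketch of a character-by-character scanner that tracks whether it is inside a tag and inside a quoted attribute value. The function name `stripTagsQuoteAware` is hypothetical. It is not an HTML parser and not a sanitizer: it ignores comments, CDATA sections and script contents, and it treats every `<` as the start of a tag, so text such as `a < b` loses everything up to the next `>`.

```d
import std.array : appender;

/// Hypothetical sketch: drops everything between '<' and the matching '>',
/// treating a '>' inside a quoted attribute value as part of the tag.
string stripTagsQuoteAware(string input)
{
	auto output = appender!string();
	bool inTag = false;
	char quote = '\0'; // active quote character inside a tag, '\0' if none

	foreach (char c; input)
	{
		if (!inTag)
		{
			if (c == '<')
				inTag = true;      // start of a tag: stop copying
			else
				output.put(c);     // ordinary text, including a bare '>'
		}
		else if (quote != '\0')
		{
			if (c == quote)
				quote = '\0';      // end of the quoted attribute value
		}
		else if (c == '"' || c == '\'')
			quote = c;             // start of a quoted attribute value
		else if (c == '>')
			inTag = false;         // unquoted '>' closes the tag
	}
	return output.data;
}

unittest
{
	assert(stripTagsQuoteAware(`<thing attr="Hello >">text</thing>`) == "text");
	assert(stripTagsQuoteAware("<data>></data>") == ">");
}
```

A pattern such as `<[^>]*>` would stop at the `>` inside the attribute value of the first input and leave `">` behind, which is the kind of breakage described above.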


June 30, 2017
On Wednesday, 28 June 2017 at 18:08:12 UTC, aberba wrote:
> I wanted strip_tags() for sanitization in vibe.d, so I set out to find an algorithm for it and came across this JavaScript library at https://github.com/ericnorris/striptags/blob/master/src/striptags.js, which is quite popular judging by the number of likes and forks. As I looked through it, I didn't like the cumbersome approach it used, so I tried to implement it my own way. This is what I lazily did. It turned out to be so simple that I thought I could use some opinions. Note that I didn't add a `tag_replacement` parameter, but that's about one line of code.
>
> [...]

I wrote this a while ago, not sure if it's useful for your purposes, but it has been working quite well so far: http://code.dlang.org/packages/plain