Html escaping for security: howto in D? (page 2)

July 07, 2020

Re: Html escaping for security: howto in D?

Posted by aberba
in reply to Fitz


On Tuesday, 7 July 2020 at 17:55:44 UTC, Fitz wrote:
> On Monday, 6 July 2020 at 14:57:22 UTC, aberba wrote:
>> utilities...a very long long time ago...two yrs 😜. See https://code.dlang.org/packages/sanival for stripTags()
>> It's a very limited implementation and uses std.regex, which many people here who are critical about performance will speak against. I'm yet to see an alternative. So you could use that if you don't find a better alternative.
>>
>
> Can't see stripTags? in https://code.dlang.org/packages/sanival

string stripTags(string input, in string[] allowedTags = [])
{
	import std.regex : Captures, replaceAll, ctRegex;

	// Matches bare opening and closing tags such as <b> or </b>.
	// Tags that carry attributes (e.g. <a href="...">) are not matched.
	auto regex = ctRegex!(`</?(\w*)>`);

	string regexHandler(Captures!(string) match)
	{
		// Turns "<tag>" into "</tag>" so closing tags can be compared too.
		string insertSlash(in string tag)
		in
		{
			assert(tag.length, "Argument must contain one or more characters");
		}
		do
		{
			return tag[0..1] ~ "/" ~ tag[1..$];
		}

		// Keep the tag only if it, or its closing form, is whitelisted.
		bool allowed = false;
		foreach (tag; allowedTags)
		{
			if (tag == match.hit || insertSlash(tag) == match.hit)
			{
				allowed = true;
				break;
			}
		}
		return allowed ? match.hit : "";
	}

	return input.replaceAll!(regexHandler)(regex);
}

unittest
{
	assert(stripTags("<html><b>bold</b></html>") == "bold");
	assert(stripTags("<html><b>bold</b></html>", ["<html>"]) == "<html>bold</html>");
}

On Tuesday, 7 July 2020 at 17:59:21 UTC, Fitz wrote:
> On Monday, 6 July 2020 at 15:13:30 UTC, aberba wrote:
>
>> If you want to completely removed all tags, https://code.dlang.org/packages/plain might be better.
>
> seems overkill, just implemented something simple:
> // https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html

Again I'm not sure I really understood what you want. If you're trying to escape them with html entities, then my suggestions don't apply. I believe Adam (arsd) has some function in his library for doing html entities of tags.

On Tuesday, 7 July 2020 at 20:21:19 UTC, aberba wrote:
> On Tuesday, 7 July 2020 at 17:59:21 UTC, Fitz wrote:
>> On Monday, 6 July 2020 at 15:13:30 UTC, aberba wrote:
>>
>>> If you want to completely removed all tags, https://code.dlang.org/packages/plain might be better.
>>
>> seems overkill, just implemented something simple:
>> // https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
>
> I believe Adam (arsd) has some function in his library for doing html entities of tags.

See https://dpldocs.info/experimental-docs/arsd.dom.htmlEntitiesEncode.html

On Tuesday, 7 July 2020 at 23:19:46 UTC, aberba wrote:
>> I believe Adam (arsd) has some function in his library for doing html entities of tags.
>
> See https://dpldocs.info/experimental-docs/arsd.dom.htmlEntitiesEncode.html

Yeah, that function will encode basically everything so you can concat it into HTML.

My libs also have sanitization functions that go even further: you can do an HTML tag and attribute whitelist via the dom (html.d in my repo) and construct things with those functions too (using just dom.d for this).

But I haven't documented all that stuff, so you're kinda on your own in figuring it all out... that's why I don't advertise as much as the others. It is easy to use once you get to know it, but instead of writing beginner-friendly documentation I often just answer individuals' emails. Maybe I will blog about it later though.
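
For readers who want to see what the "simple" entity escaping discussed in this thread can look like, here is a minimal sketch using only Phobos. It is neither Fitz's code nor arsd's htmlEntitiesEncode; the name escapeHtml is hypothetical, and it assumes the output goes into HTML text or a quoted attribute value, not into a script or URL context. Whether to also escape / is left out.

```d
import std.array : appender;

/// Hypothetical helper (an assumption for illustration, not arsd's API):
/// replaces the five characters that are special in HTML text and in
/// quoted attribute values with their entities.
string escapeHtml(string input)
{
    auto result = appender!string();
    result.reserve(input.length);

    // Iterating by code unit is fine here: every replaced character is
    // ASCII, so multi-byte UTF-8 sequences pass through unchanged.
    foreach (char c; input)
    {
        switch (c)
        {
            case '&':  result.put("&amp;");  break;
            case '<':  result.put("&lt;");   break;
            case '>':  result.put("&gt;");   break;
            case '"':  result.put("&quot;"); break;
            case '\'': result.put("&#39;");  break;
            default:   result.put(c);        break;
        }
    }
    return result.data;
}

unittest
{
    assert(escapeHtml("a < b && c > d") == "a &lt; b &amp;&amp; c &gt; d");
    assert(escapeHtml(`say "hi"`) == "say &quot;hi&quot;");
}
```

Unlike stripTags(), this keeps everything the user typed and only makes it inert, so text containing < or > does not disappear.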

On Wednesday, 8 July 2020 at 02:17:31 UTC, Adam D. Ruppe wrote:
> On Tuesday, 7 July 2020 at 23:19:46 UTC, aberba wrote:
>>> I believe Adam (arsd) has some function in his library for doing html entities of tags.
>>
>> See https://dpldocs.info/experimental-docs/arsd.dom.htmlEntitiesEncode.html

Oh, another note: that specific function does not encode ' either. So if you are using it in an attribute, make sure you double quote it correctly.

If you build a tree using dom.d's Element class, it will do that consistently for you.
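
To illustrate why the quoting style matters when ' is left unencoded, here is a small sketch. encodeExceptApostrophe is a hypothetical stand-in written for this example, not arsd's implementation, and attribute() assumes the attribute name itself is trusted.

```d
import std.array : replace;

/// Hypothetical stand-in for an encoder that leaves ' untouched.
/// This is an assumption for illustration, not arsd's code.
string encodeExceptApostrophe(string s)
{
    // & must be replaced first so the entities added below stay intact.
    return s.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace(`"`, "&quot;");
}

/// Builds name="value" with double quotes, so an unencoded ' stays inert.
/// The name is assumed to be trusted and is not validated here.
string attribute(string name, string value)
{
    return name ~ `="` ~ encodeExceptApostrophe(value) ~ `"`;
}

unittest
{
    // Safe: the apostrophe cannot close a double-quoted value.
    assert(attribute("title", `it's "fine"`) == `title="it's &quot;fine&quot;"`);

    // Unsafe: with single quotes the same encoder lets the value break out.
    auto broken = "title='" ~ encodeExceptApostrophe("x' onclick='alert(1)") ~ "'";
    assert(broken == "title='x' onclick='alert(1)'");
}
```

The second assertion shows the failure mode: the injected onclick ends up as a separate attribute once the value is wrapped in single quotes.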

On Tuesday, 7 July 2020 at 20:10:14 UTC, aberba wrote:
> unittest
> {
> 	assert(stripTags("<html><b>bold</b></html>") == "bold");
> 	assert(stripTags("<html><b>bold</b></html>", ["<html>"]) == "<html>bold</html>");
> }

Meh, skype strips tags and it's infuriating, basically any text that contains < or > disappears.

On Wednesday, 8 July 2020 at 05:29:16 UTC, Kagamin wrote:
> On Tuesday, 7 July 2020 at 20:10:14 UTC, aberba wrote:
>> unittest
>> {
>> 	assert(stripTags("<html><b>bold</b></html>") == "bold");
>> 	assert(stripTags("<html><b>bold</b></html>", ["<html>"]) == "<html>bold</html>");
>> }
>
> Meh, skype strips tags and it's infuriating, basically any text that contains < or > disappears.

It's not perfect, and there surely can be a better implementation that covers those edge cases. However, stripTags() has its place. It's a widely used function, available in PHP among others, for specific use cases.

Now, I can't stress "specific" use case enough. Sometimes removing tags (those not whitelisted) is the desired behaviour. You don't want to encode, you simply want to remove them.

These days manual tag entry is being phased out in favour of rich text editors, and the rest are using markdown. Nevertheless, stripTags() has its place.

On Tuesday, 7 July 2020 at 18:30:38 UTC, bauss wrote:
> On Tuesday, 7 July 2020 at 17:59:21 UTC, Fitz wrote:
>> On Monday, 6 July 2020 at 15:13:30 UTC, aberba wrote:
>
> There is no reason to escape / and it might break some parsers for links etc. You should only escape <, >, &, " and '

'/' is on the OWASP list. You can use it to break out of an HTML tag.
Tbh, I can't think of how it can be used?

On Wednesday, 8 July 2020 at 17:27:25 UTC, Fitz wrote:
> '/' is on the OWASP list. You can use it to break out of an HTML tag.
> Tbh, I can't think of how it can be used?

A JavaScript string including </script> will end the script interpreter and then spit out HTML. So a lot of things will do \/ instead to prevent this.

If you do context-aware encoding, though, a lot of this goes away.
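
The \/ trick mentioned above can be sketched like this. escapeScriptClose is a hypothetical name, and the sketch assumes its input is already a valid JavaScript string literal; it only neutralises the "</" sequence and does not handle other script-context hazards such as "<!--".

```d
import std.array : replace;

/// Hypothetical helper (an assumption for illustration): makes an
/// already-built JavaScript string literal safer to embed inside a
/// <script> element by writing "</" as "<\/". Inside a JavaScript string
/// "\/" is just "/", so the value seen by the script does not change.
string escapeScriptClose(string jsLiteral)
{
    return jsLiteral.replace("</", `<\/`);
}

unittest
{
    auto literal = `"</script><img src=x onerror=alert(1)>"`;

    // The HTML parser no longer sees a closing script tag in the output.
    assert(escapeScriptClose(literal) == `"<\/script><img src=x onerror=alert(1)>"`);

    // Slashes that do not follow '<' are left alone.
    assert(escapeScriptClose(`"a/b"`) == `"a/b"`);
}
```

This is one instance of context-aware encoding: the same data needs different treatment in a script block than in HTML text or an attribute value.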
