New and Improved AntiXss 3.1, Now With Sanitization

Hi everyone, Bryan here.

As we’ve talked about on this blog many times in the past, the SDL requires the use of the Microsoft AntiXss library to defend against cross-site scripting attacks. What we haven’t talked about is that, until now, there have been two separate versions of AntiXss: one freely available to external users, and one restricted to use inside Microsoft-hosted data centers. Both versions include functionality to encode HTML output, so that injected script is harmlessly rendered as text instead of being executed by the target’s browser. However, the internal version also includes functionality to sanitize user input and remove potentially malicious script.

We have wanted to bring this internal technology to the external developer community for some time, so I’m excited to announce that the Information Security Tools team is including the HTML sanitization functionality in the new public version of AntiXss (version 3.1) and releasing the entire library under the Ms-PL open source license. Let’s take a quick look at how this functionality works and when you might want to use it.

When used correctly, output encoding is very effective at preventing XSS. However, a side effect of this is that it’s also very effective at preventing any type of user-specified HTML markup, whether malicious or benign. Yes, “<script>document.location=’'</script>” should probably be blocked, but what about “I like <b>strong</b> coffee”? This is not malicious in any way and it seems overly restrictive to block it. (I’ll leave it to your own sense of good taste to decide whether the use of the <marquee> tag is malicious under any circumstances.)
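To make that side effect concrete, here is a minimal sketch of what encoding does to both kinds of input. It uses System.Net.WebUtility.HtmlEncode from the .NET base class library as a stand-in so the sample runs without extra references. That substitution is my assumption for illustration, not the AntiXss encoder itself, and the class and method names are hypothetical.

```csharp
using System;
using System.Net;

public static class EncodingDemo
{
    // Assumption: the framework's built-in HTML encoder stands in for
    // the library's encoder so the sample needs no extra assemblies.
    public static string Encode(string untrusted)
    {
        return WebUtility.HtmlEncode(untrusted);
    }

    public static void Main()
    {
        // Malicious markup is turned into inert text...
        Console.WriteLine(Encode("<script>alert(1)</script>"));

        // ...but so is harmless formatting.
        Console.WriteLine(Encode("I like <b>strong</b> coffee"));
    }
}
```

Both inputs come out with their angle brackets replaced by &lt; and &gt;, so the reader sees literal <b> tags on the page rather than bold text.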

Until now, the preferred way to selectively allow only certain HTML tags like <b> and <i> was to regex the input to ensure it contained only valid Unicode letter and number characters and those specified tags, something like this:

if (!Regex.IsMatch(input, @"^([\p{L}\p{N}'\s]|<b>|</b>|<i>|</i>){1,40}$")) throw new Exception();

This approach will prevent all unwanted tags, but it will also prevent all attributes on the allowed tags. Sometimes this is good – attackers can add malicious script to onmouseover attributes of <b> and <i> tags – but again, sometimes this is overkill and blocks the use of benign attributes like lang or title. It would be theoretically possible to extend the regular expression to allow these attributes, as well as other safe HTML tags and their attributes, but realistically that would be an incredibly difficult regex both to develop and maintain.
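If you want to see that trade-off for yourself, the small console sketch below runs the whitelist pattern against three inputs. The pattern is the one from the example, with its \p and \s escapes written out. The class name, method name, and sample strings are hypothetical ones I made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexWhitelistDemo
{
    // Letters, digits, apostrophes, whitespace, and bare <b>/<i> tags only,
    // with a total of 1 to 40 such tokens.
    private static readonly Regex Allowed =
        new Regex(@"^([\p{L}\p{N}'\s]|<b>|</b>|<i>|</i>){1,40}$");

    public static bool IsAllowed(string input)
    {
        return Allowed.IsMatch(input);
    }

    public static void Main()
    {
        string[] samples =
        {
            "I like <b>strong</b> coffee",                      // accepted: bare tag
            "I like <b title=\"dark roast\">strong</b> coffee", // rejected: benign attribute
            "<b onmouseover=\"alert(1)\">hover</b>"             // rejected: malicious attribute
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("{0,-5} {1}", IsAllowed(sample), sample);
        }
    }
}
```

The regex cannot tell the harmless title attribute from the dangerous onmouseover one. Both fail for the same reason: anything after "<b" other than ">" falls outside the whitelist.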

AntiXss 3.1 takes care of all of this logic for you, using the same whitelist approach: it filters the input against a list of known good tags and attributes and strips out everything else. Simply pass the untrusted input through the AntiXss.GetSafeHtml or AntiXss.GetSafeHtmlFragment method to sanitize it:

string output = AntiXss.GetSafeHtml(input);

I strongly encourage everyone to download the new AntiXss 3.1 and incorporate it into your applications starting today. It’s a very effective defense, especially when used in conjunction with the output encoding functionality that’s been a part of AntiXss from the beginning. And again, both output encoding and input sanitization are required by the SDL.

Finally, I’d like to thank both the Exchange team (whose HtmlToHtml library provides the sanitization logic) and the Information Security Tools team for bringing this functionality to the public, where it can do the most good for the most people.

Join the conversation

  1. Anonymous

    Just adding references to the project and compiling works on the local machine, but it does not work on shared hosting. Is there anything we need to do besides including the two DLLs in the deployment?
