Cross-site Scripting – Prevention

This is the final post in my short series on cross-site scripting. After explaining the fundamentals of XSS and having some fun with a few examples, I’d now like to discuss how to prevent these naughty little scripts from threatening web app users. Preventing XSS attacks is the responsibility of the web developer. However, there are a few options for keeping yourself (the user) safe from attack which I’ll discuss afterward.

As I’ve made quite clear in my earlier posts, cross-site scripting attacks occur when untrusted data gets inserted dynamically into the HTML body without being validated first. However, writing the encoders necessary to cover all possible types of input data requires a great deal of effort and can be incredibly difficult. Fortunately, OWASP has already taken the difficulty out of this with ESAPI – Enterprise Security Application Programming Interface. ESAPI is a web app security library that provides security controls such as authentication, access control, input validation, output encoding and escaping, and a lot more. It’s an absolutely incredible tool with bindings for several different languages. If you want a secure web app, use ESAPI. End of story. Though if you’re a horribly deranged .NET developer, Microsoft offers the AntiXSS library. However, I can’t vouch for it since I’m a free and open source software advocate.

OWASP has a really great article called the XSS Prevention Cheat Sheet. The positive model they describe is excellent. In fact, I think it’s so great that I’m going to make it the subject of this post. OWASP has defined 7 rules for strengthening web apps. Let’s take a look at each of these rules.

Rule 0: Never Insert Untrusted Data Except in Allowed Locations

If you’re a network administrator, think of this as the “default deny” ruleset used in firewalls. That is, don’t insert untrusted data anywhere in an HTML document unless it’s within one of the slots described in Rules 1 – 5. Nowhere! En ninguna parte! Hiçbir yerde! 아직은 없어요

<script> NEVER PUT UNTRUSTED DATA HERE </script>

<!-- NEVER PUT UNTRUSTED DATA HERE -->

<div NEVER PUT UNTRUSTED DATA HERE=test />

<NEVER PUT UNTRUSTED DATA HERE href="/foo" />

<style> NEVER PUT UNTRUSTED DATA HERE </style>

I can’t think of any legitimate reason for inserting untrusted data into an HTML comment or directly inside a <script> tag. It’s completely asinine. If your web app requires it, I urge you to seriously reconsider the design of your web app.

Rule 1: HTML Escape Before Inserting Untrusted Data into HTML Element Content

This should be a no-brainer. I hope that by now I’ve made it very clear that you must escape untrusted data before inserting it into any dynamic HTML content.

<body> ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE </body>

<div> ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE </div>

All other HTML elements as well...

Furthermore, you should also use HTML character entity references to escape the following characters:

When this goes in    This comes out
&                    &amp;
<                    &lt;
>                    &gt;
"                    &quot;
'                    &#x27;
/                    &#x2f;

These characters need to be escaped because they all introduce a new execution context for the HTML interpreter. Depending on the character, they can be used to either introduce a new subcontext or close the current one.
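To make the table concrete, here’s a minimal sketch of an encoder for Rule 1. The class and method names (HtmlContentEscaper, escapeHtml) are my own inventions for illustration, not part of ESAPI or any other library, and in a real app you’d want a vetted encoder rather than something hand-rolled.

```java
/** Illustrative sketch of Rule 1; not a library class. */
final class HtmlContentEscaper {

    /** Replaces the six characters from the table with their entity references. */
    static String escapeHtml(String untrusted) {
        StringBuilder out = new StringBuilder(untrusted.length() + 16);
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                case '/':  out.append("&#x2f;"); break;
                default:   out.append(c);        // everything else passes through
            }
        }
        return out.toString();
    }
}
```

Calling escapeHtml("<script>alert('XSS')</script>") yields &lt;script&gt;alert(&#x27;XSS&#x27;)&lt;&#x2f;script&gt; which the browser renders as harmless text instead of running it.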

Rule 2: Attribute Escape Before Inserting Untrusted Data into HTML Common Attributes

This rule only applies to the so-called “common” attributes like name and width. It does not apply to the more complex attributes like src, href, or style. Neither should it be used with event handlers such as onclick, onload, or onmouseover (event handlers fall under Rule 3).

<div attr=ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE>content</div>

<div attr='ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE'>content</div>

<div attr="ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE">content</div>

All characters below ASCII value 256 that are not alphanumeric should be escaped. If you like, you could use a character’s named entity reference (e.g. &quot; or &amp;) but it’d be easier to just use the numeric format (i.e. &#nnnn; or &#xhhhh;). Wikipedia has a decent list here (I’m totally in love with Wikipedia, by the way).

I should mention that it’s always a good idea to surround attribute values with single or double quotes. In fact, section 3.2.2 of the HTML 4.0 specification recommends it. Regardless of the effect on validators/checkers or issues of style, quoting attribute values does have some security implications. When attributes are properly quoted, an attacker can only break out of them with the corresponding quote. However, it’s possible to break out of unquoted attributes with a number of different characters. Since the spec says that unquoted attributes can only contain alphanumeric characters, hyphens, periods and underscores, an attacker can use any other character to break out of the context, for example % * + , / ; ^ | and the space character (ASCII value 0x20).
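Here’s a rough sketch of what an encoder for Rule 2 could look like, assuming the numeric &#xHH; format. HtmlAttributeEscaper and escapeAttribute are hypothetical names I made up for this post. Note that “alphanumeric” here means ASCII letters and digits only, so accented Latin-1 letters get escaped too.

```java
/** Illustrative sketch of Rule 2; not a library class. */
final class HtmlAttributeEscaper {

    /** Escapes every character below 256 that is not an ASCII letter or digit. */
    static String escapeAttribute(String untrusted) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            boolean asciiAlnum = (c >= '0' && c <= '9')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z');
            if (c < 256 && !asciiAlnum) {
                // Numeric character reference in hexadecimal, e.g. &#x22; for a double quote.
                out.append(String.format("&#x%02x;", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

An input like x" onmouseover="alert(1) comes out as x&#x22;&#x20;onmouseover&#x3d;&#x22;alert&#x28;1&#x29; so neither the quote nor the space can end the attribute value.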

Rule 3: JavaScript Escape Before Inserting Untrusted Data into JavaScript Data Values

Like I said before, if your web app allows for dynamically generated JavaScript, ask yourself “Is there some other way I can implement the behavior I want?” You’d be putting your users at tremendous risk. However, if it’s absolutely essential, then be sure to follow this rule.

<script>alert('ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE')</script>

<script>var foo = 'ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE'</script>

<div onclick="foo = 'ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE'"></div>

The only place to put untrusted data where it’s mostly free from harm is inside a quoted data value. This is for the very same reason that you should use quoted attribute values: switching into a new execution context anywhere else is practically child’s play. Just about any JavaScript operator can be used: ; == != && || ?:.

Again, all characters below ASCII value 256 that are not alphanumeric should be escaped. Since we’re talking about JavaScript now and not HTML, characters are escaped using the \xHH syntax. It’s important that you use the ASCII hexadecimal value and not a shortcut like \&, \#, or \@. There’s one gotcha: escaping a quote as \" leaves you vulnerable to “escape-the-escape” attacks. All the attacker has to do is include \" in the input, which gets transformed to \\" and re-enables the quote. Pretty clever, right?
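A minimal sketch of the \xHH approach might look like the following. JavaScriptEscaper and escapeJs are names I invented for illustration. One thing the rule doesn’t cover is characters at 256 and above, so as my own addition the sketch writes those as four-digit hexadecimal unicode escapes, which JavaScript also understands.

```java
/** Illustrative sketch of Rule 3; not a library class. */
final class JavaScriptEscaper {

    /** Escapes untrusted text for use inside a quoted JavaScript string. */
    static String escapeJs(String untrusted) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            boolean asciiAlnum = (c >= '0' && c <= '9')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z');
            if (asciiAlnum) {
                out.append(c);
            } else if (c < 256) {
                // Two-digit hex escape; the backslash itself becomes \x5c,
                // which is what defeats the escape-the-escape trick.
                out.append(String.format("\\x%02x", (int) c));
            } else {
                // Assumption beyond the rule: four-digit hex form for everything else.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }
}
```

With this, the payload ';alert(1);// becomes \x27\x3balert\x281\x29\x3b\x2f\x2f and an attacker-supplied \" becomes \x5c\x22, so the quote stays inside the string.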

Rule 4: CSS Escape and Strictly Validate Before Inserting Untrusted Data into HTML Style Property Values

When inserting untrusted data into CSS <style> tags or stylesheets, it’s important to remember two things: first, it should be properly validated and second, it should only be put inside property values. Placing it anywhere else is just asking for trouble.

<style> selector { property: ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE; } </style>

<style> selector { property: "ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE"; } </style>

<span style="property: ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE">text</span>

Since some browsers let CSS property values run JavaScript, there are a few other things you should be aware of. Make sure that you filter URLs so that they only begin with http: and not javascript:. Internet Explorer 5 introduced a new extension called CSS expressions which you must filter out as well. Even when escaped, there is no safe way to place untrusted data into these expressions.

{ background-url: "javascript:alert('XSS')"; }

{ text-size: "expression(alert('XSS'))"; }
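One way to do the URL filtering is with an allowlist of schemes rather than a blocklist. Here’s a small sketch using java.net.URI from the standard library. CssUrlValidator and isSafeUrl are hypothetical names, and allowing https alongside http is my own assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Illustrative allowlist check for URLs destined for a CSS property value. */
final class CssUrlValidator {

    /** Returns true only for absolute URLs whose scheme is http or https. */
    static boolean isSafeUrl(String untrusted) {
        try {
            String scheme = new URI(untrusted.trim()).getScheme();
            if (scheme == null) {
                return false; // no scheme at all, e.g. expression(...)
            }
            String lower = scheme.toLowerCase(Locale.ROOT);
            return lower.equals("http") || lower.equals("https");
        } catch (URISyntaxException e) {
            return false; // anything that doesn't parse is rejected
        }
    }
}
```

Both examples above fail this check: javascript:alert('XSS') has the wrong scheme and expression(alert('XSS')) has none. Rejecting everything that isn’t explicitly allowed is the same “default deny” idea as Rule 0.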

The character escaping from Rule 3 applies here as well, with one difference: CSS uses the \HH syntax rather than \xHH. That is, all characters below ASCII value 256 that are not alphanumeric should be escaped. You also should not use shortcuts like \" for the same reason as in Rule 3.
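Here’s a sketch of a CSS encoder along those lines. CssValueEscaper and escapeCss are names I made up. There’s one deliberate deviation: CSS hex escapes can be up to six digits long, so a short escape followed by a character that happens to be a hex digit would be misread as one longer escape. To sidestep that, the sketch pads every escape to the full six digits.

```java
/** Illustrative sketch of Rule 4 escaping; not a library class. */
final class CssValueEscaper {

    /** Escapes every character below 256 that is not an ASCII letter or digit. */
    static String escapeCss(String untrusted) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            boolean asciiAlnum = (c >= '0' && c <= '9')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z');
            if (c < 256 && !asciiAlnum) {
                // Backslash plus six hex digits, so the next character
                // can never be absorbed into the escape sequence.
                out.append(String.format("\\%06x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

An attempt to close the declaration early, such as red;} comes out as red\00003b\00007d and stays inside the property value. Escaping alone isn’t enough for Rule 4 though; the value still needs to be validated.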

Rule 5: URL Escape Before Inserting Untrusted Data into HTML URL Parameter Values

Following this rule helps prevent the most common type of XSS attack: placing malicious code directly in GET request parameters.

<a href="http://example.com?variable=ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE">Link text</a>

All characters below ASCII value 256 that are not alphanumeric should be escaped. URLs use percent-encoding with the %HH syntax where HH is the hexadecimal ASCII value of the character. Character entity references are useless here.
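As a sketch, percent-encoding a parameter value by hand could look like this. UrlParamEscaper and escapeUrlParam are illustrative names of my own, and I’m assuming the value is encoded as UTF-8 bytes first, which is the usual convention but not something the rule itself spells out.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of Rule 5; not a library class. */
final class UrlParamEscaper {

    /** Percent-encodes every byte that is not an ASCII letter or digit. */
    static String escapeUrlParam(String untrusted) {
        StringBuilder out = new StringBuilder();
        for (byte b : untrusted.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xff; // treat the byte as unsigned
            boolean asciiAlnum = (v >= '0' && v <= '9')
                    || (v >= 'A' && v <= 'Z')
                    || (v >= 'a' && v <= 'z');
            if (asciiAlnum) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v)); // e.g. %3C for <
            }
        }
        return out.toString();
    }
}
```

The value "><script>alert(1)</script> becomes %22%3E%3Cscript%3Ealert%281%29%3C%2Fscript%3E. This is only meant for a single parameter value. Running a whole URL through it would mangle the slashes and colons that are supposed to be there.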

Rule 6: Use an HTML Policy Engine to Validate or Clean User-driven HTML in an Outbound Way

If you want to allow users to embed HTML in their content but only a restricted subset of tags, this is where using a library like the OWASP Java HTML Sanitizer will come in handy. It allows you to build up an HTML sanitizing whitelist policy very easily without having to write it yourself.

Customizing your policy is very easy. For example, to convert headers into <div> tags, you would use what’s called an element policy:

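import java.util.List;

import org.owasp.html.ElementPolicy;
import org.owasp.html.HtmlPolicyBuilder;

// outputChannel is assumed to be an existing HtmlStreamEventReceiver
// that receives the sanitized output.
new HtmlPolicyBuilder().allowElements(
    new ElementPolicy() {
        // Rewrite each header element as <div class="header-hN">.
        public String apply(String elementName, List<String> attributes) {
            attributes.add("class");
            attributes.add("header-" + elementName);
            return "div";
        }
    },
    "h1", "h2", "h3", "h4", "h5", "h6").build(outputChannel);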

For more information, see the OWASP Java HTML Sanitizer’s javadoc for the Sanitizers and HtmlPolicyBuilder classes.

Bonus Rule: Use the HttpOnly Cookie Flag

The HttpOnly flag is an optional flag that can be included in the Set-Cookie field in the header of an HTTP response. It instructs the browser to expose the cookie only through the HTTP protocol. When set, a cookie will not be accessible through non-HTTP methods like JavaScript (e.g. document.cookie). As you can imagine, this makes it much more difficult for cookies to be stolen.
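To show what the flag actually looks like on the wire, here’s a tiny sketch that builds the header value by hand. SessionCookieHeader and build are hypothetical names, and the Path=/ attribute is just an assumption for the example. In practice your framework or servlet container normally writes this header for you.

```java
/** Illustrative sketch of a Set-Cookie value carrying the HttpOnly flag. */
final class SessionCookieHeader {

    /** Builds the header value, rejecting characters that would break it. */
    static String build(String name, String value) {
        for (String part : new String[] {name, value}) {
            if (part.contains(";") || part.contains("\r") || part.contains("\n")) {
                throw new IllegalArgumentException("illegal character in cookie");
            }
        }
        // HttpOnly is a bare attribute; it takes no value.
        return name + "=" + value + "; Path=/; HttpOnly";
    }
}
```

For example, build("SESSIONID", "abc123") returns SESSIONID=abc123; Path=/; HttpOnly which the server would send as the value of the Set-Cookie header.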

If you’re using Apache Tomcat as a servlet container, the HttpOnly flag can be set for all web apps in the conf/context.xml file:

<Context useHttpOnly="true">
…
</Context>

In .NET 2.0, it can be set in the web.config file:

<httpCookies httpOnlyCookies="true" … />

If you’re using PHP (excuse me while I throw up really quick), the flag can be set in the php.ini file using the parameter:

session.cookie_httponly = True

Most modern browsers support HttpOnly so you do not have to worry about issues of platform compatibility.

It’s important to note that these rules only help prevent reflected and stored XSS attacks. Preventing DOM-based attacks is a whole different story. Since I did not talk about DOM-based attacks in this series, I won’t be covering them here. If you’re interested, good ol’ OWASP has a separate article called the DOM-based XSS Prevention Cheat Sheet. Gotta love OWASP. ;)

While it’s impossible to completely eliminate the threat of XSS attacks, following these rules can really strengthen the security model of your web app. Input validation is the key to preventing not just XSS attacks but also other forms of code injection like SQL, LDAP, and SSI injection. If you are not cautious, user input forms can easily become a gateway for cyber criminals to compromise both your server and the innocent people who visit your website.

Even if you’re not a web developer, you can still protect yourself from XSS attacks as a user. The most obvious way is simply to disable JavaScript but not only is this less than 100% effective, it’s just downright stupid. In our modern world of Web 2.0, disabling JavaScript would severely cripple the functionality of nearly every site you visit.

The most effective way to protect yourself would be to use a wonderful little piece of software called NoScript. NoScript is an award-winning Firefox extension that permits executable web content (e.g. JavaScript, Java, Flash, Silverlight, etc.) to run only if the site has been previously deemed trustworthy by you. It implements numerous countermeasures for specific web exploits. Not only can it recognize potential XSS attacks, it also has a firewall-like component called the Application Boundaries Enforcer (ABE), prevents clickjacking with the ClearClick component, and features a few HTTPS enhancements. It even recognized my own attempts to use XSS on my machines, which was fun to watch. I’ve been using NoScript for years and I cannot recommend it enough. Best of all, it’s free and open source!

There’s also the (unoriginal) NotScripts for you Chrome and Opera users. However, it’s very simplistic and nowhere near as featureful as NoScript. Besides, you should be using Firefox anyway. There’s no reason not to. :P

I hope you’ve enjoyed fooling around with XSS as much as I have. It’s quite distinct from most other web app exploits (except for XSRF which is closely related). Quite frankly though, I’ve had about all I can handle for now with web technologies like JavaScript and HTML. I can tolerate it once in a while but my heart truly lies in lower level hacking like systems programming. At the moment, I’m reading Kevin Mitnick’s “The Art of Deception” so perhaps a post or two on social engineering is in order. We’ll see.
