XSS – Validating User Input

Server/Passive XSS, more commonly called stored or persistent XSS, is when a hacker injects a malicious script or HTML, usually through your website, and it is then persisted somewhere, usually in your database. When another user later views a page that references the content, the website passes the malicious script or HTML to the innocent user’s browser, and then we have a problem.

For example, let’s say a website allows HTML to be added to a page about a product as part of a user review comment. If the validation is weak, a hacker may be able to add a script tag that links to a JavaScript file. The JavaScript file is crafted to add controls to the page that ask the user to re-enter their user name and password. Once the user enters the data, the page sends the details to a site under the hacker’s control.

An innocent user then logs on to the site and clicks to view the infected product page with the comment on it. The JavaScript runs in the user’s browser and presents a login screen. The user thinks this is a little unusual, as they have just entered their details, but trusts the site and so re-enters them and clicks submit. At this point their login details are passed to the hacker’s site, and the hacker now has control of the user’s account.

The example above is one approach a hacker could use to implant dangerous content: entering data via a website so that it ends up in the database. However, the malicious code could equally arrive via a third party system or indeed any source of data your website uses.

Analysing how the hack works, there are two distinct steps. First, the hacker must inject malicious code or HTML into the back end. Second, the malicious code must be rendered in the victim’s browser in such a way that it is allowed to alter the page.

The second part is usually prevented by encoding dynamic output. This post is about the first part which is about avoiding the malicious code ending up in your database in the first place.
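
As a brief aside on that second step, the following is a minimal sketch of what output encoding does. It uses WebUtility.HtmlEncode from System.Net; the OutputEncodingSketch class and its method name are made up for illustration and are not part of any framework.

```csharp
using System;
using System.Net;

public static class OutputEncodingSketch
{
    // Replaces HTML-significant characters with entities so that a browser
    // displays the value as text instead of interpreting it as markup.
    public static string EncodeForHtml(string untrusted)
    {
        return WebUtility.HtmlEncode(untrusted);
    }
}
```

Passing a script tag through EncodeForHtml turns its angle brackets into entities, so the browser shows the tag as text rather than running it.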

The first line of defence is provided for free by ASP.NET and is called request validation, as discussed in XSS Prevention – Request Validation. However, request validation can be bypassed. The second line of defence is to whitelist or restrict all input, which is what this post is about.

Let’s look at restricting input. Take a look at the following model for a Person. For the purposes of this discussion we will assume that a matching table sits behind it.

using System.ComponentModel.DataAnnotations;

namespace MyModels
{
    public class Person
    {
        public int PersonId { get; set; }

        [StringLength(100)]
        public string FirstName { get; set; }

        [StringLength(100)]
        public string LastName { get; set; }

        [StringLength(600)]
        public string Address { get; set; }

        [StringLength(100)]
        public string Telephone { get; set; }

        [StringLength(100)]
        public string Email { get; set; }

        [StringLength(100)]
        public string MyMainInterest { get; set; }
    }
}

This looks fairly standard and does not look particularly risky; the developer has even added [StringLength( )] attributes to match the DB lengths. To make the example more interesting, let us assume that on screen MyMainInterest is displayed as a dropdown of potential interests but, for some reason, the developer has decided to store it as text in the database. This is perhaps a historical hangover from a previous decision or an integration with a report (this is really just for illustration).

There is nothing particularly risky at first sight; however, a brief investigation shows some weaknesses. Firstly, the lengths of all the text fields are quite long for the data being captured. The longer they are, the easier it is to enter a script. The following type of script download takes about 40-50 characters:
<script src="http://MySite.com/a.js"></script>

Obviously inline malicious JavaScript would tend to be much longer. The following sizes would limit exposure and reflect more realistic lengths for the fields. They will probably improve your database performance as well.

using System.ComponentModel.DataAnnotations;

namespace MyModels
{
    public class Person
    {
        public int PersonId { get; set; }

        [StringLength(20)]
        public string FirstName { get; set; }

        [StringLength(20)]
        public string LastName { get; set; }

        [StringLength(400)]
        public string Address { get; set; }

        [StringLength(15)]
        public string Telephone { get; set; }

        [StringLength(30)]
        public string Email { get; set; }

        [StringLength(50)]
        public string MyMainInterest { get; set; }
    }
}
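
To see the effect of the shorter lengths, here is a small sketch that checks a script tag of the kind shown earlier against two limits using the same StringLengthAttribute the model uses. The LengthLimitSketch class, the Payload constant and the Fits method are names invented for this illustration.

```csharp
using System;
using System.ComponentModel.DataAnnotations;

public static class LengthLimitSketch
{
    // A script tag that downloads an external file (46 characters long).
    public const string Payload = "<script src=\"http://MySite.com/a.js\"></script>";

    // True when the value fits within maxLength, using the same attribute
    // the model applies to its string properties.
    public static bool Fits(string value, int maxLength)
    {
        return new StringLengthAttribute(maxLength).IsValid(value);
    }
}
```

With a limit of 100 the payload fits comfortably; with a limit of 20 it is rejected before any other check runs. Note that the 400-character address field would still accept it, which is why length limits are only one part of the defence.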

The next step is to look at the data that is allowed to be entered into each field.

Telephone – can we restrict this via a regex to digits only? (Sometimes there is a requirement to include spaces, brackets and even text like “ext”.) For this example we will assume this is a simple system and the customer is happy with digits only.

Email – we can validate this as well, using the standard Microsoft attribute [EmailAddress].

FirstName and LastName – can we regex these to alphabetic characters and spaces only?

Address – can we regex this to alphanumeric characters and spaces only? We may need to allow quotes and commas, perhaps. Alternatively, the address could be split into component parts, which would reduce lengths. Let us assume we break up the address into component parts.

Next let us look at MyMainInterest. It is a string in the database, but it is populated via a dropdown list on the screen. We have a few options on how best to deal with this:
1) Change it in the database and all models to be a foreign key to another table, i.e. instead of a string it becomes an int. However, we are assuming there is a reason this is not acceptable.
2) Change the model on the screen to accept a key (probably an int) and then convert it to the associated text before passing it to the database.
3) Whitelist the input when received from the user: simply an if or a case statement that checks the value is in a list of values and, if not, throws an error.

Approaches 1 and 2 are the better solutions; given the business constraints, 2 is the best viable one. For the sake of illustration, however, let us assume 3 is the chosen option.
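
Although option 3 is the one followed below, option 2 can be sketched in a few lines. This is only an illustration under assumed data: the keys, the interest names and the InterestLookupSketch class are all hypothetical, and the real list would come from wherever the dropdown is populated.

```csharp
using System.Collections.Generic;

public static class InterestLookupSketch
{
    // Hypothetical keys and values standing in for the real dropdown entries.
    private static readonly Dictionary<int, string> Interests = new Dictionary<int, string>
    {
        { 1, "Cycling" },
        { 2, "Reading" },
        { 3, "Cooking" }
    };

    // Returns true and the text to store for a known key; false for anything else.
    public static bool TryGetInterestText(int key, out string text)
    {
        return Interests.TryGetValue(key, out text);
    }
}
```

Because the browser only ever submits an int, the text that reaches the database always comes from the server's own list and never from the user.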

The final model may look like this:

using System.ComponentModel.DataAnnotations;

namespace MyModels
{
    public class Person
    {
        public int PersonId { get; set; }

        [RegularExpression(@"^[a-zA-Z\s]+$", ErrorMessage = "Use letters and spaces only please")]
        [StringLength(20)]
        public string FirstName { get; set; }

        [RegularExpression(@"^[a-zA-Z\s]+$", ErrorMessage = "Use letters and spaces only please")]
        [StringLength(20)]
        public string LastName { get; set; }

        [RegularExpression(@"^[0-9a-zA-Z ,.]+$", ErrorMessage = "Use numbers, letters, spaces, commas and full stops only please")]
        [StringLength(100)]
        public string Address1 { get; set; }

        [RegularExpression(@"^[0-9a-zA-Z ,.]+$", ErrorMessage = "Use numbers, letters, spaces, commas and full stops only please")]
        [StringLength(50)]
        public string Town { get; set; }

        [RegularExpression(@"^[a-zA-Z ]+$", ErrorMessage = "Use letters and spaces only please")]
        [StringLength(20)]
        public string CountyOrState { get; set; }

        public int Country { get; set; }

        [RegularExpression(@"^[0-9a-zA-Z ,.]+$", ErrorMessage = "Use numbers, letters, spaces, commas and full stops only please")]
        [StringLength(15)]
        public string ZipPostCode { get; set; }

        [RegularExpression(@"^\d+$", ErrorMessage = "Please enter numbers only.")]
        [StringLength(15)]
        public string Telephone { get; set; }

        [EmailAddress]
        [StringLength(30)]
        public string Email { get; set; }

        [StringLength(50)]
        public string MyMainInterest { get; set; }
    }
}
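
To check that attributes like these reject a script tag, they can be run outside MVC with the Validator class from System.ComponentModel.DataAnnotations. The sketch below uses a trimmed-down stand-in for the model; PersonSketch and ModelValidationSketch are names made up for this example.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Trimmed-down stand-in for the Person model, keeping two of its properties.
public class PersonSketch
{
    [RegularExpression(@"^[a-zA-Z\s]+$", ErrorMessage = "Use letters and spaces only please")]
    [StringLength(20)]
    public string FirstName { get; set; }

    [RegularExpression(@"^\d+$", ErrorMessage = "Please enter numbers only.")]
    [StringLength(15)]
    public string Telephone { get; set; }
}

public static class ModelValidationSketch
{
    // Runs every validation attribute on the object and returns the failures.
    public static List<ValidationResult> Validate(object model)
    {
        var results = new List<ValidationResult>();
        var context = new ValidationContext(model);
        Validator.TryValidateObject(model, context, results, validateAllProperties: true);
        return results;
    }
}
```

A FirstName of "Jane" produces no failures, while "<script>" is short enough to pass the length check but fails the regular expression. MVC performs the same checks during model binding, and the outcome is what ModelState.IsValid reports in the action method.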

The last item to validate is MyMainInterest. We know this should be one of a fixed list of inputs. We can validate this in the action that saves the input, with an action method something like the one below.

[HttpPost]
[ValidateAntiForgeryToken]
public ActionResult Login(Person person)
{
    /* Let's check the automatically validated items are correct */
    if (!ModelState.IsValid)
    {
        return View(person);
    }

    /* Now let's check MyMainInterest is correct; if not, reject the input.
       Note the &&: the value is rejected only when it matches none of the allowed entries. */
    if ((person.MyMainInterest != "XXX") && (person.MyMainInterest != "YYY") /*etc*/)
    {
        /* I would send an unhelpful error message, as someone is probably tampering with the data, so why help them */
        ModelState.AddModelError(String.Empty, "Error has occurred");
        return View(person);
    }

    /* Add any more complex validation */

    /* Finally, if everything is ok, perform the work */

    /* Assumption: redisplay the view when done; redirect instead if that suits your flow */
    return View(person);
}
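
As the list of allowed values grows, a chain of comparisons becomes easy to get wrong. One alternative, sketched here with the same placeholder values, is to keep the allowed entries in a set and test for membership. InterestWhitelistSketch and IsAllowed are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class InterestWhitelistSketch
{
    // Placeholder values standing in for the real dropdown entries.
    private static readonly HashSet<string> Allowed =
        new HashSet<string>(StringComparer.Ordinal) { "XXX", "YYY" };

    // True only when the submitted value exactly matches an allowed entry.
    public static bool IsAllowed(string value)
    {
        return value != null && Allowed.Contains(value);
    }
}
```

The action method would then reject the request whenever IsAllowed(person.MyMainInterest) returns false. The ordinal comparison is deliberate: it means "xxx" is not treated as "XXX", so only the exact text offered by the dropdown gets through.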

For usability you may also want to add maxlength attributes to your input fields. These are easily bypassed, so they complement the server-side checks rather than replace them.
<input type="text" id="telephone" name="telephone" class="form-control" maxlength="15" />

How far you can restrict input really depends on the system in question and what it is for. A government site may have to support a wider range of input than a site that allows a simple registration for a free competition. The smaller and less varied the content someone can enter into a text field, the harder it is to inject malicious HTML or JavaScript into the site.
