5

I am thinking of secure ways to serve HTML and JSON to JavaScript. Currently I am just outputting the JSON like:

ajax.php?type=article&id=15
{
 "name":    "something",
 "content": "some content"
}

However, I realize this is a security risk, because the articles are created by users. Someone could insert script tags (just an example) into the content and then link directly to his article through the AJAX API. I am now wondering what the best way to prevent such issues is. One way would be to encode all non-alphanumeric characters in the input, and then decode them in JavaScript (and encode again when the content is inserted somewhere).

Another option could be to send some headers that force the browser to never render the response of the AJAX API requests (Content-Type and X-Content-Type-Options).

6
  • The best way is to clean or validate the user input when they submit an article. Commented Jul 24, 2010 at 11:37
  • @Felix Kling possibly, but who's to say that an article cannot include the string <script>? Commented Jul 24, 2010 at 12:55
  • @Pointy the validation routine, that's who! Commented Jul 24, 2010 at 13:38
  • Well, I for one don't like telling my customers what they can and can't type into "Comment" or "Notes" textareas. It just seems rude. And how do I know that the street they live on doesn't have an ampersand in its name? It's fine to validate input when it is naturally constrained (though it's better to design UIs when possible that do that constraining for the user so that invalid input is impossible, like using a datepicker instead of a text field), but otherwise input-time is the wrong time to scrub user-supplied content. Commented Jul 24, 2010 at 14:09
  • @Felix Kling Sorry, but you are wrong; that's not always the right answer. Commented Jul 24, 2010 at 18:38

6 Answers

7

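If you set the Content-Type to application/json, browsers will not render the response as HTML, so a script tag embedded in the JSON is not executed when someone opens the URL directly. application/json is the media type registered by RFC 4627, and Google uses this to protect themselves. Other application/* content types follow similar rules. Sending X-Content-Type-Options: nosniff as well keeps browsers from sniffing the body and second-guessing the declared type.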
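
As a rough illustration of the headers discussed here, the sketch below builds a JSON response in JavaScript (Node-style) rather than in the asker's PHP. The helper name buildJsonResponse is hypothetical, and escaping "<", ">" and "&" inside the JSON body is an extra defense-in-depth assumption on top of the headers, not something the answer requires.

```javascript
// Hypothetical helper (not from the thread): builds headers and body for a JSON API response.
function buildJsonResponse(payload) {
  // JSON.stringify already escapes quotes and backslashes. As an extra precaution
  // (an assumption of this sketch), also escape characters that matter to an HTML parser.
  const body = JSON.stringify(payload)
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026");

  return {
    headers: {
      // Declare the response as JSON so it is not rendered as an HTML page.
      "Content-Type": "application/json; charset=utf-8",
      // Ask the browser not to sniff the body and override the declared type.
      "X-Content-Type-Options": "nosniff",
    },
    body,
  };
}

const res = buildJsonResponse({
  name: "something",
  content: "<script>alert(/xss/)</script>",
});
console.log(res.headers);
console.log(res.body);
```

The escaped body is still valid JSON, so JSON.parse on the client returns the original strings unchanged.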

You still have to worry about DOM-based XSS. However, that would be a problem with your JavaScript, not really with the content of the JSON. Another, more exotic security concern with JSON is information leakage, like this vulnerability in Gmail.
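
To make the DOM-based XSS point concrete, here is a minimal sketch of the client side. The function name renderArticle and the element stand-ins are hypothetical; in a browser you would pass real DOM elements, and the article object would come from the parsed AJAX response.

```javascript
// Hypothetical renderer: titleEl and bodyEl stand in for DOM elements.
function renderArticle(titleEl, bodyEl, article) {
  // textContent treats the value as plain text, so user-supplied markup is
  // displayed literally instead of being parsed as HTML.
  titleEl.textContent = article.name;
  bodyEl.textContent = article.content;
  // Risky alternative (avoid): bodyEl.innerHTML = article.content;
  // That would hand user-supplied markup to the HTML parser.
}

// Plain objects used as stand-ins so the sketch runs outside a browser.
const titleEl = { textContent: "" };
const bodyEl = { textContent: "" };
renderArticle(titleEl, bodyEl, {
  name: "something",
  content: "<img src=x onerror=alert(1)>",
});
console.log(bodyEl.textContent);
```

The JSON itself is harmless here; the risk comes only from how the script inserts it into the page.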

Make sure to always test your code. There is the free Sitewatch XSS scanner and the open source Skipfish, and finally you could test this manually with a simple <script>alert(/xss/)</script>.


4

Instead of worrying about how you could encode the malicious code when you return it, you should probably make sure that it does not even get into your database. A quick Google search about preventing cross-site scripting and input validation might help you here. Cheers

8 Comments

So, you are suggesting Input Encoding. Well, that's one option, but it has to be very strict -- as the server cannot yet tell where the content will end up.
@The Rook If you read carefully, you will notice that I suggested encoding the user input before storing it on the server (the OP apparently understood this, btw). The Google reference was just a helpful pointer in the right direction.
@rFactor That's true. There are various things you can do to improve security and usually using only one of them is not enough.
@moxn I disagree; data in the DB shouldn't be encoded because it makes comparisons more difficult. For something like a comment or a blog post it's not going to matter. But what about the date/time or an address? XSS is only a vulnerability when it reaches the client, and in most cases it's best to use HTML entity encoding before printing it out. In this case, it's not necessary if you follow the RFC and use the Content-Type header properly.
@The Rook Point taken. But I am still convinced that if I don't allow the inclusion of JavaScript in, let's say, the comment section of a blog post, I will encode the content before storing it in the DB. I don't want to store potentially harmful content on my side...
1

If the user has to be logged in to view the web page then secure the ajax.php with the same authorization mechanism. Then a client that's not logged in cannot access ajax.php directly to retrieve the data.


0

I don't think your question is about validating user input, as others suggested. You don't want to provide your JSON API to other people... right?

If this is the case then there isn't much you can do... in fact, even if you were serving HTML instead of JSON, people would still be doing HTML scraping to get what they wanted from your site (this is how Search Engine spiders work).

A good way to limit scraping is to allow only a specific number of requests per IP address. This way, if someone requests http://yoursite.com/somejson.json more than 100 times a day, you can be fairly sure it's a scraper and not someone visiting your page 100 times in one day.
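
The per-IP limit described above could be sketched roughly as follows. This is an in-memory, fixed-window counter under assumed names (createRateLimiter, isAllowed); a real deployment would likely need shared storage and cleanup of old entries, which this sketch leaves out.

```javascript
// Hypothetical fixed-window rate limiter keyed by client IP address.
function createRateLimiter(maxRequests, windowMs) {
  const hits = new Map(); // ip -> { count, windowStart }

  // Returns true if this request is within the limit, false otherwise.
  return function isAllowed(ip, now = Date.now()) {
    const entry = hits.get(ip);
    // Start a fresh window for a new IP or when the previous window has expired.
    if (!entry || now - entry.windowStart >= windowMs) {
      hits.set(ip, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= maxRequests;
  };
}

// Example: at most 100 requests per IP per day, as in the answer.
const DAY_MS = 24 * 60 * 60 * 1000;
const isAllowed = createRateLimiter(100, DAY_MS);
console.log(isAllowed("203.0.113.7"));
```

As the comments below point out, a scraper that rotates IP addresses gets around this, so it only raises the cost of scraping.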

2 Comments

Okay, so the attacker will just use a list of proxy servers to scrape your site. Also input validation isn't always the right answer, and is not required to secure this potential vulnerability.
Define 'list of proxy servers'. There aren't many public proxy servers available, and it would surely be harder to set up the proxy servers yourself... but yeah, there are indeed ways to get around it.
0

Insertion of script tags (or SQL) is only a problem if you fail to neutralize it at the point where it could become a problem.

A <script> tag in the middle of a comment that somebody submits will not hurt your server, and it won't hurt your database. What it would hurt, if you fail to take appropriate measures, is a page that includes the comment when you subsequently serve it up and it reaches a client browser. To prevent that from happening, the code that prepares the page must make sure that user-supplied content is always scrubbed before it is exposed to an unaware interpreter. In this case, that unaware interpreter is a client web browser. In fact, your client web browser really involves two unaware interpreters: the HTML parser and layout engine, and the JavaScript interpreter.

Another important example of an unaware interpreter is your database server. Note that a <script> tag is (almost certainly) harmless to your database, because "<script>" doesn't mean anything in SQL. It's other sorts of input that cause problems for SQL, like quotes in strings (which are harmless to your HTML pages!).

Stack Overflow would be pretty lame if I couldn't put <script> tags in my answers, as I'm doing now. The same goes for examples of SQL injection attacks. Recently somebody linked a page from some prominent US bank, where a big <textarea> was footnoted by a warning not to include the characters "<" or ">" in whatever you typed. Predictably, the bank was ridiculed over hundreds of Reddit comments, and rightly so.

Exactly how you "scrub" user-supplied content depends on the unaware interpreter to which you're delivering it. If it's going to be dropped in the middle of HTML markup, then you have to make sure that the "<", ">", and "&" characters are all encoded as HTML entities. (You might want to do quote characters too, if the content might end up in an HTML element attribute value.) If the content is to be dropped into JavaScript, however, you may not need to worry about HTML escaping, but you do need to worry about quotes, and possibly Unicode characters outside the 7-bit range.
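
A minimal sketch of the two cases described above, written in JavaScript. Both function names (escapeHtml, toScriptLiteral) are hypothetical, and the exact set of characters to escape is an assumption of this sketch; the point is only that the escaping differs by destination.

```javascript
// Hypothetical helper for content dropped into HTML markup or attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper for content dropped into JavaScript source as a string literal.
function toScriptLiteral(value) {
  // JSON.stringify handles quotes, backslashes, and control characters.
  return JSON.stringify(String(value))
    .replace(/</g, "\\u003c") // keeps "</script>" from ending an inline script block
    .replace(/\u2028/g, "\\u2028") // line separators that older JS engines reject in literals
    .replace(/\u2029/g, "\\u2029");
}

console.log(escapeHtml('<script>alert("xss")</script> & more'));
console.log(toScriptLiteral('He said "hi" </script>'));
```

Note that the stored text is left untouched; each helper is applied only at the moment the content is handed to a particular interpreter.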


0

For outputting safe HTML from PHP, I recommend http://htmlpurifier.org/

1 Comment

...Or you could kill a fly with a brick.
