CodeIgniter - why use xss_clean
If I'm sanitizing my DB inserts, and also escaping the HTML I write with htmlentities($text, ENT_COMPAT, 'UTF-8'), is there any point in also filtering the inputs with xss_clean? What other benefits does it give?
xss_clean() is extensive, and also silly. Roughly 90% of this function does nothing to prevent XSS, such as looking for the word alert but not document.cookie. No attacker is going to use alert in a real exploit; they are going to hijack the cookie with XSS, or read a CSRF token in order to make an XHR.
However, running htmlentities() or htmlspecialchars() along with it is redundant. A case where xss_clean() fixes the issue and htmlentities($text, ENT_COMPAT, 'UTF-8') fails is the following:
<?php print "<img src='$var'>"; ?>
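A simple PoC is a value for $var that contains a single quote followed by an onload= attribute. ENT_COMPAT leaves single quotes unescaped, so the value closes the single-quoted src attribute and adds the onload= event handler to the image tag. One way to stop this form of XSS is htmlspecialchars($var, ENT_QUOTES); in this case xss_clean() will also prevent it.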
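To make the difference between the two flags concrete, here is a minimal sketch. The value of $var is a hypothetical payload chosen for illustration, not the PoC from the original answer.

```php
<?php
// Hypothetical attacker-controlled value (an assumption for illustration).
$var = "x' onload='alert(document.cookie)";

// ENT_COMPAT escapes double quotes only, so the single quotes survive
// and the value can break out of the single-quoted src attribute.
$compat = htmlentities($var, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES escapes single quotes as well, so the value stays inside the attribute.
$quotes = htmlspecialchars($var, ENT_QUOTES, 'UTF-8');

print "<img src='$compat'>\n"; // <img src='x' onload='alert(document.cookie)'>
print "<img src='$quotes'>\n"; // <img src='x&#039; onload=&#039;alert(document.cookie)'>
```

The first tag ends up with a live onload handler; the second keeps the whole value inside src.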
However, quoting from the xss_clean() documentation:
Nothing is ever 100% foolproof, of course, but I haven't been able to get anything passed the filter.
That being said, XSS is an output problem not an input problem. For instance this function cannot take into account that the variable is already within a <script> tag or event handler. It also doesn't stop DOM Based XSS. You need to take into consideration how you are using the data in order to use the best function. Filtering all data on input is a bad practice. Not only is it insecure but it also corrupts data which can make comparisons difficult.
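As a rough sketch of what "escape for the output context" can look like, the example below encodes the same value differently for an HTML context and for a JavaScript string inside a <script> block. The variable names and the payload are assumptions for illustration; other contexts (URLs, CSS, event handlers) need their own handling.

```php
<?php
// Hypothetical user-supplied value (an assumption for illustration).
$name = "</script><script>alert(1)</script>";

// HTML body or quoted-attribute context: escape HTML metacharacters.
$html = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// JavaScript string context inside a <script> block: emit a JSON literal and
// hex-escape characters that could end the script element or a quoted string.
$js = json_encode($name, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

print "<p>Hello, $html</p>\n";
print "<script>var name = $js;</script>\n";
```

The stored value is never modified; only its representation changes at the point of output, which avoids the data corruption that input filtering causes.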
In your case, "stricter methods are fine, and lighter weight". CodeIgniter developers intend xss_clean() for a different use case, "a commenting system or forum that allows 'safe' HTML tags". This isn't clear from the documentation, where xss_clean is shown applied to a username field.
There's another reason to never use xss_clean() that hasn't been highlighted on Stack Overflow so far. xss_clean() was broken during 2011 and 2012, and it's impossible to fix completely, at least without a complete redesign, which didn't happen. At the moment, it's still vulnerable to doubly-encoded strings.
An attacker can simply encode their exploit twice. It will be decoded once by xss_clean(), and pass as clean. You then have a singly-encoded exploit, ready for execution in the browser.
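The double-encoding problem can be illustrated with a toy filter. This is not the real xss_clean() code; toy_filter() and both payloads are hypothetical, and the sketch only shows why "decode once, then blacklist" is bypassable.

```php
<?php
// Toy filter, NOT the real xss_clean(): decode entities once, then
// reject input that contains a blacklisted scheme.
function toy_filter(string $input): ?string
{
    $decoded = html_entity_decode($input, ENT_QUOTES, 'UTF-8');
    if (stripos($decoded, 'javascript:') !== false) {
        return null; // caught by the blacklist
    }
    return $decoded;
}

// Hypothetical payloads (assumptions for illustration).
$single = 'j&#x61;vascript:alert(1)';     // encoded once
$double = 'j&amp;#x61;vascript:alert(1)'; // encoded twice

$r1 = toy_filter($single); // null: one decode reveals "javascript:"
$r2 = toy_filter($double); // passes: one layer of encoding is still left

// A browser decodes entities in attribute values; simulate that with one more decode.
$inBrowser = html_entity_decode($r2, ENT_QUOTES, 'UTF-8');

var_dump($r1, $r2, $inBrowser);
```

Here $r2 is 'j&#x61;vascript:alert(1)', which the toy blacklist does not recognize, and $inBrowser is 'javascript:alert(1)'.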
I call these checks "naive" and unfixable because they're largely reliant on regular expressions. HTML is not a regular language. You need a more powerful parser to match the one in the browser; xss_clean() doesn't have anything like that. Maybe it's possible to whitelist a subset of HTML, which lexes cleanly with regular expressions. However, the current xss_clean() is very much a blacklist.
Yes, you should still be using it. I generally make it a rule to use it at least on public-facing input, meaning any input that anyone can access and submit to.
Generally, sanitizing the input for DB queries seems like a side effect, as the true purpose of the function is to prevent cross-site scripting attacks.
I'm not going to get into the nitty-gritty details of every step xss_clean takes, but I will tell you it does more than the few steps you mentioned. I've pastied the source of the xss_clean function so you can look for yourself; it is fully commented.
I would recommend using http://htmlpurifier.org/ for doing XSS purification. I'm working on extending my CodeIgniter Input class to start leveraging it.
If you want the filter to run automatically every time it encounters POST or COOKIE data you can enable it by opening your application/config/config.php file and setting this: $config['global_xss_filtering'] = TRUE;
You can enable csrf protection by opening your application/config/config.php file and setting this: $config['csrf_protection'] = TRUE;