Removing JavaScript from HREFs

We want to allow "normal" href links to other webpages, but we don't want to allow anyone to sneak in client-side scripting.

Is searching for "javascript:" within the HREF and onclick/onmouseover/etc. events good enough? Or are there other things to check?

Answers


It sounds like you're allowing users to submit content with markup. As such, I would recommend taking a look at a few articles about preventing cross-site scripting, which cover a bit more than simply preventing JavaScript from being inserted into an href attribute. Below is one I found that might be useful:

http://weblogs.java.net/blog/gmurray71/archive/2006/09/preventing_cros.html


You'll have to use a whitelist of allowed protocols to be completely safe. If you use a blacklist, sooner or later you'll miss something like "telnet://" or "shell:" or some exploitable browser-specific thing you've never heard of...
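As a rough illustration of the whitelist idea, here is a minimal Python sketch. The function name `is_allowed_href` and the particular set of schemes are my own assumptions, not something from the question; adjust the set to whatever your site really needs.

```python
from urllib.parse import urlsplit

# Assumed allowlist for illustration; anything not listed is rejected.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_allowed_href(href: str) -> bool:
    """Return True only if href carries an explicitly allowed scheme."""
    # Browsers ignore surrounding whitespace and drop tabs/newlines inside
    # a URL, so remove them before looking at the scheme.
    cleaned = "".join(ch for ch in href.strip() if ch not in "\t\r\n")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        # urlsplit raises on some malformed inputs; treat those as unsafe.
        return False
    return scheme in ALLOWED_SCHEMES
```

Note that this sketch also rejects relative links (they have no scheme). If you want to allow those, that needs a separate, deliberate rule rather than a fall-through.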


Nope, there's a lot more that you need to check.

First off, the URL could be encoded (using HTML entities, URL encoding, or a mixture of both).

Secondly, you need to check for malformed HTML, which the browser might guess at and end up allowing some script in.

Thirdly, you need to check for CSS-based script, e.g. background: url(javascript:...) or width:expression(...)

There's probably more that I've missed - you need to be careful!
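To show why a plain search for "javascript:" misses encoded input, here is a small Python sketch that decodes HTML entities and percent-encoding repeatedly before looking at the scheme. The names `normalize_for_check` and `looks_like_script_url` are hypothetical, and the check is deliberately over-cautious (it decodes more than a browser would). It is a detection aid only and does not replace a whitelist.

```python
import html
from urllib.parse import unquote


def normalize_for_check(raw: str, max_rounds: int = 5) -> str:
    """Decode HTML entities and percent-encoding until the value is stable."""
    current = raw
    for _ in range(max_rounds):
        decoded = unquote(html.unescape(current))
        if decoded == current:
            break
        current = decoded
    # Drop ASCII whitespace and control characters (anything at or below
    # the space character), then lowercase for comparison.
    return "".join(ch for ch in current if ch > " ").lower()


def looks_like_script_url(raw: str) -> bool:
    """Flag values whose decoded form starts with a script-capable scheme."""
    return normalize_for_check(raw).startswith(
        ("javascript:", "vbscript:", "data:")
    )
```

For example, `&#106;avascript:alert(1)` contains no literal "javascript:" substring, yet decodes to one.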


You have to be extremely careful when taking user input. You'll want to do a whitelist as mentioned, but not just with the href. Example:

<img src="nosuchimage.blahblah" onerror="alert('Haxored!!!');" />

or

<a href="about:blank;" onclick="alert('Haxored again!!!');">click meh</a>

One option would be to disallow HTML altogether and use the same sort of formatting that some forums use. Just replace

[url="xxx"]yyy[/url]

with

<a href="xxx">yyy</a>

That'll get you around the issues with mouseover handlers etc. Then just make sure the link starts with a whitelisted protocol and doesn't contain a quote (or an encoded form such as &quot; that might be decoded by PHP or the browser).
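A minimal Python sketch of that approach, assuming the [url="..."] syntax shown above. The names `render_links` and `has_allowed_scheme` and the allowed schemes are my own choices for illustration. Everything that is not a recognised link tag gets HTML-escaped, so raw markup never reaches the page.

```python
import html
import re
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumed allowlist

# The URL part may not contain a double quote or a closing bracket.
URL_TAG = re.compile(r'\[url="([^"\]]*)"\](.*?)\[/url\]',
                     re.IGNORECASE | re.DOTALL)


def has_allowed_scheme(url: str) -> bool:
    try:
        return urlsplit(url.strip()).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:
        return False


def render_links(text: str) -> str:
    """Escape all user text; emit anchors only for allowed [url] tags."""
    out = []
    pos = 0
    for match in URL_TAG.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        url, label = match.group(1), match.group(2)
        if has_allowed_scheme(url):
            out.append('<a href="{}">{}</a>'.format(
                html.escape(url, quote=True), html.escape(label)))
        else:
            # Leave rejected tags visible as harmless escaped text.
            out.append(html.escape(match.group(0)))
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

Because the href value is escaped on output, an entity such as &quot; typed by the user comes out as literal text rather than closing the attribute.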


Sounds like you're looking for the companion function to PHP's strip_tags, which is strip_attributes. Unfortunately, it hasn't been written yet. (Hint, hint.)

There is, however, an interesting-looking suggestion in the strip_tags documentation, here:

http://www.php.net/manual/en/function.strip-tags.php#85718

In theory this will strip anything that isn't an href, class, or ID from submitted links; seems like you probably want to lock it down even further and just take hrefs.
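For comparison, here is a rough Python sketch of the "just take hrefs" idea. It is not the PHP snippet from that comment; it uses the standard library's `html.parser` and rebuilds the output instead of editing the input. The class name `LinkOnlySanitizer`, the `sanitize` helper, and the allowed schemes are assumptions for illustration, and a real deployment would probably want a maintained sanitizer library rather than this.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumed allowlist


class LinkOnlySanitizer(HTMLParser):
    """Rebuild the input, keeping only <a href="..."> and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.open_links = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # drop every other tag; its text content is still kept
        href = dict(attrs).get("href") or ""
        try:
            ok = urlsplit(href.strip()).scheme.lower() in ALLOWED_SCHEMES
        except ValueError:
            ok = False
        if ok:
            # Emit a fresh tag with href only; all other attributes vanish.
            self.parts.append(
                '<a href="{}">'.format(html.escape(href, quote=True)))
            self.open_links += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.open_links:
            self.parts.append("</a>")
            self.open_links -= 1

    def handle_data(self, data):
        self.parts.append(html.escape(data))


def sanitize(markup: str) -> str:
    parser = LinkOnlySanitizer()
    parser.feed(markup)
    parser.close()
    # Close any anchors the input left open.
    parser.parts.append("</a>" * parser.open_links)
    return "".join(parser.parts)
```

Rebuilding each tag from scratch means onclick, class, id and anything else are simply never written out, and links with a disallowed scheme degrade to plain text.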

