Is there a good JavaScript-based HTML parsing library available?

My goal is to take HTML entered by an end user, remove certain unsafe tags like <script>, and add it to the document. Does anybody know of a good JavaScript library to sanitize HTML?

I searched around and found a few, including John Resig's HTML parser, Erik Arvidsson's simple html parser, and Google's Caja Sanitizer. However, I haven't found much information about whether people have had good experiences with these libraries, and I'm worried that they aren't robust enough to handle arbitrary HTML. Would I be better off just sending the HTML to my Java server for sanitization?

Answers


You can parse HTML with jQuery, but I'm pretty sure any blacklist-based ("filtering out") approach to sanitizing is going to fail. You probably need a whitelist-based ("filtering in") approach, and ultimately you don't want to rely on JavaScript for security anyway. In any case, for reference, you can use jQuery for DOM parsing like this:

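var htmlS = "<html>etc.etc.";
// .remove("script") only filters the top-level matched set; wrap, then .find().
var $wrapper = $("<div>" + htmlS + "</div>");
$wrapper.find("script").remove(); /* DON'T RELY ON THIS FOR SECURITY */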
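To illustrate why "filtering out" tends to fail, here is a minimal sketch of a naive blacklist filter and two inputs that get past it. The function name `stripScriptTags` and both payloads are made up for this illustration; they are not taken from any of the libraries mentioned in the question.

```javascript
// Hypothetical naive blacklist: delete every <script>...</script> pair.
function stripScriptTags(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "");
}

// 1. Script can run without any <script> tag, e.g. via an event handler attribute.
const eventHandler = '<img src="x" onerror="alert(1)">';

// 2. Removing the inner tags splices the outer fragments into a new <script> tag.
const nested = "<scr<script></script>ipt>alert(1)</scr<script></script>ipt>";

console.log(stripScriptTags(eventHandler)); // unchanged: the onerror handler survives
console.log(stripScriptTags(nested)); // <script>alert(1)</script>
```

Patching each case as it turns up is a losing game, which is why an allowlist is usually the safer default.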

Would I be better off just sending the HTML to my Java server for sanitization?

Yes.

Filtering "unsafe" input must be done server-side; there is no other way to do it. Client-side filtering cannot be trusted, because the "client" could be a web browser or it could just as easily be a bot with a script that never runs your JavaScript at all.
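As a rough sketch of what a server-side "filtering in" pass could look like: escape everything first, then re-enable only a short list of attribute-free tags. It is written in JavaScript (for example, for Node.js) to match the rest of this page, though the same logic ports to a Java server. `ALLOWED_TAGS`, `escapeHtml`, and `sanitizeAllowlist` are names invented for this example, and a maintained sanitizer library is likely the better choice in production.

```javascript
// Assumed allowlist for this sketch; adjust to whatever markup you want to permit.
const ALLOWED_TAGS = ["b", "i", "em", "strong", "p", "br"];

// Escape every HTML metacharacter so no markup survives by default.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Re-enable only exact, attribute-free opening and closing tags from the allowlist.
function sanitizeAllowlist(input) {
  const escaped = escapeHtml(input);
  const pattern = new RegExp("&lt;(/?)(" + ALLOWED_TAGS.join("|") + ")&gt;", "gi");
  return escaped.replace(pattern, "<$1$2>");
}

console.log(sanitizeAllowlist("<b>hi</b><script>alert(1)</script>"));
// <b>hi</b>&lt;script&gt;alert(1)&lt;/script&gt;
```

Because anything not explicitly allowed stays escaped, tags with attributes (such as `<b onclick=...>`) and unknown tags are rendered as text rather than markup. The sketch does not balance tags, so a stray `</p>` passes through; that is untidy but not executable.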

