I searched around and found a few online, including John Resig's HTML parser, Erik Arvidsson's Simple HTML Parser, and Google's Caja sanitizer. However, I haven't been able to find much information about whether people have had good experiences with these libraries, and I'm worried that they aren't robust enough to handle arbitrary HTML. Would I be better off just sending the HTML to my Java server for sanitization?
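var htmlS = "<html>etc.etc.";
// .remove("script") only filters the top-level matched elements, so parse into a wrapper and search inside it.
var $wrapper = $("<div>").append($.parseHTML(htmlS));
// Inline handlers such as onerror="..." are untouched by this.
$wrapper.find("script").remove(); /* DONT RELY ON THIS FOR SECURITY */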
Would I be better off just sending the HTML to my Java server for sanitization?
Filtering "unsafe" input must be done server-side. There is no other way to do it. Client-side filtering cannot be relied on, because the "client" could be a web browser, or it could just as easily be a bot with a script that never runs your JavaScript and posts whatever it likes straight to your server.
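As a rough illustration of doing the work on the Java side, here is a minimal sketch that escapes untrusted input before it is written into a page. Note the assumptions: `HtmlEscaper` and `escape` are hypothetical names made up for this example, and this is plain escaping, not sanitization. It turns all markup into inert text, so it only fits the case where you don't need to keep any of the submitted HTML. If you do need to keep a safe subset of tags, a maintained server-side sanitizer library is probably a better fit than anything hand-rolled.

```java
// Hypothetical helper for illustration only; not part of any library.
final class HtmlEscaper {
    private HtmlEscaper() {}

    /**
     * Escapes the five characters that are significant in HTML text
     * and quoted attribute values. Returns an empty string for null.
     */
    static String escape(String untrusted) {
        if (untrusted == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(untrusted.length() + 16);
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, `HtmlEscaper.escape("<script>alert(1)</script>")` returns `&lt;script&gt;alert(1)&lt;/script&gt;`. The point is that this runs on the server for every request, regardless of whether the client did any filtering of its own.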