HTML encode UTF-8 string gets mangled into latin1

I'm parsing my nginx logs, and I want to discover some details from the HTTP_REFERER string, for example, the query string used to find the web site. One user typed in "México" which gets encoded in the log as "query=M%E9xico".

Passing this through Rack::Utils.parse_query('query=M%E9xico') you get a hash, {"query" => "M?xico"}

When you try to stuff "M?xico" into Postgres (but not the more forgiving SQLite), it pukes because the string isn't proper UTF-8. Looking at http://rack.rubyforge.org/doc/Rack/Utils.html#M000324, unescape is packing a hex string.

How can I convert the string back to UTF-8, or can I get parse_query to return UTF-8 in the first place?

Answers


unescape will decode the URL encoding:

Rack::Utils.parse_query(URI.unescape('query=M%E9xico'))

Or

Rack::Utils.parse_query(Rack::Utils.unescape('query=M%E9xico'))

The problem here happens well before you get hold of the data: %E9 is the single-byte Latin-1 (ISO-8859-1) encoding of "é", whereas a UTF-8 "é" would be percent-encoded as %C3%A9. You need to fix the problem upstream if you can, and if you can't, my suggestion is to find out the encoding and convert it on input, using one of the conversion libraries in Ruby (iconv, for example).

The problem is not in PostgreSQL, though.
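
If the upstream can't be fixed, converting on input could look roughly like the sketch below. It assumes that any value which is not valid UTF-8 was originally ISO-8859-1, which is only a guess about your logs. The helper name decode_query_value and its fallback parameter are made up for illustration; only URI.decode_www_form_component and the String encoding methods come from Ruby's standard library.

```ruby
require "uri"

# Hypothetical helper: percent-decode one query value and return it as UTF-8.
# Assumption: bytes that are not valid UTF-8 are treated as ISO-8859-1.
def decode_query_value(escaped, fallback: Encoding::ISO_8859_1)
  # Decode the %XX escapes into raw bytes without claiming any text encoding yet.
  bytes = URI.decode_www_form_component(escaped, Encoding::BINARY)

  # If the bytes already form valid UTF-8, just tag them as such.
  utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Otherwise reinterpret the bytes in the fallback encoding and transcode.
  bytes.force_encoding(fallback).encode(Encoding::UTF_8)
end
```

With this, decode_query_value("M%E9xico") and decode_query_value("M%C3%A9xico") both return "México" as a valid UTF-8 string, which Postgres should accept. Trying UTF-8 first matters because a value that is already UTF-8 would otherwise be transcoded a second time and come out garbled.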

