Handling UTF-8 characters with a Latin1 database encoding
I keep getting the exception ActiveRecord::StatementInvalid: PG::UntranslatableCharacter: ERROR: character with byte sequence 0xe2 0x80 0x99 in encoding "UTF8" has no equivalent in encoding "LATIN1". I did some checking, and it looks like the character is a backtick or apostrophe. What is the best way to handle this: just strip out the character, or convert the whole database to UTF-8? If the answer is converting to UTF-8, how can I do that permanently? It always seems to revert when I do it in the shell.
I don't understand what you mean by "revert if you do it in the shell", but you seem to have an application where some parts (at least the database) use the LATIN1 encoding and one part (your Rails app) uses UTF-8. IMO, it is best to have everything in Unicode, but whether a conversion makes sense cannot be said in general. For example, if your database is also processed by other tools that expect Latin1, converting it is not sensible.
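To see the mismatch concretely, here is a minimal Ruby sketch that inspects the three bytes quoted in the error message. It assumes that PostgreSQL's LATIN1 corresponds to Ruby's ISO-8859-1 encoding; the variable names are only illustrative. The bytes decode to U+2019 (RIGHT SINGLE QUOTATION MARK, the "curly" apostrophe), which is valid UTF-8 but has no Latin1 equivalent, so Ruby refuses the conversion for the same reason the database does.

```ruby
# The three bytes from the PostgreSQL error message, interpreted as UTF-8.
char = "\xE2\x80\x99".dup.force_encoding(Encoding::UTF_8)

# Format the code point in the usual U+XXXX notation.
codepoint = format("U+%04X", char.ord)

# The byte sequence is well-formed UTF-8.
valid_utf8 = char.valid_encoding?

# Converting to ISO-8859-1 (assumed equivalent of LATIN1) fails,
# because the character does not exist in that encoding.
latin1_error =
  begin
    char.encode(Encoding::ISO_8859_1)
    nil
  rescue Encoding::UndefinedConversionError => e
    e.class
  end

puts codepoint
puts valid_utf8
puts latin1_error.inspect
```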
In any case, you need to define a clear boundary between where you use which encoding, and handle the conversion at that boundary. This applies not only to the database but also, for example, to the HTML pages you generate (hopefully UTF-8), to files uploaded by users and processed by your application, and so on.
If you convert to an encoding in which certain characters cannot be represented, as in your case, you have only three choices:
Reject the data (it must have been generated somewhere, perhaps as user input in a web form).
Simply remove the offending characters.
Replace the offending characters with a placeholder (for instance, a question mark).
None of these options is very pleasant, but if converting your database to UTF-8 is not an option, you should deal with the problem at the point where the offending string is generated, not when it is written to the database.