Why is table CHARSET set to utf8mb4 and COLLATION to utf8mb4_unicode_520_ci
I've recently noticed that, when ever I start a new WordPress project, my tables' collation automatically changes from utf8_unicode_ci (which I select when I create a new DB from phpMyAdmin) to utf8mb4_unicode_520_ci.
Also, I've noticed in phpMyAdmin under “General Settings” that server connection Collation defaults to utf8mb4_unicode_520_ci.
I'm running MySQL Server 5.7.17 and phpMyAdmin 4.6.6 on Ubuntu 17.04.
My questions are following:
- Why is this happening?
- If possible, how do I prevent this? Because of utf8mb4 I've experienced problems when migrating WP sites to an older MySQL server which does not support it.
- Is point 2. advisable? Are there any benefits in using charset utf8mb4 over utf8, and collation utf8mb4_unicode_520_ci over utf8_unicode_ci?
In the past, there was only utf8; in the future, utf8mb4 will be the default character set. now utf8mb4 is the default character set.
In the past, _general_ci was the default collation; then _unicode_ci (Unicode 4.0) was better, then _unicode_520_ci (Unicode 5.20). In the future (MySQL 8.0), the default will be _0900_ci_ai (Unicode 9.0).
Meanwhile, the road is full of potholes generated by MySQL's past mistakes. And WP designers are driving in a big tank that does not notice the potholes.
MySQL 5.6 was a big pothole that swallowed up many a WP user because of a 767 limit on indexes together with WP indexes on the overly-long VARCHAR(255) and the possibility of using utf8mb4. You are well past it by having 5.7.17. (Your future move to 8.0 will be less bumpy.)
That is, newly created databases/tables/columns on 5.7.7+ should not experience the 767 problem, but things migrated from older versions (5.5.3+) may have issues, especially if something causes you to change to utf8mb4.
What to do? I'll probably run out of space trying to spell out all the options. So provide the history of the data, the upgrade path (if any), the current settings, the ROW_FORMAT of the tables, the CHARACTER SET and COLLATION of the columns, the output of SHOW VARIABLES LIKE 'char%';
Where should you be? For 5.7.7+, utf8mb4 and utf8mb4_unicode_520_ci wherever practical. That charset gives you Emoji and all of Chinese (utf8 does not). That collation is the best available, although you might be hard pressed to notice where it matters.
Note: the first part of the collation name is the only character set that it works with. That is utf8_unicode_ci does not work with utf8mb4.