Update: WordPress 4.2 has full UTF-8 support! There’s no need to upgrade manually any more. 😀
For many years, MySQL had only supported a small part of UTF-8, a section commonly referred to as plane 0, the “Basic Multilingual Plane”, or the BMP. The UTF-8 spec is divided into “planes“, and plane 0 contains the most commonly used characters. For a long time, this was reasonably sufficient for MySQL’s purposes, and WordPress made do with this limitation.
It has always been possible to store all UTF-8 characters in the
latin1 character set, though
latin1 has shortcomings. While it recognises the connection between upper and lower case characters in Latin alphabets (such as English, French and German), it doesn’t recognise the same connection for other alphabets. For example, it doesn’t know that ‘Ω’ and ‘ω’ are the upper and lower-case versions of the Greek letter omega. This creates problems for searching text, when you generally want to match characters regardless of their case.
With the release of MySQL 5.5, however, the
utf8mb4 character set was added, and a whole new world opened up. Plane 1 contains many practical characters from historic scripts, music notation and mathematical symbols. It also contains fun characters, such an Emoji and various game symbols. Plane 2 is dedicated to CJK Ideographs, an attempt to create a common library of Chinese, Japanese and Korean characters.
For many websites, being able to use Emoji without installing an extra plugin is an excellent reason to switch your WordPress database to
utf8mb4, but unfortunately it’s not quite that simple. MySQL still has a few more limitations that cause problems with
Without further ado, here’s how to configure MySQL, so that WordPress can use
utf8mb4. If you don’t have the ability to configure your MySQL server directly, you should speak to your host. If they don’t want to, it’s probably time to look for a new host.
You need to be running MySQL 5.5.14, or higher. If you’re not already running at least MySQL 5.5 (ideally 5.6), you should be doing that anyway, as they provides significant performance and stability improvements over previous versions. For help with upgrading MySQL, check out the MySQL manual.
Before we convert your tables, we need to configure MySQL correctly.
In your my.cnf file, add the following settings to the
[mysqld] section. Remember to double check that you’re not duplicating settings in your my.cnf file – if any of these options are already set to something different, you’ll need to change that setting, rather than add a new setting.
You’ll need to restart your MySQL server after adding these settings.
Next, convert your WordPress tables to InnoDB and
ALTER TABLE wp_posts ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
You’ll need to run these two queries for each WordPress table, I used
wp_posts as an example. This is a bit tedious, but the good news is that you’ll only ever need to run them once. A word of warning, you should be prepared for some downtime if you have particularly large tables.
Finally, you can tell WordPress to use
utf8mb4 by changing your
DB_CHARSET setting in your wp-config.php file.
define( 'DB_CHARSET', 'utf8mb4' );
And there you have it. I know, it’s not pretty. I’d really like to add this to WordPress core so you don’t need to go through the hassle, but currently only a very small percentage of WordPress sites are using MySQL 5.5+ and InnoDB – in order to justify it, we need to see lots of sites upgrading! You can head on over to the core ticket for further reading, too – login and click the star at the top to show your support, there’s no need to post “+1″ comments.
Oh, and a final note on Emoji – Chrome support is pretty broken. There’s an extension to add Emoji to Chrome, but it interferes with WordPress’ post editor. If you really want to use Emoji in your posts, Safari or Firefox would be better options.