MySQL, WordPress

WordPress and UTF-8

For many years, MySQL had only supported a small part of UTF-8, a section commonly referred to as plane 0, the “Basic Multilingual Plane”, or the BMP. The UTF-8 spec is divided into “planes“, and plane 0 contains the most commonly used characters. For a long time, this was reasonably sufficient for MySQL’s purposes, and WordPress made do with this limitation.

It has always been possible to store all UTF-8 characters in the latin1 character set, though latin1 has shortcomings. While it recognises the connection between upper and lower case characters in Latin alphabets (such as English, French and German), it doesn’t recognise the same connection for other alphabets. For example, it doesn’t know that ‘Ω’ and ‘ω’ are the upper and lower-case versions of the Greek letter omega. This creates problems for searching text, when you generally want to match characters regardless of their case.

5557649

With the release of MySQL 5.5, however, the utf8mb4 character set was added, and a whole new world opened up. Plane 1 contains many practical characters from historic scripts, music notation and mathematical symbols. It also contains fun characters, such an Emoji and various game symbols. Plane 2 is dedicated to CJK Ideographs, an attempt to create a common library of Chinese, Japanese and Korean characters.

For many websites, being able to use Emoji without installing an extra plugin is an excellent reason to switch your WordPress database to utf8mb4, but unfortunately it’s not quite that simple. MySQL still has a few more limitations that cause problems with  utf8mb4.

Without further ado, here’s how to configure MySQL, so that WordPress can use utf8mb4. If you don’t have the ability to configure your MySQL server directly, you should speak to your host. If they don’t want to, it’s probably time to look for a new host.

Upgrade MySQL

You need to be running MySQL 5.5.14, or higher. If you’re not already running at least MySQL 5.5 (ideally 5.6), you should be doing that anyway, as they provides significant performance and stability improvements over previous versions. For help with upgrading MySQL, check out the MySQL manual.

Configure MySQL

Before we convert your tables, we need to configure MySQL correctly.

In your my.cnf file, add the following settings to the [mysqld] section. Remember to double check that you’re not duplicating settings in your my.cnf file – if any of these options are already set to something different, you’ll need to change that setting, rather than add a new setting.

default-engine=InnoDB

innodb-file-format=barracuda
innodb-file-per-table=true
innodb-large-prefix=true

collation-server=utf8mb4_unicode_ci
character-set-server=utf8mb4

You’ll need to restart your MySQL server after adding these settings.

Use InnoDB

Next, convert your WordPress tables to InnoDB and utf8mb4:

ALTER TABLE wp_posts ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

You’ll need to run these two queries for each WordPress table, I used wp_posts as an example. This is a bit tedious, but the good news is that you’ll only ever need to run them once. A word of warning, you should be prepared for some downtime if you have particularly large tables.

Configure WordPress

Finally, you can tell WordPress to use utf8mb4 by changing your DB_CHARSET setting in your wp-config.php file.

define( 'DB_CHARSET', 'utf8mb4' );

And there you have it. I know, it’s not pretty. I’d really like to add this to WordPress core so you don’t need to go through the hassle, but currently only a very small percentage of WordPress sites are using MySQL 5.5+ and InnoDB – in order to justify it, we need to see lots of sites upgrading! You can head on over to the core ticket for further reading, too – login and click the star at the top to show your support, there’s no need to post “+1″ comments. :-)

Oh, and a final note on Emoji – Chrome support is pretty broken. There’s an extension to add Emoji to Chrome, but it interferes with WordPress’ post editor. If you really want to use Emoji in your posts, Safari or Firefox would be better options.

Standard

4 thoughts on “WordPress and UTF-8

  1. Pingback: WordPress and UTF-8 | The WordPress C(h)ronicle

  2. Pingback: Google, where’s my alt text? | Kate Negin

  3. Navjot Singh says:

    There’s an additional step required too. You need to convert the character set of each column inside the table to utf8mb4_unicode_ci too. Otherwise, it won’t work.

Leave a Reply