WordPress and UTF-8

Update: WordPress 4.2 has full UTF-8 support! There’s no need to upgrade manually any more.

For many years, MySQL only supported a small part of Unicode: the characters in plane 0, commonly referred to as the “Basic Multilingual Plane”, or the BMP. Unicode is divided into 17 “planes”, and plane 0 contains the most commonly used characters. For a long time, this was reasonably sufficient for MySQL’s purposes, and WordPress made do with this limitation.
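To make planes concrete, here is a small Python sketch (Python is only an illustration language here; neither WordPress nor MySQL uses it) that works out which plane a character belongs to. Each plane holds 0x10000 code points, so the plane number is the code point shifted right by 16 bits. The sample characters are arbitrary choices.

```python
def plane(char: str) -> int:
    """Return the Unicode plane (0 to 16) that a single character belongs to."""
    # Each plane holds 0x10000 (65,536) code points, so the plane number
    # is the code point shifted right by 16 bits.
    return ord(char) >> 16


# Sample characters: Latin A, Greek capital omega, a CJK ideograph from
# the BMP, an Emoji (U+1F600) and a CJK ideograph from plane 2 (U+20000).
samples = ["A", "\u03a9", "\u4e2d", "\U0001F600", "\U00020000"]

for ch in samples:
    print(f"U+{ord(ch):04X} is in plane {plane(ch)}")
```

The first three samples report plane 0; the Emoji reports plane 1 and U+20000 reports plane 2.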

It has always been possible to store the bytes of any UTF-8 character in a latin1 column, though latin1 has shortcomings. While it recognises the connection between upper- and lower-case characters in Latin alphabets (such as English, French and German), it doesn’t recognise the same connection for other alphabets. For example, it doesn’t know that ‘Ω’ and ‘ω’ are the upper- and lower-case versions of the Greek letter omega. This causes problems when searching text, where you generally want to match characters regardless of their case.
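The case problem can be sketched in Python under some loudly stated assumptions: a latin1 column holding UTF-8 bytes is modelled by decoding those bytes with Python's latin-1 codec, and str.lower() stands in for MySQL's case-insensitive latin1 collation (MySQL's latin1 is really cp1252 with its own comparison rules, so this is only an approximation). The function names are hypothetical.

```python
def as_latin1(text: str) -> str:
    """Simulate reading UTF-8 bytes back from a latin1 column: one character per byte."""
    return text.encode("utf-8").decode("latin-1")


def latin1_equal_ignoring_case(a: str, b: str) -> bool:
    # Approximation: str.lower() on the byte-per-character view stands in
    # for MySQL's case-insensitive latin1 collation.
    return as_latin1(a).lower() == as_latin1(b).lower()


def unicode_equal_ignoring_case(a: str, b: str) -> bool:
    # With a real Unicode character set, case folding works on whole characters.
    return a.casefold() == b.casefold()


print(latin1_equal_ignoring_case("WordPress", "WORDPRESS"))  # True
print(latin1_equal_ignoring_case("\u03a9", "\u03c9"))        # False: Ω vs ω
print(unicode_equal_ignoring_case("\u03a9", "\u03c9"))       # True
```

Omega and its lower-case form become two unrelated pairs of bytes, so the latin1 view has no way to tell that they are the same letter.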


With the release of MySQL 5.5, however, the utf8mb4 character set was added, and a whole new world opened up: utf8mb4 stores up to four bytes per character, so it can hold characters from every plane, not just the BMP. Plane 1 contains many practical characters from historic scripts, music notation and mathematical symbols. It also contains fun characters, such as Emoji and various game symbols. Plane 2 is dedicated to CJK Ideographs, an attempt to create a common library of Chinese, Japanese and Korean characters.
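MySQL's older utf8 character set stores at most three bytes per character. That covers the BMP, but characters from plane 1 and above take four bytes in UTF-8. A rough Python check for whether some text needs utf8mb4 might look like this; utf8_length and needs_utf8mb4 are illustrative names, not part of any WordPress or MySQL API.

```python
def utf8_length(char: str) -> int:
    """Number of bytes a single character takes when encoded as UTF-8."""
    return len(char.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the BMP (above U+FFFF).

    Those are exactly the characters that need four bytes in UTF-8,
    which MySQL's three-byte utf8 character set cannot store.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


for ch in ["A", "\u03a9", "\u4e2d", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s)")

print(needs_utf8mb4("caf\u00e9"))         # False
print(needs_utf8mb4("Hello \U0001F600"))  # True
```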

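For many websites, being able to use Emoji without installing an extra plugin is an excellent reason to switch your WordPress database to utf8mb4, but unfortunately it’s not quite that simple. MySQL still has a few more limitations that cause problems with utf8mb4. The main one is index size: by default, InnoDB limits each index key to 767 bytes, but MySQL reserves four bytes per character for utf8mb4, so an index on a VARCHAR(255) column needs up to 1,020 bytes.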
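The index limit is simple arithmetic. This sketch assumes InnoDB's documented limits: 767 bytes per index key by default, and 3,072 bytes when innodb_large_prefix is enabled and the table uses the DYNAMIC or COMPRESSED row format (which in MySQL 5.5 and 5.6 also requires the Barracuda file format and file-per-table tablespaces). MySQL sizes an index by the maximum bytes per character, so the key length is the column length in characters times that maximum. The helper names are hypothetical.

```python
# Maximum bytes per character that MySQL reserves for each character set.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

# InnoDB index key limits, in bytes.
DEFAULT_LIMIT = 767        # innodb_large_prefix off, or COMPACT/REDUNDANT rows
LARGE_PREFIX_LIMIT = 3072  # innodb_large_prefix on, DYNAMIC/COMPRESSED rows


def index_key_bytes(chars: int, charset: str) -> int:
    """Worst-case size of an index on a column `chars` characters long."""
    return chars * MAX_BYTES_PER_CHAR[charset]


def key_limit(large_prefix: bool) -> int:
    return LARGE_PREFIX_LIMIT if large_prefix else DEFAULT_LIMIT


def fits_in_index(chars: int, charset: str, large_prefix: bool = False) -> bool:
    return index_key_bytes(chars, charset) <= key_limit(large_prefix)


def max_indexable_chars(charset: str, large_prefix: bool = False) -> int:
    return key_limit(large_prefix) // MAX_BYTES_PER_CHAR[charset]


print(index_key_bytes(255, "utf8"))                      # 765
print(index_key_bytes(255, "utf8mb4"))                   # 1020
print(fits_in_index(255, "utf8mb4"))                     # False
print(fits_in_index(255, "utf8mb4", large_prefix=True))  # True
print(max_indexable_chars("utf8mb4"))                    # 191
```

So without large prefixes, 191 characters is the longest utf8mb4 column that can be fully indexed; with them, a VARCHAR(255) fits comfortably.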

Without further ado, here’s how to configure MySQL so that WordPress can use utf8mb4. If you don’t have the ability to configure your MySQL server directly, you should speak to your host. If they won’t do it, it’s probably time to look for a new host.

Upgrade MySQL

You need to be running MySQL 5.5.14 or higher, as that’s the release that added the innodb_large_prefix option. If you’re not already running at least MySQL 5.5 (ideally 5.6), you should be doing that anyway, as they provide significant performance and stability improvements over previous versions. For help with upgrading MySQL, check out the MySQL manual.
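If you’re not sure which version you’re running, the query SELECT VERSION(); will tell you. The following Python sketch compares such a version string against 5.5.14. It assumes the string starts with a numeric major.minor.patch triple (suffixes such as -log are ignored), and meets_minimum is a hypothetical helper.

```python
import re

MINIMUM = (5, 5, 14)


def parse_mysql_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '5.5.41-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised MySQL version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def meets_minimum(version: str) -> bool:
    # Tuples compare element by element, so (5, 5, 41) >= (5, 5, 14).
    return parse_mysql_version(version) >= MINIMUM


for v in ["5.1.73", "5.5.13", "5.5.41", "5.6.21-log"]:
    print(v, meets_minimum(v))
```

Comparing integer tuples rather than raw strings matters: as strings, "5.5.9" would sort after "5.5.14".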

Configure MySQL

Before we convert your tables, we need to configure MySQL correctly.

In your my.cnf file, add the following settings to the [mysqld] section. Remember to double-check that you’re not duplicating settings in your my.cnf file: if any of these options are already set to something different, change the existing setting rather than adding a new one.

default-storage-engine=InnoDB

innodb-file-format=barracuda
innodb-file-per-table=true
innodb-large-prefix=true

collation-server=utf8mb4_unicode_ci
character-set-server=utf8mb4

You’ll need to restart your MySQL server after adding these settings.
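Before restarting, it may be worth checking the file programmatically. This sketch uses Python’s standard configparser module, with option names normalised so that dashes and underscores count as the same option (as MySQL treats them), and strict mode so that a duplicated option is reported rather than silently overwritten. It assumes a plain my.cnf with no !include or !includedir directives; check_mysqld and the REQUIRED checklist are hypothetical and simply mirror the settings above.

```python
import configparser

# Settings from the [mysqld] block above, keyed by normalised option name.
# "true" means any of MySQL's boolean spellings is acceptable.
REQUIRED = {
    "default_storage_engine": "innodb",
    "innodb_file_format": "barracuda",
    "innodb_file_per_table": "true",
    "innodb_large_prefix": "true",
    "collation_server": "utf8mb4_unicode_ci",
    "character_set_server": "utf8mb4",
}
TRUE_VALUES = {"1", "on", "true"}


def normalise(option: str) -> str:
    # MySQL treats dashes and underscores in option names as equivalent.
    return option.strip().lower().replace("-", "_")


def check_mysqld(cnf_text: str) -> list:
    """Return a list of problems with the [mysqld] section (empty if none)."""
    parser = configparser.ConfigParser(
        strict=True, allow_no_value=True, interpolation=None
    )
    parser.optionxform = normalise  # so duplicates are caught across spellings
    try:
        parser.read_string(cnf_text)
    except configparser.DuplicateOptionError as exc:
        return [f"duplicate option {exc.option!r} in [{exc.section}]"]
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section"]

    problems = []
    section = parser["mysqld"]
    for option, wanted in REQUIRED.items():
        actual = section.get(option)
        if actual is None:
            problems.append(f"{option} is not set")
            continue
        actual = actual.strip().lower()
        ok = actual in TRUE_VALUES if wanted == "true" else actual == wanted
        if not ok:
            problems.append(f"{option} is {actual!r}, expected {wanted!r}")
    return problems


sample = """[mysqld]
default-storage-engine = InnoDB
innodb-file-format = barracuda
innodb-file-per-table = true
innodb-large-prefix = true
collation-server = utf8mb4_unicode_ci
character-set-server = utf8mb4
"""
print(check_mysqld(sample))  # []
print(check_mysqld(sample + "innodb_file_format = Antelope\n"))
```

The second call reports innodb_file_format as a duplicate, even though one line uses dashes and the other underscores.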

Use InnoDB

Next, convert your WordPress tables to InnoDB and utf8mb4:

ALTER TABLE wp_posts ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

You’ll need to run these two queries for each WordPress table; I used wp_posts as an example. The second query converts the table’s default character set and every character column in it, so there’s no need to convert columns individually. This is a bit tedious, but the good news is that you’ll only ever need to run them once. A word of warning: be prepared for some downtime if you have particularly large tables.
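Typing the two queries for every table gets old quickly, so here is a Python sketch that generates them. It assumes the wp_ prefix and a hypothetical list of WordPress’s default single-site tables from around the time of writing; plugins and multisite installs add more, so in practice you’d want to take the list from SHOW TABLES instead.

```python
# Hypothetical table list: WordPress's default single-site tables around the
# time of writing, without prefix. Plugins and multisite installs add more.
DEFAULT_TABLES = [
    "commentmeta", "comments", "links", "options", "postmeta", "posts",
    "terms", "term_relationships", "term_taxonomy", "usermeta", "users",
]


def conversion_statements(tables, prefix="wp_"):
    """Build the two ALTER TABLE statements for each table, in order."""
    statements = []
    for name in tables:
        table = f"`{prefix}{name}`"  # backticks quote the identifier
        statements.append(f"ALTER TABLE {table} ENGINE=InnoDB ROW_FORMAT=DYNAMIC;")
        statements.append(
            f"ALTER TABLE {table} CONVERT TO CHARACTER SET utf8mb4 "
            "COLLATE utf8mb4_unicode_ci;"
        )
    return statements


print("\n".join(conversion_statements(DEFAULT_TABLES)))
```

Printing the statements rather than running them directly leaves room to review them, and to schedule the big tables for a quiet period.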

Configure WordPress

Finally, you can tell WordPress to use utf8mb4 by changing your DB_CHARSET setting in your wp-config.php file.

define( 'DB_CHARSET', 'utf8mb4' );
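If you manage several sites, the wp-config.php change can be scripted too. This Python sketch rewrites the DB_CHARSET define with a regular expression. It assumes the usual single-line define( 'DB_CHARSET', '...' ); form with a quoted literal value, and set_db_charset is a hypothetical helper; it raises an error rather than guessing when it doesn’t find exactly one matching define.

```python
import re

# Matches e.g. define( 'DB_CHARSET', 'utf8' ); capturing the parts to keep.
DB_CHARSET_RE = re.compile(
    r"(define\(\s*['\"]DB_CHARSET['\"]\s*,\s*)(['\"])[^'\"]*\2(\s*\))"
)


def set_db_charset(config_php: str, charset: str = "utf8mb4") -> str:
    """Return config_php with the DB_CHARSET value replaced by `charset`."""
    new_text, count = DB_CHARSET_RE.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{charset}{m.group(2)}{m.group(3)}",
        config_php,
    )
    if count != 1:
        raise ValueError(f"expected one DB_CHARSET define, found {count}")
    return new_text


print(set_db_charset("define( 'DB_CHARSET', 'utf8' );"))
```

Using a replacement function instead of a replacement string keeps the original quote style and spacing intact.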

And there you have it. I know, it’s not pretty. I’d really like to add this to WordPress core so you don’t need to go through the hassle, but currently only a very small percentage of WordPress sites are using MySQL 5.5+ and InnoDB. To justify it, we need to see lots of sites upgrading! You can head on over to the core ticket for further reading, too. Log in and click the star at the top to show your support; there’s no need to post “+1” comments. 🙂

Oh, and a final note on Emoji: Chrome’s Emoji support is pretty broken. There’s an extension to add Emoji to Chrome, but it interferes with WordPress’ post editor. If you really want to use Emoji in your posts, Safari or Firefox would be better options.

14 comments

  1. You need to convert the character set of each column inside the table to utf8mb4_unicode_ci too. Otherwise, it won’t work.

    Is this correct? If so, what would be the best way to do this en masse? Thanks.

  2. I would like your opinion on my only other question about this issue. In years past I researched the difference between utf8_general_ci and utf8_unicode_ci (speed mostly, I believe). You have used utf8mb4_unicode_ci for your collation. Is this your recommendation for WP installations moving from utf8_general_ci? Does it really matter much?

    1. The difference is performance vs. accuracy.

      utf8_general_ci is slightly faster, but is less accurate: it doesn’t implement all Unicode collation rules. Given the speed of modern servers, I don’t think the performance benefits of utf8_general_ci are useful anymore, so I always recommend utf8_unicode_ci (or in the case of this post, utf8mb4_unicode_ci).

  3. I want to migrate a site to the Twenty Fifteen theme, so I am experimenting with a dummy version which was working well pre-4.2.2. Post-4.2.2, the menu button icons (Genericons?) were replaced by the 4-character codes, e.g. ‘f431’. Searching around, I found this post. My server SQL version for this site is 5.5.41 with charset latin1 (cp1252). Experimenting with Customize on the live site, the menu buttons are fine: same version of SQL, but collation utf8_general_ci.

    Is this difference on the server the likely cause of my menu icons not being translated or should I be looking elsewhere? The site is http://spic.org.uk/wp

    I have posted this on the forums but not in this detail and had no response so far.

    Hoping you can give some advice as you clearly understand this subject.

    Many thanks
