MySQL, WordPress

WordPress and UTF-8

For many years, MySQL had only supported a small part of UTF-8, a section commonly referred to as plane 0, the “Basic Multilingual Plane”, or the BMP. The UTF-8 spec is divided into “planes“, and plane 0 contains the most commonly used characters. For a long time, this was reasonably sufficient for MySQL’s purposes, and WordPress made do with this limitation.

It has always been possible to store all UTF-8 characters in the latin1 character set, though latin1 has shortcomings. While it recognises the connection between upper and lower case characters in Latin alphabets (such as English, French and German), it doesn’t recognise the same connection for other alphabets. For example, it doesn’t know that ‘Ω’ and ‘ω’ are the upper and lower-case versions of the Greek letter omega. This creates problems for searching text, when you generally want to match characters regardless of their case.

5557649

With the release of MySQL 5.5, however, the utf8mb4 character set was added, and a whole new world opened up. Plane 1 contains many practical characters from historic scripts, music notation and mathematical symbols. It also contains fun characters, such an Emoji and various game symbols. Plane 2 is dedicated to CJK Ideographs, an attempt to create a common library of Chinese, Japanese and Korean characters.

For many websites, being able to use Emoji without installing an extra plugin is an excellent reason to switch your WordPress database to utf8mb4, but unfortunately it’s not quite that simple. MySQL still has a few more limitations that cause problems with  utf8mb4.

Without further ado, here’s how to configure MySQL, so that WordPress can use utf8mb4. If you don’t have the ability to configure your MySQL server directly, you should speak to your host. If they don’t want to, it’s probably time to look for a new host.

Upgrade MySQL

You need to be running MySQL 5.5.14, or higher. If you’re not already running at least MySQL 5.5 (ideally 5.6), you should be doing that anyway, as they provides significant performance and stability improvements over previous versions. For help with upgrading MySQL, check out the MySQL manual.

Configure MySQL

Before we convert your tables, we need to configure MySQL correctly.

In your my.cnf file, add the following settings to the [mysqld] section. Remember to double check that you’re not duplicating settings in your my.cnf file – if any of these options are already set to something different, you’ll need to change that setting, rather than add a new setting.

default-engine=InnoDB

innodb-file-format=barracuda
innodb-file-per-table=true
innodb-large-prefix=true

collation-server=utf8mb4_unicode_ci
character-set-server=utf8mb4

You’ll need to restart your MySQL server after adding these settings.

Use InnoDB

Next, convert your WordPress tables to InnoDB and utf8mb4:

ALTER TABLE wp_posts ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

You’ll need to run these two queries for each WordPress table, I used wp_posts as an example. This is a bit tedious, but the good news is that you’ll only ever need to run them once. A word of warning, you should be prepared for some downtime if you have particularly large tables.

Configure WordPress

Finally, you can tell WordPress to use utf8mb4 by changing your DB_CHARSET setting in your wp-config.php file.

define( 'DB_CHARSET', 'utf8mb4' );

And there you have it. I know, it’s not pretty. I’d really like to add this to WordPress core so you don’t need to go through the hassle, but currently only a very small percentage of WordPress sites are using MySQL 5.5+ and InnoDB – in order to justify it, we need to see lots of sites upgrading! You can head on over to the core ticket for further reading, too – login and click the star at the top to show your support, there’s no need to post “+1″ comments. :-)

Oh, and a final note on Emoji – Chrome support is pretty broken. There’s an extension to add Emoji to Chrome, but it interferes with WordPress’ post editor. If you really want to use Emoji in your posts, Safari or Firefox would be better options.

Standard
Misc

PayPal is Still Bad at Account Security

A couple of months ago, following the news of PayPal being partially responsible for a person’s identity theft, I activated Two Factor Authentication on my PayPal account. First up, I was fairly unimpressed with their configuration options. In order to use 2FA, my options were to buy a dongle to generate the security codes, or have the codes SMSed to me. Neither of these are particularly good – I don’t want to have to pay for and carry around a dongle everywhere, and SMS isn’t a secure protocol, as SIM cards can be cloned or hacked. If someone really wanted to get into my account, then this wouldn’t present much of a barrier.

Then, there’s the login process. For some reason, PayPal doesn’t automatically send me an SMS, I need to click an extra button for that while logging in. This isn’t so much a security problem as a weird UX. Also, the Android app doesn’t support 2FA, so I can’t use that at all.

The real fun started last night, however. I tried to login to my PayPal account, and was prompted to enter my security code. No problem, I clicked the Send SMS button, and waited. And waited. I clicked it again. Waited. Tried to login again, and repeated the process a few times. No luck.

Okay, so their SMS service was having issues. Apart from the security issues with SMS, it’s also a notoriously unreliable protocol, regularly causing problems exactly like this. While I was pondering this, I noticed there was an option to bypass the 2FA. I clicked the button, and was prompted to answer my two security questions: my favourite author, and my favourite movie. Unfortunately, I’d set these questions 10 years ago when I first created my PayPal account, and never thought about them since. It turns out that 22 year old me had very different taste in film and literature than 32 year old me, and I had no idea what the answers were. Defeated, I went to bed.

This morning, I decided to try again, with the same result. This time, I called their customer support centre, to see if they could at least give me an update on when SMSes would be working again. Unfortunately, it seemed the customer support representative wasn’t familiar with how PayPal’s 2FA worked, so after a bit of back-and-forth explaining the situation, the CSR said they’d “reset my account” (I don’t know what this means), and it should be working again in 15 minutes.

Half an hour later, still not working, so I call back. Fortunately, this CSR was aware of the SMS issues they were having, and was able to fill me in. Unfortunately, it seems PayPal hadn’t really thought about the implications of their policy for this situation, as he immediately offered to disable 2FA on my account for me.

I’ll just let that sink in for a moment. At this point, I’d only loosely identified myself – I had an identity code from the PayPal support site, that I was able to get with just my username and password. The support systems probably showed my current phone number as matching my 2FA phone number, but they shouldn’t be relying on that at all – the source phone number can be easily spoofed, Skype even offers this as a service.

Sadly, it’s clearly evident that PayPal’s 2FA is broken in a bunch of different ways. You can still keep your account secure by choosing a strong password, and making sure you only login to your PayPal account on devices your trust.

Even if PayPal are in no hurry to mend their ways, here are some things for developers to make sure their own 2FA system is secure:

  • Don’t offer SMS as the only option. SMS-based 2FA is okay for guarding against mass account hijacking, but cannot prevent a targeted attack. As we’ve seen, it’s also wildly unreliable.
  • You should be using a standard method for generating your 2FA codes, such as RFC 6238, which is used by a bunch of different websites, like Google and WordPress.com.
  • Make your 2FA system as easy to use as possible – your users should want to use it, because it doesn’t get in their way, but makes their account safe.
  • Teach your support reps the 2FA mantra: “Something you have, and something you know”. In the case of PayPal, they’d already confirmed something I know (my password), so they could’ve easily confirmed something I have, like my ID or my credit card.
  • If you’re going to use security questions, prompt your users to re-enter them occasionally, so they don’t forget.
Standard
WordPress

Don’t Let Your Plugin Be Activated on Incompatible Sites

When you write a WordPress plugin, you can specify a minimum WordPress version that your code is compatible with, using the “Requires” option in your plugin header, but it isn’t enforced. Along with that, there’s no way to specify a minimum version of PHP or MySQL.

This can cause your users to have bad experiences with your plugin, and you’ll get bad ratings.

So, here’s how I make sure my plugins are only activated when I want them to be.

//  In this example, only allow activation on WordPress 3.7 or higher
class MyPlugin {
    function __construct() {
        add_action( 'admin_init', array( $this, 'check_version' ) );

        // Don't run anything else in the plugin, if we're on an incompatible WordPress version
        if ( ! self::compatible_version() ) {
            return;
        }
    }

    // The primary sanity check, automatically disable the plugin on activation if it doesn't
    // meet minimum requirements.
    static function activation_check() {
        if ( ! self::compatible_version() ) {
            deactivate_plugins( plugin_basename( __FILE__ ) );
            wp_die( __( 'My Plugin requires WordPress 3.7 or higher!', 'my-plugin' ) );
        }
    }

    // The backup sanity check, in case the plugin is activated in a weird way,
    // or the versions change after activation.
    function check_version() {
        if ( ! self::compatible_version() ) {
            if ( is_plugin_active( plugin_basename( __FILE__ ) ) ) {
                deactivate_plugins( plugin_basename( __FILE__ ) );
                add_action( 'admin_notices', array( $this, 'disabled_notice' ) );
                if ( isset( $_GET['activate'] ) ) {
                    unset( $_GET['activate'] );
                }
            }
        }
    }

    function disabled_notice() {
       echo '<strong>' . esc_html__( 'My Plugin requires WordPress 3.7 or higher!', 'my-plugin' ) . '</strong>';
    } 

    static function compatible_version() {
        if ( version_compare( $GLOBALS['wp_version'], '3.7', '<' ) ) {
             return false;
         }

        // Add sanity checks for other version requirements here

        return true;
    }
}

global $myplugin;
$myplugin = new MyPlugin();

register_activation_hook( __FILE__, array( 'MyPlugin', 'activation_check' ) );

It’s only a little extra code when added to a plugin, provides complete protection, and won’t cause any weirdness on the front end of your site, it’ll just deactivate itself either on activation, or when someone visits wp-admin.

Standard
PonyEdit

PonyEdit: A New Beginning

PonyEdit has always been a labour of love for Mark and I. We wanted to build the code editor that we wanted to use for our jobs – fast, clean and as at home in the cloud as it is on the desktop. We absolutely feel we’ve succeeded here – PonyEdit the primary editor for both of us, as well as for a bunch of friends. Sadly, however, PonyEdit was never a commercial success. We made a handful of sales over the years, but certainly not enough to pay for server costs, nevermind enough to pay ourselves anything!

When we first started working on PonyEdit, we decided that we’d try to make a commercial success of it, but if that didn’t work out, we’d Open Source it. And so, that day has come. Effective immediately, PonyEdit is now up on GitHub, released under the GPLv3 licence.

So, what does that mean for the future?

For us? Not much. We’ll still keep doing the same thing with PonyEdit – bug fixes as it needs them, features as we decide we want them.

For you? Pretty much the same. We’ve already transferred Automatic Updates to pull from GitHub, so you won’t see any difference. If you feel the sudden urge to dive into the code, Mark and I are always happy to have a chat with you.

Do you mean it?

You betcha. We’re PonyEdit’s most active users, so we’re going to keep on maintaining it. Sadly, though, it does mean that regular Windows and Linux builds have fallen by the wayside. If you’re a Windows or Linux user with an urge to build your own editor, we’d love to hear from you!

So, that’s about all there is to say. Be excellent to each other!

Standard
WordPress

Don’t Do Regular Expressions, Use The DOM

I’m as guilty of this as anyone – I have a lump of HTML that I need to extract information from. So, I write a quick regular expression, knowing full well that they’re not appropriate for the job. But I do it anyway.

This time, I decided to try doing things a better way.

Here’s the problem I’m trying to solve. In o2, (here’s a feature preview for you!) we’re experimenting with the idea of having post tags inline with the post content, instead of as a separate text field, like in P2. So, when a user saves a post with “#foo” in it, this needs to be extracted and saved as a tag “foo”.

With a regular expression, extraction seems pretty easy at first:

$tags = array();
preg_match_all( '/#[\w-]+/', $content, $tags );

That works on a simple text string, but things start to get complicated pretty quickly. What happens when you enter a URL, like http://pento.net/#foo? Or even worse, enter the URL in a tag like <a href="http://pento.net/#foo">...</a>? In both of these cases, “#foo” clearly isn’t meant to be a tag, so your regular expression quickly becomes a mess. Eventually, it gets to the point where you can’t even guarantee it’ll work under all cases.

Enter DOM parsing.

We’re all pretty familiar with dealing with the DOM, thanks to JavaScript, but it remains a less popular choice on the server side. PHP has various built in libraries to help, and there are plenty of wrappers for the PHP libs, as well as independent implementations, some of which are listed here. There are pros and cons to each option, so far nothing has appeared with the ubiquity of jQuery.

For this exercise, we’ll use PHP’s native DOM extension.

To begin with, let’s create a function to extract the tags from a new post, and save them.

function process_tags( $new, $old, $post ) {
	if ( 'publish' !== $new )
		return;

	$tags = find_tags( $post->post_content );

	wp_set_post_tags( $post->ID, $tags, false );
}
add_action( 'transition_post_status', 'process_tags', 12, 3 );

So far, this is all pretty straight forward. Our find_tags() function is where all the magic happens.

static function find_tags( $content ) {
	$tags = array();

	$dom = new DOMDocument;
	$dom->loadHTML( '<?xml encoding="UTF-8">' . $content );

	$xpath = new DOMXPath( $dom );
	$textNodes = $xpath->query( '//text()' );

	foreach ( $textNodes as $textNode ) {
		$parent = $textNode;
		while ( $parent ) {
			if ( ! empty( $parent->tagName ) && in_array( strtolower( $parent->tagName ), array( 'pre', 'code', 'a' ) ) ) {
				continue 2;
			}
			$parent = $parent->parentNode;
		}

		$matches = array();
		if ( preg_match_all( '/(?:^|\s)#([\w-]+)\b/', $textNode->nodeValue, $matches ) ) {
			$tags = array_merge( $tags, $matches[1] );
		}
	}

	return $tags;
}

The easiest way to explain how this works is to walk through it, so let’s do that now. We’ll feed find_tags() some basic HTML:

<p>#foo <a href="http://pento.net/?a=b&amp;c=d#bar">#baz</a> text</p>

Line 5: We load our HTML into the DOM. The <?xml encoding="UTF-8"> is to force DOMDocument to treat our HTML as being encoded as UTF-8 – by default it assumes ISO-8859-1 (latin1).

Line 7-8: DOMDocument supports XPath selectors, which saves us so much hassle. If you’re not familiar with XPath, it’s kind of like jQuery selectors, but for XML. So, with the //text() selector, we grab an array of all the text nodes in the HTML, “#foo “, “#baz” and ” text”. This fixes one of our big problems, detecting if something is inside of a HTML tag – the DOM library does all of the heavy lifting for us.

Line 10: Now we need to check each text node, to see if it contains a tag.

Line 11-17: But before we do that, we need to make sure we’re not inside a tag we don’t care about. In this example, we assume that anything inside a <pre>, <code> or <a> tag isn’t a post tag, so we can safely ignore it. This loop walks up through the text node’s parents, to make sure it’s not inside one of these tags. This eliminates the “#baz” text node, which is inside an <a> tag.

Line 19-22: Finally, we check the text node for tags, finding the “#foo” tag.


The code is significantly longer than a regular expression, but it has a couple of clear advantages:

  • The function operates exactly as you expect, it only finds tags where you want it to.
  • The regular expression to find tags remains simple, it doesn’t have to care about the hundreds of edge cases you might encounter.

So there you have it. DOM parsing in PHP isn’t a land of monsters, it’s actually pretty easy to wrap your head around, and write code that does exactly what you want it to do.

For an amusing postscript: While writing this post, I ran into a problem with a HTML minification plugin removing the blank lines in the code blocks, because it was just blindly removing all blank lines. By using a DOM parser, instead, it would’ve been able to remove blank lines from everywhere except inside <pre> or <code> tags.

UPDATE (2013-12-19): Fixed a few bugs in the sample code. Props mdawaffe.

Standard
WordPress

How To Backup Before Automatic Updates

When I originally wrote Automatic Updater, I had an action trigger just before the update, so site owners could easily take a backup snapshot, in case of a catastrophic failure. Now that we have Automatic Updates in WordPress Core, however, there’s no such action, and with good reason.

Backups Are Slow

We’re not talking 10-20 seconds slow. We’re talking minutes, or even hours, if the backup includes everything you’ve every uploaded. Some hosts kill processes that take too long, so WordPress might never get to update. Alternatively, it’s possible the backup could still be running when WordPress tries to run the update again, which could potentially cause conflicts.

So with this in mind, how should you do backups before an automatic update runs?

Backup Incrementally

If your backup software doesn’t know how to make an incremental backup, you need to get rid of it, and buy better backup software. I’m naturally biased towards VaultPress, which my employer makes, but there are plenty of good options available. Incremental backups are faster, so it’s easier for you to take backups on a more regular basis – even multiple times a day!

Schedule Your Backups

Instead of running your backup exactly when the update runs, you can schedule a cron job to run 5 minutes before (adjust as needed for the time your backups take). It’s a little tricky to determine if an update is going to run, so here’s a gist you’re welcome to use for your own purposes.

Have fun, stay safe, and remember to test your backups!

Standard
Automatic Updater, WordPress, WordPress Plugins

Automatic Updates

There are few people more excited than I about the recent WordPress 3.7 release – it’s amazing to see Automatic Updates land in WordPress Core, thanks to the hard work of Dion, Nacin, and the excellent testing and input of thousands of developers, we’ve shipped a great feature. With 3.7.1 being automatically rolled out as I type this, it’s truly amazing to see it all come to life on a grand scale.

So, with that now live on millions of sites, what’s next for my old Automatic Updater plugin? Well, it still has some life in it yet. I’ve just released a version 1.0, which strips out all of the old update code in favour of the shiny new Core code, as well as adding a few new features. To match its new and evolved role, I’ve renamed it to Advanced Automatic Updates – it lets you into all of the advanced options that the Core Automatic Updates feature provides.

So, whether you’re a long time Automatic Updater user who wants to continue having a UI for setting up your preferred update options, or you’re a new user who wants to tweak the “under the hood” options of WordPress’ Automatic Updates, you should go ahead and download Advanced Automatic Updates now!

Advanced Automatic Updates can also be found on GitHub – pull requests accepted!

Standard