WordPress

The Next Adventure

Over my past few years at Automattic, I’ve worked on a bunch of different teams and projects – VideoPress, the WordPress iOS app, various Social projects, and most recently, o2. I even took a few months to work on WordPress core, helping build the auto-update functionality that we now see rolling out security updates within hours of their release.

The few months I spent working on WordPress core made me realise something – there’s a lot more I have to contribute. So, with the WordPress 4.0 RC out the door, I’m super excited to be moving to my next project – working on WordPress core full time!

Automattic naturally puts a lot of people-hours into WordPress, with over 30 of us contributing to WordPress 3.9. I’m looking forward to being a bigger part of that, and giving more back to WordPress community!

Standard
Misc

My Media Server

Over the years, I’ve experimented with a bunch of different methods for media servers, and I think I’ve finally arrived at something that works well for me. Here are the details:

The Server

An old Dell Zino HD I had lying around, running Windows 7. Pretty much any server will be sufficient, this is just the one I had available. Dell doesn’t sell micro-PCs anymore, so just choose your favourite brand that sells something small and with low power requirements. The main things you need from it are a reasonable processor (fast enough to handle transcoding a few video streams in at least realtime), and lots of HD space. I don’t bother with RAID, because I won’t be sad about losing videos that I can easily re-download (the internet is my backup service).

Downloading

I make no excuses, nor apologies for downloading movies and TV shows in a manner that some may describe as involving “copyright violation”.

If you’re in a similar position, there are plenty of BitTorrent sites that allow you register and add videos to a personal RSS feed. Most BitTorrent clients can then subscribe to that feed, and automatically download anything added to it. Some sites even allow you to subscribe to collections, so you can subscribe to a TV show at the start of the season, and automatically get new episodes as soon as they arrive.

For your BitTorrent client, there are two features you need: the ability to subscribe to an RSS feed, and the ability to automatically run a command when the download finishes. I’ve found qBittorrent to be a good option for this.

Sorting

Once a file is downloaded, you need to sort them. By using a standard file layout, you have a much easier time of loading them into your media server later. For automatically sorting your files when they download, nothing compares to the amazing FileBot, which will automatically grab info about the download from TheMovieDB or TheTVDB, and pass it onto your media server. It’s entirely scriptable, but you don’t need to worry about that, because there’s already a great script to do all this for you, called Advanced Media Server (AMC). The initial setup for this was a bit annoying, so here’s the command I use (you can tweak the file locations for your own server, and you’ll need to fix the %n if you use something other than qBittorent):

"C:/Program Files/FileBot/filebot" -script fn:amc --output "C:/Media/Other" --log-file C:/Media/amc.log --action hardlink --conflict override -non-strict --def "seriesFormat=C:/Media/TV/{n}/{'S'+s}/{fn}" "movieFormat=C:/Media/Movies/{n} {y}/{fn}" excludeList=C:/Media/amc-input.txt plex=localhost "ut_dir=C:/Media/Downloads/%n" "ut_kind=multi" "ut_title=%n"

Media Server

Plex is the answer to this question. It looks great, it’ll automatically download extra information about your media, and it has really nice mobile apps for remote control. Extra features include offline syncing to your mobile device, so you can take your media when you’re flying, and Chromecast support so you can watch everything on your TV.

The Filebot command above will automatically tell Plex that a new file has arrived, which is great for if you choose to have your media stored on a NAS (Plex may not be able to automatically watch a directory on a NAS for when new files are added).

Backup

Having a local server is great for keeping a local backup of things that do matter – your photos and documents, for example. I use CrashPlan to sync my most important things to my server, so I have a copy immediately available if my laptop dies. I also use CrashPlan’s remote backup service to keep an offsite backup of everything.

Conclusion

While I’ve enjoyed figuring out how to get this all working smoothly, I’d love to be able to pay a monthly fee for an Rdio or Spotify style service, where I get the latest movies and TV shows as soon as they’re available. If you’re wondering what your next startup should be, get onto that.

Standard
MySQL, WordPress

WordPress and UTF-8

For many years, MySQL had only supported a small part of UTF-8, a section commonly referred to as plane 0, the “Basic Multilingual Plane”, or the BMP. The UTF-8 spec is divided into “planes“, and plane 0 contains the most commonly used characters. For a long time, this was reasonably sufficient for MySQL’s purposes, and WordPress made do with this limitation.

It has always been possible to store all UTF-8 characters in the latin1 character set, though latin1 has shortcomings. While it recognises the connection between upper and lower case characters in Latin alphabets (such as English, French and German), it doesn’t recognise the same connection for other alphabets. For example, it doesn’t know that ‘Ω’ and ‘ω’ are the upper and lower-case versions of the Greek letter omega. This creates problems for searching text, when you generally want to match characters regardless of their case.

5557649

With the release of MySQL 5.5, however, the utf8mb4 character set was added, and a whole new world opened up. Plane 1 contains many practical characters from historic scripts, music notation and mathematical symbols. It also contains fun characters, such an Emoji and various game symbols. Plane 2 is dedicated to CJK Ideographs, an attempt to create a common library of Chinese, Japanese and Korean characters.

For many websites, being able to use Emoji without installing an extra plugin is an excellent reason to switch your WordPress database to utf8mb4, but unfortunately it’s not quite that simple. MySQL still has a few more limitations that cause problems with  utf8mb4.

Without further ado, here’s how to configure MySQL, so that WordPress can use utf8mb4. If you don’t have the ability to configure your MySQL server directly, you should speak to your host. If they don’t want to, it’s probably time to look for a new host.

Upgrade MySQL

You need to be running MySQL 5.5.14, or higher. If you’re not already running at least MySQL 5.5 (ideally 5.6), you should be doing that anyway, as they provides significant performance and stability improvements over previous versions. For help with upgrading MySQL, check out the MySQL manual.

Configure MySQL

Before we convert your tables, we need to configure MySQL correctly.

In your my.cnf file, add the following settings to the [mysqld] section. Remember to double check that you’re not duplicating settings in your my.cnf file – if any of these options are already set to something different, you’ll need to change that setting, rather than add a new setting.

default-engine=InnoDB

innodb-file-format=barracuda
innodb-file-per-table=true
innodb-large-prefix=true

collation-server=utf8mb4_unicode_ci
character-set-server=utf8mb4

You’ll need to restart your MySQL server after adding these settings.

Use InnoDB

Next, convert your WordPress tables to InnoDB and utf8mb4:

ALTER TABLE wp_posts ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

You’ll need to run these two queries for each WordPress table, I used wp_posts as an example. This is a bit tedious, but the good news is that you’ll only ever need to run them once. A word of warning, you should be prepared for some downtime if you have particularly large tables.

Configure WordPress

Finally, you can tell WordPress to use utf8mb4 by changing your DB_CHARSET setting in your wp-config.php file.

define( 'DB_CHARSET', 'utf8mb4' );

And there you have it. I know, it’s not pretty. I’d really like to add this to WordPress core so you don’t need to go through the hassle, but currently only a very small percentage of WordPress sites are using MySQL 5.5+ and InnoDB – in order to justify it, we need to see lots of sites upgrading! You can head on over to the core ticket for further reading, too – login and click the star at the top to show your support, there’s no need to post “+1″ comments. :-)

Oh, and a final note on Emoji – Chrome support is pretty broken. There’s an extension to add Emoji to Chrome, but it interferes with WordPress’ post editor. If you really want to use Emoji in your posts, Safari or Firefox would be better options.

Standard
Misc

PayPal is Still Bad at Account Security

A couple of months ago, following the news of PayPal being partially responsible for a person’s identity theft, I activated Two Factor Authentication on my PayPal account. First up, I was fairly unimpressed with their configuration options. In order to use 2FA, my options were to buy a dongle to generate the security codes, or have the codes SMSed to me. Neither of these are particularly good – I don’t want to have to pay for and carry around a dongle everywhere, and SMS isn’t a secure protocol, as SIM cards can be cloned or hacked. If someone really wanted to get into my account, then this wouldn’t present much of a barrier.

Then, there’s the login process. For some reason, PayPal doesn’t automatically send me an SMS, I need to click an extra button for that while logging in. This isn’t so much a security problem as a weird UX. Also, the Android app doesn’t support 2FA, so I can’t use that at all.

The real fun started last night, however. I tried to login to my PayPal account, and was prompted to enter my security code. No problem, I clicked the Send SMS button, and waited. And waited. I clicked it again. Waited. Tried to login again, and repeated the process a few times. No luck.

Okay, so their SMS service was having issues. Apart from the security issues with SMS, it’s also a notoriously unreliable protocol, regularly causing problems exactly like this. While I was pondering this, I noticed there was an option to bypass the 2FA. I clicked the button, and was prompted to answer my two security questions: my favourite author, and my favourite movie. Unfortunately, I’d set these questions 10 years ago when I first created my PayPal account, and never thought about them since. It turns out that 22 year old me had very different taste in film and literature than 32 year old me, and I had no idea what the answers were. Defeated, I went to bed.

This morning, I decided to try again, with the same result. This time, I called their customer support centre, to see if they could at least give me an update on when SMSes would be working again. Unfortunately, it seemed the customer support representative wasn’t familiar with how PayPal’s 2FA worked, so after a bit of back-and-forth explaining the situation, the CSR said they’d “reset my account” (I don’t know what this means), and it should be working again in 15 minutes.

Half an hour later, still not working, so I call back. Fortunately, this CSR was aware of the SMS issues they were having, and was able to fill me in. Unfortunately, it seems PayPal hadn’t really thought about the implications of their policy for this situation, as he immediately offered to disable 2FA on my account for me.

I’ll just let that sink in for a moment. At this point, I’d only loosely identified myself – I had an identity code from the PayPal support site, that I was able to get with just my username and password. The support systems probably showed my current phone number as matching my 2FA phone number, but they shouldn’t be relying on that at all – the source phone number can be easily spoofed, Skype even offers this as a service.

Sadly, it’s clearly evident that PayPal’s 2FA is broken in a bunch of different ways. You can still keep your account secure by choosing a strong password, and making sure you only login to your PayPal account on devices your trust.

Even if PayPal are in no hurry to mend their ways, here are some things for developers to make sure their own 2FA system is secure:

  • Don’t offer SMS as the only option. SMS-based 2FA is okay for guarding against mass account hijacking, but cannot prevent a targeted attack. As we’ve seen, it’s also wildly unreliable.
  • You should be using a standard method for generating your 2FA codes, such as RFC 6238, which is used by a bunch of different websites, like Google and WordPress.com.
  • Make your 2FA system as easy to use as possible – your users should want to use it, because it doesn’t get in their way, but makes their account safe.
  • Teach your support reps the 2FA mantra: “Something you have, and something you know”. In the case of PayPal, they’d already confirmed something I know (my password), so they could’ve easily confirmed something I have, like my ID or my credit card.
  • If you’re going to use security questions, prompt your users to re-enter them occasionally, so they don’t forget.
Standard
WordPress

Don’t Let Your Plugin Be Activated on Incompatible Sites

When you write a WordPress plugin, you can specify a minimum WordPress version that your code is compatible with, using the “Requires” option in your plugin header, but it isn’t enforced. Along with that, there’s no way to specify a minimum version of PHP or MySQL.

This can cause your users to have bad experiences with your plugin, and you’ll get bad ratings.

So, here’s how I make sure my plugins are only activated when I want them to be.

//  In this example, only allow activation on WordPress 3.7 or higher
class MyPlugin {
    function __construct() {
        add_action( 'admin_init', array( $this, 'check_version' ) );

        // Don't run anything else in the plugin, if we're on an incompatible WordPress version
        if ( ! self::compatible_version() ) {
            return;
        }
    }

    // The primary sanity check, automatically disable the plugin on activation if it doesn't
    // meet minimum requirements.
    static function activation_check() {
        if ( ! self::compatible_version() ) {
            deactivate_plugins( plugin_basename( __FILE__ ) );
            wp_die( __( 'My Plugin requires WordPress 3.7 or higher!', 'my-plugin' ) );
        }
    }

    // The backup sanity check, in case the plugin is activated in a weird way,
    // or the versions change after activation.
    function check_version() {
        if ( ! self::compatible_version() ) {
            if ( is_plugin_active( plugin_basename( __FILE__ ) ) ) {
                deactivate_plugins( plugin_basename( __FILE__ ) );
                add_action( 'admin_notices', array( $this, 'disabled_notice' ) );
                if ( isset( $_GET['activate'] ) ) {
                    unset( $_GET['activate'] );
                }
            }
        }
    }

    function disabled_notice() {
       echo '<strong>' . esc_html__( 'My Plugin requires WordPress 3.7 or higher!', 'my-plugin' ) . '</strong>';
    } 

    static function compatible_version() {
        if ( version_compare( $GLOBALS['wp_version'], '3.7', '<' ) ) {
             return false;
         }

        // Add sanity checks for other version requirements here

        return true;
    }
}

global $myplugin;
$myplugin = new MyPlugin();

register_activation_hook( __FILE__, array( 'MyPlugin', 'activation_check' ) );

It’s only a little extra code when added to a plugin, provides complete protection, and won’t cause any weirdness on the front end of your site, it’ll just deactivate itself either on activation, or when someone visits wp-admin.

Standard
PonyEdit

PonyEdit: A New Beginning

PonyEdit has always been a labour of love for Mark and I. We wanted to build the code editor that we wanted to use for our jobs – fast, clean and as at home in the cloud as it is on the desktop. We absolutely feel we’ve succeeded here – PonyEdit the primary editor for both of us, as well as for a bunch of friends. Sadly, however, PonyEdit was never a commercial success. We made a handful of sales over the years, but certainly not enough to pay for server costs, nevermind enough to pay ourselves anything!

When we first started working on PonyEdit, we decided that we’d try to make a commercial success of it, but if that didn’t work out, we’d Open Source it. And so, that day has come. Effective immediately, PonyEdit is now up on GitHub, released under the GPLv3 licence.

So, what does that mean for the future?

For us? Not much. We’ll still keep doing the same thing with PonyEdit – bug fixes as it needs them, features as we decide we want them.

For you? Pretty much the same. We’ve already transferred Automatic Updates to pull from GitHub, so you won’t see any difference. If you feel the sudden urge to dive into the code, Mark and I are always happy to have a chat with you.

Do you mean it?

You betcha. We’re PonyEdit’s most active users, so we’re going to keep on maintaining it. Sadly, though, it does mean that regular Windows and Linux builds have fallen by the wayside. If you’re a Windows or Linux user with an urge to build your own editor, we’d love to hear from you!

So, that’s about all there is to say. Be excellent to each other!

Standard
WordPress

Don’t Do Regular Expressions, Use The DOM

I’m as guilty of this as anyone – I have a lump of HTML that I need to extract information from. So, I write a quick regular expression, knowing full well that they’re not appropriate for the job. But I do it anyway.

This time, I decided to try doing things a better way.

Here’s the problem I’m trying to solve. In o2, (here’s a feature preview for you!) we’re experimenting with the idea of having post tags inline with the post content, instead of as a separate text field, like in P2. So, when a user saves a post with “#foo” in it, this needs to be extracted and saved as a tag “foo”.

With a regular expression, extraction seems pretty easy at first:

$tags = array();
preg_match_all( '/#[\w-]+/', $content, $tags );

That works on a simple text string, but things start to get complicated pretty quickly. What happens when you enter a URL, like http://pento.net/#foo? Or even worse, enter the URL in a tag like <a href="http://pento.net/#foo">...</a>? In both of these cases, “#foo” clearly isn’t meant to be a tag, so your regular expression quickly becomes a mess. Eventually, it gets to the point where you can’t even guarantee it’ll work under all cases.

Enter DOM parsing.

We’re all pretty familiar with dealing with the DOM, thanks to JavaScript, but it remains a less popular choice on the server side. PHP has various built in libraries to help, and there are plenty of wrappers for the PHP libs, as well as independent implementations, some of which are listed here. There are pros and cons to each option, so far nothing has appeared with the ubiquity of jQuery.

For this exercise, we’ll use PHP’s native DOM extension.

To begin with, let’s create a function to extract the tags from a new post, and save them.

function process_tags( $new, $old, $post ) {
	if ( 'publish' !== $new )
		return;

	$tags = find_tags( $post->post_content );

	wp_set_post_tags( $post->ID, $tags, false );
}
add_action( 'transition_post_status', 'process_tags', 12, 3 );

So far, this is all pretty straight forward. Our find_tags() function is where all the magic happens.

static function find_tags( $content ) {
	$tags = array();

	$dom = new DOMDocument;
	$dom->loadHTML( '<?xml encoding="UTF-8">' . $content );

	$xpath = new DOMXPath( $dom );
	$textNodes = $xpath->query( '//text()' );

	foreach ( $textNodes as $textNode ) {
		$parent = $textNode;
		while ( $parent ) {
			if ( ! empty( $parent->tagName ) && in_array( strtolower( $parent->tagName ), array( 'pre', 'code', 'a' ) ) ) {
				continue 2;
			}
			$parent = $parent->parentNode;
		}

		$matches = array();
		if ( preg_match_all( '/(?:^|\s)#([\w-]+)\b/', $textNode->nodeValue, $matches ) ) {
			$tags = array_merge( $tags, $matches[1] );
		}
	}

	return $tags;
}

The easiest way to explain how this works is to walk through it, so let’s do that now. We’ll feed find_tags() some basic HTML:

<p>#foo <a href="http://pento.net/?a=b&amp;c=d#bar">#baz</a> text</p>

Line 5: We load our HTML into the DOM. The <?xml encoding="UTF-8"> is to force DOMDocument to treat our HTML as being encoded as UTF-8 – by default it assumes ISO-8859-1 (latin1).

Line 7-8: DOMDocument supports XPath selectors, which saves us so much hassle. If you’re not familiar with XPath, it’s kind of like jQuery selectors, but for XML. So, with the //text() selector, we grab an array of all the text nodes in the HTML, “#foo “, “#baz” and ” text”. This fixes one of our big problems, detecting if something is inside of a HTML tag – the DOM library does all of the heavy lifting for us.

Line 10: Now we need to check each text node, to see if it contains a tag.

Line 11-17: But before we do that, we need to make sure we’re not inside a tag we don’t care about. In this example, we assume that anything inside a <pre>, <code> or <a> tag isn’t a post tag, so we can safely ignore it. This loop walks up through the text node’s parents, to make sure it’s not inside one of these tags. This eliminates the “#baz” text node, which is inside an <a> tag.

Line 19-22: Finally, we check the text node for tags, finding the “#foo” tag.


The code is significantly longer than a regular expression, but it has a couple of clear advantages:

  • The function operates exactly as you expect, it only finds tags where you want it to.
  • The regular expression to find tags remains simple, it doesn’t have to care about the hundreds of edge cases you might encounter.

So there you have it. DOM parsing in PHP isn’t a land of monsters, it’s actually pretty easy to wrap your head around, and write code that does exactly what you want it to do.

For an amusing postscript: While writing this post, I ran into a problem with a HTML minification plugin removing the blank lines in the code blocks, because it was just blindly removing all blank lines. By using a DOM parser, instead, it would’ve been able to remove blank lines from everywhere except inside <pre> or <code> tags.

UPDATE (2013-12-19): Fixed a few bugs in the sample code. Props mdawaffe.

Standard