PonyEdit: A New Beginning

PonyEdit has always been a labour of love for Mark and me. We wanted to build the code editor that we wanted to use for our jobs – fast, clean and as at home in the cloud as it is on the desktop. We absolutely feel we’ve succeeded here – PonyEdit is the primary editor for both of us, as well as for a bunch of friends. Sadly, however, PonyEdit was never a commercial success. We made a handful of sales over the years, but certainly not enough to pay for server costs, never mind enough to pay ourselves anything!

When we first started working on PonyEdit, we decided that we’d try to make a commercial success of it, but if that didn’t work out, we’d Open Source it. And so, that day has come. Effective immediately, PonyEdit is now up on GitHub, released under the GPLv3 licence.

So, what does that mean for the future?

For us? Not much. We’ll still keep doing the same thing with PonyEdit – bug fixes as it needs them, features as we decide we want them.

For you? Pretty much the same. We’ve already transferred Automatic Updates to pull from GitHub, so you won’t see any difference. If you feel the sudden urge to dive into the code, Mark and I are always happy to have a chat with you.

Do you mean it?

You betcha. We’re PonyEdit’s most active users, so we’re going to keep on maintaining it. Sadly, though, it does mean that regular Windows and Linux builds have fallen by the wayside. If you’re a Windows or Linux user with an urge to build your own editor, we’d love to hear from you!

So, that’s about all there is to say. Be excellent to each other!

Don’t Do Regular Expressions, Use The DOM

I’m as guilty of this as anyone – I have a lump of HTML that I need to extract information from. So, I write a quick regular expression, knowing full well that they’re not appropriate for the job. But I do it anyway.

This time, I decided to try doing things a better way.

Here’s the problem I’m trying to solve. In o2 (here’s a feature preview for you!), we’re experimenting with the idea of having post tags inline with the post content, instead of as a separate text field, like in P2. So, when a user saves a post with “#foo” in it, this needs to be extracted and saved as a tag “foo”.

With a regular expression, extraction seems pretty easy at first:

$tags = array();
preg_match_all( '/#[\w-]+/', $content, $tags );

That works on a simple text string, but things start to get complicated pretty quickly. What happens when you enter a URL, like http://pento.net/#foo? Or even worse, enter the URL in a tag like <a href="http://pento.net/#foo">...</a>? In both of these cases, “#foo” clearly isn’t meant to be a tag, so your regular expression quickly becomes a mess. Eventually, it gets to the point where you can’t even guarantee it’ll work in all cases.
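To make the problem concrete, here’s a minimal sketch of that naive pattern run against a string containing a real tag, a bare URL and a link. The function name naive_find_tags() and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper wrapping the naive pattern: match every "#word" anywhere.
function naive_find_tags( $content ) {
    $matches = array();
    preg_match_all( '/#[\w-]+/', $content, $matches );
    return $matches[0];
}

// One intended tag (#foo), plus two URL fragments that are not tags.
$content = 'Real tag #foo, a URL http://pento.net/#bar and '
    . '<a href="http://pento.net/#baz">a link</a>.';

// Prints all three matches: #foo, #bar and #baz.
print_r( naive_find_tags( $content ) );
```

Only “#foo” was meant as a tag, but the URL fragments come back too, and the pattern has no way of knowing the difference.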

Enter DOM parsing.

We’re all pretty familiar with dealing with the DOM, thanks to JavaScript, but it remains a less popular choice on the server side. PHP has various built-in libraries to help, and there are plenty of wrappers for the PHP libs, as well as independent implementations, some of which are listed here. There are pros and cons to each option, and so far nothing has appeared with the ubiquity of jQuery.

For this exercise, we’ll use PHP’s native DOM extension.

To begin with, let’s create a function to extract the tags from a new post, and save them.

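function process_tags( $new, $old, $post ) {
    // Only extract tags when the post is being published.
    if ( 'publish' !== $new ) {
        return;
    }

    $tags = find_tags( $post->post_content );

    // false: replace the post's existing tags rather than appending to them.
    wp_set_post_tags( $post->ID, $tags, false );
}
add_action( 'transition_post_status', 'process_tags', 12, 3 );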


So far, this is all pretty straightforward. Our find_tags() function is where all the magic happens.

function find_tags( $content ) {
    $tags = array();

    $dom = new DOMDocument;
    $dom->loadHTML( '<?xml encoding="UTF-8">' . $content );

    $xpath = new DOMXPath( $dom );
    $textNodes = $xpath->query( '//text()' );

    foreach ( $textNodes as $textNode ) {
        $parent = $textNode;
        while ( $parent ) {
            if ( ! empty( $parent->tagName ) && in_array( strtolower( $parent->tagName ), array( 'pre', 'code', 'a' ) ) ) {
                continue 2;
            }
            $parent = $parent->parentNode;
        }

        $matches = array();
        if ( preg_match_all( '/(?:^|\s)#([\w-]+)\b/', $textNode->nodeValue, $matches ) ) {
            $tags = array_merge( $tags, $matches[1] );
        }
    }

    return $tags;
}

The easiest way to explain how this works is to walk through it, so let’s do that now. We’ll feed find_tags() some basic HTML:

<p>#foo <a href="http://pento.net/?a=b&amp;c=d#bar">#baz</a> text</p>

Line 5: We load our HTML into the DOM. The <?xml encoding="UTF-8"> is to force DOMDocument to treat our HTML as being encoded as UTF-8 – by default it assumes ISO-8859-1 (latin1).

Line 7-8: DOMDocument supports XPath selectors, which saves us so much hassle. If you’re not familiar with XPath, it’s kind of like jQuery selectors, but for XML. So, with the //text() selector, we grab a list of all the text nodes in the HTML: “#foo ”, “#baz” and “ text”. This fixes one of our big problems, detecting if something is inside an HTML tag – the DOM library does all of the heavy lifting for us.

Line 10: Now we need to check each text node, to see if it contains a tag.

Line 11-17: But before we do that, we need to make sure we’re not inside a tag we don’t care about. In this example, we assume that anything inside a <pre>, <code> or <a> tag isn’t a post tag, so we can safely ignore it. This loop walks up through the text node’s parents, to make sure it’s not inside one of these tags. This eliminates the “#baz” text node, which is inside an <a> tag.

Line 19-22: Finally, we check the text node for tags, finding the “#foo” tag.
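As an aside, the parent-walking loop could probably be folded into the XPath query itself, using the ancestor axis. This is just a sketch of that idea, not the code o2 uses. The name find_tags_xpath() is hypothetical, and the libxml_use_internal_errors() calls are my own addition to keep parser warnings quiet.

```php
<?php
// Hypothetical variant of find_tags(): let XPath exclude unwanted ancestors.
function find_tags_xpath( $content ) {
    $tags = array();

    // Collect libxml parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors( true );

    $dom = new DOMDocument;
    $dom->loadHTML( '<?xml encoding="UTF-8">' . $content );

    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    // Select only text nodes that have no <pre>, <code> or <a> ancestor.
    $xpath = new DOMXPath( $dom );
    $textNodes = $xpath->query(
        '//text()[not(ancestor::pre or ancestor::code or ancestor::a)]'
    );

    foreach ( $textNodes as $textNode ) {
        $matches = array();
        if ( preg_match_all( '/(?:^|\s)#([\w-]+)\b/', $textNode->nodeValue, $matches ) ) {
            $tags = array_merge( $tags, $matches[1] );
        }
    }

    return $tags;
}

// The sample HTML from the walk-through above.
$html = '<p>#foo <a href="http://pento.net/?a=b&amp;c=d#bar">#baz</a> text</p>';
print_r( find_tags_xpath( $html ) ); // Only "foo" is found.
```

The explicit loop is arguably easier to read if you don’t use XPath every day, so treat this as an alternative rather than an improvement.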

The code is significantly longer than a regular expression, but it has a couple of clear advantages:

  • The function operates exactly as you expect: it only finds tags where you want it to.
  • The regular expression to find tags remains simple, because it doesn’t have to care about the hundreds of edge cases you might encounter.

So there you have it. DOM parsing in PHP isn’t a land of monsters. It’s actually pretty easy to wrap your head around, and to write code that does exactly what you want it to do.

For an amusing postscript: While writing this post, I ran into a problem with an HTML minification plugin removing the blank lines in the code blocks, because it was just blindly removing all blank lines. By using a DOM parser instead, it would’ve been able to remove blank lines from everywhere except inside <pre> or <code> tags.

UPDATE (2013-12-19): Fixed a few bugs in the sample code. Props mdawaffe.

How To Back Up Before Automatic Updates

When I originally wrote Automatic Updater, I had an action trigger just before the update, so site owners could easily take a backup snapshot, in case of a catastrophic failure. Now that we have Automatic Updates in WordPress Core, however, there’s no such action, and with good reason.

Backups Are Slow

We’re not talking 10-20 seconds slow. We’re talking minutes, or even hours, if the backup includes everything you’ve ever uploaded. Some hosts kill processes that take too long, so WordPress might never get to update. Alternatively, it’s possible the backup could still be running when WordPress tries to run the update again, which could potentially cause conflicts.

So with this in mind, how should you do backups before an automatic update runs?

Back Up Incrementally

If your backup software doesn’t know how to make an incremental backup, you need to get rid of it, and buy better backup software. I’m naturally biased towards VaultPress, which my employer makes, but there are plenty of good options available. Incremental backups are faster, so it’s easier for you to take backups on a more regular basis – even multiple times a day!

Schedule Your Backups

Instead of running your backup exactly when the update runs, you can schedule a cron job to run 5 minutes before (adjust as needed for the time your backups take). It’s a little tricky to determine if an update is going to run, so here’s a gist you’re welcome to use for your own purposes.
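If you’d rather see the rough shape of the idea inline, here’s a minimal sketch (this is not the gist’s code). It assumes the automatic update runs alongside WordPress’ scheduled wp_version_check cron event. The helper my_backup_start_time() and the hook name my_pre_update_backup are hypothetical, and you’d still need to attach your backup routine to that hook.

```php
<?php
// Hypothetical helper: work out when the backup should start, given the
// timestamp of the next update check and a lead time in minutes.
function my_backup_start_time( $next_check, $lead_minutes = 5, $now = null ) {
    $now = ( null === $now ) ? time() : $now;

    // Never return a time in the past.
    return max( $now, $next_check - $lead_minutes * 60 );
}

// The scheduling part only runs inside WordPress.
if ( function_exists( 'wp_next_scheduled' ) ) {
    // Assumption: the update is triggered by the 'wp_version_check' cron event.
    $next_check = wp_next_scheduled( 'wp_version_check' );

    // Schedule one backup event, unless one is already queued.
    if ( $next_check && ! wp_next_scheduled( 'my_pre_update_backup' ) ) {
        wp_schedule_single_event(
            my_backup_start_time( $next_check ),
            'my_pre_update_backup'
        );
    }
}
```

Adjust the lead time to match how long your backups actually take.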

Have fun, stay safe, and remember to test your backups!

Automatic Updates

There are few people more excited than I about the recent WordPress 3.7 release. It’s amazing to see Automatic Updates land in WordPress Core: thanks to the hard work of Dion and Nacin, and the excellent testing and input of thousands of developers, we’ve shipped a great feature. With 3.7.1 being automatically rolled out as I type this, it’s truly amazing to see it all come to life on a grand scale.

So, with that now live on millions of sites, what’s next for my old Automatic Updater plugin? Well, it still has some life in it yet. I’ve just released version 1.0, which strips out all of the old update code in favour of the shiny new Core code, as well as adding a few new features. To match its new and evolved role, I’ve renamed it to Advanced Automatic Updates – it gives you access to all of the advanced options that the Core Automatic Updates feature provides.

So, whether you’re a long time Automatic Updater user who wants to continue having a UI for setting up your preferred update options, or you’re a new user who wants to tweak the “under the hood” options of WordPress’ Automatic Updates, you should go ahead and download Advanced Automatic Updates now!
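If you’re curious what those “under the hood” options look like without a UI, here’s a rough sketch using a few of the filters that Core exposes. This isn’t the plugin’s code, just an illustration, and it assumes it’s loaded inside WordPress 3.7 or later (for example from a small must-use plugin).

```php
<?php
// Only register the filters when WordPress is actually loaded.
if ( function_exists( 'add_filter' ) ) {
    // Apply major core releases automatically, not only minor ones.
    add_filter( 'allow_major_auto_core_updates', '__return_true' );

    // Let plugins and themes update automatically as well.
    add_filter( 'auto_update_plugin', '__return_true' );
    add_filter( 'auto_update_theme', '__return_true' );
}
```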

Advanced Automatic Updates can also be found on GitHub – pull requests accepted!

A WordPress Adventure

I like to think of working at Automattic as a Choose Your Own Adventure career. Over the past couple of years, I’ve worked on a wide range of projects, from VideoPress, to the WordPress iOS App, through to Jetpack Likes, Two Step Authentication and most recently o2, the upcoming successor to P2. If I (or any of my colleagues) feel it’s time to mix things up, it’s as simple as deciding to work on something different. So, it’s hardly a surprise that I’m moving to a new project, except this time we’re trying an experiment.

It’s no secret that WordPress.com is the largest WordPress install in the world – behind all the custom plugins and themes is a single copy of WordPress. WordPress is the core of all of our day-to-day work, and we contribute on a regular basis – over 40 Automatticians contributed to the latest WordPress 3.6 release alone! But, when your day-to-day work involves working on things other than the WordPress Core project, it can be hard to allocate time to do some core work.

This is where the experiment comes in: for the entire WordPress 3.7 development and release cycle, I’m dropping all of my usual work, instead working full time on WordPress Core.

So, what will I be working on? Well, you may recall during Matt’s State of the Word talk, he mentioned that we’d be introducing Automatic Updates for minor WordPress releases. In what is clearly a massive coincidence, over the past year I’ve been experimenting with WordPress Automatic Updates, under the guise of my Automatic Updater plugin. (In other words, Dion and I will be copy/pasting some code we’ve already written, then taking the next few months off. :-D) If we can’t get away with that, however, there’s always plenty of work to be done on WordPress Core. We currently have around 3200 open tickets to get through, so “pick a ticket and fix it” contributions are a good thing to do!

As for the experimental part, this is the first instance of what we’d like to become an ongoing thing – every WordPress release cycle, a few Automatticians can drop their usual work, instead devoting time to contributing back to the WordPress Core project. The wider WordPress community has been one of the many factors contributing to Automattic’s success, so it’s only right that we return the love.

So, that’s about it. Time to get on with writing some WordPress core code.