WordPress Importers: Stating the Problem

It’s time to focus on the WordPress Importers.

I’m not talking about tidying them up, improving performance, or fixing some bugs, though these are certainly things that should happen. Instead, we need to consider their purpose, how they fit as a driver of WordPress’ commitment to Open Source, and how they can be a key element in helping to keep the Internet Open and Free.

The History

The WordPress Importers are arguably the key driver of WordPress’ early success. Before the importer plugins existed (before WordPress even supported plugins!) there were a handful of import-*.php scripts in the wp-admin directory that could be used to import blogs from other blogging platforms. When other platforms fell out of favour, WordPress already had an importer ready for people to move their site over. One of the most notable instances was in 2004, when Movable Type changed their license and prices, suddenly requiring personal blog authors to pay for something that had previously been free. WordPress was fortunate enough to be in the right place at the right time: many of WordPress’ earliest users came from Movable Type.

As time went on, WordPress became well known in its own right. Growth relied less on people wanting to switch from another provider, and more on people choosing to start their site with WordPress. For practical reasons, the importers were moved out of WordPress Core, and into their own plugins. Since then, they’ve largely been in maintenance mode: bugs are fixed when they come up, but since export formats rarely change, they’ve just continued to work for all these years.

An unfortunate side effect of this, however, is that new importers are rarely written. While a new breed of services has sprung up over the years, the WordPress importers haven’t kept up.

The New Services

There are many new CMS services that have cropped up in recent years, and we don’t have importers for any of them. WordPress.com has a few extra ones written, but they’ve been built on the WordPress.com infrastructure out of necessity.

You see, we’ve always assumed that other CMSes will provide some sort of export file that we can use to import into WordPress. That isn’t always the case, however. Some services (notably, Wix and GoDaddy Website Builder) deliberately don’t allow you to export your own content. Other services provide incomplete or fragmented exports, needlessly forcing stress upon site owners who want to use their own content outside of that service.

To work around this, WordPress.com has implemented importers that effectively scrape the site: while this has worked to some degree, it does require regular maintenance, and the importer has to do a lot of guessing about how the content should be transformed. This is clearly not a solution that would be maintainable as a plugin.

Problem Number 4

Some services work against their customers, and actively prevent site owners from controlling their own content.

This strikes at the heart of the WordPress Bill of Rights. WordPress is built with fundamental freedoms in mind: all of those freedoms point to owning your content, and being able to make use of it in any form you like. When a CMS actively works against providing such freedom to their community, I would argue that we have an obligation to help that community out.

A Variety of Content

It’s worth discussing how, when starting a modern CMS service, the bar for success is very high. You can’t get away with just providing a basic CMS: you need to provide all the options. Blogs, eCommerce, mailing lists, forums, themes, polls, statistics, contact forms, integrations, embeds, the list goes on. The closest comparison to modern CMS services is… the entire WordPress ecosystem: built on WordPress core, but with the myriad of plugins and themes available, along with the variety of services offered by a huge array of companies.

So, when we talk about the importers, we need to consider how they’ll be used.

Problem Number 3

To import from a modern CMS service into WordPress, your importer needs to map from service features to WordPress plugins.
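To make that concrete, here is a minimal sketch of what such a mapping step might look like. The feature names, the mapping table, and the plan_import helper are all hypothetical, chosen purely for illustration; a real importer would need a far richer description of each feature, and would likely need a human to confirm the choices.

```python
# Hypothetical mapping from features detected on a source service to the
# WordPress plugin (by slug) that could take over that feature.
# None means the feature is handled by WordPress core itself.
FEATURE_TO_PLUGIN = {
    "blog": None,
    "pages": None,
    "store": "woocommerce",
    "forum": "bbpress",
}


def plan_import(features):
    """Sort detected features into core, plugin-backed, and unmapped groups."""
    plan = {"core": [], "plugins": {}, "unmapped": []}
    for feature in features:
        if feature not in FEATURE_TO_PLUGIN:
            # No known destination: flag it for a human to decide.
            plan["unmapped"].append(feature)
        elif FEATURE_TO_PLUGIN[feature] is None:
            plan["core"].append(feature)
        else:
            plan["plugins"][feature] = FEATURE_TO_PLUGIN[feature]
    return plan


if __name__ == "__main__":
    print(plan_import(["blog", "store", "polls"]))
```

The "unmapped" group is the interesting part: anything the importer can’t place automatically needs to be surfaced to the site owner rather than silently dropped.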

Getting Our Own House In Order

Some of these problems don’t just apply to new services, however.

Out of the box, WordPress exports to WXR (WordPress eXtended RSS) files: an XML file that contains the content of the site. Back when WXR was first created, this was all you really needed, but much like the rest of the WordPress importers, it hasn’t kept up with the times. A modern WordPress site isn’t just the sum of its content: a WordPress site has plugins and themes. It has various options configured, it has huge quantities of media, it has masses of text content, far more than the first WordPress sites ever had.

Problem Number 2

WXR doesn’t contain a full export of a WordPress site.

In my view, WXR is a solid format for handling exports. An XML-based system is quite capable of containing all forms of content, so it’s reasonable that we could expand the WXR format to contain the entire site.
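As a rough sketch of what that expansion could look like, the example below writes site options and active plugins into a WXR-style document under an extra XML namespace, then reads them back. The "site" namespace, its URL, and the option and plugin elements are assumptions made up for this illustration: they are not part of the WXR format today, and the document it produces is a bare skeleton rather than a complete export.

```python
import xml.etree.ElementTree as ET

# Namespace used by WXR 1.2 exports.
WP_NS = "http://wordpress.org/export/1.2/"
# Hypothetical extension namespace for site-level data (not part of WXR).
SITE_NS = "https://example.org/wxr-site/0.1/"

ET.register_namespace("wp", WP_NS)
ET.register_namespace("site", SITE_NS)


def build_export(title, options, plugins):
    """Build a skeleton WXR-style document carrying extra site data."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, f"{{{WP_NS}}}wxr_version").text = "1.2"
    # Hypothetical elements: one per site option, one per active plugin.
    for name, value in options.items():
        option = ET.SubElement(channel, f"{{{SITE_NS}}}option", {"name": name})
        option.text = value
    for slug in plugins:
        ET.SubElement(channel, f"{{{SITE_NS}}}plugin", {"slug": slug})
    return ET.tostring(rss, encoding="unicode")


def read_site_data(xml_text):
    """Read the hypothetical site-level elements back out of the document."""
    channel = ET.fromstring(xml_text).find("channel")
    options = {
        el.get("name"): el.text
        for el in channel.findall(f"{{{SITE_NS}}}option")
    }
    plugins = [el.get("slug") for el in channel.findall(f"{{{SITE_NS}}}plugin")]
    return options, plugins


if __name__ == "__main__":
    doc = build_export("Example Site", {"blogdescription": "Just another site"}, ["woocommerce"])
    print(read_site_data(doc))
```

Keeping the new data in its own namespace means an importer that doesn’t recognise those elements can simply skip them, which matters if importers are going to keep working for years without much maintenance.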

Built for the Future

If there’s one thing we can learn from the history of the WordPress importers, it’s that maintenance will potentially be sporadic. Importers are unlikely to receive the same attention that the broader WordPress Core project does, and owners may come and go. An importer will get attention if it breaks, of course, but it otherwise may go months or years without changing.

Problem Number 1

We can’t depend on regular importer maintenance in the future.

It’s quite possible to build code that will be running in 10+ years: we see examples all across the WordPress ecosystem. Doing it in a reliable fashion needs to be a deliberate choice, however.

What’s Next?

Having worked our way down from the larger philosophical reasons for the importers to some of the more technically-oriented implementation problems, I’d like to work our way back out again, focussing on each problem individually. In the following posts, I’ll start laying out how I think we can bring our importers up to speed, prepare them for the future, and make them available for everyone.

This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

9 comments

  1. Yes please! One of my biggest frustrations is with the exporter NOT including media. Come on, WP! It’s 2021 already!

  2. There’s another important aspect of the WordPress Importer (specifically WP > WP) that shouldn’t be overlooked, and that’s exporting/importing WP blogs to new sites. Maybe it’s the Media Library that should be rebuilt here, but you can’t just copy over the /wp-content/uploads folder and have your Media Library remain intact. It seems like this should be easy, but I’ve been finding out it’s far from that.

    1. That’s a tricky problem, since a Media Library is made up of more than just the files: there’s a lot of metadata stored in the database. Some of that could be rebuilt from the uploads folder, but much of it would be lost. I’m inclined to think that any solution will need to include both the files, and the database content.

      That said, I agree with the broader point that media is poorly handled by the importer, and something that feels like it should be easy… isn’t. It’s certainly an area that needs attention.

  3. The biggest challenge is that modern blogs and websites are not a set of plain texts anymore. In the past, you had a website with a set of pages with several attributes that could be easily mapped to WordPress ones. It was a flat structure, and you could even use CSV files to export/import website content. Today, if you want to move a website to a different platform, you have to create a custom-built solution for every case separately. Why? Because modern CMSs have a set of complex, hierarchical attributes and links between them. It’s obvious that you will use JSON to transport them, but you need to develop an algorithm to map all attributes and objects in an article programmatically. It’s not a static mapping from a CMS schema to the WordPress schema anymore.

    1. I agree, mapping the complexities of modern CMSes over to WordPress is a huge challenge, but it’s far from insurmountable.

      There’s certainly room to handle much of it in an automated fashion, but as we’ve found in WordPress.com when implementing importers for Wix and GoDaddy, there’s a certain amount of human intervention required. Making that experience as smooth as possible is key, and would allow us to iterate on the automation over time.

  4. Excellent points and thoughts, Garry! Thank you for sharing! I have been working with team newspack.pub where we assist digital newsroom publications to adopt Newspack, which is a WP-based platform. This has been a part of my work for two years now, and we have experiences with both non-WP sites, and WP sites.

    I relate to the fact that different data formats (and data scopes) are a primary challenge here, but also closely tied to the accessibility of this raw data — hence creating a need for scrapers.

    But I particularly like you pointing out that modern CMSes contain services which manage their content (different types of sites), and therefore an importer (which might be called a converter in a broader sense, or a migrator perhaps) has a task and a challenge to map those services too, and successfully manage and integrate the data to its proper flow.

  5. Great to see this getting a priority item.

    If I need to migrate a WP site there are many problems in our way:
    Featured images are not migrated, Customizer settings are not migrated, Widgets are not migrated, the importer has problems with big media libraries (time outs) and there is no easy way to migrate site to or from a multisite … to just name a few big problems with the status quo of the importer.

  6. Using the exporter and importer duplicates posts and images… I have had this issue countless times until I started using wp-cli so imagine a photographer moving to another WP site using the export/import only to have 3 versions of a 5MB image? 2 articles of same?

    1. This is a good point, the importer currently doesn’t do much in the way of de-duplication, but I think now would be a good time to tackle that problem.
