Upgrading MySQL with minimal downtime through Replication

Problem

With the release of MySQL 5.1, many DBAs are going to be scheduling downtime to upgrade their MySQL Server. As with all upgrades between major version numbers, it requires one of two upgrade paths:

  • Dump/reload: The safest method of upgrading, but it takes out your server for quite some time, especially if you have a large data set.
  • mysql_upgrade: A much faster method, but it can still be slow for very large data sets.

I’m here to present a third option. It requires minimal application downtime, and is reasonably simple to prepare for and perform.

Preparation

First of all, you’re going to need a second server (which I’ll refer to as S2). It will act as a ‘stand-in’, while the main server (which I’ll refer to as S1) is upgraded. Once S2 is ready to go, you can begin the preparation:

  • If you haven’t already, enable Binary Logging on S1. We will need it to act as a replication Master.
  • Add an extra bit of functionality to your backup procedure. You will need to store the Binary Log position from when the backup was taken.
    • If you’re using mysqldump, simply add the –master-data option to your mysqldump call.
    • If you’re using InnoDB Hot Backup, there’s no need to make a change.  The Binary Log position is shown when you restore the backup.
    • For other backup methods, you will probably need to get the Binary Log position manually:

    Once you have a backup with the corresponding Binary Log position, you can setup S2:

    • Install MySQL 5.1 on S2.
    • Restore the backup from S1 to S2.
    • Create the Slave user on S1.
    • Enter the Slave settings on S2. You should familiarise yourself with the Replication documentation.
    • Enable Binary Logging on S2. We’ll need this during the upgrade process.
    • Setup S2 as a Slave of S1:
      • If you used mysqldump for the backup, you will need to run the following query:
      • For any other method, you’ll need to specify the Binary Log position as well:
    • Start the Slave on S2:

    The major pre-upgrade work is now complete.

    Upgrade

    Just before beginning the upgrade, take a backup of S2. For speed, I’d recommend running the following queries, then shutting down the MySQL server and copying the data files for the backup.

    Once the backup is complete, restart S2 and let it catch up with S1 again.

    When you’re ready to begin the upgrade, you will need a minor outage. Stop your application, and let S2 catch up with S1. Once it has caught up, they will have identical data. So, switch your application to using S2 instead of S1. Your application can continue running unaffected while you upgrade S1 server.

    • Stop the Slave process on S2:
    • Stop S1.
    • Upgrade S1 to MySQL 5.1.
    • Move the S1 data files to a backup location.
    • Move the backup from S2 into S1’s data directory.
    • Start S2.
    • Setup S1 as a Slave to S2, same as when we made S2 a Slave of S1.
    • Let S1 catch up with S2. When it has caught up, stop your application, and make sure S1 is still caught up with S2.
    • Switch your application back to using S1.

    Complete! Hooray! You just need to run a couple of queries on S1 to clean up the Slave settings:

    Conclusion

    You can keep the outage to only a few minutes while performing this upgrade, removing the need for potentially expensive downtime. If you need the downtime to be zero, you probably want to be looking at a Circular Replication system, though that’s getting a little outside of this blog post.

pento.net goes mobile!

With a bit of fiddling around, I’ve found a good combination of WordPress plugins for mobile support.

  • For iPhone/iPod: WPtouch. Lots of options, looks good on the iPhone browser.
  • For all other mobiles: WP-viewMobile. This one is particularly handy, because it gives you the option to define the user agent strings it should activate for. To make it play nicely with WPtouch, I just had to remove the iPhone and iPod entries.

For both of these, setting them up was as simple as turning them on. I also added the various search engine mobile crawlers to WP-viewMobile. At the moment, the list of user agent strings I have are: Googlebot-Mobile, Y!J-SRD/1.0, msnbot-mobile, MSMObot. If anyone knows any others, please let me know.

So now, if you desperately want to check my site from the road, you can read it a bit easier on your mobile screen.

Permissions by interface on the local server

I had an issue come up recently that involved some confusion over permissions for the same user, connecting through different interfaces. For example, say you have a server with the public IP address of 192.168.0.1. You could connect to it from the local machine using the following commands:

They all connect to the local server, but they can all have different permissions. Here are a couple of rules to make your life easier:

  • Don’t use @127.0.0.1, unless you absolutely can’t use the socket file for some reason. Connecting through @localhost is usually faster than the loopback device, and it’s easier to type.
  • Only connect through the network interface if you’re planning on moving the application to a different server later on.

That’s all. 🙂

Don’t use GoDaddy Hosting

It will only cause you pain.

To give a brief description of the problems I had:

  • Apache is horribly slow. To the point that some RSS readers were having problems subscribing to my blog.
  • Apache setup is weird, and causes all sorts of problems.
  • FTP access is mind-boggling slow, only allows one connection at a time, and at one point, wouldn’t let me download directories.
  • MySQL is sickeningly slow.
  • More expensive than better hosts.
  • Really bad web interface for site management.

Now, I’m happily away from them. Never again.

Replication Checksumming Through Encryption

Problem

A problem we occasionally see is Relay Log corruption, which is most frequently caused by network errors. At this point in time, the replication IO thread does not perform checksumming on incoming data (currently scheduled for MySQL 6.x). In the mean time, we have a relatively easy workaround: encrypt the replication connection. Because of the nature of encrypted connections, they have to checksum each packet.

Solution 1: Replication over SSH Tunnel

This is the easiest to setup. You simply need to do the following on the Slave:

This sets up the tunnel. slave.server:4306 is now a tunnelled link to master.server:3306. So now, you just need to alter the Slave to go through the tunnel:

Everything else stays the same. Your Slave is still connecting to the same Master, just in a different manner.

This solution does have a couple of downsides, however:

  • If the SSH tunnel goes down, it won’t automatically reconnect. This can be fixed with a small script that restarts the connection if it fails. The script can be added to your init.d setup, so it automatically opens on server startup.
  • If you use MySQL Enterprise Monitor, it won’t be able to recognize that the Master/Slave pair go together.

Solution 2: Replication with SSL

Replication with SSL can be trickier to setup, but it removes the two downsides of the previous solution. Luckily, the MySQL Documentation Team have done all the hard work for you.

Conclusion

If you’re seeing corruption problems in your Relay Log, but not in your Master Binary Log, try Solution 1. It’s quick to setup and will determine if encryption is the solution to your problem. If it works, setup Solution 2. It will take a little bit of fiddling around, but is certainly worth the effort.