Tag Archives: MySQL

Partitioning the WordPress Comments Table

WordPress sites can get big. Really big. When you’re looking at a site of Cheezburger, Engadget or Techcrunch proportions, you get hundreds of comments per post, on dozens of posts per day, which adds up to millions of comments per year.

In order to keep your site running in top condition, you don’t want to be running queries against tables with lots of rarely accessed rows, which is what happens with most comments – after the post drops off the front page, readership drops, so the comments are viewed much less frequently. So, what we want to do is remove these old comments from the primary comment table, but keep them handy, for when people read the archives.

Enter partitioning.

The idea of MySQL partitioning is that it splits tables up into multiple logical tablespaces, based on your criteria. Running a query on a single partition of a large table can be much faster than running it across the entire table, even with appropriate indexes, as long as the query filters on the partitioning column so that MySQL can skip the other partitions.

In the case of the WordPress comments table, splitting it up by the `comment_post_ID` seems to be the most appropriate choice. This should keep the partitions to a reasonable size, and ensure that there’s minimal cross-over between partitions.

First off, we need to add the `comment_post_ID` column to the Primary Key. This can be a slow process if you already have a massive `wp_comments` table, so you may need to schedule some downtime to handle this. Alternatively, there are many methods for making schema changes with no downtime, such as judicious use of Replication, Facebook’s Online Schema Change Tool, or the currently-in-development mk-online-schema-change, for Maatkit.

ALTER TABLE wp_comments DROP PRIMARY KEY, ADD PRIMARY KEY (comment_ID, comment_post_ID);

Now that we’ve altered this index, we can define the partitions. For this example, we’ll say we want the comments for 1000 posts per partition. This query can take a long time to run, if you already have many comments in your system.

ALTER TABLE wp_comments PARTITION BY RANGE(comment_post_ID) (
    PARTITION p0 VALUES LESS THAN (1000),
    PARTITION p1 VALUES LESS THAN (2000),
    PARTITION p2 VALUES LESS THAN (3000),
    PARTITION p3 VALUES LESS THAN (4000),
    PARTITION p4 VALUES LESS THAN (5000),
    PARTITION p5 VALUES LESS THAN (6000),
    PARTITION p6 VALUES LESS THAN MAXVALUE
);
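If you have tens of thousands of posts, typing out the partition list by hand gets tedious. Here’s a minimal sketch of a generator for it in Python. The function name and its parameters are my own invention for illustration, and it only builds the statement as a string; you would still review it and run it against MySQL yourself.

```python
def build_partition_ddl(table, column, posts_per_partition, highest_post_id):
    """Build an ALTER TABLE ... PARTITION BY RANGE statement (hypothetical helper).

    Creates enough fixed-size partitions to cover highest_post_id,
    plus a final catch-all MAXVALUE partition for future posts.
    """
    if posts_per_partition <= 0:
        raise ValueError("posts_per_partition must be positive")
    if highest_post_id < 0:
        raise ValueError("highest_post_id must not be negative")

    # VALUES LESS THAN is exclusive, so post 1000 lands in the second partition.
    bounded = highest_post_id // posts_per_partition + 1

    parts = [
        f"    PARTITION p{i} VALUES LESS THAN ({(i + 1) * posts_per_partition})"
        for i in range(bounded)
    ]
    # Everything beyond the last fixed boundary goes into the catch-all partition.
    parts.append(f"    PARTITION p{bounded} VALUES LESS THAN MAXVALUE")

    return (
        f"ALTER TABLE {table} PARTITION BY RANGE({column}) (\n"
        + ",\n".join(parts)
        + "\n);"
    )


print(build_partition_ddl("wp_comments", "comment_post_ID", 1000, 5500))
```

With 1000 posts per partition and a highest post ID of 5500, this produces the same seven-partition statement as the hand-written one.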

When you’re approaching the next partition divider value, adding a new partition is simple. For example, you’d run this query around post 6000.

ALTER TABLE wp_comments REORGANIZE PARTITION p6 INTO (
    PARTITION p6 VALUES LESS THAN (7000),
    PARTITION p7 VALUES LESS THAN MAXVALUE
);

Naturally, this process is most useful for very large WordPress sites. If you’re starting a new site with big plans, however, you may just want to factor this into your architecture.

UPDATE: Changed the partition definition to better reflect how WordPress uses the wp_comments table, per Giuseppe’s comments.

Welcome, SkySQL!

It seems the SkySQL website just went live, which I hope will breathe some life back into the MySQL ecosphere – it’s been a while since there’s been some new competition, especially in the style of classic MySQL services.

For those too lazy to read the SkySQL site, the services offered are similar to what you’d be familiar with from Oracle:

  • SkySQL’s Consulting and Training are pretty much the same as Oracle’s existing offerings, though a bit more limited. I expect this to grow as SkySQL grows, however.
  • SkySQL’s Support has been simplified slightly, removing the equivalent of MySQL Enterprise Gold support.
  • They are offering monitoring and query editing (which Oracle offers with MySQL Enterprise Monitor and MySQL Workbench) through what I can only assume are branded versions of Webyog’s MONyog and SQLyog. A smart move on their part – rather than having to develop something in house on a startup budget and timeframe, they can offer a mature product off the bat.

Sadly, prices aren’t listed, so we can’t really compare that.

I do hope to see SkySQL evolve further – I count many of SkySQL’s founding employees as friends, and I know they won’t stop at just offering the same services as Oracle. I’m always a fan of a bit of friendly competition! :)

Lycka till, SkySQL!

Leaving MySQL (Not Really)

I’ve been a bit slack about writing my MySQL thoughts of late. This would be caused by the fact that, as I write this, I’m now one week into a 12 month leave of absence from MySQL.

Having given it much careful consideration, I’ve decided that the wisest way to survive the current economic problems is by blowing my savings on a year long holiday in Italy. Wait, did I say holiday? Not really. I’m still a Sun employee, and I’m still going to be active in the MySQL community. My dear support customers just won’t be seeing me around for a while. :)

I’m looking forward to having time to write more extensively about some of the cool things we’re doing, and what’s going on in the community at large. If there’s anything you’d like to hear about (either expanding on my previous posts, or a completely new topic), please let me know.

Another thing I’d like to do, if there are any interested parties, is to see how companies are using MySQL in their part of the world. So, if you don’t mind showing off what you’re doing and having me write a little bit about it, feel free to drop me a line. All of my current contact details can be found on my contact page. I’m going to be primarily based in Milan, but I’ll be looking to travel around the rest of Europe at some point, so I’d be more than happy to stop by and see you if the opportunity arises.

MySQL and Geospatial Data

MySQL has had basic support for Geospatial Data since 4.1, but has lacked some of the features of the OpenGIS specifications since then. The good news is, this is rapidly changing. Our own Holyfoot has been hammering away at WorkLog #1327, to provide precise functions for our GIS support.

Even better, it’s fast. How fast? Well, the good people at Oki Labs, apart from having implemented several new GIS functions for MySQL, have done some benchmarking, and it’s looking good. If you’ll excuse the cliched comparison to Postgres, here are the response times (seconds) of MySQL GIS vs. PostGIS in Oki’s test:

Connections    PostGIS    MySQL
1              1.817      0.220
100            10.517     0.557

Source: http://www.osgeo.jp/wordpress/wp-content/uploads/2008/11/foss4g2008_okumura.pdf

If you’re interested in checking it out, the source tree (regularly merged with MySQL 5.1) is available here. Have a look at Giuseppe’s guide to running a Bazaar export in MySQL Sandbox.

Open Database Alliance = Awesome

The big news coming from the MySQL Community today is that Monty Widenius and Percona have founded the Open Database Alliance, a group focused on “unifying all MySQL-related development and services, providing a solution to the fragmentation and uncertainty facing the communities, businesses and technical experts involved with MySQL”.

I, for one, am 100% behind this. I’ve always been a big fan of community foundations being a focus point for development efforts. They work well to bring everyone together, and to provide a sensible foundation to help avoid much of the uncertainty that seems to spring up around MySQL. I certainly hope that the ODA is able to do the same.

I do have one question, though: how does the ODA plan on handling competing members? If you have two companies offering the same service in the same market, which one will the ODA recommend? Monty specifically says that “all companies that are joining the Alliance should bring something to the table”, but it’s a bit difficult to bring something new when there are already several large players in the MySQL market.

I shall certainly be watching the progress of this alliance with great interest. It has the potential to turn the MySQL Community into a large driving force for development and change.

The press release is available here, and Monty has written some interesting thoughts about it here.

Don’t Forget to Alter your Federated Tables!

If you’re using the Federated engine, here’s something important to remember (apart from the usual advice of “please don’t”). If you need to change the structure of the remote table, always remember to update the Federated table. If not, when you try to use the table, you’ll get this error:

mysql> SELECT * FROM foo;
ERROR 1030 (HY000): Got error 1 from storage engine

This error isn’t really helpful. The problem is, the Federated engine only checks that the remote table structure is correct when it initially connects. Once it has connected, no more checks. When you restart the server, you get a much more helpful message:

mysql> SELECT * FROM foo;
ERROR 1431 (HY000): The foreign data source you are trying to reference does not exist. Data source error:  error: 1054  'Unknown column 'b' in 'field list''

Also, keep your eye on the FederatedX project. It’s still under development, but will hopefully upgrade the Federated engine to being useful again.

MySQL is People!

I went skydiving yesterday. Here’s a short video of me voluntarily leaving an airborne and perfectly sound aeroplane:

What does this have to do with MySQL? Well, over the past few weeks there have been a bunch of conspiracy theories bouncing around. There are various topics, but the two favourite at the moment happen to be Oracle’s plans for MySQL, and the licensing of the MySQL documentation. There has been a long history of conspiracies surrounding MySQL, from Oracle’s original purchase of InnoDB, to our decision to create the Enterprise edition of the server, through to our long and bumpy release cycle.

Now, don’t get me wrong. I’m not making any calls to stifle discussion; I’m a big fan of community input. I was a member of the community before I joined MySQL, and I like to think that I still am. But I would like it if we could at least think about conspiracy theories before posting about them. We’re all people here at MySQL. We have evenings and weekends and lives just like you. Some of us are crazy enough to do silly things like jumping out of aeroplanes. We’re not out to get you, and we’re certainly not planning on turning into some sort of faceless corporate stereotype. We’re here to do what we love, creating and supporting a really good product.

Oh, and how do you know this isn’t some corporate play to make us seem human? Well, it’s 9:30pm on a Sunday night here, and I’m yet to find a company who could pay me well enough to be shilling for them. But MySQL happens to be a group of people I like enough to defend them on my own time.

MySQL Workbench: My Impressions

I’ve been using the MySQL Workbench 5.1 beta for the past few days now, and I’m wondering how I designed databases without it.

Okay, so that’s a pretty strong statement, but I’m genuinely happy with it. 5.1 has fixed my main problem with 5.0, which was that the EER diagram mode was horribly slow to render. Now it’s all nice and smooth. The ability to easily visualise tables and their relationships makes design very simple.

In fact, I really only have one (minor) complaint: the ability to export without foreign keys would be nice. Sometimes you just don’t want to deal with the performance hit.

That’s about it. Go and download the OSS edition for free now, have a play around. Make it your Friday afternoon experiment. I promise you’ll like it.

Backing up permissions for individual databases

Sometimes, you want to back up individual databases in MySQL to move to a different server. This part is easy using mysqldump:

shell> mysqldump -u root -p --databases db1 db2 ... > backup.sql

The problem is, what happens when you want to back up the permissions associated with these databases? Well, here are a few queries to help you out.

-- Grab the users with the global SELECT privilege,
-- with permissions to the databases you want,
-- and tables/stored procedures in them.
-- In a UNION, INTO OUTFILE has to go on the last SELECT;
-- the whole UNION result is still written to the file.
mysql> SELECT u.*
	FROM
		mysql.user u
	WHERE
		u.Select_priv='Y'
	UNION
	SELECT u.*
	FROM
		mysql.user u,
		mysql.db d
	WHERE
		d.Db IN('db1', 'db2', ...) AND
		d.User = u.User
	UNION
	SELECT u.*
	FROM
		mysql.user u,
		mysql.tables_priv t
	WHERE
		t.Db IN('db1', 'db2', ...) AND
		t.User = u.User
	UNION
	SELECT u.*
	FROM
		mysql.user u,
		mysql.procs_priv p
	WHERE
		p.Db IN('db1', 'db2', ...) AND
		p.User = u.User
	INTO OUTFILE '/tmp/user.txt'
	FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
	LINES TERMINATED BY '\n';
-- Now, grab the database permissions, and those of objects in the database.
mysql> SELECT * INTO OUTFILE '/tmp/db.txt'
	FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
	LINES TERMINATED BY '\n'
	FROM
		mysql.db
	WHERE
		Db IN('db1', 'db2', ...);
mysql> SELECT * INTO OUTFILE '/tmp/tables_priv.txt'
	FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
	LINES TERMINATED BY '\n'
	FROM
		mysql.tables_priv
	WHERE
		Db IN('db1', 'db2', ...);
mysql> SELECT * INTO OUTFILE '/tmp/procs_priv.txt'
	FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
	LINES TERMINATED BY '\n'
	FROM
		mysql.procs_priv
	WHERE
		Db IN('db1', 'db2', ...);

Then, re-loading the permissions onto the new server is simple:

mysql> LOAD DATA INFILE '/tmp/user.txt'
	FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
	LINES TERMINATED BY '\n'
	INTO TABLE mysql.user;
mysql> LOAD DATA INFILE '/tmp/db.txt'
	FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
	LINES TERMINATED BY '\n'
	INTO TABLE mysql.db;
mysql> LOAD DATA INFILE '/tmp/tables_priv.txt'
	FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
	LINES TERMINATED BY '\n'
	INTO TABLE mysql.tables_priv;
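mysql> LOAD DATA INFILE '/tmp/procs_priv.txt'
	FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
	LINES TERMINATED BY '\n'
	INTO TABLE mysql.procs_priv;
-- The server caches the grant tables in memory, so tell it to reload them.
mysql> FLUSH PRIVILEGES;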

All up, a few queries to account for everything, but pretty easy to include in your backup/restore process. For further development, you could put the database list in a variable, so that you only need to change it on one line, rather than 6.
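As a rough sketch of that idea, here’s one way to keep the database list in a single place and generate the per-table export queries from it. This is Python rather than SQL, the helper names are hypothetical, and it only covers the three simple queries (db, tables_priv and procs_priv), not the UNION against mysql.user. It builds strings only; running them is still up to you.

```python
# Same file format options as the hand-written queries. Raw string, so the
# backslashes reach MySQL exactly as typed.
FILE_FORMAT = r"""FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n'"""


def sql_quote(name):
    """Quote a database name as a SQL string literal (minimal escaping only)."""
    return "'" + name.replace("\\", "\\\\").replace("'", "''") + "'"


def privilege_export_queries(databases, out_dir="/tmp"):
    """Return one SELECT ... INTO OUTFILE query per grant table (hypothetical helper)."""
    if not databases:
        raise ValueError("need at least one database name")

    # The database list is built once and reused in every query.
    db_list = ", ".join(sql_quote(db) for db in databases)

    queries = {}
    for table in ("db", "tables_priv", "procs_priv"):
        queries[table] = (
            f"SELECT * FROM mysql.{table}\n"
            f"WHERE Db IN ({db_list})\n"
            f"INTO OUTFILE '{out_dir}/{table}.txt'\n"
            f"{FILE_FORMAT};"
        )
    return queries


for query in privilege_export_queries(["db1", "db2"]).values():
    print(query)
    print()
```

Changing the list passed to privilege_export_queries is then the only edit needed when the set of databases changes.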

A Brief Introduction to MySQL Performance Tuning

Here are some common performance tuning concepts that I frequently run into. Please note that this really is only a basic introduction to performance tuning. More in-depth tuning depends heavily on your systems, data and usage.

Server Variables

For tuning InnoDB performance, your primary variable is innodb_buffer_pool_size. This is the chunk of memory that InnoDB uses for caching data, indexes and various pieces of information about your database. The bigger, the better. If you can cache all of your data in memory, you’ll see significant performance improvements.

For MyISAM, there is a similar buffer defined by key_buffer_size, though this is only used for indexes, not data. Again, the bigger, the better.
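One rough way to judge whether either buffer is big enough is to look at how often a read has to go to disk. The sketch below assumes you have copied a pair of counters out of SHOW GLOBAL STATUS: Innodb_buffer_pool_read_requests and Innodb_buffer_pool_reads for InnoDB, or Key_read_requests and Key_reads for the MyISAM key buffer. The function name and the sample numbers are made up for illustration, and the ratio is only a hint, not a substitute for measuring your real workload.

```python
def cache_hit_rate(read_requests, disk_reads):
    """Fraction of read requests served from memory (hypothetical helper).

    read_requests: logical reads, e.g. Innodb_buffer_pool_read_requests
                   or Key_read_requests
    disk_reads:    reads that missed the buffer, e.g. Innodb_buffer_pool_reads
                   or Key_reads
    Returns None if there have been no read requests yet.
    """
    if read_requests <= 0:
        return None
    return 1.0 - (disk_reads / read_requests)


# Illustrative numbers only, not taken from a real server.
rate = cache_hit_rate(read_requests=2_000_000, disk_reads=10_000)
print(f"hit rate: {rate:.2%}")
```

If the rate stays low once the server has warmed up, that suggests the working set doesn’t fit in the buffer, and a larger setting (or more RAM) is worth testing.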

Other variables that are worth investigating for performance tuning are:

query_cache_size – This can be very useful if you have a small number of read queries that are repeated frequently, with no write queries in between. There have been problems with too large a query cache locking up the server, so you will need to experiment to find a value that’s right for you.

innodb_log_file_size – Don’t fall into the trap of setting this to be too large. A large InnoDB log file group is necessary if you have lots of large, concurrent transactions, but comes at the expense of slowing down InnoDB recovery, in the event of a crash.

sort_buffer_size – Another one that shouldn’t be set too large. Peter Zaitsev did some testing a while back showing that increasing sort_buffer_size can in fact reduce the speed of the query.

Server Hardware

There are a few solid recommendations for improving the performance of MySQL by upgrading your hardware:

  • Use a 64-bit processor, operating system and MySQL binary. This will allow you to address lots of RAM. At this point in time, InnoDB does have issues scaling past 8 cores, so you don’t need to go out of your way to have lots of processors.
  • Speaking of RAM, buy lots of it. Enough to fit all of your data and indexes, if you can.
  • If you can’t fit all of your data into RAM, you’ll need fast disks, RAID if you can. Have multiple disks, so you can separate your data files, OS files and log files onto different physical disks.

Query Tuning

Finally, though probably the most important, we look at tuning queries. In particular, we make sure that they’re using indexes, and they’re running quickly. To do so, turn on the Slow Query Log for a day, with log_queries_not_using_indexes enabled as well. Run the resulting log through mysqldumpslow, which will produce a summary of the log. This will help you prioritize which queries to tackle first. Then, you can use EXPLAIN to find out what they’re doing, and adjust your indexes accordingly.

Have fun!