Partitioning the WordPress Comments Table
WordPress sites can get big. Really big. When you’re looking at a site of Cheezburger, Engadget or Techcrunch proportions, you get hundreds of comments per post, on dozens of posts per day, which adds up to millions of comments per year.
In order to keep your site running in top condition, you don’t want to be running queries against tables with lots of rarely accessed rows, which is what happens with most comments – after the post drops off the front page, readership drops, so the comments are viewed much less frequently. So, what we want to do is remove these old comments from the primary comment table, but keep them handy, for when people read the archives.
Enter partitioning.
The idea of MySQL partitioning is that it splits tables up into multiple logical tablespaces, based on your criteria. Running a query on a single partition of a large table is much faster than running it across the entire table, even with appropriate indexes.
In the case of the WordPress comments table, splitting it up by the `comment_post_ID` seems to be the most appropriate. This should keep the partitions to a reasonable size, and ensure that there’s minimal cross-over between partitions.
First off, we need to add the `comment_post_ID` column to the Primary Key. This can be a slow process if you already have a massive `wp_comments` table, so you may need to schedule some downtime to handle this. Alternatively, there are many methods for making schema changes with no downtime, such as judicious use of Replication, Facebook’s Online Schema Change Tool, or the currently-in-development mk-online-schema-change, for Maatkit.
ALTER TABLE wp_comments DROP PRIMARY KEY, ADD PRIMARY KEY (comment_ID, comment_post_ID);
Now that we’ve altered this index, we can define the partitions. For this example, we’ll say we want the comments for 1000 posts per partition. This query can take a long time to run, if you already have many comments in your system.
ALTER TABLE wp_comments PARTITION BY RANGE(comment_post_ID) (
  PARTITION p0 VALUES LESS THAN (1000),
  PARTITION p1 VALUES LESS THAN (2000),
  PARTITION p2 VALUES LESS THAN (3000),
  PARTITION p3 VALUES LESS THAN (4000),
  PARTITION p4 VALUES LESS THAN (5000),
  PARTITION p5 VALUES LESS THAN (6000),
  PARTITION p6 VALUES LESS THAN MAXVALUE
);
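Typing out the partition list by hand gets tedious once you need more than a handful of partitions. As a rough sketch (the function name `gen_partition_sql` and its arguments are my own invention, not part of MySQL or WordPress), a few lines of shell can print the statement for any partition size and count, so you can review it before running it:

```shell
#!/bin/sh
# Hypothetical helper: print an ALTER TABLE statement that range-partitions
# wp_comments by comment_post_ID.
#   $1 = posts per partition (e.g. 1000)
#   $2 = number of bounded partitions; one extra MAXVALUE partition is appended
# Note: plain sh has no local variables, so gp_* names are used to avoid clashes.
gen_partition_sql() {
    gp_step=$1
    gp_count=$2
    printf 'ALTER TABLE wp_comments PARTITION BY RANGE(comment_post_ID) (\n'
    gp_i=0
    while [ "$gp_i" -lt "$gp_count" ]; do
        # Partition pN holds post IDs below (N + 1) * step
        printf '  PARTITION p%d VALUES LESS THAN (%d),\n' "$gp_i" $(( (gp_i + 1) * gp_step ))
        gp_i=$(( gp_i + 1 ))
    done
    # Catch-all partition for everything above the last boundary
    printf '  PARTITION p%d VALUES LESS THAN MAXVALUE\n);\n' "$gp_count"
}
```

Calling `gen_partition_sql 1000 6` prints the same seven-partition layout used in this post. The function only prints the statement; it never touches the database.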
When you’re approaching the next partition divider value, adding a new partition is simple. For example, you’d run this query around post 6000.
ALTER TABLE wp_comments REORGANIZE PARTITION p6 INTO (
  PARTITION p6 VALUES LESS THAN (7000),
  PARTITION p7 VALUES LESS THAN MAXVALUE
);
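If you’d rather not hand-edit that statement every thousand posts, the same idea can be scripted. This is only a sketch, assuming the naming scheme used above (partitions called `p0`, `p1`, … with a fixed number of posts each); `gen_reorganize_sql` is a hypothetical name of my own:

```shell
#!/bin/sh
# Hypothetical helper: print the REORGANIZE statement that splits the current
# catch-all (MAXVALUE) partition into a bounded partition plus a new catch-all.
#   $1 = index of the current MAXVALUE partition (e.g. 6 for p6)
#   $2 = posts per partition (e.g. 1000)
gen_reorganize_sql() {
    gr_last=$1
    gr_step=$2
    printf 'ALTER TABLE wp_comments REORGANIZE PARTITION p%d INTO (\n' "$gr_last"
    # The old catch-all gets an upper bound of (index + 1) * step
    printf '  PARTITION p%d VALUES LESS THAN (%d),\n' "$gr_last" $(( (gr_last + 1) * gr_step ))
    # The next partition becomes the new catch-all
    printf '  PARTITION p%d VALUES LESS THAN MAXVALUE\n);\n' $(( gr_last + 1 ))
}
```

Running `gen_reorganize_sql 6 1000` prints the statement shown above. As before, it only prints SQL for you to check and run yourself.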
Naturally, this process is most useful for very large WordPress sites. If you’re starting a new site with big plans, however, you may just want to factor this into your architecture.
UPDATE: Changed the partition definition to better reflect how WordPress uses the wp_comments table, per Giuseppe’s comments.
Welcome, SkySQL!
It seems the SkySQL website just went live, which I hope will breathe some life back into the MySQL ecosphere – it’s been a while since there’s been some new competition, especially in the style of classic MySQL services.
For those too lazy to read the SkySQL site, the services offered are similar to what you’d be familiar with from Oracle:
- SkySQL’s Consulting and Training are pretty much the same as Oracle’s existing offerings, though a bit more limited. I expect this to grow as SkySQL grows, however.
- SkySQL’s Support has been simplified slightly, removing the equivalent of MySQL Enterprise Gold support.
- They are offering monitoring and query editing (which Oracle offers with MySQL Enterprise Monitor and MySQL Workbench) through what I can only assume are branded versions of Webyog’s MONyog and SQLyog. A smart move on their part – rather than having to develop something in house on a startup budget and timeframe, they can offer a mature product off the bat.
Sadly, prices aren’t listed, so we can’t really compare that.
I do hope to see SkySQL evolve further – I count many of SkySQL’s founding employees as friends, and I know they won’t stop at just offering the same services as Oracle. I’m always a fan of a bit of friendly competition!
Lycka till, SkySQL!
Leaving MySQL (Not Really)
I’ve been a bit slack about writing my MySQL thoughts of late. This would be caused by the fact that, as I write this, I’m now one week into a 12 month leave of absence from MySQL.
Having given it much careful consideration, I’ve decided that the wisest way to survive the current economic problems is by blowing my savings on a year long holiday in Italy. Wait, did I say holiday? Not really. I’m still a Sun employee, and I’m still going to be active in the MySQL community. My dear support customers just won’t be seeing me around for a while.
I’m looking forward to having time to write more extensively about some of the cool things we’re doing, and what’s going on in the community at large. If there’s anything you’d like to hear about (either expanding on my previous posts, or a completely new topic), please let me know.
Another thing I’d like to do, if there are any interested parties, is to see how companies are using MySQL in their part of the world. So, if you don’t mind showing off what you’re doing and having me write a little bit about it, feel free to drop me a line. All of my current contact details can be found on my contact page. I’m going to be primarily based in Milan, but I’ll be looking to travel around the rest of Europe at some point, so I’d be more than happy to stop by and see you if the opportunity arises.
MySQL and Geospatial Data
MySQL has had basic support for Geospatial Data since 4.1, but has lacked some of the features of the OpenGIS specifications since then. The good news is, this is rapidly changing. Our own Holyfoot has been hammering away at WorkLog #1327, to provide precise functions for our GIS support.
Even better, it’s fast. How fast? Well, the good people at Oki Labs, apart from having implemented several new GIS functions for MySQL, have done some benchmarking, and it’s looking good. If you’ll excuse the cliched comparison to Postgres, here are the response times (seconds) of MySQL GIS vs. PostGIS in Oki’s test:
| Connections | PostGIS | MySQL |
|---|---|---|
| 1 | 1.817 | 0.220 |
| 100 | 10.517 | 0.557 |
Source: http://www.osgeo.jp/wordpress/wp-content/uploads/2008/11/foss4g2008_okumura.pdf
If you’re interested in checking it out, the source tree (regularly merged with MySQL 5.1) is available here. Have a look at Giuseppe’s guide to running a Bazaar export in MySQL Sandbox.
Open Database Alliance = Awesome
The big news coming from the MySQL Community today is that Monty Widenius and Percona have founded the Open Database Alliance, a group focused on “unifying all MySQL-related development and services, providing a solution to the fragmentation and uncertainty facing the communities, businesses and technical experts involved with MySQL”.
I, for one, am 100% behind this. I’ve always been a big fan of community foundations being a focus point for development efforts: they work well to bring everyone together, and to provide a sensible foundation to help avoid much of the uncertainty that seems to spring up around MySQL. I certainly hope that the ODA is able to do the same.
Though I do have one question: how does the ODA plan on handling competing members? If you have two companies offering the same service in the same market, which one will the ODA recommend? Monty specifically says that “all companies that are joining the Alliance should bring something to the table”, but it’s a bit difficult to bring something new when there are already several large players in the MySQL market.
I shall certainly be watching the progress of this alliance with great interest; it has the potential to turn the MySQL Community into a large driving force for development and change.
The press release is available here, and Monty has written some interesting thoughts about it here.
Don’t Forget to Alter your Federated Tables!
If you’re using the Federated engine, here’s something important to remember (apart from the usual advice of “please don’t”). If you need to change the structure of the remote table, always remember to update the Federated table. If not, when you try to use the table, you’ll get this error:
mysql> SELECT * FROM foo;
ERROR 1030 (HY000): Got error 1 from storage engine
This error isn’t really helpful. The problem is, the Federated engine only checks that the remote table structure is correct when it initially connects. Once it has connected, no more checks. When you restart the server, you get a much more helpful message:
mysql> SELECT * FROM foo;
ERROR 1431 (HY000): The foreign data source you are trying to reference does not exist. Data source error: error: 1054 'Unknown column 'b' in 'field list''
Also, keep your eye on the FederatedX project. It’s still under development, but will hopefully upgrade the Federated engine to being useful again.
MySQL is People!
I went skydiving yesterday. Here’s a short video of me voluntarily leaving an airborne and perfectly sound aeroplane:
What does this have to do with MySQL? Well, over the past few weeks there have been a bunch of conspiracy theories bouncing around. There are various topics, but the two favourite at the moment happen to be Oracle’s plans for MySQL, and the licensing of the MySQL documentation. There has been a long history of conspiracies surrounding MySQL, from Oracle’s original purchase of InnoDB, to our decision to create the Enterprise edition of the server, through to our long and bumpy release cycle.
Now, don’t get me wrong. I’m not making any calls to stifle discussion, I’m a big fan of community input. I was a member of the community before I joined MySQL, and I like to think that I still am. But I would like it if we could at least think about conspiracy theories before posting about them. We’re all people here at MySQL, we have evenings and weekends and lives just like you. Some of us are crazy enough to do silly things like jumping out of aeroplanes. We’re not out to get you, and we’re certainly not planning on turning into some sort of faceless corporate stereotype. We’re here to do what we love, creating and supporting a really good product.
Oh, and how do you know this isn’t some corporate play to make us seem human? Well, it’s 9:30pm on a Sunday night here, and I’m yet to find a company who could pay me well enough to be shilling for them. But MySQL happens to be a group of people I like enough to defend them on my own time.
MySQL Workbench: My Impressions
I’ve been using the MySQL Workbench 5.1 beta for the past few days now, and I’m wondering how I designed databases without it.
Okay, so that’s a pretty strong statement, but I’m genuinely happy with it. 5.1 has fixed my main problem with 5.0, which was that the EER diagram mode was horribly slow to render. Now it’s all nice and smooth. The ability to easily visualise tables and their relationships makes design very simple.
In fact, I really only have one (minor) complaint: the ability to export without foreign keys would be nice. Sometimes you just don’t want to deal with the performance hit.
That’s about it. Go and download the OSS edition for free now, have a play around. Make it your Friday afternoon experiment. I promise you’ll like it.
Extracting a Database From a mysqldump File
Restoring a single database from a full dump is pretty easy, using the mysql command line client’s --one-database option:
shell> mysql -u root -p --one-database db_to_restore < fulldump.sql
But what if you don’t want to restore the database, you just want to extract it out of the dump file? Well, that happens to be easy as well, thanks to the magic of sed:
shell> sed -n '/^-- Current Database: `test`/,/^-- Current Database: `/p' fulldump.sql > test.sql
You just need to change “test” to be the name of the database you want extracted.
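One quirk of the sed range above is that it also prints the closing line of the range, which is the `-- Current Database:` header of whichever database comes next in the dump. If that bothers you, here is a rough alternative using awk. The function name `extract_db` is hypothetical, and it assumes the dump has the same `-- Current Database:` header lines that the sed pattern relies on:

```shell
#!/bin/sh
# Hypothetical helper: print only the section of a full dump that belongs
# to one database.
#   $1 = database name, $2 = path to the dump file
extract_db() {
    awk -v db="$1" '
        # On every database header line, decide whether we are inside
        # the section for the database we want.
        /^-- Current Database: `/ { keep = ($0 == ("-- Current Database: `" db "`")) }
        # Print lines only while inside the wanted section.
        keep
    ' "$2"
}
```

Usage looks much the same as the sed version: `extract_db test fulldump.sql > test.sql`. Like the sed version, this skips the settings at the very top of the dump file, since they appear before the first database header.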
Don’t put a NULL in the IN clause in 5.1
There seems to be an optimizer problem in 5.1, if you put a NULL in the IN clause of a SELECT. For example, given the following table:
CREATE TABLE foo ( a INT NOT NULL AUTO_INCREMENT, PRIMARY KEY (a) );
Compare these two EXPLAINs:
mysql> EXPLAIN SELECT * FROM foo WHERE a IN (160000, 160001, 160002)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: foo
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 3
        Extra: Using where
1 row in set (0.06 sec)

mysql> EXPLAIN SELECT * FROM foo WHERE a IN (NULL, 160000, 160001, 160002)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: foo
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 327680
        Extra: Using where
1 row in set (0.00 sec)
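In the query with the NULL, it does a full table scan. So, if you’ve run into this problem under MySQL 5.1, the workaround is to remove the NULL. In a WHERE clause, this doesn’t change which rows are returned, since comparing a value with NULL never evaluates to true. This doesn’t affect MySQL 4.x or 5.0.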
You can also follow along with Bug #33139.


