Below the Line Voting Data: Roll Your Own Analysis

I’ve had a few people ask where I got the data for my last couple of posts, so I thought I’d put together a quick how-to for performing the same analysis, or running your own queries. I’ve used MySQL for this, but you should be able to easily convert these statements to work with your data store of choice. (Feel free to post them in the comments!)

First of all, you’re going to need to get the raw data. The AEC provides this as a bunch of CSV files, available for download on their site. The 2010 data can be found here, the 2007 data can be found here.1 Grab the National list of Candidates, and the State Below the Line Preferences files.

Next up, create some tables to hold this data. Here’s the table definition for the candidates table:

You’ll also need to create a table for each state. You could put it in one big table, but I was doing all of this on my local machine with a limited amount of RAM, so I wanted to keep the tables small:

Importing the data is pretty easy. Just run this query for the Candidate file:

Similarly, run this query for each state (modifying for the correct file name and table name, of course).

You now have all of the raw data. I preferred to make a summary table to hold the preference count for each candidate:

Populating this table is pretty easy. Just repeat this query for each state (note the reference to the 2010_prefs_nsw table, and the “NSW” WHERE clause):

Now, grabbing the preference data for any candidate is quite easy, you just need to know their candidate_id (look it up in the 2010_candidates table):

Have fun! If you come up with anything interesting, let me know on Twitter, I’d love to hear about it!

  1. I’ve contacted the AEC, but it seems they don’t have data available from before the 2007 federal election – I assume they just didn’t store the raw counting data.

2 replies on “Below the Line Voting Data: Roll Your Own Analysis”

  1. It’s a pretty slow process, there are a lot of rows. It took ~20 minutes or so to load VIC on my Core i5. Try creating the table without the KEY, then adding that in after – it should speed up the process a bit.

Comments are closed.