Below the Line Voting Data: Roll Your Own Analysis

I’ve had a few people ask where I got the data for my last couple of posts, so I thought I’d put together a quick how-to for performing the same analysis, or running your own queries. I’ve used MySQL for this, but you should be able to easily convert these statements to work with your data store of choice. (Feel free to post them in the comments!)

First of all, you’re going to need to get the raw data. The AEC provides this as a bunch of CSV files, available for download on their site. The 2010 data can be found here, the 2007 data can be found here. (I’ve contacted the AEC, but it seems they don’t have data available from before the 2007 federal election – I assume they just didn’t store the raw counting data.) Grab the National list of Candidates, and the State Below the Line Preferences files.

Next up, create some tables to hold this data. Here’s the table definition for the candidates table:

CREATE TABLE `2010_candidates` (
    `state` varchar(3) NOT NULL,
    `party` varchar(5) NOT NULL,
    `party_name` text NOT NULL,
    `candidate_id` bigint(20) NOT NULL,
    `surname` text NOT NULL,
    `name` text NOT NULL,
    `elected` varchar(1) NOT NULL,
    `historic_elected` varchar(1) NOT NULL,
    KEY `party` (`party`),
    KEY `candidate_state` (`candidate_id`,`state`)
) ENGINE=InnoDB;

You’ll also need to create a table for each state. You could put it in one big table, but I was doing all of this on my local machine with a limited amount of RAM, so I wanted to keep the tables small:

CREATE TABLE `2010_prefs_nsw` (
    `candidate_id` bigint(20) NOT NULL,
    `preference` int(11) NOT NULL,
    `batch` int(11) NOT NULL,
    `paper` int(11) NOT NULL,
    KEY `candidate_pref` (`candidate_id`,`preference`)
) ENGINE=InnoDB;

Importing the data is pretty easy. Just run this query for the Candidate file:

LOAD DATA LOCAL INFILE "/path/to/SenateCandidatesDownload-15508.csv" INTO TABLE 2010_candidates FIELDS TERMINATED BY ',' IGNORE 2 LINES;

Similarly, run this query for each state (modifying for the correct file name and table name, of course).

LOAD DATA LOCAL INFILE "/path/to/SenateStateBTLPreferences-15508-NSW.csv"  INTO TABLE 2010_prefs_nsw FIELDS TERMINATED BY ',' IGNORE 2 LINES;

You now have all of the raw data. I preferred to make a summary table to hold the preference count for each candidate:

CREATE TABLE `2010_prefs_summary` (
    `candidate_id` bigint(20) NOT NULL,
    `preference` int(11) NOT NULL,
    `preference_total` bigint(20) NOT NULL,
    KEY `candidate_pref` (`candidate_id`,`preference`)
) ENGINE=InnoDB;

Populating this table is pretty easy. Just repeat this query for each state (note the reference to the 2010_prefs_nsw table, and the “NSW” WHERE clause):

INSERT INTO 2010_prefs_summary
    ( SELECT
        c.candidate_id, p.preference, count( p.preference ) AS preference_total
    FROM
        2010_candidates c
    LEFT JOIN
        2010_prefs_nsw p
    ON
        c.candidate_id=p.candidate_id
    WHERE
        p.state="NSW" );

Now, grabbing the preference data for any candidate is quite easy, you just need to know their candidate_id (look it up in the 2010_candidates table):

SELECT * FROM 2010_prefs_summary WHERE candidate_id=12345;

Have fun! If you come up with anything interesting, let me know on Twitter, I’d love to hear about it!

2 comments

Mikel says:

September 24, 2010 at 7:58 am

Thanks for the suggestion, but what’s the deal with LOAD DATA INFILE? It’s been going for over an hour on my old PII server.
Gary Pendergast says:

September 24, 2010 at 8:01 am

It’s a pretty slow process, there are a lot of rows. It took ~20 minutes or so to load VIC on my Core i5. Try creating the table without the KEY, then adding that in after – it should speed up the process a bit.

Comments are closed.