Using open data to understand voter registration levels

With an election coming up, thoughts turn to registering to vote; campaigns at all levels will start encouraging people to register and participate.

From an open data perspective, we wanted to know what could be learnt about electoral registration from data held locally or in national sources, and whether it could be used to help increase registration levels.

A sub-set of that question was then ‘how can we understand which areas have higher or lower levels of registration in Bath and North East Somerset?’.

This post intends to show it is possible (and reasonably easy) for anyone to run a quick analysis to answer these questions, provided the local authority is willing and able to release some electoral registration statistics.

We were able to do this down to a small geographical level and publish some findings reasonably easily, and it should be easy for any local area to repeat this.

Getting Started

In order to understand registration levels we needed to be able to calculate a registered elector base and a population base against which to compare it.

Electoral Registration data

Thanks to an upcoming boundary review we’ve spent a lot of time thinking about electoral statistics outside the normal time frame of elections. This meant we had good access to electoral registration data.

The standard geographical unit is a polling district, which is great for defining a polling station but not so good for statistical analysis.

The full published electoral register is, naturally, sensitive personal information, but is regularly manipulated and aggregated, both for producing electoral statistics against relevant elections and for administrative purposes. It was a straightforward piece of work to aggregate the register to postcode (the data structure of our register was in good order and was linked to the local land and property gazetteer which made this process straightforward), along with their associated administrative geographies. This work would need to be undertaken by local authority employees or any other person authorised to process this data.
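As a rough illustration of that aggregation step, here is a minimal Python sketch. It assumes the register has been exported as a list of rows with a `postcode` field; the field names, the function name and the sample rows are all made up for illustration and are not taken from our actual register. Only the postcode is carried through, so no personal details leave the function.

```python
from collections import Counter


def electors_per_postcode(register_rows):
    """Count electors per postcode, keeping nothing but the postcode itself.

    `register_rows` is assumed to be an iterable of dicts with a
    'postcode' key (a hypothetical export format).
    """
    counts = Counter()
    for row in register_rows:
        # Normalise so that 'xx1 1aa' and 'XX1 1AA' are counted together.
        postcode = (row.get("postcode") or "").strip().upper()
        if not postcode:
            # Rows without a postcode are skipped in this sketch.
            continue
        counts[postcode] += 1
    return dict(counts)


# Made-up rows, purely for illustration.
rows = [
    {"elector_id": 1, "postcode": "XX1 1AA"},
    {"elector_id": 2, "postcode": "xx1 1aa"},
    {"elector_id": 3, "postcode": "XX2 2BB"},
    {"elector_id": 4, "postcode": ""},
]
postcode_counts = electors_per_postcode(rows)
```

In practice this would be run by someone authorised to process the register, and only the aggregated counts would be taken forward.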

The challenge naturally comes in removing the risk of inadvertently identifying an individual. Some postcodes contain only one property. A simple, if somewhat crude, mechanism for mitigating this was to replace the count for any postcode with fewer than 10 electors with a random value between 1 and 9. As a caveat, this analysis excludes non-postcoded addresses (people living on narrow boats, for instance).

The randomised allocation of small numbers and the exclusion of non-postcoded addresses probably built an error of roughly 5-6% into the analysis from the start. This would probably be unacceptable for formal administrative purposes, but for a bit of mild statistical inquiry it seems reasonable.
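The small-number adjustment can be sketched in a few lines of Python. This is an illustration under assumptions rather than the exact process we ran: the function name, the threshold constant and the sample postcodes are hypothetical, and we assume the replacement value is drawn uniformly from 1 to 9.

```python
import random

SMALL_COUNT_THRESHOLD = 10  # counts below this are treated as disclosive


def perturb_small_counts(postcode_counts, rng=None):
    """Replace any count from 1 to 9 with a random value from 1 to 9.

    Counts of 10 or more are left untouched. `rng` can be a seeded
    random.Random instance if a repeatable run is needed.
    """
    if rng is None:
        rng = random.Random()
    adjusted = {}
    for postcode, count in postcode_counts.items():
        if 0 < count < SMALL_COUNT_THRESHOLD:
            # randint is inclusive at both ends, so this yields 1..9.
            adjusted[postcode] = rng.randint(1, SMALL_COUNT_THRESHOLD - 1)
        else:
            adjusted[postcode] = count
    return adjusted


# Made-up postcodes and counts, purely for illustration.
example_counts = {"XX1 1AA": 3, "XX1 1AB": 42, "XX1 1AC": 10}
adjusted_counts = perturb_small_counts(example_counts, rng=random.Random(0))
```

Because the replacement is random, individual small postcodes will be wrong, but the hope is that the errors partly cancel out once postcodes are summed to a larger geography.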

This data is available from as

Population Data

Having derived an electorate we then have to find a mechanism for aligning that data to a population. Electoral data will never fully match to a population for a number of reasons, including:

  • Students don’t have to register where they are studying; they may choose to register at home.
  • Some residents are not eligible to vote (under-18s, certain foreign nationals, etc.)

The census-produced postcode headcounts only give a headline population number, so they’re not really suitable. The next step was to look at some of the administrative geographies available. My personal favourite is Lower Super Output Areas (LSOAs), as they’re large enough to give reasonable statistical coverage while being small enough to show distinct geographical variation at a local scale.

The Census at these geographies gives great nationality and student status data but, dating from 2011, is really out of date. We know student numbers have changed locally and we know that there has been significant house building over the last six years. With that in mind we settled on the 18+ (All persons) mid-year population estimates at the LSOA level as a population base.

The Mid-year population estimates and the lookups for matching postcode to LSOA are available from the ONS.

Once the data was aligned it was a simple calculation to derive a registration rate of (derived LSOA electorate)/(LSOA population aged 18+)
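To make the calculation concrete, here is a minimal Python sketch of the alignment and the rate. It assumes three inputs as plain dictionaries: postcode elector counts, a postcode-to-LSOA lookup, and 18+ population estimates by LSOA. The function name, the LSOA codes and all the numbers are invented for illustration.

```python
from collections import defaultdict


def registration_rates(postcode_counts, postcode_to_lsoa, lsoa_population_18plus):
    """Return (derived LSOA electorate) / (LSOA population aged 18+) per LSOA."""
    electorate = defaultdict(int)
    for postcode, count in postcode_counts.items():
        lsoa = postcode_to_lsoa.get(postcode)
        if lsoa is None:
            # Postcode missing from the lookup: skipped in this sketch.
            continue
        electorate[lsoa] += count

    rates = {}
    for lsoa, population in lsoa_population_18plus.items():
        if population > 0:  # avoid dividing by zero
            rates[lsoa] = electorate.get(lsoa, 0) / population
    return rates


# Invented figures, purely for illustration.
counts = {"XX1 1AA": 30, "XX1 1AB": 50, "XX2 2AA": 45}
lookup = {"XX1 1AA": "LSOA_A", "XX1 1AB": "LSOA_A", "XX2 2AA": "LSOA_B"}
population = {"LSOA_A": 100, "LSOA_B": 50}

rates = registration_rates(counts, lookup, population)
# LSOA_A: 80 electors / 100 adults; LSOA_B: 45 electors / 50 adults
```

A rate above 1 is possible with real data, since the electorate and the population estimate come from different sources and different dates.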

Visualizing the data

We used Microsoft Power BI, a free piece of software (well, not totally, but for the purposes of this exercise we only used the free elements of it), to create an interactive report with key pieces of information that hopefully make the data more accessible.

What can we learn from this analysis?

The most obvious use would be to target specific geographical communities that appear under-registered. Council communication teams might be interested, in the run-up to elections or during registration campaigns, as might any local organisation with an interest in increasing registration levels.

We could also use it to help understand our population more effectively. If we look at the two outlying geographies, they both contain student halls of residence, but why does the registered population of one exceed the estimated population while the other seems incredibly low? This could be down to nationality or to how the ONS calculates population levels (I have a theory about GP registrations, for instance, but it’s guesswork); more work could be done.

If it were deemed valuable, there are improvements that could be made at all levels of the analysis to improve accuracy or validity, particularly in producing the initial data file.

Finally, I’d also hope we’ve shown how easy it would be for any local authority area to do work of this nature. Say hello on Twitter @jonpoole to find out more about how we did it.
