Using open data to understand voter registration levels

With an election coming up, thoughts turn to registering to vote; campaigns at all levels will start encouraging people to register and participate.

From an open data perspective, we wanted to know what could be learnt about electoral registration from data held locally or in national sources, and whether it could be used to help increase registration levels.

A subset of that question was ‘how can we understand which areas have higher or lower levels of registration in Bath and North East Somerset?’.

This post intends to show it is possible (and reasonably easy) for anyone to run a quick analysis to answer these questions, provided the local authority is willing and able to release some electoral registration statistics.

We were able to do this at a small geographical level and publish some findings without much difficulty, and it should be straightforward for any local area to repeat the exercise.

Continue reading “Using open data to understand voter registration levels”

Why I stopped worrying and learnt to love local geography.

I was fortunate enough to get the opportunity recently to talk with friends at Bath:Hacked about local geographies. This was in part to advertise the upcoming Boundary Review of the area, but also a chance to reflect on the many ways we slice and dice the local authority area.

This was something of a semi-structured ramble through geographies against the ONS’ rather wonderful hierarchical representation of statistical geographies.

In practice, particularly when thinking about any locality, its geography is intertwined with its history, its natural and ecological setting, its psychogeography (if you’re into such things) and so forth. But those caveats notwithstanding, it was a fun exercise.

Download the slides here.


Top 20 Datasets for UK Local Government.

I thought the idea of a ‘top 20 datasets for UK local government’ was an interesting thought piece. Could we even identify them, let alone publish them? Here follows a very quick reflection on it…

Continue reading “Top 20 Datasets for UK Local Government.”

D3 learning Part 3 – linking to live data

Previously I’ve managed to display a simple d3.js visualisation on the site through a couple of different methods. The next step is to go from proving a general concept to actually displaying some data that might be interesting.

So, what steps did I go through:

1. Find some lovely data

I wanted something interesting to play with that was home related. Mendip Council’s open data offerings are less than impressive, so national agencies it is; I recalled that there’s a rainfall monitoring station only a mile or so from home. So why not try to build a graph that shows recent rainfall (say, the last 100 readings) for where I live (notwithstanding the obvious empirical alternative of looking out of the window)?

The data is already well defined, and it was easy to identify the Frome station (531108) thanks to the EA API demonstrator.

A bit of digging with that demonstrator showed it is possible to call a .csv file containing the data I want to visualise, in a fairly basic three-column format.
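For reference, this is the specific endpoint I ended up calling (it appears again in the code further down); the query string limits the response to 100 rainfall readings:

http://environment.data.gov.uk/flood-monitoring/id/stations/531108/readings.csv?_limit=100&_sorted&parameter=rainfall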

2. Find a viz

I found a simple chart I like the look of (well, actually it’s ugly as hell, but I’ve got a load of time data and a value – a basic line chart is a good starting point).

The first thing I did was upload the csv and the html files to my site to check that there wasn’t something odd that WordPress might do with them.

There wasn’t.

3. Make that viz look at the data

Now I need to stick my data to the viz. The data produced by the .csv associated with that example viz obviously doesn’t look like my EA data. I opened index.html in a text editor and started to look at the variables I think define the data.

4. The detective work, part 1 – date formatting

The Environment Agency data isn’t in the same format as that in the example, so finding the right bit of code which determines the format was the first job. It’s this:

var parseDate = d3.time.format("%d-%b-%y").parse

(which was easy to find thanks to the excellent annotation in the d3 code)

D3 has its own formatting guidelines (the documentation is comprehensive and made a broad level of sense). We are relatively fortunate in that our EA data comes in a standard ISO date/time format (ISO 8601). A bit of googling gave a d3 format string with which to replace the bit in the quotes, specifically

"%Y-%m-%dT%H:%M:%SZ"

(source)
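Putting that together, the amended line becomes something like this (a sketch – the variable name comes from the example code above):

var parseDate = d3.time.format("%Y-%m-%dT%H:%M:%SZ").parse;
// e.g. parseDate("2017-02-28T09:15:00Z") hands back a JavaScript Date object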

5. More detective work, part 2 – calling the csv file

Rather than calling a static csv file already on my site, I want to link to that live csv (that’s the exciting bit). The relevant part of the example currently looks like this:

d3.csv("http://intelligentplaces.org.uk/wp-content/uploads/2017/02/lineclean.csv", function(error, data) {
 data.forEach(function(d) {
 d.date = parseDate(d.date);
 d.close = +d.close;
 });

I want to get the data from the csv I discussed earlier, which means fiddling with this a bit…

I *think* the d.{variable} bit is where we’re telling d3 how to handle the individual columns in the csv.

Assuming that is the case, then I tried

d3.csv("http://environment.data.gov.uk/flood-monitoring/id/stations/531108/readings.csv?_limit=100&_sorted¶meter=rainfall", function(error, data) {
 data.forEach(function(d) {
 d.DateTime = parseDate(d.date);
 d.value = +d.close;

And…

Nothing.

A whole page of blankness.

6. Error checking – following through the fields (and double-checking blank values)

Now, the fact that there is simply nothing to be seen leads me to believe that either the data is simply not there (and because the chart is defined relative to the data, there’s nothing to show from nothing), or that I’ve bodged up one of the above. I’ve no idea whether there’s error handling built into this – I’m assuming not.

First I did a formatting/sense check on the code; I amended all the other references to the previous fields with my newly defined ones, and I got a really boring graph.
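For anyone following along, those amendments boil down to something like this – the x, y and line variable names follow the standard d3 v3 line chart example, so treat them as assumptions rather than my exact code:

// point the scale domains and the line generator at the renamed fields
x.domain(d3.extent(data, function(d) { return d.DateTime; }));
y.domain(d3.extent(data, function(d) { return d.value; }));

var line = d3.svg.line()
    .x(function(d) { return x(d.DateTime); })
    .y(function(d) { return y(d.value); });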

Now it’s not rained much recently (it’s snowed a bit, but I’m guessing a “tipping bucket” needs a bit more than some wimpy flurries to register).

So I swapped weather station URL for somewhere near the Pennines (559100R) where there’d been a bit more rain recently and…

Whooop! Data, visualised using code and linked to a live source.

I have now reset the live data back to Frome.

7. Next steps

This worked to prove the concept, but there’s a lot more to do before it’s genuinely useful:

  • Find a way of displaying something when there’s no data because of a nil return, as opposed to a processing error; I can’t immediately tell the difference at the moment.
  • Find a visualisation that isn’t boring as hell.
  • Find out more about the underlying data – could there be some appropriate presets for the axes?
  • Work out how to have a bit more control over the x axis; those dates aren’t pretty and labelling is essential (see the sketch after this list).
  • Marvel at the decades you’ve been working with data having only just discovered that axes is the plural of axis.
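On the x axis point, here’s a minimal sketch of tidier tick labels in d3 v3 – the xAxis and x names are assumed to match the example chart rather than taken from my working code:

// fewer ticks, with a friendlier date/time format on the x axis
var xAxis = d3.svg.axis()
    .scale(x)
    .orient("bottom")
    .ticks(5)
    .tickFormat(d3.time.format("%d %b %H:%M"));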

We need to talk about (the small-area) Brexit (voting data)

The BBC recently published a piece on small area referendum voting data. The information was gathered, through Freedom of Information requests, from fewer than half of the local authorities (acting as ‘designated counting areas’) that administered the referendum. It’s a rather problematic piece of work.

This analysis is riddled with errors; there could be evidence of illegal practice; there are clear opportunities to improve practice around election data; and the implications of this release have relevance to other realms of government data.

It is incorrect

The analysis and the underlying data are flawed for a number of reasons which immediately spring to mind:

Postal votes weren’t counted spatially. The article repeatedly notes that postal votes aren’t allocated to the areas of their electors beyond the principal area of the election (this is technically untrue – they’re *probably* not allocated to areas) but makes no significant attempt to account for this error.

Mass FOI mail-outs are not and cannot be a systematic data collection method. 44% of counting areas (rather, their corresponding local authorities) responded. No consideration of statistical confidence is made in the analysis, nor is any proportional demographic assessment made of responders versus non-responders.

At its most basic level, comparing voters with population involves several layers of extrapolation; even in an area with unprecedentedly high turnout it can feel spurious. Turnout is a subset of the electorate, itself a subset of the eligible voters, which in turn is a subset of the population. The practice of sticking whole populations’ demographics from some nationally standardised data (usually the Census – hello 2011, don’t you feel like a lifetime ago!) on a scatterplot against the electorate data is, to be polite, rather crude.

I’d add that none of the above should necessarily invalidate the analysis; the disciplines of social research/political science are riddled with bias and I’ve seen far less rigorous analysis published to far greater acclaim. I do think it means the data warrants far more thought and debate than has so far accompanied it.

Some of it is possibly illegal.

A quick tangent into electoral practice: when you vote, you’re given an elector number. This is your sequence on the electoral register within an individual polling district. These are geographical areas allocated to individual ballot boxes in polling stations, and they share boundaries with the relevant electoral geography for the election in question. Local government elections being the smallest geographical elections, polling districts are also (usually) in line with parishes and wards.

And so, to the article:

A few councils released their data at remarkably localised levels, down even to individual polling districts (ie ballot boxes)…or clusters of two/three/four districts… Most places mixed boxes of postal and non-postal votes for counting, so generally it’s not possible to draw comparative conclusions. However there were a few exceptions which recorded them separately, or included a very small number of non-postal votes with the postals.

As identified by @richgreenhill on Twitter, the released data seems to give evidence that rule 46(2) of the referendum act, which requires that votes from individual ballot boxes are mixed with those from another box and that postal votes are counted alongside other ballots, was broken. It is worth caveating here that there may be reasonable responses to all of these (multi-polling-district ballot boxes being one immediately mentioned).

There is also no legal requirement to produce electorate statistics at any area lower than the principal counting area. The Electoral Commission’s guidance (p3) notes only the requirement for a Counting Officer (the responsible officer for each counting area) to declare for their voting area. This means that the process of allocating votes to a smaller geography is based on local administrative practice. Elections may be heavily regulated and systematic exercises, but there is a wealth of different on-the-ground practice around the management and counting of papers and the onward dissemination of data that is not explicitly prescribed in statutory notices. In addition to creating what I would suggest is at least a legal grey area, this creates a further challenge to the validity of the data.

There is a case for change.

Whether or not a local authority made the internal, administrative decision to collate small-area results, the data was simply not collected for this purpose. Every authority that responded would have been justified in saying ‘no’, but many released. That such a large volume of data could be gathered with relative ease suggests that the standard practice, guidance and legislation is deficient.

Does this make it any different from any other Open Data on its slow route to the public domain, or anything different from the next speculative FOI-based bit of journalism? Perhaps not; perhaps this is simply all part of the process, part of the ebb and flow of an increasingly data-literate society. Whatever the context, I would argue that the above challenges mean continuing to release data in this way is not desirable.

The Electoral Commission are well placed to take a view on this, perhaps with support from organisations such as the Open Data Institute, and I think it would be in their interests to do so sooner rather than later. I would imagine that we’ll be seeing more Freedom of Information requests of this nature over time, and data collection practices will inevitably become more varied. In a time when public perception of political institutions is challenging and intense political divisions are presented as material fact, it seems the time is right to put some thought into this practice.

There are wider implications for data release.

If there does turn out to be evidence of electoral malpractice in this data, then whose responsibility is it to identify that? Is it the individual local authorities’? Does the author/requester have an accountability to understand the legal context in which they operate? Or is it sufficient to leave it to a wider, interested community? My initial take is that administrative data is now so prevalent in the public realm that no single stakeholder can practically take sole accountability for it, regardless of legal reality.

Finally, looking towards the end of the article, we can see elements which are relevant to any non-standardised release of data.

…releasing the information was up to the discretion of councils. While some were very willing, in other cases it required a lot of persistence and persuasion…A few places such as Birmingham released their… data …on their own initiative, but in most cases the information had to be obtained by us requesting it directly, and sometimes repeatedly, from the authority.

Whatever the outcome of this debate (and ongoing confusion is by far the most likely), there are lessons about how creators and users behave with re-purposed data. It’s never been more critical to think about what we should legitimately, pragmatically and ethically do with it.

Four reasons local government data got difficult.

Pressures on local government are well documented; the pressures faced by local government research (by which I mean analytics in all its flavours) are perhaps less so.

I believe there are four key challenges we currently face.

1. There is more data than ever before

Surely the most self-evident truth facing anyone working in any field, let alone one in which data is an important currency. Every service (from Abandoned shopping trolleys to Zoos) adds reams of transactional data daily to the corpus of local government knowledge. Changing technologies allow more immediate feedback, more automation of transactions and more sensing, all creating more and more data. This creates opportunities, possibilities and a popular narrative that Something Should Be Done with this data, which in turn creates increased demand. We are asked to anticipate this demand and help decision makers understand the opportunities, challenges and risks in all this data.

2. There are new disciplines and technologies

and how.

“Big Data”, “Data Visualisation”, “Guided Analytics”, “Open Data”, “Data Science”, “Personal Analytics”, “Predictive Analytics”, “Machine Learning”, “Risk stratification”, “Natural-language question answering”…

are just a handful of the phrases and concepts doing the rounds. Some will inevitably become little more than redundant jargon, while others may cause the same sort of disruptive shift as data.police.uk did when it rendered my first job in this field (taking crime data out of a database to do longitudinal analysis) effectively pointless.

Local government is often painted as being behind the times, but the market will catch up with us even if we don’t catch up with it. Our citizens, our partners and our service providers will be live to these technologies, and so must we be if we want to continue to work with them.

3. We have fewer resources

This isn’t the place to debate the politics and practice of public sector austerity. Neither, however, is it possible to ignore the size (both in £s and in narrative) of the budget reductions experienced by the sector in recent years. Research is the very definition of a back office function: its value lies in influence and abstracted outcomes rather than in its direct output. It’s not hard to see why it could be a particularly tempting function to consider as an “efficiency saving”.

It has never been more critical to demonstrate ongoing impact and effectiveness of our work, particularly in the light of the previously mentioned demand pressures. As well as being live to new developments we have to strive to ensure that there is a genuine and politically articulated value to our work.

4. Local government reform has arrived

Significant local government reform is happening, even if it’s not as structured as it has been in the past. The Devolution agenda is creating new public bodies with new powers, and has the potential over time to radically reshape what local government can do. The opportunity a new public body has to react to the above pressures shouldn’t be underestimated, but it won’t be realised by behaving in the same ways as before. These new ways of working will engender new approaches to research and analytics – but what will they be?

What happens next?

It’s clear there are a number of distinct approaches emerging as to how local areas are responding to these challenges. A review of these will form some future posts.

D3 in WordPress Pt 2 – A more direct approach?

In fiddling around with wp-d3 I found myself getting increasingly confused, particularly when faced with code that looks like this. It wasn’t at all clear* how to translate it into something that would be understood by the plugin.

Once again I returned to searching, and this time came across this tutorial, which used an iframe (I’ve come across these before when embedding Tableau content in our wiki – they’re sort of a window pointing to content somewhere else, I think?).

This cut/paste/load approach seems infinitely more intuitive to me than the wp-d3 one, and seems to have delivered something reasonable. I’m sure there are 1001 great reasons not to take an iframe approach, but this seems to have worked…
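For anyone wanting to try the same trick, the embed boils down to something like this – the file name and dimensions are placeholders rather than the actual page I uploaded:

<iframe src="/wp-content/uploads/my-d3-chart.html" width="700" height="450" frameborder="0"></iframe>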


Next steps are to get some real content into this and then start fiddling with the design…

*big caveat – neither should it be: I am not intending to be critical of it, just that it made no sense to me as a complete newcomer to this.

Friday Lunchtime Learning – First Time with D3

First in an occasional series, maybe?

I’ve long been of the opinion that anyone who’s going to want to be anywhere near employment in the analytics market is going to have to learn to code sooner rather than later.

No, don’t panic, I’ve not done that.

Continue reading “Friday Lunchtime Learning – First Time with D3”