We need to talk about (the small-area) Brexit (voting data)

The BBC recently published a piece on small-area referendum voting data. The information was gathered, through Freedom of Information requests, from fewer than half of the local authorities (acting as ‘designated counting areas’) that administered the referendum. It’s a rather problematic piece of work.

This analysis is riddled with errors; there could be evidence of illegal practice; there are clear opportunities to improve practice around election data; and the implications of this release are relevant to other realms of government data.

It is incorrect.

The analysis and the underlying data are flawed for a number of reasons, several of which immediately spring to mind:

Postal votes weren’t counted spatially. The article repeatedly notes that postal votes aren’t allocated to their electors’ areas below the principal area of the election (strictly, this isn’t certain: they are *probably* not allocated to areas), but makes no significant attempt to account for this error.

Mass FOI mail-outs are not, and cannot be, a systematic data collection method. Only 44% of counting areas (or rather, their corresponding local authorities) responded. The analysis makes no consideration of statistical confidence, nor does it conduct any proportional demographic assessment comparing responders and non-responders.
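To illustrate the kind of responder/non-responder check I mean, here is a minimal Python sketch that compares one characteristic between the two groups of areas using a standardised mean difference (difference in means divided by a pooled standard deviation). The function name and all the figures are hypothetical, not taken from the article or from any real counting area; a genuine check would use Census or electoral data for every counting area and repeat it across several characteristics.

```python
from math import sqrt
from statistics import mean, stdev


def standardised_difference(responders: list[float], non_responders: list[float]) -> float:
    """Standardised mean difference between two groups of areas (hypothetical helper).

    Divides the gap in group means by the pooled standard deviation, so the
    result is unit-free: 0 means the groups look alike on this characteristic,
    and the larger the absolute value, the greater the imbalance.
    """
    n1, n2 = len(responders), len(non_responders)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two areas")
    s1, s2 = stdev(responders), stdev(non_responders)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("no variation within either group")
    return (mean(responders) - mean(non_responders)) / pooled_sd


# Invented shares of residents aged 65+ in each counting area (illustrative only).
responding_areas = [0.18, 0.21, 0.16, 0.24]
non_responding_areas = [0.15, 0.17, 0.14, 0.19, 0.16]
d = standardised_difference(responding_areas, non_responding_areas)
print(f"standardised difference: {d:.2f}")
```

A unit-free measure is used so that characteristics on very different scales (age shares, deprivation scores, past turnout) can be compared side by side; it says nothing about statistical confidence, which would need a separate treatment.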

At its most basic level, comparing voters with population involves several layers of extrapolation. Even in an area with unprecedentedly high turnout it can feel spurious. Turnout is a subset of the electorate, which is itself a subset of eligible voters, which in turn is a subset of the population. The practice of plotting a whole population’s demographics, taken from some nationally standardised source (usually the Census; hello 2011, don’t you feel like a lifetime ago!), on a scatterplot against electoral data is, to be polite, rather crude.
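To make the layering concrete, the minimal Python sketch below divides one hypothetical vote count by each successively larger base. The class, function and figures are invented for illustration; the point is only that the same ballot count yields quite different "rates" depending on which denominator a scatterplot implicitly assumes.

```python
from dataclasses import dataclass


@dataclass
class AreaCounts:
    """Nested counts for one area (hypothetical structure, invented figures)."""
    votes_cast: int
    electorate: int   # people on the electoral register
    eligible: int     # people eligible to register
    population: int   # all residents, as a Census would count them

    def __post_init__(self) -> None:
        # Encodes the nesting described in the post. Real registers can hold
        # duplicate or out-of-date entries, so this may not always hold in practice.
        if not (0 <= self.votes_cast <= self.electorate <= self.eligible <= self.population):
            raise ValueError("expected votes_cast <= electorate <= eligible <= population")


def layered_rates(area: AreaCounts) -> dict[str, float]:
    """Express the same vote count against each successively larger base."""
    return {
        "turnout (votes / electorate)": area.votes_cast / area.electorate,
        "votes / eligible": area.votes_cast / area.eligible,
        "votes / population": area.votes_cast / area.population,
    }


# Hypothetical area: 7,200 votes from 10,000 registered electors.
example = AreaCounts(votes_cast=7_200, electorate=10_000, eligible=10_800, population=14_000)
for label, rate in layered_rates(example).items():
    print(f"{label}: {rate:.1%}")
```

With these invented figures, a 72% turnout corresponds to votes from only about 51% of all residents. The gap widens further when the population figures come from a Census taken years before the vote.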

I’d add that none of the above should necessarily invalidate the analysis; the disciplines of social research and political science are riddled with bias, and I’ve seen far less rigorous analysis published to far greater acclaim. I do think, though, that these points show the data warrants far more thought and debate than has so far accompanied it.

Some of it is possibly illegal.

A quick tangent into electoral practice: when you vote, you’re given an elector number. This is your position in the sequence of the electoral register within an individual polling district. Polling districts are geographical areas, each allocated to individual ballot boxes in polling stations, and they share boundaries with the relevant electoral geography for the election in question. Because local government elections use the smallest electoral geography, polling districts are also (usually) aligned with parishes and wards.

And so, to the article:

A few councils released their data at remarkably localised levels, down even to individual polling districts (ie ballot boxes)…or clusters of two/three/four districts… Most places mixed boxes of postal and non-postal votes for counting, so generally it’s not possible to draw comparative conclusions. However there were a few exceptions which recorded them separately, or included a very small number of non-postal votes with the postals.

As identified by @richgreenhill on Twitter, the released data appears to show that rule 46(2) of the referendum act was broken; that rule requires votes from individual ballot boxes to be mixed with those from another box, and postal votes to be counted alongside ballot-box votes. It is worth caveating that there may be reasonable explanations for all of these cases (ballot boxes shared by multiple polling districts being one immediately mentioned).

There is also no legal requirement to produce electoral statistics at a geography lower than the principal counting area. The Electoral Commission’s guidance (p3) notes only the requirement for a Counting Officer (the officer responsible for each counting area) to declare the result for their voting area. This means that allocating votes to a smaller geography is a matter of local administrative practice. Elections may be heavily regulated, systematic exercises, but there is a wealth of varied on-the-ground practice around the management and counting of papers, and the onward dissemination of data, that is not explicitly prescribed in statutory notices. In addition to creating what I would suggest is at least a legal grey area, this poses a further challenge to the validity of the data.

There is a case for change.

Whether or not a local authority made an internal, administrative decision to collate small-area results, the data was simply not collected for this purpose. Every authority that responded would have been justified in saying ‘no’, but many released it. That such a large volume of data could be gathered with relative ease suggests that standard practice, guidance and legislation are deficient.

Does this make it any different from any other Open Data on its slow route to the public domain, or from the next speculative FOI-based piece of journalism? Perhaps not; perhaps this is simply part of the process, the ebb and flow of an increasingly data-literate society. Whatever the context, I would argue that the challenges above mean continuing to release data in this way is not desirable.

The Electoral Commission are well placed to take a view on this, perhaps with support from organisations such as the Open Data Institute, and I think it would be in their interests to do so sooner rather than later. I imagine we’ll see more Freedom of Information requests of this nature over time, and data collection practices will inevitably become more varied. At a time when public perception of political institutions is challenging and intense political divisions are presented as material fact, it seems the right moment to put some thought into this practice.

There are wider implications for data release.

If this data does turn out to contain evidence of electoral malpractice, whose responsibility is it to identify that? Is it the individual local authorities’? Does the author/requester have a duty to understand the legal context in which they operate? Or is it sufficient to leave it to a wider, interested community? My initial take is that administrative data is now so prevalent in the public realm that no one stakeholder can practically take sole accountability for it, regardless of legal reality.

Finally, looking towards the end of the article, we can see elements which are relevant to any non-standardised release of data.

…releasing the information was up to the discretion of councils. While some were very willing, in other cases it required a lot of persistence and persuasion…A few places such as Birmingham released their… data …on their own initiative, but in most cases the information had to be obtained by us requesting it directly, and sometimes repeatedly, from the authority.

Whatever the outcome of this debate (and ongoing confusion is by far the most likely one), there are lessons about how creators and users behave with re-purposed data. It’s never been more critical to think about what we should legitimately, pragmatically and ethically do with it.

2 thoughts on “We need to talk about (the small-area) Brexit (voting data)”

  1. Many thanks for taking an interest in the article. You raise a number of detailed issues. I won’t go into everything for reasons of time, but focus on what I think are the key points.

    The main reason various things you mention were not dealt with in the article was space. At 3,700 words it was already very long for a piece of journalism intended for a general audience.

    To take one example: I did indeed consider possible ways in which to account for postal votes, as you suggest, but because generally we did not know how many postal votes had been added to each ward I decided in the end that seeking to estimate this would cause more problems of uncertainty than it would solve. And having decided not to go down this route, I didn’t think the fact that I had reflected on it and decided against it was the most important detail to include in the piece, which as I say is aimed at a general audience, not people interested in statistical methodology.

    The areas for which data was available were indeed representative in terms of national vote shares. This was again something I examined. The r-squared figures were given, as the proportion of variation explained.

    Some of your points on methodology apply much more strongly to the numerous analyses published soon after the referendum which employed data at counting area level, and one reason for doing this much more granular analysis was precisely in order to mitigate such problems.

    But in any case fundamentally none of this invalidates the analysis and the overall broad conclusions, as indeed you yourself state. And to be clear: far from the article being ‘riddled with errors’, a statement you make while not actually pointing to any errors, no one has yet pointed out to me any errors at all in it.

    Obviously we would be better off if we had better data, but that’s not the world we’re in. My aim was to do something useful and informative with what we could get.

    I am also well aware of the legal issues you mention. We raised these with councils and with the Electoral Commission. I believe matters are still unresolved. But again I didn’t go into all of that because it was not the main focus of the article, which was about voting patterns rather than electoral law.

    I am not clear whether you think it would be better if this information was released as open data. I’m also not clear if you think it is wrong for journalists to ask for it. Personally I can see the case for open data, as for a start it would have saved us a great deal of work. But meanwhile as journalists we seek to get hold of what information we can. The great deal of public interest in this article has confirmed to me that the effort was justified.

    Please note, however, that this was not an exercise in freedom of information in legal terms. As the article clearly states, Electoral Returning Officers are not covered by FOI, and releasing the data was at their discretion. If FOI had applied, I am sure we would have got more data, which would have assisted the analysis.

    1. Firstly, thanks hugely for reading and responding!

      On the points of methodology, and the validity of my criticisms of it, I’ll agree to disagree; my aim was to outline limitations of the methodology and I believe I did so.

      On more specific points – “The areas for which data was available were indeed representative in terms of national vote shares.” – Would you mind sharing your working on that? In particular, what metric/s you used to make that determination? FOI respondents are by their nature self-selecting so I’d imagine applying any sampling methodology would be difficult.

      With regard to the exposure to legal risk, I don’t mean to criticise your activity, simply to observe the exposure. My hope is to influence policy and practice, but neither would I want to see anyone accidentally land themselves in trouble.

      A further point was to expressly note that trying to draw finer detail from random administrative practice across multiple local authorities is a bloody hard job; the old apples and pears analogy springs to mind. That you attempt it at all is to your absolute credit, and I think your lines of reasoning are sound; the flaws lie in the application and in the underlying data.

      I certainly don’t want to come across as anti-FOI, but I’d rather such things were handled consistently. More practically, I’m not sure FOI applying would have fixed much in terms of my concerns: it simply wouldn’t have been administratively possible to disaggregate the data as you requested unless that was a predefined requirement set earlier in the election process.

      As to my wider aim: whilst I don’t know what the answer is, I think someone (again, the Electoral Commission seem well placed) could provide some outline guidance on releasing electoral data at a geography lower than its primary administrative area. I’d hate to see this kind of partial analysis become the norm without a good understanding from all sides of what it does and doesn’t mean. You’ve pushed a new boundary of administrative data; that’s important and interesting, but it needs more consideration than could be provided in a 3,700-word article. I’m trying to provoke that.
