The Problem with Purdah

There is no such thing as ‘purdah’. That is what the Cabinet Office press office would tell you if you asked. It is the ‘pre-election period’ and, rather unusually this time (see Stefan’s post for why it is unusual), there is some comprehensive guidance available. I happen to think it is pretty useless, but it is comprehensive.

The entire mysterious process of ‘purdah’ (and despite this guidance it does remain mysterious) does the public a huge disservice. It is among the most frustrating things about the Civil Service (and, as you well know, it is not like there aren’t any other contenders!).

Though let’s be clear: I totally agree with the idea that the Civil Service should be seen as non-partisan and independent. You know what you are getting into when you join, and the Civil Service code (another mysterious item referred to much more often than sighted) makes that clear. The problem with ‘purdah’ is that it actually undermines that non-partisan status: it says to organisations that actually we can’t trust you, so we will gag you instead, no matter the cost.

I’m not really going to go into the effect of ‘purdah’ on individual Civil Servants other than to say this. Like so much else, the very spirit of what it is trying to achieve belongs entirely in a pre-internet era. When individual, personal Twitter accounts can count on more reach and engagement than official channels, where should you draw the line? The near-total muting of an entire workforce cannot be a sustainable solution. All it encourages is stealth communications, anonymous accounts and general confusion.

My biggest problem with ‘purdah’, though, is much more specific and born of personal experience. The use (and misuse) of data is becoming increasingly common in political campaigning, and yet when we need them most, the very body created to provide and police the use of that data is forced to stay silent. The Government Statistical Service has a mission that reads:

High quality statistics, analysis and advice to help Britain make better decisions.

Yet at a time when Britain is making the most vital of decisions, our independent experts on the data of the day are able to provide neither analysis nor advice.

Now Michael Gove might no longer want to confer with experts, but I think it is safe to say that they still have a role to play, and benching them all at a time like this seems increasingly like folly (and it isn’t just the statisticians, economists and demographers in Newport and Fareham: there are Civil Service experts all over the UK who will be biting their tongues for the next seven weeks).

These experts, engaging in the right way, on the right channels, are the best weapons against #fakenews, and not allowing them to operate is basically surrendering for the entire pre-election period.

There are amazing organisations like Full Fact who step into the breach, but they themselves have called on the Government to re-examine the rules when it comes to these roles.

When I first pitched the idea that became Visual.ONS, the ambition was always that it would provide an accessible route into ONS data, using interesting formats and providing timely tools based on what was happening in the world. This gets rather weakened, though, if you are forced to go dark for the entire period and cannot even promote the work that already exists!

Clearly nothing was going to change this time (a snap election caught everyone on the hop) but there really needs to be some evidence of change. With Brexit consuming pretty much every corner of the Civil Service, though, I fear the guidance will just once again be put in a drawer (or the GDS web archive) to be dusted off in five years with no thought given to it in the meantime, and who knows where we will be then!

I can say this now because, for the first time in almost 15 years, I am not covered by the Civil Service code or ‘purdah’. In my heart, though, I know I am not finished with that chapter and will one day return, so I’ll keep pushing for a new approach to save my frustrations when I do!

API Days

In my last post I at least tried to make the case for:

Publish for humans, all of the humans. But don’t forget the machines.

This time I’m going to talk a little about what we might get from those machines, because I’m not convinced it is always what people are expecting.

While it can be easy to compare our website to something like GOV.UK or the other statistical institutes around the world I often find it more helpful to compare it to something like the Guardian website. Functionally we are essentially a publisher of multiple story/report formats each made up of multiple components (words, tables, charts, interactive tools, maps, images, spreadsheets — lots and lots of spreadsheets) with collaborative, multidisciplinary teams working to strict deadlines.

So when I came across a report about the use of open APIs by news organisations (primarily the Guardian, New York Times and NPR) by one of the original authors of the Cluetrain Manifesto — David Weinberger — I settled down to read and learn.

After all, the ONS Beta site is essentially a set of APIs with a user interface (albeit one where we have sweated over every button, label and interaction), and Florence, our publishing application, is the same. We have a commitment, maybe even a responsibility, to encourage the use of our (open) data, and providing open, public APIs has long been held up as a way of achieving this. We have made the underlying JSON available from day one (visible by appending /data to any URI), and documenting what is possible/available is a task fighting its way up the backlog.
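That /data convention can be sketched in a few lines. This is a minimal illustration only: the page path below is an assumption for the example, not a guaranteed endpoint, and actually fetching the JSON would of course need network access.

```python
def data_uri(page_uri: str) -> str:
    """Return the machine-readable '/data' variant of an ONS Beta page URI."""
    return page_uri.rstrip("/") + "/data"

# Illustrative page path, not a guaranteed endpoint.
page = "https://www.ons.gov.uk/economy/inflationandpriceindices"
print(data_uri(page))
# The JSON itself could then be fetched with, say, urllib.request.urlopen.
```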

“It was a success in every dimension except the one we thought it would be.” Daniel Jacobson, former Director of Application Development at NPR

One of the standout findings from the report is that when they released their APIs (all within months of each other, back in 2008) the big motivator was that ‘it would let a thousand flowers bloom’. Developers would see this as something on which to build. As Rufus Pollock once said:

“The best thing to do with your data will be thought of by someone else.”

The reality, however, was somewhat sobering. Despite an initial burst of development and innovation, those thousand flowers never really materialised. What the APIs did provide was an almost outsourced R&D function: they could all see what ideas people had, even if they weren’t fully formed, and this influenced the direction of internal development.

That is important, because where the focus on APIs absolutely proved its worth was in supporting internal development. All the teams spoken to found themselves able to react much more nimbly to development demands (the most obvious for all of them being the release of the iPad) where they had APIs to build on. The embodiment of ‘eating your own dog food’.

There were other wins that are interesting to us: the ease and flexibility of syndicating stories and assets improved, it became easier and quicker to experiment with and prototype new features, and it was possible to constantly improve their CMSs.

Now obviously these lessons might not transfer to us, but they are worth considering. I think there is still an expectation that if we can get the API right there will be an explosion of apps using our data.

Robert L. Read, one of the founders of 18F in the US, certainly seems to think there is still a built-in audience for Government APIs, and that a priority should be to ‘democratize the data’ first and foremost, because technologists will provide expert interfaces* to that data/service faster than Government will create the UI. Hhhmmmm.

The more likely ‘customer’, to me, is enterprise in scale and looking to hook our data up to their own systems: people like Bloomberg, the Financial Times and Local Authorities spring to mind. This would be important, but doesn’t really do much to support our open data agenda as such. Still, a good set of APIs, with useful documentation and solid performance, should make everybody happy, so if there is a chance for those thousand flowers to bloom we need to be ready.

*he seems to suggest that this interface could simply be an expert intermediary.

Show your workings: a digital statistical publication

Russell has written a post that touches on some of his thinking about what a ‘digital white paper’ might look like, and in doing so draws attention to Bret Victor’s tour de force of a ‘longread’ about climate change. The real brilliance of Victor’s work is that it is not only wonderfully interactive but also fulfils that old staple of maths classes: ‘show your workings’.

Given where I work, my primary project and my recent reading and writing, it probably isn’t a surprise that I have found this interesting.

One of the things I keep noodling on in my spare moments is what a truly digital statistical publication might look like. To be honest, other, better-qualified people are looking at more immediate, practical responses to that question, whereas I am really using it as something on which to hang various ideas and hunches about the future of digital publishing, to give things some kind of structure.

So the ability to expose the methodology behind a particular statistic, and make it explorable in place, might make for an interesting experiment. Our user research has identified an expectation that our statistics are methodologically sound, above and beyond what is perhaps expected elsewhere, and making that visible (it is always available, and on our new site much more obvious) would provide pretty radical levels of transparency.

There is almost certainly something to be learned from ‘open science’ here, and in particular from ideas about ‘open notebooks’. The more transparent you are, the more trust you build in the results. That said, we have very important disclosure rules to consider at all times, so it isn’t as simple as providing all the underlying data to allow truly replicable ‘experiments’.

Our QMI documents (for example) already provide a great source of information, but they are far from ‘digital first’, with most of the pertinent information locked away in a PDF. The challenge would be surfacing that in an ‘of the web’ rather than ‘on the web’ sort of way.

We already do a better job than Russell’s complaint about white papers:

“tables and the diagrams you get are included for the rhetorical power of their presence rather than any explanatory work they might do”

…and every report (we actually call them Bulletins, but that is another blog post) comes with a whole set of supporting ‘reference tables’ in Excel, but it still feels a bit disjointed. The real power (I think) would be presenting the narrative and the data together, seamlessly, in a way where they can be queried and explored (while still providing the data free from words for those who like their statistics straight, with no mixer).

Given my role, the big thing I am always thinking about with these ideas is whether they are repeatable. I have no interest in trying to provide a system that can support a thousand unique snowflakes (or, god help us, Snowfalls), so an additional challenge would be creating something that could work across multiple outputs.

Maybe. Anyway.

Pretty much as I was writing this, Leigh Dodds wrote a complementary post that shows just how this kind of development could make things better.

A hobbyist hacker, a spreadsheet warrior and an API advocate walked into a pub…

…and they all tell the bar staff they can’t pour a pint.

That is basically the life of being a data publisher at the moment.

Leigh has written a great post, ‘Who is the intended audience for open data?’, where he articulates some of the problems and clashes of culture better than I could, but I just want to pick up on a couple of things.

For the purpose of this post I have magicked up three new ‘personas’:

The hobbyist hacker — someone usually extremely technical, who advocates (hard) for machine-readable open data, gives up a lot of their own free time to support hackdays and volunteer projects, has particular passions about how this stuff should be done, and whose day job is in a parallel field (i.e. they work in software development or similar, but not somewhere actually providing the data). Just to be clear, the hobbyist bit isn’t supposed to sound dismissive; I just like the way ‘hobbyist hacker’ scans 🙂

As Leigh mentions, this group can be pretty hard on any efforts to improve data releases that are not focused on the needs of machines. Basically, they want it to be (much) easier to programmatically query, aggregate and visualise the data (from multiple publishers). I absolutely understand where they are coming from, agree with the needs and have a lot of the same objectives in my role. The problem is that this isn’t the whole story.

The vast majority of ONS data users would fall into the next entirely made-up ‘persona’: the spreadsheet warrior. These people work with ONS data day in and day out. Their weapon of choice, though, is not Python or even R; it is Excel. They are the audience the GSS (and Full Fact) spreadsheet guidance sought to support. They need ‘human readable’ spreadsheets. The Venn diagram of needs between personas one and two would have more than its share of overlap, but they are not the same. To be clear, most data publishers (us included, for sure) don’t do this well enough, and if we are looking at a hierarchy of user needs (which I am not, as I am just babbling away on a weekend) then providing the best possible experience here would be where we would get the biggest win (well, assuming we made the data findable, but that is another post).

The third, and final for the purposes of this, persona is the API advocate. This group — again, for the purposes of this — tend to sit somewhere between the two previous camps. They are often in-house technical staff whose role is to support organisations that have to use the data day in and day out, though they themselves are not always the end user. They share a lot of digital DNA with the hackers, but the core difference is that they are trying to find a way to improve tools and processes at their day job. They see APIs (and open data more widely) as an opportunity to improve things, but they need them to be documented, supported and stable, as they want to integrate them into often vital business systems. They are a growing, but still relatively small, group and one that really needs cultivating. My feeling is that finding a way to help these users will create an environment that improves the offering for everyone.

Leigh ends his post with the advice:

Publish for machines, but don’t forget the humans. All of the humans.

I’d just like to reorder that a little:

Publish for humans, all of the humans. But don’t forget the machines.