PASC lives: Open data & statistics

One of the (many) new experiences I am encountering in this job is the scrutiny of the ‘Public Administration Select Committee (PASC)’. According to its Twitter bio, PASC is:

“..a cross-party group of MPs appointed by the House of Commons to scrutinise the Government.”

At least in part the focus of my job has been shaped as a response to an earlier ‘study’ around the Communication of Statistics, which spent some time investigating the limitations of the current website. The final report didn’t pull any punches but was also a fair reflection of the state of play.

This week a new ‘study’ started, focusing on ‘open data’ and ‘statistics’. As I gravitate more and more towards the ONS open data work, I read the submitted written evidence and transcripts with great interest (watching live was beyond the capabilities of my work laptop!).

You can read all the written evidence on the site. In general I found it useful, but the focus was very much on just ‘open data’ rather than the interface between ‘open data’ and ‘official statistics’, which is where I think things might get a little trickier, with more grey areas. The written submission from ‘Full Fact’ about the idea of ‘open statistics’ was more along these lines and was based on the famous (in some circles) five-star ratings for data from Sir Tim.

Full Fact recommended the following:

0 – Published, with tabular data provided as spreadsheets. (Although this is equivalent to 2* open data, in statistics it is a bare minimum. Statistics not published in this way would usually be in breach of the Code.)

1* – Basic metadata included. For example, the geographic scope of statistics (e.g. UK vs England and Wales) and whether or not financial time series are inflation-adjusted.

2* – The above, made available at a consistent URL (web address) with a consistent title or identifier and open machine readable standards used for data where applicable.

3* – All of the above, but include explanation and caveats.

4* – All of the above, but in addition to using open formats, use URLs to identify things using open standards and w3c recommendations so that other people can point at the data.

5* – All of the above, but in addition to using open formats and URLs to identify things, link your data to other people’s data to provide context.

I’m not sure if this is correct – I still don’t know enough about the statistics side of things but it certainly seems to be a good starting place for a conversation.
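For what it is worth, the thing that stands out to me about the scale above is that it is cumulative: each level says ‘all of the above’, so a release only earns a star if it has earned every star below it. Here is a minimal Python sketch of that idea. The field names are labels I have made up for illustration, they are not from Full Fact or from any real ONS system, and deciding whether a release actually meets a criterion is the hard part that this ignores completely.

```python
from typing import Optional

# Criteria in the order of the Full Fact scale quoted above.
# These key names are hypothetical labels chosen for this sketch.
CRITERIA = [
    "has_basic_metadata",           # 1*
    "consistent_url_and_format",    # 2*
    "has_explanation_and_caveats",  # 3*
    "uses_urls_as_identifiers",     # 4*
    "links_to_other_data",          # 5*
]


def star_rating(release: dict) -> Optional[int]:
    """Return 0-5 stars, or None if the release does not even reach level 0."""
    # Level 0: published, with tabular data provided as spreadsheets.
    if not release.get("published_as_spreadsheet", False):
        return None

    stars = 0
    for criterion in CRITERIA:
        # The scale is cumulative, so stop at the first unmet criterion.
        if not release.get(criterion, False):
            break
        stars += 1
    return stars


example = {
    "published_as_spreadsheet": True,
    "has_basic_metadata": True,
    "consistent_url_and_format": True,
    # No explanation or caveats, so linking to other data does not count yet.
    "links_to_other_data": True,
}
print(star_rating(example))  # prints 2
```

The example release links to other people’s data but has no explanation or caveats, so under this reading it stops at 2* rather than skipping ahead.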

Helen Margetts, Director of the Oxford Internet Institute, summed up my ongoing issue with the world of ‘open data’ almost immediately in her oral evidence:

‘Open data’ is an interesting term and it gets many terms bundled into it…The confusion comes from the fact that it basically has three aims…They are: greater transparency; improving government, as somehow open data will make government better; and encouraging innovation and enterprise in the wider commercial world.

I think the OKFN definition of ‘open data’ that is also mentioned gets closer to the mark in reality but it is the constant bundling of other things under the umbrella of ‘open data’ that makes it a difficult thing to pin down.

Joining Professor Margetts in giving evidence were Dr Ben Worthy, a lecturer in Politics at Birkbeck College; Tom Steinberg, the Director of MySociety; Heather Savory, Chair of the Open Data User Group; and Dr Rufus Pollock, CEO and founder of the Open Knowledge Foundation.

It was nice to see both Heather Savory and Rufus Pollock praise the ONS as leaders in Government ‘open data’ – if I’m honest I’d like to dig into this more with both of them and see what it is we do that they appreciate. It is nice to hear and clearly something we should really build on.

Tom Steinberg, who I admire greatly for his work with MySociety, came across a little battle-scarred in his evidence (probably not surprisingly), but I thought he made a very practical point that might be missed in the rush for the big picture. To make the release of ‘open data’ efficient and cost-effective we have to build that capability into our software solutions from day one, and this means changes at the procurement stage rather than trying to bolt things on later.

I certainly learned a great deal from reading all of this and my overall impression is that more needs to be done to separate the different strands under the ‘open data’ banner and decide what it is we are trying to achieve. Is it about ‘transparency’, ‘economic benefits’ or some move to ‘open government’? Or maybe we are trying to achieve all of these things and ‘open data’ is just what fuels these different objectives?

I also still need to better understand the specific challenges of releasing the ONS’s underlying data, given the issues around personal data and things like the ‘mosaic effect’.

And, much to the amusement of Laura I am sure, my interest in this area continues to grow (though I was happy to see little if any mention of the dreaded semantic web or linked data 🙂).


