A hobbyist hacker, a spreadsheet warrior and an API advocate walk into a pub…

…and they all tell the bar staff they can’t pour a pint.

That is basically the life of being a data publisher at the moment.

Leigh has written a great post, ‘Who is the intended audience for open data?’, in which he articulates some of the problems and culture clashes better than I could, but I want to pick up on a couple of things.

For the purposes of this post I have magicked up three new ‘personas’:

The hobbyist hacker: someone, usually extremely technical, who advocates (hard) for machine-readable open data, gives up a lot of their own free time to support hackdays and volunteer projects, has particular passions about how this stuff should be done, and whose day job is in a parallel field (i.e. they work in software development or similar, but not somewhere that actually provides the data). Just to be clear, the ‘hobbyist’ bit isn’t supposed to sound dismissive; I just like the way ‘hobbyist hacker’ scans :)

As Leigh mentions, this group can be pretty hard on any effort to improve data release that isn’t focused on the needs of machines. Basically, they want it to be much easier to programmatically query, aggregate and visualise data from multiple publishers. I absolutely understand where they are coming from, agree with those needs and share a lot of the same objectives in my role. The problem is that this isn’t the whole story.

The vast majority of ONS data users would fall into the next entirely made-up persona: the spreadsheet warrior. These people work with ONS data day in and day out. Their weapon of choice, though, is not Python or even R; it is Excel. They are the audience the GSS (and FullFact) spreadsheet guidance sought to support, and they need ‘human-readable’ spreadsheets. The Venn diagram of needs for personas one and two would show plenty of overlap, but the two are not the same. To be clear, most data publishers (us definitely included) don’t do this well enough. If we were looking at a hierarchy of user needs (which I’m not, as I’m just babbling away on a weekend), then providing the best possible experience here is where we would get the biggest win (well, assuming we had made the data findable, but that is another post).

The third and final persona, for these purposes, is the API advocate. This group (again, for the purposes of this post) tends to sit somewhere between the two previous camps. They are often in-house technical staff whose role is to support organisations that use the data day in and day out, but they themselves are not always the end users. They share a lot of digital DNA with the hackers, but the core difference is that they are trying to improve tools and processes in their day job. They see APIs (and open data more widely) as an opportunity to improve things, but they need those APIs to be documented, supported and stable because they want to integrate them into business systems, often vital ones. They are a growing but still relatively small group, and one that really needs cultivating. My feeling is that finding a way to help these users will create an environment that improves the offering for everyone.

Leigh ends his post with this advice:

Publish for machines, but don’t forget the humans. All of the humans.

I’d just like to reorder that a little:

Publish for humans, all of the humans. But don’t forget the machines.