A major preoccupation of my day job is finding ways to provide as much data as possible in open, reusable formats. In fact, probably the only thing that occupies more of my cognitive activity is the user experience challenge of getting people to that data.
As we start to make breakthroughs with that first preoccupation, my ‘cognitive surplus’ has increasingly turned to the possibilities for innovation once we have this ‘open data lake’ in machine-readable formats.
One thing I am particularly fascinated by is what seems to be popularly termed ‘robot journalism.’ This is software that interrogates complex datasets and, using magic*, produces written analysis rather than dashboards or visualisations.
The Associated Press already uses Wordsmith from Automated Insights to produce stories for their newswire, especially in data-heavy topics like finance and US sports (which are obsessed with stats). The Wire covered this phenomenon a couple of months ago, noting that Yahoo uses the same technology to provide ‘Fantasy Football’ coverage for a huge audience.
It is still a relatively young technology, with Automated Insights and a company called Narrative Science dominating what they call ‘natural language generation’. What makes it particularly interesting, I think, is that you can feed the system from multiple types of structured data – spreadsheets, databases, APIs – and you can tune the language generation to better follow editorial guidance such as Style.ONS.
At the ONS we generate a lot of commentary about our statistics, and a significant amount of it follows a common template month on month. Potentially, a data-driven reporter in this mould could free the statisticians up to spend more time on identifying the real insights within the data, which is what they enjoy and where they add most value, rather than recycling content month on month with just the numbers changed. Time is always tight and deadlines are pretty much set in stone, so you can see how software like this could find a place in the workflow. Now, there are all sorts of questions about quality assurance and trust, and I am not saying it is something that is on the horizon, but I am seriously intrigued by the possibilities.
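To make the ‘same template, new numbers’ idea a bit more concrete, here is a very rough sketch in Python. It is nothing like what Wordsmith or Narrative Science actually do under the hood (I have no idea how they work internally), and everything in it is made up for illustration: the `MonthlyFigure` structure, the `describe_change` and `write_commentary` functions, the wording of the template and the example figures are all my own assumptions, not real ONS data or a real ONS process.

```python
from dataclasses import dataclass


@dataclass
class MonthlyFigure:
    """One hypothetical data point; not a real ONS series."""
    measure: str      # e.g. "retail sales volume"
    period: str       # e.g. "March"
    value: float      # this month's value
    previous: float   # last month's value


def describe_change(current: float, previous: float) -> str:
    """Turn a month-on-month change into a short phrase."""
    if previous == 0:
        # A percentage change is undefined against a zero baseline.
        return "could not be compared with the previous month"
    change = (current - previous) / previous * 100
    if abs(change) < 0.05:
        # Too small to show at one decimal place, so call it unchanged.
        return "was unchanged on the previous month"
    direction = "rose" if change > 0 else "fell"
    return f"{direction} by {abs(change):.1f}% on the previous month"


def write_commentary(figure: MonthlyFigure) -> str:
    """Fill a fixed, pre-approved sentence template with new numbers."""
    return (
        f"In {figure.period}, {figure.measure} "
        f"{describe_change(figure.value, figure.previous)}."
    )


if __name__ == "__main__":
    # Made-up figures purely to show the output shape.
    example = MonthlyFigure("retail sales volume", "March", 102.0, 100.0)
    print(write_commentary(example))
```

The point of the sketch is only that the wording lives in one reviewed place while the numbers flow in from structured data each month. A real system would need far richer language rules, plus the quality assurance questions mentioned above.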
For something like the next Census in a few years, the possibilities could be enormous – especially if you combine it with something like the location-generated stories the NY Times was recently experimenting with. Imagine it: you select a location and the algorithm(s) produce the story based on a pre-approved template. Sprinkle in some location-aware data visualisations and you could have a very compelling product.
This is all very blue-skies, but as the AP are demonstrating, it is happening right now. As William Gibson once said:
The future is already here — it’s just not very evenly distributed.
* Probably not, but it seems that way 🙂