Musings on Linked Data stuff

So apparently “the idea that doing linked data is really hard is a myth”, according to attendees at last week’s LinkedDataLondon event. I have to admit to wondering what they are comparing it to, then, as it seems to me that it is anything but easy.

I am a little bit of a Linked Data skeptic, I will admit. It still has too many undertones of the Semantic Web for me ever to be entirely comfortable with it, and I’ll be honest and say I am not quite sure how the Open Data movement became so overtly associated with Linked Data. It seemed to me that it was only one option amongst many, yet it is now increasingly pushed as THE solution. I guess the involvement of Tim Berners-Lee in the Government’s Open Data programme was always going to lead to things having this spin, as it is a direction he has pursued for many years.

The thing is, I am willing to be convinced – most of the cleverest web folk I know are putting their intellectual and professional weight behind Linked Data, and JISC is putting a not insignificant chunk of money into it as well, so I have to keep an open mind.

I am interested to see the work the BBC have been putting into their Wildlife site in various areas, and the concept of your ‘site as your API’ is compelling, but the BBC is not your typical website, nor is it a typical web team, and very few of the people talking about this stuff are actually the people who have to run big, information-rich websites on a day-to-day basis and deal with all the issues that brings up. Amongst those issues is dealing with web publishers who do not even know HTML, let alone understand RDFa.
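
For what it’s worth, once you strip away the jargon the ‘site as your API’ idea itself is reasonably simple: the same URL serves people an HTML page and serves machines a data view, both generated from the same underlying content. The toy sketch below is roughly the shape of it – my own illustration in Python, using Flask and rdflib, not anything the BBC has published.

```python
# Toy sketch of "your site as your API": one URL, two representations.
# Flask and rdflib are assumed; the URIs and data are invented for illustration.
from flask import Flask, Response, request
from rdflib import RDF, Graph, Literal, Namespace

app = Flask(__name__)
EX = Namespace("http://example.org/ns/")
WILDLIFE = Namespace("http://example.org/wildlife/")

# Stand-in for the CMS database: publishers author plain content, never mark-up.
SPECIES = {"red-fox": {"label": "Red fox", "family": "Canidae"}}


def as_graph(slug, record):
    """Build a small RDF graph describing one species record."""
    g = Graph()
    thing = WILDLIFE[slug]
    g.add((thing, RDF.type, EX.Species))
    g.add((thing, EX.commonName, Literal(record["label"])))
    g.add((thing, EX.family, Literal(record["family"])))
    return g


@app.route("/wildlife/<slug>")
def species_page(slug):
    record = SPECIES.get(slug)
    if record is None:
        return Response("Not found", status=404)
    # A client asking for RDF gets RDF/XML; a browser gets HTML. Same URL.
    if "application/rdf+xml" in request.headers.get("Accept", ""):
        rdf = as_graph(slug, record).serialize(format="xml")
        return Response(rdf, mimetype="application/rdf+xml")
    html = "<h1>{label}</h1><p>Family: {family}</p>".format(**record)
    return Response(html, mimetype="text/html")
```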

Maybe the work Drupal are doing with Linked Data will make this all easier.

So does this mean a return to the old ‘webmaster’ model of running websites – where content was pushed to a central person or team who took care of mark-up, QA and publishing? I am not saying this is a bad thing, but in an era of job cuts I can’t see many teams being given the resources to achieve that.

A couple of other ideas came out of the discussions that made me wince a little.

The first was that it is more important for URIs to be persistent than to be readable by people (the machine vs human debate). I’m never going to be happy with this – I have spent half my career fighting to get away from dodgy, database-generated URIs that make no sense, towards a web of readable URIs that are logical (I will forever use Traintimes.Org as an example here…). It seems to me that cleverer people than me have made the case that the tools exist within the HTTP/DNS world to achieve both persistence and readability. That said, if it comes to a choice, I know which side I’ll be on.
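
For the record, the pattern I understand them to mean is not complicated: give each thing a stable identifier that never changes and have it redirect to wherever the readable page currently lives, so renames never break old links. A rough sketch of the idea – mine, in Python with Flask, with entirely made-up paths and tokens:

```python
# Toy sketch: a persistent identifier living alongside readable URLs.
# Flask is assumed; every path and token here is invented for illustration.
from flask import Flask, abort, redirect

app = Flask(__name__)

# Stable, never-reused identifiers mapped to wherever the page lives today.
# When pages are renamed or restructured only this mapping changes, so old
# links and bookmarks to the /id/ form keep working.
CURRENT_PATH = {
    "a1b2c3": "/wildlife/mammals/red-fox",
}


@app.route("/id/<token>")
def persistent_id(token):
    path = CURRENT_PATH.get(token)
    if path is None:
        abort(404)
    # 303 ("See Other") is the redirect commonly used for identifier URIs.
    return redirect(path, code=303)


@app.route("/wildlife/<path:slug>")
def readable_page(slug):
    # Placeholder for the real, human-readable page.
    return "<h1>Red fox</h1>"
```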

The other concept was one that I need to understand better, but wince I did: getting away from the file/folder metaphor because it is too limiting for the Linked Data web of ‘things’. This might well be true, but it is a useful and well-understood way of explaining things on the web, and unless people come up with an equally understandable way of explaining things (and not just from one web scientist to another!) then that is a problem.

Maybe I am simply getting the wrong end of the stick on a consistent basis, or maybe my original prejudices about the Semantic Web are too ingrained for anything similar to stand a chance with me. I do, however, hope that I get hit by the lightning bolt soon and that it all becomes clear.

I am in the process of reading Paul Miller’s Linked Data Horizon Scan that JISC funded, and hopefully this will start to answer some of my questions. I’ll buy Paul a beer next time I see him if it does.

Updated: 09:42 01/03/2010 with link to http://www.frankieroberto.com/weblog/1621

5 thoughts on “Musings on Linked Data stuff”

  1. Hi

    As the person that said the idea that doing linked data is hard is a myth, and as one of the people on the side of persistence, I thought I should try to explain where I’m coming from on a few things:

    How hard is it? In my experience the thing that’s difficult is the basic IA – working out how to structure your information is difficult, but it is difficult when designing and building any large site, linked data or not. Modelling is difficult. Publishing that information as RDF/XML is easy – it really is.

    Human-readable URLs and persistence – if you can create human-readable and persistent URLs then that is marvellous and you should do it. Indeed that’s what I’ve done with bbc.co.uk/wildlifefinder. But it’s not always that easy – when you have very large volumes of data (e.g. bbc.co.uk/programmes) creating human-readable URLs is not practical: you could do it, but the cost would be prohibitive. And if I had to choose between human-readable URLs and persistence I would go for persistence. But that’s not for linked data reasons; it’s because people bookmark, link to and discuss your content using your URLs, and it’s best not to break them.

    Re files and folders – again this isn’t about linked data but a recognition that trying to define a mono-hierarchy isn’t intuitive as soon as you get lots of stuff. In my opinion, modelling information and the relationships between bits of information in the way people think about them, rather than shoehorning it into a file/folder structure, makes sense.

    FWIW none of this means that content needs to be mediated via a webmaster – at the BBC none of the people creating content on e.g. /programmes, /music or /wildlifefinder deal with mark-up, and none of it is done via a webmaster. They just author content.

  2. Matt says:

    Tom

    Thanks for the comprehensive reply!

    Just to be clear, from my standpoint I am absolutely in favour of digital preservation and understand the role of persistent IDs in that world (after all, I do work for an organisation for whom it is a major preoccupation). I just hope that we can find a way to achieve that while maintaining human-readable URIs, rather than the terrifying machine-generated identifiers I have seen suggested elsewhere. I can see why this might not be possible for an enormous dataset like /programmes, but very few of us are operating on that scale.

    Like I said in my post, I do for the most part think the BBC is a unique case in an awful lot of ways, if for no other reason than that you have content creators producing markup! In the 10 years I have been involved with large, information-rich, CMS-driven websites I can only think of a tiny percentage of people who would be able or willing to do that (and I wouldn’t be one of them). It has been my experience that the majority of people publishing on websites I’ve been involved with never looked beyond the WYSIWYG view and only supplied the minimum amount of metadata to get published – the idea that they would add RDFa seems a long way off.

    As for the files/folders thing – I regret writing that, if I’m honest. I’m not sure why it bugged me, as it’s been a long time since I thought of websites in those terms – maybe it was nostalgia!

  3. I’m not saying, nor have I ever said, that all URLs should be opaque; but we did go with opaque URLs for /programmes because of the volume of data. And if you have to choose, I would pick persistence over human-readable URLs, but not for LOD reasons.

    If you can have human-readable URLs and persistence then you should do that. That’s what we’ve done for Wildlife Finder – if you can only have one, I would go for persistence.

    I think I’ve also misled you re what folks do vis-à-vis data entry. The data behind /programmes (and /music and Wildlife Finder) is stored within a database (as data, not HTML) – to edit the info you see on a page, nobody edits mark-up, neither webmaster nor content producer. The webpage is generated dynamically from a web app – and that’s true for all the representations (desktop, mobile and RDF/XML).
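
    To make that concrete: the RDF/XML view is just another rendering of the same stored data, and once the model is decided the serialisation step is trivial. A toy sketch (in Python with rdflib, purely for illustration – not the BBC’s actual code):

    ```python
    # Toy sketch: model the relationships between things, then serialise.
    # rdflib is assumed; the URIs and properties are invented for illustration.
    from rdflib import RDF, Graph, Literal, Namespace

    EX = Namespace("http://example.org/ns/")
    WF = Namespace("http://example.org/wildlife/")

    g = Graph()
    # Relationships between things, rather than a file/folder mono-hierarchy.
    g.add((WF["red-fox"], RDF.type, EX.Species))
    g.add((WF["red-fox"], EX.commonName, Literal("Red fox")))
    g.add((WF["red-fox"], EX.memberOf, WF["canidae"]))
    g.add((WF["canidae"], RDF.type, EX.Family))

    # The whole "publishing that information as RDF/XML" step:
    print(g.serialize(format="xml"))
    ```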

Comments are closed.