Contemplating a content migration plan B


It is becoming more and more apparent that I might actually have to *do* something about this new website I have been talking about for about 18 months. The project to procure a CMS has some real momentum and, probably for the first time, I think it is going to happen this year. [No idea what that CMS will be though – but whatever it is will be a massive step forward!]

The technical side of the new site will clearly be driven by the CMS and the visual design is likely to be driven by outside forces to some extent as well so I am not worrying *too* much about that yet.

The two things that are starting to preoccupy me though are the information architecture (which continues to vex me as it has for months but I think we are closing in on a schema which will be tested this summer) and also the content.

In particular I am wondering about the existing content. We don’t have a huge site – Google only registers about 9k URLs including all the ‘vanity URLs’, redirects and documents – and I reckon we only have 1500 HTML pages and a similar number of PDFs etc if truth be told. A large amount of our content is also very time sensitive (funding calls have a time window, as do news stories) and traffic drops off a cliff pretty sharpish for certain pages.

I was re-reading this piece by Gerry McGovern recently – Web content migration: disastrous strategy – and it got me to thinking about whether our planned ‘lift and shift’ strategy was really the best course of action.

What I would like to do is start anew on the new site – select the most sought after content, freshen it up (a lot in some cases), rethink the format of some content from the ground up, fill in some of the gaps we know we have and basically launch the new site with content that really has a user focus. This all sounds a little like a ‘content strategy’ I guess – just as well I am going to Bathcamp on Wednesday 🙂

For now I am more interested though in what to do with all the pages that wouldn’t be migrated. I am writing a proposal based on the following idea but am hoping someone who reads this will point out if it is stupid in advance of me sending it to anyone important 🙂

The idea is to use software like Heritrix (if I can ever get my head around it) or perhaps a company like Hanzo (if funds permit) to create a web archive snapshot of the website prior to the launch of the new site.

This ‘archive’ would be hosted at a domain like http://www.archive.mrc.ac.uk (for instance) and all existing pages that are not reproduced on the new site would be 301 redirected to there. A pop-up (or modal dialogue!) in the style of Gov.UK warnings would warn users that it was an archived page and no longer updated, but the Google juice would not be lost and we would maintain persistence around our URLs.
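To make the redirect idea a bit more concrete, here is a rough Python sketch of how the mapping might be worked out. It is only an illustration under my own assumptions: the function name, the example paths and the idea of keeping a set of "migrated" paths are all made up, and the archive host is just the example domain mentioned above. The real rules would live in whatever web server or CMS we end up with.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the example archive host from the post, not a live service.
ARCHIVE_HOST = "www.archive.mrc.ac.uk"


def archive_redirect(old_url, migrated_paths):
    """Return (301, archive_url) for a page that was not migrated.

    Returns None when the page has been reproduced on the new site,
    in which case no redirect to the archive is needed.
    `migrated_paths` is a hypothetical set of URL paths kept on the new site.
    """
    parts = urlsplit(old_url)
    if parts.path in migrated_paths:
        return None
    # Keep the path and query string so every old URL maps to exactly one archive URL.
    target = urlunsplit(
        (parts.scheme or "http", ARCHIVE_HOST, parts.path, parts.query, "")
    )
    return 301, target


# Hypothetical example paths, purely for illustration.
migrated = {"/about/index.htm"}
print(archive_redirect("http://www.mrc.ac.uk/funding/old-call.htm", migrated))
print(archive_redirect("http://www.mrc.ac.uk/about/index.htm", migrated))
```

Keeping the path unchanged on the archive host is the simplest way I can think of to preserve the old URLs one-to-one, which is the whole point of the exercise.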

Does this sound sensible? If content needed to be un-archived as it were then I can imagine it could get a bit tangled and that needs more thought.


4 responses to “Contemplating a content migration plan B”

  1. That is nearly what we are going to be doing to our existing/new stuff in the coming weeks. Content that is being kept/improved/merged will be in the new site. All the stuff that is effectively for the archive will be kept where it is and there will be a banner (set to position:fixed so that it is always visible) saying if there is an updated version with a link or just to warn it is old. Banner rather than the pop-up as I have struggled with the popup on mobile/small viewports.
    Toyed with the idea of an archive URL BUT so what if it is old? If it has been bookmarked/discovered I ultimately think it is fine for it to be where it is, with no need to be in a special section/archive.

    Will let you know how it goes

  2. Sounds very similar to what I’m planning to do with our new site. To get the new site running for the deadline I’ve been given will need me to develop in agile sprints, producing the highest priority content first, run both sites in parallel and rely heavily on linking to the existing site while the new one is built up (gulp!)

    Anyway, it is possible. I’ve been down this path before with BBC Local, where sites were run in parallel for the transition period. The easy bit for me is that I don’t have to create an archive as you will; I’ll just keep Joomla running for a few months longer for existing content.

    In the past I’ve just used wget and a little bit of perl or sed (to convert absolute links to the new hostname) to create an archive site. I’m more than happy to take a look at getting you a functional archive site if you’re having trouble getting your head around Heritrix. Drop me a line.

  3. Cheers Phil – your project sounds pretty challenging as well! I am going to try DeepVacuum as suggested by Steph Gray – it seems to be wget based and I’ll see what it gives me.

    Zak – I think I’ll end up with the banner style *and* the popup – we aren’t too worried about mobile browsers for most of this older content but we are worried about people not realising it is out of date or no longer supported (which doesn’t mean it won’t still be useful for a small % of our users.)
