Data Discovery Alpha: An Overview

We have mentioned our Data Discovery Alpha a few times in ONS Digital posts over the last few weeks, so I thought it would be useful to put together a little bit of an overview of the work we have done so far and why this is so important to our corporate website.

The website rebuild that went live earlier this year gave us a great foundation to build on, but we have an awful lot more work to do to ensure the site continues to evolve in line with our users’ needs. The Data Discovery project is a code name for an evolution of the site. It isn’t a single new thing and it won’t have a URL or a brand. It is just a package of work to make the ONS site better.

To be a little more specific about that, we have three things we are trying to do:

  • enable users to find data more easily
  • enable users to customise datasets (so they don’t have to download everything about a subject)
  • enable users to interact with data at lower geographic levels (for example, data for Newport West)

To do this, we need to alter the way that data gets published. Currently the site is formed by a collection of JSON templates that are rendered as HTML. Within these pages we hold a collection of (mainly) .xls files that we ask users to download, at times with very little context about what they contain. We are instead attempting to build a process whereby the metadata in those .xls files (code lists, attributes and so on) is automatically extracted, which also makes the files easier to find within the site. The original spreadsheets are then broken down row by row and pushed into a database so that we can dynamically serve the exact content the user is interested in.

This should allow our users to ask for much smaller sections of the data, with a much clearer understanding up front of what the files they request will contain.
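To make that a little more concrete, here is a minimal sketch of the idea in Python. It is not our actual pipeline: the sample data, the column names and the function names are all made up for illustration, a small CSV string stands in for a real .xls file, and an in-memory SQLite database stands in for whatever store we end up using.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a published spreadsheet, with made-up figures.
# Parsing a real .xls file would need a spreadsheet library and is out of scope here.
SAMPLE = """geography,year,measure,value
Newport West,2014,population,100
Newport West,2015,population,110
Cardiff Central,2014,population,200
Cardiff Central,2015,population,210
"""


def load_rows(text):
    """Parse the sheet into one dict per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_code_lists(rows, dimensions):
    """Collect the distinct values found in each dimension column."""
    return {dim: sorted({row[dim] for row in rows}) for dim in dimensions}


def store_observations(conn, rows):
    """Push the sheet into a table, one database row per spreadsheet row."""
    conn.execute(
        "CREATE TABLE observations "
        "(geography TEXT, year INTEGER, measure TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO observations VALUES (:geography, :year, :measure, :value)",
        rows,
    )


def query_slice(conn, geography):
    """Return only the observations for one geography, ordered by year."""
    cursor = conn.execute(
        "SELECT year, value FROM observations WHERE geography = ? ORDER BY year",
        (geography,),
    )
    return cursor.fetchall()


rows = load_rows(SAMPLE)
code_lists = extract_code_lists(rows, ["geography", "year", "measure"])

conn = sqlite3.connect(":memory:")
store_observations(conn, rows)
newport = query_slice(conn, "Newport West")
```

The code lists are the sort of metadata that could help a user see what a dataset contains before asking for it, and the query shows how a user could be served only the rows for Newport West rather than the whole file.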

So far, so technically simple. Right? Well, kind of. The challenge of how to represent this within the website has proved to be a huge test for the team so far. Knowing when to expose the complexity of statistical geographies (what you get when you take all the different geographical systems we use in the UK and add in the bonus of statistical production), and how to describe in plain terms the process of breaking down complex data, is, to be honest, really tricky.

So, we’ve published our working out on GitHub. There you will find every prototype we have built, the user research we undertook on them, the choices we made based on that research and the analytics work we use to keep us honest. Take a look!

Did you click on the link? If you are reading this post, you really should have done. It is interesting stuff. We work to the Government Digital Service (GDS) service assessment, so pretty much all of this follows a helpful framework that guides us when producing this kind of work.

We are giving ourselves another couple of months to really refine our thinking here and, if we can validate that this is indeed what our users are looking for, we will look to start the work of building it for real.

You can help us with this process by filling in our questionnaire (which will allow you to sign up as a user tester of our new work).

Or tell us directly what you think! I am here on Twitter if the feedback is short, and the comments section below will stay open on this post for more detailed responses. Please do get in touch.