Dueling with datasets

As you will already have noticed, there have been some big changes to the way data is structured on the new website. The /data feature has already been discussed in the launch post, so this post looks in more detail at the changes to our other datasets and the journeys designed to get you to them.

Back in the discovery phase, before the Alpha and Beta, we started looking at how the (mostly) Excel spreadsheets we publish could be better presented on the site. When talking to users, some clear issues came up over and over again about the way we were making these available on the (now old) website. We boiled these down into a number of user needs:

  • access to historic data
  • context – don’t make me download the data to find out what is in it
  • be sure the data I am looking at is the latest estimate available
  • find previous versions of tables
  • be able to search effectively for data both on site and through Google
  • be able to re-find data I use regularly
  • clear access to supporting information

There were others, but these were the key ones we felt the new website could, and should, be addressing.

Looking back

Before I explain the new approach, it is probably worth a quick review of how data was structured on the old website. Each reference table, as they were named, was linked to a release; every time that release was published, a new version of the table was created at a new location. These were presented as a title and a short description, and linked directly from a list attached to the given release. In addition, the files were available to download directly from any search results, and the latest six were displayed on any related taxonomy pages.

The tables themselves fall into two rough camps: tables which contain the entire history of the statistics they cover, and tables where this history is split over multiple files and therefore, on the old website, over multiple releases.

Moving forward – how is this different on the new website?

The first step we took to address the needs outlined above was to break the link between the release and the dataset, and instead treat each dataset as its own page. Whilst this introduces an extra ‘click’ into some journeys, it offers some immediate benefits and some we will be looking to build on in the longer term.

The biggest benefit of this approach is that it gives a single place where any given dataset will be located, with all future updates made to that page rather than creating a new version in a new location. This makes it possible for users to bookmark tables they use frequently and for search engines to index the site more effectively.

Having a specific page for a dataset also means we can bring all the historic data together in one place. Previously you would have had to locate each of these tables separately on the website in order to see how the figures have changed over time. This change in particular proved very popular and tested really well with users. For example:

dataset

Moving away from publishing separate versions at different locations to a single location that displays the latest estimates by default added the challenge of providing access to old versions. To address this need, each spreadsheet has its own version history, which users can look back through to see what the estimates were at a given date via a ‘view previous versions’ link below the download.
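For readers who find it easier to think in code, the idea of one fixed page with a version history behind it can be sketched roughly as below. This is a simplified illustration only and not how the website is actually built: the class names, the example URL and the file names are all made up for the purpose of the sketch.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    published: date  # date this version of the spreadsheet was published
    file_name: str   # hypothetical file name, for illustration only


@dataclass
class DatasetPage:
    url: str  # the single, stable location users can bookmark
    versions: List[Version] = field(default_factory=list)

    def publish(self, version: Version) -> None:
        """Add an update to this page rather than creating a new location."""
        self.versions.append(version)
        self.versions.sort(key=lambda v: v.published)

    def latest(self) -> Optional[Version]:
        """The version shown by default: the most recently published."""
        return self.versions[-1] if self.versions else None

    def as_of(self, when: date) -> Optional[Version]:
        """The version that was current on a given date, if there was one."""
        current = None
        for version in self.versions:  # versions are kept oldest first
            if version.published <= when:
                current = version
        return current

    def previous_versions(self) -> List[Version]:
        """Everything except the latest version, newest first."""
        return list(reversed(self.versions[:-1]))


# Hypothetical usage: two releases update the same page.
page = DatasetPage(url="/example-topic/datasets/example-table")
page.publish(Version(date(2015, 11, 1), "example-v1.xls"))
page.publish(Version(date(2016, 2, 1), "example-v2.xls"))
```

In this sketch the page address never changes between releases; `latest()` stands in for what is shown by default, and `as_of()` stands in for looking back through previous versions to see the estimates at a given date.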

What’s next

There is still a way to go in addressing some of those initial user needs, and there are a number of key areas we will be looking at going forward. The language and titling of all of our statistics is something we need to improve, and these datasets are no exception. Clearly identifying to users exactly what data they will receive when they click on any given link is going to be critical in solving this problem. The title is a big part of this, but the solution likely involves other aspects of these new dataset pages as well. In particular, clearly identifying the dimensions and breakdowns included in each dataset is something that has come up in testing, and we will be working towards it.

We also know that there are some areas of the site where the volume of data we produce makes the ordering of the datasets a key tool for providing a logical structure and getting users to the data they require. On the old site this was often achieved by using table IDs preceding titles to force these pages into a specific order. Whilst this worked well on some of the pages, on others (particularly in the search results) it made these datasets difficult to scan. On any page where we list items we provide a text filter to help users narrow down the results, and the intention is to build on these filters over time and provide additional tools for finding specific datasets.
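As a rough illustration of what a text filter like this does, the sketch below keeps only the titles that contain every word the user has typed, ignoring case. It is an assumption about the general technique rather than a description of the site's own code, and the function name and the titles (including their table IDs) are invented for the example.

```python
from typing import Iterable, List


def filter_titles(titles: Iterable[str], query: str) -> List[str]:
    """Keep titles that contain every word of the query, ignoring case."""
    words = query.lower().split()
    # An empty query has no words to match, so every title is kept.
    return [t for t in titles if all(w in t.lower() for w in words)]


# Hypothetical dataset titles, with table IDs in front as on the old site.
titles = [
    "A01: Example summary table",
    "A02: Example table by age group",
    "B01: Example table by region",
]
matches = filter_titles(titles, "age")
```

Here `matches` contains only the "A02" title. Because the filter matches on words anywhere in the title, it does not rely on table IDs or on the order of the list.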

Hopefully these changes will reflect the testing we have done on them and prove useful for users. Ultimately, one of the benefits of this approach is that it gives us a solid base to build and iterate upon as we continue to make getting to data as easy as possible. If you have any thoughts or comments on this, please do let us know by email, in the comments on this blog post or on Twitter at @ONSdigital.