csvcubed: a new tool for creating CSVWs

A little over a year ago Rob Barry, a Technical Lead in ONS’ Integrated Data Service (IDS), talked about how the IDS Dissemination project aims to help people adopt the open standards supporting 5☆ Linked Data, especially the CSV on the Web (CSVW) format. Since then we have been busy developing new tools to help you create CSVW cubes for your statistical data.

Introducing csvcubed

csvcubed is a Python library and command line interface (CLI) tool for people with statistical or observational data to share. Our focus has been to remove the steep learning curve that makes it hard to use open standards for linked data. Our well-documented command line interface lets users create 4☆ open data from a single CSV file, or 5☆ linked data by providing a little configuration.

The barrier to entry is low: only the CLI and a CSV which follows a certain design pattern are required to create a CSVW, but users can optionally customise their data cubes’ metadata and schema. csvcubed creates W3C-valid CSVWs, leveraging several vocabularies and standards including SKOS, DCAT, and the RDF Cube Vocabulary.

Since csvcubed is a CLI tool with a light footprint, it can easily be appended to many Reproducible Analytical Pipeline (RAP) processes (e.g. those written in R, SQL, or Python), provided that you can output a CSV. Schema definitions and associated metadata are part of the CSVW standard, which strengthens data integrity, removes ambiguity for humans, and gives machines explicit instructions for interpreting the data.
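As a rough illustration of appending csvcubed to an existing pipeline, the sketch below writes a CSV from Python and then calls the `csvcubed build` command shown later in this post. The helper names (`write_csv`, `build_command`, `build_csvw`) are hypothetical and not part of csvcubed, and the sketch assumes your CSV already follows the design pattern csvcubed expects.

```python
import csv
import shutil
import subprocess
from pathlib import Path


def write_csv(rows, csv_path):
    """Write a list of dicts to a CSV file, as any pipeline step might."""
    csv_path = Path(csv_path)
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return csv_path


def build_command(csv_path):
    """Return the argument list for `csvcubed build <csv>`."""
    return ["csvcubed", "build", str(csv_path)]


def build_csvw(csv_path):
    """Run csvcubed on the CSV; raises CalledProcessError if the build fails."""
    if shutil.which("csvcubed") is None:
        raise RuntimeError("csvcubed is not installed or not on PATH")
    subprocess.run(build_command(csv_path), check=True)
```

A pipeline would call `write_csv(...)` as its final output step and pass the resulting path to `build_csvw(...)`. Pipelines in R or SQL can do the same by invoking the command from their own shell or scheduler step.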

Our data engineers have been building pipelines using csvcubed to create CSVWs on a production basis since Autumn 2021.

Why CSVW?

CSVW is an open standard recommended by the United Kingdom’s Central Digital & Data Office. By using CSVWs:

  • Data can be ingested into annotated data models
  • Sharing and collaboration are easier
  • CSVs become machine-readable

The resulting CSVWs can be used as-is, or easily converted into RDF and JSON-LD.
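To make "machine-readable" a little more concrete, here is a minimal hand-written sketch of the kind of JSON metadata document the W3C CSVW standard pairs with a CSV file. The file name, column names and datatypes are hypothetical, and this is not the metadata csvcubed generates, which will differ.

```python
import json


def minimal_csvw_metadata(csv_url, columns):
    """Build a minimal CSVW metadata document describing one CSV file.

    `columns` is a list of (name, title, datatype) tuples.
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {
            "columns": [
                {"name": name, "titles": title, "datatype": datatype}
                for name, title, datatype in columns
            ]
        },
    }


# Hypothetical two-column dataset: a year and a measured value.
metadata = minimal_csvw_metadata(
    "observations.csv",
    [("year", "Year", "integer"), ("value", "Value", "decimal")],
)
print(json.dumps(metadata, indent=2))
```

Because the column names and datatypes are declared alongside the data, a consuming program does not have to guess how to parse each column.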

Getting started with csvcubed

csvcubed requires Python 3.9 or newer. Users can install it using their favoured package manager. For example, pip install csvcubed will install csvcubed and its associated dependencies using pip. From there you can start building CSVWs using the command line.

To see csvcubed at work once it is installed, run the following two commands in your chosen shell:
wget https://raw.githubusercontent.com/GSS-Cogs/csvcubed-demo/v1.0/sweden_at_eurovision_no_missing.csv
csvcubed build sweden_at_eurovision_no_missing.csv

All the files in the ./out/ folder are part of your CSVW, so you can host them in a GitHub repo or zip them up to distribute via email.
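If you would rather script the zipping step, a minimal sketch using only the Python standard library might look like this. The function name `zip_output` and the archive name are hypothetical; the only assumption about csvcubed is that its output sits in the ./out/ folder, as described above.

```python
import shutil
from pathlib import Path


def zip_output(out_dir="out", archive_name="csvw-output"):
    """Zip every file in `out_dir` into `<archive_name>.zip` and return its path."""
    out_dir = Path(out_dir)
    if not out_dir.is_dir():
        raise FileNotFoundError(f"{out_dir} does not exist; run csvcubed build first")
    # make_archive appends the .zip extension and returns the archive's path.
    return Path(shutil.make_archive(archive_name, "zip", root_dir=out_dir))
```

Calling `zip_output()` from the directory where you ran the build produces a single csvw-output.zip that keeps the CSVW's files together.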

How ONS uses csvcubed

IDS’s Dissemination branch uses csvcubed’s output to build RDF for loading into a triple store which is in public beta. You can do a lot of interesting things with CSVWs and linked data, including the innovative Climate Change dashboard which launched to coincide with the COP-26 conference.

We look forward to hearing how you get on with csvcubed.