PDF is a bit of a hot-button topic at the moment. Recent research by the World Bank was picked up by the Washington Post and the Guardian, and here at the ONS some recent user research undertaken by one of our business areas has got staff talking about the topic as well.
The relationship between PDFs and the web has always been a difficult one, yet the two are closely interwoven.
The World Bank is an example of an organisation that on the one hand has embraced the web but on the other publishes thousands of PDFs on its website. Even the new GOV.UK site has a considerable amount of PDF content that lacks a web-native equivalent.
At the ONS the majority of our downloads are PDFs, more than Excel or CSV. This is remarkable given that most of our outputs are statistical data. Clearly many of our users rely on that format, and the recent research backs that up. What we haven’t got to the bottom of is why they use it.
It would be unusual for anyone to have a specific attachment (!) to a specific format. It seems more likely that the current limitations of our website leave some specific user needs unmet, and that those needs are better fulfilled by the PDF, even given its other problems.
The reality is that PDF is not a great fit for the web and an even worse one for anyone interested in doing anything with data.
In the Guardian, Nathaniel Manning, a fellow for the White House’s open data project, said the following, which is as recognisable for statistical bulletins as it is for development reports:
“The status quo in development reporting practices is built on the foundation of the PDF report. This is understandable. There are often numerous different documents used to make a single project report, including excel models, GIS shapefiles, and Photoshop charts. The ease of taking screenshots and putting it all into a PDF report, and sending it along via email is completely understandable. But this is like funding James Cameron to make Avatar, and then releasing it in a black and white flipbook. We are missing all the good stuff. This has to change.”
PDFs are difficult to search and often have serious accessibility problems (they shouldn’t in this day and age, but they do). They often end up with a mix of web-like features within them, such as hyperlinks that are clearly useless once the document is offline or printed. They also make reusing any data within them incredibly difficult, alienating whole groups of influential users, in particular the growing tide of data journalists.
There is also a particularly salient problem. Once a PDF is downloaded you have no way of knowing how, when or even if it is ever actually looked at. I don’t know about you, but I have all sorts of documents printed out and gathering dust on my desk, pushed to my Kindle or emailed to myself, never to be opened. Without a better idea of how or why these PDFs are used, it is difficult to know what is missing on the web-native side of things.
I am not an anti-PDF zealot. I believe that for quite some time yet there will be an audience for outputs in this format, but I firmly believe that they should become a secondary option for a specific use case, and that the primary source should be web native (essentially HTML) and properly searchable, reusable and machine readable. This is being ‘of the web, not just on the web’.
The initial research we have been undertaking with the Open Data Institute (ODI) has been very much about trying to build an ‘open statistics platform’ that achieves this. It doesn’t seek to do away with PDFs; rather, it seeks to make ONS data and analysis available in such a (re)usable way that the demand for PDFs disappears apart from in very specific cases. This research is underpinning our thinking for the future of the ONS website and will hopefully meet our users’ needs and expectations. Time will tell, and for the moment we still have plenty of PDFs to keep everyone happy!