
Online Journalism News

Do publications need style guides for data?

Tony Hirst, lecturer at the Department of Communication and Systems at The Open University, was one of the first pioneers of the Guardian's Open Platform, using its API and data-sets in various projects. He blogs at OUseful.info.

If you're a working journalist, then you're probably familiar with the idea of style guides: documents that specify preferred spellings, capitalisation and acronyms, as well as document styles, for a particular publication; the Guardian's, for example. As interest in publishing data grows, inspired in part by official releases of US Government data at http://data.gov, and through the Guardian newspaper's OpenPlatform DataStore, it's perhaps worth pausing for a moment to ask whether we need consistent style guides for data, rather than a more laissez-faire attitude of publishing data more or less as it comes.

So what issues might a style guide for data address? To give us something concrete to talk around, consider the spreadsheet of Shadow Cabinet member expenses, as published by the Conservative Party (Google the Shadow Cabinet's expenses; and the actual Shadow Cabinet's expenses spreadsheet).

Shadow Cabinet expenses spreadsheet http://spreadsheets.google.com/pub?key=rR1-1we0zvyDXudWYRFkPpQ

This document is published as a public document on Google Spreadsheets, the same platform that is used by the Guardian DataStore, where it can be viewed without any need for authentication. If you have a Google account, you can also make a copy of the spreadsheet and then use that copy to plot charts or analyse the data as you require.

Being a Google spreadsheet, it is possible to interrogate the spreadsheet in a variety of ways via an API (Application Programming Interface), which allows you to treat the spreadsheet almost as if it were a database. The details are rather too technical to mention here - follow my blog for greater depth - but suffice it to say that it is possible to construct a wide variety of URIs that can request a wide variety of different reports based on the data contained in the spreadsheet.
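For a flavour of what such a URI might look like, here is a minimal sketch in Python. It assumes the `gviz/tq` endpoint pattern and the `tq` and `tqx` parameters of the Google Visualization API query language; the spreadsheet key is a placeholder, and the column letters (A for member, B for allowance category, C for amount) are assumptions about the layout rather than details taken from the actual spreadsheet.

```python
from urllib.parse import urlencode


def build_query_url(spreadsheet_key, query):
    """Build a URI that asks a published spreadsheet to run a query.

    The endpoint pattern is an assumption about how the spreadsheet is
    exposed; older published spreadsheets may use a different address.
    """
    base = f"https://docs.google.com/spreadsheets/d/{spreadsheet_key}/gviz/tq"
    # tq carries the query itself; tqx=out:csv asks for CSV output.
    params = {"tq": query, "tqx": "out:csv"}
    return f"{base}?{urlencode(params)}"


# Placeholder key and assumed column letters, for illustration only.
url = build_query_url("SPREADSHEET_KEY", "select A, B, sum(C) group by A, B")
print(url)
```

Changing the query text is all it takes to request a different report from the same spreadsheet, which is what makes the "spreadsheet as database" idea so convenient.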

So for example, one thing we can do with the Shadow Cabinet expenses spreadsheet is to request a summary report that contains the total expenses claimed by each member by expenses category:

Google Visualization

So what do we see here? The report is generated by identifying unique 'strings' of text in the Shadow Cabinet member and Allowance columns, and then summing the expenses for every claim made against each name and category. The spreadsheet works by identifying exact text matches, which is why Theresa Villiers' 'Administrative & Office Expenditure' and 'Administrative and Office Allowance' claims are identified as two different types of claim.
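The effect of exact text matching can be sketched in a few lines of Python. This is an illustration of the grouping logic only, not the spreadsheet's own implementation, and the amounts are invented for the example.

```python
from collections import defaultdict

# Hypothetical rows: (member, allowance category, amount in pounds).
# The amounts are invented; only the category labels come from the article.
claims = [
    ("Theresa Villiers", "Administrative & Office Expenditure", 100.0),
    ("Theresa Villiers", "Administrative and Office Allowance", 50.0),
    ("Theresa Villiers", "Administrative & Office Expenditure", 25.0),
]


def summarise(rows):
    """Sum the amounts for each exact (member, category) pair."""
    totals = defaultdict(float)
    for member, category, amount in rows:
        # The key is the literal text, so any difference in wording
        # or punctuation produces a separate line in the report.
        totals[(member, category)] += amount
    return dict(totals)


report = summarise(claims)
for (member, category), total in sorted(report.items()):
    print(f"{member} | {category} | {total:.2f}")
```

Although all three claims belong to one person, the report contains two rows, because the two category labels are different strings.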

This inconsistency arises through the failure to adopt a strict convention about how to categorise expenses claims, even within the same spreadsheet. Such conventions can be trivially enforced by using a form to enter the expenses details, with categories selected from a drop-down list, for example. Incidentally, the Google Forms application stores results from completed forms in a Google spreadsheet and could potentially be used to provide a handy interface to the expenses spreadsheet.
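Where the data has already been typed in by hand, the same convention can be applied after the fact. The sketch below checks each category against a controlled list, much as a drop-down would at entry time. The list of allowed categories and the table of variants are hypothetical; a real style guide would define its own, and would need to confirm that the variants really do mean the same thing.

```python
# Hypothetical controlled vocabulary, standing in for a drop-down list.
ALLOWED_CATEGORIES = {
    "Administrative and Office Expenditure",
    "Travel",
}

# Hypothetical mapping from known variant labels to the preferred form.
VARIANTS = {
    "Administrative & Office Expenditure": "Administrative and Office Expenditure",
    "Administrative and Office Allowance": "Administrative and Office Expenditure",
}


def clean_category(raw):
    """Return the preferred label for a category, or raise if unknown."""
    text = " ".join(raw.split())  # collapse stray or repeated whitespace
    text = VARIANTS.get(text, text)
    if text not in ALLOWED_CATEGORIES:
        raise ValueError(f"Unknown expenses category: {raw!r}")
    return text


print(clean_category("Administrative &  Office Expenditure"))
```

Rejecting unknown labels outright, rather than passing them through, means a new spelling gets noticed and added to the guide instead of silently creating another category.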

Can you see any other inconsistencies in the expenses to date? How about Owen Paterson's claims? Problems such as this are compounded once we start trying to maintain consistency across different spreadsheets. Taking the Guardian DataStore Higher Education spreadsheets as an example, we see different naming conventions used for the Higher Education Institutions across the national Survey and Dropout spreadsheets.

The following image (follow link to view on Flickr) shows results being pulled in to a single Google spreadsheet from two Guardian DataStore spreadsheets using the HEI search term 'Glasgow':

 Guardian HE Data Grid

To the human eye, it's easy enough to identify which of the differently presented (that is, differently styled) names map on to each other, but the inconsistent naming means that it becomes difficult to create a combined report that pulls back data from spreadsheets using different naming conventions.
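A short Python sketch shows why. The institution names and figures below are invented stand-ins for the two spreadsheets, not values taken from the Guardian DataStore, and the lookup table is a hypothetical example of the kind of mapping a data style guide would make unnecessary.

```python
# Invented rows standing in for two spreadsheets that name the same
# institutions differently; names and figures are for illustration only.
survey = {"University of Glasgow": 4.1, "Glasgow Caledonian University": 3.9}
dropout = {"Glasgow": 7.0, "Glasgow Caledonian": 9.5}

# Hypothetical lookup from each variant to one agreed name.
CANONICAL = {
    "University of Glasgow": "Glasgow",
    "Glasgow": "Glasgow",
    "Glasgow Caledonian University": "Glasgow Caledonian",
    "Glasgow Caledonian": "Glasgow Caledonian",
}


def combine(a, b, lookup=None):
    """Join two name-keyed tables, optionally mapping names first."""
    def key(name):
        return lookup.get(name, name) if lookup else name

    a_keyed = {key(name): value for name, value in a.items()}
    b_keyed = {key(name): value for name, value in b.items()}
    shared = sorted(a_keyed.keys() & b_keyed.keys())
    return {name: (a_keyed[name], b_keyed[name]) for name in shared}


naive = combine(survey, dropout)              # exact names only
mapped = combine(survey, dropout, CANONICAL)  # names normalised first
print(len(naive), "rows matched without a lookup")
print(len(mapped), "rows matched with a lookup")
```

With exact names nothing lines up, so the combined report is empty; the rows only join once someone has built and maintained the lookup by hand.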

So, for example, what search term could I use to display just the results for Glasgow University? Although both the Conservative Party Shadow Cabinet Expenses spreadsheet and the Guardian DataStore are relatively young resources, they already present a range of problems to the journalist wanting to produce summary reports from them, or data mash-ups by combining them.

In the first case, we have an example of inconsistent styling within a single spreadsheet; in the second case, we have inconsistencies across several different spreadsheets.

But by adopting something akin to a data style guide, these simple errors can be avoided, and the resulting data resources will be more valuable.
