Data journalism can be a daunting task for some, so Steve Doig, professor of journalism at Arizona State University, ran through the first steps for delegates at the International Journalism Festival yesterday.
Finding story ideas
Data is everywhere now, he said, and most social issues or human interest stories will have some data tied to them somewhere. Stories on local beats may have patterns, so it is worth looking at the areas you are used to covering to spot them.
If you're still stuck for a place to start though, Doig recommended getting inspiration from other projects or stories.
"If there is some kind of social problem or criminal activity in one city, chances are it is going on in your city also," he said, so reporters could look at recent stories from further afield and see if they apply more locally.
There are more explicit places to look though.
The Data Driven Journalism site has featured stories, Doig regularly checks the Investigative Reporters & Editors' Extra Extra feed, and the Guardian Data Blog is another good place to gain inspiration.
"Informants and whistleblowers" will always be a good source of stories, if you have a good contact network, he said, while reading documents and reports from institutions and academics will often prove fruitful.
And the stories do not always have to be serious, he said. At the Miami Herald he discovered that all the dogs in the city had been registered, so he was able to see the most popular breeds (doberman or poodle) and names (Winston featured strongly for bulldogs).
Work backwards from your idea
A data story normally revolves around forming a hypothesis, said Doig, and then testing that hypothesis against stated, verified data.
To know what data is needed, a journalist must assess what variables are involved – age, sex, location, job, income, crime – and understand which organisations or government bodies collect those variables. Then you need to...
Get the data
The Freedom of Information Act is a primary source of data, Doig said, while in European countries where data may not be so readily accessible, sites like Wobbing.eu are useful.
Check out these Journalism.co.uk articles on Freedom of Information Act requests and sources and tools for data journalism for ways to get data for a story.
Data can be provided in a multitude of different formats, Doig said, so the best bet would be to "find yourself a data nerd" and then "have their nerd talk to your nerd" to get the best file format.
Try to avoid PDF files though, he said, as it can be difficult to import data from them into other programs to make it usable. Importing is almost always necessary, as most data is not perfectly arranged for analysis.
Clean the data
"People who collect data are doing it for bureaucratic reasons," said Doig, "but we want it for analysis reasons. We have to be more precise."
Cities can often have multiple spellings in one set of data ("I've seen Phoenix, Pheonix and Feenix in one column") so the data often needs to be cleaned up.
OpenRefine is Doig's highest recommendation as it can spot inconsistencies, recommend changes and generally clean the data and make it ready to analyse.
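For readers who want to see the idea in code, here is a minimal Python sketch of the kind of spelling clean-up a dedicated cleaning tool does at much larger scale. It is not how any particular tool works internally. The list of correct city names, the function name clean_city and the similarity cutoff are all assumptions made for illustration.

```python
import difflib

# Assumed list of correct spellings; in practice this would come from
# an official source or be built by hand for the dataset in question.
CANONICAL_CITIES = ["Phoenix", "Tucson", "Mesa"]


def clean_city(raw, canonical=CANONICAL_CITIES, cutoff=0.6):
    """Return the closest canonical spelling, or the trimmed input if none is close."""
    tidy = raw.strip()
    # Compare in lower case, but return the properly capitalised name.
    lookup = {name.lower(): name for name in canonical}
    matches = difflib.get_close_matches(tidy.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else tidy


# The misspellings Doig mentioned, plus one with stray spaces and wrong case.
column = ["Phoenix", "Pheonix", "Feenix", " phoenix "]
cleaned = [clean_city(value) for value in column]
```

Fuzzy matching of this sort can also merge places that are genuinely different, so any suggested change is worth checking by eye before it is accepted.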
Look for patterns
"Look for highs and lows, the range, the averages and medians," he said. "Get the shape of the data in your mind."
Then the journalist can start to look for patterns or anomalies. A statistical outlier could be a huge contribution to a politician's election purse or a woefully underfunded school. Sometimes the outliers can be as big a story as a pattern, he said.
The best tool for examining data, or at least the "gateway drug", said Doig, is Microsoft Excel, which can handle "80 to 90 per cent of jobs" involving data.
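As a rough illustration of "getting the shape of the data", the Python sketch below works out the low, high, range, average and median of a list of numbers, then flags outliers. The donation figures are invented, and the outlier test used here (anything more than 1.5 times the interquartile range beyond the quartiles) is one common rule of thumb chosen for this example, not a method Doig specified.

```python
import statistics

# Invented campaign donations, in dollars, purely for illustration.
donations = [250, 300, 275, 320, 290, 310, 50000, 280]


def describe(values):
    """Summarise the highs, lows, range, average and median of a list of numbers."""
    return {
        "low": min(values),
        "high": max(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def find_outliers(values, k=1.5):
    """Flag values more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


summary = describe(donations)
outliers = find_outliers(donations)
```

In this made-up set a single large donation drags the average far above the median, which is one reason to look at both figures before writing a sentence about the "typical" donation.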
Sorting, filtering, functions and pivot tables are all useful tools to get to grips with, he said, and can be easily picked up with a bit of practice, or with the help of a colleague.
"Data journalism is a team sport," said Doig, and just like with other types of stories, collaboration and idea sharing can often yield the best results.
A really good data story should also include lots of different elements to bring it to life, he said, including text, audio, interviews, charts, pictures, multimedia, social media, laws, documents and anything else which will put it into context and make it more human.
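The same three operations carry over to code once a spreadsheet becomes too small for the job. The Python sketch below sorts, filters and builds a simple pivot-style count. The dog licence records, the district field and the helper name pivot_count are all invented for illustration and are not taken from the Miami Herald data.

```python
from collections import defaultdict

# Invented dog licence records, purely for illustration.
records = [
    {"breed": "Bulldog", "name": "Winston", "district": "North"},
    {"breed": "Poodle", "name": "Fifi", "district": "South"},
    {"breed": "Bulldog", "name": "Winston", "district": "South"},
    {"breed": "Doberman", "name": "Rex", "district": "North"},
    {"breed": "Poodle", "name": "Coco", "district": "South"},
]

# Sort: order the rows alphabetically by the dog's name.
by_name = sorted(records, key=lambda row: row["name"])

# Filter: keep only the rows for one breed.
bulldogs = [row for row in records if row["breed"] == "Bulldog"]


def pivot_count(rows, row_key, col_key):
    """Count rows for each combination of row_key and col_key, like a pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_key]][row[col_key]] += 1
    # Convert to plain dictionaries so the result is easy to print or compare.
    return {key: dict(counts) for key, counts in table.items()}


breed_by_district = pivot_count(records, "breed", "district")
```

Here breed_by_district holds one entry per breed, each with a count per district, which is the same layout a spreadsheet pivot table would give with breeds as rows and districts as columns.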
Doig also recommended joining the European Journalism Centre's free online data journalism course, set to start on 19 May, for people wishing to get their hands dirty with data.
- Journalism.co.uk is also running an online data journalism course on Thursday 24 July at MSN UK's offices in London, led by Conrad Quilty Harper, a data journalist at Ampp3d. You can book the course as part of a news:rewired+ bundle ticket, currently available at an earlybird discount rate of £245 +VAT, giving you access to both the training course and our news:rewired conference the day before, on Wednesday 23 July. You will also be able to book the training course individually for £200 +VAT soon.