A Guardian data visualisation showing principal areas of Cabinet Office spending
That's the view of Chris Taggart, founder of OpenlyLocal, which opens up and collates local authority data, and co-founder of OpenCorporates, "the open database of the corporate world".
His statement, which he gave during a Journalism.co.uk podcast on open data and journalism, puts clear weight on what he sees as the huge value of data being accessible for the future development and even existence of the journalism industry.
There are a number of outlets which are working to ensure greater transparency and organisation of data online, which also act as great resources for journalists researching certain topics or bodies.
In this feature we will bring together just some of these key tools and resources, as well as advice from data journalism experts, with eight top tips for reporters interested in making more of open data and other statistical resources.
1. Dig into open data resources
"If journalism is about telling a story that someone somewhere doesn't want you to tell, then where is that going to be? That's going to be in these large data sets, it's going to be about combining this data set with that data set" – Chris Taggart, co-founder, OpenCorporates.
Tony Hirst, lecturer in telematics at the Open University, gave the following list of key data sources:
- Data.gov.uk – an "index of public data sets"
- The Guardian Datastore – which is "leading the charge with the release of data sets"
- WalesOnline's local
"From a news point of view, following the money is one of the tricks ... One tool developed by Adrian Short called Armchair Auditor takes spending data at a local council level and allows you to just look through that to see what amounts of cash have been given to what companies.
"This is where different open data initiatives can start to inform each other. So Armchair Auditor can list monies spent with particular companies. If we then look at data from OpenCorporates we can see whether these companies are the same as some larger corporate groupings and start to get some picture of how money flows through the corporate world."
2. Uncover links and add colour with data aggregators
As mentioned above, platforms such as OpenCorporates can be used by journalists to group together entities and be alerted to new filings relating to them.
Taggart says this is not about replacing journalism but aims to "make it easier or enrich it".
"I started off in journalism in the days before the internet. You'd have to go across town to the city business library and go through US telephone books and trade directories. Now that takes two minutes searching on Google.
"One of the goals of OpenCorporates from a journalistic point of view is to do the same thing for searching for company information. It's not that it's going to give you the story but it's going to make the job of getting a story so much easier."It's not that it's going to give you the story but it's going to make the job of getting a story so much easierChris Taggart on OpenCorporates
Hirst said one of the keys of data aggregation work comes down to having data which enables those using it to "unambiguously identify things you want to talk about".
"Data might be referred to in different ways. It can be hard to take data from one source and merge with data from another source. By coming up with identifiers or ways of uniquely identifying bodies, you can then say this data set is talking about same thing as another dataset and from that we can aggregate from different sources and build up a more rounded picture of the data being issued around a particular body or entity."
This aggregated data can then help to "provide colour", he added, with one data set used to "inform us about our view of the other".
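The point about unique identifiers can be sketched in a few lines of Python. In this made-up example the same company is written differently in two datasets, so matching on the name would fail, but both records carry the same company number. The field names, numbers and records are all assumptions for illustration, not taken from any real dataset.

```python
# Hypothetical spending records: names are written inconsistently.
spending = [
    {"company_number": "01234567", "supplier": "ACME CLEANING LIMITED", "amount": 1500.0},
    {"company_number": "07654321", "supplier": "Bright Roads", "amount": 4300.0},
]

# Hypothetical company register entries for the same two bodies.
registry = [
    {"company_number": "01234567", "name": "Acme Cleaning Ltd", "status": "Active"},
    {"company_number": "07654321", "name": "Bright Roads Ltd", "status": "Dissolved"},
]


def merge_on_identifier(left, right, key):
    """Join two lists of records wherever they share the same identifier."""
    index = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            # Combine the fields from both sources into one record.
            merged.append({**row, **match})
    return merged


merged = merge_on_identifier(spending, registry, "company_number")
for record in merged:
    print(record["name"], record["amount"], record["status"])
```

Here the register adds "colour" to the spending data: a payment to a company listed as dissolved is the kind of detail that might prompt a further question.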
3. Make the most of free data tools online
Hirst outlined three key tools he often uses when working with data:
- Google Fusion tables: "A good place for coding addresses and generating maps off the back of them"
- Gephi: "Good for generating network diagrams"
- ManyEyes: Offers a "range of interactive graphics"
"A large part of it is knowing what tools there are available and how to get data into them in a particular shape or structure so the tool can do its business."A large part of it is knowing what tools there are available and how to get data into them in a particular shape or structure so the tool can do its businessTony Hirst
Taggart also recommended Google Fusion tables as being "really good for handling big datasets", as well as using Google Refine to "clean up messy data and match the entries in that to entities". OpenCorporates has produced a screencast on "how to use OpenCorporates to match companies in Google Refine".
He added that platforms such as BuzzData or Google spreadsheets "can do an awful lot just by pulling in a .csv file into that and having a bit of a play around".
"There's an increasing number of tools out there. There is a learning curve but I think that learning curve's not necessarily a bad thing. From a journalist point of view, if journalism was just asking the right question, working out what database query to do, then there wouldn't be a job for journalists."
Hirst also highlighted the recently published Data Journalism Handbook, as a useful online resource for learning more about starting out in data journalism.
And the range of tools available means money and time do not have to be an issue, says Simon Rogers, editor of the Guardian's Datastore and Datablog.
"We're probably the cheapest editorial department at the Guardian. We don't get that much development time. Actually we don't cost a lot of money but use a lot of free tools. There's no excuse not to use Google Fusion Tables or things like Tableau Public, these things are free. You can make a map in 20 minutes and we're noticing people picking up these things and using free tools on sites dead quick."
He added: "Day to day most of the stuff we do is on the free tools."
4. Make the most of your specialism/patch
Data being opened up online spans a number of topics, but Hirst identifies some key areas where open data is often found and which can feed "state of the locale reports that are data driven":
- NHS: "Huge source of data likely to be of local interest, e.g. hospitals, local surgeries"
- Education: This might include schools data, league table/performance data, demographic data
- Transport: "Increasingly being opened up", e.g. access to timetables, roadworks information
Rogers added that other topics can range from energy and the construction industry to shopping, and said "there probably isn't an area of journalism that doesn't have datasets associated to it".
"In fact if you're a specialist reporter you're even better placed to use data … I think that's where the future is. If you're a specialist reporter you've already got a body of knowledge most people don't have and they would kill to get insight into.
"The datasets you see everyday you may take for granted but are going to probably be fascinating to people in your area and readers."
5. Get visual with maps or charts
Hirst called on journalists to "try engaging" with maps and other accessible visualisation platforms more, while Rogers added that adding visualisations to a story makes it more likely readers will take a look.
"Most times visualisation of some kind is worth it, even if it's little, a bar chart can do wonders for you. It doesn't have to be an overcomplicated, massive, great interactive, it can just be a pie chart or a bar chart and that's often enough."
But he added that news outlets need to pay attention to the visualisation formats they use.
"When we started most visualisations used flash which is not available on some platforms. Even Google spreadsheet charts work on all platforms, they're easy to make, easy to embed, very neutral and can read on all platforms. Often that's enough."
6. Be brave and lead the way for others
Journalists should also not be put off if some data requires significantly more time to dig through, Taggart said.
"Ultimately journalism still means getting your hands dirty, it still means looking at something and trying to spot something else that no one else has spotted.Ultimately journalism still means getting your hands dirty, it still means looking at something and trying to spot something else that no one else has spottedChris Taggart
"Sometimes that means going down to database level ... It sometimes does require some investment in skill and time. I don't think that's a bad thing, good journalism has always required that."
Hirst added that journalists who have gained some experience should help to raise awareness of the resources available to others.
"It's about people who do know how to do these things advocating their use and communicating more widely about how to get involved. You shouldn't be afraid to try, to pick up a tutorial online about how to plot things on a map and just give it a go, and then maybe embed a map you produce in your own news web pages".
7. Keep a note and share your experiences
Hirst also recommends his own method of using a blog as a notebook to document experiences.
"One thing I would encourage people to do is when you are working through a tool, blog how you get on and what you did, what worked and what didn't. And start to engage in essentially a community of practice around working with these tools."
8. Make friends and partner up
As well as sharing your own experiences, you can learn from those of others in online communities interested in this area.
As Rogers adds, "the important thing is to have a friend".
"Make friends out there who can help, who can offer you advice. So if you're looking at a dataset about waiting times you need to find someone who knows about waiting times to check your results rather than assume the numbers are right.Make friends, make lots of friends. There is a whole open data community with programmers desperate to get stuff into news organisations and who don't have that journalistic sense of what a story is or isn'tSimon Rogers
"In any other area of journalism people check their sources. I think there's a reluctance to do the same with numbers, you either use them or are too nervous to use them properly.
"Make friends, make lots of friends. There is a whole open data community with programmers desperate to get stuff into news organisations and who don't have that journalistic sense of what a story is or isn't. Training's great but I think the best thing is to get advice and talk to people who are experts in that area."
- Hear more from Hirst, Rogers and Taggart in this Journalism.co.uk podcast about open data and journalism