Government websites can be a starting point for many journalists investigating issues in the public interest, such as local planning and development, or spending.
There are also many other data sources that reporters can use. These are often the result of investigations and projects by journalists and news organisations, or have been compiled by civic groups and non-profits.
Below we have compiled a list of 18 databases and data sets that contain information about a variety of topics and industries. Most of them are available for free, while some, such as the ones maintained by Investigative Reporters and Editors, require membership of the organisation to access them.
This list is not exhaustive. Are there any additional resources you have come across in your reporting, or has your organisation made any datasets available following an investigation? Tweet us at @journalismnews and we will update this collection.
OCCRP's Investigative Dashboard
The Investigative Dashboard from the Organised Crime and Corruption Reporting Project allows you to search documents from the organisation's previous investigations, as well as official sources and scraped websites. In total, you can explore 93,374,268 leads for your investigations, the website states, including information on people and companies and their assets.
ICIJ's Offshore Leaks
More than a year has passed since the Panama Papers broke, yet journalists are still finding stories in the 11.5 million documents leaked from law firm Mossack Fonseca.
The International Consortium of Investigative Journalists (ICIJ) created a public database of 320,000 offshore companies, based on information from the Panama Papers and the 2013 Offshore Leaks investigation. Here's a step-by-step guide on how to find stories in this database.
The Marshall Project's 'Next to die'
The 'Next to die' project from non-profit US outlet The Marshall Project is not a traditional database, but it can definitely be a starting point if you are investigating executions in the United States.
It tracks upcoming scheduled executions and provides the name of the person, the state, the date and a summary of each individual's case. Each case also comes with background on the history of the death penalty in a given state, and on past executions that have taken place there.
The Guardian's 'The Counted'
Similar to 'Next to die', 'The Counted' project from the Guardian tracks people killed by police in the United States.
The database contains information for 2015 and 2016, which can be filtered by an individual's name, state, race, age and whether or not they were armed at the time the incident took place.
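If you have records like these in a CSV file, the same kind of filtering can be reproduced in a few lines of Python. The sketch below is only an illustration: the column names (`name`, `state`, `race`, `age`, `armed`) and the sample rows are placeholders invented for the example, not real records or the actual layout of The Counted's data.

```python
import csv
import io

# Placeholder rows, NOT real records; the column names are assumed for illustration.
SAMPLE_CSV = """name,state,race,age,armed
Person A,CA,Unknown,34,No
Person B,TX,Unknown,27,Firearm
Person C,CA,Unknown,41,Firearm
"""


def filter_rows(rows, **criteria):
    """Return the rows whose fields match every criterion (case-insensitive)."""
    return [
        row
        for row in rows
        if all(
            row.get(field, "").strip().lower() == str(value).lower()
            for field, value in criteria.items()
        )
    ]


# Parse the sample text into a list of dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Keep only rows from one state where the person was recorded as unarmed.
unarmed_in_ca = filter_rows(rows, state="CA", armed="No")
print([row["name"] for row in unarmed_in_ca])
```

With the placeholder rows above, this prints `['Person A']`. For a real file, you would replace the sample text with an opened file and adjust the column names to match its header row.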
ProPublica's Data Store
Another US non-profit, ProPublica, makes some of the data and APIs from its investigations available online through its Data Store. Some datasets, such as the Trump administration's financial disclosures, or information about the types of ads shown on Facebook, are available for free, while others are paid for.
NICAR Data library
The National Institute for Computer-Assisted Reporting (NICAR) in the US maintains a list of databases on topics such as boat accidents and federal campaign contributions.
A data sample can be downloaded for free, but the full information is available at a cost (or for free if you are a member of Investigative Reporters and Editors). An annual membership for non-US journalists costs $70 (£54), and also provides access to tipsheets and reporting guides.
Companies House
Companies House is a UK government database, currently available in beta, that allows journalists to search for a specific company and find details of its ownership, affiliates and registration, among other things. Here's a tutorial for using Companies House to access financial information about UK firms.
OpenCorporates
OpenCorporates is the "largest open database of companies in the world", with information about more than 134 million companies and over 176 million individuals.
OpenSpending
With OpenSpending, reporters can search more than 45 million government fiscal records across 76 countries.
Global Open Data Index
The Global Open Data Index (GODI) is the annual global benchmark for publication of open government data, run by the Open Knowledge Network. It offers datasets on topics such as government budgets, national statistics, and draft legislation for a variety of countries going back to 2013.
The data is available for use under a public domain data license and is provided as a single, downloadable archive, or as individual files in CSV and JSON formats.
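Once downloaded, files in either format can be read with Python's standard library. The following is a minimal sketch rather than an official recipe: the `load_records` helper is hypothetical, and it assumes that a JSON file holds either a list of objects or a single object.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load a list of row dictionaries from a .csv or .json file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # Each CSV row becomes a dictionary keyed by the header row.
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    if suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Assumption: the file holds a list of objects; wrap a single object.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported file type: {suffix}")
```

Calling `load_records` on a downloaded CSV or JSON file returns the same structure in both cases, so any later filtering or counting does not need to know which format you started from. Note that values read from CSV are always strings, while JSON preserves numbers.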
European Data Portal
The European Data Portal can be searched by keyword or browsed by category, including energy, transport and education, among others. For each category, there are a number of datasets available, which you can dig into further by country.
Open Data Inception Project
The Open Data Inception online directory is an interactive map of more than 2,600 open data portals around the world. You can search by topic or country, or go straight to the location you are investigating and see what databases are available.
OpenInterests
According to its website, OpenInterests "combines different sets of information into a search engine, which can be used to quickly retrieve information about the activities of companies, people and institutions in a European context". Browse its 107,105 organisations, 18,200 individuals and 28 EU institutions to get started.
LittleSis
LittleSis is a free database connecting the dots between powerful business and government figures. Use it to investigate conflicts of interest and corruption, before trying the Oligrapher tool to visualise networks of influence using the data.
World Bank Data Catalog
Get access to global development resources through the World Bank data catalog, which can be searched by topic, alphabetically or according to the date when a dataset was last updated.
International Aid Transparency Initiative Datastore
The International Aid Transparency Initiative (IATI) Registry provides links to all raw data published by organisations using the IATI Standard. Its Datastore, currently available in an alpha version, lets you query the data through an API or an online form, and access it by sector or organisation.
Aleph
The Aleph search engine has over two million documents filed by oil, gas and mining companies to financial regulators, which can be browsed by people, company, project or technical terms.
Natural Resource Governance Institute
Fifty-eight datasets that aim to "help people realise the benefits of their countries' endowments of oil, gas, and minerals" are available on the Natural Resource Governance Institute (NRGI) website.
If you need to sharpen your investigative and data journalism skills, Journalism.co.uk is running three practical workshops this autumn that will help: Investigative skills for young journalists; Web detectives: online research; and Finding Stories in data: Intermediate Excel skills for digging deeper.