Aside from having the skills and methods required to find, obtain and understand existing data, journalists can also collate it themselves or facilitate the process of assembling data from different sources.

Jonathan Gray, fellow at the Institute for Policy Research at University of Bath, and co-founder of Public Data Lab, chaired a panel discussion about the people, processes, and technologies used to create collaborative data infrastructures for journalism at the International Journalism Festival in Italy today (6 April).

Turning traditional information into structured data

Nicholas Kayser-Bril, co-founder and chief executive of Journalism++, explained how he worked with a cross-border team of journalists to produce the ongoing project The Migrants' Files. The project tracks how many people have died while attempting to cross the border into Europe, as well as how much money has been spent on trying to prevent their arrival.

The data used for The Migrants' Files came from the work of an NGO and that of an Italian journalist who had already been compiling information about the topic, as well as from the project team's own research.

However, the data from the first two sources consisted of a list of press clippings, which couldn't have been used to create the map for the project, so Kayser-Bril and his colleagues had to turn it from plain text into structured data, by cleaning and geolocating it.

The data and methodology used in the project were then re-used by public institutions, which is "an example of how journalistic work can lay a foundation for data infrastructure", he said.

Identifying the needs of the audience and the journalists working on an investigation

"Everything is interconnected nowadays, we jump across borders easily in any story," said Mar Cabra, editor of the data and research unit at the International Consortium of Investigative Journalists (ICIJ).

For Panama Papers, the ICIJ team identified three key needs of the more than 370 journalists working on the project: reliable communication, full access to the data, and the ability to visualise information.

To serve these needs, they created new software and tools, as well as customised existing open-source platforms, such as Global I-Hub, a social network developed for the journalists who were part of the investigation.

"Radical sharing" was one of the priorities, so ICIJ developed a private platform for reporters to access and search through the entire contents of the leak, which amounted to some 2.6 terabytes of information. They also created a public-facing platform which anyone can use to browse data from Panama Papers as well as other investigations ICIJ has worked on.

"Visualisation is very important, especially when dealing with data about people, companies, and addresses," Cabra added. Platforms such as Linkurious and Neo4j were used to show the connections between the politicians and other powerful figures named in the investigation and their business endeavours.

A breaking news desk structure for Electionland

Electionland, the initiative spearheaded by ProPublica in which news organisations, universities and technology companies collaborated to cover voting issues at the US election polls, had a breaking news desk structure at its pop-up newsroom based in New York.

Each team performed a specific role in finding, verifying, processing and distributing the data that came from a variety of sources and that was ultimately turned into leads for local reporters around the US, said Scott Klein, deputy managing editor at ProPublica.

The newsroom had: feeders, mainly journalism students, who monitored social media and verified the issues voters were flagging; catchers, a group of professional journalists from ProPublica and other news organisations who looked for patterns and broke up the issues according to area; and the national desk, where reporters wrote stories about the leads.

The technology used for Electionland included Check, a content management system dedicated to social media verification from Meedan; Slack; and Google Trends. ProPublica developers also created a platform called Landslide, which collected all the data coming in from sources other than social media, including texts, voter responses from online forms, and call center information.

Connecting archive content to stories

The Organized Crime and Corruption Reporting Project (OCCRP) has two or three dozen member centres across Eastern Europe. Friedrich Lindenberg, data journalist at OCCRP, said it's not always easy to find out if the issue someone is investigating has already been covered by other participating journalists.

He has been focusing on the relationship between archive content and stories and ways to connect the two.

Could the database behind OCCRP's Laundromat investigation into money laundering, for example, be run against Panama Papers data about company registrations, or against an older story about money laundering, to reveal any connections?
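As a rough illustration of what running one dataset against another can involve, the Python sketch below matches company names across two lists after basic normalisation. It is not OCCRP's or ICIJ's actual tooling: the function names, the list of legal suffixes, the record layout and the sample records are all invented for the example, and real investigations typically need fuzzier matching and manual verification of every hit.

```python
import re

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "corp", "llc", "sa"}


def normalise(name: str) -> str:
    """Lower-case a name, replace punctuation with spaces, drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def cross_reference(ours, theirs):
    """Return (our_record, their_record) pairs whose normalised names are equal."""
    index = {}
    for record in theirs:
        key = normalise(record["name"])
        if key:  # skip names that normalise to an empty string
            index.setdefault(key, []).append(record)

    matches = []
    for record in ours:
        for other in index.get(normalise(record["name"]), []):
            matches.append((record, other))
    return matches


# Invented sample records, not real data from any investigation.
investigation = [{"name": "Example Trading Ltd.", "source": "investigation"}]
registry = [
    {"name": "EXAMPLE TRADING LIMITED", "source": "registry"},
    {"name": "Unrelated Holdings Inc", "source": "registry"},
]

matches = cross_reference(investigation, registry)
```

Exact matching on normalised names is the simplest possible choice here. It only produces leads for a reporter to check, and it will miss variants such as transliterated or misspelled names.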

"For these 'follow the money' type of investigations, there are story atoms: these are people, companies, documents, events," Lindenberg said.

"Investigative journalism in this context is about finding ways to connect them."
