Journalism++, a network of data journalists and developers, has released an open source platform for managing large data sets and searching the connections between them.
Detective.io lets users input large data sets, predominantly of people or organisations, and then maps the connections between them into an easily searchable database.
"We wanted to investigate all the innovative energy projects in developing countries," Nicolas Kayser-Bril, co-founder and chief executive of Journalism++, told Journalism.co.uk, explaining the project which instigated the construction of Detective.io.
At the heart of the project, funded by the Bill and Melinda Gates Foundation via the European Journalism Centre’s Innovation in Development Reporting Grant Programme, was a suspicion "that a few players were pulling the strings", he said, "so we wanted to use some network analysis techniques" to investigate relationships between the key parties.
The team quickly realised that they had too many types of 'entities' – too many projects, companies, subsidiaries, individuals, countries – for existing tools to handle, so they decided to build their own way of mapping the networks and connections between the different players in the field.
These connections make up an ontology, a way of describing a field that provides the links between nodes in a network, which can then be structured, searched and tested through Detective.io.
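To make the idea of an ontology concrete, here is a minimal sketch in Python. It is not Detective.io's actual data model: the relationship names (`works_for`, `educated_in`, `owns`), the `Entity` class and the `link` helper are all hypothetical, chosen only to show how a set of entity types and permitted links can describe a field.

```python
from dataclasses import dataclass

# Hypothetical ontology: each relationship names the entity types it may connect.
RELATIONSHIPS = {
    # relationship name: (source type, target type)
    "works_for": ("person", "organisation"),
    "educated_in": ("person", "country"),
    "owns": ("organisation", "project"),
}


@dataclass(frozen=True)
class Entity:
    kind: str  # e.g. "person", "organisation", "project", "country"
    name: str


def link(source, relation, target):
    """Return a (source, relation, target) triple if the ontology allows it."""
    if relation not in RELATIONSHIPS:
        raise ValueError(f"unknown relationship: {relation}")
    if (source.kind, target.kind) != RELATIONSHIPS[relation]:
        raise ValueError(f"{relation} cannot link {source.kind} to {target.kind}")
    return (source, relation, target)
```

Under this sketch, two organisations can own the same project simply by creating two `owns` links that point at it, while a link the ontology does not define, such as a person owning a country, is rejected.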
"What we wanted was a way to store the data, to find a way to input it rapidly and to structure it," said Kayser-Bril. "We've now got a crunch base with all our data in one place where it's very accessible."
The 36 projects, 82 people and 90 organisations of this energy project are now linked and searchable on the site. The search function works in a similar way to Facebook's Graph Search, in that users can query by type of entity, such as projects or people.
The search function allowed the team working on the energy project to quickly test hypotheses about the connections, relationships or similarities between different agents.
"One of our hypotheses was that a lot of the people working in the field of innovative energy projects in developing countries were educated in the US," he said. "Using the graph search we can easily search that using natural language and see that 50 people come back from the database.
"It's very easy to test hypotheses like that and you can do that on any dimension in the data."
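The kind of hypothesis test Kayser-Bril describes can be sketched as a filter over linked records. The example below is an illustration only: the names and the `find` function are invented, and it does not reflect how Detective.io stores or queries its data.

```python
# Invented sample data as (subject, relation, object) triples.
TRIPLES = [
    ("Alice", "is_a", "person"),
    ("Bob", "is_a", "person"),
    ("Acme Energy", "is_a", "organisation"),
    ("Alice", "educated_in", "United States"),
    ("Bob", "educated_in", "France"),
    ("Alice", "works_for", "Acme Energy"),
]


def find(triples, kind, relation, target):
    """Return sorted names of entities of type `kind` linked to `target` by `relation`."""
    of_kind = {s for s, r, o in triples if r == "is_a" and o == kind}
    return sorted(
        s for s, r, o in triples
        if r == relation and o == target and s in of_kind
    )


# "People educated in the United States"
print(find(TRIPLES, "person", "educated_in", "United States"))
```

Changing the relation or the target, for instance to `works_for` and an organisation's name, tests a different hypothesis against the same data.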
At present, the search function requires users to type in the precise wording of the search terms, such as "United States" or "person", but Kayser-Bril said this will be adjusted in the coming days to give a more nuanced and semantic search while still providing the right results.
Upcoming projects include an investigation into microfinance in the Democratic Republic of Congo and a database of migrants who died on their way to Europe. Journalism++ is offering a paid service to set up these projects and input their data and ontologies, but the Detective.io software is open source, so any newsroom with a large data set can use it to understand its data better.
"We found that there were lots of people who had lots of data that they didn't know what to do with and being able to just map the data in an ontology that can be as complex and specific as the user requires is very powerful."
One example is an organisation in Paris working on civil society in Belarus which had a lot of information stored in a WordPress blog. They could not mine it or visualise it in WordPress, said Kayser-Bril, but Detective.io could give them the opportunity to make all of their data machine-readable to search and test as they please.
Other platforms are available – Poderopedia Plug & Play and the VIS, a data visualisation platform, were both built to analyse relationships of people in power – but they were not large enough in scope, said Kayser-Bril.
"We needed more entities," he said. "Two organisations could own the same project or one project could do different products and none of the platforms that were available could handle all of this. So we built our own."
Update: This article has been updated to show that Journalism++ will be upgrading the Detective.io search function in the near future and to clarify a quote from Kayser-Bril regarding the inspiration for the project.
We have also been asked to make clear that the project's funding was received via the European Journalism Centre's Innovation in Development Reporting Grant Programme.