The Panama Papers, the biggest leak in journalism’s history, saw hundreds of investigative journalists analyse 11.5 million documents over several years to produce data-led stories.
Perhaps it would be different today, as Microsoft is trying to solve this pain point of data journalism. Its new tool - Content Insights and Discovery Accelerator, or IDA - can analyse hundreds of thousands of documents or long video footage within seconds.
Combining artificial intelligence, object vision and optical character recognition (OCR), IDA can analyse pages and extract text, images and other key data. It also helps journalists search long videos, identifying faces or keywords, and provides a searchable transcript of the footage.
You can use IDA to analyse documents on your own or create a collaborative team (using the 'Portfolio' function) that can work on the same investigation and leave comments on your shared project. If you choose the 'Private' setting, only you can see your database, while the 'Public' setting allows anyone within your organisation to take a look at your work.
Once you have uploaded your data, such as scanned pages of a document or a large number of emails, you can start searching for a specific term. We used the Mueller Report to try out IDA ourselves, and searched for 'Putin'.
IDA not only helps you find a name on every page of your dataset (highlighted in yellow), but also provides insights about how often it features and which other names are connected to it.
You can also click on the keywords in grey to search for additional context or a definition from Bing. Although this has its limitations, it can be a good place to start exploring an unfamiliar topic.
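To make the idea of counting mentions and connected names concrete, here is a minimal Python sketch. It is not IDA's implementation or API, which the article does not describe; the function `term_report`, the sample pages and the names in them are all made up for illustration, and the approach assumes you already have plain text for each page and a list of names to look for.

```python
import re
from collections import Counter


def term_report(pages, term, known_names):
    """Count mentions of `term` and tally which other names share a page with it.

    `pages` is a list of page texts; `known_names` is a hypothetical list of
    names to look for (a real tool would detect these automatically).
    """
    def word_pattern(word):
        # Whole-word, case-insensitive match.
        return re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)

    term_pattern = word_pattern(term)
    total_hits = 0
    pages_with_term = []
    co_occurring = Counter()

    for page_number, text in enumerate(pages, start=1):
        hits = len(term_pattern.findall(text))
        if hits == 0:
            continue
        total_hits += hits
        pages_with_term.append(page_number)
        for name in known_names:
            if name.lower() != term.lower() and word_pattern(name).search(text):
                co_occurring[name] += 1  # one count per shared page

    return total_hits, pages_with_term, co_occurring


# Invented sample pages, not taken from any real document.
pages = [
    "Smith met Jones in March. Smith later wrote to Lee.",
    "Jones was interviewed twice.",
    "Lee and Smith discussed the meeting.",
]
total, found_on, related = term_report(pages, "Smith", ["Smith", "Jones", "Lee"])
print(total, found_on, related.most_common())
```

Counting one co-occurrence per shared page is a deliberate simplification; a production tool would likely weigh proximity within the page as well.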
Once you have your analytics displayed, you can start exploring 'Insights'. Colour-coded graphics will show data and people contained in your dataset.
The 'Relationships' tab helps you explore how these names are connected, and 'Stacked Bar' allows you to compare variables, for example names and locations.
This can be of great help when you start analysing a dataset as you can see what people, locations or other data feature most prominently.
The video indexer helps you analyse speech and faces in any footage, no matter the length. The facial detection function currently recognises more than 2 million public figures and also tells you what percentage of the video they appear in, along with information about who they are and the probability of the match being correct.
The facial detection database contains most mainstream celebrities, politicians and sportspeople. It can, however, be personalised, and you can add your local politicians or any people of interest your publication routinely reports on.
The video indexer also gives you the main topics and ‘named entities’ which it picks up from the speech. If you want to see these in their context, you can search the exact wording in the transcription. This function also shows you where in the video your term appears and how many times.
The video indexer also has a subtitles function that allows you to follow the speech with real-time transcription. This can be translated into more than 30 languages and you can even share the translated video with your audience.
As with the facial database, you can add specific terms or jargon to the language database. If, for example, you report on video games and you have footage from a gaming show, you are able to add all the names of video game characters for an accurate transcription and analysis. The same goes for health or sports reporting.
IDA is 80 per cent developed, and the remaining 20 per cent can be customised by your own developer, so you can add features or data for analytics that matter the most to your reporting.
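As a rough illustration of searching a transcript for a term, the Python sketch below finds where a word appears in a timestamped transcript and how many times. It is only an assumption about how such a search could work, not the video indexer's actual method; the `(start_seconds, text)` segment format, the helper names `find_term` and `as_timestamp`, and the sample transcript are all hypothetical.

```python
def find_term(segments, term):
    """Return the total number of mentions of `term` and the segments containing it.

    `segments` is a list of (start_seconds, text) pairs, an assumed
    transcript format chosen for this example. Matching is a simple
    case-insensitive substring search.
    """
    needle = term.lower()
    count = 0
    matches = []
    for start_seconds, text in segments:
        mentions = text.lower().count(needle)
        if mentions:
            count += mentions
            matches.append((start_seconds, text))
    return count, matches


def as_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (fractions are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Invented transcript segments for illustration only.
segments = [
    (0.0, "Welcome to the budget debate."),
    (75.5, "The budget was revised, and the budget committee agreed."),
    (3725.0, "Thank you and good night."),
]
count, matches = find_term(segments, "budget")
timestamps = [as_timestamp(start) for start, _ in matches]
print(count, timestamps)
```

Because the search is a plain substring match, it would also count the term inside longer words; whole-word matching would need a regular expression.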