Finding one name in a sea of scanned documents or video footage takes a lot of legwork. This new tool can speed up the process and supercharge collaborative data investigations.
The Panama Papers, the biggest leak in journalism’s history, saw hundreds of investigative journalists analyse 11.5 million documents over several years to produce data-led stories.
Today the process might look different, because Microsoft is trying to solve this pain point of data journalism. Its new tool, Content Insights and Discovery Accelerator (IDA), can analyse hundreds of thousands of documents or lengthy video footage within seconds.
Combining artificial intelligence, computer vision and optical character recognition (OCR), IDA can analyse pages and extract text, images and other key data. It also helps journalists search long videos by identifying faces or keywords, and it provides a searchable transcript of the footage.
You can use IDA to analyse documents on your own or create a collaborative team (using the 'Portfolio' function) that works on the same investigation and leaves comments on your shared project. With the 'Private' setting, only you can see your database, while the 'Public' setting lets anyone within your organisation look at your work.
Once you have uploaded your data, such as scanned pages of a document or a large number of emails, you can start searching for a specific term. We used the Mueller Report to try out IDA ourselves, and searched for Putin.
IDA not only finds a name on every page of your dataset (highlighted in yellow) but also shows you how often it appears and which other names are connected to it.
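IDA's own matching logic isn't public, so what follows is only a rough Python sketch of the idea behind this kind of insight, built on our own assumptions. It assumes the OCR'd text of each page is a plain string and that you supply a hypothetical list of known names to check. The function `name_insights` is our name, not part of IDA. It records which pages mention the target, how many times in total, and which other names share a page with it.

```python
import re
from collections import Counter


def name_insights(pages, target, known_names):
    """Return (pages_with_target, total_mentions, co_occurring_names).

    Hypothetical illustration: `pages` is a list of OCR'd page texts and
    `known_names` is a list of names to test for co-occurrence.
    """
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    pages_with_target = []
    total_mentions = 0
    co_occurring = Counter()
    for number, text in enumerate(pages, start=1):
        hits = len(pattern.findall(text))
        if hits == 0:
            continue  # target not on this page
        pages_with_target.append(number)
        total_mentions += hits
        # Count each other name once per page it shares with the target
        for name in known_names:
            if name.lower() == target.lower():
                continue
            if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
                co_occurring[name] += 1
    return pages_with_target, total_mentions, co_occurring


# Toy pages, not real documents
pages = [
    "Alice met Bob in Paris. Alice later left.",
    "Carol wrote a memo.",
    "Bob and Alice spoke again.",
]
found, mentions, connected = name_insights(pages, "Alice", ["Alice", "Bob", "Carol"])
print(found, mentions, dict(connected))  # [1, 3] 3 {'Bob': 2}
```

Counting names that share a page is only a crude stand-in for "connected". A production tool would more likely rely on named-entity recognition than on a fixed list of names.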
You can also click on the keywords in grey to pull additional context or definitions from Bing. Although this has its limitations, it can be a good place to start when exploring an unfamiliar topic.
Once your analytics are displayed, you can start exploring 'Insights', where colour-coded graphics show the people and other data contained in your dataset.
The 'Relationships' tab helps you explore how these names are connected, while 'Stacked Bar' lets you compare variables such as names and locations.
This can be of great help when you start analysing a dataset as you can see what people, locations or other data feature most prominently.
IDA's Video Indexer helps you analyse speech and faces in footage of any length. Its facial detection function currently recognises more than 2 million public figures and tells you what percentage of the video each one appears in. It also gives information about who they are and the probability that the identification is correct.
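To make the screen-time figure concrete, here is a minimal Python sketch. It assumes face detections arrive as hypothetical (start, end) intervals in seconds. Overlapping intervals are merged first so that no second is counted twice. This is our illustration of the arithmetic, not a description of how Video Indexer computes it.

```python
def screen_time_percentage(appearances, video_length):
    """Share of the video (in per cent) in which a face is on screen.

    `appearances` is a hypothetical list of (start, end) times in seconds;
    overlapping or touching intervals are merged before summing.
    """
    if video_length <= 0:
        raise ValueError("video_length must be positive")
    covered = 0.0
    current_start = current_end = None
    for start, end in sorted(appearances):
        if current_end is None or start > current_end:
            # Gap before this interval: close off the previous merged run
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current merged run
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return 100.0 * covered / video_length


# Two overlapping sightings (0-30s, 20-50s) and one later one (100-110s)
print(screen_time_percentage([(0, 30), (20, 50), (100, 110)], 200))  # 30.0
```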
The facial detection database contains most mainstream celebrities, politicians and sportspeople. It can, however, be personalised: you can add local politicians or anyone else your publication routinely reports on.
Video Indexer also lists the main topics and ‘named entities’ it picks up from the speech. To see these in context, you can search for the exact wording in the transcript, which also shows you where in the video your term appears and how many times.
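Searching a timed transcript for a term can be sketched in a few lines of Python. The sketch assumes, hypothetically, that the transcript is a list of (start_seconds, text) segments. `find_term` and `fmt` are illustrative names, not Video Indexer functions. The number of timestamps returned is the number of times the term occurs.

```python
import re


def find_term(transcript, term):
    """Return the segment start time (seconds) for every occurrence of `term`.

    `transcript` is a hypothetical list of (start_seconds, text) segments;
    a segment mentioning the term twice contributes its start time twice.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    times = []
    for start, text in transcript:
        times.extend([start] * len(pattern.findall(text)))
    return times


def fmt(seconds):
    """Format seconds as HH:MM:SS (fractions of a second are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


transcript = [
    (0.0, "Welcome to the show."),
    (65.5, "The budget was cut. The budget matters."),
    (3725.0, "Back to the budget."),
]
hits = find_term(transcript, "budget")
print(len(hits), [fmt(t) for t in hits])  # 3 ['00:01:05', '00:01:05', '01:02:05']
```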
Video Indexer also has a subtitles function that lets you follow the speech with a real-time transcription. This can be translated into more than 30 languages, and you can even share the translated video with your audience.
As with the facial database, you can add specific terms or jargon to the language database. If, for example, you report on video games and have footage from a gaming show, you can add the names of video game characters for more accurate transcription and analysis. The same goes for health or sports reporting.
IDA comes 80 per cent developed, and your own developers can customise the remaining 20 per cent. This lets you add the features or data for analytics that matter most to your reporting.