The BBC is preparing to launch a cloud-based sound archiving tool that could make it easier for any news organisation, journalist or academic institution to find and share old audio.
The project, called Comma, is being developed with two commercial partners, Kite and Somethin' Else, and is based on the technology behind the BBC World Service audio archive.
Due to launch next spring, it will be made available to third parties who are sitting on decades of interviews, documentaries and news reports that are gathering dust, poorly labelled or hard to search.
Rob Cooper, a development producer at BBC R&D, told Journalism.co.uk: "When digitisation came along we thought that would be the solution to the problem. The truth is what you do is swap out a dusty and under-used physical archive for a digital equivalent.
"Even though you've got all these files now and they're easy to send, share and listen to, finding stuff within the content is still really hard."
Comma takes a digital audio file and analyses the sound to extract metadata: it uses speech-to-text to create a transcript, identifies topics and, using voice detection, even identifies who is speaking.
"It can really aid the search and archive of large amounts of audio," Cooper said. "It should do a pretty good job at being able to identify the bits where you speak in an interview. This, we think, could be potentially useful for journalists and archivists who need to try and search audio a bit more rapidly."
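To illustrate why this kind of metadata makes audio easier to search, here is a minimal Python sketch. It is not Comma's code or API, neither of which is described here. The `Segment` structure, the `search` function and the sample data are all hypothetical, and the sketch assumes a tool has already produced timestamped, speaker-labelled transcript segments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of speech in a recording (hypothetical structure)."""
    start: float   # seconds from the start of the file
    end: float     # seconds from the start of the file
    speaker: str   # label assigned by speaker identification
    text: str      # speech-to-text output for this stretch


def search(segments, term, speaker=None):
    """Return segments whose text contains `term`, optionally for one speaker."""
    needle = term.lower()
    return [
        s for s in segments
        if needle in s.text.lower() and (speaker is None or s.speaker == speaker)
    ]


# Made-up example data, standing in for the metadata a tool might extract.
segments = [
    Segment(0.0, 6.5, "interviewer", "How has digitisation changed the archive?"),
    Segment(6.5, 19.0, "guest", "Finding stuff within the content is still really hard."),
    Segment(19.0, 24.0, "interviewer", "So the archive is still hard to search?"),
]

# Print every point where the interviewer mentions the archive.
for hit in search(segments, "archive", speaker="interviewer"):
    print(f"{hit.start:.1f}-{hit.end:.1f}s {hit.speaker}: {hit.text}")
```

With timestamps attached to each segment, a journalist could jump straight to the relevant point in a recording instead of listening through the whole file.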
Comma is the commercial equivalent of the BBC's World Service archive project, which saw 70,000 hours of old audio analysed and made searchable.
It could help documentary-makers, by taking the audio from raw footage, transcribing it and making quotes and clips easier to find.
Other institutions such as the British Library are also interested in the potential.
Cooper said: "There's lots of exciting opportunities out there for making these fantastic resources, which have been sitting there for years, a little bit more useable.
"I think it has fantastic potential for the BBC for programme-making, as more audio comes online and tools like this open it up a bit."
He added: "This software is incredibly exciting but it's very much down to the quality of the input audio. The audio quality has to be pretty good for these speech-to-text programs to work well.
"If you've got old tapes where it's very fuzzy and the volume is low it's going to be very hard to get meaningful speech-to-text out of it. The system is pretty demanding."
You can find out more about this archive project - and general issues relating to making audio findable in the digital age - in this week's Journalism.co.uk podcast.
A more detailed blog post about how Comma works can also be found here.