The first day of each Olympics generates more data than the entire previous Games, according to MarkLogic, a data software specialist that is providing the BBC and the Press Association with a way of processing data on the Games.
The firm has built a database to manage the unstructured information and data in a variety of forms, including images, video and text, generated by the publishers.
The database will "manipulate in real-time the enormous volumes of data that will be generated during the Olympics", John Pomeroy, MarkLogic's vice president for Europe, told Journalism.co.uk.
Pomeroy said the software firm had been approached by the BBC and PA for a system that would allow them to "manage the speed of ingest of the data and the number of enquiries", plus the "sheer volume of information that they were expecting from an editorial and a results data perspective".
Pomeroy added that the outlets were also looking at it from a business process perspective, "trying to automate where pieces of correspondence should reside on their site automatically".
There are two objectives, Pomeroy explained: first, to provide the technology to cope with the rate of ingest of the data and, second, to "re-architect" the workflow "so that as correspondents are writing about a particular subject, whether that's an athlete or a team or a country, the system can look at that, analyse it and identify where that should appear in the various places on their online presence, which may well look different on tablet and on their website".
"The data PA manages on a typical Saturday is vast, with all the football matches, every touch of every ball, who passed it to who and was it successful, that is streaming in from every match live and then they franchise that information out to betting shops."
"The volumes of data both incoming but also the volume of enquiry relating to the Olympics is pretty mind boggling."
Pomeroy described the process as "dynamic semantic publishing".
"The system is analysing the text the correspondent has written to identify what that article is associated with and from that it is automatically working out where that information should be placed in the various places on the site."
"From a correspondent's perspective it grossly simplifies what they need to do. Before they used to write the article and then probably write something alongside it that was instructing the person putting the site together where this information should exist. So you are cutting out a lot of that manual intervention and streaming and speeding up the publishing process."
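The routing step Pomeroy describes can be illustrated with a toy sketch. It is not MarkLogic's or the publishers' implementation: the SECTION_INDEX vocabulary, the Article class and the route function are all hypothetical names, and plain phrase matching stands in for whatever semantic analysis the real system performs.

```python
import re
from dataclasses import dataclass, field

# Hypothetical vocabulary mapping a known term to a site section.
# A real system would draw on a far richer store of athletes, teams and events.
SECTION_INDEX = {
    "usain bolt": "athletes/usain-bolt",
    "jamaica": "countries/jamaica",
    "100m": "events/mens-100m",
    "team gb": "teams/team-gb",
}


@dataclass
class Article:
    headline: str
    body: str
    sections: list = field(default_factory=list)


def route(article):
    """Attach every section whose term appears in the article text."""
    text = f"{article.headline} {article.body}".lower()
    matches = []
    for term, section in SECTION_INDEX.items():
        # Match the term only as a whole word or phrase.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text):
            matches.append(section)
    article.sections = sorted(matches)
    return article.sections


story = Article(
    headline="Bolt wins 100m final",
    body="Usain Bolt of Jamaica retained his title.",
)
print(route(story))
```

In this sketch the correspondent supplies only the headline and body; the list of sections is derived from the text, which is the manual step the quote says is being cut out.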
MarkLogic has more details of dealing with Olympics data on its blog.
John O’Donovan, director of architecture and development at the Press Association, said MarkLogic is one of a number of "innovative technologies" which are delivering "services relating to the London 2012 Olympics".
He added that MarkLogic "is being used as a key component in PA's new Common Platform which underpins all of our text, image, video and data delivery".
"In particular it gives us the flexibility to deal with the complex content and data services we manage - enabling us to deliver real-time, bespoke content to many customers.
"This speed and flexibility is demonstrated particularly well in our data services which deliver Olympics results through MarkLogic and our own infrastructure. These results are then processed, organised and made available in our APIs and web services for clients before they even reach them on live TV."
Earlier this year the BBC's lead technical architect for news and knowledge core engineering, Jem Rayfield, blogged about the broadcaster's strategy for a fully dynamic semantic publishing (DSP) architecture and MarkLogic's role in this.