
Eighteen months ago a team led by Brian Boyer, who was then news applications editor at the Chicago Tribune, was awarded Knight News Challenge funding to develop PANDA as a "newsroom data library".

Boyer, who is now news applications editor at National Public Radio (NPR) and leads the PANDA project, explained why he wanted to develop "a place for newsrooms to put their data".

"So much data in newsrooms is just stuck on someone's hard drive, and we think that's sad," he said.

"We would like newsrooms to better collaborate with data, but then above and beyond that we hope that PANDA can help increase newsroom intelligence, make people more efficient reporters."

So how does it work? Journalists, perhaps with help from developers, set up screen scrapers, and the data those scrapers collect is automatically fed into their PANDA library. They can also feed in other data, such as responses to Freedom of Information requests.

Once data is in the newsroom's private data library, it can be searched, in much the same way as Google is used to search the internet.

Journalists can also set up email alerts so they can receive a message in their inbox when a news event happens.

Boyer gave an example of how the team at the Chicago Tribune uses PANDA.

He explained that the Cook County Sheriff's office publishes a list of arrest warrants on its website. "The website is impenetrable," Boyer said. "There is no reverse chronological list of warrants, it's a mess. It would be very difficult for you to visit that website every day and see if a new warrant had been issued."

"What we did in Chicago was to write a screen scraper, a little bit of code that combs through the website, pulls all the warrants out every night and then feeds them into PANDA.

"Then the reporter can go and visit PANDA, search for homicides or burglary and see a list of all the warrants that are outstanding."
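As a rough illustration of the workflow Boyer describes, the sketch below parses a warrants table out of a web page, converts it to CSV ready for loading into a data library, and runs a simple keyword search. It is not the Tribune's scraper: the page markup, the column names, the sample records and the function names are all assumptions made up for this example, and it works on a hard-coded page rather than the sheriff's real website.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup and made-up records: the article does not describe
# the real layout of the sheriff's website.
SAMPLE_PAGE = """
<table>
  <tr><th>Name</th><th>Charge</th><th>Issued</th></tr>
  <tr><td>DOE, JOHN</td><td>Burglary</td><td>2013-01-14</td></tr>
  <tr><td>ROE, JANE</td><td>Homicide</td><td>2013-01-15</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_warrants(html):
    """Return one dict per warrant, keyed by the table's header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    header, *records = parser.rows
    return [dict(zip(header, record)) for record in records]


def to_csv(warrants):
    """Serialise the warrants as CSV text, a format a data library can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(warrants[0].keys()))
    writer.writeheader()
    writer.writerows(warrants)
    return buffer.getvalue()


def search(warrants, term):
    """Case-insensitive keyword match across every field of each warrant."""
    term = term.lower()
    return [w for w in warrants if any(term in v.lower() for v in w.values())]


warrants = scrape_warrants(SAMPLE_PAGE)
print(to_csv(warrants))
print(search(warrants, "homicide"))
```

In a real newsroom the page would be fetched over the network on a nightly schedule and the CSV uploaded to the library. Those steps are left out here because they depend on the site and the installation.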

PANDA also allows reporters to subscribe to get updates by email "so when you get to work in the morning you would have an email in your inbox telling you 'hey, there was just a new warrant issued for homicide'".
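The email alert can be thought of as a saved search that is re-run after each import and reports only rows that were not there before. The sketch below shows that idea in plain Python. It is an assumption about the general approach, not PANDA's actual implementation; the function names, the sample records and the recipient address are hypothetical, and the message is only built, not sent.

```python
from email.message import EmailMessage


def new_matches(previous, current, term):
    """Rows in `current` that were absent from `previous` and mention `term`."""
    seen = {tuple(sorted(row.items())) for row in previous}
    term = term.lower()
    return [
        row for row in current
        if tuple(sorted(row.items())) not in seen
        and any(term in value.lower() for value in row.values())
    ]


def build_alert(matches, term, recipient):
    """Compose (but do not send) an email summarising the new matches."""
    message = EmailMessage()
    message["To"] = recipient
    message["Subject"] = f"{len(matches)} new result(s) for '{term}'"
    lines = [", ".join(f"{k}: {v}" for k, v in row.items()) for row in matches]
    message.set_content("\n".join(lines))
    return message


# Made-up records standing in for two consecutive nightly imports.
last_night = [{"Name": "DOE, JOHN", "Charge": "Burglary"}]
tonight = last_night + [{"Name": "ROE, JANE", "Charge": "Homicide"}]

matches = new_matches(last_night, tonight, "homicide")
alert = build_alert(matches, "homicide", "reporter@example.com")
print(alert["Subject"])
print(alert.get_content())
```

Comparing whole rows keeps the example short. A production system would more likely track a stable identifier, such as a warrant number, if the source provides one.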

The PANDA team has written screen scrapers and made them available to others via ScraperWiki, a site for collaboratively building programs to extract and analyse data.

So where next for PANDA? The newsroom data library has now been created and is used by a number of newsrooms. The $150,000 funding is nearing an end and, according to Boyer, the project will require additional investment if it is to be developed further.

The remaining Knight News Challenge funding is being used to work on bug fixes and releases, Boyer said.

"Next up is internationalisation – we hope to launch Spanish and German translations soon."

PANDA costs $70 a month, which is "roughly the cost of an Amazon EC2 Small Instance, the target hardware for PANDA", Boyer explained. There is also a Google group for PANDA users, which provides updates.
