But no-one wanted to talk about it till now: finally Journalism.co.uk got to put its questions to the team this week. Head of digital production, Ian Douglas, had been rather demoted in an Independent on Sunday piece which said he 'loaded the stories on to the web'. In fact, Douglas' job has involved rather more than that, not least managing the release of data online. "[Data publication] was one of the first things we thought about - how can we publish this in full?" says Douglas.
So why delay?
"The scans are forms - mostly handwritten. You've got quite a lot of wiggly bits of Biro and that sort of thing. There's no real substitute for getting someone to go in and fill out a big spreadsheet - a terrible job.
"We have been building it up as we go along, but we wanted to make sure we had the overview before we published anything.
"Through the whole thing we haven't wanted to give partial coverage to it. Part of the idea about having the spreadsheets is about saying 'here's everybody's figures'.
"News stories focus on individuals really well - we wanted more comprehensive data which is why it hasn't been around till now," Douglas explains.
The Telegraph's format - as anyone who has tried out the MPs' expenses database will know - is a series of fact-sheet style tabs, plus documents published using Issuu, organised by name but also searchable by constituency. A Google spreadsheet has also been released. Douglas' latest blog post can be found at this link.
Wasn't it commercial pressure, and the incentive of added print sales, that dissuaded the Telegraph from publishing it in full at first?
"Public interest has always been the justification for publishing," answers Douglas.
"I think actually that the commercial advantage is lined up quite nicely. When you have a publishing business and there's something that's clearly in the public interest and we publish as much of it as possible, that fits quite nicely into the business model and we have tried to make money off the stuff we've published."
Having said that, he adds that the Google Doc hasn't earned them a penny. But then it cost nothing - apart from manpower - to put it up.
Sharing the PDFs wasn't costly either. While there were other services that they could have chosen, they opted for Issuu, which was pretty easy to get set up with, he says. "We signed up, got a pro account for something like $19 - not exactly a big deal."
The Guardian beat them to publication of a database: its interactive feature allows users to sift through the redacted data and has seen overwhelming interest both nationally and internationally, and a positive response from users.
Why didn't the Telegraph go down the crowd-sourced route?
"I think the Guardian stuff is very good but they can't publish all the source documents that we have [over one million PDF pages]. There's too much missing from the public view.
"Parliament approved a heavily redacted one [version] and we're still concentrating on getting everything out," Douglas says.
There's more to come?
What they did on Monday was the 'beginning of the process', he says. "We got to the point where we felt like we had a good stripe of the data covered. There are more PDFs to come and more figures to come. It's a steady process now."
Douglas' colleague, Tim Rowell, Telegraph.co.uk digital publisher, made a revelation in an interview with Paul Bradshaw for the Online Journalism Blog: the database site (hosted on parliament.telegraph.co.uk) will be expanded in the run-up to the General Election.
"We will be enhancing our political resources over the coming months as we build up to the General Election. This application is not just for the Expenses files, we have plans to develop this area into a full service that enables our users to engage more closely with the democratic process," Rowell told Bradshaw in an email.
What has Douglas learnt from all this?
"It has been a bigger job than we initially thought to really compile all the data, particularly the minimal redacting we have to do before publishing PDFs," Douglas says.
"It's taking a lot more effort than I had originally thought. Although as far as the approach [goes], these are things we would have done with smaller amounts of data."
He would reconsider the staffing level for future big data projects, he says. "The reporting team stopped reporting for a minute, and started compiling figures. I don't think there's any other way of doing it."
So, could he see 'data journalists' playing an increasing role in the newsroom?
"You need people to go through data, who understand it," to "pick up the bigger picture," he says.
"Journalism generally is becoming a lot more about trawling large data-sets - [there are] very few journalists now who will get through careers without doing that at some point. There are still vast arrays of public data that no-one has gone through yet."
Related links on Journalism.co.uk
- 'Telegraph.co.uk: Guide to the full MP expenses database' [23/06/09]
- 'Let the expenses data war commence: Telegraph begins its document drip feed' [19/06/09]
- 'Telegraph to publish 'unredacted' information… in print' [18/06/09]
- 'Guardian launches 'major crowd-sourcing experiment' with MPs’ expenses application' [18/06/09]
- 'Has the Telegraph failed by keeping expenses process and data to itself?' [15/06/09]
- 'Telegraph 'didn't tell any lies but was selective in its facts', says Lib Dem Voice site editor' [11/06/09]