The investigative non-profit outlet announced the arrival of the data library late yesterday (Wednesday, 26 February), claiming that for journalists, the provision of the 'premium' data in particular "will save you months of work preparing the data".
In an online post, Scott Klein, senior editor of news applications, and data reporter Ryann Grochowski Jones described the platform as "a new way for us to share our datasets and for them to help sustain our work".
A number of other news sites have launched online data stores to share materials with a wider audience, with the Guardian's Data Blog and Data Store being a well-known example.
What is particularly interesting about the ProPublica Data Store is that it also charges for access to certain materials. "In most cases, it's $200 for journalists and $2,000 for academic researchers," yesterday's online post explains, while different rules may apply for commercial use.
Other news sites have also been experimenting with charging for access to specific data sets. Exaro, another investigative news site, introduced a charging system for its Insolvency Index last year instead of continuing with its paywall.
Detailing the new ProPublica Data Store, Klein and Grochowski Jones said the "reasonable one-time fee" would apply to data sets that have come out of intensive work by the ProPublica team.
"Much of our data comes from our developers spending months scraping and assembling material from web sites and out of Acrobat documents," the announcement post explains. "Some data requires months of labour to clean or requires combining datasets from different sources in a way that's never been done before."
Data sets falling under the 'premium' bracket include materials relating to ProPublica projects such as Dollars for Docs and the site's Prescriber Check-up app.
But ProPublica stresses that journalists are able to see a sample of the data set before they part with their cash. The organisation is also confident that the material shared on the platform will be a valuable resource for the journalism community, as well as for others.
"The datasets contain a wealth of information for researchers and journalists. The premium datasets are cleaned and ready for analysis. They will save you months of work preparing the data. Each one comes with documentation, including a data dictionary, a list of caveats, and details about how we have used the data here at ProPublica."
Charges do not apply to all of the data shared on the platform. The store also offers, at no cost, "as-is datasets we receive from government sources" in response to FOI requests, as well as links to "datasets that are free and available online".
Klein and Grochowski Jones added that the platform "is a bit of an experiment", and that they will be "paying close attention and expect to learn a lot in the first few weeks after launch".