“More often than not data will show up on your doorstep with a problem – or two or three or four.”
When collecting information for a story, journalists sometimes receive incomplete, inconsistent or even incorrect data, which complicates their work and delays the start of their analysis. In this article on DataDrivenJournalism, MaryJo Webster, data editor at the Minneapolis Star Tribune, points out the most common problems that occur with data requested from agencies or public institutions.
Inconsistencies in spelling or in units of measurement (for example, when it is not clear which currency a salary is expressed in) can make calculations difficult. To find these inconsistencies, Webster recommends creating a “summary of each field/column using a Pivot Table or a group by query or any other tool you would use to summarise data”, which makes the discrepancies easier to see.
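For readers who work in code rather than a spreadsheet, the same kind of per-column summary can be sketched in a few lines of Python. This is only an illustration of the idea, not Webster's own method: the sample rows, the column names and the `summarise_columns` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical sample rows; the column names and values are invented for illustration.
rows = [
    {"city": "Minneapolis", "salary_unit": "USD/year"},
    {"city": "minneapolis", "salary_unit": "USD/year"},
    {"city": "Mineapolis", "salary_unit": "USD/hour"},
    {"city": "St. Paul", "salary_unit": "USD/year"},
]


def summarise_columns(rows):
    """Count how often each distinct value appears in each column."""
    summary = {}
    for row in rows:
        for column, value in row.items():
            summary.setdefault(column, Counter())[value] += 1
    return summary


summary = summarise_columns(rows)

# Print every column with its distinct values, most frequent first.
for column, counts in summary.items():
    print(column)
    for value, count in counts.most_common():
        print(f"  {value!r}: {count}")
```

In a listing like this, spelling variants of the same city and a stray hourly unit among yearly salaries each show up as a separate value with its own count, which is what makes them easy to spot.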
If you notice something odd about the data, such as potential duplicates, ask your source for clarification and try to compare your dataset with paper records and other sources.
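One possible way to flag potential duplicates before going back to the source is to group records on a few key fields after light normalisation. The sketch below assumes that differences in letter case and spacing are not meaningful; the records, field names and helper functions are hypothetical, and any match it reports is only a candidate to verify, not a confirmed duplicate.

```python
from collections import defaultdict

# Hypothetical records; names and agencies are invented for illustration.
records = [
    {"name": "Jane Doe", "agency": "Public Works"},
    {"name": "jane doe ", "agency": "Public Works"},
    {"name": "John Smith", "agency": "Parks"},
]


def normalise(value):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def find_potential_duplicates(records, key_fields):
    """Return normalised keys shared by more than one record, with row indexes."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(normalise(record[field]) for field in key_fields)
        groups[key].append(index)
    return {key: indexes for key, indexes in groups.items() if len(indexes) > 1}


duplicates = find_potential_duplicates(records, ["name", "agency"])
print(duplicates)
```

The row indexes in the result point back to the original records, so the suspect rows can be quoted exactly when asking the agency for clarification.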
Check out DataProofer, a free tool that automates the process of checking a dataset for potential mistakes or missing information.