The democratization of data is one of the most powerful forces shaping society today. Not so long ago, the gathering of data, its storage, and its analysis were, for the most part, the work of a fairly small circle of highly trained people. Researchers used social science methods and data largely drawn from national surveys to produce accurate information that policy makers could use to make key decisions. But things are very different today.
The walls of training and barriers to access that used to make data the province of an educated elite—the people who could interact with a raw dataset, had access to the right software, could extract meaning from the data, and understood the limitations of the sources—have fallen. Today anyone with a mobile device can access, create, analyze, and disseminate vast quantities of information. This is the democratization of data.
And this transformation is producing a lot of valuable work that advances the public good. Here in Chicago, I am struck by the work of Smart Chicago and the Art Institute of Chicago, which cohosted an event exploring the relationship between information technology, urban space, and the public good in the age of big data. Another example is the Chicago Tribune’s remarkable data journalism series revealing that the city’s red light cameras were looking more like cash registers than traffic monitors.
While there are lots of positive aspects of data democratization, there are plenty of challenges to address, many of them having to do with making sure the vast new sources of data are used wisely and well. What’s important now is expanding data literacy: helping the public become informed consumers of what they are seeing, reading, and using. When data no longer flows through the hands of experts, it must come with added education so that people can use it wisely and to their advantage.
This is where some of the lessons learned during the days when we were figuring out how to use data in the service of democracy remain important and useful. It is important to remember that the analysis of data is a science, and whether we are compiling a dataset from traditional survey data or scraping it from social media, there are key questions we must keep in mind. First, is it representative of the population or phenomenon we are trying to understand? Second, is it big enough for us to draw meaningful conclusions? Third, is it asking the right questions, in the right ways, to address what we need to know? Fourth, is it open and transparent about its limitations and possible biases? Without this information, it is hard to trust the results. The election season certainly provides plenty of examples of data being put to use to advance a preconceived point of view.
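The second of these questions, whether a dataset is big enough to support meaningful conclusions, has a classical quantitative answer for survey data. As a rough sketch (assuming a simple random sample, which real-world surveys and scraped social media data often are not), the margin of error for an estimated proportion can be computed like this:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion estimated from a sample.

    n: sample size
    p: observed proportion (0.5 is the worst case, giving the widest margin)
    z: critical value (1.96 corresponds to 95% confidence)

    Assumes a simple random sample; surveys with complex designs
    need further adjustment (a "design effect").
    """
    return z * math.sqrt(p * (1 - p) / n)

# A 1,000-person sample yields roughly +/-3 percentage points.
print(round(margin_of_error(1000) * 100, 1))  # -> 3.1
```

The formula also shows why "bigger" alone is not enough: a huge but unrepresentative dataset can produce a tiny margin of error around a badly biased estimate, which is why the first question, representativeness, comes before sample size.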
The democratization of data can be a powerful force for good and it will certainly transform the ways society makes informed decisions. As that transformation takes place it is important to keep both the data sound and the science intact.
Dan Gaylin is President and CEO of NORC at the University of Chicago, one of the nation’s premier social science research institutes. Gaylin’s career has spanned think tanks, commercial consulting, and government. A nationally recognized expert on health policy and program evaluation, his work has focused on using complex data of many different types, and sophisticated analysis, to inform some of the most important issues facing society. At NORC he leads a staff of 2,000 people who conduct research across the spectrum of the human experience, including economics, markets, and the workforce; education, training, and learning; global development; health and well-being; and society and public affairs.