The Data Dilemma - Microsoft On the Issues

The world is awash in data: by one estimate, almost three zettabytes (three billion terabytes) of information had been created by 2012, a digital deluge that is growing at around 50% a year. A unique combination of technological innovation, social media, ubiquitous connectivity and digital globalization, among other factors, is fueling this exponential growth in the volume, variety and availability of data. At the same time, increasingly powerful computing technologies can now take massive amounts of these data, commingle them, and use advanced machine learning and analytics to yield new insights and knowledge. And we are only at the start of this data revolution.

As the World Economic Forum (WEF) observes in a new report, Unlocking the Value of Personal Data: From Collection to Usage, these technologies hold extraordinary potential for new innovations, economic growth and societal benefit. For example, predictive models developed from large-scale hospital data sets can be used to identify patients who are at the highest risk of being rehospitalized within 30 days after they are discharged. A recent analysis using Microsoft technology applied machine learning to a large multi-year data set of patient hospitalizations in the Greater Washington, DC, metropolitan area. The resulting predictive model can reveal risk factors that were previously undetectable: for example, if a patient was admitted for congestive heart failure, they were more likely to be readmitted within 30 days if they were depressed or taking drugs for gastrointestinal disorders.

There are, however, risks. As their personal data flow unseen across global networks, people are increasingly concerned about losing control and about their growing reliance on technologies that affect their lives in ways they don’t understand and often can’t even know. Small wonder that regulators are concerned about an imbalance between industry and individuals and are moving to protect citizens from the risks posed by a data-driven economy.

The challenge, as the WEF points out, is to protect individuals without destroying data’s socio-economic potential. Some policymakers favor imposing additional restrictions at the time data are collected, an extension of today’s “notice and consent” approach that would slow the flow of data, slow innovation and, in the process of mitigating risk, deprive individuals and economies of many potential benefits.

In reality, it’s not primarily the collection of data that is the source of potential harm, but its unconstrained use. Yet what is considered acceptable use is personal, contextual, and subject to change: ten years ago few people would have shared every detail about their lives with distant friends; now, we use Facebook. Going forward, we need a flexible policy framework that can accommodate evolving technologies and create a sustainable data ecosystem that will drive new business models and innovation—while also strongly protecting the rights of individuals. Such a framework must also be driven by principles that can promote trustworthy data practices.

Technology can help policymakers achieve this. For example, user permissions and policies can be permanently bound to data, requiring that any party handling that data does so in accordance with the user’s wishes. Such a “metadata-based” architecture can also enable users to change their preferences and permissions, and help prevent further unanticipated use of previously collected data in ways they find objectionable. It can be a highly effective way to strengthen the enforcement of user permissions in a decentralized data ecosystem.
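
To make the idea concrete, here is a minimal sketch in Python of what binding permissions to data might look like. It is an illustration under stated assumptions, not a description of any actual Microsoft or WEF design: the names `Policy`, `TaggedRecord`, `access` and `UseNotPermitted` are hypothetical, and the purposes ("treatment", "research") are made up for the example. A real system would also need technical and legal means to make the binding tamper-resistant across organizations, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


class UseNotPermitted(Exception):
    """Raised when a party requests a use the data subject has not allowed."""


@dataclass
class Policy:
    # Hypothetical policy: the set of purposes the user currently permits.
    allowed_uses: Set[str] = field(default_factory=set)

    def grant(self, purpose: str) -> None:
        self.allowed_uses.add(purpose)

    def revoke(self, purpose: str) -> None:
        self.allowed_uses.discard(purpose)


@dataclass
class TaggedRecord:
    # The payload never travels without its policy metadata.
    owner: str
    payload: Dict[str, Any]
    policy: Policy


def access(record: TaggedRecord, purpose: str) -> Dict[str, Any]:
    """Release the payload only if the stated purpose is permitted."""
    if purpose not in record.policy.allowed_uses:
        raise UseNotPermitted(
            f"{record.owner} has not permitted use for '{purpose}'"
        )
    return record.payload


# Illustrative usage with made-up data.
record = TaggedRecord(
    owner="alice",
    payload={"condition": "congestive heart failure"},
    policy=Policy({"treatment", "research"}),
)
access(record, "research")        # permitted at this point

record.policy.revoke("research")  # the user changes her preferences
try:
    access(record, "research")    # the same data, now refused
except UseNotPermitted as exc:
    print(exc)
```

The design choice worth noting is that the check happens at the moment of use rather than at the moment of collection, so a change in the user's preferences also governs data that were collected earlier.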

When implemented as part of a principles-based policy framework that provides guidance on trustworthy data practices, and supplemented by voluntary but enforceable codes of conduct, this is a flexible approach that holds the promise of satisfying the interests of regulators, individuals, and industry. It could also prevent an immensely promising and innovative driver of socio-economic benefits from stalling.