Data not only drives our modern world; it also bears enormous potential. Data is necessary to shape creative solutions to critical challenges including climate change, terrorism, income and racial inequality, and COVID-19. The concern is that the deeper you dig into the data, the more likely it is that sensitive personal information will be revealed.
To overcome this, we have developed and released a first-of-its-kind open source platform for differential privacy. This technology, pioneered by researchers at Microsoft in a collaboration with the OpenDP Initiative led by Harvard, allows researchers to preserve privacy while fully analyzing datasets. As a part of this effort, we are granting a royalty-free license under Microsoft’s differential privacy patents to the world through OpenDP, encouraging widespread use of the platform, and allowing anyone to begin utilizing the platform to make their datasets widely available to others around the world.
Cynthia Dwork, Gordon McKay Professor of Computer Science at Harvard and Distinguished Scientist at Microsoft, said, “Differential privacy, the heart of today’s landmark milestone, was invented at Microsoft Research a mere 15 years ago. In the life cycle of transformative research, the field is still young. I am excited to see what this platform will make possible.”
Differential privacy preserves privacy through a mathematical framework that uses two mechanisms to protect personally identifiable or confidential information within datasets:
- A small amount of statistical “noise” is added to each result to mask the contribution of individual data points. This noise works to protect the privacy of an individual while not significantly impacting the accuracy of the answers extracted by analysts and researchers.
- The amount of information revealed from each query is calculated and deducted from an overall privacy budget to halt additional queries when personal privacy may be compromised.
Through these mechanisms, differential privacy keeps personally identifiable information from being exposed in the results of data analysis. It masks the contribution of each individual, making it very difficult to infer information specific to any particular person, including whether the dataset contains that individual’s information at all. As a result, outputs from data computations, including analytics and machine learning, do not reveal private information from the underlying data, which opens the door for researchers to harness and share massive quantities of data in a manner and at a scale never seen before.
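To make the two mechanisms concrete, here is a minimal Python sketch of a counting query that adds Laplace noise to each result and deducts each query's cost from an overall privacy budget. It is an illustration only, not the platform's API: the names `PrivateCounter`, `BudgetExhausted`, `total_epsilon`, and the sample ages are all hypothetical, and the choice of Laplace noise is an assumption made for this example. A production system would also need more careful noise sampling than this floating-point version.

```python
import random


class BudgetExhausted(Exception):
    """Raised when a query would exceed the remaining privacy budget."""


class PrivateCounter:
    """Toy counting queries with Laplace noise and a privacy budget.

    Hypothetical illustration; not the API of any real library.
    """

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)
        self.remaining_epsilon = total_epsilon
        self._rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two independent exponential draws
        # follows a Laplace distribution centered at zero.
        return scale * (self._rng.expovariate(1.0) - self._rng.expovariate(1.0))

    def count(self, predicate, epsilon):
        """Return a noisy count of the records matching `predicate`."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Mechanism 2: refuse the query if it would overspend the budget.
        if epsilon > self.remaining_epsilon:
            raise BudgetExhausted("privacy budget exhausted")
        self.remaining_epsilon -= epsilon

        true_count = sum(1 for record in self._records if predicate(record))
        # Mechanism 1: adding or removing one person changes a count by at
        # most 1, so noise with scale 1 / epsilon masks any one contribution.
        return true_count + self._laplace(1.0 / epsilon)


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 62, 57, 33]  # made-up sample data
    counter = PrivateCounter(ages, total_epsilon=1.0)

    print("noisy count, age >= 40:", counter.count(lambda a: a >= 40, epsilon=0.5))
    print("noisy count, age < 30:", counter.count(lambda a: a < 30, epsilon=0.5))

    try:
        counter.count(lambda a: True, epsilon=0.5)
    except BudgetExhausted as err:
        print("third query refused:", err)
```

A smaller `epsilon` per query means more noise and stronger privacy, while a larger one gives more accurate answers but uses up the budget faster. Once the budget reaches zero, further queries are refused.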
“We need privacy enhancing technologies to earn and maintain trust as we use data. Creating an open source platform for differential privacy, with contributions from developers and researchers from organizations around the world, will be essential in maturing this important technology and enabling its widespread use,” said Julie Brill, Chief Privacy Officer, Corporate Vice President, and Deputy General Counsel of Global Privacy and Regulatory Affairs.
Over the past year, Microsoft and Harvard worked to build an open solution that utilizes differential privacy to keep data private while empowering researchers across disciplines to gain insights that possess the potential to rapidly advance human knowledge.
“Our partnership with Microsoft – in developing open source software and in spanning the industry-academia divide – has been tremendously productive. The software for differential privacy we are developing together will enable governments, private companies and other organizations to safely share data with academics seeking to create public good, protect individual privacy and ensure statistical validity,” said Gary King, Weatherhead University Professor and Director of the Institute for Quantitative Social Science, Harvard University.
Because the platform is open source, experts can directly validate the implementation, while researchers and others working within an area can collaborate on projects and co-develop simultaneously. The result is that we will be able to iterate more rapidly to mature the technology. Only through collaboration at a massive scale will we be able to combine previously unconnected or even unrelated datasets into extensive inventories that can be analyzed by AI to further unlock the power of data.
Large and open datasets possess an unimaginable amount of potential. The differential privacy platform paves the way for us to contribute, collaborate and harness this data, and we need your help to grow and analyze the world’s collective data repositories. The resulting insights will have an enormous and lasting impact and will open new avenues of research that allow us to develop creative solutions for some of the most pressing problems we currently face.
The differential privacy platform and its algorithms are now available on GitHub for developers, researchers, academics and companies worldwide to use for testing, building and support. We welcome and look forward to the feedback in response to this historic project.