Closing the data divide: the need for open data

Today, Microsoft is launching an Open Data Campaign to help address the looming “data divide” and help organizations of all sizes to realize the benefits of data and the new technologies it powers. We believe everyone can benefit from opening, sharing and collaborating around data to make better decisions, improve efficiency and even help tackle some of the world’s most pressing societal challenges.

The goal of our campaign is to advance a much-needed discussion about how the world uses and shares data. To start, today we’re announcing three steps:

  • First, we’re publishing new principles that will guide how Microsoft itself approaches sharing our data with others.
  • Second, we’re committing to take action by developing 20 new collaborations built around shared data by 2022. This includes work with leading organizations in the open data movement like the Open Data Institute and The Governance Lab (GovLab) at the New York University Tandon School of Engineering. And we’ll seek to lead by example by making our Microsoft social impact initiatives “open by default,” beginning with sharing data on broadband access from our Airband initiative and combining it with data from others to help accelerate improvements in broadband connectivity.
  • Finally, we’ll invest in the essential assets that will make data sharing easier, including the required tools, frameworks and templates.

In recent months, we’ve again seen the benefits that better data sharing can bring not just for companies and other organizations, but also in tackling the world’s biggest challenges. From climate change to the COVID-19 pandemic, it is clear that data plays a critical role in helping us understand these challenges and in addressing them. To fully realize the benefit of data, we need to develop the ability to share data across organizational boundaries in a way that is safe and secure and that allows the data to be used effectively. If ever there was a time to accelerate the world’s efforts around open data, it is now. We hope our steps today can contribute to these efforts. We’re committed to the cause, and to learning from and working with others.

What do we mean by the “data divide” and why now?

Despite the enormous growth in data and AI, both are increasingly concentrated in the hands of a small number of companies. Indeed, fewer than 100 companies now collect more than 50% of the data generated by online interactions (based on analysis of […]), and around half of all people with technical AI skills work in the technology sector (according to figures from LinkedIn). Not surprisingly, these businesses are then able to reap the enormous benefits of data and AI while others are left at a disadvantage. This data divide poses a serious challenge for society and, if left unaddressed, could lead to huge economic power flowing to just a few countries and companies. Based on current trends, for example, PwC predicts that around 70% of the economic value generated by AI will accrue to just two countries: the USA and China. But we do not believe that an ever-growing data divide is inevitable. By doing more to open up and share data, organizations can unlock value, share expertise and make data more useful for all, allowing everyone to benefit in ways they cannot by going it alone. By acting now and joining together, more civil society organizations, governments and businesses of all sizes will be able to realize the full value of data.

Charting a principled course

To help guide our own efforts on open data, we are adopting a set of principles to inform how we at Microsoft open and share data in a responsible way. We’ve learned through our work on protecting privacy, responsible AI and sustainability that it is valuable to define a clear set of principles when engaging with important and complex societal issues. We hope these principles will inform the broader conversation on open data and that others can build on and improve them. The five principles that will guide our contributions to trusted data collaboration are:

  • Open – We will work to make data that is relevant to important social problems as open as possible, including by contributing open data ourselves
  • Usable – We will invest in creating new technologies and tools, governance mechanisms and policies to make data more usable for everyone
  • Empowering – We will help organizations generate value from their data according to their choices, and develop their AI talent to use data effectively and independently
  • Secure – We will employ security controls to ensure data collaboration is operationally secure where it is desired
  • Private – We will help organizations to protect individuals’ privacy in data-sharing collaborations that involve personally identifiable information

Each of these principles is important. However, as has become clear to us in our work in this area, one stands out as the most challenging but vital key to success: the need to make data more usable. Unless organizations are able to collect and categorize data in a standardized way, they will not be able to aggregate and analyze it in a manner that produces the transformative insights that shared data has the potential to unlock.

Committing to new collaborations

In addition to charting a principled course, we believe success will depend on building deep collaborations with others from across industry, government and civil society around the world. We want to try to lead by example and do more to learn firsthand about the challenges and solutions around open data. To this end, Microsoft is committing to launch 20 data collaborations by 2022, building partnerships to tackle the major challenges of our time. To help seed these collaborations, Microsoft will make its social impact initiatives “open by default” and explore whether our data related to initiatives such as Airband, AI for Good and our work on sustainability and accessibility can be opened up and built on to help solve major challenges. We are excited to be partnering with the Open Data Institute in this effort, working together to develop our initial collaborations and share the lessons we learn with others so that they may also benefit. Our initial work will focus on:

  • Tackling connectivity challenges: Microsoft is publishing on GitHub, under an open agreement, a small but important dataset on broadband usage in the United States, gathered as part of our Airband Initiative. We will be working with the Open Data Institute and BroadbandNow, a company that helps consumers find broadband access in the U.S., to add to this dataset to help improve broadband availability. The BroadbandNow dataset provides county-level pricing and competition data.
  • Addressing COVID-19: As one of the most pressing challenges today, we will contribute to the work being done to use data to address the COVID-19 crisis. This includes expanding work Microsoft is doing with partner Adaptive Biotechnologies to decode how the immune system responds to COVID-19 and share research findings via an open data access portal for any researcher to use in the fight against the pandemic. More broadly, Microsoft has also built a COVID-19 tracker on our Bing search engine and is releasing aggregated data to those in academia and research. We are also working with GitHub, which is hosting a range of collaborative COVID-19 projects, including open source software, hardware designs, models and many leading COVID-19 datasets.
  • Helping cities collaborate around data: Microsoft will partner with Arup and the Oliver Wyman Forum on the London Data Commission, an open data initiative run by London First working with the Greater London Authority and others, to lead a data collaboration project around city-based data that can help address social and economic challenges in London. 
  • Helping governments collaborate around data: To help governments better open up and collaborate around data, we will co-launch the Open Data Policy Lab with The GovLab at NYU. The Lab will provide a live repository of best practices and resources with a focus on: 1) analysis, in the form of comparative research of data initiatives that contribute to economic development; 2) guidance, to include toolkits, frameworks and best practices to support data sharing and data-driven decision-making; 3) community, of data stewards and other data stakeholders within the public and private sectors; and 4) action, to implement proof-of-concept initiatives.
  • Advancing data-driven healthcare: This work will enable the first global data collaborative to improve cardiovascular health, bringing together data from a range of sources to help address one of the world’s leading causes of death. Microsoft is working with the Novartis Foundation, Apollo Hospitals in India and Coala Life in Sweden to consolidate their respective cardiovascular datasets from hospitals and primary-care centers around the world. The collaborative aims to further develop and use the leading cardiovascular AI tool – AICVD Risk Score, created by Apollo Hospitals – to accelerate the use of data-driven decisions in tackling cardiovascular disease and informing the direction of health policy.

Making data sharing easier and safer

If data is open and available but unusable, it serves little to no purpose. We are therefore committing to help tackle the lack of easy-to-use tools and frameworks for sharing data, so that data becomes more usable. One big challenge we have seen in our work on data sharing, and in the analysis we’ve been doing to help fight the COVID-19 crisis, is inconsistent data collection. Currently, data is collected in a variety of different formats and document types – some in Word documents, some in PDFs, some in spreadsheets, some still on paper. This makes it all but impossible to share and aggregate data in a way that is valuable, and it creates a huge barrier to collaboration. The campaign will work to address this challenge and also continue our work to develop scalable tools that any organization can use, reducing the friction around sharing.

In this work, there are valuable lessons to be taken from the world of open source software. While there are important differences between data and code, particularly around the steps needed to address privacy and security considerations when dealing with data, our experience with open source provides us with insights for enabling successful collaboration. A priority will be continuing our work on open data use agreements, providing templates that anyone can use to easily share data and continuing to build on the governance, licensing and legal tools provided on the Open Data Campaign microsite. We will also continue to advance our work on differential privacy with Harvard’s IQSS, providing tools to allow people to extract useful insights from datasets in a way that safeguards the privacy of individuals.
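
For readers unfamiliar with differential privacy, the idea can be illustrated with a minimal sketch of the textbook Laplace mechanism applied to a counting query. This is not the tooling being developed with Harvard’s IQSS. The function names, the sample records and the choice of epsilon are all hypothetical and used only for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution centred on zero with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that match `predicate`.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon. A smaller epsilon
    means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical, made-up records: one entry per household.
households = [
    {"county": "A", "has_broadband": True},
    {"county": "A", "has_broadband": False},
    {"county": "B", "has_broadband": True},
    {"county": "B", "has_broadband": True},
]

rng = random.Random(2020)  # seeded only so the sketch is repeatable
noisy = private_count(households, lambda h: h["has_broadband"], epsilon=1.0, rng=rng)
print(f"Noisy count of households with broadband: {noisy:.2f}")
```

The published figure stays close to the true count on average, yet no single household’s presence or absence can be confidently inferred from it. A production system would also need to track the total privacy budget across queries and guard against floating-point weaknesses, which this sketch does not attempt.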

Closing the data divide is a big challenge. But the benefits for organizations of all sizes, and for the broader community, are significant if we can work together to make progress on open data. We’re committed to making our contribution, and we look forward to working with, and learning from, others so that everyone can realize the benefits of data.
