In 2016, the United Nations’ International Organization for Migration (IOM), the International Labor Organization (ILO) and the Walk Free Foundation estimated that approximately 40 million people around the world were subjected to a form of modern slavery. In addition to the immense personal suffering represented by this estimate, the shared tragedy is that nobody really knows the true extent of human trafficking. Without a strong foundation of data standards and data sharing that show the true nature and scale of the problem, the collective action needed to eradicate all forms of human trafficking and exploitation will remain elusive.
Today, IOM is releasing the first-ever synthetic dataset of individual survivors of trafficking. The dataset is the largest collection of primary human trafficking case data ever made available to the public. It represents victims identified by IOM and major anti-trafficking organizations around the world, including Polaris, Liberty Shared, OTSH and A21. No combination of attributes in the released data can be used to isolate and identify any individual, or even a small group of individuals, which is paramount to the protection of vulnerable populations – including victims of trafficking. This dataset promises to give policymakers around the world detailed insight into how trafficking affects their countries to help build understanding, inform policy decisions and direct assistance and prevention resources more effectively.
This data release was made possible by Synthetic Data Showcase, a new open-source tool from Microsoft, developed with IOM as part of Tech Against Trafficking – a coalition of technology companies co-founded by Microsoft in 2018. Synthetic Data Showcase is one of several open-source software tools and infrastructure being developed by the Societal Resilience team within Microsoft Research aimed at empowering non-technical experts to address challenges that accompany crises, from global pandemics and climate change to corruption, cybercrime, misinformation and human trafficking.
The Societal Resilience team takes a new approach to science and research that’s tuned for a post-pandemic, post-carbon world of frequent and severe crises – and is guided by a mission to build open-source software tools that enable the reusable, community-oriented and multi-stakeholder collective problem solving needed to create a more resilient society.
Roots in the deep and dark web
My work in this area dates to the early and mid-2010s, when I was a program manager at the Defense Advanced Research Projects Agency (DARPA) leading a program called Memex. As part of the program, we built software tools to conduct domain-specific searches on the deep and dark web – the parts of the internet that are not indexed by commercial search engines such as Bing, Google and Yahoo – and to create visualizations and machine-learned models that can show connections between disparate data points.
In collaboration with law enforcement agencies and human rights organizations, we applied these tools to start collecting – and connecting – the “dots” that allowed us to build a rich and detailed view of organized, criminal sex trafficking networks. For example, the tools revealed connections between seemingly unrelated websites that advertised sexual services, such as shared telephone numbers and email addresses. This allowed law enforcement agencies to find evidence of human trafficking networks, leading to order-of-magnitude increases in trafficking prosecutions and even large international network busts.
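To make the idea of “connecting the dots” concrete, the sketch below groups advertisements into clusters whenever they share a contact identifier, directly or through a chain of other ads, using a union-find (disjoint-set) structure. This is a minimal illustration under assumed inputs, not the Memex software itself: the `link_ads` function, the ad IDs and the identifiers are all hypothetical, and real systems must also normalize and deduplicate identifiers before linking.

```python
from collections import defaultdict


def link_ads(ads):
    """Group ad IDs into clusters connected by shared contact identifiers.

    `ads` (hypothetical input format) maps an ad ID to a set of identifiers
    such as phone numbers or email addresses. Two ads land in the same
    cluster if a chain of shared identifiers connects them.
    """
    parent = {ad_id: ad_id for ad_id in ads}

    def find(x):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first ad seen with each identifier; merge later ads into it.
    first_seen = {}
    for ad_id, identifiers in ads.items():
        for ident in identifiers:
            if ident in first_seen:
                union(first_seen[ident], ad_id)
            else:
                first_seen[ident] = ad_id

    clusters = defaultdict(set)
    for ad_id in ads:
        clusters[find(ad_id)].add(ad_id)
    # Sort clusters for a stable, readable result.
    return sorted(clusters.values(), key=sorted)


if __name__ == "__main__":
    ads = {
        "site_a/101": {"555-0100", "x@example.com"},
        "site_b/202": {"555-0100"},
        "site_c/303": {"x@example.com", "555-0199"},
        "site_d/404": {"555-0142"},
    }
    for cluster in link_ads(ads):
        print(sorted(cluster))
```

Here `site_b/202` and `site_c/303` never share an identifier with each other, yet both end up in the same cluster as `site_a/101` because each overlaps with it. Union-find is one simple choice for this transitive grouping; a graph library's connected-components routine would give the same clusters.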
We found that these tools were effective in revealing the size and degree of organization of a form of human trafficking happening online. This understanding of the scope of the problem led us to shift toward a market-oriented approach. We asked, “How can we start to understand the way that prices, advertisements and policies influence this area of activity as if it were a market?”
For example, when the U.S. passed legislation that made it illegal for platform companies to provide content that could be related to human trafficking, this content disappeared from websites like Backpage and Craigslist. We could check the impact of the policy by asking questions like, “If the platforms can’t host these advertisements, does that change whether and how sex trafficking occurs?”
Over time, the market-oriented approach to the problem of human trafficking has evolved into a policy frame. At the same time, the problem of human trafficking has continued to grow in response to crises including the COVID-19 pandemic and extreme weather events that disproportionately disrupt and force migration of the world’s most vulnerable people. Now we’re asking if we can inform evidence-based policies that bring new levels of support to affected populations – and whether the technologies that help us in this process might also enable a new kind of science. Can we use data and software to help societies survive crises?
Synthetic data showcase
We know that the basis of evidence is data, and that data on real-world problems can be highly sensitive – especially if it relates to vulnerable individuals. However, to shape and inform policy, this data needs to be shared. This requires technology that preserves the privacy of individuals represented in sensitive datasets, while also retaining the structural and statistical properties of those datasets. Microsoft’s Societal Resilience team worked in collaboration with IOM to create a dataset – the Global Human Trafficking Synthetic Dataset – that is full of the type of vital, sharable and privacy-preserving real-world evidence needed to inform policy.
The dataset was created with Microsoft’s open-source Synthetic Data Showcase tool and released today through the Counter Trafficking Data Collaborative – the first global data portal on human trafficking.
The downloadable dataset represents data from more than 156,000 victims and survivors of trafficking across 189 countries and territories. It provides first-hand, critical information on the socio-demographic profile of victims, types of exploitation and the trafficking process, including means of control used on victims – all vital information needed to better assist survivors and prosecute perpetrators.
Strong privacy guarantees that preserve the anonymity and safety of victims and survivors are key to the success of the dataset. That’s because, while the data is essential for evidence-based policy, publishing the data directly risks revealing the presence of individual victims to their traffickers. If traffickers believe a victim has received assistance, they may assume that this implies collaboration with law enforcement and initiate retaliation against the victim, their friends, family or community.
Synthetic Data Showcase generates synthetic, aggregate and anonymous data for sharing in place of actual sensitive data and brings it together in an intuitive interface that enables privacy-preserving exploration. By making every effort to make this information openly and safely available, IOM and Microsoft hope to ensure the voices of victims and survivors are heard and protected while empowering governments and other stakeholders to take progressive and collective action to end this crime.
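The privacy property described earlier, that no combination of attributes can isolate an individual or a small group, can be illustrated with a simple k-anonymity-style check: count how many records share each combination of attribute values and flag any combination shared by fewer than k records. The sketch below is an assumption-laden illustration of that idea only. It is not Synthetic Data Showcase’s actual algorithm, and the `rare_combinations` function, the attribute names and the threshold `k` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rare_combinations(records, k, max_len=None):
    """Return attribute-value combinations shared by fewer than k records.

    Each record is a dict of attribute -> value (hypothetical schema).
    Combinations of up to `max_len` attributes are checked (all lengths
    if None). An empty result means no checked combination narrows the
    data down to a group smaller than k.
    """
    counts = Counter()
    for record in records:
        # Sort by attribute name so the same combination always has one key.
        items = sorted(record.items())
        limit = len(items) if max_len is None else min(max_len, len(items))
        for length in range(1, limit + 1):
            for combo in combinations(items, length):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n < k}


if __name__ == "__main__":
    records = [
        {"gender": "F", "age_band": "18-25", "exploitation": "labour"},
        {"gender": "F", "age_band": "18-25", "exploitation": "labour"},
        {"gender": "M", "age_band": "18-25", "exploitation": "labour"},
    ]
    for combo, n in rare_combinations(records, k=2).items():
        print(n, combo)
```

In this toy data, every combination involving `("gender", "M")` describes a single record, so a release would need to suppress, generalize or synthesize around it before sharing. The number of combinations grows exponentially with the number of attributes, which is why the sketch exposes a `max_len` cap.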
Prior to the COVID-19 pandemic, my colleagues in Microsoft Research were working on and exploring the potential impact of open-source software tools like Synthetic Data Showcase – the pandemic accelerated the effort. We realized that the world was forever changed and our role as computer science researchers needed to change with it. To be relevant today, we need tools to practice science at the pace of a crisis. To deliver results that matter, we need to adopt a mission-oriented mindset to research that focuses on durability instead of efficiency, emphasizes a community-oriented perspective and leverages a collective problem-solving approach.
At the outset of the pandemic, I was surprised to learn how different institutions make decisions – what kind of evidence leads them to start a clinical trial, fund medical research or change public health policy. The differences I observed highlighted an urgent need for tools to collect and understand real-time data to inform decision-making, especially in scenarios where established methods like randomized controlled trials and A/B testing are unavailable. For example, as people across the globe were getting sick, could we examine the differences between those getting more or less sick? By doing so, could we learn how to improve outcomes in the immediate term?
For this crisis-oriented science, we need to measure what’s happening in real time, observe things as they unfold and gather enough evidence, with enough justification, to determine the answers to important health or policy-oriented questions. Prior to the pandemic, the computer science and machine learning community had not always emphasized the importance of real-world evidence. The pandemic forced us to prioritize turning observations of activity into the kind of evidence that individuals and institutions can have confidence in, whether making a medical decision, allocating resources around healthcare or creating a policy response to human trafficking.
The pandemic also revealed the need for a new, mission-oriented perspective on research. To be useful in a rapidly changing context, our software tools must have the ability to flex, scale, adapt and generalize. When creating software to address shifting, ill-defined, global problems like COVID-19, we must focus on robustness and reuse rather than on efficiency that can lead to isolated and brittle systems. To build these kinds of tools, we must combine individual expertise with a collective problem-solving approach. We’ve seen how emerging threats disproportionately affect those with the fewest resources to address them. Solving problems for society means that we have to build multi-stakeholder relationships with individuals, institutions, governments and communities to develop technologies that serve their needs.
As COVID-19 tore through communities, we raced to acquire the medical knowledge needed to make policy decisions about treatment and prevention. We learned that addressing real-world problems is messy. You don’t have the same control that you have in the normal scientific process. Data is incomplete and constantly evolving, and the coordination between different groups who need access to data is complex. We realized that a policy ecosystem grounded in science requires tools built for the speed and scale of a crisis, such as open-source platforms that uphold the privacy, accessibility, interoperability and validity of data.
The Societal Resilience team is creating software tools and infrastructure aimed at empowering people who are not technical experts but are experts in domains such as human trafficking, medicine and policy. These tools will allow subject matter experts to organize their data, leverage real-world evidence to engage in real-time causal analysis and share insights with others for decision-making that is grounded in real information.
For example, ShowWhy, another open-source software tool developed by the Societal Resilience team, uses an intuitive interactive visualization interface to guide users through the causal inference process. Our team will use it to explore air quality data collected as part of Project Eclipse, an innovative collaboration in Chicago among community groups, private businesses, environmental justice organizations and local government agencies to measure neighborhood-scale air quality with a network of low-cost sensors deployed at bus shelters throughout the city. This collaboration shows the power of community-engaged technology development and deployment to incorporate local knowledge and subject matter expertise to inform sensor placement and hypothesis formation. Data collected by the sensors can then be analyzed with ShowWhy to illustrate how pollution from specific locales leads to degraded air quality in specific neighborhoods.
One of the greatest levers we have to achieve societal impact at scale is policy, because policy can determine the allocation of money, people and other resources. We believe that evidence-based policies are more likely to have beneficial impact than those based on assumptions. That’s why, in addition to building technologies that offer direct assistance in times of crisis (for example, the Vaccine Eligibility Bot), the Societal Resilience team is focused on building a class of tools and technologies that operate at the higher level of evidence-based policy.
Only through policy can we improve the condition of the most vulnerable in society, and it is the condition of the most vulnerable that truly defines societal resilience. By partnering with frontline organizations to elevate the resilience of such groups, we have a direct path toward elevating the resilience of society as a whole – whether by combating crimes like trafficking and corruption, or learning to live in a post-pandemic, post-carbon world.
All of these issues require new kinds of technology, and a new kind of science. However, we’re unlikely to make meaningful progress without thinking deeply about what it means to research and design for societal resilience. Technology alone is never the solution, and many societal resilience problems can never truly be solved. Instead, it’s about bringing together the right kind of technology expertise, social science expertise and frontline expertise that can fundamentally transform real-world practice for the better – one day at a time.
Why Microsoft Research?
My personal motivation has always been problem solving, empowerment and enabling the voices of others. It’s exciting to be a part of a company and an industrial research lab that has the same mission orientation. With our global scale and access to computer science talent, we have the resources and opportunity to focus on solving problems that are relevant for society at scale.
The society we are seeking to empower is global. This investment isn’t about revenue; it’s about societal maintainability, human rights and corporate responsibilities. We can’t build and promote technological solutions that are only affordable to a select few individuals, organizations or countries. The only way we can do this is by building tools that enable affordable access to capabilities that may otherwise be unaffordable or inaccessible.
The open software approach also aligns with Microsoft’s business. When the world succeeds, Microsoft succeeds. With our scale, open technology and collaborative ecosystem, we can have real societal impact that advances Microsoft’s mission to empower every person on the planet to achieve more. What is more aligned with this than to help every person on the planet survive crises better? And how better to prepare for the crises of the future than to start developing today the technology, evidence and policy that we’ll need to make a difference?
- Microsoft Societal Resilience
- Real-world evidence and the path from data to impact
- Societal Resilience – Microsoft Research
- Synthetic Data Showcase
- Vaccine Eligibility Bot
- Project Eclipse
- Tech Against Trafficking
- Counter Trafficking Data Collaborative
- Tech Minutes: Organizational Analytics
Top image: Chris White, the managing director of Microsoft Research Special Projects, is leading a team that is developing open-source software tools and infrastructure that empower non-technical experts to address challenges that accompany crises. Photo courtesy of John Edwards / TriFilm