Data & Transparency

Celebrate Open Data Day in New York This Weekend

Every March, we’re excited to join data enthusiasts worldwide to celebrate International Open Data Day, a global event that promotes awareness and use of open data. Through a series of events around the globe, people of all skill levels come together to create new projects, analyze data, and find new ways to visualize it.

We believe open data is a priority for civic tech enthusiasts — and we invite you to join us as we kick off the open data celebration this weekend. Here are some highlights of this weekend’s schedule — we hope to see you there:

March 3-5

Giving Tuesday DataDive, Presented by 92Y, DataKind, and the Bill & Melinda Gates Foundation

  • Friday 3/3 6:30pm-8pm EST: discuss goals for the DataDive and dive into the data!
  • Saturday 3/4 9am-9pm EST: choose a team and get to work!
  • Sunday 3/5 9am-3pm EST: final presentations and networking
    Note: You can attend one or all days.

We’re thrilled to be hosting a DataDive March 3-5 and are looking for data pros of all backgrounds to roll up their sleeves and work side by side with experts from 92Y, the Bill & Melinda Gates Foundation, and Facebook to use data to unravel tough questions and prototype new solutions.

March 4

International Data Day

Open Data Day is an annual celebration of open data all over the world. For the fifth time in history, groups from around the world will create local events on the day where they will use open data in their communities. It is an opportunity to show the benefits of open data and encourage the adoption of open data policies in government, business and civil society. View activities happening around the world here.

NYC School of Data (SOLD OUT)

New York City School of Data is a community conference showcasing NYC’s civic design, civic/government technology, and open data ecosystem.

March 6

Civic Hall Presents: Open Data, Mapping Global Security & the Department of Defense

How can we get national security data into the open? The National Geospatial-Intelligence Agency (NGA) will demo its geospatial data portals for the Arctic, for combating wildlife trafficking in Africa, and for Hurricane Matthew.

March 7

Five Year Anniversary of New York City’s Open Data Law, Local Law 11 of 2012

In many countries, states and cities Open Data is a policy – here in New York City it is a law, which ensures that Open Data is here to stay.

NYC Chief Analytics Officer Dr. Amen Ra Mashariki speaking at Socrata’s Connect 2017 Conference in DC

10 – 10:25am on the Main Stage. Livestream details coming soon.

NYC Big Apps Workshop – NYC Open Data Portal & Department of City Planning Facilities Explorer Tutorials

Join members of the NYC Open Data team and Department of City Planning for a demo of the NYC Open Data Portal and new Facilities Explorer tool (launching soon) followed by a breakout session at the Tuesday March 7th NYC Big Apps Workshop. You’ll learn the basics about how to access NYC data (1600+ datasets!) and get an overview of other tools such as the Facilities Explorer powered by NYC Open Data that you can use to support your research and work for the Big Apps competition as well.

March 8

Made in NY Media Center + Fabernovel Data & Media: Open Data Breakfast

Whether you are a developer, an agency, or a civil service non-profit, having access to data drives business, improves services, and promotes free public access.

Together with FaberNovel, we are hosting an interactive breakfast and conversation on March 8th to learn more about the City of New York’s free Open Data Portal and how you can use it to build products, conduct research and analysis, or create new applications.

Department of Small Business Services: 2017 Smart Districts Summit

The inaugural NYC Smart Districts Summit will bring together community and technology leaders to explore collaboratively how emerging technologies are being leveraged to address the most pressing district-level challenges.

College of Staten Island (CSI) Tech Incubator + Vizalytics: Data – A Driving Force of Innovation

Connect with us to discover how organizations and entrepreneurs are utilizing data to drive innovation within our local community. Learn the practices, technologies, and patterns the experts use to fuel their enterprises by way of big data.

March 9

Reaktor Open Data Studio

The goal of this evening is to share some ideas about how Open Data could be utilized in new ways, especially in New York. We have a happy hour with benchmarks from Helsinki, where open data catalogues have been advanced for a while, and companies and developers alike are used to creating cool applications for it.

Join us to hear examples of applying open data in a user-friendly way, and let’s come up with new ways to use open data to create new tools.

General Assembly Panel Discussion: Data and…Health

Big Data is continuing to significantly impact the way in which organizations operate and make informed business decisions. Emerging technologies are now paving the way to innovative medical developments, and it looks as though data is beginning to transform the entire healthcare industry! In collaboration with the first annual NYC Open Data Week, GA is bringing together influencers from the health and wellness spaces to discuss how data is impacting their organizations.

March 11

NYC Parks Computer Resource Centers Open Data for All: TreesCount! Workshop

This free workshop, presented by NYC Parks and the NYC Open Data team, offers a broad introduction to the NYC Open Data Portal along with the concept of data literacy and analysis.

Using NYC TreesCount! 2015 data, the most accurate map of NYC’s street trees ever created, you will learn how to identify, download, manipulate, and visualize NYC Open Data with a focus on community engagement and awareness. Using tools such as Google Sheets and CARTO, you will be able to create your own graphs and maps from NYC Open Data.

Heat Seek Keeps the Heat On This Winter With Data, Tech & Transparency

During New York City’s annual Big Apps competition in 2014, I learned about a new civic tech solution called Heat Seek, and I’ve been a supporter ever since. I’m currently a proud member of the Board of Directors of this tech-powered non-profit that is making a real impact by using an internet of things approach to empower tenants, landlords, community organizations, and the justice system with accurate data that can make a dent in our city’s heating crisis. For many New Yorkers, heat equals health, education, and opportunity. Heat Seek uses technology to defend people’s rights to those things. Thanks to Co-Founder and CEO Noelle Francois for sitting down with us to tell the Heat Seek story.
John Paul Farmer

It’s cold in your apartment, but the heat and hot water are regulated by your landlord. The wall thermostat reads 57 degrees Fahrenheit for three days, well below the city’s mandate. When the landlord dodges your calls, you file a complaint with the city’s 311 service. Immediately before inspection, the landlord raises the heat, only to lower it back once the city’s inspector leaves. Your options are limited: continue to freeze, find a new apartment, or take action. In court, the proof you provide is your word and handwritten heat log with temperatures and timestamps you’ve recorded yourself. Oftentimes, it’s not enough to get the landlord to comply.

This is the reality for many New Yorkers. Every year, thousands of renters spend the winter in a frigid apartment, Heat Seek Executive Director Noelle Francois told Microsoft New York. Heat Seek aims to combat this struggle. Their mission is to make the city a safer, warmer place to live for all New Yorkers. The nonprofit uses tech to empower tenants, providing unbiased evidence (data) to verify heating code abuse claims in court.

Last winter, the city received more than 200,000 heat-related complaints from 37,000 different buildings, most of which were in lower-income neighborhoods throughout Upper Manhattan, the Bronx, and Brooklyn.

“What we find is that for the specific subset of tenants we work with most closely, they’re being harassed; they don’t just have inept landlords,” Francois said.

Heat Seek installs temperature sensors in these apartment complexes, where tenants have not been able to resolve heating problems through traditional channels. The sensors, consisting of a printed circuit board, a thermistor, a Raspberry Pi, and a wireless modem, talk to each other through a mesh network. Once an hour, they collect and transmit ambient temperature data to Heat Seek’s servers. Tenants and their lawyers can access this data on a web app. This data, integrated with public 311 heating complaint information, illustrates what Heat Seek has determined is a heating crisis in New York City.
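As a rough illustration of how hourly readings like these could be summarized for a heat log, the sketch below computes the share of readings that fall below a chosen threshold. It is not Heat Seek's code: the function name `share_below`, the sample readings, and the 68-degree threshold are all assumptions made for the example, and the threshold is not a statement of the legal minimum.

```python
def share_below(readings_f: list[float], threshold_f: float) -> float:
    """Return the fraction of temperature readings (in Fahrenheit) below a threshold.

    Both the readings and the threshold are supplied by the caller; nothing
    here encodes the actual housing code.
    """
    if not readings_f:
        raise ValueError("at least one reading is required")
    below = sum(1 for temp in readings_f if temp < threshold_f)
    return below / len(readings_f)


# Made-up hourly readings, for illustration only.
hourly_readings = [66.0, 61.5, 59.8, 60.2, 67.1, 68.4, 70.0, 69.2]

# 68.0 is an illustrative threshold chosen for this example.
fraction = share_below(hourly_readings, 68.0)
print(f"{fraction:.1%} of readings were below the threshold")
```

A summary like this is only as good as the threshold it is given, so a real analysis would need the applicable rules for the date, time of day, and outdoor temperature.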

The sensors come at no cost to the tenants participating in Heat Seek’s program, and can function without Wi-Fi in the homes of elderly or lower-income tenants — those who need the data the most.


Credit: Heat Seek

“The idea was always that we want to be able to serve the most vulnerable tenants in the city who don’t have the means or the resources to solve this problem on their own. We didn’t want cost to be a barrier,” Francois said.

Heat Seek sensors are currently installed in about a dozen buildings, with an expectation to expand to 25 buildings by the end of the season.

“We intentionally scaled down from last year because one of the big things we found last year was that providing tenants with data is great, but it’s not enough, especially if tenants don’t have a lawyer and don’t know what to do with that data,” Francois told MSNY.

Heat Seek has begun working hand-in-hand with tenant organizers, public interest attorneys, and city officials at the Housing Preservation & Development department. Together, they look at the city’s open data — complaints, violations, court cases, change in rent-stabilized units, and other indicators that demonstrate a building might benefit from Heat Seek’s sensors. Heat sensor data is shared with the city, so inspectors can drop by unannounced to confirm a pattern in the data.

“We’re hoping that this year, with a more targeted approach, we’re able to see a higher percentage of the buildings where we have sensors actually resolve their issues,” Francois said.

They’ve already seen success. At 178 Rockaway Parkway in Brownsville, sensors were installed in partnership with the Legal Aid Society in October. Nearly a quarter of the time, the temperature hovered around 60 degrees, in violation of the NYC Housing Code. In December, Heat Seek held a press conference in front of the building, and the Legal Aid Society filed a case against the landlord. A day after the press conference, before the case went to trial, the heat came on almost 10 degrees warmer.


A graph of the temperature inside apartments at 178 Rockaway Parkway in Brownsville, before and after the Legal Aid Society took action using Heat Seek’s data. Credit: Heat Seek

“After we see more of that impact, then it’s about scaling. There’s no point in scaling for scaling sake,” Francois said of the company’s plans to expand.

Looking ahead, Heat Seek plans to focus on some of the neighborhoods that are up for rezoning as part of the mayor’s housing plans.

“We know that during rezoning and after rezoning, the cost of living in those neighborhoods goes up. Landlords can start to charge more for rent, making it difficult for a lot of the tenants. We want to at least eliminate this one harassment tactic, of refusing to heat the apartments, that’s really effective in driving tenants out,” she said.

They aim to help landlords, too, to heat their buildings more effectively while reducing costs.

“We’re trying to be a non-biased third party,” Tristan Siegel, a coder with Heat Seek since the beginning, told the New York Times. “Even though we did start with tenants in mind, we’re really trying to bridge that gap.”

Ms. Francois credited the NYCBigApps competition with helping them move from an idea to a civic tech success, along with support from Civic Hall, Beespace, and Robinhood Labs, to name a few.

“We’re proud to be a part of the civic tech, tech for the public good, community. We’re a nonprofit, simply driven to make tech that serves the needs of the partners we work with and the tenants we work with,” she said. “That impacts every aspect of what we do, right down to the design of the sensors.”

Heat Seek works most closely with:
The Legal Aid Society
Legal Services NYC
Brooklyn Legal Services Corp A
Flatbush Tenant Coalition
St. Nick’s Alliance
Brooklyn Borough President Eric Adams
Council Member Ben Kallos
Council Member Ritchie Torres
Housing Preservation & Development
Make the Road

Habitat III: A Once-In-A-Generation Civic Experience


Photo: John Paul Farmer

It’s hard to catch your breath in Quito, Ecuador. Whether it’s the thin air of its 10,000-foot elevation, the natural beauty of its volcanic mountains, or the built beauty of its colonial-era architecture, Quito is a city that leaves you breathless.

Last month, 30,000 people came together in the scenic Ecuadorian capital to discuss the future of cities at Habitat III. Hosted by the United Nations, this once-every-20-years convening marked just the third of its kind, following in the footsteps of Habitat I in Vancouver in 1976 and Habitat II in Istanbul in 1996. UN-hosted World Urban Forums have been held every couple of years in recent decades, although none has reached the scale of Habitat.

At Habitat III, a wide range of individuals and organizations – including governments, companies, non-profits, and academic institutions – gathered to share best practices, to celebrate successes, and to approve a New Urban Agenda that marks the culmination of years of negotiations among United Nations member states.

Gatherings ranged from formal (including official delegate discussions in the National Theater), to participatory (such as the youth assemblies) to informal (like the lightning talks that electrified the expo hall). Some of the most interesting highlights were the following:

The Global Municipal Database – Lourdes German, Director of International and Institute-wide Initiatives at the Lincoln Institute, showcased a dashboard for cities that is built upon Microsoft technologies such as Azure, Power Map and Power BI. Working with cities in Africa, Asia, and Latin America, the Global Municipal Database tracks key fiscal indicators including expenditures, revenue, and borrowing and gives communities the tools to visualize the data and create actionable insights. What’s so powerful about these technologies is that many of their functionalities are Excel-based, meaning millions of people could use them tomorrow to make their cities more transparent and accountable, with no further training necessary.

Water and Resilience – It has been said that everyone has a water problem: either too polluted, too much, or not enough. For example, fully one-third of the Netherlands – a country built on its shipping and ports – lies below sea level. The country’s strength – water – is also its greatest vulnerability. With years of such experience living with water, the Netherlands was especially well qualified to host a conversation on the subject, which included viewpoints from Rotterdam and The Hague as well as a framework shared by 100 Resilient Cities’ Andy Salkin. One insight from The Hague was that resilience is not only physical, but must also be social and digital. Every aspect of a city must be able to bounce back. And while the cities of the Netherlands are especially advanced in learning how to live with water, most cities around the world are just getting started.

Public Spaces – Public spaces also played a key role, with planners asking whether placemaking will be at the heart of cities in the future. With a discussion of Eastern and Western traditions in terms of public spaces, the room erupted into a lively debate, during which an audience member noted that urban planners are increasingly using Microsoft’s Minecraft to engage people – particularly the young – in co-designing their own public spaces.

Housing – Housing was a major focus at Habitat III, for developed cities such as New York and for developing cities such as Lagos alike. With the majority of humanity living in cities for the first time in history, the influx of newcomers creates new stresses. Safe, accessible, and affordable housing is a priority.

Accessibility – A theme that was more woven into the conference experience than something explicitly called out was the need for more accessible communities. Microsoft is increasingly collaborating with cities to use technology to improve accessibility to services, information, and opportunity. “Eliminate the unnecessary barriers that limit our potential,” implored Dr. Victor Santiago Pineda of the University of California at Berkeley, who also served as co-chair for accessibility at Habitat III.

Youth – A particularly interesting aspect of Habitat III was the prevalence of young people everywhere you went. While most delegates were more senior, accomplished professionals, the conference grounds also teemed with young people of high school and college age. Many of those youth were local Ecuadorians engaging with this once-in-a-lifetime event that was on their home soil. Others were young people from around the world who journeyed to Quito to serve as agents of change. A middle-aged delegate at one youth-run session exclaimed, “I’ve been going to sessions back-to-back for two days and this is the first one that is participatory. I think we need more of this.”

After several incredible days in Quito, the big question on everyone’s lips was, “What happens next?” How does the New Urban Agenda get implemented? To what extent will cities be prioritized by the UN? What role will technology play in forging solutions to our hardest problems? Will upcoming World Urban Forums be effectively leveraged to ensure steady progress on such audacious goals? Will the assumptions and priorities of Habitat III stand the test of time? Only time will tell.

Habitat III brought together planners, policymakers, technologists, and young people who care about the future of cities. Technology was there and will be an increasingly ubiquitous part of our lives. These new cross-sector connections have the potential to pay dividends between now and Habitat IV in 2036 – but that potential requires action by us to be fulfilled.

Source: Habitat III

Quadratic Voting: Civic Tech for Eminent Domain

Historians say we owe the industrial predominance of England over France to it, but fifty years earlier Jane Jacobs called it “unjust involuntary subsidies…fantastically wasteful of city economic assets.”  Whatever you feel about eminent domain, the government taking of private land to avoid holdouts against development projects, you probably feel it strongly.  Jointly with my colleagues Jerry Green, Scott Duke Kominers and Steven P. Lalley I am working to harness the latest ideas from economic theory and technology to find a solution that almost everyone can be happy with.

I first started thinking about eminent domain during the summer of 2007, when I lived in Rio de Janeiro, Brazil, by the slopes of Rocinha, the enormous favela (slum) perched on hills with one of the world’s most beautiful views. I couldn’t understand why the slums remained there; couldn’t all of the residents be much better off if they were moved out of their poverty in exchange for a share of the enormous income that could be earned building luxury developments over that view? Alisha Holland, my wife and later an expert on squatter settlements, explained to me that the problem was the lack of eminent domain: given the crime in the favelas, Brazil’s elite would never move in until all squatters could be removed. But because the squatters lacked formal title to their land, no mechanism existed for compensating them, and any effort to evict them would meet with violent political resistance. Sharing Alisha’s sympathy for the inhabitants, I wanted to find a solution that would work for everyone.


The basic problem was that if every resident were given a veto over the project, it would be impossible ever to carry it out, as there would always be some individual who could hold the whole process up. On the other hand, if the community were given no right to refuse the government, there would be no legitimacy to the action and no protection of property rights. The natural solution is to allow the community of owners to take a vote on whether to accept an offer made to all of them.

However, such a vote can be very unfair. A developer might strategically target the 50% of landholders who have small, low-value plots of land, make them juicy offers, and thereby get the land on the cheap. For a system to be fair, it would have to protect the interests of those who strongly oppose a deal. The transaction should only go through if, in total, it made the sellers better off.

To solve this problem, I devised a new voting system, called Quadratic Voting, in which individuals can buy additional votes on an issue at an increasing cost. One vote costs 1² = $1, two votes cost 2² = $4, three votes cost 3² = $9, etc. This allows a minority strongly affected by a project to express its feelings, but only if the issue is extremely important to them. My work has shown this is the only rule that causes people to vote in proportion to how much the issue matters to them, thus ensuring that the sale will go through exactly when it benefits the sellers overall.
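The pricing rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names are hypothetical, votes are signed integers (positive for the sale, negative against), and the rule that the sale passes when the net tally is positive is an assumption made for the example.

```python
def quadratic_cost(votes: int) -> int:
    """Dollar cost of casting `votes` votes on one issue: the square of the count."""
    return votes ** 2


def marginal_cost(votes: int) -> int:
    """Extra dollars paid for the last vote when casting `votes` votes in total."""
    n = abs(votes)
    if n == 0:
        return 0
    # n^2 - (n - 1)^2 simplifies to 2n - 1, so each extra vote costs more.
    return quadratic_cost(n) - quadratic_cost(n - 1)


def net_tally(ballots: list[int]) -> int:
    """Sum of signed votes: positive ballots favor the sale, negative oppose it."""
    return sum(ballots)


# Costs for the first few vote counts, matching the figures in the text.
for n in range(1, 4):
    print(f"{n} vote(s) cost ${quadratic_cost(n)}, last vote cost ${marginal_cost(n)}")

# Hypothetical ballots: three mild supporters and one strong opponent.
ballots = [3, 1, 1, -4]
print("total paid:", sum(quadratic_cost(b) for b in ballots))
print("net tally:", net_tally(ballots))
```

Because the price of each additional vote rises linearly, a voter keeps buying votes only while the issue is worth more to them than the next vote costs, which is the intuition behind voting in proportion to how much the issue matters.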

Obviously, a quadratic function isn’t easy for most people to grasp, any more than are the powerful algorithms underlying services like Uber. That’s where technology comes in: it lets people understand only what is necessary and leaves the rest to computers. A user interface represents the cost of influencing the choice visually with sliding bars, which move at an increasingly rapid pace as votes are cast. Beyond eminent domain, Quadratic Voting has a variety of other uses in cities and politics more broadly, allowing citizens to find compromises that give them more say on the issues most important to them in exchange for letting their fellow citizens have their way on the issues that matter more to those citizens.

By bringing the power of civic tech to some of the most contentious issues of local government, Microsoft is paving the way for the happier, wealthier, more harmonious and functional cities of tomorrow.

E. Glen Weyl is a Senior Researcher at Microsoft Research New York City, a Visiting Senior Research Scholar at the Yale Economics Department and Law School and an Alfred P. Sloan Research Fellow. He was valedictorian of his undergraduate class at Princeton in 2007 and received his PhD in economics also from Princeton a year later.

Microsoft Research: Using Data to Transform Engagement

Microsoft Research (MSR) has an incredibly open research policy that allows its researchers to completely direct their work. This is effective for Microsoft for many reasons. First, an amazing caliber of researcher works at MSR, because they are attracted by the freedom. Second, once they get here they naturally gravitate to projects that take advantage of Microsoft data and platforms. But, rather than solving short-term problems, they have the freedom to dream up long-range solutions.

This is critical to Microsoft’s mission for two reasons. First, in a constantly evolving technology landscape, Microsoft needs to keep evolving, and MSR helps lead that path. Second, when customers commit to Microsoft’s platforms, it is frequently a long-term commitment because it is expensive to move. They want to know that Microsoft is leading innovation and that it is a trusted long-term technology partner. MSR provides a showcase for that.

I keep all of this in mind with my research. I was drawn to MSR because of the freedom it gives us, and inevitably my research looked for ways to take advantage of Microsoft’s data and platforms. I study how and why people provide information, how to aggregate that information into market intelligence, and how decision makers consume faster, larger, and more flexible market intelligence. I have been very fortunate to test large-scale polling infrastructure at MSN and Xbox, where we can reach hundreds of thousands of respondents and revolutionize the impact of opt-in survey tools. I have also been fortunate to explore the value of Bing query data and Cortana question answers, where we can learn how people engage. Some of what I do helps improve these tools in the short run, making them more engaging and more effective at providing information, but much of what I do is think about how they will evolve in the years and decades to come.

A lot of the work I do is seen publicly, as demonstrations of where technology can head. In 2012 we had polling on the Xbox that is now seen as a primary example of how opt-in polling, from very unrepresentative respondents, can provide valuable market intelligence. Slowly that work has evolved into more generic infrastructure for Microsoft that we look forward to demonstrating publicly in the coming months. In 2012 we demonstrated some insights from social media and query data on Bing. Slowly that work has evolved into more generic infrastructure for Microsoft that is starting to power backend functionality for Microsoft.

I have worked on many projects at Microsoft, but they ultimately share a common core. A well-defined set of long-range academic theories on how data is collected from opt-in users and analyzed has been empirically tested in many different prototypes and now products. It is a process that can really only be nurtured in an environment like Microsoft Research, in a company like Microsoft.


David Rothschild is an economist at Microsoft Research. He has a Ph.D. in applied economics from the Wharton School of Business at the University of Pennsylvania. His primary body of work is on forecasting and on understanding public interest and sentiment. Related work examines how the public absorbs information. He has written extensively, in both the academic and popular press, on polling, prediction markets, social media and online data, and predictions of upcoming events; most of his popular work has focused on predicting elections and an economist’s take on public policy. He correctly predicted 50 of 51 Electoral College outcomes in February of 2012, an average of 20 of 24 Oscars from 2013 to 2016, and 15 of 15 knockout games in the 2014 World Cup.

DataViz for good: How to ethically communicate data in a visual manner: #RDFviz


Catherine D’Ignazio brainstorms around data inclusion

Last Friday I participated in my second Responsible Data Forum. Last year’s workshop on private sector data sharing (data philanthropy, if you like) inspired some of our thinking and collaborations over the past year, and this year’s event about data visualization for social impact did not disappoint. You can see what people posted at #RDFviz, on the wiki, and in a great collection of related resources here.


Mushon Zer-Aviv facilitates the Responsible Data Forum

At the top of the day, we did the classic Post-It note brainstorm to inventory all of the potential avenues for working groups. Given the incredible experience of the people in the room, there was a lot to work with. To give you a sense of the conversation and work coming out of this event, I’ve attempted to capture a sample of the questions and prompts the participants asked:

  • Non-screen data visualizations
    • Experiential data visualization, sonification, physical experiences, and installations
    • Data viz for the blind
    • Sand mandalas
    • Getting data offline
    • Translating data visualizations across various forms of media
    • Low-bandwidth visuals for inclusivity
  • Communicating uncertainty
    • How do we communicate uncertainty in data?
    • In metadata?
    • How do we represent gaps in the data?
    • What if our knowledge of the uncertainty in the data is anecdotal?
    • How can visuals show “no answer”?
    • How can data visualization promote ambiguity?
  • Literacy
    • How do we improve everyone’s data visualization literacy, as creators and as viewers?
    • How do we educate people about the data they create?
    • Which people / sectors / fields most need data literacy?
    • Can we provide interactive tools that let viewers adjust data visualizations in real time as a means of improving literacy?
    • How can we support grassroots groups to create better data visualization?
    • Is there a need for basic design principles and data viz 101 resources for human rights activists?
    • How do we navigate a fear of numbers?
  • Perspective
    • How do we visualize when there’s a dispute or a problem with the “facts”?
    • How do we show different perspectives on the same data?
    • How do we establish trust with our audience?
  • Data Visualization Theory (one of the less popular categories in this very practical group)
    • Let’s connect #RDFViz with the academic visualization community
    • How do we create a data visualization of data visualization?
    • Is data visualization abstracted thought?
  • Power and Data Visualization
    • Is persuasive data visualization
      • good?
      • bad?
      • necessary?
    • The relationship between big data and advocacy visualization
    • If we don’t amplify what we don’t know, visualization will amplify the most powerful voices
    • What does good adversarial data visualization look like?
  • BAD data viz
    • Is meaningless data visualization worth anything?
    • What about when people make decisions based on bad data viz?
    • If raw data is unrepresentative, will visualizations on it be bad?
    • We should collect examples of unethical data visualization
  • Data Visualization Tools
    • Let’s consider the limits of software and the tools we use
    • The trade-off between ease of use and privacy
    • Data visualization does not immediately create data storytelling
    • We should be more open about the true cost of doing a data visualization
    • We need tools that allow us to share our process as well as the data source and output
    • “Proprietary viz companies will die” vs. “Open source communities are Kafkaesque nightmares”
    • There’s a distinct lack of non-English data viz tools
    • What are some reasonable principles or guidelines to provide designers creating software tools for use by the general public and specialists?
    • Which types of interactivity are most useful in enhancing analytical inspiration?
  • Data Visualization Methodology
    • We should discuss methodologies when we discuss visualizing data
      • How do we choose what we visualize?
      • How do we represent data quality?
      • How do we visualize metadata?
    • What’s the lifespan of an infographic? Can we design continuously updated visuals, or include expiration dates for stale graphics?
    • How do we encourage consideration of ethics in the creation process of data visualizations?
  • Collaboration
    • Let’s connect the data producers and the visualizers with a tighter feedback loop. The producers will see how their data’s been applied in the world, and visualizers will get a better sense of the contours of the data.
    • How do we encourage more collaboration between human rights activists and data visualizers?
  • Engagement and Participation
  • Audience
    • How do we involve the audience?
      • Who is the audience, and why?
    • How do we create community ownership of a data viz?
    • How do we allow a data viz to speak to multiple disparate audiences?
  • Transparency and openness
    • Expose methodologies
    • Replicability of a data viz
    • Making the data viz process transparent
    • What assumptions are there in that data visualization?
    • How do design and aesthetic decisions bias a data viz?
  • Simplicity
    • How can we be succinct without over-simplifying the content?
    • Nuanced vs. bombastic
    • Can we build a language for the critique of data visualizations’ ethics?
    • Are there ethical ways to avoid nuance?
    • Presenting individual data points vs. an overview
  • Objectivity vs. subjectivity
    • Data as expression vs. data as fact
    • Is objectivity desired?
    • How do we use empathy without creating compassion fatigue?
    • The difference between invoking sympathy vs empathy
  • Honesty
    • When is a data viz most true?
    • When is a data viz most honest?
    • What about high-stakes data visualizations, like when there are life and death risks for participating subjects?
    • How do we incorporate criticism and critique into the visualization?
    • Data visualization is rooted in an Enlightenment fallacy that “the truth”, presented just so, will change things
  • Motivation and goals
  • Responsibility
    • Anonymizing data
    • Fact-checking data
    • Transparency vs. protection of subject
    • Marginal populations
    • Whose data is it, and is there consent?
    • Responsibly visualizing video / images
    • Does reliance on data de-humanize subjects?
    • How do we responsibly reduce complexity to convey points?
    • How do we make the creators of data visualizations
  • Culture
  • Risk & danger
  • The future…
    • Is visualization always stuck in the past?
    • Time travel strategies for slowing down time
    • Holodeck data visualization

A constellation of Post-its

This is only a partial list, as I wasn’t able to type quickly enough for the fast-moving Post-it notes. You can view the original Post-it constellation over here and keep up with the conversation and the creative outputs over at

Trip Report: Epicentro Innovation Festival in Guadalajara, Mexico


I recently had the chance to attend Epicentro, an innovation conference hosted by the Mexican state of Jalisco in a bid to loft the city of Guadalajara into the class of ‘Innovation Capital.’ There’s something real here. People I spoke with from across Mexico’s technology sector genuinely believe that Guadalajara is one of the top cities for design and technology, after Mexico City (which, at a population of ~9 million, will be difficult to dislodge). Building momentum, there will be a drone festival shortly following Epicentro. Guadalajara doesn’t yet have rules around drones, so it should be fun.


The conference was billed as a ‘festival’, and the organizers lived up to the spirit of that word. Along with large keynote speeches and heavily attended workshops, Epicentro included two massive outdoor public concerts. The goal here, which I admire, is to invite the city to participate in the festival and make the various innovation initiatives part of their own experience. There’s often a yawning chasm between those of us who feel the agency to redesign society’s institutions and the public at large, many of whom are often unaware the redesign is happening. For this reason, I was heartened that Epicentro’s organizers, including Cristina Yoshida Fernandes, Social Innovation & Entrepreneurship Coordinator for the Ministry of Innovation, Science and Technology, are directly addressing this gap. They set out a broad public invitation to participate, a completely free schedule, and strategies to bring people into deeper engagement on the city’s path forward. One channel for advancing citizen engagement is to invite people to join citizen communities on issues they genuinely care about: traffic, health, education. There are 28 such communities.

The venues spilled across Guadalajara, including the university, a LEED-certified innovation tower (MIND), and a community-built and managed Hacker Garage. The Garage was actually a family home until the local developer community converted it into a hackerspace. It still has a small pool in the backyard.

Each day there was a core theme underlying the talks and workshops, and I was pleased to see not just a day dedicated to Smart Cities, which is no longer too rare, but also an entire additional day focused on Retos Públicos (public challenges). This phrase is in keeping with the idea of “shared challenges”, and may even better describe the work we do than the broad descriptor of ‘civic tech’.


Friday, in one whirlwind day, with a voice hoarse from cheering on the luchadores the previous night, I gave a keynote talk and led a four-hour workshop. The talk followed GovLab’s Dinorah Cantú Pedraza in her native Spanish. I flexed my high-school Spanish to share some of the ideas behind our work at Microsoft Technology & Civic Engagement. The main theme was that technology has given us, the people, new abilities to actively participate in designing society. Our democratic institutions were designed in an era where the bleeding edge of communications was Thomas Jefferson’s polygraph, a composition tool which allowed him to write duplicates of each letter he sent and maintain the equivalent of an outbox. Individuals have been empowered by technology, putting us in an awkward relationship with our institutions as they’re unable to channel or benefit from our energy. The result is that we must upgrade our democracy, and we must do so collectively.


After the talk and a quick lunch with great people and strawberry-jam-covered quesadillas, we ran off to the Hacker Garage to facilitate a 4-hour civic design workshop with the Codeando Mexico and Codeando Guadalajara teams. The backgrounds and age ranges of participants were the most diverse of any workshop I’ve ever facilitated. Beginning with active issues the participants cared about, we worked from concern to solutions to plans, with an emphasis on social discovery and community involvement throughout. Dinorah made a surprise appearance to constructively critique the projects and help them connect to existing efforts.


One fortunate coincidence that I will attempt to replicate at future workshops: Codeando Guadalajara had a weekend-long civic hackathon scheduled for only a couple of weeks following the workshop. This gave participants, almost all of whom were new to the idea of civic tech, another clear way to get more deeply involved and to continue developing the initial prototypes they produced together.

Honoring Veterans Every Day

Every November, we pause for a day to honor our veterans who have served overseas and returned home to us. After a day filled with parades, ceremonies, and banquets, it can be hard to remember that our veterans need support year-round. From returning to the civilian workforce to receiving proper healthcare to simply finding a home, we can work around the clock to give back to our veterans who have served so bravely for our freedoms.

As part of Microsoft’s commitment to community, we take the time to serve our veterans the way they served us. After months to years of service, these veterans must take on the jarring task of returning immediately to civilian life, and that’s where we can help.

This year, we were proud to sponsor Team Red White & Blue in their second annual Old Glory Relay, a two-month, 3,540-mile journey across the continental U.S. to shine a light on our nation’s veterans.

And every year, we help open doors for our service members by providing training and career opportunities in the tech industry. After all, a veteran’s spirit lies within innovation and entrepreneurship. This year, we’re excited to announce an expansion to our Microsoft Software & Systems Academy (MSSA) from three regions to nine, bringing our coverage to a total of 12 bases nationwide. Read more here.


And our work has just begun. This Veterans Day, we encourage you to honor our service members and challenge you to keep them in mind year-round.

Modeling NYC Subway Flow and School Districts’ Effect on Housing Value

Justin Rao and Jake Hofman coordinate the Data Science Summer School program, hosted and sponsored by Microsoft Research. Each year, dedicated students spend their summer learning how to conduct research thanks to a network of researchers, mentors, and advisors. All of the course materials are openly and freely available on GitHub.

Tonight, we’re celebrating the program’s second class. Last year’s students researched questions about racial profiling in New York City and how to optimize the city’s bikeshare system.


New York City Subway data visualized

The first team, consisting of Eiman, Shannon, Riva, and Steven, studied the New York City subway system. They were interested in how people travel on complex transit networks and in classifying stations based on turnstile data. The system’s 468 stations serve approximately 6 million trips per weekday. The team drew on the MTA’s General Transit Feed Specification (GTFS) feed and MTA turnstile data, but had to exclude the PATH, LIRR, NJT, and Staten Island Railroad systems due to incomplete data.

There are 30% more entries than exits in the data due to New Yorkers’ habit of using the more accessible emergency exit door. New Yorkers also travel less on Mondays and Fridays, as well as holidays and during major snowstorms. The data required significant cleaning to balance entries and exits and match station names across the datasets.
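As a rough illustration of that imbalance, the sketch below computes the entry surplus and rescales exits so that system-wide totals match. The proportional rescaling, the station names, and the counts are all assumptions made for illustration; the team’s actual cleaning procedure is not described here.

```python
def entry_surplus(total_entries, total_exits):
    """Fraction by which recorded entries exceed recorded exits."""
    return (total_entries - total_exits) / total_exits


def balance_exits(counts):
    """Scale each station's exits so that total exits equal total entries.

    counts maps a station name to an (entries, exits) pair.
    Proportional scaling is an assumed method, not necessarily the team's.
    """
    total_entries = sum(entries for entries, _ in counts.values())
    total_exits = sum(exits for _, exits in counts.values())
    factor = total_entries / total_exits
    return {name: (entries, exits * factor)
            for name, (entries, exits) in counts.items()}


# Hypothetical counts for two made-up stations.
toy_counts = {"Station A": (700, 500), "Station B": (600, 500)}
balanced = balance_exits(toy_counts)
```

A single global factor is the simplest choice; a per-station or per-hour correction would be more faithful if emergency-door use varies across the system.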

Different stations serve different purposes in the city. The team categorized over 400 subway stations as commercial stations, residential stations, or link stations, based on the ratio of daytime to nighttime entries and exits. Residential stations serve roughly 1,000 fewer entries per hour (~400 per hour) than commercial stations (~1,500 per hour). Grand Central alone serves over 188,000 daily exits. As you might guess, the commercial stations serve Manhattan south of 59th Street, while residential stations cover the rest of the city, with a few exceptions for military bases, car dealerships, and other outliers.
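One way to picture the classification is the sketch below. It assumes that riders exit commercial stations during the day and enter them again at night, with the reverse at residential stations. The scoring rule and the cutoff of 1.5 are hypothetical, chosen only to make the idea concrete.

```python
def classify_station(day_entries, day_exits, night_entries, night_exits,
                     threshold=1.5):
    """Label a station 'commercial', 'residential', or 'link'.

    Compares the day-to-night ratio of exits with the day-to-night
    ratio of entries. The threshold is a hypothetical cutoff.
    """
    # max(..., 1) guards against division by zero at very quiet stations.
    exit_ratio = day_exits / max(night_exits, 1)
    entry_ratio = day_entries / max(night_entries, 1)
    score = exit_ratio / max(entry_ratio, 1e-9)
    if score > threshold:
        return "commercial"   # riders arrive by day and leave at night
    if score < 1 / threshold:
        return "residential"  # riders leave by day and return at night
    return "link"             # balanced traffic in both directions
```

For example, `classify_station(200, 1500, 1200, 300)` describes a station where most daytime traffic is arrivals, so it is labeled commercial.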

Next, the team constructed a network graph to visualize the flow of activity through the stations. Nodes are train stations, edges are rail links between adjacent stations, and the cost of each edge is the time it takes to travel from one station to the next based on schedules. The resulting adjacency matrix is a rat’s nest (sorry) of stations, improved by adding the stations’ geocoordinates.

One finding is that 14th Street Union Square has 10 adjacent stations, while Times Square, the most trafficked station, has only 7.
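Counting adjacent stations falls out directly from the graph structure. The sketch below builds an undirected adjacency map from a list of links and counts each station’s neighbors. The station names and travel times are made-up placeholders, not MTA data.

```python
from collections import defaultdict


def build_graph(links):
    """Build an undirected adjacency map from (station_a, station_b, minutes) links."""
    graph = defaultdict(dict)
    for a, b, minutes in links:
        graph[a][b] = minutes
        graph[b][a] = minutes
    return graph


def neighbor_counts(graph):
    """Number of adjacent stations for each station."""
    return {station: len(neighbors) for station, neighbors in graph.items()}


# Made-up links between placeholder stations; travel times are in minutes.
toy_links = [("Hub", "North", 2), ("Hub", "South", 3), ("Hub", "East", 4),
             ("North", "East", 5)]
toy_graph = build_graph(toy_links)
busiest = max(neighbor_counts(toy_graph).items(), key=lambda item: item[1])
```

Here `busiest` is the station with the most neighbors, which is a different measure from the station with the most riders.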

The team estimated demand of inflow and outflow, and computed minimum cost flow, where demand is satisfied while minimizing the previously defined cost. They chose Grand Central Station as the center of the visualization. The algorithm identifies high flow corridors. In the mornings, the flow is generally inbound to Grand Central, with the exception of flow down to Lower Manhattan.
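For readers unfamiliar with minimum cost flow, here is a small self-contained sketch using successive shortest paths with Bellman-Ford. It is one standard way to solve the problem and is not necessarily the solver the team used. The three-node network, capacities, and costs are invented for illustration.

```python
def min_cost_flow(num_nodes, edges, supply):
    """Route supply to demand at minimum total cost.

    edges: list of (u, v, capacity, cost) directed links.
    supply: per-node amount; positive is outflow, negative is demand.
    Returns (total_cost, flow_on_each_edge).
    """
    source, sink = num_nodes, num_nodes + 1
    # Each arc is [to, residual_capacity, cost, index_of_reverse_arc].
    graph = [[] for _ in range(num_nodes + 2)]

    def add_arc(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
        return graph[u][-1]

    forward = [add_arc(u, v, cap, cost) for u, v, cap, cost in edges]
    for node, amount in enumerate(supply):
        if amount > 0:
            add_arc(source, node, amount, 0)
        elif amount < 0:
            add_arc(node, sink, -amount, 0)

    total_cost = 0
    while True:
        # Bellman-Ford: cheapest path from source in the residual graph.
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)
        dist[source] = 0
        for _ in range(len(graph) - 1):
            updated = False
            for u in range(len(graph)):
                if dist[u] == float("inf"):
                    continue
                for i, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        updated = True
            if not updated:
                break
        if prev[sink] is None:
            break  # no remaining path from supply to demand
        # Find the bottleneck capacity along the path, then push flow.
        push, v = float("inf"), sink
        while v != source:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        total_cost += push * dist[sink]

    flows = [cap - arc[1] for (_, _, cap, _), arc in zip(edges, forward)]
    return total_cost, flows


# Toy network: node 0 supplies 10 riders, node 2 demands 10, node 1 is a link.
toy_edges = [(0, 1, 10, 2), (1, 2, 10, 3), (0, 2, 4, 4)]
toy_cost, toy_flows = min_cost_flow(3, toy_edges, [10, 0, -10])
```

In the toy network the direct link is cheaper but can carry only 4 riders, so the remaining 6 take the two-hop route. High-flow corridors are simply the edges with the largest values in `toy_flows`.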

Next, they wanted to model population flow. US Census data only shows residential population, which wouldn’t work for New York’s immense number of commuters. Combining it with the commuter data, we can watch Census tracts empty or swell throughout the day based on commuting patterns. The team suggests applications such as correcting stop and frisk activity for an area’s current population, or studying disease spread in epidemiology.
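The bookkeeping behind such an estimate can be sketched as a running total: start from a tract’s residential population and add the net number of riders arriving each hour. Treating subway exits minus entries as the only source of population change is a simplifying assumption, and the numbers in the usage note are invented.

```python
from itertools import accumulate


def population_by_hour(residents, hourly_net_arrivals):
    """Estimated population of a tract at the end of each hour.

    residents: Census residential population of the tract.
    hourly_net_arrivals: subway exits minus entries in the tract, per hour.
    """
    running = list(accumulate(hourly_net_arrivals, initial=residents))
    return running[1:]  # drop the starting value, keep one estimate per hour
```

For a tract of 1,000 residents with net arrivals of -200, -300, and +100 over three hours, the function returns 800, 500, and 600.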

Q: Has anything you’ve learned changed how you use the subway?

A: “I definitely don’t use the [emergency] exit doors anymore”.

“I don’t know if Midtown is fire-safe”.


New York City School District

The second data science team sought to understand the relationship between home values and the quality of the school district in which the homes are situated. The team consists of Glenda, Thomas, Nikki, and Anastassiya.

The New York City school system is the largest in the country, with over 1 million students. It has some of the best schools, and some of the worst, depending on where you live. The team looked at test scores and plotted them relative to the rest of the state. The result is a huge disparity, both between boroughs like the Bronx and Queens and within boroughs like Manhattan. They took shapefiles and plotted school performance on a map of New York by color.

PS111 performed worse than 60% of all schools in the state, whereas PS59 performed better than 99.2% of all state elementary schools. They are a ten-minute walk from one another, on 53rd and 56th streets.

This is where school district boundaries can make or break your child’s education. Park Slope recently redrew their lines, stoking parents’ anxieties.

The team compared high-performing school zones to housing values. In an ideal experiment, they’d sell identical apartments in two different school districts. Instead, they looked at historical sales data scraped from StreetEasy, a major NYC-area real estate website. The data wasn’t perfect: apparently you can have a negative number of bedrooms and bathrooms.

They wrote a Python script to geocode the addresses with the NYC GeoClient API. They computed and displayed the latitude, longitude, school, and price per square foot for each home. 40,000 distinct sales produced 10,000 sales with complete data mappable to known school zones.
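The filtering step that reduces raw listings to usable records might look something like the sketch below. The field names (`price`, `sqft`, `bedrooms`, `bathrooms`, `school_zone`) are hypothetical, and the geocoding call itself is omitted because its details are not given here.

```python
def clean_sales(records):
    """Keep sales with usable fields and add price per square foot."""
    cleaned = []
    for sale in records:
        price, sqft = sale.get("price"), sale.get("sqft")
        zone = sale.get("school_zone")
        beds, baths = sale.get("bedrooms"), sale.get("bathrooms")
        if None in (price, sqft, zone, beds, baths):
            continue  # incomplete record, cannot be mapped or priced
        if price <= 0 or sqft <= 0 or beds < 0 or baths < 0:
            continue  # impossible values, such as negative bedroom counts
        cleaned.append({**sale, "price_per_sqft": price / sqft})
    return cleaned


# Hypothetical listings, including one incomplete and one impossible record.
toy_sales = [
    {"price": 900000, "sqft": 1000, "bedrooms": 2, "bathrooms": 1, "school_zone": "Zone X"},
    {"price": 500000, "sqft": 800, "bedrooms": -1, "bathrooms": 1, "school_zone": "Zone Y"},
    {"price": 700000, "sqft": None, "bedrooms": 1, "bathrooms": 1, "school_zone": "Zone X"},
]
usable = clean_sales(toy_sales)
```

Only the first toy listing survives, at 900 dollars per square foot.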

Apartment prices alone illustrate huge disparities in New York: $110 per square foot in Woodlawn, Bronx vs. $3,393 per square foot around Central Park South. When plotted against school performance, there’s a correlation between price and competitive schools, but the relationship is confounded by other factors like neighborhood quality and location. We can’t confidently say that price per square foot increases only because of school quality.

So how do we isolate the school zone premium? If we had infinite data, we’d simply subtract the average sale price in neighboring zones from the average sale price in the school zone. With limited data, the team instead fit a linear model to predict sale price per square foot. They used regularization to select important features and avoid unreliable estimates of sales in the given areas. They took into account the number of bedrooms, bathrooms, demographics, test scores, and neighborhood, as well as the interactions between the number of bedrooms and the school (to capture families vs. studios).
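Both ideas can be sketched in a few lines. The first function is the infinite-data estimate, a plain difference of averages. The second fits a linear model with an L1 penalty (the lasso) by coordinate descent, which is one common form of regularization that drives unhelpful coefficients to exactly zero. Whether the team used an L1 penalty specifically is an assumption, and the toy data are invented.

```python
from statistics import mean


def naive_premium(zone_prices, neighbor_prices):
    """Average price per square foot in a zone minus that of its neighbors."""
    return mean(zone_prices) - mean(neighbor_prices)


def soft_threshold(value, penalty):
    """Shrink value toward zero; anything within the penalty becomes 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_fit(rows, targets, penalty, sweeps=100):
    """Minimize 0.5 * sum(residual^2) + penalty * sum(|weight|).

    Uses cyclic coordinate descent. Assumes features and targets are
    already centered, so no intercept is fitted.
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features
    for _ in range(sweeps):
        for j in range(n_features):
            rho = 0.0   # feature j against the residual that ignores feature j
            norm = 0.0  # squared length of feature j
            for row, target in zip(rows, targets):
                prediction = sum(w * x for w, x in zip(weights, row))
                partial_residual = target - prediction + weights[j] * row[j]
                rho += row[j] * partial_residual
                norm += row[j] ** 2
            weights[j] = soft_threshold(rho, penalty) / norm if norm else 0.0
    return weights


# Toy data: the target depends only on the first feature.
toy_rows = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
toy_targets = [3, -3, 3, -3]
toy_weights = lasso_fit(toy_rows, toy_targets, penalty=2)
```

In the toy fit the second feature carries no signal, so its weight is set to zero, while the first weight is shrunk from 3 to 2.5 by the penalty. Interaction terms such as bedrooms by school would simply be extra columns in `rows`.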

The median absolute deviation for all boroughs was $103 per square foot, but it varied by borough: $48.99 in the Bronx and $138.48 in Manhattan. The model accurately captures average home price within each school zone.
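To make the error measure concrete, the sketch below reads it as the median of the absolute differences between predicted and actual price per square foot, computed overall and per borough. That reading is an assumption, and the sample figures are invented.

```python
from collections import defaultdict
from statistics import median


def median_absolute_error(actual, predicted):
    """Median of |actual - predicted| across all sales."""
    return median(abs(a - p) for a, p in zip(actual, predicted))


def error_by_borough(sales):
    """Median absolute error per borough.

    sales: iterable of (borough, actual_price, predicted_price) tuples.
    """
    errors = defaultdict(list)
    for borough, actual, predicted in sales:
        errors[borough].append(abs(actual - predicted))
    return {borough: median(values) for borough, values in errors.items()}


# Invented figures, in dollars per square foot.
toy_sales = [("Bronx", 200, 240), ("Bronx", 180, 170), ("Bronx", 220, 250),
             ("Manhattan", 1500, 1350), ("Manhattan", 1200, 1300)]
toy_errors = error_by_borough(toy_sales)
```

The median is less sensitive than the mean to a handful of badly mispredicted luxury sales, which may be why it suits a market as skewed as New York’s.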

The team zoomed in on Park Slope to identify school-based premiums. They found a premium of $84 per square foot for residences in PS 321’s school district vs. those in PS 282’s, despite PS 282 being closer to the Atlantic Ave subway hub. Repeating the same procedure in each school district, they found that you do pay less to live in a school district with poorer test scores.

The result? New York City’s schools demonstrate extreme inequality, often over small geographic areas.

The information is compelling, but static, so the team built an interactive app where you can enter your address and number of bedrooms and bathrooms to see the average price, median price, and premium (positive or negative) in each of the city’s school zones.

Watch — Brad Smith Discusses United States Second Circuit Court of Appeals, Digital Privacy

Last week, Microsoft filed an appeal to the United States Second Circuit Court of Appeals in its ongoing case challenging a U.S. warrant seeking customer data stored overseas. The brief lays out the important legal and policy issues that Microsoft believes are at stake in this case and can be read here.

Microsoft will discuss the case and the larger issue of digital privacy today from 11:00am to 12:00pm EST. The event will be held at Microsoft New York, 11 Times Square, at the corner of 8th Ave and 41st Street (please enter through the Microsoft lobby on 8th Ave). Microsoft will host a live webcast at the Microsoft News Center.