Open Data

Why You Should Apply (Now!) for Microsoft’s Data Science Summer School

Last summer, I was utterly starstruck in the geekiest way possible. My peers and I in the Microsoft Research Data Science Summer School (DS3) had chosen to use Airbnb data for our final project. After we made our presentation to a packed room, a couple of data scientists working at Airbnb reached out to us to express interest in our paper, and we then presented our project to them in private. At the time, I just remember thinking to myself, “This is so cool!”

This Friday, April 21st, is the deadline for this summer’s version of DS3. So, if what you read here sounds interesting and you want to be a part of DS3, there’s no time to waste. Apply today!

DS3 is the brainchild of a handful of awesome Microsoft researchers – Jake Hofman, Justin Rao, and Sharad Goel – who wanted to inspire students and help create a more diverse and accessible field of data science. The program has two parts. First, you learn the equivalent of one semester of data science compressed into four weeks. It’s intense. In the mornings – which usually start around 10AM – renowned senior Microsoft researchers will privately teach you and seven other students cutting edge data science and statistics. No specific background is required, and they always make sure everyone understands what is going on. In the afternoon, you are left to complete a mini data science project, to put into practice the lessons you’re learning.
The second part of DS3 is the final project, which is the focus of the final four weeks. You form a team and work on your project for the entire day. You use real-world datasets to come up with an entirely original research paper. Each team typically has two mentors, and those mentors are there for the entire process: brainstorming research ideas, coding, writing the actual paper, learning how to cite properly and preparing for your presentation. My team’s research paper was accepted into conferences at MIT and the ACM’s Tapia Conference for Diversity in Computer Science, and that could not have happened without the amazing guidance of our mentors. I can’t stress how unbelievably awesome it is to have renowned researchers dedicate multiple weeks to help you write your first-ever research paper. They become your teachers, advisors, recommenders and debuggers. One of them has become an almost parental figure to me, and still advises me on my college classes to this day.

I highly encourage anyone who finds data science, big data, and artificial intelligence interesting to apply to DS3! You don’t need to be a genius; you just need to be curious and willing to work hard. You will be surprised at how helpful and humble everyone at Microsoft is. To be honest, I didn’t like statistics at all and wasn’t the best at math. But in DS3, you come to realize that quantitative skills are only part of the equation, and that good data scientists must also be creative, reflective and inquisitive. I guarantee you no matter what background you have, DS3 will give you a lifetime of skills, inspiration, friends, and confidence. I’m now working as a Civic Tech Fellow on Microsoft’s Technology & Civic Innovation team – and I wouldn’t be here if I hadn’t taken the leap of faith to spend a summer in DS3.

Data science is at a historic moment because it has already begun to change the way businesses and organizations work. It is applicable to so many more fields than you think. Just as the camera gave computers sight, data science is giving computers millions of new senses to interpret the world. There is a reason Harvard Business Review published an article proclaiming “Data Scientist: The Sexiest Job of the 21st Century.” I feel like I am part of something big, I have new superpowers with which to change the world, and it is all very exciting.

The deadline to apply to the Microsoft Research Data Science Summer School (DS3) is this Friday, April 21st. Any interested college student can learn more and apply here.

Using Data Science to Improve Traffic Safety

As U.S. traffic deaths continue to rise, cities across America are increasingly focused on eliminating crash-related injuries and fatalities. Data can be a powerful resource in these efforts to make streets safer. We’re happy to support this effort, partnering with DataKind, which recently completed the Vision Zero Labs Project. This effort worked to develop valuable analytical models and tools to help the cities of New York, Seattle and New Orleans further their work to increase road safety.

In partnership with DataKind, a nonprofit that harnesses the power of data science in service of humanity, and the New York City Department of Transportation, we launched this project in August 2015, joining forces with the Seattle Department of Transportation and the City of New Orleans’ Office of Performance and Accountability in March 2016. With these cities, the Vision Zero Labs Project has become the first and largest multi-city, data-driven collaboration of its kind to drive traffic safety efforts in the U.S.

Using data science techniques, DataKind accessed open and internal city data to design several models and tools that enable cities to test the effectiveness of various street safety interventions, estimate total traffic volumes and gain additional insight into crash-related factors.

ABOUT DATAKIND

Launched in 2011, DataKind is a global nonprofit that harnesses the power of data science, AI and machine learning in the service of humanity. Through its core programs – Labs, DataCorps and DataDives – the organization brings together leading data scientists and social sector experts to collaborate on projects to tackle some of the world’s toughest challenges. A leader in the Data for Good movement, DataKind was named one of Fast Company’s Top 10 Most Innovative Nonprofits for 2017. Headquartered in New York City, DataKind has Chapters in Bangalore, Dublin, San Francisco, Singapore, the UK and Washington, D.C. For more information visit www.datakind.org

ABOUT VISION ZERO

An initiative born in Sweden in the 1990s, Vision Zero aims to reduce traffic-related deaths and serious injuries to zero. It has been adopted by more than a dozen U.S. cities including New York and Seattle. Vision Zero believes that crashes are predictable and preventable, which means there is great potential for data and technology to help uncover patterns of incidents so governments can take action to prevent fatalities before they occur.

Creating Safer Streets Through Data Science — A Case Study

Executive Summary

Tens of thousands of people are killed or injured in traffic collisions each year. To improve road safety and combat life-threatening crashes, many U.S. cities have adopted Vision Zero, an initiative born in Sweden in the 1990s that aims to reduce traffic-related deaths and serious injuries to zero.

While many cities have access to data about where and why serious crashes occur, the use of predictive algorithms and advanced statistical methods to determine the effectiveness of different safety initiatives is less widespread. Therefore, DataKind, Microsoft and three U.S. cities — New York, Seattle and New Orleans — came together to help demonstrate how cutting edge and scalable solutions can be developed to help tackle a complex societal issue.

Each city had specific questions that they wished to address around local priorities for increasing traffic safety, to better understand the factors contributing to crashes and the potential impacts of different types of interventions. The DataKind team, working closely with local city transportation experts, brought together a wide variety of datasets such as information on past crashes, roadway attributes (e.g. lanes, traffic signals and sidewalks), land use, demographic data, commuting patterns, parking violations and existing safety intervention placements.

These inputs were used to develop models that allow cities to examine how different street characteristics affect the injuries that occur, to determine the extent to which roadway user behavior and street design contribute to crash occurrence and severity, to assess the effectiveness of interventions for increasing safety, and to guide the placement of future interventions.

The DataKind team also developed a model to help cities accurately and cost-effectively estimate “exposure” or total volume of vehicles on individual streets, a key factor in safety analyses as well as broader transportation planning activities.

Today, as a result of applying these models, the cities are better positioned to determine what kind of engineering, enforcement and educational interventions are effective and how to best allocate limited available resources.

Specifically,

  • In New York, with the new exposure model capability, the city can perform initial safety project feasibility studies more efficiently. When combined with DataKind’s crash models, the new capability will help the city test the potential impact different engineering, land use and traffic scenarios would have on total injuries and fatalities in the city.
  • In Seattle, the city focused on bicycle and pedestrian safety issues in order to gain insights that could contribute to the planning for more than $300 million in anticipated Vision Zero investments. The DataKind models identified collision patterns and factors that contributed to higher levels of injury severity, including whether a motor vehicle is making a right turn or left turn and the effectiveness of crosswalks in reducing crash severity. They also identified key variables affecting the likelihood of accidents taking place on particular stretches of road, including traffic volume, land use, number of traffic lanes, street width and pedestrian concentration.
  • And in New Orleans, the DataKind team created an Impact Assessment tool that will allow the city to compare various locations that are candidates for street treatments, such as bicycle lanes, and to evaluate the impact of implemented treatments over time.

In addition to aiding the participating cities in their efforts to make streets safer, the Vision Zero Labs project showed how data science and collaboration between the public and private sector can help benefit the greater good and produce innovative and scalable solutions to address complex civic issues like traffic safety. Cities around the world can adapt the methodologies and learnings to reduce traffic-related injuries and fatalities in their own communities.

DataKind Vision Zero Initiative: Purpose, Projects and Impacts

Visualization of an early version of the exposure model that estimates traffic volume by street in Seattle. This view shows the difference between the model’s estimates and actual measurements

Tens of thousands of people are killed or injured in traffic collisions each year. To improve road safety and combat life-threatening crashes, more than 25 U.S. cities have adopted Vision Zero, an initiative born in Sweden in the 1990s that aims to reduce traffic-related deaths and serious injuries to zero. Vision Zero is built upon the belief that crashes are predictable and preventable, though determining what kind of engineering, enforcement and educational interventions are effective can be difficult and costly for cities with limited resources.

While many cities have access to data about where and why serious crashes occur to help pinpoint streets and intersections that are trouble spots, the use of predictive algorithms and advanced statistical methods to determine the effectiveness of different safety initiatives is less widespread. Seeing the potential for data and technology to advance the Vision Zero movement in the U.S., DataKind and Microsoft wondered: How might we support cities to apply data science to reduce traffic fatalities and injuries to zero?

Three U.S. cities (New York, Seattle and New Orleans) partnered with DataKind in the first and largest multi-city, data-driven collaboration of its kind to support Vision Zero efforts within the U.S. Each city had specific questions it wished to address, related to better understanding the factors contributing to crashes and which types of engineering treatments or enforcement interventions may be most effective in supporting its local efforts and increasing traffic safety for all.

To help the cities answer these questions, DataKind launched its first ever Labs project, led by DataKind data scientists Erin Akred, Michael Dowd, Jackie Weiser and Sina Kashuk. A DataDive was held in Seattle to help support the project. Dozens of volunteers participated in the event and helped fuel the work that was achieved, including volunteers from Microsoft and the University of Washington’s E-Science Institute, as well as many other Seattle data scientists.

The DataKind team also worked closely with local city officials and transportation experts to gain valuable insight and feedback on the project and access a wide variety of datasets such as information on past crashes, roadway attributes (e.g. lanes, traffic signals, and sidewalks), land use, demographic data, commuting patterns, parking violations, and existing safety intervention placements.

The cities provided information about their priority issues, expertise on their local environments, access to their data, and feedback on the models and analytic insights. Microsoft enabled the overall collaboration by providing resources, including expertise in support of the collaborative model, technical approaches, and project goals.

Overall, the work accomplished by the Vision Zero Labs team proved to be invaluable for the cities of New York, Seattle and New Orleans, equipping them with powerful insights, models and tools that can help inform future planning to prevent severe traffic collisions and keep all road users safe. With this knowledge, the cities can better determine how to best allocate resources and investments towards improvements in infrastructure and policy changes.

In addition to aiding the participating cities in their efforts to make streets safer, the project showed how data science can be effectively used to address complex civic issues like transportation safety. A particular example is the technique developed in this project around estimating road use volume even when complete data is lacking. This technique is relevant both for safety analyses and broader transportation planning activities. These are the kinds of cutting edge and scalable solutions DataKind’s Labs projects aim to deliver to achieve sector wide impact.

The project also showed how collaboration between the public and private sector and amongst partner organizations can help benefit the greater good and result in innovative and scalable solutions to address complex and critical issues like traffic safety. Cities around the world will be able to benefit from the results of the Vision Zero Labs project and can adopt the methodologies and learnings from the work to reduce traffic-related injuries and fatalities in their own communities.

Below are detailed descriptions of the specific local traffic safety questions each city asked, the data science approach and outputs the DataKind team developed, and the outcomes and impacts these analyses are providing each city.

New York: Estimating Street Volumes and Understanding How Street Design Can Reduce Injuries

Map showing street improvement projects locations and change in crashes in New York

Local Question: According to the City of New York, on average, vehicles seriously injure or kill a New Yorker every two hours, with vehicle collisions being the leading cause of injury-related death for children under 14 and the second leading cause for seniors. Looking to improve traffic safety on its streets, the city wanted to understand what existing interventions are working and where there is potential for improvement to help inform how the city can better allocate its resources to protect its residents.

Data Science Approach and Outputs: The team leveraged datasets from New York City’s Department of Transportation, NYC OpenData, New York State and other internal city data to examine the effectiveness of various street treatments to help inform the city’s future planning and investment of resources. Lacking some of the data necessary to address the actual impact of existing street treatments, the team looked to answer other crucial questions regarding traffic safety that could help benefit the city.

Before they could answer these questions, they first needed to answer a more basic one — how many cars are on the road? Knowing the total volume of road users or “exposure” is necessary to understand the true rate of crashes, but most cities don’t have this data available. To overcome this, the team designed an innovative exposure model that can accurately estimate traffic volume in streets throughout the city. The model has two main components. The first is an algorithm that propagates traffic counts on a single street segment to adjacent street segments. It assumes that traffic on one city block is very similar to traffic on adjacent blocks. This process can be run many times and allows one to widely propagate traffic count values along neighboring streets. However, some streets may not have any nearby traffic counts available, so the second component of the model is a machine learning model, with high predictive accuracy, that predicts traffic volumes on streets based on their characteristics.
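The first component described above, propagating measured counts to neighboring street segments, can be illustrated with a minimal sketch. This is not DataKind's implementation: the segment names, the adjacency map, the count values, the averaging rule and the number of rounds are all assumptions made purely for illustration.

```python
from statistics import mean


def propagate_counts(adjacency, known_counts, rounds=3):
    """Spread observed traffic counts to adjacent street segments.

    adjacency    -- dict mapping a segment id to a list of adjacent segment ids
    known_counts -- dict mapping a segment id to a measured traffic count
    rounds       -- how many times the propagation step is repeated

    Assumption for this sketch: a segment with no estimate simply takes
    the average of its neighbors' current estimates.
    """
    estimates = dict(known_counts)
    for _ in range(rounds):
        updates = {}
        for segment, neighbors in adjacency.items():
            if segment in estimates:
                continue  # keep measured or already-propagated values
            nearby = [estimates[n] for n in neighbors if n in estimates]
            if nearby:
                updates[segment] = mean(nearby)
        if not updates:
            break  # nothing left that can be reached from a count
        estimates.update(updates)
    return estimates


# Hypothetical street network: A-B-C-D form a chain, E is isolated.
adjacency = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],
}
known_counts = {"A": 1000, "D": 400}  # made-up daily vehicle counts

estimates = propagate_counts(adjacency, known_counts)
```

Segments such as the isolated "E" never receive a propagated value. In the approach described above, those are the streets left to the second component, the machine learning model that predicts volume from street characteristics.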

The team also created a crash model for New York, allowing the city to examine individual locations and test how different street characteristics impact the number of injuries. For example, the city may be able to look at a particular street and determine whether it is safer for the street to be a one- or two-way road.

Outcomes: The exposure model will prove to be invaluable to the City of New York, filling a crucial void in vehicle volume data that many cities face. With it, the city can now perform initial safety project feasibility studies very quickly and provide context for a variety of other safety research work that requires an “exposure” rate. The model can also be altered to estimate other defined traffic volume measures, like peak hour traffic volumes. It can also help inform future work related to traffic congestion and citywide vehicle usage.

New York can also use the crash models to test the potential impact different engineering, land use and traffic scenarios would have on total injuries and fatalities in the city. They will continue to build upon the work started by DataKind, as the models developed set the stage for future research in crash prediction, congestion relief and city safety projects.

The team was able to leverage the work started in New York City to help develop and refine the approaches for both Seattle and New Orleans.

Seattle: Understanding How Street Design, Driver Behavior and the Surrounding Environment Contribute to Crashes

This “exposure” model developed for New York and Seattle shows estimates of citywide traffic volume, a key piece of information needed for advanced analyses that most cities don’t have

Local Question: While Seattle has seen a 30 percent decline in traffic fatalities over the last decade, traffic collisions are still a leading cause of death for Seattle residents ages 5 to 24. Older adults are also disproportionately affected, so this trend could grow as the population ages. To supplement the findings of the City’s Bicycle and Pedestrian Safety Analysis project and provide policy makers and engineers with actionable information for developing and implementing interventions, Seattle sought to find out which mid-block street designs are most strongly correlated with collisions involving vulnerable roadway users and how likely such collisions are at identified locations.

Data Science Approach and Outputs: Using Seattle’s collision, roadway traffic, exposure data and environment characteristics, the DataKind team developed models to uncover collision patterns involving pedestrians or bicyclists and determined the extent to which contributing circumstance and street design are correlated with collision rates, as well as the severity level of specific types of crashes. The team also applied the methodology developed for their work with New York to calculate exposure or total traffic volume citywide for Seattle.

By incorporating incident-specific information such as time of day, weather, lighting conditions and behavioral aspects, the team was also able to further develop a crash model to evaluate elements that may contribute to crashes at intersections and to what extent driver behavior, road conditions and street design played a role.

Outcomes: The DataKind team identified the variables most strongly associated with mid-block collisions: traffic volume, land use, number of traffic lanes, street width and pedestrian concentration.

For instance, whether a motor vehicle was making a right or left turn at a given intersection was found to influence the severity of the collision. Researchers were also able to identify the months of the year in which crash incidence was higher or lower. Interestingly, the number of crosswalks was also found to be significant: intersections with more crosswalks showed a reduction in crash severity.

With these insights, Seattle will be able to pinpoint high risk areas and the factors that can be addressed to help reduce future crashes. The city recently passed a levy to fund multi-modal transportation improvements city-wide and the results from this project, along with additional safety studies, will help guide more than $300 million in Vision Zero investments over the next nine years. 

New Orleans: Evaluating the Effectiveness of Street Treatments

Local Question: While New Orleans hasn’t officially adopted Vision Zero, the city government and community are working together to make roads safer. In 2014, New Orleans was named a “silver” level bicycle-friendly community by the League of American Bicyclists and had the eighth highest share of bicycle commuters among major U.S. cities. New Orleans also leads Southern U.S. cities in bicycle commuting. Yet, a disproportionately high number of the state’s pedestrian crashes occur in New Orleans and the number of bicycle crashes doubled from 2010 to 2014.

To help the city protect its growing number of roadway users, New Orleans wanted to understand the impact that future installation of street treatments, such as bike lanes and traffic signage, could have on preventing traffic injuries and fatalities.

Data Science Approach and Outputs: The DataKind team created an Impact Assessment tool that could be used to test the effectiveness of installed treatments, which would then be used to better inform the placement of future street treatments, both individual interventions and groups of interventions applied simultaneously.

Specifically, the tool takes a set of treatment locations and uses different statistical methods to create sets of comparison locations. These comparison locations are used as a point of reference to gauge the impact of the treatment on traffic safety by comparing crash rates before and after the installation of interventions to similar intersections that did not receive interventions. The tool includes visualizations to examine generated comparison groups, as well as methods for using manually selected comparison groups.

As an example, New Orleans could select a treatment, such as a bike lane, and compare the crash rates before and after the bike lane was installed. The city can then compare these crash rates to those at comparison sites. The comparison sites are especially important because they allow the city to account for outside factors, such as overall growth in population or traffic. The crash rate could actually increase at a treatment site, but this may be due to other factors such as large increases in traffic. When comparing a treatment site with similar untreated sites, we can see whether the crash rate increased at a lower rate, which would indicate an improvement in safety due to the treatment.
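The before-and-after comparison described above can be worked through with a small sketch. It is not the Impact Assessment tool itself: the function names, the simple averaging of comparison sites and every crash rate below are hypothetical, chosen only to show the arithmetic.

```python
def rate_change(before, after):
    """Relative change in crash rate between two periods."""
    return (after - before) / before


def relative_effect(treated, comparisons):
    """Change at the treated site minus the average change at comparison sites.

    treated     -- (before, after) crash rates at the treated site
    comparisons -- list of (before, after) crash rates at untreated sites

    A negative result means crashes grew more slowly (or fell faster) at
    the treated site than at similar sites that received no treatment.
    """
    treated_change = rate_change(*treated)
    comparison_change = sum(
        rate_change(before, after) for before, after in comparisons
    ) / len(comparisons)
    return treated_change - comparison_change


# Hypothetical crash rates (crashes per year), before and after installation.
treated_site = (10.0, 11.0)                        # up 10 percent
comparison_sites = [(10.0, 13.0), (20.0, 24.0)]    # up 30 and 20 percent

effect = relative_effect(treated_site, comparison_sites)
print(f"Relative effect: {effect:+.0%}")
```

In this made-up case the crash rate rose at the treated site, but by 15 percentage points less than at the comparison sites, which is the pattern the paragraph above reads as a safety improvement.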

Outcomes: New Orleans has integrated the Impact Assessment tool into their systems and will be collecting more data to maximize the tool’s potential and evaluate the effectiveness of additional street features. These findings will help inform the placement of future street treatments.

“Making streets safer for all New Orleanians is a major priority of ours,” said Oliver Wise, director for the City of New Orleans’ Office of Performance and Accountability. 

Celebrate Open Data Day in New York This Weekend

Every March, we’re excited to join data enthusiasts worldwide to celebrate International Open Data Day, a global event that promotes awareness and use of open data. Through a series of events around the globe, people of all skill levels converge to create new projects, analyze data, and find new ways to visualize data.

We believe open data is a priority for civic tech enthusiasts — and we invite you to join us as we kick off the open data celebration this weekend. Here are some highlights of this weekend’s schedule — we hope to see you there:

March 3-5

Giving Tuesday DataDive, Presented by 92Y, DataKind, and the Bill & Melinda Gates Foundation

  • Friday 3/3 6:30pm-8pm EST: discuss goals for the DataDive and dive into the data!
  • Saturday 3/4 9am-9pm EST: choose a team and get to work!
  • Sunday 3/5 9am-3pm EST: final presentations and networking
    Note: You can attend one or all days.

We’re thrilled to be hosting a DataDive March 3-5 and are looking for data pros of all backgrounds to roll up their sleeves and work side by side with experts from the 92Y, the Bill & Melinda Gates Foundation, and Facebook to help use data to unravel tough questions and prototype new solutions.

March 4

International Open Data Day

Open Data Day is an annual celebration of open data all over the world. For the fifth time in history, groups from around the world will create local events on the day where they will use open data in their communities. It is an opportunity to show the benefits of open data and encourage the adoption of open data policies in government, business and civil society. View activities happening around the world here.

NYC School of Data (SOLD OUT)

New York City School of Data is a community conference showcasing NYC’s civic design, civic/government technology, and open data ecosystem.

March 6

Civic Hall Presents: Open Data, Mapping Global Security & the Department of Defense

How can we get national security data into the open? The National Geospatial-Intelligence Agency (NGA) will demo its geospatial data portals for the Arctic, for combating wildlife trafficking in Africa, and for Hurricane Matthew.

March 7

Five Year Anniversary of New York City’s Open Data Law, Local Law 11 of 2012

In many countries, states and cities Open Data is a policy – here in New York City it is a law, which ensures that Open Data is here to stay.

NYC Chief Analytics Officer Dr. Amen Ra Mashariki speaking at Socrata’s Connect 2017 Conference in DC

10 – 10:25am on the Main Stage. Livestream details coming soon.

NYC Big Apps Workshop – NYC Open Data Portal & Department of City Planning Facilities Explorer Tutorials

Join members of the NYC Open Data team and Department of City Planning for a demo of the NYC Open Data Portal and new Facilities Explorer tool (launching soon) followed by a breakout session at the Tuesday March 7th NYC Big Apps Workshop. You’ll learn the basics about how to access NYC data (1600+ datasets!) and get an overview of other tools such as the Facilities Explorer powered by NYC Open Data that you can use to support your research and work for the Big Apps competition as well.

March 8

Made in NY Media Center + Fabernovel Data & Media: Open Data Breakfast

Whether you are a developer, an agency or a civil service non-profit, having access to data drives business, improves services, and promotes free public access.

Together with FaberNovel we are hosting an interactive breakfast and conversation on March 8th to learn more about the City of New York’s Free Open Data Portal and how you can use it to build products, conduct research and analysis or create new applications.

Department of Small Business Services: 2017 Smart Districts Summit

Inaugural NYC Smart Districts Summit, where community and technology leaders will collaboratively explore how emerging technologies are being leveraged to address the most pressing district-level challenges.

College of Staten Island (CSI) Tech Incubator + Vizalytics: Data – A Driving Force of Innovation

Connect with us to discover how organizations and entrepreneurs are utilizing data to drive innovation within our local community. Learn the practices, technologies, and patterns the experts use to fuel their enterprises by way of big data.

March 9

Reaktor Open Data Studio

The goal of this evening is to share some ideas about how Open Data could be utilized in new ways, especially in New York. We have a happy hour with benchmarks from Helsinki, where open data catalogues have been advanced for a while, and companies and developers alike are used to creating cool applications for it.

Join us to hear examples of applying open data in a user-friendly way, and let’s come up with new ways to use open data to create new tools.

General Assembly Panel Discussion: Data and…Health

Big Data is continuing to significantly impact the way in which organizations operate and make informed business decisions. Emerging technologies are now paving the way to innovative medical developments, and it looks as though data is beginning to transform the entire healthcare industry! In collaboration with the first annual NYC Open Data Week, GA is bringing together influencers from the health and wellness spaces to discuss how data is impacting their organizations.

March 11

NYC Parks Computer Resource Centers Open Data for All: TreesCount! Workshop

This free workshop, presented by NYC Parks and the NYC Open Data team, offers a broad introduction to the NYC Open Data Portal along with the concept of data literacy and analysis.

Using NYC TreesCount! 2015 data, the most accurate map of NYC’s street trees ever created, you will learn how to identify, download, manipulate, and visualize NYC Open Data with a focus on community engagement and awareness. Using tools such as Google Sheets and CARTO, you will be able to create your own graphs and maps from NYC Open Data.

Habitat III: A Once-In-A-Generation Civic Experience

Photo: John Paul Farmer

It’s hard to catch your breath in Quito, Ecuador. Whether it’s the thin air of its 10,000 foot elevation, the natural beauty of its volcanic mountains, or the built beauty of its colonial-era architecture, Quito is a city that leaves you breathless.

Last month, 30,000 people came together in the scenic Ecuadorian capital to discuss the future of cities at Habitat III. Hosted by the United Nations, this once-every-20-years convening marked just the third of its kind, following in the footsteps of Habitat I in Vancouver in 1976 and Habitat II in Istanbul in 1996. UN-hosted World Urban Forums have been held every couple of years in recent decades, although none has reached the scale of Habitat.

At Habitat III, a wide range of individuals and organizations – including governments, companies, non-profits, and academic institutions – gathered to share best practices, to celebrate successes, and to approve a New Urban Agenda that marks the culmination of years of negotiations among United Nations member states.

Gatherings ranged from formal (including official delegate discussions in the National Theater) to participatory (such as the youth assemblies) to informal (like the lightning talks that electrified the expo hall). Some of the most interesting highlights were the following:

The Global Municipal Database – Lourdes German, Director of International and Institute-wide Initiatives at the Lincoln Institute, showcased a dashboard for cities that is built upon Microsoft technologies such as Azure, Power Map and Power BI. Working with cities in Africa, Asia, and Latin America, the Global Municipal Database tracks key fiscal indicators including expenditures, revenue, and borrowing and gives communities the tools to visualize the data and create actionable insights. What’s so powerful about these technologies is that many of their functionalities are Excel-based, meaning millions of people could use them tomorrow to make their cities more transparent and accountable, with no further training necessary.

Water and Resilience – It has been said that everyone has a water problem: either too polluted, too much, or not enough. For example, fully one-third of the Netherlands – a country built on its shipping and ports – lies below sea level. The country’s strength – water – is also its greatest vulnerability. With years of such experience living with water, the Netherlands was especially well qualified to host a conversation on the subject, which included viewpoints from Rotterdam and The Hague as well as a framework shared by 100 Resilient Cities’ Andy Salkin. One insight from The Hague was that resilience is not only physical, but must also be social and digital. Every aspect of a city must be able to bounce back. And while the cities of the Netherlands are especially advanced in learning how to live with water, most cities around the world are just getting started.

Public Spaces – Public spaces also played a key role, with planners asking whether placemaking will be at the heart of cities in the future. With a discussion of Eastern and Western traditions in terms of public spaces, the room erupted into a lively debate, during which an audience member noted that urban planners are increasingly using Microsoft’s Minecraft to engage people – particularly the young – in co-designing their own public spaces.

Housing – Housing was a major focus at Habitat III, for developed cities such as New York and for developing cities such as Lagos alike. With the majority of humanity living in cities for the first time in history, the influx of newcomers creates new stresses. Safe, accessible, and affordable housing is a priority.

Accessibility – A theme that was more woven into the conference experience than something explicitly called out was the need for more accessible communities. Microsoft is increasingly collaborating with cities to use technology to improve accessibility to services, information, and opportunity. “Eliminate the unnecessary barriers that limit our potential,” implored Dr. Victor Santiago Pineda of the University of California at Berkeley, who also served as co-chair for accessibility at Habitat III.

Youth – A particularly interesting aspect of Habitat III was the prevalence of young people everywhere you went. While most delegates were more senior, accomplished professionals, the conference grounds also teemed with young people of high school and college age. Many of those youth were local Ecuadorians engaging with this once-in-a-lifetime event that was on their home soil. Others were young people from around the world who journeyed to Quito to serve as agents of change. A middle-aged delegate at one youth-run session exclaimed, “I’ve been going to sessions back-to-back for two days and this is the first one that is participatory. I think we need more of this.”

After several incredible days in Quito, the big question on everyone’s lips was, “What happens next?” How does the New Urban Agenda get implemented? To what extent will cities be prioritized by the UN? What role will technology play in forging solutions to our hardest problems? Will upcoming World Urban Forums be effectively leveraged to ensure steady progress on such audacious goals? Will the assumptions and priorities of Habitat III stand the test of time? Only time will tell.

Habitat III brought together planners, policymakers, technologists, and young people who care about the future of cities. Technology was there and will be an increasingly ubiquitous part of our lives. These new cross-sector connections have the potential to pay dividends between now and Habitat IV in 2036 – but that potential requires action by us to be fulfilled.

Source: Habitat III

RECAP: White House Open Data Innovation Summit (#WHOpenData)

This week, we were honored to join and support the White House Open Data Innovation Summit. Leaders in open data, including U.S. Chief Information Officer Tony Scott and U.S. Chief Technology Officer Megan Smith, spent the summit championing the use of Federal open data across all sectors.

Read more about the growth of open data with the Center for Open Data Enterprise’s new Open Data Best Practices Report. You can also watch the livestream of the summit here.

Data science for safer streets: DataKind Vision Zero project expands to three new cities

Last August, Microsoft announced its partnership with DataKind to support the Vision Zero movement in the U.S., which aims to reduce traffic-related deaths and severe injuries to zero in cities around the world. Today Microsoft and DataKind said that San Jose, Seattle and New Orleans will join New York City as the lead cities working on this initiative.

“The DataKind Vision Zero project is a demonstration of the possibilities created by bringing diverse sources of data and expertise together,” writes Elizabeth L. Grossman, Microsoft Technology & Civic Engagement director of civic projects.

“New data science analyses, using a combination of public and private data, will be designed to help local decision makers identify and evaluate which engineering, education and enforcement interventions can most effectively address each city’s local efforts to increase traffic safety for all.”

Read more on Microsoft on the Issues.

DataViz for good: How to ethically communicate data in a visual manner: #RDFviz

Catherine D’Ignazio brainstorms around data inclusion

Last Friday I participated in my second Responsible Data Forum. Last year’s workshop on private sector data sharing (data philanthropy, if you like) inspired some of our thinking and collaborations over the past year, and this year’s event about data visualization for social impact did not disappoint. You can see what people posted at #RDFviz, on the wiki, and in a great collection of related resources here.

Mushon Zer-Aviv facilitates the Responsible Data Forum

At the top of the day, we did the classic Post-It note brainstorm to inventory all of the potential avenues for working groups. Given the incredible experience of the people in the room, there was a lot to work with. To give you a sense of the conversation and work coming out of this event, I’ve attempted to capture a sample of the questions and prompts the participants asked:

  • Non-screen data visualizations
    • Experiential data visualization, sonification, physical experiences, and installations
    • Data viz for the blind
    • Sand mandalas
    • Getting data offline
    • Translating data visualizations across various forms of media
    • Low-bandwidth visuals for inclusivity
  • Communicating uncertainty
    • How do we communicate uncertainty in data?
    • In metadata?
    • How do we represent gaps in the data?
    • What if our knowledge of the uncertainty in the data is anecdotal?
    • How can visuals show “no answer”?
    • How can data visualization promote ambiguity?
  • Literacy
    • How do we improve everyone’s data visualization literacy, as creators and as viewers?
    • How do we educate people about the data they create?
    • Which people / sectors / fields most need data literacy?
    • Can we provide interactive tools that let viewers adjust data visualizations in real time as a means of improving literacy?
    • How can we support grassroots groups to create better data visualization?
    • Is there a need for basic design principles and data viz 101 resources for human rights activists?
    • How do we navigate a fear of numbers?
  • Perspective
    • How do we visualize when there’s a dispute or a problem with the “facts”?
    • How do we show different perspectives on the same data?
    • How do we establish trust with our audience?
  • Data Visualization Theory (one of the less popular categories in this very practical group)
    • Let’s connect #RDFViz with the academic visualization community
    • How do we create a data visualization of data visualization?
    • Is data visualization abstracted thought?
  • Power and Data Visualization
    • Is persuasive data visualization
      • good?
      • bad?
      • necessary?
    • The relationship between big data and advocacy visualization
    • If we don’t amplify what we don’t know, visualization will amplify the most powerful voices
    • What does good adversarial data visualization look like?
  • BAD data viz
    • Is meaningless data visualization worth anything?
    • What about when people make decisions based on bad data viz?
    • If raw data is unrepresentative, will visualizations on it be bad?
    • We should collect examples of unethical data visualization
  • Data Visualization Tools
    • Let’s consider the limits of software and the tools we use
    • The trade-off between ease of use and privacy
    • Data visualization does not immediately create data storytelling
    • We should be more open about the true cost of doing a data visualization
    • We need tools that allow us to share our process as well as the data source and output
    • “Proprietary viz companies will die” vs. “Open source communities are Kafkaesque nightmares”
    • There’s a distinct lack of non-English data viz tools
    • What are some reasonable principles or guidelines to provide designers creating software tools for use by the general public and specialists?
    • Which types of interactivity are most useful in enhancing analytical inspiration?
  • Data Visualization Methodology
    • We should discuss methodologies when we discuss visualizing data
      • How do we choose what we visualize?
      • How do we represent data quality?
      • How do we visualize metadata?
    • What’s the lifespan of an infographic? Can we design continuously updated visuals, or include expiration dates for stale graphics?
    • How do we encourage consideration of ethics in the creation process of data visualizations?
  • Collaboration
    • Let’s connect the data producers and the visualizers with a tighter feedback loop. The producers will see how their data’s been applied in the world, and visualizers will get a better sense of the contours of the data.
    • How do we encourage more collaboration between human rights activists and data visualizers?
  • Engagement and Participation
  • Audience
    • How do we involve the audience?
      • Who is the audience, and why?
    • How do we create community ownership of a data viz?
    • How do we allow a data viz to speak to multiple disparate audiences?
  • Transparency and openness
    • Expose methodologies
    • Replicability of a data viz
    • Making the data viz process transparent
    • What assumptions are there in that data visualization?
    • How do design and aesthetic decisions bias a data viz?
  • Simplicity
    • How can we be succinct without over-simplifying the content?
    • Nuanced vs. bombastic
    • Can we build a language for the critique of data visualizations’ ethics?
    • Are there ethical ways to avoid nuance?
    • Presenting individual data points vs. an overview
  • Objectivity vs. subjectivity
    • Data as expression vs. data as fact
    • Is objectivity desired?
    • How do we use empathy without creating compassion fatigue?
    • The difference between invoking sympathy vs empathy
  • Honesty
    • When is a data viz most true?
    • When is a data viz most honest?
    • What about high-stakes data visualizations, like when there are life and death risks for participating subjects?
    • How do we incorporate criticism and critique into the visualization?
    • Data visualization is rooted in an Enlightenment fallacy that “the truth”, presented just so, will change things
  • Motivation and goals
  • Responsibility
    • Anonymizing data
    • Fact-checking data
    • Transparency vs. protection of subject
    • Marginal populations
    • Whose data is it, and is there consent?
    • Responsibly visualizing video / images
    • Does reliance on data de-humanize subjects?
    • How do we responsibly reduce complexity to convey points?
    • How do we make the creators of data visualizations
  • Culture
  • Risk & danger
  • The future…
    • Is visualization always stuck in the past?
    • Time travel strategies for slowing down time
    • Holodeck data visualization

A constellation of Post-its

This is only a partial list, as I wasn’t able to type quickly enough for the fast-moving Post-It notes. You can view the original Post-It constellation over here and keep up with the conversation and the creative outputs over at responsibledata.io.

Let Me Get That Data For You: The Bing-Powered Data Inventory Tool

Our friends at the US Open Data Institute work to make it easier for governments and others to open their data. One of the first things a government agency must do before launching an open data repository is conduct an inventory of the data it is already publishing, which gets everything in one place. This is a relatively minor step toward creating an open data policy and repository, but it still takes work. We were thrilled to learn that the US Open Data Institute team, including Waldo Jaquith, Ted Han, and Dan Schultz, used the Bing Search API to create a tool that radically streamlines the data inventory process.

It’s called Let Me Get That Data For You. All you have to do is enter a website URL, and it will search that domain for common data formats and return a machine-readable list for you to use. And then you’re one step closer to sharing your data with the world.

As of today, the service will return up to 2,000 datasets per search. This should cover many of the intended use cases, but if you’re working with an extreme case, you can head over to GitHub and run the open source code yourself.
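To make the idea concrete, here is a minimal sketch of the inventory step in Python. It is not the US Open Data Institute’s code and it does not call the Bing Search API. It only shows the two pieces described above: building one domain-restricted query per file format (using the `site:` and `filetype:` search operators) and turning a list of result URLs into a machine-readable inventory. The function names, the list of “common data formats,” and the `example.gov` domain are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical list of "common data formats"; the post does not say
# which formats the real tool looks for.
DATA_EXTENSIONS = ("csv", "json", "xml", "xls", "xlsx", "kml", "zip")


def build_queries(domain, extensions=DATA_EXTENSIONS):
    """Build one search query per file format, restricted to one domain."""
    return [f"site:{domain} filetype:{ext}" for ext in extensions]


def build_inventory(domain, result_urls, extensions=DATA_EXTENSIONS):
    """Filter search-result URLs down to data files hosted on the domain.

    Returns a list of {"url": ..., "format": ...} records, in the order
    the URLs were seen, with exact duplicate URLs dropped.
    """
    inventory = []
    seen = set()
    for url in result_urls:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        # Accept the domain itself and any of its subdomains.
        on_domain = host == domain or host.endswith("." + domain)
        # Take the text after the last dot in the path as the extension.
        path = parsed.path
        ext = path.rsplit(".", 1)[-1].lower() if "." in path else ""
        if on_domain and ext in extensions and url not in seen:
            seen.add(url)
            inventory.append({"url": url, "format": ext})
    return inventory
```

In practice you would feed each string from `build_queries` to whatever search service you have access to, collect the result URLs, and pass them to `build_inventory`. Serializing the returned list with `json.dumps` gives you a machine-readable inventory to start from.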

This Weekend, Hack Away With #CodeAcross and Open Data Day

This weekend, New York City is all about open data. On February 21 and 22, the New York tech community will be celebrating two benchmark events: #CodeAcross and International Open Data Day. Birthed from the same passion to realize the potential of open data and civic technology, this weekend’s events seek to initiate newcomers into the world of data and bring practitioners and community members together to create new value from our community’s data sets.

We’ll kick off the weekend at Microsoft’s Times Square headquarters in a conversation with New York City’s Chief Analytics Officer, Dr. Amen Ra Mashariki. BetaNYC’s Noel Hidalgo will deliver questions to Amen that have been crowdsourced from the community. Ask Amen your question here.

Then, on Saturday and Sunday, we’ll celebrate Open Data Day and CodeAcross at our partner Civic Hall. #CodeAcross NYC is an open event aimed at directing the incredible energy and talent of the tech community toward some of our community’s most pressing challenges and opportunities. The two-day festival brings together governments, community groups, academic organizations, and individuals passionate about data to create impactful solutions.

Cities around the world will band together to gain insights from datasets, make new applications, drive forward existing projects, and altogether use open data to show how sharing information can transform how governments operate and how society solves problems. Through the use of data platforms like SQL, HDInsight, Microsoft Azure, Excel, and Power BI, citizens can partake in civic engagement that promotes openness through all facets of government.

We’re proud that NYC has more open data sets than any other city. There is a plethora of data to be used for public good, particularly when you include BetaNYC’s community-maintained open data sets. With hackathons, unconference sessions, a workshop on mapping open data, and a series of themed challenges designed to improve the City of New York’s data and its usage, the next few days will be especially busy. Importantly, these events are open to all – technical and non-technical, governmental and non-governmental – and aimed toward teaching about open data, in addition to building new tools and apps. Realizing the impact of civic technology in general, and open data in particular, requires an all-hands-on-deck approach that will continue long past this weekend. We hope you’ll join us.

Register for #CodeAcross NYC at https://codeacrossnyc2015.eventbrite.com.

Want to work toward public good through civic tech past #CodeAcross? Here’s how to hack for a cause.