Well, it has been an incredibly rewarding summer working with the Microsoft Civic Tech & Innovation team! I’ve learned a lot about the tools and programs that people in New York City and elsewhere are building to address various civic challenges. As my fellowship wraps up, I wanted to share a bit about what I worked on:
For Civic Graph, I contributed over 900 lines of Python (and a bit of R) to the Microsoft-curated codebase. In addition to preparing API documentation (viewable here), I spearheaded a project to automate data collection for Civic Graph.
Why is this important?
Civic Graph offers a unique visualization of the civic technology ecosystem. It contains hundreds of nodes and tracks the types of connections between them. In its current state as a crowdsourced knowledge base, however, the application captures only a small subset of the actors and connections in the civic tech space. A few weeks into my fellowship, I asked, “How might we make Civic Graph a bit smarter?”
With this in mind, I developed four tools, which I like to think of as “building blocks,” to improve data quality (i.e., the accuracy and completeness of the stored data) and to automate parts of the data collection process. You can view everything I built in this repository.
I started by scraping the archives of Civicist and TechPresident, two core civic tech publications that together span 2004 to the present. Next, I analyzed the scraped content by extracting named entities (e.g., people, organizations, companies, places). I used the open-source library spaCy, which let me tag tokens by part of speech and categorize entities by named entity type. From Civicist and TechPresident alone, over 70,000 entities were extracted!
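A minimal sketch of the entity-extraction step, assuming spaCy's pretrained English pipeline `en_core_web_sm` (the post doesn't say which model was used) and assuming the scraped articles are already available as plain-text strings. Keeping only the `PERSON`, `ORG`, `GPE` and `LOC` labels is also an assumption, chosen to approximate "people, organizations, companies, places" (spaCy tags companies as `ORG`). The helper names `spacy_entity_pairs` and `tally_entities` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Assumed subset of spaCy's English NER labels: people, organizations
# (including companies), geopolitical entities, and other locations.
KEEP_LABELS = frozenset({"PERSON", "ORG", "GPE", "LOC"})


def spacy_entity_pairs(texts: Iterable[str],
                       model: str = "en_core_web_sm") -> Iterator[Tuple[str, str]]:
    """Yield (entity text, entity label) for every entity spaCy finds."""
    import spacy  # imported lazily so tally_entities() works without spaCy installed

    nlp = spacy.load(model)
    for doc in nlp.pipe(texts):  # nlp.pipe streams documents in batches
        for ent in doc.ents:
            yield ent.text, ent.label_


def tally_entities(pairs: Iterable[Tuple[str, str]],
                   keep_labels: frozenset = KEEP_LABELS) -> Counter:
    """Count entity mentions by (normalized text, label), keeping selected types."""
    counts: Counter = Counter()
    for text, label in pairs:
        if label in keep_labels:
            # Collapse runs of whitespace so "Civic  Hall" and "Civic Hall" match.
            counts[(" ".join(text.split()), label)] += 1
    return counts
```

Usage would look like `counts = tally_entities(spacy_entity_pairs(article_texts))`, where `article_texts` is a hypothetical list of scraped article bodies; the model must be downloaded first with `python -m spacy download en_core_web_sm`. `len(counts)` then gives the number of distinct entities, while `sum(counts.values())` gives the total number of mentions.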
To build a true (and useful) semantic graph from the natural-language text I had scraped, I needed a way to identify, extract, and categorize relationships between named entities. To do this, I implemented a supervised classification system in Python that labels tokenized text according to five initial categories: funding, data, employment, collaboration, and location. I chose these categories because they are the connection types currently represented in Civic Graph. I trained the classifiers using support vector machines (SVMs) and used the content scraped from Civicist and TechPresident as my test data. Finally, I designed a pipeline outlining how a future civic tech fellow could integrate my classification system with the existing Civic Graph.
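The post doesn't describe how the SVMs were set up, so the following is one plausible shape rather than the code in the repository: a from-scratch linear SVM trained with the Pegasos subgradient method, one binary model per category (one-vs-rest), over bag-of-words features. It uses only the standard library so it runs anywhere; in practice a library implementation such as scikit-learn's `LinearSVC` would be the usual choice. The function names (`featurize`, `train_one_vs_rest`, `predict`) and the hyperparameters `lam` and `epochs` are illustrative assumptions.

```python
import re
from collections import Counter
from typing import Dict, List, Sequence

# The five relationship categories named in the post.
CATEGORIES = ("funding", "data", "employment", "collaboration", "location")

Features = Dict[str, float]
Weights = Dict[str, float]


def featurize(text: str) -> Features:
    """Bag-of-words term counts from lowercased word tokens (no bias term)."""
    return dict(Counter(re.findall(r"[a-z0-9']+", text.lower())))


def dot(w: Weights, x: Features) -> float:
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary_svm(X: Sequence[Features], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 20) -> Weights:
    """Linear SVM via Pegasos subgradient steps on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>)), with labels y in {+1, -1}.
    Examples are visited in order (not sampled) so results are deterministic."""
    w: Weights = {}
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            violated = label * dot(w, x) < 1  # hinge loss is active
            # Regularization step: scale by (1 - eta * lam), which equals (1 - 1/t).
            scale = 1.0 - 1.0 / t
            for f in w:
                w[f] *= scale
            if violated:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * label * v
    return w


def train_one_vs_rest(texts: List[str], labels: List[str], **kw) -> Dict[str, Weights]:
    """Train one binary SVM per category (that category vs. all others)."""
    X = [featurize(t) for t in texts]
    return {
        cat: train_binary_svm(X, [1 if lbl == cat else -1 for lbl in labels], **kw)
        for cat in CATEGORIES
    }


def predict(models: Dict[str, Weights], text: str) -> str:
    """Pick the category whose SVM gives the highest decision score."""
    x = featurize(text)
    return max(models, key=lambda cat: dot(models[cat], x))
```

Training on a handful of hand-labeled sentences (hypothetical examples, not data from Civicist or TechPresident), such as ("The foundation awarded a grant to fund the nonprofit", "funding") and ("The city released an open data portal with datasets", "data"), and then calling `predict(models, "open data portal")` should return "data", because those words appear only in "data" examples. A real relationship extractor would also restrict features to the text around a pair of entities; this sketch classifies whole sentences.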
For Microsoft Translator, my primary role was to help with project development and organization.
Microsoft Translator is a machine learning-based speech-to-speech translation technology that enables two or more people speaking different languages to hold a conversation using a common device such as a smartphone or tablet.
Our team collaborated with the Microsoft Speech Translation Research and Product teams as well as with various providers of citizen services in New York City. We tested all features of the Microsoft Translator application with the Microsoft Translator Project Manager to explore the product’s functionality, identify optimal civic use cases, and provide useful feedback to the Product team. I also prepared memos outlining goals, stakeholders, and project timelines for upcoming pilots that are expected to run this fall.
Beyond these two projects, I supported the civic tech community through presentations and event organizing. I co-presented to 24 Chinese delegates visiting Civic Hall, speaking about the Microsoft Translator project and my experience as a summer fellow. Although the delegates brought a translator with them, I used the presentation as an opportunity to practice my spoken Mandarin!
Me (right) and John Paul Farmer presenting to the delegates.
Additionally, our team visited the Metropolitan Museum of Art to present to the Director of the Met MediaLab and some folks from the TED Fellows Program. We also presented our team’s current projects to the Regional Director for Southeast Asia of Microsoft CELA (Corporate External and Legal Affairs). Finally, together with 18F’s Aidan Feldman and civic tech fellow Briana Vecchione, I co-organized and hosted weekly “Hacker Hours” at Civic Hall: two-hour sessions where people at Civic Hall and members of the wider tech community can share what they are working on and get help with technical projects. Interested in attending? More information here!
The Civic Tech Fellowship has been a wonderful opportunity to participate in collaborative efforts between Microsoft and the City of New York, to work with talented and passionate people, to develop my technical skillset, and to observe how Microsoft is continuing to build a strong presence in the civic technology space. I am excited to follow the progress of Microsoft Translator, Civic Graph, and the other projects on which fellows around the country are working!