DataKind Recaps Machine Eatable: Interrogating Algorithms

This post was originally published on DataKind by Sara-Jayne Terp, Data Scientist at Thoughtworks.

In October, I attended the first-ever Machine Eatable, a new lunchtime series by DataKind and Microsoft Technology & Civic Engagement, held at Civic Hall New York.

The inaugural event’s “un-panel” of Cathy O’Neil, Meredith Broussard and Solon Barocas spoke about interrogating algorithms, and the Machine Eatable team recorded the conversation. My key takeaways are below. If you missed the event or aren’t located in New York, you can listen to the full podcast below, and be sure to check out the next event in November!

What is an algorithm?

Sketchnotes by Jonny Goldsten

An algorithm or model is a set of steps that you, or a machine, follow to solve a problem or answer a question. For example, you use an algorithm to get ready every morning: get up, take a shower, get dressed, eat breakfast. Machine algorithms are everywhere and are increasingly used to help people make significant decisions, like determining who to hire for a job, who to accept into a university program or who to go on a date with.

While algorithms are transforming our world by automating decision making, they are not the impartial, precise mechanisms you might think they are. Algorithms and the data they depend on are created by people, and people are biased and make mistakes. People on the wrong side of the digital divide tend to be most affected by this, yet have the fewest tools to fight it.

And this is why we need to interrogate algorithms. We need to know what they’re considering when they make a decision, and whether those decisions are well-justified, substantiated, well-oriented and reasoned, or sloppy, unfair and discriminatory. This ultimately comes down to ensuring that algorithms are fair, transparent and accountable.

Can we make algorithms that are better and less biased than people are?


Fairness should be formalised and built into algorithms. We need an equivalent of civil rights laws for algorithms because, right now, data is used unfairly without taking account of broader structural inequalities. Algorithms are developed using data from humans’ biased behavior, and they learn the same biases that those humans exhibit.

Algorithms are reproducing inequalities that affect people based on race or gender. For example, algorithms used to determine a person’s credit score may not explicitly use race in their decision making, but may inadvertently use characteristics like zip codes that are proxies for race. Algorithms are also being used in the justice system to determine sentencing, recommending longer sentences for someone because of their history of offenses or even their address. But this doesn’t take into account other societal factors that lead to recidivism or how a person might change his or her behavior. These structural biases should be changed, but we should also be careful not to build them into our algorithms.
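To make the proxy problem concrete, here is a minimal Python sketch of one way to check whether a feature such as a zip code stands in for a protected attribute. Everything in it is an assumption for illustration: the function name `proxy_strength`, the field names and the toy records are made up, and this is not a method the panel described. It simply asks how often you could guess a person's group from the feature alone, compared with always guessing the most common group.

```python
from collections import Counter, defaultdict


def proxy_strength(records, feature, protected):
    """Return (baseline, accuracy) for guessing `protected` from `feature`.

    baseline: accuracy of always guessing the most common group overall.
    accuracy: accuracy of guessing the most common group within each
              feature value. A large gap suggests the feature is a proxy.
    """
    overall = Counter(r[protected] for r in records)
    baseline = overall.most_common(1)[0][1] / len(records)

    # Count group membership separately for each value of the feature.
    by_value = defaultdict(Counter)
    for r in records:
        by_value[r[feature]][r[protected]] += 1

    # Within each feature value, the best guess is the majority group.
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return baseline, correct / len(records)


# Hypothetical toy data: two made-up zip codes and two made-up groups.
applicants = [
    {"zip": "zip_1", "group": "A"},
    {"zip": "zip_1", "group": "A"},
    {"zip": "zip_1", "group": "A"},
    {"zip": "zip_1", "group": "B"},
    {"zip": "zip_2", "group": "B"},
    {"zip": "zip_2", "group": "B"},
    {"zip": "zip_2", "group": "B"},
    {"zip": "zip_2", "group": "A"},
]

baseline, accuracy = proxy_strength(applicants, "zip", "group")
print(f"baseline={baseline:.2f}, using zip={accuracy:.2f}")
```

In this toy data the zip code lifts the guess from 50% to 75% correct, so a model that never sees the group label could still treat the two groups differently by leaning on the zip code.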

Shining a Light on the Black Box

"Every worker should have the right to know how they're being evaluated." @mathbabedotorg #MachineEatable

— Jeanne Brooks (@jmfbrooks) October 22, 2015

Algorithms should be transparent: non-experts should be able to understand what they do and the biases that they introduce, through plain-language explanations and visuals (e.g. flow diagrams). Transparency can force algorithm choices. For example, credit companies use decision trees because they have to explain why people are denied credit cards.
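As a rough illustration of why a decision tree is easy to explain, here is a tiny hand-written one in Python. The rules, thresholds and field names are entirely hypothetical and are not any real lender's criteria. The point is only that every path through the tree ends in a reason a person can read.

```python
def credit_decision(applicant):
    """Return (approved, reason) from a tiny, hypothetical decision tree."""
    # First split: payment history (threshold is an assumption).
    if applicant["missed_payments"] > 2:
        return False, "more than 2 missed payments"
    # Second split: debt relative to income (threshold is an assumption).
    if applicant["debt_to_income"] > 0.4:
        return False, "debt-to-income ratio above 0.4"
    return True, "passed all checks"


# Hypothetical applicant, for illustration only.
approved, reason = credit_decision({"missed_payments": 0, "debt_to_income": 0.55})
print(approved, "-", reason)
```

Because the reason comes straight from the branch that fired, the person who is denied can see which fact about them drove the decision and check whether that fact is correct.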

For algorithms (e.g. Google’s object recognition for photos) whose workings are more complex than rules produced by humans, interrogating algorithm outputs may be more practical than reading code. For instance, the Value Added Model creates a teacher score by comparing student test scores against expected scores. Teachers can’t interpret their scores because each score depends on every child in the school, and teachers of poor kids have highly variable scores (the kids’ own test scores vary more). A better approach would be to give teachers an app to examine their scores, confirm the underlying data and understand their score through sensitivity analysis (for class size, different schools, etc.).
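The kind of sensitivity analysis described above can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the actual Value Added Model: it assumes the teacher score is just the average gap between actual and expected test scores, and the classroom data and function names are invented. It then recomputes the score with each student left out, to show how far one child can move the result.

```python
from statistics import mean


def value_added(students):
    """Simplified score: average of (actual - expected) test scores."""
    return mean(s["actual"] - s["expected"] for s in students)


def leave_one_out_range(students):
    """Lowest and highest score obtained by dropping one student at a time."""
    scores = [
        value_added(students[:i] + students[i + 1:])
        for i in range(len(students))
    ]
    return min(scores), max(scores)


# Hypothetical classroom of four students.
classroom = [
    {"actual": 70, "expected": 65},
    {"actual": 55, "expected": 60},
    {"actual": 80, "expected": 70},
    {"actual": 40, "expected": 60},
]

score = value_added(classroom)
low, high = leave_one_out_range(classroom)
print(f"score={score:.2f}, leave-one-out range=({low:.2f}, {high:.2f})")
```

In this toy class, removing a single student swings the score from clearly negative to clearly positive. A swing like that is a signal that the number says more about who happened to be in the room than about the teacher.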

Data Scientists Must Facilitate the Conversation

#MachineEatable is kicking off at @CivicHall with @mstem & @jmfbrooks introducing our speakers! pic.twitter.com/0bmVcACsx6

— DataKind (@DataKind) October 22, 2015

Part of our job as data scientists is to facilitate this conversation about algorithms and accountability for things that can change the course of someone’s life, like access to credit, jobs, and freedom. It’s difficult to identify who is responsible for an algorithm, but we should do it. It’s also difficult to make government accountable because politics is everywhere; for example, the US Senate shut down a CDC study showing correlations between public health and school shootings. We should nevertheless work to increase data literacy and remove the government’s excuse of “people don’t want to know that data.”

Please join in! Listen to the full conversation from the event and jump in with your own thoughts and questions at #MachineEatable on Twitter.