Q&A with Clay Shirky

Clay Shirky is one of those guys with a keen insight into technology and its impacts on culture. Last week we had the chance to sit down with him before he took the stage for the Microsoft Research Faculty Summit and Design Expo.

Below is an edited transcript and select audio from our talk, which touched on a range of issues, from designing for the internet of things to the theme of his keynote, making big data useful.


STEVE: How do we make data useful for human beings? Instinctively, there seems to be a connection between this concept of making data useful and the change in how device or user experiences are designed.

CLAY: You know, there’s a really interesting example of this. A guy from the UK by the name of Richard Pope designed what he calls the bicycle barometer, a device that combines publicly available weather data with all of the data from the Tube to give you a simple recommendation on whether to ride your bike to work or take the London Underground.

This is a device you couldn’t really imagine a few years ago. It’s battery-operated, it’s wireless, and it’s in constant contact with all of the rest of the network that it needs to be – just to generate this one decision about whether you take your bike or the subway.

It’s not like I have to open my phone and say “Now please go check the weather.” The devices are there waiting to consume information on my behalf. And now, whether in the background or by giving me this sort of simple readout, I’m getting information without having to go ask for it. And I’m getting it because my devices are getting better at essentially acting on my behalf.
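The decision logic Shirky describes could be sketched like this. This is a toy Python example, not Pope's actual implementation – the input fields and the weights are hypothetical, standing in for whatever the real device pulls from the weather and Tube feeds:

```python
def barometer_score(rain_chance, temp_c, tube_delay_minutes):
    """Toy 'bicycle barometer' score: positive favors the bike,
    negative favors the Tube. Weights are illustrative only.

    rain_chance: probability of rain, 0.0-1.0, from a weather feed
    temp_c: forecast temperature in Celsius
    tube_delay_minutes: reported delay on the relevant Tube line
    """
    score = 0.0
    score -= 4.0 * rain_chance          # rain pushes toward the Tube
    score += 0.1 * (temp_c - 5)         # warmer weather favors the bike
    score += 0.2 * tube_delay_minutes   # Tube delays favor the bike
    return score


def recommend(rain_chance, temp_c, tube_delay_minutes):
    """Collapse the score into the single readout the device shows."""
    if barometer_score(rain_chance, temp_c, tube_delay_minutes) > 0:
        return "bike"
    return "tube"
```

The interesting design move is exactly the one Shirky points at: the device consumes two live data streams in the background and surfaces only the one-bit answer.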

So suddenly you have designers saying, hey, I can make this device treat the web – treat the Internet – as a data store. And the Internet people are thinking, hey, if we have a device that any computer can talk to, it can be as tiny as an Arduino. It doesn’t have to be a thousand-dollar laptop; it could be a $50 Raspberry Pi.

And especially the small, cheap devices – those are the things we carry around in our pockets and strap to our bodies all the time – I think the rise of that part of the ecosystem is turning out to be really interesting.

STEVE: I hear you talking a lot about the hardware capabilities. How does this change the design principles?

CLAY: A lot of the ecosystem experience really comes down to understanding where there’s data out there that I’m going to use in the background versus data that I’m going to use in the foreground, right?

Whenever a new technology comes along, everybody wants to show the inside, to do a visualization that lets you see all of this data. I don’t have time to see all of the data. What I need is some kind of compressed sense of what’s going on.

So the Design Expo team from NYU had this idea to show people what they look like to data miners, right? We all have the sense, now heightened in the age of the NSA revelations, that large datasets are being built up about us and that people are looking at those patterns and making decisions, like whether to give you a credit card, or to ask you out on a date, or to hire you or whatever – the whole range.

We’re the ones producing all the data but we’re doing it a bit at a time. What the NYU team said was that you don’t have anything in this ecosystem that acts like a mirror – that shows you the way you look.

So they worked on a project called MINE that would really give you that sense, and what they’ve really wrestled with is what the interface should look like that tells people how they’re coming across in a way that’s useful. One of the things they learned quickly was that exposing all the data is a catastrophe: it confuses people and it’s too much.

What they’ve come down to is essentially a set of characteristics like employability, conviviality, whatever, and then they give you pluses and minuses. They essentially say, you know, if you’re showing yourself checking into a bar every night, 20 days in a row, conviviality goes up, and maybe your health insurance premiums take a hit.

They had to get it down to some middle zone where they could say “If you do X, more of Y is going to happen and less of Z is going to happen,” where Y and Z are things I care about – like people’s sense of my employability or my trustworthiness.
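That “middle zone” – raw events in, a handful of human-readable pluses and minuses out – could be sketched like this. The action names, traits, and weights here are purely illustrative; the real MINE project’s categories and scoring aren’t described in the interview:

```python
from collections import defaultdict

# Hypothetical rules: each observable action nudges a few
# human-readable characteristics up or down.
RULES = {
    "bar_checkin":     {"conviviality": +1, "health_profile": -1},
    "gym_checkin":     {"health_profile": +1},
    "late_night_post": {"employability": -1},
}


def mirror(events):
    """Aggregate raw events into the pluses-and-minuses summary
    instead of exposing every data point."""
    scores = defaultdict(int)
    for event in events:
        for trait, delta in RULES.get(event, {}).items():
            scores[trait] += delta
    return dict(scores)
```

The point of the compression is the one the NYU team landed on: show “if you do X, more of Y happens and less of Z happens,” not the raw event log.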

We hear about big data services as civilians but we don’t often interact with them. Much of the big data right now is essentially still in the background. Figuring out how to expose big data in a way that is valuable to individuals is, I think, a huge and interesting design challenge, and it’s now sort of starting to show up in this environment.

STEVE: What’s the latest book you’ve read that stands out?

CLAY: Yeah, so I recently read Dan Sperber’s book — he’s an anthropologist — called “Explaining Culture.” “Culture” is one of those words that we use in all kinds of vague ways, but the hard question at the base of culture is why groups of people behave the same way over long periods of time but across only short geographic distances, right? You can have two really different cultures right next to each other that don’t blend and turn into some kind of gray version of the combination of the two, and that stay stable over generations.

What Sperber says is that way down at the bottom, away from thinking about piano concertos and sculptures and sort of all of these kind of cultural manifestations we see, there are really only two operations: I can take something in my head and try and manifest it in the world. I can talk, I can write, I can paint, I can dance, whatever. And there are things that I can see in the world that I can make a mental representation of. So there’s this back and forth between human minds and the outside world. I can make these gestures that other people can see, other people can make these gestures that I can see.

So we’re doing all of this tracing of very large scale social graphs, right? We’re looking at the Facebook social graph, we’re looking at the Twitter social graph, we’re looking at all this material flowing through it.

It’s really easy to trace a piece of information that went from here to there. We saw the digits move across the wire and so we know who looked at what when, when it ended up in your e-mail box or whatever. But we’re not looking at the mental representations. What are people making of this material? What does it mean that a “lolcat” spreads?

When things go viral, everybody is focusing on how many people saw it. But there is this really deep, complicated question of how many people internalized it, right?

And I think that a lot of the work that’s been done on the social graph and on virality can now be joined up with this idea of what effect is this having not just on the way individuals are thinking but in the way that groups work together, because we have all of these shared representations. And that’s a big vague question. It still needs to be tightened up.

Back to the theme of big data, we have so many of these large social datasets, but when we look at them as time signatures, we’re mainly looking at the flow of information. We’re not looking at changes in people’s behavior before and after they saw the information. If you could say this movie, this piece of writing, this visual representation, this game spread through this community – and afterwards see how the community’s behavior changed – that’s a really hard set of variables to isolate, but if you could do it, it would be a really interesting piece of research.
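The shape of the comparison Shirky gestures at – a behavior metric for a community, split at the moment a piece of content reached it – could be sketched like this. This is not a method he proposes in detail; the function and field names are hypothetical:

```python
from statistics import mean

def behavior_shift(activity, exposure_time):
    """Compare a community's average behavior metric before and
    after a piece of content spread through it.

    activity: list of (timestamp, metric_value) pairs
    exposure_time: when the content reached the community

    Returns the change in the mean metric, or None if there is
    no data on one side of the split.
    """
    before = [v for t, v in activity if t < exposure_time]
    after = [v for t, v in activity if t >= exposure_time]
    if not before or not after:
        return None  # can't compare without data on both sides
    return mean(after) - mean(before)
```

The hard part, as Shirky says, is isolating the variables: a simple before/after difference like this can’t by itself separate the content’s effect from everything else that changed at the same time.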

STEVE: Which bleeds over into machine learning.

CLAY: Yeah, right, exactly, picking up those things. The thing about machine learning is that it’s getting incredibly sophisticated at picking up stable signals, but one of the things we’re not very good at yet is figuring out how human behavior, which is variable and crazy, can be made tractable enough for machine learning algorithms to actually get some purchase on it.

And the answer is not to assume that human beings have sensible, comprehensible behaviors, because we don’t. The answer is to build up datasets that are clear enough that the machines can actually learn something and tell us something useful about our own behavior.