Recently, data strategist Max Shron of Polynumeral spoke to the NYC Data Science Meetup about techniques for improving focus, communication, and results in data science campaigns, including the CoNVO process detailed in his book Thinking with Data. It was a terrific and extremely practical talk, a video of which is embedded below. (If you’re pressed for time, you can use our time-stamped summary to skip to specific sections of the talk.)
Here’s the summary of Max’s talk, with video, slides, and the full transcript below:
- Introduction by K Young, CEO of Mortar Data (0:10)
- CoNVO (8:21)
- Context & Need (10:30)
- Vision (12:06)
- Outcome (13:30)
- Kitchen Sink Interrogation (16:49)
- Outcome (20:53)
- Arguments (21:55)
- Claims (28:54)
- Evidence (30:29)
- Rebuttals (33:06)
- Causation (35:46)
- Categories of Dispute (39:06)
- Disputes of Facts (39:24)
- Disputes of Definition (41:20)
- Disputes of Values (44:39)
- Disputes of Policy (46:31)
- Conclusion (51:29)
- Q&A (54:00)
Max Shron: Thinking with Data from Mortar Data
Introduction by K Young
(0:05) K: Hi, everyone. We have a really exciting speaker tonight, Max Shron. My name is K. I’m the CEO of Mortar, and we’re the folks that coordinate this meetup every couple of months. What we do at Mortar is try to help our customers — data scientists and engineers who work with data, especially in large volumes — focus just on the problems that are unique to their business. That means not building infrastructure, setting it up, and monitoring it, and not building complicated algorithms and that sort of thing. The way I came to know Max is that we work with him, and now with the company he founded, Polynumeral, particularly on our difficult data strategy challenges. So we’ve been working with Max for a while, and I know it’s going to be a great talk. He’s been part of the New York data science scene for about five years and is a recently published author. He wrote Thinking With Data and has a couple of copies with him here that are available for purchase and autographing if you’re interested. Enjoy the talk.
(1:22) Max: Thanks, K! I don’t know if the microphone thing is on; I’m kind of loud anyway, so if you can’t hear me, let me know. And if I get talking too quickly or anything, just go like this [gestures] and I’ll slow down. I’m actually from New York, so my style of speech is in keeping with the speed of the city, how about that?
(1:43) So my name is Max Shron. My company is called Polynumeral. We work with lots of organizations that are trying to do things with data that they can’t do right now but want to. We do a mix of strategy work and actual implementation. We’ve done some projects with the World Bank — I’ll be talking about one of those partway through. We’ve worked with New York Public Radio and a number of start-ups here in the city, like DonorsChoose.org and Warby Parker, and the kinds of problems we tend to tackle are the kind where there isn’t a nice out-of-the-box solution. So what I want to talk to you about today are some of the techniques and ideas out there that I think make it a lot easier to solve challenging data problems.
(2:24) The first thing I want to say: the genesis for this book I ended up writing, Thinking With Data, was a talk I gave last year called How to Kill Your Grandmother With Statistics — a little five-minute night talk, all about how, if you treat statistics — or data science things, machine learning things — as a magic box, a black box where numbers come in and truth comes out, it’s very easy to be taken advantage of. In that case, it was very easy for drug companies to mislead regulators because they treated statistics as magic, right? I put in my numbers, I do my statistics, and out comes a yes or no answer, and I’m done. And of course any of us here who work with data, who work with statistics, know that’s just not the way it works. There’s a lot of subtlety and a lot of nuance, and how we learn to manage that subtlety and nuance, I think, is really interesting. We’re also not the first people to tackle this kind of problem. I went rooting around in lots of other disciplines and tried to understand how people in the world of design, and people in the humanities and social sciences, deal with data in a day-to-day kind of way, or deal with problems that are underspecified, hard to understand, and difficult to tackle. And I think that if you work in data, you’ll recognize that a lot of these things are as common in your work as in other places.
(3:48) So, in much the same way that data science did not invent statistics, data science did not invent machine learning or computer science but found harmonious ways to use these things together to actually accomplish things, I think we can learn a lot from other disciplines that can help us make things make more sense. I think one of the biggest challenges we face as data scientists is that it’s really easy to go down into a data rabbit hole. If you’ve ever played with data or worked on a project, it’s incredibly easy to get some data, pull it up in R or Python or even Excel and sit there and play with it and play with it and play with it until you feel like you can’t do any more and you fall asleep. It’s days and days and days without food or drink or anything, just playing with data — at least that’s how I feel. And you know, it’s really easy to do that and end up someplace totally wrong. It’s absolutely very easy to spend tons of time working on a data challenge, only to find yourself having created something that’s actually not that interesting or useful.
(4:48) Designers have had to deal with this for a long time, and I think that if we try to do this from scratch, we’re going to find that, really, we’re missing out on a lot of great things that have been invented before. There’s another issue: data science work in general is hard to communicate to people who are not already experts — to explain to them what we mean when we say a predictive model, to get examples, to elicit ideas from them, right? If you work with data, you almost never know everything coming in. You’re more of a horizontal specialist than a vertical specialist. Your job is to tackle someone else’s problem with your skills, but if you don’t understand their problem, it’s really hard to get someplace good. So, we have a lot of specialized knowledge, and it’s hard to communicate back and forth with people. Also, not all statistics is actually well-adapted to use in the real world. There are quite a lot of things that you might cover in an Intro Stats class or even in advanced stats classes. I’ve talked to lots of folks with PhDs who find that the kinds of questions they learned to ask in a classroom don’t necessarily translate well to business problems or problems that nonprofits face. In fact, sometimes even people who have done really great applied work but have always been given clean datasets find that their challenges are quite different. So, how do we learn to bridge those things? We need techniques that will let us handle these kinds of challenges — going down the rabbit hole, struggling to communicate with people, making sure we’re using the right tools. And I think we actually have quite a lot to learn from people in the design world, in the humanities, and in the social sciences.
And so I’m going to go through a couple of big models that you can fit in your brain that I have found, personally, very useful for my own work and working with clients and understanding how to scope and answer data questions in the right kind of way. So there’s some technical material in this talk. I’m going to presume that if I talk about predictive models and I give some examples that people are fine with that — I know it’s a data science meetup, but if you don’t have a strong technical background, I think you should be able to follow along just fine.
(7:05) This small data scientist here is writing something down, and I think one of the most useful skills — I remember hearing this always growing up, “if you want to be an engineer, you have to be a good writer.” Engineers are always good writers; it’s not just the math, it’s writing. But of course the curricula at universities don’t reflect that, right? They reflect the 50 math classes and 20 physics classes and all kinds of EE things, whatever you need to get a degree in Engineering. But in actuality, writing and understanding how to express your ideas I have found incredibly useful. What I’m going to talk about is a set of techniques for figuring out what to write down. What are the things you want to have as clear as possible in your head before you start playing with data and going down that rabbit hole?
(7:52) So, part of the issue, of course — and this is where the design stuff comes in — is that the world really gives us very vague requests. If you are a practicing data person — building infrastructure, answering questions, or just writing reports, whatever it is you do — almost certainly the question someone first comes to you with is not the question you end up answering in the end. The more we can do to circumvent that and get to the right thing, the better off we are. So let me get to the meat of the stuff.
(8:21) I’m going to go through what I think is an example of a bad scope first; if you just sat down working on a problem and said the first thing that came to mind, it’s often not the right thing, right? So, imagine you’re working with a company — maybe you’re a consultant, maybe you work there full-time — and they have some sort of subscription business, and the CEO says, “I need a churn model. I need to know the chances that somebody is going to quit subscribing.” That’s what they want to know. So, a very common answer is: I’m going to use R to create a logistic regression to predict who will quit. And I’m pretty sure if you go to almost any tech-focused meetup, the discussion tends to be around what is the best X and Y, for X the tool and Y the math. But actually I think this is the least important part of this problem. It should be taken for granted that if you are a data person working for a company, you already can do these parts, right? By the time you get to this bit, you already know how to do that — the question is, what else is there? How do you actually know the right thing to do and how it fits into the larger question? A scope like this isn’t actionable — a lot of it is irrelevant detail. Like, what do I do with this? I finished making the model, and then what? How will it actually fit into the bigger scheme of things?
(9:41) What I’m going to talk about is a four-step process that I culled partly from the design world and partly from the consulting world, which I find really useful and which is outlined more heavily in Thinking With Data, called CoNVO [slide at 9:54]. CoNVO stands for Context, Need, Vision, and Outcome. These are the four things that you should really have clear in your head by the time you start working with data. They don’t have to be crystal clear; it’s an iterative process. You should have some idea — at least enough to say out loud or write down — of what you know about each of these steps. And if you find that one of them sounds vague — because after doing this for a while, you develop a nose for what a very vague scope looks like — you have to dig deeper into that one. So, I’m going to go through examples of what I’m talking about, and you’ll see what I mean.
Context & Need
(10:30) So, the context — CoNVO: Context, Need, Vision, Outcome — the Context. The first thing is: who are we working with, and what are their big-picture, long-term goals? For this first example — this is pretty similar — the company has got a subscription model, and they’re interested in improving profitability. Well, that’s interesting. You could imagine building a churn model in a case where someone’s trying to get as many users as possible. You could imagine building a churn model where somebody is trying to just increase revenue as much as possible. Those are actually two different goals, and the kinds of features, the kinds of things you’re going to end up thinking about, are going to change depending on the ultimate goals of the organization. So, Context; then Need — what’s the particular knowledge that we’re missing? When it comes to working with data, your job — my job, anyway, and probably your job — is to produce knowledge: either knowledge for a person or knowledge for an algorithm, some representation of the world that something can act on, some way of translating a lot of vague, messy numbers out there into something that you can rely on. So, what is that actual thing that we’re missing? In this case, we want to understand who drops off, early enough that we can intervene. Already, if you go back, we were looking at creating just a logistic regression — that doesn’t say anything about the timeframe. Finding out two seconds before that somebody is about to quit is not helpful, right? Figuring out that the need is actually to understand early enough that we can do something makes a really big difference. Context, Need, Vision — I think this is where quite a lot of the work gets done once you identify the need.
(12:06) And the vision is: what would it look like to have finished? What will it look like when I’m done? The example here is we’re going to build a predictive model. Notice we didn’t say anything about logistic regression or random forests or anything else like that. With this predictive model, we’re going to use behavioral data — so the key thing here is we’re not just going to say, okay, based on the person’s zip code, when do we think they’re going to quit? That’s probably not going to be helpful. We say behavioral data because we’re realizing, at this stage, before we’ve done any analysis, that the data that’s going to matter is someone using something. If we want to know whether or not they’re going to stop subscribing: are they opening the emails? Are they logging into the service? All of these things we would maybe have gotten to eventually, but thinking up front about what we’re actually trying to do here, and having that clear, makes it a lot easier. So, the source of data — behavioral stuff — is important. The fact that we want to predict early enough means maybe we should be investigating the kinds of interventions we actually can do. If the problem is that they don’t open their emails, sending them another email with the same kind of subject heading is not going to work. If we don’t know yet how we’re going to intervene, it’s really hard to know how we should build our model. Maybe we realize that we have another channel to send people offers, or that we have, in reserve, some different kinds of headers. Or someone is on our service and hasn’t logged in for a while, and — if they’re high enough value — maybe a human being calls them and asks if there’s an offer that would get them to use it more often. So, figuring out your levers makes a really big difference to the kind of model you’re going to build, which you might not have realized if you went straight for it.
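As a rough sketch of the kind of model this vision describes — a churn predictor built on behavioral signals like email opens and logins rather than demographics — here is a minimal, hypothetical example. All data, feature names, and numbers are invented for illustration; this is not the model from the talk:

```python
# Hypothetical sketch: rank subscribers by churn risk using
# behavioral features (email opens, logins). Data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Behavioral features: activity counts over the last 30 days
opens = rng.poisson(5, n)    # emails opened
logins = rng.poisson(3, n)   # service logins
# Synthetic ground truth: churn is more likely when engagement is low
p_churn = 1 / (1 + np.exp(0.5 * (opens + logins) - 3))
churned = rng.random(n) < p_churn

X = np.column_stack([opens, logins])
X_train, X_test, y_train, y_test = train_test_split(
    X, churned, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
# Scores, not labels: since the need is to intervene early, we rank
# users by predicted churn risk rather than thresholding at 0.5.
risk = model.predict_proba(X_test)[:, 1]
```

The point of the sketch is the scoping, not the math: the features are behavioral because of the vision, and the output is a risk ranking because the need is early intervention.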
(13:30) Okay, so the context, need, vision, and then the outcome. Now this, I think, is the thing that is most often neglected when somebody starts working on a problem. Outcome — who is actually responsible for what happens next? I built a model, my R was great. Now what? How does it actually connect back into what happens on a day-to-day basis in the organization? In this case, we’re saying there’s a separate team — that’s important — there’s a separate team that’s going to actually implement this model. It’s going to run every day, and it’s going to send out email offers. What’s going to happen when this thing is done? It’s going to have a big effect on the kind of model we build, the kind of data we pull, and how much time we think it makes sense to spend on this.
(14:34) And then the second question of the outcome: how do we know if we’re correct? You could build this model and do great in cross-validation, but cross-validation is not the same thing as success. So, how do we actually know the model is working? Well, we can say that we’re going to have success calculated weekly on held-out users. We realize we need a control group to do this properly, to actually make this work. We need people who we think will quit, and we do nothing — and we need to make sure that’s okay with the business. We need to figure out how we keep track of these things enough to see what’s confounding our experiment. One of the things that I’ve found in general, apart from CoNVO, is that a lot of what doesn’t get taught in data science curricula or in stats curricula is that even after you’ve done something that’s not experimental, there often still is an experimental step to understanding whether it’s working or not. And seeing that as part of the bigger picture, I think, really helps focus the work that we do.
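A minimal sketch of the weekly check this outcome describes — comparing churn among users who got the intervention against a held-out control group that got nothing. The function name and all counts are invented for illustration:

```python
# Hypothetical weekly success check: did the intervention reduce churn
# relative to a held-out control group? All counts are invented.
def weekly_report(treated_churned, treated_total,
                  control_churned, control_total):
    """Return churn rates for treated and control users, plus lift."""
    treated_rate = treated_churned / treated_total
    control_rate = control_churned / control_total
    return {
        "treated_rate": treated_rate,
        "control_rate": control_rate,
        # Negative lift means the intervention reduced churn this week
        "lift": treated_rate - control_rate,
    }

report = weekly_report(treated_churned=42, treated_total=1000,
                       control_churned=61, control_total=1000)
```

The arithmetic is trivial; the hard part, as Max notes, is the design — committing to a do-nothing control group and getting the business to accept it.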
(15:32) So, a great question, of course, is “well, Max, that was a great CoNVO — how did you develop that?” Well, in this case, I made it up, but in general I think the folks in the design world have done a great job of figuring out some of the different techniques out there that we can use to get our scope really clear. And of course part of it is: write down what you think and share it with people. Get feedback and say, “here’s what I’m planning to do; does this make sense?” I’m going to talk about another method for that in a moment, but you know, just literally talking to people, thinking for a second, roleplaying that you’re a user of the service who is going about their day, thinking about personas — well, I’m a user who has been here for a couple of years; how might these offers actually affect me? If I’m already ignoring the email, will I notice it? If I have a long-term relationship, am I going to be paying attention? Thinking through some personas and roleplaying from the perspective of — in this case, the person who is going to get the emails, or, if we’re building a tool for someone, the person who’s going to click around and solve some problem — the user-persona method is a really powerful way to understand the kinds of problems we’re trying to solve.
Kitchen Sink Interrogation
(16:49) So there’s something I call kitchen sink interrogation, just literally asking every question that comes to mind. Let’s say I’m going to build a model — when do people show up? When do I measure when they show up, how am I going to measure the results, are there more people on a Monday than a Tuesday, how long do people stick around before they leave? Just brainstorming as many possible questions as you can. Most of them are not going to be relevant, but brainstorming those kinds of things, giving yourself time to actually focus on the question-generation before you start X’ing things out is a really useful way to seed your brain for putting these kinds of things together.
(17:28) And finally, one of the things that I find to be the absolutely most useful tool out there — and that’s mockups. This is another thing that the design world is crazy about, and I think every time I see a data scientist or statistician or a data mining person or machine learning person who doesn’t draw a fake graph or make up some fake sentence before they actually go ahead and do their work, they’re wasting a huge opportunity. Just the process of writing down the axes on a graph — even if it’s actually a thing that’s going to require 40 steps to calculate, and even if we can never make this graph because it’s a 100-dimensional problem — just writing down some of the labels on these graphs, having some sense of how these things should relate, getting some sense beforehand of what we’re talking about, I find to be incredibly useful.
(18:21) Here’s actually a picture I took at Polynumeral. We’ve got a ginger guy who wanted to know exactly what to work on for this one problem, and we sat down and said, “Okay, what would it look like if we drew the graph for this? What would it look like to be done?” And so we drew some graphs and said, well, here’s the stuff for year one and year two. We expect it to be mostly the same; there might be some outliers. If something’s up here, we know that it’s growing in a certain way. This process of drawing something out I find really useful — or getting a sentence. Just yesterday, I was talking to a potential client, and we were going back and forth on what they were looking for. And then they sent me an email that said, oh, we want to be able to say that if there are 9 rooms in your house and you don’t have a smart thermostat, then our system is not going to be helpful for you. They gave me an example of the kind of conclusion you would come to after having done the work. And now I can work backwards and actually figure out what the requirements are to let me answer those sorts of questions. So some kind of mockup I find to be an incredibly useful thing.
(19:27) Let’s go through another example of what I mean by a CoNVO: Context, Need, Vision, Outcome. So, let’s imagine that you’re working with a hospital system. It has 1.25M patients, and magically they have good records for 20 years. That never happens, but let’s just pretend it does. And the CEO is interested in building a tool for reducing medical issues. I don’t know yet what it is, but I know the data is useful for something. So, the need — we talk to some doctors and ask them: what do you think the biggest issues you’re facing are? What problems would you like us to solve? And they say, “oh, we think that maybe antibiotics are being overused, and we’re not sure; we’re not sure how to find out if they are or if they aren’t.” That’s a serious problem, and it’s a need. We’ve figured out what to solve here. And so rather than jumping straight into building the tool, let’s say: first things first. The project here is a pilot investigation, and if we find some kind of signal, then we’ll build the tool. These are only a sentence or two each; before I actually started doing work on this kind of problem, I would want maybe a paragraph or so for each. And if you knew it was going to take six months of work, maybe a page for each one. I think it’s possible to bang one of these out in 10 minutes for a problem you’re going to work on for the rest of the afternoon. But this is just a very high level — the most important bits.
(20:53) And the outcome is, the chief medical officer will decide if the pilot is valuable enough based on the report we’re going to make, and if it was good enough, the CMO will — not the CEO, the CMO — will be the one who will actually run the tool. It’s really useful when you’re building something to know who is going to be the person who uses it. Once I know that they’re the one who’s making the decision — maybe this person is a lot more technical than the CEO. And if I’m writing a report for them, I can include a lot more technical language than I might have otherwise. Before I start doing the investigation, I might want to know who is my audience. Learning to write these things down, to get these as concrete as possible and iterating a couple of times if necessary, is an extremely valuable way to make sure you’re attacking the right kinds of things.
(21:55) So next, I want to talk a little bit about arguments. Your CoNVO is really about getting started; this is really about where you end up. It’s another set of mental models, another set of tools that I have found really useful for focusing the kind of work I do. And I’m going to enjoy this water while you enjoy this Creative Commons picture. I wonder who owns the rights to this, because some company made this, and then some guy took a picture of it with his fingers in there.
(22:34) Anyway, data is not a ray-gun. You cannot shoot somebody with your data so that afterwards they agree with you, no matter how nice it would be for that to be true. People have to be convinced; you have to be convinced. If you’re solving a problem, there will be many, many times where you scratch your head and say, “Really? Is that — really? Okay, okay. I get it.” Sometime between the “huh” and the “okay,” something happened. Somewhere between when you said “I don’t believe you” and “I believe you,” somebody was convinced. You were convinced; whoever was reading the work you were doing or was working on this thing was convinced. Somewhere, that had to happen. And the world does not run on deductive logic. If you’re making a proof, proof is deductive; but when it comes to dealing with people, people don’t think deductively. There’s a lot of stuff going on in people’s heads; there’s a lot of rhetoric in how somebody comes to believe something. I think this is actually very valuable to know when we’re trying to figure out if we should believe something, and when we’re trying to convince someone of something as well.
(23:44) Whenever we want to trust a tool — you know, I’m going to build something and it’s going to go into production — do I trust that it will do the right thing? I used to work at OkCupid, and we made a lot of graphs that were trying to explain things that we thought people should believe, and someone has to be convinced by that graph. That graph is communication, and if we don’t understand how people learn from the things we do, then we’re not going to be able to do it very well. I think this is very interesting when deciding definitions — data science is full of deciding definitions. There is almost never a unique way to represent something. If you wanted to define stickiness or growth, or you wanted to define poverty, or you wanted to define success — these are things where you will see people, all the time, building models to define something that they haven’t even justified calling the thing they’re calling it. Words are really powerful. If I call something poverty versus calling it “the 80th-percentile expected amount of $2.50 per day given door-to-door surveys, actually computed from an international census with localization” — that’s actually the definition of what it means to talk about poverty when you’re working with the World Bank. However, they say poverty because poverty already means something, and if we want to talk about something that matters in the real world, we have to use those words. And justifying the definition — why I should believe that that’s how poverty is defined — is a challenge that data scientists have to deal with. You will define things, and if people define things for you, are they any good?
(25:26) And finally, getting somebody to act differently. If you do an analysis and you build a tool and something comes through and somebody doesn’t act differently — even if they make a decision, just because somebody decides something doesn’t mean they follow through on it. So actually getting somebody to feel compelled to make a decision — getting a CEO to decide to invest in a new product or drop one of your advertising slots, whatever it is — getting somebody to make a decision and act differently is tough, and it requires argument. Let’s go through this very quickly. I said it’s not deductive; there’s something else going on here, and the nice thing is that, again, we don’t have to start with this from scratch. It’s an important part of what we do every day, but there’s literally 2000+ years of research on how people come to be convinced of things. So, when I originally sold this book to O’Reilly, I said I was going to write a book about applications of Ancient Greek rhetoric to data science, and they were really excited, and I had to disappoint them that it’s actually only a small part of the book. Tim O’Reilly, by the way, is a huge Classics nerd, which I found out through this — who knew?
(26:44) So, this picture here, this is a picture of the Indian subcontinent. Over here, we have all the lights visible at night around Bombay and Chennai, all these places. Over here, we have Bangladesh; some of it visible, but a lot of it is darker spots. And the project that I want to talk to you about that I think will help us give these examples of what I mean when I talk about arguments is one that we worked on with DataKind that later Polynumeral took on with the World Bank — trying to predict changes in poverty with satellite data.
Poverty estimates typically take 5 to 10 years to produce a good on-the-ground figure — and 5 to 10 years is actually extremely quick in a lot of these countries. It’s a lot of door-to-door work: knocking on people’s doors, getting them to keep financial diaries, doing very complicated imputations — because obviously you can’t ask the whole country, and you can’t sample evenly across the whole country, so how do we back that out? The censuses themselves only come out every 10 to 20 years in some of these countries. Very slow estimates.
(27:50) And so the vision in this project was to predict whether poverty estimates should go up or down in the next estimation based on satellite data. Is it bright at night? Is it shinier? Is there more or less green area? So, can we make something that would convince us that we can actually rely on some kind of model to informally guide where policy decisions are going to go? I think “informally” is important here, because this wasn’t attempting to argue that satellite data can replace door-to-door surveys. And I think, when you start to talk about arguments, understanding what the outcome is going to be already helps us understand the kind of argument we have to make. We’re not going to make an argument that is going to convince every PhD in Development that this is THE way to understand poverty. We’re not going to do that, nor should we. The argument we’re trying to make is: if we use this, we can maybe trust it to tell us, directionally, which way things are heading in a little, small-scale area.
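To illustrate how modest that directional framing is in practice, here is a toy sketch of the shape of the problem: a binary classifier predicting whether poverty goes up or down from satellite-derived features. This is not the actual World Bank / DataKind model; the features, data, and signal are all invented:

```python
# Illustrative only: predict whether poverty rises or falls in the next
# estimate from satellite-derived features. Everything here is synthetic;
# this is the *shape* of the problem, not the real pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 400
night_lights = rng.normal(0, 1, n)   # change in nighttime brightness
greenness = rng.normal(0, 1, n)      # change in a vegetation index
# Synthetic rule: areas getting brighter at night tend to see poverty fall
poverty_up = (rng.normal(0, 0.5, n) - night_lights) > 0

X = np.column_stack([night_lights, greenness])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:300], poverty_up[:300])
# Held-out directional accuracy: "up or down", nothing more
accuracy = clf.score(X[300:], poverty_up[300:])
```

The output is a direction, not a poverty figure — which matches the argument being made: informal guidance between surveys, not a replacement for them.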
(28:54) I’m going to go over some of the vocabulary that I think is really useful for talking about arguments and give some examples along the way. A claim — when I said before that there’s something that you didn’t believe and then you learned something and then you believed it afterwards, that was a claim. It’s basically a thing that you can state in a sentence or a couple of sentences that, at first, you don’t believe but after someone makes a case for it, you believe it. So, for example, “poverty can be modeled effectively with satellite data.” Another one might be that, “the average nighttime illumination across all of the small areas in this graph is 244 millilumens,” or whatever — that’s a claim. If you just told me that before you did the calculation, I might not believe you. If I was skeptical, I would want to see evidence. I can make this claim and then I provide some evidence and then the claim is actually, hopefully, demonstrated. But before I do that, there’s always going to be some prior knowledge. There’s always something that people come and take for granted by the time you’re trying to convince them of something, or you’re trying to convince yourself of something. You already understand “nighttime,” and you understand “light.” You understand the math, probably; you understand when I make a graph of something, how to read the graph. There’s a lot of things that come in already that are already in your brain before this claim makes sense. So when I say that you’ve got something you already understand, but something you don’t get yet, somebody gives you some evidence and now you believe it.
(30:29) Evidence is where data enters into an argument. So, counts and models and graphs and all the things that we do to turn our data into something which can be comprehended — pretty much always, what we're doing is making it serve as evidence for an argument. Because this graph of my residuals looks roughly normal, I believe that this model is good; my residuals are as they should be. I'm making a case that the model is trustworthy, and because my cross-validation error with this metric is good enough, I'm making a case that it's actually valid. I'm using very stats-y, machine-learning examples here, but I find that, a lot of the time, it's just counting. So, I believe that A is bigger than B. Well, until I count it, I don't know. I have to take my 100TB of data, fire up my Hadoop cluster using Mortar, write some of this in Pig, get out two numbers, and now I actually have some evidence for why I think A is bigger than B. The justification is the reason why I think this evidence implies the claim. And when you actually look at an argument, you'll see that somewhere along the line, somebody had a reason why this evidence was enough to convince them. Why do I think "A is bigger than B" is good enough? An example: let's say we're running an experiment, and I believe that if I put the name of the company in bold, I'm going to get more clicks. Let's use something very, very banal. My claim is that if I make one thing bold, compared to control, I'm going to get more clicks. My prior knowledge is that I actually know, a little bit, the theory of how randomized controlled trials work.
Alright, so I have some prior knowledge. If I present you some evidence saying, "Well, I presented these two things to 5,000 people, and the click-through rate was higher for A than it was for B" — the actual math I did was just basically taking a ratio. But how it fits into the bigger picture is that, because I know how a randomized controlled trial works, I know how A/B testing works, I have some justification for why I should believe there's actually a causal effect going on here.
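To make that concrete, here is a minimal sketch in Python of the arithmetic behind this kind of evidence. The click counts are invented, and the pooled two-proportion z-test stands in for whatever significance check you would actually use:

```python
from math import sqrt

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Compare click-through rates from an A/B test with a pooled
    two-proportion z-test. Returns both rates and the z statistic."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)  # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return p_a, p_b, (p_a - p_b) / se

# Invented numbers: bold company name (A) vs. control (B), 5,000 visitors each
p_a, p_b, z = two_proportion_z(310, 5000, 250, 5000)
```

The ratio itself is trivial; what justifies reading the result causally is the prior knowledge that visitors were randomized between A and B, which is exactly the point being made above.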
(33:06) But of course there can be rebuttals. You can go through all of that work and somebody could say, "Actually, your sample size is too small," or "Actually, your randomization was done incorrectly," or maybe you did this experiment on population A, but it's really population B you care about. You tested this on a Sunday afternoon, and Sunday afternoons are when all the grandmothers come to our site, and actually Monday through Thursday is when most of our traffic is 20-somethings. And so it's a totally different population; we made a mistake.
(33:40) The same sort of thing, funny enough, happens with medical testing a lot where something will work very effectively at one hospital, and then it won’t work at another hospital. And sometimes it’s because that’s just false positives and it happens. Sometimes it’s because there are small changes in how things are administered. You might give somebody an injection, and if the injection is at a different angle in one hospital versus another hospital, it can have a really big effect. So, there can be rebuttals. There can be reasons why we think this claim does not hold, and knowing those before you present an argument, of saying, why would I believe I’m wrong, what could be missing here, and attacking those things is extremely valuable. That’s how you cover your bases.
(34:26) If you imagine going back to this poverty thing, if you imagine poverty can be effectively modeled with satellite data — I’m going to come up, in the course of building this argument, with like 20 reasons why I might be wrong. And I probably can’t refute all of them, but if I take the time in advance to figure out what they are, I have a much better chance of producing something that’s valuable. And I’m not saying that people don’t already basically do this — when you’re sitting down to do work with data, you’re probably doing something just like this all the time. I have personally found having vocabulary makes it easier to focus yourself.
(35:01) However, the really, really useful thing is not so much the vocabulary as it is the patterns. Once we understand that we're making arguments, and once we understand that what we do all the time with data is convince people of things, either ourselves or others, we realize that most arguments actually fall into a small number of buckets. There really are not an infinite number of kinds of arguments out there, and once we know the different buckets, it makes it much, much easier to quickly come up with what to say. This is a classic thing if you ever did debate or something like that: it's very common to figure out which kind of argument you're making — great, I just plug in my claims and I'm done. So, I think that same thing holds for data, and I'm going to go through some examples of what I'm talking about here.
(35:46) I want to talk about something called categories of dispute that I think is an extremely useful technique that comes to us from, I believe, the philosophy and law and debate people. But first, I want to talk briefly about causation. Whether or not you think your work is causal, if you are doing any work with data and somebody else is reading what you've done, they will interpret it causally. You cannot get around that. Everybody will always make a story out of what you told them that convinces them they understand how the world works, and so it behooves you to do your damndest to meet that expectation. It is not good enough to say, "I'm sorry; correlation does not equal causation. I'm just going to give you R-squareds and call it a day." You are abdicating your duty — you are neglecting what you should be doing; you're doing it wrong if you are not constantly thinking about how the models you make and the work you do fit into how the world actually works. Now, it may be that you legitimately can't capture any of that, but if you believe that the only kinds of causal statements that are valid are causal statements made from randomized controlled trials, you should read some of the literature from sociology, psychology, or quantitative history. There is an enormous amount of work out there on how we think about causal relationships with only observational data. Sometimes it's better, sometimes it's worse. Sometimes we can ask questions before and after and make comparisons; sometimes we can't. I can't, unfortunately, go super in-depth in this talk into the theories of causality out there, but suffice it to say, if there are two things you take away from this talk, one of them should be mockups and the other should be that you should really educate yourself about things like quasi-experimental design. How do you actually make statements that are justifiable about causation when you don't have randomized experiments?
It can be done, and it is done all the time, and you should be doing it. So, enough of that rant.
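As a small taste of what quasi-experimental reasoning looks like in practice, here is a difference-in-differences sketch in Python; the scenario and all numbers are invented for illustration:

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences: the control group's change over time
    proxies for what would have happened to the treated group without
    the intervention, so subtracting it isolates the estimated effect."""
    return (treated_after - treated_before) - (control_after - control_before)

# Invented scenario: a site change rolled out in one region (treated) but not
# another (control); the values are mean conversion rates before and after
effect = diff_in_diff(0.040, 0.055, 0.041, 0.046)
```

Making this argument stick still requires a case that the two groups would have trended in parallel absent the change; that justification, not the arithmetic, is what carries the argument.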
(37:57) Causal analysis — these are patterns of arguments. How do I convince you that I have accounted for enough of the alternative explanations that I actually have a causal explanation? Think about what it means to say something is causal — when you do a randomized controlled trial, what's going on? I have found a way to deal, hopefully, with as many confounding factors as possible, but I have to make a case to you that I have controlled these things carefully enough. There is no math that will prove a causal relationship. There is math that will show two things are uncorrelated, and there is math that you can use as part of your argument to make it clear that when this happens, that happens too, most of the time. But the math will never be enough. There are things you're going to have to fit together to make a clear case. If this is interesting to you, there's a great book called Counterfactuals and Causal Inference, I believe is the title, and I have a brief introduction to this stuff in my book as well.
Categories of Dispute
(39:06) Enough about causality for now; I want to talk about something which you won't find talked about a lot in the data world, which is the idea of categories of dispute. So again, we're talking about patterns — argument patterns that let us reuse structure rather than come up with these things from scratch every time.
Four Kinds of Disputes: (1) Facts
(39:24) There are four kinds of disputes that most arguments tend to fall into, and when I say "kind of dispute," what I mean is: if I give you my argument, why will you be pushing back? What is the thing that I'm saying that you're going to say, "Ehh, convince me"? Starting with, I believe it was, Cicero around 50 BC, people started realizing that most times the core kernel of an argument — the bit where people actually come to a head — falls into one of four categories. The first category is called fact. This is the one we're probably most comfortable with. Disputes of fact — you might say, "The F1 score for this model is 0.7." That is either true or false; it's either factual or it's not. Once everything is laid out, we just have to do the work to do the computation.
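A dispute of fact like the F1 claim really is settled by doing the computation. A minimal sketch, with invented confusion counts:

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 is the harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

# Invented counts for illustration; once they're laid out, the claim
# "the F1 score for this model is 0.7" is simply true or false
score = f1_score(true_pos=70, false_pos=30, false_neg=30)
```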
(40:00) Another example of a dispute of fact — if I say, "Global temperatures have risen 1.5 degrees Fahrenheit since 1880," it might take several thousand scientists 20 or 30 years to show that this is actually factually true, but it's still essentially "How do we know if we're right or wrong?" and "Are we right or wrong?" I would know before I went and did the work how I would prove myself wrong, so it's a dispute of fact. Some people, I think, incorrectly assume that these are the only kinds of things out there: that either you know a truth condition or you don't, and if you don't know a truth condition, somehow it's not a thing you can argue about. Which, of course, is ridiculous, because you argue all the time about things where it's hard to say whether they're right or wrong.
Disputes of Definition
(41:20) For example, definitions. I can say a definition is right or wrong, but there are no cut-and-dried rules up front to say "good definition" or "bad definition." I can make a case for it. I can make an argument that a definition is valid, but I can't get a boolean true or false out of defining something. So, this thing I said before, "poverty is defined as the Foster-Greer-Thorbecke measure with alpha equals two," is an example of a definition. Another example would be: the way we define global temperature is based on a weighted average of land and sea measurements over several years, with such-and-such weights from these different buoys that are out in the ocean and this satellite data we use for reflectivity, or whatever. So, there are multiple ways to define global temperature, and until we have a good definition, it's hard to state a truth condition. We have to actually justify the definition. Once we know we have a definition, we have some stock issues; if we hit these, then hopefully someone will agree with us and the argument at least can continue. First off, does this definition make a useful distinction? In other words, if we declare that this is how we measure poverty, does it separate the population in some way into "poor" and "not poor"? Does it actually tell us something about the population we didn't know before? Or, with temperature, is it different if the average temperature is 65 versus 75 — how does it actually relate to the rest of the world? Maybe average temperature is not a useful concept. Does this give us something we can work with? Then consistency — how consistent is this definition with our intuitive idea of poverty?
This particular metric actually works out to something like "the fraction of the population living on a dollar a day or less, weighted to take income inequality into account." So it does a very good job of capturing what we mean when we say there are a lot of poor people in a country: the fraction of folks living on a dollar a day, and the amount of money concentrated in the hands of a very small number of people. Those things together give us two different understandings of what it means to talk about a poor country, and this metric actually uses them both. Interestingly, if you look this up on Wikipedia, it's this beautiful little metric where the choice of the constant lets you go smoothly from just the fraction of the population on a dollar a day or less up to a purely income-inequality-based measure. So, in the Mexican constitution, it's written that poverty will be defined by the Foster-Greer-Thorbecke measure with alpha equals two. It actually says in the constitution how to define poverty, and I guarantee you there was an argument about how that was going to happen when it was adopted. So: useful, consistent, and then finally, what are the alternatives? If someone's going to believe this is a good definition, we're going to have to understand what else we might have picked and explain why this definition is smarter than another one.
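The Foster-Greer-Thorbecke family has a simple closed form: FGT_alpha = (1/N) * sum over the poor of ((z - y_i) / z) ** alpha, where z is the poverty line. A short sketch with invented incomes shows how alpha = 0 gives the plain headcount ratio while alpha = 2 weights the depth of poverty:

```python
def fgt(incomes, poverty_line, alpha):
    """Foster-Greer-Thorbecke poverty measure.
    alpha=0 counts heads below the line; alpha=2 squares each person's
    normalized shortfall, so deeper poverty (and inequality among the
    poor) raises the measure."""
    shortfalls = [((poverty_line - y) / poverty_line) ** alpha
                  for y in incomes if y < poverty_line]
    return sum(shortfalls) / len(incomes)

incomes = [0.5, 0.8, 1.5, 3.0, 10.0]  # invented daily incomes in dollars
line = 1.0                             # "dollar a day" poverty line
headcount = fgt(incomes, line, 0)      # fraction of people below the line
depth = fgt(incomes, line, 2)          # inequality-sensitive version
```

Moving the same income from the less-poor person to the poorer one leaves the headcount unchanged but raises the alpha = 2 measure, which is exactly the smooth interpolation described above.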
Disputes of Values
(44:39) So we have facts, definitions, and now values. This one I'm going to skip over a little bit; it's a bit harder to pin down. Basically, disputes of value ask: are we making the right tradeoffs here? Have we done the right thing? In the data context, this usually means: have we managed simplicity against expressiveness, have we chosen the right kind of model or the right kind of work? What are our criteria for knowing whether we made the right trade-offs? Well, if accuracy is actually way more important than interpretability, if we value accuracy more highly, then we're going to end up using random forests and the like, whereas if we value interpretability, a linear model, or at least something additive, is going to make it much easier for us to understand things. Understanding what it means to say our model is simple enough involves trade-offs too, and in actually making the case, we have to say how much we should value these different things and then ask whether we applied those values properly.
(45:51) Disputes of value come up a lot more in legal settings and more philosophical arguments. I would love for there to be more discussion about what is right and wrong in data, but this is probably not the time to do that here. And finally, I think this is probably the most useful slide in the section, and possibly the most useful thing you've heard in a long time. Whenever you want to convince somebody to act differently, there are four things such that, if they already agree with you on them, they'll feel they'd be an idiot not to agree with your course of action. There are four things that, if you can make a strong case for each, afterwards somebody will say, "Obviously, yeah, I'm going to follow your lead on that."
Disputes of Policy
(46:31) So for example, we might want to use this model built on satellite data — we want to use that data to informally guide decisions. That's actually a dispute of policy. There are four things that, if you have them straight, make any policy decision a no-brainer. (1) Is there a problem? First off, do I agree with you that there is a problem to start with? So, in this case, is it problematic that we have to wait 5 to 10 years to get poverty estimates? If somebody doesn't think there's a problem there, they're not going to think it makes sense to use your model. Similarly, if you're building some tool that's going to go into production to run some experiments, some bandit algorithm to optimize the conversion rates — if people don't think the conversion rates are problematic, then they're not going to adopt your bandit algorithm. So that's the first question: "is there a problem?"
(47:28) Second, where is credit or blame due? So, something somewhere is wrong. The problem with poverty estimates is that they rely on very expensive, difficult-to-collect data, which is why it takes 5 to 10 years to get a new estimate. In the case of conversions, the problem is that we don't react quickly enough to people's needs to adapt the software. I have to actually make a case that this is the cause of the problem. If the reason it takes 5 to 10 years to get a poverty estimate were that it takes that long for poverty to meaningfully change, then my satellite model is useless, right? I have to first know there's a problem, and then I have to actually know that the problem is that we're relying on data that's too slow to collect. Ill, blame, cure: will the proposal solve it? And this is where data science folks probably spend the bulk of their time. You identify there's a problem, you identify what you think will fix this problem, and now: is your model good enough? Is your proposed solution — is your proposed change actually going to make a difference? That's where all of the cross-validation and all of the model validation come in — looking at all of the funny graphs that tell me how this works on different kinds of data. How does it go on fake data? How does it work on Mondays? How does it work on Wednesdays? That's basically asking, will this actually solve the problem?
(48:55) Ill, blame, cure, and finally cost. Is it actually better on balance? Every decision has trade-offs. Some of the time, the work that we do is going to be more costly to put into production, or more difficult than what's already there, even if it's better. Even if it actually solves the problem, it still might not be worth doing. And so if you can't make a clear case that it's actually worth doing in terms of the time put in, the level of complexity, and the results you're going to get, then none of these other things matter. In fact, all four of these things matter, or else you really shouldn't be making a decision. If you don't agree there's a problem, haven't pinpointed at least roughly what the problem is, don't think this will solve it, or don't think it's worth it on balance — then why would you act at all? And when it comes to really simple decisions, like I just want to run an A/B test — well, it's actually very low-cost for the most part. So, if I think there's a problem, and I think I know where the problem is and that this will solve it, these things don't have to be that well-defined, because it only costs like an hour of my time to set up a test. If it costs you 4 weeks of developer time to do an A/B test — well, first off, you should call me — probably your cost/benefit calculation is going to be totally skewed. In that scenario, these all work together.
(50:30) So, again, the four different categories here are: disputes of fact, where we're saying, "true or false, what do you think about this?" Disputes of definition: "is this defined properly?" Hopefully I've convinced you that definition comes up all the time in your work, even if you hadn't really thought of it that way before. Value is a little bit trickier — how do we know we're actually making the right trade-offs on something, not necessarily in terms of acting differently but in terms of what we might call taste issues? Does this model have the right kind of trade-offs to make it worthwhile? And finally, policy — policy arguments say, "what should we do differently?" And the mnemonic here is "ill, blame, cure, and cost." When I want to convince somebody of something, I have to convince them that there is a problem, that I know what the problem is, that I know how to solve that problem, and that it's worth it on balance.
(51:29) So in summary, there’s this expression when you’re going traveling which is to lay out all your stuff on the bed and all your things and all your money and take half of the stuff and twice the money. I think when it comes to practical data work, you should lay out all your math and all of your fancy tools and all of your models, all of the different things, and all of your understanding of the world and understanding of the problem, and take half of the math and twice the understanding. I think this is really just the tip of the iceberg. I think there’s a lot of really useful things that we can learn from other disciplines. We can learn from people in the humanities and social sciences, and one of the things that I didn’t talk about that I think is — just to draw connections here, we talk about CoNVO; we talk about Context, Need, Vision, and Outcome, and in a lot of ways, that actually mirrors the arc of a story, right? The Context is the hero/heroine is in the woods. The Need is there’s some sort of conflict; something happens, there’s some problem. The Vision is, somehow this problem gets solved; we reach some sort of climax in the action, and then there’s a denouement; there’s an after party, the outcome of what happens afterwards. And recognizing that there are all of these parallels between what it is that we do on a day-to-day basis as data people and what other people in other unrelated disciplines have been thinking about for a long time is a very powerful notion. We aren’t alone in these things; we can reach out to others to figure out the right way to solve problems. I’m @maxshron, and let’s go for questions.
(54:00) Q: How good is satellite data at predicting changes in poverty?
A: It’s okay. The data that we were using is the freely available NOAA data. It’s one pixel per kilometer, basically, so it’s hard to get a lot of resolution of that. Directionally, it definitely works, and one of the interesting things is that cities versus countryside have opposite predictors for lots of things. When cities are getting brighter, it’s because there are more slums in the cities; when the countryside is getting brighter, it’s because there’s more economic activity because people are building useful things. So, understanding that dynamic was interesting and a little bit challenging, and what we had to do was spend time with the economists. Their knowledge of advances in machine learning is not that great, but their knowledge of the problem domain is superb, and so figuring out how to cross the barrier and make sure that we can work together, I think, was a challenge but was really worth it.
(55:33) Q: What do you do in situations before where people think they already know the solution but they probably don’t?
A: I have found that sometimes people really do; sometimes you talk to people and they really have a pretty good idea except for the technical bits. Asking them to write down what it is they're looking for and clarify it in writing, I find extremely useful. Oftentimes, I find that people realize in the process of doing that that there are things they don't get. Or, when they write things down, then you can say, "Great! This is really helpful. Here's my understanding of what it is that you want. What do you think here?" And I think if you've got a persuasive case for why you understand their problem really well, they'll often go with your understanding. I've found mockups also to be one of the most useful tools there, for getting on the same page about the actual outcome, about what it looks like to be done with a problem. Sometimes just using wireframe user-interface mockups — saying, "Here's what it would look like from the perspective of someone in this organization, or a customer; here's how they'll see the results" — I find that to be a very powerful way to handle it. It also helps to have concrete sentences, where someone says, "Okay, after doing this, we should be able to know whether this sentence is true or false." We did a project once where we were looking at about $1B of eCommerce data, and we were trying to understand whether or not there were patterns in how people were using credit cards with different kinds of merchants. It's a very sticky, complicated problem because there are lots of different kinds of credit cards and different kinds of merchants, and people have very heterogeneous purchasing behavior. So, we sat down with the client and basically paired on writing a handful of sentences describing the kinds of conclusions they would make when we were done building them a tool that would let them pull out these patterns.
They might have had a different idea of what they were looking for, but by the time we really hammered down what the outcome was going to look like, I think they were on board that it was a stickier, trickier problem than they had originally anticipated it would be, and they were willing to put in the time and effort to make it happen.
(58:33) Q: Are there any times when I tried applying this and it blew up and I realized I had to change things?
A: I think the closest it came was at a talk I was giving — I gave a talk at Columbia about a year and a half ago, right after somebody who had just been talking about building a spam filter, and I realized that I hadn't really thought enough about how to explain what CoNVO entails in the cases where you're building a tool like that — where the thing somebody is working on is really not intended to be convincing but is intended, instead, to take action. So I stood up there and gave this talk, and the students were like, "How does this relate to the thing he was just talking about, building a spam filter?" So, I had to go back to the drawing board a bit and figure that out. The specific thing I changed was how we talk about the knowledge it takes to actually take an action. First off, I realized that I was just an idiot, because there are all kinds of times when you're building a tool that you realize you have to convince yourself of something — there's knowledge that comes out of the process, and you have to figure out how to fit it into the organization, or figure out if the tool is worth doing, that kind of thing. And I realized that another kind of knowledge is the representation of the distributions of the theoretical data in your algorithm — to some extent, believing that that knowledge is represented properly, believing that the tool has actually apprehended it well enough. The same idea of knowledge fits for both. I don't know if that's a good example of a failure case, but that's one of them. Apologies if that was a cop-out answer.
(1:00:00) Q: What is a good place to learn more about quasi-experiments besides my book?
A: Counterfactuals and Causal Inference is good; I think it's from Cambridge University Press. There is a book, in theory, coming out — it's not an excuse to wait — from O'Reilly in a year or nine months from an author who is talking about practical causal reasoning in business. I think it will be really good. I've read a little bit about what they're doing, and I think her work makes a lot of sense. Besides Counterfactuals and Causal Inference, there is a classic, classic book; it's got a super long name, and it's by the guy who invented the term "validity," I think. Donald Campbell, I want to say? It covers quasi-experimental design, and it's surprisingly readable for a $100 psych textbook. It goes into as much of the philosophy as you want, but a lot of the theory too. So I would say the difference is: the experimental and quasi-experimental design book from Campbell goes into a lot of detail on how you talk about non-experimental causal relationships without a lot of statistics, thinking along the lines of "I'm going to have pre-tests and post-tests; I'm going to find natural experiments; I'm going to look at situations where I see a lot of natural variation in things and work from there." Those things make a lot of sense. Another thing to mention is a lot of good economics books, if you can get over the fact that they flip a lot of axes in their diagrams. Counterfactuals and Causal Inference is about more of the statistical approach to causal reasoning, and both of these, I think, require you to make a good argument; but on the more statistics-heavy side — matching and all of the Judea Pearl causality-diagram stuff — I think a lot of those things are often overkill. Just knowing how to condition my models a little bit better and build a slightly smarter regression probably covers a lot of these cases. Another really great source on this is a guy at NYU called Sinan Aral.
He is, unfortunately, responsible for all of those Humin invites you might have gotten in the last week. There’s a startup that makes this thing where you upload your contact list and then it emails everyone on your list to ask if that’s still their email address. Anyway, so it’s called Humin. Sinan Aral is a fantastic guy. He’s written a lot about causal reasoning in network scenarios — so, how do you understand what it means to talk about cause and effect when we talk about a Facebook app, or when we talk about scenarios where people are sharing things and maybe they came to your app or they came to your service because a friend told them, or maybe they came to it because they just happened to read the same news as their friend. And so they actually did a round of great experiments that I find really easy to read and interesting around, first off, making fake Facebook applications that only, brokenly, share — like, you try to share with somebody and then it tells you it shared it, but actually it didn’t half the time. And so, they can figure out from that what the effect is of sharing; if it actually is the share or something else that gets somebody to use an app. And they also did a really fun thing where they partnered with Yahoo! and they looked at all of the installations of Yahoo! Messenger, and they actually — because Yahoo! has this toolbar that also hoovers up all of your sites you visit — they could look at a fraction of these people and see every single website they visited for the last two years and try to figure out if there’s a relationship there between what people read and what they actually end up doing. One of the things that they’ve found that I think is very interesting is that a lot of statistical methods for choosing — for trying to back out causality — if you go ahead and you take an actual experiment and then you forget you took an experiment and you just use the stats-y methods, with some reasonable work, you can get 80% of the way there. 
People are always like, “What if there’s some giant confounding variable that I couldn’t have thought of!” Well, clearly we can’t do causal reasoning without knowing everything or having a randomized experiment. If you actually look at how practical these things are — you can get a lot of the way there. 80% is way better than pretending that you can’t do causal reasoning at all.
I think we’re out of time for questions, so thank you very much; I’ll be up here.