Faith in Data or Faith in the Crowd?

Morgan: Okay. Welcome, everybody. I know we’re probably going to have some more people coming in as they clear out, out front. And we’re going to spend the next 54 minutes or so talking about a pretty interesting topic—and, actually, some might say it’s two topics, and that’s really one of the things we’ll be discussing here: “Faith in Data or Faith in the Crowd?” And it just might be a little bit that we might have to set up a little bit here so we have some people who are a little more data-centric on one side and the people who are a little more crowd-centric on the other side, but that will be determined.
So I’m going to introduce the folks—and if you will just bear with me, just because I seated them differently than I have their bios, and I hope people will look in their app and in their bios. But on my far right, on your left, you have James Surowiecki, staff writer for The New Yorker and I think probably most known to all of us as the author of “The Wisdom of Crowds.” And writes on finance for The New Yorker, and covers a lot of media stuff too, which I like.
And then next, Balaji Prabhakar—did I pronounce that correctly?
Prabhakar: Prabhakar, yes.
Morgan: Prabhakar—of Urban Engines, and also the founder and CEO, and also a professor at Stanford and the director of the Center for Societal Networks.
To my immediate right, Walter De Brouwer, a Belgian-born Internet and technology entrepreneur and semiotician—did I pronounce that right? Semiotics? Semiotician?—and the founder and CEO of Scanadu, a NASA Ames Research Center-based company with a mission of making this the last generation to know so little about our health.
And to my left, James Kocoloski—I’m sorry, Adam—one of the founders of Cloudant, a recent acquisition of IBM. And your official title also at IBM now is—?
Kocoloski: CTO for information management.
Morgan: Okay, great. And to my far left, Yan Qu, the vice president of data science at ShareThis, and, I think as I’ve read, the holder of 14 patents in various—in look-alike modeling and a lot of science in the online advertising technology area.
So starting with James—and then we’re going to come down the panel—what was your reaction when you saw the title of the panel that David Kirkpatrick put you on and were wondering what it was that we were going to talk about?
Surowiecki: I think my first reaction is maybe just to say that I think to some degree the opposition is a little artificial, that at least the way I conceive of the role that crowds can play, that certainly one of the things, one of the benefits that you can get from aggregating collective intelligence is getting better information, getting better data. That’s one of the things I’m really interested in, and I actually in my book talk quite a bit about, you know, what are essentially mathematical ways to do that. So while when people think about crowd-sourcing or aggregating the wisdom of crowds, accessing the wisdom of crowds, you can think about things like Wikipedia, which are not sort of mathematical, that’s a different model of contribution.
There are also many examples, and I think tremendously powerful ones, of using algorithms or different kinds of tools to convert people’s knowledge into data, and there are very simple examples of it. You know, I talk about the jelly bean experiment. You ask a large group of people to guess how many jelly beans are in a jar and their average guess is exceptionally good. But there are also much more important and sophisticated ways of doing that. So I would argue that that’s basically what a market does. So in financial markets you see it, and now you have these things—to my mind we need more of them, but prediction markets, where basically people are able to come together and collectively try to make forecasts about things like elections or sporting events and the like, and the results there are often very good.
And I would also say that there are—one of my favorite examples is the racetrack. So the crowd of bettors at the racetrack does an amazingly good job of forecasting the outcome of races, and that in a sense is data. It’s just what people’s opinions are being converted into numbers. So I actually think the opposition is a little, maybe more artificial than it needs to be.
The other thing I would say is I think in a lot of cases the data that let’s say deep learning machines are using is data that’s basically derived from humans in some form or another, and I think the prospects there are tremendous, but I, again, would say that it’s a little more complicated than setting machines up against human beings.
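An illustrative aside on the jelly bean example above, not part of the panel: a minimal sketch, assuming simulated guesses that are independent and unbiased (the bean count, noise level, and crowd size are made-up values). It compares the error of the averaged guess with the average error of individual guessers.

```python
import random
import statistics


def crowd_vs_individuals(true_count, guesses):
    """Return the averaged guess, its error, and the mean individual error."""
    crowd_guess = statistics.mean(guesses)
    crowd_error = abs(crowd_guess - true_count)
    individual_errors = [abs(g - true_count) for g in guesses]
    return crowd_guess, crowd_error, statistics.mean(individual_errors)


# Hypothetical setup: 850 beans in the jar and 500 simulated guessers whose
# guesses scatter around the truth with independent Gaussian noise.
rng = random.Random(0)
TRUE_COUNT = 850
guesses = [rng.gauss(TRUE_COUNT, 250) for _ in range(500)]

crowd_guess, crowd_error, mean_individual_error = crowd_vs_individuals(
    TRUE_COUNT, guesses
)
print(f"crowd guess:           {crowd_guess:.0f}")
print(f"crowd error:           {crowd_error:.1f}")
print(f"mean individual error: {mean_individual_error:.1f}")
```

The averaged guess can never be further from the truth than the average individual is. How much closer it gets depends on the independence and diversity conditions discussed in the panel; guesses that share a common bias would not cancel out this way.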
Morgan: But one of the things I thought was interesting in your writing is, let’s just maybe replace data for a moment with expert, the notion that sometimes when we think of like the person that comes with the data is the expert, it seems to me that was—could you—
Surowiecki: Sure. So that I think is a separate—I don’t know if it’s a separate issue, but that I think, there the opposition is really—if the question were “faith in experts versus faith in the crowd” then I would be very clearly on the “faith in the crowd” side. And I do think that—for me, the way I kind of think about it is that the—when you are able to access the wisdom of crowds, to the extent that it exists, and when it satisfies these conditions that I talk about in the book, what you are really getting is a kind of mosaic picture, right? So there are all these tiny pieces of information that are scattered out there, and when you are able to aggregate them, that’s what you kind of end up with, where with experts, typically no matter how smart they are, you know, they only have a limited view. And one of the great things I think technology does is it actually allows you to access the wisdom of a crowd in a much more efficient and rapid way than was ever possible before.
I think about this especially in organizations. You know, previously if you really wanted to find out what your employees really thought about, I don’t know, a new product or when a software project was going to launch, it was hard to do that efficiently across a big organization. Now it’s incredibly easy, and so that’s a case where I think technology and the crowd are very complementary to each other.
Morgan: Okay, so I’m going to jump over to here now. So Adam, what was your reaction when you read the title of what you were going to come be a part of?
Kocoloski: Sure. So looking at my personal background, I was very much a data person, right? Before starting this database-as-a-service company, I was an academic, you know, a scientist doing particle physics data. You don’t ask a bunch of physicists where they think the Higgs is and take a sort of aggregation of their opinion—
Morgan: With research on gluons, as I recall.
Kocoloski: Yeah, yeah, exactly, right. So, you know, I mean, there you ultimately have hard data that you have to apply numerical methods to and extract and analyze. But of course what we’re finding more and more in the enterprise these days is that the data sources that matter are—of course they have the systems of record, right? And these continue to run and function and be very profitable systems for many big enterprises. But where those companies are deriving a ton of value today is by blending that backend transactional data that they have with data that they are collecting from the outside world. In the IBM parlance they call it systems of engagement, right? But fundamentally, it’s the kinds of applications that are maybe more familiar to the audience here, you know, applications that interact with the users that collect data from the outside world, and those companies are trying to do the best they can to blend the insights that are derived from that world with the transactional data to do really differentiating things and provide value back to their customers. So like James, I would say that the opposition is a little bit false, and that, if nothing else, if you’re talking about deriving insights from the wisdom of crowds, you have to ultimately derive those insights using the right set of numerical methods and the right set of data analysis tools.
Surowiecki: Yeah, exactly.
Morgan: But I guess at some point it does start—whatever data structure you have or approaches, that you’re starting with some sense of hypothesis that are either—
Kocoloski: Absolutely, absolutely you’re starting with a set of hypotheses and you’re having to kind of validate those hypotheses, and do so in a statistically significant and appropriate way. One of the challenges these days is, you know, you gather enough data and you will find a signal, right? You will find something. Is it statistically significant? Well, that’s important for you to determine, right? But eventually, you know, there will be a deviation that if you are not careful you can take as a signal that verifies your hypothesis when in fact it really doesn’t.
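An illustrative aside on the point above that enough data will always produce something that looks like a signal, not part of the panel: a minimal sketch, assuming purely random series with no real relationship between them (the number of series, their length, and the 0.5 threshold are arbitrary choices for illustration). It counts how many pairs still look strongly correlated.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_strong_pairs(series, threshold):
    """Count pairs of series whose absolute correlation reaches the threshold."""
    return sum(
        1
        for a, b in itertools.combinations(series, 2)
        if abs(pearson(a, b)) >= threshold
    )


# Hypothetical setup: 100 unrelated noise series with 20 observations each.
rng = random.Random(1)
N_SERIES, N_POINTS, THRESHOLD = 100, 20, 0.5
series = [[rng.gauss(0, 1) for _ in range(N_POINTS)] for _ in range(N_SERIES)]

n_pairs = N_SERIES * (N_SERIES - 1) // 2
strong = count_strong_pairs(series, THRESHOLD)
print(f"{strong} of {n_pairs} unrelated pairs have |r| >= {THRESHOLD}")
```

Every one of those apparent correlations arises by chance, which is why a deviation found by searching many comparisons needs a correction for the number of comparisons before it is read as support for a hypothesis.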
Morgan: Yes, actually, I was at an event not long ago and I heard someone comment that one of the biggest challenges with big data is that in a big data world everything correlates. [LAUGHS] It’s ultimately possible to establish a correlation between any two things, given enough data. Yeah.
Okay, good. So I’m going to come back to this. So, Balaji—if I’m pronouncing that right—can you—what was your reaction to this topic and title?
Prabhakar: It was something similar to what James and Adam have said, but in a particular way. So at Urban Engines, and previously at Stanford, my work has all been about eradicating traffic congestion on the roads and buses and trains—and I’m a CS and EE type, I’m a professor of computer science and electrical engineering, so transportation is a new area. And the thing that we should look at carefully and closely is just what is the role of large scale nudge engines, how do you get people to shift out of the peak hour and so on, right? And also how do the buses, trains, or roads—the capacity, how is that consumed? Because this is something that we know really well in the world of data centers and cloud computing; we don’t have as much of a knowledge of this in the world of transportation.
And so the two things, right, the data versus the crowd, you can either look at exactly where the trains and buses are at a given time, you can even know in some cases how many people are in each vehicle—this is something people want to know: Is this crowded or not crowded? Should I wait for the next bus? I see this bus coming here, I see some people in it, but what about the one behind it? And there are methods that people have got. There are better sensors: you know, the floor of the train is actually a big weighing scale, so you can weigh all the people in the train, in each carriage, and that I would call data emitted by the system.
On the other hand, you can also stand in a subway station, just outside the turnstile area, and see batches of people swiping out, then you know the train arrived and you can tell how many people were coming out of this train and make a guess as to how many might have been on the inside. So there is an inference you can make of the same quantity, but using just a different method of sensing. And so that is very crowd-based, and in the world of congestion, if you don’t look at the crowd what are you looking at anyway, right?
So for us, both actually go hand in hand, and it’s this sort of thing where you get some accuracy with—if you have a good-quality sensor—but unfortunately it’s not available in real time. Usually the bus or train has to go to the end of the run before you can get that data, so it’s not timely. It’s useful for the next day, but not for telling this person to get into this bus or not. Helping here is the sort of thing that James has written about, the idea that there are apps. Somebody in the bus or the train can say, “This is crowded,” okay? And that’s actually, if enough people do it, pretty—you can accurately know that that vehicle is crowded, okay?
And so this is the sort of thing that there are different sources of information, some of which is available to the operators, we think of it as data, and the rest of it, when it is emitted by passengers, we think of as crowd. Okay? And it is the combination of these two things, and taking stuff from here and giving there, and taking the crowd-based knowledge and putting it back into the hands of the operators actually is what we believe will get us a lot closer to first exposing bottlenecks and congestion points and then to solving them.
So from my perspective, they actually go very hand in hand. And I’m not just saying this just to be agreeable, but it is the case in my work.
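An illustrative aside on the turnstile inference described above, not part of the panel and not Urban Engines' method: a minimal sketch, assuming swipe-out timestamps in seconds and two made-up tuning parameters (`max_gap_s`, `min_batch`). It groups swipes into bursts and treats each large burst as one train arrival.

```python
def split_into_batches(swipe_times, max_gap_s=45.0):
    """Group swipe-out timestamps into bursts separated by quiet gaps."""
    batches = []
    for t in sorted(swipe_times):
        if batches and t - batches[-1][-1] <= max_gap_s:
            batches[-1].append(t)  # still part of the current burst
        else:
            batches.append([t])  # a quiet gap ended the previous burst
    return batches


def estimate_arrivals(swipe_times, max_gap_s=45.0, min_batch=5):
    """Treat each sufficiently large burst of swipes as one train arrival."""
    arrivals = []
    for batch in split_into_batches(swipe_times, max_gap_s):
        if len(batch) >= min_batch:
            arrivals.append({"first_swipe_s": batch[0], "alighting": len(batch)})
    return arrivals


# Hypothetical swipe-out times (seconds) at one station exit: two bursts
# and one stray passenger in between.
swipes = [10, 12, 13, 15, 18, 20, 24, 200, 410, 412, 413, 415, 419, 421, 425, 430]
arrivals = estimate_arrivals(swipes)
for arrival in arrivals:
    print(arrival)
```

The count of people alighting is only a lower bound on how many were on board, so turning it into an occupancy estimate would need further assumptions, such as the typical share of riders who exit at that station.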
Morgan: So might it be, like in your case, because of actually the constituents whose behavior you are trying to influence, that one is likely to be more data-driven and the other is actually more likely to be more crowd-driven in some ways? Because there are two parts to a system.
Prabhakar: Yes. That’s one of the reasons. The other reason is also the type of network that they are using. So once you sense some data you can either, you know, go out of the train or the bus following some sort of local area network eventually and being pulled out through a Wi-Fi system of the depot and then getting pushed out, or you can follow the cellular network coming straight out of the smartphones of some riders. And I think there is that dual possible—there are sensors, and there are networks, and there are people and their cellphones, right? So they are just complementary in just the right way.
Morgan: Great. So coming back, now, Yan, if you could—what’s your take on this?
Qu: I think I agree with the previous statements. So as a data scientist, I spend more of my time looking at data than looking at crowd behaviors. But when I think about this question, really, really thinking about crowd, a crowd consists of individuals, and individuals have individual behaviors, and data really just captures that individual behavior.
So as a data scientist, I feel like our job is, you know, data is a lens into human behaviors, but we don’t just stop there. The next step is we aggregate intelligence, and you can give that into the systems, into platforms, so that influences the crowd behavior. That’s the feedback loop that I think people have been talking about.
But coming back to your expert versus the crowd question, I can use the—I think that big data actually makes us tap into the wisdom of the crowd more. If you look at machine translation, the first generation of machine translation is expert-based, rule-based systems. So we need linguists and experts to design the rules, and the reason for that is we don’t have enough data, so we have to be smart with the data that we have and then create generalizations on top of it, and that is the reason for grammars and all those kinds of things. With the Web, we get a lot of data, and people are talking about using the Web as a corpus. Now we have the big data, we have a lot more data, and then you don’t have to rely on the experts that much, because then you can look at different algorithms and see how you can tap into the data itself. Google Translate, for example, is very statistical, it just looks at huge corpuses and tries to align different languages together.
So I will say they are complements, data and crowd complement each other, and they really help.
Morgan: So you’ve particularly worked in areas like look-alike modeling, where you take characteristics from, say, you know, what’s seen in a certain group and project it to others, sort of injecting a behavioral model into it. How do you see that in some ways maybe being at the intersection between the crowds, and why do you think those are valid models and tend to prove out?
Qu: Yeah, I think that’s the other—again, back to the data, it’s whether you have small data or big data. And look-alike is really addressing the small data problem. So when you first look at it, you know, for marketing, some advertisers want to target a particular segment of users, people who buy my diapers or people who buy beer. But in their dataset they may only have 5,000 users or 10,000 users, and that’s a very small dataset. And how can they run a campaign and leverage the other information about these users and then find more users like that?
So what we are doing there is really starting with a small dataset that is your target, and then you broaden that to other signals of human behavior and you find the representative profiles of these users and find more people like that.
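An illustrative aside on the look-alike idea described above, not part of the panel and not ShareThis's method: a minimal sketch, assuming each user is summarized by a small vector of behavioral signals (the three features and all values are hypothetical). It averages the seed users into a representative profile and ranks other users by cosine similarity to it.

```python
import math


def centroid(vectors):
    """Average the seed users' feature vectors into one representative profile."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lookalikes(seed_vectors, candidates, top_k):
    """Return the ids of the top_k candidates most similar to the seed profile."""
    profile = centroid(seed_vectors)
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(profile, item[1]),
        reverse=True,
    )
    return [user_id for user_id, _ in ranked[:top_k]]


# Hypothetical features: share of activity on [parenting, sports, cooking] content.
seed = [[0.9, 0.1, 0.3], [0.8, 0.0, 0.4], [1.0, 0.2, 0.2]]
candidates = {
    "u1": [0.7, 0.1, 0.3],
    "u2": [0.1, 0.9, 0.0],
    "u3": [0.9, 0.2, 0.1],
    "u4": [0.0, 0.1, 0.9],
}
top = lookalikes(seed, candidates, top_k=2)
print(top)
```

A centroid with cosine similarity is only one simple way to "find more people like that"; a production system would more likely train a classifier on the seed set against a background population.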
Morgan: Great. So, Walter, you’ve now had the benefit of everyone having covered all of the space and in total agreement here, so I would love your reaction to this topic.
De Brouwer: Well, I think that, as consumers or as the crowd, we have inherited from corporations and academia certain tools that are correlated to outcomes and prescriptions. You know, like I am going to the doctor, so he’s going to take my blood pressure. He takes a device from the 19th century, a stethoscope, and he combines that with a device from the 20th century, the oscillometric method, the cuff, and the metric he uses is millimeters of mercury, and in the end the guidelines from the 20th century say my systolic has to be 100, my diastolic less than 80. So I know what I’m going in for, I just go through the ritual. There is no serendipity. These tools have been made to find what they are looking for.
Now, enter big data. The big promise—surprise me. You know, we take—these tools no longer confined to outcomes—all the tools. Corporations, academia, put zettabytes of data into a big mixer in the cloud, and they are using mathematical models like unsupervised learning coming out of deep learning and, you know, support vector machines—and mathematical models are free because, you know, nobody knows how to sell them. So they are all put in the cloud, and then what happens? Well, unsupervised learning has to be curated, but there’s only one epic army that can curate that, and that’s the crowd. You know, Linus’s Law: given enough eyeballs, all bugs are shallow.
So that means that the crowd is going to become the weapon of mass curation of that whole big data thing there, and then they are going to surprise us, because I’m sure that my blood pressure has something to do with my credit score, with my cloud score, with my biometrics, and more than the carotid sounds that the doctor—you know, that’s the big promise, I find.
Morgan: So, James, as someone who certainly helped launch a lot of the thinking about how crowd-sourcing—you know, you wrote that book well before we had the kind of social media networks that we have today. What’s your perspective on—particularly to Walter’s point here, what’s your perspective on how this might come of age and how it might be a little bit different, or the same, than you anticipated?
Surowiecki: Well, I think—I actually think there are some challenges that social media and networks present. One of the premises of the book is that the best decision, group decisions actually emerge when people are diverse and independent. And independence is not a pure condition. I mean, let’s go back to the racetrack—I’m sorry I keep going back to it, but it’s a fascinating example. At the racetrack bettors are not purely independent. They can see what other people are doing, there are odds on the board, etcetera, etcetera. But relatively speaking, they have their own judgment about how likely a horse is to win or not, and the same is true—you know, in a stock market there are lots of different voices, but a lot of the times you can be—there are so many different voices that you can be relatively independent.
One of the challenges I think that social networks and social media present is that it’s actually easy for people to get locked into certain kinds of echo chambers, where instead of thinking for themselves they are mainly responding to what other people who sort of already agree with them are saying. And, you know, there is this famous idea from sociology that this guy Mark Granovetter came up with that weak ties are actually incredibly valuable rather than strong ties. I mean, strong ties are valuable in terms of your family and your close friends, but if you want to do things like get a job or whatever, weak ties can be very important. So people that you kind of know, you know, they may be two or three steps removed. And I actually think that the Web works best when people have lots of weak ties, when they are really keeping their minds and browsers, so to speak, open.
So one of my concerns about how the crowd works online, and the Web in particular, is how do we avoid the situation where you are just kind of listening to yourself? But I do think that there, size really matters, and I think some of the stuff people have already said about when you get to big, big data—and I think what Walter just said is right in that regard, that there, if the crowd is big enough, a lot of those concerns to some degree kind of fall away. And I think that—the way I think about it is actually not that different today than I did a decade ago, and that is that human beings—this sounds so trite, but I think it’s right. Human beings know a lot about the world, individually. We don’t—all of our views about the world are biased and flawed and limited, but collectively we have really good pictures of a lot of things, traffic certainly, to some degree health. You know, and so I think to the degree that you can aggregate large groups of people, you can actually get a pretty interesting picture in a lot of cases. And I think that that is—that’s why, again, I think that the opposition, to me it’s actually not so much an opposition. It’s really that technology offers you a much greater possibility of aggregating this stuff than you ever could before.
Kocoloski: So there is an interesting—kind of tying this back to some of the discussion yesterday, one of the first comments of the opening session was about, you know, is democracy failing us, because, you know, you get this sort of poor decision making, and contrasting that with the corporate boardroom, where you’ve got a clear directive to go make decisions, right? Those corporate boardrooms are very driven by data today, right? It’s a curated set of data that, you know, some expertise has decided—
Surowiecki: And politics.
Kocoloski: And politics. But if I was to decide whether it’s data or crowds—right?—it’s a sort of golden set of data that’s being used to drive the decisions at that boardroom. I’m wondering if there are—all the examples we’ve talked about have been very public examples, traffic patterns and racetracks and so on. But are there examples where we see this kind of crowd-based decision making leveraged effectively for internal decision making in an enterprise?
Surowiecki: Can I say something on that? I mean, I actually think this is one of the—if there is one area where I think too little progress has been made, it is actually in this regard. So I actually think companies are doing a better job in general of trying to access the knowledge of, say, their employees or their customers or the like. You know, IBM famously has these big jams basically, where they allow employees to participate. But I actually think that—you know, Lew Platt, who used to be CEO of Hewlett-Packard, once said, “If Hewlett-Packard knew what Hewlett-Packard knows we would be far more successful than we actually are.” And I think that’s still true today, that—so there are organizations that have used like internal prediction markets to try to do a better job of forecasting which products are most likely to succeed, or forecasting when software projects will launch, because that’s a question, when you ask people, “When are you going to be done with this?” it’s hard to get an honest answer, but if you ask the organization or the team—and the crowd doesn’t even have to be that big. It could just be the team, but it’s the crowd.
So I think that those are cases where you can use that. Now, I don’t think you are going to see—and there are organizations that have actually set up idea markets, so where pretty much anyone in the organization can offer up an idea and the crowd weighs in on it. And I think the results there have been very good. The problem there is that for the most part it’s still—there are not very many organizations that are really doing this really seriously.
Morgan: So to one of you two, isn’t one of the things we are saying here is that it’s less about the algorithm than finding ways to aggregate as much data as can be re-exposed to the potential users, or just not necessarily an analytical insight approach? Or take it whatever way you want to go.
Prabhakar: I think the thing is, the last comment that James is making is really the—when you are asking the crowd for something it’s usually, you get good answers if it’s usually a binary outcome, right? Who’s going to win the election, this or that, right? I think when you ask for something that requires just much more of a nuanced response, it actually then becomes hard to aggregate that information very clearly and effectively. You could give a multiple choice question, that’s easy. I mean, if you are just going to ask them to think aloud, right?
Morgan: But what about—let’s flip past that and say it’s less about what people declare than what they do, which isn’t nearly as binary, like a traffic pattern or a decision.
Prabhakar: I agree.
Morgan: So there, if you expose the information and then the human behavior interprets it—
Prabhakar: Yes, that’s right. And I think people are motivated to minimize their own commuting times and increase their own comfort levels, so they have chosen to do something against constraints, like you have to be somewhere at a certain time. And I think that is much more helping us in our sort of pursuit of going after traffic congestion. But this thing of like—you know, I was actually thinking of, paraphrasing what Adam said, is there a way, is there an example where you are actually better off trusting data versus the crowd, or the other way around? Okay?
And so “Moneyball” is an example I was just going to offer up, right? There is this whole industry called baseball recruiting that’s been around for a hundred years, and people have just got it down. I mean, not even the scouts, even the organizations and the fans, the sort of committed, deeply knowledgeable fans. And then out of nowhere comes this idea that data is actually more informative, right? I think that is an interesting case.
Morgan: Yeah, but isn’t it also—wouldn’t it be interesting that the people that were the stars of “Moneyball” were actually people who were providing data to fans who were starting to build fantasy teams probably counter to the way organized baseball had been?
Prabhakar: True, and so—but it makes the point that sometimes wisdom can be built upon a knowledge base that is in need of revision.
Morgan: Yeah, that’s a really interesting—Yan, so for someone who has worked in advertising, in the social sharing of media, there is certainly—that’s an industry that is driven by a lot of—they believe they know what’s right. I mean, do you find that issue, what Balaji is talking about here, that you are informing things that are intuitive or counterintuitive to what the crowds might have thought?
Qu: Well, I actually think that’s a good thing. So look at the crowd—there is a crowd and then there are crowds. And, you know, in social media we have a lot of things—people are sharing about what they had for breakfast every day, and then there are other things like the Arab Spring. So there are different things for different crowds. So what is important? And that is relative, relative to your audience, right? So I feel when we do algorithms we actually need to capture the variety, not just aggregate to find what is the most trending in social media, for example. You need to look at, you need to have finer segmentation, you need to have personalization so that you don’t just have everyone, you know, converge to this single belief. So I feel that needs to be built when they build platforms, when they design algorithms, we need to think about that.
Morgan: Great. So as we keep the conversation, I want the audience to be prepared to jump in at any point with questions, so raise your hand, but—oh, we’ve got one right here. Coming right behind you. And identify yourself when you speak.
Bonchek: Mark Bonchek with Shift Academy. My question is about kind of the missing third item, so data, crowd, and experts, and when does the expert curation need to come into it? What is the role change? And for me personally, I am very interested around learning and education, and right now we’ve had a model where the professor, the teacher, is the expert, the curator, the faculty member at the center of things. And how do we start—it seems like an interesting thought experiment to say like where does the crowd and data come into education, particularly higher education, executive education? How do we start to weave those things together into a different mix of those three elements all coming together?
De Brouwer: I was still at university when Ph.D. students couldn’t actually quote Wikipedia in their footnotes. It was completely forbidden. Now of course we can’t remember a Ph.D. anymore without Wikipedia quotes. Jimmy Wales said that the truth is arrived at when the consensus is no longer contested. So, you see, it’s like an automaton in mathematics, so it’s very, very simple, it’s a very simple state machine, but it gains complexity onto a level we don’t understand anymore. I think the crowd is growing up and it’s now probably at the level of an 18-year-old. It’s probably in its peak of finding bugs and correcting things, and I think in three years from now we will not be able to think of our civilization without the presence of that crowd. And I think we are underestimating it at the point today, because it can be a very forceful weapon, and it’s also—it also has purchasing power now.
And to come back to your question about experts, you know, when we were at university, as professors we also could not quote. But we were the crowd, but we were also the experts. We were in a superposition, because who do you think made all that stuff in Wikipedia? We guarded every word of our domain, and we knew the others who guarded the other words. So basically we are the crowd, we are the experts, we are prosumers, we are users, you know, and we are takers. So it’s all one big thing. But nowadays, the crowd has a voice, it’s no longer illiterate, it’s no longer the unwashed majority, so we can do stuff, and so we have the money to do, where corporations will not say yes. So that’s I think the big promise of this.
Kocoloski: You know, I think I’d take that question and answer it in a different domain that isn’t education specific, but just when you talk about data and the crowd and the experts, I think one of the challenges that we have today is when we’re analyzing the kind of really large corpuses of data that are possible in the modern environment, the tools that are required to do that analysis and the knowledge of the statistical methods is actually in the hands of a kind of select few, right, I mean this data scientist market. And so I think what we need to see is an evolution where the expertise is not really sort of expertise in munging data, but actually being able to take my specific domain expertise and be able to work directly with these large datasets. So commoditizing some of the special province of the data scientist and putting the ability to extract insight from these large crowd-sourced datasets into the hands of the individual knowledge workers who are actually better suited to make decisions based on that particular domain of data.
Bonchek: Can I ask a quick follow up to that?
Morgan: Yes. Wait, just get the mic so we can just keep it—
Bonchek: It’s a really interesting point you’re making. So do you think it’s more likely to come—that new generation of expert, is that more likely to be a data scientist who learns some of the subject matter, or a subject matter expert who learns the data science?
Kocoloski: I think the latter, I think honestly the latter, the subject matter expert who takes advantage of a more refined set of tools that encapsulate some of the best practices from the data science community. But, you know, we’re not going to have the data scientists specializing into the innumerable number of possible different domains of human knowledge.
Surowiecki: Can I—I mean, I actually think the “Moneyball” thing is interesting in that regard—and, again, when I think about the opposition here, I actually think another way to think about it is exactly what you’re talking about, which is data then becomes, or can become, an input into the crowd, what the crowd thinks, right? Individuals, that’s huge. So “Moneyball” is actually an interesting example, because “Moneyball” was a data-driven revolution. It was really about trying to remove to some degree individual biases from the process and actually look at, statistically speaking over time, what correlated, what kinds of numbers on the field actually correlated with wins, right? That was really what the basic issue was. You know, instead of the traditional statistics we’d relied on, batting average, etcetera, we would look at other numbers.
But what’s interesting is that what we’ve seen over the last say four or five years is actually a reincorporation of individual collective insight, based on individuals, into the baseball decision making process. So Billy Beane—there was an article about him in the “Wall Street Journal” this year, and he talked about how he actually is now using the wisdom of crowds at his organization to change the way they evaluate players. Because there are all these things that are—you know, in trying to figure out whether or not a young player is going to become a great player, you cannot just use data, I mean numbers. You actually have to evaluate them in some way, and he’s starting to use collective insight.
And another thing is, you know, another example is there is a great website, a guy who was one of the founding members of sabermetrics, which is the “Moneyball” thinking, and every year he basically polls the members of his—or not the members, but the readers of his blog and just asks them to evaluate players on how good their fielding is, because fielding is one of those things where data is still not quite caught up. And he is convinced, and I think quite accurately, that that probably gives you a better picture of who the best fielders of the league are than any other model. So I actually think that that’s a way where data complements the crowd.
Morgan: So here is something I’d be interested in your thoughts on, because this made me think. I had a conversation earlier this year with former Senator Bob Kerrey, and he was sort of—we were talking in a frustrated way about the intelligence community and challenges, and you look at something like that, that is a very sort of data-intensive thing. And what he said was he thinks that because of the wisdom of the crowds, they actually need to open up all of the data collection, that the U.S. Government and other governments will win if they actually can open up all the information. Because he said one of the biggest problems with the secrecy and limiting the number of people who can look at things, or that there are so few analysts, and what this says is that no matter how much data you have—and they may have more than anybody—it’s really about the analysis of it and being able to get multiple competing points of analysis.
De Brouwer: I think that’s a good point. I think you know, like Stewart Brand famously said that information wants to be free, I think now not only information wants to be filtered by the crowd, but information wants to be symmetrical, because otherwise we will never learn to play non-zero-sum games. I think that’s perhaps—you know, if you think of the information as a living organism, as Kevin Kelly has done with technology, so any living organism wants to reproduce itself, wants to be filtered, and it wants to be symmetrical. I think that’s—actually you inspired me on that thought. [LAUGHTER]
Morgan: We’ve got another question in there, and then—
Howard: Thank you, panel, for this topic. Dane Howard from eBay. One topic that—or project—is the Safecast, where they had volunteers that gave Geiger counters post-Fukushima. And the thing that is interesting to me is, you know, here you’ve got a volunteer that produced higher density and better accurate information on the radiation levels in Japan than the government could provide, but it required volunteers. So that’s a great cause to volunteer into or opt into when your livelihood is a part of it. So could you comment a little bit about whether or not the crowd, when they opt in, is better to help serve a cause, or if really the anonymity of the crowd is better to serve this purpose? Because you could argue both sides to create a mandate across the density versus volunteerism. So I would just be curious on the awareness of the crowd for what they are doing.
Prabhakar: There are a lot of examples of this kind that show up in air quality sensing, for example. There is no reason why every car does not have a small sensor that just goes on sensing air quality in different parts of the city. In fact, that—such a thing is already underway in some cities at an experimental level. You don’t need to know who is sensing. You’ve just got meters out there, okay? And people just do this because it costs them nothing, basically, some bits on the wire, or wirelessly, and it goes away. Where it comes down to knowingly doing it is when there is some danger or some risk to you, in that case, or there is some sort of—you know, you would rather not do it. Some people may not want to do it, and then you have to have the opt-in model.
I think there are cases where—like one of the things we have been doing is to reward people for making off-peak trips. Instead of congestion charging, you reward people for making off-peak trips, right? And there they have to let us know what their trips are. Okay, I am taking this trip, so—it’s like we have a frequent flyer program equivalent for public transit users, so I have to know what trips you make. You make a 10-kilometer trip you get 10 points. If you make it in the off-peak time you get 30 points. So now I have to reveal something about my travel, which makes people a little squeamish sometimes, but it’s an opt-in model, right? And people sign up, and at various times they sort of wait a couple of months and then see how their friends are doing and then they sign up.
So I think there is this—and through them we also understand how the system’s congestion levels are looking and what people are preferring to do, right? So this is something where you could do a lot of different types of sensing, because there are a lot of us out there in the world, and we could just be silently pulling stuff out and shipping back for someone to aggregate and look at.
Surowiecki: You know, I mean, the other thing is, I mean, this—I’m just going to make the obvious point, which is one of the I think remarkable things of the last decade—and this is really since, mostly since my book came out actually, and so I didn’t really write about it much in my book, but one of the things I think that is amazing about the last decade is just how willing people are to do work, real work, that is collectively beneficial, that they don’t really get much reward if any for individually, even in terms of recognition, and yet people are totally willing to do it. I mean, Wikipedia being the classic example in terms of people’s willingness to spend time and energy doing that, but there are myriad examples of it. Even something like Waze. I mean, why do people bother to type in that there is an accident? I mean, what’s the reason to do that really? But I do think, without trying to sound utopian or whatever—and there are obviously limits to this and obviously the more you expect people to do the harder it is to get them to do it if there is not reward—it is fascinating that there does seem to be this great willingness on the part of a lot of people to contribute to collectively beneficial projects. And you know, I think there is something that for a lot of us you do get some—I don’t know, whatever it is, an endorphin rush, a serotonin boost from helping on some level. And I do think one of the fascinating things from the last decade is how much value people are willing to create collectively—I mean, Scott Cook was talking about it earlier today, even though they are not getting paid for it at all, basically. So.
Kocoloski: Sure. Open source software is a huge example of that, right?
Surowiecki: Yeah. Open source, right.
Kocoloski: The degree to which people have invested in that is tremendous.
Morgan: Can we have a mic out front here, please?
Vander Auwera: Thanks. Peter Vander Auwera from Innotribe/SWIFT. It’s especially the comment that you make about Scott Cook’s comments, about this willingness of people to contribute digital assets without being rewarded for it. I think there is a big difference between contributing voluntarily to something like Wikipedia and contributing without knowing it to Amazon or Google or anything like this. So my question is basically about business models and how can we see an economic model where it’s in the sense of what Walter says, it’s about that data and the rewards have to be symmetric.
Surowiecki: It’s a great question.
De Brouwer: I’m not a politician, but—so there should be, of course, a sort of consumer bill of rights that has a sort of Miranda, you know, for everything we do, and re-identification should be outlawed.
Prabhakar: Sorry—can you—what does that mean, re-identify—?
De Brouwer: Well, if you de-identify information—
Prabhakar: Oh, de-identify.
De Brouwer: Yes, so that if you would reverse engineer it, so—which is not so difficult, actually, with big datasets.
Morgan: So you take someone who has been de-identified and taking them out of it, and then you find them again some other place and reconnect it back—
De Brouwer: Yes. I still have to see the first corporation that, you know, like de-identifies but of course can re-identify in a click of a switch.
Surowiecki: I mean, I would actually say that I think—you know, you can probably separate those things even further. So I would say [LAUGHS]—I would have to—you would have to—it would take some work to do this rigorously, but I would say Google, which in a way, if you think about the way a search engine worked, or works, is really harvesting the knowledge—I mean, I sort of think of it as a search engine is built on the wisdom of crowds, right? I mean it’s basically harvesting the links that people are making and the clicks and all that to determine which pages are most likely to have the information you want. That to me is a little different from what Facebook let’s say is doing, where so much of Facebook’s value—well, all of Facebook’s value really is really created by the content that people create on Facebook.
And in the second case, I actually think there is a better case to be made that the reward should be more symmetrical. I don’t know exactly how you’d frame the rewards there, but it is quite astounding when you think about how valuable Facebook has become, basically built on this uncompensated value that its members basically contribute to it.
Google it seems to me is more like there is this stuff happening and Google is just kind of saying, “This is what’s going on and we’re going to figure out how to monetize it.” Or not even monetize it, but, “We’re just going to—” But I don’t know though, maybe that’s even that distinction. But I do think there’s—I think you’re right, it’s a really interesting and fascinating—you know, you have a whole business built on uncompensated labor, basically. So.
Morgan: Over here?
Audience: So it’s been a fascinating panel. I’m wondering about the immediacy of the problems that you are addressing versus long-term problems. So I’m very interested in climate and energy problems, and it seems like—that a lot of decisions about climate and energy are made from an ideological basis, which is not data based, and the wisdom of a crowd seems to shift into one of two or three categories, which won’t solve the problem. Either it denies the problem to begin with or makes decisions based on some prior ideology from the 1970s that also won’t rise to the magnitude of solving the problem. So how can you disrupt that process with the wisdom of crowds or big data?
Prabhakar: I think it’s already happening. I think it’s—people are—as time has passed, the ideological points or views someone may hold just get eroded by what, you know, Adam was saying earlier, you don’t—physicists don’t ask each other for an opinion. It becomes just a fact-based subject. So that is happening. I think the acceleration of that could be using some of the, you know—the more people are just aligned with the facts, the faster it will happen. I think that’s one of the things that, you know, this whole how does an idea become viral, how does a fact become widely adopted, right? I think that’s something where we could use this technology of crowd-based sort of spreading of an idea. That’s my take on it.
Surowiecki: I mean, I’m maybe actually less optimistic than you, which maybe sounds funny, when it comes to this. And the reason is that I do think that one of the biggest challenges when it comes to collective decision making is that it works best when people have a fundamental agreement about what the problem is they are trying to solve. Not when they agree about what the right solution is. Actually I think the wisdom of crowds works best when people have divergent opinions, like at the race track or whatever, or in the stock market. But I think that the big problem when it comes to energy, at least in the United States, is that so much of it seems to reflect, as you said, a difference of values or whatever. And I also think obviously in the case of energy one of the big problems has nothing to do with the crowd, it has to do with the rigidity of our political system and the power of lobbies and of established interests. I mean, I think that has a huge amount to do with what people are saying.
The other issue that I think is interesting, and I don’t think there is a good answer for, is the question of time horizons. And I think it’s absolutely the case—but I don’t think this is true of crowds versus individuals. I think it is true of just about every—humans. You know, problems with long time horizons are difficult to get people to focus on and to implement solutions to that can be painful on the short term and beneficial in the long term. And when I think about climate change that to me is the biggest challenge, at least in the United States, that the cost of—I mean, the solution seems so obvious, right? I mean, and it’s one of those funny things where everyone kind of knows what the solution—like, we need a carbon tax, whatever. We know what we need to do. But I don’t think our political system is very well set up to inflict short-term pain in the interest of long-term gain. I just think it’s—so I wish I had an answer for how to fix that. I do think that’s the heart of the issue though.
Qu: Well, I feel like one reason is maybe we don’t have enough data. So if you look at some of the first beliefs in the past, at one time everyone believed that the earth was flat, and that was probably a rational decision based on the data that you had and the experience that people had. But when you get new data—Columbus traveled the world and proved that that belief was false—then immediately people’s rational beliefs can be changed. So I think this is maybe a similar problem but maybe a longer horizon. If we can think about what is the data that we can bring in that can refute some of the hypotheses, then that might be one way to solve this.
Surowiecki: Can I actually add one thing to that? I think that’s a fascinating and a great point, and one of the questions I’ve always wondered about when it comes to people—let’s say—I’m thinking of Congress, Republicans in Congress, who quite explicitly are not interested in doing anything about global warming. The question I’ve always had is do they know that they should be doing something about global warming but don’t want to because they are from a coal state or because it doesn’t suit whatever blah-blah-blah, or do they just genuinely not believe—you know, could more data convince them? And that I think is a really interesting question.
Morgan: Well, you know, and to add a little twist to that question, Adam, you know, a theme that I think we’ve been hearing all along is that opening up the data for analysis to more is certainly improving the predictability of it and certainly the response to it. As someone who is part of pioneering taking data mining into the cloud, what about the tools? Let’s stretch this just a little bit on the technical side. How are the tools used in data analysis going to be changing to make them more available to more people, which may theoretically, you know, even help build, you know, stronger political coalitions and—?
Kocoloski: Yeah, yeah, yeah. This certainly goes back—I mean, we’re not [LAUGHS], we’ve got no shortage of data these days, right? We’ve got volumes and volumes of data on many topics. And maybe in a few special, specific domains there is a shortage of data, but in general the data is there, right? It’s the ability to extract insights from it that’s the challenge. I mean, I can tell you on the IBM side there are—we are building tools that—one of them is called Watson Analytics, that will actually go and run a full spectrum of mathematical models on a particular dataset and then rank the results on interestingness, right? Sort of trying, like I said earlier, to just democratize a little bit of what the data scientist would do and get you very quickly to something that is relevant for you as a knowledge worker to say actually, “Okay, now I want to explore this particular model in greater detail,” right? To kind of shorten those life cycles of getting out—
Morgan: Can you imagine that—I mean, I’m thinking of Scott Cook’s discussion earlier, talking about he thought he had a business-to-business software company, and then it was a consumer, or consumer-to-business-to-business. Do you imagine that’s a consumer service or product at some point soon?
Kocoloski: Consumer—I mean, we certainly envision it as something that is sold directly to the line of business executives, right? It’s not the IT department that is the steward of all of the guarded insights from the corporate data site, it’s rather something that—
Morgan: But I’m sort of thinking about—there is that famous—I guess one of the XPRIZEs, the person that found the best remediation of the oil in the Gulf of Mexico that beat all of the university teams turned out, I think owned a tattoo parlor and—
Kocoloski: I think you will see a significant increase in a marketplace for insights. I mean, Walter mentioned that we don’t—no one’s figured out yet how to monetize the mathematical models. I actually think that’s coming, right? I actually think that, you know, these datasets are so huge, you don’t move them around, right? But these providers who have a lot of the datasets are going to be getting into the business of saying, “Actually, you think you’ve got a great model? Great. We’ll go run it and we’ll see how it ranks, and we’ll allow people to kind of fork that and refine that.”
Morgan: Yeah, well, I should also say, as someone who is working in an area where there is certainly an awful lot of economics put against solving problems, you know, and your vision, which you are very passionate about in bringing the information, the health information to everybody, do you imagine—I mean, can’t you imagine some models, the solutions that will be solved that will create a lot of value for the individual, you know, monetary or just, you know, societal?
De Brouwer: Yes, I think the—so I think the future of data is certainly user controlled, and when you give the user control then, you know, amazing things can happen. And I also don’t believe that the subject matter experts will—you know, because data is becoming a currency. It’s like money. So we’re all becoming data scientists, because we’re also all doing money, you know? So how it will work, nobody knows, so we are just trying things out, so the complete business models about data and apps and ecosystems and platforms, they are still being built.
Morgan: So with just under two minutes I’d love to just go down the whole line here. The one thing—you know, we didn’t have a polarization on the panel, which is good. That’s why I sat in the middle, you know, to see if this happened. [LAUGHTER] But what there does seem to be is a real polarization between what tools and the crowds can do relative to what experts—and I’d love everyone’s quick little ending on that, if they agree or disagree, and what might be a point they would add to that. So, Yan?
Qu: I always think it’s an ecosystem with technology, data, crowd, and you cannot separate them. And so they actually—it’s a feedback process. You can have the smartest algorithms, but still there needs to be—you need human insights into the results of the algorithms, and that could be the crowd and it could be, you know, experts in certain cases. And then the crowd would validate—clarify or validate your hypothesis and give you feedback. So it’s an ongoing process.
Kocoloski: Yeah, I think in some respects the experts are kind of a bottleneck in today’s sort of data-rich environment, right? And democratizing access to that and putting more people in a position to use their particular point of view, their particular private insight is really a goal that we need to get to.
Morgan: Walter?
De Brouwer: I think, you know, 20 years ago at MIT the students, 80 students were going to fly a plane together. They were all on their computers and the assumption was that it would fly without any error, because there was—there is in plane engines zero tolerance. The plane didn’t move, it was in paralysis, until one of the students shouted, “Lift!” and then it faultlessly took off. [LAUGHS]
Morgan: So I guess like the orchestra does need a conductor, no matter what.
Prabhakar: Actually I think that the more complex systems that we are building—and I’m now thinking of transportation systems—the role of data in the crowd is actually to surface the basic understanding of that system to the expert. The expert is not as well informed as they could be, the system is just too complex.
Kocoloski: I think you and I have different definitions of expert there. I meant the person who actually understood how that system worked, and you are talking about the person who is the transportation expert.
Prabhakar: That’s right. A decision maker, but also an expert who has institutional knowledge about how the systems are supposed to run, without the backing of data to really understand.
Morgan: And a separate kind, which is the leader, which is maybe the crowd—you know, the crowd information cannot be translated into action without some sort of leadership or organizational structure. So James, great to end with you on this.
Surowiecki: I would just say, just kind of what we’ve been talking about before, I mean, to me they really are complementary. I think that crowds are only as smart as the amount of information that’s in them, and to the extent that data expands that source of information, I actually have always liked the idea of organizations or models where you don’t actually need a person at the top, but I think the concrete reality is that what the crowd can really do is provide insights that decision makers and leaders can really act on. And then I think, you know—exactly right. Incredibly complex situations. I just think one person just, there’s no way to know enough basically. So.
Morgan: I thank you all very much for participating in this panel, and I’m sure we will also continue this conversation on our own in the hallways and over drinks at different times, so thank you very much.


Adam Kocoloski

IBM Distinguished Engineer and CTO of Information Management, IBM

Balaji Prabhakar

Chief Scientist and Co-founder, Urban Engines

Walter De Brouwer

Chief Executive Officer,

Yan Qu

Vice President of Data Science, ShareThis

Dave Morgan

Chief Executive Officer, Simulmedia, Inc.
