If elementary hydrogen is the building block of the physical universe, data is the stardust of insight and innovation. Where is the art of prediction headed? How can and should we apply the special chemistry and physics of data analysis? How do we need to change organizations and attitudes to benefit from the data revolution?
Read the full transcript below. (Transcript by Realtime Transcriptions.)
Chui: Thanks for coming. As you see, the title of the session is "21st Century Alchemy: Open the Data and Stir."
David Kirkpatrick asked me to never use the words "big data," but I did. So I have already violated his request. It is a topic that a lot of the audience members are fairly sophisticated about, perhaps at least as sophisticated as the people on the stage, so we will try to encourage a lot of dialogue with people not only on stage but out there in the Netherlands, otherwise known as the audience, which makes no sense. So the mic runners might stay busy today.
So real quickly, people have heard of big data. Yes? On the hype cycle, who thinks we are entering the trough of disillusionment? Just one? The panel all does. They are all disillusioned. Phase two. People think we are on the upswing on this? Anyone? On the upswing? Anyone think we are past the trough of disillusionment?
By way of introduction, my name is Michael Chui. I'm a partner at the McKinsey Global Institute, where I lead research around technology trends, including big data. We just published a piece on open data, et cetera.
I will let my panel introduce themselves just to get a sense of where they are coming from.
Bell: I'm Brooks Bell with Brooks Bell, and we do A/B testing for large companies, so we help decision‑makers kind of look to data more frequently to inform their decisions and actually test it before making a decision.
Messer: Stephen Messer with Collective[i]. We provide analytic applications around sales, marketing, customer support and service, and deliver them through a software-as-a-service approach.
Schmarzo: Bill Schmarzo. I'm with the EMC Consulting organization. I'm the CTO and have the best job inside of EMC: I get to spend all of my time with customers, figuring out how and where to start the big data journey. Great place to be.
Scott: I'm Walter Scott. If you have ever seen satellite imagery on your mobile device or web portal, you have seen our imagery. We have five satellites around the planet. We cover the earth about six times over every year, with daily revisit, so what you can do with that may be a topic of this panel.
Chui: Terrific. Why don't we get the conversation started with you, Brooks? One of the things we have observed in using data is that actually it enables a different way of making management decisions which is around experimentation. But that is not the way that most managers either were taught or tend to manage. So how does an organization—first of all, is that hard to do? And then how does an organization start to move along that journey towards viewing an organization as a laboratory?
Bell: As you mentioned, the standard way of managing an organization is kind of the "Highest Paid Person's Opinion," or HPPO. Testing and experimentation is a completely different way of approaching that. I don't know that it absolutely needs to be completely changed. I think there is a lot of value in the intuition of the CMO and CEO, because their good judgment did get them into that role. But I think what you lose the higher up you go is often a connection to your customer. Your strategy ends up being much more about your competitors and much less about your own customer. I think testing reconnects you to your end customer, and to the objectivity that you kind of lose with a lot of politics. As for what actually starts shifting their decision‑making toward experimentation, I think that, like anything else, they are finding out that their competitors are doing it and that it is a big competitive advantage right now. Even if they don't necessarily believe in it at first, we are seeing a lot of SVPs kind of secretly deciding, or maybe not so secretly deciding, that testing is becoming one of their strategic priorities. They start small with one product or one part of the business, seeing if they can generate success there and create demand. And then when they start seeing the success, of course it is much easier to get more buy‑in across the rest of the organization.
Chui: Can you tell me a story of where it has become a competitive differentiator and how it plays out competitively for that industry?
Bell: I think one area where testing has had a big impact internally and has shifted the thinking of management is at one of our big clients. We started working with a single individual who was a senior manager in retention, in marketing, and she had a little bit of budget but no head count. She hired us to start running some tests for her on the side, secret, sort of under‑the‑radar tests. And then she tested for four months with no one really knowing.
Chui: What was she testing?
Bell: One of our key metrics was feature activation, so she was testing some key pages that led to feature activation. She was in the retention group, and activation led to higher customer satisfaction and retention.
And so once she got this huge, breakthrough win, then that's when she announced it, and then she got more resources. Fast‑forward, this is now 18 months later, she has ten people reporting to her. She is now the voice of testing in the organization, and she is training—there is this incredible demand for testing, and she is training all these other groups; and she is now—she has been promoted to a director level and has been invited to like the CEO's grand summits and is now very influential, a big bright spot in the organization.
Chui: So that all makes sense because testing online is fairly easy, right? But we have companies like Ford represented here. We had Bill Ruh on stage saying, you know, can we really do tests on jet engines, and all that kind of thing. So can you say more about whether it is possible to apply A/B testing outside of online and mobile?
Bell: It's possible, but it's very difficult. I can't make a blanket statement, yes or no, because a lot of times it's no. But anything that you can measure and iterate on, you can test. It's just messier. If you have humans involved, with motivation, psychology, salespeople, it's just so much messier. It will take longer, and it has to have a much bigger upside to justify the investment in establishing that process. But you think about it like anyone does. I think homeless people are some of the best A/B testers, because some of what they write on their signs, I'm sure, drives more results than others.
Schmarzo: Yeah, I was going to speak to what I've seen in experimentation. I agree totally with Brooks. Fortunately, I have been involved in two separate situations where I was in a meeting where the senior executive said, "We are going to do this kind of a campaign," and then some person in the back corner raised their hand and said, "Well, the data says ...," basically risking future employment at that organization. The debate takes place.
There is a point when the organization says, let's try both, let's do both. In both cases there was this pause. You can see in the room, why can't we? When that moment happens, when ideas are out on the table and you can test ideas, whether it’s in marketing campaigns or how you stop fraud, when that happens, you unleash the organization. People now who were scared to share ideas now feel empowered to say, well, let's try this, let's try this. Right? So it's a liberating effect for the organization when they start to embrace this ability.
Chui: Let's pick up on that. You mentioned the people element and how they are doing it. And we've talked previously about people who have these skills, data scientists, et cetera. How do you think about that and how do your clients think about that?
Schmarzo: I'm not sure there is a shortage of people. There is a shortage of capabilities. In some cases we need to repurpose people who have been doing certain things. There's new tools, new techniques and such.
The thing that we find, though, is that we are able to focus organizations when we get them to think about what it is you are trying to accomplish on the business side.
If it becomes a technology chase discussion, then certainly the organization can auger into the ground, trying to figure out how many data scientists do we need? You know what? Let's be honest. That question is kind of immaterial until you really know what you're trying to accomplish. You might not need any. You may find that the people you have onboard are already working with your structured data using heavy SQL, but now instead of looking at 13 months of data, they want to look at 15 years of data, right? Well, you've got those people already.
The capabilities question is a hot topic, and there is no doubt that we're seeing organizations trying to advance people's capabilities. But it's really getting the cart ahead of the horse to start talking about capabilities until you figure out what you're trying to accomplish.
Chui: Stephen, your organization thinks you have at least one solution, which is software. Say more about what you think about the talent challenge and the way you think about solving it.
Messer: Collective[i] sort of looks at the world a little bit differently, which is, we say we've lived in the land of tools, technologies and platforms forever, and it sucks. It sucks because though the company is expected to know what problems they want to solve, they tend not to. They have biases that preclude them from actually doing the right thing. When they go about trying to buy and integrate all this software, they are asking the CTO or the CIO to make these decisions for business organizations that by the time the CIO makes the purchases, the wants for the business have changed. There's just all these things that are going on.
And so as all technology paradigms have developed, what you see is applications sort of come in to fill that void. Let's just take CRM as an example. Originally everyone started building their own little CRM systems. And today we look back and we say, oh, I feel bad for those companies, because that's their anchor that's preventing them from moving forward, all these horrible things.
We see the same happening in the data science area, which is everybody last year kept saying we want unstructured data. Why do you care? That's your problem. Our job is to figure out the questions that you need to solve that will give you some advantage in the marketplace. Just pure data, some kind of information that brings that data to a higher level or you need to coordinate to create some kind of knowledge that gives you an advantage.
There are a lot of different technologies. Let us deal with that. Our job is to abstract all that away, because we think the likelihood of companies providing that ... well, I mean, we have had five years of this big data thing, which is really a continuation of 30 years of BI, and people are still making gut decisions, so that doesn't feel like we were on a good path. So our goal is to try to change that.
Chui: So there was a question about how do you integrate data for multiple sources? Is that something that you want to ask?
Messer: Our argument is that you shouldn't care. What happens is people start off with this premise, and they say, okay, we have all these different sources. And boy, if we had all these sources in a master data warehouse, or all these sources in one place, that's great. And I think this is an example of where people struggle with the data science. What they are trying to figure out is how do they take the exhaust from their business and turn that into gold. So try to take this analogy of the data being the exhaust coming out of your car. You buy gasoline, you put it in your car, it drives you somewhere. Then you are producing all this carbon dioxide. You're saying, God, it costs a lot of money to store it, it's going into the atmosphere, we're going to die from it eventually. These things are growing at a cost. Everyone is thinking, boy, if I could turn that into something valuable. So everyone fixates on saying I have more data or I can do something better with that data.
In the data science world, you usually start off with what question would actually have a big impact on the business? That may need some of your data. Oftentimes it doesn't require any of your data. Sometimes you get a much better answer, a more specific, much more valuable answer, and I may take almost nothing of your data. And that's the difference between someone who's into the data sciences and someone who is inside of the business, which is, I can create a higher value without having to worry about trying to figure out how I'm going to integrate 15 sources of unclean data, try to map it with Facebook because I think I should, or decide weather is going to be the impact of everything so I better have weather data. You start off with these things saying I've got something, it has to be worth something. And I think that's a fallacy that is probably the core problem of why everyone is trying to store every piece of data, why everyone thinks of big data as, the more I have, the better it is, which is actually completely opposite, and why people always fail.
Bell: I think people forget the vast majority of data is noise.
Messer: It sucks. It costs a lot of money to store. It costs a lot of money to clean. It's usually useless.
Chui: That's why I sometimes think the signal is in the noise, but we will get back to that. You can find the signals and focus in on that. But collecting everything and saying it's worthwhile just because there is some signal there is like saying we need to get a lot of hay because we need needles.
Bell: I think a machine is not going to spit out the exact signals you need. I think that's the hope of big data, but I don't think it's realistic. I think a signal might be in there, but the only way you are going to find it is to know exactly what you are looking for and to have a very talented person who knows exactly the right question and then the way to translate that question into a query of the data to start looking for those insights. It's just hard. That skill is just very difficult.
Chui: Go ahead.
Messer: In your statement there was a supposition, which was, if you do it, there is value in doing it. And my argument is not that there is not value, at all. In fact, I think this is one of the more revolutionary things that are going on, one that is so impactful for people's lives in a positive way. My argument is more that most people have a day‑to‑day job to do. They don't have the time to think about how you do data science right, and so they start with certain suppositions; they're not experts in stats, they're not experts in data. And what I worry about is that we have another five years like the last five years, where people have gotten a free pass to buy a lot of hardware and software, and that hardware and software is now going to old problems. Hadoop is being used to solve what used to be roll‑ups in databases. They are solving these old problems, and they are doing a good job, but they are not getting what people had hoped for. And they are going to walk away. I lived through the dot‑com boom, through the boom/bust cycle, and just when everyone said in 2000 this thing was over was just as it was really getting going, and it caused the decimation of a lot of companies that are recreated now in the new world, you know, the Facebooks and things like that. But I worry that what will happen in this industry is a lot of people will walk away, at a revolutionary moment in our time when people are voluntarily giving up information to get a value back, and say, you know what, it was all for naught.
Chui: We need to get Walter involved here.
Scott: This is interesting.
Schmarzo: I was going to challenge a point that Stephen made. You didn't make it harshly, so I'm not really picking a fight, I'm just trying to make it exciting here. There was a comment made that the users don't know what they want. And I'm going to say, pardon my French, bullshit. Every business user I talked to, and I probably talk to businesses, probably five to ten of them a week, and every business user I talk to knows what decisions they're trying to make. And they know what questions they're trying to answer in support of that decision. What they don't know, and where we have failed I think on the big data side of things, is we've not helped them understand how they can answer those questions in a more timely, more accurate, more fulfilling manner.
Let me give you an example. So I'm dealing with a trucking company in Kansas City, Missouri, the heartland of America. Actually, Iowa is the heartland, but we will give Kansas City its due here. And when I was meeting with the executive team, I asked them, what is your most important business process? There were six of them in the room, and they all almost said in a chorus, "on time pickup." If we don't pick it up on time, the profitability of that route goes to zero.
So I said okay, so what would you need to know in order to improve on‑time pickup?
They said, well, obviously traffic is pretty important and the weather.
I said, what about events? What about if the Chiefs are playing in town, or the Royals are playing in town, or there's a concert in town? Would that be important? Of course that would be important. I said, well, you could scrape that from Eventbrite.
Another person said trucks break down, and when that happens it really screws us, so knowing the predictability of our trucks matters. And by the way, certain drivers are better than other drivers. What was interesting was, we started with their most important business process, which for them was on‑time pickup. They knew what question they needed to ask, but what they didn't know about was all these different data sources out there and all this technology capable of helping them answer that question in a more accurate and more timely manner.
Scott: You gave an example of the use case where a question drives and asks for pieces of data. It's great if that data exists. If that data doesn't exist, you are kind of out of luck. I will come back to the orthogonality later.
Moderator: I love that point. Let's come back to—your company produces a pretty interesting set of data. Say more about how that is available. I see cool pictures on the web, but ...
Scott: So a picture is worth a thousand words. Why is that? Because unlike the other data sources that are out there, which are in some sense instrumentations, instrumentations of what devices are doing, what a person is saying or doing, satellite imagery is data of the white space between those measurements. So, you know, we collect the earth six times over, on average, some parts more interesting than others, and that means that in our image database, we have every rock, every tree, every house, every car, every boat, pick your favorite object; it's in the database. Some of those things are things that nobody thought to measure, like where are the tires that could contain water that is a breeding ground for mosquitoes? Or, and this is something that I learned from the panel this morning, where are the shrubs that are the habitat for deer that are the source of Lyme fever, or the ticks that produce or carry Lyme fever?
So having that database of not just the points that you think are important, but all the white space between, means that you have an almost endless opportunity to mine. And we're talking 20, 30 petabytes worth of data, the imagery, and the stackable areas you can produce from that is as big as you can choose to produce.
Chui: So your company actually has satellites?
Scott: We own five satellites, two on the ground, five on orbit, owned and operated, flying 24/7.
Chui: Who are your customers? What do they do? Are they looking for Lyme disease?
Scott: All over the map. We sell to government, to pretty much every web portal on the planet, location‑based services, oil and gas companies. We do a lot of work with humanitarian organizations. I don't know about Lyme disease per se, but tracking malarial indicators, following refugee camps around the world. And something very topical: we crowd‑source imagery that is collected after natural disasters and make the updated map available quickly to first responders and people in the affected areas. So, for example, with the typhoon that has been devastating the Philippines, we crowd‑sourced to give an updated map, because the map is useless if it is out of date, and it gets out of date really fast after a disaster.
Chui: You described it as a way of instrumenting the globe, and I think we are increasingly seeing more instrumentation, whether it is remote sensing or sensing things throughout the physical world. I'm curious what you think about the future of that, how that evolves, and the question about how you avoid being tracked, given that, I mean, your birds are flying over us right now. Right?
Scott: As far as avoiding being tracked by our birds, stay indoors. Seriously, there are so many different ways of instrumenting the planet that are exploding, really faster than any of us can keep track of. The number of images that are uploaded to photo‑sharing sites from hand‑held devices is growing exponentially. Then there is the introduction of drones; once you get past the public policy issues that are associated with that, those are proliferating. You have satellites. Ours are probably the highest resolution that are up there, but there is an increasing number of satellites circling the globe, not just collecting pictures but actually collecting data that is not what the human eye can see but tells you things that are invisible to the human eye. We're launching a satellite next year that has, and this is maybe too geeky, 27 spectral bands, so it has the ability to see what kind of rock you are looking at. I mean, rocks basically look brown to the human eye, but with that satellite you can tell the kind of mineral you are looking at.
Chui: How much does it cost, by the way?
Scott: Too much, too much.
Chui: That's good. I mean, it's interesting. We have been talking about big data analytics, arguably for decades but certainly the recent iteration for a few years now, and unfortunately I have been part of that hype cycle. A good question is to just sort of take stock now. We asked the crowd a little bit about where they thought we were in the cycle. Where do you think we are in terms of reality? What do you think has changed in terms of the opinion over the past few years as we have had the experience with customers and others using data? You know, just realistically, where are we now and where do we head? How have things changed over the past few years? Brooks, maybe you would be willing to start.
Bell: I think big data is a PR thing. That word, big data, is catchy, and as a result, everyone is talking about it. But I don't think that much has really changed. I think we do open these new ways of collecting data, and way more data, so it feels intuitively very exciting, and I think it is conceptually very exciting, but that's not really what's helpful. I think it's more that we are where we have always been: we don't have enough analysts, and we are not even good at looking at any data to make our decisions. If we just started using data at all, if we had more people who even knew how to use Excel, that would be a great step in the right direction. I think we are still there, and that's where we have always been. There is more opportunity and way more demand going forward just because there will be a lot more data out there, whether it's big or small or whatever, but I think it's going to be held back by the number of people who have even the basic skills.
Chui: So, loaded question, do you think we are teaching too much calculus and not enough stats?
Chui: When is the last time you took an integral?
Bell: I took statistics in my undergraduate degree in college, and my basic stats class is really the number one class that I learned something memorable from, and it was just a standard type of class, that stats class. I think statistics is helpful for anyone, whether you are in analytics or really any other profession, because it helps you understand the nuances of how we think of knowledge and it makes you more skeptical of everything.
Chui: Stephen, where do you think we are now?
Messer: You couch everything in a simple question like where are we at? I will try my best. I think the hard part about making that call is, the wins reset the clock. What I mean by that is, everyone has this vision of the disruptive ability behind big data and what it can do. And when you see something like a Netflix come out and say, we bought this program, House of Cards, and we were able to do it completely based on using data to make that judgment, and it was a huge win, despite the industry saying they had no idea what they were doing, they had no experience, et cetera, and they were able to buy three seasons, all of a sudden everyone looks back and goes, oh my f'ing god, we are in trouble. Most people are not afraid of big data but of that big disruption that is coming. You have this company that just basically distributed videos over the Internet, didn't have a lot of videos, and now it basically has become a major producer and now is a TV channel in itself. People look at that and say, I hope that's not me. They wrap themselves in the shroud of their business and it gives them comfort to say, that can't be me. But the reality is, that is the risk. When we have more wins like that, where Netflix comes out and does that and it's amazing, or someone else does that, it resets that clock for a year or two. So it's very hard to figure out where we are at in that cycle. I think newspapers are probably a better indicator. They are the signal‑to‑noise.
Bell: I think Netflix, just a couple years ago, used data to drive that decision about Qwikster, which was a total disaster. It seemed like the most reckless thing you could ever do. They lost 800,000 subscribers from that.
Schmarzo: That's a good point, that many organizations, especially older‑feeling organizations, are trying to use data to protect their current position. But that wasn't the question. The question, the one I took away, is where are we today? I would hate to be a CIO. I think that has to be the most thankless job out there, because the technology underpinnings are just constantly shifting. I am going to use Hadoop. Really? Which version, which vendor? What are you going to put alongside of it as far as componentry? Which HBase, which Hive, which MapReduce? Here comes YARN! So it's the total.
At the risk of getting people in my company pissed off, I think the technology vendors are really to blame here. We keep changing the underpinnings so much that it's like a magician: we distract people from the important stuff so they watch this hand movement over here. So I think from a technology perspective, while the technologies are definitely getting more enterprise and there's more marketability and capability, it's really hard to pick and choose a winner.
On the other hand, I am seeing organizations that are starting to think about it from a business perspective. I've got a business need, I'm trying to solve this need, I might have some data, there might be some data out there. It's young companies, but not young in age; young in approach. These are companies that maybe have gotten a wake‑up call. Maybe Netflix has slapped them across the face, or they are facing a near‑death situation, and they are willing to do things differently. So on the business side I am starting to see organizations creep up and attack this thing, but the technology things underneath them really have muddled things up.
Scott: Two thoughts. I said earlier we are in phase two. We are past big data; we are into enormous data, enormous data sources, well beyond the ability of a single organization to manage by themselves. I think, Stephen, to some of the points that you were making, that most of the data you need is often not data that you have created yourself but it's data you have gotten from a variety of other sources.
Second thread is orthogonality, the idea that you have different ways of making a measurement that allow you to compensate for the errors in one by having a completely different way of making a measurement. The example I use: let's say you want to know how much traffic there is going along a particular road. Well, you can measure the mobile devices that happen to be going along that road. What that will tell you is not the traffic flow; it tells you the traffic flow of those mobile devices, which isn't necessarily the same as the amount of traffic that's moving. So if you have a different way of measuring the traffic, whether it's the old technology way of the little rubber, pneumatic things across the road, or observation from above, or somebody sitting there clicking with a little clicker by the side, it's an orthogonal way you can calibrate the measurement that you made from the instrumented source against something else. I think as the data sources proliferate, the ability to make those kinds of orthogonal measurements will become increasingly important so that you're not chasing the noise, where you think you're measuring something, but you're not.
Schmarzo: May I speak to that point? I love that orthogonal. It's spot on. I'm a big baseball fan. My son is a professional baseball player, so I am forced to be a baseball fan even if I didn't like it. I am sure most of you have read the book, "Moneyball". The movie is interesting, but the book is great.
Scott: Loved the book.
Schmarzo: What is interesting about "Moneyball" is, how do you look at different metrics that may be better predictors of performance? One of the metrics that is constantly gamed in the baseball field is fielding percentage. Fielding percentage measures how successful you are at catching a ball that comes at you. If you go after the ball and you don't catch it, they measure that. It measures how many errors you have. Well, outfielders, some of the Chicago Cubs who will go unnamed, have figured out that if you don't go after a ball that is outside your comfort zone, your percentage goes up, so don't try for balls outside your range. That will benefit individual players but not the team.
So what has baseball done? Baseball has different video cameras, and they can videotape your effective fielding range, how much space you can cover going for the ball, and that's why Derek Jeter, for someone going for the ball, his effective range is still phenomenal for a guy his age. So that orthogonal data, that allows you to make sense, but it doesn't—this is a different perspective. We will probably get better. We will probably add time dimensions to it to figure out how fast they move, but the metrics are constantly being challenged by this orthogonal data that is out there.
Chui: I think that makes a lot of sense it is not only important to use data but to not suck at it. It is good to use that. We have talked about where we are now. I am curious about looking forward. People have talked a lot about predictive analytics and the fact that the world is increasingly instrumented, all the things we will be able to look at and predict, et cetera. What does a society look like where the use of analytics and alchemy is everywhere and that has become table stakes? What does it look like?
Bell: I started to read "The Circle" by Dave Eggers on the flight over here. Has anyone read this? It's a fascinating book which basically answers what it would look like if all data sources were integrated and there was data everywhere, and what that would do for society. You know, I was thinking, you know, conceptually, what would that be like? Just how people would interact with each other. It's a fascinating look at the logical extreme of what it would look like with no privacy and total transparency and no social taboos and everything is measurable and you can find any piece of data effortlessly. And it's a great thought piece, because there are some really exciting, powerful things in terms of social acceptance that have come along with that, but a lot of really scary things as well.
Chui: So on balance, what would you say?
Bell: On balance, wait, what?
Chui: What does the world look like? Or shall we just read the book?
Bell: You should just read the book, but ...
Unidentified: Or see "Minority Report." You think Skynet?
Bell: It's kind of like a Skynet kind of thing, but I think that overall, we all want like a simple answer, but it wouldn't be simple; it would be a completely different paradigm. There are a lot of new pros and cons. I think that is how data is. I can't give you a simple answer.
Messer: May I start by saying I think humans are amazing? They are fantastic, so you have nothing to fear from predictive analytics. Predictive analytics is there for the most part to be a guide to help you. The human doesn't go away. Especially when you are talking human to human. Whatever we want to say about how good we will get with predictive analytics, you as a human on the other end of it have to make that judgment call. It is giving you a probability of success, but in statistics, especially in predictive analytics, they say, you know, all models are wrong, some are good or useful. And I think that is it. Which is, they are there to be hopefully a nudge to say we think this is probably a good decision but please use your best judgment. I think when you use it in conjunction with humans, you do have the ability to sort of up the play. I liken it to better Nikes when you are running. You still have to be a good runner, but Nikes can help a lot. At least they can make you look better when you run, so it doesn't hurt there.
Schmarzo: I don't know. I'm not sure that, you know, when we turn everything over to the machines, then Skynet does become a reality. People do know what Skynet is, right? Terminator? Right? Thank you.
I had an interesting conversation over lunch today when we were talking about the recent financial collapse and how a lot of that was predicated on some dubious human behaviors but also some bad models that were developed but not thoroughly tested. And humans have to be involved in developing and constantly testing and validating models. There needs to be somebody in there. When I was at Procter & Gamble in the 1980s, the head of our research group said don't challenge models, challenge the assumptions. Apply your techniques to assumptions, and make sure your assumptions aren't false. We have to be, as humans, constantly in that process of monitoring and keeping involved so we won't have to be a world that is purely predictive analytics, because humans will be in there hopefully making sure that the results we are coming up with both make sense and are things that we should be doing.
Messer: May I just add, I think the Skynet discussion comes up around big data partly because of its brand and name. In the discussion of why we are targeting people, things like that, it has a negative connotation. As a dystopian reader, it's easy to fall into that behavior, where it is going to end badly. But I don't think that is what a lot of this stuff is about. The types of data that people are collecting, we are not gaining insight into your thoughts. We may be gaining insight into some of your patterns and some of the way you have acted, but it's not like we are in your head. It's not like we are trying to, you know, enter into your private lives. In fact, a lot of this stuff we will use is not stuff that is sensitive, because we are sensitive to it. And also, the other thing I would highlight is, there is an economic disadvantage to using all of that personal day-to-day, minute-by-minute stuff, which is that it costs a lot to process and store, and there are limits to what you can do with it. So this may come as a shock to users in the audience, but I will pay more money for smaller amounts of data that will give me the same signal or better, a lot more, because it is a lot better, it is cheaper, easier to run in the process. When you use the word "big data" it means you are capturing everything about me. While the name "big data" is good in the sense of popularizing everything we do, the bad is that it leads people to think you are watching me all the time. It is like the refrigerator is watching me, it's going to call me fat and send it to my health care provider, who will up my fee because I eat food from my refrigerator.
Schmarzo: Just as long as the refrigerator doesn't lock itself. The physical effect is a little bit scary.
Messer: But that's not how this industry works, despite the mythology. A lot of people who'd be on this stage think mythology exists out there. It makes it like you are the wizard, you know everything that is going on, you are perfect, you are not needed anymore; just do what we say, everything will be okay. I really don't see that world coming to fruition for—but when it does, we probably won't exist anymore, so I don't think we have to worry about it for now.
Scott: I will pick up a few threads. One is the buffering effect in human behavior: if you build a predictive model, and the predictive model was built at a time when people didn't have access to the predictive model, then when people have the ability to transparently look at the predictive model, it's going to change. So you will always be evolving in a world of predictive analytics. And my belief is you will never be ahead of the game, but if you come close to catching up you will be doing pretty well.
Stephen, you made a point that is a pretty deep one, which is a small amount of data that indicates the pattern is a lot more valuable than a big piece. However, finding that—it's like, I know 5 percent of the data is the most valuable, but I don't know which 5 it is. So finding that requires you to look at the complete universe of data to be able to detect that, you know what, this small piece of data over here is the nugget that I need to answer my problem. So one of the values of predictive analytics will actually be making us a lot smarter about what data is really important.
Chui: So there was this question about—oh?
Schmarzo: On a point that Stephen was making, it's weird that you are from New York and I am from Iowa and I am taking this position, and that is, I think that organizations are doing—I don't want to call it "evil stuff" with our data, but they are definitely making assumptions about us and not always using—
Messer: Is the assumption about the data that I am from New York and therefore that I am evil?
Schmarzo: And I am from Iowa, so I am naive. I am sure we all know the Target story, where Target, because of this girl's web behaviors, ascertained and made a score that she was probably pregnant and started showing her display ads for diapers and cribs and I believe actually sent some physical coupons to the house, and the dad saw this, freaked out, went down to the local Target store, read them the riot act, the manager there, and then, of course, two weeks later finds out the daughter is pregnant though she is only 16 years old. I wasn't involved in that, but if Target could figure out she was pregnant, I think Target also figured out the high propensity that she was also 16 years old.
Scott: That was not big data. That was traditional, past buying behavior.
Schmarzo: That's right.
Scott: So there is nothing other than what you would have done with your discount card that you have been using for 20 years.
Schmarzo: They did use web logs. Web logs is a little beyond most BI tools. But it still is very traditional; I totally agree.
Scott: That is a fair point, which in that case they are trying to provide their customer a better service. Whether they had thought through the let's figure out if someone is pregnant who is underage, they probably hadn't figured that out.
Schmarzo: Just because you know something doesn't mean you should act on it.
Messer: Well, it was still a useful coupon.
Schmarzo: If she was pregnant, it probably was. Maybe bad for the Target brand.
Bell: What are the chances it was just a coincidence? I think we give Target way too much credit for how intentional they were.
Scott: I do think you can pick up patterns or behaviors, like passions for affiliations or associations from whatever they do on the web and what they post on media sites. She was probably looking at diaper sites on the Target sites, maybe even across other sites. But—
Messer: Brooks, is your point that big data is not about anecdotes but about statistics? So you wouldn't be able to say based on a single story that Target really had figured something out, but if you saw a statistical pattern that said by golly there is a strong correlation between this particular pattern of behavior and by golly, you are really pregnant, then you might be able to say, okay, yeah, they really do have something.
Scott: I mean, it was sold as a big data story, but it was all internal data.
Messer: Oh, I totally get that.
Scott: That is sort of the example of evil that says everyone thinks Target must have gone out to Facebook where she told her friends and they are spying on her. No, she just went and bought something through their Web site.
Schmarzo: She didn't buy anything. She just looked at it.
Scott: But she went through the pattern of looking at specific things that typically indicate someone who is pregnant. I think there were scents, moisturizers. They sent a coupon. I'm sorry, when I hear big data risks, I worry about like the NSA. I don't worry about Target, like Target being with the coupon bomb. That's okay. Like they go in my spam folder. It will go away. That's why I was saying, we as sellers of products around data, we kind of like this mysticism, everyone thinking we will know more than everyone else. But there is a deep limit to what they are doing. We are not invading people's privacy; in fact, no human being is looking at this stuff. It is typically very public stuff. You can find out a lot about people: hang out with them and you know about them. They are not disguising stuff in their head or back in their home.
Chui: So this question, how can you use this 21st century alchemy to empower people as opposed to targeting them? Any thoughts there?
Scott: Transparency. The more information people have to inform their decisions generally is a good thing locally. The Target example may have had short-term negative consequences, but in aggregate, the more we know, generally the better things are. There was a panel I was on a few years ago, NASA's 50th anniversary, and the topic of privacy came up. It always used to come up with satellite imaging companies until people realized their cellphone cameras are way more invasive of privacy than satellites in space. The question was, what is really the problem with invasion of privacy? And one of the members of the audience suggested it's the asymmetry, that if I know something about you and you don't know the same thing about me, then it's creepy. But if we all know the same thing about each other, it doesn't become quite as creepy anymore. You know, if we had the expectation that we were not being videotaped here and this is a private conversation among all the people in this room, and we suddenly discovered afterwards, you know, wow, lots of other people, maybe three or four even, are watching this, that would be creepy. But knowing about it up front, it's not so creepy. I believe if we also have the ability to watch those people. We can watch everybody in this room. We can't necessarily watch the people who are sitting at the other end of the video. If we could, does that become less creepy?
Chui: So with that in mind, I definitely invite anyone who has a question. Jump in.
Attendee: Jody Wesby (phn). I have a real problem with your saying we are not invading anyone's privacy. I have chaired the American Bar Association's Privacy and Computer Crime Committee for about ten years now, and when people go to a Web site, walk in a store, walk down the street, whether they look at something, open a book, listen to a tune to see if they like it, that is within the realm of their privacy. And to think that just because people are on a Web site instead of walking in a store, that that has no privacy associated with it, just shows how wrong business is headed today. The fact that someone can look at a Web site, or maybe this woman could purchase something, this young woman, and that suddenly is not deemed within the realm of someone's privacy? Please rethink that.
Messer: Look, obviously, you are differing with what I said. I disagree but I respect your opinion. I would put more weight on your opinion if people were more upset with what is going on with the government today. Privacy started off with the fear of what governments would do with that data. Today we are talking about cookies, we are talking about coupons. That is a horrible warping and a disservice to the privacy cause. I am no bigger advocate of privacy, but if you want to fight the battle, fight the first big battle, the guys that will go invade your house because they don't like your opinion. You do that, I will get behind you.
It is actually a global view. I think governments should fear their citizens, not the other way around.
Attendee: You know what? Americans today, you are right about this, do not understand big data. But they will. And the learning curve is going to be steep, and the impact on businesses, once they get it, they are going to understand that what the NSA wants to call "metadata" is legally protected pen register and trap and trace data under Title 18 of the U.S. Code, and then they are going to suddenly understand that this data means a lot more than "metadata," quote, unquote. So when that learning curve goes up and they understand big data, that will be a big difference. And there's a big difference in the U.S. in how we view this. We don't want government to have stuff, but people don't care what companies have. In the EU, they don't care what the government has, they don't want companies to have this. So it's a very different view in different companies in different countries, with different ages, with different ethnicity groups. So it is not one size fits all, but there are certain human actions that we can say no matter where you are, that maybe those things don't need to be, in a democratic society, based on our values, something that is open for everyone to grab, and run it through an algorithm, and sell it.
Chui: You raised an interesting point that was asked before: how do you think about different regulatory regimes, where mores are different? How do you think about that question? Any thoughts from the panel?
Scott: I will start. This is something that the U.N. had to deal with a number of years ago in developing an open skies regime around the idea that there are certain kinds of data that it's not within the realm of any particular nation to regulate the ability to be observed from space. And some of that really goes back to the Cold War and the use of satellite imagery as a way of keeping the Cold War cold, because it enabled people to act on the basis of facts, not on the basis of fears. Where we have to be more sensitive is not so much with regard to the collection, but with regard to the different laws around the world regarding the dissemination of that data. There's still significant divergence from country to country relative to what countries consider mapping data to be sensitive versus what countries consider mapping data to be something that's out in the open. And you know, that's probably more the regulatory environment, less about the collection. It's more about the use.
Attendee: [Inaudible] beyond the regulation. Part of what Jody is hinting at is the idea of the morality behind it. I think that is an interesting debate, beyond what a government tells us is right or wrong.
Schmarzo: So let me have my take on this. I voluntarily give up a lot of my information to sites, from cookies, because I get valuable information back from it, like recommendations from Netflix and Pandora, things like that. As a person, I'm willing to give up some of my data in return for value back. I will continue to give data to companies as long as they use that data to my advantage. And as I talk with companies, by the way, this is a really tricky area they run into: just because you know something about a customer doesn't necessarily mean you should act on that. So when I work with our clients, I'm constantly trying to reinforce to them that you need to be looking out for the customer. So dealing with a large grocery chain and they are looking at delivering through the mobile device recommendations in the store, products you should have, right? Now, if the grocery chain had its way, it would push you totally to their private label products. Better margins. But I don't buy private label products, and the minute they push a private label product at me, I will go shopping somewhere else. So I am willing to give up information about myself. I probably have ten customer loyalty cards. I am giving organizations information about me but I expect in return for that they are looking out for my best interest. Again, I am from Iowa, so I am a bit naive, but as soon as I see that being violated I will vote with my feet and my pocketbook.
Chui: Interesting. We did a study recently on the economic potential of open data, when the data becomes more liquid and shared, sometimes with consumers, sometimes with companies: over $3 trillion annually of potential impact. More than 50 percent of that impact actually gets captured by customers and consumers themselves, so it's a little bit where you realize that unless consumers get some benefit, a lot of these levers don't make sense.
Bell: Some data is more invasive than others. Looking at it as a blanket statement simplifies it too much. I also think that there are—there is definitely—I think most companies are very aware of it. I have spoken with Target and asked them about their—I talked with the CMO at Target this summer. And he said, we have lots of data, but we actually intentionally don't use the vast majority of it, because we have—it is a very sensitive thing internally to be very, very sensitive about our client, our customers' privacy.
So they think about it very intentionally, and I think that—I think that—I think there's a lot of judgment involved, but I think it is, you know—there is some data I think is super helpful and other data is more creepy, and I think it needs to continue to be a judgment on what is worthy and what is not.
Messer: You brought up an issue. You really are talking about the morals of a society behind what is acceptable information for the use of other people. I think that is a dialogue that I wish we could have. It is almost like fate keeps giving one opportunity after another of either a misuse or a good use, and the dialogue never comes up. It is only when Angela Merkel's phone gets recorded that Angela Merkel gets upset about this, despite the misuse of data in World War II. There should be an opening of dialogue of what is acceptable.
My only point, the point I take back to you is, you have a real risk of a government bombing somewhere or killing someone, and this has happened over and over again. Start with that and work your way down to cookies and consumers and businesses. What is the worst that Pampers is going to do with you? Not give you a coupon? What is the worst that government can do? Well, they can actually take you out and shoot you and keep doing it. So start with that discussion point: what is the cost benefit, what are you willing to give up, what are the options of walking away? In your case, walking away with your wallet. That should be society's discussion: what should be happening, what are the benefits? It feels like it always gets into this discussion of why cookies are bad, you are taking information, misusing it, all this other stuff, and there are actually real issues talking about society, talking about free speech, what it means in a world where our actions are part of our speech. And I worry that at one point there has to be some kind of convention where people are talking about not a world moral but the morals that each country wants to live by. I don't think we ever get to that. I think we get into a stupid exchange: are the coupons worth the information we give up? And that is belittling the discussion.
Scott: Flip it around: governments are being held accountable by having greater transparency against their actions. We work with the Satellite Sentinel Project, looking at the Sudans, literally shining a light from space on actions that governments are taking that can't be hidden anymore. They are exposed to the light of day. And one of the things we haven't talked about is the degree to which big data enables greater transparency into the actions of governments and puts that transparency into the hands of citizens around the world. Isn't that a good thing?
Chui: That is a question. You have been waiting patiently.
Attendee: My name is Joe Alert, with Sportvision. We do those cameras you were talking about, those baseball ...
Schmarzo: Very cool, very cool.
Attendee: This is a question really for Walter, maybe toward the transparency issue. There are a number of smaller companies emerging in your space—no pun intended, sorry.
Scott: It's big.
Attendee: In fact, Skybox Imaging is launching next week. Their premise is to provide me with the ability to go to a Web site, use my credit card, and buy imaging data for whatever purpose. I may want to do analytics on it, I may just want to see my backyard. What are you doing in that regard, in other words, to open up your database so that individuals or small businesses or countries can see what is available and make it available?
Scott: Well, it already is open. Anybody who has used their smart phone, anybody who has used any of the web mapping services has the ability to see our imagery. And if you wanted to buy it, you could. The question is, what's it worth for you to get an up‑to‑date image of your house relative to what you can get for free from any of the web mapping services? Yes?
Attendee: It's not part of your business model to provide that to an individual right now. Is that—
Scott: Flip it around. I would say that the business model is providing it to individuals; it's providing it through the various web mapping portals. Make a distinction between who the customer is and who the user is. The users of satellite imagery, the vast majority of them, several billion of them, are consumers who don't pay you a nickel for it but rather use services like Google and Bing and Apple and Baidu and Tencent and Yandex, and a long list of others around the world, as a way of gaining access to a continuously updated view of the world. But they are not direct customers of a company like DigitalGlobe.
I think our view is that there is an ecosystem out there, some of whom are direct customers of ours, and some of whom are enablers for end users to derive value. I don't know how that ecosystem is going to evolve over time. I mean, we are looking forward to Skybox having a successful launch, because having an ecosystem of providers is generally good for the industry.
Attendee: Sorry, one more thought here about reframing the discussion the way it captures the morality, the ethics, the policy, is if we think about environmental controls. Sometimes it's interesting to think about parallel behavior and society. Something that Herman Miller thinks a lot about when we start to think about data, our future in the area of technology, because being stewards of the environment is on one level a moral, ethical issue, but it's also a policy issue. What I would argue is that, given the opportunity to do what's right, most companies do not do it. It's the unfortunate truth we have seen throughout the hundreds of years of the Industrial Revolution, the last hundred years.
That being said, if I draw that parallel into big data, shall I use that as a "Beware the Ides of March" type comment to what will happen with big data? I love your comment that I would be happy to give people information—I do it all the time, by the way—but I don't really know what they are doing with it. That's the problem. By the time we figure out what is being done in a negative way, it's kind of too late. So just some more food for thought.
Messer: You are making a statement that is a pretty strong one. That is, companies that have the chance to do something do the wrong thing. Is that backed up by data or is it a personal perspective?
Attendee: It's a personal perspective. In the case of the company I work for, we have been stewards. As a case in point, we are moving PVC content out of our chairs. It will exist forever. We're moving it out not because any governmental agency is telling us so, but because we think it's important. I would argue if you see what is happening with the environment in China and a lot of developing Third World countries where they don't have the government regulation in place, we have seen that they haven't necessarily made the right decisions.
Messer: To the point, the point may be fair enough, but since we are at an analysis panel, that is a great question for first run of data, to see is that true or are there some things that have happened that are so horrible, that is what sticks in our mind. They are actually solving problems that other people have. So sometimes things go awry, yeah; they do bad things. But we live in a world where buildings are not collapsing around us because they use shoddy material. We do have, you know—I mean it doesn't happen often, not here. And so the point is, it is a—first you start—since we are an analysis, we should start with do the analysis, see if that is actually the case and how big a problem it is, because we shouldn't blow that out of proportion. And if you are right, we have to think about that in a different way from managing the environment.
Bell: It is as simple as just measuring it. If that is a problem, the next question is, how do I measure this potential problem, how do I frame my theory around it? What are the right KPI's around it, or orthogonal—
Bell: Yeah, and then how to start answering that question with data.
Scott: And then the other question to ask is, to what degree does transparency about the actions of an entity, a company or an individual, act as a mitigator against bad behavior? So, if you can see someone or catch them in the act, is that generally on balance a good thing? That is also something that would be amenable to analysis. Does it prevent the tragedy of the commons?
Attendee: I'm not an expert on big data, but did you hear Dan Ariely's definition of big data and how it is like teenage sex? Everyone is talking about it, no one knows how to do it, everyone thinks everyone else is doing it. So I thought it was a good quote.
Bell: If you think about it, it is not just big data but everything else, AB testing, measurement, analysis, I think that can cover all kinds of stuff. But I think a lot in life is like teenage sex.
Scott: Yeah, except the actual quote was, like teenage sex, everyone talks about it, everyone is interested, and outside of a promiscuous few, no one is doing much.
See, he has the data to prove it.
Chui: One more question and then we will go to the lightning round. Anything else people want to ask? Peter?
Attendee: I'm Peter Von from Independence. I think we are missing in this debate something about human sensitivity; about things like privacy and transparency. So if you start thinking about transparency, like an architect thinking about transparency, an architect has built a fully transparent house. So you can really watch the person go to bed, go to the toilet, go to everything. So there is an element there that the house, a good house provides some form of shelter. And I have the feeling in this debate that's the sort of defense mechanisms that we are still left with as a consumer. Only as a consumer, don't give us freedom of choice, in the sense that the only sort of shelter that we can opt for is a defensive shelter and not one that we on purpose select to find shelter and peace, and peace of mind, and to crawl back in our sensitivities. So it's very, very technical. So we're saying things like "humans don't look at the data," this makes me feel very, very uncomfortable, because it's algorithms looking at the data. But I still don't feel comfortable. It's a sensitivity that I feel it is wrong. And I can't articulate it in front of experts, but that's a dimension I would like to draw into the debate.
Scott: You can make choices. You can, for example, choose to pay cash in bricks and mortar outlets. You could choose to essentially disconnect from the digital economy. And that's a choice. It's not necessarily a choice many of us would make, because the conveniences of the digital economy significantly outweigh the inconveniences, at least in many of our minds. But the point is, you have that choice. And I think to some degree, Stephen, that was the point that you were making, which was, in a lot of the debate around privacy, as long as you have the choice, you have more power than you think.
Attendee: I don't think we have the choice. I think that's the problem. It's like wanting to buy something from the Apple store, I have the choice to buy if I sign up for the terms and conditions that were dictated by the Apple store or by any other vendor. So I have the option to disconnect completely from the Internet. That is really not an option if you want to function normally in this society.
Attendee: [Inaudible]—image via satellite.
Scott: Stay indoors at 10:30 in the morning.
Attendee: That's the problem. And you are dictating the conditions of my life. And then you are controlling ...
Scott: I have a good piece of information for you. Because the resolution is limited, we won't be able to tell that it was you. Put a big hat on, nobody will know.
Messer: There's a feeling of helplessness, meaning the world is being thrust on me. I'm not sure I am comfortable with it. I don't understand it. I feel like every time I go anywhere where someone explains it to me, it gets more confusing. And I want to be able to live in a world where I can decide what is going on with this information. That's a good point. I don't think as an industry we have done a good job, and I think it is because we are at an incremental stage where we are trying to figure this out and just making this work. So we are not addressing the human engineering factors, or they are not addressed as well as they need to be. And to be honest, we get away with it because, and I don't want to call it a generational shift, there is a philosophical shift that has taken place because of the Internet that has gone on now for maybe 16 years. And that shift is, and this is a horrible way of explaining it, it is like we have gone from the James Bond Society, where we all kept our information secret and that was the way we did it, to sort of the Paris Hilton Society, where it is open. And you have these two competing views of the world fighting each other, and the Paris Hilton view is winning right now. Right? We had this debate early on with Facebook, which is, I don't want everyone knowing everything I'm doing, and just because someone took a photo of me at a party, I don't want somebody to know I was there with someone else. The point is, everyone gave up on that, mostly because they feel like they were getting more back. But they didn't have the discussion of where we are going with these things, and I think the technologists up here see this as an open sourcing of information. And we're looking at it saying there are going to be a lot more benefits because, as in one of the things you heard, you know, how many people are sitting in seats will tell the AC unit how hard to blow the AC, things like that.
And I think that's sort of the way we always see a positive view of the world, but there has to be a way to make me feel empowered. Because I think a lot of the debate around privacy is a feeling of helplessness, not necessarily the actuality of what is going on, although it could be. But it is that helplessness of not knowing what to do or how to fight it. And we should give you the ability to opt out in some way because I don't think we'd care.
Bell: I think on that, there is some—I think there's the new reality, to some degree, of this is not centrally managed kind of thing that we can just shut it down. This is going to be the new reality, and I think that's the unfortunate—an unfortunate truth. But I think the way to start addressing some of the feelings of helplessness is, I think there's many solutions. I think that the easiest ones are that—I think—I think it is ultimately about education, demystifying the analysis, and what data exists, the different types of data, so it's just not so scary. I think there's a lot of fear of the unknown because there is a—simply so much. But I think what is—it is realistic to be able to do a better job just talking about data more, breaking it down, simplifying, let's, you know, types of data and how it is being used, and then also, kind of going back to my earlier point of helping people, knowing the tools of analyzing data, like start using Excel, using the basic functions of Excel, seeing what your standard algorithm looks like, and—or understanding basic statistics. So just demystifies this whole thing that was very scary and big and unknown to seeing how it is really not that scary, not that sophisticated, really, and it's not that personally threatening.
Chui: Thank you. And I want to thank our panel, all of you, I appreciate it.
CTO, Enterprise Information & Analytics Management Practice, EMC Consulting, EMC Corporation