
Cities, Companies, and Cells: Big Data Needs Big Theory

Geoffrey West, physicist and Distinguished Professor at the Santa Fe Institute, discusses the importance of collecting and analyzing Big Data.
Read the full transcript below. (Transcript by Realtime Transcriptions.)
Kirkpatrick:  Coming up next, Geoffrey West, who is a distinguished professor and formerly the President of the Santa Fe Institute and somebody who really is one of the people that we at Techonomy most look to to help form our own thinking about where the world is going.
He gave an incredible talk at the first Techonomy back in 2010 that really shook up a lot of people, and I think he's going to touch on a couple of related themes here today. But he's one of the deepest thinkers about the connection between biology and systems broadly, and particularly cities.
I'm very eager to hear what he has to say about cities now. Geoffrey, please join me. And thanks so much for being here.
[APPLAUSE]
West:  Thank you, David. Pleasure to be here.
I'm a bit of a fish out of water. I'm a physicist, and I spend most of my time trying to make theories and so on and so forth. One of the things I'm going to try to do today is bring a physicist's perspective to the "hype," so to speak, of Big Data. I will do it in the context, as you see, of cities, trying to understand cities and companies, and some biology.
And if I press that, does the first thing... yes. This is the title of the talk that, unfortunately, I did not get done in time to go into the booklet.
You know, data is obviously good. No one can be against data. Data is the lifeblood of science, of technology, of engineering. It also ought to be the lifeblood, or at least some of the lifeblood, of business, of commerce and finance and so forth. And maybe the era of big data will be mostly represented that way, and that's what we've heard in many of the sessions in the last day or so.
But there are some concerns that I'm going to raise in this talk, and I'm going to do it very much, shamelessly, as a physicist. Data is good; the more the better, in a sense. Except that if it's done mindlessly, and I will talk a little bit about this, there's a serious problem of unintended consequences, even dangerous consequences.
But I want to put that in context. What I mean by dangerous is doing it without an explicit, verbalized, or at some level articulated conceptual framework, and that's what I will lead up to: the idea of bringing theoretical concepts into the discussion of how to use big data. Because in order to decide how much data you want, where you want to put your sensors, when, what, and so forth, you have to have a framework. And just by going out and putting sensors somewhere or making measurements somewhere, you've already implicitly adopted a framework, because there is no such thing as virgin data.
So, and I hope this is not out of context, I want to spend just a few minutes on the freshman history and philosophy of science. I want to remind you of the father of big data, Tycho Brahe, who, you know, looked at the heavens and made an extraordinary number of measurements of all kinds of moving objects he saw in the sky.
But it fell to his assistant, Kepler, to make sense of these heavenly bodies and their motions. And I'm going to do a little thought experiment now: imagine that Kepler had known what we know now, that all these planets are actually quite different. They are different sizes, they have different characteristics, they have different features, and so on. And yet what Kepler discovered is that if you plot the cube of a planet's distance from the sun on the vertical axis against the square of the time it takes to go around the sun on the horizontal axis, everything lines up.
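Kepler's regularity is easy to check with a few lines of code. The sketch below is illustrative only: it uses approximate textbook values for each planet's semi-major axis (in astronomical units) and orbital period (in years), which are assumptions and not figures from the talk. It shows that a^3/T^2 comes out close to 1 for every planet, or equivalently that a straight-line fit of log T against log a has a slope close to 3/2.

```python
import math

# Approximate textbook values (assumed): semi-major axis a in AU, period T in years.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn": (9.537, 29.46),
}

def kepler_ratios(planets):
    """Return a**3 / T**2 per planet; Kepler's third law says this is the same for all."""
    return {name: a ** 3 / t ** 2 for name, (a, t) in planets.items()}

def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    den = sum((u - mx) ** 2 for u in lx)
    return num / den

ratios = kepler_ratios(PLANETS)
slope = loglog_slope([a for a, _ in PLANETS.values()], [t for _, t in PLANETS.values()])
for name, r in ratios.items():
    print(f"{name:8s} a^3/T^2 = {r:.3f}")
print(f"slope of log T vs log a = {slope:.3f}")
```

Six quite different bodies collapse onto one line, which is the kind of compression of data into a law that the talk is describing.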
So there was some extraordinary regularity in the data, and that has become known as Kepler's laws. That was great. But what was even greater was the recognition that these laws could be derived from two generic laws: Newton's laws of motion and his law of gravitation. And what was remarkable was that they not only described the planets. What was truly profound is that any motion any of us makes, when you get in your plane, when you get in your car, satisfies those laws.
Every single motion that happens in the universe is in some way manifested in those laws. So these two equations (we don't have time to go through them) represent, in that symbolism, an infinite amount of data encapsulated in them. And they can be generalized.
Similarly, Maxwell's equations encapsulate the myriad data on electricity and magnetism and unify them. By putting them in symbolic form, and therefore encapsulating effectively infinite amounts of data, Maxwell realized that they predicted electromagnetic waves, which is why I can press this and this bloody thing will move.
And we would not have a Techonomy meeting had that not been realized. I seriously doubt that if one had only thought about the data, only assembled the data, looking at little charges of electricity and magnets, one would ever have arrived at the concept of electromagnetic waves. That's what happened with the development of science. So the role of theoretical thinking and a conceptual framework is crucial in encapsulating data.
So this developed into, of course, the whole basis of the physical sciences, and maybe all of science: the search for underlying principles that are in fact a summary of data and that lead, hopefully, to a quantitative, predictive framework. Because that's what we're all looking for, no matter which area we apply this to.
However, this framework starts to face some serious challenges outside the kinds of systems for which it was developed and for which the data was assembled, when we come to things like cells. These are highly complex phenomena. They are adaptive and evolving: things like our brains, ecosystems, cities, weather, the stock market, and so on.
Even things like the fact that this guy Gallo is going to turn into this, and the fact that everybody in this room and every company you work for is going to die in some reasonably finite time.
So the question is: are there laws that describe those phenomena that are somehow extensions of this kind of paradigmatic Newton's laws?
Now, the answer is actually open to question. And part of the problem of dealing with these systems is that they are evolving and adapting. I've written down a laundry list there for you to skim over, and I'm not going to go through it, of the multiple characteristics of the kinds of systems I just showed you pictures of.
They are adaptive, they are evolving, they are historically contingent, they're nonlinear, they're out of equilibrium, and so on.
So can we have a framework for discussing those? That question has led people in separate directions, one of which is that we can understand these systems only by assembling huge amounts of data on them, and that's it. That has sometimes been called "the fourth paradigm," and it is encapsulated in its most extreme form by this ludicrously absurd article, may I say, by Chris Anderson in Wired, which says: throw out all that old rubbish of Newton's laws and Maxwell's equations, which actually gave us all this, because we don't need it. All we need is big data.
One of the reasons I'm bringing this up in this context is that I hear a lot of that when I talk to people in the political and business sectors: that somehow big data will be the panacea for everything.
And I'm reminded of the following. Many of you know who this man was: Ernest Rutherford, who discovered the nucleus of the atom and gave us this kind of solar-system model of the atom.
Ernest Rutherford was much like Chris Anderson, very taken by the kinds of things he was involved in, and he made this wonderful statement: all science is either physics or stamp collecting.
It's the same kind of ludicrous and arrogant statement as "all I need is data."
So before taking up this theme, I want to spend just a few minutes giving you a few cautionary tales of hype that you are familiar with from the last few years. One is the promise that nano- and biotechnology were going to solve everything.
In the '90s, I could give you quotes (I took them out of this talk) of the kinds of promises that were made that it was the panacea for everything, as was the Human Genome Project, which is a big data project and which was supposed to somehow solve all of our problems of disease and health and so forth.
And I want to go back with that theme to something where there was very little data and very little science, and that was going from bicycles to airplanes. That was done by the Wright brothers, who were bicycle builders, and they didn't know much science. They actually didn't know very much engineering, really. But they were extremely intuitive, and they were very smart.
And one of the things that they did do is they recognized that one of the problems with airplane flying was the question of control. And they actually did some science, a teeny-weeny bit of science. They built a wind tunnel, a very crude wind tunnel. And that helped them gain more intuition about control. And that led to a true paradigm shift, in a way. It changed the way we travel.
That was great. But then the big question comes: can you scale that up to this, the Dreamliner, without science?
The answer is obviously no. Indeed, you need to know the laws of aerodynamics, the physics of materials, and so on, in order for Boeing to build such an airplane. So scaling from something that may be a prototype to something that actually works and is much bigger and much more powerful requires understanding, the science, and the data associated with it.
By the way, there is another piece to this diagram. As you're all very familiar with, this airplane is built from pieces made all over the world. So Boeing, who are masters of engineering, are, in my opinion, bozos at complexity, because they created a complex adaptive system: an extraordinary network of interacting parts to build this airplane. And as you know, this airplane is several years late in manufacturing. If we lived in a serious free-market system, this company should and would have gone bust.
So I want to move on to big data. It relates to some of the sessions that have been held here. I think one of the places where big data should play a major role, and something you should all be, and are, very well aware of, is the most extraordinary thing that's happening around us: we are living in an exponentially expanding socioeconomic universe; namely, the urban population is expanding at this unbelievable rate.
So much so that, if you read that bottom line there, every two months this planet adds the equivalent of a New York metropolitan area: 15 million people are being urbanized every two months from now to the middle of the century. That is extraordinary, and everything you do is affected by this and will be affected by this. And you had better learn to understand it and come to terms with it.
And indeed, we had better develop a serious science of what's going on, and in particular a science of cities. Because this is the driver (it's a feedback mechanism, of course) of the open-ended economy that we all want and believe in.
And cities are the origin of most of the challenges we face, from finance all the way through to global warming, the environment, energy, resources, and so on. They are also the source of all our solutions, because that's where all the smart people live.
So the open economy is integral to cities. And all of the questions and challenges in this other laundry list are urban driven. And because of that, none of them are independent.
So one of the big issues is that you cannot, and should not, treat them as stovepipe issues, because crime is actually related in some curious way to financial markets, to urbanization, to pollution, and so on. Each one of these is a complex, adaptive, evolving system. And the whole fucking mess is one great big complex, adaptive, evolving system. That is a technical adjective I used, by the way.
So we need a kind of new paradigm, a new conceptual framework for dealing with these kinds of problems that is more systemic and holistic. And critical in that is the role of big data and getting big data, but we need the two to be highly coupled, the big data with the conceptual technical framework.
And another laundry list. These are just words in this talk, but these are the kinds of things people do, some of which you're familiar with: things like network theory, Bayesian analysis, scaling theory, agent-based modeling, and so on.
But they all must be coupled with big data. Big data alone won't do it, and these alone won't do it.
This is the image of cities. This is the image of cities we love. This is what attracts us to cities, the sense of city, the sense of commerce, the sense of goods and the good life. You know, the kinds of things we're enjoying even here, even though we're not really in a city. But the culture and the good restaurants and all the rest.
But the whole point of a city is this. This is a picture of New York 100 years ago. This is what made New York great. This is what made United States great. The buzz and the activity, the interaction of people. The entrepreneurship. The wealth creation, ideas and so on, all integral to the city.
Cities do not look like this. But the whole point of cities is to create the spirit of this. And that's what we want to do and that's what we need to do to continue the whole thing.
But we do this at a price. There is another fundamental law of physics, which I didn't mention, that you are no doubt familiar with: the second law of thermodynamics. The second law of thermodynamics says that you had lunch earlier today and you had breakfast, and sometime in the next few hours, few days, certainly in the next few weeks, you're going to have to go to the bathroom. It isn't for free. There is economic entropy. These are the kinds of things, the price we pay for all the good things we have. And the question is, is that going to overwhelm us? Is it all going to look like this?
So a crucial aspect of this is the fact that if you want a system that is resilient, sustainable, evolvable and growing, it better be scalable. And we as animals are extraordinarily scalable.
Here is an example. This is the most fundamental quantity in your life. This is how much food you simply need to stay alive and do nothing else, just stay alive. What's plotted on the vertical axis is metabolic rate, the rate at which you use energy, which comes from the food you eat. And on the horizontal axis is mass: of us, except "us" meaning a whole bunch of animals.
What you see next is something amazing. Each one of these animals, each subsystem, each organ, each cell type, each genome has evolved with its own unique history and its own unique environmental niche. So you would have expected, if you plotted anything, that there would be points all over the map representing that unique history, that historical contingency. Yet you see this extraordinary straight line on this logarithmic plot. Logarithmic simply means you go up in factors of 10 in both directions, as you can see: 1, 10, 100, and so on.
There's something extraordinary going on here, which is constraining natural selection. Furthermore, there is another remarkable thing about this graph; that the slope of it is three-quarters, roughly, less than 1. Which means that instead of doubling the metabolic rate when you double the size of an animal, which is what you would naively expect because you have double the number of cells, you actually only need 75 percent. That's what that three-quarters means.
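A straight line on a log-log plot means a power law, metabolic rate B = B0 * M^beta, and its slope is the exponent beta. The sketch below fits that slope by least squares on log axes. The data are synthetic (a 3/4 power law with made-up alternating plus or minus 10 percent scatter), and the prefactor 3.4 is a hypothetical value chosen for illustration, not measured data. It also makes the doubling arithmetic explicit: with beta = 3/4, doubling mass multiplies the total metabolic rate by 2^0.75, about 1.68 (roughly 68 percent more rather than 100 percent more), so the rate per unit mass, and hence per cell, drops by about 16 percent.

```python
import math

def fit_power_law(masses, rates):
    """Fit rate = b0 * mass**beta by least squares on log10 axes; return (b0, beta)."""
    lx = [math.log10(m) for m in masses]
    ly = [math.log10(r) for r in rates]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    beta = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sum((x - mx) ** 2 for x in lx)
    b0 = 10 ** (my - beta * mx)
    return b0, beta

# Synthetic, illustrative data: masses from 10 g to 10 tonnes (in kg), 3/4-power law,
# with alternating +10% / -10% scatter. Not real measurements.
masses = [10.0 ** k for k in range(-2, 5)]
rates = [3.4 * m ** 0.75 * (1.1 if i % 2 == 0 else 0.9) for i, m in enumerate(masses)]

b0, beta = fit_power_law(masses, rates)
doubling_factor = 2 ** beta        # change in total metabolic rate when mass doubles
per_mass_factor = 2 ** (beta - 1)  # change in metabolic rate per unit mass when mass doubles
print(f"fitted exponent {beta:.3f}; doubling mass -> x{doubling_factor:.3f} total, "
      f"x{per_mass_factor:.3f} per kg")
```

Fitting on log axes is what turns "factors of 10 in both directions" into an ordinary straight-line fit.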
So there is an extraordinary economy of scale which transcends biology, from inside your cells all the way up to the ecosystem level. This kind of economy of scale, and this kind of regular behavior, holds for anything you want to measure that is physiological or has to do with your life history.
Here is just one. Your heart rate. Our heart rate. Very regular. Goes down. And, in fact, one of the things that comes out of this, is that the pace of life decreases the bigger you are, in a systematic way. And all of this is because we are controlled and we operate and are sustained by networks. And it is the mathematics and the underlying physical principles of networks that transcend design. Because this is true for all organisms: Plants, trees, insects, fish, mammals, whatever.
There they are: a bunch of networks. It is the properties of those networks, which can be mathematized, that give rise to those scaling laws and to scalability.
I'm just going to show you one trivial (not trivial) important example. And that is growth. What happens? I just said it. You eat, you metabolize, you send through these networks the energy that maintains the cells that are there, replaces ones that have died, and grows new ones. And you can put that all into mathematics using this kind of network framework. When you do it, this is what happens if you just do it for the rat. You get that line there. That solid line is a prediction from the theory, and those points are the data.
And the important point to make here is not just that you have a theory to understand the growth of any organism, which agrees very well with data. But, importantly, you stop growing. We stop growing. This is called sigmoidal growth. And that stopping of growth is intimately related to the economy of scale.
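The growth argument can be sketched numerically. The talk does not show its equation, so this sketch assumes the ontogenetic growth equation published by West, Brown and Enquist for network-supplied organisms, dm/dt = a m^(3/4) [1 - (m/M)^(1/4)], where M is the adult mass. The parameters (birth mass 1, adult mass 250 in arbitrary units, a = 1) are hypothetical. Because energy supply scales as m^(3/4) while maintenance scales roughly as m, the growth rate falls to zero as m approaches M, giving the sigmoidal curve. The forward-Euler integration is checked against the closed-form solution.

```python
import math

def growth_rate(m, a, big_m):
    """dm/dt = a * m**(3/4) * (1 - (m / M)**(1/4)); vanishes as m approaches M."""
    return a * m ** 0.75 * (1.0 - (m / big_m) ** 0.25)

def simulate_growth(m0, a, big_m, t_end, dt=0.01):
    """Forward-Euler integration; returns a list of (t, m) pairs."""
    t, m = 0.0, m0
    trajectory = [(t, m)]
    for _ in range(int(round(t_end / dt))):
        m += dt * growth_rate(m, a, big_m)
        t += dt
        trajectory.append((t, m))
    return trajectory

def growth_closed_form(t, m0, a, big_m):
    """Exact solution: 1 - (m/M)**(1/4) decays exponentially at rate a / (4 * M**(1/4))."""
    r0 = (m0 / big_m) ** 0.25
    return big_m * (1.0 - (1.0 - r0) * math.exp(-a * t / (4.0 * big_m ** 0.25))) ** 4

# Hypothetical parameters for illustration only.
traj = simulate_growth(m0=1.0, a=1.0, big_m=250.0, t_end=200.0)
final_t, final_m = traj[-1]
print(f"mass at t={final_t:.0f}: {final_m:.2f} (adult mass 250)")
```

The curve flattens instead of growing forever, which is the "you stop growing" point, in contrast to the open-ended growth discussed for cities.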
This is completely different from what we have demanded over the last couple hundred years of our economy and our socioeconomic systems.
Now let's talk about cities. Cities, too, are network systems. They are supplied by networks, they are networks, and, most importantly, as in the bottom two little quadrants, we are social networks. That's what the one on your left is, I guess. And on the right is the modularity of those networks: the fact that we have families and so on, and jobs; we do it modularly.
So the question is, do cities scale like animals? An elephant is, amazingly, a blown-up human being, at the 85 or 90 percent level, in terms of all the variables one looks at.
So the question, in this graph (I'll do it for the United States), is: is New York a scaled-up Tucson, which is a scaled-up Santa Fe? Well, you check it with big data. This is big data, well, maybe medium data. This is the number of gas stations as a function of city size, again plotted going up by factors of 10.
What you see is good evidence that things scale. And, just like in biology, for this part of the infrastructure the scaling is sublinear: that dotted line is linear, and the data follow a slope that is less than linear, which means economy of scale. The bigger the city is, systematically the fewer gas stations it needs per capita, in the same way that the bigger you are as an animal, the less energy each of your cells needs to operate.
So here is what is amazing. This same graph is true for any infrastructure, anything you want to measure about cities anywhere in the world, with basically the same slope. And the rule here is that if you double the size of the city, you save, on average, 15 percent every time you do it, on all infrastructure. That's interesting. That's like biology. But that's the dull, uninteresting part of the city. That's the infrastructure.
The interesting part of the city is its socioeconomic activity. If you look at that, for example, the top graph is wages, and the bottom graph is what Richard Florida calls "super creative people," like everybody in this room.
Then you see scaling again. There's more noise in the system. But the bigger you are, you see something different from anything you ever see in biology: the slope is bigger than 1. It's about 1.15. The bigger the city, the more wages per capita, and the more sexy, super creative people per capita, roughly to the same degree. What is amazing is that the number of patents a city produces increases in a very similar way. And as you increase the size, with the same exponent of about 1.15, the amount of crime increases too.
And you have—I can show you lots of other graphs. And you have this generic rule that turns out to be true for all countries where you can get data across the globe for any metric that has anything to do with socioeconomic activity, meaning activity between people.
You double the size of the city, and income, wealth, the number of patents, all these various things I wrote down there, all increase by about 15 percent each time you double, while you save 15 percent on the infrastructure.
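The "15 percent rule" is shorthand for exponents of about 0.85 (infrastructure) and 1.15 (socioeconomic quantities) in Y = Y0 * N^beta, where N is city population. The small sketch below, using only those two exponents from the talk, shows what they imply per doubling. Read literally, an exponent of 0.85 gives about a 10 percent per-capita saving per doubling and 1.15 about an 11 percent per-capita gain; the "15 percent" is the exponent's distance from 1. The helper names are hypothetical.

```python
def total_factor(beta, size_ratio=2.0):
    """Factor by which a total quantity Y = Y0 * N**beta changes when N grows by size_ratio."""
    return size_ratio ** beta

def per_capita_factor(beta, size_ratio=2.0):
    """Factor by which the per-capita quantity Y / N changes when N grows by size_ratio."""
    return size_ratio ** (beta - 1.0)

for label, beta in [("infrastructure (sublinear)", 0.85),
                    ("socioeconomic (superlinear)", 1.15)]:
    print(f"{label}: total x{total_factor(beta):.3f}, "
          f"per capita x{per_capita_factor(beta):.3f} per doubling")
```

The two exponents sit symmetrically around 1, which is why the same "15 percent" is quoted for both the savings and the gains.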
The bigger the city, the better. You should have big cities. Big cities are better. They're greener per capita. New York is the greenest city in the United States in terms of carbon per capita, all because of these rules. And where do they come from?
How in the hell can it be that the way the Japanese cities and Portuguese cities and U.S. cities and Chinese cities all scale in the same way, even though they have evolved completely independently? What the hell was going on when they didn't interact? What the hell is going on is the fact that all cities are people, and it is the social interactions among people that is the city. That's what gives rise to this.
And I don't have time to talk about the mathematics of that. But this encapsulates an enormous amount of data, and data is very important; that's what I'm saying. It reflects the universality of social networks: the social networks of China, even though the people there have a different culture, are, at this level of description, the same as the social networks of the United States or Australia or Albania. So this is a statement of universality.
That graph on the top left is income, GDP, crime and patents, all plotted on the same graph. You can see there's quite a bit of noise, but nevertheless they all scale in the same way with this 15 percent rule.
Down on the right is some recent work we've done with people at MIT, taking billions and billions of phone records, actually, and using them to measure the interaction among people, asking how that scales with city size. If the scaling is due to social networks, it should scale in the same way as all these socioeconomic quantities.
You can see visually that it does. What is plotted on that graph in the bottom right is data from the United Kingdom and from Portugal, just to give you two examples. It's identical.
Okay. It turns out that you can also show that the pace of social life has to increase with size, whereas, as you saw in biology, it decreases with size. The bigger the city, the faster everything goes. And one of the things you can then ask is, how does that feed into growth?
You can derive a bunch of equations for the growth of a city, or of any socioeconomic entity. What you find is that, instead of growth stopping, this superlinear behavior, getting more per capita as you get bigger, leads to open-ended growth. That is great, except that it has a very bad problem: the theory says that the system will eventually collapse.
And as anyone who has ever taken a physics course knows, if you have exponential growth, the system will eventually collapse. This theory shows that. The question is, how do you stop it?
Well, you stop it by innovating. This is where innovation comes in. What you have to do is the following: You're growing within a certain innovative paradigm, and then you would collapse. But you have to invent something, you have to change the paradigm, and then you grow again. And then you would hit another point at which you would collapse, and you have to keep doing this.
That's what the theory says. There is a theorem that if you want open-ended growth, you have to keep up continuous cycles of innovation. Great. That's what we've done.
However, there's a catch. The catch is, as you go up each one of these curves, life gets faster. That's difficult. And life has gotten faster. But more importantly, the time between innovations has to get shorter and shorter, systematically. So something that took 500 years 2,000 years ago now only takes 20 years, and so forth.
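The talk does not show its growth equations, so this sketch assumes the form used in the urban-scaling work of West and collaborators, dN/dt = a N^beta - b N, with a superlinear beta = 1.15 from the talk and arbitrary, hypothetical a and b. Substituting v = N^(1 - beta) makes the equation linear, and for beta > 1 it shows that N grows faster than exponentially and diverges at a finite time t_c: this is the "collapse" unless something changes. Restarting growth from a larger size, as after each innovation, makes the next t_c arrive sooner, which is the accelerating clock described above.

```python
import math

def singularity_time(n0, a, b, beta):
    """Time at which N(t) diverges for dN/dt = a*N**beta - b*N with beta > 1.

    With v = N**(1 - beta) the equation becomes linear,
        dv/dt = (beta - 1) * (b*v - a),
    and v reaches 0 (N reaches infinity) at the finite time returned here.
    """
    v0 = n0 ** (1.0 - beta)
    if a <= b * v0:
        raise ValueError("system is not growing at t = 0; no finite-time singularity")
    return -math.log(1.0 - (b / a) * v0) / ((beta - 1.0) * b)

def size_at(t, n0, a, b, beta):
    """Closed-form N(t), valid for 0 <= t < singularity_time(n0, a, b, beta)."""
    v0 = n0 ** (1.0 - beta)
    v = a / b + (v0 - a / b) * math.exp((beta - 1.0) * b * t)
    return v ** (1.0 / (1.0 - beta))

# Hypothetical parameters: superlinear exponent 1.15 as in the talk; A and B arbitrary.
A, B, BETA = 1.0, 0.5, 1.15
# Each "innovation" restarts growth from a larger size; the next singularity comes sooner.
times = [singularity_time(n0, A, B, BETA) for n0 in (10.0, 100.0, 1000.0)]
for n0, tc in zip((10, 100, 1000), times):
    print(f"start at N0={n0:5d}: singularity after t_c = {tc:.2f}")
```

With beta below 1 the same substitution gives a bounded, sigmoidal solution instead, matching the contrast the talk draws between organisms and cities.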
So you have—the clock is accelerating, and you have to do things faster and faster. Finally—I'm going to finish off very quickly and talk about companies, just to give you a sense of what's done there.
We have looked at data on 30,000 U.S. publicly traded companies since 1960. That's all of them, across all kinds of metrics. What you see is that, unlike cities but like biology, companies scale: they all scale, though with a huge amount of noise. But they scale sublinearly rather than superlinearly like cities. And if you follow the argument through and believe everything I've said, sublinear means that you stop growing and, though I didn't say this, you die, which is something I didn't have time to talk about.
Superlinear means you have open-ended growth and are, in some sense, immortal, at least on the scale of human life. So this says that all companies stop growing and eventually die.
So here is data on all those companies, adjusted for inflation. If you adjust it for GDP, everything is flat: all those big companies are just floating on top of the GDP, contributing to it. Eventually they will all die, and hopefully this will come up.
What happened? I hope we haven't lost this graph. They died. They died on me!
Can I go back? Can you go back? It's a fun graph to see. What it is, it's taking the data on all U.S. companies and showing that the probability of survival goes to zero. You can ask yourself what is the average lifespan of a U.S. publicly traded company. That's a company, of course, that's already been on the stock exchange. And it's less than ten years.
So the expected lifespan is a little less than ten years for any publicly traded company. And all companies eventually die. There is one company that lived for 1,400 years on this planet, and it died two years ago.
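The talk reports only that survival goes to zero and that the mean lifespan of a publicly traded company is a little under ten years. One simple way to make that concrete is to assume a constant mortality rate, which gives an exponential survival curve; that shape is an assumption here, not a result shown in the talk, and the ten-year mean is a hypothetical round number near the quoted figure.

```python
import math

def survival(t, mean_lifetime):
    """Probability of surviving past age t, assuming a constant mortality (hazard) rate."""
    return math.exp(-t / mean_lifetime)

def half_life(mean_lifetime):
    """Age by which half of a cohort has died under the same assumption."""
    return mean_lifetime * math.log(2.0)

MEAN_LIFETIME = 10.0  # years; hypothetical round number near the talk's figure
for years in (5, 10, 25, 50):
    print(f"P(survive {years:2d} years) = {survival(years, MEAN_LIFETIME):.3f}")
print(f"half-life = {half_life(MEAN_LIFETIME):.1f} years")
```

Under this assumption half of a cohort is gone in about seven years and fewer than 1 percent reach fifty, consistent with a survival probability that heads to zero.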
Big data needs big theory.
If you give me one minute, I want to finish with something, since I'm a physicist and I love physics: this year's Nobel Prize was for a thing called the Higgs boson. You read about it. There was a lot of hype. Here it is.
This is truly big data. This blows your mind. There's the machine, the Large Hadron Collider, and on the right is the detector. It has 150 million little detectors in it, and you can't see, well, maybe you can: this, down on the right, is a human being. It's humongous in size. Here's the data on it. There are 150 million sensors, and 600 million collision events every second are being recorded.
And if all of that were recorded, the data flow would be 150 million petabytes per year (and I hope I did all these calculations right), corresponding to 500 exabytes per day. That's 200 times larger than everything that's going on on this planet in terms of the internet. It's extraordinary.
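The speaker invites a check of the arithmetic, so here is one. Assuming decimal SI prefixes (1 PB = 10^15 bytes, 1 EB = 10^18 bytes) and a 365-day year, 150 million petabytes per year works out to roughly 410 exabytes per day, or close to 5 petabytes per second: the same order of magnitude as the 500 exabytes per day quoted.

```python
PETABYTE = 10 ** 15  # bytes, decimal SI prefix (assumed)
EXABYTE = 10 ** 18   # bytes, decimal SI prefix (assumed)
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 86_400

raw_bytes_per_year = 150e6 * PETABYTE  # "150 million petabytes per year"
raw_eb_per_day = raw_bytes_per_year / DAYS_PER_YEAR / EXABYTE
raw_pb_per_second = raw_bytes_per_year / DAYS_PER_YEAR / SECONDS_PER_DAY / PETABYTE
print(f"{raw_eb_per_day:.0f} EB/day, {raw_pb_per_second:.2f} PB/s")
```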
So you can't do it!  This is, to use that expletive again, squared data. This is unbelievable.
So how in the hell are you going to get everything from that? The only way that you get something from this is you have a conceptual framework and a theory. So this is an extreme view.
So you can say most of this data is useless. We don't need it. So all that data you take on human beings to show that everybody has two eyes is not very interesting, for example. There's a lot of data like that. So you have a framework.
And, in fact, as you can see down there, what you actually record and analyze is less than 0.001 percent of it: 200 petabytes per year, which is manageable.
And by the way, thinking about these problems led to the World Wide Web, as you well know. Tim Berners-Lee was then just a regular computer scientist at CERN, where people were charged with trying to understand how to deal with this stream of data at this extraordinary rate. And out of that, in some miraculous, beautiful way, came the World Wide Web.
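The two quoted volumes also fix the fraction that is actually kept. Using only the talk's figures, 200 petabytes out of 150 million petabytes is about 0.00013 percent, comfortably below 0.001 percent; keeping exactly 0.001 percent would mean about 1,500 petabytes a year. So the 0.001 percent figure is best read as an upper bound.

```python
RAW_PB_PER_YEAR = 150e6   # raw stream quoted in the talk
KEPT_PB_PER_YEAR = 200.0  # retained volume quoted in the talk

fraction_kept = KEPT_PB_PER_YEAR / RAW_PB_PER_YEAR
percent_kept = 100.0 * fraction_kept
ceiling_pb = RAW_PB_PER_YEAR * 0.001 / 100.0  # volume if exactly 0.001 percent were kept
print(f"kept {percent_kept:.5f} % of the raw stream; "
      f"0.001 % would be {ceiling_pb:.0f} PB/year")
```

The point of the arithmetic is the talk's point: a framework decides which tiny fraction of the data is worth keeping.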
So having that kind of framework is crucial. So I will stop there, and I apologize. I'm sure I went on for too long. Thank you.
[APPLAUSE]
Kirkpatrick:  We love the way you think, Geoffrey. We love the way you think. Thank you so much for doing it in front of us. We love it.
 

Participants

Geoffrey West

Distinguished Professor, Santa Fe Institute
