Big Data tells us things we didn't know about ourselves. It allows Amazon and Netflix to predict what you want to read and watch better than you yourself can. It allows Google to predict the flu better than the Centers for Disease Control. It tells Wal-Mart that, when storms are coming, people will buy more strawberry Pop-Tarts. Americans are constantly lectured about how Big Data will change their lives; seldom does anyone bother to define it.
There have always been Big Data systems. The average dog is one. A woman decides to take her Irish setter for a walk. She puts on some blush in front of the bathroom mirror, snaps her makeup case shut, and washes her hands. When she emerges, the dog is sitting in the hallway with the leash in his mouth and a "Ready to go?" look on his face. The woman is nowhere near the door. She's said nothing. She's not even dressed. How did her dog know?
He didn't. He pays no attention to the way humans mean things. He just correlates. He has "noticed," if that is the word, that there is usually a snapping sound, as of a makeup case closing, before the woman leaves the house, and that she usually leaves the house to take him on a walk. The system is messy, approximate, and error-prone. But if the dog could increase the number of data points he could do better. Computers allow us to do a lot better. They massively multiply the data points that we can collect, retain, and cross-reference. That is what Big Data is: the use of massive computer power to draw ever more reliable correlations.
* * *
In their new book Big Data, Oxford professor Viktor Mayer-Schönberger and Economist data editor Kenneth Cukier describe the many ways interesting correlations turn into eerily accurate predictions or new technologies. SWIFT, the electronic messaging company, can generate fairly precise measures of total economic activity in a society, simply by using the data from the bank transfers it handles. In the right analyst's hands, the traffic patterns generated by GPS navigation systems can also be a leading economic indicator. Sensor-equipped chairs developed at Tokyo's Advanced Institute of Industrial Technology turn the pressure of a given bum on thousands of transmission points into a "signature" that uniquely identifies the sitter.
Big Data can find out really intimate things about us. Target has managed to identify pregnant women among its customers (the better to bombard them with junk mail), because such women tend to buy unscented lotion in the third month of a pregnancy and then mineral supplements a few weeks after, the authors tell us. Britain's second-largest insurer, Aviva, and the giant professional services firm Deloitte are at work on a "health index" based on website visits, television shows watched, and income. "[I]t may let people applying for insurance avoid having to give blood and urine samples," the authors write. But one can think of other ways a profit-making insurance company might use such information, besides sparing its clients a trip to the bathroom.
Governments have use for such information, too. "De-anonymizing" data plucked off the internet is child's play. Harvard government professor Latanya Sweeney has shown that you can identify about 90% of people with just their birthdate, ZIP code, and sex. But most people leave plenty more information online. To know that someone last year bought a Porsche, an Anne Rice novel, Duke's mayonnaise, Field and Stream magazine, and Fred Perry tennis shoes would probably identify him uniquely. The only foolproof way to hide from creditors, police, and other authorities is—as Osama Bin Laden well understood—to stay off the internet altogether and not buy anything. You can see why a government bent on security would engage in the vast data-sifting programs that former NSA contract worker Edward Snowden revealed in June. Any suspect who does not use the internet belongs to a bygone technological generation, and is assumed too backward to pose any kind of terrorist threat. Anyone who does use the internet is a sitting duck for the forces of order.
Two things, then, can be safely predicted about Big Data: It is going to make a few people richer. And it is going to make a lot of people less free.
* * *
One very interesting thing about Google's flu search, Mayer-Schönberger and Cukier insist, is that Google had no hypothesis or "theory" of what would predict the flu. It did not start by asking how often people searched for the words "headache" or "sniffles." The company's algorithmists simply found 45 terms that correlated best with spikes in flu symptoms. "[S]ociety," the authors write, "will need to shed some of its obsession for causality in exchange for simple correlations: not knowing why but only what."
This is the epistemological paradox at the heart of Big Data, and the authors draw it out nicely: however precise its results, Big Data is essentially a non-scientific way of looking at the world. Chris Anderson of Wired magazine writes that "the data deluge makes the scientific method obsolete." Big Data is the creation of (to stereotype a bit) a bunch of Silicon Valley village atheists and mockers of Sarah Palin. And yet it works not because it allows us to think more like scientists but because it allows us to think more like dogs. Or superstitious pagans. Is that place where the old willow sits below the black cliff haunted? We don't know. We just know that your grandfather went there one afternoon and never came back. So stay away from it.
To talk about "sophisticated information systems" is wrong. Certain of Big Data's techniques may be sophisticated, but what makes it useful as a tool for life is its unsophistication. It diminishes the benefit of acquiring knowledge for oneself, in the same way that writing, as Socrates described it in Plato's Phaedrus, "introduce[d] forgetfulness into the souls of those who learn it." Simply ignoring the why and wherefore of facts may bring moral corrosion, too. For the late Italian memoirist Primo Levi, the symbol of 20th-century barbarism was the reply of the Auschwitz guard when Levi asked why he was not allowed to break an icicle off a roof to slake his thirst: "Hier ist kein Warum" (Here there is no why).
Accepting the principle that correlation is as good as causation inevitably reshapes our sense of how people ought to be ruled. Big Data works by saying, "If you're X, then you're Y" or "If you're X, then you're probably Y"—which, in a world of hurried decision-making, amounts to the same thing. Bluntly put, Big Data reasserts on behalf of powerful corporations a right that has been stripped from other Americans over the last half-century: the right to "stereotype" or to "profile." The authors note, for example, that many start-ups are enamored of Facebook's "social graph" (its data on who hangs out with whom). They hope it can provide "signals for establishing credit scores. The idea is that birds of a feather flock together." The authors aim to distinguish this kind of stereotyping from old-fashioned bigotry by noting that it seeks "to identify specific individuals rather than groups." But this is all that traditional bigotry does. Big Data simply changes the basis of the stereotype, and renders the victimization slightly less obvious. It provides the old heuristic "benefits" of discrimination without discrimination's stigma.
* * *
Big Data says nothing about the principles on which people will be governed. The authors are wildly overconfident that those will go without saying, and that they will be compatible with our own principles. For instance, they oppose Google's insistence that prospective employees provide their SAT scores and grade point averages. "By Google's standards, not Bill Gates, nor Mark Zuckerberg, nor Steve Jobs would have been hired, since they lack college degrees." But this is sheerest political correctness—that's why they ask for the SAT scores, dummy!
Similarly, although the authors don't go into much detail about security, they seem to believe that a "neutral" and non-discriminatory system of catching terrorists is easily designed. They guffaw over the clumsiness of the U.S. data professionals who designed the post-9/11 "no fly" lists. "Had there been an algorithmist on staff at the Department of Homeland Security in 2004," they write, "he might have prevented the agency from generating a no-fly list so flawed that it included Senator Kennedy." Not necessarily, assuming the algorithmist was forbidden to consider the factors that might have kept Senator Kennedy off the list, such as age, ethnicity, religion, and place of birth.
Governing norms do not simply generate themselves from within Big Data. They get imposed on Big Data's workings from without, by interested parties. History tells us for a certitude that all these techniques—unless resistance is offered or regulation applied—will be abused. One abuse is already evident. Whatever moral meaning one imputes to Edward Snowden's revelations, the comprehensiveness of the National Security Agency's telephonic metadata sweeps is absolutely shocking. U.S. spy agencies retain almost literally everything on everybody. And there is every incentive to retain it forever. If storms cause Pop-Tart purchases at Wal-Mart, maybe some factor we aren't yet paying attention to will prove to be the best predictor of treason or violence. How foolish we would feel if we threw out one iota of the data that might have allowed us to identify tomorrow's terrorists.
* * *
The authors note that today's information technology can be far more intrusive than the Stasi, or any of the notorious European police forces of the mid-20th century, ever could be. Cell phones show where we move, second by second, and the information takes a negligible investment of manpower to decipher. And yet the authors raise security concerns only to pooh-pooh them. "We won't go to prison if Amazon discovers we like to read Chairman Mao's 'Little Red Book,'" they jest. What a straw man! The fact is, we don't know what traits, beliefs, or behaviors the NSA considers correlated with terrorism and therefore grounds for heightened scrutiny or assassination by drone strike. (If we knew, the surveillance would not work.) Are you comfortable making a sexist joke in an e-mail? Ridiculing the leadership of the NAACP? Sending money to Julian Assange's defense fund? The authors unwittingly call to mind those Soviets who claimed their country was as free as the U.S. because there, too, one could criticize the American president.
A second worry is cronyism, for which Big Data opens new horizons. Consider (as the authors do not) the symbiosis between the present White House and information technology corporations. There was $4 billion in the stimulus package for the computerization of health records. Over $3 billion was budgeted to build the "smart grid" that allows authorities to root out inefficient energy use. More has been doled out in dribs and drabs, including last year's Big Data Research and Development Initiative. To the extent that corporate allies of the president use this data, they profit not just from tax subsidies but also from an in-kind levy on the public, in the form of resalable data. It may not be surprising that in 2011-12, according to the Center for Responsive Politics, Google employees gave 20 times as much money to Barack Obama ($802,000) as to Mitt Romney ($40,000). This is to ignore some of the less formal ways moguls connect with politicians. Former Newark mayor and newly elected New Jersey senator Cory Booker secured a promise from Facebook's Mark Zuckerberg to invest $100 million in Newark's schools. In August, the New York Times reported that Booker had become a "co-investor" with Google's Eric Schmidt in Waywire, a "video curation startup," with a stake of $1-5 million. Should Senator Booker ever come to vote on an anti-trust bill for the information age, those may wind up being the wisest investments Schmidt and Zuckerberg ever made.
* * *
The alliance of Big Data and Big Government is an intellectual humiliation for those conservatives who spent the last three or four decades hammering away at the state's taxation authority in order to unleash entrepreneurship as a countervailing power. The businesses thus "unleashed" turn out to be ones that thrive in collusion with politicians. Much as they did in Iraq, conservatives have, with the best intentions, defanged their enemies' enemies. Grover Norquist and Americans for Tax Reform campaign for the tax cuts of Mr. Schmidt and a Silicon Valley plutocracy that backs the president almost unanimously, sparing Obama the embarrassment of protecting their interests himself.
The authors have gathered a good deal of private information from Silicon Valley corporations. They deserve credit for sharing it. But they are in the same sort of reporter's bind faced by those Capitol Hill journalists who need to cover committees day by day. A certain official optimism is required to ensure continued access to a very limited number of key players. When they say that "Google benefits from vertical integration in the big-data value chain," they are dancing around the word "monopoly." When they talk about "creative outsiders," they mean money men. The prose in this book is repetitive and cacophonous. Often the authors seem to describe Big Data in the language of corporate Info-Paks: "Lest there be any doubt," they write, "big data saves lives."
Big Data algorithms often escape common sense and easy regulability. Those who create them have a powerful incentive—as the designers of financial derivatives did a decade ago—to render them opaque. Yet the privacy problem that most agitates the authors is the prospect that companies might have to reveal "confidential business strategies to outsiders." The authors' suggestion of a "privacy framework...focused less on individual consent at the time of collection and more on holding data users [corporations] accountable for what they do" sounds awfully convenient for the data users. In fact, it sounds a great deal like the voluntary compliance that was expected of banks in the Alan Greenspan era.
Some of the big constitutional questions of our time will revolve around who owns, who controls, and who can benefit from this data. Once it had computerized its files, MasterCard discovered that people who fill up their gas tanks in the afternoon tend to spend $35-50 on food in the next few hours. Why is this goldmine of information MasterCard's to sell? MasterCard didn't produce it and didn't contract for it (at least originally). It is not, in fact, obvious why MasterCard should be the sole owner of the data you generate when you use its cards, or why Amazon should pocket all the value that comes from the information you generate while shopping on its site. It was never considered ethical for the Bell system to sell information gleaned from listening in on subscribers' calls.
True, many websites require users to consent to a "cookies" policy under which they surrender data on very lenient terms, but this is a consent wrung out under such information asymmetry and such unequal network power that it resembles an oil-company geologist buying mineral rights from a bunch of illiterate Third World tribesmen. If Americans were indeed such tribesmen, and if the commodity that had been taken from them were rubber or cobalt or wheat instead of their personal information, there would be marches on Washington and denunciations from every pulpit in the country.
* * *
The fiasco surrounding Facebook's initial public offering in May 2012 is a good illustration of how precarious is the sense of where on-line property rights reside. The company's value had been estimated at $104 billion, on assets of $6.3 billion. Pundits expected the stock to level off at about $80 a share. Instead it closed its first day of trading at $38 and three months later had fallen to only $20. (It has since regained its initial value.) Some blame computer glitches in the IPO. The authors say markets are too dumb to price data. But pricing data is doable, using many of the same tricks Nobel prize-winning economist George Stigler developed to price search costs in the 1960s. More likely what makes markets nervous is that almost all of Facebook's value is in one asset—its data—and its claim to that asset is contestable. Facebook is not a revolution in technology; its technology is not essentially different from that of hundreds of startup companies. It is a revolution in property rights.
Government and the Big Data companies are, one way or another, bound to grow closer. Governments must have as much capability as those they rule. That makes the property rights regime governing Big Data unstable. The companies that use it seem destined in the coming decades either to be taken over as utilities, broken up as trusts, or brought into partnership with government in a kind of mercantilist setup. The situation calls for caution. The consequences of on-line institution building are hard to predict. Those people, after all, who used to say a decade or two ago that "information wants to be free," appear in retrospect to have been clamoring for their own enslavement.