Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World.
by Bruce Schneier
Review by: Owen King
We constantly interact with computers.
Computers generate data.
Data is surveillance.
Surveillance curtails privacy.
Privacy is a moral right.
This combination spells trouble, and we had better think about how to deal with it. This, in essence, is the message of Bruce Schneier’s book, "Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World", published earlier this year.
Schneier is a well-known expert on computer security and cryptography. But "Data and Goliath" is not a technical book. Instead, it presents an analysis of the social and ethical implications of new information technology, and unabashedly offers prescriptions for reform. Though Schneier’s analysis is both striking and informative, his prescriptions and justifications are less compelling. Nonetheless, even these latter elements serve Schneier’s overarching purpose of advancing the conversation about the prevalence of surveillance and the role of big data in our lives.
"Data and Goliath" has three parts. Part One, “The World We’re Creating,” is about the state of information technology and how governments and other large organizations use it to monitor us. Part Two, “What’s at Stake,” articulates values, including liberty, privacy, fairness, and security, and argues that the technology described in Part One threatens each of these. In response to these concerns, in Part Three, “What to do About It,” Schneier lays out his prescriptions for governments, corporations, and individuals.
Part One is the most successful of the three. Schneier depicts current technology and its capabilities in a way that should impress any reader, even those who understand the technology only superficially. The crucial idea is that the massive data sets involved in what is now known as “big data” are the quite natural, perhaps inevitable, result of computation, and these data sets constitute a practically unlimited source of personal information about us. This data is generated as we interact with electronic devices that are now completely ordinary parts of our lives: our phones, our televisions, our cars, our homes, and, of course, our general-purpose computers. This occurs with every tap on a mobile phone app, every web page request, and so on; all these interactions create data. Hence, Schneier urges us to think of data as a by-product of computing.
Regarding the potential usefulness of big data, Schneier remarks, “That’s the basic promise of big data: save everything you can, and someday you’ll be able to figure out some use for it all.” Combining this remark with the point about data as a by-product of computing, we might formulate an especially informative definition of big data as follows: Big data encompasses the large, continually growing data sets that are a by-product of computation, along with the methods and tools for collecting, storing, and analyzing them, such that the data can be used for purposes beyond those that guided its collection. This conception, implicit in Schneier’s discussion, answers both the why? and the for what? of big data. In contrast, the standard definitions of big data put in terms of “Three V’s” or “Four V’s” or however many V’s merely list the general characteristics and uses of big data systems.
This puts us in a position to see why big data is such a big deal. The data constantly rolls in, and with some creativity, along with the possibility of joining data sets from different sources, it can shed light on the minutest details of our lives. (Schneier provides many clear examples of this.) The attractions for intelligence agencies and corporate marketing departments are obvious, and we are increasingly living with the results. Over the six chapters in Part One, Schneier describes the developments of the consequent big data revolution in gruesome and captivating detail. Most readers will come away convinced that their lives are far less private than they thought. Reading these chapters would be worthwhile for anyone with even minimal curiosity about the political, economic, and social effects of technology.
Part One is likely to induce at least a vague feeling of fright or anxiety in most readers. Part Two attempts to justify this uneasiness by explicitly appealing to ethical principles and values that big data seems to threaten. A highlight is the chapter called “Political Liberty and Justice,” which emphasizes the so-called “chilling effects” due to ubiquitous surveillance. Schneier compellingly explains how constant surveillance may dissuade us from engaging in some of the morally permissible and, in many cases, legal activities we would otherwise choose. Surveillance thus inhibits us, effectively reducing our liberty. As Schneier recognizes, this was the idea behind philosopher Jeremy Bentham’s famous panopticon—a prison designed to ensure compliance and conformity through (at least the appearance of) constant surveillance. Other useful observations come in the chapter on “Commercial Fairness and Equality,” in which Schneier points out ways in which surveillance through big data facilitates discrimination against individuals or groups.
Unfortunately, the weakest chapters of Part Two are those on which the most depends—viz., the chapters respectively entitled “Privacy” and “Security.” In order to justify his eventual prescriptions for limiting the collection and use of big data, it is crucial for Schneier to show that current big data policies are incompatible with a valuable sort of privacy, and furthermore that the losses in privacy due to big data are not outweighed by the increased security it helps provide. Schneier’s book falls short of a convincing case for either of these claims.
Schneier’s treatment of privacy is provocative, but it will likely be unconvincing to anyone not already on his side. Schneier’s view is that surveillance constituted by massive data collection—regardless of how the data is eventually used—is a serious privacy violation and, hence, constitutes harm. But Schneier does not theorize privacy in enough depth for us to see why we should agree.
Unsurprisingly, there already exists an extensive literature—spanning law, philosophy, and public policy—on privacy and information technology. Though much of that literature is not informed by the level of technical knowledge Schneier possesses, it offers some theoretical nuance that would have helped Schneier’s case against surveillance. If bulk data collection by computers is indeed a privacy violation, it is quite different from, say, an acquaintance listening in on your phone calls. Some state-of-the-art work on privacy, which dispenses with the public/private dichotomy as a tool of analysis, would put Schneier in a better position to address this. For instance, Helen Nissenbaum’s theory of privacy as contextual integrity understands privacy concerns in terms of violations of the various norms that govern the diverse social spheres of our lives. Such a theory may provide resources to better distinguish surveillance that is genuinely worrisome from more benign varieties. Schneier pays lip service to Nissenbaum’s idea that privacy concerns depend on context, but this acknowledgment is not reflected in his scantily justified, though adamant, insistence that surveillance through massive data collection is itself a violation of human rights.
Regardless, technologists, philosophers, and other thinkers should all put more thought into the open question of how our concern for privacy bears on massive data collection practices. It is no criticism of Schneier to say that he has not resolved this issue. However, without more headway here, some of Schneier’s policy prescriptions are less than convincing.
Like his treatment of privacy, Schneier’s discussion of security feels underdeveloped. Schneier’s central claims are that privacy and security are not in tension, and that mass surveillance does little to improve our security. On the former point, he makes some interesting observations about ways in which privacy and security can be mutually reinforcing. Also convincing is his explanation of why designing computer systems to allow surveillance makes them less secure. Furthermore, Schneier nicely lays out a case for the increasingly accepted claim that mass surveillance is not very effective in predicting acts of terrorism. But these points do not suffice to show that pitting privacy concerns against security concerns imposes “a false trade-off.”
Now, Schneier is indeed right to argue that predicting acts of terrorism with enough precision that we can stop them before they occur is nearly impossible. Schneier cites three factors to account for this: First, predictions culled from mining big data have a high rate of error. Second, acts of terrorism do not tend to fit neat patterns. And, third, terrorists actively attempt to avoid detection. We can accept this and still wonder: What about the would-be terrorist who wants to hurt people but also wants to avoid being caught? The more data we have collected and stored, the harder it is for anyone to do anything without leaving a digital trail. Thus, big data enhances our forensic capacities. And, because of this, mass surveillance may have the effect of deterring would-be terrorists. Furthermore, in the event of a terrorist act, a large intelligence database makes it easier to discover any infrastructure—whether technological or social—that the terrorists left behind, and finding this may help prevent future attacks. Whether these benefits are enough to justify the mind-bogglingly extensive intelligence programs revealed by the Snowden documents is a further question. The present point is simply that Schneier has not addressed all of the security-related reasons that might lead one to favor mass surveillance, at the expense of some kinds of privacy.
Arriving at Part Three, we find a laundry list of proposals for reform, all consonant with the ethical outlook espoused in Part Two. And it does read more like a list than like a unified platform. Most of the proposals receive only a page or two of discussion, which is not enough to make convincing cases for any of them. But that is not to deny the value of this part of the book. Like-minded readers (and even many dissenters) will peruse Schneier’s prescriptions with interest, finding in them possibilities worthy of more thorough scrutiny, development, and discussion. This may be just what Schneier intends; those are the discussions he hopes we will be having more often.
Schneier’s list of proposals includes reorganizing the U.S. government’s intelligence agencies and redefining their missions, increasing corporate liability for breaches of client data, creating a class of corporate information fiduciaries, and encouraging more widespread use of various privacy-enhancing technologies, especially encryption. This is only a fraction of the list, and, again, many of the ideas deserve to be taken seriously.
Schneier’s proposals prompt us to think in concrete ways about the costs and benefits of big data in the present and for the future. Big data promises huge gains in knowledge, but sometimes at the expense of a sort of privacy Schneier considers indispensable. Despite the many trade-offs, at least one of Schneier’s proposals ought to be a fairly easy sell for most people. That is the push for more widespread use of encryption.
Nothing about the use of encryption inherently precludes the existence of the continuously accreting data sets that characterize big data. If I am shopping on Amazon’s website over an encrypted connection, Amazon can still collect data about every product my pointer hovers over and every page I scroll through. It is just that third parties cannot see this (unless they are granted access). Such encryption is, of course, already standard for web commerce. But, in principle, all of our digital communication (every text, email, search, ad, picture, or video) could be encrypted. Then, at least ideally, only those whom we allow could collect our data. Thus, we gain some degree of control over how we are “surveilled” without severely quelling big data and its benefits. Of course, this would make it much harder for intelligence agencies to keep tabs on us, which is why some government leaders wish to limit encryption.
Overall, the strength of just Part One of "Data and Goliath" is enough to make this book worthy of an emphatic recommendation; it offers a stunningly rich understanding of the possible applications of big data and a visceral sense of some of its dangers. The other two parts are also stimulating, and they provide a helpful starting point for responding to the issues raised in Part One. The appeal of the book is broadened further by the extensive notes, which make it a valuable resource for academic researchers.
Of course, books on technology and society rarely stay relevant for long, since the technology advances so quickly. In spite of this, due to its detailed exposition and the pointed way it frames choices about our relationship to big data, "Data and Goliath" should be quite influential for the foreseeable future.