
This issue is provided by the Johns Hopkins University Press Journals Division and powered by Project MUSE®.


Terms and Conditions of Use

Thank you for purchasing this Electronic J-Issue from the Journals Division of the Johns Hopkins University Press. We ask that you respect the rights of the copyright holder by adhering to the following usage guidelines:

This issue is for your personal, noncommercial use only. Individual articles from this J-Issue may be printed and stored on your personal computer.

You may not redistribute, resell, or license any part of the issue.

You may not post any part of the issue on any web site without the written permission of the copyright holder.

You may not alter or transform the content in any manner that would violate the rights of the copyright holder.

Sharing of personal account information, logins, and passwords is not permitted.


SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

© 2014 by The Johns Hopkins University Press

Foreword

Following the exposure of the U.S. National Security Agency's (NSA) controversial surveillance program, there has been heated debate surrounding the collection and storage of personal data. Our latest issue of The SAIS Review of International Affairs, "Policy by Numbers: How Big Data is Transforming Security, Governance, and Development," seeks to move beyond the sensationalism that has accompanied the NSA revelations. We hope to provide readers with a more nuanced perspective on the role of data in international affairs, with a diverse collection of interviews, essays, and opinion editorials from scholars, technologists, and policymakers.

We explore the rise of big data, in which governments and profit-seeking organizations make policies and predictions based upon correlations among massive quantities of data. We examine the trend toward open data, in which governments provide valuable datasets directly to the public. We assess the impact of data—positive and negative, international and domestic—on public policy, national security, international development, and individual well-being.

While the rise of big and open data is associated with promising applications, there are still vast uncertainties regarding how best to exploit this technology. We hope that readers from the academic and public policy communities will feel empowered to enhance their understanding of technical tools and data analysis, in an age where technological innovation often outpaces government policy.

We begin with a conversation with Robert Kirkpatrick, Director of the United Nations Global Pulse Initiative. The UN Global Pulse Initiative collects and analyzes real-time data to better protect populations from socioeconomic shocks. Kirkpatrick explores the challenges associated with big data analytics, the surprising correlations among seemingly unrelated datasets, and the initiative’s effort to predict food price crises with data from social media.

Human rights data often impacts policy decisions. The next three articles explore the opportunities and risks associated with collecting and analyzing this sensitive information. Megan Price and Patrick Ball use case studies of violent conflicts in Syria and Iraq to evaluate data-gathering methodologies in conflict scenarios. They warn that datasets from conflict scenarios are often subject to bias, and should not be used in isolation to draw conclusions. Monti Narayan Datta argues that the collection of quantitative data on modern day slavery has generated discussion in the media and among policymakers on how to mitigate and eradicate slavery. Our interview with Arch Puddington, Vice President for Research at Freedom House, discusses worldwide trends in freedom, and the impact of Freedom House's annual reports and indices.


The rise of big and open data has a powerful impact on government policymaking. Pongkwan Sawasdipakdi frames data as an information weapon in the context of Thai domestic politics. She examines the government's rice-pledging scheme, and argues that contrasting datasets from the government and opposition parties are used to gain political power and credibility. Ian Kalin describes the theory and practice of open data policy in the United States, and explains how government leaders can replicate successful open data initiatives. Joel Gurin argues that open government and open data can improve economic growth, transparency, and citizen engagement. He also notes the obstacles to implementing open data initiatives in developing countries.

How can policymakers craft policies and frameworks that best take advantage of big data? Given the fast pace of innovation and the slow pace of policy, Kord Davis discusses how to bridge the gap between policymakers and innovators. He identifies spaces where the public and private sector can collaborate to produce effective and balanced policy. Aniket Bhushan argues that the rise of big data and open data has created an opportunity for disruptive innovation in international affairs. He offers examples related to real-time macroeconomic analysis, humanitarian response, and poverty measurement.

Data impacts national security and individual privacy, as well. Chris Poulin outlines the processes of data collection and analysis, using case studies from the Arab Spring, medical risk analysis, and his work at the Durkheim Project, a data analysis initiative that seeks to predict and prevent veteran suicides. David Rubin, Kim Lynch, Jason Escaravage, and Hillary Lerner explain how to balance the opposing forces of opportunity and risk, collective security and individual privacy, and innovation and protection when using data for national security programs.

Finally, we look to China for lessons on data infrastructure. Eric Hagt traces the history of China's satellite navigation system, Beidou, and compares its potential as a tool for development versus domestic and national security. Margaret Ross statistically analyzes the risk of potential disruptions to the global undersea cable communications network.

We conclude with analyses of influential literature and scholarly research. Ilaria Mazzocco reviews Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier. Bartholomew Thanhauser reviews Evgeny Morozov's To Save Everything, Click Here: The Folly of Technological Solutionism.

We would like to thank our advisory board for their guidance in shaping our exploration of data, our excellent editorial staff for their dedication and persistence, and our authors for their thoughtful work on complex global challenges. Their combined contributions made the publication of "Policy by Numbers" possible.

Meghan Kleinsteiber, Editor-in-Chief
Lauren Caldwell, Senior Editor

SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

© 2014 by The Johns Hopkins University Press

A Conversation with Robert Kirkpatrick, Director of United Nations Global Pulse

You are director of United Nations Global Pulse, an initiative to leverage real-time data and analytics to monitor impacts of international and local shocks. How did the idea of Global Pulse come about? What is your mission statement?

The initial idea of Global Pulse came about in the aftermath of the global financial crisis. There was a recognition that we live in a hyper-connected world where information moves at the speed of light, and crises and vulnerabilities can emerge quickly, but we're still using two- to three-year-old statistics to make most policy decisions. It was clear that there were swathes of people being pushed below the poverty line almost overnight, and we needed to modernize our systems and capacities for absorbing real-time information for decision-making.

As a result, United Nations Secretary-General Ban Ki-moon established Global Pulse in 2009 to act as an innovation lab and catalyst for the United Nations. We bring together global development experts, as well as experts from academia and the private sector, to explore how analysis of big data can reveal faster insights about human well-being and emerging vulnerabilities, in order to better protect populations from hunger, poverty, and disease.

So Global Pulse's mission is to accelerate the use of data science for sustainable development and humanitarian action, to address systemic barriers to adoption, and to cultivate a robust innovation ecosystem.

Robert Kirkpatrick is the director of UN Global Pulse, an initiative of the Executive Office of the United Nations Secretary-General. The Global Pulse initiative explores how Big Data and real-time analytics technologies can power a more agile approach to sustainable development.



As the New York Times wrote in its August 2013 profile of Global Pulse, the United Nations is often perceived as a “sprawling bureaucracy.” What makes the Global Pulse team unique? What qualities—personal and professional—do you seek in a team member?

Global Pulse is unique because we have an "intrapreneurial" approach. It requires risk-taking and innovation to discover and generate new tools, techniques, and methodologies to help the UN system and wider community leverage new sources of real-time information and insights in the service of humanitarian response and development work. This also requires a real blend of expertise from within and outside of the UN. Due to the experimental nature of our work, we are set up as a network of labs.

We have multidisciplinary teams working at our Pulse Labs in New York, Jakarta, and Kampala that include data scientists and analysts, social scientists, legal experts, and communications and partnerships specialists. Pulse Lab teams design, scope, and co-create projects with UN agencies and national institutions that provide sectoral expertise, and with private sector or academic partners who provide access to data or analytical and engineering tools.

When building a team, I look for "T-shaped people"—that is, people with a broad range of skills and a flexible attitude, as well as deep knowledge of one discipline, whether it is data science, design, partnership management, or legal and privacy matters.

From what range of sources do you derive the data used for your analyses? Which datasets do you consider to be the most unique or surprising? What are the challenges associated with data collection and analysis?

Global Pulse is interested in trends that reveal something about human well-being, which can be revealed from data produced by people as they go about their daily lives (sometimes known as “data exhaust”). Broadly speaking, we have been exploring two types of data in the Pulse Labs. The first is data that reflects “what people say,” which includes publicly available content from the open web, such as tweets, blog posts, news stories, and so forth. The second is data that reflects “what people do,” which can include information routinely generated for business intelligence and to optimize sales in a private sector company. An example of “what people do” data is anonymized mobile phone traffic information, which can reveal everything from footfall in a shopping district during rush hour to how a population migrates after a natural catastrophe.

A dataset that may be surprising is postal data (the traffic and volume of packages being shipped), which can be used as a proxy for GDP and economic activity in a country or region. We are beginning a series of research projects with the Universal Postal Union (UPU), the United Nations specialized agency for the postal sector, to explore this relationship further.

There are several challenges associated with moving this kind of analysis out of an innovation lab and into practice, including the need to build skills and capacity around data science, to form sustainable partnerships with potential data providers in the private sector, and to identify where new data and insights can fit into the planning and decision-making processes. And most importantly, we must take data protection and privacy norms, policies, and techniques to a new level to mitigate the potential for misuse. Our mission to find responsible ways of using big data for global development purposes does not include analyzing private or confidential information. We follow, and advocate for, robust privacy protection principles.

Does Global Pulse focus on certain sectors? If so, why?

Although we can and do work with any part of the UN system that has a development problem that data science might contribute to solving, there are certain areas that are particularly well-suited to big data analysis. This year, we will focus in particular on public health, including attitudes to health as expressed on social media, news media, and patterns in anonymized search data. For example, in partnership with the World Health Organization, we are exploring whether early warning of non-communicable disease risk factors in a country or community could be understood via analysis of key words in social media data. We continue to look at parental attitudes to immunizing children as expressed on social media, in order to address misinformation that stops parents from protecting their children against preventable diseases.

Another research priority is food security. In Indonesia, our Pulse Lab Jakarta research team is exploring whether big data can provide insights about the impacts of food price changes, in order to support the social protection policies of the government of Indonesia. Other areas of focus this year include supporting humanitarian action through new data analytics techniques, finding new ways to measure economic well-being, and using digital data mining to help shape the priority development agenda that will replace the Millennium Development Goals after they expire in 2015. Across all sectors, though, Global Pulse conducts a range of activities to strengthen the big data for development (BD4D) ecosystem by guiding the development of regulatory frameworks and technical standards to address data-sharing and privacy protection challenges. We support an emerging community of practice to accelerate public sector adoption through advocacy, policy guidance, and technical assistance.

How do you identify and maintain relationships with your private sector partners?

Private sector partners are incredibly important in helping us leverage big data as a resource for sustainable development. Using big data responsibly and effectively requires several different elements, so there are different areas of expertise, knowledge, and resources we look for when building partnerships. The Global Pulse network of partners and collaborators includes forward-thinking private sector companies that are willing to engage in "data philanthropy," by granting access to data and technology tools to the public sector. Our network also includes industry leaders, universities, research institutes, and non-profit networks of researchers and innovators who are ready to bring their skills and expertise to bear for advancing the use of data science across the global development and humanitarian fields.

To establish and maintain these relationships, we have a partnership manager and a privacy and legal expert, both of whom help guide potential partners through the process. They work with counterparts in the companies to ensure that safeguards, legal agreements, and data protection principles are in place. Once collaboration is underway, our research team will work closely with data analysts in the partner organization to initiate a project or exploration. Often, the data never leaves the business that owns it; rather, our data scientists guide the process and then the trends or results are shared. This modality works well when the data is sensitive.

The experience with our partners, overall, is one of mutual learning. The Pulse Lab network offers a safe “sandbox” for de-risking this type of experimentation as we all learn together how the public and private sector can responsibly harness big data for development.

Could you share a few Global Pulse success stories? Similarly, which development challenges (such as particular regions or issues) are particularly difficult to tackle?

There are success stories on the data philanthropy front in which telecommunications companies have made anonymized datasets available as part of a competition or challenge. For example, last year we collaborated with Orange Telecom to host a "Data for Development Challenge" in which the company opened up a dataset of anonymized mobile phone data to more than eighty research teams from around the world to analyze. This research garnered insights that the international development community can be inspired by or learn from.

In terms of projects we are carrying out, as I mentioned previously, our Pulse Lab in Indonesia is conducting research on mining tweets to understand food price crises. Their research has provided new insights into the very real problem of sudden increases in the price of staple foodstuffs, like rice, pushing families below the poverty line and causing regional economic instability. Real-time information about these impacts could help policymakers and governments provide support to families who are suffering as a result of food price hikes.

Going forward, we plan to conduct further research on social media analysis for food security and for crowdsourcing food prices, since these are areas of focus for the government of Indonesia, and will be applicable in many other parts of the world. Certainly, this is not a solution that can be applied universally. Social media analysis is of limited use in countries where internet penetration is low, and even in regions of a country where the digital divide is vast. There are also analytical challenges yet to be resolved, including the necessity of building technologies that can support diverse local languages.

We are in the early days of discovering how big data can be applied to development and humanitarian contexts, and there are diverse challenges ranging from data access and the capacity to use real-time data in decision-making to data privacy. These challenges will be addressed over time.

As Viktor Mayer-Schönberger and Kenneth Cukier wrote, big data is revolutionizing the way we solve problems. You have noted several ways that data collection and analysis are helping Global Pulse address global development challenges. But does the shift toward big data analytics have drawbacks for the field of international development, as well?

There are risks rather than drawbacks. I hear a lot about the supposed fear that all development decisions would be made by algorithms. This is not a realistic fear, but rather a false dichotomy between quantitative use of data in decision-making and policymakers using qualitative experiences to decide a course of action. Big data will always be one part of a solution, not the only solution. Of course, we need research, official statistics, and the deep knowledge of field workers, communities, and practitioners, but big data and data science represent a useful addition to the development and humanitarian worker's toolbox.

Another common misperception is that real-time data would replace official statistics, but this assumption is unrealistic, as well. Official statistics will continue to provide high-quality snapshots of progress that can be benchmarked. But increasingly, between those annual, bi-annual, or monthly updates, real-time data sources will provide valuable interim feedback and indicators. This feedback can enable course-correction when it is evident that a program isn't working. And real-time data can reveal shifts in food-pricing, population changes, or disease outbreaks within a day or an hour, rather than a month or a year.


Many graduate schools of international affairs, including the Johns Hopkins School of Advanced International Studies (SAIS), offer courses or concentrations in international development. Do you find that these programs offer adequately rigorous quantitative or data analysis requirements? What skills should students develop if they intend to enter the field of international development?

There is a need for greater skills capacity for data analysis—this is something that is needed across the board and not only in our field. The international development practitioner of the future will be someone who is data literate, and capable of using data analysis to inform his or her understanding and decision-making. So yes, we'd like to see graduate schools covering data for development and the processes involved. Just as the schools of journalism are now teaching data journalism, all students must understand how to identify credible informative sources, how to perform—or at least understand—quantitative statistical and data analysis, and how to appropriately use data to inform their judgment. The good news is that I see a lot of appetite for these skills from current students, so this change is beginning to happen.


SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

© 2014 by The Johns Hopkins University Press

Big Data, Selection Bias, and the Statistical Patterns of Mortality in Conflict

Megan Price and Patrick Ball

The notion of “big data” implies very specific technical assumptions. The tools that have made big data immensely powerful in the private sector depend on having all (or nearly all) of the possible data. In our experience, these technical assumptions are rarely met with data about the policy and social world. This paper explores how information is generated about killings in conflict, and how the process of information generation shapes the statistical patterns in the observed data. Using case studies from Syria and Iraq, we highlight the ways in which bias in the observed data could mislead policy. The paper closes with recommendations about the use of data and analysis in the development of policy.

Introduction

Emerging technology has greatly increased the amount and availability of data in a wide variety of fields. In particular, the notion of "big data" has gained popularity in a number of business and industry applications, enabling companies to track products, measure marketing results, and in some cases, successfully predict customer behavior.1 These successes have, understandably, led to excitement about the potential to apply these methods in an increasing number of disciplines.

Megan Price is the director of research at the Human Rights Data Analysis Group. She has conducted data analyses for projects in a number of locales including Syria and Guatemala. She recently served as the lead statistician and head author of two reports commissioned by the Office of the United Nations High Commissioner for Human Rights.

Patrick Ball is the executive director of the Human Rights Data Analysis Group. Beginning in El Salvador in 1991, Patrick has designed technology and conducted quantitative analyses for truth commissions, non-governmental organizations, domestic and international criminal tribunals, and United Nations missions. Most recently, he provided expert testimony in the trial of former de facto President of Guatemala, Gen. José Efraín Ríos Montt.

The materials contained herein represent the opinions of the authors and editors and should not be construed to be the view of HRDAG, any of HRDAG’s constituent projects, the HRDAG Board of Advisers, the donors to HRDAG, or this project.


Although we share this excitement about the potential power of data analysis, our decades of experience analyzing data about conflict-related violence motivate us to proceed with caution. The data available to human rights researchers is fundamentally different from the data available to business and industry. The difference is whether the data are complete. In most business processes, an organization has access to all the data: every item sold in the past twelve months, every customer who clicked through their website, etc. In the exceptional cases where complete data are unavailable, industry analysts are often able to generate a representative sample of the data of interest.2

In human rights, and more specifically in studies of conflict violence, we rarely have access to complete data. What we have instead are snapshots of violence; typical sources include a few videos of public killings posted to YouTube, a particular set of events retrospectively recorded by a truth commission, stories covered in the local or international press, protesters' SMS messages aggregated onto a map, and victims' testimonies recorded by non-governmental human rights organizations (NGOs). Statistically speaking, these snapshots are "convenience samples," and they cover an unknown proportion of the total number of cases of violence.3 It is mathematically difficult, often impossible, to know how much is undocumented and, consequently, missing from the sample.

Incompleteness is not a criticism of data—collecting complete or representative data under conflict conditions is generally impossible. The challenge is that researchers and advocates naturally want to address questions that require either the total number or a representative subset of cases of violence. How many people have been killed? What proportion was from a vulnerable population? Were more victims killed last week or this week? Which perpetrator(s) are committing the majority of the violence? Basing answers and policy decisions on analyses of partial datasets with unknown, indeed unknowable, biases can prove to be misleading. These concerns should not deter researchers from asking questions of data; rather, they should caution them against basing conclusions on inadequate analyses of raw data. We conclude by suggesting methods from several quantitative disciplines to estimate the bias in direct observations.

The Problem of Bias

When people record data about events in the world, the records are almost always partial; reasons why the observation of violence often misses some or most of the violence are presented in the examples to follow. Most samples are partial, and in samples not collected randomly, the patterns of omission may have structure that influences the patterns observed in the data. For example, killings in urban areas may be nearly always reported, while killings in rural areas are rarely documented. Thus, the probability of an event being reported depends on where the event happened. Consequently, analysis done directly from this data will suggest that violence is primarily urban. This conclusion is incorrect because the data simply do not include many (or at least proportionally fewer) cases from the rural areas. In this case, the analysis is finding a pattern in the documentation that may appear to be a pattern in true violence—but if analysts are unaware of the documentation group's relatively weaker coverage of the rural areas, they can be misled by the quantitative result. In our experience, even when analysts are aware of variable coverage in different areas, it is enormously difficult to draw a meaningful conclusion from a statistical pattern that is affected by bias.

Statisticians call this problem “selection bias” because some events (in this example, urban ones) are more likely to be “selected” for the sample than other events (in this example, rural ones). Selection bias can affect human rights data collection in many ways.4 We use the word “bias” in the statistical sense, meaning a statistical difference between what is observed and what is “truth” or reality. “Bias” in this sense is not used to connote judgment. Rather, the point is to focus attention on empirical, calculable differences between what is observed and what actually happened.

In this article, we focus on a particular kind of selection bias called “event size bias.” Event size bias is the variation in the probability that a given event is reported, related to the size of the event: big events are likely to be known, small events are less likely to be known. In studies of conflict violence, this kind of bias arises when events that involve only one victim are less likely to be documented than events that involve larger groups of victims. For example, a market bombing may involve the deaths of many people. The very public nature of the attack means that the event is likely to attract extensive attention from multiple media organizations. By contrast, an assassination of a single person, at night, by perpetrators who hide the victim’s body, may go unreported. The victim’s family may be too afraid to report the event, and the body may not be discovered until much later, if at all. These differences in the likelihood of observing information about an event can skew the available data and result in misleading interpretations about patterns of violence.5
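To make the mechanism concrete, here is a minimal simulation of event size bias. It is our illustration rather than the authors' analysis, and every event count and reporting probability in it is invented:

    import random

    random.seed(42)

    # Invented ground truth: many small killings, a few large massacres.
    true_events = [1] * 900 + [5] * 80 + [40] * 20  # victims per event

    # Assume the chance that an event is documented grows with its size.
    def report_prob(size):
        if size == 1:
            return 0.20   # isolated killings are rarely documented
        if size <= 5:
            return 0.50
        return 0.95       # large public events are almost always covered

    observed = [s for s in true_events if random.random() < report_prob(s)]

    def share_of_deaths_in_large_events(events, cutoff=20):
        return sum(s for s in events if s >= cutoff) / sum(events)

    # The true share is about 0.38; the observed share comes out near
    # 0.67, so the raw data overstate the role of large events.
    print(share_of_deaths_in_large_events(true_events))
    print(share_of_deaths_in_large_events(observed))

Nothing about the underlying violence changed between the two printed numbers; only the documentation process differs, which is exactly the distortion described above.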

Case Studies

We present here two examples from relatively well-documented conflicts. Some analysts have argued that information about conflict-related killings in Iraq and Syria is complete, or at least sufficient for detailed statistical analysis. In contrast, our analysis finds that in both cases, the available data are likely to be systematically biased in ways that confound interpretation.

Syria

Many civilian groups are currently carrying out documentation efforts in the midst of the ongoing conflict in Syria. In early 2012, the United Nations Office of the High Commissioner for Human Rights (OHCHR) commissioned the Human Rights Data Analysis Group (HRDAG) to examine datasets from several of these groups, and in two reports, Price et al. provide in-depth descriptions of these sources.6 In this section, we focus our attention on four sources—in essence, lists of people killed—which cover the entire length of the ongoing conflict and which have continued to provide us with updated records of victims. These sources are the Syrian Center for Statistics and Research7 (CSR-SY), the Syrian Network for Human Rights8 (SNHR), the Syria Shuhada website9 (SS), and the Violations Documentation Centre10 (VDC).

Figure 1 shows the number of victims documented by each of the four sources over time within the Syrian governorate of Tartus. The large peak visible in all four lines in May 2013 corresponds to an alleged massacre in Banias.11 It appears that all four sources documented some portion of this event. Many victims were recorded in the alleged massacre; the event was very well reported, and all four of our sources reflect it in their lists. However, three out of the four sources document very little violence occurring before or after May 2013 in Tartus. The fourth source, VDC, shows the peak of violence in May as the culmination of a year of consistent month-to-month increases in the number of reported killings.

When interpreting figures such as Figure 1, we should not aim to identify a single "correct" source. All of these sources are documenting different snapshots of the violence, and all of them are contributing substantial numbers of unique records of victims undocumented by the other sources.12 The presence of event size bias is detectable in this particular example because all four of the sources obviously captured a similar event (or set of events) in May 2013, while at the same time one of those sources captured a very different subset of events during the preceding months. If we did not have access to the VDC data, our analysis of conflict violence in Tartus would incorrectly conclude that the alleged massacre in May 2013 was an isolated event surrounded by relatively low levels of violence.

The conclusion from Figure 1 should not be that VDC is doing a “better” job of documenting victims. VDC is clearly capturing some events that are not captured by the other sources, but there is no way to tell how many events are not being captured by VDC. From this figure alone we cannot conclude what other biases may be present in the observed data. For example, the relatively small peak in February 2012 could be as small as it seems, or it could be as large as the later peak in May 2013. Without a method of statistical estimation that uses a probability model to account for the undocumented events, it is impossible to know.13

To underline this crucial point: despite the availability of a large amount of data describing violence in Tartus, there is no mathematically sound method to draw conclusions about the patterns of violence directly from the data (though it is possible to use the data and statistical models to estimate how many events are missing). The differences in the four sources available to us make it possible to detect the event size bias occurring in May 2013, but what other biases might also be present in this observed data and hidden from view? What new events might a fifth, sixth, or seventh source document? Are there enough undocumented events such that if they were included, our interpretation of the patterns would change? These are the crucial questions that must be examined when interpreting perceived patterns in observed data.

Figure 1. Number of Victims Documented by Four Sources, Over Time, in Tartus

Iraq

We detect a subtler form of event size bias in data from the Iraq Body Count (IBC), which indexes media and other sources that report on violent deaths in Iraq since the Allied invasion in March 2003.14 Our analysis is motivated by a recent study by Carpenter et al., which found evidence of substantial event size bias.15 Their approach was to compare the U.S. military's "significant acts" (SIGACTS) database to the IBC records. As they report, this comparison showed that "[e]vents that killed more people were far more likely to appear in both datasets, with 94.1% of events in which ≥20 people were killed being likely matches, as compared with 17.4% of … killings [that occurred one at a time]."16 This implies that IBC, SIGACTS, or both, capture a higher fraction of large events than small events. Carpenter et al. go on to note that "[t]he possibility that large events, or certain kinds of events (e.g., car bombs) are overrepresented might allow attribution that one side in a conflict was more recklessly killing civilians, when in fact, that is just an artifact of the data collection process."17

Motivated by this analysis, we considered other ways to examine IBC records for evidence of potential event size bias. Since IBC aggregates records from multiple sources, updated IBC data already incorporates many records from SIGACTS.18 In contrast to the work of Carpenter et al., who treated IBC and SIGACTS as two separate data sources and conducted their own independent record linkage between the two sources, we examined only records in the IBC database, including those labeled as from SIGACTS.

It should be noted that we conducted this analysis on a subset of the data after filtering out very large events with more than fifty victims. We made this choice because, on inspection, many of the records with larger numbers of reported victims are data released in batches by institutions such as morgues, or incidents aggregated over a period of time, rather than specific, individual events.

We began by identifying the top one hundred data sources; one or more of the top one hundred sources cover 99.4 percent of the incidents in IBC.19 Given these sources, we counted the number of sources (up to one hundred) for each event. Event size was defined as the mean (rounded to the nearest integer) of the reported maximum and minimum event size values. Then the data were divided into three categories: events with one victim, events with two to five victims, and events with six to fifty victims. The analysis was performed on these groups.
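Footnote 18 indicates the authors processed the IBC incident file with the pandas package in Python. The sketch below shows one plausible way to reproduce the grouping just described; the column names and the semicolon-delimited source field are our assumptions for illustration, not the actual IBC schema:

    import pandas as pd

    # Hypothetical rows standing in for the IBC incident file.
    df = pd.DataFrame({
        "sources":  ["AFP;REU", "NYT", "AP;REU;CNN;LAT", "AFP", "REU;AP"],
        "kill_min": [1, 1, 4, 2, 12],
        "kill_max": [1, 1, 8, 3, 20],
    })

    # Event size: mean of the reported maximum and minimum, rounded.
    df["size"] = ((df["kill_min"] + df["kill_max"]) / 2).round().astype(int)

    # Drop very large events (more than fifty victims), as in the text.
    df = df[df["size"] <= 50]

    # Number of sources per event, capped at three ("three or more").
    df["n_sources"] = df["sources"].str.split(";").str.len().clip(upper=3)

    # Size categories used in the article: 1, 2-5, and 6-50 victims.
    df["size_cat"] = pd.cut(df["size"], bins=[0, 1, 5, 50],
                            labels=["1", "2-5", "6-50"])

    # Proportion of events in each size category covered by one, two,
    # or three-or-more sources: the quantity plotted in Figure 2.
    print(pd.crosstab(df["size_cat"], df["n_sources"], normalize="index"))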

Figure 2 summarizes our findings. The shading of each bar in Figure 2 indicates the proportion of events of that size reported by one, two, or three or more sources. For each category of event sizes, most events have two sources. For events of size one, the second most frequent number of sources is one, accounting for nearly a third of all events of this size; almost no single-victim events have three or more sources. The number of events with three or more sources increases quickly in medium-sized events and in large events. Relatively few of the largest events are reported by a single source. Thus there seems to be a relationship between event size and the number of sources: larger events are captured by more sources. This reinforces the finding by Carpenter et al. that larger events are more likely to be captured by both IBC and SIGACTS. We have generalized this finding to the top one hundred sources; larger events are more likely to be captured by multiple sources.

The number of sources covering an event is an indicator of how "interesting" an event is to a community of documentation groups—in this case, media organizations. The pattern shown in Figure 2 implies that media sources are more interested in larger events than smaller events. Greater interest in the larger events implies that larger events are more likely to be reported (observed) by multiple sources relative to smaller events. Since a larger proportion of small events are covered by only a single source, it is likely that more small events are missed, and therefore excluded from IBC.20


As noted by Carpenter et al., the correlation between event attributes and the likely reporting of those events can result in highly misleading interpretation of apparent patterns in the data. As a relatively neutral example, analysts might erroneously conclude that most victims in Iraq were killed in large events, whereas this may actually be an artifact of the data collection. A potentially more damaging, incorrect conclusion might be reached if large events are centered in certain geographic regions or attributed to certain perpetrators; in these cases, reading the raw data directly would mistake the event size bias for a true pattern, thereby misleading the analyst. Inappropriate interpretations could result in incorrect decisions regarding security measures, intervention strategies, and ultimately, accountability.


Figure 2. Proportion of Events Covered by One, Two, or Three or More Sources


Discussion

Event size bias is one of many kinds of selection and reporting biases that are common to human rights data collection. It is important to recall that we refer here to biases in the statistical sense: a measurable difference between the observed sample and the underlying population of interest. The biases that worry us here affect statistics and quantitative analyses; we are not implying that the political goals of the data collection groups have influenced their work.

In the context of conflict violence, meaningful statistical analysis involves comparisons to answer questions such as: Did more violence occur this month or last month? Were there more victims of ethnicity A or B? Did the majority of the violence occur in the north or the south of the country? The concern about bias focuses on how the data collection process may more effectively document one month relative to another, creating the appearance of a difference between the months. Unfortunately, the apparent difference is the result of changes in the documentation process, not real changes in the patterns of violence.

To make sense of such comparisons, the observed data must in some way be adjusted to represent the true rates. There are a number of methods for making this adjustment if the observed data were collected at random, but this is rarely the case. There are relatively few models that can adjust data that were collected simply because they were available.

In order to compare nonrandom data across categories like months or regions, the analyst must assume that the rate at which events from each category are observed is the same: for example, that 60 percent of the total killings were collected in March, and 60 percent of the total killings were collected in April. This rate is called the coverage rate, and it is unknown, unless somehow the true number of events were known or estimated. If the coverage rates for different categories are not the same, the observed data tell only the story of the documentation; they do not indicate an accurate pattern. For example, if victims of ethnicity A are killed in large-scale violent events with many witnesses, while victims of ethnicity B are killed in targeted, isolated violent events, we may receive more reports of victims of ethnicity A and erroneously conclude that the violence is targeted at ethnicity A. Until we adjust for the event size bias resulting in more reports of victims of ethnicity A, we cannot draw conclusions about the true relationship between the number of victims from ethnicity A versus B.
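A short numeric sketch (our own, with invented figures) shows how unequal coverage rates can flip such a comparison:

    # Invented ground truth: ethnicity B suffers more killings than A.
    true_A, true_B = 600, 1000

    # Invented coverage rates: A's victims die in large, public events
    # that are well documented; B's die in isolated, hidden events.
    coverage_A, coverage_B = 0.70, 0.35

    observed_A = true_A * coverage_A  # 420 documented victims
    observed_B = true_B * coverage_B  # 350 documented victims

    # The raw counts suggest A suffered more violence; the truth is
    # the opposite. Equal coverage rates would preserve the ordering.
    print(observed_A, observed_B)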

There are many other kinds of selection bias. As an example, when relying on media sources, journalists make decisions about what is considered newsworthy. Sometimes their decisions may create event size bias, as large events are frequently considered newsworthy. But the deaths of individual, prominent members of a society are frequently also considered newsworthy. Conversely, media "fatigue" may result in under-documentation later in a conflict, or when other newsworthy stories limit the amount of time and space available to cover victims of a specific conflict.21 Many other characteristics of both the documentation groups and the conflict can result in these kinds of biases, such as logistical or budgetary limitations, trust or affinity variations within the community, and the security and stability of the situation on the ground.22 As each of these factors changes, coverage rates are likely to change as well.

The fundamental reason why biases are so problematic for quantitative analyses is that bias often correlates with other dimensions that are interesting to analysts, such as trends over time, patterns over space, differences compared by the victims' sex, or some other factor. As in the example of ethnicities A and B above, the event size bias is correlated with the kind of event. Failing to adjust for the reporting bias leads to the wrong conclusion. As another example, consider the Iraq case described above: if event size is correlated with the events' perpetrators, then bias on event size means bias on perpetrator, and a naïve reading of the data could lead to security officials trying to solve the wrong security problems. Or, in the Syria case, if decisions about resource allocation to Tartus were made on the basis of the observed information, without taking into account the patterns of killings that were not observed, researchers may have inaccurately concluded that violence documented in May 2013 represented an isolated event. One could imagine that such a conclusion could lead to any number of incorrect decisions: sending aid groups into Tartus under the erroneous assumption of relative security, or failing to send aid and assistance before or after May 2013, assuming that such resources were needed more urgently elsewhere.

It is important to note that these challenges frequently lack a scientific solution.23 We do not need to simply capture more data. What we need is to appropriately recognize and adjust for the biases present in the available data. Indeed, as indicated in the Iraq example, where multiple media sources appear to share similar biases, the addition of more data perpetuates and in some cases amplifies the event size bias.

Detection of, and adjustment for, bias requires statistical estimation. A wide variety of statistical methods can be used to adjust for bias and estimate what is missing from observed data. In our work we favor multiple systems estimation, which has been developed under the name capture-recapture in ecology, and used to study a variety of human populations in research in demography and public health. Analysts more familiar with traditional survey methods often prefer adjustments based on post-stratification or "raking," each of which involves scaling unrepresentative data to a known representative sample or population.24 Each method has limitations and requires assumptions, which may or may not be reasonable, but formal statistical models provide a way to make those assumptions explicit, and in some cases, to test whether they are appropriate. Comparisons from raw data implicitly but necessarily assume that such snapshots are statistically representative. This assumption may sometimes be true, but only by coincidence.
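To illustrate the capture-recapture logic behind multiple systems estimation, here is the simplest two-list (Lincoln–Petersen) estimator; real MSE analyses use three or more lists and model dependence between them, and the counts below are invented:

    def lincoln_petersen(n1, n2, m):
        """Two-list capture-recapture estimate of a total population.

        n1, n2 -- records on each list; m -- records on both (m > 0).
        Assumes the two lists are independent and every individual is
        equally likely to be captured. Conflict data rarely satisfy
        this, which is why practical MSE uses more lists and models
        dependence among them.
        """
        return n1 * n2 / m

    # Invented example: two documentation groups record 400 and 300
    # victims, and matching finds 120 victims appearing on both lists.
    total = lincoln_petersen(400, 300, 120)   # estimate: 1000.0
    documented = 400 + 300 - 120              # 580 unique victims
    print(total, total - documented)          # roughly 420 undocumented

The intuition is that a large overlap between lists suggests the lists have nearly exhausted the population, while a small overlap suggests that many cases remain unobserved.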

Page 20: Sais.34.1

18 Big Data, Selection BiaS, anD Mortality in conflictSAIS Review Winter–Spring 2014

Conclusions

Carpenter et al. warn that "press members and scientists alike should be cautious about assuming the completeness and representativeness of tallies for which no formal evaluation of sensitivity has been conducted. Citing partial tallies as if they were scientific samples confuses the public, and opens the press and scholars to being manipulated in the interests of warring parties." In a back-of-the-envelope description elsewhere, we have shown that small variations in coverage rates can lead to an exactly wrong conclusion from raw data.25

Groups such as the Iraq Body Count, the Syrian Center for Statistics and Research, the Syrian Network for Human Rights, the Syria Shuhada website, and the Violations Documentation Centre collect invaluable data, and they do so systematically and with principled discipline. These groups should continue to collate and share these data as a fundamental record of the past. The data can also be used in qualitative research about specific cases, and in some circumstances, in statistical models that can adjust for biases.

It is tempting, particularly in emotionally charged research such as studies of conflict-related violence, to search available data for answers. It is intuitive to create infographics, to draw maps, and to calculate statistics and draft graphs to look for patterns in the data. Unfortunately, all people—even statisticians—tend to draw conclusions even when we know that the data are inadequate to support comparisons. Weakly founded statistics tend to mislead the reader.

Statistics, graphs, and maps are seductive because they seem to promise a solid basis for conclusions. The current obsession with using data to formulate evidence-based policy increases the pressure to use statistics, even as new doubts emerge about whether "big data" predictions about social conditions are accurate.26 When calculations are made in a way that enables a mathematical foundation for statistical inference, these statistics deliver on the promise of an objective measurement in relation to a specific question. But analysis with inadequate data is very hard even for subject matter experts to interpret. In the worst case, it offers a falsely precise view, a view that may be completely wrong. In the best case, it invites speculation about what's missing and what biases are uncontrolled, creating more questions than answers, and ultimately, a distraction. When policymakers turn to statistical analysis to address key questions, they must ensure that the analysis gives the right answers.


Notes

1. One extreme example includes Target successfully predicting a customer's pregnancy, as reported in the New York Times and Forbes. In particular, Target noticed that pregnant women buy specific kinds of products at regular points in their pregnancy, and the company used this information to build marketing campaigns.
2. However, it is certainly worth noting that even in these contexts, sometimes big data are not big enough and may still be subject to the kinds of biases we worry about in this paper. See Kate Crawford's keynote at STRATA and Tim Harford's recent post in the Financial Times for examples.
3. Specifically, "convenience samples" refer to data that are non-randomly collected, though collecting such data is rarely convenient.
4. Another common kind of bias that affects human rights data is reporting bias. Whereas selection bias focuses on how the data collection process identifies events to sample, reporting bias describes how some points become hidden, while others become visible, as a result of the actions and decisions of the witnesses and interviewees. For an overview of the impact of selection bias on human rights data collection, see Jule Krüger, Patrick Ball, Megan Price, and Amelia Hoover Green (2013). "It Doesn't Add Up: Methodological and Policy Implications of Conflicting Casualty Data." In Counting Civilian Casualties: An Introduction to Recording and Estimating Nonmilitary Deaths in Conflict, ed. Taylor B. Seybolt, Jay D. Aronson, and Baruch Fischhoff. Oxford UP.
5. Christian Davenport and Patrick Ball. "Views to a Kill: Exploring the Implications of Source Selection in the Case of Guatemalan State Terror, 1977–1996." Journal of Conflict Resolution 46(3): 427–450. 2002.
6. Megan Price, Jeff Klingner, Anas Qtiesh, and Patrick Ball (2013). "Full Updated Statistical Analysis of Documentation of Killings in the Syrian Arab Republic." Human Rights Data Analysis Group, commissioned by the United Nations Office of the High Commissioner for Human Rights (OHCHR). Megan Price, Jeff Klingner, and Patrick Ball (2013). "Preliminary Statistical Analysis of Documentation of Killings in the Syrian Arab Republic." The Benetech Human Rights Program, commissioned by the United Nations Office of the High Commissioner for Human Rights (OHCHR).
7. http://www.csr-sy.com
8. http://www.syrianhr.org
9. http://syrianshuhada.com
10. http://www.vdc-sy.info
11. See reports in the LA Times, BBC, and the Independent, among others.
12. Price et al. 2013.
13. See https://hrdag.org/mse-the-basics/ for the first in a series of blog posts describing Multiple Systems Estimation (MSE), or Kristian Lum, Megan Emily Price, and David Banks (2013). "Applications of Multiple Systems Estimation in Human Rights Research." The American Statistician, 67:4, 191–200. DOI: 10.1080/00031305.2013.821093
14. http://www.iraqbodycount.org
15. Carpenter D, Fuller T, Roberts L. "WikiLeaks and Iraq Body Count: the sum of parts may not add up to the whole—a comparison of two tallies of Iraqi civilian deaths." Prehosp Disaster Med. 2013;28(3):1–7. doi:10.1017/S1049023X13000113
16. Ibid.
17. Ibid.
18. We downloaded the ibc-incidents file on 14 Feb 2014 and processed it using the pandas package in Python.
19. The top 100 sources include, for example, AFP, AL-SHAR, AP, CNN, DPA, KUNA, LAT, MCCLA, NINA, NYT, REU, VOI, WP, XIN, and US DOD VIA WIKILEAKS.
20. These assumptions can be formalized and tested within the framework of "species richness," which is a branch of ecology that estimates the number of different types of species within a geographic area and/or time period of interest using models for data organized in a very similar way to the IBC's event records. See Wang, Ji-Ping. "Estimating species richness by a Poisson-compound gamma model." Biometrika 97.3 (2010): 727–740.
21. A research question to address this might be: Do media-reported killings in a globally interesting conflict like Iraq or Syria decline during periods when other stories attract interest? Do reported killings decline during the Olympics?
22. Krüger et al. (2013).
23. Bias issues can sometimes be resolved with appropriate statistical models, that is, with better scientific reasoning about the specific kind of data involved. However, we underline that bias is not solvable with better technology. Indeed, some of the most severely biased datasets we have studied are those collected by semi- or fully-automated, highly technological methods. Technology tends to increase analytic confusion because it tends to amplify selection bias.
24. For a description of multiple systems estimation, see Lum et al. 2013. For methods on missing data in survey research which might be applicable to the adjustment of raw, non-random data if population-level information is available, see Brick, J. Michael, and Graham Kalton. "Handling missing data in survey research." Statistical Methods in Medical Research 5.3 (1996): 215–238. For an overview of species richness models which might be used to estimate total populations from data organized like the IBC, see op. cit. Wang. For an analysis of sampling issues in "elusive" populations, see Johnston, Lisa G., and Keith Sabin. "Sampling hard-to-reach populations with respondent driven sampling." Methodological Innovations Online 5.2 (2010): 38–48.
25. https://hrdag.org/why-raw-data-doesnt-support-analysis-of-violence/
26. Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. "Google Flu Trends Still Appears Sick: An Evaluation of the 2013–2014 Flu Season" (March 13, 2014). Available at SSRN: http://ssrn.com/abstract=2408560

SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

© 2014 by The Johns Hopkins University Press

Using Big Data and Quantitative Methods to Estimate and Fight Modern Day Slavery

Monti Narayan Datta

Given the hidden, criminal nature of contemporary slavery, empirically estimating the proportion of the population enslaved at the national and global level is a challenge. At the same time, little is understood about what happens to the lives of the survivors of slavery once they are free. I discuss some data collection methods from two nongovernmental organizations (NGOs) I have worked with that shed light on these issues. The first NGO, the Walk Free Foundation, estimates that there are about 30 million enslaved in the world today. The second NGO, Free the Slaves, employs a longitudinal analysis to chronicle the lives of survivors. The acquisition and dissemination of such information is crucial because policymakers and donors sometimes require hard data before committing time, political will, and resources to the cause.

Unpacking the Problem of Contemporary Slavery

As Kevin Bales of the Wilberforce Institute for the Study of Slavery and Emancipation explains, "Slavery is the possession and control of a person in such a way as to significantly deprive that person of his or her individual liberty, with the intent of exploiting that person through their use, management, profit, transfer or disposal. Usually this exercise will be achieved through means such as violence or threats of violence, deception and/or coercion."1

Thus, at its core, slavery is a dynamic between two individuals, the enslaved and the slaveholder, in which the slaveholder has a monopoly of control and violence over the enslaved.

Monti Narayan Datta is an assistant professor of political science at the University of Richmond. His current book project, forthcoming with Cambridge University Press, focuses on the consequences of anti-Americanism. He is working on several projects on human trafficking and modern day slavery with Free the Slaves, Chab Dai, and the Walk Free Foundation. Along with Kevin Bales and Fiona David, he is a co-author of the Global Slavery Index: http://www.globalslaveryindex.org.


The slaveholder can coerce the enslaved to perform any number of abominable acts: sexual servitude on the streets of New York City;2 adult labor in the coltan mines of the Congo;3 child slavery in the shrimp farms of Bangladesh; or forced domestic servitude in the suburbs of Los Angeles.4 Compounding the matter, enslaved persons can spend years, sometimes decades, under such conditions.5 This can sometimes lead to slavery lasting across several generations. Short of homicide, slavery is one of the most inhumane crimes one person can commit against another.

In recent years, a number of governments and international governmental organizations have addressed modern day slavery at home and abroad. In the United States, Congress passed the Victims of Trafficking and Violence Protection Act (TVPA) in 2000. The TVPA established the President's Interagency Task Force to Monitor and Combat Trafficking, a cabinet-level group, led by the U.S. State Department, whose mission is to coordinate efforts to combat trafficking in persons. Since then, the State Department has produced its annual Trafficking in Persons (TIP) Report, which has become "the U.S. Government's principal diplomatic tool to engage foreign governments on human trafficking."6 Although not without controversy, the TIP Report has educated many on the sources and impact of modern day slavery.7

On the global stage, between 2000 and 2001 the United Nations General Assembly adopted three protocols to its Convention against Transnational Organized Crime: (1) the Protocol to Prevent, Suppress and Punish Trafficking in Persons, especially Women and Children; (2) the Protocol against the Smuggling of Migrants by Land, Sea, and Air; and (3) the Protocol against the Illicit Manufacturing and Trafficking in Firearms. With 117 signatory countries, these protocols, known as the Palermo Protocols, advanced the global discussion not only on what constitutes contemporary slavery, but also on what the international community can do to mitigate its spread.

Although some may argue that international agreements like the Palermo Protocols and documents like the TIP Report matter only marginally,8 others counter that they catalyze change.9 Building upon a crest of public awareness of human trafficking, U.S. President Barack Obama proclaimed in 2012 at the Clinton Global Initiative, "We are turning the tables on the traffickers. Just as they are now using technology and the Internet to exploit their victims, we are going to harness technology to stop them."10

Although he did not mention it explicitly, President Obama was referring to the idea of using big data to mitigate contemporary slavery. As Pulitzer Prize-winning journalist Steve Lohr explains, big data is "shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions."11 This typically involves using software to find trends and patterns in large amounts of data aggregated from the Internet, sometimes publicly available and sometimes clandestinely obtained.

Along the lines of utilizing publicly available data, the tech giant Google announced in April 2013 a big data partnership with the Polaris Project, an antislavery NGO in Washington, D.C. The partnership, called the Global Human Trafficking Hotline Network, aims to use data mining software to identify human trafficking trends from the hotline, which can eventually inform "eradication, prevention, and victim protection strategies."12

Although using big data to fight trafficking is new, the idea has been demonstrated by scholars like Mark Latonero of the Annenberg Center on Communication Leadership & Policy at the University of Southern California. Latonero's team partnered with local law enforcement agencies in Los Angeles, explored trends in human trafficking on websites like Backpage.com, and applied this information to target specific traffickers. This was done by mining data from advertisements for the sexual services of domestic minors in the adult section of Backpage in the Greater Los Angeles area, and identifying the phone numbers that appeared in those ads with the greatest frequency. With this information, Latonero's team was able to provide law enforcement with data linking certain phone numbers to criminal networks.
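A minimal sketch of this kind of frequency analysis, assuming the ad text has already been scraped into a list of strings; the sample ads and the regular expression are illustrative, not Latonero's actual pipeline.

    import re
    from collections import Counter

    # Hypothetical scraped ad texts.
    ads = [
        "New in town, call 213-555-0199",
        "Ask for Mia (213) 555-0199",
        "Available tonight: 310.555.0142",
    ]

    # Matches common U.S. phone number formats.
    PHONE = re.compile(r"\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})")

    counts = Counter()
    for ad in ads:
        for match in PHONE.finditer(ad):
            counts["".join(match.groups())] += 1  # normalize to ten digits

    # Numbers appearing most often across ads are leads worth cross-referencing.
    for number, n in counts.most_common(10):
        print(number, n)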

The U.S. government is also using big data to mine private information networks, not on the World Wide Web, but on what is called the Deep Web, the part of the Internet that is not indexed by search engines like Google. The Defense Advanced Research Projects Agency (DARPA), an agency of the U.S. Department of Defense, recently launched a program called Memex to hunt criminal networks on the Deep Web. The first domain DARPA intends to uncover with this new technology is human trafficking.13

These developments in big data dovetail with a broader discussion within academia about how social science researchers can apply quantitative methods to estimate trends in contemporary slavery. Although many studies of modern day slavery are rigorous, they circulate only in the policy and academic communities, and very few published works actually employ quantitative methods. In a comprehensive review of the research-based literature on contemporary slavery, Elżbieta M. Goździak and Micah N. Bump of Georgetown University found that, of 218 research-based journal articles, only seven (about 3 percent) were based on quantitative methods. Without hard data, it can be challenging for scholars to make generalizable inferences to inform policy.

In this paper, using big data as a backdrop, I discuss some novel quantitative methods employed by two NGOs I have worked with that shed light on contemporary slavery. The first NGO, the Walk Free Foundation, estimates that there are about 30 million people enslaved in the world today. The second NGO, Free the Slaves, working with its local Indian partner, MSEMVS, assesses the lives of survivors and how they are reintegrating into society. The acquisition and dissemination of such information are crucial because policymakers and donors sometimes require hard data before committing time, political will, and resources to the cause.

The Walk Free Foundation

Australian philanthropists Andrew and Nicola Forrest established the Walk Free Foundation (Walk Free)14 three years ago to eradicate contemporary slavery. After meeting with Microsoft co-founder Bill Gates, Andrew Forrest was inspired to explore the underpinnings of contemporary slavery using quantitative methods. As Forrest recounts, "Global modern slavery is hard to measure, and Bill's a measure kind of guy," adding, "in management speak, if you can't measure it, it doesn't exist."15 For Forrest, it was important to inform people in the business and policy worlds of the extent to which slavery exists, country by country, to prompt action. Although some quantitative assessments of contemporary slavery existed, very little was publicly available. Forrest sought to collect more precise data to disseminate freely and thus launched the Global Slavery Index (GSI), on which I have been working since 2012.16

The 2013 GSI ranks 162 of the world's nations in terms of their level of contemporary slavery. Methodologically, these rankings are based on several factors; the most novel is an estimation of the proportion of the population enslaved in each country. For this measure, the GSI team (led by Kevin Bales and Fiona David) has drawn upon secondary source data analysis that Bales pioneered for his book, Disposable People, and later disseminated in Scientific American.17 These secondary sources consisted of a review of the public record, including materials from published government reports, the investigations of NGOs and international organizations, and journalistic reports. The GSI team has also drawn upon data from representative random sample surveys to extrapolate the prevalence of slavery for selected comparable countries. Figure 1 illustrates the 2013 GSI data for the proportion of the population estimated to be enslaved.
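The extrapolation step can be pictured with a small sketch; the prevalence figure and country groupings below are illustrative only, not the GSI's actual inputs.

    def extrapolate(survey_prevalence, populations):
        """Apply a surveyed enslavement rate to comparable countries' populations."""
        return {country: survey_prevalence * pop
                for country, pop in populations.items()}

    # Hypothetical: a representative survey measures 0.5 percent prevalence in
    # country A; countries B and C are judged socioeconomically comparable.
    estimates = extrapolate(0.005, {"B": 20_000_000, "C": 5_000_000})
    print(estimates)  # {'B': 100000.0, 'C': 25000.0}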

In Figure 1, darker shades indicate a higher proportion of the population enslaved. Some of the countries with the highest proportions are Haiti (about 2.1 percent of the population enslaved), Mauritania (about 4.0 percent), Pakistan (about 1.2 percent), and India (about 1.1 percent).

Table 1 lists the 2013 GSI data in terms of the total estimated number of the enslaved, country by country. This is a novel contribution compared to other estimates of contemporary slavery. Such information can be useful to business people, policymakers, and students who want a more informed understanding of where slavery occurs and with what frequency.

Overall, the 2013 GSI estimates that about 29.8 million people are enslaved among the 162 countries under study. The country with the smallest estimated number of enslaved in 2013 was Iceland (twenty-two), and the country with the largest was India (13.9 million). The spread is enormous: the standard deviation across countries is about 1.2 million.
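These summary statistics are straightforward to reproduce from the country-level estimates in Table 1. A sketch with pandas, using an illustrative subset of the table (the full computation runs over all 162 countries):

    import pandas as pd

    # A few of the Table 1 estimates; the full series has 162 entries.
    estimates = pd.Series({
        "Iceland": 22, "Barbados": 46, "Luxembourg": 69,
        "Pakistan": 2_127_132, "China": 2_949_243, "India": 13_956_010,
    })

    print(estimates.sum())                      # total estimated enslaved
    print(estimates.idxmin(), estimates.min())  # smallest: Iceland, 22
    print(estimates.idxmax(), estimates.max())  # largest: India, 13,956,010
    print(estimates.std())                      # spread (standard deviation)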

One important question is whether the GSI has made a difference in the real world. One way to shed light on this is to explore some of the statistics achieved since the GSI's launch in October 2013. To date, the GSI has received over half a million website visits. There have been over thirteen thousand downloads of the full report, available in English, Arabic, French, and Spanish. Moreover, there have been over fifteen hundred media reports about the GSI in over thirty-five countries, including The Economist,18 De Standaard,19 La Vanguardia,20 CNN,21 National Public Radio,22 and Time.23

Some of the media reports about the GSI illustrate how it can generate discussion on an underreported issue. In India, for instance, where the GSI estimates the greatest number of enslaved people to be, the media response has been strong. The Times of India reported, "Sixty-six years after independence, India has the dubious distinction of being home to half the number of modern-day slaves in the world."24 Perhaps due to such sentiments, the Hindustan Times discussed the causes of slavery in India and observed, "Some of the reasons for high numbers caught in slavery in India are the difficulty in accessing protections and government entitlements, such as the food rations card, corruption or non-performance of safety nets (such as the National Employment Guarantee, primary health care and pensions) and practices of land grabbing and asset domination by high-caste groups."25


Figure 1. Global Slavery Index (GSI)—Proportion of the Population Estimated Enslaved in 2013


Table 1. Global Slavery Index—Estimated Enslaved in 2013

Country Estimated Enslaved | Country Estimated Enslaved
Afghanistan 86,089 | Lebanon 4,028
Albania 11,372 | Lesotho 14,560
Algeria 70,860 | Liberia 29,504
Angola 16,767 | Libya 17,683
Argentina 35,368 | Lithuania 2,909
Armenia 10,678 | Luxembourg 69
Australia 3,167 | Macedonia 6,226
Austria 1,100 | Madagascar 19,184
Azerbaijan 33,439 | Malawi 110,391
Bahrain 2,679 | Malaysia 25,260
Bangladesh 343,192 | Mali 102,240
Barbados 46 | Mauritania 151,353
Belarus 11,497 | Mauritius 535
Belgium 1,448 | Mexico 103,010
Benin 80,371 | Moldova 33,325
Bolivia 29,886 | Mongolia 4,729
Bosnia and Herzegovina 13,789 | Montenegro 2,234
Botswana 14,298 | Morocco 50,593
Brazil 209,622 | Mozambique 173,493
Brunei 417 | Myanmar 384,037
Bulgaria 27,739 | Namibia 15,729
Burkina Faso 114,745 | Nepal 258,806
Burundi 71,146 | Netherlands 2,180
Cambodia 106,507 | New Zealand 495
Cameroon 153,258 | Nicaragua 5,798
Canada 5,863 | Niger 121,249
Cape Verde 3,688 | Nigeria 701,032
Central African Republic 32,174 | Norway 652
Chad 86,329 | Oman 5,739
Chile 37,846 | Pakistan 2,127,132
China 2,949,243 | Panama 548
Colombia 129,923 | Papua New Guinea 6,131
Costa Rica 679 | Paraguay 19,602
Côte d’Ivoire 156,827 | Peru 82,272
Croatia 15,346 | Philippines 149,973
Cuba 2,116 | Poland 138,619
Czech Republic 37,817 | Portugal 1,368
Democratic Republic of the Congo 462,327 | Qatar 4,168
Denmark 727 | Republic of the Congo 30,889
Djibouti 2,929 | Romania 24,141
Dominican Republic 23,183 | Russia 516,217
Ecuador 44,072 | Rwanda 80,284
Egypt 69,372 | Saudi Arabia 57,504
El Salvador 10,490 | Senegal 102,481


Equatorial Guinea 5,453 | Serbia 25,981
Eritrea 44,452 | Sierra Leone 44,644
Estonia 1,496 | Singapore 1,105
Ethiopia 651,110 | Slovakia 19,458
Finland 704 | Slovenia 7,402
France 8,541 | Somalia 73,156
Gabon 13,707 | South Africa 44,545
Gambia 14,046 | South Korea 10,451
Georgia 16,227 | Spain 6,008
Germany 10,646 | Sri Lanka 19,267
Ghana 181,038 | Sudan 264,518
Greece 1,466 | Suriname 1,522
Guatemala 13,194 | Swaziland 1,302
Guinea 82,198 | Sweden 1,237
Guinea-Bissau 12,186 | Switzerland 1,040
Guyana 2,264 | Syria 19,234
Haiti 209,165 | Tajikistan 23,802
Honduras 7,503 | Tanzania 329,503
Hong Kong, SAR China 1,543 | Thailand 472,811
Hungary 35,763 | Timor-Leste 1,020
Iceland 22 | Togo 48,794
India 13,956,010 | Trinidad and Tobago 486
Indonesia 210,970 | Tunisia 9,271
Iran 65,312 | Turkey 120,201
Iraq 28,252 | Turkmenistan 14,711
Ireland 321 | Uganda 254,541
Israel 8,096 | Ukraine 112,895
Italy 7,919 | United Arab Emirates 18,713
Jamaica 2,386 | United Kingdom 4,426
Japan 80,032 | United States 59,644
Jordan 12,843 | Uruguay 9,978
Kazakhstan 46,668 | Uzbekistan 166,667
Kenya 37,349 | Venezuela 79,629
Kuwait 6,608 | Vietnam 248,705
Kyrgyzstan 16,027 | Yemen 41,303
Laos 50,440 | Zambia 96,175
Latvia 2,040 | Zimbabwe 93,749

Source: The Global Slavery Index

There is also some evidence that the GSI has begun to influence government policy. In January 2014, building upon the momentum of the GSI, Andrew Forrest signed a memorandum of understanding (MOU) with the Pakistani state of Punjab. It is atypical for a government to sign a deal with a businessman to help eradicate slavery within its own borders. Yet Forrest was able to leverage his influence in Pakistan to encourage a conversation that aims to provide the state of Punjab with inexpensive coal in exchange for assurances that the government will work toward the liberation of its own people.26 Although it is too early to see whether Pakistan will hold to its promise, this agreement may herald future MOUs between NGOs like Walk Free and governments that want to mitigate slavery, and one day even eradicate it.


The GSI may also be influencing heads of state. Former U.S. President Jimmy Carter references the GSI several times in his new bestselling book, A Call to Action: Women, Religion, Violence, and Power. And the GSI has been publicly endorsed by, among others, Hillary Clinton, Gordon Brown, Julia Gillard, and Tony Blair.27

Free the Slaves

The GSI strives to use big data to count the number of slaves in the world. Other NGOs have begun to employ longitudinal techniques to chronicle the lives of survivors of slavery once they are free. One such NGO is Free the Slaves (FTS), which Kevin Bales, Peggy Callahan, and Jolene Smith co-founded in 2000 as the sister organization of Anti-Slavery International, the oldest international human rights organization in the world.28

Early in its evolution, FTS reasoned that the liberation of any slave would be beneficial not only for that individual, but also for the community, and thus produce a "freedom dividend," multiplied by each additional person freed. As FTS explains, "Local communities thrive when formerly enslaved people start their own businesses; communities begin to flourish as people come together to organize and watch out for one another; children go to school—and the benefits extend for generations."29

For the past decade, FTS has partnered with grassroots organizations in Haiti, India, Nepal, Ghana, the Democratic Republic of Congo, and Brazil to empower local communities of the enslaved to seek liberation. In India, FTS has worked with a local grassroots organization called Manav Sansadhan Evam Mahila Vikas Sansthan (MSEMVS).30 Through the efforts of MSEMVS, over 150 villages have eradicated slavery and trafficking in recent years, and many more are beginning to experience liberation in the north Indian states of Uttar Pradesh and Bihar, two of India's poorest states, as Figure 2 highlights.

In addition to empowering people in rural Uttar Pradesh to seek liberation, MSEMVS has been among the first NGOs to begin longitudinal studies of the effects of liberation, alongside quantitative studies of the predictive factors of enslavement. The studies are intended to provide insight into: (1) whether slavery and trafficking have been eradicated; and (2) whether the socioeconomic conditions of people living in these communities have improved. I consulted for Free the Slaves during this period and, along with Ginny Baumann, Jody Sarich, Austin Choi-Fitzpatrick, and Jessica Leslie, helped put together a follow-up report for the village of Kukrouthi in Uttar Pradesh.

The follow-up report was conducted among the residents of three hamlets in Kukrouthi village.31 There were two sources of information: the first was a set of 120 household-level surveys, and the second was a set of focus group discussions. A total of 929 people were accounted for by the surveys. The time periods under comparison were 2009 (when the liberation process began) and 2011 (when the process of self-liberation by local residents was completed).


Some of the key findings from the comparison of 2009 and 2011 are as follows, lending credence to FTS's supposition of a "freedom dividend" after liberation.

Growth in Childhood Education

One important indicator of a freedom dividend in Kukrouthi village is the number of children in school. The underlying premise is that in free communities children receive better education, which fuels a society's human capital. In Kukrouthi, the team from MSEMVS found evidence of significant growth in childhood education rates. Whereas in 2009 only 69 percent of school-aged children were reported to be in school, by 2011, 91 percent were enrolled, as Figure 3 illustrates.

Figure 2. Uttar Pradesh, North India


Better Nutrition

Another key indicator illuminating the freedom dividend is access to adequate nutrition. As with childhood education, the team from MSEMVS reported a dramatic increase in the number of families able to eat three meals a day, from 31 percent in 2009 to 71 percent in 2011, more than doubling, as Table 2 details.

Table 2. Number of Daily Meals By Household in Kukrouthi, 2009 and 2011

Number of Meals | 2009 | 2011
Two meals per day | 31% | 22%
Three meals per day | 31% | 71%
No response | 3% | 8%
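The size of that jump is easy to misstate, so it is worth checking the arithmetic (values from Table 2):

    # Share of households eating three meals a day, from Table 2.
    before, after = 0.31, 0.71

    point_change = (after - before) * 100               # percentage points
    relative_change = (after - before) / before * 100   # relative increase
    print(round(point_change), round(relative_change))  # 40 129

A rise from 31 to 71 percent is a gain of 40 percentage points and a relative increase of about 129 percent; that is, the share of households more than doubled.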

Figure 3. Percentage of Children in School in Kukrouthi, 2009 and 2011


Improved Access to Health Care

Yet another strong indicator of a freedom dividend is access to health care, even if of rudimentary quality. In 2011, MSEMVS reported that almost the entire population of Kukrouthi village had access to health care. This was another dramatic increase compared to 2009, when MSEMVS found that just over half of families received health care treatment. Table 3 provides a breakdown of this comparison.

Table 3. Comparison of Access to Health Care in Kukrouthi, 2009 and 2011

Access to Health Care | 2009 | 2011
Yes | 57% | 96%
No | 43% | 3%
Don't know | . | 1%
No response | 1% | 1%

Improvement in Childhood Vaccinations

Lastly, in 2009, just one-third of children had the recommended number of vaccinations (i.e., three). By 2011, this had increased to 90 percent, as Table 4 shows.

Table 4. Comparison of Child Vaccinations in Kukrouthi, 2009 and 2011

Immunizations | 2009 | 2011
None | 49% | .
One | 7% | 3%
Two | 12% | 7%
Three | 33% | 90%


A World Without Slavery

Applying quantitative methods to the study of contemporary slavery could contribute significantly to shedding more light on the phenomenon. In collaboration with my colleagues at the Walk Free Foundation, I have used quantitative methods to estimate the total number of enslaved in the world today. This, in turn, has generated discussion among the media and policy community on how to mitigate modern day slavery, with an eye toward its eradication. With Free the Slaves and MSEMVS, we have begun to chronicle systematically how communities can benefit from freedom. This information provides preliminary evidence to policymakers that liberating slaves yields a wide range of socioeconomic benefits.

The modern day anti-slavery movement is young. Moving forward, we need more scholars and policymakers who want to explore what quantitative methods and big data can do for the movement. We are at a point where everyone agrees that contemporary slavery is a wrong that must be addressed. The time is ripe for further discussion on how to make a world without slavery a reality. I hope we can get there, at least in part, by employing quantitative methods and exploring big data.

Notes

1 Kevin Bales, The Global Slavery Index, 2013. http://www.globalslaveryindex.org/report/#view-online
2 For example: http://www.gems-girls.org/get-involved/very-young-girls
3 Congo. https://www.freetheslaves.net/congo
4 CNN Freedom Project, http://thecnnfreedomproject.blogs.cnn.com
5 For example: Survivors of Slavery Speak Out, http://survivorsofslavery.org
6 Trafficking in Persons Report, http://www.state.gov/j/tip/rls/tiprpt
7 For example: http://www.coha.org/the-trafficking-in-persons-report-who-is-the-united-states-to-judge
8 For example: John J. Mearsheimer, "The False Promise of International Institutions," International Security, Vol. 19, No. 3 (1995), pp. 5–49.
9 For example: Anne-Marie Slaughter, A New World Order (Princeton University Press, 2005).
10 Barack Obama, "Remarks by the President to the Clinton Global Initiative," September 25, 2012. http://www.whitehouse.gov/the-press-office/2012/09/25/remarks-president-clinton-global-initiative
11 Steve Lohr, "The Age of Big Data," The New York Times, February 11, 2012. http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html?pagewanted=all&_r=0
12 "Polaris Project Launches Global Human Trafficking Hotline Network." http://www.polarisproject.org/media-center/news-and-press/press-releases/767-polaris-project-launches-global-human-trafficking-hotline-network
13 "DARPA Reinventing Search Engines to Fight Crime," Wired, February 11, 2014. http://www.wired.co.uk/news/archive/2014-02/11/darpa-memex-human-trafficking
14 The Walk Free Foundation, http://www.walkfreefoundation.org


15 Elisabeth Behrmann, "Gates Helps Australia's Richest Man in Bid to End Slavery," Bloomberg, April 14, 2013. http://www.bloomberg.com/news/2013-04-10/gates-helps-australia-s-richest-man-in-bid-to-end-slavery.html
16 The Global Slavery Index, http://www.globalslaveryindex.org
17 Kevin Bales, "The Social Psychology of Modern Slavery," Scientific American, April 2002.
18 "Dry Bones," The Economist, October 19, 2013. http://www.economist.com/news/international/21588105-hateful-practice-deep-roots-still-flourishing-dry-bones
19 "Wereldwijd bijna 29 miljoen slaven [Nearly 29 million slaves worldwide]," De Standaard, October 17, 2013. http://nos.nl/artikel/563375-wereldwijd-bijna-30-miljoen-slaven.html
20 "Casi 30 millones de personas son esclavos modernos [Almost 30 million people are modern slaves]," La Vanguardia, October 18, 2013. http://www.lavanguardia.com/20131018/54391301708/casi-30-millones-de-personas-son-esclavos-modernos-barcelona.html
21 Tim Hume, "India, China, Pakistan, Nigeria on Slavery's List of Shame, Says Report," CNN, October 17, 2013. http://www.cnn.com/2013/10/17/world/global-slavery-index
22 Audie Cornish, "Report Estimates 30 Million People in Slavery Worldwide," National Public Radio, October 17, 2013. http://www.npr.org/templates/story/story.php?storyId=236407720
23 Nilanjana Bhowmick, "Report: Almost 14 Million Indians Live Like Slaves," Time, October 17, 2013. http://world.time.com/2013/10/17/report-almost-14-million-indians-live-like-slaves/
24 "India Has Half the World's Modern Slaves: Study," The Times of India, October 18, 2013. http://timesofindia.indiatimes.com/india/India-has-half-the-worlds-modern-slaves-Study/articleshow/24313244.cms
25 Abhijit Patnaik, "Modern Slavery Widespread in India," Hindustan Times, October 17, 2013. http://www.hindustantimes.com/India-news/NewDelhi/Modern-slavery-widespread-in-India/Article1-1136431.aspx
26 Dennis Shanahan, "Andrew Forrest Strikes Cheap Coal Deal to End Pakistan Slavery," The Australian, January 23, 2014. http://www.theaustralian.com.au/business/mining-energy/andrew-forrest-strikes-cheap-coal-deal-to-end-pakistan-slavery/story-e6frg9df-1226808181875#
27 The Global Slavery Index. http://www.globalslaveryindex.org/endorsements
28 Anti-Slavery International. http://www.antislavery.org/english
29 FTS in India: Free a Village, Build a Movement. http://www.freetheslaves.net
30 https://www.ashanet.org/projects/project-view.php?p=907
31 Ginny Baumann et al., "Follow Up Study of Slavery and Poverty in Kukrouthi Village, St Ravidas Nagar District, Uttar Pradesh," June 2012, unpublished manuscript. Free the Slaves.


SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

© 2014 by The Johns Hopkins University Press

A Conversation with Arch Puddington, Vice President for Research at Freedom House

Who is the target audience of Freedom House reports?

From the beginning, we have sought to provide analysis that combines scholarly rigor with a methodology and vocabulary that is accessible to the general public. Obviously, there is a niche group of policymakers here and in Europe, as well as journalists, scholars, political activists, and dissidents, who make up our core audience. But our data are also widely used by educators and students, including at the secondary level.

We have also developed a growing audience among foreign government officials. This is in large measure due to the important role of democracy and honest governance in the calculations of international development agencies, financial institutions, and governments. Especially since Freedom House findings have been formally incorporated into the foreign assistance process of the American government, we have experienced a major increase in communications with foreign diplomats, who want to discuss, or complain about, our conclusions about their countries.

Freedom House’s 2013 Freedom on the Net report examined internet activism and “increasingly sophisticated restrictions on internet freedom” by authoritarian regimes. Based on the report, what opportunities and obstacles do new technologies offer in promoting freedom?

New technologies offer a significant opportunity to advance democracy. Throughout the world, online activists and ordinary social media users utilize these tools to organize, lobby, and hold their governments accountable. Women’s rights groups, free speech advocates, and human rights organizations have staged successful advocacy campaigns to overturn or prevent the passage of oppressive laws. In many authoritarian states, such as China, Saudi Arabia, and Bahrain, exposés by online and citizen journalists revealing corruption, police abuse, and pollution often force the authorities to acknowledge the issue and, in some cases, hold the perpetrators accountable.

Arch Puddington is vice president for research at Freedom House. He manages the publication of Freedom in the World, an annual report assessing global political rights and civil liberties, and is responsible for the development of new research and advocacy programs.


Unfortunately, the transformative power of digital media is not limited to individuals fighting to promote freedom. Technological advances also bring new tools to censor the web and intimidate citizens who are engaged in online speech that is deemed to threaten the regime, insult the dominant religion, or sow social discord. Authoritarian regimes monitor the personal communications of their citizens for political reasons, with the goal of identifying and suppressing government critics and human rights activists. Such monitoring can have dire repercussions for the targeted individuals in those countries, including imprisonment, torture, and even death.

In 2007, Freedom House published a report indicating a “profoundly disturbing deterioration”—a greater number of countries were becoming less free than were becoming more free. Could you share insights on this finding?

According to our findings, more countries have experienced declines in freedom than have experienced gains during each of the last eight years. This is unprecedented in the forty-one-year history of Freedom in the World. At the same time, this decline is not in itself a cause for alarm. Many of the declines represent quite small setbacks, not a pell-mell retreat from the gains of previous decades. Many countries that embraced democracy over the previous four decades had little experience with the institutions of freedom, and their adherence to good government standards is beginning to fray. Especially in times of relative scarcity, corruption is emerging as a particular evil, as top-to-bottom graft and favoritism erode popular faith in democratic institutions.

A more serious problem reflected in our findings is the durability of what we call modern authoritarian regimes. Russia’s Vladimir Putin and the Chinese Communist Party leadership are the best examples of this phenomenon, but there are others as well: Aliyev in Azerbaijan, the Iranian clerics, Correa in Ecuador, the post-Chavez group in Venezuela. Modern authoritarians preside over countries that are well integrated into the global economic and diplomatic systems and often possess energy riches.



The leaders are unabashedly antidemocratic and anti-Western. They devote their energies to the control of the political process, the press, civil society, and the rule of law. They avoid the excesses and stupidities of communism, especially in economic policy, but use nuanced and sophisticated methods to control the levers of power. Modern authoritarianism has emerged over the past fifteen years, and its practitioners have grown in power and even international respectability over time. Modern authoritarianism today ranks as the most worrying threat to freedom around the world.

The most recent Freedom in the World report noted that the number of electoral democracies has risen, while the distribution of countries in each of the “free,” “partly free,” and “not free” categories did not change significantly in comparison to 2012. Why do you think this is the case?

One way to think of a country with a designation as “free” is as a liberal or consolidated democracy. In recent years, the number of free countries has remained steady at eighty-seven to ninety, meaning that approximately 45 percent of the world’s sovereign states enjoy systems that guarantee competitive elections and a broad range of civil liberties. On the other hand, the number of electoral democracies has oscillated between 115 and 123. There are thus some thirty countries that can be said to have met internationally accepted standards for competitive elections but which fall short on other indicators that measure liberal democracy—press freedom, minority rights, gender equality, corruption, and so forth.

Freedom in the World ranks Mexico as an electoral democracy but also places it in the “partly free” category because of the impact of uncontrolled violence. Indonesia likewise qualifies as an electoral democracy but is ranked as “partly free” because, among other problems, its government has been unable to secure the rights of religious minorities. Given that the standards for gaining a designation as an electoral democracy are less strict than for achieving designation as a free country, it is not surprising that there is more movement in and out of the electoral democracy category.

What do you make of recent articles from the BBC and Al Jazeera (among others) calling attention to corruption in the EU? Some have argued that the quality of electoral democracies in the United States and in Europe has been deteriorating. What are your thoughts on this assertion? How closely do you think popular indices like those from Freedom House mirror the reality on the ground?

I’m not overly exercised about the level of corruption in the EU. Every society based on money transactions suffers from corruption to one degree or another. The key here is whether corruption is pervasive, officially tolerated, and engaged in by the political leadership. The most damning report on European corruption was commissioned by the EU itself, and most EU countries have media that investigate corruption charges and an independent judiciary that prosecutes corrupt officials. Europe should be concerned when officials intervene to prevent the press from uncovering corrupt acts or prosecutors from bringing charges against officials accused of graft. All too often, accusations of widespread corruption in democracies are advanced by people of bad faith from countries (Russia and Belarus, for example) where corruption is a way of life.

As for the United States, there clearly are growing problems with its political system. Gerrymandering has gotten worse, and the new movement for voter identification has been implemented in ways that suggest efforts to weaken Democratic candidates. At the same time, the American system retains a unique dynamism. It remains open to the emergence of new faces (Barack Obama) and new forces (the Tea Party). Despite its multinational character, the United States has managed to avoid the emergence of influential parties or movements that preach racism or xenophobia.

We put considerable effort into capturing these nuances in Freedom in the World and other reports. Freedom in the World is not a report on governance per se; we endeavor to reflect the level of freedom an individual experiences on the ground, and to zero in on the threats to freedom whether they come from the state, terrorists, extremist movements, or other sources. We have developed a methodology that looks at the broad set of institutions and values that make up human freedom while providing a flexibility that enables us to highlight the qualitative differences between one society and another.

How does Freedom House collect the quantitative and qualitative information used in its reports? How do you extract significant insights from this information?

We see our principal role as providing analysis, including scores and judgments about democratic performance, to the policymaking community, the media, and scholars. Our analysts make use of the vast sources of information that are available these days, including government reports, the findings of think tanks and NGOs, reports of multilateral institutions, press accounts, interviews with officials and critics alike, and the many other sources that have emerged in the data explosion era.

Freedom House is a source for analysis, not data. We see our role as providing assessments on the state of freedom, identifying the principal threats to freedom, and showcasing global and regional trends. Using data from our country analysis, we are able to identify the global and regional trajectory of freedom, broadly defined, as well as specific elements of freedom, such as freedom of expression and press, elections, corruption and transparency, civil society, and rule of law. We can, in other words, illuminate which institutions of democracy are most vulnerable to pressure from authoritarian rulers, and which institutions have proved most durable. There are other organizations that see their mission as providing data on elections, corruption, assaults on journalists, economic freedom, and so forth. Freedom House, by contrast, works to inform the public about the gains and setbacks in democratic government, civil liberties, and personal freedom.


What do you make of the recent trend in which governments freely release open data? What are the policy implications of this trend?

Clearly, enhanced transparency is preferable to less openness. My concern is that some governments will be tempted to fudge or falsify data, or decide to stop publishing information when the results are embarrassing. For some time now, Argentina has been publishing inflation figures that most experts regard as bogus. After political attention was drawn to spiraling crime rates, the Venezuelan government stopped publishing statistics on violent crime. These examples suggest that in the future, as in the past, the data world will be divided between democracies that almost always publish honest statistics and other countries whose data may or may not reflect reality.

For democracies, political leaderships will face a new challenge in explaining the unwelcome news that will inevitably emerge from published data. More data will mean a more informed citizenry, especially at the elite level. But it will also mean more pressure on governments to communicate, often in response to the arguments of demagogues, why unemployment rates, inequality, traffic accidents, or test scores for children are moving in the wrong direction.

Authoritarian regimes will have it easier. Their leaders will either quash uncomfortable facts or distort them. Here it will be essential that international financial institutions, transparency think tanks, and the global business community weigh in by demanding honest accounting. It is instructive that Argentina agreed to adjust its inflation figures after pressure from the IMF.

The field of international affairs has become more focused on collecting and analyzing large quantities of data. What would you recommend to international affairs students as the field becomes more data-driven?

I would urge students to remember that data and facts can be manipulated and misused. Serious assessment of a society’s political well-being requires facts, but it also demands honest interpretation. An overemphasis on data can distort an analyst’s efforts to understand the true quality of freedom as thoroughly as can outright bias.



SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

Of Note

A Deterioration of Democracy? Corruption, Transparency, and Apathy in the Western World

Rachel Ostrow

Arch Puddington, in his interview with the SAIS Review of International Affairs, expresses a firm belief in the power of interpretation. “An overemphasis on data,” he says, “can distort an analyst’s efforts to understand the true quality of freedom as thoroughly as can outright bias.” Freedom House’s annual reports on freedom have made data on democracy accessible to millions of people in the diplomatic, academic, and wider communities. These analyses have criticized governments throughout the Middle East, Africa, and Asia for anti-democratic and autocratic methods. However, Freedom House’s important work researching authoritarianism and democracy throughout the world should start to focus once again on its birthplace: the Western world.

Freedom House, based in the United States (and largely funded by government agencies such as the State Department and the U.S. Agency for International Development), could be, and has been, accused of having Western biases. A quick look at Freedom in the World 2014 shows that the United States and Canada, as well as the majority of European nations, are classified as “free” right up to the Ukrainian border.1 However, several countries in Europe, as Freedom House rightly notes, have suffered democratic backsliding. France, Switzerland, and Hungary have all passed laws or seen social movements seeking to limit the rights of migrants and ethnic minorities. These occurrences, though noted in Freedom House’s analysis, do not seriously affect its calculations.

Hungary itself is an excellent example of where this analysis has masked the more sinister undertones within an open democracy. Hungary’s recent re-election of Viktor Orban, in an election widely seen as free and fair, can be seen as a backwards turn for Hungarian democracy. As a member of the right-wing, nationalist Fidesz party, Orban will likely have to make concessions to the far-right, anti-Semitic, and anti-Roma Jobbik party, which is gaining power and influence in Hungary.

© 2014 by The Johns Hopkins University Press

Rachel Ostrow is a second-year M.A. candidate at the Johns Hopkins University Paul H. Nitze School of Advanced International Studies (SAIS) concentrating in Russian and Eurasian Studies. She is Web Editor of The SAIS Review.


Orban has also courted the governments of the authoritarian states of Russia, Azerbaijan, and China, much to the consternation of other European states.2 While Freedom House notes these events with concern, they do not seem to affect the calculations for Hungary, which remains classified as a “free” state in Freedom in the World 2013. In a more objective, side-by-side analysis of migrant rights and media access, countries like Hungary may match more closely with China than their governments (or Freedom House) would care to admit.

Despite investigations by the BBC and Al Jazeera showing that corruption is an issue in the European Union, Puddington pushes back against those allegations and against the wider implication that the quality of electoral democracies in the established democratic world is deteriorating. “Every society based on money transactions suffers from corruption to one degree or another,” he says, arguing that corruption within the political class is the clearest sign that a society is undemocratic, and that such corruption is rare within Western democratic societies.

Giovanni Sartori’s “conceptual stretching” theory would imply that corruption cannot be the same everywhere. Dynastic authoritarian regimes are considered inherently corrupt by Western scholars; however, the stability they allow in some states (particularly post-conflict states) ensures their popularity on a domestic level. In the United States, the recent Supreme Court ruling in McCutcheon v. Federal Election Commission, which enshrines money as speech and thus allows unlimited private spending in political campaigns, could be viewed as corrupt in countries where politics remain clan- or kinship-based.3 The effects that the McCutcheon ruling will have on American democracy remain to be seen, but a likely outcome is the continued disenfranchisement of low-income and minority voters as special interests pay their way into office.

Puddington notes that the American political system has “growing problems,” specifically pinpointing gerrymandering and voter ID laws as obstacles to continued democracy in the United States.4 He argues, however, that the United States still has a dynamic political system that prevents the emergence of xenophobic or racist parties. Puddington is correct in asserting that the United States is still a vibrant democracy. However, it could be argued that the increasing polarization of the two-party system has drowned out the rise of minority parties, both those considered dangerous (the xenophobic and racist parties Puddington warns of) and those less so, namely socialist and issue-specific parties.

A two-party system ignores pluralism, instead prioritizing electoral duopoly; while this does not directly threaten democracy, it does stoke electoral indifference.


Since 2008, youth participation in democratic institutions in the United States has fallen, and distrust in political institutions among that age group has risen.5 Though 52 percent of young Americans polled stated that they would recall every single member of the United States Congress following the 2013 government shutdown, a quarter of the respondents said they did not plan to vote in the 2014 midterm elections.6 Revelations such as those disclosed in the Edward Snowden National Security Agency (NSA) leaks continue to erode trust in the government as well.7

Challenges to democracy exist throughout the Western world, not just in Hungary and the United States. As Puddington notes, Mexico, despite being an electoral democracy, is listed as only “partly free” due to the threat of violence against its citizens. The Freedom in the World index is not just about governance, says Puddington, but rather the “level of freedom an individual experiences on the ground.” Thus, threats from outside the government, such as from terrorist groups and rebels, can affect freedom even in the most developed democracy. An excellent example of this extragovernmental threat is the recent election in Afghanistan, where threats by the Taliban insurgency against the April 2014 elections were carried out in the form of terrorist bombings and shootings throughout the country in the weeks and months leading up to the vote. Defying all expectations, however, the elections were considered to have been relatively free and fair, and even featured significant female participation.8,9

The use of data on a large scale has been Freedom House’s method since the institution began publishing the Freedom in the World index in 1973. Advances made in the calculation and tabulation of big data in recent years have allowed for evaluation of the democratic development of post-communist, developing, authoritarian, and nascent countries. Indices like Freedom in the World help to prioritize international development efforts, aid allocation, and even sanctions. However, as Puddington suggests, the data used in these indices mean little without qualitative evaluation. With democracy’s quality declining throughout the developed and developing world, this kind of analysis has become even more important.

Freedom House, in 2007, announced a “profoundly disturbing deterioration” of democracy: according to its numbers, from 2006 to 2007, a greater number of countries had become less free than had become more free. This has now been a trend for the past eight years.

Puddington insists this is not cause for alarm, and that tough economic times may be to blame for some of this democratic backsliding. However, the fact that some of these declines have appeared in the world’s most established democracies is a cause for concern. Freedom House should more closely examine the data for the world’s developed states and cast a more critical eye on their democratic institutions and norms, subjecting them to the same qualitative analysis experienced by developing democracies.

Notes

1 Freedom House, Freedom in the World 2014, January 23, 2014.
2 Charles Gati, “What Viktor Orban’s victory means for Hungary and the West,” The Washington Post, April 7, 2014.
3 “Before and After the Supreme Court’s Ruling,” The New York Times, April 2, 2014.
4 The interview with Mr. Puddington was held prior to the Supreme Court’s ruling in McCutcheon v. FEC.
5 Benjamin Scuderi, “What are Millennials thinking?” Harvard Political Review, 2014.
6 Ian Kohnle, “Angry, Yet Apathetic: The Young American Voter,” Harvard Political Review, December 3, 2013.
7 Harry Enten, “Polls show Obama’s real worry: NSA leaks erode trust in government,” The Guardian, June 13, 2013.
8 Rod Nordland and Matthew Rosenberg, “After ’09 Fraud, Afghanistan Reports a Cleaner Election,” The New York Times, April 8, 2014.
9 Rod Nordland, Azam Ahmed, and Matthew Rosenberg, “Afghan Turnout Is High as Voters Defy the Taliban,” The New York Times, April 5, 2014.


SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

© 2014 by The Johns Hopkins University Press

The Politics of Numbers: Controversy Surrounding the Thai Rice-pledging Scheme

Pongkwan Sawasdipakdi

Thailand’s rice-pledging scheme has been widely criticized for its unsustainability. While the government argues that the program is beneficial for rice farmers, the opposition points out that the program has resulted in significant losses. Over the past few years, data have been used as a tool to support arguments from both the government and the opposition. However, the reliability of data utilized by both sides is questionable. This paper discusses problems surrounding data collection and usage, both of which have contributed to the controversy surrounding the policy.

Introduction

One of the most controversial policies implemented by Prime Minister Yingluck Shinawatra’s government is the rice-pledging scheme initiated in 2011. The policy, often criticized for its populist appeal, was intended to improve the living standards of Thai rice farmers by increasing rice prices. To increase prices, the program allows farmers to pledge rice directly to the government at a fixed price of 15,000 baht ($469) per ton.1 The government believes that this higher-than-market price helps ease the burden placed on rice farmers by the increasing cost of production.

However, the results have been starkly different from those the government anticipated. Because the government has been unable to export the rice it purchased from domestic farmers at elevated prices, its current surplus of rice is unprecedented. While benefits to farmers are debatable, the program is significantly over budget and is a heavy burden on government finances. In June 2013, the opposition Democratic Party disclosed a classified document from the audit committee overseeing the agricultural product-pledging programs. This document concluded that the program was over budget by 260 billion baht ($8.13 billion).2 The leaked information immediately became the opposition’s political weapon for nurturing anti-government sentiment. The reports also led to credit rating agency Moody’s downgrading Thailand’s sovereign debt.

Pongkwan Sawasdipakdi is a second-year M.A. candidate at the Johns Hopkins Paul H. Nitze School of Advanced International Studies (SAIS) concentrating in Southeast Asia Studies.


However, instead of defending themselves by disclosing available data, Minister of Commerce Boonsong Teriyapirom and Deputy Minister of Commerce Natawut Saikua repeatedly insisted, without referring to any specific numbers, that the program would be over budget by less than 260 billion baht ($8.13 billion).3

This was an ineffective way to respond to the claims made by the opposition. In a modern society, people increasingly demand statistics either to confirm or convey their beliefs. The failure of the ministers to offer an alternative set of statistics damaged the credibility of the government. First, it showed that the government did not have reliable data with which to evaluate the impact of the program and on which to base future decisions. Second, it led to criticism that the government did not release the information because it wanted to maintain a populist policy that targeted the ruling party’s political base.

This paper discusses problems surrounding data collection and usage, both of which have contributed to the controversy surrounding the policy. It argues that confusion over statistical methods contributed to the ambiguity of the program’s performance evaluation. Furthermore, the government’s failure to disclose the available statistics to the public has damaged the credibility of not only the program but also the government itself.

The first section of the paper will briefly explain the regulations and procedures of the rice-pledging scheme. The following section will explore controversies and claims made by the opposition that affect the credibility of the program and the government. Finally, the paper analyzes how the confusion over data collection and statistics usage reinforced the problem.

Regulations and Procedures of Thailand’s Rice-pledging Scheme

The current rice-pledging scheme allows rice farmers to pledge an unlimited amount of their self-produced paddy to the government. This type of agricultural program is unusual in Thailand: the government’s other agricultural policies generally set a quota for how much farmers can sell directly to the state. Because the new policy allows farmers to, in essence, sell all of their rice production to the government, it requires that rice farmers intending to participate in the program obtain approval from the Department of Agriculture and their community. Moreover, the farmers have to certify that the rice they sell to the government is their own product. In other words, neither farmers nor landlords are allowed to pledge rice purchased from others or of unknown origin. These regulations were designed to lower instances of corruption.4

With these basic requirements fulfilled, farmers are eligible to pledge their product at any local rice miller. In return, the millers issue a warehouse receipt for the farmers to claim funding from the Bank of Agriculture and Agricultural Cooperatives (BAAC). The BAAC is then obligated to process a payment to the farmers within three business days. However, in reality, the government has had difficulty fulfilling this time commitment. In some cases, the government has failed to process the entire payment to local farmers, mainly due to its inability to sell the rice on the international market and secure loans from Thai banks.

Procurement prices vary based on the type and quality of rice, ranging from 13,800 baht ($431) per ton to 20,000 baht ($625) per ton. The announced 15,000 baht ($469) per ton price was used as a benchmark for 100 percent white rice. Besides differences in type, prices also vary based on moisture content. Rice with low moisture content is priced higher than rice with higher moisture content. Table 1 shows detailed price data for different types of rice.5

Table 1. Procurement Prices of the Rice-pledging Scheme

Type Pledging Price (baht/ton) Pledging Price (USD/ton)

White rice 25% 13,800 431.25

White rice 15% 14,200 443.75

White rice 10% 14,600 456.25

White rice 5% 14,800 462.50

White rice 100% 15,000 468.75

Sticky rice (short grain) 15,000 468.75

Sticky rice (long grain) 16,000 500.00

Thai Pathumthani fragrant rice 16,000 500.00

Province fragrant rice 18,000 562.50

Thai jasmine rice 20,000 625.00

Source: Department of Internal Trade, Ministry of Commerce
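Read as a schedule, Table 1 makes a farmer's gross payment a simple lookup-and-multiply: pledged tonnage times the procurement price for the rice type. The sketch below illustrates that arithmetic; the price dictionary is transcribed from Table 1, while the function and the example quantity are illustrative, not part of any government system.

```python
# Procurement prices transcribed from Table 1 (baht per ton).
PLEDGING_PRICES = {
    "White rice 25%": 13_800,
    "White rice 15%": 14_200,
    "White rice 10%": 14_600,
    "White rice 5%": 14_800,
    "White rice 100%": 15_000,
    "Sticky rice (short grain)": 15_000,
    "Sticky rice (long grain)": 16_000,
    "Thai Pathumthani fragrant rice": 16_000,
    "Province fragrant rice": 18_000,
    "Thai jasmine rice": 20_000,
}

BAHT_PER_USD = 32  # the exchange rate used throughout this article (note 1)

def gross_payment(rice_type: str, tons: float) -> tuple[float, float]:
    """Return (baht, USD) a farmer would receive for a pledged lot."""
    baht = PLEDGING_PRICES[rice_type] * tons
    return baht, baht / BAHT_PER_USD

baht, usd = gross_payment("Thai jasmine rice", 10)  # a 10-ton example lot
print(f"{baht:,.0f} baht (${usd:,.2f})")  # 200,000 baht ($6,250.00)
```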

Each miller processes the pledged raw rice after procuring it from farmers. These millers are paid by the government to mill and deliver processed rice to the government's central warehouses, controlled by the Public Warehouse Organization and the Marketing Organization for Farmers. A subcommittee overseeing rice sales is then responsible for drafting a selling plan. The government, at first, planned to outsource rice sales to a private company. However, the private company lacked market networks both domestically and internationally, and could not sell rice at a profit-making level. Instead, the government has accumulated a surplus of rice. As a result, the government has signed contracts with seven other companies to sell them rice at significantly discounted prices. This situation has been worsened by the fact that the program is suffering from financial problems and requires immediate additional revenues to pay the farmers.6

Criticisms of the Rice-pledging Scheme

The rice-pledging program has been criticized by the public, the opposition, and both Thai and foreign economists for being unsustainable. After the first year of implementation, in September 2012, the Thailand Development Research Institute (TDRI), a prominent Thai economic think tank, published a research paper on the advantages and disadvantages of the program. The research indicated that the program is beneficial for rice farmers in two ways. First, the rice farmers participating in the program have received additional income from the differential between fixed and market prices totaling 72.71 billion baht ($2.27 billion). Second, farmers not participating in the program are able to sell their rice at higher prices. This is because private merchants have to compete with the government by offering farmers higher prices, thereby increasing domestic market rice prices.7 However, the paper also criticizes many aspects of the policy. First, the program mostly benefits high- and middle-income rice farmers because they can afford the high costs of production and can utilize economies of scale. In contrast, poorer rice farmers mainly grow rice for household consumption. Second, the program is costly. Aside from the estimated cost of over 300 billion baht ($9.38 billion) that the government paid in subsidies to the farmers in the 2011–2012 production season, it paid the millers approximately 20.64 billion baht ($645 million) in milling costs. The government has also had to rent warehouses and pay costs to preserve rice quality. Moreover, the government had to hire surveyors to inspect the quantity and quality of the procured rice and pay interest and miscellaneous costs. Table 2 shows the costs of the rice-pledging scheme for the 2011–2012 season estimated by the TDRI report.

The third criticism raised by the TDRI concerns corruption. The paper raises the concern that millers participating in the program could smuggle rice from neighboring countries (specifically mentioning Cambodia) and pledge the smuggled rice to the government under the names of random farmers. Moreover, the millers could deliver milled rice of sub-par quality and quantity to the government. Fourth, if rice is stored for long periods of time, its quality decreases, as does its value. This could affect the prestige of Thai rice in the world market; Thai rice is globally renowned for its high quality and sells for a higher price than rice produced by other countries. The changing perception of Thai rice could plausibly drive Thai rice prices down in the future. Finally, the TDRI report criticizes the government's vague rice sales plan, which it alleges could accelerate the program's financial problems. While the pledging and procurement processes are clearly elaborated in the policy, the government only states that rice sales are handled by the subcommittee on rice sales. This subcommittee is responsible for determining the quantity, prices, methods, and conditions for selling rice stored in the government's warehouses. However, no specific rice-selling guidelines have yet been published.


Table 2. Expenses, Income, and Profit of the Rice-pledging Scheme 2011–2012 (million baht)

Each row lists Wet Season 2011–2012 / Dry Season 2012 / Total, first under rice prices based on Assumption 1.2 and then under rice prices based on Assumption 1.3.

1. Loans: 118,593.68 / 179,294.50 / 297,888.18 (identical under both assumptions)
   Unmilled rice in stock as of September 2012 (million tons): 6.95 / 12.10 / 19.08
2. Interests: 3,331.48 / 2,184.35 / 5,515.83 (identical under both assumptions)
3. Operational Costs
   Assumption 1.2: 13,775.89 / 18,366.52 / 32,142.42
   Assumption 1.3: 13,420.97 / 18,102.65 / 31,523.62
   3.1 Milling costs: 6,989.35 / 12,369.42 / 19,358.77 (identical)
   3.2 Warehouse rents and quality preservation (47 baht/ton milled rice/3 months): 483.32 / 652.48 / 1,135.80 (identical)
   3.3 Quality inspection (14.5 baht/ton milled rice): 61.29 / 108.41 / 169.70 (identical)
   3.4 Operational costs of governmental agencies: 2,770.18 / 3,176.01 / 5,946.19 (identical)
   3.5 Deterioration costs (5% yearly decrease of value of milled rice stored in the warehouses)
       Assumption 1.2: 3,471.76 / 2,060.20 / 5,531.96
       Assumption 1.3: 3,116.84 / 1,796.32 / 4,913.16
4. Total Expenses (1+2+3)
   Assumption 1.2: 135,701.05 / 199,845.38 / 335,546.43
   Assumption 1.3: 135,346.13 / 199,581.50 / 334,927.63
   Total Expenses per Ton (baht): Assumption 1.2: 19,525 / 16,519 / 17,616; Assumption 1.3: 19,474 / 16,497 / 17,585
5. Total Income
   Assumption 1.2: 92,390.68 / 130,634.70 / 223,025.38
   Assumption 1.3: 83,660.02 / 115,486.87 / 199,146.89
   Total Income per Ton (baht): Assumption 1.2: 13,293 / 10,798 / 11,709; Assumption 1.3: 12,037 / 9,546 / 10,455
6. Loss
   Assumption 1.2: 43,310.37 / 69,210.68 / 112,521.05
   Assumption 1.3: 51,686.11 / 84,094.63 / 135,780.74
   Loss per Ton (baht): Assumption 1.2: 6,232 / 5,721 / 5,907; Assumption 1.3: 7,437 / 6,951 / 7,128


Estimation Assumptions:

1. Assumptions for rice sales and sale prices
1.1 Assume that the program stored rice in the warehouses for a year, starting from October 2011 and ending in September 2012. The government sold all rice in September 2012.
1.2 In the case that the government sold all rice at prices equivalent to the bidding prices on September 5, 2012:
a) White rice 5% is priced at 16,300 baht/ton and jasmine rice is priced at 29,800 baht/ton (based on the bidding prices on September 5, 2012).
b) Prices of the other types of rice are based on domestic wholesale prices in July 2012, subtracting operational costs of 1,500 baht/ton.
1.3 In the case that sale prices have decreased:
a) White rice 5% is priced at 13,470 baht/ton (decreasing 10% from the bidding price and subtracting operational costs of 1,200 baht/ton).
b) Jasmine rice is priced at 25,810 baht/ton (decreasing 5% from the bidding price and subtracting operational costs of 2,500 baht/ton).
c) Domestic wholesale prices are equivalent to those in 1.2.
2. Assume that milling and delivering processes follow the period stated in the program, so that there would not be unmilled rice stored in the warehouses and the government did not have to rent millers' warehouses.
3. The loan interests are based on the value and the period in which unmilled rice was received by the program.
4. Warehouse rents for milled rice, quality preservation costs, and deterioration costs are calculated based on the quantity of milled rice and the period in which the rice was delivered to the warehouses.

Source: Thailand Development Research Institute

In June 2013, criticisms of the rice-pledging scheme once again occupied news headlines. A classified report by the audit committee for the government's agricultural product-pledging programs was leaked to the opposition Democratic Party. In the report, which was posted on the Twitter account of Korn Chatikawanich, one of the opposition's most prominent leaders, the program was estimated to cost the government at least 260 billion baht ($8.13 billion). The report includes budgets for the rice-pledging program in 2011–2012 and 2012–2013. It indicates that the government spent 661.22 billion baht ($20.66 billion) in the 2011–2012 production season and 408.75 billion baht ($12.79 billion) during the 2012–2013 production season. After including a loan from the BAAC and income from rice sold by the Ministry of Commerce, the balance for both seasons of the rice-pledging scheme resulted in a loss of 220.97 billion baht ($6.9 billion). However, this number only reflects the balance as of 31 January 2013. The committee indicated that the program caused monthly losses of 10 billion baht ($312.5 million). As a result, as of May 2013, the total losses incurred by the program could be as high as 260 billion baht.8

The report was part of government efforts to estimate losses resulting from the rice-pledging scheme. The conclusions of the internal government report were inconclusive. Several other reports show conflicting data and illustrate that the government was still in the process of choosing the most logical way to estimate losses caused by the program.9 In response to the criticisms, Minister of Commerce Boonsong Teriyapirom and Deputy Minister of Commerce Natawut Saikua held a press conference on the balance of the rice-pledging scheme on 7 June 2013. The ministers insisted that the program was over budget by less than 260 billion baht. Moreover, they argued that since the program was still in progress, drawing conclusions about the impact on the budget was impossible. Nevertheless, the government maintained that the program would be over budget by a maximum of 80 to 90 billion baht ($2.50 to $2.81 billion). The fact that the ministers could not cite specific numbers further inflamed criticism. Many journalists in attendance at the press conference were furious over how unclear the statement was and argued with the ministers.10 The inability of the ministers and the government to adequately satisfy the demand for credible information intensified criticism directed at the government and exacerbated confusion among the opposition and the public. This clearly affected the government's credibility.11

A day after the press conference, opposition leader Abhisit Vejjajiva emphasized that Prime Minister Yingluck's government had lost 42.96 billion baht ($1.34 billion) in the 2011–2012 wet season, the inaugural growing and harvest season of the program. According to Abhisit, the loss in the first season was minimal, considering that Thailand's rice production was devastated by a massive flood in May-June 2012. However, in 2012–2013, the government lost 178 billion baht ($5.56 billion). This number does not include the operational and administrative costs of the program, estimated to be approximately 40 billion baht ($1.25 billion). Therefore, the rice-pledging program lost at least 200 billion baht ($6.25 billion) in one production year.12

On 14 June 2013, the government disclosed a new set of numbers concerning the rice-pledging scheme. The numbers show that in the 2011–2012 production year, the program was over budget by a maximum of 130 billion baht ($4.63 billion), less than the estimated loss cited by the opposition. However, the government accepted that there was confusion over the numbers used by different agencies.13

In October 2013, Pridiyathorn Devakula, former deputy prime minister and former minister of finance, estimated that the rice-pledging scheme would result in losses as high as 425 billion baht ($13.28 billion).14 Pridiyathorn was the first person to publicly and explicitly explain his calculation method. According to his calculations, the program's costs include payments to farmers, processing, delivery, storage, debt interest, and other miscellaneous expenses. The income from the program includes income from selling rice and expected income from selling the unsold rice at market prices at the end of the year. He presented two methods of balance calculation. The first method involves a basic addition and subtraction of expenses and income. Utilizing this method yields a loss of 425 billion baht ($13.28 billion). The second method includes interest on government loans of 55 billion baht ($1.72 billion), deterioration costs of 60 billion baht ($1.88 billion), and preservation costs and other expenses of 20 billion baht ($625 million) in the calculation. Using data from the official government audit committee for the agricultural product-pledging scheme, the second method yields a loss of 470 billion baht ($14.69 billion).15
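Pridiyathorn's two methods reduce to a simple balance identity, sketched below for exposition. Only the three cost items quoted above are real figures; the function names and everything else are illustrative, and since his two published totals rest on different underlying cost data, the 470 billion baht result is not simply the 425 billion baht result plus the quoted items.

```python
# Structure of Pridiyathorn's loss estimates (figures in billion baht).
# The line-item categories mirror those named in the text; his full
# worksheet was not published, so no example values are supplied here.

def simple_balance(expenses: dict[str, float], income: dict[str, float]) -> float:
    """Method 1: plain subtraction of total income from total expenses."""
    return sum(expenses.values()) - sum(income.values())

# Method 2 layers financing and holding costs on the same balance,
# computed from the audit committee's data rather than method 1's.
EXTRA_COSTS = {
    "loan interest": 55.0,           # quoted in the text
    "deterioration": 60.0,           # quoted in the text
    "preservation and other": 20.0,  # quoted in the text
}

def extended_balance(expenses: dict[str, float], income: dict[str, float]) -> float:
    """Method 2: method 1's balance plus the financing/holding items."""
    return simple_balance(expenses, income) + sum(EXTRA_COSTS.values())

# Published results: method 1 yielded a 425 billion baht loss;
# method 2, on the audit committee's data, a 470 billion baht loss.
```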

Information War: Data as a Legitimizing Tool

The rice-pledging scheme has clearly been an information war in which numerous sets of data are used to provide legitimacy for both sides. It is difficult to verify the reliability of this data. First, the method of calculation is not uniform across government agencies. Although different agencies share the same raw data, such as the amount of pledged rice, subsidies, and milled rice in stock, profits and losses are usually different due to dissimilarities in calculations. The government's attempt to calculate actual losses, in the wake of charges by the opposition in 2012, is a good example of dissimilarities resulting from calculation methodologies. Prime Minister Yingluck appointed Varathep Rattanakorn, the Minister in the Office of the Prime Minister and the Deputy Minister of Commerce, to investigate actual losses of the program. Varathep gathered data from different agencies, including the Ministry of Commerce, the Ministry of Agriculture and Cooperatives, the Ministry of Interior, the Office of the Prime Minister, and the Office of the National Economic and Social Development Board. He discovered differences between the agencies.16 Table 3 shows differences in the results of the calculations from the audit committee overseeing the agricultural pledging scheme and the Ministry of Commerce. This could potentially explain why the ministers and deputy ministers of commerce insisted that the program's total loss was indeed less than 260 billion baht. However, because of the government's inability to provide official data coordinated across all agencies to counter the opposition's claims, the opposition's numbers were perceived by the public as more reliable. This clearly affected the government's credibility.

The numbers presented in Table 3 were gathered as an attempt to clarify the financial costs accrued by the program in the face of criticism by both the opposition and the public. These two calculations in Table 3, while sharing the same set of raw data, show a significant difference in estimated loss.



Table 3. Primary Data on the Rice-pledging Scheme 2011–2012 (Wet Season 2011–2012 and Dry Season 2012)

Contents: The Audit Committee / The Ministry of Commerce
Amount of unmilled rice (million tons): 21.7 / 21.7
Amount of milled rice (million tons): 13.5 / 13.5
Budget used (baht): 337,322,000,000 / 337,322,000,000
Expenses (baht): 14,786,000,000 / 14,786,000,000
Total expense, budget plus expenses (baht): 352,108,000,000 / 352,108,000,000
Value of milled rice remaining in stock (baht): 156,000,000,000 / 252,300,000,000
Value of rice sold (baht): 59,200,000,000 / 49,900,000,000
Loss (baht): 136,908,000,000 / 49,908,000,000

Source: Cabinet Resolution, June 18, 2013


The two numbers differ by 87 billion baht ($2.72 billion).17 The cabinet resolution, however, does not include any information about the calculation methodologies utilized by each agency. Therefore, the root cause of the differences in calculations is not known. This not only increased confusion among the public, but also impacted the government's decision-making processes. For example, when the Ministry of Commerce relies on the data that estimate a lower level of losses, it could alter the decision to hold or sell rice. In fact, the government originally decided to stock rice until global rice prices increased. However, this strategy proved ineffective as world rice prices continued to decrease, primarily due to increased rice production in India and Vietnam.18
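The 87 billion baht gap can be reproduced directly from Table 3. Both columns charge the program the same outlays; the divergence comes entirely from how each agency values the rice sold and the rice still in stock. A minimal sketch, with the loss identity inferred from the table's own arithmetic (figures in billion baht):

```python
# Reconciling Table 3: identical outlays, different valuations of the rice.
BUDGET_USED = 337.322   # billion baht, both agencies
EXPENSES = 14.786       # billion baht, both agencies
TOTAL_EXPENSE = BUDGET_USED + EXPENSES  # 352.108 billion baht

def loss(stock_value: float, sold_value: float) -> float:
    """Loss = total expense minus the value assigned to the rice."""
    return TOTAL_EXPENSE - stock_value - sold_value

audit_committee = loss(stock_value=156.0, sold_value=59.2)       # -> 136.908
ministry_of_commerce = loss(stock_value=252.3, sold_value=49.9)  # -> 49.908

print(round(audit_committee - ministry_of_commerce, 3))  # 87.0 billion baht
```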

Figure 1. Export Prices for Rice19

Source: Food and Agriculture Organization of the United Nations (FAO)

The data do not include numbers from the 2012–2013 production year. This is because at the time this report was released, the 2012–2013 production cycle had not yet ended. Nevertheless, the cabinet resolution in which the data was published includes an estimated budget for the 2012–2013 production year provided by the Office of National Economic and Social Development Board. Table 4 shows the data provided by this report.

The same report by the Office of National Economic and Social Development Board also points out expected negative effects of the program. First, the government would accumulate debt of 159.69 billion baht ($4.951 billion). If the government specified a pledging quota of 15 million tons a year from 2014–2017, the government would have an average debt of 80.62 billion baht ($2.5 billion) annually. This debt figure was estimated using the difference between the pledging price and the market price trend for rice. The figure also includes interest and procurement costs. Second, the program increases export prices and costs of production, which decreases Thailand's relative competitiveness with other rice-producing countries.


Table 4. Budget Estimation for the Rice-pledging Scheme 2011–2012 and 2012–2013

Contents: 2011–2012 / 2012–2013 / Total
1. Amount of unmilled rice (million tons): 21.68 / 18.79 / 40.47
2. Budget used (baht): 337,246,000,000 / 251,462,000,000 / 588,708,000,000
3. Value of rice sold (baht): From the beginning of the program to May 2013: 76,001,000,000. Rice expected to be sold until September 2013: 73,082,000,000.
4. Benefits of the program
   4.1 Increase in farmers' profits (baht): 116,000,000,000 / 114,000,000,000 / 230,000,000,000
   4.2 Increase in GDP (percent): 0.69 / 0.62

Source: Cabinet Resolution, June 18, 2013


Third, large and medium-sized farmers have gained more benefits from the program than small farmers. Fourth, the government has a limited ability to sell rice. Finally, the process of issuing certified documents and warehouse receipts for farmers is bureaucratically slow.20

Aside from the calculation problem, it is also unclear at which stages data are collected. For practical purposes, statistics on unmilled rice should be collected when farmers deliver their products to millers. The numbers should mirror the amount of unmilled rice specified in the warehouse receipts that farmers deposit at the BAAC. The second set of data is collected after the pledged rice has been milled and delivered to government warehouses. Then, the government has to keep a record of how much milled rice has been sold and how much remains in the warehouses. The complexity of the process provides opportunities for miscalculations and corruption. Although the final data gathered and compared by Varathep show cohesive raw data, such as the amount of unmilled and milled rice in stock, the agencies' original data have reportedly been contradictory. Unfortunately, these contradictory data have not been made available to the public. According to Teerat Ratasevi, the Spokesperson of the Office of the Prime Minister, before Prime Minister Yingluck appointed Varathep to investigate the data, each agency had used uncoordinated sets of data, including data on the amount of unmilled and milled rice in stock, the program's income, and the amount of rice sold. This added to the contradictory estimations of losses.
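Written down as a reconciliation check, the three record-keeping stages described above make it easy to see where gaps can open. The sketch below is purely illustrative: the function, the milling-yield factor, and the tolerance are assumptions for exposition, not features of any Thai government system; the paddy and milled tonnages echo Table 3, while the sold/stock split is invented.

```python
# Illustrative reconciliation of the three stages: warehouse receipts
# (paddy), milled rice delivered, and milled rice sold or still in stock.
MILLING_YIELD = 0.66  # assumed tons of milled rice per ton of paddy

def reconcile(receipt_tons: float, milled_tons: float,
              sold_tons: float, stock_tons: float,
              tolerance: float = 0.01) -> list[str]:
    """Return any discrepancies between the three bookkeeping stages."""
    issues = []
    if abs(receipt_tons * MILLING_YIELD - milled_tons) > tolerance * milled_tons:
        issues.append("milled rice does not match warehouse receipts")
    if abs(sold_tons + stock_tons - milled_tons) > tolerance * milled_tons:
        issues.append("sales plus remaining stock do not match milled rice")
    return issues

# 21.7 million tons pledged and 13.5 million tons milled (Table 3);
# the 2.5 / 11.3 split between sold and stock is invented.
print(reconcile(21.7, 13.5, 2.5, 11.3))
```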

Another problem affecting the data collection process is corruption. Farmers or millers smuggle rice from neighboring countries to claim subsidies; some millers also hide unqualified rice within bulk lots of high-quality rice. When surveyors arrive to inspect the quality and quantity of rice, the unqualified rice can pass undetected. The corruption associated with the rice-pledging scheme is publicly recognized. A nationwide survey by the National Institute of Development Administration (NIDA) on farmers' perspectives on the rice-pledging scheme, which sampled 1,250 respondents, shows that the popular perception is that the program failed mainly because of corruption.21 This public perception of corruption within the program, again, impacted the credibility of both the program and the government. More importantly, instances of corruption distort the data and the subsequent policies enacted by the government.

Conclusion

Data are often used as a tool to shape public opinion. At the same time, the public also demands data and numbers to confirm their beliefs. In a country like Thailand, where data collection and calculation methodologies are neither uniform nor completely reliable, data have been used as a tool to gain legitimacy by both factions in Thai politics. In the case of the rice-pledging scheme, differences in the data collected and calculated by different government agencies have led to confusion over the actual losses accrued by the program. When the opposition took advantage of data from one of the agencies, interpreted the estimated loss, and turned the data into a political weapon, the government remained confused over contradicting data from different agencies and could not effectively respond to the opposition's claim. Later, the government tried to regain public confidence by providing a more coordinated set of data, but by then the criticism had already spread widely throughout the population.

When numbers become political themselves, the question is no longer about the precision and correctness of data. Rather, it becomes a question of who can use the available data more effectively and appropriately. The Thai government's failure in this political game of numbers has contributed immensely to its credibility crisis. Although this political game is more about data presentation than data correctness, the data collection and calculation processes should not be disregarded. The government has to make sure that its data are reliable and that its calculations reflect reality. Correct data will then contribute to the implementation of more realistic policies in the future.

Notes

1 Exchange rate 32 baht/USD as of April 13, 2014. Assanai Panyamang, “Nayobai Chamnam Khao Pak Phue Thai” [Phue Thai Party’s Rice-pledging Policy], Voice TV, June 20, 2011. http://news.voicetv.co.th/thaivote/12673.html (accessed April 10, 2014).
2 Bangkok Biz News, “Yan Rattaban Khadtoon Chamnam Khao 2.6 San Laan” [Confirmed, the Government Lost 260 Billion in the Rice-pledging Scheme], Bangkok Biz News, June 7, 2013. http://www.bangkokbiznews.com/home/detail/business/business/20130607/509898/ยันรัฐบาลขาดทุนจำานำาข้าว2.6แสนล้าน.html (accessed April 10, 2014).
3 Matichon Online, “‘Poo’ Mai Doo ‘Boonsong-Natawut’ Tob Nak Khao Mai Dai Chamnam Khao Tao Dai Aang Lerk Ngan Leaw Ma Chuoi ‘Sam’ Ha Sieng” [Yingluck Did Not Watch When “Boonsong-Natawut” Could Not Answer Journalists on the Rice-pledging Scheme; Says She Is Currently Helping “Sam” Run His Campaign], Matichon Online, June 8, 2013. http://www.matichon.co.th/news_detail.php?newsid=1370604260&grpid=01&catid=&subcatid= (accessed April 10, 2014).
4 Khaosod, “Khanton Chamnam Khao” [Rice-pledging Procedures], Khaosod, October 3, 2011. http://www.khaosod.co.th/view_news.php?newsid=TUROamIyd3hNREF6TVRBMU5BPT0= (accessed April 10, 2014).
5 Department of Domestic Trade, “Raka Chamnam Khao Pluek Na Pee Pee Kan Palit 2554/2555” [Rice-pledging Prices for the Wet Season 2011/2012 Production Year], Department of Domestic Trade, Ministry of Commerce. http://www.thairiceexporters.or.th/Intervention%20program.htm (accessed April 10, 2014).



6 Bangkok Biz Online, “Rat Hom Rabai Khao Jai Nee Cao Na” [The Government Rushes to Sell Rice to Pay Farmers], Bangkok Biz Online, February 4, 2014. http://www.bangkokbiznews.com/home/detail/business/business/20140204/560825/รัฐโหมระบายข้าวจ่ายหนี้ชาวนา.html (accessed April 10, 2014).
7 Thailand Development Research Institute, “Pol Dee Pol Sia Khong Karn Chamnam Khao Took Med” [Advantages and Disadvantages of Pledging an Unlimited Amount of Rice], Thailand Development Research Institute. http://tdri.or.th/tdri-insight/ar3/ (accessed April 10, 2014).
8 Bangkok Biz News, “Yan rattaban khadtoon chamnam khao 2.6 san laan.”
9 Thairath’s business team, “Scan Krongkarn Chamnam Khao Tang Aok Rattaban Kue Thoi” [Scan the Rice-pledging Scheme: The Government Should Back Off], Thairath, June 17, 2013. http://www.thairath.co.th/column/eco/ecoscoop/351544 (accessed April 10, 2014).
10 Matichon Online, “‘Poo’ mai doo ‘Boonsong-Natawut.’”
11 Phichit Likhitjitsomboon, “Krongkarn Rab Chamnam Khao Khadtoon 2.6 San Laan Baht Jing Rue” [Did the Rice-pledging Scheme Really Lose 260 Billion Baht?], Prachatai, June 21, 2013. http://prachatai.com/journal/2013/06/47311 (accessed April 10, 2014).
12 Thairath’s Political Team, “‘Mark’ Yum Khadtoon Chamnam Khao 2.6 San Laan Jee ‘Poo’ Thobthuan Duan” [Abhisit Insists the Rice-pledging Scheme Lost 260 Billion Baht, Urges Yingluck to Urgently Review the Program], Thairath, June 8, 2013. http://www.thairath.co.th/content/pol/349968 (accessed April 10, 2014).
13 Krob Krua Kao, “Varathorn Pei Chamnam Khao Kadtoon Kae 1.3 San Laan” [Varathorn Points Out the Rice-pledging Scheme Lost Only 130 Billion Baht], Krob Krua Kao, June 14, 2013. http://www.krobkruakao.com/ข่าวเศรษฐกิจ/74917/-วราเทพ-เผยจำานำาข้าวขาดทุนแค่-1-3-แสนล้าน.html (accessed April 10, 2014).
14 Pridiyathorn is not a member of the Democratic Party, the current official opposition party. However, he has criticized the Yingluck government on many issues, ranging from politics to economics.
15 Thai Republica, “Mahakarp Chamnam Khao (2): ‘Mom Oui’ Kang Vitee Khamnuan Kadtoon lae Kwak Ngern Chang Borisat Samruat Chaona Pisoot Krai Dai Krai Sia” [Pridiyathorn Shows the Calculation Method for the Rice-pledging Scheme’s Loss and Hires a Company to Investigate Gains and Losses from the Program], Thai Republica, October 18, 2013. http://thaipublica.org/2013/10/pledge-rice-epic-2/ (accessed April 10, 2014).
16 Bangkok Biz News, “Kittirat Chaeng Khadtoon Chamnam Khao Mai Thueng 2.6 San Laan” [Kittirat Explains the Rice-pledging Scheme Is Less than 260 Billion Baht Over Budget], Bangkok Biz News, June 10, 2013. http://www.bangkokbiznews.com/home/detail/business/business/20130610/510340/กิตติรัตน์แจ้งขาดทุนจำานำาข้าวไม่ถึง2.6แสนล..html (accessed April 13, 2014).
17 Prachatai, “Perd Tua Lek – Kho Sanernae Khong Sor Sor Cho Tor Khrongkarn Chamnam Khao” [Revealing the Numbers: The NESDB’s Recommendation for the Rice-pledging Scheme], Prachatai, June 18, 2013. http://prachatai.com/journal/2013/06/47270 (accessed April 10, 2014).
18 Vikram Nehru, “Thailand’s Rice Policy Gets Sticky,” East Asia Forum, June 13, 2012. http://www.eastasiaforum.org/2012/06/13/thailand-s-rice-policy-gets-sticky/ (accessed April 10, 2014).
19 Food and Agriculture Organization of the United Nations, “Rice Market Monitor,” Food and Agriculture Organization of the United Nations 17, no. 1 (April 2014). http://www.fao.org/fileadmin/templates/est/COMM_MARKETS_MONITORING/Rice/Images/RMM/RMM_APR14_H.pdf (accessed April 13, 2014).
20 Prachatai, “Perd Tua Lek – Kho Sanernae Khong Sor Sor Cho Tor Khrongkarn Chamnam Khao.”
21 Oryza, “Corruption Killed Thailand Pledging Scheme, Says Survey,” Oryza, February 13, 2014. http://oryza.com/news/rice-news/corruption-killed-thailand-rice-pledging-scheme-says-survey (accessed April 10, 2014).

SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

© 2014 by The Johns Hopkins University Press

Open Data Policy Improves Democracy

Ian Kalin

Global organizations are increasingly adopting technology lessons from the commercial sector—in particular, technology that facilitates “openness.” In a tangible sense, this openness could be misunderstood as a global increase in government-managed websites. In reality, governments all over the world are empowering people with greater access to public information while also helping citizens extract value from this data. The popularity of data liberation has been reflected in a portfolio of new policies that are increasingly being coordinated among nations. In this article, one of the global leaders in open innovation, Socrata, identifies where these open data initiatives are working well, the common trends between them, and how government leaders and policymakers could replicate the success for their own constituencies.

Data All Around Us

In October 2012, with Hurricane Sandy closing in on Manhattan, the New York City government posted information about evacuation zones, shelters, and food centers to its web-based, open data portal. Without any formal mandate or procurement, civic developers and other government response workers then developed dynamic digital maps that citizens could access on their mobile devices. This public information was placed directly in the hands of the people who needed it, and it helped them in clear and measurable ways. After the storm, Rachel Haot, New York City’s chief digital officer, said, “We estimate that collectively we served and informed ten times as many individuals—hundreds of thousands of people—by embracing an open strategy.”1

This story illustrates a growing global trend toward government transparency and accountability. Governments are finding innovative ways to liberate data from antiquated “vaults” in order to help agencies perform their core services. Whether for disaster response or more routine functions—such as managing education systems, health care, housing services, business permits, safety inspections, parking resources, or even fighting crime—governments are using data to improve quality of life.

Of course, this evolving landscape would not function well if the system only included publishers. Data consumers, ranging from scientists fighting climate change to parents who are curious about local graduation rates, are also contributing new ideas and approaches to the open data movement. Indeed, a growing and self-perpetuating ecosystem of innovation has arisen from the exchange between data producers and consumers. Policymakers must understand and advance this ecosystem so that they can take advantage of their integral role and improve the quality of government services.

Ian Kalin was formerly an advisor to the White House on technology policy as a Presidential Innovation Fellow. Today, Ian is the Director of Open Data for Socrata.

Open Data is Not New

With deference to history, the concept of open data is not new. The practice of pooling tax revenues to provide free access to public information is as old as the first library in human civilization. But changes in scale have become changes in kind. The cost of cloud computing has dropped precipitously; the technical barriers to entry for interaction with digital data have softened; and general democratic dissatisfaction has driven governments to be more “open.” This confluence of factors means that the open data market is ripe with opportunity. The first global leader to recognize this potential was then-presidential candidate Barack Obama.

In 2008, the Obama campaign launched an effort to achieve unprecedented levels of transparency, following years of systemic obfuscation by the previous administration. As a candidate, Obama practiced transparency through actions such as disclosing political donations. As president, Obama signed the “Memorandum on Transparency and Open Government” on his first day in office.2 This memo ushered in a wave of “Open Government” initiatives that have fundamentally transformed the way in which the federal government collaborates with its citizens and people throughout the world.

Other governments, both local and international, soon followed the precedent set by the U.S. federal government. For example, San Francisco Mayor Gavin Newsom issued an executive directive on open government in 2009.3 In 2011, the British government established its Government Digital Service to deliver on a “digital by default” strategy.4 More recent global milestones in this movement include the European Union directive on the “re-use of public sector information”5 and the adoption of the Open Data Charter by the “Group of Eight” (G8) in 2013.6 Non-governmental organizations and private companies have also implemented open data policies. This series of landmark legislative acts and international agreements affirms that open data is both rooted in history and constantly evolving.

The Citizen Experience

Why is open data valuable for ordinary people? This is what open data would look like if you could see the invisible connections:

•   A student with a smartphone can predict when the next public bus will arrive because the city has published bus schedules in an accessible data format.

•   A parent can learn whether a restaurant has a history of unsafe conditions because the state’s inspection reports are embedded on Yelp.

•   A senior citizen can access free assistance in choosing a new medical insurance plan because a community organizer is able to access federal pricing schedules from a number of different online networks.

Page 61: Sais.34.1

61Open Data pOlicy imprOves DemOcracy

These examples demonstrate that open data by itself is inert; it cannot cure and it has no taste. But when put in the hands of innovators, open data can do amazing things. Good data design is essential for improving citizens’ interactions with data. As described by Cyd Harrell in Beyond Transparency: Open Data and the Future of Civic Innovation, a book published by Code for America:

Public data is rarely usable by ordinary citizens in the form in which it is first released. The release is a crucial early step, but it is only one step in the process of maximizing the usefulness of public resources for the people who own them. […] The real-time transit apps that are such a strong early example of useful open data do more [emphasis added] than offer a raw feed of bus positions. The best of them allow riders to search arrivals on multiple lines of their choosing and adjust their commute plans accordingly.7

Yet the previous examples also demonstrate that people rarely interact directly with open data. Does this mean that governments must master the art of mobile app design? Fortunately for governments everywhere, the answer is simple: You don’t have to do it all yourself.

Governments as Wholesale, Innovators as Retail

Generally, governments are not very good at extracting all of the value from their own data. Of course, there are some notable exceptions. For example, when I worked at the U.S. Department of Energy, I learned that the federal government created and managed some of the best computer simulators in the world. But most city and state governments do not have the human resources to build and ship software with the same efficiency as technology companies. Therefore, in order for governments to truly take advantage of open data, they must publicly engage non-governmental innovators.

The best analogy for how governments should approach the multi-stage process of building on the value of data is to think of data as a fuel. Governments effectively mine the fuel through their day-to-day operations, whether through processing permits, conducting inspections, or other activities. Next, governments must open up their data as wholesale, raw information. Finally, innovators take this data-as-fuel and “burn it” in their digital engines, propelling their retail products and services forward.

A single dataset further illustrates the process: The city of Chicago actively publishes the city’s crime data on a web-based portal.8 Although there are lightweight tools to visualize the crime data on Chicago’s website, companies like the Chicago Tribune9 and Kramer Concepts enhanced the effectiveness of the data with retail products that people were eager to use, such as Kramer Concepts’ iPhone app, “Chicago Crime Watch.”10


The annual report of the city of Chicago notes: “Open data has enabled developers to create a wealth of applications for Chicago’s residents. These applications range from browsing 311 requests, to reminding residents of street cleanings, to informing citizens on the location and nuances of zoning laws.”11
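For readers curious what the “wholesale” layer looks like in practice: Chicago’s portal runs on Socrata, whose SODA API serves dataset rows as JSON over plain HTTP. The sketch below is a minimal consumer; the dataset identifier and field name are the commonly cited ones for the city’s crime dataset and are quoted from memory rather than from this article, so verify them on data.cityofchicago.org before relying on them.

```python
# Minimal consumer of Chicago's open crime data via the Socrata SODA API.
import requests

URL = "https://data.cityofchicago.org/resource/ijzp-q8t2.json"  # verify this ID

# SoQL parameters: count incidents by offense type, largest first.
params = {
    "$select": "primary_type, count(*) AS n",
    "$group": "primary_type",
    "$order": "n DESC",
    "$limit": "5",
}

for row in requests.get(URL, params=params, timeout=30).json():
    print(row["primary_type"], row["n"])
```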

Stories like this one from Chicago are not rare. In fact, a study from the New York University GovLab—titled the “Open Data 500”—lists five hundred U.S. organizations that rely on federal government open data, including datasets available at Data.gov, to sell their products.12 The British government lists more than three hundred apps that leverage their equivalently centralized open data portal.13 All of this activity represents the dawn of the open data market. A study by the McKinsey Global Institute titled “Unlocking Innovation and Performance with Liquid Information” estimated that between $3 trillion and $5 trillion of economic value could be generated annually across just a handful of sectors through the smarter use of public-domain data.14

This is how governments can turn megabytes into jobs. By treating data as a fuel rather than as an end unto itself, governments can serve as trusted platforms for furthering innovation, as opposed to passive bystanders in the emerging data revolution.

Figure 1. The Data Ecosystem

Page 63: Sais.34.1

63Open Data pOlicy imprOves DemOcracy

Government Innovator’s Toolkit

Governments have a variety of tools that help them serve as platforms for innovation. Among the most grassroots-level tools are events called “hackathons.” During these gatherings, which usually take place over a weekend, software developers compete to develop new apps that serve a pre-established goal—for example, finding creative uses for recently released government datasets. The power of hackathons to attract brilliant, civic-minded young people is widely documented. However, these events are usually not effective in creating sustainable products of real value, given their short time span and informal nature. For that reason, governments—ranging from small cities to the White House—also host “data jams” and “datapaloozas.” These initiatives are derivatives of the hackathon model, but their mission is to engage diverse business leaders, such as CEOs, instead of the young coders that usually attend hackathons.

Many more variations of public-private partnerships have developed in recent years, including entrepreneur-in-residence programs, startup weekends, weekly hack nights, and innovation challenges. Much like hackathons, data jams, and datapaloozas, these initiatives favor fast results—for better or worse. For example, entrepreneur-in-residence programs, like the Fuse Corps or Presidential Innovation Fellows, seek to place executive innovators in government offices where they can impact local and national challenges. These entrepreneur-in-residence programs typically range from six to twelve months in length. Consequently, even if these innovators succeed in promoting tremendous change, what happens after they complete their tour of service? As should be expected from initiatives that favor fast results, many of the tools in the government innovator’s toolkit are short-lived.

However, one significant tool can produce more sustainable results: policy creation. Increasingly, governments are reaching to their oldest tool to ensure that open data initiatives outlive a single administration and penetrate deeper into the core operations of government services. Four types of policy levers are most commonly used with regard to open data: executive orders, new laws, new regulations, and non-binding resolutions.

The Four Types of Policy Tools and Lessons from Implementation

Policy Tool #1: Executive Orders

President Obama’s January 2009 memorandum on transparency and open government demonstrated the administration’s commitment to enhancing openness in the federal government. Yet the real policy magic was an executive order, the Open Government Directive of December 2009, which required all U.S. federal agencies to comply with a detailed set of time-bound actions. The directive’s requirements were published on a GitHub repository, a hosting platform for open source software development projects, where even non-government workers could offer suggestions to improve the detailed policy guidance. The “Project Open Data” public revision notes on GitHub include suggestions to clarify the strategic objectives, to correct missteps in the prescribed licensing terms for legal data reuse, and to improve the mandatory metadata fields for enterprise data inventories.15
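For context, the mandatory metadata fields at issue are the entries each agency lists in its data.json inventory. The sketch below shows one such entry as a Python dictionary, shaped along the lines of the Project Open Data metadata schema; the dataset and every value are invented for illustration.

```python
# An invented entry from an agency data.json enterprise data inventory,
# shaped along the lines of the Project Open Data metadata schema.
import json

dataset_entry = {
    "title": "Public Transit Ridership",
    "description": "Monthly boardings by route, updated quarterly.",
    "keyword": ["transit", "ridership", "buses"],
    "modified": "2014-01-15",
    "publisher": {"name": "City Department of Transportation"},
    "contactPoint": {"fn": "Open Data Team",
                     "hasEmail": "mailto:opendata@example.gov"},
    "identifier": "dot-transit-ridership-001",
    "accessLevel": "public",  # or "restricted public" / "non-public"
    "distribution": [
        {"mediaType": "text/csv",
         "downloadURL": "https://data.example.gov/ridership.csv"},
    ],
}

print(json.dumps(dataset_entry, indent=2))  # as it would appear in data.json
```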

Unfortunately, the president’s executive order did not work as planned. The technology system selected to power the latest version of Data.gov proved more difficult to work with than anticipated. Additionally, the action timelines were widely perceived as too aggressive, yet federal agencies that were unable to meet the deadlines did not suffer any liabilities, which contributed to an overall lack of accountability. As a result of these factors, more than half of federal agencies failed to comply with the executive order by the stated deadlines. Of the two major problems—technology and timelines—the latter was more directly related to policy failure.

The Open Government Directive serves as a memorable benchmark for this type of policy tool. All open data policies must include some sort of timeline in order to be effective. Yet for an organization as large as the U.S. federal government, six months was not enough to publish comprehensive data catalogs, to create enterprise data inventories, and to devise new internal management systems.

Policy Tool #2: Non-Binding Resolutions

Short and sweet, non-binding resolutions are town square-like pronouncements without legal weight that are used to express a community’s interests or concerns. A timeless example of a non-binding resolution is the recent proclamation on “Open Data by Default” from Palo Alto, California.16 Palo Alto is a city to watch for fans of civic innovation because it is endowed with a wealth of entrepreneurial people and businesses, including HP, Palantir, Facebook executives, and Stanford University. Palo Alto has made great strides with open data in recent years, despite the city’s small size and comparatively small budget.

Non-binding resolutions are usually brief—about one page in length—and include declarative sentences that begin with words like “whereas.” Below are diverse examples of non-binding resolutions, which could serve as introductory articles for more robust legislation in the future:

•   WHEREAS, the government is committed to using technology to foster open, transparent, and accessible government; and

•   WHEREAS, the adoption of open data improves provision of services, increases transparency and access to public information, and enhances coordination and efficiencies among departments and partner organizations across the public, nonprofit, and private sectors; and

•   WHEREAS, it should be easy to do business with the government. Online government interactions mean more convenient services for citizens and businesses and online government interactions improve the cost-effectiveness and accuracy of government operations; and

•   WHEREAS, the protection of privacy, confidentiality, and security will be maintained as a paramount priority while also advancing the government’s transparency and accountability through open data; and

•   WHEREAS, by publishing structured standardized data in machine-readable formats, the government seeks to encourage the local software community to develop software applications and tools to collect, organize, and share public data in new and innovative ways; and

Page 65: Sais.34.1

65Open Data pOlicy imprOves DemOcracy

•   WHEREAS, in commitment to the spirit of open government, the government will consider public data to be an asset of the citizens that paid for it and therefore operate under an “open by default” philosophy; and

•   WHEREAS, proactively disclosing government data is a fundamental element of an open government program and is consistent with existing laws; and

•   WHEREAS, the use of open data exchange standards improves transparency, access to public information, and coordination and efficiencies among organizations across the public, non-profit, and private sectors.

This lightweight policy tool has the benefit of lasting longer than a single government official. At a functional level, these resolutions also empower individuals to work as “change agents” within government whenever they encounter the stereotypical (and all-too-common) bureaucrat who rejects proposals simply because they are unfamiliar. With a copy of an open data resolution in hand, a government entrepreneur possesses tangible proof of a chief executive’s support for transparency. However, with respect for the numerous resolutions that have come out of small towns, resolutions are only as timeless as residents’ memories.

Policy Tool #3: Internal Regulations

The third type of policy tool includes memorandums, handbooks, director’s instructions, official guides, semi-formal outlines like a “Letter from the Office of the Governor,” and other internal regulations. The New York State Handbook on Open Data is an example of an internal regulations success story. Originating from the Office of Information Technology Resources, the handbook is a comprehensive, clear, and authoritative guide on how open data should work. Also available on GitHub, the handbook resembles the federal open government initiative. Yet New York State deserves credit for the manner in which it developed the guidelines. The state government established a system in which counties can procure and quickly deploy the same system used at the state level. In comparison, federal agencies were encouraged to establish their own systems in response to the Open Government Directive, but were not offered the same off-the-shelf resources or tools that New York State has provided to its local governments.

Another example of effective internal open data regulations comes from the U.S. Department of Interior (DOI). The DOI responded to the president’s executive order on open data by creating a hierarchy of manuals and internal policies to prescribe how the program should function on a day-to-day level within the operationally diverse bureaus and offices within the DOI.17 There are tiers of regulations, including: a strategic manual at the top level, a council charter that assigns responsibilities at the next level, policies at the third level, and at the lowest level, detailed plans. Although this hierarchy might seem unnecessarily burdensome, those familiar with large bureaucracies will recognize that these types of guidelines are often necessary for governments to operate effectively.

Page 66: Sais.34.1

66 SAIS Review Winter–Spring 2014

Policy Tool #4: Codified Laws

Legislation that becomes codified law is the 800-pound gorilla of open data policy tools. The archetypal example comes from San Francisco, where Mayor Gavin Newsom drafted executive orders that called for California to become a leader in “cloud-first” technology procurement, transparency, and open data. Significantly, the executive orders later turned into legislation, which brought the power of stronger department mandates and a decent budget.

Once enacted, laws are generally difficult to revise. However, in the case of San Francisco, the city council has already revised the law two times in four years. The current law reads like an all-star list of elements that should be included in open data policies, including:

•   A strategic, long-term vision for resident empowerment
•   Clear definitions and straightforward ownership of core responsibilities
•   A mandate to identify data that should be open and clear direction on how to open the data
•   Strong technical standards for data interoperability and the use of Application Programming Interfaces (APIs)
•   Rigorous protection of privacy, confidentiality, and security
•   Minimal license restrictions
•   Building on other existing laws
•   A mandate to engage the public for feedback while also improving customer service18

Despite the strengths of its law, San Francisco has fallen behind other major cities in terms of extracting value from its data. A simple comparison to the success of Chicago’s Smart Chicago Collaborative or New York City’s Big Apps Challenges demonstrates how, for San Francisco, a great law is simply not good enough.

In the United States, twenty cities, as well as nine counties and states, have passed open data laws. There are another ten to fifteen jurisdictions with legislators that have publicly indicated that they will propose similar legislation.19 As with many other waves of inventions, these laws have certain elements in common, yet there are also city-level experiments that have absolutely nothing in common with laws in other cities. Of course, time will tell which policies ultimately prove to be the best.

Do You Really Need an Open Data Policy?

With this roster of policy successes and failures, it is important to take a step back and ask whether open data policy is essential. Whether a government is large or small, it is reasonable to conclude that an official policy is not strictly necessary for empowering people with the public data they need. However, any attorney general will advise a mayor or governor that there is some risk involved in data publishing efforts, and that policies are the best way to mitigate this risk.

When well-executed, policies provide benefits that other innovation tools—such as hackathons or entrepreneur-in-residence programs—cannot.


These benefits include additional funding to make sure the programs have the requisite infrastructure, longevity beyond any individual leader's term, and a wake-up call to spur slow-moving bureaucracies. But the absence of policy should not prevent a government leader from taking the first steps toward a successful open data program. The greatest barrier to getting started is identifying responsible government leaders who recognize they should give resources back to the taxpayers who pay for them.

The Protection of Privacy

Every policy leader who works on open government issues should place top priority on the protection of privacy, confidentiality, and security. This article will not address the intricate legal definitions that determine which data is private or whether recent U.S. federal government activities were justified. Instead, in the context of open data policy, the responsible catch-all is to reiterate in policy documents the paramount importance of privacy protection, and to consult with your government agency’s chief attorney, chief privacy officer, or other relevant leader to determine agency-specific privacy protection measures.

For example, there is an evolving risk area known as the "mosaic effect," which refers to a process in which direct disclosures of information alone are not violations of privacy, but the ability to combine several different datasets can lead to such violations. The existence of this risk became well-known when Netflix released what it thought was "anonymized" information about a set of users, when in reality, it was possible for hackers to "de-anonymize" the profiles.20 In the case of the U.S. federal government, its policy acknowledges this risk, places responsibility for fighting the risk on individual government agencies, and provides tools to help them do so.
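A minimal sketch of how the mosaic effect works in practice, using invented toy data: neither release below contains names alongside viewing records, but joining them on shared quasi-identifiers re-identifies individuals.

```python
import pandas as pd

# Invented data. An "anonymized" release strips names but keeps
# quasi-identifiers: ZIP code, birth date, and sex.
ratings = pd.DataFrame({
    "zip":        ["60614", "60614", "94110"],
    "birth_date": ["1971-03-02", "1985-07-19", "1962-11-30"],
    "sex":        ["F", "M", "F"],
    "rented":     ["Title A", "Title B", "Title C"],
})

# A second, public dataset (say, a voter roll) ties the same
# quasi-identifiers to real names.
voters = pd.DataFrame({
    "name":       ["A. Smith", "B. Jones"],
    "zip":        ["60614", "94110"],
    "birth_date": ["1971-03-02", "1962-11-30"],
    "sex":        ["F", "F"],
})

# Joining on the shared columns re-identifies the "anonymous"
# records: the mosaic effect in a single merge.
mosaic = ratings.merge(voters, on=["zip", "birth_date", "sex"])
print(mosaic[["name", "rented"]])
```

Neither dataset violates privacy on its own; the violation emerges only from the combination, which is why open data policies must consider what else is already public before releasing a dataset.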

Open data is not fundamentally at odds with privacy protection. In my experience, the exercise of conducting an inventory, which is essential to managing an effective open data program, always led to revelations for leadership about the data their government was collecting. These revelations ultimately resulted in the creation of more secure digital infrastructures. It is easy for an apathetic stakeholder to mention the word "privacy" in order to restrict innovation in open data. The protection of privacy must be paramount, but given the advanced technological solutions for protecting sensitive information, which are used by hundreds of government agencies throughout the world, privacy does not have to be an excuse to halt the entire train.

A Global Movement

The opportunities and risks of government data are not restricted to any one country. As mentioned before, the G8 recently adopted an open data charter and is launching new programs now. Additionally, the World Bank has one of the best open data programs among the major international organizations. Furthermore, after Typhoon Haiyan struck the Philippines in 2013, killing over six thousand people and inflicting immeasurable damage, open data programs organically developed to assist with the reconstruction efforts by better targeting aid relief.21

Open data initiatives do not develop in isolated cases; instead, there is a global highway. Applications that were built to simplify the complexity of city building permits in Ireland are expanding to San Francisco, given the similarity of data availability. From one perspective, as new data becomes available, the data serves as a kind of business development plan for innovators to sell their products into new markets. On a global scale, these patterns demonstrate that no single organization can extract all of open data's value. This maxim is best illustrated by Bill Joy, the co-founder of Sun Microsystems. He advises that the smartest people in the world will always work for somebody else.22 Therefore, to unlock the maximum potential from open data, organizations must host data in ways that maximize discoverability, value creation, and access through interoperability. For these reasons, there is a self-fulfilling process in which the global growth of open data will also improve the quality of its impact.

Your Local Playbook

You’re a policy leader who does not yet have an open data program. How can you get started today? As a first step, I advise you to think about opening datasets for people to use. Of course, choosing the data to start with can be a cumbersome task. Most cities begin with datasets related to crime, zoning, permits and licenses, taxes and budget, or general site locations. Prioritizing the datasets that support the political executive’s top priorities may also be a good place to start. It is also a good exercise to begin with the question, “Am I wasting time trying to find data within my government instead of spending my time actually working with the data?”

For most organizations that launch open data programs, the first set of tangible returns often comes from major efficiency improvements. Open data provides direct benefits to publishing organizations. It not only benefits external stakeholders, but when leveraged internally, it can also help organizations make data-driven decisions, deliver core programs and services, and achieve mission goals more efficiently. As a final suggestion, simply research what open data leaders like Chicago, New York, and San Francisco are doing for inspiration.


We all know that established government bureaucracies are in need of transformation. We have witnessed how innovative ideas can arise when the right information is connected with the right people. Therefore, policy leaders must commit to using open data tools to empower their residents and to improve the quality of government services.

Notes

1 Rachel Haot, "Open Government Initiatives Helped New Yorkers Stay Connected During Hurricane Sandy," TechCrunch, January 11, 2013, http://techcrunch.com/2013/01/11/data-and-digital-saved-lives-in-nyc-during-hurricane-sandy/ (accessed March 2, 2014).
2 President Barack Obama to Heads of Executive Departments and Agencies, "Memorandum on Transparency and Open Government," The White House, January 21, 2009, http://www.whitehouse.gov/the_press_office/Transparency_and_Open_Government/ (accessed March 2, 2014).
3 Gavin Newsom, "San Francisco Government and Technology: How We're Innovating," Mashable, October 21, 2009, http://mashable.com/2009/10/21/san-francisco-government/ (accessed March 17, 2014).
4 "Government Digital Service," Wikipedia, http://en.wikipedia.org/wiki/Government_Digital_Service (accessed March 17, 2014).
5 European Commission, European legislation on reuse of public sector information, http://ec.europa.eu/digital-agenda/en/european-legislation-reuse-public-sector-information (accessed March 17, 2014).
6 G8, Open Data Charter and Technical Annex (London, United Kingdom: UK Cabinet Office, 17 June 2013), https://www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex#contents (accessed March 2, 2014).
7 Cyd Harrell, "The Beginning of a Beautiful Friendship: Data and Design in Innovative Citizen Experiences," in Beyond Transparency: Open Data and the Future of Civic Innovation, ed. Brett Goldstein and Lauren Dyson (San Francisco, CA: Code for America Press, 2013), 151.
8 City of Chicago, "Crimes - 2001 to Present," City of Chicago Data Portal, https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2 (accessed March 23, 2014).
9 City of Chicago, "Crime App," City of Chicago Digital Portal, http://digital.cityofchicago.org/index.php/crime-app/ (accessed March 17, 2014).
10 Kramer Concepts, LLC, Chicago Crime Watch, version 1.2.2 (iTunes, 2013), iPhone and iPad app, https://itunes.apple.com/us/app/chicago-crime-watch/id549140902.
11 City of Chicago, A report on the status of open data in Chicago and actions for 2014, http://report.cityofchicago.org/open-data-2013/ (accessed March 17, 2014).
12 GovLab, "Welcome to the Open Data 500 Pre-Launch," Open Data 500, http://www.opendata500.com/ (accessed March 23, 2014).
13 Data.gov.uk, "Apps," http://data.gov.uk/apps (accessed March 23, 2014).
14 McKinsey Global Institute, "Open data: Unlocking innovation and performance with liquid information," October 2013, http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information (accessed March 17, 2014).
15 GitHub, "Project Open Data," Revision Notes, https://github.com/project-open-data/project-open-data.github.io/commits/master (accessed March 20, 2014).
16 City of Palo Alto, "Proclamation Open [Data] by Default," February 3, 2014, https://www.cityofpaloalto.org/civicax/filebank/documents/38803 (accessed March 17, 2014).
17 Bernard Mazer, Department of the Interior Memorandum, "Implementation of Department of Interior's Open Data Policy," September 16, 2013, http://project-open-data.github.io/assets/docs/MEMO_RE_IMPLEMENTATION_OF_DOI_OPEN_DATA_POLICY.pdf (accessed March 20, 2014).
18 American Legal Publishing Corporation, San Francisco Codes, http://www.amlegal.com/library/ca/sfrancisco.shtml (accessed March 20, 2014).
19 Sunlight Foundation, "Open Data Policies at Work," http://sunlightfoundation.com/policy/opendatamap/ (accessed March 20, 2014).
20 Taylor Buley, "Netflix Settles Privacy Lawsuit, Cancels Prize Sequel," Forbes, March 12, 2010, http://www.forbes.com/sites/firewall/2010/03/12/netflix-settles-privacy-suit-cancels-netflix-prize-two-sequel/ (accessed March 17, 2014).
21 Zuzana Stanton, "OpenStreetMap volunteers map Typhoon Haiyan-affected areas to support Philippines relief and recovery efforts," World Bank Blog, November 15, 2013, https://blogs.worldbank.org/eastasiapacific/openstreetmap-volunteers-map-typhoon-haiyan-affected-areas-support-philippines-relief-and-recovery (accessed March 17, 2014).
22 McKinsey Global Institute, "Unleashing Government's 'innovation mojo,'" October 2012, http://www.mckinsey.com/insights/public_sector/unleashing_governments_innovation_mojo_todd_park_interview (accessed March 23, 2014).

Page 71: Sais.34.1

SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

© 2014 by The Johns Hopkins University Press

Open Governments, Open Data: A New Lever for Transparency, Citizen Engagement, and Economic Growth

Joel Gurin

The international open data movement is beginning to have an impact on government policy, business strategy, and economic development. Roughly sixty countries in the Open Government Partnership have committed to principles that include releasing government data as open data—that is, free public data in forms that can be readily used. Hundreds of businesses are using open data to create jobs and build economic value. Up to now, however, most of this activity has taken place in developed countries, with the United States and United Kingdom in the lead. The use of open data faces more obstacles in developing countries, but has growing promise there, as well.

The open data movement, an international movement to make data from governments and other sources available for public use, has evolved alongside big data and holds equally powerful implications. Expanding our public stores of open data will allow users, from small-scale mobile app developers to large financial or healthcare institutions, to find new, innovative, and productive applications that generate economic and social value. Examples of the impact of the open data movement can be found the world over:

•   In the United States, more than five hundred companies are using free government open data to build businesses of all kinds and sizes—in healthcare, financial services, energy, education, and other fields.1 The federal government encourages this trend by holding “Datapaloozas” that bring hundreds or even thousands of policymakers, subject matter experts, business leaders, and technologists together to celebrate the value of government data and identify new applications.2

•   In the United Kingdom, the government launched the Open Data Institute with an initial grant of 10 million pounds. This institute serves as an incubator for new data-driven companies working on such issues as sustainability, housing, and corporate transparency.3

Joel Gurin is senior advisor at the Governance Lab at New York University, where he directs the Open Data 500 study. He is also author of the book Open Data Now, and founder and editor of OpenDataNow.com. He previously chaired the White House Task Force on Smart Disclosure and is the former editorial director and executive vice president of Consumer Reports.


•   In Russia, a real estate lawyer founded RosPil, a website that publishes government contracts online for the public to scrutinize for signs of corruption. Many suspicious contracts were annulled after being flagged in RosPil. The site saved Russians more than a billion dollars in its first few years.4

•   In Ghana, a young entrepreneur created Farmerline, a company that increases access to government data on markets, weather, and more for farmers and their families. Farmerline provides information by voice via cellphone for illiterate farmers.5

•   In about sixty countries around the world, members of the Open Government Partnership have committed to making government data available to their citizens in simple, practical formats.6

Defining Open Data and Big Data

Open data is distinct from big data, which describes large datasets processed using advanced analytic techniques. Big data on national economies, demographics, energy sources, social and political trends, and more can influence strategic approaches to foreign policy and international relations. However, this data is often inaccessible to entrepreneurs, advocates, or ordinary citizens. For example, national security or corporate and financial datasets are closely guarded, and only available to a limited number of people for review and analysis.

In comparison, open data is designed for public consumption, and it is democratic by nature. It can be defined as "accessible, public data that people, companies, and organizations can use to launch new ventures, analyze patterns and trends, make data-driven decisions, and solve complex problems."7 Where big data used within government agencies aids leaders in the formulation of foreign policy, open data can boost economic development, improve trust in government, and fight corruption.

Big data, open data, and the broader idea of open government are related and overlapping concepts, as demonstrated in the Venn diagram below. Big data does not have a precise definition, but describes data that is voluminous, varied, and rapidly changing. Big data must be managed and analyzed with sophisticated technology. Open data includes both big data and smaller-scale data, which also has important applications. For example, the participatory budgeting movement, now active in 1,500 cities around the world, uses open data on city budgets to empower communities to direct how their tax dollars are spent. The budget data is not especially complex, but opening it up for citizens to use can have important results.8 Similarly, the World Bank, which has started publishing information on international development as open data, provides a lot of small-scale data that is nonetheless significant. Statistics like the number of women in parliament or patterns in life expectancy can be displayed in simple but meaningful tables and graphs.9



Sources and Uses of Open Data

While national, regional, and local governments are the main sources of open data, other sources exist, as well. Civil society and the private sector are both major generators of open data. In developing countries, because government data systems are often incomplete or unreliable, these sources are especially important.

Social media is one significant source of open data. Public activity on Twitter, review websites like Yelp, or other outlets for public opinion can be analyzed to identify market trends, societal issues, or political momentum. For example, V.S. Subrahmanian at the University of Maryland used data collected from social media to predict the outcome of national elections in India,10 while a team at Imperial College in London reviewed Twitter patterns to study how different groups influenced and directed each other in the 2011 London riots.11

Open data can also be collected from citizens themselves. This "crowdsourcing" approach is the basis for RosPil's anti-corruption work in Russia, as well as a similar site in India, IPaidABribe.com. The international organization Global Integrity asks ordinary people in countries around the world to gather information to increase government transparency and accountability.12 Crowdsourcing can also be used to gather field data on healthcare issues, the activities of multinational corporations, or the environment, which becomes open data when made available for public use.

Finally, more and more research scientists—even some in the pharmaceutical industry—are starting to share their research findings as open data in the name of scientific progress.

Figure 1. Big Data, Open Data, and Open Government


Often in international collaborations, researchers studying Alzheimer's Disease, Parkinson's Disease, multiple myeloma, and other serious diseases are now working together to share data in the early stages of experiments. Many foundations that support this research insist on data-sharing as a condition for grants: focused on the search for cures, they see shared data as a way to accelerate progress.13

In a recent significant step, Johnson & Johnson agreed to make its clinical trial data publicly available by sharing it with a Yale research center. This is an important shift from the common practice of releasing only those clinical trials that suggest a drug is effective. By releasing all clinical trial data, Johnson & Johnson will enable other researchers to determine whether the positive clinical data actually outweighs negative trial results.14

In short, open data can be derived from a range of nongovernmental and governmental sources. The result is an emerging ecosystem that is robust and complex. Considered as a whole, open data has a clear potential to be a major force in politics, culture, and economic growth throughout the world. This potential is already evident in a number of areas.

Open Data as an Economic Driver

As the open data movement developed over the last decade, its goals expanded. At first, advocates focused on government transparency and accountability. Those are still essential goals. However, governments now also recognize open data as a tool for economic development. In a world where economies are increasingly data-driven, the free, public nature of open data makes it a powerful business resource.

Open data's economic value was first realized in the United States and the United Kingdom, the two countries with the most extensive and developed national open data policies. The U.S. government's Open Data Policy, announced by executive order in May 2013, is designed to make federal data "open by default," meaning that agencies will release it in usable forms to the public unless there is a compelling reason not to, such as privacy or national security.15

In announcing the policy, President Obama recognized the economic value of open data. He declared that "we're making even more government data available, and we're making it easier for people to find and to use." He then predicted that open data is "going to help launch more startups. It's going to help launch more businesses… It's going to help more entrepreneurs come up with products and services that we haven't even imagined yet."16

While making data "open by default" is the right goal, it is idealistic to expect it to happen any time soon. Simply declaring that data should be open will not make it so; the data will not open itself. The United States has thousands of different federal data systems, many housing incomplete or inaccurate data managed with outdated technology. And in other countries, open data resources may be far more limited. How can we determine whether opening this data will be worth the time, effort, and expense?

The most widely quoted study of open data's potential value was published by McKinsey and Company in October 2013.17 That study estimated that unlocking open data could be worth 3 to 5 trillion dollars per year across seven sectors of the global economy: education, transportation, consumer products, electricity, oil and gas, health care, and consumer finance. Other studies have provided regional and country-level estimates. Open data has an estimated annual worth between 3 and 9 billion dollars in the United Kingdom.18 For the European Union, the value has been estimated to be 30 to 140 billion euros.19

Just as important as knowing the economic value of open data is understanding how that economic value is produced. The Open Data 500 study, managed by New York University's Governance Lab (GovLab), examines five hundred companies based in the United States to assess how government open data is used as a key business resource.20 Companies in fifteen different sectors were identified as open data users. They range from information providers for consumers to those that manage data for other businesses, and in size from large organizations like Bloomberg and Dun and Bradstreet to two-person startups that create mobile apps.

The Open Data 500 survey asked companies to list the government agencies whose data they use, in the interest of developing a new process to make key government datasets more accessible and usable. The GovLab plans to use information from the Open Data 500 as a foundation for a series of open data roundtables, which will bring government agencies together with the companies that use their data. In these structured dialogues, data-holders and data-users will work together to identify high-priority datasets, discuss the obstacles to their use, and begin to figure out ways to overcome those obstacles.

The Open Data 500 study also asked companies to describe how they earn revenue, and found that open data businesses use a variety of operating models. Several use the same kinds of revenue models that any web-based business could consider, including advertising, subscriptions, and referral fees for sending customers to other businesses. Others, in contrast, provide more high-tech data management or data analysis services. Many use a mix of models.

Other researchers have studied the ways that open data builds business value. Research by Deloitte’s Insights Team identified five “archetypes” for open data companies:21

•   Suppliers publish their data as open data, not as a direct revenue source but to increase customer loyalty, enhance the company’s reputation, or help meet other goals.

•   Aggregators collect and analyze open data, selling their insights or profiting in other ways.

•   Developers "design, build, and sell web-based, tablet, or smartphone applications" using open data as a free resource.


•   Enrichers are “typically large, established businesses” that use open data to “enhance their existing products and services,” for example by using demographic data to reach customers in new ways.

•   Enablers charge other companies to facilitate the access and use of open data.

Understanding business needs and revenue models will help gauge the value of open data, but may still not yield meaningful national numbers. Estimating what a country's entire corpus of government data is worth is difficult and, in fact, may not be the right approach. Not all government data is equally valuable. For example, 10 percent of a government's data may hold 90 percent of its economic value. Determining which datasets in which sectors are most valuable to the public may be a better approach. It is also important to identify how the public and private sectors can cooperate to achieve greater access.

Government Approaches to Open Data

Up to now, U.S. government agencies have determined the release process for their data with little input from the people and companies that use it. This process, essentially a supply-side approach to data, was based on the hope that simply providing access to data would encourage its usage. However, left to make their own judgments, government agencies may not make the best decisions. They may avoid releasing data that is more difficult to process for public consumption, or data with embarrassing flaws in completeness and accuracy. This could mean the information most valued by the public remains behind closed doors. Or, in contrast, too much unimportant data could be published instead, simply because it happens to be easy to release. Daniel Kaufmann of Revenue Watch has called this problem "zombie data"—data published without purpose or any real value.22

A better approach is what can be called "demand-driven data disclosure": a systematic way to ensure that open data is driven by the needs of potential users by involving many stakeholders (see diagram below). The U.S. Open Data Policy requires each agency to release an inventory of all its datasets and designate a point of contact for public inquiries—both good steps to connect data providers to data users. But agencies could do more to gather input from data users directly. The GovLab's open data roundtables are designed to facilitate that process. Agencies could also create online forums to discuss specific issues or establish public advisory groups on open data.
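As a hypothetical illustration of what such a dataset inventory entry looks like, the sketch below loosely follows the Project Open Data "data.json" metadata schema associated with the U.S. Open Data Policy's implementation materials; the field names approximate the public schema and all values are invented.

```python
import json

# An invented inventory entry. Field names approximate the Project
# Open Data metadata schema; consult the published schema for the
# authoritative field list and requirements.
entry = {
    "title": "Building Permits, 2010-Present",
    "description": "Permits issued by the Department of Buildings.",
    "keyword": ["permits", "construction", "buildings"],
    "modified": "2014-03-01",
    "publisher": "Department of Buildings",
    "contactPoint": "Open Data Coordinator",
    "mbox": "opendata@example.gov",       # hypothetical contact address
    "identifier": "permits-2010",
    "accessLevel": "public",
    "distribution": [
        {"accessURL": "https://data.example.gov/permits.csv",
         "format": "text/csv"}
    ],
}

print(json.dumps(entry, indent=2))
```

A machine-readable inventory like this is what lets outside users discover what an agency holds, which is the precondition for any demand-driven dialogue about what to release next.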

Other countries already engage with open data users in innovative ways. In the United Kingdom, the government formed an Open Data User Group in 2013 to provide ongoing feedback on its open data program.23 The French government asked a variety of open data activists to advise it on how to structure Data.gouv.fr, the central government hub for open data.24

In Mexico, the government used a website it calls a “datatron” to gather input on the open data that was most sought after by the public.25

As more governments launch open data programs, NGOs are developing methods to measure the effectiveness of these initiatives. The Open Data Barometer, recently published by the Open Data Institute and the World Wide Web Foundation, measures the "openness" of government data in seventy-seven countries.26 A comparable assessment of seventy countries by the Open Knowledge Foundation, called the Open Data Index, produced generally similar country evaluations.27

Wealthier countries generally score higher on openness than developing countries, with the United Kingdom and the United States as the world leaders. A recent analysis of the Open Knowledge Foundation's index by the Oxford Internet Institute found that national wealth accounts for about a third of the variation in countries' scores.28 To some extent, the reasons may be technological: technology infrastructure is expensive, and a country without widespread internet access may find it difficult to take advantage of open data.

In many countries, though, the problem is more complex. Open data can challenge the established order in ways unfavorable to the incumbent government. The founder of RosPil, Aleksei Navalny, faces ongoing legal difficulties with the Russian government, including imprisonment on charges of conspiring to steal from a state-owned lumber company.29 Many governments will find open data a threat to their power. As the Washington Post observed in reporting on the Oxford Internet Institute study mentioned above, "If you've got a corrupt government, transparency's probably not your thing. If you've got high unemployment, deep poverty or a serious pollution problem, you're probably not inclined to hand over information about those problems to the people who live with them."30

Figure 2. Multi-stakeholder Engagement: “Demand-Driven Data Disclosure”


How to Make Data Work for Development

Put these issues aside for a minute and imagine that a government wants to use open data in the cause of transparency, accountability, good government, and economic development. Even in the best political environment, putting open data to work in the developing world is difficult. Limited resources, poor data collection, and political and legal issues all interfere with open data's effective usage.

A recent blog post by Prasanna Lal Das, a leader in open data initiatives at the World Bank, summarized the obstacles. First, many countries lack key data on important issues. They also face logistical challenges in releasing data, including data irregularity, technical gaps, and cost. In addition, Lal Das writes, "The policy/regulation environment around open data in developing countries is patchy…There are human capital gaps in the sector [and]… access to finance for emerging smart data firms is one of the frontline issues."31

It is a chicken-and-egg problem. National governments may have little incentive to release data until they see the economic benefits, but companies cannot demonstrate those benefits until they have access to open data.

But some developing countries are beginning to develop programs to release open data, either because they have seen its promise in the developed world, or because they want to increase government transparency and credibility. The World Bank is helping those countries implement open data policies with its Open Data Readiness Assessment. The assessment helps national governments to evaluate the state of their data resources, gaps in their expertise, and actions required to release data effectively.32

Where governments have been slow to gather and release their own open data, private data companies have begun to step in. As those companies gather data on developing countries, the countries' governments need to decide whether to acknowledge and publish it. Their incentive is to gain credit for transparency, with foreign investors and their own citizens, by publishing the new data. Even though the data may include embarrassing information on poor health, poverty, or a country's other problems, acknowledging the problems and promoting open data may be politically wise when a private company starts to bring these issues to light.

One data-gathering company is Metabiota, a San Francisco-based company that collects information on infectious disease internationally, including in the United States. Metabiota merges data from the World Health Organization and other sources with data from blood samples collected by workers the company trains. The company then sells reports based on the data to U.S. government agencies, such as the Department of Defense, and others that have an interest in knowing about infectious disease risks.33

Based in New York, another innovative startup is Ulula, whose name means "transparency" in Chichewa, a southern African language. Ulula gathers data about companies in the extractive industries in about a dozen different countries. Using mobile technology, local citizens can anonymously report information on company operations. The clients for this crowdsourced open data include investors and the companies themselves, who use Ulula to monitor their operations and improve relationships with the communities in which they operate.34

Applying open data in any country requires a deep knowledge of what kind of business will best serve that country’s citizens. Take the case of weather and agricultural data, and the ways it can be used to serve farmers in different parts of the world.

First, consider a company that has become the best-known example of a successful open data business. The Climate Corporation formed in 2006 with the goal of developing a better form of weather insurance: The company planned to analyze the risks posed by bad weather precisely enough to offer better insurance benefits at reasonable premiums. Mathematicians and data analysts from organizations like Google and Stanford were hired to analyze vast amounts of data from U.S. government sources to achieve the company's goal.

As a result, the Climate Corporation developed such an extensive analysis that it can now offer more than weather insurance: It can provide American farmers precise guidance on crops and planting cycles to help them adapt to climate change. In the fall of 2013, Monsanto bought the Climate Corporation for just under one billion dollars. Its success is a prominent example of how free public data can be turned into financial gain.35

The Climate Corporation is a success in America, but it would not work as well in Ghana, where more than half of the population relies on farming for their livelihood. With the exception of a few large agribusinesses, most farmers do not want highly sophisticated data. Instead, they seek public data on weather, markets, and other relevant factors. The Ghanaian farmer’s problem is that even this basic information is inaccessible and many farmers are not literate enough to use what is available. Agricultural extension officers can help farmers in theory, but there are major practical limitations to their impact: There are about a thousand farmers for every extension officer assigned to help them.

On the other hand, open data can be used to help these farmers in ways that work in their country. Mobile phone penetration is high in Ghana and there is a growing cadre of young tech-savvy entrepreneurs. One of these innovators noticed the information gap and created Farmerline to deliver farm-related data in a simplified but effective format. Using an application available to anyone with a cell phone, Farmerline delivers information and allows farmers to ask questions through voice as well as text in any of Ghana's many languages.36


Conclusion

The drive to open data is a global movement that will unfold in different ways around the world. In developed countries, the economic value is now clear enough to give a rationale for releasing open data, and the United States’ and United Kingdom’s national policies are setting a model that other countries have started to follow. In developing countries, open data has been a lower priority. There isn’t yet strong evidence that open data can quickly create great business value in developing economies today.

Still, there are clear reasons for developing countries to adopt open data policies sooner rather than later. In countries with a history of corruption, open data can help a government establish transparency, credibility, and trust. International investors will increasingly demand open data to help them decide which countries are a good risk for them. And for young, tech-savvy entrepreneurs, open data in any country can be a free resource to develop new products and services at a low cost.

However the future plays out, open data is almost certain to be a significant global force. It will help shape economic development, international investment decisions, and international relations. Like big data, it will have a profound effect on how we understand the world's economic, political, and demographic landscapes. But unlike most big data, open data will make that understanding accessible to all, with far-reaching results.

Notes

1 http://www.OpenData500.com
2 Lauren Caldwell, "Open Government Data Fuels Private Sector Innovation: Data Jams, Hackathons, and Datapaloozas," SAIS Review (blog), March 12, 2014, http://saisreview.org/2014/03/12/open-government-data-fuels-private-sector-innovation-data-jams-hackathons-and-datapaloozas/.
3 http://theodi.org/
4 Paul Healy, Karthik Ramanna, and Matthew Shaffer, "Rospil.Info," HBS No. 112-033 (Boston: Harvard Business School Publishing, 2012), http://www.hbs.edu/faculty/Pages/item.aspx?num=41530.
5 http://farmerline.org/
6 http://www.opengovpartnership.org/
7 Joel Gurin, Open Data Now (New York: McGraw-Hill, 2014), 9.
8 For more on the Participatory Budgeting Project, see: http://www.participatorybudgeting.org/about-participatory-budgeting/what-is-pb/.
9 See http://data.worldbank.org/.
10 V.S. Subrahmanian, "Forecasting the Spread of Sentiments and Emotion in Social Media," presentation, Sentiment Analysis Symposium from Alta Plana Corporation, New York, NY, March 6, 2014.
11 Mariano Beguerisse-Diaz et al., "Communities, roles, and informational organigrams in directed networks: the Twitter network of the UK riots," arXiv:1311.6785v1 [physics.soc-ph], November 26, 2013, http://arxiv.org/abs/1311.6785.
12 http://www.globalintegrity.org/.
13 See Gurin, Open Data Now, Chapter 10.
14 "Johnson & Johnson announces clinical trial data sharing agreement with Yale School of Medicine," Johnson & Johnson press release, January 31, 2014, https://www.jnj.com/news/all/johnson-and-johnson-announces-clinical-trial-data-sharing-agreement-with-yale-school-of-medicine.


15 Barack Obama, "Making Open and Machine-Readable the New Default for Government Information," Executive Order 13642, May 9, 2013, http://www.whitehouse.gov/the-press-office/2013/05/09/executive-order-making-open-and-machine-readable-new-default-government-.
16 Barack Obama, "President Obama Speaks on Innovation and Manufacturing," remarks, Applied Materials, Inc., Austin, TX, May 9, 2013, http://www.whitehouse.gov/photos-and-video/video/2013/05/09/president-obama-speaks-innovation-and-manufacturing.
17 James Manyika et al., "Open Data: Unlocking Innovation and Performance With Liquid Information," McKinsey & Company, October 2013, http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information.
18 UK Department for Business Innovation and Skills, Market Assessment of Public Sector Information, by Deloitte, London, UK: URN BIS/13/743, May 2013, https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/198905/bis-13-743-market-assessment-of-public-sector-information.pdf.
19 European Commission, Review of Recent Studies on PSI [Public Sector Information] Re-Use and Recent Market Developments, by Graham Vickery, Information Economics, Paris, France: 2011, http://www.umic.pt/images/stories/publicacoes6/psi_final_version_formatted-1.pdf.
20 The Open Data 500 study has both civic and economic goals, including: provide a basis for assessing the economic value of government open data; encourage the development of new open data companies; and foster a dialogue between government and business on how government data can be made more useful. The research model employed is applicable to other countries, and we are investigating opportunities to replicate the study around the world. For additional information, see http://www.opendata500.com.
21 Deloitte LLP and the Open Data Institute, "Open Growth: Stimulating the Demand for Open Data in the UK," Deloitte Analytics Briefing Note, 2012, 3, http://www.deloitte.com/assets/Dcom-UnitedKingdom/Local%20Assets/Documents/Market%20insights/Deloitte%20Analytics/uk-da-open-growth.pdf.
22 Daniel Kaufmann, "Zombie Data: Has Open Data Failed to Live Up to Its Hype?" roundtable discussion, from Thomson Reuters, London, UK, October 25, 2013, http://www.trust.org/spotlight/Zombie-data-has-open-data-failed-to-live-up-to-its-hype/.
23 https://www.gov.uk/government/policy-advisory-groups/open-data-user-group
24 Romaine Dillet, "How France's Open Data Team is Modernizing the French Government Through Data," TechCrunch, February 12, 2014, http://techcrunch.com/2014/02/12/how-frances-open-data-team-is-modernizing-the-french-government-through-data/.
25 http://datos.gob.mx/.
26 "Open Data Barometer," Open Data Research Network, http://www.opendataresearch.org/project/2013/odb.
27 "Open Data Index," Open Knowledge Foundation, https://index.okfn.org/.
28 "Open Data Index," Oxford Internet Institute, http://geography.oii.ox.ac.uk/#open-data-index.
29 Greg Brown, "Crowdsourcing to Fight Corruption: Aleksei Navalny and the RosPil Experiment," Sunlight Foundation (blog), August 6, 2013, https://sunlightfoundation.com/blog/2013/08/06/crowdsourcing-to-fight-corruption-aleksei-navalny-and-the-rospil-experiment/.
30 Emily Badger, "Why the Wealthiest Countries Are Also the Most Open With Their Data," Wonkblog (blog), Washington Post, March 14, 2014, http://www.washingtonpost.com/blogs/wonkblog/wp/2014/03/14/why-the-wealthiest-countries-are-also-the-most-open-with-their-data/.
31 Prasanna Lal Das and Alla Morrison, "From Open Data to Development Impact – the Crucial Role of the Private Sector," Open Data: The World Bank Data Blog (blog), January 8, 2014, http://blogs.worldbank.org/opendata/open-data-development-impact-crucial-role-private-sector.
32 "Readiness Assessment Tool," The World Bank, http://data.worldbank.org/about/open-government-data-toolkit/readiness-assessment-tool.
33 http://metabiota.com/


34 http://ulula.com/
35 Gurin, Open Data Now, 27-31.
36 Laura Manley (project manager of the Open Data 500 and former Vittana Fellow in Ghana) in discussion with the author, March 2014.


SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

Of Note

Open Governments, Open Data: Getting the Technological Toolkits Right

Mike Nguyen

Around the world, governments and local authorities are beginning to release their data for public consumption and reuse. As Joel Gurin writes in his article, "Open Governments, Open Data," open data promises to be a "global force" that has the potential "to shape economic development, international investment decisions, and international relations." However, as anyone who has worked with data knows, simply releasing or having open data is not enough. The challenge comes in formatting the data to be workable and computable, which allows the data to deliver insights. With constant advances in computing and mobile technology, open data offers even more promise in the hands of average end-users when combined with the right tools.

On one end of the spectrum, some authorities have made great strides in pushing for open data, but have developed limited means to visualize or drive home the impact of their data. For example, NGOs like China's Institute of Public and Environmental Affairs, which developed the China Pollution Map Database, have pushed Chinese government departments at all levels and regions throughout China to release 97,000 environmental supervision records of polluting enterprises in China. However, a perusal of the website demonstrates its limitations: visualization is only possible within the website's limited platform, and individual records can only be searched and accessed on a one-by-one basis. These limitations become frustrating when trying to gather aggregate data to tell a story about a particular region or company.

© 2014 by The Johns Hopkins University Press

Mike Nguyen is a second-year M.A. candidate at the Johns Hopkins University Paul H. Nitze School of Advanced International Studies (SAIS) concentrating in China Studies. He is an Assistant Editor for The SAIS Review.

Likewise, IPaidABribe.com, an Indian NGO that collects corruption data, provides aggregated, summary-level data on corruption nationally and provincially, but offers only anecdotal evidence at the individual level. The end-user cannot plot the anecdotal evidence of corruption on a map, link the data to specific offices or individuals, or research specific ministries or individual public officials to monitor trends or patterns of corruption at the micro-level. IPaidABribe.com has a wealth of information on corruption, with over 25,000 reports covering 659 cities throughout India and totaling over 71 crore rupees ($11.8 million) in bribes. Designing a tool or equipping users with methods to compute or utilize the data more effectively would make IPaidABribe.com's platform even more powerful and impactful.

On the other end of the spectrum are entities such as the United States Census Bureau, which balances the limitations of its "in-house" interactive data visualization tools with the provision of raw, granular data that links to geographic information system (GIS) shapefiles and other survey processing software. Census data is a powerful resource when it is expertly visualized via GIS, offering clear insights from colored overlays on maps. However, for the average end-user, the process of visualizing this data—joining shapefiles in GIS, carefully pruning immense datasets, learning functions such as pivot tables or software such as Microsoft Excel—requires a tremendous time commitment and technical expertise that is not readily accessible. These tools are useful for researchers, but the average citizen has almost no desire to manipulate complex data.
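For a sense of what that expert workflow involves, here is a minimal sketch, assuming the geopandas library, a TIGER/Line tract shapefile, and a hypothetical CSV of survey estimates keyed by tract GEOID; the file names are illustrative only.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical inputs: a Census tract shapefile (TIGER/Line naming
# shown as an example) and a CSV of estimates keyed by tract GEOID.
tracts = gpd.read_file("tl_2013_17031_tract.shp")
income = pd.read_csv("median_income_by_tract.csv", dtype={"GEOID": str})

# Join the attribute data onto the geometries, then draw the kind of
# colored-overlay choropleth the text describes.
merged = tracts.merge(income, on="GEOID")
merged.plot(column="median_income", cmap="viridis", legend=True)
plt.title("Median household income by census tract")
plt.show()
```

Even this short script presumes locating the right shapefile vintage, matching identifier formats, and installing a GIS stack, which is exactly the barrier that keeps such analysis out of reach for most citizens.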

Some authorities have tried for a middle ground approach to data availability and accessibility. For example, the World Bank's World Databank allows for the creation of simple reports directly on their website, with options to quickly access, download, and interact with the raw data (in various formats) within minutes. This approach strikes a healthy balance between the average end-user, who may wish to generate simple charts and trends, and the needs of researchers, who may wish to run more complicated regressions using the detailed datasets. Promoters of open data should strive for this middle ground, providing both ease of use for the average end-user and more meaningful data for researchers.
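As a sketch of that raw-data access, the snippet below pulls a small time series through the World Bank's documented indicator API, here life expectancy at birth (indicator code SP.DYN.LE00.IN) for Ghana; the URL shape follows the published API convention and is worth verifying against the current documentation.

```python
import requests

# World Bank indicator API: /v2/country/<ISO code>/indicator/<code>.
URL = "http://api.worldbank.org/v2/country/GH/indicator/SP.DYN.LE00.IN"

response = requests.get(URL, params={"format": "json", "date": "2000:2012"})
response.raise_for_status()

# The API returns a two-element list: [paging metadata, observations].
metadata, observations = response.json()
for obs in observations:
    print(obs["date"], obs["value"])
```

The same data is available as a website report, a CSV download, or this programmatic feed, which is what makes the Databank a workable middle ground for both casual users and researchers.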

However, the true potential of open data reaches beyond visualization or accessibility. It rests with execution—when real problems are made solvable for real people through the use of open data. Imagine a world where entire domains of knowledge, from arts to engineering to medicine, are connected on the cloud, curated, and can be accessed on a mobile device. Imagine an illiterate, poor farmer in sub-Saharan Africa switching on his smartphone, studying Ikea-like diagrams of how to repair motorbikes, and then starting his own repair business with the microfinance application he has accessed. Imagine a child teaching herself the skills to play an instrument, test for HIV, or operate a welding machine—all from her smartphone and other mobile devices that tap into curated domains of knowledge. Now, imagine that she can teach herself to program any number of these devices (or others yet to be connected) to fill knowledge or skills gaps in her community.

This is exactly the vision that Stephen Wolfram, inventor of Mathematica, has for his highly ambitious new computational paradigm and programming language.1 More promising than Google, which seeks to understand objects and things and their relationships so it can provide search results, Wolfram Alpha wants to make all of the world's data and domains of knowledge computable. Wolfram Alpha has already generated thousands of accessible datasets that are curated and updated in real-time on a variety of disciplines, such as weather patterns, socioeconomic indicators, health and medicine, engineering, transportation, and food and nutrition.

The project's aim is to make these domains of knowledge accessible and referenced, intuitively computable via the Wolfram programming language, scalable to web applications, or embedded within mobile and connected devices that run on credit card-sized, single-board computer chips such as the Raspberry Pi. Put simply, the Wolfram programming language (which has 11,000 pages of documentation) hopes to make the world's data and knowledge more accessible and computable. Then, it hopes to make programs—and eventually objects—smart. Imagine the incredible potential these smart objects would have if they could access all the world's knowledge, as well as the incredible amounts of open data hosted by governments, corporations, and other organizations.

Governments and authorities that wish to join this mobile computing revolution must do more than make open data available. They must properly interface open data with the right devices, platforms, and programming languages so that the knowledge hidden in open data can be intuitively presented, utilized, and implemented by end-users. Additionally, corporations, non-governmental organizations, and other authorities that curate immense stores of useful data should be encouraged to be more forthcoming and overcome the privacy and other challenges inherent in releasing this data.

While there are technological limitations, realizing open data's full potential is not farfetched. According to Ericsson's 2013 "Mobility Report," by 2019 LTE will cover more than 65 percent of the world's population, and $20 smartphones and $35 tablets will soon be a global reality.2 When coupled with cloud computing capabilities and inexpensive mobile devices, the revolution in open data is a lightning rod in a perfect storm of technology, further democratizing and revolutionizing the way we interact with, access, and make use of data. Whether governments can take advantage of this revolution will depend on more than just making open data available. Instead, it depends on how innovative and committed authorities are to ensuring that open data interfaces with the mobile, cloud, and technical revolutions simultaneously occurring.


Notes

1 "Stephen Wolfram's Introduction to the Wolfram Language," [n.d.], video clip, YouTube, https://www.youtube.com/watch?v=_P9HqHVPeik (accessed April 12, 2014).
2 Ericsson Mobility Report: On the Pulse of the Networked Society, November 2013, http://www.ericsson.com/res/docs/2013/ericsson-mobility-report-november-2013.pdf.


SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

© 2014 by The Johns Hopkins University Press

Bridging the Innovation-Policy Gap

Kord Davis

The speed of technological progress has generated a gap in the dialogue between innovators and policymakers, posing a danger for the future of both. As questions over data ethics protocols gain prevalence, it will become critical to construct collaborative spaces where the public and private sector can engage in mutually beneficial conversations and share knowledge to design effective, balanced, and useful policy.

The pace of technological innovation far exceeds that of legislation. Policymakers are challenged to keep up with the latest developments in features, functionality, and business models. Meanwhile, technologists are innovating on a daily basis, often ignoring the potential impact that future legislation or policy might have on their endeavors.

This is an untenable state of affairs. The growing gap between revenue-generating innovations in the business and technology worlds and the policies and regulations that govern commerce represents a massive risk to economies around the world. It is truly the Wild West of data innovation. What is legal and ethical today might be constrained in unimaginable ways tomorrow unless policymakers and technologists engage in a more productive dialogue.

Policymakers and Technologists: Challenges of Collaboration

There are many challenges to creating a space for generating dialogue between policymakers and technologists. First among these challenges are borders. Political, cultural, social, and technological boundaries are shifting rapidly, and each has historically proven to be difficult terrain on which to map appropriate and useful policy. Data theft crimes have demonstrated the incredible complexities in applying legal policy across multiple cross-border jurisdictions.1

Kord Davis is a strategist and facilitator who helps organizations at the intersection of people and data. He is author of Ethics of Big Data (O'Reilly Media 2012), one of the first works to explore a framework for making decisions about data based in individual personal and shared organizational values.

Similarly, the process of policy development poses significant challenges. Special interest lobbyists, committees, debates, and legal procedures are inherently time-consuming influences on the development and enforcement of good policy. The sheer amount of time and effort it takes to generate consensus and to implement policy or legislation is one of the most stubborn barriers to collaboration.

Knowledge gaps are a major obstacle, as well. Technology evolves at such a rapid pace that keeping policymakers up to date on the latest innovation is a full-time job. Worse, our collective understanding of the ethical issues at play in the big data landscape is often too narrow or too broad to fully inform our actions. This knowledge gap is most visible on the topic of individual privacy. Digital privacy does not exist outside the context of the particular technological system in which it is defined. And it is inherently bound to the technological architecture that defines individual identity.

In order to develop good policy, policymakers must understand the complexities of how identity and privacy are technologically instantiated in any given system. Furthermore, they must understand how identity and privacy are managed generally across many different systems. This is complex knowledge, yet it is essential to generate balanced and useful policy.

There is also a culture clash between technology and policy development. One way to characterize this clash is that technologists are interested in solving problems, while policymakers are interested in regulating solutions (or the methods of solving problems). For technologists, legislation and policy represent boundaries or constraints and feel like four-letter words. For policymakers, unregulated innovation represents the risk of unintended consequences. If we are to close the gap between innovation and legislation, we must improve our understanding of this culture clash, as well as the driving forces that govern its shape and size, which include process, cultural, and social components.

Among these cultural and social components are conflicts of interest between organizations and individuals. Businesses (and supporting technologies) are inherently interested in innovation and profit, while individuals and organizations have a vested interest in their digital identity, privacy, property, and reputations. Good, fair, and balanced policy provides protections against business practices that are damaging to individuals, organizations, and markets, while encouraging ongoing innovation in pursuit of profit.

Without a more comprehensive understanding of the technological forces at play in these rapidly developing and innovative business practices and technical solutions, policy will continue to be outdated and toothless—or, worse, it will become a disruptive force that fails to do its intended job.

Technology and Social Values: The Cases of Robert Bork and Netflix

Technology will continue to evolve in ways and with outcomes that we cannot fully anticipate. These innovations can deeply influence our social values and our body of law.


For example, in 1987, Robert Bork’s nomination to the Supreme Court was hotly contested. The opposition used Bork’s video rental history as evidence against his confirmation, and the resulting controversy led to federal legislation enacted by Congress in 1988. Called the Video Privacy Protection Act (VPPA), the law made it illegal for any videotape rental service provider to disclose rental history information outside the ordinary course of business, and made violators liable for damages of at least $2,500.

In September 2011, Netflix posted a public appeal, requesting that customers contact their Congressional representatives to amend the VPPA. The amendment would allow Netflix users to share their viewing history with friends on Facebook.2 It had been a mere twenty-three years between the passing of the VPPA, when Congress took action to protect consumers from having their purchase history used to judge their professional capabilities, and 2011, when a major American business asked its customers to allow that very same information to be shared legally.

What would a federal appointee’s nomination look like today if his or her Netflix queue were part of the standard confirmation process? Should there be a special section in the process for “watches a lot of conspiracy movies”? How many is too many?

Without big data, no business would be in a position to offer such a capability or to make such a request. In the twenty-three years between the VPPA and the Netflix request, big data has influenced the mechanisms with which we share information, as well as our desires and preferences about what is important enough to share. The influence of big data on our daily lives has motivated a call for more explicit discussion about the ethical use of big data technologies.

Four Aspects of Data Ethics

Technology and policy do not exist independently. Especially in the realm of big data, the complexities of technical infrastructure make many aspects of data ethics interdependent in complicated ways. To more fully understand these dependencies, we must first understand the key elements of big data’s forcing function and their ethical implications. Four aspects of data ethics are of particular importance in assessing the obstacles to developing collaborative policy: identity, privacy, ownership, and reputation.

First, digital identity is concerned with the relationship between our offline identity and our online identity. It is, by definition, what an organization knows (or believes it knows) about you. It is also always necessarily incomplete. While many people are working hard to close this gap, true 1:1 parity between our physical presence and our virtual presence is a long way off. There will be gaps for a long time to come—and many argue the gap will never close.



Second, privacy is concerned with the access, control, and usage of personal data. One of the greatest challenges here is that we often have different norms and definitions of what constitutes “privacy.” Today, you might not care that Congress knows what movies you watch. In 1987, Robert Bork cared very much. Helen Nissenbaum argues that privacy is context dependent and that the “flow” or transmission of personal data occurs within informational norms. Changing one of the norms results in unexpected usage of personal data—which often feels like a breach of trust and a violation of privacy.3

Third, ownership is concerned with identifying who generates and owns data; what rights can be transferred, sold, bought, shared, or stored (and for how long); and the obligations of people who generate and use data. This is a massive policy challenge. Entire new markets are being developed around “data brokers” who actively seek, acquire, buy, and sell personal data for a myriad of uses.4

Finally, reputation is concerned with the process of determining whether data is trustworthy, and how we can use data to make value-based judgments about individuals and organizations. Big data exponentially increases the amount of information available and the number of ways we can interact with it. This phenomenon increases the complexity of managing how individuals and organizations are perceived and judged. Organizations are increasingly using big data to make reputation-based assessments on critical topics including credit-worthiness, insurance risk, employment eligibility, and health status.

We are often myopic about these four elements, frequently focusing on one to the exclusion of others. Good privacy policy cannot be developed without an understanding of the technical infrastructure and data handling practices that define an individual’s identity in a given system. Similarly, what an organization does with that data can directly impact an individual’s or organization’s reputation. Furthermore, who benefits financially or otherwise from data ownership is an economic question of great importance. There is an Internet meme that says, “If you’re not paying for a service, you’re not the customer, you’re the product.” This concept turns the question of data ownership on its head and fundamentally alters the notion of property.5

Traditionally, an individual either owned or “sold” the results of their labor (their output) to an employer or a customer. They had some influence on who bought the results of their labor, and they typically received fair market value for that output in the form of revenue or salary. You don’t need to be an historian or an economist to understand that circumstances are different today, and these four aspects of data ethics raise more questions than they answer.

When Facebook sells its users’ web browsing history to targeted marketing agencies, this “output” has been explicitly given to Facebook through its terms of service agreement. Users lose most of the control over what happens to the data, how much it is worth, and who buys or accesses it. In return for this loss of control, we can post funny pictures of cats, notes to each other, and links to other places on the web.


A balanced perspective is essential to bridging the innovation-policy gap. That balance is informed by an understanding of the complex interactions among identity, privacy, ownership, and reputation. Policymakers need help from technologists to ensure that policy is balanced, fair, timely, and useful. Technologists need to engage policymakers in ongoing and collaborative dialogue to help guide the outcomes. Bridging this gap is in everyone’s best interest.

Overcoming Challenges and Finding Mutual Benefits

How are we to respond to this increasing gap between technological innovation and the development of policy and legislation?

The first option available is to level the knowledge base. Policymakers and technologists need to create a common, shared understanding of each other’s perspectives. Each discipline is complex and nuanced, and takes years to learn to do well. Furthermore, these disciplines are becoming increasingly intertwined.

Without policy guidelines, we face an uncertain future for technological innovation. Without technical knowledge to inform those guidelines, we may unnecessarily constrain competitive markets and diminish the opportunity to benefit from innovations. Policymakers and technologists must learn to collaborate more closely to ensure that we find a good balance between risk and innovation.

Another helpful approach is to create a space for collaboration between policymakers and technologists. Too often, policymakers and technologists operate too far removed from each other. Policy is frequently developed with a superficial understanding of the implications for user experience, infrastructure, access, bandwidth, usage patterns, and the true costs and benefits associated with the amazing possibilities big data offers. Technical innovations are often made with little to no discussion of the ethical, cultural, legal, political, and social implications of building a product or service.

In their effort to bridge the innovation-policy gap, policymakers and technologists should seek to find mutual benefits. The increase in data breaches, continuing concerns over data privacy, complex privacy and usage policies, difficult opt-out procedures, and a persistent lack of understanding by the general public of how specific technologies work can all be reduced or removed by working more closely together.

Policies benefit from having an accurate technical understanding of the issues at hand, and technology benefits from well-informed policy that provides protections and guidance. Mutual benefits encourage deeper brand relationships and consumer confidence. This translates into faster adoption by consumers, more vocal brand advocates, a reduction of risk from poorly designed or unexpected legislation, and market leadership by promoting the common good.




Each discipline can help the other by intentionally adopting these practices. Sharing knowledge in collaborative spaces generates mutual benefit for both disciplines, and ultimately, for the common and greater good. Mutual benefit is created from a shared understanding of our common values and aligning our collective actions with those values. This alignment increases the pace of innovation, informs balanced and useful policy, and helps make those benefits real in the world by turning the question “should we do this?” into “how might we do this?”

Who is Bridging the Gap Today?

Technologists are not the only voice in the policy discussion. A recent initiative announced by the White House exhibits many of the characteristics described above. By hosting a series of public events to encourage discussion amongst technologists, business leaders, civil society, and the academic community, the Office of Science and Technology Policy seeks to learn from a wide range of experts and to engage the public in this discussion.5

There is also an emerging “think/do” tank in New York City called the Data & Society Research Institute. Its explicit purpose is “addressing social, technical, ethical, legal, and policy issues that are emerging because of data-centric technological development.”6 It is creating working groups, developing and holding events, and inviting a wide variety of participants to engage in an ongoing dialogue.

Both of these initiatives are intentionally seeking to level the knowledge base, create spaces for collaboration, and find mutual benefits. We can look to these initial efforts as examples and inspiration—but policymakers and technologists should not stop there. As noted previously, we are living in the Wild West of data innovation. It is an inspiring time that offers the potential for amazing benefits. However, much like the expansion of the American West, it is also fraught with danger, hardship, and few rules.

By working together to design a few guiding principles, we can help ourselves reduce the risk of dangers, ease hardships, and improve our chances of reaping the collective and common benefits of big data. Bridging the innovation-policy gap is essential to achieving this goal.

Notes

1 An outstanding introduction to the complexities of this world is: Joseph Menn, Fatal System Error: The Hunt for the New Crime Lords Who are Bringing Down the Internet (New York City: Public Affairs, 2010).
2 Netflix, “Blog: Help Us Bring Facebook Sharing to Netflix USA,” Netflix, http://blog.netflix.com/2011/09/help-us-bring-facebook-sharing-to.html.
3 Helen Nissenbaum, “A Contextual Approach to Privacy Online,” Dædalus 140, no. 4 (2011), http://www.amacad.org/publications/daedalus/11_fall_nissenbaum.pdf.
4 IT Law Wiki, “Data Broker,” IT Law Wiki, http://itlaw.wikia.com/wiki/Data_broker.
5 John Podesta, “Big Data and the Future of Privacy,” The White House Blog, http://www.whitehouse.gov/blog/2014/01/23/big-data-and-future-privacy.
6 The Data and Society Institute, “Homepage,” The Data and Society Institute, http://www.datasociety.net/.


SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

© 2014 by The Johns Hopkins University Press

Fast Data, Slow Policy: Making the Most of Disruptive Innovation

Aniket Bhushan

International affairs policy and practice are particularly ripe for disruptive innovation fueled by the rise of big data and open data. While there are several loci of disruption, this paper demonstrates how the speed and time dimension in policy-relevant research in international affairs and international development is being disrupted. Three illustrative case studies—real-time macroeconomic analysis, humanitarian response, and poverty measurement—are discussed. Finally, the concluding section explores how successful policy entrepreneurs can make the most of disruptive innovation in the age of big data.

Most trends are accompanied by a familiar cycle of hype. After the initial trigger come inflated expectations, then a trough of disillusionment, followed by an upward slope of enlightenment, before finally resting on a mainstream plateau.1 The heightened pace of open data and big data, and their potential impact on international affairs, is following a similar pattern. Whether one chooses to defend the hype or challenge “data fundamentalism,”2 enough fodder exists to fuel both sides of the debate.

Proponents argue the rise of big data and open data fundamentally changes the way we think about the world. The sheer volume, velocity, variety, and veracity of big data3 mean we can worry less about quality issues associated with narrower information sources. We can reframe our methodological orientation to focus on iterative learning and correlations as opposed to obsessing over causality. Doing this allows us to embrace the possibility of working with a plethora of untapped (and growing) data feeds to address challenges not even fully articulated yet. Doing this also means leaving in abeyance a host of new dilemmas in areas such as privacy, autonomy, and asymmetric coverage.4

Detractors, on the other hand, are quick to point out that data is not objective (indeed, the term “raw data” is an oxymoron). Data cannot “speak for itself,” as the proponents of big data would have us believe.5 There are biases at all stages, from collection to analysis to presentation. Big data may be unbeatable when it comes to forecasting, but it is “dumb” when it “comes to doing science,” as it is not underpinned by sophisticated research designs that aim to identify causal relationships.6 The bigger the data, the more we are prone to lull ourselves into a false sense of confidence in predictive analytics. Indeed, big data accentuates the “signal to noise” problem.7

Aniket Bhushan is senior researcher at the North-South Institute (NSI), a Canadian-based policy research institution. Mr. Bhushan heads NSI’s work on the Canadian International Development Platform (CIDP), a data and analytical platform on Canada’s engagement with the developing world. Mr. Bhushan’s current research focuses on the impact of big data and open data on public policy. His academic background includes degrees in political science and commerce, and he completed his M.A. in political science at Carleton University (Canada).



There are a multitude of other issues associated with the use of data. Big data has its roots in the commercial sector.8 The main intention behind generating sharper insights into customer behavior and profiles is to achieve better targeting and segmentation; in other words, smarter discrimination to ultimately drive profitability. When examined from a public policy perspective, this could be highly problematic. The kinds of targeting and discrimination taken for granted in many commercial sectors, like advertising, would be expressly forbidden in more regulated industries, like the insurance industry, and may be contrary to the aims of public policy and public service delivery.9

Whichever perspective one identifies with, the inescapable fact is that big data and open data are already disrupting several industries. Policy-relevant research and analysis in international affairs is no exception. In fact, international affairs—and international development as a subset—are particularly ripe for data-driven disruption. The speed of big data and open data has inherent disruptive potential that is already impacting a number of areas, from the highly mainstream business of macroeconomic indicators to new approaches to poverty measurement in difficult country contexts.

The aim of this paper is to describe what the emerging paradigm of international affairs in the age of big data looks like and the role of the policy entrepreneur at the center of this process. Successful policy entrepreneurs, both individuals and institutions, will be able to make the most of highly disruptive trends in data that will fuel future analyses of international affairs. Adapting to this emerging paradigm requires investing in new tools. While there are several loci of disruption, the emphasis of this paper is on the time and speed dimension.

International Affairs and Development are Ripe for Disruptive Innovation

Disruptive innovation is a business school concept often overused with little reference to the original idea. What does one mean by disruptive innovation in international affairs? The disruptive innovation paradigm argues that small, speculative innovations at the base of the pyramid can often leapfrog and disrupt established domains because top-tier players pursue incremental innovation with their most important, but also most change-resistant clients.10 There are several examples from big business—Amazon’s business model disrupted brick-and-mortar retail, Skype disrupted long-distance telephony, Netflix is disrupting cable broadcasting.

The paradigm is also applicable to international affairs. The main clients of policy analysis and research are bureaucrats or political decision makers, whether in government or international institutions. In this context, the potential disrupters are analysts who can make the most of the rise of big and open data.


Both as a field of practice and analysis, international affairs and development present an opportunity for disruption. Recent research into what senior policymakers in international affairs want and expect from researchers is revealing in this regard. The survey, unique in its sampling, targeted senior U.S. national defense policymakers from the George H.W. Bush, Bill Clinton, and George W. Bush administrations.11 The findings paint an unflattering picture of the increasing gap between the scientific aspirations of international affairs scholarship and the needs of policymakers. The gaps are highest at the top. The study found that the more policymakers know about a subject (especially through direct experience), the less likely they are to believe the “experts.”

While the gap between direct experience and academic knowledge may be unsurprising, the survey also points to just how wide some of the generational gaps can be. For instance, it is alarming that, for a survey conducted in 2013, senior policymakers still do not count the Internet as a useful source of policy relevant information. Instead they prefer to gravitate towards individual high profile scholars. A key take-away is that the ease of access to information does not necessarily translate into greater efficacy. Rather, the diffuse nature of the medium—usually seen as a positive in terms of democratizing information—is itself its weakness. A plethora of sources, many of questionable reliability and with no authoritative source among them, can become a barrier for time constrained policymakers who have little interest in “cutting-edge tools and rarified theory.”12 Senior policymakers instead prefer researchers and analysts act as informal advisers and creators of knowledge.

Senior policymakers neither trust nor have time for big data and open data. At the highest levels, most senior policymakers act precisely like incumbent industry leaders or the large risk-averse, change-resistant clients in Christensen’s disruptive innovation model. They are highly embedded in the classical paradigm typified by clean, composite, largely linear, and mostly backward-looking indicators, underpinned by methodologically rigorous, often expensive heuristic frameworks that remain firmly within the closed-loop single source of truth paradigm. These clients are serviced by well-heeled advisors steeped in high quality (i.e., expensively vetted) information and data flows. Think of examples ranging from official unemployment figures to inflation to, worse, composite indexes of poverty, and inherently fuzzy concepts like “governance.”13



This paradigm is already being decisively disrupted by the rapid mainstreaming of big and open data, as outlined in the next section.

Much of the data we rely on in international affairs and international development research and analysis is fraught with serious problems and is so slow that it is almost a historical caricature by the time it is published, barely descriptive about the present, let alone insightful about the future. Some of the most important data that we take for granted as “real-time” is not only published with significant lags, but is subject to significant revision. For example, gross domestic product (GDP) is, at best, a quarterly series published with a two-month lag, and revised over the next four years.14 This, along with the vastly asymmetric influence of a handful of influential senior advisors privy to the closed inner circle of senior decision makers, makes international affairs fertile ground for disruptive innovation.

The “MS Excel error heard around the world” provides a glimpse into what disruption in international affairs research in the age of big data looks like.15 The most telling aspect of the now infamous Reinhart-Rogoff spreadsheet error, at least from the perspective of research in the age of big data, was not the furor it created by questioning whether high public debt (more than 90 percent of GDP) really has an unusually large effect on growth prospects. The issue was that the researchers were conducting the analysis manually in Excel instead of using reproducible code. Their specific data was not initially “open,” but was made so when a graduate student requested it to replicate the results—and obviously could not. The story was literally heard around the world, thanks not only to the speed of the spread of information but also to the high profile of the “experts” involved in a highly polarized debate. A series of small, incremental, and—when viewed in isolation—unexpected trends at the base of the pyramid were able to disrupt the asymmetric influence wielded by high profile individual incumbents, even in an unlikely domain such as international affairs research and analysis.
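
To make the reproducibility point concrete, the sketch below shows what an auditable version of this kind of debt-and-growth calculation might look like in Python. The observations, column names, and debt brackets are all invented; the point is only that every step, from bracket definition to averaging, is explicit and re-runnable rather than buried in manual spreadsheet operations.

```python
import pandas as pd

# Invented country-year observations of real GDP growth (%) and public
# debt as a share of GDP (%); stand-ins for the Reinhart-Rogoff dataset.
df = pd.DataFrame({
    "growth":   [3.1, 2.4, 1.9, 0.8, 2.7, -0.5, 1.2, 3.5],
    "debt_gdp": [25, 48, 71, 95, 33, 102, 88, 15],
})

# Assign each observation to a debt bracket, including the contested >90% bin.
brackets = pd.cut(df["debt_gdp"],
                  bins=[0, 30, 60, 90, float("inf")],
                  labels=["0-30%", "30-60%", "60-90%", ">90%"])

# Every observation is included and equally weighted; no manual cell
# selection is possible, and anyone can re-run and audit the result.
print(df.groupby(brackets, observed=False)["growth"]
        .agg(["mean", "median", "count"]))
```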

How Big Data and Open Data are Disrupting International Affairs: The Key Element is Time and Speed

In this section we discuss three highly summarized case studies on how big and open data are disrupting research, analysis, and practice in three very different data domains within international affairs: macroeconomic analysis, humanitarian crisis response, and the measurement of poverty and socioeconomic indicators. The common thread in each case is the focus on the impact of big and open data on time and speed.

Real-time economic analysis

Current or near real-time economic analysis is a highly data-dependent enterprise. It is highly conservative, in that it is dominated by central banks, ministries of finance, and large private financial institutions. The more timely, accurate, and relevant the data, the better the current assessment and the more valuable it is from a policy perspective. Big data is already disrupting how we collect, compute, and project basic real-time macroeconomic indicators, ranging from GDP and inflation to financial, housing, and labor market indicators.

Recently, many central banks, including the Bank of England, the European Central Bank, the Bank of Japan, and the Bank of Canada, have looked into the possibility of leveraging big data to enhance the timeliness of current economic analysis.16 An interesting innovation in Canada is the use of big data to fill the gaps in the timeliness of official GDP statistics by developing a new short-term GDP indicator that provides daily updates of real GDP growth forecasts. Existing monthly data is combined with big data to predict GDP growth before official national accounts data are released for a given quarter, thus bridging the gap period.17 The example also demonstrates how big data traverses the “official” and “unofficial” domains.
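
A minimal sketch of the general bridging idea, not the Bank of Canada’s actual model, might look as follows: quarterly GDP growth is regressed on within-quarter averages of a higher-frequency signal, and the fitted relationship is applied to the partial quarter under way. All series here are simulated for illustration only.

```python
import numpy as np

# Simulated series: quarterly GDP growth (official, lagged) and a daily
# big-data signal (e.g., scraped prices or search activity), both invented.
rng = np.random.default_rng(0)
gdp_growth = rng.normal(0.5, 0.3, 20)                  # 20 past quarters
daily_signal = np.repeat(gdp_growth, 90) + rng.normal(0, 0.2, 20 * 90)

# Bridge step: average the daily signal within each quarter...
quarterly_signal = daily_signal.reshape(20, 90).mean(axis=1)

# ...and fit a simple linear map from signal to growth on past quarters.
slope, intercept = np.polyfit(quarterly_signal, gdp_growth, 1)

# Nowcast: as new days of the current quarter arrive, update the estimate
# long before the official national accounts are released.
days_so_far = rng.normal(0.6, 0.2, 35)                 # 35 days into the quarter
print("nowcast:", intercept + slope * days_so_far.mean())
```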

In the case of Japan, the Abe government needed immediate information on time-sensitive policy changes, such as a major increase in the sales tax. What analysts found was that, under the existing system, there was no way to assess the situation until the household survey or sales data was released and analyzed months later—an eternity in terms of real-time economic analysis. In response, the government proposed the development of a new composite index that would use big data, including online searches and point-of-sale records, to shed immediate light on the impact of policies, albeit not without significant methodological challenges.18

Similarly, the Billion Prices Project (BPP) at the Massachusetts Institute of Technology (MIT) demonstrates how big data can be leveraged to provide a real-time gauge of inflation. BPP uses web-scrapers (a relatively simple approach, but one that is highly extensible and adaptable to several uses) to scour websites of online retailers for real-time prices on an enormous range of products. After the collapse of Lehman Brothers in 2008, BPP data showed how businesses started cutting prices immediately. In contrast, official inflation figures did not show deflationary pressures until November.19 Given the importance of inflation and timely assessment of inflationary expectations from the perspective of monetary policy response, this information represents a significant improvement in response time.20

To assume that these innovations are limited to advanced economies would be a mistake. The UN Global Pulse initiative has partnered with BPP and PriceStats to apply the same web-scraping approach in six Latin American countries, specifically to monitor the price of bread and calculate a new eBread Index.21 Nascent results from the project show the approach can be extended to developing country contexts and that, in general, the eBread Index is highly correlated with the official consumer price index for the food basket in these countries. However, unlike official inflation data, which is available monthly, the eBread Index is available daily. This again is a major improvement in country contexts where inflation and inflationary expectations can change rapidly.
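
A toy version of the web-scraping approach is sketched below. The page markup is invented, and a production scraper such as BPP’s fetches pages from many retailers covering thousands of products; the sketch only illustrates how one daily observed price series can be accumulated for comparison against a monthly official index.

```python
import datetime
from bs4 import BeautifulSoup

# In production this HTML would be fetched daily (e.g., with requests) from
# retailer sites; a hardcoded snippet with invented markup stands in here.
html = """
<div class="product"><span class="name">White loaf</span>
  <span class="price">$2.49</span></div>
<div class="product"><span class="name">Whole wheat</span>
  <span class="price">$3.19</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
prices = [float(tag.text.lstrip("$")) for tag in soup.select("span.price")]

# One observation per day accumulates into a daily price series that can be
# compared against the monthly official CPI, as with the eBread Index.
today = datetime.date.today().isoformat()
daily_mean = sum(prices) / len(prices)
with open("bread_prices.csv", "a") as f:
    f.write(f"{today},{daily_mean:.2f}\n")
print(today, daily_mean)
```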

Big data has also been successfully leveraged for a range of other macro indicators. For instance, online search data from Google has been successfully used to predict initial claims for unemployment benefits, consumer sentiment indexes in the United States and United Kingdom, and even car sales down to specific brands.22 These trends show that companies like Google, Facebook, and Twitter23 are as important to the future data flow that will fuel policy-relevant international affairs research as any national official statistical agency.
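
In spirit, the search-data studies compare a simple autoregressive baseline against the same model augmented with a search-volume index, asking whether the search term improves the fit. The sketch below reproduces that comparison on invented series; it illustrates the modeling logic, not the published specification.

```python
import numpy as np

# Invented series standing in for weekly initial unemployment claims and a
# normalized search-volume index for queries like "unemployment benefits".
rng = np.random.default_rng(1)
claims = 300 + np.cumsum(rng.normal(0, 5, 104))        # two years of weeks
search_index = claims / claims.max() * 100 + rng.normal(0, 2, 104)

# Baseline: claims this week explained by claims last week (AR(1));
# the augmented model adds the contemporaneous search index as a regressor.
y = claims[1:]
X_base = np.column_stack([np.ones(103), claims[:-1]])
X_aug = np.column_stack([X_base, search_index[1:]])

for name, X in [("AR(1)", X_base), ("AR(1) + search", X_aug)]:
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r2 = 1 - ((y - X @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    print(f"{name}: in-sample R^2 = {r2:.3f}")
```

The practical appeal in the studies cited above is timing: the search index is observable almost immediately, while the official series arrives with a lag.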

The implication is that these companies may be far more important than multilateral data clearing houses such as the World Bank, OECD, or UN bodies, on whose highly questionable traditional data—in terms of quality, coverage, granularity, and timeliness—much of the current research and analysis in international affairs and development depends. While academics have often pontificated about new and alternative measures of progress, such as the “happiness index,” lesser-known firms like Jana are experimenting with SMS-based surveys on a global scale that are able to deliver a real-time snapshot of societal well-being.24

Humanitarian crises and disaster relief 2.0

The tragic earthquake off the coast of Haiti’s capital in January 2010 marked a watershed moment for the impact of big data and open data on disaster relief. The earthquake “created a chasm between what the international humanitarian community knew about Haiti prior to the quake and the reality it faced in the immediate aftermath.”25 The response in Haiti demonstrated an important change in how the huge information gap between damage assessment and response planning was filled. For the first time, two new data inflows were added to the typical crisis response data: one from volunteer and technical communities around the world (principally open source mapping communities like OpenStreetMap, Sahana, CrisisMappers, and Ushahidi), and one directly from the affected community of Haitians.26

The experience in Haiti showed that the international humanitarian community was not equipped to handle these new information channels, in terms of both speed and complexity.27 The volunteer technical communities approached the problems in ways that fundamentally challenged the status quo, in which large humanitarian agencies lead the recovery efforts while smaller groups follow.

Criticism of the Haiti experience revolves around the overflowing information pipeline.28 Yet, this is a far better problem than the opposite situation. The ability to learn and rapidly apply lessons in future crises, as discussed below, demonstrates the benefit of having “too much” information. Before focusing on the lessons, it is important to emphasize that the Haitian response proved the rise of big data and open data is not simply about data or technical sophistication. One of the most useful roles played by volunteers was language translation of a huge volume of SMS and other messaging through social media channels. The disruptive innovation was that a highly networked and highly technical, yet contextually aware, virtual community emerged organically. Arguably, the creation of such a community may not have been possible, no matter how many pilot projects were funded by well-meaning donor agencies.29 One reason is that the problem-solving, transparency-driven, open source mindset that underpins much of the virtual community is not always shared by big bureaucracies and senior policymakers.30



Lessons from the Haitian earthquake have been applied in other contexts. User-generated crisis maps have saved lives in subsequent disasters.31 Volunteers involved in the Haiti mapping project have supported other crowd-sourced mapping initiatives, including projects that emerged in the wake of the earthquake in Chile, floods in Pakistan, the crisis in Libya, the earthquake and tsunami in Japan, and the typhoon in the Philippines. With each experience, the work has gotten better as lessons are rapidly shared within a like-minded, highly motivated, and well organized community. The process of interlinking real-time, geospatial crisis data with other relevant data feeds, such as traditional media, has grown exponentially in the past few years. The time taken between crisis impact and information generation has shrunk dramatically compared to historical response times. In the case of Japan, within two hours after the earthquake and tsunami, real-time witness reports were being mapped and shared. In a context where seconds and minutes can determine the difference between life and death, the rise of big and open data and their associated communities has disrupted how society plans humanitarian responses, ensuring such tools will be leveraged in future crises.

The poverty of poverty measures

At the other end of the velocity spectrum are data on typically slow-moving measures like poverty. Not only are poverty trends relatively slow moving, at least in comparison to the examples discussed above, but the reporting lags are enormous. The significant lag time of the data bears repeating: when the World Bank announced in 2012 that 22 percent of the world’s population lived on less than $1.25 a day—and, consequently, that the first Millennium Development Goal had been achieved—the underlying data was already four years old, dating from 2008.32

The data is the poorest where it matters the most. Recent analysis of the state of widely used economic indicators, such as GDP in sub-Saharan Africa, raises serious issues. While international databases like the World Bank report time-series data for many countries, the countries themselves were found to have not published their own data for many of the years covered. Many countries in the region have updated, or are in the process of updating, their national income accounts methodology, making it more consistent with what most countries use. In so doing, many are finding a very different picture than they had been led to believe.



For instance, Ghana’s 2010 revision showed that GDP was 60 percent higher than expected, instantly catapulting a low-income country to middle-income status. Research comparing GDP data from country sources with GDP data from the World Bank is alarming. GDP estimates according to national sources in some countries, like Burundi (2007), were found to be 32 percent higher than those reported by the World Bank. However, in other cases the reverse was true, and for Guinea-Bissau in 2006, the World Bank’s estimate was 43 percent higher than that of the national authority.33

It is important to understand that the problems underpinning these data challenges are not merely an issue of technical capacity, competence, or cost of collection. A far greater problem is perceived or actual interference, whether from political authorities, donors, or other actors. These issues have been aptly termed “the political economy of bad data,” which neatly describes the situation in many developing countries.34 Huge incentives to misreport plague administrative data systems on many levels. For example, when Kenya decided to abolish fees in primary school, this radically changed the incentives for reporting by school administrators, as schools are allocated more teachers and funding if they attract more students. While administrative data from the Ministry of Education shows a steady increase in primary school enrollment rates, demographic survey and national statistical data fails to confirm the trend and instead indicates enrollment rates have been flat over the same time period.35

These findings, while extremely troubling, are made worse by added issues that complicate incentives. For instance, a fast growing trend among donors is cash-on-delivery or performance-based aid, a trend based on the idea of paying for results instead of paying for inputs. Whatever one may think about this conceptually as an aid modality, the fact is that these approaches greatly increase the data burden. In this approach donors pay for development results or outcomes, such as increased educational enrollment and improved performance. For performance-based measures to work, organizations need better, more timely, and more granular data. The more ingenuity society can throw at the problem, the better.

How are big data and open data disrupting this landscape? Given the context described above, tapping into passively generated and proxy data, if only to triangulate results or provide baseline referential information, could be a welcome innovation. Big data approaches have thrown up three interesting possibilities. The first is analysis of anonymized call detail records (CDRs). A recent project in Cote d’Ivoire, using five million anonymized CDRs from Orange telecommunications customers collected over a five-month period, analyzed both the level and location of activity. The analysis indicated that a wider range of calls and longer durations were good proxies for wealth. Using this data, researchers were able to create a granular geospatial estimate of poverty in Cote d’Ivoire—the first countrywide estimate of its kind since the late 1990s, as political strife and economic turmoil have hampered traditional survey methods in the intervening years.36
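
A stylized sketch of the aggregation step behind such CDR work appears below, with a handful of invented call records standing in for the five million used in the study. Per-subscriber activity features are computed and then averaged by each subscriber’s most frequent cell tower to form a geospatial proxy layer.

```python
import pandas as pd

# Hypothetical anonymized CDR extract: one row per call, with the caller's
# hashed id, the serving cell tower, and the call duration in seconds.
cdrs = pd.DataFrame({
    "caller":   ["a1", "a1", "b2", "b2", "b2", "c3"],
    "tower":    ["T1", "T1", "T2", "T2", "T1", "T2"],
    "duration": [60, 180, 30, 45, 20, 600],
})

# Per-subscriber activity features: call count, total talk time, and the
# most frequent ("home") tower as a rough location anchor.
per_user = cdrs.groupby("caller").agg(
    calls=("duration", "size"),
    total_seconds=("duration", "sum"),
    home_tower=("tower", lambda s: s.mode().iloc[0]),
)

# Per-tower averages become the proxy layer: the study cited above found
# that wider calling activity and longer durations proxied for wealth.
proxy = per_user.groupby("home_tower")[["calls", "total_seconds"]].mean()
print(proxy)
```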

Another interesting innovation in small-scale poverty measurement and prediction is an approach using night light illumination. This approach rests on the assumption that poorer places are quite literally in the dark. Using geospatial, night light, and census data for Bangladesh in 2001 and 2005, researchers showed that a regression model combining the data was able to predict poverty at a granular level. The cost-effective and non-intrusive nature of this approach makes it a useful source of proxy poverty data and compensates for its potentially lower accuracy.37 The concept is also being extended to other geographic regions, including Africa.
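
The modeling step can be illustrated with a simple linear regression of the kind the paragraph describes, fit here on invented luminosity and census figures; the actual study’s specification and data are considerably richer than this sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented training data: mean night-light luminosity and census population
# density per grid cell, with a known poverty headcount rate from the census.
rng = np.random.default_rng(2)
luminosity = rng.uniform(0, 63, 200)          # DMSP-style 0-63 digital numbers
pop_density = rng.uniform(10, 5000, 200)
poverty_rate = np.clip(0.6 - 0.007 * luminosity
                       + rng.normal(0, 0.05, 200), 0, 1)

X = np.column_stack([luminosity, np.log(pop_density)])
model = LinearRegression().fit(X, poverty_rate)

# Prediction step: for cells (or years) without census coverage, satellite
# imagery alone yields a cheap, non-intrusive poverty estimate.
new_cells = np.array([[5.0, np.log(800)], [45.0, np.log(800)]])
print(model.predict(new_cells))  # the darker cell should score as poorer
```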

A third avenue is high-frequency micro-surveys conducted using mobile phones and other platforms. The World Bank’s Listening to Latin America (L2LAC) project was launched out of frustration among policymakers looking for information on the impact of the 2008 economic crisis in Latin America. Typically this sort of analysis depends on household survey data collected and reported over years—and at a high cost. The L2LAC pilot covered nationally representative samples in Peru and Honduras and demonstrated that, by using mobile platforms, small versions of wider household surveys can be conducted on a monthly basis and at a fraction of the cost. This provides much closer to “real-time” insights into poverty, employment, inequality, and other trends essential for effective responses to fast moving crises. L2LAC also provides a useful gauge of poverty dynamics and trends between official reporting periods, which can be years apart.38 The model has since been extended to pilot projects in Africa.39

Anonymized CDR analysis, proxy light source data, and mobile phone based micro-surveys are big data innovations that are disrupting how we measure and respond to poverty at various levels. Aspects of each approach have the potential to be “mainstreamed,” which would have been unthinkable just a few years ago.

Making the Most of Disruption Requires a New Kind of Policy Entrepreneur

Simon Maxwell, formerly of the Overseas Development Institute, popularized the term “policy entrepreneur” when he proposed a simple self-assessment questionnaire that categorized analysts into four types: story-tellers, who are steeped in powerful grand narratives that often inform policy; networkers, who rely mostly on their connections with policymakers; engineers, who are grounded in testing ideas expected to have policy import; and fixers, who wield expert power around specific problems.40

In an age where big data and open data are fast becoming part of the mainstream in several domains of international affairs, we need to add a new, fifth type of policy entrepreneur: the disrupter. The disrupter is the policy entrepreneur who focuses on making the most of the disruptive innovation opportunities ushered in by big data and open data.


A set of key trends underpin our discussion and make international affairs ripe for disruptive innovation. They are worth repeating. The majority of analyses in international affairs rely on clean, composite, largely linear, and mostly backward-looking indicators that remain firmly within the closed-loop single source of truth paradigm. This, combined with the asymmetric influence of a handful of influential senior advisors privy to the inner circle of senior policymakers, makes international affairs fertile ground for disruptive innovation fueled by the rapid mainstreaming of big data tools.

A key trend underlying the examples discussed in this paper is the dramatically abbreviated learning, development, and deployment curves associated with these innovations, and their shrinking marginal costs. A result of these improvements is that sophisticated analytical capacity is becoming more available to an ever wider range of users and analyzers, with a much wider set of applications. The scope of who—whether individual or institution—can be a data generator, aggregator, analyzer, and synthesizer is expanding rapidly.

The rise of big data and open data has shrunk the time lag between the start of a trend, when responders have access to essential information needed to respond to the trend, and the feedback loop generated by those who are affected by the response.41 Traces of emerging trends show up faster in machine-level exchanges across online data platforms—or in the data stream generated by the nearly 35,000 Facebook likes that brands and organizations receive every minute—than they do in official statistics.42 As the level of digital activity grows, as social networks become more ossified, and as literacy and awareness around tools increase across the developed and the developing worlds, traditional barriers to adoption of new technologies will fall rapidly. These trends create new analytical and engagement opportunities.

The rise of big data is disrupting clean, composite, backward-looking indicators, and the result is often messy, probabilistic, but real-time and forward-looking dashboards. We are just at the beginning of a shift in the nature of policy analysis in international affairs, from in-depth research to on-the-fly analytics. What do these trends mean for a new generation of policy entrepreneurs? How can the new policy entrepreneur make the most of the disruptive innovation potential fueled by big data and open data?



The successful policy entrepreneur will first and foremost invest in staying on top of these rapidly moving trends and their associated tools and technologies. The landscape is evolving fast. Take, for instance, crowd-sourced mapping, which only emerged at scale less than five years ago and is already a mainstream tool in crisis response. To make the most of the disruptive potential, the successful policy entrepreneur will work to break down perceived dichotomies and distinctions, and get comfortable working in polarized terrains. Consider the distinction between “public” and “private” sources of information. As we have seen, Google and Twitter can be as important a data and analytical resource as the World Bank or any national statistical agency. Policy entrepreneurs that succeed in blurring and breaking these dichotomies will be best placed to capitalize on the disruptive potential.

Harnessing disparate sources requires investing in tools that help drive interoperability. A lot of potentially disruptive data exists across disparate levels and domains—for example, at the national, local, regional, or even neighbourhood levels. Potentially disruptive data also exists in both actual and virtual communities and networks, as well as across government departments and agencies. Furthermore, disruptive data can be found across different data types that may not intuitively work well together—for instance, well-structured data (such as numerical relational tables) and unstructured data (such as Twitter hashtags or search data with multiple and complex interrelationships). There are currently over 300 known open data sites hosted at various levels of government across the world.43 As a starting point, there remains significant unexploited potential to better link and leverage this fast growing public sector open data, not only to increase efficiency, but to drive entirely new business and service-delivery models.44
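
As a small illustration of what such interoperability tooling does in practice, the sketch below joins a structured table of regional indicators with counts extracted from unstructured, hashtagged messages. All data and field names are invented; the point is the join between two very different data types.

```python
import re
from collections import Counter

import pandas as pd

# Structured side: a small relational table of regional indicators (invented).
indicators = pd.DataFrame({
    "region": ["north", "south"],
    "food_price_index": [104.2, 117.8],
})

# Unstructured side: geotagged short messages; hashtags must first be
# extracted and counted before they can join the relational world.
messages = [
    ("north", "market open again #prices"),
    ("south", "bread cost doubled #prices #shortage"),
    ("south", "queues everywhere #shortage"),
]
counts = Counter((region, tag)
                 for region, text in messages
                 for tag in re.findall(r"#(\w+)", text))

tags = (pd.Series(counts).rename_axis(["region", "tag"])
          .reset_index(name="mentions"))

# The join is the interoperability step: one frame, two very different sources.
merged = tags.merge(indicators, on="region")
print(merged)
```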

In addition to tools that stimulate interoperability across disparate sources, types, and levels, successful policy entrepreneurs will distinguish themselves by having a keen eye for which combinations of tools and information sources work best for which policy questions. The greatest potential impact may be in areas where experimental alternatives, like triangulating multiple big or open data sources, either significantly improve or fundamentally alter the view presented by standard single-source closed-loop indicators.

With the expanded scope of who can be a data generator, aggregator, or analyzer come inherent problems. These challenges will not only be a major preoccupation of the new policy entrepreneur but will also represent a significant opportunity. A key issue is validation of new and innovative approaches. Here, successful policy entrepreneurs will make the most of their “analog” skillsets and deep contextual awareness. Big data, no matter how powerful, can easily go astray without domain expertise. Maintaining contextual awareness requires a great deal of offline investment. The importance of building agile teams and maintaining loose but motivated networks with diverse backgrounds and skillsets cannot be overemphasized. The new breed of policy entrepreneur, while versed in data science, cannot be restricted to data. Big data policy entrepreneurs will need to be obsessed with developing innovative validation and verification solutions, with the aim of disrupting the incumbent closed-loop single source of truth paradigm, still emblematic of most international affairs research and policy analysis.



Notes

1 Gartner, Inc. Research Methodologies, “Hype Cycle Research Methodology,” Gartner, Inc., http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp.
2 Kate Crawford, “The Hidden Biases in Big Data,” HBR Blog Network, posted April 1, 2013, http://blogs.hbr.org/2013/04/the-hidden-biases-in-big-data/.
3 IBM, “The Four V’s of Big Data,” IBM, Inc., http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-Vs-of-big-data.jpg. As recently as 2000, less than a quarter of the world’s stored information was digital. Now, around 98 percent is digitally stored. There are several similar statistics. A good reference for the scale of the rise of big data is WikiBon, found at http://wikibon.org/blog/big-data-statistics/.
4 There are several big data evangelists and enthusiasts; see in particular: Kenneth Cukier and Viktor Mayer-Schoenberger, “How It’s Changing the Way We Think About the World,” Foreign Affairs 92, no. 3 (2013): 27-40; World Economic Forum 2012 Report, “Big Data, Big Impact: New Possibilities for International Development,” World Economic Forum, http://www.weforum.org/reports/big-data-big-impact-new-possibilities-international-development; Harvard Business Review October 2012, “Getting Control of Big Data,” Harvard Business Review; Aniket Bhushan, “Big Data, Democratized Analytics and Deep Context Will Change How We Think and Do Development,” The North-South Institute, http://www.nsi-ins.ca/wp-content/uploads/2013/04/2012-Big-Data-Democratized-Analytics-and-Deep-Context-will-Change-How-We-Think-and-Do-Development.pdf.
5 Kate Crawford, “The Hidden Biases in Big Data,” HBR Blog Network, posted April 1, 2013, http://blogs.hbr.org/2013/04/the-hidden-biases-in-big-data/.
6 Marc Bellemare, “Big Dumb Data,” Marc F. Bellemare Blog, posted May 16, 2013, http://marcfbellemare.com/wordpress/2013/05/big-dumb-data/. The point is interesting, but questionable. Much of the artificial intelligence and machine learning at the core of big data is in fact underpinned by sophisticated and causal, if only iterative, research designs and models. In fact, there is no reason one couldn’t apply more or bigger data to various social science research designs. The greatest potential from a methodological and academic perspective may be found by overcoming some of these gaps.
7 Konrad Yakabuski, “Big Data should inspire humility, not hype,” The Globe and Mail, March 4, 2013, http://www.theglobeandmail.com/globe-debate/big-data-should-inspire-humility-not-hype/article9234569/.
8 Bernholz describes how big data hype has captured the imagination of the commercial and to some extent the public policy domains, but virtually ignores civil society, a sector that generates a great deal of significantly untapped data. See: Lucy Bernholz, “Civil Society and Big Data,” Philanthropy 2173 Blog, posted February 24, 2014, http://philanthropy.blogspot.ca/2014/02/civil-society-and-big-data.html?m=1.
9 Michael Schrage, “Big Data’s Dangerous New Era of Discrimination,” HBR Blog Network, posted January 29, 2014, http://blogs.hbr.org/2014/01/big-datas-dangerous-new-era-of-discrimination/.
10 Clayton Christensen, “Disruptive Innovation,” Clayton Christensen’s personal website, http://www.claytonchristensen.com/key-concepts/.
11 Paul Avey and Michael Desch, “What Do Policymakers Want From Us? Results of a Survey of Current and Former Senior National Security Decision-makers,” International Studies Quarterly 58, no. 4 (2014). Forthcoming.
12 Paul Avey and Michael Desch, “What Do Policymakers Want From Us? Results of a Survey of Current and Former Senior National Security Decision-makers,” International Studies Quarterly 58, no. 4 (2014): 34. Forthcoming. (Page number may differ in final edition.)
13 Several examples come to mind, but think of highly composite measures like the Human Development Index (HDI) or the World Governance Indicators (WGI) that are popular in international affairs research and rely on such untimely data that they are more a historical caricature than an insightful tool for understanding coming trends.
14 Similarly, inflation data is published monthly, but with a three week lag after the reporting month. See: Nii Ayi Armah, “Big Data Analysis: The Next Frontier,” Bank of Canada Review, Summer 2013, http://www.bankofcanada.ca/wp-content/uploads/2013/08/boc-review-summer13-armah.pdf.
15 The Rachel Maddow Show, “The Excel Error Heard Around the World,” MSNBC, http://www.msnbc.com/rachel-maddow-show/the-excel-error-heard-round-the-world. See also: Thomas Herndon, Michael Ash and Robert Pollin, “Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff,” (working paper no. 322, PERI University of Massachusetts, Amherst).
16 See: Mitsuru Obe, “Japan Looks to Big Data for Timely Economic Indicator,” The Wall Street Journal, September 24, 2013; L. Einav and J. D. Levin, “The Data Revolution and Economic Analysis,” (working paper no. 19035, National Bureau of Economic Research); A. Binette and J. Chang, “CSI: A Model for Tracking Short-Term Growth in Canadian Real GDP,” Bank of Canada Review, Summer 2013, http://www.bankofcanada.ca/wp-content/uploads/2013/08/boc-review-summer13-binette.pdf; Nii Ayi Armah, “Big Data Analysis: The Next Frontier,” Bank of Canada Review, Summer 2013, http://www.bankofcanada.ca/wp-content/uploads/2013/08/boc-review-summer13-armah.pdf.
17 A. Binette and J. Chang, “CSI: A Model for Tracking Short-Term Growth in Canadian Real GDP,” Bank of Canada Review, Summer 2013, http://www.bankofcanada.ca/wp-content/uploads/2013/08/boc-review-summer13-binette.pdf; Nii Ayi Armah, “Big Data Analysis: The Next Frontier,” Bank of Canada Review, Summer 2013, http://www.bankofcanada.ca/wp-content/uploads/2013/08/boc-review-summer13-armah.pdf.
18 Mitsuru Obe, “Japan Looks to Big Data for Timely Economic Indicator,” The Wall Street Journal, September 24, 2013.
19 Nii Ayi Armah, “Big Data Analysis: The Next Frontier,” Bank of Canada Review, Summer 2013, http://www.bankofcanada.ca/wp-content/uploads/2013/08/boc-review-summer13-armah.pdf.
20 A counterargument often made is that focusing on frequency tends to overestimate both inflation and the variability of inflation. This is particularly the case in other measures such as the Everyday Price Index, which measures inflation from the perspective of goods that people purchase frequently. However, these critiques can be easily countered by statistically correcting for additional variability inherent in higher frequency data. For more information, see: Matthew Yglesias, “Here’s a Deliberately Inaccurate Inflation Index,” April 9, 2013, http://www.slate.com/blogs/moneybox/2013/04/09/aier_s_everyday_inflation_index_is_terrible.html.
21 UN Global Pulse, PriceStats, and the Billion Prices Project, “Daily Tracking of Commodity Prices: the eBread Index,” http://www.unglobalpulse.org/projects/comparing-global-prices-local-products-real-time-e-pricing-bread.
22 Hyunyoung Choi and Hal Varian, “Predicting Initial Claims for Unemployment Benefits,” Google Inc., http://static.googleusercontent.com/media/research.google.com/en/us/archive/papers/initialclaimsUS.pdf; Hyunyoung Choi and Hal Varian, “Predicting the Present with Google Trends,” Google Inc., http://people.ischool.berkeley.edu/~hal/Papers/2011/ptp.pdf; Paul Cheung, “Big Data, Official Statistics and Social Science Research: Emerging Data Challenges,” (presentation at the World Bank, December 12, 2012).
23 Mining Twitter data on food related conversations has also been shown to be strongly correlated with food price inflation. See: World Economic Forum 2012 Report, “Big Data, Big Impact: New Possibilities for International Development,” World Economic Forum, http://www.weforum.org/reports/big-data-big-impact-new-possibilities-international-development.
24 JANA and UN Global Pulse, “Global Snapshot of Well Being - Mobile Survey,” United Nations, http://www.unglobalpulse.org/sites/default/files/Mobile%20Data%20for%20Development%20Primer_Oct2013.pdf.
25 UNOCHA et al., “Disaster Relief 2.0: The Future of Information Sharing in Humanitarian Emergencies,” United Nations Foundation, http://issuu.com/unfoundation/docs/disaster_relief20_report.

Page 106: Sais.34.1

26 UNOCHA et al., “Disaster Relief 2.0: The Future of Information Sharing in Humanitarian Emergencies,” United Nations Foundation, http://issuu.com/unfoundation/docs/disaster_relief20_report; See also: Kim Rose, “The humanitarian power of big data,” Hortonworks Blog, posted May 6, 2013, http://hortonworks.com/big-data-insights/the-humanitarian-power-of-big-data/; Maja Bott, Björn-Sören Gigler, and Gregor Young, “The Role of Crowdsourcing for Better Governance in Fragile State Contexts,” International Bank for Reconstruction and Development/World Bank, https://wbi.worldbank.org/wbi/Data/wbi/wbicms/files/drupal-acquia/wbi/crowdsourcing_final_0.pdf; Maja Bott and Gregor Young, “The Role of Crowdsourcing for Better Governance in International Development,” The Fletcher Journal of Human Security 27 (2012): 47-70.
27 During the first week, volunteers mapped some 1,600 reports from affected Haitians based on information from Twitter, Facebook, and online news. Over 30,000 SMS messages from affected Haitians were sent through the Ushahidi-led Project 4636 in the first month alone.
28 Maja Bott, Björn-Sören Gigler, and Gregor Young, “The Role of Crowdsourcing for Better Governance in Fragile State Contexts,” International Bank for Reconstruction and Development/World Bank, https://wbi.worldbank.org/wbi/Data/wbi/wbicms/files/drupal-acquia/wbi/crowdsourcing_final_0.pdf.
29 It is worth noting that once innovations like open source crowd mapping prove their utility, donors and others see value in supporting and facilitating them from the “inside.” A great example of this is USAID’s recent MapGive initiative, which encourages volunteers to learn and get involved in crowd mapping using the OpenStreetMap platform: http://mapgive.state.gov/index.html.
30 The Standby Volunteer Task Force for Live Mapping (SBTF) is an online volunteer initiative for crisis mapping that was founded as a consequence of the various loosely connected projects for Haiti’s recovery.
31 Jing Guo, “How User Generated Crisis Maps Save Lives in Disasters,” The World Bank Blog, posted February 26, 2014, http://blogs.worldbank.org/publicsphere/how-user-generated-crisis-maps-save-lives-disasters.
32 Johan Mistiaen, “What will it take to improve poverty data?” The World Bank: MDGs and Beyond 2015 (2012): 2.
33 Morten Jerven, “African Growth Miracle or Statistical Tragedy? Interpreting trends in the data over the past two decades,” (paper presented at UNU-WIDER conference on Inclusive Growth in Africa, September 20-21, 2013). More worryingly, as the paper shows, in some cases base year revisions are nearly three decades apart, implying that we should not be surprised if we see a number of countries go through a statistical inflation in fundamental data like GDP, as in the recent experience in Ghana.
34 Justin Sandefur and Amanda Glassman, “The Political Economy of Bad Data: Evidence from African Survey and Administrative Statistics,” (paper presented at UNU-WIDER conference on Inclusive Growth in Africa, September 20-21, 2013).
35 Justin Sandefur and Amanda Glassman, “The Political Economy of Bad Data: Evidence from African Survey and Administrative Statistics,” (paper presented at UNU-WIDER conference on Inclusive Growth in Africa, September 20-21, 2013). We should also note that analyses such as these are made far easier by the spread of open data including administrative-level data. Kenya is viewed as a leader in this regard among developing countries.
36 Christopher Smith, Afra Mashhadi, and Licia Capra, “Ubiquitous Sensing for Mapping Poverty in Developing Countries,” (paper submitted to the D4D session of the NetMob 2013 conference). UN Global Pulse, “Mobile Phone Network Data for Development,” October 2013, http://www.unglobalpulse.org/sites/default/files/Mobile%20Data%20for%20Develop-ment%20Primer_Oct2013.pdf. 37 Prasanna Lal Das, “Scenes from the DC big data dive- the final report,” The World Bank Blog, posted May 28, 2013, http://blogs.worldbank.org/opendata/scenes-dc-big-data-dive-final-report.38 A. Ballivian and J. Azevedo, “Listening to LAC: Using Mobile Phones for High Frequency Data Collection,” The World Bank.39 An example at the metropolis level is the Listening to Dar project in Dar es Salaam, Tanzania.

Page 107: Sais.34.1

107Making the Most of Disruptive innovation

40 Simon Maxwell, “Policy Influence: Policy entrepreneurs,” Overseas Development Institute Blog, posted January 2009, http://www.odi.org.uk/publications/5896-simon-maxwell-engineer-networker-fixer-storyteller-policy-entrpreneurship.41 World Economic Forum 2012 Report, “Big Data, Big Impact: New Possibilities for Inter-national Development,” World Economic Forum, http://www.weforum.org/reports/big-data-big-impact-new-possibilities-international-development.42 Nii Ayi Armah, “Big Data Analysis: The Next Frontier,” Bank of Canada Review, Summer 2013, http://www.bankofcanada.ca/wp-content/uploads/2013/08/boc-review-summer13-armah.pdf.43 For more see Open Data Sites: http://opendatasites.com/44 A good example is the emergence of public sector open data based companies and busi-ness models. In the United States, real estate sector companies Trulia and Zillow come to mind. There are also innovative business models emerging around searching, indexing, meta-tagging, and visualizing open public data, like Evision.io.

Page 108: Sais.34.1

SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

© 2014 by The Johns Hopkins University Press

Big Data Custodianship in a Global Society

Chris Poulin

With the rise of big data, we are witnessing the unprecedented ability of organizations, public and private, global and local, to collect data that is both broader in scope and more exacting in detail. This new technology paradigm allows for revolutionary advances, improving our daily lives in areas such as healthcare, national security, and consumer products and services. However, with any emerging technological advance, there is potential for abuse. In the case of big data, there are concerns about privacy, and even the potential for behavioral modification. This paper will discuss the process of collecting, analyzing, and applying big data, using examples from international cases and the author’s work at the Durkheim Project.

What is Big Data?

Big data is a recently coined catchphrase in information technology that refers to the rise of data collection at a scale previously infeasible.

Any piece of measurable data that previously might have been discarded can now be stored, and increasingly, everything that can be stored can also be analyzed. Big data initiatives have emerged in a variety of disciplines, including medical research, weather analysis, financial quantitative research, targeted e-commerce, and government intelligence activities.1

Historical Development

The historical foundation of the big data trend was the democratization of supercomputing-related technologies, otherwise known as “high performance computing.” The tools used in supercomputing, which dates to the 1960s, bear a striking similarity to many of the tools used in big data analysis, albeit now with a greater emphasis on functionality and ease of use. For example, many nuclear research programs required complex simulation capabilities that outstripped simple computing resources. Combined with an explosion in the amount of information available to scientists in other fields, such as astronomy, this demand pushed researchers to develop tools that could do more with less. In the 1990s, these software tools found their way to commodity machines (or personal computers), which formed the basis for the generation of tools we now know as big data.

Chris Poulin is the principal partner of Patterns and Predictions, a big data prediction company. He is also director of the Durkheim Project, a non-profit big data collaboration with the U.S. Department of Veterans Affairs and Facebook. He recently served as co-director of the Dartmouth Metalearning Working Group at Dartmouth College, where he worked on large-scale machine learning. Poulin has also lectured on artificial intelligence at the U.S. Naval War College.

Founded in 1998, Google was one of the first large commercial companies to embrace commodity distributed computing. For Google’s founders, a primary goal was to provide better search engines, both in scope and sophistication, while spending fewer resources to process the data. Their initial attempt was a home-brewed set of tools for distributed computing on low cost machines. In 2004, Google launched a series of internal initiatives to develop a reliable big data infrastructure. Their findings were subsequently published in a series of high-profile research papers.2, 3

As Google was one of the greatest success stories in the business, many researchers took interest when these papers were released. One researcher, Doug Cutting, was inspired to write his own implementation of a distributed search engine (called Nutch) and a distributed storage and processing framework (called Hadoop). The influential Hadoop framework, with its open source grid tools, and systems similar in functionality have further driven the rise of big data.
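
To make the paradigm concrete, the following is a minimal, single-process sketch of the MapReduce model popularized by Google’s papers and Hadoop. A real deployment distributes each phase across many machines; this toy version runs the map, shuffle, and reduce phases locally over invented sample text.

```python
# A minimal, single-process sketch of the MapReduce programming model.
# Real systems (Hadoop, Google's infrastructure) distribute each phase
# across many machines; here each phase runs locally to show the data flow.
from collections import defaultdict

def map_phase(document):
    """Emit (word, 1) pairs for every word in a document."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts for one word."""
    return (key, sum(values))

documents = [
    "big data tools emerged from supercomputing",
    "hadoop brought big data tools to commodity machines",
]

intermediate = [pair for doc in documents for pair in map_phase(doc)]
results = [reduce_phase(k, v) for k, v in shuffle_phase(intermediate).items()]
print(sorted(results, key=lambda kv: -kv[1]))
```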

Applications

Put simply, if you collect enough information, you can see things at a level of detail that you could not see before. For example, if given access to enough data about an anonymous individual, you can likely identify that individual. With enough information over time, you can predict events in that individual’s life. And if you observe how an individual reacts to these events, you can predict that individual’s future behavior. Within this information lies tremendous opportunity to help individuals avoid unnecessary hardships and live better lives.
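
A toy illustration of the re-identification claim above: if an “anonymized” dataset retains quasi-identifiers such as ZIP code, birth year, and gender, joining it against a public roster can often recover names. All records and field names here are invented for illustration.

```python
# Invented "anonymized" records that still carry quasi-identifiers.
anonymized_health_records = [
    {"zip": "03755", "birth_year": 1978, "gender": "F", "diagnosis": "asthma"},
    {"zip": "20036", "birth_year": 1985, "gender": "M", "diagnosis": "diabetes"},
]

# An invented public record containing names plus the same quasi-identifiers.
public_voter_roll = [
    {"name": "Jane Doe", "zip": "03755", "birth_year": 1978, "gender": "F"},
    {"name": "John Roe", "zip": "20036", "birth_year": 1985, "gender": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")

def reidentify(records, roll):
    """Link de-named records to names via shared quasi-identifiers."""
    for record in records:
        key = tuple(record[f] for f in QUASI_IDENTIFIERS)
        for person in roll:
            if tuple(person[f] for f in QUASI_IDENTIFIERS) == key:
                yield person["name"], record["diagnosis"]

for name, diagnosis in reidentify(anonymized_health_records, public_voter_roll):
    print(f"{name} -> {diagnosis}")
```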

These opportunities are especially apparent in the field of medicine. With the ability to collect and view more patient data, researchers and physicians are able to better quantify risks to individuals.4 They can view details that previously slipped past clinical practice and can identify potential medical problems before they happen. With the deployment of big data systems in both research and clinical medical practice, the elusive promises of preventative medicine are closer to being realized.

As director of the Durkheim Project,5 a non-profit that generates linguistic-driven models to predict veteran suicides, I explored how big data analysis could be used to save lives. Our team spent over a decade researching the cues in language and the correlations between combinations of words and outcomes. While teaching a machine to understand language is widely considered to be a “hard AI” problem—that is, one requiring an advanced level of artificial intelligence—many have proven that increasingly sophisticated computer systems can predict correlations of language with events. We applied this system to the challenge of estimating suicide risk among U.S. veterans. Assessing suicide risk is a particularly difficult diagnosis for clinical professionals, and one that is highly amenable to big data. Thus, the social utility of this type of application is obvious.6
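
The sketch below is not the Durkheim Project’s actual model; it is a generic, minimal example of how correlations between word use and labeled outcomes can be learned from text, using scikit-learn and a tiny invented corpus, orders of magnitude smaller than anything a real risk model would require.

```python
# A generic sketch of learning correlations between language and outcomes.
# Requires scikit-learn; the tiny labeled corpus is invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "feeling hopeless and alone tonight",
    "cannot see a way forward anymore",
    "great day at the lake with family",
    "looking forward to the game this weekend",
]
labels = [1, 1, 0, 0]  # 1 = elevated-risk language, 0 = baseline (invented labels)

# Bag-of-words features (unigrams and bigrams) feeding a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# predict_proba returns [P(class 0), P(class 1)] for each input text.
print(model.predict_proba(["everything feels hopeless"])[0][1])
```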

However, many entities collect large amounts of highly detailed personal information for less purely altruistic purposes. Much of this information has helped power the growth of the commercial Internet giants. Furthermore, some uses of big data are highly distressing when they go wrong;7 misuse can cause public embarrassment8 or financial hardship for an individual,9 and even pose challenges to government security.10 While the proper scope of national security activities by state-run intelligence agencies is a broader debate, we note that the capability to collect this data on individuals is now widely known to the public.

Researchers, private corporations, and public institutions now have the capability to collect and analyze data on a grand scale. With this newfound and increasingly powerful analysis capability, the question therefore arises: What actions should these actors undertake, and what actions should they refrain from?

What Does a Big Data Project Look Like?

We will discuss six important implementation areas, drawing examples from two hypothetical cases: (1) the highly public use of data during the Arab Spring,11 and (2) a human subjects study using private medical data.12 We will also draw lessons from our work at the Durkheim Project.

Collection

First, data must be collected. In the Arab Spring case, social media messages are pulled from Twitter’s web services. The data consists of “tweets,” which are electronic messages of 140 characters or less. Since the information in this case is published publicly, the users have consented to fairly unrestricted use of their data. In contrast, in the second hypothetical case, medical data generally requires informed consent in the form of an opt-in procedure, usually a formal consent agreement. The reader should note that nearly all other data falls somewhere between these two examples in terms of openness versus restrictedness. In the Durkheim Project, we follow clinical standards, even when not strictly needed, in order to maintain the utmost trust in the data collected by our system.

Storage

In both hypothetical examples, datasets are stored in large databases following collection. The individual or institution storing the data may or may not have a data governance or data privacy policy. In brief, a data governance policy determines what to do with the data, while a data privacy policy determines who can view the data. In the Twitter case, the data is of low sensitivity, since the users and their published tweets are publicly known.


However, in the case of medical data, the information requires more careful handling. For example, proper procedure restricts access to the dataset, and requires that data be stored securely behind an IT firewall. In the Durkheim Project, we first aggregated the data from social media and mobile applications worldwide on secure cloud services, like Amazon Web Services, and then transmitted the data to our medical center partner for final storage.

Processing

To analyze the datasets, we must process the data via some sort of computing cluster. This cluster may be in a secure data center, or it may not. The transmissions between the analyst who is sending the data and the machines that are processing the data may or may not be encrypted. The data processing machines may or may not be themselves secure. While this may not be critical in the first hypothetical case, it is certainly critical to the second. In fact, standards in the field of data processing are only now emerging. The Durkheim Project facilities have a supercomputer (Cray grid) behind a firewall that processes the data, before transmitting the results to another system that can be accessed by medical analysts directly via the Internet.
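
One common mitigation for the transmission risks noted above is to encrypt records before they leave the analyst’s machine. The sketch below uses the Fernet recipe from the third-party cryptography package; key management, which is the genuinely hard part, is omitted here.

```python
# Encrypt records before they leave the analyst's machine, so an insecure
# link or intermediate host never sees plaintext. Uses the Fernet recipe
# from the third-party "cryptography" package. How the key is provisioned,
# held, and rotated is out of scope for this sketch.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in practice, provisioned out of band
cipher = Fernet(key)

record = b'{"subject": "s-1042", "note": "follow-up scheduled"}'
token = cipher.encrypt(record)     # safe to send over an untrusted link

# Only the processing cluster holding the key can recover the record.
assert cipher.decrypt(token) == record
```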

Query

Once an initial data analysis has been conducted, it is necessary to specify who has access to the data. The question now becomes, who has rights to query the system? In both hypothetical cases, this is a concern. In the case of the Arab Spring, what if the U.S. Department of State had witnessed an increasing frequency of terms related to dissent in the weeks before the mass protests? This information would have been valuable to many actors: Western governments interested in anticipating regional upheaval or disaster; NGOs that needed to estimate demand for foreign aid; and even dictators that wanted to gauge the risk to their own regime. In the case of medical risk analysis, the issue of query is strongly related to the privacy of the individual. At the Durkheim Project, our medical protocol dictates that the raw data contained in the medical records of those who are suicide positive is never seen by non-Veterans Affairs staff. However, once the data is “de-identified” and analyzed, our internal team (bound by non-disclosure agreements and HIPAA regulation compliance rules) can access the information.
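
As a rough illustration of the kind of de-identification step described above (not the project’s actual protocol), the sketch below drops direct identifiers and substitutes a keyed pseudonym so that authorized analysts can still link a subject’s records without learning an identity. Real regimes such as HIPAA’s Safe Harbor remove many more fields.

```python
# A minimal de-identification sketch: direct identifiers are dropped, and a
# keyed pseudonym (an HMAC of a record ID) lets authorized analysts link a
# subject's records without learning who the subject is. All data invented.
import hmac
import hashlib

PSEUDONYM_KEY = b"held-by-the-data-custodian-only"  # hypothetical secret
DIRECT_IDENTIFIERS = {"name", "ssn", "phone"}

def deidentify(record):
    """Strip direct identifiers and attach a keyed, linkable pseudonym."""
    pseudonym = hmac.new(PSEUDONYM_KEY, record["ssn"].encode(),
                         hashlib.sha256).hexdigest()[:16]
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    clean["subject_id"] = pseudonym
    return clean

raw = {"name": "Jane Doe", "ssn": "000-00-0000", "phone": "555-0100",
       "note": "reported low mood", "visit": "2014-02-11"}
print(deidentify(raw))
```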

Reporting

Analytics reports are usually a combination of numbers, qualitative visualizations, and written narratives intended to measure tested outcomes. The end consumer might be a scientific journal, government research agency, or a policymaker. This leads to the question, can the consumer of the report trust the data? Does this consumer or decision maker understand how the data was collected? Do they know whether the data was stored and processed securely? How will the dissemination of the analytic report impact the people, places, and things that were analyzed?

In the case of Twitter and the Arab Spring, analytic reports have been published publicly.13 In contrast, in the case of medical data, only de-identified data is disseminated outside of the direct researchers. At the Durkheim Project, we shared our preliminary results with other organizations under non-disclosure agreement, as well as with our funding agency (DARPA). Final publication of results was in a peer-reviewed journal (PLOS ONE), and any further sharing of our data would have been subject to restrictions. Dissemination of final results will vary highly by project, depending on the type of intended analysis.

Action

Future leaders, having reviewed the data from the Arab Spring case, might learn to better allocate resources based on conflicts detected via social media. In the medical case, the proper use of risk metrics can enable triage of clinical resources, and produce recommendations for risk reduction or lifestyle changes. In considering these scenarios, it is important to ask, should the person entrusted with these decisions also be trusted with their analysis? We argue the answer is: only if they know (with great certainty) how the information was collected, stored, processed, queried, and reported. Making use of our own analysis, the Durkheim Project is currently engaged in intervention efforts related to suicide and other mental health risks. Effective clinical interventions must first be based upon validated, peer-reviewed medical research. No responsible clinical decision maker would assume otherwise. Although we have published comprehensive methods, we have only begun the validation of our methodology, and this process is ongoing.

Behavioral Modification

If big data can enable us to predict events, then the actions of large corporations, NGOs, and governments have the potential to effect behavioral change on those populations whose future behavior is predicted. In the cases described above, there are clear advantages to this predictive potential. However, big data analysis can also be applied with malicious intent. For example, consider a scenario in which a big data analyst can craft the rhetoric of thought leaders (perhaps on Twitter) to impact how an event will unfold. Similarly, consider a situation in which an unregulated medical insurance company understands individual risk values, and adjusts its pricing accordingly, thus adversely affecting the unhealthy population. These capabilities, combined with other tools like geo-location from mobile phone data, are incredibly powerful: they can predict where an individual will go and what they will do.


At the Durkheim Project, in our own behavioral modification studies, we hope to enable interventions for individuals—not only before a negative event occurs, but well before. For example, if an individual has acute risk of suicide within days, then emergency services might need to be involved. However, if an individual is exhibiting early stages of a suicidal thought process, an intervener can offer softer cues that a more positive outcome is possible, thus avoiding later acute risk. This is, in fact, our current research effort.

Implications

When big data first emerged, early adopters embraced the mentality: “Now we can collect everything, so let’s do that.” But as use of big data technology evolves, institutions are becoming more careful of what they collect, on whom, and why.

Consent of the Governed

Careful implementation of opt-in/consent-based data collection saves a great deal of confusion and institutional heartache down the line. In this context, careful means explicit—for example, avoiding long contracts with fine print, and clearly describing the data’s current and future use. At the Durkheim Project, we have made a concerted effort to make our consent process easy to understand. Other organizations have taken this idea one step further, arguing for data ownership contracts with individuals.14 In the Arab Spring and Twitter case, while the data may be public, its analysis remains dangerous to those who were analyzed. In the medical case, the underlying data is already regulated and protected, but the meta-analysis is a legal grey area.

Privacy

Publicly available information is, and will likely always be, considered part of the information commons.15 At the same time, individuals will always value privacy. One function of privacy (especially on the Internet) is to spare us from embarrassment and societal exclusion. Privacy is also a protected right in many developed nations. Privacy protection in the world of big data will increasingly mean the difference between fair treatment and discrimination, or in many cases, life and death. In the Arab Spring case, if a regime could identify the dissenters by their tweets, then their lives may be in jeopardy. In suicide risk prediction, we are fully aware of the implications of false positive signals—for example, we might embarrass our users. In many cases, even a true positive might exacerbate an individual’s risk if the data is treated with improper care.


Transparency and Trust

While some subjects are undoubtedly sensitive or proprietary, it is the transparency of the underlying process that will win the trust of big data consumers. Trust in systems is critical to maintaining a stable infrastructure, and systems must have checks that are formal and procedural to instill trust. Furthermore, transparency on the part of data leaders will engender trust in systems. Data leaders should seek to assure the public, as customer or constituent, that their analysis will be used for constructive purposes. The Durkheim Project team understands that trust is the core function of our effort to reach out to, and stay in contact with, enough individuals to make a difference.

Big Data Leadership Going Forward

Big data offers a world-changing paradigm. The same technology that drove much of Google’s growth is now diffusing into the mainstream. This trend is revolutionizing health care, promoting a greater understanding of financial risks, and providing greater situational awareness in state security and military applications. Our world is becoming more aware of its own comings and goings, and as a result, has become more efficient in its capabilities for resource allocation.

However, we are only beginning to see the potential applications of big data. Like the powerful paradigms before it (for example, nuclear power), big data will offer applications for the benefit of mankind, as well as applications that control others and harm societies. We are already beginning to witness some of the implications of the deep privacy intrusions and behavioral tracking enabled by big data.

In the rush to create and profit from these new technologies, many companies have become quite successful, largely following the existing letter of the law. However, as society increases its understanding of big data’s potential power, there must be a greater focus on defining which uses are and are not exploitive.

In this effort, we advocate more transparent and consent-based big data systems. We hope that with the Durkheim Project, we lead with a positive example of custodianship. We believe that systems with this design philosophy can truly change the world for the better, while systems without this philosophy tend to be adversarial and exploitive. Policymakers should be aware of both the promise and peril of these new data systems.

Notes

1 “Liberty and Security in a Changing World,” Report and Recommendations of The President’s Review Group on Intelligence and Communications Technologies, December 12, 2013, http://www.whitehouse.gov/sites/default/files/docs/2013-12-12_rg_final_report.pdf.
2 Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Google, Inc. (2004), http://static.googleusercontent.com/media/research.google.com/en/us/archive/mapreduce-osdi04.pdf.
3 Andrew Fikes, Deborah A. Wallach, Fay Chang, Jeffrey Dean, Michael Burrows, Robert E. Gruber, Sanjay Ghemawat, Tushar Chandra, and Wilson C. Hsieh, “Bigtable: A Distributed Storage System for Structured Data,” Google Inc. (2006), http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf.
4 Chris Poulin, Brian Shiner, Paul Thompson, Linas Vepstas, Yinong Young-Xu et al., “Predicting the Risk of Suicide by Analyzing the Text of Clinical Notes,” PLoS ONE 9, no. 1 (2014), http://www.plosone.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0085733&representation=PDF.
5 The Durkheim Project: www.durkheimproject.org.
6 Neal Ungerleider, “This May Be The Most Vital Use Of ‘Big Data’ We’ve Ever Seen,” Fast Company, http://www.fastcolabs.com/3014191/this-may-be-the-most-vital-use-of-big-data-weve-ever-seen.
7 Matt Pearce, “Dad gets OfficeMax mail addressed ‘Daughter Killed in Car Crash,’” Los Angeles Times, January 19, 2014.
8 Charles Duhigg, “How Companies Learn Your Secrets,” New York Times, February 16, 2012, http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?pagewanted=1&_r=2&hp& (accessed March 17, 2014).
9 Bart Custers, Bart Schermer, Tal Zarsky, and Toon Calders, eds., Discrimination and Privacy in the Information Society: Data Mining and Profiling in Large Databases (New York, NY: Springer, 2012).
10 Denver Nicks, “NSA Memo Says Snowden Tricked Colleague to Get Password,” Time, February 13, 2014, http://swampland.time.com/2014/02/13/nsa-leaks-edward-snowden-password/ (accessed March 17, 2014).
11 David Wolman, “Facebook, Twitter Help the Arab Spring Blossom,” Wired, April 16, 2013, http://www.wired.com/magazine/2013/04/arabspring/ (accessed March 17, 2014).
12 Dartmouth Committee For the Protection of Human Subjects (CPHS): Study #23781.
13 Gilad Lotan et al., “The Arab Spring: Revolutions Were Tweeted: Information Flows during the 2011 Tunisian and Egyptian Revolutions,” International Journal of Communication 5 (2011), http://ijoc.org/index.php/ijoc/article/view/1246 (accessed March 9, 2014).
14 Jaron Lanier, Who Owns the Future? (New York, NY: Simon & Schuster, 2013).
15 Lawrence Lessig, “The Public Domain,” Foreign Policy, August 30, 2005, http://www.foreignpolicy.com/articles/2005/08/30/the_public_domain (accessed March 17, 2014).

SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

© 2014 by The Johns Hopkins University Press

Of Note

The Limits of Big Data

Christine Croft

More Data, More Questions

Chris Poulin’s essay, “Big Data Custodianship in a Global Society,” establishes a foundation for the exploration of fascinating and difficult questions surrounding the use and influence of big data. While Poulin’s discussion ranges from the basics of big data to the implications of its use, three main questions beg greater consideration. First, one must consider in greater depth the “how” of big data: how it is collected, stored, analyzed, and reported. This “how” becomes essential as policymakers and technologists interact to create and inform policy. Second, one must also confront the uncomfortable question of whether big data is actually capable of changing the world. Will big data become the dominant force in helping policymakers predict protests or doctors save lives, or should big data be recognized as one of many metrics to consider? Finally, the question of whether big data will serve to modify behavior based on the knowledge of its collection presents a particularly troubling view of the future.

The Big Data Buzz

In recent years, big data has become a buzzword in the tech-savvy community, simultaneously holding promise and mystery. The “what” of big data—what it can accomplish and the consequences of its use—has been well documented. Yet the “how” of big data is less easily understood, protected by technologists as if it were an industry secret. While the rise of big and open data is purported to offer transparency and greater access, the process of how to analyze raw data to create actionable insights is still a mystery to many in the business and policy communities.


Christine Croft is a first-year M.A. candidate at the Johns Hopkins University Paul H. Nitze School of Advanced International Studies (SAIS) concentrating in Strategic Studies. She is an Assistant Editor for The SAIS Review.


In his essay, Poulin outlines the process of utilizing big data: collection, storage, processing, query, reporting, and action. Though he appropriately raises the issues of data privacy during the collection, storage, and reporting phases, he skims over the processing and query phases, which are arguably most important to the analysis of big data.1 The absence of a clear, scientific understanding and resulting communication of how massive amounts of data are sorted and analyzed presents a major challenge to policymakers.

David Weinberger, a senior researcher at Harvard University’s Berkman Center for Internet and Society and co-director of the Harvard Library Innovation Laboratory, argues that “it is not at all clear that human brains will be capable of understanding why the supercomputers have come up with the answers that they have.”2 The risk, he explains, is that we create “knowledge but no understanding.”3 If technologists cannot accurately explain to policymakers and business leaders why the data has produced a certain result, particularly in the case of an outlier, there is a chance the results of the analysis will be discarded as illogical or unsound.

As industry leaders, consumers, and policymakers become increasingly comfortable with the use of big data, we must increase our knowledge about how it is queried, analyzed, and reported. Otherwise, we risk creating statistical results that appear to lack a causal link, which only increases the risk of missing the correlated outcomes that big data is hailed as capable of predicting.

Should Big Data Analyze Everything?

The two case studies (the Arab Spring and the medical industry) discussed by Poulin reflect the diversity of big data’s applications. While some may laud the breadth of topics that big data can touch, one should be equally cautious about the applicability of big data to topics that require more nuanced analysis and understanding than big data alone can deliver. The ease with which big data can be acquired and analyzed inclines us to look for answers where perhaps none exist.

The Arab Spring case is an excellent example of searching for answers where none exist. Ambiguity exists in the world, and big data cannot eliminate it; thus, policymakers must be comfortable operating within the bounds of the known and unknown. However, Poulin offers the Arab Spring as an example of a future in which data from Twitter feeds could be used to predict dissent and anticipate regime change. Such an idea stretches the limits of big data analysis; the events in Egypt cannot be reduced to measuring how often certain words are used on Twitter. Surely, each tweet fueled a dialogue, and in turn, a movement, but perhaps that is what big data misses. Big data cannot account for the cumulative effect of a movement, it cannot perceive intensity of human emotion, and it cannot predict free will.

Similarly, the case of suicide risk among U.S. veterans presents potential risks of relying on data. There is no substitute for person-to-person interaction; we cannot replace doctors with computers that measure the word count rather than the intonation of a distressed veteran. We must rely upon individuals to collect the “raw data” provided by the human condition—a veteran’s face, eyes, and emotion. Data can be used to make predictions, and it is one of many tools that might decrease the risk of veteran suicides, but it cannot serve as a replacement for human interaction.

The overwhelming range of big data’s applicability in various industries can provide a false sense of security when it comes to searching for answers to perplexing questions. The challenge is that, while data is excellent at making correlations, it does not tell us why those correlations are significant or whether they have a causal relationship.4 Furthermore, by glorifying big data as the next paradigm for predicting the future, we sacrifice our understanding and appreciation of organic, emotional action—the type that fuels protests and saves lives.

Can Big Data Change Our Behavior?

One of Poulin’s strongest assertions about big data is its ability to predict an individual’s future behavior. From this assumption, he draws a secondary conclusion—that organizations, NGOs, and governments can use that information to effect behavioral change. Poulin considers a situation where an analyst knows what will happen before the outcome and then modifies a “thought leader’s” discussion on Twitter to affect the predicted outcome.

With this assertion, Poulin fails to fully analyze the problems with manipulating human behavior. While studies have indicated that human behavior changes in response to a feeling of being “watched,”5 the benefit of big data is that it is collected at a meta-level, and individuals may not even realize that data is being collected about their actions. Furthermore, Poulin ignores the challenges inherent in data collection and analysis. The lack of understanding surrounding how data is collected and why it produces certain results may preclude governments and organizations from acting on the information, thereby precluding individuals from being influenced by the collection of data. Indeed, Columbia University statistician Victoria Stodden argues that individuals may not even listen to the answers provided by big data due to its complexity or the occurrence of outliers.6

The Orwellian notion that big data can modify behavior may be exciting to big data analysts, but presents a much darker picture for society. Big data cannot and should not provide the solution to every global crisis; rather, it should inform decisions as one of many tools used by policymakers or business leaders.


Can Big Data Really Change the World?

Big data is often hailed as a world-changing paradigm, and perhaps it is. Yet by blindly accepting big data without thoughtful and thorough consideration of its limitations, society risks losing its appreciation for the human condition. The more we rely on the alleged predictive capabilities of data, the less we rely on our intuition, which reflects each individual’s own big data analysis of past experiences and knowledge.

Big data must be recognized as a tool that offers analysis of pressing questions and global challenges, but the results of big data analyses should never be accepted outright as the final solution to these challenges. In the future, it is likely that big data will be so ingrained in our analytical methodologies that its use will be as ubiquitous as the Internet. If we can find a way to utilize the merits of big data while appreciating its limits, society will have developed an essential tool to make sense of an increasingly data-filled world.

Notes

1 Christopher Versace, “The Big Deal about Big Data and What it Means for IT and You,” Forbes, 28 January 2014, http://www.forbes.com/sites/chrisversace/2014/01/28/the-big-deal-about-big-data-and-what-it-means-for-it-and-you/.
2 David Weinberger, “The Machine That Would Predict the Future,” Scientific American, December 2011, 6.
3 Ibid.
4 Gary Marcus and Ernest Davis, “Eight (No, Nine!) Problems With Big Data,” The New York Times, 6 April 2014, http://www.nytimes.com/2014/04/07/opinion/eight-no-nine-problems-with-big-data.html?action=click&contentCollection=Opinion&region=Footer&module=MoreInSection&pgtype=Blogs&_r=0.
5 Jason Goldman, “How being watched changes you – without you knowing,” BBC, 10 February 2014, http://www.bbc.com/future/story/20140209-being-watched-why-thats-good.
6 Weinberger, 6.


SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)

© 2014 by The Johns Hopkins University Press

Harnessing Data for National Security

David Rubin, Kim Lynch, Jason Escaravage, and Hillary Lerner

Since 9/11, the U.S. government has initiated efforts to enhance its information-sharing capabilities and doubled its investment in counterterrorism, spending nearly $80 billion. Sharing data and conducting analyses across the government’s legacy stovepipes of information is challenging but mandatory to reduce redundancy, increase cost efficiency, and improve national security mission performance. The challenges involved in harnessing big data analytics for a more enlightened approach to national security center on striking the optimal balance between complex opposing forces—opportunity versus risk, collective security versus individual privacy, and innovation versus protection. While the government has made progress in identifying existing data sources and sharing high-level metadata, it is still in the early phases of the maturity curve in terms of enabling access across the federal ecosystem to leverage the valuable analytics that inform evidence-driven decision-making. This paper explores strategies and frameworks to expedite the effective analysis and use of data to drive national security activities.

Introduction

French philosopher Denis Diderot, prominent during the Age of Enlightenment, authored an early version of the encyclopedia to condense stores of knowledge into one place, thereby democratizing access to insight. The twenty-first century Internet is based on a similar premise.

David Rubin, a senior vice president at Booz Allen Hamilton, leads the Law Enforcement and Intelligence Subaccount focusing on the Department of Justice, Federal Bureau of Investigation, and Department of Homeland Security. He brings 27 years of professional experience supporting analysis related to national security.

Kim Lynch, a principal at Booz Allen Hamilton, leads the firm’s work at the Department of Homeland Security Office of Intelligence and Analysis along with other national security clients. Ms. Lynch also serves as an adjunct professor at George Washington University.

Jason Escaravage is a principal on Booz Allen Hamilton’s Strategic Innovations Group. Since joining the firm in 2000, he has supported a diverse set of customers throughout the Department of Defense and Intelligence Community with a focus in technology-driven analytics to automate, enhance, and expand an organization’s ability to analyze its mission critical information.

Hillary Lerner is a senior associate focused on policy, communications, and training support for the Department of Homeland Security and other national security clients.


Our connected society continuously produces valuable data that is ushering in a new era of enlightenment. The convergence of behavioral, scientific, economic, and environmental data offers access to unimaginable discoveries that have the power to transform our lives. When the data holdings of the United States government can be more easily accessed and shared across agencies and departments at the federal, state, and local levels, perhaps the United States will achieve a modern-day vision of the collective enlightenment Diderot conceived.

Advanced technologies allow information about our purchases, social networks, movements, and physical identities to be collected, stored, analyzed, and used in unprecedented ways, sparking an ongoing, impassioned debate about the tenuous intersection of national security and civil liberties. “The immense volume, diversity, and potential value of data will have profound implications for privacy, the economy, and public policy,” notes John Podesta, Counselor to the President.2

Since 9/11, the U.S. government has initiated efforts to enhance its information-sharing capabilities and doubled its investment in counterterrorism, spending nearly $80 billion. This included the establishment of the National Counterterrorism Center (NCTC), an organization charged with bringing together more than thirty data sources from across federal departments and agencies to provide real-time and strategic intelligence related to terrorism. Today, NCTC has established itself as the focal point for sharing information on terrorism and driving activities across the Intelligence Community.

As is often the case, the challenges involved in harnessing big data analytics for a more enlightened approach to national security center on striking the optimal balance between complex opposing forces—opportunity versus risk, collective security versus individual privacy, and innovation versus protection. Some level of individual exposure is required to pursue the collective, societal benefit. And, protection measures must not have a chilling effect on innovation that empowers us to stay a step ahead of adversaries.

Setting aside bureaucratic turf wars over rigid silos of locked information, today’s legislative and regulatory leaders must collaborate and engage in conversations that are less about the inevitable, now largely accepted, collection and storage of data in the contemporary ecosystem, and more about developing and coordinating appropriate policies and governance frameworks to manage the use of information.

Conventional data practices and mechanisms do not allow federal agencies to effectively and securely integrate, share, and analyze segmented data sources in order to draw the critical conclusions required for informed national security. The data itself can be used to accelerate and enhance the consolidation of disparate repositories of information and intelligence. And, savvy investments in cloud analytics solutions will allow government leaders to fully capitalize on the power and promise of big data, gaining real-time situational awareness of constantly evolving threats to national security.


Agencies must collaboratively create and carefully manage shared data environment policies and governance frameworks that effectively balance national security and privacy concerns. Their ability to accomplish these objectives will directly impact the government’s mission to safeguard America’s borders, infrastructure, and citizens, while ensuring a commitment to the civil liberties defined in the Constitution.

Data: The Most Powerful Asset in America’s Arsenal

Data impacts every facet of our personal and professional lives. Every day, we willingly share information with government agencies, Internet service providers, financial institutions, social networks, and other entities that enhance and sustain our existence.

Big data and associated analytics deliver transformative value to society across a broad range of activities in the commercial sector, from transportation to energy to healthcare. In the public sector, data also drives nearly every aspect of mission-critical national security—from intelligence analysis to military operations, fraud detection, counterterrorism, diplomatic decisions, and foreign policy initiatives.

The ability to aggregate and analyze large amounts of data can help governments and organizations better address public health issues, learn how economies work, and prevent fraud and other cyber crimes. Data can help isolate the needle in the haystack. The White House’s Big Data Research and Development Initiative underscores a growing recognition that big data analytics can aid in solving some of the nation’s most complex problems.6 Big data can address pressing social, political, and economic challenges; accelerate the pace of discovery in science and engineering; and strengthen national security.

Relevant information often comes from unlikely sources. Consider Eliot Higgins, a self-taught munitions expert and blogger based in Leicester, UK, credited with confirming Syria’s use of chemical weapons. Relying on verified social media sources inside Syria, Higgins, known online as “Brown Moses,” posted video evidence of civilian casualties on his blog and proved to be an indispensable, authoritative source, especially given the Bashar al-Assad regime’s effective ban of international press in Syria.7 Similarly, some of the best, real-time information regarding the Arab Spring came from social media sites like Twitter and Facebook.

The challenge is monitoring the multitude of available information sources. By the time government agencies produce classified intelligence reports, people like Higgins may have already known about and reported on the relevant international developments.

Information Sharing Saves Lives

Similar to the federal government, the retail and healthcare industries already possess a wealth of information that lends itself to advanced analytics. Online retailers, for example, tap into all available data to unlock insights about customers’ preferences and purchasing behavior, target their marketing efforts, and give consumers what they want before they know they want it. Demanding, instant gratification-driven consumers agree to the tradeoff of sacrificing some privacy in exchange for the perks of personalized delivery. In healthcare, like national security, historical data and information related to risk factors and outcomes are used to measure efficacy and inform critical decision-making.

Case studies from the healthcare industry illustrate the transformative process and performance impact of data analytics. Using advanced analytic techniques, Columbia University Medical Center shortened the time needed to identify possible complications in brain-injured patients with potentially lifesaving results.9 In 2011, researchers at Kaiser Permanente used the medical records of 3.2 million individuals to find a link between mothers’ use of antidepressant drugs and autism spectrum disorders in their children, determining that if a mother used antidepressants during pregnancy, her child’s risk of developing autism doubled. The only reason researchers had access to the medical records was because they had been collected earlier and retained.10

Growing From Ground Zero to Big Data Analytics

Sharing data and conducting analyses across the government’s legacy stovepipes of information is challenging but mandatory to reduce redundancy, increase cost efficiency, and improve national security mission performance. Critical connections remain elusive because different agencies relying on different databases are responsible for different aspects of securing and facilitating the movement of data. A leaner, more productive government enterprise will harness the power of existing data at its disposal.11

Lack of awareness of information assets may prevent an organization from including critical data in its scope of analysis. Increased knowledge of existing information resources is required in order to unlock these resources for analysis and derive maximum value. It is a continuum that starts with organizing and enabling access to current data; then expanding the size, type, and scale of data that systems can process; and enhancing analytics capabilities to better execute and manage enterprise priorities.

The Drug Enforcement Administration’s 2013 National Drug Threat Assessment notes that the “heroin seized each year at the Southwest Border increased 232 percent from 2008 (558.8 kilograms) to 2012 (1,855 kilograms). The increase in Southwest Border seizures appears to correspond with increasing levels of production of Mexican heroin and the expansion of Mexican heroin traffickers into new U.S. markets.” This type of trend can be further explored and addressed by the U.S. government through an operational lens to give front-line mission operators additional insight through money laundering-related data, network analysis, and demand data. With the right analytic tools, analysts can more effectively and efficiently use these types of large-scale data repositories—not by combing through looking for a needle in a haystack, but rather through tools that quickly identify patterns, trends, and anomalies meant for further exploration.
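
The quoted figures can be checked directly: (1,855 - 558.8) / 558.8 ≈ 2.32, i.e., the 232 percent increase the assessment cites. The sketch below shows the kind of simple year-over-year flagging such tools might perform; the intermediate-year values and the 50 percent threshold are invented for illustration.

```python
# Verify the quoted 2008-2012 change and flag sharp year-over-year jumps.
# The 2008 and 2012 values come from the quoted assessment; 2009-2011 are
# invented placeholders, and the 50 percent threshold is arbitrary.
seizures_kg = {2008: 558.8, 2009: 600.0, 2010: 700.0, 2011: 1100.0, 2012: 1855.0}

pct_change = (seizures_kg[2012] - seizures_kg[2008]) / seizures_kg[2008] * 100
print(f"2008-2012 change: {pct_change:.0f}%")  # -> 232%, matching the report

years = sorted(seizures_kg)
for prev, cur in zip(years, years[1:]):
    growth = (seizures_kg[cur] - seizures_kg[prev]) / seizures_kg[prev]
    if growth > 0.50:  # flag unusually sharp year-over-year jumps
        print(f"{cur}: +{growth:.0%} vs {prev} -- flag for analyst review")
```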


While the government has made progress in identifying existing data sources and sharing high-level metadata, it is still in the early phases of the maturity curve in terms of enabling access across the federal ecosystem to leverage the valuable analytics that inform evidence-driven decision-making. In the past, foreign policy experts relied on small amounts of data to implement national security activities. There will always be gaps in data and analysis, but the future of data analytics provides the opportunity for better decisions based on a more informed perspective.

“Public agencies are in the early days of their big data efforts. Chief Information Officers (CIO) overwhelmingly report that they are just getting started with big data efforts. While they see the value in synthesizing databases and using analytics, the challenge of big data has overwhelmed them.”12 Agency CIOs are aware of the value proposition of data analytics, knowing that big data can revolutionize service delivery and streamline business operations.

In a recent report by MeriTalk, Balancing the Cyber Big Data Equation, which captured insights from federal IT experts in the interwoven disciplines of big data and cybersecurity, “agencies agreed there is ‘tremendous value’ in data the government collects, but note that agencies ‘lack both infrastructure and policy to enable correlation, dissemination and protection.’”13 In short, the data is there—and will continue to grow—but federal policies will have to change before that data is used in the most efficient way.

Strategies and Frameworks to Expedite the Transition

The Information Sharing Environment

The Office of the Director of National Intelligence’s (ODNI) Information Sharing Environment (ISE) encompasses the people, projects, systems, and agencies that enable responsible information sharing for national security.14 Supported by standards, policy, and governance, ISE partners develop and sustain mission processes and activities that help frontline law enforcement, intelligence, and defense personnel detect, prevent, and mitigate terrorist activity. One of ISE’s mission objectives is to improve nationwide decision-making by transforming from information ownership to stewardship. This will be accomplished by:

•   Achieving greater interoperability through standards-based acquisition
•   Driving responsible information sharing by interconnecting existing networks and systems with strong identity, access, and discovery capabilities
•   Standardizing, reusing, and automating information sharing policies and agreements with strong protection of privacy, civil liberties, and civil rights

ODNI recently introduced a new distributed data aggregation reference architecture designed to make information sharing between federal agencies easier. “In a sense, it helps take the ‘big’ out of big data,”15 explains Kshemendra Paul, a program manager at ISE. Instead of inefficiently sharing data in bulk, which causes potential problems related to privacy, policy, and security, enhanced correlated data will consistently be pushed to agencies.


For example, to assist in the processing of visa applications, limited data regarding an individual’s criminal record could be exposed between the departments of Justice and State.
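
A hedged sketch of the “push limited, correlated data” idea, modeled loosely on the visa example above: each sharing agreement is an explicit allow-list of fields, so nothing outside it leaves the owning agency. The agency names, fields, and records here are all hypothetical.

```python
# Hypothetical sharing agreements: (owner, recipient, purpose) maps to the
# only fields the recipient may receive. Everything else stays with the owner.
SHARING_AGREEMENTS = {
    ("DOJ", "State", "visa-adjudication"): ["subject_id", "felony_conviction"],
}

def push_record(owner, recipient, purpose, record):
    """Return only the fields an agreement allows, or refuse outright."""
    allowed = SHARING_AGREEMENTS.get((owner, recipient, purpose))
    if allowed is None:
        raise PermissionError("no sharing agreement covers this exchange")
    return {field: record[field] for field in allowed}

doj_record = {"subject_id": "A-77103", "felony_conviction": True,
              "informant_notes": "SENSITIVE", "open_case_ids": ["C-9"]}

# State receives only what the agreement allows -- not the case file.
print(push_record("DOJ", "State", "visa-adjudication", doj_record))
```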

Hierarchical Access Control

Building on the concept of multilevel access privileges, consider this example: If there’s a “person of interest” identifier, such as a phone number, in both a DEA database and an FBI database, even though detailed information is only available to those with authorized access, it is helpful to agents in both organizations to know the individual is on the radar of multiple agencies as part of a drug or counterterrorism investigation.

Government organizations can establish alliances and validate the data’s authoritative nature as part of sophisticated identity and access management. Business rules can dictate that alliance partners have full access, for example, while others receive only the “executive summary” without being able to see the granular data.
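
A minimal sketch of such business rules, assuming a simple watchlist structure: alliance partners receive the full record, while other agencies learn only that an identifier is of interest. All identifiers, agencies, and case details are hypothetical.

```python
# Hypothetical watchlist: each entry records the owning agency, the granular
# detail, and the alliance partners entitled to full access.
WATCHLIST = {
    "+1-555-0147": {
        "owner": "DEA",
        "detail": {"case": "drug trafficking", "status": "active wiretap"},
        "alliance": {"FBI"},  # partners granted full access
    },
}

def query_identifier(identifier, requesting_agency):
    """Return full detail to the owner and its alliance; a summary to others."""
    entry = WATCHLIST.get(identifier)
    if entry is None:
        return {"hit": False}
    if requesting_agency == entry["owner"] or requesting_agency in entry["alliance"]:
        return {"hit": True, "detail": entry["detail"]}      # full access
    return {"hit": True, "detail": "contact owning agency"}  # summary only

print(query_identifier("+1-555-0147", "FBI"))  # full detail
print(query_identifier("+1-555-0147", "CBP"))  # hit-only summary
```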

Big data has opened up new and broader access to sensitive information. Organizations need to keep pace with new security concerns and bring big data under a sound identity and access management umbrella.16

Data Policy Coordination through Governance Frameworks

“Next-generation governance provides a framework to help federal agency CIOs and other leaders understand how they can use available tools and data, not only to manage individual programs but also to coordinate program portfolios within agencies and across the government. In particular, the framework can help agencies build data-driven governance that ensures compatibility, consistency and efficiency in federal IT investments,” explains a senior Booz Allen Hamilton technology leader.17

CIOs say that poor data governance is the most critical factor holding up agencies in their efforts to pursue big data. “We continue to build on a fragile foundation…. Data governance is not sexy and no one wants to do it, yet it is our Achilles’ heel.”18

A next-generation framework streamlines and strengthens decision making by putting information, such as critical risk and cost data, at the center of governance. Harnessing the capability of richer data sets and robust analytics tools gives government decision makers transparent insight into mission performance, as well as enabling rapid information sharing and real-time reporting to support a more agile process.19

Data-Powered Risk Management

Next-generation governance can help agencies achieve mission goals and risk management objectives. A scalable and auditable governance framework enables effective implementation of the policies and processes required to ensure desired outcomes and manage inherent risk. The same data analytics tools used to streamline government processes can be applied to risk mitigation efforts.


Sophisticated data-centric threat tracking allows government officials to monitor activity in specific regions and quickly issue alerts in response to real-time circumstances. Data can also help fiscally constrained law enforcement and national security entities efficiently allocate, deploy, and align resources with prioritized levels of risk. The key is leveraging data and analytics to improve workflow and augment human expertise.

Tracking patterns and analyzing trends lead to relevant conclusions and reliable predictions. When a shift in activity reveals a vulnerability, law enforcement can intercede and initiate a proactive response to prevent an emerging threat from morphing into a tangible risk. For example, cross-referencing ship manifests against known data from ongoing criminal investigations can be used to alert officials regarding which cargo containers to search at ports of entry.
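
The cargo example is essentially a join between two datasets. The sketch below flags containers whose shipper or consignee matches an entity from (invented) investigation data, so that inspectors can prioritize searches; all names are hypothetical.

```python
# Cross-reference ship manifests against entities from investigation data.
# Both datasets are invented for illustration.
manifests = [
    {"container": "MSKU100", "shipper": "Acme Exports", "consignee": "Blue Harbor LLC"},
    {"container": "MSKU200", "shipper": "Plain Goods Co", "consignee": "Main St Retail"},
]

# Entities drawn from (hypothetical) ongoing criminal investigations.
entities_of_interest = {"Blue Harbor LLC", "Redline Freight"}

def flag_containers(manifests, watchset):
    """Yield containers whose parties intersect the watch set."""
    for m in manifests:
        parties = {m["shipper"], m["consignee"]}
        if parties & watchset:
            yield m["container"], parties & watchset

for container, hits in flag_containers(manifests, entities_of_interest):
    print(f"search {container}: linked to {', '.join(hits)}")
```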

Chief Data Officers to Nurture Need-to-Share Model

The concept of establishing Chief Data Officers (CDOs), more often found in commercial organizations, can help enterprises and government agencies adopt governance structures, increase data transparency, and develop information access strategies for accomplishing specific objectives. Data assets—like IT resources aligned to the CIO—are critical to enterprise success and therefore require senior-level accountability and management to drive maturation. CDOs can nurture a transition from a need-to-know to a need-to-share paradigm that manages data assets in compliance with privacy policies while simultaneously removing any hurdles that prevent extracting inherent value from information that can and should be exchanged.

Back to Basics: Balancing Risk and Rewards

Internally, big data empowers government leaders to track the success of national security initiatives and allocate resources to those that are yielding a measurable return on investment. From a broader perspective, effective management of data assets enables the government to proactively and consistently protect American citizens, our border, and the U.S. economy.

As data structures continue to grow larger and more diverse, the challenges of integrating and protecting data will become even greater. “The revolution in big data and cloud computing has ignited a ‘gold rush’ to extract value from the mountains of digital information collected and stored by government. The benefit to agencies…will be substantial.”20

Failure to adopt big data analytics will constrict mission performance. When evaluating potential privacy infringement, a holistic view of the compelling advantages offered by big data analytics should be part of a balanced risk equation. The consequences of not adopting big data analytics will surely outweigh the risk.

When government organizations examine how these analytics techniques are transforming enterprises in real-world scenarios, the value becomes apparent via dramatic gains in the efficiency, efficacy, and performance of mission-critical business processes.21


The U.S. national security posture has a lot to gain from a rigorous effort to integrate massive data troves and glean the valuable insights that will help dictate America’s future.

Notes

1 Peter Daboll, “5 Reasons Why Big Data Will Crush Big Research,” Forbes, December 3, 2013, http://www.forbes.com/sites/onmarketing/2013/12/03/5-reasons-why-big-data-will-crush-big-research/ (accessed March 11, 2014).
2 John Podesta, Counselor to the President, “Big Data and the Future of Privacy,” The White House Blog, January 23, 2014, http://www.whitehouse.gov/blog/2014/01/23/big-data-and-future-privacy (accessed February 21, 2014).
3 Craig Mundie, Senior Advisor to CEO of Microsoft, “Privacy Pragmatism: Focus on Data Use, Not Data Collection,” Foreign Affairs, March/April 2014, http://www.foreignaffairs.com/articles/140741/craig-mundie/privacy-pragmatism (accessed March 11, 2014).
4 Michael Allen, Majority Staff Director, House Permanent Select Committee on Intelligence, Book Discussion on Blinking Red: Crisis and Compromise in American Intelligence after 9/11, C-Span, December 5, 2013.
5 “Privacy Pragmatism.”
6 Jason Escaravage and Peter Guerra, Enabling Cloud Analytics with Data-Level Security: Tapping the Full Value of Big Data and the Cloud, Booz Allen Hamilton, 2013.
7 Patrick Radden Keefe, “Rocket Man: How an unemployed blogger confirmed that Syria had used chemical weapons,” The New Yorker, November 25, 2013, http://www.newyorker.com/reporting/2013/11/25/131125fa_fact_keefe (accessed March 11, 2014).
8 “Rocket Man.”
9 Data-driven healthcare organizations use big data analytics for big gains, IBM, February 2013, http://public.dhe.ibm.com/common/ssi/ecm/en/imw14682usen/IMW14682USEN.PDF (accessed March 11, 2014).
10 “Privacy Pragmatism.”
11 Suzanne Storc and Mike Delurey, Marshaling Data for Enterprise Insights: A 10-Year Vision for the US Department of Homeland Security, Booz Allen Hamilton, 2012.
12 “What CIOs say about big data,” FCW, February 24, 2014, http://fcw.com/articles/2014/02/24/exectech-drill-down.aspx (accessed March 11, 2014), in reference to IBM Center for the Business of Government report, Realizing the Promise of Big Data (http://is.gd/IBM_Center_BigData).
13 Frank Konkel, “Report: Big data, cybersecurity intrinsically linked,” FCW, February 25, 2014, http://fcw.com/articles/2014/02/25/crit-read-big-data-cyber.aspx (accessed March 11, 2014).
14 Information Sharing Environment, www.ise.gov.
15 Frank Konkel, “Three things to watch on information sharing,” FCW, January 31, 2014, http://fcw.com/articles/2014/01/31/paul-on-info-sharing-2014.aspx (accessed March 11, 2014).
16 Jonathan Lewis, “Big Data, Big Privacy Concerns: Identity Management in a Big Data World,” M2M Evolution, March 7, 2014, http://www.m2mevolution.com/topics/m2mevolution/articles/372521-big-data-big-privacy-concerns-identity-management-a.htm (accessed March 11, 2014).
17 Fred Knops and Mike Isman, “Next-generation governance is the key to future IT success,” GCN, April 13, 2011, http://gcn.com/Articles/2011/04/13/Commentary-next-generation-governance.aspx (accessed March 5, 2014).
18 “What CIOs say about big data.”
19 Next-Generation Governance: Enhanced Decisionmaking Through a Mission-Focused, Data-Driven Approach, Booz Allen Hamilton, April 2011.
20 Jason Escaravage and Peter Guerra, Enabling Big Data with Data-Level Security: The Cloud Analytics Reference Architecture, Booz Allen Hamilton, 2012.
21 Generating Value From Big Data Analytics, ISACA, January 2014, http://www.isaca.org/Knowledge-Center/Research/ResearchDeliverables/Pages/Generating-Value-From-Big-Data-Analytics.aspx (accessed March 11, 2014).



China’s Beidou: Implications for the Individual and the State

Eric Hagt

China’s satellite navigation system, Beidou, has been elevated from a second-order consider-ation to a place of prominence in China’s space program. This transformation in national priorities is the result of a convergence of several state goals involving economic development, industrial and technological innovation policies, as well as domestic and external security. The implications of a completed Beidou system are significant for the development and security of the Chinese state and the individual, in both beneficial and disquieting ways.

Introduction

The evolution of China’s Beidou Navigation Satellite System (北斗卫星导航系统 beidou weixing daohang xitong) from a regional project into a

global capability reflects more than a loosening of fiscal constraints, but a clear national policy agenda. Beidou (abbreviated as BDS) is a mega project that bolsters national development through several means: it builds the nec-essary infrastructure for a modernizing economy, propels China’s domestic science and technology efforts, and moves industrial activity up the value-added chain. Though more aspirational than reality at this point, BDS has the po-tential to tap into the lucrative global satellite navigation mar-ket, rivaling GPS and Galileo.

However, Beidou is also the backbone for an information system that provides the Chinese state with an impressive array of military as well as internal monitoring and surveillance capabilities. This application has important consequences for China's strategic interaction with other countries, and for individuals in Chinese society. The balance of priorities in the early years of Beidou will determine whether it will be used primarily as a coercive tool or to augment China's national development.

Eric Hagt is a PhD candidate in the China Studies program at the Johns Hopkins University Paul H. Nitze School of Advanced International Studies (SAIS).


Humble Beginnings

As of early 2014, China has launched sixteen satellites for its second-generation BDS (Beidou-2). It has offered services to customers in the Asia-Pacific region since December 2012, and China plans to serve customers worldwide upon the system's completion in 2020. At that time, the system will include thirty-five satellites, comparable to the U.S. GPS and the planned European Galileo. It consists of both civilian and military signals, the former with an accuracy of five meters in Southeast Asia and the latter in the centimeter range.1 China promotes BDS as more reliable—if not more accurate—than GPS for the Asia region. Furthermore, China's satellite system offers specialized functions such as the ability to use Beidou for SMS, and an enhanced capacity to overcome the "urban canyon" effect, whereby satellite signals are blocked by dense clusters of tall buildings, a problem of particular salience in highly urbanized Asian societies.2

While the current BDS enjoys strong political and financial support, China's initial efforts to develop a satellite navigation system experienced a slow start. The inception of BDS can be traced to 1985: A Chinese military officer attending a conference on GPS realized that while the free use of GPS greatly benefited China, it was also conditional. In other words, China was dependent upon the United States for its satellite navigation needs—a state of affairs that was fundamentally unacceptable to the People's Liberation Army (PLA). This marked the first rationale for an independent BDS and set in motion a plan to build the first generation of Beidou.3 Yet for the next ten years, technological and economic obstacles resulted in little headway.

A number of events in the 1990s shifted the terms of support for Beidou's development. The 1990 Gulf War demonstrated in spectacular fashion the force-multiplying effect of U.S. space-based assets.4 Then in two separate incidents—the Yinhe Incident in 1993 and China's large-scale military exercises in the Taiwan Strait in 1996—China's access to the GPS signal was allegedly interrupted.5 These events underscored two lessons for the PLA: First, a satellite navigation system was requisite for a modern military; and second, as long as China was dependent on GPS, the country would be vulnerable. Consequently, funding for an experimental regional navigation system (Beidou-1) was initiated around 1997 to 1998.6

This first-generation program was a modest system consisting of two satellites in geosynchronous orbit (with one spare) and one ground station, which provided active (vulnerable two-way) transmission with limited accuracy (20 to 30 meters) and only regional coverage. While space was a priority for China's leaders during the 1990s, the state's limited fiscal resources meant that the manned space effort took precedence over China's satellite programs, including the navigation system.

The core rationale for Beidou in its early phases—its military significance and the vulnerability of relying on GPS—was primarily based in national defense terms. However, a confluence of changing strategic, political, social, and economic circumstances over the following decade motivated China to build a far more robust satellite navigation program.

A Perfect Storm: Beidou Takes Off

On the surface, an important turning point for Beidou-2 was the falling out between the Chinese and the Europeans over participation in the Galileo project, Europe's navigation system. China joined Galileo with a €230 million investment in 2003, but it was apparently excluded from certain aspects of the project in 2006. Soon after, China independently developed its own global system, the Beidou Navigation Satellite System (Beidou-2).7

While this event undoubtedly helped cement the logic of launching an autonomous navigation capability, other elements in China's national development have conspired to drive the program's development. In short, Beidou has come to be seen as a strategic asset by China's leaders. For the Chinese, a strategic asset incorporates a quality that extends beyond national security or traditional military utility, as strategy is usually perceived in the West.8 Domestic stability and economic growth are first-order considerations for China's leaders: In the official discourse, these areas are framed as the "core national goals of economic and social development and national security."9 Therefore, a strategic asset serves not only external security needs, but it also maintains social order and furthers economic development. Beidou and China's satellite program are viewed as emerging sectors that are strategically significant for military needs, and for China's overall national evolution.10

With regard to the economy, BDS has taken on new importance in several respects. Global positioning and navigation are indispensable for a wide range of critical domestic activities, including shipping and air traffic control. Furthermore, banking transactions, stock exchanges, and other financial activities would screech to a halt without the precise timing applications of these systems. Additionally, road transportation, the electricity sector, the Internet, and weather monitoring all depend heavily on global positioning and navigation.11 These services can be accessed free of charge through GPS, but reliance on GPS would leave China's vital economic and financial foundations vulnerable to disruption. An influential figure in the Beidou program, Major General Yuan Shuyou, noted, "The Americans would just have to slightly tamper with GPS and our whole system could be paralyzed."12 Thus, the supporting functions of BDS in the economy are now increasingly seen as critical national infrastructure.

Beidou may contribute to the economy in more direct ways, as well. One objective for building an independent system has always been the lure of capturing the lucrative regional and global satellite navigation market, estimated at roughly $250 billion.13 Sun Jiadong, a scholar and chief designer of Beidou, has stated that China hopes to capture 80 percent of the domestic market and 20 percent of the global market. If these targets are achieved, annual revenue could reach as high as ¥500 billion.14

In order to promote marketization of its navigation services, China has gone on an impressive "transparency spree." In December 2012, Ran Chengqi, the director of the China Satellite Navigation Office, announced the publication of two important technical documents—Beidou's interface control document and the BDS Open Service Performance Standard—that assist developers in manufacturing receivers and chips.15 However, the hurdles to achieving market viability are high. To date, Beidou has captured less than 10 percent of the domestic market.16 High-level officials have publicly called into question national policy planning to develop Beidou, citing industry excess, inefficiency, and abuse.17 Therefore, while Beidou has promise as a financially viable service, this objective remains an aspiration.

Beidou has been recognized as a strategic industry for other reasons, as well. A central feature of China's recent industrial and science and technology (S&T) policies has been to foster a National Innovation System (NIS).18 This policy orientation is being operationalized through a broad national investment strategy to promote indigenous efforts in science and technology. For the past thirty years, China's development strategy has been dominated by lower-end manufacturing to take advantage of the country's cheap and relatively unskilled labor pool. China's leadership understands that the next round of development will require far greater inputs of human talent and ingenuity. The high-end technology demands of a broad space program are a poster child for such efforts.

To reach this goal, numerous initiatives and plans have been initiated since the late 1980s,19 but the centerpiece was the "National Medium and Long-Term Plan for the Development of Science and Technology 2006–2020" (MLP), released in 2006. This document represents the Chinese belief that innovation can be steered by the government. It relies heavily on supply-side policies for research and education. New targets to strengthen indigenous innovation have raised concerns about the emergence of techno-nationalism, in which the technological capabilities of a nation's firms are key sources of their competitive prowess, and these capabilities are built by national action.20 Beidou is not mentioned specifically in the MLP, but aerospace features prominently in its target areas.21

Chinese leaders more recently detailed their intentions to develop a comprehensive NIS by 2020 in a circular, “Opinions on Deepening the Reform of the Scientific and Technological System and Speeding up the Building of a National Innovation System.”22 According to the document, the value of emerging industries of strategic importance (a list that includes the satellite industry) will total about 8 percent of national GDP by 2015 and 15 percent by 2020. Domestic media reports in 2010 suggested that China would invest up to $1.5 trillion over five years to support and expand the strategic industries, a figure that represents an annual expenditure of 5 percent of GDP.23


Other important national planning documents like China's 11th and 12th Five-Year Plans (2006–2010 and 2011–2015) support Beidou indirectly through increased budget allocation for high technology R&D. Furthermore, there are now three white papers—authoritative documents on policy directions—on China's space activities that demonstrate the Chinese state's concentrated focus on space programs, with Beidou as an increasingly central component. Finally, since the Beidou program was specified as one of the strategic emerging industries in 2010, a rash of policy planning documents to develop Beidou has been released by the State Council.24

The Beidou program would foster a domestic space industry. Furthermore, as a highly dual-use technology—for civilians and military alike—it would bolster development in the context of civil-military integration (CMI), a policy initiative launched in China in 2003. Historically, China's defense industry has been sharply segregated from the civilian sector of the economy. By bridging the most innovative parts of the civilian and military sectors, China's industry policymakers hope to reap greater economic efficiencies in manufacturing and marketing, and to promote synergies in science and technology, although the initiative has met with limited success.25 Given the vast regional and global market potential for satellite navigation services, the Chinese government recognizes the potential of the Beidou program as a critical impetus for CMI policy.26

In summary, the Beidou program has steadily migrated from the periphery to center stage of China's strategy to build an NIS. This evolution is evident in an overall shift in China's space program, which had been dominated by manned space since the 1990s. Although China continues to make great strides in manned space (Shenzhou) and the space lab (Tiangong), the satellite program has received increasing attention since the mid-2000s, with Beidou a large part of this focus.27 A satellite navigation program is essential for supporting financial and civilian infrastructure, and it promises great rewards if regional and global satellite navigation markets can be won. But the implications of such a system for security, including military and domestic stability, are considerable.

External Security

From the perspective of Beijing, China's external security is threatened by several factors.28 First, the status of Taiwan is a core national interest for China. Cross-strait relations have generally improved since the election of President Ma Ying-jeou in 2008. However, short of reunification, Taiwan will remain a potential flash point, particularly for U.S.-China relations. Second, the Asian regional security environment is growing increasingly complicated. Threats include island disputes in the South China Sea (principally with the Philippines, Vietnam, and Malaysia) and the East China Sea (with Japan); residual border tensions with India; simmering tensions over North Korea's nuclear program; and U.S. military activity near China's northwestern border in Afghanistan. Finally, in many respects, China's interests are expanding globally. Its reliance on natural resources around the world (primarily energy and rare minerals) is growing, and leaders in Beijing are increasingly concerned with guaranteeing the country's supply. Moreover, multilateral trade in goods, services, and technology is no longer a luxury, but a necessity for China's economic health.

In the aggregate, these external security conditions call for greater military capabilities. Whether it is the means to monitor China's regional security environment or the ability to project power beyond China's borders, an independent, robust navigation-positioning system is essential. In addition to the Beidou-2 system, which boasts a military signal with centimeter-level accuracy, other space-based platforms such as reconnaissance and communications satellites significantly expand the military capabilities of the PLA. In fact, over the past several years, the PLA has acquired a range of assets that greatly enhance force projection, precision-guidance, target acquisition, command and control, force deployment, and damage assessment.29 These assets include twenty-three communication satellites (only ten of which are thought to have commercial application), and forty-four earth observation and remote sensing satellites (of which roughly half are operated by the military). China now has the world's second largest satellite program.

Although China’s program is not necessarily equivalent to the United States in terms of function and capability, these gross figures of communica-

tion, observation, and navigation-positioning approach U.S. military figures, especially in terms of regional context.30

Finally, the applications of BDS and China's comprehensive satellite suite are of particular value for China's near-seas maritime domain. This is crucial for creating a more complete Command, Control, Communications, Computers, Intelligence, Surveillance and Reconnaissance (C4ISR) network in the region. Furthermore, Beidou-2 can be applied to mineral exploration, fishery resource management, survey and mapping, and other satellite geodetic functions that are especially useful for asserting China's claims and interests in the South and East China Seas.

Internal Security

Domestic order is a growing preoccupation of China's leaders. Annual reports of mass unrest continue to mount and have roughly averaged 100,000 "mass incidents" per year since 2004.31 While these incidents result from a wide variety of perceived economic, social, environmental, and legal inequities, the phenomenon is pervasive across regional and social segments of the country. Viewed from this perspective, China's domestic stability appears to be a monumental challenge.

The overarching rubric for this phenomenon is weiwen, or "maintaining stability," and it has captured the leadership's attention, as well as a growing portion of central government resources. The influence of weiwen on national policymaking was demonstrated in 2011, when the central government announced it would devote more funding to weiwen than to the entire national defense budget.32 This pivot to internal social order has profound implications for the government's ability to monitor, surveil, and ultimately control a large and complicated demography in a vast geographic space. It is probably no accident that coincident with this timeframe, China's satellite program (observation, communication, and navigation-positioning) dramatically increased.33 A robust satellite navigation system—along with other geo-location and imagery technologies—would provide the government with a powerful tool for maintaining internal stability.

Perhaps the most salient example of these developments is the advent of ubiquitous location-aware technologies. China is the largest mobile phone and Internet user market in the world, with a billion mobile phone users and more than 600 million Internet users.34 In this sense, China is the world’s largest networked society. Navigation-chip enhanced mobile phones are increasingly offering a high degree of detail on individual activities, movement, and even behavior.35 Available technology can predict with 93 percent accuracy the location of an individual after observing only three months of phone-usage patterns.36

The ability to automatically generate "personal maps" could offer society a wide range of beneficial applications in medicine, marketing, and legitimate law enforcement, to name a few.37 But as with most technologies, location-aware technology has dual-use implications. These personal maps can reveal the most salient details of people's lives—political and religious beliefs, suspicious associations, and undesirable habits. Vehicles are another target for Beidou receivers. China's Ministry of Transportation has begun requiring the installation of Beidou receivers in vehicles in many provinces, particularly in those heavily involved in the car manufacturing industry.38

Geospatial data is being developed in concert with imagery surveillance technology, including CCTV cameras. Analysts estimate there are now 30 million cameras operating in China—one for every forty-three people—and this figure is expected to grow 20 percent annually over the next five years.39 License plate recognition systems are also ubiquitous. Integrated with navigation and positioning technologies, these systems can effectively place vehicles and individuals under constant surveillance. The potential for navigational omniscience, and the intense degree of personal information collection, provides the government with an unprecedented ability to track, monitor, and ultimately control individuals in society.

In the United States, a legislative and regulatory backlash has begun against the so-called "surveillance state." For example, U.S. courts recently decided that a warrant is required to install GPS tracking devices in vehicles. Other bills have been introduced that limit the state's free rein to collect geospatial data from cell phones, as well.40 No such law exists in China, although public opposition to personal data collection is growing.41 Considering the legal and civil libertarian track record of the Chinese government, restrictions on such technologies are likely to be weak.

Natural disasters are another phenomenon that threatens the safety and livelihood of Chinese citizens. China has long suffered from floods, droughts, and earthquakes, but the populace's ability to express its dissatisfaction with the government's handling of them is something new. During the Wenchuan Earthquake of 2008, many poorly constructed public schools collapsed, causing the deaths of thousands of children and sparking national outrage. This event and several following (including the Yushu Earthquake) demonstrated the need for space-based assets (imagery, communication, and navigation-positioning) to effectively manage disaster relief. In fact, the ability of Beidou to relay short messages in the absence of cell phone availability, in addition to its navigational functions, was apparently a great boon to the program.42

Whither Beidou

The Beidou program has slowly but steadily progressed from a secondary consideration in China's space program to a centerpiece of its National Innovation System goals and a programmatic priority. Certainly, the easing of national fiscal constraints overall has not hurt Beidou's prospects. However, its political and financial support stems from other sources, as well. A robust, indigenous satellite navigation system contributes fundamentally to a number of critical economic, industrial, and science and technology policies, as well as China's external and internal security imperatives.

In this context, how should Beidou be regarded? Will it be a force for advancing social and economic development, and serve as an engine for more efficient industrial, science, and technology policies? Or will Beidou primarily operate in a security capacity, assisting the military in managing China's external threats, and serving as a coercive tool for domestic monitoring and control? Naturally, it will serve all of these objectives, though not necessarily in equal measure.

The so-called business model rationale for building Beidou—that it will become commercially and financially viable by capturing a significant portion of the regional and global navigation market—is a strong motivator and prominent in official statements. But this rationale is relatively weak, and justifying Beidou in these terms has been an afterthought. The program's initial considerations were primarily strategic in nature, with the business element only arriving much later—not unlike what happened in the United States with the development of GPS. The difference is that GPS is an established program, free of charge, and one that dominates the global and even Asian regional navigation market. The threshold for breaking into this market dominance is formidable, and China's chances of overcoming this huge disadvantage are highly uncertain. Indeed, few dividends have been seen so far for Beidou. Therefore, the business model rationale should be seen as aspirational in nature, and not a firm justification on which to embark on such an expensive program.

Despite these odds, the program seems to be gaining momentum, strongly suggesting that other motivations are at work, and additional factors suggest the security dimension of Beidou is prominent. The Beidou program, like the majority of China's space activities, is effectively owned and operated by the military. The General Armaments Department (GAD), a powerful central body in the PLA, is tasked with implementing the modernization of equipment and weapons, of which an important portfolio is the space program. There is a civilian, NASA-like agency in China, the National Space Administration, but it has little say over policy.43 In light of China's efforts to commercialize Beidou, there are many stakeholders in the Chinese government that participate in designing policy for the program.44 Yet the PLA appears to maintain a nearly exclusive purview over the program through the Beidou Office under GAD.45 A large space program with strong political and financial backing, including Beidou, is not something the PLA is willing to give up.

The military control of the Beidou program does not preclude civilian use. However, given the uncertain prospects for market viability, it seems that security applications are a priority of the program. While Beidou appears to play an increasingly central role in China's NIS policies, its impact there is partial and indirect, while its security applications are direct and critical. Moreover, if the primary purpose of Beidou were commercialization and national innovation, a military-operated management system—particularly in an opaque organization like the PLA—would hardly be the best option. To date, policies to develop Beidou have been confusing,46 and the complicated command and control arrangement within the military has severely hampered commercial application in the past.47

As with many technologies, Beidou has the potential for both civilian and military use: It could foster socioeconomic and scientific development in China, or serve as a coercive tool for domestic and international security threats. This combination of potential applications has resulted in strong political and programmatic support. However, given the aggravated state of China's external and internal security environment, the nature of the program's governance under China's military, and the state of civil legal protections in China, the development of Beidou has disquieting implications both for individuals within China's borders and for China's strategic interaction with other countries.

Notes

1 For civilian signal within region, see "Chinese Beidou Positioning Accuracy Up to Five Meters in the ASEAN Region, Exceeds that of Beijing," Science and Technology Daily, Dec 29, 2013. For military signal, see Shi Chuang et al., "Precise Orbit Determination of Beidou Satellites with precise positioning," Science China Earth Sciences, 2012, 55(7): 1079–1086.
2 Kevin Pollpeter, Patrick Besha, and Alanna Krowlikowski, "The Research, Development, and Acquisition Process for the Beidou Navigation Satellite Programs," Study of Innovation and Technology in China, Policy Brief, January 2014.
3 Bo Qingjun, the PLA officer that attended the conference, would later become head of GSD's Bureau of Surveying and Mapping, one of the principal agencies managing the Beidou system. See "探秘中国北斗卫星导航定位系统 [The Quest for China's Beidou Satellite Navigation and Positioning System]," Newsweek, Jan 1, 2011.
4 Liu Huaqing, 刘华清回忆录 [Memoirs of Liu Huaqing] (Beijing: PLA Publishing House, 2004).
5 For the Yinhe Incident, see "中國研發"北斗"的真正動力 [The Real Power in China's Development of Beidou]," Global Times Network, Dec 31, 2012. For the Taiwan Strait Incident, see Li Yun, Wu Xu, and Gao Chong, "仰望北斗星为梦想导航 [Looking up at Beidou, Dreaming of Navigation]," Xinhua, Nov 9, 2013.
6 "中国北斗导航卫星诞生记:研发时曾受经费不足困扰 [Birth of Chinese Beidou Navigation Satellite: Plagued by Lack of Funding in R&D]," China Newsweek Net, Dec 30, 2010.
7 Robert B. de Selding, "European Officials Poised to Remove Chinese Payloads from Galileo Stats," Spacenews.com, March 12, 2010.
8 For example, this is articulated by Wang Jisi in "China's Search for a Grand Strategy," Foreign Affairs (March/April 2011), pp. 68–79.
9 For example, see "国务院办公厅关于印发国家卫星导航产业中长期发展规划的通知 [State Council General Office Notification on the Issuance of the Medium and Long Term National Satellite Navigation Industry Development Plan]," October 9, 2013, No. 97, http://www.gov.cn/zwgk/2013-10/09/content_2502356.htm.
10 Ibid.
11 "美国有GPS,我们有北斗! [The United States has GPS, We have Beidou!]," Guangming Daily, Dec 28, 2011.
12 ""北斗"加速织网 明年覆盖亚太 [Beidou Network Accelerated—Will Cover the Asia-Pacific by Next Year]," Beijing Daily, April 11, 2011.
13 "GNSS Market Report: 2013," European GNSS Agency, http://www.gnss.asia/sites/gnss.asia/files/GNSS_Market%20Report_2013.pdf.
14 "北斗 中国人自己的GPS [Beidou, The Chinese People's Own GPS]," China Youth Daily, Oct 10, 2012.
15 "The News Release at the Press Conference of the State Council Information Office," Beidou Satellite Navigation System Website, www.beidou.gov.cn (accessed March 25, 2014).
16 "北斗发展面临'战国纷争' [Beidou Development is Facing the 'Warring States Disputes']," Dianzi Xinxi Chanyewang, March 20, 2014.
17 "北斗产业找不着北?杨元喜:北斗产业园区遍地开花 [Is the Beidou Industry Adrift? Yang Yuanxi: Beidou Industrial Parks Sprouting Everywhere]," People.net, March 7, 2014.
18 Patrick Besha, "National, Regional and Sectoral Innovation Systems in China: General Overview and Case Studies of Renewable Energy and Space Technology Sectors," PhD dissertation, George Washington University, submitted May 19, 2013.
19 The 863 Program was established in 1986 to stimulate the development of advanced technologies, one of which was space technologies.


20 Sylvia Schwaag Serger and Magnus Breidne, "China's Fifteen-Year Plan for Science and Technology: An Assessment," Asia Policy, No. 4 (July 2007), 135–164.
21 Including high-resolution earth observation systems, manned space flights, and the moon probe.
22 "国务院关于加快培育和发展战略性新兴产业决定 [Decision to Accelerate the Development of Strategic Emerging Industries]," 2010, No. 32.
23 "China eyes new strategic industries to spur economy," July 23, 2012, http://www.reuters.com/article/2010/12/03/us-china-economy-investment-idUSTRE6B16U920101203.
24 "国务院办公厅关于印发国家卫星导航产业中长期发展规划的通知 [Medium and Long Term National Satellite Navigation Industry Development Plan]," issued by the State Council, 2013, No. 97; "北斗卫星导航系统推广应用的若干意见今发布 [Some Opinions on Promoting the Development of Beidou Satellite Navigation Industry]," March 11, 2014, No. 1214; and "国务院办公厅关于印发国家卫星导航产业中长期发展规划的通知 [Some Opinions of the State Council on Promoting the Information Consumption to Expand Domestic Demand]."
25 Patrick Besha, "Civil-Military Integration in China: A Techno-Nationalist Approach to National Development," American Journal of Chinese Studies, 3 (2011).
26 "北斗卫星导航系统推广应用的若干意见今发布 [Some Opinions on Promoting the Development of Beidou Satellite Navigation Industry]," March 11, 2014, No. 1214.
27 As a rough comparison, in the 1990s, China launched about sixteen satellites. In the next five years, that number increased to twenty-seven. Since then, China has launched well over one hundred.
28 Andrew Nathan and Andrew Scobell, China's Search for Security (New York: Columbia University Press, 2012).
29 Eric Hagt and Matt Durnin, "Space: China's Tactical Frontier," Journal of Strategic Studies, Vol. 34, Issue 5, 2011.
30 Details on satellite launches come from the UCS database, updated January 2014, http://www.ucsusa.org.
31 Jonathan Walton, "Intensifying Contradictions: Chinese Policing Enters the 21st Century," NBR Analysis, February 2013.
32 Willy Lam, "Beijing 'Weiwen' Imperative Steals Thunder at NPC," China Brief (Jamestown Foundation), Vol. 11, Issue 4, March 10, 2011.
33 See footnote 27.
34 "China mobile subscribers up 1.1 percent in June to 1.05 billion," Reuters, July 20, 2012, and "At D11, It's Clear: China Beats U.S. in Mobile & Internet," Forbes, May 30, 2013.
35 The state and implications of this era of navigational omniscience for individuals are described in the new book by Hiawatha Bray, You Are Here: From the Compass to GPS, the History and Future of How We Find Ourselves (New York: Basic Books, 2014).
36 Chaoming Song et al., "Limits of Predictability in Human Mobility," Science, February 19, 2010, 1018–1021.
37

38 "交通部:九省区市车辆必须安装北斗导航 [Ministry of Transportation: Nine Provinces, Regions and Municipalities Required to Install Beidou Navigation]," Southern Weekend, Jan 16, 2013.
39 Frank Langfitt, "In China Beware: A Camera May Be Watching You," NPR, Jan 29, 2013.
40 For example, the Location Privacy Protection Act, introduced by Senator Al Franken in 2014.
41 "傅蔚冈:安装北斗导航何成'政治任务'? [Fu Weigang: How is Installing [on vehicles] Beidou Navigation a 'Political Task'?]," Caixin, Jan 16, 2013.
42 Author's interviews with space program officials, Beijing, August 2013.
43 Mark A. Stokes with Dean Cheng, "China's Evolving Space Capabilities: Implications for U.S. Interests," prepared for the U.S.-China Economic and Security Review Commission, April 26, 2012.
44 "中国北斗导航卫星诞生记:研发时曾受经费不足困扰 [Birth of Chinese Beidou Navigation Satellite: Plagued by Lack of Funding in R&D]," China Newsweek Net, Dec 30, 2010.
45 The full name is the China Satellite Navigation Management Office.


46 "北斗产业找不着北?杨元喜:北斗产业园区遍地开花 [Is the Beidou Industry Adrift? Yang Yuanxi: Beidou Industrial Parks Sprouting Everywhere]," People.net, March 7, 2014.
47 While GAD directs the program's launch and on-orbit servicing, the General Staff Department manages Beidou application (as well as developing operational requirements for the military) and oversees much of the ground infrastructure needed to execute that role. The potential for friction between these two departments, and the added layers of security, hamper the openness necessary for broader use. 席志刚 [Xi Zhigang], "北斗导航系统商业应用面临窘境 [Beidou System Faces Dilemma in Commercial Application]," 凤凰周刊 [Ifeng Weekly], Issue 13, 2011; and 杨时 [Yang Shi], "北斗应用:千亿元的大蛋糕 [Application of Beidou: Cake Worth a Hundred Billion Yuan]," 中国新闻周刊 [China Newsweek], Issue 1, 2011.



Understanding Interconnectivity of the Global Undersea Cable Communications Infrastructure and Its Implications for International Cyber Security

Margaret Ross

This paper analyzes the interdependent nature of the global undersea cable communications infrastructure and its implications for international security. The paper concludes that the number of landing points in a country is not a function of miles of coastline, the number of other countries to which the country connects via cable links, or the number of years the country has been a member of the international undersea cable network. However, the number of landing points correlates with the country's structural position in the network, as well as with variations in hard power and socio-political cohesion (as articulated by Barry Buzan's framework). Both political and economic motivations discourage diversifying the number of landing points. Finally, this paper discusses two potential implications. First, countries that might seek to control citizens' access to information by limiting the number of landing points increase the risk to the global undersea communications infrastructure. Second, countries with fewer landing points and high betweenness put the cable infrastructure at higher risk than similarly scoring countries with more landing points.

Introduction

Undersea cables form the physical backbone of intercontinental communications connectivity. The importance of this aspect of the global telecommunications infrastructure to the international economy well predates the coining of the term cyberspace and the United States government's conception of the global information infrastructure as part of a new cyber "domain" to be secured. Cyberspace refers to the places where digital information is created and transmitted that fuel the global economy.1 Soon after the first trans-Atlantic telegraphic cable was laid in 1858, the technology enabled European imperialism to reach all corners of the globe. The three forms of submarine communications (telegraph, telephone, and most recently fiber-optic) have come to underpin the global economy.2

Margaret Ross is a second-year M.A. candidate at the Johns Hopkins Paul H. Nitze School of Advanced International Studies (SAIS) concentrating in Strategic Studies.


As of 2006, the U.S. President's National Security Telecommunications Advisory Committee (NSTAC) estimated that undersea cables carried 95 percent of international communications traffic, including internet traffic. The rest transited over satellite. By 2010, the Institute of Electrical and Electronics Engineers (IEEE) estimated that percentage had increased to over 99 percent. Governments, along with emergency services and law enforcement, rely heavily on the smooth functioning of this communications infrastructure, especially in the event of a terrorist attack or major natural disaster. The financial services sector's reliance on this infrastructure has also grown. The Society for Worldwide Interbank Financial Telecommunication (SWIFT) reported that, in 2004, it traded 7.4 trillion U.S. dollars and processed 9 million messages per day between 208 different countries across undersea fiber-optic cables. By 2010, that number had reached 15 million messages per day.3

For a number of reasons, fiber-optic cables have supplanted satellites as the international communications mechanism of choice. The first modern fiber-optic cables were installed in 1988. The development of these cables was spurred by the inadequacies of satellites: longer latency (the time it takes a message to travel from source to destination), which interferes with the timing of voice conversations; more erratic transmission quality; and a lack of security.4 More importantly, satellites make a poor backup to today's cable infrastructure, because in the event of catastrophic failure, total satellite capacity would be two orders of magnitude smaller than the amount of traffic currently traversing cables.5

Threats to the Cable Network

Landing points, the locations where undersea cables make landfall, provide a gateway between the underwater cable infrastructure and overland communication networks. Undersea cables tend to converge at relatively few landing points, which increases the cables' susceptibility to environmental threats, such as earthquakes, and man-made threats, both accidental and deliberate. These man-made threats include shipping activity (such as the dragging of fishing nets or anchors along the ocean floor where cables lie), theft of precious metals (criminals often mistake fiber-optic cables for the older copper cables), piracy targeted at cable technicians and repair boats, and deliberate sabotage or cutting of cables, whether by common criminals or terrorists.

The worst outages to date include cables deliberately cut in the Mediterranean in 2008 and earthquake damage in the East China Sea in 2006 and 2009. While some of these outages affected hundreds of millions of customers, in most cases, Internet traffic was impaired rather than made completely unavailable, although full restoration of services took months for some customers. However, this does not rule out a catastrophic scenario caused by a more orchestrated and targeted attack that could strain the system and cause severe economic damage, especially to smaller countries. For example, in 2009 vandals severed terrestrial (not undersea) cables in San Francisco and managed to cut off phone and Internet service to thousands of customers in the Bay Area for hours. The severity of the outage indicated that the particular cables cut were likely deliberately chosen.6

A particular vulnerability of the undersea infrastructure is the lack of physical diversity in the network due to the propensity to re-use the same geographic corridors to lay cable and for cables to converge at relatively few landing sites. Because most of the undersea cable infrastructure is owned by the private sector, there are economic disincentives for companies to invest in the exploration, stand up, and maintenance of new routes and landing stations, to file new permits, and to scout for suitable beach and shallow water topography. Most importantly, companies prefer to re-use existing sites and routes because the cost of such sites is already known.7

From the public sector perspective, there is an inherent policy trade-off with respect to physical diversity. Re-use of existing paths facilitates national and international efforts to designate certain areas as cable protection zones; however, these areas then risk becoming single points of failure.8 For example, a major earthquake off the coast of Taiwan in 2006 triggered an underwater mudslide that caused an unprecedented nine simultaneous cable failures that took seven weeks to repair, impairing Internet access in seven East Asian countries.9

Securitizing an Interdependent Network

The global undersea cable communications infrastructure exhibits interdependent characteristics because no single actor has ownership over the entire network; cable routes are shared between consortiums of companies; its jurisdiction falls under both national and international laws; and coordination between companies and countries is necessary to ensure speedy repairs when outages occur. The global network can be considered interdependent because it indirectly connects all actors with all others and because there are externalities associated with security or lack of security.10

The undersea cable infrastructure demonstrates some of the challenges concealed within the concept of international security. Barry Buzan begins his book People, States, and Fear with the observation that "the logic of security almost always involves high levels of interdependence among the actors trying to make themselves secure."11 Security is about the pursuit of freedom from threat, but to successfully "securitize" an issue requires two components.12 First, a case must be made that some threat actor (such as terrorists, criminals, pirates, and negligent actors) poses an existential threat to the referent object that requires security, in this case undersea cable communications infrastructure. Second, a relevant audience, in this case the stakeholders of that infrastructure (such as governments and their constituents), must be convinced that protection of that infrastructure warrants special action outside the bounds of existing political procedure (such as special treaties and subsidizing of private industry). However, the idea of "securitizing" undersea cable infrastructure, which is both international and interdependent in nature, is fraught with immediate contradictions because the problem, according to Buzan, is that security is both a contested and inter-subjective concept about which nation states differ greatly. In particular, states vary by two major qualities: their overt power, such as a state's military and economic capabilities, and their socio-political cohesion, which can be roughly viewed as a state's domestic stability.13 Buzan lists six factors that might affect a state's degree of socio-political cohesion: political violence, intrusion of the state's police into the political lives of its citizenry, ideological conflicts over organization of the state, presence of contending national identities, lack of a clear hierarchy of political authority, and state control of media. The primary distinction between states with weak and strong socio-political cohesion is that in addition to perceived external threats, weakly cohesive states feel threatened by potential internal interference from contending ideas and domestic groups, while strongly cohesive states' conceptions of national security are mostly concerned with external interference.14

Globalized telecommunications infrastructure places weakly cohesive states in a dilemma. While these states require this infrastructure for their economic growth, they may simultaneously view it as a potential threat to government authority and a referent object to be secured against the free expression of ideas and information. Weakly cohesive states might also feel vulnerable to the potential influence that can be exercised over contending domestic groups by external political actors and thus may seek to censor their citizens' access to outside information.15 For example, countries such as China impose a strict filtering regime on content flowing in and out of the country.16

The hypothesis of this paper is that weakly cohesive states will seek to regulate their population's use of the Internet and access to external information. Therefore, these states will be more likely to possess fewer cable landing points than strongly cohesive states, since fewer landing points would facilitate such regulation and censorship.

Research Design and Data

The remainder of this paper explores the properties of the global undersea cable communications infrastructure and which factors—geography, economic resources, military capability, or level of political freedoms—correlate most closely with the number of cable landing points countries might choose to have. The data shows that diversification of landing points is not likely a function of the length of a country's coastline, the number of connections to other countries, the number of years a country has been a member of the undersea cable network, nor a function of a government's political stability. However, a country's economic resources and some of the measures for levels of political rights and civil liberties do correlate significantly with number of landing points. Additionally, an important network property known as betweenness centrality, which measures the extent to which an actor lies on the shortest path between other pairs of actors in the network, also significantly correlates with the number of landing points.

The data for this analysis was drawn from publicly available information on 166 countries, territories, and some sub-state entities, such as Alaska and Hawaii, that are geographically distinct from their country's mainland; 265 international cable links that were ready-for-service between 1989 and 2013; and the 686 landing points where these links make landfall.17 Attributes were also gathered for 115 countries, including the number of landing points per country (TeleGeography 2014), the number of years of membership in the network (calculated using TeleGeography's data),18 2012 military spending (Stockholm International Peace Research Institute 2013),19 2011 GDP per capita (World Bank 2014), kilometers of coastline (CIA World Factbook 2013),20 perceptions of political rights and civil liberties (Freedom House Index 2013 and World Governance Indicator for Voice and Accountability 2013),21 perceptions of political stability of government (WGI for Political Stability and Absence of Violence 2013),22 and the degree and betweenness centrality of each country (UCInet Software 2002). An additional attribute for level of internet and digital media freedom was obtained for forty-six countries in the dataset (Freedom House Freedom of the Net 2013).23

Military spending and GDP per capita were used to approximate Buzan's concept of overt power. The Freedom House and World Governance Indicators were used to approximate Buzan's concepts of socio-political cohesion. Length of coastline, number of years in network, and number of connections to other countries serve as control variables in the regression for other potential factors that might determine numbers of landing points.

Network Data

In addition to the internal attributes of each country, the TeleGeography data was converted into network data in which countries are connected by their cable links. Cables that make landfall at several points are broken into separate segments in the dataset. Countries share a link if each has a landing point connecting to an uninterrupted segment of cable. This network is both a physical representation of cable infrastructure shaped by limitations of geography and environment, and a social representation of the configuration of countries and the political and economic considerations that have connected them together.
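To make this construction concrete, the short sketch below (Python with the networkx library) shows how an edge list of cable segments of the kind just described could be converted into a country-level network. The segment records are illustrative placeholders, not entries from the TeleGeography dataset.

    # Minimal sketch: building a country-level cable network with networkx.
    # The segment list is hypothetical sample data, not the paper's dataset.
    import networkx as nx

    # Each pair represents two entities whose landing points share an
    # uninterrupted segment of cable, as described in the text.
    segments = [
        ("United States", "United Kingdom"),
        ("United States", "Japan"),
        ("Egypt", "Italy"),
        ("Egypt", "India"),
        ("India", "Singapore"),
    ]

    G = nx.Graph()
    G.add_edges_from(segments)

    # Degree: the number of other entities each entity connects to
    # through at least one cable link.
    print(dict(G.degree()))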

Figure 1 displays the global network with entities sized by their number of landing points. By visual inspection it is apparent that the number of landing points for each entity is unrelated to the number of connections it has to other entities in the network (its degree centrality). Globally, the median number of landing points of those countries with at least one international cable is two. An entity's position in this network carries certain implications. For example, the more diverse an entity's connections, the more insulated it is from the threat of being cut off from global communications, whether accidentally or deliberately.


It is important to note that these networks represent snapshots in time of what is a very dynamic system. For example, the number of countries linked by undersea cables more than tripled between 1979 and 2005, including twenty-eight countries newly added to the network since 1999. In 1979, countries across several continents occupied the top ten spots in bandwidth. Now, eight of those top ten countries are in Asia.24 Structural positions in the network can change significantly with the addition of a few new cables. For example, the addition of a cable from an East Asian country to a Latin American country would greatly reduce the betweenness score of the United States. As of January 2014, TeleGeography listed twenty cables that would be ready for service sometime in 2014 and 2015. These cables were not included in this analysis since they have not yet been completed.

Research Findings

A multiple regression analysis (see Appendix) evaluated whether the independent variables—kilometers of coastline, years of network membership, degree centrality (number of connections to other countries), betweenness centrality, log of military spending, GDP per capita, and stability and political rights measures—had a significant relationship to the dependent variable, number of landing points. All three approximations for aspects of Buzan's concepts of socio-political cohesion, the FH Index, WGI's Voice and Accountability Indicator, and WGI's Political Stability Indicator, individually exhibit a significant relationship with number of landing points at the .05 alpha threshold. However, when combined in a multiple regression with degree and betweenness centrality, military spending, GDP per capita, kilometers of coastline, and years of network membership, they are no longer significant. Military spending is also only significant when run in a simple regression on number of landing points, but loses its significance when other variables are added. Coastline, years of network membership, and degree centrality are never significant. The FH Index and the two WGI indicators were substituted individually in three separate multiple regressions, since they are highly correlated with each other. In all three regressions, the t-statistics for betweenness are significant at the .001 level, while in two of the regressions the t-statistics for GDP per capita are significant at the .05 level. Degree centrality, military spending, kilometers of coastline, and age of membership were not significant. Betweenness and GDP per capita both have a positive correlation with numbers of landing points, leading to the interpretation that those countries that are structurally more central to the network and which have greater economic resources also tend to have greater numbers of landing points. It is important to note that this relationship is merely a correlation, not an indication of causation.
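As an illustration of the general procedure, a regression of this shape could be run in Python with pandas and statsmodels. The file name and column names below are hypothetical stand-ins for the attributes described above, not the study's actual variable names.

    # Illustrative sketch of a multiple regression of the kind described
    # above. "country_attributes.csv" and all column names are hypothetical.
    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("country_attributes.csv")

    predictors = ["coastline_km", "years_in_network", "degree_centrality",
                  "betweenness", "log_military_spending", "gdp_per_capita",
                  "fh_index"]  # swap in a WGI indicator for the
                               # alternative specifications
    X = sm.add_constant(df[predictors])  # add an intercept term
    y = df["landing_points"]             # dependent variable

    model = sm.OLS(y, X, missing="drop").fit()
    print(model.summary())  # per-predictor t-statistics and p-values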


Figure 1. Complete Network Diagram Sized by Landing Points


For the forty-six countries in the dataset that also had a Freedom of the Net index, a separate simple linear regression was run with number of landing points as the dependent variable. The test produced a significant p-value at the .05 threshold, indicating a significant negative relationship between levels of internet freedom and number of landing points. The lower the Freedom of the Net score, the higher the level of internet freedom, so a negative relationship here implies that high levels of internet freedom are associated with high numbers of landing points.

Interpretation

For the relationship between the stability and freedom indicators and landing points, the regression analyses yield inconclusive results. There is a possible simultaneity issue, which occurs when two variables—in this case, number of landing points and the various stability and freedom indices—influence each other simultaneously. Under this logic, the greater the number of landing points, the greater a citizen's potential ability to access external information, which may correspond to a greater propensity toward openness to external information. In the reverse direction, greater levels of political freedoms and civil liberties and greater government stability may intrinsically be linked with openness and greater demand for a robust information infrastructure, which would tend toward higher numbers of landing points. The other issue is that these indices have relatively strong correlations to GDP per capita, and GDP per capita seems to bear the stronger relationship to number of landing points. Thus, the stability and freedom indices' relationship to landing points is dampened when the model controls for GDP per capita.

A set of potentially more problematic methodological issues pertains to the indices themselves. First, the FH Index, its Freedom of the Net Index, and the WGI Indicators are perception-based measures, not measures of actual levels of stability and political and civil liberties. This gives these indices a subjective quality that leaves them potentially subject to issues of groupthink and false consensus. Second, these variables serve as proxies for concepts that are not easily observable and measurable. The WGI Indicators are aggregated from many sources and then categorized into one of six governance groupings, two of which are used in this analysis. The connections between WGI's sources and the constructs they purport to measure have yet to be validated.25

An important network property known as betweenness centrality significantly correlates with the number of landing points in the multiple regression analyses. An actor has high betweenness to the extent that it lies on the shortest paths between other pairs of actors in the network. Assuming the shortest cable paths between any two countries represent the most efficient transit path of information, high-betweenness countries are structurally important to the functioning of the entire network. A severe cable outage in a high-betweenness country would affect other countries in the network more than the same outage in a low-betweenness country.26
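To make the concept concrete, here is a minimal Python sketch using the networkx library on an invented toy cable graph; the country ties below are illustrative, not TeleGeography’s data.

import networkx as nx

# Toy graph: an edge means two countries share at least one cable system.
G = nx.Graph()
G.add_edges_from([
    ("US", "UK"), ("US", "JP"), ("US", "BR"),
    ("UK", "FR"), ("UK", "PT"), ("PT", "BR"),
    ("JP", "SG"), ("SG", "IN"), ("IN", "EG"), ("EG", "FR"),
])

# Raw (unnormalized) betweenness counts shortest paths through each node,
# comparable in spirit to the 0-6,693 range reported for the full network.
scores = nx.betweenness_centrality(G, normalized=False)
for country, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{country}: {score:.1f}")

In the full analysis the graph would be built from the TeleGeography cable data rather than this toy edge list.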



However, the network disruption potential that a high-betweenness country might exhibit also depends on how easily communications traffic could be re-routed or how easily new ties could be created. Countries can have low betweenness scores for two reasons: either the country does not lie on the shortest path between any other pair of countries, or the countries it connects to are themselves directly connected.27

Figure 2 is a network diagram showing entities sized by betweenness with cable ties to ten or more other entities in the global network. The betweenness scores range from 0 to 6,693 and the United States, Italy, South Africa, Egypt, Portugal, United Kingdom, India, and France—listed in descending order—each have betweenness scores greater than 1,000 in the complete network. The United States has the highest betweenness in the network because it connects two major groupings of countries in Latin America and East Asia that are themselves not directly connected.

The tendency indicated by the regression analyses is for higher-betweenness countries to have higher numbers of landing points, a characteristic that should strengthen the network. However, a few countries in key structural positions possess fewer landing points. Countries with high betweenness but low numbers of landing points put the network at higher risk than countries with the same level of betweenness and more landing points. These high-betweenness, fewer-landing-point countries create choke points that pose a potential risk to the entire network.

Figure 3 is a scatter plot showing betweenness on the x-axis and number of landing points on the y-axis. Countries with betweenness scores greater than 1,000 are labeled. Four countries, Egypt, India, Portugal, and South Africa, have relatively high betweenness, but fewer than ten landing points each. Of this group of four, India had the highest total bandwidth in service as of 2005, meaning the largest amount of information transiting its cables.28
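The choke-point screen that the scatter plot supports can be expressed in a few lines of Python. In this sketch the betweenness and landing-point figures are invented placeholders; only the U.S. maximum of 6,693, the over-1,000 grouping, and the four under-ten countries come from the text.

# Hypothetical (betweenness, landing points) pairs echoing Figure 3.
stats = {
    "United States": (6693, 60), "Italy": (2100, 14),
    "South Africa": (1600, 6), "Egypt": (1500, 5),
    "Portugal": (1400, 4), "United Kingdom": (1300, 35),
    "India": (1200, 7), "France": (1100, 20),
}

# High betweenness with few landing points marks a potential choke point.
choke_points = [name for name, (btw, landings) in stats.items()
                if btw > 1000 and landings < 10]
print(choke_points)  # ['South Africa', 'Egypt', 'Portugal', 'India']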

Conclusion

This paper proposed several measures by which the security of the global undersea cable communications infrastructure could be evaluated. It found that amount of coastline, number of years in the network, and number of connections to other countries did not correlate significantly with the dependent variable, number of landing points. The relationship was inconclusive for some measures of overt power, such as military spending, and certain measures of stability and political freedom. Betweenness, GDP per capita, and level of internet freedom did correlate significantly with the dependent variable. However, because the Freedom on the Net index is available for just forty-six of the countries evaluated in the global cable network, this is an area for future evaluation.



Figure 2. Entities with Ten or More Cable Ties Sized by Betweenness



Figure 3. Betweenness of International Landing Connections by Number of Landing Points




While Buzan’s framework signals that weak and strong states (whether measured by overt power or socio-political cohesion) will likely differ over what constitutes “securitizing” this interdependent network, this paper does not demonstrate definitively that weak and strong states approach the issue of landing points differently. The increasing interdependency of the global telecommunications system means that it would be difficult for a country in the modern age to deliberately disrupt another’s access to global communication without impairing its own. However, this analysis shows that structural properties of the network mean that not all countries are equally insulated against this threat.



Appendix: Regression Results
(Dependent variable in all models: landings)

Variable          (1)        (2)        (3)        (4)          (5)          (6)          (7)
WGIvoice        2.665***                          0.823
                (4.19)                            (1.19)
WGIstability               1.572*                              -0.113
                           (2.41)                              (-0.16)
FHindex                              -1.196***                              -0.340
                                     (-3.44)                                (-1.01)
degree                                            0.0823       0.0671       0.0835
                                                  (0.82)       (0.67)       (0.83)
betweenness                                       0.00393***   0.00420***   0.00393***
                                                  (4.40)       (4.80)       (4.34)
log10milit~y                                      1.003        0.679        0.958
                                                  (1.36)       (0.87)       (1.30)
gdppercapita                                      0.0000637    0.0000853*   0.0000723*
                                                  (1.98)       (2.30)       (2.41)
coastline                                         0.0000281    0.0000330    0.0000291
                                                  (1.16)       (1.36)       (1.20)
age                                               0.0704       0.126        0.0819
                                                  (0.60)       (1.11)       (0.70)
FHnet                                                                                    -0.136*
                                                                                         (-2.19)
_cons           4.820***   5.066***   8.660***    -2.418       -2.613       -1.521       12.83***
                (8.06)     (7.95)     (6.77)      (-1.37)      (-1.47)      (-0.73)      (4.24)
N               113        113        113         110          110          110          46
adj. R-sq       0.129      0.041      0.088       0.499        0.492        0.497        0.077

t-statistics in parentheses; * p < 0.05, ** p < 0.01, *** p < 0.001


Notes

1 Daniel T. Kuehl, “From Cyberspace to Cyberpower: Defining the Problem,” in Cyberpower and National Security, ed. Franklin D. Kramer, Stuart H. Starr, and Larry K. Wentz, First Edition (Dulles, Virginia: Potomac Books, Inc., 2009).
2 Peter J. Hugill, “The Geopolitical Implications of Communications Under the Seas,” in Communications Under the Seas: The Evolving Cable Network and its Implications, ed. Bernard S. Finn and Daqing Yang (Cambridge, Mass.: MIT Press, 2009), 257.
3 Karl Frederick Rauscher, Proceedings of the Reliability of Global Undersea Cable Communications Infrastructure Study and Global Summit: The ROGUCCI Study and Global Summit Report (IEEE Communications Society, 2010), 179.
4 Jeff Hecht, “Fiber-Optic Submarine Cables: Covering the Ocean Floor with Glass,” in Communications Under the Seas: The Evolving Cable Network and its Implications, ed. Bernard S. Finn and Daqing Yang (Cambridge, Mass.: MIT Press, 2009), 46.
5 Rauscher, ROGUCCI Study, 137.
6 Marguerite Reardon, “How Secure is the US Communications Network?” CNET, April 13, 2009.
7 Rauscher, ROGUCCI Study, 102, and Jim Bishop and John Walker, “There is no Safety in Numbers: The Security Issues of Multiple Cable Landings,” Submarine Telecoms Forum (2006): 16-18.
8 Rauscher, ROGUCCI Study, 72.
9 Ibid., 172.
10 Stephen P. Borgatti and Xun Li, “On Social Network Analysis in a Supply Chain Context,” Journal of Supply Chain Management 45, no. 2 (2009): 5, and Howard Kunreuther and Geoffrey Heal, “Interdependent Security,” Journal of Risk and Uncertainty 26, no. 2-3 (2003): 231-249.
11 Barry Buzan, People, States, and Fear: An Agenda for International Security Studies in the Post-Cold War Era, 2nd ed. (Boulder, Colorado: L. Rienner, 1991).
12 Ibid., 18.
13 Ibid., 100.
14 Forrest Hare, “The Cyber Threat to National Security: Why Can’t We Agree?” (paper presented at the Conference on Cyber Conflict, Tallinn, Estonia, 2010).
15 Ibid., 106.
16 Edward J. Malecki and Hu Wei, “A Wired World: The Evolving Geography of Submarine Cables and the Shift to Asia,” Annals of the Association of American Geographers 99, no. 2 (2009): 360-382.
17 Cable data was obtained from TeleGeography, a telecommunications market research and consulting firm. Its submarine cable map is free and available at http://www.telegeography.com/telecom-resources/submarine-cable-map/index.html.
18 While some countries, such as the U.K. and U.S., have had underwater cable connections since well before 1989, incomplete data means the “age” of a country’s membership in this network extends only to twenty-five years, corresponding with the oldest cables in the data, which date back to 1989.
19 SIPRI provided 2012 military spending in terms of 2011 U.S. dollars; it was then logged for the purpose of this analysis. http://www.sipri.org/research/armaments/milex/milex_database
20 Central Intelligence Agency, CIA World Factbook 2013, https://www.cia.gov/library/publications/the-world-factbook/fields/2060.html
21 Freedom House describes itself as an independent watchdog organization dedicated to the expansion of freedom around the world. It produces an annual comparative assessment of political rights and civil liberties that covers 195 countries and fourteen related and disputed territories. www.freedomhouse.org/report/freedom-world/freedom-world-2013#.U1A3vhZM9UQ
22 The Worldwide Governance Indicators are produced by the World Bank Development Research Group. They are aggregate indicators that measure six dimensions of governance for over 200 countries and territories. The Voice and Accountability Indicator measures perceptions of the extent to which a country’s citizens are able to participate in selecting their government, as well as freedom of expression, freedom of association, and a free media. The Political Stability Indicator measures perceptions of the likelihood that the government will be destabilized or overthrown by unconstitutional or violent means, including politically motivated violence and terrorism. http://info.worldbank.org/governance/wgi/index.aspx#home
23 Since 2009, Freedom House has produced Freedom on the Net, an assessment of the degree of internet and digital freedom around the world, now covering sixty countries. Data is accessible at www.freedomhouse.org/sites/default/files/resources/FOTN%202013_Charts%20and%20Graphs_Global%20Scores.pdf
24 Malecki and Wei, “A Wired World,” 360-382.
25 Melissa A. Thomas, “What Do the Worldwide Governance Indicators Measure?” European Journal of Development Research 22 (2010): 31-54.
26 Borgatti and Li, “On Social Network Analysis in a Supply Chain Context,” 5, 11.
27 Stephen P. Borgatti, M. G. Everett, and Jeffrey C. Johnson, Analyzing Social Networks (Los Angeles: SAGE Publications Ltd., 2013).
28 Malecki and Wei, “A Wired World,” 360-382.



A Dangerous Trade-off: Policymaking in the Era of Big Data

Ilaria Mazzocco

Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think (Eamon Dolan/Houghton Mifflin Harcourt, 2013), 256 pp.

One of the most astonishing aspects of Edward Snowden’s revelations and the consequent congressional hearings regarding the NSA’s data collection activities was the sheer size of the program. While the notion of living in an increasingly connected and digitalized world is widely recognized, it is easy to overlook how this state of constant communication is shaping business and government and what this means for our society as a whole. Viktor Mayer-Schönberger, professor of internet governance and regulation at the Oxford Internet Institute at the University of Oxford, and Kenneth Cukier, Data Editor for The Economist, are certain that we live on the cusp of the next paradigm shift. With this in mind, the authors of Big Data: A Revolution That Will Transform How We Live, Work, and Think set out to build a framework for understanding how technology allows us to process increasingly large amounts of information, how we can use this information, and why big data may be the biggest thing since the printing press.

At the heart of the book is the idea that we are seeing a qualitative shift in knowledge as the amount of information we process increases. Sampling was a revolutionary instrument, but in many fields it may outlive its usefulness now that we have the opportunity to work with samples so large they come close to including the entire statistical population. The authors make a strong case for building a new methodology for the big data universe.

Ilaria Mazzocco is a first-year M.A. candidate at the Johns Hopkins University Paul H. Nitze School of Advanced International Studies (SAIS) concentrating in China Studies. She is an Assistant Editor for The SAIS Review.




Thanks to several convincing examples, largely from the business sector, Mayer-Schönberger and Cukier argue that, in the future, knowledge must rely less on causal links and more on imprecise data collected from an increasingly large number of sources.

It is counterintuitive that one of the greatest achievements of modern science—precision—is what we must give up in order to achieve more accurate results in the future. Big data, the authors argue, is inherently messy. The crux of their argument holds that approximation on such a massive scale produces reasonably reliable results, and often does so without the need to create models as complex as those required for smaller samples. Examples ranging from grammar correction software to credit analysis indicate that the ability to prioritize quantity over quality can make a big difference. Companies that embrace using data of dubious quality in order to achieve massive scale are more nimble and quick—crucial elements in today’s business world.

Logic may be one of the most powerful tools to be developed by the human mind, yet to use big data we may need to leave logic behind. One of the main consequences of practicing big data analysis is that we can learn relatively little about causation even as we are able to make predictions more accurately than ever before. Companies like Walmart, Target, and Amazon have successfully found patterns in their consumers’ behavior that enable them to make smarter marketing choices. These patterns do not mean these companies need, or even have the ability, to understand why women purchase unscented lotion during their third month of pregnancy, but they allow them to target the right customers. Mayer-Schönberger and Cukier believe that in a world where correlation can predict the next flu outbreak by tracking Google searches, causal links are no longer as important—and may even be misleading. The idea that causal links are becoming obsolete is one of the more significant and controversial points in the authors’ argument.

While companies may be able to part with causality as they build more effective marketing strategies, it is problematic to encourage this approach among policymakers. If society does not develop the instruments to understand why a child performs poorly in school, there is little to gain from knowing which student will start falling behind first. Moreover, as policymakers rely increasingly on correlation in social matters, it will become harder to avoid profiling.

The authors argue that big data will allow policymakers to make more accurate predictions that avoid profiling such broad—and stereotyped—categories. Nonetheless, it is hard to forget how the Bloomberg administration in New York City, lauded by the authors for its forward-looking use of data, was widely criticized by the public for its heavy-handed crime-fighting methods. In a world where the reason why one commits a crime matters less than when the crime was committed, there is a real risk that the structural causes of issues addressed through big data might remain unaddressed. While the authors recognize the risks of relying exclusively on data, they are able to offer only limited solutions.

The third consequence of relying on big data is that data begets data. The growing data-related industry is increasingly divided among those who collect the data, those who analyze it, and those who know how to use it. This is changing the playing field in business and providing exciting new opportunities that forward-looking companies like Google are especially well-poised to take advantage of. With every action we take, whether visiting a hospital, transcribing digital text on reCAPTCHA, or texting a friend, we are constantly producing more data and allowing for more correlations, and thus predictions, to be made. The ownership of this data and how it will be used may give rise to the next great monopoly or dictatorship.

The word “data” derives from the plural form of the Latin word “datum,” which loosely translates as “something that is given.” There is some irony to be found in the etymology of the word, as users are finding it increasingly difficult to monitor where and what is given and for what purpose. Mayer-Schönberger and Cukier propose moving toward an auditing-style system to monitor data usage and shifting the responsibility to companies to evaluate whether they are managing user information responsibly. The authors are also careful to warn against using big data in ways that may affect individuals’ rights. You might have a predisposition to commit a crime, but this information should never infringe on your right to a trial based on facts. These are good ideas, but are they enough?

You could make the argument that some of the United States’ legal frameworks are outdated when it comes to the risks posed by technology. From identity theft associated with the use of social security numbers online to privacy statements, these instruments must be reviewed. Yet as the authors themselves recognize, it is difficult to predict where technology will lead us. In many ways, it is easier to evaluate the positive and negative outcomes of relying on data for companies, where profits are the clear objective. In government institutions, where citizens’ lives and their environment are at stake, this clear-cut goal is harder to identify, let alone achieve. As the NSA data collection program illustrates, big data is changing more than just our book-purchasing behavior. In the evolving context of how big data can be used, Mayer-Schönberger and Cukier’s overview provides a highly accessible and useful framework that can and should be used to discuss the implications of big data for society.

Page 158: Sais.34.1


A Prophet for the Digital Heretics: Evgeny Morozov’s Quest to Debunk Silicon Valley Solutionism

Bartholomew Thanhauser

Evgeny Morozov, To Save Everything, Click Here: The Folly of Technological Solutionism (Public Affairs, 2013), 432 pp.

Is it possible to be addicted to the Internet? Evgeny Morozov seems to think so. His laptop has an easily removable Wi-Fi card, and his home has a safe with a timed combination lock into which he can throw all Internet enablers: phone, Wi-Fi card, router cable. But even that is not enough. Necessity breeds innovation, and Morozov has found a way to use screwdrivers to pry the lock apart. Now, the screwdrivers go into the safe as well.

On the surface, these probably seem like the actions of an eccentric, albeit highly determined, man. And to those who believe the Internet is an inherently positive force, Morozov probably is exactly that. However, to those who take a more critical view of the Internet, and more broadly, our society’s discourses on technology, Evgeny Morozov and his book, To Save Everything, Click Here, offer a powerful, paradigm-shifting rebuttal. Although heavy-handed at times, Morozov provides an intelligent shot of cynicism into the way we view, use, and discuss technology.

Morozov, a self-described “digital heretic,”1 offers two main critiques to articulate his worldview. The first critique takes aim at what he calls Silicon Valley’s “solutionism,” the idea that technology, or more specifically the Internet, can be the solution to all the world’s problems. Global warming? Ruthless autocrats? Endangered naked mole rats? Don’t worry, there’s an app for all of these problems, and Facebook, Google, and others are building it. Morozov’s second critique is of solutionism’s spawn—“Internet-centrism”—which is the inescapable (and often detrimental) centrality that Silicon Valley solutionists attach to the Internet. The Internet is the square-peg-in-the-round-hole that Silicon Valley uses in its endeavors to explain and improve everything.

Bartholomew Thanhauser is a first-year M.A. candidate at the Johns Hopkins University School of Advanced International Studies (SAIS) concentrating in Southeast Asian studies. He is an Assistant Editor for The SAIS Review.


Page 159: Sais.34.1



To Morozov, solutionism and Internet-centrism’s “promise of eternal amelioration” is a “digital straightjacket.”2 Not only does he reject the belief that the two are capable of solving the world’s problems, but he sees this very mindset as dangerous. It warps the role of technology in our lives and redistributes power in potentially negative ways. It is a mindset that has allowed Silicon Valley to co-opt morality and provide solutions to problems that don’t necessarily exist. Its goal of a “frictionless,” perfectly efficient society is as quixotic as it is harmful; as Morozov writes, “sometimes, imperfect is good enough; sometimes, it’s much better than perfect.”3

Morozov is no Luddite; he does not advocate that readers lock their electronics in a safe as he does, much less throw them away. His qualms are not with technology but with solutionism. He recognizes that technology is often a positive force. It is only in casting it as a panacea that our intellectual curiosity deadens, our agency weakens, and our understanding of the human condition becomes tenuous. To Morozov, the Internet is not a solution, but a series of tubes and wires.

This is a powerful statement, and Morozov gives it further potency by grounding it in history. He argues that, although the Internet has changed our lives in numerous ways, the belief that it will transform us into different humans living in different societies is a kind of “geek creation myth.” He compares the Internet to past inventions, noting that “almost every new invention is met with great expectations that it will promote human understanding.”4 Paved roads, telegraphs, radios, and televisions were all predicted to erase cultural differences, end conflict, and eliminate human misunderstandings—and none lived up to these monumental predictions. He uses the term “epochalism” (one of the seemingly hundreds of neologisms in the book) to describe the radical proclamations that follow each new invention, and he views Silicon Valley’s idolization of the Internet as simply the most recent incarnation of epochalism.

Furthermore, he charges that this current epochalism fuels misconceptions about the Internet. One by one, he works to discredit each misconception. He argues that the Internet is neither natural nor neutral. Twitter’s “trending” and Google’s autocomplete functions are not mirrors that reflect reality as their creators claim; they are manipulable algorithms that actively shape reality. Morozov even takes on popular ideas like “transparency” and “openness,” arguing that despite the hagiographic public support they enjoy, both can be detrimental to society. Calling himself an “Internet realist,”5 he describes some of the downsides of transparency (it can weaken trust, discourage communication, and punish compromise) and argues that it is a mistake to worship it uncritically.

Taking on these popular beliefs about the Internet is no easy task, but Morozov does so by using his impressive knowledge of the contemporary technological landscape. At times, Click Here reads like an encyclopedia of little-known but undeniably cool technological developments. Some of these are straw men propped up for Morozov to excoriate, while others are used as examples worthy of his praise. However, all the products he describes—ranging from “Forget Me Not” lamps that open upon touch to “the Eye Tribe,” which tracks eye movement to gauge how much content a reader absorbs on a page—leave the reader shaking his or her head in amazement. There is seemingly no limit to the diversity of bizarre yet fascinating technologies Morozov can critique.

He evaluates these inventions, and the solutionism and Internet-centrism that fuel many of them, with a caustic, entertaining wit. He calls the twelfth librarian of the U.S. Congress, Daniel Boorstin, “America’s most overrated historian.”6 He notes that seventy-nine-year-old Microsoft researcher and renowned engineer Gordon Bell “writes like an inexperienced teenager who’s not had his fair share of diverse social interactions.”7 Finally, he claims that New York Times best-selling author Jane McGonigal is “utterly confused about human experience.”8 These are not ad hominem attacks: Morozov is nothing if not thorough in his critiques. Nevertheless, it is little wonder that Morozov is a “feared reviewer of other technology pundits’ books.”9 His critiques are as entertaining as they are unrelenting.

Yet at times, Morozov seems so focused on tearing down the ideas of others that he leaves little room to build up his alternate vision of the role technology should play in the world. Morozov offers some suggestions: a world where Google, Twitter, Facebook, and their various relatives must “acknowledge [their] own immense role in shaping the public sphere, and start playing that role in a more responsible manner.”10 Furthermore, a world in which technological inventions do not solve our problems, but empower us to think critically and solve them ourselves. Lastly, a world that recognizes that just as the Internet is not inherently solution-producing, neither is Internet regulation inherently creativity-staunching.

Nevertheless, in building this alternate paradigm, Morozov occasionally falls victim to his own critiques. For someone who derides Internet-centrism, he struggles to imagine societal advances without the Internet. Furthermore, he uses the same epochalistic terms that he criticizes his contemporaries for using. Most notably, in defending his arguments, Morozov uses a few clunker examples that seem fueled more by paranoia than logic (for example, he gives serious consideration to the possibility that Amazon will replace novelists with robots), and he uses techno-jargon excessively (ranging from “technostructuralists” to “datasexuals”).

Morozov seems unusually aware of Click Here’s shortcomings. In his postscript, he offers, “on the odd chance that this book succeeds, its greatest contribution to the public debate might lie in redrawing the front lines of the intellectual battles about digital technologies.”11 This is Click Here’s biggest achievement. In challenging solutionism and Internet-centrism, Morozov has provided a crucial counterweight to the dominant views on technology that threaten to “impoverish and infantilize our public debate.”12 Rather than offer a rigid alternative to these views, Morozov shifts them, pushing contemporary discourses on technology past the misguided yet intoxicating feeling that we “are living through a revolution.”13

Notes

1 Evgeny Morozov, To Save Everything, Click Here: The Folly of Technological Solutionism (Public Affairs, 2013), xiii.
2 Ibid., xiii.
3 Ibid., xv.
4 Ibid., 292.
5 Ibid., 93.
6 Ibid., 51.
7 Ibid., 272.
8 Ibid., 308.
9 Ian Tucker, “Evgeny Morozov: ‘We are abandoning all the checks and balances,’” The Observer, March 9, 2013, http://www.theguardian.com/technology/2013/mar/09/evgeny-morozov-technology-solutionism-interview (accessed March 21, 2014).
10 Morozov, Click Here, 146.
11 Ibid., 355.
12 Ibid., 43.
13 Ibid., 357.