Government Information Quarterly xxx (2014) xxx–xxx


Using open government data to predict war: A case study of data and systems challenges

Andrew Whitmore
Department of Information Systems and Decision Science, University of South Florida, 8350 N. Tamiami Trail, SMC-C263, Sarasota, FL, United States

E-mail address: [email protected].

http://dx.doi.org/10.1016/j.giq.2014.04.003
0740-624X/© 2014 Elsevier Inc. All rights reserved.

Please cite this article as: Whitmore, A., Using open government data to predict war: A case study of data and systems challenges, Government Information Quarterly (2014), http://dx.doi.org/10.1016/j.giq.2014.04.003

Article info

Available online xxxx

Keywords: Open data; Case study; Research methods; Military conflict; Data quality; System design; Government portal

Abstract

The ability to predict future military engagements would be a boon to combatants, contracting companies, investors, and other stakeholders. While governments may seek to conceal plans for impending conflict, they must spend large sums of money mobilizing and equipping soldiers in preparation for deployment. Thus, examining government spending patterns might yield insight into future military conflict. This article reports on an attempt to explore the possibility of using open U.S. Department of Defense (D.O.D.) contracting data to identify patterns of spending activity that can predict future military engagement. The research in this article followed a two-stage method. The first stage involved the exploration of the research question in the context of a specific case, the U.S. invasion of Iraq in 2003. The second stage assessed the open government contracting data used in the research and classified data and systems problems that were encountered according to an established analytical framework for open data barriers. The analysis demonstrated that the use of U.S. D.O.D. contracting data to predict future war has promise; however, a number of problems with the data and online portal prevented the derivation of conclusive, generalizable results. These problems were related to the open data barriers of task complexity and information quality. A detailed description of how these barriers manifested and directions for overcoming them are presented.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

The ability to predict future military engagements would be a boon to combatants, contracting companies, investors and other stakeholders. While governments may seek to conceal plans for impending conflict, they must spend large sums of money mobilizing and equipping soldiers in preparation for deployment (for example, see Harrison, 1988). Thus, examining government spending patterns might yield insight into future military conflict. This article reports on an attempt to explore the possibility of using open U.S. Department of Defense (D.O.D.) contracting data to identify patterns of spending activity that can be used to predict future military engagement.

Over the last decade, the United States has been engaged in a series of military conflicts including the wars in Afghanistan and Iraq. As part of the process of getting ready for and engaging in war, the U.S. Department of Defense (D.O.D.) paid billions of dollars to private defense contracting companies for the procurement of goods and services. For example, private contractors comprised 54% of the U.S. D.O.D.'s workforce in Iraq and Afghanistan (Schwartz, 2010) and military contracts comprised 20% of all Iraq related spending in 2010 (Singer, 2010). In the U.S., the Federal Government uses private contracting companies to provide the goods and services required for the operations and activities of government agencies and offices. The goods and services are provided by government contractors (vendors) who enter into contracts with the government that specify what work needs to be performed or what goods need to be delivered. These contracts cover an enormous domain of goods and services ranging from office supplies to nuclear missile components. Although there are numerous permutations, the basic contracting process begins with a government agency posting an advertisement online that details the nature of the contract, the goods and services to be delivered, the nature of the competitive selection process (for example, lowest bid price) and other related information. The companies interested in competing for the contract submit their bids and then a selection process follows where the agency offering the contracts chooses a specific vendor (for comprehensive contracting procedures, see U.S. Government, 2005). The overwhelming majority of U.S. government agencies rely on contracting, and the U.S. Department of Defense (U.S. D.O.D.) engages in the greatest level of contracting activity by a fairly wide margin. [1]

[1] Source: https://www.fpds.gov/downloads/FPR_Reports/Fiscal%20Year%202007/Total%20Federal%20View.pdf.

Given the extent to which the U.S. D.O.D. relies upon military contractors, it seems plausible to explore the notion that the contract granting activity of U.S. D.O.D. agencies could be used as a proxy measure for classified operational activities. If there is a correlation between contracting activity and classified operational activities, then large increases in military contracts might be a leading signal for future military conflict. While other researchers have used historical data to predict military actions and outcomes (Dupuy, 1979), the use of defense contracting data for these purposes has not been previously documented. The defense contracting data used in this research was accessed through the usaspending.gov data portal. Section 4 provides more details about the data portal and its use in this research.

While assessing the feasibility of using open defense contracting data to predict future conflict, it was necessary to assess the value of the open government data itself in terms of its ability to yield unbiased, consistent and generalizable results in accordance with common research standards. This assessment of the open data helps shed light on the benefits of open government data (for example, see Kassen, 2013; Linders, 2013) as well as any barriers to the use of these data (for example, see Janssen, Charalabidis, & Zuiderwijk, 2012; Zuiderwijk, Janssen, Choenni, Meijer, & Sheikh Alibaks, 2012).

Furthermore, this research highlights the complexities involved in multidisciplinary research that involves computer science, information science and digital government. Computer science is focused on the theoretical and technical aspects of computing (Denning, 2005); in this case the efficient design and implementation of the web database system that houses the contracting data. Information science is focused on how users interact with and utilize (Borko, 1968) the government database and portal. Digital government examines the effectiveness of the government-to-citizen communication and information sharing (Marchionini, Samet, & Brandt, 2003) that the database and portal enable. There is a dependency between the individual disciplines in multidisciplinary research. For example, a poorly designed database (computer science) will adversely affect user interaction with the system (information science), which will lead to ineffective government to citizen communication (digital government). This paper helps shed light on these interactions as we progress into the open data era.

The remainder of the paper is structured as follows: Section 2 presents a review of a subset of related literature and identifies the article used as an analytical framework to model the challenges encountered when working with the usaspending.gov data and portal. Section 3 describes the method employed in this research. Specifically, the section addresses how the use of defense contracting data to predict war and open data barriers are related to one another. Section 4 presents the research case, a detailed description of the data source, the analytical approach, the analysis results, a discussion of the emergent issues that inhibited validity and generalizability, and a tie-in with open data barriers. Section 5 maps specific issues that emerged during the use of the usaspending.gov data portal to the chosen classification framework for open data barriers. A set of recommendations for how the data and portal could be improved to offset these challenges is also presented. Section 6 presents the conclusions of the study and describes how the combination of open data classification frameworks and the examination of specific data portals can be used together to provide a roadmap for improved open data implementations.

2. Related literature

Government data portals have the potential to be a great boon to academics and practitioners by allowing easy online access to vast amounts of data without the need for repeated data requests, transcription of data from print to electronic formats, and other tasks that would limit user interest and data usefulness. By opening their data, government agencies have the potential to promote transparency, increase citizen participation and spur innovation (Nam, 2011). Furthermore, open data initiatives can help citizens learn about government activities, improve government accountability, and enable citizens to participate in the political process (Janssen, 2011). Open data initiatives can also provide the data independent parties need to evaluate the quality of government policy-oriented decision making (Napoli & Karaganis, 2010). Despite all these proposed benefits, there is no clear indication that these implementations are actually successful in fulfilling their missions. While benchmarking the success of government information systems has become a common practice in the area of e-government (for example, Jansen, de Vries, & van Schaik, 2010; Whitmore, 2012a), there are currently no metrics by which to evaluate the success of government open data initiatives (Bertot, McDermott, & Smith, 2012). Going forward, government agencies might make use of a benchmarking framework based on an open government data stage model that has been proposed (Kalampokis, Tambouris, & Tarabanis, 2011).

A large number of factors could cause an open data implementation to be unsuccessful. For example, the importance of the quality of the data and the portals that contain the data cannot be overstated. The field of information systems has thoroughly demonstrated the impact of data and system quality on user adoption (for example, Venkatesh & Davis, 2000; Venkatesh, Morris, Davis, & Davis, 2004). While government data portals offer many theoretical benefits, they frequently do not receive levels of use that correspond to their potential benefits. Despite attempts by government agencies to stimulate innovative use of open data, response from external stakeholders has been tepid (Yang & Kankanhalli, 2013). While the promise of open data is great, in practice the process of making government data usable is fraught with impediments at every step (Zuiderwijk et al., 2012). These process problems can create a set of barriers that limit the usefulness of open data.

The barriers limiting user adoption of government data portals take many forms. Researchers have shown that data quality continues to be a major issue for open government statistical data (Karr, 2008). Data quality plays a critical role in the level of use of government portals (Detlor, Hupfer, Ruhi, & Zhao, 2013). Some have suggested that the release of quality data is not always a priority for government agencies due to a lack of incentives (Conradie & Choenni, 2013). Users of open government data have complained about a lack of metadata for released documents, useless or inconsistent formats, and other barriers to usability (for example, Kerschberg, 2011). These data quality issues can frustrate user attempts to make sense of the data and therefore limit data interpretability and understandability. These factors have been shown to present a barrier to accessibility and ultimately, use (Strong, Lee, & Wang, 1997).

In addition to discouraging use, quality issues can also impact accessibility. One frequently cited data quality issue, a lack of metadata, can impede retrieval of relevant information in public sector information systems (for example, Christian, 2001; Quam, 2001; Whitmore, 2012b). Also impacting accessibility, researchers have been struggling with the over-classification of information and with the recent use of new document classifications that can make records increasingly difficult to obtain (Feinberg, 2004; Strickland, 2005). Others have pointed out that data are frequently not current and there is typically a lack of opportunity for public participation in the delivery of the open data (Lee & Kwak, 2012). Other barriers impeding the usefulness of open data initiatives include a lack of domain knowledge on the part of users (King, Liakata, Lu, Oliver, & Soldatova, 2011), a lack of interoperability between datasets (Mclaren & Waters, 2011), and a shortage of the statistical knowledge required to successfully work with the datasets (Janssen et al., 2012).

The challenges mentioned earlier only touch the surface of the barriers to open data implementation success. Janssen et al. (2012) proposed a comprehensive classification scheme for these barriers. These researchers conducted a literature review and set of interviews with civil servants in Europe in order to generate a classification of adoption barriers for open government data. Their two-tier classification scheme consisted of six top-level barrier categories (institutional, task complexity, use and participation, legislation, information quality, and technical) and numerous specific barriers associated with each top-level category. This research utilizes this classification framework to illustrate the barriers experienced when working with the usaspending.gov data portal.


3. Methodology

The research in this article follows a two-stage approach. The first stage involves the presentation of the research case and the second stage classifies data and systems issues encountered when performing the research according to an established analytical framework. Specifically, the two research stages entail:

1) In the first stage, a thorough description of the case along with its data source and analytical techniques is presented. Case studies are best used to explain "how" or "why" some social phenomenon works (Yin, 2009). In the particular context of this research, the case approach will be used to explore how D.O.D. contracting behavior supported and can be used to predict the preparations for and execution of the U.S. invasion of Iraq in 2003. The case approach allows for the inclusion of contextual detail such as the particular D.O.D. agencies involved in war preparations, the dollar amount and spending patterns of their contracting activities, and the materials and services that were acquired through the contracting. This case was selected because the invasion of Iraq in 2003 was one of only two military conflicts that fell within the time frame for which data were available. The other conflict that fell within the data time frame was the invasion of Afghanistan. However, since that conflict was initiated almost immediately after the events of September 11, 2001, there would not have been enough time for the U.S. D.O.D. to go through the process of advertising contracts, reviewing bids and selecting winners before the conflict began. That fact made the war in Afghanistan a less useful case for examining whether or not changes in contracting activity could be used to predict military conflict in advance. Case studies can also be used as part of larger, mixed method studies (Yin, 2009) where they may play a role in enriching and contextualizing the results of statistics-driven empirical research. Mixed method studies involving statistical analysis can add power to the research by establishing the statistical significance and generalizability of the results. However, certain conditions related to data quality and availability must be met in order for a statistical analysis to be performed as part of a mixed method analysis. The case conclusions make the connection from the data and systems problems that were identified to the problem of open government data barriers and their impact on mixed method multidisciplinary research.

2) In the second stage, the data and systems problems encountered when working with the data are classified according to the open data barriers as described by Janssen et al. (2012). Working within the framework established in that article, a thorough assessment of the performance of the usaspending.gov portal along the dimensions of task complexity and information quality is presented. While the article presents a variety of barrier dimensions, the dimensions of task complexity and information quality are barriers from the user's perspective while many of the other dimensions described in the article are barriers from the data provider's point of view (Janssen et al., 2012). Since this research is from the perspective of a user who has no institutional knowledge of the inner workings of the agency that manages the data used in the research, the analysis is confined to the user-centric dimensions of task complexity and information quality. Finally, a set of recommendations for mitigating the open data barriers observed in this research is presented.

4. Research case

This case represents an attempt to use open government data to explore whether non-classified U.S. D.O.D. contract spending patterns can be used as a leading indicator of military engagements. In particular, the research examines pronounced changes in U.S. D.O.D. agency contract spending in two periods: (1) just prior to the congressional authorization to attack Iraq in October 2002 and (2) just prior to the actual beginning of the invasion of Iraq in March 2003. Atypical patterns in contract spending around these reference points may indicate that contract spending might be a leading indicator of declarations of war or the initiation of armed conflict.

The congressional authorization to attack Iraq, formally known as the "Joint Resolution to Authorize the Use of United States Armed Forces Against Iraq," was introduced to Congress on October 2, 2002. It was passed by the House on October 10, 2002, passed by the Senate on October 11, 2002 and signed into law by President Bush on October 16, 2002. This research uses the October 2, 2002 date as the reference date for the authorization of force since this date marks the official beginning of the authorization process. A ramp-up in contracting activity prior to the authorization of military force could potentially:

(1) indicate that war is considered by the government to be a foregone conclusion;

(2) indicate that a particular U.S. D.O.D. agency or office plays a role in the very early stages of war preparation;

(3) be used as a basis to initiate defensive preparations or investments in materials or organizations whose value tends to rise during armed conflict.

This research is also interested in pronounced changes in contract activity after the authorization for war, but before the initiation of armed engagement. A ramp-up in contracting activity after authorization but prior to military action could potentially:

(1) indicate that the agency or office in question may lie on the critical path for military deployment;

(2) reflect information about the current needs or deficiencies in military forces based on the nature of the contracts;

(3) be used to estimate the amount of time remaining until military engagement commences.

4.1. Open data source

A variety of sources provide information on U.S. D.O.D. contracts. The websites www.defense.gov/contracts and www.FedBizOpps.gov both provide information on current U.S. D.O.D. contract opportunities. The usaspending.gov portal provides historical information on U.S. D.O.D. contracts. All contracts issued by U.S. D.O.D. agencies, offices and installations are recorded in the usaspending.gov records. Data on U.S. D.O.D. spending from usaspending.gov were retrieved for years 2000–2003, which cover the two time periods of interest (the authorization for war in October 2002 and the initiation of armed conflict in March 2003). The data used in this research were retrieved on July 31, 2012.

The data for each year were downloaded from the usaspending.gov portal as a table. Each row of the table represented a unique contract. The columns of the table presented a wide array of information on each contract including information about: the recipient of the contract, the agency that granted the contract, the contract time frame (start and end dates), the type of contract and when it was signed, the amount of money paid for the contract work, the location where the contract work was performed, the government agency that purchased the contract work (frequently the same as the agency that granted the contract), details of the vendor performing the contract work, the general kind of work being performed, the competitive basis for the awarding of the contract, and any legislative mandates that apply to the contracted work.
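To make the structure of a downloaded table concrete, a minimal sketch of loading and inspecting one fiscal year of the data follows. The file name, and the use of Python with pandas, are illustrative assumptions rather than part of the original study.

    # Sketch: load one fiscal year of the downloaded contract data and inspect it.
    # The file name is an assumption; the real downloads are named by the portal.
    import pandas as pd

    contracts = pd.read_csv("dod_contracts_fy2002.csv", low_memory=False)

    print(contracts.shape)               # (rows, columns): one row per contract
    print(list(contracts.columns)[:10])  # a few of the many descriptive fields
    print(contracts.head(3))             # sample records for a quick sanity check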

4.2. Open data analysis

For each U.S. D.O.D. agency represented in the data, the daily total U.S. dollar value of the contracts it issued was calculated for 2000–2003. Summary statistics for total daily contract amounts for each agency were determined, and agencies that had instances of total daily spending that exceeded +3 standard deviations for that respective agency during 2002–2003 were flagged. These agencies whose time series contained outliers of more than +3 standard deviations were graphed using STATA 11 IC and the time series were visually inspected to establish the correspondence of the outliers to the two reference points previously mentioned, the authorization for war and the initiation of the invasion.
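The published analysis was performed in STATA 11 IC; purely as an illustration of the procedure just described, the following sketch computes per-agency daily totals, flags days more than three standard deviations above each agency's mean, and relates the flagged days to the two reference dates. The input file and column names are assumptions.

    # Illustrative sketch of the outlier-flagging step described above; the study
    # itself used STATA 11 IC. The input file and the column names ("agency",
    # "signed_date", "amount") are assumptions, not the portal's actual field names.
    import pandas as pd

    REFERENCE_DATES = [pd.Timestamp("2002-10-02"),  # authorization introduced to Congress
                       pd.Timestamp("2003-03-19")]  # invasion of Iraq begins

    contracts = pd.read_csv("dod_contracts_2000_2003.csv",
                            parse_dates=["signed_date"], low_memory=False)

    # Total dollar value of contracts signed by each agency on each day.
    daily = (contracts.groupby(["agency", "signed_date"])["amount"]
                      .sum()
                      .reset_index())

    # Flag agency-days exceeding the agency's mean daily total by more than 3 standard deviations.
    stats = daily.groupby("agency")["amount"].agg(["mean", "std"]).reset_index()
    daily = daily.merge(stats, on="agency")
    daily["outlier"] = daily["amount"] > daily["mean"] + 3 * daily["std"]

    # Keep 2002-2003 outliers and measure how close each falls to a reference date.
    in_window = daily["signed_date"].between(pd.Timestamp("2002-01-01"),
                                             pd.Timestamp("2003-12-31"))
    flagged = daily[daily["outlier"] & in_window].copy()
    flagged["days_to_nearest_reference"] = flagged["signed_date"].apply(
        lambda d: min(abs((d - ref).days) for ref in REFERENCE_DATES))
    print(flagged.sort_values(["agency", "signed_date"]).head(20))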

The analysis yielded a number of U.S. D.O.D. offices and agencies that exhibited outlier behavior around the two reference points. The author purposively selected a subset of these organizations for presentation in the article. The selection was based on the author's judgments about the possible relevance of each office or agency in terms of the research question. These organizations are presented as examples rather than as a comprehensive set of results in order to meet reasonable limitations on the number of figures and in light of the results, which show that a comprehensive, generalizable test of the research question is not feasible given the limitations present in the data source, which will be addressed in Section 4.4.

4.3. Case results

The results in this section demonstrate the patterns of behavior of four U.S. D.O.D. agencies around the two reference points judged important to the ability to predict military engagement: (1) just prior to the congressional authorization to attack Iraq in October 2002 and (2) just prior to the actual beginning of the invasion of Iraq in March 2003. Many details of the exact role each agency plays in the larger U.S. D.O.D. organization are understandably classified due to the role the military plays in national security. However, in order to provide some context for each of the examples, Table 1 presents public information about the operational role of each agency.

Fig. 1 presents the contract spending behavior of the four U.S. D.O.D. agencies that were selected for presentation in this research.

Table 1
Selected D.O.D. agency details and missions.

Usaspending.gov agency name: AAC/PKO Operational Contracting. Full name: U.S. Air Force Air Armament Center/Peace Keeping Operations. Mission statement: Air Armament Center (AAC) was an Air Force Materiel Command (AFMC) center at Eglin Air Force Base, Florida, responsible for development, acquisition, testing, and deployment of all air-delivered weapons for the U.S. Air Force. The center was inactivated in 2012. (Source: http://en.wikipedia.org/wiki/Air_Armament_Center)

Usaspending.gov agency name: USA Engineer District Baltimore. Full name: U.S. Army Corps of Engineers, Baltimore District. Mission statement: Operates corporately to deliver innovative and effective solutions to our customers' engineering challenges in a manner consistent with our values and our principles of environmental stewardship. (Source: http://www.nab.usace.army.mil/About/MissionandVision.aspx)

Usaspending.gov agency name: Virginia Contracting Activity. Full name: Defense Intelligence Agency Virginia Contracting Activity. Mission statement: Because of its unique mission, the Defense Intelligence Agency conducts its contracting business through the Virginia Contracting Activity, also known as VACA. Through this office, DIA acquires the necessary products and services required to support its combat support mission. (Source: http://www.fas.org/irp/dia/vaca/)

Usaspending.gov agency name: Combat Direction Systems Activity. Full name: U.S. Navy Combat Direction Systems Activity. Mission statement: Support the mission of the Dahlgren division of the Naval Surface Warfare Center by providing force-level integrated and interoperable engineering solutions, mission critical control systems, and associated testing and training technologies to meet maritime, joint, special warfare and information operation requirements related to surface warfare. Execute other responsibilities as assigned by the Commander, Dahlgren Division, Naval Surface Warfare Center. (Source: http://www.navsea.navy.mil/nswc/damneck/content/mission_vision.aspx)

Fig. 1. Patterns in contract spending for sample D.O.D. agencies and offices.

4.3.1. Pronounced changes in contract spending prior to authorization for war

The top row of Fig. 1 presents two examples of agencies that experienced spikes in contract spending just prior to the congressional authorization for war (represented by a dashed line in the figure). These two agencies are the U.S. Air Force Air Armament Center/Peace Keeping Operations and the U.S. Army Corps of Engineers, Baltimore District. Generally speaking, agencies that experienced large ramp-ups in contract activity just prior to the authorization of war tended to do so only a day or two prior to the authorization, most frequently on October 1, 2002. This trend may be somewhat misleading however, as the date associated with each contract is the date it was signed, not the date it was offered. The date the contract was offered would be a more useful piece of information since it would provide a more accurate idea of the extent to which contract activity at a given agency was a leading indicator for war. However, this information is not available from usaspending.gov.

4.3.2. Pronounced changes in contract spending prior to the invasion of Iraq

The bottom row of Fig. 1 presents two examples of agencies that experienced heightened contract spending after war was authorized (dashed line on the left) but before the actual invasion began (dashed line on the right). These two agencies are the Defense Intelligence Agency's Virginia Contracting Activity and the U.S. Navy Combat Direction Systems Activity. As one might expect, a number of U.S. D.O.D. agencies experienced abnormal levels of contract spending after Congress authorized war on Iraq. What differentiated contract spending patterns in this group from the previous group is the wide variety of spending patterns present among the agencies. For example, the Defense Intelligence Agency's Virginia Contracting Activity demonstrated a sustained build-up of contract spending that culminates in a peak just prior to the actual attack on Iraq. By contrast, the U.S. Navy Combat Direction Systems Activity shows an entirely different pattern of two outlier points and no consistent ramping-up of spending. The degree of diversity among the patterns of contract spending prior to the invasion of Iraq contrasts with the relative uniformity of the leading indicators of the authorization for war, which generally showed an uptick on or about October 1, 2002, one day before the resolution was formally introduced to Congress.

4.3.3. Types of contracts issued by the example agencies

In addition to the patterns of contract spending around the authorization for war and the invasion of Iraq, one might also be interested in the nature of the contracts themselves. The usaspending.gov data include a generic North American Industry Classification System (NAICS) code that classifies the major contract activities. While it may be useful to have this information, the generic nature of the NAICS codes makes it difficult to surmise the exact nature of the contracts.

Generally speaking, increased contract spending at agencies just prior to the introduction of the congressional authorization for war with Iraq tended to be rather heavily focused on infrastructure improvement. For example, as shown in Table 2, the two examples given as possible leading indicators of the authorization for war experienced large increases in infrastructure contracting such as electrical work, building construction, and facilities support services. One plausible explanation for this is that they were getting ready for war by improving depreciated infrastructure or adding additional infrastructure in order to support what they perceived to be an inevitable increase in capacity demand. Given the sub-optimal infrastructure conditions at many military facilities (Association of the United States Army, 2001), a revamp prior to war might have been viewed as a necessity.

Generally speaking, increased contract spending at agencies just prior to the invasion of Iraq was frequently focused on information and communication technology. For example, as shown in Table 2, the two examples given as possible leading indicators of the invasion of Iraq experienced large increases in investments in computer hardware, software and communications equipment. This kind of technical equipment is obviously necessary to support a large number of troops operating overseas. However, the U.S. D.O.D. has experienced a lack of equipment and support in these areas. For example, U.S. Marines in Iraq complained about a lack of engineering and communications equipment (Bender, 2005) and the U.S. D.O.D. has had great difficulty modernizing parts of its outdated computer hardware and software infrastructure (Nolan, 2012). Both factors suggest that large investments in these areas would be needed to facilitate the C4ISR (Command, Control, Communications, Computers, Intelligence, Surveillance and Reconnaissance) activities involved in launching an overseas invasion.

Table 2
Examples of spending by leading indicator.

Leading indicator: Authorization (October 2, 2002). Office ID: GM02. Usaspending.gov agency/office name: AAC/PKO Operational Contracting. Example contract categories: Electrical contractors; Construction; Waste treatment and disposal.

Leading indicator: Authorization (October 2, 2002). Office ID: CA31. Usaspending.gov agency/office name: USA Engineer District Baltimore. Example contract categories: Facilities support services; Engineering services; Highway, street and bridge construction.

Leading indicator: Engagement (March 19, 2003). Office ID: ZD50. Usaspending.gov agency/office name: Virginia Contracting Activity. Example contract categories: Custom computer programming services; Research and development; Other communications equipment manufacturing.

Leading indicator: Engagement (March 19, 2003). Office ID: BW11S. Usaspending.gov agency/office name: Combat Direction Systems Activity. Example contract categories: Engineering services; Switchgear and switchboard apparatus manufacturing; Radio and television broadcasting and wireless communications equipment manufacturing.

4.4. Case conclusions and relationship to open data barriers

The research question asked whether non-classified U.S. D.O.D. contract spending can be used as a leading indicator of military engagements. Despite some promising circumstantial evidence, this analysis is unable to conclusively answer this research question. Specifically, two primary factors prevented the analysis from conclusively demonstrating empirically rigorous results. First, the usaspending.gov data only go back to FY 2000 and while the U.S. has been quite active militarily in that period relative to other countries, there are still not a sufficient number of "events" needed to determine whether a specific agency's contract spending patterns may indicate future armed conflict. Second, the data that do exist are fraught with noise and unrelated and perhaps non-public events that are difficult to control for and that thus make it difficult to determine whether changes in contract spending are actually related to the event in question. This is primarily due to a lack of contextual information about the motivations and specific requirements of the contract.

In addition to these two primary factors, which by themselves would have been sufficient to prevent research validity and generalizability, a host of other issues with the open data and portal were encountered during the performance of the research. The multitude of issues that were encountered was due not to the inherent intractability of the research question, but to a set of challenges present in the open data used in the research. Specifically, the author encountered two classes of barriers when working with the open data: barriers related to task complexity and barriers related to information quality (Janssen et al., 2012). These issues and their relationship to open data barriers and multidisciplinary research standards are presented in the next section.

5. Challenges with the data and portal

The case yielded some circumstantial evidence that public U.S. D.O.D. contracting data can be used to indicate impending conflict. However, a number of issues with both the usaspending.gov data and the portal through which the data were accessed were identified that posed significant challenges and limited the validity and generalizability of the research. These issues were related to two categories of open data barriers: task complexity and information quality. A summary of the findings related to these barriers is presented in Table 3.

A detailed description of these barriers and how they manifested in the usaspending.gov portal is presented in Sections 5.1 and 5.2.

Table 3
Usaspending.gov challenges and corresponding open data barriers. Categories and barriers are from Janssen et al. (2012).

Task complexity
• Barrier: Data formats and datasets are too complex to handle and use easily. Usaspending.gov example: Data tables contain millions of rows and hundreds of columns; only delimited files and XML formats available for download.
• Barrier: Apps hiding the complexity but also potential other use of open data. Usaspending.gov example: Some data available for download are not available through the user-friendly interactive portal features.
• Barrier: Focus is on making use of single datasets, whereas the real value might come from combining various datasets. Usaspending.gov example: No database schema is provided; no information on source database tables.
• Barrier: Contradicting outcomes based on the use of the same data. Usaspending.gov example: Data dynamism issues.

Information quality
• Barrier: Lack of information. Usaspending.gov example: Missing values (foreign keys); lack of comprehensive metadata.
• Barrier: Lack of accuracy of the information. Usaspending.gov example: Inconsistent data values.
• Barrier: Incomplete information, only part of the total picture shown or only a certain range. Usaspending.gov example: Only years 2000–2013 available.

5.1. Task complexity

Barriers related to task complexity impose a set of obstacles that can limit a user's ability to utilize the data efficiently and effectively. The usaspending.gov portal suffered from a number of these barriers, some of which were overcome by the author but others imposed some fundamental limitations on the research described in the case.

5.1.1. Data formats and datasets are too complex to handle and use easily

The datasets used in the research case were very large, containing up to millions of rows and hundreds of columns. The size of the datasets makes the importance of the available download formats very significant. There were four file formats available for downloading: CSV and TSV (Comma or Tab Separated Values), the web syndication XML language Atom, and some generic format simply labeled "XML." It is not clear which XML vocabulary this last option was intended to map to. Files downloaded in this format contained no references to any existing standardized namespaces or schemas, which would complicate the task of integrating these data with other government data through a web services implementation.

Furthermore, the sheer size of the files rendered them too large to be easily imported into common desktop productivity software such as MS Office. In the research case, the author downloaded the files in CSV format and attempted to import them into a database management system powerful enough to handle them, MySQL. In the first attempt, the author tried to import the files through the phpMyAdmin graphical front end, only to find out that the files were too large to be input through this method due to the way phpMyAdmin uses "extended inserts" and some of the default time-out settings in the system configuration files. In the end, the author wrote a Java program to transform the CSV files into SQL files and then imported them directly through the command line interface to MySQL. While there may have been other approaches to getting the data into a database management system, the point is that a reasonable level of technical expertise was required to transform the data from its native format into something that can be made usable by the researcher.

The choice of the file formats, in the opinion of the author, reflected a bias towards researchers and assumed a level of technical competence less frequently encountered among journalists, students, or other concerned citizens. An illustration of this is the availability of the data in the XML formats, Atom and generic XML. Raw XML is not a format typically used by the average citizen. At a minimum, a user would need to create an XSLT stylesheet to even view the data in an easily readable format. That of course would require the user to be able to code in XSLT. While it may be the case that at the moment most of the users of the usaspending.gov portal are researchers with strong technical skills, the lack of other more accessible data formats poses a barrier to wider use. For example, making the data available in an assortment of SQL formats that could be directly imported into various database management systems would have saved the author time in his research case.
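For illustration, the kind of transformation the author describes (his version was a Java program) can be sketched in Python as follows; the table name, batching, and quoting are simplified assumptions, and a fuller script would also emit a CREATE TABLE statement with proper column types.

    # Sketch of the CSV-to-SQL transformation step; the author's version was written
    # in Java. The table name, batch size and quoting below are simplified assumptions.
    import csv

    def csv_to_sql(csv_path, sql_path, table="dod_contracts", batch=1000):
        with open(csv_path, newline="", encoding="utf-8") as src, \
             open(sql_path, "w", encoding="utf-8") as dst:
            reader = csv.reader(src)
            header = next(reader)
            cols = ", ".join("`%s`" % c.strip() for c in header)
            rows = []
            for row in reader:
                values = ", ".join("'%s'" % v.replace("'", "''") for v in row)
                rows.append("(%s)" % values)
                # Write in modest batches so no single INSERT statement becomes unmanageably large.
                if len(rows) == batch:
                    dst.write("INSERT INTO %s (%s) VALUES\n%s;\n"
                              % (table, cols, ",\n".join(rows)))
                    rows = []
            if rows:
                dst.write("INSERT INTO %s (%s) VALUES\n%s;\n"
                          % (table, cols, ",\n".join(rows)))

    # Example usage: csv_to_sql("dod_contracts_fy2002.csv", "dod_contracts_fy2002.sql"),
    # followed by something like:  mysql -u user -p contracts_db < dod_contracts_fy2002.sql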

5.1.2. Apps hiding the complexity but also potential other use of open data

As a supplement to the raw data download capabilities of the usaspending.gov portal, the developers implemented a user-friendly interactive environment (the "Trends" tab on the site's global navigation menu) where users can work with forms to filter or query the raw data by government agency, state, year, and other criteria. The results are returned as either a data table or a graph. This feature is essential to ensure that the average citizen can access and use the data given the complexities of working with the raw data that were previously described. However, this feature only allows users access to a small portion of the data contained in the raw datasets. Specifically, only a small subset of the government agencies and offices covered in the raw data are accessible for querying in this interactive environment. Furthermore, for those agencies and offices that are covered, only very narrow information is returned by the system, frequently only annual contracting expenditures. However, the raw data contain over 200 columns that give a much more comprehensive and rich set of information about each transaction, almost none of which is available through this interactive capability.

This limitation forms a virtual digital divide between those who have the skills and time to work with the raw data and those who do not. With respect to the research case, none of the graphs that were generated in the research could have been generated online through the "Trends" section of the portal, as those agencies were not included in the available list. Instead, generating them required a MySQL database and a dedicated statistical package, both potential barriers to access and use. Thus, while the "Trends" section did a good job hiding the complexity of the underlying data, it simultaneously made the data less useful.

5.1.3. Focus is on making use of single datasets, whereas the real value might come from combining various datasets

The raw data downloaded from the usaspending.gov portal come as single datasets. If one selects "complete" in the data detail field, each of these datasets will contain over 200 columns. These columns cover a vast amount of information related to each contract. It is the belief of the author, based on his years of experience working with databases, that these datasets are essentially prefabricated query results run against a much larger database that contains many different tables joined together through primary and foreign keys. As supporting evidence of this assertion, the author points out that the downloaded tables are not even in second normal form. This implies that either the author's previous assertion is correct or the design of the government's database is grossly inept. The author finds the latter proposition to be unlikely.

While the data available for download through the portal may have come from a variety of different sources, no information about how to obtain the source data was presented. For example, there were many foreign keys present in the data with telltale naming conventions such as field names that contained "id," for example "solicitationid." In this particular case, the foreign key "solicitationid" refers to a table containing information about the advertisement for the contract that was published as part of the contract bidding process. Information contained in that advertisement might very well help to provide the context necessary to establish that a contract was related to war preparation efforts rather than to some non-related cause. The inability to establish the context for different patterns in contract spending severely limited the conclusions of the research case. Thus, the site could be improved by providing a database schema that illustrates the architecture of the broader database from which the query results available for download on the portal are drawn. With more information about the foreign keys present in the data and the external data to which they are related, the author could have attempted to locate the government portal that contained this additional information (if it was available), downloaded the data, connected the datasets and presented a more compelling argument that the contract spending in question was in fact intended for war preparation activities.
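To illustrate the kind of linkage a published schema would enable, the sketch below joins the contract table to a hypothetical solicitation table on "solicitationid"; the solicitation file, its columns, and the keyword filter are assumptions made for illustration, since no such dataset was located on the portal.

    # Hypothetical illustration of combining datasets through the "solicitationid"
    # foreign key. The solicitation file, its columns, and the keyword filter are
    # assumptions; no such dataset appears to be published by the portal.
    import pandas as pd

    contracts = pd.read_csv("dod_contracts_fy2002.csv", low_memory=False)
    solicitations = pd.read_csv("dod_solicitations.csv", low_memory=False)  # hypothetical source

    # Left join: keep every contract, attach the advertisement text where the key is present.
    merged = contracts.merge(
        solicitations[["solicitationid", "solicitation_text"]],
        on="solicitationid",
        how="left",
    )

    # Contracts whose solicitation text mentions plausible war-preparation terms.
    is_war_related = merged["solicitation_text"].str.contains(
        "deployment|mobilization|theater", case=False, na=False)
    print(merged.loc[is_war_related, ["agency", "amount", "solicitationid"]].head())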

5.1.4. Contradicting outcomes based on the use of the same data

One factor complicating the analysis described in the case was that the government data appeared to be in a constant state of change. For reasons that are not clearly explained on the website, the number of data rows per table changes over time even for years well in the past. For example, on July 31, 2012 the table of U.S. D.O.D. contracts for FY 2005 (not used in this research) contained 1,420,625 rows. On August 2, 2012, the same table contained 1,420,643 rows. On August 25, 2012 the table contained 1,420,839 rows. On March 30, 2013 the table contained 1,421,593 rows, and on July 11, 2013 the table contained 1,421,859 rows. Thus, a dataset containing records for a time frame well in the past underwent a change of 1234 records in a little under a year. The author was able to find a partial explanation of the changes to these tables in the database dictionary. The database dictionary explained that data available through the system undergo regular modifications (insertions/updates/deletions), although no clear explanation for why this was happening was presented. Since the usaspending.gov site is an archive rather than an operational data store, these kinds of changes, especially deletions, seem peculiar. It is also somewhat perplexing to the author that these changes are being made on a regular basis to data that are quite old.

Changing data can present serious challenges to analysis, particularly statistical analysis, when new or changed data alters or invalidates the conclusions previously reached (and published) by scholars. Thus, two scholars working with these data on two different dates could possibly arrive at contradicting research outcomes.
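One defensive practice, sketched below under assumed file names and an assumed unique identifier column, is to archive dated snapshots of each download and diff them, so that any drift in the underlying records is at least documented alongside the analysis.

    # Sketch: compare two snapshots of the same fiscal-year table downloaded on
    # different dates. The file names and the unique identifier column
    # ("unique_transaction_id") are assumptions for illustration.
    import pandas as pd

    old = pd.read_csv("dod_contracts_fy2005_2012-07-31.csv", low_memory=False)
    new = pd.read_csv("dod_contracts_fy2005_2013-07-11.csv", low_memory=False)

    print("row count change:", len(new) - len(old))

    old_ids = set(old["unique_transaction_id"])
    new_ids = set(new["unique_transaction_id"])
    print("records added:  ", len(new_ids - old_ids))
    print("records removed:", len(old_ids - new_ids))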

5.2. Information quality

Barriers related to information quality affect a user's ability to meaningfully utilize the data and have implications for the quality of the data provider's record keeping. A number of barriers related to information quality were encountered in the case.

5.2.1. Lack of accuracy of the information

Unless one is able to compare the accuracy of government data against one or more alternative sources, it can be difficult to get a true idea of the level of accuracy present in the data. However, when the data present in the source are not internally consistent, accuracy problems become rather obvious.

A number of inconsistencies in the usaspending.gov data were readily observable. For example, vendor names were frequently spelled in several different ways and on numerous occasions different numbers of employees and annual revenue figures were provided for a single vendor in the same time period. These data inconsistencies can complicate or prevent data analysis. For example, if one wished to examine the relationship between the size of a vendor (in terms of employees or annual revenue) and the size of the contracts they tend to receive, one would run into a methodological issue using the data due to its inconsistencies.
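A rough way to surface such inconsistencies is to normalize vendor names and count how many distinct attribute values each normalized vendor reports; the sketch below assumes illustrative column names rather than the portal's actual field names.

    # Rough sketch for surfacing vendor-level inconsistencies. The column names
    # ("vendor_name", "number_of_employees", "annual_revenue") are assumptions.
    import pandas as pd

    contracts = pd.read_csv("dod_contracts_fy2002.csv", low_memory=False)

    # Crude normalization: lowercase, strip punctuation, collapse whitespace.
    contracts["vendor_key"] = (contracts["vendor_name"]
                               .str.lower()
                               .str.replace(r"[^a-z0-9 ]", "", regex=True)
                               .str.split()
                               .str.join(" "))

    # Vendors reporting more than one distinct employee count or revenue figure in the file.
    counts = (contracts.groupby("vendor_key")[["number_of_employees", "annual_revenue"]]
                       .nunique())
    print(counts[(counts > 1).any(axis=1)].head())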

While these inconsistencies raise questions, the true accuracy of the overall data is not measurable for a variety of reasons, including the fact that some of the data are collected and maintained by the government alone, thereby preventing any further external validation. Data inconsistencies like this are frequently observed in the absence of a correctly formulated database architecture. For example, relational databases are designed to prevent data redundancy, a condition where different entities maintain different copies of the same data. Data redundancy can result in discrepancies or inconsistencies in the data that can be mitigated by employing a single, centralized database. Without knowledge of the structure of the database from which these datasets are drawn, it is not possible to determine exactly why there would be inconsistencies in the data. As mentioned before, the publication of a database schema would help identify the source of these issues.

5.2.2. Lack of information

There was a lack of necessary information in both the data made available through the usaspending.gov portal and on the portal itself. In terms of the portal, there were several issues including incomplete metadata. While there was a database dictionary available on the usaspending.gov site, the dictionary was basically limited to providing definitions of the field (column) names in the database. In May 2013, the database dictionary was updated to include an additional field labeled "length" that simply contained an integer number. Presumably, this number would signify the maximum number of characters allowable in each field. However, the database dictionary lacked other important metadata such as the data types (i.e., integer, real, Boolean, char, text) of each field as well as any information about required data formats (i.e., particular date standards and units of measure) or constraints. This missing information prevented further data validation and forced the researcher to make guesses about data types when importing the data into the MySQL database. These assumptions could result in inefficient memory storage, which can be important when working with very large tables, as well as possible type mismatches if the researcher was attempting to join these data with data from other sources.
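Lacking documented types, one can at least make the guesses explicit at import time; in the sketch below every declared type and column name is itself an assumption, which is exactly the problem the missing metadata creates.

    # Sketch: declaring column types explicitly at import time. Because the data
    # dictionary documents only field names (and, later, lengths), every type and
    # column name below is a guess rather than a documented fact.
    import pandas as pd

    assumed_types = {
        "agency": "string",
        "vendor_name": "string",
        "amount": "float64",          # dollars; units are not documented
        "solicitationid": "string",   # treated as text since its format is unspecified
    }

    contracts = pd.read_csv(
        "dod_contracts_fy2002.csv",
        dtype=assumed_types,
        parse_dates=["signed_date"],  # assumed date column
        low_memory=False,
    )
    print(contracts.dtypes)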

The online portal through which the data were accessed also suffered from design issues related to information availability. For example, the form that is used to select the data the user wishes to receive allows the user to filter the data by agency. Certain agencies such as the Central Intelligence Agency and the National Security Agency appear in the list of available agencies, yet no records for these two agencies (or other intelligence agencies) are actually available for download.

The data downloaded from the portal also suffered from regular information barriers. Certain data fields were characterized by regular missing values. For example, "solicitationid," a unique identifier that connects individual contracts to the solicitation that offered the details of the contract, was frequently missing. This omission prohibits researchers from connecting the contract information to the details of the solicitation, which might provide more insight into the motivations for the spending. The inability to provide this context was one of the major limitations in the research case.
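The extent of this gap is straightforward to quantify once the data are loaded, as in the brief sketch below (the column name is assumed, as before).

    # Sketch: quantify how often the assumed "solicitationid" foreign key is missing.
    import pandas as pd

    contracts = pd.read_csv("dod_contracts_fy2002.csv", low_memory=False)

    sid = contracts["solicitationid"]
    missing = sid.isna() | (sid.astype(str).str.strip() == "")
    print("contracts without a solicitation id: %.1f%%" % (100 * missing.mean()))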

5.2.3. Incomplete information, only part of the total picture shown or only a certain range

Along with the inability to establish the motivations for contract spending (war preparation or otherwise), the short time span for which the data are available also imposed an intractable limitation that prevented generalizability in the research case. In the usaspending.gov portal, only data for financial years 2000–2013 are available. It is unclear why data prior to 2000 are unavailable, and an archive that includes only 14 years of longitudinal data limits its own usefulness. For example, in the research case described in this article, the inability to consider other conflicts such as the Gulf War (1990–1991) resulted in an inadequate number of "events" that precluded generalizing the findings. While there could be legitimate technical reasons why pre-2000 records cannot be made available, the author is skeptical, especially when data going much further back are readily obtainable from other government sources. For example, data on the GDP of all countries back to 1970 can easily be downloaded from the United Nations National Accounts Main Aggregates Database.

Table 4
Improvement directions for overcoming the data barriers.

Categories and barriers (from Janssen et al., 2012):

Task complexity
• Data formats and datasets are too complex to handle and use easily
• Apps hiding the complexity but also potential other use of open data
• Focus is on making use of single datasets, whereas the real value might come from combining various datasets
• Contradicting outcomes based on the use of the same data

Information quality
• Lack of information
• Lack of accuracy of the information
• Incomplete information, only part of the total picture shown or only a certain range

Suggestions for improvement of usaspending.gov:

Improve XML data
• Create and publish a schema and namespace to establish the syntax rules and structure of the XML vocabulary used to represent the contract data. This will enable data validation and interoperability.
• Create and publish an XSLT stylesheet that will enable users to view the XML data through a web browser.

Publish data as SQL
• In addition to CSV and XML formats, publish the data as SQL so that it can easily be imported into relational databases.

Improve the comprehensiveness of the online interactive portal
• Allow all of the data fields and contracting offices to be queried through the online portal.

Provide a database schema
• A database schema, along with information about the tables used to compile the contracting data, would be useful to users seeking to connect these data with other government data sets.

Reduce data dynamism
• Change policies so that data inserts/updates/deletions are performed in a more efficient and timely manner.

Improve database dictionary
• Improve the database dictionary to contain information about data types, formats, constraints, etc. This will improve the memory efficiency of the database management system and facilitate the connection of these data with other government data sets.

Update the interactive portal
• Update the interactive portal so that agencies for which no data are available (intelligence agencies) are not listed among the query options.

Improve data consistency
• Apply best practice from the data warehousing literature to ensure clean, consistent and available data.

Expand the scope of the data
• Make data collected before 2000 available through the usaspending.gov portal.

5.3. Summary of suggested improvements for the usaspending.gov portal

Based on this analysis, Table 4 presents some suggestions for improving the usaspending.gov portal to help overcome these data barriers.
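As one example of what the first suggestion in Table 4 would enable, the sketch below validates a downloaded XML extract against a published schema using the third-party lxml library. Both file names are hypothetical, since no such schema currently accompanies the data; the fragment only illustrates the kind of client-side validation that publishing a schema would make possible.

from lxml import etree

# Validate a contract XML extract against a (hypothetical) published schema
# before any analysis; report each violation with its line number.
schema = etree.XMLSchema(etree.parse("contracts_schema.xsd"))
document = etree.parse("dod_contracts_2002.xml")

if schema.validate(document):
    print("Contract XML conforms to the published schema")
else:
    for error in schema.error_log:
        print(f"line {error.line}: {error.message}")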

6. Conclusions

The U.S. D.O.D. relies heavily upon military contractors to provide the goods and services required to maintain the U.S. armed forces and engage in military operations throughout the world. Given the extent of this dependence, the contract granting activity of U.S. D.O.D. agencies might be used as a proxy measure for classified operational activities. If there is a correlation between contracting activity and classified operational activities, then large increases in military contracts might be a leading signal for future military conflict. The ability to predict future military engagements would be a boon to combatants, contracting companies, investors and others. While military operational plans are classified and not available for public scrutiny, the contracting activity of the U.S. D.O.D. is a matter of public record.

This article reported on an attempt to explore the possibility of using open U.S. D.O.D. contracting data to identify patterns of spending activity that can predict future military engagement. The research in this article followed a two-stage approach. The first stage involved the exploration of the research question in the context of a specific case, the U.S. invasion of Iraq in 2003. The analysis demonstrated that the use of U.S. D.O.D. contracting data to predict future war has promise; however, a number of problems with the data and online portal prevented conclusive, generalizable results. The second stage classified the data and systems problems encountered when performing the analysis according to an established analytical framework for open data barriers, along with specific examples of how these barriers manifested in the research.

Specifically, two categories of barriers presented serious impediments to the use of the data for research purposes: task complexity and information quality. Specific instances of task complexity barriers in the usaspending.gov portal included massive datasets available in difficult-to-use formats, interactive portal features that hid much of the data available in the download, a lack of information about the connection of the downloaded data to its sources, and data that changed over time. Specific information quality issues that were encountered included missing values (especially foreign keys), inconsistent data that suggest problems with the source database structure, and a very limited time range for which the data were available.

One limitation of the research is that it addressed several open data barriers derived from the research framework article but did not address others. This approach was taken because the author's interaction with the usaspending.gov portal was as a user, not as a provider of the data. Thus, this perspective did not allow the article to address the supply-side barriers that the government agency managing the usaspending.gov portal may face. This fact limited the comprehensiveness of the barrier analysis for the usaspending.gov data and portal. Another limitation of the paper is the reliance on a single case study, the 2003 invasion of Iraq. Due to the limited time frame for which data were available on the usaspending.gov portal, a more comprehensive examination of many cases was impossible. Because the research only considered a single case, it was impossible to validate or generalize the findings in a way that would meet common social science research standards.

Despite the limitations, the findings provide further empirical, case-based evidence in support of open data barrier frameworks proposed in the literature. Furthermore, this study extends the literature by identifying specific examples of the manifestation of these barriers encountered when working with the usaspending.gov data portal and demonstrating some of the specific challenges to research validity and generalizability presented by these open data barriers.

While open data hold great promise for promoting multidisciplinary research, government agencies must correct these issues with their data and systems if that promise is to be fulfilled. The first step in that process is to present a broad framework for the kinds of barriers that manifest in the open data process. The literature has done a fine job at that. The second step, the focus of this research, is to present specific instances of these barriers as they exist in individual government data portals. These details, along with the theoretical presentation of the barriers and their impacts available in the literature, can be used as a roadmap to correct existing systems and improve the quality of subsequent open data implementations.

References

Association of the United States Army (2001). Decaying military infrastructure: Putting U.S. Army readiness at risk. Institute of Land Warfare. Arlington, VA: Association of the United States Army.
Bender, B. (2005). Marine units found to lack equipment: Corps estimates of needs in Iraq are called faulty. Retrieved from http://www.boston.com/news/world/articles/2005/06/21/marine_units_found_to_lack_equipment/?page=full
Bertot, J. C., McDermott, P., & Smith, T. (2012). Measurement of open government: Metrics and process. Proceedings of the 45th Annual Hawaii International Conference on System Sciences (HICSS 2012), Maui, HI (January 4–7).
Borko, H. (1968). Information science: What is it? American Documentation, 19(1), 3–5.
Christian, E. (2001). A metadata initiative for global information discovery. Government Information Quarterly, 18(3), 209–221.
Conradie, P., & Choenni, S. (2013). Exploring process barriers to release public sector information in local government. Paper presented at the 6th International Conference on Theory and Practice of Electronic Governance (ICEGOV '12), October 22–25, 2012, Albany, New York, USA.
Denning, P. J. (2005). Is computer science science? Communications of the ACM, 48(4), 27–31.
Detlor, B., Hupfer, M. E., Ruhi, U., & Zhao, L. (2013). Information quality and community municipal portal use. Government Information Quarterly, 30(1), 23–32.
Dupuy, T. N. (1979). Numbers, predictions, and war: Using history to evaluate combat factors and predict the outcome of battles. New York: Bobbs-Merrill.
Feinberg, L. E. (2004). FOIA, federal information policy, and information availability in a post-9/11 world. Government Information Quarterly, 21(4), 439–460.
Harrison, M. (1988). Resource mobilization for World War II: The U.S.A., U.K., U.S.S.R., and Germany, 1938–1945. The Economic History Review, 41(2), 171–192.
Jansen, J., de Vries, S., & van Schaik, P. (2010). The contextual benchmark method: Benchmarking e-government services. Government Information Quarterly, 27(3), 213–219.
Janssen, K. (2011). The influence of the PSI directive on open government data: An overview of recent developments. Government Information Quarterly, 28(4), 446–456.
Janssen, M., Charalabidis, Y., & Zuiderwijk, A. (2012). Benefits, adoption barriers and myths of open data and open government. Information Systems Management, 29(4), 258–268.
Kalampokis, E., Tambouris, E., & Tarabanis, K. (2011). Open government data: A stage model. In M. Janssen (Eds.), Lecture notes in computer science: Electronic government (pp. 235–246). Berlin Heidelberg: Springer.
Karr, A. F. (2008). Citizen access to government statistical information. In H. Chen (Eds.), Digital government: E-government research, case studies, and implementation (pp. 503–529). New York, USA: Springer.
Kassen, M. (2013). A promising phenomenon of open data: A case study of the Chicago open data project. Government Information Quarterly, 30(4), 508–513.
Kerschberg, B. (2011). Metadata, the Freedom of Information Act, and government hypocrisy. Retrieved from http://www.forbes.com/sites/benkerschberg/2011/04/11/metadata-the-freedom-of-information-act-and-government-hypocrisy/
King, R. D., Liakata, M., Lu, C., Oliver, S. G., & Soldatova, L. N. (2011). On the formalization and reuse of scientific research. Journal of The Royal Society Interface, 8(63), 1440–1448.
Lee, G., & Kwak, Y. H. (2012). An open government maturity model for social media-based public engagement. Government Information Quarterly, 29(4), 492–503.
Linders, D. (2013). Towards open development: Leveraging open data to improve the planning and coordination of international aid. Government Information Quarterly, 30(4), 426–434.
Marchionini, G., Samet, H., & Brandt, L. (2003). Introduction to a special issue on digital government. Communications of the ACM, 46(1), 24–27.
Mclaren, R., & Waters, R. (2011). Governing location information in the UK. The Cartographic Journal, 48(3), 172–178.
Nam, T. (2011). New ends, new means, but old attitudes: Citizens' views on open government and government 2.0. Proceedings of the 44th Hawaii International Conference on System Sciences (HICSS) (January 4–7).
Napoli, P. M., & Karaganis, J. (2010). On making public policy with publicly available data: The case of U.S. communications policymaking. Government Information Quarterly, 27(4), 384–391.
Nolan, J. (2012). Military computer upgrades 30 years behind schedule, cost $7 billion. Retrieved from http://www.standard.net/stories/2012/06/22/military-computer-upgrades-30-years-behind-schedule-cost-7-billion
Quam, E. (2001). Informing and evaluating a metadata initiative: Usability and metadata studies in Minnesota's Foundations Project. Government Information Quarterly, 18(3), 181–194.
Schwartz, M. (2010). Department of Defense contractors in Iraq and Afghanistan: Background and analysis. Congressional Research Service. Retrieved from http://books.google.com/books?id=F5xB0r3qw0QC
Singer, P. W. (2010). The regulation of new warfare. The Brookings Institution.
Strickland, L. S. (2005). The information gulag: Rethinking openness in times of national danger. Government Information Quarterly, 22(4), 546–572.
Strong, D. M., Lee, Y. W., & Wang, R. Y. (1997). Data quality in context. Communications of the ACM, 40(5), 103–110.
U.S. Government (2005). Federal acquisition regulation. Retrieved from http://www.acquisition.gov/far/current/pdf/FAR.pdf
Venkatesh, V., & Davis, F. D. (2000). A theoretical extension of the technology acceptance model: Four longitudinal field studies. Management Science, 46(2), 186–204.
Venkatesh, V., Morris, M. G., Davis, G. B., & Davis, F. D. (2004). User acceptance of information technology: Toward a unified view. MIS Quarterly, 27(3), 425–478.
Whitmore, A. (2012a). A statistical analysis of the construction of the United Nations e-government development index. Government Information Quarterly, 29(1), 68–75.
Whitmore, A. (2012b). Extracting knowledge from U.S. Department of Defense Freedom of Information Act requests with social media. Government Information Quarterly, 29(2), 151–157.
Yang, Z., & Kankanhalli, A. (2013). Innovation in government services: The case of open data. In Y. K. Dwivedi (Eds.), Grand successes and failures in IT: Public and private sectors (pp. 644–651). Berlin Heidelberg: Springer.
Yin, R. K. (2009). Case study research: Design and methods (4th ed.). Thousand Oaks, California: SAGE Publications, Inc.
Zuiderwijk, A., Janssen, M., Choenni, S., Meijer, R., & Sheikh Alibaks, R. (2012). Socio-technical impediments of open data. Electronic Journal of e-Government (EJEG), 10(2), 156–172.

Andrew Whitmore is on the faculty of the Department of Information Systems and Decision Science at the University of South Florida. His primary areas of research interest include public sector information systems and the "big data" movement. He holds a PhD in Informatics from the State University of New York at Albany, an MS in Information Systems from Johns Hopkins University, and a BA in Economics from Cornell University. Dr. Whitmore may be reached at [email protected].
