ucdp ged codebook version 5 · what the ucdp ged version 5.0 dataset contains: the basic unit of...

39
UCDP Georeferenced Event Dataset Codebook Version 5.0 This version authored by: Mihai Croicu, Ralph Sundberg, Ph. D. When citing this dataset, please always cite: The official data presentation article Sundberg, Ralph, and Erik Melander, 2013, “Introducing the UCDP Georeferenced Event Dataset”, Journal of Peace Research, vol.50, no.4, 523-532 …and the codebook Croicu, Mihai and Ralph Sundberg, 2016, “UCDP GED Codebook version 5.0”, Department of Peace and Conflict Research, Uppsala University The official abbreviation of this dataset is UCDP GED. The official full name of this dataset is UCDP Georeferenced Event Dataset. The current version of the dataset is 5.0 The current dataset maintainer is Mihai Croicu For any questions or comments contact: [email protected] Data extracted from UCDP systems on 15 October 2016

Upload: others

Post on 24-May-2020

25 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

UCDPGeoreferencedEventDatasetCodebookVersion5.0

Thisversionauthoredby:

MihaiCroicu,RalphSundberg,Ph.D.

Whencitingthisdataset,pleasealwayscite:

TheofficialdatapresentationarticleSundberg,Ralph,andErikMelander,2013,“IntroducingtheUCDPGeoreferencedEventDataset”,JournalofPeaceResearch,vol.50,no.4,523-532…andthecodebookCroicu,MihaiandRalphSundberg,2016,“UCDPGEDCodebookversion5.0”,DepartmentofPeaceandConflictResearch,UppsalaUniversity

TheofficialabbreviationofthisdatasetisUCDPGED.TheofficialfullnameofthisdatasetisUCDPGeoreferencedEventDataset.

Thecurrentversionofthedatasetis5.0

ThecurrentdatasetmaintainerisMihaiCroicuForanyquestionsorcommentscontact:[email protected]

DataextractedfromUCDPsystemson15October2016

Page 2: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

2/39

Quick Start Guide: What the UCDP GED version 5.0 dataset contains: ThebasicunitofanalysisfortheUCDPGEDdatasetisthe“event”,anindividualincident(phenomenon)oflethalviolenceoccurringatagiventimeandplace.Morespecificallywedefineaneventas:“Anincidentwherearmedforcewasbyanorganised actor against another organized actor, or against civilians,resultinginatleast1directdeathataspecificlocationandaspecificdate”.The dataset contains 128 264 events. It is a global dataset that covers theentirety of theGlobe (excluding Syria)between 1989-01-01and 2015-12-31.Themaximum(best)spatialresolutionofthedatasetistheindividualvillageortown.Thedatasetisfullygeocoded.Themaximum(best)temporalresolutionofthedatasetistheday.OnlyeventslinkabletoaUCDP/PRIOArmedConflict,aUCDPNon-StateConflictoraUCDPOne-SidedViolenceinstanceareincluded.Eventsareincludedfortheentireperiod,i.e.bothfortheyearswhensuchconflictswereactiveandfortheyearswhensuchconflictswherenotactive.TheUCDPGED5.0is100%compatiblewithUCDPGED4.1,4.0,3.0and2.0.Forbackwardscompatibilitywithversions1.0–1.9seethefullcodebook.The dataset is available in CSV, Shapefile, Rdata, Excel and SQL (Postgres,MySQLandMicrosoftSQLServer).AbetapublicAPIforGEDdataisdescribedat:http://ucdp.uu.se/apidocs/ Quick overview of the variables included in the UCDP GED version 5.0: Variable name Content Type id A unique numeric ID identifying each event. integer relid A quick machine parse-able string key (hash)

describing the content of each event. The key is constructed using the abbreviation of the country name (for instance AFG for Afghanistan), the calendar year, the type of violence, the dyad or actor ID and a counter. This variable is also a unique identifier for each event in

string(255)

Page 3: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

3/39

the entire dataset year The year of the event integer active_year 1: if the event belongs to an active conflict/dyad/actor-

year 0: otherwise

Integer

type_of_violence

Type of UCDP conflict: 1: state-based conflict 2: non-state conflict 3: one-sided violence

integer

conflict_dset_id UCDP/PRIO ID for state based conflicts as per the UCDP/PRIO Armed Conflict Dataset. UCDP conflict ID code for non-state dyad as per the UCDP Non-State Dataset. UCDP actor ID code for the one-sided violence actor as per the UCDP One-Sided Dataset. Note that the same conflict_id in GED can represent two separate conflict, as the dyad_id is unique only within each type of violence (i.e. a nonstate dyad can an identical id with a state-based conflict)1. Warning: DO NOT USE this variable to aggregate on or to subset/filter on, as ERRONEOUS results will result. For desired functionality please USE dyad_new_id below. This variable should only be used for merging with data from the UCDP Dyadic, UCDP Non-State and UCDP One-Sided Dataset. Further note that this variable will be deprecated in the future from all UCDP datasets and replaced with the conflict_new_id below.

string(255)

conflict_new_id A unique dyad id code for each individual conflict in the dataset. PLEASE DO USE this variable to aggregate, subset or filter by conflict within GED. Further note that this variable will be the main identifier in all future UCDP releases.

integer

conflict_name Name of the UCDP conflict to which the event belongs. For non-state conflicts and one-sided violence this is the same as the dyad name.

string(9999)

dyad_dset_id UCDP dyad id code for state based dyad as per the UCDP Dyadic Dataset.

string(255)

1Forexample,dyadid433identifiesthreeseparatedyads:GovernmentofSudan–SLM/A(state-based),GovernmentofSenegal–civilians(one-sided)andDioula–Guere(non-state)

Page 4: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

4/39

UCDP conflict ID code for non-state dyad as per the UCDP Non-State Dataset. UCDP actor ID code for the one-sided violence actor as per the UCDP One-Sided Dataset. Note that the same dyad_id in GED can represent two separate dyads, as the dyad_id is unique only within each type of violence (i.e. a nonstate dyad can an identical id with a state-based dyad). Warning: DO NOT USE this variable to aggregate on or to subset/filter on, as ERRONEOUS results will result. For desired functionality please USE dyad_new_id below. This variable should only be used for merging with data from the UCDP Dyadic, UCDP Non-State and UCDP One-Sided Dataset. Further note that this variable will be deprecated in the future from and replaced with the dyad_new_id below.

dyad_new_id A unique dyad id code for each individual dyad in the dataset. PLEASE DO USE this variable to aggregate, subset or filter by dyad within GED.

integer

dyad_name Name of the conflict dyad creating the event. A dyad is the pair of two actors engaged in violence (in the case of one-sided violence, the perpetrator of violence and civilians). The two sides are separated by an ASCII dash (e.g. Government of Russia - Caucasus Emirate, Taleban – civilians).

string(9999)

side_a_dset_id The unique ID of side A as in the current version of the UCDP Actor Dataset

string(255)

side_a_new_id A unique ID of side A. This will be used in the future as the main actor identifier.

integer

side_a The name of Side A in the dyad. In state-based conflicts always a government. In one-sided violence always the perpetrating party.

string(9999)

side_b_dset_id The unique ID of side B as in the current version of the UCDP Actor Dataset

string(255)

side_b_new_id A unique ID of side B. This will be used in the future as the main actor identifier.

integer

side_b The name of Side B in the dyad. In state-based always the rebel movement or rivalling government. In one-sided violence always “civilians”.

string(9999)

number_of_sources Number of total sources containing information for an event that were consulted.

integer

Page 5: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

5/39

Note that this variable is only available for data collected for 2013 and 2014, and for recently revised events. For older data, -1. Note that -1 does NOT mean information on the source is missing; reference to the source material is ALWAYS available in the source_article field.

source_article References to the names, dates and titles of the source material from which information on the event is gathered. A reference to at least one source material is available for ALL EVENTS. This variable is highly streamlined for information collected in 2013 and 2014, and less so for older data. For such older data, abbreviations such are used. The most frequent are: R: Reuters News, BBC: BBC Monitoring AP: Associated Press Newswires AFP: Agence France Presse, X: Xinhua DOW: Dow Jones Wires

text

source_office The name of the organizations publishing the source materials. Note that this variable is only available for data collected for 2013 and 2014, and for recently revised events. For older data, the field is empty. Note that an empty field does NOT mean information on the source is missing; reference to the source material is ALWAYS available in the source_article field, for every event.

text

source_date The dates the source materials were published. Note that this variable is only available for data collected for 2013 and 2014, and for recently revised events. For older data, the field is empty. Note that an empty field does NOT mean information on the source is missing; reference to the source material is ALWAYS available in the source_article field, for every event.

text

source_headline The titles of the source materials. Note that this variable is only available for data collected for 2013 and 2014, and for recently revised events. For older data, the field is empty. Note that an empty field does NOT mean information on the source is missing; reference to the source material is ALWAYS available in the source_article field, for

text

Page 6: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

6/39

every event. source_original The name or type of person or organization from which

the information about the event originates in the original report. e.g. “police”, “Lt. Col. Johnson”, ”eyewitnesses”, “rebel spokesman”.

string(9999)

where_prec The precision with which the coordinates and location assigned to the event reflects the location of the actual event. 1: exact location of the event known and coded. 2: event occurred within at maximum a ca. 25 km radius around a known point. The coded point is the known point. 3: only the second order administrative division where an event happened is known. That administrative division is coded with a point representing it (typically the centroid). 4: only the first order administrative division where an event happened is known. That administrative division is coded with a point representing it (typically the centroid). 5: the only spatial reference for the event is neither a known point nor a known formal administrative division, but rather a linear feature (e.g. a long river, a border, a longer road or the line connecting two locations further afield than 25 km) or a fuzzy polygon without defined borders (informal regions, large radiuses etc.). A representation point is chosen for the feature and employed. 6: only the country where the event took place in is known. 7: event in international waters or airspace.

integer

where_coordinates Name of the location to which the event is assigned. Fully standardized and normalized.

string(9999)

where_description An extracted snippet of text from the source material describing the location.

text

adm_1 Name of the first order (largest) administrative division where the event took place

string(9999)

adm_2 Name of the second order administrative division where the event took place

string(9999)

latitude Latitude (in decimal degrees) numeric(9,6) longitude Longitude (in decimal degrees) numeric(9,6) geom_wkt An Open Geospatial Consortium textual representation

of the location of each individual point. Formatted as well known text without SRID.

string(9999)

priogrid_gid The PRIO-grid cell id (gid) in which the event took place. Compatibility with PRIO-grid (Tollefsen, 2012) is guaranteed for both PRIO-grid 1 and 2.

integer

Page 7: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

7/39

Warning: We associate every point to the PRIO-grid that contains it, even if the point is in another country than the one officially assigned to the respective PRIO-grid cell through their majority area rule. It is your responsibility to make sure the covariates for the PRIO-grid cell are correct for each event. Further, for the same reason, DO NOT, under any circumstances, first clip out (subset) PRIO-grid by country before merging with UCDP GED as data loss will certainly occur. Refer to your copy of the PRIO-grid for further details on PRIO-grid's majority assignment rule (p.3).

country Name of the country in which the event takes place. string(999) region Region where the event took place. One of following:

{Africa, Americas, Asia, Europe, Middle East} string(999)

event_clarity 1 (high) for events where the reporting allows the coder to identify the event in full. That is, events where the individual happening is described by the original source in a sufficiently detailed way as to identify individual incidents, i.e. separate activities of fighting in a single location: Example of such reporting: “2 people where killed in Banda Aceh town on the 9th of December in fighting between the government and GAM when a car exploded in a main market.” 2 (lower) for events where an aggregation of information was already made by the source material that is impossible to undo in the coding process. Such events are described by the original source only as aggregates (totals) of multiple separate activities of fighting spanning over a longer period than a single, clearly defined day. Examples of such reporting: “The Ukrainian government informs that 29 people have died in the past six days in a number of clashes with the separatists along the line of conflict”.

integer

date_prec How precise the information is about the date of an event. 1: exact date of event is known; 2: the date of the event is known only within a 2-6 day range. 3: only the week of the event is known 4: the date of the event is known only within an 8-30 day range or only the month when the event has taken place is known 5: the date of the event is known only within a range longer than one month but not more than one calendar year.

integer

date_start The earliest possible date when the event has taken place.

Date YYYY-MM-

Page 8: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

8/39

DD date_end The last possible date when the event has taken place. Date

YYYY-MM-DD

deaths_a The best estimate of deaths sustained by side a. Always 0 for one-sided violence events.

integer

deaths_b The best estimate of deaths sustained by side b. Always 0 for one-sided violence events.

integer

deaths_civilians The best estimate of dead civilians in the event. For non-state or state-based events, this is the number of collateral damage resulting in fighting between side a and side b. For one-sided violence, it is the number of civilians killed by side a.

integer

deaths_unknown The best estimate of deaths of persons of unknown status.

integer

best_est The best (most likely) estimate of total fatalities resulting from an event. It is always the sum of deaths_a, deaths_b, deaths_civilians and deaths_unknown.

integer

high_est The highest reliable estimate of total fatalities integer low_est The lowest reliable estimate of total fatalities integer geom / geometry

An Open Geospatial Consortium / ESRI binary representation of each individual point. Contains the SRID (4326) where supported.

Due to the binary nature of this variable, this variable is contained only in the formats that support it.

geometry (Point,4326)

isocc The ISO 3-letter code of the country where the event takes place.

string(3)

gwno The Gleditsch and Ward numeric identifier of the country where the event takes place

integer

gwab The Gleditsch and Ward alphabetic identifier of the country where the event takes place

string(3)

This ends the quick-start guide. For more detailed information, please refer to the detailed dataset description in the next chapters of the codebook. Remainder of the page left empty for your notes.

Page 9: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

9/39

1 : Statement of Purpose The purpose of this project is to provide the academic community with the mostcomprehensivestructuredeventdataavailableonorganisedviolence inthepost-1989world,soastoanswerthecallforgeographicallyandtemporallydisaggregateddata.Whereas the ambition is to provide a dataset with both theoretical and practicalrelevanceforresearchersinabroadrangeofscholarlytraditions,mainlypragmaticandpracticaldecisionsguidetheconstructionofthedataset.Thisallowsforeffectivecodingproceduresaswellasdisaggregatedandflexibledatawithoutpredeterminedbiasesforcertain research purposes. The geo-referenced event data may thus be used forpurposes ranging from wanting to illustrate conflict behaviour geographically, usinggeographic information systems software, to studying causal pathways by applying avarietyofmethodsforstatisticalanalysis.Whilstretainingtheambitiontoprovideadatasetopenforabroadvarietyofresearchpurposes,thefocusofthedatasetonconflictdynamicsandtheeffectsofarmedviolence,intheformofdeaths,stillsetstheparametersforusers.ThismeansthattheUCDPGEDisineffectprimarilydirectedtoward,andwillmostprobablybeusefulto,quantitativeand comparative researchers interested in the fatal outcomes of violent conflictbehaviouratthelevelbelowthestate.Thus,thedatasetisconstructedinsuchawayasmaximizethecomparabilityandconsistencyacrosstimeandspace,andprovideagloballyconsistentimageofthephenomenonoforganizedviolence.Ineffect, thegoalofUCDPGEDisnottopresentthemostcompleteandaccurateimageofacertainconflictatacertainpoint in time,butratherbea tool for theglobalunderstandingofsubnationalconflictpatternsandtrends.Version 5 is the first full global event dataset, and consists of the data for thewhole World between 1989 and 2015, including cases such as Iraq, Colombia,Pakistan,AfghanistanandIndia.DataforSyriaisnotincludedinthecurrentversion–asdatacollectionprocessesandproceduresatthislevelofdisaggregationdidnotyieldaproduct releasableat this timewith thesame levelof consistencyandclarityasotherGEDdata.

2 : Definitions Wedefineaneventas:An incident where armed force was used by an organised actor against anotherorganizedactor,oragainstcivilians,resultinginatleast1directdeathataspecificlocationandaspecificdate”.Thesearethespecificelementsofthedefinition:1.Armed force: use of arms in order to promote the parties’ general position in theconflict,resultingindeaths.

Page 10: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

10/39

- arms: any material means e.g. manufactured weapons but also sticks, stones,fire,wateretc.2.Organizedactor:agovernmentofanindependentstate,aformallyorganizedgrouporaninformallyorganizedgroupaccordingtoUCDPcriteria:

a. Governmentofanindependentstate:Thepartycontrollingthecapitalofastate.

b. Formally organized group: Any non-governmental group of people havingannounced a name for their group and using armed force against agovernment (state-based), another similarly formalized group (non-stateconflict)orunorganizedcivilians(one-sidedviolence).Thefocusisonarmedconflict involving consciously conducted and planned political campaignsratherthanspontaneousviolence.

c. Informallyorganizedgroups:Anygroupwithoutanannouncedname,butwhich uses armed force against another similarly organized group(non-stateconflict),wheretheviolentactivityindicatesaclearpatternofviolent incidents that are connected and inwhich both groups use armedforceagainsttheother

3. direct death : a death relating to either combat between warring parties orviolenceagainstcivilians.

UCDP GED provides three estimates for deaths for each event, thus creating anuncertaintyinterval:

- a low estimate, containing the most conservative estimate of deaths that isidentifiedinthesourcematerial;

-abestestimate,containingthemostreliableestimateofdeathsidentifiedinthesourcematerial;

-ahighestimate,containingthehighestreliableestimateofdeathsidentifiedinthe source material. Note that UCDP attempts to distinguish and not includeunreasonable claims in the high estimate of fatalities, and tends to be highlyconservativewhencountingfatalities2.

Inorder foranevent toexist,at leastonedeadneeds toberegistered in thehigh,bestorlowestimate.

4.Specificlocation:anameandonepairoflatitudeandlongitudecoordinatesthatrelatetothegeographicalinformationspecifiedinthesourcematerial.

5.Specificdate: a specified timeperiodduringwhicharmed interactions causeatleast1fatality.Thenormaltemporalunittowhichaneventcanberelatedisa24-hourdaystartingatmidnight.

- In some cases it is impossible, based on the source material, to reduce thespecificdate toasingledayasreportingonlyrefers towider timespans(multiple

2For a more elaborate discussion on aspects concerning point 1-4, please refer to UCDPCodebooksforState-BasedArmedConflicts,Non-stateConflictsandOne-SidedViolence.

Page 11: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

11/39

days) or information on the exact day is not clear. For these events, awider timespanisprovidedthroughtheuseofthedate_start,date_endanddate_precvariables.

ForfurtherUCDPdefinitionspleaserefertothe"Definitions"sectionofUCDPswebpageavailableathttp://www.pcr.uu.se/research/ucdp/definitions/NotethatthisdefinitionisfullycompatiblewiththeoneusedforGED1.0-4.1.

3. Comparison with other event ontologies: TheUCDPGED isan"incidentdataset",sharingahighlysimilarconceptualizationof"events" with datasets such as the Global Terrorism Dataset (START, 2013), ACLED(Raleigh, 2009) or SCAD (Salehyan, 2012) in the sense that each entry represents anincident,areal-lifeseriesofactionscircumscribedtoacertaintypologyandresultinginacertainoutcomeorsetofoutcomes.InthecaseofUCDPGEDthistypologyis:fightingresultinginthedeathofatleastoneperson.Thisdiffersmarkedlyfromtheconceptualizationofevents inmost"machine"datasetsand coding systems such as PHOENIX/EL:DIABLO (OEDA, 2014), CAMEO/TABARI(Gerner et. al,2014),ICEWS/JABARI(Boscheeet.al.,2015)orKEDS(Schrodt,2006).Inthese datasets, an event represents an action between two actors (e.g. taunt, attack,fight, retreat, mediate etc.). Since typically multiple actions lead to an outcome (anincident) a UCDP GED event is equivalent to a collection or aggregation of multipleeventsinsuch"machine"datasets.Further,ascomparedtothesedatasets,UCDPGEDismore geared towards extracting information relating to the incident (such as fatalityfigures)ratherthanbinningactionsintosetsofcategories.Further,UCDPGEDdiffersmarkedlytoeventdatacollectioneffortswheretheeventisdefined as the individual source report, such as for example, MMAD (Rød andWeidmann,2014),sinceUCDPGEDcollatesinformationfromallsourcesreferringtothesameinformationandinterpretsalloftheminordertoextract informationanddefineanevent.Thus,UCDPGEDwouldbea filteredversionof suchadataset,with filteringperformed according to certain interpretation rules, uncertainty management anddefinitions.Of course, this comparison only describes the conceptual differences between ourconceptualization of "events" and other conceptualization of "events" for users eithermore familiar with other datasets or users wanting to use UCDP GED together withother event-type data. As such, no other event dataset replicates or is replicated byUCDP GED as UCDP GED collects information on a specific, unique type of socialbehaviour(asdefinedabove).

4. Sources and data collection process: The UCDP GED is manually curated and compiled, with automatic assistance in dataretrieval,filtering,datastorageandmanipulation,aswellasdatavalidation.

Page 12: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

12/39

TheoriginalreportingunderlyingUCDPGEDarecollectedfromthreesetsofsources:1.globalnewswirereporting2.globalmonitoringandtranslationoflocalnewsperformedbytheBBC3. secondary sources such as local media, NGO and IGO reports, field reports,booksetc.withaslightmajority(approximately60%ofthedataset)ofalleventsbeingbasedonglobalnewswiresreporting.Theprocessisdoneina"two-pass"system,firstbyconsultingnewswiresourcesfortheentireglobethenbyconsultinglocal/specializedsourcesbasedoninformationobtainedfromthefirstpass.Further, theUCDPGED is basedon the sameunderlyingdata that all the otherUCDPdatasetsarebasedon,i.e.UCDPGEDisnotbuiltFROMe.g.theUCDPDyadicDataset,butratherboththeUCDPDyadicDatasetandGEDarebuiltfromthesamedata.4.1Firstpass:GlobalNewswireandBBCMonitoringGlobalnewswirereportingaswellasBBCMonitoringdataaresourcedfromtheDowJonesFactivaaggregator,usingthefollowinggeneralsearch-string:kill* or die* or injur* or dead or death* or wounded or massacre* Thissearchisdoneglobally,with"headandlead"and"intelligentindexing"beingusedforfurtherfilteringinthosecaseswherethisisfeasible.In terms of sources, UCDP uses Reuters News, Agence France Presse (in English),AssociatedPress,Xinhua(inEnglish)aswellasBBCMonitoring.Notethat forsomeoftheyearsandgeographicareas,reportingforsomeofthesources isextremely limitedduetoFactiva'snon-inclusionofthewholecorpus.Similarly,mediareporting isnotconsistentacross timeorspace foranyof theabove-mentioned organizations. Changing managerial focuses, different organizationalstructures(suchasfieldofficelocations),aswellasdifferentresourcedistributionsandallocations (such as, for example, the restructuring of BBC Monitoring in the early2010s)makemediareportingqualityandquantityvastlydifferentovervariousperiodsandoverdifferentareas.Furthermore,thewaysinwhichconflictsarereportedsettheparameters for the preciseness of the data. In some countries and some phases ofconflicts,theeventdataisbasedoneitherdetaileddailyreportsormoresummary-likereportscoveringlargerareas.Giventhisvast inconsistencyinthesourcematerialgenerationprocess,wedonotaimto be "source-consistent" - i.e.we do not aim to use the exact same set of sources togeneratetheentiredataset,sincethiswouldnotprovideanythingelsethanaverypoorconveniencesample.Thenon-randomness insuchasamplewouldbedrivenbynews-agencies'procedures.

Page 13: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

13/39

Instead, we drive our coding procedures with the set goal of obtaining a populationdatasetattheaggregate level(acompleteyearly list)ofallUCDPconflicts.ThisallowsforconsistencytobepasseddowntotheeventlevelandthusincreasesthequalityandreliabilityofthesamplepresentedinGED3.

4.2Secondpass:Localandspecializednews-sourcesThus,formanyconflictsandperiodsweadd:secondarysourcessuchas: -localmonitoringofvariouslocalmedia(e.g.PressTrustofIndiaforIndia,orEFEnewsagencyforLatinAmericaorRadioOkapiforDR.Congo), - localmonitoring and research organizations (such as SATP for India andPakistan), - global NGO reports (such as those coming from Human Rights Watch orAmnestyInternational), -UN,EU,AUandotherIGOreports, -governmental publications (where considered reliable, such as those fromTruthandReconciliationCommissions), -researcharticlesorbooksetc.The choice of whether to include secondary sources is made by project leaders andUCDPcoderstogetherattheconclusionofthefirstpass.Thegoalofthisinclusionistofurther specify and identify conflicts (at the aggregate country-year level) in placeswheredetail is seen as insufficient. Thus, by attempting to equalize the level of detailidentified across the board we make the sample more normalized for substantiveanalysis.Codersareexpertsinthecodingprocess.Unlikeanyotherdatacollectionprojectinthefield,allcodersarefull-timelong-termemployeesofUCDP,typicallyfollowingconflictsandcountries for longperiods,andattaining inmanycasesspecialist status incertaingeographical areas. Further, UCDP consults external area specialists as to best selectsourcesappropriateforthissecondpass.4.3Furtherconsiderationsfordatavalidityandreliability

Ingeneral, thecodebookand itsappendicesaim tocontribute to improve,asmuchaspossible, the reliability of the data, by presenting clear and consequent definitions aswellastransparentcodingproceduresandrules.

3Whilewedoclaim thatourapproachwillprovideabetter sample for substantiveanalysis inmostcases,werecognizethatsomeanalyseswillrequireasource-consistentdataset. Ifyoudoneed to subset the data for source consistency (with the above-mentioned caveats), you cansubset thedataset to solelyuseoneorallnews-sourcesusing thesource_article variable.Youmayfindsuchsubsetsvaluable forthings likeanalysesonmediabiasesor, inconjunctionwiththefulldataset,capture-recapturetests.

Page 14: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

14/39

Theconstructedprecisioncodesfortime,geographyandeventclarity,howeverdetailedand elaborated, may allow for differing interpretations and understandings. Thoughcodingrulesandprecisioncodeshavebeenextensivelydiscussedwithresearchersandtested in a pilot phase of the project during the summer of 2009, the process ofconstructingthegeo-referencedeventdatasetisbasedonseveralproceduresthatmaynotalwayscorrespondtotherealityoftheevents.Forexamplewhenconstructingthedataset,theUCDPcodershave,forpragmaticreasons,workedfromtheassumptionthatalleventsreferring to thesamestartandenddatesand1 locationrepresentaneventclarity of 1.However, due to changing coding rules over a longperiodof time for theannual UCDP data, some of the dates as well as the included information are not aspreciseasothers.Thisisespeciallytruefortheyears2002and2003duringwhichtheUCDPexperiencedmajorstructuralrearrangementsandimprovements.

Further, differences in reporting may affect not only how much of the real worldpopulationofeventsiscoded(seeabove),butalsowhatdetaillevelcanbeextracted.Assuch, for some countries, precise locationsmight be uncommon in reports on armedviolence.Theremightevenbeapreference towardsreportingviolentactivitieson thefirst-order administrative level or less,which decreases the geographical precision inlarge.Inrelationtothis,thecodersoftheGEDareexpertsonthecodingprocedure,yetseldomon the geographical dimensions of each conflict. This opens up for an errormarginalwhere unclear location phrases such as “area” or “zone” can be misinterpreted. Toaddress this challenge, the UCDP begins with studying the geographical andadministrativestructuresforeachnewcountrytocode.

4.4.QualityassuranceThe dataset includes an extensive series of procedures to assure the quality andreliabilityofthedata.First, a two-stage coding procedure is employed for each event, with at least twoseparate coders being in charge of coding and revising each individual event for thefinalized product. This two-stage procedure is conducted at different times (formosteventsat leastoneyearapart,but inothercasesasmuchas10yearsapart),andusesseparatesetsofprocedures,soastoinsurethatcodersdonotinfluenceeach-otherandtoinsureinter-coderreliability.Alargenumberofroutinesareset inplaceatthisstageforqualitycontrol,eachcoderbeing given a fixed, comprehensive set of protocols to be followed that ensure theconsistent treatment of, amongst other, dyad names, dyad IDs, precision scores,geocodinglocations,streamliningofnames,integrityoffatalityestimatesetc.Theexactcodingprocedure ishighly formalized,withall thestepsof theprocessbeinggiven tocoders, together with algorithms to insure the correctness of all the decisions thatcodershavetotake.Similarly,theassignationofeacheventtoatypeofviolenceandtoadyadisdonebyatleasttwocodersfollowingasimilarlystrictsetofroutines.Identifiedinconsistenciesareresolvedthroughregular,frequentmeetings(atleastonceaweek)whereallthecodersandprojectmanagerstakepart.Third, over50automatic tests are applied to thedata, followedbya seriesofmanualchecksbyaproject leader.Algorithmsverifying that locationsareproperlygeo-coded

Page 15: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

15/39

(through theusageofdensityanalysesand interpolation techniques), thatADM1sandADM2sareproperly identifiedand linked toevents, thatdeathestimatesareproperlyused, that IDsareproperlyusedandconsistentwiththeaggregateddatasetsaredoneetc.Someofthesetestsaredoneatpointofinput,GEDusingastate-of-theart,custombuiltdata-management,data-inputanddata-storagefacility,othersatreleasetime.Visualizationsof thedataareprovidedtocodersandprojectmanagersusingaGoogleMapsAPIderivedsolution.The automated routines do not make any modifications to the datasets, requiring ahumancodertomakeallthechangesforanaddedlevelofsecurity.Theautomatedtestsare re-run for asmany times as required, until the data is deemed as acceptable forreleasebyaprojectmanager.****INSERTSOMETEXTABOUTExternalAssessmentsandValidations****

5. Data inclusion: Theeventdatasethasadyad and actor focus, tracing theeventsofallUCDPconflictdyads4for both active years (years that have crossed the 25 battle related deathsthreshold)andnon-activeyears(theremainder).Thus,ifadyadcrossedthe25-deathsthresholdinasingleyear,butdidgeneratesomeevents in either previous or subsequent years, all events belonging to the dyad areincluded,includingthoseinyearswherethethresholdwasnotcrossed5.The dataset includes all three types of UCDP organised violence: state-based conflict,non-stateconflictandone-sidedviolence.AllthreecategoriesoftheUCDPannualdataare mutually exclusive and coded events will therefore also be exclusive and non-overlapping.Thedataseriesstartin1989andeventsbeforethiscalendaryeararenotincluded.AlltheinclusioncriteriaareidenticaltoUCDPGEDversion1.5–4.1.

4Adyadconsistsof twoconflictingprimarypartiesorpartykillingunarmedcivilians. Instate-basedarmedconflicts,atleastoneoftheprimarypartiesmustbethegovernmentofastate.Astate-basedconflictcanincludemorethanonedyad,ifmultiplegroupsopposethegovernmentover the same incompatibility; non-state conflicts andone-sidedviolence instances are alwaysequivalenttoadyad.Further, in non-state armed conflicts, a dyad can only consist of formally versus formallyorganizedgroupsorinformallyversusinformallyorganizedgroups.Aformallyorganizedgroupcan not be fighting an informally organized group to keep non-state conflicts and one-sidedviolenceasindependentcategories.5E.g. State-based dyad 431 (Government of Uganda – UNRF II) crosses the 25 battle-relateddeaths threshold only in 1997.However, this dyad had some events, but did not cross the 25battle-relateddeaths in1996and1998. Inversions1.0and1.1only thoseeventsbelonging tothedyad in1997were included. In thisversionall theeventsbelongingto thedyad(includingthosein1996and1998)wereincluded.

Page 16: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

16/39

ForatabularviewofhowtheUCDPGEDdatasetrelatestootherUCDPproductspleaserefertotable2below.NotethatthereleaseofUCDPGEDisnotsynchronizedperfectlywiththeabovedatasets,thusdatadiscrepanciesmayappearduetodatarevisions.Table2.UCDPdatacorrespondingtothetheGEDDyadType Period ActorInclusion EventInclusion ReferenceState-Based 1989-2015 All dyads that cross the 25

battle-related deathsthresholdinatleastoneyearof the period and have astated goal ofincompatibility.The whole activity of thesedyads is included over theentire 1989-2015 period(including those dyad-yearswhen the 25 battle relateddeaths threshold was notexceeded).

All events leadingto at least onedeathbetween:

- 1989-01-01 to2015-12-31

UCDP/PRIO ArmedConflict DatasetCodebook Version 4-2016

Non-State 1989-2015 All dyads that cross the 25battle-related deathsthresholdinatleastoneyearoftheperiod.The whole activity of thesedyads is included over theentire 1989-2015 period(including those dyad-yearswhen the 25 deathsthreshold was notexceeded)..

All events leadingto at least onedeathbetween:

- 1989-01-01 to2015-12-31

UCDP Non-StateConflict CodebookVersion2.5-2016

One-Sided 1989-2015 All dyads that have evercrossed the 25 deathsthresholdduringthisperiod.The whole activity of thesedyads is included over theentire 1989-2015 period(including those actor-yearswhen the 25 deathsthreshold was notexceeded).

All events leadingto at least onedeathbetween:

- 1989-01-01 to2015-12-31

UCDP One-SidedViolence Codebook,Version1.4-2016

6. Dataset content: Thedatasetcontentscanbedividedinto7categories:eventidentifiers;actorsanddyads;sources;geography;time;clarity;fatalityfigures.Notethatthecurrentversionofthedatasetcontainsdifferentfieldidentifiers(variablenames) and field positions than GED version 1. You will have to adapt your scripts

Page 17: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

17/39

developed for UCDP GED 1 to work with UCDP GED 2-5. Further note that somevariableshavechanged,andothershavebeenadded.AllvariablespresentinUCDPGED5havebeenpresentinUCDPGED2,3,4and4.1.VariablesnewinGED2/3/4ascomparedtoGED1areunderlinedinred.VariableschangedinGED2/3/4ascomparedtoGED1areunderlinedinblack.VariablesunchangedthroughoutGED1/2/3areinsimplebold.6.1EventidentifiersThissectionprovidesuniqueidentifiersforeveryevent(row/entry) inthedataset.Allvariablesinthissectioncanbeusedasauniquekeyforthedataset.id A persistent unique numeric ID identifying each

event. The same id in versions 1.9, 2, 3, 4, 4.1 and 4.5 identifies the same event (incident). This allows changes between versions to be traced at event level.

integer

relid A quick machine parse-able string key (hash) describing the content of each event. The key is constructed using the abbreviation of the country name (for instance AFG for Afghanistan), the calendar year, the type of violence, the dyad or actor ID and a counter. The format of the key is CCC-YYYY-T-DDDDD-SSSS. CCC = Gleditsch and Ward 3-letter textual country code for the first country of activity of side A (e.g. AFG for Afghanistan). YYYY = Calendar Year of the event. T = Type of violence (1 for state-based violence, 2 for non-state violence, 3 for one-sided violence). DDDD = Dyad_id of the event. SSSSS = A counter.

string(255)

6.2ActorsanddyadsThis sectionprovidesvariables thatallow for linkagesbetween theUCDPGEDandallotherUCDPdatasets.This sectionalsoprovideswithvariables to allowyou toaggregate/filter/extractdataonconflict,dyadoractor.

Page 18: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

18/39

active_year 1: if the event belongs to an active

conflict/dyad/actor-year 0: otherwise

Integer

type_of_violence

Type of UCDP conflict: 1: state-based conflict 2: non-state conflict 3: one-sided violence

integer

conflict_dset_id UCDP/PRIO ID for state based conflicts as per the UCDP/PRIO Armed Conflict Dataset. UCDP conflict ID code for non-state dyad as per the UCDP Non-State Dataset. UCDP actor ID code for the one-sided violence actor as per the UCDP One-Sided Dataset. Note that the same conflict_id in GED can represent two separate conflict, as the dyad_id is unique only within each type of violence (i.e. a nonstate dyad can an identical id with a state-based conflict)6. Warning: DO NOT USE this variable to aggregate on or to subset/filter on, as ERRONEOUS results will result. For desired functionality please USE dyad_new_id below. This variable should only be used for merging with data from the UCDP Dyadic, UCDP Non-State and UCDP One-Sided Dataset. Further note that this variable will be deprecated in the future from all UCDP datasets and replaced with the conflict_new_id below.

string(255)

conflict_new_id

A unique dyad id code for each individual conflict in the dataset. PLEASE DO USE this variable to aggregate, subset or filter by conflict within GED.

integer

conflict_name Name of the UCDP conflict to which the event belongs. For non-state conflicts and one-sided violence this is the same as the dyad name.

string(9999)

dyad_dset_id UCDP dyad id code for state based dyad as per the UCDP Dyadic Dataset. UCDP conflict ID code for non-state dyad as per the UCDP Non-State Dataset.

string(255)

6Forexample,dyadid433identifiesthreeseparatedyads:GovernmentofSudan–SLM/A(state-based),GovernmentofSenegal–civilians(one-sided)andDioula–Guere(non-state)

Page 19: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

19/39

UCDP actor ID code for the one-sided violence actor as per the UCDP One-Sided Dataset. Note that the same dyad_id in GED can represent two separate dyads, as the dyad_id is unique only within each type of violence (i.e. a nonstate dyad can an identical id with a state-based dyad). Warning: DO NOT USE this variable to aggregate on or to subset/filter on, as ERRONEOUS results will result. For desired functionality please USE dyad_new_id below. This variable should only be used for merging with data from the UCDP Dyadic, UCDP Non-State and UCDP One-Sided Dataset.

dyad_new_id A unique dyad id code for each individual dyad in the dataset. PLEASE DO USE this variable to aggregate, subset or filter by dyad within GED.

integer

dyad_name Name of the conflict dyad creating the event. A dyad is the pair of two actors engaged in violence (in the case of one-sided violence, the perpetrator of violence and civilians). The two sides are separated by an ASCII dash (e.g. Government of Russia - Caucasus Emirate, Taleban – civilians).

string(9999)

side_a_dset_id The unique ID of side A as in the current version of the UCDP Actor Dataset

string(255)

side_a_new_id A unique ID of side A. integer side_a The name of Side A in the dyad. In state-based

conflicts always a government. In one-sided violence always the perpetrating party.

string(9999)

side_b_dset_id The unique ID of side B as in the current version of the UCDP Actor Dataset

string(255)

side_b_new_id A unique ID of side B. integer side_b The name of Side B in the dyad. In state-based always

the rebel movement or rivalling government. In one-sided violence always “civilians”.

string(9999)

6.3.DescriptionofSourcesThis section contains references to the sourcesunderlying eachevent. See section4.2foradescriptionofthedatacollectionprocessesandsourceselectionprocess.

Page 20: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

20/39

While not available in the dataset, UCDP does keep the full text of the underlyingreporting. As most of this text is copyrighted to news agencies/publishers, it isimpossibletosupplysuchtextstogetherwiththedataset.Ifyouneedtoobtainaccesstothe full text of reports, youwill either need to re-download them from Factiva/LexisNexis7.UCDP does not store the unique identifiers that Factiva, Reuters, AFP etc. assigns toevents,asduringthedecades-longdatacollectionprocessweobservedsuchidentifierschangemultipletimes,makingthemuselessfortracingsourcematerialdirectly.number_of_sources Number of total sources containing information for

an event that were consulted. Note that this variable is only available for data collected for 2013 - 2015, and for recently revised events. For older data, -1. Note that -1 does NOT mean information on the source is missing; reference to the source material is ALWAYS available in the source_article field.

integer

source_article References to the names, dates and titles of the source material from which information on the event is gathered. A reference to at least one source material is available for ALL EVENTS. This variable is highly streamlined for information collected in 2013 - 2015, and less so for older data. For such older data, abbreviations such are used. The most frequent are: R: Reuters News, BBC: BBC Monitoring AP: Associated Press Newswires AFP: Agence France Presse, X: Xinhua DOW: Dow Jones Wires

text

source_office

The name of the organizations publishing the source materials. Note that this variable is only available for data collected for 2013 - 2015, and for recently revised events. For older data, the field is empty. Note that an empty field does NOT mean information on the source is missing; reference to the source material is ALWAYS available in the source_article field, for every event.

text

7Forverysmallsamplesororiginalreportsorinformationonindividualevents,youarewelcometocontactus.

Page 21: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

21/39

source_date The dates the source materials were published. Note that this variable is only available for data collected for 2013 - 2015, and for recently revised events. For older data, the field is empty. Note that an empty field does NOT mean information on the source is missing; reference to the source material is ALWAYS available in the source_article field, for every event.

text

source_headline

The titles of the source materials. Note that this variable is only available for data collected for 2013 - 2015, and for recently revised events. For older data, the field is empty. Note that an empty field does NOT mean information on the source is missing; reference to the source material is ALWAYS available in the source_article field, for every event.

text

source_original The name or type of person or organization from which the information about the event originates in the original report. e.g. “police”, “Lt. Col. Johnson”, ”eyewitnesses”, “rebel spokesman”.

string(9999)

6.4.GeographyData in the UCDP GED is geo-referenced, meaning that each event is connected to aspecificlocationdefinedbyapairoflatitudeandlongitudecoordinates.Each event is connected to a single location. If reporting talks about multiplelocations but gives only one aggregated fatality figure is given, then the followingprocedureisapplied: -oneseparateeventiscreatedforeachlocation; -deathsare splitbetween locationsasevenlyaspossible inorder tomaintainthe fatality figures as integers. The split is performed automatically by the datamanagementsystem8.ThecoordinatesarefixedtotheWorldGeodeticSystemof1984(WGS84),EPSGSRID4326.Thesecoordinatesarespecifiedindecimaldegreeswithaprecisionof6decimalfigures(e.g.75.920211).Coordinates(latitudeandlongitude)usedintheGEDarebasedonthemostpreciselocationmentionedinthesource.Thelowestlevelofspatialdisaggregationforanurbanlocationisthetown,fortheruralareas,thevillage.8ifinsufficientdeathsexisttocreatetherequirednumberofevents(e.g.thereportingspeaksofthreelocationsandofonlytwodead),nosplitisperformed.Instead,thesmallestgeographicalunitencompassingallmentionedlocationsisusedwithanappropriateprecisionscore.

Page 22: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

22/39

Street, neighborhoods, parts of towns are not coded, even when such information isavailableinthereporting.Thus,atownisalwaysrepresentedbyasinglepairoflatitudeandlongitudecoordinates.Suburbs, as long as they can be seen as separate urban areas, distinct from themaintown,arecodedas individual towns. Similarly, airportsarealwayscodedas separateentities.Other features such as “mountains”, “peaks” and “forests” are also used to specifygeographicallocation,aslongastheirsizeiscomparable(sameorderofmagnitude)tothoseoftownsorvillages.Thenextlowestlevelsofspatialdisaggregationaretheadministrativedivisionofthecountry.UCDP uses two levels administrative divisions for every country, the first-orderadministrativedivision(referredtoastheADM1)andthesecondorderadministrativedivision(referredtoastheADM2).In the case of multiple, contested administrative systems (such as in Sri Lanka orNagorno-Karabakh), UCDP uses the administrative system of the governmentcontrollingthecapitalofthecountrywheretheeventtakesplacein.Thehighest levelofspatialaggregationforlocationisthecountry,definedusingtheGleditschandWardlist.Further,allthegeocodingistime-aware,i.e.locationsarecodedtotheplace-namesandadministrativedivisionsthatwereinplaceatthetimetheeventtookplace.Forexample,aneventthattookplacein1989inwhatistodaySt.Petersburg,Russia,isgeocodedashappening in Leningrad, Soviet Union. Thus, changes in administrative structures ofcountries,aswellaschangesinbordersarevisibleinUCDPGED.Alltext-basedinformationonlocationisprovidedintwoways:-thenameofthelocationasmentionedinthetext(where_description).- the name of the location whose coordinates were assigned to the event(where_coordinates).Whileinmostcasesthetwoaresimilar,intwocasestheywilldiffer:- if the locationmentioned is not found during the geocoding process, and a coarseradministrativeunitisused(ex.ifNgwanvillageinShanStateisfoundinthetext,butthecodercannotfindthatexactvillage,where_descriptionwillindicateNgwanvillageinShanStatewhereaswhere_coordinateswillindicateShanState.ThelatitudeandlongitudeinthiscasewillbethoseofShanState).- if the location is spelled in different ways (e.g. Mumbai and Bombay; inwhere_description the spelling of the article will be used, inwhere_coordinates onlyoneformwillbepresent:Mumbaitown).where_coordinates isalwaysstreamlined-alatitude/longitudepairwillonlyeverlink to one where_coordinate. Further in where_coordinate, all capitals are

Page 23: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

23/39

referredtoas“cities”,allurbanlocalitiesotherthancapitalsas“towns”(NewYorkCityTownisacorrectnameinwhere_coordinate),allrurallocalitiesasvillagesorlocalitiesetc.

6.4.1.Geo-referencingsourcesUCDPdoesnotemployanover-archingsourceforgeocoding,asexperiencehasproventhatthereisnoqualityglobalsourceforlocationdata,especiallyforconflictzonesandleast-developedcountries.As such, UCDP coders employ sources such as global gazetteers (such as the UnitedStates National Geospatial Intelligence Agency’s GEOnet Names Server, Maplandia,GeoHack or the Google Geocoding API), local maps provided by governmentalauthorities, UN agencies (such as UN OCHA) or local NGOs, as well as, on occasion,historicalmapssuchastheUSArmyMapServiceGlobalTopographicMapsseries.Supervised semi-automatic geocoding is employed in a number of cases (mainly inEurope and the Former Soviet Union), using Google GeocodingAPI, Yandex andBing.Strings to be geocoded are always manually extracted, however, and the resultinggeocodingisvettedbothmanuallyandbyautomaticprocedures.Extremecareistakentoinsurethefullconsistency,coherenceandreliabilityofthedataacross thedataset.UCDPmaintainsbotha repositoryofall thenamespreviouslygeo-coded,aswellasinternalautomatedsystemsdesignedtoinsurethatconsistency(suchas 1:1 matches between place-names and coordinates) is maintained throughout thedataset.Information used to determine administrative divisions (labelled ADM1 and ADM2)stem from several different sources, commonly from a government’s ownwebsite orreferenceliteraturethatcoversadministrativedivisionsglobally.TheglobalISO3166-2standardisfurtherusedforidentifyingadministrativedivisions.Note that while in most cases ADM1s are the largest administrative divisions in acountry, in some cases (such as Russia or Romania) they are not, as the largestadministrative division is either solely a statistical reporting unit or simply a legalfiction.Correspondence regarding geographical coordinates, administrative divisions and anygeneralquestionsorcommentsregarding thegeographicaspectsof thecodingshouldbeemailedtothemaintainerof thedataset.Also,pleasereportanypotentialerrors inthedataset.

6.4.2.Geo-precisionanditsValuesIn order to determine the precision with which specific latitude and longitudecoordinates are connected to an event location, the dataset uses a geo-precisionvariable. Precise coding rules and examples of how the geo-precision values areassignedintheGEDcanbefoundintheAppendix.Thegeo-precisionvariablecanhavesevenvalues:

Page 24: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

24/39

1-Eventcanberelatedtoanexactlocation,meaningaplacenamewithaspecificpairoflatitudeandlongitudecoordinates;2 - Event can be “near”, in the “area” of or up to 25 kmaway froman exact location,meaningaplacenamewithaspecificpairofcoordinates;3 - Event can be related to a second order administrative division (ADM2), such as adistrict,municipalityorcommune4 - Event can be related to a first order administrative division (ADM1), such as aprovince,stateorgovernorate;5-Eventcanonlybespecifiedtoa featurethat isneitheraknownpointnoraknownformaladministrativedivision,butratheralinearfeature(e.g.alongriver,aborderoraroad)orafuzzypolygonwithoutdefinedborders(informalregions,largeradiusesetc.).Arepresentationpointischosenforthefeatureandemployed.Similarly,ifalocationisonlyknowntobebetweentwopoints,andthesetwopointsaremorethan25kmapart,suchlocationsarecodedwithgeoprecision5.6-Eventcanonlyberelatedtothewholecountry;7 - Event can only be related to an estimated pair of coordinates at sea or in the air(providedtheairplanedidnotcrashasaresultoftheevent;insuchcasesthelocationofthecrashiscodedwiththeappropriateprecisioncode).

where_prec Described above integer where_coordinates Described above string(9999) where_description Described above text adm_1 Described above string(9999) adm_2 Described above string(9999) latitude Latitude (in decimal degrees) numeric(9,6) longitude Longitude (in decimal degrees) numeric(9,6) geom_wkt An Open Geospatial Consortium textual representation

of the location of each individual point. Formatted as well known text without SRID.

string(9999)

priogrid_gid The PRIO-grid cell id (gid) in which the event took place. Compatibility with PRIO-grid 1 and 2 (Tollefsen, 2012) is assured. Warning: We associate every point to the PRIO-grid that contains it, even if the point is in another country than the one officially assigned to the respective PRIO-grid cell through their majority area rule. It is your responsibility to make sure the covariates for the PRIO-grid cell are correct for each event. Further, for the same reason, DO NOT, under any circumstances, first clip out (subset) PRIO-grid by country before merging

integer

Page 25: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

25/39

with UCDP GED as data loss will certainly occur. Refer to your copy of the PRIO-grid for further details on PRIO-grid's majority assignment rule (p.3).

country Name of the country in which the event takes place. Note that this variable differs from the country variable in the annual UCDP data, which registers the country of the incompatibility/actor and not the country location of the specific events.

string(999)

region Region where the event took place. One of following: {Africa, Americas, Asia, Europe, Middle East}

string(999)

isocc The ISO 3-letter code of the country where the event takes place.

string(3)

gwno The Gleditsch and Ward numeric identifier of the country where the event takes place

integer

gwab The Gleditsch and Ward alphabetic identifier of the country where the event takes place

string(3)

6.5.ClarityThiscodeswhetherthereportingwassufficientlyclearforthecodertobeabletofullyidentifytheeventitselfornot.

1:(denotinghighclarity):eventswherethereportingallowsthecodertoidentifythe event in full. That is, eventswhere the individual happening is described by theoriginal source in a sufficiently detailed way as to identify individual incidents, i.e.separateactivitiesoffightinginasinglelocation:Example of such reporting: “2 people where killed in Banda Aceh town on the 9th ofDecember in fightingbetween thegovernmentandGAMwhena car exploded inamainmarket.”

2 : (denotinglowerclarity):foreventswhereanaggregationof informationwasalready made by the source material that is impossible to undo in the codingprocess. The codermerely has access to sources saying that events have taken place(and has aggregated fatality figures), but cannot break apart the reporting intoconstituentevents.Sucheventsaredescribedbytheoriginalsourceonlyasaggregates(totals)ofmultipleseparateactivitiesoffightingspanningoveralongerperiodthanasingle,clearlydefinedday. Given that the report aggregatesmultiple incidents into one story impossible todisaggregate back, it is unclear how many battles took place during the time periodspecifiedinthesource.Thustheyare"secondaryevents",becausetheformofreportingdoes not allow the coder to know exactlywhen the casualties occurred, and how thebattleswerefought,andtheeventthussummarisesaseriesofclashesintooneevent.Ofcourse,UCDPhasapreferenceforeventswithaclarityof1;eventswithaclarityof2are justacomplement to the former. In fact,often times, it ispossible,usuallybycorroboratingmultiplereports,toidentifysomeoftheclarity-1eventscontainedinthe

Page 26: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

26/39

description making up the event with clarity of 2. In such cases fatalities in suchidentified events are subtracted from those given in the clarity-2 event. This leads toclarity-2eventssometimesdefyingtheparametersofthefatalityestimates,asthe‘highestimate’mayattimesbelowerthanthe‘best’or‘low’estimate.Examplesofclarity-2events:“TheUkrainiangovernment informsthat29peoplehavedied in thepastsixdays inanumberofclasheswiththeseparatistsalongthelineofconflict”."inthepast2months120peoplewerekilledinoperationsthroughoutAssam"."The responsible for the Aceh military operation indicates that 29 people have beenkilledinvariousincidentsoffightingoverthepastfivedays".event_clarity described above integer 6.6.TimeEacheventisdefinedtohaveoccurredatacertaindate.Theprecisionofthedatasetisonecalendarday,startingat00:00(midnight)andendingat23:59localtime.

Inmanycases,theexactdayaneventhastakenplaceisimpossibletofindoutwithanycertainty.Inthosecases,atemporalprecisionvariableisprovidedwhichdenoteswithwhataccuracyaspecifictimeperiodinwhichtheeventoccurredisknown.

Thetemporalprecisionvariablecanhavesixvalues:

• 1–theexactdayoftheeventisknow;

• 2–theexactdayoftheeventisnotknown,onlytimeperiodbetween2-6days;

• 3-theexactdayoftheeventisnotknown,onlytheweek;

• 4-theexactdayoftheeventisnotknown,onlythemonth;

• 5–theexactdayoftheeventisnotknown,onlytheyear.

date_prec How precise the information is about the date of an

event. 1: exact date of event is known; 2: the date of the event is known only within a 2-6 day range. 3: only the week of the event is known 4: the date of the event is known only within an 8-30 day range or only the month when the event has taken place is known 5: the date of the event is known only within a range longer than one month but less than one calendar year.

integer

date_start The earliest possible date when the event has taken place.

Date YYYY-MM-

Page 27: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

27/39

DD date_end The last possible date when the event has taken place. Date

YYYY-MM-DD

6.7.FatalityfiguresThissectionprovidesfatalityfiguresforeachevent.Anoteonciviliandeaths:Civiliandeathscanexistinallthreecategoriesofviolence.Instate-basedandnon-stateviolence,civiliandeathscount“collateral”killings,i.e.whenoneormoreciviliansarekilledasaneffectoffightingbetweenthetwowarringparties.At times, such fightingmay even result in only the civilian bystanders receiving fatalinjuries.Similarly, impreciseshellingorbombing inthecontextofanarmedconflict iscodedas state-basedviolenceunless it is clear (fromeither reportingor context) thatcivilianshavebeenexplicitlytargeted.In one-sided violence, the targeted and killed civilians are always registered in thedeaths_civilianscolumn.deaths_a The best estimate of deaths sustained by side a.

Always 0 for one-sided violence events.

integer

deaths_b The best estimate of deaths sustained by side b. Always 0 for one-sided violence events.

integer

deaths_civilians The best estimate of dead civilians in the event. For non-state or state-based events, this is the number of collateral damage resulting in fighting between side a and side b. For one-sided violence, it is the number of civilians killed by side a.

integer

deaths_unknown The best estimate of deaths of persons of unknown status.

integer

best_est The best (most likely) estimate of total fatalities resulting from an event. It is always the sum of deaths_a, deaths_b, deaths_civilians and deaths_unknown.

integer

high_est The highest reliable estimate of total fatalities integer low_est The lowest reliable estimate of total fatalities integer

Page 28: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

28/39

6.8.Variablespresent incurrentversionsofGEDnotpresent inolderversions:where_location (GED 1.0 – 1.9):renamedtowhere_descriptioncoordinate_location (GED 1.0 – 1.9):renamedtowhere_coordinatesevent_type (GED 1.0 – 1.9): given the high level of confusion experienced byouruserswithregardstotheactualmeaningofthevariable, itwasreplacedbyanewconcept,event_clarity.Geocomment (GED 1.0 – 1.9): eliminatedas ahuman-legible free-text commenton location names, geocoding sources and alternative spellings proved to have moredisadvantagesthanadvantagesfordatausabilityinalarge,quant-orienteddataset. dyad_unique (GED 1.5):replacedbydyad_new_id.Dyad_uniquewasatemporary,stop-gapmeasurespecifictoGED1.5topreventanacuteproblemoriginatingfromthemerging of three separate systems. The construction of a newUCDP system togetherwithintroductionofanUCDP-widesystem uniq (GED 1.1 – GED 1.5): replaced by id. Compared to uniq, which wasspecifictoeachreleaseofthedata,idispersistent,i.e.consistentacrossreleasesofthedatasets.Anentrywiththesameid inversion1.9describesthesamereal-lifeincidentinversions2.0–5.0.

7. Available Formats (Format Declaration): TheUCDPGEDisprovidedinavarietyofformatsforusebyresearcherswithindifferentfieldsandwithdifferentneeds.Allformatsareavailablefordownloadfreeofcharge(noregistrationrequired)fromtheUCDPGEDwebsite(http://ucdp.uu.se/ged).TheGEDiscurrentlyavailable(asofversion5.0)inthefollowingformats:CommaSeparatedValues(CSV),Excel (XLS),ESRIShapefile (SHP),RDataFrame(RData)andSQLdumpsforusewithspatially-awaredatabasesolutions.Pleasenotethateachversioncomeswithattached‘PlatformNotices’detailingspecificsofthefileformatsandinstructionsforloadingandusingthedata.A brief summary is provided here to help users understand each file format and itscompatibilityaswellastoquicklygetstarted:CSVformat:Aplaintextfilecontainingstructuredcommaseparatedvalues.Thefileissuitable for usage with statistics packages, for processing with various programminglanguages,etc.Note that the format implements the full CSV specifications as summarized inRFC41809.The fullCSVspecificationsareproperly implementedbya largenumberof

9TheRFC,summarized,statesthatdataisstoredinUTF-8,columnnamesarelistedinthefirstrow,lines(rows)areterminatedintheWindowsnew-linesystem(CRLF),fields(record/cells)

Page 29: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

29/39

softwarepackages,includingOpenOffice,Stata(usingtheinsheetcommandorthemenu),SPSS,R(usingtheread.csvfunction),PHP(usingthefgetcsvandfputcsvfunctions),Python(usingthecsvmodule)etc.As UCDP uses the defaults, typically absolutely no customization or specifications ofadditionalparametersisneededfortheusageofthefile10.Notethatthisfiledoesnotcontainthegeomvariableasitisaplaintextfile.Excelformat:AnExcel2007compatiblefilethatcanbeusedforvisualizingthedatainasimpleOffice-likesystem.Notethatthisfiledoesnotcontainthegeomvariableasitisaplaintextfile.Rdataformat:AnRdata-frame.Requiresrgdalandsuggestssp.Esri shapefile format:AnESRIshapefilepackage(shp/shx/dbf file) thatcanbeusedwith variousdesktop/legacyGIS solutions such asESRIArcGIS,QGIS etc. or importedintoR (using rgdal) or STATA (using spmap) if the spatial features of the dataset areneededforstatisticalprocessing.Notethatthisfilecontainsthespatialfeatures(geomvariable)intheESRIformat.Spatial database format: A dump of an SQL table containing the entire dataset.Importable incompatibleSQLdatabases forprocessingandanalysesaswellasusablewithnew-generationGIStools(suchasQGIS).CurrentlysupportedarePostgresqlwiththePostGISextension(versions8.3andabovewith PostGIS 2.0 or above; created on a version 9.1 with PostGIS 2.1) and MySQL(version 5 and above, created with MySQL 5.09) and SQL Server (created with SQLServer2012).The table is ‘flattened’, i.e. all the information is stored in a single tablerather than inmultipledataunderarelationalmodel. Ifyouneedamorenormalizeddatamodel,pleasecontactus.Note that this file contains the spatial features (geom variable) in the OGC format(geometry,point,SRID4326)11API(machine-to-machinecommunication):Anopenbetaexists forUCDPGEDdataasaweb-servicethroughaRESTfulAPIundertheDaaSparadigm.DescriptionsonhowtousetheUCDPGEDBetaAPIareavailableathttp://ucdp.uu.se/apidocs/

areseparatedwithacomma(‘,’),witheachfieldcontainingaspace,adoubleapostrophe(“),acomma(,)oranewline(CRLF)beingenclosedwithindoubleapostrophes(“).Doubleapostrophesareescapedbydoubleapostrophes(“”).10NotethatExcelversionspriortoMSOffice2016defaultstoTab-Separated-ValueswhenimportingCSVfiles,anddoesnotsupportthefullspecificationrequiredbyUCDPforthefiletocorrectlyload.PleaseusetheprovidedExcelfile.Similarly,ArcGIShasabrokenCSVimportsystemasofversion10.Pleaseusetheprovidedshapefile.11TheMySQLdumpdoesnotcontainanOGCbinarygeometrycolumn,onlyanOGCWellKnownTextcolumnlabeledgeom_wkt,duetodifferingimplementationsofbinarygeometryinvariousMySQLengines.TheMicrosoftSQLServerdumpwillcreatethebinaryblobcolumnontheflywhenloadingthefile.

Page 30: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

30/39

8. Acknowledgements: The UCDP geo-referencing and event data project is grateful for valuable external

input from Halvard Buhaug at the Peace Research Institute, Oslo (PRIO), Nils B.WeidmanatKonstanzUniversity,LucGiradinatETHZurichaswellasTomislavDulicatthe Hugo Valentin Centre, Uppsala University, for infrastructural support, ideas andcomments.

CurrentmembersoftheUCDPteam,inalphabeticorder:MarieAllansson,MihaiCroicu,GarounEngström,ErikaForsberg,HelenaGrusell, Prof.Håvard Hegre, Stina Högbladh, Gabrielle Lövquist, Prof. Erik Melander (director),TheresePettersson,MargaretaSollenberg,RalphSundberg,SamTaub,LottaThemner,Prof.PeterWallensteen(senioradvisor). 10. References: Boschee,E.;J.Lautenschlager,S.O'Brien;S.Shellman;J.Starz,James;M.Ward(2015),"ICEWSCodedEventData",http://dx.doi.org/10.7910/DVN/28075,HarvardDataverse,V7.DeborahJ.G.,P.A.Schrodt,O.Yilmaz(2009).ConflictandMediationEventObservations(CAMEO)Codebook.http://eventdata.psu.edu/data.dir/cameo.html,2009.Raleigh,C.,A.Linke,H.Hegre,J.Karlsen(2010),IntroducingACLED:AnArmedConflictLocationandEventDataset,JournalofPeaceResearchvol.47(5):651-660.Rød,E.G.,N.B.Weidmann(2014),ProtestingDictatorship:TheMassMobilizationinAutocraciesDatabase,APSAAnnualMeetinginWashington,DC,2014.Salehyan,I.,C.Hendrix,J.Hamner,C.Case,C.Linebarger,E.Stull,J.Williams(2012).SocialconflictinAfrica:Anewdatabase.InternationalInteractions,38(4):503-511.Schrodt,P.A.,J.BeielerandM.Idris(2013),Three'saCharm?OpenEventDataCodingwithEL:DIABLO,PETRARCH,andtheOpenEventDataAlliance.PaperpresentedattheInternationalStudiesAssociationAnnualConvention,Toronto,March2014.Schrodt,P.A.TwentyyearsoftheKansaseventdatasystemproject.ThePoliticalMethodologist,14(1):2–8.NationalConsortiumfortheStudyofTerrorismandResponsestoTerrorism(START).(2013).GlobalTerrorismDatabase.Retrievedfromhttp://www.start.umd.edu/gtdAlsoconsultthefollowingUCDPCodebooks:UCDPDyadicDatasetCodebook:http://www.pcr.uu.se/research/ucdp/datasets/ucdp_dyadic_dataset/

Page 31: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

31/39

UCDP/PRIOConflictDatasetCodebook:UCDPNon-StateConflictDatasetCodebook:UCDPOne-SidedDatasetCodebook:UCDPActorDatasetCodebook:http://ucdp.uu.se/downloads/

9. Appendixes:

APPENDIX 1: Temporal Precision Coding and Date EstimationRulesThis document specifies the qualifications for all temporal precision variablevaluesaccording to therulesconstructedby theUCDPfor theGED. Italsosetsrules for interpretation of time-related expressions and estimation of events’start and end dates. The appendix presents concrete examples that guidetemporalprecisioncodinganddateestimationprocedures.EstimationofStartandEndDates1. Start and end dates of the events are set according to information in the

originalsources.

2. Ambiguous time-relatedexpressions(e.g.past fewdays)are interpretedon

the basis of the rules presented below. This ensures uniform estimation oftheevents’startandenddatesthroughouttheentireGED.

3. Ifthesourcedoesnotprovideanyinformationaboutthetimeperiodduring

which the event took place, dates are estimated for three days, countingbackwardsfromthedayofreportingandexcludingthedayofreporting:

a. “24rebelsoldierswerekilled”;

b. “Security forces stepped up operations against the largestinsurgentgroupinAssamstate,whereanewgovernmentwassettotakechargeonFriday.ApolicespokesmansaidfourmembersoftheoutlawedULFAwerekilledinthebattles”;

c. “10bodiesfoundburiedinamassgraveinterritorycontrolledbytheULFArebels”.

Page 32: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

32/39

TemporalPrecision1–DailyPrecisionofTime1. If the exact date of an event is known the temporal precision code of 1 is

applied. Such events have the same start and end dates that are preciselyspecified in the news sources either by dates, day names, hours or otherspecifictemporalconcepts:

a. “14th January”, “today”, “yesterday”, “last Tuesday” - date for

specifiedday;

b. “Mondaynight”-dateforMonday;

c. “Lastnight”-dateforprecedingdayofreporting;

d. “Theotherday”-datefortheprecedingdayofreporting.

TemporalPrecision2–ImpreciseTime(2-6days)1. Temporalprecisionvalueof2shouldbeusedinthosecaseswhenstartand

end dates for events are of unspecified character, spanningmore than onecalendardaythoughnolongerthansixdays,i.e.shorterthanaweek:

a. “Recently”, “recent attacks” - dates for 3 days preceding and not

includingthedayofreporting;

b. “Past/lastfewdays”-datesfor3daysprecedingandnotincludingthedayofreporting;

c. “Around2July”-datesforthreedays,1-3July,withthestateddate+/-onecalendarday;

d. “Over the weekend” - dates for Saturday and Sunday, if sourcedoes not include Friday in the concept of weekend and unlessspecificdates/daysfortheweekendareprovidedinthesource;

e. “Sincethebeginningoftheweek”,”thisweek”-datesfromMondaytothedayofreporting;

f. “NightbetweenSundayandMonday”-datesfor2days;

g. “Past24hours”-datesforthedayofreportingandtheprecedingday;

h. “Past 48 hours” - dates for the day of reporting and2 precedingdays;

i. “Past 72 hours” - dates for the day of reporting and2 precedingdays;

Page 33: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

33/39

j. “Past 2 days” - dates for 2 days preceding and not including thedayofreporting;

k. “SinceThursday”–datesfromThursdayuntilthedayofreporting;

l. “Five-dayoffensive”-datesfor5daysoffightingincludingthedayofreporting;

m. “Continuousfightingbetween13-16February”-specifieddates;

n. “Night-longbattle”-datesfor2dayscoveringthewholenight;

o. “Nightofclashes”-datesfor2dayscoveringthewholenight;

p. “Last6daysof January” -dates for25-30 January, including finaldateofmonth;

q. “Late last week” - dates for Friday to Sunday of the precedingweek.

TemporalPrecision3–WeeklyPrecisionofTime1. Temporalprecisionvalueof3shouldbeusedinthosecaseswhenstartand

enddatesforeventsarespecifiedtoacertainweek,butspecificdatesarenotprovided:

a. “Last week” - dates for Monday-Sunday of the preceding week.

Exceptions can be made if there are reasons to believe that theeventtookplaceduringtheweekofthereporting(e.g.sometimes“a raid lastweek” reported on Sundaymight refer to the periodMonday-Saturday of the same week, then dates for Monday-Saturdayofthatweekshouldbeused);

b. “Pastweek” -dates for7days including thedayof thereporting,unless text indicates that past week refers to an ongoing week(startingMonday);

c. “FirstweekofAugust”-datesforAugust1-7.

d. “Week-old offensive” - dates for a week of fighting, 7 days,includingthedayofreporting;

TemporalPrecision4–MonthlyPrecisionofTime

Page 34: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

34/39

1. Temporalprecisionvalueof4shouldbeusedinthosecaseswhenstartandenddates foreventsarespecified toacertainmonth,butspecificdatesarenotprovided:

a. “Beginning of/early March” – March 1 to March 10/day of

reporting;

b. “MiddleofMarch”–March15+/-5calendardays,i.e.March10-20;

c. “End of/lateMarch” –March 15 to the last day ofMarch/day ofreporting;

d. “Anumberofweeks”,“recentweeks”-datesfor3weekscountingbackwardsfromthedayofreporting;

e. “Severalweeks”–datesfor3weeks;

f. “Earlierthismonth”–startingthe1stdayofthemonthandendingonthedayprecedingthedayofreporting;

g. “Lastmonth”-datesforthemonthprecedingtheoneonwhichtheeventwasreported;

h. “Afortnightago”-datesforpreceding14daysincludingthedayofreporting.

TemporalPrecision5–AnnualPrecisionofTime1. Temporalprecisionvalueof5shouldbeusedinthosecaseswhenstartand

enddatesforeventsarespecifiedtoacertainyear,butspecificdatesarenotprovided:

a. “1995”-1995-01-01to1995-12-31;

b. “Lastyear”-datescoveringtheyear,YYYY-01-01toYYYY-12-31;

c. “Pastyear”–AlldatesfromthedateofreportingbacktoYYYY-01-01

d. “Early1999”–1999-01-01to1999-04-30;

e. “Mid1999”–1999-05-01to1999-08-31;

f. “Late1999”–1999-09-01to1999-12-31;

g. “Past3months”-datesfor3monthscountingbackwardsfromthedayofreporting(maynotcrossoverintoanothercalendaryear);

Page 35: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

35/39

h. “Pastfewmonths”–datesfor3monthscountingbackwardsfromthe date of reporting (may not cross over into another calendaryear).

Page 36: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

36/39

APPENDIX2:Geo-precisionCodingRulesThis document gives an overview of the coding rules for geo-precision codescoupledwithexamplesandcomments.Generalrules1. Allgeographical locationsarecodedwithmoderationwithpreferencegiven

tomorecertainlocationseveniftheyrepresentahigherlevelofaggregationover those locations which are less certain but represent a lower level ofaggregation.

2. Unclear geographical referenceswith several possible levels of aggregation

are coded as the highest possible one. For instance, if there is a town, adistrict (ADM2) and a province (ADM1) of the same name and the sourcedoesnotspecifytowhichtypeof locationitrefers,thenthelocationwillbecodedasADM1.

3. If event location (camp, bridge, road etc.) has the same name as a certain

suburb, town or village (e.g. Uppsala IDP camp and Uppsala town), thecoordinatesforthattownorvillageshouldbeusedonlyifitisknownthattheevent location iswithin or close to (within 25 km) that town or village. Ifinformation about the locations’ proximity to that town or village is notavailable, the location is aggregated to the lowest available administrativedivision.Forinstance, if it isnotknownthatUppsalaIDPcampiswithin25kmfromUppsalatown,coordinatesforUppsalamunicipality(ADM2)shouldbeused.

4. If the source refers to a certain location (e.g. river, forest, lake, park,

mountainsetc.)thatisnotsimilarinsizewithalocality,orthatisnotapoint,arepresentationpointiscreatedwithprecision5.Ifthatlocationlieswithinan ADM2 or ADM1, the ADM2 or ADM1 is attached to the representationpoint. Do not aggregate e.g. rivers or national parks to administrativedivisionsifrepresentationpointscanbemade.

5. When coding historical observations the GED uses the names of the

administrative divisions in force at the time of the reporting. If theboundariesofADM1havechangedover time ina country, thedatasetusesestimatedcoordinates forolderprovincesbasedon therelevantseatof theADM1atthetimeoftheevent.

AhistoryofadministrativechangesistrackedinternallybytheUCDPsysteminadatastructurereferredtoasageotree.Ifyourequireaccesstosuchfiles,contactus.

Page 37: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

37/39

Geo-precision1Geo-precisionvalueof1isusedifthelocationinformationcorrespondsexactlyto the geographical coordinates available. Each pair of coordinates is alsocoupledwithnamesforADM1andADM2whenavailable.1. “City”,“town”,“village”,“location”,“locality”-centroidpointcoordinates;

2. “District”, ”quarter”, ”neighbourhood”, “locality” (of town) - coordinates fortowncentroidpointareappliedhere,andnotthespecificsectionofit,thoughthenameanddetailsarekeptintextinparenthesisin“Where";

3. Airbattlesiflocationisclear,i.e.“aplanewasshotdownoverKitgum”.

Geo-precision2If the location information refers to a limited area arounda specified location,coordinatesforthatlocationtogetherwiththegeo-precisionvalueof2areused.1. “Near/in the vicinity of/adjacent to/just outside/around Kitgum town” –

coordinatesforKitgumtown;

2. “Pietermaritzburgarea”–coordinatesforPietermaritzburgtown;

3. “Outskirts/suburbs of Bujumbura city” – since outskirts and suburbs areunderstood as relatively independent and distant entities coordinates forBujumburacityshouldbeused;

4. “17kmfromUppsalatown”–iftheeventtakesplacewithinadistanceof25kmfromaspecifiedlocation,coordinatesforthatspecifiedlocationareused;

5. “North of Luanda city”, “southeast of Y mountain” - unspecified distancesfromaspecifiedlocationareunderstoodtobenearthestatedlocation;

6. “BujumburacitytowardsGishinganovillage”–ifcoordinatesforGishinganovillage can not be retrieved, then coordinates for Bujumbura city will beused;

7. “NiulandvillagenearDimapurtown”- ifcoordinatesforNiulandvillagearenotavailable,butcoordinatesforDimapurtownexist,thelatterareused;

8. “Dungu territory in DRC” – third level administrative divisions (ADM3), ifsmall enough to have an approximate radius of 25 km or less, receive aprecisioncodeof2.

Geo-precision3If the source refers to or can be specified to a larger location at the level ofsecondorderadministrativedivisions (ADM2), suchasdistrictormunicipality,

Page 38: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

38/39

theGEDusescentroidpointcoordinatesforthatADM2.Ifthesearenotavailable,representationcoordinatesforatownwithinthatADM2areused.ThenameoftheADM2inforceatthetimeofreportingisrecordedinthevariableADM2.1. “Arushadistrict,Arushaprovince”-coordinatesforArushadistrict(ADM2);

2. “Burambicommune,Burundi”–coordinatesforBurambicommune(ADM2);

3. Airbattlesifunclearlocation-ifthebattletakesplace“over”acertainADM2,coordinatesforthatADM2willbeused;

Geo-precision4Ifthelocationinformationreferstoafirstorderadministrativedivision,suchasaprovince(ADM1),theGEDusesthecoordinatesforthecentroidpointofADM1.1. “Cibitokeprovince,Burundi”–coordinatesforCibitokeprovince(ADM1);

2. Airbattlesifunclearlocation-ifthebattletakesplace“over”acertainADM1,coordinatesforthatADM1areused;

3. If theADM2 inwhich theevent tookplace inunclear(e.g.differentsourcesrefertodifferentADM2sinwhichthesameeventtookplace),thelocationisaggregatedtotheADM1level;

Geo-precision5Geo-precisionvalueof5isusedinthesecases:1. Ifthelocationinformationreferstopartsofacountrywhicharelargerthan

ADM1, but smaller than the entire country such as “Southern Lebanon”,“NorthernUganda”.Inthesecases,arepresentationpointiscreatedforthatpartof thecountryandusedasa representationof thatarea togetherwithgeo-precision value of 5. Note that these points are stored and reusedconsistentlybytheUCDP(thus,alleventsassignedto“NorthernDRCongo”willhavethesamecoordinatesrecorded).

2. If a pair of coordinates is estimated as a representation point for a linear,

non-administrativepolygonorfuzzygeographicfeature(river,informalarea,large lake etc.). For example, if the location is on the border between twocountries and the location of such point is not precisely known, a pair ofestimated coordinateswill be used togetherwith geo-precision value of 5.For example, “on the border betweenUganda and Sudan”will be coded as“Uganda/Sudan border” with the coordinates for a selected point on theborder between Uganda and Sudan; Note that these points are stored andreusedconsistentlybytheUCDP(thus,alleventsassignedto“Uganda/Sudanborder”willhavethesamecoordinatesrecorded).

Page 39: UCDP GED Codebook version 5 · What the UCDP GED version 5.0 dataset contains: The basic unit of analysis for the UCDP GED dataset is the “event”, an individual incident (phenomenon)

39/39

3. If the location informationrefers to islandswhicharenotanADM1or2oftheirown.Forexample,“Zanzibarisland”willbeunderstoodaseasternpartofTanzaniaandreceivegeo-precisionvalueof5.Ifapairofcoordinatesforthat island is not available in the gazetteers, it can be represented by anADM1inthatisland.

4. If the location is not specific and need to be estimated (for example, “road

betweenPaderandKitgum”,“alongAswariver”etc.),orthelocationismorethan25kmawayfromanotherlocation(forexample,75kmsouthofKitgumtown),thenarepresentationpointiscreatedforthatpoint.Thisisdoneevenif the two points are located in the same ADM2 As such, if an event isdescribedastakingplaceon“theroadbetweenYeiandRasulinYeidistrictofEquatoriaState”,thenapointisestimatedonthatroad,withprecision5,withboththeADM1(Equatoriastate)andtheADM2(Yeidistrict)coded.

Geo-precision6Ifthelocationinformationreferstoanentirecountry,centroidpointcoordinatesofthatcountryareused.Also,ifthelocationisnotprovided/isunclear/referstoseveral locations which can not be split and covers the whole country and aparticularactivityareaoftheactorisnotclear,centroidpointcoordinatesofthatcountryareused.

1. "Germany"-centroidpointcoordinates;

Geo-precision7Iftheeventtakesplaceoverwaterorininternationalairspace,thegeographicalcoordinatesinthedataseteitherrepresentthecentroidpointofacertainwaterarea or estimated coordinates according to similar techniques as presentedaboveforgeo-precisioncode5.Forairevents,precisioncode7isusedonlyifthedeathisnottheeffectofordidnotresultintheairplanecrashing(insuchacase,1-5precisioncodesareusedwiththelocationofthecrash).1. “Southernocean”–centroidpointcoordinates;

2. “BayofBengal”–centroidpointcoordinates;

3. “37kmoffthecoastfromStockholmcity”–estimatedcoordinatesforapoint37kmand90degreesoffthecoastofStockholm.

4.“theministerwasstabbedonanairplaneenroutetoDelhiafterdepartingIslamabad”–coordinatesforIslamabadairport,precisioncode7.