copyright 2017, vaughn smith
TRANSCRIPT
Near real-time monitoring of tropical dry forests in North and Central America
by
Vaughn Smith, B.S.
A Thesis
In
Wildlife, Aquatic, and Wildlands Science and Management
Submitted to the Graduate Faculty
of Texas Tech University in
Partial Fulfillment of
the Requirements for
the Degree of
MASTER OF SCIENCE
Approved
Dr. Carlos Portillo-Quintero
Chair of Committee
Dr. Guofeng Cao
Dr. Gad Perry
Mark Sheridan
Dean of the Graduate School
December, 2017
Copyright 2017, Vaughn Smith
Texas Tech University, Vaughn Smith, December 2017
ACKNOWLEDGMENTS
My time at Texas Tech University has been incredibly rewarding, primarily
thanks to Dr. Carlos Portillo-Quintero, my thesis committee chair and mentor, who
expertly guided me through the last two and a half years of scientific exploration. I have
always had an interest in the natural and technical sciences, but was admittedly hesitant
upon accepting the graduate research assistant position with Dr. Portillo-Quintero as I
thought my skills and knowledge may be lacking. However, with Dr. Portillo-
Quintero’s continued encouragement and tutelage I acquired the necessary geographic
information system and data science proficiencies to significantly contribute to an
emerging body of research. This study would not have been possible without support
from the Tropi-Dry Collaborative Research Network funded by the Inter-American
Institute of Global Change Research. I am quite pleased with the results of my research
and feel like I can truly call myself a ‘scientist,’ thanks to Dr. Carlos Portillo-Quintero.
I would also like to thank my other committee members, Dr. Guofeng Cao and
Dr. Gad Perry for guiding me through my thesis. Their thoughtful and thorough
inquiries throughout the process helped me to create a more robust final product, on
which I am proud to put my name.
Additionally, I would like to thank Dr. Robert Cox, Dr. Terry McLendon, and
Dr. Katie Lewis, all brilliant and passionate professors whose courses I sincerely
appreciated. They, as well as all of the faculty, staff and students comprising the College
of Agricultural Sciences and Natural Resources, helped to provide all of the
supplementary expertise needed to fully round out and complete my graduate education.
Finally, I would like to thank my friends, old and newly made in Lubbock, and
family, especially my mother, Dr. Katherine A. Groves, who has provided unending
love, support, and guidance throughout my life; her strength and intellect have always
been and will continue to be an inspiration. I must also acknowledge my father, Chester
B. “Solo” Smith, who passed away in 2005 – your love, wisdom and overall grand
personality are missed and cherished.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
INTRODUCTION
1.1 Objectives of the study
LITERATURE REVIEW
2.1 Defining tropical deforestation
2.2 Tropical dry forest deforestation trends in Latin and Central America
2.3 Mapping deforestation using remote sensing and potential for near
real-time monitoring
2.4 BFAST algorithm family
2.5 The challenges of detecting and tracking deforestation in TDF
2.6 Other change detection algorithms
MATERIALS AND METHODS
3.1 Study area and project context
3.1.1 Yucatan, Mexico
3.1.1.1 Study Area 1 (Y1): East of Tekit
3.1.1.2 Study Area 2 (Y2): Tekik de Regil
3.1.2 Guanacaste, Costa Rica
3.1.2.1 Study Area 1 (G1): Cuajiniquil-Soley
3.1.2.2 Study Area 2 (G2): North of Bijagua
3.2 Data acquisition and preprocessing
3.3 Vegetation indices
3.4 Open-source and licensed software
3.5 System architecture
3.6 Design of a near-real-time monitoring system using BFAST
3.7 Validation
3.8 Near real-time validation
RESULTS AND DISCUSSION
4.1 Breakpoints and magnitudes
4.2 Accuracy assessment
4.3 Step towards a near real-time deforestation monitoring system in
Central America
4.4 Sources of error and implications for BFAST implementation
4.5 Implications for biodiversity and conservation
CONCLUSION
LITERATURE CITED
APPENDICES
A. BFAST CODE IMPLEMENTATION IN RSTUDIO
B. ERROR MATRICES
ABSTRACT
Tropical Dry Forests (TDF) represent one of the most preferred habitats in the
tropics for human settlements and exploitation, and directly and indirectly provide vital
natural resources, such as food, water, wood products, minerals, medicines, etc., to
support the lives and livelihoods of approximately 90 million people in Latin America.
Unfortunately, the rate of TDF deforestation in Latin America, as well as globally, has
increased over the last several decades. Deforestation monitoring using time series of
Landsat imagery is becoming a reality with the advent of cloud computing, open-source
programming platforms, and near real-time distribution of imagery data, but little has
been done to implement these systems in TDF landscapes.
The general objective for my research was to evaluate the feasibility and efficiency of
automated time-series analysis tools (e.g. BFAST in the R statistical analysis
programming language) for detecting and monitoring deforestation in TDF landscapes
using satellite imagery. Results show that BFAST time-series analysis tools were
effective in accurately determining deforestation events. Vegetation indices that utilize
the shortwave infrared bands proved to be more sensitive to forest disturbance than other
indices using the red and near infrared bands. Moderate to extreme negative magnitude
values proved to be the determining products that indicated a deforestation event, with
value ranges varying widely between study sites/regions. However, the application of
BFAST for shorter time frames in near real-time (weeks to 3 months) will only be
possible through the use of combined, multi-sensor data to handle gaps due to poor
quality images and cloud cover, as well as external data to eliminate commission errors.
The methods discussed in this study could provide near real-time and eventually true
real-time capabilities that provide a better understanding of land-cover change
dynamics, which would assist in conservation efforts to help protect biodiversity around
the world.
LIST OF FIGURES
1. Example of first-order harmonic model fitted to real Landsat
observations (pixel values) by DeVries et al., 2015 demonstrating
sequential monitoring approach for detecting breaks (Red line).
2. Example of a breakpoint flagged on a single pixel by BFAST.
3. Location of the study areas in Mesoamerica: A) Yucatan, Mexico and
B) the Guanacaste Region, Costa Rica. Green coverage refers to
Tropical Dry Forest extent mapped by Portillo-Quintero and Sanchez-
Azofeifa (2010) for Mesoamerica.
4. Location of Y1 and Y2 sites in Yucatan, Mexico.
Green coverage refers to Tropical Dry Forest extent mapped by
Portillo-Quintero and Sanchez-Azofeifa (2010) for Mesoamerica.
5. Location of G1 and G2 sites in the Guanacaste Region, Costa Rica.
Green coverage refers to Tropical Dry Forest extent estimated by the
GlobCover2009 project for Mesoamerica.
6. Comparison of spectral bands for Landsat 7 (L7 ETM+) and Landsat 8
(OLI & TIRS).
7. Example of the electromagnetic signature of healthy green vegetation
and associated absorption and reflectance features.
8. Folder architecture that needs to be created outside of R environment
on computer.
9. Flowchart representing methods of this study.
10. Model visualization of how near real-time system functions.
11. Reference data locations for validation. One-hectare grid cells visually
inspected via multi-temporal images available in Google Earth (total
n=373). Y1 (‘D’=64; ‘S’=54), Y2 (‘D’=32; ‘S’=32), G1 (‘D’=21;
‘S’=50), G2 (‘D’=70; ‘S’=50).
12. Validation example. Deforested cells (‘D’) – areas visibly covered by
TDF beginning 2013, and visibly non-forested by 2016, with soil
exposure. Stable cells (‘S’) referred to areas of any land covers that
remained the same during 2013-2016.
13. Magnitude outputs for NBR2 in Yucatan site 2. Monitoring period
07/11 to 07/15 (green to red) overlays monitoring period 01/12 to
12/15 (green to blue) so that new deforestation is highlighted in blue.
14. Maps of breakpoint magnitudes for all VIs for Y1 and pixel
percentages.
15. Maps of breakpoint magnitudes for all VIs for Y2 and pixel
percentages.
16. Maps of breakpoint magnitudes for all VIs for G1 and pixel
percentages.
17. Maps of breakpoint magnitudes for all VIs for G2 and pixel
percentages.
18. Overall accuracies for Y1, Y2, G1, G2, all sites, and all sites except G2.
19. Producer’s accuracies for Y1, Y2, G1, G2, all sites, and all sites except
G2.
20. User’s accuracies for Y1, Y2, G1, G2, all sites, and all sites except G2.
21. Evaluation of near real-time accuracy of BFAST. Points represent new
breakpoints detected in a 6-month window. Imagery shows ground
truth data. The accuracy for this assessment in Y2 was estimated at
55%.
22. Example of false positive detected breaks not associated with
deforestation.
23. Image A from 02/25/2005 and B from 10/11/2015 in G2 NDVI stack
showing lack of data due to cloud mask (and due to Landsat 7 Scan
Line Corrector error in Image A).
24. Example taken from Murillo-Sandoval et al. 2017. Three breakpoints
(dashed red lines) and four segments (black lines) identified over time
series (blue lines). The slope coefficients (β) are all significant
(α = 0.05) and ρ represents p-values.
CHAPTER I
INTRODUCTION
Global deforestation rates have displayed an overall increasing trend throughout
the last several decades, contributing significantly to the loss of biodiversity as well as
to the loss of the potential for carbon sequestration (Portillo-Quintero, et al. 2014).
Interestingly, while forest ecosystems across the world experience loss and regrowth
at different rates, the global trend in recent years suggests a reduction in forest loss
(Hansen, et al. 2013). For example, the forests of Brazil, which have historically
experienced large scale deforestation events, underwent a verified reduction in
deforestation from 2.8 million hectares in 2003 – 2004 to 1.3 million hectares in 2007
– 2008 due to a combination of enhanced conservation efforts as well as economic
decline (Butchart, et al. 2010). However, this is offset by increased deforestation in
Eurasian tropical rainforest, African tropical moist deciduous forest, South American
dry tropical forests, and Eurasian tropical moist deciduous and dry forests (Hansen, et
al. 2013). Additionally, regrowth does not necessarily translate to regained biodiversity,
as regrowth tends to result in secondary forest with altered successional species
composition that differ from mature primary forests (Read and Lawrence, 2003).
The importance of biodiversity cannot be overstated, as it underlies the
delicate balance of various forest ecosystems around the world. Biodiversity has a
number of features, such as richness of species, ecosystem type rarity, abnormal
evolutionary or ecological occurrences, rarity of higher taxonomical groups, and status
of endemic species, that can each individually contribute to overall biodiversity loss
if affected (Olson et al., 2001). Biodiversity is significant, not only in terms of the
simple aesthetic beauty of nature and availability of precious natural resources, but also
in terms of measures of productivity. Generally, the productivity of a forest is positively
correlated with species richness (Vila et al., 2007). It has also been found by Bohn and
Huth (2017) that forest structure as well as species richness have an impact on
productivity factors such as above-ground wood production, which significantly
impacts carbon-sequestration. The loss of biodiversity in forest ecosystems around the
globe is a current major issue in biological conservation. Remnants of forest in high
biodiversity ecoregions need to be protected and continually monitored, ideally as close
to real-time as possible in order to produce actionable results to stem the tide of
biodiversity loss.
One of the national strategies implemented by governments around the world to
protect biodiversity and avoid further loss is the design and maintenance of protected
areas. Since the mid-twentieth century, national and international environmental
agencies have designed and expanded a system of protected areas (PA) across the globe
to preserve the currently fragmented natural ecosystems of the world. However, these
areas, once isolated from highly populated areas and distant from threats, are now
embedded in social-ecological land systems characterized by patches of preserved or
managed natural ecosystems in a ‘matrix’ of urban, agriculture and livestock ranching
land uses that has expanded rapidly, increasing land conflicts between stakeholders and
degradation outside and inside the PA system (Boillat et al. 2017). Patches of forest
survive in human-dominated landscapes that are highly variable in time and space,
where choices of rotational crops or land abandonment shape the dynamics of forests in
terms of their extent or ecological functionality. Such human-dominated landscapes are
common in the tropical forests of Mesoamerica, a region that is known for still harboring
some of the most biodiversity-rich forests in the world (Garcia-Frapolli, 2007).
Land use and land cover in the countries of Mesoamerica have undergone
change in different directions as a result of the complex history of the politics and
socioeconomic conditions of the region. Armed conflicts in the 1980s, post-conflict
pacification and recent changes in socioeconomic conditions have caused profound
fluctuations in land distribution, tenure and land use change. Different circumstances in
land quality, as well as access to credit and insurance, for small and large land owners
have shaped the distribution of land use in Central America. Poverty and migration have
also influenced decisions on land use that have led to expansion of pastures for cattle
ranching to the detriment of natural landscapes, while government-driven investment in
agricultural intensification has favored the expansion of high-yield crops, especially for
large land owners (Davis and Lopez-Carr, 2014). Furthermore, all countries differ in
their land use policies, land redistribution history, as well as biodiversity protection
policies and law enforcement capacity.
In 2003, natural vegetation, including secondary forests and selectively logged
forests, was estimated to cover 57% of Mesoamerica, with the remaining area being
used predominantly for crop (mostly corn, coffee, beans and sugar cane) and cattle
production (42%) and 1–2% in urban and other land covers (DeClerck et al. 2010). The
Central American Commission on Environment and Development (CCAD) calculates
that 400,000 ha of forest are being lost on an annual basis in the Mesoamerican region.
Deforestation rates continue to be high, although recent reports have also noted forest
regrowth in some areas of Central America. In any case, the dynamics of forest loss,
regrowth and disturbance, in addition to the land use change related to urban,
agricultural and cattle ranching expansion are complex in the region and mostly tied to
contextual and local factors.
The use of Geographic Information Systems (GIS) and remote sensing (satellite
imagery) has played a key role in understanding the past patterns and trends in
deforestation across the region. Each country in the region has established a monitoring
program that relies on the use of satellite imagery for mapping the extent and
distribution of terrestrial ecosystems. Countries like Costa Rica and Mexico have had a
long tradition in the use of remote sensing products for understanding ecosystem extent,
while other countries currently lack updated information on the conservation status of
their forests. However, even in the best of cases, land cover and land use maps are typically
generated every 5 to 10 years for a country. An example is the CCAD Central American
Land Cover Map for 1980, 1990, 2000, and 2010 developed by CATHALAC
(http://cathalac.org/) in 2011, which allows long-term deforestation patterns and trends
to be observed and their causes studied.
One of the reasons for studying deforestation dynamics at these time intervals in
local institutions of Mesoamerican countries is the limitation of computing power and
software licensing that allows for larger storage and image processing capabilities.
However, the new era of cloud computing has enabled faster and more efficient
processing of satellite images that are globally available. Distribution archives from
NASA and the European Space Agency (ESA) are now capable of freely distributing
raw satellite imagery collected on the same day, or readily processed data with only a
few days of delay. A recent dataset produced by Hansen et al. (2013) and distributed
through the Global Forest Watch website provides ‘tree cover’ loss and gain maps
from 2000-2015 at annual rates for the whole world. Although the ‘tree cover’ definition
includes not only forests, but also disturbed vegetation, plantations and other tree
dominated land covers, this dataset is helping researchers understand trends in
vegetation cover, especially in threatened ecoregions, while highlighting areas of rapid
change in the last years. The Hansen et al. (2013) product is updated every year, but still
only provides information with a lag of two years (2000-2015) and cannot provide sub-
annual information (monthly) and/or indicate the occurrence of deforestation in near
real-time.
Over the past several years there have been significant advances in the design of
continuous land cover change (CLCC) mapping algorithms that use the complete record
of Landsat data, taking advantage of the high-quality Landsat data archive that became
freely available in 2008 (Cohen et al. 2017). These unique techniques in remote sensing
allow the user to study the trend in pixel values across hundreds to thousands of images
and detect when a pixel value drastically changes, indicating a change in surface
reflectance, and thus, in land cover or land use. CLCC algorithms can produce outputs
that include the exact date when the abrupt change occurred. Some algorithms can
produce highly accurate land use and land cover maps at any given time for the satellite
image time series. Their application relies on the heavy use of programming languages
such as Python and Matlab and the use of high performance computational
infrastructure. CLCC mapping algorithms can be iterated to register significant breaks
in pixel values of satellite imagery, as new data is acquired and processed. These
algorithms have opened up the possibility of establishing real-time or near real-time
detection of deforestation using satellite imagery.
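The idea of flagging an abrupt change in a pixel's time series can be illustrated with a deliberately simplified sketch. It is not one of the CLCC algorithms cited above: it assumes a stable history period, ignores seasonality (which real algorithms model explicitly), and uses hypothetical values, a hypothetical function name (`first_break`), and arbitrary parameters (`z`, `run`).

```python
from statistics import mean, stdev


def first_break(history, monitoring, z=3.0, run=3):
    """Return the index in `monitoring` where a sustained drop begins, or None.

    history:    vegetation-index values from a period assumed to be stable
    monitoring: newer values for the same pixel, in acquisition order
    z:          number of standard deviations below the history mean
                that counts as anomalous (assumed value)
    run:        number of consecutive anomalous values required (assumed value)
    """
    threshold = mean(history) - z * stdev(history)
    streak = 0
    for i, value in enumerate(monitoring):
        if value < threshold:
            streak += 1
            if streak == run:
                # Report the first observation of the anomalous run.
                return i - run + 1
        else:
            streak = 0
    return None


# Hypothetical index values for a single pixel (not data from this study).
history = [0.62, 0.58, 0.60, 0.64, 0.59, 0.61, 0.63, 0.60]
monitoring = [0.61, 0.59, 0.21, 0.18, 0.20, 0.22]
break_index = first_break(history, monitoring)
```

Requiring several consecutive low values is one simple way to avoid flagging a single cloudy or noisy observation; re-running the function as new observations are appended to `monitoring` mimics the iterative use described above.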
The potential for real-time or near real-time change detection has become more
of a reality in recent years as datasets with higher spatial, spectral and temporal resolution
have become more available, along with better tools to analyze these data.
However, algorithms and methodologies have been mostly applied for research
purposes and are not yet operational on the ground. For biodiversity-rich Latin
American countries, the technology is still far from its operationalization. Ideally, such
a system will allow a director of a conservation effort or the manager of a national park
to receive a report with a map of potentially deforested areas every month, or every 3 to
6 months, allowing for actions to be taken in exact locations in the field as early as
possible to prevent further forest cover losses. For this system to operate as a supporting
tool in decision making in countries of the Mesoamerican region, it has to be credible.
High accuracies in the detection of deforestation need to be achieved. Because of this, it is
important to test its accuracy in different scenarios, ecoregions and socio-ecological
systems, using a variety of image-based products (vegetation indices) and algorithm
parameters.
1.1 Objectives of the study
The general objective for this research was to evaluate the feasibility and
efficiency of automated time-series analysis tools (e.g. BFAST; BFASTMonitor;
BFASTSpatial) for detecting and monitoring deforestation in tropical dry forest
landscapes using Landsat satellite imagery.
The specific objectives for this research were:
1. Evaluate the accuracy of automated time-series analysis tools (e.g.
BFAST; BFASTMonitor; BFASTSpatial) applied on Landsat imagery for
the detection of deforestation events in tropical dry forests.
2. Evaluate the capabilities of BFAST time-series analysis tools to
track changes in near real-time using Landsat imagery.
To fulfill specific objective 1, vegetation index data from Landsat satellite
imagery between 2000 and 2016 was used as input data for the ‘bfastSpatial’ algorithm.
This analysis produced outputs that showed breaks in a seasonal trend that correlated to
TDF canopy loss. An accuracy assessment was implemented to evaluate the algorithm’s
accuracy for each vegetation index. To fulfill specific objective 2, I tested the accuracy
of the ‘bfastSpatial’ algorithm in detecting recent deforestation when sets of new
observations were added to the time series. I then evaluated the sequential outputs to
determine temporal differences between the detected break and ground truth data.
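As a minimal illustration of what such an accuracy assessment computes, the sketch below derives overall, user's and producer's accuracies from a two-class error matrix. The counts are hypothetical and are not results from this study; the function name `accuracy_metrics` and the dictionary layout are assumptions made for the example.

```python
def accuracy_metrics(matrix):
    """Compute accuracies from an error matrix.

    matrix[mapped][reference] holds the number of validation cells that the
    algorithm mapped as `mapped` and the reference data labelled `reference`.
    """
    classes = list(matrix)
    total = sum(matrix[m][r] for m in classes for r in classes)
    correct = sum(matrix[c][c] for c in classes)
    overall = correct / total
    # User's accuracy: of the cells mapped as class c, the share that is truly c.
    users = {c: matrix[c][c] / sum(matrix[c][r] for r in classes) for c in classes}
    # Producer's accuracy: of the reference cells of class c, the share mapped as c.
    producers = {c: matrix[c][c] / sum(matrix[m][c] for m in classes) for c in classes}
    return overall, users, producers


# Hypothetical counts: 'D' = deforested, 'S' = stable.
example = {
    "D": {"D": 45, "S": 5},
    "S": {"D": 15, "S": 35},
}
overall, users, producers = accuracy_metrics(example)
```

With these made-up counts, a low producer's accuracy for 'D' would point to omitted deforestation, while a low user's accuracy for 'D' would point to false alarms.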
CHAPTER II
LITERATURE REVIEW
2.1 Defining tropical deforestation
According to the Forestry Department of the F.A.O. in their Global Forest
Resources Assessment 2010, “Deforestation” is defined as the conversion of forest to
other land use or the long-term reduction of the tree canopy cover below the minimum
10 percent threshold. Deforestation implies the long-term or permanent loss of forest
cover and denotes transformation into another land use. Such a loss can only be caused
and maintained by a continued human-induced or natural perturbation. Deforestation
includes areas of forest converted to agriculture, pasture, water reservoirs and urban
areas. The term specifically excludes areas where the trees have been removed as a
result of harvesting or logging, and where the forest is expected to regenerate naturally
or with the aid of silvicultural measures. Unless logging is followed by the clearing of
the remaining logged-over forest for the introduction of alternative land uses, or the
maintenance of the clearings through continued disturbance, forests commonly
regenerate, although often to a different, secondary condition. In areas of shifting land
use, forest, fallow forest and agricultural lands appear in a dynamic pattern where
deforestation and the return of forest occur frequently in small patches. To simplify
reporting of such areas, the net change over a larger area is typically used by F.A.O
methodologies. Deforestation also includes areas where, for example, the impact of
disturbance, overutilization or changing environmental conditions affects the forest to
an extent that it cannot sustain a tree cover above the 10 percent threshold.
However, other authors such as Sierra (2000) have a much simpler definition
of deforestation, whereby deforestation is simply the total removal of forest canopy for any
reason (including logging). For the purposes of this research, deforestation will be
defined as complete loss of forest canopy for any reason at any scale, even at sub-hectare
scales. This level of small-scale deforestation may not seem significant, but forest
conversion to other land uses is typically progressive, starting with small
clearings and then expanding to greater extensions. Therefore, the method here applied
will be evaluated at its full potential for detecting and including all detected forest
clearings at the minimum mapping unit of a Landsat pixel size (0.09 ha).
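The 0.09 ha figure follows from the pixel dimensions. The short sketch below works through the arithmetic, assuming the nominal 30 m by 30 m Landsat multispectral pixel; the helper `cleared_area_ha` and the pixel count are hypothetical.

```python
PIXEL_SIDE_M = 30        # nominal Landsat multispectral pixel side (assumed)
SQ_M_PER_HA = 10_000     # square metres in one hectare

# Area of a single pixel in hectares: 30 m x 30 m = 900 m^2.
pixel_area_ha = PIXEL_SIDE_M ** 2 / SQ_M_PER_HA


def cleared_area_ha(n_pixels):
    """Area in hectares represented by `n_pixels` flagged pixels."""
    return n_pixels * pixel_area_ha


# Hypothetical clearing made up of 12 flagged pixels.
small_clearing = cleared_area_ha(12)
```

At this resolution roughly eleven pixels make up one hectare, which is why sub-hectare clearings can still be represented by several pixels.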
2.2 Tropical dry forest deforestation trends in Latin and Central America
This study is focused on Tropical Dry forests (TDF), which represent one of the
most preferred habitats in the tropics for human settlements and exploitation (Murphy
& Lugo, 1986; Sánchez-Azofeifa et al., 2005). Tropical dry forests are defined by
various authors in various ways, but in general, TDF can be defined, as Sánchez-
Azofeifa et al. (2005) have described, as a tropical ecosystem where at least 50% of
trees present are drought deciduous (trees completely shed their leaves in the dry
season), the mean annual temperature is at least 25 °C, total annual precipitation ranges
between 70 and 200 cm, and there are three or more dry months every year (precipitation
less than 10 cm).
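The four criteria can be read as a simple checklist. The sketch below encodes them as stated above; the function name, argument names and the example site values are hypothetical and serve only to make the thresholds explicit.

```python
def meets_tdf_criteria(deciduous_fraction, mean_annual_temp_c, monthly_precip_cm):
    """Check a site against the TDF criteria described above.

    deciduous_fraction: share of trees that are drought deciduous (0 to 1)
    mean_annual_temp_c: mean annual temperature in degrees Celsius
    monthly_precip_cm:  twelve monthly precipitation totals in centimetres
    """
    annual_precip = sum(monthly_precip_cm)
    # A dry month is one with less than 10 cm of precipitation.
    dry_months = sum(1 for p in monthly_precip_cm if p < 10)
    return (
        deciduous_fraction >= 0.5
        and mean_annual_temp_c >= 25
        and 70 <= annual_precip <= 200
        and dry_months >= 3
    )


# Hypothetical site: 130 cm of annual rainfall with six dry months.
site_precip = [1, 1, 2, 5, 12, 20, 18, 20, 25, 18, 6, 2]
is_tdf = meets_tdf_criteria(0.7, 26.5, site_precip)
```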
Tropical dry forest loss in Latin and Central America, as well as globally, has
had an increasing trend for the last several decades. Murphy and Lugo (1986) identified
that about 40% of the earth’s tropical and subtropical landmass is dominated by open or
closed forest, of which 42% is dry forest. According to Miles et al. (2006) more than half
(54.2%) of the remaining dry forests are located within South America, and the
remaining area of dry forest is almost equally divided between North and Central
America (12.5%), Africa (13.3%), and Eurasia (16.4%), with a relatively small portion
in Australasia and Southeast Asia (3.8%). Miles et al. (2006) suggest that the total
estimated area of remaining TDF is approximately 1,048,700 km². According to
Portillo-Quintero et al. (2010) the potential extent of TDF in North and Central America,
South America, and the Caribbean islands is approximately 1,520,659 km² while the
current extent is actually 519,597 km². Such findings indicate that the TDF has suffered
a loss of 66% of its historical potential cover.
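The 66% figure can be reproduced directly from the two extents reported by Portillo-Quintero et al. (2010); the variable names below are chosen only for this illustration.

```python
potential_km2 = 1_520_659   # potential TDF extent in the Americas
current_km2 = 519_597       # current TDF extent in the Americas

# Share of the potential extent that no longer exists.
loss_fraction = 1 - current_km2 / potential_km2
loss_percent = round(loss_fraction * 100)
```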
Drivers of deforestation in TDF can be very different between and within
countries, but the main driver of deforestation is unequivocally intensive
anthropogenic disturbance. According to Portillo-Quintero et al. (2014) and Murphy &
Lugo (1986), the tendency for TDF to have relatively flat terrain, fertile soils with less
aggressive successional vegetation, seasonality in rainfall that allows for short-cycle
crop agriculture, climate more suitable for livestock and less suitable for mosquitoes
that spread diseases, and lower overall biomass that facilitates clearing are all the
primary reasons that human populations have an affinity for TDF. Many resources that
are useful to human populations, not only in rural areas, but urban as well, are found in
TDF: plants used for food, beverages, condiments, construction materials, firewood,
medicinal/herbal remedies; animals for hunting; and shade and fresh air for locals to
enjoy (Portillo-Quintero et al., 2014).
Tropical dry forest ecosystems directly and indirectly provide vital natural
resources, such as food, water, wood products, minerals, medicines, etc., to support the
lives and livelihoods of approximately 90 million people in Latin America. In addition
to providing life-supporting natural resources, tropical dry forest ecosystems have a
significant impact on global climate as they have at least half of the rainforest’s carbon
storage capacity. In the Americas, for example, TDF restoration could potentially add 8
Gt (gigatons) of carbon to the potential total ecosystem carbon stock (Portillo-Quintero
et al. 2014). Beyond these facts, TDF provide much of the planet’s biodiversity, which
is intrinsically beneficial – at the least in terms of simple aesthetics and the beauty of
nature, and more so to provide opportunities to study, research, learn from and gain a
deeper understanding of nature.
Understanding the patterns of tropical deforestation and being able to
measure monthly or annual rates of deforestation in a timely manner will
help to allocate resources efficiently for TDF management, conservation, and
restoration efforts in critical areas of its distribution in Latin America. In Mesoamerica,
preventing further TDF losses is especially important in the context of current watershed
management within the “Corredor Seco Centroamericano” (or Central American Dry
Corridor), a region of the Pacific coast that has recently been subject to frequent
droughts, with detrimental consequences to local economies and vulnerable populations
dependent on subsistence agriculture. The region is so at risk that it has been a concern
for international humanitarian aid agencies.
2.3 Mapping deforestation using remote sensing and potential for near
real-time monitoring
Early mapping techniques, such as those demonstrated by Trejo and Dirzo
(1999), compared early potential vegetation maps with more recent land use maps to
contrast the potential versus the existing vegetation. Most of this work included the
manual digitization of forest cover extent over large areas using aerial photography and
satellite imagery as ground-truth information for forest cover. This approach required
a tremendous effort from digitizers and analysts and limited change detection to a few
time steps. However, technological developments in automated mapping tools
for satellite imagery since the 1970s have allowed scientists to map forests on a regular
basis for any particular area of the world to understand the temporal trends of
deforestation on an annual basis or across decades. As new satellites and data
distribution methods become available, the temporal resolution and the level of detail
and data (spatial and spectral resolution) of the datasets have increased, yielding much
better products that help to understand the dynamics of deforestation at any particular
site.
The field of remote sensing has been advancing rapidly over the last 10-20 years
and two sensors have been of high importance for mapping and monitoring
deforestation: the MODIS (Moderate Resolution Imaging Spectroradiometer) sensor
system aboard the Terra and Aqua NASA satellites, which have been in orbit since the
year 2000; and the Landsat series of satellites, which have been in orbit since the 1970s.
MODIS allows for surface multispectral data to be collected at 250-m, 500-m, and 1-km
resolution daily, every 8 days, every 16 days, or monthly, depending on the specific
data product such as surface reflectance, snow cover, or vegetation indices. Landsat
satellites also collect multispectral data every 16 days, but with much higher spatial
resolutions of 15 m, 30 m, and 100 m. Until recently, these data were difficult to collect,
process and analyze, but due to the improvements made in computing technologies over
the last several years, this flow of data is much more efficient and less costly. For
example, the United States Geological Survey opened its archive of Landsat scenes
to the public free of charge in 2008, which then spawned several related products such
as EarthExplorer and GloVis, which are browser-based viewing tools, as well as the Earth
Resources Observation and Science (EROS) Center Science Processing Architecture
(ESPA) ordering interface, which allows for bulk ordering of customized, preprocessed
data. Additionally, imagery with finer resolution that would normally carry a fee
is offered free of charge or at a reduced rate through various
organizations and institutions, which helps to make remote sensing and its applications
much more accessible.
According to DeVries et al. (2015), to date, only a few remote sensing based
forest monitoring systems exist in tropical countries, the most advanced of which are
the PRODES and DETER systems of the Brazilian National Institute for Space Research (INPE), used for
annual deforestation mapping and near real-time deforestation monitoring, respectively.
However, with the opening of the U.S. Geological Survey (USGS) Landsat data
archive, large amounts of medium-resolution optical earth observation data have been
made freely available to the public, which combined with continued advances in the
field of cloud computing for geospatial data has allowed for high temporal resolution
forest change monitoring at unprecedented spatial scales. An example of
implementation of remote sensing technologies and cloud computing techniques can be
found in the work of Hansen et al. (2013), which currently provides deforestation
information for the Global Forest Watch organization on an annual basis. In this study,
loss and gain of global tree cover extent was mapped using Landsat 7 data from 2000 to
2012 at a 30-m resolution. Over 600,000 Landsat 7 images were compiled and analyzed
using Google Earth Engine which applied a supervised learning algorithm to identify
per pixel tree cover.
Many scientists and researchers globally have started to utilize the full temporal
resolution of MODIS and Landsat datasets to detect and track trends in vegetation index
products by automating time-series analysis of the satellite imagery. Vegetation Indices
(VIs) are combinations of surface reflectance at two or more wavelengths designed to
highlight a particular property of vegetation. One of the more commonly used
vegetation indices is the Normalized Difference Vegetation Index (NDVI), which serves
to quantify healthy, green vegetation (Daughtry et al. 2005). The NDVI is a normalized
ratio of reflectance at two wavelengths: the red wavelength (~600-700 nm) in the visible
spectrum and the near-infrared (NIR) wavelength (~700-1300 nm). Recently, a team of
researchers from Wageningen University (Jan Verbesselt, Loic Dutrieux and Ben
DeVries) designed and implemented a package in the R statistical programming
language called ‘Breaks For Additive Season and Trend’, or BFAST, that allows for
the detection of breaks from a seasonal trend in a time-series of vegetation index values.
The BFAST algorithm in R includes a set of utilities and wrappers to perform change
detection on spatially gridded, time-series satellite data (Landsat and MODIS).
Essentially, they have used historical NDVI data over several years to create a trend line
based on the seasonality of the forest. Then, once a break from the trend is detected
using statistical methods in the algorithm, a magnitude is calculated for that break,
which is then used to determine if deforestation has occurred. In the next section, I
explain in detail the composition of the BFAST algorithm family.
2.4 BFAST algorithm family
The BFAST package for the statistical programming language R stands for
Breaks For Additive Season and Trend and was developed by Verbesselt et al. (2010).
The function accepts a univariate time-series object as an input along with other
adjustable parameters. For each pixel in a Landsat scene, the time-series is used to
create a best-fit seasonal regression model with a trend component. Seasonal regression
models recommended by previous studies (DeVries et al., 2015) are first-order
harmonic, which allows better description of the trajectories of pixel values in natural
systems under seasonal changes in precipitation and phenology (Figure 1).
The first-order harmonic model is described by this equation:
$$y_t = \alpha + \gamma \sin\left(\frac{2\pi t}{f} + \delta\right) + \varepsilon_t$$
where $y_t$ and $t$ are the response (dependent variable) and time (independent
variable), $f$ is the temporal frequency, $\alpha$ is the intercept, $\gamma$ and $\delta$ are the amplitude and
phase of the harmonic component, and $\varepsilon_t$ is the residual (noise component).
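A first-order harmonic model (an intercept plus one sine term with an amplitude and a phase) can be fitted by ordinary least squares, because γ·sin(ωt + δ) equals a·sin(ωt) + b·cos(ωt) with a = γ·cos(δ) and b = γ·sin(δ). The following base R sketch illustrates this on a synthetic series. The data, the parameter values, and the choice of f = 1 (time in decimal years) are assumptions made for illustration; this is not the BFAST implementation.

```r
set.seed(42)                         # reproducible synthetic noise

# Synthetic NDVI-like series: time in decimal years, one cycle per year (assumed)
t_years <- seq(2000, 2010, by = 16 / 365)
f       <- 1
alpha   <- 0.6
gamma   <- 0.2
delta   <- 0.5
y <- alpha + gamma * sin(2 * pi * t_years / f + delta) +
  rnorm(length(t_years), sd = 0.02)

# Linearized harmonic terms: the model becomes y = alpha + a * s + b * c1
s   <- sin(2 * pi * t_years / f)
c1  <- cos(2 * pi * t_years / f)
fit <- lm(y ~ s + c1)

# Recover intercept, amplitude and phase from the OLS coefficients
alpha_hat <- unname(coef(fit)["(Intercept)"])
a         <- unname(coef(fit)["s"])
b         <- unname(coef(fit)["c1"])
gamma_hat <- sqrt(a^2 + b^2)
delta_hat <- atan2(b, a)
```

The fitted values from `fit` play the role of the expected observations against which new data are compared during monitoring.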
Figure 1. Example of first-order harmonic model fitted to real Landsat observations
(pixel values) by DeVries et al., 2015 demonstrating sequential monitoring approach
for detecting breaks (Red line).
The algorithm then detects whether the real data deviate significantly from
the model and creates a breakpoint with a magnitude of deviation from the trend.
Breakpoints are detected within a user-defined monitoring period by computing
ordinary least squares (OLS-based) moving sums (MOSUM) of residuals using
observations from a selected fraction of the history period (defined by the h value):
$$MO_t = \frac{1}{\hat{\sigma}\sqrt{n}} \sum_{s=t-h+1}^{t} \left(y_s - \hat{y}_s\right)$$
where $y$ and $\hat{y}$ are actual and expected observations, respectively, $n$ is the number of
sample observations, $h$ is the MOSUM bandwidth (fraction of the number of sample
observations), and $\hat{\sigma}$ is an estimator of the variance (DeVries et al. 2015). A breakpoint
is signaled when $|MO_t|$ deviates from zero beyond a 95% significance boundary (Figure
2).
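The moving-sum idea can be illustrated with a small base R sketch. The residuals below are hypothetical, the window length is taken as the fraction h of the history sample size, and the fixed threshold is a simplification: BFAST derives a time-varying 95% boundary rather than a constant.

```r
# Hypothetical residuals (observed minus expected values)
n_hist   <- 100                                  # observations in the history period
res_hist <- rep(c(0.02, -0.02), times = 50)      # stable history, mean zero
res_mon  <- c(rep(c(0.02, -0.02), times = 5),    # monitoring period: stable at first,
              rep(-0.3, times = 10))             # then a sustained drop
res      <- c(res_hist, res_mon)

h         <- 0.25                                # bandwidth as a fraction of n_hist
window    <- floor(h * n_hist)                   # residuals in each moving sum
sigma_hat <- sd(res_hist)                        # simple estimate of residual spread

# Moving sum of the most recent residuals, scaled by sigma_hat * sqrt(n)
mosum_at <- function(t) {
  sum(res[(t - window + 1):t]) / (sigma_hat * sqrt(n_hist))
}
mon_idx <- (n_hist + 1):length(res)
mo      <- vapply(mon_idx, mosum_at, numeric(1))

# Illustrative constant threshold (an assumption, not the BFAST boundary)
boundary    <- 2
first_break <- mon_idx[which(abs(mo) > boundary)[1]]
```

In this toy series the drop begins at observation 111, and the scaled moving sum crosses the threshold one observation later, which shows why a break is flagged with a short delay after the disturbance.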
In addition, BFAST allows the computation of change magnitude (M) for each
breakpoint detected by taking the median of residuals within the monitoring period, in
which $t_n \le t_i \le t_N$:
$$M = \operatorname{median}\left\{\, y_{t_i} - \hat{y}_{t_i} \,\right\}$$
where $y_{t_i}$ and $\hat{y}_{t_i}$ are actual and expected observations, respectively, based on the
methods used by DeVries et al. (2015). BFAST registers the time when the breakpoint
was detected within the monitoring period.
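As a small worked example of the magnitude calculation, the base R sketch below takes the median of residuals over a monitoring period. The observed and expected values are hypothetical.

```r
# Hypothetical NDVI values in the monitoring period
observed <- c(0.71, 0.69, 0.35, 0.30, 0.28, 0.33, 0.31)   # actual observations
expected <- c(0.70, 0.70, 0.68, 0.66, 0.65, 0.66, 0.68)   # model predictions

# Change magnitude: median of (observed - expected) residuals
residuals_mon <- observed - expected
magnitude     <- median(residuals_mon)
print(magnitude)
```

A strongly negative magnitude, as here, indicates that the vegetation index fell well below what the seasonal model expected. The median keeps a single noisy observation from dominating the estimate.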
There are several parameters that can be modified in BFAST, but the most
significant are:
▪ formula – regression model formula (harmonic and/or trend component)
▪ order – order of the harmonic term
▪ start – starting date of the monitoring period
▪ history – specification of the stable history period
▪ h – numeric scalar between 0 and 1 specifying bandwidth relative to the sample
size in the MOSUM monitoring process
Figure 2. Example of a breakpoint flagged on a single pixel by BFAST
The ‘bfastmonitor’ function in R was later adapted to run on spatial data, since
the time-series input for ‘bfastmonitor’ is univariate and it cannot accept data in a raster
format. The adapted version, ‘bfmSpatial’, accepts a raster brick as an input
and runs ‘bfastmonitor’ on every pixel of an image. A raster brick is an object class in
R that holds multiple layers in a single object; in this case the layers
correspond to Landsat images, each from a specific date. The output of
‘bfmSpatial’ is a raster brick whose default layers are breakpoint, magnitude and
error. Supplementary layers (history, r.squared, adj.r.squared, and coefficients) are available for
further external statistical analysis, but were not used for the purposes of this research.
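The per-pixel pattern described here can be sketched in base R with a plain three-dimensional array standing in for a raster brick. The array values and the `pixel_change` function are hypothetical placeholders; the function is a deliberately crude stand-in and does not reproduce ‘bfastmonitor’.

```r
# Toy "brick": 2 x 2 pixels and 6 dates (layers) of hypothetical NDVI values
ndvi_brick <- array(0.8, dim = c(2, 2, 6))
ndvi_brick[1, 2, 4:6] <- 0.3        # one pixel is cleared from the 4th date on

# Placeholder per-pixel monitor: median of the last three dates minus the
# median of the first three (NOT the bfastmonitor algorithm)
pixel_change <- function(series) {
  median(series[4:6]) - median(series[1:3])
}

# apply() over rows and columns passes each pixel's time series to the
# function and returns one value per pixel, i.e. a single output layer
change_layer <- apply(ndvi_brick, c(1, 2), pixel_change)
print(change_layer)
```

Each output layer of a per-pixel monitor (for example breakpoint or magnitude) has the same rows and columns as the input brick, which is what allows the results to be mapped.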
Other research conducted utilizing the BFAST family of algorithms has shown
promising results. Verbesselt et al. (2010) found that BFAST accurately detected
significant phenological changes, both abrupt and gradual, over long periods of time
with an ability to filter out noise, or false positive breaks (although the quality of data
was noted as an important factor in handling noise). However, later research conducted
by Schultz et al. (2016) found several error sources related to the BFAST algorithm
including topography, atmosphere, edge effects and data availability and variance. All
of these factors contribute to commission error, but data availability is particularly
important in that the number of observations in the monitoring period significantly
affects accuracy and omission errors. The density of the time-series is key: the
more data that are available, the better a model can be fit. Further advancements in
data availability (e.g. data repositories, cloud computing, and other Landsat-like sensors such
as Sentinel-1 and Sentinel-2 with higher temporal, spatial and spectral resolution) will allow for
increased data density to fill in any potential gaps (Schultz et al., 2016).
2.5 The challenges of detecting and tracking deforestation in TDF
Despite the many advances discussed, significant challenges still present
themselves when tracking deforestation, especially within TDF ecosystems. The
persistent small-scale changes in these landscapes (usually related to small-holder
agricultural expansion) and natural temporal patterns of leaf senescence are two major
constraints to the accurate mapping and accounting of deforestation (DeVries et al.,
2015). Patterns of senescence are prominent in tropical dry forests due to the
pronounced seasonality that is a feature of TDF. As water stress increases, senescence
increases as well, and the severity and length of dry periods affect the
level of senescence that TDF species experience. Senescence can complicate
remote sensing analyses, especially disturbance and
deforestation tracking, but when observed over long enough periods of time,
phenological patterns can be distinguished from disturbance or deforestation events.
Fitting the right regression model over the pixel values for tropical dry forest
ecosystems ensures that phenological patterns are taken into account when estimating a
break in the series.
2.6 Other change detection algorithms
There are other change detection algorithms that are also currently being
researched, which show promise in annual as well as near real-time change detection.
Two examples of these algorithms are the Continuous Change Detection and
Classification (CCDC) algorithm developed by Zhu and Woodcock (2014) and the
Landsat-based Detection of Trends in Disturbance and Recovery (LandTrendr)
algorithm developed by Kennedy et al. (2010). The CCDC algorithm applies a different
mathematical model, one that utilizes all spectral bands from Landsat, to each individual
pixel. The continuous aspect of the algorithm implies near real-time functionality, as
the algorithm is intended to have the capacity to detect changes with each newly added
image. LandTrendr, on the other hand, recognizes the limiting factors of Landsat data,
which include the 16-day temporal cycle of Landsat, cloud cover issues, and data
collection gaps. Because of this, LandTrendr utilizes an annual temporal scale in
year-to-year change detection. LandTrendr is also a pixel-based system like CCDC and
BFAST, but the LandTrendr algorithm allows for smoothing over longer periods to
reduce spectral noise, as well as capture of more abrupt, unsmoothed events,
combining the trend-seeking and deviation-seeking approaches of previous studies.
BFAST was selected over other well-known disturbance monitoring
algorithms such as CCDC and LandTrendr because it has been shown to be more resistant
to noise and missing data (due to persistent cloud cover) and it produces monthly
information on breakpoints and trends.
CHAPTER III
MATERIALS AND METHODS
3.1 Study area and project context
For this study, I selected my study sites within two tropical dry forest ecoregions of
Mesoamerica: the dry forests of Yucatan Peninsula, Mexico; and the dry forests of the
Guanacaste Conservation Area, Costa Rica (Figure 3). Both ecoregions have distinctive
and contrasting land use histories, landscape distribution, species composition,
management regimes and anthropogenic threats.
Figure 3. Location of the study areas in Mesoamerica: A) Yucatan, Mexico and B)
the Guanacaste Region, Costa Rica. Green coverage refers to Tropical Dry Forest
extent mapped by Portillo-Quintero and Sanchez-Azofeifa (2010) for Mesoamerica.
3.1.1 Yucatan, Mexico
The Yucatán Peninsula is located in southeastern Mexico and separates the
Caribbean Sea from the Gulf of Mexico. The tropical sub-humid climate becomes drier
moving towards the central portion of the region, with a pronounced dry season lending
to the deciduous nature of the forests, receiving less than 1200 mm/year of rainfall.
Additionally, the dry forests of the Yucatán are isolated from other dry forests by sea
and vast rainforests, which has created a region with a unique composition of
biodiversity.
The tropical dry forests of the Yucatan Peninsula are among the most threatened
ecosystems in America. Rotational crops (milpas) are a widespread practice in the
region. Forests are cleared for the establishment of crops (mainly corn), and then after
two to three years the land is abandoned and vegetation is allowed to regrow, while the
adjacent parcels with secondary vegetation are cleared for establishing another crop
(Garcia-Frapolli, 2007). This cycle is evident in the dynamics of land use and land cover
in the region. However, the expansion of agribusiness, tourism and
cattle ranching in the area has contributed to increased rates of forest
conversion in this region. Many square kilometers of dry forest have also been
replaced either by henequén plantations or by secondary communities that arise from
intense cattle grazing (Valero et al., 2017). I selected two (2) sites to implement this
methodology in the Yucatan peninsula (Figure 4): a) East of Tekit, Yucatan, Mexico,
and b) Tekik de Regil, Yucatan, Mexico.
3.1.1.1 Study Area 1 (Y1): East of Tekit
The site east of Tekit is referred to hereafter as the ‘Y1’ site (Figure 4).
The study site is centered on the following coordinates (UTM 16N WGS84
261754.47 m E, 2281785.68 m N) and covers 100 km² of tropical dry forest dominated
landscape, ten kilometers northeast of the rural town of Tekit, which has a population
of around 10,000 inhabitants.
3.1.1.2 Study Area 2 (Y2): Tekik de Regil
The Tekik de Regil site is referred to hereafter as the ‘Y2’ site (Figure 4).
The study site is centered on the following coordinates (UTM 16N WGS84
235655.17 m E, 2305423.92 m N) and also covers 100 km² of tropical dry forest
dominated landscape, fifteen kilometers southeast of the city of Merida, capital of
Yucatan state, which has a population of around 800,000 inhabitants.
Figure 4. Location of Y1 and Y2 sites in Yucatan, Mexico. Green
coverage refers to Tropical Dry Forest extent mapped by Portillo-Quintero and
Sanchez-Azofeifa (2010) for Mesoamerica.
3.1.2 Guanacaste, Costa Rica
The Guanacaste Conservation Area is in the northwestern part of Costa Rica and
consists of two geographical zones: The Nicoya Peninsula and the Tempisque Northeast
Basin. It contains three national parks, as well as wildlife refuges and other nature
reserves that are managed by the Sistema Nacional de Areas de Conservacion (SINAC).
Because of this protected status, the area contains the largest amount of continuous,
undisturbed tropical dry forest from Mexico to Panama with approximately 120,000
terrestrial hectares. This area also experiences a pronounced dry period typical of
tropical dry forest, when at least 80% of the trees lose their leaves and stand leafless for
three to five months. The area receives between 800 and 2600 mm of rainfall, typically
between May and November. The Guanacaste area is characterized by a mix of tropical
dry and moist ecological zones with steep terrain and thin or infertile soils that are
mostly classified as unsuitable for agriculture (Calvo-Alvarado et al. 2009). I selected
two (2) sites to implement this methodology in Guanacaste (Figure 5): a) Cuajiniquil-Soley,
Guanacaste, Costa Rica; and b) North of Bijagua, Alajuela Province, Costa Rica.
3.1.2.1 Study Area 1 (G1): Cuajiniquil-Soley
The site east of Cuajiniquil-Soley is referred to hereafter as the ‘G1’ site
(Figure 5). The study site is centered on the following coordinates: UTM 16N
WGS84 645540.18 m E, 1213377.58 m N, covering 50 km² of tropical dry forest
dominated landscape, within the Guanacaste Conservation Area in the Tempisque
Northeast Basin.
3.1.2.2 Study Area 2 (G2): North of Bijagua
The site north of Bijagua is referred to hereafter as the ‘G2’ site (Figure 5).
The study site is centered on the following coordinates: UTM 16N WGS84
708810.64 m E, 1195708.74 m N, covering 80 km² of human-dominated and fragmented
tropical dry forest landscape, outside of protected areas and in the proximity of the
Arenal Volcano. The area has a low-density population of rural cattle ranching farmers
and small towns.
These sites were selected because they represented areas where deforestation
has occurred annually (low or high levels of deforestation) since 2001 as verified by the
GFW tree loss dataset (http://www.globalforestwatch.org/map). Given that processing
larger amounts of data (one complete Landsat scene) takes several hours,
each study area covered between 50 and 100 square kilometers of land. This
size was identified as optimum for this study because it allows us to repeat and iterate
‘bfastSpatial’ processing of Landsat time series multiple times using the power of a
desktop PC (16 GB RAM) for research purposes. The size also allowed us to purchase
complete coverage of high resolution imagery (GeoEye, WorldView) for the same
locations for inspection and initial validation purposes.
Figure 5. Location of G1 and G2 sites in the Guanacaste Region, Costa Rica. Green
coverage refers to Tropical Dry Forest extent estimated by the GlobCover2009
project for Mesoamerica.
3.2 Data acquisition and preprocessing
Satellite images used for this analysis consisted of multispectral images from the
Landsat 7 Enhanced Thematic Mapper Plus (ETM+) and Landsat 8 Operational Land
Imager (OLI) sensors.
Landsat 7 was launched in 1999 and utilizes a whisk broom scanning approach
that uses a single detector and mirror to acquire data one pixel at a time by scanning
back and forth. However, these scanners have more moving parts that are subject to
failure, as in 2003 when the scan line corrector failed, creating data gaps. While
approximately 75% of the data for each scene is collected, it still creates an issue for
time-series data analysis. Landsat 8 on the other hand uses a push broom scanner, which
has multiple detectors that scan a line of pixels all at once, thus being less susceptible
to the wear and tear of having more moving parts. The Landsat 8 satellite has been
operational without error since 2013. Landsat 7 and 8 are both similar in terms of spatial
and temporal resolution, with a 30 meter resolution (each pixel is 30x30 m) and 16-day
revisit time. However, they differ slightly in their spectral resolution with Landsat 7
having 9 bands and Landsat 8 having 11 bands with some variation in their position and
range within the electromagnetic spectrum (see Figure 6 for comparison).
I used the USGS GloVis and EarthExplorer applications to retrieve lists of all
available Landsat scenes without any filters such as cloud cover; these lists were then
used as inputs in the USGS ESPA ordering system. The USGS ESPA ordering system
provides additional data output products such as calculated vegetation indices, product
metadata, surface reflectance, top of atmosphere reflectance, brightness temperature,
and a pixel QA band that is used to create a cloud mask in R via the processLandsatBatch
function. The USGS ESPA ordering system also allows for image preprocessing such
as reprojection, image extent modification and pixel resizing.
For the Y1 and Y2 sites, Landsat imagery from Path/Row 20/46 scenes was
acquired through ESPA. A total of 241 images available between 2000 – 2016 were
processed. For G1 and G2, Landsat imagery from Path/Row 16/52 and 16/53 scenes
was acquired through ESPA. A total of 224 images available between 2000 – 2016 were
processed.
The Landsat imagery products requested through ESPA corresponded to
vegetation indices (VIs). The VIs used in this study included the normalized difference
vegetation index (NDVI), enhanced vegetation index (EVI), normalized burn ratio
(NBR), normalized burn ratio 2 (NBR2), modified soil-adjusted vegetation index
(MSAVI), and normalized difference moisture index (NDMI), which are explained in
the next section.
Figure 6. Comparison of spectral bands for Landsat 7 (L7 ETM+) and Landsat 8
(OLI & TIRS)
Unfortunately, the USGS ESPA ordering system made significant changes
during the course of this research, which presented issues that are still
being resolved. The USGS ESPA ordering system changed the Landsat file
naming convention when it switched to processing only Landsat Collection 1
images as opposed to Landsat pre-collection images. This new naming convention
was not initially recognized by the ‘bfmSpatial’ algorithm, but in working with the
developers an initial workaround was put into place, and a more permanent solution
is currently being established.
3.3 Vegetation indices
Vegetation indices are spectral comparison functions of two or more bands on
the electromagnetic spectrum intended to emphasize various properties of vegetation.
For example, the normalized difference vegetation index (NDVI) is a ratio comparing
near-infrared and red reflectance values, as healthy vegetation typically displays very
low reflectance in the red band with high reflectance in the near-infrared band (see
Figure 7).
Figure 7. Example of the electromagnetic signature of healthy green vegetation
and associated absorption and reflectance features.
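The formulas to obtain all indices are as follows:
▪ $\mathrm{NDVI} = \dfrac{NIR - Red}{NIR + Red}$
▪ $\mathrm{EVI} = 2.5 \times \dfrac{NIR - Red}{NIR + 6 \times Red - 7.5 \times Blue + 1}$
▪ $\mathrm{NBR} = \dfrac{NIR - SWIR2}{NIR + SWIR2}$
▪ $\mathrm{NBR2} = \dfrac{SWIR1 - SWIR2}{SWIR1 + SWIR2}$
▪ $\mathrm{MSAVI} = \dfrac{2 \times NIR + 1 - \sqrt{(2 \times NIR + 1)^2 - 8\,(NIR - Red)}}{2}$
▪ $\mathrm{NDMI} = \dfrac{NIR - SWIR1}{NIR + SWIR1}$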
Where:
NIR – Near-infrared band (Band 4 in Landsat 7 and Band 5 in Landsat 8)
Red – Red band (Band 3 in Landsat 7 and Band 4 in Landsat 8)
2.5 – Gain factor for correction
6 & 7.5 – Coefficients of aerosol resistance term
Blue – Blue band (Band 1 in Landsat 7 and Band 2 in Landsat 8)
1 – Canopy background adjustment
SWIR1 – Short-wave infrared 1 (Band 5 in Landsat 7 and Band 6 in Landsat 8)
SWIR2 – Short-wave infrared 2 (Band 7 in Landsat 7 and 8)
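The index formulas above translate directly into short functions. The base R sketch below is illustrative only: the function names and the reflectance values (on a 0 to 1 scale) are assumptions, and in this study the indices were obtained precomputed from ESPA rather than calculated this way.

```r
# Vegetation indices as functions of band reflectance
ndvi  <- function(nir, red) (nir - red) / (nir + red)
evi   <- function(nir, red, blue) {
  2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)
}
nbr   <- function(nir, swir2) (nir - swir2) / (nir + swir2)
nbr2  <- function(swir1, swir2) (swir1 - swir2) / (swir1 + swir2)
msavi <- function(nir, red) {
  (2 * nir + 1 - sqrt((2 * nir + 1)^2 - 8 * (nir - red))) / 2
}
ndmi  <- function(nir, swir1) (nir - swir1) / (nir + swir1)

# Hypothetical surface reflectance for a healthy vegetated pixel
blue  <- 0.03
red   <- 0.05
nir   <- 0.45
swir1 <- 0.20
swir2 <- 0.10

indices <- c(NDVI  = ndvi(nir, red),
             EVI   = evi(nir, red, blue),
             NBR   = nbr(nir, swir2),
             NBR2  = nbr2(swir1, swir2),
             MSAVI = msavi(nir, red),
             NDMI  = ndmi(nir, swir1))
print(round(indices, 3))
```

Because the functions use only arithmetic, they work unchanged on vectors or matrices of reflectance values, one element per pixel.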
NDVI is a commonly used vegetation index that measures green, healthy
vegetation, as it utilizes the regions of the electromagnetic spectrum most associated
with high absorption by chlorophyll (the red band) and high reflectance
by leaf mesophyll layers (the NIR band) (Jensen 2016). EVI was developed as an improvement to
NDVI as it corrects potential NDVI saturation issues in areas with a high leaf area
index (LAI), which is an estimate that characterizes foliage cover and plant canopies
(Exelis, 2017). NBR and NBR2 both utilize infrared bands that are most sensitive to
changes related to fire and are significant indicators of burn severity. NBR uses a
combination of band 5 (Near infrared) and band 7 (shortwave infrared) from Landsat 8,
while NBR2 uses both shortwave infrared bands 6 and 7 (Boer et al. 2008). MSAVI is
a modified or improved version of the Soil-Adjusted Vegetation Index (SAVI) that has
an adjustment factor to minimize soil noise that is usually picked up by NDVI. This
adjustment factor is iterated continuously in the MSAVI, which increases the dynamic
range of SAVI, optimizing soil adjustments (Qi et al. 1994). NDMI improves upon
NDVI in its ability to track water stress and plant biomass changes more closely, as the
bands used correlate highly with the water content of canopies. NDMI is similar to NBR,
but it uses band 6 (SWIR1) as the shortwave infrared band (Jensen 2016).
3.4 Open-source and licensed software
Many software tools were used to complete this research, primarily R, RStudio,
ArcGIS, and Google Earth. R is an open-source, object-oriented statistical
programming language which was used in combination with RStudio, an integrated
development environment that has robust features such as code editing, debugging and
various graphics and visualization tools. The R language is widely used among
researchers and data specialists for developing statistical software and analyzing data,
and has continued to rise in popularity since the release of the first stable version
(1.0.0) in 2000 (R version 3.3.3 was used for this research). The open-source nature of R makes
it quite accessible, as several packages with advanced
functionality and thorough documentation are already built in, and more are continually
being developed and distributed through repositories such as the Comprehensive R Archive Network
(CRAN). ArcGIS is a powerful mapping and analytics platform that was used to
analyze output data produced by ‘bfastSpatial’ in R. ArcGIS is a licensed software
product developed by ESRI that was initially released in 1999 and is currently in its 10th
version (ArcGIS 10.3.1 was used for this research). ArcGIS is used for a variety of
purposes and has many capabilities including spatial analytics, mapping and
visualization, 3D modeling and visualization, real-time GIS applications, remote
sensing imagery, and data collection and management.
Google Earth is another powerful mapping platform that has many built-in
datasets that were used for validating results. Google Earth is freely available; it was
initially released in 2001 as Keyhole EarthViewer and relaunched by Google in 2005. It is also rather intuitive to use, making it a widely
popular platform. The program uses satellite imagery to create a 3D rendition of Earth,
which can be navigated very simply, and allows for the addition of layers from existing
Google datasets or custom, user-created layers. Google Earth was particularly useful in
this research for validating the accuracy of ‘bfastSpatial’ outputs as well as for the
validation of near-real-time functionality, as Google Earth has time-lapse data built in
that allows users to inspect high spatial resolution imagery at variable time frames.
3.5 System architecture
The general approach consists of first acquiring Landsat 7 and 8 images from
2000 to 2016. The imagery is then used as input for the ‘bfastSpatial’ package. The
images are first stacked into a single brick that serves as the input. Then ‘bfastSpatial’
analyzes each pixel individually by fitting a seasonal trend model to
the real vegetation index data over time. This model is used as the basis of
comparison against real data in a specified monitoring period, and if the real data break
from what the model predicts, a breakpoint is flagged along with a
magnitude describing how severely the data departed from the trend.
As a first step, pre-processed data acquired from the USGS ESPA ordering
system are placed into a directory named “landsat.” (See Figure 8 for details of directory
architecture).
These directories also need to be declared within the R environment with a
command such as: landsatDir <- file.path(stepDir, 'landsat') (see Appendix A for the exact
code).
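A minimal base R sketch of declaring and creating such folders is shown below. The folder names follow the text and Figure 8, but their nesting (other than ‘landsat’ sitting under `stepDir`), the `baseDir` location, and the variable names `ndviDir`, `dataDir` and `outDir` are assumptions for illustration; Appendix A holds the code actually used.

```r
# Base folder for the project; a temporary location is used as a placeholder
baseDir <- file.path(tempdir(), "bfast_project")

# Path variables; the nesting shown here is assumed, not taken from the thesis
stepDir    <- file.path(baseDir, "datastep")
landsatDir <- file.path(stepDir, "landsat")   # preprocessed ESPA downloads
ndviDir    <- file.path(stepDir, "ndvi")      # one folder per vegetation index
dataDir    <- file.path(baseDir, "data")      # stores VI stacks
outDir     <- file.path(baseDir, "out")       # stores outputs

# Create any folder that does not exist yet, including parent folders
for (d in c(landsatDir, ndviDir, dataDir, outDir)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}
```

Note that `file.path()` only builds the path string; `dir.create()` is what makes the folder on disk, so the folders can equally be created by hand outside R.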
The function ‘processLandsatBatch’, which is part of the ‘bfastSpatial’ package,
is then used to extract data for all of the vegetation indices and apply a cloud mask via
the ‘pixel_qa’ layer, a quality assessment band that contains cloud, cloud
confidence, cloud shadow, snow/ice, and water data. This mask allows for the exclusion
of low-quality pixels from the analysis.
The vegetation index data are stored in separate directories, which are then used
to create raster brick objects via ‘timeStack’, another ‘bfastSpatial’
function. Each raster brick contains multiple layers, and each layer is an image of
vegetation index data with an associated date between 2000 and 2016. Stacked
together, these images form the time series used as the input for ‘bfmSpatial’.
[Figure 8 diagram: folders ‘data’ (stores VI stacks), ‘datastep’, ‘ndvi’ (and other VI
folders), ‘landsat’, and ‘out’ (stores outputs).]
Figure 8. Folder architecture that needs to be created outside of R environment on
computer.
Once a vegetation index is bricked it can be run through ‘bfmSpatial’, which has
several parameters and inner workings to be considered.
First, the date vector object ‘dates’ is created by acquiring the scene information
provided in the Landsat scene ID. Next, sensor information is acquired for the sub-
setting of data by sensor. Then the length of the coefficient vector is determined based
on the formula selected (trend and/or harmonic). At this point the system is set to run
the iterative function that runs ‘bfastmonitor’ on every pixel over the raster brick. The
brick is first subset by sensor, which can be used to limit the analysis by sensor (all
sensors were used in this study), and then converted to a BFAST time-series object by
the ‘bfastts’ function as ‘bfastmonitor’ does not accept raster class objects.
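The date and sensor information described above can be recovered from pre-collection Landsat scene IDs with base R alone. The sketch below shows the idea and is not the internal code of ‘bfastSpatial’; the two scene IDs are made-up examples, and the sensor lookup covers only the two prefixes used here.

```r
# Illustrative scene IDs (assumed, not taken from the study's image list).
# Layout: sensor (1-3), path (4-6), row (7-9), year (10-13), day of year (14-16).
scene_ids <- c("LE70200462013045EDC00", "LC80200462015123LGN00")

# Acquisition date from the year and day-of-year fields
year_doy <- substr(scene_ids, 10, 16)
dates <- as.Date(year_doy, format = "%Y%j")

# Sensor family from the three-character prefix
sensor_code <- substr(scene_ids, 1, 3)
sensor <- unname(c(LE7 = "ETM+", LC8 = "OLI")[sensor_code])

print(data.frame(scene_ids, dates, sensor))
```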
After BFAST time-series creation, ‘bfastmonitor’ is run on every pixel with the
following parameters:
▪ data = time-series raster brick
▪ start = start of monitoring period (2013 in this study)
▪ formula = response ~ harmonic
▪ order = 1
▪ lag = NULL
▪ slag = NULL
▪ history = c("all")
▪ type = "OLS-MOSUM"
▪ h = 0.25
▪ end = 10
▪ level = 0.05
The most important of these parameters are ‘formula’, ‘order’ and ‘h’, as
they determine how the model from which breaks are detected is constructed.
The ‘formula’ parameter refers to the seasonal model to be fitted to the Landsat
observations, ‘order’ refers to the order of the harmonic terms in that model, and the
h-value refers to the fraction of values from the history period (all data) that is used
to compute the OLS-MOSUM statistic. For the period 2000-2016, an h-value of 0.25
(25%) therefore allows a window of roughly 4 years of data to be included in the
OLS-MOSUM computation. At this h-value, only one break can be detected every
four years.
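The window length quoted above is simple arithmetic, sketched here under the assumption that observations are spread roughly evenly over the series, so that a fraction of the observations corresponds to the same fraction of the time span. The variable names are illustrative.

```r
series_start <- 2000   # first year of the time series
series_end   <- 2016   # last year of the time series
h            <- 0.25   # bandwidth as a fraction of the data

# Approximate MOSUM window length in years
window_years <- h * (series_end - series_start)
print(window_years)
```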
Previous research by Verbesselt et al. (2012) and DeVries et al. (2015), as
well as experimentation during the course of this research, suggests that a
first-order seasonal harmonic model with an h-value of 0.25 produces the most accurate
results. The reason is that a first-order harmonic model better represents the seasonal
variability in leaf phenology in tropical dry forests. In addition, because low-quality
pixel values are very frequent (up to 50% of the complete available data), an h-value of
0.25 (4 years) allows enough data to be included in the computation of the MOSUM
statistic for break detection.
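To make the idea of a first-order harmonic model concrete, the sketch below fits an intercept plus one annual sine/cosine pair to a synthetic, irregularly sampled series using base R. The data are simulated and the variable names are illustrative; ‘bfastmonitor’ builds its regressors internally, so this is an approximation of the concept rather than the package's own code.

```r
set.seed(1)  # reproducible synthetic data (not study data)

# Irregularly spaced observation times (decimal years) in a 2000-2013 history period
time_yr <- sort(runif(120, min = 2000, max = 2013))

# Simulated vegetation index: mean 0.6, annual cycle, small noise
vi <- 0.6 + 0.2 * sin(2 * pi * time_yr) + 0.05 * cos(2 * pi * time_yr) +
  rnorm(120, sd = 0.03)

# First-order harmonic model: intercept plus one annual sine/cosine pair
fit <- lm(vi ~ sin(2 * pi * time_yr) + cos(2 * pi * time_yr))

# Amplitude of the fitted seasonal cycle
amplitude <- sqrt(sum(coef(fit)[2:3]^2))
print(coef(fit))
```

A higher order would add further sine/cosine pairs at multiples of the annual frequency, which gives a more flexible seasonal curve at the cost of more coefficients to estimate from sparse data.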
The ‘monitoring period’ is set by the user and represents the window or time
frame where the user wants to visualize breakpoints. In this study, I selected the period
2013-2016 as the monitoring period.
Additionally, if the internals of ‘bfmSpatial’ are run manually, further
parameters need to be set, as they are required for the internal components of
‘bfmSpatial’ to function and to process ‘bfastmonitor’ outputs (see Appendix A for
detailed code). These parameters are:
▪ x = time-series raster brick
▪ dates = NULL (set internally within bfastSpatial)
▪ pptype = 'irregular' (temporal resolution or time between images)
▪ monend = NULL (optional end of monitoring period)
▪ mc.cores = 1 (optional parameter in parallel processing)
▪ returnLayers = c("breakpoint", "magnitude", "error") – output brick layers to include
▪ sensor = c("ETM+ SLC-on", "ETM+ SLC-off", "OLI") – sensors to include
Once ‘bfastmonitor’ has run over all pixels the output is a raster brick with the following
layers:
▪ breakpoint – timing of breakpoints detected for each pixel
▪ magnitude – the median of the residuals within the monitoring period
▪ error – a value of 1 for pixels where an error was encountered by the algorithm
and NA where the method was successfully run
▪ Additional layers not in default returnLayers parameter include history,
r.squared, adj.r.squared and coefficients, which can be used for additional
statistical analysis not covered in this research.
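The ‘magnitude’ layer described above can be illustrated for a single pixel. The sketch below assumes a synthetic, noise-free series with a drop of 0.3 in the index from 2014 onward, fits a first-order harmonic model to the history period, and takes the median of the residuals in the monitoring period. All names and values are illustrative and this is not the ‘bfastSpatial’ implementation.

```r
# Synthetic history period (2000-2012): stable seasonal signal, not study data
hist_df <- data.frame(t = seq(2000.025, 2012.975, by = 0.05))
hist_df$vi <- 0.6 + 0.2 * sin(2 * pi * hist_df$t)

# Synthetic monitoring period (2013-2015): same signal, minus 0.3 after 2014
mon_df <- data.frame(t = seq(2013.025, 2015.975, by = 0.05))
mon_df$vi <- 0.6 + 0.2 * sin(2 * pi * mon_df$t) - ifelse(mon_df$t > 2014, 0.3, 0)

# Model fitted on the history period only
fit <- lm(vi ~ sin(2 * pi * t) + cos(2 * pi * t), data = hist_df)

# Residuals of the monitoring-period observations against the model prediction
residuals_mon <- mon_df$vi - predict(fit, newdata = mon_df)

# Magnitude: median residual within the monitoring period
magnitude <- median(residuals_mon)
print(magnitude)
```

Because most monitoring-period observations fall after the simulated drop, the median residual is close to -0.3, which is the kind of negative magnitude treated in this study as a possible indicator of forest loss.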
The output layers can be further manipulated by separating breakpoint timing by
year and month with the ‘changeMonth’ function, which is part of the ‘bfastSpatial’
package, as well as by creating a magnitude map of only the breakpoints (by default, the
magnitude layer shows magnitude for all pixels). The various outputs can be inspected
via ‘plot’, a standard R function that allows graphical viewing of data. The
output layers and derived products can then be exported as GeoTIFF files via the
‘writeRaster’ function, which is part of the ‘raster’ package for R. The breakpoint
timing, magnitude of breakpoints, and month of breakpoint separated by year (2013 – 2016)
were all used for the purposes of this research. Once outputs are obtained, a threshold
can be applied to the magnitude product so that only negative values remain, thus
creating a map of only potentially deforested areas (see Figure 9 for detailed workflow).
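The masking and thresholding steps can be sketched on a tiny grid with base R matrices standing in for raster layers. The values are invented for illustration, and the threshold of 0 simply follows the statement that only negative values remain; a stricter cut-off would be a further assumption.

```r
# Toy 2 x 2 'layers' (illustrative values, not study outputs)
magnitude  <- matrix(c(-0.25, 0.10, -0.02, -0.40), nrow = 2)
breakpoint <- matrix(c(2014.3, NA, 2015.1, 2013.8), nrow = 2)  # NA = no break detected

threshold <- 0  # keep only negative magnitudes

# Magnitude of breakpoint pixels only, then only those below the threshold
loss <- magnitude
loss[is.na(breakpoint) | magnitude >= threshold] <- NA

print(loss)
```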
Figure 9. Flowchart representing methods of this study.
3.6 Design of a near-real-time monitoring system using BFAST
The nature of the ‘bfastSpatial’ algorithm is such that the monitoring period
requires enough data to accurately determine breakpoints. If a single year is
used as the monitoring period, there may not be enough data to do so, which is why I
used a four-year monitoring period. However, only one break can be detected per
pixel during this monitoring period, which negatively impacts the temporal accuracy of
detected breaks. Because the monitoring period spans four years, a break could be
detected prematurely due to variations in precipitation. For example, an area that was
drought-stricken in 2013 could be flagged as a breakpoint in that year when
deforestation actually occurred in 2015. Because of these temporal accuracy issues, a
new approach was devised. With the new approach, outputs were obtained from
‘bfastSpatial’ with a monitoring period from July 2011 to July 2015 and compared to
outputs with a monitoring period from January 2012 to December 2015 (which includes
new data from the most recent 6 months).
Previous research suggested that simply adding single images to the
monitoring period as they become available would result in new breakpoint detection in
that particular image if deforestation had occurred. This ideal setup could be considered
real-time. However, I found that one additional image did not provide enough data to
ensure accurate detection of a break, so I hypothesized that additional data (up to 6
months) would provide better detection accuracy. Although six months is a long period
of time compared to the concept of real-time, I consider this time frame to be ‘near
real-time’ in comparison to current global approaches based on annual or decadal
information.
This additional 6 months of data was enough for the algorithm to
accurately detect breaks within the new data (see Figure 10 for visualization).
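The comparison of the two runs amounts to finding pixels that carry a breakpoint in the later run but not in the earlier one. A minimal sketch follows, with short hypothetical vectors standing in for the breakpoint layers of the two runs.

```r
# Hypothetical breakpoint dates (decimal years) for five pixels; NA = no break
run_first  <- c(NA, 2014.2, NA,     NA, 2013.5)  # monitoring period 07/2011 to 07/2015
run_second <- c(NA, 2014.2, 2015.7, NA, 2013.5)  # monitoring period 01/2012 to 12/2015

# Pixels flagged only once the most recent months of data are included
new_breaks <- which(is.na(run_first) & !is.na(run_second))
print(new_breaks)
```

Only the third pixel is reported, since it is the only one whose break appears after the new data were added.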
[Figure 10 plot: ‘Yucatan2 NBR2 Secondary Run - 6 Months Later’. Annotations: MOSUM
computed over 4-year data from 07/11 to 07/15; 4-year window shifted to 01/12 to 12/15
to include 6 months of new data (blue box), with break detected (red dotted line and
circle).]
Figure 10. Model visualization of how near real-time system functions.
3.7 Validation
For evaluating the accuracy of the BFAST method in detecting small-scale and
short-term TDF loss, I implemented an ‘error matrix’ or ‘confusion matrix’ accuracy
assessment approach based on Congalton (1991), which has been extensively used for
map validation in scientific studies. The error matrix compares map values generated
by automated processes (in my case, breakpoint magnitudes indicating deforestation)
with true ground information collected from reference data. Reference data can be
collected through field visits or by using very high spatial resolution (VHR) imagery.
An error matrix allows the simplest measure of accuracy, the ‘Overall accuracy’, to
be quantified. It is computed by dividing the total number of correctly classified pixels
by the total number of pixels in the error matrix. In addition, the error matrix
allows calculation of the ‘Producer’s accuracy’, which is the number of correct
pixels in a category divided by the total number of pixels in that category based on the
reference data. This accuracy measure indicates the probability of a reference pixel
being correctly classified and is considered a measure of omission error, indicating how
well a certain area was classified. A third accuracy measurement is the ‘User’s
accuracy’, which is the total number of correct pixels in a category divided by the total
number of pixels classified in that category. It is a measure of commission error and
indicates the probability that a pixel classified on the map/image actually represents
that category on the ground.
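The three accuracy measures can be computed directly from an error matrix. The sketch below uses made-up counts for the two classes ‘D’ and ‘S’ (they are not the results of this study), with map classes in rows and reference classes in columns.

```r
# Illustrative error matrix: rows = map class, columns = reference class
cm <- matrix(c(40, 10,
                5, 45),
             nrow = 2, byrow = TRUE,
             dimnames = list(map = c("D", "S"), reference = c("D", "S")))

# Overall accuracy: correctly classified cells / all cells
overall <- sum(diag(cm)) / sum(cm)

# Producer's accuracy: correct / reference total per class (omission error = 1 - value)
producers <- diag(cm) / colSums(cm)

# User's accuracy: correct / mapped total per class (commission error = 1 - value)
users <- diag(cm) / rowSums(cm)

print(round(100 * c(overall = overall, producers = producers, users = users), 1))
```

With these counts, 85 of 100 cells are correct overall, while the ‘D’ class has a producer's accuracy of 40/45 and a user's accuracy of 40/50, which shows how the two measures can differ for the same class.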
For calculating the accuracies of the BFAST products, I collected ground truth
information using multi-temporal very high resolution (VHR) imagery acquired for the
project and available in the Google Earth platform (RapidEye/WorldView, < 5 m spatial
resolution), similarly to DeVries et al. (2015), Grogan et al. (2016), Murillo-Sandoval et
al. (2016) and Schultz et al. (2016).
For collecting ground truth data, I used all available VHR imagery in Google
Earth from all available dates between 01/2012 and 12/2015. A grid with 1
hectare cells (100 x 100 m) covering the entire study area extent was created using
ArcGIS. However, for practical purposes, I selected a subset of approximately 20% of
the whole study area for performing the validation. This allowed a
higher level of attention to be dedicated to detecting deforestation in the VHR imagery
and thus yielded a higher quality reference dataset. This was performed for the Y1, Y2
and G2 sites (see Figure 11). For G1, I used the entire grid covering the study site in
order to collect the highest possible amount of reference information, since
deforestation at this site occurred at very low densities.
Using multi-temporal imagery in Google Earth, sites Y1, Y2 and G1 were
inspected thoroughly on screen. One-hectare grid cells were categorized as ‘D’
(indicating a deforested location) or ‘S’ (stable land cover/land use). Deforested cells
(‘D’) were areas that were visibly covered by tropical dry forest in 2012 and the
beginning of 2013 and visibly non-forested by the beginning of 2016, with evident soil
exposure (Figure 12). Stable cells (‘S’) were areas of agriculture,
forest, pasture or other land covers that remained the same during the 2012-2016
period. Forest regrowth areas, which were initially considered, were deforested areas
that had started to regain vegetation and had accumulated sufficient biomass to be
visible in VHR imagery and potentially picked up by the BFAST algorithm. However,
although an important percentage of the original biomass was restored in 3 years, BFAST
predominantly mapped these areas as ‘D’ events. Therefore, I treated these areas as
already deforested sites that fit better into the ‘S’ category.
Multi-temporal VHR imagery for the G2 site was lacking in the Google Earth
platform. Therefore, I used the Hansen et al. (2015) dataset of annual tree cover loss for
the years 2013-2016 available in the Global Forest Watch website (www.gfw.org) as an
independent source of information.
[Figure 11 map panels: Y1 site, Y2 site, G1 site, G2 site. Legend (reference data): ‘D’
(deforested), ‘S’ (stable).]
Figure 11. Reference data locations for validation. One-hectare grid cells visually
inspected via multi-temporal images available in Google Earth (total n=373). Y1
(‘D’=64; ‘S’=54), Y2 (‘D’=32; ‘S’=32), G1 (‘D’=21; ‘S’=50), G2 (‘D’=70; ‘S’=50).
The reference dataset consisted of 373 locations across the four sites (187 ‘D’
cells with confirmed deforestation events and 186 ‘S’ cells; see Figure 11). The 373
locations were compared with BFAST products
(breakpoint and change magnitude maps). Breakpoints with moderate to extreme
negative change magnitudes were considered indicators of forest clearings. Error
matrices were constructed using BFAST outputs from each vegetation index. I
evaluated the accuracy of breakpoints detected for each time series of vegetation indices
(EVI, NBR/NBR2, NDMI, MSAVI and NDVI) and compared the accuracy ranks obtained
among indices.
3.8 Near real-time validation
The near real-time results were validated by viewing both magnitude
outputs (07/2011 to 07/2015 and 01/2012 to 12/2015) in ArcGIS and creating a point
shapefile of all of the newly detected breaks in the 6-month period (see Figure 13 for
visualization). This shapefile was then converted into a KMZ file and imported into
Google Earth. The historical imagery tool was then used to view imagery from 2015
corresponding to the period between July 2015 and December 2015. To
determine the accuracy of the near real-time method, every point that corresponded to a
new breakpoint was analyzed using the Google Earth image acquired prior to 07/15,
dated 03/25/2015, and the image acquired between 07/15 and 12/15, dated 09/05/2015.
Using these sample pixels from the BFAST output, I evaluated the producer’s accuracy of
BFAST at a local scale for detecting new deforestation within a period of 6 months.
[Figure 12 image panels: 2013 and 2016 imagery with grid cells labeled ‘D’ and ‘S’.]
Figure 12. Validation example. Deforested cells (‘D’) – areas visibly covered by TDF
beginning 2013, and visibly non-forested by 2016, with soil exposure. Stable cells
(‘S’) referred to areas of any land covers that remained the same during 2013-2016.
Figure 13. Magnitude outputs for NBR2 in Yucatan site 2. Monitoring period 07/11
to 07/15 (green to red) overlays monitoring period 01/12 to 12/15 (green to blue) so
that new deforestation is highlighted in blue.
CHAPTER IV
RESULTS AND DISCUSSION
In this research, the overall efficacy and accuracy of the BFAST set of algorithms
was evaluated in the analysis of small-scale deforestation in tropical dry forest
environments in the Yucatan Peninsula region and the Guanacaste Conservation Area.
Data availability was an immediately apparent issue. Landsat data have a 16-day temporal
resolution, and because of the scan line corrector failure in Landsat 7 data, as well as
the poor quality and high cloud cover of imagery over tropical areas, a significant
number of observations were masked by the image quality mask, reducing the number of
clear pixel values. For example, each year
should have approximately 23 corresponding Landsat scenes, but some years had 10 or
fewer, thereby impacting the creation of the model.
Overall, the ‘bfastSpatial’ algorithm performed well in terms of processing time
and accuracy of results. The majority of the Landsat 8 tar files were approximately 6
MB, with the Landsat 7 files being approximately 3 MB. All of the processing was
done on either a MacBook Pro running OS X 10.11.6 with an Intel Core i7 3.1 GHz
processor and 16 GB of 1887 MHz DDR3 RAM or a Dell OptiPlex 2010
running Windows 7 Enterprise with an Intel Core i7 3.4 GHz processor and 16 GB of
RAM, with both giving similar processing times. Time-series raster brick creation was
efficient, generally taking 5 minutes or less and creating
bricks approximately 45 MB in size. Running ‘bfmSpatial’ was the longest part of the
process and generally took about 20-30 minutes to produce the output bricks with
breakpoints, magnitudes, error, and other supplementary outputs, which were typically
around 7 MB in size.
4.1 Breakpoints and magnitudes
Figures 14-17 (next pages) show the distribution of pixels where breakpoints
were detected for the 2013-2016 monitoring period for each site. Breakpoints are
labeled with their corresponding magnitude values using a red>yellow>green color
gradient scheme. Green and yellow magnitude values correspond to slight to moderate
positive breakpoints, while reddish magnitude values correspond to slight to extreme
negative breakpoints.
Both areas in the Yucatan experienced markedly higher rates of breakpoint
detection and more overall deforestation than both areas of Guanacaste. Breakpoint
detection in the two Yucatan study areas ranged widely, from about 14.1% to 92.5%
of pixels being labeled as breaks from the model (see Figures 14 and 15), whereas the
two areas of Guanacaste ranged from about 2.8% to 51.4% (see Figures 16 and 17). The
most notable in terms of high breakpoint detection percentage were the NDVI, MSAVI and
NDMI readings from Yucatan study site 2 (Figure 15), with 92.5%, 89.3%, and 81.3%,
respectively, of total pixels flagged as breakpoints. This could be due to several
factors, such as better data quality or the study site’s proximity to the metropolitan
area of Merida. On the other hand, the most notable low breakpoint percentage was EVI
with 2.8% in Guanacaste study site 2 (Figure 17), although this area had low breakpoint
percentages for all indices. This area is more centrally located in the Guanacaste
region, near the Arenal Volcano and a mountain range, as opposed to the first study
site, which is located near the coastline. I believe G2 is more likely to be obscured by
heavy cloud cover, which creates large gaps in the data that negatively impact the
model created by BFAST for those pixels.
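The percentages reported for each index follow directly from the pixel counts. As a minimal check, the sketch below recomputes them for the Y1 site from the counts listed with Figure 14; the variable names are illustrative.

```r
# Breakpoint pixel counts for the Y1 site, as listed with Figure 14
breakpoint_pixels <- c(NDVI = 54079, EVI = 14071, MSAVI = 40003,
                       NDMI = 42337, NBR = 39362, NBR2 = 24389)
total_pixels <- 99855

# Share of pixels flagged as breakpoints, in percent, rounded to one decimal
pct <- round(100 * breakpoint_pixels / total_pixels, 1)
print(pct)
```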
Breakpoint magnitude was the most significant predictor of a deforestation event,
with the most negative values correlating most strongly with deforestation. The
magnitude values varied greatly but, interestingly, the variation followed study site
rather than vegetation index, and the sites with similar ranges were not even in the
same region. Yucatan site 1 and Guanacaste site 2 had similarly broad
magnitude ranges across all vegetation indices, with the extremes being -0.660399
(NDMI) and 0.477973 (NDVI) in Yucatan site 1, and -0.644136 (EVI) and 0.461431
(NDMI) in Guanacaste site 2. Conversely, Yucatan site 2 and Guanacaste site 1 had
comparably narrow magnitude ranges across all vegetation indices, with the
extremes being -0.267992 (NBR) and 0.286369 (NBR) in Yucatan site 2, and -0.29846
(NBR) and 0.277289 (NDVI) in Guanacaste site 1.
DeVries et al. (2015) determined that moderate to extreme negative magnitude
values are associated with declines in forest cover, which is expected since significant
negative breaks in vegetation index values can only occur with conversion of a
vegetated cover to other land use. Yellow to green values represent positive breaks
caused by a sudden increase in the value of the vegetation index. This could be explained
by increases of precipitation in the area, which could produce an excess of moisture in
the soil and vegetation and therefore, a slight increase in the values for indices such as
NBR and NDMI. For the purposes of this research, I only considered moderate to
extreme negative magnitude values as indicators of TDF loss.
Y1 site – Magnitude Values of All Detected Breakpoints
Vegetation Index     NDVI     EVI      MSAVI    NDMI     NBR      NBR2
Breakpoint Pixels    54,079   14,071   40,003   42,337   39,362   24,389
Total Pixels         99,855   99,855   99,855   99,855   99,855   99,855
Percentage           54.2%    14.1%    40.1%    42.4%    39.4%    24.4%
[Map panels: MSAVI, NDVI, EVI, NDMI, NBR, NBR2]
Figure 14. Maps of breakpoint magnitudes for all VIs for Y1 and pixel percentages.
Yucatan Area 2 – Magnitude Values of All Detected Breakpoints
Vegetation Index     NDVI     EVI      MSAVI    NDMI     NBR      NBR2
Breakpoint Pixels    92,334   46,317   89,121   81,144   68,658   33,895
Total Pixels         99,855   99,855   99,855   99,855   99,855   99,855
Percentage           92.5%    46.4%    89.3%    81.3%    68.8%    33.9%
[Map panels: MSAVI, NDVI, EVI, NBR2, NDMI, NBR]
Figure 15. Maps of breakpoint magnitudes for all VIs for Y2 and pixel percentages.
Guanacaste Area 1 – Magnitude Values of All Detected Breakpoints
Vegetation Index     NDVI     EVI      MSAVI    NDMI     NBR      NBR2
Breakpoint Pixels    12,152   9,867    11,695   23,401   22,774   12,752
Total Pixels         45,530   45,530   45,530   45,530   45,530   45,530
Percentage           26.7%    21.7%    25.7%    51.4%    50.0%    28.0%
[Map panels: NDVI, EVI, MSAVI, NDMI, NBR, NBR2]
Figure 16. Maps of breakpoint magnitudes for all VIs for G1 and pixel percentages.
Guanacaste Area 2 – Magnitude Values of All Detected Breakpoints
Vegetation Index     NDVI      EVI       MSAVI     NDMI      NBR       NBR2
Breakpoint Pixels    7,944     3,022     7,771     8,355     6,052     5,657
Total Pixels         108,240   108,240   108,240   108,240   108,240   108,240
Percentage           7.3%      2.8%      7.2%      7.7%      5.6%      5.2%
[Map panels: NDVI, EVI, MSAVI, NDMI, NBR, NBR2]
Figure 17. Maps of breakpoint magnitudes for all VIs for G2 and pixel percentages.
4.2 Accuracy assessment
The reference dataset consisted of 373 locations across the four sites, comprising
both cells with confirmed deforestation events and stable cells. However, it is
important to note that ‘ground-truth’
deforestation for G2 was estimated using the Hansen et al. 2013 dataset. The accuracy
of this dataset at landscape scales for TDF is unknown; therefore, I report
accuracy measures both with and without the G2 validation dataset.
In analyzing the overall accuracies of the vegetation indices across all sites, it is
clear that NBR/NBR2 was the most accurate, with an overall accuracy of 74%, which
increased to 83.4% when excluding the poor G2 validation data (Figure 18). NDMI was
also a fairly accurate indicator of deforestation, with an overall accuracy of 71.6%
across all sites and 81.4% excluding G2. NDVI, EVI and MSAVI were not as effective in
detecting deforestation, with overall accuracies of 64.6%, 63.5%, and
60.1%, respectively, across all sites and 73.1%, 68%, and 64%,
respectively, when excluding G2.
The producer’s accuracy (the number of correct pixels in a
category divided by the total number of pixels in that category based on the reference
data) yields more favorable results for the NBR/NBR2 and NDMI indices across all sites
(Figure 19). The user’s accuracy (a measure of commission error) yields similar
accuracies among indices (60-80%) for all sites (Figure 20).
These results support previous research suggesting that vegetation
indices that exploit the water absorption features of the SWIR band in the
electromagnetic spectrum are more sensitive to forest change than those that exploit the
chlorophyll absorption features of the red band. The reasons behind this discrepancy
are discussed below.
[Figure 18 bar charts, accuracy in percent. Panels: Overall accuracies (Y1 site); (Y2
site); (G1 site); (G2 site); (all sites); (all except G2).]
Figure 18. Overall accuracies for Y1, Y2, G1, G2, all sites, and all sites except G2.
[Figure 19 bar charts, accuracy in percent. Panels: Producer's accuracies for
'Deforested' class (Y1 site); (Y2 site); (G1 site); (G2 site); (all sites); (all sites)
w/o G2.]
Figure 19. Producer’s accuracies for Y1, Y2, G1, G2, all sites, and all sites except G2.
[Figure 20 bar charts, accuracy in percent. Panels: User's accuracies for 'Deforested'
class (Y1 site); (Y2 site); (G1 site); (G2 site); (all sites); (all sites) w/o G2.]
Figure 20. User’s accuracies for Y1, Y2, G1, G2, all sites, and all sites except G2.
The varying accuracy of the different vegetation indices has been explained in
previous studies and is supported by this research.
DeVries et al. (2015) noted that vegetation indices like NBR/NBR2 and NDMI, which
utilize near infrared and short wave infrared bands, are particularly sensitive to
canopy moisture content, making them highly accurate in detecting deforestation
as well as in differentiating age classes of forest (primary vs. secondary). Additionally,
these vegetation indices can discriminate minimal vegetation in pastures or degraded
areas from bare soil, which
reduces cropland false positives (Bewernick, 2015). NDMI has been found to be
particularly useful in previous studies of herbaceous biomass in savanna ecosystems for
determining fire risk, again due to the SWIR band’s responsiveness to plant tissue
water content (Verbesselt et al. 2006). In an even earlier study, Wilson et
al. (2002) directly compared NDVI and NDMI in forest harvest type detection,
also using Landsat imagery, with the older method of comparing two images from
different dates at 2, 3, and 6-year intervals. Their research showed NDMI significantly
outperforming NDVI over all intervals in instances of obvious clearcutting, but
especially in smaller scale partial harvests, suggesting increased precision and accuracy.
This research indicates that the water absorption associated with the SWIR
region of the electromagnetic spectrum (used in NBR, NBR2 and NDMI) is more
sensitive to change than the chlorophyll absorption associated with the red or NIR band
used in other indices (NDVI, EVI, MSAVI). This notion is corroborated by other
research, such as that of Sims and Gamon (2003), in which direct comparisons
were made between vegetation indices based on water and chlorophyll absorption
features. They note that a remote sensor’s ability to penetrate deeply into a forest
canopy to acquire information is directly tied to the strength of the absorption at the
wavelengths used. This is why NDVI and other chlorophyll absorption based vegetation
indices cannot penetrate the forest canopy deeply: chlorophyll absorbs much more
strongly than water, particularly in forests with high leaf area indices. Essentially,
since light at chlorophyll absorption wavelengths is absorbed by the leaves higher up in
the canopy, the sensed signal stops short and less information is acquired from deeper
layers, whereas water absorption features can be detected more thoroughly throughout
the entirety of the canopy (Sims and Gamon, 2003). For example, if forest canopy is
removed and replaced with pasture or agricultural land, these could have similar
chlorophyll absorption features, but pasture or agricultural land could not replicate
the moisture absorption levels of a forest canopy. These differences in wavelength
absorption features are why NBR, NBR2 and NDMI are more sensitive to changes in forest
structure than NDVI, EVI and MSAVI.
The second study site in Guanacaste proved very difficult to validate in terms of
accuracy due to the lack of an adequate validation data set. This lack of data is why I
chose not to fully include the second site in evaluating the overall accuracy of the
method.
Regarding the ‘near real-time’ assessment, after analyzing all newly detected
breaks, 76 points out of 138 were determined to be actual deforestation, yielding 55%
accuracy, with false positives accounting for the remaining 45% of breaks detected
(Figure 21). Many of the
false positive breaks seemed to be associated with disturbance events such as pasture
clearing (Figure 22). However, many of the points selected did not represent the most
negative magnitude values, so accuracy could have been compromised by commission errors.
Figure 21. Evaluation of near real-time accuracy of BFAST. Points represent new
breakpoints detected in a 6-month window. Imagery shows ground truth data. The
accuracy for this assessment in Y2 was estimated at 55%.
Detection of deforestation events like this is significant considering that they
happened between 03/25/2015 and 09/06/2015, while the new breakpoint data were from
07/2015 to 12/2015, which suggests that the deforestation happened between
07/2015 and 09/06/2015. Some of the omission errors might be associated with the lack
of reference data beyond 09/2015.
4.3 Step towards a near real-time deforestation monitoring system in
Central America.
The high level of accuracy of the BFAST method and its potential for near
real-time application could have a significant impact on the way that land uses and
forests are managed in tropical dry forest areas. My results support the confidence
values reported by other authors in humid forests and other dry forest sites, and
contribute to the design of a near real-time system for the Guanacaste Conservation Area
and the Yucatan area. With the accurate results produced by ‘bfastSpatial’ combined with
the near real-time method described in this research, actionable results can be attained.
For example, if this method were applied every 3 to 6 months, a report could be produced
for a director or other conservation, land-use, or forest manager, which could then be
used to determine whether action needs to be taken on the ground. This would be
particularly useful in areas like Guanacaste, where deforestation is illegal. In areas
like Yucatan, this method could also prove useful in allowing better management of land
for milpa agricultural practices, whereby cropland is rotated and allowed to lie fallow
for periods of time.
Figure 22. Example of false positive detected breaks not associated with deforestation.
Some aspects need to be taken into consideration for a monitoring system based
on BFAST to work. First, local authorities and local scientists need to be trained in the
use of R software and the application of the BFAST family of algorithms. Local capacity
building is an important aspect, since users on the ground need to be aware of the
advantages and limitations of the method.
Furthermore, including more observations at the spatial
resolution of Landsat (30 m) or higher would greatly increase the accuracy of the method.
New methods are being developed to enable algorithms such as BFAST to incorporate
multiple data streams from Landsat, MODIS and Sentinel 1 & 2, to help fill gaps in the
time series. If, for example, the BFAST algorithm is applied to a time series
consisting of harmonized Landsat, Sentinel-2 and radar data, up to 10 observations per
month would be available for every pixel. This would greatly increase the
probability of having non-contaminated or cloud-free pixels and allow subsequent
BFAST runs after acquiring just 1 month or 3 months of data.
Filling the gaps with more imagery would, however, also increase
processing time. Using only Landsat, a single run at an ecoregional scale could
take about 12 hours on a single computer or 2 hours using parallel computing.
This could double or triple with the use of harmonized multi-sensor data.
Parallel computing or cloud computing through international collaborations could be
implemented to reduce processing time.
4.4 Sources of error and implications for BFAST implementation
There are various error issues related to the methodology described in this study.
The primary sources of error are the lack of data caused by interference from
cloud cover and the need for additional metrics derived from BFAST outputs and from
external sources. Cloud cover is the most significant factor in analyzing time series data
with BFAST, with study site G2 displaying this issue most prominently (see Figure 23
for examples of clouded images within the stack).
The G2 stack contained 224 images for the period 2000-2016, with each image
containing 108,240 pixels, for a total of 24,245,760 pixel observations. Of these,
21,294,806 were flagged NA (Not Available) by the cloud mask file, meaning
that 87.8% of the data were unusable and only 12.2% were used by BFAST.
Furthermore, this 12.2% of usable data was not necessarily consistent for each pixel over
time: a single pixel location may have a vegetation index value on one date but
be flagged NA on another, thus affecting the overall model that BFAST creates
for that pixel.
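The proportions quoted for the G2 stack can be re-derived from the reported counts. The short R sketch below does only that arithmetic; the variable names are illustrative, and the final quantity (average usable observations per pixel) is a derived figure that assumes the usable observations were spread evenly, which the text notes was not necessarily the case.

```r
# Counts reported for the G2 stack
n_images      <- 224        # images in the 2000-2016 stack
pixels_per_im <- 108240     # pixels per image
n_na          <- 21294806   # pixel observations flagged NA by the cloud mask

n_total  <- n_images * pixels_per_im   # total pixel observations in the stack
pct_na   <- 100 * n_na / n_total       # share masked out
pct_used <- 100 - pct_na               # share left for BFAST

# Average number of usable observations per pixel location (illustrative only)
obs_per_pixel <- (n_total - n_na) / pixels_per_im

round(c(pct_na = pct_na, pct_used = pct_used, obs_per_pixel = obs_per_pixel), 1)
```

On a real stack held in memory as an array, the same share could be obtained with `mean(is.na(a))`, where `a` is a hypothetical array of vegetation index values.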
Figure 23. Image A from 02/25/2005 and B from 10/11/2015 in G2 NDVI stack
showing lack of data due to cloud mask (and due to Landsat 7 Scan Line Corrector error
in Image A).
Additional data that could be used to reduce error in the
methodology are the statistical slope of segments between breakpoints, which is
derived from BFAST outputs, and elevation and topographical slope data, which
can be acquired from external sources. Murillo-Sandoval et al. (2017) utilized the slope
of each breakpoint segment to determine if actual deforestation had taken place. In their
study, the authors utilized BFAST without attention to a specific monitoring period, so
multiple breaks could be detected, but a similar method could be applied for a specific
monitoring period with one breakpoint. For example, the user can calculate the slope
between the start of the monitoring period and the break, and then the slope between
the break and the end of the monitoring period. If negative slope coefficients are
significant (α = 0.05), this could be considered potential deforestation or browning
(see Figure 24 for the Murillo-Sandoval et al. slope graph).
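The two-segment slope check described above can be sketched in base R. This is a minimal illustration on synthetic data, not the procedure of Murillo-Sandoval et al. (2017): the series, the break date `brk`, and the helper names `segment_slope` and `flag` are all assumptions made for the example, and an ordinary linear regression is used as a stand-in for the segment fit.

```r
set.seed(42)

# Hypothetical single-pixel vegetation index series over a monitoring period
# (decimal years); values are synthetic and only illustrate the test
t_dec <- seq(2013, 2016.95, by = 0.05)
brk   <- 2015.0                                   # assumed break date, e.g. from BFAST
vi    <- ifelse(t_dec < brk,
                0.80 - 0.005 * (t_dec - 2013),    # nearly flat before the break
                0.75 - 0.150 * (t_dec - brk)) +   # clear decline after the break
         rnorm(length(t_dec), sd = 0.02)

# Fit a linear trend to one segment and return its slope and p-value
segment_slope <- function(t, y) {
  fit <- summary(lm(y ~ t))$coefficients
  c(slope = fit["t", "Estimate"], p_value = fit["t", "Pr(>|t|)"])
}

pre  <- segment_slope(t_dec[t_dec <  brk], vi[t_dec <  brk])
post <- segment_slope(t_dec[t_dec >= brk], vi[t_dec >= brk])

# Flag a segment as potential deforestation or browning when its slope is
# negative and significant at alpha = 0.05
alpha <- 0.05
flag  <- function(s) unname(s["slope"] < 0 && s["p_value"] < alpha)

rbind(pre = pre, post = post)
c(pre = flag(pre), post = flag(post))
```

Applied per pixel, the post-break flag could serve as one additional filter on detected breakpoints, alongside the magnitude value.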
Figure 24. Example taken from Murillo-Sandoval et al. (2017). Three breakpoints
(dashed red lines) and four segments (black lines) identified over time series (blue
lines). The slope coefficients (β) are all significant (α = 0.05) and ρ represents p-values.
In addition to statistical slope data, externally sourced elevation and
topographical slope data can also be used to enhance deforestation detection accuracy.
Murillo-Sandoval et al. (2017) noted that areas of high elevation (> 2000 m
above sea level) not only experienced much higher levels of cloud cover (much like the
G2 site), but were also expected to have less anthropogenic disturbance due to forest
access difficulty. Singh et al. (2017) expanded upon this idea by utilizing slope
and Shuttle Radar Topography Mission elevation data (as well as other approachability
factors such as settlements, major roads, distance to forest edge, and water body
locations) to model deforestation with a multi-layer perceptron neural
network. By combining cloud cover, statistical slope, and topographical data that
influence access to forests, such as slope and elevation, a confidence product can be created.
This confidence product can be used to determine if a particular area is more or less
susceptible to deforestation, thus providing enhanced potential to filter out spurious
breakpoints.
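One possible form of such a confidence product is sketched below in R. Everything in it is hypothetical: the five example pixels, the equal weights of 0.25, the min-max rescaling, and the 0.5 cut-off are assumptions chosen only to show how the layers could be combined, and none of them has been calibrated or validated in this study.

```r
# Hypothetical per-pixel inputs for five detected breakpoints
px <- data.frame(
  clear_frac    = c(0.60, 0.10, 0.45, 0.05, 0.30),   # share of cloud-free observations
  neg_slope_sig = c(TRUE, FALSE, TRUE, FALSE, TRUE), # significant negative post-break slope
  elevation_m   = c(150, 2300, 400, 2100, 900),      # elevation, e.g. from SRTM
  terrain_deg   = c(3, 35, 8, 28, 15)                # topographical slope in degrees
)

# Rescale a variable to 0-1; invert = TRUE makes large raw values score low
rescale01 <- function(x, invert = FALSE) {
  s <- (x - min(x)) / (max(x) - min(x))
  if (invert) 1 - s else s
}

# Assumed, unvalidated equal weights: each component contributes 0.25.
# High, steep terrain is scored as less accessible and so less likely deforested.
px$confidence <- 0.25 * rescale01(px$clear_frac) +
                 0.25 * as.numeric(px$neg_slope_sig) +
                 0.25 * rescale01(px$elevation_m, invert = TRUE) +
                 0.25 * rescale01(px$terrain_deg, invert = TRUE)

# Breaks below an assumed cut-off are treated as likely spurious
px$likely_spurious <- px$confidence < 0.5
px
```

In practice the weights and cut-off would need to be fitted against validated reference samples for each site rather than set by hand.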
4.5 Implications for biodiversity and conservation
When considering the implementation of new technologies and methods such as
those described in this study, it is important to consider the end user and how this
methodology could be effectively and efficiently utilized in the field, especially
by those unfamiliar with the technology. First, it is necessary to
convey exactly how this research would contribute to the protection of biodiversity
and conservation efforts. For example, Simons-Legaard et al. (2016) utilized 16
Landsat images between 1973 and 2010 to create time-series forest disturbance maps
used for habitat monitoring and projections for the Canada lynx, a US federally
threatened species. The decline in vegetation density in the shared boreal and sub-boreal
forest habitat of the Canada lynx and the snowshoe hare (the primary food source for the
Canada lynx) corresponded to a decline in snowshoe hare population, thus negatively
impacting the population of Canada lynx. The researchers note that time-series data are
commonly used in mapping land-cover change, while regular non-time-series imagery
is used in developing species-habitat model predictors, but the two methods are rarely
combined. Comparable studies can be improved upon by employing methods
like BFAST on time-series data of a higher temporal resolution created from
full Landsat archives. Not only could initial disturbance maps be improved with the use
of more data and change detection algorithms like BFAST, but the wildlife habitat could also
be monitored in near real-time, adding depth to the research by
directly tracking changes at regular intervals over the course of the study. Determining
near real-time changes in habitat extent for a threatened species such as the Canada lynx
could further the potential for developing appropriate modeling parameters to more
accurately monitor and predict changes in habitat. Ultimately, understanding the
dynamic nature of habitat extent transformation can greatly assist in conservation efforts
to preserve unique and vulnerable biodiversity.
Although not directly addressed in this research, other types of
change detection could also be of value, specifically the analysis of detected
breakpoints with positive magnitudes. Originally these were hypothesized to represent regrowth
(especially at the most extreme positive magnitude classes), but the nature of regrowth
dynamics is complex. For example, in this research, very high positive magnitude
values did not necessarily correlate with forest regrowth, as true forest
regrowth takes place over years and would not register as an abrupt change in a
vegetation index. However, regrowth of some type was detected from bare soil to
grasslands or pastureland, which can be particularly useful in general near real-time
land-cover change mapping, for example in grassland management for fire
suppression. Moreover, while initial results suggested positive magnitudes
did not correlate with forest regrowth specifically, decreasing data gaps,
changing model parameters, and statistically analyzing supplementary BFAST outputs
combined with external data (as mentioned in the previous section) could perhaps yield
accurate forest regrowth results. More research needs to be conducted on
the analysis of positive magnitude breaks, as this was not the focus of this
particular study.
As the efficacy of this technology and methodology becomes more apparent, it
becomes increasingly important to understand how it can be practically used in the
field, particularly in areas where resources may be scarce. Regardless of the environment in
which the methodology described in this study is used, optimizing several of the
functions to make a simpler, more unified system is key. The downloading of imagery
via USGS ESPA (as well as other repositories) can be automated through a scripting
language such as R via a bulk ordering application programming interface
(API). The bulk ordering API would allow this more unified system to first download
the pre-processed images (reprojected, image extents cropped to study site, vegetation
indices, cloud mask, etc.), then stack and analyze time-series data to produce outputs.
With proper parameter selection and scripting to connect the disparate functions, inputs
can be acquired and then analyzed to produce outputs, which can be further
manipulated within the same system to produce more “end-user” (i.e., conservation
director) oriented outputs, such as the magnitude threshold map of probable deforestation
corresponding to the lowest magnitude class (additional error reduction techniques described
in the previous section could also be applied). With this map of probable deforestation,
areas of interest could be investigated directly or used to geotag waypoints for drone
monitoring. However, developing and operating this optimized system would require
significant resources, including sufficient computing power, a stable internet connection,
and at least one staff member (as well as a drone if trying to maximize efficiency). In
more financially distressed areas with fewer assets, this system would be infeasible without
the collaborative effort of conservation organizations and universities around
the world.
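The final step of such a pipeline, turning a magnitude layer into a map of probable deforestation, can be sketched in base R. The example below is only an illustration under stated assumptions: the magnitude layer is a synthetic matrix rather than a real raster, and the five quantile-based classes are a hypothetical class scheme, since appropriate magnitude ranges varied between study sites and indices.

```r
set.seed(1)

# Synthetic 10 x 10 "magnitude" layer; NA marks pixels without a detected break
magn <- matrix(rnorm(100, mean = 0, sd = 0.1), nrow = 10)
magn[sample(100, 40)] <- NA

# Assumption: five classes defined by quantiles of the break magnitudes
n_class <- 5
breaks  <- quantile(magn, probs = seq(0, 1, length.out = n_class + 1), na.rm = TRUE)
mag_cls <- matrix(cut(magn, breaks = breaks, labels = FALSE, include.lowest = TRUE),
                  nrow = nrow(magn))

# Lowest class = most negative magnitudes = probable deforestation
probable_defor <- !is.na(mag_cls) & mag_cls == 1

sum(probable_defor)                           # number of flagged pixels
which(probable_defor, arr.ind = TRUE)         # row/column positions to inspect
```

With a georeferenced raster in place of the matrix, the flagged positions could be exported as coordinates for field visits or drone waypoints.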
CHAPTER V
CONCLUSION
To conclude, this research is becoming increasingly necessary as
the rates of deforestation continue to climb and tropical dry forests,
especially primary forests, and their biodiversity continue to be under threat. The
overall objectives of this study were to evaluate the accuracy of the BFAST set of
algorithms to validate the findings of previous research, and to determine the potential
near real-time capabilities of BFAST. These objectives were met by
employing the BFAST parameters that yielded the most accurate results in previous
studies, most notably the first-order harmonic formula with an h value of 0.25 applied
over the NDMI and NBR/NBR2 vegetation indices. Vegetation indices that utilize the
shortwave infrared bands proved to be more sensitive to forest disturbance than
indices using the red and near infrared bands. Moderate to extreme negative magnitude
values were the output products that indicated a deforestation
event, with value ranges varying widely between study sites and regions. Cloud cover
contributed to the low accuracy achieved for the G2 site, which contrasted with the
rest of the sites. This site had more variable topography and a potentially higher
probability of atmospheric contamination of Landsat observations. The near real-time
monitoring objective was met with some initial success in that the method was able to
detect new breakpoints within a period of 6 months or less. Because of poor data
availability and spuriously detected breakpoints, the application of BFAST for shorter
time frames in near real-time (weeks) will only be possible through the use of multi-sensor
data and external data sources. Such a system would also improve detection
probability in mountainous areas. Overall, the
methodology of this study proved effective and accurate for detecting
deforestation at sub-annual temporal scales, and could be upscaled to ecoregional or
national scales using available Landsat data, making it a beneficial means of supporting
biodiversity conservation in the field.
LITERATURE CITED
Bewernick, T. 2015. Mapping post deforestation land use in the Brazilian Amazon
using remote sensing time series. Wageningen University.
Boer, M.M., C. Macfarlane, J. Norris, R.J. Sadler, J. Wallace, and P.F. Grierson.
2008. Mapping burned areas and burn severity patterns in SW Australian
eucalypt forest using remotely-sensed changes in leaf area index. Remote
Sensing of Environment 112:4358-4369.
Bohn, F.J. and A. Huth. 2017. The importance of forest structure to biodiversity-
productivity relationships. Royal Society Open Science 4:160521.
Boillat, S., F.M. Scarpa, J.P. Robson, I. Gasparri, T.M. Aide, A.P. Dutra Aguiar, L.O.
Anderson, M. Batistella, M. Gesteira Fonseca, C. Futemma, H.R. Grau, S.-L.
Mathez-Stiefel, J.P. Metzger, J.P.H. Balbaud Ometto, M.A. Pedlowski, S.G.
Perz, V. Robiglio, L. Soler, I. Vieira, and E.S. Brondizio. 2017. Land system
science in Latin America: challenges and perspectives. Current Opinion in
Environmental Sustainability 26-27:37-46.
Butchart, S.H.M., et al. 2010. Global Biodiversity: Indicators of Recent Declines.
Science 328:1164-1168.
Cohen WB, Healey SP, Yang Z, Stehman SV, Brewer CK, N G, Huang C, Kennedy
RE et al. 2017. How Similar Are Forest Disturbance Maps Derived from
Different Landsat Time Series Algorithms? Forests 8(4):98.
Congalton, R.G. (1991) A Review of Assessing the Accuracy of Classifications of
Remotely Sensed Data. Remote Sensing of Environment 37:35-46.
Davis J, Lopez-Carr D. 2014. Migration, remittances and smallholder decision-
making: implications for land use and livelihood change in Central America.
Land Use Policy 38: 319-329
DeClerck, F.A.J., Chazdon, R., Holl, K.D., Milder, J.C., Finegan, B., Martinez-
Salinas, A., Imbach, P., Canet, L., Ramos, Z., 2010. Biodiversity conservation
in human-modified landscapes of Mesoamerica: Past, present, and future.
Biological Conservation 143, 2301–2313.
DeVries, B., M. Decuyper, J. Verbesselt, A. Zeileis, M. Herold, and S. Joseph. 2015.
Tracking disturbance-regrowth dynamics in tropical forests using structural
change detection and Landsat time series. Remote Sensing of Environment
169:320-334.
Exelis (2017) Exelis Documentation center: Vegetation Indices. Available at:
https://www.harrisgeospatial.com/docs/vegetationindices.html
García-Frapolli, E., Ayala-Orozco, B., Bonilla-Moheno, M., Espadas-Manrique, C.,
Ramos-Fernández, G., 2007. Biodiversity conservation, traditional agriculture
and ecotourism: land cover/land use change projections for a natural protected
area in the northeastern Yucatan Peninsula, Mexico. Landscape and Urban
Planning 83, 137–153.
Hansen, M.C., P.V. Potapov, R. Moore, M. Hancher, S.A. Turubanova, A. Tyukavina,
D. Thau, S.V. Stehman, S.J. Goetz, T.R. Loveland, A. Kommareddy, A.
Egorov, L. Chini, C.O. Justice, and J.R.G. Townshend. 2013. High-
Resolution Global Maps of 21st-Century Forest Cover Change. Science
342:850-853.
Jensen, J.R. 2016. Introductory digital image processing: A remote sensing
perspective (4th edition). Pearson series in geographic information science.
Kennedy, R.E., Z. Yang, and W.B. Cohen. 2010. Detecting trends in forest
disturbance and recovery using yearly Landsat time series: 1. LandTrendr –
Temporal segmentation algorithms. Remote Sensing of Environment
114:2897-2910.
Miles, L., Newton, A. C., DeFries, R. S., Ravilious, C., May, I., Blyth, S., Kapos, V.
and Gordon, J. E., A global overview of the conservation status of tropical dry
forests. Journal of Biogeography, 2006, 33, 491-505.
Murillo-Sandoval, P.J., J. Van Den Hoek, and T. Hilker. 2017. Leveraging Multi-
Sensor Time Series Datasets to Map Short- and Long-Term Tropical Forest
Disturbances in the Colombian Andes. Remote Sens 9:179
Olson, D.M., E. Dinerstein, E.D. Wikramanayake, N.D. Burgess, G.V.N. Powell, E.C.
Underwood, J.A. D’Amico, I. Itoua, H.E. Strand, J.C. Morrison, C.J. Loucks,
T.F. Allnutt, T.H. Ricketts, Y. Kura, J.F. Lamoreux, W.W. Wettengel, P.
Hedao, and K.R. Kassem. 2001. Terrestrial Ecoregions of the World: A New
Map of Life on Earth. BioScience 51(11):933-938.
Portillo-Quintero, C.A. and G.A. Sanchez-Azofeifa. 2010. Extent and conservation
of tropical dry forests in the Americas. Biological Conservation 143:144-155.
Portillo-Quintero, C., A. Sanchez-Azofeifa, J. Calvo-Alvarado, M. Quesada, and
M.M. do Espirito Santo. 2014. The role of tropical dry forests for
biodiversity, carbon and water conservation in the neotropics: lessons learned
and opportunities for its sustainable management. Reg Environ Change
15:1039-1049.
Qi, J., A. Chehbouni, A.R. Huete, Y.H. Kerr, and S. Sorooshian. 1994. A modified
soil adjusted vegetation index. Remote Sensing of Environment 48:119-126.
Read, L. and D. Lawrence. 2003. Recovery of biomass following shifting cultivation
in dry tropical forests of the Yucatan. Ecological Applications 13(1):85-97.
Sanchez-Azofeifa, G. A., Quesada, M., Rodriguez, J. P., Nassar, J. M., Stoner, K. E.,
Castillo, A., Garvin, T., Zent, E. L., Calvo-Alvarado, J. C., Kalacska, M. E. R.,
Fajardo, L., Gamon, J. A. and Cuevas-Reyes, P., Research Priorities for
Neotropical Dry Forests. Biotropica, 2005, 37(4) 477-485.
Schultz, M., J. Verbesselt, V. Avitabile, C. Souza, and M. Herold. 2016. Error
Sources in Deforestation Detection Using BFAST Monitor on Landsat Time
Series Across Three Tropical Sites. Journal of Selected Topics in Applied
Earth Observations and Remote Sensing 9(8):3667-3679.
Simons-Legaard, E.M., D.J. Harrison, and K.R. Legaard. 2016. Habitat monitoring
and projections for Canada lynx: linking the Landsat archive with carnivore
occurrence and prey density. Journal of Applied Ecology 53:1260-1269.
Sims, D.A. and J.A. Gamon. 2003. Estimation of vegetation water content and
photosynthetic tissue area from spectral reflectance: a comparison of indices
based on liquid water and chlorophyll absorption features. Remote Sensing of
Environment 84:526-537.
Singh, S., C.S. Reddy, S.V. Pasha, K. Dutta, K.R.L. Saranya, and K.V. Satish. 2017.
Modeling the spatial dynamics of deforestation and fragmentation using Multi-
Layer Perceptron neural network and landscape fragmentation tool. Ecological
Engineering 99:543-551.
Valero, A; Schipper, J and Alnutt, T. (2017) Yucatán Dry Forests. World Wildlife
Fund. Available at https://www.worldwildlife.org/ecoregions/nt0235.
Accessed on September 2017.
Verbesselt, J., R. Hyndman, A. Zeileis, and D. Culvenor. 2010. Phenological change
detection while accounting for abrupt and gradual trends in satellite image time
series. Remote Sensing of Environment 114:2970-2980.
Verbesselt, J., B. Somers, J. van Aardt, I. Jonckheere, and P. Coppin. 2006.
Monitoring herbaceous biomass and water content with SPOT VEGETATION
time-series to improve fire risk assessment in savanna ecosystems. Remote
Sensing of Environment 101:399-414.
Vila, M., J. Vayreda, L. Comas, J. J. Ibáñez, T. Mata, and B. Obón. 2007. Species
richness and wood production: a positive association in Mediterranean forests.
Ecology Letters 10:241-250.
Wilson, E.H. and S.A. Sader. 2002. Detection of forest harvest type using multiple
dates of Landsat TM imagery. Remote Sensing of Environment 80:385-396.
Zhu, Z. and C.E. Woodcock. 2014. Continuous change detection and classification of
land cover using all available Landsat data. Remote Sensing of Environment
144:152-171.
APPENDICES
A. BFAST CODE IMPLEMENTATION IN RSTUDIO
**Note: it is advised to visit https://github.com/loicdtx/bfastSpatial prior to
implementation to ensure correct versions, that there are no issues, and to become
familiar with the algorithm. For a step by step tutorial visit
http://changemonitor-wur.github.io/talks/bfastSpatial-2016/bfastSpatial_Peru.html#(1) **
# install developer's version of bfastSpatial; if the package has been updated to
# accommodate the new Landsat Collection 1 data naming convention, there is no need for
# ref = 'develop'
devtools::install_github('loicdtx/bfastSpatial', ref = 'develop')
# set directory path
setwd('~/path_to_study_site_directory')
# set path for reading and saving files
path <- getwd()
# load bfastSpatial and set tmpdir
library(bfastSpatial)
tmpDir <- rasterOptions()$tmpdir
# inDir is the data directory under the working directory
inDir <- file.path(path, 'data')
# stepDir is where intermediary outputs are stored
stepDir <- file.path(inDir, 'datastep')
# directory for Landsat data
landsatDir <- file.path(stepDir, 'landsat')
# where individual VI layers are stored prior to being stacked; ndviDir, eviDir, etc. are
# subdirectories of stepDir
ndviDir <- file.path(stepDir, 'ndvi')
eviDir <- file.path(stepDir, 'evi')
msaviDir <- file.path(stepDir, 'msavi')
ndmiDir <- file.path(stepDir, 'ndmi')
nbrDir <- file.path(stepDir, 'nbr')
nbr2Dir <- file.path(stepDir, 'nbr2')
# outDir is where outputs are stored
outDir <- file.path(inDir, 'out')
# processLandsatBatch arguments vary due to the change in the USGS ESPA file naming
# convention. If using the developer's version of bfastSpatial, use the following to apply the
# cloud mask: keep = c(322, 386) applies to Landsat 8 data. Change to keep = c(66, 130)
# for Landsat 5-7 data
# script to unzip Landsat files, apply cloud mask, and calculate VI if not available
if (!file.exists(file.path(inDir, 'ndvi_stack.grd'))) {
  # unzip individual file, use the cloud mask, create ndvi if not available
  processLandsatBatch(x = landsatDir, outdir = ndviDir,
                      delete = TRUE, overwrite = TRUE, mask = 'pixel_qa', vi = 'ndvi',
                      keep = c(322, 386))
  # make temporal ndvi stack
  ndviStack <- timeStack(x = ndviDir, pattern = glob2rx('*.grd'),
                         filename = file.path(inDir, 'ndvi_stack.grd'),
                         datatype = 'INT2S')
} else {
  ndviStack <- brick(file.path(inDir, 'ndvi_stack.grd'))
}
# set ndviStack to x to prepare to run through bfmSpatial
x <- ndviStack
# define bfmSpatial with the parameters used in this research as defaults
bfmSpatial <- function(x, dates = NULL, pptype = 'irregular', start = 2013,
                       monend = NULL,
                       formula = response ~ harmon, order = 1, lag = NULL, slag = NULL,
                       history = c("all"), type = "OLS-MOSUM", h = 0.25, end = 10,
                       level = 0.05, mc.cores = 1,
                       returnLayers = c("breakpoint", "magnitude", "error",
                                        "history", "r.squared", "adj.r.squared",
                                        "coefficients"),
                       sensor = NULL, ...) {
  # populate date parameter with date data from Landsat scene info
  if(is.null(dates)) {
    if(is.null(getZ(x))) {
      if(!.isLandsatSceneID(x)){ # Check if dates can be extracted from layernames
        stop('A date vector must be supplied, either via the date argument, the z dimension of x or comprised in names(x)')
      } else {
        dates <- as.Date(getSceneinfo(names(x))$date)
      }
    } else {
      dates <- getZ(x)
    }
  }
  # optional: reformat sensor if needed
  # prepare for subsetting
  sensor <- c(sensor, "ETM+ SLC-on", "ETM+ SLC-off", "OLI")
  s <- getSceneinfo(names(x))
  s <- s[which(s$sensor %in% sensor), ]
  # determine length of coefficient vector
  # = intercept [+ trend] [+ harmoncos*order] [+ harmonsin*order]
  coef_len <- 1 # intercept
  modterms <- attr(terms(formula), "term.labels")
  if("trend" %in% modterms)
    coef_len <- coef_len + 1
  if("harmon" %in% modterms)
    coef_len <- coef_len + (order * 2) # sin and cos terms
  fun <- function(x) {
    # subset x by sensor
    if(!is.null(sensor))
      x <- x[which(s$sensor %in% sensor)]
    # convert to bfast ts
    ts <- bfastts(x, dates=dates, type=pptype)
    # optional: apply window() if monend is supplied
    if(!is.null(monend))
      ts <- window(ts, end=monend)
    # run bfastmonitor(), or assign NA if only NA's (ie. if a mask has been applied)
    if(!all(is.na(ts))){
      bfm <- try(bfastmonitor(data=ts, start=start,
                              formula=formula,
                              order=order, lag=lag, slag=slag,
                              history=history,
                              type=type, h=h,
                              end=end, level=level), silent=TRUE)
      # assign 1 to error and NA to all other fields if an error is encountered
      if(inherits(bfm, 'try-error')) {
        bkpt <- NA
        magn <- NA
        err <- 1
        history <- NA
        rsq <- NA
        adj_rsq <- NA
        coefficients <- rep(NA, coef_len)
      } else {
        bkpt <- bfm$breakpoint
        magn <- bfm$magnitude
        err <- NA
        history <- bfm$history[2] - bfm$history[1]
        rsq <- summary(bfm$model)$r.squared
        adj_rsq <- summary(bfm$model)$adj.r.squared
        coefficients <- coef(bfm$model)
      }
    } else {
      bkpt <- NA
      magn <- NA
      err <- NA
      history <- NA
      rsq <- NA
      adj_rsq <- NA
      coefficients <- rep(NA, coef_len)
    }
    res <- c(bkpt, magn, err, history, rsq, adj_rsq)
    names(res) <- c("breakpoint", "magnitude", "error", "history", "r.squared",
                    "adj.r.squared")
    res <- res[which(names(res) %in% returnLayers)]
    if("coefficients" %in% returnLayers)
      res <- c(res, coefficients)
    return(res)
  }
  out <- mc.calc(x=x, fun=fun, mc.cores=mc.cores, ...)
  return(out)
}
# run bfmSpatial on x (the NDVI stack), then view the output brick
out <- bfmSpatial(x)
out
# extract change raster
change <- raster(out, 1)
# create month product
months <- changeMonth(change)
# set up labels and colormap for months
monthlabs <- c("jan", "feb", "mar", "apr", "may", "jun",
"jul", "aug", "sep", "oct", "nov", "dec")
cols <- rainbow(12)
# extract magnitude of the raster and scale values between 0 – 1.
magn <- raster(out, 2) / 10000
# make a version showing only breaking pixels
magn_bkp <- magn
magn_bkp[is.na(change)] <- NA
opar <- par(mfrow=c(1, 2))
# Write breakpoint, yearly break month product, and breakpoint magnitude raster
# layers to GeoTiff files as well as the raster brick to a .grd file.
writeRaster(out[[1]], filename = "Site1_NDVI_breaks.tif", format = "GTiff",
overwrite = TRUE)
writeRaster(months$changeMonth2013, filename = "Site1_NDVI_breaksmos13.tif",
format = "GTiff", overwrite = TRUE)
writeRaster(months$changeMonth2014, filename = "Site1_NDVI_breaksmos14.tif",
format = "GTiff", overwrite = TRUE)
writeRaster(months$changeMonth2015, filename = "Site1_NDVI_breaksmos15.tif",
format = "GTiff", overwrite = TRUE)
writeRaster(months$changeMonth2016, filename = "Site1_NDVI_breaksmos16.tif",
format = "GTiff", overwrite = TRUE)
writeRaster(magn_bkp, filename = "Site1_NDVI_magbreaks.tif", format = "GTiff",
overwrite = TRUE)
writeRaster(out, filename = "data/out/out_NDVI.grd", overwrite = TRUE)
# Test breakpoints
plot(ndviStack[[80]], col = grey.colors(255), legend = F)
plot(out[[1]], add=TRUE)
# Test months product
plot(months, col=cols, breaks=c(1:12), legend=FALSE)
legend("bottomright", legend=monthlabs, cex=0.5, fill=cols, ncol=2)
# Test magnitudes
plot(magn_bkp, main="Magnitude of a breakpoint")
plot(magn, main="Magnitude: all pixels")
B. ERROR MATRICES
Yucatan Area 1
NDMI
D S Total UA
D 52 9 61 85.25
S 12 45 57 78.95
Total 64 54 118
PA 81.25 83.33333333 OA 82.20338983
NBR/NBR2
D S Total UA
D 50 9 59 84.75
S 14 45 59 76.27
Total 64 54 118
PA 78.125 83.33333333 OA 80.50847458
EVI
D S Total UA
D 28 2 30 93.33
S 36 52 88 59.09
Total 64 54 118
PA 43.75 96.2962963 OA 67.79661017
NDVI
D S Total UA
D 35 4 39 89.74
S 29 50 79 63.29
Total 64 54 118
PA 54.6875 92.59259259 OA 72.03389831
MSAVI
D S Total UA
D 19 2 21 90.48
S 45 52 97 53.61
Total 64 54 118
PA 29.6875 96.2962963 OA 60.16949153
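The UA (user's accuracy), PA (producer's accuracy), and OA (overall accuracy) values in these matrices can be reproduced from the cell counts. The R sketch below uses the Yucatan Area 1 NDMI matrix as a worked example; the reading of rows as mapped classes and columns as reference classes is an assumption inferred from the arithmetic, since UA is computed along rows and PA along columns.

```r
# Yucatan Area 1, NDMI: rows = mapped class, columns = reference class (assumed)
cm <- matrix(c(52,  9,
               12, 45),
             nrow = 2, byrow = TRUE,
             dimnames = list(mapped = c("D", "S"), reference = c("D", "S")))

users_acc     <- 100 * diag(cm) / rowSums(cm)    # UA: correct / row total
producers_acc <- 100 * diag(cm) / colSums(cm)    # PA: correct / column total
overall_acc   <- 100 * sum(diag(cm)) / sum(cm)   # OA: correct / all samples

round(users_acc, 2)
round(producers_acc, 2)
round(overall_acc, 2)
```

The same three lines apply unchanged to any of the other matrices in this appendix once its four cell counts are entered in `cm`.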
Yucatan Area 2
NDMI
D S Total UA
D 23 4 27 85.19
S 9 28 37 75.68
Total 32 32 64
PA 71.875 87.5 OA 79.6875
NBR/NBR2
D S Total UA
D 28 3 31 90.32
S 4 29 33 87.88
Total 32 32 64
PA 87.5 90.625 OA 89.0625
EVI
D S Total UA
D 8 2 10 80.00
S 24 30 54 55.56
Total 32 32 64
PA 25 93.75 OA 59.375
NDVI
D S Total UA
D 19 8 27 70.37
S 13 24 37 64.86
Total 32 32 64
PA 59.375 75 OA 67.1875
MSAVI
D S Total UA
D 11 5 16 68.75
S 21 27 48 56.25
Total 32 32 64
PA 34.375 84.375 OA 59.375
Yucatan Overall
NDMI
D S Total UA
D 75 13 88 85.23
S 21 73 94 77.66
Total 96 86 182
PA 78.125 84.88372093 OA 81.31868132
NBR/NBR2
D S Total UA
D 78 12 90 86.67
S 18 74 92 80.43
Total 96 86 182
PA 81.25 86.04651163 OA 83.51648352
EVI
D S Total UA
D 36 4 40 90.00
S 60 82 142 57.75
Total 96 86 182
PA 37.5 95.34883721 OA 64.83516484
NDVI
D S Total UA
D 54 12 66 81.82
S 42 74 116 63.79
Total 96 86 182
PA 56.25 86.04651163 OA 70.32967033
MSAVI
D S Total UA
D 30 7 37 81.08
S 66 79 145 54.48
Total 96 86 182
PA 31.25 91.86046512 OA 59.89010989
Guanacaste Area 1
NDMI
D S Total UA
D 16 8 24 66.67
S 5 42 47 89.36
Total 21 50 71
PA 76.19047619 84 OA 81.69014085
NBR/NBR2
D S Total UA
D 15 6 21 71.43
S 6 44 50 88.00
Total 21 50 71
PA 71.42857143 88 OA 83.09859155
EVI
D S Total UA
D 15 11 26 57.69
S 6 39 45 86.67
Total 21 50 71
PA 71.42857143 78 OA 76.05633803
NDVI
D S Total UA
D 15 8 23 65.22
S 6 42 48 87.50
Total 21 50 71
PA 71.42857143 84 OA 80.28169014
MSAVI
D S Total UA
D 14 11 25 56.00
S 7 39 46 84.78
Total 21 50 71
PA 66.66666667 78 OA 74.64788732
Guanacaste Area 2
NDMI
D S Total UA
D 19 8 27 70.37
S 51 42 93 45.16
Total 70 50 120
PA 27.14285714 84 OA 50.83333333
NBR/NBR2
D S Total UA
D 19 4 23 82.61
S 51 46 97 47.42
Total 70 50 120
PA 27.14285714 92 OA 54.16666667
EVI
D S Total UA
D 17 2 19 89.47
S 53 48 101 47.52
Total 70 50 120
PA 24.28571429 96 OA 54.16666667
NDVI
D S Total UA
D 9 3 12 75.00
S 61 47 108 43.52
Total 70 50 120
PA 12.85714286 94 OA 46.66666667
MSAVI
D S Total UA
D 14 2 16 87.50
S 56 48 104 46.15
Total 70 50 120
PA 20 96 OA 51.66666667
Guanacaste Overall
NDMI
D S Total UA
D 35 16 51 68.63
S 56 84 140 60.00
Total 91 100 191
PA 38.46153846 84 OA 62.30366492
NBR/NBR2
D S Total UA
D 34 10 44 77.27
S 57 90 147 61.22
Total 91 100 191
PA 37.36263736 90 OA 64.92146597
EVI
D S Total UA
D 32 13 45 71.11
S 59 87 146 59.59
Total 91 100 191
PA 35.16483516 87 OA 62.30366492
NDVI
D S Total UA
D 24 11 35 68.57
S 67 89 156 57.05
Total 91 100 191
PA 26.37362637 89 OA 59.16230366
MSAVI
D S Total UA
D 28 13 41 68.29
S 63 87 150 58.00
Total 91 100 191
PA 30.76923077 87 OA 60.20942408
All sites overall
NDMI
D S Total UA
D 110 29 139 79.14
S 77 157 234 67.09
Total 187 186 373
PA 58.82352941 84.40860215 OA 71.58176944
NBR/NBR2
D S Total UA
D 112 22 134 83.58
S 75 164 239 68.62
Total 187 186 373
PA 59.89304813 88.17204301 OA 73.99463807
EVI
D S Total UA
D 68 17 85 80.00
S 119 169 288 58.68
Total 187 186 373
PA 36.36363636 90.86021505 OA 63.53887399
NDVI
D S Total UA
D 78 23 101 77.23
S 109 163 272 59.93
Total 187 186 373
PA 41.71122995 87.6344086 OA 64.61126005
MSAVI
D S Total UA
D 58 20 78 74.36
S 129 166 295 56.27
Total 187 186 373
PA 31.01604278 89.24731183 OA 60.0536193
All sites without Guanacaste site 2
NDMI
D S Total UA
D 91 21 112 81.25
S 26 115 141 81.56
Total 117 136 253
PA 77.77777778 84.55882353 OA 81.4229249
NBR/NBR2
D S Total UA
D 93 18 111 83.78
S 24 118 142 83.10
Total 117 136 253
PA 79.48717949 86.76470588 OA 83.39920949
EVI
D S Total UA
D 51 15 66 77.27
S 66 121 187 64.71
Total 117 136 253
PA 43.58974359 88.97058824 OA 67.98418972
NDVI
D S Total UA
D 69 20 89 77.53
S 48 116 164 70.73
Total 117 136 253
PA 58.97435897 85.29411765 OA 73.12252964
MSAVI
D S Total UA
D 44 18 62 70.97
S 73 118 191 61.78
Total 117 136 253
PA 37.60683761 86.76470588 OA 64.03162055