MSc Artificial Intelligence
Track: Machine learning
Master Thesis
Quantifying the predictive value of soil moisture for vegetation growth using
neural networks
by
Robert Leenders
10811548
42 ECTS
April 2016 – September 2016
Supervisor:
Dr R de Jeu
Assessor:
Dr M Welling
Machine Learning Group
University of Amsterdam
Abstract
Soil moisture is a crucial constraint for vegetation growth and therefore has
potential predictive value. However, the strength of this predictive value is
still largely unknown. This thesis quantifies the predictive value of soil
moisture for vegetation growth. New methods are introduced to predict vegetation
growth using satellite-based soil moisture observations. These methods are based
on neural networks and are evaluated over mainland Australia. Analysis of the
predictions of our 3-layer neural network revealed that (a) soil moisture provides
a strong predictive value for vegetation, (b) soil moisture can be used to reliably
predict vegetation up to two months in advance, and (c) soil moisture has a strong
local spatial relation with vegetation. The accuracy of the vegetation predictions
depends on the magnitude of soil moisture: the quality of the prediction is higher
in dry regions than in wet areas.
Contents
1 Introduction
2 Background
2.1 Data
2.1.1 Soil moisture
2.1.2 NDVI
2.1.3 Preprocessing
2.2 Machine learning models
2.2.1 Neural Networks
2.2.2 Bayesian Neural Networks
3 Predicting NDVI
3.1 Using neural networks
3.2 Analyzing the performance
3.2.1 Performance on different areas
3.2.2 Performance on different time periods
3.2.3 Why wetness decreases performance
3.2.4 Adaptability on anomalies
3.3 Predicting further into the future
3.4 Locally connected methods
4 Predicting NDVI with uncertainty
4.1 Bayesian neural networks
4.2 Analyzing the uncertainty
4.2.1 Uncertainty in different areas
4.2.2 Uncertainty in different time periods
5 Conclusion
Bibliography
Chapter 1
Introduction
Vegetation is the assemblage of plant species and their ground cover. It plays an
important role in our ecosystem, where it regulates various biogeochemical cycles
such as the water, carbon, and nitrogen cycles. It converts carbon dioxide to oxygen,
converts solar energy into biomass, is the basis of all food chains, and provides
wildlife habitat and food.
Understanding the impact of climate change on vegetation dynamics is crucial in
understanding ecosystem dynamics. This is also the reason why vegetation dynamics
are observed for analyzing climate change. Besides the importance of vegetation for
our ecosystem, vegetation is being used in a wide range of important problems, most
notably in forecasting and monitoring. Examples of such problems are climate change
monitoring [Bounoua et al., 2000], agricultural productivity (crop yield [Teal et al.,
2006]), drought monitoring [Peters et al., 2002], and forest fire detection [Illera et al.,
1996].
The effect of climate change on vegetation dynamics is complex and is influenced
by a wide range of climatic constraints. The three strongest climatic constraints
are water availability, solar radiation, and temperature [Stephenson, 1990,
Churkina and Running, 1998, Nemani et al., 2003]. The impact of these three
components on vegetation is relatively well studied, with water availability being
the least well studied [Lotsch et al., 2003, Mercado et al., 2009]. This is peculiar
as more than half of the world's ecosystems are substantially limited by the
availability of water [Heimann and Reichstein, 2008]. A decrease in water
availability reduces the ability of vegetation to convert carbon dioxide to oxygen
due to a restriction in stomatal conductance and a limited availability of root
water [van der Molen et al., 2011].
The climatic constraint of water availability on vegetation consists mainly of
precipitation and soil moisture, with soil moisture being more strongly related to
plant growth dynamics than precipitation. There are three important factors that
make soil moisture crucial to plants. First, it provides water and nutrients to the
plants, allowing them to grow. Second, it creates a buffer and ensures water
availability to plants, even in the absence of precipitation. Finally, it enhances
the soil chemical processes, which aids the availability of macro-nutrients such as
nitrogen. Besides the influence on vegetation, soil moisture is also of fundamental
importance to many other hydrological and biological processes.
Observed soil moisture is arguably the key variable for modulating the complex
dynamics of the climate-soil-vegetation system and controlling the spatial and tempo-
ral patterns of vegetation [Porporato and Rodriguez-Iturbe, 2002]. However, instead
of using soil moisture observations to study the relation between vegetation dynamics
and water availability, often proxies are used such as model based soil moisture and
drought indices [Hirschi et al., 2011, Lotsch et al., 2003]. Near surface soil moisture
can be accurately observed at a regional and global scale using passive and active mi-
crowave sensing instruments [Owe et al., 2008, Liu et al., 2012, 2011, Miralles et al.,
2010]. The combination of passive and active observations gives a robust observed
satellite based soil moisture product [De Jeu et al., 2008, Dorigo et al., 2010].
Satellite observed soil moisture has been used to show a strong positive relation
between soil moisture and vegetation at large spatial and long-term temporal scales
over mainland Australia [Chen et al., 2014], with dry regions that have low vegetation
density being more sensitive to soil moisture and with vegetation lagging about one
month behind soil moisture. However, the details of the relationship between soil
moisture and vegetation are not yet clear.
The main objective of this thesis is to quantify the predictive value of
satellite-based soil moisture for vegetation by forecasting vegetation maps. To
forecast vegetation maps, powerful machine learning techniques are used to model
the relationship between satellite-based soil moisture and vegetation. The
predictive value of satellite-based soil moisture for vegetation will be analyzed
for different spatial and temporal regions, and for different soil moisture levels.
Additionally, it will be analyzed how far into the future soil moisture has
predictive value.
Long-term satellite soil moisture data from ESA CCI [Liu et al., 2011] and satellite
vegetation proxies as described by the normalized difference vegetation index (NDVI)
[Rouse, 1973] are used. Neural networks are used as our machine learning model to
forecast vegetation maps. The neural networks will take satellite soil moisture as
input and produce NDVI maps as output. Neural networks are a powerful set of
models that can model complex non-linear relationships between input and output.
A deep neural network is a neural network which is composed of multiple hidden
layers. By stacking layers, which represent linear and non-linear transformations,
deep neural networks can learn increasingly complex abstractions of the data. Deep
neural networks have become very popular over the last couple of years, especially
under the term deep learning [Hinton et al., 2012, Collobert and Weston, 2008, LeCun
et al., 2015].
To quantify the predictive value of soil moisture for vegetation, the accuracy of
the neural networks is analyzed. The analysis will be done over different spatial
regions as well as different temporal regions. To quantify how far into the future
soil moisture has predictive value for vegetation, a lag period between input and
output samples is introduced. The accuracy of the models with different lag periods
is then analyzed. To quantify the spatial relation between soil moisture and
vegetation, locally connected neural networks are used, and their accuracies are
analyzed. Finally, a measure of uncertainty is introduced to the models, thereby
introducing another way to possibly quantify the predictive value of soil moisture
for vegetation. Having a measure of uncertainty is also useful for the practical
applicability of our models.
Several studies have set up methodologies to predict vegetation (NDVI). However,
none of them used soil moisture as input. Indeje et al. [2006] predicted NDVI in
Kenya using the seasonal rainfall from global climate models (GCM). It is assumed
that climate variability, especially precipitation, drives variability in NDVI. The
authors apply a correction to the GCM output using the model output statistics (MOS)
approach, and then predict NDVI using a combination of empirical orthogonal
function (EOF), singular value decomposition (SVD), or canonical correlation
analysis (CCA), and multiple linear regression. They report that NDVI can be
skillfully predicted (with ≥ 0.6 correlation); however, they do not report any error
characterization such as the mean squared error (MSE).
Jiang et al. [2016] studied the spatiotemporal variability and predictability of
NDVI in Alberta, Canada. They showed that vegetation in southern Alberta is
predominantly driven by precipitation. Instead of predicting NDVI, they predict
smoothed NDVI (sNDVI). The authors use a linear regression model and an artificial
neural network model calibrated by a genetic algorithm (ANN-GA) to predict sNDVI.
Similar to our findings, they found that the non-linear model (ANN-GA) performed
better than the linear model. This study takes a similar approach, but with a direct
focus on soil moisture and using more advanced neural networks over Australia.
In this study the focus will be on both the influence of soil moisture on NDVI
(as already investigated by Chen et al. [2014]) and the predictive value of soil
moisture. This allows for a deeper analysis of the relationship between soil
moisture and vegetation and for a better look at the predictive value of soil
moisture for vegetation. The main contribution of this thesis, presented in
chapter 3, is the analysis and prediction of vegetation with high accuracy using
neural networks. The focus is on analyzing the predictability of vegetation in
different temporal and spatial regions. Also analyzed is the effect of introducing
a lag period between the soil moisture observations and the vegetation predictions.
Furthermore, the performance of a regular neural network and an ensemble of small
locally connected neural networks is compared. Finally, chapter 4 will focus on
improving the predictions by adding a measure of uncertainty to them. This is done
using a Bayesian approach, by replacing the neural network with a Bayesian neural
network based on the work of Louizos and Welling [2016].
Chapter 2
Background
2.1 Data
2.1.1 Soil moisture
The soil moisture dataset [Liu et al., 2012, 2011, Wagner et al., 2012] is provided
by the CCI project, which is part of the ESA programme on global monitoring of
essential climate variables. The dataset was retrieved from
http://www.esa-soilmoisture-cci.org/node/145 in April 2016. It provides surface soil
moisture maps at a 0.25° resolution from 1972 to 2014. It uses active as well as
passive microwave sensors and combines these two data streams into one final
dataset. Observations are available daily; however, not every area has a valid soil
moisture observation every day. In other words, the daily maps are incomplete. To
mitigate this issue, and to make the data consistent with the NDVI dataset, the
observations are averaged over the first fifteen days of a month and over the
remaining days of that month. This results in two soil moisture maps per month.
Figure 2.1 shows an example of a 15 day soil moisture map.
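The half-monthly averaging described above can be sketched as follows. This is a minimal illustration in Python with NumPy and not the code used in the thesis: the function names `masked_mean` and `half_month_means`, the use of NaN to mark missing observations, and the toy data are all assumptions.

```python
import numpy as np

def masked_mean(stack):
    """Mean over axis 0 that ignores NaN; NaN where a cell was never observed."""
    valid = ~np.isnan(stack)
    total = np.where(valid, stack, 0.0).sum(axis=0)
    count = valid.sum(axis=0)
    # Divide only where at least one valid observation exists.
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)

def half_month_means(daily, day_of_month):
    """Split daily maps (n_days, H, W) of one month into days 1-15 and the rest."""
    day_of_month = np.asarray(day_of_month)
    first = masked_mean(daily[day_of_month <= 15])
    second = masked_mean(daily[day_of_month > 15])
    return first, second

# Toy month with three daily 1x2 maps; the second cell is missing on two days.
daily = np.array([[[1.0, np.nan]], [[3.0, np.nan]], [[5.0, 7.0]]])
first, second = half_month_means(daily, [1, 10, 20])
```

In this toy example the second cell has no valid observation in the first half of the month, so it stays missing in the averaged map, which mirrors the white areas that remain in the averaged soil moisture maps.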
2.1.2 NDVI
To quantify vegetation the normalized difference vegetation index (NDVI) [Rouse,
1973] is used. NDVI is an index that captures the amount of live green vegetation
or photosynthetic activity in an area and was first introduced in 1973 by Rouse et
al. It is a popular index that has found a wide range of applications in areas such as
vegetation dynamics, biomass production, and crop yield prediction.
Figure 2.1: Example of a soil moisture map. White indicates no soil moisture
information is available for that area, blue indicates dry areas, and red indicates
wet areas. Even when averaged there are still areas without data (e.g. the white
areas in South America).
NDVI uses visible light and near infrared light to distinguish between healthy
and unhealthy vegetation. It uses the concept that, in general, healthy vegetation
absorbs most of the visible light while it reflects more of the near infrared light.
In contrast, unhealthy vegetation reflects more visible light and less near infrared
light. This leads to the following fraction:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}$$
where NIR is the near infrared reflectance value for a cell and RED is the red
reflectance value for that cell. The near infrared reflectance and red reflectance values
for cells are captured using satellite instruments. In general NDVI values range from
-1 to +1 with larger values indicating more vegetation.
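As a small worked example of the NDVI fraction, the sketch below computes the index per cell with NumPy. The function name `ndvi` and the choice to return NaN where both reflectances are zero are assumptions made for illustration.

```python
import numpy as np

def ndvi(nir, red):
    """Per-cell NDVI = (NIR - RED) / (NIR + RED); NaN where NIR + RED is zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    safe = np.where(denom == 0, 1.0, denom)  # avoid division by zero
    return np.where(denom == 0, np.nan, (nir - red) / safe)

# A cell reflecting much more near infrared than red light scores high.
dense = ndvi(0.5, 0.1)   # (0.5 - 0.1) / (0.5 + 0.1)
bare = ndvi(0.2, 0.2)    # equal reflectance gives zero
```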
NDVI is not the only index that measures live green vegetation. Other indices
such as the soil-adjusted vegetation index (SAVI) or the enhanced vegetation index
(EVI) also try to measure live green vegetation. For this study NDVI is chosen due
to its wide recognition within the science community.
The NDVI data is obtained from the GIMMS AVHRR Global NDVI dataset
[Pinzon and Tucker, 2014]. The dataset was retrieved from
https://ecocast.arc.nasa.gov/data/pub/gimms/3g.v0/ in April 2016. The dataset is
assembled from a collection of observations from NOAA's Advanced Very High
Resolution Radiometers. It provides twice-monthly observations at a 1/8° resolution
from 1981 to 2014. To avoid resolution incompatibility with the soil moisture data,
the NDVI dataset is resampled to the coarser 0.25° resolution of the soil moisture
dataset. Figure 2.2 shows an example of a 15 day NDVI map.
Figure 2.2: Example of an NDVI map. Blue indicates an NDVI value of -1 and red
indicates an NDVI value of +1.
2.1.3 Preprocessing
Before feeding the data to the models, three preprocessing transformations were
performed. The first transformation is the aggregation of 10 observations (2 per
month) into a single observation, effectively introducing a notion of history to all
our observations. This is motivated by the work of Chen et al. [2014], which shows
that a soil moisture observation influences the NDVI for up to the following 5
months. Performing this preprocessing step results in a small increase in
performance of 2-3%.
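One possible reading of this first transformation is sketched below. It assumes that "aggregation" means concatenating the 10 most recent maps into one input vector; the text does not state whether the observations are concatenated or combined in another way, so this is an assumption, as is the function name `add_history`.

```python
import numpy as np

def add_history(maps, n_history=10):
    """Turn maps (T, H, W) into samples holding the last n_history maps each.

    Returns an array of shape (T - n_history + 1, n_history * H * W).
    """
    n_steps = maps.shape[0]
    flat = maps.reshape(n_steps, -1)
    windows = [
        flat[t - n_history + 1 : t + 1].ravel()   # maps t-9 .. t, oldest first
        for t in range(n_history - 1, n_steps)
    ]
    return np.stack(windows)

# Toy series: 12 half-monthly maps of 2x2 cells.
maps = np.arange(12 * 2 * 2, dtype=float).reshape(12, 2, 2)
samples = add_history(maps)
```

Note that the first nine time steps have no complete history and therefore produce no sample in this sketch.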
The second transformation is normalizing the input data. This is commonly done
as it often leads to faster convergence and better local optima. To normalize the
input data the mean is subtracted from the input data and the result is divided by
the standard deviation of the input data.
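A minimal sketch of this normalization is given below. It assumes a single global mean and standard deviation computed on the training inputs and reused for other data; whether the thesis used global or per-cell statistics is not stated, and the function name `standardize` is hypothetical.

```python
import numpy as np

def standardize(train, other):
    """Subtract the training mean and divide by the training standard deviation."""
    mean = train.mean()
    std = train.std()
    return (train - mean) / std, (other - mean) / std

train = np.array([[1.0, 2.0], [3.0, 4.0]])
test = np.array([[2.5, 2.5]])
train_n, test_n = standardize(train, test)
```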
The third transformation is removing the seasonality so our neural network can
focus on learning anomalies. The data contains a strong seasonality, in other words,
vegetation maps of the same months (or adjacent months) look similar. Note that
every month is represented by two samples, such that there are 24 samples in a year.
Each sample then spans a period of roughly two weeks. To remove the seasonality
the mean is computed for each period and this mean is then subtracted from each
sample within that same period. An example of this transformation on Australia can
be seen in figure 2.3.
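The seasonality removal can be sketched as follows, assuming each sample carries an index from 0 to 23 identifying its half-month period. The function name `remove_seasonality`, the variable names, and the toy data are assumptions for illustration.

```python
import numpy as np

def remove_seasonality(maps, period_index, n_periods=24):
    """Subtract the per-period mean map from every sample of that period.

    maps: (T, H, W); period_index: (T,) integers in [0, n_periods).
    Every period must occur at least once in period_index.
    """
    period_index = np.asarray(period_index)
    climatology = np.stack(
        [maps[period_index == p].mean(axis=0) for p in range(n_periods)]
    )
    anomalies = maps - climatology[period_index]
    return anomalies, climatology

# Two toy years: the second year is uniformly 1.0 higher than the first.
steps = np.arange(48)
maps = (steps % 24 + steps // 24)[:, None, None] * np.ones((48, 2, 2))
anomalies, climatology = remove_seasonality(maps, steps % 24)
```

In this toy series the per-period mean lies halfway between the two years, so the first year shows a negative anomaly and the second year a positive one of the same size.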
Figure 2.3: The first map is the original output, the second map is the average of
all samples of that same time period, and the third map is the difference between
the first two maps. Orange indicates a difference of zero, red means an increase of
vegetation, blue means a decrease of vegetation.
This transformation is also applied on the input data. Note that this might remove
some important information, most importantly the scale. Consider two samples from
different months, that have completely different averages. It is then possible that after
applying this transformation two (input) samples have the same difference maps but
have completely different output maps.
To evaluate this transformation two neural networks were trained, one on a dataset
without this transformation ('non-anomaly') and one with this transformation applied
('anomaly'). Figure 2.4 shows the test error of both neural networks. The
non-anomaly neural network has an error of 0.002310 and the anomaly neural network
has an error of 0.002075, an improvement of about 10%. The differences are small but
the anomaly neural network outperforms the non-anomaly neural network consistently.
Therefore, this transformation will be used as an extra preprocessing step on the
input dataset.
2.2 Machine learning models
In this study a few machine learning models are used including ridge regression, neural
networks, and Bayesian neural networks. Familiarity with ridge regression is assumed
so that the next two sections can focus on giving an overview of neural networks and
Bayesian neural networks.
Figure 2.4: Mean squared error for original dataset and transformed dataset
(x-axis: time, 2009 to 2013; y-axis: error; series: non-anomaly and anomaly)
2.2.1 Neural Networks
To explain what a neural network is, it is important to first understand what a
perceptron is, which is a type of artificial neuron. The perceptron takes several
inputs x1, x2, . . . , xn and multiplies each input by a weight w1, w2, . . . , wn.
It then sums these products, and if the sum is larger than a certain threshold it
outputs a 1 and otherwise a 0. To be more precise:
$$\text{output} = \begin{cases} 0, & \text{if } \sum_{i=1}^{n} x_i w_i \le \text{threshold} \\ 1, & \text{if } \sum_{i=1}^{n} x_i w_i > \text{threshold} \end{cases} \tag{2.1}$$
By varying the weights and the threshold, the perceptron makes different decisions.
In the last few years, other artificial neurons have often been used instead of
perceptrons. They still use the same idea of weights, but they typically apply a
non-linearity such as the sigmoid function to the weighted sum instead of comparing
it to a threshold. By stacking these perceptrons a more powerful model called the
multilayer perceptron is obtained.
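To make the difference between the two kinds of neuron concrete, the sketch below implements a perceptron and a sigmoid neuron that share the same weights and threshold. The function names and the logical-AND example are assumptions chosen for illustration.

```python
import numpy as np

def perceptron(x, w, threshold):
    """Hard decision: 1 if the weighted sum exceeds the threshold, else 0."""
    return 1 if np.dot(x, w) > threshold else 0

def sigmoid_neuron(x, w, threshold):
    """Soft decision: sigmoid of the weighted sum minus the threshold."""
    return 1.0 / (1.0 + np.exp(-(np.dot(x, w) - threshold)))

# With these weights and threshold the perceptron computes a logical AND.
w = np.array([1.0, 1.0])
threshold = 1.5
and_table = [perceptron(np.array(x), w, threshold)
             for x in ([0, 0], [0, 1], [1, 0], [1, 1])]
soft_on = sigmoid_neuron(np.array([1, 1]), w, threshold)
soft_off = sigmoid_neuron(np.array([1, 0]), w, threshold)
```

The sigmoid neuron agrees with the perceptron about which side of the threshold an input falls on, but returns a graded value instead of a hard 0 or 1.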
One example of a multilayer perceptron is shown in figure 2.5. The first column
of nodes is usually the input, the second column is the first layer of perceptrons,
the third column is the second layer of perceptrons, and so on. By having multiple
layers of perceptrons, increasingly difficult decisions can be made. A multilayer
perceptron is a certain type of neural network, one where the artificial neurons are
perceptrons; however, as will become clear in the next section, it is possible to
have different kinds of neurons. A neuron is often called a (hidden) unit. A neural
network then has input units, hidden units (in the multilayer perceptron case, the
perceptrons), and output units. To be clear, a multilayer perceptron is a neural
network, but a neural network is not necessarily a multilayer perceptron.
Figure 2.5: An example of a multilayer perceptron (an input layer with three inputs,
hidden layer #1, hidden layer #2, and an output layer with a single output)
Equation 2.1 can be rewritten in a more general form:
$$\text{output} = f(w \cdot x + b) \tag{2.2}$$
where x and w are vectors of inputs and weights, and b is a bias term. The bias
term is simply the threshold moved to the left-hand side of the inequality, so that
b = −threshold. The function f defines what kind of artificial neuron it is. Given
the function:
$$f(x) = \begin{cases} 0, & \text{if } x \le 0 \\ 1, & \text{if } x > 0 \end{cases} \tag{2.3}$$
the neuron corresponds to a perceptron and equation 2.2 is equivalent to equation 2.1.
However, f can be any kind of function, such as a sigmoid, tanh, or rectified linear
function. It is important to note that f is often a non-linear function, as this
makes the neural network more powerful. Below, three non-linearities are
highlighted. Firstly, the sigmoid function, which squashes inputs to a value between
0 and 1, as can be seen in figure 2.6. The equation is as follows:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
Secondly, the tanh function, which is similar to the sigmoid function except that it
squashes inputs to a value between -1 and +1. It is plotted in figure 2.6, and the
equation is as follows:
$$\tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}$$
Finally, the rectified linear function; units using it are often called rectified
linear units or ReLUs. This function returns the input if it is larger than zero,
and otherwise returns zero. It is plotted in figure 2.7 and the equation is as
follows:
$$\mathrm{relu}(x) = \begin{cases} x, & \text{if } x > 0 \\ 0, & \text{if } x \le 0 \end{cases} \tag{2.4}$$
Figure 2.6: The sigmoid and tanh functions
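The three non-linearities can be written directly from their definitions. The sketch below uses NumPy and compares the tanh expression with NumPy's built-in `np.tanh`; the function names are chosen for illustration only.

```python
import numpy as np

def sigmoid(x):
    """Squashes inputs to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Squashes inputs to the interval (-1, 1), written as in the text."""
    return (1.0 - np.exp(-2.0 * x)) / (1.0 + np.exp(-2.0 * x))

def relu(x):
    """Returns the input where it is positive and zero elsewhere."""
    return np.maximum(0.0, x)

xs = np.linspace(-5.0, 5.0, 11)
sig_values = sigmoid(xs)
tanh_values = tanh(xs)
relu_values = relu(xs)
```

The two squashing functions are closely related: tanh(x) equals 2σ(2x) − 1, so tanh is a rescaled and shifted sigmoid.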
The output of a neural network could be a unit in which f is the identity function,
which is often used for regression problems. There are other possible options such
as a softmax, which is often used for classification problems. The problem
considered in this thesis has as many output units as there are pixels in the NDVI
map that is being predicted. Each output unit has to predict a real value between
-1 and 1 (the range of NDVI values), so for this problem f is set to the identity
function for the output units.
Figure 2.7: The ReLU function
By changing the weights of the neural network, the decisions made by the neural
network change, but how should one change these weights? Basically, one would like
to have an algorithm that changes the weights and the biases of the neural network
so that it outputs correct answers, based on some training data. To quantify how
correct an answer is, a cost function (or error measure) is defined. One example of
a cost function is the mean squared error. The learning algorithm then tries to
minimize this cost function by changing the weights and biases. One of the most
common learning algorithms is gradient descent, which computes the gradient of the
error with respect to the weights and biases and then updates them in the direction
of the negative gradient, so that the error decreases. Computing this gradient is
often done using backpropagation [Rumelhart et al., 1985].
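As a minimal illustration of gradient descent on a mean squared error cost, the sketch below fits a single linear unit to synthetic data. The data, the learning rate of 0.1, and the number of iterations are assumptions; for this one-layer model the gradient is written out by hand, so no backpropagation through hidden layers is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 samples, 3 inputs
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.3                          # noiseless targets, bias 0.3

w = np.zeros(3)
b = 0.0
learning_rate = 0.1

def mse(w, b):
    return np.mean((X @ w + b - y) ** 2)

initial_error = mse(w, b)
for _ in range(500):
    residual = X @ w + b - y
    grad_w = 2.0 * X.T @ residual / len(y)    # d(MSE)/dw
    grad_b = 2.0 * residual.mean()            # d(MSE)/db
    w -= learning_rate * grad_w               # step against the gradient
    b -= learning_rate * grad_b
final_error = mse(w, b)
```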
As gradient descent requires computation of the gradient over the complete dataset,
which is expensive, stochastic gradient descent (SGD) is often used. SGD is a
stochastic approximation of gradient descent that, instead of computing the gradient
over the complete dataset, computes the gradient over a subset of the dataset. SGD
is widely used but is inefficient when it comes to optimizing objectives that
contain other sources of noise than data subsampling. Adam [Kingma and Ba, 2014] is
a learning algorithm that tries to be efficient at optimizing these stochastic
objectives. The advantage of using Adam over SGD is that it is invariant to
rescaling of gradients and robust to noisy and sparse gradients while having little
memory and computational overhead. In this thesis Adam is used as the optimizer for
all our experiments.
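The Adam update rule of Kingma and Ba can be sketched in a few lines. The function `adam_step` below is a hand-written illustration rather than the optimizer implementation used for the experiments, and the quadratic toy objective f(x) = (x − 3)² and the learning rate of 0.1 are assumptions.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

def first_step(scale):
    """First update on scale * (x - 3)^2 starting from x = 0."""
    grad = scale * 2.0 * (0.0 - 3.0)
    new_x, _, _ = adam_step(0.0, grad, 0.0, 0.0, t=1, lr=0.1)
    return new_x

# Minimize f(x) = (x - 3)^2 from x = 0.
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2.0 * (x - 3.0)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.1)
```

The helper `first_step` illustrates the invariance to gradient rescaling: multiplying the objective by a constant leaves the size of the first update essentially unchanged, because the gradient is divided by the square root of its own second moment.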
Another crucial part of a neural network is its architecture. The architecture of
a neural network consists of layers, each containing a number of units. Usually, the
first layer is the input, the final layer is the output, and all layers in between are
hidden layers. For example, the neural network in figure 2.5 has 4 layers. The first
layer has 3 input units, the two following layers are hidden layers one with 4 units
and one with 5 units, and the final layer has a single output unit.
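The architecture in this example (3 inputs, hidden layers of 4 and 5 units, and 1 output) can be expressed as a forward pass. In the sketch below, the ReLU hidden units, the identity output unit, and the random initial weights are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [3, 4, 5, 1]   # input, hidden #1, hidden #2, output

# One weight matrix and one bias vector per pair of consecutive layers.
weights = [rng.normal(0.0, 0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Forward pass: ReLU hidden layers followed by an identity output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)
    return h @ weights[-1] + biases[-1]

batch = np.ones((2, 3))      # two samples with three inputs each
output = forward(batch)
```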
Finally, dropout [Srivastava et al., 2014] is briefly discussed. Dropout is a
regularization technique that randomly drops units during training. To be precise,
during training the output of randomly selected units is set to zero, while during
testing all outputs are scaled by some factor (this factor depends on the
probability that a unit drops). Usually the drop probability is set per layer. The
hope is that this prevents units from co-adapting too much. Dropout is a very simple
technique but surprisingly effective.
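The two modes of dropout described above can be sketched as follows. This follows the variant in the text, where outputs are scaled at test time (many libraries instead scale during training). The function names and the drop probability of 0.5 are assumptions.

```python
import numpy as np

def dropout_train(h, p_drop, rng):
    """Training: set each unit's output to zero with probability p_drop."""
    keep_mask = rng.random(h.shape) >= p_drop
    return h * keep_mask

def dropout_test(h, p_drop):
    """Testing: keep all units but scale outputs by the keep probability."""
    return h * (1.0 - p_drop)

rng = np.random.default_rng(0)
activations = np.ones(10000)
train_out = dropout_train(activations, 0.5, rng)
test_out = dropout_test(activations, 0.5)
```

Scaling by the keep probability at test time makes the expected output of each unit match its average output during training.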
2.2.2 Bayesian Neural Networks
A disadvantage of neural networks is that they do not provide any kind of
uncertainty measure with their output; in other words, they do not provide any
confidence intervals. Such a measure is especially important for problems where key
decisions are being made based on the output.
To introduce confidence levels to neural networks, Bayesian methods are applied
to them. In the Bayesian treatment of neural networks we marginalize over the
distribution of parameters in order to make a prediction. In other words, a
probability distribution is put over the weights w. As a neural network is highly
non-linear and complex, an exact Bayesian treatment is practically impossible.
Therefore, approximation methods are used to approximate the distribution. This
section focuses on a family of approximation methods called variational inference.
Another family of approximation methods are the Markov Chain Monte Carlo (MCMC)
methods. The advantage of variational inference methods over MCMC methods is that
variational inference methods do not require any sampling and hence are fast and
deterministic.
Variational inference is a family of methods that cast inference in a distribution
as an optimization problem. This is done by minimizing the Kullback-Leibler (KL)
divergence [Kullback and Leibler, 1951] between the approximate posterior and the
true posterior. In other words, the distribution p(y|x), which is too complicated
to evaluate directly, is approximated by a simpler distribution q(y). Usually this
simpler distribution q makes more independence assumptions than p. The problem
then becomes which simpler distribution q to select. This is done by defining a family
of distributions Q that are all simple enough to evaluate, then the q in Q that best
approximates p is selected (this is the optimization part). To evaluate how well q
approximates p the KL divergence is often used.
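For reference, in the notation used above, the KL divergence between the approximation q and the target p, and the resulting optimization problem, are commonly written as shown below. The symbol q* for the selected approximation is introduced here for illustration only.

$$\mathrm{KL}(q \,\|\, p) = \int q(y) \log \frac{q(y)}{p(y \mid x)} \, dy, \qquad q^{*} = \arg\min_{q \in Q} \mathrm{KL}(q \,\|\, p)$$

The KL divergence is non-negative and is zero only when q and p agree almost everywhere, which is why a smaller value indicates a better approximation.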
There exist many variational inference methods, such as loopy belief propagation,
mean-field approximation, and expectation propagation. Recent research in this
area, with a focus on applications in neural networks, includes work from [Graves,
2011, Hernandez-Lobato and Adams, 2015, Blundell et al., 2015, Kingma et al., 2015,
Louizos and Welling, 2016]. This thesis will use the variational Bayesian neural
network method defined in this last paper, called VMG (Variational Matrix Gaussian).
All recent approaches mentioned above, besides the approach of Louizos and
Welling [2016], assume a fully factorized posterior distribution over the neural
network weights. In other words, they treat each weight of the weight matrix
independently. In contrast, Louizos and Welling [2016] treat the whole weight matrix
as a single random variable using a matrix variate Gaussian distribution [Gupta and
Nagar, 1999]. This leads to a reduction in the number of variance parameters to
estimate, better weight posterior uncertainty estimation, more information sharing
between weights, and an easier learning task.
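To illustrate why a matrix variate Gaussian needs fewer variance parameters, the sketch below samples a weight matrix whose row and column covariances are both diagonal. The diagonal restriction, the function name `sample_matrix_gaussian`, and the toy sizes are assumptions for illustration; this is not the VMG implementation.

```python
import numpy as np

def sample_matrix_gaussian(M, u_diag, v_diag, n_samples, rng):
    """Draw samples W = M + sqrt(U) E sqrt(V) with diagonal U and V.

    u_diag: row variances (n_in,); v_diag: column variances (n_out,).
    The variance of entry (i, j) is then u_diag[i] * v_diag[j].
    """
    E = rng.standard_normal((n_samples,) + M.shape)
    scale = np.sqrt(u_diag)[:, None] * np.sqrt(v_diag)[None, :]
    return M + scale * E

rng = np.random.default_rng(0)
n_in, n_out = 3, 2
M = np.zeros((n_in, n_out))
u_diag = np.array([1.0, 4.0, 9.0])
v_diag = np.array([1.0, 0.25])
samples = sample_matrix_gaussian(M, u_diag, v_diag, 20000, rng)

# Number of variance parameters under each assumption.
factorized_params = n_in * n_out    # one variance per weight
matrix_params = n_in + n_out        # one per row plus one per column
```

For a layer with many units the difference is large: the fully factorized posterior grows with the product of the layer sizes, while this diagonal matrix variate form grows with their sum.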
Chapter 3
Predicting NDVI
In this chapter NDVI is predicted using soil moisture. The performance of different
neural network architectures is analyzed, as well as the influence of different areas and
time periods on performance. Afterwards a lag period is introduced between input
and prediction to quantify how far into the future soil moisture has predictive value.
Finally, the performance and benefits of using ensembles of locally connected neural
networks over a single large neural network are discussed.
All our models take a map with soil moisture levels as input, and produce a map
with vegetation levels as output. To represent vegetation the normalized difference
vegetation index (NDVI) is used; throughout this chapter vegetation and NDVI are
used somewhat interchangeably. Both the CCI Soil Moisture dataset and the GIMMS NDVI
dataset provide maps that cover the entire world. Many areas are simply not
interesting to look at since water is always readily available, and so soil moisture
has a small impact on vegetation. Furthermore, predicting vegetation for the entire
world instead of a single country is more expensive computationally. Hence, in the
experiments only a single country will be considered, namely Australia. Australia
was chosen, and not for example South Africa, because it is a well-studied area and
because the data available for Australia is of high quality. To focus on Australia,
image patches of 140 by 180 pixels containing just Australia were extracted and used
as new input and output datasets.
Figure 3.1 shows a few examples of input, output, and prediction maps (predictions
were done by our best performing model). Visually the predictions look similar to
the expected output, with areas bordering water looking more similar, while a few
spots in the middle of Australia prove more difficult to predict.
Figure 3.1: Input, output and prediction for a few samples in the test set: (a) the
first month in the test set, (b) the sixth month in the test set, (c) the twelfth
month in the test set, and (d) the wettest (and worst) month in the test set. For
the first column blue represents dry areas, while red represents wet areas. For the
other two columns blue represents no vegetation, while red represents high
vegetation.
3.1 Using neural networks
In this thesis the focus is on using neural networks as a prediction model; however,
it is helpful to establish a baseline using a few simpler models. This is done by
applying two linear models to the problem: linear regression and ridge regression.
The linear regression model has no hyperparameters and the ridge regression model
uses a weight penalty of α = 100. Figure 3.2 shows that the ridge regression model
performs significantly better than linear regression, suggesting that overfitting is
a problem. The fact that ridge regression performs similarly to our best neural
network suggests that the relation between soil moisture and NDVI might be of linear
nature.
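The effect of the weight penalty can be illustrated with the closed-form ridge solution on a small synthetic problem. The sketch below is not the baseline implementation used in the thesis: the function name `fit_ridge`, the absence of an intercept, and the toy data are assumptions, and the closed form is only practical here because the toy problem has few input features.

```python
import numpy as np

def fit_ridge(X, Y, alpha=100.0):
    """Closed-form ridge weights: (X^T X + alpha I)^(-1) X^T Y, no intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))          # 50 samples, 5 input cells
W_true = rng.normal(size=(5, 2))      # 2 output cells
Y = X @ W_true

W_linear = fit_ridge(X, Y, alpha=0.0)     # plain linear regression
W_ridge = fit_ridge(X, Y, alpha=100.0)    # weight penalty as in the text
```

Setting α to zero recovers ordinary linear regression, while α = 100 shrinks the weights towards zero, which is the mechanism that limits overfitting.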
A variety of neural network architectures were tested on our problem. The performance
of the most interesting ones, together with the performance of the two baseline models,
can be found in table 3.1. The neural network with the best performance has 2 hidden
layers, each with 2500 ReLU units, and uses heavy $\ell_1$ and $\ell_2$ regularization.
It performs about 39% better than the ridge regression model. Table 3.1 also contains
the performance of a model that simply predicts all zeros. In other words, it predicts
there are no anomalies, and so the predicted vegetation map is equal to the average of
all vegetation maps in the training data of that same time period. The best neural
network model performs about 20% better than this model.
The next few paragraphs elaborate on how these neural networks were initialized and
trained, how their architectures impacted performance, and which properties worked
best and why.
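The percentages quoted above can be checked directly against the errors in table 3.1. The short sketch below does so; the function name `relative_improvement` is an illustrative choice, and "better" is assumed to mean a relative reduction in mean squared error.

```python
def relative_improvement(baseline_error, model_error):
    """Fraction by which model_error is lower than baseline_error."""
    return (baseline_error - model_error) / baseline_error

# Mean squared errors copied from table 3.1.
all_zeros = 0.00234371
ridge = 0.00305129
best_nn = 0.00186817

vs_ridge = relative_improvement(ridge, best_nn)      # about 0.39
vs_zeros = relative_improvement(all_zeros, best_nn)  # about 0.20
```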
All neural network weights were initialized by drawing from a zero-mean normal
distribution with standard deviation 0.01, as described in Krizhevsky's "One Weird
Trick" paper [Krizhevsky et al., 2012]. All biases were initialized to a constant
value of 0.1.
Each neural network was optimized using the Adam [?] optimizer with a learning
rate of 0.0001. The Adam optimizer was initialized with the following parameters:
$\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$. The minibatch size was set to
24 at the beginning of training and was slowly increased to the size of the complete
training set. In total there are about 520 samples in the dataset, of which 390 (75%)
are used for training. The training dataset is small enough to allow for a full
non-stochastic gradient update.
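The initialization and optimizer settings above can be summarized in a small NumPy sketch. The update rule is the standard Adam update with the stated parameters. The linear batch size schedule is purely an assumption, since the text only says the batch size was slowly increased, and all function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Weights drawn from N(0, 0.01^2), biases set to the constant 0.1."""
    W = rng.normal(loc=0.0, scale=0.01, size=(n_in, n_out))
    b = np.full(n_out, 0.1)
    return W, b

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

def batch_size(epoch, n_epochs, n_train=390, start=24):
    """Hypothetical linear schedule from `start` up to the full training set."""
    frac = epoch / max(n_epochs - 1, 1)
    return int(round(start + frac * (n_train - start)))

W, b = init_layer(8, 4)
W_new, m, v = adam_step(W, np.ones_like(W), np.zeros_like(W), np.zeros_like(W), t=1)
```

Once the batch size reaches 390, every step uses the whole training set, which corresponds to the full non-stochastic gradient update mentioned above.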
So far no details have been given on what exactly is being optimized. The
output data (the vegetation maps) are essentially matrices of real values, and so any
matrix similarity measure might be used as an error measure. This work uses the
squared $\ell_2$ norm $\|A - B\|_2^2$ divided by the number of entries $n$, which is
simply the mean of the squared differences between two matrices:
$E(A, B) = \frac{1}{n} \sum_{i=1}^{n} (A_i - B_i)^2$.
[Line plot: x-axis Time (2009 to 2013); y-axis Error (0.000 to 0.025); series LR, Ridge, NN]
Figure 3.2: Time series with the mean squared error of linear regression, ridge
regression, and the best neural network, on a period from Aug 2008 to 2014.
Model               Parameters             Error
All zeros           -                      0.00234371
Linear regression   -                      0.0100653
Ridge regression    α = 100                0.00305129
Neural network      1x500 ReLU             0.00192698
Neural network      1x4000 ReLU            0.00191569
Neural network      2x500 ReLU             0.00190007
Neural network      2x1500 ReLU            0.00188281
Neural network      2x2500 ReLU            0.00186817
Neural network      2x2500 ReLU, dropout   0.00186956
Neural network      2x2500 tanh            0.0021368
Neural network      2x2500 sigmoid         0.00222824
Neural network      2x3500 ReLU            0.00186919
Neural network      2x3500 ReLU, dropout   0.00186964
Neural network      3x3500 ReLU            0.00186975
Table 3.1: The performance of several models. The error column contains the mean
squared error for each model. The best performing model is an NN with parameters
2x2500 ReLU, meaning it has 2 hidden layers each with 2500 ReLU units and does
not use dropout.
Heavy regularization was required for good generalization; a strong $\ell_1$
regularization was especially important. This is probably due to the strong spatial
relation present in the data, i.e. vegetation growth in the south hardly depends on the
soil moisture levels in the north. For our models, regularization rates of
$\ell_1 = 10^{-6}$ and $\ell_2 = 10^{-5}$ worked well as a general rule. These were also
the exact values used as regularization constants in our best performing neural network.
For smaller networks the regularization parameters were decreased by a factor of 10 to
100, and for larger networks they were increased by a factor of 10 to 100.
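A minimal sketch of the resulting objective, combining the mean squared error with the two weight penalties, is given below. The exact form of the penalties in the thesis implementation is not stated, so penalizing the summed absolute and squared weights (and leaving the biases unpenalized) is an assumption. Only the rates $10^{-6}$ and $10^{-5}$ come from the text.

```python
import numpy as np

def mse(A, B):
    """Mean of the squared element-wise differences between two maps."""
    return np.mean((A - B) ** 2)

def regularized_loss(pred, target, weights, l1=1e-6, l2=1e-5):
    """MSE plus l1 and l2 penalties summed over all weight matrices."""
    l1_term = sum(np.abs(W).sum() for W in weights)
    l2_term = sum((W ** 2).sum() for W in weights)
    return mse(pred, target) + l1 * l1_term + l2 * l2_term

# Tiny worked example: MSE = 1, sum |W| = 18, sum W^2 = 36.
pred = np.zeros((2, 3))
target = np.ones((2, 3))
weights = [np.full((3, 3), 2.0)]
loss = regularized_loss(pred, target, weights)  # 1 + 1.8e-5 + 3.6e-4
```

The $\ell_1$ term pushes individual weights to exactly zero, which fits the observation that an output pixel should depend on only a small part of the input map.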
Out of the three non-linearities tested, the ReLU non-linearity performed best. As
table 3.1 shows, it is about 15% to 20% more accurate than the sigmoid or tanh
non-linearities. The problem of predicting vegetation using soil moisture can be seen
as a problem where essentially one map is morphed into another, both containing real
values that are somehow correlated. By using a non-linearity that squashes values,
such as the sigmoid or tanh (see figure 2.6), the model loses valuable information
about the input signal.
Table 3.1 shows that using a neural network with one hidden layer decreases
performance by less than 5%, again suggesting that the relation between soil moisture
and NDVI has a linear nature. Having three or more hidden layers did not improve
performance and having more than five hidden layers actually decreased performance.
This is probably due to the small amount of available data; deep neural networks
require a lot of data samples to train on. Experiments showed that 2500 hidden units
per layer was the sweet spot. Adding more hidden units did not increase performance.
Neural networks with fewer hidden units (e.g. 500) were able to predict general
vegetation levels of large areas correctly, but could not accurately predict smaller
areas.
Finally, adding dropout [Srivastava et al., 2014] to the neural networks hardly
impacted the test error, which is surprising. It did, however, improve the training
error. Dropout makes a lot of sense because, as mentioned above, there is a strong
spatial relation, where an output pixel only depends on the surrounding input pixels.
By dropping a lot of pixels from the input image the model effectively gets rid of a lot
of noise. One possible reason for the lack of improvement in test error is the strong
$\ell_1$ regularization all neural networks have. Another possible explanation is
that our neural network architectures are not very deep, which is often a requirement
for dropout to work satisfactorily.
One architecture that has not been discussed so far is that of convolutional
networks. Convolutional networks did not perform as well as conventional neural
networks, and due to their high computational cost no further research was done in this
direction. There are two problems with convolutional neural networks: the first is
that filters are only applicable locally and therefore a lot of them are a waste of time
and space, and the second is that pooling removes a lot of important information.
The latter is a problem as the model needs to learn exactly which pixels have increased
or decreased vegetation, and by what amount, and not just the general areas which have
increased or decreased vegetation. Another problem with using convolutional neural
networks is that they are inherently deep, which is a problem given the small amount
of available data.
3.2 Analyzing the performance
In this section the performance of our best model, a 3 layer neural network (2 hidden
layers plus an output layer) with 2500 ReLU units per hidden layer, is analyzed. First,
the performance of the prediction method over different areas in Australia is analyzed.
Second, the impact of timing on the predictive skill is investigated. As will be shown
in these two subsections, the prediction performance depends on the quantity of soil
moisture available, where more soil moisture means worse performance. Third, an
attempt is made to explain why an increase in soil moisture results in a decrease
in test accuracy.
3.2.1 Performance on different areas
In this section different regions of Australia are analyzed to see if vegetation is more
difficult to predict in any of the regions. The difficulty of predicting vegetation in a
region is measured by averaging the squared error per pixel over the test data.
The resulting map can be seen in figure 3.3a, and shows that there are a few small
spots in the middle of Australia where the errors are concentrated. These seem to be
spots where the model generalizes poorly. To get a more realistic look at areas that
are troublesome, another map, in which the maximum error is bounded by 0.02, is shown
in figure 3.3b.
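The two error maps can be computed as in the following sketch. The array layout (samples, rows, columns), the 140 by 180 map size and the synthetic data with one artificial hot spot are assumptions for illustration; only the averaging over test samples and the bound of 0.02 come from the text.

```python
import numpy as np

def error_map(predictions, targets, max_error=None):
    """Per-pixel mean squared error over the sample axis, optionally bounded.

    predictions, targets: arrays of shape (n_samples, height, width).
    """
    per_pixel = np.mean((predictions - targets) ** 2, axis=0)
    if max_error is not None:
        per_pixel = np.minimum(per_pixel, max_error)
    return per_pixel

rng = np.random.default_rng(0)
targets = rng.normal(scale=0.05, size=(130, 140, 180))
predictions = targets + rng.normal(scale=0.04, size=targets.shape)
predictions[:, 70, 90] += 0.5   # one artificial, poorly predicted pixel

unbounded = error_map(predictions, targets)
bounded = error_map(predictions, targets, max_error=0.02)
```

In the unbounded map a single extreme pixel dominates the colour scale. Bounding the error keeps the remaining spatial structure visible.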
Figure 3.3b shows that the errors concentrate in east Australia and that the
north-west of Australia contains the least amount of errors. It is probable that this
happens due to sudden changes in wetness in these areas. Figure 3.4 shows the average
soil moisture over the entire dataset and over the year 2011, which is the wettest
period. Looking at these figures one can see that, for example, in the year 2011 there
is a spike in soil moisture in south Australia, which is an area with high error.
Similar reasoning can be applied to other spots where there is a large difference
between average soil moisture and soil moisture in 2011, for example, the spots in mid
Australia.
(a) Error map with unbounded errors.
(b) Error map with the error bounded to a maximum of 0.02.
Figure 3.3: Average error maps where each pixel represents the average mean squared
error for that pixel.
(a) Average soil moisture over the entire dataset
(b) Average soil moisture over the wettest year (2011)
Figure 3.4: Maps with the average soil moisture for Australia
However, this does not explain the lack of error in the north-west of Australia,
which also has increased soil moisture during 2011. The average NDVI over the
entire dataset and the average NDVI over 2011 are shown in figure 3.5. From these
two figures one can clearly see that even though there were increased soil moisture
levels in the north-west, there was hardly any increase in NDVI. Intuitively, this can
be explained by the idea that different areas respond differently to an increase in soil
moisture. As our neural network has not encountered anything this extreme before, it
has to guess which areas respond strongly to the increased soil moisture and which do
not respond at all. Furthermore, areas with less NDVI response have less variation
and are thus easier to predict.
(a) Average NDVI over the entire dataset
(b) Average NDVI over the wettest year (2011)
Figure 3.5: Maps with the average NDVI for Australia
3.2.2 Performance on different time periods
The analysis in the previous section hinted at a strong relation between soil moisture
levels and prediction performance. This section further explores this theory and
analyzes the driest and wettest periods and their performances.
The five driest periods in the test set are the samples from the periods 1-15 Aug 2008,
1-15 Nov 2009, 1-15 Oct 2009, 16-31 Oct 2009, and 16-30 Nov 2009; these are the 1st,
32nd, 30th, 31st, and 33rd samples in the test set. These samples have the 3rd, 34th,
20th, 21st and 26th best performances (out of 130 samples). These performances
are not spectacular, but nearly all of them are in the top 20% of performances. In
contrast, the five wettest periods are all in the bottom 8% of performances. The
five wettest periods are: 1-15 Mar 2011, 16-28 Feb 2011, 16-30 Mar 2011, 1-15 Apr
2011, and 1-15 Feb 2011. They have the 3rd, 1st, 2nd, 5th and 10th worst prediction
performances; four of them are even among the 5 worst performances.
To further solidify the relation between soil moisture and prediction performance,
two time series are shown in figure 3.6. The green line represents the error of the
neural network and is scaled to be on the same scale as the blue line, which
represents soil moisture. Besides the clear relation between the two time series that
one can see visually, they have a Pearson correlation of 0.816. In other words, 66.5% of
the error variance can be explained by a simple linear regression on the average soil
moisture. A scatter plot with the error on the y-axis and the average soil moisture on
the x-axis can be seen in figure 3.7. This scatter plot shows a clear positive relation
between the average soil moisture and the prediction error of the neural network.
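The link between the correlation coefficient and the explained variance can be made explicit with a short sketch: for a simple linear regression with one predictor, the fraction of variance explained equals the squared Pearson correlation. The two series below are synthetic stand-ins, not the thesis data, and the helper name `pearson_r` is illustrative.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Synthetic stand-ins for the per-sample average soil moisture and test error.
rng = np.random.default_rng(0)
avg_sm = rng.normal(size=130)
error = 0.002 + 0.001 * avg_sm + rng.normal(scale=0.0007, size=130)

r = pearson_r(avg_sm, error)
explained = r ** 2   # share of error variance explained by a linear fit
```

For the reported r = 0.816 the square is about 0.666, close to the 66.5% quoted above. The small gap presumably comes from rounding of the correlation coefficient.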
[Line plot: x-axis Time (2009 to 2013); series Average SM, Error]
Figure 3.6: The average soil moisture and (scaled) test error per sample over a period
from Aug 2008 to 2014. The graphs move similarly, suggesting a relation between soil
moisture levels and errors.
Combining the observations of this section with the observations of the previous
sections, one can conclude there is a strong negative relation between wetness and
prediction performance. The next section will attempt to explain why wetness
decreases performance.
3.2.3 Why wetness decreases performance
The previous two sections showed that an increase in soil moisture often results in
an increase in test error, i.e. a decrease in test accuracy. This section discusses two
reasons why this is the case.
Firstly, an increase in soil moisture often results in an increase in vegetation, which
leads to greater variability. This variability is difficult to predict as each area reacts
differently and as there is little data available. Another possible issue is that the
soil moisture differential maps fed to the neural networks are relative to the average
soil moisture for that month. In other words, the inputs are differences from the
monthly average, not, say, percentages of change, which means that one input map might
represent a 5% increase in soil moisture for one month but 15% for another month.
However, further experiments using percentages of change as input to the models did
not yield any performance improvements, suggesting that this is not an issue.
[Scatter plot: x-axis Average soil moisture; y-axis Error]
Figure 3.7: Scatter plot comparing average soil moisture and error
Secondly, vegetation growth is a complex process that is influenced by a lot of
different factors. Soil moisture, which is part of water availability, is only one such
factor. Other factors include sunlight availability, the geography of the area, and
temperature. During drier periods, when water is scarce, soil moisture is one of the
most important factors, because water availability is the limiting factor. During
wetter periods, when water availability is high, soil moisture becomes a less important
factor and other factors such as sunlight availability become more important.
3.2.4 Adaptability on anomalies
In the experiments conducted above the models are simply trained on 75% of the
available data and tested on the remaining 25%. However, this is not a very realistic
scenario. In practice, the machine learning models would be continuously re-trained
whenever new data becomes available. Therefore, for practical applicability of the
models, it is interesting to see if the model could have predicted the anomaly (the
wettest year) better had it been trained on all data available up to just before this
anomaly. Furthermore, it is interesting to see when the neural network adapts and
learns how vegetation reacts to these wet circumstances.
To investigate how quickly the neural network adapts to wet circumstances, several
neural networks have been trained. Each was given one more training sample from 2011,
the wettest year, than the previous one. Figure 3.8 shows the performance of four
neural networks on the data for the years 2011 - 2012. The blue line corresponds to the
error of the original neural network that has not seen any extra data, the green line
corresponds to the error of the neural network that has been trained up until 2011,
the red line corresponds to the error of the neural network that has been trained up
to the first vertical black line, and the cyan line corresponds to the error of the
neural network that has been trained up until the second vertical black line.
[Line plot: x-axis Time; y-axis Error (0.000 to 0.008); series Original, End 2011, BL 1, BL 2]
Figure 3.8: Error on 2011 - 2012 period for different training datasets. Original is
the original 75% training dataset, End 2011 is trained until the end of 2011, BL 1 is
trained till the first vertical black line, and BL 2 is trained until the second vertical
black line.
The results show that training until 2011 gives a performance boost for the first
two months of 2011, which was expected. However, it does not improve performance for
the wettest period from March to May. Training until just before the wettest period
results in a 28.5% performance improvement over the original neural network for
the wettest period. Extending the training data by including the first peak of the
wettest period results in a 45% performance improvement over the original neural
network for the second peak in the wettest period. One can conclude that the neural
network has trouble predicting the anomaly until just before it happens, and that
once reaching that peak it quickly adapts. This is useful for practical applicability
of the neural network because one could be confident it quickly adapts to current
trends.
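The retraining protocol can be sketched as an expanding training window: the model is refitted on all samples before a cutoff and scored on a fixed later period. The sketch below uses a one-parameter least-squares model as a stand-in for the neural network, and the synthetic data with a change in response at sample 60 are an assumption. It only illustrates the evaluation loop, not the thesis results.

```python
import numpy as np

def fit_slope(x, y):
    """Least-squares slope through the origin (stand-in for a real model)."""
    return float(x @ y / (x @ x))

def errors_for_cutoffs(x, y, cutoffs, eval_slice):
    """Retrain on samples [0, cutoff) and report the MSE on eval_slice."""
    result = {}
    for cutoff in cutoffs:
        slope = fit_slope(x[:cutoff], y[:cutoff])
        residual = slope * x[eval_slice] - y[eval_slice]
        result[cutoff] = float(np.mean(residual ** 2))
    return result

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 1.5, size=100)
# The response to the input changes at sample 60, mimicking an anomalous period.
y = np.where(np.arange(100) < 60, 1.0 * x, 2.0 * x)

errors = errors_for_cutoffs(x, y, cutoffs=[60, 70, 80], eval_slice=slice(80, 100))
```

Each later cutoff includes more samples from the changed regime, so the error on the evaluation period drops. This is the same qualitative effect as described for the networks trained up to the vertical black lines in figure 3.8.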
3.3 Predicting further into the future
In the previous experiments the models were trying to predict vegetation for the
same period as the given soil moisture. In other words, the lag between an input and
output sample was zero. To better quantify the predictive value of soil moisture, a
lag period between input and output samples is introduced. This lag period ranges
from zero to four months and allows for analysis of where in time the relevant soil
moisture information is stored. In other words, it allows for quantification of the
future predictive value of soil moisture for vegetation.
Multiple neural networks, each with the same architecture, parameters and
optimization methods as described in the previous section, but each with a different
lag period, have been trained. In total four neural networks were trained: one with a
lag of 1 month, one with a lag of 2 months, one with a lag of 3 months, and one with a
lag of 4 months.
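Constructing the training pairs for a given lag amounts to shifting the output series relative to the input series. The sketch below assumes two samples per month (the 1-15 and 16-end periods used elsewhere in this chapter); the function name and the integer stand-in data are illustrative.

```python
import numpy as np

SAMPLES_PER_MONTH = 2  # assumption: two samples per month (1-15 and 16-end)

def make_lagged_pairs(soil_moisture, ndvi, lag_months):
    """Pair each soil moisture map with the NDVI map lag_months later."""
    shift = lag_months * SAMPLES_PER_MONTH
    if shift == 0:
        return soil_moisture, ndvi
    return soil_moisture[:-shift], ndvi[shift:]

sm = np.arange(10)          # stand-in for 10 consecutive soil moisture maps
veg = np.arange(10) + 100   # stand-in for the matching NDVI maps
x, y = make_lagged_pairs(sm, veg, lag_months=1)
# x[0] is sample 0 of soil moisture, y[0] is sample 2 of NDVI (one month later).
```

Each additional month of lag removes two samples from the start of the output series and two from the end of the input series, so longer lags leave slightly less data.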
The mean squared error per lag has been plotted in figure 3.9, and table 3.2
shows the combined mean squared error per lag. Looking at the wet anomalies in
2011, figure 3.9 clearly shows that the most important information is contained within
the first month. For time periods where the behavior is relatively normal, all lags
seem to perform equally well. This is probably due to the strong seasonality present in
the problem. Looking at the mean squared error, there seems to be a roughly linear
increase in error per increased lag period.
A model which always predicts all zeros is added to table 3.2; all zeros means
it predicts the average NDVI recorded for that biweekly period. The table shows that
the neural network with a lag of 3 months is only marginally better than predicting
all zeros, and the neural network with a lag of 4 months performs even worse than
predicting all zeros. This suggests that all relevant information is present in the three
months preceding the month one wants to predict.
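The all zeros baseline can be sketched as follows: maps are converted to anomalies by subtracting the training mean for the same period of the year, and the baseline predicts an anomaly of zero everywhere. The 24 periods per year, the map size and the synthetic seasonal data are assumptions for illustration.

```python
import numpy as np

def climatology(maps, period_index, n_periods=24):
    """Mean map per period of the year (24 biweekly periods assumed)."""
    clim = np.zeros((n_periods,) + maps.shape[1:])
    for p in range(n_periods):
        clim[p] = maps[period_index == p].mean(axis=0)
    return clim

def to_anomalies(maps, period_index, clim):
    """Subtract the matching period mean from every map."""
    return maps - clim[period_index]

rng = np.random.default_rng(0)
n_train, n_test, shape = 96, 24, (4, 5)
period_train = np.arange(n_train) % 24
period_test = np.arange(n_test) % 24
seasonal = np.sin(2 * np.pi * np.arange(24) / 24)[:, None, None]
ndvi_train = seasonal[period_train] + rng.normal(scale=0.05, size=(n_train,) + shape)
ndvi_test = seasonal[period_test] + rng.normal(scale=0.05, size=(n_test,) + shape)

clim = climatology(ndvi_train, period_train)
test_anomalies = to_anomalies(ndvi_test, period_test, clim)
all_zeros_mse = np.mean((0.0 - test_anomalies) ** 2)
```

Because the seasonal cycle is removed before scoring, a model only beats this baseline if it captures deviations from the usual seasonal pattern. This is why the baseline is a meaningful reference for the longer lags.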
To further analyze the future predictive value of soil moisture, the average
vegetation per time period is shown in figure 3.10, with table 3.2 showing the
corresponding Pearson correlation between the averages of the true vegetation maps and
those of each lag period. The Pearson correlation coefficients show that neural
networks with a lag up to two months can still predict the average vegetation trends
well; however, with a lag of 3 months or more these predictions become inaccurate. This
reinforces the idea that all relevant information is present in the three months
preceding the month one wants to predict.
[Line plot: x-axis Time (2009 to 2013); y-axis Error (0.000 to 0.014); series No lag, Lag of 1 month, Lag of 2 months, Lag of 3 months, Lag of 4 months, All zeros]
Figure 3.9: The mean squared error per lag period from 2008 to 2014.
Lag               Error        Pearson correlation
Zero lag          0.00186817    0.800
Lag of 1 month    0.00205393    0.581
Lag of 2 months   0.00213289    0.234
Lag of 3 months   0.00230171   -0.032
Lag of 4 months   0.00237467   -0.095
All zeros         0.00234371    -
Table 3.2: The mean squared error and Pearson correlation of different lag periods
[Line plot: x-axis Time (2009 to 2013); y-axis Average NDVI; series Target averages, No lag, Lag of 1 month, Lag of 2 months, Lag of 3 months, Lag of 4 months]
Figure 3.10: The correct average vegetation and the predicted average vegetation for
each lag period from 2008 to 2014.
3.4 Locally connected methods
The previous sections showed there is a strong spatial relation present in the data.
This section takes advantage of this by using multiple neural networks that each
predict a single pixel of the NDVI map. Instead of having one neural network that
predicts the entire vegetation map at once, 140 · 180 = 25200 neural networks were
trained, each predicting one pixel of the vegetation map. Instead of receiving the
entire soil moisture map as input, each neural network receives a 16 by 16 patch
surrounding the pixel it tries to predict. A 16 by 16 pixel patch covers an area
of 400km by 400km, which should contain all the important information.
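Extracting the input for one of these per-pixel networks can be sketched as below. How the thesis handles pixels near the map border is not stated, so the zero padding is an assumption, as is the placement of the window (a 16 by 16 window cannot be centred exactly on a pixel). The function name `extract_patch` is illustrative.

```python
import numpy as np

def extract_patch(sm_map, row, col, size=16):
    """Return a size x size window around (row, col), zero-padded at the edges.

    With an even size the window cannot be centred exactly; here it covers
    rows row-8 .. row+7 and columns col-8 .. col+7 (an arbitrary choice).
    """
    half = size // 2
    padded = np.pad(sm_map, half, mode="constant", constant_values=0.0)
    # In padded coordinates the target pixel sits at (row + half, col + half),
    # so the window starting at (row, col) places it at index (half, half).
    return padded[row:row + size, col:col + size]

sm_map = np.arange(140 * 180, dtype=float).reshape(140, 180)
patch = extract_patch(sm_map, row=70, col=90)
corner = extract_patch(sm_map, row=0, col=0)
features = patch.ravel()   # 256 inputs for the network of pixel (70, 90)
```

Each of the 25200 networks then sees only 256 inputs instead of the full map, which is what makes the individual models small.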
The first experiment uses a one layer neural network, essentially a perceptron,
as architecture. These neural networks are initialized and trained similarly to previous
experiments. There are two notable differences in the hyperparameters: the
regularization parameters have been set to $\ell_1 = 10^{-3}$ and $\ell_2 = 10^{-3}$,
and the learning rate has been decreased by a factor of 10 to $10^{-5}$.
These hyperparameters have been adjusted because the neural networks are much
smaller.
The second experiment is similar to the first one, but changes the neural network
architecture to a two layer neural network with 16 ReLU units. Again these neural
networks are initialized and trained like before. The learning rate is the same as in
our first experiment, $10^{-5}$; however, the regularization parameters have changed to
$\ell_1 = 10^{-5}$ and $\ell_2 = 10^{-2}$.
Model                                 Error
Neural network                        0.00186817
Ensemble of 1 layer neural networks   0.00194662
Ensemble of 2 layer neural networks   0.00184438
Table 3.3: The mean squared error of our best neural network and of the two
ensembles.
Table 3.3 shows the test error of our standard neural network, our most optimized
neural network and these two ensembles of neural networks. Both ensemble models
perform better than the standard neural network, with the one layer neural network
ensemble performing worse than the optimized neural network and the two layer
neural network ensemble performing a tiny bit better. This reinforces the intuition
that there is a strong spatial relation in the data. The fact that the ensemble of
one layer neural networks is able to predict with such high accuracy again suggests
that there are a lot of areas where there is a linear relation between soil moisture
and vegetation. The fact that the 2 layer neural network ensemble performs better
suggests that there are at least a few areas where there is a non-linear relation
between soil moisture and vegetation. Regardless of what kind of relation it is, it is
clear that these ensembles perform comparably to, and in one case better than, a single
large neural network.
A plot containing the average error per sample can be seen in figure 3.11. It is
interesting to see that although the ensemble of 2 layer neural networks outperforms
the other neural networks on average, it does not outperform them on the wet periods
in 2011-2012. In fact, the ensemble of one layer neural networks performs best there.
One disadvantage of having to train so many neural networks is that manual tuning
of the hyperparameters, most importantly the regularization parameters, is infeasible.
In a single neural network, as all the weights are regularized together, the neural
network is able to make intelligent decisions about which neurons require more weight,
and which can do with less. This is in contrast with the ensemble of neural networks,
where it is highly likely that some neural networks will overfit and some will underfit,
as the hyperparameters are not optimized per neural network separately. This makes
selecting a good set of hyperparameters for the neural networks in the ensemble extra
important. In our experiments above we gave each neural network in the ensemble the
same hyperparameters, due to computational constraints. The performance of these
ensembles could be improved by applying techniques such as grid search or random
search to select a good set of hyperparameters for each neural network in the ensemble
individually.
[Line plot: x-axis Time (2009 to 2013); y-axis Error (0.000 to 0.008); series NN, LC 1 layer, LC 2 layers]
Figure 3.11: The mean squared error of our best neural network and of the two
ensembles. The green line represents the error of the ensemble of 1 layer neural
networks, and the red line the ensemble of 2 layer neural networks.
Having an ensemble of neural networks also offers some advantages. Besides the
improved performance, the neural networks are a lot smaller and can be trained
very quickly, and as they are all independent they can be trained in parallel. For
our datasets, where the resolution of input maps is still manageable, this might not
seem very important, especially with the support for multi-CPU/GPU and distributed
training in deep learning frameworks. However, when the resolution scales up by a
large factor, training a single large network becomes a problem. This is mainly due to
memory constraints and the computational overhead. Remember that our input maps contain
25200 pixels and that one pixel corresponds to an area of 25km by 25km. There are
efforts, for example by Vandersat, to improve the resolution of these maps such that
one pixel corresponds to an area of 100m by 100m. This is 250 times finer along each
axis, leading to input maps of 25200 · 250² ≈ 1.6 billion pixels.
For input maps of this magnitude these locally connected methods might be highly
beneficial.
Chapter 4
Predicting NDVI with uncertainty
The previous chapter focused on predicting vegetation using soil moisture. In this
chapter vegetation will again be predicted using soil moisture; however, this time
models are used that also try to assign a level of uncertainty to the predictions. A
disadvantage of the neural networks used in the previous chapter is that they do not
provide any kind of uncertainty measure with their output. In other words, they do
not provide any confidence intervals. Such a measure is especially important for
problems where key decisions are being made based on the predictions.
To add a level of confidence to the predictions, Bayesian methods are used. The
focus here will be on Bayesian neural networks. Many different kinds of Bayesian neural
network models exist; in this work the focus will be on a variational one called the
Variational Matrix Gaussian, introduced by Louizos and Welling [2016].
Again the GIMMS NDVI dataset is used for vegetation and the CCI SM dataset is
used for soil moisture. The same data preprocessing steps as in the previous chapter
have been applied. Figure 4.1 shows a few examples of input, output, and prediction
maps (predictions were done by the best performing Bayesian neural network). Again
the predictions look very accurate, albeit a bit worse than the non-Bayesian neural
network.
The next section will focus on different architectures and hyperparameters and
investigate how each one affects performance on our problem. The remaining sections
will focus on the analysis of uncertainty levels for different areas and time regions.
4.1 Bayesian neural networks
This section presents the results of the Variational Matrix Gaussian model for various
architectures and hyperparameters. The performance of various models is analyzed
and the performance of non-Bayesian neural networks and Bayesian neural
networks is compared.
[Figure 4.1 panels; each panel shows three maps: input, output, and prediction]
(a) The first month in the test set
(b) The sixth month in the test set
(c) The twelfth month in the test set
(d) The wettest (and worst) month in the test set
Figure 4.1: Input, output and prediction for a few samples in the test set. For the
first column blue represents dry areas, while red represents wet areas. For the other
two columns blue represents no vegetation, while red represents high vegetation.
All models are trained with the Adam optimizer using the following parameters:
$\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$. A learning rate of 0.01 and a
batch size of 72 were used. Each Bayesian neural network was trained for 100 epochs.
The same initialization and parameterization as described in the regression experiments
of Louizos and Welling [2016] were used. This means all models were initialized using
the default he2 initialization scheme [He et al., 2015] for the mean of each matrix
variate Gaussian. A Gamma prior $p(\tau) = \mathrm{Gamma}(a_0 = 6, b_0)$ was introduced,
as was a posterior $q(\tau) = \mathrm{Gamma}(a_1, b_1)$, for the precision of the
Gaussian likelihood. The matrix variate Gaussian prior for each layer was parametrized
as $p(W) = \mathcal{MN}(0, \tau_r^{-1} I, \tau_c^{-1} I)$, where $p(\tau_r)$ and
$p(\tau_c)$ equal $\mathrm{Gamma}(a_0 = 1, b_0 = 0.5)$, with posteriors
$q(\tau_r) = \mathrm{Gamma}(a_r, b_r)$ and $q(\tau_c) = \mathrm{Gamma}(a_c, b_c)$. The
pseudo-data was initialized using samples from the entries of A, B. One difference is
that instead of using one posterior sample to estimate the expected log-likelihood used
to update the parameters, five posterior samples were used.
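To give some intuition for the prior on the weights, the sketch below draws one weight matrix from it. It relies on the fact that a matrix variate Gaussian whose row and column covariances are both proportional to the identity has independent entries with variance $1 / (\tau_r \tau_c)$. Treating the second Gamma parameter as a rate is an assumption, and the layer size and function names are illustrative; this is not the implementation of Louizos and Welling [2016].

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_precision(a, b):
    """Draw tau ~ Gamma(shape=a, rate=b); NumPy expects scale = 1 / rate."""
    return rng.gamma(shape=a, scale=1.0 / b)

def sample_prior_weights(n_in, n_out, tau_r, tau_c):
    """Draw W ~ MN(0, I / tau_r, I / tau_c).

    With both covariance factors proportional to the identity, every entry
    of W is independent with variance 1 / (tau_r * tau_c).
    """
    std = 1.0 / np.sqrt(tau_r * tau_c)
    return std * rng.standard_normal((n_in, n_out))

tau_r = sample_precision(a=1.0, b=0.5)   # Gamma(a0 = 1, b0 = 0.5) as in the text
tau_c = sample_precision(a=1.0, b=0.5)
W = sample_prior_weights(200, 300, tau_r, tau_c)
```

Larger precisions shrink all weights of a layer towards zero at once, so the two Gamma-distributed precisions act as a learned, layer-wide regularization strength.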
Table 4.1 shows the test error for the most interesting architecture and
hyperparameter combinations. All models perform similarly to their non-Bayesian
counterparts, albeit a little bit worse. The best Bayesian neural network has 3 hidden
layers each with 2500 ReLU units, uses 500 pseudo-data pairs, and has a variational
dropout rate of 0.05.
The performance of the various architectures for Bayesian neural networks follows a
similar pattern to that of non-Bayesian neural networks. For example, increasing the
number of hidden units past 2500 did not result in any performance improvements.
One difference is that, in contrast with the non-Bayesian neural network, the Bayesian
neural network with 3 hidden layers performs better than the Bayesian neural
network with 2 hidden layers.
Selecting the number of pseudo-data pairs is a trade-off between increased
performance and increased training time. In a sense it is a limiting factor, as
increasing the number of pseudo-data pairs never decreases performance. By manual
search, 500 pseudo-data pairs was selected as a good balance. Similarly, by a simple
linear search the variational dropout rate resulting in the best performance was found
to be 0.05.
It is interesting to compare the errors of the best performing Bayesian neural
network and the best non-Bayesian neural network. Therefore, both time series are
shown in figure 4.2. The plot shows that both graphs are very similar: they perform
well on the same time periods and perform poorly on the same time periods.
The Bayesian neural network almost always performs a little bit worse than the
non-Bayesian neural network; however, it never performs a lot worse.
Model                     Parameters    pdp    vdr    Error
Neural network            2x2500 ReLU   -      -      0.00186817
Bayesian neural network   1x500 ReLU    5      0.1    0.002008188
Bayesian neural network   1x500 ReLU    25     0.1    0.001999300
Bayesian neural network   1x1500 ReLU   5      0.1    0.001994170
Bayesian neural network   1x1500 ReLU   50     0.1    0.001982113
Bayesian neural network   2x1500 ReLU   50     0.1    0.001962147
Bayesian neural network   2x1500 ReLU   150    0.1    0.00195913
Bayesian neural network   2x2500 ReLU   150    0.1    0.00195690
Bayesian neural network   2x2500 ReLU   500    0.1    0.00195016
Bayesian neural network   2x2500 ReLU   500    0.05   0.00192934
Bayesian neural network   3x2500 ReLU   500    0.01   0.00193218
Bayesian neural network   3x2500 ReLU   500    0.05   0.00191544
Bayesian neural network   3x3500 ReLU   500    0.05   0.00191622
Bayesian neural network   3x2500 ReLU   500    0.1    0.00194690
Bayesian neural network   3x2500 ReLU   1000   0.1    0.00194662
Bayesian neural network   3x2500 ReLU   500    0.2    0.00198915
Table 4.1: The mean squared error of several models. pdp stands for pseudo data
pairs, and vdr stands for variational dropout rate. The best Bayesian NN has 3
hidden layers each with 2500 ReLU units.
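The linear search over the variational dropout rate can be illustrated with the four
3x2500 rows of Table 4.1 that use 500 pseudo-data pairs. In this sketch the lookup
table stands in for training and evaluating a network at each rate, and the function
name `linear_search` is hypothetical.

```python
# Errors copied from Table 4.1 (3x2500 ReLU, 500 pseudo-data pairs).
errors_by_rate = {0.01: 0.00193218, 0.05: 0.00191544, 0.1: 0.00194690, 0.2: 0.00198915}
nn_error = 0.00186817  # best non-Bayesian network in Table 4.1


def linear_search(candidates, score):
    """Try each candidate in turn and return the one with the lowest score."""
    best, best_score = None, float("inf")
    for candidate in candidates:
        current = score(candidate)
        if current < best_score:
            best, best_score = candidate, current
    return best, best_score


# The dictionary lookup is a stand-in for training and evaluating a model.
best_rate, best_error = linear_search(sorted(errors_by_rate), errors_by_rate.get)
relative_gap = (best_error - nn_error) / nn_error

print(best_rate)              # 0.05
print(f"{relative_gap:.1%}")  # 2.5%
```

The relative gap puts the difference between the best Bayesian and the best
non-Bayesian network in Table 4.1 at roughly 2.5% of the mean squared error.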
[Figure 4.2: line plot of Error (y-axis, 0.000 to 0.008) against Time (x-axis, 2009 to 2013) for two series: NN and Bayes NN.]
Figure 4.2: The mean squared error for the best Bayesian neural network and best
regular neural network on a period from Aug 2008 to 2014.
To show how similar the predictions are, figure 4.3 contains the expected output, the
prediction of the non-Bayesian neural network, and the prediction of the Bayesian
neural network for three samples in the test dataset. All images have the same scale,
with red representing higher than average vegetation and blue representing lower than
average vegetation. Because the Bayesian neural network performs similarly to the
non-Bayesian neural networks and has the added benefit of providing confidence levels,
it might be a suitable alternative to non-Bayesian neural networks.
[Figure 4.3: three panels of three maps each, on pixel axes from 0 to 150 (horizontal) and 0 to 120 (vertical).]
(a) Period of 1-15 August 2009.
(b) Period of 16-30 November 2009.
(c) Period of 16-31 March 2013.
Figure 4.3: The output, non-Bayesian NN prediction, and Bayesian NN prediction
for a few samples in the test set. Blue represents no vegetation, while red represents
high vegetation.
4.2 Analyzing the uncertainty
This section analyzes the performance of the best performing Bayesian neural network.
As a strong relationship between soil moisture and the models’ accuracy
was already established in the previous chapter, the focus of this section is on the
relationship between soil moisture and uncertainty, and between error levels and
uncertainty. A strong relationship between soil moisture and uncertainty would support
the claim that soil moisture has a strong predictive value for vegetation, and a strong
relationship between error levels and uncertainty would demonstrate the effectiveness
of Bayesian neural networks. Additionally, the difference in uncertainty between
different areas and time periods is analyzed.
4.2.1 Uncertainty in different areas
This section presents the analysis on the average error, uncertainty, soil moisture,
and vegetation maps. The maps can be seen in figure 4.4. The average error, soil
moisture, and vegetation maps are created the same way as in the previous chapter.
The average uncertainty map represents the standard deviation of 1000 drawn samples
for each pixel.
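As an illustration, the per-pixel standard deviation over repeated stochastic
predictions can be sketched as follows. The callable `sample_fn` is a hypothetical
stand-in for one stochastic forward pass of the Bayesian neural network, and the toy
sampler and map size are assumptions made only to keep the example self-contained.

```python
import numpy as np


def uncertainty_map(sample_fn, n_samples=1000):
    """Return the per-pixel mean and standard deviation over n_samples draws.

    sample_fn is any callable returning one predicted map of shape (H, W).
    """
    samples = np.stack([sample_fn() for _ in range(n_samples)])  # (n_samples, H, W)
    return samples.mean(axis=0), samples.std(axis=0)


# Toy stand-in for the network: Gaussian noise with a known per-pixel spread.
rng = np.random.default_rng(0)
true_std = np.array([[0.01, 0.02], [0.05, 0.10]])


def toy_sample():
    return rng.normal(loc=0.3, scale=true_std)


mean_map, std_map = uncertainty_map(toy_sample)
print(std_map.shape)  # (2, 2)
```

Averaging such standard deviation maps over all periods would then give an average
uncertainty map comparable to the one in figure 4.4.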
Ideally the error map and the uncertainty map would roughly match. Unfortunately,
this is not the case. Many areas in mid and south Australia
with high error have low uncertainty, an undesirable result. Similarly, many areas with
low error have relatively high uncertainty, for example south west Australia.
The average soil moisture map bears more resemblance to the average uncertainty
map, where higher soil moisture levels correspond to more uncertainty. This suggests
that soil moisture has a stronger predictive value when soil moisture levels are low,
which is in line with the results seen so far. Areas with high soil moisture levels
and low error generally also have low uncertainty. This suggests that soil moisture
is the leading factor for uncertainty unless the models’ predictions are accurate. The
vegetation map and the uncertainty map share the same general structure but do not
seem to have any clear relation.
Although these maps seem to confirm soil moisture’s predictive value, they fail to
show that the uncertainty estimates are reliable. The next section focuses on uncertainty
in different time periods and shows a clearer relationship between soil moisture levels,
error levels, and uncertainty levels.
[Figure 4.4: four maps, each with its own color bar.]
(a) The average error map bounded by 0.02
(b) The average uncertainty map
(c) The average soil moisture
(d) Average vegetation
Figure 4.4: The error, uncertainty, average soil moisture, and average vegetation
maps for the best Bayesian neural network. All maps are averaged over a period
from Aug 2008 to 2014.
4.2.2 Uncertainty in different time periods
The average maps in the previous section failed to show a clear relation between the
uncertainty map and the error and soil moisture maps. This section presents the
analysis of the time series of the average soil moisture levels, error levels, and
uncertainty levels, as well as the analysis of specific time periods. The time series
are shown in figure 4.5. All three time series are scaled to be between 0 and 1 to
provide a better comparison.
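The scaling to the range 0 to 1 is assumed here to be ordinary min-max scaling; the
text does not state the exact method. A minimal sketch, with a hypothetical function
name and toy values:

```python
import numpy as np


def min_max_scale(series):
    """Linearly rescale a series so its minimum maps to 0 and its maximum to 1."""
    series = np.asarray(series, dtype=float)
    lowest, highest = series.min(), series.max()
    if highest == lowest:
        # A constant series has no spread, so map it to zeros.
        return np.zeros_like(series)
    return (series - lowest) / (highest - lowest)


scaled = min_max_scale([2.0, 4.0, 6.0])
print(scaled)  # [0.  0.5 1. ]
```

Because the transformation is linear, it changes only the vertical range of each
curve and not its shape.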
[Figure 4.5: line plot against Time (x-axis, 2009 to 2013) of three scaled series: Average SM, Error, and Uncertainty.]
Figure 4.5: The average uncertainty, average error, and average soil moisture levels
of each map from Aug 2008 to 2014.
The average soil moisture levels and error levels seem to be similar to those of
a non-Bayesian neural network, and are highly correlated. The uncertainty levels
can be roughly categorized into three groups: large uncertainty, medium uncertainty,
and low uncertainty. Visually, uncertainty seems to be more related to soil moisture
than to error levels. This is confirmed by the Pearson correlation coefficients, which
are 0.724 for soil moisture and uncertainty, and 0.525 for error and uncertainty. This
provides extra evidence that soil moisture’s predictive value becomes stronger as soil
moisture becomes scarce. When soil moisture becomes readily available, other ecological
factors such as temperature and solar radiation become more important for vegetation
levels. As this information is unavailable to the models, the uncertainty increases, and
potentially also the error. This explains why uncertainty is more closely related to
soil moisture than to the error levels.
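For reference, the Pearson correlation coefficient between two time series can be
computed as sketched here. The series are toy values and not the thesis data, and
the function name is hypothetical.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    denominator = np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum())
    return float((x_centered * y_centered).sum() / denominator)


def rescale(series):
    """Min-max scaling to the range 0 to 1, as used for plotting."""
    return (series - series.min()) / (series.max() - series.min())


# Toy series standing in for average soil moisture and average uncertainty.
soil_moisture = np.array([0.10, 0.14, 0.22, 0.18, 0.30, 0.12])
uncertainty = np.array([0.004, 0.006, 0.007, 0.009, 0.011, 0.005])

r_raw = pearson(soil_moisture, uncertainty)
r_scaled = pearson(rescale(soil_moisture), rescale(uncertainty))
print(abs(r_raw - r_scaled) < 1e-12)  # True
```

The coefficient is unchanged by the scaling to the range 0 to 1, so the values
reported for the scaled and the unscaled series are the same.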
From the analysis it is clear that there is a strong positive relation between soil
moisture and uncertainty. However, as shown in the previous section, it is not clear
that the uncertainty is concentrated in the right areas. To further investigate the
relations between uncertainty and soil moisture, and between uncertainty and error, a few
sample cases are analyzed. Differential soil moisture, error, and uncertainty maps for
a few selected samples are shown in figure 4.6.
Although the Bayesian neural network is able to capture general uncertainty levels
well, it is unable to consistently estimate high uncertainty for areas with high error.
In other words, the model is able to detect the existence of areas with relatively
high error, but it cannot estimate exactly where they are. Analysis of the error maps
and uncertainty maps for all test cases showed that uncertainty was most accurately
estimated when there was little overall uncertainty. There are only a handful of test
cases where this is the case; one such example is shown in figure 4.6d. The model
correctly estimates high uncertainty in mid west Australia, the area where there is
also high error. More commonly, however, the model is not able to estimate where
these areas are. The result is a very grainy uncertainty map where areas with high
error have low uncertainty and areas with low error have high uncertainty. A few
such examples are shown in figures 4.6a, 4.6b, and 4.6c.
The relation between uncertainty and soil moisture suffers from the same problem
as the relation between uncertainty and error levels: the model is unable to
consistently estimate high uncertainty for areas with large amounts of soil moisture,
which would be expected given the strong positive relation between soil
moisture and uncertainty levels. These areas could simply have had low
error; however, as can be seen in figure 4.6, this is not the case.
In conclusion, the Bayesian neural network is able to capture the general uncertainty
level, but it cannot pinpoint the location of this uncertainty. Further research
is required to investigate why the uncertainty estimates are so poorly localized.
[Figure 4.6: four rows of three maps each (soil moisture, uncertainty, error), each map with its own color bar.]
(a) Period of 1 - 15 January 2010
(b) Period of 1 - 15 April 2011
(c) Period of 16 - 30 September 2013
(d) Period of 1-15 November 2013
Figure 4.6: Maps showing the soil moisture (column 1), uncertainty (column 2), and
error (column 3) for a few selected test cases. Both the error and uncertainty maps
represent percentage change from the average for that period.
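A percentage change from an average can be sketched as follows. The caption does not
specify whether the average is a single value for the period or a per-pixel reference
map, so the function accepts either. That choice, the function name, and the toy
values are assumptions.

```python
import numpy as np


def percent_change(values, reference):
    """Percentage change of values relative to reference.

    reference may be a scalar or an array broadcastable to values.
    Positions where the reference is zero are returned as NaN.
    """
    values = np.asarray(values, dtype=float)
    reference = np.asarray(reference, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        change = 100.0 * (values - reference) / reference
    return np.where(reference == 0, np.nan, change)


# Toy 2x2 map compared against a single average value of 2.0.
toy_map = np.array([[1.0, 2.0], [3.0, 2.0]])
relative = percent_change(toy_map, 2.0)
print(relative)  # -50, 0, 50 and 0 percent
```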
Chapter 5
Conclusion
The previous chapters have shown that vegetation can be successfully predicted using
satellite-based soil moisture, thereby showing that soil moisture has a strong predictive
value for vegetation. They have also shown that soil moisture has a stronger predictive
value when it is available in limited quantities. In other words, soil moisture has
a stronger predictive value in dry regions than in wet regions. Furthermore, they
showed that soil moisture can be used to reliably predict vegetation up to two months
in advance and that it has a strong local spatial relation with vegetation.
The performance of our ensemble of small neural networks confirms that soil
moisture has a strong local spatial relation with vegetation. This local spatiality
allows us to use ensembles of small neural networks instead of a single large neural
network without sacrificing performance. These ensembles allow us to scale up to a
much higher spatial resolution, which is difficult to do with a single neural
network without sacrificing performance. In fact, due to the local properties of soil
moisture, these ensembles have better performance. Furthermore, these ensembles
also allow for faster training as they are easily parallelizable.
Finally, the addition of an uncertainty measure to our predictions was analyzed.
To produce uncertainty levels with our predictions, the variational Bayesian neural
network model introduced by Louizos and Welling [2016] was used. Analysis showed
that its prediction performance was slightly worse than that of the non-Bayesian neural
networks, but still reasonably successful. The Bayesian neural network produces
good general uncertainty levels that mimic the levels of error, but has limited success
in pinpointing exactly where this uncertainty should be.
Within this research the following follow-up steps are recommended:
1. Apply semi-supervised machine learning techniques. There was a lot of incomplete
data, which was aggregated into a more complete but smaller dataset. It is
possible that this data is of more use for semi-supervised learning models.
2. Include the constants and history size in the learning strategy.
3. Weigh the training examples. The climate is a very dynamic system and cur-
rently training examples from 1992 are as important as examples from 2010,
even though the examples from 2010 are more closely related to the climate
that exists today.
4. Apply different Bayesian models. Although the test accuracy of our Bayesian
neural network was only slightly worse than that of the non-Bayesian neural
network, the uncertainty maps were not accurate. More research into why
this is the case, and whether other Bayesian models work better, would be
helpful.
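The third recommendation could, for instance, be realized by giving each training
example a weight that decays with its age. The sketch here shows one possible scheme
and was not evaluated in this thesis. The exponential form, the half-life value, and
the function names are all assumptions.

```python
import numpy as np


def recency_weights(years, reference_year, half_life=18.0):
    """Exponentially decaying example weights, normalized to have mean 1.

    An example that is half_life years older than reference_year gets half
    the weight of an example from reference_year itself.
    """
    age = reference_year - np.asarray(years, dtype=float)
    weights = 0.5 ** (age / half_life)
    return weights / weights.mean()


def weighted_mse(predictions, targets, weights):
    """Mean squared error per example, combined with the given example weights."""
    per_example = ((predictions - targets) ** 2).reshape(len(predictions), -1).mean(axis=1)
    return float(np.average(per_example, weights=weights))


weights = recency_weights([1992, 2010], reference_year=2010)
print(weights[1] / weights[0])  # the 2010 example counts twice as much as the 1992 one

# Toy maps: the 1992 prediction is off by 0.2, the 2010 prediction is exact.
targets = np.zeros((2, 3, 3))
predictions = np.stack([np.full((3, 3), 0.2), np.zeros((3, 3))])
loss = weighted_mse(predictions, targets, weights)
```

With these weights the error on the older example contributes one third of the total
weight instead of one half, so the loss is lower than the unweighted mean squared
error of 0.02.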
Bibliography
Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight
uncertainty in neural networks. arXiv preprint arXiv:1505.05424, 2015.
L Bounoua, GJ Collatz, SO Los, PJ Sellers, DA Dazlich, CJ Tucker, and DA Randall.
Sensitivity of climate to changes in NDVI. Journal of Climate, 13(13):2277–2292,
2000.
T Chen, RAM De Jeu, YY Liu, GR Van der Werf, and AJ Dolman. Using satellite
based soil moisture to quantify the water driven variability in NDVI: A case study
over mainland Australia. Remote Sensing of Environment, 140:330–338, 2014.
Galina Churkina and Steven W Running. Contrasting climatic controls on the esti-
mated productivity of global terrestrial biomes. Ecosystems, 1(2):206–215, 1998.
Ronan Collobert and Jason Weston. A unified architecture for natural language
processing: Deep neural networks with multitask learning. In Proceedings of the
25th international conference on Machine learning, pages 160–167. ACM, 2008.
RAM De Jeu, W Wagner, TRH Holmes, AJ Dolman, NC Van De Giesen, and
J Friesen. Global soil moisture patterns observed by space borne microwave ra-
diometers and scatterometers. Surveys in Geophysics, 29(4-5):399–420, 2008.
Wouter A Dorigo, Klaus Scipal, Robert M Parinussa, YY Liu, Wolfgang Wagner,
Richard AM De Jeu, and Vahid Naeimi. Error characterisation of global active and
passive microwave soil moisture datasets. Hydrology and Earth System Sciences,
14(12):2605–2616, 2010.
Alex Graves. Practical variational inference for neural networks. In Advances in
Neural Information Processing Systems, pages 2348–2356, 2011.
Arjun K Gupta and Daya K Nagar. Matrix variate distributions, volume 104. CRC
Press, 1999.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers:
Surpassing human-level performance on ImageNet classification. In Proceedings
of the IEEE International Conference on Computer Vision, pages 1026–1034,
2015.
Martin Heimann and Markus Reichstein. Terrestrial ecosystem carbon dynamics and
climate feedbacks. Nature, 451(7176):289–292, 2008.
Jose Miguel Hernandez-Lobato and Ryan P Adams. Probabilistic backpropagation
for scalable learning of Bayesian neural networks. arXiv preprint arXiv:1502.05336,
2015.
Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed,
Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N
Sainath, et al. Deep neural networks for acoustic modeling in speech recogni-
tion: The shared views of four research groups. IEEE Signal Processing Magazine,
29(6):82–97, 2012.
Martin Hirschi, Sonia I Seneviratne, Vesselin Alexandrov, Fredrik Boberg, Constanta
Boroneant, Ole B Christensen, Herbert Formayer, Boris Orlowsky, and
Petr Stepanek. Observational evidence for soil-moisture impact on hot extremes in
southeastern Europe. Nature Geoscience, 4(1):17–21, 2011.
P Illera, A Fernandez, and JA Delgado. Temporal evolution of the NDVI as an indicator
of forest fire danger. International Journal of Remote Sensing, 17(6):1093–1105,
1996.
Matayo Indeje, M Neil Ward, Laban J Ogallo, Glyn Davies, Maxx Dilley, and Assaf
Anyamba. Predictability of the normalized difference vegetation index in Kenya and
potential applications as an indicator of Rift Valley fever outbreaks in the Greater
Horn of Africa. Journal of Climate, 19(9):1673–1687, 2006.
Rengui Jiang, Jiancang Xie, Hailong He, Chun-Chao Kuo, Jiwei Zhu, and Mingxiang
Yang. Spatiotemporal variability and predictability of normalized difference vegetation
index (NDVI) in Alberta, Canada. International Journal of Biometeorology,
pages 1–15, 2016.
Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv
preprint arXiv:1412.6980, 2014.
Diederik P Kingma, Tim Salimans, and Max Welling. Variational dropout and the
local reparameterization trick. arXiv preprint arXiv:1506.02557, 2015.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with
deep convolutional neural networks. In Advances in Neural Information Processing
Systems, pages 1097–1105, 2012.
Solomon Kullback and Richard A Leibler. On information and sufficiency. The Annals
of Mathematical Statistics, 22(1):79–86, 1951.
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):
436–444, 2015.
Yi Y Liu, RM Parinussa, Wouter A Dorigo, Richard AM De Jeu, Wolfgang Wagner,
AIJM Van Dijk, Matthew F McCabe, and JP Evans. Developing an improved
soil moisture dataset by blending passive and active microwave satellite-based re-
trievals. Hydrology and Earth System Sciences, 15(2):425–436, 2011.
YY Liu, Wouter A Dorigo, RM Parinussa, Richard AM de Jeu, Wolfgang Wagner,
Matthew F McCabe, JP Evans, and AIJM Van Dijk. Trend-preserving blending of
passive and active microwave soil moisture retrievals. Remote Sensing of Environ-
ment, 123:280–297, 2012.
Alexander Lotsch, Mark A Friedl, Bruce T Anderson, and Compton J Tucker. Cou-
pled vegetation-precipitation variability observed from satellite and climate records.
Geophysical Research Letters, 30(14), 2003.
Christos Louizos and Max Welling. Structured and efficient variational deep learning
with matrix gaussian posteriors. NIPS, 2016.
Lina M Mercado, Nicolas Bellouin, Stephen Sitch, Olivier Boucher, Chris Hunting-
ford, Martin Wild, and Peter M Cox. Impact of changes in diffuse radiation on the
global land carbon sink. Nature, 458(7241):1014–1017, 2009.
Diego G Miralles, Wade T Crow, and Michael H Cosh. Estimating spatial sampling
errors in coarse-scale soil moisture estimates derived from point-scale observations.
Journal of Hydrometeorology, 11(6):1423–1429, 2010.
Ramakrishna R Nemani, Charles D Keeling, Hirofumi Hashimoto, William M Jolly,
Stephen C Piper, Compton J Tucker, Ranga B Myneni, and Steven W Running.
Climate-driven increases in global terrestrial net primary production from 1982 to
1999. Science, 300(5625):1560–1563, 2003.
Manfred Owe, Richard de Jeu, and Thomas Holmes. Multisensor historical clima-
tology of satellite-derived global land surface moisture. Journal of Geophysical
Research: Earth Surface, 113(F1), 2008.
Albert J Peters, Elizabeth A Walter-Shea, Lei Ji, Andres Vina, Michael Hayes, and
Mark D Svoboda. Drought monitoring with NDVI-based standardized vegetation
index. Photogrammetric Engineering and Remote Sensing, 68(1):71–75, 2002.
Jorge E Pinzon and Compton J Tucker. A non-stationary 1981–2012 AVHRR NDVI3g
time series. Remote Sensing, 6(8):6929–6960, 2014.
Amilcare Porporato and Ignacio Rodriguez-Iturbe. Ecohydrology-a challenging multidisciplinary
research perspective/ecohydrologie: une perspective stimulante de
recherche multidisciplinaire. Hydrological Sciences Journal, 47(5):811–821, 2002.
JW Rouse Jr, RH Haas, JA Schell, and DW Deering. Monitoring the vernal advancement
and retrogradation (green wave effect) of natural vegetation. Prog. Rep.
RSC 1978-1, page 93p, 1973.
David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning internal
representations by error propagation. Technical report, DTIC Document, 1985.
Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan
Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting.
Journal of Machine Learning Research, 15(1):1929–1958, 2014.
Nathan L Stephenson. Climatic control of vegetation distribution: the role of the
water balance. American Naturalist, pages 649–670, 1990.
RK Teal, B Tubana, K Girma, KW Freeman, DB Arnall, O Walsh, and WR Raun.
In-season prediction of corn grain yield potential using normalized difference vege-
tation index. Agronomy Journal, 98(6):1488–1494, 2006.
Michiel K van der Molen, Albertus J Dolman, Philippe Ciais, T Eglin, Nadine Gobron,
Beverly E Law, Patrick Meir, Wouter Peters, Oliver L Phillips, M Reichstein, et al.
Drought and ecosystem carbon cycling. Agricultural and Forest Meteorology, 151
(7):765–773, 2011.
W Wagner, Wouter Dorigo, Richard de Jeu, Diego Fernandez, Jerome Benveniste,
Eva Haas, and Martin Ertl. Fusion of active and passive microwave observations to
create an essential climate variable data record on soil moisture. In Proceedings of
the XXII International Society for Photogrammetry and Remote Sensing (ISPRS)
Congress, Melbourne, Australia, volume 25, 2012.