8/9/2019 Geostatistical Interpolation of Rainfall
Escuela Superior Politécnica del Litoral
Facultad de Ingeniería en Ciencias de la Tierra
Geostatistical Interpolation of Rainfall
R. Antón a, A. Cárdenas a, C. Freire a, J. Moreira a,*
a ESPOL-FICT, Campus Gustavo Galindo, Guayaquil, Ecuador.
Abstract
This work presents an explanation of Geostatistics applied to the spatial interpolation of rainfall. Concepts such as
the semivariogram and its parameters, Kriging and Kriging methods are explained in the theory. These concepts are then
applied in a computational procedure with ArcGIS to a case study at the Guayas River Basin (GRB). Rainfall data from
the GRB was provided and transformed to a point shapefile. From that data, the empirical semivariogram was built and
fitted to three model semivariograms with Ordinary Kriging. The semivariograms were compared with each other to choose
the best model. Surface maps and predicted values for data gaps were computed from monthly precipitation data for each
March from 2002 to 2006. Finally, it was concluded that additional parameters such as trends or data transformations
can refine the Kriging process inside ArcGIS and help compute better predicted values. It was also concluded that
the least squares method gives the best fit of a model, but the best-fitting model does not always give the best
description of reality.
Keywords: Geostatistics; Precipitation; Guayas River Basin; Kriging
1. Introduction
Precipitation is an important phase of the hydrologic cycle. It comes from clouds through condensation,
which occurs after evaporation. It is the main source of water for many types of water bodies and
river basins (Raghunath, 2006).
One of the most important types of precipitation is rainfall. Over large areas rainfall is not
uniformly distributed, and it also varies over time. Rainfall analysis is crucial for applications such as
agriculture and infrastructure planning. In that sense, the methods used in the hydrologic sciences for
estimating rainfall data gaps include interpolation and extrapolation (Arnell et al., 2001).
Interpolation is a method for estimating information from known data points. Using interpolation to fill
data gaps can introduce large uncertainties in the runoff estimated from that data. To reduce these
uncertainties, spatial interpolation is used when making estimates for hydrological models.
Geostatistics is one of the most popular techniques for spatial interpolation. It can estimate a regionalized
value at unsampled points with the help of a group of observed values. The method uses semivariograms to
characterize the spatial dependence of the property of interest. Some important methods for geostatistical
interpolation of data are Ordinary Kriging (ORK), Universal Kriging (UNK) and Kriging with External Drift
(KED). Other studies use radar measurements, which have been shown to be more accurate than rain gauge
data alone (Ly et al., 2011).
The present study focuses on the comparison of three semivariograms with Ordinary Kriging: Spherical
(OK-S), Exponential (OK-E) and Gaussian (OK-G), applied to monthly rainfall at several gauging stations
across the GRB.
Nomenclature
GRB Guayas River Basin
OK Ordinary Kriging
OK-S Ordinary Kriging with Spherical Semivariogram
OK-E Ordinary Kriging with Exponential Semivariogram
OK-G Ordinary Kriging with Gaussian Semivariogram
2. Theoretical Framework
2.1. Geostatistics and Geostatistical Interpolation
Geostatistics can be described as the theory of regionalized variables applied to estimating spatial
phenomena (Matheron, 1969), while interpolation can be described as the procedure that uses a geostatistical
method not only to estimate data gaps, but also to test a mathematical hypothesis: whether the
model is appropriate for describing the behavior of spatial phenomena (Carr and Glass, 1989).
2.2. General procedure for the estimation of data using geostatistical interpolation
Any geostatistical method for interpolation follows a general procedure (David and Blais, 1972): (i) building the
semivariogram; (ii) fitting an adequate model; (iii) producing block estimates or estimating data gaps. The
interpolated value depends on the relation between the known spatial data through the following formula for
spatial interpolation (Ly et al., 2011):
$$Z(g) = \sum_{i=1}^{n} \lambda_i \, Z(s_i) \qquad (1)$$
The variable Z(g) is the value to be estimated at a location g. Z(s_i) is the known value at each sampled
location s_i. The parameter λ_i is the weight given to each point; it expresses the importance of that
known value for the estimate.
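$$\gamma_{\text{spherical}}(h) = \begin{cases} \theta_0 + \theta_1 \left[ 1.5\,\dfrac{h}{\theta_2} - 0.5 \left( \dfrac{h}{\theta_2} \right)^{3} \right], & 0 < h \le \theta_2 \\ \theta_0 + \theta_1, & h > \theta_2 \\ 0, & h = 0 \end{cases} \qquad (5)$$
$$\gamma_{\text{exponential}}(h) = \begin{cases} \theta_0 + \theta_1 \left[ 1 - \exp\left( -\dfrac{3h}{\theta_2} \right) \right], & h > 0 \\ 0, & h = 0 \end{cases} \qquad (6)$$
$$\gamma_{\text{gaussian}}(h) = \begin{cases} \theta_0 + \theta_1 \left[ 1 - \exp\left( -3 \left( \dfrac{h}{\theta_2} \right)^{2} \right) \right], & h > 0 \\ 0, & h = 0 \end{cases} \qquad (7)$$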
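The parameters θ0, θ1 and θ2 are the nugget, the partial sill and the range, and h is the separation distance.
For the spherical model, the semivariogram value at the range θ2 is equal to the sill θ0+θ1. The exponential
and Gaussian models approach the sill asymptotically and reach about 95% of the partial sill at h = θ2.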
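The three model semivariograms can be written as short functions. The following Python sketch is only an illustration of equations 5 to 7; the function and argument names are not from the paper, and the parameter values in the usage lines are hypothetical.

```python
import math


def spherical(h, nugget, partial_sill, rng):
    """Spherical model, equation (5)."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + partial_sill  # the sill is reached at the range
    r = h / rng
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, partial_sill, rng):
    """Exponential model, equation (6)."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / rng))


def gaussian(h, nugget, partial_sill, rng):
    """Gaussian model, equation (7)."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / rng) ** 2))


# Hypothetical parameters: nugget 0, partial sill 1, range 50 km (in metres).
for model in (spherical, exponential, gaussian):
    print(model.__name__, round(model(50000.0, 0.0, 1.0, 50000.0), 4))
```

Evaluated at a distance equal to the range, the spherical function returns the full sill, while the exponential and Gaussian functions return 1 - exp(-3), about 0.95 of the partial sill.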
2.5. Kriging Methods and Ordinary Kriging
Kriging methods can be described as formulas or equation systems used to determine the weight of spatial
data, taking into account the hypothetical relationship between stations given by a model semivariogram
(Matheron, 1969). In Ordinary Kriging the weights are obtained such that the estimation is unbiased and the
variance is minimized. Ordinary Kriging assumes a constant but unknown mean; the weights λ_i, together with
a Lagrange multiplier μ, follow from the system below (Goovaerts, 1999):
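$$\begin{cases} \sum_{i=1}^{n} \lambda_i \, \gamma(s_i - s_j) + \mu = \gamma(s_j - g), & j = 1, \ldots, n \\ \sum_{i=1}^{n} \lambda_i = 1 \end{cases} \qquad (8)$$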
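As a rough illustration of how equations 8 and 1 work together, the Python sketch below builds the standard Ordinary Kriging system, solves it for the weights and applies the weighted sum. It is not the ArcGIS implementation. The helper names, the target location and the semivariogram parameters are assumptions made for the example; the three stations and their March 2004 values are taken from Fig. 2.

```python
import math


def exponential(h, nugget, partial_sill, rng):
    """Exponential semivariogram, equation (6)."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / rng))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(points, values, target, gamma):
    """Return (prediction, weights) at `target` using equations (8) and (1)."""
    n = len(points)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # One row per station j: sum_i lambda_i * gamma(s_i - s_j) + mu = gamma(s_j - g)
    a = [[gamma(dist(points[i], points[j])) for i in range(n)] + [1.0]
         for j in range(n)]
    a.append([1.0] * n + [0.0])  # unbiasedness constraint: weights sum to 1
    b = [gamma(dist(points[j], target)) for j in range(n)] + [1.0]
    solution = solve(a, b)
    weights = solution[:n]  # the last entry is the Lagrange multiplier mu
    return sum(w * z for w, z in zip(weights, values)), weights


# Pichilingue, Mocache and Rio Congo, March 2004 (Fig. 2).
points = [(667951, 9881590), (667564, 9869861), (645203, 9885571)]
values = [366.2, 405.3, 245.8]
# Hypothetical model parameters, not fitted to the data.
gamma = lambda h: exponential(h, 0.0, 10000.0, 60000.0)
prediction, weights = ordinary_kriging(points, values, (660000, 9880000), gamma)
print(round(prediction, 1), [round(w, 3) for w in weights])
```

Because the model semivariogram is zero at h = 0, predicting at the location of a station returns the measured value of that station.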
3. Methodology
3.1. Data setup
Excel files with the precipitation measured at thirty stations in the GRB, together with the station
coordinates (WGS 1984), were provided. The data contains daily rainfall from 1963 to 2008. Next, a polygon
shapefile of Ecuador's basins is downloaded, to ensure that the spatial interpolation does not pick up
rainfall data from outside the GRB.
Excel lists with all the stations are made with their spatial coordinates and the rainfall data. Each new Excel
file is transformed into a point shapefile in ArcMAP.
Finally, the data can be visualized and adjusted according to the requirements of the Kriging
method. The method also performs better as more stations become available. Because the years with the fewest
data gaps are 2002 to 2006, the paper covers the monthly rainfall of each March in that range.
3.2. Computational procedure
ArcMAP includes a tool called Geostatistical Analyst whose use follows a structured process: representing
the data, exploring the data, fitting a model, performing diagnostics and comparing models.
The Geostatistical Wizard offers several spatial interpolation methods, including Kriging. In the case of OK, the
procedure behind the Geostatistical Wizard starts by building the empirical semivariogram given by
equations 2 and 3. Next, the empirical semivariogram is fitted to a model with equation 4 and then 5, 6 or 7.
After that, the hypothetical spatial relationship between stations is established. Then, with equation 8, the
weights of the spatial data are obtained for use in equation 1.
Equation 1 determines the interpolated value at one point. The Geostatistical Wizard repeats that
procedure for many points so that a surface map can be built.
Finally, the Geostatistical Wizard builds a surface map for each model. In this case three semivariogram
models are needed for each of the five lists made in Excel, so fifteen surface maps are made in total.
4. Computational details
4.1. Data setting
The following files were given: 1) daily rainfall data for thirty stations at the GRB and 2) a list with the station
coordinates. The data in those files is combined in a new list made in Excel, as shown in Fig. 1.
STATION                     EAST     NORTH  PRECIPITATION (mm)
Babahoyo                  663179   9801311
Balzar                    621542   9850450
Caluma                    694159   9820519
Camposano2                566617   9823916
Chiriboga                 746874   9976710
El Corazon                714141   9874600
Flavio Alfaro             622159   9950129
Inmoriec Vergel           683167   9913530
Isla Bejucal              650581   9814769
La Cancha                 660527   9952918
La Capilla                611696   9812512
La Palizada               657185   9939866
Mocache                   667564   9869861
Murucumba                 650866   9900802
Nobol                     608748   9789961
Olmedo-Manabí             587791   9845685
Pedernales                605098  10006412
Pichilingue               667951   9881590
Pilalo                    723122   9895638
Presa                     639272   9898288
Puebloviejo               662200   9832160
Puerto Ila                684847   9947365
Puerto Limon              681399   9957183
Rio Congo                 645203   9885571
San Antonio del Delta Pa  695073   9904015
San Juan La Maná          695182   9898707
StoDomingoAirport         700326   9972354
Union 71                  668655   9938481
Vinces                    639054   9828748
Zapotal (Los Ríos)        687140   9850388
Fig. 1: Excel list with rainfall data and coordinates for each station. The precipitation column holds the monthly value for each March from 2002 to 2006.
The precipitation column must be filled with rainfall data for one month only, in this case March.
Because the analysis covers each March from 2002 to 2006, five lists are needed.
It is very important to save the data as .xls and not .xlsx.
Finally, rainfall data gaps must not be included in the list. For that, some considerations must be taken
into account: Caluma has data gaps in every March from 2002 to 2006. Chiriboga does not belong to the
GRB. Inmoriec Vergel, Isla Bejucal, Pedernales and StoDomingoAirport were not in operation between 2002 and
2006. El Corazon has data gaps in March 2005. Vinces has data gaps in March 2004. Fig. 2 shows how
the Excel list for the rainfall data of March 2004 must be prepared.
STATION                     EAST     NORTH  P4 (mm)
Babahoyo                  663179   9801311  314.3
Balzar                    621542   9850450  404.6
Camposano2                566617   9823916  259.5
El Corazon                714141   9874600  572.3
Flavio Alfaro             622159   9950129  333
La Cancha                 660527   9952918  395.4
La Capilla                611696   9812512  284.8
La Palizada               657185   9939866  264.2
Mocache                   667564   9869861  405.3
Murucumba                 650866   9900802  356.4
Nobol                     608748   9789961  102.4
Olmedo-Manabí             587791   9845685  390
Pichilingue               667951   9881590  366.2
Pilalo                    723122   9895638  161.1
Presa                     639272   9898288  197.8
Puebloviejo               662200   9832160  468.7
Puerto Ila                684847   9947365  244
Puerto Limon              681399   9957183  283.6
Rio Congo                 645203   9885571  245.8
San Juan La Maná          695182   9898707  443.4
Union 71                  668655   9938481  254.02
Zapotal (Los Ríos)        687140   9850388  460.9
Fig. 2: Excel list with rainfall data of March 2004. Stations with data gaps were removed from the list.
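The filtering described above could also be scripted before the list is saved. The Python sketch below is an assumption about how that might be done, not part of the ArcGIS procedure: it separates the stations with a measured value (the Kriging input) from those with a gap (the locations to predict later) and drops stations outside the basin. The function name and the use of None to mark a gap are hypothetical; coordinates and values come from Fig. 1 and Fig. 2.

```python
# (station, east, north, March 2004 precipitation in mm); None marks a data gap.
stations = [
    ("Babahoyo", 663179, 9801311, 314.3),
    ("Balzar", 621542, 9850450, 404.6),
    ("Caluma", 694159, 9820519, None),     # gap in every March, 2002 to 2006
    ("Chiriboga", 746874, 9976710, None),  # outside the GRB
    ("Nobol", 608748, 9789961, 102.4),
    ("Vinces", 639054, 9828748, None),     # gap in March 2004
]


def split_stations(rows, outside_basin=()):
    """Return (kriging_input, prediction_targets) for one monthly list."""
    inside = [r for r in rows if r[0] not in outside_basin]
    measured = [r for r in inside if r[3] is not None]
    gaps = [r for r in inside if r[3] is None]
    return measured, gaps


measured, gaps = split_stations(stations, outside_basin={"Chiriboga"})
print([r[0] for r in measured])
print([r[0] for r in gaps])
```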
4.2. Transforming the data to shapefile
Once the Excel list is ready, ArcMAP is opened. Using the "Add Data" option, the Excel list is
added to ArcMAP. With a right click on the Excel layer, the "Display XY Data" option is selected. Next,
the fields for the X and Y coordinates are specified: EAST for X and NORTH for Y. The coordinate
system of the input coordinates is WGS84, UTM Zone 17 S.
The Excel list is now also shown as a point layer with all the stations on the map. This layer is
then exported as a new point shapefile. The "Label Features" option is selected to identify the station
names.
A polygon shapefile with Ecuador's basins can be downloaded from the IGM website
(http://www.igm.gob.ec/cms/files/downloads/mapafisico.html). After this shapefile is added to the map, only
the GRB is exported to a new layer. In this way, data outside the basin, such as the Chiriboga station, is ignored.
Finally, the layer symbology is edited so that the point color is graduated from white for the lowest
values to black for the highest values. The rainfall measured at each station can then be compared point by
point on the map, which is also useful for comparing surface maps. The map in Fig. 3 shows the GRB with the
stations and their precipitation data for March 2004.
The Semivariogram/Covariance cloud builds the empirical semivariogram with equations 2 and 3. It allows
the spatial autocorrelation between the measured sample points to be examined. By selecting points close to
zero on the x-axis, the relationships between nearby stations can be observed (Fig. 5).
Fig. 5: Semivariogram/Covariance cloud. GRB, March 2004. The selected points show a strong relationship, so predicted
values at locations close to those stations are the most accurate.
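The construction of the cloud and of the binned empirical semivariogram can be sketched in a few lines of Python. The sketch assumes the classical estimator, in which each pair of stations contributes half the squared difference of its values and the pairs are averaged within distance bins; that this matches equations 2 and 3 exactly is an assumption. The function names, the lag size and the number of lags are hypothetical; the four stations and their values are from Fig. 2.

```python
import math
from itertools import combinations


def semivariogram_cloud(points, values):
    """Return (distance, semivariance) for every pair of stations."""
    cloud = []
    for i, j in combinations(range(len(points)), 2):
        d = math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
        cloud.append((d, 0.5 * (values[i] - values[j]) ** 2))
    return cloud


def empirical_semivariogram(cloud, lag_size, n_lags):
    """Average the cloud in bins of width lag_size.

    Returns (bin centre, mean semivariance, number of pairs) for each
    non-empty bin.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for d, g in cloud:
        k = int(d // lag_size)
        if k < n_lags:
            sums[k] += g
            counts[k] += 1
    return [((k + 0.5) * lag_size, sums[k] / counts[k], counts[k])
            for k in range(n_lags) if counts[k] > 0]


# Pichilingue, Mocache, Rio Congo and Presa, March 2004 (Fig. 2).
points = [(667951, 9881590), (667564, 9869861),
          (645203, 9885571), (639272, 9898288)]
values = [366.2, 405.3, 245.8, 197.8]
cloud = semivariogram_cloud(points, values)
for centre, gamma_hat, pairs in empirical_semivariogram(cloud, 10000.0, 5):
    print(centre, round(gamma_hat, 1), pairs)
```

Points of the cloud near zero on the distance axis with a small semivariance correspond to nearby stations with similar rainfall, which is the relationship examined in Fig. 5.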
4.4. Fitting a model
The Geostatistical Wizard is used to fit a model. The first step is to select the
interpolation method (OK) and the measured sample values as the input data. The output is of the
prediction type. Transformations and trend removal were not used in the analysis.
Next, the model semivariograms (OK-S, OK-E and OK-G, with equations 5, 6 and 7) are selected. The model
parameters for each semivariogram can be optimized using an iterative cross-validation technique and the
weighted least squares algorithm with equation 4. If those methods are not reliable, the default parameters
can be modified: nugget, partial sill and lag. In Fig. 6 the empirical semivariogram is being fitted to a model.
Fig. 6: Semivariogram modeling. GRB, March 2004. The "Optimize model" option is used to modify the default parameters. The
empirical semivariogram is fitted to an exponential semivariogram model.
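The idea behind a weighted least squares fit can be illustrated with a small Python sketch. It assumes the criterion of Cressie (1985), in which each lag is weighted by its number of pairs, and it replaces the iterative optimizer with a plain grid search, so it is not the algorithm used by the Geostatistical Wizard. The function names, the candidate grids and the synthetic empirical values are all hypothetical.

```python
import math
from itertools import product


def exponential(h, nugget, partial_sill, rng):
    """Exponential semivariogram, equation (6)."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / rng))


def wls_cost(empirical, model, params):
    """Sum over lags of N(h) * (gamma_hat(h) / gamma_model(h) - 1)^2.

    `empirical` holds (lag distance, empirical semivariance, number of pairs).
    Lags must be positive and the partial sill non-zero so the model is not 0.
    """
    return sum(n * (g / model(h, *params) - 1.0) ** 2 for h, g, n in empirical)


def fit_by_grid(empirical, model, nuggets, partial_sills, ranges):
    """Return the (nugget, partial sill, range) candidate with the lowest cost."""
    return min(product(nuggets, partial_sills, ranges),
               key=lambda p: wls_cost(empirical, model, p))


# Synthetic empirical semivariogram generated from known parameters
# (nugget 100, partial sill 900, range 40 km), 10 pairs per lag.
lags = (5000.0, 15000.0, 25000.0, 35000.0, 45000.0)
empirical = [(h, exponential(h, 100.0, 900.0, 40000.0), 10) for h in lags]

best = fit_by_grid(empirical, exponential,
                   nuggets=(0.0, 100.0, 200.0),
                   partial_sills=(600.0, 900.0, 1200.0),
                   ranges=(20000.0, 40000.0, 60000.0))
print(best)
```

With real data the empirical values would come from the binned semivariogram, and the grid would be replaced by a finer search or an iterative optimizer.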
The mathematical formula representing the spatial autocorrelation between stations is used in equation 8
multiple times to compute the weights of the rainfall data. Finally, the Geostatistical Wizard uses
equation 1 with the results of equation 8 to plot the surface map, as seen in Fig. 7.
Fig. 7: Prediction map for GRB, March 2004. Not only the original stations were plotted, but also those with data gaps. The color of the
points is related to the color of the surfaces, which indicates some efficacy of the Kriging method.
4.5. Performing diagnostics
The main criterion used for the diagnostics was to compare the predicted values and their patterns with the
surface map. It is also useful to recall the criteria used when exploring the data, especially the
parameters observed in the empirical semivariogram.
4.6. Comparing models
Models can be compared by using the "Compare" option in the surface map layer options. The statistical
criterion for choosing the best model is to select the one that offers: a) a mean prediction error as
close to zero as possible, b) a root-mean-square standardized prediction error close to 1, and c) a
root-mean-square error and an average standard error that are as small as possible (Fig. 8). A visual
criterion is to check whether the model follows the same patterns as the measured sample points.
Fig. 8: Cross-validation comparison of two Kriging methods for GRB, March 2004: OK-E on the left behaves better than
OK-S on the right. The average standard errors are high in both cases.
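The cross-validation statistics named in criteria a) to c) can be computed from three lists: the measured values, the values predicted when each station is left out, and the corresponding prediction standard errors. The Python sketch below assumes the usual definitions of these statistics; the function name and the sample numbers are hypothetical and are not results from the paper.

```python
import math


def cross_validation_stats(measured, predicted, std_errors):
    """Summary statistics used to compare Kriging models."""
    n = len(measured)
    errors = [p - m for p, m in zip(predicted, measured)]
    standardized = [e / s for e, s in zip(errors, std_errors)]
    return {
        # a) should be close to zero
        "mean_error": sum(errors) / n,
        # b) should be close to one
        "rms_standardized": math.sqrt(sum(z * z for z in standardized) / n),
        # c) both should be as small as possible
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "average_standard_error": math.sqrt(sum(s * s for s in std_errors) / n),
    }


# Hypothetical leave-one-out results for four stations (mm).
stats = cross_validation_stats(
    measured=[300.0, 400.0, 250.0, 450.0],
    predicted=[320.0, 380.0, 270.0, 430.0],
    std_errors=[25.0, 25.0, 25.0, 25.0],
)
for name, value in stats.items():
    print(name, round(value, 3))
```

A root-mean-square standardized error below 1 suggests that the model overestimates the variability of its predictions, and a value above 1 suggests that it underestimates it.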
4.7. Predicting data gaps
Data gaps can be predicted with the "Validation/Prediction" tool of the Kriging layer. A new Excel list with
all the stations, including those with data gaps, is added to the map and transformed to a point shapefile; a
visual result is shown in Fig. 7. This shapefile is used as the input data together with the Kriging layer. The
result is a new point layer whose attribute table contains all the stations and their predicted rainfall, as
seen in Fig. 9.
STATION                     EAST     NORTH  Predicted Precipitation (mm)
Babahoyo                  663179   9801311  314.3
Balzar                    621542   9850450  404.6
Caluma                    694159   9820519  411.8
Camposano2                566617   9823916  259.5
El Corazon                714141   9874600  572.3
Flavio Alfaro             622159   9950129  333.0
Inmoriec Vergel           683167   9913530  335.2
Isla Bejucal              650581   9814769  369.1
La Cancha                 660527   9952918  395.4
La Capilla                611696   9812512  284.8
La Palizada               657185   9939866  264.2
Mocache                   667564   9869861  405.3
Murucumba                 650866   9900802  356.4
Nobol                     608748   9789961  102.4
Olmedo-Manabí             587791   9845685  390.0
Pedernales                605098  10006412  315.5
Pichilingue               667951   9881590  366.2
Pilalo                    723122   9895638  161.1
Presa                     639272   9898288  197.8
Puebloviejo               662200   9832160  468.7
Puerto Ila                684847   9947365  244.0
Puerto Limon              681399   9957183  283.6
Rio Congo                 645203   9885571  245.8
San Antonio del Delta Pa  695073   9904015  403.3
San Juan La Maná          695182   9898707  443.4
StoDomingoAirport         700326   9972354  293.5
Union 71                  668655   9938481  254.0
Vinces                    639054   9828748  372.5
Zapotal (Los Ríos)        687140   9850388  460.9
Fig. 9: GRB stations with their predicted rainfall for March 2004. Precipitation values are in mm.
5. Results and discussion
5.1. Exploring the data, comparison of models and plotting surface maps
Besides prediction surface maps, prediction standard error maps can be produced from the prediction layer.
Fig. 10 shows an example for March 2004. Lighter colors represent lower standard error, while darker
colors represent higher standard error. The prediction standard error map is closely related to the
concept of the semivariogram seen in section 4.3 and Fig. 5: the closer together a group of stations is, the
better the predicted values in the surrounding area become. Because the standard error depends mainly on the
distance to the stations and on their location, it is reasonable that the area beyond Caluma in the southeast
is darker. Predictions in that area are by far the least accurate.
Fig. 10: Prediction Standard Error map for GRB, March 2004. Predicted data beyond Caluma to the southeast will not be
accurate.
From the prediction standard error maps, it is seen in general that better predictions are
made between:
La Cancha, La Palizada, Union 71, Puerto Limon and Puerto Ila
Presa, Murucumba and Rio Congo
Pichilingue and Mocache
San Juan La Maná, Pilalo, San Antonio del Delta Pa and El Corazon
Camposano2 and Olmedo-Manabí
Olmedo-Manabí and Balzar
Balzar and Vinces
La Capilla and Nobol
La Capilla and Vinces
Vinces and Puebloviejo
Babahoyo and Puebloviejo
Zapotal (Los Ríos) and Mocache
Each of the three models was compared using the criteria explained in 4.5 and 4.6. The models
giving the best predicted values were: for March 2002, OK-G; 2003, OK-G; 2004, OK-E; 2005, OK-E;
2006, OK-S. After the best model was selected, the surface maps shown in Fig. 11 were made.
Higher precipitation values are found in the northern part of the GRB, whereas lower
values are found towards the southwest. This is consistent with the denser vegetation cover in the
northeast, up to the Andean foothills. Outcomes for the southeast are not good enough, for the reason
explained before (Fig. 10). In general, the interpolation results could have been better if at least two
stations had been located in that area.
In March 2002 there was more rainfall than in the following years, especially in the north. Between 2002 and
2003 there is a notable difference in rainfall in the north of the GRB. In 2004 the rainfall is more
concentrated in the central area of the GRB than in other years. In 2005 and 2006 the lower precipitation is
concentrated in the southwest, but 2005 has higher precipitation values in that area than 2006.
March 2003 had the least rainfall of all the years, followed by March 2004.
Fig. 11: Precipitation surface maps. Top row, from left to right: 2002 to 2004. Bottom row, from left to right: 2005 and 2006.
The Inmoriec Vergel measurement location had the highest variability in terms of rainfall. On the other
hand, Pilalo had the most uniform precipitation values.
The results carry an error, which is quantified by the Kriging method. The procedure
can be refined by adding more stations so that the error diminishes.
6. Conclusions and recommendations
Geostatistical methods can be more or less accurate than deterministic methods depending on the variables
taken into account. Deterministic methods such as Inverse Distance Weighting rely only on distance. Geostatistical
methods depend on many factors, such as the spatial relationship between stations, elevation and environmental
behavior.
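For contrast with Kriging, a deterministic estimate by Inverse Distance Weighting needs only the distances to the stations. The Python sketch below is a generic illustration, not a method applied in this study; the function name and the default power of 2 are assumptions. The two stations and their March 2004 values are from Fig. 2, and the target location is hypothetical.

```python
import math


def idw(points, values, target, power=2.0):
    """Inverse Distance Weighting: weights are proportional to 1 / d^power."""
    numerator = 0.0
    denominator = 0.0
    for (x, y), z in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return z  # the target coincides with a station
        w = 1.0 / d ** power
        numerator += w * z
        denominator += w
    return numerator / denominator


# Pichilingue and Mocache, March 2004 (Fig. 2); hypothetical target location.
points = [(667951, 9881590), (667564, 9869861)]
values = [366.2, 405.3]
print(round(idw(points, values, (667700, 9875000)), 1))
```

Unlike Kriging, the weights here do not depend on a fitted semivariogram, and no standard error is produced alongside the estimate.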
The advantage of Kriging methods is that they compute not only the weight for each point, but also
the error associated with each estimate, based on a hypothetical model: a mathematical description of the
spatial distribution of rainfall. Because those errors are computed statistically, Kriging produces not only
prediction surface maps, as deterministic methods do, but also standard error surface maps, a feature
exclusive to geostatistical methods.
Kriging prediction surface maps can be combined with their standard error maps. This combination is
useful because not only is the precipitation estimated, but the reliability of the prediction is also
computed and can be seen visually.
The fit of a model to the empirical semivariogram can be improved by iterating the weighted
least squares criterion. However, the fit depends mainly on the number of stations and on observations of the
behavior of the phenomenon. That means that the least squares method gives the best fit of a model, but the
best-fitting model does not always give the best description of reality. It is recommended to have at least
sixty measured stations before Kriging.
For the case study, the comparison of model semivariograms showed that the Gaussian and
Exponential semivariograms described the spatial relationship of the data better. The fact that the Gaussian
semivariogram formula is similar to the Exponential one can be related to the random function, in this case
rainfall.
The spatial distribution of rainfall in the Guayas River Basin in March can be described as higher precipitation
values in the north and northeast, and lower values to the south, closer to the mouth of the basin. There are
cases, such as March 2004, in which the rainfall is distributed more uniformly, with the highest values in the
central area and lower values to the north and south. This is related to characteristics of the Guayas River
Basin such as the vegetation cover and forests closer to Santo Domingo, or the proximity of the Andean
foothills, which results in orographic rainfall in those zones (from north to east).
It was observed that characteristics of the analyzed zone such as elevation and vegetation play an important
role in the spatial interpolation of rainfall. This is the basis of other types of Kriging. It is suggested
that the study also be conducted for other types of Kriging, such as Universal Kriging, Kriging with External
Drift and Co-Kriging, as long as sufficient measurement stations are available (around 60).
The influence of data transformation and trend removal on Kriging methods could also be
investigated. A comparison between deterministic and geostatistical methods, such as IDW and Kriging, can be
made in future research, observing the effect of the number of stations analyzed.
7. References
Arnell, N.W., C. Liu, R. Compagnucci, L. da Cunha, K. Hanaki, C. Howe, G. Mailu, I. Shiklomanov and E. Stakhiv, 2001: Hydrology and water resources. Climate Change 2001: Impacts, Adaptation and Vulnerability. Contribution of Working Group II to the Third Assessment Report of the Intergovernmental Panel on Climate Change, J.J. McCarthy, O.F. Canziani, N.A. Leary, D.J. Dokken and K.S. White, Eds., Cambridge University Press, Cambridge, 191-234.
Carr, J., & Glass, C. (1989). Use of Geostatistics For Accurate Mapping of Earthquake Ground Motion. Geophysical Journal International, 97(1), 31-40.
Cressie, N. (1985). Fitting variogram models by weighted least squares. Journal of the International Association for Mathematical Geology, 17(5), 563-586.
David, M., & Blais, R. A. (1972). Geostatistical Ore Reserve Estimation. Application of computer methods in the mineral industry: Proceedings of the tenth International Symposium, Johannesburg, 10th April-14th April, 1972. Johannesburg: South African Institute of Mining and Metallurgy, 27-34.
Goovaerts, P. (1999). Using elevation to aid the geostatistical mapping of rainfall erosivity. CATENA, 34(3), 227-242.
Ly, S., Charles, C., & Degré, A. (2011). Geostatistical interpolation of daily rainfall at catchment scale: the use of several variogram models in the Ourthe and Amblève catchments, Belgium. Hydrology & Earth System Sciences, 15(7).
Matheron, G. (1969). Cours de géostatistique. Paris: École Nationale Supérieure des Mines de Paris.
Raghunath, H. (2006). Hydrology principles, analysis and design (Rev. 2nd ed.). New Delhi: New Age International (P).