
Escuela Superior Politécnica del Litoral

Facultad de Ingeniería en Ciencias de la Tierra

Geostatistical Interpolation of rainfall

R. Antón^a, A. Cárdenas^a, C. Freire^a, J. Moreira^a,*

^a Espol-Fict, Campus Gustavo Galindo, Guayaquil, Ecuador.

Abstract

This work explains Geostatistics as applied to the spatial interpolation of rainfall. The theory section covers the semivariogram and its parameters, Kriging, and Kriging methods. These concepts are then applied in a computational procedure with ArcGIS to a case study in the Guayas River Basin (GRB). Rainfall data from the GRB were provided and transformed into a point shapefile. From those data, the empirical semivariogram was built and fitted to three model semivariograms under Ordinary Kriging. The models were compared with each other to choose the best one. Surface maps and predicted values for data gaps were computed from monthly precipitation data for each March from 2002 to 2006. It was concluded that additional steps, such as trend removal or data transformation, may refine the Kriging process in ArcGIS and help compute better predicted values. It was also concluded that the least squares method gives the best fit of a model, but the best-fitting model does not always give the best description of reality.

Keywords: Geostatistics; Precipitation; Guayas River Basin; Kriging

1. Introduction

Precipitation is an important phase of the hydrologic cycle. It falls from clouds as a result of condensation, which occurs after evaporation. It is the main source of water for many types of water bodies and river basins (Raghunath, 2006).

One of the most important types of precipitation is rainfall. Over large areas rainfall is not uniformly distributed, and it varies over time. Rainfall analysis is crucial for applications such as agriculture and infrastructure planning. For that purpose, the methods used in the hydrologic sciences for estimating rainfall data gaps include interpolation and extrapolation (Arnell and Liu, 2001).



Interpolation is a method for estimating unknown values from known data points. Using interpolation methods to fill data gaps leads to large uncertainties in the runoff estimates derived from them. To reduce those uncertainties, spatial interpolation is used to produce the estimates needed by hydrological models.

Geostatistics is one of the most popular techniques for spatial interpolation. It estimates a regionalized value at unsampled points from a group of observed values. The method uses semivariograms to characterize the spatial dependence of the property of interest. Important methods for geostatistical interpolation include Ordinary Kriging (OK), Universal Kriging (UNK) and Kriging with External Drift (KED). Other studies have used radar measurements, which have been reported to be more accurate than rainfall data alone (Ly et al., 2011).

The present study focuses on comparing three semivariogram models under Ordinary Kriging, Spherical (OK-S), Exponential (OK-E) and Gaussian (OK-G), applied to monthly rainfall at several gauging stations across the GRB.

Nomenclature

GRB Guayas River Basin

OK Ordinary Kriging

OK-S Ordinary Kriging with Spherical Semivariogram

OK-E Ordinary Kriging with Exponential Semivariogram

OK-G Ordinary Kriging with Gaussian Semivariogram

2. Theoretical Framework

2.1. Geostatistics and Geostatistical Interpolation

Geostatistics can be described as the theory of regionalized variables applied to estimating spatial phenomena (Matheron, 1969), while geostatistical interpolation can be described as a procedure that uses a geostatistical method not only to estimate data gaps but also to test a mathematical hypothesis: whether the model is appropriate for describing the behavior of a spatial phenomenon (Carr and Glass, 1989).

2.2. General procedure for the estimation of data using geostatistical interpolation

Any geostatistical interpolation method follows a general procedure (David and Blais, 1972): (i) building the semivariogram, (ii) fitting an adequate model, and (iii) producing block estimates or estimating data gaps. The interpolated value depends on the relation between the known spatial data, expressed by the following formula for spatial interpolation (Ly et al., 2011):

$$Z(g) = \sum_{i=1}^{n} \lambda_i \, Z(s_i) \qquad (1)$$

The variable Z(g) is the value to be estimated at a location g. Z(s_i) is the known value at each sampled location s_i, and n is the number of known points. The parameter λ_i is the weight given to a single point; it expresses the importance of that known value for the estimate.
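As a minimal sketch of equation 1 in Python, the estimate is simply a weighted sum of the known values. The three rainfall values are the March 2004 records of Pichilingue, Mocache and Rio Congo from this paper; the weights are hypothetical (they are not Kriging weights) and were chosen only so that they add up to one.

```python
def interpolate(weights, values):
    """Equation 1: Z(g) = sum_i lambda_i * Z(s_i)."""
    if len(weights) != len(values):
        raise ValueError("one weight per known value is required")
    return sum(w * z for w, z in zip(weights, values))


# Hypothetical weights for illustration only; they sum to 1.
weights = [0.5, 0.3, 0.2]
# March 2004 rainfall (mm) at Pichilingue, Mocache and Rio Congo.
values = [366.2, 405.3, 245.8]

estimate = interpolate(weights, values)
print(f"Estimated rainfall: {estimate:.2f} mm")
```

In Kriging the weights are not chosen by hand; they come from solving the Kriging system for each target location.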




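$$\gamma_{\text{spherical}}(h) = \begin{cases} \theta_0 + \theta_1 \left[ 1.5\,\dfrac{h}{\theta_2} - 0.5 \left( \dfrac{h}{\theta_2} \right)^{3} \right], & 0 < h \le \theta_2 \\ \theta_0 + \theta_1, & h > \theta_2 \\ 0, & h = 0 \end{cases} \qquad (5)$$

$$\gamma_{\text{exponential}}(h) = \begin{cases} \theta_0 + \theta_1 \left[ 1 - \exp\left( -\dfrac{3h}{\theta_2} \right) \right], & h > 0 \\ 0, & h = 0 \end{cases} \qquad (6)$$

$$\gamma_{\text{gaussian}}(h) = \begin{cases} \theta_0 + \theta_1 \left[ 1 - \exp\left( -3 \left( \dfrac{h}{\theta_2} \right)^{2} \right) \right], & h > 0 \\ 0, & h = 0 \end{cases} \qquad (7)$$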

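The parameters θ0, h and θ2 are referred to as the nugget, the distance and the range, and θ1 is the partial sill. For the spherical model the semivariogram value at the range θ2 equals the sill θ0 + θ1; the exponential and Gaussian models approach the sill asymptotically and, with the factor 3 in the exponent, reach about 95% of the partial sill at h = θ2.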
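The three model semivariograms can be sketched in Python as follows. The functions assume the usual forms of the spherical, exponential and Gaussian models with a nugget, a partial sill and a range, and the factor 3 in the exponent; the parameter values in the example (zero nugget, partial sill of 10000 mm², range of 50 km) are hypothetical and are not fitted to the GRB data.

```python
import math


def spherical(h, nugget, partial_sill, rng):
    """Spherical model: reaches the sill exactly at h = rng."""
    if h == 0:
        return 0.0
    if h > rng:
        return nugget + partial_sill
    r = h / rng
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, partial_sill, rng):
    """Exponential model: approaches the sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / rng))


def gaussian(h, nugget, partial_sill, rng):
    """Gaussian model: parabolic near the origin, asymptotic sill."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / rng) ** 2))


# Hypothetical parameters: nugget (mm^2), partial sill (mm^2), range (m).
params = (0.0, 10000.0, 50000.0)
for model in (spherical, exponential, gaussian):
    at_range = model(50000.0, *params)
    print(f"{model.__name__:12s} value at the range: {at_range:.1f}")
```

Evaluating each model at the range shows the difference in behavior: the spherical model returns the full sill, while the other two return roughly 95% of it.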

2.5. Kriging Methods and Ordinary Kriging

Kriging methods can be described as formulas or systems of equations used to determine the weights of the spatial data, taking into account the hypothetical relationship between stations given by a model semivariogram (Matheron, 1969). In Ordinary Kriging the weights are obtained such that the estimate is unbiased and the variance is minimized. Ordinary Kriging assumes the following model, where μ is an unknown constant (Goovaerts, 1999):

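$$\begin{cases} \sum_{i=1}^{n} \lambda_i \, \gamma(s_i - s_j) + \mu = \gamma(s_j - g), & j = 1, \ldots, n \\ \sum_{i=1}^{n} \lambda_i = 1 \end{cases} \qquad (8)$$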
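The following Python sketch solves an Ordinary Kriging system of the standard form: for every station j, the weighted sum of the semivariances to the other stations plus an unknown constant equals the semivariance between station j and the target, and the weights sum to one. It is an illustration under stated assumptions, not the ArcGIS implementation: the exponential model and its parameters (zero nugget, partial sill of 10000 mm², range of 60 km) and the target location are hypothetical, and a small Gaussian elimination routine stands in for a linear algebra library. The station coordinates and March 2004 values are taken from this paper.

```python
import math


def exponential(h, nugget, partial_sill, rng):
    """Exponential semivariogram; zero at h = 0."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / rng))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_kriging(points, values, target, gamma):
    """Return (estimate, weights, mu) for one target location."""
    n = len(points)
    # Semivariances between stations, bordered by the unbiasedness constraint.
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Semivariances between each station and the target, plus the constraint.
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    solution = solve(a, b)
    weights, mu = solution[:n], solution[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    return estimate, weights, mu


def gamma(h):
    # Hypothetical parameters, not fitted to the GRB data.
    return exponential(h, 0.0, 10000.0, 60000.0)


# Pichilingue, Mocache, Rio Congo and Murucumba (UTM 17S, March 2004, mm).
points = [(667951, 9881590), (667564, 9869861),
          (645203, 9885571), (650866, 9900802)]
values = [366.2, 405.3, 245.8, 356.4]
target = (660000, 9885000)  # hypothetical unsampled location

estimate, weights, mu = ordinary_kriging(points, values, target, gamma)
print("weights:", [round(w, 3) for w in weights])
print(f"estimate: {estimate:.1f} mm")
```

Two properties are easy to check with this sketch: the weights always add up to one, and predicting at the location of a station returns the value measured there.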

3. Methodology

3.1. Data setup

Excel files are provided with the precipitation measured at thirty stations in the GRB, together with the station coordinates (WGS 1984). The files contain daily rainfall data from 1963 to 2008. Next, a polygon shapefile of the basins of Ecuador is downloaded, to ensure that the spatial interpolation does not use rainfall data from outside the GRB.

Excel lists of all the stations are made with their spatial coordinates and rainfall data. Each new Excel file is transformed into a point shapefile in ArcMap.

Finally, the data can be visualized and adjusted to the requirements of the Kriging method. The method becomes more effective as more stations are available. Because the years with the fewest data gaps are 2002 to 2006, the paper covers monthly rainfall data for each March in that range.



3.2. Computational procedure

ArcMap has a tool called Geostatistical Analyst, whose use follows a structured process: representing the data, exploring the data, fitting a model, performing diagnostics and comparing models.

The Geostatistical Wizard offers several spatial interpolation methods, including Kriging. For OK, the procedure behind the Geostatistical Wizard starts by building the empirical semivariogram given by equations 2 and 3. Next, the empirical semivariogram is fitted to a model with equation 4 and then equation 5, 6 or 7. This establishes the hypothetical spatial relationship between stations. Then, with equation 8, the weights of the spatial data are obtained for use in equation 1.

Equation 1 determines the interpolated value at one point. The Geostatistical Wizard repeats that procedure for many points so that a surface map can be built.

Finally, the Geostatistical Wizard builds a surface map for each model. In this case three semivariogram models are needed for each of the five lists made in Excel, so fifteen surface maps are made in total.

4. Computational details

4.1. Data setting

The following files were given: (1) daily rainfall data for thirty stations in the GRB and (2) a list with the station coordinates. The data in those files are combined in a new Excel list like the one shown in Fig. 1.

STATION                   EAST        NORTH   PRECIPITATION (mm)
Babahoyo                  663179    9801311
Balzar                    621542    9850450
Caluma                    694159    9820519
Camposano2                566617    9823916
Chiriboga                 746874    9976710
El Corazon                714141    9874600
Flavio Alfaro             622159    9950129
Inmoriec Vergel           683167    9913530
Isla Bejucal              650581    9814769
La Cancha                 660527    9952918
La Capilla                611696    9812512
La Palizada               657185    9939866
Mocache                   667564    9869861
Murucumba                 650866    9900802
Nobol                     608748    9789961
OlmedoManabí              587791    9845685
Pedernales                605098   10006412
Pichilingue               667951    9881590
Pilalo                    723122    9895638
Presa                     639272    9898288
Puebloviejo               662200    9832160
Puerto Ila                684847    9947365
Puerto Limon              681399    9957183
Rio Congo                 645203    9885571
San Antonio del Delta Pa  695073    9904015
San Juan La Maná          695182    9898707
StoDomingoAirport         700326    9972354
Union 71                  668655    9938481
Vinces                    639054    9828748
Zapotal (Los Ríos)        687140    9850388

Fig. 1: Excel list with rainfall data and coordinates for each station. Precipitation will be monthly for each March from 2002 to 2006.




The precipitation column must be filled with the rainfall of one month only, in this case March. Because the analysis covers each March from 2002 to 2006, five lists are needed. It is very important to save the data as .xls and not .xlsx.

Finally, stations with rainfall data gaps must not be included in the list. The following considerations apply: Caluma has data gaps in each March from 2002 to 2006. Chiriboga does not belong to the GRB. Inmoriec Vergel, Isla Bejucal, Pedernales and StoDomingoAirport were not operating between 2002 and 2006. El Corazon has data gaps in March 2005. Vinces has data gaps in March 2004. Fig. 2 shows how the Excel list for March 2004 must look.

STATION                   EAST        NORTH   P4 (mm)
Babahoyo                  663179    9801311   314.3
Balzar                    621542    9850450   404.6
Camposano2                566617    9823916   259.5
El Corazon                714141    9874600   572.3
Flavio Alfaro             622159    9950129   333
La Cancha                 660527    9952918   395.4
La Capilla                611696    9812512   284.8
La Palizada               657185    9939866   264.2
Mocache                   667564    9869861   405.3
Murucumba                 650866    9900802   356.4
Nobol                     608748    9789961   102.4
OlmedoManabí              587791    9845685   390
Pichilingue               667951    9881590   366.2
Pilalo                    723122    9895638   161.1
Presa                     639272    9898288   197.8
Puebloviejo               662200    9832160   468.7
Puerto Ila                684847    9947365   244
Puerto Limon              681399    9957183   283.6
Rio Congo                 645203    9885571   245.8
San Juan La Maná          695182    9898707   443.4
Union 71                  668655    9938481   254.02
Zapotal (Los Ríos)        687140    9850388   460.9

Fig. 2: Excel list with rainfall data of March 2004. Stations with data gaps were removed from the list.
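The selection of stations for each yearly list can be sketched in Python as below. This is only an illustration of the filtering rule, not part of the ArcMap workflow: the helper name `stations_for` is hypothetical, the sample of station names is partial, and only the exclusions stated explicitly in the text are encoded.

```python
# Stations left out of every list (permanent gaps, not operating, or outside the GRB).
ALWAYS_EXCLUDED = {
    "Caluma", "Chiriboga", "Inmoriec Vergel",
    "Isla Bejucal", "Pedernales", "StoDomingoAirport",
}
# Stations with a data gap in one specific March.
GAPS_BY_YEAR = {2004: {"Vinces"}, 2005: {"El Corazon"}}


def stations_for(year, all_stations):
    """Return the stations that may enter the Excel list of a given March."""
    excluded = ALWAYS_EXCLUDED | GAPS_BY_YEAR.get(year, set())
    return [name for name in all_stations if name not in excluded]


# Partial sample of station names, for illustration.
sample = ["Babahoyo", "Caluma", "Chiriboga", "El Corazon", "Mocache", "Vinces"]
for year in range(2002, 2007):
    print(year, stations_for(year, sample))
```

Keeping the exclusions in one place makes it easier to build the five lists consistently.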

4.2. Transforming the data to shapefile

Once the Excel list is ready, ArcMap is opened. Using the option "Add Data", the Excel list is added to ArcMap. By right-clicking the Excel layer, the option "Display XY Data" is selected. Next, the fields for the X and Y coordinates are specified: EAST for X and NORTH for Y. The coordinate system of the input coordinates is WGS 1984, UTM Zone 17S.

The Excel list is now also shown as a point layer with all the stations on the map. This layer is then exported as a new point shapefile. The option "Label Features" is selected to identify the station names.

A polygon shapefile with the basins of Ecuador can be downloaded from the IGM website (http://www.igm.gob.ec/cms/files/downloads/mapafisico.html). After this shapefile is added to the map, only the GRB is exported to a new layer. In this way, data outside the basin, such as the Chiriboga station, are ignored.

Finally, the layer symbology is edited so that the point color is graduated from white for the lowest values to black for the highest values. The rainfall measured at each station can then be compared point by point on the map, which is also useful for comparing surface maps. The map in Fig. 3 shows the GRB with the stations and precipitation data for March 2004.




The Semivariogram/Covariance Cloud tool builds the empirical semivariogram with equations 2 and 3. It allows the spatial autocorrelation between the measured sample points to be examined. By selecting points close to zero on the x-axis, the relationships between nearby stations can be observed (Fig. 5).

Fig. 5: Semivariogram/Covariance cloud, GRB, March 2004. The selected points show a strong relationship, so predicted values at locations close to those stations are the most accurate.
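A semivariogram cloud can be sketched in Python as below. The sketch assumes the common definition in which every pair of stations contributes one point, with the separation distance on the x-axis and half the squared difference of the two values on the y-axis; this may differ in detail from what ArcMap computes. The four stations and their March 2004 values are taken from this paper.

```python
import math
from itertools import combinations

# Station: (east, north, March 2004 rainfall in mm).
stations = {
    "Pichilingue": (667951, 9881590, 366.2),
    "Mocache": (667564, 9869861, 405.3),
    "Rio Congo": (645203, 9885571, 245.8),
    "Murucumba": (650866, 9900802, 356.4),
}


def semivariogram_cloud(stations):
    """Return (distance, semivariance, station_a, station_b) sorted by distance."""
    cloud = []
    for (a, (xa, ya, za)), (b, (xb, yb, zb)) in combinations(stations.items(), 2):
        distance = math.hypot(xa - xb, ya - yb)
        semivariance = 0.5 * (za - zb) ** 2
        cloud.append((distance, semivariance, a, b))
    return sorted(cloud)


cloud = semivariogram_cloud(stations)
for distance, semivariance, a, b in cloud:
    print(f"{a:12s} {b:10s} {distance / 1000:6.1f} km {semivariance:9.1f} mm^2")
```

Pairs at the left of the cloud, such as Pichilingue and Mocache, are the nearby stations whose relationship is inspected in the exploratory step.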

4.4. Fitting a model

To fit a model, the Geostatistical Wizard is used. The first step is to select the interpolation method (OK) and the measured sample values as the input data. The output is of the prediction type. Transformations and trend removal were not used in the analysis.

Next, the model semivariograms (OK-S, OK-E and OK-G, with equations 5, 6 and 7) are selected. The model parameters for each semivariogram can be optimized using an iterative cross-validation technique and the weighted least squares algorithm of equation 4. If those methods are not reliable, the default parameters (nugget, partial sill and lag) can be modified by hand. In Fig. 6 the empirical semivariogram is being fitted to a model.
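The idea of a weighted least squares fit can be sketched in Python as below. Several assumptions are made: the loss is the criterion of Cressie (1985), in which each lag is weighted by its number of station pairs and the ratio of empirical to model semivariance is compared with one; the nugget is fixed at zero; and a plain grid search replaces the iterative optimizer of the Geostatistical Wizard. The empirical semivariogram is synthetic, generated from an exponential model with known parameters so that the fit can be checked, and the pair counts are hypothetical.

```python
import math


def exponential(h, nugget, partial_sill, rng):
    """Exponential semivariogram; zero at h = 0."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / rng))


def wls_loss(lags, gammas, counts, model):
    """Weighted least squares loss: sum of N_k * (gamma_k / model(h_k) - 1)^2."""
    return sum(n * (g / model(h) - 1.0) ** 2
               for h, g, n in zip(lags, gammas, counts))


def fit_exponential(lags, gammas, counts, sills, ranges, nugget=0.0):
    """Grid search; returns (loss, partial_sill, range) with the lowest loss."""
    best = None
    for sill in sills:
        for rng in ranges:
            loss = wls_loss(lags, gammas, counts,
                            lambda h: exponential(h, nugget, sill, rng))
            if best is None or loss < best[0]:
                best = (loss, sill, rng)
    return best


# Synthetic empirical semivariogram (not GRB data): ten lags of 5 km.
lags = [5000.0 * k for k in range(1, 11)]
counts = [8, 14, 20, 24, 26, 25, 22, 18, 12, 6]  # hypothetical pair counts
gammas = [exponential(h, 0.0, 9000.0, 40000.0) for h in lags]

sills = [5000.0 + 1000.0 * k for k in range(9)]     # 5000 to 13000 mm^2
ranges = [20000.0 + 5000.0 * k for k in range(9)]   # 20 to 60 km
best = fit_exponential(lags, gammas, counts, sills, ranges)
print("loss, partial sill, range:", best)
```

With real data the loss never reaches zero, and a low loss only says that the curve follows the empirical points, not that the model describes the rainfall field well.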




Fig. 6: Semivariogram modeling, GRB, March 2004. The "Optimize model" option is used to modify the default parameters. The empirical semivariogram is fitted to an exponential semivariogram model.

The fitted model, which represents the spatial autocorrelation between stations, is used repeatedly in equation 8 to compute the weights of the rainfall data. Finally, the Geostatistical Wizard uses equation 1 with the weights from equation 8 to plot the surface map, as seen in Fig. 7.

Fig. 7: Prediction map for GRB, March 2004. Both the original stations and those with data gaps are plotted. The colors of the points broadly match the colors of the surface, which suggests that the Kriging method is reasonably effective.




4.5. Performing diagnostics

The main criterion used for the diagnostics was to compare the predicted values and their patterns with the surface map. It is also useful to recall the criteria used for exploring the data, especially the results of observing the parameters of the empirical semivariogram.

4.6. Comparing models

Models can be compared using the option "Compare" in the surface map layer options. The statistical criterion for choosing the best model is to select the one that offers: a) a mean prediction error as close to zero as possible, b) a root-mean-square standardized prediction error close to 1, and c) a root-mean-square error and an average standard error as small as possible (Fig. 8). A visual criterion is to check whether the model follows the same patterns as the measured sample points.
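The cross-validation statistics named above can be sketched in Python as below. The definitions are assumptions based on common usage (errors as predicted minus measured, standardized errors as error divided by the prediction standard error, and the average standard error as the root of the mean squared standard error); the exact formulas used by ArcGIS should be checked in its documentation. The measured values are the March 2004 records of Babahoyo, Balzar and Camposano2, while the predictions and standard errors are hypothetical.

```python
import math


def cross_validation_summary(measured, predicted, std_errors):
    """Summary statistics for comparing Kriging models by cross-validation."""
    n = len(measured)
    errors = [p - m for p, m in zip(predicted, measured)]
    standardized = [e / s for e, s in zip(errors, std_errors)]
    return {
        "mean_error": sum(errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "average_standard_error": math.sqrt(sum(s * s for s in std_errors) / n),
        "rms_standardized": math.sqrt(sum(z * z for z in standardized) / n),
    }


measured = [314.3, 404.6, 259.5]    # Babahoyo, Balzar, Camposano2 (mm)
predicted = [330.0, 390.0, 270.0]   # hypothetical cross-validation predictions
std_errors = [100.0, 100.0, 100.0]  # hypothetical prediction standard errors

summary = cross_validation_summary(measured, predicted, std_errors)
for name, value in summary.items():
    print(f"{name:24s} {value:8.3f}")
```

In this hypothetical case the root-mean-square standardized error is far below 1, which would indicate that the standard errors overestimate the actual variability of the predictions.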

Fig. 8: Cross-validation comparison of two Kriging methods for GRB, March 2004: OK-E on the left behaves better than OK-S on the right. The average standard errors are high in both cases.

4.7. Predicting data gaps

Data gaps can be predicted with the Validation/Prediction tool of the Kriging layer. A new Excel list with all the stations, including those with data gaps, is added to the map and transformed into a point shapefile. A visual result is shown in Fig. 7. This shapefile is used as the input data together with the Kriging layer. The result is a new point layer whose attribute table contains all the stations and their predicted rainfall, as seen in Fig. 9.




STATION                   EAST        NORTH   Predicted Precipitation (mm)
Babahoyo                  663179    9801311   314.3
Balzar                    621542    9850450   404.6
Caluma                    694159    9820519   411.8
Camposano2                566617    9823916   259.5
El Corazon                714141    9874600   572.3
Flavio Alfaro             622159    9950129   333.0
Inmoriec Vergel           683167    9913530   335.2
Isla Bejucal              650581    9814769   369.1
La Cancha                 660527    9952918   395.4
La Capilla                611696    9812512   284.8
La Palizada               657185    9939866   264.2
Mocache                   667564    9869861   405.3
Murucumba                 650866    9900802   356.4
Nobol                     608748    9789961   102.4
OlmedoManabí              587791    9845685   390.0
Pedernales                605098   10006412   315.5
Pichilingue               667951    9881590   366.2
Pilalo                    723122    9895638   161.1
Presa                     639272    9898288   197.8
Puebloviejo               662200    9832160   468.7
Puerto Ila                684847    9947365   244.0
Puerto Limon              681399    9957183   283.6
Rio Congo                 645203    9885571   245.8
San Antonio del Delta Pa  695073    9904015   403.3
San Juan La Maná          695182    9898707   443.4
StoDomingoAirport         700326    9972354   293.5
Union 71                  668655    9938481   254.0
Vinces                    639054    9828748   372.5
Zapotal (Los Ríos)        687140    9850388   460.9

Fig. 9: GRB stations with their predicted rainfall data for March 2004. Precipitation values are in mm.

5. Results and discussion

5.1. Exploring the data, comparison of models and plotting surface maps

Besides prediction surface maps, prediction standard error maps are available inside the prediction layer. Fig. 10 shows an example for March 2004. Lighter colors represent lower standard error, while darker colors represent higher standard error. The prediction standard error map is closely related to the concept of the semivariogram seen in section 4.3 and Fig. 5: the closer together a group of stations is, the better the predicted values in the surrounding area. Because the standard error depends mainly on the distance to and location of the stations, it is reasonable that the area beyond Caluma, in the southeast, is darker. The predictions in that area are by far the least accurate (Fig. 10).




Fig. 10: Prediction standard error map for GRB, March 2004. Predicted data beyond Caluma, to the southeast, will not be accurate.

Observation of the prediction standard error maps shows that, in general, the best predictions are made between:

La Cancha, La Palizada, Union 71, Puerto Limon and Puerto Ila
Presa, Murucumba and Rio Congo
Pichilingue and Mocache
San Juan La Maná, Pilalo, San Antonio del Delta Pa and El Corazon
Camposano2 and OlmedoManabí
OlmedoManabí and Balzar
Balzar and Vinces
La Capilla and Nobol
La Capilla and Vinces
Vinces and Puebloviejo
Babahoyo and Puebloviejo
Zapotal (Los Ríos) and Mocache




Each of the three models was compared using the criteria explained in 4.5 and 4.6. The models giving the best predictions were: March 2002, OK-G; 2003, OK-G; 2004, OK-E; 2005, OK-E; 2006, OK-S. After the best model was selected, the surface maps were made, as seen in Fig. 11.

Higher precipitation values are found in the northern part of the GRB, whereas lower values are found towards the southwest. This is consistent with the denser vegetation cover in the northeast, up to the Andean foothills. Results for the southeast are not reliable, for the reason explained before (Fig. 10). In general, the interpolation results could have been better if at least two stations had been located in that area.

In March 2002 there was more rainfall than in the following years, especially in the north. Between 2002 and 2003 there is a notable difference in rainfall in the north of the GRB. In 2004 the rainfall is more concentrated in the central area of the GRB than in other years. In 2005 and 2006 the lower precipitation values are concentrated in the southwest, but 2005 shows more precipitation in that area than 2006.

In March 2003 there was less rainfall than in the other years, followed by March 2004.

Fig. 11: Precipitation surface maps. Top row, from left to right: 2002 to 2004. Bottom row, from left to right: 2005 and 2006.




The Inmoriec Vergel station had the highest variability in rainfall, whereas Pilalo had the most uniform precipitation values.

The results are subject to error. That error is inherent to the Kriging method, and it can be reduced by refining the procedure with more stations.

6. Conclusions and recommendations

Geostatistical methods can be more or less accurate than deterministic methods, depending on the variables taken into account. Deterministic methods such as Inverse Distance Weighting (IDW) rely only on distance. Geostatistical methods depend on many factors, such as the spatial relationship between stations, elevation and environmental behavior.
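For contrast with Kriging, Inverse Distance Weighting can be sketched in a few lines of Python. The power of 2 is a common default and is an assumption here, as is the target location; the three stations and their March 2004 values are taken from this paper.

```python
import math


def idw(points, values, target, power=2.0):
    """Inverse Distance Weighting: weights depend only on distance to the target."""
    numerator = denominator = 0.0
    for point, value in zip(points, values):
        d = math.dist(point, target)
        if d == 0:
            return value  # the target coincides with a station
        w = 1.0 / d ** power
        numerator += w * value
        denominator += w
    return numerator / denominator


# Pichilingue, Mocache and Rio Congo (UTM 17S, March 2004, mm).
points = [(667951, 9881590), (667564, 9869861), (645203, 9885571)]
values = [366.2, 405.3, 245.8]
target = (660000, 9885000)  # hypothetical unsampled location

print(f"IDW estimate: {idw(points, values, target):.1f} mm")
```

Unlike Kriging, this estimator uses no model of spatial correlation and provides no standard error for its predictions.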

The advantage of Kriging methods is that they compute not only the weight of each point but also the errors associated with the estimates, based on a hypothetical model: a mathematical description of the spatial distribution of rainfall. Those errors are computed statistically, so geostatistical methods produce not only prediction surface maps, as deterministic methods do, but also standard error surface maps, a feature exclusive to geostatistical methods.

Kriging prediction surface maps can be combined with their standard error maps. The combination is useful because not only is the precipitation estimated, but the reliability of the prediction can also be computed and visualized.

The empirical semivariogram can be fitted more closely to any model through iterations of the weighted least squares criterion. However, the quality of the fit depends mainly on the number of stations and on observations of the behavior of the phenomenon. This means that the least squares method gives the best fit of a model, but the best-fitting model does not always give the best description of reality. It is recommended to have at least sixty measured stations before Kriging.

For the case study, the comparison of model semivariograms showed that the Gaussian and the Exponential semivariograms best described the spatial relationship of the data. The fact that the Gaussian semivariogram formula is similar to the Exponential one can be related to the random function, in this case rainfall.

The spatial distribution of rainfall in the Guayas River Basin in March can be described as higher precipitation values in the north and northeast and lower values to the south, closer to the mouth of the basin. In some cases, such as March 2004, the rainfall is distributed more uniformly, from the central area with the highest values to the north and south with lower values. This is related to characteristics of the Guayas River Basin, such as the vegetation cover and forests near Santo Domingo and the proximity to the Andean foothills, which produces orographic rainfall in those zones (from north to east).

It was observed that characteristics of the analyzed zone, such as elevation and vegetation, play an important role in the spatial interpolation of rainfall. This is the basis of other types of Kriging. It is suggested that the study also be conducted with other types of Kriging, such as Universal Kriging, Kriging with External Drift and Co-Kriging, as long as sufficient measurement stations are available (around 60).

The influence of data transformation and trend removal on Kriging methods could also be investigated. A comparison between deterministic and geostatistical methods, such as IDW and Kriging, can be made in future research, observing the effect of the number of stations analyzed.




7. References

Arnell, N. W., Liu, C., Compagnucci, R., da Cunha, L., Hanaki, K., Howe, C., Mailu, G., Shiklomanov, I., & Stakhiv, E. (2001). Hydrology and water resources. In J. J. McCarthy, O. F. Canziani, N. A. Leary, D. J. Dokken, & K. S. White (Eds.), Climate Change 2001: Impacts, Adaptation and Vulnerability. Contribution of Working Group II to the Third Assessment Report of the Intergovernmental Panel on Climate Change (pp. 191-234). Cambridge: Cambridge University Press.

Carr, J., & Glass, C. (1989). Use of Geostatistics For Accurate Mapping of Earthquake Ground Motion. Geophysical Journal International, 97(1), 31-40.

Cressie, N. (1985). Fitting variogram models by weighted least squares. Journal of the International Association for Mathematical Geology, 17(5), 563-586.

David, M., & Blais, R. A. (1972). Geostatistical Ore Reserve Estimation. Application of computer methods in the mineral industry: Proceedings of the tenth International Symposium, Johannesburg, 10th April-14th April, 1972. Johannesburg: South African Institute of Mining and Metallurgy, 27-34.

Goovaerts, P. (1999). Using elevation to aid the geostatistical mapping of rainfall erosivity. CATENA, 34(3), 227-242.

Ly, S., Charles, C., & Degré, A. (2011). Geostatistical interpolation of daily rainfall at catchment scale: the use of several variogram models in the Ourthe and Amblève catchments, Belgium. Hydrology & Earth System Sciences, 15(7).

Matheron, G. (1969). Cours de géostatistique. Paris: L'École nationale supérieure des mines de Paris.

Raghunath, H. (2006). Hydrology principles, analysis and design (Rev. 2nd ed.). New Delhi: New Age International (P).
