

Research Article
Interpolation of Missing Precipitation Data Using Kernel Estimations for Hydrologic Modeling

Hyojin Lee¹ and Kwangmin Kang²

¹ APEC Climate Center, 12 Centum 7-ro, Haeundae-gu, Busan 612-020, Republic of Korea
² School of Agriculture and Natural Science, University of Maryland, College Park, MD 20742, USA

Correspondence should be addressed to Kwangmin Kang; hbkangkm@gmail.com

Received 17 October 2014; Revised 15 January 2015; Accepted 5 February 2015

Academic Editor: Fubao Sun

Copyright © 2015 H. Lee and K. Kang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Precipitation is the main factor that drives hydrologic modeling; therefore, missing precipitation data can cause malfunctions in hydrologic modeling. Although interpolation of missing precipitation data is recognized as an important research topic, only a few methods follow a regression approach. In this study, daily precipitation data were interpolated using five different kernel functions, namely, Epanechnikov, Quartic, Triweight, Tricube, and Cosine, to estimate missing precipitation data. This study also presents an assessment that compares estimation of missing precipitation data through Kth nearest neighborhood (KNN) regression to the five different kernel estimations and their performance in simulating streamflow using the Soil Water Assessment Tool (SWAT) hydrologic model. The results show that the kernel approaches provide higher quality interpolation of precipitation data compared with the KNN regression approach, in terms of both statistical data assessment and hydrologic modeling performance.

1. Introduction

Precipitation data are key factors in hydrologic modeling for estimating the rainfall-runoff mechanism [1]. Malfunctions in running hydrologic models can occur due to noncontinuous time series of precipitation inputs. In light of this important issue, estimation of missing precipitation data is a challenging task for hydrologic modeling. Many hydrologic models require interpolation of missing precipitation data [2], meteorological data series completion [3], or imputation of meteorological data [4]. To estimate missing precipitation, researchers should consider spatiotemporal variations in precipitation (rainfall and snowfall) values and the related physical processes. However, accounting for spatiotemporal variation and physical processes can be difficult if there is a lack of equipment for measuring precipitation. Thus, statistical approaches have emerged as widely used methods for filling in missing precipitation data [5].

Many studies have investigated supplanting missing streamflow data with several statistical approaches [5], but there are limited studies on the interpolation of incomplete precipitation and temperature data [6-10]. Recently, artificial neural networks (ANNs) [11], a more advanced statistical approach, have been proposed for estimating missing precipitation data [12]. ANNs can learn from training data to reconstruct a nonlinear relationship and obtain values for missing data. Pisoni et al. [13] investigated the interpolation of missing data for sea surface temperature (SST) satellite images using the ANN method; they found that the results from the ANN approach show better accuracy than the results from an interpolation system, as suggested by Seze and Desbois (1987). Nevertheless, ANNs are still under dispute because their neuron systems cannot provide clear relationships between data [14].

The American Society of Civil Engineers (ASCE) Task Committee [15] noted that although the performance of ANNs for estimating missing precipitation data has already been verified, an alternate solution should be suggested for cases in which the available data are insufficient, because ANNs rely on high data quality and quantity. Additionally, ANNs have other limitations, such as a lack of physical concepts and relations and a dependence on the experience and preferences of those using, studying, and training the networks [15-17]. Since ANNs are regarded as black-box models [18], it is difficult to use this method for realizing more linear relationships, even though ANNs can achieve convergence for almost any problem [17]. Thus, for real mechanisms in hydrologic models in which linear relationships exist between series of weather inputs, the solution is less explicit [19].

Hindawi Publishing Corporation, Advances in Meteorology, Volume 2015, Article ID 935868, 12 pages, http://dx.doi.org/10.1155/2015/935868

Generally, a regression or a distance weighted method is most commonly used for estimating missing precipitation for hydrologic modeling [20]. Daly et al. [21] also proposed a variety of regression models to incorporate spatial variation in weather data. However, Creutin et al. [22] found that even though simple linear regression or interpolation approaches show satisfactory serial correlation for daily or monthly streamflow, precipitation patterns do not show proper correlation when these approaches are used. Furthermore, if a regression method is used for estimating missing precipitation to make a refined precipitation time series, a small data sample would not follow the normal distribution assumed by the basic theory of linear regression.

Another approach for estimating missing precipitation data uses neighboring data weighted by distance. Xia et al. [23] used the closest station to reconstruct missing precipitation data through a geometrical distance weight. Willmott et al. [24] used arithmetic averaging of neighboring data to fill in missing precipitation, and Teegavarapu and Chandramouli [25] used an inverse distance weight method with neighboring data to estimate missing precipitation data. Smith [26], Simanton and Osborn [27], and Salas [28] suggested traditional weighting and data-driven methods, namely, distance based weighting methods, for estimating missing precipitation data. Distance weight approaches for estimating missing precipitation data have also been combined with linear regression and the median distribution of regression [29, 30]. Young [31] and Filippini et al. [32] suggested spatially interpolating the correlation to define the weight for each station.

Estimation of missing precipitation data is possible when data are available for the same location. Linacre (1992) investigated the interpolation of missing precipitation data by using the mean value of a data series at the same location, and Lowry [33] suggested simple interpolation between available data series. Acock and Pachepsky [34] used data from several days before and after the missing precipitation data points for estimating the incomplete precipitation data. K-nearest neighborhood (kNN) regression is a basic method for estimating missing precipitation data that considers the vicinity of the missing value. However, the method has some weaknesses when the data have outliers or a nonlinear trend exists around the missing data. While kNN regression rests on the fundamental assumption that the data follow a normal distribution, which is statistically unsound for such data, the kernel method uses a mean value obtained through kernel weighting, which can overcome this weakness of kNN regression. By using neighboring data in a kernel function, the kernel method can overcome the weakness of kNN regression even when the data show a nonlinear trend.

The objective of this study was to reconstruct daily precipitation data by using five different kernel functions (Epanechnikov, Quartic, Triweight, Tricube, and Cosine) to estimate missing precipitation data. This study also presents an assessment that compares estimation of missing precipitation data through kNN regression to the five different kernel estimations and their performance in simulating streamflow using the Soil Water Assessment Tool (SWAT) hydrologic model. The remainder of this paper is organized as follows. Section 2 provides a description of the study area and the hydrologic model. In Section 3, the methodology of the five different kernel methods is presented. Section 4 presents the results of the interpolation of the missing daily precipitation data and the hydrologic model simulation. Finally, conclusions are in Section 5.

2. Study Area and Hydrologic Model

The Imha watershed (Figure 1) was selected as the test bed for this study. The Imha watershed is a tributary of the Nakdong River basin and is located in the upper part of the Nakdong River basin in South Korea. It is characterized by mountainous terrain; approximately 79.8% of the total area of 1361 km² is mountainous. Slopes of 40 to 60% cover 655 km², given as 33% of the total watershed area. The elevation of the Imha watershed ranges from 80 to 1215 m. The average annual precipitation, minimum temperature, maximum temperature, humidity, and wind speed for the Imha watershed are 1050 mm, 7°C, 18.8°C, 65%, and 1.6 m/s, respectively (Water Management Information System (WAMIS), http://www.wamis.go.kr). Since the climate conditions in this area are defined by warm temperatures, there is no precipitation in the form of snow; all precipitation consists of rainfall. For this evaluation of interpolation of precipitation data and hydrologic model performance, precipitation and streamflow gauges were selected as shown in Figure 1, and precipitation and streamflow data were sourced from the Water Management Information System (http://www.wamis.go.kr).

This study selected the SWAT model for analysis. SWAT has a GIS extension, ArcSWAT, which allows the use of various GIS based datasets to model the geomorphology of a given basin. The SWAT model was developed through research by the USDA (United States Department of Agriculture) Agricultural Research Service (ARS). Major data inputs for SWAT include temperature (maximum and minimum), daily precipitation, solar radiation, relative humidity, wind speed, and geospatial data representing soil types, land cover, and elevation. A watershed is divided into smaller subbasins, which must be broken up into smaller units known as hydrologic response units (HRUs). Each of these HRUs is characterized by uniform land use and soil type. SWAT can be used to accurately predict hydrologic patterns for extended periods of time [35]. Canopy interception is implicit in the curve number (CN) method and is explicit for the Green-Ampt method. Infiltration is most accurately accounted for using the CN method in SWAT. An alternative method that may be used to account for infiltration is the Green-Ampt method. However, the Green-Ampt method has not been shown to increase accuracy over the CN method; thus, the CN method was used in this study.

Figure 1: Study basin locations, including rain and stream gauges. (a) Map of South Korea; (b) Imha watershed. The legend marks the precipitation gauges, the streamflow gauge, and the Imha watershed; the scale bar in (b) spans 0 to 30000 m.

3. Methodology

This study used the five kernel functions, Epanechnikov, Quartic, Triweight, Tricube, and Cosine, as weights to predict missing values. The Tricube kernel places a large weight around the target point. Although the Tricube weight is similar to the Triweight weight, it decreases less rapidly with distance from the target point than the Triweight weight does. The Quartic kernel places the next highest weight around the target point, and the rate at which its weight decreases is similar to that of the Triweight kernel. Both the Epanechnikov and Cosine kernels have a small effect on neighboring values. A brief description of the five kernel functions and their application for reconstructing the missing values is presented in the following, and the specific kernel functions are described in Appendix A.

3.1. Epanechnikov. The Epanechnikov kernel is the most often used kernel function. The Epanechnikov kernel assigns zero weight to observations that are a distance of four, six, and eight away from the reference point. These values correspond to the choice of the interval width. This is often called the choice of smoothing parameter or bandwidth selection. The main characteristic of the Epanechnikov kernel is that its estimation remains smooth even when the distance from the target value, namely, the missing value in this research, is large. A brief description is given by the following:

$$ K(x) = \frac{3}{4}\left(1 - x^{2}\right), \tag{1} $$

where $K(x)$ is the kernel function and $x$ is the independent variable, which represents the nearest surrounding values in the data.

3.2. Quartic. The second kernel function used in this research was the Quartic kernel, which has more weight sensitivity based on distance from the missing value. Since the applied weight differs largely between near and far data points, it is more influenced by surrounding data. It consists of a fourth-order equation, which has more sensitivity in terms of distance than a second-order equation. It is described by the following:

$$ K(x) = \frac{15}{16}\left(1 - x^{2}\right)^{2}. \tag{2} $$

3.3. Triweight. The third kernel function used in this research was the Triweight kernel, which consists of a sixth-order equation. It has the most sensitivity in terms of distance because a sixth-order equation estimates the missing value based on the difference in distance with a weighted function, as shown by the following:

$$ K(x) = \frac{35}{32}\left(1 - x^{2}\right)^{3}. \tag{3} $$

3.4. Tricube. The fourth kernel function used in this research was the Tricube kernel, which uses absolute values. Since it uses absolute values, it presents a smoother pattern for nearest values than the Triweight kernel. However, as the values move further away from the nearest values, it shows a steep trend. The Tricube kernel has the most sensitivity in terms of weighted distance due to the fact that it consists of a ninth-order equation, as shown in the following:

$$ K(x) = \frac{70}{81}\left(1 - |x|^{3}\right)^{3}. \tag{4} $$

Table 1: Results of the normality test with the Shapiro-Wilk method for each K-nearest neighborhood. DF represents degrees of freedom, and P value means significance probability.

       4-NN                   6-NN                   8-NN
       W      DF  P value     W      DF  P value     W      DF  P value
Ep     0.808  19  0.0015      0.740  19  0.0002      0.766  19  0.0004
Qu     0.831  19  0.0033      0.768  19  0.0004      0.721  19  0.0001
Tw     0.827  19  0.0029      0.789  19  0.0008      0.745  19  0.0002
Tc     0.839  19  0.0045      0.764  19  0.0004      0.742  19  0.0002
Co     0.817  19  0.0020      0.742  19  0.0002      0.763  19  0.0003
Reg    0.876  19  0.0186      0.858  19  0.0089      0.883  19  0.0242

(Ep: Epanechnikov; Qu: Quartic; Tw: Triweight; Tc: Tricube; Co: Cosine; Reg: regression.)

3.5. Cosine. The fifth kernel function used in this research was the Cosine kernel function. It is a widely applied kernel function in various fields because it has a constant curvature. Its shape is similar to the Epanechnikov kernel, even though it uses a cosine function, as shown in the following:

$$ K(x) = \frac{\pi}{4}\cos\left(\frac{\pi}{2}x\right). \tag{5} $$
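As an illustration of Eqs. (1) to (5), the five kernels can be written as short Python functions. This is a minimal sketch rather than the authors' code: the function names are hypothetical, and restricting each kernel to |x| <= 1 (zero weight outside that interval) is an assumption taken from the standard definitions of these kernels.

```python
import math


def epanechnikov(x):
    # Eq. (1): K(x) = 3/4 (1 - x^2); zero outside |x| <= 1 (assumed support)
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def quartic(x):
    # Eq. (2): K(x) = 15/16 (1 - x^2)^2
    return (15.0 / 16.0) * (1.0 - x * x) ** 2 if abs(x) <= 1.0 else 0.0


def triweight(x):
    # Eq. (3): K(x) = 35/32 (1 - x^2)^3
    return (35.0 / 32.0) * (1.0 - x * x) ** 3 if abs(x) <= 1.0 else 0.0


def tricube(x):
    # Eq. (4): K(x) = 70/81 (1 - |x|^3)^3
    return (70.0 / 81.0) * (1.0 - abs(x) ** 3) ** 3 if abs(x) <= 1.0 else 0.0


def cosine(x):
    # Eq. (5): K(x) = pi/4 cos(pi/2 x)
    return (math.pi / 4.0) * math.cos(math.pi * x / 2.0) if abs(x) <= 1.0 else 0.0


# Hypothetical lookup table keyed by the abbreviations used in the tables.
KERNELS = {
    "Ep": epanechnikov,
    "Qu": quartic,
    "Tw": triweight,
    "Tc": tricube,
    "Co": cosine,
}
```

Each function is symmetric about zero, which matches the bilateral symmetry the interpolation step requires.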

3.6. Calculation of the Missing Value. After using a kernel function to calculate the weight of the missing data, estimation of the missing data is performed using the following:

$$ M = \frac{1}{P}\sum_{i=1}^{P} x_i \cdot K(u_i), \qquad u_i = \frac{N_i}{0.5P + 1}, \qquad N_i = -\frac{P}{2}, \ldots, \frac{P}{2}, \tag{6} $$

where $M$ is the missing value, $P$ is the number of nearest neighbors, and $u_i$ is the kernel argument for the $N_i$th nearest value, which corresponds to $x_i$ (positive $N_i$ means the right side and negative $N_i$ means the left side). The kernel function should have bilateral symmetry about zero. If, for example, the four nearest neighbors are used for estimating the missing value, the neighborhood values used will be two from the right side and another two from the left side. The specific equation for this example is shown in the following, and an example calculation is described in Appendix B:

$$ M = \frac{1}{4}\left[K\left(-\frac{2}{3}\right)\cdot x_1 + K\left(-\frac{1}{3}\right)\cdot x_2 + K\left(\frac{1}{3}\right)\cdot x_3 + K\left(\frac{2}{3}\right)\cdot x_4\right]. \tag{7} $$
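The calculation in Eqs. (6) and (7) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `estimate_missing`, the argument layout (`left` and `right` both in chronological order), and the sample values are hypothetical. The sketch follows Eq. (6) literally, including the 1/P factor, and does not renormalize the kernel weights.

```python
def epanechnikov(x):
    # Eq. (1), with zero weight outside |x| <= 1 (assumed support)
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def estimate_missing(left, right, kernel=epanechnikov):
    """Estimate a missing value from P = len(left) + len(right) neighbors.

    left:  values before the gap, in chronological order
    right: values after the gap, in chronological order
    """
    if not left or len(left) != len(right):
        raise ValueError("need the same nonzero number of neighbors on each side")
    half = len(left)
    p = 2 * half
    # N_i runs over -P/2, ..., -1, 1, ..., P/2 (zero is the missing point itself)
    offsets = list(range(-half, 0)) + list(range(1, half + 1))
    values = list(left) + list(right)
    total = 0.0
    for n_i, x_i in zip(offsets, values):
        u_i = n_i / (0.5 * p + 1.0)  # u_i = N_i / (0.5 P + 1)
        total += x_i * kernel(u_i)
    return total / p  # M = (1/P) * sum of x_i * K(u_i)


# Four-neighbor case of Eq. (7) with made-up values (mm per day)
m = estimate_missing([2.0, 4.0], [6.0, 8.0])
```

For four neighbors the kernel arguments are -2/3, -1/3, 1/3, and 2/3, which reproduces the terms of Eq. (7).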

3.7. Statistic Tests. A normality test is required to evaluate the infilling methods used for the interpolated data. The Shapiro-Wilk [36] normality test was used with nineteen samples to determine whether the average difference is normally distributed or not. The test statistic is as shown in the following:

$$ W = \frac{\left(\sum_{i=1}^{n} a_i y_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}, \tag{8} $$

where $y_i$ is the $i$th order statistic, namely, the $i$th smallest value in the sample, $\bar{y}$ is the mean of $y_i$, and $a_i$ is a constant given by the ordered data. The null hypothesis of the Shapiro-Wilk normality test is that the sample is normally distributed, and if the significance probability is less than 5%, the null hypothesis will be rejected, meaning that the sample does not satisfy the normal distribution. Since the significance probability for the entire group (Table 1) is below 5%, the null hypothesis is rejected. This study therefore used a nonparametric test for the subsequent analysis of differences.

The Friedman test [37], which is a kind of k-sample test that can provide the difference between paired values, was selected as a nonparametric test. This method evaluates a small sample for differences by ranking a sequence list. The null hypothesis of the Friedman test is that there is no average difference between the groups, and if the significance probability is less than 5%, the null hypothesis will be rejected, leading to the conclusion that an average difference exists between the groups. A brief description of the Friedman test is in the following:

$$ Q = \frac{SS_t}{SS_e}, \tag{9} $$

where $SS_t$ and $SS_e$ are the sum of the squared treatment and the sum of the squared error, respectively.
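The ratio in Eq. (9) can be illustrated with a small Python sketch. It assumes the common textbook form of the Friedman statistic, in which values are ranked within each sample (row) across the methods (columns), $SS_t$ is the sample count times the sum of squared deviations of the mean ranks from the overall mean rank, and $SS_e$ is the sum of squared rank deviations divided by $n(k-1)$. The source does not spell out these definitions, so this form, the function names, and the toy data are assumptions.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_q(table):
    """table[i][j]: value for sample i under method j. Returns (mean ranks, Q)."""
    n = len(table)       # number of samples (blocks)
    k = len(table[0])    # number of methods (treatments)
    ranked = [average_ranks(row) for row in table]
    mean_ranks = [sum(r[j] for r in ranked) / n for j in range(k)]
    grand = (k + 1) / 2.0  # overall mean rank
    ss_t = n * sum((mr - grand) ** 2 for mr in mean_ranks)
    ss_e = sum((r - grand) ** 2 for row in ranked for r in row) / (n * (k - 1))
    return mean_ranks, ss_t / ss_e


# Toy data: three samples, three methods, identical ordering in every sample
mean_ranks, q = friedman_q([[0.1, 0.5, 0.9], [0.2, 0.4, 0.8], [0.0, 0.3, 0.7]])
```

The returned mean ranks play the same role as the "Mean rank" column reported for each method, and Q is compared against a chi-square distribution with k - 1 degrees of freedom.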

The null hypothesis in this instance was rejected because the significance probability was less than 5% for each case, and this study concluded that an average difference exists between the interpolation methods, which is why each method is considered independent even though this study used five different kernel methods. For example, for four reference points, the average ranks of kNN-regression, Tricube, Quartic, Cosine, Triweight, and Epanechnikov run, in that order, from the largest to the smallest average rank (Table 2). For six reference points, the order from largest to smallest is kNN-regression, Tricube, Triweight, Quartic, Cosine, and Epanechnikov (Table 2). For eight reference points, the order is kNN-regression, Triweight, Tricube, Quartic, Cosine, and Epanechnikov (Table 2). As shown in Table 2, kNN-regression has the largest average rank and Epanechnikov has the smallest average rank for all of the reference point cases. This result indicates the dissimilarity of these methods.

Table 2: Chi-square (χ²) test with the Friedman method for finding differences among the six infilling methods. SD represents standard deviation, and P value means significance probability.

(a) 4-NN
       N    Mean    SD     Min      Max     Mean rank
Ep     19   -1.29   4.46   -15.50    5.76   2.53
Qu     19   -1.42   5.03   -15.06    7.51   2.74
Tw     19   -1.64   5.37   -14.90    8.13   2.58
Tc     19   -1.18   5.06   -14.87    8.29   4.47
Co     19   -1.40   4.69   -15.42    6.08   2.68
Reg    19    2.76   5.42   -13.47   14.38   6.00
χ² = 55.602; P value = 0.0000

(b) 6-NN
       N    Mean    SD     Min      Max     Mean rank
Ep     19   -2.61   4.72   -16.68    1.58   1.53
Qu     19   -2.27   4.71   -16.20    2.82   3.16
Tw     19   -2.18   4.84   -15.89    4.06   3.79
Tc     19   -2.15   4.63   -16.20    2.84   4.21
Co     19   -2.54   4.69   -16.59    1.50   2.32
Reg    19    0.06   4.97   -15.48    7.33   6.00
χ² = 66.519; P value = 0.0000

(c) 8-NN
       N    Mean    SD     Min      Max     Mean rank
Ep     19   -3.40   5.04   -17.33    1.94   1.32
Qu     19   -3.10   4.75   -16.90    0.45   3.21
Tw     19   -2.74   4.79   -16.59    1.29   4.58
Tc     19   -2.93   4.77   -16.96    1.51   3.68
Co     19   -3.28   4.93   -17.25    1.85   2.21
Reg    19   -1.24   5.35   -16.49    8.08   6.00
χ² = 75.812; P value = 0.0000

(Ep: Epanechnikov; Qu: Quartic; Tw: Triweight; Tc: Tricube; Co: Cosine; Reg: regression.)

To determine which methods are dissimilar to the others, this study performed the Wilcoxon signed rank test [38]. The basic feature of the Wilcoxon signed rank test is that data samples that come from the same population are paired, and it is detailed in the following:

$$ W = \left|\sum_{i=1}^{N}\left[\operatorname{sign}\left(y_{2i} - y_{1i}\right) R_i\right]\right|, \tag{10} $$

where $N$ is the sample size, $y_{2i}$ is the $i$th value of the second data point, $y_{1i}$ is the $i$th value of the first data point, and $R_i$ is the rank of $|y_{2i} - y_{1i}|$. If the significance probability associated with $W$ is less than 5%, it means that a different mechanism is at work in the sample data or method. Table 3 shows that the significance probability for kNN-regression is less than 5% for all cases. Accordingly, this signifies that kNN-regression is completely dissimilar to the other methods. Although the five different kernel methods for data interpolation exhibit similarity or dissimilarity to each other depending on the number of reference points, all of the kernel methods can be distinguished from kNN-regression using the Wilcoxon signed rank test.
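The statistic in Eq. (10) can be computed with a few lines of Python. This is a hedged sketch: the function names and the sample values are hypothetical, tied absolute differences receive their average rank, and pairs with a zero difference are dropped before ranking, which is a common convention that the source does not state.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def wilcoxon_w(y1, y2):
    """W = | sum_i sign(y2_i - y1_i) * R_i |, with R_i the rank of |y2_i - y1_i|."""
    if len(y1) != len(y2):
        raise ValueError("paired samples must have the same length")
    # Assumption: zero differences carry no sign and are excluded from ranking
    diffs = [b - a for a, b in zip(y1, y2) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    signed_sum = sum((1.0 if d > 0 else -1.0) * r for d, r in zip(diffs, ranks))
    return abs(signed_sum)


# Made-up paired values: differences are 1, 2, -1, 4
w = wilcoxon_w([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 2.0, 8.0])
```

Converting W into a significance probability requires the reference distribution of the statistic (or its normal approximation), which this sketch leaves out.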

4. Results

Since Epanechnikov has the smallest average rank, which signifies a small difference between the observation value and the interpolated value for all reference points in Table 2, the interpolation data obtained from the Epanechnikov method have the best result among the studied methods. Figure 2 shows that the filled-in data from kNN-regression have a large difference at both four and six reference points. Interpolation data from the kernel methods are close to zero for both the average and median values at four reference points, meaning that the interpolation data are similar to the observation data. On the other hand, more than 75% of the interpolation data from kNN-regression exhibit a difference from zero. When the interpolation data are evaluated at six reference points in Figure 2, the median value from kNN-regression is shown to be far away from zero. At eight reference points, kNN-regression is close to zero for both the average and median values; however, it is difficult to conclude that this is an ideal method because outlying maximum values will affect the average and median values.

Table 3: Chi-square (χ²) test with the Wilcoxon signed rank method between regression and the five different kernel methods.

(a) 4-NN
                 Ep       Qu       Tw       Tc       Co
Reg   χ²         -3.823   -3.823   -3.823   -3.823   -3.823
      P value     0.0001   0.0001   0.0001   0.0001   0.0001

(b) 6-NN
                 Ep       Qu       Tw       Tc       Co
Reg   χ²         -3.823   -3.823   -3.823   -3.823   -3.823
      P value     0.0001   0.0001   0.0001   0.0001   0.0001

(c) 8-NN
                 Ep       Qu       Tw       Tc       Co
Reg   χ²         -3.823   -3.823   -3.823   -3.823   -3.823
      P value     0.0001   0.0001   0.0001   0.0001   0.0001

(Ep: Epanechnikov; Qu: Quartic; Tw: Triweight; Tc: Tricube; Co: Cosine; Reg: regression.)

Figure 2: Box plots for the difference between actual precipitation and interpolated precipitation for (a) 4-NN, (b) 6-NN, and (c) 8-NN. The x-axis lists the methods (EP, QU, TW, TC, CO, and RE), and the y-axis represents mm per day.

This study on precipitation data interpolation also evaluated the simulation of the interpolated data using the SWAT hydrologic model. In SWAT hydrologic modeling, the surface runoff is estimated by considering excess precipitation with abstractions and an infiltration factor through the Soil Conservation Service Curve Number (SCS-CN) method. The Green-Ampt (GA) infiltration method is another method to calculate the surface runoff in SWAT. A study shows that both methods give reasonable results and that there is no significant advantage observed in using one over the other. However, the GA method appears to have more limitations in modeling seasonal variability than the SCS-CN method does. Hence, the SCS-CN method is used for the infiltration factor in this study. An SCS curve number based simulation needs information that is updated at each time step as the soil water content changes. The excess rainfall equation in the SCS-CN method was generated based on the historical relationship between the curve number and the hydrologic mechanism for over 20 years. Throughout the surface runoff calculation, infiltration should be updated over time according to the soil type. Other abstractions, such as evapotranspiration and soil and snow evaporation, are calculated by the Penman-Monteith method and meteorological statistics. Finally, the kinematic storage model is used to compute groundwater storage and seepage. The resulting flow in SWAT is routed from the HRUs to the watershed outlet. Figure 3 shows the calibration of the model simulation as the initial step, and the specific parameters are described in Table 4.

After the calibration of the SWAT model, the six different interpolated precipitation datasets with three different reference ranges for each (a total of eighteen interpolated precipitation datasets) were used to assess the performance of interpolated precipitation data for hydrologic model simulation. Streamflow simulations were done for three years, from 2008 to 2010. To evaluate the model performance considering the use of different interpolated precipitation datasets, this study used E_NS (Nash-Sutcliffe coefficient), R-square (coefficient of determination), and RMSE (root mean square error). Table 5 and Figure 4 show that the simulation results from kNN-regression exhibit low SWAT simulation performance for streamflow estimations, with 0.54 E_NS, 0.74 R-square, and 23.78 m³/s RMSE as an average. All of the kernel functions, on the other hand, exhibit good performance for hydrologic simulations with interpolated precipitation data (Table 5 and Figure 4); the average of E_NS, R-square, and RMSE (1) for Epanechnikov is 0.83, 0.86, and 14.03 m³/s, (2) for Quartic is 0.84, 0.88, and 13.03 m³/s, (3) for Triweight is 0.93, 0.93, and 9.30 m³/s, (4) for Tricube is 0.94, 0.95, and 8.13 m³/s, and (5) for Cosine is 0.93, 0.94, and 9.00 m³/s, respectively.

Figure 3: Calibrated model result using the original precipitation input, comparing observed (Obs) and simulated (Sim) streamflow from January 2008 to December 2010. The x-axis represents time in days, and the y-axis represents flow in cubic meters per second.

Table 4: Details of SWAT parameters which are related to the runoff mechanism for the Imha watershed.

Parameter   Selected value   Description
ESCO        0.9500           Soil evaporation compensation factor
EPCO        1.0000           Plant water uptake compensation factor
EVLAI       3.0000           Leaf area index at which no evaporation occurs from water surface [m²/m²]
FFCB        0.0000           Initial soil water storage expressed as a fraction of field capacity water content
IEVENT      0.0000           Rainfall/runoff code: 0 = daily rainfall/CN
ICRK        0.0000           Crack flow code: 1 = model crack flow in soil
SURLAG      4.0000           Surface runoff lag time [days]
ADJ_PKR     0.0000           Peak rate adjustment factor for sediment routing in the subbasin (tributary channels)
PRF         1.0000           Peak rate adjustment factor for sediment routing in the main channel
SPCON       0.0001           Linear parameter for calculating the maximum amount of sediment that can be reentrained during channel sediment routing
SPEXP       1.0000           Exponent parameter for calculating sediment reentrained in channel sediment routing

Table 5: Details of simulation results with six different precipitation infilling methods in the Imha watershed (RMSE in m³/s).

       4-NN                   6-NN                   8-NN
       E_NS   R²     RMSE     E_NS   R²     RMSE     E_NS   R²     RMSE
Ep     0.80   0.83   15.32    0.91   0.92   10.60    0.78   0.82   16.16
Qu     0.73   0.78   17.83    0.91   0.93   10.48    0.88   0.92   11.80
Tw     0.91   0.91   10.56    0.93   0.94    9.25    0.95   0.95    8.10
Tc     0.95   0.95    7.72    0.93   0.94    9.03    0.95   0.95    7.64
Co     0.93   0.94    8.83    0.95   0.95    7.72    0.91   0.93   10.44
Reg    0.69   0.80   19.14    0.21   0.65   30.71    0.71   0.73   21.48
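The three performance measures named above can be sketched in Python as follows. The source does not give their formulas, so this sketch assumes the standard definitions: E_NS as one minus the ratio of the squared simulation error to the variance of the observations about their mean, R-square as the squared Pearson correlation between observed and simulated flow, and RMSE as the root of the mean squared error. The function names and the sample series are hypothetical.

```python
import math


def nash_sutcliffe(obs, sim):
    # E_NS = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def r_squared(obs, sim):
    # Squared Pearson correlation between observed and simulated series
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov * cov / (var_obs * var_sim)


def rmse(obs, sim):
    # Root mean square error, in the units of the input series
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


# Made-up daily flows (m^3/s) for illustration only
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
e_ns = nash_sutcliffe(observed, simulated)
```

An E_NS of 1 indicates a perfect match, while values near or below zero indicate that the simulation is no better than the mean of the observations.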

5. Conclusions

Five different kernel functions were applied to the Imha watershed to evaluate the performance of each weighting method for estimating missing precipitation data, and the use of the interpolated data for hydrologic simulations was assessed. The following conclusions can be drawn from this research.

(1) To estimate missing precipitation data points, exploratory procedures should consider the spatiotemporal variations of precipitation. Because these variations are difficult to account for, statistical methods for estimating missing precipitation data are commonly used.

(2) Although ANNs are an advanced approach for estimating missing data, their mechanisms are unclear because the neuron system is ultimately a black-box model. Thus, regression methods are widely used for estimating missing data, even though they are limited in that regression estimates cannot follow a normal distribution when the sample is small.

(3) When kernel functions are used as a weighting method, the estimated missing data would satisfy a normal distribution, which is more statistically sound. Also, kernel methods can overcome the weakness of kNN regression, in terms of mean value, when the data have outliers and/or a nonlinear trend around the missing data points.

(4) This study assessed the five kernel functions Epanechnikov, Quartic, Triweight, Tricube, and Cosine as weights for predicting missing values. In comparison with the kNN regression method, this study demonstrates that the kernel approaches provide higher quality interpolated precipitation data. In addition, the kernel function results conform better to statistical standards.

(5) Furthermore, higher quality interpolated precipitation data result in better performance for hydrologic simulations, as exemplified in this study. All of the statistical analyses of the streamflow simulations showed that the simulations using the interpolated precipitation data from the kernel functions provide better results than those using kNN regression.

(6) Use of a kernel distribution is more effective than regression when the precipitation data have an upward or downward trend. However, if the precipitation data have a nonlinear trend, it is difficult to reconstruct the missing values effectively. For further research, a time series analysis or a random walk model using a stochastic process is a possible method by which to estimate missing data where there is a nonlinear trend.

Appendices

A. Kernel Functions

Kernel density estimation is an unsupervised learning procedure, which historically precedes kernel regression. It also leads naturally to a simple family of procedures for nonparametric classification.

A.1. Kernel Density Estimation. Suppose we have a random sample $x_1, x_2, \ldots, x_N$ drawn from a probability density $f_X(x)$, and we wish to estimate $f_X$ at a point $x_0$. For simplicity we assume for now that $x \in \mathbb{R}$ (real valued). Arguing as before, a natural local estimate has the form

$$\hat{f}_X(x_0) = \frac{\#\{x_i \in N(x_0)\}}{N\lambda}, \tag{A.1}$$

where $\#\{x_i \in N(x_0)\}$ is the number of $x_i$ that fall in $N(x_0)$ and $N(x_0)$ is a small metric neighborhood around $x_0$ of width $\lambda$.

Figure 4: Scatter plots for SWAT simulation, with panels (a) 4-NN, (b) 6-NN, and (c) 8-NN; the axes are Obs and Simulation, each ranging from 0 to 600. EP, QU, TW, TC, CO, and KNN represent Epanechnikov, Quartic, Triweight, Tricube, Cosine, and kNN regression, respectively.

This estimate is bumpy, and the smooth Parzen estimate is preferred,

$$\hat{f}_X(x_0) = \frac{1}{N\lambda}\sum_{i=1}^{N} K_\lambda(x_0, x_i), \tag{A.2}$$

because it counts observations close to $x_0$ with weights that decrease with distance from $x_0$. In this case a popular choice for $K_\lambda$ is the Gaussian kernel $K_\lambda(x_0, x_i) = \phi(|x_i - x_0|/\lambda)$. Letting $\phi_\lambda$ denote the Gaussian density with mean zero and standard deviation $\lambda$, (A.2) has the form

$$\hat{f}_X(x) = \frac{1}{N}\sum_{i=1}^{N} \phi_\lambda(x - x_i) = (\hat{F} \star \phi_\lambda)(x), \tag{A.3}$$

the convolution of the sample empirical distribution $\hat{F}$ with $\phi_\lambda$. The distribution $\hat{F}(x)$ puts mass $1/N$ at each of the observed $x_i$ and is jumpy; in $\hat{f}_X(x)$ we have smoothed $\hat{F}$ by adding independent Gaussian noise to each observation $x_i$.

The Parzen density estimate is the equivalent of the local average, and improvements have been proposed along the lines of local regression (on the log scale for densities). We will not pursue these here. In $\mathbb{R}^p$ the natural generalization of the Gaussian density estimate amounts to using the Gaussian product kernel in (A.3):

$$\hat{f}_X(x_0) = \frac{1}{N(2\lambda^2\pi)^{p/2}}\sum_{i=1}^{N} e^{-\frac{1}{2}\left(\|x_i - x_0\|/\lambda\right)^2}. \tag{A.4}$$
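The difference between the bumpy local count estimate and the smooth Gaussian Parzen estimate can be seen in a short one-dimensional sketch. The function names and the sample values in the usage note are hypothetical, and the neighborhood is assumed to be the symmetric window of total width $\lambda$ centered on $x_0$.

```python
import math


def local_count_estimate(sample, x0, lam):
    """Local count estimate: number of points within a window of width lam around x0, over N * lam."""
    count = sum(1 for xi in sample if abs(xi - x0) <= lam / 2)
    return count / (len(sample) * lam)


def gaussian_pdf(z, sd):
    """Gaussian density with mean zero and standard deviation sd, evaluated at z."""
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def parzen_gaussian(sample, x, lam):
    """Parzen estimate: average of Gaussian densities (sd = lam) centered on each observation."""
    return sum(gaussian_pdf(x - xi, lam) for xi in sample) / len(sample)
```

With the made-up sample `[0.0, 0.1, 5.0]` and `lam = 1.0`, the count estimate at `x0 = 0.0` is 2/3 and jumps as soon as a point enters or leaves the window, whereas `parzen_gaussian` changes continuously with `x`.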

Table 6: Weighted values depending on day distance with each KNN.

        4-NN             6-NN                     8-NN
        1st     2nd      1st     2nd     3rd      1st     2nd     3rd     4th
Ep      0.667   0.417    0.703   0.563   0.328    0.720   0.630   0.480   0.270
Qu      0.741   0.289    0.824   0.527   0.179    0.864   0.662   0.384   0.122
Tw      0.768   0.188    0.901   0.461   0.092    0.968   0.648   0.287   0.051
Tc      0.772   0.301    0.824   0.579   0.167    0.844   0.709   0.416   0.100
Co      0.680   0.393    0.726   0.555   0.301    0.747   0.635   0.462   0.243

A.2. Kernel Density Classification. One can use nonparametric density estimates for classification in a straightforward fashion using Bayes' theorem. Suppose for a $J$ class problem we fit nonparametric density estimates $\hat{f}_j(X)$, $j = 1, \ldots, J$, separately in each of the classes, and we also have estimates of the class priors $\hat{\pi}_j$ (usually the sample proportions). Then

$$\Pr(G = j \mid X = x_0) = \frac{\hat{\pi}_j \hat{f}_j(x_0)}{\sum_{k=1}^{J} \hat{\pi}_k \hat{f}_k(x_0)}. \tag{A.5}$$

In regions where the data are sparse for both classes, the Gaussian kernel density estimates, which use metric kernels, are low and of poor quality (high variance). The local logistic regression method uses the tricube kernel with $k$-NN bandwidth; this effectively widens the kernel in such regions and makes use of the local linear assumption to smooth out the estimate (on the logit scale).

If classification is the ultimate goal, then learning the separate class densities well may be unnecessary and can in fact be misleading. In learning the separate densities from data, one might decide to settle for a rougher, high-variance fit to capture these features, which are irrelevant for the purposes of estimating the posterior probabilities. In fact, if classification is the ultimate goal, then we need only estimate the posterior well near the decision boundary (for two classes, this is the set $\{x \mid \Pr(G = 1 \mid X = x) = 1/2\}$).
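As a rough illustration of the Bayes-rule classifier described above, the sketch below combines per-class Gaussian Parzen density estimates with sample-proportion priors. It is only a sketch under stated assumptions: the names `parzen_gaussian` and `class_posteriors` are hypothetical, the data are one-dimensional, and a single bandwidth `lam` is shared by all classes.

```python
import math


def parzen_gaussian(sample, x, lam):
    """Gaussian Parzen density estimate at x with bandwidth (standard deviation) lam."""
    norm = lam * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / lam) ** 2) for xi in sample) / (len(sample) * norm)


def class_posteriors(class_samples, x0, lam):
    """Posterior Pr(G = j | X = x0) from class priors (sample proportions) times class densities."""
    n_total = sum(len(sample) for sample in class_samples.values())
    scores = {
        label: (len(sample) / n_total) * parzen_gaussian(sample, x0, lam)
        for label, sample in class_samples.items()
    }
    # Note: far from every observation all scores can underflow to zero,
    # in which case this normalization is undefined.
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}
```

For two hypothetical classes with one observation each, at -1 and +1, the posterior at `x0 = 0` is 1/2 for both, which places the decision boundary midway between them.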

B. Procedures of Missing Precipitation

This appendix shows an example calculation of the kernel-weighted mean. Because the kernel functions are all symmetric, the same weight is used for a given day distance on either side of the target. Table 6 lists the weights for the 1st, 2nd, 3rd, and 4th day distances. For example, to estimate the missing precipitation for 2010-02-12 (the actual value is 6), the following three steps are applied with the 4-NN Epanechnikov kernel (Table 7).

Step 1. Select the date of the target interpolation data.

Step 2. Determine the precipitation of the $K$th nearest days and each kernel weight.

Step 3. Calculate the weighted average to estimate the missing value.

Table 7: Example calculation to interpolate for missing precipitation.

Date (Step 1)   Prec (Step 2)   Weight (Step 2)   Prec·Weight (Step 3)   Estimation (Step 3)
2010-02-10      15              0.417             6.255
2010-02-11      17.2            0.667             11.472
2010-02-12      6               -                 -                      5.949
2010-02-13      9.1             0.667             6.070
2010-02-14      0               0.417             0

The estimates from the rest of the kernel methods are given in Table 8.

Table 8: Calculating missing precipitation for 2010-02-12 (actual value is 6) with six different methods.

Method   4NN      6NN     8NN
Ep       5.949    5.172   4.862
Qu       5.956    5.302   4.936
Tw       5.755    5.294   4.952
Tc       6.205    5.407   4.963
Co       5.945    5.197   4.876
Reg      10.325   8.967   8.813

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)
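The three steps can be retraced in a few lines of Python. The sketch below takes the rounded 4-NN weights from Table 6 and the four neighboring precipitation values from Table 7 and recomputes the 4NN column of Table 8; the variable and function names are illustrative only.

```python
# Rounded weights for day distances 1 and 2 with four neighbors (4-NN columns of Table 6).
WEIGHTS_4NN = {
    "Ep": (0.667, 0.417),
    "Qu": (0.741, 0.289),
    "Tw": (0.768, 0.188),
    "Tc": (0.772, 0.301),
    "Co": (0.680, 0.393),
}

# Precipitation around the missing day 2010-02-12 (Table 7), keyed by signed day offset.
PRECIP = {-2: 15.0, -1: 17.2, 1: 9.1, 2: 0.0}


def estimate(weights, precip):
    """Sum precipitation times the weight for its day distance, then divide by the neighbor count."""
    total = sum(value * weights[abs(offset) - 1] for offset, value in precip.items())
    return total / len(precip)


estimates = {name: estimate(w, PRECIP) for name, w in WEIGHTS_4NN.items()}
unweighted_mean = sum(PRECIP.values()) / len(PRECIP)

for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
print(f"unweighted mean: {unweighted_mean:.3f}")
```

The kernel estimates agree with the 4NN column of Table 8 to three decimals, and the unweighted mean of the four neighbors, 41.3/4 = 10.325, coincides with the Reg entry for 4NN.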

C. Sample Calculations with Real Value

This section shows how to calculate missing precipitation with the kernel-weighted mean function using actual numbers. The sample was drawn at random from the daily data for 2008 to 2010 with a selection probability of 0.02, after which the positions of the selected data were fixed. Zhang et al. [39] reported that kernel-based nonparametric multiple imputation performs better than general linear regression when the sample data are small or limited.

Table 9 shows the kernel weighting procedure for each function. Data from Feb. 10 to Feb. 14, 2010, were used to estimate the missing value for Feb. 12, 2010. For the most distant data, the Epanechnikov kernel gives the highest weight (0.417), whereas the Triweight kernel gives the lowest (0.188). For the nearest value, the highest weight is given by the Tricube kernel and the lowest by the Epanechnikov kernel. Generally, Tricube, which assigns high weights, overestimates the missing precipitation.

Table 9: Sample calculation with actual numbers.

(a) Epanechnikov

Date          2/10    2/11    2/12   2/13    2/14
Prec          15.0    17.2    6.0    9.1     0.0
Ep weight     0.417   0.667   -      0.667   0.417
Prec·weight   6.26    11.47   -      6.07    0.00
Estimation                    5.95

(b) Quartic

Date          2/10    2/11    2/12   2/13    2/14
Prec          15.0    17.2    6.0    9.1     0.0
Qu weight     0.289   0.741   -      0.741   0.289
Prec·weight   4.34    12.75   -      6.74    0.00
Estimation                    5.96

(c) Triweight

Date          2/10    2/11    2/12   2/13    2/14
Prec          15.0    17.2    6.0    9.1     0.0
Tw weight     0.188   0.768   -      0.768   0.188
Prec·weight   2.82    13.21   -      6.99    0.00
Estimation                    5.75

(d) Tricube

Date          2/10    2/11    2/12   2/13    2/14
Prec          15.0    17.2    6.0    9.1     0.0
Tc weight     0.301   0.772   -      0.772   0.301
Prec·weight   4.52    13.28   -      7.03    0.00
Estimation                    6.20

(e) Cosine

Date          2/10    2/11    2/12   2/13    2/14
Prec          15.0    17.2    6.0    9.1     0.0
Co weight     0.393   0.680   -      0.680   0.393
Prec·weight   5.90    11.70   -      6.19    0.00
Estimation                    5.94

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] K. Kang and V. Merwade, "Development and application of a storage-release based distributed hydrologic model using GIS," Journal of Hydrology, vol. 403, no. 1-2, pp. 1–13, 2011.

[2] A. J. Abebe, D. P. Solomatine, and R. G. W. Venneker, "Application of adaptive fuzzy rule-based models for reconstruction of missing precipitation events," Hydrological Sciences Journal, vol. 45, no. 3, pp. 425–436, 2000.

[3] P. Ramos-Calzado, J. Gomez-Camacho, F. Perez-Bernal, and M. F. Pita-Lopez, "A novel approach to precipitation series completion in climatological datasets: application to Andalusia," International Journal of Climatology, vol. 28, no. 11, pp. 1525–1534, 2008.

[4] T. Schneider, "Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values," Journal of Climate, vol. 14, no. 5, pp. 853–871, 2001.

[5] S. K. Regonda, D.-J. Seo, B. Lawrence, J. D. Brown, and J. Demargne, "Short-term ensemble stream forecasting using operationally-produced single-valued streamflow forecasts: a Hydrologic Model Output Statistics (HMOS) approach," Journal of Hydrology, vol. 497, pp. 80–96, 2013.

[6] P. Coulibaly and N. D. Evora, "Comparison of neural network methods for infilling missing daily weather records," Journal of Hydrology, vol. 341, no. 1-2, pp. 27–41, 2007.

[7] H. A. El Sharif and R. S. V. Teegavarapu, "Evaluation of spatial interpolation methods for missing precipitation data: preservation of spatial statistics," in Proceedings of the World Environmental and Water Resources Congress, pp. 3822–3832, May 2012.

[8] A. Bardossy and G. Pegram, "Infilling missing precipitation records: a comparison of a new copula-based method with other techniques," Journal of Hydrology, vol. 519, pp. 1162–1170, 2014.

[9] K. Schamm, M. Ziese, A. Becker et al., "Global gridded precipitation over land: a description of the new GPCC first guess daily product," Earth System Science Data, vol. 6, no. 1, pp. 49–60, 2014.

[10] R. S. V. Teegavarapu, "Statistical corrections of spatially interpolated missing precipitation data estimates," Hydrological Processes, vol. 28, no. 11, pp. 3789–3808, 2014.

[11] Y. Da and G. Xiurun, "An improved PSO-based ANN with simulated annealing technique," Neurocomputing, vol. 63, pp. 527–533, 2005.

[12] A. di Piazza, F. L. Conti, L. V. Noto, F. Viola, and G. La Loggia, "Comparative analysis of different techniques for spatial interpolation of rainfall data to create a serially complete monthly time series of precipitation for Sicily, Italy," International Journal of Applied Earth Observation and Geoinformation, vol. 13, no. 3, pp. 396–408, 2011.

[13] E. Pisoni, F. Pastor, and M. Volta, "Artificial Neural Networks to reconstruct incomplete satellite data: application to the Mediterranean Sea Surface Temperature," Nonlinear Processes in Geophysics, vol. 15, no. 1, pp. 61–70, 2008.

[14] V. Sharma, S. Rai, and A. Dev, "A comprehensive study of artificial neural networks," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 10, pp. 278–284, 2012.

[15] ASCE Task Committee, "Artificial neural networks in hydrology. II: hydrologic applications," Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 124–137, 2000.

[16] ASCE Task Committee, "Artificial neural networks in hydrology. I: preliminary concepts," Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 115–123, 2000.

[17] D. E. Rumelhart, B. Widrow, and M. A. Lehr, "The basic ideas in neural networks," Communications of the ACM, vol. 37, no. 3, pp. 87–92, 1994.

[18] J. Amorocho and W. E. Hart, "A critique of current methods in hydrologic systems investigation," Transactions of the American Geophysical Union, vol. 45, no. 2, pp. 307–321, 1964.


[19] A. W. Minns and M. J. Hall, "Artificial neural networks as rainfall-runoff models," Hydrological Sciences Journal, vol. 41, no. 3, pp. 399–417, 1996.

[20] K. Kang and V. Merwade, "The effect of spatially uniform and non-uniform precipitation bias correction methods on improving NEXRAD rainfall accuracy for distributed hydrologic modeling," Hydrology Research, vol. 45, no. 1, pp. 23–42, 2014.

[21] C. Daly, W. P. Gibson, G. H. Taylor, G. L. Johnson, and P. Pasteris, "A knowledge-based approach to the statistical mapping of climate," Climate Research, vol. 22, no. 2, pp. 99–113, 2002.

[22] J. D. Creutin, H. Andrieu, and D. Faure, "Use of a weather radar for the hydrology of a mountainous area. Part II: radar measurement validation," Journal of Hydrology, vol. 193, no. 1–4, pp. 26–44, 1997.

[23] Y. L. Xia, P. Fabian, A. Stohl, and M. Winterhalter, "Forest climatology: estimation of missing values for Bavaria, Germany," Agricultural and Forest Meteorology, vol. 96, no. 1–3, pp. 131–144, 1999.

[24] C. J. Willmott, S. M. Robeson, and J. J. Feddema, "Estimating continental and terrestrial precipitation averages from rain-gauge networks," International Journal of Climatology, vol. 14, no. 4, pp. 403–414, 1994.

[25] R. S. V. Teegavarapu and V. Chandramouli, "Improved weighting methods, deterministic and stochastic data-driven models for estimation of missing precipitation records," Journal of Hydrology, vol. 312, no. 1–4, pp. 191–206, 2005.

[26] J. A. Smith, "Precipitation," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 3, chapter 3, McGraw Hill, New York, NY, USA, 1993.

[27] J. R. Simanton and H. B. Osborn, "Reciprocal-distance estimate of point rainfall," Journal of Hydraulic Engineering Division, vol. 106, no. 7, pp. 1242–1246, 1980.

[28] J. D.-J. Salas, "Analysis and modeling of hydrological time series," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 19, chapter 19, pp. 19.1–19.72, McGraw-Hill, New York, NY, USA, 1993.

[29] S. J. Jeffrey, J. O. Carter, K. B. Moodie, and A. R. Beswick, "Using spatial interpolation to construct a comprehensive archive of Australian climate data," Environmental Modelling and Software, vol. 16, no. 4, pp. 309–330, 2001.

[30] M. Franklin, V. R. Kotamarthi, M. L. Stein, and D. R. Cook, "Generating data ensembles over a model grid from sparse climate point measurements," Journal of Physics: Conference Series, vol. 125, Article ID 012019, 2008.

[31] K. C. Young, "A three-way model for interpolating for monthly precipitation values," Monthly Weather Review, vol. 120, no. 11, pp. 2561–2569, 1992.

[32] F. Filippini, G. Galliani, and L. Pomi, "The estimation of missing meteorological data in a network of automatic stations," Transactions on Ecology and the Environment, vol. 4, pp. 283–291, 1994.

[33] W. P. Lowry, Compendium of Lecture Notes in Climatology for Class IV Meteorological Personnel, Secretariat of the World Meteorological Organization, Geneva, Switzerland, 1972.

[34] M. C. Acock and Y. A. Pachepsky, "Estimating missing weather data for agricultural simulations using group method of data handling," Journal of Applied Meteorology, vol. 39, no. 7, pp. 1176–1184, 2000.

[35] S. L. Neitsch, J. G. Arnold, J. R. Kiniry, J. R. Williams, and K. W. King, Soil and Water Assessment Tool: Theoretical Documentation (Version 2005), Texas Water Resource Institute, College Station, Tex, USA, 2005.

[36] S. S. Shapiro and M. B. Wilk, "An analysis of variance test for normality (complete samples)," Biometrika, vol. 52, no. 3-4, pp. 591–611, 1965.

[37] M. Friedman, "The use of ranks to avoid the assumption of normality implicit in the analysis of variance," Journal of the American Statistical Association, vol. 32, no. 200, pp. 675–701, 1937.

[38] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[39] S. Zhang, Z. Jin, X. Zhu, and J. Zhang, "Missing data analysis: a kernel-based multi-imputation approach," in Transactions on Computational Science III, vol. 5300 of Lecture Notes in Computer Science, pp. 122–142, Springer, Berlin, Germany, 2009.


model [18], it is difficult to use this method for realizing more linear relationships, even though ANNs can achieve convergence for almost any problem [17]. Thus, for real mechanisms in hydrologic models in which linear relationships exist between series of weather inputs, the solution is less explicit [19].

Generally, a regression or a distance-weighted method is most commonly used for estimating missing precipitation for hydrologic modeling [20]. Daly et al. [21] also proposed a variety of regression models to incorporate spatial variation in weather data. However, Creutin et al. [22] found that, even though simple linear regression or interpolation approaches show satisfactory serial correlation of daily or monthly streamflow, precipitation patterns do not show proper correlation when these approaches are used. Furthermore, if a regression method is used for estimating missing precipitation to make a refined precipitation time series, a small data sample would not follow the normal distribution assumed in the basic theory of linear regression.

Another approach for estimating missing precipitation data from neighboring data is based on distance weighting. Xia et al. [23] used the closest station to reconstruct missing precipitation data through a geometrical distance weight. Willmott et al. [24] used arithmetic averaging of neighboring data to fill in missing precipitation, and Teegavarapu and Chandramouli [25] used an inverse distance weighting method on neighboring data to estimate missing precipitation data. Smith [26], Simanton and Osborn [27], and Salas [28] suggested that traditional weighting and data-driven methods, namely, distance-based weighting methods, be used for estimating missing precipitation data. Distance weighting approaches for estimating missing precipitation data have also been combined with linear regression and median distribution of regression [29, 30]. Young [31] and Filippini et al. [32] suggested spatially interpolating the correlation to define a weight for each station.

Estimation of missing precipitation data is possible when data are available for the same location. Linacre (1992) investigated the interpolation of missing precipitation data by using the mean value of a data series at the same location, and Lowry [33] suggested simple interpolation between available data series. Acock and Pachepsky [34] used data from several days before and after missing precipitation data points for estimating the incomplete precipitation data. $K$-nearest neighborhood (kNN) regression is a basic method for estimating missing precipitation data that considers vicinity. However, the method has some weaknesses when the data have outliers or a nonlinear trend exists around the missing data. While kNN regression rests on the fundamental assumption of a normal distribution, which is statistically unsound, the kernel method uses a mean value and can overcome this weakness through kernel weighting. By using neighboring data in a kernel function, it can overcome the weakness of kNN regression even when the data show a nonlinear trend.

The objective of this study was to reconstruct daily precipitation data by using five different kernel functions (Epanechnikov, Quartic, Triweight, Tricube, and Cosine) to estimate missing precipitation data. This study also presents an assessment that compares estimation of missing precipitation data through kNN regression to the five different kernel estimations and their performance in simulating streamflow using the Soil Water Assessment Tool (SWAT) hydrologic model. The remainder of this paper is organized as follows. Section 2 provides a description of the study area and the hydrologic model. In Section 3, the methodology of the five different kernel methods is presented. Section 4 presents the results of the interpolation of the missing daily precipitation data and the hydrologic model simulation. Finally, conclusions are in Section 5.

2. Study Area and Hydrologic Model

The Imha watershed (Figure 1) was selected as the test bed for this study. The Imha watershed is a tributary of the Nakdong River basin and is located in the upper part of the Nakdong River basin in South Korea. It is characterized by mountainous terrain: approximately 79.8% of the total area of 1361 km² is mountainous. Slopes of 40% to 60% occupy 655 km², given as 33% of the total watershed area. The elevation of the Imha watershed ranges from 80 to 1215 m. The average annual precipitation, minimum temperature, maximum temperature, humidity, and wind speed for the Imha watershed are 1050 mm, 7°C, 18.8°C, 65%, and 1.6 m/s, respectively (Water Management Information System (WAMIS), http://www.wamis.go.kr). Since the climate conditions in this area are defined by warm temperatures, there is no precipitation in the form of snow; all precipitation consists of rainfall. For this evaluation of interpolation of precipitation data and hydrologic model performance, precipitation and streamflow gauges were selected as shown in Figure 1, and precipitation and streamflow data were sourced from the Water Management Information System (http://www.wamis.go.kr).

This study selected the SWAT model for analysis. SWAT has a GIS extension, ArcSWAT, which allows the use of various GIS-based datasets to model the geomorphology of a given basin. The SWAT model was developed through research by the USDA (United States Department of Agriculture) Agricultural Research Service (ARS). Major data inputs for SWAT include temperature (maximum and minimum), daily precipitation, solar radiation, relative humidity, wind speed, and geospatial data representing soil types, land cover, and elevation. A watershed is divided into smaller subbasins, which are further broken up into smaller units known as hydrologic response units (HRUs). Each of these HRUs is characterized by uniform land use and soil type. SWAT can be used to accurately predict hydrologic patterns for extended periods of time [35]. Canopy interception is implicit in the curve number (CN) method and is explicit for the Green-Ampt method. Infiltration is most accurately accounted for using the CN method in SWAT. An alternative method that may be used to account for infiltration is the Green-Ampt method. However, the Green-Ampt method has not been shown to increase accuracy over the CN method; thus, the CN method was used in this study.
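For readers unfamiliar with the CN method, the relation it rests on can be sketched as below. The paragraph above does not state the equation, so this follows the standard textbook SCS form and is not taken from the paper: the metric retention formula S = 25400/CN - 254 (mm), the default initial abstraction ratio of 0.2, and the function name are all assumptions made for illustration.

```python
def scs_cn_runoff(precip_mm, curve_number, ia_ratio=0.2):
    """Excess rainfall (mm) for one event from the standard SCS curve number relation.

    Assumed form: Q = (P - Ia)^2 / (P - Ia + S), with Ia = ia_ratio * S
    and S = 25400 / CN - 254 (mm); Q = 0 when P <= Ia.
    """
    if not 0 < curve_number <= 100:
        raise ValueError("curve number must lie in (0, 100]")
    retention = 25400.0 / curve_number - 254.0        # potential retention S (mm)
    initial_abstraction = ia_ratio * retention        # losses before runoff starts
    if precip_mm <= initial_abstraction:
        return 0.0
    effective = precip_mm - initial_abstraction
    return effective ** 2 / (effective + retention)
```

Under these assumptions, 50 mm of rain on a surface with CN = 80 yields roughly 13.8 mm of excess rainfall, while CN = 100 (no retention) passes all 50 mm through as runoff.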

Figure 1: Study basin locations including rain and stream gauges: (a) left, map of South Korea; (b) right, Imha watershed. The legend marks the precipitation gauges, the streamflow gauge, and the Imha watershed; the scale bar in (b) runs from 0 to 30000 m.

3. Methodology

This study used the five kernel functions Epanechnikov, Quartic, Triweight, Tricube, and Cosine as weights to predict missing values. The Tricube method places a large weight around the target point. Although the Tricube weight is similar to the Triweight weight, it decreases less rapidly with distance from the target point than the Triweight weight does. The next highest weight around the target point is given by Quartic, whose rate of decrease is similar to that of Triweight. Both Epanechnikov and Cosine have a small effect on neighboring values. A brief description of the five kernel functions and their application for reconstructing the missing values is presented in the following, and the kernel functions are described further in Appendix A.

3.1. Epanechnikov. The Epanechnikov kernel is the most often used kernel function. The Epanechnikov kernel assigns zero weight to observations that are a distance of four, six, and eight away from the reference point. These values correspond to the choice of the interval width, which is often called the choice of smoothing parameter or bandwidth selection. The main characteristic of the Epanechnikov kernel is that its estimation remains smooth even when the distance from the target value, namely, the missing value in this research, is large. A brief description is given by the following:

$$K(x) = \frac{3}{4}\left(1 - x^2\right), \tag{1}$$

where $K(x)$ is the kernel function and $x$ is the independent variable, namely, the position of a nearest neighboring value relative to the target.

3.2. Quartic. The second kernel function used in this research was the Quartic kernel, which has more weight sensitivity based on distance from the missing value. Since the applied weight differs greatly between near and far data points, it is more influenced by surrounding data. It consists of a fourth-order equation, which is more sensitive to distance than a second-order equation. It is described by the following:

$$K(x) = \frac{15}{16}\left(1 - x^2\right)^2. \tag{2}$$

3.3. Triweight. The third kernel function used in this research was the Triweight kernel, which consists of a sixth-order equation. It has the most sensitivity in terms of distance because a sixth-order equation estimates the missing value based on the difference in distance with a weighted function, as shown by the following:

$$K(x) = \frac{35}{32}\left(1 - x^2\right)^3. \tag{3}$$

3.4. Tricube. The fourth kernel function used in this research was the Tricube kernel, which uses absolute values. Since it uses absolute values, it presents a smoother pattern for the nearest values than the Triweight kernel. However, as the values move further away from the nearest values, it shows a steep trend. The Tricube kernel has the most sensitivity in terms of weighted distance because it consists of a ninth-order equation, as shown in the following:

$$K(x) = \frac{70}{81}\left(1 - |x|^3\right)^3. \tag{4}$$

Table 1: Results of the normality test with the Shapiro-Wilk method for each $K$-nearest neighborhood. DF represents degrees of freedom and $P$ value means significance probability.

        4-NN                     6-NN                     8-NN
        W       DF   P value     W       DF   P value     W       DF   P value
Ep      0.808   19   0.0015      0.740   19   0.0002      0.766   19   0.0004
Qu      0.831   19   0.0033      0.768   19   0.0004      0.721   19   0.0001
Tw      0.827   19   0.0029      0.789   19   0.0008      0.745   19   0.0002
Tc      0.839   19   0.0045      0.764   19   0.0004      0.742   19   0.0002
Co      0.817   19   0.0020      0.742   19   0.0002      0.763   19   0.0003
Reg     0.876   19   0.0186      0.858   19   0.0089      0.883   19   0.0242

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)

3.5. Cosine. The fifth kernel function used in this research was the Cosine kernel function. It is widely applied in various fields because it has a constant curvature. Its shape is similar to that of the Epanechnikov kernel, even though it uses a cosine function, as shown in the following:

$$K(x) = \frac{\pi}{4}\cos\left(\frac{\pi}{2}x\right). \tag{5}$$
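The five kernels can be written down directly from equations (1) to (5). In the sketch below the function names and the `KERNELS` dictionary are illustrative, and returning zero outside $|x| \le 1$ is an assumption based on the usual compact support of these kernels rather than something stated in this section.

```python
import math


def _supported(x):
    """Assumed support: the kernels are taken to be zero outside |x| <= 1."""
    return abs(x) <= 1.0


def epanechnikov(x):
    return 0.75 * (1.0 - x ** 2) if _supported(x) else 0.0


def quartic(x):
    return 15.0 / 16.0 * (1.0 - x ** 2) ** 2 if _supported(x) else 0.0


def triweight(x):
    return 35.0 / 32.0 * (1.0 - x ** 2) ** 3 if _supported(x) else 0.0


def tricube(x):
    return 70.0 / 81.0 * (1.0 - abs(x) ** 3) ** 3 if _supported(x) else 0.0


def cosine(x):
    return math.pi / 4.0 * math.cos(math.pi / 2.0 * x) if _supported(x) else 0.0


KERNELS = {"Ep": epanechnikov, "Qu": quartic, "Tw": triweight, "Tc": tricube, "Co": cosine}

for name, kernel in KERNELS.items():
    print(name, round(kernel(1 / 3), 3), round(kernel(2 / 3), 3))
```

Evaluated at 1/3 and 2/3, the scaled positions that Section 3.6 assigns to the first and second neighbors when four neighbors are used, these functions round to the 4-NN weights listed in Table 6 (for example 0.667 and 0.417 for Epanechnikov).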

3.6. Calculation of the Missing Value. After a kernel function is used to calculate the weights, the missing data are estimated using the following:

$$M = \frac{1}{P}\sum_{i=1}^{P} x_i \cdot K(u_i), \qquad u_i = \frac{N_i}{0.5P + 1}, \qquad N_i = -\frac{P}{2}, \ldots, \frac{P}{2} \quad (N_i \neq 0), \tag{6}$$

where $M$ is the missing value, $P$ is the number of nearest neighbors, $x_i$ is the $i$th neighboring value, $N_i$ is its signed position relative to the missing value (positive means the right side and negative means the left side), and $u_i$ is that position scaled for the kernel. The kernel function should be bilaterally symmetric about zero. For example, if the four nearest neighbors are used for estimating the missing value, two values are taken from the right side and two from the left side. The specific equation for this example is shown in the following, and an example calculation is described in Appendix B:

$$M = \frac{1}{4}\left[K\left(-\frac{2}{3}\right) \cdot x_1 + K\left(-\frac{1}{3}\right) \cdot x_2 + K\left(\frac{1}{3}\right) \cdot x_3 + K\left(\frac{2}{3}\right) \cdot x_4\right]. \tag{7}$$
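Equation (6) translates into a short function. The sketch below is one possible reading, not the authors' implementation: `kernel_estimate` and its argument names are hypothetical, the neighbors are assumed to be supplied in chronological order with equally many on each side of the gap, and only the Epanechnikov kernel is included to keep the example short.

```python
def epanechnikov(x):
    """Epanechnikov kernel of equation (1)."""
    return 0.75 * (1.0 - x ** 2)


def kernel_estimate(before, after, kernel):
    """Estimate a missing value from equation (6).

    before: values preceding the gap, in chronological order (farthest first).
    after:  values following the gap, in chronological order (nearest first).
    """
    if len(before) != len(after):
        raise ValueError("equation (6) uses the same number of neighbors on each side")
    half = len(before)
    p = 2 * half                          # P, the number of nearest neighbors
    scale = 0.5 * p + 1.0                 # denominator of u_i
    total = 0.0
    for n_i, value in zip(range(-half, 0), before):      # N_i = -P/2, ..., -1
        total += value * kernel(n_i / scale)
    for n_i, value in zip(range(1, half + 1), after):    # N_i = 1, ..., P/2
        total += value * kernel(n_i / scale)
    return total / p


# Neighbors of 2010-02-12 from the worked example in Appendix B.
estimate = kernel_estimate([15.0, 17.2], [9.1, 0.0], epanechnikov)
print(round(estimate, 3))
```

With unrounded weights the four-neighbor Epanechnikov estimate comes out near 5.946, slightly below the 5.949 reported in Appendix B, where the weights are rounded to three decimals first. Note that the sum is divided by $P$ rather than by the sum of the weights, exactly as equation (6) is written.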

3.7. Statistic Tests. A normality test is required to evaluate the methods used for infilling the missing data. The Shapiro-Wilk [36] normality test was used with nineteen samples to determine whether the average difference is normally distributed or not. The test statistic is as shown in the following:

$$W = \frac{\left(\sum_{i=1}^{n} a_i y_{(i)}\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}, \tag{8}$$

where $y_{(i)}$ is the $i$th order statistic, namely, the $i$th smallest value in the sample, $\bar{y}$ is the mean of the $y_i$, and $a_i$ is a constant given by the ordered data. The null hypothesis of the Shapiro-Wilk normality test is that the sample is normally distributed; if the significance probability is less than 5%, the null hypothesis is rejected, meaning that the sample does not satisfy a normal distribution. Since the significance probability for the entire group (Table 1) is below 5%, the null hypothesis is rejected. Because normality is rejected, this study uses a nonparametric test for the subsequent comparison.

The Friedman test [37], which is a kind of $k$-sample test that can provide the difference between paired values, was selected as a nonparametric test. This method evaluates a small sample for differences by ranking a sequence list. The null hypothesis of the Friedman test is that there is no average difference among the groups; if the significance probability is less than 5%, the null hypothesis is rejected, thus concluding that an average difference exists among the groups. A brief description of the Friedman test is given in the following:

$$Q = \frac{\mathrm{SSt}}{\mathrm{SSe}}, \tag{9}$$

where SSt and SSe are the treatment sum of squares and the error sum of squares, respectively.

The null hypothesis in this instance was rejected because the significance probability was less than 5% in each case, and this study concluded that the interpolation methods differ on average, which is why each method is considered independent even though this study used five different kernel methods. For example, for four reference points the average ranks of $k$nn-regression, Tricube, Quartic, Cosine, Triweight, and Epanechnikov run, in that order, from the largest to the smallest (Table 2). For six reference points the order from largest to smallest is $k$nn-regression, Tricube, Triweight, Quartic, Cosine, and Epanechnikov, and for eight reference points it is $k$nn-regression, Triweight, Tricube, Quartic, Cosine, and Epanechnikov (Table 2). As shown in Table 2, $k$nn-regression has the largest average rank and Epanechnikov has the smallest average rank for all of the reference point cases. This result supports the dissimilarity of these methods.

Table 2: Chi-square ($\chi^2$) test with Friedman method for finding difference among six infilling methods. SD represents standard deviation and $P$ value means significance probability.

(a) 4-NN ($\chi^2$ = 55.602, $P$ value = 0.0000)

| Method | $N$ | Mean | SD | Min | Max | Mean rank |
|---|---|---|---|---|---|---|
| Ep | 19 | -1.29 | 4.46 | -15.50 | 5.76 | 2.53 |
| Qu | 19 | -1.42 | 5.03 | -15.06 | 7.51 | 2.74 |
| Tw | 19 | -1.64 | 5.37 | -14.90 | 8.13 | 2.58 |
| Tc | 19 | -1.18 | 5.06 | -14.87 | 8.29 | 4.47 |
| Co | 19 | -1.40 | 4.69 | -15.42 | 6.08 | 2.68 |
| Reg | 19 | 2.76 | 5.42 | -13.47 | 14.38 | 6.00 |

(b) 6-NN ($\chi^2$ = 66.519, $P$ value = 0.0000)

| Method | $N$ | Mean | SD | Min | Max | Mean rank |
|---|---|---|---|---|---|---|
| Ep | 19 | -2.61 | 4.72 | -16.68 | 1.58 | 1.53 |
| Qu | 19 | -2.27 | 4.71 | -16.20 | 2.82 | 3.16 |
| Tw | 19 | -2.18 | 4.84 | -15.89 | 4.06 | 3.79 |
| Tc | 19 | -2.15 | 4.63 | -16.20 | 2.84 | 4.21 |
| Co | 19 | -2.54 | 4.69 | -16.59 | 1.50 | 2.32 |
| Reg | 19 | 0.06 | 4.97 | -15.48 | 7.33 | 6.00 |

(c) 8-NN ($\chi^2$ = 75.812, $P$ value = 0.0000)

| Method | $N$ | Mean | SD | Min | Max | Mean rank |
|---|---|---|---|---|---|---|
| Ep | 19 | -3.40 | 5.04 | -17.33 | 1.94 | 1.32 |
| Qu | 19 | -3.10 | 4.75 | -16.90 | 0.45 | 3.21 |
| Tw | 19 | -2.74 | 4.79 | -16.59 | 1.29 | 4.58 |
| Tc | 19 | -2.93 | 4.77 | -16.96 | 1.51 | 3.68 |
| Co | 19 | -3.28 | 4.93 | -17.25 | 1.85 | 2.21 |
| Reg | 19 | -1.24 | 5.35 | -16.49 | 8.08 | 6.00 |

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)
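As a rough cross-check of the Friedman statistic reported in Table 2, the tie-free rank form of the statistic can be evaluated directly from the published mean ranks. The sketch below rests on two assumptions: it uses the standard tie-free expression $Q = \frac{12n}{k(k+1)}\sum_j (\bar{r}_j - \frac{k+1}{2})^2$ rather than whatever the authors' statistical package computed, and the function name `friedman_q` is hypothetical.

```python
def friedman_q(mean_ranks, n_blocks):
    """Friedman statistic from per-method mean ranks (tie-free form).

    mean_ranks -- average rank of each of the k methods
    n_blocks   -- number of paired samples (19 in this study)
    """
    k = len(mean_ranks)
    center = (k + 1) / 2.0            # expected mean rank under the null hypothesis
    spread = sum((r - center) ** 2 for r in mean_ranks)
    return 12.0 * n_blocks / (k * (k + 1)) * spread


# Mean ranks for the 4-NN case in Table 2: Ep, Qu, Tw, Tc, Co, Reg.
mean_ranks_4nn = [2.53, 2.74, 2.58, 4.47, 2.68, 6.00]
q_4nn = friedman_q(mean_ranks_4nn, n_blocks=19)
print(round(q_4nn, 2))
```

The result, about 55.5, is close to the tabulated 55.602; the small gap is what one would expect from mean ranks rounded to two decimals and from any tie correction applied in the original analysis.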

To determine which methods are dissimilar to the others, this study performed the Wilcoxon signed rank test [38]. The basic feature of the Wilcoxon signed rank test is that data samples that come from the same population are paired, and it is detailed in the following:

$$W = \left|\sum_{i=1}^{N}\left[\operatorname{sign}\left(y_{2,i} - y_{1,i}\right)\cdot R_i\right]\right|, \tag{10}$$

where $N$ is the sample size, $y_{2,i}$ is the $i$th value of the second data point, $y_{1,i}$ is the $i$th value of the first data point, and $R_i$ is the rank of $|y_{2,i} - y_{1,i}|$. If the significance probability of the $W$ value is less than 5%, it means that a different mechanism is at work in the sample data or method. Table 3 shows that the significance probability for $k$nn-regression is less than 5% for all cases. Accordingly, this signifies that $k$nn-regression is completely dissimilar to the other methods. Although the five different kernel methods for data interpolation exhibit similarity or dissimilarity to each other depending on the number of reference points, all of the kernel methods can be distinguished from $k$nn-regression using the Wilcoxon signed rank test.
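Equation (10) can be illustrated with a short, dependency-free sketch. The function name `signed_rank` and the paired data are hypothetical, and the normal approximation used for $z$ (without a tie correction to the variance) is an assumption about how the tabulated statistic was obtained. The example pairs are constructed so that all nineteen differences have the same sign, which is the most extreme case possible for $N = 19$.

```python
import math


def signed_rank(first, second):
    """Return (W, z): W from equation (10), z from the normal approximation."""
    diffs = [b - a for a, b in zip(first, second) if b != a]  # zero differences dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run while absolute differences are tied.
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average = (start + end) / 2.0 + 1.0   # average rank for the tied run
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    w = abs(w_plus - w_minus)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (min(w_plus, w_minus) - mean) / sd
    return w, z


# Hypothetical paired samples: the second method is always larger than the first.
first = [float(i) for i in range(1, 20)]
second = [x + 0.5 * (i + 1) for i, x in enumerate(first)]
w, z = signed_rank(first, second)
print(w, round(z, 3))
```

For nineteen pairs with one-sided differences this gives $z \approx -3.823$, the same value that appears in every cell of Table 3, which suggests that regression differed from each kernel method in the same direction for all nineteen samples.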

4. Results

Since Epanechnikov has the smallest average rank for all reference points in Table 2, which signifies a small difference between the observation value and the interpolated value, the interpolation data obtained from the Epanechnikov method have the best result among the studied methods. Figure 2 shows that the filled-in data from $k$nn-regression have a large difference at both four and six reference points. Interpolation data from the kernel methods are close to zero for both the average and median values at four reference points, meaning that the interpolation data are similar to the observation data. On the other hand, more than 75% of the interpolation data from $k$nn-regression exhibit a difference greater than zero. When the interpolation data are evaluated at six reference points in Figure 2, the median value from $k$nn-regression is shown to be far away from zero. At eight reference points, $k$nn-regression is close to zero for both average and median values; however, it is difficult to conclude that this is an ideal method because outlying maximum values will affect the average and median value.

Table 3: Chi-square ($\chi^2$) test with Wilcoxon signed rank method between regression and five different kernel methods.

(a) 4-NN

| Reg versus | Ep | Qu | Tw | Tc | Co |
|---|---|---|---|---|---|
| $\chi^2$ | -3.823 | -3.823 | -3.823 | -3.823 | -3.823 |
| $P$ value | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |

(b) 6-NN

| Reg versus | Ep | Qu | Tw | Tc | Co |
|---|---|---|---|---|---|
| $\chi^2$ | -3.823 | -3.823 | -3.823 | -3.823 | -3.823 |
| $P$ value | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |

(c) 8-NN

| Reg versus | Ep | Qu | Tw | Tc | Co |
|---|---|---|---|---|---|
| $\chi^2$ | -3.823 | -3.823 | -3.823 | -3.823 | -3.823 |
| $P$ value | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)

Figure 2: Box plots for difference between actual precipitation and interpolated precipitation for (a) 4-NN, (b) 6-NN, and (c) 8-NN, with one box each for EP, QU, TW, TC, CO, and RE. The $y$-axis represents mm per day.

This study on precipitation data interpolation also evaluated the simulation of the interpolated data using the SWAT hydrologic model. In SWAT hydrologic modeling, the surface runoff is estimated by considering excess precipitation with abstractions and an infiltration factor through the Soil Conservation Service Curve Number (SCS-CN) method. The Green-Ampt (GA) infiltration method is another method to calculate the surface runoff in SWAT. A study shows that both methods give reasonable results and that there is no significant advantage observed in using one over the other. However, the GA method appears to have more limitations in modeling seasonal variability than the SCS-CN method does. Hence, the SCS-CN method is used for the infiltration factor in this study. An SCS curve number based simulation needs time step updated information as soil water content changes. The excess rainfall equation in the SCS-CN method was generated based on the historical relationship between the curve number and the hydrologic mechanism for over 20 years. Throughout the surface runoff calculation, infiltration should be updated over time according to the soil type. Other abstractions, such as evapotranspiration and soil and snow evaporation, are calculated by the Penman-Monteith method and meteorological statistics. Finally, the kinematic storage model is used to compute groundwater storage and seepage. Flow resulting from SWAT modeling is routed from the HRUs to the watershed outlet. Figure 3 shows the calibration of the model simulation as the initial step, and the specific parameters are described in Table 4.

Figure 3: Calibrated model result using original precipitation input (observed and simulated series, 2008-01-01 to 2010-12-28). The $x$-axis represents time in days and the $y$-axis represents flow in cubic meters per second.

Table 4: Details of SWAT parameters which are related to runoff mechanism for Imha watershed.

| Parameter | Description | Selected value |
|---|---|---|
| ESCO | Soil evaporation compensation factor | 0.9500 |
| EPCO | Plant water uptake compensation factor | 1.0000 |
| EVLAI | Leaf area index at which no evaporation occurs from water surface [m²/m²] | 3.0000 |
| FFCB | Initial soil water storage expressed as a fraction of field capacity water content | 0.0000 |
| IEVENT | Rainfall/runoff code: 0 = daily rainfall/CN | 0.0000 |
| ICRK | Crack flow code: 1 = model crack flow in soil | 0.0000 |
| SURLAG | Surface runoff lag time [days] | 4.0000 |
| ADJ_PKR | Peak rate adjustment factor for sediment routing in the subbasin (tributary channels) | 0.0000 |
| PRF | Peak rate adjustment factor for sediment routing in the main channel | 1.0000 |
| SPCON | Linear parameter for calculating the maximum amount of sediment that can be reentrained during channel sediment routing | 0.0001 |
| SPEXP | Exponent parameter for calculating sediment reentrained in channel sediment routing | 1.0000 |

After the calibration of the SWAT model, the six different interpolated precipitation datasets with three different reference ranges for each (a total of eighteen interpolated precipitation datasets) were used to assess the performance of interpolated precipitation data for hydrologic model simulation. Streamflow simulations were done for three years, from 2008 to 2010. To evaluate the model performance with the different interpolated precipitation datasets, this study used $E_{NS}$ (Nash-Sutcliffe coefficient), $R$-square (coefficient of determination), and RMSE (root mean square error). Table 5 and Figure 4 show that the simulation results from $k$nn-regression exhibit low SWAT simulation performance for streamflow estimations, with 0.54 $E_{NS}$, 0.74 $R$-square, and 23.78 m³/s RMSE as an average. All of the kernel functions, on the other hand, exhibit good performance for hydrologic simulations with interpolated precipitation data (Table 5 and Figure 4): the averages of $E_{NS}$, $R$-square, and RMSE (1) for Epanechnikov are 0.83, 0.86, and 14.03 m³/s, (2) for Quartic are 0.84, 0.88, and 13.03 m³/s, (3) for Triweight are 0.93, 0.93, and 9.30 m³/s, (4) for Tricube are 0.94, 0.95, and 8.13 m³/s, and (5) for Cosine are 0.93, 0.94, and 9.00 m³/s, respectively.

Table 5: Details of simulation results with six different precipitation infilling methods in Imha watershed.

| Method | 4-NN $E_{NS}$ | 4-NN $R^2$ | 4-NN RMSE | 6-NN $E_{NS}$ | 6-NN $R^2$ | 6-NN RMSE | 8-NN $E_{NS}$ | 8-NN $R^2$ | 8-NN RMSE |
|---|---|---|---|---|---|---|---|---|---|
| Ep | 0.80 | 0.83 | 15.32 | 0.91 | 0.92 | 10.60 | 0.78 | 0.82 | 16.16 |
| Qu | 0.73 | 0.78 | 17.83 | 0.91 | 0.93 | 10.48 | 0.88 | 0.92 | 11.80 |
| Tw | 0.91 | 0.91 | 10.56 | 0.93 | 0.94 | 9.25 | 0.95 | 0.95 | 8.10 |
| Tc | 0.95 | 0.95 | 7.72 | 0.93 | 0.94 | 9.03 | 0.95 | 0.95 | 7.64 |
| Co | 0.93 | 0.94 | 8.83 | 0.95 | 0.95 | 7.72 | 0.91 | 0.93 | 10.44 |
| Reg | 0.69 | 0.80 | 19.14 | 0.21 | 0.65 | 30.71 | 0.71 | 0.73 | 21.48 |
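The three performance measures used above can be written compactly. The sketch below uses the usual textbook definitions (Nash-Sutcliffe efficiency, squared Pearson correlation for $R$-square, and root mean square error); these definitions are an assumption, since the text names the measures without stating their formulas, and the flow values are made up purely for illustration.

```python
import math


def nash_sutcliffe(obs, sim):
    """E_NS = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov * cov / (var_obs * var_sim)


def rmse(obs, sim):
    """Root mean square error, in the units of the flow series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


# Hypothetical observed and simulated streamflow (m^3/s), not data from the study.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 39.0]
print(nash_sutcliffe(observed, simulated), r_squared(observed, simulated), rmse(observed, simulated))
```

An $E_{NS}$ of 1 indicates a perfect match, while values near or below zero mean the simulation is no better than the observed mean, which helps put the 0.21 obtained for 6-NN regression in Table 5 into perspective.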

5. Conclusions

Five different kernel functions were applied to the Imha watershed to evaluate the performance of each weighted method for estimating missing precipitation data, and the use of interpolated data for hydrologic simulations was assessed. The following conclusions can be drawn from this research.

(1) To estimate missing precipitation data points, exploratory procedures should consider the spatiotemporal variations of precipitation. Because it is difficult to account for these variations, statistical methods for estimating missing precipitation data are commonly used.

(2) Although ANNs are an advanced approach for estimating missing data, their mechanisms are unclear because the neuron system is ultimately a black-box model. Thus, regression methods are widely used for estimating missing data, even though they are limited in that the estimates may not follow a normal distribution when the sample is small.

(3) When kernel functions are used as a weighting method, the estimated missing data would satisfy normal distribution, which is more statistically sound. Also, in terms of mean value, kernel methods can overcome the weakness of $k$nn-regression if the data have outliers and/or a nonlinear trend around the missing data points.

(4) This study assessed the five kernel functions Epanechnikov, Quartic, Triweight, Tricube, and Cosine as weights for predicting missing values. In comparison with the $k$nn-regression method, this study demonstrates that the kernel approaches provide higher quality interpolated precipitation data. In addition, the kernel function results conform better to statistical standards.

(5) Furthermore, higher quality interpolated precipitation data result in better performance for hydrologic simulations, as exemplified in this study. All of the statistical analyses of the streamflow simulations showed that the simulations using the interpolated precipitation data from the kernel functions provide better results than those using $k$nn-regression.

(6) Use of a kernel distribution is a more effective method than regression when the precipitation data have an upward or downward trend. However, if the precipitation data have a nonlinear trend, it is difficult to effectively reconstruct the missing values. For further research, a time series analysis or a random walk model using a stochastic process is a possible method by which to estimate missing data where there is a nonlinear trend.

Appendices

A. Kernel Functions

Kernel density estimation is an unsupervised learning procedure, which historically precedes kernel regression. It also leads naturally to a simple family of procedures for nonparametric classification.

A.1. Kernel Density Estimation. Suppose we have a random sample $x_1, x_2, \ldots, x_N$ drawn from a probability density $f_x(x)$, and we wish to estimate $f_x$ at a point $x_0$. For simplicity we assume for now that $x \in R$ (real value). Arguing as before, a natural local estimate has the form

$$\hat{f}_x(x_0) = \frac{\#\{x_i \in N(x_0)\}}{N\lambda}, \tag{A.1}$$

where $\#\{x_i \in N(x_0)\}$ means the number of $x_i$ that fall in $N(x_0)$ and $N(x_0)$ is a small metric neighborhood around $x_0$ of width $\lambda$.

Figure 4: Scatter plots of observed versus simulated flow for SWAT simulation with (a) 4-NN, (b) 6-NN, and (c) 8-NN (EP, QU, TW, TC, CO, and KNN represent Epanechnikov, Quartic, Triweight, Tricube, Cosine, and $K$NN-regression, resp.).

This estimate is bumpy, and the smooth Parzen estimate is preferred:

$$\hat{f}_x(x_0) = \frac{1}{N\lambda}\sum_{i=1}^{N} K_\lambda(x_0, x_i), \tag{A.2}$$

because it counts observations close to $x_0$ with weights that decrease with distance from $x_0$. In this case a popular choice for $K_\lambda$ is the Gaussian kernel $K_\lambda(x_0, x_i) = \phi(|x_i - x_0|/\lambda)$. Letting $\phi_\lambda$ denote the Gaussian density with mean zero and standard deviation $\lambda$, (A.2) has the form

$$\hat{f}_x(x) = \frac{1}{N}\sum_{i=1}^{N}\phi_\lambda(x - x_i) = (\hat{F} \ast \phi_\lambda)(x), \tag{A.3}$$

the convolution of the sample empirical distribution $\hat{F}$ with $\phi_\lambda$. The distribution $\hat{F}(x)$ puts mass $1/N$ at each of the observed $x_i$ and is jumpy; in $\hat{f}_x(x)$ we have smoothed $\hat{F}$ by adding independent Gaussian noise to each observation $x_i$.

The Parzen density estimate is the equivalent of the local average, and improvements have been proposed along the lines of local regression (on the log scale for densities). We will not pursue these here. In $R^p$ the natural generalization of the Gaussian density estimate amounts to using the Gaussian product kernel in (A.3):

$$\hat{f}_x(x_0) = \frac{1}{N\left(2\lambda^2\pi\right)^{p/2}}\sum_{i=1}^{N} e^{-\frac{1}{2}\left(\|x_i - x_0\|/\lambda\right)^2}. \tag{A.4}$$
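The one-dimensional Gaussian Parzen estimate in (A.3) can be sketched as follows. The function name `gaussian_parzen`, the sample values, and the bandwidth are all hypothetical choices made for illustration; the only formula used is the Gaussian density with standard deviation $\lambda$.

```python
import math


def gaussian_parzen(x0, sample, lam):
    """Evaluate the Parzen estimate (A.3) at x0 with a Gaussian kernel of width lam."""
    norm = 1.0 / (lam * math.sqrt(2.0 * math.pi))
    total = sum(norm * math.exp(-0.5 * ((x0 - xi) / lam) ** 2) for xi in sample)
    return total / len(sample)


# Hypothetical daily precipitation values (mm) and bandwidth.
sample = [0.0, 1.5, 2.0, 6.0, 9.1]
lam = 1.0

# A crude Riemann sum over a wide grid shows that the estimate integrates to about 1.
step = 0.01
grid = [-10.0 + step * i for i in range(int(30.0 / step) + 1)]
area = sum(gaussian_parzen(x, sample, lam) for x in grid) * step
print(round(area, 3))
```

Each observation contributes one Gaussian bump of mass $1/N$, so the estimate is a proper density for any bandwidth; a larger $\lambda$ trades the bumpiness of (A.1) for more smoothing.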

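Table 6: Weighted values depending on day distance with each $K$NN.

| Kernel | 4-NN 1st | 4-NN 2nd | 6-NN 1st | 6-NN 2nd | 6-NN 3rd | 8-NN 1st | 8-NN 2nd | 8-NN 3rd | 8-NN 4th |
|---|---|---|---|---|---|---|---|---|---|
| Ep | 0.667 | 0.417 | 0.703 | 0.563 | 0.328 | 0.720 | 0.630 | 0.480 | 0.270 |
| Qu | 0.741 | 0.289 | 0.824 | 0.527 | 0.179 | 0.864 | 0.662 | 0.384 | 0.122 |
| Tw | 0.768 | 0.188 | 0.901 | 0.461 | 0.092 | 0.968 | 0.648 | 0.287 | 0.051 |
| Tc | 0.772 | 0.301 | 0.824 | 0.579 | 0.167 | 0.844 | 0.709 | 0.416 | 0.100 |
| Co | 0.680 | 0.393 | 0.726 | 0.555 | 0.301 | 0.747 | 0.635 | 0.462 | 0.243 |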

A.2. Kernel Density Classification. One can use nonparametric density estimates for classification in a straightforward fashion using Bayes' theorem. Suppose that for a $J$ class problem we fit nonparametric density estimates $\hat{f}_j(X)$, $j = 1, \ldots, J$, separately in each of the classes, and we also have estimates of the class priors $\hat{\pi}_j$ (usually the sample proportions). Then

$$\Pr(G = j \mid X = x_0) = \frac{\hat{\pi}_j \hat{f}_j(x_0)}{\sum_{k=1}^{J}\hat{\pi}_k \hat{f}_k(x_0)}. \tag{A.5}$$

In this region the data are sparse for both classes, and since the Gaussian kernel density estimates use metric kernels, the density estimates are low and of poor quality (high variance) in these regions. The local logistic regression method uses the tricube kernel with $k$-NN bandwidth; this effectively widens the kernel in this region and makes use of the local linear assumption to smooth out the estimate (on the logit scale).

If classification is the ultimate goal, then learning the separate class densities well may be unnecessary and can in fact be misleading. In learning the separate densities from data, one might decide to settle for a rougher, high-variance fit to capture these features, which are irrelevant for the purposes of estimating the posterior probabilities. In fact, if classification is the ultimate goal, then we need only to estimate the posterior well near the decision boundary (for two classes, this is the set $\{x \mid \Pr(G = 1 \mid X = x) = 1/2\}$).
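A minimal sketch of the Bayes-rule classifier in (A.5) is given below. It assumes one-dimensional Gaussian kernel density estimates per class and sample proportions as class priors; the function names, the class labels "dry" and "wet", and the sample values are hypothetical and serve only to show the mechanics.

```python
import math


def gaussian_kde(x0, sample, lam):
    """Gaussian kernel density estimate at x0 with bandwidth lam."""
    norm = 1.0 / (lam * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x0 - xi) / lam) ** 2) for xi in sample) / len(sample)


def posterior(x0, samples_by_class, lam):
    """Posterior class probabilities at x0 following (A.5)."""
    total = sum(len(sample) for sample in samples_by_class.values())
    scores = {}
    for label, sample in samples_by_class.items():
        prior = len(sample) / total              # class prior as sample proportion
        scores[label] = prior * gaussian_kde(x0, sample, lam)
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


# Two hypothetical classes, mirror images of each other around x = 2.5.
classes = {"dry": [0.0, 1.0], "wet": [4.0, 5.0]}
at_boundary = posterior(2.5, classes, lam=1.0)
near_dry = posterior(0.0, classes, lam=1.0)
print(at_boundary["dry"], near_dry["dry"])
```

By symmetry the posterior at $x = 2.5$ is one half for each class, which is the decision boundary described above, while a point near the "dry" sample is assigned to that class with high probability.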

B. Procedures of Missing Precipitation

This appendix shows an example calculation of the kernel-weighted mean, that is, of the weight applied in each situation. Because the kernel functions are all symmetric, the same weight is used on both sides of the missing value for a given day distance. Table 6 lists the weights for the 1st, 2nd, 3rd, and 4th day distances. For example, to estimate the missing precipitation for 2010-02-12 (the actual value is 6) with the 4-NN Epanechnikov kernel, the following three steps are used (Table 7).

Step 1. Select the date of the target interpolation data.

Step 2. Decide the $K$th nearest days' precipitation and each kernel weight.

Step 3. Calculate the weighted average to estimate the missing value.

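Table 7: Example calculation to interpolate for missing precipitation.

| Date (Step 1) | Prec (Step 2) | Weight (Step 2) | Prec · Weight (Step 3) | Estimation (Step 3) |
|---|---|---|---|---|
| 2010-02-10 | 15 | 0.417 | 6.255 | |
| 2010-02-11 | 17.2 | 0.667 | 11.472 | |
| 2010-02-12 | 6 | - | - | 5.949 |
| 2010-02-13 | 9.1 | 0.667 | 6.070 | |
| 2010-02-14 | 0 | 0.417 | 0 | |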

Table 8: Calculating missing precipitation for 2010-02-12 (actual value is 6) with six different methods.

| Method | 4NN | 6NN | 8NN |
|---|---|---|---|
| Ep | 5.949 | 5.172 | 4.862 |
| Qu | 5.956 | 5.302 | 4.936 |
| Tw | 5.755 | 5.294 | 4.952 |
| Tc | 6.205 | 5.407 | 4.963 |
| Co | 5.945 | 5.197 | 4.876 |
| Reg | 10.325 | 8.967 | 8.813 |

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)

The results of the rest of the kernel methods for estimating the missing precipitation are given in Table 8.

C. Sample Calculations with Real Value

This section shows how to calculate missing precipitation with the kernel-weighted mean function using specific numbers. The sample was selected at random from the daily data for 2008 to 2010 with a probability of 0.02. After the data were selected, the data locations were set. Zhang et al. [39] reported that kernel-based nonparametric multiple imputation performs better than general linear regression when the sample data are small or limited.

Table 9 shows the kernel weight procedure for each function. We used the data from Feb. 10 to Feb. 14, 2010, to estimate the missing value for Feb. 12, 2010. For the farthest data points (a distance of two days), the Epanechnikov kernel gives the highest weight, 0.417, whereas the Triweight kernel gives the lowest weight, 0.188. For the nearest data points, the highest weight is given by the Tricube kernel and the lowest by the Epanechnikov kernel. Generally, Tricube, which has a high weight, overestimates the missing precipitation.

Table 9: Sample calculation with certain number.

(a) Epanechnikov

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Ep weight | 0.417 | 0.667 | - | 0.667 | 0.417 |
| Prec · weight | 6.26 | 11.47 | - | 6.07 | 0.00 |
| Estimation | | | 5.95 | | |

(b) Quartic

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Qu weight | 0.289 | 0.741 | - | 0.741 | 0.289 |
| Prec · weight | 4.34 | 12.75 | - | 6.74 | 0.00 |
| Estimation | | | 5.96 | | |

(c) Triweight

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Tw weight | 0.188 | 0.768 | - | 0.768 | 0.188 |
| Prec · weight | 2.82 | 13.21 | - | 6.99 | 0.00 |
| Estimation | | | 5.75 | | |

(d) Tricube

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Tc weight | 0.301 | 0.772 | - | 0.772 | 0.301 |
| Prec · weight | 4.52 | 13.28 | - | 7.03 | 0.00 |
| Estimation | | | 6.20 | | |

(e) Cosine

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Co weight | 0.393 | 0.680 | - | 0.680 | 0.393 |
| Prec · weight | 5.90 | 11.70 | - | 6.19 | 0.00 |
| Estimation | | | 5.94 | | |
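The 4-NN estimates in Tables 8 and 9 can be reproduced approximately with the short sketch below. The Epanechnikov, Quartic, and Triweight kernels follow equations (1) to (3); the Tricube and Cosine kernels are written here in their standard textbook forms, $\frac{70}{81}(1 - |u|^3)^3$ and $\frac{\pi}{4}\cos(\frac{\pi}{2}u)$, which is an assumption on our part, although these forms do reproduce the weights listed in Table 6. The dictionary and variable names are hypothetical.

```python
import math

# Kernel functions keyed by the abbreviations used in the tables.
KERNELS = {
    "Ep": lambda u: 0.75 * (1.0 - u ** 2),
    "Qu": lambda u: 15.0 / 16.0 * (1.0 - u ** 2) ** 2,
    "Tw": lambda u: 35.0 / 32.0 * (1.0 - u ** 2) ** 3,
    "Tc": lambda u: 70.0 / 81.0 * (1.0 - abs(u) ** 3) ** 3,       # assumed standard form
    "Co": lambda u: math.pi / 4.0 * math.cos(math.pi * u / 2.0),  # assumed standard form
}

# Precipitation on 2/10, 2/11, 2/13, and 2/14 (mm) and their day offsets from 2/12.
values = [15.0, 17.2, 9.1, 0.0]
offsets = [-2, -1, 1, 2]
p = len(values)
scale = 0.5 * p + 1          # denominator of u_i in equation (6)

estimates = {
    name: sum(x * kernel(n / scale) for x, n in zip(values, offsets)) / p
    for name, kernel in KERNELS.items()
}
# The regression value in Table 8 equals the plain mean of the four neighbors.
estimates["Reg"] = sum(values) / p

for name, value in estimates.items():
    print(name, round(value, 3))
```

The kernel estimates agree with the 4NN column of Table 8 to within a few thousandths of a millimeter, the remaining gap coming from the three-decimal rounding of the tabulated weights. All five kernel estimates fall close to the observed 6 mm, whereas the unweighted mean gives 10.325 mm.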

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] K. Kang and V. Merwade, "Development and application of a storage-release based distributed hydrologic model using GIS," Journal of Hydrology, vol. 403, no. 1-2, pp. 1-13, 2011.

[2] A. J. Abebe, D. P. Solomatine, and R. G. W. Venneker, "Application of adaptive fuzzy rule-based models for reconstruction of missing precipitation events," Hydrological Sciences Journal, vol. 45, no. 3, pp. 425-436, 2000.

[3] P. Ramos-Calzado, J. Gomez-Camacho, F. Perez-Bernal, and M. F. Pita-Lopez, "A novel approach to precipitation series completion in climatological datasets: application to Andalusia," International Journal of Climatology, vol. 28, no. 11, pp. 1525-1534, 2008.

[4] T. Schneider, "Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values," Journal of Climate, vol. 14, no. 5, pp. 853-871, 2001.

[5] S. K. Regonda, D.-J. Seo, B. Lawrence, J. D. Brown, and J. Demargne, "Short-term ensemble stream forecasting using operationally-produced single-valued streamflow forecasts - a Hydrologic Model Output Statistics (HMOS) approach," Journal of Hydrology, vol. 497, pp. 80-96, 2013.

[6] P. Coulibaly and N. D. Evora, "Comparison of neural network methods for infilling missing daily weather records," Journal of Hydrology, vol. 341, no. 1-2, pp. 27-41, 2007.

[7] H. A. El Sharif and R. S. V. Teegavarapu, "Evaluation of spatial interpolation methods for missing precipitation data: preservation of spatial statistics," in Proceedings of the World Environmental and Water Resources Congress, pp. 3822-3832, May 2012.

[8] A. Bardossy and G. Pegram, "Infilling missing precipitation records - a comparison of a new copula-based method with other techniques," Journal of Hydrology, vol. 519, pp. 1162-1170, 2014.

[9] K. Schamm, M. Ziese, A. Becker et al., "Global gridded precipitation over land: a description of the new GPCC first guess daily product," Earth System Science Data, vol. 6, no. 1, pp. 49-60, 2014.

[10] R. S. V. Teegavarapu, "Statistical corrections of spatially interpolated missing precipitation data estimates," Hydrological Processes, vol. 28, no. 11, pp. 3789-3808, 2014.

[11] Y. Da and G. Xiurun, "An improved PSO-based ANN with simulated annealing technique," Neurocomputing, vol. 63, pp. 527-533, 2005.

[12] A. di Piazza, F. L. Conti, L. V. Noto, F. Viola, and G. La Loggia, "Comparative analysis of different techniques for spatial interpolation of rainfall data to create a serially complete monthly time series of precipitation for Sicily, Italy," International Journal of Applied Earth Observation and Geoinformation, vol. 13, no. 3, pp. 396-408, 2011.

[13] E. Pisoni, F. Pastor, and M. Volta, "Artificial Neural Networks to reconstruct incomplete satellite data: application to the Mediterranean Sea Surface Temperature," Nonlinear Processes in Geophysics, vol. 15, no. 1, pp. 61-70, 2008.

[14] V. Sharma, S. Rai, and A. Dev, "A comprehensive study of artificial neural networks," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 10, pp. 278-284, 2012.

[15] ASCE Task Committee, "Artificial neural networks in hydrology II: hydrologic applications," Journal of Hydrological Engineering, vol. 5, no. 2, pp. 124-137, 2000.

[16] ASCE Task Committee, "Artificial neural networks in hydrology I: preliminary concepts," Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 115-123, 2000.

[17] D. E. Rumelhart, B. Widrow, and M. A. Lehr, "The basic ideas in neural networks," Communications of the ACM, vol. 37, no. 3, pp. 87-92, 1994.

[18] J. Amorocho and W. E. Hart, "A critique of current methods in hydrologic systems investigation," Transactions of the American Geophysical Union, vol. 45, no. 2, pp. 307-321, 1964.

[19] A. W. Minns and M. J. Hall, "Artificial neural networks as rainfall-runoff models," Hydrological Sciences Journal, vol. 41, no. 3, pp. 399-417, 1996.

[20] K. Kang and V. Merwade, "The effect of spatially uniform and non-uniform precipitation bias correction methods on improving NEXRAD rainfall accuracy for distributed hydrologic modeling," Hydrology Research, vol. 45, no. 1, pp. 23-42, 2014.

[21] C. Daly, W. P. Gibson, G. H. Taylor, G. L. Johnson, and P. Pasteris, "A knowledge-based approach to the statistical mapping of climate," Climate Research, vol. 22, no. 2, pp. 99-113, 2002.

[22] J. D. Creutin, H. Andrieu, and D. Faure, "Use of a weather radar for the hydrology of a mountainous area. Part II: radar measurement validation," Journal of Hydrology, vol. 193, no. 1-4, pp. 26-44, 1997.

[23] Y. L. Xia, P. Fabian, A. Stohl, and M. Winterhalter, "Forest climatology: estimation of missing values for Bavaria, Germany," Agricultural and Forest Meteorology, vol. 96, no. 1-3, pp. 131-144, 1999.

[24] C. J. Willmott, S. M. Robeson, and J. J. Feddema, "Estimating continental and terrestrial precipitation averages from rain-gauge networks," International Journal of Climatology, vol. 14, no. 4, pp. 403-414, 1994.

[25] R. S. V. Teegavarapu and V. Chandramouli, "Improved weighting methods, deterministic and stochastic data-driven models for estimation of missing precipitation records," Journal of Hydrology, vol. 312, no. 1-4, pp. 191-206, 2005.

[26] J. A. Smith, "Precipitation," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 3, chapter 3, McGraw Hill, New York, NY, USA, 1993.

[27] J. R. Simanton and H. B. Osborn, "Reciprocal-distance estimate of point rainfall," Journal of Hydraulic Engineering Division, vol. 106, no. 7, pp. 1242-1246, 1980.

[28] J. D.-J. Salas, "Analysis and modeling of hydrological time series," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 19, chapter 19, pp. 19.1-19.72, McGraw-Hill, New York, NY, USA, 1993.

[29] S. J. Jeffrey, J. O. Carter, K. B. Moodie, and A. R. Beswick, "Using spatial interpolation to construct a comprehensive archive of Australian climate data," Environmental Modelling and Software, vol. 16, no. 4, pp. 309-330, 2001.

[30] M. Franklin, V. R. Kotamarthi, M. L. Stein, and D. R. Cook, "Generating data ensembles over a model grid from sparse climate point measurements," Journal of Physics: Conference Series, vol. 125, Article ID 012019, 2008.

[31] K. C. Young, "A three-way model for interpolating for monthly precipitation values," Monthly Weather Review, vol. 120, no. 11, pp. 2561-2569, 1992.

[32] F. Filippini, G. Galliani, and L. Pomi, "The estimation of missing meteorological data in a network of automatic stations," Transactions on Ecology and the Environment, vol. 4, pp. 283-291, 1994.

[33] W. P. Lowry, Compendium of Lecture Notes in Climatology for Class IV Meteorological Personnel, Secretariat of the World Meteorological Organization, Geneva, Switzerland, 1972.

[34] M. C. Acock and Y. A. Pachepsky, "Estimating missing weather data for agricultural simulations using group method of data handling," Journal of Applied Meteorology, vol. 39, no. 7, pp. 1176-1184, 2000.

[35] S. L. Neitsch, J. G. Arnold, J. R. Kiniry, J. R. Williams, and K. W. King, Soil and Water Assessment Tool - Theoretical Documentation (Version 2005), Texas Water Resource Institute, College Station, Tex, USA, 2005.

[36] S. S. Shapiro and M. B. Wilk, "An analysis of variance test for normality: complete samples," Biometrika, vol. 52, no. 3-4, pp. 591-611, 1965.

[37] M. Friedman, "The use of ranks to avoid the assumption of normality implicit in the analysis of variance," Journal of the American Statistical Association, vol. 32, no. 200, pp. 675-701, 1937.

[38] F. Wilcoxon, "Individual Comparisons by Ranking Methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80-83, 1945.

[39] S. Zhang, Z. Jin, X. Zhu, and J. Zhang, "Missing data analysis: a kernel-based multi-imputation approach," in Transactions on Computational Science III, vol. 5300 of Lecture Notes in Computer Science, pp. 122-142, Springer, Berlin, Germany, 2009.


Figure 1: Study basin locations including rain and stream gauges ((a) left figure: map of South Korea; (b) right figure: Imha watershed, showing the precipitation gauges and the streamflow gauge).

3. Methodology

This study used the five kernel functions Epanechnikov, Quartic, Triweight, Tricube, and Cosine as weights to predict missing values. The Tricube method places a large weight around the target point. Even though the Tricube weight is similar to the Triweight weight, it decreases less rapidly with distance from the target point than the Triweight weight does. The next highest weight around the target point is given by Quartic, whose rate of decrease in weight is similar to that of Triweight. Both Epanechnikov and Cosine have a small effect on neighboring values. A brief description of the five kernel functions and their application for reconstructing the missing values is presented in the following, and specific kernel functions are described in Appendix A.

3.1. Epanechnikov. The Epanechnikov kernel is the most often used kernel function. The Epanechnikov kernel assigns zero weight to observations that are a distance of four, six, and eight away from the reference point. These values correspond to the choice of the interval width, which is often called the choice of smoothing parameter or bandwidth selection. The main characteristic of the Epanechnikov kernel is that its estimation is smooth even when the distance from the target value, namely, the missing value in this research, is large. A brief description is given by the following:

$$K(x) = \frac{3}{4}\left(1 - x^{2}\right), \tag{1}$$

where $K(x)$ is the kernel function and $x$ is the independent variable, namely, the position of a surrounding nearest value relative to the target point.

3.2. Quartic. The second kernel function used in this research was the Quartic kernel, whose weight is more sensitive to the distance from the missing value. Since the applied weight differs greatly between near and far data points, the estimate is more influenced by the surrounding data. The kernel consists of a fourth-order equation, which is more sensitive to distance than a second-order equation. It is described by the following:

$$K(x) = \frac{15}{16}\left(1 - x^{2}\right)^{2}. \tag{2}$$

3.3. Triweight. The third kernel function used in this research was the Triweight kernel, which consists of a sixth-order equation. It is highly sensitive to distance because a sixth-order equation estimates the missing value based on the difference in distance with a weighted function, as shown by the following:

$$K(x) = \frac{35}{32}\left(1 - x^{2}\right)^{3}. \tag{3}$$

3.4. Tricube. The fourth kernel function used in this research was the Tricube kernel, which uses absolute values. Since it uses absolute values, it presents a smoother pattern for the nearest values than the Triweight kernel. However, as the values move further away from the nearest values, it shows a steep trend. The Tricube kernel has the most sensitivity in terms of weighted distance because it consists of a ninth-order equation, as shown in the following:

$$K(x) = \frac{70}{81}\left(1 - |x|^{3}\right)^{3}. \tag{4}$$

Table 1: Results of normality test with the Shapiro-Wilk method for each $K$-nearest neighborhood. DF represents degrees of freedom and $P$ value means significance probability.

| Method | $W$ (4-NN) | DF (4-NN) | $P$ value (4-NN) | $W$ (6-NN) | DF (6-NN) | $P$ value (6-NN) | $W$ (8-NN) | DF (8-NN) | $P$ value (8-NN) |
|---|---|---|---|---|---|---|---|---|---|
| Ep | 0.808 | 19 | 0.0015 | 0.740 | 19 | 0.0002 | 0.766 | 19 | 0.0004 |
| Qu | 0.831 | 19 | 0.0033 | 0.768 | 19 | 0.0004 | 0.721 | 19 | 0.0001 |
| Tw | 0.827 | 19 | 0.0029 | 0.789 | 19 | 0.0008 | 0.745 | 19 | 0.0002 |
| Tc | 0.839 | 19 | 0.0045 | 0.764 | 19 | 0.0004 | 0.742 | 19 | 0.0002 |
| Co | 0.817 | 19 | 0.0020 | 0.742 | 19 | 0.0002 | 0.763 | 19 | 0.0003 |
| Reg | 0.876 | 19 | 0.0186 | 0.858 | 19 | 0.0089 | 0.883 | 19 | 0.0242 |

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)

3.5. Cosine. The fifth kernel function used in this research was the Cosine kernel function. It is widely applied in various fields because it has a constant curvature. Its shape is similar to that of the Epanechnikov kernel even though it uses a cosine function, as shown in the following:

$$K(x) = \frac{\pi}{4}\cos\left(\frac{\pi}{2}x\right). \tag{5}$$
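The five kernels in (1) to (5) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the `KERNELS` dictionary, and `kernel_weight` are hypothetical, and the zero weight outside $|x| \le 1$ is an assumption based on the usual compact support of these kernels.

```python
import math


def epanechnikov(x):
    """Equation (1): K(x) = 3/4 (1 - x^2)."""
    return 0.75 * (1.0 - x * x)


def quartic(x):
    """Equation (2): K(x) = 15/16 (1 - x^2)^2."""
    return (15.0 / 16.0) * (1.0 - x * x) ** 2


def triweight(x):
    """Equation (3): K(x) = 35/32 (1 - x^2)^3."""
    return (35.0 / 32.0) * (1.0 - x * x) ** 3


def tricube(x):
    """Equation (4): K(x) = 70/81 (1 - |x|^3)^3."""
    return (70.0 / 81.0) * (1.0 - abs(x) ** 3) ** 3


def cosine(x):
    """Equation (5): K(x) = pi/4 cos(pi x / 2)."""
    return (math.pi / 4.0) * math.cos(math.pi * x / 2.0)


# Hypothetical lookup table, used only for this illustration.
KERNELS = {
    "epanechnikov": epanechnikov,
    "quartic": quartic,
    "triweight": triweight,
    "tricube": tricube,
    "cosine": cosine,
}


def kernel_weight(name, x):
    """Return the kernel weight at scaled distance x.

    Assumption: the weight is zero outside |x| <= 1 (compact support).
    """
    if abs(x) > 1.0:
        return 0.0
    return KERNELS[name](x)


if __name__ == "__main__":
    for name in KERNELS:
        near = kernel_weight(name, 1.0 / 3.0)
        far = kernel_weight(name, 2.0 / 3.0)
        print(f"{name:13s} {near:.3f} {far:.3f}")
```

Evaluating `kernel_weight` at $x = 1/3$ and $x = 2/3$ gives, after rounding to three decimals, the 4-NN weights listed in Table 6 (for example, 0.667 and 0.417 for Epanechnikov).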

3.6. Calculation of the Missing Value. After a kernel function is used to calculate the weights, the missing data are estimated using the following:

$$M = \frac{1}{P}\sum_{i=1}^{P} x_i \cdot K(u_i), \qquad u_i = \frac{N_i}{0.5P + 1}, \qquad N_i = -\frac{P}{2}, \ldots, \frac{P}{2}, \tag{6}$$

where $M$ is the missing value, $P$ is the number of nearest neighbors, $x_i$ is the $i$th neighboring value, and $u_i$ is the scaled position of the $N_i$th nearest value, which corresponds to $x_i$ (positive means the right side and negative means the left side). The position $N_i = 0$ is the missing value itself and is not used. The kernel function should have bilateral symmetry about zero. If, for example, the four nearest neighbors are used for estimating the missing value, two of the neighboring values are taken from the right side and the other two from the left side. The specific equation for this example is shown in the following, and an example calculation is described in Appendix B.

$$M = \frac{1}{4}\left[K\left(-\frac{2}{3}\right)\cdot x_1 + K\left(-\frac{1}{3}\right)\cdot x_2 + K\left(\frac{1}{3}\right)\cdot x_3 + K\left(\frac{2}{3}\right)\cdot x_4\right]. \tag{7}$$
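Equations (6) and (7) can be sketched as a short Python function. This is an illustrative sketch under stated assumptions, not the authors' code: the name `estimate_missing` and its argument layout (`left` and `right` lists of equal length) are hypothetical, and the Epanechnikov kernel is used only as an example weight.

```python
def epanechnikov(u):
    """Equation (1): K(u) = 3/4 (1 - u^2)."""
    return 0.75 * (1.0 - u * u)


def estimate_missing(left, right, kernel=epanechnikov):
    """Estimate a missing value from P = len(left) + len(right) neighbors.

    left  : values before the gap, ordered oldest to newest
            (so left[-1] is the nearest value on the left side).
    right : values after the gap, ordered nearest to farthest.
    Assumption: the same number of neighbors is taken from each side,
    as in the four-neighbor example of equation (7).
    """
    if len(left) != len(right) or not left:
        raise ValueError("left and right must be non-empty and equal in length")
    half = len(left)
    p = 2 * half
    scale = 0.5 * p + 1.0  # denominator of u_i in equation (6)
    total = 0.0
    for n in range(1, half + 1):
        total += kernel(-n / scale) * left[-n]      # N_i = -n (left side)
        total += kernel(n / scale) * right[n - 1]   # N_i = +n (right side)
    return total / p


if __name__ == "__main__":
    # Neighboring precipitation values from the worked example in Appendix B.
    print(round(estimate_missing([15.0, 17.2], [9.1, 0.0]), 2))
```

With the Appendix B values, the unrounded kernel weights give about 5.95, which agrees with the tabulated 5.949 up to the rounding of the weights to three decimals. Note that (6) divides by $P$ rather than by the sum of the weights, and the sketch follows that choice.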

3.7. Statistic Tests. A normality test is required to evaluate the methods for filling in interpolation data. The Shapiro-Wilk [36] normality test was used with nineteen samples to determine whether the average difference is normally distributed or not. The test statistic is as shown in the following:

$$W = \frac{\left(\sum_{i=1}^{n} a_i y_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}, \tag{8}$$

where $y_i$ is the $i$th order statistic, namely, the $i$th smallest value in the sample, $\bar{y}$ is the mean of $y_i$, and $a_i$ is a constant determined from the ordered data. The null hypothesis of the Shapiro-Wilk normality test is that the sample is normally distributed; if the significance probability is less than 5%, the null hypothesis is rejected, meaning that the sample does not satisfy a normal distribution. Since the significance probability for the entire group (Table 1) is below 5%, the null hypothesis is rejected. This study therefore uses nonparametric tests for the subsequent analysis.

The Friedman test [37], which is a kind of $k$-sample test that can identify differences between paired values, was selected as the nonparametric test. This method evaluates a small sample for differences by ranking the values in sequence. The null hypothesis of the Friedman test is that there is no average difference among the groups; if the significance probability is less than 5%, the null hypothesis is rejected, leading to the conclusion that an average difference exists among the groups. A brief description of the Friedman test is given in the following:

$$Q = \frac{\mathrm{SS}_t}{\mathrm{SS}_e}, \tag{9}$$

where $\mathrm{SS}_t$ and $\mathrm{SS}_e$ are the sum of the squared treatment and the sum of the squared error, respectively.

The null hypothesis in this instance was rejected because the significance probability was less than 5% in each case, and this study concluded that the interpolation methods differ on average, which is why each method is considered independent even though this study used five different kernel methods. For example, for four reference points, the average rank decreases in the order $k$nn-regression, Tricube, Quartic, Cosine, Triweight, and Epanechnikov (Table 2). For six reference points, the order is $k$nn-regression, Tricube, Triweight, Quartic, Cosine, and Epanechnikov (Table 2). For eight reference points, the order is $k$nn-regression, Triweight, Tricube, Quartic, Cosine, and Epanechnikov (Table 2). As shown in Table 2, the $k$nn-regression has the largest average rank and Epanechnikov has the smallest average rank for all of the reference point cases. This result indicates the dissimilarity of these methods.

Table 2: Chi-square ($\chi^2$) test with the Friedman method for finding differences among six infilling methods. SD represents standard deviation; $P$ value means significance probability.

(a) 4-NN ($\chi^2$ = 55.602, $P$ value = 0.0000)

| Method | $N$ | Mean | SD | Min | Max | Mean rank |
|---|---|---|---|---|---|---|
| Ep | 19 | -1.29 | 4.46 | -15.50 | 5.76 | 2.53 |
| Qu | 19 | -1.42 | 5.03 | -15.06 | 7.51 | 2.74 |
| Tw | 19 | -1.64 | 5.37 | -14.90 | 8.13 | 2.58 |
| Tc | 19 | -1.18 | 5.06 | -14.87 | 8.29 | 4.47 |
| Co | 19 | -1.40 | 4.69 | -15.42 | 6.08 | 2.68 |
| Reg | 19 | 2.76 | 5.42 | -13.47 | 14.38 | 6.00 |

(b) 6-NN ($\chi^2$ = 66.519, $P$ value = 0.0000)

| Method | $N$ | Mean | SD | Min | Max | Mean rank |
|---|---|---|---|---|---|---|
| Ep | 19 | -2.61 | 4.72 | -16.68 | 1.58 | 1.53 |
| Qu | 19 | -2.27 | 4.71 | -16.20 | 2.82 | 3.16 |
| Tw | 19 | -2.18 | 4.84 | -15.89 | 4.06 | 3.79 |
| Tc | 19 | -2.15 | 4.63 | -16.20 | 2.84 | 4.21 |
| Co | 19 | -2.54 | 4.69 | -16.59 | 1.50 | 2.32 |
| Reg | 19 | 0.06 | 4.97 | -15.48 | 7.33 | 6.00 |

(c) 8-NN ($\chi^2$ = 75.812, $P$ value = 0.0000)

| Method | $N$ | Mean | SD | Min | Max | Mean rank |
|---|---|---|---|---|---|---|
| Ep | 19 | -3.40 | 5.04 | -17.33 | 1.94 | 1.32 |
| Qu | 19 | -3.10 | 4.75 | -16.90 | 0.45 | 3.21 |
| Tw | 19 | -2.74 | 4.79 | -16.59 | 1.29 | 4.58 |
| Tc | 19 | -2.93 | 4.77 | -16.96 | 1.51 | 3.68 |
| Co | 19 | -3.28 | 4.93 | -17.25 | 1.85 | 2.21 |
| Reg | 19 | -1.24 | 5.35 | -16.49 | 8.08 | 6.00 |

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)
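The mean ranks and the chi-square statistic of a Friedman test can be illustrated with a small pure-Python sketch. It uses the standard rank-sum form of the statistic, which is an assumption about how the tabulated values were obtained and may differ in notation from (9); ties receive average ranks and no tie correction is applied. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman(blocks):
    """Return (mean_ranks, chi_square) for a list of blocks.

    blocks : one row per sample (block), one column per method.
    Uses chi2 = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1),
    where R_j is the rank sum of method j, n the number of blocks,
    and k the number of methods.
    """
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) \
        - 3.0 * n * (k + 1)
    return mean_ranks, chi2


if __name__ == "__main__":
    # Toy data: three samples, three methods, identical ordering in every row.
    ranks, chi2 = friedman([[1.0, 2.0, 3.0], [0.5, 0.7, 0.9], [4.0, 5.0, 6.0]])
    print(ranks, chi2)
```

In this form, a method that always has the largest value in a block receives a mean rank equal to the number of methods, which matches the mean rank of 6.00 reported for regression among six methods in Table 2.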

To determine which methods are dissimilar to the others, this study performed the Wilcoxon signed rank test [38]. The basic feature of the Wilcoxon signed rank test is that data samples that come from the same population are paired; it is detailed in the following:

$$W = \left|\sum_{i=1}^{N}\left[\operatorname{sign}\left(y_{2i} - y_{1i}\right) R_i\right]\right|, \tag{10}$$

where $N$ is the sample size, $y_{2i}$ is the $i$th value of the second data point, $y_{1i}$ is the $i$th value of the first data point, and $R_i$ is the rank of $|y_{2i} - y_{1i}|$. If the significance probability associated with $W$ is less than 5%, a different mechanism is at work in the sample data or method. Table 3 shows that this probability for $k$nn-regression is less than 5% in all cases. Accordingly, this signifies that $k$nn-regression is completely dissimilar to the other methods. Although the five different kernel methods for data interpolation exhibit similarity or dissimilarity to each other depending on the number of reference points, all of the kernel methods can be distinguished from $k$nn-regression using the Wilcoxon signed rank test.
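The statistic in (10) can be sketched in pure Python as follows. The function name is hypothetical, zero differences are dropped (a common convention, assumed here), tied absolute differences share an average rank, and the standardized value uses the usual large-sample normal approximation without continuity or tie correction, which may differ from the software used in the study.

```python
import math


def wilcoxon_signed_rank(first, second):
    """Return (W, z) for paired samples.

    W is the absolute signed-rank sum of equation (10);
    z is the signed-rank sum divided by sqrt(n (n + 1) (2 n + 1) / 6).
    """
    # Paired differences y2 - y1, with exact zeros removed (assumption).
    diffs = [b - a for a, b in zip(first, second) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences; ties share the average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while (end + 1 < n
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1

    signed_sum = sum(math.copysign(r, d) for r, d in zip(ranks, diffs))
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 6.0)
    return abs(signed_sum), signed_sum / sigma


if __name__ == "__main__":
    # Toy data: 19 pairs whose differences are all negative and distinct.
    first = [float(i) for i in range(1, 20)]
    second = [0.0] * 19
    print(wilcoxon_signed_rank(first, second))
```

For 19 pairs whose differences all share one sign, this approximation gives a standardized value of magnitude about 3.823, the same magnitude as the statistic listed in Table 3; this is offered as a consistency check under the stated assumptions, not as a reconstruction of the authors' computation.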

4. Results

Since Epanechnikov has the smallest average rank, which signifies a small difference between the observed value and the interpolated value, for all reference points in Table 2, the interpolation data obtained from the Epanechnikov method have the best result among the studied methods. Figure 2 shows that the filled-in data from $k$nn-regression have a large difference at both four and six reference points. Interpolation data from the kernel methods are close to zero for both the average and median values at four reference points, meaning that the interpolation data are similar to the observation data. On the other hand, more than 75% of the interpolation data from $k$nn-regression exhibit a difference greater than zero. When the interpolation data are evaluated at six reference points in Figure 2, the median value from the $k$nn-regression is shown to be far away from zero. At eight reference points, $k$nn-regression is close to zero for both average and median values; however, it is difficult to conclude that this is an ideal method because outlying maximum values will affect the average and median value.

Table 3: Chi-square ($\chi^2$) test with the Wilcoxon signed rank method between regression and five different kernel methods.

| Panel | Reg versus | Ep | Qu | Tw | Tc | Co |
|---|---|---|---|---|---|---|
| (a) 4-NN | $\chi^2$ | -3.823 | -3.823 | -3.823 | -3.823 | -3.823 |
| (a) 4-NN | $P$ value | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| (b) 6-NN | $\chi^2$ | -3.823 | -3.823 | -3.823 | -3.823 | -3.823 |
| (b) 6-NN | $P$ value | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| (c) 8-NN | $\chi^2$ | -3.823 | -3.823 | -3.823 | -3.823 | -3.823 |
| (c) 8-NN | $P$ value | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)

Figure 2: Box plots for the difference between actual precipitation and interpolated precipitation for the six methods (EP, QU, TW, TC, CO, and RE): (a) 4-NN; (b) 6-NN; (c) 8-NN. The $y$-axis represents mm per day.

This study on precipitation data interpolation also evaluated the simulation of the interpolated data using the SWAT hydrologic model. In SWAT hydrologic modeling, the surface runoff is estimated by considering excess precipitation with abstractions and an infiltration factor through the Soil Conservation Service Curve Number (SCS-CN) method. The Green-Ampt (GA) infiltration method is another method to calculate the surface runoff in SWAT. A study shows that both methods give reasonable results, and no significant advantage is observed in using one over the other. However, the GA method appears to have more limitations in modeling seasonal variability than the SCS-CN method does. Hence, the SCS-CN method is used for the infiltration factor in this study. An SCS curve number based simulation needs information that is updated at each time step as the soil water content changes. The excess rainfall equation in the SCS-CN method was generated based on the historical relationship between the curve number and the hydrologic mechanism for over 20 years. Throughout the surface runoff calculation, infiltration should be updated over time according to the soil type. Other abstractions, such as evapotranspiration and soil and snow evaporation, are calculated by the Penman-Monteith method and meteorological statistics. Finally, the kinematic storage model is used to compute groundwater storage and seepage. Flow resulting from SWAT modeling is routed from the HRUs to the watershed outlet.

Figure 3 shows the calibration of the model simulation as the initial step, and the specific parameters are described in Table 4. After the calibration of the SWAT model, the six different interpolated precipitation datasets with three different reference ranges for each (a total of twenty-four interpolated precipitation data points) were used to assess the performance of interpolated precipitation data for hydrologic model simulation. Streamflow simulations were done for three years, from 2008 to 2010. To evaluate the model performance considering the use of different interpolated precipitation datasets, this study used $E_{\mathrm{NS}}$ (Nash-Sutcliffe coefficient), $R$-square (coefficient of determination), and RMSE (root mean square error). Table 5 and Figure 4 show that the simulation results from $k$nn-regression exhibit low SWAT simulation performance for streamflow estimations, with 0.54 $E_{\mathrm{NS}}$, 0.74 $R$-square, and 23.78 m³/s RMSE on average. All of the kernel functions, on the other hand, exhibit good performance for hydrologic simulations with interpolated precipitation data (Table 5 and Figure 4); the averages of $E_{\mathrm{NS}}$, $R$-square, and RMSE (1) for Epanechnikov are 0.83, 0.86, and 14.03 m³/s, (2) for Quartic are 0.84, 0.88, and 13.03 m³/s, (3) for Triweight are 0.93, 0.93, and 9.30 m³/s, (4) for Tricube are 0.94, 0.95, and 8.13 m³/s, and (5) for Cosine are 0.93, 0.94, and 9.00 m³/s, respectively.

Figure 3: Calibrated model result using original precipitation input (observed and simulated flow, 2008 to 2010). The $x$-axis represents time in days and the $y$-axis represents flow in cubic meters per second.

Table 4: Details of SWAT parameters which are related to the runoff mechanism for the Imha watershed.

| Parameter | Description | Selected value |
|---|---|---|
| ESCO | Soil evaporation compensation factor | 0.9500 |
| EPCO | Plant water uptake compensation factor | 1.0000 |
| EVLAI | Leaf area index at which no evaporation occurs from water surface [m²/m²] | 3.0000 |
| FFCB | Initial soil water storage expressed as a fraction of field capacity water content | 0.0000 |
| IEVENT | Rainfall/runoff code: 0 = daily rainfall/CN | 0.0000 |
| ICRK | Crack flow code: 1 = model crack flow in soil | 0.0000 |
| SURLAG | Surface runoff lag time [days] | 4.0000 |
| ADJ_PKR | Peak rate adjustment factor for sediment routing in the subbasin (tributary channels) | 0.0000 |
| PRF | Peak rate adjustment factor for sediment routing in the main channel | 1.0000 |
| SPCON | Linear parameter for calculating the maximum amount of sediment that can be reentrained during channel sediment routing | 0.0001 |
| SPEXP | Exponent parameter for calculating sediment reentrained in channel sediment routing | 1.0000 |

Table 5: Details of simulation results with six different precipitation infilling methods in the Imha watershed (RMSE in m³/s).

| Method | $E_{\mathrm{NS}}$ (4-NN) | $R^2$ (4-NN) | RMSE (4-NN) | $E_{\mathrm{NS}}$ (6-NN) | $R^2$ (6-NN) | RMSE (6-NN) | $E_{\mathrm{NS}}$ (8-NN) | $R^2$ (8-NN) | RMSE (8-NN) |
|---|---|---|---|---|---|---|---|---|---|
| Ep | 0.80 | 0.83 | 15.32 | 0.91 | 0.92 | 10.60 | 0.78 | 0.82 | 16.16 |
| Qu | 0.73 | 0.78 | 17.83 | 0.91 | 0.93 | 10.48 | 0.88 | 0.92 | 11.80 |
| Tw | 0.91 | 0.91 | 10.56 | 0.93 | 0.94 | 9.25 | 0.95 | 0.95 | 8.10 |
| Tc | 0.95 | 0.95 | 7.72 | 0.93 | 0.94 | 9.03 | 0.95 | 0.95 | 7.64 |
| Co | 0.93 | 0.94 | 8.83 | 0.95 | 0.95 | 7.72 | 0.91 | 0.93 | 10.44 |
| Reg | 0.69 | 0.80 | 19.14 | 0.21 | 0.65 | 30.71 | 0.71 | 0.73 | 21.48 |
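The three performance measures named above can be sketched in Python as follows. The function names are hypothetical, and $R$-square is computed here as the squared Pearson correlation between observed and simulated flow, which is a common convention in hydrology but an assumption about the definition used in this study.

```python
import math


def nash_sutcliffe(obs, sim):
    """E_NS = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def r_squared(obs, sim):
    """Squared Pearson correlation between obs and sim (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov * cov / (var_obs * var_sim)


def rmse(obs, sim):
    """Root mean square error, in the units of the inputs (e.g., m^3/s)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


if __name__ == "__main__":
    # Toy flows: the simulation overestimates every observation by one unit.
    observed = [1.0, 2.0, 3.0, 4.0]
    simulated = [2.0, 3.0, 4.0, 5.0]
    print(nash_sutcliffe(observed, simulated),
          r_squared(observed, simulated),
          rmse(observed, simulated))
```

The toy case shows why the three measures are reported together: a constant bias leaves the correlation-based $R$-square at its maximum while lowering $E_{\mathrm{NS}}$ and raising RMSE. The sketch does not guard against constant series, for which the denominators are zero.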

5. Conclusions

Five different kernel functions were applied to the Imha watershed to evaluate the performance of each weighted method for estimating missing precipitation data, and the use of interpolated data for hydrologic simulations was assessed. The following conclusions can be drawn from this research.

(1) To estimate missing precipitation data points, exploratory procedures should consider the spatiotemporal variations of precipitation. Because it is difficult to account for these variations, statistical methods for estimating missing precipitation data are commonly used.

(2) Although ANNs are an advanced approach for estimating missing data, their mechanisms are unclear because the neuron system is ultimately a black-box model. Thus, regression methods are widely used for estimating missing data, even though they have the limitation that they cannot follow a normal distribution when the sample is small.

(3) When kernel functions are used as a weighted method, the estimated missing data would satisfy a normal distribution, which is more statistically sound. Also, kernel methods can overcome the weakness of $k$nn-regression, in terms of the mean value, if the data have outliers and/or a nonlinear trend around the missing data points.

(4) This study assessed the five kernel functions, Epanechnikov, Quartic, Triweight, Tricube, and Cosine, as weights for predicting missing values. This study demonstrates that the kernel approaches provide higher quality interpolated precipitation data than the $k$nn-regression approach. In addition, the kernel function results better conform to statistical standards.

(5) Furthermore, higher quality of interpolated precipitation data results in better performance for hydrologic simulations, as exemplified in this study. All of the statistical analyses of the streamflow simulations showed that the simulations using the interpolated precipitation data from the kernel functions provide better results than those using $k$nn-regression.

(6) Use of a kernel distribution is a more effective method than regression when the precipitation data have an upward or downward trend. However, if the precipitation data have a nonlinear trend, it is difficult to effectively reconstruct the missing values. For further research, a time series analysis or a random walk model using a stochastic process is a possible method by which to estimate missing data where there is a nonlinear trend.

Appendices

A. Kernel Functions

Kernel density estimation is an unsupervised learning procedure, which historically precedes kernel regression. It also leads naturally to a simple family of procedures for nonparametric classification.

A.1. Kernel Density Estimation. Suppose we have a random sample $x_1, x_2, \ldots, x_N$ drawn from a probability density $f_X(x)$, and we wish to estimate $f_X$ at a point $x_0$. For simplicity we assume for now that $X \in \mathbb{R}$ (real valued). Arguing as before, a natural local estimate has the form

$$f_X(x_0) = \frac{\#\, x_i \in N(x_0)}{N\lambda}, \tag{A.1}$$

where $\#\, x_i \in N(x_0)$ means the number of $x_i$ that fall within $N(x_0)$, and $N(x_0)$ is a small metric neighborhood around $x_0$ of width $\lambda$.

Figure 4: Scatter plots of observed flow (Obs) against simulated flow (Simulation) for the SWAT simulation, both axes ranging from 0 to 600: (a) 4-NN; (b) 6-NN; (c) 8-NN (EP, QU, TW, TC, CO, and $K$NN represent Epanechnikov, Quartic, Triweight, Tricube, Cosine, and $K$NN-regression, resp.).

This estimate is bumpy, and the smooth Parzen estimate is preferred,

$$f_X(x_0) = \frac{1}{N\lambda}\sum_{i=1}^{N} K_\lambda(x_0, x_i), \tag{A.2}$$

because it counts observations close to $x_0$ with weights that decrease with distance from $x_0$. In this case a popular choice for $K_\lambda$ is the Gaussian kernel $K_\lambda(x_0, x_i) = \phi(|x_i - x_0|/\lambda)$. Letting $\phi_\lambda$ denote the Gaussian density with mean zero and standard deviation $\lambda$, (A.2) has the form

$$f_X(x) = \frac{1}{N}\sum_{i=1}^{N} \phi_\lambda(x - x_i) = (F * \phi_\lambda)(x), \tag{A.3}$$

the convolution of the sample empirical distribution $F$ with $\phi_\lambda$. The distribution $F(x)$ puts mass $1/N$ at each of the observed $x_i$ and is jumpy; in $f_X(x)$ we have smoothed $F$ by adding independent Gaussian noise to each observation $x_i$.

The Parzen density estimate is the equivalent of the local average, and improvements have been proposed along the lines of local regression (on the log scale for densities). We will not pursue these here. In $\mathbb{R}^p$ the natural generalization of the Gaussian density estimate amounts to using the Gaussian product kernel in (A.3):

$$f_X(x_0) = \frac{1}{N\left(2\lambda^{2}\pi\right)^{p/2}}\sum_{i=1}^{N} e^{-\frac{1}{2}\left(\lVert x_i - x_0\rVert/\lambda\right)^{2}}. \tag{A.4}$$
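The one-dimensional Gaussian Parzen estimate in (A.3) can be sketched in Python as follows. The function names, the toy sample, and the bandwidth are hypothetical and serve only to illustrate the formula.

```python
import math


def gaussian_density(z, lam):
    """Gaussian density with mean zero and standard deviation lam."""
    return math.exp(-0.5 * (z / lam) ** 2) / (lam * math.sqrt(2.0 * math.pi))


def parzen_estimate(x, sample, lam):
    """Equation (A.3): average of Gaussian bumps centered at each observation."""
    if lam <= 0.0:
        raise ValueError("bandwidth lam must be positive")
    if not sample:
        raise ValueError("sample must not be empty")
    return sum(gaussian_density(x - xi, lam) for xi in sample) / len(sample)


if __name__ == "__main__":
    sample = [0.0, 1.0, 3.0]  # toy observations
    lam = 0.5                 # toy bandwidth
    for x in (0.0, 1.0, 2.0, 3.0):
        print(x, round(parzen_estimate(x, sample, lam), 4))
```

Because each bump is itself a density and the bumps are averaged with weight $1/N$, the estimate integrates to one for any positive bandwidth; a larger $\lambda$ gives a smoother but flatter curve.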

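Table 6: Weighted values depending on day distance with each $K$NN.

| Method | 4-NN 1st | 4-NN 2nd | 6-NN 1st | 6-NN 2nd | 6-NN 3rd | 8-NN 1st | 8-NN 2nd | 8-NN 3rd | 8-NN 4th |
|---|---|---|---|---|---|---|---|---|---|
| Ep | 0.667 | 0.417 | 0.703 | 0.563 | 0.328 | 0.720 | 0.630 | 0.480 | 0.270 |
| Qu | 0.741 | 0.289 | 0.824 | 0.527 | 0.179 | 0.864 | 0.662 | 0.384 | 0.122 |
| Tw | 0.768 | 0.188 | 0.901 | 0.461 | 0.092 | 0.968 | 0.648 | 0.287 | 0.051 |
| Tc | 0.772 | 0.301 | 0.824 | 0.579 | 0.167 | 0.844 | 0.709 | 0.416 | 0.100 |
| Co | 0.680 | 0.393 | 0.726 | 0.555 | 0.301 | 0.747 | 0.635 | 0.462 | 0.243 |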

A.2. Kernel Density Classification. One can use nonparametric density estimates for classification in a straightforward fashion using Bayes' theorem. Suppose that, for a $J$ class problem, we fit nonparametric density estimates $f_j(X)$, $j = 1, \ldots, J$, separately in each of the classes, and we also have estimates of the class priors $\hat{\pi}_j$ (usually the sample proportions). Then

$$\Pr\left(G = j \mid X = x_0\right) = \frac{\hat{\pi}_j f_j(x_0)}{\sum_{k=1}^{J} \hat{\pi}_k f_k(x_0)}. \tag{A.5}$$

In a region where the data are sparse for both classes, the Gaussian kernel density estimates, which use metric kernels, are low and of poor quality (high variance). The local logistic regression method uses the tricube kernel with $k$-NN bandwidth; this effectively widens the kernel in such a region and makes use of the local linear assumption to smooth out the estimate (on the logit scale).

If classification is the ultimate goal, then learning the separate class densities well may be unnecessary and can in fact be misleading. In learning the separate densities from data, one might decide to settle for a rougher, high-variance fit to capture features that are irrelevant for the purposes of estimating the posterior probabilities. In fact, if classification is the ultimate goal, then we need only to estimate the posterior well near the decision boundary (for two classes, this is the set $\{x \mid \Pr(G = 1 \mid X = x) = 1/2\}$).

B. Procedures of Missing Precipitation

This appendix gives an example calculation of the kernel-weighted mean and shows the weight used in each situation. Because the kernel functions are all symmetric, the same weight is used for the same day distance on either side. Table 6 shows the weighted values for the 1st, 2nd, 3rd, and 4th day distance. For example, to estimate the missing precipitation for 2010-02-12 (the actual value is 6), the following three steps are applied with the 4-NN Epanechnikov kernel (Table 7).

Step 1. Select the date for the target interpolation data.

Step 2. Determine the precipitation of the $K$ nearest days and the kernel weight of each.

Step 3. Calculate the weighted average to estimate the missing value.

Table 7: Example calculation to interpolate for missing precipitation.

| Date (Step 1) | Prec (Step 2) | Weight (Step 2) | Prec·Weight (Step 3) | Estimation (Step 3) |
|---|---|---|---|---|
| 2010-02-10 | 15 | 0.417 | 6.255 | |
| 2010-02-11 | 17.2 | 0.667 | 11.472 | |
| 2010-02-12 | 6 | - | - | 5.949 |
| 2010-02-13 | 9.1 | 0.667 | 6.070 | |
| 2010-02-14 | 0 | 0.417 | 0 | |

Table 8: Calculating missing precipitation for 2010-02-12 (actual value is 6) with six different methods.

| Method | 4NN | 6NN | 8NN |
|---|---|---|---|
| Ep | 5.949 | 5.172 | 4.862 |
| Qu | 5.956 | 5.302 | 4.936 |
| Tw | 5.755 | 5.294 | 4.952 |
| Tc | 6.205 | 5.407 | 4.963 |
| Co | 5.945 | 5.197 | 4.876 |
| Reg | 10.325 | 8.967 | 8.813 |

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)

The results of the other kernel methods for estimating the missing precipitation are given in Table 8.

C. Sample Calculations with Real Value

This section shows how to calculate missing precipitation with the kernel weighted mean function by using specific numbers. The sample was selected at random from the daily data for 2008 to 2010 with a selection probability of 0.02 (a binary random draw). After the data were selected, the data locations were set. Zhang et al. [39] reported that kernel-based nonparametric multiple imputation performs better than general linear regression when the sample data are small or limited.

Table 9 shows the procedure of kernel weighting for each function. We used the data from Feb. 10, 2010, to Feb. 14, 2010, to estimate the missing data for Feb. 12, 2010. The Epanechnikov kernel gives the most distant data the highest weight (0.417), whereas the Triweight kernel gives the most distant data the lowest weight (0.188). For the nearest value, the highest weight is given by the Tricube kernel and the lowest by the Epanechnikov kernel. Generally, the Tricube kernel, which has high weights, overestimates the missing precipitation.

Table 9: Sample calculation with specific numbers.

(a) Epanechnikov

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Ep weight | 0.417 | 0.667 | - | 0.667 | 0.417 |
| Prec·weight | 6.26 | 11.47 | - | 6.07 | 0.00 |
| Estimation | | | 5.95 | | |

(b) Quartic

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Qu weight | 0.289 | 0.741 | - | 0.741 | 0.289 |
| Prec·weight | 4.34 | 12.75 | - | 6.74 | 0.00 |
| Estimation | | | 5.96 | | |

(c) Triweight

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Tw weight | 0.188 | 0.768 | - | 0.768 | 0.188 |
| Prec·weight | 2.82 | 13.21 | - | 6.99 | 0.00 |
| Estimation | | | 5.75 | | |

(d) Tricube

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Tc weight | 0.301 | 0.772 | - | 0.772 | 0.301 |
| Prec·weight | 4.52 | 13.28 | - | 7.03 | 0.00 |
| Estimation | | | 6.20 | | |

(e) Cosine

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Co weight | 0.393 | 0.680 | - | 0.680 | 0.393 |
| Prec·weight | 5.90 | 11.70 | - | 6.19 | 0.00 |
| Estimation | | | 5.94 | | |

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] K. Kang and V. Merwade, "Development and application of a storage-release based distributed hydrologic model using GIS," Journal of Hydrology, vol. 403, no. 1-2, pp. 1–13, 2011.

[2] A. J. Abebe, D. P. Solomatine, and R. G. W. Venneker, "Application of adaptive fuzzy rule-based models for reconstruction of missing precipitation events," Hydrological Sciences Journal, vol. 45, no. 3, pp. 425–436, 2000.

[3] P. Ramos-Calzado, J. Gomez-Camacho, F. Perez-Bernal, and M. F. Pita-Lopez, "A novel approach to precipitation series completion in climatological datasets: application to Andalusia," International Journal of Climatology, vol. 28, no. 11, pp. 1525–1534, 2008.

[4] T. Schneider, "Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values," Journal of Climate, vol. 14, no. 5, pp. 853–871, 2001.

[5] S. K. Regonda, D.-J. Seo, B. Lawrence, J. D. Brown, and J. Demargne, "Short-term ensemble stream forecasting using operationally-produced single-valued streamflow forecasts – a Hydrologic Model Output Statistics (HMOS) approach," Journal of Hydrology, vol. 497, pp. 80–96, 2013.

[6] P. Coulibaly and N. D. Evora, "Comparison of neural network methods for infilling missing daily weather records," Journal of Hydrology, vol. 341, no. 1-2, pp. 27–41, 2007.

[7] H. A. El Sharif and R. S. V. Teegavarapu, "Evaluation of spatial interpolation methods for missing precipitation data: preservation of spatial statistics," in Proceedings of the World Environmental and Water Resources Congress, pp. 3822–3832, May 2012.

[8] A. Bardossy and G. Pegram, "Infilling missing precipitation records – a comparison of a new copula-based method with other techniques," Journal of Hydrology, vol. 519, pp. 1162–1170, 2014.

[9] K. Schamm, M. Ziese, A. Becker et al., "Global gridded precipitation over land: a description of the new GPCC first guess daily product," Earth System Science Data, vol. 6, no. 1, pp. 49–60, 2014.

[10] R. S. V. Teegavarapu, "Statistical corrections of spatially interpolated missing precipitation data estimates," Hydrological Processes, vol. 28, no. 11, pp. 3789–3808, 2014.

[11] Y. Da and G. Xiurun, "An improved PSO-based ANN with simulated annealing technique," Neurocomputing, vol. 63, pp. 527–533, 2005.

[12] A. di Piazza, F. L. Conti, L. V. Noto, F. Viola, and G. La Loggia, "Comparative analysis of different techniques for spatial interpolation of rainfall data to create a serially complete monthly time series of precipitation for Sicily, Italy," International Journal of Applied Earth Observation and Geoinformation, vol. 13, no. 3, pp. 396–408, 2011.

[13] E. Pisoni, F. Pastor, and M. Volta, "Artificial Neural Networks to reconstruct incomplete satellite data: application to the Mediterranean Sea Surface Temperature," Nonlinear Processes in Geophysics, vol. 15, no. 1, pp. 61–70, 2008.

[14] V. Sharma, S. Rai, and A. Dev, "A comprehensive study of artificial neural networks," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 10, pp. 278–284, 2012.

[15] ASCE Task Committee, "Artificial neural networks in hydrology. II: hydrologic applications," Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 124–137, 2000.

[16] ASCE Task Committee, "Artificial neural networks in hydrology. I: preliminary concepts," Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 115–123, 2000.

[17] D. E. Rumelhart, B. Widrow, and M. A. Lehr, "The basic ideas in neural networks," Communications of the ACM, vol. 37, no. 3, pp. 87–92, 1994.

[18] J. Amorocho and W. E. Hart, "A critique of current methods in hydrologic systems investigation," Transactions of the American Geophysical Union, vol. 45, no. 2, pp. 307–321, 1964.

[19] A. W. Minns and M. J. Hall, "Artificial neural networks as rainfall-runoff models," Hydrological Sciences Journal, vol. 41, no. 3, pp. 399–417, 1996.

[20] K. Kang and V. Merwade, "The effect of spatially uniform and non-uniform precipitation bias correction methods on improving NEXRAD rainfall accuracy for distributed hydrologic modeling," Hydrology Research, vol. 45, no. 1, pp. 23–42, 2014.

[21] C. Daly, W. P. Gibson, G. H. Taylor, G. L. Johnson, and P. Pasteris, "A knowledge-based approach to the statistical mapping of climate," Climate Research, vol. 22, no. 2, pp. 99–113, 2002.

[22] J. D. Creutin, H. Andrieu, and D. Faure, "Use of a weather radar for the hydrology of a mountainous area. Part II: radar measurement validation," Journal of Hydrology, vol. 193, no. 1–4, pp. 26–44, 1997.

[23] Y. L. Xia, P. Fabian, A. Stohl, and M. Winterhalter, "Forest climatology: estimation of missing values for Bavaria, Germany," Agricultural and Forest Meteorology, vol. 96, no. 1–3, pp. 131–144, 1999.

[24] C. J. Willmott, S. M. Robeson, and J. J. Feddema, "Estimating continental and terrestrial precipitation averages from rain-gauge networks," International Journal of Climatology, vol. 14, no. 4, pp. 403–414, 1994.

[25] R. S. V. Teegavarapu and V. Chandramouli, "Improved weighting methods, deterministic and stochastic data-driven models for estimation of missing precipitation records," Journal of Hydrology, vol. 312, no. 1–4, pp. 191–206, 2005.

[26] J. A. Smith, "Precipitation," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 3, chapter 3, McGraw Hill, New York, NY, USA, 1993.

[27] J. R. Simanton and H. B. Osborn, "Reciprocal-distance estimate of point rainfall," Journal of Hydraulic Engineering Division, vol. 106, no. 7, pp. 1242–1246, 1980.

[28] J. D.-J. Salas, "Analysis and modeling of hydrological time series," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 19, chapter 19, pp. 19.1–19.72, McGraw-Hill, New York, NY, USA, 1993.

[29] S. J. Jeffrey, J. O. Carter, K. B. Moodie, and A. R. Beswick, "Using spatial interpolation to construct a comprehensive archive of Australian climate data," Environmental Modelling and Software, vol. 16, no. 4, pp. 309–330, 2001.

[30] M. Franklin, V. R. Kotamarthi, M. L. Stein, and D. R. Cook, "Generating data ensembles over a model grid from sparse climate point measurements," Journal of Physics: Conference Series, vol. 125, Article ID 012019, 2008.

[31] K. C. Young, "A three-way model for interpolating for monthly precipitation values," Monthly Weather Review, vol. 120, no. 11, pp. 2561–2569, 1992.

[32] F. Filippini, G. Galliani, and L. Pomi, "The estimation of missing meteorological data in a network of automatic stations," Transactions on Ecology and the Environment, vol. 4, pp. 283–291, 1994.

[33] W. P. Lowry, Compendium of Lecture Notes in Climatology for Class IV Meteorological Personnel, Secretariat of the World Meteorological Organization, Geneva, Switzerland, 1972.

[34] M. C. Acock and Y. A. Pachepsky, "Estimating missing weather data for agricultural simulations using group method of data handling," Journal of Applied Meteorology, vol. 39, no. 7, pp. 1176–1184, 2000.

[35] S. L. Neitsch, J. G. Arnold, J. R. Kiniry, J. R. Williams, and K. W. King, Soil and Water Assessment Tool Theoretical Documentation (Version 2005), Texas Water Resource Institute, College Station, Tex, USA, 2005.

[36] S. S. Shapiro and M. B. Wilk, "An analysis of variance test for normality (complete samples)," Biometrika, vol. 52, no. 3-4, pp. 591–611, 1965.

[37] M. Friedman, "The use of ranks to avoid the assumption of normality implicit in the analysis of variance," Journal of the American Statistical Association, vol. 32, no. 200, pp. 675–701, 1937.

[38] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[39] S. Zhang, Z. Jin, X. Zhu, and J. Zhang, "Missing data analysis: a kernel-based multi-imputation approach," in Transactions on Computational Science III, vol. 5300 of Lecture Notes in Computer Science, pp. 122–142, Springer, Berlin, Germany, 2009.



Table 1: Results of normality test with the Shapiro-Wilk method for each K-nearest neighborhood. DF represents degree of freedom and P value means significance probability.

       4-NN                     6-NN                     8-NN
       W      DF   P value      W      DF   P value      W      DF   P value
Ep     0.808  19   0.0015       0.740  19   0.0002       0.766  19   0.0004
Qu     0.831  19   0.0033       0.768  19   0.0004       0.721  19   0.0001
Tw     0.827  19   0.0029       0.789  19   0.0008       0.745  19   0.0002
Tc     0.839  19   0.0045       0.764  19   0.0004       0.742  19   0.0002
Co     0.817  19   0.0020       0.742  19   0.0002       0.763  19   0.0003
Reg    0.876  19   0.0186       0.858  19   0.0089       0.883  19   0.0242

(Ep: Epanechnikov; Qu: Quartic; Tw: Triweight; Tc: Tricube; Co: Cosine; and Reg: regression.)

the values move further away from the nearest values, it shows a steep trend. The Tricube kernel has the most sensitivity in terms of weighted distance because it consists of a ninth-order equation, as shown in the following:

$$K(x) = \frac{70}{81}\left(1 - |x|^{3}\right)^{3}. \tag{4}$$

3.5. Cosine. The fifth kernel function used in this research was the Cosine kernel function. It is a widely applied kernel function in various fields because it has a constant curvature. Its shape is similar to the Epanechnikov kernel even though it uses a cosine function, as shown in the following:

$$K(x) = \frac{\pi}{4}\cos\left(\frac{\pi}{2}x\right). \tag{5}$$
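As a minimal sketch, the five kernels can be written as plain Python functions. The Tricube and Cosine forms follow (4) and (5); the Epanechnikov, Quartic, and Triweight forms are the standard textbook definitions and are assumed here to match the definitions given earlier in this paper. The function names, the `KERNELS` dictionary, and the `integrate` helper are illustrative only. The loop at the end checks numerically that each kernel integrates to approximately one on [-1, 1].

```python
import math


def epanechnikov(x):
    # Standard form; assumed to match the paper's earlier definition.
    return 0.75 * (1 - x ** 2) if abs(x) <= 1 else 0.0


def quartic(x):
    # Standard form; assumed to match the paper's earlier definition.
    return 15 / 16 * (1 - x ** 2) ** 2 if abs(x) <= 1 else 0.0


def triweight(x):
    # Standard form; assumed to match the paper's earlier definition.
    return 35 / 32 * (1 - x ** 2) ** 3 if abs(x) <= 1 else 0.0


def tricube(x):
    # Equation (4).
    return 70 / 81 * (1 - abs(x) ** 3) ** 3 if abs(x) <= 1 else 0.0


def cosine(x):
    # Equation (5).
    return math.pi / 4 * math.cos(math.pi / 2 * x) if abs(x) <= 1 else 0.0


KERNELS = {
    "Ep": epanechnikov,
    "Qu": quartic,
    "Tw": triweight,
    "Tc": tricube,
    "Co": cosine,
}


def integrate(kernel, steps=20000):
    # Midpoint rule on the support [-1, 1].
    width = 2 / steps
    return sum(kernel(-1 + (i + 0.5) * width) for i in range(steps)) * width


for name, kernel in KERNELS.items():
    print(name, round(integrate(kernel), 6))
```

All five functions are symmetric about zero and vanish outside [-1, 1], which is the property the weighting scheme relies on.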

3.6. Calculation of the Missing Value. After using a kernel function to calculate the weight of the missing data, estimation of the missing data is performed using the following:

$$M = \frac{1}{P}\sum_{i=1}^{P} x_i \cdot K(u_i), \qquad u_i = \frac{N_i}{0.5P + 1}, \qquad N_i \in \left\{-\frac{P}{2}, \ldots, -1, 1, \ldots, \frac{P}{2}\right\}, \tag{6}$$

where $M$ is the missing value, $P$ is the number of nearest neighbors, and $u_i$ is the scaled position of the $N_i$th nearest value, which corresponds to $x_i$ (positive means the right side and negative means the left side). The kernel function should have bilateral symmetry about zero. If, for example, the four nearest neighbors are used for estimating the missing value, the neighborhood values used will be two from the right side and another two from the left side. The specific equation for this example is shown in the following, and an example calculation is described in Appendix B:

$$M = \frac{1}{4}\left[K\left(-\frac{2}{3}\right) \cdot x_1 + K\left(-\frac{1}{3}\right) \cdot x_2 + K\left(\frac{1}{3}\right) \cdot x_3 + K\left(\frac{2}{3}\right) \cdot x_4\right]. \tag{7}$$
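The calculation in (6) and (7) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function name `estimate_missing` and its argument layout are hypothetical, and the Epanechnikov kernel is written in its standard form. The example uses the precipitation values of Table 7 in Appendix B.

```python
def epanechnikov(u):
    # Standard Epanechnikov kernel; assumed to match the paper's definition.
    return 0.75 * (1 - u ** 2) if abs(u) <= 1 else 0.0


def estimate_missing(left, right, kernel):
    """Estimate a missing value from P = len(left) + len(right) neighbors.

    left:  values before the gap, ordered from farthest to nearest.
    right: values after the gap, ordered from nearest to farthest.
    """
    if not left or len(left) != len(right):
        raise ValueError("need the same, nonzero number of values on each side")
    half = len(left)
    p = 2 * half
    scale = 0.5 * p + 1  # denominator of u_i in equation (6)
    # N_i runs over -P/2, ..., -1, 1, ..., P/2 (zero is the missing day itself).
    offsets = list(range(-half, 0)) + list(range(1, half + 1))
    values = list(left) + list(right)
    weighted = sum(x * kernel(n / scale) for x, n in zip(values, offsets))
    return weighted / p


# Values around 2010-02-12 taken from Table 7 (4-NN case).
estimate = estimate_missing([15.0, 17.2], [9.1, 0.0], epanechnikov)
print(round(estimate, 2))  # 5.95
```

With unrounded weights the estimate is about 5.946; Table 7 reports 5.949 because it uses weights rounded to three decimals (0.417 and 0.667).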

3.7. Statistic Tests. A normality test is required to evaluate the methods for infilling missing data. The Shapiro-Wilk [36] normality test was used with nineteen samples to determine whether the average difference is normally distributed or not. The test statistic is as shown in the following:

$$W = \frac{\left(\sum_{i=1}^{n} a_i y_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}, \tag{8}$$

where $y_i$ is the $i$th order statistic, namely, the $i$th smallest value in the sample; $\bar{y}$ is the mean of $y_i$; and $a_i$ is a constant given by ordered data. The null hypothesis of the Shapiro-Wilk normality test is that the sample is normally distributed, and if the significance probability is less than 5%, the null hypothesis is rejected, meaning the sample does not satisfy normal distribution. Since the significance probability for the entire group (Table 1) is below 5%, the null hypothesis is rejected. This study therefore uses a nonparametric test for the subsequent analysis.

The Friedman test [37], which is a kind of $k$-sample test that can provide the difference between paired values, was selected as a nonparametric test. This method evaluates a small sample for differences by ranking a sequence list. The null hypothesis of the Friedman test is that there is no average difference among the groups, and if the significance probability is less than 5%, the null hypothesis is rejected, leading to the conclusion that an average difference exists among the groups. A brief description of the Friedman test is in the following:

$$Q = \frac{\mathrm{SSt}}{\mathrm{SSe}}, \tag{9}$$

where SSt and SSe are the sum of squares for treatments and the sum of squares for error, respectively.
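The statistic in (9) can be sketched as follows. The sums of squares are computed on within-row ranks, which is the usual rank-based form of the Friedman statistic and is assumed here to be what (9) abbreviates. The data are made-up absolute errors for four samples and three methods, chosen only to make the arithmetic easy to follow; in this study the rows would be the nineteen samples and the columns the six infilling methods.

```python
def average_ranks(row):
    # Rank the values in one row (1 = smallest), averaging ranks over ties.
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_q(data):
    """Return (Q, mean rank per column) for a list of rows (blocks)."""
    n, k = len(data), len(data[0])
    ranks = [average_ranks(row) for row in data]
    grand_mean = (k + 1) / 2
    mean_ranks = [sum(r[j] for r in ranks) / n for j in range(k)]
    ss_t = n * sum((m - grand_mean) ** 2 for m in mean_ranks)
    ss_e = sum((r - grand_mean) ** 2 for row in ranks for r in row) / (n * (k - 1))
    return ss_t / ss_e, mean_ranks


# Hypothetical data: rows are samples, columns are three competing methods.
errors = [
    [0.4, 1.1, 2.0],
    [0.3, 0.9, 1.5],
    [0.2, 1.8, 1.2],
    [0.5, 1.0, 2.2],
]
q, mean_ranks = friedman_q(errors)
print(q, mean_ranks)  # 6.5 [1.0, 2.25, 2.75]
```

The mean ranks returned alongside Q correspond to the "Mean rank" column of Table 2; Q is compared against a chi-square distribution with k - 1 degrees of freedom to obtain the significance probability.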

The null hypothesis in this instance was rejected because the significance probability was less than 5% for each case, and this study concluded that each interpolation method has an average difference, which is why each method is considered independent even though this study used five different kernel methods. For example, the average rank for four reference points for kNN-regression, Tricube, Quartic, Cosine, Triweight, and Epanechnikov varies from a large average rank to a small average rank (Table 2). For six reference points,

Table 2: Chi-square (Χ²) test with the Friedman method for finding differences among six infilling methods. SD represents standard deviation; P value means significance probability.

(a) 4-NN

       N    Mean    SD     Min      Max value   Mean rank   Χ²       P value
Ep     19   -1.29   4.46   -15.50    5.76       2.53        55.602   0.0000
Qu     19   -1.42   5.03   -15.06    7.51       2.74
Tw     19   -1.64   5.37   -14.90    8.13       2.58
Tc     19   -1.18   5.06   -14.87    8.29       4.47
Co     19   -1.40   4.69   -15.42    6.08       2.68
Reg    19    2.76   5.42   -13.47   14.38       6.00

(b) 6-NN

       N    Mean    SD     Min      Max value   Mean rank   Χ²       P value
Ep     19   -2.61   4.72   -16.68    1.58       1.53        66.519   0.0000
Qu     19   -2.27   4.71   -16.20    2.82       3.16
Tw     19   -2.18   4.84   -15.89    4.06       3.79
Tc     19   -2.15   4.63   -16.20    2.84       4.21
Co     19   -2.54   4.69   -16.59    1.50       2.32
Reg    19    0.06   4.97   -15.48    7.33       6.00

(c) 8-NN

       N    Mean    SD     Min      Max value   Mean rank   Χ²       P value
Ep     19   -3.40   5.04   -17.33    1.94       1.32        75.812   0.0000
Qu     19   -3.10   4.75   -16.90    0.45       3.21
Tw     19   -2.74   4.79   -16.59    1.29       4.58
Tc     19   -2.93   4.77   -16.96    1.51       3.68
Co     19   -3.28   4.93   -17.25    1.85       2.21
Reg    19   -1.24   5.35   -16.49    8.08       6.00

(Ep: Epanechnikov; Qu: Quartic; Tw: Triweight; Tc: Tricube; Co: Cosine; and Reg: regression.)

the kNN-regression, Tricube, Triweight, Quartic, Cosine, and Epanechnikov were ranked as shown in Table 2. In another example, for eight reference points, the kNN-regression, Triweight, Tricube, Quartic, Cosine, and Epanechnikov are ordered from the largest to the smallest average rank (Table 2). As shown in Table 2, the kNN-regression has the largest average rank and Epanechnikov has the smallest average rank for all of the reference point cases. This result indicates the dissimilarity of these methods.

To determine which methods are dissimilar to the others, this study performed the Wilcoxon signed rank test [38]. The basic feature of the Wilcoxon signed rank test is that data samples that come from the same population are paired, and it is detailed in the following:

$$W = \left|\sum_{i=1}^{N}\left[\operatorname{sign}\left(y_{2,i} - y_{1,i}\right) \cdot R_i\right]\right|, \tag{10}$$

where $N$ is the sample size, $y_{2,i}$ is the $i$th value of the second data point, $y_{1,i}$ is the $i$th value of the first data point, and $R_i$ is the rank of $|y_{2,i} - y_{1,i}|$. If the significance probability associated with the $W$ value is less than 5%, it means that a different mechanism is at work in the sample data or method. Table 3 shows that the significance probability for kNN-regression is less than 5% for all cases. Accordingly, this signifies that kNN-regression is completely dissimilar to the other methods. Although the five different kernel methods for data interpolation exhibit similarity or dissimilarity to each other depending on the number of reference points, all of the kernel methods can be distinguished from kNN-regression using the Wilcoxon signed rank test.
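A minimal sketch of the statistic in (10) follows. Two conventions are assumed because the text does not state them: pairs with a zero difference are dropped before ranking, and tied absolute differences receive their average rank. The function name and the sample values are hypothetical.

```python
def signed_rank_w(first, second):
    """Wilcoxon signed rank statistic W of equation (10) for paired samples."""
    # Assumption: zero differences are discarded before ranking.
    diffs = [b - a for a, b in zip(first, second) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Assumption: tied absolute differences share their average rank.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    signed_sum = sum((1 if d > 0 else -1) * r for d, r in zip(diffs, ranks))
    return abs(signed_sum)


# Hypothetical paired values from two infilling methods.
method_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
method_2 = [1.5, 1.0, 5.0, 4.0, 8.0]
w = signed_rank_w(method_1, method_2)
print(w)  # 6.0
```

Here the nonzero differences are 0.5, -1.0, 2.0, and 3.0 with ranks 1 to 4, so the signed sum is 1 - 2 + 3 + 4 = 6. Converting W to a significance probability requires the reference distribution of W, which this sketch leaves out.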

4. Results

Since Epanechnikov has the smallest average rank, which signifies a small difference between the observation value and the interpolated value, for all reference points in Table 2, interpolation data obtained from the Epanechnikov method has the best result among the studied methods. Figure 2 shows that filling in data from kNN-regression has a large difference at both four and six reference points. Interpolation data from the kernel methods are close to zero for both the average and median values at four reference points, meaning that the interpolation data are similar to the observation data. On the other hand, more than 75% of the interpolation data from kNN-regression exhibits a difference greater than zero. When the interpolation data are evaluated at six reference points in Figure 2, the median value from the kNN-regression is shown to be far away from zero. At eight reference points, kNN-regression is close to zero for both average and median values; however, it is difficult to conclude that this is an ideal method

Table 3: Chi-square (Χ²) test with the Wilcoxon signed rank method between regression and five different kernel methods.

(a) 4-NN

                 Ep       Qu       Tw       Tc       Co
Reg   Χ²         -3.823   -3.823   -3.823   -3.823   -3.823
      P value    0.0001   0.0001   0.0001   0.0001   0.0001

(b) 6-NN

                 Ep       Qu       Tw       Tc       Co
Reg   Χ²         -3.823   -3.823   -3.823   -3.823   -3.823
      P value    0.0001   0.0001   0.0001   0.0001   0.0001

(c) 8-NN

                 Ep       Qu       Tw       Tc       Co
Reg   Χ²         -3.823   -3.823   -3.823   -3.823   -3.823
      P value    0.0001   0.0001   0.0001   0.0001   0.0001

(Ep: Epanechnikov; Qu: Quartic; Tw: Triweight; Tc: Tricube; Co: Cosine; and Reg: regression.)

Figure 2: Box plots for difference between actual precipitation and interpolated precipitation for (a) 4-NN, (b) 6-NN, and (c) 8-NN. The x-axis lists the methods (EP, QU, TW, TC, CO, and RE); the y-axis represents mm per day.

because outlying maximum values will affect the average and median values.

This study on precipitation data interpolation also evaluated the simulation of the interpolated data using the SWAT hydrologic model. In SWAT hydrologic modeling, the surface runoff is estimated by considering excess precipitation with abstractions and an infiltration factor through the Soil Conservation Service Curve Number (SCS-CN) method. The Green-Ampt (GA) infiltration method is another method to calculate the surface runoff in SWAT. A study shows that both methods give reasonable results, and there is no significant advantage observed in using one over the other. However, the GA method appears to have more limitations in modeling seasonal variability than the SCS-CN method does. Hence, the SCS-CN method is used for the infiltration factor in this study. An SCS curve number based simulation needs

Figure 3: Calibrated model result using original precipitation input, showing observed (Obs) and simulated (Sim) flow from 2008-01-01 to 2010-12-28. The x-axis represents time in days and the y-axis represents flow in cubic meters per second.

Table 4: Details of SWAT parameters which are related to runoff mechanism for Imha watershed.

Parameter   Selected value   Description
ESCO        0.9500           Soil evaporation compensation factor
EPCO        1.0000           Plant water uptake compensation factor
EVLAI       3.0000           Leaf area index at which no evaporation occurs from water surface [m²/m²]
FFCB        0.0000           Initial soil water storage expressed as a fraction of field capacity water content
IEVENT      0.0000           Rainfall/runoff code: 0 = daily rainfall/CN
ICRK        0.0000           Crack flow code: 1 = model crack flow in soil
SURLAG      4.0000           Surface runoff lag time [days]
ADJ_PKR     0.0000           Peak rate adjustment factor for sediment routing in the subbasin (tributary channels)
PRF         1.0000           Peak rate adjustment factor for sediment routing in the main channel
SPCON       0.0001           Linear parameter for calculating the maximum amount of sediment that can be reentrained during channel sediment routing
SPEXP       1.0000           Exponent parameter for calculating sediment reentrained in channel sediment routing

time step updated information as soil water content changes. The excess rainfall equation in the SCS-CN method was generated based on the historical relationship between the curve number and the hydrologic mechanism for over 20 years. Throughout the surface runoff calculation, infiltration should be updated over time according to the soil type. Other abstractions, such as evapotranspiration and soil and snow evaporation, are calculated by the Penman-Monteith method and meteorological statistics. Finally, the kinematic storage model is used to compute groundwater storage and seepage. Flow resulting from SWAT modeling is routed from HRUs to the watershed outlet. Figure 3 shows the calibration of the model simulation as the initial step, and the specific parameters are described in Table 4. After the calibration of the SWAT model, the six different interpolated precipitation datasets with three different reference ranges for each (a total of twenty-four

Table 5: Details of simulation results with six different precipitation infilling methods in Imha watershed.

       4-NN                   6-NN                   8-NN
       E_NS   R²     RMSE     E_NS   R²     RMSE     E_NS   R²     RMSE
Ep     0.80   0.83   15.32    0.91   0.92   10.60    0.78   0.82   16.16
Qu     0.73   0.78   17.83    0.91   0.93   10.48    0.88   0.92   11.80
Tw     0.91   0.91   10.56    0.93   0.94    9.25    0.95   0.95    8.10
Tc     0.95   0.95    7.72    0.93   0.94    9.03    0.95   0.95    7.64
Co     0.93   0.94    8.83    0.95   0.95    7.72    0.91   0.93   10.44
Reg    0.69   0.80   19.14    0.21   0.65   30.71    0.71   0.73   21.48

interpolated precipitation data points) were used to assess the performance of interpolated precipitation data for hydrologic model simulation. Streamflow simulations were done for three years, from 2008 to 2010. To evaluate the model performance considering the use of different interpolated precipitation datasets, this study used $E_{\mathrm{NS}}$ (Nash-Sutcliffe coefficient), $R$-square (coefficient of determination), and RMSE (root mean square error). Table 5 and Figure 4 show that the simulation results from kNN-regression exhibit low SWAT simulation performance for streamflow estimations, with 0.54 $E_{\mathrm{NS}}$, 0.74 $R$-square, and 23.78 m³/s RMSE as an average. All of the kernel functions, on the other hand, exhibit good performance for hydrologic simulations with interpolated precipitation data (Table 5 and Figure 4): the average of $E_{\mathrm{NS}}$, $R$-square, and RMSE (1) for Epanechnikov is 0.83, 0.86, and 14.03 m³/s; (2) for Quartic is 0.84, 0.88, and 13.03 m³/s; (3) for Triweight is 0.93, 0.93, and 9.30 m³/s; (4) for Tricube is 0.94, 0.95, and 8.13 m³/s; and (5) for Cosine is 0.93, 0.94, and 9.00 m³/s, respectively.
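The three performance measures can be sketched as below. The paper does not print their formulas in this section, so the standard definitions are assumed: the Nash-Sutcliffe coefficient as one minus the ratio of squared error to the variance of the observations, R-square as the squared Pearson correlation, and RMSE as the root of the mean squared error. The function names and the four-value series are hypothetical.

```python
import math


def nash_sutcliffe(obs, sim):
    # 1 - sum of squared errors / sum of squared deviations of obs from its mean.
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - sse / sst


def r_squared(obs, sim):
    # Squared Pearson correlation between observed and simulated series.
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


def rmse(obs, sim):
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


# Hypothetical streamflow values in cubic meters per second.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
print(round(nash_sutcliffe(observed, simulated), 3))  # 0.98
print(round(r_squared(observed, simulated), 3))       # 0.982
print(round(rmse(observed, simulated), 3))            # 0.158
```

A Nash-Sutcliffe coefficient of 1 means a perfect match, while values near or below zero mean the simulation is no better than the mean of the observations, which is why the 0.21 for 6-NN regression in Table 5 stands out.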

5. Conclusions

Five different kernel functions were applied to the Imha watershed to evaluate the performance of each weighted method for estimating missing precipitation data, and the use of interpolated data for hydrologic simulations was assessed. The following conclusions can be drawn from this research.

(1) To estimate missing precipitation data points, exploratory procedures should consider the spatiotemporal variations of precipitation. Due to the difficulty of accounting for these variations, statistical methods for estimating missing precipitation data are commonly used.

(2) Although ANNs are an advanced approach for estimating missing data, their mechanisms are unclear because the neuron system is ultimately a black-box model. Thus, regression methods are widely used for estimating missing data, even though there are limitations in that regression methods cannot follow normal distribution when the sample is small.

(3) When using kernel functions as a weighted method, estimated missing data would satisfy normal distribution, which is more statistically sound. Also, kernel methods can overcome the weakness in kNN-regression if the data have outliers and/or a nonlinear trend around the missing data points in terms of mean value.

(4) This study assessed the five kernel functions Epanechnikov, Quartic, Triweight, Tricube, and Cosine as a weight for predicting missing values. In comparison with the kNN-regression method, this study demonstrates that the kernel approaches provide higher quality interpolated precipitation data than the kNN-regression approach. In addition, the kernel function results better conform to statistical standards.

(5) Furthermore, higher quality of interpolated precipitation data results in better performance for hydrologic simulations, as exemplified in this study. All of the statistical analyses of the streamflow simulations showed that the simulations using the interpolated precipitation data from the kernel functions provide better results than those using kNN-regression.

(6) Use of kernel distribution is a more effective method than regression when the precipitation data have an upward or downward trend. However, if the precipitation data have a nonlinear trend, it is difficult to effectively reconstruct the missing values. For further research, a time series analysis or a random walk model using a stochastic process is a possible method by which to estimate missing data where there is a nonlinear trend.

Appendices

A. Kernel Functions

Kernel density estimation is an unsupervised learning procedure, which historically precedes kernel regression. It also leads naturally to a simple family of procedures for nonparametric classification.

A.1. Kernel Density Estimation. Suppose we have a random sample $x_1, x_2, \ldots, x_N$ drawn from a probability density $f_X(x)$, and we wish to estimate $f_X$ at a point $x_0$. For simplicity, we assume for now that $x \in \mathbb{R}$ (real value). Arguing as before, a natural local estimate has the form

$$\hat{f}_X(x_0) = \frac{\#\{x_i \in N(x_0)\}}{N\lambda}, \tag{A.1}$$

where $\#\{x_i \in N(x_0)\}$ means the number of $x_i$ that fall within $N(x_0)$, and $N(x_0)$ is a small metric neighborhood around $x_0$ of width $\lambda$.

Figure 4: Scatter plots for SWAT simulation with (a) 4-NN, (b) 6-NN, and (c) 8-NN; the axes are labeled Obs and Simulation and range from 0 to 600 (EP, QU, TW, TC, CO, and KNN represent Epanechnikov, Quartic, Triweight, Tricube, Cosine, and KNN-regression, resp.).

This estimate is bumpy, and the smooth Parzen estimate is preferred:

$$\hat{f}_X(x_0) = \frac{1}{N\lambda}\sum_{i=1}^{N} K_\lambda(x_0, x_i), \tag{A.2}$$

because it counts observations close to $x_0$ with weights that decrease with distance from $x_0$. In this case, a popular choice for $K_\lambda$ is the Gaussian kernel $K_\lambda(x_0, x_i) = \phi(|x_i - x_0|/\lambda)$. Letting $\phi_\lambda$ denote the Gaussian density with mean zero and standard deviation $\lambda$, then (A.2) has the form

$$\hat{f}_X(x_0) = \frac{1}{N}\sum_{i=1}^{N}\phi_\lambda(x_0 - x_i) = (\hat{F} * \phi_\lambda)(x_0), \tag{A.3}$$

the convolution of the sample empirical distribution $\hat{F}$ with $\phi_\lambda$. The distribution $\hat{F}(x)$ puts mass $1/N$ at each of the observed $x_i$ and is jumpy; in $\hat{f}_X(x)$ we have smoothed $\hat{F}$ by adding independent Gaussian noise to each observation $x_i$.

The Parzen density estimate is the equivalent of the local average, and improvements have been proposed along the lines of local regression (on the log scale for densities). We will not pursue these here. In $\mathbb{R}^p$, the natural generalization of the Gaussian density estimate amounts to using the Gaussian product kernel in (A.3):

$$\hat{f}_X(x_0) = \frac{1}{N\left(2\lambda^{2}\pi\right)^{p/2}}\sum_{i=1}^{N} e^{-\frac{1}{2}\left(\|x_i - x_0\|/\lambda\right)^{2}}. \tag{A.4}$$
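For the one-dimensional case (p = 1), the Gaussian Parzen estimate of (A.3) and (A.4) can be sketched as follows. The function name, the sample values, and the bandwidth of 0.5 are hypothetical and chosen only for illustration. The last lines check numerically that the estimated density integrates to approximately one.

```python
import math


def parzen_gaussian(x0, sample, lam):
    """Gaussian Parzen density estimate at x0 with bandwidth lam (p = 1)."""
    norm = len(sample) * lam * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((xi - x0) / lam) ** 2) for xi in sample) / norm


# Hypothetical one-dimensional sample and bandwidth.
sample = [0.0, 0.5, 1.2, 1.3, 3.0]
bandwidth = 0.5

print(round(parzen_gaussian(1.0, sample, bandwidth), 4))

# Riemann sum over [-5, 8], which covers the sample by ten bandwidths each side.
step = 0.01
area = sum(parzen_gaussian(-5 + i * step, sample, bandwidth) for i in range(1301)) * step
print(round(area, 4))
```

A smaller bandwidth makes the estimate follow individual observations more closely, while a larger one smooths them out; the choice is left open here.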

Table 6: Weighted values depending on day distance with each KNN.

       4-NN             6-NN                     8-NN
       1st     2nd      1st     2nd     3rd      1st     2nd     3rd     4th
Ep     0.667   0.417    0.703   0.563   0.328    0.720   0.630   0.480   0.270
Qu     0.741   0.289    0.824   0.527   0.179    0.864   0.662   0.384   0.122
Tw     0.768   0.188    0.901   0.461   0.092    0.968   0.648   0.287   0.051
Tc     0.772   0.301    0.824   0.579   0.167    0.844   0.709   0.416   0.100
Co     0.680   0.393    0.726   0.555   0.301    0.747   0.635   0.462   0.243

A.2. Kernel Density Classification. One can use nonparametric density estimates for classification in a straightforward fashion using Bayes' theorem. Suppose for a $J$ class problem we fit nonparametric density estimates $\hat{f}_j(X)$, $j = 1, \ldots, J$, separately in each of the classes, and we also have estimates of the class priors $\hat{\pi}_j$ (usually the sample proportions). Then

$$\Pr(G = j \mid X = x_0) = \frac{\hat{\pi}_j \hat{f}_j(x_0)}{\sum_{k=1}^{J}\hat{\pi}_k \hat{f}_k(x_0)}. \tag{A.5}$$

In this region the data are sparse for both classes, and since the Gaussian kernel density estimates use metric kernels, the density estimates are low and of poor quality (high variance) in these regions. The local logistic regression method uses the tricube kernel with $k$-NN bandwidth; this effectively widens the kernel in this region and makes use of the local linear assumption to smooth out the estimate (on the logit scale).

If classification is the ultimate goal, then learning the separate class densities well may be unnecessary and can in fact be misleading. In learning the separate densities from data, one might decide to settle for a rougher, high-variance fit to capture these features, which are irrelevant for the purposes of estimating the posterior probabilities. In fact, if classification is the ultimate goal, then we need only to estimate the posterior well near the decision boundary (for two classes, this is the set $\{x \mid \Pr(G = 1 \mid X = x) = 1/2\}$).
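The posterior in (A.5) can be illustrated with a short sketch that combines per-class Gaussian Parzen estimates with priors taken as sample proportions. Everything named here is an assumption made for the example: the function names, the class labels "dry" and "wet", the sample values, and the bandwidth.

```python
import math


def parzen_gaussian(x0, sample, lam):
    # One-dimensional Gaussian Parzen density estimate with bandwidth lam.
    norm = len(sample) * lam * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((xi - x0) / lam) ** 2) for xi in sample) / norm


def class_posteriors(x0, samples_by_class, lam):
    """Posterior class probabilities at x0 following equation (A.5)."""
    total = sum(len(s) for s in samples_by_class.values())
    # Prior (sample proportion) times the class density estimate at x0.
    scores = {
        label: len(s) / total * parzen_gaussian(x0, s, lam)
        for label, s in samples_by_class.items()
    }
    denominator = sum(scores.values())
    return {label: score / denominator for label, score in scores.items()}


# Hypothetical two-class training data.
training = {
    "dry": [0.0, 0.2, 0.1],
    "wet": [3.0, 3.1, 2.8],
}
posteriors = class_posteriors(0.1, training, lam=0.5)
print({label: round(p, 4) for label, p in posteriors.items()})
```

Because only the ratio of the weighted densities matters, the posterior can be accurate near the decision boundary even where the individual density estimates are rough.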

B. Procedures of Missing Precipitation

This step shows an example calculation of the weighted mean with the kernel functions. It illustrates the weight applied in each situation. Because the kernel functions are all symmetric, the same weight values are used for the same day distance on either side. Table 6 shows the weighted values for the 1st, 2nd, 3rd, and 4th day distances. For example, to estimate the missing precipitation for 2010-02-12 (actual value is 6), see the following procedure (3 steps) with the 4-NN Epanechnikov kernel (Table 7).

Step 1. Select the date for target interpolation data.

Step 2. Decide the Kth nearest days' precipitation and each kernel weight.

Step 3. Calculate the weighted average to estimate the missing value.

Table 7: Example calculation to interpolate for missing precipitation.

Step 1               Step 2    Step 3
Date         Prec    Weight    Prec·Weight   Estimation
2010-02-10   15      0.417      6.255
2010-02-11   17.2    0.667     11.472
2010-02-12   6       -         -             5.949
2010-02-13   9.1     0.667      6.070
2010-02-14   0       0.417      0

Table 8: Calculating missing precipitation for 2010-02-12 (actual value is 6) with six different methods.

Method   4NN      6NN     8NN
Ep        5.949   5.172   4.862
Qu        5.956   5.302   4.936
Tw        5.755   5.294   4.952
Tc        6.205   5.407   4.963
Co        5.945   5.197   4.876
Reg      10.325   8.967   8.813

(Ep: Epanechnikov; Qu: Quartic; Tw: Triweight; Tc: Tricube; Co: Cosine; and Reg: regression.)

The rest of the kernel methods for estimating missingprecipitation are described in Table 8

C. Sample Calculations with Real Value

This section shows how to calculate missing precipitation with the kernel weighted mean function by using actual numbers. The sample was selected at random from the daily data for 2008 to 2010 with a probability of 0.02. After the data were selected, the data locations were set. Zhang et al. [39] reported that kernel-based nonparametric multiple imputation has better performance than general linear regression when the sample data is small or limited.

Table 9 shows the procedure of kernel weighting for each function. We used data from Feb. 10, 2010, to Feb. 14, 2010, to estimate the missing data for Feb. 12, 2010. The Epanechnikov kernel gives the farthest data the highest weight (0.417), whereas the Triweight kernel gives the farthest data the lowest weight (0.188). For the nearest value, the highest weight comes from the Tricube kernel and the lowest from the Epanechnikov kernel. Generally,

Table 9: Sample calculation with certain number.

(a) Epanechnikov

Date          2/10    2/11    2/12   2/13    2/14
Prec          15.0    17.2    6.0    9.1     0.0
Ep weight     0.417   0.667   -      0.667   0.417
Prec·weight   6.26    11.47   -      6.07    0.00
Estimation    5.95

(b) Quartic

Date          2/10    2/11    2/12   2/13    2/14
Prec          15.0    17.2    6.0    9.1     0.0
Qu weight     0.289   0.741   -      0.741   0.289
Prec·weight   4.34    12.75   -      6.74    0.00
Estimation    5.96

(c) Triweight

Date          2/10    2/11    2/12   2/13    2/14
Prec          15.0    17.2    6.0    9.1     0.0
Tw weight     0.188   0.768   -      0.768   0.188
Prec·weight   2.82    13.21   -      6.99    0.00
Estimation    5.75

(d) Tricube

Date          2/10    2/11    2/12   2/13    2/14
Prec          15.0    17.2    6.0    9.1     0.0
Tc weight     0.301   0.772   -      0.772   0.301
Prec·weight   4.52    13.28   -      7.03    0.00
Estimation    6.20

(e) Cosine

Date          2/10    2/11    2/12   2/13    2/14
Prec          15.0    17.2    6.0    9.1     0.0
Co weight     0.393   0.680   -      0.680   0.393
Prec·weight   5.90    11.70   -      6.19    0.00
Estimation    5.94

Tricube, which has high weights, shows overestimation of the missing precipitation.
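The weight comparison above can be checked with a short sketch that evaluates each kernel at the scaled day distances for the 4-NN case. The kernel forms for Epanechnikov, Quartic, and Triweight are the standard ones and are assumed to match this paper; the dictionary and function names are hypothetical. The same helper with p = 6 or p = 8 gives the other columns of Table 6.

```python
import math

# Kernels written for |u| < 1 only, which always holds for u = n / (0.5 * p + 1).
KERNELS = {
    "Ep": lambda u: 0.75 * (1 - u ** 2),
    "Qu": lambda u: 15 / 16 * (1 - u ** 2) ** 2,
    "Tw": lambda u: 35 / 32 * (1 - u ** 2) ** 3,
    "Tc": lambda u: 70 / 81 * (1 - abs(u) ** 3) ** 3,
    "Co": lambda u: math.pi / 4 * math.cos(math.pi / 2 * u),
}


def day_weights(kernel, p):
    """Weights for day distances 1 .. p/2 when p nearest neighbors are used."""
    scale = 0.5 * p + 1
    return [kernel(n / scale) for n in range(1, p // 2 + 1)]


weights = {name: day_weights(kernel, 4) for name, kernel in KERNELS.items()}
for name, (nearest, farthest) in weights.items():
    print(f"{name}: 1st = {nearest:.3f}, 2nd = {farthest:.3f}")

nearest_max = max(weights, key=lambda name: weights[name][0])
nearest_min = min(weights, key=lambda name: weights[name][0])
farthest_max = max(weights, key=lambda name: weights[name][1])
farthest_min = min(weights, key=lambda name: weights[name][1])
print(nearest_max, nearest_min, farthest_max, farthest_min)  # Tc Ep Ep Tw
```

The printed weights agree with the 4-NN columns of Table 6 to three decimals.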

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] K Kang and V Merwade ldquoDevelopment and application of astorage-release based distributed hydrologic model using GISrdquoJournal of Hydrology vol 403 no 1-2 pp 1ndash13 2011

[2] A J Abebe D P Solomatine and R G W Venneker ldquoApplica-tion of adaptive fuzzy rule-based models for reconstruction ofmissing precipitation eventsrdquoHydrological Sciences Journal vol45 no 3 pp 425ndash436 2000

[3] P Ramos-Calzado J Gomez-Camacho F Perez-Bernal andMF Pita-Lopez ldquoA novel approach to precipitation series com-pletion in climatological datasets application to AndalusiardquoInternational Journal of Climatology vol 28 no 11 pp 1525ndash1534 2008

[4] T Schneider ldquoAnalysis of incomplete climate data estimation ofmean values and covariancematrices and imputation ofmissingvaluesrdquo Journal of Climate vol 14 no 5 pp 853ndash871 2001

[5] S. K. Regonda, D.-J. Seo, B. Lawrence, J. D. Brown, and J. Demargne, "Short-term ensemble stream forecasting using operationally-produced single-valued streamflow forecasts – a Hydrologic Model Output Statistics (HMOS) approach," Journal of Hydrology, vol. 497, pp. 80–96, 2013.

[6] P. Coulibaly and N. D. Evora, "Comparison of neural network methods for infilling missing daily weather records," Journal of Hydrology, vol. 341, no. 1-2, pp. 27–41, 2007.

[7] H. A. El Sharif and R. S. V. Teegavarapu, "Evaluation of spatial interpolation methods for missing precipitation data: preservation of spatial statistics," in Proceedings of the World Environmental and Water Resources Congress, pp. 3822–3832, May 2012.

[8] A. Bardossy and G. Pegram, "Infilling missing precipitation records – a comparison of a new copula-based method with other techniques," Journal of Hydrology, vol. 519, pp. 1162–1170, 2014.

[9] K. Schamm, M. Ziese, A. Becker et al., "Global gridded precipitation over land: a description of the new GPCC first guess daily product," Earth System Science Data, vol. 6, no. 1, pp. 49–60, 2014.

[10] R. S. V. Teegavarapu, "Statistical corrections of spatially interpolated missing precipitation data estimates," Hydrological Processes, vol. 28, no. 11, pp. 3789–3808, 2014.

[11] Y. Da and G. Xiurun, "An improved PSO-based ANN with simulated annealing technique," Neurocomputing, vol. 63, pp. 527–533, 2005.

[12] A. di Piazza, F. L. Conti, L. V. Noto, F. Viola, and G. La Loggia, "Comparative analysis of different techniques for spatial interpolation of rainfall data to create a serially complete monthly time series of precipitation for Sicily, Italy," International Journal of Applied Earth Observation and Geoinformation, vol. 13, no. 3, pp. 396–408, 2011.

[13] E. Pisoni, F. Pastor, and M. Volta, "Artificial Neural Networks to reconstruct incomplete satellite data: application to the Mediterranean Sea Surface Temperature," Nonlinear Processes in Geophysics, vol. 15, no. 1, pp. 61–70, 2008.

[14] V. Sharma, S. Rai, and A. Dev, "A comprehensive study of artificial neural networks," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 10, pp. 278–284, 2012.

[15] ASCE Task Committee, "Artificial neural networks in hydrology. II: hydrologic applications," Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 124–137, 2000.

[16] ASCE Task Committee, "Artificial neural networks in hydrology. I: preliminary concepts," Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 115–123, 2000.

[17] D. E. Rumelhart, B. Widrow, and M. A. Lehr, "The basic ideas in neural networks," Communications of the ACM, vol. 37, no. 3, pp. 87–92, 1994.

[18] J. Amorocho and W. E. Hart, "A critique of current methods in hydrologic systems investigation," Transactions of the American Geophysical Union, vol. 45, no. 2, pp. 307–321, 1964.


[19] A. W. Minns and M. J. Hall, "Artificial neural networks as rainfall-runoff models," Hydrological Sciences Journal, vol. 41, no. 3, pp. 399–417, 1996.

[20] K. Kang and V. Merwade, "The effect of spatially uniform and non-uniform precipitation bias correction methods on improving NEXRAD rainfall accuracy for distributed hydrologic modeling," Hydrology Research, vol. 45, no. 1, pp. 23–42, 2014.

[21] C. Daly, W. P. Gibson, G. H. Taylor, G. L. Johnson, and P. Pasteris, "A knowledge-based approach to the statistical mapping of climate," Climate Research, vol. 22, no. 2, pp. 99–113, 2002.

[22] J. D. Creutin, H. Andrieu, and D. Faure, "Use of a weather radar for the hydrology of a mountainous area. Part II: radar measurement validation," Journal of Hydrology, vol. 193, no. 1–4, pp. 26–44, 1997.

[23] Y. L. Xia, P. Fabian, A. Stohl, and M. Winterhalter, "Forest climatology: estimation of missing values for Bavaria, Germany," Agricultural and Forest Meteorology, vol. 96, no. 1–3, pp. 131–144, 1999.

[24] C. J. Willmott, S. M. Robeson, and J. J. Feddema, "Estimating continental and terrestrial precipitation averages from rain-gauge networks," International Journal of Climatology, vol. 14, no. 4, pp. 403–414, 1994.

[25] R. S. V. Teegavarapu and V. Chandramouli, "Improved weighting methods, deterministic and stochastic data-driven models for estimation of missing precipitation records," Journal of Hydrology, vol. 312, no. 1–4, pp. 191–206, 2005.

[26] J. A. Smith, "Precipitation," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 3, chapter 3, McGraw Hill, New York, NY, USA, 1993.

[27] J. R. Simanton and H. B. Osborn, "Reciprocal-distance estimate of point rainfall," Journal of Hydraulic Engineering Division, vol. 106, no. 7, pp. 1242–1246, 1980.

[28] J. D.-J. Salas, "Analysis and modeling of hydrological time series," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 19, chapter 19, pp. 19.1–19.72, McGraw-Hill, New York, NY, USA, 1993.

[29] S. J. Jeffrey, J. O. Carter, K. B. Moodie, and A. R. Beswick, "Using spatial interpolation to construct a comprehensive archive of Australian climate data," Environmental Modelling and Software, vol. 16, no. 4, pp. 309–330, 2001.

[30] M. Franklin, V. R. Kotamarthi, M. L. Stein, and D. R. Cook, "Generating data ensembles over a model grid from sparse climate point measurements," Journal of Physics: Conference Series, vol. 125, Article ID 012019, 2008.

[31] K. C. Young, "A three-way model for interpolating for monthly precipitation values," Monthly Weather Review, vol. 120, no. 11, pp. 2561–2569, 1992.

[32] F. Filippini, G. Galliani, and L. Pomi, "The estimation of missing meteorological data in a network of automatic stations," Transactions on Ecology and the Environment, vol. 4, pp. 283–291, 1994.

[33] W. P. Lowry, Compendium of Lecture Notes in Climatology for Class IV Meteorological Personnel, Secretariat of the World Meteorological Organization, Geneva, Switzerland, 1972.

[34] M. C. Acock and Y. A. Pachepsky, "Estimating missing weather data for agricultural simulations using group method of data handling," Journal of Applied Meteorology, vol. 39, no. 7, pp. 1176–1184, 2000.

[35] S. L. Neitsch, J. G. Arnold, J. R. Kiniry, J. R. Williams, and K. W. King, Soil and Water Assessment Tool – Theoretical Documentation (Version 2005), Texas Water Resource Institute, College Station, Tex, USA, 2005.

[36] S. S. Shapiro and M. B. Wilk, "An analysis of variance test for normality (complete samples)," Biometrika, vol. 52, no. 3-4, pp. 591–611, 1965.

[37] M. Friedman, "The use of ranks to avoid the assumption of normality implicit in the analysis of variance," Journal of the American Statistical Association, vol. 32, no. 200, pp. 675–701, 1937.

[38] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[39] S. Zhang, Z. Jin, X. Zhu, and J. Zhang, "Missing data analysis: a kernel-based multi-imputation approach," in Transactions on Computational Science III, vol. 5300 of Lecture Notes in Computer Science, pp. 122–142, Springer, Berlin, Germany, 2009.



Table 2: Chi-square (Χ²) test with Friedman method for finding difference among six infilling methods. SD represents standard deviation; $P$ value means significance probability.

(a) 4-NN (Χ² = 55.602, $P$ value = 0.0000)

| Method | $N$ | Mean | SD | Min | Max | Mean rank |
|---|---|---|---|---|---|---|
| Ep | 19 | -1.29 | 4.46 | -15.50 | 5.76 | 2.53 |
| Qu | 19 | -1.42 | 5.03 | -15.06 | 7.51 | 2.74 |
| Tw | 19 | -1.64 | 5.37 | -14.90 | 8.13 | 2.58 |
| Tc | 19 | -1.18 | 5.06 | -14.87 | 8.29 | 4.47 |
| Co | 19 | -1.40 | 4.69 | -15.42 | 6.08 | 2.68 |
| Reg | 19 | 2.76 | 5.42 | -13.47 | 14.38 | 6.00 |

(b) 6-NN (Χ² = 66.519, $P$ value = 0.0000)

| Method | $N$ | Mean | SD | Min | Max | Mean rank |
|---|---|---|---|---|---|---|
| Ep | 19 | -2.61 | 4.72 | -16.68 | 1.58 | 1.53 |
| Qu | 19 | -2.27 | 4.71 | -16.20 | 2.82 | 3.16 |
| Tw | 19 | -2.18 | 4.84 | -15.89 | 4.06 | 3.79 |
| Tc | 19 | -2.15 | 4.63 | -16.20 | 2.84 | 4.21 |
| Co | 19 | -2.54 | 4.69 | -16.59 | 1.50 | 2.32 |
| Reg | 19 | 0.06 | 4.97 | -15.48 | 7.33 | 6.00 |

(c) 8-NN (Χ² = 75.812, $P$ value = 0.0000)

| Method | $N$ | Mean | SD | Min | Max | Mean rank |
|---|---|---|---|---|---|---|
| Ep | 19 | -3.40 | 5.04 | -17.33 | 1.94 | 1.32 |
| Qu | 19 | -3.10 | 4.75 | -16.90 | 0.45 | 3.21 |
| Tw | 19 | -2.74 | 4.79 | -16.59 | 1.29 | 4.58 |
| Tc | 19 | -2.93 | 4.77 | -16.96 | 1.51 | 3.68 |
| Co | 19 | -3.28 | 4.93 | -17.25 | 1.85 | 2.21 |
| Reg | 19 | -1.24 | 5.35 | -16.49 | 8.08 | 6.00 |

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)

the $k$NN regression, Tricube, Triweight, Quartic, Cosine, and Epanechnikov were ranked as shown in Table 2. In another example, with eight reference points, the order by average rank was $k$NN regression, Triweight, Tricube, Quartic, Cosine, and Epanechnikov (Table 2). As shown in Table 2, the $k$NN regression has the largest average rank and Epanechnikov has the smallest average rank for all of the reference point cases. This result indicates that the methods are dissimilar.
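As a rough cross-check of how the mean ranks relate to the reported chi-square values, the Friedman statistic can be recomputed from the mean ranks alone. The sketch below assumes the standard Friedman formula without tie correction, 19 blocks, and six methods, and it reads the 4-NN mean ranks of Table 2 as 2.53, 2.74, 2.58, 4.47, 2.68, and 6.00. The function name `friedman_chi2_from_mean_ranks` is hypothetical and not part of the original study; because the tabulated ranks are rounded, the result only approximates the tabulated 55.602.

```python
def friedman_chi2_from_mean_ranks(mean_ranks, n_blocks):
    """Friedman chi-square (no tie correction) from per-method mean ranks.

    Assumption: the standard formula
        chi2 = 12 N / (k (k + 1)) * sum_j(mean_rank_j ** 2) - 3 N (k + 1),
    with N blocks and k methods.
    """
    k = len(mean_ranks)
    sum_sq = sum(r * r for r in mean_ranks)
    return 12.0 * n_blocks / (k * (k + 1)) * sum_sq - 3.0 * n_blocks * (k + 1)


# Mean ranks for the 4-NN case (Ep, Qu, Tw, Tc, Co, Reg), as read from Table 2.
ranks_4nn = [2.53, 2.74, 2.58, 4.47, 2.68, 6.00]
chi2_4nn = friedman_chi2_from_mean_ranks(ranks_4nn, n_blocks=19)
print(round(chi2_4nn, 2))
```

The small gap between the recomputed value and the tabulated one is consistent with rounding of the mean ranks and with a possible tie correction in the original software, which this sketch does not attempt to reproduce.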

To determine which methods are dissimilar to the others, this study performed the Wilcoxon signed rank test [38]. The basic feature of the Wilcoxon signed rank test is that data samples that come from the same population are paired. The test statistic is

$$W = \left| \sum_{i=1}^{N} \left[ \operatorname{sign}\left(y_{2i} - y_{1i}\right) R_i \right] \right| \tag{10}$$

where $N$ is the sample size, $y_{2i}$ is the $i$th value of the second data point, $y_{1i}$ is the $i$th value of the first data point, and $R_i$ is the rank of $|y_{2i} - y_{1i}|$. If the $W$ value is less than 5%, a different mechanism was used on the sample data or method. Table 3 shows that the $W$ value for $k$NN regression is less than 5% for all cases. Accordingly, $k$NN regression is completely dissimilar to the other methods. Although the five kernel methods for data interpolation are similar or dissimilar to each other depending on the number of reference points, all of the kernel methods can be distinguished from $k$NN regression using the Wilcoxon signed rank test.
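The statistic in (10) can be sketched in a few lines. This is a minimal illustration, assuming that zero differences are dropped and that tied absolute differences share their average rank, which are common conventions but are not stated in the text. The function name `signed_rank_statistic` and the sample values are hypothetical.

```python
def signed_rank_statistic(y1, y2):
    """W = |sum_i sign(y2_i - y1_i) * R_i|, with R_i the rank of |y2_i - y1_i|."""
    diffs = [b - a for a, b in zip(y1, y2)]
    # Assumption: zero differences carry no sign and are dropped.
    diffs = [d for d in diffs if d != 0]
    abs_sorted = sorted(abs(d) for d in diffs)

    def average_rank(value):
        # Tied absolute differences share the mean of their 1-based positions.
        first = abs_sorted.index(value) + 1
        count = abs_sorted.count(value)
        return first + (count - 1) / 2.0

    total = 0.0
    for d in diffs:
        sign = 1.0 if d > 0 else -1.0
        total += sign * average_rank(abs(d))
    return abs(total)


# Hypothetical paired errors from two infilling methods (not data from the study).
method_a = [1.0, 2.0, 3.0, 4.0]
method_b = [1.5, 1.0, 5.0, 4.0]
w_value = signed_rank_statistic(method_a, method_b)
print(w_value)
```

Turning $W$ into a significance probability requires the null distribution of the statistic (or a normal approximation), which is outside the scope of this sketch.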

4. Results

Since Epanechnikov has the smallest average rank for all reference points in Table 2, which signifies a small difference between the observed value and the interpolated value, the interpolated data obtained from the Epanechnikov method give the best result among the studied methods. Figure 2 shows that data filled in by $k$NN regression differ substantially from the observations at both four and six reference points. The interpolated data from the kernel methods are close to zero for both the average and median differences at four reference points, meaning that the interpolated data are similar to the observed data. On the other hand, more than 75% of the interpolated data from $k$NN regression exhibit a difference different from zero. When the interpolated data are evaluated at six reference points in Figure 2, the median value from the $k$NN regression is far from zero. At eight reference points, $k$NN regression is close to zero for both average and median values; however, it is difficult to conclude that this is an ideal method


Table 3: Chi-square (Χ²) test with Wilcoxon signed rank method between regression and five different kernel methods.

(a) 4-NN

| Reg versus | Ep | Qu | Tw | Tc | Co |
|---|---|---|---|---|---|
| Χ² | -3.823 | -3.823 | -3.823 | -3.823 | -3.823 |
| $P$ value | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |

(b) 6-NN

| Reg versus | Ep | Qu | Tw | Tc | Co |
|---|---|---|---|---|---|
| Χ² | -3.823 | -3.823 | -3.823 | -3.823 | -3.823 |
| $P$ value | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |

(c) 8-NN

| Reg versus | Ep | Qu | Tw | Tc | Co |
|---|---|---|---|---|---|
| Χ² | -3.823 | -3.823 | -3.823 | -3.823 | -3.823 |
| $P$ value | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)

Figure 2: Box plots for difference between actual precipitation and interpolated precipitation. Panels: (a) 4-NN, (b) 6-NN, (c) 8-NN. The $x$-axis lists the methods (EP, QU, TW, TC, CO, RE) and the $y$-axis represents mm per day.

because outlying maximum values affect the average and median values.

This study on precipitation data interpolation also evaluated simulations driven by the interpolated data using the SWAT hydrologic model. In SWAT hydrologic modeling, the surface runoff is estimated by considering excess precipitation with abstractions and an infiltration factor through the Soil Conservation Service Curve Number (SCS-CN) method.

The Green-Ampt (GA) infiltration method is another method to calculate the surface runoff in SWAT. A study shows that both methods give reasonable results and that there is no significant advantage in using one over the other. However, the GA method appears to have more limitations in modeling seasonal variability than the SCS-CN method does. Hence, the SCS-CN method is used for the infiltration factor in this study. An SCS curve number based simulation needs


Figure 3: Calibrated model result using original precipitation input (series: Obs and Sim). The $x$-axis represents time in days (2008-01-01 to 2010-12-28) and the $y$-axis represents flow in cubic meters per second (0 to 600).

Table 4: Details of SWAT parameters which are related to runoff mechanism for Imha watershed.

| Parameter | Description | Selected value |
|---|---|---|
| ESCO | Soil evaporation compensation factor | 0.9500 |
| EPCO | Plant water uptake compensation factor | 1.0000 |
| EVLAI | Leaf area index at which no evaporation occurs from water surface [m²/m²] | 3.0000 |
| FFCB | Initial soil water storage expressed as a fraction of field capacity water content | 0.0000 |
| IEVENT | Rainfall/runoff code: 0 = daily rainfall/CN | 0.0000 |
| ICRK | Crack flow code: 1 = model crack flow in soil | 0.0000 |
| SURLAG | Surface runoff lag time [days] | 4.0000 |
| ADJ_PKR | Peak rate adjustment factor for sediment routing in the subbasin (tributary channels) | 0.0000 |
| PRF | Peak rate adjustment factor for sediment routing in the main channel | 1.0000 |
| SPCON | Linear parameter for calculating the maximum amount of sediment that can be reentrained during channel sediment routing | 0.0001 |
| SPEXP | Exponent parameter for calculating sediment reentrained in channel sediment routing | 1.0000 |

information updated at each time step as the soil water content changes. The excess rainfall equation in the SCS-CN method was derived from the historical relationship between the curve number and the hydrologic mechanism over more than 20 years. Throughout the surface runoff calculation, infiltration should be updated over time according to the soil type. Other abstractions, such as evapotranspiration and soil and snow evaporation, are calculated by the Penman-Monteith method and meteorological statistics. Finally, the kinematic storage model is used to compute groundwater storage and seepage. Flow generated in SWAT modeling is routed from the hydrologic response units (HRUs) to the watershed outlet. Figure 3 shows the calibration of the model simulation as the initial step, and the specific parameters are described in Table 4. After the calibration of the SWAT model, the six different interpolated precipitation datasets with three different reference ranges for each (a total of eighteen


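Table 5: Details of simulation results with six different precipitation infilling methods in Imha watershed (RMSE in m³/s).

| Method | 4-NN $E_{NS}$ | 4-NN $R^2$ | 4-NN RMSE | 6-NN $E_{NS}$ | 6-NN $R^2$ | 6-NN RMSE | 8-NN $E_{NS}$ | 8-NN $R^2$ | 8-NN RMSE |
|---|---|---|---|---|---|---|---|---|---|
| Ep | 0.80 | 0.83 | 15.32 | 0.91 | 0.92 | 10.60 | 0.78 | 0.82 | 16.16 |
| Qu | 0.73 | 0.78 | 17.83 | 0.91 | 0.93 | 10.48 | 0.88 | 0.92 | 11.80 |
| Tw | 0.91 | 0.91 | 10.56 | 0.93 | 0.94 | 9.25 | 0.95 | 0.95 | 8.10 |
| Tc | 0.95 | 0.95 | 7.72 | 0.93 | 0.94 | 9.03 | 0.95 | 0.95 | 7.64 |
| Co | 0.93 | 0.94 | 8.83 | 0.95 | 0.95 | 7.72 | 0.91 | 0.93 | 10.44 |
| Reg | 0.69 | 0.80 | 19.14 | 0.21 | 0.65 | 30.71 | 0.71 | 0.73 | 21.48 |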

interpolated precipitation datasets) were used to assess the performance of interpolated precipitation data for hydrologic model simulation. Streamflow simulations were run for three years, from 2008 to 2010. To evaluate the model performance with the different interpolated precipitation datasets, this study used $E_{NS}$ (Nash-Sutcliffe coefficient), $R^2$ (coefficient of determination), and RMSE (root mean square error). Table 5 and Figure 4 show that the simulations driven by $k$NN regression exhibit low SWAT performance for streamflow estimation, with averages of 0.54 for $E_{NS}$, 0.74 for $R^2$, and 23.78 m³/s for RMSE. All of the kernel functions, on the other hand, exhibit good performance for hydrologic simulations with interpolated precipitation data (Table 5 and Figure 4): the averages of $E_{NS}$, $R^2$, and RMSE are (1) 0.83, 0.86, and 14.03 m³/s for Epanechnikov; (2) 0.84, 0.88, and 13.03 m³/s for Quartic; (3) 0.93, 0.93, and 9.30 m³/s for Triweight; (4) 0.94, 0.95, and 8.13 m³/s for Tricube; and (5) 0.93, 0.94, and 9.00 m³/s for Cosine, respectively.
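For readers who want to reproduce the three scores from their own observed and simulated streamflow series, a minimal sketch follows. It assumes the usual definitions: the Nash-Sutcliffe coefficient as one minus the ratio of squared error to the variance of the observations, $R^2$ as the squared Pearson correlation, and RMSE as the root of the mean squared error. The function names and the sample flows are hypothetical and do not come from the study.

```python
import math


def nash_sutcliffe(obs, sim):
    """E_NS = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def r_squared(obs, sim):
    """Squared Pearson correlation (assumed definition of R^2)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov * cov / (var_obs * var_sim)


def rmse(obs, sim):
    """Root mean square error, in the units of the input series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


# Hypothetical daily flows in m^3/s (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
print(nash_sutcliffe(observed, simulated), r_squared(observed, simulated), rmse(observed, simulated))
```

The example deliberately uses a simulation with a constant bias: the correlation stays perfect while $E_{NS}$ drops, which is one reason the three scores are reported together.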

5. Conclusions

Five different kernel functions were applied to the Imha watershed to evaluate the performance of each weighting method for estimating missing precipitation data, and the use of interpolated data for hydrologic simulations was assessed. The following conclusions can be drawn from this research.

(1) To estimate missing precipitation data points, exploratory procedures should consider the spatiotemporal variations of precipitation. Owing to the difficulty of accounting for these variations, statistical methods for estimating missing precipitation data are commonly used.

(2) Although ANNs are an advanced approach for estimating missing data, their mechanisms are unclear because the neuron system is ultimately a black-box model. Thus, regression methods are widely used for estimating missing data, even though they are limited in that their estimates cannot follow a normal distribution when the sample is small.

(3) When kernel functions are used as a weighting method, the estimated missing data would satisfy a normal distribution, which is more statistically sound. Also, in terms of the mean value, kernel methods can overcome the weakness of $k$NN regression when the data have outliers and/or a nonlinear trend around the missing data points.

(4) This study assessed the five kernel functions Epanechnikov, Quartic, Triweight, Tricube, and Cosine as weights for predicting missing values. In comparison with the $k$NN regression method, this study demonstrates that the kernel approaches provide higher quality interpolated precipitation data. In addition, the kernel function results conform better to statistical standards.

(5) Furthermore, higher quality interpolated precipitation data result in better performance for hydrologic simulations, as exemplified in this study. All of the statistical analyses of the streamflow simulations showed that the simulations using the interpolated precipitation data from the kernel functions provide better results than those using $k$NN regression.

(6) Use of kernel distribution is a more effective method than regression when the precipitation data have an upward or downward trend. However, if the precipitation data have a nonlinear trend, it is difficult to reconstruct the missing values effectively. For further research, a time series analysis or a random walk model using a stochastic process is a possible method for estimating missing data where there is a nonlinear trend.

Appendices

A. Kernel Functions

Kernel density estimation is an unsupervised learning procedure, which historically precedes kernel regression. It also leads naturally to a simple family of procedures for nonparametric classification.

A.1. Kernel Density Estimation. Suppose we have a random sample $x_1, x_2, \ldots, x_N$ drawn from a probability density $f_x(x)$, and we wish to estimate $f_x$ at a point $x_0$. For simplicity we assume for now that $x \in \mathbb{R}$ (real value). Arguing as before, a natural local estimate has the form

$$\hat{f}_x(x_0) = \frac{\#\{x_i \in \mathcal{N}(x_0)\}}{N\lambda} \tag{A.1}$$

where $\#\{x_i \in \mathcal{N}(x_0)\}$ is the number of $x_i$ that fall in $\mathcal{N}(x_0)$, and $\mathcal{N}(x_0)$ is a small metric neighborhood around $x_0$ of width $\lambda$.


Figure 4: Scatter plots for SWAT simulation. Panels: (a) 4-NN, (b) 6-NN, (c) 8-NN; the axes are Obs and Simulation, each from 0 to 600. (EP, QU, TW, TC, CO, and KNN represent Epanechnikov, Quartic, Triweight, Tricube, Cosine, and KNN-regression, resp.)

This estimate is bumpy, and the smooth Parzen estimate is preferred,

$$\hat{f}_x(x_0) = \frac{1}{N\lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \tag{A.2}$$

because it counts observations close to $x_0$ with weights that decrease with distance from $x_0$. In this case a popular choice for $K_\lambda$ is the Gaussian kernel $K_\lambda(x_0, x_i) = \phi(|x_i - x_0|/\lambda)$. Letting $\phi_\lambda$ denote the Gaussian density with mean zero and standard deviation $\lambda$, (A.2) has the form

$$\hat{f}_x(x) = \frac{1}{N} \sum_{i=1}^{N} \phi_\lambda(x - x_i) = (\hat{F} \star \phi_\lambda)(x) \tag{A.3}$$

the convolution of the sample empirical distribution $\hat{F}$ with $\phi_\lambda$. The distribution $\hat{F}(x)$ puts mass $1/N$ at each of the observed $x_i$ and is jumpy; in $\hat{f}_x(x)$ we have smoothed $\hat{F}$ by adding independent Gaussian noise to each observation $x_i$.

The Parzen density estimate is the equivalent of the local average, and improvements have been proposed along the lines of local regression (on the log scale for densities). We will not pursue these here. In $\mathbb{R}^p$ the natural generalization of the Gaussian density estimate amounts to using the Gaussian product kernel in (A.3),

$$\hat{f}_x(x_0) = \frac{1}{N (2\lambda^2\pi)^{p/2}} \sum_{i=1}^{N} e^{-\frac{1}{2}\left(\lVert x_i - x_0 \rVert / \lambda\right)^2} \tag{A.4}$$
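The one-dimensional Gaussian case of (A.3) is short enough to sketch directly. The code below is an illustration under stated assumptions, not the authors' implementation: the function name `gaussian_kde`, the sample values, and the bandwidth are all hypothetical.

```python
import math


def gaussian_kde(x0, sample, lam):
    """Parzen estimate of (A.3): the mean of Gaussian densities N(x_i, lam^2) at x0."""
    norm = 1.0 / (lam * math.sqrt(2.0 * math.pi))
    total = sum(norm * math.exp(-0.5 * ((x0 - xi) / lam) ** 2) for xi in sample)
    return total / len(sample)


# Hypothetical sample and bandwidth, chosen only for illustration.
sample = [0.0, 1.0, 3.0]
bandwidth = 0.5
density_at_one = gaussian_kde(1.0, sample, bandwidth)

# A crude Riemann sum over a wide grid should come out close to 1,
# since the estimate is itself a probability density.
step = 0.01
area = sum(gaussian_kde(-10.0 + i * step, sample, bandwidth) * step for i in range(2001))
print(density_at_one, area)
```

A smaller bandwidth makes the estimate follow the individual observations more closely (bumpier), while a larger one smooths them out, which is the trade-off the text describes.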


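Table 6: Weighted values depending on day distance with each $k$NN.

| Kernel | 4-NN 1st | 4-NN 2nd | 6-NN 1st | 6-NN 2nd | 6-NN 3rd | 8-NN 1st | 8-NN 2nd | 8-NN 3rd | 8-NN 4th |
|---|---|---|---|---|---|---|---|---|---|
| Ep | 0.667 | 0.417 | 0.703 | 0.563 | 0.328 | 0.720 | 0.630 | 0.480 | 0.270 |
| Qu | 0.741 | 0.289 | 0.824 | 0.527 | 0.179 | 0.864 | 0.662 | 0.384 | 0.122 |
| Tw | 0.768 | 0.188 | 0.901 | 0.461 | 0.092 | 0.968 | 0.648 | 0.287 | 0.051 |
| Tc | 0.772 | 0.301 | 0.824 | 0.579 | 0.167 | 0.844 | 0.709 | 0.416 | 0.100 |
| Co | 0.680 | 0.393 | 0.726 | 0.555 | 0.301 | 0.747 | 0.635 | 0.462 | 0.243 |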

A.2. Kernel Density Classification. One can use nonparametric density estimates for classification in a straightforward fashion using Bayes' theorem. Suppose for a $J$ class problem we fit nonparametric density estimates $\hat{f}_j(X)$, $j = 1, \ldots, J$, separately in each of the classes, and we also have estimates of the class priors $\hat{\pi}_j$ (usually the sample proportions). Then

$$\widehat{\Pr}(G = j \mid X = x_0) = \frac{\hat{\pi}_j \hat{f}_j(x_0)}{\sum_{k=1}^{J} \hat{\pi}_k \hat{f}_k(x_0)} \tag{A.5}$$

In this region the data are sparse for both classes, and since the Gaussian kernel density estimates use metric kernels, the density estimates are low and of poor quality (high variance) in these regions. The local logistic regression method uses the tricube kernel with $k$-NN bandwidth; this effectively widens the kernel in this region and makes use of the local linear assumption to smooth out the estimate (on the logit scale).

If classification is the ultimate goal, then learning the separate class densities well may be unnecessary and can in fact be misleading. In learning the separate densities from data, one might decide to settle for a rougher, high-variance fit to capture these features, which are irrelevant for the purposes of estimating the posterior probabilities. In fact, if classification is the ultimate goal, then we need only estimate the posterior well near the decision boundary (for two classes, this is the set $\{x \mid \Pr(G = 1 \mid X = x) = 1/2\}$).

B. Procedures of Missing Precipitation

This step shows an example calculation of the weighted mean using kernel functions and illustrates the weight assigned in each situation. Because the kernel functions are all symmetric, the same weight is used for days at the same distance before and after the target day. Table 6 lists the weights for the 1st, 2nd, 3rd, and 4th day distances. For example, to estimate the missing precipitation for 2010-02-12 (actual value is 6), follow the three steps below with the 4-NN Epanechnikov kernel (Table 7).

Step 1. Select the date of the target interpolation data.

Step 2. Determine the precipitation of the $K$ nearest days and the kernel weight of each.

Step 3. Calculate the weighted average to estimate the missing precipitation.

Table 7: Example calculation to interpolate for missing precipitation.

| Date (Step 1) | Prec. (Step 2) | Weight (Step 2) | Prec. · Weight (Step 3) |
|---|---|---|---|
| 2010-02-10 | 15 | 0.417 | 6.255 |
| 2010-02-11 | 17.2 | 0.667 | 11.472 |
| 2010-02-12 | 6 | - | - |
| 2010-02-13 | 9.1 | 0.667 | 6.070 |
| 2010-02-14 | 0 | 0.417 | 0 |

Estimation (Step 3): 5.949

Table 8: Calculating missing precipitation for 2010-02-12 (actual value is 6) with six different methods.

| Method | 4NN | 6NN | 8NN |
|---|---|---|---|
| Ep | 5.949 | 5.172 | 4.862 |
| Qu | 5.956 | 5.302 | 4.936 |
| Tw | 5.755 | 5.294 | 4.952 |
| Tc | 6.205 | 5.407 | 4.963 |
| Co | 5.945 | 5.197 | 4.876 |
| Reg | 10.325 | 8.967 | 8.813 |

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)

The rest of the kernel methods for estimating missing precipitation are described in Table 8.
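The three steps can be sketched as a short calculation. The code below uses the 4-NN Epanechnikov weights and the precipitation values of Table 7. One assumption is inferred from the tabulated numbers rather than stated in the text: the weighted products are summed and divided by the number of neighbors $K$ (here 4), not by the sum of the weights. The function name `kernel_weighted_estimate` is hypothetical.

```python
def kernel_weighted_estimate(neighbor_prec, weights):
    """Sum of weight * precipitation over the K neighbors, divided by K.

    Assumption: dividing by K (rather than by the sum of the weights) is
    inferred from the numbers in Table 7, not from a stated formula.
    """
    if len(neighbor_prec) != len(weights):
        raise ValueError("each neighbor needs exactly one weight")
    products = [w * p for w, p in zip(weights, neighbor_prec)]
    return sum(products) / len(neighbor_prec)


# Step 1: the target date is 2010-02-12.
# Step 2: precipitation on 2010-02-10, -11, -13, -14 and the 4-NN Epanechnikov weights.
prec = [15.0, 17.2, 9.1, 0.0]
ep_weights = [0.417, 0.667, 0.667, 0.417]

# Step 3: weighted average.
estimate = kernel_weighted_estimate(prec, ep_weights)
print(round(estimate, 3))

# With all weights set to 1 the same function reduces to the plain 4-day mean.
plain_mean = kernel_weighted_estimate(prec, [1.0, 1.0, 1.0, 1.0])
print(round(plain_mean, 3))
```

Under that assumption the sketch gives 5.949 for the Epanechnikov case and 10.325 for the unweighted mean, matching the Ep and Reg entries in the 4NN column of Table 8.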

C. Sample Calculations with Real Values

This section shows how to calculate missing precipitation with the kernel weighted mean function using specific numbers. The sample was selected at random from the daily data from 2008 to 2010 with a probability of 0.02. After the data are selected, their locations in the series are set. Zhang et al. [39] reported that kernel-based nonparametric multiple imputation performs better than general linear regression when the sample data are small or limited.

Table 9 shows the kernel weighting procedure for each function. We used data from Feb 10, 2010 to Feb 14, 2010 to estimate the missing data for Feb 12, 2010. The Epanechnikov kernel gives the most distant data the highest weight, 0.417, whereas the Triweight kernel gives the most distant data the lowest weight, 0.188. For the nearest value, the highest weight comes from the Tricube kernel and the lowest from the Epanechnikov kernel. Generally


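Table 9: Sample calculation with specific numbers.

(a) Epanechnikov (Estimation: 5.95)

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Ep weight | 0.417 | 0.667 | - | 0.667 | 0.417 |
| Prec. · weight | 6.26 | 11.47 | - | 6.07 | 0.00 |

(b) Quartic (Estimation: 5.96)

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Qu weight | 0.289 | 0.741 | - | 0.741 | 0.289 |
| Prec. · weight | 4.34 | 12.75 | - | 6.74 | 0.00 |

(c) Triweight (Estimation: 5.75)

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Tw weight | 0.188 | 0.768 | - | 0.768 | 0.188 |
| Prec. · weight | 2.82 | 13.21 | - | 6.99 | 0.00 |

(d) Tricube (Estimation: 6.20)

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Tc weight | 0.301 | 0.772 | - | 0.772 | 0.301 |
| Prec. · weight | 4.52 | 13.28 | - | 7.03 | 0.00 |

(e) Cosine (Estimation: 5.94)

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Co weight | 0.393 | 0.680 | - | 0.680 | 0.393 |
| Prec. · weight | 5.90 | 11.70 | - | 6.19 | 0.00 |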

the Tricube kernel, which assigns high weights, overestimates the missing precipitation.
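The weights used in these sample calculations can be regenerated from the standard definitions of the five kernels. The sketch below assumes that the scaled distance is $u = d/(K/2 + 1)$ for a day distance $d$; this bandwidth is inferred from the tabulated weights (for example, 0.667 and 0.417 for the 4-NN Epanechnikov case) and is not stated explicitly in the text. The names `KERNELS` and `kernel_weights` are hypothetical.

```python
import math

# Standard kernel definitions on |u| <= 1.
KERNELS = {
    "Ep": lambda u: 0.75 * (1.0 - u * u),                          # Epanechnikov
    "Qu": lambda u: 15.0 / 16.0 * (1.0 - u * u) ** 2,              # Quartic
    "Tw": lambda u: 35.0 / 32.0 * (1.0 - u * u) ** 3,              # Triweight
    "Tc": lambda u: 70.0 / 81.0 * (1.0 - abs(u) ** 3) ** 3,        # Tricube
    "Co": lambda u: math.pi / 4.0 * math.cos(math.pi * u / 2.0),   # Cosine
}


def kernel_weights(name, k):
    """Weights for day distances 1..K/2 around the missing day.

    Assumption: the bandwidth K/2 + 1 is inferred from the tabulated weights.
    """
    half = k // 2
    bandwidth = half + 1
    return [KERNELS[name](d / bandwidth) for d in range(1, half + 1)]


for name in ("Ep", "Qu", "Tw", "Tc", "Co"):
    print(name, [round(w, 3) for w in kernel_weights(name, 4)])
```

Under this assumption the printed 4-NN weights agree with the weight rows of the sample calculation to three decimals, and calling `kernel_weights(name, 6)` or `kernel_weights(name, 8)` gives the weights for the wider neighborhoods.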

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] K. Kang and V. Merwade, "Development and application of a storage-release based distributed hydrologic model using GIS," Journal of Hydrology, vol. 403, no. 1-2, pp. 1–13, 2011.

[2] A. J. Abebe, D. P. Solomatine, and R. G. W. Venneker, "Application of adaptive fuzzy rule-based models for reconstruction of missing precipitation events," Hydrological Sciences Journal, vol. 45, no. 3, pp. 425–436, 2000.

[3] P. Ramos-Calzado, J. Gomez-Camacho, F. Perez-Bernal, and M. F. Pita-Lopez, "A novel approach to precipitation series completion in climatological datasets: application to Andalusia," International Journal of Climatology, vol. 28, no. 11, pp. 1525–1534, 2008.

[4] T. Schneider, "Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values," Journal of Climate, vol. 14, no. 5, pp. 853–871, 2001.

[5] S. K. Regonda, D.-J. Seo, B. Lawrence, J. D. Brown, and J. Demargne, "Short-term ensemble stream forecasting using operationally-produced single-valued streamflow forecasts – a Hydrologic Model Output Statistics (HMOS) approach," Journal of Hydrology, vol. 497, pp. 80–96, 2013.

[6] P. Coulibaly and N. D. Evora, "Comparison of neural network methods for infilling missing daily weather records," Journal of Hydrology, vol. 341, no. 1-2, pp. 27–41, 2007.

[7] H. A. El Sharif and R. S. V. Teegavarapu, "Evaluation of spatial interpolation methods for missing precipitation data: preservation of spatial statistics," in Proceedings of the World Environmental and Water Resources Congress, pp. 3822–3832, May 2012.

[8] A. Bardossy and G. Pegram, "Infilling missing precipitation records – a comparison of a new copula-based method with other techniques," Journal of Hydrology, vol. 519, pp. 1162–1170, 2014.

[9] K. Schamm, M. Ziese, A. Becker et al., "Global gridded precipitation over land: a description of the new GPCC first guess daily product," Earth System Science Data, vol. 6, no. 1, pp. 49–60, 2014.

[10] R. S. V. Teegavarapu, "Statistical corrections of spatially interpolated missing precipitation data estimates," Hydrological Processes, vol. 28, no. 11, pp. 3789–3808, 2014.

[11] Y. Da and G. Xiurun, "An improved PSO-based ANN with simulated annealing technique," Neurocomputing, vol. 63, pp. 527–533, 2005.

[12] A. di Piazza, F. L. Conti, L. V. Noto, F. Viola, and G. La Loggia, "Comparative analysis of different techniques for spatial interpolation of rainfall data to create a serially complete monthly time series of precipitation for Sicily, Italy," International Journal of Applied Earth Observation and Geoinformation, vol. 13, no. 3, pp. 396–408, 2011.

[13] E. Pisoni, F. Pastor, and M. Volta, "Artificial Neural Networks to reconstruct incomplete satellite data: application to the Mediterranean Sea Surface Temperature," Nonlinear Processes in Geophysics, vol. 15, no. 1, pp. 61–70, 2008.

[14] V. Sharma, S. Rai, and A. Dev, "A comprehensive study of artificial neural networks," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 10, pp. 278–284, 2012.

[15] ASCE Task Committee, "Artificial neural networks in hydrology. II: hydrologic applications," Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 124–137, 2000.

[16] ASCE Task Committee, "Artificial neural networks in hydrology. I: preliminary concepts," Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 115–123, 2000.

[17] D. E. Rumelhart, B. Widrow, and M. A. Lehr, "The basic ideas in neural networks," Communications of the ACM, vol. 37, no. 3, pp. 87–92, 1994.

[18] J. Amorocho and W. E. Hart, "A critique of current methods in hydrologic systems investigation," Transactions of the American Geophysical Union, vol. 45, no. 2, pp. 307–321, 1964.


[19] A. W. Minns and M. J. Hall, "Artificial neural networks as rainfall-runoff models," Hydrological Sciences Journal, vol. 41, no. 3, pp. 399–417, 1996.

[20] K. Kang and V. Merwade, "The effect of spatially uniform and non-uniform precipitation bias correction methods on improving NEXRAD rainfall accuracy for distributed hydrologic modeling," Hydrology Research, vol. 45, no. 1, pp. 23–42, 2014.

[21] C. Daly, W. P. Gibson, G. H. Taylor, G. L. Johnson, and P. Pasteris, "A knowledge-based approach to the statistical mapping of climate," Climate Research, vol. 22, no. 2, pp. 99–113, 2002.

[22] J. D. Creutin, H. Andrieu, and D. Faure, "Use of a weather radar for the hydrology of a mountainous area. Part II: radar measurement validation," Journal of Hydrology, vol. 193, no. 1–4, pp. 26–44, 1997.

[23] Y. L. Xia, P. Fabian, A. Stohl, and M. Winterhalter, "Forest climatology: estimation of missing values for Bavaria, Germany," Agricultural and Forest Meteorology, vol. 96, no. 1–3, pp. 131–144, 1999.

[24] C. J. Willmott, S. M. Robeson, and J. J. Feddema, "Estimating continental and terrestrial precipitation averages from rain-gauge networks," International Journal of Climatology, vol. 14, no. 4, pp. 403–414, 1994.

[25] R. S. V. Teegavarapu and V. Chandramouli, "Improved weighting methods, deterministic and stochastic data-driven models for estimation of missing precipitation records," Journal of Hydrology, vol. 312, no. 1–4, pp. 191–206, 2005.

[26] J. A. Smith, "Precipitation," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 3, chapter 3, McGraw Hill, New York, NY, USA, 1993.

[27] J. R. Simanton and H. B. Osborn, "Reciprocal-distance estimate of point rainfall," Journal of Hydraulic Engineering Division, vol. 106, no. 7, pp. 1242–1246, 1980.

[28] J. D.-J. Salas, "Analysis and modeling of hydrological time series," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 19, chapter 19, pp. 19.1–19.72, McGraw-Hill, New York, NY, USA, 1993.

[29] S. J. Jeffrey, J. O. Carter, K. B. Moodie, and A. R. Beswick, "Using spatial interpolation to construct a comprehensive archive of Australian climate data," Environmental Modelling and Software, vol. 16, no. 4, pp. 309–330, 2001.

[30] M. Franklin, V. R. Kotamarthi, M. L. Stein, and D. R. Cook, "Generating data ensembles over a model grid from sparse climate point measurements," Journal of Physics: Conference Series, vol. 125, Article ID 012019, 2008.

[31] K. C. Young, "A three-way model for interpolating for monthly precipitation values," Monthly Weather Review, vol. 120, no. 11, pp. 2561–2569, 1992.

[32] F. Filippini, G. Galliani, and L. Pomi, "The estimation of missing meteorological data in a network of automatic stations," Transactions on Ecology and the Environment, vol. 4, pp. 283–291, 1994.

[33] W. P. Lowry, Compendium of Lecture Notes in Climatology for Class IV Meteorological Personnel, Secretariat of the World Meteorological Organization, Geneva, Switzerland, 1972.

[34] M. C. Acock and Y. A. Pachepsky, "Estimating missing weather data for agricultural simulations using group method of data handling," Journal of Applied Meteorology, vol. 39, no. 7, pp. 1176–1184, 2000.

[35] S. L. Neitsch, J. G. Arnold, J. R. Kiniry, J. R. Williams, and K. W. King, Soil and Water Assessment Tool – Theoretical Documentation (Version 2005), Texas Water Resource Institute, College Station, Tex, USA, 2005.

[36] S. S. Shapiro and M. B. Wilk, "An analysis of variance test for normality (complete samples)," Biometrika, vol. 52, no. 3-4, pp. 591–611, 1965.

[37] M. Friedman, "The use of ranks to avoid the assumption of normality implicit in the analysis of variance," Journal of the American Statistical Association, vol. 32, no. 200, pp. 675–701, 1937.

[38] F Wilcoxon ldquoIndividual Comparisons by Ranking MethodsrdquoBiometrics Bulletin vol 1 no 6 pp 80ndash83 1945

[39] S Zhang Z Jin X Zhu and J Zhang ldquoMissing data analysisa kernel-based multi-imputation approachrdquo in Transactionson Computational Science III vol 5300 of Lecture Notes inComputer Science pp 122ndash142 Springer Berlin Germany 2009


6 Advances in Meteorology

Table 3: Chi-square (Χ²) test with the Wilcoxon signed rank method between regression and the five different kernel methods.

(a) 4-NN

                  Ep        Qu        Tw        Tc        Co
Reg   Χ²          −3.823    −3.823    −3.823    −3.823    −3.823
      P value     0.0001    0.0001    0.0001    0.0001    0.0001

(b) 6-NN

                  Ep        Qu        Tw        Tc        Co
Reg   Χ²          −3.823    −3.823    −3.823    −3.823    −3.823
      P value     0.0001    0.0001    0.0001    0.0001    0.0001

(c) 8-NN

                  Ep        Qu        Tw        Tc        Co
Reg   Χ²          −3.823    −3.823    −3.823    −3.823    −3.823
      P value     0.0001    0.0001    0.0001    0.0001    0.0001

(Ep: Epanechnikov; Qu: Quartic; Tw: Triweight; Tc: Tricube; Co: Cosine; Reg: regression.)

Figure 2: Box plots of the difference between actual precipitation and interpolated precipitation for (a) 4-NN, (b) 6-NN, and (c) 8-NN. Each panel compares the EP, QU, TW, TC, CO, and RE methods; the y-axis represents mm per day (ticks from −15 to 5).

because outlying maximum values will affect the average and median value.

This study on precipitation data interpolation also evaluated the simulation of the interpolated data using the SWAT hydrologic model. In SWAT hydrologic modeling, the surface runoff is estimated from excess precipitation, accounting for abstractions and an infiltration factor, through the Soil Conservation Service Curve Number (SCS-CN) method.
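As a rough illustration of the curve number idea, the sketch below uses the standard textbook SCS-CN relations in metric units: retention S = 25400/CN − 254 (mm), initial abstraction Ia = 0.2 S, and runoff Q = (P − Ia)² / (P − Ia + S) when P > Ia. The function name, the 0.2 ratio default, and the example curve number are illustrative assumptions and are not taken from the calibrated Imha watershed model; the daily update of retention with soil water content that SWAT performs is omitted.

```python
def scs_cn_runoff(precip_mm: float, curve_number: float, ia_ratio: float = 0.2) -> float:
    """Daily surface runoff depth (mm) from the textbook SCS-CN equation.

    Assumption: a fixed curve number; no soil-moisture update is modeled here.
    """
    if not 0 < curve_number <= 100:
        raise ValueError("curve_number must be in (0, 100]")
    retention = 25400.0 / curve_number - 254.0   # S, potential retention (mm)
    initial_abstraction = ia_ratio * retention   # Ia, losses before runoff starts
    if precip_mm <= initial_abstraction:
        return 0.0                               # all rainfall is abstracted
    excess = precip_mm - initial_abstraction
    return excess * excess / (excess + retention)


# Hypothetical example: 50 mm of rain on a catchment with CN = 80.
runoff_mm = scs_cn_runoff(50.0, 80.0)
```

With a fixed curve number, small storms produce no runoff at all, so errors in an interpolated daily precipitation value matter most on days when rainfall exceeds the initial abstraction.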

The Green-Ampt (GA) infiltration method is another method for calculating surface runoff in SWAT. A study shows that both methods give reasonable results, with no significant advantage observed in using one over the other. However, the GA method appears to have more limitations in modeling seasonal variability than the SCS-CN method does. Hence, the SCS-CN method is used for the infiltration factor in this study. An SCS curve number based simulation needs

Figure 3: Calibrated model result using original precipitation input, comparing observed (Obs) and simulated (Sim) flow. The x-axis represents time in days (2008-01-01 to 2010-12-28) and the y-axis represents flow in cubic meters per second (0 to 600).

Table 4: Details of SWAT parameters related to the runoff mechanism for the Imha watershed.

Parameter   Selected value   Description
ESCO        0.9500           Soil evaporation compensation factor
EPCO        1.0000           Plant water uptake compensation factor
EVLAI       3.0000           Leaf area index at which no evaporation occurs from water surface [m²/m²]
FFCB        0.0000           Initial soil water storage expressed as a fraction of field capacity water content
IEVENT      0.0000           Rainfall/runoff code: 0 = daily rainfall/CN
ICRK        0.0000           Crack flow code: 1 = model crack flow in soil
SURLAG      4.0000           Surface runoff lag time [days]
ADJ_PKR     0.0000           Peak rate adjustment factor for sediment routing in the subbasin (tributary channels)
PRF         1.0000           Peak rate adjustment factor for sediment routing in the main channel
SPCON       0.0001           Linear parameter for calculating the maximum amount of sediment that can be reentrained during channel sediment routing
SPEXP       1.0000           Exponent parameter for calculating sediment reentrained in channel sediment routing

time-step-updated information as the soil water content changes. The excess rainfall equation in the SCS-CN method was derived from the historical relationship between the curve number and the hydrologic mechanism over more than 20 years. Throughout the surface runoff calculation, infiltration should be updated over time according to the soil type. Other abstractions, such as evapotranspiration and soil and snow evaporation, are calculated by the Penman-Monteith method and meteorological statistics. Finally, the kinematic storage model is used to compute groundwater storage and seepage. The flow resulting from SWAT modeling is routed from the HRUs to the watershed outlet. Figure 3 shows the calibration of the model simulation as the initial step, and the specific parameters are described in Table 4. After the calibration of the SWAT model, the six different interpolated precipitation datasets with three different reference ranges for each (a total of twenty-four

Table 5: Details of simulation results with six different precipitation infilling methods in the Imha watershed (RMSE in m³/s).

          4-NN                     6-NN                     8-NN
Method    E_NS   R²     RMSE      E_NS   R²     RMSE      E_NS   R²     RMSE
Ep        0.80   0.83   15.32     0.91   0.92   10.60     0.78   0.82   16.16
Qu        0.73   0.78   17.83     0.91   0.93   10.48     0.88   0.92   11.80
Tw        0.91   0.91   10.56     0.93   0.94    9.25     0.95   0.95    8.10
Tc        0.95   0.95    7.72     0.93   0.94    9.03     0.95   0.95    7.64
Co        0.93   0.94    8.83     0.95   0.95    7.72     0.91   0.93   10.44
Reg       0.69   0.80   19.14     0.21   0.65   30.71     0.71   0.73   21.48

interpolated precipitation data points) were used to assess the performance of interpolated precipitation data for hydrologic model simulation. Streamflow simulations were run for three years, from 2008 to 2010. To evaluate the model performance with the different interpolated precipitation datasets, this study used $E_{NS}$ (Nash-Sutcliffe coefficient), R-square (coefficient of determination), and RMSE (root mean square error). Table 5 and Figure 4 show that the simulation results from kNN-regression exhibit low SWAT simulation performance for streamflow estimation, with an average $E_{NS}$ of 0.54, R-square of 0.74, and RMSE of 23.78 m³/s. All of the kernel functions, on the other hand, exhibit good performance for hydrologic simulations with interpolated precipitation data (Table 5 and Figure 4): the averages of $E_{NS}$, R-square, and RMSE are (1) 0.83, 0.86, and 14.03 m³/s for Epanechnikov; (2) 0.84, 0.88, and 13.03 m³/s for Quartic; (3) 0.93, 0.93, and 9.30 m³/s for Triweight; (4) 0.94, 0.95, and 8.13 m³/s for Tricube; and (5) 0.93, 0.94, and 9.00 m³/s for Cosine, respectively.
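The three performance measures can be sketched as follows. These are the conventional definitions; this section does not spell out the formulas, so treating R-square as the squared Pearson correlation between observed and simulated flow is an assumption, and the function names are illustrative.

```python
import math


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated series (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov * cov / (var_obs * var_sim)


def rmse(obs, sim):
    """Root mean square error, in the units of the series (m^3/s for streamflow)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


# Hypothetical series: a simulation that is biased by a constant offset of 1.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
scores = (nash_sutcliffe(observed, simulated), r_squared(observed, simulated), rmse(observed, simulated))
```

In this toy case the correlation is perfect while the Nash-Sutcliffe coefficient is only 0.2, which is one reason the two measures can rank the infilling methods differently.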

5. Conclusions

Five different kernel functions were applied to the Imha watershed to evaluate the performance of each weighted method for estimating missing precipitation data, and the use of interpolated data for hydrologic simulations was assessed. The following conclusions can be drawn from this research.

(1) To estimate missing precipitation data points, exploratory procedures should consider the spatiotemporal variations of precipitation. Because these variations are difficult to account for, statistical methods for estimating missing precipitation data are commonly used.

(2) Although ANNs are an advanced approach for estimating missing data, their mechanisms are unclear because the neuron system is ultimately a black-box model. Thus, regression methods are widely used for estimating missing data, even though they have the limitation that they cannot follow a normal distribution when the sample is small.

(3) When kernel functions are used as a weighting method, the estimated missing data satisfy a normal distribution, which is more statistically sound. Kernel methods can also overcome the weakness of kNN-regression, in terms of the mean value, when the data have outliers and/or a nonlinear trend around the missing data points.

(4) This study assessed five kernel functions (Epanechnikov, Quartic, Triweight, Tricube, and Cosine) as weights for predicting missing values. In comparison with the kNN-regression method, the kernel approaches provide higher quality interpolated precipitation data. In addition, the kernel function results conform better to statistical standards.

(5) Furthermore, higher quality interpolated precipitation data result in better performance for hydrologic simulations, as exemplified in this study. All of the statistical analyses of the streamflow simulations showed that simulations using interpolated precipitation data from the kernel functions provide better results than those using kNN-regression.

(6) The use of a kernel distribution is more effective than regression when the precipitation data have an upward or downward trend. However, if the precipitation data have a nonlinear trend, it is difficult to reconstruct the missing values effectively. For further research, a time series analysis or a random walk model using a stochastic process are possible methods by which to estimate missing data where there is a nonlinear trend.

Appendices

A. Kernel Functions

Kernel density estimation is an unsupervised learning procedure, which historically precedes kernel regression. It also leads naturally to a simple family of procedures for nonparametric classification.

A.1. Kernel Density Estimation. Suppose we have a random sample $x_1, x_2, \ldots, x_N$ drawn from a probability density $f_x(x)$, and we wish to estimate $f_x$ at a point $x_0$. For simplicity we assume for now that $x \in \mathbb{R}$ (real value). Arguing as before, a natural local estimate has the form

$$\hat{f}_x(x_0) = \frac{\#\, x_i \in N(x_0)}{N\lambda}, \tag{A.1}$$

where $\#\, x_i \in N(x_0)$ means the number of $x_i$ that fall in $N(x_0)$, and $N(x_0)$ is a small metric neighborhood around $x_0$ of width $\lambda$.

Figure 4: Scatter plots of observed (Obs) versus simulated (Simulation) flow for the SWAT simulation with (a) 4-NN, (b) 6-NN, and (c) 8-NN; both axes range from 0 to 600. EP, QU, TW, TC, CO, and KNN represent Epanechnikov, Quartic, Triweight, Tricube, Cosine, and kNN-regression, respectively.

This estimate is bumpy, and the smooth Parzen estimate is preferred:

$$\hat{f}_x(x_0) = \frac{1}{N\lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i), \tag{A.2}$$

because it counts observations close to $x_0$ with weights that decrease with distance from $x_0$. In this case a popular choice for $K_\lambda$ is the Gaussian kernel $K_\lambda(x_0, x_i) = \phi(|x_i - x_0|/\lambda)$. Letting $\phi_\lambda$ denote the Gaussian density with mean zero and standard deviation $\lambda$, (A.2) has the form

$$\hat{f}_x(x) = \frac{1}{N} \sum_{i=1}^{N} \phi_\lambda(x - x_i) = (\hat{F} \star \phi_\lambda)(x), \tag{A.3}$$

the convolution of the sample empirical distribution $\hat{F}$ with $\phi_\lambda$. The distribution $\hat{F}(x)$ puts mass $1/N$ at each of the observed $x_i$ and is jumpy; in $\hat{f}_x(x)$ we have smoothed $\hat{F}$ by adding independent Gaussian noise to each observation $x_i$.

The Parzen density estimate is the equivalent of the local average, and improvements have been proposed along the lines of local regression (on the log scale for densities). We will not pursue these here. In $\mathbb{R}^p$ the natural generalization of the Gaussian density estimate amounts to using the Gaussian product kernel in (A.3):

$$\hat{f}_x(x_0) = \frac{1}{N (2\lambda^2\pi)^{p/2}} \sum_{i=1}^{N} e^{-\frac{1}{2}\left(\|x_i - x_0\| / \lambda\right)^2}. \tag{A.4}$$
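A minimal one-dimensional sketch of the Gaussian Parzen estimate (the case p = 1 of the Gaussian product kernel formula) is given below. The function name and the sample values are hypothetical and serve only to illustrate the formula.

```python
import math


def gaussian_kde(x0, sample, lam):
    """Parzen density estimate at x0 with a Gaussian kernel of standard deviation lam.

    Implements (1 / (N * sqrt(2 * pi) * lam)) * sum_i exp(-0.5 * ((x_i - x0) / lam)**2).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    n = len(sample)
    total = sum(math.exp(-0.5 * ((xi - x0) / lam) ** 2) for xi in sample)
    return total / (n * lam * math.sqrt(2.0 * math.pi))


# Hypothetical sample; a smaller lam gives a bumpier estimate, a larger lam a smoother one.
sample = [0.0, 1.0, 2.0]
density_at_one = gaussian_kde(1.0, sample, lam=0.5)
```

Because each observation contributes one Gaussian bump of unit area scaled by 1/N, the estimate integrates to one regardless of the bandwidth.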

Table 6: Weighted values depending on day distance for each KNN.

          4-NN              6-NN                       8-NN
Kernel    1st     2nd       1st     2nd     3rd        1st     2nd     3rd     4th
Ep        0.667   0.417     0.703   0.563   0.328      0.720   0.630   0.480   0.270
Qu        0.741   0.289     0.824   0.527   0.179      0.864   0.662   0.384   0.122
Tw        0.768   0.188     0.901   0.461   0.092      0.968   0.648   0.287   0.051
Tc        0.772   0.301     0.824   0.579   0.167      0.844   0.709   0.416   0.100
Co        0.680   0.393     0.726   0.555   0.301      0.747   0.635   0.462   0.243
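The weights in Table 6 can be reproduced with the standard forms of the five kernels if the day distance d is scaled as u = d / (k/2 + 1), where k is the number of neighbors. That scaling is not stated in this section; it is inferred here because it matches the tabulated values to three decimals, so it should be read as an assumption. The names below are illustrative.

```python
import math

# Standard kernel forms on |u| <= 1.
KERNELS = {
    "Ep": lambda u: 0.75 * (1.0 - u ** 2),                        # Epanechnikov
    "Qu": lambda u: 15.0 / 16.0 * (1.0 - u ** 2) ** 2,            # Quartic
    "Tw": lambda u: 35.0 / 32.0 * (1.0 - u ** 2) ** 3,            # Triweight
    "Tc": lambda u: 70.0 / 81.0 * (1.0 - abs(u) ** 3) ** 3,       # Tricube
    "Co": lambda u: math.pi / 4.0 * math.cos(math.pi / 2.0 * u),  # Cosine
}


def kernel_weights(name, k):
    """Weights for day distances 1..k/2, assuming bandwidth k/2 + 1 (inferred from Table 6)."""
    half = k // 2
    bandwidth = half + 1
    return [KERNELS[name](d / bandwidth) for d in range(1, half + 1)]


# Example: 4-NN Epanechnikov weights for the 1st and 2nd day distance.
ep_4nn = [round(w, 3) for w in kernel_weights("Ep", 4)]
```

Under this scaling the weight would reach zero only one day beyond the farthest neighbor, which is why every listed neighbor keeps a positive weight.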

A.2. Kernel Density Classification. One can use nonparametric density estimates for classification in a straightforward fashion using Bayes' theorem. Suppose for a $J$-class problem we fit nonparametric density estimates $\hat{f}_j(X)$, $j = 1, \ldots, J$, separately in each of the classes, and we also have estimates of the class priors $\hat{\pi}_j$ (usually the sample proportions). Then

$$\Pr(G = j \mid X = x_0) = \frac{\hat{\pi}_j \hat{f}_j(x_0)}{\sum_{k=1}^{J} \hat{\pi}_k \hat{f}_k(x_0)}. \tag{A.5}$$

In this region the data are sparse for both classes, and since the Gaussian kernel density estimates use metric kernels, the density estimates are low and of poor quality (high variance) in these regions. The local logistic regression method uses the tricube kernel with $k$-NN bandwidth; this effectively widens the kernel in this region and makes use of the local linear assumption to smooth out the estimate (on the logit scale).

If classification is the ultimate goal, then learning the separate class densities well may be unnecessary and can in fact be misleading. In learning the separate densities from data, one might decide to settle for a rougher, high-variance fit to capture these features, which are irrelevant for the purposes of estimating the posterior probabilities. In fact, if classification is the ultimate goal, then we need only estimate the posterior well near the decision boundary (for two classes, this is the set $\{x \mid \Pr(G = 1 \mid X = x) = 1/2\}$).
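The Bayes' theorem construction for kernel density classification can be sketched in one dimension as follows, using a Gaussian kernel density estimate per class and sample proportions as priors. The function names, class labels, and data are hypothetical.

```python
import math


def gaussian_kde(x0, sample, lam):
    """One-dimensional Gaussian Parzen density estimate at x0."""
    total = sum(math.exp(-0.5 * ((xi - x0) / lam) ** 2) for xi in sample)
    return total / (len(sample) * lam * math.sqrt(2.0 * math.pi))


def class_posteriors(x0, samples_by_class, lam):
    """Posterior Pr(G = j | X = x0) from per-class densities and sample-proportion priors."""
    n_total = sum(len(s) for s in samples_by_class.values())
    scores = {
        label: (len(s) / n_total) * gaussian_kde(x0, s, lam)  # prior * density
        for label, s in samples_by_class.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("all class densities underflowed to zero at x0")
    return {label: score / total for label, score in scores.items()}


# Hypothetical two-class data, symmetric about zero.
data = {"dry": [-1.0, -0.5], "wet": [0.5, 1.0]}
posterior_at_zero = class_posteriors(0.0, data, lam=0.5)
```

At the midpoint between the two symmetric samples the posteriors are equal, which marks the decision boundary described above; far from all data the densities become tiny and the ratio becomes unreliable, in line with the remark about sparse regions.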

B. Procedures of Missing Precipitation

This appendix gives an example calculation of the kernel-weighted mean and illustrates the weight used in each situation. Because the kernel functions are all symmetric, the same weight is used for days at the same distance on either side of the target day. Table 6 lists the weights for the 1st, 2nd, 3rd, and 4th day distances. For example, to estimate the missing precipitation for 2010-02-12 (the actual value is 6) with the 4-NN Epanechnikov kernel, follow the three steps below (Table 7).

Step 1. Select the date of the target interpolation data.

Step 2. Determine the precipitation of the $K$ nearest days and the kernel weight of each.

Step 3. Calculate the weighted average to estimate the missing value.

Table 7: Example calculation to interpolate missing precipitation.

Step 1        Step 2              Step 3
Date          Prec     Weight     Prec·Weight    Estimation
2010-02-10    15       0.417       6.255
2010-02-11    17.2     0.667      11.472
2010-02-12    6        -           -             5.949
2010-02-13    9.1      0.667       6.070
2010-02-14    0        0.417       0

Table 8: Calculating missing precipitation for 2010-02-12 (actual value is 6) with six different methods.

Method    4NN       6NN      8NN
Ep         5.949    5.172    4.862
Qu         5.956    5.302    4.936
Tw         5.755    5.294    4.952
Tc         6.205    5.407    4.963
Co         5.945    5.197    4.876
Reg       10.325    8.967    8.813

(Ep: Epanechnikov; Qu: Quartic; Tw: Triweight; Tc: Tricube; Co: Cosine; Reg: regression.)

The results of the remaining kernel methods for estimating the missing precipitation are given in Table 8.
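The three steps can be sketched for the 4-NN Epanechnikov case using the values of Table 7. Note that the tabulated estimate is reproduced when the sum of the weighted precipitation values is divided by the number of neighbors (4) rather than by the sum of the weights; this normalization is inferred from the numbers in Table 7 and is an assumption about the procedure. The function name is illustrative.

```python
def kernel_estimate(neighbors, weights):
    """Kernel-weighted estimate: sum(prec * weight) / number of neighbors (inferred normalization)."""
    if len(neighbors) != len(weights):
        raise ValueError("neighbors and weights must have the same length")
    return sum(p * w for p, w in zip(neighbors, weights)) / len(neighbors)


# Step 1: target date 2010-02-12 (actual value 6).
# Step 2: the four nearest days (02-10, 02-11, 02-13, 02-14) and their Epanechnikov weights.
prec = [15.0, 17.2, 9.1, 0.0]
ep_weights = [0.417, 0.667, 0.667, 0.417]

# Step 3: weighted average, about 5.949 as in Table 7.
estimate = kernel_estimate(prec, ep_weights)

# For comparison, the unweighted mean of the same four days is 10.325,
# which coincides with the Reg entry for 4NN in Table 8.
unweighted_mean = sum(prec) / len(prec)
```

In this example the kernel estimate lies much closer to the actual value of 6 than the unweighted mean does, because the high precipitation on the neighboring days is down-weighted.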

C. Sample Calculations with Real Value

This section shows how to calculate missing precipitation with the kernel-weighted mean function using actual numbers. The sample was drawn from the daily data for 2008 to 2010, with each value selected at random with a probability of 0.02. After the data were selected, their locations in the series were set. Zhang et al. [39] reported that kernel-based nonparametric multiple imputation performs better than general linear regression when the sample data are small or limited.

Table 9 shows the kernel weighting procedure for each function. We used the data from Feb 10, 2010 to Feb 14, 2010 to estimate the missing data for Feb 12, 2010. The Epanechnikov kernel gives the most distant days the highest weight (0.417), whereas the Triweight kernel gives them the lowest weight (0.188). For the nearest days, the Tricube kernel gives the highest weight and the Epanechnikov kernel the lowest. Generally,

Table 9: Sample calculation with actual numbers.

(a) Epanechnikov

Date           2/10     2/11     2/12    2/13     2/14
Prec           15.0     17.2     6.0     9.1      0.0
Ep weight      0.417    0.667    -       0.667    0.417
Prec·weight    6.26     11.47    -       6.07     0.00
Estimation                       5.95

(b) Quartic

Date           2/10     2/11     2/12    2/13     2/14
Prec           15.0     17.2     6.0     9.1      0.0
Qu weight      0.289    0.741    -       0.741    0.289
Prec·weight    4.34     12.75    -       6.74     0.00
Estimation                       5.96

(c) Triweight

Date           2/10     2/11     2/12    2/13     2/14
Prec           15.0     17.2     6.0     9.1      0.0
Tw weight      0.188    0.768    -       0.768    0.188
Prec·weight    2.82     13.21    -       6.99     0.00
Estimation                       5.75

(d) Tricube

Date           2/10     2/11     2/12    2/13     2/14
Prec           15.0     17.2     6.0     9.1      0.0
Tc weight      0.301    0.772    -       0.772    0.301
Prec·weight    4.52     13.28    -       7.03     0.00
Estimation                       6.20

(e) Cosine

Date           2/10     2/11     2/12    2/13     2/14
Prec           15.0     17.2     6.0     9.1      0.0
Co weight      0.393    0.680    -       0.680    0.393
Prec·weight    5.90     11.70    -       6.19     0.00
Estimation                       5.94

the Tricube kernel, which has high weights, overestimates the missing precipitation.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] K. Kang and V. Merwade, “Development and application of a storage-release based distributed hydrologic model using GIS,” Journal of Hydrology, vol. 403, no. 1-2, pp. 1–13, 2011.

[2] A. J. Abebe, D. P. Solomatine, and R. G. W. Venneker, “Application of adaptive fuzzy rule-based models for reconstruction of missing precipitation events,” Hydrological Sciences Journal, vol. 45, no. 3, pp. 425–436, 2000.

[3] P. Ramos-Calzado, J. Gomez-Camacho, F. Perez-Bernal, and M. F. Pita-Lopez, “A novel approach to precipitation series completion in climatological datasets: application to Andalusia,” International Journal of Climatology, vol. 28, no. 11, pp. 1525–1534, 2008.

[4] T. Schneider, “Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values,” Journal of Climate, vol. 14, no. 5, pp. 853–871, 2001.

[5] S. K. Regonda, D.-J. Seo, B. Lawrence, J. D. Brown, and J. Demargne, “Short-term ensemble stream forecasting using operationally-produced single-valued streamflow forecasts: a Hydrologic Model Output Statistics (HMOS) approach,” Journal of Hydrology, vol. 497, pp. 80–96, 2013.

[6] P. Coulibaly and N. D. Evora, “Comparison of neural network methods for infilling missing daily weather records,” Journal of Hydrology, vol. 341, no. 1-2, pp. 27–41, 2007.

[7] H. A. El Sharif and R. S. V. Teegavarapu, “Evaluation of spatial interpolation methods for missing precipitation data: preservation of spatial statistics,” in Proceedings of the World Environmental and Water Resources Congress, pp. 3822–3832, May 2012.

[8] A. Bardossy and G. Pegram, “Infilling missing precipitation records: a comparison of a new copula-based method with other techniques,” Journal of Hydrology, vol. 519, pp. 1162–1170, 2014.

[9] K. Schamm, M. Ziese, A. Becker et al., “Global gridded precipitation over land: a description of the new GPCC first guess daily product,” Earth System Science Data, vol. 6, no. 1, pp. 49–60, 2014.

[10] R. S. V. Teegavarapu, “Statistical corrections of spatially interpolated missing precipitation data estimates,” Hydrological Processes, vol. 28, no. 11, pp. 3789–3808, 2014.

[11] Y. Da and G. Xiurun, “An improved PSO-based ANN with simulated annealing technique,” Neurocomputing, vol. 63, pp. 527–533, 2005.

[12] A. di Piazza, F. L. Conti, L. V. Noto, F. Viola, and G. La Loggia, “Comparative analysis of different techniques for spatial interpolation of rainfall data to create a serially complete monthly time series of precipitation for Sicily, Italy,” International Journal of Applied Earth Observation and Geoinformation, vol. 13, no. 3, pp. 396–408, 2011.

[13] E. Pisoni, F. Pastor, and M. Volta, “Artificial Neural Networks to reconstruct incomplete satellite data: application to the Mediterranean Sea Surface Temperature,” Nonlinear Processes in Geophysics, vol. 15, no. 1, pp. 61–70, 2008.

[14] V. Sharma, S. Rai, and A. Dev, “A comprehensive study of artificial neural networks,” International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 10, pp. 278–284, 2012.

[15] ASCE Task Committee, “Artificial neural networks in hydrology. II: hydrologic applications,” Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 124–137, 2000.

[16] ASCE Task Committee, “Artificial neural networks in hydrology. I: preliminary concepts,” Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 115–123, 2000.

[17] D. E. Rumelhart, B. Widrow, and M. A. Lehr, “The basic ideas in neural networks,” Communications of the ACM, vol. 37, no. 3, pp. 87–92, 1994.

[18] J. Amorocho and W. E. Hart, “A critique of current methods in hydrologic systems investigation,” Transactions of the American Geophysical Union, vol. 45, no. 2, pp. 307–321, 1964.

[19] A. W. Minns and M. J. Hall, “Artificial neural networks as rainfall-runoff models,” Hydrological Sciences Journal, vol. 41, no. 3, pp. 399–417, 1996.

[20] K. Kang and V. Merwade, “The effect of spatially uniform and non-uniform precipitation bias correction methods on improving NEXRAD rainfall accuracy for distributed hydrologic modeling,” Hydrology Research, vol. 45, no. 1, pp. 23–42, 2014.

[21] C. Daly, W. P. Gibson, G. H. Taylor, G. L. Johnson, and P. Pasteris, “A knowledge-based approach to the statistical mapping of climate,” Climate Research, vol. 22, no. 2, pp. 99–113, 2002.

[22] J. D. Creutin, H. Andrieu, and D. Faure, “Use of a weather radar for the hydrology of a mountainous area. Part II: radar measurement validation,” Journal of Hydrology, vol. 193, no. 1–4, pp. 26–44, 1997.

[23] Y. L. Xia, P. Fabian, A. Stohl, and M. Winterhalter, “Forest climatology: estimation of missing values for Bavaria, Germany,” Agricultural and Forest Meteorology, vol. 96, no. 1–3, pp. 131–144, 1999.

[24] C. J. Willmott, S. M. Robeson, and J. J. Feddema, “Estimating continental and terrestrial precipitation averages from rain-gauge networks,” International Journal of Climatology, vol. 14, no. 4, pp. 403–414, 1994.

[25] R. S. V. Teegavarapu and V. Chandramouli, “Improved weighting methods, deterministic and stochastic data-driven models for estimation of missing precipitation records,” Journal of Hydrology, vol. 312, no. 1–4, pp. 191–206, 2005.

[26] J. A. Smith, “Precipitation,” in Handbook of Hydrology, D. R. Maidment, Ed., vol. 3, chapter 3, McGraw Hill, New York, NY, USA, 1993.

[27] J. R. Simanton and H. B. Osborn, “Reciprocal-distance estimate of point rainfall,” Journal of Hydraulic Engineering Division, vol. 106, no. 7, pp. 1242–1246, 1980.

[28] J. D.-J. Salas, “Analysis and modeling of hydrological time series,” in Handbook of Hydrology, D. R. Maidment, Ed., vol. 19, chapter 19, pp. 19.1–19.72, McGraw-Hill, New York, NY, USA, 1993.

[29] S. J. Jeffrey, J. O. Carter, K. B. Moodie, and A. R. Beswick, “Using spatial interpolation to construct a comprehensive archive of Australian climate data,” Environmental Modelling and Software, vol. 16, no. 4, pp. 309–330, 2001.

[30] M. Franklin, V. R. Kotamarthi, M. L. Stein, and D. R. Cook, “Generating data ensembles over a model grid from sparse climate point measurements,” Journal of Physics: Conference Series, vol. 125, Article ID 012019, 2008.

[31] K. C. Young, “A three-way model for interpolating for monthly precipitation values,” Monthly Weather Review, vol. 120, no. 11, pp. 2561–2569, 1992.

[32] F. Filippini, G. Galliani, and L. Pomi, “The estimation of missing meteorological data in a network of automatic stations,” Transactions on Ecology and the Environment, vol. 4, pp. 283–291, 1994.

[33] W. P. Lowry, Compendium of Lecture Notes in Climatology for Class IV Meteorological Personnel, Secretariat of the World Meteorological Organization, Geneva, Switzerland, 1972.

[34] M. C. Acock and Y. A. Pachepsky, “Estimating missing weather data for agricultural simulations using group method of data handling,” Journal of Applied Meteorology, vol. 39, no. 7, pp. 1176–1184, 2000.

[35] S. L. Neitsch, J. G. Arnold, J. R. Kiniry, J. R. Williams, and K. W. King, Soil and Water Assessment Tool: Theoretical Documentation (Version 2005), Texas Water Resource Institute, College Station, Tex, USA, 2005.

[36] S. S. Shapiro and M. B. Wilk, “An analysis of variance test for normality (complete samples),” Biometrika, vol. 52, no. 3-4, pp. 591–611, 1965.

[37] M. Friedman, “The use of ranks to avoid the assumption of normality implicit in the analysis of variance,” Journal of the American Statistical Association, vol. 32, no. 200, pp. 675–701, 1937.

[38] F. Wilcoxon, “Individual Comparisons by Ranking Methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[39] S. Zhang, Z. Jin, X. Zhu, and J. Zhang, “Missing data analysis: a kernel-based multi-imputation approach,” in Transactions on Computational Science III, vol. 5300 of Lecture Notes in Computer Science, pp. 122–142, Springer, Berlin, Germany, 2009.


Page 7: Research Article Interpolation of Missing Precipitation ...downloads.hindawi.com/journals/amete/2015/935868.pdf · withthe NN regression approach, in terms of both statistical data

Advances in Meteorology 7

0

100

200

300

400

500

60020

08-0

1-01

2008

-01-

2220

08-0

2-12

2008

-03-

0420

08-0

3-25

2008

-04-

1520

08-0

5-06

2008

-05-

2720

08-0

6-17

2008

-07-

0820

08-0

7-29

2008

-08-

1920

08-0

9-09

2008

-09-

3020

08-1

0-21

2008

-11-

1120

08-1

2-02

2008

-12-

2320

09-0

1-13

2009

-02-

0320

09-0

2-24

2009

-03-

1720

09-0

4-07

2009

-04-

2820

09-0

5-19

2009

-06-

0920

09-0

6-30

2009

-07-

2120

09-0

8-11

2009

-09-

0120

09-0

9-22

2009

-10-

1320

09-1

1-03

2009

-11-

2420

09-1

2-15

2010

-01-

0520

10-0

1-26

2010

-02-

1620

10-0

3-09

2010

-03-

3020

10-0

4-20

2010

-05-

1120

10-0

6-01

2010

-06-

2220

10-0

7-13

2010

-08-

0320

10-0

8-24

2010

-09-

1420

10-1

0-05

2010

-10-

2620

10-1

1-16

2010

-12-

0720

10-1

2-28

ObsSim

Figure 3 Calibrated model result using original precipitation input 119909-axis represents time in days and 119910-axis represents flow in cubic metersper second

Table 4: Details of SWAT parameters which are related to the runoff mechanism for the Imha watershed.

| Parameter | Description | Selected value |
|---|---|---|
| ESCO | Soil evaporation compensation factor | 0.9500 |
| EPCO | Plant water uptake compensation factor | 1.0000 |
| EVLAI | Leaf area index at which no evaporation occurs from water surface [m2/m2] | 3.0000 |
| FFCB | Initial soil water storage expressed as a fraction of field capacity water content | 0.0000 |
| IEVENT | Rainfall/runoff code: 0 = daily rainfall/CN | 0.0000 |
| ICRK | Crack flow code: 1 = model crack flow in soil | 0.0000 |
| SURLAG | Surface runoff lag time [days] | 4.0000 |
| ADJ_PKR | Peak rate adjustment factor for sediment routing in the subbasin (tributary channels) | 0.0000 |
| PRF | Peak rate adjustment factor for sediment routing in the main channel | 1.0000 |
| SPCON | Linear parameter for calculating the maximum amount of sediment that can be reentrained during channel sediment routing | 0.0001 |
| SPEXP | Exponent parameter for calculating sediment reentrained in channel sediment routing | 1.0000 |

time step updated information as soil water content changes. The excess rainfall equation in the SCS-CN method was generated based on the historical relationship between the curve number and the hydrologic mechanism for over 20 years. Throughout the surface runoff calculation, infiltration should be updated over time according to the soil type. Other abstractions, such as evapotranspiration and soil and snow evaporation, are calculated by the Penman-Monteith method and meteorological statistics. Finally, the kinematic storage model is used to compute groundwater storage and seepage. Flow resulting from SWAT modeling is routed from the HRUs to the watershed outlet. Figure 3 shows the calibration of the model simulation as the initial step, and the specific parameters are described in Table 4.

Table 5: Details of simulation results with six different precipitation infilling methods in the Imha watershed (RMSE in m³/s).

| Method | 4-NN $E_{NS}$ | 4-NN $R^2$ | 4-NN RMSE | 6-NN $E_{NS}$ | 6-NN $R^2$ | 6-NN RMSE | 8-NN $E_{NS}$ | 8-NN $R^2$ | 8-NN RMSE |
|---|---|---|---|---|---|---|---|---|---|
| Ep | 0.80 | 0.83 | 15.32 | 0.91 | 0.92 | 10.60 | 0.78 | 0.82 | 16.16 |
| Qu | 0.73 | 0.78 | 17.83 | 0.91 | 0.93 | 10.48 | 0.88 | 0.92 | 11.80 |
| Tw | 0.91 | 0.91 | 10.56 | 0.93 | 0.94 | 9.25 | 0.95 | 0.95 | 8.10 |
| Tc | 0.95 | 0.95 | 7.72 | 0.93 | 0.94 | 9.03 | 0.95 | 0.95 | 7.64 |
| Co | 0.93 | 0.94 | 8.83 | 0.95 | 0.95 | 7.72 | 0.91 | 0.93 | 10.44 |
| Reg | 0.69 | 0.80 | 19.14 | 0.21 | 0.65 | 30.71 | 0.71 | 0.73 | 21.48 |

After the calibration of the SWAT model, the six different interpolated precipitation datasets, with three different reference ranges for each (a total of twenty-four interpolated precipitation data points), were used to assess the performance of interpolated precipitation data for hydrologic model simulation. Streamflow simulations were done for three years, from 2008 to 2010. To evaluate the model performance with the different interpolated precipitation datasets, this study used $E_{NS}$ (Nash-Sutcliffe coefficient), $R$-square (coefficient of determination), and RMSE (root mean square error). Table 5 and Figure 4 show that the simulation results from $k$NN-regression exhibit low SWAT simulation performance for streamflow estimation, with averages of 0.54 for $E_{NS}$, 0.74 for $R$-square, and 23.78 m³/s for RMSE. All of the kernel functions, on the other hand, exhibit good performance for hydrologic simulations with interpolated precipitation data (Table 5 and Figure 4): the averages of $E_{NS}$, $R$-square, and RMSE are (1) 0.83, 0.86, and 14.03 m³/s for Epanechnikov; (2) 0.84, 0.88, and 13.03 m³/s for Quartic; (3) 0.93, 0.93, and 9.30 m³/s for Triweight; (4) 0.94, 0.95, and 8.13 m³/s for Tricube; and (5) 0.93, 0.94, and 9.00 m³/s for Cosine.
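The three evaluation statistics named above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, $R$-square is assumed here to be the squared Pearson correlation between observed and simulated flow (the text does not state which definition was used), and the flow values are invented for the example.

```python
import math


def nash_sutcliffe(obs, sim):
    """E_NS = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def r_square(obs, sim):
    """Squared Pearson correlation (assumed definition of R-square)."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def rmse(obs, sim):
    """Root mean square error, in the units of the inputs."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


# Hypothetical flows in m^3/s, for illustration only (not data from the study).
obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 37.0]
print(nash_sutcliffe(obs, sim), r_square(obs, sim), rmse(obs, sim))
```

An $E_{NS}$ of 1 indicates a perfect match, while values at or below 0 mean the simulation is no better than the mean of the observations, which is why the 6-NN regression value of 0.21 in Table 5 stands out.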

5. Conclusions

Five different kernel functions were applied to the Imha watershed to evaluate the performance of each weighted method for estimating missing precipitation data, and the use of interpolated data for hydrologic simulations was assessed. The following conclusions can be drawn from this research.

(1) To estimate missing precipitation data points, exploratory procedures should consider the spatiotemporal variations of precipitation. Because it is difficult to account for these variations, statistical methods for estimating missing precipitation data are commonly used.

(2) Although ANNs are an advanced approach for estimating missing data, their mechanisms are unclear because the neuron system is ultimately a black-box model. Thus, regression methods are widely used for estimating missing data, even though they are limited in that regression estimates cannot follow a normal distribution when the sample is small.

(3) When kernel functions are used as a weighting method, the estimated missing data would satisfy a normal distribution, which is more statistically sound. Also, kernel methods can overcome the weakness of $k$NN-regression, in terms of the mean value, if the data have outliers and/or a nonlinear trend around the missing data points.

(4) This study assessed the five kernel functions, Epanechnikov, Quartic, Triweight, Tricube, and Cosine, as weights for predicting missing values. This study demonstrates that the kernel approaches provide higher quality interpolated precipitation data than the $k$NN-regression approach. In addition, the kernel function results better conform to statistical standards.

(5) Furthermore, higher quality of interpolated precipitation data results in better performance for hydrologic simulations, as exemplified in this study. All of the statistical analyses of the streamflow simulations showed that the simulations using the interpolated precipitation data from the kernel functions provide better results than those using $k$NN-regression.

(6) Use of a kernel distribution is a more effective method than regression when the precipitation data have an upward or downward trend. However, if the precipitation data have a nonlinear trend, it is difficult to effectively reconstruct the missing values. For further research, a time series analysis or a random walk model using a stochastic process is a possible method by which to estimate missing data where there is a nonlinear trend.

Appendices

A. Kernel Functions

Kernel density estimation is an unsupervised learning procedure, which historically precedes kernel regression. It also leads naturally to a simple family of procedures for nonparametric classification.

A.1. Kernel Density Estimation. Suppose we have a random sample $x_1, x_2, \ldots, x_N$ drawn from a probability density $f_X(x)$, and we wish to estimate $f_X$ at a point $x_0$. For simplicity we assume for now that $x \in \mathbb{R}$ (real value). Arguing as before, a natural local estimate has the form

$$\hat{f}_X(x_0) = \frac{\#\{x_i \in N(x_0)\}}{N\lambda}, \tag{A.1}$$

where $\#\{x_i \in N(x_0)\}$ is the number of $x_i$ that fall in $N(x_0)$, and $N(x_0)$ is a small metric neighborhood around $x_0$ of width $\lambda$.

Figure 4: Scatter plots for SWAT simulation, with panels (a) 4-NN, (b) 6-NN, and (c) 8-NN; the axes are Obs and Simulation, each from 0 to 600. (EP, QU, TW, TC, CO, and KNN represent Epanechnikov, Quartic, Triweight, Tricube, Cosine, and $K$NN-regression, resp.)

This estimate is bumpy, and the smooth Parzen estimate is preferred,

$$\hat{f}_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i), \tag{A.2}$$

because it counts observations close to $x_0$ with weights that decrease with distance from $x_0$. In this case a popular choice for $K_\lambda$ is the Gaussian kernel $K_\lambda(x_0, x_i) = \phi(|x_i - x_0|/\lambda)$. Letting $\phi_\lambda$ denote the Gaussian density with mean zero and standard deviation $\lambda$, (A.2) has the form

$$\hat{f}_X(x) = \frac{1}{N} \sum_{i=1}^{N} \phi_\lambda(x - x_i) = (\hat{F} \star \phi_\lambda)(x), \tag{A.3}$$

the convolution of the sample empirical distribution $\hat{F}$ with $\phi_\lambda$. The distribution $\hat{F}(x)$ puts mass $1/N$ at each of the observed $x_i$ and is jumpy; in $\hat{f}_X(x)$ we have smoothed $\hat{F}$ by adding independent Gaussian noise to each observation $x_i$.

The Parzen density estimate is the equivalent of the local average, and improvements have been proposed along the lines of local regression (on the log scale for densities). We will not pursue these here. In $\mathbb{R}^p$ the natural generalization of the Gaussian density estimate amounts to using the Gaussian product kernel in (A.3),

$$\hat{f}_X(x_0) = \frac{1}{N (2\lambda^2\pi)^{p/2}} \sum_{i=1}^{N} e^{-\frac{1}{2}\left(\|x_i - x_0\|/\lambda\right)^2}. \tag{A.4}$$
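As a rough illustration of the Parzen estimate with a Gaussian kernel in one dimension, the sketch below evaluates the estimate as an average of Gaussian densities centred on the observations. The function name `gaussian_kde`, the sample, and the bandwidth are all hypothetical choices made for this example; they are not taken from the study.

```python
import math


def gaussian_kde(x0, sample, lam):
    """Parzen estimate at x0: the mean of Gaussian densities centred on each x_i."""
    # Normalising constant (2 * lam^2 * pi)^(-1/2), i.e. the p = 1 case.
    norm = 1.0 / (lam * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x0 - xi) / lam) ** 2) for xi in sample)
    return norm * total / len(sample)


# Hypothetical sample and bandwidth, for illustration only.
sample = [0.0, 1.0, 2.0]
lam = 1.0

# Riemann sum over a wide grid: a density estimate should integrate to about 1.
step = 0.01
grid = [-10.0 + step * i for i in range(2001)]
area = sum(gaussian_kde(x, sample, lam) for x in grid) * step
print(area)
```

A larger bandwidth gives a smoother but flatter estimate, and a smaller one follows the individual observations more closely; the choice is left open here.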


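Table 6: Weighted values depending on day distance with each $K$NN.

| Kernel | 4-NN 1st | 4-NN 2nd | 6-NN 1st | 6-NN 2nd | 6-NN 3rd | 8-NN 1st | 8-NN 2nd | 8-NN 3rd | 8-NN 4th |
|---|---|---|---|---|---|---|---|---|---|
| Ep | 0.667 | 0.417 | 0.703 | 0.563 | 0.328 | 0.720 | 0.630 | 0.480 | 0.270 |
| Qu | 0.741 | 0.289 | 0.824 | 0.527 | 0.179 | 0.864 | 0.662 | 0.384 | 0.122 |
| Tw | 0.768 | 0.188 | 0.901 | 0.461 | 0.092 | 0.968 | 0.648 | 0.287 | 0.051 |
| Tc | 0.772 | 0.301 | 0.824 | 0.579 | 0.167 | 0.844 | 0.709 | 0.416 | 0.100 |
| Co | 0.680 | 0.393 | 0.726 | 0.555 | 0.301 | 0.747 | 0.635 | 0.462 | 0.243 |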
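The weights in Table 6 appear to follow from the standard forms of the five kernels evaluated at $u = d/h$, where $d$ is the day distance. The sketch below is an assumption-laden reconstruction: the bandwidth $h = k/2 + 1$ is not stated in this appendix and is inferred only because it matches the tabulated values to within rounding, and the names `KERNELS` and `kernel_weights` are hypothetical.

```python
import math

# Standard kernel forms on |u| <= 1.
KERNELS = {
    "Ep": lambda u: 0.75 * (1 - u ** 2),                      # Epanechnikov
    "Qu": lambda u: 15 / 16 * (1 - u ** 2) ** 2,              # Quartic
    "Tw": lambda u: 35 / 32 * (1 - u ** 2) ** 3,              # Triweight
    "Tc": lambda u: 70 / 81 * (1 - abs(u) ** 3) ** 3,         # Tricube
    "Co": lambda u: math.pi / 4 * math.cos(math.pi * u / 2),  # Cosine
}


def kernel_weights(name, k):
    """Weights for day distances 1..k/2 under the assumed bandwidth h = k/2 + 1."""
    half = k // 2
    h = half + 1  # assumption: inferred from Table 6, not stated in the text
    return [KERNELS[name](d / h) for d in range(1, half + 1)]


for name in KERNELS:
    row = [w for k in (4, 6, 8) for w in kernel_weights(name, k)]
    print(name, " ".join(f"{w:.3f}" for w in row))
```

With this choice of $h$ the most distant neighbour always receives a strictly positive weight, which would not be the case if $h$ were set equal to $k/2$.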

A.2. Kernel Density Classification. One can use nonparametric density estimates for classification in a straightforward fashion using Bayes' theorem. Suppose for a $J$ class problem we fit nonparametric density estimates $\hat{f}_j(X)$, $j = 1, \ldots, J$, separately in each of the classes, and we also have estimates of the class priors $\hat{\pi}_j$ (usually the sample proportions). Then

$$\widehat{\Pr}(G = j \mid X = x_0) = \frac{\hat{\pi}_j \hat{f}_j(x_0)}{\sum_{k=1}^{J} \hat{\pi}_k \hat{f}_k(x_0)}. \tag{A.5}$$
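The posterior in this formula can be sketched by fitting one Gaussian kernel density estimate per class and combining the estimates with the sample proportions as priors. Everything below is illustrative: the class labels, the data, the bandwidth, and the helper names `gaussian_kde` and `posterior` are hypothetical and do not come from the study.

```python
import math


def gaussian_kde(x0, sample, lam):
    """One-dimensional Gaussian kernel density estimate at x0."""
    norm = 1.0 / (lam * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x0 - xi) / lam) ** 2) for xi in sample) / len(sample)


def posterior(x0, classes, lam):
    """Pr(G = j | X = x0), with priors taken as the sample proportions."""
    n_total = sum(len(sample) for sample in classes.values())
    scores = {
        label: (len(sample) / n_total) * gaussian_kde(x0, sample, lam)
        for label, sample in classes.items()
    }
    normaliser = sum(scores.values())
    return {label: score / normaliser for label, score in scores.items()}


# Hypothetical two-class data, symmetric about x = 2.
classes = {"a": [0.0, 1.0], "b": [3.0, 4.0]}
post_mid = posterior(2.0, classes, lam=1.0)   # on the decision boundary
post_left = posterior(0.5, classes, lam=1.0)  # well inside class "a"
print(post_mid, post_left)
```

Because the two hypothetical classes mirror each other about $x = 2$, the posterior there is split evenly, which corresponds to the two-class decision boundary discussed in this appendix.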

In this region the data are sparse for both classes, and since the Gaussian kernel density estimates use metric kernels, the density estimates are low and of poor quality (high variance) in these regions. The local logistic regression method uses the tricube kernel with $k$-NN bandwidth; this effectively widens the kernel in this region and makes use of the local linear assumption to smooth out the estimate (on the logit scale).

If classification is the ultimate goal, then learning the separate class densities well may be unnecessary and can in fact be misleading. In learning the separate densities from data, one might decide to settle for a rougher, high-variance fit to capture these features, which are irrelevant for the purposes of estimating the posterior probabilities. In fact, if classification is the ultimate goal, then we need only to estimate the posterior well near the decision boundary (for two classes, this is the set $\{x \mid \Pr(G = 1 \mid X = x) = 1/2\}$).

B. Procedures of Missing Precipitation

This appendix shows an example calculation of the kernel-weighted mean, illustrating the weight used in each situation. If the kernel functions are all symmetric, the same weight is used for days at the same distance on either side of the target day. Table 6 shows the weighted values for the 1st, 2nd, 3rd, and 4th day distances. For example, to estimate the missing precipitation for 2010-02-12 (the actual value is 6), the following three-step procedure is applied with the 4-NN Epanechnikov kernel (Table 7).

Step 1. Select the date for target interpolation data.

Step 2. Decide the $K$th nearest days' precipitation and each kernel weight.

Step 3. Calculate the weighted average to estimate the missing value.

Table 7: Example calculation to interpolate for missing precipitation.

| Date (Step 1) | Prec (Step 2) | Weight (Step 2) | Prec·Weight (Step 3) | Estimation (Step 3) |
|---|---|---|---|---|
| 2010-02-10 | 15 | 0.417 | 6.255 | |
| 2010-02-11 | 17.2 | 0.667 | 11.472 | |
| 2010-02-12 | 6 | - | - | 5.949 |
| 2010-02-13 | 9.1 | 0.667 | 6.070 | |
| 2010-02-14 | 0 | 0.417 | 0 | |

Table 8: Calculating missing precipitation for 2010-02-12 (actual value is 6) with six different methods.

| Method | 4NN | 6NN | 8NN |
|---|---|---|---|
| Ep | 5.949 | 5.172 | 4.862 |
| Qu | 5.956 | 5.302 | 4.936 |
| Tw | 5.755 | 5.294 | 4.952 |
| Tc | 6.205 | 5.407 | 4.963 |
| Co | 5.945 | 5.197 | 4.876 |
| Reg | 10.325 | 8.967 | 8.813 |

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)

The rest of the kernel methods for estimating missing precipitation are described in Table 8.
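The three steps of this appendix can be sketched for the 4-NN Epanechnikov case of Table 7. One point is an inference rather than something the text states: the tabulated estimate of 5.949 is reproduced when the sum of the weighted values is divided by the number of neighbours $k$, not by the sum of the weights, so the sketch follows that convention. The function name `kernel_estimate` is hypothetical.

```python
def kernel_estimate(neighbours, weights):
    """Sum of weight * precipitation over the k neighbours, divided by k.

    Dividing by k (rather than by the sum of the weights) is an assumption
    inferred from the numbers in Table 7.
    """
    k = len(neighbours)
    return sum(w * p for w, p in zip(weights, neighbours)) / k


# Step 1: the target date is 2010-02-12.
# Step 2: precipitation on 2010-02-10, -11, -13, -14 and the rounded
# 4-NN Epanechnikov weights from Table 6.
prec = [15.0, 17.2, 9.1, 0.0]
weights_ep = [0.417, 0.667, 0.667, 0.417]

# Step 3: the kernel-weighted estimate, and the plain mean of the same four
# neighbours, which matches the 4NN "Reg" entry of Table 8.
estimate = kernel_estimate(prec, weights_ep)
plain_mean = sum(prec) / len(prec)
print(round(estimate, 3), round(plain_mean, 3))
```

A conventional kernel-weighted average would normalise by the sum of the weights instead, which for these values gives a noticeably larger estimate, so the normalisation should be checked against the main text before reuse.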

C. Sample Calculations with Real Value

This section shows how to calculate missing precipitation with the kernel-weighted mean function using specific numbers. The sample was selected at random from the daily data for 2008 to 2010 with a probability of 0.02. After the data were selected, their locations were set. Zhang et al. [39] reported that kernel-based nonparametric multiple imputation performs better than general linear regression when the sample data are small or limited.

Table 9 shows the kernel weight procedure for each function. We used the data from Feb. 10, 2010 to Feb. 14, 2010 to estimate the missing value for Feb. 12, 2010. For the most distant days, the Epanechnikov kernel gives the highest weight (0.417), whereas the Triweight kernel gives the lowest (0.188). For the nearest days, the Tricube kernel gives the highest weight and the Epanechnikov kernel the lowest. Generally, Tricube, which has the highest weight, overestimates the missing precipitation.

Table 9: Sample calculation with specific numbers.

(a) Epanechnikov

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Ep weight | 0.417 | 0.667 | - | 0.667 | 0.417 |
| Prec·weight | 6.26 | 11.47 | - | 6.07 | 0.00 |
| Estimation | | | 5.95 | | |

(b) Quartic

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Qu weight | 0.289 | 0.741 | - | 0.741 | 0.289 |
| Prec·weight | 4.34 | 12.75 | - | 6.74 | 0.00 |
| Estimation | | | 5.96 | | |

(c) Triweight

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Tw weight | 0.188 | 0.768 | - | 0.768 | 0.188 |
| Prec·weight | 2.82 | 13.21 | - | 6.99 | 0.00 |
| Estimation | | | 5.75 | | |

(d) Tricube

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Tc weight | 0.301 | 0.772 | - | 0.772 | 0.301 |
| Prec·weight | 4.52 | 13.28 | - | 7.03 | 0.00 |
| Estimation | | | 6.20 | | |

(e) Cosine

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Co weight | 0.393 | 0.680 | - | 0.680 | 0.393 |
| Prec·weight | 5.90 | 11.70 | - | 6.19 | 0.00 |
| Estimation | | | 5.94 | | |
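The 4-NN comparison across the five kernels can be checked with a short sketch that uses the rounded weights of Table 6 and the observed value of 6.0 for the target day. As in Appendix B, the sum of weighted values is divided by the number of neighbours, which is an inference from the tabulated estimates; the names `WEIGHTS_4NN` and `estimate` are hypothetical.

```python
# Precipitation on 2/10, 2/11, 2/13, 2/14 and the observed value on 2/12.
PREC = [15.0, 17.2, 9.1, 0.0]
OBSERVED = 6.0

# (weight at day distance 2, weight at day distance 1), from Table 6.
WEIGHTS_4NN = {
    "Ep": (0.417, 0.667),
    "Qu": (0.289, 0.741),
    "Tw": (0.188, 0.768),
    "Tc": (0.301, 0.772),
    "Co": (0.393, 0.680),
}


def estimate(far, near):
    """Weighted sum over the four neighbours, divided by 4 (assumed convention)."""
    weights = [far, near, near, far]
    return sum(w * p for w, p in zip(weights, PREC)) / len(PREC)


# Signed error of each kernel estimate relative to the observed value.
errors = {name: estimate(*w) - OBSERVED for name, w in WEIGHTS_4NN.items()}
overestimating = [name for name, err in errors.items() if err > 0]
print(errors, overestimating)
```

For this single day only the Tricube estimate exceeds the observed value, while the other four kernels fall slightly below it; one date is of course not enough to generalise.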

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] K. Kang and V. Merwade, "Development and application of a storage-release based distributed hydrologic model using GIS," Journal of Hydrology, vol. 403, no. 1-2, pp. 1–13, 2011.

[2] A. J. Abebe, D. P. Solomatine, and R. G. W. Venneker, "Application of adaptive fuzzy rule-based models for reconstruction of missing precipitation events," Hydrological Sciences Journal, vol. 45, no. 3, pp. 425–436, 2000.

[3] P. Ramos-Calzado, J. Gomez-Camacho, F. Perez-Bernal, and M. F. Pita-Lopez, "A novel approach to precipitation series completion in climatological datasets: application to Andalusia," International Journal of Climatology, vol. 28, no. 11, pp. 1525–1534, 2008.

[4] T. Schneider, "Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values," Journal of Climate, vol. 14, no. 5, pp. 853–871, 2001.

[5] S. K. Regonda, D.-J. Seo, B. Lawrence, J. D. Brown, and J. Demargne, "Short-term ensemble stream forecasting using operationally-produced single-valued streamflow forecasts - a Hydrologic Model Output Statistics (HMOS) approach," Journal of Hydrology, vol. 497, pp. 80–96, 2013.

[6] P. Coulibaly and N. D. Evora, "Comparison of neural network methods for infilling missing daily weather records," Journal of Hydrology, vol. 341, no. 1-2, pp. 27–41, 2007.

[7] H. A. El Sharif and R. S. V. Teegavarapu, "Evaluation of spatial interpolation methods for missing precipitation data: preservation of spatial statistics," in Proceedings of the World Environmental and Water Resources Congress, pp. 3822–3832, May 2012.

[8] A. Bardossy and G. Pegram, "Infilling missing precipitation records - a comparison of a new copula-based method with other techniques," Journal of Hydrology, vol. 519, pp. 1162–1170, 2014.

[9] K. Schamm, M. Ziese, A. Becker et al., "Global gridded precipitation over land: a description of the new GPCC first guess daily product," Earth System Science Data, vol. 6, no. 1, pp. 49–60, 2014.

[10] R. S. V. Teegavarapu, "Statistical corrections of spatially interpolated missing precipitation data estimates," Hydrological Processes, vol. 28, no. 11, pp. 3789–3808, 2014.

[11] Y. Da and G. Xiurun, "An improved PSO-based ANN with simulated annealing technique," Neurocomputing, vol. 63, pp. 527–533, 2005.

[12] A. di Piazza, F. L. Conti, L. V. Noto, F. Viola, and G. La Loggia, "Comparative analysis of different techniques for spatial interpolation of rainfall data to create a serially complete monthly time series of precipitation for Sicily, Italy," International Journal of Applied Earth Observation and Geoinformation, vol. 13, no. 3, pp. 396–408, 2011.

[13] E. Pisoni, F. Pastor, and M. Volta, "Artificial Neural Networks to reconstruct incomplete satellite data: application to the Mediterranean Sea Surface Temperature," Nonlinear Processes in Geophysics, vol. 15, no. 1, pp. 61–70, 2008.

[14] V. Sharma, S. Rai, and A. Dev, "A comprehensive study of artificial neural networks," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 10, pp. 278–284, 2012.

[15] ASCE Task Committee, "Artificial neural networks in hydrology. II: hydrologic applications," Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 124–137, 2000.

[16] ASCE Task Committee, "Artificial neural networks in hydrology. I: preliminary concepts," Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 115–123, 2000.

[17] D. E. Rumelhart, B. Widrow, and M. A. Lehr, "The basic ideas in neural networks," Communications of the ACM, vol. 37, no. 3, pp. 87–92, 1994.

[18] J. Amorocho and W. E. Hart, "A critique of current methods in hydrologic systems investigation," Transactions of the American Geophysical Union, vol. 45, no. 2, pp. 307–321, 1964.

[19] A. W. Minns and M. J. Hall, "Artificial neural networks as rainfall-runoff models," Hydrological Sciences Journal, vol. 41, no. 3, pp. 399–417, 1996.

[20] K. Kang and V. Merwade, "The effect of spatially uniform and non-uniform precipitation bias correction methods on improving NEXRAD rainfall accuracy for distributed hydrologic modeling," Hydrology Research, vol. 45, no. 1, pp. 23–42, 2014.

[21] C. Daly, W. P. Gibson, G. H. Taylor, G. L. Johnson, and P. Pasteris, "A knowledge-based approach to the statistical mapping of climate," Climate Research, vol. 22, no. 2, pp. 99–113, 2002.

[22] J. D. Creutin, H. Andrieu, and D. Faure, "Use of a weather radar for the hydrology of a mountainous area. Part II: radar measurement validation," Journal of Hydrology, vol. 193, no. 1–4, pp. 26–44, 1997.

[23] Y. L. Xia, P. Fabian, A. Stohl, and M. Winterhalter, "Forest climatology: estimation of missing values for Bavaria, Germany," Agricultural and Forest Meteorology, vol. 96, no. 1–3, pp. 131–144, 1999.

[24] C. J. Willmott, S. M. Robeson, and J. J. Feddema, "Estimating continental and terrestrial precipitation averages from rain-gauge networks," International Journal of Climatology, vol. 14, no. 4, pp. 403–414, 1994.

[25] R. S. V. Teegavarapu and V. Chandramouli, "Improved weighting methods, deterministic and stochastic data-driven models for estimation of missing precipitation records," Journal of Hydrology, vol. 312, no. 1–4, pp. 191–206, 2005.

[26] J. A. Smith, "Precipitation," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 3, chapter 3, McGraw Hill, New York, NY, USA, 1993.

[27] J. R. Simanton and H. B. Osborn, "Reciprocal-distance estimate of point rainfall," Journal of Hydraulic Engineering Division, vol. 106, no. 7, pp. 1242–1246, 1980.

[28] J. D.-J. Salas, "Analysis and modeling of hydrological time series," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 19, chapter 19, pp. 19.1–19.72, McGraw-Hill, New York, NY, USA, 1993.

[29] S. J. Jeffrey, J. O. Carter, K. B. Moodie, and A. R. Beswick, "Using spatial interpolation to construct a comprehensive archive of Australian climate data," Environmental Modelling and Software, vol. 16, no. 4, pp. 309–330, 2001.

[30] M. Franklin, V. R. Kotamarthi, M. L. Stein, and D. R. Cook, "Generating data ensembles over a model grid from sparse climate point measurements," Journal of Physics: Conference Series, vol. 125, Article ID 012019, 2008.

[31] K. C. Young, "A three-way model for interpolating for monthly precipitation values," Monthly Weather Review, vol. 120, no. 11, pp. 2561–2569, 1992.

[32] F. Filippini, G. Galliani, and L. Pomi, "The estimation of missing meteorological data in a network of automatic stations," Transactions on Ecology and the Environment, vol. 4, pp. 283–291, 1994.

[33] W. P. Lowry, Compendium of Lecture Notes in Climatology for Class IV Meteorological Personnel, Secretariat of the World Meteorological Organization, Geneva, Switzerland, 1972.

[34] M. C. Acock and Y. A. Pachepsky, "Estimating missing weather data for agricultural simulations using group method of data handling," Journal of Applied Meteorology, vol. 39, no. 7, pp. 1176–1184, 2000.

[35] S. L. Neitsch, J. G. Arnold, J. R. Kiniry, J. R. Williams, and K. W. King, Soil and Water Assessment Tool - Theoretical Documentation (Version 2005), Texas Water Resource Institute, College Station, Tex, USA, 2005.

[36] S. S. Shapiro and M. B. Wilk, "An analysis of variance test for normality (complete samples)," Biometrika, vol. 52, no. 3-4, pp. 591–611, 1965.

[37] M. Friedman, "The use of ranks to avoid the assumption of normality implicit in the analysis of variance," Journal of the American Statistical Association, vol. 32, no. 200, pp. 675–701, 1937.

[38] F. Wilcoxon, "Individual Comparisons by Ranking Methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[39] S. Zhang, Z. Jin, X. Zhu, and J. Zhang, "Missing data analysis: a kernel-based multi-imputation approach," in Transactions on Computational Science III, vol. 5300 of Lecture Notes in Computer Science, pp. 122–142, Springer, Berlin, Germany, 2009.


Page 8: Research Article Interpolation of Missing Precipitation ...downloads.hindawi.com/journals/amete/2015/935868.pdf · withthe NN regression approach, in terms of both statistical data

8 Advances in Meteorology

Table 5 Details of simulation results with six different precipitation infilling methods in Imha watershed

4-NN 6-NN 8-NN119864NS 119877

2 RMSE 119864NS 1198772 RMSE 119864NS 119877

2 RMSEEp 080 083 1532 091 092 1060 078 082 1616Qu 073 078 1783 091 093 1048 088 092 1180Tw 091 091 1056 093 094 925 095 095 810Tc 095 095 772 093 094 903 095 095 764Co 093 094 883 095 095 772 091 093 1044Reg 069 080 1914 021 065 3071 071 073 2148

interpolated precipitations data points) were used to assessthe performance of interpolated precipitation data for hydro-logic model simulation Streamflow simulations were donefor three years from 2008 to 2010 To evaluate the modelperformance considering the use of different interpolatedprecipitation datasets this study used 119864NS (Nash-Sutcliffecoefficient) 119877-square (coefficient of determination) andRMSE (root mean square error) Table 5 and Figure 4 showthat the simulation results from 119896nn-regression exhibit lowSWAT simulation performance for streamflow estimationswith 054119864NS 074119877-square and 2378m3s RMSE as anaverage All of the kernel functions on the other handexhibit good performance for hydrologic simulations withinterpolated precipitation data (Table 5 and Figure 4) theaverage of 119864NS 119877-square and RMSE (1) for Epanechnikov is083 086 and 1403m3s (2) for Quartic is 084 088 and1303m3s (3) for Triweight is 093 093 and 930m3s (4)for Tricube is 094 095 and 813m3s and (5) for Cosine is093 094 and 900m3s respectively

5 Conclusions

Five different kernel functions were applied to the Imhawatershed to evaluate the performance of each weightedmethod for estimating missing precipitation data and the useof interpolated data for hydrologic simulations was assessedThe following conclusions can be drawn from this research

(1) To estimate missing precipitation data points explor-atory procedures should consider the spatiotempo-ral variations of precipitation Due to difficulty onaccounting for these variations statistical methodsfor estimating missing precipitation data are com-monly used

(2) Although ANNs are an advanced approach for esti-matingmissing datamechanisms are unclear becausethe neuron system is ultimately a black-box modelThus regressionmethods are widely used for estimat-ing missing data even though there are limitationsin that regression methods cannot follow normaldistribution when the sample is small

(3) When using kernel functions as a weighted methodestimated missing data would satisfy normal distri-bution which is more statistically sound Also kernelmethods can overcome weakness in 119896nn-regression if

the data have outliers andor a nonlinear trend aroundthe missing data points in terms of mean value

(4) This study assessed the five kernel functions Epan-echnikov Quartic Triweight Tricube and Cosine asa weight for predictingmissing values In comparisonwith the 119896nn-regression method this study demon-strates that the kernel approaches provide higherquality interpolated precipitation data than the 119896nn-regression approach In addition the kernel functionresults better conform to statistical standards

(5) Furthermore higher quality of interpolated precipi-tation data results in better performance for hydro-logic simulations as exemplified in this study All ofthe statistical analyses of the streamflow simulationsshowed that the simulations using the interpolatedprecipitation data from the kernel functions providebetter results than using 119896nn-regression

(6) Use of kernel distribution is a more effective methodthan regression when the precipitation data have anupward or downward trend However if the precip-itation data have a nonlinear trend it is difficult toeffectively reconstruct the missing values For furtherresearch a time series analysis or a random walkmodel using a stochastic process are possiblemethodsby which to estimate missing data where there is anonlinear trend

Appendices

A Kenel Functions

Kernel density estimation is an unsupervised learning pro-cedure which historically precedes kernel regression It alsoleads naturally to a simple family of procedures for nonpara-metric classification

A1 Kernel Density Estimation Suppose we have a randomsample 119909

1 1199092 119909

119873draw from a probability density 119891

119909(119909)

and we wish to estimate 119891119909at a point 119909

0 For simplicity we

assume for now that 119909 isin 119877 (real value) Arguing as before anatural local estimate has the form

119891119909(1199090) =

119909119894isin 119873 (119909

0)

119873120582 (A1)

where 119909119894means number of 119909

119894which converges to119873(119909

0) and

119873(1199090) is a small metric neighborhood around 119909

0of width 120582

Advances in Meteorology 9

0

100

200

300

400

500

600

0 100 200 300 400 500 600

Obs

Simulation

(a) 4-NN

0

100

200

300

400

500

600

0 100 200 300 400 500 600

Obs

Simulation

(b) 6-NN

0

100

200

300

400

500

600

0 100 200 300 400 500 600

Obs

Simulation

EPQUTW

TCCOKNN

(c) 8-NN

Figure 4 Scatter plots for SWAT simulation (EP QU TW TC CO and119870NN represents Epanechnikov Quartic Triweight Tricube Cosineand 119870NN-regression resp)

This estimate is bumpy and the smooth Parzen estimate ispreferred

119891119909(1199090) =

1

119873120582

119873

sum

119894=1

119870120582(1199090 119909119894) (A2)

because it counts observations close to 1199090with weights that

decrease with distance from 1199090 In this case a popular choice

for 119870120582is the Gaussian kernel 119870

120582(1199090 119909119894) = 120601(|119909 minus 119909

0|120582)

Letting 120601120582denote the Gaussian density with mean zero and

standard-deviation 120582 then (A2) has the form

119891119909(1199090) =

1

119873

119873

sum

119894=1

120601120582(119909 minus 119909

119894) = (119865 lowast 120601

120582) (119909) (A3)

the convolution of the sample empirical distribution 119865 with120601120582 The distribution 119865(119909) puts mass 1119873 at each of the ob-

served 119909119894and is jumpy in 119891

119909(119909) we have smoothed 119865

by adding independent Gaussian noise to each observation119909119894The Parzen density estimate is the equivalent of the local

average and improvements have been proposed along thelines of local regression (on the log scale for densities) Wewill not pursue these here In 119877119901 the natural generalization ofthe Gaussian density estimate amounts to using the Gaussianproduct kernel in (A3)

119891119909(1199090) =

1

119873 (21205822120587)1199012

119873

sum

119894=1

119890minus(12)(||119909119894minus1199090||120582)

2

(A4)

10 Advances in Meteorology

Table 6 Weighted values depending on day distance with each 119870NN

4-NN 6-NN 8-NN1st 2nd 1st 2nd 3rd 1st 2nd 3rd 4th

Ep 0667 0417 0703 0563 0328 0720 0630 0480 0270Qu 0741 0289 0824 0527 0179 0864 0662 0384 0122Tw 0768 0188 0901 0461 0092 0968 0648 0287 0051Tc 0772 0301 0824 0579 0167 0844 0709 0416 0100Co 0680 0393 0726 0555 0301 0747 0635 0462 0243

A2 Kernel Density Classification One can use nonparamet-ric density estimates for classification in a straight-forwardfashion using Bayesrsquo theorem Suppose for a 119869 class problemwe fit nonparametric density estimates 119891

119895(119883) 119895 = 1 119869

separately in each of the classes and we also have estimatesof the class priors

119895(usually the sample proportions) Then

Pr (119866 = 119895 | 119883 = 1199090) =

119895119891119895(1199090)

sum119869

119896=1119896119891119896(1199090)

(A5)

In this region the data are sparse for both classes and sincethe Gaussian kernel density estimates use matric kernels thedensity estimates are low and of poor quality (high variance)in these regions The local logistic regression method usesthe tricube kernel with 119896-NN bandwidth this effectivelywidens the kernel in this region and makes use of the locallinear assumption to smooth out the estimate (on the logitscale)

If classification is the ultimate goal then learning theseparate class densities well may be unnecessary and can infact be misleading In learning the separate densities formdata one might decide to settle for a rougher high-variancefit to capture these features which are irrelevant for thepurposes of estimating the posterior probabilities In factif classification is the ultimate goal then we need only toestimate the posterior well near the decision boundary (fortwo classes this is the set 119909 | Pr(119866 = 1 | 119883 = 119909) = 12)

B Procedures of Missing Precipitation

This step shows example calculation for kernel functions forweighted mean It is an example question about the weight ofeach situation If the kernel functions are all symmetric samevalues are used for weight based on day distance FollowingTable 6 1st 2nd 3rd and 4th day distance and weightedvalues are shown For example if we want to estimatemissing precipitation for 2010-02-12 (actual value is 6) seefollowing procedures (3 steps) with 4-NN Epanechnikovkernel (Table 7)

Step 1 Select the date for target interpolation data

Step 2 Decide119870th nearest days precipitation and each kernelweight

Step 3 Calculate the weight average to estimate missing

Table 7: Example calculation to interpolate missing precipitation.

| Date (Step 1) | Prec. (Step 2) | Weight (Step 2) | Prec. × Weight (Step 3) | Estimation (Step 3) |
|---|---|---|---|---|
| 2010-02-10 | 15 | 0.417 | 6.255 | |
| 2010-02-11 | 17.2 | 0.667 | 11.472 | |
| 2010-02-12 | 6 | - | - | 5.949 |
| 2010-02-13 | 9.1 | 0.667 | 6.070 | |
| 2010-02-14 | 0 | 0.417 | 0 | |

Table 8: Calculated missing precipitation for 2010-02-12 (actual value is 6) with six different methods.

| Method | 4NN | 6NN | 8NN |
|---|---|---|---|
| Ep | 5.949 | 5.172 | 4.862 |
| Qu | 5.956 | 5.302 | 4.936 |
| Tw | 5.755 | 5.294 | 4.952 |
| Tc | 6.205 | 5.407 | 4.963 |
| Co | 5.945 | 5.197 | 4.876 |
| Reg | 10.325 | 8.967 | 8.813 |

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression.)

The estimates from the remaining kernel methods are given in Table 8.
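The three steps can be sketched in a few lines of Python. This sketch copies the 4-NN weights from Table 6 and the neighboring precipitation values from Table 7; the names `WEIGHTS_4NN`, `neighbors`, and `estimate_missing` are hypothetical. One point is an inference rather than something stated in the text: the tabulated estimates equal the sum of the weighted values divided by the number of neighbors $K$ (here 4), not by the sum of the weights, and the sketch follows the tables.

```python
# 4-NN weights copied from Table 6: (weight at day distance 1, weight at day distance 2).
WEIGHTS_4NN = {
    "Ep": (0.667, 0.417),
    "Qu": (0.741, 0.289),
    "Tw": (0.768, 0.188),
    "Tc": (0.772, 0.301),
    "Co": (0.680, 0.393),
}

# Step 1: the target date is 2010-02-12.
# Step 2: precipitation on the 4 nearest days, keyed by day offset from the target.
neighbors = {-2: 15.0, -1: 17.2, 1: 9.1, 2: 0.0}

def estimate_missing(neighbors, weights):
    """Step 3: sum of weight * precipitation, divided by the number of neighbors K.

    Dividing by K (rather than by the sum of the weights) is inferred from the
    arithmetic in Table 7; it is an assumption about the authors' procedure.
    """
    k = len(neighbors)
    weighted_sum = sum(weights[abs(offset) - 1] * prec
                       for offset, prec in neighbors.items())
    return weighted_sum / k

estimates = {name: estimate_missing(neighbors, w) for name, w in WEIGHTS_4NN.items()}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the results agree with the 4NN column of Table 8 to within rounding.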

C. Sample Calculations with Real Values

This section shows how to calculate missing precipitation with the kernel-weighted mean using actual numbers. The sample was selected at random from the daily data for 2008 to 2010, with a selection probability of 0.02; after the data were selected, their locations in the series were set. Zhang et al. [39] reported that kernel-based nonparametric multiple imputation performs better than general linear regression when the sample data are small or limited.

Table 9 shows how the kernel weights are applied for each function. We used the data from Feb 10 to Feb 14, 2010, to estimate the missing value for Feb 12, 2010. For the most distant days (two days from the target), the Epanechnikov kernel gives the highest weight, 0.417, whereas the Triweight kernel gives the lowest, 0.188. For the nearest days, the Tricube kernel gives the highest weight and the Epanechnikov kernel the lowest. Generally, the Tricube kernel, which assigns high weights, overestimates the missing precipitation.

Table 9: Sample calculation with actual values.

(a) Epanechnikov

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Ep weight | 0.417 | 0.667 | - | 0.667 | 0.417 |
| Prec. × weight | 6.26 | 11.47 | - | 6.07 | 0.00 |
| Estimation | | | 5.95 | | |

(b) Quartic

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Qu weight | 0.289 | 0.741 | - | 0.741 | 0.289 |
| Prec. × weight | 4.34 | 12.75 | - | 6.74 | 0.00 |
| Estimation | | | 5.96 | | |

(c) Triweight

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Tw weight | 0.188 | 0.768 | - | 0.768 | 0.188 |
| Prec. × weight | 2.82 | 13.21 | - | 6.99 | 0.00 |
| Estimation | | | 5.75 | | |

(d) Tricube

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Tc weight | 0.301 | 0.772 | - | 0.772 | 0.301 |
| Prec. × weight | 4.52 | 13.28 | - | 7.03 | 0.00 |
| Estimation | | | 6.20 | | |

(e) Cosine

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Co weight | 0.393 | 0.680 | - | 0.680 | 0.393 |
| Prec. × weight | 5.90 | 11.70 | - | 6.19 | 0.00 |
| Estimation | | | 5.94 | | |
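The direction of each kernel's error in this single example can be checked directly from the Table 9 estimates. The short sketch below only restates the tabulated values; the variable names are hypothetical, and one sample day is not evidence of general behavior.

```python
ACTUAL = 6.0  # observed precipitation on the target day (Tables 7 to 9)

# Estimates copied from Table 9, panels (a) to (e).
ESTIMATES = {"Ep": 5.95, "Qu": 5.96, "Tw": 5.75, "Tc": 6.20, "Co": 5.94}

# Signed error: positive means the kernel overestimates the observed value.
signed_error = {name: round(est - ACTUAL, 2) for name, est in ESTIMATES.items()}
overestimating = [name for name, err in signed_error.items() if err > 0]
closest = min(signed_error, key=lambda name: abs(signed_error[name]))

print(signed_error)
print("overestimating:", overestimating)
print("closest to observed:", closest)
```

For this day, only the Tricube estimate lies above the observed value, which is consistent with the remark about overestimation.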

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] K. Kang and V. Merwade, “Development and application of a storage-release based distributed hydrologic model using GIS,” Journal of Hydrology, vol. 403, no. 1-2, pp. 1–13, 2011.

[2] A. J. Abebe, D. P. Solomatine, and R. G. W. Venneker, “Application of adaptive fuzzy rule-based models for reconstruction of missing precipitation events,” Hydrological Sciences Journal, vol. 45, no. 3, pp. 425–436, 2000.

[3] P. Ramos-Calzado, J. Gomez-Camacho, F. Perez-Bernal, and M. F. Pita-Lopez, “A novel approach to precipitation series completion in climatological datasets: application to Andalusia,” International Journal of Climatology, vol. 28, no. 11, pp. 1525–1534, 2008.

[4] T. Schneider, “Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values,” Journal of Climate, vol. 14, no. 5, pp. 853–871, 2001.

[5] S. K. Regonda, D.-J. Seo, B. Lawrence, J. D. Brown, and J. Demargne, “Short-term ensemble stream forecasting using operationally-produced single-valued streamflow forecasts – a Hydrologic Model Output Statistics (HMOS) approach,” Journal of Hydrology, vol. 497, pp. 80–96, 2013.

[6] P. Coulibaly and N. D. Evora, “Comparison of neural network methods for infilling missing daily weather records,” Journal of Hydrology, vol. 341, no. 1-2, pp. 27–41, 2007.

[7] H. A. El Sharif and R. S. V. Teegavarapu, “Evaluation of spatial interpolation methods for missing precipitation data: preservation of spatial statistics,” in Proceedings of the World Environmental and Water Resources Congress, pp. 3822–3832, May 2012.

[8] A. Bardossy and G. Pegram, “Infilling missing precipitation records – a comparison of a new copula-based method with other techniques,” Journal of Hydrology, vol. 519, pp. 1162–1170, 2014.

[9] K. Schamm, M. Ziese, A. Becker et al., “Global gridded precipitation over land: a description of the new GPCC first guess daily product,” Earth System Science Data, vol. 6, no. 1, pp. 49–60, 2014.

[10] R. S. V. Teegavarapu, “Statistical corrections of spatially interpolated missing precipitation data estimates,” Hydrological Processes, vol. 28, no. 11, pp. 3789–3808, 2014.

[11] Y. Da and G. Xiurun, “An improved PSO-based ANN with simulated annealing technique,” Neurocomputing, vol. 63, pp. 527–533, 2005.

[12] A. di Piazza, F. L. Conti, L. V. Noto, F. Viola, and G. La Loggia, “Comparative analysis of different techniques for spatial interpolation of rainfall data to create a serially complete monthly time series of precipitation for Sicily, Italy,” International Journal of Applied Earth Observation and Geoinformation, vol. 13, no. 3, pp. 396–408, 2011.

[13] E. Pisoni, F. Pastor, and M. Volta, “Artificial Neural Networks to reconstruct incomplete satellite data: application to the Mediterranean Sea Surface Temperature,” Nonlinear Processes in Geophysics, vol. 15, no. 1, pp. 61–70, 2008.

[14] V. Sharma, S. Rai, and A. Dev, “A comprehensive study of artificial neural networks,” International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 10, pp. 278–284, 2012.

[15] ASCE Task Committee, “Artificial neural networks in hydrology. II: hydrologic applications,” Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 124–137, 2000.

[16] ASCE Task Committee, “Artificial neural networks in hydrology. I: preliminary concepts,” Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 115–123, 2000.

[17] D. E. Rumelhart, B. Widrow, and M. A. Lehr, “The basic ideas in neural networks,” Communications of the ACM, vol. 37, no. 3, pp. 87–92, 1994.

[18] J. Amorocho and W. E. Hart, “A critique of current methods in hydrologic systems investigation,” Transactions of the American Geophysical Union, vol. 45, no. 2, pp. 307–321, 1964.


[19] A. W. Minns and M. J. Hall, “Artificial neural networks as rainfall-runoff models,” Hydrological Sciences Journal, vol. 41, no. 3, pp. 399–417, 1996.

[20] K. Kang and V. Merwade, “The effect of spatially uniform and non-uniform precipitation bias correction methods on improving NEXRAD rainfall accuracy for distributed hydrologic modeling,” Hydrology Research, vol. 45, no. 1, pp. 23–42, 2014.

[21] C. Daly, W. P. Gibson, G. H. Taylor, G. L. Johnson, and P. Pasteris, “A knowledge-based approach to the statistical mapping of climate,” Climate Research, vol. 22, no. 2, pp. 99–113, 2002.

[22] J. D. Creutin, H. Andrieu, and D. Faure, “Use of a weather radar for the hydrology of a mountainous area. Part II: radar measurement validation,” Journal of Hydrology, vol. 193, no. 1–4, pp. 26–44, 1997.

[23] Y. L. Xia, P. Fabian, A. Stohl, and M. Winterhalter, “Forest climatology: estimation of missing values for Bavaria, Germany,” Agricultural and Forest Meteorology, vol. 96, no. 1–3, pp. 131–144, 1999.

[24] C. J. Willmott, S. M. Robeson, and J. J. Feddema, “Estimating continental and terrestrial precipitation averages from rain-gauge networks,” International Journal of Climatology, vol. 14, no. 4, pp. 403–414, 1994.

[25] R. S. V. Teegavarapu and V. Chandramouli, “Improved weighting methods, deterministic and stochastic data-driven models for estimation of missing precipitation records,” Journal of Hydrology, vol. 312, no. 1–4, pp. 191–206, 2005.

[26] J. A. Smith, “Precipitation,” in Handbook of Hydrology, D. R. Maidment, Ed., vol. 3, chapter 3, McGraw Hill, New York, NY, USA, 1993.

[27] J. R. Simanton and H. B. Osborn, “Reciprocal-distance estimate of point rainfall,” Journal of Hydraulic Engineering Division, vol. 106, no. 7, pp. 1242–1246, 1980.

[28] J. D.-J. Salas, “Analysis and modeling of hydrological time series,” in Handbook of Hydrology, D. R. Maidment, Ed., vol. 19, chapter 19, pp. 19.1–19.72, McGraw-Hill, New York, NY, USA, 1993.

[29] S. J. Jeffrey, J. O. Carter, K. B. Moodie, and A. R. Beswick, “Using spatial interpolation to construct a comprehensive archive of Australian climate data,” Environmental Modelling and Software, vol. 16, no. 4, pp. 309–330, 2001.

[30] M. Franklin, V. R. Kotamarthi, M. L. Stein, and D. R. Cook, “Generating data ensembles over a model grid from sparse climate point measurements,” Journal of Physics: Conference Series, vol. 125, Article ID 012019, 2008.

[31] K. C. Young, “A three-way model for interpolating for monthly precipitation values,” Monthly Weather Review, vol. 120, no. 11, pp. 2561–2569, 1992.

[32] F. Filippini, G. Galliani, and L. Pomi, “The estimation of missing meteorological data in a network of automatic stations,” Transactions on Ecology and the Environment, vol. 4, pp. 283–291, 1994.

[33] W. P. Lowry, Compendium of Lecture Notes in Climatology for Class IV Meteorological Personnel, Secretariat of the World Meteorological Organization, Geneva, Switzerland, 1972.

[34] M. C. Acock and Y. A. Pachepsky, “Estimating missing weather data for agricultural simulations using group method of data handling,” Journal of Applied Meteorology, vol. 39, no. 7, pp. 1176–1184, 2000.

[35] S. L. Neitsch, J. G. Arnold, J. R. Kiniry, J. R. Williams, and K. W. King, Soil and Water Assessment Tool – Theoretical Documentation (Version 2005), Texas Water Resource Institute, College Station, Tex, USA, 2005.

[36] S. S. Shapiro and M. B. Wilk, “An analysis of variance test for normality: complete samples,” Biometrika, vol. 52, no. 3-4, pp. 591–611, 1965.

[37] M. Friedman, “The use of ranks to avoid the assumption of normality implicit in the analysis of variance,” Journal of the American Statistical Association, vol. 32, no. 200, pp. 675–701, 1937.

[38] F. Wilcoxon, “Individual Comparisons by Ranking Methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[39] S. Zhang, Z. Jin, X. Zhu, and J. Zhang, “Missing data analysis: a kernel-based multi-imputation approach,” in Transactions on Computational Science III, vol. 5300 of Lecture Notes in Computer Science, pp. 122–142, Springer, Berlin, Germany, 2009.


[Figure 4: Scatter plots for SWAT simulation, with panels (a) 4-NN, (b) 6-NN, and (c) 8-NN. The axes are labeled Obs and Simulation, each ranging from 0 to 600. EP, QU, TW, TC, CO, and KNN represent Epanechnikov, Quartic, Triweight, Tricube, Cosine, and KNN regression, respectively.]

This estimate is bumpy, and the smooth Parzen estimate is preferred,

$$\hat{f}_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i), \quad \text{(A2)}$$

because it counts observations close to $x_0$ with weights that decrease with distance from $x_0$. In this case a popular choice for $K_\lambda$ is the Gaussian kernel $K_\lambda(x_0, x) = \phi(|x - x_0|/\lambda)$. Letting $\phi_\lambda$ denote the Gaussian density with mean zero and standard deviation $\lambda$, (A2) has the form

$$\hat{f}_X(x) = \frac{1}{N} \sum_{i=1}^{N} \phi_\lambda(x - x_i) = (\hat{F} \star \phi_\lambda)(x), \quad \text{(A3)}$$

the convolution of the sample empirical distribution $\hat{F}$ with $\phi_\lambda$. The distribution $\hat{F}(x)$ puts mass $1/N$ at each of the observed $x_i$ and is jumpy; in $\hat{f}_X(x)$ we have smoothed $\hat{F}$ by adding independent Gaussian noise to each observation $x_i$.

The Parzen density estimate is the equivalent of the local average, and improvements have been proposed along the lines of local regression (on the log scale for densities). We will not pursue these here. In $\mathbb{R}^p$ the natural generalization of the Gaussian density estimate amounts to using the Gaussian product kernel in (A3),

$$\hat{f}_X(x_0) = \frac{1}{N (2\lambda^2\pi)^{p/2}} \sum_{i=1}^{N} e^{-\frac{1}{2}\left(\|x_i - x_0\|/\lambda\right)^2}. \quad \text{(A4)}$$
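A minimal sketch of the Gaussian product-kernel estimate (A4) is given below, using only the Python standard library. The function name and the example points are hypothetical; the bandwidth `lam` corresponds to $\lambda$ and must be chosen by the user.

```python
import math

def gaussian_product_kde(x0, samples, lam):
    """Evaluate (A4) at the point x0.

    x0      : sequence of p coordinates
    samples : list of N observations, each a sequence of p coordinates
    lam     : bandwidth (standard deviation of the Gaussian kernel)
    """
    p = len(x0)
    n = len(samples)
    norm = 1.0 / (n * (2.0 * lam ** 2 * math.pi) ** (p / 2.0))
    total = 0.0
    for x in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, x0))  # ||x_i - x0||^2
        total += math.exp(-0.5 * sq_dist / lam ** 2)
    return norm * total

# Hypothetical two-dimensional observations (illustration only).
points = [(0.0, 0.0), (1.0, 0.5), (0.2, -0.3)]
density = gaussian_product_kde((0.1, 0.0), points, lam=0.5)
print(density)
```

For $p = 1$ the normalizing constant reduces to $1/(N\lambda\sqrt{2\pi})$, so the same function also evaluates the one-dimensional form (A3).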

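Table 6: Weighted values depending on day distance with each $K$NN.

| Kernel | 4-NN 1st | 4-NN 2nd | 6-NN 1st | 6-NN 2nd | 6-NN 3rd | 8-NN 1st | 8-NN 2nd | 8-NN 3rd | 8-NN 4th |
|---|---|---|---|---|---|---|---|---|---|
| Ep | 0.667 | 0.417 | 0.703 | 0.563 | 0.328 | 0.720 | 0.630 | 0.480 | 0.270 |
| Qu | 0.741 | 0.289 | 0.824 | 0.527 | 0.179 | 0.864 | 0.662 | 0.384 | 0.122 |
| Tw | 0.768 | 0.188 | 0.901 | 0.461 | 0.092 | 0.968 | 0.648 | 0.287 | 0.051 |
| Tc | 0.772 | 0.301 | 0.824 | 0.579 | 0.167 | 0.844 | 0.709 | 0.416 | 0.100 |
| Co | 0.680 | 0.393 | 0.726 | 0.555 | 0.301 | 0.747 | 0.635 | 0.462 | 0.243 |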
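The weights in Table 6 can be regenerated from the usual textbook forms of the five kernels. The sketch below rests on two assumptions that are not stated in this appendix: the kernel definitions are the standard ones on $|u| \le 1$, and the scaled distance is $u = d/(K/2 + 1)$ for day distance $d$. The second assumption was inferred by matching the tabulated values, and the names used are hypothetical.

```python
import math

# Standard kernel definitions on |u| <= 1 (assumed; not restated in this appendix).
KERNELS = {
    "Ep": lambda u: 0.75 * (1.0 - u ** 2),                        # Epanechnikov
    "Qu": lambda u: (15.0 / 16.0) * (1.0 - u ** 2) ** 2,           # Quartic
    "Tw": lambda u: (35.0 / 32.0) * (1.0 - u ** 2) ** 3,           # Triweight
    "Tc": lambda u: (70.0 / 81.0) * (1.0 - abs(u) ** 3) ** 3,      # Tricube
    "Co": lambda u: (math.pi / 4.0) * math.cos(math.pi * u / 2.0), # Cosine
}

def day_distance_weights(kernel_name, k):
    """Weights for day distances 1..k/2 under a k-NN window.

    The bandwidth k/2 + 1 is an assumption inferred from Table 6.
    """
    half = k // 2
    bandwidth = half + 1
    kernel = KERNELS[kernel_name]
    return [kernel(d / bandwidth) for d in range(1, half + 1)]

for k in (4, 6, 8):
    for name in KERNELS:
        weights = day_distance_weights(name, k)
        print(f"{k}-NN {name}: " + " ".join(f"{w:.3f}" for w in weights))
```

Under these assumptions the computed weights agree with Table 6 to within rounding in the third decimal place.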


B Procedures of Missing Precipitation

This step shows example calculation for kernel functions forweighted mean It is an example question about the weight ofeach situation If the kernel functions are all symmetric samevalues are used for weight based on day distance FollowingTable 6 1st 2nd 3rd and 4th day distance and weightedvalues are shown For example if we want to estimatemissing precipitation for 2010-02-12 (actual value is 6) seefollowing procedures (3 steps) with 4-NN Epanechnikovkernel (Table 7)

Step 1 Select the date for target interpolation data

Step 2 Decide119870th nearest days precipitation and each kernelweight

Step 3 Calculate the weight average to estimate missing

Table 7 Example calculation to interpolate for missing precipita-tion

Step 1 Step 2 Step 3Date Prec Weight PrecsdotWeight Estimation2010-02-10 15 0417 6255

59492010-02-11 172 0667 114722010-02-12 6 mdash mdash2010-02-13 91 0667 60702010-02-14 0 0417 0

Table 8 Calculating missing precipitation for 2010-02-12 (actualvalue is 6) with six different methods

Method 4NN 6NN 8NNEp 5949 5172 4862Qu 5956 5302 4936Tw 5755 5294 4952Tc 6205 5407 4963Co 5945 5197 4876Reg 10325 8967 8813(Ep Epanechnikov Qu Quartic Tw Triweight Tc Tricube Co Cosine andReg regression)

The rest of the kernel methods for estimating missingprecipitation are described in Table 8

C Sample Calculations with Real Value

This section shows how to calculate missing precipitationwith kernel mean weighed function by using certain numberThis sample selected daily data from 2008 to 2010 with002 possibilities to bivariate by random After selected datasetting data location is operated Zhang et al [39] addressedthat kernel based nonparametric multiple imputation hasbetter performance than general linear regression when thesample data is small or limited

Table 9 shows procedure of kernel weight in each func-tion We used data Feb 10 2012 from Feb 14 2014 toestimate Feb 12 2012 missing data Epanechnikov kernelshowed that longest data has highest estimation as 0417however Triweight kernel showed that longest data has lowestestimation as 0188 Highest weight in nearest value is Tricubekernel and lowest weight is Epanechnikov kernel Generally

Advances in Meteorology 11

Table 9 Sample calculation with certain number

(a) Epanechnikov

Date 210 211 212 213 214Prec 150 172 60 91 00Ep weight 0417 0667 mdash 0667 0417Precsdotweight 626 1147 mdash 607 000Estimation 595

(b) Quartic

Date 210 211 212 213 214Prec 150 172 60 91 00Qu weight 0289 0741 mdash 0741 0289Precsdotweight 434 1275 mdash 674 000Estimation 596

(c) Triweight

Date 210 211 212 213 214Prec 150 172 60 91 00Tw weight 0188 0768 mdash 0768 0188Precsdotweight 282 1321 mdash 699 000Estimation 575

(d) Tricube

Date 210 211 212 213 214Prec 150 172 60 91 00Tc weight 0301 0772 mdash 0772 0301Precsdotweight 452 1328 mdash 703 000Estimation 620

(e) Cosine

Date 210 211 212 213 214Prec 150 172 60 91 00Co weight 0393 0680 mdash 0680 0393Precsdotweight 590 1170 mdash 619 000Estimation 594

Tricube that is high weight shows the overestimation formissing precipitation

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] K Kang and V Merwade ldquoDevelopment and application of astorage-release based distributed hydrologic model using GISrdquoJournal of Hydrology vol 403 no 1-2 pp 1ndash13 2011

[2] A J Abebe D P Solomatine and R G W Venneker ldquoApplica-tion of adaptive fuzzy rule-based models for reconstruction ofmissing precipitation eventsrdquoHydrological Sciences Journal vol45 no 3 pp 425ndash436 2000

[3] P Ramos-Calzado J Gomez-Camacho F Perez-Bernal andMF Pita-Lopez ldquoA novel approach to precipitation series com-pletion in climatological datasets application to AndalusiardquoInternational Journal of Climatology vol 28 no 11 pp 1525ndash1534 2008

[4] T Schneider ldquoAnalysis of incomplete climate data estimation ofmean values and covariancematrices and imputation ofmissingvaluesrdquo Journal of Climate vol 14 no 5 pp 853ndash871 2001

[5] S K Regonda D-J Seo B Lawrence J D Brown and JDemargne ldquoShort-term ensemble stream forecasting usingoperationally-produced single-valued streamflow forecastsmdasha Hydrologic Model Output Statistics (HMOS) approachrdquoJournal of Hydrology vol 497 pp 80ndash96 2013

[6] P Coulibaly and N D Evora ldquoComparison of neural networkmethods for infilling missing daily weather recordsrdquo Journal ofHydrology vol 341 no 1-2 pp 27ndash41 2007

[7] H A El Sharif and R S V Teegavarapu ldquoEvaluation ofspatial interpolation methods for missing precipitation datapreservation of spatial statisticsrdquo in Proceedings of the WorldEnvironmental and Water Resources Congress pp 3822ndash3832May 2012

[8] A Bardossy and G Pegram ldquoInfilling missing precipitationrecordsmdasha comparison of a new copula-based method withother techniquesrdquo Journal of Hydrology vol 519 pp 1162ndash11702014

[9] K Schamm M Ziese A Becker et al ldquoGlobal griddedprecipitation over land a description of the new GPCC firstguess daily productrdquo Earth System Science Data vol 6 no 1pp 49ndash60 2014

[10] R S V Teegavarapu ldquoStatistical corrections of spatially inter-polated missing precipitation data estimatesrdquoHydrological Pro-cesses vol 28 no 11 pp 3789ndash3808 2014

[11] Y Da and G Xiurun ldquoAn improved PSO-based ANN withsimulated annealing techniquerdquo Neurocomputing vol 63 pp527ndash533 2005

[12] A di Piazza F L Conti L V Noto F Viola and G La LoggialdquoComparative analysis of different techniques for spatial inter-polation of rainfall data to create a serially complete monthlytime series of precipitation for Sicily Italyrdquo International Journalof Applied Earth Observation and Geoinformation vol 13 no 3pp 396ndash408 2011

[13] E Pisoni F Pastor and M Volta ldquoArtificial Neural Networksto reconstruct incomplete satellite data application to theMediterranean Sea SurfaceTemperaturerdquoNonlinear Processes inGeophysics vol 15 no 1 pp 61ndash70 2008

[14] V Sharma S Rai and A Dev ldquoA comprehensive study ofartificial neural networksrdquo International Journal of AdvancedResearch in Computer Science and Sofrware Engineering vol 2no 10 pp 278ndash284 2012

[15] ASCE Task Committee ldquoArtificial neural networks in hydrol-ogy II hydrologic applicationsrdquo Journal of Hydrological Engi-neering vol 5 no 2 pp 124ndash137 2000

[16] ASCE Task Committee ldquoArtificial neural networks in hydrol-ogy I preliminary conceptsrdquo Journal of Hydrologic Engineeringvol 5 no 2 pp 115ndash123 2000

[17] D E Rumelhart B Widrow and M A Lehr ldquoThe basic ideasin neural networksrdquo Communications of the ACM vol 37 no 3pp 87ndash92 1994

[18] J Amorocho and W E Hart ldquoA critique of current methods inhydrologic systems investigationrdquo Transactions of the AmericanGeophysical Union vol 45 no 2 pp 307ndash321 1964

12 Advances in Meteorology

[19] A W Minns and M J Hall ldquoArtificial neural networks asrainfall-runoff modelsrdquo Hydrological Sciences Journal vol 41no 3 pp 399ndash417 1996

[20] K Kang and V Merwade ldquoThe effect of spatially uniformand non-uniform precipitation bias correction methods onimproving NEXRAD rainfall accuracy for distributed hydro-logic modelingrdquo Hydrology Research vol 45 no 1 pp 23ndash422014

[21] C Daly W P Gibson G H Taylor G L Johnson andP Pasteris ldquoA knowledge-based approach to the statisticalmapping of climaterdquo Climate Research vol 22 no 2 pp 99ndash1132002

[22] J D Creutin H Andrieu and D Faure ldquoUse of a weatherradar for the hydrology of a mountainous area Part II radarmeasurement validationrdquo Journal of Hydrology vol 193 no 1ndash4 pp 26ndash44 1997

[23] Y L Xia P Fabian A Stohl and M Winterhalter ldquoForest cli-matology estimation of missing values for Bavaria GermanyrdquoAgricultural and Forest Meteorology vol 96 no 1ndash3 pp 131ndash1441999

[24] C J Willmott S M Robeson and J J Feddema ldquoEstimatingcontinental and terrestrial precipitation averages from rain-gauge networksrdquo International Journal of Climatology vol 14no 4 pp 403ndash414 1994

[25] R S V Teegavarapu and V Chandramouli ldquoImproved weight-ing methods deterministic and stochastic data-driven modelsfor estimation of missing precipitation recordsrdquo Journal ofHydrology vol 312 no 1ndash4 pp 191ndash206 2005

[26] J A Smith ldquoPrecipitationrdquo in Handbook of Hydrology D RMaidment Ed vol 3 chapter 3 McGraw Hill New York NYUSA 1993

[27] J R Simanton and H B Osborn ldquoReciprocal-distance estimateof point rainfallrdquo Journal of Hydraulic Engineering Division vol106 no 7 pp 1242ndash1246 1980

[28] J D-J Salas ldquoAnalysis and modeling of hydrological timeseriesrdquo inHandbook of Hydrology D R Maidment Ed vol 19chapter 19 pp 191ndash1972 McGraw-Hill New York NY USA1993

[29] S J Jeffrey J O Carter K BMoodie and A R Beswick ldquoUsingspatial interpolation to construct a comprehensive archive ofAustralian climate datardquoEnvironmentalModelling and Softwarevol 16 no 4 pp 309ndash330 2001

[30] M Franklin V R Kotamarthi M L Stein and D R CookldquoGenerating data ensembles over a model grid from sparseclimate point measurementsrdquo Journal of Physics ConferenceSeries vol 125 Article ID 012019 2008

[31] K C Young ldquoA three-way model for interpolating for monthlyprecipitation valuesrdquo Monthly Weather Review vol 120 no 11pp 2561ndash2569 1992

[32] F Filippini G Galliani and L Pomi ldquoThe estimation ofmissingmeteorological data in a network of automatic stationsrdquoTransactions on Ecology and the Environment vol 4 pp 283ndash291 1994

[33] W P Lowry Compendium of Lecture Notes in Climatologyfor Class IV Meteorological Personnel Secretariat of the WorldMeteorological Organization Geneva Switzerland 1972

[34] M C Acock and Y A Pachepsky ldquoEstimating missing weatherdata for agricultural simulations using group method of datahandlingrdquo Journal of AppliedMeteorology vol 39 no 7 pp 1176ndash1184 2000

[35] S L Neitsch J G Arnold J R Kiniry J R Williamsand K W King Soil and Water Assessment ToolmdashTheoreticalDocumentation (Version 2005) Texas Water Resource InstituteCollege Station Tex USA 2005

[36] S S Shapiro and M B Wilk ldquoAn analysis of variance test fornormality complete samplesrdquo Biometrika vol 52 no 3-4 pp591ndash611 1965

[37] M Friedman ldquoThe use of ranks to avoid the assumption ofnormality implicit in the analysis of variancerdquo Journal of theAmerican Statistical Association vol 32 no 200 pp 675ndash7011937

[38] F Wilcoxon ldquoIndividual Comparisons by Ranking MethodsrdquoBiometrics Bulletin vol 1 no 6 pp 80ndash83 1945

[39] S Zhang Z Jin X Zhu and J Zhang ldquoMissing data analysisa kernel-based multi-imputation approachrdquo in Transactionson Computational Science III vol 5300 of Lecture Notes inComputer Science pp 122ndash142 Springer Berlin Germany 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ClimatologyJournal of

EcologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

EarthquakesJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom

Applied ampEnvironmentalSoil Science

Volume 2014

Mining

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal of

Geophysics

OceanographyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of Computational Environmental SciencesHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal ofPetroleum Engineering

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

GeochemistryHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Atmospheric SciencesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OceanographyHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MineralogyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MeteorologyAdvances in

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Paleontology JournalHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ScientificaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Geological ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Geology Advances in

Page 10: Research Article Interpolation of Missing Precipitation ...downloads.hindawi.com/journals/amete/2015/935868.pdf · withthe NN regression approach, in terms of both statistical data

10 Advances in Meteorology

Table 6 Weighted values depending on day distance with each 119870NN

4-NN 6-NN 8-NN1st 2nd 1st 2nd 3rd 1st 2nd 3rd 4th

Ep 0667 0417 0703 0563 0328 0720 0630 0480 0270Qu 0741 0289 0824 0527 0179 0864 0662 0384 0122Tw 0768 0188 0901 0461 0092 0968 0648 0287 0051Tc 0772 0301 0824 0579 0167 0844 0709 0416 0100Co 0680 0393 0726 0555 0301 0747 0635 0462 0243

A2 Kernel Density Classification One can use nonparamet-ric density estimates for classification in a straight-forwardfashion using Bayesrsquo theorem Suppose for a 119869 class problemwe fit nonparametric density estimates 119891

119895(119883) 119895 = 1 119869

separately in each of the classes and we also have estimatesof the class priors

119895(usually the sample proportions) Then

Pr (119866 = 119895 | 119883 = 1199090) =

119895119891119895(1199090)

sum119869

119896=1119896119891119896(1199090)

(A5)

In this region the data are sparse for both classes and sincethe Gaussian kernel density estimates use matric kernels thedensity estimates are low and of poor quality (high variance)in these regions The local logistic regression method usesthe tricube kernel with 119896-NN bandwidth this effectivelywidens the kernel in this region and makes use of the locallinear assumption to smooth out the estimate (on the logitscale)

If classification is the ultimate goal then learning theseparate class densities well may be unnecessary and can infact be misleading In learning the separate densities formdata one might decide to settle for a rougher high-variancefit to capture these features which are irrelevant for thepurposes of estimating the posterior probabilities In factif classification is the ultimate goal then we need only toestimate the posterior well near the decision boundary (fortwo classes this is the set 119909 | Pr(119866 = 1 | 119883 = 119909) = 12)

B Procedures of Missing Precipitation

This step shows example calculation for kernel functions forweighted mean It is an example question about the weight ofeach situation If the kernel functions are all symmetric samevalues are used for weight based on day distance FollowingTable 6 1st 2nd 3rd and 4th day distance and weightedvalues are shown For example if we want to estimatemissing precipitation for 2010-02-12 (actual value is 6) seefollowing procedures (3 steps) with 4-NN Epanechnikovkernel (Table 7)

Step 1 Select the date for target interpolation data

Step 2 Decide119870th nearest days precipitation and each kernelweight

Step 3 Calculate the weight average to estimate missing

Table 7 Example calculation to interpolate for missing precipita-tion

Step 1 Step 2 Step 3Date Prec Weight PrecsdotWeight Estimation2010-02-10 15 0417 6255

59492010-02-11 172 0667 114722010-02-12 6 mdash mdash2010-02-13 91 0667 60702010-02-14 0 0417 0

Table 8 Calculating missing precipitation for 2010-02-12 (actualvalue is 6) with six different methods

Method 4NN 6NN 8NNEp 5949 5172 4862Qu 5956 5302 4936Tw 5755 5294 4952Tc 6205 5407 4963Co 5945 5197 4876Reg 10325 8967 8813(Ep Epanechnikov Qu Quartic Tw Triweight Tc Tricube Co Cosine andReg regression)

The rest of the kernel methods for estimating missingprecipitation are described in Table 8

C Sample Calculations with Real Value

This section shows how to calculate missing precipitationwith kernel mean weighed function by using certain numberThis sample selected daily data from 2008 to 2010 with002 possibilities to bivariate by random After selected datasetting data location is operated Zhang et al [39] addressedthat kernel based nonparametric multiple imputation hasbetter performance than general linear regression when thesample data is small or limited

Table 9 shows how the kernel weights are applied for each function. We used the data from Feb. 10 to Feb. 14 (the same values as in Table 7) to estimate the missing value for Feb. 12. For the most distant days (day distance 2), the Epanechnikov kernel assigns the highest weight, 0.417, whereas the Triweight kernel assigns the lowest, 0.188. For the nearest days (day distance 1), the Tricube kernel assigns the highest weight and the Epanechnikov kernel the lowest. Generally, the Tricube kernel, with its high weights, overestimates the missing precipitation.

Table 9: Sample calculation with actual values.

(a) Epanechnikov

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Ep weight | 0.417 | 0.667 | - | 0.667 | 0.417 |
| Prec. · weight | 6.26 | 11.47 | - | 6.07 | 0.00 |
| Estimation | | | 5.95 | | |

(b) Quartic

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Qu weight | 0.289 | 0.741 | - | 0.741 | 0.289 |
| Prec. · weight | 4.34 | 12.75 | - | 6.74 | 0.00 |
| Estimation | | | 5.96 | | |

(c) Triweight

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Tw weight | 0.188 | 0.768 | - | 0.768 | 0.188 |
| Prec. · weight | 2.82 | 13.21 | - | 6.99 | 0.00 |
| Estimation | | | 5.75 | | |

(d) Tricube

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Tc weight | 0.301 | 0.772 | - | 0.772 | 0.301 |
| Prec. · weight | 4.52 | 13.28 | - | 7.03 | 0.00 |
| Estimation | | | 6.20 | | |

(e) Cosine

| Date | 2/10 | 2/11 | 2/12 | 2/13 | 2/14 |
|---|---|---|---|---|---|
| Prec. | 15.0 | 17.2 | 6.0 | 9.1 | 0.0 |
| Co weight | 0.393 | 0.680 | - | 0.680 | 0.393 |
| Prec. · weight | 5.90 | 11.70 | - | 6.19 | 0.00 |
| Estimation | | | 5.94 | | |
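The weights in Table 9 can be reproduced from the standard forms of the five kernels if the day distance d is scaled as u = d / (K/2 + 1), that is, u = 1/3 and 2/3 for 4-NN. That scaling is not stated in this appendix; it is an assumption inferred from the tabulated weights, as is the division of the weighted sum by K. The names `KERNELS`, `kernel_weights`, and `kernel_estimate` are illustrative only.

```python
import math

# Standard kernel functions on |u| <= 1.
KERNELS = {
    "Epanechnikov": lambda u: 0.75 * (1 - u**2),
    "Quartic": lambda u: 15 / 16 * (1 - u**2) ** 2,
    "Triweight": lambda u: 35 / 32 * (1 - u**2) ** 3,
    "Tricube": lambda u: 70 / 81 * (1 - abs(u) ** 3) ** 3,
    "Cosine": lambda u: math.pi / 4 * math.cos(math.pi * u / 2),
}


def kernel_weights(name, k=4):
    """Weights by day distance for a k-NN window (k/2 days on each side)."""
    h = k // 2 + 1  # assumed scaling, chosen because it matches Table 9
    return {d: KERNELS[name](d / h) for d in range(1, k // 2 + 1)}


def kernel_estimate(name, neighbours):
    """Sum of weight * precipitation over the neighbours, divided by k."""
    k = len(neighbours)
    w = kernel_weights(name, k)
    return sum(p * w[abs(offset)] for offset, p in neighbours.items()) / k


# Day offsets from 2/12 and observed precipitation (Table 9).
neighbours = {-2: 15.0, -1: 17.2, 1: 9.1, 2: 0.0}

results = {}
for name in KERNELS:
    w = kernel_weights(name)
    results[name] = (
        round(w[1], 3),
        round(w[2], 3),
        round(kernel_estimate(name, neighbours), 2),
    )
    print(name, results[name])
```

Under these assumptions the printed weights agree with Table 9 to three decimals and the estimates agree to two decimals. Only the Tricube estimate exceeds the actual value of 6.0, which is consistent with the overestimation noted above.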

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] K. Kang and V. Merwade, "Development and application of a storage-release based distributed hydrologic model using GIS," Journal of Hydrology, vol. 403, no. 1-2, pp. 1–13, 2011.

[2] A. J. Abebe, D. P. Solomatine, and R. G. W. Venneker, "Application of adaptive fuzzy rule-based models for reconstruction of missing precipitation events," Hydrological Sciences Journal, vol. 45, no. 3, pp. 425–436, 2000.

[3] P. Ramos-Calzado, J. Gomez-Camacho, F. Perez-Bernal, and M. F. Pita-Lopez, "A novel approach to precipitation series completion in climatological datasets: application to Andalusia," International Journal of Climatology, vol. 28, no. 11, pp. 1525–1534, 2008.

[4] T. Schneider, "Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values," Journal of Climate, vol. 14, no. 5, pp. 853–871, 2001.

[5] S. K. Regonda, D.-J. Seo, B. Lawrence, J. D. Brown, and J. Demargne, "Short-term ensemble stream forecasting using operationally-produced single-valued streamflow forecasts – a Hydrologic Model Output Statistics (HMOS) approach," Journal of Hydrology, vol. 497, pp. 80–96, 2013.

[6] P. Coulibaly and N. D. Evora, "Comparison of neural network methods for infilling missing daily weather records," Journal of Hydrology, vol. 341, no. 1-2, pp. 27–41, 2007.

[7] H. A. El Sharif and R. S. V. Teegavarapu, "Evaluation of spatial interpolation methods for missing precipitation data: preservation of spatial statistics," in Proceedings of the World Environmental and Water Resources Congress, pp. 3822–3832, May 2012.

[8] A. Bardossy and G. Pegram, "Infilling missing precipitation records – a comparison of a new copula-based method with other techniques," Journal of Hydrology, vol. 519, pp. 1162–1170, 2014.

[9] K. Schamm, M. Ziese, A. Becker, et al., "Global gridded precipitation over land: a description of the new GPCC first guess daily product," Earth System Science Data, vol. 6, no. 1, pp. 49–60, 2014.

[10] R. S. V. Teegavarapu, "Statistical corrections of spatially interpolated missing precipitation data estimates," Hydrological Processes, vol. 28, no. 11, pp. 3789–3808, 2014.

[11] Y. Da and G. Xiurun, "An improved PSO-based ANN with simulated annealing technique," Neurocomputing, vol. 63, pp. 527–533, 2005.

[12] A. di Piazza, F. L. Conti, L. V. Noto, F. Viola, and G. La Loggia, "Comparative analysis of different techniques for spatial interpolation of rainfall data to create a serially complete monthly time series of precipitation for Sicily, Italy," International Journal of Applied Earth Observation and Geoinformation, vol. 13, no. 3, pp. 396–408, 2011.

[13] E. Pisoni, F. Pastor, and M. Volta, "Artificial Neural Networks to reconstruct incomplete satellite data: application to the Mediterranean Sea Surface Temperature," Nonlinear Processes in Geophysics, vol. 15, no. 1, pp. 61–70, 2008.

[14] V. Sharma, S. Rai, and A. Dev, "A comprehensive study of artificial neural networks," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 10, pp. 278–284, 2012.

[15] ASCE Task Committee, "Artificial neural networks in hydrology II: hydrologic applications," Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 124–137, 2000.

[16] ASCE Task Committee, "Artificial neural networks in hydrology I: preliminary concepts," Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 115–123, 2000.

[17] D. E. Rumelhart, B. Widrow, and M. A. Lehr, "The basic ideas in neural networks," Communications of the ACM, vol. 37, no. 3, pp. 87–92, 1994.

[18] J. Amorocho and W. E. Hart, "A critique of current methods in hydrologic systems investigation," Transactions of the American Geophysical Union, vol. 45, no. 2, pp. 307–321, 1964.

[19] A. W. Minns and M. J. Hall, "Artificial neural networks as rainfall-runoff models," Hydrological Sciences Journal, vol. 41, no. 3, pp. 399–417, 1996.

[20] K. Kang and V. Merwade, "The effect of spatially uniform and non-uniform precipitation bias correction methods on improving NEXRAD rainfall accuracy for distributed hydrologic modeling," Hydrology Research, vol. 45, no. 1, pp. 23–42, 2014.

[21] C. Daly, W. P. Gibson, G. H. Taylor, G. L. Johnson, and P. Pasteris, "A knowledge-based approach to the statistical mapping of climate," Climate Research, vol. 22, no. 2, pp. 99–113, 2002.

[22] J. D. Creutin, H. Andrieu, and D. Faure, "Use of a weather radar for the hydrology of a mountainous area. Part II: radar measurement validation," Journal of Hydrology, vol. 193, no. 1–4, pp. 26–44, 1997.

[23] Y. L. Xia, P. Fabian, A. Stohl, and M. Winterhalter, "Forest climatology: estimation of missing values for Bavaria, Germany," Agricultural and Forest Meteorology, vol. 96, no. 1–3, pp. 131–144, 1999.

[24] C. J. Willmott, S. M. Robeson, and J. J. Feddema, "Estimating continental and terrestrial precipitation averages from rain-gauge networks," International Journal of Climatology, vol. 14, no. 4, pp. 403–414, 1994.

[25] R. S. V. Teegavarapu and V. Chandramouli, "Improved weighting methods, deterministic and stochastic data-driven models for estimation of missing precipitation records," Journal of Hydrology, vol. 312, no. 1–4, pp. 191–206, 2005.

[26] J. A. Smith, "Precipitation," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 3, chapter 3, McGraw Hill, New York, NY, USA, 1993.

[27] J. R. Simanton and H. B. Osborn, "Reciprocal-distance estimate of point rainfall," Journal of Hydraulic Engineering Division, vol. 106, no. 7, pp. 1242–1246, 1980.

[28] J. D.-J. Salas, "Analysis and modeling of hydrological time series," in Handbook of Hydrology, D. R. Maidment, Ed., vol. 19, chapter 19, pp. 19.1–19.72, McGraw-Hill, New York, NY, USA, 1993.

[29] S. J. Jeffrey, J. O. Carter, K. B. Moodie, and A. R. Beswick, "Using spatial interpolation to construct a comprehensive archive of Australian climate data," Environmental Modelling and Software, vol. 16, no. 4, pp. 309–330, 2001.

[30] M. Franklin, V. R. Kotamarthi, M. L. Stein, and D. R. Cook, "Generating data ensembles over a model grid from sparse climate point measurements," Journal of Physics: Conference Series, vol. 125, Article ID 012019, 2008.

[31] K. C. Young, "A three-way model for interpolating for monthly precipitation values," Monthly Weather Review, vol. 120, no. 11, pp. 2561–2569, 1992.

[32] F. Filippini, G. Galliani, and L. Pomi, "The estimation of missing meteorological data in a network of automatic stations," Transactions on Ecology and the Environment, vol. 4, pp. 283–291, 1994.

[33] W. P. Lowry, Compendium of Lecture Notes in Climatology for Class IV Meteorological Personnel, Secretariat of the World Meteorological Organization, Geneva, Switzerland, 1972.

[34] M. C. Acock and Y. A. Pachepsky, "Estimating missing weather data for agricultural simulations using group method of data handling," Journal of Applied Meteorology, vol. 39, no. 7, pp. 1176–1184, 2000.

[35] S. L. Neitsch, J. G. Arnold, J. R. Kiniry, J. R. Williams, and K. W. King, Soil and Water Assessment Tool – Theoretical Documentation (Version 2005), Texas Water Resource Institute, College Station, Tex, USA, 2005.

[36] S. S. Shapiro and M. B. Wilk, "An analysis of variance test for normality (complete samples)," Biometrika, vol. 52, no. 3-4, pp. 591–611, 1965.

[37] M. Friedman, "The use of ranks to avoid the assumption of normality implicit in the analysis of variance," Journal of the American Statistical Association, vol. 32, no. 200, pp. 675–701, 1937.

[38] F. Wilcoxon, "Individual Comparisons by Ranking Methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[39] S. Zhang, Z. Jin, X. Zhu, and J. Zhang, "Missing data analysis: a kernel-based multi-imputation approach," in Transactions on Computational Science III, vol. 5300 of Lecture Notes in Computer Science, pp. 122–142, Springer, Berlin, Germany, 2009.
