Research Article
A Poisson-Gamma Model for Zero Inflated Rainfall Data

Nelson Christopher Dzupire,1 Philip Ngare,1,2 and Leo Odongo1,3

1 Pan African University Institute of Basic Sciences, Technology and Innovation, Juja, Kenya
2 University of Nairobi, Nairobi, Kenya
3 Kenyatta University, Nairobi, Kenya

Correspondence should be addressed to Nelson Christopher Dzupire; ndzupire@cc.ac.mw

Received 7 November 2017; Accepted 20 February 2018; Published 4 April 2018

Academic Editor: Steve Su

Hindawi, Journal of Probability and Statistics, Volume 2018, Article ID 1012647, 12 pages, https://doi.org/10.1155/2018/1012647

Copyright © 2018 Nelson Christopher Dzupire et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Rainfall modeling is significant for prediction and forecasting purposes in agriculture, weather derivatives, hydrology, and risk and disaster preparedness. Normally two models are used to model the rainfall process as a chain dependent process representing the occurrence and intensity of rainfall. Such two models help in understanding the physical features and dynamics of the rainfall process. However, rainfall data is zero inflated and exhibits overdispersion which is always underestimated by such models. In this study we have modeled the two processes simultaneously as a compound Poisson process. The rainfall events are modeled as a Poisson process while the intensity of each rainfall event is Gamma distributed. We minimize overdispersion by introducing the dispersion parameter in the model, implemented through Tweedie distributions. Simulated rainfall data from the model resembles the actual rainfall data in terms of seasonal variation, means, variance, and magnitude. The model also provides mechanisms for small but important properties of the rainfall process. The model developed can be used in forecasting and predicting rainfall amounts and occurrences, which is important in weather derivatives, agriculture, hydrology, and prediction of drought and flood occurrences.

1. Introduction

Climate variables, in particular rainfall occurrence and intensity, hugely impact the human and physical environment. Knowledge of the frequency of occurrence and the intensity of rainfall events is essential for planning, designing, and management of various water resources systems [1]. Specifically, rain-fed agriculture is a sector sensitive to weather, and crop production is directly dependent on the amount of rainfall and its occurrence. Rainfall modeling has a great impact on crop growth, weather derivatives, hydrological systems, drought and flood management, and crop simulation studies.

Rainfall modeling is also important in pricing of weather derivatives, which are financial instruments used as a tool for risk management to reduce risk associated with adverse or unexpected weather conditions.

Further, as climate change greatly affects the environment, there is an urgent need for predicting the variability of rainfall for future periods under different climate change scenarios in order to provide necessary information for high quality climate related impact studies [1].

However, modeling precipitation poses several challenges, namely, accurate measurement of precipitation, since rainfall data consist of sequences of values which are either zero or some positive number (intensity) depending on the depth of accumulation over discrete intervals. In addition, factors like wind can affect collection accuracy. Rainfall is localized, unlike temperature, which is highly correlated across regions; therefore a holder of a rainfall-based derivative may suffer geographical basis risk when weather derivatives are priced. The final challenge is the choice of a proper probability distribution function to describe precipitation data. The statistical properties of precipitation are far more complex, and a more sophisticated distribution is required [2].

Rainfall has been modeled as a chain dependent process where a two-state Markov chain model represents the occurrence of rainfall and the intensity of rainfall is modeled by fitting a suitable distribution such as the Gamma [3], exponential, and mixed exponential [1, 4]. These models are easy to understand and interpret and use maximum likelihood to find the parameters. However, the models involve many parameters to fully describe the dynamics of rainfall, as well as making several assumptions about the process.

Wilks [5] proposed a multisite model for daily precipitation using a combination of a two-state Markov process (for the rainfall occurrence) and a mixed exponential distribution (for the precipitation amount). He found that the mixture of exponential distributions offered a much better fit than the commonly used Gamma distribution.

In the study of Leobacher and Ngare [3], precipitation is modeled on a monthly basis by constructing a suitable Markov-Gamma process to take into account seasonal changes of precipitation. It is assumed that rainfall data for the same month in different years are independent and identically distributed, and that precipitation can be forecast with sufficient accuracy for a month.

Another approach to modeling rainfall is based on the Poisson cluster model, where two of the most recognized cluster based models in the stochastic modeling of rainfall are the Neyman-Scott Rectangular Pulses model and the Bartlett-Lewis Rectangular Pulse model. These models represent rainfall sequences in time and rainfall fields in space, where both the occurrence and depth processes are combined. The difficulty in Poisson cluster models, as observed by Onof et al. [6], is deciding how many features should be addressed so that the model is still mathematically tractable. In addition, the models are best fitted by the method of moments and so require matching analytic expressions for statistical properties such as the mean and variance.

Carmona and Diko [7] developed a time-homogeneous jump Markov process to describe rainfall dynamics. The rainfall process was assumed to take the form of storms which themselves consist of cells. At a cell arrival time the rainfall process jumps up by a random amount, and at extinction time it jumps down by a random amount, both modeled as Poisson processes. Each time the rain intensity changes, an exponential increase occurs either upwards or downwards. To preserve nonnegative intensity, the downward jump size is truncated to the current jump size. The Markov jump process also allows for a jump directly to zero, corresponding to the state of no rain [8].

In this study the rainfall process is modeled as a single model where the occurrence and intensity of rainfall are modeled simultaneously. The Poisson process models the daily occurrence of rainfall, while the intensity is modeled using the Gamma distribution as the magnitude of the jumps of the Poisson process. Hence we have a compound Poisson process, which is a Poisson-Gamma model. The contribution of this study is twofold: a Poisson-Gamma model that simultaneously describes the rainfall occurrence and intensity, and a suitable model for zero inflated data which reduces overdispersion.

This paper is structured as follows. In Section 2 the Poisson-Gamma model is described and then formulated mathematically, while Section 3 presents methods of estimating the parameters of the model. In Section 4 the model is fitted to the data and the goodness of fit of the model is evaluated by mean deviance, whereas quantile residuals provide the diagnostic check of the model. Simulation and forecasting are carried out in Section 5, and the study concludes in Section 6.

2. Model Formulation

2.1. Model Description. Rainfall comprises discrete and continuous components, in that if it does not rain the amount of rainfall is discrete, whereas if it rains the amount is continuous. In most research works [3, 4, 9] the rainfall process is represented by two separate models: one for the occurrence and, conditioned on the occurrence, another for the amount of rainfall. Rainfall occurrence is basically modeled as a first or higher order Markov chain process, and conditioned on this process a distribution is used to fit the precipitation amount. Commonly used distributions are the Gamma, exponential, mixture of exponentials, Weibull, and so on. These models rely on several assumptions and on the inclusion of several parameters to capture the observed temporal dependence of the rainfall process. However, rainfall data exhibit overdispersion [10], which is caused by various factors like clustering, unaccounted temporal correlation, or the fact that the data is a product of Bernoulli trials with unequal probability of events. The stochastic models developed in this way underestimate the overdispersion of rainfall data, which may result in underestimating the risk of low or high seasonal rainfall.

Our interest in this research is to simultaneously model the occurrence and intensity of rainfall in one model. We model the rainfall process by using a Poisson-Gamma probability distribution, which is flexible enough to model the exact zeros and the amount of rainfall together.

Rainfall is modeled as a compound Poisson process, which is a Lévy process with Gamma distributed jumps. This is motivated by the sudden changes of rainfall amount from zero to a large positive value following each rainfall event, which are modeled as pure jumps of the compound Poisson process.

We assume rainfall arrives in the form of storms following a Poisson process, and at each arrival time the current intensity increases by a random amount based on the Gamma distribution. The jumps of the driving process represent the arrival of the storm events, each generating a jump of random size. Each storm comprises cells that also arrive following another Poisson process.

Poisson cluster processes give an appropriate tool, as rainfall data indicate the presence of clusters of rainfall cells. As observed by Onof et al. [6], use of Gamma distributed variables for cell depth improves the reproduction of extreme values.

Lord [11] used the Poisson-Gamma compound process to model motor vehicle crashes, examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter. Wang [12] proposed a Poisson-Gamma compound approach for species richness estimation.


2.2. Mathematical Formulation. Let $N_t$ be the total number of rainfall events per day, following a Poisson process such that

$$P(N_t = n) = \frac{e^{-\lambda}\lambda^{n}}{n!}, \quad \forall n \in \mathbb{N}, \qquad N_t = \sum_{n \ge 1} \mathbf{1}_{[t_n, \infty)}(t), \tag{1}$$

where $t_n$ denotes the arrival time of the $n$th rainfall event.

The amount of rainfall is the total sum of the jumps of each rainfall event, say $(y_i)_{i \ge 1}$, assumed to be independently and identically Gamma distributed and independent of the times of occurrence of rainfall:

$$L(t) = \begin{cases} \sum_{i=1}^{N_t} y_i, & N_t = 1, 2, 3, \ldots \\ 0, & N_t = 0 \end{cases} \tag{2}$$

such that $y_i \sim \mathrm{Gamma}(\alpha, P)$, with rate $\alpha$ and shape $P$, has probability density function

$$f(y) = \begin{cases} \dfrac{\alpha^{P} y^{P-1} e^{-\alpha y}}{\Gamma(P)}, & y > 0 \\ 0, & y \le 0. \end{cases} \tag{3}$$
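The construction in (1)-(3) can be illustrated with a short simulation. The following is a minimal sketch rather than the authors' implementation: the function name `simulate_daily_rainfall` and the parameter values are hypothetical, NumPy is assumed to be available, and $\alpha$ is treated as the Gamma rate as in (3). It uses the fact that a sum of $N$ independent Gamma($P$, rate $\alpha$) jumps is Gamma($NP$, rate $\alpha$).

```python
import numpy as np

def simulate_daily_rainfall(lam, alpha, shape, n_days, seed=0):
    """Draw daily totals L = y_1 + ... + y_N, N ~ Poisson(lam), y_i ~ Gamma(shape, rate=alpha)."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(lam, size=n_days)      # number of rainfall events per day
    totals = np.zeros(n_days)                   # days with no event keep an exact zero
    wet = counts > 0
    # Sum of N iid Gamma(shape, rate) jumps is Gamma(N * shape, rate); NumPy uses scale = 1 / rate.
    totals[wet] = rng.gamma(shape=counts[wet] * shape, scale=1.0 / alpha)
    return totals

# Hypothetical parameter values, chosen only for illustration.
rain = simulate_daily_rainfall(lam=0.5, alpha=0.2, shape=1.5, n_days=200_000)
print("fraction of dry days:", np.mean(rain == 0.0), "vs exp(-lam):", np.exp(-0.5))
print("mean daily rainfall:", rain.mean(), "vs lam * P / alpha:", 0.5 * 1.5 / 0.2)
```

The simulated fraction of exact zeros should be close to $e^{-\lambda}$, and the sample mean close to $\lambda P/\alpha$.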

Lemma 1. The compound Poisson process (2) has cumulant function

$$\psi(s, t, x) = \lambda t \left(M_Y(x) - 1\right) \tag{4}$$

for $0 \le s < t$ and $x \in \mathbb{R}$, where $M_Y(x)$ is the moment generating function of the Gamma distribution.

Proof. The moment generating function $M_L(s)$ of $L(t)$ is given by

$$\begin{aligned}
M_L(s) &= \mathrm{E}\left(e^{sL(t)}\right) \\
&= \sum_{j=0}^{\infty} \mathrm{E}\left(e^{sL(t)} \mid N(t) = j\right) P(N(t) = j) \\
&= \sum_{j=0}^{\infty} \mathrm{E}\left(e^{s(y_1 + y_2 + \cdots + y_j)} \mid N(t) = j\right) P(N(t) = j) \\
&= \sum_{j=0}^{\infty} \mathrm{E}\left(e^{s(y_1 + y_2 + \cdots + y_j)}\right) P(N(t) = j) \quad \text{because of independence of the } y_i \text{ and } N(t) \\
&= \sum_{j=0}^{\infty} \left(M_Y(s)\right)^{j} \frac{e^{-\lambda t}(\lambda t)^{j}}{j!} = e^{-\lambda t} \sum_{j=0}^{\infty} \frac{\left(M_Y(s)\lambda t\right)^{j}}{j!} \\
&= e^{-\lambda t + M_Y(s)\lambda t}.
\end{aligned} \tag{5}$$

So the cumulant of $L$ is

$$\ln M_L(s) = \lambda t \left(M_Y(s) - 1\right) = \lambda t \left[\left(1 - \frac{s}{\alpha}\right)^{-P} - 1\right], \quad s < \alpha, \tag{6}$$

which for a single day ($t = 1$) reduces to $\lambda\left[(1 - s/\alpha)^{-P} - 1\right]$.

If we observe the occurrence of rainfall for $n$ periods, then we have the sequence $\{L_i\}_{i=1}^{n}$, which is independent and identically distributed.

If on a particular day no rainfall occurred, then

$$P(L = 0) = \frac{\exp(-\lambda)\lambda^{0}}{0!} = \exp(-\lambda) = p_0. \tag{7}$$

Therefore the process has a point mass at 0, which implies that it is not an entirely continuous random variable.

Lemma 2. The probability density function of $L$ in (2) is

$$f_\theta(L) = \exp(-\lambda)\,\delta_0(L) + \exp(-\lambda - \alpha L)\, L^{-1} r_P(\nu L^{P}), \tag{8}$$

where $\delta_0(L)$ is a Dirac function at zero.

Proof. Let $q_0 = 1 - p_0$ be the probability that it rained. Hence for $L_i > 0$ we have

$$\begin{aligned}
f^{+}_\theta(L) &= \sum_{i=1}^{\infty} \frac{p_i}{q_0} \left(\frac{\alpha^{iP} L^{iP-1} \exp(-\alpha L)}{\Gamma(iP)}\right), \quad \text{where } p_i = \frac{\exp(-\lambda)\lambda^{i}}{i!} \\
&= \frac{1}{q_0}\left[\sum_{i=1}^{\infty} p_i \exp(-\alpha L) \frac{\alpha^{iP} L^{iP-1}}{\Gamma(iP)}\right] \\
&= \frac{1}{q_0}\left[\exp(-\alpha L) \sum_{i=1}^{\infty} p_i \frac{\alpha^{iP} L^{iP-1}}{\Gamma(iP)}\right] \\
&= \frac{1}{q_0}\left[\exp(-\alpha L) \sum_{i=1}^{\infty} \left(\frac{\exp(-\lambda)\lambda^{i}}{i!}\right) \frac{\alpha^{iP} L^{iP-1}}{\Gamma(iP)}\right] \\
&= \frac{\exp(-\lambda)}{q_0}\left[\exp(-\alpha L) \sum_{i=1}^{\infty} \left(\frac{\lambda^{i}}{i!}\right) \frac{\alpha^{iP} L^{iP-1}}{\Gamma(iP)}\right] \\
&= \frac{\exp(-\lambda)}{q_0} \exp(-\alpha L) \left[\sum_{i=1}^{\infty} \frac{\lambda^{i} (\alpha L)^{iP}}{L\, i!\, \Gamma(iP)}\right] \\
&= \frac{L^{-1}\exp(-\alpha L)}{\exp(\lambda) - 1} \sum_{i=1}^{\infty} \frac{\left(\lambda \alpha^{P} L^{P}\right)^{i}}{i!\, \Gamma(iP)}.
\end{aligned} \tag{9}$$

If we let $\nu = \lambda\alpha^{P}$ and $r_P(\nu L^{P}) = \sum_{i=1}^{\infty} \left(\nu L^{P}\right)^{i} / \left(i!\,\Gamma(iP)\right)$, then we have

$$f^{+}_\theta(L) = \frac{L^{-1}\exp(-\alpha L)}{\exp(\lambda) - 1}\, r_P(\nu L^{P}). \tag{10}$$

We can express the probability density function $f_\theta(L)$ in terms of a Dirac function as

$$\begin{aligned}
f_\theta(L) &= p_0\,\delta_0(L) + q_0 f^{+}_\theta(L) \\
&= \exp(-\lambda)\,\delta_0(L) + \left[\frac{q_0}{\exp(\lambda) - 1}\right] L^{-1} \exp(-\alpha L)\, r_P(\nu L^{P}) \\
&= \exp(-\lambda)\,\delta_0(L) + \exp(-\lambda - \alpha L)\, L^{-1} r_P(\nu L^{P}).
\end{aligned} \tag{11}$$
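As a numerical sanity check on (11), the continuous part of the density should integrate to $1 - e^{-\lambda}$, the probability of a wet day. The sketch below is illustrative only: `positive_density` is a hypothetical helper, the parameter values are arbitrary, the series is truncated at an assumed 60 terms, and the integral is approximated by a simple midpoint rule.

```python
import math

def positive_density(L, lam, alpha, shape, max_terms=60):
    """Continuous part of (11): exp(-lam - alpha*L) / L * sum_i (lam*(alpha*L)**shape)**i / (i! * Gamma(i*shape))."""
    log_nu_L = math.log(lam) + shape * math.log(alpha * L)
    # Work on the log scale so that large factorials and Gamma values do not overflow.
    log_terms = [i * log_nu_L - math.lgamma(i + 1) - math.lgamma(i * shape)
                 for i in range(1, max_terms + 1)]
    top = max(log_terms)
    log_series = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return math.exp(-lam - alpha * L - math.log(L) + log_series)

# Hypothetical parameter values, chosen only for illustration.
lam, alpha, shape = 0.5, 0.2, 1.5
step = 0.02
# Midpoint rule on (0, 150]; the mass beyond 150 is negligible for these values.
wet_mass = sum(positive_density((k + 0.5) * step, lam, alpha, shape) * step
               for k in range(7500))
print("integral of continuous part:", wet_mass)
print("1 - exp(-lam):             ", 1.0 - math.exp(-lam))
```

Together with the point mass $e^{-\lambda}$ at zero, the total probability is then one.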

Consider a random sample of size $n$ of $L_i$ with the probability density function

$$f_\theta(L) = \exp(-\lambda)\,\delta_0(L) + \exp(-\lambda - \alpha L)\, L^{-1} r_P(\nu L^{P}). \tag{12}$$

If we assume that there are $m$ positive values $L_1, L_2, \ldots, L_m$, then there are $M = n - m$ zeros, where $m > 0$.

We observe that $m \sim \mathrm{Bi}(n, 1 - \exp(-\lambda))$ and $P(m = 0) = \exp(-n\lambda)$; hence the likelihood function is

$$L = \binom{n}{m} p_0^{\,n-m} q_0^{\,m} \prod_{i=1}^{m} f^{+}_\theta(L_i) \tag{13}$$

and the log-likelihood for $\theta = (\lambda, \alpha, P)$ is

$$\begin{aligned}
\log L(\theta; L_1, L_2, \ldots, L_n) &= \log\left(\binom{n}{m} p_0^{\,n-m} q_0^{\,m} \prod_{i=1}^{m} f^{+}_\theta(L_i)\right) \\
&= \log\binom{n}{m} + \lambda(m - n) + m \log\left(1 - e^{-\lambda}\right) \\
&\quad + \sum_{i=1}^{m} \left[-\alpha L_i - \log L_i - \log\left(e^{\lambda} - 1\right)\right] + \sum_{i=1}^{m} \log \sum_{j=1}^{\infty} \frac{\left(\lambda\alpha^{P} L_i^{P}\right)^{j}}{j!\,\Gamma(jP)} \\
&= \log\binom{n}{m} - n\lambda - \sum_{i=1}^{m} \left(\alpha L_i + \log L_i\right) + \sum_{i=1}^{m} \log \sum_{j=1}^{\infty} \frac{\left(\lambda\alpha^{P} L_i^{P}\right)^{j}}{j!\,\Gamma(jP)},
\end{aligned} \tag{14}$$

where the last line uses $\log(1 - e^{-\lambda}) - \log(e^{\lambda} - 1) = -\lambda$.

Now for $\lambda$, writing $w_{ij} = (\lambda\alpha^{P} L_i^{P})^{j} / (j!\,\Gamma(jP))$, we have

$$\frac{\partial \log L(\theta; L_1, L_2, \ldots, L_n)}{\partial \lambda} = -n + \frac{1}{\lambda} \sum_{i=1}^{m} \frac{\sum_{j=1}^{\infty} j\, w_{ij}}{\sum_{j=1}^{\infty} w_{ij}}, \qquad \frac{\partial \log L}{\partial \lambda} = 0 \implies -n + \frac{1}{\lambda} \sum_{i=1}^{m} \frac{\sum_{j=1}^{\infty} j\, w_{ij}}{\sum_{j=1}^{\infty} w_{ij}} = 0. \tag{15}$$

Because the weights $w_{ij}$ themselves depend on $\lambda$, we can observe from the above evaluation that $\lambda$ cannot be expressed in closed form; a similar derivation shows that $\alpha$ also cannot be expressed in closed form. Therefore we can only estimate $\lambda$ and $\alpha$ using numerical methods. Withers and Nadarajah [13] also observed that the probability density function cannot be expressed in closed form and that it is therefore difficult to find the analytic form of the estimators. So we will express the probability density function in terms of exponential dispersion models, as described below.
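To make the need for numerical methods concrete, the log-likelihood implied by the density (12) can be evaluated directly and maximized over $\lambda$ by a simple search. The sketch below is an illustration under stated assumptions, not the estimation procedure used in the paper: `log_likelihood` is a hypothetical helper, the data are simulated, $\alpha$ and $P$ are held fixed at their true values, the series is truncated at 60 terms, and a coarse grid search stands in for a proper optimizer.

```python
import math
import numpy as np

def log_likelihood(lam, alpha, shape, data, max_terms=60):
    """Log-likelihood of iid daily totals under density (12); each zero contributes log p_0 = -lam."""
    pos = data[data > 0]
    n_zero = data.size - pos.size
    j = np.arange(1, max_terms + 1)
    log_const = -np.array([math.lgamma(k + 1) + math.lgamma(k * shape) for k in j])
    # log of (lam * (alpha * L_i)**shape)**j / (j! * Gamma(j * shape)) for every wet day i and index j
    log_terms = j * (math.log(lam) + shape * np.log(alpha * pos))[:, None] + log_const
    log_series = np.logaddexp.reduce(log_terms, axis=1)
    return -lam * n_zero + np.sum(-lam - alpha * pos - np.log(pos) + log_series)

# Simulated data with hypothetical true parameters.
rng = np.random.default_rng(1)
true_lam, alpha, shape = 0.5, 0.2, 1.5
counts = rng.poisson(true_lam, size=3000)
data = np.where(counts > 0, rng.gamma(np.maximum(counts, 1) * shape, 1.0 / alpha), 0.0)

# Coarse grid search over lam with alpha and shape assumed known.
grid = np.linspace(0.2, 1.0, 81)
lam_hat = grid[np.argmax([log_likelihood(l, alpha, shape, data) for l in grid])]
print("estimated lam:", lam_hat, "true lam:", true_lam)
```

In practice all three parameters would be estimated jointly, which is one motivation for the exponential dispersion formulation that follows.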

Definition 3 (see [14]). A probability density function of the form

$$f(y; \theta, \Theta) = a(y, \Theta) \exp\left\{\frac{1}{\Theta}\left[y\theta - k(\theta)\right]\right\} \tag{16}$$

for suitable functions $k(\cdot)$ and $a(\cdot)$ is called an exponential dispersion model.

$\Theta > 0$ is the dispersion parameter. The function $k(\theta)$ is the cumulant of the exponential dispersion model; when $\Theta = 1$, the derivatives $k'(\theta), k''(\theta), \ldots$ are the successive cumulants of the distribution [15]. The exponential dispersion models were first introduced by Fisher in 1922.

If we let $L_i = \log f(y_i; \theta_i, \Theta)$ be the contribution of $y_i$ to the likelihood function $L = \sum_i L_i$, then

$$L_i = \frac{1}{\Theta}\left[y_i\theta_i - k(\theta_i)\right] + \log a(y_i, \Theta), \qquad \frac{\partial L_i}{\partial \theta_i} = \frac{1}{\Theta}\left(y_i - k'(\theta_i)\right), \qquad \frac{\partial^2 L_i}{\partial \theta_i^2} = -\frac{1}{\Theta} k''(\theta_i). \tag{17}$$

However, we expect that $\mathrm{E}(\partial L_i / \partial \theta_i) = 0$ and $-\mathrm{E}(\partial^2 L_i / \partial \theta_i^2) = \mathrm{E}(\partial L_i / \partial \theta_i)^2$, so that

$$\mathrm{E}\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right) = 0, \qquad \frac{1}{\Theta}\left(\mathrm{E}(y_i) - k'(\theta_i)\right) = 0, \qquad \mathrm{E}(y_i) = k'(\theta_i). \tag{18}$$

Furthermore,

$$\begin{aligned}
-\mathrm{E}\left(\frac{\partial^2 L_i}{\partial \theta_i^2}\right) &= \mathrm{E}\left(\frac{\partial L_i}{\partial \theta_i}\right)^2, \\
-\mathrm{E}\left(-\frac{1}{\Theta} k''(\theta_i)\right) &= \mathrm{E}\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right)^2, \\
\frac{k''(\theta_i)}{\Theta} &= \frac{\operatorname{Var}(y_i)}{\Theta^2}, \\
\operatorname{Var}(y_i) &= \Theta k''(\theta_i).
\end{aligned} \tag{19}$$

Therefore the mean of the distribution is $\mathrm{E}[Y] = \mu = dk(\theta)/d\theta$ and the variance is $\operatorname{Var}(Y) = \Theta\,(d^2k(\theta)/d\theta^2)$.

The relationship $\mu = dk(\theta)/d\theta$ is invertible, so that $\theta$ can be expressed as a function of $\mu$; as such we have $\operatorname{Var}(Y) = \Theta V(\mu)$, where $V(\mu)$ is called a variance function.
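As a brief worked example of these relations (an illustration added here, using only the standard Poisson probability function), the Poisson distribution with mean $\mu$ can be written in the form (16) and its variance function read off directly:

```latex
% Poisson(\mu) written as an exponential dispersion model with \Theta = 1
f(y;\theta,1) = \frac{1}{y!}\exp\{\, y\log\mu - \mu \,\}
\quad\Longrightarrow\quad
\theta = \log\mu,\qquad k(\theta) = e^{\theta},\qquad a(y,1) = \frac{1}{y!}.

% Mean and variance follow from the derivatives of k
\mathrm{E}[Y] = k'(\theta) = e^{\theta} = \mu,
\qquad
\operatorname{Var}(Y) = \Theta\,k''(\theta) = e^{\theta} = \mu,
\qquad\text{so } V(\mu) = \mu .
```

The variance function is therefore $V(\mu) = \mu^{1}$, which is the power form $\mu^{p}$ with $p = 1$.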

Definition 4. The family of exponential dispersion models whose variance functions are of the form $V(\mu) = \mu^{p}$ for $p \in (-\infty, 0] \cup [1, \infty)$ are called Tweedie family distributions.

Examples are as follows: for $p = 0$ we have a normal distribution; for $p = 1$ and $\Theta = 1$ it is a Poisson distribution; for $p = 2$ it is a Gamma distribution; and for $p = 3$ it is an inverse Gaussian distribution. Tweedie densities cannot be expressed in closed form (apart from the examples above) but can instead be identified by their cumulant generating functions.

From $\operatorname{Var}(Y) = \Theta\,(d^2k(\theta)/d\theta^2)$, for a Tweedie family distribution we have

$$\operatorname{Var}(Y) = \Theta \frac{d^2 k(\theta)}{d\theta^2} = \Theta V(\mu) = \Theta\mu^{p}. \tag{20}$$

Hence we can solve for $\mu$ and $k(\theta)$ as follows:

$$\mu = \frac{dk(\theta)}{d\theta}, \qquad \frac{d\mu}{d\theta} = \mu^{p} \implies \int \frac{d\mu}{\mu^{p}} = \int d\theta, \qquad \theta = \begin{cases} \dfrac{\mu^{1-p}}{1-p}, & p \ne 1 \\ \log\mu, & p = 1 \end{cases} \tag{21}$$

by equating the constants of integration above to zero. For $p \ne 1$ we have $\mu = [(1-p)\theta]^{1/(1-p)}$, so that

$$\int dk(\theta) = \int \left[(1-p)\theta\right]^{1/(1-p)} d\theta, \qquad k(\theta) = \frac{\left[(1-p)\theta\right]^{(2-p)/(1-p)}}{2-p} = \frac{\mu^{2-p}}{2-p}, \quad p \ne 2. \tag{22}$$

Proposition 5. The cumulant generating function of a Tweedie distribution for $1 < p < 2$ is

$$\log M_Y(t) = \frac{1}{\Theta} \frac{\mu^{2-p}}{2-p} \left[\left(1 + t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{23}$$

Proof. From (16) the moment generating function is given by

$$\begin{aligned}
M_Y(t) &= \int \exp(ty)\, a(y, \Theta) \exp\left\{\frac{1}{\Theta}\left[y\theta - k(\theta)\right]\right\} dy \\
&= \int a(y, \Theta) \exp\left\{\frac{1}{\Theta}\left[y(\theta + t\Theta) - k(\theta)\right]\right\} dy \\
&= \int a(y, \Theta) \exp\left(\frac{y(\theta + t\Theta) - k(\theta + t\Theta)}{\Theta} + \frac{k(\theta + t\Theta) - k(\theta)}{\Theta}\right) dy \\
&= \exp\left(\frac{k(\theta + t\Theta) - k(\theta)}{\Theta}\right) \int a(y, \Theta) \exp\left(\frac{y(\theta + t\Theta) - k(\theta + t\Theta)}{\Theta}\right) dy \\
&= \exp\left\{\frac{1}{\Theta}\left[k(\theta + t\Theta) - k(\theta)\right]\right\},
\end{aligned} \tag{24}$$

since the remaining integrand is an exponential dispersion density with canonical parameter $\theta + t\Theta$ and therefore integrates to one.

Hence the cumulant generating function is

$$\log M_Y(t) = \frac{1}{\Theta}\left[k(\theta + t\Theta) - k(\theta)\right]. \tag{25}$$

For $1 < p < 2$ we substitute $\theta$ and $k(\theta)$ from (21) and (22) to have

$$\log M_Y(t) = \frac{1}{\Theta} \frac{\mu^{2-p}}{2-p} \left[\left(1 + t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{26}$$

By comparing the cumulant generating functions in Lemma 1 and Proposition 5, the compound Poisson process can be thought of as a Tweedie distribution with parameters $(\lambda, \alpha, P)$ expressed as follows:

$$\lambda = \frac{\mu^{2-p}}{\Theta(2-p)}, \qquad \frac{1}{\alpha} = \Theta(p-1)\mu^{p-1}, \qquad P = \frac{2-p}{p-1}, \tag{27}$$

where $1/\alpha$ is the Gamma scale corresponding to the rate $\alpha$ in (3).

The requirement that the Gamma shape parameter $P$ be positive implies that only Tweedie distributions with $1 < p < 2$ can represent the Poisson-Gamma compound process. In addition, $\lambda > 0$ and $\alpha > 0$ imply $\mu > 0$ and $\Theta > 0$.
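The parameter mapping can be checked numerically. The sketch below is an illustration, not code from the paper: the function name `tweedie_to_poisson_gamma` and the input values are hypothetical, and it assumes $\alpha$ is the Gamma rate of (3), so that the Gamma scale $1/\alpha$ equals $\Theta(p-1)\mu^{p-1}$. Under that assumption the compound Poisson mean $\lambda P/\alpha$ should recover $\mu$ and the variance $\lambda P(P+1)/\alpha^{2}$ should recover $\Theta\mu^{p}$.

```python
import math

def tweedie_to_poisson_gamma(mu, dispersion, p):
    """Map Tweedie (mu, Theta, p) with 1 < p < 2 to Poisson rate lam, Gamma rate alpha, Gamma shape P."""
    if not (1.0 < p < 2.0 and mu > 0.0 and dispersion > 0.0):
        raise ValueError("requires 1 < p < 2, mu > 0 and dispersion > 0")
    lam = mu ** (2.0 - p) / (dispersion * (2.0 - p))
    scale = dispersion * (p - 1.0) * mu ** (p - 1.0)   # Gamma scale, assumed equal to 1 / alpha
    shape = (2.0 - p) / (p - 1.0)
    return lam, 1.0 / scale, shape

# Hypothetical Tweedie parameters, chosen only for illustration.
mu, dispersion, p = 3.75, 2.0, 1.4
lam, alpha, shape = tweedie_to_poisson_gamma(mu, dispersion, p)

mean = lam * shape / alpha                       # E[L] for a compound Poisson-Gamma sum
var = lam * shape * (shape + 1.0) / alpha ** 2   # Var[L] = lam * E[y^2]
p_zero = math.exp(-lam)                          # probability of a dry day
print(mean, mu)
print(var, dispersion * mu ** p)
print(p_zero)
```

The agreement of the two moments is a quick way to confirm that a given parameterization of the Gamma distribution (rate versus scale) has been used consistently.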


Proposition 6. Based on the Tweedie distribution, the probability of receiving no rainfall at all is

$$P(L = 0) = \exp\left[-\frac{\mu^{2-p}}{\Theta(2-p)}\right] \tag{28}$$

and the density of a positive rainfall amount $L > 0$ is

$$f_\theta(L) = W(\lambda, \alpha, L, P) \exp\left\{\frac{1}{\Theta}\left[\frac{L}{(1-p)\mu^{p-1}} - \frac{\mu^{2-p}}{2-p}\right]\right\}, \tag{29}$$

where

$$W(\lambda, \alpha, L, P) = \frac{1}{L}\sum_{j=1}^{\infty} \frac{\lambda^{j} (\alpha L)^{jP}}{j!\,\Gamma(jP)}. \tag{30}$$

Proof. This follows by directly substituting the values of $\lambda$ and $\theta$, $k(\theta)$ into (16).

The function $W(\lambda, \alpha, L, P)$ is an example of Wright's generalized Bessel function; however, it cannot be expressed in terms of the more common Bessel functions. To evaluate it, the value of $j$ is determined for which the term $W_j$ reaches its maximum [15].

3 Parameter Estimation

We approximate the function 119882(120582 120572 119871 119875) =suminfin119895=1(120582119895(120572119871)119895119875119890minus120582119895Γ(119895119875)) = suminfin

119895=1 119882119895 following theprocedure by [15] where the value of 119895 is determined forwhich119882119895 reaches maximumWe treat 119895 as continuous so that119882119895 is differentiated with respect to 119895 and set the derivative tozero So for 119871 gt 0 we have the followingLemma 7 (see [15]) The log maximum approximation of 119882119895

is given by

log119882max = 1198712minus119901(2 minus 119901)Θ [log 119871119875 (119901 minus 1)119875Θ(1minus119875) (2 minus 119901) + (1 + 119875)minus 119875 log119875 minus (1 minus 119875) log 1198712minus119901(2 minus 119901)Θ] minus log (2120587) minus 12sdot log119875 minus log 1198712minus119901(2 minus 119901)Θ

(31)

where 119895max = 1198712minus119901(2 minus 119901)Θ

Proof

119882(120582 120572 119871 119875) = infinsum119895=1

120582119895 (120572119871)119895119875minus1 119890minus120582119895Γ (119895119875)= infinsum

119895=1

120582119895119871119895119875minus1119890minus119871120591119890minus120582119895120591119875119895Γ (119875119895) where 120591 = 1120572 (32)

Substituting the values of 120582 120572 in the above equation we have

119882(120582 120572 119871 119875)= infinsum

119895=1

(1205832minus119901Θ (2 minus 119901))119895 119871119895119875minus1 [Θ (1 minus 119901) 120583119901minus1]119895119875 119890minus119871120591119890minus120582119895Γ (119875119895)= 119890minus119871120591minus120582119871minus1 infinsum

119895=1

120583(2minus119901)119895 (Θ (119901 minus 1) 120583119901minus1)119895119875 119871119895119875Θ119895 (2 minus 119901)119895 119895Γ (119895119875)

= 119890minus119871120591minus120582119871minus1 infinsum119895=1

119871119895119875 (119901 minus 1)119895119875 120583(2minus119901)119895+(119901minus1)119895119875Θ119895(1minus119875) (2 minus 119901)119895 119895Γ (119895119875)

(33)

The term 120583(2minus119901)119895+(119901minus1)119895119875 depends on the 119871 119901 119875 Θ values sowe maximize the summation

119882(119871Θ 119875) = infinsum119895=1

119871119895119875 (119901 minus 1)119895119875Θ119895(1minus119875) (2 minus 119901)119895 119895Γ (119895119875)= infinsum

119895=1

119911119895119895Γ (119895119875)where 119911 = 119871119875 (119901 minus 1)119875Θ(1minus119875) (2 minus 119901)

= 119882119895

(34)

Considering119882119895 we have

log119882119895 = 119895 log 119911 minus log 119895 minus log (119875119895)= 119895 log 119911 minus log Γ (119895 + 1) minus log (119875119895) (35)

Using Stirlingrsquos approximation of Gamma functions we have

log Γ (1 + 119895) asymp (1 + 119895) log (1 + 119895) minus (1 + 119895)+ 12 log( 21205871 + 119895)

log Γ (119875119895) asymp 119875119895 log (119875119895) minus 119875119895 + 12 log(2120587119875119895 ) (36)

And hence we have

119882119895 asymp 119895 [log 119911 + (1 + 119875) minus 119875 log119875 minus (1 minus 119875) log 119895]minus log (2120587) minus 12 log119875 minus log 119895 (37)

For $1 < p < 2$ we have $P = (2-p)/(p-1) > 0$; hence the logarithms have positive arguments. Differentiating with respect to $j$, we have

$$
\frac{\partial\log W_j}{\partial j} \approx \log z - \frac{1}{j} - \log j - P\log(Pj) \approx \log z - \log j - P\log(Pj),
\tag{38}
$$

Journal of Probability and Statistics 7

where $1/j$ is ignored for large $j$. Solving for $\partial\log W_j/\partial j = 0$, we have

$$
j_{\max} = \frac{L^{2-p}}{(2-p)\Theta}.
\tag{39}
$$

Substituting $j_{\max}$ in $\log W_j$ to find the maximum approximation of $W_j$, we have

$$
\begin{aligned}
\log W_{\max} = {} & \frac{L^{2-p}}{(2-p)\Theta}\left[\log\frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)} + (1+P) - P\log P - (1-P)\log\frac{L^{2-p}}{(2-p)\Theta}\right] \\
& - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2-p)\Theta}.
\end{aligned}
\tag{40}
$$

Hence the result follows.

It can be observed that $\partial\log W_j/\partial j$ is monotonically decreasing; hence $\log W_j$ is strictly convex as a function of $j$. Therefore $W_j$ decays faster than geometrically on either side of $j_{\max}$ [15]. Therefore, if we are to estimate $W(L,\Theta,P)$ by $\hat{W}(L,\Theta,P) = \sum_{j=j_d}^{j_u} W_j$, the approximation error is bounded by a geometric sum:

$$
W(L,\Theta,P) - \hat{W}(L,\Theta,P) < W_{j_d-1}\frac{1-r_l^{\,j_d-1}}{1-r_l} + W_{j_u+1}\frac{1}{1-r_u},
\qquad
r_l = \exp\left(\frac{\partial\log W_j}{\partial j}\right)\bigg|_{j=j_d-1},
\quad
r_u = \exp\left(\frac{\partial\log W_j}{\partial j}\right)\bigg|_{j=j_u+1}.
\tag{41}
$$

For quick and accurate evaluation of $W(\lambda,\alpha,L,P)$, the series is summed over only those terms which contribute significantly to the sum.
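The truncated summation can be illustrated with a short sketch. It assumes the term $W_j = z^j/(j!\,\Gamma(jP))$ and takes $z$ and $P$ as plain inputs; the function names and the tolerance are illustrative choices, not part of the original study. Rather than relying on the closed form for $j_{\max}$, the sketch locates the largest term by direct comparison and then adds terms on either side until they become negligible, working on the log scale to avoid overflow.

```python
import math

def log_term(j, log_z, P):
    """log W_j = j*log(z) - log(j!) - log(Gamma(j*P))."""
    return j * log_z - math.lgamma(j + 1) - math.lgamma(j * P)

def log_series_W(z, P, tol=1e-16):
    """Return log(sum_j W_j), keeping only terms within `tol` of the largest."""
    log_z = math.log(z)
    # Walk upward from j = 1 until the terms stop increasing.
    j_peak = 1
    while log_term(j_peak + 1, log_z, P) > log_term(j_peak, log_z, P):
        j_peak += 1
    log_peak = log_term(j_peak, log_z, P)
    cutoff = math.log(tol)
    total = 1.0  # the peak term itself, scaled by exp(-log_peak)
    # Terms below the peak, down to j = 1 or until negligible.
    j = j_peak - 1
    while j >= 1 and log_term(j, log_z, P) - log_peak > cutoff:
        total += math.exp(log_term(j, log_z, P) - log_peak)
        j -= 1
    # Terms above the peak, until negligible.
    j = j_peak + 1
    while log_term(j, log_z, P) - log_peak > cutoff:
        total += math.exp(log_term(j, log_z, P) - log_peak)
        j += 1
    return log_peak + math.log(total)

# Illustrative inputs only (not estimates from the rainfall data).
print(math.exp(log_series_W(5.0, 1.5)))
```

Returning the logarithm of the sum keeps the result usable when $z$ is large, which is when the individual terms would otherwise overflow.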

Generalized linear models extend the standard linear regression models to incorporate nonnormal response distributions and possibly nonlinear functions of the mean. The advantage of GLMs is that the fitting process maximizes the likelihood for the choice of the distribution for a random variable $y$, and the choice is not restricted to normality, unlike linear regression [16].

The exponential dispersion models are the response distributions for the generalized linear models. Tweedie distributions are members of the exponential dispersion models upon which the generalized linear models are based. Consequently, fitting a Tweedie distribution follows the framework of fitting a generalized linear model.

Lemma 8. In case of a canonical link function, the sufficient statistics for $\beta_j$ are $\sum_{i=1}^{n} y_i x_{ij}$.

Proof. For $n$ independent observations $y_i$ of the exponential dispersion model (16), the log-likelihood function is

$$
L(\beta) = \sum_{i=1}^{n} L_i = \sum_{i=1}^{n} \log f(y_i;\theta_i,\Theta) = \sum_{i=1}^{n} \frac{y_i\theta_i - k(\theta_i)}{\Theta} + \sum_{i=1}^{n} \log a(y_i,\Theta).
\tag{42}
$$

But $\theta_i = \sum_{j=1}^{p} \beta_j x_{ij}$; hence

$$
\sum_{i=1}^{n} y_i\theta_i = \sum_{i=1}^{n} y_i \sum_{j=1}^{p} \beta_j x_{ij} = \sum_{j=1}^{p} \beta_j \sum_{i=1}^{n} y_i x_{ij}.
\tag{43}
$$

Proposition 9. Given that $y_i$ is distributed as (16), then its distribution depends only on its first two moments, namely $\mu_i$ and $\operatorname{Var}(y_i)$.

Proof. Let $g(\mu_i)$ be the link function of the GLM such that $\eta_i = \sum_{j=1}^{p} \beta_j x_{ij} = g(\mu_i)$. The likelihood equations are

$$
\frac{\partial L(\beta)}{\partial\beta} = \sum_{i=1}^{n} \frac{\partial L_i}{\partial\beta_j}, \quad \forall j.
\tag{44}
$$

Using the chain rule, we have

$$
\frac{\partial L_i}{\partial\beta_j} = \frac{\partial L_i}{\partial\theta_i}\frac{\partial\theta_i}{\partial\mu_i}\frac{\partial\mu_i}{\partial\eta_i}\frac{\partial\eta_i}{\partial\beta_j} = \frac{y_i-\mu_i}{\operatorname{Var}(y_i)}\,x_{ij}\,\frac{\partial\mu_i}{\partial\eta_i}.
\tag{45}
$$

Hence

$$
\frac{\partial L(\beta)}{\partial\beta} = \sum_{i=1}^{n} \frac{y_i-\mu_i}{\operatorname{Var}(y_i)}\,x_{ij}\,\frac{\partial\mu_i}{\partial\eta_i} = \sum_{i=1}^{n} \frac{y_i-\mu_i}{\Theta\mu_i^{p}}\,x_{ij}\,\frac{\partial\mu_i}{\partial\eta_i}.
\tag{46}
$$

Since $\operatorname{Var}(y_i) = V(\mu_i)$, the relationship between the mean and variance characterizes the distribution.

Clearly a GLM only requires the first two moments of the response $y_i$. Hence, although full likelihood analysis of the Tweedie distribution is difficult because its density cannot be expressed in closed form for $1 < p < 2$, we can still fit a Tweedie distribution family. The likelihood is only required to estimate $p$ and $\Theta$, as well as for diagnostic checks of the model.

Proposition 10. Under the standard regularity conditions, for large $n$, the maximum likelihood estimator $\hat{\beta}$ of $\beta$ for the generalized linear model is efficient and has an approximate normal distribution.

Proof. From the log-likelihood, the covariance matrix of the distribution is the inverse of the information matrix $J = \mathrm{E}\left(-\partial^{2}L(\beta)/\partial\beta_h\partial\beta_j\right)$.


So

$$
\begin{aligned}
J = \mathrm{E}\left(-\frac{\partial^{2}L(\beta)}{\partial\beta_h\,\partial\beta_j}\right) &= \mathrm{E}\left[\left(\frac{\partial L_i}{\partial\beta_h}\right)\left(\frac{\partial L_i}{\partial\beta_j}\right)\right] \\
&= \mathrm{E}\left[\left(\frac{y_i-\mu_i}{\operatorname{Var}(y_i)}\,x_{ih}\,\frac{\partial\mu_i}{\partial\eta_i}\right)\left(\frac{y_i-\mu_i}{\operatorname{Var}(y_i)}\,x_{ij}\,\frac{\partial\mu_i}{\partial\eta_i}\right)\right] \\
&= \frac{x_{ih}x_{ij}}{\operatorname{Var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^{2}.
\end{aligned}
\tag{47}
$$

Hence

$$
\mathrm{E}\left(-\frac{\partial^{2}L(\beta)}{\partial\beta_h\,\partial\beta_j}\right) = \sum_{i=1}^{n} \frac{x_{ih}x_{ij}}{\operatorname{Var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^{2} = \left(X^{T}WX\right),
\tag{48}
$$

where $W = \operatorname{diag}\left[(1/\operatorname{Var}(y_i))(\partial\mu_i/\partial\eta_i)^{2}\right]$. Therefore $\hat{\beta}$ has an approximate $N[\beta, (X^{T}WX)^{-1}]$ distribution with $\operatorname{Var}(\hat{\beta}) = (X^{T}\hat{W}X)^{-1}$, where $\hat{W}$ is $W$ evaluated at $\hat{\beta}$.

To compute $\hat{\beta}$ we use the iteratively reweighted least squares algorithm described by Dobson and Barnett [17], where the iterations use the working weights

$$
\frac{w_i}{V(\mu_i)\,g'(\mu_i)^{2}},
\tag{49}
$$

where $V(\mu_i) = \mu_i^{p}$.

However, estimating $p$ is more difficult than estimating $\beta$ and $\Theta$, such that most researchers working with Tweedie densities have specified $p$ a priori. In this study we use the procedure in [15], where the maximum likelihood estimator of $p$ is obtained by directly maximizing the profile likelihood function. For any given value of $p$ we find the maximum likelihood estimate of $\beta$ and $\Theta$ and compute the log-likelihood function. This is repeated several times until we have a value of $p$ which maximizes the log-likelihood function.
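The iteratively reweighted least squares step for a fixed $p$ can be sketched as follows. This is a minimal illustration, assuming a log link (so $g'(\mu) = 1/\mu$ and the working weight reduces to $\mu^{2-p}$), unit prior weights, and a design with an intercept plus one sine and one cosine term; the function names, starting values, and stopping rule are hypothetical choices and not the authors' implementation.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def design_row(day):
    """Intercept plus one sine and one cosine term for the day of the year."""
    angle = 2.0 * math.pi * day / 365.0
    return [1.0, math.sin(angle), math.cos(angle)]

def irls_tweedie_log_link(days, y, p, n_iter=100, tol=1e-10):
    """Fit log(mu_i) = x_i . beta with variance function V(mu) = mu**p."""
    X = [design_row(d) for d in days]
    k = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)  # start at the overall mean
    for _ in range(n_iter):
        XtWX = [[0.0] * k for _ in range(k)]
        XtWz = [0.0] * k
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = math.exp(eta)
            w = mu ** (2.0 - p)        # 1 / (V(mu) * g'(mu)^2) for the log link
            z = eta + (yi - mu) / mu   # working response: eta + (y - mu) * g'(mu)
            for a in range(k):
                XtWz[a] += w * xi[a] * z
                for c in range(k):
                    XtWX[a][c] += w * xi[a] * xi[c]
        new_beta = solve_linear(XtWX, XtWz)
        change = max(abs(nb - ob) for nb, ob in zip(new_beta, beta))
        beta = new_beta
        if change < tol:
            break
    return beta
```

Repeating this fit over a grid of $p$ values and recording the log-likelihood at each fit gives the profile likelihood described above; evaluating that log-likelihood requires the Tweedie density itself, which the sketch does not attempt.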

Given the estimated values of $p$ and $\beta$, the unbiased estimator of $\Theta$ is given by

$$
\hat{\Theta} = \sum_{i=1}^{n} \frac{\left[L_i - \mu_i(\hat{\beta})\right]^{2}}{\mu_i(\hat{\beta})^{p}}.
\tag{50}
$$

Since for $1 < p < 2$ the Tweedie density cannot be expressed in closed form, it is recommended that the maximum likelihood estimate of $\Theta$ be computed iteratively from the full data [15].

4. Data and Model Fitting

4.1. Data Analysis. Daily rainfall data of Balaka district in Malawi covering the period 1995–2015 is used. The data was obtained from Meteorological Surveys of Malawi. Figure 1 shows a plot of the data.

In summary, the minimum value is 0 mm, which indicates that there was no rainfall on particular days, whereas the maximum amount is 123.7 mm. The mean rainfall for the whole period is 3.167 mm.

Figure 1: Daily rainfall amount for Balaka district (plot titled "Rainfall for Balaka for 10 years"; x-axis: Year, 1995–2015; y-axis: Amount).

Figure 2: Variance mean relationship (x-axis: log(mean); y-axis: log(variance)).

We investigated the relationship between the variance and the mean of the data by plotting log(variance) against log(mean), as shown in Figure 2. From the figure we can observe a linear relationship between the variance and the mean, which can be expressed as

$$
\log(\text{Variance}) = \alpha + \beta\log(\text{mean}),
\tag{51}
$$

$$
\text{Variance} = A \cdot \text{mean}^{\beta}, \quad A \in \mathbb{R}.
\tag{52}
$$

Hence the variance can be expressed as some power $\beta \in \mathbb{R}$ of the mean, agreeing with the Tweedie variance function requirement.
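The straight-line relationship between log(variance) and log(mean) can be estimated by ordinary least squares. The sketch below is illustrative only: the grouping of the data into means and variances is not specified here, so the inputs are hypothetical values constructed to follow a power law exactly, and the function name is an assumption.

```python
import math

def fit_power_law(means, variances):
    """Least-squares fit of log(variance) = alpha + beta * log(mean)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                 # slope: the power of the mean
    alpha = y_bar - beta * x_bar     # intercept: log of the multiplier A
    return alpha, beta

# Hypothetical group means and variances (not the Balaka data), built to
# satisfy variance = 14 * mean**1.5 so the recovered slope is easy to check.
group_means = [0.2, 0.5, 1.0, 3.0, 9.0]
group_variances = [14.0 * m ** 1.5 for m in group_means]
alpha, beta = fit_power_law(group_means, group_variances)
print(beta, math.exp(alpha))
```

A slope between 1 and 2 on real data would point to the compound Poisson-Gamma member of the Tweedie family.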

4.2. Fitted Model. To model the daily rainfall data we use sine and cosine terms as predictors, due to the cyclic nature and seasonality of rainfall. We have assumed that February ends on the 28th for all the years, so that the years are uniform in our modeling.

The canonical link function is given by

$$
\log\mu_i = a_0 + a_1\sin\left(\frac{2\pi i}{365}\right) + a_2\cos\left(\frac{2\pi i}{365}\right),
\tag{53}
$$

where $i = 1, 2, \ldots, 365$ corresponds to days of the year and $a_0$, $a_1$, $a_2$ are the coefficients of regression.

In the first place we estimate $p$ by maximizing the profile log-likelihood function. Figure 3 shows the graph of the profile log-likelihood function. As can be observed, the value of $p$ that maximizes the function is 1.5306.

From the results obtained after fitting the model, both the cyclic cosine and sine terms are important characteristics for daily rainfall (Table 1). The covariates were determined to take into account the seasonal variations in the stochastic model.


Table 1: Estimated parameter values.

Parameter    Estimate    Std. error    t value     Pr(>|t|)
$a_0$        0.1653      0.0473        3.4930      0.0005 ***
$a_1$        0.9049      0.0572        15.81100    <2e-16 ***
$a_2$        2.0326      0.0622        32.6720     <2e-16 ***
$\Theta$     14.8057     -             -           -

With signif. code: 0 ***

Figure 3: Profile likelihood (x-axis: $\xi$ index; y-axis: $L$, with 95% confidence interval).

The predicted $\hat{\mu}_i$, $\hat{p}$, $\hat{\Theta}$ for each day depend only on the day's conditions, so that for each day $i$ we have

$$
\begin{aligned}
\hat{\mu}_i &= \exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right], \\
\hat{p} &= 1.5306, \\
\hat{\Theta} &= 14.8057.
\end{aligned}
\tag{54}
$$

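From these estimated values we can calculate the parameters $(\hat{\lambda}_i, \hat{\alpha}_i, \hat{P})$ from the corresponding formulas above as

$$
\begin{aligned}
\hat{\lambda}_i &= \frac{1}{6.5716}\left(\exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.03263\cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.4694}, \\
\hat{\alpha}_i &= 7.4284\left(\exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.5306}, \\
\hat{P} &= 0.8847.
\end{aligned}
\tag{55}
$$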

Comparing the actual means and the predicted means, for 2 July we have $\mu = 0.3820$ whereas $\hat{\mu} = 0.4333$; similarly, for 31 December we have $\mu = 9.0065$ and $\hat{\mu} = 10.6952$, respectively. Figure 4 shows the estimated mean and actual mean, where the model behaves well generally.
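The mapping from the fitted mean to the Poisson-Gamma parameters can be sketched in a few lines. The sketch assumes the relations $\lambda = \mu^{2-p}/(\Theta(2-p))$, $\alpha = \Theta(p-1)\mu^{p-1}$, and $P = (2-p)/(p-1)$, read off the factors of the series expression for $W$; the function names are hypothetical. Under these relations the product $\lambda P \alpha$ returns $\mu$, which gives a quick consistency check, and $\exp(-\lambda)$ is the probability of a day with no rainfall.

```python
import math

def daily_mean(day, a0=0.1653, a1=0.9049, a2=2.0326):
    """Fitted mean rainfall for a day of the year under the log link."""
    angle = 2.0 * math.pi * day / 365.0
    return math.exp(a0 + a1 * math.sin(angle) + a2 * math.cos(angle))

def poisson_gamma_parameters(mu, p, theta):
    """Map (mu, p, Theta) to the Poisson rate and the two Gamma parameters."""
    lam = mu ** (2.0 - p) / (theta * (2.0 - p))   # rate of rainfall events
    alpha = theta * (p - 1.0) * mu ** (p - 1.0)   # Gamma parameter alpha
    shape = (2.0 - p) / (p - 1.0)                 # Gamma index P
    return lam, alpha, shape

# Reported estimates of p and Theta; the chosen days are arbitrary examples.
for day in (1, 120, 240, 365):
    mu = daily_mean(day)
    lam, alpha, shape = poisson_gamma_parameters(mu, p=1.5306, theta=14.8057)
    print(day, round(mu, 4), round(lam, 4), round(math.exp(-lam), 4))
```

Printing $\exp(-\lambda)$ alongside the mean shows how the dry-day probability falls as the seasonal mean rises.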

4.3. Goodness of Fit of the Model. Let the maximum likelihood estimate of $\theta_i$ be $\hat{\theta}_i$ for all $i$, and let $\hat{\mu}$ be the model's mean estimate. Let $\tilde{\theta}_i$ denote the estimate of $\theta_i$ for the saturated model, with corresponding $\tilde{\mu} = y_i$.

Figure 4: Actual versus predicted mean (plot titled "Rainfall means"; x-axis: Days of the year; y-axis: Mean; series: Actual mean, Predicted mean).

The goodness of fit is determined by the deviance, which is defined as

$$
\begin{aligned}
-2\log\left[\frac{\text{maximum likelihood of the fitted model}}{\text{maximum likelihood of the saturated model}}\right]
&= -2\left[L(\hat{\mu}; y) - L(y; y)\right] \\
&= 2\sum_{i=1}^{n} \frac{y_i\tilde{\theta}_i - k(\tilde{\theta}_i)}{\Theta} - 2\sum_{i=1}^{n} \frac{y_i\hat{\theta}_i - k(\hat{\theta}_i)}{\Theta} \\
&= 2\sum_{i=1}^{n} \frac{y_i(\tilde{\theta}_i - \hat{\theta}_i) - k(\tilde{\theta}_i) + k(\hat{\theta}_i)}{\Theta} = \frac{\operatorname{Dev}(y,\hat{\mu})}{\Theta}.
\end{aligned}
\tag{56}
$$

$\operatorname{Dev}(y,\hat{\mu})$ is called the deviance of the model, and the greater the deviance, the poorer the fitted model, as maximizing the likelihood corresponds to minimizing the deviance.

In terms of Tweedie distributions with $1 < p < 2$, the deviance is

$$
\operatorname{Dev}_p = 2\sum_{i=1}^{n} \left(\frac{y_i^{2-p} - (2-p)\,y_i\,\mu_i^{1-p} + (1-p)\,\mu_i^{2-p}}{(1-p)(2-p)}\right).
\tag{57}
$$

Based on results from fitting the model, the residual deviance is 43144, less than the null deviance of 62955, which implies that the fitted model explains the data better than a null model.
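The Tweedie deviance for $1 < p < 2$ can be computed directly from the observations and the fitted means. The following is a minimal sketch of that formula; the function name and the example values are illustrative and are not taken from the rainfall data.

```python
def tweedie_deviance(y, mu, p):
    """Tweedie deviance for 1 < p < 2; y may contain exact zeros."""
    total = 0.0
    for yi, mi in zip(y, mu):
        numerator = (yi ** (2.0 - p)
                     - (2.0 - p) * yi * mi ** (1.0 - p)
                     + (1.0 - p) * mi ** (2.0 - p))
        total += numerator / ((1.0 - p) * (2.0 - p))
    return 2.0 * total

# Hypothetical observations and fitted means.
observed = [0.0, 0.0, 3.2, 12.5]
fitted = [0.4, 1.1, 2.8, 9.0]
print(tweedie_deviance(observed, fitted, p=1.5306))
```

Each term vanishes when the fitted mean equals the observation, and a dry day contributes $2\mu_i^{2-p}/(2-p)$, so zeros are handled without any special case.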

4.4. Diagnostic Check. The model diagnostic is carried out by way of residual analysis. The fitted model is challenging to assess, especially for days with no rainfall at all, as these produce spurious results and distracting patterns, as similarly observed by [15]. Since this is a nonnormal regression, the residuals are far from being normally distributed and from having equal variances, unlike in a normal linear regression. Here the residuals lie parallel to distinct values; hence it is difficult to make any meaningful decision about the fitted model (Figure 5).

Figure 5: Residuals of the model (plot titled "Residuals"; x-axis: Year, 1995–2015; y-axis: Residuals).

Figure 6: Q-Q plot of the quantile residuals (plot titled "Normal Q-Q Plot"; x-axis: Theoretical quantiles; y-axis: Sample quantiles).

So we assess the model based on quantile residuals, which remove the pattern in discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.

The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.

Mathematically, let $a_i = \lim_{y \uparrow y_i} F(y; \hat{\mu}_i, \hat{\Theta})$ and $b_i = F(y_i; \hat{\mu}_i, \hat{\Theta})$, where $F$ is the cumulative function of the probability density function $f(y; \mu, \Theta)$; then the randomized quantile residuals for $y_i$ are

$$
r_{q,i} = \Phi^{-1}(u_i),
\tag{58}
$$

with $u_i$ being a uniform random variable on $(a_i, b_i]$. The randomized quantile residuals are distributed normally, barring the variability in $\hat{\mu}$ and $\hat{\Theta}$.

Figure 6 shows the normalized Q-Q plot and, as can be observed, there are no large deviations from the straight line, only small deviations at the tail. The linearity observed indicates an acceptable fitted model.
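The randomization step can be made concrete with a small sketch. It assumes a response with a point mass at zero and a continuous distribution elsewhere, so that $a_i = 0$ when $y_i = 0$ and $a_i = b_i$ otherwise. The fitted distribution function is passed in as a callable; the toy distribution used in the example is a hypothetical stand-in, not the Poisson-Gamma distribution function, whose evaluation is outside the scope of this sketch.

```python
import math
import random
from statistics import NormalDist

def randomized_quantile_residual(y, cdf, rng):
    """Quantile residual for a response with a point mass at zero.

    cdf(y) must return F(y), the fitted cumulative distribution function.
    """
    b = cdf(y)
    # Left limit of F at y: zero at the point mass, F(y) where F is continuous.
    a = 0.0 if y == 0 else b
    u = b - (b - a) * rng.random()        # uniform on (a, b]
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep the normal quantile finite
    return NormalDist().inv_cdf(u)

# Hypothetical stand-in for the fitted model: probability 0.7 of an exact zero
# and an exponential(1) amount otherwise.
def toy_cdf(y, p_zero=0.7):
    return p_zero + (1.0 - p_zero) * (1.0 - math.exp(-y))

rng = random.Random(1)
sample = [0.0 if rng.random() < 0.7 else rng.expovariate(1.0) for _ in range(20000)]
residuals = [randomized_quantile_residual(y, toy_cdf, rng) for y in sample]
mean_r = sum(residuals) / len(residuals)
var_r = sum((r - mean_r) ** 2 for r in residuals) / len(residuals)
print(round(mean_r, 3), round(var_r, 3))
```

When the distribution function matches the data-generating process, as it does by construction here, the residuals have mean near 0 and variance near 1, which is what a straight Q-Q plot reflects.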

Figure 7: Simulated rainfall and observed rainfall (plot titled "Rainfall amount"; x-axis: Days of the year; y-axis: Amount; series: Predicted, Actual).

Figure 8: Probability of rainfall occurrence (plot titled "Probability of no rainfall"; x-axis: Days of the year; y-axis: Probability).

5. Simulation

The model is simulated to test whether it produces data with similar characteristics to the actual observed rainfall. The simulation is done for a period of two years, where one was the last year of the data (2015) and the other year (2016) was a future prediction. A comparison was then made graphically with the 2015 data, as shown in Figure 7.

The different statistics of the simulated data and actual data are shown in Table 2 for comparison.
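A simulation of this kind can be sketched as below. The sketch draws the number of rainfall events from a Poisson distribution and the day's total from a Gamma distribution, using the fact that a sum of $n$ independent Gamma amounts with a common scale is again Gamma with $n$ times the shape. It assumes that the Gamma parameter acts as a scale, which is the reading under which the simulated mean equals $\lambda \times \text{shape} \times \text{scale}$. The parameter values are hypothetical and held constant here, whereas in the fitted model they change with the day of the year.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw a Poisson(lam) count by multiplying uniforms (suited to small lam)."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_daily_rainfall(lam, shape, scale, rng):
    """One day's total: the sum of N Gamma amounts, with N ~ Poisson(lam)."""
    n_events = sample_poisson(lam, rng)
    if n_events == 0:
        return 0.0  # exact zero: no rainfall event on this day
    # The sum of n independent Gamma(shape, scale) amounts is Gamma(n * shape, scale).
    return rng.gammavariate(n_events * shape, scale)

rng = random.Random(2015)
days = [simulate_daily_rainfall(0.4, 0.9, 2.0, rng) for _ in range(20000)]
dry_fraction = sum(1 for d in days if d == 0.0) / len(days)
mean_amount = sum(days) / len(days)
print(round(dry_fraction, 3), round(mean_amount, 3))
```

The share of dry days should sit close to $\exp(-\lambda)$ and the mean close to $\lambda \times \text{shape} \times \text{scale}$, which are the two summary checks that matter for zero inflated data.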

The main objective of simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model has shown that it works well in predicting the rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.

However, the model performed poorly in predicting the probability of rainfall occurrence, as it underestimated the probability of rainfall occurrence. It is suggested here that the use of a truncated Fourier series may improve this estimation compared to the sinusoidal terms.

But it performed better in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.

It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model as suggested by [4], as follows. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8); in addition, we can also tell that the probability of rain in a day is higher if the previous day was wet, which is the basis of precipitation models that involve the Markov process. From Figure 7 we can also observe variation of rainfall intensity based on time of the season.

In addition, the model allows modeling of exact zeros in the data and is able to predict a probability of no rainfall event simultaneously.

Table 2: Data statistics.

                        Min     1st Qu    Median    Mean     3rd Qu    Max
Predicted data          0.00    0.00      0.00      3.314    0.00      116.5
Actual data [10 yrs]    0.00    0.00      0.00      3.183    0.300     123.7
Actual data [2015]      0.00    0.00      0.00      3.328    0.00      84.5

6. Conclusion

A daily stochastic rainfall model was developed based on a compound Poisson process, where rainfall events follow a Poisson distribution and the intensity is independent of events, following a Gamma distribution. Unlike several studies of precipitation modeling in which two models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool to study the impact of weather on a variety of systems, including ecosystem risk assessment, drought predictions, and weather derivatives, as we are able to simulate synthetic rainfall data. The model provides mechanisms for understanding the fine scale structure, like the number and mean of rainfall events, mean daily rainfall, and probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.

The model developed can easily be used for forecasting future events, and in terms of weather derivatives the weather index can be derived from simulating a sample path by summing up daily precipitation in the relevant accumulation period. Rather than developing a weather index which is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.

Rainfall data is generally zero inflated, in that the amount of rainfall received on a day can be zero with a positive probability but continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Hence expressing it in terms of a Tweedie distribution makes estimating the parameters easy. In addition, Tweedie distributions belong to the exponential family of distributions upon which generalized linear models are based; hence there is an already existing framework in place for fitting and diagnostic testing of the model.

The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps in reducing underestimation of overdispersion of the data, which is also common in the aforementioned models.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.

References

[1] A. Hussain, "Stochastic modeling of rainfall processes: a Markov chain-mixed exponential model for rainfalls in different climatic conditions."

[2] M. Cao, A. Li, and J. Z. Wei, "Precipitation modeling and contract valuation: A frontier in weather derivatives," The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.

[3] G. Leobacher and P. Ngare, "On modelling and pricing rainfall derivatives with seasonality," Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.

[4] M. Odening, O. Musshoff, and W. Xu, "Analysis of rainfall derivatives using daily precipitation models: Opportunities and pitfalls," Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.

[5] D. S. Wilks, "Multisite generalization of a daily stochastic precipitation generation model," Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.

[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, "Rainfall modelling using Poisson-cluster processes: a review of developments," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.

[7] R. Carmona and P. Diko, "Pricing precipitation based derivatives," International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.

[8] F. E. Benth and J. S. Benth, Modeling and pricing in financial markets for weather derivatives, vol. 17, World Scientific, 2012.

[9] B. Lopez Cabrera, M. Odening, and M. Ritter, "Pricing rainfall futures at the CME," Technical report, Humboldt University, Collaborative Research Center.

[10] T. I. Harrold, A. Sharma, and S. J. Sheather, "A nonparametric model for stochastic generation of daily rainfall occurrence," Water Resources Research, vol. 39, no. 10, 2003.

[11] D. Lord, "Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter," Accident Analysis & Prevention, vol. 38, no. 4, pp. 751–766, 2006.

[12] J.-P. Wang, "Estimating species richness by a Poisson-compound gamma model," Biometrika, vol. 97, no. 3, pp. 727–740, 2010.

[13] C. S. Withers and S. Nadarajah, "On the compound Poisson-gamma distribution," Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.

[14] E. W. Frees, R. Derrig, and G. Meyers, Regression modeling with actuarial and financial applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.

[15] P. K. Dunn and G. K. Smyth, "Evaluation of Tweedie exponential dispersion model densities by Fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.

[16] A. Agresti, Foundations of linear and generalized linear models, John Wiley & Sons, 2015.

[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.


understand and interpret and use maximum likelihood to find the parameters. However, the models involve many parameters to fully describe the dynamics of rainfall, as well as making several assumptions for the process.

Wilks [5] proposed a multisite model for daily precipitation using a combination of a two-state Markov process (for the rainfall occurrence) and a mixed exponential distribution (for the precipitation amount). He found that the mixture of exponential distributions offered a much better fit than the commonly used Gamma distribution.

In the study of Leobacher and Ngare [3], the precipitation is modeled on a monthly basis by constructing a suitable Markov-Gamma process to take into account seasonal changes of precipitation. It is assumed that rainfall data for different years of the same month is independent and identically distributed. It is also assumed that precipitation can be forecast with sufficient accuracy for a month.

Another approach to modeling rainfall is based on the Poisson cluster model, where two of the most recognized cluster based models in the stochastic modeling of rainfall are the Neyman-Scott Rectangular Pulses model and the Bartlett-Lewis Rectangular Pulse model. These models represent rainfall sequences in time and rainfall fields in space, where both the occurrence and depth processes are combined. The difficulty in Poisson cluster models, as observed by Onof et al. [6], is the challenge of how many features should be addressed so that the model is still mathematically tractable. In addition, the models are best fitted by the method of moments and so require matching analytic expressions for statistical properties such as mean and variance.

Carmona and Diko [7] developed a time-homogeneous jump Markov process to describe rainfall dynamics. The rainfall process was assumed to be in the form of storms, which themselves consist of cells. At a cell arrival time the rainfall process jumps up by a random amount, and at extinction time it jumps down by a random amount, both modeled as Poisson processes. Each time the rain intensity changes, an exponential increase occurs either upwards or downwards. To preserve nonnegative intensity, the downward jump size is truncated to the current jump size. The Markov jump process also allows for a jump directly to zero, corresponding to the state of no rain [8].

In this study the rainfall process is modeled as a single model, where the occurrence and intensity of rainfall are simultaneously modeled. The Poisson process models the daily occurrence of rainfall, while the intensity is modeled using a Gamma distribution as the magnitude of the jumps of the Poisson process. Hence we have a compound Poisson process, which is a Poisson-Gamma model. The contribution of this study is twofold: a Poisson-Gamma model that describes the rainfall occurrence and intensity simultaneously, and a suitable model for zero inflated data which reduces overdispersion.

This paper is structured as follows. In Section 2 the Poisson-Gamma model is described and then formulated mathematically, while Section 3 presents methods of estimating the parameters of the model. In Section 4 the model is fitted to the data and the goodness of fit of the model is evaluated by mean deviance, whereas quantile residuals are used for the diagnostic check of the model. Simulation and forecasting are carried out in Section 5, and the study concludes in Section 6.

2. Model Formulation

2.1. Model Description. Rainfall comprises discrete and continuous components, in that if it does not rain the amount of rainfall is discrete, whereas if it rains the amount is continuous. In most research works [3, 4, 9] the rainfall process is represented by two separate models: one for the occurrence and, conditioned on the occurrence, another for the amount of rainfall. Rainfall occurrence is basically modeled as a first or higher order Markov chain process, and conditioned on this process a distribution is used to fit the precipitation amount. Commonly used distributions are Gamma, exponential, mixture of exponential, Weibull, and so on. These models rely on several assumptions and the inclusion of several parameters to capture the observed temporal dependence of the rainfall process. However, rainfall data exhibit overdispersion [10], which is caused by various factors like clustering, unaccounted temporal correlation, or the fact that the data is a product of Bernoulli trials with unequal probability of events. The stochastic models developed in this way underestimate the overdispersion of rainfall data, which may result in underestimating the risk of low or high seasonal rainfall.

Our interest in this research is to simultaneously model the occurrence and intensity of rainfall in one model. We model the rainfall process by using a Poisson-Gamma probability distribution, which is flexible enough to model the exact zeros and the amount of rainfall together.

Rainfall is modeled as a compound Poisson process, which is a Lévy process with Gamma distributed jumps. This is motivated by the sudden changes of rainfall amount from zero to a large positive value following each rainfall event, which are modeled as pure jumps of the compound Poisson process.

We assume rainfall arrives in the form of storms following a Poisson process, and at each arrival time the current intensity increases by a random amount based on the Gamma distribution. The jumps of the driving process represent the arrival of the storm events, generating a jump of random size. Each storm comprises cells that also arrive following another Poisson process.

The Poisson cluster processes give an appropriate tool, as rainfall data indicate the presence of clusters of rainfall cells. As observed by Onof et al. [6], use of Gamma distributed variables for cell depth improves the reproduction of extreme values.

Lord [11] used the Poisson-Gamma compound process to model motor vehicle crashes, examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter. Wang [12] proposed a Poisson-Gamma compound approach for species richness estimation.


2.2. Mathematical Formulation. Let $N_t$ be the total number of rainfall events per day, following a Poisson process such that

$$
P(N_t = n) = e^{-\lambda}\frac{\lambda^{n}}{n!}, \quad \forall n \in \mathbb{N}, \qquad N_t = \sum_{t \ge 1} \mathbf{1}_{[t,\infty)}(t).
\tag{1}
$$

The amount of rainfall is the total sum of the jumps of each rainfall event, say $(y_i)_{i \ge 1}$, assumed to be identically and independently Gamma distributed and independent of the times of the occurrence of rainfall:

$$
L(t) =
\begin{cases}
\sum_{i=1}^{N_t} y_i, & N_t = 1, 2, 3, \ldots, \\
0, & N_t = 0,
\end{cases}
\tag{2}
$$

such that $y_i \sim \text{Gamma}(\alpha, P)$ with probability density function

$$
f(y) =
\begin{cases}
\dfrac{\alpha^{P}y^{P-1}e^{-\alpha y}}{\Gamma(P)}, & y > 0, \\
0, & y \le 0.
\end{cases}
\tag{3}
$$

Lemma 1 The compound Poisson process (2) has a cumulantfunction

120595 (119904 119905 119909) = 120582119905 (119890119872119884(119909) minus 1) (4)

for 0 le 119904 lt 119905 and 119909 isin R where119872119884(119909) is the moment generat-ing function of the Gamma distribution

Proof. The moment generating function $M_L(s)$ of $L(t)$ is given by

$$\begin{aligned}
M_L(s) &= E\left(e^{sL(t)}\right) \\
&= \sum_{j=0}^{\infty} E\left(e^{sL(t)} \mid N(t) = j\right) P(N(t) = j) \\
&= \sum_{j=0}^{\infty} E\left(e^{s(L(1)+L(2)+\cdots+L(j))} \mid N(t) = j\right) P(N(t) = j) \\
&= \sum_{j=0}^{\infty} E\left(e^{s(L(1)+L(2)+\cdots+L(j))}\right) P(N(t) = j) \quad \text{because of independence of } L \text{ and } N(t) \\
&= \sum_{j=0}^{\infty} \left(M_Y(s)\right)^{j} e^{-\lambda t}\frac{(\lambda t)^{j}}{j!} = e^{-\lambda t}\sum_{j=0}^{\infty} \frac{\left(M_Y(s)\right)^{j}(\lambda t)^{j}}{j!} \\
&= e^{-\lambda t + M_Y(s)\lambda t}.
\end{aligned} \tag{5}$$

So the cumulant of $L$ is

$$\ln M_L(s) = \lambda\left(M_Y(s) - 1\right) = \lambda\left[(1 - \alpha x)^{-P} - 1\right]. \tag{6}$$

If we observe the occurrence of rainfall for $n$ periods, then we have the sequence $\{L_i\}_{i=1}^{n}$, which is independent and identically distributed.

If on a particular day no rainfall occurred, then

$$P(L = 0) = \exp(-\lambda)\frac{\lambda^{0}}{0!} = \exp(-\lambda) = p_0. \tag{7}$$

Therefore the process has a point mass at 0, which implies that it is not an entirely continuous random variable.
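The point mass at zero in (7) can be illustrated by simulating the compound process (2). The following is a minimal sketch, not the authors' code: the function name and the parameter values are hypothetical, and the Gamma jumps are parameterised here by shape and scale (an assumption made for the simulation only).

```python
import numpy as np

def simulate_daily_rainfall(lam, shape, scale, n_days, seed=0):
    """Draw n_days totals L = y_1 + ... + y_N, N ~ Poisson(lam), y_i ~ Gamma(shape, scale)."""
    rng = np.random.default_rng(seed)
    n_events = rng.poisson(lam, size=n_days)      # number of rainfall events per day
    totals = np.zeros(n_days)                     # days with no event stay exactly zero
    wet = n_events > 0
    # The sum of N independent Gamma(shape, scale) jumps is Gamma(N * shape, scale).
    totals[wet] = rng.gamma(shape * n_events[wet], scale)
    return totals

lam = 0.4                                         # illustrative value, not an estimate
rain = simulate_daily_rainfall(lam, shape=0.9, scale=8.0, n_days=200_000)
zero_share = np.mean(rain == 0.0)                 # empirical P(L = 0)
print(zero_share, np.exp(-lam))                   # the two should be close
```

The share of exact zeros in the simulated series should approach $\exp(-\lambda)$ as the number of simulated days grows, while the positive values are continuously distributed.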

Lemma 2. The probability density function of $L$ in (2) is

$$f_\theta(L) = \exp(-\lambda)\,\delta_0(L) + \exp(-\lambda - \alpha L)\,L^{-1} r_P\left(\nu L^{P}\right), \tag{8}$$

where $\delta_0(L)$ is a Dirac function at zero.

Proof. Let $q_0 = 1 - p_0$ be the probability that it rained. Hence for $L_i > 0$ we have

$$\begin{aligned}
f_\theta^{+}(L) &= \sum_{i=1}^{\infty} \frac{p_i}{q_0}\left(\frac{\alpha^{iP} L^{iP-1}\exp(-\alpha L)}{\Gamma(iP)}\right), \quad \text{where } p_i = \exp(-\lambda)\frac{\lambda^{i}}{i!} \\
&= \frac{1}{q_0}\left[\sum_{i=1}^{\infty} p_i \exp(-\alpha L)\frac{\alpha^{iP} L^{iP-1}}{\Gamma(iP)}\right] \\
&= \frac{1}{q_0}\left[\exp(-\alpha L)\sum_{i=1}^{\infty} p_i \frac{\alpha^{iP} L^{iP-1}}{\Gamma(iP)}\right] \\
&= \frac{1}{q_0}\left[\exp(-\alpha L)\sum_{i=1}^{\infty} \left(\exp(-\lambda)\frac{\lambda^{i}}{i!}\right)\frac{\alpha^{iP} L^{iP-1}}{\Gamma(iP)}\right] \\
&= \frac{\exp(-\lambda)}{q_0}\left[\exp(-\alpha L)\sum_{i=1}^{\infty} \left(\frac{\lambda^{i}}{i!}\right)\frac{\alpha^{iP} L^{iP-1}}{\Gamma(iP)}\right] \\
&= \frac{\exp(-\lambda)}{q_0}\exp(-\alpha L)\left[\sum_{i=1}^{\infty} \frac{\lambda^{i}(\alpha L)^{iP}}{L\, i!\,\Gamma(iP)}\right] \\
&= \frac{L^{-1}\exp(-\alpha L)}{\exp(\lambda) - 1}\sum_{i=1}^{\infty} \frac{\left(\lambda\alpha^{P} L^{P}\right)^{i}}{i!\,\Gamma(iP)}.
\end{aligned} \tag{9}$$

If we let $\nu = \lambda\alpha^{P}$ and $r_P(\nu L^{P}) = \sum_{i=1}^{\infty} (\nu L^{P})^{i}/(i!\,\Gamma(iP))$, then we have

$$f_\theta^{+}(L) = \frac{L^{-1}\exp(-\alpha L)}{\exp(\lambda) - 1}\, r_P\left(\nu L^{P}\right). \tag{10}$$


We can express the probability density function $f_\theta(L)$ in terms of a Dirac function as

$$\begin{aligned}
f_\theta(L) &= p_0\,\delta_0(L) + q_0 f_\theta^{+}(L) \\
&= \exp(-\lambda)\,\delta_0(L) + \left[\frac{q_0}{\exp(\lambda) - 1}\right] L^{-1}\exp(-\alpha L)\, r_P\left(\nu L^{P}\right) \\
&= \exp(-\lambda)\,\delta_0(L) + \exp(-\lambda - \alpha L)\, L^{-1} r_P\left(\nu L^{P}\right).
\end{aligned} \tag{11}$$

Consider a random sample of size $n$ of $L_i$ with the probability density function

$$f_\theta(L) = \exp(-\lambda)\,\delta_0(L) + \exp(-\lambda - \alpha L)\, L^{-1} r_P\left(\nu L^{P}\right). \tag{12}$$

If we assume that there are $m$ positive values $L_1, L_2, \ldots, L_m$, then there are $M = n - m$ zeros, where $m > 0$.

We observe that $m \sim \text{Bi}(n, 1 - \exp(-\lambda))$ and $p(m = 0) = \exp(-n\lambda)$; hence the likelihood function is

$$L = \binom{n}{m} p_0^{\,n-m} q_0^{\,m} \prod_{i=1}^{m} f_\theta^{+}(L_i) \tag{13}$$

and the log-likelihood for $\theta = (\lambda, \alpha, P)$ is

$$\begin{aligned}
\log L(\theta; L_1, L_2, \ldots, L_n) &= \log\left(\binom{n}{m} p_0^{\,n-m} q_0^{\,m} \prod_{i=1}^{m} f_\theta^{+}(L_i)\right) \\
&= \log\left(\binom{n}{m} e^{-\lambda n + \lambda m}\left(1 - e^{-\lambda}\right)^{m} \prod_{i=1}^{m} e^{-\lambda - \alpha L_i}\frac{1}{L_i}\sum_{j=1}^{\infty} \frac{\left(\lambda\alpha^{P} L_i^{P}\right)^{j}}{j!\,\Gamma(jP)}\right) \\
&= \log\binom{n}{m} + \lambda(m - n) + m\log\left(1 - e^{-\lambda}\right) + \sum_{i=1}^{m}\left(-\lambda - \alpha L_i - \log L_i\right) \\
&\quad + \log\sum_{i=1}^{m}\sum_{j=1}^{\infty} \frac{\left(\lambda\alpha^{P} L_i^{P}\right)^{j}}{j!\,\Gamma(jP)}.
\end{aligned} \tag{14}$$

Now for $\lambda$ we have

$$\begin{aligned}
\frac{\partial \log L(\theta; L_1, L_2, \ldots, L_n)}{\partial\lambda} &= m - n + \frac{m}{1 - e^{-\lambda}} + (-1)m + \frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i, \\
\frac{\partial \log L(\theta; L_1, L_2, \ldots, L_n)}{\partial\lambda} = 0 &\implies m - n + \frac{m}{1 - e^{-\lambda}} + (-1)m + \frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i = 0.
\end{aligned} \tag{15}$$

We can observe from the above evaluation that $\lambda$ cannot be expressed in closed form; a similar derivation shows that $\alpha$ cannot be expressed in closed form either. Therefore we can only estimate $\lambda$ and $\alpha$ using numerical methods. Withers and Nadarajah [13] also observed that the probability density function cannot be expressed in closed form and that it is therefore difficult to find the analytic form of the estimators. So we will express the probability density function in terms of exponential dispersion models, as described below.

Definition 3 (see [14]). A probability density function of the form

$$f(y; \theta, \Theta) = a(y, \Theta)\exp\left\{\frac{1}{\Theta}\left[y\theta - k(\theta)\right]\right\} \tag{16}$$

for suitable functions $k(\cdot)$ and $a(\cdot)$ is called an exponential dispersion model.

$\Theta > 0$ is the dispersion parameter. The function $k(\theta)$ is the cumulant of the exponential dispersion model; when $\Theta = 1$, the derivatives of $k(\cdot)$ are the successive cumulants of the distribution [15]. The exponential dispersion models were first introduced by Fisher in 1922.

If we let $L_i = \log f(y_i; \theta_i, \Theta)$ be the contribution of $y_i$ to the likelihood function $L = \sum_i L_i$, then

$$\begin{aligned}
L_i &= \frac{1}{\Theta}\left[y_i\theta_i - k(\theta_i)\right] + \log a(y_i, \Theta), \\
\frac{\partial L_i}{\partial\theta_i} &= \frac{1}{\Theta}\left(y_i - k'(\theta_i)\right), \\
\frac{\partial^{2} L_i}{\partial\theta_i^{2}} &= -\frac{1}{\Theta}k''(\theta_i).
\end{aligned} \tag{17}$$

However, we expect that $E(\partial L_i/\partial\theta_i) = 0$ and $-E(\partial^{2} L_i/\partial\theta_i^{2}) = E(\partial L_i/\partial\theta_i)^{2}$, so that

$$\begin{aligned}
E\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right) &= 0, \\
\frac{1}{\Theta}\left(E(y_i) - k'(\theta_i)\right) &= 0, \\
E(y_i) &= k'(\theta_i).
\end{aligned} \tag{18}$$

Furthermore,

$$\begin{aligned}
-E\left(\frac{\partial^{2} L_i}{\partial\theta_i^{2}}\right) &= E\left(\frac{\partial L_i}{\partial\theta_i}\right)^{2}, \\
-E\left(-\frac{1}{\Theta}k''(\theta_i)\right) &= E\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right)^{2}, \\
\frac{k''(\theta_i)}{\Theta} &= \frac{\operatorname{Var}(y_i)}{\Theta^{2}}, \\
\operatorname{Var}(y_i) &= \Theta k''(\theta_i).
\end{aligned} \tag{19}$$

Therefore the mean of the distribution is $E[Y] = \mu = dk(\theta)/d\theta$ and the variance is $\operatorname{Var}(Y) = \Theta\left(d^{2}k(\theta)/d\theta^{2}\right)$.


The relationship $\mu = dk(\theta)/d\theta$ is invertible, so that $\theta$ can be expressed as a function of $\mu$; as such we have $\operatorname{Var}(Y) = \Theta V(\mu)$, where $V(\mu)$ is called a variance function.

Definition 4. The family of exponential dispersion models whose variance functions are of the form $V(\mu) = \mu^{p}$ for $p \in (-\infty, 0] \cup [1, \infty)$ are called Tweedie family distributions.

Examples are as follows: for $p = 0$ we have a normal distribution; for $p = 1$ and $\Theta = 1$ it is a Poisson distribution; for $p = 2$ it is a Gamma distribution; and for $p = 3$ it is an inverse Gaussian distribution. Tweedie densities cannot be expressed in closed form (apart from the examples above) but can instead be identified by their cumulant generating functions.

From $\operatorname{Var}(Y) = \Theta\left(d^{2}k(\theta)/d\theta^{2}\right)$, for the Tweedie family of distributions we have

$$\operatorname{Var}(Y) = \Theta\frac{d^{2}k(\theta)}{d\theta^{2}} = \Theta V(\mu) = \Theta\mu^{p}. \tag{20}$$

Hence we can solve for $\mu$ and $k(\theta)$ as follows:

$$\begin{aligned}
\mu &= \frac{dk(\theta)}{d\theta}, \\
\frac{d\mu}{d\theta} &= \mu^{p} \implies \int\frac{d\mu}{\mu^{p}} = \int d\theta, \\
\theta &= \begin{cases} \dfrac{\mu^{1-p}}{1-p}, & p \ne 1 \\ \log\mu, & p = 1 \end{cases}
\end{aligned} \tag{21}$$

by equating the constants of integration above to zero. For $p \ne 1$ we have $\mu = [(1-p)\theta]^{1/(1-p)}$, so that

$$\begin{aligned}
\int dk(\theta) &= \int\left[(1-p)\theta\right]^{1/(1-p)} d\theta, \\
k(\theta) &= \frac{\left[(1-p)\theta\right]^{(2-p)/(1-p)}}{2-p} = \frac{\mu^{2-p}}{2-p}, \quad p \ne 2.
\end{aligned} \tag{22}$$

Proposition 5. The cumulant generating function of a Tweedie distribution for $1 < p < 2$ is

$$\log M_Y(t) = \frac{1}{\Theta}\frac{\mu^{2-p}}{p-1}\left[\left(1 + t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{23}$$

Proof. From (16) the moment generating function is given by

$$\begin{aligned}
M_Y(t) &= \int \exp(ty)\, a(y, \Theta)\exp\left\{\frac{1}{\Theta}\left[y\theta - k(\theta)\right]\right\} dy \\
&= \int a(y, \Theta)\exp\left\{\frac{1}{\Theta}\left[y(\theta + t\Theta) - k(\theta)\right]\right\} dy \\
&= \int a(y, \Theta)\exp\left(\frac{y(\theta + t\Theta) - k(\theta + t\Theta)}{\Theta} + \frac{k(\theta + t\Theta) - k(\theta)}{\Theta}\right) dy \\
&= \exp\left(\frac{k(\theta + t\Theta) - k(\theta)}{\Theta}\right)\int a(y, \Theta)\exp\left(\frac{y(\theta + t\Theta) - k(\theta + t\Theta)}{\Theta}\right) dy \\
&= \exp\left\{\frac{1}{\Theta}\left[k(\theta + t\Theta) - k(\theta)\right]\right\},
\end{aligned} \tag{24}$$

where the third line adds and subtracts $k(\theta + t\Theta)$, and the remaining integral equals one because its integrand is the density (16) with $\theta$ replaced by $\theta + t\Theta$.

Hence the cumulant generating function is

$$\log M_Y(t) = \frac{1}{\Theta}\left[k(\theta + t\Theta) - k(\theta)\right]. \tag{25}$$

For $1 < p < 2$ we substitute $\theta$ and $k(\theta)$ to have

$$\log M_Y(t) = \frac{1}{\Theta}\frac{\mu^{2-p}}{p-1}\left[\left(1 + t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{26}$$

By comparing the cumulant generating functions in Lemma 1 and Proposition 5, the compound Poisson process can be thought of as a Tweedie distribution with parameters $(\lambda, \alpha, P)$ expressed as follows:

$$\lambda = \frac{\mu^{2-p}}{\Theta(2-p)}, \qquad \alpha = \Theta(p-1)\mu^{p-1}, \qquad P = \frac{2-p}{p-1}. \tag{27}$$

The requirement that the Gamma shape parameter $P$ be positive implies that only Tweedie distributions with $1 < p < 2$ can represent the Poisson-Gamma compound process. In addition, $\lambda > 0$ and $\alpha > 0$ imply $\mu > 0$ and $\Theta > 0$.
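The mapping in (27) can be checked numerically. The sketch below is illustrative only: the function name is hypothetical, the input values are arbitrary, and $\alpha$ is read as the scale of the Gamma jumps, which is the reading under which the compound process reproduces the Tweedie mean $\mu$ and variance $\Theta\mu^{p}$.

```python
import math

def tweedie_to_poisson_gamma(mu, dispersion, p):
    """Map Tweedie (mu, dispersion, p), 1 < p < 2, to (lam, alpha, shape) as in (27)."""
    if not 1.0 < p < 2.0:
        raise ValueError("the Poisson-Gamma representation needs 1 < p < 2")
    lam = mu ** (2.0 - p) / (dispersion * (2.0 - p))   # Poisson rate
    alpha = dispersion * (p - 1.0) * mu ** (p - 1.0)   # Gamma scale (assumed reading)
    shape = (2.0 - p) / (p - 1.0)                      # Gamma shape P
    return lam, alpha, shape

lam, alpha, shape = tweedie_to_poisson_gamma(mu=3.0, dispersion=14.8057, p=1.5306)
mean = lam * shape * alpha                             # E[L] = lam * E[y]
variance = lam * shape * (1.0 + shape) * alpha ** 2    # Var[L] = lam * E[y^2]
print(mean, variance, 14.8057 * 3.0 ** 1.5306)
```

Under this reading the compound mean equals $\mu$ and the compound variance equals $\Theta\mu^{p}$, which is the Tweedie variance function in (20).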


Proposition 6. Based on the Tweedie distribution, the probability of receiving no rainfall at all is

$$P(L = 0) = \exp\left[-\frac{\mu^{2-p}}{\Theta(2-p)}\right] \tag{28}$$

and the probability of having a rainfall event is

$$P(L > 0) = W(\lambda, \alpha, L, P)\exp\left[\frac{L}{(1-p)\mu^{p-1}} - \frac{\mu^{2-p}}{2-p}\right], \tag{29}$$

where

$$W(\lambda, \alpha, L, P) = \sum_{j=1}^{\infty} \frac{\lambda^{j}(\alpha L)^{jP} e^{-\lambda}}{j!\,\Gamma(jP)}. \tag{30}$$

Proof. This follows by directly substituting the values of $\lambda$ and $\theta$, $k(\theta)$ into (16).

The function $W(\lambda, \alpha, L, P)$ is an example of Wright's generalized Bessel function; however, it cannot be expressed in terms of the more common Bessel functions. To evaluate it, the value of $j$ for which the function $W_j$ reaches its maximum is determined [15].

3. Parameter Estimation

We approximate the function $W(\lambda, \alpha, L, P) = \sum_{j=1}^{\infty} \left(\lambda^{j}(\alpha L)^{jP} e^{-\lambda}/(j!\,\Gamma(jP))\right) = \sum_{j=1}^{\infty} W_j$ following the procedure by [15], where the value of $j$ for which $W_j$ reaches its maximum is determined. We treat $j$ as continuous, so that $W_j$ is differentiated with respect to $j$ and the derivative is set to zero. So for $L > 0$ we have the following.

Lemma 7 (see [15]). The log maximum approximation of $W_j$ is given by

$$\begin{aligned}
\log W_{\max} &= \frac{L^{2-p}}{(2-p)\Theta}\left[\log\frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)} + (1+P) - P\log P - (1-P)\log\frac{L^{2-p}}{(2-p)\Theta}\right] \\
&\quad - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2-p)\Theta},
\end{aligned} \tag{31}$$

where $j_{\max} = L^{2-p}/((2-p)\Theta)$.

Proof.

$$\begin{aligned}
W(\lambda, \alpha, L, P) &= \sum_{j=1}^{\infty} \frac{\lambda^{j}(\alpha L)^{jP-1} e^{-\lambda}}{j!\,\Gamma(jP)} \\
&= \sum_{j=1}^{\infty} \frac{\lambda^{j} L^{jP-1} e^{-L/\tau} e^{-\lambda}}{j!\,\tau^{Pj}\,\Gamma(Pj)}, \quad \text{where } \tau = \frac{1}{\alpha}.
\end{aligned} \tag{32}$$

Substituting the values of $\lambda$, $\alpha$ in the above equation, we have

$$\begin{aligned}
W(\lambda, \alpha, L, P) &= \sum_{j=1}^{\infty} \left(\frac{\mu^{2-p}}{\Theta(2-p)}\right)^{j} \frac{L^{jP-1}\left[\Theta(1-p)\mu^{p-1}\right]^{jP} e^{-L/\tau} e^{-\lambda}}{j!\,\Gamma(Pj)} \\
&= e^{-L/\tau - \lambda} L^{-1}\sum_{j=1}^{\infty} \frac{\mu^{(2-p)j}\left(\Theta(p-1)\mu^{p-1}\right)^{jP} L^{jP}}{\Theta^{j}(2-p)^{j}\, j!\,\Gamma(jP)} \\
&= e^{-L/\tau - \lambda} L^{-1}\sum_{j=1}^{\infty} \frac{L^{jP}(p-1)^{jP}\mu^{(2-p)j + (p-1)jP}}{\Theta^{j(1-P)}(2-p)^{j}\, j!\,\Gamma(jP)}.
\end{aligned} \tag{33}$$

The term $\mu^{(2-p)j + (p-1)jP}$ depends on the $L$, $p$, $P$, $\Theta$ values, so we maximize the summation

$$\begin{aligned}
W(L, \Theta, P) &= \sum_{j=1}^{\infty} \frac{L^{jP}(p-1)^{jP}}{\Theta^{j(1-P)}(2-p)^{j}\, j!\,\Gamma(jP)} \\
&= \sum_{j=1}^{\infty} \frac{z^{j}}{j!\,\Gamma(jP)}, \quad \text{where } z = \frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)} \\
&= \sum_{j=1}^{\infty} W_j.
\end{aligned} \tag{34}$$

Considering $W_j$, we have

$$\log W_j = j\log z - \log j! - \log\Gamma(Pj) = j\log z - \log\Gamma(j+1) - \log\Gamma(Pj). \tag{35}$$

Using Stirling's approximation of Gamma functions, we have

$$\begin{aligned}
\log\Gamma(1+j) &\approx (1+j)\log(1+j) - (1+j) + \frac{1}{2}\log\left(\frac{2\pi}{1+j}\right), \\
\log\Gamma(Pj) &\approx Pj\log(Pj) - Pj + \frac{1}{2}\log\left(\frac{2\pi}{Pj}\right).
\end{aligned} \tag{36}$$

And hence we have

$$\log W_j \approx j\left[\log z + (1+P) - P\log P - (1-P)\log j\right] - \log(2\pi) - \frac{1}{2}\log P - \log j. \tag{37}$$

For $1 < p < 2$ we have $P = (2-p)/(p-1) > 0$; hence the logarithms have positive arguments. Differentiating with respect to $j$, we have

$$\frac{\partial\log W_j}{\partial j} \approx \log z - \frac{1}{j} - \log j - P\log(Pj) \approx \log z - \log j - P\log(Pj), \tag{38}$$


where $1/j$ is ignored for large $j$. Solving $(\partial\log W_j)/\partial j = 0$, we have

$$j_{\max} = \frac{L^{2-p}}{(2-p)\Theta}. \tag{39}$$

Substituting $j_{\max}$ in $\log W_j$ to find the maximum approximation of $W_j$, we have

$$\begin{aligned}
\log W_{\max} &= \frac{L^{2-p}}{(2-p)\Theta}\left[\log\frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)} + (1+P) - P\log P - (1-P)\log\frac{L^{2-p}}{(2-p)\Theta}\right] \\
&\quad - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2-p)\Theta}.
\end{aligned} \tag{40}$$

Hence the result follows.

It can be observed that $\partial W_j/\partial j$ is monotonically decreasing; hence $\log W_j$ is strictly convex as a function of $j$. Therefore $W_j$ decays faster than geometrically on either side of $j_{\max}$ [15]. Therefore, if we are to estimate $W(L, \Theta, P)$ by $\hat{W}(L, \Theta, P) = \sum_{j=j_d}^{j_u} W_j$, the approximation error is bounded by a geometric sum:

$$\begin{aligned}
W(L, \Theta, P) - \hat{W}(L, \Theta, P) &< W_{j_d - 1}\frac{1 - r_l^{\,j_d - 1}}{1 - r_l} + W_{j_u + 1}\frac{1}{1 - r_u}, \\
r_l &= \exp\left(\frac{\partial W_j}{\partial j}\right)\bigg|_{j = j_d - 1}, \\
r_u &= \exp\left(\frac{\partial W_j}{\partial j}\right)\bigg|_{j = j_u + 1}.
\end{aligned} \tag{41}$$

For quick and accurate evaluation of $W(\lambda, \alpha, L, P)$, the series is summed over only those terms that contribute significantly to the sum.
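The idea of summing only the significant terms can be sketched for the generic series $\sum_{j \ge 1} z^{j}/(j!\,\Gamma(jP))$ of (34). This is a minimal illustration rather than the procedure of [15]: the function name, the tolerance of 37 log-units and the example values of `z` and `shape` are assumptions, and the peak is located by scanning the log-terms instead of using the closed form (39).

```python
import math

def log_series_sum(z, shape, tol=37.0, max_terms=100_000):
    """Return log(sum_{j>=1} z**j / (j! * Gamma(j*shape))), stopping past the peak."""
    log_terms = []
    peak = -math.inf
    for j in range(1, max_terms + 1):
        # log W_j = j log z - log Gamma(j + 1) - log Gamma(j * shape), as in (35)
        log_w = j * math.log(z) - math.lgamma(j + 1.0) - math.lgamma(j * shape)
        peak = max(peak, log_w)
        log_terms.append(log_w)
        # The log-terms rise to a single peak and then decay; once a term is
        # more than tol log-units below the peak, later terms are negligible.
        if log_w < peak - tol:
            break
    # Log-sum-exp around the peak avoids overflow for large z.
    return peak + math.log(sum(math.exp(lw - peak) for lw in log_terms))

print(log_series_sum(z=50.0, shape=0.8847))
```

Working on the log scale keeps the evaluation stable when individual terms are very large or very small, which is the practical reason for centring the sum on the largest term.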

Generalized linear models extend the standard linear regression models to incorporate nonnormal response distributions and possibly nonlinear functions of the mean. The advantage of GLMs is that the fitting process maximizes the likelihood for the chosen distribution of the random variable $y$, and the choice is not restricted to normality, unlike linear regression [16].

The exponential dispersion models are the response distributions for the generalized linear models. Tweedie distributions are members of the exponential dispersion models upon which the generalized linear models are based. Consequently, fitting a Tweedie distribution follows the framework of fitting a generalized linear model.

Lemma 8. In case of a canonical link function, the sufficient statistics for $\{\beta_j\}$ are $\{\sum_{i=1}^{n} y_i x_{ij}\}$.

Proof. For $n$ independent observations $y_i$ of the exponential dispersion model (16), the log-likelihood function is

$$L(\beta) = \sum_{i=1}^{n} L_i = \sum_{i}^{n} \log f(y_i; \theta_i, \Theta) = \sum_{i=1}^{n} \frac{y_i\theta_i - k(\theta_i)}{\Theta} + \sum_{i}^{n} \log a(y_i, \Theta). \tag{42}$$

But $\theta_i = \sum_{j}^{p} \beta_j x_{ij}$; hence

$$\sum_{i}^{n} y_i\theta_i = \sum_{i=1}^{n} y_i \sum_{j}^{p} \beta_j x_{ij} = \sum_{j}^{p} \beta_j \sum_{i=1}^{n} y_i x_{ij}. \tag{43}$$

Proposition 9. Given that $y_i$ is distributed as (16), its distribution depends only on its first two moments, namely, $\mu_i$ and $\operatorname{Var}(y_i)$.

Proof. Let $g(\mu_i)$ be the link function of the GLM such that $\eta_i = \sum_{j=1}^{p} \beta_j x_{ij} = g(\mu_i)$. The likelihood equations are

$$\frac{\partial L(\beta)}{\partial\beta_j} = \sum_{i=1}^{n} \frac{\partial L_i}{\partial\beta_j}, \quad \forall j. \tag{44}$$

Using the chain rule, we have

$$\frac{\partial L_i}{\partial\beta_j} = \frac{\partial L_i}{\partial\theta_i}\frac{\partial\theta_i}{\partial\mu_i}\frac{\partial\mu_i}{\partial\eta_i}\frac{\partial\eta_i}{\partial\beta_j} = \frac{y_i - \mu_i}{\operatorname{Var}(y_i)}x_{ij}\frac{\partial\mu_i}{\partial\eta_i}. \tag{45}$$

Hence

$$\frac{\partial L(\beta)}{\partial\beta_j} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{\operatorname{Var}(y_i)}x_{ij}\frac{\partial\mu_i}{\partial\eta_i} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{\Theta\mu_i^{p}}x_{ij}\frac{\partial\mu_i}{\partial\eta_i}. \tag{46}$$

Since $\operatorname{Var}(y_i) = \Theta V(\mu_i)$, the relationship between the mean and the variance characterizes the distribution.

Clearly a GLM only requires the first two moments of the response $y_i$; hence, despite the difficulty of full likelihood analysis of the Tweedie distribution, whose density cannot be expressed in closed form for $1 < p < 2$, we can still fit a Tweedie distribution family. The likelihood is only required to estimate $p$ and $\Theta$ and for diagnostic checks of the model.

Proposition 10. Under the standard regularity conditions, for large $n$, the maximum likelihood estimator $\hat{\beta}$ of $\beta$ for a generalized linear model is efficient and has an approximate normal distribution.

Proof. From the log-likelihood, the covariance matrix of the distribution is the inverse of the information matrix $J = E\left(-\partial^{2}L(\beta)/\partial\beta_h\partial\beta_j\right)$. So

$$\begin{aligned}
J = E\left(-\frac{\partial^{2}L(\beta)}{\partial\beta_h\partial\beta_j}\right) &= E\left[\left(\frac{\partial L_i}{\partial\beta_h}\right)\left(\frac{\partial L_i}{\partial\beta_j}\right)\right] \\
&= E\left[\left(\frac{y_i - \mu_i}{\operatorname{Var}(y_i)}x_{ih}\frac{\partial\mu_i}{\partial\eta_i}\right)\left(\frac{y_i - \mu_i}{\operatorname{Var}(y_i)}x_{ij}\frac{\partial\mu_i}{\partial\eta_i}\right)\right] \\
&= \frac{x_{ih}x_{ij}}{\operatorname{Var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^{2}.
\end{aligned} \tag{47}$$

Hence

$$E\left(-\frac{\partial^{2}L(\beta)}{\partial\beta_h\partial\beta_j}\right) = \sum_{i}^{n} \frac{x_{ih}x_{ij}}{\operatorname{Var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^{2} = \left(X^{T}WX\right), \tag{48}$$

where $W = \operatorname{diag}\left[(1/\operatorname{Var}(y_i))(\partial\mu_i/\partial\eta_i)^{2}\right]$. Therefore $\hat{\beta}$ has an approximate $N[\beta, (X^{T}WX)^{-1}]$ distribution with $\operatorname{Var}(\hat{\beta}) = (X^{T}\hat{W}X)^{-1}$, where $\hat{W}$ is $W$ evaluated at $\hat{\beta}$.

To compute $\hat{\beta}$ we use the iteratively reweighted least squares algorithm proposed by Dobson and Barnett [17], where the iterations use the working weights

$$\frac{w_i}{V(\mu_i)\,\dot{g}(\mu_i)^{2}}, \tag{49}$$

where $V(\mu_i) = \mu_i^{p}$.

However, estimating $p$ is more difficult than estimating $\beta$ and $\Theta$, such that most researchers working with Tweedie densities specify $p$ a priori. In this study we use the procedure in [15], where the maximum likelihood estimator of $p$ is obtained by directly maximizing the profile likelihood function. For any given value of $p$ we find the maximum likelihood estimates of $\beta$ and $\Theta$ and compute the log-likelihood function. This is repeated over several values of $p$ until we have the value that maximizes the log-likelihood function.
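For a fixed value of $p$, the estimation of $\beta$ by iteratively reweighted least squares can be sketched as follows. This is an illustrative implementation under stated assumptions, not the algorithm as coded in [17]: it assumes a log link (so that $\dot{g}(\mu) = 1/\mu$ and the working weights (49) reduce to $\mu^{2-p}$ with unit prior weights), starts from a constant mean, and uses hypothetical names and noise-free test data.

```python
import numpy as np

def fit_tweedie_irls(X, y, p, n_iter=50, tol=1e-10):
    """IRLS for a Tweedie GLM with log link and variance function V(mu) = mu**p."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = np.full_like(y, y.mean())        # start from a constant mean
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = mu ** (2.0 - p)               # 1 / (V(mu) * g'(mu)**2) for the log link
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * w                     # scales each observation by its weight
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

days = np.arange(1, 366)
X = np.column_stack([np.ones(days.size),
                     np.sin(2 * np.pi * days / 365),
                     np.cos(2 * np.pi * days / 365)])
beta_true = np.array([0.2, 0.9, 2.0])     # hypothetical coefficients
y = np.exp(X @ beta_true)                 # noise-free responses, for checking only
print(fit_tweedie_irls(X, y, p=1.5))
```

In a profile likelihood search, a routine of this kind would be called once for each candidate $p$ on a grid, with the log-likelihood evaluated at the resulting estimates.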

Given the estimated values of $p$ and $\beta$, the unbiased estimator of $\Theta$ is given by

$$\hat{\Theta} = \sum_{i=1}^{n} \frac{\left[L_i - \mu_i(\hat{\beta})\right]^{2}}{\mu_i(\hat{\beta})^{p}}. \tag{50}$$

Since for $1 < p < 2$ the Tweedie density cannot be expressed in closed form, it is recommended that the maximum likelihood estimate of $\Theta$ be computed iteratively from the full data [15].

4. Data and Model Fitting

4.1. Data Analysis. Daily rainfall data of Balaka district in Malawi covering the period 1995–2015 is used. The data was obtained from Meteorological Surveys of Malawi. Figure 1 shows a plot of the data.

In summary, the minimum value is 0 mm, which indicates that there was no rainfall on particular days, whereas the maximum amount is 123.7 mm. The mean rainfall for the whole period is 3.167 mm.

Figure 1: Daily rainfall amount for Balaka district (panel title: Rainfall for Balaka for 10 years; x-axis: Year, 1995 to 2015; y-axis: Amount).

Figure 2: Variance mean relationship (x-axis: log(mean); y-axis: log(variance)).

We investigated the relationship between the variance and the mean of the data by plotting log(variance) against log(mean), as shown in Figure 2. From the figure we can observe a linear relationship between the log of the variance and the log of the mean, which can be expressed as

$$\log(\text{Variance}) = \alpha + \beta\log(\text{mean}), \tag{51}$$

$$\text{Variance} = A \cdot \text{mean}^{\beta}, \quad A \in \mathbb{R}. \tag{52}$$

Hence the variance can be expressed as some power $\beta \in \mathbb{R}$ of the mean, agreeing with the Tweedie variance function requirement.
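The relationship in (51) and (52) amounts to a straight-line fit on the log scale. A minimal sketch, assuming the group means and variances have already been computed (how the data were grouped is not stated here, and the function name and numbers below are purely synthetic):

```python
import numpy as np

def fit_power_variance(means, variances):
    """Least-squares fit of log(variance) = a + b*log(mean); returns (A, b) with A = exp(a)."""
    log_mean = np.log(np.asarray(means, dtype=float))
    log_var = np.log(np.asarray(variances, dtype=float))
    slope, intercept = np.polyfit(log_mean, log_var, deg=1)  # highest degree first
    return float(np.exp(intercept)), float(slope)

# Synthetic group statistics that follow Variance = A * mean**b exactly.
group_means = np.array([0.2, 0.5, 1.0, 3.0, 8.0])
group_vars = 14.8 * group_means ** 1.53
A, b = fit_power_variance(group_means, group_vars)
print(A, b)
```

A fitted slope between 1 and 2 would be consistent with the compound Poisson-Gamma range of the Tweedie power parameter.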

4.2. Fitted Model. To model the daily rainfall data we use sin and cos terms as predictors, due to the cyclic nature and seasonality of rainfall. For uniformity in our modeling, we have assumed that February ends on the 28th in all years.

The canonical link function is given by

$$\log\mu_i = a_0 + a_1\sin\left(\frac{2\pi i}{365}\right) + a_2\cos\left(\frac{2\pi i}{365}\right), \tag{53}$$

where $i = 1, 2, \ldots, 365$ corresponds to days of the year and $a_0, a_1, a_2$ are the coefficients of regression.

In the first place we estimate $p$ by maximizing the profile log-likelihood function. Figure 3 shows the graph of the profile log-likelihood function. As can be observed, the value of $p$ that maximizes the function is 1.5306.

From the results obtained after fitting the model, both the cyclic cosine and sine terms are important characteristics for daily rainfall (Table 1). The covariates were determined to take into account the seasonal variations in the stochastic model.


Table 1: Estimated parameter values.

| Parameter | Estimate | Std. error | $t$ value | Pr(>\|t\|) |
| --- | --- | --- | --- | --- |
| $a_0$ | 0.1653 | 0.0473 | 3.4930 | 0.0005*** |
| $a_1$ | 0.9049 | 0.0572 | 15.81100 | <2e−16*** |
| $a_2$ | 2.0326 | 0.0622 | 32.6720 | <2e−16*** |
| $\hat{\Theta}$ | 14.8057 | - | - | - |

With signif. code 0 "***".

Figure 3: Profile likelihood (x-axis: $\xi$ index, 1.2 to 1.8; y-axis: $L$, with the 95% confidence interval marked).

The predicted $\hat{\mu}_i$, $\hat{p}$, $\hat{\Theta}$ for each day only depend on the day's conditions, so that for each day $i$ we have

$$\begin{aligned}
\hat{\mu}_i &= \exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right], \\
\hat{p} &= 1.5306, \\
\hat{\Theta} &= 14.8057.
\end{aligned} \tag{54}$$

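From these estimated values we can calculate the parameters $(\hat{\lambda}_i, \hat{\alpha}_i, \hat{P})$ from the corresponding formulas above as

$$\begin{aligned}
\hat{\lambda}_i &= \frac{1}{6.5716}\left(\exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.4694}, \\
\hat{\alpha}_i &= 7.4284\left(\exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.5306}, \\
\hat{P} &= 0.8847.
\end{aligned} \tag{55}$$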
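The day-by-day quantities can be evaluated directly from the fitted coefficients. The sketch below uses the estimates reported in (54) together with formulas (27) and (28); the function names are hypothetical, and because it plugs in $\hat{\Theta} = 14.8057$ as reported, its leading constants may differ slightly from the rounded ones printed in (55).

```python
import math

A0, A1, A2 = 0.1653, 0.9049, 2.0326       # regression coefficients from (54)
P_HAT, THETA_HAT = 1.5306, 14.8057        # power and dispersion estimates from (54)

def predicted_mean(day):
    """Fitted mean rainfall for day-of-year `day` (1..365) under the log link (53)."""
    angle = 2.0 * math.pi * day / 365.0
    return math.exp(A0 + A1 * math.sin(angle) + A2 * math.cos(angle))

def poisson_rate(day):
    """lambda_i from (27): mu**(2 - p) / (Theta * (2 - p))."""
    return predicted_mean(day) ** (2.0 - P_HAT) / (THETA_HAT * (2.0 - P_HAT))

def prob_no_rain(day):
    """P(L = 0) = exp(-lambda_i), as in (28)."""
    return math.exp(-poisson_rate(day))

for day in (1, 183, 365):
    print(day, predicted_mean(day), prob_no_rain(day))
```

Evaluating these functions over all 365 days gives the seasonal curves for the mean and for the probability of a dry day implied by the fitted model.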

Comparing the actual means and the predicted means for 2 July, we have $\mu = 0.3820$ whereas $\hat{\mu} = 0.4333$; similarly, for 31 December we have $\mu = 9.0065$ and $\hat{\mu} = 10.6952$, respectively. Figure 4 shows the estimated mean and the actual mean; the model generally behaves well.

4.3. Goodness of Fit of the Model. Let the maximum likelihood estimate of $\theta_i$ be $\hat{\theta}_i$ for all $i$ and $\hat{\mu}$ the model's mean estimate. Let $\tilde{\theta}_i$ denote the estimate of $\theta_i$ for the saturated model with corresponding $\tilde{\mu} = y_i$.

Figure 4: Actual versus predicted mean (panel title: Rainfall means; x-axis: Days of the year; y-axis: Mean; series: Actual mean, Predicted mean).

The goodness of fit is determined by the deviance, which is defined as

$$\begin{aligned}
-2\log\left[\frac{\text{maximum likelihood of the fitted model}}{\text{maximum likelihood of the saturated model}}\right] &= -2\left[L(\hat{\mu}; y) - L(y; y)\right] \\
&= 2\sum_{i=1}^{n} \frac{y_i\tilde{\theta}_i - k(\tilde{\theta}_i)}{\Theta} - 2\sum_{i=1}^{n} \frac{y_i\hat{\theta}_i - k(\hat{\theta}_i)}{\Theta} \\
&= 2\sum_{i=1}^{n} \frac{y_i(\tilde{\theta}_i - \hat{\theta}_i) - k(\tilde{\theta}_i) + k(\hat{\theta}_i)}{\Theta} = \frac{\operatorname{Dev}(y, \hat{\mu})}{\Theta}.
\end{aligned} \tag{56}$$

$\operatorname{Dev}(y, \hat{\mu})$ is called the deviance of the model; the greater the deviance, the poorer the fitted model, as maximizing the likelihood corresponds to minimizing the deviance.

In terms of Tweedie distributions with $1 < p < 2$, the deviance is

$$\operatorname{Dev}_p = 2\sum_{i=1}^{n}\left(\frac{y_i^{2-p} - (2-p)\,y_i\,\mu_i^{1-p} + (1-p)\,\mu_i^{2-p}}{(1-p)(2-p)}\right). \tag{57}$$
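Equation (57) translates directly into a few lines of code. The sketch below is illustrative (the function name and the example values are hypothetical); it also shows that days with exactly zero rainfall pose no difficulty, since $y^{2-p} = 0$ when $y = 0$ and $p < 2$.

```python
import numpy as np

def tweedie_deviance(y, mu, p):
    """Total Tweedie deviance (57) for 1 < p < 2; y may contain exact zeros, mu must be > 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    numerator = (y ** (2.0 - p)
                 - (2.0 - p) * y * mu ** (1.0 - p)
                 + (1.0 - p) * mu ** (2.0 - p))
    return 2.0 * np.sum(numerator / ((1.0 - p) * (2.0 - p)))

observed = np.array([0.0, 0.0, 1.0, 12.5])    # synthetic daily amounts
fitted = np.array([0.5, 2.0, 2.0, 9.0])       # synthetic fitted means
print(tweedie_deviance(observed, fitted, p=1.5))
print(tweedie_deviance(fitted, fitted, p=1.5))  # perfect fit gives zero deviance
```

The null deviance mentioned below would correspond to calling the same function with every fitted mean replaced by the overall sample mean.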

Based on the results from fitting the model, the residual deviance is 43144, less than the null deviance of 62955, which implies that the fitted model explains the data better than a null model.

4.4. Diagnostic Check. The model diagnostic is considered as a way of residual analysis. The fitted model is difficult to assess, especially for days with no rainfall at all, as they produce spurious results and distracting patterns, as similarly observed by [15]. Since this is a nonnormal regression, the residuals are far from being normally distributed and from having equal variances, unlike in a normal linear regression. Here the residuals lie parallel to distinct values; hence it is difficult to make any meaningful decision about the fitted model (Figure 5).

Figure 5: Residuals of the model (x-axis: Year, 1995 to 2015; y-axis: Residuals).

Figure 6: Q-Q plot of the quantile residuals (panel title: Normal Q-Q Plot; x-axis: Theoretical quantiles; y-axis: Sample quantiles).

So we assess the model based on quantile residuals, which remove the pattern in discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.

The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.

Mathematically, let $a_i = \lim_{y \uparrow y_i} F(y; \hat{\mu}_i, \hat{\Theta})$ and $b_i = F(y_i; \hat{\mu}_i, \hat{\Theta})$, where $F$ is the cumulative distribution function of the probability density function $f(y; \mu, \Theta)$; then the randomized quantile residuals for $y_i$ are

$$r_{q,i} = \Phi^{-1}(u_i), \tag{58}$$

with $u_i$ being a uniform random variable on $(a_i, b_i]$. The randomized quantile residuals are normally distributed, barring the variability in $\hat{\mu}$ and $\hat{\Theta}$.
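A sketch of the randomized quantile residuals (58) for the compound Poisson-Gamma case is given below. It is illustrative only: the function names are hypothetical, a single set of parameters is used for all observations (in the fitted model they vary by day), the Gamma jumps are parameterised by shape and scale, and the distribution function is approximated by truncating the Poisson mixture at `max_events` terms.

```python
import numpy as np
from scipy import stats

def cpg_cdf(y, lam, shape, scale, max_events=200):
    """Approximate F(y) for y > 0 as P(L = 0) plus a truncated Poisson mixture of Gamma CDFs."""
    j = np.arange(1, max_events + 1)
    weights = stats.poisson.pmf(j, lam)                       # P(N = j)
    return np.exp(-lam) + np.sum(weights * stats.gamma.cdf(y, a=j * shape, scale=scale))

def quantile_residuals(y, lam, shape, scale, seed=0):
    """Randomized quantile residuals: Phi^{-1}(u_i) with u_i drawn between a_i and b_i."""
    rng = np.random.default_rng(seed)
    u = np.empty(len(y))
    for i, y_i in enumerate(y):
        if y_i == 0.0:
            # Point mass at zero: a_i = 0 and b_i = P(L = 0), so randomise in between.
            u[i] = rng.uniform(0.0, np.exp(-lam))
        else:
            # Continuous part: a_i = b_i = F(y_i), so no randomisation is needed.
            u[i] = cpg_cdf(y_i, lam, shape, scale)
    return stats.norm.ppf(u)

# Synthetic check: data simulated from the same model should give roughly N(0, 1) residuals.
rng = np.random.default_rng(1)
events = rng.poisson(0.4, size=2000)
sample = np.zeros(events.size)
sample[events > 0] = rng.gamma(0.9 * events[events > 0], 8.0)
residuals = quantile_residuals(sample, lam=0.4, shape=0.9, scale=8.0)
print(residuals.mean(), residuals.std())
```

Only the zero observations are randomised, which is what removes the parallel bands seen in the ordinary residuals.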

Figure 6 shows the normal Q-Q plot of the quantile residuals. As can be observed, there are no large deviations from the straight line, only small deviations at the tails. The linearity observed indicates an acceptable fitted model.

Figure 7: Simulated rainfall and observed rainfall (panel title: Rainfall amount; x-axis: Days of the year; y-axis: Amount; series: Predicted, Actual).

Figure 8: Probability of rainfall occurrence (panel title: Probability of no rainfall; x-axis: Days of the year; y-axis: Probability).

5. Simulation

The model is simulated to test whether it produces data with similar characteristics to the actual observed rainfall. The simulation is done for a period of two years, where one was the last year of the data (2015) and the other year (2016) was a future prediction. The simulated series was then compared graphically with the 2015 data, as shown in Figure 7.

The statistics of the simulated data and the actual data are shown in Table 2 for comparison.

The main objective of the simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model works well in predicting the rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.

However, the model performed poorly in predicting the probability of rainfall occurrence, which it underestimated. It is suggested here that the use of a truncated Fourier series could improve this estimation compared to the sinusoidal predictors.

The model performed better in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.

It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model suggested by [4], as follows. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8); in addition, we can also tell that the probability of rain on a given day is higher if the previous day was wet, which is the basis of precipitation models that involve the Markov process. From Figure 7 we can also observe the variation of rainfall intensity with the time of the season.

Table 2: Data statistics.

| | Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
| --- | --- | --- | --- | --- | --- | --- |
| Predicted data | 0.00 | 0.00 | 0.00 | 3.314 | 0.00 | 116.5 |
| Actual data [10 yrs] | 0.00 | 0.00 | 0.00 | 3.183 | 0.300 | 123.7 |
| Actual data [2015] | 0.00 | 0.00 | 0.00 | 3.328 | 0.00 | 84.5 |

In addition, the model allows modeling of exact zeros in the data and is able to predict the probability of a no-rainfall event simultaneously.

6. Conclusion

A daily stochastic rainfall model was developed based on a compound Poisson process, where rainfall events follow a Poisson distribution and the intensity of each event follows a Gamma distribution independently of the events. Unlike several earlier studies of precipitation modeling, in which two separate models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool to study the impact of weather on a variety of systems, including ecosystem, risk assessment, drought predictions, and weather derivatives, as it allows us to simulate synthetic rainfall data. The model provides mechanisms for understanding the fine scale structure, like the number and mean of rainfall events, mean daily rainfall, and probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.

The model developed can easily be used for forecasting future events. In terms of weather derivatives, the weather index can be derived from simulating a sample path by summing up daily precipitation in the relevant accumulation period. Rather than developing a weather index which is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.
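As a rough illustration of deriving a weather index from a simulated sample path, the sketch below draws one year of daily rainfall from a Poisson-Gamma model and sums it over an accumulation period. The seasonal event rate, the Gamma shape and scale, the seed, and the accumulation window are hypothetical values chosen for illustration; they are not the fitted Balaka parameters.

```python
import math
import random

def poisson_draw(rate, rng):
    """Draw a Poisson variate by counting unit-rate exponential arrivals in [0, rate]."""
    count, total = 0, rng.expovariate(1.0)
    while total < rate:
        count += 1
        total += rng.expovariate(1.0)
    return count

def seasonal_rate(day):
    """Hypothetical seasonal event rate (events per day), highest around the turn of the year."""
    return math.exp(-1.5 + 1.2 * math.cos(2 * math.pi * day / 365))

def simulate_year(rate_for_day, shape, scale, rng):
    """Return 365 daily totals; each day sums N Gamma-distributed event amounts."""
    path = []
    for day in range(1, 366):
        events = poisson_draw(rate_for_day(day), rng)
        path.append(sum(rng.gammavariate(shape, scale) for _ in range(events)))
    return path

def rainfall_index(path, first_day, last_day):
    """Cumulative rainfall over the accumulation period [first_day, last_day]."""
    return sum(path[first_day - 1:last_day])

rng = random.Random(2016)                      # fixed seed so the path is reproducible
path = simulate_year(seasonal_rate, shape=0.9, scale=8.0, rng=rng)
index = rainfall_index(path, first_day=1, last_day=90)
dry_days = sum(1 for amount in path if amount == 0.0)
print(f"index over days 1-90: {index:.1f} mm, dry days in the year: {dry_days}")
```

Repeating the simulation over many seeds would give a distribution of the index, which is the quantity a pricing exercise would work with.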

Rainfall data is generally zero inflated, in that the amount of rainfall received on a day can be zero with a positive probability but is continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Hence expressing it in terms of a Tweedie distribution makes estimating the parameters easy. In addition, Tweedie distributions belong to the exponential family of distributions upon which generalized linear models are based; hence there is an already existing framework in place for fitting and diagnostic testing of the model.

The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps in reducing underestimation of the overdispersion of the data, which is also common in the aforementioned models.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.

References

[1] A. Hussain, "Stochastic modeling of rainfall processes: a Markov chain-mixed exponential model for rainfalls in different climatic conditions."

[2] M. Cao, A. Li, and J. Z. Wei, "Precipitation modeling and contract valuation: a frontier in weather derivatives," The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.

[3] G. Leobacher and P. Ngare, "On modelling and pricing rainfall derivatives with seasonality," Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.

[4] M. Odening, O. Musshoff, and W. Xu, "Analysis of rainfall derivatives using daily precipitation models: opportunities and pitfalls," Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.

[5] D. S. Wilks, "Multisite generalization of a daily stochastic precipitation generation model," Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.

[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, "Rainfall modelling using Poisson-cluster processes: a review of developments," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.

[7] R. Carmona and P. Diko, "Pricing precipitation based derivatives," International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.

[8] F. E. Benth and J. S. Benth, Modeling and Pricing in Financial Markets for Weather Derivatives, vol. 17, World Scientific, 2012.

[9] B. Lopez Cabrera, M. Odening, and M. Ritter, "Pricing rainfall futures at the CME," Technical report, Humboldt University, Collaborative Research Center.

[10] T. I. Harrold, A. Sharma, and S. J. Sheather, "A nonparametric model for stochastic generation of daily rainfall occurrence," Water Resources Research, vol. 39, no. 10, 2003.

[11] D. Lord, "Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter," Accident Analysis & Prevention, vol. 38, no. 4, pp. 751–766, 2006.

[12] J.-P. Wang, "Estimating species richness by a Poisson-compound gamma model," Biometrika, vol. 97, no. 3, pp. 727–740, 2010.

[13] C. S. Withers and S. Nadarajah, "On the compound Poisson-gamma distribution," Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.

[14] E. W. Frees, R. Derrig, and G. Meyers, Regression Modeling with Actuarial and Financial Applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.

[15] P. K. Dunn and G. K. Smyth, "Evaluation of Tweedie exponential dispersion model densities by Fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.

[16] A. Agresti, Foundations of Linear and Generalized Linear Models, John Wiley & Sons, 2015.

[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.



2.2. Mathematical Formulation. Let $N_t$ be the total number of rainfall events per day, following a Poisson process such that

\[
P(N_t = n) = e^{-\lambda}\frac{\lambda^{n}}{n!}, \quad \forall n \in \mathbb{N}, \qquad N_t = \sum_{t \ge 1} \mathbf{1}_{[t,\infty)}(t). \tag{1}
\]

The amount of rainfall is the total sum of the jumps of each rainfall event, say $(y_i)_{i \ge 1}$, assumed to be independently and identically Gamma distributed and independent of the times of occurrence of rainfall:

\[
L(t) = \begin{cases} \sum_{i=1}^{N_t} y_i, & N_t = 1, 2, 3, \ldots, \\ 0, & N_t = 0, \end{cases} \tag{2}
\]

such that $y_i \sim \mathrm{Gamma}(\alpha, P)$ with probability density function

\[
f(y) = \begin{cases} \dfrac{\alpha^{P} y^{P-1} e^{-\alpha y}}{\Gamma(P)}, & y > 0, \\ 0, & y \le 0. \end{cases} \tag{3}
\]

Lemma 1. The compound Poisson process (2) has cumulant function

\[
\psi(s, t, x) = \lambda t \left(M_Y(x) - 1\right) \tag{4}
\]

for $0 \le s < t$ and $x \in \mathbb{R}$, where $M_Y(x)$ is the moment generating function of the Gamma distribution.

Proof. The moment generating function $M_L(s)$ of $L(t)$ is given by

\[
\begin{aligned}
M_L(s) &= \mathbb{E}\left(e^{sL(t)}\right) \\
&= \sum_{j=0}^{\infty} \mathbb{E}\left(e^{sL(t)} \mid N(t) = j\right) P(N(t) = j) \\
&= \sum_{j=0}^{\infty} \mathbb{E}\left(e^{s(y_1 + y_2 + \cdots + y_j)} \mid N(t) = j\right) P(N(t) = j) \\
&= \sum_{j=0}^{\infty} \mathbb{E}\left(e^{s(y_1 + y_2 + \cdots + y_j)}\right) P(N(t) = j) \quad \text{(because of independence of the } y_i \text{ and } N(t)\text{)} \\
&= \sum_{j=0}^{\infty} \left(M_Y(s)\right)^{j} e^{-\lambda t}\frac{(\lambda t)^{j}}{j!} \\
&= e^{-\lambda t} \sum_{j=0}^{\infty} \frac{\left(M_Y(s)\right)^{j} (\lambda t)^{j}}{j!} \\
&= e^{-\lambda t + M_Y(s)\lambda t}.
\end{aligned}
\tag{5}
\]

So the cumulant of $L$, taking $t = 1$, is

\[
\ln M_L(s) = \lambda\left(M_Y(s) - 1\right) = \lambda\left[(1 - \alpha s)^{-P} - 1\right]. \tag{6}
\]

If we observe the occurrence of rainfall for $n$ periods, then we have the sequence $\{L_i\}_{i=1}^{n}$, which is independent and identically distributed.

If on a particular day no rainfall occurred, then

\[
P(L = 0) = \frac{\exp(-\lambda)\,\lambda^{0}}{0!} = \exp(-\lambda) = p_0. \tag{7}
\]

Therefore the process has a point mass at 0, which implies that it is not an entirely continuous random variable.
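The point mass in (7) can be checked by simulation. The sketch below draws many independent daily totals and compares the fraction of exact zeros with $\exp(-\lambda)$. The parameter values and seed are illustrative assumptions. The Gamma parameter $\alpha$ is read here as a rate, as in the density (3), so it enters `random.gammavariate` (which takes a shape and a scale) as `1 / alpha`.

```python
import math
import random

def poisson_draw(rate, rng):
    """Draw a Poisson variate by counting unit-rate exponential arrivals in [0, rate]."""
    count, total = 0, rng.expovariate(1.0)
    while total < rate:
        count += 1
        total += rng.expovariate(1.0)
    return count

def daily_total(lam, alpha, shape, rng):
    """One draw of L: the sum of N Gamma(shape, rate=alpha) amounts with N ~ Poisson(lam)."""
    events = poisson_draw(lam, rng)
    return sum(rng.gammavariate(shape, 1.0 / alpha) for _ in range(events))

rng = random.Random(7)
lam, alpha, shape = 1.2, 0.5, 2.0        # illustrative values only
draws = [daily_total(lam, alpha, shape, rng) for _ in range(50_000)]

zero_fraction = sum(1 for x in draws if x == 0.0) / len(draws)
p0 = math.exp(-lam)                      # theoretical point mass from (7)
sample_mean = sum(draws) / len(draws)
print(f"zero fraction {zero_fraction:.4f} versus exp(-lambda) {p0:.4f}")
```

Under this rate reading the expected daily total is $\lambda P / \alpha$, which gives a second quantity to compare against the sample mean.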

Lemma 2. The probability density function of $L$ in (2) is

\[
f_\theta(L) = \exp(-\lambda)\,\delta_0(L) + \exp(-\lambda - \alpha L)\, L^{-1} r_P(\nu L^{P}), \tag{8}
\]

where $\delta_0(L)$ is a Dirac function at zero.

Proof. Let $q_0 = 1 - p_0$ be the probability that it rained. Hence for $L_i > 0$ we have

\[
\begin{aligned}
f_\theta^{+}(L) &= \sum_{i=1}^{\infty} \frac{p_i}{q_0}\left(\frac{\alpha^{iP} L^{iP-1}\exp(-\alpha L)}{\Gamma(iP)}\right), \quad \text{where } p_i = \frac{\exp(-\lambda)\,\lambda^{i}}{i!}, \\
&= \frac{1}{q_0}\left[\sum_{i=1}^{\infty} p_i \exp(-\alpha L)\frac{\alpha^{iP} L^{iP-1}}{\Gamma(iP)}\right] \\
&= \frac{1}{q_0}\left[\exp(-\alpha L)\sum_{i=1}^{\infty} p_i \frac{\alpha^{iP} L^{iP-1}}{\Gamma(iP)}\right] \\
&= \frac{1}{q_0}\left[\exp(-\alpha L)\sum_{i=1}^{\infty} \left(\frac{\exp(-\lambda)\,\lambda^{i}}{i!}\right)\frac{\alpha^{iP} L^{iP-1}}{\Gamma(iP)}\right] \\
&= \frac{\exp(-\lambda)}{q_0}\left[\exp(-\alpha L)\sum_{i=1}^{\infty} \left(\frac{\lambda^{i}}{i!}\right)\frac{\alpha^{iP} L^{iP-1}}{\Gamma(iP)}\right] \\
&= \frac{\exp(-\lambda)}{q_0}\exp(-\alpha L)\left[\sum_{i=1}^{\infty} \frac{\lambda^{i}(\alpha L)^{iP}}{i!\, L\, \Gamma(iP)}\right] \\
&= \frac{L^{-1}\exp(-\alpha L)}{\exp(\lambda) - 1}\sum_{i=1}^{\infty} \frac{(\lambda\alpha^{P}L^{P})^{i}}{i!\,\Gamma(iP)}.
\end{aligned}
\tag{9}
\]

If we let $\nu = \lambda\alpha^{P}$ and $r_P(\nu L^{P}) = \sum_{i=1}^{\infty} (\nu L^{P})^{i} / (i!\,\Gamma(iP))$, then we have

\[
f_\theta^{+}(L) = \frac{L^{-1}\exp(-\alpha L)}{\exp(\lambda) - 1}\, r_P(\nu L^{P}). \tag{10}
\]


We can express the probability density function $f_\theta(L)$ in terms of a Dirac function as

\[
\begin{aligned}
f_\theta(L) &= p_0\,\delta_0(L) + q_0 f_\theta^{+}(L) \\
&= \exp(-\lambda)\,\delta_0(L) + \left[\frac{q_0}{\exp(\lambda) - 1}\right] L^{-1}\exp(-\alpha L)\, r_P(\nu L^{P}) \\
&= \exp(-\lambda)\,\delta_0(L) + \exp(-\lambda - \alpha L)\, L^{-1} r_P(\nu L^{P}).
\end{aligned}
\tag{11}
\]
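As a numerical sanity check on (11), the continuous part of the density should integrate to $1 - \exp(-\lambda)$, the probability of a wet day. The sketch below evaluates the series $r_P$ term by term in log space and integrates with a midpoint rule. The parameter values, grid, and truncation at 120 terms are illustrative assumptions, and $\alpha$ is again read as a rate, as in (3).

```python
import math

def positive_density(L, lam, alpha, shape, max_terms=120):
    """Continuous part of (11): exp(-lam - alpha*L) * r_P(nu * L**P) / L, for L > 0."""
    # log(nu * L^P) with nu = lam * alpha^P
    log_x = math.log(lam) + shape * math.log(alpha) + shape * math.log(L)
    series = 0.0
    for i in range(1, max_terms + 1):
        # (nu * L^P)^i / (i! * Gamma(i*P)), computed through logarithms to avoid overflow
        series += math.exp(i * log_x - math.lgamma(i + 1) - math.lgamma(i * shape))
    return math.exp(-lam - alpha * L) * series / L

lam, alpha, shape = 1.5, 0.5, 2.0        # illustrative values only
step, upper = 0.02, 80.0                 # midpoint rule on (0, 80]
n_cells = int(upper / step)
mass = step * sum(positive_density((k + 0.5) * step, lam, alpha, shape)
                  for k in range(n_cells))
wet_probability = 1.0 - math.exp(-lam)
print(f"integral of continuous part {mass:.5f}, 1 - exp(-lambda) {wet_probability:.5f}")
```

Together with the point mass $\exp(-\lambda)$ at zero, the total probability is then one.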

Consider a random sample of size $n$ of $L_i$ with the probability density function

\[
f_\theta(L) = \exp(-\lambda)\,\delta_0(L) + \exp(-\lambda - \alpha L)\, L^{-1} r_P(\nu L^{P}). \tag{12}
\]

If we assume that there are $m$ positive values $L_1, L_2, \ldots, L_m$, then there are $M = n - m$ zeros, where $m > 0$.

We observe that $m \sim \mathrm{Bi}(n, 1 - \exp(-\lambda))$ and $P(m = 0) = \exp(-n\lambda)$; hence the likelihood function is

\[
L = \binom{n}{m} p_0^{\,n-m} q_0^{\,m} \prod_{i=1}^{m} f_\theta^{+}(L_i) \tag{13}
\]

and the log-likelihood for $\theta = (\lambda, \alpha, P)$ is

\[
\begin{aligned}
\log L(\theta; L_1, L_2, \ldots, L_n)
&= \log\left(\binom{n}{m} p_0^{\,n-m} q_0^{\,m} \prod_{i=1}^{m} f_\theta^{+}(L_i)\right) \\
&= \log\left(\binom{n}{m} e^{-\lambda n + \lambda m}\left(1 - e^{-\lambda}\right)^{m} \prod_{i=1}^{m} e^{-\lambda - \alpha L_i}\frac{1}{L_i}\sum_{j=1}^{\infty} \frac{(\lambda\alpha^{P}L_i^{P})^{j}}{j!\,\Gamma(jP)}\right) \\
&= \log\binom{n}{m} + \lambda(m - n) + m\log\left(1 - e^{-\lambda}\right) + \sum_{i=1}^{m}\left(-\lambda - \alpha L_i - \log L_i\right) \\
&\quad + \log\sum_{i=1}^{m}\sum_{j=1}^{\infty} \frac{(\lambda\alpha^{P}L_i^{P})^{j}}{j!\,\Gamma(jP)}.
\end{aligned}
\tag{14}
\]

Now for $\lambda$ we have

\[
\begin{aligned}
\frac{\partial \log L(\theta; L_1, L_2, \ldots, L_n)}{\partial\lambda} &= m - n + \frac{m}{1 - e^{-\lambda}} + (-1)m + \frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i, \\
\frac{\partial \log L(\theta; L_1, L_2, \ldots, L_n)}{\partial\lambda} = 0 &\Longrightarrow m - n + \frac{m}{1 - e^{-\lambda}} + (-1)m + \frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i = 0.
\end{aligned}
\tag{15}
\]

We can observe from the above evaluation that $\lambda$ cannot be expressed in closed form; a similar derivation shows that $\alpha$ cannot be expressed in closed form either. Therefore we can only estimate $\lambda$ and $\alpha$ using numerical methods. Withers and Nadarajah [13] also observed that the probability density function cannot be expressed in closed form, and therefore it is difficult to find the analytic form of the estimators. So we will express the probability density function in terms of exponential dispersion models, as described below.
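Although the full likelihood has no closed-form maximizer, the occurrence part alone does: since $m \sim \mathrm{Bi}(n, 1 - \exp(-\lambda))$, equating the wet-day fraction $m/n$ to $1 - \exp(-\lambda)$ gives $\lambda = -\log(1 - m/n)$. The sketch below is one possible way to obtain a starting value for a numerical search; it uses only the zero/positive split and is not the estimator used in this paper. The function name and the data are made up for illustration.

```python
import math

def lambda_from_zeros(amounts):
    """Estimate lambda from the wet-day fraction, using m ~ Bi(n, 1 - exp(-lambda))."""
    n = len(amounts)
    m = sum(1 for x in amounts if x > 0)          # number of positive observations
    if m == n:
        raise ValueError("no zero observations: the estimate is unbounded")
    return -math.log(1.0 - m / n)

sample = [0.0, 0.0, 3.2, 0.0, 11.5, 0.0, 0.0, 0.4, 0.0, 0.0]   # made-up daily amounts (mm)
lam_hat = lambda_from_zeros(sample)
print(f"starting value for lambda: {lam_hat:.4f}")
```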

Definition 3 (see [14]). A probability density function of the form

\[
f(y; \theta, \Theta) = a(y, \Theta)\exp\left\{\frac{1}{\Theta}\left[y\theta - k(\theta)\right]\right\} \tag{16}
\]

for suitable functions $k(\cdot)$ and $a(\cdot)$ is called an exponential dispersion model.

$\Theta > 0$ is the dispersion parameter. The function $k(\theta)$ is the cumulant of the exponential dispersion model; for $\Theta = 1$ the derivatives of $k(\cdot)$ are the successive cumulants of the distribution [15]. The exponential dispersion models were first introduced by Fisher in 1922.

If we let $L_i = \log f(y_i; \theta_i, \Theta)$ be the contribution of $y_i$ to the likelihood function $L = \sum_i L_i$, then

\[
L_i = \frac{1}{\Theta}\left[y_i\theta_i - k(\theta_i)\right] + \log a(y_i, \Theta), \qquad
\frac{\partial L_i}{\partial\theta_i} = \frac{1}{\Theta}\left(y_i - k'(\theta_i)\right), \qquad
\frac{\partial^{2} L_i}{\partial\theta_i^{2}} = -\frac{1}{\Theta}k''(\theta_i). \tag{17}
\]

However, we expect that $\mathbb{E}(\partial L_i/\partial\theta_i) = 0$ and $-\mathbb{E}(\partial^{2} L_i/\partial\theta_i^{2}) = \mathbb{E}(\partial L_i/\partial\theta_i)^{2}$, so that

\[
\mathbb{E}\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right) = 0, \qquad
\frac{1}{\Theta}\left(\mathbb{E}(y_i) - k'(\theta_i)\right) = 0, \qquad
\mathbb{E}(y_i) = k'(\theta_i). \tag{18}
\]

Furthermore,

\[
\begin{aligned}
-\mathbb{E}\left(\frac{\partial^{2} L_i}{\partial\theta_i^{2}}\right) &= \mathbb{E}\left(\frac{\partial L_i}{\partial\theta_i}\right)^{2}, \\
-\mathbb{E}\left(-\frac{1}{\Theta}k''(\theta_i)\right) &= \mathbb{E}\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right)^{2}, \\
\frac{k''(\theta_i)}{\Theta} &= \frac{\mathrm{Var}(y_i)}{\Theta^{2}}, \\
\mathrm{Var}(y_i) &= \Theta\, k''(\theta_i).
\end{aligned}
\tag{19}
\]

Therefore the mean of the distribution is $\mathbb{E}[Y] = \mu = dk(\theta)/d\theta$ and the variance is $\mathrm{Var}(Y) = \Theta\,(d^{2}k(\theta)/d\theta^{2})$.


The relationship $\mu = dk(\theta)/d\theta$ is invertible, so that $\theta$ can be expressed as a function of $\mu$; as such we have $\mathrm{Var}(Y) = \Theta V(\mu)$, where $V(\mu)$ is called the variance function.

Definition 4. The family of exponential dispersion models whose variance functions are of the form $V(\mu) = \mu^{p}$ for $p \in (-\infty, 0] \cup [1, \infty)$ are called Tweedie family distributions.

Examples are as follows: for $p = 0$ we have a normal distribution; for $p = 1$ and $\Theta = 1$ it is a Poisson distribution; for $p = 2$ it is a Gamma distribution; and for $p = 3$ it is an inverse Gaussian distribution. Tweedie densities cannot be expressed in closed form (apart from the examples above) but can instead be identified by their cumulant generating functions.

From $\mathrm{Var}(Y) = \Theta\,(d^{2}k(\theta)/d\theta^{2})$, for the Tweedie family of distributions we have

\[
\mathrm{Var}(Y) = \Theta\frac{d^{2}k(\theta)}{d\theta^{2}} = \Theta V(\mu) = \Theta\mu^{p}. \tag{20}
\]

Hence we can solve for $\mu$ and $k(\theta)$ as follows:

\[
\mu = \frac{dk(\theta)}{d\theta}, \qquad
\frac{d\mu}{d\theta} = \mu^{p} \Longrightarrow \int\frac{d\mu}{\mu^{p}} = \int d\theta, \qquad
\theta = \begin{cases} \dfrac{\mu^{1-p}}{1 - p}, & p \ne 1, \\ \log\mu, & p = 1, \end{cases}
\tag{21}
\]

by equating the constants of integration above to zero. For $p \ne 1$ we have $\mu = [(1 - p)\theta]^{1/(1-p)}$, so that

\[
\int dk(\theta) = \int \left[(1 - p)\theta\right]^{1/(1-p)} d\theta, \qquad
k(\theta) = \frac{\left[(1 - p)\theta\right]^{(2-p)/(1-p)}}{2 - p} = \frac{\mu^{2-p}}{2 - p}, \quad p \ne 2. \tag{22}
\]
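The expressions in (21) and (22) can be checked numerically: differentiating $k(\theta)$ once should return $\mu$, and twice should return $\mu^{p}$. The sketch below does this with central finite differences. The values of $\mu$ and $p$ and the step size are illustrative assumptions.

```python
def theta_of_mu(mu, p):
    """Canonical parameter from (21), for p != 1."""
    return mu ** (1.0 - p) / (1.0 - p)

def cumulant(theta, p):
    """Cumulant function k(theta) from (22), for p not equal to 1 or 2."""
    return ((1.0 - p) * theta) ** ((2.0 - p) / (1.0 - p)) / (2.0 - p)

mu, p, h = 3.0, 1.5, 1e-4                # illustrative values only
theta = theta_of_mu(mu, p)

# Central finite differences for k'(theta) and k''(theta).
k_prime = (cumulant(theta + h, p) - cumulant(theta - h, p)) / (2 * h)
k_second = (cumulant(theta + h, p) - 2 * cumulant(theta, p) + cumulant(theta - h, p)) / h ** 2

print(f"k'(theta)  = {k_prime:.6f}, mu   = {mu:.6f}")
print(f"k''(theta) = {k_second:.6f}, mu^p = {mu ** p:.6f}")
```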

Proposition 5. The cumulant generating function of a Tweedie distribution for $1 < p < 2$ is

\[
\log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2 - p}\left[\left(1 + t\Theta(1 - p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{23}
\]

Proof. From (16) the moment generating function is given by

\[
\begin{aligned}
M_Y(t) &= \int \exp(ty)\, a(y, \Theta)\exp\left\{\frac{1}{\Theta}\left[y\theta - k(\theta)\right]\right\} dy \\
&= \int a(y, \Theta)\exp\left\{\frac{1}{\Theta}\left[y(\theta + t\Theta) - k(\theta)\right]\right\} dy \\
&= \int a(y, \Theta)\exp\left(\frac{y(\theta + t\Theta) - k(\theta + t\Theta)}{\Theta} + \frac{k(\theta + t\Theta) - k(\theta)}{\Theta}\right) dy \\
&= \exp\left(\frac{k(\theta + t\Theta) - k(\theta)}{\Theta}\right)\int a(y, \Theta)\exp\left(\frac{y(\theta + t\Theta) - k(\theta + t\Theta)}{\Theta}\right) dy \\
&= \exp\left\{\frac{1}{\Theta}\left[k(\theta + t\Theta) - k(\theta)\right]\right\},
\end{aligned}
\tag{24}
\]

where the last step holds because the remaining integrand is an exponential dispersion density with canonical parameter $\theta + t\Theta$ and therefore integrates to one.

Hence the cumulant generating function is

\[
\log M_Y(t) = \frac{1}{\Theta}\left[k(\theta + t\Theta) - k(\theta)\right]. \tag{25}
\]

For $1 < p < 2$ we substitute $\theta$ and $k(\theta)$ to have

\[
\log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2 - p}\left[\left(1 + t\Theta(1 - p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{26}
\]

By comparing the cumulant generating functions in Lemma 1 and Proposition 5, the compound Poisson process can be thought of as a Tweedie distribution with parameters $(\lambda, \alpha, P)$ expressed as follows:

\[
\lambda = \frac{\mu^{2-p}}{\Theta(2 - p)}, \qquad
\alpha = \Theta(p - 1)\mu^{p-1}, \qquad
P = \frac{2 - p}{p - 1}. \tag{27}
\]

The requirement that the Gamma shape parameter $P$ be positive implies that only Tweedie distributions with $1 < p < 2$ can represent the Poisson-Gamma compound process. In addition, $\lambda > 0$ and $\alpha > 0$ imply $\mu > 0$ and $\Theta > 0$.
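The mapping in (27) is simple to code, and it can be checked against the Tweedie moments: the compound process should have mean $\mu$ and variance $\Theta\mu^{p}$. The sketch below assumes that $\alpha$ in (27) acts as a Gamma scale (mean event size $P\alpha$), which is the reading under which (6) and (26) match term by term. The values of $\mu$ and $\Theta$ are hypothetical, while $p = 1.5306$ is the estimate reported in Section 4.2.

```python
def tweedie_to_poisson_gamma(mu, dispersion, p):
    """Map Tweedie (mu, Theta, p) with 1 < p < 2 to (lambda, alpha, P) as in (27)."""
    if not 1.0 < p < 2.0:
        raise ValueError("the Poisson-Gamma representation needs 1 < p < 2")
    lam = mu ** (2.0 - p) / (dispersion * (2.0 - p))
    alpha = dispersion * (p - 1.0) * mu ** (p - 1.0)
    shape = (2.0 - p) / (p - 1.0)
    return lam, alpha, shape

mu, dispersion, p = 3.2, 12.0, 1.5306    # mu and dispersion are hypothetical values
lam, alpha, shape = tweedie_to_poisson_gamma(mu, dispersion, p)

# Moments of a compound Poisson sum with Gamma(shape, scale=alpha) event sizes.
mean_check = lam * shape * alpha                        # E[N] * E[y]
var_check = lam * alpha ** 2 * shape * (shape + 1.0)    # E[N] * E[y^2]

print(f"lambda = {lam:.4f}, alpha = {alpha:.4f}, P = {shape:.4f}")
print(f"mean {mean_check:.4f} (mu = {mu}), variance {var_check:.4f} (Theta * mu^p = {dispersion * mu ** p:.4f})")
```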


Proposition 6. Based on the Tweedie distribution, the probability of receiving no rainfall at all is

\[
P(L = 0) = \exp\left[-\frac{\mu^{2-p}}{\Theta(2 - p)}\right] \tag{28}
\]

and the probability of having a rainfall event is

\[
P(L > 0) = W(\lambda, \alpha, L, P)\exp\left[\frac{L}{(1 - p)\mu^{p-1}} - \frac{\mu^{2-p}}{2 - p}\right], \tag{29}
\]

where

\[
W(\lambda, \alpha, L, P) = \sum_{j=1}^{\infty} \frac{\lambda^{j}(\alpha L)^{jP} e^{-\lambda}}{j!\,\Gamma(jP)}. \tag{30}
\]

Proof. This follows by directly substituting the values of $\lambda$ and $\theta$, $k(\theta)$ into (16).

The function $W(\lambda, \alpha, L, P)$ is an example of Wright's generalized Bessel function; however, it cannot be expressed in terms of the more common Bessel functions. To evaluate it, the value of $j$ is determined for which the function $W_j$ reaches the maximum [15].

3. Parameter Estimation

We approximate the function $W(\lambda, \alpha, L, P) = \sum_{j=1}^{\infty} \lambda^{j}(\alpha L)^{jP} e^{-\lambda} / (j!\,\Gamma(jP)) = \sum_{j=1}^{\infty} W_j$ following the procedure in [15], where the value of $j$ is determined for which $W_j$ reaches its maximum. We treat $j$ as continuous, so that $W_j$ is differentiated with respect to $j$ and the derivative is set to zero. So for $L > 0$ we have the following.

Lemma 7 (see [15]). The log maximum approximation of $W_j$ is given by

\[
\begin{aligned}
\log W_{\max} &= \frac{L^{2-p}}{(2 - p)\Theta}\left[\log\frac{L^{P}(p - 1)^{P}}{\Theta^{(1-P)}(2 - p)} + (1 + P) - P\log P - (1 - P)\log\frac{L^{2-p}}{(2 - p)\Theta}\right] \\
&\quad - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2 - p)\Theta},
\end{aligned}
\tag{31}
\]

where $j_{\max} = L^{2-p} / ((2 - p)\Theta)$.

Proof.

\[
W(\lambda, \alpha, L, P) = \sum_{j=1}^{\infty} \frac{\lambda^{j}(\alpha L)^{jP-1} e^{-\lambda}}{j!\,\Gamma(jP)}
= \sum_{j=1}^{\infty} \frac{\lambda^{j} L^{jP-1} e^{-L/\tau} e^{-\lambda}}{j!\,\tau^{Pj}\,\Gamma(Pj)}, \quad \text{where } \tau = \frac{1}{\alpha}. \tag{32}
\]

Substituting the values of $\lambda$ and $\alpha$ in the above equation, we have

\[
\begin{aligned}
W(\lambda, \alpha, L, P) &= \sum_{j=1}^{\infty} \left(\frac{\mu^{2-p}}{\Theta(2 - p)}\right)^{j} \frac{L^{jP-1}\left[\Theta(p - 1)\mu^{p-1}\right]^{jP} e^{-L/\tau} e^{-\lambda}}{j!\,\Gamma(Pj)} \\
&= e^{-L/\tau - \lambda} L^{-1} \sum_{j=1}^{\infty} \frac{\mu^{(2-p)j}\left(\Theta(p - 1)\mu^{p-1}\right)^{jP} L^{jP}}{\Theta^{j}(2 - p)^{j}\, j!\,\Gamma(jP)} \\
&= e^{-L/\tau - \lambda} L^{-1} \sum_{j=1}^{\infty} \frac{L^{jP}(p - 1)^{jP}\mu^{(2-p)j + (p-1)jP}}{\Theta^{j(1-P)}(2 - p)^{j}\, j!\,\Gamma(jP)}.
\end{aligned}
\tag{33}
\]

The term $\mu^{(2-p)j + (p-1)jP}$ depends on the $L$, $p$, $P$, $\Theta$ values, so we maximize the summation

\[
W(L, \Theta, P) = \sum_{j=1}^{\infty} \frac{L^{jP}(p - 1)^{jP}}{\Theta^{j(1-P)}(2 - p)^{j}\, j!\,\Gamma(jP)}
= \sum_{j=1}^{\infty} \frac{z^{j}}{j!\,\Gamma(jP)}
= \sum_{j=1}^{\infty} W_j, \quad \text{where } z = \frac{L^{P}(p - 1)^{P}}{\Theta^{(1-P)}(2 - p)}. \tag{34}
\]

Considering $W_j$, we have

\[
\log W_j = j\log z - \log j! - \log\Gamma(Pj) = j\log z - \log\Gamma(j + 1) - \log\Gamma(Pj). \tag{35}
\]

Using Stirling's approximation of Gamma functions, we have

\[
\log\Gamma(1 + j) \approx (1 + j)\log(1 + j) - (1 + j) + \frac{1}{2}\log\left(\frac{2\pi}{1 + j}\right), \qquad
\log\Gamma(Pj) \approx Pj\log(Pj) - Pj + \frac{1}{2}\log\left(\frac{2\pi}{Pj}\right). \tag{36}
\]

And hence we have

\[
\log W_j \approx j\left[\log z + (1 + P) - P\log P - (1 - P)\log j\right] - \log(2\pi) - \frac{1}{2}\log P - \log j. \tag{37}
\]

For $1 < p < 2$ we have $P = (2 - p)/(p - 1) > 0$; hence the logarithms have positive arguments. Differentiating with respect to $j$, we have

\[
\frac{\partial\log W_j}{\partial j} \approx \log z - \frac{1}{j} - \log j - P\log(Pj) \approx \log z - \log j - P\log(Pj), \tag{38}
\]


where $1/j$ is ignored for large $j$. Solving $\partial\log W_j/\partial j = 0$, we have

\[
j_{\max} = \frac{L^{2-p}}{(2 - p)\Theta}. \tag{39}
\]

Substituting $j_{\max}$ in $\log W_j$ to find the maximum approximation of $W_j$, we have

\[
\begin{aligned}
\log W_{\max} &= \frac{L^{2-p}}{(2 - p)\Theta}\left[\log\frac{L^{P}(p - 1)^{P}}{\Theta^{(1-P)}(2 - p)} + (1 + P) - P\log P - (1 - P)\log\frac{L^{2-p}}{(2 - p)\Theta}\right] \\
&\quad - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2 - p)\Theta}.
\end{aligned}
\tag{40}
\]

Hence the result follows.

It can be observed that $\partial\log W_j/\partial j$ is monotonically decreasing; hence $\log W_j$ is strictly concave as a function of $j$. Therefore $W_j$ decays faster than geometrically on either side of $j_{\max}$ [15]. Therefore, if we are to estimate $W(L, \Theta, P)$ by $\hat{W}(L, \Theta, P) = \sum_{j=j_d}^{j_u} W_j$, the approximation error is bounded by a geometric sum:

\[
W(L, \Theta, P) - \hat{W}(L, \Theta, P) < W_{j_d - 1}\frac{1 - r_l^{\,j_d - 1}}{1 - r_l} + W_{j_u + 1}\frac{1}{1 - r_u},
\qquad
r_l = \exp\left(\frac{\partial\log W_j}{\partial j}\right)\bigg|_{j = j_d - 1},
\quad
r_u = \exp\left(\frac{\partial\log W_j}{\partial j}\right)\bigg|_{j = j_u + 1}.
\tag{41}
\]

For quick and accurate evaluation of $W(\lambda, \alpha, L, P)$, the series is summed over only those terms which contribute significantly to the sum.
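The truncation idea can be sketched for a generic series $\sum_j W_j$ with $W_j = z^{j}/(j!\,\Gamma(jP))$, the form in (34). The code below locates the approximate peak from the stationarity condition in (38), which gives $j^{1+P} = z/P^{P}$, and then walks outwards until terms fall below $e^{-37}$ times the peak term. The cutoff of 37, the function names, and the values of $z$ and $P$ are assumptions made for illustration; this is not the implementation of [15].

```python
import math

def log_term(j, z, shape):
    """log W_j for W_j = z**j / (j! * Gamma(j * P)), as in (34) and (35)."""
    return j * math.log(z) - math.lgamma(j + 1) - math.lgamma(j * shape)

def truncated_log_series(z, shape, drop=37.0):
    """Return (log of the truncated sum, j_d, j_u), keeping terms within exp(-drop) of the peak."""
    # Approximate peak from (38): log z - log j - P log(P j) = 0.
    j_peak = max(1, round((z / shape ** shape) ** (1.0 / (1.0 + shape))))
    log_peak = log_term(j_peak, z, shape)
    lower = j_peak
    while lower > 1 and log_term(lower - 1, z, shape) > log_peak - drop:
        lower -= 1
    upper = j_peak
    while log_term(upper + 1, z, shape) > log_peak - drop:
        upper += 1
    # Sum relative to the peak term so that the exponentials stay in range.
    total = sum(math.exp(log_term(j, z, shape) - log_peak) for j in range(lower, upper + 1))
    return log_peak + math.log(total), lower, upper

z, shape = 250.0, 0.885                   # illustrative values only
log_w, j_d, j_u = truncated_log_series(z, shape)
print(f"log W = {log_w:.6f} using terms j = {j_d}..{j_u}")
```

Because $\log W_j$ is concave in $j$, the outward walk cannot skip a significant term, so only a few dozen terms are needed here instead of an open-ended sum.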

Generalized linear models extend the standard linear regression models to incorporate nonnormal response distributions and possibly nonlinear functions of the mean. The advantage of GLMs is that the fitting process maximizes the likelihood for the choice of the distribution for a random variable $y$, and the choice is not restricted to normality, unlike linear regression [16].

The exponential dispersion models are the response distributions for the generalized linear models. Tweedie distributions are members of the exponential dispersion models upon which the generalized linear models are based. Consequently, fitting a Tweedie distribution follows the framework of fitting a generalized linear model.

Lemma 8. In the case of a canonical link function, the sufficient statistics for $\beta_j$ are $\sum_{i=1}^{n} y_i x_{ij}$.

Proof. For $n$ independent observations $y_i$ of the exponential dispersion model (16), the log-likelihood function is

\[
L(\beta) = \sum_{i=1}^{n} L_i = \sum_{i=1}^{n} \log f(y_i; \theta_i, \Theta)
= \sum_{i=1}^{n} \frac{y_i\theta_i - k(\theta_i)}{\Theta} + \sum_{i=1}^{n} \log a(y_i, \Theta). \tag{42}
\]

But $\theta_i = \sum_{j}^{p} \beta_j x_{ij}$; hence

\[
\sum_{i=1}^{n} y_i\theta_i = \sum_{i=1}^{n} y_i \sum_{j}^{p} \beta_j x_{ij} = \sum_{j}^{p} \beta_j \sum_{i=1}^{n} y_i x_{ij}. \tag{43}
\]

Proposition 9. Given that $y_i$ is distributed as (16), its distribution depends only on its first two moments, namely, $\mu_i$ and $\mathrm{Var}(y_i)$.

Proof. Let $g(\mu_i)$ be the link function of the GLM such that $\eta_i = \sum_{j=1}^{p} \beta_j x_{ij} = g(\mu_i)$. The likelihood equations are

\[
\frac{\partial L(\beta)}{\partial\beta_j} = \sum_{i=1}^{n} \frac{\partial L_i}{\partial\beta_j}, \quad \forall j. \tag{44}
\]

Using the chain rule, we have

\[
\frac{\partial L_i}{\partial\beta_j} = \frac{\partial L_i}{\partial\theta_i}\frac{\partial\theta_i}{\partial\mu_i}\frac{\partial\mu_i}{\partial\eta_i}\frac{\partial\eta_i}{\partial\beta_j}
= \frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\, x_{ij}\frac{\partial\mu_i}{\partial\eta_i}. \tag{45}
\]

Hence

\[
\frac{\partial L(\beta)}{\partial\beta_j} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\, x_{ij}\frac{\partial\mu_i}{\partial\eta_i}
= \sum_{i=1}^{n} \frac{y_i - \mu_i}{\Theta\mu_i^{p}}\, x_{ij}\frac{\partial\mu_i}{\partial\eta_i}. \tag{46}
\]

Since $\mathrm{Var}(y_i) = \Theta V(\mu_i)$, the relationship between the mean and variance characterizes the distribution.

Clearly a GLM only requires the first two moments of the response $y_i$; hence, despite the difficulty of full likelihood analysis of the Tweedie distribution, as it cannot be expressed in closed form for $1 < p < 2$, we can still fit a Tweedie distribution family. The likelihood is only required to estimate $p$ and $\Theta$, as well as for diagnostic checking of the model.

Proposition 10. Under the standard regularity conditions, for large $n$, the maximum likelihood estimator $\hat{\beta}$ of $\beta$ for a generalized linear model is efficient and has an approximate normal distribution.

Proof. From the log-likelihood, the covariance matrix of the distribution is the inverse of the information matrix $\mathcal{J} = \mathbb{E}(-\partial^{2}L(\beta)/\partial\beta_h\,\partial\beta_j)$. So, for the contribution of observation $i$,

\[
\begin{aligned}
\mathbb{E}\left(-\frac{\partial^{2}L_i}{\partial\beta_h\,\partial\beta_j}\right)
&= \mathbb{E}\left[\left(\frac{\partial L_i}{\partial\beta_h}\right)\left(\frac{\partial L_i}{\partial\beta_j}\right)\right] \\
&= \mathbb{E}\left[\left(\frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\, x_{ih}\frac{\partial\mu_i}{\partial\eta_i}\right)\left(\frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\, x_{ij}\frac{\partial\mu_i}{\partial\eta_i}\right)\right] \\
&= \frac{x_{ih}x_{ij}}{\mathrm{Var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^{2}.
\end{aligned}
\tag{47}
\]

Hence

\[
\mathbb{E}\left(-\frac{\partial^{2}L(\beta)}{\partial\beta_h\,\partial\beta_j}\right)
= \sum_{i=1}^{n} \frac{x_{ih}x_{ij}}{\mathrm{Var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^{2}
= \left(X^{T}WX\right)_{hj}, \tag{48}
\]

where $W = \mathrm{diag}\left[(1/\mathrm{Var}(y_i))(\partial\mu_i/\partial\eta_i)^{2}\right]$.

Therefore $\hat{\beta}$ has an approximate $N[\beta, (X^{T}WX)^{-1}]$ distribution with $\mathrm{Var}(\hat{\beta}) = (X^{T}\hat{W}X)^{-1}$, where $\hat{W}$ is $W$ evaluated at $\hat{\beta}$.

To compute $\hat{\beta}$ we use the iteratively reweighted least squares algorithm proposed by Dobson and Barnett [17], where the iterations use the working weights

\[
\frac{w_i}{V(\mu_i)\, g'(\mu_i)^{2}}, \tag{49}
\]

where $V(\mu_i) = \mu_i^{p}$.

However, estimating $p$ is more difficult than estimating $\beta$ and $\Theta$, such that most researchers working with Tweedie densities take $p$ as given a priori. In this study we use the procedure in [15], where the maximum likelihood estimator of $p$ is obtained by directly maximizing the profile likelihood function. For any given value of $p$ we find the maximum likelihood estimate of $\beta$, $\Theta$ and compute the log-likelihood function. This is repeated several times until we have a value of $p$ which maximizes the log-likelihood function.
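For a fixed $p$, the iteratively reweighted least squares step described above can be sketched in a few lines. The version below assumes a log link, unit prior weights ($w_i = 1$), and the seasonal predictors $\sin(2\pi i/365)$ and $\cos(2\pi i/365)$. With a log link $g'(\mu) = 1/\mu$, the working weight in (49) becomes $\mu^{2-p}$. The coefficients, the noise-free test data, and the helper names are hypothetical, and the small linear solver only keeps the example free of external libraries.

```python
import math

def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def irls_log_link(X, y, p, iterations=25):
    """Fit log(mu_i) = x_i' beta with variance function V(mu) = mu**p by IRLS."""
    k = len(X[0])
    beta = [0.0] * k
    beta[0] = math.log(sum(y) / len(y))      # start at the overall mean; column 0 is the intercept
    for _ in range(iterations):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        w = [m ** (2.0 - p) for m in mu]     # 1 / (V(mu) * g'(mu)^2) for the log link
        zwork = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]   # working response
        xtwx = [[sum(wi * r[a] * r[b] for wi, r in zip(w, X)) for b in range(k)] for a in range(k)]
        xtwz = [sum(wi * r[a] * zi for wi, r, zi in zip(w, X, zwork)) for a in range(k)]
        beta = solve(xtwx, xtwz)             # weighted least squares update
    return beta

X = [[1.0, math.sin(2 * math.pi * i / 365), math.cos(2 * math.pi * i / 365)] for i in range(1, 366)]
true_beta = [0.5, -0.8, 1.4]                 # hypothetical coefficients
y = [math.exp(sum(b * x for b, x in zip(true_beta, row))) for row in X]   # noise-free responses
beta_hat = irls_log_link(X, y, p=1.5306)
print([round(b, 6) for b in beta_hat])
```

Wrapping this fit in a loop over a grid of $p$ values, and evaluating the Tweedie log-likelihood at each fitted model, would give the profile likelihood used to choose $p$.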

Given the estimated values of $p$ and $\beta$, the unbiased estimator of $\Theta$ is given by

\[
\hat{\Theta} = \sum_{i=1}^{n} \frac{\left[L_i - \mu_i(\hat{\beta})\right]^{2}}{\mu_i(\hat{\beta})^{p}}. \tag{50}
\]

Since for $1 < p < 2$ the Tweedie density cannot be expressed in closed form, it is recommended that the maximum likelihood estimate of $\Theta$ be computed iteratively from the full data [15].

4. Data and Model Fitting

4.1. Data Analysis. Daily rainfall data of Balaka district in Malawi covering the period 1995–2015 is used. The data was obtained from Meteorological Surveys of Malawi. Figure 1 shows a plot of the data.

In summary, the minimum value is 0 mm, which indicates that there was no rainfall on particular days, whereas the maximum amount is 123.7 mm. The mean rainfall for the whole period is 3.167 mm.

Figure 1: Daily rainfall amount for Balaka district (panel title: "Rainfall for Balaka for 10 years"; horizontal axis: year, 1995 to 2015; vertical axis: amount).

Figure 2: Variance mean relationship (horizontal axis: log(mean); vertical axis: log(variance)).

We investigated the relationship between the variance and the mean of the data by plotting log(variance) against log(mean), as shown in Figure 2. From the figure we can observe a linear relationship between the log of the variance and the log of the mean, which can be expressed as

\[
\log(\text{Variance}) = \alpha + \beta\log(\text{mean}), \tag{51}
\]

\[
\text{Variance} = A \cdot \text{mean}^{\beta}, \quad A \in \mathbb{R}. \tag{52}
\]

Hence the variance can be expressed as some power $\beta \in \mathbb{R}$ of the mean, agreeing with the Tweedie variance function requirement.
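The log-log relationship in (51) and (52) can be checked numerically with an ordinary least-squares fit. The following is a minimal sketch, not the authors' code: the function name `fit_power_variance` and the sample group means and variances are hypothetical, and the data are generated from a known power law only so that the fit can be verified.

```python
import math

def fit_power_variance(means, variances):
    """Least-squares fit of log(variance) = alpha + beta * log(mean)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                  # slope: the power in (52)
    alpha = y_bar - beta * x_bar      # intercept: log(A) in (52)
    return alpha, beta

# Hypothetical group means and variances following variance = 2 * mean**1.5.
means = [0.5, 1.0, 2.0, 4.0, 8.0]
variances = [2.0 * m ** 1.5 for m in means]

alpha, beta = fit_power_variance(means, variances)
A = math.exp(alpha)
print(f"A = {A:.4f}, beta = {beta:.4f}")
```

With real data the groups would typically be formed by calendar day or month, and a fitted slope between 1 and 2 would point to the Poisson-Gamma range of the Tweedie family.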

4.2. Fitted Model. To model the daily rainfall data we use sine and cosine terms as predictors, owing to the cyclic nature and seasonality of rainfall. We have assumed that February ends on the 28th in all years so that the years are uniform in our modeling.

The canonical link function is given by

\[
\log\mu_i = a_0 + a_1\sin\left(\frac{2\pi i}{365}\right) + a_2\cos\left(\frac{2\pi i}{365}\right), \tag{53}
\]

where $i = 1, 2, \ldots, 365$ corresponds to days of the year and $a_0, a_1, a_2$ are the regression coefficients.

In the first place we estimate $p$ by maximizing the profile log-likelihood function. Figure 3 shows the graph of the profile log-likelihood function. As can be observed, the value of $p$ that maximizes the function is 1.5306.

From the results obtained after fitting the model, both the cyclic cosine and sine terms are important characteristics for daily rainfall (Table 1). The covariates were chosen to take into account the seasonal variations in the stochastic model.

Table 1: Estimated parameter values.

Parameter   Estimate   Std. error   t value    Pr(>|t|)
$a_0$       0.1653     0.0473       3.4930     0.0005 ***
$a_1$       0.9049     0.0572       15.81100   <2e-16 ***
$a_2$       2.0326     0.0622       32.6720    <2e-16 ***
$\Theta$    14.8057    -            -          -

Significance code: 0 '***'.

Figure 3: Profile likelihood (horizontal axis: $\xi$ index, 1.2 to 1.8; vertical axis: log-likelihood $L$, with 95% confidence interval).

The predicted $\hat\mu_i$, $\hat p$, and $\hat\Theta$ for each day depend only on that day's conditions, so that for each day $i$ we have

\[
\begin{aligned}
\hat\mu_i &= \exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right], \\
\hat p &= 1.5306, \\
\hat\Theta &= 14.8057.
\end{aligned} \tag{54}
\]
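The fitted seasonal mean in (54) is straightforward to evaluate for every day of the year. The sketch below is illustrative only: the function name `predicted_mean` is hypothetical, and it assumes that $i = 1$ corresponds to 1 January, which the text does not state explicitly.

```python
import math

# Coefficient estimates as reported in (54); treated here as given constants.
A0, A1, A2 = 0.1653, 0.9049, 2.0326

def predicted_mean(day):
    """Predicted mean rainfall for day i of a 365-day year, following (54)."""
    angle = 2.0 * math.pi * day / 365.0
    return math.exp(A0 + A1 * math.sin(angle) + A2 * math.cos(angle))

daily_means = [predicted_mean(i) for i in range(1, 366)]

# Days on which the fitted seasonal curve peaks and bottoms out.
wettest_day = max(range(1, 366), key=predicted_mean)
driest_day = min(range(1, 366), key=predicted_mean)
print(wettest_day, driest_day)
```

Because the predictor is a single sine-cosine pair, the fitted curve has exactly one peak and one trough per year, about half a year apart.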

From these estimated values we can calculate the parameters $(\hat\lambda_i, \hat\alpha_i, \hat P)$ from the corresponding formulas above as

\[
\begin{aligned}
\hat\lambda_i &= \frac{1}{6.5716}\left(\exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.4694}, \\
\hat\alpha_i &= 7.4284\left(\exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.5306}, \\
\hat P &= 0.8847.
\end{aligned} \tag{55}
\]

Comparing the actual and predicted means, for 2 July we have $\mu = 0.3820$ whereas $\hat\mu = 0.4333$; similarly, for 31 December we have $\mu = 9.0065$ and $\hat\mu = 10.6952$, respectively. Figure 4 shows the estimated mean and the actual mean; the model generally behaves well.

4.3. Goodness of Fit of the Model. Let the maximum likelihood estimate of $\theta_i$ be $\hat\theta_i$ for all $i$, and let $\hat\mu$ be the model's mean estimate. Let $\tilde\theta_i$ denote the estimate of $\theta_i$ for the saturated model, with corresponding $\tilde\mu = y_i$.

Figure 4: Actual versus predicted mean (panel title: "Rainfall means"; horizontal axis: days of the year; vertical axis: mean; series: actual mean, predicted mean).

The goodness of fit is determined by the deviance, which is defined as

\[
\begin{aligned}
-2\log\left[\frac{\text{maximum likelihood of the fitted model}}{\text{maximum likelihood of the saturated model}}\right]
&= -2\left[L(\hat\mu; y) - L(y; y)\right] \\
&= 2\sum_{i=1}^{n}\frac{y_i\tilde\theta_i - k(\tilde\theta_i)}{\Theta} - 2\sum_{i=1}^{n}\frac{y_i\hat\theta_i - k(\hat\theta_i)}{\Theta} \\
&= 2\sum_{i=1}^{n}\frac{y_i(\tilde\theta_i - \hat\theta_i) - k(\tilde\theta_i) + k(\hat\theta_i)}{\Theta}
= \frac{\operatorname{Dev}(y; \hat\mu)}{\Theta}.
\end{aligned} \tag{56}
\]

$\operatorname{Dev}(y; \hat\mu)$ is called the deviance of the model. The greater the deviance, the poorer the fitted model, since maximizing the likelihood corresponds to minimizing the deviance.

In terms of Tweedie distributions with $1 < p < 2$, the deviance is

\[
\operatorname{Dev}_p = 2\sum_{i=1}^{n}\left(\frac{y_i^{2-p} - (2-p)\,y_i\,\mu_i^{1-p} + (1-p)\,\mu_i^{2-p}}{(1-p)(2-p)}\right). \tag{57}
\]

Based on the results from fitting the model, the residual deviance is 43144, less than the null deviance of 62955, which implies that the fitted model explains the data better than a null model.
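The deviance in (57) can be computed directly from observed values and fitted means. The following is a minimal sketch under the assumption that all fitted means are strictly positive; the function name `tweedie_deviance` is hypothetical and the inputs are made-up values, not the Balaka data.

```python
def tweedie_deviance(y, mu, p):
    """Tweedie deviance (57) for 1 < p < 2; y may contain exact zeros."""
    if not 1.0 < p < 2.0:
        raise ValueError("this form of the deviance requires 1 < p < 2")
    total = 0.0
    for y_i, mu_i in zip(y, mu):
        numerator = (y_i ** (2.0 - p)
                     - (2.0 - p) * y_i * mu_i ** (1.0 - p)
                     + (1.0 - p) * mu_i ** (2.0 - p))
        total += numerator / ((1.0 - p) * (2.0 - p))
    return 2.0 * total

# A perfect fit has zero deviance; a dry day under a positive mean does not.
perfect = tweedie_deviance([1.0, 2.5], [1.0, 2.5], 1.5)
dry_day = tweedie_deviance([0.0], [1.0], 1.5)
print(perfect, dry_day)
```

Because $2 - p > 0$, the term $y_i^{2-p}$ is well defined at $y_i = 0$, which is why exact zeros cause no difficulty in this range of $p$.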

4.4. Diagnostic Check. Model diagnostics are carried out through residual analysis. The fitted model is difficult to assess, especially for days with no rainfall at all, as these produce spurious results and distracting patterns, as also observed by [15]. Since this is a nonnormal regression, the residuals are far from being normally distributed and do not have equal variances, unlike in a normal linear regression. Here the residuals lie parallel to distinct values; hence it is difficult to make any meaningful decision about the fitted model (Figure 5).

Figure 5: Residuals of the model (horizontal axis: year, 1995 to 2015; vertical axis: residuals).

Figure 6: Q-Q plot of the quantile residuals (panel title: "Normal Q-Q Plot"; horizontal axis: theoretical quantiles; vertical axis: sample quantiles).

So we assess the model based on quantile residuals, which remove the pattern in discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.

The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.

Mathematically, let $a_i = \lim_{y \uparrow y_i} F(y; \hat\mu_i, \hat\Theta)$ and $b_i = F(y_i; \hat\mu_i, \hat\Theta)$, where $F$ is the cumulative distribution function corresponding to the probability density function $f(y; \mu, \Theta)$. Then the randomized quantile residuals for $y_i$ are

\[
r_{q,i} = \Phi^{-1}(u_i), \tag{58}
\]

with $u_i$ being a uniform random variable on $(a_i, b_i]$. The randomized quantile residuals are normally distributed, barring the variability in $\hat\mu$ and $\hat\Theta$.
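The randomization step in (58) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name and the value of `lam` are hypothetical, and only the zero-rainfall case is worked through, where $a_i = 0$ and $b_i = P(L = 0) = \exp(-\lambda)$. For a positive observation the distribution function is continuous, so $a_i = b_i$ and no randomization takes place; evaluating $F$ there requires a Tweedie distribution function, which is not implemented here.

```python
import math
import random
from statistics import NormalDist

def randomized_quantile_residual(a, b, rng):
    """Draw u uniformly on (a, b] and return the standard normal quantile of u."""
    u = b - (b - a) * rng.random()          # rng.random() is in [0, 1), so u is in (a, b]
    u = min(max(u, 1e-12), 1.0 - 1e-12)     # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)

rng = random.Random(0)

# Zero-rainfall day: a_i = 0 and b_i = exp(-lambda); lam is a made-up value.
lam = 0.25
b_zero = math.exp(-lam)
residuals = [randomized_quantile_residual(0.0, b_zero, rng) for _ in range(1000)]
upper = NormalDist().inv_cdf(b_zero)

# Continuous case (a_i == b_i): the residual is deterministic.
median_residual = randomized_quantile_residual(0.5, 0.5, rng)
```

Residuals for dry days are spread over the lower part of the normal scale instead of piling up at a single value, which is what removes the banding seen in ordinary residual plots.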

Figure 6 shows the normal Q-Q plot. As can be observed, there are no large deviations from the straight line, only small deviations at the tails. The linearity observed indicates an acceptable fitted model.

Figure 7: Simulated rainfall and observed rainfall (panel title: "Rainfall amount"; horizontal axis: days of the year; vertical axis: amount; series: predicted, actual).

Figure 8: Probability of rainfall occurrence (panel title: "Probability of no rainfall"; horizontal axis: days of the year; vertical axis: probability).

5. Simulation

The model is simulated to test whether it produces data with characteristics similar to those of the actual observed rainfall. The simulation is done for a period of two years, where one is the last year of the data (2015) and the other (2016) is a future prediction. The simulated series is then compared graphically with the 2015 data, as shown in Figure 7.
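A single day's rainfall under the compound Poisson-Gamma model can be simulated by drawing a Poisson number of events and summing Gamma-distributed intensities. The sketch below is not the authors' simulation code. It assumes the parameter mapping in (27) and treats $\alpha$ as the Gamma scale parameter, which is the reading under which $\lambda P \alpha = \mu$; all function names are hypothetical, and the value $\mu = 3.0$ is chosen only for illustration.

```python
import math
import random

def poisson_gamma_params(mu, p, theta):
    """Map (mu, p, Theta) to (lambda, alpha, P) following (27)."""
    lam = mu ** (2.0 - p) / (theta * (2.0 - p))
    alpha = theta * (p - 1.0) * mu ** (p - 1.0)   # assumed to be the Gamma scale
    shape = (2.0 - p) / (p - 1.0)
    return lam, alpha, shape

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_daily_rainfall(mu, p, theta, rng):
    """One draw: number of events is Poisson, each intensity is Gamma."""
    lam, alpha, shape = poisson_gamma_params(mu, p, theta)
    events = sample_poisson(lam, rng)
    return sum(rng.gammavariate(shape, alpha) for _ in range(events))

rng = random.Random(42)
mu, p, theta = 3.0, 1.5306, 14.8057
draws = [simulate_daily_rainfall(mu, p, theta, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
zero_fraction = sum(1 for d in draws if d == 0) / len(draws)
expected_zero = math.exp(-poisson_gamma_params(mu, p, theta)[0])
print(round(sample_mean, 2), round(zero_fraction, 3), round(expected_zero, 3))
```

A full-year simulation would call `simulate_daily_rainfall` once per day with that day's predicted mean, so that exact zeros and positive amounts come out of the same mechanism.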

The different statistics of the simulated data and the actual data are shown in Table 2 for comparison.

The main objective of the simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model works well in predicting the rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.

However, the model performed poorly in predicting the probability of rainfall occurrence, which it underestimated. It is suggested here that using a truncated Fourier series, rather than the sinusoidal terms, may improve this estimation.

The model performed better in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.

It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model suggested by [4], as follows. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8); in addition, the probability of rain on a given day is higher if the previous day was wet, which is the basis of precipitation models that involve the Markov process. From Figure 7 we can also observe variation of rainfall intensity based on the time of the season.

In addition, the model allows modeling of exact zeros in the data and is able to predict the probability of a no-rainfall event simultaneously.

Table 2: Data statistics.

                       Min    1st Qu   Median   Mean    3rd Qu   Max
Predicted data         0.00   0.00     0.00     3.314   0.00     116.5
Actual data [10 yrs]   0.00   0.00     0.00     3.183   0.300    123.7
Actual data [2015]     0.00   0.00     0.00     3.328   0.00     84.5

6. Conclusion

A daily stochastic rainfall model was developed based on a compound Poisson process, where rainfall events follow a Poisson distribution and the intensity of each event, independent of the number of events, follows a Gamma distribution. Unlike many studies of precipitation modeling, in which two separate models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool for studying the impact of weather on a variety of systems, including ecosystem risk assessment, drought prediction, and weather derivatives, since it can be used to simulate synthetic rainfall data. The model provides mechanisms for understanding the fine-scale structure, such as the number and mean of rainfall events, the mean daily rainfall, and the probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.

The model developed can easily be used for forecasting future events. In terms of weather derivatives, the weather index can be derived by simulating a sample path and summing up daily precipitation over the relevant accumulation period. Rather than developing a weather index which is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.

Rainfall data is generally zero inflated, in that the amount of rainfall received on a day can be zero with positive probability but is continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Hence expressing it in terms of a Tweedie distribution makes estimating the parameters easier. In addition, Tweedie distributions belong to the exponential family of distributions upon which generalized linear models are based; hence there is an already existing framework in place for fitting and diagnostic testing of the model.

The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps in reducing the underestimation of overdispersion in the data, which is also common in the aforementioned models.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.

References

[1] A. Hussain, "Stochastic modeling of rainfall processes: a Markov chain-mixed exponential model for rainfalls in different climatic conditions."

[2] M. Cao, A. Li, and J. Z. Wei, "Precipitation modeling and contract valuation: a frontier in weather derivatives," The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.

[3] G. Leobacher and P. Ngare, "On modelling and pricing rainfall derivatives with seasonality," Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.

[4] M. Odening, O. Musshoff, and W. Xu, "Analysis of rainfall derivatives using daily precipitation models: opportunities and pitfalls," Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.

[5] D. S. Wilks, "Multisite generalization of a daily stochastic precipitation generation model," Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.

[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, "Rainfall modelling using Poisson-cluster processes: a review of developments," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.

[7] R. Carmona and P. Diko, "Pricing precipitation based derivatives," International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.

[8] F. E. Benth and J. S. Benth, Modeling and pricing in financial markets for weather derivatives, vol. 17, World Scientific, 2012.

[9] B. Lopez Cabrera, M. Odening, and M. Ritter, "Pricing rainfall futures at the CME," Technical report, Humboldt University, Collaborative Research Center.

[10] T. I. Harrold, A. Sharma, and S. J. Sheather, "A nonparametric model for stochastic generation of daily rainfall occurrence," Water Resources Research, vol. 39, no. 10, 2003.

[11] D. Lord, "Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter," Accident Analysis & Prevention, vol. 38, no. 4, pp. 751–766, 2006.

[12] J.-P. Wang, "Estimating species richness by a Poisson-compound gamma model," Biometrika, vol. 97, no. 3, pp. 727–740, 2010.

[13] C. S. Withers and S. Nadarajah, "On the compound Poisson-gamma distribution," Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.

[14] E. W. Frees, R. Derrig, and G. Meyers, Regression modeling with actuarial and financial applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.

[15] P. K. Dunn and G. K. Smyth, "Evaluation of Tweedie exponential dispersion model densities by Fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.

[16] A. Agresti, Foundations of linear and generalized linear models, John Wiley & Sons, 2015.

[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.


We can express the probability density function $f_\theta(L)$ in terms of a Dirac function as

\[
\begin{aligned}
f_\theta(L) &= p_0\,\delta_0(L) + q_0\,f_\theta^{+}(L) \\
&= \exp(-\lambda)\,\delta_0(L) + \left[\frac{q_0}{\exp(\lambda) - 1}\right] L^{-1}\exp(-\alpha L)\,r_P(\nu L^{P}) \\
&= \exp(-\lambda)\,\delta_0(L) + \exp(-\lambda - \alpha L)\,L^{-1}\,r_P(\nu L^{P}).
\end{aligned} \tag{11}
\]

Consider a random sample of size $n$ of $L_i$ with the probability density function

\[
f_\theta(L) = \exp(-\lambda)\,\delta(L) + \exp(-\lambda - \alpha L)\,L^{-1}\,r_p(\nu L^{p}). \tag{12}
\]

If we assume that there are $m$ positive values $L_1, L_2, \ldots, L_m$, then there are $M = n - m$ zeros, where $m > 0$.

We observe that $m \sim \mathrm{Bi}(n, 1 - \exp(-\lambda))$ and $p(m = 0) = \exp(-n\lambda)$; hence the likelihood function is

\[
L = \binom{n}{m} p_0^{\,n-m}\, q_0^{\,m} \prod_{i=1}^{m} f_\theta^{+}(L_i), \tag{13}
\]

and the log-likelihood for $\theta = (\lambda, \alpha, p)$ is

\[
\begin{aligned}
\log L(\theta; L_1, L_2, \ldots, L_n)
&= \log\left(\binom{n}{m} p_0^{\,n-m}\, q_0^{\,m} \prod_{i=1}^{m} f_\theta^{+}(L_i)\right) \\
&= \log\left(\binom{n}{m} e^{-\lambda n + \lambda m}\left(1 - e^{-\lambda}\right)^{m} \prod_{i=1}^{m} e^{-\lambda - \alpha L_i}\,\frac{1}{L_i} \cdot \sum_{j=1}^{\infty}\frac{(\lambda\alpha^{p}L_i^{p})^{j}}{j!\,\Gamma(jp)}\right) \\
&= \log\binom{n}{m} + \lambda(m - n) + m\log\left(1 - e^{-\lambda}\right) + \sum_{i=1}^{m}\left(-\lambda - \alpha L_i - \log L_i\right) \\
&\quad + \log\sum_{i=1}^{m}\sum_{j=1}^{\infty}\frac{(\lambda\alpha^{p}L_i^{p})^{j}}{j!\,\Gamma(jp)}.
\end{aligned} \tag{14}
\]

Now for $\hat\lambda$ we have

\[
\begin{aligned}
\frac{\partial\log L(\theta; L_1, L_2, \ldots, L_n)}{\partial\lambda}
&= m - n + \frac{m}{1 - e^{-\lambda}} + (-1)m + \frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i, \\
\frac{\partial\log L(\theta; L_1, L_2, \ldots, L_n)}{\partial\lambda} = 0
&\implies m - n + \frac{m}{1 - e^{-\lambda}} + (-1)m + \frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i = 0.
\end{aligned} \tag{15}
\]

We can observe from the above evaluation that $\lambda$ cannot be expressed in closed form; a similar derivation shows that $\alpha$ also cannot be expressed in closed form. Therefore we can only estimate $\lambda$ and $\alpha$ using numerical methods. Withers and Nadarajah [13] also observed that the probability density function cannot be expressed in closed form and that it is therefore difficult to find the analytic form of the estimators. So we will express the probability density function in terms of exponential dispersion models, as described below.

Definition 3 (see [14]). A probability density function of the form

\[
f(y; \theta, \Theta) = a(y, \Theta)\exp\left\{\frac{1}{\Theta}\left[y\theta - k(\theta)\right]\right\} \tag{16}
\]

for suitable functions $k(\cdot)$ and $a(\cdot)$ is called an exponential dispersion model.

$\Theta > 0$ is the dispersion parameter. The function $k(\theta)$ is the cumulant function of the exponential dispersion model, since when $\Theta = 1$ the derivatives $k'(\cdot), k''(\cdot), \ldots$ are the successive cumulants of the distribution [15]. The exponential dispersion models were first introduced by Fisher in 1922.

If we let $L_i = \log f(y_i; \theta_i, \Theta)$ be the contribution of $y_i$ to the log-likelihood function $L = \sum_i L_i$, then

\[
\begin{aligned}
L_i &= \frac{1}{\Theta}\left[y_i\theta_i - k(\theta_i)\right] + \log a(y_i, \Theta), \\
\frac{\partial L_i}{\partial\theta_i} &= \frac{1}{\Theta}\left(y_i - k'(\theta_i)\right), \\
\frac{\partial^{2} L_i}{\partial\theta_i^{2}} &= -\frac{1}{\Theta}k''(\theta_i).
\end{aligned} \tag{17}
\]

However, we expect that $E(\partial L_i/\partial\theta_i) = 0$ and $-E(\partial^{2} L_i/\partial\theta_i^{2}) = E(\partial L_i/\partial\theta_i)^{2}$, so that

\[
\begin{aligned}
E\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right) &= 0, \\
\frac{1}{\Theta}\left(E(y_i) - k'(\theta_i)\right) &= 0, \\
E(y_i) &= k'(\theta_i).
\end{aligned} \tag{18}
\]

Furthermore,

\[
\begin{aligned}
-E\left(\frac{\partial^{2} L_i}{\partial\theta_i^{2}}\right) &= E\left(\frac{\partial L_i}{\partial\theta_i}\right)^{2}, \\
-E\left(-\frac{1}{\Theta}k''(\theta_i)\right) &= E\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right)^{2}, \\
\frac{k''(\theta_i)}{\Theta} &= \frac{\operatorname{Var}(y_i)}{\Theta^{2}}, \\
\operatorname{Var}(y_i) &= \Theta\,k''(\theta_i).
\end{aligned} \tag{19}
\]

Therefore the mean of the distribution is $E[Y] = \mu = dk(\theta)/d\theta$ and the variance is $\operatorname{Var}(Y) = \Theta\,(d^{2}k(\theta)/d\theta^{2})$.

The relationship $\mu = dk(\theta)/d\theta$ is invertible, so that $\theta$ can be expressed as a function of $\mu$; as such we have $\operatorname{Var}(Y) = \Theta V(\mu)$, where $V(\mu)$ is called the variance function.

Definition 4. The family of exponential dispersion models whose variance functions are of the form $V(\mu) = \mu^{p}$ for $p \in (-\infty, 0] \cup [1, \infty)$ are called Tweedie family distributions.

Examples are as follows: for $p = 0$ we have a normal distribution; for $p = 1$ and $\Theta = 1$ it is a Poisson distribution; for $p = 2$ it is a Gamma distribution; and for $p = 3$ it is an inverse Gaussian distribution. Tweedie densities cannot be expressed in closed form (apart from the examples above) but can instead be identified by their cumulant generating functions.

From $\operatorname{Var}(Y) = \Theta\,(d^{2}k(\theta)/d\theta^{2})$, for the Tweedie family of distributions we have

\[
\operatorname{Var}(Y) = \Theta\frac{d^{2}k(\theta)}{d\theta^{2}} = \Theta V(\mu) = \Theta\mu^{p}. \tag{20}
\]

Hence we can solve for $\mu$ and $k(\theta)$ as follows:

\[
\begin{aligned}
\mu &= \frac{dk(\theta)}{d\theta}, \\
\frac{d\mu}{d\theta} &= \mu^{p} \implies \int\frac{d\mu}{\mu^{p}} = \int d\theta, \\
\theta &= \begin{cases}\dfrac{\mu^{1-p}}{1-p}, & p \neq 1, \\[1ex] \log\mu, & p = 1,\end{cases}
\end{aligned} \tag{21}
\]

by equating the constants of integration above to zero. For $p \neq 1$ we have $\mu = [(1-p)\theta]^{1/(1-p)}$, so that

\[
\begin{aligned}
\int dk(\theta) &= \int\left[(1-p)\theta\right]^{1/(1-p)} d\theta, \\
k(\theta) &= \frac{\left[(1-p)\theta\right]^{(2-p)/(1-p)}}{2-p} = \frac{\mu^{2-p}}{2-p}, \quad p \neq 2.
\end{aligned} \tag{22}
\]
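The expressions for $\theta$ and $k(\theta)$ in (21) and (22) can be sanity-checked numerically: differentiating $k$ once should return $\mu$, and differentiating twice should return the variance function $\mu^{p}$. The sketch below uses central finite differences; the helper names and the test point $\mu = 2$, $p = 1.5$ are arbitrary choices for illustration, and the check is restricted to $1 < p < 2$, where $\theta < 0$.

```python
def theta_from_mu(mu, p):
    """Canonical parameter from (21), for p != 1."""
    return mu ** (1.0 - p) / (1.0 - p)

def cumulant(theta, p):
    """Cumulant function k(theta) from (22), for p not equal to 1 or 2."""
    return ((1.0 - p) * theta) ** ((2.0 - p) / (1.0 - p)) / (2.0 - p)

def central_first(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def central_second(f, x, h=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

mu, p = 2.0, 1.5
theta = theta_from_mu(mu, p)

def k(t):
    return cumulant(t, p)

mean_check = central_first(k, theta)               # should be close to mu
variance_function_check = central_second(k, theta) # should be close to mu**p
print(mean_check, variance_function_check, mu ** p)
```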

Proposition 5. The cumulant generating function of a Tweedie distribution for $1 < p < 2$ is

\[
\log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2-p}\left[\left(1 + t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{23}
\]

Proof. From (16) the moment generating function is given by

\[
\begin{aligned}
M_Y(t) &= \int\exp(ty)\,a(y, \Theta)\exp\left\{\frac{1}{\Theta}\left[y\theta - k(\theta)\right]\right\} dy \\
&= \int a(y, \Theta)\exp\left\{\frac{1}{\Theta}\left[y(\theta + t\Theta) - k(\theta)\right]\right\} dy \\
&= \int a(y, \Theta)\exp\left(\frac{y(\theta + t\Theta) - k(\theta + t\Theta)}{\Theta} + \frac{k(\theta + t\Theta) - k(\theta)}{\Theta}\right) dy \\
&= \exp\left(\frac{k(\theta + t\Theta) - k(\theta)}{\Theta}\right)\int a(y, \Theta)\exp\left(\frac{y(\theta + t\Theta) - k(\theta + t\Theta)}{\Theta}\right) dy \\
&= \exp\left\{\frac{1}{\Theta}\left[k(\theta + t\Theta) - k(\theta)\right]\right\},
\end{aligned} \tag{24}
\]

where the last step holds because the remaining integrand is an exponential dispersion density with canonical parameter $\theta + t\Theta$ and therefore integrates to one. Hence the cumulant generating function is

\[
\log M_Y(t) = \frac{1}{\Theta}\left[k(\theta + t\Theta) - k(\theta)\right]. \tag{25}
\]

For $1 < p < 2$ we substitute $\theta$ and $k(\theta)$ from (21) and (22) to obtain

\[
\log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2-p}\left[\left(1 + t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{26}
\]

By comparing the cumulant generating functions in Lemma 1 and Proposition 5, the compound Poisson process can be thought of as a Tweedie distribution with parameters $(\lambda, \alpha, P)$ expressed as follows:

\[
\lambda = \frac{\mu^{2-p}}{\Theta(2-p)}, \qquad \alpha = \Theta(p-1)\mu^{p-1}, \qquad P = \frac{2-p}{p-1}. \tag{27}
\]

The requirement that the Gamma shape parameter $P$ be positive implies that only Tweedie distributions with $1 < p < 2$ can represent the Poisson-Gamma compound process. In addition, $\lambda > 0$ and $\alpha > 0$ imply $\mu > 0$ and $\Theta > 0$.
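The mapping in (27) is easy to code, and solving it for $(\mu, p, \Theta)$ gives a round-trip check. The sketch below is illustrative: the function names are hypothetical, the inverse formulas are obtained here by rearranging (27) rather than taken from the text, and the input $\mu = 3.0$ is an arbitrary value used together with the fitted $p$ and $\Theta$.

```python
import math

def tweedie_to_poisson_gamma(mu, p, theta):
    """Map Tweedie (mu, p, Theta) to (lambda, alpha, P) as in (27)."""
    if not (1.0 < p < 2.0 and mu > 0.0 and theta > 0.0):
        raise ValueError("requires 1 < p < 2, mu > 0 and Theta > 0")
    lam = mu ** (2.0 - p) / (theta * (2.0 - p))
    alpha = theta * (p - 1.0) * mu ** (p - 1.0)
    shape = (2.0 - p) / (p - 1.0)
    return lam, alpha, shape

def poisson_gamma_to_tweedie(lam, alpha, shape):
    """Inverse map, derived here by solving (27) for (mu, p, Theta)."""
    p = (shape + 2.0) / (shape + 1.0)
    mu = lam * shape * alpha
    theta = lam ** (1.0 - p) * (shape * alpha) ** (2.0 - p) / (2.0 - p)
    return mu, p, theta

lam, alpha, shape = tweedie_to_poisson_gamma(3.0, 1.5306, 14.8057)
prob_no_rain = math.exp(-lam)   # probability of an exact zero, exp(-lambda)
mu_back, p_back, theta_back = poisson_gamma_to_tweedie(lam, alpha, shape)
print(round(shape, 4), round(prob_no_rain, 4))
```

The round trip makes explicit that $\mu = \lambda P \alpha$, that is, the expected number of events times the mean intensity per event when $\alpha$ is read as the Gamma scale.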

Proposition 6. Based on the Tweedie distribution, the probability of receiving no rainfall at all is

\[
P(L = 0) = \exp\left[-\frac{\mu^{2-p}}{\Theta(2-p)}\right], \tag{28}
\]

and the probability of having a rainfall event is

\[
P(L > 0) = W(\lambda, \alpha, L, P)\exp\left[\frac{L}{(1-p)\mu^{p-1}} - \frac{\mu^{2-p}}{2-p}\right], \tag{29}
\]

where

\[
W(\lambda, \alpha, L, P) = \sum_{j=1}^{\infty}\frac{\lambda^{j}(\alpha L)^{jP}e^{-\lambda}}{j!\,\Gamma(jP)}. \tag{30}
\]

Proof. This follows by directly substituting the values of $\lambda$ and $\theta$, $k(\theta)$ into (16).

The function $W(\lambda, \alpha, L, P)$ is an example of Wright's generalized Bessel function; however, it cannot be expressed in terms of the more common Bessel functions. To evaluate it, the value of $j$ is determined for which the term $W_j$ reaches its maximum [15].

3. Parameter Estimation

We approximate the function $W(\lambda, \alpha, L, P) = \sum_{j=1}^{\infty}\left(\lambda^{j}(\alpha L)^{jP}e^{-\lambda}/(j!\,\Gamma(jP))\right) = \sum_{j=1}^{\infty} W_j$ following the procedure in [15], where the value of $j$ is determined for which $W_j$ reaches its maximum. We treat $j$ as continuous, so that $W_j$ is differentiated with respect to $j$ and the derivative is set to zero. So for $L > 0$ we have the following.

Lemma 7 (see [15]). The log maximum approximation of $W_j$ is given by

\[
\begin{aligned}
\log W_{\max} &= \frac{L^{2-p}}{(2-p)\Theta}\left[\log\frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)} + (1+P) - P\log P - (1-P)\log\frac{L^{2-p}}{(2-p)\Theta}\right] \\
&\quad - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2-p)\Theta},
\end{aligned} \tag{31}
\]

where $j_{\max} = L^{2-p}/((2-p)\Theta)$.

Proof.

\[
W(\lambda, \alpha, L, P) = \sum_{j=1}^{\infty}\frac{\lambda^{j}(\alpha L)^{jP-1}e^{-\lambda}}{j!\,\Gamma(jP)}
= \sum_{j=1}^{\infty}\frac{\lambda^{j}L^{jP-1}e^{-L/\tau}e^{-\lambda}}{j!\,\tau^{Pj}\,\Gamma(Pj)}, \quad \text{where } \tau = \frac{1}{\alpha}. \tag{32}
\]

Substituting the values of $\lambda$ and $\alpha$ in the above equation we have

\[
\begin{aligned}
W(\lambda, \alpha, L, P)
&= \sum_{j=1}^{\infty}\frac{\left(\mu^{2-p}/(\Theta(2-p))\right)^{j} L^{jP-1}\left[\Theta(p-1)\mu^{p-1}\right]^{jP} e^{-L/\tau}e^{-\lambda}}{j!\,\Gamma(Pj)} \\
&= e^{-L/\tau-\lambda}L^{-1}\sum_{j=1}^{\infty}\frac{\mu^{(2-p)j}\left(\Theta(p-1)\mu^{p-1}\right)^{jP}L^{jP}}{\Theta^{j}(2-p)^{j}\,j!\,\Gamma(jP)} \\
&= e^{-L/\tau-\lambda}L^{-1}\sum_{j=1}^{\infty}\frac{L^{jP}(p-1)^{jP}\mu^{(2-p)j+(p-1)jP}}{\Theta^{j(1-P)}(2-p)^{j}\,j!\,\Gamma(jP)}.
\end{aligned} \tag{33}
\]

The term $\mu^{(2-p)j+(p-1)jP}$ depends on the $L, p, P, \Theta$ values, so we maximize the summation

\[
\begin{aligned}
W(L, \Theta, P) &= \sum_{j=1}^{\infty}\frac{L^{jP}(p-1)^{jP}}{\Theta^{j(1-P)}(2-p)^{j}\,j!\,\Gamma(jP)} \\
&= \sum_{j=1}^{\infty}\frac{z^{j}}{j!\,\Gamma(jP)}, \quad \text{where } z = \frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)}, \\
&= \sum_{j=1}^{\infty} W_j.
\end{aligned} \tag{34}
\]

Considering $W_j$ we have

\[
\log W_j = j\log z - \log j! - \log\Gamma(Pj) = j\log z - \log\Gamma(j+1) - \log\Gamma(Pj). \tag{35}
\]

Using Stirling's approximation of Gamma functions we have

\[
\begin{aligned}
\log\Gamma(1+j) &\approx (1+j)\log(1+j) - (1+j) + \frac{1}{2}\log\left(\frac{2\pi}{1+j}\right), \\
\log\Gamma(Pj) &\approx Pj\log(Pj) - Pj + \frac{1}{2}\log\left(\frac{2\pi}{Pj}\right).
\end{aligned} \tag{36}
\]

And hence we have

\[
\log W_j \approx j\left[\log z + (1+P) - P\log P - (1-P)\log j\right] - \log(2\pi) - \frac{1}{2}\log P - \log j. \tag{37}
\]

For $1 < p < 2$ we have $P = (2-p)/(p-1) > 0$; hence the logarithms have positive arguments. Differentiating with respect to $j$ we have

\[
\frac{\partial\log W_j}{\partial j} \approx \log z - \frac{1}{j} - \log j - P\log(Pj) \approx \log z - \log j - P\log(Pj), \tag{38}
\]

where $1/j$ is ignored for large $j$. Solving $\partial\log W_j/\partial j = 0$ we have

\[
j_{\max} = \frac{L^{2-p}}{(2-p)\Theta}. \tag{39}
\]

Substituting $j_{\max}$ into $\log W_j$ to find the maximum approximation of $W_j$ we have

\[
\begin{aligned}
\log W_{\max} &= \frac{L^{2-p}}{(2-p)\Theta}\left[\log\frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)} + (1+P) - P\log P - (1-P)\log\frac{L^{2-p}}{(2-p)\Theta}\right] \\
&\quad - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2-p)\Theta}.
\end{aligned} \tag{40}
\]

Hence the result follows.

It can be observed that $\partial\log W_j/\partial j$ is monotonically decreasing; hence $\log W_j$ is strictly concave as a function of $j$. Therefore $W_j$ decays faster than geometrically on either side of $j_{\max}$ [15]. Therefore, if we are to estimate $W(L, \Theta, P)$ by $\hat{W}(L, \Theta, P) = \sum_{j=j_d}^{j_u} W_j$, the approximation error is bounded by a geometric sum:

\[
\begin{aligned}
W(L, \Theta, P) - \hat{W}(L, \Theta, P) &< W_{j_d-1}\frac{1 - r_l^{\,j_d-1}}{1 - r_l} + W_{j_u+1}\frac{1}{1 - r_u}, \\
r_l &= \exp\left(\frac{\partial\log W_j}{\partial j}\right)\bigg|_{j = j_d - 1}, \\
r_u &= \exp\left(\frac{\partial\log W_j}{\partial j}\right)\bigg|_{j = j_u + 1}.
\end{aligned} \tag{41}
\]

For quick and accurate evaluation of $W(\lambda, \alpha, L, P)$, the series is summed over only those terms which contribute significantly to the sum.
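The idea of summing only the significant terms can be illustrated with a series of the form $\sum_{j \geq 1} z^{j}/(j!\,\Gamma(jP))$, as in (34), evaluated on the log scale to avoid overflow. The sketch below is a simplified variant and not the procedure of [15]: it starts at $j = 1$, finds the largest term by scanning rather than from $j_{\max}$, and stops once terms fall a fixed amount below the maximum. The function name, `tail_drop`, and `max_terms` are hypothetical.

```python
import math

def log_series_sum(z, shape, tail_drop=40.0, max_terms=100000):
    """Return log of sum_{j>=1} z**j / (j! * Gamma(j * shape)) via log-sum-exp."""
    log_z = math.log(z)
    log_terms = []
    best = -math.inf
    for j in range(1, max_terms + 1):
        log_w = j * log_z - math.lgamma(j + 1) - math.lgamma(j * shape)
        log_terms.append(log_w)
        if log_w > best:
            best = log_w                     # still climbing towards the peak
        elif log_w < best - tail_drop:
            break                            # remaining terms are negligible
    # Factor out the largest term so that every exponential is at most 1.
    return best + math.log(sum(math.exp(t - best) for t in log_terms))

# Check against a known value: with shape = 1 and z = 1 the series equals the
# modified Bessel function I_1(2), approximately 1.5906368546.
check_value = math.exp(log_series_sum(1.0, 1.0))
large_value = log_series_sum(1.0e6, 0.8847)   # stays finite for large z
print(check_value)
```

Since the log terms rise to a single peak and then fall, stopping after a fixed drop below the peak discards only terms that cannot affect the sum at double precision.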

Generalized linear models extend the standard linear regression models to incorporate nonnormal response distributions and possibly nonlinear functions of the mean. The advantage of GLMs is that the fitting process maximizes the likelihood for the chosen distribution of a random variable $y$, and the choice is not restricted to normality, unlike linear regression [16].

The exponential dispersion models are the response distributions for the generalized linear models. Tweedie distributions are members of the exponential dispersion models, upon which the generalized linear models are based. Consequently, fitting a Tweedie distribution follows the framework of fitting a generalized linear model.

Lemma 8. In case of a canonical link function, the sufficient statistics for $\beta_j$ are $\sum_{i=1}^{n} y_i x_{ij}$.

Proof. For $n$ independent observations $y_i$ of the exponential dispersion model (16), the log-likelihood function is

\[ L(\beta) = \sum_{i=1}^{n} L_i = \sum_{i=1}^{n} \log f(y_i;\theta_i,\Theta) = \sum_{i=1}^{n}\frac{y_i\theta_i - k(\theta_i)}{\Theta} + \sum_{i=1}^{n}\log a(y_i,\Theta). \tag{42} \]

But $\theta_i = \sum_{j=1}^{p}\beta_j x_{ij}$; hence

\[ \sum_{i=1}^{n} y_i\theta_i = \sum_{i=1}^{n} y_i \sum_{j=1}^{p}\beta_j x_{ij} = \sum_{j=1}^{p}\beta_j \sum_{i=1}^{n} y_i x_{ij}, \tag{43} \]

so the data enter the part of the likelihood involving $\beta_j$ only through $\sum_{i=1}^{n} y_i x_{ij}$.

Proposition 9. Given that $y_i$ is distributed as (16), its distribution depends only on its first two moments, namely, $\mu_i$ and $\mathrm{Var}(y_i)$.

Proof. Let $g(\mu_i)$ be the link function of the GLM such that $\eta_i = \sum_{j=1}^{p}\beta_j x_{ij} = g(\mu_i)$. The likelihood equations are

\[ \frac{\partial L(\beta)}{\partial\beta_j} = \sum_{i=1}^{n}\frac{\partial L_i}{\partial\beta_j} = 0, \quad \forall j. \tag{44} \]

Using the chain rule we have

\[ \frac{\partial L_i}{\partial\beta_j} = \frac{\partial L_i}{\partial\theta_i}\,\frac{\partial\theta_i}{\partial\mu_i}\,\frac{\partial\mu_i}{\partial\eta_i}\,\frac{\partial\eta_i}{\partial\beta_j} = \frac{y_i-\mu_i}{\mathrm{Var}(y_i)}\,x_{ij}\,\frac{\partial\mu_i}{\partial\eta_i}. \tag{45} \]

Hence

\[ \frac{\partial L(\beta)}{\partial\beta_j} = \sum_{i=1}^{n}\frac{y_i-\mu_i}{\mathrm{Var}(y_i)}\,x_{ij}\,\frac{\partial\mu_i}{\partial\eta_i} = \sum_{i=1}^{n}\frac{y_i-\mu_i}{\Theta\mu_i^{p}}\,x_{ij}\,\frac{\partial\mu_i}{\partial\eta_i}. \tag{46} \]

Since $\mathrm{Var}(y_i) = \Theta V(\mu_i)$, the relationship between the mean and the variance characterizes the distribution.

Clearly a GLM only requires the first two moments of the response $y_i$. Hence, although full likelihood analysis of the Tweedie distribution is difficult because its density cannot be expressed in closed form for $1 < p < 2$, we can still fit a Tweedie family distribution. The likelihood is only required to estimate $p$ and $\Theta$ and to carry out diagnostic checks of the model.

Proposition 10. Under the standard regularity conditions, for large $n$, the maximum likelihood estimator $\hat{\beta}$ of $\beta$ for a generalized linear model is efficient and has an approximate normal distribution.

Proof. From the log-likelihood, the covariance matrix of the distribution is the inverse of the information matrix $J = E(-\partial^2 L(\beta)/\partial\beta_h\,\partial\beta_j)$. For a single observation,

\[ E\left(-\frac{\partial^2 L_i}{\partial\beta_h\,\partial\beta_j}\right) = E\left[\left(\frac{\partial L_i}{\partial\beta_h}\right)\left(\frac{\partial L_i}{\partial\beta_j}\right)\right] = E\left[\left(\frac{y_i-\mu_i}{\mathrm{Var}(y_i)}\,x_{ih}\,\frac{\partial\mu_i}{\partial\eta_i}\right)\left(\frac{y_i-\mu_i}{\mathrm{Var}(y_i)}\,x_{ij}\,\frac{\partial\mu_i}{\partial\eta_i}\right)\right] = \frac{x_{ih}x_{ij}}{\mathrm{Var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^{2}. \tag{47} \]

Hence

\[ E\left(-\frac{\partial^2 L(\beta)}{\partial\beta_h\,\partial\beta_j}\right) = \sum_{i=1}^{n}\frac{x_{ih}x_{ij}}{\mathrm{Var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^{2} = (X^{T}WX)_{hj}, \tag{48} \]

where $W = \mathrm{diag}[(1/\mathrm{Var}(y_i))(\partial\mu_i/\partial\eta_i)^{2}]$. Therefore $\hat{\beta}$ has an approximate $N[\beta, (X^{T}WX)^{-1}]$ distribution with $\widehat{\mathrm{Var}}(\hat{\beta}) = (X^{T}\hat{W}X)^{-1}$, where $\hat{W}$ is $W$ evaluated at $\hat{\beta}$.

To compute $\hat{\beta}$ we use the iteratively reweighted least squares algorithm proposed by Dobson and Barnett [17], where the iterations use the working weights

\[ \frac{w_i}{V(\mu_i)\,g'(\mu_i)^{2}}, \tag{49} \]

where $V(\mu_i) = \mu_i^{p}$.

However, estimating $p$ is more difficult than estimating $\beta$ and $\Theta$, so that most researchers working with Tweedie densities specify $p$ a priori. In this study we use the procedure in [15], where the maximum likelihood estimator of $p$ is obtained by directly maximizing the profile likelihood function. For any given value of $p$ we find the maximum likelihood estimates of $\beta$ and $\Theta$ and compute the log-likelihood function. This is repeated over several values of $p$ until we have a value of $p$ which maximizes the log-likelihood function.

Given the estimated values of $p$ and $\beta$, the unbiased estimator of $\Theta$ is given by

\[ \hat{\Theta} = \sum_{i=1}^{n}\frac{[L_i - \mu_i(\hat{\beta})]^{2}}{\mu_i(\hat{\beta})^{p}}. \tag{50} \]

Since for $1 < p < 2$ the Tweedie density cannot be expressed in closed form, it is recommended that the maximum likelihood estimate of $\Theta$ be computed iteratively from the full data [15].
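The iteratively reweighted least squares step with the working weights (49) can be sketched as follows for a fixed $p$. This is an illustrative sketch under stated assumptions, not the code used in the study: it assumes a log link (so that $g'(\mu) = 1/\mu$ and the working weight reduces to $\mu^{2-p}$), unit prior weights $w_i = 1$, and a design matrix whose first column is the intercept. The helper names `solve_linear`, `design_row`, and `irls_tweedie_log` are hypothetical.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def design_row(day):
    """Intercept plus one sine and one cosine term with a 365-day period."""
    ang = 2.0 * math.pi * day / 365.0
    return [1.0, math.sin(ang), math.cos(ang)]

def irls_tweedie_log(X, y, p, n_iter=50, tol=1e-10):
    """Fit beta for a Tweedie GLM with log link and fixed power p."""
    k = len(X[0])
    # Start from a constant mean equal to the sample mean of y.
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)
    for _ in range(n_iter):
        XtWX = [[0.0] * k for _ in range(k)]
        XtWz = [0.0] * k
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = math.exp(eta)
            w = mu ** (2.0 - p)        # 1 / (V(mu) * g'(mu)^2) with V = mu^p
            z = eta + (yi - mu) / mu   # working response for the log link
            for a in range(k):
                XtWz[a] += w * xi[a] * z
                for c in range(k):
                    XtWX[a][c] += w * xi[a] * xi[c]
        new_beta = solve_linear(XtWX, XtWz)
        step = max(abs(u - v) for u, v in zip(new_beta, beta))
        beta = new_beta
        if step < tol:
            break
    return beta
```

Profiling over $p$ would repeat this fit on a grid of $p$ values and compare the resulting log-likelihoods, which additionally requires evaluating the Tweedie density and is not shown here.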

4. Data and Model Fitting

4.1. Data Analysis. Daily rainfall data of Balaka district in Malawi covering the period 1995–2015 is used. The data was obtained from Meteorological Surveys of Malawi. Figure 1 shows a plot of the data.

In summary, the minimum value is 0 mm, which indicates that there was no rainfall on particular days, whereas the maximum amount is 123.7 mm. The mean rainfall for the whole period is 3.167 mm.

Figure 1: Daily rainfall amount for Balaka district (plot titled "Rainfall for Balaka for 10 years"; x-axis: Year, 1995–2015; y-axis: Amount).

Figure 2: Variance-mean relationship (x-axis: log(mean); y-axis: log(variance)).

We investigated the relationship between the variance and the mean of the data by plotting log(variance) against log(mean), as shown in Figure 2. From the figure we can observe a linear relationship between the logarithm of the variance and the logarithm of the mean, which can be expressed as

\[ \log(\text{Variance}) = \alpha + \beta\log(\text{mean}), \tag{51} \]

\[ \text{Variance} = A \cdot \text{mean}^{\beta}, \quad A \in \mathbb{R}. \tag{52} \]

Hence the variance can be expressed as some power $\beta \in \mathbb{R}$ of the mean, agreeing with the Tweedie variance function requirement.
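The log-log relationship in (51) and (52) can be estimated by ordinary least squares. The sketch below is illustrative only: the function name `fit_power_law` is hypothetical, and the grouping of the data into cells from which means and variances are computed is assumed to have been done beforehand.

```python
import math

def fit_power_law(means, variances):
    """Least-squares fit of log(variance) = alpha + beta * log(mean).

    Returns (A, beta) such that variance is approximately A * mean**beta.
    All means and variances must be strictly positive.
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta = sxy / sxx              # slope: the power of the mean
    alpha = ybar - beta * xbar    # intercept on the log scale
    return math.exp(alpha), beta
```

A fitted slope between 1 and 2 would be consistent with the Poisson-Gamma member of the Tweedie family considered in this study.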

4.2. Fitted Model. To model the daily rainfall data we use sine and cosine terms as predictors because of the cyclic nature and seasonality of rainfall. We have assumed that February ends on the 28th in all years so that every year has the same length in our modeling.

The log link function is given by

\[ \log\mu_i = a_0 + a_1\sin\left(\frac{2\pi i}{365}\right) + a_2\cos\left(\frac{2\pi i}{365}\right), \tag{53} \]

where $i = 1, 2, \ldots, 365$ corresponds to days of the year and $a_0, a_1, a_2$ are the coefficients of regression.

First, we estimate $p$ by maximizing the profile log-likelihood function. Figure 3 shows the graph of the profile log-likelihood function. As can be observed, the value of $p$ that maximizes the function is 1.5306.

From the results obtained after fitting the model, both the cyclic cosine and sine terms are important characteristics for daily rainfall (Table 1). The covariates were chosen to take into account the seasonal variations in the stochastic model.

Table 1: Estimated parameter values.

Parameter   Estimate   Std. error   t value    Pr(>|t|)
$a_0$       0.1653     0.0473       3.4930     0.0005 ***
$a_1$       0.9049     0.0572       15.81100   <2e-16 ***
$a_2$       2.0326     0.0622       32.6720    <2e-16 ***
$\Theta$    14.8057    -            -          -

Significance code: 0 '***'.

Figure 3: Profile likelihood (x-axis: $\xi$ index, 1.2 to 1.8; y-axis: $L$, with the 95% confidence interval marked).

The predicted $\hat{\mu}_i$, $\hat{p}$, and $\hat{\Theta}$ for each day depend only on the day's conditions, so that for each day $i$ we have

\[ \hat{\mu}_i = \exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right], \qquad \hat{p} = 1.5306, \qquad \hat{\Theta} = 14.8057. \tag{54} \]

From these estimated values we can calculate the parameters $(\hat{\lambda}_i, \hat{\alpha}_i, \hat{P})$ from the corresponding formulas above as

\[ \hat{\lambda}_i = \frac{1}{6.5716}\left(\exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.4694}, \]
\[ \hat{\alpha}_i = 7.4284\left(\exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.5306}, \qquad \hat{P} = 0.8847. \tag{55} \]

Comparing the actual means and the predicted means, for 2 July we have $\mu = 0.3820$ whereas $\hat{\mu} = 0.4333$; similarly, for 31 December we have $\mu = 9.0065$ and $\hat{\mu} = 10.6952$, respectively. Figure 4 shows the estimated mean and the actual mean, where the model generally behaves well.
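The daily quantities in (54) and (55) can be evaluated with a few lines of code. The sketch below is illustrative: it plugs the Table 1 estimates into the mean function (53) and into the parameter mapping (27), with the function names being hypothetical. Values computed this way with $\hat{\Theta} = 14.8057$ need not coincide exactly with the rounded coefficients printed in (55).

```python
import math

A0, A1, A2 = 0.1653, 0.9049, 2.0326   # regression coefficients from Table 1
P_INDEX = 1.5306                      # estimated Tweedie power p
THETA = 14.8057                       # estimated dispersion

def daily_mean(day):
    """Fitted mean rainfall for day-of-year `day` (1..365), as in (54)."""
    ang = 2.0 * math.pi * day / 365.0
    return math.exp(A0 + A1 * math.sin(ang) + A2 * math.cos(ang))

def poisson_gamma_params(mu, theta=THETA, p=P_INDEX):
    """Map (mu, Theta, p) to the Poisson rate, Gamma scale and Gamma shape."""
    lam = mu ** (2.0 - p) / (theta * (2.0 - p))
    alpha = theta * (p - 1.0) * mu ** (p - 1.0)
    shape = (2.0 - p) / (p - 1.0)
    return lam, alpha, shape

def prob_no_rain(mu, theta=THETA, p=P_INDEX):
    """P(L = 0) = exp(-lambda), as in (28)."""
    lam, _, _ = poisson_gamma_params(mu, theta, p)
    return math.exp(-lam)
```

For example, `prob_no_rain(daily_mean(365))` gives the modeled probability of a dry day on 31 December, and `daily_mean(183)` gives the fitted mean for 2 July.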

Figure 4: Actual versus predicted mean (plot titled "Rainfall means"; x-axis: Days of the year; y-axis: Mean; series: actual mean and predicted mean).

4.3. Goodness of Fit of the Model. Let the maximum likelihood estimate of $\theta_i$ be $\hat{\theta}_i$ for all $i$ and $\hat{\mu}$ the model's mean estimate. Let $\tilde{\theta}_i$ denote the estimate of $\theta_i$ for the saturated model, with corresponding $\tilde{\mu} = y_i$.

The goodness of fit is determined by the deviance, which is defined as

\[ -2\log\left[\frac{\text{maximum likelihood of the fitted model}}{\text{maximum likelihood of the saturated model}}\right] = -2\left[L(\hat{\mu};y) - L(y;y)\right] = 2\sum_{i=1}^{n}\frac{y_i\tilde{\theta}_i - k(\tilde{\theta}_i)}{\Theta} - 2\sum_{i=1}^{n}\frac{y_i\hat{\theta}_i - k(\hat{\theta}_i)}{\Theta} = 2\sum_{i=1}^{n}\frac{y_i(\tilde{\theta}_i - \hat{\theta}_i) - k(\tilde{\theta}_i) + k(\hat{\theta}_i)}{\Theta} = \frac{\mathrm{Dev}(y,\hat{\mu})}{\Theta}. \tag{56} \]

$\mathrm{Dev}(y,\hat{\mu})$ is called the deviance of the model; the greater the deviance, the poorer the fitted model, since maximizing the likelihood corresponds to minimizing the deviance.

In terms of Tweedie distributions with $1 < p < 2$, the deviance is

\[ \mathrm{Dev}_p = 2\sum_{i=1}^{n}\left(\frac{y_i^{2-p} - (2-p)\,y_i\,\mu_i^{1-p} + (1-p)\,\mu_i^{2-p}}{(1-p)(2-p)}\right). \tag{57} \]

Based on results from fitting the model, the residual deviance is 43144, which is less than the null deviance of 62955; this implies that the fitted model explains the data better than a null model.
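The deviance (57) is straightforward to compute once the fitted means are available. The following sketch is illustrative, with hypothetical function names; it handles exact zeros directly, since $y_i^{2-p} = 0$ when $y_i = 0$ and $1 < p < 2$.

```python
def tweedie_unit_deviance(y, mu, p):
    """Contribution of one observation to the deviance (57), for 1 < p < 2."""
    if not 1.0 < p < 2.0:
        raise ValueError("this form of the deviance assumes 1 < p < 2")
    num = y ** (2.0 - p) - (2.0 - p) * y * mu ** (1.0 - p) + (1.0 - p) * mu ** (2.0 - p)
    return 2.0 * num / ((1.0 - p) * (2.0 - p))

def tweedie_deviance(ys, mus, p):
    """Total deviance: the sum of the unit deviances over all observations."""
    return sum(tweedie_unit_deviance(y, mu, p) for y, mu in zip(ys, mus))
```

The unit deviance is zero when the fitted mean equals the observation and grows as the two move apart, so a null model with a single constant mean yields a larger total than a model that tracks the seasonal pattern.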

4.4. Diagnostic Check. The model diagnostic is considered as a way of residual analysis. The fitted model is difficult to assess, especially for days with no rainfall at all, as they produce spurious results and distracting patterns, similar to those observed by [15]. Since this is a nonnormal regression, the residuals are far from being normally distributed and from having equal variances, unlike in a normal linear regression. Here the residuals lie parallel to distinct values; hence it is difficult to make any meaningful decision about the fitted model (Figure 5).

Figure 5: Residuals of the model (x-axis: Year, 1995–2015; y-axis: Residuals).

Figure 6: Q-Q plot of the quantile residuals (plot titled "Normal Q-Q Plot"; x-axis: Theoretical quantiles; y-axis: Sample quantiles).

So we assess the model based on quantile residuals, which remove the pattern in discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.

The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.

Mathematically, let $a_i = \lim_{y\uparrow y_i}F(y;\mu_i,\Theta)$ and $b_i = F(y_i;\mu_i,\Theta)$, where $F$ is the cumulative distribution function corresponding to the probability density function $f(y;\mu,\Theta)$; then the randomized quantile residuals for $y_i$ are

\[ r_{q,i} = \Phi^{-1}(u_i), \tag{58} \]

with $u_i$ being a uniform random variable on $(a_i, b_i]$. The randomized quantile residuals are normally distributed, barring the variability in $\hat{\mu}$ and $\hat{\Theta}$.

Figure 6 shows the normal Q-Q plot of the quantile residuals; as can be observed, there are no large deviations from the straight line, only small deviations at the tails. The linearity observed indicates an acceptable fitted model.
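A randomized quantile residual as in (58) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used to produce Figure 6: the distribution function is evaluated through the compound Poisson-Gamma representation (a point mass $e^{-\lambda}$ at zero plus a Poisson mixture of Gamma distributions with shape $jP$ and scale $\alpha$), the incomplete gamma function is computed by a plain power series that is adequate only for moderate arguments, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x) by its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total) and n < 10000:
        term *= x / (a + n)
        total += term
        n += 1
    return min(1.0, total * math.exp(-x + a * math.log(x) - math.lgamma(a)))

def cpg_cdf(y, lam, alpha, shape, j_max=200):
    """CDF of a Poisson(lam) sum of Gamma(shape, scale=alpha) amounts."""
    if y < 0.0:
        return 0.0
    total = math.exp(-lam)      # probability of no rainfall event
    if y == 0.0:
        return total
    log_pois = -lam
    for j in range(1, j_max + 1):
        log_pois += math.log(lam) - math.log(j)   # log Poisson pmf at j
        total += math.exp(log_pois) * reg_lower_gamma(j * shape, y / alpha)
    return min(total, 1.0)

def quantile_residual(y, lam, alpha, shape, rng):
    """Randomized quantile residual: Phi^{-1}(u) with u drawn on (a_i, b_i]."""
    b = cpg_cdf(y, lam, alpha, shape)
    if y == 0.0:
        u = rng.uniform(0.0, b)   # the jump at zero is randomized
    else:
        u = b                     # continuous part: a_i equals b_i
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)
```

Only the zero observations are randomized, because the distribution is continuous for positive amounts; this is what removes the band of identical residuals produced by dry days.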

Figure 7: Simulated rainfall and observed rainfall (plot titled "Rainfall amount"; x-axis: Days of the year; y-axis: Amount; series: predicted and actual).

Figure 8: Probability of rainfall occurrence (plot titled "Probability of no rainfall"; x-axis: Days of the year; y-axis: Probability).

5. Simulation

The model is simulated to test whether it produces data with similar characteristics to the actual observed rainfall. The simulation is done for a period of two years, where one is the last year of the data (2015) and the other (2016) is a future prediction. A graphical comparison with the 2015 data is shown in Figure 7.

The different statistics of the simulated data and actual data are shown in Table 2 for comparison.

The main objective of simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model has shown that it works well in predicting the rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.
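A single day of synthetic rainfall can be drawn from the compound Poisson-Gamma representation as sketched below. This is a minimal illustration, not the simulation code used in the study: the number of rainfall events is drawn from a Poisson distribution by Knuth's multiplication method, each event amount is drawn from a Gamma distribution with shape $P$ and scale $\alpha$ as defined in (27), and the function names are hypothetical.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw a Poisson(lam) count by Knuth's method (suitable for small lam)."""
    limit = math.exp(-lam)
    count = 0
    prod = rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def simulate_day(mu, theta, p, rng):
    """Simulate one daily rainfall total with mean mu, dispersion theta, power p."""
    lam = mu ** (2.0 - p) / (theta * (2.0 - p))    # rate of rainfall events
    alpha = theta * (p - 1.0) * mu ** (p - 1.0)    # Gamma scale
    shape = (2.0 - p) / (p - 1.0)                  # Gamma shape P
    n_events = sample_poisson(lam, rng)
    # With no events the sum is empty and the daily total is exactly zero.
    return sum(rng.gammavariate(shape, alpha) for _ in range(n_events))
```

Calling `simulate_day` once per day with the fitted daily mean produces a synthetic series containing exact zeros on dry days and continuous positive amounts on wet days.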

However, the model performed poorly in predicting the probability of rainfall occurrence, as it underestimated this probability. It is suggested that the use of a truncated Fourier series, rather than a single sinusoid, may improve this estimation.

The model performed better in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.

It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model suggested by [4], as follows. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8); in addition, we can also tell that the probability of rain on a day is higher if the previous day was wet, which is the basis of precipitation models that involve the Markov process. From Figure 7 we can also observe variation of rainfall intensity based on time of the season.

Table 2: Data statistics.

                       Min    1st Qu.  Median  Mean   3rd Qu.  Max
Predicted data         0.00   0.00     0.00    3.314  0.00     116.5
Actual data [10 yrs]   0.00   0.00     0.00    3.183  0.300    123.7
Actual data [2015]     0.00   0.00     0.00    3.328  0.00     84.5

In addition, the model allows modeling of exact zeros in the data and is able to predict the probability of no rainfall event simultaneously.

6. Conclusion

A daily stochastic rainfall model was developed based on a compound Poisson process, where rainfall events follow a Poisson distribution and the intensity, which is independent of the events, follows a Gamma distribution. Unlike much of the research on precipitation modeling, in which two models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool to study the impact of weather on a variety of systems, including ecosystem risk assessment, drought predictions, and weather derivatives, as we are able to simulate synthetic rainfall data. The model provides mechanisms for understanding the fine scale structure, such as the number and mean of rainfall events, mean daily rainfall, and probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.

The model developed can easily be used for forecasting future events and, in terms of weather derivatives, the weather index can be derived from simulating a sample path by summing up daily precipitation in the relevant accumulation period. Rather than developing a weather index which is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.

Rainfall data is generally zero inflated in that the amount of rainfall received on a day can be zero with a positive probability but continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Hence expressing it in terms of a Tweedie distribution makes estimating the parameters easy. In addition, Tweedie distributions belong to the exponential family of distributions upon which generalized linear models are based; hence there is an already existing framework in place for fitting and diagnostic testing of the model.

The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps in reducing the underestimation of overdispersion of the data, which is also common in the aforementioned models.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.

References

[1] A. Hussain, "Stochastic modeling of rainfall processes: a Markov chain-mixed exponential model for rainfalls in different climatic conditions".

[2] M. Cao, A. Li, and J. Z. Wei, "Precipitation modeling and contract valuation: A frontier in weather derivatives," The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.

[3] G. Leobacher and P. Ngare, "On modelling and pricing rainfall derivatives with seasonality," Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.

[4] M. Odening, O. Musshoff, and W. Xu, "Analysis of rainfall derivatives using daily precipitation models: Opportunities and pitfalls," Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.

[5] D. S. Wilks, "Multisite generalization of a daily stochastic precipitation generation model," Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.

[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, "Rainfall modelling using Poisson-cluster processes: a review of developments," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.

[7] R. Carmona and P. Diko, "Pricing precipitation based derivatives," International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.

[8] F. E. Benth and J. S. Benth, Modeling and pricing in financial markets for weather derivatives, vol. 17, World Scientific, 2012.

[9] B. Lopez Cabrera, M. Odening, and M. Ritter, "Pricing rainfall futures at the CME," Technical report, Humboldt University, Collaborative Research Center.

[10] T. I. Harrold, A. Sharma, and S. J. Sheather, "A nonparametric model for stochastic generation of daily rainfall occurrence," Water Resources Research, vol. 39, no. 10, 2003.

[11] D. Lord, "Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter," Accident Analysis & Prevention, vol. 38, no. 4, pp. 751–766, 2006.

[12] J.-P. Wang, "Estimating species richness by a Poisson-compound gamma model," Biometrika, vol. 97, no. 3, pp. 727–740, 2010.

[13] C. S. Withers and S. Nadarajah, "On the compound Poisson-gamma distribution," Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.

[14] E. W. Frees, R. Derrig, and G. Meyers, Regression modeling with actuarial and financial applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.

[15] P. K. Dunn and G. K. Smyth, "Evaluation of Tweedie exponential dispersion model densities by Fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.

[16] A. Agresti, Foundations of linear and generalized linear models, John Wiley & Sons, 2015.

[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.


The relationship $\mu = dk(\theta)/d\theta$ is invertible, so that $\theta$ can be expressed as a function of $\mu$; as such we have $\mathrm{Var}(Y) = \Theta V(\mu)$, where $V(\mu)$ is called the variance function.

Definition 4. The family of exponential dispersion models whose variance functions are of the form $V(\mu) = \mu^{p}$ for $p \in (-\infty, 0] \cup [1, \infty)$ are called Tweedie family distributions.

Examples are as follows: for $p = 0$ we have a normal distribution; for $p = 1$ and $\Theta = 1$ a Poisson distribution; for $p = 2$ a Gamma distribution; and for $p = 3$ an inverse Gaussian distribution. Tweedie densities cannot be expressed in closed form (apart from the examples above) but can instead be identified by their cumulant generating functions.

From $\mathrm{Var}(Y) = \Theta\,(d^{2}k(\theta)/d\theta^{2})$, for the Tweedie family distribution we have

\[ \mathrm{Var}(Y) = \Theta\,\frac{d^{2}k(\theta)}{d\theta^{2}} = \Theta V(\mu) = \Theta\mu^{p}. \tag{20} \]

Hence we can solve for $\mu$ and $k(\theta)$ as follows:

\[ \mu = \frac{dk(\theta)}{d\theta}, \qquad \frac{d\mu}{d\theta} = \mu^{p} \implies \int\frac{d\mu}{\mu^{p}} = \int d\theta, \qquad \theta = \begin{cases} \dfrac{\mu^{1-p}}{1-p}, & p \neq 1, \\ \log\mu, & p = 1, \end{cases} \tag{21} \]

by equating the constants of integration above to zero. For $p \neq 1$ we have $\mu = [(1-p)\theta]^{1/(1-p)}$, so that

\[ \int dk(\theta) = \int [(1-p)\theta]^{1/(1-p)}\,d\theta, \qquad k(\theta) = \frac{[(1-p)\theta]^{(2-p)/(1-p)}}{2-p} = \frac{\mu^{2-p}}{2-p}, \quad p \neq 2. \tag{22} \]
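The expressions for $\theta$ and $k(\theta)$ in (21) and (22) can be checked numerically: the first derivative of $k$ should return $\mu$ and the second derivative should return $\mu^{p}$. The sketch below is illustrative, with hypothetical function names, and covers the case $p \neq 1$, $p \neq 2$ only.

```python
def theta_of_mu(mu, p):
    """Canonical parameter theta = mu^(1-p) / (1-p), valid for p != 1."""
    return mu ** (1.0 - p) / (1.0 - p)

def mu_of_theta(theta, p):
    """Inverse relationship mu = [(1-p) * theta]^(1/(1-p))."""
    return ((1.0 - p) * theta) ** (1.0 / (1.0 - p))

def kappa(theta, p):
    """Cumulant function k(theta) = [(1-p)*theta]^((2-p)/(1-p)) / (2-p)."""
    return ((1.0 - p) * theta) ** ((2.0 - p) / (1.0 - p)) / (2.0 - p)

def first_derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
```

For $1 < p < 2$ the canonical parameter is negative for every positive mean, which keeps the base $(1-p)\theta$ of the power positive.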

Proposition 5. The cumulant generating function of a Tweedie distribution for $1 < p < 2$ is

\[ \log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2-p}\left[\left(1 + t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{23} \]

Proof. From (16) the moment generating function is given by

\[ M_Y(t) = \int \exp(ty)\,a(y,\Theta)\exp\left\{\frac{1}{\Theta}[y\theta - k(\theta)]\right\}dy = \int a(y,\Theta)\exp\left\{\frac{1}{\Theta}[y(\theta + t\Theta) - k(\theta)]\right\}dy \]
\[ = \int a(y,\Theta)\exp\left(\frac{y(\theta + t\Theta) - k(\theta + t\Theta)}{\Theta} + \frac{k(\theta + t\Theta) - k(\theta)}{\Theta}\right)dy \]
\[ = \exp\left(\frac{k(\theta + t\Theta) - k(\theta)}{\Theta}\right)\int a(y,\Theta)\exp\left(\frac{y(\theta + t\Theta) - k(\theta + t\Theta)}{\Theta}\right)dy = \exp\left\{\frac{1}{\Theta}[k(\theta + t\Theta) - k(\theta)]\right\}, \tag{24} \]

since the remaining integrand is the density (16) with canonical parameter $\theta + t\Theta$ and therefore integrates to one.

Hence the cumulant generating function is

\[ \log M_Y(t) = \frac{1}{\Theta}[k(\theta + t\Theta) - k(\theta)]. \tag{25} \]

For $1 < p < 2$ we substitute $\theta$ and $k(\theta)$ to have

\[ \log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2-p}\left[\left(1 + t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{26} \]

By comparing the cumulant generating functions in Lemma 1 and Proposition 5, the compound Poisson process can be thought of as a Tweedie distribution with parameters $(\lambda, \alpha, P)$ expressed as follows:

\[ \lambda = \frac{\mu^{2-p}}{\Theta(2-p)}, \qquad \alpha = \Theta(p-1)\mu^{p-1}, \qquad P = \frac{2-p}{p-1}. \tag{27} \]

The requirement that the Gamma shape parameter $P$ be positive implies that only Tweedie distributions with $1 < p < 2$ can represent the Poisson-Gamma compound process. In addition, $\lambda > 0$ and $\alpha > 0$ imply $\mu > 0$ and $\Theta > 0$.
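The mapping (27) can be verified numerically by comparing two cumulant generating functions. The sketch below is illustrative: it evaluates the Tweedie form (25) with $\theta$ and $k(\theta)$ from (21) and (22), and compares it with the standard cumulant generating function of a Poisson sum of Gamma amounts, $\lambda[(1-\alpha t)^{-P} - 1]$ for $t < 1/\alpha$, where $\alpha$ is taken to be the Gamma scale. The function names are hypothetical.

```python
def tweedie_to_poisson_gamma(mu, disp, p):
    """Map (mu, Theta, p) with 1 < p < 2 to (lambda, alpha, P) as in (27)."""
    lam = mu ** (2.0 - p) / (disp * (2.0 - p))
    alpha = disp * (p - 1.0) * mu ** (p - 1.0)
    shape = (2.0 - p) / (p - 1.0)
    return lam, alpha, shape

def tweedie_cgf(t, mu, disp, p):
    """(1/Theta) * [k(theta + t*Theta) - k(theta)], as in (25)."""
    theta = mu ** (1.0 - p) / (1.0 - p)

    def k(x):
        return ((1.0 - p) * x) ** ((2.0 - p) / (1.0 - p)) / (2.0 - p)

    return (k(theta + t * disp) - k(theta)) / disp

def poisson_gamma_cgf(t, lam, alpha, shape):
    """CGF of a Poisson(lam) sum of Gamma(shape, scale=alpha) variables."""
    return lam * ((1.0 - alpha * t) ** (-shape) - 1.0)
```

Both functions are defined only for $t < 1/\alpha$; within that range they agree to floating-point precision, and the product $\lambda P \alpha$ recovers the mean $\mu$.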

Proposition 6. Based on the Tweedie distribution, the probability of receiving no rainfall at all is

\[ P(L = 0) = \exp\left[-\frac{\mu^{2-p}}{\Theta(2-p)}\right] \tag{28} \]

and the probability of having a rainfall event is

\[ P(L > 0) = W(\lambda,\alpha,L,P)\exp\left[\frac{L}{(1-p)\mu^{p-1}} - \frac{\mu^{2-p}}{2-p}\right], \tag{29} \]

where

\[ W(\lambda,\alpha,L,P) = \sum_{j=1}^{\infty}\frac{\lambda^{j}(\alpha L)^{jP}e^{-\lambda}}{j!\,\Gamma(jP)}. \tag{30} \]

Proof. This follows by directly substituting the values of $\lambda$ and $\theta$, $k(\theta)$ into (16).

The function $W(\lambda,\alpha,L,P)$ is an example of Wright's generalized Bessel function; however, it cannot be expressed in terms of the more common Bessel functions. To evaluate it, the value of $j$ is determined for which the function $W_j$ reaches the maximum [15].

3. Parameter Estimation

We approximate the function $W(\lambda,\alpha,L,P) = \sum_{j=1}^{\infty}\lambda^{j}(\alpha L)^{jP}e^{-\lambda}/(j!\,\Gamma(jP)) = \sum_{j=1}^{\infty}W_j$ following the procedure by [15], where the value of $j$ is determined for which $W_j$ reaches its maximum. We treat $j$ as continuous, so that $W_j$ is differentiated with respect to $j$ and the derivative is set to zero. So for $L > 0$ we have the following.

Lemma 7 (see [15]). The log maximum approximation of $W_j$ is given by

\[ \log W_{\max} = \frac{L^{2-p}}{(2-p)\Theta}\left[\log\frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)} + (1+P) - P\log P - (1-P)\log\frac{L^{2-p}}{(2-p)\Theta}\right] - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2-p)\Theta}, \tag{31} \]

where $j_{\max} = L^{2-p}/((2-p)\Theta)$.

Proof.

\[ W(\lambda,\alpha,L,P) = \sum_{j=1}^{\infty}\frac{\lambda^{j}(\alpha L)^{jP-1}e^{-\lambda}}{j!\,\Gamma(jP)} = \sum_{j=1}^{\infty}\frac{\lambda^{j}L^{jP-1}e^{-L\tau}e^{-\lambda}}{j!\,\tau^{Pj}\,\Gamma(Pj)}, \quad \text{where } \tau = \frac{1}{\alpha}. \tag{32} \]

Substituting the values of $\lambda$ and $\alpha$ in the above equation, we have

\[ W(\lambda,\alpha,L,P) = \sum_{j=1}^{\infty}\left(\frac{\mu^{2-p}}{\Theta(2-p)}\right)^{j}\frac{L^{jP-1}\left[\Theta(p-1)\mu^{p-1}\right]^{jP}e^{-L\tau}e^{-\lambda}}{j!\,\Gamma(Pj)} \]
\[ = e^{-L\tau-\lambda}L^{-1}\sum_{j=1}^{\infty}\frac{\mu^{(2-p)j}\left(\Theta(p-1)\mu^{p-1}\right)^{jP}L^{jP}}{\Theta^{j}(2-p)^{j}\,j!\,\Gamma(jP)} \]
\[ = e^{-L\tau-\lambda}L^{-1}\sum_{j=1}^{\infty}\frac{L^{jP}(p-1)^{jP}\mu^{(2-p)j+(p-1)jP}}{\Theta^{j(1-P)}(2-p)^{j}\,j!\,\Gamma(jP)}. \tag{33} \]

The term $\mu^{(2-p)j+(p-1)jP}$ depends on the $L$, $p$, $P$, $\Theta$ values, so we maximize the summation

\[ W(L,\Theta,P) = \sum_{j=1}^{\infty}\frac{L^{jP}(p-1)^{jP}}{\Theta^{j(1-P)}(2-p)^{j}\,j!\,\Gamma(jP)} = \sum_{j=1}^{\infty}\frac{z^{j}}{j!\,\Gamma(jP)} = \sum_{j=1}^{\infty}W_j, \quad \text{where } z = \frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)}. \tag{34} \]

Considering $W_j$, we have

\[ \log W_j = j\log z - \log j! - \log\Gamma(Pj) = j\log z - \log\Gamma(j+1) - \log\Gamma(Pj). \tag{35} \]

Using Stirling's approximation of Gamma functions, we have

\[ \log\Gamma(1+j) \approx (1+j)\log(1+j) - (1+j) + \frac{1}{2}\log\left(\frac{2\pi}{1+j}\right), \qquad \log\Gamma(Pj) \approx Pj\log(Pj) - Pj + \frac{1}{2}\log\left(\frac{2\pi}{Pj}\right). \tag{36} \]

And hence we have

\[ \log W_j \approx j\left[\log z + (1+P) - P\log P - (1-P)\log j\right] - \log(2\pi) - \frac{1}{2}\log P - \log j. \tag{37} \]

For $1 < p < 2$ we have $P = (2-p)/(p-1) > 0$; hence the logarithms have positive arguments. Differentiating with respect to $j$, we have

\[ \frac{\partial\log W_j}{\partial j} \approx \log z - \frac{1}{j} - \log j - P\log(Pj) \approx \log z - \log j - P\log(Pj), \tag{38} \]

Journal of Probability and Statistics 7

where 1119895 is ignored for large 119895 Solving for (120597 log119882119895)120597119895 = 0we have

119895max = 1198712minus119901(2 minus 119901)Θ (39)

Substituting 119895max in log119882119895 to find the maximum approxima-tion of119882119895 we have

log119882max = 1198712minus119901(2 minus 119901)Θ [log 119871119875 (119901 minus 1)119875Θ(1minus119875) (2 minus 119901) + (1 + 119875)minus 119875 log119875 minus (1 minus 119875) log 1198712minus119901(2 minus 119901)Θ] minus log (2120587) minus 12sdot log119875 minus log 1198712minus119901(2 minus 119901)Θ

(40)

Hence the result follows

It can be observed that 120597119882119895120597119895 is monotonically decreas-ing hence log119882119895 is strictly convex as a function of 119895Therefore 119882119895 decays faster than geometrically on either sideof 119895max [15] Therefore if we are to estimate 119882(119871Θ 119875) by(119871 Θ 119875) = sum119895119906

119895=119895119889119882119895 the approximation error is bounded

by geometric sum

119882(119871Θ 119875) minus (119871 Θ 119875)lt 119882119895119889minus1

1 minus 119903119895119889minus11198971 minus 119903119897 + 119882119895119906+1

11 minus 119903119906 119903119897 = exp(120597119882119895120597119895 )100381610038161003816100381610038161003816100381610038161003816119895 = 119895119889 minus 1119903119906 = exp(120597119882119895120597119895 )100381610038161003816100381610038161003816100381610038161003816119895 = 119895119906 + 1

(41)

For quick and accurate evaluation of119882(120582 120572 119871 119875) the seriesis summed for only those terms in the series which contributesignificantly to the sum

Generalized linear models extend the standard linear regression models to incorporate nonnormal response distributions and possibly nonlinear functions of the mean. The advantage of GLMs is that the fitting process maximizes the likelihood for the chosen distribution of a random variable $y$, and the choice is not restricted to normality, unlike linear regression [16].

The exponential dispersion models are the response distributions for the generalized linear models. Tweedie distributions are members of the exponential dispersion models upon which the generalized linear models are based. Consequently, fitting a Tweedie distribution follows the framework of fitting a generalized linear model.

Lemma 8. In case of a canonical link function, the sufficient statistics for $\beta_j$ are $\sum_{i=1}^{n} y_i x_{ij}$.

Proof. For $n$ independent observations $y_i$ of the exponential dispersion model (16), the log-likelihood function is

$$L(\beta) = \sum_{i=1}^{n} L_i = \sum_{i=1}^{n} \log f(y_i; \theta_i, \Theta) = \sum_{i=1}^{n} \frac{y_i\theta_i - k(\theta_i)}{\Theta} + \sum_{i=1}^{n} \log a(y_i, \Theta). \tag{42}$$

But $\theta_i = \sum_{j=1}^{p} \beta_j x_{ij}$; hence

$$\sum_{i=1}^{n} y_i\theta_i = \sum_{i=1}^{n} y_i \sum_{j=1}^{p} \beta_j x_{ij} = \sum_{j=1}^{p} \beta_j \sum_{i=1}^{n} y_i x_{ij}. \tag{43}$$

The data therefore enter the part of the log-likelihood that involves $\beta_j$ only through $\sum_{i=1}^{n} y_i x_{ij}$.

Proposition 9. Given that $y_i$ is distributed as (16), its distribution depends only on its first two moments, namely, $\mu_i$ and $\operatorname{Var}(y_i)$.

Proof. Let $g(\mu_i)$ be the link function of the GLM such that $\eta_i = \sum_{j=1}^{p} \beta_j x_{ij} = g(\mu_i)$. The likelihood equations are

$$\frac{\partial L(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} \frac{\partial L_i}{\partial \beta_j} = 0 \quad \forall j. \tag{44}$$

Using the chain rule, we have

$$\frac{\partial L_i}{\partial \beta_j} = \frac{\partial L_i}{\partial \theta_i}\,\frac{\partial \theta_i}{\partial \mu_i}\,\frac{\partial \mu_i}{\partial \eta_i}\,\frac{\partial \eta_i}{\partial \beta_j} = \frac{y_i - \mu_i}{\operatorname{Var}(y_i)}\,x_{ij}\,\frac{\partial \mu_i}{\partial \eta_i}. \tag{45}$$

Hence

$$\frac{\partial L(\beta)}{\partial \beta_j} = \sum_{i=1}^{n}\frac{y_i - \mu_i}{\operatorname{Var}(y_i)}\,x_{ij}\,\frac{\partial \mu_i}{\partial \eta_i} = \sum_{i=1}^{n}\frac{y_i - \mu_i}{\Theta\mu_i^{p}}\,x_{ij}\,\frac{\partial \mu_i}{\partial \eta_i}. \tag{46}$$

Since $\operatorname{Var}(y_i) = \Theta V(\mu_i)$, the relationship between the mean and variance characterizes the distribution.
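The individual factors of the chain rule in (45) can be written out explicitly. The following is a sketch that relies on the standard exponential dispersion identities $\mu_i = k'(\theta_i)$ and $\operatorname{Var}(y_i) = \Theta k''(\theta_i)$, which are assumed here rather than quoted from the text above:

```latex
\begin{aligned}
\frac{\partial L_i}{\partial \theta_i} &= \frac{y_i - k'(\theta_i)}{\Theta} = \frac{y_i - \mu_i}{\Theta}, \qquad
\frac{\partial \mu_i}{\partial \theta_i} = k''(\theta_i) = \frac{\operatorname{Var}(y_i)}{\Theta}, \qquad
\frac{\partial \eta_i}{\partial \beta_j} = x_{ij}, \\
\frac{\partial L_i}{\partial \beta_j} &= \frac{y_i - \mu_i}{\Theta}\cdot\frac{\Theta}{\operatorname{Var}(y_i)}\cdot\frac{\partial \mu_i}{\partial \eta_i}\cdot x_{ij}
= \frac{(y_i - \mu_i)\,x_{ij}}{\operatorname{Var}(y_i)}\,\frac{\partial \mu_i}{\partial \eta_i}.
\end{aligned}
```

With the Tweedie variance $\operatorname{Var}(y_i) = \Theta\mu_i^{p}$ and a log link, for which $\partial\mu_i/\partial\eta_i = \mu_i$, each score contribution reduces to $(y_i - \mu_i)\,x_{ij}/(\Theta\mu_i^{p-1})$.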

Clearly a GLM only requires the first two moments of the response $y_i$; hence, despite the difficulty of full likelihood analysis of the Tweedie distribution, whose density cannot be expressed in closed form for $1 < p < 2$, we can still fit a Tweedie distribution family. The likelihood is only required to estimate $p$ and $\Theta$ and for diagnostic checks of the model.

Proposition 10. Under the standard regularity conditions, for large $n$, the maximum likelihood estimator $\hat\beta$ of $\beta$ for a generalized linear model is efficient and has an approximate normal distribution.

Proof. From the log-likelihood, the covariance matrix of the distribution is the inverse of the information matrix $J$ with elements $E(-\partial^2 L(\beta)/\partial\beta_h\partial\beta_j)$. For a single observation,

$$E\left(-\frac{\partial^2 L_i}{\partial\beta_h\partial\beta_j}\right) = E\left[\left(\frac{\partial L_i}{\partial\beta_h}\right)\left(\frac{\partial L_i}{\partial\beta_j}\right)\right] = E\left[\left(\frac{y_i-\mu_i}{\operatorname{Var}(y_i)}\,x_{ih}\,\frac{\partial\mu_i}{\partial\eta_i}\right)\left(\frac{y_i-\mu_i}{\operatorname{Var}(y_i)}\,x_{ij}\,\frac{\partial\mu_i}{\partial\eta_i}\right)\right] = \frac{x_{ih}x_{ij}}{\operatorname{Var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^{2}. \tag{47}$$

Hence

$$E\left(-\frac{\partial^2 L(\beta)}{\partial\beta_h\partial\beta_j}\right) = \sum_{i=1}^{n}\frac{x_{ih}x_{ij}}{\operatorname{Var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^{2}, \quad\text{that is,}\quad J = X^{T}WX, \tag{48}$$

where $W = \operatorname{diag}\left[(1/\operatorname{Var}(y_i))(\partial\mu_i/\partial\eta_i)^{2}\right]$. Therefore $\hat\beta$ has an approximate $N[\beta, (X^{T}WX)^{-1}]$ distribution with $\widehat{\operatorname{Var}}(\hat\beta) = (X^{T}\widehat{W}X)^{-1}$, where $\widehat{W}$ is $W$ evaluated at $\hat\beta$.

To compute $\hat\beta$ we use the iteratively reweighted least squares algorithm as presented by Dobson and Barnett [17], where the iterations use the working weights

$$\frac{w_i}{V(\mu_i)\,g'(\mu_i)^{2}}, \tag{49}$$

where $w_i$ is the prior weight of observation $i$, $g'$ is the derivative of the link function, and $V(\mu_i) = \mu_i^{p}$.

However, estimating $p$ is more difficult than estimating $\beta$ and $\Theta$, so most researchers working with Tweedie densities specify $p$ a priori. In this study we use the procedure in [15], where the maximum likelihood estimator of $p$ is obtained by directly maximizing the profile likelihood function. For any given value of $p$ we find the maximum likelihood estimates of $\beta$ and $\Theta$ and compute the log-likelihood function. This is repeated over a range of values of $p$ until we find the value which maximizes the log-likelihood function.

Given the estimated values of $p$ and $\beta$, the unbiased estimator of $\Theta$ is given by

$$\hat\Theta = \frac{1}{n-k}\sum_{i=1}^{n}\frac{[L_i - \mu_i(\hat\beta)]^{2}}{\mu_i(\hat\beta)^{p}}, \tag{50}$$

where $L_i$ is the observed rainfall on day $i$ and $k$ is the number of estimated regression coefficients.

Since for $1 < p < 2$ the Tweedie density cannot be expressed in closed form, it is recommended that the maximum likelihood estimate of $\Theta$ be computed iteratively from the full data [15].
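For a fixed value of $p$, the iteratively reweighted least squares step and the moment estimator of $\Theta$ can be sketched in pure Python. This is a minimal sketch, not the authors' code: it assumes a log link (so that $g'(\mu) = 1/\mu$ and the working weight becomes $\mu^{2-p}$), unit prior weights, an intercept in the first column of the design matrix, and division of the Pearson sum by the residual degrees of freedom $n - k$. The function names are hypothetical.

```python
import math


def solve_linear(A, b):
    """Solve the small dense system A x = b by Gaussian elimination."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def irls_tweedie_log_link(X, y, p, max_iter=100, tol=1e-10):
    """Fit beta for a log-link GLM with variance function V(mu) = mu**p.

    Assumes the first column of X is an intercept and mean(y) > 0.
    """
    k = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)
    for _ in range(max_iter):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        # Log link: g'(mu) = 1/mu, so 1 / (V(mu) g'(mu)^2) = mu**(2 - p).
        w = [m ** (2.0 - p) for m in mu]
        # Working response: eta + (y - mu) * g'(mu).
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(wi * r[a] * r[b] for wi, r in zip(w, X))
                 for b in range(k)] for a in range(k)]
        xtwz = [sum(wi * r[a] * zi for wi, r, zi in zip(w, X, z))
                for a in range(k)]
        new_beta = solve_linear(xtwx, xtwz)
        step = max(abs(nb - b) for nb, b in zip(new_beta, beta))
        beta = new_beta
        if step < tol:
            break
    return beta


def pearson_dispersion(X, y, beta, p):
    """Pearson-type estimate of the dispersion, divided by n - k."""
    mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    pearson = sum((yi - m) ** 2 / m ** p for yi, m in zip(y, mu))
    return pearson / (len(y) - len(beta))
```

Repeating this fit over a grid of $p$ values and comparing the resulting log-likelihoods would give the profile likelihood search described above; evaluating the Tweedie log-likelihood itself is outside the scope of this sketch.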

4. Data and Model Fitting

4.1. Data Analysis. Daily rainfall data of Balaka district in Malawi covering the period 1995–2015 is used. The data was obtained from Meteorological Surveys of Malawi. Figure 1 shows a plot of the data.

In summary, the minimum value is 0 mm, which indicates that there was no rainfall on particular days, whereas the maximum amount is 123.7 mm. The mean rainfall for the whole period is 3.167 mm.

Figure 1: Daily rainfall amount for Balaka district (panel title: "Rainfall for Balaka for 10 years"; x-axis: year, 1995 to 2015; y-axis: amount).

Figure 2: Variance-mean relationship (x-axis: log(mean); y-axis: log(variance)).

We investigated the relationship between the variance and the mean of the data by plotting log(variance) against log(mean), as shown in Figure 2. From the figure we can observe a linear relationship between the logarithm of the variance and the logarithm of the mean, which can be expressed as

$$\log(\text{Variance}) = \alpha + \beta \log(\text{mean}), \tag{51}$$

$$\text{Variance} = A \cdot \text{mean}^{\beta}, \quad A \in \mathbb{R}. \tag{52}$$

Hence the variance can be expressed as some power $\beta \in \mathbb{R}$ of the mean, agreeing with the Tweedie variance function requirement.
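The straight-line fit in (51) can be sketched as an ordinary least squares regression on the log scale. This is an illustrative sketch: the function name `fit_power_law` is hypothetical, and how the data are grouped to obtain (mean, variance) pairs, for example one pair per calendar day or per month, is an assumption, since the text does not state it.

```python
import math


def fit_power_law(means, variances):
    """Least-squares fit of log(variance) = alpha + beta * log(mean).

    Both inputs must be strictly positive; groups with zero mean or zero
    variance have to be dropped beforehand.
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta
```

A fitted slope between 1 and 2 would be consistent with the Poisson-Gamma member of the Tweedie family, and $A = e^{\alpha}$ recovers the multiplicative constant in (52).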

4.2. Fitted Model. To model the daily rainfall data we use sine and cosine terms as predictors, due to the cyclic nature and seasonality of rainfall. We have assumed that February ends on the 28th in all years so that every year has the same length in our modeling.

The link function is logarithmic and is given by

$$\log\mu_i = a_0 + a_1\sin\left(\frac{2\pi i}{365}\right) + a_2\cos\left(\frac{2\pi i}{365}\right), \tag{53}$$

where $i = 1, 2, \ldots, 365$ corresponds to days of the year and $a_0, a_1, a_2$ are the coefficients of regression.

In the first place we estimate $p$ by maximizing the profile log-likelihood function. Figure 3 shows the graph of the profile log-likelihood function. As can be observed, the value of $p$ that maximizes the function is 1.5306.

From the results obtained after fitting the model, both the cyclic cosine and sine terms are important characteristics for daily rainfall (Table 1). The covariates were chosen to take into account the seasonal variations in the stochastic model.

Table 1: Estimated parameter values.

| Parameter | Estimate | Std. error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| $a_0$ | 0.1653 | 0.0473 | 3.4930 | 0.0005*** |
| $a_1$ | 0.9049 | 0.0572 | 15.8110 | <2e−16*** |
| $a_2$ | 2.0326 | 0.0622 | 32.6720 | <2e−16*** |
| $\hat\Theta$ | 14.8057 | - | - | - |

Signif. code: 0 "***".

Figure 3: Profile likelihood (x-axis: $\xi$ index, 1.2 to 1.8; y-axis: $L$; 95% confidence interval marked).

The predicted $\hat\mu_i$, $\hat p$, $\hat\Theta$ for each day depend only on the day's conditions, so that for each day $i$ we have

$$\hat\mu_i = \exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right], \qquad \hat p = 1.5306, \qquad \hat\Theta = 14.8057. \tag{54}$$

From these estimated values we can calculate the parameters $(\hat\lambda_i, \hat\alpha_i, \hat P)$ from the corresponding formulas above as

$$\begin{aligned}
\hat\lambda_i &= 0.165716\left(\exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.4694},\\
\hat\alpha_i &= 7.4284\left(\exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.5306},\\
\hat P &= 0.8847.
\end{aligned} \tag{55}$$

Comparing the actual means and the predicted means, for 2 July we have $\mu = 0.3820$ whereas $\hat\mu = 0.4333$; similarly, for 31 December we have $\mu = 9.0065$ and $\hat\mu = 10.6952$, respectively. Figure 4 shows the estimated mean and the actual mean; the model generally behaves well.
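The day-by-day parameters can be computed with a few lines of Python. The sketch below takes the estimates in (54) to be $a_0 = 0.1653$, $a_1 = 0.9049$, $a_2 = 2.0326$, $\hat p = 1.5306$, and $\hat\Theta = 14.8057$, and it assumes the usual relations between the Tweedie and Poisson-Gamma parametrizations, $\lambda = \mu^{2-p}/(\Theta(2-p))$, $\alpha = \Theta(p-1)\mu^{p-1}$, and $P = (2-p)/(p-1)$. The constant and function names are hypothetical, and the constants implied by these relations need not coincide with the rounded values printed in (55).

```python
import math

# Estimates as read from (54); the names are illustrative only.
A0, A1, A2 = 0.1653, 0.9049, 2.0326
P_INDEX = 1.5306       # Tweedie index p
DISPERSION = 14.8057   # dispersion parameter Theta


def daily_parameters(day):
    """Return (mu, lam, scale, shape, prob_dry) for day-of-year `day`."""
    angle = 2.0 * math.pi * day / 365.0
    mu = math.exp(A0 + A1 * math.sin(angle) + A2 * math.cos(angle))
    # Expected number of rainfall events on the day (Poisson mean).
    lam = mu ** (2.0 - P_INDEX) / (DISPERSION * (2.0 - P_INDEX))
    # Gamma scale and shape of the intensity of a single event.
    scale = DISPERSION * (P_INDEX - 1.0) * mu ** (P_INDEX - 1.0)
    shape = (2.0 - P_INDEX) / (P_INDEX - 1.0)
    # Probability of a dry day, P(L = 0) = exp(-lambda).
    prob_dry = math.exp(-lam)
    return mu, lam, scale, shape, prob_dry
```

A useful sanity check on any such implementation is that the compound mean $\lambda \cdot P \cdot \alpha$ reproduces $\mu$ for every day.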

Figure 4: Actual versus predicted mean (panel title: "Rainfall means"; x-axis: days of the year; y-axis: mean; series: actual mean, predicted mean).

4.3. Goodness of Fit of the Model. Let the maximum likelihood estimate of $\theta_i$ be $\hat\theta_i$ for all $i$, and let $\hat\mu$ be the model's mean estimate. Let $\tilde\theta_i$ denote the estimate of $\theta_i$ for the saturated model, with corresponding $\tilde\mu_i = y_i$.

The goodness of fit is determined by the deviance, which is defined as

$$-2\log\left[\frac{\text{maximum likelihood of the fitted model}}{\text{maximum likelihood of the saturated model}}\right] = -2\left[L(\hat\mu; y) - L(y; y)\right] = 2\sum_{i=1}^{n}\frac{y_i\tilde\theta_i - k(\tilde\theta_i)}{\Theta} - 2\sum_{i=1}^{n}\frac{y_i\hat\theta_i - k(\hat\theta_i)}{\Theta} = 2\sum_{i=1}^{n}\frac{y_i(\tilde\theta_i - \hat\theta_i) - k(\tilde\theta_i) + k(\hat\theta_i)}{\Theta} = \frac{\operatorname{Dev}(y; \hat\mu)}{\Theta}. \tag{56}$$

$\operatorname{Dev}(y; \hat\mu)$ is called the deviance of the model; the greater the deviance, the poorer the fitted model, as maximizing the likelihood corresponds to minimizing the deviance.

In terms of Tweedie distributions with $1 < p < 2$, the deviance is

$$\operatorname{Dev}_p = 2\sum_{i=1}^{n}\left(\frac{y_i^{2-p} - (2-p)\,y_i\,\mu_i^{1-p} + (1-p)\,\mu_i^{2-p}}{(1-p)(2-p)}\right). \tag{57}$$

Based on results from fitting the model, the residual deviance is 43144, which is less than the null deviance of 62955. This implies that the fitted model explains the data better than a null model.
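The deviance in (57) translates directly into code. The following is a minimal sketch with a hypothetical function name; it only covers $1 < p < 2$ and relies on $0^{2-p} = 0$ for dry days.

```python
def tweedie_deviance(y, mu, p):
    """Deviance (57) for observations y and fitted means mu, 1 < p < 2."""
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    total = 0.0
    for yi, mi in zip(y, mu):
        numerator = (yi ** (2.0 - p)
                     - (2.0 - p) * yi * mi ** (1.0 - p)
                     + (1.0 - p) * mi ** (2.0 - p))
        total += numerator / ((1.0 - p) * (2.0 - p))
    return 2.0 * total
```

The null deviance is obtained by passing the overall sample mean as the fitted mean for every observation, and the residual deviance by passing the fitted $\hat\mu_i$.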

4.4. Diagnostic Check. The model diagnostic is carried out by means of residual analysis. The fitted model is difficult to assess, especially for days with no rainfall at all, as these produce spurious results and distracting patterns, similar to those observed by [15]. Since this is a nonnormal regression, the residuals are far from being normally distributed and from having equal variances, unlike in a normal linear regression. Here the residuals lie parallel to distinct values; hence it is difficult to make any meaningful decision about the fitted model (Figure 5).

Figure 5: Residuals of the model (panel title: "Residuals"; x-axis: year, 1995 to 2015; y-axis: residuals).

Figure 6: Q-Q plot of the quantile residuals (panel title: "Normal Q-Q Plot"; x-axis: theoretical quantiles; y-axis: sample quantiles).

So we assess the model based on quantile residuals, which remove the pattern in discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.

The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.

Mathematically, let $a_i = \lim_{y \uparrow y_i} F(y; \hat\mu_i, \hat\Theta)$ and $b_i = F(y_i; \hat\mu_i, \hat\Theta)$, where $F$ is the cumulative distribution function corresponding to the probability density function $f(y; \mu, \Theta)$. Then the randomized quantile residual for $y_i$ is

$$r_{q,i} = \Phi^{-1}(u_i), \tag{58}$$

with $u_i$ being a uniform random variable on $(a_i, b_i]$ and $\Phi$ the standard normal distribution function. The randomized quantile residuals have a standard normal distribution, barring the variability in $\hat\mu$ and $\hat\Theta$.
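The residual in (58) can be sketched with the Python standard library. This is an illustrative sketch with hypothetical function names. For a dry day the interval is $(0, P(L = 0)]$, using $P(L = 0) = \exp[-\mu^{2-p}/(\Theta(2-p))]$ from Proposition 6; for a wet day the distribution is continuous, so $a_i = b_i = F(y_i)$, and the value of $F$ has to be supplied by the caller because evaluating the Tweedie distribution function is beyond this sketch.

```python
import math
import random
from statistics import NormalDist


def randomized_quantile_residual(a, b, rng=random):
    """Phi^{-1}(u) with u uniform on (a, b]; a == b gives Phi^{-1}(b)."""
    u = b - (b - a) * rng.random()  # rng.random() lies in [0, 1)
    # Guard against the end points 0 and 1, where inv_cdf is undefined.
    eps = 1e-12
    u = min(max(u, eps), 1.0 - eps)
    return NormalDist().inv_cdf(u)


def dry_day_interval(mu, p, dispersion):
    """(a_i, b_i) for an observation y_i = 0."""
    lam = mu ** (2.0 - p) / (dispersion * (2.0 - p))
    return 0.0, math.exp(-lam)
```

Passing a seeded `random.Random` instance as `rng` makes the randomization reproducible, which helps when the Q-Q plot has to be regenerated.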

Figure 6 shows the normal Q-Q plot. As can be observed, there are no large deviations from the straight line, only small deviations at the tail. The linearity observed indicates an acceptable fitted model.

Figure 7: Simulated rainfall and observed rainfall (panel title: "Rainfall amount"; x-axis: days of the year; y-axis: amount; series: predicted, actual).

Figure 8: Probability of rainfall occurrence (panel title: "Probability of no rainfall"; x-axis: days of the year; y-axis: probability).

5. Simulation

The model is simulated to test whether it produces data with similar characteristics to the actual observed rainfall. The simulation is done for a period of two years, where one was the last year of the data (2015) and the other year (2016) was a future prediction. A comparison was then made graphically with the 2015 data, as shown in Figure 7.
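One way to simulate a day of rainfall from the compound Poisson-Gamma model is sketched below. It is not the authors' simulation code: the function names are hypothetical, the Poisson sampler is Knuth's multiplication method (adequate for the small daily event rates that arise here), and the parameter relations $\lambda = \mu^{2-p}/(\Theta(2-p))$, $P = (2-p)/(p-1)$, and $\alpha = \Theta(p-1)\mu^{p-1}$ are assumed.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_daily_rainfall(mu, p, dispersion, rng):
    """One draw of daily rainfall with mean mu from the Poisson-Gamma model."""
    lam = mu ** (2.0 - p) / (dispersion * (2.0 - p))   # expected number of events
    shape = (2.0 - p) / (p - 1.0)                      # Gamma shape P
    scale = dispersion * (p - 1.0) * mu ** (p - 1.0)   # Gamma scale alpha
    n_events = sample_poisson(lam, rng)
    # A day without events yields an exact zero.
    return sum((rng.gammavariate(shape, scale) for _ in range(n_events)), 0.0)
```

Calling `simulate_daily_rainfall` once per day, with `mu` taken from the fitted seasonal mean for that day, yields a synthetic year; dry days arise naturally whenever the Poisson draw is zero.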

The different statistics of the simulated data and actual data are shown in Table 2 for comparison.

Table 2: Data statistics.

| | Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
|---|---|---|---|---|---|---|
| Predicted data | 0.00 | 0.00 | 0.00 | 3.314 | 0.00 | 116.5 |
| Actual data [10 yrs] | 0.00 | 0.00 | 0.00 | 3.183 | 0.300 | 123.7 |
| Actual data [2015] | 0.00 | 0.00 | 0.00 | 3.328 | 0.00 | 84.5 |

The main objective of simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model has shown that it works well in predicting the rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.

However, the model performed poorly in predicting the probability of rainfall occurrence, as it underestimated this probability. It is suggested here that the use of a truncated Fourier series could improve this estimation as compared to the sinusoidal terms.

But it performed better in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.

It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model as suggested by [4], as follows. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8); in addition, we can also tell that the probability of rain in a day is higher if the previous day was wet, which is the basis of precipitation models that involve the Markov process. From Figure 7 we can also observe variation of rainfall intensity based on time of the season.

In addition, the model allows modeling of exact zeros in the data and is able to predict the probability of a no-rainfall event simultaneously.

6. Conclusion

A daily stochastic rainfall model was developed based on a compound Poisson process, where rainfall events follow a Poisson distribution and the intensity, which is independent of the events, follows a Gamma distribution. Unlike several studies of precipitation modeling in which two models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool to study the impact of weather on a variety of systems, including ecosystems, risk assessment, drought predictions, and weather derivatives, as we are able to simulate synthetic rainfall data. The model provides mechanisms for understanding the fine scale structure, such as the number and mean of rainfall events, mean daily rainfall, and probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.

The model developed can easily be used for forecasting future events, and in terms of weather derivatives, the weather index can be derived from simulating a sample path by summing up daily precipitation in the relevant accumulation period. Rather than developing a weather index which is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.

Rainfall data is generally zero inflated, in that the amount of rainfall received on a day can be zero with a positive probability but is continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Hence expressing it in terms of a Tweedie distribution makes estimating the parameters easy. In addition, Tweedie distributions belong to the exponential family of distributions upon which generalized linear models are based; hence there is an already existing framework in place for fitting and diagnostic testing of the model.

The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps in reducing underestimation of overdispersion of the data, which is also common in the aforementioned models.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.

References

[1] A. Hussain, "Stochastic modeling of rainfall processes: a markov chain-mixed exponential model for rainfalls in different climatic conditions."

[2] M. Cao, A. Li, and J. Z. Wei, "Precipitation modeling and contract valuation: A frontier in weather derivatives," The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.

[3] G. Leobacher and P. Ngare, "On modelling and pricing rainfall derivatives with seasonality," Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.

[4] M. Odening, O. Musshoff, and W. Xu, "Analysis of rainfall derivatives using daily precipitation models: Opportunities and pitfalls," Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.

[5] D. S. Wilks, "Multisite generalization of a daily stochastic precipitation generation model," Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.

[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, "Rainfall modelling using poisson-cluster processes: a review of developments," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.

[7] R. Carmona and P. Diko, "Pricing precipitation based derivatives," International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.

[8] F. E. Benth and J. S. Benth, Modeling and pricing in financial markets for weather derivatives, vol. 17, World Scientific, 2012.

[9] B. Lopez Cabrera, M. Odening, and M. Ritter, "Pricing rainfall futures at the CME," Technical report, Humboldt University, Collaborative Research Center.

[10] T. I. Harrold, A. Sharma, and S. J. Sheather, "A nonparametric model for stochastic generation of daily rainfall occurrence," Water Resources Research, vol. 39, no. 10, 2003.

[11] D. Lord, "Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter," Accident Analysis & Prevention, vol. 38, no. 4, pp. 751–766, 2006.

[12] J.-P. Wang, "Estimating species richness by a Poisson-compound gamma model," Biometrika, vol. 97, no. 3, pp. 727–740, 2010.

[13] C. S. Withers and S. Nadarajah, "On the compound Poisson-gamma distribution," Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.

[14] E. W. Frees, R. Derrig, and G. Meyers, Regression modeling with actuarial and financial applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.

[15] P. K. Dunn and G. K. Smyth, "Evaluation of Tweedie exponential dispersion model densities by Fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.

[16] A. Agresti, Foundations of linear and generalized linear models, John Wiley & Sons, 2015.

[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.



For 1 lt 119901 lt 2 we have 119875 = (2 minus 119901)(119901 minus 1) gt 0 hencethe logarithms have positive arguments Differentiating withrespect to 119895 we have

120597 log119882119895120597119895 asymp log 119911 minus 1119895 minus log 119895 minus 119875 log (119875119895)asymp log 119911 minus log 119895 minus 119875 log (119875119895) (38)

Journal of Probability and Statistics 7

where 1119895 is ignored for large 119895 Solving for (120597 log119882119895)120597119895 = 0we have

119895max = 1198712minus119901(2 minus 119901)Θ (39)

Substituting 119895max in log119882119895 to find the maximum approxima-tion of119882119895 we have

log119882max = 1198712minus119901(2 minus 119901)Θ [log 119871119875 (119901 minus 1)119875Θ(1minus119875) (2 minus 119901) + (1 + 119875)minus 119875 log119875 minus (1 minus 119875) log 1198712minus119901(2 minus 119901)Θ] minus log (2120587) minus 12sdot log119875 minus log 1198712minus119901(2 minus 119901)Θ

(40)

Hence the result follows

It can be observed that 120597119882119895120597119895 is monotonically decreas-ing hence log119882119895 is strictly convex as a function of 119895Therefore 119882119895 decays faster than geometrically on either sideof 119895max [15] Therefore if we are to estimate 119882(119871Θ 119875) by(119871 Θ 119875) = sum119895119906

119895=119895119889119882119895 the approximation error is bounded

by geometric sum

119882(119871Θ 119875) minus (119871 Θ 119875)lt 119882119895119889minus1

1 minus 119903119895119889minus11198971 minus 119903119897 + 119882119895119906+1

11 minus 119903119906 119903119897 = exp(120597119882119895120597119895 )100381610038161003816100381610038161003816100381610038161003816119895 = 119895119889 minus 1119903119906 = exp(120597119882119895120597119895 )100381610038161003816100381610038161003816100381610038161003816119895 = 119895119906 + 1

(41)

For quick and accurate evaluation of119882(120582 120572 119871 119875) the seriesis summed for only those terms in the series which contributesignificantly to the sum

Generalized linear models extend the standard linearregressionmodels to incorporate nonnormal response distri-butions and possibly nonlinear functions of the mean Theadvantage of GLMs is that the fitting process maximizes thelikelihood for the choice of the distribution for a randomvariable 119910 and the choice is not restricted to normality unlikelinear regression [16]

The exponential dispersion models are the responsedistributions for the generalized linear models Tweedie dis-tributions are members of the exponential dispersion modelsupon which the generalized linear models are based Conse-quently fitting a Tweedie distribution follows the frameworkof fitting a generalized linear model

Lemma 8 In case of a canonical link function the sufficientstatistics for 120573119895 are sum119899

119894=1 119910119894119909119894119895

Proof For 119899 independent observations 119910119894 of the exponentialdispersion model (16) the log-likelihood function is

119871 (120573) = 119899sum119894=1

119871 119894 = 119899sum119894

log119891 (119910119894 120579119894 Θ)= 119899sum

119894=1

119910119894120579119894 minus 119896 (120579119894)Θ + 119899sum119894

log 119886 (119910119894 Θ) (42)

But 120579119894 = sum119901119895 120573119895119909119894119895 hence

119899sum119894

119910119894120579119894 = 119899sum119894=1

119910119894 119901sum119895

120573119895119909119894119895 = 119901sum119895

120573119895 119899sum119894=1

119910119894119909119894119895 (43)

Proposition 9 Given that 119910119894 is distributed as (16) then itsdistribution depends only on its first two moments namely 120583119894and Var(119910119894)Proof Let 119892(120583119894) be the link function of the GLM such that120578119894 = sum119901

119895=1 120573119895119909119894119895 = 119892(120583119894) The likelihood equations are

120597119871 (120573)120597120573 = 119899sum119894=1

120597119871 119894120597120573119895 forall119895 (44)

Using chain rule we have

120597119871 119894120597120573119895 = 120597119871 119894120597120579119894 120597120579119894120597120583119894 120597120583119894120597120578119894 120597120578119894120597120573119895 = 119910119894 minus 120583119894Var (119910119894)119909119894119895 120597120583119894120597120578119894 (45)

Hence

120597119871 (120573)120597120573 = 119910119894 minus 120583119894Var (119910119894)119909119894119895 120597120583119894120597120578119894 = 119910119894 minus 120583119894Θ120583119901119894 119909119894119895 120597120583119894120597120578119894 (46)

Since Var(119910119894) = 119881(120583119894) the relationship between themean andvariance characterizes the distribution

Clearly a GLM only requires the first two moments ofthe response 119910119894 hence despite the difficulty of full likelihoodanalysis of Tweedie distribution as it can not be expressedin closed form for 1 lt 119901 lt 2 we can still fit aTweedie distribution family The likelihood is only requiredto estimate 119901 and Θ as well as diagnostic check of the model

Proposition 10. Under the standard regularity conditions, for large $n$ the maximum likelihood estimator $\hat{\beta}$ of $\beta$ for a generalized linear model is efficient and has an approximate normal distribution.

Proof. From the log-likelihood, the covariance matrix of the distribution is the inverse of the information matrix $J = E\left(-\partial^2 L(\beta) / \partial \beta_h \partial \beta_j\right)$. For a single observation,

$$E\left(-\frac{\partial^2 L_i}{\partial \beta_h \partial \beta_j}\right) = E\left[\left(\frac{\partial L_i}{\partial \beta_h}\right)\left(\frac{\partial L_i}{\partial \beta_j}\right)\right] = E\left[\left(\frac{y_i - \mu_i}{\operatorname{Var}(y_i)}\, x_{ih}\, \frac{\partial \mu_i}{\partial \eta_i}\right)\left(\frac{y_i - \mu_i}{\operatorname{Var}(y_i)}\, x_{ij}\, \frac{\partial \mu_i}{\partial \eta_i}\right)\right] = \frac{x_{ih} x_{ij}}{\operatorname{Var}(y_i)} \left(\frac{\partial \mu_i}{\partial \eta_i}\right)^{2}. \tag{47}$$

Hence

$$E\left(-\frac{\partial^2 L(\beta)}{\partial \beta_h \partial \beta_j}\right) = \sum_{i=1}^{n} \frac{x_{ih} x_{ij}}{\operatorname{Var}(y_i)} \left(\frac{\partial \mu_i}{\partial \eta_i}\right)^{2} = \left(X^{T} W X\right)_{hj}, \tag{48}$$

where $W = \operatorname{diag}\left[\left(1/\operatorname{Var}(y_i)\right)\left(\partial \mu_i / \partial \eta_i\right)^{2}\right]$.

Therefore $\hat{\beta}$ has an approximate $N\left[\beta, (X^{T} W X)^{-1}\right]$ distribution with $\widehat{\operatorname{Var}}(\hat{\beta}) = (X^{T} \hat{W} X)^{-1}$, where $\hat{W}$ is $W$ evaluated at $\hat{\beta}$.

To compute $\hat{\beta}$ we use the iteratively reweighted least squares algorithm described by Dobson and Barnett [17], where the iterations use the working weights

$$\tilde{w}_i = \frac{w_i}{V(\mu_i)\, g'(\mu_i)^{2}}, \tag{49}$$

where $w_i$ is the prior weight of observation $i$, $g'$ is the derivative of the link function, and $V(\mu_i) = \mu_i^{p}$.

However, estimating $p$ is more difficult than estimating $\beta$ and $\Theta$, so most researchers working with Tweedie densities fix $p$ a priori. In this study we use the procedure in [15], where the maximum likelihood estimator of $p$ is obtained by directly maximizing the profile likelihood function. For any given value of $p$ we find the maximum likelihood estimates of $\beta$ and $\Theta$ and compute the log-likelihood function. This is repeated over a range of values of $p$ until we find the value that maximizes the log-likelihood function.
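The iteratively reweighted least squares step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a log link, unit prior weights, a fixed power `p`, and a hypothetical helper `solve_linear` for the small normal-equation system. With a log link, $g'(\mu) = 1/\mu$, so the working weight in (49) reduces to $\mu^{2-p}$.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def irls_tweedie_log(X, y, p, n_iter=50, tol=1e-10):
    """Estimate beta in log(mu) = X beta with variance function V(mu) = mu**p."""
    k = len(X[0])
    beta = [0.0] * k
    eta = [math.log(sum(y) / len(y))] * len(y)  # start from the constant-mean model
    for _ in range(n_iter):
        mu = [math.exp(e) for e in eta]
        # Working weights 1 / (V(mu) g'(mu)^2) = mu**(2 - p) for the log link.
        w = [m ** (2.0 - p) for m in mu]
        # Working response z = eta + (y - mu) g'(mu).
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(wi * xi[a] * xi[b] for wi, xi in zip(w, X))
                 for b in range(k)] for a in range(k)]
        xtwz = [sum(wi * xi[a] * zi for wi, xi, zi in zip(w, X, z))
                for a in range(k)]
        new_beta = solve_linear(xtwx, xtwz)
        change = max(abs(nb - ob) for nb, ob in zip(new_beta, beta))
        beta = new_beta
        eta = [sum(bj * xj for bj, xj in zip(beta, xi)) for xi in X]
        if change < tol:
            break
    return beta
```

Because the dispersion $\Theta$ is a common factor of all weights, it cancels in the weighted least squares step, so $\beta$ can be estimated without knowing $\Theta$.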

Given the estimated values of $p$ and $\beta$, the unbiased estimator of $\Theta$ is given by

$$\hat{\Theta} = \sum_{i=1}^{n} \frac{\left[L_i - \mu_i(\hat{\beta})\right]^{2}}{\mu_i(\hat{\beta})^{p}}. \tag{50}$$

Since for $1 < p < 2$ the Tweedie density cannot be expressed in closed form, it is recommended that the maximum likelihood estimate of $\Theta$ be computed iteratively from the full data [15].

4. Data and Model Fitting

4.1. Data Analysis. Daily rainfall data for Balaka district in Malawi covering the period 1995–2015 is used. The data was obtained from Meteorological Surveys of Malawi. Figure 1 shows a plot of the data.

In summary, the minimum value is 0 mm, which indicates that there was no rainfall on particular days, whereas the maximum amount is 123.7 mm. The mean rainfall for the whole period is 3.167 mm.

Figure 1: Daily rainfall amount for Balaka district. [Plot titled "Rainfall for Balaka for 10 years"; x-axis: Year, 1995 to 2015; y-axis: Amount.]

Figure 2: Variance-mean relationship. [x-axis: log(mean); y-axis: log(variance).]

We investigated the relationship between the variance and the mean of the data by plotting log(variance) against log(mean), as shown in Figure 2. From the figure we can observe a linear relationship between the log of the variance and the log of the mean, which can be expressed as

$$\log(\text{Variance}) = \alpha + \beta \log(\text{mean}), \tag{51}$$

$$\text{Variance} = A \cdot \text{mean}^{\beta}, \quad A \in \mathbb{R}. \tag{52}$$

Hence the variance can be expressed as some power $\beta \in \mathbb{R}$ of the mean, agreeing with the Tweedie variance function requirement.
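The check in (51) and (52) can be sketched as an ordinary least-squares fit on the log scale. The grouping used to obtain the mean and variance pairs is not stated in the text, so the `groups` argument below (for example, day of the year) is an assumption, and the function names are illustrative.

```python
import math
from collections import defaultdict

def mean_variance_by_group(values, groups):
    """Return (mean, variance) pairs for groups with positive mean and variance."""
    buckets = defaultdict(list)
    for v, g in zip(values, groups):
        buckets[g].append(v)
    pairs = []
    for obs in buckets.values():
        if len(obs) < 2:
            continue
        m = sum(obs) / len(obs)
        var = sum((v - m) ** 2 for v in obs) / (len(obs) - 1)
        if m > 0 and var > 0:  # logarithms are undefined otherwise
            pairs.append((m, var))
    return pairs

def fit_power_law(pairs):
    """Least-squares fit of log(var) = alpha + beta * log(mean)."""
    xs = [math.log(m) for m, _ in pairs]
    ys = [math.log(v) for _, v in pairs]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta
```

A fitted slope between 1 and 2 would be consistent with the compound Poisson-Gamma member of the Tweedie family; the constant in (52) is recovered as $A = e^{\alpha}$.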

4.2. Fitted Model. To model the daily rainfall data we use sin and cos terms as predictors because of the cyclic nature and seasonality of rainfall. We have assumed that February ends on the 28th in every year so that all years have the same length in our modeling.

The model uses the log link function, given by

$$\log \mu_i = a_0 + a_1 \sin\left(\frac{2\pi i}{365}\right) + a_2 \cos\left(\frac{2\pi i}{365}\right), \tag{53}$$

where $i = 1, 2, \ldots, 365$ corresponds to days of the year and $a_0, a_1, a_2$ are the regression coefficients.

We first estimate $p$ by maximizing the profile log-likelihood function. Figure 3 shows the graph of the profile log-likelihood function. As can be observed, the value of $p$ that maximizes the function is 1.5306.

From the results obtained after fitting the model, both the cyclic cosine and sine terms are important characteristics for daily rainfall (Table 1). The covariates were chosen to take into account the seasonal variations in the stochastic model.

Table 1: Estimated parameter values.

Parameter   Estimate   Std. error   t value    Pr(>|t|)
a_0         0.1653     0.0473       3.4930     0.0005 ***
a_1         0.9049     0.0572       15.81100   <2e-16 ***
a_2         2.0326     0.0622       32.6720    <2e-16 ***
Θ           14.8057    -            -          -

Significance code: 0 '***'.

Figure 3: Profile likelihood. [x-axis: ξ index, 1.2 to 1.8; y-axis: L, −12500 to −10500; 95% confidence interval marked.]

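The predicted $\hat{\mu}_i$, $\hat{p}$, and $\hat{\Theta}$ for each day depend only on that day's conditions, so that for each day $i$ we have

$$\hat{\mu}_i = \exp\left[0.1653 + 0.9049 \sin\left(\frac{2\pi i}{365}\right) + 2.0326 \cos\left(\frac{2\pi i}{365}\right)\right], \qquad \hat{p} = 1.5306, \qquad \hat{\Theta} = 14.8057. \tag{54}$$

From these estimated values we can calculate the parameters $(\hat{\lambda}_i, \hat{\gamma}_i, \hat{\alpha})$ from the corresponding formulas above as

$$\begin{aligned}
\hat{\lambda}_i &= 0.165716 \left(\exp\left[0.1653 + 0.9049 \sin\left(\frac{2\pi i}{365}\right) + 2.0326 \cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.4694}, \\
\hat{\gamma}_i &= 7.4284 \left(\exp\left[0.1653 + 0.9049 \sin\left(\frac{2\pi i}{365}\right) + 2.0326 \cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.5306}, \\
\hat{\alpha} &= 0.8847.
\end{aligned} \tag{55}$$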
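As a rough illustration of how the fitted values translate into day-specific Poisson-Gamma parameters, the sketch below uses the standard mapping between the Tweedie parameters $(\mu, p, \Theta)$ and the Poisson rate $\lambda$, Gamma shape $\alpha$, and Gamma scale $\gamma$. The function names are illustrative, and the decimal placement of the coefficients is taken from Table 1 as reconstructed here. Because the constants are recomputed from $\hat{p}$ and $\hat{\Theta}$, they need not coincide exactly with the rounded constants printed in (55).

```python
import math

A0, A1, A2 = 0.1653, 0.9049, 2.0326   # fitted regression coefficients
P_HAT, THETA_HAT = 1.5306, 14.8057    # fitted power and dispersion

def mean_rainfall(day):
    """Fitted mean rainfall for day-of-year 1..365 under the log link."""
    angle = 2.0 * math.pi * day / 365.0
    return math.exp(A0 + A1 * math.sin(angle) + A2 * math.cos(angle))

def poisson_gamma_parameters(mu, p=P_HAT, theta=THETA_HAT):
    """Map (mu, p, theta) to (Poisson rate, Gamma shape, Gamma scale)."""
    lam = mu ** (2.0 - p) / (theta * (2.0 - p))
    alpha = (2.0 - p) / (p - 1.0)
    gamma = theta * (p - 1.0) * mu ** (p - 1.0)
    return lam, alpha, gamma

def prob_no_rain(mu, p=P_HAT, theta=THETA_HAT):
    """P(Y = 0) = exp(-lambda): no rainfall event occurs on the day."""
    lam, _, _ = poisson_gamma_parameters(mu, p, theta)
    return math.exp(-lam)
```

The product $\lambda \alpha \gamma$ recovers $\mu$, which gives a quick consistency check on any set of constants, and `prob_no_rain` gives the probability of an exact zero for a given day.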

Comparing the actual and predicted means, for 2 July the two values are 0.3820 and 0.4333; similarly, for 31 December they are 9.0065 and 10.6952. Figure 4 shows the estimated mean and the actual mean, where the model generally behaves well.

Figure 4: Actual versus predicted mean. [Plot titled "Rainfall means"; x-axis: Days of the year; y-axis: Mean; series: actual mean and predicted mean.]

4.3. Goodness of Fit of the Model. Let the maximum likelihood estimate of $\theta_i$ be $\hat{\theta}_i$ for all $i$, and let $\hat{\mu}$ be the model's mean estimate. Let $\tilde{\theta}_i$ denote the estimate of $\theta_i$ for the saturated model, with corresponding $\tilde{\mu}_i = y_i$.

The goodness of fit is determined by the deviance, which is defined as

$$\begin{aligned}
-2 \log\left[\frac{\text{maximum likelihood of the fitted model}}{\text{maximum likelihood of the saturated model}}\right]
&= -2\left[L(\hat{\mu}; y) - L(y; y)\right] \\
&= 2 \sum_{i=1}^{n} \frac{y_i \tilde{\theta}_i - k(\tilde{\theta}_i)}{\Theta} - 2 \sum_{i=1}^{n} \frac{y_i \hat{\theta}_i - k(\hat{\theta}_i)}{\Theta} \\
&= 2 \sum_{i=1}^{n} \frac{y_i (\tilde{\theta}_i - \hat{\theta}_i) - k(\tilde{\theta}_i) + k(\hat{\theta}_i)}{\Theta} = \frac{\operatorname{Dev}(y; \hat{\mu})}{\Theta}.
\end{aligned} \tag{56}$$

$\operatorname{Dev}(y; \hat{\mu})$ is called the deviance of the model. The greater the deviance, the poorer the fit, since maximizing the likelihood corresponds to minimizing the deviance.

In terms of Tweedie distributions with $1 < p < 2$, the deviance is

$$\operatorname{Dev}_p = 2 \sum_{i=1}^{n} \left(\frac{y_i^{2-p} - (2 - p)\, y_i\, \mu_i^{1-p} + (1 - p)\, \mu_i^{2-p}}{(1 - p)(2 - p)}\right). \tag{57}$$
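The deviance in (57) is straightforward to evaluate directly. The following sketch is illustrative only (the function name is an assumption) and requires every fitted mean to be strictly positive, while observations may be exactly zero.

```python
def tweedie_deviance(y, mu, p):
    """Tweedie deviance for 1 < p < 2; y may contain zeros, mu must be positive."""
    if not 1.0 < p < 2.0:
        raise ValueError("p must lie strictly between 1 and 2")
    total = 0.0
    for yi, mi in zip(y, mu):
        numerator = (yi ** (2.0 - p)
                     - (2.0 - p) * yi * mi ** (1.0 - p)
                     + (1.0 - p) * mi ** (2.0 - p))
        total += numerator / ((1.0 - p) * (2.0 - p))
    return 2.0 * total
```

Each term is zero when $y_i = \mu_i$ and positive otherwise. A dry day with $y_i = 0$ contributes $2\mu_i^{2-p}/(2-p)$, so exact zeros are handled without any special case.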

Based on the results from fitting the model, the residual deviance is 43144, which is less than the null deviance of 62955. This implies that the fitted model explains the data better than a null model.

4.4. Diagnostic Check. Model diagnostics are carried out through residual analysis. The fitted model is difficult to assess, especially for days with no rainfall at all, as these produce spurious results and distracting patterns, as also observed by [15]. Since this is a nonnormal regression, the residuals are far from normally distributed and do not have equal variances, unlike in a normal linear regression. Here the residuals lie along parallel bands corresponding to distinct values, hence it is difficult to make any meaningful decision about the fitted model (Figure 5).

Figure 5: Residuals of the model. [x-axis: Year, 1995 to 2015; y-axis: Residuals.]

Figure 6: Q-Q plot of the quantile residuals. [Plot titled "Normal Q-Q Plot"; x-axis: Theoretical quantiles; y-axis: Sample quantiles.]

We therefore assess the model using quantile residuals, which remove the pattern in discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.

The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.

Mathematically, let $a_i = \lim_{y \uparrow y_i} F(y; \hat{\mu}_i, \hat{\Theta})$ and $b_i = F(y_i; \hat{\mu}_i, \hat{\Theta})$, where $F$ is the cumulative distribution function corresponding to the probability density function $f(y; \mu, \Theta)$. The randomized quantile residuals for $y_i$ are then

$$r_{q,i} = \Phi^{-1}(u_i), \tag{58}$$

with $u_i$ being a uniform random variable on $(a_i, b_i]$. The randomized quantile residuals are normally distributed, apart from the sampling variability in $\hat{\mu}$ and $\hat{\Theta}$.

Figure 6 shows the normal Q-Q plot. As can be observed, there are no large deviations from the straight line, only small deviations at the tails. The linearity observed indicates an acceptable fitted model.
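The construction in (58) can be sketched as follows. Evaluating the Tweedie distribution function itself requires a series or Fourier-inversion routine, so this example uses a toy stand-in (an assumption, not the fitted model): a point mass `P_ZERO` at zero mixed with an exponential distribution of scale `SCALE`. Only the treatment of the jump at zero is meant to carry over.

```python
import math
import random
from statistics import NormalDist

def randomized_quantile_residual(y, cdf, cdf_left, rng):
    """Return Phi^{-1}(u) with u uniform on (a, b], a = F(y-) and b = F(y)."""
    a = cdf_left(y)
    b = cdf(y)
    u = a + (b - a) * (1.0 - rng.random())   # equals b when F is continuous at y
    u = min(max(u, 1e-12), 1.0 - 1e-12)      # keep inv_cdf away from 0 and 1
    return NormalDist().inv_cdf(u)

# Toy distribution standing in for the Tweedie CDF (illustrative values).
P_ZERO, SCALE = 0.7, 5.0

def toy_cdf(y):
    """F(y): jump of size P_ZERO at zero, exponential tail above it."""
    if y < 0:
        return 0.0
    return P_ZERO + (1.0 - P_ZERO) * (1.0 - math.exp(-y / SCALE))

def toy_cdf_left(y):
    """Left limit F(y-): differs from F(y) only at the point mass y = 0."""
    return 0.0 if y <= 0 else toy_cdf(y)
```

For a dry day the residual is drawn from the lower part of the normal scale corresponding to $(0, P(Y = 0)]$, which is what removes the band of identical residuals seen for zero observations.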

Figure 7: Simulated rainfall and observed rainfall. [Plot titled "Rainfall amount"; x-axis: Days of the year, 1 to 673; y-axis: Amount; series: predicted and actual.]

Figure 8: Probability of rainfall occurrence. [Plot titled "Probability of no rainfall"; x-axis: Days of the year; y-axis: Probability, 0.65 to 0.95.]

5. Simulation

The model is simulated to test whether it produces data with characteristics similar to the actual observed rainfall. The simulation covers a period of two years, where one is the last year of the data (2015) and the other (2016) is a future prediction. A graphical comparison with the 2015 data is shown in Figure 7.

The summary statistics of the simulated data and the actual data are shown in Table 2 for comparison.
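A simulation of this kind can be sketched with the standard library alone. The code below is an illustration rather than the authors' procedure: it assumes the fitted values with the decimal placement shown in Table 1, the standard Tweedie to Poisson-Gamma mapping, and independent days. The helper names are hypothetical.

```python
import math
import random

A0, A1, A2 = 0.1653, 0.9049, 2.0326   # fitted regression coefficients
P_HAT, THETA_HAT = 1.5306, 14.8057    # fitted power and dispersion

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_day(day, rng):
    """Draw one daily rainfall amount for day-of-year 1..365."""
    angle = 2.0 * math.pi * day / 365.0
    mu = math.exp(A0 + A1 * math.sin(angle) + A2 * math.cos(angle))
    lam = mu ** (2.0 - P_HAT) / (THETA_HAT * (2.0 - P_HAT))
    alpha = (2.0 - P_HAT) / (P_HAT - 1.0)
    gamma = THETA_HAT * (P_HAT - 1.0) * mu ** (P_HAT - 1.0)
    n_events = poisson_draw(lam, rng)
    if n_events == 0:
        return 0.0  # exact zero: a dry day
    # The sum of n iid Gamma(alpha, scale=gamma) variables is Gamma(n*alpha, scale=gamma).
    return rng.gammavariate(n_events * alpha, gamma)

def simulate_years(n_years, seed=0):
    """Simulate n_years of 365 daily amounts each."""
    rng = random.Random(seed)
    return [simulate_day(d, rng) for _ in range(n_years) for d in range(1, 366)]
```

Calling `simulate_years(2)` returns 730 daily amounts, most of them exactly zero, which can then be summarized in the same way as Table 2.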

The main objective of the simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model works well in predicting rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.

However, the model performed poorly in predicting the probability of rainfall occurrence, which it underestimated. We suggest that a truncated Fourier series might improve this estimate compared with the single sinusoidal term.

The model performed better in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.

It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model suggested by [4]. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8). In addition, the probability of rain on a given day is higher if the previous day was wet, which is the basis of precipitation models that involve a Markov process. From Figure 7 we can also observe variation of rainfall intensity with the time of the season.

Table 2: Data statistics.

                       Min    1st Qu   Median   Mean    3rd Qu   Max
Predicted data         0.00   0.00     0.00     3.314   0.00     116.5
Actual data [10 yrs]   0.00   0.00     0.00     3.183   0.300    123.7
Actual data [2015]     0.00   0.00     0.00     3.328   0.00     84.5

In addition, the model allows modeling of exact zeros in the data and at the same time is able to predict the probability of a no-rainfall event.

6. Conclusion

A daily stochastic rainfall model was developed based on a compound Poisson process, where rainfall events follow a Poisson distribution and the intensity, which is independent of the events, follows a Gamma distribution. Unlike much previous research on precipitation modeling, in which two separate models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, that is, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool for studying the impact of weather on a variety of systems, including ecosystems, risk assessment, drought prediction, and weather derivatives, since it allows synthetic rainfall data to be simulated. The model provides mechanisms for understanding fine-scale structure such as the number and mean of rainfall events, mean daily rainfall, and probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.

The model developed can easily be used for forecasting future events. In terms of weather derivatives, the weather index can be derived by simulating a sample path and summing up daily precipitation over the relevant accumulation period. Rather than developing a weather index which is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.

Rainfall data is generally zero inflated, in that the amount of rainfall received on a day can be zero with positive probability but is continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Expressing it in terms of a Tweedie distribution makes estimating the parameters easier. In addition, Tweedie distributions belong to the exponential family of distributions upon which generalized linear models are based; hence there is an already existing framework in place for fitting and diagnostic testing of the model.

The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps in reducing the underestimation of overdispersion in the data, which is also common in the aforementioned models.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.

References

[1] A. Hussain, "Stochastic modeling of rainfall processes: a Markov chain-mixed exponential model for rainfalls in different climatic conditions."

[2] M. Cao, A. Li, and J. Z. Wei, "Precipitation modeling and contract valuation: A frontier in weather derivatives," The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.

[3] G. Leobacher and P. Ngare, "On modelling and pricing rainfall derivatives with seasonality," Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.

[4] M. Odening, O. Musshoff, and W. Xu, "Analysis of rainfall derivatives using daily precipitation models: Opportunities and pitfalls," Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.

[5] D. S. Wilks, "Multisite generalization of a daily stochastic precipitation generation model," Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.

[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, "Rainfall modelling using Poisson-cluster processes: a review of developments," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.

[7] R. Carmona and P. Diko, "Pricing precipitation based derivatives," International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.

[8] F. E. Benth and J. S. Benth, Modeling and pricing in financial markets for weather derivatives, vol. 17, World Scientific, 2012.

[9] B. Lopez Cabrera, M. Odening, and M. Ritter, "Pricing rainfall futures at the CME," Technical report, Humboldt University, Collaborative Research Center.

[10] T. I. Harrold, A. Sharma, and S. J. Sheather, "A nonparametric model for stochastic generation of daily rainfall occurrence," Water Resources Research, vol. 39, no. 10, 2003.

[11] D. Lord, "Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter," Accident Analysis & Prevention, vol. 38, no. 4, pp. 751–766, 2006.

[12] J.-P. Wang, "Estimating species richness by a Poisson-compound gamma model," Biometrika, vol. 97, no. 3, pp. 727–740, 2010.

[13] C. S. Withers and S. Nadarajah, "On the compound Poisson-gamma distribution," Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.

[14] E. W. Frees, R. Derrig, and G. Meyers, Regression modeling with actuarial and financial applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.

[15] P. K. Dunn and G. K. Smyth, "Evaluation of Tweedie exponential dispersion model densities by Fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.

[16] A. Agresti, Foundations of linear and generalized linear models, John Wiley & Sons, 2015.

[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.



Proposition 9 Given that 119910119894 is distributed as (16) then itsdistribution depends only on its first two moments namely 120583119894and Var(119910119894)Proof Let 119892(120583119894) be the link function of the GLM such that120578119894 = sum119901

119895=1 120573119895119909119894119895 = 119892(120583119894) The likelihood equations are

120597119871 (120573)120597120573 = 119899sum119894=1

120597119871 119894120597120573119895 forall119895 (44)

Using chain rule we have

120597119871 119894120597120573119895 = 120597119871 119894120597120579119894 120597120579119894120597120583119894 120597120583119894120597120578119894 120597120578119894120597120573119895 = 119910119894 minus 120583119894Var (119910119894)119909119894119895 120597120583119894120597120578119894 (45)

Hence

120597119871 (120573)120597120573 = 119910119894 minus 120583119894Var (119910119894)119909119894119895 120597120583119894120597120578119894 = 119910119894 minus 120583119894Θ120583119901119894 119909119894119895 120597120583119894120597120578119894 (46)

Since Var(119910119894) = 119881(120583119894) the relationship between themean andvariance characterizes the distribution

Clearly a GLM only requires the first two moments ofthe response 119910119894 hence despite the difficulty of full likelihoodanalysis of Tweedie distribution as it can not be expressedin closed form for 1 lt 119901 lt 2 we can still fit aTweedie distribution family The likelihood is only requiredto estimate 119901 and Θ as well as diagnostic check of the model

Proposition 10 Under the standard regularity conditions forlarge 119899 the maximum likelihood estimator 120573 of 120573 for general-ized linear model is efficient and has an approximate normaldistribution

Proof From the log-likelihood the covariance matrix of thedistribution is the inverse of the information matrix J =E(minus1205972119871(120573)120597120573ℎ120597120573119895)

8 Journal of Probability and Statistics

So

J = E(minus1205972119871 (120573)120597120573ℎ120597120573119895 ) = E[(1205972119871 119894120597120573ℎ )(1205972119871 119894120597120573119895 )]= [( 119910119894 minus 120583119894

Var (119910119894)119909119894ℎ 120597120583119894120597120578119894)( 119910119894 minus 120583119894Var (119910119894)119909119894119895 120597120583119894120597120578119894)]

= 119909119894ℎ119909119894119895Var (119910119894) (120597120583119894120597120578119894)

2 (47)

Hence

E(minus1205972119871 (120573)120597120573ℎ120597120573119895 ) = 119899sum119894

119909119894ℎ119909119894119895Var (119910119894) (120597120583119894120597120578119894)

2 = (119883119879119882119883) (48)

where119882 = diag[(1Var(119910119894))(120597120583119894120597120578119894)2]Therefore 120573 has an approximate 119873[120573 (119883119879119882119883)minus1] with

Var(120573) = (119883119879119883)minus1 where is evaluated at 120573To compute 120573 we use the iteratively reweighted least

square algorithmproposed byDobson andBarnett [17]wherethe iterations use the working weights 119908119894119908119894119881 (120583119894) 119892 (120583119894)2 (49)

where 119881(120583119894) = 120583119901119894 However estimating 119901 is more difficult than estimating120573 and Θ such that most researchers working with Tweedie

densities have119901 a priori In this study we use the procedure in[15]where themaximum likelihood estimator of119901 is obtainedby directly maximizing the profile likelihood function Forany given value of119901wefind themaximum likelihood estimateof 120573Θ and compute the log-likelihood function This isrepeated several times until we have a value of 119901 whichmaximizes the log-likelihood function

Given the estimated values of 119901 and 120573 then the unbiasedestimator of Θ is given by

Θ = 119899sum119894=1

[119871 119894 minus 120583119894 (120573)]2120583119894 (120573)119901 (50)

Since for 1 lt 119901 lt 2 the Tweedie density can not be expressedin closed form it is recommended that the maximumlikelihood estimate of Θ must be computed iteratively fromfull data [15]

4 Data and Model Fitting

41 Data Analysis Daily rainfall data of Balaka district inMalawi covering the period 1995ndash2015 is used The data wasobtained from Meteorological Surveys of Malawi Figure 1shows a plot of the data

In summary the minimum value is 0mmwhich indicatesthat there were no rainfall on particular days whereas themaximum amount is 1237mm The mean rainfall for thewhole period is 3167mm

Rainfall for Balaka for 10 years

020

60

100

Am

ount

2000 2005 2010 20151995Year

Figure 1 Daily rainfall amount for Balaka district

minus8

minus4

0246

log(

varia

nce)

minus2 0 2minus4log(mean)

Figure 2 Variance mean relationship

We investigated the relationship between the variance andthe mean of the data by plotting the log(variance) againstlog(mean) as shown in Figure 2 From the figure we canobserve a linear relationship between the variance and themean which can be expressed as

log (Variance) = 120572 + 120573 log (mean) (51)

Variace = 119860 lowastmean120573 119860 isin R (52)

Hence the variance can be expressed as some power 120573 isin R

of the mean agreeing with the Tweedie variance functionrequirement

42 Fitted Model To model the daily rainfall data we use sinand cos as predictors due to the cyclic nature and seasonalityof rainfall We have assumed that February ends on 28th forall the years to be uniform in our modeling

The canonical link function is given by

log120583119894 = 1198860 + 1198861 sin( 2120587119894365) + 1198862 cos( 2120587119894365) (53)

where 119894 = 1 2 365 corresponds to days of the year and1198860 1198861 1198862 are the coefficients of regressionIn the first place we estimate 119901 by maximizing the profile

log-likelihood function Figure 3 shows the graph of theprofile log-likelihood function As can be observed the valueof 119901 that maximizes the function is 15306

From the results obtained after fitting themodel both thecyclic cosine and sine terms are important characteristics fordaily rainfall Table 1 The covariates were determined to takeinto account the seasonal variations in the stochastic model

Journal of Probability and Statistics 9

Table 1 Estimated parameter values

Parameter Estimate Std error 119905 value Pr(gt |119905|)1198860 01653 00473 34930 00005lowastlowastlowast1198861 09049 00572 1581100 lt2e minus16lowastlowastlowast1198862 20326 00622 326720 lt2e minus16lowastlowastlowastΘ 148057 - - -With 119904119894119892119899119894f code 0 lowast lowast lowast

minus12500

minus12000

minus11500

minus11000

minus10500

L

(95 confidence interval)

13 14 15 16 17 1812120585 index

Figure 3 Profile likelihood

The predicted 120583119894 119901 Θ for each day only depends on thedayrsquos conditions so that for each day 119894 we have

120583119894 = exp [01653 + 09049 sin( 2120587119894365)+ 20326 cos( 2120587119894365)]

119901 = 15306Θ = 148057

(54)

From these estimated values we can calculate the parameter(119894 119894 ) from the corresponding formulas above as

119894 = 165716 (exp [01653 + 09049 sin( 2120587119894365)+ 203263 cos( 2120587119894365)])

04694 = 74284 (exp [01653 + 09049 sin( 2120587119894365)

+ 20326 cos( 2120587119894365)])05306

= 08847

(55)

Comparing the actual means and the predicted means for 2July we have 120583 = 03820 whereas 120583 = 04333 similarly for 31December we have 120583 = 90065 and 120583 = 106952 respectivelyFigure 4 shows the estimated mean and actual mean wherethe model behaves well generally

43 Goodness of Fit of the Model Let the maximum likeli-hood estimate of 120579119894 be 120579119894 for all 119894 and 120583 as the modelrsquos mean

Rainfall means

Actual meanPredicted mean

0

5

10

15

20

Mea

n

100 200 3000Days of the year

Figure 4 Actual versus predicted mean

estimate Let 120579119894 denote the estimate of 120579119894 for the saturatedmodel with corresponding 120583 = 119910119894

The goodness of fit is determined by deviance which isdefined as

minus 2 [ maximum likelihood of the fitted modelMaximum likelihood of the saturated model

]= minus2 [119871 (120583 119910) minus 119871 (119910 119910)]= 2 119899sum

119894=1

119910119894120579119894 minus 119896 (120579119894)Θ minus 2 119899sum119894=1

119910119894120579119894 minus 119896 (120579119894)Θ= 2 119899sum

119894=1

119910119894 (120579119894 minus 120579119894) minus 119896 (120579119894) + 119896 (120579119894)Θ = Dev (119910 120583)Θ

(56)

Dev(119910 120583) is called the deviance of the model and the greaterthe deviance the poorer the fitted model as maximizing thelikelihood corresponds to minimizing the deviance

In terms of Tweedie distributions with 1 lt 119901 lt 2 thedeviance is

Dev119901

= 2 119899sum119894=1

(1199102minus119901119894 minus (2 minus 119901) 1199101198941205831minus119901119894 + (1 minus 119901) 1205832minus119901119894(1 minus 119901) (2 minus 119901) ) (57)

Based on results from fitting the model the residualdeviance is 43144 less than the null deviance 62955 whichimplies that the fitted model explains the data better than anull model

44 Diagnostic Check Themodel diagnostic is considered asa way of residual analysis The fitted model faces challenges

10 Journal of Probability and Statistics

Residuals

2000 2005 2010 20151995Year

0

50

100

150

Resid

uals

Figure 5 Residuals of the model

Normal Q-Q Plot

minus2

0

2

4

Sam

ple q

uant

iles

minus2 0minus4 42Theoretical quantiles

Figure 6 Q-Q plot of the quantile residuals

to be assessed especially for days with no rainfall at all as theyproduce spurious results and distracting patterns similarlyas observed by [15] Since this is a nonnormal regressionresiduals are far from being normally distributed and havingequal variances unlike in a normal linear regression Herethe residuals lie parallel to distinct values hence it is difficultto make any meaningful decision about the fitted model(Figure 5)

So we assess the model based on quantile residuals, which remove the pattern in discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.

The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.

Mathematically, let $a_i = \lim_{y \uparrow y_i} F(y; \hat{\mu}_i, \hat{\Theta})$ and $b_i = F(y_i; \hat{\mu}_i, \hat{\Theta})$, where $F$ is the cumulative distribution function corresponding to the probability density function $f(y; \mu, \Theta)$. Then the randomized quantile residuals for $y_i$ are

$$
r_{q,i} = \Phi^{-1}(u_i), \tag{58}
$$

with $u_i$ being a uniform random variable on $(a_i, b_i]$. The randomized quantile residuals are normally distributed, barring the variability in $\hat{\mu}$ and $\hat{\Theta}$.
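The construction in (58) can be sketched for a compound Poisson-Gamma response using only the Python standard library. This is an illustrative sketch under stated assumptions, not the procedure of [15]: the distribution function is obtained by summing over the number of events (truncated at `max_events`), and the names `reg_lower_gamma`, `cpg_cdf`, and `quantile_residual`, together with the parameterization by Poisson rate `lam`, Gamma shape `alpha`, and Gamma scale `scale`, are hypothetical.

```python
import math
import random
from statistics import NormalDist


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-17 * total and k < 10000:
        term *= x / (a + k)
        total += term
        k += 1
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a)))


def cpg_cdf(y, lam, alpha, scale, max_events=200):
    """P(Y <= y) when Y is a Poisson(lam) sum of Gamma(alpha, scale) amounts."""
    if y < 0.0:
        return 0.0
    log_weight = -lam                 # log P(N = 0)
    cdf = math.exp(log_weight)        # point mass at exactly zero
    if y == 0.0:
        return cdf
    for n in range(1, max_events + 1):
        log_weight += math.log(lam) - math.log(n)   # log P(N = n)
        cdf += math.exp(log_weight) * reg_lower_gamma(n * alpha, y / scale)
    return min(1.0, cdf)


def quantile_residual(y, lam, alpha, scale, rng):
    """Randomized quantile residual of Eq. (58) for one observation."""
    b = cpg_cdf(y, lam, alpha, scale)
    if y == 0.0:
        # Only the zero carries positive mass: a_i = 0, so draw u on (0, b].
        u = b * (1.0 - rng.random())
    else:
        # F is continuous for y > 0, hence a_i = b_i and u_i = b_i.
        u = b
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep inv_cdf inside (0, 1)
    return NormalDist().inv_cdf(u)
```

Randomization is only needed on dry days, where the distribution function jumps from 0 to the probability of no rainfall; for positive amounts the residual is deterministic.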

Figure 6 shows the normal Q-Q plot of the quantile residuals. As can be observed, there are no large deviations from the straight line, only small deviations at the tails. The linearity observed indicates an acceptable fitted model.

Figure 7: Simulated rainfall and observed rainfall (panel title: Rainfall amount; predicted and actual amounts plotted against days of the year).

Figure 8: Probability of rainfall occurrence (panel title: Probability of no rainfall; probability plotted against days of the year).

5. Simulation

The model is simulated to test whether it produces data with characteristics similar to the actual observed rainfall. The simulation covers a period of two years, where one is the last year of the data (2015) and the other (2016) is a future prediction. A graphical comparison with the 2015 data is shown in Figure 7.

The summary statistics of the simulated data and the actual data are shown in Table 2 for comparison.
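A simulation of this kind can be sketched as follows. The sketch assumes the fitted values read as $a_0 = 0.1653$, $a_1 = 0.9049$, $a_2 = 2.0326$, $p = 1.5306$, and $\Theta = 14.8057$ (decimal points inferred from the garbled source), and it uses the standard Tweedie mapping from $(\mu, p, \Theta)$ to a Poisson rate, Gamma shape, and Gamma scale. The helper names and the Knuth-style Poisson sampler are illustrative choices, not the authors' implementation.

```python
import math
import random

A0, A1, A2 = 0.1653, 0.9049, 2.0326   # assumed regression coefficients
P, THETA = 1.5306, 14.8057            # assumed index and dispersion


def daily_parameters(day):
    """Mean, Poisson rate, Gamma shape and Gamma scale for day 1..365."""
    angle = 2.0 * math.pi * day / 365.0
    mu = math.exp(A0 + A1 * math.sin(angle) + A2 * math.cos(angle))
    lam = mu ** (2.0 - P) / (THETA * (2.0 - P))
    alpha = (2.0 - P) / (P - 1.0)
    scale = THETA * (P - 1.0) * mu ** (P - 1.0)
    return mu, lam, alpha, scale


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small rates here."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_year(rng, days=365):
    """One synthetic year of daily rainfall amounts."""
    rainfall = []
    for day in range(1, days + 1):
        _, lam, alpha, scale = daily_parameters(day)
        events = poisson_draw(lam, rng)
        # Zero events gives an exact zero; otherwise sum the Gamma amounts.
        amount = math.fsum(rng.gammavariate(alpha, scale) for _ in range(events))
        rainfall.append(amount)
    return rainfall


year = simulate_year(random.Random(2015))
dry_share = sum(1 for x in year if x == 0.0) / len(year)
```

Because a day with no Poisson events yields exactly zero, occurrence and intensity come out of a single draw per day rather than from two separately fitted models.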

The main objective of the simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model has shown that it works well in predicting rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.

However, the model performed poorly in predicting the probability of rainfall occurrence, as it underestimated this probability. It is suggested here that a truncated Fourier series might improve this estimation compared with the sinusoidal terms.

It performed better, however, in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.

It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model suggested by [4], as follows. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8); in addition, we can also tell that the probability of rain on a given day is higher if the previous day was wet, which is the basis of precipitation models that involve the Markov process. From Figure 7 we can also observe variation of rainfall intensity based on the time of the season.

Table 2: Data statistics.

                       Min.   1st Qu.   Median   Mean    3rd Qu.   Max.
Predicted data         0.00   0.00      0.00     3.314   0.00      116.5
Actual data [10 yrs]   0.00   0.00      0.00     3.183   0.300     123.7
Actual data [2015]     0.00   0.00      0.00     3.328   0.00      84.5

In addition, the model allows modeling of exact zeros in the data and is able to predict the probability of a no-rainfall event simultaneously.

6. Conclusion

A daily stochastic rainfall model was developed based on a compound Poisson process, where rainfall events follow a Poisson distribution and the intensity of each event, independent of the number of events, follows a Gamma distribution. Unlike several studies on precipitation modeling in which two separate models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool for studying the impact of weather on a variety of systems, including ecosystems, risk assessment, drought predictions, and weather derivatives, as it allows us to simulate synthetic rainfall data. The model provides mechanisms for understanding fine-scale structure such as the number and mean of rainfall events, mean daily rainfall, and the probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.

The model developed can easily be used for forecasting future events. In terms of weather derivatives, the weather index can be derived by simulating a sample path and summing up daily precipitation over the relevant accumulation period. Rather than developing a weather index that is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.

Rainfall data is generally zero inflated, in that the amount of rainfall received on a day can be zero with a positive probability but is continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Hence expressing it in terms of a Tweedie distribution makes estimating the parameters easy. In addition, Tweedie distributions belong to the exponential family of distributions upon which generalized linear models are based; hence there is an already existing framework in place for fitting and diagnostic testing of the model.

The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps in reducing the underestimation of overdispersion in the data, which is also common in the aforementioned models.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.

References

[1] A. Hussain, "Stochastic modeling of rainfall processes: a Markov chain-mixed exponential model for rainfalls in different climatic conditions."

[2] M. Cao, A. Li, and J. Z. Wei, "Precipitation modeling and contract valuation: A frontier in weather derivatives," The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.

[3] G. Leobacher and P. Ngare, "On modelling and pricing rainfall derivatives with seasonality," Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.

[4] M. Odening, O. Musshoff, and W. Xu, "Analysis of rainfall derivatives using daily precipitation models: Opportunities and pitfalls," Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.

[5] D. S. Wilks, "Multisite generalization of a daily stochastic precipitation generation model," Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.

[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, "Rainfall modelling using Poisson-cluster processes: a review of developments," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.

[7] R. Carmona and P. Diko, "Pricing precipitation based derivatives," International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.

[8] F. E. Benth and J. S. Benth, Modeling and Pricing in Financial Markets for Weather Derivatives, vol. 17, World Scientific, 2012.

[9] B. Lopez Cabrera, M. Odening, and M. Ritter, "Pricing rainfall futures at the CME," Technical report, Humboldt University, Collaborative Research Center.

[10] T. I. Harrold, A. Sharma, and S. J. Sheather, "A nonparametric model for stochastic generation of daily rainfall occurrence," Water Resources Research, vol. 39, no. 10, 2003.

[11] D. Lord, "Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter," Accident Analysis & Prevention, vol. 38, no. 4, pp. 751–766, 2006.

[12] J.-P. Wang, "Estimating species richness by a Poisson-compound gamma model," Biometrika, vol. 97, no. 3, pp. 727–740, 2010.

[13] C. S. Withers and S. Nadarajah, "On the compound Poisson-gamma distribution," Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.

[14] E. W. Frees, R. Derrig, and G. Meyers, Regression Modeling with Actuarial and Financial Applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.

[15] P. K. Dunn and G. K. Smyth, "Evaluation of Tweedie exponential dispersion model densities by Fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.

[16] A. Agresti, Foundations of Linear and Generalized Linear Models, John Wiley & Sons, 2015.

[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.


So, for the contribution of observation $i$,

$$
\begin{aligned}
E\left(-\frac{\partial^2 L_i}{\partial\beta_h\,\partial\beta_j}\right)
&= E\left[\left(\frac{\partial L_i}{\partial\beta_h}\right)\left(\frac{\partial L_i}{\partial\beta_j}\right)\right] \\
&= E\left[\left(\frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\,x_{ih}\,\frac{\partial\mu_i}{\partial\eta_i}\right)\left(\frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\,x_{ij}\,\frac{\partial\mu_i}{\partial\eta_i}\right)\right] \\
&= \frac{x_{ih}\,x_{ij}}{\mathrm{Var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^2.
\end{aligned}
\tag{47}
$$

Hence the $(h, j)$ element of the information matrix $J$ is

$$
J_{hj} = E\left(-\frac{\partial^2 L(\beta)}{\partial\beta_h\,\partial\beta_j}\right) = \sum_{i=1}^{n}\frac{x_{ih}\,x_{ij}}{\mathrm{Var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^2 = (X^T W X)_{hj}, \tag{48}
$$

where $W = \operatorname{diag}\left[\frac{1}{\mathrm{Var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^2\right]$. Therefore $\hat{\beta}$ has an approximate $N[\beta, (X^T W X)^{-1}]$ distribution with $\mathrm{Var}(\hat{\beta}) = (X^T \hat{W} X)^{-1}$, where $\hat{W}$ is $W$ evaluated at $\hat{\beta}$.

To compute $\hat{\beta}$ we use the iteratively reweighted least squares algorithm described by Dobson and Barnett [17], where the iterations use the working weights

$$
\frac{w_i}{V(\mu_i)\,g'(\mu_i)^2}, \tag{49}
$$

where $g$ is the link function and $V(\mu_i) = \mu_i^p$.

However, estimating $p$ is more difficult than estimating $\beta$ and $\Theta$, such that most researchers working with Tweedie densities have fixed $p$ a priori. In this study we use the procedure in [15], where the maximum likelihood estimator of $p$ is obtained by directly maximizing the profile likelihood function. For any given value of $p$ we find the maximum likelihood estimate of $(\beta, \Theta)$ and compute the log-likelihood function. This is repeated several times until we have a value of $p$ which maximizes the log-likelihood function.
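Returning to the working weights in (49), a short worked case may help. It assumes the log link that Section 4.2 adopts for the fitted model and takes $w_i$ to be prior weights, which is the usual convention but is an assumption here.

```latex
g(\mu_i) = \log \mu_i
\quad\Longrightarrow\quad
g'(\mu_i) = \frac{1}{\mu_i},
\qquad
\frac{w_i}{V(\mu_i)\, g'(\mu_i)^2}
  = \frac{w_i\, \mu_i^{2}}{\mu_i^{p}}
  = w_i\, \mu_i^{2-p}.
```

With $1 < p < 2$ the exponent $2 - p$ lies in $(0, 1)$, so days with larger fitted means receive more weight in each iteration, but less than proportionally.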

Given the estimated values of $p$ and $\beta$, the unbiased estimator of $\Theta$ is given by

$$
\hat{\Theta} = \sum_{i=1}^{n}\frac{[L_i - \mu_i(\hat{\beta})]^2}{\mu_i(\hat{\beta})^{\hat{p}}}. \tag{50}
$$

Since for $1 < p < 2$ the Tweedie density cannot be expressed in closed form, it is recommended that the maximum likelihood estimate of $\Theta$ be computed iteratively from the full data [15].

4. Data and Model Fitting

4.1. Data Analysis. Daily rainfall data of Balaka district in Malawi covering the period 1995–2015 is used. The data was obtained from Meteorological Surveys of Malawi. Figure 1 shows a plot of the data.

In summary, the minimum value is 0 mm, which indicates that there was no rainfall on particular days, whereas the maximum amount is 123.7 mm. The mean rainfall for the whole period is 3.167 mm.

Figure 1: Daily rainfall amount for Balaka district (panel title: Rainfall for Balaka for 10 years; amount plotted against year, 1995–2015).

Figure 2: Variance-mean relationship (log(variance) plotted against log(mean)).

We investigated the relationship between the variance and the mean of the data by plotting log(variance) against log(mean), as shown in Figure 2. From the figure we can observe a linear relationship between the logarithm of the variance and the logarithm of the mean, which can be expressed as

$$
\log(\text{Variance}) = \alpha + \beta\log(\text{mean}), \tag{51}
$$

$$
\text{Variance} = A \cdot \text{mean}^{\beta}, \quad A \in \mathbb{R}, \tag{52}
$$

with $A = e^{\alpha}$. Hence the variance can be expressed as some power $\beta \in \mathbb{R}$ of the mean, agreeing with the Tweedie variance function requirement.
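The slope in (51) can be estimated by ordinary least squares on the log scale. The sketch below is illustrative only: it assumes the data have already been grouped (for example by day of the year or by month, which the text does not specify) into lists of positive group means and group variances, and the function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(means, variances):
    """Least-squares fit of log(variance) = alpha + beta * log(mean).

    Returns (alpha, beta, A) with A = exp(alpha), matching Eqs. (51)-(52).
    All means and variances must be strictly positive.
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                  # estimated power of the mean
    alpha = y_bar - beta * x_bar      # intercept on the log scale
    return alpha, beta, math.exp(alpha)
```

A fitted slope between 1 and 2 would point toward the compound Poisson-Gamma member of the Tweedie family, which is the case the rest of the section relies on.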

4.2. Fitted Model. To model the daily rainfall data we use sine and cosine terms as predictors, owing to the cyclic nature and seasonality of rainfall. We have assumed that February ends on the 28th in every year, so that all years have the same length in our modeling.

The link function, a log link, is given by

$$
\log\mu_i = a_0 + a_1\sin\left(\frac{2\pi i}{365}\right) + a_2\cos\left(\frac{2\pi i}{365}\right), \tag{53}
$$

where $i = 1, 2, \ldots, 365$ corresponds to days of the year and $a_0, a_1, a_2$ are the regression coefficients.

In the first place we estimate $p$ by maximizing the profile log-likelihood function. Figure 3 shows the graph of the profile log-likelihood function. As can be observed, the value of $p$ that maximizes the function is 1.5306.

From the results obtained after fitting the model, both the cyclic cosine and sine terms are important characteristics for daily rainfall (Table 1). The covariates were chosen to take into account the seasonal variations in the stochastic model.

Table 1: Estimated parameter values.

Parameter   Estimate   Std. error   t value   Pr(>|t|)
a_0         0.1653     0.0473       3.4930    0.0005 ***
a_1         0.9049     0.0572       15.8110   <2e-16 ***
a_2         2.0326     0.0622       32.6720   <2e-16 ***
Θ           14.8057    -            -         -
Significance code: 0 '***'.

Figure 3: Profile likelihood (profile log-likelihood L, with 95% confidence interval, plotted against the ξ index).

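The predicted $\hat{\mu}_i$, $\hat{p}$, and $\hat{\Theta}$ for each day depend only on that day's conditions, so that for each day $i$ we have

$$
\begin{aligned}
\hat{\mu}_i &= \exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right], \\
\hat{p} &= 1.5306, \\
\hat{\Theta} &= 14.8057.
\end{aligned}
\tag{54}
$$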

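From these estimated values we can calculate the parameters $(\hat{\lambda}_i, \hat{\gamma}_i, \hat{\alpha})$ from the corresponding formulas above as

$$
\begin{aligned}
\hat{\lambda}_i &= \frac{1}{6.5716}\left(\exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.4694}, \\
\hat{\gamma}_i &= 7.4284\left(\exp\left[0.1653 + 0.9049\sin\left(\frac{2\pi i}{365}\right) + 2.0326\cos\left(\frac{2\pi i}{365}\right)\right]\right)^{0.5306}, \\
\hat{\alpha} &= 0.8847.
\end{aligned}
\tag{55}
$$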
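For reference, the standard mapping from the Tweedie parameters $(\mu_i, p, \Theta)$ with $1 < p < 2$ to the Poisson-Gamma parameters is restated below. The symbol names (Poisson rate $\lambda_i$, Gamma shape $\alpha$, Gamma scale $\gamma_i$) are assumed to match the earlier definitions, which are not repeated in this part of the text.

```latex
\lambda_i = \frac{\mu_i^{\,2-p}}{\Theta\,(2-p)}, \qquad
\alpha = \frac{2-p}{p-1}, \qquad
\gamma_i = \Theta\,(p-1)\,\mu_i^{\,p-1},
\\[6pt]
\lambda_i\,\alpha\,\gamma_i
  = \frac{\mu_i^{\,2-p}}{\Theta\,(2-p)} \cdot \frac{2-p}{p-1} \cdot \Theta\,(p-1)\,\mu_i^{\,p-1}
  = \mu_i,
\qquad
P(Y_i = 0) = e^{-\lambda_i}.
```

The product check confirms that the expected number of events times the mean amount per event recovers the daily mean, and the last identity is what gives the model a positive probability of an exactly dry day. With $p = 1.5306$ the shape evaluates to $0.4694 / 0.5306 \approx 0.8847$, in agreement with (55).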

Comparing the actual means and the predicted means, for 2 July the actual mean is 0.3820 whereas the predicted mean is 0.4333; similarly, for 31 December the corresponding values are 9.0065 and 10.6952. Figure 4 shows the estimated mean and the actual mean; the model behaves well in general.

4.3. Goodness of Fit of the Model. Let the maximum likelihood estimate of $\theta_i$ be $\hat{\theta}_i$ for all $i$, and let $\hat{\mu}$ be the model's mean estimate. Let $\tilde{\theta}_i$ denote the estimate of $\theta_i$ for the saturated model, with corresponding $\tilde{\mu}_i = y_i$.

Figure 4: Actual versus predicted mean (panel title: Rainfall means; actual and predicted mean plotted against days of the year).


Page 9: ResearchArticle A Poisson-Gamma Model for Zero Inflated … · 2017. 11. 7. · ResearchArticle A Poisson-Gamma Model for Zero Inflated Rainfall Data NelsonChristopherDzupire ,1 PhilipNgare,1,2

Journal of Probability and Statistics 9

Table 1 Estimated parameter values

Parameter Estimate Std error 119905 value Pr(gt |119905|)1198860 01653 00473 34930 00005lowastlowastlowast1198861 09049 00572 1581100 lt2e minus16lowastlowastlowast1198862 20326 00622 326720 lt2e minus16lowastlowastlowastΘ 148057 - - -With 119904119894119892119899119894f code 0 lowast lowast lowast

minus12500

minus12000

minus11500

minus11000

minus10500

L

(95 confidence interval)

13 14 15 16 17 1812120585 index

Figure 3 Profile likelihood

The predicted 120583119894 119901 Θ for each day only depends on thedayrsquos conditions so that for each day 119894 we have

120583119894 = exp [01653 + 09049 sin( 2120587119894365)+ 20326 cos( 2120587119894365)]

119901 = 15306Θ = 148057

(54)

From these estimated values we can calculate the parameter(119894 119894 ) from the corresponding formulas above as

119894 = 165716 (exp [01653 + 09049 sin( 2120587119894365)+ 203263 cos( 2120587119894365)])

04694 = 74284 (exp [01653 + 09049 sin( 2120587119894365)

+ 20326 cos( 2120587119894365)])05306

= 08847

(55)

Comparing the actual means and the predicted means for 2July we have 120583 = 03820 whereas 120583 = 04333 similarly for 31December we have 120583 = 90065 and 120583 = 106952 respectivelyFigure 4 shows the estimated mean and actual mean wherethe model behaves well generally

43 Goodness of Fit of the Model Let the maximum likeli-hood estimate of 120579119894 be 120579119894 for all 119894 and 120583 as the modelrsquos mean

Rainfall means

Actual meanPredicted mean

0

5

10

15

20

Mea

n

100 200 3000Days of the year

Figure 4 Actual versus predicted mean

estimate Let 120579119894 denote the estimate of 120579119894 for the saturatedmodel with corresponding 120583 = 119910119894

The goodness of fit is determined by the deviance, which is defined, with $L$ denoting the log-likelihood, as

\[
\begin{aligned}
-2\log\left[\frac{\text{maximum likelihood of the fitted model}}{\text{maximum likelihood of the saturated model}}\right]
&= -2\left[L(\hat{\mu}; y) - L(y; y)\right] \\
&= 2\sum_{i=1}^{n}\frac{y_i\tilde{\theta}_i - k(\tilde{\theta}_i)}{\Theta} - 2\sum_{i=1}^{n}\frac{y_i\hat{\theta}_i - k(\hat{\theta}_i)}{\Theta} \\
&= 2\sum_{i=1}^{n}\frac{y_i(\tilde{\theta}_i - \hat{\theta}_i) - k(\tilde{\theta}_i) + k(\hat{\theta}_i)}{\Theta} = \frac{\mathrm{Dev}(y, \hat{\mu})}{\Theta}.
\end{aligned}
\tag{56}
\]

$\mathrm{Dev}(y, \hat{\mu})$ is called the deviance of the model; the greater the deviance, the poorer the fitted model, as maximizing the likelihood corresponds to minimizing the deviance.

In terms of Tweedie distributions with $1 < p < 2$, the deviance is

\[
\mathrm{Dev}_p = 2\sum_{i=1}^{n}\left(\frac{y_i^{2-p} - (2-p)\,y_i\,\mu_i^{1-p} + (1-p)\,\mu_i^{2-p}}{(1-p)(2-p)}\right). \tag{57}
\]
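Equation (57) can be evaluated directly, as in the minimal Python sketch below. The function name `tweedie_deviance` is hypothetical, and the inputs are assumed to be equal-length sequences of observations (which may be zero) and strictly positive fitted means.

```python
def tweedie_deviance(y, mu, p):
    """Total Tweedie deviance of equation (57) for an index 1 < p < 2.

    y  : observed values, each >= 0 (exact zeros are allowed)
    mu : fitted means, each > 0
    """
    if not 1.0 < p < 2.0:
        raise ValueError("p must lie strictly between 1 and 2")
    total = 0.0
    for yi, mi in zip(y, mu):
        numerator = (
            yi ** (2.0 - p)
            - (2.0 - p) * yi * mi ** (1.0 - p)
            + (1.0 - p) * mi ** (2.0 - p)
        )
        total += numerator / ((1.0 - p) * (2.0 - p))
    return 2.0 * total


# Illustrative values only (not data from the paper).
print(tweedie_deviance([0.0, 2.0, 3.0], [1.0, 2.0, 1.0], p=1.5))
```

Each term vanishes when the fitted mean equals the observation and is positive otherwise, so a smaller total indicates a closer fit. A dry day (y = 0) contributes 2μ^(2-p)/(2-p), which stays finite; this is why exact zeros pose no difficulty for 1 < p < 2.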

Based on the results from fitting the model, the residual deviance is 43144, less than the null deviance of 62955, which implies that the fitted model explains the data better than a null model.

4.4. Diagnostic Check. Model diagnostics are carried out through residual analysis. The fitted model is difficult to assess, especially for days with no rainfall at all, as these produce spurious results and distracting patterns, as also observed by [15]. Since this is a nonnormal regression, the residuals are far from normally distributed and do not have equal variances, unlike in a normal linear regression. Here the residuals fall along parallel lines corresponding to distinct values; hence it is difficult to make any meaningful decision about the fitted model (Figure 5).

Figure 5: Residuals of the model (residuals against year, 1995 to 2015).

Figure 6: Q-Q plot of the quantile residuals (normal Q-Q plot; sample quantiles against theoretical quantiles).

So we assess the model based on quantile residuals, which remove the pattern in discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.

The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.

Mathematically, let $a_i = \lim_{y \uparrow y_i} F(y; \hat{\mu}_i, \hat{\Theta})$ and $b_i = F(y_i; \hat{\mu}_i, \hat{\Theta})$, where $F$ is the cumulative distribution function corresponding to the probability density function $f(y; \mu, \Theta)$; then the randomized quantile residuals for $y_i$ are

\[
r_{q,i} = \Phi^{-1}(u_i), \tag{58}
\]

with $u_i$ being a uniform random variable on $(a_i, b_i]$. The randomized quantile residuals are normally distributed, barring the variability in $\hat{\mu}$ and $\hat{\Theta}$.
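The construction in (58) can be sketched in Python as below. The sketch assumes the two CDF values a_i and b_i have already been computed from the fitted model; evaluating the Tweedie distribution function itself is outside its scope. The function name `randomized_quantile_residual` and the small clamp that keeps u away from 0 and 1 are choices made here for illustration, not part of the paper.

```python
import math
import random
from statistics import NormalDist


def randomized_quantile_residual(a: float, b: float, rng: random.Random) -> float:
    """Draw u uniformly on (a, b] and return the standard normal quantile of u.

    a : left limit of the fitted CDF at the observation
    b : fitted CDF evaluated at the observation
    """
    if not 0.0 <= a <= b <= 1.0:
        raise ValueError("need 0 <= a <= b <= 1")
    # rng.random() lies in [0, 1), so u lies in (a, b]; u equals b when a == b.
    u = b - (b - a) * rng.random()
    # Keep u strictly inside (0, 1) so that the normal quantile is finite.
    eps = 1e-12
    u = min(max(u, eps), 1.0 - eps)
    return NormalDist().inv_cdf(u)


# Illustrative values only (not taken from the paper).
rng = random.Random(0)
# Dry day: y_i = 0, so a_i = 0 and b_i = P(Y = 0); randomization is needed.
dry_day = randomized_quantile_residual(0.0, math.exp(-0.3), rng)
# Wet day: F is continuous at y_i > 0, so a_i = b_i and no randomization occurs.
wet_day = randomized_quantile_residual(0.9, 0.9, rng)
print(dry_day, wet_day)
```

Only the dry days are randomized, which is what breaks up the parallel bands that the exact zeros would otherwise produce in a residual plot.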

Figure 6 shows the normal Q-Q plot. As can be observed, there are no large deviations from the straight line, only small deviations at the tails. The linearity observed indicates an acceptable fitted model.

Figure 7: Simulated rainfall and observed rainfall (rainfall amount against days of the year; series: predicted, actual).

Figure 8: Probability of rainfall occurrence (panel title: probability of no rainfall; probability against days of the year).

5. Simulation

The model is simulated to test whether it produces data with characteristics similar to those of the actual observed rainfall. The simulation is done for a period of two years, where one is the last year of the data (2015) and the other (2016) is a future prediction. The comparison with the 2015 data is shown graphically in Figure 7.

The summary statistics of the simulated data and the actual data are shown in Table 2 for comparison.
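A simulation of this kind can be sketched in Python using only the standard library. This is a sketch under stated assumptions rather than the authors' procedure: it uses the estimates from Table 1 and equation (54), the standard Tweedie-to-Poisson-Gamma relations (assumed to match the paper's), Knuth's multiplication method for Poisson draws, and hypothetical function names throughout.

```python
import math
import random

A0, A1, A2 = 0.1653, 0.9049, 2.0326  # Table 1
P = 1.5306                            # index estimate, equation (54)
THETA = 14.8057                       # dispersion estimate, equation (54)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates arising here."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_day(day: int, rng: random.Random) -> float:
    """One day's rainfall: a Poisson number of events with Gamma amounts."""
    angle = 2.0 * math.pi * day / 365.0
    mu = math.exp(A0 + A1 * math.sin(angle) + A2 * math.cos(angle))
    lam = mu ** (2.0 - P) / (THETA * (2.0 - P))   # mean number of events
    gam = THETA * (P - 1.0) * mu ** (P - 1.0)     # Gamma scale (assumed role)
    alpha = (2.0 - P) / (P - 1.0)                 # Gamma shape (assumed role)
    events = sample_poisson(lam, rng)
    # random.gammavariate takes (shape, scale); zero events give exactly 0.
    return sum(rng.gammavariate(alpha, gam) for _ in range(events))


def simulate_year(rng: random.Random, days: int = 365) -> list:
    return [simulate_day(day, rng) for day in range(1, days + 1)]


rng = random.Random(2015)
year = simulate_year(rng)
dry_days = sum(1 for amount in year if amount == 0)
print("dry days:", dry_days, "mean:", sum(year) / len(year), "max:", max(year))
```

Because a day with no events contributes an exact zero, the simulated series contains a point mass at zero alongside continuous positive amounts, which is the feature summarized by the zero quartiles in Table 2.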

The main objective of the simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model works well in predicting rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.

However, the model performed poorly in predicting the probability of rainfall occurrence, which it underestimated. A truncated Fourier series, rather than the single sinusoid used here, would probably improve this estimate.

The model performed better in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.

It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model suggested by [4], as follows. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8); in addition, the probability of rain on a given day is higher if the previous day was wet, which is the basis of precipitation models that involve a Markov process. From Figure 7 we can also observe variation of rainfall intensity with the time of the season.

Table 2: Data statistics.

                       Min    1st Qu.   Median   Mean    3rd Qu.   Max
Predicted data         0.00   0.00      0.00     3.314   0.00      116.5
Actual data [10 yrs]   0.00   0.00      0.00     3.183   0.300     123.7
Actual data [2015]     0.00   0.00      0.00     3.328   0.00      84.5

In addition, the model allows modeling of exact zeros in the data and, at the same time, is able to predict the probability of a no-rainfall event.
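The dry-day probability can be written out explicitly. In a compound Poisson-Gamma model a day is dry exactly when no rainfall event occurs. Writing $N_i$ for the number of events on day $i$ (a symbol introduced here for illustration) and assuming the standard Tweedie relation $\lambda_i = \mu_i^{2-p} / (\Theta(2-p))$, a sketch of the calculation is:

```latex
\[
P(Y_i = 0) = P(N_i = 0) = e^{-\lambda_i}
           = \exp\!\left(-\frac{\mu_i^{\,2-p}}{\Theta\,(2-p)}\right),
\qquad 1 < p < 2 .
\]
```

Since $\lambda_i$ increases with $\mu_i$, the probability of no rainfall is lowest on the days with the highest predicted mean, which gives the seasonal pattern described for Figure 8.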

6. Conclusion

A daily stochastic rainfall model was developed based on a compound Poisson process, where the number of rainfall events follows a Poisson distribution and the intensity of each event, independent of the number of events, follows a Gamma distribution. Unlike many earlier studies of precipitation modeling, in which two separate models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, that is, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool for studying the impact of weather on a variety of systems, including ecosystem risk assessment, drought predictions, and weather derivatives, since it can simulate synthetic rainfall data. The model provides mechanisms for understanding the fine-scale structure, such as the number and mean of rainfall events, the mean daily rainfall, and the probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.

The model developed can easily be used for forecasting future events. In terms of weather derivatives, the weather index can be derived by simulating a sample path and summing up daily precipitation over the relevant accumulation period. Rather than developing a weather index that is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.

Rainfall data is generally zero inflated in that the amount of rainfall received on a day can be zero with a positive probability but is continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Hence expressing it in terms of a Tweedie distribution makes estimating the parameters easy. In addition, Tweedie distributions belong to the exponential family of distributions, upon which generalized linear models are based; hence there is an already existing framework in place for fitting and diagnostic testing of the model.

The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps reduce the underestimation of overdispersion in the data, which is common in the aforementioned models.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.

References

[1] A. Hussain, "Stochastic modeling of rainfall processes: a markov chain-mixed exponential model for rainfalls in different climatic conditions".

[2] M. Cao, A. Li, and J. Z. Wei, "Precipitation modeling and contract valuation: A frontier in weather derivatives," The Journal of Alternative Investments, vol. 7, no. 2, pp. 93-99, 2004.

[3] G. Leobacher and P. Ngare, "On modelling and pricing rainfall derivatives with seasonality," Applied Mathematical Finance, vol. 18, no. 1, pp. 71-91, 2011.

[4] M. Odening, O. Musshoff, and W. Xu, "Analysis of rainfall derivatives using daily precipitation models: Opportunities and pitfalls," Agricultural Finance Review, vol. 67, no. 1, pp. 135-156, 2007.

[5] D. S. Wilks, "Multisite generalization of a daily stochastic precipitation generation model," Journal of Hydrology, vol. 210, no. 1-4, pp. 178-191, 1998.

[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, "Rainfall modelling using poisson-cluster processes: a review of developments," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384-411, 2000.

[7] R. Carmona and P. Diko, "Pricing precipitation based derivatives," International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959-988, 2005.

[8] F. E. Benth and J. S. Benth, Modeling and pricing in financial markets for weather derivatives, vol. 17, World Scientific, 2012.

[9] B. Lopez Cabrera, M. Odening, and M. Ritter, "Pricing rainfall futures at the CME," Technical report, Humboldt University, Collaborative Research Center.

[10] T. I. Harrold, A. Sharma, and S. J. Sheather, "A nonparametric model for stochastic generation of daily rainfall occurrence," Water Resources Research, vol. 39, no. 10, 2003.

[11] D. Lord, "Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter," Accident Analysis & Prevention, vol. 38, no. 4, pp. 751-766, 2006.

[12] J.-P. Wang, "Estimating species richness by a Poisson-compound gamma model," Biometrika, vol. 97, no. 3, pp. 727-740, 2010.

[13] C. S. Withers and S. Nadarajah, "On the compound Poisson-gamma distribution," Kybernetika, vol. 47, no. 1, pp. 15-37, 2011.

[14] E. W. Frees, R. Derrig, and G. Meyers, Regression modeling with actuarial and financial applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.

[15] P. K. Dunn and G. K. Smyth, "Evaluation of Tweedie exponential dispersion model densities by Fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73-86, 2008.

[16] A. Agresti, Foundations of linear and generalized linear models, John Wiley & Sons, 2015.

[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.


Page 10: ResearchArticle A Poisson-Gamma Model for Zero Inflated … · 2017. 11. 7. · ResearchArticle A Poisson-Gamma Model for Zero Inflated Rainfall Data NelsonChristopherDzupire ,1 PhilipNgare,1,2

10 Journal of Probability and Statistics

Residuals

2000 2005 2010 20151995Year

0

50

100

150

Resid

uals

Figure 5 Residuals of the model

Normal Q-Q Plot

minus2

0

2

4

Sam

ple q

uant

iles

minus2 0minus4 42Theoretical quantiles

Figure 6 Q-Q plot of the quantile residuals

to be assessed especially for days with no rainfall at all as theyproduce spurious results and distracting patterns similarlyas observed by [15] Since this is a nonnormal regressionresiduals are far from being normally distributed and havingequal variances unlike in a normal linear regression Herethe residuals lie parallel to distinct values hence it is difficultto make any meaningful decision about the fitted model(Figure 5)

So we assess the model based on quantile residuals whichremove the pattern in discrete data by adding the smallestamount of randomization necessary on the cumulative prob-ability scale

The quantile residuals are obtained by inverting thedistribution function for each response and finding theequivalent standard normal quantile

Mathematically let 119886119894 = lim119910uarr119910119894119865(119910 120583119894 Θ) and 119887119894 = 119865(119910119894120583119894 Θ) where 119865 is the cumulative function of the probability

density function 119891(119910 120583 Θ) then the randomized quantileresiduals for 119910119894 are

119903119902119894 = Φminus1 (119906119894) (58)

with 119906119894 being the uniform random variable on (119886119894 119887119894] Therandomized quantile residuals are distributed normally bar-ring the variability in 120583 and Θ

Figure 6 shows the normalized Q-Q plot and as canbe observed there are no large deviations from the straightline only small deviations at the tail The linearity observedindicates an acceptable fitted model

Rainfall amount

Days of the year1 44 95 153 218 283 348 413 478 543 608 673

PredictedActual

020

60

100

Am

ount

Figure 7 Simulated rainfall and observed rainfall

Probability of no rainfall

065

075

085

095

Prob

abili

ty

100 200 3000Days of the year

Figure 8 Probability of rainfall occurrence

5 Simulation

Themodel is simulated to test whether it produces data withsimilar characteristics to the actual observed rainfall Thesimulation is done for a period of two years where one wasthe last year of the data (2015) and the other year (2016) wasa future prediction Then comparison was done with a graphfor 2015 data as shown in Figure 7

The different statistics of the simulated data and actualdata are shown in Table 2 for comparisons

The main objective of simulation is to demonstrate thatthe Poisson-Gamma can be used to predict and forecastrainfall occurrence and intensity simultaneously Based onthe results above (Figure 8) the model has shown that itworks well in predicting the rainfall intensity and hence canbe used in agriculture actuarial science hydrology and so on

However the model performed poorly in predictingprobability of rainfall occurrence as it underestimated theprobability of rainfall occurrence It is suggested here thatprobably the use of truncated Fourier series can improve thisestimation as compared to the sinusoidal

But it performed better in predicting probability of norainfall on days where there was little or no rainfall asindicated in Figure 8

It can also be observed that the model produces syntheticprecipitation that agrees with the four characteristics of astochastic precipitation model as suggested by [4] as followsThe probability of rainfall occurrence obeys a seasonal pat-tern (Figure 8) in addition we can also tell that a probabilityof a rain in a day is higher if the previous day was wetwhich is the basis of precipitation models that involve the

Journal of Probability and Statistics 11

Table 2 Data statistics

Min 1st Qu Median Mean 3rd Qu MaxPredicted data 000 000 000 3314 000 1165Actual data [10 yrs] 000 000 000 3183 0300 1237Actual data [2015] 000 000 000 3328 000 845Markov process From Figure 7 we can also observe variationof rainfall intensity based on time of the season

In addition the model allows modeling of exact zeros inthe data and is able to predict a probability of no rainfall eventsimultaneously

6 Conclusion

A daily stochastic rainfall model was developed based ona compound Poisson process where rainfall events followa Poisson distribution and the intensity is independentof events following a Gamma distribution Unlike severalresearches that have been carried out into precipitationmodeling whereby two models are developed for occurrenceand intensity the model proposed here is able to model bothprocesses simultaneously The proposed model is also ableto model the exact zeros the event of no rainfall whichis not the case with the other models This precipitationmodel is an important tool to study the impact of weatheron a variety of systems including ecosystem risk assessmentdrought predictions and weather derivatives as we can beable to simulate synthetic rainfall data The model providesmechanisms for understanding the fine scale structure likenumber and mean of rainfall events mean daily rainfalland probability of rainfall occurrence This is applicable inagriculture activities disaster preparedness and water cyclesystems

The model developed can easily be used for forecastingfuture events and in terms ofweather derivatives theweatherindex can be derived from simulating a sample path bysumming up daily precipitation in the relevant accumulationperiod Rather than developing a weather index which is notflexible enough to forecast future events we can use thismodel in pricing weather derivatives

Rainfall data is generally zero inflated in that the amountof rainfall received on a day can be zero with a posi-tive probability but continuously distributed otherwise Thismakes it difficult to transform the data to normality bypower transforms or to model it directly using continu-ous distribution The Poisson-Gamma distribution has acomplicated probability density function whose parametersare difficult to estimate Hence expressing it in terms of aTweedie distribution makes estimating the parameters easyIn addition Tweedie distributions belong to the exponentialfamily of distributions upon which generalized linear modelsare based hence there is an already existing framework inplace for fitting and diagnostic testing of the model

Themodel developed allows the information in both zeroand positive observations to contribute to the estimationof all parts of the model unlike the other model [3 4 9]which conditions rainfall intensity based on probability of

occurrence In addition the introduction of the dispersionparameter in the model helps in reducing underestimationof overdispersion of the data which is also common in theaforementioned models

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

The authors extend their gratitude to Pan African UniversityInstitute for Basic Sciences Technology and Innovation forthe financial support

References

[1] AHussain ldquoStochasticmodeling of rainfall processes amarkovchain-mixed exponential model for rainfalls in different cli-matic conditionsrdquo

[2] M Cao A Li and J Z Wei ldquoPrecipitation modeling andcontract valuation A frontier in weather derivativesrdquo TheJournal of Alternative Investments vol 7 no 2 pp 93ndash99 2004

[3] G Leobacher and P Ngare ldquoOn modelling and pricing rainfallderivativeswith seasonalityrdquoAppliedMathematical Finance vol18 no 1 pp 71ndash91 2011

[4] M Odening O Musshoff and W Xu ldquoAnalysis of rainfallderivatives using daily precipitation models Opportunities andpitfallsrdquo Agricultural Finance Review vol 67 no 1 pp 135ndash1562007

[5] D S Wilks ldquoMultisite generalization of a daily stochasticprecipitation generation modelrdquo Journal of Hydrology vol 210no 1ndash4 pp 178ndash191 1998

[6] C Onof R E Chandler A Kakou P Northrop H S Wheaterand V Isham ldquoRainfall modelling using poisson-cluster pro-cesses a review of developmentsrdquo Stochastic EnvironmentalResearch and Risk Assessment vol 14 no 6 pp 384ndash411 2000

[7] R Carmona and P Diko ldquoPricing precipitation based deriva-tivesrdquo International Journal of Theoretical and Applied Financevol 8 no 7 pp 959ndash988 2005

[8] F E Benth and J S Benth Modeling and pricing in financialmarkets for weather derivatives vol 17 World Scientific 2012

[9] B Lopez Cabrera M Odening and M Ritter ldquoPricing rainfallfutures at the CMErdquo Technical report Humboldt UniversityCollaborative Research Center

[10] T I Harrold A Sharma and S J Sheather ldquoA nonparametricmodel for stochastic generation of daily rainfall occurrencerdquoWater Resources Research vol 39 no 10 2003

[11] D Lord ldquoModeling motor vehicle crashes using Poisson-gamma models examining the effects of low sample meanvalues and small sample size on the estimation of the fixed

12 Journal of Probability and Statistics

dispersion parameterrdquo Accident Analysis amp Prevention vol 38no 4 pp 751ndash766 2006

[12] J-P Wang ldquoEstimating species richness by a Poisson-com-pound gamma modelrdquo Biometrika vol 97 no 3 pp 727ndash7402010

[13] C S Withers and S Nadarajah ldquoOn the compound Poisson-gamma distributionrdquo Kybernetika vol 47 no 1 pp 15ndash37 2011

[14] E W Frees R Derrig and G Meyers Regression modelingwith actuarial and financial applications vol 1 CambridgeUniversity Press Cambridge UK 2014

[15] P K Dunn and G K Smyth ldquoEvaluation of Tweedie exponen-tial dispersion model densities by Fourier inversionrdquo Statisticsand Computing vol 18 no 1 pp 73ndash86 2008

[16] A Agresti Foundations of linear and generalized linear modelsJohn Wiley amp Sons 2015

[17] A J Dobson and A G Barnett An Introduction to GeneralizedLinear Models Texts in Statistical Science Series CRC PressBoca Raton Fla USA 3rd edition 2008

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 11: ResearchArticle A Poisson-Gamma Model for Zero Inflated … · 2017. 11. 7. · ResearchArticle A Poisson-Gamma Model for Zero Inflated Rainfall Data NelsonChristopherDzupire ,1 PhilipNgare,1,2

Journal of Probability and Statistics 11

Table 2 Data statistics

Min 1st Qu Median Mean 3rd Qu MaxPredicted data 000 000 000 3314 000 1165Actual data [10 yrs] 000 000 000 3183 0300 1237Actual data [2015] 000 000 000 3328 000 845Markov process From Figure 7 we can also observe variationof rainfall intensity based on time of the season

In addition the model allows modeling of exact zeros inthe data and is able to predict a probability of no rainfall eventsimultaneously

6 Conclusion

A daily stochastic rainfall model was developed based ona compound Poisson process where rainfall events followa Poisson distribution and the intensity is independentof events following a Gamma distribution Unlike severalresearches that have been carried out into precipitationmodeling whereby two models are developed for occurrenceand intensity the model proposed here is able to model bothprocesses simultaneously The proposed model is also ableto model the exact zeros the event of no rainfall whichis not the case with the other models This precipitationmodel is an important tool to study the impact of weatheron a variety of systems including ecosystem risk assessmentdrought predictions and weather derivatives as we can beable to simulate synthetic rainfall data The model providesmechanisms for understanding the fine scale structure likenumber and mean of rainfall events mean daily rainfalland probability of rainfall occurrence This is applicable inagriculture activities disaster preparedness and water cyclesystems

The model developed can easily be used for forecastingfuture events and in terms ofweather derivatives theweatherindex can be derived from simulating a sample path bysumming up daily precipitation in the relevant accumulationperiod Rather than developing a weather index which is notflexible enough to forecast future events we can use thismodel in pricing weather derivatives

Rainfall data is generally zero inflated in that the amountof rainfall received on a day can be zero with a posi-tive probability but continuously distributed otherwise Thismakes it difficult to transform the data to normality bypower transforms or to model it directly using continu-ous distribution The Poisson-Gamma distribution has acomplicated probability density function whose parametersare difficult to estimate Hence expressing it in terms of aTweedie distribution makes estimating the parameters easyIn addition Tweedie distributions belong to the exponentialfamily of distributions upon which generalized linear modelsare based hence there is an already existing framework inplace for fitting and diagnostic testing of the model

Themodel developed allows the information in both zeroand positive observations to contribute to the estimationof all parts of the model unlike the other model [3 4 9]which conditions rainfall intensity based on probability of

occurrence In addition the introduction of the dispersionparameter in the model helps in reducing underestimationof overdispersion of the data which is also common in theaforementioned models

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

The authors extend their gratitude to Pan African UniversityInstitute for Basic Sciences Technology and Innovation forthe financial support

References

[1] AHussain ldquoStochasticmodeling of rainfall processes amarkovchain-mixed exponential model for rainfalls in different cli-matic conditionsrdquo

[2] M Cao A Li and J Z Wei ldquoPrecipitation modeling andcontract valuation A frontier in weather derivativesrdquo TheJournal of Alternative Investments vol 7 no 2 pp 93ndash99 2004

[3] G Leobacher and P Ngare ldquoOn modelling and pricing rainfallderivativeswith seasonalityrdquoAppliedMathematical Finance vol18 no 1 pp 71ndash91 2011

[4] M Odening O Musshoff and W Xu ldquoAnalysis of rainfallderivatives using daily precipitation models Opportunities andpitfallsrdquo Agricultural Finance Review vol 67 no 1 pp 135ndash1562007

[5] D S Wilks ldquoMultisite generalization of a daily stochasticprecipitation generation modelrdquo Journal of Hydrology vol 210no 1ndash4 pp 178ndash191 1998

[6] C Onof R E Chandler A Kakou P Northrop H S Wheaterand V Isham ldquoRainfall modelling using poisson-cluster pro-cesses a review of developmentsrdquo Stochastic EnvironmentalResearch and Risk Assessment vol 14 no 6 pp 384ndash411 2000

[7] R Carmona and P Diko ldquoPricing precipitation based deriva-tivesrdquo International Journal of Theoretical and Applied Financevol 8 no 7 pp 959ndash988 2005

[8] F E Benth and J S Benth Modeling and pricing in financialmarkets for weather derivatives vol 17 World Scientific 2012

[9] B Lopez Cabrera M Odening and M Ritter ldquoPricing rainfallfutures at the CMErdquo Technical report Humboldt UniversityCollaborative Research Center

[10] T I Harrold A Sharma and S J Sheather ldquoA nonparametricmodel for stochastic generation of daily rainfall occurrencerdquoWater Resources Research vol 39 no 10 2003

[11] D Lord ldquoModeling motor vehicle crashes using Poisson-gamma models examining the effects of low sample meanvalues and small sample size on the estimation of the fixed

12 Journal of Probability and Statistics

dispersion parameterrdquo Accident Analysis amp Prevention vol 38no 4 pp 751ndash766 2006

[12] J-P Wang ldquoEstimating species richness by a Poisson-com-pound gamma modelrdquo Biometrika vol 97 no 3 pp 727ndash7402010

[13] C S Withers and S Nadarajah ldquoOn the compound Poisson-gamma distributionrdquo Kybernetika vol 47 no 1 pp 15ndash37 2011

[14] E W Frees R Derrig and G Meyers Regression modelingwith actuarial and financial applications vol 1 CambridgeUniversity Press Cambridge UK 2014

[15] P K Dunn and G K Smyth ldquoEvaluation of Tweedie exponen-tial dispersion model densities by Fourier inversionrdquo Statisticsand Computing vol 18 no 1 pp 73ndash86 2008

[16] A Agresti Foundations of linear and generalized linear modelsJohn Wiley amp Sons 2015

[17] A J Dobson and A G Barnett An Introduction to GeneralizedLinear Models Texts in Statistical Science Series CRC PressBoca Raton Fla USA 3rd edition 2008
