
NANYANG TECHNOLOGICAL UNIVERSITY

SCHOOL OF

ELECTRICAL AND ELECTRONIC ENGINEERING

MARKOV RANDOM FIELDS GENERALIZED PARETO DISTRIBUTION

FOR MULTI-SITE DATASETS (MRF-GP)

SUBMITTED BY

ZHOU QIAO

A Final Year Project presented to

Nanyang Technological University, Singapore

In Partial Fulfillment of the Requirements for the Degree of Bachelor of Engineering (Electrical & Electronic Engineering)

Year: 2011/2012

Supervised by Prof Justin Dauwels

Assessed by Prof Chua Chin Seng


TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENT
LIST OF FIGURES
LIST OF CHAPTERS

CHAPTER 1: INTRODUCTION
1.1 Motivation
1.2 Scope and Objectives
1.2.1 Research Scope
1.2.2 Research Objectives
1.3 Main Results
1.4 Thesis Organization

CHAPTER 2: MODELING EXTREME EVENTS
2.1 Overview
2.2 Parameter Dependence Modeling
2.2.1 Directional Model
2.2.2 Seasonal Model
2.2.3 Spatial Model
2.3 Model Discussions and Limitations

CHAPTER 3: PRELIMINARIES
3.1 Extreme Value Modeling
3.1.1 Extreme Value Theory
3.1.2 Generalized Extreme Value Model
3.1.3 Peak Over Threshold (POT) Model: Generalized Pareto Distribution
3.2 Thin Membrane Gaussian Graphical Model
3.2.1 Markov Random Field
3.2.2 Gaussian Graphical Model (GGM)
3.2.2.1 GGM Basics
3.2.3 Thin Membrane Model
3.3 Estimation Method
3.3.1 Maximum Likelihood (ML)
3.3.2 Maximum A Posteriori (MAP)

CHAPTER 4: NON-PARAMETRIC METHOD TO MODEL SPATIAL DEPENDENCE
4.1 Model Construction
4.1.1 Introduction to the MRF-GP Model
4.1.2 Locally Data Fitting
4.1.3 Threshold Optimization Method
4.1.3.1 Threshold Selection
4.1.4 GP Parameter Optimization
4.2 Smoothing Parameter Selection
4.2.1 Cross-validation (CV)
4.2.2 Maximum A Posteriori (MAP)
4.2.3 Iterative Conditional Modes (ICM)
4.2.4 Limitations

CHAPTER 5: SMOOTHING PARAMETER SELECTION USING EXPECTATION MAXIMIZATION (EM)
5.1 Model in Matrix Form
5.1.1 The Prior Model: Gauss-Markov Random Field
5.1.2 Conditional Distribution
5.1.3 The Posterior Distribution
5.1.4 Measurements Bootstrapping
5.1.5 Threshold Smoothing
5.1.6 GP Parameter Smoothing
5.2 The Exponential Family
5.3 Expectation Maximization Algorithm
5.3.1 Introduction to Expectation Maximization
5.3.2 Jensen's Inequality
5.3.3 The EM Algorithm
5.3.4 Expectation Maximization in MRF-GP
5.4 Model Implementation by EM
5.4.1 The Expectation Step
5.4.2 The Maximization Step
5.5 Discussions

CHAPTER 6: RESULTS AND DISCUSSIONS
6.1 Parameters Initialization and Data Generation
6.2 Theoretical Expectations
6.3 Threshold Sensitivity and Uncertainty Analysis
6.4 Smoothing Sensitivity and Uncertainty Analysis
6.5 The Sizing Effect

CHAPTER 7: CONCLUSION AND RECOMMENDATIONS
7.1 Summary of the Contributions
7.2 Recommendations for Future Works

REFERENCES

APPENDIX: LIST OF CODES


ABSTRACT

In this thesis, the author proposes a new nonparametric approach to modeling the extreme behavior of multi-site time-series data, which accounts for the covariate dependence between neighboring sites using Gaussian Markov random fields.

The modeling of extreme or catastrophic events has recently grown in popularity and significance, especially in areas such as weather forecasting, flood measurement and environmental assessment. Conventional methods fit the extreme data to the Generalized Pareto (GP) distribution at each site locally. However, the dependence between neighboring sites is evident. We define events whose observed values exceed a threshold as extreme, where the initial threshold surface is inferred using quantile regression. We first fit the threshold exceedances to the GP distribution, which is asymptotically justified when the threshold is sufficiently high. We propose that covariate dependence exists in the underlying system, and hence that prediction precision is significantly enhanced if the dependence structure is modeled properly. We use the locally fitted results as the initial estimates of the GP parameters. Further, we assume that the observed data are latent variables corrupted by Gaussian noise with zero mean and unknown variance, which can be learned from the data using bootstrapping. The thin membrane model provides the prior information and also acts as a penalty function added to the underlying distributions, controlled by a set of smoothing parameters. The parameters of the GP distribution are smoothed based on the sites' locations and learned from the data using expectation maximization. The results of a simulation study demonstrate the superiority of MRF-GP over the locally fitted models. Sensitivity and uncertainty analyses are also performed to inspect the precision of the model's inference. In future work, we aim to enhance model performance by extending the monoscale Gaussian graphical model to a multiscale one, which captures long-range dependency by introducing several coarser scales.

Keywords: Gauss-Markov Random Field, GP Distribution, Maximum a Posteriori, Thin Membrane Model, Covariate Effects, Spatial Dependence, Expectation Maximization


ACKNOWLEDGEMENT

The author would like to dedicate this page to everyone who was involved in this project and who helped along the way to make it a success.

First of all, I would like to express my deepest gratitude to my FYP supervisor, Professor Justin Dauwels, for his consistent insight, consultation, mentorship, motivation and facilities support throughout the project.

I thank Dr. Yu Hang for his inspirational guidance and motivation. I acknowledge the enlightening discussions with Mr Choo Zheng and Miss Wang Xueou, and thank them for their kind collaboration and helpful feedback. I further acknowledge the support of Shell International Research and the Massachusetts Institute of Technology (MIT), as well as the insightful advice and previous work of Philip Jonathan and his research team.

Furthermore, I extend my sincere gratitude to all the teaching faculty and academic staff who have taught and helped me during my undergraduate study at Nanyang Technological University. I thank Prof Zhang Qing for bringing me into the research world through the URECA program, Prof Er Meng Joo for his insightful teaching and guidance through the Design and Innovation Project, and Prof Nicolas Privault for his encouragement and mentoring in the Stochastic Process course.

Last but not least, special thanks go to Prof Chua Chin Seng for taking the time to assess my project. I also thank Mr Tan from the Machine Learning Lab and the other officers in the Biomedical Research Lab for their technical support.


LIST OF FIGURES

Figure 2.2.1: Different methods to prove the superiority of the directional model

Figure 2.2.2: Empirical density of storm peak events in the Gulf of Mexico

Figure 2.2.3: Observations of the strong positive correlation between neighboring sites

Figure 3.2.2.2: The conditional independence of the Gaussian graphical model

Figure 3.2.3.1: Simplified version of the 6 by 13 grid structure

Figure 4.1.2 (a): The locally fitted threshold surface

Figure 4.1.2 (b): The locally fitted shape parameter surface

Figure 4.1.2 (c): The locally fitted scale parameter surface

Figure 6.1: Threshold surface constructed from the quadratic model

Figure 6.3.1: GP parameters changed with threshold for the locally fitted model (sample = 1250)

Figure 6.3.2: GP parameters changed with threshold for the MRF-GP model (sample = 1250)

Figure 6.4.1: GP parameters changed with smoothing parameter for the locally fitted model (sample = 1250)

Figure 6.4.2: GP parameters changed with smoothing parameter for the MRF-GP model (sample = 1250)

Figure 6.5.1: GP parameters changed with threshold for the locally fitted model (sample = 315)

Figure 6.5.2: GP parameters changed with threshold for the MRF-GP model (sample = 315)

Figure 6.5.3: GP parameters changed with smoothing parameter for the locally fitted model (sample = 315)

Figure 6.5.4: GP parameters changed with smoothing parameter for the MRF-GP model (sample = 315)

Figure 7.2: Multi-scale Gaussian graphical grid structure (3D visualization from MATLAB)


LIST OF CHAPTERS

CHAPTER 1: INTRODUCTION

In this chapter, the author first introduces the background of the topic, followed by the motivation for conducting this research. The scope and objectives of the project are then explained and the main results are summarized briefly. Finally, the organization of the thesis is presented.

CHAPTER 2: MODELING EXTREME EVENTS

This chapter starts with an overview of previous studies on extreme value modeling. We introduce two types of modeling framework that address the estimation of the dependence structure of multivariate datasets. Specifically, we first briefly discuss the conditional approach, which captures the probabilistic dependence between pairs of locations. After that, we introduce the various modeling approaches that handle covariate dependence. Three currently popular parametric covariate-dependent models are presented in detail, namely the directional model, the seasonal model and the spatial model. A discussion of the limitations of these existing modeling techniques ends the chapter.

CHAPTER 3: PRELIMINARIES

This chapter covers the major preliminary topics that are essential to understanding the proposed MRF-GP model. The concept of extreme value modeling and its major categories are illustrated briefly. We then concentrate on the underlying thin membrane Gaussian graphical model, with the key terminology and concepts explained in detail. The various estimation methods are discussed at the end of the chapter.


CHAPTER 4: NON-PARAMETRIC METHOD TO MODEL SPATIAL DEPENDENCE

In this chapter, the proposed non-parametric MRF-GP model is presented. First, the motivation and procedures for model construction are introduced in detail. Following the introduction, the various smoothing parameter selection techniques are demonstrated. To optimize the model parameters, several major estimation and optimization methods are examined, including cross-validation, maximum a posteriori estimation and iterative conditional modes. However, each has flaws that cannot be resolved easily, so the limitations of these methods are discussed. The motivation for other selection methods ends the chapter.


CHAPTER 5: SMOOTHING PARAMETER SELECTION USING EXPECTATION MAXIMIZATION (EM)

In this chapter, another smoothing parameter selection method, Expectation Maximization, is proposed, motivated by the limitations of the abovementioned estimation and optimization approaches. The MRF-GP model, with partial modifications, is restated in matrix form for the convenience of further mathematical operations. Some preliminary concepts needed to describe the EM algorithm for parameter selection are prepared next. The detailed model implementation procedure using the EM algorithm concludes the chapter.

CHAPTER 6: RESULTS AND DISCUSSIONS

In this chapter, the results of the implemented model are presented and scrutinized. First, the approaches used to initialize the parameters and hence to generate the data are introduced. Second, the theoretical expectations that the MRF-GP model should satisfy are stated. Once the model construction criteria are fixed, various sensitivity and uncertainty tests are performed to assess the robustness of the model and, additionally, to demonstrate its superiority over other suggested frameworks. Specifically, threshold sensitivity and uncertainty tests are performed, and the MRF-GP model is shown to be superior to the locally fitted results, with lower sensitivity and uncertainty with respect to the threshold level. Next, the smoothing parameter sensitivity and uncertainty measurements are conducted, and the superiority of our model is confirmed. We repeat the above procedures with reduced sample sizes to test the sizing effect. A summary of these procedures and results is provided at the end of the chapter.

CHAPTER 7: CONCLUSION AND RECOMMENDATIONS

In this chapter, the objective and conclusions of our research are elaborated and my contribution to it is specified. Some limitations of the research are acknowledged and justified, followed by my plans for future work. Finally, my recommendations to future researchers in this field are presented. Chapter 7 concludes the whole thesis.


CHAPTER 1: INTRODUCTION

In this chapter, the author first introduces the background of the topic, followed by the motivation for conducting this research. The objectives of the project are then explained and the main results are summarized briefly. Finally, the organization of the thesis is presented.

1.1 Motivation

The modeling of extreme or catastrophic events has recently grown in popularity and significance. The UK has experienced a series of severe fluvial flooding events, which have greatly affected communities in different parts of the country. These issues have been addressed by the authorities. For the coordination of flood mitigation and risk assessment activities, knowledge of the spatial characteristics of fluvial flooding (extreme river flows) is essential, especially the probability that a flood will occur in one river in the following days given that another river has flooded. Such information allows precautions to be taken when the probability is high enough, greatly reducing catastrophic losses.

Another example concerns the Gulf of Mexico, where extreme sea states are our main focus. Extreme sea states are often associated with hurricanes. Modeling these extreme events is of high importance to the development of offshore facilities. Environmental design criteria for offshore facilities in this area have inherent uncertainties and dependencies. These can be functions of climate variability in different covariates, including time and space, and of storm direction and track. Modeling directional dependence allows an offshore facility to be built to different criteria for different directions, which is both economical and safe. Modeling temporal dependence can tell us the best season for activities such as offshore drilling and tourism. Meanwhile, modeling spatial dependence gives a global view of all the extreme sea states in the Gulf of Mexico. As such, knowledge of the characteristics of the sea states in the Gulf of Mexico is crucial, and reliable extreme value models must incorporate covariate effects properly.


Because of this strong demand, several research works have investigated and attempted to model the dependence structure of multi-site time-series extreme datasets. To the best of our knowledge, the vast majority of them adopted parametric approaches to model the underlying distributions. However, parametric models have several inherent disadvantages. When one can confidently claim that the data of interest are drawn from a specified probability model, parametric statistics provide satisfactory information about the underlying system. On the other hand, their performance deteriorates significantly when the underlying distribution is unknown and no specified model can be guaranteed to fit the data well. In addition, a parametric model requires more assumptions than a non-parametric one. Furthermore, since an exhaustive search for the optimal parametric form is not practically possible, a wrong or sub-optimal model choice often leads to significantly biased results.

In practice, for extreme event inference or catastrophe modeling, non-parametric models are often preferred, since the underlying distribution is almost always unknown and a parametric model tends to provide a suboptimal solution and bias the conclusions.

Therefore, motivated by the rising significance of and demand for extreme value modeling and catastrophic event prediction, the author seeks means to address these issues. Further, given the drawbacks of the currently dominant parametric approaches, the author proposes a non-parametric model, MRF-GP, to address these pain points.

1.2 Scope and Objectives

1.2.1 Research Scope

Due to constraints of time, funding, access to research facilities and other factors, the author acknowledges the limitations of the research scope. Enhancements of the model for higher estimation precision are included in our plans for future work.


1.2.2 Research Objectives

In this thesis, the author will accomplish the following research objectives:

To review the existing research on extreme value modeling and analyze its pros and cons;

To propose a non-parametric model for catastrophic event prediction and inference, followed by essential case studies;

To validate that the simulation results are in line with the theory;

To demonstrate the superiority of the proposed MRF-GP model with supporting analysis;

To discuss its applications and give recommendations for future research.

1.3 Main Results

The numerical results obtained from the simulation studies are in line with our theoretical expectations, with acceptable variations and deviations. Two sets of comparison studies against locally fitted results demonstrate the superiority of the proposed approach. The detailed results and discussions are presented in Chapter 6.

1.4 Thesis Organization

This thesis consists of seven chapters. Chapter 1 introduces the topic of extreme event modeling. The motivation for conducting this project is elaborated in detail, after which the project scope and objectives are presented briefly. The major research results are also summarized in this chapter.

In Chapter 2, I give an overview of previous research on the topic and elaborate on several prevalent approaches and models, selected from a large body of work. Furthermore, some limitations of these existing estimation frameworks are discussed, which argue in favor of the proposed non-parametric MRF-GP model.

In Chapter 3, the essential preliminary theories and topics are covered in brief. We first discuss the families of extreme value models. Following that, the thin membrane Gaussian graphical model is introduced. A selection of estimation methods, including maximum likelihood and maximum a posteriori estimation, is also presented in this chapter.

In Chapter 4, the proposed non-parametric MRF-GP model is presented in its theoretical form. The motivation and procedures for model construction are introduced and the various smoothing parameter selection techniques are demonstrated. However, these parameter selection approaches have evident flaws, and ignoring these drawbacks leads to misleading results. The motivation for other selection methods ends the chapter.

In Chapter 5, another smoothing parameter selection method is proposed, motivated by these limitations. The MRF-GP model, with partial modifications, is restated in matrix form for the convenience of further mathematical operations. Some preliminary concepts needed to describe the EM algorithm for parameter selection are prepared next. The detailed model implementation procedure using the EM algorithm concludes the chapter.

In Chapter 6, the major results are presented with detailed discussion and analysis. The sensitivity and uncertainty of estimation and prediction by the MRF-GP model are examined. The conclusion that MRF-GP is superior ends the chapter.

In Chapter 7, the final conclusions and recommendations are made. Limitations of the research are acknowledged and some suggestions for future research are proposed. A final summary of the topic ends the thesis.


CHAPTER 2: MODELING EXTREME EVENTS

This chapter starts with an overview of previous studies on extreme value modeling. We introduce two types of modeling framework that address the estimation of the dependence structure of multivariate datasets. Specifically, we first briefly discuss the conditional approach, which captures the probabilistic dependence between pairs of locations. After that, we introduce the various modeling approaches that handle covariate dependence. Three currently popular parametric covariate-dependent models are presented in detail, namely the directional model, the seasonal model and the spatial model. A discussion of the limitations of these existing modeling techniques ends the chapter.

2.1 Overview

A large body of statistical research has investigated covariate dependence in multi-site time-series datasets in extreme value analysis¹, for instance Davison and Smith [1990] and Robinson and Tawn [1997]. The literature we review mainly concerns applications to offshore facility design criteria for hurricane-dominated regions such as the Gulf of Mexico. A large family of covariates has been considered, although spatiality, directionality and seasonality are the ones of major concern.

Ledford and Tawn [1997] and Heffernan and Tawn [2004] discuss the modeling of jointly dependent extremes using a conditional approach, in which an extrapolation method for limited samples is introduced to enhance statistical accuracy. Scotto and Guedes-Soares [2000] describe modeling using a non-linear threshold. Spatial models for extremes are designed for the estimation of predictive distributions by Coles and Casson [1998], Casson and Coles [1999] and Coles and Tawn [1996, 2005]. For spatial applications, a spatio-directional model for extreme waves is designed by Philip Jonathan and Kevin Ewans [2009] for the Gulf of Mexico. Paul Northrop and Philip Jonathan [2010] further discuss spatially dependent non-stationary extremes with applications in the same region. Regarding directionality, a large body of work is also available, such as the offshore facility design criteria proposed by Jonathan and Ewans [2007], Ewans and Jonathan [2007] and Jonathan, Ewans and Forristall [2008], in which a detailed model comparison is presented. Addressing seasonal dependence, Anderson et al. [2001] performed a seasonal analysis and asserted that the advantages of adopting a model that incorporates covariate dependence are apparent, unless it can be shown statistically that a model ignoring covariate effects is no less appropriate, so that the effort saved by omitting the covariate analysis outweighs the gain in statistical accuracy. Motivated by Anderson et al., Chavez-Demoulin and Davison [2005] and Coles [2001] provide insights into the design of non-homogeneous Poisson models in which extremal properties are modeled as functions of covariates. Chavez-Demoulin and Davison also demonstrated the application of a block bootstrapping approach to uncertainty analysis.

¹ Philip Jonathan, Kevin Ewans (2008). Modelling the Seasonality of Extreme Waves in the Gulf of Mexico. Shell Technology Centre Thornton and Shell International Exploration and Production. Proceedings of OMAE 2008, the 27th International Conference on Offshore Mechanics and Arctic Engineering.

We carefully studied the related work on covariate dependence before designing our own model, the Gauss-MRF-GP model. In this chapter, three major branches of covariate analysis are studied, namely spatial dependence, directional dependence and seasonal dependence. Leading models for each are illustrated, and a summary of the previous work and its implications ends the discussion. The MRF-GP model that we propose is presented in Chapter 4.

2.2 Parameter Dependence Modeling

2.2.1 Directional Model

In this model, an extreme value model is built that accounts for the directionality of the data. Under the assumption that the samples from neighboring sites are independent and identically distributed, threshold exceedances of the marginal distributions of the random variables are fitted to the Generalized Pareto (GP) distribution family in Equation (1), where $x > u$, $u$ is a specified threshold, $\gamma$ is the shape parameter and $\sigma > 0$ is the scale parameter. Maximum likelihood estimation is used to estimate $\gamma$ and $\sigma$, given a sample of data:

$$\Pr(X > x \mid X > u) = \left[ 1 + \frac{\gamma}{\sigma}(x - u) \right]^{-1/\gamma} \qquad (1)$$
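The maximum likelihood fit described above can be illustrated with a minimal sketch in Python. It is not the implementation used in this thesis: the function names `gp_nll` and `fit_gp`, the crude compass (pattern) search used as the optimizer, and the synthetic data with its parameter values are all assumptions made for illustration. The sketch fits the shape and scale of a GP distribution to the excesses of a sample over a fixed threshold `u`.

```python
import math
import random


def gp_nll(gamma, sigma, excesses):
    """Negative log-likelihood of a GP(gamma, sigma) model for excesses y = x - u > 0."""
    if sigma <= 0:
        return math.inf
    total = 0.0
    for y in excesses:
        if abs(gamma) < 1e-9:
            # Exponential limit of the GP density as gamma -> 0.
            total += math.log(sigma) + y / sigma
        else:
            z = 1.0 + gamma * y / sigma
            if z <= 0:
                return math.inf  # y lies outside the support for these parameters
            total += math.log(sigma) + (1.0 + 1.0 / gamma) * math.log(z)
    return total


def fit_gp(excesses, gamma0=0.1, iters=100):
    """Crude compass search for the ML estimates of gamma and sigma (illustrative only)."""
    gamma, sigma = gamma0, sum(excesses) / len(excesses)
    step_g, step_s = 0.1, 0.5 * sigma
    best = gp_nll(gamma, sigma, excesses)
    for _ in range(iters):
        improved = False
        for dg, ds in ((step_g, 0.0), (-step_g, 0.0), (0.0, step_s), (0.0, -step_s)):
            cand = gp_nll(gamma + dg, sigma + ds, excesses)
            if cand < best:
                gamma, sigma, best = gamma + dg, sigma + ds, cand
                improved = True
        if not improved:
            # No axis move helps at this scale, so refine the step sizes.
            step_g, step_s = 0.5 * step_g, 0.5 * step_s
    return gamma, sigma


# Synthetic data (hypothetical values): threshold u plus GP excesses drawn by inverse CDF.
random.seed(0)
u, true_gamma, true_sigma = 5.0, 0.2, 2.0
data = [u + true_sigma / true_gamma * ((1.0 - random.random()) ** (-true_gamma) - 1.0)
        for _ in range(2000)]

excesses = [x - u for x in data if x > u]
gamma_hat, sigma_hat = fit_gp(excesses)
print(f"gamma_hat = {gamma_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```

In practice a general-purpose optimizer would normally replace the compass search; it is used here only to keep the sketch free of external dependencies.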


In the directional extreme value model, $\gamma$ and $\sigma$ are expressed in Fourier form and vary smoothly with direction $\theta \in [0, 360)$. The parameters $\gamma$ and $\sigma$ are estimated using roughness-penalized maximum likelihood estimation, where the optimal roughness penalty is chosen using cross-validation. For simplicity, directional sectors, which partition $[0, 360)$, are used instead of all directions. The cumulative distribution of the maximum $H_s^{sp}$ in any directional sector is modeled and discussed. Simulation studies are also provided.

Jonathan, Ewans and Forristall [9] compared in detail the directional extreme value model with the constant model, i.e. the extreme value model that assumes that extremal characteristics are constant with direction. The superiority of the directional extreme value model is demonstrated by several methods, which are summarized in Figure 2.2.1.

Different methods to prove the superiority of the directional model (using artificial data)

Method: Fit GP to the dataset
    Directional model: The variation of γ and σ with threshold is as expected, and a relatively low threshold can guarantee a GP distribution.
    Constant model: The variation of γ and σ with threshold is not as expected; only a high enough threshold can guarantee a GP distribution.

Method: Draw empirical cdf
    Both models: The quality of the fit of the two models cannot be easily distinguished.

Method: Likelihood ratio test
    Both models: The probability of rejecting the constant model in favour of the directional model is high for all but the highest threshold.

Method: Estimate the quartiles of the distribution of the 100-year maximum
    Directional model: In excellent agreement with theory (the underlying distribution of the artificial data) over a range of thresholds.
    Constant model: Estimates vary with threshold.

Figure 2.2.1 Different methods to prove the superiority of the directional model

The directional extreme value model also carries over to other covariate effects, such as the seasonality of the data, by substituting the covariate of interest for the direction parameter and making minor adjustments. The comparison of the directional model and the constant model shows that, when the dataset depends strongly on direction, better results are obtained by taking the directionality of the data into account than by ignoring the direction parameter and setting the extremal characteristics constant in all directions.

2.2.2 Seasonal Model²

Statistics of extremes over a threshold depend on season as well. In this model, a non-homogeneous Poisson process is adopted to capture the seasonal dependence, followed by a simulation study of storm peak events in the Gulf of Mexico. The extreme tail behavior over the threshold is characterized using the Generalized Pareto distribution. The GP parameters and the rate of occurrence of extremes over the threshold are designed to vary seasonally, with the seasonally varying threshold estimated independently. The model parameters are smoothed by roughness-penalized maximum likelihood, and cross-validation is used to learn the optimal degree of roughness.

Capturing covariate effects of extreme storm peaks is crucial for the design of offshore facilities. It has been shown statistically that design criteria and the precision of estimation are better for a model incorporating covariate effects than for one whose estimates ignore them.³

The data used in this simulation study are significant wave heights from the GOMOS Gulf of Mexico hindcast study (Oceanweather, 2005), spanning from 1900 to September 2005 inclusive. Data analysis shows that the effects of storm peak direction and season on extreme sea states are interrelated (shown in Figure 2.2.2).

2 Philip Jonathan, Kevin Ewans (2008). Modelling the Seasonality of Extreme Waves in the Gulf of Mexico.
3 Philip Jonathan, Kevin Ewans (2007). The effect of directionality on extreme wave design criteria. Shell Research Limited & Shell International Exploration and Production.


Figure 2.2.2: Empirical density of storm peak events at Gulf of Mexico. Darker shading

represents higher density (Jonathan & Ewans, 2008)

To model the seasonality, the authors adopt a variable threshold µ, rather than a fixed threshold, to reflect the seasonal dependence observed in the extreme datasets.

2.2.3 Spatial Model

In environmental applications, the extremes of a variable of interest, such as those in the currently popular multi-site datasets, are frequently found to be non-stationary, varying systematically in space. In these cases, non-negligible inter-site dependence is commonly observed, and it is desirable to estimate it accurately. However, threshold selection may be problematic for modern extreme value models in this regard, particularly when the extremes are non-stationary. Paul Northrop and Philip Jonathan proposed a new method to infer this dependence structure implicitly, while adopting a new approach to selecting a covariate-dependent threshold using a quantile regression model.

In the paper, the authors argue that if non-stationarity in the inter-site extremes is evident, a non-constant threshold should be set so that the non-stationarity is properly reflected. On this basis, the authors propose a quantile regression method to determine the threshold for each location.
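The idea of a location-dependent threshold with a common exceedance probability can be illustrated with a much simpler stand-in than quantile regression: taking the empirical quantile at each site separately. The sketch below is only that stand-in; the site names, the simulated data and the helper functions are hypothetical.

```python
import random

def empirical_quantile(values, q):
    """Empirical quantile with linear interpolation, 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def site_thresholds(data_by_site, exceed_prob):
    """One threshold per site, all with the same exceedance probability."""
    return {site: empirical_quantile(values, 1.0 - exceed_prob)
            for site, values in data_by_site.items()}

# Two hypothetical sites whose observations have different scales.
rng = random.Random(1)
data = {
    "calm_site": [rng.expovariate(1.0) for _ in range(1000)],
    "rough_site": [2.0 + rng.expovariate(0.5) for _ in range(1000)],
}
thresholds = site_thresholds(data, exceed_prob=0.1)
counts = {s: sum(v > thresholds[s] for v in data[s]) for s in data}
print(thresholds, counts)
```

The thresholds differ in physical units, yet each site keeps the same fraction of exceedances, which is the property the constant-exceedance-probability criterion asks for.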


To illustrate this model, the authors consider the stochastic behavior of extreme sea states, represented by significant wave heights, using hindcast data from 72 locations in the Gulf of Mexico. The authors assert that modeling the wave data from all sites simultaneously enhances the precision of estimation, and the strong positive correlation observed between neighboring sites (Figure 2.2.3) supports this hypothesis.

Figure 2.2.3 Observations of the strong positive correlation between neighboring sites

In this paper, the authors fit spatially dependent regression models using the methodology suggested by Chandler and Bate (2007) to handle the covariate dependence in the data, assuming that the potential covariates of seasonality and directionality are irrelevant. The marginal distributions of the local maxima are modeled using the Generalized Extreme Value (GEV) distribution (Jenkinson, 1955), under the assumption that the limiting distribution is non-degenerate and that the maxima are independently and identically distributed. Then, for a sufficiently large threshold, given that there is an exceedance, the excess over the threshold has approximately a Generalized Pareto distribution. Further, the authors suggest that the parameterization of this model is invariant to the chosen threshold, which is advantageous if a non-constant, location-based threshold is utilized. A selection criterion based on the same exceedance probability at every location is then naturally sensible, and is governed by:

Then, assuming the data from different clusters are independent, one can derive the log-

likelihood of the distribution function:

However, the above model building process is valid only if inter-site dependence is absent and the marginal distributions are indeed stationary, both of which are questionable. To address this problem, the authors propose to enhance the point process model by handling the dependence structure implicitly.

2.3 Model Discussions and Limitations

These models show that a covariate dependence structure does exist in many datasets, and that both theoretical analysts and practitioners need to address it to improve the precision of their estimates.

First, comparison of the seasonal model with models ignoring seasonality reveals a clear advantage for the model incorporating covariate dependence, unless it can be shown statistically that a model ignoring the covariate effect is no less appropriate, in which case the effort saved by skipping the covariate analysis can outweigh the gain in statistical accuracy.


Second, comparison of the directional model and the constant model shows that, when the dataset depends strongly on direction, better results are obtained by taking the directionality of the data into account than by ignoring the direction parameter and setting the extremal characteristics constant in all directions.

Third, comparison of the spatial model proposed by Paul Northrop and Philip Jonathan, in which the thresholds at different sites vary with marginal covariate characteristics to achieve a constant extreme quantile (exceedance probability), with the constant model also confirms the existence of covariate effects. It is argued that, in the presence of non-stationary patterns, it is more logical to fix a constant exceedance probability than a constant physical value; such patterns are indeed present in the majority of multi-site time-series data in practice.

Therefore, it is intuitive that one can improve the precision of estimation of the spatial characteristics by taking the covariate effects into consideration. However, as elaborated in the first chapter, these approaches all stem from the parametric side of modeling. The generic disadvantages of parametric modeling are therefore inherent in these approaches, and the resulting models may be biased or suboptimal.

In sum, motivated by the significance of covariate dependence in the modeling process and by the drawbacks of the existing approaches, we propose a non-parametric MRF-GP model characterized by Markov properties to address these shortcomings.


CHAPTER 3: PRELIMINARIES

This chapter covers the major preliminary topics that are essential to understanding our generalized Pareto distribution with Markov random field prior (MRF-GP) model. The concept of extreme value modeling and its major categories are illustrated briefly. We then concentrate on the underlying thin membrane Gaussian graphical model, with the significant terminology and concepts explained in detail. The various estimation methods are discussed at the end of this chapter.

3.1 Extreme Value Modeling

3.1.1 Extreme Value Theory

Extreme value theory is a branch of statistics that deals with extreme deviations from the median of a probability distribution. There are two main approaches in this theory.

The first theorem of extreme value theory is the Fisher–Tippett–Gnedenko theorem (Fisher and Tippett, 1928; Gnedenko, 1943). It governs the asymptotic distributions of the extreme order statistics.⁴ In 1958, Emil Julius Gumbel asserted that for any well-behaved initial continuous distribution, only a few models are needed to give the asymptotic estimation of the underlying distributions. Specifically, the maxima of i.i.d. samples, after proper renormalization, converge in distribution to one of three special distribution families, namely the Gumbel, Weibull and Fréchet distributions.

In contrast, the second theorem of extreme value theory, also called the Pickands–Balkema–de Haan theorem, gives the asymptotic tail distribution of a random variable $x$ above a high threshold when its true distribution is unknown.⁵

The major difference between the two theorems stems from how the data are generated. In the case of the first theorem, the values are already maxima, so the data to be fitted are used over their full range. The second theorem, on the other hand, applies only to the data that surpass a specified threshold. This approach is heavily adopted by the insurance and reinsurance industry, where only payouts over a threshold are of concern to the company.

4 http://en.wikipedia.org/wiki/Fisher%E2%80%93Tippett%E2%80%93Gnedenko_theorem
5 Balkema, A., and Laurens de Haan (1974). "Residual life time at great age", Annals of Probability, 2, 792–804.

3.1.2 Generalized Extreme Value Model

In statistics, the Generalized Extreme Value (GEV) model, developed directly from extreme value theory, is a family of probability distributions that combines the Gumbel, Fréchet and Weibull distributions in one generic form. It is the limiting distribution of properly normalized maxima of a sequence of i.i.d. random variables, and can therefore be adopted to estimate unknown extreme distributions. It has the form:

$$G(x) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\} \qquad (2)$$

where $\mu$ is the position parameter, $\sigma > 0$ is the scale parameter and $\xi$ is the shape parameter, and the expression holds for $1 + \xi (x - \mu)/\sigma > 0$. Notably, the shape parameter governs the tail behavior of the distribution, and the extreme value family can therefore be divided into three sub-families: the Gumbel, Fréchet and Weibull distributions, corresponding to a shape parameter that is zero, positive or negative, respectively.
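The role of the shape parameter can be made concrete with a small function that evaluates the GEV cdf in all three regimes. This is a sketch under assumed names: `mu`, `sigma` and `xi` stand for the position, scale and shape parameters, and the tolerance used to detect a zero shape is an arbitrary choice.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV cdf with position mu, scale sigma > 0 and shape xi."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel case: limit of the general form as xi -> 0.
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower end point when xi > 0
        # (Frechet), above the upper end point when xi < 0 (Weibull).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

for shape in (0.0, 0.5, -0.5):
    print(shape, [round(gev_cdf(x, 0.0, 1.0, shape), 4) for x in (-3.0, 0.0, 3.0)])
```

A positive shape gives a distribution bounded below with a heavy upper tail, a negative shape gives a distribution bounded above, and the zero-shape case is unbounded on both sides.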

3.1.2.1 Type1: Gumbel Distribution

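When the shape parameter is equal to zero, the generalized extreme value distribution reduces to the Gumbel sub-family. With position parameter $\mu$ and scale parameter $\sigma > 0$, it is obtained as the limit of equation (2) as the shape parameter $\xi$ tends to zero:

$$G(x) = \exp\left\{-\exp\left[-\frac{x - \mu}{\sigma}\right]\right\}, \quad x \in \mathbb{R} \qquad (3)$$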

3.1.2.2 Type2: Fréchet Distribution


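When the shape parameter $\xi$ is positive, the generalized extreme value distribution follows a Fréchet sub-family. In the parameterization of equation (2), with position parameter $\mu$ and scale parameter $\sigma > 0$, it has the form:

$$G(x) = \begin{cases} 0, & x \le \mu - \sigma/\xi \\ \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\}, & x > \mu - \sigma/\xi \end{cases} \qquad (4)$$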

3.1.2.3 Type3: Weibull Distribution

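When the shape parameter $\xi$ is negative, the generalized extreme value distribution follows a Weibull sub-family. In the parameterization of equation (2), with position parameter $\mu$ and scale parameter $\sigma > 0$, it has the form:

$$G(x) = \begin{cases} \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\}, & x < \mu - \sigma/\xi \\ 1, & x \ge \mu - \sigma/\xi \end{cases} \qquad (5)$$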

3.1.3 Peak Over Threshold (POT) Model: Generalized Pareto Distribution

Given an unknown distribution $F$ of a random variable $X$, one is interested in estimating the conditional excess distribution over a threshold $u$:

$$F_u(y) = P(X - u \le y \mid X > u) = \frac{F(u + y) - F(u)}{1 - F(u)} \qquad (6)$$

where $y$ is the non-negative threshold exceedance. Pickands, Balkema and de Haan⁶ showed that, given a set of independent and identically distributed random variables $(X_1, \dots, X_n)$, the conditional excess distribution $F_u(y)$ is well approximated by the Generalized Pareto distribution when the selected threshold is sufficiently high.
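For some parent distributions the GP form of the excess distribution is exact rather than asymptotic, which gives a convenient check of equation (6). The sketch below uses two textbook cases chosen here for illustration (they are not taken from the thesis data): an exponential parent, whose excesses are GP with zero shape, and a Pareto parent with tail index `a`, whose excesses over `u` are GP with shape `1/a` and scale `u/a`.

```python
import math

def excess_cdf(F, u, y):
    """Equation (6): conditional excess distribution over the threshold u."""
    return (F(u + y) - F(u)) / (1.0 - F(u))

def gp_cdf(y, gamma, sigma):
    """GP cdf of an excess y >= 0 (shape gamma, scale sigma)."""
    if gamma == 0.0:
        return 1.0 - math.exp(-y / sigma)
    return 1.0 - (1.0 + gamma * y / sigma) ** (-1.0 / gamma)

def exponential_cdf(x):
    return 1.0 - math.exp(-x)

def pareto_cdf(x, a=4.0):
    return 1.0 - x ** (-a) if x >= 1.0 else 0.0

# Exponential parent: excesses over any threshold are again exponential.
exp_gap = abs(excess_cdf(exponential_cdf, 3.0, 1.5) - gp_cdf(1.5, 0.0, 1.0))

# Pareto parent (a = 4), threshold u = 5: GP with shape 1/4 and scale 5/4.
pareto_gap = abs(excess_cdf(pareto_cdf, 5.0, 2.0) - gp_cdf(2.0, 0.25, 1.25))

print(exp_gap, pareto_gap)
```

Both gaps are zero up to rounding error. For other parents the agreement is only approximate and improves as the threshold increases.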

The extreme value model we adopt in this paper is the Generalized Pareto distribution (GPD). In modeling extreme events, the marginal distributions of the extremes of individual variables are inferred from empirical data. Subsequently, threshold exceedances of the marginal distributions are fitted to the GPD, since in theory the GPD describes the extreme tail behavior of the variables best.⁷ The cumulative distribution function of the GP distribution is given in equation (1):

$$F(x \mid u, \gamma, \sigma) = 1 - \left[1 + \frac{\gamma}{\sigma}(x - u)\right]^{-1/\gamma}$$

where $x > u$, $u$ is a specified threshold, $\gamma$ is the shape parameter and $\sigma > 0$ is the scale parameter.

6 Balkema, A., and Laurens de Haan (1974). "Residual life time at great age", Annals of Probability, 2, 792–804.

3.2 Thin Membrane Gaussian Graphical Model

3.2.1 Markov Random Field

A Markov random field (MRF) is a multi-dimensional stochastic process defined on a discrete lattice; it is the spatial analogue of a Markov chain. Similar to a Bayesian network, it is a model with the Markov property used to represent parameter dependencies. In an MRF, the sites of interest interact with one another via a neighborhood system $N = \{N(i)\}$. Mathematically:

$$P(x_i \mid x_{S \setminus \{i\}}) = P(x_i \mid x_{N(i)})$$

where $N(i)$ is the set of neighboring sites of site $i$, and $S \setminus \{i\}$ contains all the sites excluding site $i$ itself. In other words, the conditional distribution of a particular site depends only on its neighboring sites; given those neighbors, it is independent of all other sites in the MRF network.

In addition, a Markov random field is a Gaussian MRF if the variables jointly follow a multivariate normal distribution. In general, however, the dependence structure can follow any family of distributions. In this paper, we adopt the Gaussian MRF.

7 Philip Jonathan, Kevin Ewans and George Forristall. "Statistical estimation of extreme ocean environments: the requirement for modeling directionality and other covariate effects," Ocean Engineering 35 (2008) 1211-1225.

In sum, the MRF is a natural framework for modeling covariate dependence, as the events occurring at the individual sites in the region under consideration are intuitively interdependent. It is therefore advantageous to employ the MRF to model the parameter dependence structure of the underlying marginal GP distributions. In this way, the result is smoothed and a better estimate of reality is achieved.

3.2.2 Gaussian Graphical Model (GGM)

3.2.2.1 GGM Basics

Generally, a graphical model represents a Markov random field as a graph. In graph theory terms, a probability distribution can be captured by a graph $G$ consisting of nodes $V$ and directed or undirected edges $E$. Conventionally, every node $i$ is associated with a random variable $x_i$ ($i = 1, \dots, N$, where $N$ is the number of nodes), and the statistical dependences among the nodes are represented by the corresponding edges. Probabilistic graphical models thus use a graph-based representation as the foundation for encoding a complete distribution over a multi-dimensional space.⁸

Commonly, there are two types of graphical models: directed and undirected. The former is often referred to as a Bayesian network, or belief network. Classical examples of this type of directed acyclic graph include hidden Markov models and neural networks. In this category, each node is assumed to be conditionally independent of the rest given the values of its parents (those vertices pointing directly to the node of interest via a directed edge). Precisely, the joint distribution can be written as:

$$P(x_1, \dots, x_N) = \prod_{v \in V} P(x_v \mid x_{pa(v)})$$

where $pa(v)$ is the set of parents of node $v$.

8 http://en.wikipedia.org/wiki/Graphical_model

In this paper, however, we mainly focus on undirected graphical models, specifically the abovementioned Markov random field (MRF), in which the directionality of the edges is disregarded. If the joint distribution of all the nodes of interest is Gaussian, then the MRF is a Gauss-Markov random field, and the pdf of the Gaussian process $u$ is defined as:

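$$p(u \mid \mu, \Sigma) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(u - \mu)^T \Sigma^{-1} (u - \mu)\right\} \qquad (7)$$

where $\Sigma$, of dimension $N \times N$, is the covariance matrix of the dataset with $\Sigma_{ij} = \mathrm{cov}(u_i, u_j)$, and is assumed to be positive definite.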

3.2.2.2 The Conditional Independence

One important property of the Gaussian graphical model is its conditional independence. To elaborate on this point, we draw a sample graphical model (Figure 3.2.2.2) here:

Figure 3.2.2.2 The conditional independence of the Gaussian Graphical Model

As shown in the figure above, we group the nodes of the simple graphical model into three blocks (A, B, C) and study its dependence structure. In a graphical model, only nodes connected by an edge are directly dependent. Blocks A and C are not connected directly, so they are conditionally independent given the information of block B (conditional on B). Mathematically:

$$P(A, C \mid B) = P(A \mid B)\, P(C \mid B)$$

When the dependence structure is Gaussian, conditional dependence and independence can be read off the precision matrix: a zero entry means that the two corresponding variables are conditionally independent given all the others.
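A three-node chain A, B, C makes the role of the precision matrix concrete. In the sketch below the precision matrix is a made-up example, and the small Gauss-Jordan routine is included only to keep the example self-contained. The entry linking A and C is zero in the precision matrix, yet the covariance between A and C is not zero: the two are dependent marginally and independent once B is known.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]

# Hypothetical precision matrix of the chain A - B - C (no A-C edge).
precision = [[2, -1, 0],
             [-1, 2, -1],
             [0, -1, 2]]
cov = invert(precision)

# Marginal covariance between A and C is non-zero ...
cov_ac = cov[0][2]
# ... but the covariance of A and C conditional on B vanishes.
cond_cov_ac = cov[0][2] - cov[0][1] * cov[1][2] / cov[1][1]
print(cov_ac, cond_cov_ac)
```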

3.2.3 Thin Membrane Model

In particular, the thin membrane model, which is a Gaussian MRF, is selected in this paper as the prior distribution. Supposing that the Gauss-MRF field under consideration is smooth overall, with a certain allowance for discontinuities, the thin membrane model penalizes the differences between neighboring nodes according to equation (8):

$$P(u \mid \alpha) \propto \exp\left\{-\alpha \sum_{i} \sum_{j \in N(i)} (u_i - u_j)^2\right\} \qquad (8)$$

where $u$ is the Gaussian random process, parameterized by mean $\mu$ and covariance matrix $\Sigma$; $N(i)$ is the set of neighbors of site $i$; and $\alpha$ is a parameter, which may itself follow a distribution, specifying the strength of the inter-site penalty.

3.2.3.1 The Grid Structure

The underlying system we are interested in can be simplified into a 6 by 13 grid structure (Figure 3.2.3.1), with sites indexed from 1 to 78. Each index represents an individual site at which the marginal extreme events are observed and are to be estimated using the MRF-GP model. This is an implementation of the graphical model, characterized by the Markov property. For instance, site 1 is directly connected with sites 2 and 14. Thus, according to the definition of the thin membrane model, the conditional distribution of site 1 depends only on sites 2 and 14, regardless of what occurs at the other sites.

 1  2  3  4  5  6  7  8  9 10 11 12 13
14 15 16 17 18 19 20 21 22 23 24 25 26
27 28 29 30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49 50 51 52
53 54 55 56 57 58 59 60 61 62 63 64 65
66 67 68 69 70 71 72 73 74 75 76 77 78

Figure 3.2.3.1: Simplified version of the 6 by 13 grid structure
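The neighborhood system of this grid is easy to enumerate. The helper below is a sketch that assumes row-by-row indexing with 13 sites per row and 4-connectivity (left, right, up, down), which is consistent with site 1 being connected to sites 2 and 14; the function name is hypothetical.

```python
ROWS, COLS = 6, 13

def neighbours(site, rows=ROWS, cols=COLS):
    """Sites directly connected to `site` (1-based) on a 4-connected grid."""
    r, c = divmod(site - 1, cols)
    out = []
    if r > 0:
        out.append(site - cols)    # site in the row above
    if c > 0:
        out.append(site - 1)       # site to the left
    if c < cols - 1:
        out.append(site + 1)       # site to the right
    if r < rows - 1:
        out.append(site + cols)    # site in the row below
    return out

print(neighbours(1))    # corner site
print(neighbours(15))   # interior site
print(neighbours(78))   # opposite corner
```

Corner sites have two neighbors, edge sites three and interior sites four.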

3.2.3.2 The Precision Matrix

The dependence structure of the thin membrane model can be expressed in matrix form through its precision matrix. We define $K$ as the precision matrix, which is the inverse of the covariance matrix:

$$K = \alpha K_p \qquad (9)$$

where $\alpha$ is the smoothing parameter controlling the penalizing strength (up to a constant factor that can be absorbed into $\alpha$), and $K_p$ is the adjacency matrix, in the form of equation (10):

$$(K_p)_{nm} = \begin{cases} d_n, & n = m \\ -1, & n \ne m \text{ and sites } n \text{ and } m \text{ are connected by an edge} \\ 0, & \text{otherwise} \end{cases} \qquad (10)$$

The diagonal coefficients $d_n$ belong to $\{2, 3, 4\}$, where $d_n$ is the number of sites adjacent to site $n$, $n \in [1, p]$ ($p$ is 78 in our case). The off-diagonal elements are zero if there is no edge connecting the two sites, or minus one if the connecting edge exists. This matrix is sparse. Note that every row of $K_p$ sums to zero, so $K_p$ itself is singular and the thin membrane prior is improper on its own; it has to be combined with the likelihood of the observed data to give a proper distribution. $K$ implements the thin membrane model and characterizes the spatial property of the multi-site data. Additionally, the smoothing parameter $\alpha$ controls the dependence strength. This model represents the initial state of the system of interest before further observations are made.
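The structure of this matrix can be checked numerically. The sketch below builds the matrix for the 6 by 13 grid under the same assumptions as before (row-by-row indexing, 4-connectivity); the function name and the test vector are hypothetical. It verifies the description of the diagonal and off-diagonal entries and shows that the quadratic form of the matrix equals the sum of squared differences over the edges of the grid.

```python
def thin_membrane_matrix(rows, cols):
    """Matrix with the neighbor count on the diagonal and -1 for each edge."""
    p = rows * cols
    K = [[0] * p for _ in range(p)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((0, 1), (1, 0)):      # right and down neighbors
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    K[i][j] = K[j][i] = -1
                    K[i][i] += 1
                    K[j][j] += 1
    return K

K_p = thin_membrane_matrix(6, 13)
p = len(K_p)
diagonal_values = {K_p[i][i] for i in range(p)}
row_sums = [sum(row) for row in K_p]

# Quadratic form u^T K_p u for the test vector u = (0, 1, ..., 77).
u = list(range(p))
quad = sum(u[i] * K_p[i][j] * u[j] for i in range(p) for j in range(p))

# Direct sum of squared differences: 72 horizontal edges with difference 1
# and 65 vertical edges with difference 13.
edge_sum = 72 * 1 ** 2 + 65 * 13 ** 2
print(diagonal_values, set(row_sums), quad, edge_sum)
```

All row sums are zero, so the constant vector lies in the null space of the matrix: the penalty is indifferent to the overall level of the field and reacts only to differences between neighbors.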

3.3 Estimation Method

3.3.1 Maximum Likelihood (ML)

Given a statistical model with unknown parameters, maximum likelihood estimation (MLE) can be used to calculate them. For example, suppose a random variable of interest follows a normal distribution with unknown mean and variance. MLE estimates these distributional parameters by treating the mean and variance as free variables and finding the parameter values that make the observed outcome most probable.⁹ More generally, MLE gives a unified approach to parameter estimation: for a set of observed data and an underlying model, MLE selects the parameter values for which the model gives the observed data the greatest probability or, in other words, maximizes the likelihood function.

To apply this method, we first write the joint probability density function of the observed data $x_1, \dots, x_n$, parameterized by the parameter vector $\theta$:

$$f(x_1, x_2, \dots, x_n \mid \theta)$$

9 http://en.wikipedia.org/wiki/Maximum_likelihood

Now we look at this probability function from a different angle. We regard the observed $n$-dimensional data $x$ as fixed and treat the parameter vector $\theta$ as a variable that can vary freely. The resulting function is conventionally called the likelihood function and, for independent observations, takes the form of equation (11):

$$L(\theta \mid x_1, \dots, x_n) = f(x_1, \dots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) \qquad (11)$$

In practice, the log-likelihood is more convenient for further operations, so the likelihood is often transformed to logarithmic form:

$$\ln L(\theta \mid x_1, \dots, x_n) = \sum_{i=1}^{n} \ln f(x_i \mid \theta)$$

Maximizing this function with respect to $\theta$ yields the parameter values under which the observed data are most probable.

Maximum likelihood estimation has several advantages in terms of optimization. First, it

has good convergence properties as the sample size increases, especially for many

conventional distribution families. Second, it is also a simple algorithm that is easy to

implement.
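The normal-distribution example mentioned above has a closed-form solution, which makes it a convenient check of the principle. In the sketch below the data values are made up; the estimates are the sample mean and the variance with divisor n, and the log-likelihood is evaluated at nearby parameter values to show that they do worse.

```python
import math

def normal_log_lik(xs, mu, var):
    """Log-likelihood of i.i.d. normal data with mean mu and variance var."""
    n = len(xs)
    return (-0.5 * n * math.log(2.0 * math.pi * var)
            - sum((x - mu) ** 2 for x in xs) / (2.0 * var))

def normal_mle(xs):
    """Closed-form maximum likelihood estimates of the mean and variance."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

xs = [2.1, 1.9, 3.2, 2.8, 2.5]           # hypothetical observations
mu_hat, var_hat = normal_mle(xs)
best = normal_log_lik(xs, mu_hat, var_hat)
nearby = [normal_log_lik(xs, mu_hat + dm, var_hat * s)
          for dm in (-0.1, 0.0, 0.1) for s in (0.8, 1.0, 1.25)]
print(mu_hat, var_hat, best, max(nearby))
```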

3.3.2 Maximum A Posteriori (MAP)

The maximum a posteriori (MAP) estimate is a mode of the posterior distribution, and is closely related to Fisher's maximum likelihood method.¹⁰ It can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data.

MAP differs from MLE mainly in that it additionally incorporates a prior distribution over the quantity one wants to estimate. Prior information describes what is known about the system of interest before the observations are made, rather than what is evident from the observations themselves. Owing to the introduction of this 'a priori knowledge', MAP can be regarded as a regularization of maximum likelihood estimation, and it can mitigate over-fitting of the underlying model.

10 http://en.wikipedia.org/wiki/Maximum_a_posteriori

If $x$ is the observed sample from the unknown population and $\theta$ is the model parameter underlying the system, the observed outcomes are distributed according to:

$$x \sim P(x \mid \theta)$$

From a different perspective, if we treat the parameter $\theta$ as a random variable that is allowed to vary freely, and $x$ as a fixed set of quantities, then the posterior distribution of $\theta$ after applying Bayes' theorem is:

$$P(\theta \mid x) = \frac{P(x \mid \theta)\, P(\theta)}{P(x)}$$

The MAP estimate of $\theta$ is the value of $\theta$ at the mode of the posterior distribution, which can be written as:

$$\hat{\theta}_{MAP}(x) = \arg\max_{\theta} P(\theta \mid x) = \arg\max_{\theta} \frac{P(x \mid \theta)\, P(\theta)}{P(x)} = \arg\max_{\theta} P(x \mid \theta)\, P(\theta) \qquad (12)$$

where $P(\theta)$ is the prior distribution of $\theta$. Notably, the MAP estimate coincides with the MLE result when the prior is uniform, which is the case when there is no prior information.
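A conjugate Gaussian example shows both the regularizing effect of the prior and the limit in which MAP reduces to MLE. The setting is an assumption chosen for illustration: normal observations with known noise variance and a normal prior on the mean, for which the posterior mode has a closed form. The data and prior values are made up.

```python
def map_normal_mean(xs, noise_var, prior_mean, prior_var):
    """MAP estimate of a normal mean under a normal prior (known noise variance)."""
    n = len(xs)
    precision = n / noise_var + 1.0 / prior_var
    return (sum(xs) / noise_var + prior_mean / prior_var) / precision

xs = [2.1, 1.9, 3.2, 2.8, 2.5]           # hypothetical observations
mle = sum(xs) / len(xs)                   # maximum likelihood estimate

# Informative prior centred at 0: the estimate is pulled towards 0.
shrunk = map_normal_mean(xs, noise_var=1.0, prior_mean=0.0, prior_var=1.0)

# Nearly flat prior: the MAP estimate approaches the MLE.
flat = map_normal_mean(xs, noise_var=1.0, prior_mean=0.0, prior_var=1e12)
print(mle, shrunk, flat)
```

The estimate under the informative prior lies between the prior mean and the sample mean, which is the regularization described above.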



DEPENDENCE

In this chapter, the proposed non-parametric MRF-GP model is presented. First, the motivation and procedures for model construction are introduced in detail. The various smoothing parameter selection techniques are then demonstrated. To optimize the model parameters, some major estimation and optimization methods are examined, including cross-validation, maximum a posteriori estimation and iterative conditional modes. However, each has flaws that cannot be solved easily, so the limitations of these methods are discussed. The chapter ends with the motivation for other selection methods.

4.1 Model Construction

In this section, we propose a Gauss-MRF based GP distribution model to handle specifically the spatial dependence structure revealed in the multi-site dataset. A brief introduction of this model is presented, followed by the design procedures. The learning and inference algorithm of this model is summarized at the end of the section, and a summary of the model completes it.

4.1.1 Introduction to the MRF-GP Model

Motivated by the large body of previous work on incorporating covariate effects in statistical prediction models, in this thesis we propose a new approach to handling the spatial dependence in the currently popular multi-site time-series data, in which the observed samples are spatially non-stationary in nature.

First, we assume that the data at each location are independently and identically distributed, following the generalized Pareto distribution with unknown GP parameters; we then handle the covariate effect implicitly to reflect the dependence. To start with, local thresholds are selected according to a pre-determined universal quantile level, and the local GP parameters (shape and scale parameters) are learned from the samples exceeding these thresholds as initial estimates. However, the observed marginal threshold is inherently blended with Gaussian noise, specified by the hidden underlying threshold and a variance. To address this issue, we introduce the Gauss-MRF based thin membrane model, which is employed as the roughness penalty that regulates the threshold smoothing process. The interior point method is adopted to implement the maximum a posteriori (MAP) estimation for its simplicity.

For each site, the extreme wave heights over the smoothed threshold are still assumed to follow the Generalized Pareto (GP) distribution, characterized by the shape and scale parameters. We build the joint distribution of the GP parameters, again incorporating the roughness penalty given by the thin membrane model, and apply MAP estimation directly to determine the smoothed model parameters. The smoothed results are then compared with the locally fitted model and discussed.

4.1.2 Local Data Fitting

Locally fitted thresholds and GP parameters are needed as initial values for finding the MAP estimate of all the unknown parameters with the interior point method.

To obtain the local fit, the data at each site are fitted to the Generalized Pareto distribution, with marginal thresholds pre-determined at the same exceedance probability, according to equation (1):

$$F(x \mid u, \gamma, \sigma) = 1 - \left[1 + \frac{\gamma}{\sigma}(x - u)\right]^{-1/\gamma}$$

where $x$ is the multi-site data of interest, $\gamma$ is the shape parameter, $\sigma$ is the scale parameter and $u$ is the self-specified threshold value.

The resulting model parameters (threshold, shape and scale) are shown in Figure 4.1.2 (a), (b) and (c), respectively.


Figure 4.1.2 (a): The locally fit threshold surface

Figure 4.1.2 (b): The locally fit shape parameters surface

Figure 4.1.2 (c): The locally fit scale parameters surface

4.1.3 Threshold Optimization Method

4.1.3.1 Threshold Selection


Proper threshold selection is crucial for the model to be practically useful. Northrop and Jonathan¹¹ suggested that if non-stationary behavior is observed in the data, a covariate-dependent, non-stationary threshold should be adopted to reflect it. Therefore, we set an initial self-defined constant exceedance probability, instead of a constant threshold, before we perform dependence handling and smoothing.

The standard approach to threshold selection is to fit the covariate-dependent model over a sufficiently large range of exceedance probabilities (quantile levels) and look for stability in the parameter estimates (Northrop and Jonathan, 2010).

Theoretically, if the designed model is applicable given a properly chosen latent threshold surface with a certain tolerance for variation, the resulting GP parameters after smoothing must fulfill the following criteria to be reliable:

For the shape parameter $\gamma$ of each individual site, the estimates obtained at different threshold values should be almost constant. For each corresponding scale parameter $\sigma$, the estimate should vary linearly with the threshold, with slope approximately equal to $\gamma$. In addition, any chosen percentile of the wave heights should be nearly constant across threshold values, from which one can confidently say that the model is indeed a good estimate of the underlying distribution.

These desirable characteristics can be regarded as the model building criteria. Good compliance with them implies a model of good quality.
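The first two criteria follow from the threshold stability of the GP family. As a short derivation (the reference threshold $u_0$ and the notation $\sigma_u$ for the scale at threshold $u$ are introduced here only for illustration), suppose the excesses of $X$ over $u_0$ follow a GP distribution with shape $\gamma$ and scale $\sigma_{u_0}$. Then for any higher threshold $u > u_0$:

```latex
\begin{aligned}
P(X - u > y \mid X > u)
  &= \frac{P(X - u_0 > u - u_0 + y \mid X > u_0)}{P(X - u_0 > u - u_0 \mid X > u_0)} \\
  &= \left[\frac{\sigma_{u_0} + \gamma\,(u - u_0 + y)}{\sigma_{u_0} + \gamma\,(u - u_0)}\right]^{-1/\gamma}
   = \left[1 + \frac{\gamma\, y}{\sigma_{u_0} + \gamma\,(u - u_0)}\right]^{-1/\gamma}
\end{aligned}
```

The excesses over $u$ are therefore again GP distributed with the same shape $\gamma$ and with scale $\sigma_u = \sigma_{u_0} + \gamma (u - u_0)$, which is linear in the threshold with slope $\gamma$. Estimates that depart clearly from this pattern suggest that the threshold is too low for the GP approximation to hold.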

4.1.3.2 The Thin Membrane Model

First, we smooth the thresholds of the different sites using the thin membrane model introduced above. Supposing that the Gauss-MRF field under consideration is smooth overall, with a certain allowance for discontinuities, the thin membrane model (equation (8)) penalizes the differences between neighboring nodes:

$$P(u \mid \alpha) \propto \exp\left\{-\alpha \sum_{i} \sum_{j \in N(i)} (u_i - u_j)^2\right\}$$

where $u$ is the Gaussian random process, parameterized by mean $\mu$ and covariance matrix $\Sigma$; $N(i)$ is the set of neighbors of site $i$; and $\alpha$ is a parameter, which may itself follow a distribution, specifying the strength of the inter-site penalty.

11 Paul Northrop and Philip Jonathan (2010). Modeling spatially-dependent non-stationary extremes with application to hurricane-induced wave heights.

This penalty term is added to the smoothing process as the roughness penalty that handles the inter-site dependence implicitly. Penalty functions of this kind are commonly adopted in recent work, because many multi-site datasets are non-stationary in nature but smooth overall. The parameters of the extreme marginal behavior vary systematically with the covariates, which matches intuition. In the absence of this penalty term, the variation among sites may be abrupt and the precision of estimation deteriorates. To demonstrate the usefulness and significance of the penalty strength (stiffness) parameter $\alpha$, we perform a simple analysis of two extreme scenarios to investigate the model's limiting behavior:

Scenario one: $\alpha = 0$

When $\alpha$ is set to zero, the penalty term is effectively removed from the smoothing process. The resulting parameters are then not smoothed at all and strictly follow the local fits. If we optimize this model, the result converges to the locally fitted parameter values, and the optimization reduces to pure likelihood maximization.

Scenario two: $\alpha \to \infty$

In contrast, when the smoothing parameter is driven to infinity, the model behaves in the opposite way. The penalty term, which measures the roughness of the parameter surface, becomes extremely large. The parameters at different locations have no freedom to vary, and therefore all approach the same value. This is the case of pure smoothing: the surface formed by the parameter values across the sites is flat, which is not desirable.
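The two scenarios can be made concrete with a minimal two-site example. It assumes, purely for illustration, two neighboring sites with local estimates $y_1, y_2$, unit noise variance, and the penalty counted once for the single edge. Minimizing

$$\tfrac12(x_1-y_1)^2+\tfrac12(x_2-y_2)^2+\tfrac{\alpha}{2}(x_1-x_2)^2$$

and setting both partial derivatives to zero gives

$$\hat x_1+\hat x_2 = y_1+y_2, \qquad \hat x_1-\hat x_2 = \frac{y_1-y_2}{1+2\alpha}.$$

For $\alpha = 0$ the smoothed values coincide with the local fits, while for $\alpha\to\infty$ both tend to the common value $(y_1+y_2)/2$, which is the flat surface described above.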

As far as the threshold is concerned, our goal is to find the optimal value such that the abovementioned model-building criteria are met. The same applies to the GP parameter selection. It is therefore important to adjust $\alpha_u$, the stiffness of the threshold surface, to find an optimal threshold level at which the covariate effects are properly handled.

Assuming that the underlying threshold follows a thin membrane model, we obtain:

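$$p(v\mid\alpha_u) \propto \exp\Big(-\frac{\alpha_u}{2}\sum_{i\in V}\sum_{j\in N(i)}(v_i-v_j)^2\Big)$$

where $v$ is the underlying threshold we are looking for, $\alpha_u$ is the stiffness parameter for the threshold, $V$ is the collection of data sites ($i \in V$), and $N(i)$ is the set of neighboring sites of site $i$.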

4.1.3.3 Joint Threshold Distribution

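The observed threshold $u$, which can be determined using quantile regression, is the underlying threshold $v$ mixed with Gaussian noise of zero mean and unknown variance $\tau^2$. It therefore follows $u_i \sim \mathcal{N}(v_i, \tau^2)$:

$$p(u_i\mid v_i) = \frac{1}{\sqrt{2\pi\tau^2}}\exp\Big(-\frac{(u_i-v_i)^2}{2\tau^2}\Big) \tag{13}$$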

It is worth mentioning that the variance controls the flexibility of the threshold. In other words, viewing the threshold over the grid as a surface, the variance quantifies the room the threshold has to vary: when the variance is large, the observed threshold is free to move around the underlying threshold. Furthermore, since we have no prior information about the threshold stiffness parameter, we assume a uniform distribution for it. We therefore obtain:

$$p(v,\alpha_u\mid u) = \frac{p(u\mid v)\,p(v\mid\alpha_u)\,p(\alpha_u)}{p(u)}$$

Substituting the distributions above gives the joint distribution with respect to the underlying threshold $v$:

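$$p(v,\alpha_u\mid u) \propto \frac{1}{p(u)}\prod_{i\in V}\frac{1}{\sqrt{2\pi\tau^2}}\exp\Big(-\frac{(u_i-v_i)^2}{2\tau^2}\Big)\cdot\alpha_u^{p/2}\exp\Big(-\frac{\alpha_u}{2}\sum_{i\in V}\sum_{j\in N(i)}(v_i-v_j)^2\Big)\,p(\alpha_u) \tag{14}$$

where $p$ in the exponent of $\alpha_u$ is the number of sites and $\tau^2$ is the variance of the Gaussian noise on the observed threshold.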

4.1.3.4 Threshold Smoothing

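To find the optimal threshold level, one can use any of the available optimization methods. In our case, we apply the interior-point method in MATLAB to maximize equation (14). Since the distribution of the observed threshold, $p(u)$, does not affect the optimization result, it can be ignored. Therefore, we need to maximize:

$$\begin{aligned} p(v,\alpha_u\mid u) &= \frac{p(u\mid v)\,p(v\mid\alpha_u)\,p(\alpha_u)}{p(u)} \propto p(u\mid v)\,p(v\mid\alpha_u)\,p(\alpha_u)\\ &\propto \prod_{i\in V}\frac{1}{\sqrt{2\pi\tau^2}}\exp\Big(-\frac{(u_i-v_i)^2}{2\tau^2}\Big)\cdot\alpha_u^{p/2}\exp\Big(-\frac{\alpha_u}{2}\sum_{i\in V}\sum_{j\in N(i)}(v_i-v_j)^2\Big)\,p(\alpha_u), \end{aligned}$$

where $p(\alpha_u)$ is uniform, $u$ is the observed threshold, $v$ is the underlying threshold characterized by the Markov random field, $\tau^2$ is the noise variance, and $p$ in the exponent is the number of sites.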


4.1.4 GP Parameter Optimization

Locally fitted thresholds and GP parameters are needed as initial values for finding the MAP estimates of all the unknown parameters with the interior-point method.

4.1.4.1 The Thin Membrane Model

Next, we smooth the GP parameters. The thin membrane model is again selected as the distribution of the GP parameters $(\gamma, \sigma)$ across sites. Each site is connected to its four immediate neighbors (fewer for sites on the border of the grid), as characterized by the Markov property:

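$$p(\gamma\mid\alpha_\gamma) \propto \exp\Big(-\frac{\alpha_\gamma}{2}\sum_{i\in V}\sum_{j\in N(i)}(\gamma_i-\gamma_j)^2\Big) \tag{15}$$

And, similarly:

$$p(\sigma\mid\alpha_\sigma) \propto \exp\Big(-\frac{\alpha_\sigma}{2}\sum_{i\in V}\sum_{j\in N(i)}(\sigma_i-\sigma_j)^2\Big) \tag{16}$$

where $V$ is the set of all the sites and $N(i)$ denotes the neighbor sites of site $i$.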

4.1.4.2 Joint Parametric Distribution

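For each site, the wave height data over the threshold are GP distributed, with the probability density function (pdf) given in equation (1):

$$f(y\mid\gamma,\sigma) = \frac{1}{\sigma}\Big[1+\frac{\gamma}{\sigma}(y-u)\Big]^{-\left(\frac{1}{\gamma}+1\right)}$$

where $u$ is the threshold at the site. Moreover, since we have no prior information about the distributions of $\alpha_\gamma$ and $\alpha_\sigma$, we choose uniform distributions covering the possible range of $\alpha_\gamma$ and $\alpha_\sigma$; thus,

$$p(\gamma,\sigma,\alpha_\gamma,\alpha_\sigma\mid y) = \frac{p(y\mid\gamma,\sigma)\,p(\gamma\mid\alpha_\gamma)\,p(\sigma\mid\alpha_\sigma)\,p(\alpha_\gamma)\,p(\alpha_\sigma)}{p(y)}$$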

4.1.4.3 GP Parametric Smoothing

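To find the optimal parametric model, we maximize the joint likelihood of the GP parameters. Since the distribution of the original data set, $p(y)$, is fixed, it has no effect on the optimization result and can safely be neglected. Our goal is therefore to maximize:

$$p(\gamma,\sigma,\alpha_\gamma,\alpha_\sigma\mid y) \propto \alpha_\gamma^{p/2}\exp\Big(-\frac{\alpha_\gamma}{2}\sum_{i\in V}\sum_{j\in N(i)}(\gamma_i-\gamma_j)^2\Big)\cdot\alpha_\sigma^{p/2}\exp\Big(-\frac{\alpha_\sigma}{2}\sum_{i\in V}\sum_{j\in N(i)}(\sigma_i-\sigma_j)^2\Big)\cdot\prod_{i\in V}\prod_{k}\frac{1}{\sigma_i}\Big[1+\frac{\gamma_i}{\sigma_i}(y_{ik}-u_i)\Big]^{-\left(\frac{1}{\gamma_i}+1\right)} \tag{17}$$

where $p$ is the number of sites, $y_{ik}$ is the $k$-th wave height above the threshold at site $i$, and $u_i$ is the threshold at site $i$.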

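Following convention, we take the logarithm of both sides:

$$\log p(\gamma,\sigma,\alpha_\gamma,\alpha_\sigma\mid y) = -\frac{\alpha_\gamma}{2}\sum_{i\in V}\sum_{j\in N(i)}(\gamma_i-\gamma_j)^2 + \frac{p}{2}\log\alpha_\gamma - \frac{\alpha_\sigma}{2}\sum_{i\in V}\sum_{j\in N(i)}(\sigma_i-\sigma_j)^2 + \frac{p}{2}\log\alpha_\sigma - \sum_{i\in V}\sum_{k}\Big\{\log\sigma_i+\Big(\frac{1}{\gamma_i}+1\Big)\log\Big[1+\frac{\gamma_i}{\sigma_i}(y_{ik}-u_i)\Big]\Big\} + \text{const} \tag{18}$$

Therefore, our next step is to maximize the log-likelihood function.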

4.2 Smoothing Parameter Selection

In this sub-section, we apply cross-validation, maximum a posteriori estimation and iterated conditional modes to the maximization objective in the smoothing parameter selection process. However, none of the three methods provided satisfactory results in practice. A discussion at the end of the sub-section summarizes the limitations of the three approaches.

4.2.1 Cross-validation (CV)

Cross-validation, sometimes called rotation estimation¹², is a commonly used technique for assessing the predictive performance of a model on a data set. It is mainly adopted in settings where the goal is prediction and one wants to assess the accuracy of that prediction. Generically, the first step in CV is to partition the sample into complementary subgroups (the number of subgroups is called the number of folds). One then fits the model on some subsets (the training set) and validates it on the remaining subsets (the validation set). To reduce variability, several rounds of cross-validation are performed, one per partition, and the validation results are averaged over the rounds.

For example, in k-fold cross-validation the data set is first partitioned into k subgroups, of which (k-1) are treated as the training set and the remaining one as the validation set. The model is fitted to the training set, and the validation result is recorded. This procedure is repeated k times, until each group has been used exactly once for validation, and the performances are averaged over the rounds. The implementation of this methodology for our MRF-GP model is studied further in the subsequent section.
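The k-fold procedure described above can be sketched as follows. This is a generic illustration rather than the MRF-GP implementation: the helper names (`k_fold_indices`, `cross_validate`) and the toy model (a training mean scored by mean squared error) are assumptions made for the example.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cross_validate(data, k, fit, score, seed=0):
    """Average the validation score over k rounds; each fold validates once."""
    folds = k_fold_indices(len(data), k, seed)
    scores = []
    for f in range(k):
        held_out = set(folds[f])
        train = [data[i] for i in range(len(data)) if i not in held_out]
        valid = [data[i] for i in folds[f]]
        model = fit(train)                 # fit on the k-1 training folds
        scores.append(score(model, valid)) # validate on the remaining fold
    return sum(scores) / k

# Toy usage: the "model" is the training mean, scored by mean squared error.
def fit_mean(train):
    return sum(train) / len(train)

def mse(model, valid):
    return sum((v - model) ** 2 for v in valid) / len(valid)

data = [float(i % 7) for i in range(70)]
cv_error = cross_validate(data, k=5, fit=fit_mean, score=mse)
```

In the MRF-GP setting, `fit` would be replaced by the smoothing step for a candidate smoothing parameter and `score` by a predictive measure on the held-out data.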

4.2.2 Maximum a Posteriori (MAP)

12 Geisser, Seymour (1993). Predictive Inference. New York: Chapman and Hall. ISBN 0412034719.


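To maximize the log-likelihood function (18), we can apply maximum a posteriori (MAP) estimation directly by setting the partial derivative with respect to each unknown parameter to zero. Below, $z_{ik} = y_{ik}-u_i$ denotes the $k$-th exceedance at site $i$.

To determine $\gamma_i$, the partial derivative with respect to $\gamma_i$ is set to zero:

$$\frac{\partial \log p}{\partial\gamma_i} = -2\alpha_\gamma\sum_{j\in N(i)}(\gamma_i-\gamma_j) + \sum_{k}\Big\{\frac{1}{\gamma_i^{2}}\log\Big[1+\frac{\gamma_i}{\sigma_i}z_{ik}\Big] - \Big(\frac{1}{\gamma_i}+1\Big)\frac{z_{ik}}{\sigma_i+\gamma_i z_{ik}}\Big\} = 0 \tag{19}$$

To determine $\sigma_i$, the partial derivative with respect to $\sigma_i$ is set to zero:

$$\frac{\partial \log p}{\partial\sigma_i} = -2\alpha_\sigma\sum_{j\in N(i)}(\sigma_i-\sigma_j) + \sum_{k}\Big\{-\frac{1}{\sigma_i} + \Big(\frac{1}{\gamma_i}+1\Big)\frac{\gamma_i z_{ik}}{\sigma_i(\sigma_i+\gamma_i z_{ik})}\Big\} = 0 \tag{20}$$

To determine $\alpha_\gamma$, the partial derivative with respect to $\alpha_\gamma$ is set to zero:

$$\frac{\partial \log p}{\partial\alpha_\gamma} = \frac{p}{2\alpha_\gamma} - \frac{1}{2}\sum_{i\in V}\sum_{j\in N(i)}(\gamma_i-\gamma_j)^2 = 0 \tag{21}$$

To determine $\alpha_\sigma$, the partial derivative with respect to $\alpha_\sigma$ is set to zero:

$$\frac{\partial \log p}{\partial\alpha_\sigma} = \frac{p}{2\alpha_\sigma} - \frac{1}{2}\sum_{i\in V}\sum_{j\in N(i)}(\sigma_i-\sigma_j)^2 = 0 \tag{22}$$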

We implement this procedure with the MATLAB function fmincon, which searches for a constrained minimum of a scalar function of several variables starting from a given initial value (so in practice we minimize the negative log-likelihood). This is generally referred to as constrained nonlinear optimization¹³. However, this built-in MATLAB function has drawbacks that make the MAP approach impractical in this case. A detailed discussion is provided at the end of this sub-section.

4.2.3 Iterated Conditional Modes (ICM)

Since it is difficult to maximize the joint probability of a Markov random field directly, Besag (1986) proposed the method of iterated conditional modes (ICM) as an alternative. ICM is a deterministic algorithm that seeks the configuration maximizing the joint probability of a Markov random field by iteratively maximizing the local conditional probabilities¹⁴. However, this method has inherent drawbacks concerning its efficiency and convergence, which are discussed in detail in the subsequent section. First, the threshold is selected and smoothed before we start to estimate the GP parameters.

To estimate the GP parameters, we first need expressions for the smoothing parameters $\alpha_\gamma$ and $\alpha_\sigma$. Recalling equations (21) and (22), we can derive:

13 http://www.caspur.it/risorse/softappl/doc/matlab_help/toolbox/optim/fmincon.html

14 http://en.wikipedia.org/wiki/Iterated_conditional_modes

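$$\alpha_\gamma = \frac{p}{\sum_{i\in V}\sum_{j\in N(i)}(\gamma_i-\gamma_j)^2} \tag{23}$$

$$\alpha_\sigma = \frac{p}{\sum_{i\in V}\sum_{j\in N(i)}(\sigma_i-\sigma_j)^2} \tag{24}$$

In our case, the region of interest consists of 78 sites (grid points), so we substitute $p = 78$ into equations (23) and (24):

$$\alpha_\gamma = \frac{78}{\sum_{i\in V}\sum_{j\in N(i)}(\gamma_i-\gamma_j)^2}$$

And

$$\alpha_\sigma = \frac{78}{\sum_{i\in V}\sum_{j\in N(i)}(\sigma_i-\sigma_j)^2}$$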

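Substituting the locally fitted GP parameters, these two results are used as the initial values for the ICM iterations. They are then substituted back into the logarithm of equation (17), on which the optimization is performed:

$$\log p(\gamma,\sigma,\alpha_\gamma,\alpha_\sigma\mid y) = -\frac{\alpha_\gamma}{2}\sum_{i\in V}\sum_{j\in N(i)}(\gamma_i-\gamma_j)^2 + \frac{p}{2}\log\alpha_\gamma - \frac{\alpha_\sigma}{2}\sum_{i\in V}\sum_{j\in N(i)}(\sigma_i-\sigma_j)^2 + \frac{p}{2}\log\alpha_\sigma - \sum_{i\in V}\sum_{k}\Big\{\log\sigma_i+\Big(\frac{1}{\gamma_i}+1\Big)\log\Big[1+\frac{\gamma_i}{\sigma_i}(y_{ik}-u_i)\Big]\Big\} + \text{const}$$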

We apply the interior-point method to this function to update the GP parameters. The procedure is repeated until the parameter values converge. However, the result is disappointing. In this simulation, because of the large coefficients of the logarithm terms $\frac{p}{2}\log\alpha_\gamma$ and $\frac{p}{2}\log\alpha_\sigma$ ($p$ is equal to 78 in this case), these two terms outweigh the penalty terms. As a result, the maximization drives $\alpha_\gamma$ and $\alpha_\sigma$ to large values, limited only by the user-defined upper bound, which is clearly a case of over-smoothing. The result resembles the constant model, in which the parameter values do not vary with the covariates.
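The over-smoothing can be read off the objective directly. As a brief illustration, write $S_\gamma = \sum_{i\in V}\sum_{j\in N(i)}(\gamma_i-\gamma_j)^2$ (a shorthand introduced here, not used elsewhere in the thesis) and keep only the terms of the log-likelihood that involve $\alpha_\gamma$:

$$g(\alpha_\gamma) = \frac{p}{2}\log\alpha_\gamma - \frac{\alpha_\gamma}{2}\,S_\gamma .$$

For a fixed $S_\gamma > 0$ this is maximized at $\alpha_\gamma = p/S_\gamma$. When the GP parameters are optimized jointly, however, a flat surface gives $S_\gamma = 0$ while the data term stays finite, so that $g(\alpha_\gamma) = \frac{p}{2}\log\alpha_\gamma$ grows without bound and only the imposed upper bound stops the increase of $\alpha_\gamma$. The same argument applies to $\alpha_\sigma$.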

4.2.4 Limitations

First, the low efficiency and the bias of the results limit the practical usefulness of the abovementioned approaches. On examining the posterior distribution we want to maximize, we observe a tendency for the smoothing parameters to diverge to infinity, limited only by the upper bound we set in fmincon. This is because we treat the smoothing parameters as constants, instead of regarding them as random variables that are correlated with the GP parameters. The cause and impact of this bias are explained in more detail in Chapter 5.

Second, these methods tend to converge to a local maximum. The Generalized Pareto distribution, which we adopted as the model for marginal fitting, is non-linear and its likelihood is non-concave, so there is no guarantee that the maximum is unique. As a result, the estimates, which are sensitive to the choice of initial values, tend to converge to a local maximum, leading to a sub-optimal solution.

In summary, cross-validation tends to behave like the local fit, while MAP and ICM tend to converge to the constant model. We want a solution that lies between pure smoothing and pure maximization.



CHAPTER 5: EXPECTATION MAXIMIZATION (EM)

In this chapter, another smoothing parameter selection method, Expectation Maximization (EM), is proposed, motivated by the limitations of the estimation and optimization approaches described above. The MRF-GP model, with some modifications, is restated in matrix form for the convenience of further mathematical operations. We first introduce the prior model adopted in this algorithm and the distribution functions on which the EM method is based. The sample size at individual locations is increased by a site-averaging approach, and the initial model parameters are learned from the data via bootstrapping. The implementation of the EM algorithm is then described in detail. A discussion of convergence and of the robustness of the model concludes the chapter.

5.1 Model in Matrix Form

5.1.1 The Prior Model: Gauss-Markov Random Field

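Our prior model is still based on the thin membrane model introduced in chapter 2:

$$p(x\mid\alpha) \propto \exp\Big(-\frac{\alpha}{2}\sum_{i\in V}\sum_{j\in N(i)}(x_i-x_j)^2\Big)$$

Rewritten in vector form (with constant factors absorbed into $\alpha$), we have:

$$p(x\mid\alpha) \propto \exp\Big(-\frac{1}{2}\,\alpha\,x^{T}Kx\Big) \tag{25}$$

where $\alpha$ is the smoothing parameter controlling the penalty strength, and $K$ is the adjacency matrix, which has the form of equation (10):

$$K_{mn} = \begin{cases} d_n, & m=n,\\ -1, & m\ne n \text{ and an edge connects sites } m \text{ and } n,\\ 0, & \text{otherwise.}\end{cases}$$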

The diagonal entries are $d_n$, the number of sites adjacent to site $n$, for $n \in [1, p]$ ($p$ is 78 in our case). The off-diagonal entries are minus one if an edge connects the two sites and zero otherwise. This matrix is sparse. Since each of its rows sums to zero, $K$ is singular on its own; it is the posterior matrix, formed by adding the positive diagonal observation term, that is inverted in the inference step. $K$ implements the thin membrane model and characterizes the spatial structure of the multi-site data, while the smoothing parameter $\alpha$ controls the strength of the dependence. This model describes the initial state of the system, characterized by the Markov property, before any observations are made.
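As an illustration of this structure, the sketch below builds the matrix for a rectangular grid with 4-neighbor connectivity. The function name `grid_adjacency` and the 6 x 13 layout are assumptions made for the example (chosen only because 6 x 13 = 78); the actual grid is the one designed in chapter 3.

```python
def grid_adjacency(rows, cols):
    """Thin-membrane matrix K for a rows x cols grid with 4-neighbour edges.

    K[n][n] is the number of neighbours of site n; K[m][n] = -1 when sites
    m and n share an edge, and 0 otherwise.
    """
    p = rows * cols
    K = [[0] * p for _ in range(p)]
    for r in range(rows):
        for c in range(cols):
            n = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if rr in range(rows) and cc in range(cols):
                    K[n][rr * cols + cc] = -1   # edge between n and its neighbour
                    K[n][n] += 1                # count the neighbour on the diagonal
    return K

K = grid_adjacency(6, 13)  # assumed layout giving 78 sites
```

Corner sites have two neighbors, border sites three and interior sites four, and every row sums to zero.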

Based on the MRF-GP model, our prior distribution, which belongs to the exponential family, can be expressed in vector form as:

$$p(x\mid\alpha) \propto \alpha^{p/2}\exp\Big(-\frac{1}{2}\,x^{T}(\alpha K)\,x\Big) \tag{26}$$

5.1.2 Conditional Distribution

This part of the model characterizes the behavior of the observed variable given the latent variable. According to the MRF-GP model, the conditional distribution is:

$$p(y\mid x) = \frac{\sqrt{\det(R^{-1})}}{(\sqrt{2\pi})^{p}}\exp\Big(-\frac{1}{2}(y-Cx)^{T}R^{-1}(y-Cx)\Big) \tag{27}$$

where $C$ is the selection matrix (the identity when no data are missing) and $R$ is a diagonal matrix whose diagonal elements are the noise variances at the individual sites:

$$R = \mathrm{diag}(\tau_1^2,\dots,\tau_p^2) \tag{28}$$

In other words, the observation is the underlying random variable $x$ plus Gaussian noise, whose variance is determined by bootstrapping, is learned from the data through quantile regression, and varies systematically in space. This distribution also belongs to the exponential family.

5.1.3 The Posterior Distribution

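According to our MRF-GP model, the posterior distribution that we want to maximize is:

$$\begin{aligned} p(x,\alpha\mid y) &= \frac{p(y\mid x)\,p(x\mid\alpha)\,p(\alpha)}{p(y)} \propto p(y\mid x)\,p(x\mid\alpha)\,p(\alpha)\\ &\propto \prod_{i\in V}\frac{1}{\sqrt{2\pi\tau_i^2}}\exp\Big(-\frac{(y_i-x_i)^2}{2\tau_i^2}\Big)\cdot\alpha^{p/2}\exp\Big(-\frac{\alpha}{2}\sum_{i\in V}\sum_{j\in N(i)}(x_i-x_j)^2\Big)\,p(\alpha) \end{aligned}$$

In vector form,

$$p(x\mid y,\alpha) \propto \exp\Big(-\frac{1}{2}\,\alpha\,x^{T}Kx\Big)\exp\Big(-\frac{1}{2}(y-Cx)^{T}R^{-1}(y-Cx)\Big) \propto \exp\Big(-\frac{1}{2}\,x^{T}\big(\alpha K+C^{T}R^{-1}C\big)x + x^{T}C^{T}R^{-1}y\Big) \tag{29}$$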

We define $J = \alpha K + C^{T}R^{-1}C$ and $h = C^{T}R^{-1}y$. $C$ is called the selection matrix; it selects the sites where observations are available. When no data are missing, the selection matrix is the identity matrix with the same dimension as the underlying data set. $J$ is regarded as the posterior adjacency matrix. Additionally, since $C^{T}R^{-1}C$ is a diagonal matrix, adding the conditional term to $\alpha K$ preserves the sparsity structure of the original system¹⁵, which is desirable. With the selection matrix $C$ being the identity, the posterior distribution simplifies to:

$$p(x\mid y,\alpha) \propto \exp\Big(-\frac{1}{2}\,x^{T}Jx + h^{T}x\Big), \qquad J = \alpha K + R^{-1},\quad h = R^{-1}y \tag{30}$$

5.1.4 Measurements Bootstrapping

In this section, we present a bootstrapping method to infer the uncertainties of the extreme value model parameters (the GP parameters in our case) and of the thresholds directly, given n observations at each individual location. Bootstrapping is a standard approach in statistical inference: it measures parameter uncertainty by re-sampling the original data sample at random¹⁶.

Some works assume that the Gaussian noise in the Markov random field is uniform throughout the region of interest¹⁷, with zero mean and a constant variance independent of space and other covariates. In this work, we propose that the noise variance is not uniform. Rather, it varies with covariates systematically,

15 Myung Jin Choi and Alan S. Willsky (2007). Multiscale Gaussian Graphical Models and Algorithms for Large-Scale Inference. Massachusetts Institute of Technology, Electrical and Computer Science. 77 Massachusetts Ave., Cambridge, MA 02139, USA.

16 Philip Jonathan, Kevin Ewans (2008). Uncertainties in extreme wave height estimates for hurricane-dominated regions. OMAE-06-1067.

17 Philip Jonathan, Kevin Ewans, George Forristall (2008). Statistical estimation of extreme ocean environments: the requirement for modeling directionality and other covariate effects.

which can be learned from the data using bootstrapping. The estimated variances of the threshold and GP parameters are then used in the EM inference step. The 95% confidence intervals of the GP parameters are also inferred, and the results are discussed in Chapter 6. The general procedure for estimating the parameter variances and uncertainties is as follows:

1. Estimate the GP parameters $(\gamma, \sigma)$ and the threshold $u$ using the whole of the original data sample; these serve as the initial values.

2. Generate m data sub-samples for each individual site by re-sampling the n observations of that site at random with replacement, assuming no missing data. Here $y_i$ denotes the original observations at site $i$, with $i \in [1, p]$ determined by the grid structure, and $\{y_i^{(1)},\dots,y_i^{(m)}\}$ is the collection of all the sub-samples at site $i$.

3. For each site, estimate the GP parameters $(\gamma, \sigma)$ and the threshold level (at the same fixed quantile) for each of the m sub-samples. We record the vector $\theta_i = \{\theta_i^{(1)},\dots,\theta_i^{(m)}\}$, where $\theta_i^{(k)} = (\gamma_i^{(k)}, \sigma_i^{(k)})$ is the set of parameters for sub-sample $k$. Normally, m is of the order of 1000; it is equal to 3000 in our case.

4. Obtain the variance estimates for $\gamma$ and $\sigma$, respectively, by calculating the variance of $\theta_i$ across the sub-samples.

5. To estimate the 95% confidence interval for the model parameters, find the critical values $(\theta_{2.5\%}, \theta_{97.5\%})$, which are the 2.5% and 97.5% quantiles of the parameter vectors, respectively.
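The steps above can be sketched for a single site as follows. This is a minimal illustration under stated assumptions: the estimator is an empirical quantile (standing in for the threshold at a fixed quantile level) rather than a GP fit, the data are synthetic, and the helper names are hypothetical.

```python
import random
import statistics

def empirical_quantile(sample, q):
    """Simple order-statistic quantile (no interpolation)."""
    s = sorted(sample)
    return s[min(len(s) - 1, int(q * len(s)))]

def bootstrap(sample, estimator, m=1000, seed=0):
    """Return (variance, 2.5% bound, 97.5% bound) of `estimator` over m resamples."""
    rng = random.Random(seed)
    n = len(sample)
    # Steps 2-3: resample n observations with replacement and re-estimate, m times.
    estimates = [estimator([rng.choice(sample) for _ in range(n)]) for _ in range(m)]
    variance = statistics.variance(estimates)       # step 4
    lower = empirical_quantile(estimates, 0.025)    # step 5
    upper = empirical_quantile(estimates, 0.975)
    return variance, lower, upper

# Toy data standing in for the wave heights at one site.
rng = random.Random(1)
site_data = [rng.expovariate(0.5) for _ in range(500)]
threshold_var, lo, hi = bootstrap(site_data, lambda s: empirical_quantile(s, 0.8))
```

Replacing the lambda with a routine that returns a fitted GP shape or scale would give the corresponding variance and interval for that parameter.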

5.1.5 Threshold Smoothing

The observation vector $u$ of thresholds is obtained by computing the same quantile at all the sites. The noise covariance matrix $R_u$ is assumed to be diagonal, with the variance at each site estimated using the bootstrap approach introduced by Jonathan¹⁸ (see section 5.1.4 for details). We select the threshold smoothing parameter $\alpha_u$ by trial and error in order to measure the effect of the roughness of the threshold surface on the GP parameter surface.

18 P. Jonathan, K. Ewans, "Uncertainties in extreme wave height estimates for hurricane-dominated regions", Journal of Offshore Mechanics and Arctic Engineering, vol. 129/1, August, 2007.

For each selected smoothing parameter $\alpha_u$, we update the estimate of the threshold based on the posterior distribution given the initial value obtained from quantile regression, with the variance inferred from the bootstrap. This estimate $\hat v$ is defined as the value that maximizes the posterior distribution $p(v\mid u)$, as in the expectation step of the Expectation Maximization algorithm (see section 5.3 for details):

$$p(v\mid u) \propto \exp\Big(-\frac{1}{2}\,v^{T}\big(\alpha_u K+R_u^{-1}\big)v + v^{T}R_u^{-1}u\Big) \tag{31}$$

We therefore apply the MAP method to equation (31) to find the optimal estimate of $v$. In practice, we set the derivative of the exponent to zero:

$$-\big(\alpha_u K+R_u^{-1}\big)v + R_u^{-1}u = 0$$

Hence we obtain:

$$\hat v = \big(\alpha_u K+R_u^{-1}\big)^{-1}R_u^{-1}u \tag{32}$$
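The closed-form update can be illustrated on a small example. The sketch below assumes five sites arranged on a line rather than the actual grid, invented threshold values and noise variances, and a plain Gaussian-elimination solver in place of a library routine; the names `solve` and `smooth` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def smooth(u, noise_var, alpha):
    """Posterior mode (alpha*K + R^-1)^-1 R^-1 u for sites on a chain."""
    p = len(u)
    J = [[0.0] * p for _ in range(p)]
    for i in range(p):
        J[i][i] = 1.0 / noise_var[i]          # diagonal observation term R^-1
        for j in (i - 1, i + 1):
            if j in range(p):
                J[i][i] += alpha              # alpha * (number of neighbours)
                J[i][j] -= alpha              # alpha * (-1) for each edge
    h = [u[i] / noise_var[i] for i in range(p)]
    return solve(J, h)

u_obs = [2.0, 2.6, 1.9, 3.1, 2.4]
var = [0.2] * 5
v_rough = smooth(u_obs, var, alpha=0.0)   # no penalty: reproduces the observations
v_flat = smooth(u_obs, var, alpha=1e6)    # very stiff: nearly constant surface
```

Intermediate values of `alpha` give surfaces between these two extremes, which is the range explored by trial and error for the threshold.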

5.1.6 GP Parameter Smoothing

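Based on the smoothed threshold, we estimate the observed values of the GP parameters $\gamma$ and $\sigma$ using the maximum likelihood method. The smoothed GP parameters are then estimated by the same process as in the previous section:

$$\hat\gamma = \big(\alpha_\gamma K+R_\gamma^{-1}\big)^{-1}R_\gamma^{-1}\,y_\gamma \tag{33}$$

And

$$\hat\sigma = \big(\alpha_\sigma K+R_\sigma^{-1}\big)^{-1}R_\sigma^{-1}\,y_\sigma \tag{34}$$

where $y_\gamma$ and $y_\sigma$ are the locally fitted (observed) GP parameters and $R_\gamma$, $R_\sigma$ are their bootstrap noise covariance matrices. However, the smoothing parameters $\alpha_\gamma$ and $\alpha_\sigma$ are learned from the data using the expectation maximization algorithm introduced in the next section.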

5.2 The Exponential Family

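Since the prior and conditional distributions (taking $C$ to be the identity) are given by

$$p(x\mid\alpha) \propto \alpha^{p/2}\exp\Big(-\frac{1}{2}\,\alpha\,x^{T}Kx\Big)$$

And

$$p(y\mid x) = \frac{\sqrt{\det(R^{-1})}}{(\sqrt{2\pi})^{p}}\exp\Big(-\frac{1}{2}(y-x)^{T}R^{-1}(y-x)\Big)$$

we can derive the joint distribution of $x$ and $y$ as

$$\begin{aligned} p(x,y\mid\alpha) &= p(y\mid x)\,p(x\mid\alpha)\\ &\propto \alpha^{p/2}\exp\Big(-\frac{1}{2}\,\alpha\,x^{T}Kx\Big)\exp\Big(-\frac{1}{2}(y-x)^{T}R^{-1}(y-x)\Big)\\ &= \exp\Big(-\frac{1}{2}\,x^{T}\big(\alpha K+R^{-1}\big)x + x^{T}R^{-1}y - \frac{1}{2}\,y^{T}R^{-1}y + \frac{p}{2}\log\alpha\Big) \end{aligned} \tag{35}$$

Therefore, the joint distribution has an exponential form. We will use the convexity properties of the exponential family in the later parameter selection process.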

5.3 Expectation Maximization Algorithm

5.3.1 Introduction to Expectation Maximization

The EM algorithm is an iterative method for estimating parameters in latent variable models. It is a general-purpose algorithm for finding maximum likelihood estimates with good convergence properties. Specifically, owing to the convexity properties of the exponential family, the convergence of EM is guaranteed.

Let $y = (y_1,\dots,y_p)$, where $p = 78$, be the observed data, $x = (x_1,\dots,x_p)$ be the hidden random variables, and $\theta$ be the set of model parameters. We want to find the parameter set that makes the current observations most probable. The log-likelihood function we wish to maximize is therefore:

$$L(\theta) = \log p(y\mid\theta) = \log\int p(x,y\mid\theta)\,dx \tag{36}$$

And

$$\int p(y\mid x,\theta)\,p(x\mid\theta)\,dx = \int p(x,y\mid\theta)\,dx \tag{37}$$

5.3.2 Jensen's Inequality¹⁹

In general, a convex function satisfies Jensen's inequality, which states that for $\lambda \in [0,1]$, any convex function $f(x)$ satisfies:

$$\lambda f(x_1) + (1-\lambda)f(x_2) \ge f\big(\lambda x_1 + (1-\lambda)x_2\big) \tag{38}$$

This is not limited to the two-point case. Given weights $\lambda_i$ that are non-negative with $\sum_i\lambda_i = 1$, it generalizes to:

$$\sum_i\lambda_i f(x_i) \ge f\Big(\sum_i\lambda_i x_i\Big)$$

Therefore, one can derive that:

$$E[f(x)] \ge f(E[x]) \tag{39}$$

For a concave function such as the logarithm, the inequality is reversed: $\log E[x] \ge E[\log x]$. This conclusion will be used in the derivation of the EM algorithm in the subsequent section.

19 Kin Y. Li (2000). Mathematical Excalibur. Volume 5, Number 4.
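A quick numerical check of the inequality, in both its convex and concave forms, may help fix the direction of the bound. The sample values are arbitrary positive numbers chosen for illustration.

```python
import math
import random

rng = random.Random(0)
samples = [rng.uniform(0.5, 4.0) for _ in range(1000)]  # positive toy values

mean = sum(samples) / len(samples)
mean_of_log = sum(math.log(s) for s in samples) / len(samples)
mean_of_square = sum(s * s for s in samples) / len(samples)

# Convex f(x) = x^2: E[f(x)] >= f(E[x]).
convex_gap = mean_of_square - mean ** 2
# Concave f(x) = log x: the inequality flips, log(E[x]) >= E[log x].
concave_gap = math.log(mean) - mean_of_log
```

Both gaps are non-negative; the concave form is the one that yields a lower bound on a log-likelihood.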

5.3.3 The EM Algorithm

Because the logarithm is concave, we apply Jensen's inequality to obtain a lower bound on the log-likelihood function. For any distribution $q(x)$ over the hidden variables,

$$\begin{aligned} L(\theta) &= \log p(y\mid\theta) = \log\int q(x)\,\frac{p(x,y\mid\theta)}{q(x)}\,dx = \log E_q\Big[\frac{p(x,y\mid\theta)}{q(x)}\Big]\\ &\ge E_q\Big[\log\frac{p(x,y\mid\theta)}{q(x)}\Big]\\ &= \int p(x\mid y,\theta^{(k)})\log p(x,y\mid\theta)\,dx - \int p(x\mid y,\theta^{(k)})\log p(x\mid y,\theta^{(k)})\,dx \equiv B(q,\theta) \end{aligned} \tag{40}$$

where the last line takes $q(x) = p(x\mid y,\theta^{(k)})$.

Since the lower bound never exceeds the log-likelihood, the likelihood can be maximized by repeatedly maximizing its lower bound, and the choice of $q$ that makes the bound tight at the current estimate is the posterior distribution $p(x\mid y,\theta^{(k)})$. To verify this, we substitute $q(x) = p(x\mid y,\theta^{(k)})$ and evaluate the bound at $\theta = \theta^{(k)}$; equality holds as expected:

$$\begin{aligned} B(q,\theta^{(k)}) &= \int p(x\mid y,\theta^{(k)})\log\frac{p(x,y\mid\theta^{(k)})}{p(x\mid y,\theta^{(k)})}\,dx = \int p(x\mid y,\theta^{(k)})\log p(y\mid\theta^{(k)})\,dx\\ &= \log p(y\mid\theta^{(k)})\int p(x\mid y,\theta^{(k)})\,dx = L(\theta^{(k)}) \end{aligned}$$

where $L(\theta)$ is the log-likelihood function one wishes to maximize and $B(q,\theta)$ is the lower bound, or EM objective function, defined through Jensen's inequality. Therefore, each increase of the objective function also increases the log-likelihood. In sum, the iterations consist of two standard steps:

In the expectation step, one updates the posterior distribution $q(x)$ conditional on the observations while keeping $\theta$ fixed at $\theta^{(k)}$:

$$q^{(k)}(x) = p(x\mid y,\theta^{(k)})$$

In the maximization step, one fixes $q$ and maximizes the lower bound:

$$\theta^{(k+1)} = \arg\max_\theta B(q^{(k)},\theta)$$

5.3.4 Expectation Maximization in MRF-GP

The Expectation Step

In the E-step, we fix the smoothing parameters and update the estimate of the latent variables from the posterior distribution given the previous values, by choosing $q^{(k)}(x) = p(x\mid y,\theta^{(k)})$ and estimating $E[x\mid y,\theta^{(k)}]$. The resulting estimate $\hat x$ is the value that maximizes the posterior distribution $p(x\mid y,\theta^{(k)})$. We therefore apply the MAP method to $p(x\mid y,\theta^{(k)})$; in practice, we set its derivative to zero:

$$\frac{\partial}{\partial x}\log p(x\mid y,\theta^{(k)}) = 0$$

The Maximization Step

In this step, we maximize the lower bound $B(q^{(k)},\theta)$ defined in equation (40):

$$\begin{aligned} B(q^{(k)},\theta) &= \int q^{(k)}(x)\log\frac{p(x,y\mid\theta)}{q^{(k)}(x)}\,dx\\ &= \int p(x\mid y,\theta^{(k)})\log p(x,y\mid\theta)\,dx - \int p(x\mid y,\theta^{(k)})\log p(x\mid y,\theta^{(k)})\,dx\\ &= \int p(x\mid y,\theta^{(k)})\log p(x,y\mid\theta)\,dx + H\big(q^{(k)}(x)\big) \end{aligned} \tag{41}$$

where the second term is the entropy, or uncertainty, of $q^{(k)}(x)$ and does not depend on $\theta$. We therefore define $Q(\theta,\theta^{(k)})$ as the first term of the above equation and estimate the $\theta$ that maximizes

$$Q(\theta,\theta^{(k)}) = E_{x\mid y,\theta^{(k)}}\big[\log p(x,y\mid\theta)\big] \tag{42}$$

These steps are repeated until convergence, which is ensured by the convexity of the exponential family.

5.4 Model Implementation by EM

In our model, we have two smoothing parameters that need to be inferred:

$$\theta = \{\alpha_\gamma, \alpha_\sigma\} \tag{43}$$

Our objective is to estimate the smoothing parameters that best explain the observed data. In other words, given the observation $y$, we seek the parameter $\theta$ that maximizes the log-likelihood:

$$L(\theta) = \log p(y\mid\theta) = \log\int p(x,y\mid\theta)\,dx = \log\int q(x)\,\frac{p(x,y\mid\theta)}{q(x)}\,dx$$

5.4.1 The Expectation Step

In the E-step, we choose $q^{(k)}(x) = p(x\mid y,\theta^{(k)})$ and estimate $E[x\mid y,\theta^{(k)}]$. We obtain:

$$\hat x^{(k)} = E[x\mid y,\theta^{(k)}] = \int x\,p(x\mid y,\theta^{(k)})\,dx = J^{-1}h \tag{44}$$

where $\hat x^{(k)}$ is the next estimate, $y$ is the current observed value, $J = \alpha^{(k)}K + R^{-1}$, and $h = R^{-1}y$. The error covariance matrix is the inverse of $J$; it measures the uncertainty of the updated estimate given the observations:

$$J^{-1} = E\big[(x-\hat x^{(k)})(x-\hat x^{(k)})^{T}\mid y\big] \tag{45}$$

Note that this step involves a matrix inverse. For large sets of variables, the inversion may be problematic or intractable.

The expression for $J$ is $J = \alpha K + R^{-1}$. $R^{-1}$ is the information matrix of the observations in diagonal form and is therefore guaranteed to be invertible. The sum of the two matrices $\alpha K$ and $R^{-1}$ is well conditioned when the smoothing parameter $\alpha$ is sufficiently small. However, the matrix may become badly conditioned, and numerically singular, as $\alpha$ keeps rising. To handle these exceptions, we treat $\alpha$ as infinite, since its value in fact increases exponentially over the iterations. We then just need to choose the $x$ that maximizes $-\frac{1}{2}(y-x)^{T}R^{-1}(y-x)$ subject to the constraint that all the components of $x$ are equal.


5.4.2 The Maximization Step

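The maximization step finds the next parameter $\theta^{(k+1)}$ that maximizes $B(q^{(k)},\theta)$, given by:

$$B(q^{(k)},\theta) = \int q^{(k)}(x)\log\frac{p(x,y\mid\theta)}{q^{(k)}(x)}\,dx$$

As discussed in the previous section, this is the same as maximizing the Q-function:

$$Q(\theta,\theta^{(k)}) = E_{x\mid y,\theta^{(k)}}\big[\log p(x,y\mid\theta)\big] = -\frac{\alpha}{2}\,E_{x\mid y,\theta^{(k)}}\big[x^{T}Kx\big] + \frac{p}{2}\log\alpha + \text{const} \tag{46}$$

where

$$E_{x\mid y,\theta^{(k)}}\big[x^{T}Kx\big] = \mathrm{tr}\big(KJ^{-1}\big) + (\hat x^{(k)})^{T}K\,\hat x^{(k)} \tag{47}$$

To maximize the Q-function, we set its derivative with respect to $\alpha$ to 0. Therefore,

$$\frac{\partial Q}{\partial\alpha} = -\frac{1}{2}\Big(\mathrm{tr}\big(KJ^{-1}\big) + (\hat x^{(k)})^{T}K\,\hat x^{(k)}\Big) + \frac{p}{2\alpha} = 0$$

And

$$\alpha^{(k+1)} = \frac{p}{\mathrm{tr}\big(KJ^{-1}\big) + (\hat x^{(k)})^{T}K\,\hat x^{(k)}} \tag{48}$$

where $p$ is the number of sites, which is 78 in our case based on the grid structure designed in chapter 3.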
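The alternation of the two steps can be sketched on a toy problem. The example below is an illustration under several assumptions: eight sites on a line instead of the 78-site grid, invented observations and noise variances, a dense Gauss-Jordan inverse in place of a sparse solver, and a fixed number of iterations rather than a convergence test. The update for the smoothing parameter is written as the number of sites divided by the expected roughness, i.e. the trace term plus the quadratic term of the posterior mean; the function names are hypothetical.

```python
def inverse(A):
    """Gauss-Jordan inverse of a small dense positive definite matrix."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def chain_K(p):
    """Thin-membrane matrix for p sites on a line."""
    K = [[0.0] * p for _ in range(p)]
    for i in range(p - 1):
        K[i][i] += 1.0
        K[i + 1][i + 1] += 1.0
        K[i][i + 1] -= 1.0
        K[i + 1][i] -= 1.0
    return K

def em_alpha(y, noise_var, alpha=1.0, iters=50):
    """Alternate the E-step (posterior mean and covariance) and the M-step (alpha)."""
    p = len(y)
    K = chain_K(p)
    h = [y[i] / noise_var[i] for i in range(p)]
    for _ in range(iters):
        # E-step: J = alpha*K + R^-1, covariance J^-1, mean x_hat = J^-1 h.
        J = [[alpha * K[i][j] + (1.0 / noise_var[i] if i == j else 0.0)
              for j in range(p)] for i in range(p)]
        cov = inverse(J)
        x_hat = [sum(cov[i][j] * h[j] for j in range(p)) for i in range(p)]
        # M-step: alpha = p / (tr(K J^-1) + x_hat^T K x_hat).
        trace = sum(K[i][j] * cov[j][i] for i in range(p) for j in range(p))
        quad = sum(x_hat[i] * K[i][j] * x_hat[j] for i in range(p) for j in range(p))
        alpha = p / (trace + quad)
    return alpha, x_hat

y_obs = [1.0, 1.4, 0.9, 1.6, 1.2, 1.8, 1.5, 2.1]
alpha_hat, x_smooth = em_alpha(y_obs, [0.05] * 8)
```

Because the trace term is always positive, the updated smoothing parameter stays finite at every iteration, and the returned `x_smooth` is never rougher than the observations.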


5.5 Discussions

In our model, the GP parameters are estimated using this algorithm. At each iteration, the difference between the updated and the previous likelihood is guaranteed to be non-negative. Therefore, given that the probability to be maximized is Gaussian and its logarithm concave, convergence is assured and the problem of settling in a local maximum is avoided. In addition, this method is more reliable than the other algorithms considered. With the ICM method, the estimated smoothing parameter is driven to infinity, which is unrealistic. This can be understood by examining the likelihood we attempt to maximize:

$$-\frac{\alpha_\gamma}{2}\sum_{i\in V}\sum_{j\in N(i)}(\gamma_i-\gamma_j)^2 + \frac{p}{2}\log\alpha_\gamma - \frac{\alpha_\sigma}{2}\sum_{i\in V}\sum_{j\in N(i)}(\sigma_i-\sigma_j)^2 + \frac{p}{2}\log\alpha_\sigma - \sum_{i\in V}\sum_{k}\Big\{\log\sigma_i+\Big(\frac{1}{\gamma_i}+1\Big)\log\Big[1+\frac{\gamma_i}{\sigma_i}(y_{ik}-u_i)\Big]\Big\}$$

The ICM algorithm is based on the assumption that the smoothing parameters $\alpha_\gamma$ and $\alpha_\sigma$ are constants that are independent of the GP parameters. We are therefore free to select any combination of $\gamma$, $\sigma$, $\alpha_\gamma$ and $\alpha_\sigma$ to achieve the maximization, without considering the dependence structure, which should not be ignored. Under this false assumption, one natural choice is to drive $\alpha_\gamma$ and $\alpha_\sigma$ to infinity and the penalty sums $\sum_{i\in V}\sum_{j\in N(i)}(\gamma_i-\gamma_j)^2$ and $\sum_{i\in V}\sum_{j\in N(i)}(\sigma_i-\sigma_j)^2$ to zero. In this way the maximization is achieved, but all the GP parameters at the different sites are forced to be the same and the parameter surface is flat, which is unrealistic and misleading.

In fact, the smoothing parameters are not constants. They are random variables that are interrelated with the GP parameters, and treating them independently leads to the above consequence. The EM algorithm takes this dependence into account and always regards the smoothing parameters as random variables, as reflected in the formula:

$$\hat x = E[x\mid y,\theta^{(k)}]$$

Here $\hat x$ is the next estimate of the GP parameters and contains the information of $\alpha$. In this way, the relationship between $\alpha$ and the GP parameters is established and the distribution of the smoothing parameter is properly taken into account.


CHAPTER 6: RESULTS AND DISCUSSIONS

In this chapter, the results of the implemented model are presented and examined. First, the approaches used to initialize the parameters and to construct the data are introduced. Second, the theoretical expectations that the MRF-GP model should satisfy are stated. Once these model construction criteria are fixed, various sensitivity and uncertainty tests are performed to assess the robustness of the model and to compare it with other frameworks. Specifically, threshold sensitivity and uncertainty tests are performed, and the MRF-GP model is found to be less sensitive and less uncertain with respect to the threshold level than the locally fitted model. The sensitivity and uncertainty with respect to the smoothing parameter are then measured. We repeat the above procedures with reduced sample sizes to test the effect of sample size. A summary of the procedures and results is provided at the end of this chapter.

6.1 Parameters Initialization and Data Generation

The artificial data are generated according to the fitted model proposed by P. Northrop and P. Jonathan²⁰, in which the threshold is quadratic in longitude and latitude while the GP parameters $\gamma$ and $\sigma$ are constant. Here, we select a quadratic threshold surface varying with longitude and latitude, shown in Fig. 6.1, and GP parameters $(\gamma, \sigma) = (-0.3, 4.4)$. Specifically, the threshold surface (Figure 6.1) is constructed according to the polynomial form:

$$u(x,y) = \beta_0 + \beta_1 f_1(x) + \beta_2 f_1(y) + \beta_3 f_2(x) + \beta_4 f_2(y) \tag{49}$$

where $x$ and $y$ are the longitude and latitude, respectively; $f_1(x)$ and $f_1(y)$ are linear functions of the form $(x-a)$ and $(y-b)$; $f_2(\cdot)$ is a quadratic function of the form $((x-a)^2-c)$; and $\beta_0,\dots,\beta_4$ are different coefficients with fixed values.

20 P. Northrop, P. Jonathan, "Modeling spatially-dependent non-stationary extremes with application to hurricane-induced wave heights", Department of Statistical Science, University College London.

Since our simulation study is based on synthetic data only, we assign arbitrary values to this set of parameters.

Fig. 6.1: Threshold surface constructed from the quadratic model
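The data generation can be sketched as follows. The GP parameters are those quoted in the text; everything else is assumed for illustration: the centering constants, the polynomial coefficients, the absence of a constant offset in the quadratic terms, the site coordinates, and the function names. Exceedances are drawn by inverting the GP distribution function.

```python
import random

GAMMA, SIGMA = -0.3, 4.4   # GP shape and scale quoted in the text

def threshold(lon, lat, a=7.0, b=3.0, coeffs=(2.5, 0.05, 0.1, -0.02, -0.05)):
    """Quadratic threshold surface; a, b and coeffs are arbitrary placeholders."""
    c0, c1, c2, c3, c4 = coeffs
    return (c0 + c1 * (lon - a) + c2 * (lat - b)
            + c3 * (lon - a) ** 2 + c4 * (lat - b) ** 2)

def sample_gp(u, n, rng, gamma=GAMMA, sigma=SIGMA):
    """Draw n GP-distributed values above threshold u by inverting the GP cdf."""
    out = []
    for _ in range(n):
        q = rng.random()   # uniform on [0, 1)
        out.append(u + sigma / gamma * ((1.0 - q) ** (-gamma) - 1.0))
    return out

rng = random.Random(0)
u_site = threshold(lon=4.0, lat=2.0)
waves = sample_gp(u_site, 1250, rng)
```

With a negative shape parameter the distribution has a finite upper end point at the threshold plus $-\sigma/\gamma$, so all simulated values fall between `u_site` and `u_site - SIGMA / GAMMA`.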

6.2 Theoretical Expectations

Theoretically, if the designed model is applicable given a properly chosen latent threshold surface with a certain tolerance for variation, the GP parameters obtained after smoothing must fulfill the following criteria to be reliable:

For each individual site, the estimates of the shape parameter $\gamma$ obtained at different threshold values should be almost constant. In addition, the corresponding scale parameter $\sigma$ should vary linearly with the threshold, with slope approximately equal to $\gamma$. Furthermore, any chosen percentile of the wave heights should be nearly constant across threshold values; if so, one can say with confidence that the model is a good estimate of the underlying distribution.
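The first two criteria follow from the threshold stability property of the GP distribution. As a short worked derivation (the symbols $u_0$, $u$ and $\sigma_{u_0}$ are notation introduced here for illustration): suppose the exceedances of a base threshold $u_0$ follow a GP distribution with shape $\gamma$ and scale $\sigma_{u_0}$. Then for any higher threshold $u > u_0$ and $y > 0$:

```latex
\begin{aligned}
\Pr(X-u>y \mid X>u)
 &= \frac{\left[1+\gamma\,(u-u_0+y)/\sigma_{u_0}\right]^{-1/\gamma}}
         {\left[1+\gamma\,(u-u_0)/\sigma_{u_0}\right]^{-1/\gamma}} \\
 &= \left[1+\frac{\gamma\,y}{\sigma_{u_0}+\gamma\,(u-u_0)}\right]^{-1/\gamma}
\end{aligned}
\qquad\Longrightarrow\qquad
\sigma_u = \sigma_{u_0} + \gamma\,(u-u_0).
```

The exceedances of $u$ are therefore again GP distributed with the same shape $\gamma$, while the scale changes linearly in $u$ with slope $\gamma$.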

To test the performance of our model, we carry out the following analyses.

6.3 Threshold Sensitivity and Uncertainty Analysis


For the first simulation study, we generate 1250 samples for each site. Our task here is to assess the quality of the MRF-GP model fit. Given a fixed value of $\alpha_u$, which controls the strength of the smoothing penalty on the threshold in the thin-membrane model, we examine how the estimates of gamma and sigma vary with the threshold and compare them with those of the locally fitted model. To demonstrate the advantage of our method, we measure how well it meets the model construction criteria above in practice. For the shape parameter gamma, we expect the estimate to remain steady as a function of the threshold; for the scale parameter sigma, we expect it to be a linear function of the threshold with gradient equal to gamma.

To carry out these assessments, we first plot the GP parameter estimates against the threshold for the locally-fit and MRF-GP models. The results for site 3 are shown in Fig. 6.3.1 and Fig. 6.3.2 respectively. Other sites show similar performance.
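For a single site, the locally-fit part of this threshold sensitivity check can be sketched as below. This is a minimal illustration rather than the thesis implementation: the base threshold u0, the threshold grid thr and the number of bootstrap replicates B are assumed values, and gpfit is the Statistics Toolbox routine that X_gpfit.m in the appendix also calls.

```matlab
% Sketch only: u0, thr and B are assumed values; requires gpfit (Statistics Toolbox).
rng(2);
gamma = -0.3;  sigma = 4.4;  u0 = 2;  n = 1250;
x   = u0 + (sigma/gamma) * (rand(n,1).^(-gamma) - 1);  % GP sample above u0
thr = 2:0.5:6;                         % thresholds at which the GP is refitted
B   = 100;                             % bootstrap replicates per threshold
est = zeros(numel(thr), 2);            % columns: [shape scale]
lo  = est;  hi = est;                  % bounds of the 95% bootstrap interval
for k = 1:numel(thr)
    exc = x(x > thr(k)) - thr(k);      % exceedances of the current threshold
    est(k,:) = gpfit(exc);             % maximum likelihood GP fit
    bt = zeros(B, 2);
    for b = 1:B
        idx = randi(numel(exc), numel(exc), 1);   % resample with replacement
        bt(b,:) = gpfit(exc(idx));
    end
    bt = sort(bt);                     % sort each parameter column separately
    lo(k,:) = bt(max(1, round(0.025*B)), :);
    hi(k,:) = bt(round(0.975*B), :);
end
expectedShape = gamma * ones(size(thr));          % expected: constant shape
expectedScale = sigma + gamma * (thr - u0);       % expected: linear scale
```

Plotting est, lo and hi against thr together with expectedShape and expectedScale gives the kind of comparison shown in the figures of this section.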

Fig. 6.3.1: GP parameters versus threshold for the locally fitted model (sample size = 1250). 6.3.1(a): shape parameter; 6.3.1(b): scale parameter.

[Figure: two panels plotting the shape parameter and the scale parameter against threshold; legend: estimate value, 95% uncertainty interval, expected behavior.]

Fig. 6.3.2: GP parameters versus threshold for the MRF-GP model (sample size = 1250). 6.3.2(a): shape parameter; 6.3.2(b): scale parameter.

Theoretically, if the underlying distribution strictly follows the GP form, varying the threshold does not affect the shape parameter estimates, while the scale parameter is a linear function of the threshold with gradient in a reasonable interval around gamma. Inspecting the figures above, the locally-fit estimates deviate more strongly from this expected behavior. Under this assumption, we can claim with confidence that our model fits the artificial data properly and performs better. In addition, compared with the locally-fit model, the MRF-GP model gives parameter estimates that are closer to the theoretical values and vary less, as reflected in the narrower 95% uncertainty band.

6.4 Smoothing Sensitivity and Uncertainty Analysis

To investigate the sensitivity of the model to the smoothing parameter, we keep the quantile level fixed while varying the smoothing parameter for the threshold ($\alpha_u$). The purpose of this analysis is to show that our model is less sensitive to this parameter than the locally-fit results. In this case study, we set the threshold quantile to 0.4. We plot the GP parameters of all sites against the smoothing parameter $\alpha_u$; the results are shown in Figures 6.4.1 and 6.4.2 respectively.

Fig. 6.4.1: GP parameters versus $\alpha_u$ for the locally fit model (sample size = 1250)

Fig. 6.4.2: GP parameters versus $\alpha_u$ for the MRF-GP model (sample size = 1250)

[Figures: each shows two panels plotting the shape parameter and the scale parameter of all sites against log(u).]

P. Northrop and P. Jonathan showed that, for a properly chosen threshold smoothing parameter (determined by $\alpha_u$ in our model), a constant model is sufficient to fit the $\gamma$ surface and the $\sigma$ surface properly. In other words, a properly constructed threshold surface, controlled by $\alpha_u$, may lead to flatter GP parameter surfaces. The two figures above show that our model performs better: for a suitable range of $\alpha_u$, bounded by 100 in this case, both the $\gamma$ surface and the $\sigma$ surface are flat.
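The effect of the smoothing parameter on a surface can be illustrated with the closed-form smoother used in Main.m in the appendix, which solves (alpha*Jp + R^-1) x = R^-1 y for the smoothed values x. The sketch below is an illustration under assumptions: the "true" surface, the noise variance and the list of alpha values are hypothetical, and the thin-membrane precision matrix is built with Kronecker products instead of thin_membrain.m, so sites are numbered column by column here.

```matlab
% Sketch only: the surface, noise level and alpha values are assumed.
rng(3);
m = 6;  n = 13;  p = m*n;                       % grid size as in Main.m
Dm = diff(eye(m));  Dn = diff(eye(n));          % first-difference operators
J  = kron(eye(n), Dm'*Dm) + kron(Dn'*Dn, eye(m));  % thin-membrane precision
[cc, rr] = meshgrid(1:n, 1:m);
truth    = 3 - 0.02*(cc - 7).^2 - 0.05*(rr - 3.5).^2;  % assumed smooth surface
noiseVar = 0.04 * ones(p, 1);                   % assumed observation variances
y    = truth(:) + sqrt(noiseVar) .* randn(p, 1);       % noisy local estimates
Rinv = diag(1 ./ noiseVar);
alphas = [0 1 10 100 1000];
rough  = zeros(size(alphas));                   % roughness x'*J*x per alpha
for k = 1:numel(alphas)
    xhat = (alphas(k)*J + Rinv) \ (Rinv*y);     % smoothed surface
    rough(k) = xhat' * J * xhat;                % sum of squared neighbour differences
end
```

With alpha = 0 the smoother returns the local estimates unchanged, and the roughness of the smoothed surface does not increase as alpha grows, which is the flattening behavior discussed above.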

6.5 The Sizing Effect

To illustrate the effect of sample size, we repeat the above analyses with the sample size reduced to 315 per site.

First, we compare the threshold sensitivity of the locally-fit results with that of the MRF-GP model. The resulting GP parameters as functions of the threshold level are plotted in Figures 6.5.1 and 6.5.2, respectively.

Fig. 6.5.1: GP parameters versus threshold for the locally fitted model (sample size = 315)

[Figure: two panels plotting the shape parameter and the scale parameter against threshold; legend: estimate value, expected behavior.]

Fig. 6.5.2: GP parameters versus threshold for the MRF-GP model (sample size = 315)

Evidently, our model still gives the better estimate. Its performance is acceptable, with small deviations from the theoretical values, when there are enough samples (that is, when the threshold level is below 8). However, the performance understandably deteriorates when the threshold level is sufficiently high, since the limited number of samples is not enough to guarantee a consistent estimate. Moreover, due to the reduced sample size, the 95% uncertainty interval is wider than with a sample size of 1250.
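The widening of the uncertainty with fewer samples can be illustrated on the threshold itself, using the same idea as Thr_boostrap.m in the appendix (the variance of a bootstrapped quantile). The sketch below is hypothetical in its details: the number of replicates B and the sort-based quantile are assumptions made to keep it self-contained.

```matlab
% Sketch only: B and the sort-based quantile are assumed simplifications.
rng(4);
gamma = -0.3;  sigma = 4.4;  NEP = 0.6;  B = 500;
sizes = [1250 315];                    % the two sample sizes of this chapter
sd    = zeros(size(sizes));            % bootstrap std of the NEP quantile
for s = 1:numel(sizes)
    n = sizes(s);
    x = (sigma/gamma) * (rand(n,1).^(-gamma) - 1);   % GP sample at one site
    q = zeros(B, 1);
    for b = 1:B
        xb   = sort(x(randi(n, n, 1)));              % bootstrap resample, sorted
        q(b) = xb(ceil(NEP*n));                      % empirical NEP quantile
    end
    sd(s) = std(q);
end
```

Under these assumptions one would expect sd(2), for 315 samples, to be roughly twice sd(1), in line with the usual square-root dependence of the standard error on the sample size.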

In addition, the sensitivity to the smoothing parameter ($\alpha_u$) is presented. The GP parameters as functions of $\alpha_u$ for the locally-fit results and the MRF-GP model are plotted in Figures 6.5.3 and 6.5.4, respectively.


Fig. 6.5.3: GP parameters versus $\alpha_u$ for the locally fit model (sample size = 315)

Fig. 6.5.4: GP parameters versus $\alpha_u$ for the MRF-GP model (sample size = 315)

As the figures above illustrate, our model maintains good performance even when the sample size decreases, and remains clearly superior to the locally-fit results.


CHAPTER 7: CONCLUSION AND RECOMMENDATIONS

In this chapter, the objective and conclusions of our research work are summarized and my contribution to it is specified. Some limitations of the research are acknowledged and explained, followed by my plans for future work. Finally, my recommendations to future researchers in this field are presented. This section concludes the thesis.

7.1 Summary of the Contributions

The modeling of extreme or catastrophic events over a threshold has recently gained popularity and significance. Existing methods fit the extreme data to the Generalized Pareto (GP) distribution only locally, site by site. However, neighboring sites are clearly dependent. In this thesis, the author proposes a nonparametric method that handles the spatial dependence between neighboring sites using Gaussian Markov random fields.

The thesis starts with an introduction to extreme value and catastrophe modeling, followed by a detailed literature review that discusses the disadvantages and limitations of the existing approaches. Motivated by the severity of catastrophic events, the surging demand for extreme value knowledge in critical applications, and the drawbacks of the existing methods, the author proposes the new MRF-GP approach to model the peak-over-threshold (POT) distribution non-parametrically.

To prepare for the model construction and analysis, the essential preliminaries are elaborated, including the categorization of extreme value models, the concept of thin-membrane Gaussian graphical models, and estimation methods. The algorithm of the proposed MRF-GP model is then introduced at length, with several popular model optimization methods discussed and tested. In agreement with previous approaches, this method takes the marginal distribution at each individual site to be a Generalized Pareto distribution. It first assumes that the observed values are the underlying latent random variables corrupted by Gaussian noise with zero mean and unknown variance. It then penalizes the distributions using thin-membrane Gaussian graphical models, which capture the spatial dependence characterized by the Markov random field and are controlled by a set of smoothing parameters. However, the initial implementation results were disappointing and biased, leading to various sub-optimal solutions. After careful inspection of these failures, the author concludes that the issue largely results from the sensitivity of the results to the smoothing parameters and the difficulty of selecting them. Motivated by these findings, the author proposes another parameter selection method based on the expectation maximization (EM) approach.
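For reference, the iteration that EM_Smth.m in the appendix carries out can be written compactly as follows. The notation is introduced here for illustration and read off the code: $y$ is the vector of local estimates (val0), $R$ the diagonal matrix of their noise variances (Noise_Var), $J$ the thin-membrane matrix (Jp), $p$ the number of sites, and $\alpha$ the smoothing parameter.

```latex
\begin{aligned}
\hat{x}^{(k)} &= \left(\alpha^{(k)} J + R^{-1}\right)^{-1} R^{-1}\, y, \\
\alpha^{(k+1)} &= \frac{p}{\operatorname{tr}\!\left[J \left(\alpha^{(k)} J + R^{-1}\right)^{-1}\right]
                  + \hat{x}^{(k)\top} J\, \hat{x}^{(k)}},
\qquad \alpha^{(0)} = 0 .
\end{aligned}
```

In the code, the iteration stops once $|\alpha^{(k+1)} - \alpha^{(k)}| < 10^{-4}$ or the matrix $\alpha^{(k)} J + R^{-1}$ becomes numerically singular; in the latter case a constant surface is fitted instead.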

After the model is implemented with the EM algorithm, the results are recorded and discussed. They match the theoretical expectations well. Specifically, the gamma parameter at each individual site is almost constant regardless of the choice of threshold level. In addition, the sigma parameters behave almost linearly, with gradient approximately equal to the corresponding gamma. Compared with the locally-fit results, the proposed MRF-GP model shows smaller deviations and fluctuations from the expected values in all the tests performed. Furthermore, the performance remains satisfactory when the sample size is reasonably reduced.

In sum, the author achieved the objective of this research project on extreme event modeling and accomplished the following tasks independently:

• Reviewed a large amount of existing research on extreme value modeling and analyzed its pros and cons;

• Proposed a non-parametric model for catastrophic event prediction and inference based on the designed model construction criteria;

• Implemented the model using various approaches in MATLAB, followed by essential case studies and simulation results;

• Validated the simulation results to ensure their compatibility with the theory;

• Demonstrated the superiority of the suggested MRF-GP model with supporting analysis;

• Discussed its applications and recommendations for future research.

7.2 Recommendations for future works

We plan to apply this method to the real hindcast dataset of the Gulf of Mexico. As the dataset is temporarily unavailable, we have requested it from Jonathan$^{21}$. We will apply the model to these data as soon as they are available.

In future, the estimation precision can be enhanced in various ways. For instance, we will use the multiscale model suggested by Myung Jin Choi$^{22}$, which captures long-range dependency by introducing several coarser scales (Figure 7.2).

Figure 7.2: Multi-scale Gaussian graphical grid structure in a snapshot (3D visualization from MATLAB)

21 P. Northrop, P. Jonathan, “Modeling spatially-dependent non-stationary extremes with application to hurricane-induced wave heights”, Publisher: Department of Statistical Science, University College London.

22 M. J. Choi, A. S. Willsky, “Multiscale Gaussian graphical model and algorithms for large-scale inference”, Statistical Signal Processing, 2007. http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=4301199

REFERENCES

[1]. Philip Jonathan, Kevin Ewans (2008). Modeling the seasonality of extreme waves in the Gulf of Mexico. Shell Technology Centre Thornton and Shell International Exploration and Production. Proceedings of OMAE 2008, the 27th International Conference on Offshore Mechanics and Arctic Engineering.

[2]. Philip Jonathan, Kevin Ewans (2007). The effect of directionality on extreme wave design criteria. Shell Research Limited & Shell International Exploration and Production.

[3]. Balkema, A., and Laurens de Haan (1974). "Residual life time at great age", Annals of Probability, 2, 792–804.

[4]. Philip Jonathan, Kevin Ewans and George Forristall (2008). "Statistical estimation of extreme ocean environments: the requirement for modeling directionality and other covariate effects", Ocean Engineering 35 (2008) 1211-1225.

[5]. Paul Northrop and Philip Jonathan (2010). Modeling spatially-dependent non-stationary extremes with application to hurricane-induced wave heights.

[6]. Geisser, Seymour (1993). Predictive Inference. New York: Chapman and Hall. ISBN 0412034719.

[7]. Myung Jin Choi and Alan S. Willsky (2007). Multiscale Gaussian Graphical Models and Algorithms for large-scale inference. Massachusetts Institute of Technology, Electrical and Computer Science. 77 Massachusetts Ave., Cambridge, MA 02139, USA.

[8]. Philip Jonathan, Kevin Ewans (2008). Uncertainties in extreme wave height estimates for hurricane-dominated regions. OMAE-06-1067.

[9]. P. Jonathan, K. Ewans (2007). "Uncertainties in extreme wave height estimates for hurricane-dominated regions", Journal of Offshore Mechanics and Arctic Engineering, vol. 129/1, August, 2007.

[10]. Kin Y. Li (2000). Mathematical Excalibur. Volume 5, Number 4.

[11]. P. Northrop, P. Jonathan, "Modeling spatially-dependent non-stationary extremes with application to hurricane-induced wave heights", Publisher: Department of Statistical Science, University College London.

[12]. Lian Heng (2011). MAS453-Data mining: Session 4: Frequentist and Bayesian statistics, lecture notes, Nanyang Technological University, Singapore.

[13]. Caroline Keef, Jonathan Tawn and Cecilia Svensson (2009). Spatial risk assessment for extreme river flows. Appl. Statist. (2009) 58, Part 5, pp.

[14]. Caroline Keef, Jonathan Tawn and Cecilia Svensson (2009). Spatial dependence in extreme river flows and precipitation for Great Britain. Journal of Hydrology 378 (2009) 240-252.

[15]. Sean Borman (2004). The expectation maximization algorithm: a short tutorial.

[16]. Kevin Ewans and Philip Jonathan (2008). The effect of directionality on Northern North Sea extreme wave design criteria. Journal of Offshore Mechanics and Arctic Engineering, Nov 2008, Vol. 130 / 041604-1.

[17]. Philip Jonathan and Kevin Ewans (2009). A spatio-directional model for extreme waves in the Gulf of Mexico. Proceedings of OMAE 2009, the 28th International Conference on Offshore Mechanics and Arctic Engineering. 31 May - 4 June, 2009, Honolulu, U.S.A.

APPENDIX- LIST OF CODES

1. Main.m

clear; %clc; %matlabpool 4

%% read data
load synthetic_data;
XDat=PkHs(1:1250,:);
[n,p]=size(XDat);

%% predefine Prm
NEP=0.6; %0.4;%[0:0.01:0.1,0.12:0.02:0.24,0.26:0.04:0.3,0.35:0.05:0.6];
N=3000;                      %number of bootstrap replicates
alpha_u_array=0:25:500;      %threshold smoothing parameters to scan
Grid=reshape(1:78,13,6)';    %6 x 13 grid of site indices
Jp=thin_membrain(Grid);
Thrh_array=zeros(length(alpha_u_array),78);
Gammah_array=zeros(length(alpha_u_array),78);
Sigmah_array=zeros(length(alpha_u_array),78);
Gamma0_array=zeros(length(alpha_u_array),78);
Sigma0_array=zeros(length(alpha_u_array),78);
alpha_s=zeros(1,length(alpha_u_array));
alpha_g=zeros(1,length(alpha_u_array));

%% site averaging
%XDat=site_average(XDat0,Grid);
X=boostrap(XDat,N);
[Thr0,Noise_Var] = Thr_boostrap(XDat,NEP,X); %initial values and variances

%%
for i=1:length(alpha_u_array)
    alpha_u=alpha_u_array(i);
    %% smooth threshold
    %[Thrh,alpha_u]=EM_Smth(Thr0,Noise_Var,Jp);
    %options=optimset('MaxFunEvals',1e20,'MaxIter',1e20,'TolFun',1e-10,'TolProjCG',1e-10);
    Thrh=(((alpha_u*Jp+diag(Noise_Var.^-1))\diag(Noise_Var.^-1))*Thr0')';
    %Thrh=fminsearch(@(Prm)Smth_Post_func(Prm,Thr0,Noise_Var,alpha_u,Jp),ones(1,78),options);
    Thrh_array(i,:)=Thrh;

    %% smooth GP parameters
    [Gamma0,Sigma0,G_Var,S_Var] = GPPrm_boostrap (XDat,Thrh,N,X);
    %parfor j=1:p
    %[Gamma0_array(i,j),Sigma0_array(i,j)]=X_gpfit(XDat{j},Thrh);
    %end
    Gamma0_array(i,:)=Gamma0;
    Sigma0_array(i,:)=Sigma0;
    [Sigmah,alpha_s(i)]=EM_Smth(Sigma0,S_Var,Jp);
    [Gammah,alpha_g(i)]=EM_Smth(Gamma0,G_Var,Jp);
    Gammah_array(i,:)=Gammah;
    Sigmah_array(i,:)=Sigmah;
end

2. X_GpRnd.m

function y=X_GpRnd(n,Gmm,Sgm,Thr);
%function y=X_GpRnd(n,Gmm,Sgm,Thr);
%
%Philip Jonathan, Statistics & Chemometrics, Thornton
%Kevin Ewans, EP Projects, Rijswijk
%
%ShellX V1.R2.M1 20100912
%
%Generates random numbers from specified Generalised Pareto Distribution
%
%Input
%n   1 x 1 Number of random drawings
%Gmm 1 x 1 GaMMa value
%Sgm 1 x 1 SiGMa value
%Thr 1 x 1 THReshold value
%
%Output
%y   n x 1 array of GP random numbers
%
%History
%20100912 - V1.R2.M1

r=rand(n,1); %n is sample size

if Gmm==0
    y=-Sgm*log(r)+Thr; %manage Gmm=0: exponential limit of the GP
else
    y=(Sgm/Gmm)*(r.^(-Gmm)-1)+Thr; %random numbers from GP
end

%Normal completion
return;

3. X_gpfit.m

function [Gamma,Sigma]=X_gpfit(XDat,Thrh)

p=size(XDat,2);
Gamma=zeros(1,p);
Sigma=zeros(1,p);
for j=1:p
    tX=XDat(:,j)-Thrh(j);   %exceedances of the site threshold
    tX=tX(tX>0);
    tPrm=gpfit(tX);         %[shape scale] maximum likelihood estimates
    Gamma(j)=tPrm(1);
    Sigma(j)=tPrm(2);
end

4. Smth_Post_func.m

function f=Smth_Post_func(Prm,val0,NV,alpha,Jp)

f=(val0-Prm)*diag(NV.^-1)*(val0-Prm)'+Prm*alpha*Jp*Prm';

5. Post_func.m

function f=Post_func(Prm,val0,NV) %,Jp)

f=(val0-Prm*ones(1,78))*diag(NV.^-1)*(val0-Prm*ones(1,78))';%+Prm(1:end-1)*Prm(end)*Jp*Prm(1:end-1)'-78*log(Prm(end));

6. thin_membrain.m

function Jp=thin_membrain(Grid)
%Thin-membrane matrix on a rectangular grid: each pair of neighbouring
%sites gets -1, and each diagonal entry equals the number of neighbours
%(2 at corners, 3 on edges, 4 in the interior).

[m,n]=size(Grid);
Jp=zeros(m*n);
for i=1:m
    for j=1:n
        k=Grid(i,j);
        if i>1, Jp(k,Grid(i-1,j))=-1; end
        if i<m, Jp(k,Grid(i+1,j))=-1; end
        if j>1, Jp(k,Grid(i,j-1))=-1; end
        if j<n, Jp(k,Grid(i,j+1))=-1; end
        Jp(k,k)=-sum(Jp(k,:)); %number of neighbours of site k
    end
end

7. boostrap.m

function X = boostrap(XDat,N)

[n,p]=size(XDat);
X=zeros(n,p,N);

for i=1:N %2:N-1
    I=floor(rand(n,1)*n)+1;   %resample row indices with replacement
    I(I==n+1)=n;
    X(:,:,i)=XDat(I,:);
    %X(:,:,i+1)=XDat(I(l+1:2*l),:);
end

8. Thr_boostrap.m

function [Thr,Noise_Var] = Thr_boostrap (XDat,NEP,X)

N=size(X,3);
p=size(XDat,2);
Qx_bt=zeros(N,p);
Thr=quantile(XDat,NEP);

for i=1:N
    Qx_bt(i,:)=quantile(X(:,:,i),NEP);
end

Noise_Var=var(Qx_bt);

9. EM_Var.m

function [valh,varh]=EM_Var(val0,alpha,Jp)

p=size(Jp,1);
var0=1;

while 1
    x=(alpha*Jp+var0*eye(p))\val0'*var0;
    varh=p/(sum((val0'-x).^2)+trace(eye(p)/(alpha*Jp+var0*eye(p))));
    if abs(varh-var0)<1e-4
        break;
    else
        var0=varh;
    end
end

valh=x';

10. EM_Smth.m

function [valh,alphah]=EM_Smth(val0,Noise_Var,Jp)
p=size(Jp,1);
alpha0=0;
Rinv=diag(Noise_Var.^-1);
while 1
    x=((alpha0*Jp+Rinv)\Rinv)*val0';
    alphah=p/(trace(Jp/(alpha0*Jp+Rinv))+x'*Jp*x);
    if abs(alphah-alpha0)<1e-4 || rcond(alpha0*Jp+Rinv)<1e-16
        break;
    else
        alpha0=alphah;
    end
end
if rcond(alpha0*Jp+Rinv)>1e-16
    valh=x';
else
    options=optimset('MaxFunEvals',1e20,'MaxIter',1e20,'TolFun',1e-10,'TolProjCG',1e-10);
    Prmh=fminsearch(@(Prm)Post_func(Prm,val0,Noise_Var),1,options);
    valh=Prmh*ones(1,p);
    alphah=inf;
end
