introduction to functional data analysis

DESCRIPTION

This is a set of slides made for a 15-minute presentation at the South African Statistical Association conference in November 2013.

TRANSCRIPT

An Introduction to Functional Data Analysis (FDA)

Rene Essomba, Sugnet Lubbe

Department of Statistical Sciences, University of Cape Town

franckess48@gmail.com

November 2013

(Rene Essomba & Sugnet Lubbe) Functional Data Analysis November 2013 1 / 20

Break-Down

To represent the data in ways that aid further analysis.

To display the data so as to highlight various characteristics.

To study important sources of pattern and variation among the data.

To explain variation in a dependent variable by using independent-variable information.

To compare two or more sets of data with respect to certain types of variation.

For illustration, the R-packages fda and fda.usc will be used.

Overview

1 Introduction

2 Basis Representation

  Fourier Basis

  B-Splines

3 Summary Statistics for functional data

  Functional means and variances

  Covariance and Correlation functions

4 Functional Principal Component Analysis (fPCA)

5 Functional Linear Regression Model (fLRM)

6 Some References

Introduction

The Main Equation

$$Z_k(t_i) = X(t_i) + \varepsilon(t_i), \quad i = 1, \dots, n, \quad k = 1, \dots, N$$

$Z_k(t_i)$ is the noisy observation from the $k$-th cluster.

$X(t_i)$ is the value of a continuous underlying process.

$\varepsilon(t_i)$ is the error term.

N.B.: $N$ denotes the number of observed curves on a discrete grid $(t_i,\ i = 1, \dots, n)$.
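
To make the notation concrete, the following is a minimal R sketch that simulates data from the model above. The sine-shaped underlying process, the noise level and the sizes n and N are illustrative assumptions, not values from the talk.

```r
set.seed(42)

n <- 100                          # grid points per curve (assumed)
N <- 5                            # number of observed curves (assumed)
t <- seq(0, 1, length.out = n)    # discrete grid t_1, ..., t_n

X   <- sin(2 * pi * t)            # assumed smooth underlying process X(t_i)
eps <- matrix(rnorm(n * N, sd = 0.2), nrow = n, ncol = N)  # error terms

# Z[i, k] = X(t_i) + error; the vector X is recycled down each column
Z <- X + eps

dim(Z)                            # n rows (grid points), N columns (curves)
```

Calling matplot(t, Z) on the result would show the noisy curves scattered around the smooth process.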

Introduction

Example: The Canadian Weather (temperature and precipitation)

Daily observations (i.e. $Z_k(t_i)$);

35 different weather stations (i.e. $k = 1, \dots, 35$);

observed at times $t_i = 0.5, \dots, 364.5$.

Therefore, our observed pairs will be $(t_i, Z_k(t_i))$.

Figure: Plot of the raw data for the stations located in Saint Johns & Halifax.

Basis Representation

Example: The Canadian Weather (continued)

$X(t)$: continuous process observed at 365 discrete time points $t_i$.

Find a linear combination of $K$ basis functions ($0 < K < 365$):

$$X(t) \approx \sum_{k=1}^{K} \theta_k \phi_k(t),$$

with $\phi_k(t)$ as basis functions and $\theta_k$ as the coefficients.

Types of basis functions:

Fourier Basis

B-Splines

Remark: The optimal number of basis functions is determined by using a generalized cross-validation (GCV) criterion.
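
As a rough illustration of choosing K by GCV, the sketch below fits the coefficients by ordinary least squares for several values of K and keeps the one with the smallest score. It assumes an unpenalised fit, for which a common form of the criterion is GCV(K) = n SSE / (n - K)^2. The simulated curve and the helper names fourier_design and gcv are hypothetical, and the basis is left unnormalised because scaling does not change the fitted values.

```r
set.seed(1)

t <- seq(0.5, 364.5, by = 1)      # same daily grid as the weather example
n <- length(t)

# Assumed smooth signal plus noise, standing in for one station's record
x_true <- 5 - 15 * cos(2 * pi * t / 365) + 3 * sin(4 * pi * t / 365)
z <- x_true + rnorm(n, sd = 2)

# Design matrix with K columns (K odd): constant, then sine/cosine pairs
fourier_design <- function(t, K, period = 365) {
  Phi <- matrix(1, nrow = length(t), ncol = K)
  for (r in seq_len((K - 1) / 2)) {
    Phi[, 2 * r]     <- sin(2 * pi * r * t / period)
    Phi[, 2 * r + 1] <- cos(2 * pi * r * t / period)
  }
  Phi
}

# GCV score of the least-squares fit with K basis functions
gcv <- function(K) {
  Phi   <- fourier_design(t, K)
  theta <- qr.solve(Phi, z)       # least-squares coefficients theta_k
  sse   <- sum((z - Phi %*% theta)^2)
  n * sse / (n - K)^2
}

Ks     <- seq(1, 21, by = 2)
scores <- sapply(Ks, gcv)
K_best <- Ks[which.min(scores)]   # number of basis functions with smallest GCV
```

Plotting scores against Ks typically shows a sharp drop once the basis is rich enough to capture the signal, followed by a slow rise as extra functions start fitting noise.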

Fourier Basis

Definition

Useful for periodic data, the Fourier basis expansion is composed of the following orthonormal functions:

$$\phi_0(t) = \frac{1}{\sqrt{T}}, \quad \phi_{2r-1}(t) = \frac{\sin(r\omega t)}{\sqrt{T/2}}, \quad \phi_{2r}(t) = \frac{\cos(r\omega t)}{\sqrt{T/2}},$$

with $r = 1, \dots, L/2$, where $L$ is an even integer. The period $T$ is by default the range of the discretization points $t$, and $\omega = 2\pi/T$.

In R: create.fourier.basis(rangeval, nbasis,...) (fda package).
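
The orthonormality claim can be checked numerically. The sketch below builds the basis functions by hand in base R (the helper fourier_basis is hypothetical and is not the fda implementation) and approximates the inner products by a Riemann sum on the daily grid of the weather example, assuming a period of 365.

```r
# Evaluate phi_0, phi_1, ..., phi_{nbasis - 1} on the grid t (nbasis odd)
fourier_basis <- function(t, nbasis, period) {
  omega <- 2 * pi / period
  Phi <- matrix(1 / sqrt(period), nrow = length(t), ncol = nbasis)
  for (r in seq_len((nbasis - 1) / 2)) {
    Phi[, 2 * r]     <- sin(r * omega * t) / sqrt(period / 2)  # phi_{2r-1}
    Phi[, 2 * r + 1] <- cos(r * omega * t) / sqrt(period / 2)  # phi_{2r}
  }
  Phi
}

period <- 365
t <- seq(0.5, 364.5, by = 1)      # midpoints of the 365 days
h <- 1                            # grid spacing

Phi <- fourier_basis(t, nbasis = 7, period = period)

# Gram matrix of approximate inner products; close to the 7 x 7 identity
gram <- crossprod(Phi) * h
max(abs(gram - diag(7)))
```

The reported deviation from the identity matrix should be at the level of floating-point rounding, since the midpoint grid covers exactly one period.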

Fourier Basis

Figure : Fourier Basis plot with 7 basis functions

B-Splines

Definition

Appropriate for non-periodic data.

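Select a series of knots along the $t$-axis, $\tau_1 < \tau_2 < \dots < \tau_{L+2M}$, where $M$ is the order of the spline;

$$\phi_{k,m}(t) = \frac{t - \tau_k}{\tau_{k+m-1} - \tau_k}\,\phi_{k,m-1}(t) + \frac{\tau_{k+m} - t}{\tau_{k+m} - \tau_{k+1}}\,\phi_{k+1,m-1}(t)$$

for $k = 1, \dots, L + 2M - m$, and $\phi_{k,1}(t) = I_{[\tau_k, \tau_{k+1})}(t)$.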

In R: create.bspline.basis(rangeval,nbasis,norder,...) (fda package)
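
The recursion translates almost literally into code. The sketch below is a hand-rolled base R version (the function bspline and the knot sequence 0, 1, ..., 10 are assumptions for illustration, not the fda implementation). It takes the order-1 indicator on the half-open interval from tau_k up to but excluding tau_{k+1}, so that each t falls in exactly one interval, and checks the partition-of-unity property on the part of the axis where all order-4 functions are available.

```r
# phi_{k,m}(t) by the recursion; tau is the full knot vector
bspline <- function(t, k, m, tau) {
  if (m == 1) {
    return(as.numeric(tau[k] <= t & t < tau[k + 1]))
  }
  left  <- (t - tau[k]) / (tau[k + m - 1] - tau[k]) *
    bspline(t, k, m - 1, tau)
  right <- (tau[k + m] - t) / (tau[k + m] - tau[k + 1]) *
    bspline(t, k + 1, m - 1, tau)
  left + right
}

tau <- 0:10                       # 11 distinct knots (assumed)
M   <- 4                          # order 4, i.e. cubic pieces
x   <- seq(3.05, 6.95, by = 0.1)  # points inside [tau_M, tau_{11 - M + 1}]

# One column per basis function phi_{k,M}, k = 1, ..., length(tau) - M
B <- sapply(seq_len(length(tau) - M), function(k) bspline(x, k, M, tau))

range(rowSums(B))                 # both ends should be 1 up to rounding
```

With strictly increasing knots no denominator is zero. Repeated boundary knots, as commonly used in practice, would additionally need the convention that a term with a zero denominator is dropped.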

B-Splines

Figure : B-Splines of order 4

Basis Representation

Summary Statistics

The usual tools for summarizing data in a univariate context remain the same for functional data, applied pointwise in $t$.

Definition

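functional mean: $\bar{X}(t) = N^{-1} \sum_{i=1}^{N} X_i(t)$.

functional variance: $\mathrm{Var}(X(t)) = (N-1)^{-1} \sum_{i=1}^{N} \left(X_i(t) - \bar{X}(t)\right)^2$.

functional covariance:

$$\mathrm{Cov}(X(t), X(s)) = (N-1)^{-1} \sum_{i=1}^{N} \left\{ \left(X_i(s) - \bar{X}(s)\right)\left(X_i(t) - \bar{X}(t)\right) \right\}.$$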

In R: mean.fd & var.fd (fda package).

Remark: The values returned will also be objects of class fd & fdata.
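
On curves stored as a matrix of grid evaluations, these definitions reduce to row-wise statistics. The following base R sketch uses simulated curves (an assumption) with one column per curve. It mirrors the formulas above rather than the fda functions, which operate on basis coefficients.

```r
set.seed(7)

n <- 50                           # grid points (assumed)
N <- 20                           # number of curves (assumed)
t <- seq(0, 1, length.out = n)

# Simulated curves: rows are grid points, columns are curves X_1, ..., X_N
X <- sapply(seq_len(N), function(i) {
  rnorm(1) * sin(2 * pi * t) + rnorm(1) * cos(2 * pi * t)
})

x_bar <- rowMeans(X)              # functional mean at each t
x_var <- apply(X, 1, var)         # functional variance, divisor N - 1
x_cov <- cov(t(X))                # n x n matrix of Cov(X(t), X(s))
x_cor <- cov2cor(x_cov)           # correlation function on the grid
```

The diagonal of x_cov reproduces x_var, and a call such as image(t, t, x_cor) gives a picture comparable to a correlation-function plot.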

Mean Function and Standard Deviation

Figure : Mean temperature and standard deviation

Correlation Function

Figure : Temperature Correlation Function

Functional Principal Component Analysis (fPCA)

Primarily used as a tool for dimension reduction, fPCA is designed to explain the sources of variation within the functional data.

Algorithm

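1 Find the function $\xi_1(t)$ of norm 1 (i.e. $\int \xi_1^2(t)\,dt = 1$) such that $N^{-1} \sum_i f_{i1}^2$ is maximized, with $f_{i1} = \int \xi_1(t) X_i^c(t)\,dt$.

2 On the $m$th step ($m > 1$), compute $\xi_m(t)$, again of norm 1 and maximizing $N^{-1} \sum_i f_{im}^2$, under the additional orthogonality constraint(s): $\int \xi_m(t)\xi_k(t)\,dt = 0$, for $k < m$.

The functional data will therefore be approximated by $\hat{X}_i(t) = \sum_{k=1}^{M} f_{ik}\,\hat{\xi}_k(t)$, where $f_{ik} = \int \xi_k(t) X_i^c(t)\,dt$ with $X_i^c(t) = X_i(t) - \bar{X}(t)$.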

fPCA in R: fdata2pc(fdataobj, ncomp,...) (fda.usc package)
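
On a regular grid, the algorithm can be approximated with a singular value decomposition of the centred data matrix. The sketch below is a base R illustration under that discretisation assumption, with simulated curves. It is not the fda.usc implementation. Integrals are replaced by Riemann sums with spacing h, and the mean is added back after the truncated expansion to return to the original scale, which is an assumption about how the expansion is meant to be read.

```r
set.seed(3)

n <- 101                          # grid points (assumed)
N <- 30                           # number of curves (assumed)
t <- seq(0, 1, length.out = n)
h <- t[2] - t[1]                  # grid spacing used in the Riemann sums

# Simulated curves: rows are grid points, columns are curves
X <- sapply(seq_len(N), function(i) {
  1 + 2 * rnorm(1) * sin(2 * pi * t) + 0.5 * rnorm(1) * cos(2 * pi * t)
})

x_bar <- rowMeans(X)
Xc    <- X - x_bar                # centred curves X_i^c(t)

sv <- svd(Xc)
xi <- sv$u / sqrt(h)              # rescaled so that h * sum(xi_k^2) = 1
explained <- sv$d^2 / sum(sv$d^2) # share of variability per component

M <- 2
# scores[i, k] approximates the integral of xi_k(t) * X_i^c(t)
scores <- h * crossprod(Xc, xi[, 1:M])

# Truncated expansion of the centred curves, with the mean added back
X_hat <- x_bar + xi[, 1:M] %*% t(scores)

round(100 * explained[1:M], 2)
```

Because the simulated curves vary along only two directions, two components reproduce them up to rounding. Real data would leave a residual that shrinks as M grows.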

Functional Principal Component Analysis (fPCA)

Example (Canadian Weather)

R> temp.svd <- fdata2pc(tempdat.fdata, ncomp=3)

R> norm.fdata(temp.svd$rotation[1:2])

[,1]

[1,] 0.9976567

[2,] 0.9980333

# With 3 components that explained 98.56% of the variability of explicative variables.

# Variability for each component (%): PC1 88.03 PC2 8.47 PC3 2.06

Functional Principal Component Analysis (fPCA)

Figure : Loadings for PC1 & PC2

Functional Linear Regression Model (fLRM)

Consider the following functional linear regression models:

Functional response with multivariate covariates:

$$y_i(t) = \beta_1(t)\,x_{i1} + \dots + \beta_p(t)\,x_{ip} + \varepsilon_i(t); \quad i = 1, \dots, N$$

Scalar response with functional covariates:

$$y_i = \alpha + \int_0^T \sum_{j=1}^{p} \beta_j(s)\,x_{ij}(s)\,ds + \varepsilon_i; \quad i = 1, \dots, N; \quad s \in [0, T].$$

Functional response with functional covariates:

$$y_i(t) = \alpha(t) + \int_0^T \sum_{j=1}^{p} \beta_j(t, s)\,x_{ij}(s)\,ds + \varepsilon_i(t); \quad i = 1, \dots, N; \quad s \in [0, T].$$
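
For the scalar-response model with a single functional covariate (p = 1), one simple way to estimate the coefficient function is to expand it in a small basis, which turns the integral into an ordinary regression on a few integrated covariates. The base R sketch below illustrates this idea on simulated data with T = 1. The three-function Fourier basis, the true coefficient function and all sizes are assumptions, and this is not the fitting routine of any particular package.

```r
set.seed(11)

n <- 200                          # grid points on [0, 1] (assumed)
N <- 150                          # number of observations (assumed)
s <- (seq_len(n) - 0.5) / n       # midpoint grid
h <- 1 / n                        # grid spacing for the Riemann sums

# Simulated functional covariates: rows are curves x_i(s)
full <- cbind(1, sin(2 * pi * s), cos(2 * pi * s),
              sin(4 * pi * s), cos(4 * pi * s))
X <- matrix(rnorm(N * 5), N, 5) %*% t(full)

# Assumed true model: alpha = 1, beta(s) = 2 sin(2 pi s)
beta_true <- 2 * sin(2 * pi * s)
y <- 1 + h * drop(X %*% beta_true) + rnorm(N, sd = 0.05)

# Expand beta(s) = sum_k b_k phi_k(s) in a small basis
basis <- cbind(1, sin(2 * pi * s), cos(2 * pi * s))

# W[i, k] approximates the integral of phi_k(s) * x_i(s)
W <- h * X %*% basis

fit      <- lm(y ~ W)             # intercept estimates alpha
beta_hat <- drop(basis %*% coef(fit)[-1])

max(abs(beta_hat - beta_true))    # small if the basis can represent beta
```

The quality of the estimate depends on the chosen basis for the coefficient function. A basis that cannot represent it would bias the estimate regardless of the sample size.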

Useful References

M. Febrero-Bande, M. O. De La Fuente (2012)

Statistical Computing in Functional Data Analysis: The R Package fda.usc.

J. O. Ramsay, G. Hooker and S. Graves (2009)

Functional Data Analysis in R and Matlab.

T. Hastie, R. Tibshirani, J. Friedman (2009)

The Elements of Statistical Learning.

J. O. Ramsay and B. W. Silverman (2005)

Functional Data Analysis.

C. de Boor (1978)

A Practical Guide to Splines. Springer-Verlag, New York Heidelberg Berlin.
