

This is a set of slides made for a 15-minute presentation at the South African Statistical Association conference in November 2013.


Page 1: Introduction to Functional Data Analysis

An Introduction to Functional Data Analysis (FDA)

Rene Essomba, Sugnet Lubbe

Department of Statistical Sciences, University of Cape Town


November 2013

(Rene Essomba & Sugnet Lubbe) Functional Data Analysis November 2013 1 / 20

Page 2

Break-Down

To represent the data in ways that aid further analysis.

To display the data so as to highlight various characteristics.

To study important sources of pattern and variation among the data.

To explain variation in a dependent variable by using independent-variable information.

To compare two or more sets of data with respect to certain types of variation.

For illustration, the R packages fda and fda.usc are used.

Page 3

Overview

1 Introduction

2 Basis Representation: Fourier Basis, B-Splines

3 Summary Statistics for Functional Data: Functional means and variances, Covariance and correlation functions

4 Functional Principal Component Analysis (fPCA)

5 Functional Linear Regression Model (fLRM)

6 Some References

Page 4

Introduction

The Main Equation

$$Z_k(t_i) = X_k(t_i) + \varepsilon_k(t_i), \quad i = 1, \dots, n, \; k = 1, \dots, N$$

$Z_k(t_i)$ is the noisy observation of the k-th curve (e.g. the k-th weather station) at time $t_i$.

$X_k(t_i)$ is the value of the continuous underlying process for curve k.

$\varepsilon_k(t_i)$ is the error term.

N.B.: N denotes the number of observed curves, each recorded on the discrete grid $t_i$, $i = 1, \dots, n$.
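To make the model concrete, here is a minimal sketch in Python/NumPy (not code from the fda or fda.usc packages) that simulates N noisy curves on the Canadian-weather time grid. The shape of the smooth curves $X_k$ and the noise level `sigma` are hypothetical choices for illustration only; the function name `simulate_curves` is likewise an assumption.

```python
import numpy as np

def simulate_curves(N=35, n=365, sigma=0.5, seed=0):
    """Simulate Z_k(t_i) = X_k(t_i) + eps_k(t_i) on the grid t_i = 0.5, ..., n - 0.5.

    The smooth curves X_k are a hypothetical choice (a random level plus a
    random-amplitude annual cycle); they are NOT the Canadian weather data.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n) + 0.5                        # grid t_1, ..., t_n
    shift = rng.normal(0.0, 5.0, size=(N, 1))     # curve-specific level
    amp = rng.normal(10.0, 2.0, size=(N, 1))      # curve-specific amplitude
    X = shift - amp * np.cos(2 * np.pi * t / n)   # N x n matrix of X_k(t_i)
    eps = rng.normal(0.0, sigma, size=(N, n))     # i.i.d. measurement error
    Z = X + eps                                   # noisy observations Z_k(t_i)
    return t, X, Z

t, X, Z = simulate_curves()
print(Z.shape)  # (35, 365): one row per curve k, one column per time t_i
```

Storing the data as an N x n matrix (one row per curve) is the layout assumed by the other sketches on these slides.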

Page 5

Introduction

Example: Canadian weather data (temperature and precipitation)

Daily observations (i.e. $Z_k(t_i)$);

35 different weather stations (i.e. $k = 1, \dots, 35$);

observed at times $t_i = 0.5, 1.5, \dots, 364.5$ (so n = 365).

The observed pairs are therefore $(t_i, Z_k(t_i))$.

Figure: Plot of the raw data for the stations located in St. John's and Halifax

Page 6

Basis Representation

Example: The Canadian Weather (continued)

$X(t)$ is a continuous process, observed here at 365 discrete time points.

Goal: approximate it by a linear combination of K basis functions (0 < K < 365):

$$X(t) \approx \sum_{k=1}^{K} \theta_k \phi_k(t),$$

where the $\phi_k(t)$ are the basis functions and the $\theta_k$ are the coefficients.

Types of basis functions:

Fourier Basis

B-Splines

Remark: The number of basis functions K can be chosen by minimizing the generalized cross-validation (GCV) criterion.
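A minimal sketch of the basis fit and the GCV choice of K, written in Python/NumPy rather than R. Assumptions: a plain (unpenalized) least-squares fit, a Fourier basis, and a simulated periodic curve standing in for one station. For least squares the effective degrees of freedom equal K, which gives $\mathrm{GCV}(K) = n \cdot \mathrm{SSE} / (n - K)^2$. The helper names `fourier_design` and `fit_basis` are hypothetical, and this is not the fda package's smoothing code.

```python
import numpy as np

def fourier_design(t, K, T):
    """n x K matrix of Fourier basis values (K odd): constant, then sin/cos pairs."""
    w = 2 * np.pi / T
    cols = [np.full(len(t), 1 / np.sqrt(T))]
    for r in range(1, (K - 1) // 2 + 1):
        cols.append(np.sin(r * w * t) / np.sqrt(T / 2))
        cols.append(np.cos(r * w * t) / np.sqrt(T / 2))
    return np.column_stack(cols)

def fit_basis(t, z, K, T):
    """Least-squares coefficients theta and the GCV score of a K-term fit."""
    Phi = fourier_design(t, K, T)
    theta, *_ = np.linalg.lstsq(Phi, z, rcond=None)
    sse = float(np.sum((z - Phi @ theta) ** 2))
    n = len(t)
    gcv = n * sse / (n - K) ** 2   # df = trace of the hat matrix = K for plain least squares
    return theta, gcv

# Hypothetical noisy periodic curve on the daily grid (needs at least K = 5 terms).
rng = np.random.default_rng(1)
T = 365.0
t = np.arange(365) + 0.5
x_true = 5 - 12 * np.cos(2 * np.pi * t / T) + 2 * np.sin(4 * np.pi * t / T)
z = x_true + rng.normal(0, 2.0, size=t.size)

scores = {K: fit_basis(t, z, K, T)[1] for K in range(3, 22, 2)}
K_best = min(scores, key=scores.get)
print(K_best, scores[K_best])
```

Too few terms (K = 3 here) leaves systematic error in SSE, while too many terms shrink the denominator $(n - K)^2$; GCV balances the two.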

Page 7

Fourier Basis

Definition

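Useful for periodic data, the Fourier basis expansion consists of the following orthonormal functions:

$$\phi_0(t) = \frac{1}{\sqrt{T}}, \qquad \phi_{2r-1}(t) = \frac{\sin(r\omega t)}{\sqrt{T/2}}, \qquad \phi_{2r}(t) = \frac{\cos(r\omega t)}{\sqrt{T/2}},$$

with $r = 1, \dots, L/2$, where L is an even integer (so there are L + 1 basis functions in total). By default, the period T is the range of the discretization points t, and $\omega = 2\pi/T$.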

In R: create.fourier.basis(rangeval, nbasis,...) (fda package).
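The definition can be checked numerically. The sketch below (Python/NumPy, hypothetical function name `fourier_basis`, not the fda implementation) evaluates $\phi_0, \dots, \phi_L$ on a fine midpoint grid over one period and approximates the Gram matrix $\int \phi_j(t)\phi_k(t)\,dt$ by a Riemann sum; orthonormality means it should be the identity. T = 365 and L = 6 (7 functions, as in the next figure) are illustrative choices.

```python
import numpy as np

def fourier_basis(t, L, T):
    """Evaluate phi_0, ..., phi_L at points t (L even); returns shape (len(t), L + 1)."""
    assert L % 2 == 0, "L must be even"
    w = 2 * np.pi / T
    cols = [np.full(len(t), 1 / np.sqrt(T))]                # phi_0
    for r in range(1, L // 2 + 1):
        cols.append(np.sin(r * w * t) / np.sqrt(T / 2))     # phi_{2r-1}
        cols.append(np.cos(r * w * t) / np.sqrt(T / 2))     # phi_{2r}
    return np.column_stack(cols)

T = 365.0
m = 3650                               # fine midpoint grid for the integrals
dt = T / m
t = (np.arange(m) + 0.5) * dt
Phi = fourier_basis(t, L=6, T=T)       # 7 basis functions
gram = dt * Phi.T @ Phi                # approximates int phi_j(t) phi_k(t) dt
print(np.allclose(gram, np.eye(7), atol=1e-8))  # True
```

On an equally spaced grid covering a full period, the midpoint sum is exact for these low-frequency trigonometric products, so the check passes up to rounding error.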

Page 8

Fourier Basis

Figure: Fourier basis plot with 7 basis functions

Page 9

B-Splines

Definition

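Appropriate for non-periodic data.

Select a sequence of knots along the t-axis, $\tau_1 < \tau_2 < \dots < \tau_{L+2M}$, where M is the order of the spline. The basis functions are then defined recursively:

$$\phi_{k,m}(t) = \frac{t - \tau_k}{\tau_{k+m-1} - \tau_k}\,\phi_{k,m-1}(t) + \frac{\tau_{k+m} - t}{\tau_{k+m} - \tau_{k+1}}\,\phi_{k+1,m-1}(t)$$

for $k = 1, \dots, L + 2M - m$, starting from $\phi_{k,1}(t) = I_{[\tau_k, \tau_{k+1})}(t)$.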

In R: create.bspline.basis(rangeval,nbasis,norder,...) (fda package)
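The recursion above (the Cox-de Boor recursion) translates directly into code. This is a Python/NumPy sketch with a hypothetical function name `bspline_basis`, not the fda implementation. Two assumptions go beyond the slide: terms with a zero denominator are set to 0, and the usage example repeats each boundary knot M times, a common convention for splines on a closed interval that relaxes the strict inequalities stated above.

```python
import numpy as np

def bspline_basis(t, knots, order):
    """Evaluate all B-splines of the given order on knot sequence `knots` at points t.

    Starts from indicators on half-open intervals [tau_k, tau_{k+1}) and applies
    the recursion order - 1 times. Terms whose denominator is zero (repeated
    knots) are dropped, i.e. treated as 0.
    Returns an array of shape (len(t), len(knots) - order).
    """
    t = np.asarray(t, dtype=float)
    tau = np.asarray(knots, dtype=float)
    # Order 1: piecewise-constant indicator functions.
    B = np.array([(tau[k] <= t) & (t < tau[k + 1]) for k in range(len(tau) - 1)],
                 dtype=float).T
    for m in range(2, order + 1):
        nb = len(tau) - m                       # number of order-m functions
        new = np.zeros((len(t), nb))
        for k in range(nb):
            d1 = tau[k + m - 1] - tau[k]
            d2 = tau[k + m] - tau[k + 1]
            if d1 > 0:
                new[:, k] += (t - tau[k]) / d1 * B[:, k]
            if d2 > 0:
                new[:, k] += (tau[k + m] - t) / d2 * B[:, k + 1]
        B = new
    return B

# Cubic (order 4) B-splines on [0, 1] with interior knots 0.25, 0.5, 0.75.
knots = [0, 0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1, 1]
t = np.linspace(0, 1, 101)[:-1]                 # [0, 1): right endpoint excluded
B = bspline_basis(t, knots, order=4)
print(B.shape)                                  # (100, 7)
print(np.allclose(B.sum(axis=1), 1.0))          # True: partition of unity on [0, 1)
```

With 11 knots and order 4 there are 11 - 4 = 7 basis functions, i.e. the number of interior knots plus the order.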

Page 10

B-Splines

Figure: B-splines of order 4

Page 11

Basis Representation

Page 12

Summary Statistics

The usual tools for summarizing univariate data carry over to functional data; they are applied pointwise in t.

Definition

functional mean: $\bar X(t) = N^{-1} \sum_{i=1}^{N} X_i(t)$.

functional variance: $\mathrm{Var}(X(t)) = (N-1)^{-1} \sum_{i=1}^{N} \big(X_i(t) - \bar X(t)\big)^2$.

functional covariance:

$$\mathrm{Cov}(X(t), X(s)) = (N-1)^{-1} \sum_{i=1}^{N} \big(X_i(s) - \bar X(s)\big)\big(X_i(t) - \bar X(t)\big).$$

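In R: mean.fd and var.fd (fda package). Remark: the returned values are themselves functional objects (class fd for the mean; var.fd returns a bivariate bifd object holding the covariance surface), and the fda.usc equivalents return objects of class fdata.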
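When all curves are evaluated on a common grid, the definitions above reduce to column-wise matrix operations. The Python/NumPy sketch below (hypothetical function name `functional_summaries`, not the fda code, which works on basis coefficients) also computes the correlation function shown on a later slide, using the standard definition $\mathrm{Corr}(X(t), X(s)) = \mathrm{Cov}(X(t), X(s)) / \sqrt{\mathrm{Var}(X(t))\,\mathrm{Var}(X(s))}$. The test curves are simulated and purely illustrative.

```python
import numpy as np

def functional_summaries(Xgrid):
    """Pointwise mean, variance, covariance and correlation of curves on a common grid.

    Xgrid: N x n array whose row i holds X_i(t_1), ..., X_i(t_n).
    """
    N = Xgrid.shape[0]
    mean = Xgrid.mean(axis=0)                  # bar X(t_j)
    Xc = Xgrid - mean                          # X_i(t_j) - bar X(t_j)
    var = (Xc ** 2).sum(axis=0) / (N - 1)      # Var(X(t_j))
    cov = Xc.T @ Xc / (N - 1)                  # n x n matrix Cov(X(t_j), X(t_l))
    sd = np.sqrt(var)
    corr = cov / np.outer(sd, sd)              # n x n matrix Corr(X(t_j), X(t_l))
    return mean, var, cov, corr

# Hypothetical curves: random level + random-amplitude sine + small noise.
rng = np.random.default_rng(2)
tg = np.linspace(0, 1, 50)
Xgrid = (rng.normal(size=(35, 1))
         + rng.normal(size=(35, 1)) * np.sin(2 * np.pi * tg)
         + 0.1 * rng.normal(size=(35, 50)))
mean, var, cov, corr = functional_summaries(Xgrid)
```

The diagonal of the covariance matrix is the variance function, and the diagonal of the correlation matrix is identically 1.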

Page 13

Mean Function and Standard Deviation

Figure: Mean temperature and standard deviation

Page 14

Correlation Function

Figure: Temperature correlation function

Page 15

Functional Principal Component Analysis (fPCA)

Used primarily as a dimension-reduction tool, fPCA identifies the main modes of variation in a set of curves.

Algorithm

1 Find the function $\xi_1(t)$ of norm 1 (i.e. $\int \xi_1^2(t)\,dt = 1$) such that $N^{-1} \sum_i f_{i1}^2$ is maximized, where $f_{i1} = \int \xi_1(t) X_i^c(t)\,dt$.

2 On the m-th step (m > 1), compute $\xi_m(t)$ in the same way, subject to the orthogonality constraints $\int \xi_m(t)\xi_k(t)\,dt = 0$ for $k < m$.

The functional data are then approximated by $\hat X_i(t) = \bar X(t) + \sum_{k=1}^{M} f_{ik}\,\hat\xi_k(t)$, where $f_{ik} = \int \hat\xi_k(t) X_i^c(t)\,dt$ and $X_i^c(t) = X_i(t) - \bar X(t)$.

fPCA in R: fdata2pc(fdataobj, ncomp, ...) (fda.usc package)
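On an equally spaced grid with spacing dt, the integrals become sums, and steps 1 and 2 reduce to a singular value decomposition of the centred data matrix. The Python/NumPy sketch below (hypothetical function name `fpca_grid`; this is not fda.usc's `fdata2pc`) rescales the right singular vectors so that $\int \xi_k^2(t)\,dt \approx 1$ and computes the scores $f_{ik}$ with the same Riemann sum. The percentages are each component's share of total variance, comparable in spirit to those printed by fdata2pc, though that package's exact computation may differ. The test data are simulated, not the Canadian temperatures.

```python
import numpy as np

def fpca_grid(Xgrid, dt, ncomp):
    """Discretized fPCA for curves (rows of Xgrid) on an equally spaced grid.

    Returns the mean curve, eigenfunctions xi (ncomp x n, with sum(xi**2) * dt = 1),
    scores (N x ncomp) and the percentage of variance explained by each component.
    """
    mean = Xgrid.mean(axis=0)
    Xc = Xgrid - mean                             # centred curves X_i^c(t_j)
    # Right singular vectors are orthonormal in R^n; dividing by sqrt(dt)
    # gives functions with unit L2 norm under the Riemann sum.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    xi = Vt[:ncomp] / np.sqrt(dt)
    scores = Xc @ xi.T * dt                       # f_ik ~= int xi_k(t) X_i^c(t) dt
    pct = 100 * s ** 2 / np.sum(s ** 2)
    return mean, xi, scores, pct[:ncomp]

# Hypothetical daily curves with two dominant modes plus noise.
rng = np.random.default_rng(4)
n, dt = 365, 1.0
t = np.arange(n) + 0.5
psi1, psi2 = np.cos(2 * np.pi * t / n), np.sin(2 * np.pi * t / n)
Xgrid = (3 + rng.normal(0, 8, (35, 1)) * psi1 + rng.normal(0, 2, (35, 1)) * psi2
         + rng.normal(0, 0.5, (35, n)))
mean, xi, scores, pct = fpca_grid(Xgrid, dt, ncomp=3)
print(np.round(pct, 2))
```

Keeping all components reconstructs the data exactly as $\bar X(t) + \sum_k f_{ik}\xi_k(t)$; truncating at M components gives the approximation $\hat X_i(t)$.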

Page 16

Functional Principal Component Analysis (fPCA)

Example (Canadian Weather)

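R> library(fda.usc)
R> # tempdat.fdata: the 35 daily temperature curves as an fdata object (construction not shown)
R> temp.svd <- fdata2pc(tempdat.fdata, ncomp = 3)
R> norm.fdata(temp.svd$rotation[1:2])
          [,1]
[1,] 0.9976567
[2,] 0.9980333
# With 3 components that explained 98.56% of the variability of explicative variables.
# Variability for each component (%): PC1 88.03 PC2 8.47 PC3 2.06

The norms of the first two loading functions are close to 1, consistent with the unit-norm constraint, and the three components together explain 88.03 + 8.47 + 2.06 = 98.56% of the variability.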

Page 17

Functional Principal Component Analysis (fPCA)

Figure: Loadings for PC1 and PC2

Page 18

Functional Linear Regression Model (fLRM)

Consider the following functional linear regression models:

Functional response with multivariate covariates:

$$y_i(t) = \beta_1(t)x_{i1} + \cdots + \beta_p(t)x_{ip} + \varepsilon_i(t), \quad i = 1, \dots, N.$$

Scalar response with functional covariates:

$$y_i = \alpha + \int_0^T \sum_{j=1}^{p} \beta_j(s)x_{ij}(s)\,ds + \varepsilon_i, \quad i = 1, \dots, N, \; s \in [0, T].$$

Functional response with functional covariates:

$$y_i(t) = \alpha(t) + \int_0^T \sum_{j=1}^{p} \beta_j(t, s)x_{ij}(s)\,ds + \varepsilon_i(t), \quad i = 1, \dots, N, \; s \in [0, T].$$
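One common way to fit the scalar-response model (with p = 1) is to expand $\beta(s) = \sum_l b_l \psi_l(s)$ in a small basis; the integral then becomes $\sum_l b_l \int \psi_l(s) x_i(s)\,ds$, and the model is an ordinary linear regression in the coefficients $b_l$. The Python/NumPy sketch below takes that approach under stated assumptions: a 5-term Fourier basis for β, a midpoint Riemann sum for the integral, no roughness penalty, and simulated noiseless data. The function names are hypothetical; this is not the fda or fda.usc regression code.

```python
import numpy as np

def fourier_design(s, K, T):
    """n x K Fourier basis matrix (K odd) on [0, T]."""
    w = 2 * np.pi / T
    cols = [np.full(len(s), 1 / np.sqrt(T))]
    for r in range(1, (K - 1) // 2 + 1):
        cols.append(np.sin(r * w * s) / np.sqrt(T / 2))
        cols.append(np.cos(r * w * s) / np.sqrt(T / 2))
    return np.column_stack(cols)

def fit_scalar_on_function(y, Xgrid, s, T, K=5):
    """Fit y_i = alpha + int_0^T beta(s) x_i(s) ds + eps_i with beta = sum_l b_l psi_l.

    Returns alpha, the basis coefficients b, and beta evaluated on the grid s.
    """
    ds = s[1] - s[0]
    Psi = fourier_design(s, K, T)                # n x K basis for beta
    Zmat = ds * Xgrid @ Psi                      # Z[i, l] ~= int psi_l(s) x_i(s) ds
    D = np.column_stack([np.ones(len(y)), Zmat]) # intercept column for alpha
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    alpha, b = coef[0], coef[1:]
    return alpha, b, Psi @ b

# Hypothetical data: 60 random curves on [0, 1] and a known beta.
rng = np.random.default_rng(3)
T, n, N = 1.0, 200, 60
s = (np.arange(n) + 0.5) / n
ds = s[1] - s[0]
Xgrid = (fourier_design(s, 11, T) @ rng.normal(size=(11, N))).T   # N x n
b_true = np.array([0.5, 1.0, -2.0, 0.0, 0.7])
y = 1.5 + ds * Xgrid @ (fourier_design(s, 5, T) @ b_true)         # noiseless responses
alpha, b, beta_hat = fit_scalar_on_function(y, Xgrid, s, T)
```

Because the simulated responses are noiseless and β lies in the chosen basis, the fit recovers α = 1.5 and the true coefficients up to rounding; with real data, the number of basis terms (or a penalty) controls the smoothness of the estimated β.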

Page 19

Useful References

M. Febrero-Bande and M. O. De La Fuente (2012). Statistical Computing in Functional Data Analysis: The R Package fda.usc.

J. O. Ramsay, G. Hooker and S. Graves (2009). Functional Data Analysis with R and MATLAB.

T. Hastie, R. Tibshirani and J. Friedman (2009). The Elements of Statistical Learning.

J. O. Ramsay and B. W. Silverman (2005). Functional Data Analysis.

C. de Boor (1978). A Practical Guide to Splines. Springer-Verlag, New York Heidelberg Berlin.
