Functional Data Analysis, Lecture 4 & 5

Mathematical foundation & Exploratory Data Analysis

May 8, 2018


Outline


Usual linear regression model

$$y = X\alpha + \varepsilon$$

Main assumption: $\varepsilon \sim N(0, \Sigma)$; mostly, we assume that $\Sigma = I$.

In FDA, the data is usually a realisation of a stochastic process rather than of a random variable. Therefore, we are mostly interested in

$$y(t) = f(t) + \varepsilon(t),$$

where we wish to estimate $f(t)$, given the observation $y(t)$.
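A minimal sketch of this setting on a discrete grid. The choice $f(t) = \sin(2\pi t)$, the grid size, the i.i.d. Gaussian noise and the moving-average estimator are all illustrative assumptions, not part of the lecture; the point is only that we observe $y(t_j)$ and try to recover $f$.

```python
import math
import random

def simulate_observations(n_points=200, noise_sd=0.3, seed=0):
    """Return the grid t, the signal f(t) and noisy observations y(t) = f(t) + eps(t)."""
    rng = random.Random(seed)
    t = [j / (n_points - 1) for j in range(n_points)]
    f = [math.sin(2 * math.pi * tj) for tj in t]       # assumed "true" f
    y = [fj + rng.gauss(0.0, noise_sd) for fj in f]    # i.i.d. Gaussian noise (assumption)
    return t, f, y

def moving_average(y, half_width=5):
    """Naive local-average estimate of f; windows are truncated at the boundaries."""
    n = len(y)
    est = []
    for j in range(n):
        lo, hi = max(0, j - half_width), min(n, j + half_width + 1)
        est.append(sum(y[lo:hi]) / (hi - lo))
    return est

def mse(u, v):
    """Mean squared difference between two curves on the same grid."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

t, f, y = simulate_observations()
f_hat = moving_average(y)
print("MSE of raw observations :", round(mse(y, f), 4))
print("MSE of smoothed estimate:", round(mse(f_hat, f), 4))
```

The moving average stands in for any smoother; the later slides replace such ad hoc choices with assumptions on the function space and the basis.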


Comparing the finite and the infinite dimensional models

Usual regression
- Mostly linear, or the functional form of the dependence is known.
- Estimation of the coefficients relies on the distributional assumptions on the noise.

FDA
- The functional form of the dependence is not known, so we need some assumptions: the functional space and the basis.
- What should be the distribution of the noise?

We know how to define the mean and (co)variance of random vectors, but now we need to define the same for infinite-dimensional random elements.


Outline


The simplest model for the data space is a (separable) Hilbert space: a collection of elements with infinitely many (but countably many) basis elements, on which one can define an inner product between any pair of elements.

For example:

$$\ell^2 = \Big\{ (a_1, a_2, a_3, \ldots) : a_i \in \mathbb{R}, \ \text{and} \ \sum_{i \ge 1} a_i^2 < \infty \Big\}.$$

This looks similar to $\mathbb{R}^n$, and we know how to define the standard normal on $\mathbb{R}^n$. So what about a standard normal on $\ell^2$?

Let us recall how to characterise the standard normal distribution on $\mathbb{R}^n$:

- the density is given by $(2\pi)^{-n/2} \exp\left(-\|x\|^2 / 2\right)$, or
- all linear combinations are normal with the appropriate mean and variance.


Density?

On $\mathbb{R}^n$ the density is defined with respect to the Lebesgue measure, which has no counterpart once the dimension of the space becomes infinite.


Linear combinations?

This characterisation adapts even to infinite dimensions. We just need to know what kind of linear combinations should be admissible.

One can define a Gaussian distribution on $\ell^2$ with mean $m$ and covariance $C$ as long as $m \in \ell^2$ and $C$ is a linear operator¹ on $\ell^2$ such that

$$\sum_{i \ge 1} \langle C e_i, e_i \rangle < \infty,$$

where $\{e_i\}$ is an orthonormal basis of $\ell^2$. But why?

¹ Discuss. Compare with matrices.


A quick explanation

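Let us consider $X = (X_1, X_2, \ldots)$, a candidate $\ell^2$-valued random variable distributed as a standard Gaussian, meaning that $\{X_i\}_{i \ge 1}$ are i.i.d. standard normal. This implies that $C = I$ on $\ell^2$. Clearly,

$$\sum_{i \ge 1} \langle I e_i, e_i \rangle = \infty.$$

What is the magnitude/size of such a Gaussian element?

$$E\big(\|X\|^2\big) = E\Big(\sum_{i \ge 1} X_i^2\Big) = \infty.$$

What about $\|X\|^2$ itself, rather than its expectation? $\|X\|^2 = \infty$ almost surely, so $X$ does not take values in $\ell^2$. One can actually show that $\|X\|^2 < \infty$ almost surely whenever $C$ is trace class.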
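A small numerical illustration of the trace condition. It assumes a diagonal covariance, so that $X_i = \sqrt{\lambda_i}\, Z_i$ with $Z_i$ i.i.d. standard normal, and compares $\lambda_i = 1$ (the case $C = I$) with the hypothetical trace-class choice $\lambda_i = 1/i^2$; neither the eigenvalue sequence nor the truncation level comes from the lecture.

```python
import math
import random

def squared_norm_partial_sums(eigenvalues, rng):
    """Partial sums of ||X||^2 = sum_i lambda_i * Z_i^2, with Z_i i.i.d. N(0, 1)."""
    total, sums = 0.0, []
    for lam in eigenvalues:
        z = rng.gauss(0.0, 1.0)
        total += lam * z * z
        sums.append(total)
    return sums

n = 100_000
rng = random.Random(1)

identity_eigs = [1.0] * n                                     # C = I: trace is infinite
trace_class_eigs = [1.0 / (i * i) for i in range(1, n + 1)]   # trace = pi^2 / 6

norm_identity = squared_norm_partial_sums(identity_eigs, rng)
norm_trace_class = squared_norm_partial_sums(trace_class_eigs, rng)

# Columns: number of coordinates, partial norm for C = I, partial norm for lambda_i = 1/i^2.
for k in (100, 10_000, 100_000):
    print(k, round(norm_identity[k - 1], 1), round(norm_trace_class[k - 1], 4))

print("truncated trace of the second operator:", round(sum(trace_class_eigs), 4))
print("pi^2 / 6:", round(math.pi ** 2 / 6, 4))
```

For $C = I$ the partial sums grow roughly like the number of coordinates, while for the trace-class choice they settle down after the first few terms.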


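Covariance between various linear combinations

In finite dimensions: on $\mathbb{R}^n$, let $Y \sim N(\mu, \Sigma)$; then $\langle a, Y \rangle$ and $\langle b, Y \rangle$ are both normally distributed, with covariance $\langle \Sigma a, b \rangle$.

In infinite dimensions: on $\ell^2$, let $X \sim N(m, C)$; then $\langle a, X \rangle$ and $\langle b, X \rangle$ are both normally distributed, with covariance $\langle C a, b \rangle$, i.e.,

$$\langle C a, b \rangle = E\big[\langle a, X - m \rangle \langle b, X - m \rangle\big].$$

After some analysis, one can conclude that

$$C = \sum_{i \ge 1} \lambda_i \, \phi_i \otimes \phi_i \qquad \text{(Mercer's Theorem)}$$

for some orthonormal basis $\{\phi_i\}$ and eigenvalues $\lambda_i \ge 0$. (Compare with matrices, and discuss the $L^2$ representation.)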
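The finite-dimensional statement can be checked by simulation. The sketch below assumes a hypothetical three-dimensional example $Y = \mu + AZ$ with $Z$ standard normal, so that $\Sigma = AA^\top$; the matrix $A$, the mean and the vectors $a$, $b$ are made up for illustration.

```python
import random

def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def dot(u, v):
    """Euclidean inner product <u, v>."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical example: Y = mu + A Z with Z standard normal, hence Sigma = A A^T.
A = [[1.0, 0.0, 0.0],
     [0.5, 1.0, 0.0],
     [0.2, 0.3, 0.8]]
mu = [1.0, -2.0, 0.5]
Sigma = [[dot(row_i, row_j) for row_j in A] for row_i in A]

a = [1.0, 2.0, -1.0]
b = [0.5, -1.0, 3.0]
exact = dot(matvec(Sigma, a), b)                  # <Sigma a, b>

rng = random.Random(42)
n_samples = 50_000
proj_a, proj_b = [], []
for _ in range(n_samples):
    z = [rng.gauss(0.0, 1.0) for _ in range(3)]
    y = [m + az for m, az in zip(mu, matvec(A, z))]
    proj_a.append(dot(a, y))                      # <a, Y>
    proj_b.append(dot(b, y))                      # <b, Y>

mean_a = sum(proj_a) / n_samples
mean_b = sum(proj_b) / n_samples
empirical = sum((pa - mean_a) * (pb - mean_b)
                for pa, pb in zip(proj_a, proj_b)) / (n_samples - 1)

print("exact <Sigma a, b>    :", round(exact, 3))
print("Monte Carlo covariance:", round(empirical, 3))
```

Note that the projections are centred before multiplying; with a non-zero mean the uncentred product would not estimate $\langle \Sigma a, b \rangle$.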


In fact, there exists a sequence $\{\xi_i\}$ of zero-mean, uncorrelated random variables such that $E(\xi_i^2) = \lambda_i$, and

$$X = m + \sum_{i \ge 1} \xi_i \, \phi_i \qquad \text{(Karhunen–Loève expansion)}$$

Special case

If the functional data is a realisation of a stochastic process, then the mean $m$ is a mean function $m(t) = E(X(t))$, and the covariance operator becomes a covariance kernel, $C(s, t) = \mathrm{cov}(X(s), X(t))$.
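As a concrete instance, not taken from the lecture, consider standard Brownian motion on $[0, 1]$: $m(t) = 0$, $C(s, t) = \min(s, t)$, with the well-known eigenpairs $\lambda_k = ((k - \tfrac12)\pi)^{-2}$ and $\phi_k(t) = \sqrt{2} \sin((k - \tfrac12)\pi t)$. The sketch checks that the truncated sum $\sum_k \lambda_k \phi_k(s) \phi_k(t)$ approaches $\min(s, t)$ and draws one path from a truncated Karhunen–Loève expansion; the truncation levels are arbitrary.

```python
import math
import random

def eigenvalue(k):
    """lambda_k for the Brownian-motion kernel min(s, t) on [0, 1]."""
    return 1.0 / (((k - 0.5) * math.pi) ** 2)

def eigenfunction(k, t):
    """phi_k(t) = sqrt(2) * sin((k - 1/2) * pi * t)."""
    return math.sqrt(2.0) * math.sin((k - 0.5) * math.pi * t)

def truncated_kernel(s, t, n_terms):
    """sum_k lambda_k * phi_k(s) * phi_k(t), truncated after n_terms terms."""
    return sum(eigenvalue(k) * eigenfunction(k, s) * eigenfunction(k, t)
               for k in range(1, n_terms + 1))

def sample_path(grid, n_terms, rng):
    """Truncated Karhunen-Loeve sum with independent xi_k ~ N(0, lambda_k)."""
    scores = [rng.gauss(0.0, math.sqrt(eigenvalue(k))) for k in range(1, n_terms + 1)]
    return [sum(xi * eigenfunction(k, t) for k, xi in enumerate(scores, start=1))
            for t in grid]

s, t = 0.3, 0.7
for n_terms in (5, 50, 2000):
    print(n_terms, round(truncated_kernel(s, t, n_terms), 4), "target:", min(s, t))

# The eigenvalues sum to the trace of the kernel, i.e. the integral of min(t, t) = t.
trace = sum(eigenvalue(k) for k in range(1, 2001))
print("truncated trace:", round(trace, 4), "(limit 0.5)")

rng = random.Random(0)
grid = [j / 100 for j in range(101)]
path = sample_path(grid, n_terms=200, rng=rng)
print("X(0) =", path[0], " X(1) =", round(path[-1], 4))
```

Here the $\xi_k$ are drawn as independent Gaussians, which is stronger than "uncorrelated" and is what the Gaussian case gives.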


Outline


- sample mean;
- sample covariance;
- functional PCA (with Karhunen–Loève, using probe functions).
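A rough sketch of the first three items on curves observed on a common grid. The data-generating model (mean $m(t) = t$, two sine eigenfunctions with variances 4 and 0.25), the grid, and the use of power iteration for the leading eigenpair are all assumptions made for illustration; probe functions are not covered here.

```python
import math
import random

rng = random.Random(7)
n_curves, n_grid = 200, 50
grid = [(j + 0.5) / n_grid for j in range(n_grid)]   # midpoints of [0, 1]
dt = 1.0 / n_grid

# Hypothetical data-generating model: a two-term Karhunen-Loeve expansion.
mean_fn = [t for t in grid]                           # m(t) = t
phi1 = [math.sqrt(2) * math.sin(math.pi * t) for t in grid]
phi2 = [math.sqrt(2) * math.sin(2 * math.pi * t) for t in grid]
lam1, lam2 = 4.0, 0.25

curves = []
for _ in range(n_curves):
    xi1 = rng.gauss(0.0, math.sqrt(lam1))
    xi2 = rng.gauss(0.0, math.sqrt(lam2))
    curves.append([m + xi1 * p1 + xi2 * p2 for m, p1, p2 in zip(mean_fn, phi1, phi2)])

# Sample mean function and sample covariance kernel on the grid.
sample_mean = [sum(c[j] for c in curves) / n_curves for j in range(n_grid)]
centred = [[c[j] - sample_mean[j] for j in range(n_grid)] for c in curves]
sample_cov = [[sum(x[j] * x[k] for x in centred) / (n_curves - 1)
               for k in range(n_grid)] for j in range(n_grid)]

def inner(u, v):
    """Riemann-sum approximation of the L2 inner product on [0, 1]."""
    return sum(ui * vi for ui, vi in zip(u, v)) * dt

def apply_cov(v):
    """Discretised covariance operator: (C v)(s) = integral of C(s, t) v(t) dt."""
    return [inner(row, v) for row in sample_cov]

# Power iteration for the leading eigenpair (first functional principal component).
v = [1.0] * n_grid
for _ in range(100):
    w = apply_cov(v)
    norm = math.sqrt(inner(w, w))
    v = [wi / norm for wi in w]
lam_hat = inner(apply_cov(v), v)

print("estimated leading eigenvalue:", round(lam_hat, 3), "(model value 4.0)")
print("|<v, phi1>|:", round(abs(inner(v, phi1)), 4))
```

Further components could be obtained by deflating the sample covariance or by a full eigendecomposition from a linear-algebra library; power iteration is used here only to keep the sketch dependency-free.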