Influence of the Sampling on Functional Data Analysis

DESCRIPTION

Short courses on functional data analysis and statistical learning, part 4

TRANSCRIPT
Influence of the sampling on Functional Data Analysis

Nathalie Villa-Vialaneix - [email protected] - http://www.nathalievilla.org
Institut de Mathématiques de Toulouse - IUT de Carcassonne, Université de Perpignan, France
La Havane, September 18th, 2008
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 1 / 30
Table of contents
1 Introduction to the sampling problem
2 Approximating functions with splines
3 Using splines in functional models based on sampling
4 References
We do not observe functional data!

In most theoretical work, the functional observations x1, …, xn are directly the true functions.

But, in fact, we can observe:

$$x_i = \left(x_i(a),\ x_i\!\left(a + \tfrac{b-a}{L}\right),\ x_i\!\left(a + 2\tfrac{b-a}{L}\right),\ \ldots,\ x_i(b)\right)$$

Or:

$$x_i = \left(x_i(t^i_1),\ x_i(t^i_2),\ \ldots,\ x_i(t^i_{d_i})\right)$$

Or even, not the true sampling:

$$x_i = \left(x_i(t^i_1) + \varepsilon_{i,t^i_1},\ x_i(t^i_2) + \varepsilon_{i,t^i_2},\ \ldots,\ x_i(t^i_{d_i}) + \varepsilon_{i,t^i_{d_i}}\right)$$
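The three observation schemes above can be sketched numerically. Everything in this snippet (the test function, the values of L and d_i, the noise level) is an arbitrary illustration, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
x = lambda t: np.sin(2 * np.pi * t)  # an arbitrary "true" function x_i

# Regular sampling of [a, b] with L + 1 points: (x(a), x(a + (b-a)/L), ..., x(b))
a, b, L = 0.0, 1.0, 10
regular = x(a + (b - a) * np.arange(L + 1) / L)

# Irregular sampling: d_i random, distinct points t^i_1 < ... < t^i_{d_i}
d_i = 8
t_irr = np.sort(rng.uniform(a, b, size=d_i))
irregular = x(t_irr)

# Noisy sampling: x(t^i_j) + eps_{i, t^i_j}
noisy = x(t_irr) + rng.normal(scale=0.1, size=d_i)
```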
Consequences on the estimators and their errors

Hence, most of the time, functional data analysis consists in:

1 Building estimators of the x_i from their sampling, x̂_i = Υ(x_i);
2 Using x̂_i (or its derivatives) as if they were the true functions x_i.

Problem: Most of the theoretical results presented in the past days were based on the knowledge of x_i, and not on the estimation x̂_i. What are the consequences on

the estimate Ψ_n of the regression function Ψ?
the consistency of the error to the optimal Bayes error

when using an approximation of x_i?
Notations and assumptions

Suppose that we are studying the random pair (X, Y) where:

X is functional and takes its values in the Hilbert space (X, ⟨·,·⟩_X);
Y takes its values in {−1, 1} (classification case) or in R (regression case).

Suppose that we observe (x^τ_i, y_i)_{i=1,…,n} where:

x^τ_i = (x_i(t))_{t∈τ} (non-noisy case) or x^τ_i = (x_i(t) + ε_{i,t})_{t∈τ} (noisy case);
τ is the set of sampling points (the same for all functions);
(x_i, y_i)_i are i.i.d. copies of (X, Y).
A smooth representation of sampled functions

Given x^τ, splines aim at providing the smoothest possible representation of x: [0, 1] → R. More precisely, x is approximated by:

$$\hat{x}^{\lambda,\tau} = \arg\min_{h \in \mathcal{H}^m} \frac{1}{|\tau|} \sum_{t \in \tau} \left(h(t) - x^\tau_t\right)^2 + \lambda \int_{[0,1]} \left(h^{(m)}(t)\right)^2 dt$$

where, for m > 3/2, the Sobolev space H^m is defined by

$$\mathcal{H}^m = \left\{h \in L^2([0,1]) : \forall\, k = 1, \ldots, m,\ h^{(k)} \text{ exists in a weak sense and } h^{(m)} \in L^2([0,1])\right\}.$$
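The trade-off in this criterion (fidelity to the sampled values versus roughness of h) can be sketched numerically. The snippet below is not the exact H^m solution: it discretizes h on a fine grid and replaces h^{(m)} by second differences (so m = 2); the grid size, sampling points and λ values are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fine grid on [0, 1]; h is represented by its values on the grid.
N = 201
grid = np.linspace(0.0, 1.0, N)
dt = grid[1] - grid[0]

# Sampling set tau and noisy observations x^tau_t of x(t) = sin(2*pi*t)
idx = np.arange(10, N, 20)                 # |tau| = 10 sample indices
x_tau = np.sin(2 * np.pi * grid[idx]) + rng.normal(scale=0.05, size=idx.size)

# Second-difference operator approximating h'' (m = 2), and evaluation matrix
D = np.diff(np.eye(N), n=2, axis=0) / dt**2
S = np.eye(N)[idx]

def smooth(lam):
    # Minimize (1/|tau|) ||S h - x^tau||^2 + lam * ||h''||^2 * dt over grid values of h
    A = S.T @ S / idx.size + lam * dt * (D.T @ D)
    return np.linalg.solve(A, S.T @ x_tau / idx.size)

h_rough = smooth(1e-9)    # small lambda: close to the data, rough
h_smooth = smooth(1e-3)   # large lambda: smoother, further from the data
```

Increasing λ trades data fidelity for smoothness, exactly as in the penalized criterion above.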
Decomposition of H^m

The key point for solving the previous optimization problem is to build a Hilbert structure on H^m such that ‖h‖_{H^m} ≃ ‖h^{(m)}‖_{L²}.

This can be done by decomposing H^m into H^m = H^m_0 ⊕ H^m_1 where:

H^m_0 = Ker D^m = P_{m−1} (the space of polynomial functions of degree less than or equal to m − 1);
H^m_1 is an infinite dimensional subspace of H^m defined via m boundary conditions, denoted B: H^m → R^m, that are such that Ker B ∩ P_{m−1} = {0}.

Example 1: For m = 2, B: h ↦ (h(0), h(1)) and H^m_1 = {h ∈ H² : h(0) = h(1) = 0}.
Example 2: For m > 3/2, B: h ↦ (h(0), h′(0), …, h^{(m−1)}(0)) and H^m_1 = {h ∈ H^m : Bh = 0}.
Hilbert structure of H^m

H^m_0 and H^m_1 are Hilbert spaces with respect to the inner products:

$$\forall\, u, v \in \mathcal{H}^m_0,\quad \langle u, v \rangle_{\mathcal{H}^m_0} = (Bu)^T (Bv),$$

$$\forall\, u, v \in \mathcal{H}^m_1,\quad \langle u, v \rangle_{\mathcal{H}^m_1} = \langle D^m u, D^m v \rangle_{L^2}.$$

Hence, we obtain in this way an inner product on H^m:

$$\langle u, v \rangle_{\mathcal{H}^m} = \langle P_0(u), P_0(v) \rangle_{\mathcal{H}^m_0} + \langle P_1(u), P_1(v) \rangle_{\mathcal{H}^m_1} = \langle D^m u, D^m v \rangle_{L^2} + (Bu)^T (Bv)$$

where P_j is the projector on H^m_j for j = 0, 1.
RKHS structure of H^m

Equipped with ⟨·,·⟩_{H^m}, the spaces H^m_0, H^m_1 and H^m are reproducing kernel Hilbert spaces. More precisely, there exist kernels k_j: [0, 1]² → R (j = 0, 1) such that:

$$\forall\, u \in \mathcal{H}^m_j,\ \forall\, t \in [0,1],\quad \langle k_j(t, \cdot), u \rangle_{\mathcal{H}^m_j} = u(t).$$

Hence, k = k_0 + k_1 is the reproducing kernel of H^m.

Example 1: k_0 is easy to compute: if (e_0, …, e_{m−1}) is an orthonormal basis of H^m_0 = P_{m−1} for the norm ‖·‖_{H^m_0}, then

$$k_0(s, t) = \sum_{i=0}^{m-1} e_i(s)\, e_i(t).$$

If m = 2 and the boundary conditions are u(0) = u(1) = 0, then {t ↦ t, t ↦ 1 − t} is an orthonormal basis of H^m_0 and $k_0(s, t) = (1 - t)(1 - s) + st$.

If m > 3/2 and the boundary conditions are h(0) = h′(0) = … = h^{(m−1)}(0) = 0, then $\{t \mapsto t^i / i!\}_{i=0,\ldots,m-1}$ is an orthonormal basis of H^m_0 and

$$k_0(s, t) = \sum_{k=0}^{m-1} \frac{t^k s^k}{(k!)^2}.$$

Example 2: k_1 can be found by way of the Green function G: [0, 1]² → R satisfying:

$$u = \int_{[0,1]} G(\cdot, t)\, D^m u(t)\, dt.$$

We have:

$$k_1(s, t) = \int_{[0,1]} G(s, w)\, G(t, w)\, dw.$$

If m = 2 and the boundary conditions are u(0) = u(1) = 0, then $k_1(s, t) = \frac{(s - t)^3_+ - s(1 - t)(s^2 - 2t + t^2)}{6}$.

If m > 3/2 and the boundary conditions are h(0) = h′(0) = … = h^{(m−1)}(0) = 0, then

$$k_1(s, t) = \int_0^1 \frac{(t - w)^{m-1}_+ (s - w)^{m-1}_+}{((m - 1)!)^2}\, dw.$$

See [Berlinet and Thomas-Agnan, 2004] for further details and examples.
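The last integral can be checked numerically. For m = 2 with the boundary conditions h(0) = h′(0) = 0, integrating (t − w)₊(s − w)₊ over [0, min(s, t)] directly gives min(s,t)² (3 max(s,t) − min(s,t)) / 6; this closed form is my own elementary computation, not taken from the slides:

```python
import numpy as np
from math import factorial

def k1_numeric(s, t, m=2, n=20000):
    # Midpoint rule for k1(s,t) = int_0^1 (t-w)_+^{m-1} (s-w)_+^{m-1} / ((m-1)!)^2 dw
    w = (np.arange(n) + 0.5) / n
    return np.sum(np.clip(t - w, 0, None) ** (m - 1) *
                  np.clip(s - w, 0, None) ** (m - 1)) / (n * factorial(m - 1) ** 2)

def k1_closed(s, t):
    # m = 2: direct integration over [0, min(s,t)] gives min^2 (3 max - min) / 6
    lo, hi = min(s, t), max(s, t)
    return lo ** 2 * (3 * hi - lo) / 6
```

The kernel is symmetric, as a reproducing kernel must be, and the quadrature matches the closed form to high accuracy.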
Assumptions for existence and uniqueness of a spline

(A1) |τ| ≥ m − 1;
(A2) the sampling points are distinct in [0, 1];
(A3) the m boundary conditions B_j are linearly independent from the |τ| linear forms h ∈ H^m ↦ h(t) for t ∈ τ.
Computing the splines

Theorem [Kimeldorf and Wahba, 1971]: Under assumptions (A1)-(A3), for any given x^τ, the unique solution of the optimization problem is:

$$\begin{aligned}
\hat{x}^{\lambda,\tau} &= \omega^T \left(U (K_1 + \lambda I_{|\tau|})^{-1} U^T\right)^{-1} U (K_1 + \lambda I_{|\tau|})^{-1} x^\tau \\
&\quad + \eta^T (K_1 + \lambda I_{|\tau|})^{-1} \left(I_{|\tau|} - U^T \left(U (K_1 + \lambda I_{|\tau|})^{-1} U^T\right)^{-1} U (K_1 + \lambda I_{|\tau|})^{-1}\right) x^\tau \\
&= \left(\omega^T M_0 + \eta^T M_1\right) x^\tau
\end{aligned} \tag{1}$$

where

{ω_1, …, ω_m} is a basis of P_{m−1}, ω = (ω_1, …, ω_m)^T and U = (ω_i(t))_{i=1,…,m;\, t∈τ};
η = (k_1(t, ·))^T_{t∈τ} and K_1 = (k_1(t, t′))_{t,t′∈τ}.
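Formula (1) can be implemented directly with numpy. The sketch below uses the hypothetical choice m = 2 with boundary conditions h(0) = h′(0) = 0 (so ω_1 = 1, ω_2 = t, and k_1 admits a simple closed form); the sampling set, data and λ are arbitrary. As λ → 0 the solution should interpolate the data:

```python
import numpy as np

# m = 2, boundary conditions h(0) = h'(0) = 0; then P_1 has basis {1, t} and
# k1(s,t) = int_0^1 (t-w)_+ (s-w)_+ dw = min(s,t)^2 (3 max(s,t) - min(s,t)) / 6.
def k1(s, t):
    lo, hi = np.minimum(s, t), np.maximum(s, t)
    return lo ** 2 * (3 * hi - lo) / 6

tau = np.linspace(0.1, 0.9, 9)
n = tau.size
x_tau = np.sin(2 * np.pi * tau)          # sampled values of some x in H^2
lam = 1e-10                              # nearly interpolating spline

U = np.vstack([np.ones(n), tau])         # U = (omega_i(t))_{i,t}, an m x |tau| matrix
K1 = k1(tau[:, None], tau[None, :])
Sinv = np.linalg.inv(K1 + lam * np.eye(n))

M0 = np.linalg.solve(U @ Sinv @ U.T, U @ Sinv)   # m x |tau|
M1 = Sinv @ (np.eye(n) - U.T @ M0)               # |tau| x |tau|
d, c = M0 @ x_tau, M1 @ x_tau                    # polynomial / kernel coefficients

def spline(t):
    # x_hat(t) = omega(t)^T d + sum_j c_j k1(t_j, t)
    return d[0] + d[1] * t + k1(tau, t) @ c

fitted = np.array([spline(t) for t in tau])
```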
Computing inner products between splines

Corollary: Under assumptions (A1)-(A3),

$$\langle \hat{u}^{\lambda,\tau}, \hat{v}^{\lambda,\tau} \rangle_{\mathcal{H}^m} = (u^\tau)^T M_0^T W M_0\, v^\tau + (u^\tau)^T M_1^T K_1 M_1\, v^\tau = (u^\tau)^T M_\tau\, v^\tau$$

where the matrix M_τ is symmetric and positive definite (and therefore defines an inner product on R^{|τ|}).
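A quick numerical sanity check of the corollary, again with the hypothetical choice m = 2 and boundary conditions h(0) = h′(0) = 0. The matrix W is not defined on the slide; it is assumed here to be the Gram matrix ((Bω_i)^T (Bω_j))_{ij} of the polynomial basis for the H^m_0 inner product, which is the 2 × 2 identity for this basis:

```python
import numpy as np

def k1(s, t):
    # m = 2, boundary conditions h(0) = h'(0) = 0
    lo, hi = np.minimum(s, t), np.maximum(s, t)
    return lo ** 2 * (3 * hi - lo) / 6

tau = np.linspace(0.1, 0.9, 9)
n = tau.size
lam = 1e-3

U = np.vstack([np.ones(n), tau])         # omega_1 = 1, omega_2 = t
K1 = k1(tau[:, None], tau[None, :])
Sinv = np.linalg.inv(K1 + lam * np.eye(n))
M0 = np.linalg.solve(U @ Sinv @ U.T, U @ Sinv)
M1 = Sinv @ (np.eye(n) - U.T @ M0)

# Assumed: W = ((B omega_i)^T (B omega_j))_{ij}; with B: h -> (h(0), h'(0)) and
# omega_1 = 1, omega_2 = t, this Gram matrix is the identity.
W = np.eye(2)
M_tau = M0.T @ W @ M0 + M1.T @ K1 @ M1
```

Symmetry and positive definiteness of M_τ can then be verified with an eigendecomposition.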
Assumptions for convergence of spline estimates

If τ = {t_1, t_2, …, t_{|τ|}}, denote:

$$\overline{\Delta}_\tau = \max\{t_1,\ t_2 - t_1,\ \ldots,\ 1 - t_{|\tau|-1}\}, \qquad \underline{\Delta}_\tau = \min\{t_2 - t_1,\ t_3 - t_2,\ \ldots,\ t_{|\tau|} - t_{|\tau|-1}\},$$

and suppose that we are given a sequence of sampling sets τ_1, τ_2, …, τ_d, …. λ should now depend on τ; we then write (τ_d)_d for the sequence of sampling sets and (λ_d)_d for the associated sequence of regularization parameters. Suppose:

(A4) there is R ∈ R such that $\overline{\Delta}_\tau / \underline{\Delta}_\tau \leq R$;
(A5) lim_{d→+∞} |τ_d| = +∞ and lim_{d→+∞} λ_d = 0.
Convergence of spline estimates

Theorem [Ragozin, 1983]: Under assumptions (A1)-(A3), there are two constants, A_{R,m} and B_{R,m}, depending only on R and m, such that for any x ∈ H^m and any positive λ:

$$\left\|\hat{x}^{\lambda,\tau} - x\right\|^2_{L^2} \leq \left(A_{R,m}\, \lambda + B_{R,m}\, \frac{1}{|\tau|^{2m}}\right) \left\|D^m x\right\|^2_{L^2}.$$

Thus, under the additional assumptions (A4)-(A5),

$$\left\|\hat{x}^{\lambda_d,\tau_d} - x\right\|_{L^2} \xrightarrow{d \to +\infty} 0.$$
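The behaviour described by the bound can be illustrated empirically: with regular designs (bounded mesh ratio, so (A4) holds) and λ_d → 0 as |τ_d| → ∞ (A5), the L² error of the spline estimate should shrink. The setup below (m = 2, boundary conditions h(0) = h′(0) = 0, the test function, and the schedule λ_d = |τ_d|^{-4}) is an arbitrary illustration:

```python
import numpy as np

def k1(s, t):
    # m = 2, boundary conditions h(0) = h'(0) = 0
    lo, hi = np.minimum(s, t), np.maximum(s, t)
    return lo ** 2 * (3 * hi - lo) / 6

def spline_error(size, lam, x=lambda t: np.sin(2 * np.pi * t)):
    # Regular midpoint design: mesh ratio bounded, so (A4) holds
    tau = (np.arange(size) + 0.5) / size
    x_tau = x(tau)
    U = np.vstack([np.ones(size), tau])
    K1 = k1(tau[:, None], tau[None, :])
    Sinv = np.linalg.inv(K1 + lam * np.eye(size))
    M0 = np.linalg.solve(U @ Sinv @ U.T, U @ Sinv)
    M1 = Sinv @ (np.eye(size) - U.T @ M0)
    d, c = M0 @ x_tau, M1 @ x_tau
    grid = np.linspace(0.0, 1.0, 2001)
    xhat = d[0] + d[1] * grid + k1(tau[:, None], grid[None, :]).T @ c
    return np.sqrt(np.mean((xhat - x(grid)) ** 2))    # approximate L2([0,1]) error

# |tau_d| grows and lambda_d -> 0 (A5): the error should shrink
errors = [spline_error(s, 1.0 / s ** 4) for s in (5, 10, 20, 40)]
```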
Just a single example: Tecator dataset
![Page 47: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/47.jpg)
Table of contents
1 Introduction to the sampling problem
2 Approximating functions with splines
3 Using splines in functional models based on sampling
4 References
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 17 / 30
![Page 48: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/48.jpg)
Notations and method
Suppose that we are given a pair of random variables (X, Y) taking their values in Hm × {−1, 1} (classification case) or in Hm × R (regression case).

We are given a training set of size n, {(xτd_i, yi)}i=1,...,n, where

xτd_i = (xi(t1), . . . , xi(t|τd|));

{(xi, yi)}i are i.i.d. copies of (X, Y).
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 18 / 30
![Page 50: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/50.jpg)
A general consistent method based on derivatives
Consider a consistent classification or regression scheme for data in Rp × {−1, 1} (or Rp × R) and denote by ψD the classifier (or regression function) obtained from a learning set D = {(u1, y1), . . . , (un, yn)}.

If the definition of ψD is based only on the norms or inner products between the (ui)i (and, of course, on the yi values), then the method can be generalized to work with the derivatives of the xi by replacing this inner product by

〈Dmxi, Dmxj〉L² ≃ 〈x̂i^{λd,τd}, x̂j^{λd,τd}〉Hm = 〈Mτd xτd_i, xτd_j〉R^{|τd|}.

Writing Qτd for the transpose of the Cholesky triangle of Mτd ((Qτd)^T Qτd = Mτd), we thus define a classifier or a regression function on the (Dmxi)i, using only the discrete samplings (xτd_i)i, by

φn,τd = ψ_{εn,τd}   where εn,τd = {(Qτd xτd_i, yi)}i.
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 19 / 30
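The Cholesky trick above can be sketched numerically. In the snippet below, a random symmetric positive-definite matrix stands in for Mτd (whose actual spline-based construction is not spelled out on this slide); the point is only that Euclidean inner products of the transformed samplings reproduce the M-weighted inner products:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for M_tau_d: any symmetric positive-definite matrix
# such that <M u, v> plays the role of the H^m inner product of the
# smoothed curves built from the samplings u and v.
A = rng.standard_normal((5, 5))
M = A @ A.T + 5.0 * np.eye(5)

# Q_tau_d is the transpose of the Cholesky triangle of M: Q^T Q = M.
L = np.linalg.cholesky(M)   # M = L L^T with L lower triangular
Q = L.T

# Plain Euclidean inner product of transformed samplings = <M xi, xj>:
xi, xj = rng.standard_normal(5), rng.standard_normal(5)
assert np.isclose((Q @ xi) @ (Q @ xj), (M @ xi) @ xj)
```

Any multivariate method that only touches inner products can therefore be run unchanged on the transformed learning set {(Qτd xτd_i, yi)}i.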
![Page 56: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/56.jpg)
Consistency property
Theorem [Rossi and Villa, 2008]
Under assumptions (A1)-(A4) and either
(A5a) E(‖DmX‖L²) is finite and Y ∈ {−1, 1},
or
(A5b) τd ⊂ τd+1 and E(Y²) is finite,
we have

limd→+∞ limn→+∞ L(φn,τd) = L*.

Sketch of the proof: Using the convergence of the splines (case (A5a)) or a martingale argument (case (A5b)), we show that

L*τd − L* → 0 as d → +∞,

where L*τd = inf_{φ: R^{|τd|} → R} P(φ(Xτd) ≠ Y) (classification case) or L*τd = inf_{φ: R^{|τd|} → R} E([φ(Xτd) − Y]²) (regression case). Then, using the consistency assumption on the multidimensional method, we have, for all d,

L(φn,τd) − L*τd → 0 as n → +∞.
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 20 / 30
![Page 58: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/58.jpg)
Application to kernel methods (SVM and kernel ridge regression)

Provided additional assumptions hold, kernel methods

F_D = arg min_F Σ_{i=1}^n L(yi, F(ui)) + C‖F‖_S

are consistent both for classification and for regression [Steinwart, 2002, Christmann and Steinwart, 2007].

Applying the general framework to kernel methods leads to the definition of the following kernel:

Kτd = K ∘ Qτd,

from (R^{|τd|})² to R, where K is any usual multidimensional kernel. This kernel uses the samplings xτd_i to approximately compute the functional kernel

K: (u, v) ∈ Hm × Hm ↦ K(‖u − v‖Hm).
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 21 / 30
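A minimal sketch of such a kernel Kτd = K ∘ Qτd, using the usual Gaussian kernel for K (both the kernel choice and the matrix Q below are illustrative, not from the slides):

```python
import numpy as np

def gaussian_kernel(u, v, gamma=1.0):
    """Usual multidimensional kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    d = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-gamma * (d @ d)))

def make_K_tau(Q, gamma=1.0):
    """K_tau = K o Q: evaluate the usual kernel on the transformed samplings,
    so that ||Q u - Q v||^2 = (u - v)^T M (u - v) stands in for the squared
    H^m-distance between the underlying smoothed curves."""
    return lambda u, v: gaussian_kernel(Q @ np.asarray(u, dtype=float),
                                        Q @ np.asarray(v, dtype=float), gamma)

# With Q = I the construction reduces to the usual Gaussian kernel:
K_tau = make_K_tau(np.eye(3))
val = K_tau([1.0, 0.0, 0.0], [0.0, 0.0, 0.0])   # exp(-1)
```

In practice Q would be the Cholesky factor transpose of Mτd, and K_tau can be passed to any kernel machine that accepts a custom kernel.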
![Page 61: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/61.jpg)
Corollary: consistency of kernel-based methods for classification

Derivative based SVM consistency
Suppose that assumptions (A1)-(A5) are fulfilled. Suppose also that
(A6)
for all d, the kernel K is universal on any compact subset of R^{|τd|} and has a covering number of the form N(K, ε) = O(ε^{−αd}) for an αd > 0 on this compact subset;
the sequence of regularization parameters C ≡ (C_n^d) is such that, for each d, limn→+∞ n C_n^d = +∞ and C_n^d = O(n^{βd−1}) for a 0 < βd < 1/αd;
and
(A7) for all d, there is a bounded subset Bd of R^{|τd|} such that Xτd belongs to Bd.
Then the SVM classifier is universally consistent:

limd→+∞ limn→+∞ E(φn,τd) = L*.
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 22 / 30
![Page 64: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/64.jpg)
Corollary: consistency of kernel-based methods for regression

Derivative based kernel ridge regression consistency
Suppose that assumptions (A1)-(A5) are fulfilled. Suppose also that
(A6)
for all d, the kernel K is universal on any compact subset of R^{|τd|};
the sequence of regularization parameters C ≡ (C_n^d) is such that, for each d, limn→+∞ n C_n^d = +∞ and limn→+∞ n (C_n^d)^{4/3} = 0;
and
(A7) for all d, there is a bounded subset Bd of R^{|τd|} such that Xτd belongs to Bd.
Then kernel ridge regression is universally consistent:

limd→+∞ limn→+∞ E(φn,τd) = L*.
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 23 / 30
![Page 67: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/67.jpg)
Linear regression with noisy covariates
Let’s finally come back to the linear model

Y = 〈X, a〉L²([0,1]) + ε,

with Y a real random variable, X a random variable taking its values in L²([0, 1]) and ε a centered real random variable independent of X.

But in the case of “noisy covariates”, we do not observe the pair (X, Y) but (wi(t1), . . . , wi(td), yi)i=1,...,n with

wi(tj) = xi(tj) + δi,j,

where (xi, yi)i are (not necessarily independent) copies of (X, Y) and (δi,j)i,j is a sequence of independent, centered real random variables with finite fourth moment. Moreover, the δi,j are supposed to be independent of X and ε.

In the following, we assume that the observations are centered, to avoid notational difficulties.
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 24 / 30
![Page 71: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/71.jpg)
Spline estimators

In the case where the xi are known, a spline estimate of a would be:

â_n := arg min_{h ∈ Hm} (1/n) Σ_{i=1}^n [ yi − (1/p) Σ_{j=1}^p h(tj) xi(tj) ]² + ρ‖h‖Hm.

The solution is given by

â_n = (1/n) ( (1/(np)) XᵀX + ρAm )^{−1} XᵀY,

where
X = (xi(tj))i=1,...,n, j=1,...,p;
Y = (y1, . . . , yn)ᵀ;
Am is the matrix that defines the Hm-norm from the discrete sampling at the (tj)j.

In the case where the xi are only known through noisy covariates, a spline estimate of a can be obtained through the Total Least Squares approach [Cardot et al., 2007]:

â_n := arg min_{h ∈ Hm, (xi,j)i,j} (1/n) Σ_{i=1}^n { [ yi − (1/p) Σ_{j=1}^p h(tj) xi,j ]² + (1/p) Σ_{j=1}^p (xi,j − wi(tj))² } + ρ‖h‖Hm.

The solution is given by

â_n = (1/n) ( (1/(np)) XᵀX + ρAm − pσ_k² I_p )^{−1} XᵀY,

where σ_k is replaced, in practice, by σ_δ/p, with σ_δ an estimate of the standard deviation of δ.
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 25 / 30
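The two closed forms above can be sketched with numpy on synthetic data. Everything below is illustrative: the identity matrix stands in for Am, the data-generating model is made up, σ_δ is taken as known, and in the noisy case the design matrix is built from the observed wi(tj) (an assumption; only the two matrix formulas follow the slide):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 20
rho = 1e-2
A_m = np.eye(p)                  # hypothetical stand-in for the H^m-norm matrix

X = rng.standard_normal((n, p))  # x_i(t_j): the (unobserved) true samplings
Y = X.mean(axis=1) + 0.1 * rng.standard_normal(n)

# Noiseless covariates: a_n = (1/n) (X^T X/(np) + rho A_m)^{-1} X^T Y
a_plain = np.linalg.solve(X.T @ X / (n * p) + rho * A_m, X.T @ Y) / n

# Noisy covariates w_i(t_j) = x_i(t_j) + delta_ij: the TLS solution subtracts
# p * sigma_k^2 * I_p with sigma_k = sigma_delta / p, i.e. a correction of
# (sigma_delta^2 / p) * I_p, here applied to the observed design W.
sigma_delta = 0.3
W = X + sigma_delta * rng.standard_normal((n, p))
correction = (sigma_delta**2 / p) * np.eye(p)
a_tls = np.linalg.solve(W.T @ W / (n * p) + rho * A_m - correction, W.T @ Y) / n
```

The correction term is what distinguishes the errors-in-variables estimator from plain ridge smoothing: without it, the noise in W inflates the diagonal of WᵀW and biases the estimate toward zero.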
![Page 74: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/74.jpg)
Assumptions for the convergence of â_n

a belongs to Hm;

there exists a constant κ, 0 < κ < 1, such that, for every δ > 0, there is a C: P(|X(t) − X(s)| ≤ C|t − s|^κ, s, t ∈ [0, 1]) ≥ 1 − δ;

there exist E ∈ R and, for all k ∈ N, a k-dimensional subspace Lk of L² with E( inf_{h ∈ Lk} sup_t |X(t) − h(t)|² ) ≤ E k^{−2q};

there is a constant F: Var(〈Γ_n^X ζs, ζt〉L²) ≤ (F/n) E(〈X − E(X), ζs〉²L²) E(〈X − E(X), ζt〉²L²);

for each δ > 0, there is a D: P( (1/√p) ‖(1/(np)) XᵀXa‖_{R^p} ≤ D ) ≥ 1 − δ;

np^{−2κ} = O(1), lim_{n,p→+∞} ρ = 0 and lim_{n,p→+∞} 1/(nρ) = 0.
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 26 / 30
![Page 75: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/75.jpg)
Convergence of â_n

Theorem [Crambes et al., 2008]
Under the previous assumptions,

‖â_n − a‖_{Γ_X} = O_P( 1/(npρ) + 1/n + n^{−(2q+1)/2} ).
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 27 / 30
![Page 76: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/76.jpg)
Application to prediction of ozone
The data is a daily time series of the maximum ozone concentration in Toulouse (France).
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 28 / 30
![Page 77: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/77.jpg)
Table of contents
1 Introduction to the sampling problem
2 Approximating functions with splines
3 Using splines in functional models based on sampling
4 References
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 29 / 30
![Page 78: Influence of the sampling on Functional Data Analysis](https://reader033.vdocuments.mx/reader033/viewer/2022042814/5550b5a3b4c905fa618b4a9b/html5/thumbnails/78.jpg)
References
Further details for the references are given in the joint document.
Berlinet, A. and Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers.

Cardot, H., Crambes, C., Kneip, A., and Sarda, P. (2007). Smoothing splines estimators in functional linear regression with errors-in-variables. Computational Statistics and Data Analysis, 51:4832–4848.

Christmann, A. and Steinwart, I. (2007). Consistency and robustness of kernel-based regression in convex risk minimization. Bernoulli, 13(3):799–819.

Crambes, C., Kneip, A., and Sarda, P. (2008). Smoothing splines estimators for functional linear regression. Annals of Statistics.

Kimeldorf, G. and Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33(1):82–95.

Ragozin, D. (1983). Error bounds for derivative estimation based on spline smoothing of exact or noisy data. Journal of Approximation Theory, 37:335–355.

Rossi, F. and Villa, N. (2008). Classification and regression based on derivatives: a consistency result applied to functional kernel based classification and regression. Work in progress.

Steinwart, I. (2002). Support vector machines are universally consistent. Journal of Complexity, 18:768–791.
Nathalie Villa (IMT & UPVD) Presentation 4 La Havane, Sept. 18th, 2008 30 / 30