Functional Calibration Estimation by the Maximum
Entropy on the Mean Principle
Statistics: A Journal of Theoretical and Applied Statistics, 49, 989-1004. 2015.
Santiago Gallón
(Joint work with F. Gamboa and J-M. Loubes)
Departamento de Matemáticas y Estadística
Facultad de Ciencias Económicas − Universidad de Antioquia
Institut de Mathématiques de Toulouse, Université Toulouse III − Paul Sabatier
Seminario Institucional − Escuela de Estadística
Facultad de Ciencias − Universidad Nacional de Colombia
September 5, 2016
Outline
The problem of calibration estimation
Functional calibration estimation
Maximum entropy on the mean −MEM− principle
Simulation study
References
The problem of calibration estimation I
• Let UN = {1, . . . , N} be a finite survey population.
• Assume that there is a survey variable Yi, i ∈ UN .
The goal: Estimate the finite population mean
µ_Y = N^{−1} ∑_{i∈U_N} Y_i,
based on a sample a ⊂ UN of n elements.
• The unbiased Horvitz-Thompson −HT− estimator:
µ̂_Y^{HT} = N^{−1} ∑_{i∈a} π_i^{−1} Y_i = N^{−1} ∑_{i∈a} d_i Y_i,  π_i = P(i ∈ a).
• The HT estimator can have low precision when the selected sample is unrepresentative.
• The inference can be improved by modifying the weights d_i = π_i^{−1}.
The problem of calibration estimation II
• A method can be constructed by incorporating auxiliary information
X_i ∈ R^q, i ∈ U_N, q ≥ 1 (Deville and Särndal (1992)).
Sources of auxiliary information:
X Census data
X Administrative registers
X Previous surveys
• Calibration estimation: replace the weights of µ̂_Y^{HT} by new weights
w_i close enough to the d_i's according to some distance D(w, d), s.t.
N^{−1} ∑_{i∈a} w_i X_i = µ_X = N^{−1} ∑_{i∈U_N} X_i  (Calibration equation).
• The weights wi generally result in estimates with smaller variance.
• The calibration estimator of µY is given by
µ̂_Y = N^{−1} ∑_{i∈a} w_i Y_i.
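In the multivariate setting above, the chi-square distance D(w, d) = ∑_{i∈a}(w_i − d_i)²/d_i gives calibrated weights in closed form via a Lagrange multiplier (Deville and Särndal (1992)). A minimal sketch on simulated data (the population, design, and variable names below are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population: N units, q = 2 auxiliary variables.
N, n, q = 1000, 120, 2
X = rng.normal(size=(N, q))
Y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=N)

# Simple random sample without replacement: pi_i = n/N, so d_i = N/n.
sample = rng.choice(N, size=n, replace=False)
d = np.full(n, N / n)
Xs, Ys = X[sample], Y[sample]

# Chi-square-distance calibration: minimize sum (w_i - d_i)^2 / d_i
# subject to sum_a w_i X_i = population total of X.
t_x = X.sum(axis=0)                       # known population totals of X
A = (d[:, None] * Xs).T @ Xs              # sum_a d_i x_i x_i^T
lam = np.linalg.solve(A, t_x - d @ Xs)    # Lagrange multipliers
w = d * (1.0 + Xs @ lam)                  # calibrated weights

# The calibration equation now holds exactly:
assert np.allclose(w @ Xs, t_x)

mu_ht = (d @ Ys) / N       # Horvitz-Thompson estimate of mu_Y
mu_cal = (w @ Ys) / N      # calibration estimate of mu_Y
```

Because Y is nearly linear in X here, the calibrated estimate typically tracks the population mean much more closely than the HT estimate.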
Functional calibration estimation
Our aim: Calibration estimation for µ_Y, where Y(t) ∈ C([0, T]), using
functional information X(t) ∈ C([0, T]; R^q).
• The functional calibration constraint:
N^{−1} ∑_{i∈a} w_i(t) X_i(t) = µ_X(t) = N^{−1} ∑_{i∈U_N} X_i(t),  ∀t ∈ [0, T].
• The w_i(t) are obtained by coupling calibration estimation with the
maximum entropy on the mean principle.
• Example (Chaouch and Goga (2012)): estimation of the mean
electricity consumption curve (measured every 30 min.) during the 2nd week,
using the consumption in the 1st week as auxiliary information, for
18,902 meters transmitting electricity consumption over two weeks.
Maximum entropy on the mean −MEM− principle I
• Proposed by Navaza (1985, 1986) to solve an inverse problem in
crystallography.
• It is an alternative to the Tikhonov’s regularization of ill-posed
inverse problems.
• Maximum entropy solutions provide a simple and natural way to
incorporate constraints on the solution.
• Consider the linear inverse problem
y = Kx,
where K : X → Y is a known bounded linear operator.
• The MEM considers x as the expectation of a r.v. X, such that
y = KEν(X),
where ν is an unknown probability measure.
Maximum entropy on the mean −MEM− principle II
• The solution: x = Eν∗(X), where ν∗ maximizes the entropy
S(ν ‖ υ) = −D(ν ‖ υ), subject to y = KEν(X),
where D(ν ‖ υ) is the Kullback-Leibler divergence,
D(ν ‖ υ) = ∫ log(dν/dυ) dν  if ν ≪ υ,  and +∞ otherwise.
• The MEM principle reconstructs the measure ν* that maximizes
S(ν ‖ υ) over the feasible finite measures ν absolutely continuous
w.r.t. a given measure υ, subject to a linear constraint.
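A finite-dimensional sketch of the principle: with a standard Gaussian prior υ on R^m, log Z_υ(λ) = ‖K^⊤λ‖²/2, and minimizing the dual functional gives the minimum-norm solution of y = Kx. The dimensions and operator below are illustrative assumptions:

```python
import numpy as np

# MEM with a standard Gaussian prior on X in R^m and constraint y = K E_nu[X].
# The dual H(lam) = log Z(lam) - <lam, y>, with log Z(lam) = 0.5*||K^T lam||^2,
# is minimized where K K^T lam* = y, and the reconstruction is
# x* = E_{nu*}[X] = K^T lam*: the minimum-norm solution of y = K x.
rng = np.random.default_rng(1)
m, p = 10, 3                     # unknown dimension, number of constraints
K = rng.normal(size=(p, m))      # known bounded linear operator (full row rank a.s.)
y = rng.normal(size=p)           # observed data

lam = np.linalg.solve(K @ K.T, y)   # minimizer of the dual functional
x_mem = K.T @ lam                    # MEM reconstruction

assert np.allclose(K @ x_mem, y)     # constraint is satisfied exactly
x_pinv = np.linalg.pinv(K) @ y       # minimum-norm least-squares solution
assert np.allclose(x_mem, x_pinv)    # the two coincide
```

This is why MEM can be seen as an alternative to Tikhonov-type regularization: the choice of prior measure plays the role of the regularizer.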
Reconstruction of ν∗ I
• ν∗ is rebuilt by the random measure method for infinite dimensional
inverse problems (Gzyl and Velásquez (2011)).
• The calibration constraint is expressed as an infinite-dimensional
linear inverse problem, writing wi(t) as
w_i(t) = ∫_0^1 K(s, t) ψ_i(s) ds + d_i  for each i ∈ a,
where K(s, t) is a kernel function, ψ_i(s) ds = E_ν[dW_i(s)], and W is a
stochastic process.
Reconstruction of ν∗ II
• The inverse problem takes the form of a Fredholm integral equation
of the first kind,
E_ν[KW] = E_ν{ ∑_{i∈a} [ ∫_0^1 K(s, t) dW_i(s) + d_i ] X_i(t) }
= ∫_0^1 ∑_{i∈a} K(s, t) X_i(t) ψ_i(s) ds + ∑_{i∈a} d_i X_i(t)
= N µ_X(t).
• The functions ψ*_i(s) that solve E_ν[KW] = N µ_X(t) are obtained
by considering ψ_i(s) as the density of a measure,
dW_i(s) = ψ_i(s) ds, i ∈ a.
The main theorem
Theorem (Csiszár (1984)). Let υ be a prior p.m., λ = λ(t) a measure
in the class M(C([0, 1]); R^q), and V = {ν ≪ υ : Z_υ(λ) < +∞}, where
Z_υ(λ) = E_υ[exp{⟨λ, KW⟩}], with
⟨λ, KW⟩ = ∫_0^1 λ^⊤(dt) ( ∫_0^1 ∑_{i∈a} K(s, t) X_i(t) dW_i(s) + ∑_{i∈a} d_i X_i(t) ).
Then, there exists a unique p.m.
ν* = argmax_{ν∈V} S(ν ‖ υ)  subject to  E_ν[KW] = N µ_X(t),
which is achieved at
dν*/dυ = Z_υ^{−1}(λ*) exp{⟨λ*, KW⟩},
where λ*(t) minimizes the functional
H_υ(λ) = log Z_υ(λ) − ⟨λ, N µ_X⟩.
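The theorem can be illustrated in a toy discrete setting: a uniform prior on a grid and a single linear constraint E_ν[X] = y. The optimal ν* is the exponential tilt of the prior, with λ* found by numerically minimizing the dual H_υ(λ) = log Z_υ(λ) − λy (the grid, target value, and tolerance below are illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Discrete illustration of Csiszar's theorem: prior upsilon uniform on a grid
# in [-1, 1], one linear constraint E_nu[X] = y.  The maximizer of S(nu||ups)
# is d nu*/d upsilon = exp(lam* x) / Z(lam*), with lam* minimizing
# H(lam) = log Z(lam) - lam * y.
x = np.linspace(-1.0, 1.0, 201)          # support points of the prior
prior = np.full(x.size, 1.0 / x.size)    # uniform prior measure
y = 0.3                                  # target mean (the prior mean is 0)

def H(lam):
    """Dual functional: log partition function minus <lam, y>."""
    return np.log(prior @ np.exp(lam * x)) - lam * y

lam_star = minimize_scalar(H).x          # H is convex, so Brent suffices
nu_star = prior * np.exp(lam_star * x)   # exponential tilt of the prior
nu_star /= nu_star.sum()

# The stationarity condition of H is exactly the moment constraint:
assert abs(nu_star @ x - y) < 1e-6
```

The gradient of H at λ* is E_{ν*}[X] − y, so minimizing the dual enforces the constraint automatically, mirroring the infinite-dimensional statement above.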
Our results
Lemma 1. Let υ be a prior centered stationary Gaussian measure on
(C([0, 1]), B(C([0, 1]))), and λ(t) ∈ M(C([0, 1])^q). Then,
w_i(t) = ∫_0^1 K(s, t) ψ*(s) ds + d_i, i ∈ a, where
ψ*(s) = ∑_{i'∈a} ∫_0^1 K(s, t') X_{i'}^⊤(t') λ*(dt').
Lemma 2. Let W_i(s) = ∑_{k=1}^{N(s)} ξ_{ik} be a compound Poisson process,
where N(s) is a Poisson process on [0, 1] with parameter γ > 0, and
ξ_{ik}, k ≥ 1, are i.i.d. r.v.'s for each i ∈ a with distribution u independent
of N(s). Then, w_i(t) = ∫_0^1 K(s, t) ψ*_i(s) ds + d_i, i ∈ a, where
ψ*_i(s) = ∫_R ξ_i exp{ ∑_{i∈a} ∫_0^1 K(s, t) ξ_i X_i^⊤(t) λ*(dt) } u(dξ_i).
Simulation study I
Y_i(t) = α(t) + X_i(t)^⊤ β(t) + ε_i(t),  i ∈ U_N,
where
• α(t) = 1.2 + 2.3 cos(2πt) + 4.2 sin(2πt),
• β(t) = (β_1(t), β_2(t))^⊤ with β_1(t) = cos(10t) and β_2(t) = t sin(15t),
• ε_i(t) ~ i.i.d. N(0, σ²_ε(1 + t)) with σ²_ε = 0.1, independent of X_i(t),
• X_i(t) = (X_{i1}(t), X_{i2}(t))^⊤, where X_{i1}(t) = U_{i1} + f_1(t) and
X_{i2}(t) = U_{i2} + f_2(t), with f_1(t) = 3 sin(3πt + 3),
f_2(t) = −cos(πt), U_{i1} ~ i.i.d. U[−1, 1.3] and U_{i2} ~ i.i.d. U[−0.5, 0.5],
• uniform fixed-size sampling design: a sample a ⊂ U_N of n = 0.12N
elements drawn without replacement, with N = 1000,
• K(t, s) = exp{−|t − s|²/2σ²} with σ² = 0.5,
• for the compound Poisson case, ξ_i ~ i.i.d. U[−1, 1] and γ = 1.
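The data-generating process above can be sketched as follows (the grid resolution and seed are arbitrary choices; the MEM weight computation itself is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2016)
N, T = 1000, 100                       # population size, time-grid resolution
t = np.linspace(0.0, 1.0, T)

# Auxiliary curves X_i(t) = (X_i1(t), X_i2(t)):
f1 = 3.0 * np.sin(3.0 * np.pi * t + 3.0)
f2 = -np.cos(np.pi * t)
U1 = rng.uniform(-1.0, 1.3, size=(N, 1))
U2 = rng.uniform(-0.5, 0.5, size=(N, 1))
X1 = U1 + f1                           # (N, T) by broadcasting
X2 = U2 + f2

# Survey curves Y_i(t) = alpha(t) + X_i(t)^T beta(t) + eps_i(t):
alpha = 1.2 + 2.3 * np.cos(2 * np.pi * t) + 4.2 * np.sin(2 * np.pi * t)
beta1, beta2 = np.cos(10.0 * t), t * np.sin(15.0 * t)
sigma2_eps = 0.1
eps = rng.normal(scale=np.sqrt(sigma2_eps * (1.0 + t)), size=(N, T))
Y = alpha + X1 * beta1 + X2 * beta2 + eps

# Fixed-size sample of n = 0.12 N units drawn without replacement.
n = int(0.12 * N)
sample = rng.choice(N, size=n, replace=False)

# Gaussian kernel used in the weights, sigma^2 = 0.5.
sigma2 = 0.5
K = np.exp(-np.abs(t[:, None] - t[None, :]) ** 2 / (2.0 * sigma2))

mu_Y = Y.mean(axis=0)                  # target: finite population mean curve
mu_HT = Y[sample].mean(axis=0)         # HT estimate (d_i = N/n for all i)
```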
Simulation study II
Table: Bias-variance decomposition of the MSE

Functional estimator   MSE      Bias²    Variance
Horvitz-Thompson       0.2391   0.0005   0.2386
MEM (Gaussian)         0.2001   0.0006   0.1995
MEM (Poisson)          0.2333   0.0084   0.2249
[Figures: sample trajectories of X_{i1}(t), X_{i2}(t), and Y_i(t) over [0, 1], together with the theoretical, HT, and MEM estimates of the mean curve under the Gaussian and compound Poisson measures.]
References I
M. Chaouch and C. Goga. Using complex surveys to estimate the l1-median of a functional
variable: application to electricity load curves. Int. Stat. Rev., 80:40–59, 2012.
I. Csiszár. I-divergence geometry of probability distributions and minimization problems. The
Annals of Probability, 3:146–158, 1975.
I. Csiszár. Sanov property, generalized I-projection and a conditional limit theorem. The Annals
of Probability, 12:768–793, 1984.
J. C. Deville and C. E. Särndal. Calibration estimators in survey sampling. Journal of the
American Statistical Association, 87:376–382, 1992.
H. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems, volume 375 of
Mathematics and Its Applications. Kluwer Academic Publishers, Dordrecht, 2000.
W. A. Fuller. Sampling Statistics. Wiley Series in Survey Methodology. Wiley, 2009.
F. Gamboa. Méthode du maximum d’entropie sur la moyenne et applications. PhD thesis,
Université de Paris-Sud, Orsay, 1989.
H. Gzyl and Y. Velásquez. Linear Inverse Problems: the Maximum Entropy Connection,
volume 83 of Series on Advances in Mathematics for Applied Sciences. World Scientific, 2011.
D. G. Horvitz and D. J. Thompson. A generalization of sampling without replacement from a
finite universe. Journal of the American Statistical Association, 47:663–685, 1952.
J. Navaza. On the maximum entropy estimate of electron density function. Acta
Crystallographica, A41:232–244, 1985.
J. Navaza. The use of non-local constraints in maximum-entropy electron density reconstruction.
Acta Crystallographica, A42:212–223, 1986.
Thanks!!!
Z_υ(λ) = exp{ E_υ[⟨λ, KW⟩] + (1/2) Var_υ[⟨λ, KW⟩] }
= exp{ ∑_{i∈a} d_i ∫_0^1 λ^⊤(dt) X_i(t) + (1/2) ∫_0^1 ( ∑_{i∈a} ∫_0^1 K(s, t) λ^⊤(dt) X_i(t) )² ds },
due to E_υ[dW_i(s)] = 0 and Var_υ[dW_i(s)] = ds, i ∈ a.
H_υ(λ) = log Z_υ(λ) − ⟨λ, N µ_X⟩
= (1/2) ∫_0^1 ( ∑_{i∈a} ∫_0^1 K(s, t) λ^⊤(dt) X_i(t) ) ( ∑_{i'∈a} ∫_0^1 K(s, t') λ^⊤(dt') X_{i'}(t') ) ds
+ ∫_0^1 λ^⊤(dt) ( ∑_{i∈a} d_i X_i(t) − N µ_X(t) ).
The λ*(dt) that minimizes H_υ(λ) is given by
∑_{i∈a} ∑_{i'∈a} ∫_0^1 ∫_0^1 K(s, t) K(s, t') X_i(t) X_{i'}^⊤(t') λ*(dt') ds + ∑_{i∈a} d_i X_i(t) = N µ_X(t),
which can be rewritten as
∑_{i∈a} [ ∫_0^1 K(s, t) ( ∑_{i'∈a} ∫_0^1 K(s, t') X_{i'}^⊤(t') λ*(dt') ) ds + d_i ] X_i(t) = N µ_X(t).
m_i((a, b]) = W_i(b) − W_i(a) = ∑_{k=N(a)+1}^{N(b)} ξ_{ik}.
By the Lévy-Khintchine formula for Lévy processes, the m.g.f. of
W(s) is
E_υ[exp{⟨α, W(s)⟩}] = exp{ sγ ∫_{R^n} ( e^{⟨α, ξ_k⟩} − 1 ) u(dξ_k) },  α ∈ R^n,
so that
E_υ[exp{⟨g(s), dW_i⟩}] = exp{ γ ∫_0^1 ds ∫_R ( exp{g(s) ξ_i} − 1 ) u(dξ_i) }.
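The compound Poisson m.g.f. above can be checked by Monte Carlo: with u = U[−1, 1] and scalar α, the integral equals sinh(α)/α − 1 (the sample size, seed, and value of α below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
gamma, s, a = 1.0, 1.0, 0.8          # Poisson rate, time horizon, m.g.f. argument

# Simulate W(s) = sum_{k=1}^{N(s)} xi_k, with N(s) ~ Poisson(gamma*s)
# and xi_k ~ U[-1, 1] i.i.d., independent of N(s).
M = 100_000
counts = rng.poisson(gamma * s, size=M)
W = np.array([rng.uniform(-1.0, 1.0, size=c).sum() for c in counts])

mc = np.exp(a * W).mean()            # Monte Carlo estimate of E[exp(a W(s))]

# Levy-Khintchine: exp{ s*gamma * int (e^{a xi} - 1) u(d xi) },
# and for u = U[-1, 1] the integral is sinh(a)/a - 1.
exact = np.exp(s * gamma * (np.sinh(a) / a - 1.0))
```

With 100,000 replicates the Monte Carlo estimate agrees with the closed form to well under a percent.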
⟨λ, KW⟩ = ∫_0^1 λ^⊤(dt) ∫_0^1 ∑_{i∈a} K(s, t) X_i(t) dW_i(s) + ∫_0^1 λ^⊤(dt) ∑_{i∈a} d_i X_i(t),
where g(s) = ∫_0^1 λ^⊤(dt) ∑_{i∈a} K(s, t) X_i(t).
Z_υ(λ) = exp{ γ ∫_0^1 ds ∫_R ( exp{g(s) ξ_i} − 1 ) u(dξ_i) + ⟨λ, ∑_{i∈a} d_i X_i(t)⟩ }.
H_υ(λ) = γ ∫_0^1 ds ∫_R ( exp{ ∫_0^1 λ^⊤(dt) ∑_{i∈a} K(s, t) ξ_i X_i(t) } − 1 ) u(dξ_i)
+ ∫_0^1 λ^⊤(dt) ( ∑_{i∈a} d_i X_i(t) − N µ_X(t) ).
∑_{i∈a} [ ∫_0^1 K(s, t) ( ∫_R ξ_i exp{ ∑_{i∈a} ∫_0^1 K(s, t) ξ_i X_i^⊤(t) λ*(dt) } u(dξ_i) ) ds + d_i ] X_i(t) = N µ_X(t).