Functional Calibration Estimation by the Maximum
Entropy on the Mean Principle
Statistics: A Journal of Theoretical and Applied Statistics, 49, 989-1004. 2015.
Santiago Gallón
(Joint work with F. Gamboa and J-M. Loubes)
Departamento de Matemáticas y Estadística
Facultad de Ciencias Económicas − Universidad de Antioquia
Institut de Mathématiques de Toulouse, Université Toulouse III − Paul Sabatier
Seminario Institucional − Escuela de Estadística
Facultad de Ciencias − Universidad Nacional de Colombia
September 5, 2016
Outline
The problem of calibration estimation
Functional calibration estimation
Maximum entropy on the mean −MEM− principle
Simulation study
References
The problem of calibration estimation I
• Let UN = {1, . . . , N} be a finite survey population.
• Assume that there is a survey variable Yi, i ∈ UN .
The goal: Estimate the finite population mean
µ_Y = N^{−1} ∑_{i∈U_N} Y_i,
based on a sample a ⊂ UN of n elements.
• The unbiased Horvitz-Thompson −HT− estimator:
µ̂_Y^{HT} = N^{−1} ∑_{i∈a} π_i^{−1} Y_i = N^{−1} ∑_{i∈a} d_i Y_i,  π_i = P(i ∈ a).
• The HT estimator can have low precision when the selected sample is unrepresentative.
• The inference can be improved by modifying the weights d_i = π_i^{−1}.
The problem of calibration estimation II
• A method can be constructed by incorporating auxiliary information
X_i ∈ R^q, i ∈ U_N, q ≥ 1 (Deville and Särndal (1992)).
Sources of auxiliary information:
X Census data
X Administrative registers
X Previous surveys
• Calibration estimation: replace the weights of µ̂_Y^{HT} by new weights
w_i close enough to the d_i's according to some distance D(w, d), s.t.
N^{−1} ∑_{i∈a} w_i X_i = µ_X = N^{−1} ∑_{i∈U_N} X_i  (Calibration equation).
• The weights wi generally result in estimates with smaller variance.
• The calibration estimator of µY is given by
µ̂_Y = N^{−1} ∑_{i∈a} w_i Y_i.
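In the multivariate setting above, the chi-square distance D(w, d) = ∑_{i∈a}(w_i − d_i)²/d_i gives calibrated weights in closed form via a Lagrange multiplier (Deville and Särndal (1992)). A minimal sketch on simulated data (the population, design, and variable names below are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population: N units, q = 2 auxiliary variables.
N, n, q = 1000, 120, 2
X = rng.normal(size=(N, q))
Y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=N)

# Simple random sample without replacement: pi_i = n/N, so d_i = N/n.
sample = rng.choice(N, size=n, replace=False)
d = np.full(n, N / n)
Xs, Ys = X[sample], Y[sample]

# Chi-square-distance calibration: minimize sum (w_i - d_i)^2 / d_i
# subject to sum_a w_i X_i = population total of X.
t_x = X.sum(axis=0)                       # known population totals of X
A = (d[:, None] * Xs).T @ Xs              # sum_a d_i x_i x_i^T
lam = np.linalg.solve(A, t_x - d @ Xs)    # Lagrange multipliers
w = d * (1.0 + Xs @ lam)                  # calibrated weights

# The calibration equation now holds exactly:
assert np.allclose(w @ Xs, t_x)

mu_ht = (d @ Ys) / N       # Horvitz-Thompson estimate of mu_Y
mu_cal = (w @ Ys) / N      # calibration estimate of mu_Y
```

Because Y is nearly linear in X here, the calibrated estimate typically tracks the population mean much more closely than the HT estimate.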
Functional calibration estimation
Our aim: Calibration estimation for µ_Y, where Y(t) ∈ C([0, T]), using
functional information X(t) ∈ C([0, T]; R^q).
• The functional calibration constraint:
N^{−1} ∑_{i∈a} w_i(t) X_i(t) = µ_X(t) = N^{−1} ∑_{i∈U_N} X_i(t),  ∀t ∈ [0, T].
• The w_i(t) are obtained by coupling calibration estimation with the
maximum entropy on the mean principle.
• Example (Chaouch and Goga (2012)): estimation of the mean
electricity consumption curve (measured every 30 min.) during the 2nd week,
using the consumption in the 1st week as auxiliary information, for
18,902 meters transmitting electricity consumption over two weeks.
Maximum entropy on the mean −MEM− principle I
• Proposed by Navaza (1985, 1986) to solve an inverse problem in
crystallography.
• It is an alternative to the Tikhonov’s regularization of ill-posed
inverse problems.
• Maximum entropy solutions provide a simple and natural way to
incorporate constraints on the solution.
• Consider the linear inverse problem
y = Kx,
where K : X → Y is a known bounded linear operator.
• The MEM considers x as the expectation of a r.v. X, such that
y = KEν(X),
where ν is an unknown probability measure.
Maximum entropy on the mean −MEM− principle II
• The solution: x = Eν∗(X), where ν∗ maximizes the entropy
S(ν ‖ υ) = −D(ν ‖ υ), subject to y = KEν(X),
where D(ν ‖ υ) is the Kullback-Leibler divergence,
D(ν ‖ υ) = ∫ log(dν/dυ) dν  if ν ≪ υ,  and +∞ otherwise.
• The MEM principle reconstructs the measure ν* that maximizes
S(ν ‖ υ) over the feasible finite measures ν absolutely continuous
w.r.t. a given measure υ, subject to a linear constraint.
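A finite-dimensional sketch of the principle: with a standard Gaussian prior υ on R^m, log Z_υ(λ) = ‖K^⊤λ‖²/2, and minimizing the dual functional gives the minimum-norm solution of y = Kx. The dimensions and operator below are illustrative assumptions:

```python
import numpy as np

# MEM with a standard Gaussian prior on X in R^m and constraint y = K E_nu[X].
# The dual H(lam) = log Z(lam) - <lam, y>, with log Z(lam) = 0.5*||K^T lam||^2,
# is minimized where K K^T lam* = y, and the reconstruction is
# x* = E_{nu*}[X] = K^T lam*: the minimum-norm solution of y = K x.
rng = np.random.default_rng(1)
m, p = 10, 3                     # unknown dimension, number of constraints
K = rng.normal(size=(p, m))      # known bounded linear operator (full row rank a.s.)
y = rng.normal(size=p)           # observed data

lam = np.linalg.solve(K @ K.T, y)   # minimizer of the dual functional
x_mem = K.T @ lam                    # MEM reconstruction

assert np.allclose(K @ x_mem, y)     # constraint is satisfied exactly
x_pinv = np.linalg.pinv(K) @ y       # minimum-norm least-squares solution
assert np.allclose(x_mem, x_pinv)    # the two coincide
```

This is why MEM can be seen as an alternative to Tikhonov-type regularization: the choice of prior measure plays the role of the regularizer.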
Reconstruction of ν∗ I
• ν∗ is rebuilt by the random measure method for infinite dimensional
inverse problems (Gzyl and Velásquez (2011)).
• The calibration constraint is expressed as an infinite-dimensional
linear inverse problem, writing wi(t) as
w_i(t) = ∫_0^1 K(s, t) ψ_i(s) ds + d_i  for each i ∈ a,
where K(s, t) is a kernel function, ψ_i(s) ds = E_ν[dW_i(s)], and W is a
stochastic process.
Reconstruction of ν∗ II
• The inverse problem takes the form of a Fredholm integral equation
of the first kind,
E_ν[KW] = E_ν{ ∑_{i∈a} [ ∫_0^1 K(s, t) dW_i(s) + d_i ] X_i(t) }
= ∫_0^1 ∑_{i∈a} K(s, t) X_i(t) ψ_i(s) ds + ∑_{i∈a} d_i X_i(t)
= N µ_X(t).
• The functions ψ*_i(s) that solve E_ν[KW] = N µ_X(t) are obtained
by considering ψ_i(s) as the density of a measure,
dW_i(s) = ψ_i(s) ds, i ∈ a.
The main theorem
Theorem (Csiszár (1984)). Let υ be a prior p.m., λ = λ(t) a measure
in the class M(C([0, 1]); R^q), and V = {ν ≪ υ : Z_υ(λ) < +∞}, where
Z_υ(λ) = E_υ[exp{⟨λ, KW⟩}], with
⟨λ, KW⟩ = ∫_0^1 λ^⊤(dt) ( ∫_0^1 ∑_{i∈a} K(s, t) X_i(t) dW_i(s) + ∑_{i∈a} d_i X_i(t) ).
Then, there exists a unique p.m.
ν* = argmax_{ν∈V} S(ν ‖ υ)  subject to  E_ν[KW] = N µ_X(t),
which is achieved at
dν*/dυ = Z_υ^{−1}(λ*) exp{⟨λ*, KW⟩},
where λ*(t) minimizes the functional
H_υ(λ) = log Z_υ(λ) − ⟨λ, N µ_X⟩.
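The theorem can be illustrated in a toy discrete setting: a uniform prior on a grid and a single linear constraint E_ν[X] = y. The optimal ν* is the exponential tilt of the prior, with λ* found by numerically minimizing the dual H_υ(λ) = log Z_υ(λ) − λy (the grid, target value, and tolerance below are illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Discrete illustration of Csiszar's theorem: prior upsilon uniform on a grid
# in [-1, 1], one linear constraint E_nu[X] = y.  The maximizer of S(nu||ups)
# is d nu*/d upsilon = exp(lam* x) / Z(lam*), with lam* minimizing
# H(lam) = log Z(lam) - lam * y.
x = np.linspace(-1.0, 1.0, 201)          # support points of the prior
prior = np.full(x.size, 1.0 / x.size)    # uniform prior measure
y = 0.3                                  # target mean (the prior mean is 0)

def H(lam):
    """Dual functional: log partition function minus <lam, y>."""
    return np.log(prior @ np.exp(lam * x)) - lam * y

lam_star = minimize_scalar(H).x          # H is convex, so Brent suffices
nu_star = prior * np.exp(lam_star * x)   # exponential tilt of the prior
nu_star /= nu_star.sum()

# The stationarity condition of H is exactly the moment constraint:
assert abs(nu_star @ x - y) < 1e-6
```

The gradient of H at λ* is E_{ν*}[X] − y, so minimizing the dual enforces the constraint automatically, mirroring the infinite-dimensional statement above.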
Our results
Lemma 1. Let υ be a prior centered stationary Gaussian measure on
(C([0, 1]), B(C([0, 1]))), and λ(t) ∈ M(C([0, 1])^q). Then,
w_i(t) = ∫_0^1 K(s, t) ψ*(s) ds + d_i, i ∈ a, where
ψ*(s) = ∑_{i'∈a} ∫_0^1 K(s, t') X_{i'}^⊤(t') λ*(dt').
Lemma 2. Let W_i(s) = ∑_{k=1}^{N(s)} ξ_{ik} be a compound Poisson process,
where N(s) is a Poisson process on [0, 1] with parameter γ > 0, and
ξ_{ik}, k ≥ 1, are i.i.d. r.v.'s for each i ∈ a with distribution u independent
of N(s). Then, w_i(t) = ∫_0^1 K(s, t) ψ*_i(s) ds + d_i, i ∈ a, where
ψ*_i(s) = ∫_R ξ_i exp{ ∑_{i∈a} ∫_0^1 K(s, t) ξ_i X_i^⊤(t) λ*(dt) } u(dξ_i).
Simulation study I
Y_i(t) = α(t) + X_i(t)^⊤ β(t) + ε_i(t),  i ∈ U_N,
where
• α(t) = 1.2 + 2.3 cos(2πt) + 4.2 sin(2πt),
• β(t) = (β_1(t), β_2(t))^⊤ with β_1(t) = cos(10t) and β_2(t) = t sin(15t),
• ε_i(t) ~ i.i.d. N(0, σ²_ε(1 + t)) with σ²_ε = 0.1, independent of X_i(t),
• X_i(t) = (X_{i1}(t), X_{i2}(t))^⊤, where X_{i1}(t) = U_{i1} + f_1(t) and
X_{i2}(t) = U_{i2} + f_2(t), with f_1(t) = 3 sin(3πt + 3),
f_2(t) = −cos(πt), U_{i1} ~ i.i.d. U[−1, 1.3] and U_{i2} ~ i.i.d. U[−0.5, 0.5],
• uniform fixed-size sampling design: a sample a ⊂ U_N of n = 0.12N
elements drawn without replacement, with N = 1000,
• K(t, s) = exp{−|t − s|²/2σ²} with σ² = 0.5,
• for the compound Poisson case, ξ_i ~ i.i.d. U[−1, 1] and γ = 1.
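The data-generating process above can be sketched as follows (the grid resolution and seed are arbitrary choices; the MEM weight computation itself is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2016)
N, T = 1000, 100                       # population size, time-grid resolution
t = np.linspace(0.0, 1.0, T)

# Auxiliary curves X_i(t) = (X_i1(t), X_i2(t)):
f1 = 3.0 * np.sin(3.0 * np.pi * t + 3.0)
f2 = -np.cos(np.pi * t)
U1 = rng.uniform(-1.0, 1.3, size=(N, 1))
U2 = rng.uniform(-0.5, 0.5, size=(N, 1))
X1 = U1 + f1                           # (N, T) by broadcasting
X2 = U2 + f2

# Survey curves Y_i(t) = alpha(t) + X_i(t)^T beta(t) + eps_i(t):
alpha = 1.2 + 2.3 * np.cos(2 * np.pi * t) + 4.2 * np.sin(2 * np.pi * t)
beta1, beta2 = np.cos(10.0 * t), t * np.sin(15.0 * t)
sigma2_eps = 0.1
eps = rng.normal(scale=np.sqrt(sigma2_eps * (1.0 + t)), size=(N, T))
Y = alpha + X1 * beta1 + X2 * beta2 + eps

# Fixed-size sample of n = 0.12 N units drawn without replacement.
n = int(0.12 * N)
sample = rng.choice(N, size=n, replace=False)

# Gaussian kernel used in the weights, sigma^2 = 0.5.
sigma2 = 0.5
K = np.exp(-np.abs(t[:, None] - t[None, :]) ** 2 / (2.0 * sigma2))

mu_Y = Y.mean(axis=0)                  # target: finite population mean curve
mu_HT = Y[sample].mean(axis=0)         # HT estimate (d_i = N/n for all i)
```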
Simulation study II
Table: Bias-variance decomposition of the MSE

Functional estimator   MSE      Bias²    Variance
Horvitz-Thompson       0.2391   0.0005   0.2386
MEM (Gaussian)         0.2001   0.0006   0.1995
MEM (Poisson)          0.2333   0.0084   0.2249
[Figures: sample trajectories of X_{i1}(t), X_{i2}(t), and Y_i(t) over [0, 1], together with the theoretical, HT, and MEM estimates of the mean curve under the Gaussian and compound Poisson measures.]
References I
M. Chaouch and C. Goga. Using complex surveys to estimate the l1-median of a functional
variable: application to electricity load curves. Int. Stat. Rev., 80:40–59, 2012.
I. Csiszár. I-divergence geometry of probability distributions and minimization problems. The
Annals of Probability, 3:146–158, 1975.
I. Csiszár. Sanov property, generalized I-projection and a conditional limit theorem. The Annals
of Probability, 12:768–793, 1984.
J. C. Deville and C. E. Särndal. Calibration estimators in survey sampling. Journal of the
American Statistical Association, 87:376–382, 1992.
H. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems, volume 375 of
Mathematics and Its Applications. Kluwer Academic Publishers, Dordrecht, 2000.
W. A. Fuller. Sampling Statistics. Wiley Series in Survey Methodology. Wiley, 2009.
F. Gamboa. Méthode du maximum d’entropie sur la moyenne et applications. PhD thesis,
Université de Paris-Sud, Orsay, 1989.
H. Gzyl and Y. Velásquez. Linear Inverse Problems: the Maximum Entropy Connection,
volume 83 of Series on Advances in Mathematics for Applied Sciences. World Scientific, 2011.
D. G. Horvitz and D. J. Thompson. A generalization of sampling without replacement from a
finite universe. Journal of the American Statistical Association, 47:663–685, 1952.
J. Navaza. On the maximum entropy estimate of electron density function. Acta
Crystallographica, A41:232–244, 1985.
J. Navaza. The use of non-local constraints in maximum-entropy electron density reconstruction.
Acta Crystallographica, A42:212–223, 1986.
Thanks!!!
Z_υ(λ) = exp{ E_υ[⟨λ, KW⟩] + (1/2) Var_υ[⟨λ, KW⟩] }
= exp{ ∑_{i∈a} d_i ∫_0^1 λ^⊤(dt) X_i(t) + (1/2) ∫_0^1 ( ∑_{i∈a} ∫_0^1 K(s, t) λ^⊤(dt) X_i(t) )² ds },
due to E_υ[dW_i(s)] = 0 and Var_υ[dW_i(s)] = ds, i ∈ a.
H_υ(λ) = log Z_υ(λ) − ⟨λ, N µ_X⟩
= (1/2) ∫_0^1 ( ∑_{i∈a} ∫_0^1 K(s, t) λ^⊤(dt) X_i(t) ) ( ∑_{i'∈a} ∫_0^1 K(s, t') λ^⊤(dt') X_{i'}(t') ) ds
+ ∫_0^1 λ^⊤(dt) ( ∑_{i∈a} d_i X_i(t) − N µ_X(t) ).
The λ*(dt) that minimizes H_υ(λ) is given by
∑_{i∈a} ∑_{i'∈a} ∫_0^1 ∫_0^1 K(s, t) K(s, t') X_i(t) X_{i'}^⊤(t') λ*(dt') ds + ∑_{i∈a} d_i X_i(t) = N µ_X(t),
which can be rewritten as
∑_{i∈a} [ ∫_0^1 K(s, t) ( ∑_{i'∈a} ∫_0^1 K(s, t') X_{i'}^⊤(t') λ*(dt') ) ds + d_i ] X_i(t) = N µ_X(t).
m_i((a, b]) = W_i(b) − W_i(a) = ∑_{k=N(a)+1}^{N(b)} ξ_{ik}.
By the Lévy-Khintchine formula for Lévy processes, the m.g.f. of
W(s) is
E_υ[exp{⟨α, W(s)⟩}] = exp{ sγ ∫_{R^n} ( e^{⟨α, ξ_k⟩} − 1 ) u(dξ_k) },  α ∈ R^n,
so that
E_υ[exp{⟨g(s), dW_i⟩}] = exp{ γ ∫_0^1 ds ∫_R ( exp{g(s) ξ_i} − 1 ) u(dξ_i) }.
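The compound Poisson m.g.f. above can be checked by Monte Carlo: with u = U[−1, 1] and scalar α, the integral equals sinh(α)/α − 1 (the sample size, seed, and value of α below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
gamma, s, a = 1.0, 1.0, 0.8          # Poisson rate, time horizon, m.g.f. argument

# Simulate W(s) = sum_{k=1}^{N(s)} xi_k, with N(s) ~ Poisson(gamma*s)
# and xi_k ~ U[-1, 1] i.i.d., independent of N(s).
M = 100_000
counts = rng.poisson(gamma * s, size=M)
W = np.array([rng.uniform(-1.0, 1.0, size=c).sum() for c in counts])

mc = np.exp(a * W).mean()            # Monte Carlo estimate of E[exp(a W(s))]

# Levy-Khintchine: exp{ s*gamma * int (e^{a xi} - 1) u(d xi) },
# and for u = U[-1, 1] the integral is sinh(a)/a - 1.
exact = np.exp(s * gamma * (np.sinh(a) / a - 1.0))
```

With 100,000 replicates the Monte Carlo estimate agrees with the closed form to well under a percent.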
⟨λ, KW⟩ = ∫_0^1 λ^⊤(dt) ∫_0^1 ∑_{i∈a} K(s, t) X_i(t) dW_i(s) + ∫_0^1 λ^⊤(dt) ∑_{i∈a} d_i X_i(t),
where g(s) = ∫_0^1 λ^⊤(dt) ∑_{i∈a} K(s, t) X_i(t).
Z_υ(λ) = exp{ γ ∫_0^1 ds ∫_R ( exp{g(s) ξ_i} − 1 ) u(dξ_i) + ⟨λ, ∑_{i∈a} d_i X_i(t)⟩ }.
H_υ(λ) = γ ∫_0^1 ds ∫_R ( exp{ ∫_0^1 λ^⊤(dt) ∑_{i∈a} K(s, t) ξ_i X_i(t) } − 1 ) u(dξ_i)
+ ∫_0^1 λ^⊤(dt) ( ∑_{i∈a} d_i X_i(t) − N µ_X(t) ).
∑_{i∈a} [ ∫_0^1 K(s, t) ( ∫_R ξ_i exp{ ∑_{i∈a} ∫_0^1 K(s, t) ξ_i X_i^⊤(t) λ*(dt) } u(dξ_i) ) ds + d_i ] X_i(t) = N µ_X(t).